goldspace
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Scientific Fields
Repository
Basic Info
- Host: GitHub
- Owner: Biogod2020
- License: other
- Language: Python
- Default Branch: main
- Size: 15.8 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
GoldSpace Project
This repository contains the source code for the SpaGLaM (Spatial Graph Large Model) project, including the training framework based on open_clip and a new, high-performance preprocessing pipeline.
spaglam-preproc: The Preprocessing Pipeline
A high-performance, single-pass data preprocessing pipeline designed for SpaGLaM. This tool efficiently converts spatial transcriptomics data (AnnData and histology images) into graph-based webdataset shards suitable for large-scale model training.
Features
- High-Performance Single Pass: Extracts image tiles and generates gene sentences on-the-fly, eliminating the I/O bottleneck of writing and reading millions of intermediate files.
- Flexible Output: Generate webdataset shards containing either raw data (.png, .txt) or pre-computed OmiCLIP embeddings (.pth), controlled by a simple config flag.
- Versatile Image Support: Natively handles Whole-Slide Images (e.g., .svs, .tif), standard images (.png, .jpeg), and images embedded in AnnData objects.
- Robust Quality Control: Includes pre-run validation checks, live progress monitoring, and automatically generates a final QC report and a visual sample grid for easy verification.
- User-Friendly Interface: A simple Command-Line Interface (CLI) driven by a clean YAML configuration file.
- Notebook-Ready: The core pipeline is encapsulated in a class, allowing for easy, interactive use and visualization within Jupyter notebooks.
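The single-pass idea in the feature list can be illustrated with a minimal sketch. This is not the spaglam-preproc implementation: every function name here is hypothetical, the placeholder bytes stand in for real PNG tiles, and the gene-sentence rule (names of the most highly expressed genes joined by spaces) is an assumption, since the source does not show the actual rule. What it does rely on is the webdataset convention that a shard is a plain tar archive in which adjacent members sharing a basename form one sample.

```python
import io
import tarfile


def gene_sentence(expression, top_k=50):
    """Join the names of the top_k expressed genes, highest first (assumed rule)."""
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(name for name, value in ranked[:top_k] if value > 0)


def add_member(tar, name, payload):
    """Append one in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def write_shard(fileobj, samples):
    """Write (key, image_bytes, sentence) samples as adjacent tar members."""
    count = 0
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for key, image_bytes, sentence in samples:
            add_member(tar, f"{key}.png", image_bytes)
            add_member(tar, f"{key}.txt", sentence.encode("utf-8"))
            count += 1
    return count


def iter_spots(spots):
    """Yield samples lazily so no intermediate files are written between steps."""
    for key, (image_bytes, expression) in spots.items():
        yield key, image_bytes, gene_sentence(expression, top_k=2)


# Hypothetical toy input: placeholder bytes stand in for real PNG tiles.
spots = {
    "spot_000": (b"tile-0", {"EPCAM": 5.0, "KRT8": 9.0, "CD3E": 0.0}),
    "spot_001": (b"tile-1", {"CD3E": 4.0, "PTPRC": 7.0, "KRT8": 1.0}),
}
buffer = io.BytesIO()
written = write_shard(buffer, iter_spots(spots))

# Read the shard back to confirm the per-sample grouping.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member_names = tar.getnames()
    first_sentence = tar.extractfile("spot_000.txt").read().decode("utf-8")
```

Because the generator feeds the tar writer directly, each tile and its sentence go straight into the shard, which is the property the first feature bullet describes. A real pipeline would encode actual image tiles and write to a file on disk rather than an in-memory buffer.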
Installation
It is recommended to install the project in editable mode. From the GoldSpace root directory:
1. Basic Installation (for training with existing data):
Install the base dependencies for training.
```bash
pip install -e .
# You may also need to install from your requirements files
pip install -r requirements.txt
```
Owner
- Name: Jiahao Ji
- Login: Biogod2020
- Kind: user
- Repositories: 1
- Profile: https://github.com/Biogod2020
A student from Fudan University. Especially interested in bioinformatics.
Citation (CITATION.cff)
cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ilharco
    given-names: Gabriel
  - family-names: Wortsman
    given-names: Mitchell
  - family-names: Wightman
    given-names: Ross
  - family-names: Gordon
    given-names: Cade
  - family-names: Carlini
    given-names: Nicholas
  - family-names: Taori
    given-names: Rohan
  - family-names: Dave
    given-names: Achal
  - family-names: Shankar
    given-names: Vaishaal
  - family-names: Namkoong
    given-names: Hongseok
  - family-names: Miller
    given-names: John
  - family-names: Hajishirzi
    given-names: Hannaneh
  - family-names: Farhadi
    given-names: Ali
  - family-names: Schmidt
    given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28
GitHub Events
Total
- Push event: 20
Last Year
- Push event: 20
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v4 composite
- actions/github-script v6 composite
- actions-ecosystem/action-regex-match v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- softprops/action-gh-release v1 composite
- ftfy *
- huggingface-hub *
- regex *
- safetensors *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *
- pytest ==7.2.0 test
- pytest-split ==0.8.0 test
- timm >=1.0.10 test
- transformers * test
- braceexpand *
- fsspec *
- ftfy *
- huggingface_hub *
- pandas *
- regex *
- safetensors *
- timm >=1.0.15
- torch >=1.9.0
- torchvision *
- tqdm *
- transformers *
- webdataset >=0.2.5,<=0.2.86
- ftfy *
- huggingface_hub *
- regex *
- safetensors *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *