Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 69% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Biogod2020
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 15.8 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 6 months ago · Last pushed 5 months ago
Metadata Files
Readme Changelog License Citation

README.md

GoldSpace Project

This repository contains the source code for the SpaGLaM (Spatial Graph Large Model) project, including the training framework based on open_clip and a new, high-performance preprocessing pipeline.

spaglam-preproc: The Preprocessing Pipeline

A high-performance, single-pass data preprocessing pipeline designed for SpaGLaM. This tool efficiently converts spatial transcriptomics data (AnnData and histology images) into graph-based webdataset shards suitable for large-scale model training.

Features

  • High-Performance Single Pass: Extracts image tiles and generates gene sentences on the fly, eliminating the I/O bottleneck of writing and then re-reading millions of intermediate files.
  • Flexible Output: Generate webdataset shards containing either raw data (.png, .txt) or pre-computed OmiCLIP embeddings (.pth), controlled by a simple config flag.
  • Versatile Image Support: Natively handles Whole-Slide Images (e.g., .svs, .tif), standard images (.png, .jpeg), and images embedded in AnnData objects.
  • Robust Quality Control: Includes pre-run validation checks, live progress monitoring, and automatically generates a final QC report and a visual sample grid for easy verification.
  • User-Friendly Interface: A simple Command-Line Interface (CLI) driven by a clean YAML configuration file.
  • Notebook-Ready: The core pipeline is encapsulated in a class, allowing for easy, interactive use and visualization within Jupyter notebooks.
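To illustrate the "raw data" output mode described above, here is a minimal sketch of how image tiles and gene sentences could be packed into a WebDataset-style shard. It is not the spaglam-preproc implementation, which is not shown in this README. It relies only on the WebDataset convention that a shard is a tar archive in which files sharing a basename form one sample. The function names `write_shard` and `read_shard`, the sample keys, and the placeholder payloads are all hypothetical, and the standard-library `tarfile` module stands in for whatever writer the pipeline actually uses.

```python
import io
import tarfile


def write_shard(path, samples):
    """Write (key, png_bytes, sentence) triples to a WebDataset-style tar shard.

    Hypothetical helper: each sample becomes two tar members, <key>.png and
    <key>.txt, so a reader can regroup them by their shared basename.
    """
    with tarfile.open(path, "w") as tar:
        for key, png_bytes, sentence in samples:
            members = (("png", png_bytes), ("txt", sentence.encode("utf-8")))
            for ext, payload in members:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)  # size must be set before addfile
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path):
    """Group shard members by key, returning {key: {extension: bytes}}."""
    grouped = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            key, ext = member.name.rsplit(".", 1)
            grouped.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return grouped
```

Because tiles and sentences are written straight into the archive from memory, nothing is staged on disk as individual files, which is the point of the single-pass design. A shard produced this way could be inspected with `read_shard("shard-000000.tar")`, where the file name is again only an example.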

Installation

It is recommended to install the project in editable mode. From the GoldSpace root directory:

1. Basic Installation (for training with existing data):

Install the base dependencies for training.

```bash
pip install -e .

# You may also need to install from your requirements files
pip install -r requirements.txt
```

Owner

  • Name: Jiahao Ji
  • Login: Biogod2020
  • Kind: user

A student from Fudan University. Especially interested in bioinformatics.

Citation (CITATION.cff)

cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ilharco
    given-names: Gabriel
  - family-names: Wortsman
    given-names: Mitchell
  - family-names: Wightman
    given-names: Ross
  - family-names: Gordon
    given-names: Cade
  - family-names: Carlini
    given-names: Nicholas
  - family-names: Taori
    given-names: Rohan
  - family-names: Dave
    given-names: Achal
  - family-names: Shankar
    given-names: Vaishaal
  - family-names: Namkoong
    given-names: Hongseok
  - family-names: Miller
    given-names: John
  - family-names: Hajishirzi
    given-names: Hannaneh
  - family-names: Farhadi
    given-names: Ali
  - family-names: Schmidt
    given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28

GitHub Events

Total
  • Push event: 20
Last Year
  • Push event: 20

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v4 composite
.github/workflows/clear-cache.yml actions
  • actions/github-script v6 composite
.github/workflows/python-publish.yml actions
  • actions-ecosystem/action-regex-match v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • softprops/action-gh-release v1 composite
pyproject.toml pypi
  • ftfy *
  • huggingface-hub *
  • regex *
  • safetensors *
  • timm *
  • torch >=1.9.0
  • torchvision *
  • tqdm *
requirements-test.txt pypi
  • pytest ==7.2.0 test
  • pytest-split ==0.8.0 test
  • timm >=1.0.10 test
  • transformers * test
requirements-training.txt pypi
  • braceexpand *
  • fsspec *
  • ftfy *
  • huggingface_hub *
  • pandas *
  • regex *
  • safetensors *
  • timm >=1.0.15
  • torch >=1.9.0
  • torchvision *
  • tqdm *
  • transformers *
  • webdataset >=0.2.5,<=0.2.86
requirements.txt pypi
  • ftfy *
  • huggingface_hub *
  • regex *
  • safetensors *
  • timm *
  • torch >=1.9.0
  • torchvision *
  • tqdm *