https://github.com/braingeneers/sims

SIMS: Scalable, Interpretable Models for Cell Annotation of large scale single-cell RNA-seq data

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: braingeneers
License: mit
Language: Python
Default Branch: main
Size: 33.9 MB

Statistics

Stars: 9
Watchers: 6
Forks: 9
Open Issues: 1
Releases: 0

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

SIMS: Scalable, Interpretable Modeling for Single-Cell RNA-Seq Data Classification

SIMS is a pipeline for building interpretable and accurate classifiers for identifying any target on single-cell RNA-seq data. The SIMS model is based on a sequential transformer, a transformer model specifically built for large-scale tabular datasets.

SIMS takes in a list of arbitrarily many expression matrices along with their corresponding target variables. We assume the matrix form cell x gene, and NOT gene x cell, since our training samples are the transcriptomes of individual cells.
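Because the orientation matters, a quick sanity check before training can help. The sketch below is a dependency-free illustration using nested lists; `to_cell_by_gene` and its `n_genes` argument are hypothetical names, not part of the SIMS API. For an AnnData object, the equivalent transpose is `adata.T`.

```python
def to_cell_by_gene(matrix, n_genes):
    """Return `matrix` as cell x gene, transposing if it looks gene x cell.

    Hypothetical helper (not part of SIMS). `n_genes` is the number of genes
    you expect. A square matrix is ambiguous and is returned unchanged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if n_cols == n_genes:
        return [list(row) for row in matrix]  # already cell x gene
    if n_rows == n_genes:
        return [list(col) for col in zip(*matrix)]  # gene x cell: transpose
    raise ValueError(f"neither axis has length {n_genes}: got {n_rows} x {n_cols}")


# 3 genes measured in 2 cells, stored gene x cell
gene_by_cell = [
    [5, 0],
    [1, 2],
    [0, 7],
]
cell_by_gene = to_cell_by_gene(gene_by_cell, n_genes=3)
print(cell_by_gene)  # [[5, 1, 0], [0, 2, 7]]
```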

The code runs on Python. We recommend installing the package in a virtual environment, for example one managed by Miniconda, so that you can install packages without affecting your system Python.

Installation

If using conda:

1. Create a new virtual environment with `conda create --name=<NAME> python=3.9`
2. Enter your virtual environment with `conda activate <NAME>`

Otherwise, enter your virtual environment of choice and:

1. Install the SIMS package with `pip install --use-pep517 git+https://github.com/braingeneers/SIMS.git`
2. Set up the model training code in a `MYFILE.py` file, and run it with `python MYFILE.py`. A tutorial on how to set up training code is shown below.

Training and inference

The sims library uses a cell-by-gene matrix. This means our input data to the model should be an (M, N) matrix of M cells with expression levels across N different genes. The data should be log1p normalized before model training and model inference.
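To make the normalization step concrete, here is a minimal sketch of what "normalize counts per cell, then log1p" does to an (M, N) matrix. It uses plain Python lists so it runs without dependencies. The function name `normalize_and_log1p` and the `target_sum` default of 10,000 are assumptions for illustration, not values SIMS prescribes. In practice the scanpy calls `sc.pp.normalize_total` and `sc.pp.log1p` fill this role.

```python
import math


def normalize_and_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Illustrative helper only; `target_sum` is an assumed parameter.
    """
    result = []
    for cell in counts:
        total = sum(cell)
        # All-zero cells stay all-zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(value * scale) for value in cell])
    return result


raw_counts = [
    [1, 3, 0],    # cell 1: 4 counts in total
    [10, 0, 10],  # cell 2: 20 counts in total
]
normalized = normalize_and_log1p(raw_counts, target_sum=4.0)
for row in normalized:
    print([round(value, 3) for value in row])
```

After scaling, both cells carry the same total count, so differences in sequencing depth no longer dominate the expression values the model sees.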

To train a model, we can set up a SIMS class in the following way:

```python
import anndata as an
import scanpy as sc
from pytorch_lightning.loggers import WandbLogger
from scsims import SIMS

logger = WandbLogger(offline=True)

adata = an.read_h5ad('mydata.h5ad')

# Perform some light filtering
sc.pp.filter_cells(adata, min_genes=100)
sc.pp.filter_genes(adata, min_cells=3)

# Transform the data for model ingestion
sc.pp.normalize_total(adata)  # Normalize counts per cell
sc.pp.log1p(adata)  # Log-transform the data
sc.pp.scale(adata)  # Scale to zero mean and unit variance

sims = SIMS(data=adata, class_label='class_label')
sims.setup_trainer(accelerator="gpu", devices=1, logger=logger)
sims.train()
```

This will set up the underlying dataloaders, model, model checkpointing, and everything else we need. Model checkpoints will be saved every training epoch.

To infer cell types on an unlabeled dataset, we load the model checkpoint, point to the label file that we originally trained on, and run the `predict` method on the new data.

```python
import anndata as an
import scanpy as sc
from scsims import SIMS

sims = SIMS(weights_path='myawesomemodel.ckpt')

# If the model was trained on a GPU, move the weights to the CPU when loading.
# This is the case for our pretrained models (requires `import torch`):
# sims = SIMS(weights_path=checkpoint_path, map_location=torch.device('cpu'))

unlabeled_data = an.read_h5ad('my/new/unlabeled.h5ad')

# Process the data the same way you processed the training data.
# For all our pretrained models we followed these steps.
sc.pp.filter_cells(unlabeled_data, min_genes=100)
sc.pp.filter_genes(unlabeled_data, min_cells=3)

# Transform the data for model ingestion
sc.pp.normalize_total(unlabeled_data)  # Normalize counts per cell
sc.pp.log1p(unlabeled_data)  # Log-transform the data
sc.pp.scale(unlabeled_data)  # Scale to zero mean and unit variance

# Perform the predictions
cell_predictions = sims.predict(unlabeled_data)
```

Finally, to look at the explainability of the model, we similarly run

```python
explainability_matrix = sims.explain('my/new/unlabeled.h5ad')  # this can also be labeled data, of course
```
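One common way to summarize such a matrix is to rank genes by their average importance across cells. The sketch below assumes the matrix is laid out as cells x genes with one importance value per entry. That layout is an assumption, since the return type of `sims.explain` is not described here, and `top_genes` is a hypothetical helper rather than part of the SIMS API.

```python
def top_genes(importance_rows, gene_names, k=3):
    """Rank genes by mean absolute importance across cells.

    Assumes `importance_rows` is cells x genes (an assumption about the
    layout, not a documented SIMS guarantee).
    """
    n_cells = len(importance_rows)
    mean_importance = [
        sum(abs(row[j]) for row in importance_rows) / n_cells
        for j in range(len(gene_names))
    ]
    ranked = sorted(
        zip(gene_names, mean_importance),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Toy values: 2 cells x 3 genes (gene names are placeholders)
importances = [
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
]
for gene, score in top_genes(importances, ["GENE_A", "GENE_B", "GENE_C"], k=2):
    print(gene, round(score, 3))
```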

Custom training jobs / logging

To customize the underlying `pl.Trainer` and SIMS model params, we can initialize the SIMS model like

```python
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from scsims import SIMS
import anndata as an

adata = an.read_h5ad("mylabeleddata.h5ad")  # can read h5 using anndata as well
# Set up the logger to log data to Weights and Biases
wandb_logger = WandbLogger(project="My Project", name="SIMS Model Training")

sims = SIMS(data=adata, class_label='class_label')
# Weighting the loss inversely proportional to label frequency helps learn
# rare cell types (recommended)
sims.setup_model(n_a=64, n_d=64, weights=sims.weights)
sims.setup_trainer(
    logger=wandb_logger,
    callbacks=[
        EarlyStopping(
            monitor="val_loss",
            patience=50,
        ),
        LearningRateMonitor(logging_interval="epoch"),
    ],
    num_epochs=100,
)
sims.train()
```

This will train the SIMS model on the given expression matrices with the target variable given by the `class_label` column in each label file.
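To illustrate the idea of weighting the loss inversely to label frequency, here is a minimal sketch. The exact formula behind `sims.weights` is not shown in this README, so the code uses the widely used "balanced" heuristic `n_samples / (n_classes * class_count)` as an assumption. `inverse_frequency_weights` is a hypothetical helper, not a SIMS function.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; SIMS may compute its weights differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels: one common and one rare cell type
labels = ["neuron"] * 8 + ["astrocyte"] * 2
weights = inverse_frequency_weights(labels)
print(weights)  # {'neuron': 0.625, 'astrocyte': 2.5}
```

The rare class receives the larger weight, so misclassifying one of its cells costs more during training than misclassifying a cell from the common class.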

Using SIMS inside GitHub Codespaces

If you only use SIMS for predictions with an already trained model, GitHub Codespaces is the recommended way to use this tool. You can also use this pipeline to train on smaller datasets, as the computing resources offered in Codespaces are modest. To use this tool in GitHub Codespaces, start by forking the repo into your GitHub account. Then create a new codespace with the SIMS repo as the repository of choice. Once inside the newly created environment, pull the latest SIMS image:

```docker
docker pull jmlehrer/sims:latest
```

Run the Docker container, mounting the folder containing datasets and model checkpoints into the container filesystem:

```docker
docker run -it -v /path/to/local/folder:/path/in/container [image_name] /bin/bash
```

Run `main.py` to check that the installation has completed. You can alter this file as shown above to perform the different tasks.

```bash
python main.py
```

Owner

Name: braingeneers
Login: braingeneers
Kind: organization

Repositories: 15
Profile: https://github.com/braingeneers

GitHub Events

Total

Issues event: 4
Watch event: 5
Issue comment event: 5
Push event: 6
Pull request event: 2
Fork event: 2

Last Year

Issues event: 4
Watch event: 5
Issue comment event: 5
Push event: 6
Pull request event: 2
Fork event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: 29 days
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: 29 days
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

nick-youngblut (1)
jennbparker (1)

Pull Request Authors

JesusGF1 (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 61 last-month
Total docker downloads: 1,623

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 20
Total maintainers: 1

pypi.org: scsims

Scalable, Interpretable Deep Learning for Single-Cell RNA-seq Classification

Homepage: https://github.com/braingeneers/sims
Documentation: https://scsims.readthedocs.io/
License: MIT license
Latest release: 3.0.6
published over 2 years ago

Versions: 20
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 61 Last month
Docker Downloads: 1,623

Rankings

Docker downloads count: 3.8%

Dependent packages count: 10.0%

Downloads: 15.2%

Average: 16.6%

Forks count: 16.8%

Dependent repos count: 21.7%

Stargazers count: 31.9%

Maintainers (1)

jlehrer1

Last synced: 10 months ago

Dependencies

Dockerfile docker

anibali/pytorch 1.10.2-cuda11.3 build

requirements.txt pypi

GitPython ==3.1.31
Jinja2 ==3.1.2
MarkupSafe ==2.1.2
Pillow ==9.5.0
PyJWT ==2.6.0
PyYAML ==6.0
Pygments ==2.15.1
aiohttp ==3.8.4
aiosignal ==1.3.1
anndata ==0.9.1
anyio ==3.6.2
appdirs ==1.4.4
arrow ==1.2.3
async-timeout ==4.0.2
attrs ==23.1.0
beautifulsoup4 ==4.12.2
blessed ==1.20.0
boto3 ==1.26.130
botocore ==1.29.130
certifi ==2023.5.7
charset-normalizer ==3.1.0
click ==8.1.3
contourpy ==1.0.7
croniter ==1.3.14
cycler ==0.11.0
dateutils ==0.6.12
deepdiff ==6.3.0
docker-pycreds ==0.4.0
fastapi ==0.88.0
fonttools ==4.39.3
fortran-language-server ==1.12.0
frozenlist ==1.3.3
fsspec ==2023.5.0
gitdb ==4.0.10
h11 ==0.14.0
h5py ==3.8.0
idna ==3.4
importlib-resources ==5.12.0
inquirer ==3.1.3
itsdangerous ==2.1.2
jmespath ==1.0.1
joblib ==1.2.0
kiwisolver ==1.4.4
lightning ==2.0.2
lightning-cloud ==0.5.34
lightning-utilities ==0.8.0
llvmlite ==0.40.0
markdown-it-py ==2.2.0
matplotlib ==3.7.1
mdurl ==0.1.2
multidict ==6.0.4
natsort ==8.3.1
networkx ==3.1
numba ==0.57.0
numpy ==1.24.3
ordered-set ==4.1.0
packaging ==23.1
pandas ==2.0.1
pathtools ==0.1.2
patsy ==0.5.3
protobuf ==4.23.0
psutil ==5.9.5
pydantic ==1.10.7
pynndescent ==0.5.10
pyparsing ==3.0.9
python-dateutil ==2.8.2
python-editor ==1.0.4
python-multipart ==0.0.6
pytorch-lightning ==2.0.2
pytorch-tabnet ==4.0
pytz ==2023.3
readchar ==4.0.5
requests ==2.30.0
rich ==13.3.5
s3transfer ==0.6.1
scanpy ==1.9.3
scikit-learn ==1.2.2
scipy ==1.10.1
seaborn ==0.12.2
sentry-sdk ==1.22.2
session-info ==1.0.0
setproctitle ==1.3.2
six ==1.16.0
smmap ==5.0.0
sniffio ==1.3.0
soupsieve ==2.4.1
starlette ==0.22.0
starsessions ==1.3.0
statsmodels ==0.14.0
stdlib-list ==0.8.0
threadpoolctl ==3.1.0
torch ==1.13.1
torchmetrics ==0.11.4
tqdm ==4.65.0
traitlets ==5.9.0
typing_extensions ==4.5.0
tzdata ==2023.3
umap-learn ==0.5.3
urllib3 ==1.26.15
uvicorn ==0.22.0
wandb ==0.15.2
wcwidth ==0.2.6
websocket-client ==1.5.1
websockets ==11.0.3
yarl ==1.9.2
zipp ==3.15.0

setup.py pypi
