https://github.com/broadinstitute/celligner2

A new version of the celligner package using VAEs. Inspired by the Theis lab's scArches.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to biorxiv.org, nature.com
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.3%) to scientific vocabulary

Keywords

alignment cancer-genomics machine-learning omics rna-seq vae-pytorch

Last synced: 5 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: broadinstitute
License: unlicense
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 251 MB

Statistics

Stars: 5
Watchers: 11
Forks: 2
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed over 3 years ago

Metadata Files

Readme Changelog Contributing Funding License

celligner2

Created by Jérémie Kalfon @jkobject (Broad Institute). Celligner2 is a new version of the Celligner tool for aligning cancer transcriptomics data across tumors and models. Find out more about Celligner 1 here: Global computational alignment of tumor and cell line transcriptional profiles

This method is based on the trVAE/scArches method from the Theis Lab and adds several features to improve its performance for our needs, among them:

- Semi-supervision to classify cell type and any other feature provided. This improves the latent space and makes the model focus on what the researcher is interested in.
- Improved surgery: the model size can be increased while the already-trained weights are frozen.
- Multi-dataset MMD on the latent space, together with better batch mixing. These improve on methods already present and allow the user to:
  - use multiple datasets at once.
  - perform better correction when large batch effects exist (e.g. between cancer cell lines and frozen tumor tissues).
- Explainable AI tools such as LRP with GSEA, which look at pathway enrichment to understand the features the model relies on to make a prediction.
- QC methods: assessing quality (using scIB), making interactive UMAP plots, looking at reconstruction, classifications and more.
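To make the multi-dataset MMD idea listed above more concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the function names, the single-bandwidth RBF kernel, the default bandwidth of 1 / latent dimension and the choice to sum over all dataset pairs are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Squared Euclidean distance between every pair of rows.
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=None):
    """Biased estimate of the squared MMD between two sets of latent vectors."""
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # assumed default bandwidth, not from the source
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def multi_dataset_mmd(latent, dataset_labels, gamma=None):
    """Sum the squared MMD over every pair of datasets (hypothetical helper)."""
    groups = [latent[dataset_labels == d] for d in np.unique(dataset_labels)]
    return sum(mmd2(a, b, gamma) for a, b in combinations(groups, 2))


# Toy latent space: three datasets of 30 samples each, 8 latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(90, 8))
labels = np.repeat(["cell_line", "tumor", "other"], 30)

# Simulate a batch effect by shifting one dataset in latent space.
shifted = latent.copy()
shifted[labels == "tumor"] += 3.0

print("well mixed:", multi_dataset_mmd(latent, labels))
print("with batch effect:", multi_dataset_mmd(shifted, labels))
```

Used as a penalty during training, a term of this kind pushes the latent distributions of all datasets toward each other, which is how it can help with strong batch effects such as those between cell lines and tumor tissue.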

The next phase of development concerns the addition of expimap_mode. In this mode, code copied from expimap lets the model use a different latent space, based on gene sets, with the decoder replaced by a linear model masked by the genes in each gene set. References to the expimap mode are marked in the code with the #expimap comment. Only a partial implementation exists: some arguments and functions have been copied from the expimap code and are beginning to be used and adapted to the Celligner2 codebase. Running this mode currently produces bugs because it is not finished. The code also contains references to the graph NN model and to improvements to the architectural surgery; these have no functional implications yet.
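The idea of a linear decoder masked by gene-set membership can be illustrated with a short NumPy sketch. This is not the expimap or Celligner2 code: the helper names, the example gene sets and the random weights are hypothetical and serve only to show how the mask restricts each latent dimension to the genes of its gene set.

```python
import numpy as np


def build_gene_set_mask(genes, gene_sets):
    """Binary matrix (n_gene_sets, n_genes): 1 where a gene belongs to a set."""
    gene_index = {gene: i for i, gene in enumerate(genes)}
    mask = np.zeros((len(gene_sets), len(genes)))
    for row, members in enumerate(gene_sets.values()):
        for gene in members:
            if gene in gene_index:  # skip genes absent from the expression matrix
                mask[row, gene_index[gene]] = 1.0
    return mask


def masked_linear_decode(latent, weights, mask):
    """Reconstruct expression with a linear map whose weights are masked."""
    # Entries outside a gene set are zeroed, so each latent dimension
    # can only influence the genes of its own gene set.
    return latent @ (weights * mask)


# Hypothetical genes and gene sets, for illustration only.
genes = ["TP53", "MYC", "EGFR", "KRAS"]
gene_sets = {"set_a": ["TP53", "MYC"], "set_b": ["EGFR", "KRAS", "BRAF"]}

mask = build_gene_set_mask(genes, gene_sets)
rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)

# A sample in which only the first gene set is active.
latent = np.array([[1.0, 0.0]])
reconstruction = masked_linear_decode(latent, weights, mask)
print(dict(zip(genes, reconstruction[0])))
```

Because one latent dimension corresponds to one gene set, the latent values can be read directly as gene-set activities, which is the interpretability benefit this kind of decoder aims for.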

More about the model in this presentation: Celligner2.0 Update

Install it

```bash
git clone https://github.com/broadinstitute/celligner2.git
cd celligner2
pip install -e .
```

pypi

/!\ not functional yet

```bash
pip install celligner2
```

Usage

For information on usage, please see the notebooks in runs/. Unfortunately, a general demo notebook is not yet available. The latest version of the run is in -v4.ipynb.

For information about data generation please see the data/ folder.

```py
from celligner2 import BaseClass
from celligner2 import base_function

BaseClass().base_method()
base_function()
```

/!\ not functional yet

```bash
$ python -m celligner2
# or
$ celligner2
```

About the Code

The code follows the structure used by PyTorch and the Theis lab. More can be understood by looking at the code and its usage in the notebooks. Some base model functions are implemented as separate classes (othermodels/base/_base.py) that are extended by model/celligner2model.py. This file contains the full definition of the model (with training, data management and some usage). The model architecture itself is defined in model/celligner2.py. Additional key model functions are in model/modules and model/losses. The training definition is in trainers/celligner2/trainer.py, which is extended by trainers/celligner2/semisupervised.py. Dataset management (encoding, preprocessing, etc.) is defined in dataset/ and dataset/celligner2/_ .

Finally, plotting/ contains plotting/celligner2eval.py, the evaluator of the model. It expects a trained Celligner model and can produce many plots and evaluations of the model, including some related to post-training use that would be better placed in model/celligner2model.py.

The split into /base and /celligner2 exists because scArches was originally a reimplementation of many models, each reusing and reimplementing base modules and tools. We decided to keep it this way for ease of use and collaboration with the Theis lab.

Development

Read the CONTRIBUTING.md file.

Current ongoing tasks are in the Asana project: Celligner in the Celligner2 section.

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Watch event: 1
Fork event: 1

Last Year

Watch event: 1
Fork event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
