chemcpa

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.

https://github.com/theislab/chemcpa

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
✓ Institutional organization owner: organization theislab has institutional domain (www.helmholtz-muenchen.de)
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.7%) to scientific vocabulary

Keywords

disentanglement drug-discovery genomics perturbation single-cell transfer-learning

Last synced: 4 months ago

Repository

Basic Info

Host: GitHub
Owner: theislab
License: MIT
Language: Jupyter Notebook
Default Branch: main
Homepage: https://arxiv.org/abs/2204.13545
Size: 234 MB

Statistics

Stars: 123
Watchers: 3
Forks: 30
Open Issues: 7
Releases: 0

Topics

disentanglement drug-discovery genomics perturbation single-cell transfer-learning

Created over 4 years ago · Last pushed 11 months ago

Metadata Files

Readme, License, Citation

Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution

Code accompanying the NeurIPS 2022 paper (PDF).

[Figure: architecture of CCPA]
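
As a rough illustration of the encoder-decoder idea (not the authors' implementation), the NumPy sketch below mimics the additive latent composition used by CPA-style models: an encoder maps a cell's expression profile to a basal latent state, a drug latent computed from a pretrained molecular embedding is scaled by dose and added, a cell-line embedding is added, and a decoder maps the sum back to gene space. Everything here is an assumption for illustration: random linear maps stand in for trained networks, the saturating dose curve dose / (dose + 1), the function names and all sizes are hypothetical, and the adversarial training that keeps drug information out of the basal state is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_GENES, LATENT, MOL_DIM, N_CELL_LINES = 50, 8, 16, 3

# Random linear maps stand in for the trained networks (assumption).
W_enc = rng.normal(size=(N_GENES, LATENT)) / np.sqrt(N_GENES)   # expression -> basal latent
W_drug = rng.normal(size=(MOL_DIM, LATENT)) / np.sqrt(MOL_DIM)  # molecule embedding -> drug latent
W_dec = rng.normal(size=(LATENT, N_GENES)) / np.sqrt(LATENT)    # latent -> expression
cell_line_emb = rng.normal(size=(N_CELL_LINES, LATENT))         # one vector per cell line


def dose_scaler(dose):
    """Saturating dose response that is exactly zero at dose 0 (hypothetical choice)."""
    dose = np.asarray(dose, dtype=float)
    return dose / (dose + 1.0)


def predict(x, mol_emb, dose, cell_line):
    """Map control expression x to a predicted perturbed expression profile."""
    z_basal = x @ W_enc                                       # (n, LATENT)
    z_drug = dose_scaler(dose)[:, None] * (mol_emb @ W_drug)  # (n, LATENT), scaled by dose
    z = z_basal + z_drug + cell_line_emb[cell_line]           # additive composition
    return z @ W_dec                                          # (n, N_GENES)


x = rng.normal(size=(4, N_GENES))
mol = rng.normal(size=(4, MOL_DIM))  # embeddings of four (possibly unseen) molecules
out = predict(x, mol, dose=[0.0, 0.1, 1.0, 10.0], cell_line=np.array([0, 1, 2, 0]))
print(out.shape)  # (4, 50)
```

Because the drug latent is computed from a molecular embedding rather than looked up in a per-drug table, any molecule with an embedding can be mapped to a latent effect, which is what makes predictions for unseen drugs possible in this kind of design.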

Our talk on chemCPA at the M2D2 reading club is available here. A previous version of this work was a spotlight paper at ICLR MLDD 2022. Code for this previous version can be found under the v1.0 git tag.

Codebase overview

chemCPA/: contains the code for the model, the data, and the training loop.
embeddings/: one folder per molecular embedding model we benchmarked. Each contains an environment.yml with its dependencies. We generated the embeddings with the provided notebooks and saved them to disk so that they can be loaded during the main training loop.
experiments/: each folder contains a README.md with the experiment description, a .yaml file with the seml configuration, and a notebook to analyze the results.
notebooks/: example analysis notebooks.
preprocessing/: notebooks for processing the data. For each dataset, there is one notebook that loads the raw data.
tests/: a few very basic tests.
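
The embeddings are precomputed once and then looked up during training. The repository's own on-disk format is not described here, so the sketch below only assumes a simple lookup table: one vector per SMILES string, saved in NumPy's .npz format and loaded back into a dictionary. The function names save_embeddings and load_embeddings and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_embeddings(path, smiles, vectors):
    """Store one embedding row per SMILES string in a compressed .npz archive."""
    # Assumed storage format: .npz (not necessarily what the repository uses).
    vectors = np.asarray(vectors, dtype=np.float32)
    if vectors.ndim != 2 or vectors.shape[0] != len(smiles):
        raise ValueError("expected a 2-D array with one row per SMILES string")
    np.savez_compressed(path, smiles=np.array(smiles), vectors=vectors)


def load_embeddings(path):
    """Return a dict that maps each SMILES string to its embedding vector."""
    with np.load(path) as data:
        return dict(zip(data["smiles"].tolist(), data["vectors"]))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_embeddings.npz"  # hypothetical file name
    save_embeddings(path, ["CCO", "c1ccccc1"], [[0.1, 0.2], [0.3, 0.4]])
    table = load_embeddings(path)

print(sorted(table))  # ['CCO', 'c1ccccc1']
```

Keying by SMILES keeps the lookup independent of row order, so the training code can fetch the vector for whichever drug a cell was treated with.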

All experiments were run through seml. The entry function is ExperimentWrapper.__init__ in chemCPA/seml_sweep_icb.py. For convenience, we provide a script, chemCPA/manual_seml_sweep.py, for running experiments manually when debugging. The script expects a manual_run.yaml file containing the experiment configuration.

All notebooks also exist as Python scripts (converted through jupytext) to make them easier to review.

Getting started

Environment

The easiest way to get started is to use the Docker image we provide:

    docker run -it -p 8888:8888 --platform=linux/amd64 registry.hf.space/b1ro-chemcpa:latest

This image contains the source code and all dependencies needed to run the experiments. By default, it runs a Jupyter server on port 8888.

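Alternatively, you may clone this repository and set up your own environment by running:

    conda env create -f environment.yml
    pip install -e .

Run the second command inside the newly created conda environment (after activating it) so that the package is installed there in editable mode.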

Datasets

The datasets are not included in the Docker image, but they are downloaded automatically when you run the notebooks that require them. Alternatively, the datasets may be downloaded manually with the Python script raw_data/dataset.py. Usage:

    python raw_data/dataset.py --list
    python raw_data/dataset.py --dataset <dataset_name>

Or you may use the following links:
- weight checkpoints
- hyperparameter configuration
- raw datasets
- processed datasets
- embeddings

Some of the notebooks use a drugbank_all.csv file, which can be downloaded from here (registration needed).

Data preparation

To train the models, the raw data first needs to be processed. This can be done by running the notebooks inside the preprocessing/ folder in sequential order. Alternatively, you may run:

    python preprocessing/run_notebooks.py

A description of the preprocessing steps is given in the preprocessing/README.md file and in the headers of the individual notebooks. Section 4 of the paper is also highly relevant.
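
Running the preprocessing notebooks in sequential order can be sketched as follows. This is not the repository's run_notebooks.py: it assumes that each notebook's filename begins with a number giving its position in the sequence, and it executes notebooks with the standard jupyter nbconvert command-line tool. The helper names ordered_notebooks and run_notebooks are hypothetical.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def ordered_notebooks(folder):
    """List *.ipynb files sorted by leading integer (1_, 2_, ..., 10_), not alphabetically."""
    # Assumes filenames start with a number giving the run order.
    def sort_key(path):
        match = re.match(r"\d+", path.name)
        # Notebooks without a numeric prefix run last, in alphabetical order.
        return (int(match.group()) if match else float("inf"), path.name)

    return sorted(Path(folder).glob("*.ipynb"), key=sort_key)


def run_notebooks(folder, dry_run=False):
    """Execute each notebook in place, stopping at the first one that fails."""
    notebooks = ordered_notebooks(folder)
    for nb in notebooks:
        cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return notebooks


with tempfile.TemporaryDirectory() as tmp:
    for name in ["10_c.ipynb", "2_b.ipynb", "1_a.ipynb"]:
        (Path(tmp) / name).touch()
    order = [p.name for p in run_notebooks(tmp, dry_run=True)]

print(order)  # ['1_a.ipynb', '2_b.ipynb', '10_c.ipynb']
```

Sorting by the integer prefix rather than by plain string order avoids running a notebook named 10_... before 2_..., and check=True stops the chain as soon as one step fails, so later notebooks never run on incomplete intermediate data.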

Training the models

Run:

    python chemCPA/train_hydra.py

Citation

You can cite our work as:

@inproceedings{hetzel2022predicting,
  title={Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution},
  author={Hetzel, Leon and Böhm, Simon and Kilbertus, Niki and Günnemann, Stephan and Lotfollahi, Mohammad and Theis, Fabian J},
  booktitle={NeurIPS 2022},
  year={2022}
}

Owner

Name: Theis Lab
Login: theislab
Kind: organization
Email: icb.office@helmholtz-muenchen.de
Location: Munich

Website: https://www.helmholtz-muenchen.de/icb/
Repositories: 213
Profile: https://github.com/theislab

Institute of Computational Biology

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Predicting Cellular Responses to Novel Drug
  Perturbations at a Single-Cell Resolution
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Leon
    family-names: Hetzel
  - given-names: Simon
    family-names: Boehm
  - given-names: Niki
    family-names: Kilbertus
  - given-names: Stephan
    family-names: Günnemann
  - given-names: Mohammad
    family-names: Lotfollahi
  - given-names: Fabian J.
    family-names: Theis
identifiers:
  - type: url
    value: 'https://neurips.cc/virtual/2022/poster/53227'
repository-code: 'https://github.com/theislab/chemCPA'
abstract: >+
  Single-cell transcriptomics enabled the study of
  cellular heterogeneity in response to perturbations
  at the resolution of individual cells. However,
  scaling high-throughput screens (HTSs) to measure
  cellular responses for many drugs remains a
  challenge due to technical limitations and, more
  importantly, the cost of such multiplexed
  experiments. Thus, transferring information from
  routinely performed bulk RNA HTS is required to
  enrich single-cell data meaningfully. We introduce
  chemCPA, a new encoder-decoder architecture to
  study the perturbational effects of unseen drugs.
  We combine the model with an architecture surgery
  for transfer learning and demonstrate how training
  on existing bulk RNA HTS datasets can improve
  generalisation performance. Better generalisation
  reduces the need for extensive and costly screens
  at single-cell resolution. We envision that our
  proposed method will facilitate more efficient
  experiment designs through its ability to generate
  in-silico hypotheses, ultimately accelerating drug
  discovery.

keywords:
  - transfer learning
  - disentanglement
  - perturbation
  - single cell
  - genomics
  - Drug Discovery
  - unsupervised

GitHub Events

Total

Issues event: 18
Watch event: 19
Member event: 1
Issue comment event: 9
Push event: 8
Pull request event: 2
Fork event: 6
Create event: 2

Last Year

Issues event: 18
Watch event: 19
Member event: 1
Issue comment event: 9
Push event: 8
Pull request event: 2
Fork event: 6
Create event: 2

Issues and Pull Requests

Last synced: 4 months ago

All Time

Total issues: 14
Total pull requests: 1
Average time to close issues: 12 months
Average time to close pull requests: 1 minute
Total issue authors: 9
Total pull request authors: 1
Average comments per issue: 1.79
Average comments per pull request: 1.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 5
Pull requests: 1
Average time to close issues: about 24 hours
Average time to close pull requests: 1 minute
Issue authors: 2
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 1.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

bhomass (9)
hraeder41 (3)
sepidism (2)
ChangxiChi (2)
xianglin226 (1)
xDogBaby (1)
ArturDev42 (1)
wangyucheng1234 (1)
Tigerrr07 (1)
ceesu (1)
rvinas (1)

Pull Request Authors

MxMstrmn (1)

Dependencies

embeddings/grover/environment.yml conda

boost 1.68.0
boost-cpp 1.68.0
descriptastorus 2.2.0
jupyter
numpy 1.16.4
numpy-base 1.16.4
pandas 0.25.0
pyarrow
python 3.6.8
pytorch 1.1.0
rdkit 2019.03.4.0
readline 7.0
scanpy
scikit-learn 0.21.2
scipy 1.3.0
tensorboard 1.13.1
torchvision 0.3.0
tqdm 4.32.1
typing 3.6.4

embeddings/jtvae/environment.yml conda

cudatoolkit 10.2.*
dgl-cuda10.2
jupyter
pip
pyarrow
python
pytorch
rdkit 2018.09.3.*
seml
tqdm

embeddings/grover/requirements.txt pypi

boost =1.68.0=py36h8619c78_1001
boost-cpp =1.68.0=h11c811c_1000
descriptastorus =2.2.0=py_0
numpy =1.16.4=py36h7e9f1db_0
numpy-base =1.16.4=py36hde5b4d6_0
pandas =0.25.0=py36hb3f55d8_0
python =3.6.8=h0371630_0
pytorch =1.1.0=py3.6_cuda9.0.176_cudnn7.5.1_0
rdkit =2019.03.4.0=py36hc20afe1_1
readline =7.0=h7b6447c_5
scikit-learn =0.21.2=py36hcdab131_1
scipy =1.3.0=py36h921218d_1
tensorboard =1.13.1=py36_0
torchvision =0.3.0=py36_cu9.0.176_1
tqdm =4.32.1=py_0
typing =3.6.4=py36_0

embeddings/seq2seq/environment.yml conda

deepchem 2.5.0
jupyter
pandas
pip
pyarrow
rdkit

environment.yml conda

adjusttext
bokeh
colorcet
cudatoolkit 11.3.*
datashader
deepchem
dgl-cuda11.3
h5py <3.2
holoviews
jupyter
jupytext
matplotlib
numpy
pandas
pip
pre-commit
py-spy
pyarrow
pytest
python 3.7.*
pytorch
rdkit 2021.09.2.*
scanpy
scikit-image
scikit-learn
scipy
seaborn
seml
submitit
torchmetrics
umap-learn
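
The pinned entries in the GROVER requirements listing follow conda's name=version=build form, where the build string records the exact binary variant. The small parser below makes that structure explicit. It is a hypothetical helper for reading this listing only: it handles '='-separated pins (not range specifiers such as h5py <3.2) and tolerates the space that appears before the first '=' here.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(line: str) -> CondaSpec:
    """Hypothetical helper: split 'scipy =1.3.0=py36h921218d_1' into name, version, build."""
    parts = [part.strip() for part in line.strip().split("=")]
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"not a name=version=build pin: {line!r}")
    # Missing fields (e.g. an unpinned 'jupyter') become None.
    name, version, build = parts + [""] * (3 - len(parts))
    return CondaSpec(name, version or None, build or None)


spec = parse_spec("pytorch =1.1.0=py3.6_cuda9.0.176_cudnn7.5.1_0")
print(spec.version, spec.build)  # 1.1.0 py3.6_cuda9.0.176_cudnn7.5.1_0
```

Read this way, the pytorch build string shows that the GROVER environment pins a binary built for Python 3.6 with CUDA 9.0.176 and cuDNN 7.5.1, which is why that environment is kept separate from the main environment.yml (CUDA 11.3, Python 3.7).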
