speclet
A Bayesian hierarchical model to discover tissue-specific cancer driver genes and synthetic lethal interactions from CRISPR/Cas9 LoF screens.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 14.1%)
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
speclet - A Bayesian hierarchical model to discover tissue-specific cancer driver genes and synthetic lethal interactions from CRISPR/Cas9 LoF screens

The speclet model accounts for cell line- and chromosome-specific differences while simultaneously measuring the effect of targeting each gene across multiple molecular covariates, including copy number, mRNA expression, and mutation status. The effects of mutations in key driver and tumor suppressor genes are also included to identify putative synthetic lethal interactions. The results of this project have been published in Chapter 4 of my Ph.D. dissertation: "Studying the tissue-specificity of cancer driver genes through KRAS and genetic dependency screens" (link to come soon).
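As a rough illustration of this style of model (a minimal sketch, not the actual model implemented in this repository), a hierarchical PyMC model with partially pooled gene effects, cell line offsets, and a copy-number covariate might look like the following; all names, priors, and data here are hypothetical.

```python
# Hypothetical, heavily simplified sketch -- NOT the actual speclet model.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_genes, n_lines, n_obs = 50, 10, 500
gene_idx = rng.integers(n_genes, size=n_obs)   # gene targeted by each observation
line_idx = rng.integers(n_lines, size=n_obs)   # cell line of each observation
copy_number = rng.normal(size=n_obs)           # stand-in molecular covariate
lfc = rng.normal(size=n_obs)                   # observed log-fold changes

with pm.Model() as model:
    # Gene effects are partially pooled through shared hyperpriors.
    mu_g = pm.Normal("mu_g", 0.0, 1.0)
    sigma_g = pm.HalfNormal("sigma_g", 1.0)
    gene_effect = pm.Normal("gene_effect", mu_g, sigma_g, shape=n_genes)
    # Cell line offsets absorb line-specific screen differences.
    line_effect = pm.Normal("line_effect", 0.0, 0.5, shape=n_lines)
    # Effect of a molecular covariate such as copy number.
    beta_cn = pm.Normal("beta_cn", 0.0, 1.0)
    mu = gene_effect[gene_idx] + line_effect[line_idx] + beta_cn * copy_number
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu, sigma, observed=lfc)
    idata = pm.sample()  # posterior sampling (optionally on GPU via JAX)
```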
Setup
Many setup and running commands are available as make commands. Run `make help` to see the available options.
Python virtual environments
There are two 'conda' environments for this project: the first, `speclet`, for modeling and analysis, and the second, `speclet_smk`, for the pipelines.
They can be created using the following commands.
Here, we use 'mamba' as a drop-in replacement for 'conda' to speed up the installation process.
```bash
conda install -n base -c conda-forge mamba
mamba env create -f conda.yaml
mamba env create -f conda_smk.yaml
```
Either environment can then be used like a normal 'conda' environment.
For example, below is the command to activate the `speclet` environment.
```bash
conda activate speclet
```
Alternatively, the above commands can be accomplished using the `make pyenvs` command.
```bash
# Same as above.
make pyenvs
```
On O2, because I don't have control over the base conda environment, I follow the incantations below for each environment:
```bash
conda create -n speclet --yes -c conda-forge python=3.9 mamba
conda activate speclet && mamba env update --name speclet --file conda.yaml
```
In addition to that fun, there is a problem with installing Python 3.10 on the installed version of conda, so I instead install Python 3.9 and let the mamba update step upgrade it.
GPU
Some additions to the environment need to be made in order to use a GPU for sampling from posterior distributions with the JAX backend in PyMC.
There are instructions provided in the JAX GitHub repo and the PyMC repo.
First, the cuda and cudnn libraries need to be installed.
Second, a specific distribution of jax should be installed.
At the time of writing, the following commands work, but I would recommend consulting the two links above if doing this again in the future.
```bash
mamba install --yes -c nvidia "cuda>=11.1" "cudnn>=8.2"
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
These commands have been added to the Makefile under the `make gpu` command.
Use the same commands with the speclet_smk environment active to be able to use the GPU in the pipelines.
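As a quick sanity check (a suggestion, not part of the documented setup), you can ask JAX which devices it sees after the install; a CUDA/GPU device should appear in the list.

```python
# Verify that JAX can see the GPU (run inside the activated environment).
import jax

print(jax.devices())          # expect a CUDA/GPU device, not only CPU
print(jax.default_backend())  # "gpu" if the CUDA install worked
```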
R environment
The 'renv' package is used to manage the R packages; R is only used for data processing in this project. The environment can be set up in multiple ways. The first is to enter R and follow the prompts to install the necessary packages. Another option is to install 'renv' and run its restore command in the R console, as shown below.
```r
install.packages("renv")
renv::restore()
```
This can simply be accomplished with the following make command.
```bash
make renv
```
Confirm installation
Installation of the Python virtual environment can be confirmed by running the 'speclet' test suite.
```bash
conda activate speclet
pytest

# Alternatively:
make test  # or `make test_o2` if on O2 HPC
```
Pre-commit
If you plan to work on the code in this project, I recommend installing 'pre-commit' so that all git commits are first checked for various style and code features.
The package is included in the `speclet` virtual environment, so you just need to run the following command once.
```bash
pre-commit install
```
Configuration
Project configuration YAML
There are options for configuration in the "project-config.yaml" file. It contains controls for various constants and parameters for the analyses and pipelines. Most are intuitively named.
Environment variables
There is a required ".env" file that should be configured as follows.
```text
PROJECT_ROOT=${PWD} # location of the root directory
PROJECT_CONFIG=${PROJECT_ROOT}/project-config.yaml # location of project config file
```
An optional environment variable used by 'speclet' is AESARA_GCC_FLAG, which sets any desired Aesara gcc/g++ flags in the pipelines.
I need to have it set so that Aesara uses the correct gcc and BLAS modules when running in pipelines on O2 (see issue #151 for details).
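For illustration, these variables might be consumed from Python roughly as follows; the use of 'python-dotenv' here is an assumption, not necessarily how 'speclet' actually loads them.

```python
# Hypothetical sketch of reading the .env values; python-dotenv is an
# assumption -- the project may load its configuration differently.
import os
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
project_root = Path(os.environ["PROJECT_ROOT"])
config_path = Path(os.environ["PROJECT_CONFIG"])
```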
Project organization
Data preparation
The data is downloaded to the "data/" directory and prepared in the "munge/" directory. The prepared data is available in "modeling_data/". Please see the READMEs in the respective directories for more information.
All of the data can be downloaded and prepared using the following commands.
```bash
make download_data
make munge  # or `make munge_o2` if on O2 HPC
```
Notebooks
Exploration and analyses are conducted in the "notebooks/" directory. Subdirectories divide related notebooks. See the README in that directory for further details.
Python Module
All shared Python code is contained in the "speclet/" directory. The installation of this directory as an editable module should happen automatically when the conda environment is created. If this fails, the module can be installed using the following command.
```bash
# Run only if the module was not automatically installed by conda.
pip install -e .
```
The modules are tested using 'pytest'; see below for how to run the tests. They also conform to the 'black' and 'isort' formatters and make heavy use of Python's type-hinting system, checked by 'mypy'. The functions are documented in the Google documentation style and checked by 'pydocstyle'.
Pipelines
All pipelines and associated files (e.g. configurations and runners) are in the "pipelines/" directory.
Each pipeline contains an associated bash script and make command that can be used to run the pipeline (usually on O2).
See the README in the "pipelines/" directory for more information.
Reports
Standardized reports are available in the "reports/" directory. Each analysis pipeline has a corresponding subdirectory in the reports directory. These notebooks are meant as quick, standardized reports to check the results of a pipeline. More detailed analyses are in the "notebooks/" directory.
Presentations
Presentations that involved this project are stored in the "presentations/" directory. More information is available in the README in that directory.
Testing
Tests in the "tests/" directory have been written against the modules in "speclet/" using 'pytest' and 'hypothesis'. They can be run using the following command.
```bash
# Run the full test suite.
pytest

# Or run the tests in two groups simultaneously.
make test  # or `make test_o2` on O2 HPC
```
The coverage report can be shown by adding the `--cov="speclet"` flag.
Some tests are slow because they involve building models or sampling/fitting them; these can be skipped with the `-m "not slow"` flag.
Some tests require the ability to construct plots (using the 'matplotlib' library), but not all platforms (notably the HMS research computing cluster) provide this ability; these tests can be skipped with the `-m "not plots"` flag.
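For example, the marker flags can be combined with coverage in a single run; this hypothetical invocation uses pytest's Python entry point and is equivalent to passing the same flags on the command line.

```python
# Equivalent to: pytest -m "not slow and not plots" --cov="speclet"
import pytest

exit_code = pytest.main(["-m", "not slow and not plots", "--cov=speclet"])
```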
These tests are run automatically on GitHub Actions on pushes and pull requests to the master branch.
The most recent results can be seen here.
Running analyses
Pipelines
Each individual pipeline can be run through a bash script or a make command.
See the pipelines README for full details.
Notebooks
The notebooks contain the analyses of the models and additional exploration of the data and other model designs. See the "notebooks/" directory for information on running these analyses.
Full project build
The entire project can be installed from scratch and all analysis run with the following make command.
```bash
make build  # or `make build_o2` on the O2 HPC
```
Owner
- Name: Kevin Haigis Lab at Dana-Farber Cancer Institute
- Login: Kevin-Haigis-Lab
- Kind: organization
- Email: kevin_haigis@dfci.harvard.edu
- Location: Boston, MA
- Website: https://www.haigislab.org
- Twitter: KevinHaigisLab
- Repositories: 4
- Profile: https://github.com/Kevin-Haigis-Lab
Our cancer biology work focuses largely on the influence of activating Ras mutations in the pathogenesis of colorectal cancer.
Dependencies
- actions/checkout v3 composite
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite