speclet
A Bayesian hierarchical model to discover tissue-specific cancer driver genes and synthetic lethal interactions from CRISPR/Cas9 LoF screens.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 14.1%)
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
speclet - A Bayesian hierarchical model to discover tissue-specific cancer driver genes and synthetic lethal interactions from CRISPR/Cas9 LoF screens

The speclet model accounts for cell line- and chromosome-specific differences while simultaneously measuring the effect of targeting each gene across multiple molecular covariates, including copy number, mRNA expression, and mutation status. The effects of mutations in key driver and tumor suppressor genes are also included to identify putative synthetic lethal interactions. The results of this project have been published in Chapter 4 of my Ph.D. dissertation: "Studying the tissue-specificity of cancer driver genes through KRAS and genetic dependency screens" (link to come soon).
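As a rough illustration of this style of model (a minimal sketch, not the actual model implemented in this repository), a hierarchical PyMC model with partially pooled gene effects, cell line offsets, and a copy-number covariate might look like the following; all names, priors, and data here are hypothetical.

```python
# Hypothetical, heavily simplified sketch -- NOT the actual speclet model.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_genes, n_lines, n_obs = 50, 10, 500
gene_idx = rng.integers(n_genes, size=n_obs)   # gene targeted by each observation
line_idx = rng.integers(n_lines, size=n_obs)   # cell line of each observation
copy_number = rng.normal(size=n_obs)           # stand-in molecular covariate
lfc = rng.normal(size=n_obs)                   # observed log-fold changes

with pm.Model() as model:
    # Gene effects are partially pooled through shared hyperpriors.
    mu_g = pm.Normal("mu_g", 0.0, 1.0)
    sigma_g = pm.HalfNormal("sigma_g", 1.0)
    gene_effect = pm.Normal("gene_effect", mu_g, sigma_g, shape=n_genes)
    # Cell line offsets absorb line-specific screen differences.
    line_effect = pm.Normal("line_effect", 0.0, 0.5, shape=n_lines)
    # Effect of a molecular covariate such as copy number.
    beta_cn = pm.Normal("beta_cn", 0.0, 1.0)
    mu = gene_effect[gene_idx] + line_effect[line_idx] + beta_cn * copy_number
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu, sigma, observed=lfc)
    idata = pm.sample()  # posterior sampling (optionally on GPU via JAX)
```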
Setup
Many setup and running commands are available as make commands. Run `make help` to see the available options.
Python virtual environments
There are two 'conda' environments for this project: the first, `speclet`, for modeling and analysis, and the second, `speclet_smk`, for the pipelines.
They can be created using the following commands.
Here, we use 'mamba' as a drop-in replacement for 'conda' to speed up the installation process.
```bash
conda install -n base -c conda-forge mamba
mamba env create -f conda.yaml
mamba env create -f conda_smk.yaml
```
Either environment can then be used like a normal 'conda' environment.
For example, below is the command to activate the `speclet` environment.
```bash
conda activate speclet
```
Alternatively, the above commands can be accomplished using the `make pyenvs` command.
```bash
# Same as above.
make pyenvs
```
On O2, because I don't have control over the base conda environment, I follow the incantations below for each environment:
```bash
conda create -n speclet --yes -c conda-forge python=3.9 mamba
conda activate speclet && mamba env update --name speclet --file conda.yaml
```
In addition to that fun, there is a problem with installing Python 3.10 on the installed version of conda, so I instead install Python 3.9 and let the mamba update step upgrade it.
GPU
Some additions to the environment need to be made in order to use a GPU for sampling from posterior distributions with the JAX backend in PyMC.
There are instructions provided in the JAX GitHub repo and the PyMC repo.
First, the cuda and cudnn libraries need to be installed.
Second, a specific distribution of jax should be installed.
At the time of writing, the following commands work, but I would recommend consulting the two links above if doing this again in the future.
```bash
mamba install --yes -c nvidia "cuda>=11.1" "cudnn>=8.2"
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
These commands have been added to the Makefile under the `make gpu` command.
Use the same commands with the speclet_smk environment active to be able to use the GPU in the pipelines.
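As a quick sanity check (a suggestion, not part of the documented setup), you can ask JAX which devices it sees after the install; a CUDA/GPU device should appear in the list.

```python
# Verify that JAX can see the GPU (run inside the activated environment).
import jax

print(jax.devices())          # expect a CUDA/GPU device, not only CPU
print(jax.default_backend())  # "gpu" if the CUDA install worked
```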
R environment
The 'renv' package is used to manage the R packages; R is only used for data processing in this project. The environment can be set up in multiple ways. The first is to enter R and follow the prompts to install the necessary packages. Another option is to install 'renv' and run its restore command in the R console, as shown below.
```r
install.packages("renv")
renv::restore()
```
This can simply be accomplished with the following make command.
```bash
make renv
```
Confirm installation
Installation of the Python virtual environment can be confirmed by running the 'speclet' test suite.
```bash
conda activate speclet
pytest

# Alternatively:
make test  # or `make test_o2` if on O2 HPC
```
Pre-commit
If you plan to work on the code in this project, I recommend installing 'pre-commit' so that all git commits are first checked for various style and code features.
The package is included in the `speclet` virtual environment, so you just need to run the following command once.
```bash
pre-commit install
```
Configuration
Project configuration YAML
There are options for configuration in the "project-config.yaml" file. It contains controls for various constants and parameters for the analyses and pipelines. Most are intuitively named.
Environment variables
There is a required ".env" file that should be configured as follows.
```text
PROJECT_ROOT=${PWD} # location of the root directory
PROJECT_CONFIG=${PROJECT_ROOT}/project-config.yaml # location of project config file
```
An optional environment variable used by 'speclet' is AESARA_GCC_FLAG, which sets any desired Aesara gcc/g++ flags in the pipelines.
I need to have it set so that Aesara uses the correct gcc and BLAS modules when running in pipelines on O2 (see issue #151 for details).
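For illustration, these variables might be consumed from Python roughly as follows; the use of 'python-dotenv' here is an assumption, not necessarily how 'speclet' actually loads them.

```python
# Hypothetical sketch of reading the .env values; python-dotenv is an
# assumption -- the project may load its configuration differently.
import os
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
project_root = Path(os.environ["PROJECT_ROOT"])
config_path = Path(os.environ["PROJECT_CONFIG"])
```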
Project organization
Data preparation
The data is downloaded to the "data/" directory and prepared in the "munge/" directory. The prepared data is available in "modeling_data/". Please see the READMEs in the respective directories for more information.
All of the data can be downloaded and prepared using the following commands.
```bash
make download_data
make munge  # or `make munge_o2` if on O2 HPC
```
Notebooks
Exploration and analyses are conducted in the "notebooks/" directory. Subdirectories divide related notebooks. See the README in that directory for further details.
Python Module
All shared Python code is contained in the "speclet/" directory. The installation of this directory as an editable module should happen automatically when the conda environment is created. If this fails, the module can be installed using the following command.
```bash
# Run only if the module was not automatically installed by conda.
pip install -e .
```
The modules are tested using 'pytest'; see below for how to run the tests. They also conform to the 'black' and 'isort' formatters and make heavy use of Python's type-hinting system, checked by 'mypy'. The functions are documented in the Google documentation style and checked by 'pydocstyle'.
Pipelines
All pipelines and associated files (e.g. configurations and runners) are in the "pipelines/" directory.
Each pipeline contains an associated bash script and make command that can be used to run the pipeline (usually on O2).
See the README in the "pipelines/" directory for more information.
Reports
Standardized reports are available in the "reports/" directory. Each analysis pipeline has a corresponding subdirectory in the reports directory. These notebooks are meant as quick, standardized reports to check the results of a pipeline. More detailed analyses are in the "notebooks/" directory.
Presentations
Presentations that involved this project are stored in the "presentations/" directory. More information is available in the README in that directory.
Testing
Tests in the "tests/" directory have been written against the modules in "speclet/" using 'pytest' and 'hypothesis'. They can be run using the following command.
```bash
# Run the full test suite.
pytest

# Or run the tests in two groups simultaneously.
make test  # or `make test_o2` on O2 HPC
```
The coverage report can be shown by adding the `--cov="speclet"` flag.
Some tests are slow because they involve building models or sampling/fitting them; these can be skipped with the `-m "not slow"` flag.
Some tests require the ability to construct plots (using the 'matplotlib' library), but not all platforms (notably the HMS research computing cluster) provide this ability; these tests can be skipped with the `-m "not plots"` flag.
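For example, the marker flags can be combined with coverage in a single run; this hypothetical invocation uses pytest's Python entry point and is equivalent to passing the same flags on the command line.

```python
# Equivalent to: pytest -m "not slow and not plots" --cov="speclet"
import pytest

exit_code = pytest.main(["-m", "not slow and not plots", "--cov=speclet"])
```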
These tests are run automatically on GitHub Actions on pushes and pull requests to the master branch.
The most recent results can be seen here.
Running analyses
Pipelines
Each individual pipeline can be run through a bash script or a make command.
See the pipelines README for full details.
Notebooks
The notebooks contain the analyses of the models and additional exploration of the data and other model designs. See the "notebooks/" directory for information on running these analyses.
Full project build
The entire project can be installed from scratch and all analysis run with the following make command.
```bash
make build  # or `make build_o2` on the O2 HPC
```
Owner
- Name: Kevin Haigis Lab at Dana-Farber Cancer Institute
- Login: Kevin-Haigis-Lab
- Kind: organization
- Email: kevin_haigis@dfci.harvard.edu
- Location: Boston, MA
- Website: https://www.haigislab.org
- Twitter: KevinHaigisLab
- Repositories: 4
- Profile: https://github.com/Kevin-Haigis-Lab
Our cancer biology work focuses largely on the influence of activating Ras mutations in the pathogenesis of colorectal cancer.
Dependencies
- actions/checkout v3 composite
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite