https://github.com/kundajelab/bpnet-manuscript
BPNet manuscript code.
Science Score: 10.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 14.1%, to scientific vocabulary)
Repository
BPNet manuscript code.
Basic Info
- Host: GitHub
- Owner: kundajelab
- Language: Jupyter Notebook
- Default Branch: master
- Size: 171 MB
Statistics
- Stars: 11
- Watchers: 3
- Forks: 3
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
BPNet manuscript
Code accompanying the BPNet manuscript.
If you want to use BPNet on your own data, please use the BPNet python package: https://github.com/kundajelab/bpnet.
Folder organization
- basepair - python package (contains python functions/classes common across multiple notebooks)
- src - scripts for running the experiments and producing the figures
  - bpnet-pipeline - train BPNet models, generate importance scores, run TF-MoDISco, get motif instances with CWM scanning
  - motif-interactions - generate the in silico motif interactions
  - comparison - run ChExMix
  - figures - generate all the paper figures
- data - data files
- tests - unit tests
Reproducing the results
1. Set up the environment
- Install miniconda or anaconda.
- Install git-lfs: `conda install -c conda-forge git-lfs && git lfs install`
- Clone this repository: `git clone https://github.com/kundajelab/bpnet-manuscript.git && cd bpnet-manuscript`
- Pull the data files tracked by git-lfs: `git lfs pull '-I data/**'`
- Create the conda environment: `conda env create -f conda-env.yaml`. This installs a new conda environment named `bpnet-manuscript`. (If you want to use the GPU, rename `tensorflow` to `tensorflow-gpu` in `conda-env.yaml` and make sure you have the correct CUDA version installed to run tensorflow 1.6 or 1.7.)
- Activate the environment: `source activate bpnet-manuscript`
- Install the `basepair` python package from this repository: `pip install -e .`
To speed up data loading, build vmtouch. It is used to load the bigWig files into the system memory cache, which allows multiple processes to access the bigWigs already held in memory.
Here's how I install vmtouch:
```bash
# ~/bin = directory for locally compiled binaries
mkdir -p ~/bin
cd ~/bin

# Clone and build
git clone https://github.com/hoytech/vmtouch.git vmtouchsrc
cd vmtouchsrc
make

# Move the binary to ~/bin
cp vmtouch ../

# Add ~/bin to $PATH
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc
```
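Once installed, vmtouch can warm the page cache with the bigWig files before training. A minimal sketch, assuming the bigWigs live under `data/` with a `.bw` extension (adjust both to your layout):

```bash
# Pre-load the bigWig files into the OS page cache so that multiple
# data-loading workers read them from memory instead of disk.
HAVE_VMTOUCH=0
command -v vmtouch >/dev/null 2>&1 && HAVE_VMTOUCH=1

if [ "$HAVE_VMTOUCH" -eq 1 ] && ls data/*.bw >/dev/null 2>&1; then
  vmtouch -t data/*.bw   # -t: touch every page into memory
  vmtouch data/*.bw      # no flag: report how much of each file is resident
fi
```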
To make sure saving the Keras model in the HDF5 file format works (https://github.com/h5py/h5py/issues/1082), add the following to your ~/.bashrc:

```bash
export HDF5_USE_FILE_LOCKING=FALSE
```
2. Download the data
First, make a directory on your machine:

```bash
mkdir -p bpnet-manuscript-data
cd bpnet-manuscript-data
```

All the data will be downloaded to this directory. In the code base, replace the /oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison path with the absolute path of your bpnet-manuscript-data directory.
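One way to perform that substitution across the checked-out code, as a sketch using GNU sed: `DATA_DIR` is an assumption you must set to your actual directory, and it only touches `src/` here, so review which files mention the path and extend the search accordingly before editing in place.

```bash
OLD_PATH='/oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison'
DATA_DIR="$HOME/bpnet-manuscript-data"   # adjust to your machine

# List the files under src/ that mention the hard-coded path,
# then rewrite them in place (xargs -r: skip sed if nothing matched).
grep -rl "$OLD_PATH" src/ 2>/dev/null | xargs -r sed -i "s|$OLD_PATH|$DATA_DIR|g"
```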
Download raw data:

```bash
wget 'https://zenodo.org/record/3371164/files/output.tar.gz?download=1' -O output.tar.gz && tar xvfz output.tar.gz && rm output.tar.gz
```
Download outputs:

```bash
wget 'https://zenodo.org/record/3371216/files/data.tar.gz?download=1' -O data.tar.gz && tar xvfz data.tar.gz && rm data.tar.gz
```
3. Run all scripts for which the main data were not provided
- Compute the contribution score files (`output/*/deeplift.imp_score.h5`) as follows:

```bash
source activate bpnet-manuscript
for out in $(ls -d output/*/)
do
  basepair imp-score-seqmodel ${out%%/} ${out%%/}/deeplift.imp_score.h5 \
    --dataspec ${out%%/}/dataspec.yaml \
    --gpu 0 \
    --batch-size 16 \
    --method deeplift \
    --intp-pattern '*' \
    --peak-width 1000 \
    --seq-width 1000 \
    --memfrac 1 \
    --num-workers 5 \
    --exclude-chr chrX,chrY
done
```

- Run ChExMix
  - Follow the instructions in src/comparison/README.md.
4. (Optional) Re-run the remaining computationally heavy scripts
These steps are optional, as the output data were already downloaded in the previous step.
- Train BPNet, compute contrib. scores, run TF-MoDISco, get motif instances
- Follow the instructions in src/bpnet-pipeline/README.md.
- Filter motif instances
- Follow the instructions in src/figures/README.md.
- Motif interaction analysis
- Follow the instructions in src/motif-interactions/README.md.
5. Generate figures
Execute the notebooks listed in src/figures/README.md. Figures will be written to data/figures.
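A sketch of running the figure notebooks non-interactively with nbconvert. The glob is an assumption, and the notebooks may need to run in a particular order, so check src/figures/README.md first:

```bash
# Execute each figure notebook in place; stop at the first failure.
for nb in src/figures/*.ipynb; do
  [ -e "$nb" ] || continue    # skip if the glob matched nothing
  jupyter nbconvert --to notebook --execute --inplace "$nb" || break
done
```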
Owner
- Name: Kundaje Lab
- Login: kundajelab
- Kind: organization
- Location: Stanford University
- Website: http://anshul.kundaje.net
- Repositories: 117
- Profile: https://github.com/kundajelab
Compbio and machine learning code repositories from the Kundaje Lab at Stanford Genetics and Computer Science Depts.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1