https://github.com/kundajelab/bpnet-manuscript

BPNet manuscript code.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kundajelab
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 171 MB
Statistics
  • Stars: 11
  • Watchers: 3
  • Forks: 3
  • Open Issues: 2
  • Releases: 0
Created over 6 years ago · Last pushed about 5 years ago
Metadata Files
Readme

README.md

BPNet manuscript

Code accompanying the BPNet manuscript.

If you want to use BPNet on your own data, please use the BPNet python package: https://github.com/kundajelab/bpnet.

Folder organization

  • basepair - Python package containing functions/classes shared across multiple notebooks
  • src - scripts for running the experiments and producing the figures
    • bpnet-pipeline - train BPNet models, generate importance scores, run TF-MoDISco, get motif instances with CWM scanning
    • motif-interactions - generate the in silico motif interactions
    • comparison - run ChExMix
    • figures - generate all the paper figures
  • data - data files
  • tests - unit tests

Reproducing the results

1. Set up the environment

  1. Install miniconda or anaconda.
  2. Install git-lfs: conda install -c conda-forge git-lfs && git lfs install
  3. Clone this repository: git clone https://github.com/kundajelab/bpnet-manuscript.git && cd bpnet-manuscript
  4. Run: git lfs pull '-I data/**'
  5. Run: conda env create -f conda-env.yaml. This will create a new conda environment named bpnet-manuscript. (If you want to use the GPU, rename tensorflow to tensorflow-gpu in conda-env.yaml and make sure you have a CUDA version installed that can run TensorFlow 1.6 or 1.7.)
  6. Activate the environment: source activate bpnet-manuscript
  7. Install the basepair Python package from this repository: pip install -e .

To speed up data loading, build vmtouch. It is used to load the bigWig files into the system memory cache, which lets multiple processes access the bigWigs already resident in memory.

Here's how I install vmtouch:

```bash
# ~/bin = directory for locally compiled binaries
mkdir -p ~/bin
cd ~/bin

# Clone and build
git clone https://github.com/hoytech/vmtouch.git vmtouchsrc
cd vmtouchsrc
make

# Move the binary to ~/bin
cp vmtouch ../

# Add ~/bin to $PATH
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc
```
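For illustration only, the effect of preloading can be sketched in Python: reading a file end-to-end pulls its pages into the OS page cache, which is what vmtouch's touch mode does far more efficiently via mmap/mincore. The `warm_cache` helper below is a hypothetical name, not part of this repository.

```python
def warm_cache(path, chunk=1 << 20):
    """Read a file end-to-end so later readers hit the OS page cache.

    Conceptual stand-in for `vmtouch -t`; vmtouch itself uses mmap/mincore
    and can also report how much of a file is already resident.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total  # bytes read, equal to the file size
```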

To make sure saving the Keras model in HDF5 file format works (https://github.com/h5py/h5py/issues/1082), add the following to your ~/.bashrc:

```bash
export HDF5_USE_FILE_LOCKING=FALSE
```
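As a quick sanity check, you can verify from Python that the workaround is active in the current process before saving models. This is a sketch; `file_locking_disabled` is a hypothetical helper, not part of the repository.

```python
import os

def file_locking_disabled():
    """True if the HDF5 file-locking workaround from the README is active."""
    return os.environ.get("HDF5_USE_FILE_LOCKING", "").upper() == "FALSE"
```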

2. Download the data

First, make a directory on your machine:

```bash
mkdir -p bpnet-manuscript-data
cd bpnet-manuscript-data
```

All the data will be downloaded to this directory. Throughout the code base, replace the /oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison path with the absolute path of your bpnet-manuscript-data directory.
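One way to perform that substitution across the checkout is sketched below. `replace_oak_path` is a hypothetical helper (not part of the repository), and the set of file extensions it touches is an assumption; adjust as needed.

```python
from pathlib import Path

OLD_PATH = "/oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison"

def replace_oak_path(root, new_path, exts=(".py", ".yaml", ".yml", ".ipynb", ".sh")):
    """Rewrite the hard-coded Oak path in text files under `root`.

    Returns the list of files that were changed.
    """
    changed = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix in exts:
            text = p.read_text(errors="ignore")
            if OLD_PATH in text:
                p.write_text(text.replace(OLD_PATH, new_path))
                changed.append(str(p))
    return changed
```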

Download raw data

```bash
wget 'https://zenodo.org/record/3371164/files/output.tar.gz?download=1' -O output.tar.gz && tar xvfz output.tar.gz && rm output.tar.gz
```

Download outputs

```bash
wget 'https://zenodo.org/record/3371216/files/data.tar.gz?download=1' -O data.tar.gz && tar xvfz data.tar.gz && rm data.tar.gz
```
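The same download-and-extract step can be sketched with the Python standard library if wget is unavailable. `fetch_and_extract` is a hypothetical helper, not part of the repository; the Zenodo URLs are the ones given above.

```python
import tarfile
import urllib.request

def fetch_and_extract(url, dest="."):
    """Download a .tar.gz archive and unpack it into `dest`."""
    fname, _ = urllib.request.urlretrieve(url)
    with tarfile.open(fname, "r:gz") as tar:
        tar.extractall(dest)

# e.g. fetch_and_extract("https://zenodo.org/record/3371164/files/output.tar.gz?download=1")
```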

3. Run all scripts for which the main data were not provided

  1. Compute the contribution score files (output/*/deeplift.imp_score.h5) as follows (note: the original README had a stray `}` after `--peak-width 1000`, removed here):

     ```bash
     source activate bpnet-manuscript
     for out in $(ls -d output/*/)
     do
       basepair imp-score-seqmodel ${out%%/} ${out%%/}/deeplift.imp_score.h5 \
         --dataspec ${out%%/}/dataspec.yaml \
         --gpu 0 \
         --batch-size 16 \
         --method deeplift \
         --intp-pattern '*' \
         --peak-width 1000 \
         --seq-width 1000 \
         --memfrac 1 \
         --num-workers 5 \
         --exclude-chr chrX,chrY
     done
     ```

  2. Run ChExMix.

4. (Optional) Re-run the remaining computationally heavy scripts

These steps are optional as the output data were already downloaded in the previous step.

  1. Train BPNet, compute contrib. scores, run TF-MoDISco, get motif instances
  2. Filter motif instances
  3. Motif interaction analysis

5. Generate figures

Execute the notebooks listed in src/figures/README.md. Figures will be written to data/figures.

Owner

  • Name: Kundaje Lab
  • Login: kundajelab
  • Kind: organization
  • Location: Stanford University

Computational biology and machine learning code repositories from the Kundaje Lab in the Stanford Genetics and Computer Science departments.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1