https://github.com/kundajelab/bpnet-manuscript

BPNet manuscript code.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kundajelab
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 171 MB
Statistics
  • Stars: 11
  • Watchers: 3
  • Forks: 3
  • Open Issues: 2
  • Releases: 0
Created over 6 years ago · Last pushed about 5 years ago
Metadata Files
Readme

README.md

BPNet manuscript

Code accompanying the BPNet manuscript.

If you want to use BPNet on your own data, please use the BPNet python package: https://github.com/kundajelab/bpnet.

Folder organization

  • basepair - Python package containing functions/classes shared across multiple notebooks
  • src - scripts for running the experiments and producing the figures
    • bpnet-pipeline - train BPNet models, generate importance scores, run TF-MoDISco, get motif instances with CWM scanning
    • motif-interactions - generate the in silico motif interactions
    • comparison - run ChExMix
    • figures - generate all the paper figures
  • data - data files
  • tests - unit tests

Reproducing the results

1. Set up the environment

  1. Install miniconda or anaconda.
  2. Install git-lfs: conda install -c conda-forge git-lfs && git lfs install
  3. Clone this repository: git clone https://github.com/kundajelab/bpnet-manuscript.git && cd bpnet-manuscript
  4. Run: git lfs pull '-I data/**'
  5. Run: conda env create -f conda-env.yaml. This will create a new conda environment named bpnet-manuscript. (If you want to use the GPU, rename tensorflow to tensorflow-gpu in conda-env.yaml and make sure you have a CUDA version installed that can run TensorFlow 1.6 or 1.7.)
  6. Activate the environment: source activate bpnet-manuscript
  7. Install the basepair Python package from this repository: pip install -e .

To speed up data loading, build vmtouch. It is used to load the bigWig files into the system memory cache, which lets multiple processes access the bigWigs already resident in memory.

Here's how I install vmtouch:

```bash
# ~/bin = directory for locally compiled binaries
mkdir -p ~/bin
cd ~/bin

# Clone and build
git clone https://github.com/hoytech/vmtouch.git vmtouchsrc
cd vmtouchsrc
make

# Move the binary to ~/bin
cp vmtouch ../

# Add ~/bin to $PATH
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc
```
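For illustration only, the effect of preloading can be sketched in Python: reading a file end-to-end pulls its pages into the OS page cache, which is what vmtouch's touch mode does far more efficiently via mmap/mincore. The `warm_cache` helper below is a hypothetical name, not part of this repository.

```python
def warm_cache(path, chunk=1 << 20):
    """Read a file end-to-end so later readers hit the OS page cache.

    Conceptual stand-in for `vmtouch -t`; vmtouch itself uses mmap/mincore
    and can also report how much of a file is already resident.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total  # bytes read, equal to the file size
```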

To make sure saving the Keras model in HDF5 file format works (https://github.com/h5py/h5py/issues/1082), add the following to your ~/.bashrc:

```bash
export HDF5_USE_FILE_LOCKING=FALSE
```
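As a quick sanity check, you can verify from Python that the workaround is active in the current process before saving models. This is a sketch; `file_locking_disabled` is a hypothetical helper, not part of the repository.

```python
import os

def file_locking_disabled():
    """True if the HDF5 file-locking workaround from the README is active."""
    return os.environ.get("HDF5_USE_FILE_LOCKING", "").upper() == "FALSE"
```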

2. Download the data

First, make a directory on your machine:

```bash
mkdir -p bpnet-manuscript-data
cd bpnet-manuscript-data
```

All the data will be downloaded to this directory. Throughout the code base, replace the /oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison path with the absolute path of your bpnet-manuscript-data directory.
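One way to perform that substitution across the checkout is sketched below. `replace_oak_path` is a hypothetical helper (not part of the repository), and the set of file extensions it touches is an assumption; adjust as needed.

```python
from pathlib import Path

OLD_PATH = "/oak/stanford/groups/akundaje/avsec/basepair/data/processed/comparison"

def replace_oak_path(root, new_path, exts=(".py", ".yaml", ".yml", ".ipynb", ".sh")):
    """Rewrite the hard-coded Oak path in text files under `root`.

    Returns the list of files that were changed.
    """
    changed = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix in exts:
            text = p.read_text(errors="ignore")
            if OLD_PATH in text:
                p.write_text(text.replace(OLD_PATH, new_path))
                changed.append(str(p))
    return changed
```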

Download raw data

```bash
wget 'https://zenodo.org/record/3371164/files/output.tar.gz?download=1' -O output.tar.gz && tar xvfz output.tar.gz && rm output.tar.gz
```

Download outputs

```bash
wget 'https://zenodo.org/record/3371216/files/data.tar.gz?download=1' -O data.tar.gz && tar xvfz data.tar.gz && rm data.tar.gz
```
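The same download-and-extract step can be sketched with the Python standard library if wget is unavailable. `fetch_and_extract` is a hypothetical helper, not part of the repository; the Zenodo URLs are the ones given above.

```python
import tarfile
import urllib.request

def fetch_and_extract(url, dest="."):
    """Download a .tar.gz archive and unpack it into `dest`."""
    fname, _ = urllib.request.urlretrieve(url)
    with tarfile.open(fname, "r:gz") as tar:
        tar.extractall(dest)

# e.g. fetch_and_extract("https://zenodo.org/record/3371164/files/output.tar.gz?download=1")
```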

3. Run all scripts for which the main data were not provided

  1. Compute the contribution score files (output/*/deeplift.imp_score.h5) as follows (note: the original README had a stray `}` after `--peak-width 1000`, removed here):

     ```bash
     source activate bpnet-manuscript
     for out in $(ls -d output/*/)
     do
       basepair imp-score-seqmodel ${out%%/} ${out%%/}/deeplift.imp_score.h5 \
         --dataspec ${out%%/}/dataspec.yaml \
         --gpu 0 \
         --batch-size 16 \
         --method deeplift \
         --intp-pattern '*' \
         --peak-width 1000 \
         --seq-width 1000 \
         --memfrac 1 \
         --num-workers 5 \
         --exclude-chr chrX,chrY
     done
     ```

  2. Run ChExMix.

4. (Optional) Re-run the remaining computationally heavy scripts

These steps are optional as the output data were already downloaded in the previous step.

  1. Train BPNet, compute contrib. scores, run TF-MoDISco, get motif instances
  2. Filter motif instances
  3. Motif interaction analysis

5. Generate figures

Execute the notebooks listed in src/figures/README.md. Figures will be written to data/figures.

Owner

  • Name: Kundaje Lab
  • Login: kundajelab
  • Kind: organization
  • Location: Stanford University

Computational biology and machine learning code repositories from the Kundaje Lab in the Stanford Genetics and Computer Science departments.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1