https://github.com/boyle-lab/sempl

C++ implementation of the SEM algorithm

Keywords

bioinformatics

Repository

Basic Info
  • Host: GitHub
  • Owner: Boyle-Lab
  • Language: C
  • Default Branch: master
  • Size: 57.8 MB
Statistics
  • Stars: 15
  • Watchers: 12
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
bioinformatics
Created over 9 years ago · Last pushed over 4 years ago

https://github.com/Boyle-Lab/SEMpl/blob/master/

# SEMpl
C++ implementation of the SEM algorithm

Sierra S Nishizaki, Natalie Ng, Shengcheng Dong, Robert S Porter, Cody Morterud, Colten Williams, Courtney Asman, Jessica A Switzenberg, Alan P Boyle, Predicting the effects of SNPs on transcription factor binding affinity, Bioinformatics, Volume 36, Issue 2, 15 January 2020, Pages 364–372, https://doi.org/10.1093/bioinformatics/btz612

We have made all of the SEMs generated as part of this work available [here](SEMs/).

# System Requirements

## Hardware Requirements
Generating a SEM requires amounts of RAM and disk storage that vary with the size of the initial PWM. As a minimum, we recommend a computer with the following specs:

RAM: 64+ GB  
CPU: 8+ cores, 3.4+ GHz/core

The runtime on this minimal system is approximately 38 CPU hours. Compile time is approximately 35 seconds.

## Software Requirements

The development version of the package has been tested on *Linux* operating systems, specifically:

Linux: Ubuntu 18.04  
Packages: libcurl4-dev

## Demo

We include a small demo that generates the SEM for HNF4A in HepG2 cells. Execution time of this demo is approximately 6791 seconds on 20 threads. The expected output is:
```
Running Iterative SEM building..
        PWM: examples/MA0114.1.pwm
        merge_file: examples/wgEncodeOpenChromDnaseHepg2Pk.narrowPeak.gz
        bigwig: examples/wgEncodeHaibTfbsHepg2Hnf4asc8987V0416101RawRep1.bigWig
        TF_name: HNF4A
         output: results/HNF4A/
        cachefile flag: results/HNF4A/HNF4A.cache.db
        verbose
....
```
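To relate the demo timing to the CPU-hour figure in the hardware section, the following sketch converts a wall-clock time on a given number of threads into CPU hours. It assumes every thread is busy for the whole run, which the README does not state; `cpu_hours` is a hypothetical helper, not part of SEMpl.

```shell
# Hypothetical helper (not part of SEMpl): wall-clock seconds x threads -> CPU hours.
# Assumes all threads are fully utilized for the entire run.
cpu_hours() {
    awk -v secs="$1" -v threads="$2" \
        'BEGIN { printf "%.1f\n", secs * threads / 3600 }'
}

# Demo figures quoted above: 6791 seconds on 20 threads.
cpu_hours 6791 20   # prints 37.7
```

Under that assumption the demo works out to roughly 38 CPU hours, in line with the runtime quoted under Hardware Requirements.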

# Installation
Clone a copy of the SEMpl repository and submodules:

```
git clone --recurse-submodules https://github.com/Boyle-Lab/SEMpl.git
```

Build external libraries:
```
cd SEMpl/lib/libBigWig
make
cd ..
make
mv */*.so .
cd ..
```

Symlink to bowtie index location (use your own index location):
```
ln -s /data/genomes/hg19/bowtie_index/ data
```
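A dangling symlink is easy to miss until the run fails, so it can help to confirm that `data` resolves to a directory before building. This is a minimal sketch; `check_index_link` is a hypothetical helper, and it does not verify that the directory actually contains a bowtie index.

```shell
# Hypothetical helper (not part of SEMpl): succeed only if the given path
# is a symlink that resolves to an existing directory.
check_index_link() {
    [ -L "$1" ] && [ -d "$1" ]
}

if check_index_link data; then
    echo "data -> $(readlink data)"
else
    echo "data is missing or does not point to a directory" >&2
fi
```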

Build SEMpl:
```
make
```
 
# Usage information
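SEMpl runs as an iterative process and requires several inputs: a starting PWM (`-PWM`), a peak file (`-merge_file`), a bigWig signal file (`-big_wig`), and a genome index (`-genome`), along with a transcription factor name (`-TF_name`) and an output directory (`-output`). The following example builds the SEM for HNF4A in HepG2 cells from the example data: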
```
./iterativeSEM -PWM examples/MA0114.1.pwm -merge_file examples/wgEncodeOpenChromDnaseHepg2Pk.narrowPeak -big_wig examples/wgEncodeHaibTfbsHepg2Hnf4asc8987V0416101RawRep1.bigWig -TF_name HNF4A -genome data/hg19 -output results/HNF4A
```
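Because a full run takes a long time, it may be worth checking that the inputs exist before launching it. The sketch below uses the paths from the example command; `check_inputs` is a hypothetical helper and not part of SEMpl, and it only tests for presence, not for valid file contents.

```shell
# Hypothetical helper (not part of SEMpl): report every path that does not
# exist and return non-zero if any are missing.
check_inputs() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths taken from the example command above; `data` is the index symlink.
if check_inputs \
    examples/MA0114.1.pwm \
    examples/wgEncodeOpenChromDnaseHepg2Pk.narrowPeak \
    examples/wgEncodeHaibTfbsHepg2Hnf4asc8987V0416101RawRep1.bigWig \
    data
then
    echo "all inputs present"
fi
```

Reporting every missing path in one pass, rather than stopping at the first, avoids repeated fix-and-retry cycles.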

# Testing
Run `make test` to compile and run this input example.

Owner

  • Name: The Boyle Lab
  • Login: Boyle-Lab
  • Kind: organization
  • Email: apboyle@umich.edu
  • Location: University of Michigan
