CRED

CRED: a rapid peak caller for Chem-seq data - Published in JOSS (2019)

https://github.com/jlincbio/cred

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org, zenodo.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Last synced: 9 months ago

Repository

Chem-seq Read Enrichment Discovery: CRED

Basic Info

Host: GitHub
Owner: jlincbio
License: gpl-3.0
Language: C
Default Branch: master
Homepage:
Size: 156 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 7 years ago · Last pushed over 6 years ago

Metadata Files

Readme License

Chem-seq Read Enrichment Discovery: CRED

Chem-seq Read Enrichment Discovery (CRED) is a simple peak caller written in C for identifying non-canonical feature enrichments in paired Chem-seq data.

04/10/2019 initial commit/JOSS submission
05/04/2019 JOSS peer review completed
05/08/2019 formal publication in JOSS

Installation

CRED requires a compatible C compiler (e.g. GCC) and uses HTSlib to process BAM files. Please clone the HTSlib repository inside the cred directory before compiling.

Sample installation:

    git clone https://github.com/jlincbio/cred.git
    cd cred && git clone https://github.com/samtools/htslib.git
    make # or "make mac" if macOS
    export INSTALL_PREFIX=/usr/local/bin
    make PREFIX=$INSTALL_PREFIX install

Notes:
1. For macOS systems, replace make with make mac. Alternatively, enter htslib/ and invoke make.
2. For troubleshooting, make clean will also clean the HTSlib directory.
3. Use PREFIX to specify a destination to install CRED.

Inputs and Parameters

    CRED: Chem-seq Read Enrichment Discovery (Version 0.1, Apr 2019 Initial Release)

    Command  : cred [options] -t TREATMENT.BAM -c CONTROL.BAM > OUTPUT.BED
    Required : -t TREATMENT.BAM  Path to the treatment ("pulldown") track [BAM]
               -c CONTROL.BAM    Path to the control ("input") Chem-seq track [BAM]
    Optional : -p [P-VALUE]      Significance level [default 0.0001]
               -q [SCORE]        Minimum MAPQ quality for reads to count [default 30]
               -w [INTEGER]      Size of differential windows [default 1200 bp]
               -k                Evaluate site significance with Kolmogorov-Smirnov
    Reminders: 1. BAM files must be sorted and indexed.
               2. Use a pipe (">") to capture CRED output.

* Treatment and control BAM files: use -t and -c, respectively, to specify the pair. Both files should be aligned, coordinate-sorted AND indexed (both .bam and .bam.bai should be present).
* Quality score cutoff (MAPQ, option -q): the minimum required mapping quality score, as defined in the SAM format specification. CRED defaults to 30.
* Significance (alpha) level (option -p): specify this either as a decimal (e.g. 0.0001) or a fraction (e.g. 1/10000). CRED checks all regions against this cutoff and outputs only features more significant than this predefined alpha level.
* Sliding window size (option -w): defaults to 1200 bp (the maximum size used in Lin et al. PLoS ONE 2016).
* Significance test (toggle -k): switches the method of evaluating the significance of enrichment from Welch's t-test (default) to Kolmogorov-Smirnov.
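To make the difference between the two tests concrete, here is a minimal Python sketch of the statistics behind Welch's t-test and the two-sample Kolmogorov-Smirnov test. It is an illustration only and not CRED's C implementation: the function names are hypothetical, the inputs are assumed to be per-position read depths within one window, and the step from statistic to p-value is omitted.

```python
import math
from bisect import bisect_right
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up depths for one window (assumption: not taken from the sample data).
treatment = [12, 15, 14, 18, 20, 17]
control = [5, 7, 6, 8, 5, 9]
t_stat, dof = welch_t(treatment, control)
d_stat = ks_statistic(treatment, control)
```

Welch's test compares the means while allowing unequal variances, whereas Kolmogorov-Smirnov compares the whole distributions, which is one reason the two options may rank the same site differently.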

A helper program, "Batch CRED" or BCRED, is also supplied here for running CRED on more hardware-limited systems (e.g., those without access to a lot of RAM). BCRED is written in Perl and requires samtools as well as Parallel::ForkManager (a Perl module available on CPAN); it splits the input BAM pair per chromosome, dispatches CRED calls, and merges the results. Multithreading support is also provided via Parallel::ForkManager.
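The split, dispatch, and merge steps can be sketched in Python as follows. This is a hedged illustration of the general pattern, not a port of the Perl script: the function names and temporary file names are hypothetical, the output is assumed to be whitespace-delimited with the peak ID in the fourth column, and renumbering the IDs after merging is an assumption about what a merged file should look like.

```python
def plan_chromosome_jobs(treatment, control, chromosomes, cred_options=()):
    """Build, per chromosome, the commands a BCRED-like wrapper could run."""
    jobs = {}
    for chrom in chromosomes:
        t_part = f"{chrom}.treatment.bam"  # hypothetical temporary file names
        c_part = f"{chrom}.control.bam"
        jobs[chrom] = [
            # Extract one chromosome from each indexed BAM.
            ["samtools", "view", "-b", "-o", t_part, treatment, chrom],
            ["samtools", "view", "-b", "-o", c_part, control, chrom],
            # CRED expects indexed input, so index the per-chromosome files.
            ["samtools", "index", t_part],
            ["samtools", "index", c_part],
            # CRED writes to STDOUT; the caller captures it.
            ["cred", *cred_options, "-t", t_part, "-c", c_part],
        ]
    return jobs


def merge_bed_chunks(chunks):
    """Concatenate per-chromosome output and renumber the peak ID column."""
    merged = []
    for chunk in chunks:
        for line in chunk.splitlines():
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            fields[3] = str(len(merged) + 1)
            merged.append("\t".join(fields))
    return merged


jobs = plan_chromosome_jobs("t.bam", "c.bam", ["chr1", "chr2"], ["-w", "600"])
```

Each command list can be passed to subprocess.run, and the per-chromosome jobs can be spread over workers with concurrent.futures, since they touch disjoint files.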

To launch BCRED, look for bcred in the same folder as cred. The inputs and parameters are largely the same as cred:

    bcred: a batch assistant for CRED Ver. 0.1 (Apr 2019 Initial Release)

    Usage: bcred [options] -t TREATMENT.BAM -c CONTROL.BAM > OUTPUT.BED
    Required : -t TREATMENT.BAM  Path to the treatment ("pulldown") track [BAM]
               -c CONTROL.BAM    Path to the control ("input") Chem-seq track [BAM]
    Optional : -p [P-VALUE]      Significance level [default 0.0001]
               -q [SCORE]        Minimum MAPQ quality for reads to count [default 30]
               -w [INTEGER]      Size of differential windows [default 1200 bp]
               -n [INTEGER]      Number of threads to utilize [default 1]
               -k                Evaluate site significance with Kolmogorov-Smirnov
    Reminders: 1. BAM files must be sorted.
               2. Use a pipe (">") to capture CRED output.

The initial release is archived at Zenodo.

Output

The current version of CRED writes to STDOUT so the results can be streamed in-line for subsequent tasks, e.g. checking for motif intersects with BEDTools and immediately compressing the results with gzip. To store the output in a file, use a redirect (">"). The output is in a BED-like format directly interpretable in genome browsers such as IGV. The columns are as follows:

1. Chromosome ID
2. Start of the site
3. End position of the site
4. Peak ID (numerically ordered)
5. Log ratio of relative enrichment in the pulldown vs. control track
6. (Unused)
7. Significance level of the relative enrichment
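Because the output is plain text on STDOUT, downstream scripts can consume it line by line. The sketch below parses the seven columns listed above and re-filters peaks at a stricter alpha given in either form accepted by -p. The names Peak, parse_alpha, read_peaks, and filter_peaks are hypothetical, the columns are assumed to be whitespace-delimited, column 7 is assumed to be a p-value-like number where smaller means more significant, and the example rows are made up.

```python
from fractions import Fraction
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    peak_id: str
    log_ratio: float
    unused: str
    significance: float


def parse_alpha(text):
    """Accept a decimal ('0.0001') or a fraction ('1/10000')."""
    return float(Fraction(text))


def read_peaks(lines):
    """Yield one Peak per line that has at least seven columns."""
    for line in lines:
        f = line.split()
        if len(f) < 7:
            continue  # skip blank or short lines
        yield Peak(f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5], float(f[6]))


def filter_peaks(peaks, alpha):
    """Keep peaks whose significance value is below alpha."""
    return [p for p in peaks if p.significance < alpha]


# Made-up rows in the column order described above.
example = [
    "chr20\t1000\t2200\t1\t1.8\t.\t0.00002",
    "chr20\t9000\t10200\t2\t0.9\t.\t0.00008",
]
strict = filter_peaks(read_peaks(example), parse_alpha("1/20000"))
```

Passing sys.stdin to read_peaks lets the same code sit at the end of a shell pipeline that starts with cred.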

Example

Under samples/ there is a set of simulated treatment and control BAM files one can use to test CRED:
* simReads_hg19_treatment-chr20.bam
* simReads_hg19_control-chr20.bam

Those files were created by using DWGSIM with a BED file containing a list of reference features ("sites") as true positives (N = 3M), followed by random regions as control (N = 50M). The treatment track was compiled by merging the two to ensure genomic enrichment (see simReads-generate.pl for an example script). Following processing and alignment by LAST, reads located in chr20 were extracted so that the file sizes will be under 100M per GitHub rules.

To try out CRED with these two tracks using default settings:

    cred -t samples/simReads_hg19_treatment-chr20.bam -c samples/simReads_hg19_control-chr20.bam > simReads_chr20-cred.bed

If BEDTools is installed, intersecting the resultant BED files with the true positives reveals that the CRED results include more sites containing the positive spikes (700+) than the MACS results (~500). These regions can also be confirmed in IGV (see results_sample_igv_snapshot.png for an example). On a 3.5 GHz 6-core Mac Pro with 64 GB of RAM running macOS 10.14.4, the run completed in ~20 seconds for CRED and about a minute for MACS.
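For readers without BEDTools, the comparison can be approximated in a few lines of Python. This sketch counts how many true-positive sites are overlapped by at least one called peak, roughly what bedtools intersect with the -u option reports when the true sites are the "A" file. The function name is hypothetical, intervals are assumed to be half-open BED coordinates, and the example intervals are made up rather than taken from the sample data.

```python
def count_recovered_sites(true_sites, peaks):
    """Count true sites (chrom, start, end) overlapped by at least one peak."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    recovered = 0
    for chrom, start, end in true_sites:
        # Half-open intervals overlap when each starts before the other ends.
        if any(p_start < end and start < p_end
               for p_start, p_end in by_chrom.get(chrom, [])):
            recovered += 1
    return recovered


true_sites = [("chr20", 100, 200), ("chr20", 500, 600), ("chr20", 900, 950)]
peaks = [("chr20", 150, 400), ("chr20", 600, 700), ("chr1", 100, 200)]
hits = count_recovered_sites(true_sites, peaks)
```

The linear scan per site is fine for a single chromosome; for genome-wide comparisons a sorted sweep or BEDTools itself would scale better.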

Citation

If you use CRED in your research, please cite the following publication:

J. Lin, T. Kuo, P. Horton, H. Nagase, "CRED: a rapid peak caller for Chem-seq data." Journal of Open Source Software 4(37): 1423, 2019. DOI: 10.21105/joss.01423

Owner

Login: jlincbio
Kind: user
Location: Chiba, Japan
Company: Chiba Cancer Center Research Institute

Repositories: 1
Profile: https://github.com/jlincbio

JOSS Publication

CRED: a rapid peak caller for Chem-seq data

Published

May 08, 2019

DOI

10.21105/joss.01423

Volume 4, Issue 37, Page 1423

Authors

Jason Lin

Laboratory of Cancer Genetics, Chiba Cancer Center Research Institute, Chuo-ku, Chiba, Japan; Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan

Tony Kuo

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan

Paul Horton

Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan; Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

Hiroki Nagase

Laboratory of Cancer Genetics, Chiba Cancer Center Research Institute, Chuo-ku, Chiba, Japan

Editor

Lorena Pantano

Committers

Last synced: 10 months ago

All Time

Total Commits: 47
Total Committers: 3
Avg Commits per committer: 15.667
Development Distribution Score (DDS): 0.128

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
jlincbio	4****o	41
jlincbio		5
Kyle Niemeyer	k**r@g**m	1
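
The all-time figures above can be reproduced from the commit counts in the table. The sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, but it matches the reported value. The dictionary keys are labels made up to tell the two jlincbio rows apart.

```python
# Commit counts copied from the Top Committers table.
commits = {
    "jlincbio (masked email)": 41,
    "jlincbio (no email)": 5,
    "Kyle Niemeyer": 1,
}

total = sum(commits.values())
avg_per_committer = total / len(commits)
# Assumed definition: 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total

print(total, round(avg_per_committer, 3), round(dds, 3))
```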

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: about 3 hours
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
