CRED

CRED: a rapid peak caller for Chem-seq data - Published in JOSS (2019)

https://github.com/jlincbio/cred

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Chem-seq Read Enrichment Discovery: CRED

Basic Info
  • Host: GitHub
  • Owner: jlincbio
  • License: gpl-3.0
  • Language: C
  • Default Branch: master
  • Homepage:
  • Size: 156 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 7 years ago · Last pushed about 6 years ago
Metadata Files
Readme License

README.md

Chem-seq Read Enrichment Discovery: CRED

Chem-seq Read Enrichment Discovery (CRED) is a simple peak caller written in C for identifying non-canonical feature enrichments in paired Chem-seq data.

status
04/10/2019 initial commit/JOSS submission
05/04/2019 JOSS peer review completed
05/08/2019 formal publication in JOSS

Installation

CRED requires a compatible C compiler (e.g. GCC) and uses HTSlib to process BAM files. Please clone the HTSlib repository inside the CRED directory before compiling.

Sample installation:

    git clone https://github.com/jlincbio/cred.git
    cd cred && git clone https://github.com/samtools/htslib.git
    make # or "make mac" if macOS
    export INSTALL_PREFIX=/usr/local/bin
    make PREFIX=$INSTALL_PREFIX install

Notes:

  1. For macOS systems, replace make with make mac. Alternatively, enter htslib/ and invoke make.
  2. For troubleshooting, make clean will also clean the HTSlib directory.
  3. Use PREFIX to specify a destination to install CRED.

Inputs and Parameters

    CRED: Chem-seq Read Enrichment Discovery (Version 0.1, Apr 2019 Initial Release)
    Command  : cred [options] -t TREATMENT.BAM -c CONTROL.BAM > OUTPUT.BED
    Required : -t TREATMENT.BAM  Path to the treatment ("pulldown") track [BAM]
               -c CONTROL.BAM    Path to the control ("input") Chem-seq track [BAM]
    Optional : -p [P-VALUE]      Significance level [default 0.0001]
               -q [SCORE]        Minimum MAPQ quality for reads to count [default 30]
               -w [INTEGER]      Size of differential windows [default 1200 bp]
               -k                Evaluate site significance with Kolmogorov-Smirnov
    Reminders: 1. BAM files must be sorted and indexed.
               2. Use a pipe (">") to capture CRED output.

  • Treatment and control BAM files: use -t and -c, respectively, to specify the pair. Both BAMs should be aligned, coordinate-sorted, and indexed (both the .bam and .bam.bai files should be present).
  • Quality score cutoff (MAPQ, option -q): this is the minimum required mapping quality score as defined in the SAM format specification. CRED defaults to 30.
  • Significance (alpha) level (option -p): please specify this either as a decimal (e.g. 0.0001) or a fraction (e.g. 1/10000). CRED will check all regions against this cutoff and output only features more significant than this predefined alpha level.
  • Size of sliding windows may also be adjusted with option -w; defaults to 1200 bp (maximum size used in Lin et al. PLoS ONE 2016).
  • Method of evaluating the significance of enrichment may also be modified from Welch's t-test (default) to Kolmogorov-Smirnov by the -k toggle.
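The -p option above accepts either a decimal or a fraction. As an illustration only, the following Python sketch shows one way such a value could be normalized before a run is scripted; `parse_alpha` is a hypothetical helper and is not part of CRED, whose own parsing is implemented in C and may differ.

```python
from fractions import Fraction


def parse_alpha(text: str) -> float:
    """Convert a significance level such as "0.0001" or "1/10000" to a float.

    Hypothetical helper for wrapper scripts; not CRED's own parser.
    """
    # Fraction accepts both decimal strings and "numerator/denominator" strings.
    alpha = float(Fraction(text.strip()))
    # A significance level outside (0, 1) is almost certainly a typo.
    if not 0.0 < alpha < 1.0:
        raise ValueError(f"significance level must lie in (0, 1), got {text!r}")
    return alpha


if __name__ == "__main__":
    # Both spellings of the default alpha level should agree.
    print(parse_alpha("0.0001") == parse_alpha("1/10000"))
```

Validating the value up front avoids launching a long run with a malformed cutoff.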

A helper program, "Batch CRED" or BCRED, is also supplied here for running CRED on more hardware-limited systems (e.g., those without access to a lot of RAM). BCRED is written in Perl and requires samtools as well as Parallel::ForkManager (a Perl module available on CPAN). It splits the input BAM pair per chromosome, dispatches one CRED call per chromosome, and merges the results. Parallel::ForkManager also allows these calls to run in parallel.
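The split, dispatch, and merge flow described above can be sketched as a command plan. This is a rough Python illustration under stated assumptions, not BCRED's actual Perl implementation: the scratch file names, `merged.bed`, and the functions `plan_batch` and `render` are hypothetical, and the sketch only builds command lines without running them. The `samtools view -b -o` and `samtools index` calls are standard samtools usage.

```python
import shlex


def plan_batch(treatment, control, chromosomes, cred_options=()):
    """Return, per chromosome, the commands a split-run-merge pass might issue."""
    plan = []
    for chrom in chromosomes:
        # Hypothetical scratch file names; BCRED's real naming may differ.
        t_part = f"treatment.{chrom}.bam"
        c_part = f"control.{chrom}.bam"
        steps = [
            # Extract one chromosome from each coordinate-sorted, indexed BAM.
            ["samtools", "view", "-b", "-o", t_part, treatment, chrom],
            ["samtools", "view", "-b", "-o", c_part, control, chrom],
            # CRED needs an index next to each BAM.
            ["samtools", "index", t_part],
            ["samtools", "index", c_part],
            # CRED writes to STDOUT, so its output is redirected in render().
            ["cred", *cred_options, "-t", t_part, "-c", c_part],
        ]
        plan.append({"chrom": chrom, "steps": steps, "output": f"cred.{chrom}.bed"})
    return plan


def render(plan):
    """Turn a plan into shell command lines, ending with the merge step."""
    lines = []
    for job in plan:
        *prep, run = job["steps"]
        lines += [shlex.join(step) for step in prep]
        lines.append(f"{shlex.join(run)} > {shlex.quote(job['output'])}")
    if plan:
        parts = " ".join(shlex.quote(job["output"]) for job in plan)
        lines.append(f"cat {parts} > merged.bed")
    return lines


if __name__ == "__main__":
    for line in render(plan_batch("t.bam", "c.bam", ["chr1", "chr2"], ["-w", "1200"])):
        print(line)
```

Because each chromosome is an independent job, the per-chromosome command groups could be handed to a worker pool, which is the role Parallel::ForkManager plays for BCRED.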

To launch BCRED, look for bcred in the same folder as cred. The inputs and parameters are largely the same as cred:

    bcred: a batch assistant for CRED Ver. 0.1 (Apr 2019 Initial Release)
    Usage    : bcred [options] -t TREATMENT.BAM -c CONTROL.BAM > OUTPUT.BED
    Required : -t TREATMENT.BAM  Path to the treatment ("pulldown") track [BAM]
               -c CONTROL.BAM    Path to the control ("input") Chem-seq track [BAM]
    Optional : -p [P-VALUE]      Significance level [default 0.0001]
               -q [SCORE]        Minimum MAPQ quality for reads to count [default 30]
               -w [INTEGER]      Size of differential windows [default 1200 bp]
               -n [INTEGER]      Number of threads to utilize [default 1]
               -k                Evaluate site significance with Kolmogorov-Smirnov
    Reminders: 1. BAM files must be sorted.
               2. Use a pipe (">") to capture CRED output.

The initial release is archived at Zenodo:
DOI

Output

The current version of CRED writes to STDOUT so the results can be streamed in-line for subsequent tasks, e.g. checking for motif intersects with BEDTools and immediately compressing the results with gzip. To store the output in a file, use a shell redirect (">"). The output is presented in a BED-like format directly interpretable in genome browsers such as IGV. The columns are as follows:

  1. Chromosome ID
  2. Start of the site
  3. End position of the site
  4. Peak ID (numerically ordered)
  5. log ratio of relative enrichment in the pulldown vs. control track
  6. (Unused)
  7. Significance level of the relative enrichment
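For downstream scripting, the seven columns above can be read into a small record type. The following Python sketch assumes whitespace-delimited fields and numeric values in columns 5 and 7; both are assumptions, since the exact formatting is not specified here. `CredPeak` and `parse_cred_line` are hypothetical names, and the sample line in the usage block is made up for illustration.

```python
from typing import NamedTuple


class CredPeak(NamedTuple):
    chrom: str           # column 1: chromosome ID
    start: int           # column 2: start of the site
    end: int             # column 3: end position of the site
    peak_id: str         # column 4: peak ID
    log_ratio: float     # column 5: log ratio, pulldown vs. control
    significance: float  # column 7: significance of the enrichment


def parse_cred_line(line: str) -> CredPeak:
    """Parse one line of CRED's BED-like output (assumed whitespace-delimited)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    # Column 6 is unused and is skipped.
    chrom, start, end, peak_id, log_ratio, _unused, significance = fields[:7]
    return CredPeak(chrom, int(start), int(end), peak_id,
                    float(log_ratio), float(significance))


if __name__ == "__main__":
    # Made-up example line, not real CRED output.
    print(parse_cred_line("chr20\t1000\t2200\t1\t1.5\t.\t0.00001"))
```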

Example

Under samples/ there is a set of simulated treatment and control BAMs that can be used to test CRED:

  • simReads_hg19_treatment-chr20.bam
  • simReads_hg19_control-chr20.bam

Those files were created by using DWGSIM with a BED file containing a list of reference features ("sites") as true positives (N = 3M), followed by random regions as control (N = 50M). The treatment track was compiled by merging the two to ensure genomic enrichment (see simReads-generate.pl for an example script). Following processing and alignment by LAST, reads located in chr20 were extracted so that the file sizes will be under 100M per GitHub rules.

To try out CRED on these two tracks with default settings:

    cred -t samples/simReads_hg19_treatment-chr20.bam -c samples/simReads_hg19_control-chr20.bam > simReads_chr20-cred.bed

If BEDTools is installed, intersecting the resultant BED files with the true positives reveals that the CRED results include more sites containing the positive spikes (700+) than MACS (~500). These regions can also be confirmed in IGV (see results_sample_igv_snapshot.png for an example). On a 3.5GHz 6-core Mac Pro with 64GB of RAM running macOS 10.14.4, the run completed in ~20 seconds for CRED and about a minute for MACS.
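Where BEDTools is not available, the comparison against the true positives can be approximated in a few lines of Python. This is a simplified stand-in for an interval intersection, not the procedure used to obtain the figures above: `count_recovered` is a hypothetical helper, intervals are assumed to be half-open as in BED, and the pairwise scan is only suitable for small inputs.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_recovered(truth, peaks):
    """Count truth sites overlapped by at least one called peak.

    Both arguments are iterables of (chrom, start, end) tuples.
    """
    # Group peaks by chromosome so each site is compared only to same-chromosome peaks.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    recovered = 0
    for chrom, start, end in truth:
        candidates = by_chrom.get(chrom, [])
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in candidates):
            recovered += 1
    return recovered


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    truth = [("chr20", 100, 200), ("chr20", 500, 600)]
    peaks = [("chr20", 150, 400)]
    print(count_recovered(truth, peaks))
```

Each truth site is counted at most once, regardless of how many peaks overlap it.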

Citation

If you use CRED in your research, please cite the following publication:

J. Lin, T. Kuo, P. Horton, H. Nagase, "CRED: a rapid peak caller for Chem-seq data." Journal of Open Source Software 4(37): 1423, 2019. DOI: 10.21105/joss.01423

Owner

  • Login: jlincbio
  • Kind: user
  • Location: Chiba, Japan
  • Company: Chiba Cancer Center Research Institute

JOSS Publication

CRED: a rapid peak caller for Chem-seq data
Published
May 08, 2019
Volume 4, Issue 37, Page 1423
Authors
Jason Lin ORCID
Laboratory of Cancer Genetics, Chiba Cancer Center Research Institute, Chuo-ku, Chiba, Japan; Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan
Tony Kuo ORCID
Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan
Paul Horton ORCID
Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan; Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Hiroki Nagase ORCID
Laboratory of Cancer Genetics, Chiba Cancer Center Research Institute, Chuo-ku, Chiba, Japan
Editor
Lorena Pantano ORCID
Tags
polyamides chemical biology chem-seq

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 47
  • Total Committers: 3
  • Avg Commits per committer: 15.667
  • Development Distribution Score (DDS): 0.128
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
jlincbio 4****o 41
jlincbio 5
Kyle Niemeyer k****r@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • kyleniemeyer (1)