https://github.com/kundajelab/ataqc

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 6 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kundajelab
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 227 KB
Statistics
  • Stars: 17
  • Watchers: 25
  • Forks: 7
  • Open Issues: 4
  • Releases: 0
Created over 9 years ago · Last pushed over 7 years ago
Metadata Files
  • Readme
  • License

README.md

ATAqC

This pipeline is designed for collecting advanced quality control metrics for ATAC-seq datasets.

===================================================

Annotation sets:

  • hg19 - Gencode v19
  • hg38 - Gencode v24
  • mm9 - Gencode vM1 (though I believe the ENCODE portal no longer supports mm9)
  • mm10 - Gencode vM7

The TSS bed files are generated directly from the Gencode full GTF files, with the following command:

zcat $GTF | grep -P '\tgene\t' | grep 'protein_coding' | grep -v 'level 3' | awk -F '[\t|\"]' '{ print $1"\t"$4"\t"$5"\t"$10"\t0\t"$7 }' | awk -F '\t' 'BEGIN{ OFS="\t" } { if ($6=="+") { $3=$2-1; $2=$2-2 } else { $2=$3; $3=$3+1 } print }' | sort -k1,1 -k2,2n > $TSS

*Note that the TSS file is a point file, and is not the same as the promoter file (described below).
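
For readers who find the one-liner hard to follow, here is a minimal Python sketch of the same steps. It mirrors the filters, field split, and coordinate arithmetic of the command above; the function names `gtf_line_to_tss_bed` and `write_tss_bed` are hypothetical and are not part of ATAqC. Sorting here uses plain string order on the chromosome name, which may differ from `sort` under some locales.

```python
import gzip
import re


def gtf_line_to_tss_bed(line):
    """Return a 6-column BED tuple for one GTF line, or None if filtered out."""
    # Same filters as the grep chain: gene records, protein_coding, not "level 3".
    if "\tgene\t" not in line or "protein_coding" not in line or "level 3" in line:
        return None
    # Same field split as awk -F '[\t|\"]': tab, pipe, or double quote.
    fields = re.split(r'[\t|"]', line.rstrip("\n"))
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    name, strand = fields[9], fields[6]
    if strand == "+":
        # Mirrors: $3 = $2 - 1; $2 = $2 - 2
        bed_start, bed_end = start - 2, start - 1
    else:
        # Mirrors: $2 = $3; $3 = $3 + 1
        bed_start, bed_end = end, end + 1
    return (chrom, bed_start, bed_end, name, 0, strand)


def write_tss_bed(gtf_path, tss_path):
    """Read a gzipped GTF and write the sorted 1 bp TSS BED file."""
    with gzip.open(gtf_path, "rt") as gtf:
        records = [r for r in map(gtf_line_to_tss_bed, gtf) if r is not None]
    # Equivalent of sort -k1,1 -k2,2n: chromosome, then numeric start.
    records.sort(key=lambda r: (r[0], r[1]))
    with open(tss_path, "w") as out:
        for record in records:
            out.write("\t".join(map(str, record)) + "\n")
```

Calling `write_tss_bed(gtf_path, tss_path)` with the Gencode GTF and the desired output path plays the role of `$GTF` and `$TSS` in the command.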

===================================================

The promoter/enhancer annotations are a little trickier (and likely should be updated, given that they are based on the data that was in the ENCODE portal as of 03/27/2016):

These annotations should be viewed as preliminary and approximate, not as part of the ENCODE encyclopedia. They are good enough for QC, but for deeper analysis please consider carefully how these annotations were derived.

  • hg19 - we made use of the high stringency (-logpval > 10) Reg2Map promoter and enhancer sets: https://personal.broadinstitute.org/meuleman/reg2map/HoneyBadger2release/
  • hg38 - the promoter set is the union of ENCODE RAMPAGE peaks; the enhancer set is what remains of the union of ENCODE open chromatin (i.e., DNase) peak sets after removing blacklist and promoter regions. In other words, it is any site that was accessible in some ENCODE DNase experiment and was not labeled as a promoter by RAMPAGE data. There is no Reg2Map resource for hg38.

  • mm9 - after getting the union of all mouse mm9 DNase peaks available in the ENCODE portal, the promoter set is those peaks that overlap the TSS file, and the enhancer set is the rest.

  • mm10 - the promoters are the predicted promoters from the ENCODE portal (https://www.encodeproject.org/data/annotations/v3/). After getting the union of all mouse mm10 DNase peaks available in the ENCODE portal, the enhancer set is the remainder after subtracting the promoters and the blacklist (a blacklist exists for mm10).
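
The mm9 rule in the list above (peaks that overlap the TSS file are promoters, the rest are enhancers) can be sketched as follows. This is an illustration only: it assumes half-open `(start, end)` intervals on a single chromosome and 1 bp TSS intervals, and `split_peaks_by_tss` is a hypothetical name, not a function from ATAqC or from the tools the authors used.

```python
from bisect import bisect_left


def split_peaks_by_tss(peaks, tss_intervals):
    """Split peaks into (promoters, enhancers) by overlap with 1 bp TSS intervals."""
    tss_starts = sorted(start for start, _end in tss_intervals)
    promoters, enhancers = [], []
    for peak_start, peak_end in peaks:
        # A 1 bp TSS [s, s + 1) overlaps [peak_start, peak_end)
        # exactly when peak_start <= s < peak_end.
        i = bisect_left(tss_starts, peak_start)
        if i < len(tss_starts) and tss_starts[i] < peak_end:
            promoters.append((peak_start, peak_end))
        else:
            enhancers.append((peak_start, peak_end))
    return promoters, enhancers
```

In practice the split would be applied per chromosome to the merged DNase peak set.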

Whenever a blacklist is mentioned, it's the recorded file from the ENCODE portal.
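
The hg38 and mm10 enhancer sets described above amount to "union of DNase peaks, minus promoters, minus blacklist". A minimal sketch of that interval arithmetic is shown below, assuming half-open `(start, end)` intervals on one chromosome. The function names are hypothetical, and the sketch subtracts at the base level; the text does not say which tool was used or whether whole overlapping peaks were dropped instead.

```python
def merge_intervals(intervals):
    """Union of half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_intervals(keep, remove):
    """Parts of `keep` that are not covered by any interval in `remove`."""
    remove = merge_intervals(remove)
    result = []
    for start, end in merge_intervals(keep):
        cursor = start
        for r_start, r_end in remove:
            if r_end <= cursor:
                continue  # removal region lies entirely to the left
            if r_start >= end:
                break  # removal region lies entirely to the right
            if r_start > cursor:
                result.append((cursor, r_start))
            cursor = max(cursor, r_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


# Toy coordinates, invented for illustration only.
dnase_a = [(100, 200), (500, 600)]
dnase_b = [(150, 300), (900, 1000)]
promoters = [(120, 180)]
blacklist = [(950, 1200)]

open_chromatin = merge_intervals(dnase_a + dnase_b)
enhancers = subtract_intervals(open_chromatin, promoters + blacklist)
print(enhancers)  # [(100, 120), (180, 300), (500, 600), (900, 950)]
```

For the real files, the same operations would be applied chromosome by chromosome.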

===================================================

Known issues

The pipeline is not currently compatible with samtools/1.3; we are working on this incompatibility.

Contributors

  • Daniel Kim - MD/PhD Student, Biomedical Informatics Program, Stanford University
  • Chuan Sheng Foo - PhD Student, Computer Science Dept., Stanford University
  • Anshul Kundaje - Assistant Professor, Dept. of Genetics, Stanford University

Owner

  • Name: Kundaje Lab
  • Login: kundajelab
  • Kind: organization
  • Location: Stanford University

Compbio and machine learning code repositories from the Kundaje Lab at Stanford Genetics and Computer Science Depts.

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 62
  • Total Committers: 6
  • Avg Commits per committer: 10.333
  • Development Distribution Score (DDS): 0.226
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| vervacity | d****9@g****m | 48 |
| Chuan-Sheng Foo | c****o@c****u | 8 |
| Daniel Sunwook Kim | d****9@s****u | 2 |
| Jin Lee | l****2@g****m | 2 |
| Chris Probert | c****s@D****u | 1 |
| Chris Probert | c****s@D****u | 1 |

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 3
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • HeyLifeHD (1)
  • pveber (1)
  • shangguandong1996 (1)
Pull Request Authors
  • biomystery (1)
  • vervacity (1)

Packages

  • Total packages: 1
  • Total downloads: 26 last month (PyPI)
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 1
  • Total maintainers: 1
pypi.org: ataqc

ATAqC - quality control for ATAC-seq

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 26 Last month
Rankings
Dependent packages count: 10.0%
Forks count: 12.5%
Stargazers count: 14.2%
Average: 18.6%
Dependent repos count: 21.7%
Downloads: 34.4%
Last synced: 7 months ago