https://github.com/kundajelab/ataqc
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 4 of 6 committers (66.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: kundajelab
- License: MIT
- Language: Python
- Default Branch: master
- Size: 227 KB
Statistics
- Stars: 17
- Watchers: 25
- Forks: 7
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
ATAqC
This pipeline collects advanced quality-control metrics for ATAC-seq datasets.
===================================================
Annotation sets:
- hg19 - GENCODE v19
- hg38 - GENCODE v24
- mm9 - vM1 (the ENCODE portal may no longer support mm9)
- mm10 - vM7
The TSS BED files are generated directly from the full GENCODE GTF files with the following command:
```bash
# Protein-coding genes (excluding GENCODE level 3) -> single-base TSS points in BED6, sorted
zcat "$GTF" |
  grep -P '\tgene\t' |
  grep 'protein_coding' |
  grep -v 'level 3' |
  awk -F '[\t|\"]' '{ print $1"\t"$4"\t"$5"\t"$10"\t0\t"$7 }' |
  awk -F '\t' 'BEGIN{ OFS="\t" } { if ($6=="+") { $3=$2-1; $2=$2-2 } else { $2=$3; $3=$3+1 } print }' |
  sort -k1,1 -k2,2n > "$TSS"
```
*Note that the TSS file is a point file (one single-base interval per gene) and is not the same as the promoter file described below.
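To make the coordinate arithmetic of that pipeline explicit, here is a minimal Python sketch that mirrors it stage by stage. It is an illustration, not part of ataqc: the names `tss_points` and `tss_from_gtf_gz` are hypothetical, and the final sort only approximates `sort -k1,1 -k2,2n` under `LC_ALL=C`. It reproduces the awk shifts exactly: on the + strand the emitted interval is [start-2, start-1) and on the - strand it is [end, end+1). Under the usual GTF-to-BED conversion (1-based inclusive to 0-based half-open), the annotated TSS base would be [start-1, start) or [end-1, end), so the command places each point one base upstream of the annotated TSS; for windowed QC metrics that offset is probably negligible.

```python
import gzip
import re
from typing import Iterable, List, Tuple

# BED6-style record: (chrom, start, end, gene_id, score, strand)
BedRecord = Tuple[str, int, int, str, int, str]

# awk -F '[\t|\"]': fields are separated by a tab, a pipe, or a double quote
FIELD_SEP = re.compile(r'[\t|"]')


def tss_points(gtf_lines: Iterable[str]) -> List[BedRecord]:
    """Mirror the grep | awk | sort stages on decompressed GTF lines."""
    records: List[BedRecord] = []
    for line in gtf_lines:
        # grep -P '\tgene\t' | grep 'protein_coding' | grep -v 'level 3'
        if "\tgene\t" not in line or "protein_coding" not in line or "level 3" in line:
            continue
        fields = FIELD_SEP.split(line.rstrip("\n"))
        # awk $1, $4, $5, $7, $10 map to Python indices 0, 3, 4, 6, 9
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        strand, gene_id = fields[6], fields[9]
        if strand == "+":
            # second awk: $3=$2-1; $2=$2-2, anchored on the GTF start
            bed_start, bed_end = start - 2, start - 1
        else:
            # second awk: $2=$3; $3=$3+1, anchored on the GTF end
            bed_start, bed_end = end, end + 1
        records.append((chrom, bed_start, bed_end, gene_id, 0, strand))
    # sort -k1,1 -k2,2n (approximated: chromosome as text, start as number)
    records.sort(key=lambda r: (r[0], r[1]))
    return records


def tss_from_gtf_gz(path: str) -> List[BedRecord]:
    """The zcat stage: stream a gzipped GTF file as text."""
    with gzip.open(path, "rt") as handle:
        return tss_points(handle)


if __name__ == "__main__":
    # Hypothetical, abbreviated GENCODE-style gene lines (not real annotations)
    example = [
        'chr1\tHAVANA\tgene\t1000\t2000\t.\t+\t.\tgene_id "GENE_A"; gene_type "protein_coding"; level 2;\n',
        'chr1\tHAVANA\tgene\t5000\t6000\t.\t-\t.\tgene_id "GENE_B"; gene_type "protein_coding"; level 2;\n',
        'chr1\tHAVANA\tgene\t7000\t8000\t.\t+\t.\tgene_id "GENE_C"; gene_type "lincRNA"; level 2;\n',
        'chr1\tENSEMBL\tgene\t9000\t9500\t.\t+\t.\tgene_id "GENE_D"; gene_type "protein_coding"; level 3;\n',
    ]
    for rec in tss_points(example):
        print(*rec, sep="\t")
    # chr1  998   999   GENE_A  0  +
    # chr1  6000  6001  GENE_B  0  -
```

GENE_C is dropped by the `protein_coding` filter and GENE_D by the `level 3` filter, exactly as in the shell version.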
===================================================
The promoter/enhancer annotations are a little trickier (and should likely be updated, since they are based on the data that was in the ENCODE portal as of 03/27/2016).
These annotations should be viewed as preliminary and approximate, not as part of the ENCODE encyclopedia. They are good for QC, but for deeper analyses please consider carefully how they were derived:
- hg19 - we used the high-stringency (-log p-value > 10) Reg2Map promoter and enhancer sets: https://personal.broadinstitute.org/meuleman/reg2map/HoneyBadger2release/
- hg38 - the promoter set is the union of ENCODE RAMPAGE peaks; the enhancer set is the union of ENCODE open chromatin (i.e., DNase) peak sets after removing blacklist and promoter regions. In other words, it contains any other site that was accessible in some ENCODE DNase experiment and was not labeled as a promoter by RAMPAGE data. There is no Reg2Map resource for hg38.
- mm9 - after taking the union of all mouse mm9 DNase peaks available in the ENCODE portal, the promoter set is the peaks that overlap the TSS file, and the enhancer set is the rest.
- mm10 - the promoters are the predicted promoters from the ENCODE portal (https://www.encodeproject.org/data/annotations/v3/). After taking the union of all mouse mm10 DNase peaks available in the ENCODE portal, the enhancer set is the remainder after subtracting the promoters and the blacklist (a blacklist is available for mm10).
Wherever a blacklist is mentioned, it is the blacklist file recorded in the ENCODE portal.
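The set operations in the list above (union of peak sets, overlap with the TSS file, removal of promoter and blacklist regions) can be sketched in plain Python. This is an illustration only, not the code used to build the shipped annotations. The README does not say whether "removing" or "subtracting" trims the overlapping bases or drops whole peaks, so this sketch assumes whole-peak removal (similar in spirit to `bedtools intersect -v`), and it assumes that touching (book-ended) intervals are merged in the union. All function names and toy intervals are hypothetical.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def merge_union(*peak_sets: Iterable[Interval]) -> List[Interval]:
    """Pool several peak sets and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(iv for peaks in peak_sets for iv in peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and overlapping/touching: extend the previous interval
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(iv: Interval, others: List[Interval]) -> bool:
    """True if iv shares at least one base with any interval in others (pairwise scan)."""
    chrom, start, end = iv
    return any(c == chrom and s < end and start < e for c, s, e in others)


def remove_overlapping(peaks: List[Interval], *masks: List[Interval]) -> List[Interval]:
    """hg38/mm10-style enhancers: drop whole peaks that touch any mask (promoters, blacklist)."""
    mask = [iv for m in masks for iv in m]
    return [p for p in peaks if not overlaps_any(p, mask)]


def split_by_tss(peaks: List[Interval], tss: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """mm9-style split: peaks overlapping a TSS point are promoters, the rest enhancers."""
    promoters = [p for p in peaks if overlaps_any(p, tss)]
    enhancers = [p for p in peaks if not overlaps_any(p, tss)]
    return promoters, enhancers


if __name__ == "__main__":
    # Hypothetical toy peak sets, not real ENCODE data
    union = merge_union([("chr1", 100, 200), ("chr1", 500, 600)],
                        [("chr1", 150, 300), ("chr1", 900, 1000)])
    print(union)  # [('chr1', 100, 300), ('chr1', 500, 600), ('chr1', 900, 1000)]
    print(split_by_tss(union, [("chr1", 549, 550)]))
    print(remove_overlapping(union, [("chr1", 480, 620)], [("chr1", 950, 960)]))  # [('chr1', 100, 300)]
```

The pairwise overlap test is quadratic, which is fine for toy inputs; genome-scale peak sets would call for a sorted sweep or a dedicated interval tool.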
===================================================
Known issues
The pipeline is not currently compatible with samtools 1.3; we are working on a fix.
Contributors
- Daniel Kim - MD/PhD Student, Biomedical Informatics Program, Stanford University
- Chuan Sheng Foo - PhD Student, Computer Science Dept., Stanford University
- Anshul Kundaje - Assistant Professor, Dept. of Genetics, Stanford University
Owner
- Name: Kundaje Lab
- Login: kundajelab
- Kind: organization
- Location: Stanford University
- Website: http://anshul.kundaje.net
- Repositories: 117
- Profile: https://github.com/kundajelab
Compbio and machine learning code repositories from the Kundaje Lab at Stanford Genetics and Computer Science Depts.
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| vervacity | d****9@g****m | 48 |
| Chuan-Sheng Foo | c****o@c****u | 8 |
| Daniel Sunwook Kim | d****9@s****u | 2 |
| Jin Lee | l****2@g****m | 2 |
| Chris Probert | c****s@D****u | 1 |
| Chris Probert | c****s@D****u | 1 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 3
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 0.33
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HeyLifeHD (1)
- pveber (1)
- shangguandong1996 (1)
Pull Request Authors
- biomystery (1)
- vervacity (1)
Packages
- Total packages: 1
- Total downloads: 26 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 1
- Total maintainers: 1
pypi.org: ataqc
ATAqC - quality control for ATAC-seq
- Homepage: https://github.com/kundajelab/ataqc
- Documentation: https://ataqc.readthedocs.io/
- License: BSD-3
- Latest release: 0.2 (published over 8 years ago)