tophat-recondition
Post-processor for TopHat unmapped.bam files making them usable by downstream software.
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.9%) to scientific vocabulary
Statistics
- Stars: 7
- Watchers: 5
- Forks: 5
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
TopHat-Recondition
tophat-recondition is a post-processor for TopHat unmapped reads (contained in unmapped.bam), making them compatible with downstream tools (e.g., the Picard suite, samtools, GATK) (TopHat issue #17). It also works around bugs in TopHat:
- the "mate is unmapped" SAM flag is not set on any reads in the unmapped.bam file (TopHat issue #3)
- the mapped mate of an unmapped read can be absent from accepted_hits.bam, creating a mismatch between the file and the unmapped read's flags (TopHat issue #16)
This software was developed as part of a PhD research project in the laboratory of Lao H. Saal, Translational Oncogenomics Unit, Department of Oncology and Pathology, Lund University, Sweden.
A detailed description of the software can be found in Brueffer and Saal (2016).
Requirements
- Python 2.7 or Python 3
- pysam
TopHat-Recondition can be installed with the conda package manager from the bioconda channel:
conda install -c bioconda tophat-recondition
Usage
```
usage: tophat-recondition.py [-h] [-l LOGFILE] [-m MAPPED_FILE] [-q]
                             [-r RESULT_DIR] [-u UNMAPPED_FILE] [-v]
                             tophat_result_dir

Post-process TopHat unmapped reads. For detailed information on the issues
this software corrects, please consult the software homepage:
https://github.com/cbrueffer/tophat-recondition

positional arguments:
  tophat_result_dir     directory containing TopHat mapped and unmapped read
                        files.

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file (optional, (default: result_dir/tophat-
                        recondition.log)
  -m MAPPED_FILE, --mapped-file MAPPED_FILE
                        Name of the file containing mapped reads (default:
                        accepted_hits.bam)
  -q, --quiet           quiet mode, no console output
  -r RESULT_DIR, --result_dir RESULT_DIR
                        directory to write unmapped_fixup.bam to (default:
                        tophat_output_dir)
  -u UNMAPPED_FILE, --unmapped-file UNMAPPED_FILE
                        Name of the file containing unmapped reads (default:
                        unmapped.bam)
  -v, --version         show program's version number and exit
```
Please make sure tophat_output_dir contains both the mapped file (default: accepted_hits.bam) and the unmapped file (default: unmapped.bam). The fixed reads are written to result_dir, in a file named after the unmapped file's stem with the suffix _fixup (e.g., unmapped_fixup.bam).
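The output naming rule can be illustrated with a short sketch. This is not the tool's own code; the helper name `fixup_path` is hypothetical and only mirrors the rule stated above (stem of the unmapped file plus `_fixup`, placed in the result directory).

```python
import os


def fixup_path(unmapped_file, result_dir):
    """Build the output path: <stem>_fixup<ext> inside result_dir.

    Hypothetical helper for illustration; it only reproduces the
    naming rule described in the text.
    """
    stem, ext = os.path.splitext(os.path.basename(unmapped_file))
    return os.path.join(result_dir, stem + "_fixup" + ext)


# The default unmapped file name maps to unmapped_fixup.bam.
print(fixup_path("tophat_out/unmapped.bam", "tophat_out"))
```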
Note: The unmapped file is read into memory, so make sure your computer has enough RAM to fit it.
Details
Specifically, the script does the following (see SAM format specification for details on the fields in capital letters):
Fixes wrong flags resulting from a bug in TopHat:
- For paired reads where both reads are unmapped, TopHat does not set the 0x8 flag ("mate is unmapped") on either read.
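As a rough illustration of this flag fix, the sketch below works on plain integer FLAG values rather than on BAM records. The function name and constants are assumptions made for the example; only the flag bits themselves come from the SAM specification.

```python
FLAG_UNMAPPED = 0x4        # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8   # the mate is unmapped


def fix_mate_unmapped(flag_read1, flag_read2):
    """Set 0x8 on both reads of a pair when both are unmapped.

    Hypothetical helper; returns the (possibly updated) flags as a tuple.
    """
    both_unmapped = (flag_read1 & FLAG_UNMAPPED) and (flag_read2 & FLAG_UNMAPPED)
    if both_unmapped:
        flag_read1 |= FLAG_MATE_UNMAPPED
        flag_read2 |= FLAG_MATE_UNMAPPED
    return flag_read1, flag_read2


# Both reads unmapped: paired + unmapped + first/second in pair.
first, second = fix_mate_unmapped(0x1 | 0x4 | 0x40, 0x1 | 0x4 | 0x80)
print(hex(first), hex(second))
```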
Removes /1 and /2 suffixes from read names (QNAME), if present.
Sets mapping quality (MAPQ) for unmapped reads to 0. TopHat sets it to 255, which some downstream tools do not handle well (even though it is a valid value according to the SAM specification).
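The read-name and MAPQ adjustments are simple enough to sketch on plain values. The helper names below are hypothetical, and the regular expression is one plausible way to strip a trailing /1 or /2; the tool itself may do this differently.

```python
import re

# Matches a trailing "/1" or "/2" at the very end of a read name.
_PAIR_SUFFIX = re.compile(r"/[12]$")


def strip_pair_suffix(qname):
    """Remove a trailing /1 or /2 from a read name, if present."""
    return _PAIR_SUFFIX.sub("", qname)


def normalize_mapq(mapq, is_unmapped):
    """Return 0 for unmapped reads; leave mapped reads untouched."""
    return 0 if is_unmapped else mapq


print(strip_pair_suffix("read42/1"))
print(normalize_mapq(255, is_unmapped=True))
```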
If an unmapped read's paired read is mapped, set the following fields in the unmapped read (downstream tools like Picard AddOrReplaceReadGroups get confused by the values TopHat fills in for those fields):
- RNAME: RNAME of the paired read
- RNEXT: RNAME of the paired read
- POS: POS of the paired read
- PNEXT: 0
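The field assignments listed above can be sketched with reads represented as plain dictionaries keyed by SAM field name. This representation and the function name are assumptions for illustration; the actual script operates on BAM records through pysam.

```python
def adopt_mate_position(unmapped_read, mapped_mate):
    """Return a copy of the unmapped read with mate-derived fields.

    Hypothetical helper; reads are dicts keyed by SAM field name.
    """
    fixed = dict(unmapped_read)
    fixed["RNAME"] = mapped_mate["RNAME"]  # reference of the mapped mate
    fixed["RNEXT"] = mapped_mate["RNAME"]  # mate's reference
    fixed["POS"] = mapped_mate["POS"]      # placed at the mate's position
    fixed["PNEXT"] = 0                     # as listed in the text
    return fixed


mate = {"QNAME": "read42", "RNAME": "chr1", "POS": 1000}
unmapped = {"QNAME": "read42", "RNAME": "*", "RNEXT": "*", "POS": 0, "PNEXT": 0}
print(adopt_mate_position(unmapped, mate))
```

Returning a copy rather than mutating the input keeps the original record available for comparison or logging.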
For unmapped reads with missing mapped mates, unset the mate-related flags to effectively make them unpaired. The following flags are unset:
- 0x1 (read is paired)
- 0x2 (read mapped in proper pair)
- 0x8 (mate is unmapped)
- 0x20 (mate is on the reverse strand)
- 0x40 (first in pair)
- 0x80 (second in pair)
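Clearing these bits amounts to a single mask operation on the FLAG value. The sketch below assumes integer flags and a hypothetical helper name; it is an illustration of the bit arithmetic, not the script's own code.

```python
# All pairing-related bits listed above, combined into one mask.
MATE_FLAGS = 0x1 | 0x2 | 0x8 | 0x20 | 0x40 | 0x80


def make_unpaired(flag):
    """Clear all pairing-related bits, leaving other bits (e.g. 0x4) intact."""
    return flag & ~MATE_FLAGS


# Paired, unmapped, mate unmapped, first in pair: only 0x4 remains.
print(hex(make_unpaired(0x1 | 0x4 | 0x8 | 0x40)))
```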
Examples of the error messages that downstream tools emit when processing unmapped reads without some or all of these modifications can be found in this thread in the SEQanswers forum, which led to the development of this software.
Citation
If you use this software in your research and would like to cite it, please use the citation information in the CITATION file.
Owner
- Name: Christian Brueffer
- Login: cbrueffer
- Kind: user
- Location: Lund, Sweden
- Company: InSilico Consulting AB
- Website: https://www.brueffer.io
- Twitter: cbrueffer
- Repositories: 21
- Profile: https://github.com/cbrueffer
Freelance Bioinformatics and Data Science Consultant
Citation (CITATION)
To cite TopHat-Recondition in publications, please use:
Brueffer, C. and Saal, L. H. (2016).
TopHat-Recondition: A post-processor for TopHat unmapped reads.
BMC Bioinformatics, 2016. 17(1):199. doi:10.1186/s12859-016-1058-x
A BibTeX entry for LaTeX users is:
@article{BruefferSaal2016,
title = {{TopHat-Recondition: A post-processor for TopHat unmapped reads}},
author = {Brueffer, Christian and Saal, Lao H},
journal = {BMC Bioinformatics},
month = {5},
year = {2016},
volume = {17},
pages = {199},
number = {1},
doi = {10.1186/s12859-016-1058-x},
url = {http://dx.doi.org/10.1186/s12859-016-1058-x},
}
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Christian Brueffer | c****n@b****e | 98 |
| roryk | r****r@g****m | 1 |
Issues and Pull Requests
All Time
- Total issues: 3
- Total pull requests: 1
- Average time to close issues: 8 days
- Average time to close pull requests: 1 day
- Total issue authors: 3
- Total pull request authors: 1
- Average comments per issue: 6.0
- Average comments per pull request: 8.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- drchriscole (1)
- roryk (1)
- EdwardBetts (1)
Pull Request Authors
- roryk (1)