tophat-recondition

Post-processor for TopHat unmapped.bam files making them usable by downstream software.

https://github.com/cbrueffer/tophat-recondition

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

bioinformatics ngs python sam tophat tophat-recondition
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cbrueffer
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 56.6 KB
Statistics
  • Stars: 7
  • Watchers: 5
  • Forks: 5
  • Open Issues: 1
  • Releases: 0
Topics
bioinformatics ngs python sam tophat tophat-recondition
Created over 11 years ago · Last pushed over 6 years ago
Metadata Files
Readme Changelog License Citation

README.md

TopHat-Recondition

tophat-recondition is a post-processor for TopHat unmapped reads (contained in unmapped.bam). It makes them compatible with downstream tools such as the Picard suite, samtools, and GATK (TopHat issue #17). It also works around the following bugs in TopHat:

  • the "mate is unmapped" SAM flag is not set on any reads in the unmapped.bam file (TopHat issue #3)
  • the mapped mate of an unmapped read can be absent from accepted_hits.bam, creating a mismatch between the file and the unmapped read's flags (TopHat issue #16)
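
To make the first bug concrete, here is a small Python sketch. It decodes a SAM FLAG value into samtools-style bit names and checks a pair for the missing 0x8 bit. The helper names decode_flags and lacks_mate_unmapped_bit are hypothetical and are not part of tophat-recondition. The bit values follow the SAM specification.

```python
from typing import List

# samtools-style names for the SAM FLAG bits, as defined in the SAM specification
FLAG_NAMES = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flags(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


def lacks_mate_unmapped_bit(flag: int, mate_flag: int) -> bool:
    """Return True if both reads of a pair are unmapped but `flag` lacks 0x8."""
    paired = bool(flag & 0x1)
    both_unmapped = bool(flag & 0x4) and bool(mate_flag & 0x4)
    return paired and both_unmapped and not flag & 0x8


print(decode_flags(69))  # ['PAIRED', 'UNMAP', 'READ1']
print(lacks_mate_unmapped_bit(69, 133))  # True
```

Consider a first-in-pair unmapped read with FLAG 69 (PAIRED, UNMAP, READ1) whose mate is also unmapped. It should carry FLAG 77, which is 69 plus the 0x8 bit.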

This software was developed as part of a PhD research project in the laboratory of Lao H. Saal, Translational Oncogenomics Unit, Department of Oncology and Pathology, Lund University, Sweden.

A detailed description of the software can be found in Brueffer and Saal (2016).

Requirements

  • Python 2.7 or Python 3
  • pysam

TopHat-Recondition is available for installation with the conda package manager via the bioconda channel:

    conda install -c bioconda tophat-recondition

Usage

```
usage: tophat-recondition.py [-h] [-l LOGFILE] [-m MAPPED_FILE] [-q]
                             [-r RESULT_DIR] [-u UNMAPPED_FILE] [-v]
                             tophat_result_dir

Post-process TopHat unmapped reads. For detailed information on the issues
this software corrects, please consult the software homepage:
https://github.com/cbrueffer/tophat-recondition

positional arguments:
  tophat_result_dir     directory containing TopHat mapped and unmapped read
                        files.

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file (optional, (default: result_dir/tophat-recondition.log)
  -m MAPPED_FILE, --mapped-file MAPPED_FILE
                        Name of the file containing mapped reads (default:
                        accepted_hits.bam)
  -q, --quiet           quiet mode, no console output
  -r RESULT_DIR, --result_dir RESULT_DIR
                        directory to write unmapped_fixup.bam to (default:
                        tophat_output_dir)
  -u UNMAPPED_FILE, --unmapped-file UNMAPPED_FILE
                        Name of the file containing unmapped reads (default:
                        unmapped.bam)
  -v, --version         show program's version number and exit
```

Please make sure tophat_output_dir contains both the mapped file (default: accepted_hits.bam) and the unmapped file (default: unmapped.bam). The fixed reads are written to result_dir. The output file is named after the unmapped file's stem with the suffix _fixup, e.g. unmapped_fixup.bam.
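
The output naming rule can be expressed as a short Python sketch. The function name fixup_path is hypothetical and is not taken from the tool's source. It also assumes the input file's extension is kept, as in the unmapped_fixup.bam example.

```python
import os


def fixup_path(unmapped_file: str, result_dir: str) -> str:
    """Build <result_dir>/<stem>_fixup<ext> from the unmapped file's name."""
    # e.g. "unmapped.bam" -> stem "unmapped", ext ".bam"
    stem, ext = os.path.splitext(os.path.basename(unmapped_file))
    return os.path.join(result_dir, stem + "_fixup" + ext)


# With the defaults, result_dir is the TopHat output directory itself.
print(fixup_path("tophat_out/unmapped.bam", "tophat_out"))  # tophat_out/unmapped_fixup.bam on POSIX
```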

Note: The unmapped file is read into memory, so make sure your computer has enough RAM to fit it.
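
One plausible reason for this memory requirement is that mates must be matched by read name. The unmapped reads are indexed in memory, and the usually much larger mapped file is then scanned once. The Python sketch below shows such a two-pass approach under stated assumptions; it is not the tool's actual implementation. The function classify_unmapped_pairs is hypothetical. It takes plain (QNAME, FLAG) tuples so that it runs without pysam. With pysam, such tuples could be built from pysam.AlignmentFile records as (read.query_name, read.flag).

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

PAIRED = 0x1


def strip_mate_suffix(qname: str) -> str:
    """Drop a trailing /1 or /2 so both mates share one name."""
    return qname[:-2] if qname.endswith(("/1", "/2")) else qname


def classify_unmapped_pairs(
    unmapped: Iterable[Tuple[str, int]],
    mapped: Iterable[Tuple[str, int]],
) -> Dict[str, Set[str]]:
    """Group paired unmapped reads by where their mate is (hypothetical helper).

    Both arguments yield (QNAME, FLAG) tuples.
    """
    # Pass 1: keep every paired unmapped read name in memory.
    counts = Counter(
        strip_mate_suffix(name) for name, flag in unmapped if flag & PAIRED
    )
    both_unmapped = {name for name, n in counts.items() if n >= 2}
    expect_mapped_mate = {name for name, n in counts.items() if n == 1}

    # Pass 2: stream the mapped file once, looking up each name in the set.
    mate_mapped: Set[str] = set()
    for name, _flag in mapped:
        name = strip_mate_suffix(name)
        if name in expect_mapped_mate:
            mate_mapped.add(name)

    return {
        "both_unmapped": both_unmapped,
        "mate_mapped": mate_mapped,
        "mate_missing": expect_mapped_mate - mate_mapped,  # TopHat issue #16 case
    }


result = classify_unmapped_pairs(
    unmapped=[("a/1", 69), ("a/2", 133), ("b/1", 69), ("c/2", 133)],
    mapped=[("b/2", 129)],
)
print(sorted(result["mate_missing"]))  # ['c']
```

Only the unmapped read names are held in memory here; the mapped reads are consumed one at a time.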

Details

Specifically, the script does the following (see the SAM format specification for details on the fields in capital letters):

  • Fixes wrong flags resulting from a bug in TopHat:

    • For paired reads where both reads are unmapped, TopHat does not set the 0x8 flag ("mate is unmapped") on either read.
  • Removes /1 and /2 suffixes from read names (QNAME), if present.

  • Sets the mapping quality (MAPQ) of unmapped reads to 0. TopHat sets it to 255, which some downstream tools do not accept, even though it is a valid value according to the SAM specification (255 means that the mapping quality is not available).

  • If an unmapped read's mate is mapped, sets the following fields in the unmapped read (downstream tools such as Picard AddOrReplaceReadGroups get confused by the values TopHat fills in for these fields):

    • RNAME: RNAME of the paired read
    • RNEXT: RNAME of the paired read
    • POS: POS of the paired read
    • PNEXT: 0
  • For unmapped reads whose mapped mate is missing, unsets the mate-related flags, effectively making the reads unpaired. The following flags are unset:

    • 0x1 (read is paired)
    • 0x2 (read mapped in proper pair)
    • 0x8 (mate is unmapped)
    • 0x20 (mate is reversed)
    • 0x40 (first in pair)
    • 0x80 (second in pair)
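
The per-read corrections listed above can be sketched in Python as follows. This is an illustration under assumptions and not the tool's code. Read is a hypothetical stand-in for a SAM record that uses text-format fields; pysam's AlignedSegment exposes comparable attributes such as query_name, flag and mapping_quality. The caller is also assumed to have looked up each read's mate already, passing None when the mate is missing from the mapped file.

```python
from dataclasses import dataclass, replace
from typing import Optional

# SAM FLAG bits used below
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
MATE_REVERSE = 0x20
READ1 = 0x40
READ2 = 0x80

# Mate-related bits cleared when an unmapped read's mapped mate is missing (0xEB)
MATE_BITS = PAIRED | PROPER_PAIR | MATE_UNMAPPED | MATE_REVERSE | READ1 | READ2


@dataclass
class Read:
    """Hypothetical minimal SAM record (text-format fields, 1-based POS)."""
    qname: str
    flag: int
    rname: str = "*"
    pos: int = 0
    mapq: int = 255
    rnext: str = "*"
    pnext: int = 0


def strip_mate_suffix(qname: str) -> str:
    """Remove a trailing /1 or /2 from a read name, if present."""
    return qname[:-2] if qname.endswith(("/1", "/2")) else qname


def fix_unmapped_read(read: Read, mate: Optional[Read]) -> Read:
    """Return a corrected copy of one read from the unmapped file."""
    fixed = replace(read, qname=strip_mate_suffix(read.qname), mapq=0)
    if not fixed.flag & PAIRED:
        return fixed
    if mate is None:
        # Mapped mate is missing: drop all mate-related flags (unpaired read).
        fixed.flag &= ~MATE_BITS
    elif mate.flag & UNMAPPED:
        # Both mates unmapped: set the "mate is unmapped" bit.
        fixed.flag |= MATE_UNMAPPED
    else:
        # Mate is mapped: take over its placement as listed above.
        fixed.rname = mate.rname
        fixed.rnext = mate.rname
        fixed.pos = mate.pos
        fixed.pnext = 0
    return fixed


r1 = Read("frag1/1", PAIRED | UNMAPPED | READ1)  # FLAG 69
print(fix_unmapped_read(r1, None).flag)  # 4: unmapped, no longer paired
```

The function returns a modified copy instead of editing the record in place. This keeps the original record available, for example when it is still needed to look up its mate.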

Examples of error messages that downstream tools emit when processing unmapped reads without some or all of these modifications can be found in this SEQanswers forum thread, which led to the development of this software.

Citation

If you use this software in your research and would like to cite it, please use the citation information in the CITATION file.

Owner

  • Name: Christian Brueffer
  • Login: cbrueffer
  • Kind: user
  • Location: Lund, Sweden
  • Company: InSilico Consulting AB

Freelance Bioinformatics and Data Science Consultant

Citation (CITATION)

To cite TopHat-Recondition in publications, please use:

  Brueffer, C. and Saal, L. H. (2016).
  TopHat-Recondition: A post-processor for TopHat unmapped reads.
  BMC Bioinformatics 17(1):199. doi:10.1186/s12859-016-1058-x


A BibTeX entry for LaTeX users is:

@article{BruefferSaal2016,
  title = {{TopHat-Recondition: A post-processor for TopHat unmapped reads}},
  author = {Brueffer, Christian and Saal, Lao H},
  journal = {BMC Bioinformatics},
  month = {5},
  year = {2016},
  volume = {17},
  pages = {199},
  number = {1},
  doi = {10.1186/s12859-016-1058-x},
  url = {http://dx.doi.org/10.1186/s12859-016-1058-x},
}

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 99
  • Total Committers: 2
  • Avg Commits per committer: 49.5
  • Development Distribution Score (DDS): 0.01
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Christian Brueffer (c****n@b****e): 98 commits
  • roryk (r****r@g****m): 1 commit

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 3
  • Total pull requests: 1
  • Average time to close issues: 8 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 6.0
  • Average comments per pull request: 8.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • drchriscole (1)
  • roryk (1)
  • EdwardBetts (1)
Pull Request Authors
  • roryk (1)