https://github.com/brentp/hts-nim-tools

useful command-line tools written to showcase hts-nim

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary

Keywords

bam bioinformatics genomics nim nim-lang vcf vcf-filtering
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 50
  • Watchers: 4
  • Forks: 6
  • Open Issues: 3
  • Releases: 0
Topics
bam bioinformatics genomics nim nim-lang vcf vcf-filtering
Created about 8 years ago · Last pushed over 5 years ago
Metadata Files
Readme License

README.md

hts-nim-tools

This repository contains a number of tools created with hts-nim intended to serve as examples for using hts-nim as well as to be useful tools.

These tools are:

```
hts-nim utility programs.
version: $version

  • bam-filter    : filter BAM/CRAM/SAM files with a simple expression language
  • count-reads   : count BAM/CRAM reads in regions given in a BED file
  • vcf-check     : check regions of a VCF against a background for missing chunks
```

Each of these is described in more detail below.

bam-filter

Use simple expressions to filter a BAM/CRAM file:

```
bam-filter

Usage: bam-filter [options]

  -t --threads    number of BAM decompression threads [default: 0]
  -f --fasta      fasta file for use with CRAM files [default: $env_fasta].
```

Valid expressions may access the BAM attributes:

  • mapq / start / pos / end / flag / insert_size (where pos is the 1-based start)
  • is_aligned is_read1 is_read2 is_supplementary is_secondary is_dup is_qcfail
  • is_reverse is_mate_reverse is_pair is_proper_pair is_mate_unmapped is_unmapped

To use aux tags, prefix the tag name with 'tag_', e.g. tag_NM < 2. Any tag present in the BAM can be used in this manner.

Example: bam-filter "tag_NM == 2 && tag_RG == 'SRR741410' && is_proper_pair" tests/HG02002.bam

count-reads

count-reads reports the number of reads overlapping each interval in a BED file.

```
count-reads

Usage: count-reads [options]

Arguments:

  the bed file containing regions in which to count reads.
  the alignment file for which to calculate depth.

Options:

  -t --threads    number of BAM decompression threads [default: 0]
  -f --fasta      fasta file for use with CRAM files [default: ].
  -F --flag       exclude reads with any of the bits in FLAG set [default: 1796]
  -Q --mapq       mapping quality threshold [default: 0]
  -h --help       show help
```

This outputs one line with a read count for each line in the BED file.
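The default `--flag` value of 1796 is easier to read once it is split into its SAM FLAG bits. The sketch below is plain shell arithmetic for illustration, not code from count-reads; the helper name `is_excluded` is hypothetical, and the bit meanings are taken from the SAM specification.

```shell
# The default exclusion mask is the sum of four standard SAM FLAG bits:
#   4    read unmapped
#   256  secondary alignment
#   512  read fails quality checks
#   1024 PCR or optical duplicate
mask=$((4 + 256 + 512 + 1024))
echo "default mask: $mask"

# Hypothetical helper: succeeds if a read with FLAG $1 has any bit of the
# exclusion mask $2 set (the mask defaults to 1796).
is_excluded() {
    flag=$1
    exclude=${2:-1796}
    [ $(( flag & exclude )) -ne 0 ]
}

# FLAG 99 = paired, proper pair, mate reverse, first in pair: no excluded bit.
if is_excluded 99; then echo "99: excluded"; else echo "99: counted"; fi

# FLAG 1123 = 99 + 1024 (duplicate): the duplicate bit is in the mask.
if is_excluded 1123; then echo "1123: excluded"; else echo "1123: counted"; fi
```

Passing a different mask as the second argument mirrors what a non-default `-F` value would express, for example `is_excluded 1123 4` to exclude only unmapped reads.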

vcf-check

vcf-check is useful as quality control for large projects in which variant calling is split into regions and each region is called in parallel. With many regions and large projects, some regions can fail without the analyst noticing.

This tool takes a background VCF, such as gnomAD, that has full genome coverage (though in some cases users will instead want whole exome) and uses that as an expectation of variants. If the background has many variants across a long stretch of genome where the query VCF has no variation, we can expect that the region was missed in the query VCF.
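To make the per-chunk comparison concrete, here is a minimal sketch of counting sites in fixed-size chunks with awk. It is not the vcf-check implementation: the function name `count_per_chunk`, the rule that a chunk is identified by `int(pos / size) * size`, and the input rows are all assumptions made for illustration.

```shell
# Hypothetical helper: read "chrom<TAB>pos" lines on stdin and print
# "chrom<TAB>chunk-start<TAB>count" for each chunk, in first-seen order.
# $1 is the chunk size (defaults to 100000).
count_per_chunk() {
    awk -F'\t' -v size="${1:-100000}" '
        {
            start = int($2 / size) * size        # assumed binning rule
            key = $1 "\t" start
            if (!(key in n)) order[++k] = key    # remember first-seen order
            n[key]++
        }
        END { for (i = 1; i <= k; i++) print order[i] "\t" n[order[i]] }
    '
}

# Made-up positions: two fall in the first chunk of chromosome 1,
# one in the second chunk, and one on chromosome 2.
printf '1\t500\n1\t99999\n1\t100000\n2\t42\n' | count_per_chunk 100000
```

Running the same counting over a background and a query and comparing the two counts per chunk is the kind of comparison the paragraph above describes.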

```
Check a VCF against a background to make sure that there are no large missing chunks.

vcf-check

Usage: vcf-check [options]

Arguments:

  population VCF/BCF with expected sites
  query VCF/BCF to check

Options:

  -c --chunk    chunk size for genome [default: 100000]
  -m --maf      allele frequency cutoff [default: 0.1]
```

This will output a tab-delimited file of chrom\tposition\tbackground-count\tquery-count.

The user can find regions that might be problematic by plotting or with some simple awk commands.
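As one example of such an awk command, the sketch below prints chunks where the background has many variants but the query has none. The function name `flag_missing_chunks`, the threshold of 100 background variants, and the sample rows are hypothetical; it assumes only the four tab-delimited columns described above.

```shell
# Made-up vcf-check output for illustration:
# chrom, position, background-count, query-count
sample=$(printf '1\t0\t250\t31\n1\t100000\t310\t0\n1\t200000\t3\t0\n2\t0\t180\t22\n')

# Hypothetical helper: print rows whose background-count is at least
# $1 (default 100) while the query-count is zero.
flag_missing_chunks() {
    awk -F'\t' -v min_bg="${1:-100}" '$3 >= min_bg && $4 == 0'
}

printf '%s\n' "$sample" | flag_missing_chunks 100
```

With the made-up rows above, only the chunk at position 100000 on chromosome 1 is reported; the chunk at 200000 is skipped because the background itself is sparse there. A suitable threshold depends on the chunk size and the background used.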

Owner

  • Name: Brent Pedersen
  • Login: brentp
  • Kind: user
  • Location: Oregon, USA

Doing genomics

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 11
  • Total pull requests: 0
  • Average time to close issues: 28 days
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 0
  • Average comments per issue: 6.82
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • pengyu1608 (2)
  • brentp (2)
  • smoe (1)
  • telatin (1)
  • nellore (1)
  • filidi (1)
  • andreas-wilm (1)
  • crazyhottommy (1)
  • ipstone (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels