Kindel

Kindel: indel-aware consensus for nucleotide sequence alignments - Published in JOSS (2017)

https://github.com/bede/kindel

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Mathematics Computer Science - 84% confidence
Biochemistry, Genetics and Molecular Biology Life Sciences - 60% confidence
Last synced: 6 months ago

Repository

Indel-aware consensus for aligned BAM

Basic Info
  • Host: GitHub
  • Owner: bede
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 83.8 MB
Statistics
  • Stars: 21
  • Watchers: 3
  • Forks: 2
  • Open Issues: 5
  • Releases: 12
Created over 9 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

Kindel: indel-aware consensus from aligned BAM


Kindel reconciles substitutions and CIGAR-described indels to produce a majority consensus from an aligned BAM/SAM file. With the --realign option, Kindel also closes unaligned gaps using soft-clipped sequence information, a kind of local reassembly. Intended for use with small alignments of genes or virus genomes, Kindel is tested with BAMs created by aligners such as Minimap2 and BWA. No reference sequence is required; however, the input BAM must contain @SQ header lines. If you encounter problems, please open an issue. Please also cite the JOSS article if you find this useful.
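As background for the terms used above, the following is a minimal sketch of how a CIGAR string encodes insertions, deletions and soft clipping, following the operation codes defined in the SAM specification. It is not taken from Kindel's source; the function names `parse_cigar` and `summarise_cigar` are hypothetical and exist only for this illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to catch characters the regex silently skipped
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def summarise_cigar(cigar):
    """Count reference-consuming, inserted, deleted and soft-clipped bases."""
    ops = parse_cigar(cigar)
    return {
        # M, D, N, = and X consume reference positions
        "reference_span": sum(n for n, op in ops if op in "MDN=X"),
        "inserted": sum(n for n, op in ops if op == "I"),
        "deleted": sum(n for n, op in ops if op == "D"),
        "soft_clipped": sum(n for n, op in ops if op == "S"),
    }


summary = summarise_cigar("5S10M2I3D20M4S")
print(summary)
```

Soft-clipped bases (S) are present in the read but excluded from the alignment, which is why they can carry sequence information across a gap that the aligner could not bridge.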

Core functionality

Figure: clip-dominant region

Reassembly of clip-dominant regions (CDRs) with --realign

Figure: clip-dominant region

Features

  • Consensus of aligned substitutions, small insertions and deletions

  • Optional consensus reassembly around large unaligned 'clip-dominant' gaps (using --realign)

  • Support for short, paired and long reads mapped with e.g. Minimap2, BWA-MEM, and Segemehl

  • Support for BAMs with multiple reference contigs, chromosomes

  • Visualisation of aligned and clipped sequence depth by site alongside insertions, deletions (kindel plot)
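To make the idea of a majority consensus over substitutions and small indels concrete, here is a toy sketch. It assumes a simplified, hypothetical input: one list of observed states per reference site, where "-" marks a deletion and a multi-character string marks a base followed by an insertion. This is an illustration of the general technique only, not Kindel's implementation, which derives its per-site counts from the BAM itself.

```python
from collections import Counter


def majority_consensus(columns, min_depth=1):
    """Call the most frequent state at each site.

    columns: list of lists of observed states per reference site.
    Sites with fewer than min_depth observations are masked with 'N'.
    """
    consensus = []
    for observed in columns:
        if not observed or len(observed) < min_depth:
            consensus.append("N")
            continue
        state, _count = Counter(observed).most_common(1)[0]
        # A majority deletion contributes nothing to the consensus
        if state != "-":
            consensus.append(state)
    return "".join(consensus)


columns = [
    ["A", "A", "A"],      # unanimous base
    ["C", "C", "T"],      # substitution in a minority of reads
    ["-", "-", "G"],      # deletion in the majority of reads
    ["TAA", "TAA", "T"],  # insertion of AA after T in the majority of reads
    ["G"],                # low coverage site
]
print(majority_consensus(columns, min_depth=2))
```

The `min_depth` parameter here mirrors the behaviour described for the --min-depth option (substituting Ns at low coverage), but the data structure and tie handling are assumptions made for the example.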

Limitations

  • While Kindel has been tested with bacterial genomes, expect slow performance with megabase genomes
  • SAM/BAM files must contain an @SQ header line with reference sequence length(s).
  • Realignment mode (--realign) is able to close gaps of up to 2x read length
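Because the @SQ requirement is a common stumbling block, a quick pre-flight check can help. The sketch below parses SAM header text and collects reference names and lengths from @SQ lines. The helper name `reference_lengths` is hypothetical and not part of Kindel; for a BAM file, the header text could be obtained with `samtools view -H alignment.bam`.

```python
def reference_lengths(sam_header_lines):
    """Map reference name (SN) to length (LN) for each @SQ header line."""
    lengths = {}
    for line in sam_header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        # Remaining fields are TAG:VALUE pairs, e.g. SN:ref1 and LN:9000
        tags = dict(field.split(":", 1) for field in fields[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:ref1\tLN:9000\n",
    "@SQ\tSN:ref2\tLN:1500\n",
    "@PG\tID:minimap2\tPN:minimap2\n",
]
lengths = reference_lengths(header)
if not lengths:
    raise SystemExit("No @SQ header lines found")
print(lengths)
```

An empty result indicates that the file lacks the @SQ lines described above.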

Installation

Install inside an existing Python environment:

```shell
# Requires Python 3.9+ and Samtools
pip install kindel
```

Complete installation using a conda-compatible package manager:

```shell
conda create -y -n kindel python=3.13 samtools
conda activate kindel
pip install kindel
```

Development install:

```shell
git clone https://github.com/bede/kindel.git
cd kindel
pip install --editable '.[dev]'
```

Usage (kindel consensus)

Also see usage.ipynb

Command line

Generate a consensus sequence from an aligned BAM, saving the consensus sequence to cns.fa:

```bash
$ kindel consensus alignment.bam > cns.fa
```

Generate a consensus sequence from an aligned BAM with realignment mode enabled, allowing closure of gaps in the consensus sequence:

```bash
$ kindel consensus --realign alignment.bam > cns.fa
```

Built-in help:

```bash
$ kindel -h
usage: kindel [-h] {consensus,weights,features,variants,plot} ...

positional arguments:
  {consensus,weights,features,variants,plot,version}
    consensus           Infer consensus sequence(s) from alignment in SAM/BAM format
    weights             Returns table of per-site nucleotide frequencies and coverage
    features            Returns table of per-site nucleotide frequencies and coverage including indels
    variants            Output variants exceeding specified absolute and relative frequency thresholds
    plot                Plot sitewise soft clipping frequency across reference and genome
    version             Show version

optional arguments:
  -h, --help            show this help message and exit
```

```bash
$ kindel consensus -h
usage: kindel consensus [-h] [-r] [--min-depth MIN_DEPTH] [--min-overlap MIN_OVERLAP]
                        [-c CLIP_DECAY_THRESHOLD] [--mask-ends MASK_ENDS] [-t] [-u]
                        bam_path

Infer consensus sequence(s) from alignment in SAM/BAM format

positional arguments:
  bam_path              path to SAM/BAM file

optional arguments:
  -h, --help            show this help message and exit
  -r, --realign         attempt to reconstruct reference around soft-clip boundaries (default: False)
  --min-depth MIN_DEPTH
                        substitute Ns at coverage depths beneath this value (default: 1)
  --min-overlap MIN_OVERLAP
                        match length required to close soft-clipped gaps (default: 7)
  -c CLIP_DECAY_THRESHOLD, --clip-decay-threshold CLIP_DECAY_THRESHOLD
                        read depth fraction at which to cease clip extension (default: 0.1)
  --mask-ends MASK_ENDS
                        ignore clip dominant positions within n positions of termini (default: 50)
  -t, --trim-ends       trim ambiguous nucleotides (Ns) from sequence ends (default: False)
  -u, --uppercase       close gaps using uppercase alphabet (default: False)
```
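For scripted pipelines, the command line can also be driven from Python. The following is a minimal sketch that assembles a `kindel consensus` invocation using only flags listed in the help text above and redirects standard output to a FASTA file. The helper names `build_consensus_command` and `run_consensus` are hypothetical and not part of Kindel, and the sketch assumes `kindel` is available on the PATH.

```python
import subprocess


def build_consensus_command(bam_path, realign=False, min_depth=None,
                            min_overlap=None, trim_ends=False):
    """Assemble an argument list for `kindel consensus`."""
    command = ["kindel", "consensus"]
    if realign:
        command.append("--realign")
    if min_depth is not None:
        command += ["--min-depth", str(min_depth)]
    if min_overlap is not None:
        command += ["--min-overlap", str(min_overlap)]
    if trim_ends:
        command.append("--trim-ends")
    command.append(str(bam_path))
    return command


def run_consensus(bam_path, fasta_path, **options):
    """Run kindel and write the consensus from stdout to fasta_path."""
    with open(fasta_path, "w") as fasta:
        subprocess.run(build_consensus_command(bam_path, **options),
                       stdout=fasta, check=True)


command = build_consensus_command("alignment.bam", realign=True, min_depth=2)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with unusual file names, and `check=True` raises an exception if the command exits with a non-zero status.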

Python API

```python
from kindel import kindel

kindel.bam_to_consensus(bam_path, realign=False, min_depth=2, min_overlap=7,
                        clip_decay_threshold=0.1, trim_ends=False, uppercase=False)
```

Issues

If you encounter problems please open a GitHub issue, preferably including a BAM that allows the problem to be reproduced, or else reach out via email or social media.

Visualising alignments (kindel plot)

It can be useful to visualise rates of insertion, deletion and alignment clipping across an alignment. kindel plot generates an interactive HTML plot showing relevant alignment information.

To plot aligned depth alongside insertion, deletion and soft clipping frequency:

kindel plot tests/data_minimap2/2.issue23.debug.bam

Figure: plot of the original alignment

Figure: plot after alignment to the Kindel consensus sequence

Contributing

If you would like to contribute to this project, please open an issue or contact the author directly using the details above. Please note that this project is released with a Contributor Code of Conduct, and by participating in this project you agree to abide by its terms.

Before opening a pull request, please:

  • Ensure tests pass in a local development build (see installation instructions) by executing pytest inside the package directory.
  • Increment the version number inside __init__.py according to SemVer.
  • Update documentation and/or tests if possible.

Owner

  • Name: Bede Constantinides
  • Login: bede
  • Kind: user
  • Company: Oxford Nanopore Technologies

JOSS Publication

Kindel: indel-aware consensus for nucleotide sequence alignments
Published
July 13, 2017
Volume 2, Issue 15, Page 282
Authors
Bede Constantinides ORCID
Evolution and Genomic Sciences, University of Manchester, Manchester, UK
David L. Robertson ORCID
MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
Editor
Melissa Gymrek ORCID
Tags
bioinformatics sequence analysis genome assembly

GitHub Events

Total
  • Create event: 8
  • Release event: 4
  • Issues event: 12
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 21
  • Push event: 23
Last Year
  • Create event: 8
  • Release event: 4
  • Issues event: 12
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 21
  • Push event: 23

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 124
  • Total Committers: 2
  • Avg Commits per committer: 62.0
  • Development Distribution Score (DDS): 0.008
Past Year
  • Commits: 18
  • Committers: 1
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Bede Constantinides b****c@g****m 123
Arfon Smith a****n 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 1
  • Average time to close issues: over 3 years
  • Average time to close pull requests: 4 minutes
  • Total issue authors: 6
  • Total pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: 18 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 5.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bede (17)
  • sivico26 (2)
  • joelnitta (1)
  • lneaves (1)
  • migrau (1)
  • sureman (1)
Pull Request Authors
  • arfon (1)
Top Labels
Issue Labels
  • enhancement (4)
  • low priority (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 149 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 16
  • Total maintainers: 1
pypi.org: kindel

Indel-aware consensus from aligned BAMs

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 149 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 13.6%
Average: 20.5%
Dependent repos count: 21.8%
Forks count: 22.6%
Downloads: 34.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • s-weigand/setup-conda v1 composite
pyproject.toml pypi
  • argh >=0.26.2
  • dnaio ==1.2.3
  • pandas >=0.19.2
  • plotly >=2.0.0
  • scipy >=0.19.0
  • simplesam ==0.1.3.2
  • tqdm >=4.11.2