Koverage

Koverage: Read-coverage analysis for massive (meta)genomics datasets - Published in JOSS (2024)

https://github.com/beardymcjohnface/koverage

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Engineering Computer Science - 40% confidence
Last synced: 6 months ago

Repository

Quickly get coverage statistics given reads and an assembly

Basic Info
  • Host: GitHub
  • Owner: beardymcjohnface
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 18.9 MB
Statistics
  • Stars: 16
  • Watchers: 4
  • Forks: 5
  • Open Issues: 0
  • Releases: 12
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Quickly get coverage statistics given reads and an assembly.

Motivation

While there are tools that will calculate read-coverage statistics, they do not scale particularly well for large datasets, large sample numbers, or large reference FASTAs. Koverage is designed to place minimal burden on I/O and RAM to allow for maximum scalability.

Install

Koverage is available on PyPI and Bioconda.

We recommend creating a dedicated environment for the installation:

```shell
conda create -n koverage python=3.11
conda activate koverage
```

Option 1: install with PyPI

```shell
pip install koverage
```

Option 2: install with Bioconda

```shell
conda install -c conda-forge -c bioconda koverage
```

Test the installation:

```shell
koverage test
```

Developer install:

```shell
git clone https://github.com/beardymcjohnface/Koverage.git
cd Koverage
pip install -e .
```

Usage

Get coverage statistics from mapped reads (default method).

```shell
koverage run --reads readDir --ref assembly.fasta
```

Get coverage statistics using kmers (scales much better for very large reference FASTAs).

```shell
koverage run --reads readDir --ref assembly.fasta kmer
```

Any unrecognised options are passed on to Snakemake. For example, run Koverage on an HPC using a Snakemake profile:

```shell
koverage run --reads readDir --ref assembly.fasta --profile mySlurmProfile
```

Parsing samples with --reads

You can pass either a directory of reads or a TSV file to `--reads`. Note that Koverage expects your read file names to include `R1` or `R2`, e.g. `Tynes-BDA-rw-1_S14_L001_R1_001.fastq.gz` or `SRR7141305_R2.fastq.gz`.

- __Directory:__ Koverage will infer sample names and \_R1/\_R2 pairs from the filenames.
- __TSV file:__ Koverage expects 2 or 3 columns, with column 1 being the sample name and columns 2 and 3 the reads files.
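As a rough illustration of the two input styles, the sketch below groups read files by an `_R1`/`_R2` token and renders the result as a sample TSV. It is not Koverage's own parsing code: the regular expression, the function names `pair_reads` and `to_tsv`, and the file name `sampleB_R1.fastq.gz` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed pattern: an "_R1" or "_R2" token followed by "_", "." or the end of the name.
READ_TOKEN = re.compile(r"_R([12])(?=[_.]|$)")


def pair_reads(filenames):
    """Group read files into {sample: {"R1": file, "R2": file}}."""
    samples = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_TOKEN.search(name)
        if match is None:
            continue  # skip files without an _R1/_R2 token
        # Everything before the token is used as the sample name (an assumption).
        sample = name[: match.start()]
        samples[sample]["R" + match.group(1)] = name
    return dict(samples)


def to_tsv(samples):
    """Render the pairs as TSV: sample name, then one or two read files."""
    lines = []
    for sample, reads in sorted(samples.items()):
        files = [reads[key] for key in ("R1", "R2") if key in reads]
        lines.append("\t".join([sample, *files]))
    return "\n".join(lines)


files = [
    "SRR7141305_R1.fastq.gz",
    "SRR7141305_R2.fastq.gz",
    "sampleB_R1.fastq.gz",  # hypothetical single-end sample
]
pairs = pair_reads(files)
print(to_tsv(pairs))
```

A paired sample yields a three-column row and an unpaired one a two-column row, matching the "2 or 3 columns" layout described above.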

More information and examples are available here

Test

You can test the methods with the inbuilt dataset like so.

```shell
# test default method
koverage test

# test all methods
koverage test map kmer coverm
```

Coverage methods

Mapping-based (default)

```shell
koverage run ...

# or

koverage run ... map
```

This method will map reads using minimap2 and use the mapping coordinates to calculate coverage. This method is suitable for most applications.
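To make "use the mapping coordinates to calculate coverage" concrete, here is a minimal sketch that turns alignment intervals on one contig into depth statistics. It computes exact per-base values with a difference array, which is a simplification chosen for clarity and not necessarily how Koverage derives its fast estimates. The function name `depth_stats` and the 0-based, half-open interval convention are assumptions.

```python
from statistics import median


def depth_stats(contig_length, alignments):
    """Per-base depth statistics for one contig of positive length.

    alignments: iterable of (start, end) intervals, 0-based and half-open.
    """
    # Difference array: +1 where an alignment starts, -1 just past its end.
    delta = [0] * (contig_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(contig_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    # A running sum over the difference array gives the depth at each base.
    depths, running = [], 0
    for change in delta[:contig_length]:
        running += change
        depths.append(running)

    mean = sum(depths) / contig_length
    return {
        "Mean": mean,
        "Median": median(depths),
        "Hitrate": sum(1 for d in depths if d > 0) / contig_length,
        "Variance": sum((d - mean) ** 2 for d in depths) / contig_length,
    }


# Two overlapping alignments on a 10 bp contig; the last two bases are uncovered.
stats = depth_stats(10, [(0, 4), (2, 8)])
print(stats)
```

Storing a full per-base array costs memory proportional to contig length, which is the kind of overhead an estimation-based approach avoids on large references.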

Kmer-based

```shell
koverage run ... kmer
```

This method calculates Jellyfish databases of the sequencing reads. It samples kmers from all reference contigs and queries them from the Jellyfish DBs to calculate coverage statistics. This method is exceptionally fast for very large reference genomes.
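The idea of sampling kmers from a contig and looking up their counts can be sketched in a few lines. In this illustration a plain Python dictionary stands in for the Jellyfish database, reverse complements are ignored, and the kmer size, sampling step, and function names are all assumptions made for the example and do not reflect Koverage's defaults.

```python
from statistics import mean, median


def count_kmers(reads, k):
    """Count every kmer in the reads (a stand-in for a Jellyfish database)."""
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def sample_kmers(sequence, k, step):
    """Yield every `step`-th kmer of length k from the sequence."""
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i : i + k]


def kmer_coverage(contig, read_counts, k, step):
    """Summarise the read-kmer counts of the kmers sampled from one contig."""
    depths = [read_counts.get(kmer, 0) for kmer in sample_kmers(contig, k, step)]
    if not depths:
        raise ValueError("contig is shorter than k")
    return {
        "Sum": sum(depths),
        "Mean": mean(depths),
        "Median": median(depths),
        "Hitrate": sum(1 for d in depths if d > 0) / len(depths),
    }


reads = ["ACGTAC", "ACGTTT"]
counts = count_kmers(reads, k=4)
result = kmer_coverage("ACGTACGTGG", counts, k=4, step=2)
print(result)
```

Because only a subset of kmers is queried per contig, the work per contig grows with the number of sampled kmers and not with the number of reads, which is consistent with the speed claim above.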

CoverM

```shell
koverage run ... coverm
```

We've included a wrapper for CoverM which you may find useful. The wrapper manually runs minimap2 and then invokes CoverM on the sorted BAM file. It then combines the output from all samples like the other methods. If you have a large tmpfs, you'll probably find it faster to run CoverM directly on your reads. CoverM is not currently available for macOS.

Outputs

Mapping-based

Default output files using fast estimations for mean, median, hitrate, and variance.

`sample_coverage.tsv`: Per sample and per contig counts.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase per million
RPK | Reads per kilobase
TPM | Transcripts per million
Mean | _Estimated_ mean read depth
Median | _Estimated_ median read depth
Hitrate | _Estimated_ fraction of contig with depth > 0
Variance | _Estimated_ read depth variance


`all_coverage.tsv`: Per contig counts (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase per million
RPK | Reads per kilobase
TPM | Transcripts per million
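The count-derived columns follow standard normalisations, which can be sketched as below. This uses the textbook definitions and is not taken from Koverage's source. In particular, whether the per-million scaling uses all reads in a sample or only mapped reads is not stated here, so `total_reads` is an assumption, as are the function name and the contig names.

```python
def normalise(counts, lengths, total_reads):
    """Compute RPM, RPKM, RPK and TPM per contig.

    counts:      {contig: raw mapped read count}
    lengths:     {contig: contig length in bp}
    total_reads: number of reads used for per-million scaling (assumed)
    """
    per_million = total_reads / 1_000_000
    # Reads per kilobase: count divided by contig length in kb.
    rpk = {contig: counts[contig] / (lengths[contig] / 1000) for contig in counts}
    rpk_per_million = sum(rpk.values()) / 1_000_000

    rows = {}
    for contig, count in counts.items():
        rpm = count / per_million
        rows[contig] = {
            "Count": count,
            "RPM": rpm,
            "RPKM": rpm / (lengths[contig] / 1000),
            "RPK": rpk[contig],
            # TPM rescales RPK so that the values sum to one million.
            "TPM": rpk[contig] / rpk_per_million if rpk_per_million else 0.0,
        }
    return rows


table = normalise(
    counts={"contig_1": 500, "contig_2": 1500},
    lengths={"contig_1": 1000, "contig_2": 3000},
    total_reads=2_000_000,
)
for contig, row in table.items():
    print(contig, row)
```

In this example the longer contig receives three times as many reads but is three times as long, so both contigs end up with the same RPK, RPKM and TPM.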

Kmer-based

Outputs for kmer-based coverage metrics. Kmer outputs are gzipped as it is anticipated that this method will be used with very large reference FASTA files.

`sample_kmer_coverage.NNmer.tsv.gz`: Per sample and contig kmer coverage.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth
Hitrate | Fraction of kmers with depth > 0
Variance | Variance of lowest 95% of sampled kmer depths


`all_kmer_coverage.NNmer.tsv.gz`: Contig kmer coverage (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth
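Since the kmer outputs are gzipped TSV files, they can be read without decompressing them to disk first. The sketch below assumes the files carry a header row whose names match the columns listed above. The demo file, its values, and the helper names `read_coverage` and `contigs_above` are invented for illustration.

```python
import csv
import gzip
import os
import tempfile


def read_coverage(path):
    """Read a gzipped, tab-separated coverage table into a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def contigs_above(rows, min_mean):
    """Return the sorted contig IDs whose Mean column is at least min_mean."""
    return sorted({row["Contig"] for row in rows if float(row["Mean"]) >= min_mean})


# Write a tiny stand-in file so the example runs without real Koverage output.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_kmer_coverage.tsv.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", "Contig", "Sum", "Mean", "Median", "Hitrate", "Variance"])
    writer.writerow(["sampleA", "contig_1", "50", "5.0", "5", "1.0", "0.5"])
    writer.writerow(["sampleA", "contig_2", "2", "0.2", "0", "0.2", "0.1"])

rows = read_coverage(demo_path)
print(contigs_above(rows, min_mean=1.0))
```

For very large tables, iterating over the `csv.DictReader` row by row, without building a list, keeps memory use flat.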

Owner

  • Name: Michael Roach
  • Login: beardymcjohnface
  • Kind: user
  • Company: Flinders University

JOSS Publication

Koverage: Read-coverage analysis for massive (meta)genomics datasets
Published
February 27, 2024
Volume 9, Issue 94, Page 6235
Authors
Michael J. Roach ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia, Adelaide Centre for Epigenetics and the South Australian Immunogenomics Cancer Institute, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia
Bradley J. Hart ORCID
Health and Biomedical Innovation, Clinical and Health Sciences, University of South Australia, SA, Australia
Sarah J. Beecroft ORCID
Pawsey Supercomputing Research Centre, Kensington, WA, Australia
Bhavya Papudeshi ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Laura K. Inglis ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Susanna R. Grigson ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Vijini Mallawaarachchi ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
George Bouras ORCID
Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia, The Department of Surgery – Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
Robert A. Edwards ORCID
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Editor
Charlotte Soneson ORCID
Tags
Snakemake Genomics Metagenomics

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Roach
  given-names: Michael J.
  orcid: "https://orcid.org/0000-0003-1488-5148"
- family-names: Hart
  given-names: Bradley J.
  orcid: "https://orcid.org/0000-0001-8110-2460"
- family-names: Beecroft
  given-names: Sarah J.
  orcid: "https://orcid.org/0000-0002-3935-2279"
- family-names: Papudeshi
  given-names: Bhavya
  orcid: "https://orcid.org/0000-0001-5359-3100"
- family-names: Inglis
  given-names: Laura K.
  orcid: "https://orcid.org/0000-0001-7919-8563"
- family-names: Grigson
  given-names: Susanna R.
  orcid: "https://orcid.org/0000-0003-4738-3451"
- family-names: Mallawaarachchi
  given-names: Vijini
  orcid: "https://orcid.org/0000-0002-2651-8719"
- family-names: Bouras
  given-names: George
  orcid: "https://orcid.org/0000-0002-5885-4186"
- family-names: Edwards
  given-names: Robert A.
  orcid: "https://orcid.org/0000-0001-8383-8949"
contact:
- family-names: Roach
  given-names: Michael J.
  orcid: "https://orcid.org/0000-0003-1488-5148"
doi: 10.5281/zenodo.10633263
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Roach
    given-names: Michael J.
    orcid: "https://orcid.org/0000-0003-1488-5148"
  - family-names: Hart
    given-names: Bradley J.
    orcid: "https://orcid.org/0000-0001-8110-2460"
  - family-names: Beecroft
    given-names: Sarah J.
    orcid: "https://orcid.org/0000-0002-3935-2279"
  - family-names: Papudeshi
    given-names: Bhavya
    orcid: "https://orcid.org/0000-0001-5359-3100"
  - family-names: Inglis
    given-names: Laura K.
    orcid: "https://orcid.org/0000-0001-7919-8563"
  - family-names: Grigson
    given-names: Susanna R.
    orcid: "https://orcid.org/0000-0003-4738-3451"
  - family-names: Mallawaarachchi
    given-names: Vijini
    orcid: "https://orcid.org/0000-0002-2651-8719"
  - family-names: Bouras
    given-names: George
    orcid: "https://orcid.org/0000-0002-5885-4186"
  - family-names: Edwards
    given-names: Robert A.
    orcid: "https://orcid.org/0000-0001-8383-8949"
  date-published: 2024-02-27
  doi: 10.21105/joss.06235
  issn: 2475-9066
  issue: 94
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6235
  title: "Koverage: Read-coverage analysis for massive (meta)genomics
    datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06235"
  volume: 9
title: "Koverage: Read-coverage analysis for massive (meta)genomics
  datasets"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 328
  • Total Committers: 5
  • Avg Commits per committer: 65.6
  • Development Distribution Score (DDS): 0.058
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Michael Roach b****e@g****m 309
SarahBeecroft S****t@p****u 11
Vijini Mallawaarachchi v****i@g****m 5
Susie Grigson 5****o 2
linsalrob r****s@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 39
  • Average time to close issues: 14 days
  • Average time to close pull requests: about 17 hours
  • Total issue authors: 6
  • Total pull request authors: 5
  • Average comments per issue: 3.22
  • Average comments per pull request: 0.67
  • Merged pull requests: 39
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • telatin (3)
  • biobrad (2)
  • linsalrob (1)
  • Laura-RC (1)
  • lparsons (1)
  • beardymcjohnface (1)
Pull Request Authors
  • beardymcjohnface (39)
  • SarahBeecroft (3)
  • linsalrob (2)
  • csoneson (2)
  • Vini2 (1)
Top Labels
Issue Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 42 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
pypi.org: koverage

Quickly get coverage statistics given reads and an assembly

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 42 Last month
Rankings
Dependent packages count: 7.3%
Downloads: 20.8%
Stargazers count: 23.4%
Average: 24.6%
Forks count: 30.4%
Dependent repos count: 41.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/build-koverage-envs.yaml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/codecov.yaml actions
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/py-app.yaml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/python-publish.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
docs/requirements.txt pypi
  • Pygments >=2.10.0
  • jinja2 ==3.1.0
  • mkdocs >=1.2.2
  • mkdocs-material *
  • mkdocstrings *
  • mkdocstrings-python *
  • mkgendocs *
  • pymdown-extensions >=9.0
setup.py pypi
  • Click >=8.1.3
  • datapane >=0.16.7
  • metasnek >=0.0.8
  • numpy >=1.24.3
  • plotly >=5.15.0
  • py-spy >=0.3.14
  • pyyaml >=6.0
  • snakemake >=7.14.0
  • snaketool-utils >=0.0.4
  • zstandard >=0.21.0