Koverage
Koverage: Read-coverage analysis for massive (meta)genomics datasets - Published in JOSS (2024)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Quickly get coverage statistics given reads and an assembly
Basic Info
- Host: GitHub
- Owner: beardymcjohnface
- License: mit
- Language: Python
- Default Branch: main
- Size: 18.9 MB
Statistics
- Stars: 16
- Watchers: 4
- Forks: 5
- Open Issues: 0
- Releases: 12
Metadata Files
README.md

Quickly get coverage statistics given reads and an assembly.
Motivation
While there are tools that will calculate read-coverage statistics, they do not scale particularly well for large datasets, large sample numbers, or large reference FASTAs. Koverage is designed to place minimal burden on I/O and RAM to allow for maximum scalability.
Install
Koverage is available on PyPI and Bioconda.
We recommend creating a dedicated environment for the installation:
```shell
conda create -n koverage python=3.11
conda activate koverage
```
Option 1: install with PyPI
```shell
pip install koverage
```
Option 2: install with Bioconda
```shell
conda install -c conda-forge -c bioconda koverage
```
Test the installation:
```shell
koverage test
```
Developer install:
```shell
git clone https://github.com/beardymcjohnface/Koverage.git
cd Koverage
pip install -e .
```
Usage
Get coverage statistics from mapped reads (default method).
```shell
koverage run --reads readDir --ref assembly.fasta
```
Get coverage statistics using kmers (scales much better for very large reference FASTAs).
```shell
koverage run --reads readDir --ref assembly.fasta kmer
```
Any unrecognised arguments are passed on to Snakemake. For example, run Koverage on an HPC cluster using a Snakemake profile.
```shell
koverage run --reads readDir --ref assembly.fasta --profile mySlurmProfile
```
Parsing samples with --reads
You can pass either a directory of reads or a TSV file to --reads.
Note that Koverage expects your read file names to include _R1 or _R2, e.g. Tynes-BDA-rw-1_S14_L001_R1_001.fastq.gz or SRR7141305_R2.fastq.gz.
- Directory: Koverage will infer sample names and _R1/_R2 pairs from the filenames.
- TSV file: Koverage expects 2 or 3 columns, with column 1 being the sample name and columns 2 and 3 the reads files.
More information and examples are available here
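As a rough illustration of the two input styles, the sketch below groups read files into samples by looking for an `_R1`/`_R2` tag, and parses a 2- or 3-column TSV. It is not Koverage's actual parser: the function names `group_read_pairs` and `parse_sample_tsv`, the regular expression, and the example file names are assumptions made for this example.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed pattern: "_R1" or "_R2" followed by "_", "." or the end of the name.
READ_TAG = re.compile(r"_R([12])(?=[_.]|$)")


def group_read_pairs(filenames):
    """Group read files by an inferred sample name.

    Returns {sample: {"R1": file, "R2": file}}. Files without an R1/R2 tag
    are treated as single-end and stored under "R1".
    """
    samples = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_TAG.search(name)
        if match:
            sample = name[: match.start()]
            samples[sample]["R" + match.group(1)] = name
        else:
            # No tag found: use everything before the first "." as the sample.
            samples[name.split(".")[0]]["R1"] = name
    return dict(samples)


def parse_sample_tsv(text):
    """Parse TSV text with columns: sample name, R1 file, optional R2 file."""
    samples = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) not in (2, 3):
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}: {row}")
        samples[row[0]] = {"R1": row[1]}
        if len(row) == 3:
            samples[row[0]]["R2"] = row[2]
    return samples


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(group_read_pairs(["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"]))
    print(parse_sample_tsv("sampleB\tb_1.fq.gz\tb_2.fq.gz\n"))
```

A TSV is the safer choice when file names do not follow the `_R1`/`_R2` convention, because the pairing is stated explicitly rather than inferred.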
Test
You can test the methods with the inbuilt dataset like so.
```shell
# test default method
koverage test
# test all methods
koverage test map kmer coverm
```
Coverage methods
Mapping-based (default)
```shell
koverage run ...
# or
koverage run ... map
```
This method will map reads using minimap2 and use the mapping coordinates to calculate coverage. This method is suitable for most applications.
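To make the idea of deriving coverage from mapping coordinates concrete, here is a minimal sketch that turns alignment intervals on one contig into per-base depths and summary statistics. It computes exact values with a difference array, which is a generic technique and not necessarily what Koverage does internally (its default outputs are described as estimates). The function name `depth_stats` and the 0-based half-open interval convention are assumptions for this example.

```python
from statistics import mean, median, pvariance


def depth_stats(contig_length, alignments):
    """Summarise read depth over one contig.

    alignments: iterable of (start, end) pairs, 0-based and half-open,
    giving the reference span covered by each mapped read.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(contig_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    depths = []
    running = 0
    for delta in diff[:contig_length]:
        running += delta
        depths.append(running)

    return {
        "mean": mean(depths),
        "median": median(depths),
        "hitrate": sum(d > 0 for d in depths) / contig_length,
        "variance": pvariance(depths),  # population variance (an assumption)
    }


if __name__ == "__main__":
    print(depth_stats(10, [(0, 5), (3, 8)]))
```

Holding a per-base depth list is fine for a toy contig, but it is exactly the kind of memory cost that grows with reference size, which is why estimated statistics are attractive at scale.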
Kmer-based
```shell
koverage run ... kmer
```
This method calculates Jellyfish databases of the sequencing reads. It samples kmers from all reference contigs and queries them from the Jellyfish DBs to calculate coverage statistics. This method is exceptionally fast for very large reference genomes.
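The sampling-and-lookup idea can be sketched in a few lines. In this illustration a plain dictionary stands in for a Jellyfish database, and k-mers are sampled at a fixed stride; both are simplifications, and the names `sample_kmers` and `kmer_coverage` and the parameters `k` and `step` are assumptions for this example, not Koverage's interface. Canonical (strand-independent) k-mers are also ignored here.

```python
from statistics import mean, median


def sample_kmers(sequence, k, step):
    """Yield every `step`-th k-mer of `sequence`, starting at position 0."""
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i : i + k]


def kmer_coverage(sequence, kmer_counts, k=21, step=10):
    """Summarise the read-derived counts of k-mers sampled from a contig.

    kmer_counts: mapping of k-mer -> count in the reads (stand-in for a
    Jellyfish database query). Missing k-mers count as depth 0.
    """
    depths = [kmer_counts.get(kmer, 0) for kmer in sample_kmers(sequence, k, step)]
    if not depths:
        raise ValueError("sequence is shorter than k")
    return {
        "sum": sum(depths),
        "mean": mean(depths),
        "median": median(depths),
        "hitrate": sum(d > 0 for d in depths) / len(depths),
    }


if __name__ == "__main__":
    toy_counts = {"ACGT": 6}  # hypothetical counts from the reads
    print(kmer_coverage("ACGTACGTAC", toy_counts, k=4, step=2))
```

Because only a sample of k-mers per contig is queried, the work per contig stays small regardless of how many reads were sequenced, which is consistent with the method's suitability for very large references.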
CoverM
```shell
koverage run ... coverm
```
We've included a wrapper for CoverM which you may find useful. The wrapper manually runs minimap2 and then invokes CoverM on the sorted BAM file. It then combines the output from all samples like the other methods. If you have a large tempfs/ you'll probably find it faster to run CoverM directly on your reads. CoverM is not currently available for macOS.
Outputs
Mapping-based
Default output files using fast estimations for mean, median, hitrate, and variance.
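Alongside these estimates, the mapping-based outputs report normalised counts (RPM, RPKM, RPK, TPM). The sketch below applies the commonly used definitions of those metrics to raw counts. It assumes the library size is the sum of mapped read counts across contigs; Koverage's exact denominators may differ, and the function name `normalise_counts` is hypothetical.

```python
def normalise_counts(counts, lengths):
    """Compute RPM, RPKM, RPK and TPM per contig.

    counts:  {contig: raw mapped read count}
    lengths: {contig: contig length in bases}
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")

    # Reads per kilobase: count scaled by contig length in kb.
    rpk = {c: counts[c] / (lengths[c] / 1_000) for c in counts}
    rpk_total = sum(rpk.values())

    table = {}
    for contig, count in counts.items():
        # Reads per million: count scaled by library size in millions.
        rpm = count / (total_reads / 1_000_000)
        table[contig] = {
            "Count": count,
            "RPM": rpm,
            "RPKM": rpm / (lengths[contig] / 1_000),
            "RPK": rpk[contig],
            # TPM rescales RPK so that values sum to one million.
            "TPM": rpk[contig] / rpk_total * 1_000_000,
        }
    return table


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    for contig, row in normalise_counts(
        {"contig1": 100, "contig2": 300}, {"contig1": 1000, "contig2": 2000}
    ).items():
        print(contig, row)
```

RPKM and TPM both correct for contig length and sequencing depth, but TPM values always sum to one million within a sample, which makes them easier to compare across samples.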
sample_coverage.tsv
Per sample and per contig counts.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase million
RPK | Reads per kilobase
TPM | Transcripts per million
Mean | _Estimated_ mean read depth
Median | _Estimated_ median read depth
Hitrate | _Estimated_ fraction of contig with depth > 0
Variance | _Estimated_ read depth variance

all_coverage.tsv
Per contig counts (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase million
RPK | Reads per kilobase
TPM | Transcripts per million

Kmer-based
Outputs for kmer-based coverage metrics. Kmer outputs are gzipped as it is anticipated that this method will be used with very large reference FASTA files.
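Since the kmer outputs are gzipped, they need to be decompressed on read. A minimal sketch using only the standard library is shown below; it assumes the file has a header row matching the column names documented for these outputs, and the helper name `read_coverage_table` is hypothetical.

```python
import csv
import gzip


def read_coverage_table(path):
    """Read a (optionally gzipped) tab-separated table into a list of dicts.

    Assumes the first row is a header. All values are returned as strings.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

The same helper works for the uncompressed mapping-based tables, because it only switches to `gzip.open` when the path ends in `.gz`.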
sample_kmer_coverage.NNmer.tsv.gz
Per sample and contig kmer coverage.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth
Hitrate | Fraction of kmers with depth > 0
Variance | Variance of lowest 95 % of sampled kmer depths

all_kmer_coverage.NNmer.tsv.gz
Contig kmer coverage (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth

Owner
- Name: Michael Roach
- Login: beardymcjohnface
- Kind: user
- Company: Flinders University
- Website: bioinf.cc
- Twitter: beardymcface
- Repositories: 6
- Profile: https://github.com/beardymcjohnface
JOSS Publication
Koverage: Read-coverage analysis for massive (meta)genomics datasets
Authors
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia, Adelaide Centre for Epigenetics and the South Australian Immunogenomics Cancer Institute, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia
Health and Biomedical Innovation, Clinical and Health Sciences, University of South Australia, SA, Australia
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
Tags
Snakemake Genomics Metagenomics
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
- family-names: Roach
  given-names: Michael J.
  orcid: "https://orcid.org/0000-0003-1488-5148"
- family-names: Hart
  given-names: Bradley J.
  orcid: "https://orcid.org/0000-0001-8110-2460"
- family-names: Beecroft
  given-names: Sarah J.
  orcid: "https://orcid.org/0000-0002-3935-2279"
- family-names: Papudeshi
  given-names: Bhavya
  orcid: "https://orcid.org/0000-0001-5359-3100"
- family-names: Inglis
  given-names: Laura K.
  orcid: "https://orcid.org/0000-0001-7919-8563"
- family-names: Grigson
  given-names: Susanna R.
  orcid: "https://orcid.org/0000-0003-4738-3451"
- family-names: Mallawaarachchi
  given-names: Vijini
  orcid: "https://orcid.org/0000-0002-2651-8719"
- family-names: Bouras
  given-names: George
  orcid: "https://orcid.org/0000-0002-5885-4186"
- family-names: Edwards
  given-names: Robert A.
  orcid: "https://orcid.org/0000-0001-8383-8949"
contact:
- family-names: Roach
  given-names: Michael J.
  orcid: "https://orcid.org/0000-0003-1488-5148"
doi: 10.5281/zenodo.10633263
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Roach
    given-names: Michael J.
    orcid: "https://orcid.org/0000-0003-1488-5148"
  - family-names: Hart
    given-names: Bradley J.
    orcid: "https://orcid.org/0000-0001-8110-2460"
  - family-names: Beecroft
    given-names: Sarah J.
    orcid: "https://orcid.org/0000-0002-3935-2279"
  - family-names: Papudeshi
    given-names: Bhavya
    orcid: "https://orcid.org/0000-0001-5359-3100"
  - family-names: Inglis
    given-names: Laura K.
    orcid: "https://orcid.org/0000-0001-7919-8563"
  - family-names: Grigson
    given-names: Susanna R.
    orcid: "https://orcid.org/0000-0003-4738-3451"
  - family-names: Mallawaarachchi
    given-names: Vijini
    orcid: "https://orcid.org/0000-0002-2651-8719"
  - family-names: Bouras
    given-names: George
    orcid: "https://orcid.org/0000-0002-5885-4186"
  - family-names: Edwards
    given-names: Robert A.
    orcid: "https://orcid.org/0000-0001-8383-8949"
  date-published: 2024-02-27
  doi: 10.21105/joss.06235
  issn: 2475-9066
  issue: 94
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6235
  title: "Koverage: Read-coverage analysis for massive (meta)genomics
    datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06235"
  volume: 9
title: "Koverage: Read-coverage analysis for massive (meta)genomics
  datasets"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Roach | b****e@g****m | 309 |
| SarahBeecroft | S****t@p****u | 11 |
| Vijini Mallawaarachchi | v****i@g****m | 5 |
| Susie Grigson | 5****o | 2 |
| linsalrob | r****s@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 9
- Total pull requests: 39
- Average time to close issues: 14 days
- Average time to close pull requests: about 17 hours
- Total issue authors: 6
- Total pull request authors: 5
- Average comments per issue: 3.22
- Average comments per pull request: 0.67
- Merged pull requests: 39
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- telatin (3)
- biobrad (2)
- linsalrob (1)
- Laura-RC (1)
- lparsons (1)
- beardymcjohnface (1)
Pull Request Authors
- beardymcjohnface (39)
- SarahBeecroft (3)
- linsalrob (2)
- csoneson (2)
- Vini2 (1)
Packages
- Total packages: 1
- Total downloads: 42 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 12
- Total maintainers: 1
pypi.org: koverage
Quickly get coverage statistics given reads and an assembly
- Homepage: https://github.com/beardymcjohnface/Koverage
- Documentation: https://koverage.readthedocs.io/
- License: mit
- Latest release: 0.1.11 (published about 2 years ago)
Dependencies
- actions/checkout v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v3 composite
- codecov/codecov-action v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
- Pygments >=2.10.0
- jinja2 ==3.1.0
- mkdocs >=1.2.2
- mkdocs-material *
- mkdocstrings *
- mkdocstrings-python *
- mkgendocs *
- pymdown-extensions >=9.0
- Click >=8.1.3
- datapane >=0.16.7
- metasnek >=0.0.8
- numpy >=1.24.3
- plotly >=5.15.0
- py-spy >=0.3.14
- pyyaml >=6.0
- snakemake >=7.14.0
- snaketool-utils >=0.0.4
- zstandard >=0.21.0