krakenparser

🗂️ Parse multiple Kraken2 reports into CSV files on 6 taxonomical levels

https://github.com/popoviilab/krakenparser

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Academic email domains
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary

Keywords

kraken2 krakentools metagenomic-pipeline metagenomics
Last synced: 6 months ago

Repository

🗂️ Parse multiple Kraken2 reports into CSV files on 6 taxonomical levels

Basic Info
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
kraken2 krakentools metagenomic-pipeline metagenomics
Created about 1 year ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

KrakenParser: Convert Kraken2 Reports to CSV


Overview

KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.

🔗 Please visit the KrakenParser wiki page

Output example

Total abundance output

counts_phylum.csv parsed from 7 Kraken2 reports of metagenomic samples using KrakenParser:

```
Sample_id,Calditrichota,Caldisericota,Thermosulfidibacterota,Elusimicrobiota,Candidatus Fervidibacterota,Lentisphaerota,Kiritimatiellota,Vulcanimicrobiota,Thermodesulfobiota,Atribacterota,Dictyoglomota,Nitrospinota,Chrysiogenota,Coprothermobacterota,Aquificota,Thermotogota,Bdellovibrionota,Nitrospirota,Deferribacterota,Synergistota,Myxococcota,Acidobacteriota,Candidatus Bipolaricaulota,Candidatus Saccharibacteria,Candidatus Absconditabacteria,Fusobacteriota,Spirochaetota,Candidatus Omnitrophota,Chlamydiota,Verrucomicrobiota,Planctomycetota,Thermodesulfobacteriota,Campylobacterota,Candidatus Cloacimonadota,Fibrobacterota,Gemmatimonadota,Balneolota,Rhodothermota,Ignavibacteriota,Chlorobiota,Bacteroidota,Deinococcota,Thermomicrobiota,Armatimonadota,Chloroflexota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Bacillota,Pseudomonadota,Heterolobosea,Parabasalia,Fornicata,Evosea,Bacillariophyta,Cercozoa,Euglenozoa,Apicomplexa,Microsporidia,Basidiomycota,Ascomycota,Nanoarchaeota,Candidatus Micrarchaeota,Candidatus Thermoplasmatota,Candidatus Lokiarchaeota,Nitrososphaerota,Euryarchaeota,Thermoproteota,Hofneiviricota,Artverviricota,Nucleocytoviricota,Cossaviricota,Kitrinoviricota,Negarnaviricota,Lenarviricota,Pisuviricota,Peploviricota,Uroviricota
X1,0,0,0,0,0,0,0,0,1,1,1,1,2,3,4,5,7,8,9,17,23,25,5,13,22,47,54,1,6,27,31,128,151,2,6,13,1,3,7,44,14991,7,9,11,61,414,449,3551,55304,438645,0,0,0,0,0,0,1,22,0,4,15,0,0,0,0,0,3,191,0,0,1,88,0,0,0,161,0,1241
X2,1,4,14,20,5,12,15,6,8,15,2,15,109,68,182,97,79,196,70,272,331,149,36,77,35,562,1237,21,33,129,427,1044,543,8,98,25,16,45,11,1043,41374,160,28,161,1348,1196,2709,15864,431170,2747842,22,7,301,373,134,136,107,3239,54,1151,2905,0,0,3,5,6,7,410,0,0,0,736,0,3,11,26,1,1552
...
X8,1,19,0,47,0,1,6,20,28,0,1,1,47,7,336,110,30,32,10,93,85,48,9,7,7,154,386,0,14,19,106,358,242,14,5,134,15,11,7,18,54057,106,10,24,212,340,1128,16220,567908,650264,95,4,193,402,314,300,187,4376,37,9796,8653,0,1,0,1,5,23,1778,1,1,0,1,1,4,66,30,4,1263
X9,0,3,2,16,7,1,23,12,10,9,1,2,134,40,390,289,29,372,27,81,150,90,9,88,32,287,881,14,33,60,319,1045,328,15,22,22,10,72,8,63,35301,127,15,48,412,935,2343,11500,380765,2613854,0,0,0,0,0,0,5,74,0,38,40,3,0,0,0,1,3,275,0,0,0,0,0,2,118,25,0,1675
```

Relative abundance output

ra_phylum.csv calculated from 7 Kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,taxon,rel_abund_perc
X1,Pseudomonadota,85.03558294577552
X1,Bacillota,10.72121619814011
X1,Other (<4.0%),4.243200856084384
X2,Pseudomonadota,84.28702055549813
X2,Bacillota,13.225663867469137
X2,Other (<4.0%),2.487315577032736
...
X8,Pseudomonadota,49.25373021277305
X8,Bacillota,43.01574040339849
X8,Bacteroidota,4.094504530639667
X8,Other (<4.0%),3.6360248531887933
X9,Pseudomonadota,85.62839981589192
X9,Bacillota,12.473649123439218
X9,Other (<4.0%),1.8979510606688494

α-diversity output

alpha_div.csv calculated from 7 Kraken2 reports of metagenomic samples using KrakenParser:

Sample,Shannon,Pielou,Chao1
X1,3.911345447107001,0.5269245043289149,2274.533185840708
X2,3.9944130792536563,0.4906424221265042,4155.0
...
X8,3.442077115880119,0.42753293021330063,4177.251358695652
X9,4.033664950188261,0.5050385978575492,3492.16

β-diversity output

beta_div_bray.csv calculated from 7 Kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.398,...,0.61,0.353
X2,0.398,0.0,...,0.723,0.388
...
X8,0.61,0.723,...,0.0,0.665
X9,0.353,0.388,...,0.665,0.0

beta_div_jaccard.csv calculated from 7 Kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.7073170731707317,...,0.8223938223938224,0.7232472324723247
X2,0.7073170731707317,0.0,...,0.835016835016835,0.7352941176470589
...
X8,0.8223938223938224,0.835016835016835,...,0.0,0.8066914498141264
X9,0.7232472324723247,0.7352941176470589,...,0.8066914498141264,0.0

Visualization examples gallery

| Stacked Barplot | Streamgraph |
|-------|-------|
| kpstbar | kpstream |

| Stacked Barplot + Streamgraph | Clustermap |
|-------|-------|
| combined_white | kpclust |

Quick Start (Full Pipeline)

To run the full pipeline, use the following command:

```bash
KrakenParser --complete -i data/kreports

# Having trouble? Run KrakenParser --complete -h
```

This will:

1. Convert Kraken2 reports to MPA format
2. Combine MPA files into a single file
3. Extract taxonomic levels into separate text files
4. Process extracted text files
5. Convert them into CSV format
6. Calculate relative abundance
7. Calculate α & β-diversities

Input Requirements

  • The Kraken2 reports must be inside a subdirectory (e.g., data/kreports).
  • The script automatically creates output directories and processes the data.

Installation

pip install krakenparser

Using Individual Modules

You can also run each step manually if needed.

Step 1: Convert Kraken2 Reports to MPA Format

```bash
KrakenParser --kreport2mpa -i data/kreports -o data/mpa

# Having trouble? Run KrakenParser --kreport2mpa -h
```

This script converts Kraken2 .kreport files into MPA format using KrakenTools.

Step 2: Combine MPA Files

```bash
KrakenParser --combine_mpa -i data/mpa/* -o data/COMBINED.txt

# Having trouble? Run KrakenParser --combine_mpa -h
```

This merges multiple MPA files into a single combined file.

Step 3: Extract Taxonomic Levels

```bash
KrakenParser --deconstruct -i data/COMBINED.txt -o data/counts

# Having trouble? Run KrakenParser --deconstruct -h
```

To inspect the Viruses domain separately:

```bash
KrakenParser --deconstruct_viruses -i data/COMBINED.txt -o data/counts_viruses

# Having trouble? Run KrakenParser --deconstruct_viruses -h
```

This step extracts only species-level data (excluding human reads).

Step 4: Process Extracted Taxonomic Data

```bash
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_phylum.txt

# Having trouble? Run KrakenParser --process -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --process in a loop.

This script cleans up taxonomic names (removes prefixes, replaces underscores with spaces).
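The cleanup described above can be pictured with a few lines of Python. This is only a sketch of the idea, not KrakenParser's own code: the function name `clean_taxon` is hypothetical, and keeping only the last `|`-separated segment of an MPA lineage is an assumption about how the names arrive.

```python
import re

# MPA-style rank prefixes look like "d__", "p__", "g__", "s__", ...
_RANK_PREFIX = re.compile(r"^[a-z]__")


def clean_taxon(name: str) -> str:
    """Return a readable taxon name from an MPA-style label (hypothetical helper)."""
    # Assumption: if a full lineage is given, only the last segment is of interest.
    last_segment = name.split("|")[-1]
    # Drop the rank prefix, then turn underscores into spaces.
    return _RANK_PREFIX.sub("", last_segment).replace("_", " ")
```

For example, `clean_taxon("p__Candidatus_Saccharibacteria")` gives `Candidatus Saccharibacteria`.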

Step 5: Convert TXT to CSV

```bash
KrakenParser --txt2csv -i data/counts/txt/counts_phylum.txt -o data/counts/csv/counts_phylum.csv

# Having trouble? Run KrakenParser --txt2csv -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --txt2csv in a loop.

This converts the processed text files into structured CSV format.
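One way to avoid typing the command six times is to generate it per level. The sketch below only builds the argument lists, using the `--txt2csv`, `-i` and `-o` flags shown in this step; the function name `txt2csv_commands` and the default directories are illustrative assumptions taken from the example paths.

```python
LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def txt2csv_commands(txt_dir="data/counts/txt", csv_dir="data/counts/csv"):
    """Build one KrakenParser --txt2csv command per taxonomic level (hypothetical helper)."""
    return [
        [
            "KrakenParser", "--txt2csv",
            "-i", f"{txt_dir}/counts_{level}.txt",
            "-o", f"{csv_dir}/counts_{level}.csv",
        ]
        for level in LEVELS
    ]
```

Each list can then be passed to `subprocess.run(cmd, check=True)`. The same pattern should carry over to the other per-level steps by swapping the flag and the paths.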

Step 6: Calculate relative abundance

```bash
KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv

# Having trouble? Run KrakenParser --relabund -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --relabund in a loop.

This calculates relative abundance and saves it in CSV format.

To group low-abundance taxa into an "Other" group:

```bash
KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv --other 3.5

# Having trouble? Run KrakenParser --relabund -h
```

This groups all taxa with a relative abundance below 3.5% into an "Other (<3.5%)" group. Other threshold values work as well.
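To make the grouping concrete, here is a minimal Python sketch of the calculation for a single sample. It is not KrakenParser's implementation: the function name `relative_abundance` is hypothetical, and the label format is modeled on the "Other (<4.0%)" rows in the output example.

```python
def relative_abundance(counts, other_threshold=None):
    """Convert raw counts for one sample into percentages (hypothetical helper).

    counts: dict mapping taxon name -> read count.
    other_threshold: if given, taxa below this percentage are pooled into "Other".
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    percents = {taxon: 100 * n / total for taxon, n in counts.items()}
    if other_threshold is None:
        return percents

    # Keep taxa at or above the threshold; pool everything else.
    kept = {t: p for t, p in percents.items() if p >= other_threshold}
    pooled = sum(p for p in percents.values() if p < other_threshold)
    if pooled > 0:
        kept[f"Other (<{other_threshold}%)"] = pooled
    return kept
```

Calling `relative_abundance({"A": 90, "B": 8, "C": 2}, other_threshold=4.0)` keeps A and B and pools C into the "Other" entry.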

Step 7: Calculate α & β-diversities

```bash
KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity

# Having trouble? Run KrakenParser --diversity -h
```

This calculates α & β-diversities and saves them as CSV files in the directory given as output.

To use a different depth for the β-diversity calculations:

```bash
KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity --depth 750

# Having trouble? Run KrakenParser --diversity -h
```

Other depth values work as well.
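For readers unfamiliar with the α-diversity indices named in this step, the sketch below shows textbook definitions of Shannon, Pielou and Chao1 for one sample's species counts. The exact variants KrakenParser uses are not stated here, so the natural logarithm and the bias-corrected Chao1 form are assumptions, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou(counts):
    """Pielou evenness J = H / ln(S), with S the number of observed taxa."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for a single taxon; 0.0 is a convention
    return shannon(counts) / math.log(richness)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))
```

Four equally abundant species give a Shannon index of ln 4 and a Pielou evenness of 1.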

Arguments Breakdown

KrakenParser (Main Pipeline)

  • Automates the entire workflow.
  • Takes one argument: the path to Kraken2 reports (data/kreports).
  • Runs all the scripts in sequence.

--kreport2mpa (Step 1)

  • Converts Kraken2 reports to MPA format.
  • Uses KrakenTools/kreport2mpa.py.

--combine_mpa (Step 2)

  • Combines multiple MPA files into one.
  • Uses KrakenTools/combine_mpa.py.

--deconstruct & --deconstruct_viruses (Step 3)

  • Extracts phylum, class, order, family, genus, species into separate text files.
  • Removes human-related reads (--deconstruct only).

--process (Step 4)

  • Cleans and formats extracted taxonomic data.
  • Removes prefixes (s__, g__, etc.), replaces underscores with spaces.

--txt2csv (Step 5)

  • Converts cleaned text files to CSV.
  • Transposes data so that sample names become rows.
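The transposition can be sketched with the standard library alone. The input layout assumed here (tab-separated, one taxon per row, one sample per column, with a header row) is a guess for illustration, and `transpose_counts` is a hypothetical name; the `Sample_id` header follows the output example.

```python
import csv
import io


def transpose_counts(tsv_text: str) -> str:
    """Turn a taxa-by-samples table into samples-by-taxa CSV text (hypothetical helper)."""
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]                # first header cell labels the taxon column
    taxa = [row[0] for row in body]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Sample_id", *taxa])
    # One output row per sample: its name, then its count for every taxon.
    for col, sample in enumerate(samples, start=1):
        writer.writerow([sample, *(row[col] for row in body)])
    return out.getvalue()
```

Each sample ends up on its own row, which matches the shape of counts_phylum.csv in the output example.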

--relabund (Step 6)

  • Calculates relative abundance based on total abundance CSV.
  • Optionally can group low abundant taxa.

--diversity (Step 7)

  • Calculates α & β-diversities based on total species abundance CSV.
  • Shannon, Pielou & Chao1 indices for α-diversity
  • Bray-Curtis & Jaccard indices for β-diversity
  • Uses a depth of 1000 for β-diversity by default (can be adjusted with -d)
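The two β-diversity measures listed above compare a pair of samples. The sketch below gives their standard definitions for two count vectors over the same taxa; the function names are hypothetical, and any depth-based subsampling applied beforehand is left out because its details are not described here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def jaccard_distance(u, v):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |taxa in either|."""
    shared = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(u, v) if a > 0 or b > 0)
    return 1 - shared / either if either else 0.0
```

Both functions return 0 for identical samples and 1 for samples with no taxa in common, which is why the diagonals of the matrices in the output example are 0.0.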

Example Output Structure

After running the full pipeline, the output directory will look like this:

data/
├─ kreports/           # Input Kraken2 reports
├─ mpa/                # Converted MPA files
├─ COMBINED.txt        # Merged MPA file
├─ counts/
│  ├─ txt/             # Extracted taxonomic levels in TXT
│  │  ├─ counts_species.txt
│  │  ├─ counts_genus.txt
│  │  ├─ counts_family.txt
│  │  ├─ ...
│  └─ csv/             # Total abundance CSV output
│     ├─ counts_species.csv
│     ├─ counts_genus.csv
│     ├─ counts_family.csv
│     ├─ ...
├─ rel_abund/          # Relative abundance CSV output
│  ├─ ra_species.csv
│  ├─ ra_genus.csv
│  ├─ ra_family.csv
│  ├─ ...
└─ diversity/
   ├─ alpha_div.csv
   ├─ beta_div_bray.csv
   └─ beta_div_jaccard.csv

Conclusion

KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.

For any issues or feature requests, feel free to open an issue on GitHub!

🚀 Happy analyzing!

Owner

  • Name: Ilia and Igor Popov's Lab
  • Login: PopovIILab
  • Kind: organization
  • Email: iljapopov17@gmail.com
  • Location: Russian Federation

Citation (CITATION.cff)

cff-version: 0.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Popov
    given-names: Ilia
title: "KrakenParser"
date-released: 2025-02-16
url: https://github.com/PopovIILab/KrakenParser

GitHub Events

Total
  • Watch event: 1
  • Public event: 1
  • Push event: 47
  • Gollum event: 9
  • Pull request event: 13
  • Create event: 1
Last Year
  • Watch event: 1
  • Public event: 1
  • Push event: 47
  • Gollum event: 9
  • Pull request event: 13
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • iliapopov17 (9)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 245 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 11
  • Total maintainers: 1
pypi.org: krakenparser

A collection of scripts designed to process Kraken2 reports and convert them into CSV format.

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 245 Last month
Rankings
Dependent packages count: 9.6%
Average: 31.9%
Dependent repos count: 54.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • openjournals/openjournals-draft-action master composite