krakenparser

🗂️Parse multiple Kraken2 reports into CSV files on 6 taxonomical levels

https://github.com/popoviilab/krakenparser

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.2%) to scientific vocabulary

Keywords

kraken2 krakentools metagenomic-pipeline metagenomics

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: PopovIILab
License: mit
Language: Python
Default Branch: main
Homepage: https://pypi.org/project/KrakenParser/
Size: 6.26 MB

Statistics

Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Topics

kraken2 krakentools metagenomic-pipeline metagenomics

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

KrakenParser: Convert Kraken2 Reports to CSV

Overview

KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:

- Phylum
- Class
- Order
- Family
- Genus
- Species

You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.

🔗 Please visit KrakenParser wiki page

Output example

Total abundance output

counts_phylum.csv parsed from 7 kraken2 reports of metagenomic samples using KrakenParser:

```
Sample_id,Calditrichota,Caldisericota,Thermosulfidibacterota,Elusimicrobiota,Candidatus Fervidibacterota,Lentisphaerota,Kiritimatiellota,Vulcanimicrobiota,Thermodesulfobiota,Atribacterota,Dictyoglomota,Nitrospinota,Chrysiogenota,Coprothermobacterota,Aquificota,Thermotogota,Bdellovibrionota,Nitrospirota,Deferribacterota,Synergistota,Myxococcota,Acidobacteriota,Candidatus Bipolaricaulota,Candidatus Saccharibacteria,Candidatus Absconditabacteria,Fusobacteriota,Spirochaetota,Candidatus Omnitrophota,Chlamydiota,Verrucomicrobiota,Planctomycetota,Thermodesulfobacteriota,Campylobacterota,Candidatus Cloacimonadota,Fibrobacterota,Gemmatimonadota,Balneolota,Rhodothermota,Ignavibacteriota,Chlorobiota,Bacteroidota,Deinococcota,Thermomicrobiota,Armatimonadota,Chloroflexota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Bacillota,Pseudomonadota,Heterolobosea,Parabasalia,Fornicata,Evosea,Bacillariophyta,Cercozoa,Euglenozoa,Apicomplexa,Microsporidia,Basidiomycota,Ascomycota,Nanoarchaeota,Candidatus Micrarchaeota,Candidatus Thermoplasmatota,Candidatus Lokiarchaeota,Nitrososphaerota,Euryarchaeota,Thermoproteota,Hofneiviricota,Artverviricota,Nucleocytoviricota,Cossaviricota,Kitrinoviricota,Negarnaviricota,Lenarviricota,Pisuviricota,Peploviricota,Uroviricota
X1,0,0,0,0,0,0,0,0,1,1,1,1,2,3,4,5,7,8,9,17,23,25,5,13,22,47,54,1,6,27,31,128,151,2,6,13,1,3,7,44,14991,7,9,11,61,414,449,3551,55304,438645,0,0,0,0,0,0,1,22,0,4,15,0,0,0,0,0,3,191,0,0,1,88,0,0,0,161,0,1241
X2,1,4,14,20,5,12,15,6,8,15,2,15,109,68,182,97,79,196,70,272,331,149,36,77,35,562,1237,21,33,129,427,1044,543,8,98,25,16,45,11,1043,41374,160,28,161,1348,1196,2709,15864,431170,2747842,22,7,301,373,134,136,107,3239,54,1151,2905,0,0,3,5,6,7,410,0,0,0,736,0,3,11,26,1,1552
...
X8,1,19,0,47,0,1,6,20,28,0,1,1,47,7,336,110,30,32,10,93,85,48,9,7,7,154,386,0,14,19,106,358,242,14,5,134,15,11,7,18,54057,106,10,24,212,340,1128,16220,567908,650264,95,4,193,402,314,300,187,4376,37,9796,8653,0,1,0,1,5,23,1778,1,1,0,1,1,4,66,30,4,1263
X9,0,3,2,16,7,1,23,12,10,9,1,2,134,40,390,289,29,372,27,81,150,90,9,88,32,287,881,14,33,60,319,1045,328,15,22,22,10,72,8,63,35301,127,15,48,412,935,2343,11500,380765,2613854,0,0,0,0,0,0,5,74,0,38,40,3,0,0,0,1,3,275,0,0,0,0,0,2,118,25,0,1675
```

Relative abundance output

ra_phylum.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,taxon,rel_abund_perc
X1,Pseudomonadota,85.03558294577552
X1,Bacillota,10.72121619814011
X1,Other (<4.0%),4.243200856084384
X2,Pseudomonadota,84.28702055549813
X2,Bacillota,13.225663867469137
X2,Other (<4.0%),2.487315577032736
...
X8,Pseudomonadota,49.25373021277305
X8,Bacillota,43.01574040339849
X8,Bacteroidota,4.094504530639667
X8,Other (<4.0%),3.6360248531887933
X9,Pseudomonadota,85.62839981589192
X9,Bacillota,12.473649123439218
X9,Other (<4.0%),1.8979510606688494

α-diversity output

alpha_div.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

Sample,Shannon,Pielou,Chao1
X1,3.911345447107001,0.5269245043289149,2274.533185840708
X2,3.9944130792536563,0.4906424221265042,4155.0
...
X8,3.442077115880119,0.42753293021330063,4177.251358695652
X9,4.033664950188261,0.5050385978575492,3492.16

β-diversity output

beta_div_bray.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.398,...,0.61,0.353
X2,0.398,0.0,...,0.723,0.388
...
X8,0.61,0.723,...,0.0,0.665
X9,0.353,0.388,...,0.665,0.0

beta_div_jaccard.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.7073170731707317,...,0.8223938223938224,0.7232472324723247
X2,0.7073170731707317,0.0,...,0.835016835016835,0.7352941176470589
...
X8,0.8223938223938224,0.835016835016835,...,0.0,0.8066914498141264
X9,0.7232472324723247,0.7352941176470589,...,0.8066914498141264,0.0

Visualization examples gallery

| Stacked Barplot | Streamgraph |
|-------|-------|
| kpstbar | kpstream |

| Stacked Barplot + Streamgraph | Clustermap |
|-------|-------|
| combined_white | kpclust |

Quick Start (Full Pipeline)

To run the full pipeline, use the following command:

```bash
KrakenParser --complete -i data/kreports
# Having trouble? Run KrakenParser --complete -h
```

This will:

1. Convert Kraken2 reports to MPA format
2. Combine MPA files into a single file
3. Extract taxonomic levels into separate text files
4. Process extracted text files
5. Convert them into CSV format
6. Calculate relative abundance
7. Calculate α & β-diversities

Input Requirements

The Kraken2 reports must be inside a subdirectory (e.g., data/kreports).
The script automatically creates output directories and processes the data.

Installation

pip install krakenparser

Using Individual Modules

You can also run each step manually if needed.

Step 1: Convert Kraken2 Reports to MPA Format

```bash
KrakenParser --kreport2mpa -i data/kreports -o data/mpa
# Having trouble? Run KrakenParser --kreport2mpa -h
```

This script converts Kraken2 `.kreport` files into MPA format using KrakenTools.
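As a rough illustration of what this conversion involves (not KrakenTools' actual code), the sketch below walks a Kraken2 report and emits MPA-style lineage paths. It assumes the standard six-column, tab-separated report layout with names indented two spaces per level. The function name `kreport_to_mpa` and the exact rank prefixes are illustrative assumptions, and KrakenTools may emit slightly different prefixes.

```python
# Hypothetical helper, not part of KrakenParser or KrakenTools.
# Maps Kraken2 rank codes to MPA-style prefixes (assumed mapping).
MAIN_RANKS = {"D": "d", "K": "k", "P": "p", "C": "c",
              "O": "o", "F": "f", "G": "g", "S": "s"}

def kreport_to_mpa(lines):
    """Return (lineage_path, clade_reads) pairs for main-rank taxa."""
    path = []   # (depth, label) of main-rank ancestors of the current line
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        clade_reads = int(fields[1])
        rank = fields[3]
        raw_name = fields[5]
        name = raw_name.strip()
        # Indentation encodes tree depth: two spaces per level.
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        # Drop ancestors that are not above the current node.
        while path and path[-1][0] >= depth:
            path.pop()
        if rank not in MAIN_RANKS:
            continue  # skip root, unclassified, and intermediate ranks
        label = f"{MAIN_RANKS[rank]}__{name.replace(' ', '_')}"
        path.append((depth, label))
        out.append(("|".join(lbl for _, lbl in path), clade_reads))
    return out

report = [
    "100.00\t10\t0\tR\t1\troot",
    "100.00\t10\t0\tD\t2\t  Bacteria",
    " 90.00\t9\t1\tP\t1224\t    Pseudomonadota",
    " 80.00\t8\t8\tS\t562\t      Escherichia coli",
    " 10.00\t1\t1\tP\t1239\t    Bacillota",
]
for lineage, reads in kreport_to_mpa(report):
    print(f"{lineage}\t{reads}")
```

Each output row pairs a pipe-separated lineage with the clade-level read count, which is the shape the later steps operate on.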

Step 2: Combine MPA Files

```bash
KrakenParser --combine_mpa -i data/mpa/* -o data/COMBINED.txt
# Having trouble? Run KrakenParser --combine_mpa -h
```

This merges multiple MPA files into a single combined file.
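Conceptually, merging amounts to an outer join on the lineage path, with zeros where a sample lacks a taxon. The sketch below shows that idea under stated assumptions: `combine_mpa` here is a hypothetical function (not the KrakenTools script), and the `#Classification` header label is an assumed placeholder.

```python
# Hypothetical sketch of merging per-sample MPA tables; not KrakenTools code.
def combine_mpa(samples):
    """samples maps sample name -> {lineage_path: read_count}."""
    names = list(samples)
    taxa, seen = [], set()
    for counts in samples.values():
        for taxon in counts:
            if taxon not in seen:      # keep first-seen order of taxa
                seen.add(taxon)
                taxa.append(taxon)
    header = ["#Classification"] + names   # header label is an assumption
    rows = [[taxon] + [samples[name].get(taxon, 0) for name in names]
            for taxon in taxa]
    return header, rows

samples = {
    "X1": {"d__Bacteria": 10, "d__Bacteria|p__Bacillota": 6},
    "X2": {"d__Bacteria": 4, "d__Archaea": 1},
}
header, rows = combine_mpa(samples)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```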

Step 3: Extract Taxonomic Levels

```bash
KrakenParser --deconstruct -i data/COMBINED.txt -o data/counts
# Having trouble? Run KrakenParser --deconstruct -h
```

To inspect the Viruses domain separately:

```bash
KrakenParser --deconstruct_viruses -i data/COMBINED.txt -o data/counts_viruses
# Having trouble? Run KrakenParser --deconstruct_viruses -h
```

This step extracts each of the six taxonomic levels into separate text files. `--deconstruct` additionally removes human reads.
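As described under Arguments Breakdown, this step yields one file per taxonomic level. One plausible way to picture it is filtering the combined table by the rank prefix of the last lineage element. The sketch below is an assumption-laden illustration: `split_by_level` is a hypothetical name, and the README does not say how KrakenParser identifies human reads, so the `exclude` marker is a stand-in.

```python
# Hypothetical sketch; the real filtering rules are not given in the README.
LEVEL_PREFIXES = {"phylum": "p__", "class": "c__", "order": "o__",
                  "family": "f__", "genus": "g__", "species": "s__"}

def split_by_level(rows, exclude=("Homo_sapiens",)):
    """rows: (lineage_path, counts) pairs from a combined MPA table."""
    by_level = {level: [] for level in LEVEL_PREFIXES}
    for path, counts in rows:
        if any(marker in path for marker in exclude):
            continue  # assumed way of dropping human-derived rows
        leaf = path.split("|")[-1]
        for level, prefix in LEVEL_PREFIXES.items():
            if leaf.startswith(prefix):
                by_level[level].append((path, counts))
                break
    return by_level

rows = [
    ("d__Bacteria", [10, 4]),
    ("d__Bacteria|p__Bacillota", [6, 1]),
    ("d__Eukaryota|p__Chordata|s__Homo_sapiens", [99, 80]),
    ("d__Bacteria|p__Bacillota|s__Bacillus_subtilis", [3, 1]),
]
by_level = split_by_level(rows)
print(by_level["phylum"])
print(by_level["species"])
```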

Step 4: Process Extracted Taxonomic Data

```bash
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_phylum.txt
# Having trouble? Run KrakenParser --process -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap `KrakenParser --process` in a loop.

This script cleans up taxonomic names (removes prefixes, replaces underscores with spaces).
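The name clean-up can be sketched in a few lines. This is a minimal illustration assuming single-letter rank prefixes such as `s__` and `g__`. `clean_taxon_name` is a hypothetical helper, not a KrakenParser function.

```python
import re

# Hypothetical helper illustrating the clean-up described above.
def clean_taxon_name(path):
    leaf = path.split("|")[-1]             # keep only the deepest taxon
    leaf = re.sub(r"^[a-z]__", "", leaf)   # drop a prefix like "s__" or "g__"
    return leaf.replace("_", " ")          # underscores back to spaces

print(clean_taxon_name("d__Bacteria|p__Pseudomonadota|s__Escherichia_coli"))
print(clean_taxon_name("p__Candidatus_Saccharibacteria"))
```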

Step 5: Convert TXT to CSV

```bash
KrakenParser --txt2csv -i data/counts/txt/counts_phylum.txt -o data/counts/csv/counts_phylum.csv
# Having trouble? Run KrakenParser --txt2csv -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap `KrakenParser --txt2csv` in a loop.

This converts the processed text files into structured CSV format.
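The loop over levels could look like the sketch below. It only uses the `--txt2csv`, `-i`, and `-o` flags shown in this step. The directory layout and the `counts_<level>` naming follow the README's examples, while `txt2csv_commands` and `run_all` are hypothetical helper names. The default is a dry run that prints the commands instead of executing them.

```python
import subprocess

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]

def txt2csv_commands(txt_dir="data/counts/txt", csv_dir="data/counts/csv"):
    """Build one KrakenParser --txt2csv command per taxonomic level."""
    return [
        ["KrakenParser", "--txt2csv",
         "-i", f"{txt_dir}/counts_{level}.txt",
         "-o", f"{csv_dir}/counts_{level}.csv"]
        for level in LEVELS
    ]

def run_all(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))            # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure

run_all(txt2csv_commands())
```

The same pattern should carry over to the other per-level steps by swapping the flag and the paths.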

Step 6: Calculate relative abundance

```bash
KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv
# Having trouble? Run KrakenParser --relabund -h
```

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap `KrakenParser --relabund` in a loop.

This calculates relative abundance and saves as CSV format.

To group low-abundance taxa into an "Other" group:

```bash
KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv --other 3.5
# Having trouble? Run KrakenParser --relabund -h
```

This groups all taxa with a relative abundance below 3.5% into an "Other (<3.5%)" group. Any other threshold value can be used.
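The grouping can be pictured as follows. This is a sketch under assumptions, not KrakenParser's implementation: it computes percentages per sample, drops zero counts, and folds everything below the threshold into a single row whose label imitates the `Other (<4.0%)` rows in the output example. `relative_abundance` is a hypothetical function name.

```python
# Hypothetical sketch of per-sample relative abundance with an "Other" bucket.
def relative_abundance(counts, other_threshold=None):
    """counts maps taxon -> reads for one sample; returns taxon -> percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    perc = {taxon: 100.0 * n / total for taxon, n in counts.items() if n > 0}
    if other_threshold is None:
        return perc
    kept = {t: p for t, p in perc.items() if p >= other_threshold}
    other = sum(p for p in perc.values() if p < other_threshold)
    if other > 0:
        kept[f"Other (<{other_threshold}%)"] = other
    return kept

sample = {"Pseudomonadota": 90, "Bacillota": 8, "Bacteroidota": 2, "Nitrospirota": 0}
for taxon, value in relative_abundance(sample, other_threshold=4.0).items():
    print(taxon, value)
```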

Step 7: Calculate α & β-diversities

```bash
KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity
# Having trouble? Run KrakenParser --diversity -h
```

This calculates α & β-diversities and saves them as CSV files in the given output directory.
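For reference, the three α-diversity indices named in this README have standard textbook definitions, sketched below. The README does not state which logarithm base or which Chao1 variant KrakenParser uses, so the natural logarithm and the bias-corrected Chao1 form here are assumptions.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with reads."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def pielou(counts):
    """Pielou evenness J = H' / ln(S), with S the number of observed taxa."""
    observed = sum(1 for n in counts if n > 0)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for n in counts if n > 0)
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

sample = [5, 1, 1, 2]
print(shannon(sample), pielou(sample), chao1(sample))
```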

To use a different depth for the β-diversity calculations:

```bash
KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity --depth 750
# Having trouble? Run KrakenParser --diversity -h
```

Any other depth value can be used.
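The README does not spell out what depth means here. A common reading is a rarefaction depth, where each sample is subsampled to the same number of reads before distances are computed. The sketch below follows that assumption and pairs it with the standard Bray-Curtis and Jaccard dissimilarities. All function names are hypothetical, and the naive `rarefy` expands every read into a list, so it only suits small examples.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample a count vector to `depth` reads without replacement."""
    pool = [i for i, n in enumerate(counts) for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out

def bray_curtis(a, b):
    """Sum of absolute differences divided by the total of both samples."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

def jaccard(a, b):
    """1 - |shared taxa| / |taxa present in either sample|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    return 1 - len(present_a & present_b) / len(union) if union else 0.0

x1 = rarefy([50, 30, 20, 0], depth=40)
x2 = rarefy([10, 0, 45, 45], depth=40)
print(bray_curtis(x1, x2), jaccard(x1, x2))
```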

Arguments Breakdown

KrakenParser (Main Pipeline)

Automates the entire workflow.
Takes one argument: the path to Kraken2 reports (data/kreports).
Runs all the scripts in sequence.

--kreport2mpa (Step 1)

Converts Kraken2 reports to MPA format.
Uses KrakenTools/kreport2mpa.py.

--combine_mpa (Step 2)

Combines multiple MPA files into one.
Uses KrakenTools/combine_mpa.py.

--deconstruct & --deconstruct_viruses (Step 3)

Extracts phylum, class, order, family, genus, species into separate text files.
Removes human-related reads (--deconstruct only).

--process (Step 4)

Cleans and formats extracted taxonomic data.
Removes prefixes (s__, g__, etc.), replaces underscores with spaces.

--txt2csv (Step 5)

Converts cleaned text files to CSV.
Transposes data so that sample names become rows.
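The transposition can be illustrated with the standard library alone. The sketch assumes the cleaned text file is tab-separated with one taxon per row and one column per sample, which the README does not state explicitly. `txt_to_csv` is a hypothetical name, and the `Sample_id` header is taken from the output example.

```python
import csv
import io

# Hypothetical sketch; the input layout is an assumption.
def txt_to_csv(txt):
    rows = [line.split("\t") for line in txt.strip().splitlines()]
    transposed = list(zip(*rows))                 # columns become rows
    transposed[0] = ("Sample_id",) + transposed[0][1:]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(transposed)
    return buffer.getvalue()

txt = "#Classification\tX1\tX2\nBacillota\t5\t7\nPseudomonadota\t9\t1\n"
print(txt_to_csv(txt), end="")
```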

--relabund (Step 6)

Calculates relative abundance from the total abundance CSV.
Can optionally group low-abundance taxa.

--diversity (Step 7)

Calculates α & β-diversities based on total species abundance CSV.
Shannon, Pielou & Chao1 indices for α-diversity
Bray-Curtis & Jaccard indices for β-diversity
Uses a depth of 1000 for β-diversity by default (adjustable with -d)

Example Output Structure

After running the full pipeline, the output directory will look like this:

data/
├─ kreports/           # Input Kraken2 reports
├─ mpa/                # Converted MPA files
├─ COMBINED.txt        # Merged MPA file
├─ counts/
│  ├─ txt/             # Extracted taxonomic levels in TXT
│  │  ├─ counts_species.txt
│  │  ├─ counts_genus.txt
│  │  ├─ counts_family.txt
│  │  ├─ ...
│  └─ csv/             # Total abundance CSV output
│     ├─ counts_species.csv
│     ├─ counts_genus.csv
│     ├─ counts_family.csv
│     ├─ ...
├─ rel_abund/          # Relative abundance CSV output
│  ├─ ra_species.csv
│  ├─ ra_genus.csv
│  ├─ ra_family.csv
│  ├─ ...
└─ diversity/
   ├─ alpha_div.csv
   ├─ beta_div_bray.csv
   └─ beta_div_jaccard.csv
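A quick way to confirm that a run produced this layout is to check for the expected files. The sketch below is a hypothetical helper, not part of KrakenParser. File names for the levels hidden behind `...` in the tree are assumed to follow the same `counts_<level>` and `ra_<level>` pattern as the ones shown.

```python
from pathlib import Path

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]

# Expected files relative to the data/ directory (partly assumed, see above).
EXPECTED = ["COMBINED.txt"]
for level in LEVELS:
    EXPECTED += [
        f"counts/txt/counts_{level}.txt",
        f"counts/csv/counts_{level}.csv",
        f"rel_abund/ra_{level}.csv",
    ]
EXPECTED += [
    "diversity/alpha_div.csv",
    "diversity/beta_div_bray.csv",
    "diversity/beta_div_jaccard.csv",
]

def missing_outputs(root="data"):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

for rel in missing_outputs("data"):
    print("missing:", rel)
```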

Conclusion

KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.

For any issues or feature requests, feel free to open an issue on GitHub!

🚀 Happy analyzing!

Owner

Name: Ilia and Igor Popov's Lab
Login: PopovIILab
Kind: organization
Email: iljapopov17@gmail.com
Location: Russian Federation

Repositories: 1
Profile: https://github.com/PopovIILab

Citation (CITATION.cff)

cff-version: 0.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Popov
    given-names: Ilia
title: "KrakenParser"
date-released: 2025-02-16
url: https://github.com/PopovIILab/KrakenParser

GitHub Events

Total

Watch event: 1
Public event: 1
Push event: 47
Gollum event: 9
Pull request event: 13
Create event: 1

Last Year

Watch event: 1
Public event: 1
Push event: 47
Gollum event: 9
Pull request event: 13
Create event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 9
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 9
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

iliapopov17 (9)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 245 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 11
Total maintainers: 1

pypi.org: krakenparser

A collection of scripts designed to process Kraken2 reports and convert them into CSV format.

Homepage: https://github.com/PopovIILab/KrakenParser
Documentation: https://krakenparser.readthedocs.io/
License: mit
Latest release: 0.6.1
published 11 months ago

Versions: 11
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 245 Last month

Rankings

Dependent packages count: 9.6%

Average: 31.9%

Dependent repos count: 54.1%

Maintainers (1)

iliapopov17

Last synced: 10 months ago

Dependencies

.github/workflows/ci.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/draft-pdf.yml actions

actions/checkout v4 composite
actions/upload-artifact v4 composite
openjournals/openjournals-draft-action master composite
