TAXPASTA

TAXPASTA: TAXonomic Profile Aggregation and STAndardisation - Published in JOSS (2023)

https://github.com/taxprofiler/taxpasta

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bioinformatics classification metagenomic-classification metagenomics profiling python standardisation taxonomic-classifications taxonomic-profiling
Last synced: 4 months ago

Repository

TAXonomic Profile Aggregation and STAndardisation

Basic Info
Statistics
  • Stars: 41
  • Watchers: 4
  • Forks: 7
  • Open Issues: 13
  • Releases: 13
Created over 3 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Support

README.md

taxpasta logo - a green DNA double helix morphing into a fusilli pasta shape with the word taxpasta above it

TAXonomic Profile Aggregation and STAndardisation

| | |
| ---------- | --- |
| Package | Latest PyPI Version; Supported Python Versions; DOI |
| Meta | Project Status: Active – The project has reached a stable, usable state and is being actively developed; Apache-2.0; Code of Conduct; Checked with mypy; Code Style Black; Linting: Ruff; pyOpenSci; DOI |
| Automation | CI; Documentation; Code Coverage |

About

The main purpose of taxpasta is to standardise taxonomic profiles created by a range of bioinformatics tools. We call those tools taxonomic profilers. They each come with their own particular tabular output format. Across the profilers, relative abundances can be reported in read counts, fractions, or percentages, as well as any number of additional columns with extra information. We therefore decided to take the lessons learnt to heart and provide our own solution to deal with this pasticcio. With taxpasta you can ingest all of those formats and, at a minimum, output taxonomy identifiers and their integer counts. Taxpasta can not only standardise profiles but also merge them across samples for the same profiler into a single table.

Diagram of taxpasta functionality. On the left are a range of taxonomic profilers with heterogeneous output types under the header Taxonomic Profiles, then a range of colourful lines leading into a box with a single green line, the taxpasta logo plus three icons for Validation, Standardisation and Conversion, and finally a range of green lines spreading out to a range of file icons with various file types under the header Standardised Tables.
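To make the idea concrete, here is a minimal sketch in plain Python of what standardisation amounts to conceptually. It is not taxpasta's implementation: the input layout, the column names `tax_id`, `reads` and `percent`, and the helper `standardise_rows` are all hypothetical and only illustrate reducing a wider profiler table to taxonomy identifiers with integer counts.

```python
import csv
import io

# Hypothetical profiler output with extra columns (not a real profiler format).
RAW_PROFILE = (
    "tax_id\tname\treads\tpercent\n"
    "562\tEscherichia coli\t120\t60.0\n"
    "1280\tStaphylococcus aureus\t80\t40.0\n"
)


def standardise_rows(handle, id_column, count_column):
    """Yield (taxonomy_id, count) pairs, keeping only the two named columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Counts are converted to int so the output is always integer-valued.
        yield row[id_column], int(row[count_column])


standardised = list(standardise_rows(io.StringIO(RAW_PROFILE), "tax_id", "reads"))
print(standardised)
```

In practice, the real tool also has to deal with profilers that report fractions or percentages; this sketch deliberately covers only the read-count case.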

Supported Taxonomic Profilers

Taxpasta currently supports standardisation and generation of comparable taxonomic tables for:

See supported profilers for more information.

Install

It's as simple as:

```shell
pip install taxpasta
```

Taxpasta is also available from the Bioconda channel

```shell
conda install -c bioconda taxpasta
```

and thus automatically generated Docker and Singularity BioContainers images also exist.

Optional Dependencies

Taxpasta supports a number of extras that you can install for additional features; primarily support for additional output file formats. You can install them by specifying a comma separated list within square brackets, for example,

```shell
pip install 'taxpasta[rich,biom]'
```

  • rich provides rich-formatted command line output and logging.
  • arrow supports writing output tables in Apache Arrow format.
  • parquet supports writing output tables in Apache Parquet format.
  • biom supports writing output tables in BIOM format.
  • ods supports writing output tables in ODS format.
  • xlsx supports writing output tables in Microsoft Excel format.
  • all includes all of the above.
  • dev provides all tools needed for contributing to taxpasta.

Usage

The main entry point for taxpasta is its command-line interface (CLI). You can interactively explore the offered commands through the help system.

```shell
taxpasta -h
```

Taxpasta currently offers two commands corresponding to the main use-cases. You can find out more in the commands' documentation.

Standardise

Since the supported profilers all produce their own flavour of tabular output, a quick way to normalize such files is to standardise them with taxpasta. You need to tell taxpasta which tool created the file. As an example, let's standardise a MetaPhlAn profile. (You can find an example file in our test data.)

```shell
curl -O https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt
taxpasta standardise -p metaphlan -o standardised.tsv MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt
```

With these minimal arguments, taxpasta produces a two-column output consisting of

| taxonomy_id | count |
| ----------- | ----- |
| | |

You can count on the second column being integers :wink:. Having such a simple and tidy table should make your downstream analysis much smoother to start out with. Please have a look at the full getting started tutorial for a more thorough introduction.
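As a rough illustration of such a downstream step, the sketch below reads a standardised table with Python's standard library and converts the integer counts to relative abundances. It assumes the TSV has a header row naming the columns `taxonomy_id` and `count`; the helper names and the example rows are made up for illustration and stand in for the contents of `standardised.tsv`.

```python
import csv
import io


def read_standardised(handle):
    """Read a two-column table into a {taxonomy_id: count} dictionary."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["taxonomy_id"]: int(row["count"]) for row in reader}


def relative_abundance(counts):
    """Convert integer counts to fractions of the total (all zero if empty)."""
    total = sum(counts.values())
    if total == 0:
        return {tax_id: 0.0 for tax_id in counts}
    return {tax_id: count / total for tax_id, count in counts.items()}


# Made-up rows standing in for the contents of standardised.tsv.
example = io.StringIO("taxonomy_id\tcount\n562\t30\n1280\t10\n")
counts = read_standardised(example)
fractions = relative_abundance(counts)
print(fractions)
```

To use it on a real file, replace the `io.StringIO` object with `open("standardised.tsv", newline="")`.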

Merge

Converting single tables is nice, but hopefully you have many shiny samples to analyze. The taxpasta merge command works similarly to standardise except that you provide multiple profiles as input. You can grab a few more 'MOCK' examples from our test data and try it out.

```shell
LOCATION=https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan
curl -O "${LOCATION}/MOCK_001_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_003_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"

taxpasta merge -p metaphlan -o merged.tsv MOCK*.metaphlan3_profile.txt
```

The output of the merge command has one column for the taxonomic identifier and one more column for each input profile. Again, have a look at the full getting started tutorial for a more thorough introduction.
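The shape of that merged table can be sketched in a few lines of plain Python. This is only an illustration of the wide layout, not how taxpasta merges profiles: the function `merge_profiles`, the sample names, and the counts are hypothetical, and filling in `0` for a taxon that is absent from a sample is an assumption made here.

```python
def merge_profiles(profiles):
    """Combine {sample: {taxonomy_id: count}} into a header plus wide rows."""
    samples = list(profiles)
    # Union of all taxonomy identifiers seen in any sample, in sorted order.
    tax_ids = sorted({tax_id for counts in profiles.values() for tax_id in counts})
    header = ["taxonomy_id", *samples]
    rows = [
        # Assumption: a taxon missing from a sample is reported as 0.
        [tax_id, *(profiles[sample].get(tax_id, 0) for sample in samples)]
        for tax_id in tax_ids
    ]
    return header, rows


# Made-up per-sample counts keyed by taxonomy identifier.
profiles = {
    "sample_a": {"562": 30, "1280": 10},
    "sample_b": {"562": 5, "1351": 7},
}
header, rows = merge_profiles(profiles)
print(header)
for row in rows:
    print(row)
```

Identifiers are kept as strings here, so the rows are sorted lexicographically rather than numerically.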

Citation

If you use TAXPASTA in your academic work, please cite our article in the Journal of Open Source Software.

Beber, M. E., Borry, M., Stamouli, S., & Fellows Yates, J. A. (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627. https://doi.org/10.21105/joss.05627

Acknowledgments

Many thanks to:

Copyright

  • Copyright © 2022-2024, Moritz E. Beber, Maxime Borry, James A. Fellows Yates, and Sofia Stamouli.
  • Free software distributed under the Apache Software License 2.0.

Owner

  • Name: taxprofiler
  • Login: taxprofiler
  • Kind: organization

JOSS Publication

TAXPASTA: TAXonomic Profile Aggregation and STAndardisation
Published
July 11, 2023
Volume 8, Issue 87, Page 5627
Authors
Moritz E. Beber ORCID
Unseen Bio ApS, Copenhagen, Denmark
Maxime Borry ORCID
Microbiome Sciences Group, Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, Associated Research Group of Archaeogenetics, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany
Sofia Stamouli ORCID
Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Solna, Sweden
James A. Fellows Yates ORCID
Microbiome Sciences Group, Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, Associated Research Group of Archaeogenetics, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany, Department of Paleobiotechnology, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany
Editor
Kevin M. Moerman ORCID
Tags
bioinformatics metagenomics profiling classification standardisation taxonomy Python

Citation (CITATION.cff)

cff-version: "1.2.0"
title: >-
  TAXPASTA: TAXonomic Profile Aggregation and
  STAndardisation
message: >-
  If you use this software, please cite our article in the
  Journal of Open Source Software.
doi: 10.5281/zenodo.8105840
authors:
  - given-names: Moritz E.
    family-names: Beber
    affiliation: "Unseen Bio ApS, Copenhagen, Denmark"
    orcid: "https://orcid.org/0000-0003-2406-1978"
  - given-names: Maxime
    family-names: Borry
    affiliation: >-
      Microbiome Sciences Group, Department of
      Archaeogenetics, Max Planck Institute for Evolutionary
      Anthropology, Leipzig, Germany
    orcid: "https://orcid.org/0000-0001-9140-7559"
  - given-names: Sofia
    family-names: Stamouli
    orcid: "https://orcid.org/0009-0006-0893-3771"
    affiliation: >-
      Department of Microbiology, Tumor and Cell Biology,
      Karolinska Institute, Solna, Sweden
  - given-names: James A.
    family-names: Fellows Yates
    orcid: "https://orcid.org/0000-0001-5585-6277"
    affiliation: >-
      Microbiome Sciences Group, Department of
      Archaeogenetics, Max Planck Institute for Evolutionary
      Anthropology, Leipzig, Germany
contact:
- family-names: Beber
  given-names: Moritz E.
  orcid: "https://orcid.org/0000-0003-2406-1978"
repository-code: "https://github.com/taxprofiler/taxpasta"
url: "https://taxpasta.readthedocs.io/"
repository-artifact: "https://zenodo.org/record/8105840"
preferred-citation:
  authors:
  - family-names: Beber
    given-names: Moritz E.
    orcid: "https://orcid.org/0000-0003-2406-1978"
  - family-names: Borry
    given-names: Maxime
    orcid: "https://orcid.org/0000-0001-9140-7559"
  - family-names: Stamouli
    given-names: Sofia
    orcid: "https://orcid.org/0009-0006-0893-3771"
  - family-names: Fellows Yates
    given-names: James A.
    orcid: "https://orcid.org/0000-0001-5585-6277"
  date-published: 2023-07-11
  doi: 10.21105/joss.05627
  issn: 2475-9066
  issue: 87
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5627
  title: "TAXPASTA: TAXonomic Profile Aggregation and STAndardisation"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05627"
  volume: 8
keywords:
  - bioinformatics
  - metagenomics
  - profiling
  - classification
  - standardisation
  - taxonomy
  - Python
license: Apache-2.0

GitHub Events

Total
  • Issues event: 4
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 17
  • Push event: 6
  • Pull request review event: 3
  • Pull request review comment event: 3
  • Pull request event: 1
  • Fork event: 2
  • Create event: 2
Last Year
  • Issues event: 4
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 17
  • Push event: 6
  • Pull request review event: 3
  • Pull request review comment event: 3
  • Pull request event: 1
  • Fork event: 2
  • Create event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 602
  • Total Committers: 6
  • Avg Commits per committer: 100.333
  • Development Distribution Score (DDS): 0.156
Past Year
  • Commits: 40
  • Committers: 2
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.05
Top Committers
Name Email Commits
Moritz E. Beber m****r@p****t 508
James Fellows Yates j****3@g****m 43
Sofia Stamouli s****i@s****e 38
maxibor m****y@g****m 9
Maxime Borry m****r 3
Tim Van Rillaer t****r@h****m 1
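The all-time figures above can be reproduced from the commit counts in this table. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page and is an assumption.

```python
# Commit counts copied from the Top Committers table.
commits = {
    "Moritz E. Beber": 508,
    "James Fellows Yates": 43,
    "Sofia Stamouli": 38,
    "maxibor": 9,
    "Maxime Borry": 3,
    "Tim Van Rillaer": 1,
}

total = sum(commits.values())
average = total / len(commits)
# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total

print(total, round(average, 3), round(dds, 3))
```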
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 69
  • Total pull requests: 85
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 19
  • Total pull request authors: 6
  • Average comments per issue: 1.71
  • Average comments per pull request: 1.67
  • Merged pull requests: 78
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 6.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jfy133 (25)
  • Midnighter (17)
  • maxibor (3)
  • MajoroMask (3)
  • sofstam (2)
  • alexhbnr (2)
  • UmaJan (1)
  • prototaxites (1)
  • LilyAnderssonLee (1)
  • Swindle98 (1)
  • paulzierep (1)
  • apcamargo (1)
  • kdm9 (1)
  • SannaAb (1)
  • nvhphuc1206 (1)
Pull Request Authors
  • Midnighter (51)
  • jfy133 (20)
  • sofstam (8)
  • maxibor (4)
  • xuanxu (1)
Top Labels
Issue Labels
enhancement (17) bug (16) new taxonomic profiler (9) documentation (3) question (3) help wanted (1)
Pull Request Labels
documentation (6) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 40 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 13
  • Total maintainers: 3
pypi.org: taxpasta

TAXonomic Profile Aggregation and STAndardisation

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 40 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 10.7%
Forks count: 13.6%
Average: 15.5%
Stargazers count: 16.1%
Dependent repos count: 30.6%
Maintainers (3)
Last synced: 4 months ago

Dependencies

.github/workflows/cron.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/main.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • softprops/action-gh-release v0.1 composite
pyproject.toml pypi
  • depinfo ~=2.2
  • numpy ~=1.20
  • pandas ~=1.4
  • pandera ~=0.14
  • taxopy ~=0.10
  • typer ~=0.6