TAXPASTA
TAXPASTA: TAXonomic Profile Aggregation and STAndardisation - Published in JOSS (2023)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
TAXonomic Profile Aggregation and STAndardisation
Basic Info
- Host: GitHub
- Owner: taxprofiler
- License: apache-2.0
- Language: Python
- Default Branch: dev
- Homepage: https://taxpasta.readthedocs.io/
- Size: 1.99 MB
Statistics
- Stars: 41
- Watchers: 4
- Forks: 7
- Open Issues: 13
- Releases: 13
Metadata Files
README.md
TAXonomic Profile Aggregation and STAndardisation
About
The main purpose of taxpasta is to standardise taxonomic profiles created by a range of bioinformatics tools. We call those tools taxonomic profilers. They each come with their own particular tabular output format. Across the profilers, relative abundances can be reported in read counts, fractions, or percentages, as well as any number of additional columns with extra information. We therefore decided to take the lessons learnt to heart and provide our own solution to deal with this pasticcio. With taxpasta you can ingest all of those formats and, at a minimum, output taxonomy identifiers and their integer counts. Taxpasta can not only standardise profiles but also merge them across samples for the same profiler into a single table.

Supported Taxonomic Profilers
Taxpasta currently supports standardisation and generation of comparable taxonomic tables for a range of taxonomic profilers. See supported profilers for the full list and more information.
Install
It's as simple as:
```shell
pip install taxpasta
```
Taxpasta is also available from the Bioconda channel
```shell
conda install -c bioconda taxpasta
```
and thus automatically generated Docker and Singularity BioContainers images also exist.
Optional Dependencies
Taxpasta supports a number of extras that you can install for additional features, primarily support for additional output file formats. You can install them by specifying a comma-separated list within square brackets, for example,
```shell
pip install 'taxpasta[rich,biom]'
```
- `rich` provides rich-formatted command line output and logging.
- `arrow` supports writing output tables in Apache Arrow format.
- `parquet` supports writing output tables in Apache Parquet format.
- `biom` supports writing output tables in BIOM format.
- `ods` supports writing output tables in ODS format.
- `xlsx` supports writing output tables in Microsoft Excel format.
- `all` includes all of the above.
- `dev` provides all tools needed for contributing to taxpasta.
Usage
The main entry point for taxpasta is its command-line interface (CLI). You can interactively explore the offered commands through the help system.
```shell
taxpasta -h
```
Taxpasta currently offers two commands corresponding to the main use-cases. You can find out more in the commands' documentation.
Standardise
Since the supported profilers all produce their own flavour of tabular output, a quick way to normalize such files is to standardise them with taxpasta. You need to tell taxpasta which tool created the file. As an example, let's standardise a MetaPhlAn profile. (You can find an example file in our test data.)
```shell
curl -O https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt
taxpasta standardise -p metaphlan -o standardised.tsv MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt
```
With these minimal arguments, taxpasta produces a two-column output consisting of

| taxonomy_id | count |
| ----------- | ----- |
| | |

You can count on the second column being integers :wink:. Having such a simple and tidy table should make your downstream analysis much smoother to start out with. Please have a look at the full getting started tutorial for a more thorough introduction.
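As a rough illustration of how such a table might be consumed downstream, the following sketch reads a standardised profile using only Python's standard library. It assumes the file is tab-separated and carries the `taxonomy_id` and `count` header shown above; the function name `read_standardised` and the sample values are hypothetical and not part of taxpasta.

```python
import csv
import io


def read_standardised(handle):
    """Return a dict mapping taxonomy ID (as a string) to integer count.

    Assumes a tab-separated table with the columns `taxonomy_id` and `count`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["taxonomy_id"]: int(row["count"]) for row in reader}


# Made-up content standing in for `standardised.tsv`.
example = io.StringIO("taxonomy_id\tcount\n562\t120\n1280\t35\n")
profile = read_standardised(example)
total = sum(profile.values())
print(profile, total)
```

To read a real file instead of the in-memory example, pass `open("standardised.tsv", newline="")` to the function.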
Merge
Converting single tables is nice, but hopefully you have many shiny samples to analyze. The taxpasta merge command works similarly to standardise except that you provide multiple profiles as input. You can grab a few more 'MOCK' examples from our test data and try it out.
```shell
LOCATION=https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan
curl -O "${LOCATION}/MOCK_001_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_003_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"

taxpasta merge -p metaphlan -o merged.tsv MOCK*.metaphlan3_profile.txt
```
The output of the merge command has one column for the taxonomic identifier and one more column for each input profile. Again, have a look at the full getting started tutorial for a more thorough introduction.
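To make the shape of that merged table concrete, here is a minimal conceptual sketch in plain Python. It is not taxpasta's implementation: the function `merge_profiles`, the sample names, the counts, the `taxonomy_id` header label, and the choice to fill taxa missing from a sample with 0 are all assumptions made for illustration.

```python
def merge_profiles(profiles):
    """Combine per-sample count dicts into the rows of a wide table.

    `profiles` maps a sample name to a {taxonomy_id: count} dict.
    Taxa absent from a sample get a count of 0 (an assumption of this sketch).
    """
    samples = list(profiles)
    # Union of all taxonomy IDs seen in any sample, in a stable order.
    taxa = sorted({tax for counts in profiles.values() for tax in counts})
    header = ["taxonomy_id", *samples]
    rows = [[tax, *(profiles[s].get(tax, 0) for s in samples)] for tax in taxa]
    return header, rows


# Made-up counts for two hypothetical samples.
header, rows = merge_profiles(
    {
        "MOCK_001": {"562": 120, "1280": 35},
        "MOCK_002": {"562": 80, "1423": 12},
    }
)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

Each row holds one taxonomy identifier followed by one count per input profile, which mirrors the layout described above.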
Citation
If you use TAXPASTA in your academic work, please cite our article in the Journal of Open Source Software.
Beber, M. E., Borry, M., Stamouli, S., & Fellows Yates, J. A. (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627. https://doi.org/10.21105/joss.05627
Acknowledgments
Many thanks to:
- nf-core for bringing together the original developers
- Zandra Fagernäs for the logo design
Copyright
- Copyright © 2022-2024, Moritz E. Beber, Maxime Borry, James A. Fellows Yates, and Sofia Stamouli.
- Free software distributed under the Apache Software License 2.0.
Owner
- Name: taxprofiler
- Login: taxprofiler
- Kind: organization
- Repositories: 2
- Profile: https://github.com/taxprofiler
JOSS Publication
TAXPASTA: TAXonomic Profile Aggregation and STAndardisation
Authors
Microbiome Sciences Group, Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, Associated Research Group of Archaeogenetics, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany
Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Solna, Sweden
Microbiome Sciences Group, Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, Associated Research Group of Archaeogenetics, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany, Department of Paleobiotechnology, Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Jena, Germany
Tags
bioinformatics metagenomics profiling classification standardisation taxonomy Python
Citation (CITATION.cff)
cff-version: "1.2.0"
title: >-
  TAXPASTA: TAXonomic Profile Aggregation and
  STAndardisation
message: >-
  If you use this software, please cite our article in the
  Journal of Open Source Software.
doi: 10.5281/zenodo.8105840
authors:
  - given-names: Moritz E.
    family-names: Beber
    affiliation: "Unseen Bio ApS, Copenhagen, Denmark"
    orcid: "https://orcid.org/0000-0003-2406-1978"
  - given-names: Maxime
    family-names: Borry
    affiliation: >-
      Microbiome Sciences Group, Department of
      Archaeogenetics, Max Planck Institute for Evolutionary
      Anthropology, Leipzig, Germany
    orcid: "https://orcid.org/0000-0001-9140-7559"
  - given-names: Sofia
    family-names: Stamouli
    orcid: "https://orcid.org/0009-0006-0893-3771"
    affiliation: >-
      Department of Microbiology, Tumor and Cell Biology,
      Karolinska Institute, Solna, Sweden
  - given-names: James A.
    family-names: Fellows Yates
    orcid: "https://orcid.org/0000-0001-5585-6277"
    affiliation: >-
      Microbiome Sciences Group, Department of
      Archaeogenetics, Max Planck Institute for Evolutionary
      Anthropology, Leipzig, Germany
contact:
  - family-names: Beber
    given-names: Moritz E.
    orcid: "https://orcid.org/0000-0003-2406-1978"
repository-code: "https://github.com/taxprofiler/taxpasta"
url: "https://taxpasta.readthedocs.io/"
repository-artifact: "https://zenodo.org/record/8105840"
preferred-citation:
  authors:
    - family-names: Beber
      given-names: Moritz E.
      orcid: "https://orcid.org/0000-0003-2406-1978"
    - family-names: Borry
      given-names: Maxime
      orcid: "https://orcid.org/0000-0001-9140-7559"
    - family-names: Stamouli
      given-names: Sofia
      orcid: "https://orcid.org/0009-0006-0893-3771"
    - family-names: Fellows Yates
      given-names: James A.
      orcid: "https://orcid.org/0000-0001-5585-6277"
  date-published: 2023-07-11
  doi: 10.21105/joss.05627
  issn: 2475-9066
  issue: 87
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5627
  title: "TAXPASTA: TAXonomic Profile Aggregation and STAndardisation"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05627"
  volume: 8
keywords:
  - bioinformatics
  - metagenomics
  - profiling
  - classification
  - standardisation
  - taxonomy
  - Python
license: Apache-2.0
GitHub Events
Total
- Issues event: 4
- Watch event: 6
- Delete event: 1
- Issue comment event: 17
- Push event: 6
- Pull request review event: 3
- Pull request review comment event: 3
- Pull request event: 1
- Fork event: 2
- Create event: 2
Last Year
- Issues event: 4
- Watch event: 6
- Delete event: 1
- Issue comment event: 17
- Push event: 6
- Pull request review event: 3
- Pull request review comment event: 3
- Pull request event: 1
- Fork event: 2
- Create event: 2
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Moritz E. Beber | m****r@p****t | 508 |
| James Fellows Yates | j****3@g****m | 43 |
| Sofia Stamouli | s****i@s****e | 38 |
| maxibor | m****y@g****m | 9 |
| Maxime Borry | m****r | 3 |
| Tim Van Rillaer | t****r@h****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 69
- Total pull requests: 85
- Average time to close issues: about 2 months
- Average time to close pull requests: 5 days
- Total issue authors: 19
- Total pull request authors: 6
- Average comments per issue: 1.71
- Average comments per pull request: 1.67
- Merged pull requests: 78
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 6.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jfy133 (25)
- Midnighter (17)
- maxibor (3)
- MajoroMask (3)
- sofstam (2)
- alexhbnr (2)
- UmaJan (1)
- prototaxites (1)
- LilyAnderssonLee (1)
- Swindle98 (1)
- paulzierep (1)
- apcamargo (1)
- kdm9 (1)
- SannaAb (1)
- nvhphuc1206 (1)
Pull Request Authors
- Midnighter (51)
- jfy133 (20)
- sofstam (8)
- maxibor (4)
- xuanxu (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi 40 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 13
- Total maintainers: 3
pypi.org: taxpasta
TAXonomic Profile Aggregation and STAndardisation
- Homepage: https://github.com/taxprofiler/taxpasta
- Documentation: https://taxpasta.readthedocs.io
- License: Apache Software License
- Latest release: 0.7.0 (published over 1 year ago)
Rankings
Maintainers (3)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- softprops/action-gh-release v0.1 composite
- depinfo ~=2.2
- numpy ~=1.20
- pandas ~=1.4
- pandera ~=0.14
- taxopy ~=0.10
- typer ~=0.6