https://github.com/alberdilab/barcodemapper
Quantification of dietary items from animal-derived metagenomes
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.9%) to scientific vocabulary
Repository
Quantification of dietary items from animal-derived metagenomes
Basic Info
- Host: GitHub
- Owner: alberdilab
- Language: Python
- Default Branch: main
- Size: 453 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md

BarcodeMapper is a software tool for detecting and quantifying plant, fungal, and animal DNA by mapping shotgun sequencing reads to taxonomically annotated COI and ITS marker gene sequences. Leveraging filtered versions of the BOLD and UNITE databases, it enables identification of host species from faecal samples, provides insights into dietary composition, and supports the detection of potential contaminants.
1. Installation
BarcodeMapper can be installed with its python dependencies (but not snakemake, fastp, samtools and bowtie2) directly from this repository using pip.
sh
pip install git+https://github.com/alberdilab/barcodemapper.git
barcodemapper -h
If you don't have the dependencies listed below installed you can create a custom ready-to-use environment for BarcodeMapper with fully compatible software versions using conda.
sh
wget https://raw.githubusercontent.com/alberdilab/barcodemapper/main/barcodemapper_env.yml
conda env create --file barcodemapper_env.yml
conda activate barcodemapper
barcodemapper -h
Dependencies
Software
- snakemake
- fastp
- samtools
- bowtie2
Python modules
- numpy
- pandas
- argparse
- PyYAML
- requests
- plotly
2. Source databases
There are two main modes for running BarcodeMapper: a) using the default BarcodeMapper database or b) generating a custom database from raw BOLD and UNITE databases.
BarcodeMapper database
This is the standard database containing marker genes sequences of plants, basidiomycota fungi, and animals. It was generated in March 2025 with the most updated versions of BOLD (2025-03-14) and UNITE (*All eukaryotes; 2025-02-19) databases at the time. The database can be stored in any directory, and can be used in BarcodeMapper by pointing towards the main fasta file -d directory/to/database/barcodemapper_db_202503.fa.
sh
cd barcodemapper_db
wget https://sid.erda.dk/share_redirect/FGathhqKb5/barcodemapper_db_202503.tar.gz
tar -xzvf barcodemapper_db_202503.tar.gz
Custom database
If you want to use different versions of the databases (e.g., a more recent update) or a different taxonomic breadth (e.g., exclude certain animals from the database), you can download the original databases and create a custom BarcodeMapper database.
BOLD
- Login to BOLDSystems (you need a user account).
- Visit https://bench.boldsystems.org/index.php/datapackages/Latest
- Obtain the link of the FASTA version of the database. Note that when making the download request a unique download link like the one shown below will be generated, so this step cannot be reproduced directly.
- Download the FASTA file (gz compression) and decompress it.
sh
cd barcodemapper_db
curl -L -o BOLD_Public.14-Mar-2025.fa.gz "https://bench.boldsystems.org/index.php/API_Datapackage/fasta?id=BOLD_Public.14-Mar-2025&uid=167dcd55552bc4"
gunzip BOLD_Public.14-Mar-2025.fa.gz
UNITE
- Visit: https://unite.ut.ee/repository.php
- Obtain the link of the FASTA version of the All Eukaryotes database.
- Download the FASTA file (tgz compression) and decompress it.
sh
cd barcodemapper_db
wget -O sh_general_release_dynamic_s_all_19.02.2025.tgz https://s3.hpc.ut.ee/plutof-public/original/b02db549-5f04-43fc-afb6-02888b594d10.tgz
tar xvf sh_general_release_dynamic_s_all_19.02.2025.tgz
3. BarcodeMapper usage
With existing BarcodeMapper database
Just provide the input data (-i), the path to the BarcodeMapper database (-d) and the output (-o) file.
sh
barcodemapper -i path/to/reads_folder -d barcodemapper_db_202503.fa -o final_file.txt
Or alternatively, add your forward (-1) and reverse (-2) sequencing reads as comma-separated lists.
sh
barcodemapper -1 sample1_1.fq.gz,sample2_1.fq.gz -2 sample1_2.fq.gz,sample2_2.fq.gz -d barcodemapper_db_202503.fa -o final_file.txt
Note that while BarcodeMapper can be run on individual samples, it is primarily designed for batch processing to generate combined output files (taxonomy file and sunburst plot).
With original BOLD and UNITE databases
Provide the paths to the original BOLD (-b) and UNITE (-u) databases, as well as the destination of the BarcodeMapper database (-d). Optionally, limit the taxa to be included in the BarcodeMapper database using -x for limiting BOLD taxa and -y for limiting UNITE taxa. Once the BarcodeMapper database has been generated, you will only need to use (-d).
sh
barcodemapper -i path/to/readsdir -b barcodemapper_db/BOLD_Public.14-Mar-2025.fa -u barcodemapper_db/sh_general_release_dynamic_s_all_19.02.2025.fasta -d barcodemapper_db/barcodemapper_db_202503.fa -x k__Animalia -y k__Viridiplantae,p__Basidiomycota -o barcodemapper_results.txt
If you only want to create the database without running any analysis, use the --build argument without input and output arguments besides the databases.
sh
barcodemapper --build -b barcodemapper_db/BOLD_Public.14-Mar-2025.fa -u barcodemapper_db/sh_general_release_dynamic_s_all_19.02.2025.fasta -d barcodemapper_db/barcodemapper_db_202503.fa -x k__Animalia -y k__Viridiplantae,p__Basidiomycota
On a computational cluster
When processing multiple samples on a computational cluster, it is recommended to use a screen session and SLURM to run the pipeline efficiently—optimising resource usage and adhering to the job queue system.
sh
screen -S barcodemapper
barcodemapper -i path/to/readsdir -d barcodemapper_db_202503.fa -o final_file.txt --slurm
Modulate mapping stringency
BarcodeMapper allows to hone the stringency of the mapping by tweaking two parameters:
Minimum coverage (-c, --min_coverage): specifies the minimum number of nucleotide bases that must align with the reference sequence.
Maximum percentage of mismatches (-m, --max_mismatches): defines the maximum percentage of mismatches allowed in the alignment. For instance, if -c is set to 150, and -m to 2 (%), then up to 3 mismatches will be permitted (150*2/100=3) for a read to be considered mapped.
To simplify this process, BarcodeMapper offers four preset mapping modes, each setting specific -c and -m values to control stringency levels:
- Ultrastrict (--ultrastrict): sets -c to 120 and -m to 0. This mode minimises false positives by requiring perfect matches across a large portion of the sequence. While ideal for high-confidence species-level annotations, it may miss valid mappings due to its high stringency, increasing the false negative rate.
sh
barcodemapper -i path/to/readsdir -d barcodemapper_db_202503.fa -o final_file.txt --ultrastrict
- Strict (--strict): sets -c to 100 and -m to 2. This mode balances accuracy and flexibility, allowing for minor differences between sample reads and reference sequences. It accounts for natural intraspecific variability that is naturally found across different environments or geographic regions, slightly increasing the false positive rate while significantly reducing false negatives.
sh
barcodemapper -i path/to/readsdir -d barcodemapper_db_202503.fa -o final_file.txt --strict
- Standard (--standard or nothing): sets -c to 80 and -m to 3. The default mode, designed to strike a practical balance between sensitivity and specificity. It may miss some species-level annotations but tends to recover a broader range of higher taxonomic levels, offering a more comprehensive and realistic view of the sample’s biodiversity. Recommended for general use.
```sh barcodemapper -i path/to/readsdir -d barcodemapperdb202503.fa -o final_file.txt --standard
or
barcodemapper -i path/to/readsdir -d barcodemapperdb202503.fa -o final_file.txt ```
- Permissive (--permissive): sets -c to 60 and -m to 5. The most relaxed setting, aimed at capturing the widest possible taxonomic breadth. While it allows for more mismatches and may miss fine-grained identifications, it provides valuable insights at broader taxonomic levels (e.g., class or phylum). Ideal when species-level resolution is not a priority.
sh
barcodemapper -i path/to/readsdir -d barcodemapper_db_202503.fa -o final_file.txt --permissive
4. Output files
Taxonomy table
The main output of BarcodeMapper is a quantitative taxonomy table that displays the number of reads mapped to each taxonomic level. Read counts are aggregated from lower to higher taxonomic ranks, providing the most specific classification possible—typically at the species level—when a read is uniquely mapped to a species-annotated reference. In cases where reads map to multiple taxa, the pipeline assigns the lowest common taxonomic level.
Taxonomy SAMPLE1 SAMPLE2
k__Fungi 244 2
k__Fungi;p__Basidiomycota 244 2
k__Fungi;p__Basidiomycota;c__Tremellomycetes 223 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Holtermanniales 1 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Holtermanniales;f__Holtermanniaceae 1 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Holtermanniales;f__Holtermanniaceae;g__Holtermannia 1 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Holtermanniales;f__Holtermanniaceae;g__Holtermannia;s__Holtermannia_saccardoi 1 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales 182 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Cryptococcaceae 6 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Cryptococcaceae;g__Kwoniella 6 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Cryptococcaceae;g__Kwoniella;s__Kwoniella_shandongensis 6 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Tremellaceae 59 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Tremellaceae;g__Tremella 59 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Tremellaceae;g__Tremella;s__Tremella_cheejenii 23 0
k__Fungi;p__Basidiomycota;c__Tremellomycetes;o__Tremellales;f__Tremellaceae;g__Tremella;s__Tremella_mesenterica 36 0
k__Viridiplantae 23 19
k__Viridiplantae;p__Anthophyta 23 12
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae 20 12
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales 19 9
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Betulaceae 0 8
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Betulaceae;g__Ostrya 0 7
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Betulaceae;g__Ostrya;s__Ostrya_carpinifolia 0 2
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae 13 0
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae;g__Castanea 4 0
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae;g__Castanea;s__Castanea_dentata 1 0
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae;g__Castanea;s__Castanea_seguinii 1 0
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae;g__Lithocarpus 1 0
k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Fagales;f__Fagaceae;g__Lithocarpus;s__Lithocarpus_corneus 1 0
[Note that many lines have been removed from the above example for the sake of visualisation]
Report
BarcodeMapper also yields a HTML report in which the taxonomic classifications of all samples are visualised in barplots and sunburst plots.

Owner
- Name: alberdilab
- Login: alberdilab
- Kind: organization
- Repositories: 1
- Profile: https://github.com/alberdilab
GitHub Events
Total
- Push event: 13
Last Year
- Push event: 13
Dependencies
- PyYAML *
- argparse *
- numpy *
- pandas *
- requests *
- snakemake *