https://github.com/alberdilab/drakkar
Metagenomics pipeline optimised for Mjolnir
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.6%) to scientific vocabulary
Repository
Metagenomics pipeline optimised for Mjolnir
Basic Info
- Host: GitHub
- Owner: alberdilab
- Language: Python
- Default Branch: main
- Size: 500 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md

DRAKKAR is a Snakemake-based genome-resolved metagenomics pipeline optimised for Mjolnir. Snakemake works together with Slurm to run the long pipeline, requesting the memory and time each job needs. The pipeline is built in a modular fashion, so that either the entire workflow or only parts of it can be executed. An extended usage tutorial can be found at https://drakkar.readthedocs.io/
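Snakemake lets each rule compute its Slurm resources with a function that receives the rule's `input` and the retry `attempt`. The README does not describe DRAKKAR's own resource rules. The sketch below is therefore a hypothetical illustration of one common strategy: memory and runtime scale with input size and double on every retry. The function names and constants (`base_mb`, `per_input_mb`, `base_min`, `per_input_min`) are assumptions.

```python
import math


def estimate_mem_mb(input_size_mb: float, attempt: int,
                    base_mb: int = 8000, per_input_mb: float = 2.0) -> int:
    """Hypothetical memory request (MB) for one Slurm job.

    Memory grows linearly with the job's input size and doubles on every
    retry, so a job killed for exceeding its allocation is resubmitted
    with more memory.
    """
    estimate = base_mb + per_input_mb * input_size_mb
    return math.ceil(estimate * 2 ** (attempt - 1))


def estimate_runtime_min(input_size_mb: float, attempt: int,
                         base_min: int = 30, per_input_min: float = 0.05) -> int:
    """Hypothetical runtime request (minutes), scaled the same way."""
    estimate = base_min + per_input_min * input_size_mb
    return math.ceil(estimate * 2 ** (attempt - 1))


print(estimate_mem_mb(1000, attempt=1))  # 10000
print(estimate_mem_mb(1000, attempt=2))  # 20000
```

In a Snakefile, a function like this could be wired in as `resources: mem_mb=lambda wildcards, input, attempt: estimate_mem_mb(input.size_mb, attempt)`, with job retries enabled in Snakemake so that later attempts receive larger allocations.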
Quickstart
module load drakkar/1.0.0
drakkar complete -f input_info.tsv -o drakkar_output
Modules
DRAKKAR is modular, so each section of the genome-resolved metagenomics pipeline can be executed independently.
- Preprocessing: quality-filters the reads and optionally removes host DNA.
drakkar preprocessing {arguments}
- Cataloging: assembles and bins the metagenomic reads using multiple strategies.
drakkar cataloging {arguments}
- Annotating: annotates the bins taxonomically and/or functionally, and/or generates community-scale metabolic networks.
drakkar annotating {arguments}
- Profiling: conducts genome- or pangenome-based quantitative analyses.
drakkar profiling {arguments}
Complete mode
All DRAKKAR modules can be run together in complete mode.
drakkar complete {arguments}
Usage examples
Without sample info file
Minimum usage
drakkar complete -i {input_path} -o {output_path}
-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path where the DRAKKAR outputs will be stored.
Metagenomic reads are not mapped to a host genome, individual assemblies are performed, and genome-based profiling is conducted.
With reference genome
drakkar complete -i {input_path} -o {output_path} -r {genome_path}
-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path where the DRAKKAR outputs will be stored.
-r: path to the host reference genome.
Metagenomic reads are mapped to the host genome, individual assemblies are performed, and genome-based profiling is conducted.
With reference genome and assembly mode
drakkar complete -i {input_path} -o {output_path} -r {genome_path} -m individual,all -t genomes,pangenomes
-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path where the DRAKKAR outputs will be stored.
-r: path to the host reference genome.
-m: comma-separated list of assembly modes.
-t: comma-separated list of profiling types.
Metagenomic reads are mapped to the host genome, individual assemblies and a single coassembly including all samples are performed, and both genome- and pangenome-based profiling are conducted.
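The `complete` options used in these examples can be modelled as an `argparse` parser (argparse is one of DRAKKAR's listed dependencies). This is a hypothetical sketch of how the flags behave, not DRAKKAR's own parser. The mutual exclusivity of `-i`/`-f`, the destination names and the `comma_list` helper are assumptions. The defaults mirror the behaviour described for the minimum-usage example: individual assemblies and genome-based profiling.

```python
import argparse


def comma_list(value: str) -> list[str]:
    """Split a comma-separated option value such as 'individual,all'."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="drakkar")
    subparsers = parser.add_subparsers(dest="module", required=True)
    complete = subparsers.add_parser("complete", help="run all modules")
    # Assumption: reads come either from a folder (-i) or a sample info file (-f).
    source = complete.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", dest="input", help="folder with metagenomic reads")
    source.add_argument("-f", dest="info_file", help="sample info file")
    complete.add_argument("-o", dest="output", required=True, help="output folder")
    complete.add_argument("-r", dest="reference", help="host reference genome")
    complete.add_argument("-m", dest="modes", type=comma_list,
                          help="assembly modes, e.g. individual,all")
    complete.add_argument("-t", dest="profiling", type=comma_list,
                          help="profiling types, e.g. genomes,pangenomes")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Defaults taken from the usage examples: with -i and no -m, individual
    # assemblies are performed; profiling is genome-based unless -t is given.
    if args.modes is None:
        args.modes = ["individual"] if args.input else []
    if args.profiling is None:
        args.profiling = ["genomes"]
    return args


args = parse_args(["complete", "-i", "reads", "-o", "out", "-r", "host.fna",
                   "-m", "individual,all", "-t", "genomes,pangenomes"])
print(args.modes, args.profiling)  # ['individual', 'all'] ['genomes', 'pangenomes']
```

With `-f`, the sketch leaves `modes` empty by default, because the assemblies are then taken from the sample info file.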
With sample info file
|sample|rawreads1|rawreads2|referencename|referencepath|assembly|
|---|---|---|---|---|---|
|sample1|path/sample11.fq.gz|path/sample12.fq.gz|ref1|path/ref1.fna|assembly1,all|
|sample1|path/sample11.fq.gz|path/sample12.fq.gz|ref1|path/ref1.fna|assembly1,all|
|sample2|path/sample21.fq.gz|path/sample22.fq.gz|ref1|path/ref1.fna|assembly2,all|
|sample3|path/sample31.fq.gz|path/sample32.fq.gz|ref2|path/ref2.fna|assembly2,all|
|sample4|path/sample41.fq.gz|path/sample42.fq.gz|ref2|path/ref2.fna|assembly2,all|
|sample4|path/sample41.fq.gz|path/sample42.fq.gz|ref2|path/ref2.fna|assembly2,all|
Minimum usage
drakkar complete -f {info_file} -o {output_path}
All the required information is extracted from the sample info file.
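A sample info file like the example table could be read with Python's standard `csv` module. The sketch below assumes the file is tab-separated, as the `input_info.tsv` name in the quickstart suggests. It also assumes that all six columns are required and that repeated sample names add extra read pairs to the same sample. `read_sample_info` is a hypothetical helper and not part of DRAKKAR.

```python
import csv
import io
from collections import defaultdict

# Column names as shown in the example table; assumed here to all be required.
REQUIRED_COLUMNS = ["sample", "rawreads1", "rawreads2",
                    "referencename", "referencepath", "assembly"]


def read_sample_info(handle) -> dict:
    """Group the rows of a tab-separated sample info file by sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample info file lacks columns: {missing}")
    samples = defaultdict(lambda: {"reads": [], "reference": None, "assemblies": set()})
    for row in reader:
        entry = samples[row["sample"]]
        # Assumption: a repeated sample name contributes another read pair.
        entry["reads"].append((row["rawreads1"], row["rawreads2"]))
        entry["reference"] = (row["referencename"], row["referencepath"])
        # The assembly column is a comma-separated list of assembly names.
        entry["assemblies"].update(
            name.strip() for name in row["assembly"].split(",") if name.strip())
    return dict(samples)


example = io.StringIO(
    "sample\trawreads1\trawreads2\treferencename\treferencepath\tassembly\n"
    "sample1\tpath/sample11.fq.gz\tpath/sample12.fq.gz\tref1\tpath/ref1.fna\tassembly1,all\n"
    "sample2\tpath/sample21.fq.gz\tpath/sample22.fq.gz\tref1\tpath/ref1.fna\tassembly2,all\n"
)
info = read_sample_info(example)
print(sorted(info["sample1"]["assemblies"]))  # ['all', 'assembly1']
```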
With additional assembly mode
drakkar complete -f {info_file} -o {output_path} -m individual
Individual assemblies are conducted in addition to the assemblies specified in the sample info file.
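One way to picture how the `assembly` column and `-m` combine is as a map from each assembly name to the samples it includes. The sketch below rests on three assumptions, none confirmed by the DRAKKAR documentation. First, an `assembly` value names a (co)assembly shared by every sample that lists it. Second, `-m all` corresponds to the `all` coassembly. Third, per-sample assemblies get the hypothetical name `individual_<sample>`.

```python
def assembly_plan(sample_assemblies: dict[str, set[str]],
                  modes: list[str]) -> dict[str, list[str]]:
    """Map each assembly name to the sorted samples it combines.

    sample_assemblies: assembly names per sample, as listed in the info file.
    modes: extra assembly modes passed with -m ('individual' and/or 'all').
    """
    plan: dict[str, set[str]] = {}
    for sample, names in sample_assemblies.items():
        for name in names:
            plan.setdefault(name, set()).add(sample)
    if "individual" in modes:
        for sample in sample_assemblies:
            # Hypothetical naming scheme for per-sample assemblies.
            plan.setdefault(f"individual_{sample}", set()).add(sample)
    if "all" in modes:
        # Assumption: -m all adds every sample to the 'all' coassembly.
        plan.setdefault("all", set()).update(sample_assemblies)
    return {name: sorted(samples) for name, samples in sorted(plan.items())}


plan = assembly_plan({"sample1": {"assembly1", "all"},
                      "sample2": {"assembly2", "all"}}, ["individual"])
print(list(plan))
# ['all', 'assembly1', 'assembly2', 'individual_sample1', 'individual_sample2']
```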
DRAKKAR modules
Preprocessing
- Quality-filtering using fastp
- Reference genome indexing
- Reference genome mapping
- Output of metagenomic and host genomic data
Cataloging
Documentation to be added.
Profiling
Documentation to be added.
Owner
- Name: alberdilab
- Login: alberdilab
- Kind: organization
- Repositories: 1
- Profile: https://github.com/alberdilab
GitHub Events
Total
- Push event: 619
- Create event: 2
Last Year
- Push event: 619
- Create event: 2
Dependencies
- argparse *
- numpy *
- pandas *
- sphinx ==7.1.2
- sphinx-rtd-theme ==1.3.0rc1