https://github.com/alberdilab/drakkar

Metagenomics pipeline optimised for Mjolnir

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (7.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: alberdilab
Language: Python
Default Branch: main
Size: 500 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Releases: 0

Created over 1 year ago · Last pushed 10 months ago

Metadata Files

Readme

DRAKKAR is a Snakemake-based genome-resolved metagenomics pipeline optimised for Mjolnir. Snakemake works together with Slurm to run the long pipeline with appropriate memory and time resources. It is built in a modular fashion, so the entire workflow or only parts of it can be executed. An extended usage tutorial is available at https://drakkar.readthedocs.io/

Quickstart

module load drakkar/1.0.0
drakkar complete -f input_info.tsv -o drakkar_output

Modules

DRAKKAR is modular: each section of the genome-resolved metagenomic pipeline can be executed independently.

Preprocessing: quality-filters the reads and optionally removes host DNA. Command: drakkar preprocessing {arguments}
Cataloging: assembles and bins the metagenomic reads using multiple strategies. Command: drakkar cataloging {arguments}
Annotating: annotates the bins taxonomically and/or functionally, and/or generates community-scale metabolic networks. Command: drakkar annotating {arguments}
Profiling: conducts genome- or pangenome-based quantitative analyses. Command: drakkar profiling {arguments}

Complete mode

All the modules of DRAKKAR can be run together using the complete mode.

drakkar complete {arguments}

Usage examples

Without sample info file

Minimum usage

drakkar complete -i {input_path} -o {output_path}

-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path in which the DRAKKAR outputs will be stored.

Metagenomic reads are not mapped to a host genome, individual assemblies are performed, and genome-based profiling is conducted.

With reference genome

drakkar complete -i {input_path} -o {output_path} -r {genome_path}

-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path in which the DRAKKAR outputs will be stored.
-r: path to the reference genome.

Metagenomic reads are mapped to the host genome, individual assemblies are performed, and genome-based profiling is conducted.

With reference genome and assembly mode

drakkar complete -i {input_path} -o {output_path} -r {genome_path} -m individual,all -t genomes,pangenomes

-i: path to the folder where the metagenomic sequencing reads are stored.
-o: path in which the DRAKKAR outputs will be stored.
-r: path to the reference genome.
-m: comma-separated list of assembly modes.
-t: comma-separated list of profiling types.

Metagenomic reads are mapped to the host genome, individual assemblies as well as a single coassembly including all samples are performed, and both genome- and pangenome-based profiling are conducted.
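The three invocations above differ only in which optional flags are present. A minimal Python sketch of composing such a command line is shown below. It uses only the flags that appear in this README (-i, -o, -r, -m, -t); the helper name `build_complete_command` and the example paths are hypothetical, and the sketch only builds and prints the command rather than running DRAKKAR.

```python
import shlex


def build_complete_command(input_path, output_path, reference=None,
                           modes=None, profiling=None):
    """Return a `drakkar complete` argument list (hypothetical helper).

    Only flags documented in the README are used. Optional flags are
    appended only when a value is given.
    """
    cmd = ["drakkar", "complete", "-i", input_path, "-o", output_path]
    if reference:
        cmd += ["-r", reference]
    if modes:
        # -m takes a comma-separated list, e.g. individual,all
        cmd += ["-m", ",".join(modes)]
    if profiling:
        # -t takes a comma-separated list, e.g. genomes,pangenomes
        cmd += ["-t", ",".join(profiling)]
    return cmd


# Example paths are placeholders, not real files.
minimal = build_complete_command("reads", "drakkar_output")
full = build_complete_command(
    "reads", "drakkar_output",
    reference="host.fna",
    modes=["individual", "all"],
    profiling=["genomes", "pangenomes"],
)
print(shlex.join(minimal))
print(shlex.join(full))
```

Keeping the command as a list means it could later be handed to `subprocess.run(cmd)` without shell quoting issues; whether that is appropriate on a given cluster setup is left open here.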

With sample info file

|sample|rawreads1|rawreads2|referencename|referencepath|assembly|
|---|---|---|---|---|---|
|sample1|path/sample11.fq.gz|path/sample12.fq.gz|ref1|path/ref1.fna|assembly1,all|
|sample1|path/sample11.fq.gz|path/sample12.fq.gz|ref1|path/ref1.fna|assembly1,all|
|sample2|path/sample21.fq.gz|path/sample22.fq.gz|ref1|path/ref1.fna|assembly2,all|
|sample3|path/sample31.fq.gz|path/sample32.fq.gz|ref2|path/ref2.fna|assembly2,all|
|sample4|path/sample41.fq.gz|path/sample42.fq.gz|ref2|path/ref2.fna|assembly2,all|
|sample4|path/sample41.fq.gz|path/sample42.fq.gz|ref2|path/ref2.fna|assembly2,all|
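As an illustration of the table above, the following Python sketch writes a sample info file with the same six columns. The tab-separated layout is an assumption based on the `input_info.tsv` name used in the Quickstart; the helper name `write_sample_info` and the output location are hypothetical, and the exact format DRAKKAR accepts should be checked against its documentation.

```python
import csv
import os
import tempfile

# Column names copied from the sample info table in this README.
COLUMNS = ["sample", "rawreads1", "rawreads2",
           "referencename", "referencepath", "assembly"]


def write_sample_info(rows, path):
    """Write rows (dicts keyed by COLUMNS) as a tab-separated file.

    Assumption: the sample info file is a TSV with a header line.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"sample": "sample1", "rawreads1": "path/sample11.fq.gz",
     "rawreads2": "path/sample12.fq.gz", "referencename": "ref1",
     "referencepath": "path/ref1.fna", "assembly": "assembly1,all"},
    {"sample": "sample2", "rawreads1": "path/sample21.fq.gz",
     "rawreads2": "path/sample22.fq.gz", "referencename": "ref1",
     "referencepath": "path/ref1.fna", "assembly": "assembly2,all"},
]

# Written to a temporary directory so the example leaves no stray files.
info_path = os.path.join(tempfile.mkdtemp(), "input_info.tsv")
write_sample_info(rows, info_path)
```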

Minimum usage

drakkar complete -f {info_file} -o {output_path}

All the required information is extracted from the sample info file.
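To make the role of the assembly column more concrete, the sketch below groups samples by the comma-separated labels in that column. It assumes that each label names one assembly group (so a label shared by several samples would correspond to a coassembly) and that the file is tab-separated; both are readings of the example table, not confirmed DRAKKAR behaviour. The function name `assembly_groups` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def assembly_groups(handle):
    """Map each assembly label to the samples that list it.

    Assumption: the `assembly` column holds comma-separated group labels.
    Samples repeated across rows are counted once per label.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        for label in row["assembly"].split(","):
            label = label.strip()
            if label and row["sample"] not in groups[label]:
                groups[label].append(row["sample"])
    return dict(groups)


# Three rows taken from the example table, tab-separated.
EXAMPLE = (
    "sample\trawreads1\trawreads2\treferencename\treferencepath\tassembly\n"
    "sample1\tpath/sample11.fq.gz\tpath/sample12.fq.gz\tref1\tpath/ref1.fna\tassembly1,all\n"
    "sample2\tpath/sample21.fq.gz\tpath/sample22.fq.gz\tref1\tpath/ref1.fna\tassembly2,all\n"
    "sample3\tpath/sample31.fq.gz\tpath/sample32.fq.gz\tref2\tpath/ref2.fna\tassembly2,all\n"
)

groups = assembly_groups(io.StringIO(EXAMPLE))
for label, samples in groups.items():
    print(label, samples)
```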

With additional assembly mode

drakkar complete -f {info_file} -o {output_path} -m individual

Individual assemblies are conducted in addition to the assemblies specified in the sample info file.

DRAKKAR modules

Preprocessing

Quality-filtering using fastp
Reference genome indexing
Reference genome mapping
Output of metagenomic and host genomic data

Cataloging

Documentation to be added.

Profiling

Documentation to be added.

Owner

Name: alberdilab
Login: alberdilab
Kind: organization

Repositories: 1
Profile: https://github.com/alberdilab

GitHub Events

Total

Push event: 619
Create event: 2

Last Year

Push event: 619
Create event: 2

Dependencies

setup.py pypi

argparse *
numpy *
pandas *

docs/requirements.txt pypi

sphinx ==7.1.2
sphinx-rtd-theme ==1.3.0rc1
