https://github.com/alberdilab/drakkar

Metagenomics pipeline optimised for Mjolnir

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alberdilab
  • Language: Python
  • Default Branch: main
  • Size: 500 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme

README.md

DRAKKAR is a Snakemake-based genome-resolved metagenomics pipeline optimised for Mjolnir. Snakemake works together with the Slurm workload manager to run this long pipeline, submitting each step as a job with optimised memory and time allocations. The pipeline is modular, so either the entire workflow or only selected parts of it can be executed. An extended usage tutorial is available at https://drakkar.readthedocs.io/

Quickstart

module load drakkar/1.0.0
drakkar complete -f input_info.tsv -o drakkar_output

Modules

DRAKKAR is modular software: each section of the genome-resolved metagenomics pipeline can be executed independently.

  • Preprocessing: quality-filters the reads and optionally removes host DNA. Usage: drakkar preprocessing {arguments}
  • Cataloging: assembles and bins the metagenomic reads using multiple strategies. Usage: drakkar cataloging {arguments}
  • Annotating: annotates the bins taxonomically and/or functionally, and/or generates community-scale metabolic networks. Usage: drakkar annotating {arguments}
  • Profiling: conducts genome- or pangenome-based quantitative analyses. Usage: drakkar profiling {arguments}

Complete mode

All DRAKKAR modules can be run together in complete mode. Usage: drakkar complete {arguments}
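Every module is launched as a subcommand of the same drakkar executable. The Python sketch below only assembles such a command line; the module names come from the list above, while the helper name drakkar_command is hypothetical and the arguments are passed through unchanged, since the valid flags depend on the module (see the documentation).

```python
from typing import List, Sequence

# Subcommands named in this README; "complete" runs all modules together.
MODULES = ("preprocessing", "cataloging", "annotating", "profiling", "complete")


def drakkar_command(module: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for one DRAKKAR module (hypothetical helper)."""
    if module not in MODULES:
        raise ValueError(f"unknown DRAKKAR module: {module!r}")
    # Arguments are forwarded as given; this sketch does not validate them.
    return ["drakkar", module, *args]
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a node where the drakkar module has been loaded.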

Usage examples

Without sample info file

Minimum usage

drakkar complete -i {input_path} -o {output_path}

  • -i: path to the folder where the metagenomic sequencing reads are stored.
  • -o: path in which the DRAKKAR outputs will be stored.

Metagenomic reads are not mapped to a host genome, individual assemblies are performed, and genome-based profiling is conducted.

With reference genome

drakkar complete -i {input_path} -o {output_path} -r {genome_path}

  • -i: path to the folder where the metagenomic sequencing reads are stored.
  • -o: path in which the DRAKKAR outputs will be stored.
  • -r: path to the reference genome.

Metagenomic reads are mapped to the host genome, individual assemblies are performed, and genome-based profiling is conducted.

With reference genome, assembly modes and profiling types

drakkar complete -i {input_path} -o {output_path} -r {genome_path} -m individual,all -t genomes,pangenomes

  • -i: path to the folder where the metagenomic sequencing reads are stored.
  • -o: path in which the DRAKKAR outputs will be stored.
  • -r: path to the reference genome.
  • -m: comma-separated list of assembly modes.
  • -t: comma-separated list of profiling types.

Metagenomic reads are mapped to the host genome, both individual assemblies and a single coassembly including all samples are performed, and both genome- and pangenome-based profiling are conducted.
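The three examples above differ only in which optional flags are added. A minimal Python sketch that assembles the corresponding argument list, using only the -i, -o, -r, -m and -t flags shown in this README; the helper name build_complete_command is hypothetical, and treating -t as a list of profiling types is an assumption based on the example values.

```python
from typing import List, Optional, Sequence


def build_complete_command(
    input_path: str,
    output_path: str,
    reference: Optional[str] = None,
    assembly_modes: Sequence[str] = (),
    profiling_types: Sequence[str] = (),
) -> List[str]:
    """Return the argument list for 'drakkar complete' without a sample info file."""
    cmd = ["drakkar", "complete", "-i", input_path, "-o", output_path]
    if reference is not None:
        # -r: host reference genome; without it, reads are not host-mapped
        cmd += ["-r", reference]
    if assembly_modes:
        # -m: comma-separated assembly modes, e.g. "individual,all"
        cmd += ["-m", ",".join(assembly_modes)]
    if profiling_types:
        # -t (assumed meaning): comma-separated profiling types,
        # e.g. "genomes,pangenomes"
        cmd += ["-t", ",".join(profiling_types)]
    return cmd
```

For example, build_complete_command("reads", "out", "host.fna", ["individual", "all"], ["genomes", "pangenomes"]) reproduces the last command above with concrete paths.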

With sample info file

|sample|rawreads1|rawreads2|referencename|referencepath|assembly|
|---|---|---|---|---|---|
|sample1|path/sample11.fq.gz|path/sample12.fq.gz|ref1|path/ref1.fna|assembly1,all|
|sample2|path/sample21.fq.gz|path/sample22.fq.gz|ref1|path/ref1.fna|assembly2,all|
|sample3|path/sample31.fq.gz|path/sample32.fq.gz|ref2|path/ref2.fna|assembly2,all|
|sample4|path/sample41.fq.gz|path/sample42.fq.gz|ref2|path/ref2.fna|assembly2,all|
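In the assembly column, each comma-separated name appears to identify an assembly that the sample is included in. Assuming that reading, the Python sketch below parses a tab-separated sample info file (like the Quickstart's input_info.tsv) and groups samples by assembly. The column names are copied from the table above and may not match exactly what DRAKKAR expects; the function name group_by_assembly is hypothetical.

```python
import csv
import io
from typing import Dict, List


def group_by_assembly(tsv_text: str) -> Dict[str, List[str]]:
    """Map each assembly name in the 'assembly' column to the samples it includes."""
    groups: Dict[str, List[str]] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: each comma-separated token names an assembly for this sample.
        for name in row["assembly"].split(","):
            name = name.strip()
            if not name:
                continue
            members = groups.setdefault(name, [])
            if row["sample"] not in members:  # tolerate repeated rows
                members.append(row["sample"])
    return groups
```

For the table above, this yields sample1 alone in assembly1, sample2, sample3 and sample4 in assembly2, and all four samples in all. A file on disk can be read with open("input_info.tsv").read() and passed in directly.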

Minimum usage

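drakkar complete -f {info_file} -o {output_path}

  • -f: path to the sample info file.
  • -o: path in which the DRAKKAR outputs will be stored.

All the required information (reads, reference genomes and assemblies) is extracted from the sample info file.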

With additional assembly mode

drakkar complete -f {info_file} -o {output_path} -m individual

In addition to the assemblies specified in the sample info file, an individual assembly is performed for each sample.
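Adding -m individual on top of a sample info file can be pictured as extending the sample-to-assembly grouping with one single-sample assembly per sample. A Python sketch of that bookkeeping, assuming the grouping is held as a dictionary from assembly name to sample list; the function name add_individual_assemblies and the "individual:<sample>" keys are hypothetical labels, not DRAKKAR's own naming.

```python
from typing import Dict, List


def add_individual_assemblies(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of 'groups' with one single-sample assembly added per sample."""
    # Copy so the caller's sheet-derived grouping is left untouched.
    extended = {name: list(samples) for name, samples in groups.items()}
    seen: List[str] = []
    for samples in groups.values():
        for sample in samples:
            if sample not in seen:  # keep first-seen order, no duplicates
                seen.append(sample)
    for sample in seen:
        # Hypothetical key format for the extra per-sample assembly.
        extended[f"individual:{sample}"] = [sample]
    return extended
```

The assemblies from the sample info file are kept as they are; the individual mode only adds entries.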

DRAKKAR modules

Preprocessing

  • Quality-filtering using fastp
  • Reference genome indexing
  • Reference genome mapping
  • Output of metagenomic and host genomic data
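The mapping and output steps amount to separating reads that align to the host reference from those that do not. Assuming that split is made from SAM alignment records, the stdlib-only Python sketch below classifies reads using the SAM FLAG bit 0x4 (segment unmapped). DRAKKAR itself may rely on samtools or similar tools. Here paired-end handling is simplified to one decision per primary record, and the function name split_by_host_mapping is hypothetical.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4  # SAM FLAG: segment unmapped
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800  # extra records for an already-seen read


def split_by_host_mapping(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (metagenomic_read_names, host_read_names) from SAM text lines."""
    metagenomic: List[str] = []
    host: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            metagenomic.append(qname)  # no host hit: keep as metagenomic
        else:
            host.append(qname)  # aligned to the host reference
    return metagenomic, host
```

Working from primary records only avoids counting a host read twice when the aligner reports secondary hits.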

Cataloging

Documentation to be added.

Profiling

Documentation to be added.

Owner

  • Name: alberdilab
  • Login: alberdilab
  • Kind: organization

GitHub Events

Total
  • Push event: 619
  • Create event: 2
Last Year
  • Push event: 619
  • Create event: 2

Dependencies

setup.py pypi
  • argparse *
  • numpy *
  • pandas *
docs/requirements.txt pypi
  • sphinx ==7.1.2
  • sphinx-rtd-theme ==1.3.0rc1