atavide

Atavistic processing of metagenomics data.

https://github.com/linsalrob/atavide

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-pipeline metagenomics metagenomics-binning metagenomics-counts metagenomics-toolkit

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: linsalrob
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 107 KB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 2
  • Open Issues: 6
  • Releases: 3
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

atavide

atavide is a simple yet complete workflow for metagenomics data analysis, including QC/QA, optional host removal, assembly and cross-assembly, and individual read-based annotations. We have also built in some advanced analytics, including tools to assign annotations from reads to contigs and to generate metagenome-assembled genomes in several different ways, giving you the power to explore your data!

atavide is 100% snakemake and conda, so you only need to install the snakemake workflow, and then everything else will be installed with conda.

It is definitely a work in progress, but you can run it with the following command:

snakemake --configfile config/atavide.yaml -s workflow/atavide.snakefile --profile slurm

But you will need a slurm profile to make this work!

Installation and getting going

  1. Clone this repository from GitHub: git clone https://github.com/linsalrob/atavide.git
  2. Set the location of the repository: export ATAVIDE_DIR=$PWD/atavide/
  3. Install a few Python dependencies: pip install -r $ATAVIDE_DIR/requirements.txt. You probably already have most of these, but the one that tends to trip people up is pysam. We're working on getting conda configured properly to do this automatically.
  4. Install the appropriate SUPER-FOCUS database [hint: probably version 2] and set SUPERFOCUS_DB to point to the location of those files.
  5. Copy the NCBI taxonomy (you really just need the taxdump.tar.gz file) and set the NCBI_TAXONOMY environment variable to point to the location of those files.
  6. Have a directory of fastq files with both _R1_ and _R2_ files in a data directory: $DATA_DIR/fastq
  7. Run atavide: cd $DATA_DIR && snakemake --configfile $ATAVIDE_DIR/atavide.yaml -s $ATAVIDE_DIR/workflow/atavide.snakefile --profile slurm
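
Before launching step 7, it can help to confirm that the variables and paired files from the steps above are in place. The following is a minimal sketch and not part of atavide: the helper names `missing_env_vars` and `pair_fastq_files` are hypothetical, SUPERFOCUS_DB is assumed to be an environment variable like NCBI_TAXONOMY, and the mate of an `_R1_` file is assumed to have the same name with `_R2_` in its place.

```python
import os
from pathlib import Path

# Variables named in the installation steps above.
REQUIRED_VARS = ("ATAVIDE_DIR", "SUPERFOCUS_DB", "NCBI_TAXONOMY")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def pair_fastq_files(filenames):
    """Split file names into (R1, R2) pairs and a list of unpaired names.

    Assumption: the mate of an `_R1_` file is the same name with `_R2_`.
    Files with neither marker are reported as unpaired.
    """
    names = set(filenames)
    pairs, unpaired = [], []
    for name in sorted(names):
        if "_R1_" in name:
            mate = name.replace("_R1_", "_R2_", 1)
            if mate in names:
                pairs.append((name, mate))
            else:
                unpaired.append(name)
        elif "_R2_" in name:
            # Paired R2 files were already recorded with their R1 mate.
            if name.replace("_R2_", "_R1_", 1) not in names:
                unpaired.append(name)
        else:
            unpaired.append(name)
    return pairs, unpaired


if __name__ == "__main__":
    for name in missing_env_vars():
        print(f"{name} is not set")
    fastq_dir = Path(os.environ.get("DATA_DIR", ".")) / "fastq"
    if fastq_dir.is_dir():
        pairs, unpaired = pair_fastq_files(p.name for p in fastq_dir.iterdir())
        print(f"{len(pairs)} read pairs found, {len(unpaired)} unpaired files")
    else:
        print(f"{fastq_dir} does not exist")
```

Run as a script, this prints any unset variables and a count of paired and unpaired files in $DATA_DIR/fastq. It does not inspect file contents.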

Current processing steps:

Steps:

  1. QC/QA with prinseq++
  2. optional host removal using bowtie2 and samtools, as described previously. To enable this, you need to provide a host database and the path to it.

Metagenome assembly

  1. pairwise assembly of each sample using megahit
  2. extraction of all reads that do not assemble using samtools flags
  3. assembly of all unassembled reads using megahit
  4. compilation of all contigs into a single unified set using Flye
  5. comparison of reads -> contigs to generate coverage
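
Step 2 above relies on the bitwise FLAG field of SAM/BAM records. As an illustration of the idea only (the exact flags atavide passes to samtools are not stated here), this sketch decodes the two bits that the SAM specification defines for unmapped reads: 0x4 (read unmapped) and 0x8 (mate unmapped). The function name `classify_pair` is hypothetical, and the input is assumed to come from paired-end data.

```python
# Bit values from the FLAG field of the SAM specification.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag):
    """Describe the mapping state of a read and its mate from a SAM FLAG."""
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both unmapped"
    if read_unmapped:
        return "read unmapped, mate mapped"
    if mate_unmapped:
        return "read mapped, mate unmapped"
    return "both mapped"
```

With samtools view, `-f` keeps reads that have all of the given bits set and `-F` drops reads that have any of them set, so `-f 12` (4 + 8) selects the "both unmapped" case.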

MAG creation

  1. metabat
  2. concoct
  3. Pairwise comparisons using turbocor followed by clustering
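
The third approach groups contigs by pairwise comparison followed by clustering. The sketch below illustrates that idea in plain Python under stated assumptions: it compares per-sample coverage profiles, uses Pearson correlation as the pairwise measure, and clusters by single linkage above a threshold. None of these choices is confirmed by the text above, turbocor itself is not called, and the function names and the 0.9 threshold are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def cluster_contigs(coverage, threshold=0.9):
    """Group contigs linked by a correlation of at least `threshold`.

    `coverage` maps a contig name to its coverage in each sample.
    """
    parent = {name: name for name in coverage}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(coverage, 2):
        if pearson(coverage[a], coverage[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in coverage:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Calling `cluster_contigs` on a dictionary of contig names and coverage lists returns a sorted list of contig groups. The all-pairs loop is quadratic in the number of contigs, so it only suits small examples. Real data sets are far larger, which is presumably why the workflow uses a dedicated tool for this step.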

Read-based annotations

  1. Kraken2
  2. singlem
  3. SUPER-FOCUS
  4. FOCUS

Want something else added to the suite? File an issue on GitHub and we'll add it ASAP!

Owner

  • Name: Rob Edwards
  • Login: linsalrob
  • Kind: user
  • Location: Adelaide, Australia
  • Company: Flinders University

Professor of CS and Biology. Writing bioinformatics code to study viruses, phages, and metagenomes.

Citation (CITATION.md)

# Citation

If you use atavide, please cite

Roach M., and Edwards R.A., 2021. Atavide: atavistic metagenome annotations. DOI: 10.5281/zenodo.5523912

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 95
  • Total Committers: 1
  • Avg Commits per committer: 95.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
linsalrob r****s@g****m 95

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 7
  • Total pull requests: 0
  • Average time to close issues: over 1 year
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • linsalrob (7)
Pull Request Authors
Top Labels
Issue Labels
  • enhancement (4)
  • good first issue (1)
Pull Request Labels

Dependencies

requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pysam *
  • scipy *
  • seaborn *