aviary-genome

A hybrid assembly and MAG recovery pipeline (and more!)

https://github.com/rhysnewell/aviary

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    8 of 12 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

assembly binning bioinformatics metagenomics workflow
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rhysnewell
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 39.8 MB
Statistics
  • Stars: 99
  • Watchers: 4
  • Forks: 15
  • Open Issues: 24
  • Releases: 26
Topics
assembly binning bioinformatics metagenomics workflow
Created over 5 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Aviary

Aviary is an easy-to-use wrapper around a robust Snakemake pipeline for metagenomic short-read, long-read, and hybrid assembly. Aviary also performs binning, annotation, and strain diversity analyses, and provides an easy way to quickly combine and dereplicate the results of many Aviary runs. The pipeline currently consists of a series of distinct yet flexible modules that communicate with each other seamlessly. Modules can be run independently or together as a single pipeline, depending on the input provided.

Please refer to the full docs here

Quick Installation

Ideally, your conda channels should be configured in this order:

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Your resulting .condarc file should look something like:

    channels:
      - conda-forge
      - bioconda
      - defaults
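To confirm that the channel order matches the list above, the file can be inspected directly. The following is a minimal sketch, not part of Aviary: the helper name `condarc_channels` is hypothetical, and it assumes the simple layout shown above (a `channels:` key followed by `- name` entries).

```shell
# Print the channels listed in a .condarc file, one per line, highest priority first.
# Hypothetical helper; assumes a plain "channels:" key followed by "- name" entries.
condarc_channels() {
    awk '
        # Start of the channels block.
        /^channels:[ \t]*$/ { in_list = 1; next }
        # A "- name" entry inside the block: strip the dash and print the name.
        in_list && /^[ \t]*-[ \t]*/ { sub(/^[ \t]*-[ \t]*/, ""); print; next }
        # Any other top-level key ends the block.
        in_list && /^[^ \t#]/ { in_list = 0 }
    ' "${1:-$HOME/.condarc}"
}
```

Calling `condarc_channels` with no argument reads `~/.condarc`. Where conda itself is available, `conda config --show channels` reports the same information, including channels set in other configuration files.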

Option 1: Install from Bioconda

Conda can handle the creation of the environment for you directly:

    conda create -n aviary -c bioconda aviary

Or install into an existing environment:

    conda install -c bioconda aviary

Option 2: Install from pip

Create the environment using the aviary.yml file, then install from pip:

    conda env create -n aviary -f aviary.yml
    conda activate aviary
    pip install aviary-genome

Option 3: Install from source

The initial requirements for Aviary can be installed using aviary.yml:

    git clone https://github.com/rhysnewell/aviary.git
    cd aviary
    conda env create -n aviary -f aviary.yml
    conda activate aviary
    pip install -e .

The aviary executable can then be run from any directory. Because this is an editable install, Aviary runs directly from the code in the cloned directory, so any changes made there take effect immediately. We recommend this mode for developing and debugging Aviary.

Checking installation

Whatever option you choose, running aviary --help should return the following output:

```
                    ......:::::: AVIARY ::::::......

           A comprehensive metagenomics bioinformatics pipeline

Metagenome assembly, binning, and annotation:
        assemble  - Perform hybrid assembly using short and long reads, or assembly using only short reads
        recover   - Recover MAGs from provided assembly using a variety of binning algorithms
        annotate  - Annotate MAGs using EggNOG and GTBD-tk
        genotype  - Perform strain diversity analysis of MAGs using Lorikeet
        complete  - Runs each stage of the pipeline: assemble, recover, annotate, genotype in that order.
        cluster   - Combines and dereplicates the MAGs from multiple Aviary runs using Galah

Isolate assembly, binning, and annotation:
        isolate   - Perform isolate assembly PARTIALLY COMPLETED

Utility modules:
        configure - Set or overwrite the environment variables for future runs.
```
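If `aviary --help` fails, a quick first check is whether the expected executables are on the `PATH` at all, which usually indicates whether the conda environment is active. The sketch below is illustrative only; `require_commands` is a hypothetical helper, not something Aviary provides.

```shell
# Report whether each named command can be found on PATH.
# Returns non-zero if any of them is missing. Hypothetical helper.
require_commands() {
    status=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, `require_commands conda aviary && aviary --help` only attempts to print the help text once both commands resolve.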

Databases

Aviary uses programs that require access to locally stored databases. These databases can be quite large, so we recommend setting up one instance of Aviary and its databases per machine or machine cluster.

The required databases are as follows:

* GTDB
* EggNog
* CheckM2
* SingleM

Installing databases

Aviary can download and install these databases via the --download flag. Using --download installs each database into the folder given by its associated environment variable. Aviary will ask you to set these environment variables the first time it runs if they are not already set. Otherwise, you can use the aviary configure subcommand to reset them:

    aviary configure -o logs/ \
        --eggnog-db-path /shared/db/eggnog/ \
        --gtdb-path /shared/db/gtdb/ \
        --checkm2-db-path /shared/db/checkm2db/ \
        --singlem-metapackage-path /shared/db/singlem/ \
        --download

This command checks whether the databases exist at the given locations. If they do not, aviary downloads them and updates the conda environment variables to match those paths.

N.B. Again, these databases are VERY large. Please talk to your sysadmin/bioinformatics specialist about setting a shared location to install these databases to prevent unnecessary storage use. Additionally, the --download flag can be used within any aviary module to check that databases are configured properly.
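Because the downloads are large, it can be worth checking free space on the target filesystem before running with --download. The sketch below is an assumption-laden illustration: the helper names `free_kib` and `has_free_gib` are hypothetical, and the README does not state how much space the databases need, so any threshold passed in is your own estimate.

```shell
# Print the free space, in KiB, on the filesystem holding the given path.
# Uses the POSIX "df -P" output format, where column 4 is the available space.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the filesystem holding path $1 has at least $2 GiB free.
# The GiB threshold is supplied by the caller; it is not an Aviary requirement.
has_free_gib() {
    avail_kib=$(free_kib "$1") || return 1
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}
```

A call such as `has_free_gib /shared/db 500 || echo "not enough space"` uses a placeholder figure of 500 GiB; substitute the sizes published by each database's maintainers.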

Environment variables

Upon first running Aviary, you will be prompted to input the location of several database folders if they haven't already been provided. If the location of these folders changes at any point, you can use the aviary configure module to update the environment variables used by aviary.

These environment variables can also be configured manually by setting the following variables in your .bashrc file:

    export GTDBTK_DATA_PATH=/path/to/gtdb/gtdb_release220/db/ # https://gtdb.ecogenomic.org/downloads
    export EGGNOG_DATA_DIR=/path/to/eggnog-mapper/2.1.8/ # https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.8#setup
    export SINGLEM_METAPACKAGE_PATH=/path/to/singlem_metapackage.smpkg/
    export CHECKM2DB=/path/to/checkm2db/
    export CONDA_ENV_PATH=/path/to/conda/envs/
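After editing .bashrc by hand, it is easy to leave a variable unset or pointing at a path that does not exist. The following sketch checks the five variables named above. The function name `check_aviary_env` is hypothetical and this is not how Aviary itself validates its configuration; it only tests that each variable is set and that the path exists.

```shell
# Report, for each database variable listed above, whether it is set
# and whether the path it names exists. Hypothetical helper, not part of Aviary.
check_aviary_env() {
    status=0
    for var in GTDBTK_DATA_PATH EGGNOG_DATA_DIR SINGLEM_METAPACKAGE_PATH CHECKM2DB CONDA_ENV_PATH; do
        # Look up the value of the variable whose name is stored in $var.
        eval "path=\${$var:-}"
        if [ -z "$path" ]; then
            echo "UNSET   $var"
            status=1
        elif [ ! -e "$path" ]; then
            echo "MISSING $var=$path"
            status=1
        else
            echo "OK      $var=$path"
        fi
    done
    return "$status"
}
```

Running `check_aviary_env` in a fresh shell (so that .bashrc has been re-read) prints one line per variable and returns non-zero if any of them needs attention.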

Workflow

Aviary workflow

Citations

If you use Aviary, please be aware that you are also using the many other programs that Aviary wraps. You should cite all of these tools as well, or at least those you know you are using. To make this easy, we provide a list of citations in alphabetical order, which will be updated as new modules are added to Aviary.

A constantly updating list of citations can be found in the Citations document.

License

Code is GPL-3.0

Owner

  • Name: Rhys Newell
  • Login: rhysnewell
  • Kind: user
  • Location: Sydney, Australia
  • Company: Microba LifeSciences

Bioinformatics Software Engineer @ Microba. Waiting for my PhD to be examined. Specialises in software development and analysis of big genomic data.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Newell
    given-names: Rhys J. P.
    orcid: https://orcid.org/0000-0002-1300-6116
  - family-names: Aroney
    given-names: Samuel T. N.
    orcid: https://orcid.org/0000-0001-9806-5846
  - family-names: Zaugg
    given-names: Julian
    orcid: https://orcid.org/0000-0002-4919-1448
  - family-names: Sternes
    given-names: Peter
    orcid: https://orcid.org/0000-0002-4456-150X
  - family-names: Tyson
    given-names: Gene W.
    orcid: https://orcid.org/0000-0001-8559-9427
  - family-names: Woodcroft
    given-names: Ben J.
    orcid: https://orcid.org/0000-0003-0670-7480
title: "Aviary: Hybrid assembly and genome recovery from metagenomes"
version: 0.9.0
doi: 10.5281/zenodo.10806928
date-released: 2024-03-12
preferred-citation:
  type: article
  authors:
    - family-names: Newell
      given-names: Rhys J. P.
      orcid: https://orcid.org/0000-0002-1300-6116
    - family-names: Aroney
      given-names: Samuel T. N.
      orcid: https://orcid.org/0000-0001-9806-5846
    - family-names: Zaugg
      given-names: Julian
      orcid: https://orcid.org/0000-0002-4919-1448
    - family-names: Sternes
      given-names: Peter
      orcid: https://orcid.org/0000-0002-4456-150X
    - family-names: Tyson
      given-names: Gene W.
      orcid: https://orcid.org/0000-0001-8559-9427
    - family-names: Woodcroft
      given-names: Ben J.
      orcid: https://orcid.org/0000-0003-0670-7480
  title: "Aviary: Hybrid assembly and genome recovery from metagenomes"
  doi: 10.5281/zenodo.10158086
  journal: "Zenodo"

GitHub Events

Total
  • Create event: 41
  • Release event: 4
  • Issues event: 38
  • Watch event: 17
  • Delete event: 19
  • Issue comment event: 108
  • Push event: 128
  • Pull request review comment event: 19
  • Pull request review event: 51
  • Pull request event: 76
  • Fork event: 3
Last Year
  • Create event: 41
  • Release event: 4
  • Issues event: 38
  • Watch event: 17
  • Delete event: 19
  • Issue comment event: 108
  • Push event: 128
  • Pull request review comment event: 19
  • Pull request review event: 51
  • Pull request event: 76
  • Fork event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 534
  • Total Committers: 12
  • Avg Commits per committer: 44.5
  • Development Distribution Score (DDS): 0.371
Past Year
  • Commits: 15
  • Committers: 3
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Rhys Newell r****l@h****u 336
AroneyS s****y@q****u 106
Ben Woodcroft b****t@g****m 42
rhysnewell r****l@m****m 34
JamesRH j****h@g****m 6
julianzaugg j****g@g****m 2
sternp p****s@q****u 2
Yibi Chen c****2@l****u 2
Virginie Perlo p****o@c****u 1
Rhys Newell n****9@c****u 1
Rhys Newell n****9@c****u 1
Rhys Newell n****9@c****u 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 156 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 26
  • Total maintainers: 3
pypi.org: aviary-genome

aviary - metagenomics pipeline using long and short reads

  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 156 Last month
Rankings
Dependent packages count: 6.6%
Stargazers count: 9.4%
Forks count: 12.8%
Average: 15.5%
Downloads: 18.1%
Dependent repos count: 30.6%
Maintainers (3)
Last synced: 6 months ago

Dependencies

.github/workflows/deploy-docs.yaml actions
  • Homebrew/actions/setup-homebrew master composite
  • actions/checkout main composite
  • crazy-max/ghaction-github-pages v3.1.0 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • marvinpinto/action-automatic-releases latest composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/test-aviary.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
setup.py pypi
  • biopython *
  • numpy *
  • pandas *
  • ruamel.yaml >=0.15.99
  • snakemake *