Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: robertosanchezn
  • Language: HTML
  • Default Branch: main
  • Size: 11.9 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Citation

README.md


bgcflow_AS_hqMAGs

  • This repository contains the code required to reproduce the analysis from the manuscript Sánchez-Navarro et al. (2022), "Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem".

Preview notebooks

Usage

Follow these steps to reproduce the analysis and data generated in this study.

Clone this repository

Clone this repository to your local machine:
```bash
git clone git@github.com:robertosanchezn/AS_hqMAGs.git
cd AS_hqMAGs
```

Run the analysis

To generate the figures in the manuscript, run the analysis in either the r_markdown folder or the jupyter_notebook folder. Each folder has its own README.md with instructions for running the analysis.

Reproduce the data

1. Install Conda Environments & BGCFlow

  • This analysis was run on a Microsoft Azure virtual machine running Linux (Ubuntu 20.04).
  • Clone BGCflow, following the instructions at https://github.com/NBChub/bgcflow:
```bash
git clone git@github.com:NBChub/bgcflow.git
cd bgcflow
```
  • Switch to branch v0.3.3-alpha (the version used in this study):
```bash
git checkout v0.3.3-alpha
```
> TO DO: attach a zipped archive of v0.3.3-alpha
  • Installing Snakemake with Mamba is recommended. If you don't use Mambaforge, you can install Mamba into any other Conda-based Python distribution with:
```bash
conda install -n base -c conda-forge mamba
```
  • Install the conda environments:
```bash
# snakemake environment
mamba create -c conda-forge -c bioconda -n snakemake snakemake=7.6.1

# environment to run notebooks
mamba env create -n bgc_analytics -f workflow/envs/bgc_analytics.yaml
```

2. Set up the Snakemake configuration

  • Set up the configuration files by copying the contents of the /bgcflow_configuration folder into BGCflow's config folder (this replaces the original config.yaml in BGCflow):
```shell
cp -r ../bgcflow_config/* config/
```

3. Download and prepare data from other studies
  • Not all of the genomes are hosted in NCBI, and some FASTA files need cleaning. Run the notebook to download the genomes from other studies into data/external, then symlink the custom FASTA files into data/raw/fasta:
```shell
# run notebook to download genomes from other studies to bgcflow/data/external (this will take a while to finish)
conda activate bgc_analytics
(cd ../jupyter_notebook/notebook2/ && jupyter nbconvert --to html --execute 01otherMAGdatasettable.ipynb)
conda deactivate

# generate symlinks in data/raw/fasta pointing to the downloaded genomes
extdir="data/external"
mkdir -p data/raw/fasta
for directory in Bickhartetal Chenetalsanitized Liuetal Sharraretalsanitized; do
    for fna in "$extdir/$directory"/*.fna; do
        name=$(basename "$fna")
        (cd data/raw/fasta && ln -sv "../../external/$directory/$name" "$name")
    done
done
```
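Dangling symlinks (for example, if the notebook stopped partway through a download) would only surface later, when BGCflow reads data/raw/fasta. A quick check can be sketched as follows. check_fasta_links is a hypothetical helper, not part of this repository or of BGCflow, and its default path assumes you run it from the bgcflow/ root.

```shell
# Hypothetical helper (not part of this repository or of BGCflow):
# report symlinks in a FASTA directory whose targets do not exist.
check_fasta_links() {
    local dir n_links broken
    dir="${1:-data/raw/fasta}"   # default assumes the bgcflow/ root as working directory
    # count the symlinks directly inside $dir (tr strips the padding some wc versions add)
    n_links=$(find "$dir" -maxdepth 1 -type l | wc -l | tr -d ' ')
    # `test -e` follows the link, so it fails for dangling links
    broken=$(find "$dir" -maxdepth 1 -type l ! -exec test -e {} \; -print)
    if [ -n "$broken" ]; then
        echo "Dangling symlinks in $dir:"
        echo "$broken"
        return 1
    fi
    echo "$n_links symlink(s) in $dir, all resolve"
}
```

Run check_fasta_links (or check_fasta_links path/to/dir); it returns a non-zero status and lists the offending links if any target is missing.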

4. Run the workflow for each individual study

This generates the antiSMASH results and runs the other downstream processes.
```bash
conda activate snakemake
snakemake --use-conda --cores 8 --keep-going -n
conda deactivate
```
  • PS: -n makes this a dry run; remove -n to do a real run.

5. Run the workflow for the comparison across all studies

This generates the antiSMASH results and runs the other downstream processes.
```bash
conda activate snakemake
snakemake --configfile config/config_all_studies.yaml --use-conda --cores 8 --keep-going -n
conda deactivate
```
  • PS: -n makes this a dry run; remove -n to do a real run.

6. Run the workflow for the in-depth study of the phyla Nitrospirota and Myxococcota

This generates the antiSMASH results and runs the other downstream processes.
```bash
conda activate snakemake
snakemake --configfile config/config_in_depth.yaml --use-conda --cores 8 --keep-going -n
conda deactivate
```
  • PS: -n makes this a dry run; remove -n to do a real run.
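Steps 4 to 6 repeat the same Snakemake invocation with three configurations, so they can be chained in one function. The sketch below is a hypothetical wrapper, not part of this repository or of BGCflow; it assumes it is run from the bgcflow/ root with the snakemake environment already active, and the DRY_RUN and SNAKEMAKE variables are illustrative names introduced here.

```shell
# Hypothetical wrapper (not part of this repository or of BGCflow) chaining steps 4-6.
# Assumes the bgcflow/ root as working directory and the `snakemake` env active.
#   DRY_RUN=1 (default) passes -n, i.e. a Snakemake dry run; DRY_RUN=0 does a real run.
#   SNAKEMAKE overrides the executable (e.g. SNAKEMAKE=echo only prints the arguments).
run_all_workflows() {
    local dry_flag cfg
    if [ "${DRY_RUN:-1}" = "1" ]; then
        dry_flag="-n"
    else
        dry_flag=""
    fi
    # "default" means no --configfile, i.e. the default configuration as in step 4
    for cfg in default config/config_all_studies.yaml config/config_in_depth.yaml; do
        echo ">> Snakemake run with config: $cfg"
        if [ "$cfg" = "default" ]; then
            # $dry_flag is deliberately unquoted so an empty value adds no argument
            "${SNAKEMAKE:-snakemake}" --use-conda --cores 8 --keep-going $dry_flag || return 1
        else
            "${SNAKEMAKE:-snakemake}" --configfile "$cfg" --use-conda --cores 8 --keep-going $dry_flag || return 1
        fi
    done
}
```

Calling run_all_workflows performs a dry run of all three configurations; DRY_RUN=0 run_all_workflows does the real runs, stopping at the first configuration that fails.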

Owner

  • Login: robertosanchezn
  • Kind: user
  • Company: Aalborg University

PhD student

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this repository useful, please cite it as below."
authors:
- family-names: "Sánchez-Navarro"
  given-names: "Roberto"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Nuhamunada"
  given-names: "Matin"
  orcid: "https://orcid.org/0000-0003-3177-8299"
- family-names: "Mohite"
  given-names: "Omkar"
  orcid: "https://orcid.org/0000-0002-3240-1656"
- family-names: "Wasmund"
  given-names: "Kenneth"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Albertsen"
  given-names: "Mads"
  orcid: "https://orcid.org/0000-0002-6151-190X"
- family-names: "Gram"
  given-names: "Lone"
  orcid: "https://orcid.org/0000-0002-1076-5723"
- family-names: "Nielsen"
  given-names: "Per H."
  orcid: "https://orcid.org/0000-0002-6402-1877"
- family-names: "Weber"
  given-names: "Tilmann"
  orcid: "https://orcid.org/0000-0002-8260-5120"
- family-names: "Singleton"
  given-names: "Caitlin M."
  orcid: "https://orcid.org/0000-0001-9688-8208"
title: "Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem"
version: 1.0.0
doi: "TBD"
date-released: 2022-11-14
url: "https://github.com/robertosanchezn/AS_hqMAGs"
preferred-citation:
  type: article
  authors:
  - family-names: "Sánchez-Navarro"
    given-names: "Roberto"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Nuhamunada"
    given-names: "Matin"
    orcid: "https://orcid.org/0000-0003-3177-8299"
  - family-names: "Mohite"
    given-names: "Omkar"
    orcid: "https://orcid.org/0000-0002-3240-1656"
  - family-names: "Wasmund"
    given-names: "Kenneth"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Albertsen"
    given-names: "Mads"
    orcid: "https://orcid.org/0000-0002-6151-190X"
  - family-names: "Gram"
    given-names: "Lone"
    orcid: "https://orcid.org/0000-0002-1076-5723"
  - family-names: "Nielsen"
    given-names: "Per H."
    orcid: "https://orcid.org/0000-0002-6402-1877"
  - family-names: "Weber"
    given-names: "Tilmann"
    orcid: "https://orcid.org/0000-0002-8260-5120"
  - family-names: "Singleton"
    given-names: "Caitlin M."
    orcid: "https://orcid.org/0000-0001-9688-8208"
  doi: "TBD"
  journal: "TBD"
  month: 11
  start: 1 # First page number
  end: 10 # Last page number
  title: "Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem"
  issue: 1
  volume: 1
  year: 2022


Dependencies

environment.yml conda
  • alive-progress
  • bioconductor-genomicranges
  • jupyterlab
  • jupytext
  • openpyxl
  • pandas
  • pip
  • pysqlite3
  • r-argparser
  • r-base
  • r-essentials
  • r-irkernel
  • r-pbapply
  • r-tidyverse
  • seaborn