Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: robertosanchezn
- Language: HTML
- Default Branch: main
- Size: 11.9 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
bgcflowAShqMAGs
- This repository contains the code required to reproduce the analysis from the manuscript Sánchez-Navarro et al. 2022. Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem.
Preview notebooks
- FIG 1. Overview of the number of BGCs detected in the HQ MAG data set.
- FIG 2. Distribution of BGCs and BGC classes across taxonomic groups.
- FIG 3. BGCs in selected genera functionally relevant and/or abundant in WWTPs with nutrient removal.
- FIG 4. Distribution of the presence of GCFs across the phylogenomic tree of 16 genomes of Nitrospira. TO DO
- FIG 5. Distribution of the presence of GCFs across the phylogenomic tree of 22 genomes within the Polyangiaceae. TO DO
- FIG 6. Comparison of BGC mining studies in HQ MAGs. TO DO
Usage
Follow these steps to reproduce the analysis and data generated in this study.
Clone this repository
Clone this repository to your local machine:
```bash
git clone git@github.com:robertosanchezn/AS_hqMAGs.git
cd AS_hqMAGs
```
Run the analysis
To generate the figures in the manuscript, run the analysis inside the r_markdown folder or jupyter_notebook folder. Each folder has its own README.md with instructions to run the analysis.
Reproduce the data
1. Install Conda Environments & BGCFlow
- This analysis was run on a Microsoft Azure virtual machine running Linux (Ubuntu 20.04).
- Get a clone of BGCflow, following the instructions at https://github.com/NBChub/bgcflow :
```bash
git clone git@github.com:NBChub/bgcflow.git
cd bgcflow
```
- Switch to branch v0.3.3-alpha (where this study was conducted)
```bash
git checkout v0.3.3-alpha
```
> - TO DO: attach a zipped archive of v0.3.3-alpha
- Installing Snakemake using Mamba is advised. If you don't use Mambaforge, you can always install Mamba into any other Conda-based Python distribution with:
```bash
conda install -n base -c conda-forge mamba
```
- Install conda environments
```bash
# snakemake environment
mamba create -c conda-forge -c bioconda -n snakemake snakemake=7.6.1

# environment to run notebooks (-f reads the environment definition from the YAML file)
mamba env create -f workflow/envs/bgc_analytics.yaml
```
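Because the workflow was run with a pinned Snakemake release, it can help to confirm that the activated environment provides that version before starting. The following is a minimal sketch, not part of the original repository: the function name `check_snakemake_version` and the `SNAKEMAKE_BIN` override are hypothetical, and the only real call assumed is `snakemake --version`.

```bash
# Compare the active Snakemake version with the one pinned for this study.
# SNAKEMAKE_BIN is a hypothetical override, used here only to make testing easy.
check_snakemake_version() {
    expected="${1:-7.6.1}"
    found="$("${SNAKEMAKE_BIN:-snakemake}" --version 2>/dev/null)" || {
        echo "snakemake not found; is the environment activated?" >&2
        return 2
    }
    if [ "$found" = "$expected" ]; then
        echo "snakemake $found OK"
    else
        echo "snakemake $found found, expected $expected" >&2
        return 1
    fi
}
```

After `conda activate snakemake`, calling `check_snakemake_version` with no arguments checks against 7.6.1; pass another version string as the first argument to check a different pin.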
2. Snakemake configuration set up
- Set up the configuration files by copying the content in the `/bgcflow_configuration` folder (replacing the original `config.yaml` in BGCflow)
```shell
cp ../bgcflow_config/* config/. -r
```
3. Download and prepare data from other studies
- Not all of the genomes are hosted in NCBI, and some fasta files need cleaning. Run the notebook to grab all custom fasta files to `data/raw/fasta`.
```shell
# run notebook to download genomes from other studies to bgcflow/data/external, will take a while to finish
conda activate bgc_analytics
(cd ../jupyter_notebook/notebook2/ && jupyter nbconvert --to html --execute 01otherMAGdatasettable.ipynb)
conda deactivate

# generate symlink
extdir="data/external"
for directory in Bickhartetal Chenetalsanitized Liuetal Sharraretalsanitized; do
    for fna in "$extdir/$directory"/*.fna
    do
        (cd data/raw/fasta && ln -s "../../external/$directory/$(basename "$fna")" "$(basename "$fna")" --verbose)
    done
done
```
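The symlinks created in this step are relative, so a wrong working directory or a missing download leaves dangling links that only surface later in the workflow. As a hedged sketch (the helper name `check_fasta_links` is hypothetical and not part of the repository), the links can be checked before running Snakemake:

```bash
# Report symlinks in a directory whose targets do not exist.
# Prints one line per broken link and returns non-zero if any were found.
check_fasta_links() {
    fasta_dir="${1:-data/raw/fasta}"
    broken=0
    for link in "$fasta_dir"/*.fna; do
        # -L: the path is a symlink; ! -e: its target cannot be resolved
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "broken: $link"
            broken=1
        fi
    done
    return "$broken"
}
```

Run from the `bgcflow` directory, `check_fasta_links` inspects `data/raw/fasta` by default; a different directory can be passed as the first argument.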
4. Run the workflow for each individual study
This will generate antiSMASH results and other downstream processes.
```bash
conda activate snakemake
snakemake --use-conda --cores 8 --keep-going -n
conda deactivate
```
- Note: `-n` performs a dry run; remove it to do a real run
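The dry run and the real run differ only by the `-n` flag, so the two can be chained so that the real run starts only when the dry run succeeds. This is an illustrative sketch under stated assumptions: `run_workflow` and the `SNAKEMAKE_BIN` override are hypothetical names, while the Snakemake options are the ones used in the commands of this README.

```bash
# Run a dry run first; start the real run only if the dry run succeeds.
# SNAKEMAKE_BIN is a hypothetical override, handy for testing with a stub.
run_workflow() {
    snakemake_bin="${SNAKEMAKE_BIN:-snakemake}"
    # "$@" carries optional extra arguments such as --configfile <path>
    "$snakemake_bin" "$@" --use-conda --cores 8 --keep-going -n || {
        echo "dry run failed; not starting the real run" >&2
        return 1
    }
    "$snakemake_bin" "$@" --use-conda --cores 8 --keep-going
}
```

Inside the activated `snakemake` environment, `run_workflow` with no arguments uses the default configuration; any extra arguments, for example a `--configfile` option, are passed through to both invocations.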
5. Run the workflow for the comparison across all studies
This will generate antiSMASH results and other downstream processes.
```bash
conda activate snakemake
snakemake --configfile config/config_all_studies.yaml --use-conda --cores 8 --keep-going -n
conda deactivate
```
- Note: `-n` performs a dry run; remove it to do a real run
6. Run the workflow for the in-depth study of the phyla Nitrospirota and Myxococcota
This will generate antiSMASH results and other downstream processes.
```bash
conda activate snakemake
snakemake --configfile config/config_in_depth.yaml --use-conda --cores 8 --keep-going -n
conda deactivate
```
- Note: `-n` performs a dry run; remove it to do a real run
Owner
- Login: robertosanchezn
- Kind: user
- Company: Aalborg University
- Repositories: 1
- Profile: https://github.com/robertosanchezn
PhD student
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find this repository useful, please cite it as below."
authors:
  - family-names: "Sánchez-Navarro"
    given-names: "Roberto"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Nuhamunada"
    given-names: "Matin"
    orcid: "https://orcid.org/0000-0003-3177-8299"
  - family-names: "Mohite"
    given-names: "Omkar"
    orcid: "https://orcid.org/0000-0002-3240-1656"
  - family-names: "Wasmund"
    given-names: "Kenneth"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Albertsen"
    given-names: "Mads"
    orcid: "https://orcid.org/0000-0002-6151-190X"
  - family-names: "Gram"
    given-names: "Lone"
    orcid: "https://orcid.org/0000-0002-1076-5723"
  - family-names: "Nielsen"
    given-names: "Per H."
    orcid: "https://orcid.org/0000-0002-6402-1877"
  - family-names: "Weber"
    given-names: "Tilmann"
    orcid: "https://orcid.org/0000-0002-8260-5120"
  - family-names: "Singleton"
    given-names: "Caitlin M."
    orcid: "https://orcid.org/0000-0001-9688-8208"
title: "Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem"
version: 1.0.0
doi: "TBD"
date-released: 2022-11-14
url: "https://github.com/robertosanchezn/AS_hqMAGs"
preferred-citation:
  type: article
  authors:
    - family-names: "Sánchez-Navarro"
      given-names: "Roberto"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Nuhamunada"
      given-names: "Matin"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Mohite"
      given-names: "Omkar"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Wasmund"
      given-names: "Kenneth"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Albertsen"
      given-names: "Mads"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Gram"
      given-names: "Lone"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Nielsen"
      given-names: "Per H."
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Weber"
      given-names: "Tilmann"
      orcid: "https://orcid.org/0000-0000-0000-0000"
    - family-names: "Singleton"
      given-names: "Caitlin M."
      orcid: "https://orcid.org/0000-0000-0000-0000"
  doi: "TBD"
  journal: "TBD"
  month: 11
  start: 1 # First page number
  end: 10 # Last page number
  title: "Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial-Activated Sludge Ecosystem"
  issue: 1
  volume: 1
  year: 2022
Dependencies
- alive-progress
- bioconductor-genomicranges
- jupyterlab
- jupytext
- openpyxl
- pandas
- pip
- pysqlite3
- r-argparser
- r-base
- r-essentials
- r-irkernel
- r-pbapply
- r-tidyverse
- seaborn