https://github.com/arcadia-science/emp500

Analysis and generation of community resources from the EMP500 metagenomes

Repository

Basic Info
  • Host: GitHub
  • Owner: Arcadia-Science
  • Language: R
  • Default Branch: main
  • Size: 2.03 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed about 3 years ago

# Earth Microbiome Project 500 Metagenomes
This repository documents the analysis code and the steps used to generate community resource files (such as assemblies, taxonomy information, and proteins) from the Earth Microbiome Project (EMP) 500 metagenomes. The analyses and resources were generated from metagenomes sequenced as part of Shaffer et al., **Standardized multi-omics of Earth's microbiomes reveals microbial and metabolite diversity** (2022), [https://doi.org/10.1038/s41564-022-01266-x](https://www.nature.com/articles/s41564-022-01266-x). Metagenomic reads were accessed from the ENA at BioProject [PRJEB42019](https://www.ebi.ac.uk/ena/browser/view/PRJEB42019).

## Download EMP500 Metagenomes
From the BioProject [PRJEB42019](https://www.ebi.ac.uk/ena/browser/view/PRJEB42019) page, the TSV report was downloaded and used to select the subset of reads for this project. Because this BioProject contains both 16S rRNA amplicon sequencing and shotgun metagenomic sequencing, the script `scripts/emp_metadata_filter.R` was used to select only the shotgun metagenomic samples. We also filtered out samples with a scientific name of "gut metagenome" or just "metagenome", since neither is descriptive enough to work with. This left a total of ~2200 shotgun metagenomes from diverse biomes.
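
The filtering described above can be sketched in base R as follows. This is a minimal illustration and not the contents of `scripts/emp_metadata_filter.R`: the column names (`run_accession`, `library_strategy`, `scientific_name`), the use of `"WGS"` to mark shotgun runs, the helper `filter_shotgun()`, and the toy rows are all assumptions made for the example.

```r
# Toy stand-in for the ENA TSV report. The real report would be loaded with
# read.delim(); the column names and values here are assumptions.
report <- data.frame(
  run_accession    = c("run1", "run2", "run3", "run4", "run5"),
  library_strategy = c("AMPLICON", "WGS", "WGS", "WGS", "WGS"),
  scientific_name  = c("soil metagenome", "gut metagenome", "metagenome",
                       "activated sludge metagenome", "soil metagenome"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep shotgun runs whose scientific name is informative.
filter_shotgun <- function(report,
                           uninformative = c("gut metagenome", "metagenome")) {
  is_shotgun     <- report$library_strategy == "WGS"
  is_informative <- !(report$scientific_name %in% uninformative)
  report[is_shotgun & is_informative, , drop = FALSE]
}

emp_subset <- filter_shotgun(report)
print(emp_subset$run_accession)
```

In this toy table, the amplicon run and the two runs with uninformative names are dropped, leaving `run4` and `run5`.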

These metagenomes were downloaded from the ENA using the `nf-core/fetchngs` Nextflow pipeline, which was launched on Tower with:

```
nextflow run 'https://github.com/nf-core/fetchngs' \
    -name EMP500_download \
    -profile docker \
    -with-tower \
    -r master \
    --input_type sra \
    --input EMP500_subset_accessions.csv \
    --outdir s3://nf-metagenomics/EMP500/raw_data \
    --nf_core_pipeline taxprofiler
```

The `nf_core_pipeline` parameter makes `nf-core/fetchngs` write a samplesheet compliant with the `nf-core/taxprofiler` pipeline. Of the available options, this format is the closest to what we need for tracking metadata and as input to the `Arcadia-Science/metagenomics` workflow. The resulting samplesheet is the `metadata/EMP500_fetchngs_samplesheet.csv` file.

## Process EMP500 Metagenomes
A subset of the EMP500 metagenomes from the `activated sludge` biome was selected by filtering for accessions named `activated sludge metagenome`. These were then processed using the `Arcadia-Science/metagenomics` Nextflow workflow with:

```
nextflow run 'https://github.com/Arcadia-Science/metagenomics' \
    -name EMP500_activated_sludge \
    -profile docker \
    -with-tower \
    -r main \
    --input EMP500-AS-s3-uris.csv \
    --outdir s3://nf-metagenomics/EMP500/activated_sludge \
    --sourmash_dbs s3://nf-metagenomics/sourmash-cover-dbs.csv
```
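
As a rough sketch of how the activated sludge subset could be selected from a samplesheet and written out as a workflow input file, consider the base R example below. The samplesheet columns (`sample`, `fastq_1`, `fastq_2`, `scientific_name`), the `example-bucket` URIs, the helper `select_biome()`, and the layout of the output CSV are assumptions for illustration. The actual format of `EMP500-AS-s3-uris.csv` is not documented here.

```r
# Toy samplesheet; the column names and S3 URIs are placeholders, not real data.
samplesheet <- data.frame(
  sample  = c("s1", "s2", "s3"),
  fastq_1 = paste0("s3://example-bucket/", c("s1", "s2", "s3"), "_1.fastq.gz"),
  fastq_2 = paste0("s3://example-bucket/", c("s1", "s2", "s3"), "_2.fastq.gz"),
  scientific_name = c("soil metagenome",
                      "activated sludge metagenome",
                      "activated sludge metagenome"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for one biome and only the input columns.
select_biome <- function(samplesheet, biome_name) {
  keep <- samplesheet$scientific_name == biome_name
  samplesheet[keep, c("sample", "fastq_1", "fastq_2"), drop = FALSE]
}

as_inputs <- select_biome(samplesheet, "activated sludge metagenome")

# Write the subset to a temporary CSV (assumed layout: one row per sample).
out_file <- tempfile(fileext = ".csv")
write.csv(as_inputs, out_file, row.names = FALSE, quote = FALSE)
print(as_inputs$sample)
```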
## Analyze EMP500 Metagenomes
The assemblies and sourmash files were pulled down from the output of the workflow. The sourmash signatures and the compare, gather, and taxannotate results were analyzed using the `sourmashconsumr` R package with the script `scripts/emp_activated_sludge_sourmashconsumr_analysis.R`.
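
To give a sense of the kind of summary that gather results support, the base R sketch below computes the abundance-weighted fraction of each metagenome that matched a reference. It is not taken from the analysis script and does not use `sourmashconsumr`, which ships its own readers and plotting helpers. The toy values are invented, and the columns `query_name`, `name`, and `f_unique_weighted` are assumed to follow the sourmash gather CSV output.

```r
# Toy gather table; real results would be read from the sourmash gather CSVs.
gather <- data.frame(
  query_name        = c("s2", "s2", "s3"),
  name              = c("genome A", "genome B", "genome A"),
  f_unique_weighted = c(0.25, 0.125, 0.5),
  stringsAsFactors  = FALSE
)

# Sum the non-overlapping weighted fractions per query to get the share of
# each metagenome assigned to any reference genome.
classified <- aggregate(f_unique_weighted ~ query_name, data = gather, FUN = sum)
names(classified)[2] <- "fraction_classified"
classified$fraction_unclassified <- 1 - classified$fraction_classified

print(classified)
```

Because each gather match reports only the portion of the query not already claimed by a higher-ranked match, the per-query sum stays at or below 1, and the remainder is the unclassified fraction.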

Owner

  • Name: Arcadia Science
  • Login: Arcadia-Science
  • Kind: organization
  • Location: United States of America