2024-predicting-antimicrobial-resistance

Analyses of large scale E. coli metagenome diversity data

https://github.com/arcadia-science/2024-predicting-antimicrobial-resistance

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Arcadia-Science
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 316 KB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes

Context

At Arcadia, we're interested in mapping genotype-phenotype relationships at broader evolutionary scales than previously attempted. To do so, we're developing models that can capture genetic relationships, both linear and nonlinear, that may be inaccessible to conventional methods. A key part of this development is identifying and leveraging unique datasets that capture both large scales of diversity and rich phenotypic information.

In this repo, we characterize the genomic structure of, and perform genomic prediction on, a diverse dataset of 7,000 E. coli genomes that we previously compiled and published. We first use exploratory population genomic analyses to verify that this dataset contains the type of high-quality genotypic information that can be leveraged for model development. We then follow up with genomic prediction to uncover the genetic basis of three antimicrobial resistance (AMR) phenotypes. These genomic prediction analyses confirm and expand our understanding of the evolution of these AMR phenotypes and set the baseline for our future efforts, which will explore how nonlinear models might be applied to similar genomic prediction goals.

Data

Phenotypic and genotypic data on the 7,000 E. coli genomes have previously been published and are available here.

Additional pre-computed data on presence-absence variation in the dataset is available here. It includes copies of the files from the original 7,000-genome study that are needed to reproduce all analyses.

Installation and Setup

This repository uses Snakemake, R, and Python. Dependency requirements are managed by conda.

The bash script run_analyses.sh can install miniforge3 (conda), create the main environment containing a Snakemake installation, and run the Snakemake pipeline to generate the results. By default, the miniforge3 installation lines at the top of the script are commented out. If conda needs to be installed, uncomment these lines before running the script. You may have to restart your terminal after installing conda so that it initializes properly for the first time.

Basic workflow

The main Snakemake file is found in Ecoli_AMR_GenotypePhenotype/workflow/Snakemake. This file initiates the download of all necessary data and triggers all downstream analyses. All dependencies are handled automatically by conda, using environments built from YAML files stored in Ecoli_AMR_GenotypePhenotype/workflow/envs.

Analyses are split into 4 main subworkflows in Ecoli_AMR_GenotypePhenotype/workflow/rules:

  • filtering.smk: This workflow cleans up the data, removing outlier samples and filtering down to informative sites

  • popgen_analyses.smk: This workflow generates some pop-gen visualizations, such as a site-frequency spectrum, and also constructs phylogenetic trees

  • genomic_prediction.smk: This workflow runs genomic prediction/GWAS using GEMMA

  • genomic_prediction_post_hoc.smk: This workflow runs exploratory post-hoc genomic prediction analyses
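To make "informative sites" and "site-frequency spectrum" concrete, here is a minimal Python sketch. It is not the code used by filtering.smk or popgen_analyses.smk. It assumes a haploid genotype matrix coded as 0 (reference) and 1 (alternate), with one row per genome and one column per site, and it treats a site as informative when its minor allele is seen at least once. The function names, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def minor_allele_counts(genotypes):
    """Return the minor allele count of every site (column).

    Assumes a non-empty, rectangular list of rows holding 0/1 haploid calls.
    """
    n_samples = len(genotypes)
    n_sites = len(genotypes[0])
    counts = []
    for j in range(n_sites):
        alt = sum(row[j] for row in genotypes)
        # The minor allele is whichever of the two alleles is rarer.
        counts.append(min(alt, n_samples - alt))
    return counts


def filter_informative_sites(genotypes, min_minor_count=1):
    """Drop sites whose minor allele count is below min_minor_count.

    The default threshold of 1 (removing only invariant sites) is an
    illustrative assumption, not the filter applied in this repository.
    """
    counts = minor_allele_counts(genotypes)
    keep = [j for j, c in enumerate(counts) if c >= min_minor_count]
    return [[row[j] for j in keep] for row in genotypes]


def folded_sfs(genotypes):
    """Folded site-frequency spectrum.

    Entry k is the number of sites whose minor allele appears in exactly
    k genomes, for k = 0 .. n_samples // 2.
    """
    n_samples = len(genotypes)
    tally = Counter(minor_allele_counts(genotypes))
    return [tally.get(k, 0) for k in range(n_samples // 2 + 1)]


# Toy data: 4 genomes (rows) by 4 sites (columns).
toy = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]

print(folded_sfs(toy))                            # [1, 2, 1]
print(folded_sfs(filter_informative_sites(toy)))  # [0, 2, 1]
```

In the toy matrix the first site is invariant, so filtering removes it and the zero-count class of the spectrum becomes empty. The real workflow operates on VCF files and may apply different or additional filters.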

Directory Structure

Supplemental pub tables

  • supplemental_tables Directory where supplemental pub tables can be accessed (both PDF and CSV files are available). Tables can also be reproduced by running the Snakemake pipeline (see above)

Data

  • Ecoli_AMR_GenotypePhenotype/vcf_files Directory for storing genotypic information on SNPs/indels

  • Ecoli_AMR_GenotypePhenotype/presence_absence Directory for storing presence/absence data

  • Ecoli_AMR_GenotypePhenotype/pangenome Directory for storing data related to pangenome reference

  • Ecoli_AMR_GenotypePhenotype/phenotype_matrix Directory for storing data related to AMR phenotypes and associated metadata

Analyses

  • Ecoli_AMR_GenotypePhenotype/workflow/rules/scripts Directory where most scripts are stored; these generate outputs such as phylogenies and genomic prediction results

  • Ecoli_AMR_GenotypePhenotype/figs Directory where reproduced pub figures are saved

  • Ecoli_AMR_GenotypePhenotype/tables Directory where reproduced pub tables are saved

  • Ecoli_AMR_GenotypePhenotype/tree Directory for data related to phylogenetic analyses

  • Ecoli_AMR_GenotypePhenotype/geno_pred Directory for data and results from genomic prediction

Owner

  • Name: Arcadia Science
  • Login: Arcadia-Science
  • Kind: organization
  • Location: United States of America

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite the associated publication.
title: Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes
doi: 10.57844/arcadia-8391-465f
authors:
- family-names: Bell
  given-names: Audrey
  affiliation: Arcadia Science
  orcid: https://orcid.org/0009-0008-2270-1613
- family-names: Patton
  given-names: Austin H.
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0003-1286-9005
- family-names: Sandler
  given-names: George
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0001-9420-1521
- family-names: York
  given-names: Ryan
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0002-1073-1494
preferred-citation:
  title: Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes
  type: article
  doi: 10.57844/arcadia-8391-465f
  authors:
  - family-names: Patton
    given-names: Austin H.
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0003-1286-9005
  - family-names: Sandler
    given-names: George
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0001-9420-1521
  - family-names: York
    given-names: Ryan
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-1073-1494
  year: 2025

Dependencies

.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r v2 composite