2024-predicting-antimicrobial-resistance
Analyses of large scale E. coli metagenome diversity data
https://github.com/arcadia-science/2024-predicting-antimicrobial-resistance
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes
Context
At Arcadia, we're interested in mapping genotype-phenotype relationships at broader evolutionary scales than previously attempted. To do so, we're developing models that can capture genetic relationships, both linear and nonlinear, that may be inaccessible to conventional methods. A key part of this development is identifying and leveraging unique datasets that capture both large-scale diversity and rich phenotypic information.
In this repo, we use a diverse dataset of 7,000 E. coli genomes that we previously compiled and published to characterize genomic structure and perform genomic prediction. We first use exploratory population genomic analyses to verify that this dataset contains the type of high-quality genotypic information that can be leveraged for model development. We then perform genomic prediction to uncover the genetic basis of three antimicrobial resistance (AMR) phenotypes. These analyses confirm and expand our understanding of the evolution of these AMR phenotypes and set the baseline for our future efforts, which will explore how nonlinear models might be applied to similar genomic prediction goals.
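As a rough illustration of what genomic prediction means here, the sketch below fits a GBLUP-style kernel ridge regression: it builds a centered relatedness matrix from haploid 0/1 genotypes and predicts phenotypes of held-out genomes from their relatedness to training genomes. This is a simplified stand-in, not the GEMMA models the repository actually runs; the toy genotypes, the phenotype coding (1 = resistant, 0 = susceptible), and the shrinkage parameter `lam` are hypothetical.

```python
from statistics import fmean


def centered_relatedness(genotypes):
    """Centered relatedness matrix K = Z Z^T / p.

    genotypes: one list per genome of haploid 0/1 calls at p sites.
    Z is the genotype matrix with every site (column) centered on its mean.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [fmean(row[j] for row in genotypes) for j in range(p)]
    z = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    return [[sum(z[a][j] * z[b][j] for j in range(p)) / p for b in range(n)]
            for a in range(n)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def gblup_predict(genotypes, train_idx, y_train, test_idx, lam=1.0):
    """Kernel ridge / GBLUP-style prediction of test phenotypes.

    Solves (K_train + lam * I) alpha = y_train - mean(y_train), then predicts
    mean(y_train) + K[test, train] . alpha for each test genome.
    """
    k = centered_relatedness(genotypes)  # relatedness over all genomes
    mu = fmean(y_train)
    k_train = [[k[i][j] + (lam if i == j else 0.0) for j in train_idx]
               for i in train_idx]
    alpha = solve(k_train, [y - mu for y in y_train])
    return [mu + sum(k[t][i] * a for i, a in zip(train_idx, alpha))
            for t in test_idx]


# Hypothetical toy data: six genomes, four sites; genomes 4 and 5 are held out.
toy_genotypes = [
    [1, 1, 0, 0],  # resistant
    [1, 1, 1, 0],  # resistant
    [0, 0, 1, 1],  # susceptible
    [0, 0, 0, 1],  # susceptible
    [1, 1, 0, 1],  # held out; shares alleles with the resistant genomes
    [0, 0, 1, 0],  # held out; shares alleles with the susceptible genomes
]
preds = gblup_predict(toy_genotypes, [0, 1, 2, 3], [1.0, 1.0, 0.0, 0.0], [4, 5])
```

The held-out genome that resembles the resistant training genomes receives the higher predicted score. Raising `lam` shrinks all predictions toward the training mean.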
Data
Phenotypic and genotypic data on the 7,000 E. coli genomes have previously been published and are available here.
Additional pre-computed data on presence-absence variation in the dataset are available here; these include copies of the files from the original 7,000-genome study that are needed to reproduce all analyses.
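Presence-absence variation is commonly represented as a binary genome-by-gene matrix. The sketch below builds such a matrix from per-genome gene sets and separates core genes (present in every genome) from accessory genes (variable in presence), since only the latter vary and can be associated with a phenotype. The gene names and the strict "present in all genomes" definition of core are illustrative assumptions, not the file format or thresholds of the repository's pre-computed data.

```python
def presence_absence_matrix(gene_sets):
    """Build a binary matrix (rows = genomes, columns = genes) from a
    mapping of genome ID -> set of gene/cluster IDs present in that genome."""
    genomes = sorted(gene_sets)
    genes = sorted(set().union(*gene_sets.values()))
    matrix = [[1 if g in gene_sets[gid] else 0 for g in genes] for gid in genomes]
    return genomes, genes, matrix


def split_core_accessory(genes, matrix):
    """Core genes occur in every genome; accessory genes occur in some but not all."""
    n = len(matrix)
    counts = [sum(row[j] for row in matrix) for j in range(len(genes))]
    core = [g for g, c in zip(genes, counts) if c == n]
    accessory = [g for g, c in zip(genes, counts) if 0 < c < n]
    return core, accessory


# Hypothetical toy input: three genomes and the gene clusters found in each.
toy = {
    "genome_A": {"geneX", "geneY", "blaTEM"},
    "genome_B": {"geneX", "geneY"},
    "genome_C": {"geneX", "geneY", "tetA"},
}
genomes, genes, pav = presence_absence_matrix(toy)
core, accessory = split_core_accessory(genes, pav)
```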
Installation and Setup
This repository uses Snakemake, R, and Python. Dependency requirements are managed by conda.
The bash script run_analyses.sh can be used to install miniforge3 (conda) and the main environment containing Snakemake, and then to run the Snakemake pipeline to generate the desired results. The miniforge3 installation lines at the top of the script are commented out by default; if conda needs to be installed, uncomment these lines before running the script/workflow. After installing conda, you may need to restart your terminal for conda to initialize properly for the first time.
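To decide whether the miniforge3 lines need uncommenting, one quick option is to check whether a `conda` executable is already visible in the current shell. The snippet below is a convenience sketch using only the standard library; it is not part of the repository.

```python
import shutil


def conda_available():
    """Return True if a `conda` executable is found on the current PATH."""
    return shutil.which("conda") is not None


if not conda_available():
    # Either conda is not installed, or the shell has not picked it up yet.
    print("conda not found on PATH: uncomment the miniforge3 lines in "
          "run_analyses.sh, or restart the terminal if conda was just installed.")
```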
Basic workflow
The main snakemake file is found in Ecoli_AMR_GenotypePhenotype/workflow/Snakemake
This file initiates the download of all necessary data and triggers all downstream analyses.
All dependencies are handled automatically by conda, using environments built from the YAML files stored in Ecoli_AMR_GenotypePhenotype/workflow/envs.
Analyses are split into four main subworkflows in Ecoli_AMR_GenotypePhenotype/workflow/rules:
- filtering.smk: Cleans up the data by removing outlier samples and filtering down to informative sites.
- popgen_analyses.smk: Generates population-genetic visualizations, such as a site-frequency spectrum, and constructs phylogenetic trees.
- genomic_prediction.smk: Runs genomic prediction/GWAS using GEMMA.
- genomic_prediction_post_hoc.smk: Runs exploratory post-hoc genomic prediction analyses.
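The site-filtering and site-frequency-spectrum steps can be illustrated on a small haploid genotype matrix. The sketch below drops sites with too much missing data or too rare a minor allele, then tallies a folded SFS (number of sites per minor-allele count). The thresholds `max_missing` and `min_maf` and the `None`-for-missing encoding are hypothetical choices, not the parameters used in filtering.smk or popgen_analyses.smk.

```python
from collections import Counter


def filter_sites(sites, max_missing=0.1, min_maf=0.01):
    """Keep sites that pass a missingness cap and a minor-allele-frequency floor.

    sites: one list per site of haploid calls across genomes
    (1 = alternate allele, 0 = reference allele, None = missing).
    """
    kept = []
    for calls in sites:
        called = [c for c in calls if c is not None]
        missing_frac = 1 - len(called) / len(calls)
        if not called or missing_frac > max_missing:
            continue
        alt_freq = sum(called) / len(called)
        if min(alt_freq, 1 - alt_freq) >= min_maf:
            kept.append(calls)
    return kept


def folded_sfs(sites):
    """Folded site-frequency spectrum: number of sites per minor-allele count.

    Sites with any missing call are skipped so every counted site has the
    same sample size.
    """
    sfs = Counter()
    for calls in sites:
        if None in calls:
            continue
        alt = sum(calls)
        sfs[min(alt, len(calls) - alt)] += 1
    return dict(sorted(sfs.items()))


# Hypothetical toy data: six sites genotyped in five genomes.
toy_sites = [
    [0, 0, 0, 0, 0],        # monomorphic: removed by the MAF filter
    [1, 0, 0, 0, 0],        # singleton
    [1, 1, 0, 0, 0],
    [1, None, None, 0, 0],  # 40% missing: removed by the missingness filter
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],        # minor allele is the reference allele
]
kept = filter_sites(toy_sites)
spectrum = folded_sfs(kept)  # {1: 2, 2: 2}
```

Folding by the minor allele avoids needing to know which allele is ancestral, which an unfolded SFS would require.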
Directory Structure
Supplemental publication tables
supplemental_tables: Directory where the supplemental publication tables can be accessed (both PDF and CSV files are available). The tables can also be reproduced by running the Snakemake pipeline (see above).
Data
- Ecoli_AMR_GenotypePhenotype/vcf_files: Directory for storing genotypic information on SNPs/indels
- Ecoli_AMR_GenotypePhenotype/presence_absence: Directory for storing presence/absence data
- Ecoli_AMR_GenotypePhenotype/pangenome: Directory for storing data related to the pangenome reference
- Ecoli_AMR_GenotypePhenotype/phenotype_matrix: Directory for storing data related to AMR phenotypes and associated metadata
Analyses
- Ecoli_AMR_GenotypePhenotype/workflow/rules/scripts: Where most scripts are stored; these generate outputs such as phylogenies and genomic prediction results
- Ecoli_AMR_GenotypePhenotype/figs: Directory where reproduced publication figures are saved
- Ecoli_AMR_GenotypePhenotype/tables: Directory where reproduced publication tables are saved
- Ecoli_AMR_GenotypePhenotype/tree: Directory for data related to phylogenetic analyses
- Ecoli_AMR_GenotypePhenotype/geno_pred: Directory for data and results from genomic prediction
Owner
- Name: Arcadia Science
- Login: Arcadia-Science
- Kind: organization
- Location: United States of America
- Website: https://www.arcadiascience.com/
- Twitter: ArcadiaScience
- Repositories: 16
- Profile: https://github.com/Arcadia-Science
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite the associated publication.
title: Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes
doi: 10.57844/arcadia-8391-465f
authors:
  - family-names: Bell
    given-names: Audrey
    affiliation: Arcadia Science
    orcid: https://orcid.org/0009-0008-2270-1613
  - family-names: Patton
    given-names: Austin H.
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0003-1286-9005
  - family-names: Sandler
    given-names: George
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0001-9420-1521
  - family-names: York
    given-names: Ryan
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-1073-1494
preferred-citation:
  title: Predicting antimicrobial resistance phenotypes across 7,000 E. coli genomes
  type: article
  doi: 10.57844/arcadia-8391-465f
  authors:
    - family-names: Patton
      given-names: Austin H.
      affiliation: Arcadia Science
      orcid: https://orcid.org/0000-0003-1286-9005
    - family-names: Sandler
      given-names: George
      affiliation: Arcadia Science
      orcid: https://orcid.org/0000-0001-9420-1521
    - family-names: York
      given-names: Ryan
      affiliation: Arcadia Science
      orcid: https://orcid.org/0000-0002-1073-1494
  year: 2025
Dependencies
- actions/checkout v3 composite
- r-lib/actions/setup-r v2 composite