hp3
Repository for Host-Pathogen Phylogeny Project. Paper DOI: 10.1038/nature22975
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ecohealthalliance
- License: mit
- Language: HTML
- Default Branch: master
- Homepage: https://dx.doi.org/10.1038/nature22975
- Size: 429 MB
Statistics
- Stars: 15
- Watchers: 12
- Forks: 6
- Open Issues: 4
- Releases: 2
Metadata Files
README.md
HP3 Analysis files
This repository contains code, data, documentation, metadata, and figure source files used in Olival et al. (2017) "Host and Viral Traits Predict Zoonotic Spillover from Mammals." Nature https://dx.doi.org/10.1038/nature22975
Repo Structure
- `documents/` contains two R markdown documents, in both raw and readable HTML form, which give more detail on our model-fitting and validation process than the main paper or supplemental methods: `model_summaries.Rmd/html` and `geographic_cross_validation.Rmd/html`.
- `data/` contains data used in these analyses, including
  - our primary database of host-viral associations (`associations.csv`)
  - databases of host (`hosts.csv`) and viral (`viruses.csv`) traits
  - 2 phylogenetic tree files in Newick (`*.tree`) format. One (`supertree_mammals.tree`) is a pruned version of the mammalian supertree (Bininda-Emonds et al. 2007), for the subset of mammals in our database. The other (`cytb-supertree.tree`) is a custom-built cytochrome-B phylogeny constrained to the order-level topology of the mammalian supertree (see supplementary methods).
  - full references for all associations in our database (`references.txt`)
  - an `intermediates/` directory with derived data (species phylogenetic distance matrices and PVR-corrected host mass)
  - a `metadata.csv` file that describes variables in our database and derived variables used in model-fitting
  - `IUCN_taxonomy_23JUN2016.csv`, data from IUCN used to harmonize our data with IUCN spatial data (see Supplementary Methods)
  - `Genbank_accession_cytb.csv`, two Genbank accession numbers used in constructing the Cyt-B constrained tree
  - `region_names.rds`, a list of zoogeographical region names used to describe cross-validation regions.
- `figures/` contains figures and tables in the paper and extended data and the scripts to generate them, including a `maps/` subdirectory with individual maps that are stitched together for the main and extended figures.
- `scripts/` contains all the scripts used to fit the models and generate outputs.
- `R/` contains files with functions used in other scripts.
- `misc/` contains small scripts used for other calculations.
- `intermediates/` is a holding directory for intermediate data files and fitted model objects in `*.rds` R data form. These are re-created when the project is built.
- `shapefiles/` is an empty holding directory. Large shapefiles used to generate maps and in analyses are stored separately on AWS to limit the size of this repository. They are downloaded to this folder by the scripts when needed.
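As a quick orientation to the `data/` directory described above, the following is a minimal sketch of reading the three core tables into R. The helper name `load_hp3_tables` is hypothetical and not part of the repository; only the file names come from this README, and nothing is assumed about the columns (see `metadata.csv` for those).

```r
# Hypothetical helper (not part of the repository): read the three core
# HP3 tables from a data directory. Only the file names are taken from
# the README; no column names are assumed.
load_hp3_tables <- function(data_dir = "data") {
  files <- c(
    associations = "associations.csv",
    hosts        = "hosts.csv",
    viruses      = "viruses.csv"
  )
  paths <- file.path(data_dir, files)

  # Fail early with a clear message if any file is absent.
  missing <- !file.exists(paths)
  if (any(missing)) {
    stop("Missing data files: ", paste(paths[missing], collapse = ", "))
  }

  tables <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  names(tables) <- names(files)
  tables
}

# Example use from the repository root. The tryCatch guard lets the
# sketch run without error when the data files are not present.
hp3 <- tryCatch(load_hp3_tables("data"), error = function(e) NULL)
if (!is.null(hp3)) {
  print(sapply(hp3, dim))  # rows and columns of each table
}
```

Returning a named list keeps the three tables together without committing to any join keys, which would have to be taken from `metadata.csv` rather than guessed.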
Listing of files
```
README.md     | This file in .md format
README.txt    | This file in .txt format
HP3.Rproj     | RStudio project organization file
Makefile      | Makefile for building project
.zenodo.json  | Metadata file for ZENODO repository

data/
  associations.csv             | associations database
  cytbsupertree.tree           | tree file for Cyt-b constrained version of mammal supertree
  Genbank_accession_cytb.csv   | Genbank accession numbers used for calculating the Cyt-b constrained tree
  hosts.csv                    | hosts database
  IUCN_taxonomy_23JUN2016.csv  | IUCN taxonomy to harmonize IUCN spatial data with hosts database
  metadata.csv                 | listing of variables in hosts, viruses, and associations databases
  references.txt               | listing of reference sources for associations database
  region_names.rds             | R object of zoogeographical region names for cross-validation
  supertree_mammals.tree       | tree file for mammal supertree
  viruses.csv                  | viruses database
  intermediate/                | Intermediate data files calculated by scripts, primarily phylogenetic distance matrices

documents/
  model_summaries.Rmd                | R-markdown document of GAM model summaries and diagnostics
  model_summaries.html               | Compiled HTML of above
  geographic_cross_validation.Rmd    | R-markdown geospatial diagnostics of models
  geographic_cross_validation.html   | Compiled HTML of above

figures/                                     | Figures and tables for manuscript and supplements
  Figure01A-boxplots.pdf                     |
  Figure01B-boxplots.pdf                     |
  Figure02-all-gams.svg                      |
  Figure03-missing-zoo-maps.png              |
  Figure04-viral-traits.svg                  |
  ExtendedFigure03-ALL.png                   |
  ExtendedFigure04-CARNIVORA.png             |
  ExtendedFigure05-CETARTIODACTYLA.png       |
  ExtendedFigure06-CHIROPTERA.png            |
  ExtendedFigure07-PRIMATES.png              |
  ExtendedFigure08-RODENTIA.png              |
  ExtendedTable01-models.docx                |
  SuppTable1-observed-predicted-missing.csv  |
  maps/                                      | Individual maps stitched together for figures.

misc/                              | Assorted side-analyses
  calc-bat-special.R               | Calculates significance of bat order effect in GAM
  genhostspatialdata.R             | Used for generating host zoogeographies shapefile
  phylo-primates.Rmd               | Examination of phylogenetic effects specific to primates
  calc-pred-obs-correlation.R      | Alternative measures of model fit
  zoonoticdevexplainedw_offset.R   | For calculating deviance explained in models with offsets

R/                         | Functions used in scripts and R markdown documents
  avggamvis.R              | Functions for visualizing the average GAM of an ensemble
  crossvalidation.R        | Cross validation
  cvgamby.R                | Zoogeographical cross-validation
  fitgam.R                 | Fitting ensembles of GAM models
  logp.R                   | Log function with offset for zeros
  modelreduction.R         | Dropping non-predictive variables from models
  relativecontributions.R  | Calculating the explained deviance from different variables in a model
  utils.R                  | Miscellaneous utility functions

scripts/                                        | Scripts to build project outputs
  01-download-shapefiles.R                      | Fetch shapefiles from storage on Amazon AWS
  02-generatephylogeneticintermediatedata.R     | Calculate phylogenetic distance matrices and PVR-adjusted body mass
  03-preprocessdata.R                           | Data cleaning and merging
  04-fit-models.R                               | Fit the GAMs in the paper
  05-make-Figure01-boxplots.R                   | Generate boxplots in Figure 1
  06-make-Figure02-all-gams.R                   | Generate Figure 2
  07-make_maps.R                                | Generate all maps
  08-make-Figure03-ExtendedFigs-stitch-maps.R   | Assemble maps together into Figure 3 and Extended Figures
  09-make-Figure04-viral-traits.R               | Generate Figure 4
  10-make-ExtendedFigure02-heatmap.R            | Generate heat map for Extended Figure 2
  11-make-ExtendedTable01-models.R              | Generate Extended Table 1 of model summaries
  12-make-SuppTable01-predictions.R             | Generate supplemental table of observed and predicted viruses and zoonoses by species

intermediates/  | Holds intermediate fitted model objects when project is built
shapefiles/     | Holds large shapefiles downloaded when project is built
packrat/        | Holds all R package dependencies
.Rprofile       | Configures R to use packrat dependencies
```
Reproducing the analysis
The Makefile in this repository holds the project workflow. Running
`make all` in the directory will re-build the project. `make clean` will
remove shapefiles, intermediate data, fitted models, and all figures and maps.
If this project is opened in RStudio, this can also be accomplished with the
"Build All" and "Clean" buttons in the Build tab.
This project uses packrat to manage
R package dependencies. Running `packrat::restore()` will install the versions
of packages used in this project. In addition, these packages have
the following system requirements: cairo, gdal, GEOS, libmagick++-,
java, libcurl, libpng, libxml2, OpenSSL, and pandoc. All analyses
were performed using R 3.3.2 under Ubuntu 14.04. A complete build takes approximately
1 hour with 40 cores and 256GB of memory, or approximately 8 hours on a 2-core
MacBook Pro with 16GB of memory.
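Because a full build is long, it may be worth confirming that the command-line tools are on the `PATH` before starting. The sketch below is one possible pre-flight check using base R's `Sys.which()`. The helper name `check_build_tools` and the default tool list are illustrative assumptions, not taken from the Makefile, and the check does not cover the library requirements (gdal, GEOS, and so on).

```r
# Hypothetical pre-flight check (not part of the repository).
# The default tool list is an illustrative assumption: `make` drives the
# workflow and `pandoc` is among the listed system requirements.
check_build_tools <- function(tools = c("make", "pandoc")) {
  # Sys.which() returns "" for any tool it cannot find on the PATH.
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  found
}

status <- check_build_tools()
print(status)
if (!all(status)) {
  message("Not found on PATH: ",
          paste(names(status)[!status], collapse = ", "))
}

# Once the tools are present, the documented workflow is:
#   packrat::restore()   # in R: install the pinned package versions
#   make all             # in a shell: rebuild the project
```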
Owner
- Name: EcoHealth Alliance
- Login: ecohealthalliance
- Kind: organization
- Email: tech@ecohealthalliance.org
- Location: New York, NY
- Website: http://ecohealthalliance.org/
- Repositories: 199
- Profile: https://github.com/ecohealthalliance
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Noam Ross | n****s@g****m | 137 |
| Anna Willoughby | w****y@e****g | 106 |
| Cale Basaraba | b****a@e****g | 33 |
| Carlos Zambrana-Torrelio | c****t@g****m | 17 |
| kevinolival | o****l@e****g | 15 |
| Cale Basaraba | b****a@e****h | 3 |
| Noam Ross | r****s@e****g | 2 |
| Your Name | y****u@e****m | 1 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 14
- Total pull requests: 10
- Average time to close issues: 7 months
- Average time to close pull requests: 4 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 2.0
- Average comments per pull request: 0.6
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- noamross (6)
- arw36 (3)
- jhpoelen (1)
Pull Request Authors
- noamross (4)
- calebasaraba (3)
- arw36 (1)