afs_genetic_legacy
R code and data for our paper: "The genetic legacy of extreme exploitation in a polar vertebrate"
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 5 DOI reference(s) in README -
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.8%) to scientific vocabulary
Repository
R code and data for our paper: "The genetic legacy of extreme exploitation in a polar vertebrate"
Basic Info
- Host: GitHub
- Owner: apaijmans
- Language: R
- Default Branch: master
- Size: 47.5 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Analysis workflow for "The genetic legacy of extreme exploitation in a polar vertebrate", Paijmans et al. Sci Rep 10, 5089 (2020)
Overview
This is the workflow used in the paper "The genetic legacy of extreme exploitation in a polar vertebrate".
The raw microsatellite data are available via the Zenodo repository:
To run the analyses, please download the complete folder. The folder named "Rcode" contains 14 scripts, which are named 1_ to 10_ . These scripts can be run a standard desktop machine.
Some of the analyses are computationally intensive. This is the case for the STRUCTURE analyses (scripts and data can be found in the folder "STRUCTURE") and the bottleneck analyses (scripts and data can be found in the folder "ABC"). The scripts in these folders should run on a server with sufficient computing power and memory.
Most datasets which are produced along the way are already saved in subfolders, so that the analysis can be started at any point. In some cases the file size was too large and the file could not be included here. These files are available upon request.
In some cases, data needed to be transformed from STRUCTURE files to genepop files. This was done using PGD Spider and the .spid file is also given in this repository.
Finally, .bat files were used for mass conversion of files (see also script 9aBOTTLENECKin), also these .bat files are given in this repository.
Basic statistics and PCA
Within the folder "Rcode" the following scripts can be found:
- 1aprepdata_ALL: prepping data for STRUCTURE
- 1bhweld_ALL: test for HWE and LD, removing loci that are not in HWE/LD
- 1fpcaALL: PCA (fig S12a & b)
- 2removehybs: removing hybrids and A. tropicalis found using STRUCTURE
- 3aprepdata_GAZ: prepping data for STRUCTURE, A. gazella only
- 4_stats: calculate # private alleles, Ar, Ho, Fis (input for table 1 and table S1)
- 5a_hwe: test for HWE (input for table S7)
- 5bldtest: test for LD
- 6FstGAZ: calculate Fst (input for table S2)
- 7pcaGAZ: PCA (fig S3a & b)
- 8mratioGAZ: calculate M ratio (input for table 1 and table S1)
- 9aBOTTLENECKin: creates 1000 subsets of randomly drawn individuals for analysis with BOTTLENECK software
- 9bBOTTLENECKout: reads 1000 results of BOTTLENECK software and gets prop het ex values (input for fig 3 and fig S4)
- 10sealingeffort: sealing effort (fig 1)
STRUCTURE analysis
Within the folder "STRUCTURE" the following scripts can be found:
(1) Folder "TropGaz": in this STRUCTURE run we included all the data, so A. gazella, A. tropicalis and potential hybrids. We used a two population model (i.e. k = 2) to classify individuals that were admixed with at least 10% of the genetic attribution being to the secondary species (i.e. 0.10 q 0.9). - 1crunstructure: script to run STRUCTURE on the server - 1dparsestructure: parse and plot STRUCTURE output (fig S11) - 1eextract_hyb: get a list of hybrids identified as described above
(2) Folder "TropGaz": in this STRUCTURE run we included only A. gazella individuals - 3brunstructure: script to run STRUCTURE on the server - 3cparse_structure: parse and plot STRUCTURE output (input fig 1, fig S1, S2)
Coalescent-simulations and ABC analysis.
The scripts contained in the ABC folders were used to simulate genetic data under
a bottleneck and a neutral demographic scenario using strataG as an interface
to fastsimcoal2. These data
were then used for ABC analyses across genetic clusters/populations.
Prerequisites
(1) These script should be run on a multi-core machine, optimally using around 20 or more cores. However, everything can run quickly for testing purposes based on a small number of simulations (say 1000 instead of 20000000)
(2) Install fastsimcoal2 and check using the strataG vignette that the package can access fastsimcoal.
(3) Several other packages need to be installed, which are mentioned in the scripts.
Among them is a small package specifically written for the analysis of this paper,
the sealABC package, which can be install from GitHub with:
devtools::install_github("mastoffel/sealABC")
In addition, we slightly altered some specific functions of the package sealABC.
For this the script: mssumstatsAP is needed (in addition to the sealABC package). This script can be found in all folders where it is used.
Within the folder "ABC" the following scripts can be found:
(1) Folder "fsccluster2019": simulations compared to emperical data on the level of genetic clusters - aprepempdata5cluster: prepping emperical data - bsumstats5cluster: calculating summary statistics from emperical data on genetic cluster level - 1simulatediversity: simulates genetic diversity (nb the resulting file is not included on GitHub as the file size was too big. However it is available upon request) - 2ABCanalysis: ABC analysis part 1, Model selection and evaluation - 3ABCanalysisposteriordistributions: ABC analysis part 2, Parameter estimation - 4abcresults: save ABC estimates to RData file (nb not all resulting files are included on GitHub as the file size was too big. However they are available upon request) - 5cvevalplotsFigS8: cross-validation plots (fig S8) - 6oneplotFig3: creates Fig 3 - 7posteriorpredictivechecks: simulations for posterior predictive checks - 8posteriorpredictiveplotsFigS7: creates figure for posterior predictive checks (Fig S7) - 9simulatingdiversitydiffNeFig4a-c: simulations to calculate diversity loss using different Ne and plots (Fig 4a-c) - 10simulationsheatmap: simulations to calculate diversity loss using different Nebot size and duration - 11plottingheatmapFig4d: creates heatmap (Fig 4d) - SuppconfmatplotFigS6: confusion matrix plot (Fig S6) - Suppdensityplotnehist_FigS9: density plots for Nehist (Fig S9)
(2) Folder "fscpop2019": simulations compared to emperical data on the level of populations (locality) - aprepempdata8pop: prepping emperical data - bsumstats8pop: calculating summary statistics from emperical data on genetic cluster level - 1simulatediversity: simulates genetic diversity (nb the resulting file is not included on GitHub as the file size was too big. However it is available upon request) - 2ABCanalysis: ABC analysis part 1, Model selection and evaluation - 3ABCanalysisposteriordistributions: ABC analysis part 2, Parameter estimation - 4abcresults: save ABC estimates to RData file (nb not all resulting files are included on GitHub as the file size was too big. However they are available upon request) - 5cvevalplots: cross-validation plots - 6oneplot_FigS4: creates Fig S4
(3) Folder "sumstatsspecies": - sumstatsspecies: calculates summary statistics for other otariid species - comparespeciesbeeswarm_FigS5: creates plot showing allelic richness for all otariids (Fig S5)
The code is highly specific to the current analysis and probably has to be modified to be of use in other projects.
Owner
- Name: Anneke
- Login: apaijmans
- Kind: user
- Repositories: 1
- Profile: https://github.com/apaijmans
Conservation biologist working on population genetics in Antarctic fur seals
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1