afs_genetic_legacy

R code and data for our paper: "The genetic legacy of extreme exploitation in a polar vertebrate"

https://github.com/apaijmans/afs_genetic_legacy

Repository

Basic Info

Host: GitHub
Owner: apaijmans
Language: R
Default Branch: master
Size: 47.5 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created over 6 years ago · Last pushed over 3 years ago

Analysis workflow for "The genetic legacy of extreme exploitation in a polar vertebrate", Paijmans et al. Sci Rep 10, 5089 (2020)

Overview

This is the workflow used in the paper "The genetic legacy of extreme exploitation in a polar vertebrate". The raw microsatellite data are available via the Zenodo repository:

To run the analyses, please download the complete folder. The folder named "Rcode" contains 14 scripts, numbered 1_ to 10_. These scripts can be run on a standard desktop machine.

Some of the analyses are computationally intensive. This is the case for the STRUCTURE analyses (scripts and data can be found in the folder "STRUCTURE") and the bottleneck analyses (scripts and data can be found in the folder "ABC"). The scripts in these folders should run on a server with sufficient computing power and memory.

Most datasets which are produced along the way are already saved in subfolders, so that the analysis can be started at any point. In some cases the file size was too large and the file could not be included here. These files are available upon request.

In some cases, data needed to be transformed from STRUCTURE files to genepop files. This was done using PGD Spider and the .spid file is also given in this repository.

Finally, .bat files were used for mass conversion of files (see also script 9aBOTTLENECKin). These .bat files are also included in this repository.

Basic statistics and PCA

Within the folder "Rcode" the following scripts can be found:

1aprepdata_ALL: prepping data for STRUCTURE
1bhweld_ALL: test for HWE and LD, removing loci that deviate from HWE or are in LD
1fpcaALL: PCA (fig S12a & b)
2removehybs: removing hybrids and A. tropicalis found using STRUCTURE
3aprepdata_GAZ: prepping data for STRUCTURE, A. gazella only
4_stats: calculate # private alleles, Ar, Ho, Fis (input for table 1 and table S1)
5a_hwe: test for HWE (input for table S7)
5bldtest: test for LD
6FstGAZ: calculate Fst (input for table S2)
7pcaGAZ: PCA (fig S3a & b)
8mratioGAZ: calculate M ratio (input for table 1 and table S1)
9aBOTTLENECKin: creates 1000 subsets of randomly drawn individuals for analysis with BOTTLENECK software
9bBOTTLENECKout: reads 1000 results of BOTTLENECK software and gets prop het ex values (input for fig 3 and fig S4)
10sealingeffort: sealing effort (fig 1)
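
As an illustration of one of the statistics listed above, the M ratio of Garza and Williamson (2001) divides the number of alleles at a locus by the number of possible allele states spanned by its size range. The following is a minimal sketch in base R for a single locus, not the code from script 8mratioGAZ; the function name `m_ratio`, its arguments and the example allele sizes are hypothetical.

```r
# Minimal sketch of the M ratio for one microsatellite locus.
# allele_sizes:  allele lengths in base pairs, one entry per gene copy
# repeat_length: length of the repeat motif in base pairs
# (both names are hypothetical and not taken from the repository)
m_ratio <- function(allele_sizes, repeat_length) {
  sizes <- allele_sizes[!is.na(allele_sizes)]
  k <- length(unique(sizes))                      # number of alleles observed
  r <- (max(sizes) - min(sizes)) / repeat_length  # size range in repeat units
  k / (r + 1)                                     # alleles per possible state
}

# Invented dinucleotide locus with alleles 100, 102 and 108:
# k = 3, r = 4, so M = 3 / 5
m_ratio(c(100, 100, 102, 108, 108, 102), repeat_length = 2)
```

A locus with no gaps in its allele size distribution gives M = 1; lower values indicate missing intermediate alleles, as expected after a bottleneck.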

STRUCTURE analysis

Within the folder "STRUCTURE" the following scripts can be found:

(1) Folder "TropGaz": in this STRUCTURE run we included all the data, i.e. A. gazella, A. tropicalis and potential hybrids. We used a two-population model (i.e. k = 2) and classified individuals as admixed when at least 10% of their genetic attribution was to the secondary species (i.e. 0.10 ≤ q ≤ 0.90).
1crunstructure: script to run STRUCTURE on the server
1dparsestructure: parse and plot STRUCTURE output (fig S11)
1eextract_hyb: get a list of hybrids identified as described above

(2) Folder "TropGaz": in this STRUCTURE run we included only A. gazella individuals.
3brunstructure: script to run STRUCTURE on the server
3cparse_structure: parse and plot STRUCTURE output (input fig 1, fig S1, S2)
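
The admixture rule described for the first STRUCTURE run (at least 10% of the genetic attribution going to the secondary species) could be applied to a table of membership coefficients roughly as follows. This is a sketch, not the code in 1eextract_hyb: the function `classify_admixture`, the data frame `q_example` and its values are invented for illustration, and treating both bounds as inclusive is an assumption.

```r
# q: membership coefficient of each individual for cluster 1 under k = 2.
# Individuals with q between the two bounds are flagged as admixed.
# Bounds are treated as inclusive here, which is an assumption.
classify_admixture <- function(q, lower = 0.10, upper = 0.90) {
  ifelse(q >= lower & q <= upper, "admixed", "pure")
}

# Invented example data; real values would come from the STRUCTURE output.
q_example <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4"),
  q  = c(0.02, 0.10, 0.55, 0.97)
)
q_example$class <- classify_admixture(q_example$q)
q_example
```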

Coalescent simulations and ABC analysis

The scripts contained in the ABC folders were used to simulate genetic data under a bottleneck and a neutral demographic scenario using strataG as an interface to fastsimcoal2. These data were then used for ABC analyses across genetic clusters/populations.
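
To make the logic of ABC model selection concrete, here is a toy rejection-sampling sketch in base R. It does not use strataG, fastsimcoal2 or sealABC, and it is not the authors' pipeline: the simulator `simulate_stat`, the prior, the single summary statistic and the observed value are all invented stand-ins for the coalescent simulations and the empirical summary statistics.

```r
set.seed(1)

# Toy simulator standing in for a coalescent simulation. It returns one
# summary statistic per run (think "mean number of alleles").
# Models, prior and numbers are invented for illustration only.
simulate_stat <- function(model) {
  if (model == "bottleneck") {
    ne_bot <- runif(1, 1, 500)                 # prior on bottleneck size
    rnorm(1, mean = 4 + ne_bot / 100, sd = 1)  # smaller Ne, lower diversity
  } else {
    rnorm(1, mean = 10, sd = 1)                # neutral scenario
  }
}

n_sim  <- 1000   # small number of simulations, for testing only
models <- sample(c("bottleneck", "neutral"), n_sim, replace = TRUE)
sim_stats <- vapply(models, simulate_stat, numeric(1))

observed  <- 6      # hypothetical empirical summary statistic
tolerance <- 0.05   # keep the 5% of simulations closest to the data

distance <- abs(sim_stats - observed)
accepted <- distance <= quantile(distance, tolerance)

# Approximate posterior model probabilities: the share of each model
# among the accepted simulations.
post_prob <- c(
  bottleneck = mean(models[accepted] == "bottleneck"),
  neutral    = mean(models[accepted] == "neutral")
)
post_prob
```

In a real analysis there are several summary statistics, so the distance is computed on standardized statistics, and cross-validation is used to check how reliably the models can be told apart.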

Prerequisites

(1) These scripts should be run on a multi-core machine, ideally using around 20 or more cores. However, everything can be run quickly for testing purposes with a small number of simulations (say 1000 instead of 20000000).

(2) Install fastsimcoal2 and check using the strataG vignette that the package can access fastsimcoal.

(3) Several other packages need to be installed, which are mentioned in the scripts. Among them is a small package written specifically for the analyses in this paper, the sealABC package, which can be installed from GitHub with:

devtools::install_github("mastoffel/sealABC")

In addition, we slightly altered some specific functions of the sealABC package. The altered functions are in the script mssumstatsAP, which is needed in addition to the sealABC package. This script can be found in all folders where it is used.
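
Before starting the ABC scripts, it may help to check that the prerequisites named above are in place. The sketch below only covers the packages mentioned in this README (the scripts list further dependencies), and the executable name "fsc26" is an assumption that depends on the installed fastsimcoal2 version; the strataG vignette remains the authoritative check.

```r
# Packages named in this README; the scripts mention additional ones.
required_pkgs <- c("devtools", "strataG", "sealABC")

# TRUE for every package that is installed and can be loaded.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Assumption: the fastsimcoal2 binary is called "fsc26". Adjust the name
# to match the version on your system.
fsc_path <- Sys.which("fsc26")
if (!nzchar(fsc_path)) {
  message("fastsimcoal2 executable not found on the PATH")
}
```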

Within the folder "ABC" the following scripts can be found:

(1) Folder "fsccluster2019": simulations compared to empirical data on the level of genetic clusters
aprepempdata5cluster: prepping empirical data
bsumstats5cluster: calculating summary statistics from empirical data on genetic cluster level
1simulatediversity: simulates genetic diversity (NB: the resulting file is not included on GitHub because it is too large, but it is available upon request)
2ABCanalysis: ABC analysis part 1, model selection and evaluation
3ABCanalysisposteriordistributions: ABC analysis part 2, parameter estimation
4abcresults: save ABC estimates to RData file (NB: not all resulting files are included on GitHub because they are too large, but they are available upon request)
5cvevalplotsFigS8: cross-validation plots (fig S8)
6oneplotFig3: creates fig 3
7posteriorpredictivechecks: simulations for posterior predictive checks
8posteriorpredictiveplotsFigS7: creates figure for posterior predictive checks (fig S7)
9simulatingdiversitydiffNeFig4a-c: simulations to calculate diversity loss using different Ne and plots (fig 4a-c)
10simulationsheatmap: simulations to calculate diversity loss using different Nebot size and duration
11plottingheatmapFig4d: creates heatmap (fig 4d)
SuppconfmatplotFigS6: confusion matrix plot (fig S6)
Suppdensityplotnehist_FigS9: density plots for Nehist (fig S9)

(2) Folder "fscpop2019": simulations compared to empirical data on the level of populations (locality)
aprepempdata8pop: prepping empirical data
bsumstats8pop: calculating summary statistics from empirical data on genetic cluster level
1simulatediversity: simulates genetic diversity (NB: the resulting file is not included on GitHub because it is too large, but it is available upon request)
2ABCanalysis: ABC analysis part 1, model selection and evaluation
3ABCanalysisposteriordistributions: ABC analysis part 2, parameter estimation
4abcresults: save ABC estimates to RData file (NB: not all resulting files are included on GitHub because they are too large, but they are available upon request)
5cvevalplots: cross-validation plots
6oneplot_FigS4: creates fig S4

(3) Folder "sumstatsspecies":
sumstatsspecies: calculates summary statistics for other otariid species
comparespeciesbeeswarm_FigS5: creates plot showing allelic richness for all otariids (fig S5)
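
Allelic richness depends on sample size, so comparisons between groups with different numbers of samples usually standardize it. One common way to do this is rarefaction, sketched below in base R for a single locus. This is a generic textbook formula and not necessarily how sumstatsspecies or sealABC compute the statistic; the function name `allelic_richness` and the example alleles are hypothetical.

```r
# Rarefied allelic richness at one locus: the expected number of distinct
# alleles in a random subsample of n gene copies.
# alleles: vector of allele labels, one entry per gene copy
# n:       subsample size in gene copies (must not exceed the sample size)
allelic_richness <- function(alleles, n) {
  counts <- as.vector(table(alleles[!is.na(alleles)]))  # copies per allele
  N <- sum(counts)                                      # total gene copies
  stopifnot(n <= N)
  # For each allele: 1 minus the probability that it is absent from the
  # subsample, summed over all alleles.
  sum(1 - choose(N - counts, n) / choose(N, n))
}

# Invented example with three alleles among six gene copies
allelic_richness(c(1, 1, 1, 2, 2, 3), n = 4)
```

Setting n to the full sample size returns the observed number of alleles, and n = 1 always returns 1.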

The code is highly specific to the current analysis and probably has to be modified to be of use in other projects.

Owner

Name: Anneke
Login: apaijmans
Kind: user

Repositories: 1
Profile: https://github.com/apaijmans

Conservation biologist working on population genetics in Antarctic fur seals
