SeleDiff
SeleDiff: A fast and scalable tool for estimating and testing selection differences between populations - Published in JOSS (2019)
✓JOSS paper metadata
Published in Journal of Open Source Software
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
SeleDiff
NOTE: This project is no longer actively maintained.
Introduction
SeleDiff implements a probabilistic method for estimating and testing selection (coefficient) differences between populations [1].
- If you have any problems, please feel free to contact xinhuang.res@gmail.com, or open an issue in this repository.
- If you would like to reproduce our simulation, please check the code in ./appendix.
- If you are interested in contributing to SeleDiff, please feel free to clone and modify it. You should include unit tests for your modified code. You can also edit build.gradle to include new dependencies. After your modification, please send a GitHub Pull Request with a clear list of what you've done.
- For more details, please see the manual in ./docs.
Installation
To install SeleDiff, first install Java SE Development Kit 8 or OpenJDK 8.
Linux/Mac
In Linux/Mac, you can open the terminal and clone SeleDiff using git:
> git clone https://github.com/xin-huang/SeleDiff
Then you can enter the SeleDiff directory and use gradlew to install SeleDiff:
> cd ./SeleDiff
> ./gradlew build
> ./gradlew install
The SeleDiff executable is in ./build/install/SeleDiff/bin/. You can add this directory to your PATH environment variable with:
> export PATH="/path/to/SeleDiff/build/install/SeleDiff/bin/":$PATH
You can get help information by typing:
> SeleDiff
You can use gradlew to remove SeleDiff:
> ./gradlew clean
Windows
In Windows, you can download the latest release. Please make sure your environment variable JAVA_HOME points to your JDK directory. After downloading and extracting the release, open cmd and enter the SeleDiff directory. Then use gradlew.bat to build and install SeleDiff.
> cd /path/to/SeleDiff
> gradlew.bat build
> gradlew.bat install
Then run SeleDiff.bat in ./build/install/SeleDiff/bin/:
> cd ./build/install/SeleDiff/bin/
> SeleDiff.bat
You can use gradlew.bat to remove SeleDiff:
> cd /path/to/SeleDiff
> gradlew.bat clean
Commands
SeleDiff contains two sub-commands:
- compute-var for estimating variances of Ω [1], which are required for the compute-diff command;
- compute-diff for estimating selection differences among loci.
Input Files
SeleDiff assumes bi-allelic genetic data and will not perform any checks on this assumption. All input files can be compressed by gzip.
EIGENSTRAT
SeleDiff accepts genetic data in EIGENSTRAT format as input. EIGENSOFT provides several functions to convert other formats to EIGENSTRAT format.
VCF
SeleDiff also accepts genetic data in VCF format as input, and assumes the genotypes of each individual are encoded with 0 and 1. Because the VCF format contains no population information for each individual, users should provide an additional file in EIGENSTRAT IND format.
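As a minimal sketch of preparing that additional file, the snippet below writes one line per sample in the usual EIGENSTRAT IND layout (sample ID, sex, population label). The sample IDs, the output file name, and the helper `write_ind_file` are hypothetical. Listing samples in the same order as the VCF sample columns is an assumption, not something this README states.

```python
from pathlib import Path


def write_ind_file(sample_to_pop, path, sex="U"):
    """Write 'sample sex population' lines (EIGENSTRAT IND layout).

    sex defaults to 'U' (unknown); sample_to_pop keeps insertion order.
    """
    lines = [f"{sample} {sex} {pop}" for sample, pop in sample_to_pop.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical sample IDs, for illustration only.
samples = {"sample1": "YRI", "sample2": "CEU", "sample3": "CHS"}
ind_lines = write_ind_file(samples, "example.vcf.ind")
print("\n".join(ind_lines))
```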
Var File
The Var file is the output file from the first sub-command compute-var, which stores variances of pairwise Ω.
Unlike He et al. (2015), SeleDiff does not divide Ω by generation times, in order to reduce floating-point rounding errors.
When estimating Ω, SeleDiff uses SNPs that are not fixed in any population.
When using the sub-command compute-diff to estimate selection differences, SeleDiff uses the --var option to accept a SPACE delimited file without a header that specifies the variances of Ω between populations.
YRI CEU 1.547660
YRI CHS 1.639591
CEU CHS 0.989241
The first two columns are the population IDs, and the third column is the variance of Ω between the two populations.
Divergence Time File
When using sub-command compute-diff to estimate selection differences, SeleDiff uses --time option to accept a SPACE delimited file without header that specifies divergence times between two populations.
YRI CEU 5000
YRI CHS 5000
CEU CHS 3000
The first two columns are the population IDs, and the third column is the divergence time between the two populations.
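The Var file and the divergence time file share the same three-column, space-delimited layout, so both can be checked with one small parser before running compute-diff. The sketch below is not part of SeleDiff; the function name `parse_pair_file` is hypothetical, and treating a population pair as an ordered tuple is an assumption.

```python
def parse_pair_file(text):
    """Parse 'POP1 POP2 VALUE' lines into {(POP1, POP2): float}."""
    values = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        pop1, pop2, value = fields
        values[(pop1, pop2)] = float(value)
    return values


var_text = """YRI CEU 1.547660
YRI CHS 1.639591
CEU CHS 0.989241
"""
time_text = """YRI CEU 5000
YRI CHS 5000
CEU CHS 3000
"""

variances = parse_pair_file(var_text)
times = parse_pair_file(time_text)

# Population pairs that have a variance but no divergence time.
missing = sorted(set(variances) - set(times))
print(missing)
```

An empty `missing` list means every pair in the Var file also has a divergence time.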
Output File
The output file from SeleDiff is TAB delimited. The first row is a header that describes the meaning of each column.
| Column | Column Name | Description |
| ------ | --------------------- | ----------------------------------- |
| 1 | SNP ID | The name of a SNP |
| 2 | Ref | The reference allele |
| 3 | Alt | The alternative allele |
| 4 | Population1 | The first population ID |
| 5 | Population2 | The second population ID |
| 6 | Selection difference | The selection difference between the first and second populations |
| 7 | Std | The standard deviation of the selection difference |
| 8 | Lower bound of 95% CI | Lower bound of 95% confidence interval of the selection difference |
| 9 | Upper bound of 95% CI | Upper bound of 95% confidence interval of the selection difference |
| 10 | Delta | The delta statistic for selection difference |
| 11 | p-value | The p-value of the delta statistic |
An Example
Here is an example to show how SeleDiff estimates and tests selection differences between populations. Four populations (YRI, CEU, CHB, CHD) from HapMap3 (release 3) were extracted. CHB and CHD were merged into one population called CHS. PLINK 1.7 was used to remove correlated individuals, SNPs with minor allele frequencies less than 0.05, and SNPs in strong linkage disequilibrium. These genome-wide data are stored in ./examples/data/example.geno and used for estimating variances of Ω.
Two alternative alleles (rs1800407 and rs12913832) associated with blue eyes were identified in the genes HERC2 and OCA2 [2]. These candidate data are stored in ./examples/data/example.candidates.geno and used for estimating selection differences of these SNPs between populations.
The allele counts in our example data are summarized below.
| SNP ID | Population | Reference Allele Count | Alternative Allele Count |
| ------ | --- | --- | --- |
| rs1800407 | YRI | 290 | 0 |
| rs1800407 | CEU | 207 | 17 |
| rs1800407 | CHS | 486 | 4 |
| rs12913832 | YRI | 294 | 0 |
| rs12913832 | CEU | 47 | 177 |
| rs12913832 | CHS | 491 | 1 |
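To make the contrast between populations easier to see, the counts above can be turned into alternative allele frequencies. This is a small illustrative sketch, not part of SeleDiff; the helper `alt_frequency` is hypothetical.

```python
# Allele counts from the table above: (reference count, alternative count).
counts = {
    ("rs1800407", "YRI"): (290, 0),
    ("rs1800407", "CEU"): (207, 17),
    ("rs1800407", "CHS"): (486, 4),
    ("rs12913832", "YRI"): (294, 0),
    ("rs12913832", "CEU"): (47, 177),
    ("rs12913832", "CHS"): (491, 1),
}


def alt_frequency(ref_count, alt_count):
    """Return the alternative allele frequency for one SNP in one population."""
    return alt_count / (ref_count + alt_count)


freqs = {key: alt_frequency(*value) for key, value in counts.items()}
for (snp, pop), freq in freqs.items():
    print(f"{snp}\t{pop}\t{freq:.4f}")
```

The alternative allele of rs12913832 is the major allele in CEU but is rare or absent in YRI and CHS.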
We assume the divergence times of YRI-CEU and YRI-CHS are both 5000 generations, while the divergence time of CEU-CHS is 3000 generations. This information is stored in ./examples/data/example.time.
First, we estimate variances of Ω using sub-command compute-var:
> SeleDiff compute-var --geno ./examples/data/example.geno \
--ind ./examples/data/example.ind \
--snp ./examples/data/example.snp \
--output ./examples/results/example.geno.var
To estimate selection differences of candidates, we use the sub-command compute-diff:
> SeleDiff compute-diff --geno ./examples/data/example.candidates.geno \
--ind ./examples/data/example.candidates.ind \
--snp ./examples/data/example.candidates.snp \
--var ./examples/results/example.geno.var \
--time ./examples/data/example.time \
--output ./examples/results/example.candidates.geno.results
The results are stored in ./examples/results/example.candidates.geno.results. The main results are shown below.
| SNP ID | Population1 | Population2 | Selection difference | Std | Delta | p-value |
| ------ | ------------ | ------------ | -------------- | --------- | --------- | -------- |
| rs1800407 | YRI | CEU | -0.000773 | 0.000380 | 4.129 | 0.042154 |
| rs1800407 | YRI | CHS | -0.000336 | 0.000393 | 0.731 | 0.392559 |
| rs1800407 | CEU | CHS | 0.000728 | 0.000377 | 3.730 | 0.053443 |
| rs12913832 | YRI | CEU | -0.001541 | 0.000378 | 16.583 | 0.000047 |
| rs12913832 | YRI | CHS | -0.000117 | 0.000415 | 0.080 | 0.777297 |
| rs12913832 | CEU | CHS | 0.002372 | 0.000433 | 30.062 | 0.000000 |
From the results, we can see that the selection coefficient of rs12913832 in CEU is significantly larger than that in YRI or CHS, which indicates that rs12913832 is under directional selection in CEU. In contrast, the selection coefficient of rs1800407 in CEU is only marginally significantly larger than that in YRI or CHS.
Please refer to our previous study [1] for a more comprehensive working example using the HapMap3 dataset.
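The columns of the result table can be cross-checked by hand. The sketch below assumes that delta is the squared ratio of the selection difference to its standard deviation and that its p-value comes from a chi-square distribution with 1 degree of freedom. This agrees with the table above up to rounding, but it is an assumption rather than something stated in this README. The normal-approximation confidence interval and the helper names are likewise hypothetical.

```python
import math


def chi2_sf_1df(delta):
    """Upper-tail probability of a chi-square(1) variable at delta."""
    return math.erfc(math.sqrt(delta / 2.0))


def wald_summary(diff, std, z=1.959964):
    """Return (delta, p-value, 95% CI) under the assumptions stated above."""
    delta = (diff / std) ** 2
    p_value = chi2_sf_1df(delta)
    ci = (diff - z * std, diff + z * std)
    return delta, p_value, ci


# rs12913832, YRI vs CEU, using the rounded values from the table.
delta, p_value, ci = wald_summary(-0.001541, 0.000378)
print(f"delta={delta:.3f} p={p_value:.6f} ci=({ci[0]:.6f}, {ci[1]:.6f})")

# p-value recomputed from the reported delta of rs1800407, YRI vs CEU.
print(f"{chi2_sf_1df(4.129):.6f}")
```

Because the table values are rounded, the recomputed delta differs slightly from the reported one.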
Dependencies
References
Owner
- Name: Xin Huang
- Login: xin-huang
- Kind: user
- Location: Vienna
- Company: University of Vienna
- Website: xin-huang.github.io
- Repositories: 4
- Profile: https://github.com/xin-huang
JOSS Publication
SeleDiff: A fast and scalable tool for estimating and testing selection differences between populations
Authors
Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200031, China, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200031, China, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200433, China
Tags
Population genetics, Natural selection, Selective pressures
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| xin-huang | x****7@g****m | 260 |
| Xin Huang | x****g | 117 |
| Daniel S. Katz | d****z@i****g | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: 38 minutes
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.17
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- xin-huang (5)
- danielskatz (1)