SeleDiff

SeleDiff: A fast and scalable tool for estimating and testing selection differences between populations - Published in JOSS (2019)

https://github.com/xin-huang/selediff

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

natural-selection population-genomics selective-pressures

Repository

A fast and scalable tool for estimating and testing selection differences between populations

Basic Info
  • Host: GitHub
  • Owner: xin-huang
  • License: apache-2.0
  • Language: Java
  • Default Branch: master
  • Homepage:
  • Size: 141 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Archived
Created over 10 years ago · Last pushed over 4 years ago
Metadata Files
Readme License

README.md

SeleDiff

NOTE: This project is no longer actively maintained.

Introduction

  • SeleDiff implements a probabilistic method for estimating and testing selection (coefficient) differences between populations [1].
  • If you have any problems, please feel free to contact xinhuang.res@gmail.com, or open an issue in this repository.
  • If you would like to reproduce our simulation, please check the code in ./appendix.
  • If you are interested in contributing to SeleDiff, please feel free to clone and modify it. You should include unit tests for your modified code. You can also edit build.gradle to add new dependencies. After your modification, please send a GitHub Pull Request with a clear list of what you've done.
  • For more details, please see the manual in ./docs.

Installation

To install SeleDiff, you should first install Java SE Development Kit 8 or OpenJDK8.

Linux/Mac

In Linux/Mac, you can open the terminal and clone SeleDiff using git:

> git clone https://github.com/xin-huang/SeleDiff

Then you can enter the SeleDiff directory and use gradlew to install SeleDiff:

> cd ./SeleDiff
> ./gradlew build
> ./gradlew install

The runnable SeleDiff is in ./build/install/SeleDiff/bin/. You can add this directory to your PATH environment variable with:

> export PATH="/path/to/SeleDiff/build/install/SeleDiff/bin/":$PATH

You can get help information by typing:

> SeleDiff

You can use gradlew to remove SeleDiff:

> ./gradlew clean

Windows

In Windows, you can download the latest release. Please make sure your environment variable JAVA_HOME correctly points to your JDK directory. After downloading and decompressing the release, open cmd and enter the SeleDiff directory. Then use gradlew.bat to build and install SeleDiff.

> cd /path/to/SeleDiff
> gradlew.bat build
> gradlew.bat install

And run SeleDiff.bat in ./build/install/SeleDiff/bin/:

> cd build\install\SeleDiff\bin
> SeleDiff.bat

You can use gradlew.bat to remove SeleDiff:

> cd /path/to/SeleDiff
> gradlew.bat clean

Commands

SeleDiff contains two sub-commands:

  • compute-var for estimating variances of Ω [1], which are required for the compute-diff command;
  • compute-diff for estimating selection differences among loci.

Input Files

SeleDiff assumes bi-allelic genetic data and will not perform any checks on this assumption. All input files can be compressed by gzip.

EIGENSTRAT

SeleDiff accepts genetic data in EIGENSTRAT format as input. EIGENSOFT provides several functions to convert other formats to EIGENSTRAT format.

VCF

SeleDiff also accepts genetic data in VCF format as input, and assumes that the genotypes of each individual are encoded with 0 and 1. Because the VCF format contains no population information for each individual, users should provide an additional file following the EIGENSTRAT IND format.
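As a rough illustration of the additional file, the sketch below writes lines in the standard EIGENSTRAT IND layout (sample ID, sex as M, F, or U, and population label). The sample IDs and the helper name `make_ind_lines` are hypothetical, and whether SeleDiff matches samples to the VCF by ID or by order is not stated here, so keeping the same order as the VCF sample columns is only a cautious assumption.

```python
from typing import Dict, List


def make_ind_lines(sample_to_population: Dict[str, str], sex: str = "U") -> List[str]:
    """Return one 'SAMPLE SEX POPULATION' line per sample, in insertion order."""
    if sex not in {"M", "F", "U"}:
        raise ValueError("sex must be 'M', 'F', or 'U'")
    return [
        f"{sample} {sex} {population}"
        for sample, population in sample_to_population.items()
    ]


# Hypothetical sample IDs (assumption): replace them with the IDs from your VCF header.
samples = {"sample1": "YRI", "sample2": "CEU", "sample3": "CHS"}
for line in make_ind_lines(samples):
    print(line)
```

The printed lines can be redirected to a file and passed with the --ind option alongside the VCF input.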

Var File

The Var file is the output file from the first sub-command compute-var, which stores variances of pairwise Ω. To reduce floating-point rounding errors, SeleDiff does not divide Ω by generation times as He et al. (2015) do. When estimating Ω, SeleDiff uses SNPs that are not fixed in any population. When using the sub-command compute-diff to estimate selection differences, SeleDiff uses the --var option to accept a SPACE delimited file without header that specifies variances of Ω between populations.

    YRI CEU 1.547660
    YRI CHS 1.639591
    CEU CHS 0.989241

The first two columns are the population IDs, and the third column is the variance of Ω between the two populations.

Divergence Time File

When using sub-command compute-diff to estimate selection differences, SeleDiff uses --time option to accept a SPACE delimited file without header that specifies divergence times between two populations.

    YRI CEU 5000
    YRI CHS 5000
    CEU CHS 3000

The first two columns are the population IDs, and the third column is the divergence time between the two populations.
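The Var file and the divergence time file share the same three-column layout, so a quick consistency check before running compute-diff can be sketched as follows. The helper name `parse_pairwise` is hypothetical, and treating a population pair as unordered is an assumption of this sketch rather than documented SeleDiff behavior.

```python
from typing import Dict, FrozenSet


def parse_pairwise(text: str) -> Dict[FrozenSet[str], float]:
    """Parse 'POP1 POP2 VALUE' lines into a mapping keyed by the population pair."""
    table: Dict[FrozenSet[str], float] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        pop1, pop2, value = fields
        # Assumption: the order of the two population IDs does not matter.
        table[frozenset((pop1, pop2))] = float(value)
    return table


var_text = """YRI CEU 1.547660
YRI CHS 1.639591
CEU CHS 0.989241
"""
time_text = """YRI CEU 5000
YRI CHS 5000
CEU CHS 3000
"""

variances = parse_pairwise(var_text)
times = parse_pairwise(time_text)

# Population pairs that have a variance but no divergence time.
missing = sorted(sorted(pair) for pair in variances.keys() - times.keys())
print("pairs without a divergence time:", missing)
```

In practice the two strings would be read from the files passed to --var and --time.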

Output File

The output file from SeleDiff is TAB delimited. The first row is a header that describes the meaning of each column.

| Column | Column Name | Description |
| ------ | --------------------- | ----------------------------------- |
| 1 | SNP ID | The name of a SNP |
| 2 | Ref | The reference allele |
| 3 | Alt | The alternative allele |
| 4 | Population1 | The first population ID |
| 5 | Population2 | The second population ID |
| 6 | Selection difference | The selection difference between the first and second populations |
| 7 | Std | The standard deviation of the selection difference |
| 8 | Lower bound of 95% CI | Lower bound of 95% confidence interval of the selection difference |
| 9 | Upper bound of 95% CI | Upper bound of 95% confidence interval of the selection difference |
| 10 | Delta | The delta statistic for selection difference |
| 11 | p-value | The p-value of the delta statistic |

An Example

Here is an example to show how SeleDiff estimates and tests selection differences between populations. Four populations (YRI, CEU, CHB, CHD) from HapMap3 (release3) were extracted. CHB and CHD were merged into one population called CHS. PLINK 1.7 was used to remove correlated individuals and SNPs with minor allele frequencies less than 0.05 and strong linkage disequilibrium. These genome-wide data are stored in ./examples/data/example.geno and used for estimating variances of Ω.

Two alternative alleles (rs1800407 and rs12913832) associated with blue eyes were identified in genes HERC2 and OCA2 [2]. These candidate data are stored in ./examples/data/example.candidates.geno and used for estimating selection differences of these SNPs between populations.

The allele counts in our example data are summarized below.

| SNP ID | Population | Reference Allele Count | Alternative Allele Count |
| ------ | --- | --- | --- |
| rs1800407 | YRI | 290 | 0 |
| rs1800407 | CEU | 207 | 17 |
| rs1800407 | CHS | 486 | 4 |
| rs12913832 | YRI | 294 | 0 |
| rs12913832 | CEU | 47 | 177 |
| rs12913832 | CHS | 491 | 1 |
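To make the contrast between populations easier to see, the counts above can be turned into alternative allele frequencies. This is only a small arithmetic sketch; the function name `alt_frequency` is hypothetical and is not part of SeleDiff.

```python
def alt_frequency(ref_count: int, alt_count: int) -> float:
    """Alternative allele frequency from reference and alternative allele counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no alleles observed")
    return alt_count / total


# (SNP ID, population, reference allele count, alternative allele count)
counts = [
    ("rs1800407", "YRI", 290, 0),
    ("rs1800407", "CEU", 207, 17),
    ("rs1800407", "CHS", 486, 4),
    ("rs12913832", "YRI", 294, 0),
    ("rs12913832", "CEU", 47, 177),
    ("rs12913832", "CHS", 491, 1),
]

frequencies = {
    (snp, pop): alt_frequency(ref, alt) for snp, pop, ref, alt in counts
}
for (snp, pop), freq in frequencies.items():
    print(f"{snp}\t{pop}\t{freq:.4f}")
```

Both alternative alleles are absent in YRI and rare in CHS in this sample, while rs12913832 reaches a frequency of 177/224 in CEU.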

We assume the divergence times of YRI-CEU and YRI-CHS are both 5000 generations, while the divergence time of CEU-CHS is 3000 generations. This information is stored in ./examples/data/example.time.

First, we estimate variances of Ω using sub-command compute-var:

> SeleDiff compute-var --geno ./examples/data/example.geno \
                       --ind ./examples/data/example.ind \
                       --snp ./examples/data/example.snp \
                       --output ./examples/results/example.geno.var

To estimate selection differences of candidates, we use the sub-command compute-diff:

> SeleDiff compute-diff --geno ./examples/data/example.candidates.geno \
                        --ind ./examples/data/example.candidates.ind \
                        --snp ./examples/data/example.candidates.snp \
                        --var ./examples/results/example.geno.var \
                        --time ./examples/data/example.time \
                        --output ./examples/results/example.candidates.geno.results

The result is stored in ./examples/results/example.candidates.geno.results. The main results are shown below.

| SNP ID | Population1 | Population2 | Selection difference | Std | delta | p-value |
| ------ | ------------ | ------------ | -------------- | --------- | --------- | -------- |
| rs1800407 | YRI | CEU | -0.000773 | 0.000380 | 4.129 | 0.042154 |
| rs1800407 | YRI | CHS | -0.000336 | 0.000393 | 0.731 | 0.392559 |
| rs1800407 | CEU | CHS | 0.000728 | 0.000377 | 3.730 | 0.053443 |
| rs12913832 | YRI | CEU | -0.001541 | 0.000378 | 16.583 | 0.000047 |
| rs12913832 | YRI | CHS | -0.000117 | 0.000415 | 0.080 | 0.777297 |
| rs12913832 | CEU | CHS | 0.002372 | 0.000433 | 30.062 | 0.000000 |

From the result, we can see that the selection coefficient of rs12913832 in CEU is significantly larger than that in YRI or CHS, which indicates that rs12913832 is under directional selection in CEU. The selection coefficient of rs1800407 in CEU is marginally significantly larger than that in YRI or CHS.
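As a sanity check on the table above, the reported p-values can be compared with the upper tail of a chi-square distribution with one degree of freedom evaluated at the reported delta. This README does not state the reference distribution, so that choice is an assumption inferred from the numbers, and the helper name `chi2_sf_1df` is hypothetical. Because delta is rounded to three decimals, only approximate agreement should be expected.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("x must be non-negative")
    # For one degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# (SNP ID, Population1, Population2, delta, reported p-value), copied from the table above.
rows = [
    ("rs1800407", "YRI", "CEU", 4.129, 0.042154),
    ("rs1800407", "YRI", "CHS", 0.731, 0.392559),
    ("rs1800407", "CEU", "CHS", 3.730, 0.053443),
    ("rs12913832", "YRI", "CEU", 16.583, 0.000047),
    ("rs12913832", "YRI", "CHS", 0.080, 0.777297),
    ("rs12913832", "CEU", "CHS", 30.062, 0.000000),
]

for snp, pop1, pop2, delta, reported in rows:
    recomputed = chi2_sf_1df(delta)
    print(f"{snp}\t{pop1}-{pop2}\treported={reported:.6f}\trecomputed={recomputed:.6f}")
```

A negative selection difference in a Population1-Population2 row means the estimated coefficient is larger in Population2, which is why the YRI-CEU rows are negative while the CEU-CHS rows are positive.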

Please refer to our previous study [1] for a more comprehensive working example using the HapMap3 dataset.

References

  1. He et al., Genome Res, 2015.
  2. Sturm et al., Am J Hum Genet, 2008.

Owner

  • Name: Xin Huang
  • Login: xin-huang
  • Kind: user
  • Location: Vienna
  • Company: University of Vienna

JOSS Publication

SeleDiff: A fast and scalable tool for estimating and testing selection differences between populations
Published
July 18, 2019
Volume 4, Issue 39, Page 1545
Authors
Xin Huang ORCID
Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200031, China, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Li Jin
Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200031, China, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200433, China
Yungang He ORCID
Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, 200032, China
Editor
Lorena Pantano ORCID
Tags
Population genetics Natural selection Selective pressures

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 378
  • Total Committers: 3
  • Avg Commits per committer: 126.0
  • Development Distribution Score (DDS): 0.312
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
xin-huang x****7@g****m 260
Xin Huang x****g 117
Daniel S. Katz d****z@i****g 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: 38 minutes
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.17
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • xin-huang (5)
  • danielskatz (1)