ipADMIXTURE

A data clustering package based on admixture ratios (Q matrix) of population structure analysis.

https://github.com/darkeyes/ipadmixture

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary

Keywords

admixture bioinformatics data-clustering-algorithm population-stratification population-structure r

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: DarkEyes
License: gpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 2.67 MB

Statistics

Stars: 5
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created almost 6 years ago · Last pushed 10 months ago

Metadata Files

Readme License

ipADMIXTURE: Iterative Pruning Population Admixture Inference Framework

A data clustering package based on admixture ratios (Q matrix) of population structure.

The framework is based on an iterative pruning procedure that clusters data by recursively splitting a given population into subclusters until a stopping criterion is met, in the same way as the ipPCA, iNJclust, and IPCAPS frameworks.
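The pruning loop described above can be sketched in base R as follows. This is a hypothetical illustration, not the ipADMIXTURE implementation: the split rule (two-center k-means via `stats::kmeans`) is a stand-in, and the stopping rule assumes that a cluster is final when its magnitude difference (taken here as the largest range of any single ancestor's admixture ratio among its members) falls below a threshold `ths`, or when it has fewer than `minSize` members. The names `magnitudeDiff`, `iterativePruning`, `ths`, and `minSize` are all illustrative.

```r
# Hypothetical sketch of iterative pruning on a Q matrix
# (rows = individuals, columns = ancestors). Not the ipADMIXTURE code.

# Assumed magnitude difference: largest range of any ancestor's admixture ratio
magnitudeDiff <- function(Q) {
  max(apply(Q, 2, function(col) max(col) - min(col)))
}

iterativePruning <- function(Q, ths = 0.15, minSize = 5) {
  labels <- integer(nrow(Q))         # final leaf-cluster ID for each individual
  nextID <- 0L
  pending <- list(seq_len(nrow(Q)))  # index sets still to be examined
  while (length(pending) > 0) {
    idx <- pending[[1]]
    pending <- pending[-1]
    sub <- Q[idx, , drop = FALSE]
    # Assumed stopping criteria: too small, homogeneous enough, or nothing to split
    if (length(idx) < minSize || magnitudeDiff(sub) < ths ||
        nrow(unique(sub)) < 2) {
      nextID <- nextID + 1L
      labels[idx] <- nextID          # this index set becomes a leaf cluster
      next
    }
    # Stand-in split rule: two-center k-means on the admixture ratios
    km <- stats::kmeans(sub, centers = 2, nstart = 5)
    pending <- c(pending, unname(split(idx, km$cluster)))
  }
  labels
}
```

The function returns one leaf-cluster label per row of `Q`. Working through a queue of index sets instead of recursing keeps the sketch short, and termination is easy to see: every split yields two non-empty, strictly smaller sets.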

The package also provides a function to infer a phylogenetic tree of clusters: it constructs a neighbor-joining tree from a similarity matrix between clusters.

Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity between clusters i and j as the minimum number K* at which the majority of members of the two clusters fall into different clusters. K* reflects the minimum number of ancestors needed to split clusters i and j apart when individuals are assigned to K* clusters by their maximum admixture ratio.
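The K* definition above can be illustrated with a small base-R sketch. It is not the package's `getPhyloTree` code, and it rests on one assumption about "majority": each cluster is represented by the ancestor that most of its members are hard-assigned to (a plurality rule), and clusters i and j are treated as separated at the smallest K where those representative ancestors differ. The helpers `majorityAncestor` and `kStarMatrix` are hypothetical names.

```r
# Hypothetical sketch of the K* similarity between clusters.

# Ancestor that most members of a cluster are hard-assigned to (plurality rule)
majorityAncestor <- function(Q, members) {
  # hard-assign each member to the ancestor with its largest admixture ratio
  assigned <- max.col(Q[members, , drop = FALSE], ties.method = "first")
  as.integer(names(which.max(table(assigned))))
}

# Qlist: list of Q matrices with different K; clusterIDs: one label per row
kStarMatrix <- function(Qlist, clusterIDs) {
  ids <- sort(unique(clusterIDs))
  n <- length(ids)
  Ks <- vapply(Qlist, ncol, integer(1))
  # NA = not separated at any K in Qlist (the diagonal is also left NA)
  Kstar <- matrix(NA_integer_, n, n,
                  dimnames = list(as.character(ids), as.character(ids)))
  if (n < 2) return(Kstar)
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      for (q in order(Ks)) {            # try K from smallest to largest
        ma <- majorityAncestor(Qlist[[q]], clusterIDs == ids[a])
        mb <- majorityAncestor(Qlist[[q]], clusterIDs == ids[b])
        if (ma != mb) {
          Kstar[a, b] <- Kstar[b, a] <- Ks[q]
          break
        }
      }
    }
  }
  Kstar
}

# Toy example: 6 individuals in 3 clusters, Q matrices for K = 2 and K = 3
Q2 <- rbind(c(.9, .1), c(.8, .2), c(.85, .15), c(.7, .3), c(.1, .9), c(.2, .8))
Q3 <- rbind(c(.9, .05, .05), c(.8, .1, .1), c(.1, .8, .1),
            c(.2, .7, .1), c(.05, .05, .9), c(.1, .1, .8))
toyK <- kStarMatrix(list(Q2, Q3), c(1, 1, 2, 2, 3, 3))
# clusters 1 and 2 first separate at K = 3; cluster 3 separates from both at K = 2
```

A larger K* means two clusters stay together longer, so they are more similar. Such a matrix could then be turned into a distance and passed to a neighbor-joining routine such as `ape::nj`; how ipADMIXTURE itself performs that conversion is not described here.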

Installation

You can install our package from CRAN.

```r
install.packages("ipADMIXTURE")
```

For the newest version on GitHub, run the following command in an R console.

```r
remotes::install_github("DarkEyes/ipADMIXTURE")
```

This requires the "remotes" package to be installed before installing ipADMIXTURE, for example with install.packages("remotes").

EXAMPLE

In this example, we use the human 27-population dataset published by Xing, J., et al. (2009). The dataset consists of 554 individuals from 27 populations. The Q matrices for this dataset are provided in this package. The following steps show a simple way to use the package.

Step 1: Run ipADMIXTURE on the human 27-population dataset with K = 12 ancestors.

```{r}
library(ipADMIXTURE)

# ipADMIXTURE::human27pop_Qmat[[i]] is a Q matrix with K = i + 1, so [[11]] has K = 12
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
```

Step 2: Print all cluster information in text mode.

```{r}
ipADMIXTURE::printClustersFromLabels(h27pop_obj, human27pop_labels)
```

The output looks like this:

```
[1] "Overall labels"
[1] "==============="
[1] "Alur(10)Hema(15)Pygmy(25)Brahmin(25)Utah_N._European(25)Cambodian(5)Chinese(10)Tamil_LC(13)Irula(24)JPN2(13)Madiga(10)Mala(11)CEU(60)YRI(60)CHB(45)JPT(45)Luhya(24)Tuscan(25)Kung(13)Pedi(10)Sotho/Tswana(8)Stalskoe(5)Iban(25)TBrahmin(14)Urkarah(18)VN(7)Nguni(9)"
[1] "==============="
[1] "ID1, md0.05, N25"
[1] "Pygmy(25/25)"
[1] "==============="
[1] "ID2, md0.13, N56"
[1] "JPN2(12/13)JPT(44/45)"
[1] "==============="
[1] "ID3, md0.00, N12"
[1] "Kung(12/13)"
[1] "==============="
[1] "ID4, md0.00, N25"
[1] "Iban(25/25)"
[1] "==============="
[1] "ID5, md0.00, N69"
[1] "Cambodian(5/5)Chinese(10/10)JPN2(1/13)CHB(45/45)JPT(1/45)VN(7/7)"
[1] "==============="
[1] "ID6, md0.06, N25"
[1] "Utah_N._European(1/25)Tuscan(24/25)"
[1] "==============="
[1] "ID7, md0.09, N85"
[1] "Utah_N._European(24/25)CEU(60/60)Tuscan(1/25)"
[1] "==============="
[1] "ID8, md0.00, N17"
[1] "Urkarah(17/18)"
[1] "==============="
[1] "ID9, md0.00, N6"
[1] "Stalskoe(5/5)Urkarah(1/18)"
[1] "==============="
[1] "ID10, md0.00, N4"
[1] "Irula(4/24)"
[1] "==============="
[1] "ID11, md0.00, N10"
[1] "Irula(10/24)"
[1] "==============="
[1] "ID12, md0.00, N9"
[1] "Irula(9/24)"
[1] "==============="
[1] "ID13, md0.00, N33"
[1] "Tamil_LC(13/13)Madiga(9/10)Mala(11/11)"
[1] "==============="
[1] "ID14, md0.08, N41"
[1] "Brahmin(25/25)Irula(1/24)Madiga(1/10)TBrahmin(14/14)"
[1] "==============="
[1] "ID15, md0.00, N4"
[1] "Pedi(2/10)Sotho/Tswana(2/8)"
[1] "==============="
[1] "ID16, md0.00, N20"
[1] "Pedi(5/10)Sotho/Tswana(6/8)Nguni(9/9)"
[1] "==============="
[1] "ID17, md0.00, N4"
[1] "Kung(1/13)Pedi(3/10)"
[1] "==============="
[1] "ID18, md0.04, N60"
[1] "YRI(60/60)"
[1] "==============="
[1] "ID19, md0.00, N4"
[1] "Hema(2/15)Luhya(2/24)"
[1] "==============="
[1] "ID20, md0.00, N2"
[1] "Luhya(2/24)"
[1] "==============="
[1] "ID21, md0.07, N20"
[1] "Luhya(20/24)"
[1] "==============="
[1] "ID22, md0.12, N23"
[1] "Alur(10/10)Hema(13/15)"
```

Clusters are separated from each other by "===============". The first line of each cluster is "IDx, md0.xx, Nx", and the second line lists the cluster's members by ground-truth population.

For example, consider the cluster printed as [1] "ID19, md0.00, N4" followed by [1] "Hema(2/15)Luhya(2/24)".

This is cluster ID19, whose maximum magnitude difference of admixture ratios (md) is 0.00 and which contains 4 individuals. The second line shows that 2 of the 15 Hema individuals and 2 of the 24 Luhya individuals were assigned to this cluster.
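To work with this text output programmatically, a small parser is enough. The helpers below, `parseClusterHeader` and `parsePopCounts`, are hypothetical and not part of ipADMIXTURE; they assume exactly the "IDx, mdy, Nz" header format and the "Name(inCluster/total)" population format described above, with the surrounding `[1] "..."` print decoration already removed.

```r
# Hypothetical helpers for the printed cluster summaries.

# Parse a header line such as "ID19, md0.00, N4"
parseClusterHeader <- function(line) {
  parts <- strsplit(line, ",\\s*")[[1]]
  list(id = as.integer(sub("^ID", "", parts[1])),
       md = as.numeric(sub("^md", "", parts[2])),
       n  = as.integer(sub("^N", "", parts[3])))
}

# Parse a population line such as "Hema(2/15)Luhya(2/24)"
parsePopCounts <- function(line) {
  # each token looks like "Name(inCluster/total)"; names may contain "/"
  tokens <- regmatches(line, gregexpr("[^()]+\\([0-9]+/[0-9]+\\)", line))[[1]]
  data.frame(
    population = sub("\\(.*$", "", tokens),
    inCluster  = as.integer(sub("^.*\\(([0-9]+)/.*$", "\\1", tokens)),
    total      = as.integer(sub("^.*/([0-9]+)\\)$", "\\1", tokens)),
    stringsAsFactors = FALSE
  )
}

header <- parseClusterHeader("ID19, md0.00, N4")
pops <- parsePopCounts("Hema(2/15)Luhya(2/24)")
# the per-population counts should add up to the cluster size
stopifnot(sum(pops$inCluster) == header$n)
```

Population names such as "Sotho/Tswana" contain a slash, which is why the count fields are extracted by anchoring on the parentheses rather than by splitting on "/".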

Step 3: Plot admixture ratios and cluster assignments.

```{r}
ipADMIXTURE::plotAdmixClusters(h27pop_obj)
```

Step 4: Plot cluster information as a treemap.

```{r}
ipADMIXTURE::plotClusterLeaves(h27pop_obj)
```

Step 5: Infer a phylogenetic tree of clusters from a list of Q matrices with varying K, using the neighbor-joining (NJ) method.

```{r}
out <- ipADMIXTURE::getPhyloTree(human27pop_Qmat, h27pop_obj$indexClsVec)
plot(out$tree, type = "unrooted")
```

The leaf nodes are cluster IDs.

Creating Q matrix from .geno file using R

Two well-known programs for estimating Q matrices are ADMIXTURE and STRUCTURE. However, if you want to stay entirely within R, you can use the approach below.

We can use the LEA package to estimate a Q matrix from a .geno file. If BiocManager (the Bioconductor installer) is not installed yet, first run the following code.

```{r}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
```

Then install the LEA package through BiocManager.

```{r}
BiocManager::install("LEA")
```

Suppose we have "yourfile.geno" and want the Q matrix for 4 ancestors. Then we can run the following code.

```{r}
library(LEA)
K <- 4
obj.snmf <- LEA::snmf(input.file = "yourfile.geno", K = K, project = "new")
Qmat <- LEA::Q(obj.snmf, K = K)
```
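If your genotypes are already in an R matrix, they first have to be written in LEA's .geno layout: one line per SNP and one character per individual, where 0, 1, and 2 count copies of the reference allele and 9 marks missing data. LEA also ships converters such as vcf2geno() and ped2geno(); the base-R helper below, `writeGenoSketch`, is a hypothetical alternative that assumes a numeric matrix with individuals in rows, SNPs in columns, and NA for missing calls.

```r
# Hypothetical helper: write a genotype matrix (individuals in rows, SNPs in
# columns, values 0/1/2 or NA) in the .geno layout described above.
writeGenoSketch <- function(G, file) {
  G[is.na(G)] <- 9                      # 9 marks missing data in .geno
  stopifnot(all(G %in% c(0, 1, 2, 9)))  # reject anything that is not a genotype
  # .geno has one line per SNP, so each column of G becomes one line
  lines <- apply(G, 2, function(snp) paste(snp, collapse = ""))
  writeLines(lines, file)
  invisible(file)
}

# Example: 2 individuals x 3 SNPs, one missing call
G <- rbind(c(0, 1, 2),
           c(2, NA, 0))
genoFile <- tempfile(fileext = ".geno")
writeGenoSketch(G, genoFile)
```

The resulting file can then be passed as input.file to LEA::snmf() as in the code above.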

Citation

Chainarong Amornbunchornvej, Pongsakorn Wangkumhang, and Sissades Tongsima (2020). ipADMIXTURE: R package for inferring sub-population clusters based on genetic admixture. bioRxiv 2020.03.21.001206; doi: https://doi.org/10.1101/2020.03.21.001206

Contact

Developer: C. Amornbunchornvej
https://orcid.org/0000-0003-3131-0370
Strategic Analytics Networks with Machine Learning and AI (SAI), NECTEC, Thailand
Homepage: Link

GitHub Events

Total

Watch event: 1
Push event: 2

Last Year

Watch event: 1
Push event: 2

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 47
Total Committers: 1
Avg Commits per committer: 47.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Chainarong Amornbunchornvej	g**a@g**m	47

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: about 1 year
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 1.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

josephresearcher (2)

Packages

Total packages: 1
Total downloads:
- cran 492 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

cran.r-project.org: ipADMIXTURE

Iterative Pruning Population Admixture Inference Framework

Homepage: https://github.com/DarkEyes/ipADMIXTURE
Documentation: http://cran.r-project.org/web/packages/ipADMIXTURE/ipADMIXTURE.pdf
License: GPL-3
Latest release: 0.1.2
published 10 months ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 492 Last month

Rankings

Forks count: 21.9%

Stargazers count: 24.2%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Average: 38.8%

Downloads: 82.8%

Maintainers (1)

grandca@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
ape * imports
stats * imports
treemap * imports
knitr * suggests
rmarkdown * suggests
