aphylo

Statistical inference of genetic functions in phylogenetic trees

https://github.com/uscbiostats/aphylo

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Keywords

annotations inference phylogenetics r r-package rcpparmadillo
Last synced: 6 months ago · JSON representation

Repository

Statistical inference of genetic functions in phylogenetic trees

Basic Info
Statistics
  • Stars: 6
  • Watchers: 0
  • Forks: 2
  • Open Issues: 10
  • Releases: 2
Topics
annotations inference phylogenetics r r-package rcpparmadillo
Created about 9 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.md


aphylo: Statistical Inference of Annotated Phylogenetic Trees

The aphylo R package implements estimation and data imputation methods for functional annotations in phylogenetic trees. Its core function computes the log-likelihood of observing a given phylogenetic tree with functional annotations on its leaves, given the probabilities of gain and loss of function and the probabilities of experimental misclassification. The log-likelihood is computed using peeling algorithms, which required developing and implementing efficient algorithms for re-coding and preparing phylogenetic tree data for use with the package. Finally, aphylo works smoothly with popular tools for the analysis of phylogenetic data, such as the ape R package, “Analyses of Phylogenetics and Evolution.”
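As a rough illustration of the peeling idea described above, the following base R sketch computes the log-likelihood of a single binary function on a small tree. It is a simplified toy and not aphylo's implementation: the function `peel_loglik`, its arguments, and the use of a single pair of gain/loss probabilities `mu` (rather than separate duplication and speciation rates) are assumptions made for this example.

``` r
# Toy pruning ("peeling") log-likelihood for one binary function.
# edges:   two-column matrix (parent, child)
# tip_obs: named vector of observed 0/1 labels, names are tip ids
# psi:     c(P(obs = 1 | true = 0), P(obs = 0 | true = 1))  (misclassification)
# mu:      c(P(gain), P(loss)) along any branch             (assumed shared)
# Pi:      probability that the root has the function
peel_loglik <- function(edges, tip_obs, psi, mu, Pi) {
  # Transition matrix: rows = parent state (0, 1), columns = child state (0, 1)
  M <- matrix(c(1 - mu[1], mu[1],
                mu[2],     1 - mu[2]), nrow = 2, byrow = TRUE)

  # Probability of the data below `node`, conditional on its state (0, 1)
  cond <- function(node) {
    children <- edges[edges[, 1] == node, 2]
    if (length(children) == 0) {
      # Leaf: probability of the observed label given the true state
      x <- tip_obs[[as.character(node)]]
      if (x == 1) c(psi[1], 1 - psi[2]) else c(1 - psi[1], psi[2])
    } else {
      # Internal node: multiply the messages coming from each child
      out <- c(1, 1)
      for (ch in children) out <- out * as.vector(M %*% cond(ch))
      out
    }
  }

  # The root is the only node that never appears as a child
  root <- setdiff(edges[, 1], edges[, 2])
  log(sum(c(1 - Pi, Pi) * cond(root)))
}

# A four-tip tree written as a parent/child edge list (hypothetical values)
edges   <- cbind(c(6, 6, 7, 7, 5, 5), c(1, 2, 3, 4, 6, 7))
tip_obs <- c("1" = 0, "2" = 0, "3" = 1, "4" = 1)

peel_loglik(edges, tip_obs, psi = c(0.05, 0.05), mu = c(0.2, 0.1), Pi = 0.4)
```

Each node is visited once, so the cost grows linearly with the number of nodes instead of exponentially with the number of unobserved internal states.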

The package is released under the MIT License and is developed by the Computing and Software Cores of the Biostatistics Division’s NIH Project Grant (P01) at the Department of Preventive Medicine at the University of Southern California.

Citation

``` r
citation(package="aphylo")
```

To cite aphylo in publications use the following paper:

  Vega Yon GG, Thomas DC, Morrison J, Mi H, Thomas PD, et al. (2021)
  Bayesian parameter estimation for automatic annotation of gene
  functions using observational data and phylogenetic trees. PLOS
  Computational Biology 17(2): e1007948.
  https://doi.org/10.1371/journal.pcbi.1007948

And the actual R package:

  Vega Yon G (2022). _Statistical Inference of Annotated Phylogenetic
  Trees_. R package version 0.3-2,
  <https://github.com/USCBiostats/aphylo>.

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

Install

This package depends on another R package that is under development, fmcmc. So first, you need to install it:

``` r
devtools::install_github("USCbiostats/fmcmc")
```

Then you can install the aphylo package:

``` r
devtools::install_github("USCbiostats/aphylo")
```

Reading data

``` r
library(aphylo)
```

Loading required package: ape

``` r
# These datasets are included in the package
data("fakeexperiment")
data("faketree")

head(fakeexperiment)
```

     LeafId f1 f2
[1,]      1  0  0
[2,]      2  0  1
[3,]      3  1  0
[4,]      4  1  1

``` r
head(faketree)
```

     ParentId NodeId
[1,]        6      1
[2,]        6      2
[3,]        7      3
[4,]        7      4
[5,]        5      6
[6,]        5      7
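The tree above is stored as a parent/child edge list. As a small base R sketch of how such a list encodes the tree shape (the object name `edges` is hypothetical and simply retypes the values printed above), the tips, internal nodes, and root can be recovered with set operations:

``` r
# Edge list copied from the printed output above
edges <- cbind(
  ParentId = c(6, 6, 7, 7, 5, 5),
  NodeId   = c(1, 2, 3, 4, 6, 7)
)

# Tips are nodes that never appear as a parent
tips <- setdiff(edges[, "NodeId"], edges[, "ParentId"])

# Internal nodes are nodes that appear as a parent at least once
internal <- unique(edges[, "ParentId"])

# The root is the only node that never appears as a child
root <- setdiff(edges[, "ParentId"], edges[, "NodeId"])

length(tips)      # number of tips
length(internal)  # number of internal nodes
root
```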

``` r
O <- new_aphylo(
  tip.annotation = fakeexperiment[,2:3],
  tree           = as.phylo(faketree)
)

O
```

Phylogenetic tree with 4 tips and 3 internal nodes.

Tip labels:
  1, 2, 3, 4
Node labels:
  5, 6, 7

Rooted; no branch lengths.

 Tip (leafs) annotations:
  f1 f2
1  0  0
2  0  1
3  1  0
4  1  1

 Internal node annotations:
  f1 f2
5  9  9
6  9  9
7  9  9

``` r
as.phylo(O)
```

Phylogenetic tree with 4 tips and 3 internal nodes.

Tip labels:
  1, 2, 3, 4
Node labels:
  5, 6, 7

Rooted; no branch lengths.

``` r
# We can visualize it
plot(O)
```

``` r
plot_logLik(O)
```

Simulating annotated trees

``` r
set.seed(198)
dat <- raphylo(
  50,
  P    = 1,
  psi  = c(0.05, 0.05),
  mu_d = c(0.8, 0.3),
  mu_s = c(0.1, 0.1),
  Pi   = .4
)

dat
```

Phylogenetic tree with 50 tips and 49 internal nodes.

Tip labels:
  1, 2, 3, 4, 5, 6, ...
Node labels:
  51, 52, 53, 54, 55, 56, ...

Rooted; no branch lengths.

 Tip (leafs) annotations:
  fun0000
1       1
2       0
3       0
4       1
5       0
6       0

...(44 obs. omitted)...


 Internal node annotations:
  fun0000
1       1
2       1
3       1
4       1
5       1
6       0

...(43 obs. omitted)...

Likelihood

``` r
# Parameters and data
psi     <- c(0.020, 0.010)
mu_d    <- c(0.40, .10)
mu_s    <- c(0.04, .01)
eta     <- c(.7, .9)
pi_root <- .05

# Computing likelihood
str(LogLike(dat, psi = psi, mu_d = mu_d, mu_s = mu_s, eta = eta, Pi = pi_root))
```

List of 2
 $ Pr:List of 1
  ..$ : num [1:99, 1:2] 0.018 0.686 0.686 0.018 0.686 0.686 0.018 0.018 0.018 0.686 ...
 $ ll: num -40.4

Estimation

``` r
# Using L-BFGS-B (MLE) to get an initial guess
ans0 <- aphylo_mle(dat ~ psi + mu_d + Pi + eta)

# MCMC method
ans2 <- aphylo_mcmc(
  dat ~ mu_d + mu_s + Pi,
  priors  = bprior(c(9, 1, 1, 1, 5), c(1, 9, 9, 9, 5)),
  control = list(nsteps=5e3, burnin=500, thin=10, nchains=2)
)
```

Warning: While using multiple chains, a single initial point has been passed
via `initial`: c(0.9, 0.5, 0.1, 0.05, 0.5). The values will be recycled.
Ideally you would want to start each chain from different locations.

Convergence has been reached with 5500 steps. Gelman-Rubin's R: 1.0314. (500 final count of samples).

``` r
ans2
```

ESTIMATION OF ANNOTATED PHYLOGENETIC TREE

 Call: aphylo_mcmc(model = dat ~ mu_d + mu_s + Pi, priors = bprior(c(9, 
    1, 1, 1, 5), c(1, 9, 9, 9, 5)), control = list(nsteps = 5000, 
    burnin = 500, thin = 10, nchains = 2))
 LogLik (unnormalized): -20.0599 
 Method used: mcmc (5500 steps)
 # of Leafs: 50
 # of Functions 1
 # of Trees: 1

         Estimate  Std. Err.
 mu_d0   0.9093    0.0827
 mu_d1   0.1608    0.0767
 mu_s0   0.1015    0.0669
 mu_s1   0.1022    0.0443
 Pi      0.5318    0.1443

``` r
plot(
  ans2,
  nsample = 200,
  loo     = TRUE,
  ncores  = 2L
)
```

``` r
# MCMC Diagnostics with coda
library(coda)
gelman.diag(ans2$hist)
```

Potential scale reduction factors:

      Point est. Upper C.I.
mu_d0       1.00       1.02
mu_d1       1.02       1.11
mu_s0       1.00       1.01
mu_s1       1.01       1.06
Pi          1.01       1.02

Multivariate psrf

1.03
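For intuition about the potential scale reduction factors reported above, here is a minimal base R sketch of the basic Gelman-Rubin statistic for one parameter. The helper `psrf` and the simulated chains are hypothetical, and coda's `gelman.diag` applies an additional degrees-of-freedom correction, so its values can differ slightly from this simplified version.

``` r
# Basic potential scale reduction factor for a single parameter.
# chains: numeric matrix with one column per chain and one row per draw
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # average within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(1)
mixed <- cbind(rnorm(1000), rnorm(1000))            # chains agree
stuck <- cbind(rnorm(1000), rnorm(1000, mean = 3))  # chains disagree

psrf(mixed)  # close to 1
psrf(stuck)  # well above 1
```

Values near 1 suggest that the chains are sampling from the same distribution, which is what the diagnostics above indicate for `ans2`.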

``` r
plot(ans2$hist)
```

Prediction

``` r
pred <- prediction_score(ans2, loo = TRUE)
pred
```

Prediction score (H0: Observed = Random)

 N obs.      : 99

 Observed    : 0.71 ***
 Random      : NA 
 P(<t)       : 0.0000
--------------------------------------------------------------------------------
Values scaled to range between 0 and 1, 1 being best.

Significance levels: *** p < .01, ** p < .05, * p < .10
AUC 0.79.
MAE 0.29.
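The output above also reports an AUC and an MAE. As a rough sketch of what these two summaries measure (this is not aphylo's internal code; the helpers `mae` and `auc` and the toy vectors `y` and `p` are hypothetical), both can be computed in base R from observed 0/1 labels and predicted probabilities:

``` r
# Mean absolute error between observed labels and predicted probabilities
mae <- function(y, p) mean(abs(y - p))

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity
auc <- function(y, p) {
  r  <- rank(p)        # average ranks, so ties are handled
  n1 <- sum(y == 1)    # number of positives
  n0 <- sum(y == 0)    # number of negatives
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data: observed annotations and predicted probabilities
y <- c(1, 0, 0, 1, 0, 1)
p <- c(0.9, 0.2, 0.4, 0.6, 0.7, 0.8)

mae(y, p)
auc(y, p)
```

A lower MAE and a higher AUC both indicate better predictions; an AUC of 0.5 corresponds to random guessing.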

``` r
plot(pred)
```

Misc

During the development process, we decided to let the user choose which ‘tree-reader’ function to use, either the one from the rncl R package or the one from ape. To that end, we created a short benchmark that compares both functions here.

Owner

  • Name: USC Division of Biostatistics
  • Login: USCbiostats
  • Kind: organization
  • Location: Los Angeles, CA

GitHub Events

Total
  • Issues event: 1
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 6
  • Pull request event: 5
  • Create event: 2
Last Year
  • Issues event: 1
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 6
  • Pull request event: 5
  • Create event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 316
  • Total Committers: 2
  • Avg Commits per committer: 158.0
  • Development Distribution Score (DDS): 0.003
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
George G. Vega Yon g****n@g****m 315
Immaterial0 I****0 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 15
  • Total pull requests: 7
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 6 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.6
  • Average comments per pull request: 0.14
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gvegayon (15)
Pull Request Authors
  • gvegayon (7)
Top Labels
Issue Labels
enhancement (5) bug (3) methods (2) question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 223 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: aphylo

Statistical Inference and Prediction of Annotations in Phylogenetic Trees

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 223 Last month
Rankings
Stargazers count: 21.1%
Forks count: 21.9%
Average: 27.9%
Dependent packages count: 29.8%
Downloads: 31.3%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • ape >= 5.0 depends
  • MASS * imports
  • Matrix * imports
  • Rcpp >= 0.12.1 imports
  • coda * imports
  • fmcmc * imports
  • methods * imports
  • utils * imports
  • xml2 * imports
  • AUC * suggests
  • covr * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • tinytest * suggests
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
.github/workflows/update-docker.yml actions
  • actions/checkout v2 composite
  • docker/login-action v1 composite
.github/workflows/website.yml actions
  • actions/checkout v2 composite
  • peaceiris/actions-gh-pages v3 composite
docker/Dockerfile docker
  • rocker/r-base latest build