phylokrr

Non-linear phylogenetic regression using regularized kernels

https://github.com/ulises-rosas/phylokrr

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: wiley.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Ulises-Rosas
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.34 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Non-linear phylogenetic regression using regularized kernels

Installation

pip install phylokrr

Quick overview

Data

The data used below were simulated so that the phylogenetically weighted observations follow a sine curve in the response variable. All of these files are available in the src/data folder.

```python
import numpy as np
import matplotlib.pyplot as plt

# load phylokrr functions
from phylokrr.dataio import read_data
from phylokrr.utils import weight_data

# tree file in newick format
tree_file = "./src/data/test_tree.txt"

# data file in csv format, without column names
data_file = "./src/data/test_data3.csv"

# This file contains only a list of species names, each
# corresponding to a row in data_file.
data_file_spps = "./src/data/test_data_spps.txt"

# Read data
X_uw_uc, y_uw_uc, vcv = read_data(tree_file,
                                  data_file,
                                  data_file_spps,
                                  y_col = 1, # response variable column
                                  delimiter = ',',
                                  verbose = True)

# Weight data
X_w, y_w = weight_data(X_uw_uc, y_uw_uc, vcv, use_sd = False)

np.random.seed(12038)

# plot unweighted (left) and phylogenetically weighted (right) data
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
axs[0].scatter(X_uw_uc, y_uw_uc, color = 'red', alpha=0.5)
axs[0].set_title('Unweighted data')
axs[0].set_xlabel('x')
axs[0].set_ylabel('y')
axs[1].scatter(X_w, y_w, color = 'blue', alpha=0.5)
axs[1].set_title('Phylogenetically weighted data')
axs[1].set_xlabel('$x^*$')
axs[1].set_ylabel('$y^*$')
plt.tight_layout()
```
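The weighting step can be pictured as the whitening transform used in generalized least squares. The sketch below illustrates that general idea only; it is an assumption, not the implementation of `weight_data`. The helper `gls_whiten` and the toy arrays are hypothetical, and the `use_sd` option is not modeled.

```python
import numpy as np

def gls_whiten(X, y, vcv):
    """Hypothetical helper: decorrelate X and y given a covariance matrix vcv."""
    L = np.linalg.cholesky(vcv)      # lower-triangular factor, vcv = L @ L.T
    X_star = np.linalg.solve(L, X)   # equals inv(L) @ X without forming the inverse
    y_star = np.linalg.solve(L, y)
    return X_star, y_star

# Toy covariance for three species; the first two share evolutionary history.
vcv_toy = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
X_toy = np.array([[0.1], [0.4], [0.9]])
y_toy = np.array([1.0, 2.0, 3.0])

X_star, y_star = gls_whiten(X_toy, y_toy, vcv_toy)
print(X_star.ravel(), y_star)
```

Under this transform, residuals that were correlated through `vcv` become uncorrelated with unit variance, which is why standard regression tools can then be applied to the starred variables.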

Simple model fitting without Cross-Validation (CV)

```python
from phylokrr.utils import split_data

n, _ = X_w.shape

# split data into training and testing sets
num_test = round(0.5*n)

(X_train, X_test,
 y_train, y_test,) = split_data(X_w, y_w, num_test, seed = 12038) # seed defined above

from phylokrr.kernels import KRR

# set model
pkrr_model = KRR(kernel='rbf', fit_intercept=True)

# arbitrarily proposed hyperparameters
params = {'lambda': 1, 'gamma': 1}

# set hyperparameters
pkrr_model.set_params(**params)

# fit model
pkrr_model.fit(X_train, y_train)

# make predictions
y_pred_kernel = pkrr_model.predict(X_test)
```
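For readers unfamiliar with kernel ridge regression, the following is a minimal NumPy sketch of the textbook method with an RBF kernel. It is not the `KRR` class from phylokrr: the function names are hypothetical, no intercept is fitted, and it assumes the conventional roles of the two hyperparameters (`lam` as the ridge penalty, `gamma` as the RBF width parameter).

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every row pair of A and B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clamp round-off negatives

def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_new, X_fit, alpha, gamma):
    """Predictions are kernel similarities to the training rows times alpha."""
    return rbf_kernel(X_new, X_fit, gamma) @ alpha

# Toy data: a noiseless sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(60, 1))
y_demo = np.sin(X_demo[:, 0])

alpha = krr_fit(X_demo, y_demo, lam=1e-3, gamma=1.0)
y_hat = krr_predict(X_demo, X_demo, alpha, gamma=1.0)
print("training MSE:", np.mean((y_hat - y_demo) ** 2))
```

Larger `lam` values shrink the fit toward zero, while larger `gamma` values make the kernel more local; both trade flexibility against overfitting, which is what the cross-validation step later tunes.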

Let's compare it with standard phylogenetic regression (i.e., PGLS):

```python
from phylokrr.utils import PGLS

# fit standard phylogenetic regression
pgls = PGLS(fit_intercept=True)
pgls.fit(X_train, y_train)
y_pred_pgls = pgls.predict(X_test)

# plot model fits
plt.scatter(X_test, y_test, color = 'blue', alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred_kernel, color = 'green', alpha=0.5, label = 'phyloKRR predictions w/o CV')
plt.scatter(X_test, y_pred_pgls, color = 'red', alpha=0.5, label = 'PGLS predictions')
plt.xlabel('$x^*$')
plt.ylabel('$y^*$')
plt.legend()
plt.tight_layout()
```
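As background for the comparison, generalized least squares is equivalent to ordinary least squares on whitened data. The sketch below checks that identity numerically on random inputs. It is a generic illustration, not the `PGLS` class; the covariance matrix `C` is a random stand-in for a phylogenetic covariance, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Random symmetric positive-definite matrix standing in for a phylogenetic covariance
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)

# Design matrix with an intercept column, plus a response
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

# GLS closed form: beta = (X' C^-1 X)^-1 X' C^-1 y
C_inv = np.linalg.inv(C)
beta_gls = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv @ y)

# OLS after whitening with the Cholesky factor of C (intercept column included)
L = np.linalg.cholesky(C)
X_white = np.linalg.solve(L, X)
y_white = np.linalg.solve(L, y)
beta_ols, *_ = np.linalg.lstsq(X_white, y_white, rcond=None)

print(beta_gls, beta_ols)
```

Note that the intercept column is transformed together with the predictors, so after whitening it is no longer a column of ones.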

Hyperparameter tuning with CV

```python
from phylokrr.utils import k_fold_cv_random

params = {
    'lambda' : np.logspace(-5, 3, 200, base=2),
    'gamma' : np.logspace(-5, 3, 200, base=2),
}

# cross validation
best_params = k_fold_cv_random(X_train, y_train,
                               pkrr_model,
                               params,
                               verbose = False,
                               folds = 3,
                               sample = 100)

# refit with the selected hyperparameters and predict
pkrr_model.set_params(**best_params)
pkrr_model.fit(X_train, y_train)
y_pred_cv = pkrr_model.predict(X_test)

# plot model fits
fs = 10
plt.scatter(X_test, y_test, color = 'blue', alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred_cv, color = 'green', alpha=0.5, label = 'phyloKRR predictions')
plt.scatter(X_test, y_pred_pgls, color = 'red', alpha=0.5, label = 'PGLS predictions')
plt.xlabel('$x^*$', fontsize = fs)
plt.ylabel('$y^*$', fontsize = fs)
plt.legend(fontsize = fs)
plt.tight_layout()
```
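The tuning step can be understood as a random search over the hyperparameter grid, scored by k-fold cross-validation. The sketch below is one plausible reading of that procedure, not the internals of `k_fold_cv_random`: it assumes `sample` is the number of random draws from the grid and `folds` the number of folds, and every function here (`rbf_kernel`, `krr_val_mse`, `random_search_cv`) is hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def krr_val_mse(X_tr, y_tr, X_va, y_va, lam, gamma):
    """Fit kernel ridge regression on the training fold, return validation MSE."""
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    pred = rbf_kernel(X_va, X_tr, gamma) @ alpha
    return np.mean((pred - y_va) ** 2)

def random_search_cv(X, y, grid, folds=3, sample=20, seed=0):
    """Draw `sample` random grid points; keep the one with the lowest mean fold MSE."""
    rng = np.random.default_rng(seed)
    fold_ids = np.array_split(rng.permutation(len(X)), folds)
    best, best_err = None, np.inf
    for _ in range(sample):
        cand = {name: float(rng.choice(values)) for name, values in grid.items()}
        errs = []
        for f in range(folds):
            va = fold_ids[f]
            tr = np.concatenate([fold_ids[j] for j in range(folds) if j != f])
            errs.append(krr_val_mse(X[tr], y[tr], X[va], y[va],
                                    cand['lambda'], cand['gamma']))
        err = float(np.mean(errs))
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Toy data: noisy sine curve
rng = np.random.default_rng(2)
X_demo = rng.uniform(-3, 3, size=(60, 1))
y_demo = np.sin(X_demo[:, 0]) + 0.1 * rng.normal(size=60)

grid = {
    'lambda': np.logspace(-5, 3, 20, base=2),
    'gamma': np.logspace(-5, 3, 20, base=2),
}
best, best_err = random_search_cv(X_demo, y_demo, grid, folds=3, sample=20)
print(best, best_err)
```

Sampling a subset of grid points keeps the cost proportional to `sample * folds` model fits rather than to the full grid size (200 x 200 combinations in the README example).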

Model performance metrics

```python
r2_kernel = pkrr_model.score(X_test, y_test, metric='r2')
r2_pgls = pgls.score(X_test, y_test, metric='r2')

print(f"R2 for phyloKRR: {r2_kernel}")
print(f"R2 for PGLS: {r2_pgls}")
# R2 for phyloKRR: 0.752485346280156
# R2 for PGLS: 0.4126035522866096
```
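For reference, the coefficient of determination is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that standard definition; whether `score(..., metric='r2')` computes exactly this is an assumption, and the function name `r2` and the toy arrays are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Standard R^2: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r2(y_true, y_pred))
```

Here the residual sum of squares is 0.10 and the total sum of squares is 5.0, so the printed value is approximately 0.98. A model that always predicts the mean of `y_true` scores 0, and a perfect model scores 1.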

More information at these notebooks

Reference

Rosas‐Puchuri, U., Santaquiteria, A., Khanmohammadi, S., Solís‐Lemus, C., & Betancur‐R, R. (2024). Non‐linear phylogenetic regression using regularised kernels. Methods in Ecology and Evolution.

Owner

  • Name: U. Rosas-Puchuri
  • Login: Ulises-Rosas
  • Kind: user
  • Company: George Washington University

Coding for fun!

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rosas-Puchuri"
  given-names: "Ulises"
  orcid: "https://orcid.org/0000-0003-0529-2623"
title: "phyloKRR"
version: 1.0.1
doi: 10.5281/zenodo.12595028
date-released: 2024-06-29
url: "https://github.com/ulises-rosas/phylokrr"

GitHub Events

Total
  • Watch event: 2
  • Push event: 8
Last Year
  • Watch event: 2
  • Push event: 8

Dependencies

setup.py pypi