phylokrr
Non-linear phylogenetic regression using regularized kernels
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to wiley.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Non-linear phylogenetic regression using regularized kernels
Installation
pip install phylokrr
Quick overview
Data
The data used below were simulated so that the phylogenetically weighted observations follow a sine curve in the response variable. All of these files are available in the src/data folder.
```python
import numpy as np
import matplotlib.pyplot as plt

# load phylokrr functions
from phylokrr.dataio import read_data
from phylokrr.utils import weight_data

# tree file in newick format
tree_file = "./src/data/test_tree.txt"
# data file in csv format, without column names
data_file = "./src/data/test_data3.csv"
# This file contains only a list of species names, each
# corresponding to a row in data_file.
data_file_spps = "./src/data/test_data_spps.txt"

# Read data
X_uw_uc, y_uw_uc, vcv = read_data(tree_file, data_file, data_file_spps,
                                  y_col = 1, # response variable column
                                  delimiter = ',',
                                  verbose = True)

# Weight data
X_w, y_w = weight_data(X_uw_uc, y_uw_uc, vcv, use_sd = False)

np.random.seed(12038)

fig, axs = plt.subplots(1, 2, figsize=(8, 4))
axs[0].scatter(X_uw_uc, y_uw_uc, color = 'red', alpha=0.5, )
axs[0].set_title('Unweighted data')
axs[0].set_xlabel('x')
axs[0].set_ylabel('y')
axs[1].scatter(X_w, y_w, color = 'blue', alpha=0.5, )
axs[1].set_title('Phylogenetically weighted data')
axs[1].set_xlabel('$x^*$')
axs[1].set_ylabel('$y^*$')
plt.tight_layout()
```
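For intuition, one common way to weight data phylogenetically is to whiten it with a Cholesky factor of the phylogenetic covariance matrix, as in generalized least squares. The sketch below assumes that approach. `whiten` is a hypothetical helper written for illustration, the covariance values are made up, and the internals of phylokrr's own weighting function may differ.

```python
import numpy as np

def whiten(X, y, vcv):
    """Hypothetical helper: decorrelate rows using vcv = L @ L.T (Cholesky)."""
    L = np.linalg.cholesky(vcv)
    # Solving L @ Z = A yields Z = L^{-1} A without forming the inverse.
    X_w = np.linalg.solve(L, X)
    y_w = np.linalg.solve(L, y)
    return X_w, y_w

# Toy covariance matrix for three species (made-up values)
vcv = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
X = np.array([[0.1], [0.4], [0.9]])
y = np.array([0.2, 0.5, 0.7])

X_w, y_w = whiten(X, y, vcv)

# Check: the whitening factor maps vcv to the identity matrix.
L = np.linalg.cholesky(vcv)
L_inv = np.linalg.inv(L)
is_identity = np.allclose(L_inv @ vcv @ L_inv.T, np.eye(3))
print(is_identity)
```

After this transformation the rows behave as if they were independent, which is what lets standard regression tools be applied to the starred variables.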
Simple model fitting without Cross-Validation (CV)
```python
from phylokrr.utils import split_data

n,_ = X_w.shape

# split data into training and testing sets
num_test = round(0.5*n)

(X_train, X_test,
 y_train, y_test,) = split_data(X_w, y_w, num_test, seed = 12038) # seed defined above

from phylokrr.kernels import KRR

# set model
pkrr_model = KRR(kernel='rbf', fit_intercept= True)

# arbitrarily proposed hyperparameters
params = {'lambda': 1, 'gamma': 1}

# set hyperparameters
pkrr_model.set_params(**params)

# fit model
pkrr_model.fit(X_train, y_train,)

# make predictions
y_pred_kernel = pkrr_model.predict(X_test)
```
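To illustrate what the two hyperparameters control, here is a minimal NumPy sketch of generic kernel ridge regression with an RBF kernel: `gamma` sets the kernel width and `lam` the strength of the ridge penalty. The function names are hypothetical, the data are synthetic, and no intercept is fitted; this is textbook KRR, not the implementation inside phylokrr.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    """Predictions are kernel-weighted sums of the dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic sine-shaped data (not the package's test data)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

alpha = krr_fit(X, y, lam=1.0, gamma=1.0)
X_new = np.linspace(-2, 2, 5)[:, None]
y_hat = krr_predict(X, alpha, X_new, gamma=1.0)
```

Larger `lam` values shrink the fit toward zero, while larger `gamma` values make the fit more local and wigglier, which is why both are usually tuned rather than fixed at 1.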
Let's compare it with the standard phylogenetic regression (i.e., PGLS)
```python
from phylokrr.utils import PGLS

# fit standard phylogenetic regression
pgls = PGLS(fit_intercept=True)
pgls.fit(X_train, y_train)
y_pred_pgls = pgls.predict(X_test)

# plot model fits
plt.scatter(X_test, y_test, color = 'blue', alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred_kernel, color = 'green', alpha=0.5, label = 'phyloKRR predictions w/o CV')
plt.scatter(X_test, y_pred_pgls, color = 'red', alpha=0.5, label = 'PGLS predictions')
plt.xlabel('$x^*$')
plt.ylabel('$y^*$')
plt.legend()
plt.tight_layout()
```
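As background for the comparison, PGLS is generalized least squares with a phylogenetic covariance matrix V, whose closed-form estimate is β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y. The sketch below is a generic version of that formula; `gls_fit` is a hypothetical helper with made-up inputs and is not phylokrr's `PGLS` class. It also checks the equivalent route of ordinary least squares on whitened data.

```python
import numpy as np

def gls_fit(X, y, vcv, fit_intercept=True):
    """Hypothetical helper: beta = (X' V^-1 X)^-1 X' V^-1 y."""
    if fit_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    Vinv_X = np.linalg.solve(vcv, X)
    Vinv_y = np.linalg.solve(vcv, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Toy covariance matrix for four species (made-up values)
vcv = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.5, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.3],
                [0.0, 0.0, 0.3, 1.0]])
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 + 3.0 * X[:, 0]  # noise-free line: intercept 2, slope 3

beta = gls_fit(X, y, vcv)

# Equivalent route: ordinary least squares on Cholesky-whitened data.
L = np.linalg.cholesky(vcv)
X_design = np.column_stack([np.ones(len(X)), X])
beta_ols, *_ = np.linalg.lstsq(
    np.linalg.solve(L, X_design), np.linalg.solve(L, y), rcond=None
)
```

Because the model is linear in x, it cannot follow a sine-shaped relationship, which is the gap the kernel approach is meant to close.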
Hyperparameter tuning with CV
```python
from phylokrr.utils import k_fold_cv_random

params = {
    'lambda' : np.logspace(-5, 3, 200, base=2),
    'gamma' : np.logspace(-5, 3, 200, base=2),
}

# cross validation
best_params = k_fold_cv_random(X_train, y_train,
                               pkrr_model,
                               params,
                               verbose = False,
                               folds = 3,
                               sample = 100)

pkrr_model.set_params(**best_params)
pkrr_model.fit(X_train, y_train)
y_pred_cv = pkrr_model.predict(X_test)

# plot model fits
fs = 10
plt.scatter(X_test, y_test, color = 'blue', alpha=0.5, label = 'Testing (unseen) data',)
plt.scatter(X_test, y_pred_cv, color = 'green', alpha=0.5, label = 'phyloKRR predictions',)
plt.scatter(X_test, y_pred_pgls, color = 'red', alpha=0.5, label = 'PGLS predictions',)
plt.xlabel('$x^*$', fontsize = fs)
plt.ylabel('$y^*$', fontsize = fs)
plt.legend(fontsize = fs)
plt.tight_layout()
```
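The `folds` and `sample` arguments suggest a randomized search: rather than evaluating all 200 × 200 combinations, a random subset is scored with k-fold cross-validation. The sketch below shows one way such a search could work, under that assumption. `random_search_cv` and `krr_fit_predict` are hypothetical helpers on synthetic data, and the package's own routine may select folds, candidates, or the error metric differently.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def krr_fit_predict(X_tr, y_tr, X_te, lam, gamma):
    """Fit generic kernel ridge regression and predict on X_te."""
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    return rbf_kernel(X_te, X_tr, gamma) @ alpha

def random_search_cv(X, y, grid, folds=3, sample=20, seed=0):
    """Hypothetical helper: draw `sample` random hyperparameter pairs and
    return the pair with the lowest mean validation MSE across folds."""
    rng = np.random.default_rng(seed)
    fold_idx = np.array_split(rng.permutation(len(X)), folds)
    best, best_mse = None, np.inf
    for _ in range(sample):
        cand = {name: rng.choice(values) for name, values in grid.items()}
        mses = []
        for k in range(folds):
            val = fold_idx[k]
            trn = np.concatenate([fold_idx[j] for j in range(folds) if j != k])
            pred = krr_fit_predict(X[trn], y[trn], X[val],
                                   cand["lambda"], cand["gamma"])
            mses.append(np.mean((y[val] - pred) ** 2))
        if np.mean(mses) < best_mse:
            best, best_mse = cand, float(np.mean(mses))
    return best, best_mse

# Synthetic sine-shaped data (not the package's test data)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

grid = {
    "lambda": np.logspace(-5, 3, 50, base=2),
    "gamma": np.logspace(-5, 3, 50, base=2),
}
best_params, best_mse = random_search_cv(X, y, grid)
```

Sampling candidates keeps the cost proportional to `sample × folds` model fits instead of the full grid size, at the price of possibly missing the single best combination.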
Model performance metrics
```python
r2kernel = pkrr_model.score(X_test, y_test, metric='r2')
r2pgls = pgls.score(X_test, y_test, metric='r2')

print(f"R2 for phyloKRR: {r2kernel}")
print(f"R2 for PGLS: {r2pgls}")
# R2 for phyloKRR: 0.752485346280156
# R2 for PGLS: 0.4126035522866096
```
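For reference, the coefficient of determination is conventionally defined as R² = 1 − SS_res / SS_tot. The sketch below implements that standard definition with a hypothetical `r2_score` helper; whether `metric='r2'` in phylokrr computes exactly this quantity is an assumption that has not been checked against the package source.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, y_true)        # 1.0: predictions match exactly
mean_only = r2_score(y_true, [2.5] * 4)   # 0.0: no better than the mean
```

Under this definition a value of 0 means the model does no better than predicting the mean of the test responses, so the scores reported above can be read against that baseline.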
More information at these notebooks
Reference
Rosas‐Puchuri, U., Santaquiteria, A., Khanmohammadi, S., Solís‐Lemus, C., & Betancur‐R, R. (2024). Non‐linear phylogenetic regression using regularised kernels. Methods in Ecology and Evolution.
Owner
- Name: U. Rosas-Puchuri
- Login: Ulises-Rosas
- Kind: user
- Company: George Washington University
- Website: stackoverflow.com/users/8872487
- Repositories: 7
- Profile: https://github.com/Ulises-Rosas
Coding for fun!
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Rosas-Puchuri"
    given-names: "Ulises"
    orcid: "https://orcid.org/0000-0003-0529-2623"
title: "phyloKRR"
version: 1.0.1
doi: 10.5281/zenodo.12595028
date-released: 2024-06-29
url: "https://github.com/ulises-rosas/phylokrr"
GitHub Events
Total
- Watch event: 2
- Push event: 8
Last Year
- Watch event: 2
- Push event: 8