IKPLS

IKPLS: Improved Kernel Partial Least Squares and Fast Cross-Validation Algorithms for Python with CPU and GPU Implementations Using NumPy and JAX - Published in JOSS (2024)

https://github.com/sm00thix/ikpls

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: wiley.com, joss.theoj.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

algorithm data-science gpu-support linear-regression partial-least-squares partial-least-squares-regression pls plsda plsr tpu-acceleration weighted-least-squares weighted-linear-regression weighted-regression

Scientific Fields

Engineering Computer Science - 80% confidence
Economics Social Sciences - 40% confidence
Last synced: 4 months ago

Repository

Fast CPU and GPU Python implementations of Improved Kernel Partial Least Squares (PLS) by Dayal and MacGregor (1997) and Fast Partition-Based Cross-Validation With Centering and Scaling for XTX and XTY by Engstrøm and Jensen (2025).

Basic Info
Statistics
  • Stars: 29
  • Watchers: 1
  • Forks: 5
  • Open Issues: 0
  • Releases: 7
Topics
algorithm data-science gpu-support linear-regression partial-least-squares partial-least-squares-regression pls plsda plsr tpu-acceleration weighted-least-squares weighted-linear-regression weighted-regression
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License

README.md

Improved Kernel Partial Least Squares (IKPLS) and Fast Cross-Validation

The ikpls software package provides fast and efficient tools for PLS (Partial Least Squares) modeling. It is designed to help researchers and practitioners handle PLS modeling faster than previously possible, particularly on large datasets.

NEW IN 3.0.0: Fast cross-validation for weighted IKPLS.

The ikpls software package now directly depends on the cvmatrix software package [11] to implement the fast cross-validation algorithm by Engstrøm and Jensen [7]. cvmatrix extends the fast cross-validation algorithms to correctly handle the weighted cases. The extension supports all 16 (12 unique) combinations of weighted centering and weighted scaling for X and Y without increasing time or space complexity.
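
As a rough sketch of what this looks like in code, the snippet below enumerates the 16 combinations of the four centering/scaling flags named in the Fast Cross-Validation section further down. Whether these flags are constructor arguments of the fast cross-validation class is an assumption here; consult the documentation for the exact signature.

```python
# Hedged sketch: enumerating the 16 combinations of (weighted) centering and
# scaling for X and Y. Passing center_X/center_Y/scale_X/scale_Y to the
# constructor is assumed, not confirmed by this page.
from itertools import product

from ikpls.fast_cross_validation.numpy_ikpls import PLS as NpPLS_FastCV

flag_combinations = list(product([False, True], repeat=4))
print(len(flag_combinations))  # 16

models = [
    NpPLS_FastCV(center_X=cx, center_Y=cy, scale_X=sx, scale_Y=sy)
    for cx, cy, sx, sy in flag_combinations
]
```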

NEW IN 2.0.0: Weighted IKPLS

The ikpls software package now also features sample-weighted PLS [8]. For this, ikpls uses the weighted mean [9] and weighted standard deviation [10] as formulated by the National Institute of Standards and Technology (NIST). Both the NumPy and JAX implementations allow for weighted cross-validation with their respective cross_validate methods.
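
For reference, here is a minimal NumPy sketch of the NIST-style weighted mean and weighted standard deviation. It illustrates the formulas only; it is not the ikpls implementation.

```python
import numpy as np

def weighted_mean(x, w):
    # NIST weighted mean: sum(w_i * x_i) / sum(w_i).
    return np.sum(w * x) / np.sum(w)

def weighted_std(x, w):
    # NIST weighted standard deviation:
    # sqrt( sum(w_i * (x_i - mean_w)^2) / ((n' - 1) / n' * sum(w_i)) ),
    # where n' is the number of nonzero weights.
    mean_w = weighted_mean(x, w)
    n_nonzero = np.count_nonzero(w)
    denom = (n_nonzero - 1) / n_nonzero * np.sum(w)
    return np.sqrt(np.sum(w * (x - mean_w) ** 2) / denom)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 1.0, 1.0, 2.0])
print(weighted_mean(x, w), weighted_std(x, w))
```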

Citation

If you use the ikpls software package for your work, please cite this Journal of Open Source Software article. If you use the fast cross-validation algorithm implemented in ikpls.fast_cross_validation.numpy_ikpls, please also cite this Journal of Chemometrics article.

Unlock the Power of Fast and Stable Partial Least Squares Modeling with IKPLS

Dive into cutting-edge Python implementations of the IKPLS (Improved Kernel Partial Least Squares) Algorithms #1 and #2 [1] for CPUs, GPUs, and TPUs. IKPLS is both fast [2] and numerically stable [3], making it well suited for PLS modeling.

  • Use our NumPy-based [4] CPU implementations for seamless integration with scikit-learn's [5] ecosystem of machine learning algorithms and pipelines. As the implementations subclass scikit-learn's BaseEstimator, they can be used with scikit-learn's cross_validate.
  • Use our JAX [6] implementations on CPUs or leverage powerful GPUs and TPUs for PLS modeling. Our JAX implementations are end-to-end differentiable, allowing gradient propagation when using PLS as a layer in a deep learning model.
  • Use our combination of IKPLS with the fast cross-validation algorithm by Engstrøm and Jensen [7] to quickly determine the optimal combination of preprocessing and number of PLS components.

The documentation is available at https://ikpls.readthedocs.io/en/latest/; examples can be found at https://github.com/Sm00thix/IKPLS/tree/main/examples.

Fast Cross-Validation

In addition to the standalone IKPLS implementations, this package contains an implementation of IKPLS combined with the novel, fast cross-validation algorithm by Engstrøm and Jensen [7]. The fast cross-validation algorithm benefits both IKPLS algorithms, and especially Algorithm #2. It is mathematically equivalent to classical cross-validation but much quicker. The algorithm correctly handles (column-wise) centering and scaling of the X and Y input matrices using training set means and standard deviations, avoiding data leakage from the validation set. Centering and scaling can be enabled or disabled independently of each other, and for X and Y, by setting the parameters center_X, center_Y, scale_X, and scale_Y, respectively. In addition to correctly handling (column-wise) centering and scaling, the fast cross-validation algorithm correctly handles row-wise preprocessing that operates independently on each sample, such as (row-wise) centering and scaling of the X and Y input matrices, convolution, or other preprocessing. Row-wise preprocessing can safely be applied before passing the data to the fast cross-validation algorithm.
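
To make the leakage-free preprocessing concrete, here is a minimal NumPy sketch of the principle: the column statistics come from the training fold only and are then applied to both folds. This illustrates the guarantee, not the package's internal algorithm, which obtains the same result far more efficiently.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 50))

val_idx = np.arange(0, 20)                          # validation fold
train_idx = np.setdiff1d(np.arange(100), val_idx)   # training fold

X_train, X_val = X[train_idx], X[val_idx]

# Statistics are computed on the training fold only ...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)

# ... and applied to both folds, so no information leaks from the validation set.
X_train_pp = (X_train - mean) / std
X_val_pp = (X_val - mean) / std
```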

Prerequisites

The JAX implementations support running on CPUs, GPUs, and TPUs.

  • To enable NVIDIA GPU execution, install JAX and CUDA with:

    ```shell
    pip3 install -U "jax[cuda12]"
    ```

  • To enable Google Cloud TPU execution, install JAX with:

    ```shell
    pip3 install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
    ```

These are the typical installation instructions that most users will need. For customized installations, follow the instructions in the JAX Installation Guide.

To ensure that the JAX implementations use float64, set the environment variable JAX_ENABLE_X64=True as described in JAX's Current Gotchas. Alternatively, float64 can be enabled with the following function call:

```python
import jax

jax.config.update("jax_enable_x64", True)
```
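
As a quick sanity check (not part of the ikpls documentation), you can confirm which backend JAX sees and that 64-bit precision is active:

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # enable before creating any arrays

print(jax.devices())      # e.g. a CUDA device on a GPU machine, a CPU device otherwise
print(jnp.ones(3).dtype)  # float64 when x64 mode is enabled
```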

Installation

  • Install the package for Python 3 using the following command:

    ```shell
    pip3 install ikpls
    ```

  • Now you can import the NumPy and JAX implementations with:

    ```python
    from ikpls.numpy_ikpls import PLS as NpPLS
    from ikpls.jax_ikpls_alg_1 import PLS as JAXPLS_Alg_1
    from ikpls.jax_ikpls_alg_2 import PLS as JAXPLS_Alg_2
    from ikpls.fast_cross_validation.numpy_ikpls import PLS as NpPLS_FastCV
    ```

Quick Start

Use the ikpls package for PLS modeling

```python
import numpy as np

from ikpls.numpy_ikpls import PLS

N = 100  # Number of samples.
K = 50   # Number of features.
M = 10   # Number of targets.
A = 20   # Number of latent variables (PLS components).

X = np.random.uniform(size=(N, K))  # Predictor variables.
Y = np.random.uniform(size=(N, M))  # Target variables.
w = np.random.uniform(size=(N,))    # Sample weights.

# The other PLS algorithms and implementations have the same interface for fit()
# and predict(). The fast cross-validation implementation with IKPLS has a
# different interface.
np_ikpls_alg_1 = PLS(algorithm=1)
np_ikpls_alg_1.fit(X, Y, A, w)

# Has shape (A, N, M) = (20, 100, 10). Contains a prediction for all possible
# numbers of components up to and including A.
y_pred = np_ikpls_alg_1.predict(X)

# Has shape (N, M) = (100, 10).
y_pred_20_components = np_ikpls_alg_1.predict(X, n_components=20)
(y_pred_20_components == y_pred[19]).all()  # True

# The internal model parameters can be accessed as follows:
np_ikpls_alg_1.B  # Regression coefficients tensor of shape (A, K, M) = (20, 50, 10).
np_ikpls_alg_1.W  # X weights matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.P  # X loadings matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.Q  # Y loadings matrix of shape (M, A) = (10, 20).
np_ikpls_alg_1.R  # X rotations matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.T  # X scores matrix of shape (N, A) = (100, 20). Only computed for IKPLS Algorithm #1.
```

Examples

Runnable example scripts can be found in the examples directory: https://github.com/Sm00thix/IKPLS/tree/main/examples

Contribute

To contribute, please read the Contribution Guidelines.

References

  1. Dayal, B. S. and MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73-85.
  2. Alin, A. (2009). Comparison of PLS algorithms when the number of objects is much larger than the number of variables. Statistical Papers, 50, 711-720.
  3. Andersson, M. (2009). A comparison of nine PLS1 algorithms. Journal of Chemometrics, 23(10), 518-529.
  4. NumPy
  5. scikit-learn
  6. JAX
  7. Engstrøm, O.-C. G. and Jensen, M. H. (2025). Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$. Journal of Chemometrics.
  8. Becker and Ismail (2016). Accounting for sampling weights in PLS path modeling: Simulations and empirical examples. European Management Journal, 34(6), 606-617.
  9. Weighted mean. National Institute of Standards and Technology.
  10. Weighted standard deviation. National Institute of Standards and Technology.
  11. CVMatrix. Fast computation of possibly weighted and possibly centered/scaled training set kernel matrices in a cross-validation setting.

Funding

Owner

  • Name: Ole-Christian Galbo Engstrøm
  • Login: Sm00thix
  • Kind: user
  • Location: Denmark
  • Company: University of Copenhagen and FOSS Analytical A/S

Industrial Ph.D. Student at The University of Copenhagen, Department of Computer Science.

JOSS Publication

IKPLS: Improved Kernel Partial Least Squares and Fast Cross-Validation Algorithms for Python with CPU and GPU Implementations Using NumPy and JAX
Published
July 23, 2024
Volume 9, Issue 99, Page 6533
Authors
Ole-Christian Galbo Engstrøm ORCID
FOSS Analytical A/S, Denmark, Department of Computer Science (DIKU), University of Copenhagen, Denmark, Department of Food Science (UCPH FOOD), University of Copenhagen, Denmark
Erik Schou Dreier ORCID
FOSS Analytical A/S, Denmark
Birthe Møller Jespersen ORCID
UCL University College, Denmark
Kim Steenstrup Pedersen ORCID
Department of Computer Science (DIKU), University of Copenhagen, Denmark, Natural History Museum of Denmark (NHMD), University of Copenhagen, Denmark
Editor
Sébastien Boisgérault ORCID
Tags
PLS latent variables multivariate statistics cross-validation deep learning

GitHub Events

Total
  • Create event: 7
  • Issues event: 1
  • Release event: 5
  • Watch event: 15
  • Delete event: 4
  • Issue comment event: 2
  • Push event: 36
  • Pull request event: 4
  • Fork event: 2
Last Year
  • Create event: 7
  • Issues event: 1
  • Release event: 5
  • Watch event: 15
  • Delete event: 4
  • Issue comment event: 2
  • Push event: 36
  • Pull request event: 4
  • Fork event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 201
  • Total Committers: 4
  • Avg Commits per committer: 50.25
  • Development Distribution Score (DDS): 0.154
Past Year
  • Commits: 8
  • Committers: 3
  • Avg Commits per committer: 2.667
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
Ole Engstrøm o****l@i****m 170
Ole-Christian Galbo Engstrøm S****x 23
parmentelat t****t@i****r 5
Ole-Christian Galbo Engstrøm o****e@f****k 3
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 17
  • Total pull requests: 27
  • Average time to close issues: 18 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 5.35
  • Average comments per pull request: 0.41
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 7
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 20 hours
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 5.0
  • Average comments per pull request: 0.14
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • parmentelat (8)
  • Sm00thix (4)
  • basileMarchand (2)
  • ayaanhossain (1)
  • nic-Oban (1)
Pull Request Authors
  • sm00thix (21)
  • parmentelat (11)
  • Sm00thix (8)
Top Labels
Issue Labels
question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 74 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 40
  • Total maintainers: 1
pypi.org: ikpls

Improved Kernel PLS and Fast Cross-Validation.

  • Versions: 40
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 74 Last month
Rankings
Dependent packages count: 9.9%
Average: 38.8%
Dependent repos count: 67.8%
Maintainers (1)
Last synced: 4 months ago