libscientific

libscientific: A Powerful C Library for Multivariate Analysis - Published in JOSS (2023)

https://github.com/gmrandazzo/libscientific

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 28 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords from Contributors

blackhole meshes gravitational-lenses generative-models ode parallel energy-systems pde exoplanets stellar

Scientific Fields

Engineering Computer Science - 60% confidence
Mathematics Computer Science - 43% confidence
Economics Social Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

Libscientific is a C framework for multivariate and other statistical analysis

Basic Info
  • Host: GitHub
  • Owner: gmrandazzo
  • License: gpl-3.0
  • Language: C
  • Default Branch: master
  • Size: 3.17 MB
Statistics
  • Stars: 18
  • Watchers: 4
  • Forks: 3
  • Open Issues: 2
  • Releases: 11
Created almost 10 years ago · Last pushed 4 months ago
Metadata Files
Readme License

README.md

libscientific


Libscientific is a C framework for multivariate and other statistical analysis, written to be almost completely independent of common, well-established numerical libraries. The one exception is the lapack library, which is used only to calculate left eigenvectors and eigenvalues.

The main goals of libscientific are:

  1. To provide a simple and tiny framework for multivariate analysis that can be used not only on regular computers but also on embedded systems
  2. To create a robust library of multivariate algorithms for any research and industrial application

Currently libscientific is able to compute:

  • Multivariate analysis

    • Principal Component Analysis (PCA) NIPALS algorithm [1]
    • Partial Least Squares (PLS) NIPALS algorithm [1]
    • Consensus PCA (CPCA) NIPALS algorithm [7]
    • Multiple Linear Regression (MLR) Ordinary least squares algorithm
    • Unfold PCA (UPCA) [2]
    • Unfold PLS (UPLS) [2]
    • Fisher LDA
  • Clustering

    • K-means++ (David Arthur modification) [3]
    • Hierarchical clustering
  • Object/Instance selection

    • Most Descriptive Compounds (MDC) [4]
    • Most Dissimilar Compounds (DIS) [5]
  • Statistical analysis

    • R2, MSE, MAE, RMSE, BIAS, Sensitivity, Positive Predicted Values
    • Yates analysis
    • Receiver operating characteristic (ROC)
    • Precision-Recall
    • Matrix-Matrix Euclidean, Manhattan, Cosine and Mahalanobis distances
  • Numerical analysis

    • Estimate of an integral over an xy region (numerical integration using the trapezoid rule)
    • Natural cubic spline interpolation and prediction
    • Ordinary Least-Squares (OrdinaryLeastSquares)
    • Linear Equation Solver (SolveLSE)
    • Singular value decomposition
  • Optimization

    • Nelder-Mead simplex algorithm

Moreover, for some algorithms it is possible to run validation methods with parallel computing to speed them up:

  • Bootstrap k-fold Cross Validation (RGCV)
  • Leave-One-Out
  • Y-Scrambling [6]
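As a rough illustration of the idea behind random-group cross validation, the sketch below assigns each object to one of k groups at random, keeping group sizes balanced. Each group is then left out in turn while a model is fitted on the rest. This is not the library's implementation: the function name `random_groups` and its parameters are hypothetical, and `rand()` is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of libscientific.
 * Fills group[i] with a value in [0, k) for each of the n objects.
 * Group sizes differ by at most one; the assignment is shuffled with
 * a Fisher-Yates pass driven by rand() (slightly biased, fine for a demo).
 */
void random_groups(size_t n, size_t k, unsigned int seed, size_t *group)
{
    assert(k > 0 && group != NULL);

    /* Balanced, deterministic assignment first: 0, 1, ..., k-1, 0, 1, ... */
    for (size_t i = 0; i < n; i++)
        group[i] = i % k;

    /* Then shuffle the labels in place. */
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = group[i - 1];
        group[i - 1] = group[j];
        group[j] = tmp;
    }
}
```

Repeating the assignment with different seeds gives the repeated (bootstrap-style) variant, where the validation statistics are averaged over many random splits.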

Documentation

The library documentation is available at http://gmrandazzo.github.io/libscientific/

Usage examples

  • Sampling example on a drug dataset (Open in Colab)

  • PLS example on the Solubility Dataset (Open in Colab)

TODO

  • Implement Independent Component Analysis ICA

  • Implement PARAFAC

  • Extensive testing of some numerical analysis methods: CholeskyReduction, QR Decomposition, LU Decomposition, HouseholderReduction and so on.

  • Fix UPLS algorithm

References:

[1] P. Geladi, B.R. Kowalski. Partial least-squares regression: a tutorial. Analytica Chimica Acta, Volume 185, 1986, Pages 1-17.

[2] S. Wold, P. Geladi, K. Esbensen, J. Öhman. Multi-way principal components and PLS-analysis. Journal of Chemometrics, Volume 1, Issue 1, Pages 41–56, January 1987.

[3] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2002, Pages 881-892.

[4] B.D. Hudson, R.M. Hyde, E. Rahr, J. Wood, J. Osman. Parameter Based Methods for Compound Selection from Chemical Databases. Quantitative Structure-Activity Relationships, Volume 15, Issue 4, Pages 285–289, 1996.

[5] J. Holliday, P. Willett. Definitions of "Dissimilarity" for Dissimilarity-Based Compound Selection. Journal of Biomolecular Screening, Volume 1, Number 3, 1996, Pages 145-151.

[6] R.D. Clark, P.C. Fox. Statistical variation in progressive scrambling. J Comput Aided Mol Des. 2004 Jul-Sep;18(7-9):563-76.

[7] J.A. Westerhuis, T. Kourti, J.F. MacGregor. Analysis of multiblock and hierarchical PCA and PLS models. Journal of Chemometrics, 1998, 12, 301-321.

License

Libscientific is distributed under the GPLv3 license. For details on how the license works, please read the file "LICENSE" or go to http://www.gnu.org/licenses/gpl-3.0.en.html

Dependencies

The required dependencies to use libscientific are:

  • lapack/blas library or a Fortran compiler
  • libsqlite3
  • C compiler (gcc, or clang on OSX)
  • cmake

Install

Manual Installation

    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr/ ..
    make -j
    make test  # optional
    sudo make install
    cd ../src/python_bindings/
    sudo pip install -e .
    pytest  # optional

Compile a platform-portable Python wheel (no library installation required)

Linux/OSX

    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Release -DPORTABLE_PYTHON_PACKAGE=True ..
    make -j
    cd ../src/python_bindings/
    # OSX:
    python3 setup.py bdist_wheel --plat-name macosx-14-0-arm64
    # Linux:
    python3 setup.py bdist_wheel --plat-name manylinux1_x86_64

N.B.: run pip3 debug --verbose to get the compatible tags.

Windows with MSYS/MinGW64

    mkdir build
    cd build
    cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -DPORTABLE_PYTHON_PACKAGE=True ..
    mingw32-make -j
    cd ../src/python_bindings/
    python3 setup.py bdist_wheel --plat-name win_amd64
    python3 setup.py bdist_wheel --plat-name mingw_x86_64

N.B.: run pip3 debug --verbose to get the compatible tags.

Homebrew OSX

```
brew tap gmrandazzo/homebrew-gmr
brew install --HEAD libscientific
```

Development

If you are interested in extending the library or developing a new algorithm, please read this section to understand the logic behind the library.

The library is engineered to have a specific data structure for every model, which is stored on the heap to support dynamic memory allocation. Every data object, such as a matrix, vector, tensor, or more generally a pca/pls/upca/... model, needs to be manually allocated and deallocated using the predefined constructs “NewSOMETHING(&…);” and “DelSOMETHING(&…);”.
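The allocate/deallocate convention described above can be sketched as follows. The type `demo_vector` and the functions `NewDemoVector` and `DelDemoVector` are hypothetical stand-ins invented for this example, not libscientific's real types or signatures. They only illustrate the general shape of a constructor and destructor that receive the address of a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap-allocated object, for illustration only. */
typedef struct {
    double *data;
    size_t size;
} demo_vector;

/* Allocate the object and its storage; *v is set to NULL on failure. */
int NewDemoVector(demo_vector **v, size_t size)
{
    assert(v != NULL);
    *v = malloc(sizeof **v);
    if (*v == NULL)
        return -1;

    (*v)->data = calloc(size, sizeof *(*v)->data);  /* zero-initialized */
    if ((*v)->data == NULL && size > 0) {
        free(*v);
        *v = NULL;
        return -1;
    }
    (*v)->size = size;
    return 0;
}

/* Release everything and reset the caller's pointer to NULL. */
void DelDemoVector(demo_vector **v)
{
    if (v == NULL || *v == NULL)
        return;
    free((*v)->data);
    free(*v);
    *v = NULL;
}
```

Passing `&v` rather than `v` lets the destructor clear the caller's pointer, which makes accidental use after free easier to catch and keeps every `New...` visibly paired with a `Del...`.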

For example, pca.c contains:

  1. A data structure to store the model output (PCAMODEL). In this data structure, you have standard libscientific types such as matrices and vectors.
  2. A function that computes the PCA model (PCA(...)). This function takes an input matrix using the libscientific data type and uses it to produce and store the PCA model inside the PCAMODEL data structure.

All the multivariate analysis algorithms are implemented with this logic: reusing libscientific data types, writing a data structure for the algorithm, and writing the necessary functions to run the calculation.
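A toy version of this "model structure plus compute function" layout might look like the sketch below. Everything in it is hypothetical: `DEMOMODEL`, `NewDemoModel`, `DemoFit`, and `DelDemoModel` are names made up for this example, and the "model" is just the column means of the input. Plain `double` arrays stand in for libscientific's matrix and vector types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model structure: holds the results of the calculation. */
typedef struct {
    double *colmean;  /* one mean per input column */
    size_t ncol;
} DEMOMODEL;

int NewDemoModel(DEMOMODEL **m)
{
    assert(m != NULL);
    *m = malloc(sizeof **m);
    if (*m == NULL)
        return -1;
    (*m)->colmean = NULL;
    (*m)->ncol = 0;
    return 0;
}

/* Compute function: reads a row-major nrow x ncol matrix, fills the model. */
int DemoFit(const double *x, size_t nrow, size_t ncol, DEMOMODEL *m)
{
    assert(x != NULL && m != NULL);
    if (nrow == 0 || ncol == 0)
        return -1;

    double *mean = calloc(ncol, sizeof *mean);
    if (mean == NULL)
        return -1;
    for (size_t i = 0; i < nrow; i++)
        for (size_t j = 0; j < ncol; j++)
            mean[j] += x[i * ncol + j];
    for (size_t j = 0; j < ncol; j++)
        mean[j] /= (double)nrow;

    free(m->colmean);  /* drop results of any earlier fit */
    m->colmean = mean;
    m->ncol = ncol;
    return 0;
}

void DelDemoModel(DEMOMODEL **m)
{
    if (m == NULL || *m == NULL)
        return;
    free((*m)->colmean);
    free(*m);
    *m = NULL;
}
```

The caller owns the model for its whole lifetime: allocate it, pass it to the compute function, read the results from its fields, and release it explicitly when done.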

N.B.: If you develop or modify an algorithm, a unit test and a stress test need to be created to prove that the algorithm works and is numerically sound.

General rules for contributing

To contribute, fork the project (or, if you have already forked it, update your fork to the latest version of libscientific), make your changes, and open a Pull Request.

Here are some important requests.

Before opening a Pull Request:

  • Be sure that your code works.
  • No leaks. Run valgrind.
  • Comment your code with parameters, attributes, returns, notes, and references.
  • Test examples are necessary. Tests must prove that:
    • the algorithm works correctly
    • the algorithm does not present any memory leak

How to write a unit test?

Please first read the CMake documentation about testing with cmake and ctest.

Then write a test for the proposed algorithm and save it in the "src/tests" directory. Please also write a test in src/python_bindings/tests for the Python binding. The test should work using pytest.
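As a hedged illustration of what such a C test could look like, the sketch below checks a small trapezoid-rule integrator against values that can be verified by hand. The names `trapz`, `check`, and `run_trapz_tests` are invented for this example and are not taken from the libscientific sources. A real test would call the library function under test instead of the local `trapz`.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the function under test: trapezoid rule on sampled points. */
double trapz(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double area = 0.0;
    for (size_t i = 1; i < n; i++)
        area += 0.5 * (x[i] - x[i - 1]) * (y[i] + y[i - 1]);
    return area;
}

/* Report a mismatch on stderr; return 1 on failure, 0 on success. */
static int check(const char *name, double got, double expected, double tol)
{
    if (fabs(got - expected) <= tol)
        return 0;
    fprintf(stderr, "FAIL %s: got %g, expected %g\n", name, got, expected);
    return 1;
}

/* Returns the number of failed checks (0 means every check passed). */
int run_trapz_tests(void)
{
    const double x1[] = { 0.0, 0.5, 1.0 };
    const double y1[] = { 0.0, 0.5, 1.0 };  /* y = x   */
    const double x2[] = { 0.0, 1.0, 2.0 };
    const double y2[] = { 0.0, 1.0, 4.0 };  /* y = x^2 */
    int failures = 0;

    failures += check("line", trapz(x1, y1, 3), 0.5, 1e-12);
    failures += check("parabola", trapz(x2, y2, 3), 3.0, 1e-12);
    failures += check("single point", trapz(x1, y1, 1), 0.0, 0.0);
    return failures;
}
```

In a test executable, `main` would simply return the value of `run_trapz_tests()`: by default ctest treats an exit code of 0 as a pass and any other exit code as a failure. Running the same executable under valgrind covers the memory-leak requirement.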

Run the tests and submit the resulting output in the pull request, specifying:

  • What the algorithm does
  • What the unit tests represent and what they prove

Owner

  • Name: Giuseppe Marco Randazzo
  • Login: gmrandazzo
  • Kind: user
  • Location: Zürich and Lugano
  • Company: Endogena Therapeutics

Head of AI and Cheminformatics @ Endogena Therapeutics

JOSS Publication

libscientific: A Powerful C Library for Multivariate Analysis
Published
October 25, 2023
Volume 8, Issue 90, Page 5420
Authors
Giuseppe Marco Randazzo ORCID
Independent researcher
Editor
Mehmet Hakan Satman ORCID
Tags
chemometrics multivariate analysis

GitHub Events

Total
  • Watch event: 2
  • Delete event: 7
  • Issue comment event: 11
  • Push event: 11
  • Pull request event: 7
  • Create event: 3
Last Year
  • Watch event: 2
  • Delete event: 7
  • Issue comment event: 11
  • Push event: 11
  • Pull request event: 7
  • Create event: 3

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 284
  • Total Committers: 6
  • Avg Commits per committer: 47.333
  • Development Distribution Score (DDS): 0.173
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
gmrandazzo g****o@g****m 235
marco m****o@c****o 38
dependabot[bot] 4****] 5
Mehmet Hakan Satman m****n@g****m 4
Arfon Smith a****n 1
gmrandazzo g****o@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 7
  • Total pull requests: 17
  • Average time to close issues: 12 days
  • Average time to close pull requests: 2 months
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.82
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 14
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 months
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • gmrandazzo (4)
  • Beliavsky (1)
  • eriol (1)
  • mikeaalv (1)
Pull Request Authors
  • dependabot[bot] (20)
  • jbytecode (1)
  • arfon (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (20)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 78 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
pypi.org: libscientific

Libscientific python foreign function interface

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 78 Last month
Rankings
Dependent packages count: 10.1%
Downloads: 13.9%
Average: 15.2%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/docs.yml actions
  • JamesIves/github-pages-deploy-action v4 composite
  • actions/checkout v2 composite
.github/workflows/pylint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/pytest.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/sonarqube.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
docs/requirements.txt pypi
  • Babel ==2.12.1
  • Jinja2 ==3.1.2
  • MarkupSafe ==2.1.3
  • Pygments ==2.15.1
  • Sphinx ==5.3.0
  • alabaster ==0.7.13
  • beautifulsoup4 ==4.12.2
  • certifi ==2023.7.22
  • charset-normalizer ==3.1.0
  • clang ==14.0
  • docutils ==0.17.1
  • idna ==3.4
  • imagesize ==1.4.1
  • packaging ==23.1
  • readthedocs-sphinx-search ==0.1.1
  • requests ==2.31.0
  • snowballstemmer ==2.2.0
  • soupsieve ==2.4.1
  • sphinx-c-autodoc ==1.1.1
  • sphinx-rtd-theme ==1.1.1
  • sphinxcontrib-applehelp ==1.0.4
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.1
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
  • urllib3 ==2.0.7
src/python_bindings/pyproject.toml pypi
src/python_bindings/setup.py pypi