Scikit-Longitudinal

Scikit-Longitudinal: A Machine Learning Library for Longitudinal Classification in Python - Published in JOSS (2025)

https://github.com/simonprovost/scikit-longitudinal

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

classification longitudinal longitudinal-classification longitudinal-data longitudinal-studies machine-learning repeated-measurements scikit-learn supervised-learning

Scientific Fields

Engineering Computer Science - 80% confidence
Last synced: 4 months ago

Repository

☂️ Scikit-longitudinal (Sklong) is an open-source Python library, compliant with the Scikit-learn API, tailored to longitudinal machine learning classification tasks. It is aimed at researchers, data scientists, and analysts, and provides specialist tools for dealing with the challenges of repeated-measures data.

Basic Info
Statistics
  • Stars: 66
  • Watchers: 4
  • Forks: 3
  • Open Issues: 3
  • Releases: 6
Created almost 3 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog License Citation

README.md


Scikit-longitudinal

A specialised Python library for longitudinal data analysis built on Scikit-learn


💡 About The Project

Scikit-longitudinal (Sklong) is a machine learning library for analysing longitudinal data (currently focused on classification tasks). It offers tools and models for processing, analysing, and predicting longitudinal data, with a user-friendly interface that integrates with the Scikit-learn ecosystem.

Wait, what is longitudinal data, in layman's terms?

Longitudinal data is a "time-lapse" record of the same subject, entity, or group tracked over successive time periods, similar to checking in on patients to see how they change. For instance, doctors may monitor a patient's blood pressure, weight, and cholesterol every year for a decade to identify health trends or risk factors. Such data is more useful for predicting future outcomes than a one-time survey because it captures evolution, patterns, and potential cause-and-effect relationships over time.
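As a rough illustration of the description above, the sketch below stores yearly measurements for a single patient and computes how one measurement changed between the first and last visit. The field names and values are invented for the example and do not come from any real dataset.

```python
# Hypothetical yearly measurements for one patient (all values are invented).
visits = [
    {"year": 2015, "systolic_bp": 128, "weight_kg": 81.0},
    {"year": 2016, "systolic_bp": 131, "weight_kg": 82.5},
    {"year": 2017, "systolic_bp": 137, "weight_kg": 84.0},
]


def change_over_time(records, field):
    """Return the last recorded value of `field` minus the first one."""
    ordered = sorted(records, key=lambda record: record["year"])
    return ordered[-1][field] - ordered[0][field]


bp_change = change_over_time(visits, "systolic_bp")
print(f"Systolic blood pressure changed by {bp_change} mmHg")
```

A single survey would only provide one of these rows; the change between rows is what a longitudinal model can exploit.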



🛠️ Installation

[!NOTE] Planning to use Jupyter Notebook, Marimo, Google Colab, or JupyterLab? Head to the Getting Started section of the documentation, where we explain it all! 🎉

To install Scikit-longitudinal:

  1. ✅ Install the latest version: `pip install Scikit-longitudinal`

To install a specific version: `pip install Scikit-longitudinal==0.1.0`

[!CAUTION] Scikit-longitudinal is currently compatible with Python 3.9 only. Ensure you have this version installed before proceeding with the installation.

We understand that this is a limitation, but for the time being we are tied to it because of Deep Forest, a dependency of Scikit-longitudinal that is not compatible with Python versions greater than 3.9. Deep Forest provides the Deep Forest algorithm, which we have modified to support Lexicographical Deep Forest.
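As a quick pre-installation check, the minimal sketch below tests whether the running interpreter matches the supported version stated above. The helper name `is_supported_python` is hypothetical and is not part of Scikit-longitudinal.

```python
import sys

# Supported major/minor version, taken from the compatibility note above.
SUPPORTED_PYTHON = (3, 9)


def is_supported_python(version_info=None):
    """Return True when the major/minor version equals the supported one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == SUPPORTED_PYTHON


if not is_supported_python():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "Scikit-longitudinal currently targets Python 3.9."
    )
```

Running this inside the environment you plan to install into can help catch a version mismatch before `pip` reports a dependency error.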

To follow up on this discussion, please refer to this GitHub issue.

If you encounter any errors, consult the installation section in the Getting Started part of the documentation. If it still doesn't work, please open an issue on GitHub.


🚀 Getting Started

Here's how to analyse longitudinal data with Scikit-longitudinal:

```py
from sklearn.metrics import classification_report

from scikit_longitudinal.data_preparation import LongitudinalDataset
from scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting import LexicoGradientBoostingClassifier

# Note: this is a fictional dataset. Use your own!
dataset = LongitudinalDataset('./stroke.csv')
dataset.load_data_target_train_test_split(
    target_column="class_stroke_wave_4",
)

# Pre-set or manually set your temporal dependencies
dataset.setup_features_group(input_data="elsa")

model = LexicoGradientBoostingClassifier(
    features_group=dataset.feature_groups(),
    threshold_gain=0.00015  # Refer to the API for more hyper-parameters and their meaning
)

model.fit(dataset.X_train, dataset.y_train)
y_pred = model.predict(dataset.X_test)

# Classification report on the held-out test split
print(classification_report(dataset.y_test, y_pred))
```
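The `features_group` argument in the example carries the temporal structure, that is, which columns hold the same attribute measured at different waves. As a library-independent sketch, the helper below groups column positions by attribute, assuming a hypothetical `<attribute>_w<wave>` column-naming scheme. The helper name, the naming scheme, and the list-of-index-lists output are illustrative assumptions; consult the Scikit-longitudinal API documentation for the format it actually expects.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<attribute>_w<wave number>", e.g. "smoke_w2".
WAVE_PATTERN = re.compile(r"^(?P<attribute>.+)_w(?P<wave>\d+)$")


def group_columns_by_attribute(columns):
    """Return one list of column indices per attribute, oldest wave first."""
    waves = defaultdict(list)
    for index, name in enumerate(columns):
        match = WAVE_PATTERN.match(name)
        if match:  # columns without a wave suffix (e.g. "age") are skipped
            waves[match["attribute"]].append((int(match["wave"]), index))
    return [[index for _, index in sorted(pairs)] for pairs in waves.values()]


columns = ["age", "smoke_w2", "smoke_w1", "cholesterol_w1", "cholesterol_w2"]
print(group_columns_by_attribute(columns))
```

Sorting by wave number rather than by column position keeps each group in chronological order even when the columns of the file are not.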


📝 How to Cite

If you use Sklong in your research, please cite our paper:

    @article{Provost2025,
      doi = {10.21105/joss.08481},
      url = {https://doi.org/10.21105/joss.08481},
      year = {2025},
      publisher = {The Open Journal},
      volume = {10},
      number = {112},
      pages = {8481},
      author = {Provost, Simon and Freitas, Alex A.},
      title = {Scikit-Longitudinal: A Machine Learning Library for Longitudinal Classification in Python},
      journal = {Journal of Open Source Software}
    }


🔐 License

Scikit-longitudinal is licensed under the MIT License.

Owner

  • Name: Provost Simon
  • Login: simonprovost
  • Kind: user
  • Location: London
  • Company: @UniversityOfKent

Incoming Visiting Researcher @ NYU | TANDON 🇺🇸 –– Ph.D student @ University of Kent 🎓

JOSS Publication

Scikit-Longitudinal: A Machine Learning Library for Longitudinal Classification in Python
Published
August 01, 2025
Volume 10, Issue 112, Page 8481
Authors
Simon Provost ORCID
School of Computing, University of Kent, Canterbury, United Kingdom
Alex A. Freitas ORCID
School of Computing, University of Kent, Canterbury, United Kingdom
Editor
Evan Spotte-Smith ORCID
Tags
machine learning longitudinal data classification Scikit-learn

GitHub Events

Total
  • Create event: 7
  • Commit comment event: 2
  • Issues event: 19
  • Release event: 2
  • Watch event: 35
  • Delete event: 7
  • Issue comment event: 43
  • Push event: 50
  • Pull request event: 13
  • Fork event: 2
Last Year
  • Create event: 7
  • Commit comment event: 2
  • Issues event: 19
  • Release event: 2
  • Watch event: 35
  • Delete event: 7
  • Issue comment event: 43
  • Push event: 50
  • Pull request event: 13
  • Fork event: 2

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 174
  • Total Committers: 2
  • Avg Commits per committer: 87.0
  • Development Distribution Score (DDS): 0.098
Past Year
  • Commits: 52
  • Committers: 1
  • Avg Commits per committer: 52.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Provost Simon | s****t@e****u | 157 |
| sgp28 | s****v@g****n | 17 |
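The Development Distribution Score (DDS) reported above is commonly computed as one minus the share of commits made by the most active committer, so values near 0 indicate a single dominant contributor. The sketch below assumes that definition, which is not stated on this page, and applies it to the commit counts listed above.

```python
def development_distribution_score(commits_per_committer):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(commits_per_committer)
    if total == 0:
        raise ValueError("at least one commit is required")
    return 1 - max(commits_per_committer) / total


all_time = development_distribution_score([157, 17])  # all-time commit counts
past_year = development_distribution_score([52])      # single committer
print(round(all_time, 3), past_year)
```

Under this assumption the all-time counts give roughly 0.098 and a single committer gives 0.0, which agrees with the figures reported in this section.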

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 27
  • Total pull requests: 40
  • Average time to close issues: 23 days
  • Average time to close pull requests: about 11 hours
  • Total issue authors: 7
  • Total pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.1
  • Merged pull requests: 36
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 13
  • Pull requests: 10
  • Average time to close issues: 4 days
  • Average time to close pull requests: 43 minutes
  • Issue authors: 7
  • Pull request authors: 1
  • Average comments per issue: 3.62
  • Average comments per pull request: 0.2
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • simonprovost (15)
  • blengerich (4)
  • TahiriNadia (3)
  • klemenpe (2)
  • orduek (1)
  • edqzhang (1)
  • SteffKL (1)
Pull Request Authors
  • simonprovost (40)
Top Labels
Issue Labels
Classification (6) Preprocessing (3) enhancement (2) Preparation (2) documentation (1) help wanted (1)
Pull Request Labels
documentation (8) enhancement (8) Classification (4) bug (4) Preprocessing (3) Preparation (3) tests (3) migration (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 330 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 16
  • Total maintainers: 1
pypi.org: scikit-longitudinal

Scikit-longitudinal is an open-source Python library for longitudinal data analysis, building on Scikit-learn's foundation with tools tailored for repeated measures data.

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 215 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.5%
Dependent repos count: 60.3%
Maintainers (1)
Last synced: 4 months ago
pypi.org: scikit-lexicographical-trees

A set of python modules for machine learning and data mining

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 115 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.5%
Dependent repos count: 60.3%
Maintainers (1)
Last synced: 4 months ago