Recent Releases of Scikit-Longitudinal
Scikit-Longitudinal - Sklong Official Paper Is Out! 0.1.0 is now out there
Hi folks!
We are pleased to announce that Scikit-Longitudinal is now available in its first beta release under the tag 0.1.0.
[!NOTE] Why `0.1.0` now? With the publication of our paper in the Journal of Open Source Software (JOSS) today, we're marking a major step forward! This transitions us from alpha versions to beta, reflecting the library's maturity after rigorous open peer review and enhancements. Bear with us on the version jump: we've incorporated key updates since `0.0.7`, including two minor releases (`0.0.8` and `0.0.9`) for pre-approval tweaks.
In a nutshell, what's Scikit-Longitudinal?
Scikit-Longitudinal (also abbreviated as Sklong) is built on @scikit-learn (by the @scikit-learn team and contributors; thanks to @GaelVaroquaux for years of inspiration around the Sklearn ecosystem, now with @probabl-ai) and @Scikit-Tree (by the @neurodata team, including @adam2392 and others), while also drawing inspiration from longitudinal research by top-notch researchers such as Dr. Caio Ribeiro (@caioedurib), Dr. Tossapol Pomsuwan (@mastervii), Dr. Sergey Ovchinnik (@SergeyOvchinnik), and Dr. Fernando Otero (@febo). But really, what does it do?
Scikit-Longitudinal is an open-source Python package that improves machine learning for longitudinal data classification while integrating seamlessly with the Scikit-learn environment. Longitudinal data, which consists of repeated measurements of variables at different time intervals (known as waves), is widely used in sectors such as health and social sciences. Unlike ordinary tabular datasets, longitudinal data has temporal linkages that require specialised processing. In practice, people frequently either rely naively on the last wave alone (not entirely wrong, but it discards the history) or flatten everything incorrectly.
Sklong addresses this all-too-common workflow with a novel set of tools, including:
- Data Preparation: Utilities like `LongitudinalDataset` for loading and structuring data, defining temporal feature groups, and more.
- Data Transformation: Methods to handle the temporal aspect, either by flattening the data into a static representation (e.g., `MerWavTimeMinus` or `SepWav`) for standard ML or preserving the temporal structure (e.g., `MerWavTimePlus`) for use in longitudinal-aware steps.
- Preprocessing: Longitudinal-data-aware feature selection, such as `CFSPerGroup`, leveraging temporal information.
- Estimators: Specialised algorithm-adaptation-based classifiers like `LexicoRandomForestClassifier`, `LexicoDeepForestClassifier`, and `NestedTreesClassifier`, which exploit the temporal structure to potentially enhance performance.
In total, the library implements (as of 0.1.0) 1 data preparation method, 4 data transformation methods, 1 preprocessing method, and 6 estimators, 2 of which (`LexicoRandomForestClassifier` and `NestedTreesClassifier`) are standalone methods published in the literature. Sklong emphasises highly-typed, Pythonic code with substantial test coverage (over 88%) and comprehensive documentation (over 72%).
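To make the idea of waves and temporal feature groups concrete, here is a minimal, library-free sketch. It does not use Sklong's actual API: the column names, the `features_group` layout, and both helper functions are hypothetical, chosen only to contrast the "last wave only" shortcut with a view that keeps every wave.

```python
# Illustrative only: plain Python, not Sklong's API. All names are hypothetical.

# Toy longitudinal table: one row per person, three waves of two variables,
# plus one static (non-longitudinal) feature.
columns = ["smoke_w1", "smoke_w2", "smoke_w3", "bmi_w1", "bmi_w2", "bmi_w3", "age"]
rows = [
    [0, 0, 1, 22.0, 23.5, 27.0, 61],
    [1, 1, 1, 30.1, 29.8, 29.5, 58],
]

# One inner list per variable, holding column indices from oldest to newest wave.
features_group = [[0, 1, 2], [3, 4, 5]]
non_longitudinal = [6]


def last_wave_only(row, features_group, non_longitudinal):
    """Naive baseline: keep the most recent wave of each variable."""
    latest = [row[group[-1]] for group in features_group]
    return latest + [row[i] for i in non_longitudinal]


def per_wave_views(row, features_group, non_longitudinal):
    """Keep history: one view per wave, each carrying the static features."""
    n_waves = len(features_group[0])  # assumes every group has the same number of waves
    static = [row[i] for i in non_longitudinal]
    return [[row[group[w]] for group in features_group] + static for w in range(n_waves)]


first = rows[0]
print(last_wave_only(first, features_group, non_longitudinal))
print(per_wave_views(first, features_group, non_longitudinal))
```

The last-wave view of the first person shows a smoker with a BMI of 27.0, while the per-wave views reveal that both values changed over time, which is the kind of information a longitudinal-aware method can exploit.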
Not Enough?
The scientific paper (published by @openjournals) is available at: https://joss.theoj.org/papers/10.21105/joss.08481
More is coming, too: explore our GitHub issues, read through our README, and check our documentation! We've also added a podcast explanation in the docs for a quick audio overview.
Open-Source Contribution, More Than Welcome!
We hope this motivates you to contribute your own estimators, preprocessors, data transformation techniques, and more! If we could achieve 1% of what @scikit-learn did 10 years ago (back in France) for the global machine learning community, it'd be just insane!
So please share your suggestions! Without external input, how can we ensure we're advancing longitudinal ML workflows? New primitives from external contributors are welcome: simply open an issue to discuss.
For full transparency, the JOSS peer review process is publicly available at https://github.com/openjournals/joss-reviews/issues/8481. Reviewers @TahiriNadia and @blengerich provided invaluable feedback, leading to significant refinements in documentation, examples, and overall user-friendliness. A huge thanks to them!
At the moment, we have an ongoing external contribution from a French team led by the great @MathiasValla: they are going to contribute a new longitudinal-data-aware classifier, dubbed Time-penalised Trees (TpT). Read the theoretical aspect at: https://link.springer.com/article/10.1007/s10472-024-09950-w.
I guess it's now time for the tech-ish changelog!
PyPI: https://pypi.org/project/scikit-longitudinal/
[v0.1.0] - 2025-08-02 - JOSS Paper Publication and Beta Transition
Added
- JOSS paper integration: Added links, badges, and references to the published paper (DOI: 10.21105/joss.08481); core commit for release.
- Tutorials section in documentation (#63).
- Troubleshooting installation section (commit on Jun 27).
- Podcast explanation of Sklong in docs (commit on Jul 30).
- Glightbox dependency for MkDocs (#63).
- Document-dates plugin for docs (#55).
- Blurry tabs styling for docs (#55).
Enhanced
- Documentation overhaul: New-style home page, improved wording, import examples, and docstrings (#55, #63).
- Emphasized version compatibility in docs (commit on Jun 27).
- Adapted temporal dependency links (#63).
- Updated GitHub stars count in docs (commit on Jul 30).
- Updated development requirements (e.g., for RDT) and uv.lock (commits on Jun 26 and Jul 3).
- Improved PyPI README and links (commits on Jun 27 and Jul 22).
Resolved
- Fixed setup.py and PyPI links (commit 12 hours ago).
- Addressed JOSS pre-approval tweaks in minor releases `0.0.8` and `0.0.9` (commits on Jun 27 and Jul 22).
Note: This changelog covers advancements since 0.0.7. For prior details, see the expanded history below.
Previously in v0.0.7 and earlier
## Version 0.0.7 - 2025-01-15 - Migration to `uv` and Major Enhancements
### New Features
- **Migration to `uv`**: Successfully transitioned from PDM to `uv` for package management, enhancing workflow efficiency and build reliability. Enhanced documentation to assist users with installation and setup using `uv`.
- **Refactored Visualisations and Tutorials**: Updated tutorials and visualisations to align with the migration to `uv`. Improved the **Quick Start Guide** by providing clearer instructions and optimising the layout to enhance user experience.
- **Enhanced Estimators and Pipelines**:
  - Refactored **Lexico Gradient Boosting** for full compliance with Scikit-Learn, eliminating the previous dependency on StarBoost.
  - Improved preprocessing pipelines for ARFF file management using the powerful **liac-arff** library.
- **CI/CD Updates**: Implemented enhancements to the continuous integration and deployment pipeline, resulting in more efficient builds and improved compatibility with GitHub Actions.
### Enhanced
- **Documentation**: Fixed typos and inconsistencies in installation guides and tutorials to enhance clarity and improve user experience.
- **Usage Examples**: Enhanced examples through various methodologies to effectively illustrate the application of longitudinal machine learning techniques.
- **Compliance and Maintainability**: Implemented key enhancements to boost compliance and maintainability of tools and documentation.
### Resolved
- Resolved minor issues in installation and documentation setups, improving usability and reliability.
PyPI: https://pypi.org/project/scikit-longitudinal/0.0.7/
## [v0.0.4] - 2024-07-04 - First Public Release and Major Enhancements
### Added
- **Documentation**: Comprehensive new documentation with Material for MkDocs. This includes a detailed tutorial on understanding vectors of waves in longitudinal datasets, a contribution guide, an FAQ section, and complete API references for all estimators, preprocessors, data preparations, and the pipeline manager.
- **Docker Installation**: Added new Docker installation process.
- **Windows Support**: Windows is now supported via Docker.
- **New Classifiers/Regressors**: Introduced Lexico Deep Forest, Lexico Gradient Boosting, and Lexico Decision Tree Regressor.
- **PyPI Availability**: Scikit-Longitudinal is now available on PyPI.
- **Continuous Integration**: Integrated unit testing, documentation, and PyPI publishing within the CI pipeline.
### Improved
- **PDM Setup and Installation**: Enhanced setup and installation processes using PDM.
- **Testing Coverage**: Improved testing coverage, ensuring that nearly 90% of the library is tested.
- **Scikit-Lexicographical-Trees**: Extracted the lexicographical scikit-learn tree node splitting function into its own repository and published it to PyPI as Scikit-Lexicographical-Trees. This is now leveraged by our lexico-based estimators.
- **.env Management**: Improved management of environment variables.
- **Lexicographical Enhancements**: Integrated lexicographical enhancements of the waves vector within the variant of scikit-learn, scikit-lexicographical-trees, improving memory and time efficiency by handling algorithmic temporality directly in C++.
### To-Do
- **Docstrings Alignment**: Ensure that docstrings in the codebase align with the official documentation to avoid confusion.
- **Native Windows Compatibility**: Achieve Windows compatibility without relying on Docker (requires access to a Windows machine).
- **Future Enhancements**: Ongoing improvements and new features as they are identified.
- **Documentation examples**: Add examples to the documentation to help users understand how to use the library with Jupyter notebooks.
## [v0.0.3] - 2023-10-31 - Usability, Maintainability, and Compliance Enhancements
### Added
- Features Group Missing Waves Handling: Introduced mechanisms for gracefully handling missing waves in features groups.
- Readiness Descriptions: New readiness indicators provide detailed descriptions of temporal data management across the library.
- Auto-Sklong Compliance: The library is now compliant with Auto-Sklong standards.
- Package Management Transition: Switched from Poetry to PDM for improved package and dependency management.
- Docker Support: Linux-based Docker environment setup for streamlined installation and deployment.
- Platform Testing: Library is tested on both Mac and Linux, with Windows support nearing completion.
- Documentation: Comprehensive version 0.0.1 of the documentation is available on GitHub Pages.
- Pipeline Manager: Refactored the pipeline into a more maintainable and flexible pipeline manager.
- CFS Classes Refactoring: Separated CFS and CFS Per Group algorithms into distinct classes for better management.
### Removed
- Irrelevant Scripts: Removed scripts related to visualizations not core to the library's functionality.
- Experiments Branch: Moved all experiment-related codes to a dedicated `Experiments` branch.
## [v0.0.2] - 2023-05-17 - Enhanced Longitudinal Analysis and Parallelization Features
### Added
- Implementation and validation of three algorithms: CFS Per Group, Nested Tree, and LexicoRF.
- Parallelization enhancements where possible.
- Longitudinal dataset handler for access to non-longitudinal features, longitudinal features group, etc.
- Longitudinal pipeline for longitudinal-based algorithms that pass features group onto each step of the pipeline.
- Comprehensive documentation and extensive test coverage (>95% of the codebase).
- Git hooks and other tools for long-term project use.
- An improved version of the CFS per Group algorithm (version two) based on the paper's concept level.
- Updated README file.
## [v0.0.1] - 2023-03-27 - Initial Release
### Added
- Initial setup of the Poetry Python project with robust type-checking.
- Integration of linting tools: pylint, flake8, pre-commit, black, and isort.
- Correlation-based Feature Selection (CFS) algorithm with improved typing and testing.
- CFS per Group for Longitudinal Data: Python implementation with parallelism for better performance.
Extras
Could be of interest to: @sudehashrafi, @blakeandreou, @rushikeshburle, @MEDomics-UdeS, @mvalliere, @gsi-upm, @JTFouquier, @bunu, @dado93, and more!
Scientific Software - Peer-reviewed
- Python
Published by simonprovost 5 months ago
Scikit-Longitudinal - Migration to `uv` with better overall compliance
Version 0.0.7 - 2025-01-15 - Migration to uv and Major Enhancements
New Features
- Migration to `uv`: Successfully transitioned from PDM to `uv` for package management, enhancing workflow efficiency and build reliability. Enhanced documentation to assist users with installation and setup using `uv`.
- Refactored Visualisations and Tutorials: Updated tutorials and visualisations to align with the migration to `uv`. Improved the Quick Start Guide by providing clearer instructions and optimising the layout to enhance user experience.
- Enhanced Estimators and Pipelines:
  - Refactored Lexico Gradient Boosting for full compliance with Scikit-Learn, eliminating the previous dependency on StarBoost.
  - Improved preprocessing pipelines for ARFF file management using the powerful liac-arff library.
- CI/CD Updates: Implemented enhancements to the continuous integration and deployment pipeline, resulting in more efficient builds and improved compatibility with GitHub Actions.
Enhanced
- Documentation: Fixed typos and inconsistencies in installation guides and tutorials to enhance clarity and improve user experience.
- Usage Examples: Enhanced examples through various methodologies to effectively illustrate the application of longitudinal machine learning techniques.
- Compliance and Maintainability: Implemented key enhancements to boost compliance and maintainability of tools and documentation.
Resolved
- Resolved minor issues in installation and documentation setups, improving usability and reliability.
Published by simonprovost 12 months ago
Scikit-Longitudinal - First Public Release on GitHub & PyPI
This release includes a number of important changes intended to improve the library's overall usability, maintainability, and compliance. We added new estimators and enhanced certain strategies for data preparation and preprocessing. In addition, we implemented everything for PyPI publishing and GitHub CI to ensure the long-term viability of Scikit-Longitudinal.
PyPI: https://pypi.org/project/Scikit-longitudinal/0.0.4/
[v0.0.4] - 2024-07-04 - First Public Release and Major Enhancements
Added
- Documentation: Comprehensive new documentation with Material for MkDocs. This includes a detailed tutorial on understanding vectors of waves in longitudinal datasets, a contribution guide, an FAQ section, and complete API references for all estimators, preprocessors, data preparations, and the pipeline manager.
- Docker Installation: Added new Docker installation process.
- Windows Support: Windows is now supported via Docker.
- New Classifiers/Regressors: Introduced Lexico Deep Forest, Lexico Gradient Boosting, and Lexico Decision Tree Regressor.
- PyPI Availability: Scikit-Longitudinal is now available on PyPI.
- Continuous Integration: Integrated unit testing, documentation, and PyPI publishing within the CI pipeline.
Improved
- PDM Setup and Installation: Enhanced setup and installation processes using PDM.
- Testing Coverage: Improved testing coverage, ensuring that nearly 90% of the library is tested.
- Scikit-Lexicographical-Trees: Extracted the lexicographical scikit-learn tree node splitting function into its own repository and published it to PyPI as Scikit-Lexicographical-Trees. This is now leveraged by our lexico-based estimators.
- .env Management: Improved management of environment variables.
- Lexicographical Enhancements: Integrated lexicographical enhancements of the waves vector within the variant of scikit-learn, scikit-lexicographical-trees, improving memory and time efficiency by handling algorithmic temporality directly in C++.
To-Do
- Docstrings Alignment: Ensure that docstrings in the codebase align with the official documentation to avoid confusion.
- Native Windows Compatibility: Achieve Windows compatibility without relying on Docker (requires access to a Windows machine).
- Future Enhancements: Ongoing improvements and new features as they are identified.
- Documentation examples: Add examples to the documentation to help users understand how to use the library with Jupyter notebooks.
Published by simonprovost over 1 year ago
Scikit-Longitudinal - Major Improvements and Compliance with AutoLD
v0.0.3
This release introduces a number of significant modifications aimed at enhancing the library's overall usability, maintainability, and compliance. We have addressed everything from features group management and AutoLD compliance to the switch from Poetry to PDM for package management and cross-compatibility!
Key Features
- Features Group Missing Waves Handling: We've introduced mechanisms to gracefully handle missing waves in features groups.
- Readiness Descriptions: New readiness indicators are available, providing detailed descriptions of how temporal information is handled across the library.
- Compliance with AutoLD: The library is now compliant with AutoLD standards, extending its interoperability.
- Package Management Transition: We've migrated from Poetry to PDM, enhancing our package and dependency management.
- Docker Support: A Linux-based Docker environment has been set up to streamline installation and deployment.
- Platform Testing: The library is tested on both Mac and Linux. Windows support is nearing completion.
- Documentation: A comprehensive version 0.0.1 of the documentation is now available on GitHub Pages.
- Pipeline Manager: The pipeline has been refactored into a more maintainable and flexible pipeline manager.
- CFS Classes Refactoring: The CFS and CFS Per Group algorithms have been separated into distinct classes for better management.
Removed or Moved
- Irrelevant Scripts: Scripts related to visualisations have been removed as they were not directly relevant to the library's core functionality.
- Experiments Branch: All experiment-related code has been moved to a dedicated `Experiments` branch.
Published by simonprovost about 2 years ago
Scikit-Longitudinal - Implementation of New Algorithms and Improved Project Robustness and Safety
v0.0.2
This release introduces several key improvements and features, including the implementation and validation of three algorithms (CFS Per Group, Nested Tree, and LexicoRF), parallelization where possible, and a longitudinal dataset handler. Additionally, the codebase is highly documented and more than 95% of it is tested.
Key Features:
- CFS Per Group, Nested Tree, and LexicoRF: Implemented and validated these algorithms.
- Parallelization: Applied parallelization for performance improvement wherever possible.
- Longitudinal Dataset Handler: Introduced a handler for easy access to non-longitudinal features, longitudinal features group, etc.
- Longitudinal Pipeline: Developed a pipeline specifically for longitudinal-based algorithms, allowing feature groups to pass onto each step of the pipeline.
- Highly Documented Code: Ensured the codebase is well-documented to facilitate understanding and maintenance.
- Extensive Testing: More than 95% of the codebase is tested.
- Hooks and More Tools: Added hooks and other tools for long-term project usage.
- Improved CFS Per Group Algorithm: Introduced a version two of the algorithm, based on the paper's concept level.
- Updated README: The README has been updated with new information.
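As a rough illustration of why a longitudinal pipeline has to hand the features group to every step, consider a step that drops columns: the indices in the group must be remapped before the next step sees them. The sketch below is plain Python with hypothetical names (`select_columns`, `run_pipeline`); it is not the library's pipeline implementation.

```python
# Illustrative only: hypothetical helpers, not Sklong's pipeline classes.

def select_columns(keep):
    """Build a step that keeps the given column indices and remaps the groups."""
    keep = sorted(keep)
    new_index = {old: new for new, old in enumerate(keep)}

    def step(rows, features_group):
        new_rows = [[row[i] for i in keep] for row in rows]
        # Re-index each group; drop waves that were removed and groups left empty.
        remapped = [[new_index[i] for i in group if i in new_index] for group in features_group]
        return new_rows, [group for group in remapped if group]

    return step


def run_pipeline(rows, features_group, steps):
    """Apply each step in order, threading the updated features group through."""
    for step in steps:
        rows, features_group = step(rows, features_group)
    return rows, features_group


rows = [[0, 0, 1, 22.0, 23.5, 27.0, 61]]
features_group = [[0, 1, 2], [3, 4, 5]]  # two variables, three waves each

# Pretend a feature-selection step kept columns 1, 2, 5 and the static column 6.
new_rows, new_groups = run_pipeline(rows, features_group, [select_columns([1, 2, 5, 6])])
print(new_rows)
print(new_groups)
```

Without the remapping, a later longitudinal-aware step would read the original indices against the reduced table and pick up the wrong columns.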
Published by simonprovost over 2 years ago
Scikit-Longitudinal - Project Setup and First Estimator
v0.0.1 - Initial Release
This release marks the initial setup of the Poetry Python project for Scikit-Longitudinal with a first estimator, featuring robust type-checking and an array of linting tools, including pylint, flake8, pre-commit, black, and isort.
Key Features:
- Project setup with a first estimator.
- Highly typed Python code to ensure code quality and maintainability.
- Comprehensive linting tools (pylint, flake8, pre-commit, black, isort) integrated into the project to enforce coding standards and consistency.
Estimators:
- Correlation-based Feature Selection (CFS) algorithm: This release includes a refined version of an open-source CFS algorithm, featuring improved typing, testing, runtime optimisation and brand new search algorithms.
- CFS per Group for Longitudinal Data: This release also introduces a Python implementation of a previously Java-based open-source CFS per Group algorithm, tailored for longitudinal data. The Python implementation is now enhanced with parallelism for better performance, along with tests and thorough type annotations.
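For readers unfamiliar with CFS, the standard merit heuristic from the CFS literature scores a subset of k features as k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below computes that score in plain Python; the function name and the way correlations are passed in are assumptions made for illustration, and it does not reproduce the library's search algorithms or its per-group variant.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the standard CFS heuristic.

    feature_class_corr: one correlation per selected feature (feature vs. class).
    feature_feature_corr: one correlation per pair of selected features.
    Both argument names are hypothetical, chosen for this illustration.
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("at least one feature is required")
    mean_cf = sum(abs(r) for r in feature_class_corr) / k
    # With a single feature there are no pairs, so redundancy is zero.
    if feature_feature_corr:
        mean_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        mean_ff = 0.0
    return (k * mean_cf) / math.sqrt(k + k * (k - 1) * mean_ff)


# Two equally predictive features: uncorrelated vs. perfectly redundant.
print(cfs_merit([0.5, 0.5], [0.0]))
print(cfs_merit([0.5, 0.5], [1.0]))
```

The redundant pair scores no better than a single feature with the same class correlation, which is the behaviour that makes the heuristic favour subsets that are predictive but not mutually correlated.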
We hope you enjoy using this first release and look forward to your feedback and contributions! Cheers!
Published by simonprovost almost 3 years ago