skglm

Fast and modular sklearn replacement for generalized linear models

https://github.com/scikit-learn-contrib/skglm

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 1 of 18 committers (5.6%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.1%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: scikit-learn-contrib
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage: http://contrib.scikit-learn.org/skglm
Size: 56.7 MB

Statistics

Stars: 183
Watchers: 8
Forks: 37
Open Issues: 44
Releases: 0

Created about 4 years ago · Last pushed 11 months ago

Metadata Files

Readme License Code of conduct Citation

## A fast ⚡ and modular ⚒️ scikit-learn replacement for sparse GLMs ![build](https://github.com/scikit-learn-contrib/skglm/workflows/pytest/badge.svg) ![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg) [![Downloads](https://static.pepy.tech/badge/skglm)](https://pepy.tech/project/skglm) [![Downloads](https://static.pepy.tech/badge/skglm/month)](https://pepy.tech/project/skglm) [![PyPI version](https://badge.fury.io/py/skglm.svg)](https://pypi.org/project/skglm/)

skglm is a Python package that offers fast estimators for sparse Generalized Linear Models (GLMs) that are 100% compatible with scikit-learn. It is highly flexible and supports a wide range of GLMs. You get to choose from skglm's already-made estimators or customize your own by combining the available datafits and penalties.

Want a tour? Head to the skglm documentation at https://contrib.scikit-learn.org/skglm/.

Cite

skglm is the result of sustained research. It is licensed under BSD 3-Clause. You are free to use it; if you do, please cite:

```bibtex
@inproceedings{skglm,
    title     = {Beyond L1: Faster and better sparse models with skglm},
    author    = {Q. Bertrand and Q. Klopfenstein and P.-A. Bannier and G. Gidel and M. Massias},
    booktitle = {NeurIPS},
    year      = {2022},
}

@article{moufad2023skglm,
    title  = {skglm: improving scikit-learn for regularized Generalized Linear Models},
    author = {Moufad, Badr and Bannier, Pierre-Antoine and Bertrand, Quentin and Klopfenstein, Quentin and Massias, Mathurin},
    year   = {2023}
}
```

Why `skglm`?

skglm is designed specifically to solve sparse GLMs. It supports many models that are missing from scikit-learn and delivers high performance. Reasons to choose skglm include:

| | |
| ----- | -------------- |
| Speed | Fast solvers able to tackle large datasets, either dense or sparse, with millions of features up to 100 times faster than scikit-learn |
| Modularity | User-friendly API that enables composing custom estimators with any combination of its existing datafits and penalties |
| Extensibility | Flexible design that makes it simple and easy to implement new datafits and penalties, a matter of few lines of code |
| Compatibility | Estimators fully compatible with the scikit-learn API and drop-in replacements of its GLM estimators |

Get started with `skglm`

Installing `skglm`

skglm is available on PyPI. Run the following command to get the latest version of the package:

```shell
pip install -U skglm
```

It is also available on conda-forge and can be installed using, for instance:

```shell
conda install -c conda-forge skglm
```

First steps with `skglm`

Once you have installed skglm, you can run the following code snippet to fit an MCP regression model on a toy dataset:

```python
# import model to fit
from skglm.estimators import MCPRegression
# import util to create a toy dataset
from skglm.utils.data import make_correlated_data

# generate a toy dataset
X, y, _ = make_correlated_data(n_samples=10, n_features=100)

# init and fit estimator
estimator = MCPRegression()
estimator.fit(X, y)

# print R²
print(estimator.score(X, y))
```

You can refer to the documentation to explore the list of skglm's already-made estimators.

Didn't find one that suits you? You can still compose your own. Here is a code snippet that fits an MCP-regularized problem with Huber loss:

```python
# import datafit, penalty and GLM estimator
from skglm.datafits import Huber
from skglm.penalties import MCPenalty
from skglm.estimators import GeneralizedLinearEstimator

from skglm.utils.data import make_correlated_data
from skglm.solvers import AndersonCD

X, y, _ = make_correlated_data(n_samples=10, n_features=100)

# create and fit GLM estimator with Huber loss and MCP penalty
estimator = GeneralizedLinearEstimator(
    datafit=Huber(delta=1.),
    penalty=MCPenalty(alpha=1e-2, gamma=3),
    solver=AndersonCD()
)
estimator.fit(X, y)
```
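To make the objective composed above more concrete, here is a minimal pure-Python sketch of its two textbook ingredients, the Huber loss and the MCP penalty. The function names `huber`, `mcp` and `objective` are illustrative and not part of skglm. The sketch assumes the objective is the mean Huber loss of the residuals plus the MCP penalty summed over the coefficients; skglm's own compiled implementations may use different scaling conventions.

```python
def huber(r, delta=1.0):
    """Huber loss of one residual r: quadratic near 0, linear in the tails."""
    if abs(r) <= delta:
        return 0.5 * r ** 2
    return delta * abs(r) - 0.5 * delta ** 2


def mcp(w, alpha=1e-2, gamma=3.0):
    """MCP penalty of one coefficient w: L1-like near 0, flat beyond gamma * alpha."""
    if abs(w) <= gamma * alpha:
        return alpha * abs(w) - w ** 2 / (2 * gamma)
    return 0.5 * gamma * alpha ** 2


def objective(X, y, w, delta=1.0, alpha=1e-2, gamma=3.0):
    """Assumed objective: mean Huber loss of y - Xw plus MCP summed over w."""
    residuals = [
        y_i - sum(x_ij * w_j for x_ij, w_j in zip(x_i, w))
        for x_i, y_i in zip(X, y)
    ]
    datafit = sum(huber(r, delta) for r in residuals) / len(y)
    penalty = sum(mcp(w_j, alpha, gamma) for w_j in w)
    return datafit + penalty


# tiny hand-made example: 2 samples, 2 features, all-zero coefficients
X = [[1.0, 0.0], [0.0, 1.0]]
y = [0.5, 3.0]
w = [0.0, 0.0]
print(objective(X, y, w))
```

Because the MCP penalty is constant once a coefficient exceeds `gamma * alpha` in absolute value, large coefficients are not penalized further, which is the main difference from an L1 penalty.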

You will find a detailed description of the supported datafits and penalties, and how to combine them, in the API section of the documentation. You can also follow our tutorial to learn how to create your own datafit and penalty.
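As a rough illustration of what defining a penalty involves, the sketch below assumes a penalty boils down to two ingredients that proximal and coordinate-descent solvers typically need: its value and its proximal operator. The class `L1Sketch` and its method names are hypothetical and do not reproduce skglm's actual interface, which the tutorial describes.

```python
class L1Sketch:
    """Illustrative L1 penalty alpha * |w|; not skglm's actual penalty interface."""

    def __init__(self, alpha):
        self.alpha = alpha

    def value(self, w):
        """Penalty value summed over all coefficients."""
        return self.alpha * sum(abs(w_j) for w_j in w)

    def prox(self, x, stepsize):
        """Proximal operator of stepsize * alpha * |.| at x (soft-thresholding)."""
        threshold = stepsize * self.alpha
        if x > threshold:
            return x - threshold
        if x < -threshold:
            return x + threshold
        return 0.0


penalty = L1Sketch(alpha=0.5)
print(penalty.value([1.0, -2.0, 0.0]))  # 0.5 * (1 + 2 + 0)
print(penalty.prox(2.0, stepsize=1.0))  # shrunk towards zero by the threshold
print(penalty.prox(0.3, stepsize=1.0))  # inputs below the threshold map to zero
```

The proximal operator is what produces exact zeros in the coefficients, which is why sparse solvers are usually built around it.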

Contribute to `skglm`

skglm is an ongoing effort that relies on community contributions to last and evolve. Your contribution is welcome and highly valuable. It can be a:

- bug report: you may encounter a bug while using skglm. Don't hesitate to report it in the issue section.
- feature request: you may want to extend skglm or add new features to it. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and we will get back to you as soon as possible.

Useful links

link to documentation: https://contrib.scikit-learn.org/skglm/
link to skglm arXiv article: https://arxiv.org/pdf/2204.07826.pdf

Owner

Name: scikit-learn-contrib
Login: scikit-learn-contrib
Kind: organization

Website: http://contrib.scikit-learn.org
Repositories: 27
Profile: https://github.com/scikit-learn-contrib

scikit-learn compatible projects

Citation (CITATION.bib)

@inproceedings{skglm,
    title     = {Beyond L1: Faster and better sparse models with skglm},
    author    = {Q. Bertrand and Q. Klopfenstein and P.-A. Bannier and G. Gidel and M. Massias},
    booktitle = {NeurIPS},
    year      = {2022},
}

GitHub Events

Total

Commit comment event: 2
Issues event: 17
Watch event: 21
Issue comment event: 82
Push event: 42
Pull request review comment event: 101
Pull request review event: 77
Pull request event: 63
Fork event: 6

Last Year

Commit comment event: 2
Issues event: 17
Watch event: 21
Issue comment event: 82
Push event: 42
Pull request review comment event: 101
Pull request review event: 77
Pull request event: 63
Fork event: 6

Committers

Last synced: about 1 year ago

All Time

Total Commits: 156
Total Committers: 18
Avg Commits per committer: 8.667
Development Distribution Score (DDS): 0.609
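
The all-time figures can be cross-checked with a little arithmetic. The sketch below assumes the Development Distribution Score is defined as one minus the share of commits made by the most active committer. That definition is an assumption and is not stated on this page, though it is consistent with the numbers shown here (156 total commits, 18 committers, and 61 commits for the first entry under Top Committers).

```python
total_commits = 156
total_committers = 18
top_committer_commits = 61  # largest entry in the Top Committers table

avg_commits = total_commits / total_committers
# assumed definition: 1 - (commits by top committer / total commits)
dds = 1 - top_committer_commits / total_commits

print(round(avg_commits, 3))
print(round(dds, 3))
```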

Past Year

Commits: 28
Committers: 10
Avg Commits per committer: 2.8
Development Distribution Score (DDS): 0.571

Top Committers

Name	Email	Commits
Badr MOUFAD	6****D	61
mathurinm	m****m	50
PAB	p**r@g**m	18
Quentin Bertrand	q**d@m**c	7
floko	f**i@p**u	4
Johan Larsson	1****s	3
Pascal Carrivain	3****n	2
AnavAgrawal	7****l	1
Boris Pfahringer	b**d@g**m	1
En LAI	1****1	1
Julien Jerphanion	g**t@j**z	1
Klopfe	4****e	1
Ram Rachum	r**m@r**m	1
SujayP	c**t@g**m	1
Titouan Vayer	t**r@g**m	1
Tomasz Kacprzak	t**k@p**e	1
Wassim MAZOUZ	1****z	1
jasperlamm	1****m	1

Committer Domains (Top 20 + Academic)

pm.me: 1
rachum.com: 1
jjerphan.xyz: 1
polytechnique.edu: 1
mila.quebec: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 52
Total pull requests: 130
Average time to close issues: 5 months
Average time to close pull requests: about 1 month
Total issue authors: 13
Total pull request authors: 12
Average comments per issue: 1.12
Average comments per pull request: 2.38
Merged pull requests: 85
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 12
Pull requests: 61
Average time to close issues: about 2 months
Average time to close pull requests: 4 days
Issue authors: 6
Pull request authors: 7
Average comments per issue: 1.25
Average comments per pull request: 1.21
Merged pull requests: 36
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

mathurinm (38)
Badr-MOUFAD (8)
QB3 (6)
PABannier (6)
carlosg-m (3)
floriankozikowski (3)
tpanum (2)
hermanhmchan (2)
jonpedros (2)
jolars (2)
Tianbo-Diao (1)
s-banach (1)
sujay-pandit (1)
ktang16 (1)
glemaitre (1)

Pull Request Authors

mathurinm (44)
Badr-MOUFAD (39)
floriankozikowski (27)
PABannier (16)
QB3 (7)
jolars (6)
PascalCarrivain (4)
Perceptronium (3)
hoodaty (2)
EnLAI111 (1)
sujay-pandit (1)
Klopfe (1)
tomaszkacprzak (1)
tvayer (1)
wassimmazouz (1)

Top Labels

Issue Labels

good first issue (4) enhancement (3) documentation (1) feature request (1) bug (1)

Pull Request Labels

Ready for review (14) Work In Progress (4)

Packages

Total packages: 1
Total downloads:
- pypi 3,208 last-month
Total docker downloads: 58

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 6
Total maintainers: 1

pypi.org: skglm

A fast and modular scikit-learn replacement for generalized linear models

Homepage: https://contrib.scikit-learn.org/skglm
Documentation: https://skglm.readthedocs.io/
License: BSD (3-Clause)
Latest release: 0.3.1
published over 2 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 3,208 Last month
Docker Downloads: 58

Rankings

Docker downloads count: 3.2%

Downloads: 3.6%

Average: 9.6%

Dependent packages count: 10.1%

Dependent repos count: 21.6%

Maintainers (1)

mathurinm

Last synced: 10 months ago

Dependencies

.github/workflows/circleci.yml actions

larsoner/circleci-artifacts-redirector-action master composite

.github/workflows/flake8.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/main.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite

doc/doc-requirements.txt pypi

benchopt *
furo *
libsvmdata >=0.2
matplotlib >=2.0.0
myst_parser *
numpydoc *
pillow *
pytest *
sphinx-bootstrap-theme *
sphinx-gallery *
sphinx_copybutton *

setup.py pypi

numba *
numpy >=1.12
scikit-learn >=1.0
scipy >=0.18.0
