imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

https://github.com/scikit-learn-contrib/imbalanced-learn

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 87 committers (4.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Keywords

data-analysis data-science machine-learning python statistics

Keywords from Contributors

data-mining distributed alignment flexible tensors transformers parallel autograd gbrt gbm
Last synced: 6 months ago

Repository

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

Basic Info
  • Host: GitHub
  • Owner: scikit-learn-contrib
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage: https://imbalanced-learn.org
  • Size: 21.8 MB
Statistics
  • Stars: 7,036
  • Watchers: 141
  • Forks: 1,313
  • Open Issues: 55
  • Releases: 40
Topics
data-analysis data-science machine-learning python statistics
Created over 11 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Authors

README.rst

.. -*- mode: rst -*-

.. _scikit-learn: http://scikit-learn.org/stable/

.. _scikit-learn-contrib: https://github.com/scikit-learn-contrib

|GitHubActions|_ |Codecov|_ |CircleCI|_ |PythonVersion|_ |Pypi|_ |Gitter|_ |Black|_

.. |GitHubActions| image:: https://github.com/scikit-learn-contrib/imbalanced-learn/actions/workflows/tests.yml/badge.svg
.. _GitHubActions: https://github.com/scikit-learn-contrib/imbalanced-learn/actions/workflows/tests.yml

.. |Codecov| image:: https://codecov.io/gh/scikit-learn-contrib/imbalanced-learn/branch/master/graph/badge.svg
.. _Codecov: https://codecov.io/gh/scikit-learn-contrib/imbalanced-learn

.. |CircleCI| image:: https://circleci.com/gh/scikit-learn-contrib/imbalanced-learn.svg?style=shield
.. _CircleCI: https://circleci.com/gh/scikit-learn-contrib/imbalanced-learn/tree/master

.. |PythonVersion| image:: https://img.shields.io/pypi/pyversions/imbalanced-learn.svg
.. _PythonVersion: https://img.shields.io/pypi/pyversions/imbalanced-learn.svg

.. |Pypi| image:: https://badge.fury.io/py/imbalanced-learn.svg
.. _Pypi: https://badge.fury.io/py/imbalanced-learn

.. |Gitter| image:: https://badges.gitter.im/scikit-learn-contrib/imbalanced-learn.svg
.. _Gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
.. _Black: https://github.com/psf/black

.. |PythonMinVersion| replace:: 3.10
.. |NumPyMinVersion| replace:: 1.25.2
.. |SciPyMinVersion| replace:: 1.11.4
.. |ScikitLearnMinVersion| replace:: 1.4.2
.. |MatplotlibMinVersion| replace:: 3.7.3
.. |PandasMinVersion| replace:: 2.0.3
.. |TensorflowMinVersion| replace:: 2.16.1
.. |KerasMinVersion| replace:: 3.3.3
.. |SeabornMinVersion| replace:: 0.12.2
.. |PytestMinVersion| replace:: 7.2.2

imbalanced-learn
================

imbalanced-learn is a Python package offering a number of re-sampling techniques
commonly used in datasets showing strong between-class imbalance.
It is compatible with scikit-learn_ and is part of the scikit-learn-contrib_
projects.

Documentation
-------------

Installation instructions, API documentation, and examples can be found in the
documentation_.

.. _documentation: https://imbalanced-learn.org/stable/

Installation
------------

Dependencies
~~~~~~~~~~~~

`imbalanced-learn` requires the following dependencies:

- Python (>= |PythonMinVersion|)
- NumPy (>= |NumPyMinVersion|)
- SciPy (>= |SciPyMinVersion|)
- Scikit-learn (>= |ScikitLearnMinVersion|)
- Pytest (>= |PytestMinVersion|)
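
As a rough illustration, the minimum versions above could be checked against the
current environment with a small script. This is only a sketch: the helper names
`release_tuple`, `meets_minimum` and `check_dependencies` are hypothetical and not
part of imbalanced-learn, the version numbers are copied from the substitutions
defined at the top of this README, and the comparison only looks at the leading
numeric components of each version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from this README (assumed to be up to date).
MINIMUM_VERSIONS = {
    "numpy": "1.25.2",
    "scipy": "1.11.4",
    "scikit-learn": "1.4.2",
}


def release_tuple(version_string):
    """Return the leading numeric components, e.g. '1.25.2rc1' -> (1, 25, 2)."""
    parts = []
    for component in version_string.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings on their numeric release components only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '2.0' compares equal to '2.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Map each distribution name to (installed version or None, requirement met)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        print(f"{name}: installed={installed} meets minimum={ok}")
```

In practice `pip` resolves these constraints during installation, so a script like
this is mainly useful when diagnosing an existing environment.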

Additionally, `imbalanced-learn` supports the following optional dependencies:

- Pandas (>= |PandasMinVersion|) for dealing with dataframes
- Tensorflow (>= |TensorflowMinVersion|) for dealing with TensorFlow models
- Keras (>= |KerasMinVersion|) for dealing with Keras models

The examples require the following additional dependencies:

- Matplotlib (>= |MatplotlibMinVersion|)
- Seaborn (>= |SeabornMinVersion|)

Installation
~~~~~~~~~~~~

From PyPI or conda-forge repositories
.....................................

imbalanced-learn is currently available on PyPI and you can
install it via `pip`::

  pip install -U imbalanced-learn

The package is also released on the Anaconda Cloud platform::

  conda install -c conda-forge imbalanced-learn

From source available on GitHub
...............................

If you prefer, you can clone the repository and install it with `pip`. Use the
following commands to get a copy from GitHub and install all dependencies::

  git clone https://github.com/scikit-learn-contrib/imbalanced-learn.git
  cd imbalanced-learn
  pip install .

Be aware that you can install in developer mode with::

  pip install --no-build-isolation --editable .

If you wish to make pull requests on GitHub, we advise you to install
pre-commit::

  pip install pre-commit
  pre-commit install

Testing
~~~~~~~

After installation, you can use `pytest` to run the test suite::

  make coverage

Development
-----------

The development of this scikit-learn-contrib project is in line with that
of the scikit-learn community. Therefore, you can refer to the scikit-learn_
Development Guide.

Endorsement of the Scientific Python Specification
--------------------------------------------------

We endorse good practices from the Scientific Python Ecosystem Coordination (SPEC).
The full list of recommendations is available `here`_.

The recommendations that we endorse for the imbalanced-learn project are listed below.

|SPEC 0 — Minimum Supported Dependencies|

.. |SPEC 0 — Minimum Supported Dependencies| image:: https://img.shields.io/badge/SPEC-0-green?labelColor=%23004811&color=%235CA038
   :target: https://scientific-python.org/specs/spec-0000/

.. _here: https://scientific-python.org/specs/

About
-----

If you use imbalanced-learn in a scientific publication, we would appreciate
citations to the following paper::

  @article{JMLR:v18:16-365,
  author  = {Guillaume  Lema{{\^i}}tre and Fernando Nogueira and Christos K. Aridas},
  title   = {Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {17},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-365}
  }

Most classification algorithms will only perform optimally when the number of
samples of each class is roughly the same. Highly skewed datasets, where the
minority class is heavily outnumbered by one or more other classes, have proven
to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is by re-sampling the dataset so as to offset
this imbalance, with the hope of arriving at a more robust and fair decision
boundary than you would otherwise.
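
To make the idea of re-sampling concrete, here is a minimal pure-Python sketch of
random over-sampling: minority samples are duplicated at random until every class
has as many samples as the largest one. The function name `random_over_sample` and
the toy data are made up for illustration. This is not the library's
implementation; imbalanced-learn ships its own samplers (for example
`RandomOverSampler` in `imblearn.over_sampling`), which should be preferred in
practice.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen samples until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples that belong to this class.
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(indices)
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res


# Toy dataset: nine samples of class 0 and a single sample of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [5.0]]
y = [0] * 9 + [1]

X_res, y_res = random_over_sample(X, y)
print(Counter(y))      # class counts before re-sampling
print(Counter(y_res))  # class counts after re-sampling
```

After re-sampling, both classes contain nine samples. Over-sampling only repeats
existing points, so it should be applied to the training split only, never to the
data used for evaluation.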

You can refer to the `imbalanced-learn`_ documentation to find details about
the implemented algorithms.

.. _imbalanced-learn: https://imbalanced-learn.org/stable/user_guide.html

Owner

  • Name: scikit-learn-contrib
  • Login: scikit-learn-contrib
  • Kind: organization

scikit-learn compatible projects

GitHub Events

Total
  • Create event: 22
  • Release event: 2
  • Issues event: 22
  • Watch event: 209
  • Delete event: 20
  • Issue comment event: 54
  • Push event: 45
  • Pull request review event: 9
  • Pull request review comment event: 7
  • Pull request event: 52
  • Fork event: 39
Last Year
  • Create event: 22
  • Release event: 2
  • Issues event: 22
  • Watch event: 209
  • Delete event: 20
  • Issue comment event: 54
  • Push event: 45
  • Pull request review event: 9
  • Pull request review comment event: 7
  • Pull request event: 52
  • Fork event: 39

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 833
  • Total Committers: 87
  • Avg Commits per committer: 9.575
  • Development Distribution Score (DDS): 0.509
Past Year
  • Commits: 25
  • Committers: 4
  • Avg Commits per committer: 6.25
  • Development Distribution Score (DDS): 0.36
Top Committers
Name Email Commits
Guillaume Lemaitre g****8@g****m 409
Guillaume Lemaitre g****e@v****u 212
Fernando Nogueira f****a@g****m 37
chkoar c****r 33
Dayvid Victor v****o@g****m 20
Soledad Galli s****i@p****m 8
dependabot[bot] 4****] 7
Alexander L. Hayes a****r@b****t 6
Joan Massich m****k@g****m 5
Matt Eding 3****g 5
chkoar c****r@c****p 4
microsheep t****p@g****m 4
T.Thost 5****t 2
Jeff Hale d****r 2
Shihab Shahriar r****5@g****m 2
bganglia 4****a 2
Hisashi Osanai o****i 2
Francis T. O'Donovan f****n@g****m 2
Yann Bayle b****4@g****m 2
chkoar i****r@g****m 2
David Gasquez d****z@g****m 1
Chuanzhu Xu c****u@g****m 1
Christian Kastner c****k@k****t 1
Camilo Prieto c****5@h****m 1
Beard 1****d 1
Baltschun ali a****n@g****m 1
Aurélien Massiot a****t@g****m 1
Ashwin Mathur 9****l 1
Arunabh a****m@p****m 1
Sebastian Raschka m****l@s****m 1
and 57 more...
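
The all-time figures above can be reproduced with a short calculation. The sketch
below assumes that the Development Distribution Score is defined as one minus the
share of commits made by the single most active committer, and that the first row
of the table (409 commits) is counted as that committer. Neither assumption is
stated on this page, and the function names are illustrative only.

```python
def average_commits(total_commits, total_committers):
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    return 1 - top_committer_commits / total_commits


# All-time values taken from the statistics listed above.
print(round(average_commits(833, 87), 3))                  # 9.575
print(round(development_distribution_score(409, 833), 3))  # 0.509
```

Under these assumptions both results match the reported all-time values.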

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 146
  • Total pull requests: 167
  • Average time to close issues: 6 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 129
  • Total pull request authors: 48
  • Average comments per issue: 2.87
  • Average comments per pull request: 1.63
  • Merged pull requests: 111
  • Bot issues: 0
  • Bot pull requests: 26
Past Year
  • Issues: 19
  • Pull requests: 49
  • Average time to close issues: 14 days
  • Average time to close pull requests: 8 days
  • Issue authors: 19
  • Pull request authors: 9
  • Average comments per issue: 1.63
  • Average comments per pull request: 0.55
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 25
Top Authors
Issue Authors
  • glemaitre (8)
  • solegalli (6)
  • BrandonKMLee (2)
  • EssamWisam (2)
  • penguinpee (2)
  • Zethson (2)
  • celestinoxp (2)
  • wenh06 (1)
  • tomateit (1)
  • hayesall (1)
  • iki77 (1)
  • dcriado1985 (1)
  • tamargrey (1)
  • ts2095 (1)
  • stephengmatthews (1)
Pull Request Authors
  • glemaitre (76)
  • dependabot[bot] (29)
  • solegalli (13)
  • chkoar (4)
  • virchan (4)
  • Ab2nour (2)
  • neal301 (2)
  • imgremlin (2)
  • tthost (2)
  • fritshermans (2)
  • AYY7 (2)
  • ts2095 (2)
  • gmogol (2)
  • mr-c (2)
  • hayesall (2)
Top Labels
Issue Labels
Type: Question (9) Type: Bug (6) good first issue (6) Type: Enhancement (6) Status: More Info Needed (4) For: Documentation (3) Package: keras (1) Package: over_sampling (1) Type: Performance (1) easy (1) Status: Help Wanted (1) Package: under_sampling (1)
Pull Request Labels
dependencies (29) github_actions (12) No Changelog Needed (2) Status: Stalled (1)

Packages

  • Total packages: 5
  • Total downloads:
    • pypi 16,349,874 last-month
  • Total docker downloads: 14,006,826
  • Total dependent packages: 171
    (may contain duplicates)
  • Total dependent repositories: 3,053
    (may contain duplicates)
  • Total versions: 60
  • Total maintainers: 3
pypi.org: imbalanced-learn

Toolbox for imbalanced dataset in machine learning

  • Versions: 38
  • Dependent Packages: 158
  • Dependent Repositories: 2,817
  • Downloads: 16,349,874 Last month
  • Docker Downloads: 14,006,826
Rankings
Downloads: 0.1%
Dependent packages count: 0.2%
Dependent repos count: 0.2%
Stargazers count: 0.3%
Average: 0.6%
Forks count: 1.2%
Docker downloads count: 1.4%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.17: py3-imbalanced-learn

Toolbox for imbalanced dataset in machine learning

  • Versions: 1
  • Dependent Packages: 2
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Forks count: 1.4%
Stargazers count: 2.3%
Average: 4.1%
Dependent packages count: 12.7%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: imbalanced-learn

imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of scikit-learn-contrib projects.

  • Versions: 15
  • Dependent Packages: 8
  • Dependent Repositories: 118
Rankings
Dependent repos count: 3.1%
Forks count: 4.3%
Stargazers count: 4.3%
Average: 4.7%
Dependent packages count: 7.1%
Last synced: 6 months ago
spack.io: py-imbalanced-learn

imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of scikit-learn-contrib projects.

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Stargazers count: 1.7%
Forks count: 2.5%
Average: 15.4%
Dependent packages count: 57.3%
Maintainers (1)
Last synced: 6 months ago
anaconda.org: imbalanced-learn

imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of scikit-learn-contrib projects.

  • Versions: 5
  • Dependent Packages: 2
  • Dependent Repositories: 118
Rankings
Forks count: 10.0%
Stargazers count: 10.3%
Dependent repos count: 17.6%
Average: 19.7%
Dependent packages count: 41.0%
Last synced: 6 months ago

Dependencies

.github/workflows/circleci-artifacts-redirector.yml actions
  • larsoner/circleci-artifacts-redirector-action master composite