scikit-activeml

scikit-activeml: Python library for active learning on top of scikit-learn

https://github.com/scikit-activeml/scikit-activeml

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to preprints.org
✓ Committers with academic emails: 6 of 18 committers (33.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.9%) to scientific vocabulary

Keywords

active-learning machine-learning python scikit-learn

Keywords from Contributors

mesh interpretability sequences projection interactive optim hacking network-simulation

Last synced: 6 months ago

Repository

scikit-activeml: Python library for active learning on top of scikit-learn

Basic Info

Host: GitHub
Owner: scikit-activeml
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage: https://scikit-activeml.github.io
Size: 64.9 MB

Statistics

Stars: 171
Watchers: 5
Forks: 19
Open Issues: 23
Releases: 20

Topics

active-learning machine-learning python scikit-learn

Created over 5 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

README.rst

.. intro_start

|

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/scikit-activeml-logo.png
   :class: dark-light
   :align: center
   :width: 40%

|

=====================================================================
scikit-activeml: A Library and Toolbox for Active Learning Algorithms
=====================================================================
|Doc| |Codecov| |PythonVersion| |PyPi| |Black| |Downloads| |Paper|

.. |Doc| image:: https://img.shields.io/badge/docs-latest-green
   :target: https://scikit-activeml.github.io/latest/

.. |Codecov| image:: https://codecov.io/gh/scikit-activeml/scikit-activeml/branch/master/graph/badge.svg
   :target: https://app.codecov.io/gh/scikit-activeml/scikit-activeml

.. |PythonVersion| image:: https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue.svg
   :target: https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue

.. |PyPi| image:: https://badge.fury.io/py/scikit-activeml.svg
   :target: https://badge.fury.io/py/scikit-activeml

.. |Paper| image:: https://img.shields.io/badge/paper-10.20944/preprints202103.0194.v1-blue.svg
   :target: https://www.preprints.org/manuscript/202103.0194/v1

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

.. |Downloads| image:: https://static.pepy.tech/badge/scikit-activeml
   :target: https://www.pepy.tech/projects/scikit-activeml

Machine learning models often need large amounts of training data to perform well.
While unlabeled data can be gathered with relative ease, labeling is typically difficult,
time-consuming, or expensive. Active learning addresses this challenge by querying labels
for the most informative samples, enabling high performance with fewer labeled examples.
With this goal in mind, **scikit-activeml** has been developed as a Python library for active
learning on top of scikit-learn.

.. intro_end

.. user_installation_start

User Installation
=================

The easiest way to install scikit-activeml is using ``pip``:

::

    pip install -U scikit-activeml

This installation via ``pip`` includes only the minimum requirements to avoid
potential package downgrades within your installation. If you encounter any incompatibility issues,
you can install the maximum requirements,
which have been tested for the current package release:

::

    pip install -U scikit-activeml[max]
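
To confirm which release ended up in your environment, a small standard-library check can help.
The sketch below is not part of the package: the helper name ``installed_version`` is hypothetical,
and the only assumption taken from this page is the distribution name ``scikit-activeml``.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version


    def installed_version(dist_name="scikit-activeml"):
        """Return the installed version string, or None if the package is absent."""
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None


    # Prints the version string, or None when the package is not installed.
    print(installed_version())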

.. user_installation_end

.. examples_start

Examples
========

We provide a broad overview of different use cases in our tutorial section,
including:

- Pool-based Active Learning - Getting Started
- Deep Pool-based Active Learning - scikit-activeml with Skorch
- Pool-based Active Learning for Regression - Getting Started
- Pool-based Active Learning - Sample Annotating
- Pool-based Active Learning - Simple Evaluation Study
- Active Image Classification via Self-supervised Learning
- Multi-annotator Pool-based Active Learning - Getting Started
- Stream-based Active Learning - Getting Started
- Batch Stream-based Active Learning with Pool Query Strategies
- Stream-based Active Learning With River

Below are two code snippets illustrating how straightforward it is to implement active learning cycles using our Python package ``skactiveml``.

Pool-based Active Learning
--------------------------

The following snippet implements an active learning cycle with 20 iterations using a Gaussian process
classifier and uncertainty sampling. You can substitute other classifiers from ``sklearn`` or those
provided by ``skactiveml``. Note that when using active learning with ``sklearn``, unlabeled data
is represented by the value ``MISSING_LABEL`` in the label vector ``y``. Additional query strategies
are available in our documentation.

.. code-block:: python

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.datasets import make_blobs
    from skactiveml.pool import UncertaintySampling
    from skactiveml.utils import MISSING_LABEL
    from skactiveml.classifier import SklearnClassifier

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)
    y = np.full(shape=y_true.shape, fill_value=MISSING_LABEL)

    # Use the first 10 samples as initial training data.
    y[:10] = y_true[:10]

    # Create classifier and query strategy.
    clf = SklearnClassifier(
        GaussianProcessClassifier(random_state=0),
        classes=np.unique(y_true),
        random_state=0
    )
    qs = UncertaintySampling(method='entropy')

    # Execute active learning cycle.
    n_cycles = 20
    for c in range(n_cycles):
        query_idx = qs.query(X=X, y=y, clf=clf)
        y[query_idx] = y_true[query_idx]

    # Fit final classifier.
    clf.fit(X, y)

As a result, an actively trained Gaussian process classifier is obtained.
A visualization of its decision boundary (black line) along with sample utilities (greenish contours) is shown below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/pal-example-output.png
   :width: 400
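
To make the idea behind entropy-based uncertainty sampling more concrete, the following
library-independent sketch picks the unlabeled sample whose predicted class distribution has the
highest Shannon entropy. It is an illustration under stated assumptions, not the implementation used
by ``skactiveml``: the function names are hypothetical, the probabilities are made up, and ``NaN``
stands in for ``MISSING_LABEL``.

.. code-block:: python

    import math


    def entropy(probs):
        """Shannon entropy (in nats) of one class-probability vector."""
        return -sum(p * math.log(p) for p in probs if p > 0.0)


    def most_uncertain_unlabeled(probas, y):
        """Return the index of the unlabeled sample with the highest entropy.

        probas: list of class-probability vectors, one per sample.
        y: list of labels; unlabeled entries are NaN (assumed marker).
        """
        candidates = [i for i, label in enumerate(y) if math.isnan(label)]
        if not candidates:
            raise ValueError("no unlabeled samples left to query")
        return max(candidates, key=lambda i: entropy(probas[i]))


    # Hypothetical predicted probabilities for four samples and two classes.
    probas = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [1.0, 0.0]]
    # Sample 0 is already labeled; the remaining samples are unlabeled (NaN).
    y = [1.0, math.nan, math.nan, math.nan]

    query_idx = most_uncertain_unlabeled(probas, y)
    print(query_idx)  # 2: sample 0 has higher entropy but is already labeled

Restricting the search to unlabeled candidates is what keeps the loop from asking for the same
label twice.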

Stream-based Active Learning
----------------------------

The following snippet implements an active learning cycle with 200 data points and a default budget of 10%
using a Parzen window classifier and split uncertainty sampling.
Similar to the pool-based example, you can wrap classifiers from ``sklearn``, use sklearn-compatible classifiers,
or choose from the example classifiers provided by ``skactiveml``.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_blobs
    from skactiveml.classifier import ParzenWindowClassifier
    from skactiveml.stream import Split
    from skactiveml.utils import MISSING_LABEL

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)

    # Create classifier and query strategy.
    clf = ParzenWindowClassifier(random_state=0, classes=np.unique(y_true))
    qs = Split(random_state=0)

    # Initialize training data as empty lists.
    X_train = []
    y_train = []

    # Initialize a list to store prediction results.
    correct_classifications = []

    # Execute active learning cycle.
    for x_t, y_t in zip(X, y_true):
        X_cand = x_t.reshape([1, -1])
        y_cand = y_t
        clf.fit(X_train, y_train)
        correct_classifications.append(clf.predict(X_cand)[0] == y_cand)
        sampled_indices = qs.query(candidates=X_cand, clf=clf)
        qs.update(candidates=X_cand, queried_indices=sampled_indices)
        X_train.append(x_t)
        y_train.append(y_cand if len(sampled_indices) > 0 else MISSING_LABEL)

As a result, an actively trained Parzen window classifier is obtained.
A visualization of its accuracy curve across the active learning cycle is shown below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/stream-example-output.png
   :width: 400
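
The role of the budget in the stream setting can be illustrated without the library. The sketch
below is a deliberately simplified, hypothetical strategy and not the ``Split`` strategy used above:
it requests a label only if the classifier's confidence is below a fixed threshold and the fraction
of requested labels stays within the budget. The function name, the threshold, and the confidence
values are assumptions made for this example.

.. code-block:: python

    def stream_queries(confidences, budget=0.1, threshold=0.9):
        """Decide, sample by sample, whether to request a label.

        confidences: highest predicted class probability per arriving sample.
        budget: maximum fraction of seen samples whose labels may be requested.
        threshold: request a label only if the confidence is below this value.
        """
        decisions = []
        n_queried = 0
        for n_seen, confidence in enumerate(confidences, start=1):
            # Would one more request still respect the budget?
            within_budget = (n_queried + 1) / n_seen <= budget
            query = within_budget and confidence < threshold
            n_queried += int(query)
            decisions.append(query)
        return decisions


    # Hypothetical confidences for 20 arriving samples.
    confidences = [0.55, 0.95, 0.60, 0.97, 0.99] * 4
    decisions = stream_queries(confidences)
    print(sum(decisions), "of", len(decisions), "labels requested")

With a 10% budget, this strict accounting cannot request any label before the tenth sample has
arrived, which is one reason practical strategies handle the budget more flexibly.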

Query Strategy Overview
=======================
For better orientation, we provide an overview
(including paper references and visual examples)
of the query strategies implemented by ``skactiveml``.

|Overview| |Visualization|

.. |Overview| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/strategy-overview.gif
   :width: 365

.. |Visualization| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/example-overview.gif
   :width: 365

.. examples_end

.. citing_start

Citing
======
If you use ``skactiveml`` in your research projects and find it helpful, please cite the following:

::

    @article{skactiveml2021,
        title={scikit-activeml: {A} {L}ibrary and {T}oolbox for {A}ctive {L}earning {A}lgorithms},
        author={Daniel Kottke and Marek Herde and Tuan Pham Minh and Alexander Benz and Pascal Mergard and Atal Roghman and Christoph Sandrock and Bernhard Sick},
        journal={Preprints},
        doi={10.20944/preprints202103.0194.v1},
        year={2021},
        url={https://github.com/scikit-activeml/scikit-activeml}
    }

.. citing_end

Owner

Name: scikit-activeml
Login: scikit-activeml
Kind: organization

Repositories: 2
Profile: https://github.com/scikit-activeml

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kottke"
  given-names: "Daniel"
  orcid: "https://orcid.org/0000-0002-7870-6033"
- family-names: "Herde"
  given-names: "Marek"
  orcid: "https://orcid.org/0000-0003-4908-122X"
- family-names: "Minh"
  given-names: "Tuan Pham"
  orcid: "https://orcid.org/0000-0002-5102-5561"
- family-names: "Benz"
  given-names: "Alexander"
- family-names: "Lührs"
  given-names: "Lukas"
- family-names: "Mergard"
  given-names: "Pascal"
- family-names: "Roghman"
  given-names: "Atal"
- family-names: "Sandrock"
  given-names: "Christoph"
- family-names: "Sick"
  given-names: "Bernhard"
  orcid: "https://orcid.org/0000-0001-9467-656X"
title: "scikit-activeml"
version: 0.4.0
date-released: 2022-12-21
url: "https://github.com/scikit-activeml/scikit-activeml"
preferred-citation:
  type: article
  authors:
  - family-names: "Kottke"
    given-names: "Daniel"
    orcid: "https://orcid.org/0000-0002-7870-6033"
  - family-names: "Herde"
    given-names: "Marek"
    orcid: "https://orcid.org/0000-0003-4908-122X"
  - family-names: "Minh"
    given-names: "Tuan Pham"
    orcid: "https://orcid.org/0000-0002-5102-5561"
  - family-names: "Benz"
    given-names: "Alexander"
  - family-names: "Mergard"
    given-names: "Pascal"
  - family-names: "Roghman"
    given-names: "Atal"
  - family-names: "Sandrock"
    given-names: "Christoph"
  - family-names: "Sick"
    given-names: "Bernhard"
    orcid: "https://orcid.org/0000-0001-9467-656X"
  doi: "10.20944/preprints202103.0194.v1"
  journal: "Preprints"
  title: "scikit-activeml: A Library and Toolbox for Active Learning Algorithms"
  year: 2021

GitHub Events

Total

Create event: 28
Release event: 1
Issues event: 19
Watch event: 17
Delete event: 25
Issue comment event: 39
Push event: 144
Pull request review comment event: 6
Pull request review event: 27
Pull request event: 62
Fork event: 4

Last Year

Create event: 28
Release event: 1
Issues event: 19
Watch event: 17
Delete event: 25
Issue comment event: 39
Push event: 144
Pull request review comment event: 6
Pull request review event: 27
Pull request event: 62
Fork event: 4

Committers

Last synced: 9 months ago

All Time

Total Commits: 1,718
Total Committers: 18
Avg Commits per committer: 95.444
Development Distribution Score (DDS): 0.799

Past Year

Commits: 288
Committers: 5
Avg Commits per committer: 57.6
Development Distribution Score (DDS): 0.611

Top Committers

Name	Email	Commits
Marek Herde	m**e@u**e	346
tpham93	t**3@g**m	300
AlexanderBenz	a**7@w**e	261
Pascal Mergard	u**9@s**e	188
Lukas	l**e@g**m	152
Daniel Kottke	d**e@u**e	138
Cheng-JY	c**c@g**m	125
christoph14	c**2@g**m	101
dependabot[bot]	4****]	37
Atal Roghman	a**n@g**m	27
Jiaying Cheng	j**g@c**e	12
lluehrs	u**9@s**e	9
Mehmet Müjde	m**t@g**m	7
Mehmet Müjde	m**e@m**m	7
Alexandre Abraham	a**e@g**m	3
Mehmet Müjde	m**e@s**e	3
Mehmet Müjde	m**b@g**m	1
Mostafa Hany	s**s@g**m	1

Committer Domains (Top 20 + Academic)

student.uni-kassel.de: 3
uni-kassel.de: 2
me.com: 1
cluster.ies.uni-kassel.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 104
Total pull requests: 243
Average time to close issues: 8 months
Average time to close pull requests: 29 days
Total issue authors: 15
Total pull request authors: 13
Average comments per issue: 0.63
Average comments per pull request: 0.93
Merged pull requests: 167
Bot issues: 0
Bot pull requests: 105

Past Year

Issues: 11
Pull requests: 73
Average time to close issues: about 1 month
Average time to close pull requests: 18 days
Issue authors: 4
Pull request authors: 5
Average comments per issue: 0.0
Average comments per pull request: 0.95
Merged pull requests: 54
Bot issues: 0
Bot pull requests: 34

Top Authors

Issue Authors

mherde (35)
tpham93 (35)
Pascal112 (7)
AlexanderBenz (7)
dakot (7)
LukasLuehrs (3)
renanj (2)
Cheng-JY (2)
perceptualJonathan (1)
showkeyjar (1)
meeen (1)
orshur (1)
mmuejde (1)
christoph14 (1)
bjaster (1)

Pull Request Authors

dependabot[bot] (148)
tpham93 (47)
mherde (39)
AlexanderBenz (22)
Pascal112 (17)
dakot (13)
Cheng-JY (12)
LukasLuehrs (5)
Moritz-Wirth (3)
ArthurHoa (2)
Peter-obi (2)
christoph14 (1)
mmuejde (1)
CatB1t (1)

Top Labels

Issue Labels

pool (14)
upcoming (13)
documentation (10)
stream (9)
bug (5)
question (5)
test (5)
enhancement (4)
classifier (2)
regressor (2)
guideline (1)

Pull Request Labels

dependencies (148)
python (130)
github_actions (18)
bug (4)

Packages

Total packages: 1
Total downloads:
- pypi 3,321 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 20
Total maintainers: 1

pypi.org: scikit-activeml

scikit-activeml is a Python library for active learning on top of SciPy and scikit-learn.

Homepage: https://scikit-activeml.github.io
Documentation: https://scikit-activeml.readthedocs.io/
License: bsd-3-clause
Latest release: 0.6.2
published 9 months ago

Versions: 20
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 3,321 Last month

Rankings

Stargazers count: 7.3%

Dependent packages count: 7.3%

Forks count: 11.0%

Average: 12.6%

Downloads: 15.4%

Dependent repos count: 22.1%

Maintainers (1)

mherde

Last synced: 6 months ago

Dependencies

.github/workflows/main.yml actions

actions/checkout v1 composite
actions/setup-python v1 composite
codecov/codecov-action v1 composite

.github/workflows/python-publish.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite

requirements.txt pypi

iteration-utilities >=0.11
joblib >=1.2
matplotlib >=3.5
numpy >=1.22
scikit-learn >=1.2
scipy >=1.8

requirements_extra.txt pypi

flake8 *
jupyter *
nbformat *
nbsphinx *
numpydoc *
pybtex *
pydata_sphinx_theme ==0.9
pytest *
pytest-cov *
sphinx ==4.2.0
sphinx-gallery *
sphinxcontrib-bibtex *