scikit-activeml

scikit-activeml: Python library for active learning on top of scikit-learn

https://github.com/scikit-activeml/scikit-activeml

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: preprints.org
  • Committers with academic emails
    6 of 18 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary

Keywords

active-learning machine-learning python scikit-learn

Keywords from Contributors

mesh interpretability sequences projection interactive optim hacking network-simulation
Last synced: 6 months ago

Repository

Statistics
  • Stars: 171
  • Watchers: 5
  • Forks: 19
  • Open Issues: 23
  • Releases: 20
Topics
active-learning machine-learning python scikit-learn
Created over 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.rst

.. intro_start

|

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/scikit-activeml-logo.png
   :class: dark-light
   :align: center
   :width: 40%

|

=====================================================================
scikit-activeml: A Library and Toolbox for Active Learning Algorithms
=====================================================================
|Doc| |Codecov| |PythonVersion| |PyPi| |Black| |Downloads| |Paper|

.. |Doc| image:: https://img.shields.io/badge/docs-latest-green
   :target: https://scikit-activeml.github.io/latest/

.. |Codecov| image:: https://codecov.io/gh/scikit-activeml/scikit-activeml/branch/master/graph/badge.svg
   :target: https://app.codecov.io/gh/scikit-activeml/scikit-activeml

.. |PythonVersion| image:: https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue.svg
   :target: https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue

.. |PyPi| image:: https://badge.fury.io/py/scikit-activeml.svg
   :target: https://badge.fury.io/py/scikit-activeml

.. |Paper| image:: https://img.shields.io/badge/paper-10.20944/preprints202103.0194.v1-blue.svg
   :target: https://www.preprints.org/manuscript/202103.0194/v1

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

.. |Downloads| image:: https://static.pepy.tech/badge/scikit-activeml
   :target: https://www.pepy.tech/projects/scikit-activeml

Machine learning models often need large amounts of training data to perform well.
While unlabeled data can be gathered with relative ease, labeling is typically difficult,
time-consuming, or expensive. Active learning addresses this challenge by querying labels
for the most informative samples, enabling high performance with fewer labeled examples.
With this goal in mind, **scikit-activeml** has been developed as a Python library for active
learning on top of scikit-learn.

.. intro_end

.. user_installation_start

User Installation
=================

The easiest way to install scikit-activeml is using ``pip``:

::

    pip install -U scikit-activeml

This installation via ``pip`` includes only the minimum requirements to avoid
potential package downgrades within your installation. If you encounter any incompatibility issues,
you can install the maximum requirements,
which have been tested for the current package release:

::

    pip install -U "scikit-activeml[max]"
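
To confirm which release ended up in your environment, the following minimal sketch uses only the Python standard library. The helper name ``installed_version`` is hypothetical and not part of scikit-activeml. It assumes the distribution is registered under the name ``scikit-activeml``, as in the ``pip`` commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("scikit-activeml"))
```

Returning ``None`` instead of raising keeps the check usable in setup scripts that should continue when the package is absent.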

.. user_installation_end

.. examples_start

Examples
========

We provide a broad overview of different use cases in our tutorial section,
including:

- Pool-based Active Learning - Getting Started
- Deep Pool-based Active Learning - scikit-activeml with Skorch
- Pool-based Active Learning for Regression - Getting Started
- Pool-based Active Learning - Sample Annotating
- Pool-based Active Learning - Simple Evaluation Study
- Active Image Classification via Self-supervised Learning
- Multi-annotator Pool-based Active Learning - Getting Started
- Stream-based Active Learning - Getting Started
- Batch Stream-based Active Learning with Pool Query Strategies
- Stream-based Active Learning With River

Below are two code snippets illustrating how straightforward it is to implement active learning cycles using our Python package ``skactiveml``.

Pool-based Active Learning
--------------------------

The following snippet implements an active learning cycle with 20 iterations using a Gaussian process
classifier and uncertainty sampling. You can substitute other classifiers from ``sklearn`` or those
provided by ``skactiveml``. Note that when using active learning with ``sklearn``, unlabeled data
is represented by the value ``MISSING_LABEL`` in the label vector ``y``. Additional query strategies
are available in our documentation.

.. code-block:: python

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.datasets import make_blobs
    from skactiveml.pool import UncertaintySampling
    from skactiveml.utils import MISSING_LABEL
    from skactiveml.classifier import SklearnClassifier

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)
    y = np.full(shape=y_true.shape, fill_value=MISSING_LABEL)

    # Use the first 10 samples as initial training data.
    y[:10] = y_true[:10]

    # Create classifier and query strategy.
    clf = SklearnClassifier(
        GaussianProcessClassifier(random_state=0),
        classes=np.unique(y_true),
        random_state=0
    )
    qs = UncertaintySampling(method='entropy')

    # Execute active learning cycle.
    n_cycles = 20
    for c in range(n_cycles):
        query_idx = qs.query(X=X, y=y, clf=clf)
        y[query_idx] = y_true[query_idx]

    # Fit final classifier.
    clf.fit(X, y)

As a result, an actively trained Gaussian process classifier is obtained.
A visualization of its decision boundary (black line) along with sample utilities (greenish contours) is shown below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/pal-example-output.png
   :width: 400
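
To give an intuition for what entropy-based uncertainty sampling selects in each cycle, here is a minimal NumPy-only sketch. It is not the implementation used by ``UncertaintySampling``: the function names ``entropy_scores`` and ``select_most_uncertain`` are hypothetical, and the sketch assumes that unlabeled entries of ``y`` are stored as NaN and that class probabilities have already been computed by some classifier.

```python
import numpy as np


def entropy_scores(probas):
    """Shannon entropy of each row of a class-probability matrix."""
    probas = np.asarray(probas, dtype=float)
    # Clip to avoid log(0); zero-probability classes then contribute 0.
    clipped = np.clip(probas, 1e-12, 1.0)
    return -np.sum(probas * np.log(clipped), axis=1)


def select_most_uncertain(probas, y):
    """Index of the unlabeled sample (NaN in y) with the highest entropy."""
    unlabeled = np.isnan(np.asarray(y, dtype=float))
    if not unlabeled.any():
        raise ValueError("no unlabeled samples left to query")
    # Labeled samples are excluded by giving them a score of -inf.
    scores = np.where(unlabeled, entropy_scores(probas), -np.inf)
    return int(np.argmax(scores))


# Toy example: three samples, two classes; sample 0 is already labeled.
probas = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]])
y = np.array([1.0, np.nan, np.nan])
print(select_most_uncertain(probas, y))  # prints 2
```

Sample 0 has the highest entropy but is skipped because it already carries a label, so the query falls on sample 2. Note that the sketch returns a single integer, whereas the snippet above uses the result of ``qs.query`` directly to index ``y``.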

Stream-based Active Learning
----------------------------

The following snippet implements an active learning cycle with 200 data points and a default budget of 10%
using a Parzen window classifier and split uncertainty sampling.
Similar to the pool-based example, you can wrap classifiers from ``sklearn``, use sklearn-compatible classifiers,
or choose from the example classifiers provided by ``skactiveml``.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_blobs
    from skactiveml.classifier import ParzenWindowClassifier
    from skactiveml.stream import Split
    from skactiveml.utils import MISSING_LABEL

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)

    # Create classifier and query strategy.
    clf = ParzenWindowClassifier(random_state=0, classes=np.unique(y_true))
    qs = Split(random_state=0)

    # Initialize training data as empty lists.
    X_train = []
    y_train = []

    # Initialize a list to store prediction results.
    correct_classifications = []

    # Execute active learning cycle.
    for x_t, y_t in zip(X, y_true):
        X_cand = x_t.reshape([1, -1])
        y_cand = y_t
        clf.fit(X_train, y_train)
        correct_classifications.append(clf.predict(X_cand)[0] == y_cand)
        sampled_indices = qs.query(candidates=X_cand, clf=clf)
        qs.update(candidates=X_cand, queried_indices=sampled_indices)
        X_train.append(x_t)
        y_train.append(y_cand if len(sampled_indices) > 0 else MISSING_LABEL)

As a result, an actively trained Parzen window classifier is obtained.
A visualization of its accuracy curve across the active learning cycle is shown below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/stream-example-output.png
   :width: 400
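
The ``correct_classifications`` list collected in the loop can be turned into an accuracy curve in several ways. The following NumPy-only sketch shows two common choices, a cumulative average and a sliding-window average. The helper names and the window size are illustrative assumptions, and this is not necessarily how the figure above was produced.

```python
import numpy as np


def running_accuracy(correct):
    """Cumulative accuracy after each prediction."""
    correct = np.asarray(correct, dtype=float)
    return np.cumsum(correct) / np.arange(1, correct.size + 1)


def sliding_accuracy(correct, window=50):
    """Mean accuracy over the most recent `window` predictions at each step."""
    correct = np.asarray(correct, dtype=float)
    # Prefix sums with a leading zero make window sums a simple difference.
    padded = np.concatenate(([0.0], np.cumsum(correct)))
    ends = np.arange(1, correct.size + 1)
    starts = np.maximum(ends - window, 0)
    return (padded[ends] - padded[starts]) / (ends - starts)


# Stand-in for the `correct_classifications` list built in the loop above.
correct_classifications = [False, True, True, False, True]
print(running_accuracy(correct_classifications))
print(sliding_accuracy(correct_classifications, window=2))
```

The cumulative curve is smooth but slow to react to changes in the stream, while the sliding window tracks recent performance at the cost of more noise.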

Query Strategy Overview
=======================
For better orientation, we provide an overview
(including paper references and visual examples)
of the query strategies implemented by ``skactiveml``.

|Overview| |Visualization|

.. |Overview| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/strategy-overview.gif
   :width: 365

.. |Visualization| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/example-overview.gif
   :width: 365

.. examples_end

.. citing_start

Citing
======
If you use ``skactiveml`` in your research projects and find it helpful, please cite the following:

::

    @article{skactiveml2021,
        title={scikit-activeml: {A} {L}ibrary and {T}oolbox for {A}ctive {L}earning {A}lgorithms},
        author={Daniel Kottke and Marek Herde and Tuan Pham Minh and Alexander Benz and Pascal Mergard and Atal Roghman and Christoph Sandrock and Bernhard Sick},
        journal={Preprints},
        doi={10.20944/preprints202103.0194.v1},
        year={2021},
        url={https://github.com/scikit-activeml/scikit-activeml}
    }

.. citing_end

Owner

  • Name: scikit-activeml
  • Login: scikit-activeml
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kottke"
  given-names: "Daniel"
  orcid: "https://orcid.org/0000-0002-7870-6033"
- family-names: "Herde"
  given-names: "Marek"
  orcid: "https://orcid.org/0000-0003-4908-122X"
- family-names: "Minh"
  given-names: "Tuan Pham"
  orcid: "https://orcid.org/0000-0002-5102-5561"
- family-names: "Benz"
  given-names: "Alexander"
- family-names: "Lührs"
  given-names: "Lukas"
- family-names: "Mergard"
  given-names: "Pascal"
- family-names: "Roghman"
  given-names: "Atal"
- family-names: "Sandrock"
  given-names: "Christoph"
- family-names: "Sick"
  given-names: "Bernhard"
  orcid: "https://orcid.org/0000-0001-9467-656X"
title: "scikit-activeml"
version: 0.4.0
date-released: 2022-12-21
url: "https://github.com/scikit-activeml/scikit-activeml"
preferred-citation:
  type: article
  authors:
  - family-names: "Kottke"
    given-names: "Daniel"
    orcid: "https://orcid.org/0000-0002-7870-6033"
  - family-names: "Herde"
    given-names: "Marek"
    orcid: "https://orcid.org/0000-0003-4908-122X"
  - family-names: "Minh"
    given-names: "Tuan Pham"
    orcid: "https://orcid.org/0000-0002-5102-5561"
  - family-names: "Benz"
    given-names: "Alexander"
  - family-names: "Mergard"
    given-names: "Pascal"
  - family-names: "Roghman"
    given-names: "Atal"
  - family-names: "Sandrock"
    given-names: "Christoph"
  - family-names: "Sick"
    given-names: "Bernhard"
    orcid: "https://orcid.org/0000-0001-9467-656X"
  doi: "10.20944/preprints202103.0194.v1"
  journal: "Preprints"
  title: "scikit-activeml: A Library and Toolbox for Active Learning Algorithms"
  year: 2021

GitHub Events

Total
  • Create event: 28
  • Release event: 1
  • Issues event: 19
  • Watch event: 17
  • Delete event: 25
  • Issue comment event: 39
  • Push event: 144
  • Pull request review comment event: 6
  • Pull request review event: 27
  • Pull request event: 62
  • Fork event: 4
Last Year
  • Create event: 28
  • Release event: 1
  • Issues event: 19
  • Watch event: 17
  • Delete event: 25
  • Issue comment event: 39
  • Push event: 144
  • Pull request review comment event: 6
  • Pull request review event: 27
  • Pull request event: 62
  • Fork event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,718
  • Total Committers: 18
  • Avg Commits per committer: 95.444
  • Development Distribution Score (DDS): 0.799
Past Year
  • Commits: 288
  • Committers: 5
  • Avg Commits per committer: 57.6
  • Development Distribution Score (DDS): 0.611
Top Committers
Name                 Email            Commits
Marek Herde          m****e@u****e    346
tpham93              t****3@g****m    300
AlexanderBenz        a****7@w****e    261
Pascal Mergard       u****9@s****e    188
Lukas                l****e@g****m    152
Daniel Kottke        d****e@u****e    138
Cheng-JY             c****c@g****m    125
christoph14          c****2@g****m    101
dependabot[bot]      4****]           37
Atal Roghman         a****n@g****m    27
Jiaying Cheng        j****g@c****e    12
lluehrs              u****9@s****e    9
Mehmet Müjde         m****t@g****m    7
Mehmet Müjde         m****e@m****m    7
Alexandre Abraham    a****e@g****m    3
Mehmet Müjde         m****e@s****e    3
Mehmet Müjde         m****b@g****m    1
Mostafa Hany         s****s@g****m    1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 104
  • Total pull requests: 243
  • Average time to close issues: 8 months
  • Average time to close pull requests: 29 days
  • Total issue authors: 15
  • Total pull request authors: 13
  • Average comments per issue: 0.63
  • Average comments per pull request: 0.93
  • Merged pull requests: 167
  • Bot issues: 0
  • Bot pull requests: 105
Past Year
  • Issues: 11
  • Pull requests: 73
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 18 days
  • Issue authors: 4
  • Pull request authors: 5
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.95
  • Merged pull requests: 54
  • Bot issues: 0
  • Bot pull requests: 34
Top Authors
Issue Authors
  • mherde (35)
  • tpham93 (35)
  • Pascal112 (7)
  • AlexanderBenz (7)
  • dakot (7)
  • LukasLuehrs (3)
  • renanj (2)
  • Cheng-JY (2)
  • perceptualJonathan (1)
  • showkeyjar (1)
  • meeen (1)
  • orshur (1)
  • mmuejde (1)
  • christoph14 (1)
  • bjaster (1)
Pull Request Authors
  • dependabot[bot] (148)
  • tpham93 (47)
  • mherde (39)
  • AlexanderBenz (22)
  • Pascal112 (17)
  • dakot (13)
  • Cheng-JY (12)
  • LukasLuehrs (5)
  • Moritz-Wirth (3)
  • ArthurHoa (2)
  • Peter-obi (2)
  • christoph14 (1)
  • mmuejde (1)
  • CatB1t (1)
Top Labels
Issue Labels
pool (14) upcoming (13) documentation (10) stream (9) bug (5) question (5) test (5) enhancement (4) classifier (2) regressor (2) guideline (1)
Pull Request Labels
dependencies (148) python (130) github_actions (18) bug (4)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 3,321 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 20
  • Total maintainers: 1
pypi.org: scikit-activeml

scikit-activeml is a Python library for active learning on top of SciPy and scikit-learn.

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 3,321 Last month
Rankings
Stargazers count: 7.3%
Dependent packages count: 7.3%
Forks count: 11.0%
Average: 12.6%
Downloads: 15.4%
Dependent repos count: 22.1%
Last synced: 6 months ago

Dependencies

.github/workflows/main.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • codecov/codecov-action v1 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
requirements.txt pypi
  • iteration-utilities >=0.11
  • joblib >=1.2
  • matplotlib >=3.5
  • numpy >=1.22
  • scikit-learn >=1.2
  • scipy >=1.8
requirements_extra.txt pypi
  • flake8 *
  • jupyter *
  • nbformat *
  • nbsphinx *
  • numpydoc *
  • pybtex *
  • pydata_sphinx_theme ==0.9
  • pytest *
  • pytest-cov *
  • sphinx ==4.2.0
  • sphinx-gallery *
  • sphinxcontrib-bibtex *