baal

Bayesian active learning library for research and industrial use cases.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
✓ Academic publication links (links to arxiv.org)
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity to scientific vocabulary: 14.0%)

Keywords

active-learning ai bayesian-active-learning deep-learning machine-learning python pytorch

Keywords from Contributors

interpretability annotation distribution interactive packaging notebook deep-neural-networks network-simulation hacking observability

Last synced: 6 months ago

Repository

Bayesian active learning library for research and industrial use cases.

Basic Info

Host: GitHub
Owner: baal-org
License: apache-2.0
Language: Python
Default Branch: master
Homepage: https://baal.readthedocs.io
Size: 45 MB

Statistics

Stars: 902
Watchers: 16
Forks: 86
Open Issues: 21
Releases: 16

Topics

active-learning ai bayesian-active-learning deep-learning machine-learning python pytorch

Created over 6 years ago · Last pushed 8 months ago

Metadata Files

Readme Contributing License Code of conduct Citation Support

Bayesian Active Learning (Baal)

Baal is an active learning library that supports both industrial applications and research use cases.

Read the documentation at https://baal.readthedocs.io.

Our paper can be read on arXiv. It includes tips and tricks to make active learning usable in production.

For a quick introduction to Baal and Bayesian active learning, please see these links:

Baal was initially developed at ElementAI (acquired by ServiceNow in 2021), but is now independent.

Installation and requirements

Baal requires Python>=3.10.

To install Baal using pip: pip install baal

We use Poetry as our package manager. To install Baal from source: poetry install

Papers using Baal

Bayesian active learning for production, a systematic study and a reusable library (Atighehchian et al. 2020)
Synbols: Probing Learning Algorithms with Synthetic Datasets (Lacoste et al. 2020)
Can Active Learning Preemptively Mitigate Fairness Issues? (Branchaud-Charron et al. 2021)
Active learning with MaskAL reduces annotation effort for training Mask R-CNN (Blok et al. 2021)
Stochastic Batch Acquisition for Deep Active Learning (Kirsch et al. 2022)

What is active learning?

Active learning is a special case of machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs, typically labels, for new data points. To understand the concept in more depth, refer to our tutorial.
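A minimal, library-free sketch of this idea: pool-based active learning with uncertainty sampling on a toy one-dimensional problem. The oracle (whose true label is 1 when x > 0.37), the threshold learner and every function name here are hypothetical illustrations, not part of Baal.

```python
import random


def oracle(x: float) -> int:
    """Hypothetical labelling source: the true label is 1 when x > 0.37."""
    return int(x > 0.37)


def fit_threshold(labelled: dict[float, int]) -> float:
    """Toy learner: put the boundary halfway between the largest negative and smallest positive."""
    negatives = [x for x, y in labelled.items() if y == 0]
    positives = [x for x, y in labelled.items() if y == 1]
    low = max(negatives, default=0.0)   # fall back to the domain edge if no negatives yet
    high = min(positives, default=1.0)  # fall back to the domain edge if no positives yet
    return (low + high) / 2


def active_learning_loop(n_pool: int = 1000, n_rounds: int = 10, seed: int = 0) -> tuple[float, int]:
    rng = random.Random(seed)
    pool = [rng.random() for _ in range(n_pool)]  # unlabelled pool
    labelled: dict[float, int] = {}
    # Start by labelling two randomly chosen points.
    for x in rng.sample(pool, 2):
        labelled[x] = oracle(x)
        pool.remove(x)
    for _ in range(n_rounds):
        threshold = fit_threshold(labelled)
        # Uncertainty sampling: query the unlabelled point closest to the current boundary.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labelled[query] = oracle(query)  # ask the oracle for the label
    return fit_threshold(labelled), len(labelled)


threshold, n_labelled = active_learning_loop()
print(f"estimated threshold {threshold:.3f} after labelling {n_labelled} of 1000 points")
```

Because each query lands near the current decision boundary, the loop behaves much like a binary search and locates the boundary after labelling only a dozen of the 1,000 points; labelling points at random would generally need many more.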

Baal Framework

Baal currently supports the following methods for active learning:

Monte-Carlo Dropout (Gal et al. 2015)
MCDropConnect (Mobiny et al. 2019)
Deep ensembles
Semi-supervised learning

If you want to propose new methods, please submit an issue.

Monte-Carlo Dropout is a well-known approximation of Bayesian neural networks. In this method, the Dropout layers are active at both training and test time. By running the model several times on the same input while dropout randomly zeroes different units, we obtain a set of predictions and estimate the uncertainty of the prediction with one of the heuristics defined in heuristics.py.
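The sampling step can be sketched in plain NumPy, independently of Baal: a tiny two-layer network whose dropout mask is redrawn on every forward pass, including at prediction time. The network, weights, dropout rate and function names are hypothetical, and stacking the samples along the last axis as (n_items, n_classes, n_iterations) is an assumption made for illustration; in Baal, MCDropoutModule and ModelWrapper play this role for PyTorch models.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mc_dropout_predict(x, w1, w2, p=0.5, iterations=20, seed=0):
    """Hypothetical MC Dropout sketch: returns probabilities of shape (n_items, n_classes, iterations)."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(iterations):
        hidden = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
        # Dropout stays on at prediction time: zero each hidden unit with probability p
        # and rescale the survivors by 1 / (1 - p) (inverted dropout).
        mask = rng.random(hidden.shape) >= p
        hidden = hidden * mask / (1.0 - p)
        samples.append(softmax(hidden @ w2))
    return np.stack(samples, axis=-1)


rng = np.random.default_rng(42)
x = rng.normal(size=(5, 4))    # 5 items with 4 features each
w1 = rng.normal(size=(4, 16))  # hypothetical "trained" weights
w2 = rng.normal(size=(16, 3))  # 3 classes
probs = mc_dropout_predict(x, w1, w2)
mean_probs = probs.mean(axis=-1)  # MC Dropout predictive distribution per item
spread = probs.std(axis=-1)       # disagreement between stochastic passes
```

The mean over the last axis is the MC Dropout predictive distribution; the spread across passes is the raw material that heuristics such as BALD turn into a single uncertainty score.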

The framework consists of four main parts:

ActiveLearningDataset
Heuristics
ModelWrapper
ActiveLearningLoop

To get started, wrap your dataset in our ActiveLearningDataset class (baal/active/dataset/pytorchdataset.py). This splits the dataset into a labelled training set and an unlabelled pool. The pool is the portion of the training data that has yet to be labelled.
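A sketch of the bookkeeping behind a labelled/pool split, assuming one boolean flag per item. ToyActiveDataset is a hypothetical class written for illustration; its label_randomly method only mirrors the name used in the example script and is not Baal's implementation.

```python
import random


class ToyActiveDataset:
    """Hypothetical labelled/pool split; not Baal's ActiveLearningDataset."""

    def __init__(self, items):
        self.items = list(items)
        self.labelled = [False] * len(self.items)  # one flag per item, all start in the pool

    def label(self, indices):
        # Move the given items from the pool into the training set.
        for i in indices:
            self.labelled[i] = True

    def label_randomly(self, n, seed=0):
        # Label n items drawn uniformly from the current pool.
        self.label(random.Random(seed).sample(self.pool_indices(), n))

    def pool_indices(self):
        return [i for i, flag in enumerate(self.labelled) if not flag]

    def train_indices(self):
        return [i for i, flag in enumerate(self.labelled) if flag]


ds = ToyActiveDataset(range(100))
ds.label_randomly(10)            # initial random labels
ds.label(ds.pool_indices()[:5])  # e.g. items chosen by a heuristic
```

Keeping a single flag array, rather than two copies of the data, means labelling an item is a constant-time update and the two views can never overlap.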

We provide ModelWrapper, a lightweight object similar to keras.Model, to make training and testing models easier. If your model is not ready for active learning, we provide modules to prepare it.

For example, the MCDropoutModule wrapper makes the model's existing dropout layers active at both training and inference time, and ModelWrapper specifies the number of iterations to run at training and inference.

Finally, ActiveLearningLoop (baal/active/activeloop.py) automatically computes the uncertainty and labels the most uncertain items in the pool.
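The scoring and selection step can be sketched with BALD (Bayesian Active Learning by Disagreement), the heuristic passed as heuristic=BALD() in the example script. BALD estimates the mutual information between an item's prediction and the model parameters as the entropy of the mean MC prediction minus the mean entropy of the individual predictions. This is a hypothetical NumPy re-implementation for illustration, not the code in heuristics.py, and it assumes probabilities shaped (n_items, n_classes, n_iterations).

```python
import numpy as np

# Hypothetical re-implementation for illustration; not Baal's heuristics.py.


def entropy(probs: np.ndarray, axis: int) -> np.ndarray:
    # Clip to avoid log(0); zero-probability classes contribute nothing anyway.
    return -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=axis)


def bald_scores(probs: np.ndarray) -> np.ndarray:
    """probs: (n_items, n_classes, n_iterations) -> one BALD score per item."""
    mean_probs = probs.mean(axis=-1)                        # (n_items, n_classes)
    entropy_of_mean = entropy(mean_probs, axis=1)           # total predictive uncertainty
    mean_of_entropy = entropy(probs, axis=1).mean(axis=-1)  # expected per-pass uncertainty
    return entropy_of_mean - mean_of_entropy


def select_most_uncertain(probs: np.ndarray, query_size: int) -> np.ndarray:
    # Indices of the query_size highest-scoring pool items, most uncertain first.
    return np.argsort(-bald_scores(probs))[:query_size]


# Three pool items, two classes, four MC samples each.
confident = np.tile([[0.9], [0.1]], (1, 4))          # every pass predicts class 0
disagreeing = np.array([[0.99, 0.01, 0.99, 0.01],
                        [0.01, 0.99, 0.01, 0.99]])   # passes flip between classes
uniform = np.full((2, 4), 0.5)                       # every pass is a coin flip
probs = np.stack([confident, disagreeing, uniform])  # shape (3, 2, 4)
scores = bald_scores(probs)
picked = select_most_uncertain(probs, query_size=1)
```

Item 1 is selected because its stochastic passes disagree. Item 2 is just as uncertain on average, but every pass agrees it is a coin flip, so BALD scores it near zero; a plain predictive-entropy heuristic would score items 1 and 2 equally.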

Putting it all together, your script should look similar to this:

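```python
from baal.active.dataset import ActiveLearningDataset
from baal.active.heuristics import BALD
from baal.bayesian.dropout import MCDropoutModule
from baal.modelwrapper import ModelWrapper, TrainingArgs  # TrainingArgs location assumed (Baal 2.x)
from baal.experiments.base import ActiveLearningExperiment  # module path assumed (Baal 2.x)

# your_dataset, your_model, test_dataset and INITIAL_POOL are placeholders for your own objects.
dataset = ActiveLearningDataset(your_dataset)
dataset.label_randomly(INITIAL_POOL)  # label some data
model = MCDropoutModule(your_model)
wrapper = ModelWrapper(model, args=TrainingArgs(...))
experiment = ActiveLearningExperiment(
    trainer=wrapper,  # Hugging Face Trainer or ModelWrapper used for training
    al_dataset=dataset,  # Active learning dataset
    eval_dataset=test_dataset,  # Evaluation dataset
    heuristic=BALD(),  # Uncertainty heuristic to use
    query_size=100,  # How many items to label per round
    iterations=20,  # Number of MC samples to draw per item
    pool_size=None,  # Optionally limit the size of the unlabelled pool
    criterion=None,  # Stopping criterion for the experiment
)

# The experiment will run until all items are labelled.
metrics = experiment.start()
```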

For a complete experiment, see experiments/vgg_mcdropout_cifar10.py.

Re-run our Experiments

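```bash
# Build the image (optionally add --target base_baal to build only up to that stage)
docker build -t baal .
# Run the CIFAR-10 MC Dropout experiment with all GPUs exposed to the container
docker run --rm --gpus all baal python3 experiments/vgg_mcdropout_cifar10.py
```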

Use Baal for YOUR Experiments

Simply clone the repository and create your own experiment script, similar to the example at experiments/vgg_mcdropout_cifar10.py. Make sure to use the four main parts of the Baal framework. Happy experimenting!

Contributing!

To contribute, see CONTRIBUTING.md.

Who We Are!

"There is passion, yet peace; serenity, yet emotion; chaos, yet order."

The Baal team tests and implements the most recent papers on uncertainty estimation and active learning.

Current maintainers:

How to cite

If you use Baal in one of your projects, we would greatly appreciate it if you cite this library using this BibTeX entry:

@misc{atighehchian2019baal,
  title={Baal, a bayesian active learning library},
  author={Atighehchian, Parmida and Branchaud-Charron, Frederic and Freyberg, Jan and Pardinas, Rafael and Schell, Lorne and Pearse, George},
  year={2022},
  howpublished={\url{https://github.com/baal-org/baal/}},
}

Licence

For information on the licence of this library, please read LICENCE.

Owner

Name: baal-org
Login: baal-org
Kind: organization

Website: https://baal.readthedocs.io/en/latest/
Repositories: 1
Profile: https://github.com/baal-org

GitHub Events

Total

Create event: 2
Release event: 1
Issues event: 2
Watch event: 39
Push event: 2
Fork event: 1

Last Year

Create event: 2
Release event: 1
Issues event: 2
Watch event: 39
Push event: 2
Fork event: 1

Committers

Last synced: 9 months ago

All Time

Total Commits: 232
Total Committers: 21
Avg Commits per committer: 11.048
Development Distribution Score (DDS): 0.595

Past Year

Commits: 15
Committers: 3
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.467

Top Committers

Name	Email	Commits
Frédéric Branchaud-Charron	f**n@e**m	94
Frédéric Branchaud-Charron	f**n@g**m	66
fr.branchaud-charron	f**n@s**m	19
Parmida Atighehchian	p****g	19
Rafa	r**a@e**m	7
Freddie Bickford Smith	3****h	4
Rafael Pardinas	3****i	4
Arthur Thuy	5****y	3
reeshipaul	r**5@g**m	3
Jan Freyberg	j**g@g**m	2
BvMWUR	4****R	1
Cami Williams	c**s@g**m	1
George Pearse	4****e	1
Lorne Schell	1****r	1
Nitish Sharma	n**5@g**m	1
ThierryJudge	t**e@u**a	1
Dref360	f**n@u**a	1
Archy de Berker	a**y@e**m	1
Trim Bresilla	t**a@g**m	1
dependabot[bot]	4****]	1
vfdev	v**5@g**m	1

Committer Domains (Top 20 + Academic)

elementai.com: 3
usherbrooke.ca: 2
servicenow.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 73
Total pull requests: 70
Average time to close issues: 5 months
Average time to close pull requests: 23 days
Total issue authors: 31
Total pull request authors: 12
Average comments per issue: 2.45
Average comments per pull request: 0.49
Merged pull requests: 63
Bot issues: 0
Bot pull requests: 4

Past Year

Issues: 2
Pull requests: 1
Average time to close issues: about 20 hours
Average time to close pull requests: about 1 hour
Issue authors: 2
Pull request authors: 1
Average comments per issue: 2.5
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Dref360 (29)
arthur-thuy (9)
GeorgePearse (4)
parmidaatg (4)
nitish1295 (2)
Anurich (1)
biro-mark (1)
TobiArndt (1)
pl-ghost (1)
whaowhao (1)
pieterblok (1)
vahuja4 (1)
pinarezgicol (1)
noknok00 (1)
lorinczszabolcs (1)

Pull Request Authors

Dref360 (54)
arthur-thuy (4)
dependabot[bot] (4)
parmidaatg (4)
fbickfordsmith (2)
bresilla (1)
nitish1295 (1)
junaidahmed361 (1)
pieterblok (1)
GeorgePearse (1)

Top Labels

Issue Labels

enhancement (36) bug (18) documentation (5) good first issue (3) help wanted (3) PRs welcome (2) wontfix (1)

Pull Request Labels

dependencies (4)

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 5

conda-forge.org: baal

Homepage: https://github.com/baal-org/baal
License: Apache-2.0
Latest release: 1.7.0
published over 3 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 1

Rankings

Stargazers count: 14.9%

Forks count: 21.7%

Dependent repos count: 24.4%

Average: 28.1%

Dependent packages count: 51.6%

Last synced: 6 months ago

Dependencies

.github/workflows/pythonci.yml actions

Gr1N/setup-poetry v7 composite
actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/test-import.yml actions

Gr1N/setup-poetry v7 composite
actions/checkout v2 composite
actions/setup-python v2 composite

Dockerfile docker

pytorch/pytorch 1.9.0-cuda10.2-cudnn7-runtime build
setup latest build

docs/requirements.txt pypi

Pygments ==2.13.0
mkdocs ==1.4.0
mkdocs-exclude-search ==0.6.4
mkdocs-jupyter ==0.21.0
mkdocstrings ==0.18.1

poetry.lock pypi

147 dependencies

pyproject.toml pypi

Pygments ^2.12.0 develop
bandit ^1.7.1 develop
black ^22.3.0 develop
docutils 0.16 develop
flake8 ^3.9.2 develop
hypothesis 4.24.0 develop
mkdocs-exclude-search ^0.6.4 develop
mkdocs-jupyter ^0.21.0 develop
mkdocs-material ^8.5.6 develop
mkdocstrings ^0.18.1 develop
mypy ^0.910 develop
pytest ^6.2.5 develop
pytest-cov ^2.12.1 develop
pytest-mock ^3.6.1 develop
torch-hypothesis 0.2.0 develop
Pillow >=6.2.0
datasets >=1.11.0
h5py ^3.4.0
lightning-flash >=0.7.5
matplotlib ^3.4.3
numpy ^1.21.2
python >=3.8,<4
scikit-learn ^1.0.0
scipy ^1.7.1
structlog ^21.1.0
torch >=1.6.0
torchmetrics ^0.9.3
torchvision >=0.7.0
tqdm ^4.62.2
transformers >=4.10.2
