active-learning-as-a-service

A scalable & efficient active learning/data selection system for everyone.

https://github.com/huaizhengzhang/active-learning-as-a-service

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, springer.com, ieee.org
✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.6%) to scientific vocabulary

Keywords

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 6 months ago

Repository

A scalable & efficient active learning/data selection system for everyone.

Basic Info

Host: GitHub
Owner: HuaizhengZhang
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 1.49 MB

Statistics

Stars: 214
Watchers: 9
Forks: 15
Open Issues: 10
Releases: 4

Topics

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Created almost 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Citation

ALaaS: Active Learning as a Service.


Active Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset of a full dataset to be labeled, so as to reduce labeling cost. It provides an out-of-the-box, standalone experience that lets users start applying active learning quickly.

ALaaS features:

:hatching_chick: Easy-to-use: Start the system and employ active learning with fewer than 10 lines of code.
:rocket: Fast: Uses stage-level parallelism to achieve over a 10x speedup compared with an under-optimized active learning process.
:collision: Elastic: Scales the number of active learning workers up and down, depending on the number of GPU devices.
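To make the idea of stage-level parallelism concrete, here is a minimal sketch in plain Python. It is not ALaaS's implementation: the stage functions `download`, `preprocess`, and `infer` are hypothetical stand-ins, and the helper names are assumptions made for illustration. Each stage runs in its own thread and passes items through a queue, so one item can be downloaded while another is being processed.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the item stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            outbox.put(_SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(func(item))


def run_pipeline(items, stages):
    """Run the stages concurrently, one thread per stage, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is _SENTINEL:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for the real stages (illustration only).
def download(uri):
    return f"bytes:{uri}"


def preprocess(raw):
    return raw.upper()


def infer(tensor):
    return len(tensor)


scores = run_pipeline(["a.png", "bb.png"], [download, preprocess, infer])
print(scores)
```

With one thread per stage and FIFO queues, results come back in input order. A real system would typically also batch items and run several workers per stage.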

The project is still under active development. You are welcome to join us!

Installation :construction:

You can install ALaaS from PyPI:

```bash
pip install alaas
```

The package of ALaaS contains both client and server parts. You can build an active data selection service on your own servers or just apply the client to perform data selection.

:warning: Deep learning frameworks such as TensorFlow and PyTorch may need to be installed manually, since the version that fits your deployment can differ. The same applies to transformers if you run models from it.

You can also use Docker to run ALaaS:

```bash
docker pull huangyz0918/alaas
```

and start a service with the following command:

```bash
docker run -it --rm -p 8081:8081 \
    --mount type=bind,source=<config path>,target=/server/config.yml,readonly huangyz0918/alaas:latest
```

Quick Start :truck:

After installing ALaaS, you can easily start a local server. Here is the simplest example, which needs only 2 lines of code.

```python
from alaas.server import Server

Server.start()
```

By default, the example code starts an HTTP server on port 8081 for image data selection (PyTorch ResNet-18 for an image classification task). After this, you can request selection results for your own image dataset. A client-side example looks like this:

```bash
curl \
  -X POST http://0.0.0.0:8081/post \
  -H 'Content-Type: application/json' \
  -d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
               {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
               {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
               {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
               {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
       "parameters": {"budget": 3},
       "execEndpoint":"/query"}'
```
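The same request can be assembled from Python with only the standard library. The sketch below mirrors the JSON body of the curl call (the `data`, `parameters`, and `execEndpoint` fields). The helper names `build_query_payload` and `post_query` are hypothetical, and the assumption that the server replies with JSON is not confirmed by this README.

```python
import json
import urllib.request


def build_query_payload(uris, budget):
    """Build the JSON body used by the curl example above."""
    return {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }


def post_query(server_url, payload):
    """POST the payload to <server_url>/post and decode the reply.

    Assumption: the reply body is JSON. Needs a running ALaaS server.
    """
    request = urllib.request.Request(
        server_url.rstrip("/") + "/post",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_query_payload(
    [
        "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png",
        "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png",
    ],
    budget=3,
)
print(json.dumps(payload, indent=2))

# Uncomment once a server is listening on port 8081:
# print(post_query("http://0.0.0.0:8081", payload))
```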

You can also use alaas.Client to build the query request (for both the HTTP and gRPC protocols), like this:

```python
from alaas.client import Client

url_list = [
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('http://0.0.0.0:8081')
print(client.query_by_uri(url_list, budget=3))
```

The output is a subset of the URIs/data in your input dataset, indicating the samples selected for further data labeling.

ALaaS Server Customization :wrench:

We support two different methods to start your server:

1. by input parameters
2. by YAML configuration

Input Parameters

You can customize your server by setting different input parameters:

```python
from alaas.server import Server

Server.start(
    proto='http',                        # the server protocol, can be 'grpc', 'http' or 'https'.
    port=8081,                           # the access port of your server.
    host='0.0.0.0',                      # the access IP address of your server.
    job_name='default_app',              # the server name.
    model_hub='pytorch/vision:v0.10.0',  # the active learning model hub, the server will automatically download it for data selection.
    model_name='resnet18',               # the active learning model name (should be available in your model hub).
    device='cpu',                        # the deploy location/device (can be something like 'cpu', 'cuda' or 'cuda:0').
    strategy='LeastConfidence',          # the selection strategy (read the document to see what ALaaS supports).
    batch_size=1,                        # the batch size of data processing.
    replica=1,                           # the number of workers to select/query data.
    tokenizer=None,                      # the tokenizer name (should be available in your model hub), only for NLP tasks.
    transformers_task=None               # the NLP task name (for Hugging Face Pipelines), only for NLP tasks.
)
```

YAML Configuration

You can also start the server from a YAML configuration file, like this:

```python
from alaas import Server

# Start the server from an input configuration file.
Server.start_by_config('path_to_your_configuration.yml')
```

Details about building a configuration for your deployment scenarios can be found here.

Strategy Zoo :art:

We currently support several active learning strategies, shown in the following table.
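As an illustration of what a selection strategy does, here is a minimal sketch of least-confidence sampling, the general technique behind the LeastConfidence strategy name used in the server parameters. It is a textbook version in plain Python, not ALaaS's own code. The function names and the toy logits are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw model scores into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence_select(uris, logits_per_sample, budget):
    """Return the `budget` samples whose top predicted probability is lowest."""
    confidences = [max(softmax(logits)) for logits in logits_per_sample]
    ranked = sorted(range(len(uris)), key=lambda i: confidences[i])
    return [uris[i] for i in ranked[:budget]]


# Toy data: one row of made-up logits per image.
uris = ["airplane1.png", "airplane2.png", "airplane3.png",
        "airplane4.png", "airplane5.png"]
logits = [
    [4.0, 0.0, 0.0],  # very confident
    [0.2, 0.1, 0.0],  # nearly uniform
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],  # uniform, least confident
    [1.0, 0.0, 0.0],
]

selected = least_confidence_select(uris, logits, budget=3)
print(selected)
```

The samples the model is least sure about are returned first, which is why they are the most useful ones to send for labeling under a fixed budget.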

Citation

Our tech report on ALaaS is available on arXiv and at NeurIPS 2022. Please cite it as:

```bash
@article{huang2022active,
  title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},
  author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2207.09109},
  year={2022}
}
```

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Yizheng Huang 🚇 ⚠️ 💻
Huaizheng 🖋 ⚠️ 📖
Yuanming Li ⚠️ 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Acknowledgement

Jina - Build cross-modal and multimodal applications on the cloud.
Transformers - State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

License

This project is available as open source under the terms of the Apache 2.0 License.

Owner

Name: Huaizheng Hunter Zhang
Login: HuaizhengZhang
Kind: user
Location: Singapore
Company: https://breezeml.ai/

Website: https://huaizhengzhang.github.io
Repositories: 6
Profile: https://github.com/HuaizhengZhang

Founding Engineer at BreezeML. Focus on MLSys and MLOps. PhD@NTUsg.

Citation (CITATION.cff)

cff-version: 0.2.1
message: "If you use this software, please cite it as below."
authors:
- family-names: "Huang"
  given-names: "Yizheng"
- family-names: "Zhang"
  given-names: "Huaizheng"
- family-names: "Li"
  given-names: "Yuanming"
title: "Active-Learning-as-a-Service"
date-released: 2022-12-30
url: "https://github.com/MLSysOps/Active-Learning-as-a-Service"
preferred-citation:
  type: article
  authors:
  - family-names: "Huang"
    given-names: "Yizheng"
  - family-names: "Zhang"
    given-names: "Huaizheng"
  - family-names: "Li"
    given-names: "Yuanming"
  - family-names: "Lau"
    given-names: "Chiew Tong"
  - family-names: "You"
    given-names: "Yang"
  journal: "arXiv preprint arXiv:2207.09109"
  title: "Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI"
  year: 2022

GitHub Events

Total

Watch event: 4

Last Year

Watch event: 4

Committers

Last synced: 9 months ago

All Time

Total Commits: 88
Total Committers: 4
Avg Commits per committer: 22.0
Development Distribution Score (DDS): 0.159

Past Year

Commits: 5
Committers: 1
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Yizheng Huang	h**8@g**m	74
Huaizheng	H**1@e**g	7
allcontributors[bot]	4****]	6
Ikko Ashimine	e**r@g**m	1
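
The summary figures above can be reproduced from the commit counts in this table. The sketch below assumes that the Development Distribution Score is defined as 1 minus the top committer's share of all commits. That definition is an assumption, but it matches both the all-time and past-year values reported here. The function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total


all_time = [74, 7, 6, 1]  # commits per committer, from the table above
past_year = [5]           # a single committer with 5 commits

average_commits = sum(all_time) / len(all_time)
dds_all_time = round(development_distribution_score(all_time), 3)
dds_past_year = round(development_distribution_score(past_year), 3)

print(average_commits)  # 22.0
print(dds_all_time)     # 0.159
print(dds_past_year)    # 0.0
```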

Committer Domains (Top 20 + Academic)

e.ntu.edu.sg: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 44
Total pull requests: 40
Average time to close issues: 14 days
Average time to close pull requests: 3 days
Total issue authors: 4
Total pull request authors: 5
Average comments per issue: 0.61
Average comments per pull request: 0.48
Merged pull requests: 40
Bot issues: 0
Bot pull requests: 6

Past Year

Issues: 0
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 8 minutes
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

huangyz0918 (18)
jamnicki (3)
HuaizhengZhang (3)

Pull Request Authors

huangyz0918 (11)
YuanmingLeee (4)
HuaizhengZhang (4)
allcontributors[bot] (3)
eltociear (1)

Top Labels

Issue Labels

enhancement (5) feature (4) bug (3) discussion (1)

Pull Request Labels

bug (1) enhancement (1)

Dependencies

requirements.txt pypi

jina *
numpy *
opencv-python *
pillow *
pydantic *
pyyaml *
requests *
scikit_learn *
sentencepiece *
setuptools *
tqdm *
transformers *

.github/workflows/python-publish.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

setup.py pypi

.github/workflows/test.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

environment.yml conda

brotlipy 0.7.0.*
ca-certificates 2021.7.5.*
certifi 2021.5.30.*
cffi 1.14.6.*
chardet 4.0.0.*
conda-pack 0.6.0.*
cryptography 3.4.7.*
idna 2.10.*
libffi 3.3.*
ncurses 6.2.*
openssl 1.1.1.*
pip 21.1.3.*
pycosat 0.6.3.*
pycparser 2.20.*
pyopenssl 20.0.1.*
pysocks 1.7.1.*
python 3.9.5.*
readline 8.1.*
requests 2.25.1.*
ruamel_yaml 0.15.100.*
setuptools 52.0.0.*
six 1.16.0.*
sqlite 3.36.0.*
tk 8.6.10.*
tqdm 4.61.2.*
tzdata 2021a.*
urllib3 1.26.6.*
wheel 0.36.2.*
xz 5.2.5.*
yaml 0.2.5.*
zlib 1.2.11.*
