Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: lchen001
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 1010 KB
Statistics
- Stars: 17
- Watchers: 3
- Forks: 2
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
-----
[](https://github.com/pre-commit/pre-commit)
[](https://github.com/RichardLitt/standard-readme)
[](https://arxiv.org/abs/2209.08443)
[](LICENSE)
[]()
A longitudinal database of ML API predictions.
[**Getting Started**](#%EF%B8%8F-quickstart)
| [**Website**](http://hapi.stanford.edu/)
| [**Contributing**](CONTRIBUTING.md)
| [**About**](#%EF%B8%8F-about)
💡 What is HAPI?

History of APIs (HAPI) is a large-scale, longitudinal database of commercial ML API predictions. It contains 1.7 million predictions collected from 2020 to 2022 and spanning APIs from Amazon, Google, IBM, and Microsoft. The database include diverse machine learning tasks including image tagging, speech recognition and text mining.
⚡️ Quickstart
We provide a lightweight python package for getting started with HAPI.
Read the guide below or follow along in Google Colab:
bash
pip install "hapi @ git+https://github.com/lchen001/hapi@main"
Import the library and download the data, optionally specifying the directory for the
the download. If the directory is not specified, the data will be downloaded to ~/.hapi.
```python
import hapi
hapi.config.data_dir = "/path/to/data/dir"
hapi.download() ```
You can permanently set the data directory by adding the variable
HAPI_DATA_DIRto your environment.
Once we've downloaded the database, we can list the available APIs, datasets, and tasks with hapi.summary(). This returns a Pandas DataFrame with columns task, dataset, api, date, path, cost_per_10k.
```python
df = hapi.summary() ```
To load the predictions into memory we use hapi.get_predictions(). The keyword arguments allow us to load predictions for a subset of tasks, datasets, apis and/or dates.
```python
predictions = hapi.getpredictions(task="mic", dataset="pascal", api=["googlemic", "ibm_mic"]) ```
The predictions are returned as a dictionary mapping from "{task}/{dataset}/{api}/{date}" to lists of dictionaries, each with keys "example_id", "predicted_label", and "confidence". For example:
python
{
"mic/pascal/google_mic/20-10-28": [
{
'confidence': 0.9798267782,
'example_id': '2011_000494',
'predicted_label': ['bird', 'bird']
},
...
],
"mic/pascal/microsoft_mic/20-10-28": [...],
...
}
To load the labels into memory we use hapi.get_labels(). The keyword arguments allow us to load labels for a subset of tasks and datasets.
```python
labels = hapi.get_labels(task="mic", dataset="pascal") ```
The labels are returned as a dictionary mapping from "{task}/{dataset}" to lists of dictionaries, each with keys "example_id" and "true_label".
💾 Manual Downloading
In this section, we discuss how to download the database without the HAPI Python API.
The database is stored in a GCP bucket named hapi-data. All model predictions are stored in hapi.tar.gz (Compressed size: 205.3MB, Full size: 1.2GB).
From the command line, you can download and extract the predictions with:
bash
wget https://storage.googleapis.com/hapi-data/hapi.tar.gz && tar -xzvf hapi.tar.gz
However, we recommend downloading using the Python API as described above.
🌍 Datasets
In this section, we discuss how to download the benchmark datasets used in HAPI.
The predictions in HAPI are made on benchmark datasets from across the machine learning community. For example, HAPI includes predictions on PASCAL, a popular object detection dataset. Unfortunately, we lack the permissions required to redistribute these datasets, so we do not include the raw data in the download described above.
We provide instructions on how to download each of the datasets and, for a growing number of them, we provide automated scripts that can download the dataset. These scripts are implemented in the Meerkat Dataset Registry – a registry of machine learning datasets (similar to Torchvision Datasets).
To download a dataset and load it into memory, use hapi.get_dataset():
```python
import hapi dp = hapi.getdataset("pascal") ``` This returns a Meerkat DataPanel – a DataFrame-like object that houses the dataset. See the Meerkat User Guide for more information. The DataPanel will have an "exampleid" column that corresponds to the "exampleid" key in the outputs of `hapi.getpredictions()
andhapi.get_labels()`.
If the dataset is not yet available through the Meerkat Dataset Registry, a ValueError will be raised containing instructions for manually downloading the dataset. For example:
```python
dp = hapi.get_dataset("cmd")
ValueError: Data download for 'cmd' not yet available for download through the HAPI Python API. Please download manually following the instructions below:
CMD is a spoken command recognition dataset.
It can be downloaded here: https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.datasets.googlespeechcommands.html. ```
✉️ About
HAPI was developed at Stanford in the Zou Group. Reach out to Lingjiao Chen (lingjiao [at] stanford [dot] edu) and Sabri Eyuboglu (eyuboglu [at] stanford [dot] edu) if you would like to get involved!
Owner
- Login: lchen001
- Kind: user
- Repositories: 2
- Profile: https://github.com/lchen001
Citation (CITATION.cff)
cff-version: 1.2.0 message: "If you use this benchmark, please cite it as below." authors: - family-names: Chen given-names: Lingjiao - family-names: Eyuboglu given-names: Sabri orcid: "https://orcid.org/0000-0002-8412-0266" - family-names: Jin given-names: Zhihua - family-names: Ré given-names: Christopher - family-names: Zaharia given-names: Matei - family-names: Zou given-names: James title: "hapi" version: 1.0.0 doi: 10.5281/zenodo.1234 date-released: 2021-11-29 url: "https://github.com/lchen001/HAPI"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 1
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- yueyu1030 (1)
Pull Request Authors
- TrellixVulnTeam (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- dcbench * develop
- ipython * develop
- twine * develop
- dcbench *
- 144 dependencies