https://github.com/bootphon/speech-map

Mean Average Precision over words or n-grams with speech features

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: bootphon
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 22.1 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed 10 months ago

Metadata Files

Readme License

Mean Average Precision over words or n-grams with speech features

Compute the Mean Average Precision (MAP) with speech features.

This is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
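
For a query with R other samples of the same class, MAP@R averages, over ranks 1 to R, the precision at each rank where the retrieved neighbour is correct (and 0 where it is not). The following is a minimal, dependency-free sketch of that definition, not the package's implementation: the function names `cosine_similarity` and `map_at_r`, the choice of cosine similarity, and the decision to skip queries with no same-class sample are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def map_at_r(embeddings, labels):
    """Mean of MAP@R over all queries (brute-force neighbour search)."""
    scores = []
    for q, (query, label) in enumerate(zip(embeddings, labels)):
        # Every other sample is a candidate; the query itself is excluded.
        candidates = [i for i in range(len(embeddings)) if i != q]
        # R is the number of other samples sharing the query's label.
        r = sum(labels[i] == label for i in candidates)
        if r == 0:
            continue  # assumption: queries without a same-class sample are skipped
        ranked = sorted(
            candidates,
            key=lambda i: cosine_similarity(query, embeddings[i]),
            reverse=True,
        )
        hits, total = 0, 0.0
        for rank, i in enumerate(ranked[:r], start=1):
            if labels[i] == label:
                hits += 1
                total += hits / rank  # precision at this rank, counted only on a hit
        scores.append(total / r)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_labels = ["a", "a", "b", "b"]
    print(map_at_r(toy_embeddings, toy_labels))
```

The quadratic loop is only practical for toy inputs; on real data the neighbour search is the expensive step, which is why a dedicated k-NN backend matters.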

Installation

This package is available on PyPI:

```bash
pip install speech-map
```

The Faiss backend is much more efficient for the k-NN search than the naive PyTorch backend. Since Faiss is not available on PyPI, you can install this package in a conda environment with the conda variant of your choice:

CPU version:

```bash
micromamba create -f environment-cpu.yaml
```

GPU version:

```bash
CONDA_OVERRIDE_CUDA=12.6 micromamba create -f environment-gpu.yaml
```

Usage

CLI

```
❯ python -m speech_map --help
usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl

Mean Average Precision over n-grams / words with speech features

positional arguments:
  features              Path to the directory with pre-computed features
  jsonl                 Path to the JSONL file with annotations

options:
  -h, --help            show this help message and exit
  --pooling {MEAN,MAX,MIN,HAMMING}
                        Pooling (default: MEAN)
  --frequency FREQUENCY
                        Feature frequency in Hz (default: 50 Hz)
  --backend {FAISS,TORCH}
                        KNN (default: FAISS)
```

Python API

You most probably need only two functions: build_embeddings_and_labels and mean_average_precision. Use them like this:

```python
from speech_map import build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
print(mean_average_precision(embeddings, labels))
```

In this example, path_to_features is a path to a directory containing features stored in individual PyTorch tensor files, and path_to_jsonl is the path to the JSONL annotations file.

You can also use those functions in a more advanced setting like this:

```python
from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(
    path_to_features,
    path_to_jsonl,
    pooling=Pooling.MAX,
    frequency=100,
    feature_maker=my_model,
    file_extension=".wav",
)
print(mean_average_precision(embeddings, labels))
```
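
To give an intuition for the pooling and frequency options, here is a hedged sketch of how frame-level features could be reduced to one embedding per annotated segment. It is not taken from the package: the helper names `hamming_window` and `pool_segment`, the use of onset and offset times in seconds, the rounding of times to frame indices, and the reading of HAMMING as a Hamming-window weighted average are all assumptions. Check src/speech_map/core.py for the actual behaviour.

```python
import math


def hamming_window(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def pool_segment(frames, onset, offset, frequency=50, pooling="MEAN"):
    """Pool the frames between onset and offset (seconds) into a single vector.

    frames is a list of per-frame feature vectors sampled at `frequency` Hz.
    """
    # Assumption: times map to frame indices by rounding time * frequency.
    start = round(onset * frequency)
    end = max(start + 1, round(offset * frequency))  # keep at least one frame
    segment = frames[start:end]
    columns = list(zip(*segment))  # one tuple per feature dimension

    if pooling == "MEAN":
        return [sum(c) / len(c) for c in columns]
    if pooling == "MAX":
        return [max(c) for c in columns]
    if pooling == "MIN":
        return [min(c) for c in columns]
    if pooling == "HAMMING":
        # Assumption: weighted average with weights from a Hamming window.
        weights = hamming_window(len(segment))
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, c)) / total for c in columns]
    raise ValueError(f"Unknown pooling: {pooling}")


if __name__ == "__main__":
    toy_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    # At 50 Hz, the interval from 0.02 s to 0.08 s covers frames 1 to 3.
    print(pool_segment(toy_frames, 0.02, 0.08, frequency=50, pooling="MAX"))
```

With features at 100 Hz instead of 50 Hz, the same time interval covers twice as many frames, which is why the frequency has to match the model that produced the features.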

This is a minimal package, and you can easily go through the code in src/speech_map/core.py if you want to check the details.

Data

The data directory contains the word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.

We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.

References

MAP for speech representations:

```bibtex
@inproceedings{carlin11_interspeech,
  title     = {Rapid evaluation of speech representations for spoken term discovery},
  author    = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
  year      = {2011},
  booktitle = {Interspeech 2011},
  pages     = {821--824},
  doi       = {10.21437/Interspeech.2011-304},
  issn      = {2958-1796},
}
```

Data and original implementation:

```bibtex
@inproceedings{algayres20_interspeech,
  title     = {Evaluating the Reliability of Acoustic Speech Embeddings},
  author    = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
  year      = {2020},
  booktitle = {Interspeech 2020},
  pages     = {4621--4625},
  doi       = {10.21437/Interspeech.2020-2362},
  issn      = {2958-1796},
}
```

Owner

Name: CoML
Login: bootphon
Kind: organization
Email: syntheticlearner@gmail.com
Location: Paris, France

Website: https://cognitive-ml.fr
Repositories: 55
Profile: https://github.com/bootphon

GitHub Events

Total

Public event: 1
Push event: 4

Last Year

Public event: 1
Push event: 4

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Dependencies

.github/workflows/release.yml actions

actions/checkout v4 composite
actions/upload-artifact v4 composite
astral-sh/ruff-action v3 composite
astral-sh/setup-uv v5 composite

pyproject.toml pypi

numpy ==1.26.4
polars >=1.30.0
torch >=2.6.0
tqdm >=4.67.1

uv.lock pypi

appnope 0.1.4
asttokens 3.0.0
cffi 1.17.1
colorama 0.4.6
comm 0.2.2
debugpy 1.8.14
decorator 5.2.1
executing 2.2.0
filelock 3.18.0
fsspec 2025.5.1
ipykernel 6.29.5
ipython 9.3.0
ipython-pygments-lexers 1.1.1
jedi 0.19.2
jinja2 3.1.6
jupyter-client 8.6.3
jupyter-core 5.8.1
markupsafe 3.0.2
matplotlib-inline 0.1.7
mpmath 1.3.0
nest-asyncio 1.6.0
networkx 3.5
numpy 1.26.4
nvidia-cublas-cu12 12.6.4.1
nvidia-cuda-cupti-cu12 12.6.80
nvidia-cuda-nvrtc-cu12 12.6.77
nvidia-cuda-runtime-cu12 12.6.77
nvidia-cudnn-cu12 9.5.1.17
nvidia-cufft-cu12 11.3.0.4
nvidia-cufile-cu12 1.11.1.6
nvidia-curand-cu12 10.3.7.77
nvidia-cusolver-cu12 11.7.1.2
nvidia-cusparse-cu12 12.5.4.2
nvidia-cusparselt-cu12 0.6.3
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.6.77
packaging 25.0
parso 0.8.4
pexpect 4.9.0
platformdirs 4.3.8
polars 1.30.0
prompt-toolkit 3.0.51
psutil 7.0.0
ptyprocess 0.7.0
pure-eval 0.2.3
pycparser 2.22
pygments 2.19.1
python-dateutil 2.9.0.post0
pywin32 310
pyzmq 26.4.0
ruff 0.12.0
setuptools 80.9.0
six 1.17.0
speech-map 0.1.0
stack-data 0.6.3
sympy 1.14.0
torch 2.7.0
tornado 6.5.1
tqdm 4.67.1
traitlets 5.14.3
triton 3.3.0
typing-extensions 4.13.2
wcwidth 0.2.13
