https://github.com/bootphon/speech-map

Mean Average Precision over words or n-grams with speech features

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bootphon
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 22.1 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

Mean Average Precision over words or n-grams with speech features

Compute the Mean Average Precision (MAP) with speech features.

This is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
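To make the metric concrete: in the cited paper, MAP@R for one query is the precision at each rank i (up to R) where the i-th neighbour has the query's label, summed and divided by R, with R the number of other items sharing that label. The sketch below is a brute-force illustration in plain Python, not this package's implementation. The function name `map_at_r`, the squared Euclidean distance, and the choice to skip queries whose label occurs only once are all assumptions made for the example.

```python
from collections import Counter


def map_at_r(embeddings, labels):
    """Brute-force MAP@R over lists of vectors and labels (illustrative only)."""
    counts = Counter(labels)
    scores = []
    for q, (query, label) in enumerate(zip(embeddings, labels)):
        r = counts[label] - 1  # other items with the same label
        if r == 0:
            continue  # assumption: a query with no possible match is skipped
        # Rank every other item by squared Euclidean distance to the query.
        others = [i for i in range(len(labels)) if i != q]
        others.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(query, embeddings[i])))
        hits, total = 0, 0.0
        for rank, i in enumerate(others[:r], start=1):
            if labels[i] == label:
                hits += 1
                total += hits / rank  # precision at this rank, counted on hits only
        scores.append(total / r)
    if not scores:
        raise ValueError("no label occurs more than once")
    return sum(scores) / len(scores)


print(map_at_r([[0, 0], [0, 1], [5, 5], [5, 6]], ["a", "a", "b", "b"]))
```

This loop is quadratic in the number of items, which is why a dedicated k-NN backend matters on real data.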

Installation

This package is available on PyPI:

```bash
pip install speech-map
```

The Faiss backend is much more efficient for the k-NN search than the naive PyTorch backend. Since Faiss is not available on PyPI, you can install this package in a conda environment with your preferred conda variant:

  • CPU version: `micromamba create -f environment-cpu.yaml`
  • GPU version: `CONDA_OVERRIDE_CUDA=12.6 micromamba create -f environment-gpu.yaml`
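If a script has to run in both kinds of environment, one option is to check whether Faiss is importable and fall back to PyTorch otherwise. This is a minimal sketch: `pick_backend` is a hypothetical helper, not part of this package, and the returned strings simply mirror the `--backend` choices of the CLI.

```python
import importlib.util


def pick_backend():
    """Return "FAISS" when the faiss module can be imported, else "TORCH"."""
    # find_spec returns None when no top-level module named "faiss" is installed.
    if importlib.util.find_spec("faiss") is not None:
        return "FAISS"
    return "TORCH"


print(pick_backend())
```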

Usage

CLI

```
❯ python -m speech_map --help
usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl

Mean Average Precision over n-grams / words with speech features

positional arguments:
  features              Path to the directory with pre-computed features
  jsonl                 Path to the JSONL file with annotations

options:
  -h, --help            show this help message and exit
  --pooling {MEAN,MAX,MIN,HAMMING}
                        Pooling (default: MEAN)
  --frequency FREQUENCY
                        Feature frequency in Hz (default: 50 Hz)
  --backend {FAISS,TORCH}
                        KNN (default: FAISS)
```
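For readers who want to see how the arguments fit together, the help text above can be mirrored with a small `argparse` parser. This is a reconstruction from the help output only, not the package's actual parser. The `build_parser` name, the `float` type for `--frequency`, and the example paths are assumptions.

```python
import argparse


def build_parser():
    """Parser reconstructed from the --help output (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="speech_map",
        description="Mean Average Precision over n-grams / words with speech features",
    )
    parser.add_argument("features", help="Path to the directory with pre-computed features")
    parser.add_argument("jsonl", help="Path to the JSONL file with annotations")
    parser.add_argument("--pooling", choices=["MEAN", "MAX", "MIN", "HAMMING"], default="MEAN")
    # Assumption: the frequency is parsed as a number; the real type is not documented here.
    parser.add_argument("--frequency", type=float, default=50)
    parser.add_argument("--backend", choices=["FAISS", "TORCH"], default="FAISS")
    return parser


# Hypothetical paths; both positional arguments are required, options fall back to defaults.
args = build_parser().parse_args(["features/", "words.jsonl", "--pooling", "MAX"])
print(args.features, args.jsonl, args.pooling, args.frequency, args.backend)
```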

Python API

You most probably need only two functions: build_embeddings_and_labels and mean_average_precision. Use them like this:

```python
from speech_map import build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
print(mean_average_precision(embeddings, labels))
```

In this example, path_to_features is a path to a directory containing features stored in individual PyTorch tensor files, and path_to_jsonl is the path to the JSONL annotations file.

You can also use those functions in a more advanced setting like this:

```python
from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(
    path_to_features,
    path_to_jsonl,
    pooling=Pooling.MAX,
    frequency=100,
    feature_maker=my_model,
    file_extension=".wav",
)
print(mean_average_precision(embeddings, labels))
```
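The pooling and frequency options can be pictured as follows: the frequency converts a segment's time boundaries into frame indices, and the pooling collapses the selected frames into one vector. The sketch below is plain Python and only illustrates the idea. `pool_segment`, the rounding of the boundaries, and the reading of HAMMING as a Hamming-window weighted average are assumptions; the actual behaviour is in the package source.

```python
import math


def pool_segment(frames, start, end, frequency=50, pooling="MEAN"):
    """Pool the frames between start and end (in seconds) into one vector."""
    first = int(start * frequency)
    last = max(first + 1, int(end * frequency))  # keep at least one frame
    segment = frames[first:last]
    columns = list(zip(*segment))  # one tuple of values per feature dimension
    if pooling == "MEAN":
        return [sum(c) / len(c) for c in columns]
    if pooling == "MAX":
        return [max(c) for c in columns]
    if pooling == "MIN":
        return [min(c) for c in columns]
    if pooling == "HAMMING":
        n = len(segment)
        # Standard Hamming window; a single frame gets weight 1.
        weights = [1.0] if n == 1 else [
            0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)
        ]
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, c)) / total for c in columns]
    raise ValueError(f"unknown pooling: {pooling}")


frames = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
print(pool_segment(frames, 0.0, 0.5, frequency=4, pooling="MAX"))
```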

This is a minimal package, and you can easily go through the code in src/speech_map/core.py if you want to check the details.

Data

The data directory contains the word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.

We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.

References

MAP for speech representations:

```bibtex
@inproceedings{carlin11_interspeech,
  title = {Rapid evaluation of speech representations for spoken term discovery},
  author = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
  year = {2011},
  booktitle = {Interspeech 2011},
  pages = {821--824},
  doi = {10.21437/Interspeech.2011-304},
  issn = {2958-1796},
}
```

Data and original implementation:

```bibtex
@inproceedings{algayres20_interspeech,
  title = {Evaluating the Reliability of Acoustic Speech Embeddings},
  author = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
  year = {2020},
  booktitle = {Interspeech 2020},
  pages = {4621--4625},
  doi = {10.21437/Interspeech.2020-2362},
  issn = {2958-1796},
}
```

Owner

  • Name: CoML
  • Login: bootphon
  • Kind: organization
  • Email: syntheticlearner@gmail.com
  • Location: Paris, France

GitHub Events

Total
  • Public event: 1
  • Push event: 4
Last Year
  • Public event: 1
  • Push event: 4

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • astral-sh/ruff-action v3 composite
  • astral-sh/setup-uv v5 composite
pyproject.toml pypi
  • numpy ==1.26.4
  • polars >=1.30.0
  • torch >=2.6.0
  • tqdm >=4.67.1
uv.lock pypi
  • appnope 0.1.4
  • asttokens 3.0.0
  • cffi 1.17.1
  • colorama 0.4.6
  • comm 0.2.2
  • debugpy 1.8.14
  • decorator 5.2.1
  • executing 2.2.0
  • filelock 3.18.0
  • fsspec 2025.5.1
  • ipykernel 6.29.5
  • ipython 9.3.0
  • ipython-pygments-lexers 1.1.1
  • jedi 0.19.2
  • jinja2 3.1.6
  • jupyter-client 8.6.3
  • jupyter-core 5.8.1
  • markupsafe 3.0.2
  • matplotlib-inline 0.1.7
  • mpmath 1.3.0
  • nest-asyncio 1.6.0
  • networkx 3.5
  • numpy 1.26.4
  • nvidia-cublas-cu12 12.6.4.1
  • nvidia-cuda-cupti-cu12 12.6.80
  • nvidia-cuda-nvrtc-cu12 12.6.77
  • nvidia-cuda-runtime-cu12 12.6.77
  • nvidia-cudnn-cu12 9.5.1.17
  • nvidia-cufft-cu12 11.3.0.4
  • nvidia-cufile-cu12 1.11.1.6
  • nvidia-curand-cu12 10.3.7.77
  • nvidia-cusolver-cu12 11.7.1.2
  • nvidia-cusparse-cu12 12.5.4.2
  • nvidia-cusparselt-cu12 0.6.3
  • nvidia-nccl-cu12 2.26.2
  • nvidia-nvjitlink-cu12 12.6.85
  • nvidia-nvtx-cu12 12.6.77
  • packaging 25.0
  • parso 0.8.4
  • pexpect 4.9.0
  • platformdirs 4.3.8
  • polars 1.30.0
  • prompt-toolkit 3.0.51
  • psutil 7.0.0
  • ptyprocess 0.7.0
  • pure-eval 0.2.3
  • pycparser 2.22
  • pygments 2.19.1
  • python-dateutil 2.9.0.post0
  • pywin32 310
  • pyzmq 26.4.0
  • ruff 0.12.0
  • setuptools 80.9.0
  • six 1.17.0
  • speech-map 0.1.0
  • stack-data 0.6.3
  • sympy 1.14.0
  • torch 2.7.0
  • tornado 6.5.1
  • tqdm 4.67.1
  • traitlets 5.14.3
  • triton 3.3.0
  • typing-extensions 4.13.2
  • wcwidth 0.2.13