https://github.com/aliutkus/speechmetrics

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
○ .zenodo.json file
✓ DOI references: found 4 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org
✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: aliutkus
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 28.2 MB

Statistics

Stars: 968
Watchers: 21
Forks: 165
Open Issues: 27
Releases: 0

Created almost 7 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License

speechmetrics

This repository is a wrapper around several freely available implementations of objective metrics for estimating the quality of speech signals. It includes both relative and absolute metrics, which means metrics that do or do not need a reference signal, respectively.

If you find speechmetrics useful, you are welcome to cite the original papers for the corresponding metrics, since this is just a wrapper around the implementations that were kindly provided by the original authors.

Please let me know if you think of a metric with an available Python implementation that could be included here!

Installation

As of our recent tests, installation goes smoothly on Ubuntu, but there may be some compiler errors for pypesq on iOS.

Note that mosnet seems to be incompatible with numpy >= 1.24.

pip install numpy==1.23.4
pip install git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics

Usage

speechmetrics has been designed to be easily used in a modular way. All you need to do is to specify the actual metrics you want to use and it will load them.

The process is to:

1. Load the metrics you want with the `load` function from the root of the package, which takes two arguments:
    * `metrics`: str or list of str. The available metrics that match this argument will be automatically loaded. This matching is relative to the structure of the speechmetrics package. For instance:
        - `'absolute'` will match all absolute metrics
        - `'absolute.srmr'` or `'srmr'` will only match SRMR
        - `''` will match all
    * `window`: float or None. Gives the length in seconds of the windows on which to compute the actual scores. If None, the whole signals will be considered.

    `my_metrics = speechmetrics.load('relative', window=5)`

2. Call the object returned by `load` with your estimated file (and your reference in the case of relative metrics):

    `scores = my_metrics(path_to_estimate, path_to_reference)`

    Numpy arrays are also supported, but the corresponding sampling rate needs to be specified:

    `scores = my_metrics(estimate_array, reference_array, rate=sampling_rate)`

> WARNING: The convention for relative metrics is to provide the estimate first and the reference second.
> This is the opposite of the general convention.
> => The advantage is: you can still call absolute metrics with the same code, they will just ignore the reference.
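To make the matching rules for the `metrics` argument concrete, here is a minimal sketch in plain Python. It is not the package's own code: the `AVAILABLE` list and the `match_metrics` helper are hypothetical names introduced for illustration, and the sketch assumes exact matching on the family name, the short name, or the full dotted name. The real `load` function may match differently (for example on substrings).

```python
# Hypothetical registry: dotted names mirror the metric names listed in this README.
AVAILABLE = [
    "absolute.mosnet", "absolute.srmr",
    "relative.bsseval", "relative.pesq", "relative.nb_pesq",
    "relative.stoi", "relative.sisdr",
]


def match_metrics(query, available=AVAILABLE):
    """Return the dotted metric names selected by a str or a list of str."""
    queries = [query] if isinstance(query, str) else list(query)
    selected = []
    for name in available:
        family, short = name.split(".")
        for q in queries:
            # '' selects everything; otherwise require an exact match on
            # the family ('absolute'), the short name ('srmr') or the full name.
            if q in ("", family, short, name) and name not in selected:
                selected.append(name)
    return selected


print(match_metrics("absolute"))
print(match_metrics(["bsseval", "mosnet"]))
```

Under these assumptions, `'pesq'` selects only `relative.pesq` and not `relative.nb_pesq`, because the comparison is exact rather than substring-based.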

Example

```
# the case of absolute metrics
import speechmetrics
window_length = 5  # seconds
metrics = speechmetrics.load('absolute', window_length)
scores = metrics(path_to_audio_file)

# the case of relative metrics
metrics = speechmetrics.load(['bsseval', 'sisdr'], window_length)
scores = metrics(path_to_estimate_file, path_to_reference)

# mixed case, still works
metrics = speechmetrics.load(['bsseval', 'mosnet'], window_length)
scores = metrics(path_to_estimate_file, path_to_reference)
```
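The window length used above is given in seconds, so it has to be converted to a number of samples before a signal can be cut up. The sketch below shows one way this could work. It is an illustration only: `split_into_windows` is a hypothetical helper, and it assumes non-overlapping windows with a shorter trailing window kept as-is. The package may handle overlap or the last partial window differently.

```python
def split_into_windows(samples, rate, window=None):
    """Split a sequence of samples into consecutive windows.

    samples: sequence of audio samples
    rate:    sampling rate in Hz
    window:  window length in seconds, or None to keep the whole signal
    """
    if window is None:
        return [samples]
    size = int(round(window * rate))  # window length in samples
    if size <= 0:
        raise ValueError("window * rate must be at least one sample")
    # Assumption: no overlap, and the last (possibly shorter) window is kept.
    return [samples[i:i + size] for i in range(0, len(samples), size)]


signal = list(range(12))  # 12 samples at 2 Hz, i.e. 6 seconds
chunks = split_into_windows(signal, rate=2, window=5)
print([len(c) for c in chunks])
```

With a 5 second window at 2 Hz, each full window holds 10 samples, so the 12-sample signal yields one full window and one 2-sample remainder.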

Available metrics

Absolute metrics (`absolute`)

MOSNet (`absolute.mosnet` or `mosnet`)

dimensionless, higher is better. 0=very bad, 5=very good

As provided by the authors of MOSNet: Deep Learning based Objective Assessment for Voice Conversion. Original github here

@article{lo2019mosnet,
title={MOSNet: Deep Learning based Objective Assessment for Voice Conversion},
author={Lo, Chen-Chou and Fu, Szu-Wei and Huang, Wen-Chin and Wang, Xin and Yamagishi, Junichi and Tsao, Yu and Wang, Hsin-Min},
journal={arXiv preprint arXiv:1904.08352},
year={2019}
}

SRMR (`absolute.srmr` or `srmr`)

dimensionless ratio, higher is better. 0=very bad, 1=very good

As provided by the SRMR Toolbox, implemented by @jfsantos.

@article{falk2010non,
title={A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech},
author={Falk, Tiago H and Zheng, Chenxi and Chan, Wai-Yip},
journal={IEEE Transactions on Audio, Speech, and Language Processing},
volume={18},
number={7},
pages={1766--1774},
year={2010},
}
@inproceedings{santos2014updated,
title={An updated objective intelligibility estimation metric for normal hearing listeners under noise and reverberation},
author={Santos, Jo{\~a}o F and Senoussaoui, Mohammed and Falk, Tiago H},
booktitle={Proc. Int. Workshop Acoust. Signal Enhancement},
pages={55--59},
year={2014}
}
@article{santos2014updating,
title={Updating the SRMR-CI metric for improved intelligibility prediction for cochlear implant users},
author={Santos, Jo{\~a}o F and Falk, Tiago H},
journal={IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)},
volume={22},
number={12},
pages={2197--2206},
year={2014},
}

Relative metrics (`relative`)

BSSEval (`relative.bsseval` or `bsseval`)

expressed in dB, higher is better.

As presented in this paper and freely available in the official museval page, corresponds to BSSEval v4. There are 3 submetrics handled here: SDR, SAR, ISR.

@InProceedings{SiSEC18,
author="St{\"o}ter, Fabian-Robert and Liutkus, Antoine and Ito, Nobutaka",
title="The 2018 Signal Separation Evaluation Campaign",
booktitle="Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Surrey, UK",
year="2018",
pages="293--305"
}

PESQ (`relative.pesq` or `pesq`)

dimensionless, higher is better. 0=very bad, 5=very good

Wide band PESQ. As implemented there by @ludlows. Pranay Manocha: "[This implementation] matches with a very old matlab implementation of Phillip Loizou’s book. (I personally verified that)"

NBPESQ (`relative.nb_pesq` or `nb_pesq`)

dimensionless, higher is better. 0=very bad, 5=very good

Narrow band PESQ. As implemented there by @vBaiCai.

STOI (`relative.stoi` or `stoi`)

dimensionless correlation coefficient, higher is better. 0=very bad, 1=very good

As implemented by @mpariente here.

@inproceedings{taal2010short,
title={A short-time objective intelligibility measure for time-frequency weighted noisy speech},
author={Taal, Cees H and Hendriks, Richard C and Heusdens, Richard and Jensen, Jesper},
booktitle={2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
pages={4214--4217},
year={2010},
organization={IEEE}
}
@article{taal2011algorithm,
title={An algorithm for intelligibility prediction of time--frequency weighted noisy speech},
author={Taal, Cees H and Hendriks, Richard C and Heusdens, Richard and Jensen, Jesper},
journal={IEEE Transactions on Audio, Speech, and Language Processing},
volume={19},
number={7},
pages={2125--2136},
year={2011},
publisher={IEEE}
}
@article{jensen2016algorithm,
title={An algorithm for predicting the intelligibility of speech masked by modulated noise maskers},
author={Jensen, Jesper and Taal, Cees H},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume={24},
number={11},
pages={2009--2022},
year={2016},
publisher={IEEE}
}

SISDR: Scale-invariant SDR (`relative.sisdr` or `sisdr`)

expressed in dB, higher is better.

As described in the following paper and implemented by @Jonathan-LeRoux here.

@article{Roux_2019,
title={SDR – Half-baked or Well Done?},
ISBN={9781479981311},
url={http://dx.doi.org/10.1109/ICASSP.2019.8683855},
DOI={10.1109/icassp.2019.8683855},
journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher={IEEE},
author={Roux, Jonathan Le and Wisdom, Scott and Erdogan, Hakan and Hershey, John R.},
year={2019},
month={May}
}
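For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the SI-SDR definition from the paper cited above: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. It is not the implementation wrapped by this package. The function name `si_sdr` is chosen for illustration, and the sketch assumes equal-length, zero-mean signals (no mean removal is applied).

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sequences of samples."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference must not be silent")
    # Optimal scaling of the reference: <estimate, reference> / ||reference||^2
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return math.inf   # estimate is a scaled copy of the reference
    if target_energy == 0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10 * math.log10(target_energy / residual_energy)


reference = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, 0.1, -0.1, -0.1]  # orthogonal to the reference
estimate = [r + n for r, n in zip(reference, noise)]
print(si_sdr(estimate, reference))
print(si_sdr([2 * e for e in estimate], reference))
```

Both calls give approximately 20 dB, which illustrates the scale invariance: rescaling the estimate does not change the score.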

Owner

Name: Antoine Liutkus
Login: aliutkus
Kind: user
Location: France
Company: @INRIA

Repositories: 7
Profile: https://github.com/aliutkus

Researcher at Inria

GitHub Events

Total

Issues event: 1
Watch event: 98
Fork event: 15

Last Year

Issues event: 1
Watch event: 98
Fork event: 15

Committers

Last synced: about 1 year ago

All Time

Total Commits: 56
Total Committers: 5
Avg Commits per committer: 11.2
Development Distribution Score (DDS): 0.589

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Lo Chenchou	l****u	23
Antoine Liutkus	a**s@i**r	21
mpariente	p**l@g**m	8
Joseph Turian	t**n@g**m	3
刘濠赫	l**7@b**m	1
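
The all-time figures above can be reproduced from the per-committer commit counts in this table. The sketch below assumes that the Development Distribution Score is computed as one minus the top committer's share of all commits. That definition is an assumption, not something stated on this page, but it agrees with the reported value of 0.589.

```python
# Commit counts copied from the Top Committers table above.
commits = {
    "Lo Chenchou": 23,
    "Antoine Liutkus": 21,
    "mpariente": 8,
    "Joseph Turian": 3,
    "刘濠赫": 1,
}

total = sum(commits.values())
average = total / len(commits)
# Assumed definition: DDS = 1 - (commits by top committer / total commits)
dds = 1 - max(commits.values()) / total

print(total, round(average, 1), round(dds, 3))
```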

Committer Domains (Top 20 + Academic)

bytedance.com: 1
inria.fr: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 44
Total pull requests: 14
Average time to close issues: about 22 hours
Average time to close pull requests: about 1 month
Total issue authors: 26
Total pull request authors: 5
Average comments per issue: 1.89
Average comments per pull request: 1.21
Merged pull requests: 10
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

a897456 (3)
mpariente (3)
BilalDendani (2)
hidehowever1 (2)
ruaruaruabick (2)
haoheliu (1)
dmumtaz (1)
divineSix (1)
zuowanbushiwo (1)
LuWei6896 (1)
yujiren (1)
hcy96 (1)
chankl3579 (1)
williamluer (1)
1902716153 (1)

Pull Request Authors

mpariente (5)
turian (1)
jonashaag (1)
jakeChal (1)
haoheliu (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

setup.py pypi

gammatone *
museval *
numpy *
pesq *
pypesq *
pystoi *
resampy *
scipy *
srmrpy *
tqdm *
