nisqa-s

https://github.com/deepvk/nisqa-s

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: deepvk
License: apache-2.0
Language: Python
Default Branch: main
Size: 8.45 MB

Statistics

Stars: 40
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

NISQA-s: Speech Quality and Naturalness Assessment for Online Inference

NISQA-s is a heavily stripped-down and optimized version of the original NISQA metric. It aims to provide a universal set of metrics for both offline and online evaluation of audio quality.

This version supports only the CNN+LSTM variant of the original model, since the other variants either do not support streaming or run too slowly. It uses the same architecture with some tweaks for streaming. There is also no separate MOS-only model, since the main model already predicts MOS; this keeps the code and repo simple.

Installation

(Optional) Create a new venv or conda env.

Then run pip install -r requirements.txt

Note that the torch installation may fail on some systems. If it does, follow the official PyTorch installation instructions.

Quick start

Please note that the provided checkpoint was trained on 48 kHz files. Upsample your files if they have a lower sample rate.
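The upsampling step can be sketched as follows. This is a minimal example, not part of the repo: it assumes a mono signal held in a NumPy array and uses scipy.signal.resample_poly (scipy is already in requirements.txt). The helper name upsample_to_48k is hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 48_000  # sample rate the provided checkpoint was trained on


def upsample_to_48k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Resample a mono signal to 48 kHz with a polyphase filter.

    Hypothetical helper, not part of the nisqa-s repo.
    """
    if sr == TARGET_SR:
        return audio
    g = gcd(TARGET_SR, sr)
    # resample_poly upsamples by `up`, low-pass filters, then downsamples by `down`.
    return resample_poly(audio, up=TARGET_SR // g, down=sr // g)


# One second of a 440 Hz tone at 16 kHz, resampled to 48 kHz.
t = np.arange(16_000) / 16_000
tone_16k = np.sin(2 * np.pi * 440 * t)
tone_48k = upsample_to_48k(tone_16k, 16_000)
print(tone_48k.shape)
```

For real files, the array and its sample rate can be read with soundfile.read and the result written back with soundfile.write before pointing the config at the new file.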

To run this repo with the provided config and samples: python -m scripts.run_infer_file

To test online inference from your mic: python -m scripts.run_infer_mic. This logs inference results to the terminal, so keep an eye on it.

Config options

The default config is config/nisqa_s.yaml. It holds all settings related to training and inference. Each parameter is commented in detail there, so we cover only the most important ones for inference:

ckp: path to the trained checkpoint (weights/nisqa_s.tar by default)
sample: path to the file to evaluate

If you plan to run online inference, pay close attention to the last 4 arguments in this config:

frame sets the length of the buffer fed into the model;
updates makes the model output metrics more often (check the argument description in the config);
sd_device is the ID of the input device, needed if you want to run on a non-default device (e.g. a sound-card mic). The first run of run_infer_mic.py will show you the available IDs;
sd_dump lets you save the mic input so you can check the results offline later.
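One way to picture how frame and updates could interact is a sliding buffer. The sketch below is an assumption for illustration, not the repo's implementation: it treats frame as a buffer length in seconds and emits a refreshed buffer updates times per frame. The real semantics are described in the comments of config/nisqa_s.yaml and may differ. The function name sliding_buffers is hypothetical.

```python
import numpy as np


def sliding_buffers(stream, sample_rate, frame_sec, updates):
    """Yield the most recent `frame_sec` seconds of audio, `updates` times per frame.

    ASSUMPTION: hop = frame length / updates. The repo's actual handling of
    `frame` and `updates` may differ; see config/nisqa_s.yaml.
    """
    if updates < 1:
        raise ValueError("updates must be >= 1")
    frame_len = int(frame_sec * sample_rate)
    hop = frame_len // updates
    buffer = np.zeros(frame_len, dtype=stream.dtype)  # starts as silence
    for start in range(0, len(stream) - hop + 1, hop):
        # Drop the oldest `hop` samples and append the newest chunk.
        buffer = np.concatenate([buffer[hop:], stream[start:start + hop]])
        yield buffer


# 4 s of stand-in "mic" samples at 48 kHz, 2 s buffer, 4 updates per buffer.
mic = np.arange(4 * 48_000, dtype=np.float32)
windows = list(sliding_buffers(mic, 48_000, frame_sec=2, updates=4))
print(len(windows), windows[-1].shape)
```

In this picture a model would be called once per yielded buffer, so a larger updates value gives more frequent predictions at the cost of more compute per second of audio.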

Finally, you can run a custom config for your experiments: add the --yaml argument to python -m scripts.run_infer_file or python -m scripts.run_infer_mic and provide the path to your own config, e.g. python -m scripts.run_infer_file --yaml path/to/custom/config.yaml
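If you keep several custom configs, the same command can be driven from a small script. This is a sketch under assumptions: the helper names build_infer_command and run_all are hypothetical, and the script is assumed to be launched from the repository root so that the scripts package resolves. Only the --yaml flag and the module names come from this README.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(config_path, module="scripts.run_infer_file"):
    """Build the command line for one inference run with a custom config."""
    return [sys.executable, "-m", module, "--yaml", str(config_path)]


def run_all(config_dir):
    """Run file inference once per *.yaml config found in `config_dir`."""
    for cfg in sorted(Path(config_dir).glob("*.yaml")):
        # check=True stops the batch on the first failing run.
        subprocess.run(build_infer_command(cfg), check=True)


# Show the command that would be executed for a single config.
print(build_infer_command("path/to/custom/config.yaml"))
```

Calling run_all on a directory of configs (a hypothetical layout) then evaluates each one in turn.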

Training

We provide a simple interface for training your own version of NISQA-s.

First, you will need the dataset. You can obtain it from the official NISQA repo. This is probably the only (and definitely the best) way to train the model, since the data needs to be labeled in a very specific way for training to work.

To train the same version as the one provided: python -m scripts.run_train

Remember to check the experiment name in nisqa_s.yaml, the path to the NISQA Corpus in data_dir, and the path where the model is saved (output_dir).
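A quick pre-flight check of these settings might look like the sketch below. It is an illustration, not part of the repo: it works on an already loaded config dict, the data_dir and output_dir keys come from this README, and the name key for the experiment name is an assumption about how the config spells it.

```python
from pathlib import Path


def check_training_paths(cfg: dict) -> list[str]:
    """Return a list of problems found in the training-related settings.

    Hypothetical helper. The "name" key is an ASSUMPTION; check the actual
    key for the experiment name in config/nisqa_s.yaml.
    """
    problems = []
    data_dir = cfg.get("data_dir")
    if not data_dir or not Path(data_dir).is_dir():
        problems.append(f"data_dir does not point to a directory: {data_dir!r}")
    output_dir = cfg.get("output_dir")
    if not output_dir:
        problems.append("output_dir is not set")
    else:
        # Create the output directory up front so training cannot fail on it later.
        Path(output_dir).mkdir(parents=True, exist_ok=True)
    if not cfg.get("name"):
        problems.append("experiment name is empty")
    return problems


# An empty config triggers all three checks.
print(check_training_paths({}))
```

The dict can come from loading config/nisqa_s.yaml with PyYAML's yaml.safe_load before starting python -m scripts.run_train.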

Training and model parameters in config

Since you are most probably using the NISQA Corpus, there is no need to change anything in Dataset options. If you use a custom dataset, refer to this guide.
Training options contains all parameters related to the training setup (learning rates, batch size, etc.).
You can also experiment with bias loss by enabling Bias loss options.
Change Mel-Specs options if you want to experiment with different sample rates, Fourier lengths or sample lengths for training (although lowering ms_max_length is strongly discouraged because of how the NISQA Corpus is labeled).
CNN parameters and LSTM parameters: change these to experiment with different settings of the convolutional and recurrent layers.

Note that the provided checkpoint was trained with the provided config.

Citations

@article{Mittag_Naderi_Chehadi_Möller_2021,
  title={Nisqa: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets},
  DOI={10.21437/interspeech.2021-299},
  journal={Interspeech 2021},
  author={Mittag, Gabriel and Naderi, Babak and Chehadi, Assmaa and Möller, Sebastian},
  year={2021}
}

@misc{deepvk2024nisqa,
  author = {Ivan, Beskrovnyi},
  title = {nisqa-s},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/deepvk/nisqa-s}
}

Owner

Name: Deep VK
Login: deepvk
Kind: organization

Repositories: 3
Profile: https://github.com/deepvk

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: NISQA-s
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ivan
    family-names: Beskrovnyi
    email: i.beskrovnyy@vk.team
    affiliation: 'deepvk, VK'
identifiers:
  - type: url
    value: 'https://github.com/deepvk/nisqa-s'
    description: Repository
repository-code: 'https://github.com/deepvk/nisqa-s'
abstract: >-
  Code of modified NISQA-s metric and the weights of the
  trained metric.
keywords:
  - speech quality
  - metric
license: Apache-2.0

GitHub Events

Total

Issues event: 2
Watch event: 9
Issue comment event: 2
Push event: 3
Fork event: 2

Last Year

Issues event: 2
Watch event: 9
Issue comment event: 2
Push event: 3
Fork event: 2

Dependencies

.github/workflows/main.yaml actions

actions/checkout v3 composite
actions/setup-python v3 composite

pyproject.toml pypi

requirements.dev.txt pypi

black ==23.12.1 development
isort ==5.13.2 development
mypy ==1.8.0 development
pytest ==7.4.3 development
pytest-subtests ==0.11.0 development

requirements.txt pypi

PyYAML ==6.0.1
loguru ==0.7.2
matplotlib ==3.8.0
numpy ==1.25.2
pandas ==2.2.0
scipy ==1.12.0
sounddevice ==0.4.6
soundfile ==0.12.1
torch ==2.1.2
torchaudio ==2.1.2
tqdm ==4.66.1
