utmosv2

UTokyo-SaruLab MOS Prediction System

https://github.com/sarulab-speech/utmosv2

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization sarulab-speech has institutional domain (www.sp.ipc.i.u-tokyo.ac.jp)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

UTokyo-SaruLab MOS Prediction System

Basic Info
  • Host: GitHub
  • Owner: sarulab-speech
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 2.53 MB
Statistics
  • Stars: 210
  • Watchers: 7
  • Forks: 23
  • Open Issues: 0
  • Releases: 6
Created over 1 year ago · Last pushed 7 months ago
Metadata Files
README · LICENSE · CITATION.cff

README.md

utmosv2

UTMOSv2: UTokyo-SaruLab MOS Prediction System

🎤✨ Official implementation of ✨🎤
The T05 System for The VoiceMOS Challenge 2024:
Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
🏅🎉 accepted by IEEE Spoken Language Technology Workshop (SLT) 2024. 🎉🏅

ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ

✨  UTMOSv2 achieved 1st place in 7 out of 16 metrics  ✨
✨🏆    and 2nd place in the remaining 9 metrics    🏆✨
✨    in the VoiceMOS Challenge 2024 Track1!    ✨



🚀 Quick Prediction

✨ You can easily use the pretrained UTMOSv2 model!

🛠️ Using in your Python code 🛠️

✨⚡️ With the UTMOSv2 library, you can easily integrate it into your Python code, ⚡️✨
✨ allowing you to quickly create models and make predictions with minimal effort!! ✨


If you want to make predictions using the UTMOSv2 library, follow these steps:

  1. Install the UTMOSv2 library from GitHub

     ```bash
     pip install git+https://github.com/sarulab-speech/UTMOSv2.git
     ```

  2. Make predictions

    • To predict the MOS of a single .wav file:

      ```python
      import utmosv2

      model = utmosv2.create_model(pretrained=True)
      mos = model.predict(input_path="/path/to/wav/file.wav")
      ```

    • To predict the MOS of all .wav files in a folder:

      ```python
      import utmosv2

      model = utmosv2.create_model(pretrained=True)
      mos = model.predict(input_dir="/path/to/wav/dir/")
      ```

[!NOTE] Either input_path or input_dir must be specified, but not both.
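Once you have per-file MOS estimates, a common next step is to aggregate them per synthesis system before comparing systems. The sketch below uses only the standard library; the `scores` dict holds hypothetical example values standing in for UTMOSv2 predictions, and the path layout (one directory per system) is an assumption for illustration:

```python
from statistics import mean

# Hypothetical predicted MOS values keyed by file path
# (stand-ins for what predictions on input_dir would yield).
scores = {
    "sys_a/utt001.wav": 3.8,
    "sys_a/utt002.wav": 4.1,
    "sys_b/utt001.wav": 2.9,
    "sys_b/utt002.wav": 3.2,
}

# Group by synthesis system (the directory component of each path)
# and average, as is typical when ranking TTS systems by MOS.
per_system = {}
for path, mos in scores.items():
    system = path.split("/")[0]
    per_system.setdefault(system, []).append(mos)

summary = {system: round(mean(vals), 2) for system, vals in per_system.items()}
print(summary)  # {'sys_a': 3.95, 'sys_b': 3.05}
```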

📜 Using the inference script 📜

If you want to make predictions using the inference script, follow these steps:

  1. Clone this repository and navigate to the UTMOSv2 folder

     ```bash
     git clone https://github.com/sarulab-speech/UTMOSv2.git
     cd UTMOSv2
     ```

  2. Install the package

     ```bash
     pip install --upgrade pip   # enable PEP 660 support
     pip install -e .[optional]  # install with optional dependencies
     ```

  3. Make predictions

    • To predict the MOS of a single .wav file:

      ```bash
      python inference.py --input_path /path/to/wav/file.wav --out_path /path/to/output/file.csv
      ```

    • To predict the MOS of all .wav files in a folder:

      ```bash
      python inference.py --input_dir /path/to/wav/dir/ --out_path /path/to/output/file.csv
      ```

[!NOTE] If you are using zsh, make sure to escape the square brackets like this:

```zsh
pip install -e '.[optional]'
```

[!TIP] If --out_path is not specified, the prediction results will be output to the standard output. This is particularly useful when the number of files to be predicted is small.

[!NOTE] Either --input_path or --input_dir must be specified, but not both.


[!NOTE] These methods provide quick and simple predictions. For more accurate predictions and detailed usage of the inference script, please refer to the inference guide.
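After the inference script has written its CSV, you can load the file for further analysis with the standard library. A minimal sketch; the `file_path` and `predicted_mos` column names are assumptions for illustration, not the script's documented output schema:

```python
import csv
import io

# Hypothetical CSV contents in the shape the --out_path file
# might take (column names are assumed, not documented here).
csv_text = """file_path,predicted_mos
utt001.wav,3.82
utt002.wav,4.05
utt003.wav,2.77
"""

# In practice you would open the real file instead of io.StringIO.
rows = list(csv.DictReader(io.StringIO(csv_text)))
mos_values = [float(r["predicted_mos"]) for r in rows]
print(f"{len(rows)} files, mean MOS {sum(mos_values) / len(mos_values):.2f}")
# → 3 files, mean MOS 3.55
```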

🤗 You can try a simple demonstration on Hugging Face Space: Hugging Face Spaces

⚒️ Train UTMOSv2 Yourself

If you want to train UTMOSv2 yourself, please refer to the training guide. To reproduce the training as described in the paper or used in the competition, please refer to this document.

📂 Used Datasets

Details of the datasets used in this project can be found in the datasets documentation.

🔖 Citation

If you find UTMOSv2 useful in your research, please cite the following paper:

```bibtex
@inproceedings{baba2024utmosv2,
  title     = {The T05 System for The {V}oice{MOS} {C}hallenge 2024: Transfer Learning from Deep Image Classifier to Naturalness {MOS} Prediction of High-Quality Synthetic Speech},
  author    = {Baba, Kaito and Nakata, Wataru and Saito, Yuki and Saruwatari, Hiroshi},
  booktitle = {IEEE Spoken Language Technology Workshop (SLT)},
  year      = {2024},
}
```

Owner

  • Name: sarulab-speech
  • Login: sarulab-speech
  • Kind: organization
  • Email: shinnosuke_takamichi@ipc.i.u-tokyo.ac.jp
  • Location: Tokyo, Japan

Speech group, Saruwatari-Koyama Lab, the University of Tokyo, Japan.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find UTMOSv2 useful in your research, please cite the following paper."
authors:
  - name: Kaito Baba
title: "UTMOSv2: UTokyo-SaruLab MOS Prediction System"
url: "https://github.com/sarulab-speech/UTMOSv2"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Baba"
      given-names: "Kaito"
    - family-names: "Nakata"
      given-names: "Wataru"
    - family-names: "Saito"
      given-names: "Yuki"
    - family-names: "Saruwatari"
      given-names: "Hiroshi"
  collection-title: "IEEE Spoken Language Technology Workshop (SLT)"
  title: "The t05 system for the VoiceMOS Challenge 2024: Transfer learning from deep image classifier to naturalness MOS prediction of high-quality synthetic speech"
  year: 2024

GitHub Events

Total
  • Create event: 23
  • Issues event: 10
  • Release event: 2
  • Watch event: 151
  • Delete event: 19
  • Issue comment event: 13
  • Push event: 51
  • Pull request review comment event: 8
  • Pull request event: 38
  • Pull request review event: 16
  • Fork event: 21
Last Year
  • Create event: 23
  • Issues event: 10
  • Release event: 2
  • Watch event: 151
  • Delete event: 19
  • Issue comment event: 13
  • Push event: 51
  • Pull request review comment event: 8
  • Pull request event: 38
  • Pull request review event: 16
  • Fork event: 21

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 10
  • Total pull requests: 43
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 8
  • Total pull request authors: 3
  • Average comments per issue: 0.9
  • Average comments per pull request: 0.16
  • Merged pull requests: 39
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 39
  • Average time to close issues: 6 days
  • Average time to close pull requests: 16 minutes
  • Issue authors: 7
  • Pull request authors: 2
  • Average comments per issue: 0.88
  • Average comments per pull request: 0.1
  • Merged pull requests: 37
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • benjohn18 (2)
  • lars76 (2)
  • kehinde-elelu (1)
  • ajk1402 (1)
  • dtransposed (1)
  • g-milis (1)
  • TechInterMezzo (1)
  • kAIto47802 (1)
Pull Request Authors
  • kAIto47802 (42)
  • splinter21 (2)
  • Wataru-Nakata (1)
Top Labels
Issue Labels
bug (8) needs-discussion (1) enhancement (1)
Pull Request Labels
documentation (8) bug (6) enhancement (6) version (5) chore (2) code-fix (2) refactor (1) ci (1) feature (1)

Dependencies

.github/workflows/lint_and_format.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • stefanzweifel/git-auto-commit-action v5 composite
pyproject.toml pypi
  • librosa >=0.10.2
  • numpy >=1.24.4
  • pandas >=2.2.2
  • python-dotenv >=1.0.1
  • scikit-learn >=1.3.2
  • timm >=1.0.7
  • torch >=2.3.1
  • tqdm >=4.66.4
  • transformers >=4.42.4
  • wandb >=0.17.0