pyfoal

Python forced alignment

https://github.com/maxrmorrison/pyfoal

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Committers with academic emails
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.6%) to scientific vocabulary

Keywords

alignment phoneme speech

Last synced: 6 months ago

Repository

Python forced alignment

Basic Info

Host: GitHub
Owner: maxrmorrison
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 7.16 MB

Statistics

Stars: 91
Watchers: 4
Forks: 5
Open Issues: 5
Releases: 0

Topics

alignment phoneme speech

Created over 5 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

Python forced alignment

[![PyPI](https://img.shields.io/pypi/v/pyfoal.svg)](https://pypi.python.org/pypi/pyfoal) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Downloads](https://static.pepy.tech/badge/pyfoal)](https://pepy.tech/project/pyfoal)

Forced alignment suite. Includes English grapheme-to-phoneme (G2P) and phoneme alignment from the following forced alignment tools.
- RAD-TTS [1]
- Montreal Forced Aligner (MFA) [3]
- Penn Phonetic Forced Aligner (P2FA) [2]

RAD-TTS is used by default. Alignments can be saved to disk or accessed via the pypar.Alignment phoneme alignment representation. See pypar for more details.

pyfoal also includes the following:
- Converting alignments to and from a categorical representation suitable for training machine learning models (`pyfoal.convert`)
- Natural interpolation of forced alignments for time-stretching speech (`pyfoal.interpolate`)
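To illustrate what a categorical representation of an alignment can look like, here is a minimal sketch that does not use `pyfoal.convert` itself. It assumes an alignment is a list of `(phoneme, start, end)` tuples in seconds. The function names `to_indices` and `from_indices`, the phoneme map, and the 10 ms hop size are hypothetical choices made for this example.

```python
# Illustrative only: this is not the pyfoal.convert API.
HOPSIZE = 0.01  # seconds per frame (assumed for this example)


def to_indices(alignment, phoneme_map, hopsize=HOPSIZE):
    """Convert (phoneme, start, end) tuples to one category index per frame"""
    indices = []
    for phoneme, start, end in alignment:
        # Round each boundary to the nearest frame
        frames = round(end / hopsize) - round(start / hopsize)
        indices.extend([phoneme_map[phoneme]] * frames)
    return indices


def from_indices(indices, phoneme_map, hopsize=HOPSIZE):
    """Convert per-frame category indices back to (phoneme, start, end)"""
    inverse = {index: phoneme for phoneme, index in phoneme_map.items()}
    alignment = []
    start = 0
    for frame in range(1, len(indices) + 1):
        # Close the current segment when the category changes or input ends
        if frame == len(indices) or indices[frame] != indices[start]:
            alignment.append(
                (inverse[indices[start]], start * hopsize, frame * hopsize))
            start = frame
    return alignment
```

One limitation of this simple scheme is that two adjacent segments with the same phoneme are merged when converting back, so a real implementation would likely need extra bookkeeping for repeated phonemes.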

Installation
Inference
- Application programming interface
- Command-line interface
Training
- Download
- Preprocess
- Partition
- Train
- Monitor
- Evaluate
References

Installation

pip install pyfoal

MFA and P2FA both require additional installation steps found below.

Montreal Forced Aligner (MFA)

conda install -c conda-forge montreal-forced-aligner

Penn Phonetic Forced Aligner (P2FA)

P2FA depends on the Hidden Markov Model Toolkit (HTK), which has been tested on Mac OS and Linux using HTK version 3.4.0. There are known issues in using version 3.4.1 on Linux. HTK is released under a license that prohibits redistribution, so you must install HTK yourself and verify that the commands HCopy and HVite are available as system-wide binaries. After downloading HTK, I use the following for installation on Linux.

sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install

For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.
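One way to verify that `HCopy` and `HVite` are available after installation is to look them up on the system `PATH` from Python. This is a minimal sketch using only the standard library. The helper name `missing_binaries` is made up for this example and is not part of pyfoal.

```python
import shutil


def missing_binaries(names=('HCopy', 'HVite')):
    """Return the names that cannot be found on the system PATH"""
    return [name for name in names if shutil.which(name) is None]


if __name__ == '__main__':
    missing = missing_binaries()
    if missing:
        print('Missing HTK binaries: ' + ', '.join(missing))
    else:
        print('HCopy and HVite are available')
```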

Inference

Force-align text and audio

```python
import pyfoal

# Load text (text_file is the path to the transcript)
text = pyfoal.load.text(text_file)

# Load and resample audio (audio_file is the path to the speech audio)
audio = pyfoal.load.audio(audio_file)

# Select an aligner. One of ['mfa', 'p2fa', 'radtts' (default)].
aligner = 'radtts'

# For RAD-TTS, select a model checkpoint
checkpoint = pyfoal.DEFAULT_CHECKPOINT

# Select a GPU to run inference on
gpu = 0

alignment = pyfoal.from_text_and_audio(
    text,
    audio,
    pyfoal.SAMPLE_RATE,
    aligner=aligner,
    checkpoint=checkpoint,
    gpu=gpu)
```

Application programming interface

`pyfoal.from_text_and_audio`

```
"""Phoneme-level forced-alignment

Arguments
    text : string
        The speech transcript
    audio : torch.tensor(shape=(1, samples))
        The speech signal to process
    sample_rate : int
        The audio sampling rate

Returns
    alignment : pypar.Alignment
        The forced alignment
"""
```

`pyfoal.from_file`

```
"""Phoneme alignment from audio and text files

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods

Returns
    alignment : Alignment
        The forced alignment
"""
```

`pyfoal.from_file_to_file`

```
"""Perform phoneme alignment from files and save to disk

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    output_file : Path
        The file to save the alignment
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```

`pyfoal.from_files_to_files`

```
"""Perform parallel phoneme alignment from many files and save to disk

Arguments
    text_files : list
        The transcript files
    audio_files : list
        The corresponding speech audio files
    output_files : list
        The files to save the alignments
    aligner : str
        The alignment method to use
    num_workers : int
        Number of CPU cores to utilize. Defaults to all cores.
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```
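Batch alignment takes three parallel lists, so it can help to build them together. The sketch below pairs transcripts and audio by filename stem. The helper name `collect_files`, the `.txt` and `.wav` extensions, and the `.json` output extension are assumptions made for this example; check which input and output formats your installed version supports.

```python
from pathlib import Path


def collect_files(text_directory, audio_directory, output_directory):
    """Pair .txt and .wav files by stem and derive one output path per pair"""
    text_files, audio_files, output_files = [], [], []
    for text_file in sorted(Path(text_directory).glob('*.txt')):
        audio_file = Path(audio_directory) / f'{text_file.stem}.wav'

        # Skip transcripts without matching audio
        if not audio_file.exists():
            continue

        text_files.append(text_file)
        audio_files.append(audio_file)
        output_files.append(
            Path(output_directory) / f'{text_file.stem}.json')
    return text_files, audio_files, output_files
```

The three returned lists line up index by index, so they could then be passed as the first three arguments of `pyfoal.from_files_to_files`.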

Command-line interface

```
python -m pyfoal
    [-h]
    --text_files TEXT_FILES [TEXT_FILES ...]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--aligner ALIGNER]
    [--num_workers NUM_WORKERS]
    [--checkpoint CHECKPOINT]
    [--gpu GPU]

Arguments:
    -h, --help
        show this help message and exit
    --text_files TEXT_FILES [TEXT_FILES ...]
        The speech transcript files
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The speech audio files
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
        The files to save the alignments
    --aligner ALIGNER
        The alignment method to use
    --num_workers NUM_WORKERS
        Number of CPU cores to utilize. Defaults to all cores.
    --checkpoint CHECKPOINT
        The checkpoint to use for neural methods
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.
```
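If you want to drive the command-line interface from a script, one option is to assemble the argument list in Python and hand it to `subprocess.run`. This is a sketch only. The helper name `build_command` is hypothetical, and the underscored flag spellings (`--text_files`, `--num_workers`, and so on) are assumed; confirm them with `python -m pyfoal -h`.

```python
import sys


def build_command(
    text_files,
    audio_files,
    output_files,
    aligner=None,
    num_workers=None,
    checkpoint=None,
    gpu=None):
    """Assemble an argument list for the pyfoal command-line interface"""
    if not len(text_files) == len(audio_files) == len(output_files):
        raise ValueError('Expected one audio and output file per text file')

    command = [
        sys.executable, '-m', 'pyfoal',
        '--text_files', *map(str, text_files),
        '--audio_files', *map(str, audio_files),
        '--output_files', *map(str, output_files)]

    # Only pass optional flags that were explicitly set
    optional = {
        '--aligner': aligner,
        '--num_workers': num_workers,
        '--checkpoint': checkpoint,
        '--gpu': gpu}
    for flag, value in optional.items():
        if value is not None:
            command.extend([flag, str(value)])
    return command
```

The result can be run with `subprocess.run(build_command(...), check=True)`. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.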

Training

Download

python -m pyfoal.data.download

Downloads and uncompresses the arctic and libritts datasets used for training.

Preprocess

python -m pyfoal.data.preprocess

Converts each dataset to a common format on disk ready for training.

Partition

python -m pyfoal.partition

Generates train, valid, and test partitions for arctic and libritts. Partitioning is deterministic given the same random seed. You do not need to run this step, as the original partitions are saved in pyfoal/assets/partitions.
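As an illustration of what deterministic partitioning under a fixed seed means, here is a minimal sketch. It is not the code in `pyfoal.partition`. The function name, the default seed, and the 80/10/10 split are assumptions chosen for the example.

```python
import random


def partition(stems, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Deterministically split stems into train, valid, and test lists"""
    # Sort first so the result does not depend on the input order
    stems = sorted(stems)

    # A local generator leaves the global random state untouched
    random.Random(seed).shuffle(stems)

    # The test partition receives whatever remains after train and valid
    left = int(fractions[0] * len(stems))
    right = left + int(fractions[1] * len(stems))
    return {
        'train': stems[:left],
        'valid': stems[left:right],
        'test': stems[right:]}
```

Calling `partition` twice with the same stems and seed returns the same split, while a different seed generally returns a different one.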

Train

python -m pyfoal.train --config <config> --gpus <gpus>

Trains a model according to a given configuration on the libritts dataset. Uses a list of GPU indices as an argument, and uses distributed data parallelism (DDP) if more than one index is given. For example, --gpus 0 3 will train using DDP on GPUs 0 and 3.
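The way a space-separated list such as `--gpus 0 3` becomes a list of integers can be sketched with `argparse`. This mirrors the documented flags but is not the actual `pyfoal.train` entry point; the helper names `parse_args` and `use_ddp` are hypothetical.

```python
import argparse


def parse_args(args=None):
    """Parse training arguments (sketch of the documented flags)"""
    parser = argparse.ArgumentParser(description='Train a model (sketch)')
    parser.add_argument('--config', help='The configuration file')
    parser.add_argument(
        '--gpus',
        type=int,
        nargs='+',
        help='The indices of the GPUs to train on')
    return parser.parse_args(args)


def use_ddp(gpus):
    """Distributed training only applies to more than one GPU index"""
    return gpus is not None and len(gpus) > 1
```

With `nargs='+'`, `--gpus 0 3` parses to `[0, 3]`, and `use_ddp([0, 3])` is true while `use_ddp([0])` is false.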

Monitor

Run `tensorboard --logdir runs/`. If you are running training remotely, you must create an SSH connection with port forwarding to view TensorBoard. This can be done with `ssh -L 6006:localhost:6006 <user>@<server-ip-address>`. Then, open `localhost:6006` in your browser.

Evaluate

python -m pyfoal.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>

Evaluates a model. <checkpoint> is the checkpoint file to evaluate and <gpu> is the GPU index.

References

[1] R. Badlani, A. Łańcucki, K. J. Shih, R. Valle, W. Ping, and B. Catanzaro, "One TTS Alignment to Rule Them All," International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[2] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.

[3] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, "Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi," Interspeech, pp. 498-502, 2017.

Owner

Name: Max Morrison
Login: maxrmorrison
Kind: user

Website: https://www.maxrmorrison.com
Twitter: maxrmorrison
Repositories: 7
Profile: https://github.com/maxrmorrison

Computer Science PhD student at Northwestern University researching machine learning and audio technology

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
authors:
- family-names: "Morrison"
  given-names: "Max"
title: "pyfoal"
version: 1.0.0
date-released: 2023-02-01
url: "https://github.com/maxrmorrison/pyfoal"

GitHub Events

Total

Issues event: 1
Watch event: 22
Issue comment event: 1

Last Year

Issues event: 1
Watch event: 22
Issue comment event: 1

Committers

Last synced: 9 months ago

All Time

Total Commits: 134
Total Committers: 3
Avg Commits per committer: 44.667
Development Distribution Score (DDS): 0.321

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Max Morrison	m**n@g**m	91
Max Morrison	m**x@d**m	42
Nathan Pruyne	n**e@g**m	1
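
The all-time figures above can be reproduced from the commit counts in this table. The sketch below assumes the Development Distribution Score is one minus the top committer's share of all commits; the page does not state the formula, so treat that definition as an assumption.

```python
# Commit counts from the Top Committers table
commits = [91, 42, 1]

total = sum(commits)
average = total / len(commits)

# Assumed definition: 1 - (commits by top committer / total commits)
dds = 1 - max(commits) / total

print(total, round(average, 3), round(dds, 3))  # 134 44.667 0.321
```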

Committer Domains (Top 20 + Academic)

descript.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 23
Total pull requests: 1
Average time to close issues: 11 days
Average time to close pull requests: about 4 hours
Total issue authors: 14
Total pull request authors: 1
Average comments per issue: 1.61
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

maxrmorrison (7)
lvZic (2)
pranavmalikk (2)
jimkleiber (1)
NicoCaldo (1)
naitian (1)
carodb (1)
YuCaIb (1)
SlistInc (1)
aneesh3397 (1)
aaron-jencks (1)
AlexJustRandom (1)

Pull Request Authors

NathanPruyne (2)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 156 last-month

Total dependent packages: 1
Total dependent repositories: 1
Total versions: 6
Total maintainers: 1

pypi.org: pyfoal

Python forced aligner

Homepage: https://github.com/maxrmorrison/pyfoal
Documentation: https://pyfoal.readthedocs.io/
License: MIT
Latest release: 1.0.1
published almost 2 years ago

Versions: 6
Dependent Packages: 1
Dependent Repositories: 1
Downloads: 156 Last month

Rankings

Dependent packages count: 7.3%

Stargazers count: 10.7%

Downloads: 12.9%

Average: 14.4%

Forks count: 19.2%

Dependent repos count: 22.1%

Maintainers (1)

maxrmorrison

Last synced: 6 months ago

Dependencies

setup.py pypi

g2p_en *
pypar *
resampy *
soundfile *
