pyfoal

Python forced alignment

https://github.com/maxrmorrison/pyfoal

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords

alignment phoneme speech
Last synced: 6 months ago

Repository

Python forced alignment

Basic Info
  • Host: GitHub
  • Owner: maxrmorrison
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 7.16 MB
Statistics
  • Stars: 91
  • Watchers: 4
  • Forks: 5
  • Open Issues: 5
  • Releases: 0
Topics
alignment phoneme speech
Created over 5 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Python forced alignment

[![PyPI](https://img.shields.io/pypi/v/pyfoal.svg)](https://pypi.python.org/pypi/pyfoal) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Downloads](https://static.pepy.tech/badge/pyfoal)](https://pepy.tech/project/pyfoal)

Forced alignment suite. Includes English grapheme-to-phoneme (G2P) and phoneme alignment from the following forced alignment tools.
- RAD-TTS [1]
- Montreal Forced Aligner (MFA) [2]
- Penn Phonetic Forced Aligner (P2FA) [3]

RAD-TTS is used by default. Alignments can be saved to disk or accessed via the pypar.Alignment phoneme alignment representation. See pypar for more details.

pyfoal also includes the following:
- Converting alignments to and from a categorical representation suitable for training machine learning models (pyfoal.convert)
- Natural interpolation of forced alignments for time-stretching speech (pyfoal.interpolate)
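To make the idea of a categorical representation concrete, here is a minimal sketch that turns timed phoneme intervals into one integer index per frame and back again. It is an illustration only, not the implementation in pyfoal.convert: the function names, the `(phoneme, start, end)` tuple layout, the `phoneme_map` dictionary, and the 10 ms hopsize are all assumptions made for this example.

```python
# Hypothetical sketch; pyfoal.convert may work differently.
# Each interval is (phoneme, start_seconds, end_seconds) and the list
# is assumed to be non-empty, sorted, and gap-free.


def intervals_to_frames(intervals, phoneme_map, hopsize=0.01):
    """Convert timed phoneme intervals to one integer index per frame"""
    duration = intervals[-1][2]
    num_frames = round(duration / hopsize)
    frames = []
    i = 0
    for frame in range(num_frames):
        # Use the frame center to decide which phoneme is active
        time = (frame + 0.5) * hopsize
        while i < len(intervals) - 1 and time >= intervals[i][2]:
            i += 1
        frames.append(phoneme_map[intervals[i][0]])
    return frames


def frames_to_intervals(frames, index_map, hopsize=0.01):
    """Merge runs of identical frame indices back into timed intervals"""
    intervals = []
    start = 0
    for frame in range(1, len(frames) + 1):
        # Close the current run at the end or when the index changes
        if frame == len(frames) or frames[frame] != frames[start]:
            intervals.append(
                (index_map[frames[start]], start * hopsize, frame * hopsize))
            start = frame
    return intervals
```

The frame sequence is the form a classifier would be trained on. Converting back quantizes boundaries to the frame grid, so a round trip is only exact when the original boundaries fall on multiples of the hopsize.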

Installation

pip install pyfoal

MFA and P2FA both require additional installation steps found below.

Montreal Forced Aligner (MFA)

conda install -c conda-forge montreal-forced-aligner

Penn Phonetic Forced Aligner (P2FA)

P2FA depends on the Hidden Markov Model Toolkit (HTK), which has been tested on Mac OS and Linux using HTK version 3.4.0. There are known issues in using version 3.4.1 on Linux. HTK is released under a license that prohibits redistribution, so you must install HTK yourself and verify that the commands HCopy and HVite are available as system-wide binaries. After downloading HTK, I use the following for installation on Linux.

sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install

For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.

Inference

Force-align text and audio

```python
import pyfoal

# Load text
text = pyfoal.load.text(text_file)

# Load and resample audio
audio = pyfoal.load.audio(audio_file)

# Select an aligner. One of ['mfa', 'p2fa', 'radtts' (default)].
aligner = 'radtts'

# For RAD-TTS, select a model checkpoint
checkpoint = pyfoal.DEFAULT_CHECKPOINT

# Select a GPU to run inference on
gpu = 0

alignment = pyfoal.from_text_and_audio(
    text,
    audio,
    pyfoal.SAMPLE_RATE,
    aligner=aligner,
    checkpoint=checkpoint,
    gpu=gpu)
```

Application programming interface

pyfoal.from_text_and_audio

```
"""Phoneme-level forced-alignment

Arguments
    text : string
        The speech transcript
    audio : torch.tensor(shape=(1, samples))
        The speech signal to process
    sample_rate : int
        The audio sampling rate

Returns
    alignment : pypar.Alignment
        The forced alignment
"""
```

pyfoal.from_file

```
"""Phoneme alignment from audio and text files

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods

Returns
    alignment : Alignment
        The forced alignment
"""
```

pyfoal.from_file_to_file

```
"""Perform phoneme alignment from files and save to disk

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    output_file : Path
        The file to save the alignment
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```

pyfoal.from_files_to_files

```
"""Perform parallel phoneme alignment from many files and save to disk

Arguments
    text_files : list
        The transcript files
    audio_files : list
        The corresponding speech audio files
    output_files : list
        The files to save the alignments
    aligner : str
        The alignment method to use
    num_workers : int
        Number of CPU cores to utilize. Defaults to all cores.
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```
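The batch function takes three parallel lists, so the transcript, audio, and output paths must line up index by index. The sketch below shows one way to build those lists from a directory. The helper names, the `.txt`/`.wav` naming convention, and the `.json` output extension are assumptions made for this example and are not part of pyfoal.

```python
from pathlib import Path


def collect_alignment_jobs(directory, output_directory):
    """Pair each .txt transcript with its same-named .wav and an output path"""
    directory = Path(directory)
    output_directory = Path(output_directory)
    text_files, audio_files, output_files = [], [], []
    for text_file in sorted(directory.glob('*.txt')):
        audio_file = text_file.with_suffix('.wav')
        if not audio_file.exists():
            # Skip transcripts that have no matching audio
            continue
        text_files.append(text_file)
        audio_files.append(audio_file)
        # Assumed output format; pick whichever extension you need
        output_files.append(output_directory / f'{text_file.stem}.json')
    return text_files, audio_files, output_files


def align_directory(directory, output_directory, aligner='radtts'):
    """Align every transcript/audio pair found in a directory"""
    # Imported lazily so the pairing helper works without pyfoal installed
    import pyfoal

    text_files, audio_files, output_files = collect_alignment_jobs(
        directory, output_directory)
    Path(output_directory).mkdir(parents=True, exist_ok=True)
    pyfoal.from_files_to_files(
        text_files, audio_files, output_files, aligner=aligner)
```

Sorting the glob results keeps the pairing stable across runs, and skipping unmatched transcripts avoids passing lists of different lengths.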

Command-line interface

```
python -m pyfoal
    [-h]
    --text_files TEXT_FILES [TEXT_FILES ...]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--aligner ALIGNER]
    [--num_workers NUM_WORKERS]
    [--checkpoint CHECKPOINT]
    [--gpu GPU]

Arguments:
    -h, --help
        show this help message and exit
    --text_files TEXT_FILES [TEXT_FILES ...]
        The speech transcript files
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The speech audio files
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
        The files to save the alignments
    --aligner ALIGNER
        The alignment method to use
    --num_workers NUM_WORKERS
        Number of CPU cores to utilize. Defaults to all cores.
    --checkpoint CHECKPOINT
        The checkpoint to use for neural methods
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.
```

Training

Download

python -m pyfoal.data.download

Downloads and uncompresses the arctic and libritts datasets used for training.

Preprocess

python -m pyfoal.data.preprocess

Converts each dataset to a common format on disk ready for training.

Partition

python -m pyfoal.partition

Generates train, valid, and test partitions for arctic and libritts. Partitioning is deterministic given the same random seed. You do not need to run this step, as the original partitions are saved in pyfoal/assets/partitions.
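For intuition on what "deterministic given the same random seed" means, here is a minimal sketch of a seeded split. It is not the code in pyfoal.partition: the function name, the 80/10/10 proportions, and the use of file stems as identifiers are assumptions chosen for illustration.

```python
import random


def partition(stems, seed=0, train=0.8, valid=0.1):
    """Deterministically split stems into train, valid, and test lists"""
    # Sort first so the result does not depend on the input order
    stems = sorted(stems)

    # A private generator keeps the shuffle independent of global state
    random.Random(seed).shuffle(stems)

    left = int(train * len(stems))
    right = left + int(valid * len(stems))
    return {
        'train': stems[:left],
        'valid': stems[left:right],
        'test': stems[right:]}
```

Calling this twice with the same stems and seed returns identical partitions, while changing the seed produces a different split of the same sizes.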

Train

python -m pyfoal.train --config <config> --gpus <gpus>

Trains a model according to a given configuration on the libritts dataset. Uses a list of GPU indices as an argument, and uses distributed data parallelism (DDP) if more than one index is given. For example, --gpus 0 3 will train using DDP on GPUs 0 and 3.
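The sketch below shows how a `--gpus` argument that accepts one or more indices can be parsed and used to decide whether DDP is needed. It mirrors the behavior described above but is not pyfoal's training entry point; `parse_args` and `use_ddp` are hypothetical names introduced for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse a config path and a list of GPU indices"""
    parser = argparse.ArgumentParser(description='Train a model')
    parser.add_argument('--config', help='The configuration file')
    parser.add_argument(
        '--gpus',
        type=int,
        nargs='+',
        help='The GPU indices to train on')
    return parser.parse_args(argv)


def use_ddp(gpus):
    """DDP is only needed when more than one GPU index is given"""
    return gpus is not None and len(gpus) > 1
```

With `--gpus 0 3`, `args.gpus` is the list `[0, 3]` and `use_ddp` returns `True`. With a single index or no `--gpus` flag, it returns `False`.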

Monitor

Run tensorboard --logdir runs/. If you are training remotely, you must create an SSH connection with port forwarding to view TensorBoard. This can be done with ssh -L 6006:localhost:6006 <user>@<server-ip-address>. Then, open localhost:6006 in your browser.

Evaluate

python -m pyfoal.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>

Evaluates a model. <checkpoint> is the checkpoint file to evaluate and <gpu> is the GPU index.

References

[1] R. Badlani, A. Łańcucki, K. J. Shih, R. Valle, W. Ping, and B. Catanzaro, "One TTS Alignment to Rule Them All," International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[2] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, "Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi," Interspeech, pp. 498-502, 2017.

[3] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.

Owner

  • Name: Max Morrison
  • Login: maxrmorrison
  • Kind: user

Computer Science PhD student at Northwestern University researching machine learning and audio technology

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
authors:
- family-names: "Morrison"
  given-names: "Max"
title: "pyfoal"
version: 1.0.0
date-released: 2023-02-01
url: "https://github.com/maxrmorrison/pyfoal"

GitHub Events

Total
  • Issues event: 1
  • Watch event: 22
  • Issue comment event: 1
Last Year
  • Issues event: 1
  • Watch event: 22
  • Issue comment event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 134
  • Total Committers: 3
  • Avg Commits per committer: 44.667
  • Development Distribution Score (DDS): 0.321
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Max Morrison m****n@g****m 91
Max Morrison m****x@d****m 42
Nathan Pruyne n****e@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 1
  • Average time to close issues: 11 days
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 14
  • Total pull request authors: 1
  • Average comments per issue: 1.61
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • maxrmorrison (7)
  • lvZic (2)
  • pranavmalikk (2)
  • jimkleiber (1)
  • NicoCaldo (1)
  • naitian (1)
  • carodb (1)
  • YuCaIb (1)
  • SlistInc (1)
  • aneesh3397 (1)
  • aaron-jencks (1)
  • AlexJustRandom (1)
Pull Request Authors
  • NathanPruyne (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 156 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 6
  • Total maintainers: 1
pypi.org: pyfoal

Python forced aligner

  • Versions: 6
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 156 Last month
Rankings
Dependent packages count: 7.3%
Stargazers count: 10.7%
Downloads: 12.9%
Average: 14.4%
Forks count: 19.2%
Dependent repos count: 22.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

setup.py pypi