Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.6%) to scientific vocabulary
Repository
Python forced alignment
Statistics
- Stars: 91
- Watchers: 4
- Forks: 5
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
Python forced alignment
Forced alignment suite. Includes English grapheme-to-phoneme (G2P) and phoneme alignment from the following forced alignment tools.
- RAD-TTS [1]
- Montreal Forced Aligner (MFA) [3]
- Penn Phonetic Forced Aligner (P2FA) [2]
RAD-TTS is used by default. Alignments can be saved to disk or accessed via the
pypar.Alignment phoneme alignment representation. See
pypar for more details.
pyfoal also includes the following:
- Converting alignments to and from a categorical representation suitable for training machine learning models (pyfoal.convert)
- Natural interpolation of forced alignments for time-stretching speech (pyfoal.interpolate)
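To make the idea of a categorical representation concrete, here is a minimal sketch that converts a list of (phoneme, start, end) segments into one class index per frame and back. It does not call pyfoal.convert, whose interface is not described here. The helper names `to_frame_indices` and `from_frame_indices`, the 10 ms hop size, and the three-entry vocabulary are all assumptions made for illustration.

```python
def to_frame_indices(segments, vocabulary, hop_seconds=0.01):
    """Convert (phoneme, start, end) segments to one class index per frame.

    Hypothetical helper for illustration; not part of pyfoal.
    """
    phoneme_to_index = {phoneme: i for i, phoneme in enumerate(vocabulary)}
    indices = []
    for phoneme, start, end in segments:
        # Round boundaries to the nearest frame so adjacent segments tile exactly
        first = round(start / hop_seconds)
        last = round(end / hop_seconds)
        indices.extend([phoneme_to_index[phoneme]] * (last - first))
    return indices


def from_frame_indices(indices, vocabulary, hop_seconds=0.01):
    """Convert per-frame class indices back to (phoneme, start, end) segments.

    Hypothetical helper for illustration; not part of pyfoal.
    """
    segments = []
    start = 0
    for frame in range(1, len(indices) + 1):
        # Close the current segment when the class changes or the input ends
        if frame == len(indices) or indices[frame] != indices[start]:
            phoneme = vocabulary[indices[start]]
            segments.append((phoneme, start * hop_seconds, frame * hop_seconds))
            start = frame
    return segments


vocabulary = ['<silent>', 'AY', 'HH']
segments = [('<silent>', 0.0, 0.02), ('HH', 0.02, 0.05), ('AY', 0.05, 0.09)]
indices = to_frame_indices(segments, vocabulary)
print(indices)
```

One limitation of this simple scheme is that two adjacent segments with the same phoneme merge into one when converting back, so a real implementation needs some way to mark segment boundaries.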
Installation
pip install pyfoal
MFA and P2FA both require additional installation steps found below.
Montreal Forced Aligner (MFA)
conda install -c conda-forge montreal-forced-aligner
Penn Phonetic Forced Aligner (P2FA)
P2FA depends on the
Hidden Markov Model Toolkit (HTK), which has been
tested on Mac OS and Linux using HTK version 3.4.0. There are known issues in
using version 3.4.1 on Linux. HTK is released under a license that prohibits
redistribution, so you must install HTK yourself and verify that the commands
HCopy and HVite are available as system-wide binaries. After downloading
HTK, I use the following for installation on Linux.
sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install
For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.
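After installing HTK, one quick way to confirm that HCopy and HVite are visible as system-wide binaries is to look them up on the PATH. The sketch below uses only the Python standard library; the function name `missing_binaries` is an assumption for illustration and is not part of pyfoal.

```python
import shutil


def missing_binaries(names=('HCopy', 'HVite')):
    """Return the names that cannot be found on the PATH"""
    return [name for name in names if shutil.which(name) is None]


if __name__ == '__main__':
    missing = missing_binaries()
    if missing:
        print('Missing HTK binaries: ' + ', '.join(missing))
    else:
        print('HCopy and HVite are available')
```

This only checks that the executables can be found. It does not check the HTK version, so the known issues with version 3.4.1 on Linux still need to be ruled out separately.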
Inference
Force-align text and audio
```python
import pyfoal

# Load text
text = pyfoal.load.text(text_file)

# Load and resample audio
audio = pyfoal.load.audio(audio_file)

# Select an aligner. One of ['mfa', 'p2fa', 'radtts' (default)].
aligner = 'radtts'

# For RAD-TTS, select a model checkpoint
checkpoint = pyfoal.DEFAULT_CHECKPOINT

# Select a GPU to run inference on
gpu = 0

alignment = pyfoal.from_text_and_audio(
    text, audio, pyfoal.SAMPLE_RATE,
    aligner=aligner, checkpoint=checkpoint, gpu=gpu)
```
Application programming interface
pyfoal.from_text_and_audio
```
"""Phoneme-level forced-alignment

Arguments
    text : string
        The speech transcript
    audio : torch.tensor(shape=(1, samples))
        The speech signal to process
    sample_rate : int
        The audio sampling rate

Returns
    alignment : pypar.Alignment
        The forced alignment
"""
```
pyfoal.from_file
```
"""Phoneme alignment from audio and text files

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods

Returns
    alignment : Alignment
        The forced alignment
"""
```
pyfoal.from_file_to_file
```
"""Perform phoneme alignment from files and save to disk

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    output_file : Path
        The file to save the alignment
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```
pyfoal.from_files_to_files
```
"""Perform parallel phoneme alignment from many files and save to disk

Arguments
    text_files : list
        The transcript files
    audio_files : list
        The corresponding speech audio files
    output_files : list
        The files to save the alignments
    aligner : str
        The alignment method to use
    num_workers : int
        Number of CPU cores to utilize. Defaults to all cores.
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""
```
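The multi-file function takes three parallel lists, so the transcript, audio, and output paths must line up index by index. A minimal sketch of building those lists from a directory follows. The helper name `collect_files`, the same-stem pairing convention, and the `.wav`, `.txt`, and `.json` suffixes are assumptions for illustration, and the pyfoal call is left as a comment because its exact signature should be checked against the installed version.

```python
from pathlib import Path


def collect_files(
    directory,
    output_directory,
    audio_suffix='.wav',
    text_suffix='.txt',
    output_suffix='.json'
):
    """Pair each audio file with a same-stem transcript and an output path.

    Hypothetical helper for illustration; not part of pyfoal.
    """
    directory = Path(directory)
    output_directory = Path(output_directory)
    text_files, audio_files, output_files = [], [], []
    for audio_file in sorted(directory.glob(f'*{audio_suffix}')):
        text_file = audio_file.with_suffix(text_suffix)
        # Skip audio that has no matching transcript
        if not text_file.exists():
            continue
        text_files.append(text_file)
        audio_files.append(audio_file)
        output_files.append(
            output_directory / f'{audio_file.stem}{output_suffix}')
    return text_files, audio_files, output_files


# Assumed usage, with arguments in the order the docstring lists them:
# text_files, audio_files, output_files = collect_files('data', 'alignments')
# pyfoal.from_files_to_files(text_files, audio_files, output_files)
```

Sorting the audio files keeps the three lists in a stable order between runs, which makes it easier to resume or compare batches.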
Command-line interface
```
python -m pyfoal
    [-h]
    --text_files TEXT_FILES [TEXT_FILES ...]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--aligner ALIGNER]
    [--num_workers NUM_WORKERS]
    [--checkpoint CHECKPOINT]
    [--gpu GPU]

Arguments:
    -h, --help
        show this help message and exit
    --text_files TEXT_FILES [TEXT_FILES ...]
        The speech transcript files
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The speech audio files
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
        The files to save the alignments
    --aligner ALIGNER
        The alignment method to use
    --num_workers NUM_WORKERS
        Number of CPU cores to utilize. Defaults to all cores.
    --checkpoint CHECKPOINT
        The checkpoint to use for neural methods
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.
```
Training
Download
python -m pyfoal.data.download
Downloads and uncompresses the arctic and libritts datasets used for training.
Preprocess
python -m pyfoal.data.preprocess
Converts each dataset to a common format on disk ready for training.
Partition
python -m pyfoal.partition
Generates train, valid, and test partitions for arctic and libritts.
Partitioning is deterministic given the same random seed. You do not need to
run this step, as the original partitions are saved in
pyfoal/assets/partitions.
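The general technique behind a partition that is deterministic given a seed can be sketched as follows. This is not the code in pyfoal.partition: the function name `partition`, the default seed, and the 80/10/10 split are assumptions chosen for illustration.

```python
import random


def partition(stems, seed=0, valid_fraction=0.1, test_fraction=0.1):
    """Deterministically split stems into train, valid, and test lists.

    Hypothetical helper for illustration; not part of pyfoal.
    """
    # Sort first so the result does not depend on the input order
    stems = sorted(stems)

    # A dedicated generator avoids touching the global random state
    random.Random(seed).shuffle(stems)

    num_valid = int(len(stems) * valid_fraction)
    num_test = int(len(stems) * test_fraction)
    return {
        'train': stems[num_valid + num_test:],
        'valid': stems[:num_valid],
        'test': stems[num_valid:num_valid + num_test]}


stems = [f'utterance-{i:03d}' for i in range(100)]
partitions = partition(stems, seed=1234)
print({name: len(items) for name, items in partitions.items()})
```

Calling the function again with the same seed yields the same three lists, even if the input arrives in a different order, while a different seed generally yields a different split.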
Train
python -m pyfoal.train --config <config> --gpus <gpus>
Trains a model according to a given configuration on the libritts
dataset. Uses a list of GPU indices as an argument, and uses distributed
data parallelism (DDP) if more than one index is given. For example,
--gpus 0 3 will train using DDP on GPUs 0 and 3.
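A space-separated list such as `--gpus 0 3` maps naturally onto an argparse option that accepts one or more integers. The sketch below shows that pattern and the "more than one index means DDP" decision. It is not pyfoal's actual argument parser, and the names `parse_args` and `use_ddp` are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a training command line with a list of GPU indices"""
    parser = argparse.ArgumentParser(
        description='Sketch of a training entry point')
    parser.add_argument('--config', help='The configuration file')
    parser.add_argument(
        '--gpus',
        type=int,
        nargs='+',
        default=None,
        help='The GPU indices to train on')
    return parser.parse_args(argv)


def use_ddp(gpus):
    """Distributed data parallelism applies only with more than one GPU"""
    return gpus is not None and len(gpus) > 1


args = parse_args(['--gpus', '0', '3'])
print(args.gpus, use_ddp(args.gpus))  # prints: [0, 3] True
```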
Monitor
Run tensorboard --logdir runs/. If you are running training remotely, you
must create an SSH connection with port forwarding to view TensorBoard.
This can be done with ssh -L 6006:localhost:6006 <user>@<server-ip-address>.
Then, open localhost:6006 in your browser.
Evaluate
python -m pyfoal.evaluate \
--config <config> \
--checkpoint <checkpoint> \
--gpu <gpu>
Evaluate a model. <checkpoint> is the checkpoint file to evaluate and <gpu>
is the GPU index.
References
[1] R. Badlani, A. Łańcucki, K. J. Shih, R. Valle, W. Ping, and B. Catanzaro, "One TTS Alignment to Rule Them All," International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[2] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.
[3] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, "Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi," Interspeech, pp. 498-502, 2017.
Owner
- Name: Max Morrison
- Login: maxrmorrison
- Kind: user
- Website: https://www.maxrmorrison.com
- Twitter: maxrmorrison
- Repositories: 7
- Profile: https://github.com/maxrmorrison
Computer Science PhD student at Northwestern University researching machine learning and audio technology
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
authors:
  - family-names: "Morrison"
    given-names: "Max"
title: "pyfoal"
version: 1.0.0
date-released: 2023-02-01
url: "https://github.com/maxrmorrison/pyfoal"
GitHub Events
Total
- Issues event: 1
- Watch event: 22
- Issue comment event: 1
Last Year
- Issues event: 1
- Watch event: 22
- Issue comment event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Max Morrison | m****n@g****m | 91 |
| Max Morrison | m****x@d****m | 42 |
| Nathan Pruyne | n****e@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 23
- Total pull requests: 1
- Average time to close issues: 11 days
- Average time to close pull requests: about 4 hours
- Total issue authors: 14
- Total pull request authors: 1
- Average comments per issue: 1.61
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- maxrmorrison (7)
- lvZic (2)
- pranavmalikk (2)
- jimkleiber (1)
- NicoCaldo (1)
- naitian (1)
- carodb (1)
- YuCaIb (1)
- SlistInc (1)
- aneesh3397 (1)
- aaron-jencks (1)
- AlexJustRandom (1)
Pull Request Authors
- NathanPruyne (2)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 156 last month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 6
- Total maintainers: 1
pypi.org: pyfoal
Python forced aligner
- Homepage: https://github.com/maxrmorrison/pyfoal
- Documentation: https://pyfoal.readthedocs.io/
- License: MIT
- Latest release: 1.0.1 (published almost 2 years ago)
Dependencies
- g2p_en *
- pypar *
- resampy *
- soundfile *