psola

Pitch-shifting and time-stretching with TD-PSOLA

https://github.com/maxrmorrison/psola

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

pitch-shifting psola speech tdpsola time-stretching
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: maxrmorrison
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 46.9 KB
Statistics
  • Stars: 83
  • Watchers: 3
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
pitch-shifting psola speech tdpsola time-stretching
Created over 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

Time-domain pitch-synchronous overlap-add (TD-PSOLA)

[![PyPI](https://img.shields.io/pypi/v/psola.svg)](https://pypi.python.org/pypi/psola) [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) [![Downloads](https://static.pepy.tech/badge/psola)](https://pepy.tech/project/psola)

This module performs constant- and variable-rate pitch-shifting and time-stretching of speech. It wraps parselmouth [1], which in turn wraps the Praat [2] implementation of TD-PSOLA [3]. Pitch-shifting is performed by providing a numpy array of target pitch values equally spaced over time. Variable-rate time-stretching uses forced phoneme alignments represented as pypar alignments.

If you need to extract pitch features or phoneme alignments, see penn for pitch estimation and pyfoal for forced alignment. Pitch-shifting alone does not require forced alignments, and variable-rate time-stretching alone does not require pitch estimation.
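As a rough illustration of what a target pitch array can look like, the sketch below shifts a pitch contour by a number of semitones using only numpy. The helper names, the synthetic source contour, and the 40 to 500 Hz clipping range are assumptions made for this example and are not part of psola. In practice the source contour would come from a pitch estimator such as penn.

```python
import numpy as np


def shift_pitch(pitch_hz, semitones):
    """Return a copy of a pitch contour shifted by a number of semitones."""
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    # One semitone corresponds to a frequency ratio of 2 ** (1 / 12)
    return pitch_hz * 2.0 ** (semitones / 12.0)


def clip_to_range(pitch_hz, fmin=40.0, fmax=500.0):
    """Keep the contour inside an assumed allowable frequency range in Hz."""
    return np.clip(pitch_hz, fmin, fmax)


# Hypothetical source contour: 100 equally spaced frames gliding
# from 110 Hz up to 220 Hz
source_pitch = np.linspace(110.0, 220.0, num=100)

# Target contour one octave (12 semitones) above the source
target_pitch = clip_to_range(shift_pitch(source_pitch, semitones=12))
```

The resulting `target_pitch` is a one-dimensional array of frequencies in Hz, one value per frame, which is the shape the pitch-shifting functions described below expect.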

Installation

pip install psola

Usage

If you want to perform pitch-shifting or time-stretching on audio already loaded into memory, use psola.vocode. If you want to do this with audio saved in a file, use psola.from_file. You can use psola.to_file or psola.from_file_to_file to save the results to a file. To process many files at once with multiprocessing, use psola.from_files_to_files. Each of these functions is documented below. The command-line interface wraps the arguments of psola.from_files_to_files and is described in the next section.

psola.vocode

```python
"""Performs pitch vocoding using Praat

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate.
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
"""
```
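A minimal sketch of calling `psola.vocode` on audio already in memory might look like the following. It assumes the keyword name is `target_pitch` (the underscore form of the argument listed above). The helpers `synth_tone`, `flat_contour`, and `shift_to_flat_pitch` are hypothetical names introduced for this example, and the sine tone is only a stand-in for a real speech recording.

```python
import numpy as np


def synth_tone(frequency_hz, seconds, sample_rate):
    """Generate a sine tone as a stand-in for a speech recording."""
    times = np.arange(int(seconds * sample_rate)) / sample_rate
    return 0.5 * np.sin(2.0 * np.pi * frequency_hz * times)


def flat_contour(frequency_hz, frames):
    """Build a constant target pitch contour with one value per frame."""
    return np.full(frames, float(frequency_hz))


def shift_to_flat_pitch(audio, sample_rate, frequency_hz, frames=100):
    """Pitch-shift audio toward a single target frequency.

    The frame count is an arbitrary choice for this sketch; the target
    values are described as equally spaced over time.
    """
    # Imported lazily so the helpers above can be used without psola
    import psola

    target_pitch = flat_contour(frequency_hz, frames)
    return psola.vocode(audio, sample_rate, target_pitch=target_pitch)
```

With psola installed, a call such as `shift_to_flat_pitch(audio, sample_rate, 150.0)` would return the vocoded audio as a numpy array. Real speech loaded with a library such as soundfile is the intended input.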

psola.from_file

```python
"""Performs vocoding using Praat

Arguments
    audio_file : string
        The file containing the speech signal to process
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
    sample_rate : int
        The audio sampling rate
"""
```

psola.to_file

```python
"""Performs pitch vocoding and saves audio to disk

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate
    output_file : string
        The file to save the vocoded speech
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
```

psola.from_file_to_file

```python
"""Performs vocoding using Praat and saves to disk

Arguments
    audio_file : string
        The file containing the speech signal to process
    output_file : string
        The file to save the vocoded speech
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
```

psola.from_files_to_files

```python
"""Performs vocoding using Praat and saves to disk

Arguments
    audio_files : list
        The files containing the speech signals to process
    output_files : list
        The files to save the vocoded speech
    source_alignment_files : list or None
        The files containing the original alignments
    target_alignment_files : list or None
        The files containing the target alignments
    target_pitch_files : list or None
        The files containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
```
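For batch processing, the input and output lists need to line up one-to-one. The sketch below shows one way to derive output paths from input paths and hand both lists to `psola.from_files_to_files`. The helper names, the `-vocoded` suffix, and the choice to pass only `constant_stretch` are assumptions for illustration, and the keyword name assumes the underscore form of the argument listed above.

```python
from pathlib import Path


def make_output_files(audio_files, output_dir, suffix="-vocoded"):
    """Map each input file to a same-named file in output_dir."""
    output_dir = Path(output_dir)
    output_files = []
    for audio_file in audio_files:
        path = Path(audio_file)
        # Keep the original extension and append a suffix to the stem
        name = f"{path.stem}{suffix}{path.suffix}"
        output_files.append(str(output_dir / name))
    return output_files


def stretch_batch(audio_files, output_dir, constant_stretch):
    """Apply one constant time-stretch value to every file in a list."""
    # Imported lazily so make_output_files can be used without psola
    import psola

    output_files = make_output_files(audio_files, output_dir)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    psola.from_files_to_files(
        audio_files,
        output_files,
        constant_stretch=constant_stretch)
    return output_files
```

Whether a stretch value above 1 lengthens or shortens the audio is not stated in the documentation above, so it is worth checking on a single file before running a large batch.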

Command-line interface

```
usage: python -m psola [-h]
                       [--audio_files AUDIO_FILES [AUDIO_FILES ...]]
                       [--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]]
                       [--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]]
                       [--constant_stretch CONSTANT_STRETCH]
                       [--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]]
                       [--fmin FMIN]
                       [--fmax FMAX]
                       [--output_files OUTPUT_FILES [OUTPUT_FILES ...]]

optional arguments:
  -h, --help            show this help message and exit
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
                        The speech signal to process
  --source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]
                        The files containing the original alignments
  --target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]
                        The files containing the target alignments
  --constant_stretch CONSTANT_STRETCH
                        A constant value for time-stretching
  --target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]
                        The target pitch contour
  --fmin FMIN           The minimum allowable frequency in Hz
  --fmax FMAX           The maximum allowable frequency in Hz
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
                        Where to save the vocoded audio
```
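When driving the command-line interface from a script, it can help to build the argument list programmatically. The sketch below does this in Python. It assumes the flags are spelled with underscores (`--audio_files`, `--output_files`, `--constant_stretch`, `--target_pitch_files`), matching the function arguments; confirm the exact spelling with `python -m psola --help`. The function name `build_psola_command` is hypothetical.

```python
import sys


def build_psola_command(
        audio_files,
        output_files,
        constant_stretch=None,
        target_pitch_files=None):
    """Assemble an argument list for running `python -m psola`."""
    # Use the current interpreter so the same environment is used
    command = [sys.executable, "-m", "psola"]
    command += ["--audio_files", *audio_files]
    command += ["--output_files", *output_files]

    # Optional arguments are only added when a value is given
    if constant_stretch is not None:
        command += ["--constant_stretch", str(constant_stretch)]
    if target_pitch_files:
        command += ["--target_pitch_files", *target_pitch_files]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with file names that contain spaces.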

References

[1] Y. Jadoul, B. Thompson, and B. De Boer, "Introducing Parselmouth: A Python interface to Praat," Journal of Phonetics, vol. 71, pp. 1–15, 2018.

[2] P. Boersma, "Praat: doing phonetics by computer," http://www.praat.org/, 2006.

[3] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, 1990.

Owner

  • Name: Max Morrison
  • Login: maxrmorrison
  • Kind: user

Computer Science PhD student at Northwestern University researching machine learning and audio technology

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
authors:
- family-names: "Morrison"
  given-names: "Max"
title: "psola"
version: 0.0.1
date-released: 2021-03-31
url: "https://github.com/maxrmorrison/psola"

GitHub Events

Total
  • Watch event: 9
Last Year
  • Watch event: 9

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 17
  • Total Committers: 1
  • Avg Commits per committer: 17.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Max Morrison m****n@g****m 17

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 2,664 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 6
  • Total versions: 1
  • Total maintainers: 1
pypi.org: psola

Time-domain pitch-synchronous overlap-add

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 6
  • Downloads: 2,664 Last month
Rankings
Dependent repos count: 6.0%
Stargazers count: 8.3%
Dependent packages count: 10.0%
Downloads: 10.7%
Average: 10.8%
Forks count: 19.1%
Maintainers (1)
Last synced: 7 months ago

Dependencies

setup.py pypi
  • numpy *
  • praat-parselmouth *
  • pypar *
  • soundfile *
  • tqdm *