diffsptk

A differentiable version of SPTK

https://github.com/sp-nitech/diffsptk

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary

Keywords

cepstrum cqt ddsp deep-learning digital-signal-processing dsp gmm k-means lpc lsp mdct mfcc nmf plp pqmf python pytorch signal-processing sptk stft
Last synced: 6 months ago

Repository

A differentiable version of SPTK

Basic Info
  • Host: GitHub
  • Owner: sp-nitech
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage: http://sp-tk.sourceforge.net
  • Size: 1.76 MB
Statistics
  • Stars: 190
  • Watchers: 8
  • Forks: 19
  • Open Issues: 0
  • Releases: 26
Created almost 4 years ago · Last pushed 6 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

diffsptk

diffsptk is a differentiable version of SPTK based on the PyTorch framework.

Requirements

  • Python 3.10+
  • PyTorch 2.3.1+

Documentation

  • See this page for the reference manual.
  • Our paper is available on the ISCA Archive.

Installation

The latest stable release can be installed through PyPI by running

```sh
pip install diffsptk
```

The development release can be installed from the master branch:

```sh
pip install git+https://github.com/sp-nitech/diffsptk.git@master
```

Examples

Running on a GPU

```python
import diffsptk

stft_params = {"frame_length": 400, "frame_period": 80, "fft_length": 512}

# Read waveform.
x, sr = diffsptk.read("assets/data.wav", device="cuda")

# Compute spectrogram using an nn.Module class.
X1 = diffsptk.STFT(**stft_params, device="cuda")(x)

# Compute spectrogram using a functional method.
X2 = diffsptk.functional.stft(x, **stft_params)

print(X1.allclose(X2))
```

Mel-cepstral analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
M = 24  # Mel-cepstrum dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(
    fft_length=n_fft,
    cep_order=M,
    alpha=alpha,
    n_iter=10,
)
mc = mcep(X)

# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)
x_hat = mlsa(mlsa(x, -mc), mc)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)

# Extract pitch of x.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="pitch",
)
p = pitch(x)

# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
n = diffsptk.nrand(x.size(0) - 1)

# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)

# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
```
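Mel-cepstral analysis and the MLSA filter warp the frequency axis with a first-order all-pass function controlled by `alpha`, which the example obtains from `diffsptk.get_alpha(sr)`. The pure-Python sketch below does not call diffsptk. It evaluates the warped frequency β(ω) = ω + 2 arctan(α sin ω / (1 - α cos ω)) of the all-pass (z^-1 - α) / (1 - α z^-1) to show how a positive `alpha` stretches low frequencies. The helper name `warp_frequency` is hypothetical. The value α = 0.42 is only an assumed, commonly used setting for 16 kHz speech.

```python
import math


def warp_frequency(omega: float, alpha: float) -> float:
    """Map a normalized angular frequency omega in [0, pi] onto the axis
    warped by the first-order all-pass (z^-1 - alpha) / (1 - alpha z^-1)."""
    # atan2 equals atan here because 1 - alpha*cos(omega) > 0 for |alpha| < 1.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega), 1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    alpha = 0.42  # Assumed value for 16 kHz audio; get_alpha(sr) may differ.
    for hz in (250, 1000, 4000, 8000):
        omega = 2.0 * math.pi * hz / 16000
        print(f"{hz:5d} Hz -> {warp_frequency(omega, alpha) / math.pi:.3f} pi")
```

The end points 0 and π stay fixed, while the low-frequency region is expanded, which is what makes the warped cepstrum approximate a mel-like resolution.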

WORLD analysis and synthesis

```python
import diffsptk

fp = 80  # Frame period.
n_fft = 1024  # FFT length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Extract F0 of x, or prepare well-estimated F0.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="f0",
)
f0 = pitch(x)

# Extract aperiodicity of x by D4C.
ap = diffsptk.Aperiodicity(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="d4c",
    out_format="a",
)
A = ap(x, f0)

# Extract spectral envelope of x by CheapTrick.
pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="cheap-trick",
    out_format="power",
)
S = pitch_spec(x, f0)

# Reconstruct x.
world_synth = diffsptk.WorldSynthesis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
)
x_hat = world_synth(f0, A, S)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
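The mel-cepstral and WORLD examples configure `Pitch` identically except for `out_format` ("pitch" vs "f0"). In SPTK, the "pitch" format is the pitch period in samples (sample rate divided by F0), with 0 marking unvoiced frames. The sketch below assumes `diffsptk.Pitch` uses the same convention and converts between the two representations. The function names are hypothetical.

```python
def f0_to_period(f0_values, sample_rate):
    """F0 in Hz -> pitch period in samples; 0 is kept as the unvoiced marker."""
    return [sample_rate / f0 if f0 > 0 else 0.0 for f0 in f0_values]


def period_to_f0(periods, sample_rate):
    """Pitch period in samples -> F0 in Hz; 0 is kept as the unvoiced marker."""
    return [sample_rate / p if p > 0 else 0.0 for p in periods]


if __name__ == "__main__":
    # At an assumed 16 kHz sampling rate, the search range f_min=80, f_max=180 Hz
    # corresponds to periods of 200 and about 88.9 samples.
    print(f0_to_period([80.0, 180.0, 0.0], 16000))
```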

LPC analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
M = 24  # LPC dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Estimate LPC of x.
frame = diffsptk.Frame(frame_length=fl, frame_period=fp)
window = diffsptk.Window(in_length=fl)
lpc = diffsptk.LPC(frame_length=fl, lpc_order=M, eps=1e-5)
a = lpc(window(frame(x)))

# Convert to inverse filter coefficients.
norm0 = diffsptk.AllPoleToAllZeroDigitalFilterCoefficients(filter_order=M)
b = norm0(a)

# Reconstruct x.
zerodf = diffsptk.AllZeroDigitalFilter(filter_order=M, frame_period=fp)
poledf = diffsptk.AllPoleDigitalFilter(filter_order=M, frame_period=fp)
x_hat = poledf(zerodf(x, b), a)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
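Two ideas underpin the LPC example. The first is a textbook way to estimate the coefficients: the Levinson-Durbin recursion on a frame's autocorrelation. The second is that the all-pole filter 1/A(z) exactly undoes the all-zero (inverse) filter A(z). The pure-Python sketch below illustrates both with one time-invariant coefficient set and no gain term. It assumes the SPTK-style sign convention A(z) = 1 + Σ a(m) z^-m. All function names are hypothetical, and this is not diffsptk's implementation, which works frame by frame with time-varying coefficients and gains.

```python
import math


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return (prediction error power, [a1..aM]) for A(z) = 1 + sum_m a_m z^-m."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # Reflection (PARCOR) coefficient.
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return err, a[1:]


def all_zero_filter(x, coeffs):
    """Inverse filter A(z): e[n] = x[n] + sum_m a_m x[n - m]."""
    return [
        x[n] + sum(c * x[n - m] for m, c in enumerate(coeffs, 1) if n - m >= 0)
        for n in range(len(x))
    ]


def all_pole_filter(e, coeffs):
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_m a_m y[n - m]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(c * y[n - m] for m, c in enumerate(coeffs, 1) if n - m >= 0))
    return y


if __name__ == "__main__":
    x = [math.sin(0.2 * n) + 0.3 * math.sin(0.9 * n) for n in range(200)]
    _, a = levinson_durbin(autocorrelation(x, 4), 4)
    x_hat = all_pole_filter(all_zero_filter(x, a), a)
    print(max(abs(u - v) for u, v in zip(x, x_hat)))  # Floating-point level error.
```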

Mel spectrogram analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
n_channel = 128  # Number of channels.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Extract log-mel spectrogram.
fbank = diffsptk.FBANK(
    fft_length=n_fft,
    n_channel=n_channel,
    sample_rate=sr,
)
Y = fbank(X)

# Reconstruct linear spectrogram.
ifbank = diffsptk.IFBANK(
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
X_hat = ifbank(Y)

# Reconstruct x.
griffin = diffsptk.GriffinLim(
    frame_length=fl,
    frame_period=fp,
    fft_length=n_fft,
)
x_hat = griffin(X_hat, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
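`FBANK` maps each STFT amplitude frame onto `n_channel` triangular filters spaced uniformly on the mel scale and then takes the log. The pure-Python sketch below builds such a filterbank under stated assumptions: it uses the HTK-style mel formula mel(f) = 1127 ln(1 + f/700), filters spanning 0 Hz to the Nyquist frequency, and a simple log floor. diffsptk's own defaults (frequency range, normalization, floor) may differ, and `mel_filterbank` and `log_mel` are hypothetical names.

```python
import math


def hz_to_mel(hz):
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_filterbank(n_channel, fft_length, sample_rate, f_min=0.0, f_max=None):
    """Return an n_channel x (fft_length // 2 + 1) matrix of triangular weights."""
    f_max = sample_rate / 2 if f_max is None else f_max
    n_bin = fft_length // 2 + 1
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_channel + 2 equally spaced mel points: outer edges plus one center per filter.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_channel + 1)) for i in range(n_channel + 2)]
    bin_hz = [k * sample_rate / fft_length for k in range(n_bin)]
    weights = []
    for c in range(n_channel):
        left, center, right = edges[c], edges[c + 1], edges[c + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))  # Rising slope.
            elif center < f < right:
                row.append((right - f) / (right - center))  # Falling slope.
            else:
                row.append(0.0)
        weights.append(row)
    return weights


def log_mel(amplitude, weights, floor=1e-5):
    """Apply the filterbank to one frame of STFT amplitude and take the log."""
    return [math.log(max(sum(w * a for w, a in zip(row, amplitude)), floor)) for row in weights]
```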

Subband decomposition

```python
import diffsptk

K = 4  # Number of subbands.
M = 40  # Order of filter.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x))

# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y)).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
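In the subband example, decimation keeps every K-th sample and interpolation inserts zeros between samples. Decimating by K scales each spectral component by 1/K, and zero insertion does not restore that amplitude. This is presumably why the subbands are multiplied by `K` before synthesis. The sketch below shows the two rate changes on plain lists, using a constant signal to make the 1/K average visible. It assumes decimation starts at the first sample, which may not match diffsptk's defaults, and the function names are hypothetical.

```python
def decimate(x, k):
    """Keep every k-th sample, starting from the first one."""
    return x[::k]


def interpolate(y, k):
    """Zero-insertion upsampling: put k - 1 zeros after each sample."""
    out = []
    for v in y:
        out.append(v)
        out.extend([0.0] * (k - 1))
    return out


if __name__ == "__main__":
    z = interpolate(decimate([1.0] * 16, 4), 4)
    print(sum(z) / len(z))  # The DC level drops to 1/4 before the gain K is applied.
```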

Gammatone filter bank analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
gammatone = diffsptk.GammatoneFilterBankAnalysis(sr)
y = gammatone(x)

# Reconstruct x.
igammatone = diffsptk.GammatoneFilterBankSynthesis(sr)
x_hat = igammatone(y).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
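Gammatone filter banks usually place their channels on the equivalent rectangular bandwidth (ERB) scale of Glasberg and Moore, with ERB(f) = 24.7 (4.37 f / 1000 + 1) Hz and ERB-rate E(f) = 21.4 log10(1 + 4.37 f / 1000). The sketch below computes ERB-spaced center frequencies. It is an assumption that `GammatoneFilterBankAnalysis` uses exactly these constants, this frequency range, or this channel count, and `erb_space` is a hypothetical helper.

```python
import math


def erb(hz):
    """Equivalent rectangular bandwidth in Hz at frequency hz."""
    return 24.7 * (4.37 * hz / 1000.0 + 1.0)


def hz_to_erb_rate(hz):
    return 21.4 * math.log10(1.0 + 4.37 * hz / 1000.0)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_space(f_min, f_max, n_channel):
    """Center frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    return [erb_rate_to_hz(lo + (hi - lo) * i / (n_channel - 1)) for i in range(n_channel)]
```

Channels are dense at low frequencies and sparse at high frequencies, with each channel's bandwidth growing roughly in proportion to `erb(f)`.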

Fractional octave band analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
oband = diffsptk.FractionalOctaveBandAnalysis(sr)
y = oband(x)

# Reconstruct x.
x_hat = y.sum(1).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
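The example reconstructs x simply by summing the bands (`y.sum(1)`). That works when adjacent bands tile the spectrum without gaps. The sketch below computes base-2 band centers f_c = f_ref · 2^(i/b) and edges f_c · 2^(±1/(2b)) for 1/b-octave bands. Under this convention the upper edge of one band coincides with the lower edge of the next. The 1 kHz reference, the fraction b, and the helper name `octave_bands` are assumptions, not diffsptk's defaults.

```python
def octave_bands(b, i_min, i_max, f_ref=1000.0):
    """Return (lower, center, upper) frequencies of 1/b-octave bands i_min..i_max."""
    bands = []
    for i in range(i_min, i_max + 1):
        center = f_ref * 2.0 ** (i / b)
        lower = center * 2.0 ** (-1.0 / (2 * b))
        upper = center * 2.0 ** (1.0 / (2 * b))
        bands.append((lower, center, upper))
    return bands


if __name__ == "__main__":
    for lower, center, upper in octave_bands(3, -3, 3):  # Third-octave bands.
        print(f"{lower:8.1f} {center:8.1f} {upper:8.1f}")
```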

Constant-Q transform

```python
import diffsptk
import librosa  # This is to get sample audio.

fp = 128  # Frame period.
K = 252  # Number of CQ-bins.
B = 36  # Number of bins per octave.

# Read waveform.
x, sr = diffsptk.read(librosa.ex("trumpet"))

# Transform x.
cqt = diffsptk.CQT(fp, sr, n_bin=K, n_bin_per_octave=B)
c = cqt(x)

# Reconstruct x.
icqt = diffsptk.ICQT(fp, sr, n_bin=K, n_bin_per_octave=B)
x_hat = icqt(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
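With `K = 252` bins and `B = 36` bins per octave, the CQT above spans 252 / 36 = 7 octaves. In the standard constant-Q layout, bin k is centered at f_k = f_min · 2^(k/B), and every bin shares the quality factor Q = 1 / (2^(1/B) - 1), so each bin's bandwidth f_k / Q grows in proportion to its center frequency. The sketch below computes these quantities. The choice f_min = 32.70 Hz (C1, librosa's default) is an assumption and may not be diffsptk's default.

```python
def cqt_frequencies(n_bin, n_bin_per_octave, f_min=32.70):
    """Geometrically spaced center frequencies f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / n_bin_per_octave) for k in range(n_bin)]


def quality_factor(n_bin_per_octave):
    """Constant ratio of center frequency to bandwidth for B bins per octave."""
    return 1.0 / (2.0 ** (1.0 / n_bin_per_octave) - 1.0)


if __name__ == "__main__":
    freqs = cqt_frequencies(252, 36)
    q = quality_factor(36)
    print(f"highest bin: {freqs[-1]:.1f} Hz, Q = {q:.2f}")
    print(f"bandwidth at lowest/highest bin: {freqs[0] / q:.3f} / {freqs[-1] / q:.1f} Hz")
```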

Modified discrete cosine transform

```python
import diffsptk

fl = 512  # Frame length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Transform x.
mdct = diffsptk.MDCT(fl)
c = mdct(x)

# Reconstruct x.
imdct = diffsptk.IMDCT(fl)
x_hat = imdct(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
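MDCT frames overlap by half, and each frame of 2N samples yields only N coefficients. Even so, overlap-adding the inverse transforms cancels the time-domain aliasing (TDAC). The O(N²) pure-Python sketch below demonstrates this with a sine window, which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1, and a 2/N inverse normalization. The window, normalization, padding, and function names are assumptions and need not match `diffsptk.MDCT` / `diffsptk.IMDCT`.

```python
import math


def sine_window(frame_length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for N = frame_length // 2."""
    return [math.sin(math.pi * (n + 0.5) / frame_length) for n in range(frame_length)]


def mdct(frame, window):
    """2N windowed samples -> N coefficients."""
    m = len(frame) // 2
    return [
        sum(
            window[n] * frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m)
        )
        for k in range(m)
    ]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples that still contain aliasing."""
    m = len(coeffs)
    return [
        window[n]
        * (2.0 / m)
        * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5)) for k in range(m))
        for n in range(2 * m)
    ]


def mdct_roundtrip(x, frame_length):
    """Analyze x with hop frame_length // 2, then overlap-add the inverse frames."""
    m = frame_length // 2
    window = sine_window(frame_length)
    # Pad so that every sample of x is covered by exactly two overlapping frames.
    padded = [0.0] * m + list(x) + [0.0] * (m + (-len(x)) % m)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - frame_length + 1, m):
        frame = padded[start:start + frame_length]
        for n, value in enumerate(imdct(mdct(frame, window), window)):
            out[start + n] += value
    return out[m:m + len(x)]
```

Each sample of x receives contributions from two neighbouring frames whose aliasing terms have opposite signs, so they cancel in the sum.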

Vector quantization

```python
import diffsptk

K = 2  # Codebook size.
M = 4  # Order of vector.

# Prepare input.
x = diffsptk.nrand(M)

# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
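`VectorQuantization` returns the quantized vector, the selected codebook index, and a commitment loss. The core step is a nearest-neighbour search over the codebook. A common definition of the commitment loss (from VQ-VAE) is the mean squared error between the input and its quantized version. The pure-Python sketch below assumes that definition and omits the straight-through gradient and codebook learning that a trainable quantizer needs. The name `quantize` is hypothetical.

```python
def quantize(x, codebook):
    """Return (x_hat, index, commitment_loss) for a single vector x."""

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    # Nearest codeword under squared Euclidean distance.
    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    x_hat = list(codebook[index])
    commitment_loss = sq_dist(x_hat) / len(x)  # Mean squared error.
    return x_hat, index, commitment_loss


if __name__ == "__main__":
    print(quantize([0.9, 1.2], [[0.0, 0.0], [1.0, 1.0]]))
```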

License

This software is released under the Apache License 2.0.

Citation

```bibtex
@InProceedings{sp-nitech2023sptk,
  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
  title = {{SPTK4}: An open-source software toolkit for speech signal processing},
  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
  pages = {211--217},
  year = {2023},
}
```

Owner

  • Name: sp-nitech
  • Login: sp-nitech
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "SPTK Working Group"
title: "diffsptk"
url: "https://github.com/sp-nitech/diffsptk"
preferred-citation:
  type: conference-paper
  title: "SPTK4: An open-source software toolkit for speech signal processing"
  authors:
    - family-names: "Yoshimura"
      given-names: "Takenori"
    - family-names: "Fujimoto"
      given-names: "Takato"
    - family-names: "Oura"
      given-names: "Keiichiro"
    - family-names: "Tokuda"
      given-names: "Keiichi"
  collection-title: "12th ISCA Speech Synthesis Workshop"
  collection-type: proceedings
  doi: 10.21437/SSW.2023-33
  month: 8
  start: 211
  end: 217
  year: 2023
  url: "https://www.isca-archive.org/ssw_2023/yoshimura23_ssw.pdf"

GitHub Events

Total
  • Create event: 41
  • Release event: 10
  • Issues event: 15
  • Watch event: 24
  • Delete event: 28
  • Issue comment event: 48
  • Push event: 145
  • Pull request review comment event: 2
  • Pull request event: 69
  • Fork event: 4
Last Year
  • Create event: 41
  • Release event: 10
  • Issues event: 15
  • Watch event: 24
  • Delete event: 28
  • Issue comment event: 48
  • Push event: 145
  • Pull request review comment event: 2
  • Pull request event: 69
  • Fork event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 754
  • Total Committers: 3
  • Avg Commits per committer: 251.333
  • Development Distribution Score (DDS): 0.013
Past Year
  • Commits: 269
  • Committers: 2
  • Avg Commits per committer: 134.5
  • Development Distribution Score (DDS): 0.004
Top Committers
Name Email Commits
takenori-y t****4@g****m 744
Chin-Yun Yu c****u@q****k 9
tan90xx 5****3@q****m 1
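The Development Distribution Score is commonly defined as 1 minus the fraction of all commits made by the most active committer, so values near 0 mean one person does almost all of the work. Assuming that definition, the all-time figures above can be reproduced from the Top Committers table. The function name is hypothetical.

```python
def development_distribution_score(commits_per_committer):
    """1 - (commits by the top committer / total commits)."""
    total = sum(commits_per_committer)
    return 1.0 - max(commits_per_committer) / total


if __name__ == "__main__":
    commits = [744, 9, 1]  # takenori-y, Chin-Yun Yu, tan90xx
    print(round(development_distribution_score(commits), 3))  # DDS
    print(round(sum(commits) / len(commits), 3))  # Avg commits per committer
```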

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 13
  • Total pull requests: 191
  • Average time to close issues: 9 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 2.15
  • Average comments per pull request: 0.93
  • Merged pull requests: 172
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 76
  • Average time to close issues: 3 days
  • Average time to close pull requests: 1 day
  • Issue authors: 4
  • Pull request authors: 2
  • Average comments per issue: 2.13
  • Average comments per pull request: 0.89
  • Merged pull requests: 65
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tensorsofthewall (4)
  • yoyololicon (3)
  • turian (2)
  • Simon-Grome (2)
  • bagustris (1)
  • tan90xx (1)
Pull Request Authors
  • takenori-y (199)
  • tan90xx (2)
  • yoyololicon (1)
Top Labels
Issue Labels
enhancement (3) question (3) bug (1) documentation (1)
Pull Request Labels
enhancement (119) maintenance (48) bug (13) refactoring (7) invalid (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 768 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 26
  • Total maintainers: 1
pypi.org: diffsptk

Speech signal processing modules for machine learning

  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 768 Last month
Rankings
Stargazers count: 6.2%
Dependent packages count: 10.1%
Forks count: 10.2%
Average: 12.4%
Downloads: 14.0%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/stale.yml actions
  • actions/stale v8 composite
pyproject.toml pypi
  • numpy *
  • soundfile *
  • torch >= 1.11.0
  • torchcrepe >= 0.0.16, <= 0.0.18
  • torchlpc >= 0.2.0
  • vector-quantize-pytorch >= 0.8.0