diffsptk

A differentiable version of SPTK

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary

Keywords

cepstrum cqt ddsp deep-learning digital-signal-processing dsp gmm k-means lpc lsp mdct mfcc nmf plp pqmf python pytorch signal-processing sptk stft

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: sp-nitech
License: apache-2.0
Language: Python
Default Branch: master
Homepage: http://sp-tk.sourceforge.net
Size: 1.76 MB

Statistics

Stars: 190
Watchers: 8
Forks: 19
Open Issues: 0
Releases: 26

Created almost 4 years ago · Last pushed 6 months ago

Metadata Files

Readme License Code of conduct Citation

diffsptk

diffsptk is a differentiable version of SPTK based on the PyTorch framework.

Requirements

Python 3.10+
PyTorch 2.3.1+
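
The minimum versions above can be checked programmatically. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of diffsptk. It compares numeric tuples rather than strings, because a plain string comparison would rank "2.10.0" below "2.3.1".

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`, ignoring local suffixes such as '+cu121'."""
    return parse_version(version) >= parse_version(minimum)


# Python requirement from the list above.
print("Python OK:", sys.version_info[:2] >= (3, 10))

# PyTorch requirement, shown with example version strings.
print(meets_minimum("2.3.1+cu121", "2.3.1"))  # True
print(meets_minimum("2.10.0", "2.3.1"))       # True
print(meets_minimum("1.13.1", "2.3.1"))       # False
```

In practice the string passed to `meets_minimum` would come from `torch.__version__`.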

Documentation

See the reference manual at https://sp-nitech.github.io/diffsptk/latest/.
Our paper is available on the ISCA Archive: https://www.isca-archive.org/ssw_2023/yoshimura23_ssw.pdf

Installation

The latest stable release can be installed through PyPI by running

    pip install diffsptk

The development release can be installed from the master branch:

    pip install git+https://github.com/sp-nitech/diffsptk.git@master
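
After installing, it can be useful to confirm which version is actually present in the environment. A minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if diffsptk is installed, otherwise None.
print(installed_version("diffsptk"))
```

This queries the package metadata only, so it does not import PyTorch or diffsptk itself.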

Examples

Running on a GPU

```python
import diffsptk

stft_params = {"frame_length": 400, "frame_period": 80, "fft_length": 512}

# Read waveform.
x, sr = diffsptk.read("assets/data.wav", device="cuda")

# Compute spectrogram using a nn.Module class.
X1 = diffsptk.STFT(**stft_params, device="cuda")(x)

# Compute spectrogram using a functional method.
X2 = diffsptk.functional.stft(x, **stft_params)

print(X1.allclose(X2))
```
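
The frame parameters in these examples are given in samples. As a rough aid for choosing them, the sketch below converts them to milliseconds and counts the bins of a one-sided spectrum. The 16 kHz sample rate is an assumption for illustration only (the README does not state the rate of `assets/data.wav`), and the helper names are hypothetical.

```python
def samples_to_ms(num_samples, sample_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * num_samples / sample_rate


def one_sided_bins(fft_length):
    """Number of bins in the one-sided spectrum of a real signal."""
    return fft_length // 2 + 1


# Assumed sample rate; in practice use the `sr` returned by diffsptk.read.
sample_rate = 16000
stft_params = {"frame_length": 400, "frame_period": 80, "fft_length": 512}

print(samples_to_ms(stft_params["frame_length"], sample_rate))  # 25.0
print(samples_to_ms(stft_params["frame_period"], sample_rate))  # 5.0
print(one_sided_bins(stft_params["fft_length"]))                # 257
```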

Mel-cepstral analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
M = 24  # Mel-cepstrum dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(
    fft_length=n_fft,
    cep_order=M,
    alpha=alpha,
    n_iter=10,
)
mc = mcep(X)

# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)
x_hat = mlsa(mlsa(x, -mc), mc)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)

# Extract pitch of x.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="pitch",
)
p = pitch(x)

# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
n = diffsptk.nrand(x.size(0) - 1)

# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)

# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
```

WORLD analysis and synthesis

```python
import diffsptk

fp = 80  # Frame period.
n_fft = 1024  # FFT length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Extract F0 of x, or prepare well-estimated F0.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="f0",
)
f0 = pitch(x)

# Extract aperiodicity of x by D4C.
ap = diffsptk.Aperiodicity(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="d4c",
    out_format="a",
)
A = ap(x, f0)

# Extract spectral envelope of x by CheapTrick.
pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="cheap-trick",
    out_format="power",
)
S = pitch_spec(x, f0)

# Reconstruct x.
world_synth = diffsptk.WorldSynthesis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
)
x_hat = world_synth(f0, A, S)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

LPC analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
M = 24  # LPC dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Estimate LPC of x.
frame = diffsptk.Frame(frame_length=fl, frame_period=fp)
window = diffsptk.Window(in_length=fl)
lpc = diffsptk.LPC(frame_length=fl, lpc_order=M, eps=1e-5)
a = lpc(window(frame(x)))

# Convert to inverse filter coefficients.
norm0 = diffsptk.AllPoleToAllZeroDigitalFilterCoefficients(filter_order=M)
b = norm0(a)

# Reconstruct x.
zerodf = diffsptk.AllZeroDigitalFilter(filter_order=M, frame_period=fp)
poledf = diffsptk.AllPoleDigitalFilter(filter_order=M, frame_period=fp)
x_hat = poledf(zerodf(x, b), a)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

Mel spectrogram analysis and synthesis

```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
n_channel = 128  # Number of channels.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Extract log-mel spectrogram.
fbank = diffsptk.FBANK(
    fft_length=n_fft,
    n_channel=n_channel,
    sample_rate=sr,
)
Y = fbank(X)

# Reconstruct linear spectrogram.
ifbank = diffsptk.IFBANK(
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
X_hat = ifbank(Y)

# Reconstruct x.
griffin = diffsptk.GriffinLim(
    frame_length=fl,
    frame_period=fp,
    fft_length=n_fft,
)
x_hat = griffin(X_hat, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

Subband decomposition

```python
import diffsptk

K = 4  # Number of subbands.
M = 40  # Order of filter.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x))

# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y)).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
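
The reconstruction step above scales the subband signals by `K` before interpolation. One plausible reading, sketched below in plain Python, is that decimation followed by zero-insertion lowers the average signal level by a factor of `K`, which the scaling compensates. The sketch assumes that decimation keeps every `K`-th sample and that interpolation inserts `K - 1` zeros between samples, as the SPTK commands of the same names do; the functions here are illustrative stand-ins, not diffsptk's implementation.

```python
def decimate(signal, period):
    """Keep every `period`-th sample, starting from the first."""
    return signal[::period]


def interpolate(signal, period):
    """Insert `period - 1` zeros after each sample."""
    out = [0.0] * (len(signal) * period)
    out[::period] = signal
    return out


K = 4
x = [1.0] * 8  # Constant test signal.

y = interpolate(decimate(x, K), K)
print(y)                                # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(sum(y) / len(y))                  # 0.25
print(sum(K * v for v in y) / len(y))   # 1.0
```

The synthesis filter bank then smooths the zero-stuffed signal; the factor `K` keeps its level consistent with the input.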

Gammatone filter bank analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
gammatone = diffsptk.GammatoneFilterBankAnalysis(sr)
y = gammatone(x)

# Reconstruct x.
igammatone = diffsptk.GammatoneFilterBankSynthesis(sr)
x_hat = igammatone(y).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

Fractional octave band analysis and synthesis

```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
oband = diffsptk.FractionalOctaveBandAnalysis(sr)
y = oband(x)

# Reconstruct x.
x_hat = y.sum(1).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

Constant-Q transform

```python
import diffsptk
import librosa  # This is to get sample audio.

fp = 128  # Frame period.
K = 252  # Number of CQ-bins.
B = 36  # Number of bins per octave.

# Read waveform.
x, sr = diffsptk.read(librosa.ex("trumpet"))

# Transform x.
cqt = diffsptk.CQT(fp, sr, n_bin=K, n_bin_per_octave=B)
c = cqt(x)

# Reconstruct x.
icqt = diffsptk.ICQT(fp, sr, n_bin=K, n_bin_per_octave=B)
x_hat = icqt(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

Modified discrete cosine transform

```python
import diffsptk

fl = 512  # Frame length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Transform x.
mdct = diffsptk.MDCT(fl)
c = mdct(x)

# Reconstruct x.
imdct = diffsptk.IMDCT(fl)
x_hat = imdct(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
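
To illustrate why the MDCT round trip can reconstruct the input, here is a small pure-Python sketch of the textbook transform: frames of length L overlap by half, each frame yields L/2 coefficients, and overlap-adding the windowed inverse frames cancels the time-domain aliasing. The sine window, the zero padding, and all function names are assumptions made for this sketch; it is independent of how diffsptk implements `MDCT` and `IMDCT`.

```python
import math


def sine_window(frame_length):
    """Sine window; satisfies w[n]**2 + w[n + L/2]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / frame_length) for n in range(frame_length)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct_frame(frame, window):
    """Map one windowed frame of length L to L/2 coefficients."""
    half = len(frame) // 2
    return [
        sum(window[n] * frame[n] * _basis(n, k, half) for n in range(2 * half))
        for k in range(half)
    ]


def imdct_frame(coeffs, window):
    """Map L/2 coefficients back to a windowed, time-aliased frame of length L."""
    half = len(coeffs)
    return [
        2.0 / half * window[n] * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
        for n in range(2 * half)
    ]


def mdct_roundtrip(x, frame_length):
    """Analyze and resynthesize x with hop size L/2 and overlap-add."""
    half = frame_length // 2
    window = sine_window(frame_length)
    # Zero-pad so that every input sample is covered by exactly two frames.
    tail = (-len(x)) % half
    padded = [0.0] * half + list(x) + [0.0] * (tail + half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - frame_length + 1, half):
        frame = padded[start:start + frame_length]
        y = imdct_frame(mdct_frame(frame, window), window)
        for n, value in enumerate(y):
            out[start + n] += value
    return out[half:half + len(x)]


x = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0, 0.0, 4.0, -2.0, 1.5]
x_hat = mdct_roundtrip(x, frame_length=8)
error = sum(abs(a - b) for a, b in zip(x_hat, x))
print(error < 1e-9)  # True
```

The direct summation is quadratic in the frame length and is meant only to show the mechanism, not to be efficient.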

Vector quantization

```python
import diffsptk

K = 2  # Codebook size.
M = 4  # Order of vector.

# Prepare input.
x = diffsptk.nrand(M)

# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```

License

This software is released under the Apache License 2.0.

Citation

    @InProceedings{sp-nitech2023sptk,
      author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
      title = {{SPTK4}: An open-source software toolkit for speech signal processing},
      booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
      pages = {211--217},
      year = {2023},
    }

Owner

Name: sp-nitech
Login: sp-nitech
Kind: organization

Repositories: 3
Profile: https://github.com/sp-nitech

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "SPTK Working Group"
title: "diffsptk"
url: "https://github.com/sp-nitech/diffsptk"
preferred-citation:
  type: conference-paper
  title: "SPTK4: An open-source software toolkit for speech signal processing"
  authors:
    - family-names: "Yoshimura"
      given-names: "Takenori"
    - family-names: "Fujimoto"
      given-names: "Takato"
    - family-names: "Oura"
      given-names: "Keiichiro"
    - family-names: "Tokuda"
      given-names: "Keiichi"
  collection-title: "12th ISCA Speech Synthesis Workshop"
  collection-type: proceedings
  doi: 10.21437/SSW.2023-33
  month: 8
  start: 211
  end: 217
  year: 2023
  url: "https://www.isca-archive.org/ssw_2023/yoshimura23_ssw.pdf"

GitHub Events

Total

Create event: 41
Release event: 10
Issues event: 15
Watch event: 24
Delete event: 28
Issue comment event: 48
Push event: 145
Pull request review comment event: 2
Pull request event: 69
Fork event: 4

Last Year

Create event: 41
Release event: 10
Issues event: 15
Watch event: 24
Delete event: 28
Issue comment event: 48
Push event: 145
Pull request review comment event: 2
Pull request event: 69
Fork event: 4

Committers

Last synced: 9 months ago

All Time

Total Commits: 754
Total Committers: 3
Avg Commits per committer: 251.333
Development Distribution Score (DDS): 0.013

Past Year

Commits: 269
Committers: 2
Avg Commits per committer: 134.5
Development Distribution Score (DDS): 0.004

Top Committers

Name	Email	Commits
takenori-y	t**4@g**m	744
Chin-Yun Yu	c**u@q**k	9
tan90xx	5**3@q**m	1
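
The all-time committer statistics above can be reproduced from this table. The sketch below assumes the commonly used definition DDS = 1 - (commits by the top committer / total commits); the page itself does not state the formula, so treat that definition as an assumption that happens to match the reported value.

```python
# Commit counts copied from the Top Committers table.
commits = {"takenori-y": 744, "Chin-Yun Yu": 9, "tan90xx": 1}


def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    total = sum(commit_counts.values())
    return 1.0 - max(commit_counts.values()) / total


total_commits = sum(commits.values())
average_commits = total_commits / len(commits)
dds = development_distribution_score(commits)

print(total_commits)              # 754
print(round(average_commits, 3))  # 251.333
print(round(dds, 3))              # 0.013
```

A score near 0 indicates that almost all commits come from a single committer.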

Committer Domains (Top 20 + Academic)

qq.com: 1
qmul.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 13
Total pull requests: 191
Average time to close issues: 9 days
Average time to close pull requests: 1 day
Total issue authors: 6
Total pull request authors: 3
Average comments per issue: 2.15
Average comments per pull request: 0.93
Merged pull requests: 172
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 8
Pull requests: 76
Average time to close issues: 3 days
Average time to close pull requests: 1 day
Issue authors: 4
Pull request authors: 2
Average comments per issue: 2.13
Average comments per pull request: 0.89
Merged pull requests: 65
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

tensorsofthewall (4)
yoyololicon (3)
turian (2)
Simon-Grome (2)
bagustris (1)
tan90xx (1)

Pull Request Authors

takenori-y (199)
tan90xx (2)
yoyololicon (1)

Top Labels

Issue Labels

enhancement (3) question (3) bug (1) documentation (1)

Pull Request Labels

enhancement (119) maintenance (48) bug (13) refactoring (7) invalid (1)

Packages

Total packages: 1
Total downloads:
- pypi 768 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 26
Total maintainers: 1

pypi.org: diffsptk

Speech signal processing modules for machine learning

Homepage: https://sp-tk.sourceforge.net/
Documentation: https://sp-nitech.github.io/diffsptk/latest/
License: Apache 2.0
Latest release: 3.3.1
published 7 months ago

Versions: 26
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 768 Last month

Rankings

Stargazers count: 6.2%

Dependent packages count: 10.1%

Forks count: 10.2%

Average: 12.4%

Downloads: 14.0%

Dependent repos count: 21.6%
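
The "Average" entry above matches the arithmetic mean of the five individual rankings. This is an observation from the listed numbers rather than a documented formula; a minimal check:

```python
# Ranking percentages copied from the list above.
rankings = {
    "Stargazers count": 6.2,
    "Dependent packages count": 10.1,
    "Forks count": 10.2,
    "Downloads": 14.0,
    "Dependent repos count": 21.6,
}

average = sum(rankings.values()) / len(rankings)
print(round(average, 1))  # 12.4
```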

Maintainers (1)

takenori-y

Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
codecov/codecov-action v3 composite

.github/workflows/stale.yml actions

actions/stale v8 composite

pyproject.toml pypi

numpy *
soundfile *
torch >= 1.11.0
torchcrepe >= 0.0.16, <= 0.0.18
torchlpc >= 0.2.0
vector-quantize-pytorch >= 0.8.0
