Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Repository
A differentiable version of SPTK
Basic Info
- Host: GitHub
- Owner: sp-nitech
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: http://sp-tk.sourceforge.net
- Size: 1.76 MB
Statistics
- Stars: 190
- Watchers: 8
- Forks: 19
- Open Issues: 0
- Releases: 26
Metadata Files
README.md
diffsptk
diffsptk is a differentiable version of SPTK based on the PyTorch framework.
Requirements
- Python 3.10+
- PyTorch 2.3.1+
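To check an environment against these minimums, a sketch along the following lines may help. The helper name `meets_minimum` is hypothetical (it is not part of diffsptk), and the parsing is deliberately simple: only the leading numeric components of a version string are compared, so local suffixes such as `+cu121` are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: tuple) -> bool:
    """Return True if the numeric prefix of `installed` is at least `minimum`."""
    # Keep only the leading dotted-number part, e.g. "2.3.1+cu121" -> (2, 3, 1).
    match = re.match(r"\d+(?:\.\d+)*", installed)
    if match is None:
        return False
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts >= minimum


# Python requirement from the list above.
python_ok = sys.version_info[:2] >= (3, 10)

# PyTorch requirement; query package metadata so torch itself is not imported.
try:
    torch_ok = meets_minimum(version("torch"), (2, 3, 1))
except PackageNotFoundError:
    torch_ok = False

print(f"Python OK: {python_ok}, PyTorch OK: {torch_ok}")
```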
Documentation
The reference manual is available at https://sp-nitech.github.io/diffsptk/latest/.
Installation
The latest stable release can be installed from PyPI by running:
```sh
pip install diffsptk
```
The development release can be installed from the master branch:
```sh
pip install git+https://github.com/sp-nitech/diffsptk.git@master
```
Examples
Running on a GPU
```python
import diffsptk

stft_params = {"frame_length": 400, "frame_period": 80, "fft_length": 512}

# Read waveform.
x, sr = diffsptk.read("assets/data.wav", device="cuda")

# Compute spectrogram using a nn.Module class.
X1 = diffsptk.STFT(**stft_params, device="cuda")(x)

# Compute spectrogram using a functional method.
X2 = diffsptk.functional.stft(x, **stft_params)

print(X1.allclose(X2))
```
Mel-cepstral analysis and synthesis
```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
M = 24  # Mel-cepstrum dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(
    fft_length=n_fft,
    cep_order=M,
    alpha=alpha,
    n_iter=10,
)
mc = mcep(X)

# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)
x_hat = mlsa(mlsa(x, -mc), mc)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)

# Extract pitch of x.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="pitch",
)
p = pitch(x)

# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
n = diffsptk.nrand(x.size(0) - 1)

# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)

# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
```
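The `alpha` passed to the mel-cepstral analysis and the MLSA filter is the all-pass constant that controls frequency warping. As a rough illustration of its effect, the sketch below evaluates the phase response of a first-order all-pass function, which maps a linear angular frequency to a warped one. The function name `warp_frequency` and the value 0.42 (a value commonly quoted for 16 kHz speech) are assumptions for illustration only; in the example above, `diffsptk.get_alpha(sr)` selects the constant.

```python
import math


def warp_frequency(omega: float, alpha: float) -> float:
    """Warped angular frequency of a first-order all-pass with constant alpha."""
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


alpha = 0.42  # Assumed value for illustration; not computed by diffsptk here.

# Print the mapping at five points between 0 and pi.
for k in range(5):
    omega = math.pi * k / 4
    print(f"{omega:.3f} -> {warp_frequency(omega, alpha):.3f}")
```

With a positive `alpha`, low frequencies are stretched and high frequencies compressed, while the endpoints 0 and pi stay fixed.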
WORLD analysis and synthesis
```python
import diffsptk

fp = 80  # Frame period.
n_fft = 1024  # FFT length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Extract F0 of x, or prepare well-estimated F0.
pitch = diffsptk.Pitch(
    frame_period=fp,
    sample_rate=sr,
    f_min=80,
    f_max=180,
    voicing_threshold=0.4,
    out_format="f0",
)
f0 = pitch(x)

# Extract aperiodicity of x by D4C.
ap = diffsptk.Aperiodicity(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="d4c",
    out_format="a",
)
A = ap(x, f0)

# Extract spectral envelope of x by CheapTrick.
pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
    algorithm="cheap-trick",
    out_format="power",
)
S = pitch_spec(x, f0)

# Reconstruct x.
world_synth = diffsptk.WorldSynthesis(
    frame_period=fp,
    sample_rate=sr,
    fft_length=n_fft,
)
x_hat = world_synth(f0, A, S)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
LPC analysis and synthesis
```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
M = 24  # LPC dimensions.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Estimate LPC of x.
frame = diffsptk.Frame(frame_length=fl, frame_period=fp)
window = diffsptk.Window(in_length=fl)
lpc = diffsptk.LPC(frame_length=fl, lpc_order=M, eps=1e-5)
a = lpc(window(frame(x)))

# Convert to inverse filter coefficients.
norm0 = diffsptk.AllPoleToAllZeroDigitalFilterCoefficients(filter_order=M)
b = norm0(a)

# Reconstruct x.
zerodf = diffsptk.AllZeroDigitalFilter(filter_order=M, frame_period=fp)
poledf = diffsptk.AllPoleDigitalFilter(filter_order=M, frame_period=fp)
x_hat = poledf(zerodf(x, b), a)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
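For readers unfamiliar with what the LPC step estimates, the following is a textbook sketch of the autocorrelation method with the Levinson-Durbin recursion in plain Python. It is not diffsptk's implementation, and the helper names `autocorrelate` and `levinson_durbin` are hypothetical. The sign convention assumed here is A(z) = 1 + a[1] z^-1 + ... + a[M] z^-M, which may differ from the output layout of `diffsptk.LPC`.

```python
def autocorrelate(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + m] for t in range(n - m))
        for m in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations; return (coefficients, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        # The prediction error shrinks by (1 - k^2) at each stage.
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process (made-up values).
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, order=2)
print(a, err)
```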
Mel spectrogram analysis and synthesis
```python
import diffsptk

fl = 400  # Frame length.
fp = 80  # Frame period.
n_fft = 512  # FFT length.
n_channel = 128  # Number of channels.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)

# Extract log-mel spectrogram.
fbank = diffsptk.FBANK(
    fft_length=n_fft,
    n_channel=n_channel,
    sample_rate=sr,
)
Y = fbank(X)

# Reconstruct linear spectrogram.
ifbank = diffsptk.IFBANK(
    n_channel=n_channel,
    fft_length=n_fft,
    sample_rate=sr,
)
X_hat = ifbank(Y)

# Reconstruct x.
griffin = diffsptk.GriffinLim(
    frame_length=fl,
    frame_period=fp,
    fft_length=n_fft,
)
x_hat = griffin(X_hat, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
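The filter bank above maps linear-frequency bins onto `n_channel` mel-spaced channels. The sketch below shows one common form of the mel scale and how evenly spaced mel points translate back to Hz. The constants 1127 and 700 belong to the widely used HTK-style formula and are an assumption here; diffsptk may use slightly different constants or edge handling. The 16 kHz sample rate and the helper names are likewise assumed for illustration.

```python
import math


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (assumed form)."""
    return 1127.0 * math.log1p(f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * math.expm1(m / 1127.0)


def mel_center_frequencies(n_channel: int, sample_rate: float, f_min: float = 0.0):
    """Center frequencies (Hz) of channels spaced evenly on the mel axis."""
    lo = hz_to_mel(f_min)
    hi = hz_to_mel(sample_rate / 2)
    step = (hi - lo) / (n_channel + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_channel)]


centers = mel_center_frequencies(n_channel=128, sample_rate=16000)
print(centers[0], centers[-1])
```

Because the spacing is uniform in mel rather than in Hz, the channels are dense at low frequencies and sparse near the Nyquist frequency.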
Subband decomposition
```python
import diffsptk

K = 4  # Number of subbands.
M = 40  # Order of filter.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x))

# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y)).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
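The factor `K` in `interpolate(K * y)` presumably compensates for the amplitude lost when zeros are inserted between samples. Below is a plain-Python sketch of the two resampling steps, assuming that decimation keeps every K-th sample and that interpolation inserts K - 1 zeros after each sample. The function names are hypothetical and the PQMF filtering itself is omitted.

```python
def decimate(x, k, start=0):
    """Keep every k-th sample, beginning at index `start`."""
    return x[start::k]


def interpolate(x, k):
    """Insert k - 1 zeros after each sample (zero stuffing)."""
    y = [0.0] * (len(x) * k)
    y[::k] = x
    return y


K = 4
x = [float(t) for t in range(8)]
y = decimate(x, K)
x_up = interpolate(y, K)
print(y, x_up)

# Zero stuffing keeps the sample sum, so the mean over the longer signal
# drops by a factor of K; scaling by K restores it.
mean_ratio = (sum(x_up) / len(x_up)) / (sum(y) / len(y))
print(mean_ratio)
```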
Gammatone filter bank analysis and synthesis
```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
gammatone = diffsptk.GammatoneFilterBankAnalysis(sr)
y = gammatone(x)

# Reconstruct x.
igammatone = diffsptk.GammatoneFilterBankSynthesis(sr)
x_hat = igammatone(y).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
Fractional octave band analysis and synthesis
```python
import diffsptk

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Decompose x.
oband = diffsptk.FractionalOctaveBandAnalysis(sr)
y = oband(x)

# Reconstruct x.
x_hat = y.sum(1).reshape(-1)

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
Constant-Q transform
```python
import diffsptk
import librosa  # This is to get sample audio.

fp = 128  # Frame period.
K = 252  # Number of CQ-bins.
B = 36  # Number of bins per octave.

# Read waveform.
x, sr = diffsptk.read(librosa.ex("trumpet"))

# Transform x.
cqt = diffsptk.CQT(fp, sr, n_bin=K, n_bin_per_octave=B)
c = cqt(x)

# Reconstruct x.
icqt = diffsptk.ICQT(fp, sr, n_bin=K, n_bin_per_octave=B)
x_hat = icqt(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
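With K = 252 bins and B = 36 bins per octave, the transform spans K / B = 7 octaves, with center frequencies spaced geometrically. The sketch below lists those frequencies, assuming a lowest center frequency `f_min` of 32.7 Hz (roughly the note C1). That value and the helper name are illustrative only and are not taken from diffsptk's defaults.

```python
def cqt_center_frequencies(n_bin: int, n_bin_per_octave: int, f_min: float):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)."""
    return [f_min * 2.0 ** (k / n_bin_per_octave) for k in range(n_bin)]


K = 252  # Number of CQ-bins.
B = 36  # Number of bins per octave.
f_min = 32.7  # Assumed lowest center frequency in Hz.

freqs = cqt_center_frequencies(K, B, f_min)
n_octave = K / B
print(n_octave, freqs[0], freqs[-1])
```

Each step of B bins doubles the frequency, so the highest bin sits just below `f_min * 2 ** n_octave`.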
Modified discrete cosine transform
```python
import diffsptk

fl = 512  # Frame length.

# Read waveform.
x, sr = diffsptk.read("assets/data.wav")

# Transform x.
mdct = diffsptk.MDCT(fl)
c = mdct(x)

# Reconstruct x.
imdct = diffsptk.IMDCT(fl)
x_hat = imdct(c, out_length=x.size(0))

# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
Vector quantization
```python
import diffsptk

K = 2  # Codebook size.
M = 4  # Order of vector.

# Prepare input.
x = diffsptk.nrand(M)

# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)

# Compute error.
error = (x_hat - x).abs().sum()
print(error)
```
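At its core, vector quantization replaces an input vector with the nearest entry of a codebook. The following plain-Python sketch shows that lookup with a fixed, made-up codebook of K = 2 entries. The function name `quantize` and all numeric values are hypothetical, and the learned codebook updates and commitment loss returned by `diffsptk.VectorQuantization` are not modeled.

```python
def quantize(x, codebook):
    """Return the codeword closest to x (squared Euclidean distance) and its index."""

    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    index = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))
    return codebook[index], index


# Made-up codebook with K = 2 codewords of order M = 4 (five elements each).
codebook = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
x = [0.9, 1.2, 0.8, 1.1, 1.0]

x_hat, index = quantize(x, codebook)

# Same error measure as in the example above: sum of absolute differences.
error = sum(abs(a - b) for a, b in zip(x_hat, x))
print(index, error)
```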
License
This software is released under the Apache License 2.0.
Citation
```bibtex
@InProceedings{sp-nitech2023sptk,
  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
  title = {{SPTK4}: An open-source software toolkit for speech signal processing},
  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
  pages = {211--217},
  year = {2023},
}
```
Owner
- Name: sp-nitech
- Login: sp-nitech
- Kind: organization
- Repositories: 3
- Profile: https://github.com/sp-nitech
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "SPTK Working Group"
title: "diffsptk"
url: "https://github.com/sp-nitech/diffsptk"
preferred-citation:
  type: conference-paper
  title: "SPTK4: An open-source software toolkit for speech signal processing"
  authors:
    - family-names: "Yoshimura"
      given-names: "Takenori"
    - family-names: "Fujimoto"
      given-names: "Takato"
    - family-names: "Oura"
      given-names: "Keiichiro"
    - family-names: "Tokuda"
      given-names: "Keiichi"
  collection-title: "12th ISCA Speech Synthesis Workshop"
  collection-type: proceedings
  doi: 10.21437/SSW.2023-33
  month: 8
  start: 211
  end: 217
  year: 2023
  url: "https://www.isca-archive.org/ssw_2023/yoshimura23_ssw.pdf"
```
GitHub Events
Total
- Create event: 41
- Release event: 10
- Issues event: 15
- Watch event: 24
- Delete event: 28
- Issue comment event: 48
- Push event: 145
- Pull request review comment event: 2
- Pull request event: 69
- Fork event: 4
Last Year
- Create event: 41
- Release event: 10
- Issues event: 15
- Watch event: 24
- Delete event: 28
- Issue comment event: 48
- Push event: 145
- Pull request review comment event: 2
- Pull request event: 69
- Fork event: 4
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| takenori-y | t****4@g****m | 744 |
| Chin-Yun Yu | c****u@q****k | 9 |
| tan90xx | 5****3@q****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 13
- Total pull requests: 191
- Average time to close issues: 9 days
- Average time to close pull requests: 1 day
- Total issue authors: 6
- Total pull request authors: 3
- Average comments per issue: 2.15
- Average comments per pull request: 0.93
- Merged pull requests: 172
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 76
- Average time to close issues: 3 days
- Average time to close pull requests: 1 day
- Issue authors: 4
- Pull request authors: 2
- Average comments per issue: 2.13
- Average comments per pull request: 0.89
- Merged pull requests: 65
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tensorsofthewall (4)
- yoyololicon (3)
- turian (2)
- Simon-Grome (2)
- bagustris (1)
- tan90xx (1)
Pull Request Authors
- takenori-y (199)
- tan90xx (2)
- yoyololicon (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 768 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 26
- Total maintainers: 1
pypi.org: diffsptk
Speech signal processing modules for machine learning
- Homepage: https://sp-tk.sourceforge.net/
- Documentation: https://sp-nitech.github.io/diffsptk/latest/
- License: Apache 2.0
- Latest release: 3.3.1 (published 7 months ago)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- actions/stale v8 composite
- numpy *
- soundfile *
- torch >= 1.11.0
- torchcrepe >= 0.0.16, <= 0.0.18
- torchlpc >= 0.2.0
- vector-quantize-pytorch >= 0.8.0