voice

voice: A Comprehensive R Package for Audio Analysis - Published in JOSS (2025)

https://github.com/filipezabala/voice

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in JOSS metadata
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Last synced: 9 months ago

Repository

General tools for voice analysis.

Basic Info

Host: GitHub
Owner: filipezabala
License: gpl-3.0
Language: HTML
Default Branch: master
Homepage:
Size: 86.3 MB

Statistics

Stars: 22
Watchers: 5
Forks: 5
Open Issues: 3
Releases: 0

Created over 6 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License

`voice`

General tools for voice analysis.

The voice package is being developed as an easy-to-use set of tools for audio analysis in R. It provides a free and user-friendly toolkit that enables researchers to extract, tag, and analyze voice data efficiently. It supports the extraction of audio features, the enrichment of structured datasets with audio summaries, and the automatic identification of spoken segments, while introducing novel features. It also allows audio analysis based on music theory, associating frequencies with musical notes arranged in a score via the gm package.

The package has been tested extensively since 2019, including:

Real-world applications: Dozens of uses, e.g. sex prediction from voice features and speaker diarization in audiobooks.
Validation: Successful tests on open datasets and LibriVox recordings.

If you want to contribute, report bugs, or request new features, use the 'Issues' tab on GitHub.

0. Basic installation

```{r, eval=FALSE}
# Development version from GitHub
install.packages(c('devtools', 'tidyverse'))
devtools::install_github('filipezabala/voice')

# Stable version from CRAN
install.packages('voice')
```

If you wish to perform a full installation, proceed to Section 4.

0.1 For Windows Users

If you're compiling R packages from source, you may need to install RTools, a collection of Windows-specific build tools for R.

0.2 For macOS Users

If you're compiling packages, ensure you have Xcode Command Line Tools installed. You may also need the macOS tools for R.

```{bash, eval=FALSE}
# Install Xcode Command Line Tools on macOS
xcode-select --install
```

More details may be found at https://filipezabala.com/voicegnette/.

1. Extract features

1.1 Load packages and audio files

```{r, message=FALSE, warning=FALSE}
# packs
library(voice)
library(tidyverse)

# get path to audio file
wavDir <- list.files(system.file('extdata', package = 'wrassp'),
                     pattern = glob2rx('*.wav'), full.names = TRUE)
```

1.2 Extract features

```{r, message=FALSE, warning=FALSE}
# minimal usage
M <- voice::extract_features(wavDir)
glimpse(M)
```

2. Tag

```{r, message=FALSE, warning=FALSE}
# creating Extended synthetic data
E <- dplyr::tibble(subject_id = c(1,1,1,2,2,2,3,3,3), wav_path = wavDir)
E

# minimal usage
voice::tag(E)

# canonical data
voice::tag(E, groupBy = 'subject_id')
```
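The idea of enriching a dataset with per-group audio summaries can be sketched in base R. This is only a conceptual illustration on made-up numbers, not how voice::tag() is implemented; the data frames `frames` and `E_demo` and the column `f0_mean` are hypothetical names chosen for this example.

```r
# Hypothetical frame-level f0 values (Hz) for three recordings; made-up numbers
frames <- data.frame(
  wav_path = rep(c("a.wav", "b.wav", "c.wav"), each = 3),
  f0 = c(100, 110, 120, 200, 210, 220, 150, 150, 150)
)

# Hypothetical structured dataset: one row per recording, two subjects
E_demo <- data.frame(
  subject_id = c(1, 1, 2),
  wav_path = c("a.wav", "b.wav", "c.wav")
)

# Attach the subject to every frame, then summarise f0 per subject
frames_subj <- merge(frames, E_demo, by = "wav_path")
f0_mean <- aggregate(f0 ~ subject_id, data = frames_subj, FUN = mean)
names(f0_mean)[2] <- "f0_mean"

# Enrich the original dataset with the per-subject summary
enriched <- merge(E_demo, f0_mean, by = "subject_id")
enriched
```

Grouping by subject rather than by file pools all frames from the same speaker before summarising, which is the role the groupBy argument plays in the call above.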

3. Visualization

3.1 Get audio

```{r, message=FALSE, warning=FALSE}
url0 <- 'https://github.com/filipezabala/voiceAudios/raw/refs/heads/main/wav/doremi.wav'
download.file(url0, paste0(tempdir(), '/doremi.wav'), mode = 'wb')
```

You may use the command voice::embed_audio(url0) if you wish to show a play button when compiling an .Rmd file. See https://github.com/mccarthy-m-g/embedr for more details about embed_audio() related functions.

3.2 Media data

```{r, message=FALSE, warning=FALSE}
M <- voice::extract_features(tempdir())
summary(M)
```

3.3 Plot

```{r, message=FALSE, warning=FALSE, fig.width=7.5, fig.height=4}
voice::piano_plot(M, 0)   # f0
voice::piano_plot(M, 0:1) # f0 + f1
```

3.4 Assign notes

```{r, message=FALSE, warning=FALSE}
(f0_spn <- voice::assign_notes(M, fmt = 0, min_points = 22, min_percentile = .85)) # f0
(f1_spn <- voice::assign_notes(M, fmt = 1, min_points = 22, min_percentile = .85)) # f1
```
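To illustrate the idea of associating frequencies with notes, the base R sketch below maps a frequency in Hz to the nearest equal-tempered note in scientific pitch notation, assuming A4 = 440 Hz. `freq_to_spn` is a hypothetical helper written for this example; it is not the implementation of voice::assign_notes(), which accepts further arguments such as min_points and min_percentile.

```r
# Hypothetical helper: nearest equal-tempered note for a frequency in Hz
# Assumes 12-tone equal temperament with A4 = 440 Hz (MIDI note 69)
freq_to_spn <- function(freq_hz, a4 = 440) {
  note_names <- c("C", "C#", "D", "D#", "E", "F",
                  "F#", "G", "G#", "A", "A#", "B")
  # Semitone distance from A4, rounded to the nearest MIDI note number
  midi <- round(69 + 12 * log2(freq_hz / a4))
  # Pitch class from the remainder, octave from the integer quotient
  paste0(note_names[midi %% 12 + 1], midi %/% 12 - 1)
}

freq_to_spn(c(261.63, 329.63, 392.00, 440))
# "C4" "E4" "G4" "A4"
```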

3.5 Sheet music

Requires MuseScore and the gm package to be installed.

3.5.1 Notes sequence of f0

```{r, message=FALSE, warning=FALSE}
library(gm)
line_0 <- gm::Line(as.character(f0_spn))
m0 <- gm::Music() + gm::Meter(4, 4) + line_0
gm::show(m0, to = c('score', 'audio'))
```

3.5.2 Notes sequences of f0 and f1

```{r, message=FALSE, warning=FALSE}
line_0 <- gm::Line(as.character(f0_spn))
line_1 <- gm::Line(as.character(f1_spn))
m1 <- gm::Music() + gm::Meter(4, 4) + line_0 + line_1
gm::show(m1, to = c('score', 'audio'))
```

4. Advanced installation

Python-based functions diarize and extract_features (when the latter is inferring f0_praat and fmt_praat features) require a configured Python environment.

4.1 Ubuntu

The following steps are used to fully configure voice on Ubuntu 24.04 LTS (Noble Numbat). Reports of inconsistencies are welcome.

4.1.1. Curl

Command line tool and library for transferring data with URLs.

```bash
# installing dependencies
sudo apt-get update
sudo apt-get install -y libssl-dev autoconf libtool make

# installing curl
sudo apt install curl

# verify installation
curl --version
```

4.1.2. ffmpeg

ffmpeg is a cross-platform solution to record, convert and stream audio and video.

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

4.1.3. Audio drivers and extra packages

```bash
sudo apt-get update
sudo apt-get install portaudio19-dev libasound2-dev libfontconfig1-dev libmagick++-dev libxml2-dev libharfbuzz-dev libfribidi-dev libgdal-dev cmake cmake-doc ninja-build
```

4.1.4. MuseScore

MuseScore is an open source notation software.

```bash
sudo add-apt-repository ppa:mscore-ubuntu/mscore-stable
sudo apt-get update
sudo apt-get install musescore
```

4.1.5. R

R is a free software environment for statistical computing and graphics. To find out your Ubuntu distribution use lsb_release -a at terminal.
```bash
# Replace 'focal' with your Ubuntu codename (e.g. 'noble' for 24.04)
sudo sh -c 'echo "deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" >> /etc/apt/sources.list.d/cran.list'
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -

sudo add-apt-repository ppa:c2d4u.team/c2d4u4.0+

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install r-base r-base-dev
```

4.1.6. RStudio

RStudio is an Integrated Development Environment (IDE) for R. Check the RStudio download page for updates.

```bash
sudo apt-get update
sudo apt-get install gdebi-core
wget https://download1.rstudio.org/electron/jammy/amd64/rstudio-2025.05.0-496-amd64.deb
sudo gdebi rstudio-2025.05.0-496-amd64.deb
```

4.1.9. R packages

"Packages are the fundamental units of reproducible R code." Hadley Wickham and Jennifer Bryan. The installation may take several minutes. At terminal run:

```bash
sudo R
```

Running R as super user, paste the following, row by row:

```r
# devtools is included because install_github() below depends on it
packs <- c('audio','devtools','reticulate','R.utils','seewave','tidyverse','tuneR','wrassp')
install.packages(packs, dep = TRUE)
update.packages(ask = FALSE)
devtools::install_github('egenn/music')
devtools::install_github('flujoo/gm')
```

To configure the gm package, run:

```r
usethis::edit_r_environ()
```

Add the line MUSESCORE_PATH=/usr/bin/mscore to the /root/.Renviron file. In vi, save and exit with :wq, then restart the R/RStudio session.

4.1.10. Miniconda

Miniconda is a free minimal installer for conda, an open source package, dependency and environment management system for any language (Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN and more) that runs on Windows, macOS and Linux.
Follow the instructions at https://docs.conda.io/en/latest/miniconda.html.

At terminal:

```bash
cd ~/Downloads/
wget -r -np -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
cd repo.anaconda.com/miniconda/
bash Miniconda3-latest-Linux-x86_64.sh
```

Do you accept the license terms? [yes|no] yes.

Miniconda3 will now be installed into this location: /home/user/miniconda3 [ENTER]

You can undo this by running conda init --reverse $SHELL? yes

Do you wish the installer to initialize Miniconda3 by running conda init? yes.

Close and reopen terminal.

```bash
conda update -n base -c defaults conda
```

The following packages will be INSTALLED/REMOVED/UPDATED/DOWNGRADED:... Proceed ([y]/n)? y

```bash
conda create -n pyvoice python=3.12
```

The following (NEW) packages will be downloaded/INSTALLED:... Proceed ([y]/n)? y

```bash
conda activate pyvoice
pip install -r https://raw.githubusercontent.com/filipezabala/voice/master/requirements.txt
```

4.2 MacOS

The following steps are used to fully configure voice on MacOS Sonoma (Link to MacOS Sequoia). Reports of inconsistencies are welcome.

4.2.1. Homebrew

Install Homebrew, 'The Missing Package Manager for macOS (or Linux)', and remember to run brew doctor eventually. At terminal (command + space 'terminal') run:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

4.2.2. wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.

```bash
brew install wget
```

4.2.3. Python

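Python is a programming language that helps integrate systems. According to this post, it is recommended to install Python 3.8 and 3.9 and keep the installation consistent; the commands below install Python 3.12, the version also used for the pyvoice environment in Section 4.2.12.

```bash
brew install python@3.12
python3 --version
pip3 --version
```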

4.2.4. ffmpeg

ffmpeg is a cross-platform solution to record, convert and stream audio and video. The installation may take several minutes.

```bash
brew install ffmpeg
```

4.2.5. XQuartz

The XQuartz project is an open-source effort to develop a version of the X.Org X Window System that runs on macOS.

Download and run https://github.com/XQuartz/XQuartz/releases/download/XQuartz-2.8.5/XQuartz-2.8.5.pkg
Will take around 320 MB of disk space
Send XQuartz-2.8.5.dmg to Trash

4.2.6. MacPorts

Follow the instructions from https://guide.macports.org/chunked/installing.macports.html.

4.2.7. tcllib

```bash
sudo port selfupdate && sudo port upgrade tcllib
sudo port install tcllib
```

4.2.8. MuseScore

MuseScore is an open source notation software.

Download and run https://musescore.org/en/download/musescore.dmg
Drag MuseScore 4 to Applications folder
Will take around 320 MB of disk space
Unmount MuseScore-4.5.2 virtual disk and send MuseScore-Studio-4.5.2.251141402.dmg to Trash

4.2.9. R

R is a free software environment for statistical computing and graphics.

Download and run the pkg file that matches your architecture from https://cloud.r-project.org/bin/macosx/
Will take around 180 MB of disk space

4.2.10. RStudio

RStudio is an Integrated Development Environment (IDE) for R.

Download and run https://download1.rstudio.org/electron/macos/RStudio-2025.05.0-496.dmg
Drag RStudio to Applications folder
Will take around 770 MB of disk space
Unmount RStudio virtual disk and send RStudio-2025.05.0-496.dmg to Trash
Type command + space 'rstudio'
Tools > Global Options... > Appearance > Merbivore (Restart required)

4.2.11. R packages

"Packages are the fundamental units of reproducible R code." Hadley Wickham and Jennifer Bryan. Type command + space 'terminal' and run:

```bash
sudo R
```

Running R as super user, paste the following, one line at a time:

```r
# devtools is included because install_github() below depends on it
packs <- c('audio','devtools','reticulate','R.utils','seewave','tidyverse','tuneR','wrassp')
install.packages(packs, dep = TRUE)
update.packages(ask = FALSE)
devtools::install_github('egenn/music')
devtools::install_github('flujoo/gm')
```

4.2.12. Miniconda

For the 64-bit Intel (x86_64) version use

```bash
cd ~/Downloads
wget -r -np -k https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
cd repo.anaconda.com/miniconda/
bash Miniconda3-latest-MacOSX-x86_64.sh
```

For the M1 (arm64) version use

```bash
cd ~/Downloads
wget -r -np -k https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
cd repo.anaconda.com/miniconda/
bash Miniconda3-latest-MacOSX-arm64.sh
```

In order to continue the installation process, please review the license agreement. Please, press ENTER to continue ENTER.

You can undo this by running conda init --reverse $SHELL? yes

Close and reopen terminal.

```bash
# $HOME is used because a quoted ~ is not expanded by the shell
export PATH="$HOME/miniconda3/bin:$PATH"
conda update -n base -c defaults conda
```

The following packages will be INSTALLED/REMOVED/UPDATED/DOWNGRADED:... Proceed ([y]/n)? y

```bash
conda create -n pyvoice python=3.12
```

The following (NEW) packages will be downloaded/INSTALLED:... Proceed ([y]/n)? y

Close and reopen terminal.

```bash
conda activate base
conda activate pyvoice
pip install -r https://raw.githubusercontent.com/filipezabala/voice/master/requirements.txt
```
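Once either installation path is complete, a quick check from R can confirm that the command-line tools are visible to the R session. The sketch below relies only on base R's Sys.which(); `tools_available` is a hypothetical helper written for this example (it is not part of voice), and the list of tools is just a sample of those named in the steps above.

```r
# Hypothetical helper: TRUE for each tool found on the PATH, FALSE otherwise
tools_available <- function(tools) {
  paths <- Sys.which(tools)        # "" for tools that cannot be located
  setNames(nzchar(paths), tools)
}

# A sample of the tools installed in the steps above
status <- tools_available(c("ffmpeg", "curl", "conda"))
print(status)

# Report anything that is missing instead of stopping
if (!all(status)) {
  message("Not found on PATH: ",
          paste(names(status)[!status], collapse = ", "))
}
```

If conda is reported as missing right after installation, closing and reopening the terminal (and restarting R) is usually needed before the updated PATH is picked up.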

5. Diarize

```{r}
# download
url0 <- 'https://github.com/filipezabala/voiceAudios/raw/main/wav/sherlock0.wav'
wavDir <- normalizePath(tempdir())
download.file(url0, paste0(wavDir, '/sherlock0.wav'), mode = 'wb')
```

Diarization can be performed to detect speaker segments (i.e., 'who spoke when').

```{r}
# diarize
voice::diarize(fromWav = wavDir, toRttm = wavDir, token = 'YOUR_TOKEN')
```

The voice::diarize() function creates Rich Transcription Time Marked (RTTM)[^rttm] files: space-delimited text files containing one turn per line, in a format defined by the National Institute of Standards and Technology (NIST). The RTTM files can be read using voice::read_rttm().

[^rttm]: See Appendix C at https://www.nist.gov/system/files/documents/itl/iad/mig/KWS15-evalplan-v05.pdf.
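For orientation, the layout of an RTTM file can also be inspected with base R alone. The sketch below writes two made-up RTTM lines to a temporary file and reads them back; the values are illustrative rather than real diarization output, the column labels are names chosen here to follow the ten-field RTTM order, and the result is not meant to mirror the object returned by voice::read_rttm().

```r
# Two illustrative RTTM lines (made-up values, not real diarization output)
rttm_lines <- c(
  "SPEAKER sherlock0 1 0.50 2.25 <NA> <NA> SPEAKER_00 <NA> <NA>",
  "SPEAKER sherlock0 1 3.00 1.50 <NA> <NA> SPEAKER_01 <NA> <NA>"
)
tmp <- tempfile(fileext = ".rttm")
writeLines(rttm_lines, tmp)

# Column labels chosen for this example, following the RTTM field order:
# type, file, channel, turn onset (s), turn duration (s), orthography,
# speaker type, speaker name, confidence, signal lookahead time
cols <- c("type", "file", "chnl", "tbeg", "tdur",
          "ortho", "stype", "name", "conf", "slat")
rttm_df <- read.table(tmp, col.names = cols, na.strings = "<NA>",
                      stringsAsFactors = FALSE)

# Turn end time = onset + duration
rttm_df$tend <- rttm_df$tbeg + rttm_df$tdur
rttm_df[, c("name", "tbeg", "tdur", "tend")]
```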

```{r}
# read_rttm
(rttm <- voice::read_rttm(wavDir))
```

Finally, the audio waves can be automatically segmented.

```{r}
# split audio wave
voice::splitw(fromWav = wavDir, fromRttm = wavDir, to = wavDir)
# list the resulting .wav files (the dot is escaped to match a literal '.')
dir(wavDir, pattern = '\\.[Ww][Aa][Vv]$')
```

Owner

Name: Filipe Zabala
Login: filipezabala
Kind: user

Website: filipezabala.com
Repositories: 4
Profile: https://github.com/filipezabala

JOSS Publication

voice: A Comprehensive R Package for Audio Analysis

Published

July 30, 2025

DOI

10.21105/joss.08420

Volume 10, Issue 111, Page 8420

Authors

Filipe Jaeger Zabala

Graduate Program of Psychiatry and Behavioral Sciences, UFRGS, Brazil

Giovanni Abrahão Salum

Graduate Program of Psychiatry and Behavioral Sciences, UFRGS, Brazil, Child Mind Institute, New York, NY 10022, USA

Editor

Neea Rusch

GitHub Events

Total

Issues event: 6
Watch event: 3
Issue comment event: 14
Push event: 38
Create event: 1

Last Year

Issues event: 6
Watch event: 3
Issue comment event: 14
Push event: 38
Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 2
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: about 2 months
Total issue authors: 1
Total pull request authors: 3
Average comments per issue: 1.0
Average comments per pull request: 0.2
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 2
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 1
Pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

expectopatronum (4)
nkrusch (1)

Pull Request Authors

dependabot[bot] (3)
filipezabala (1)
jtrecenti (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (3)

Packages

Total packages: 1
Total downloads:
- cran 186 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

cran.r-project.org: voice

Speaker Recognition, Voice Analysis and Mood Inference via Music Theory

Homepage: https://github.com/filipezabala/voice
Documentation: http://cran.r-project.org/web/packages/voice/voice.pdf
License: GPL-3
Latest release: 0.5.4
published 11 months ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 186 Last month

Rankings

Forks count: 11.3%

Stargazers count: 14.2%

Average: 28.5%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Downloads: 51.7%

Maintainers (1)

filipezabala@gmail.com

Last synced: 9 months ago

Dependencies

DESCRIPTION cran

R >= 4.1.0 depends
R.utils * imports
dplyr * imports
reticulate * imports
seewave * imports
tibble * imports
tidyselect * imports
tuneR * imports
wrassp * imports
zoo * imports
gm * suggests
knitr * suggests

requirements.txt pypi

librosa *
numpy ==1.20
pandas *
praat-parselmouth *
pyannote.audio ==1.1.1
pychord *
tensorboard *
torch *
torchvision *

draft/requirements-copy.txt pypi

librosa *
numpy ==1.20
pandas *
praat-parselmouth *
pyannote.audio ==1.1.1
pychord *
tensorboard *
torch *
torchvision *

draft/requirements_old.txt pypi

librosa *
numpy ==1.20
pandas *
praat-parselmouth *
pyannote.audio ==1.1.2
pychord *
tensorboard *
torch *
torchvision *
