nzilbb.labbcat

R package for accessing LaBB-CAT functionality

https://github.com/nzilbb/labbcat-r

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: nzilbb
License: gpl-3.0
Language: R
Default Branch: main
Size: 3.52 MB

Statistics

Stars: 6
Watchers: 3
Forks: 2
Open Issues: 7
Releases: 13

Created about 7 years ago · Last pushed 8 months ago

Metadata Files

Readme Changelog License Citation

nzilbb.labbcat package for R

This R package provides functionality for querying and extracting data from LaBB-CAT servers, directly from R.

```R
library(nzilbb.labbcat)
labbcat.url <- "http://localhost:8080/labbcat/"

# search for tokens of the KIT vowel
m <- getMatches(labbcat.url, list(segment="I")) |>
  # get morphology of the word, and the participant's gender
  appendLabels(c("morphology", "participant_gender")) |>
  # extract F1, 2, and 3 from the mid-point of each vowel
  appendFromPraat(
    Target.segment.start, Target.segment.end,
    praatScriptFormants(c(1,2,3)), window.offset=0.5)
```

LaBB-CAT is a web-based linguistic annotation store that holds audio or video recordings, text transcripts, and other annotations.

This package provides access to basic corpus structure data, pattern-based search, annotation, audio, TextGrid (and other format) extraction, and server-side acoustic measurement with Praat.

Online documentation is available at https://nzilbb.github.io/labbcat-R

Basic usage instructions

Getting started

To install the latest version of the package from CRAN:

```R
install.packages("nzilbb.labbcat")
```

To use it:

```R
library(nzilbb.labbcat)
```

For all functions, the first parameter is the URL to the LaBB-CAT instance you want to interact with - e.g. "https://labbcat.canterbury.ac.nz/demo/".

If the instance is password-protected, you'll be prompted for the username and password the first time you invoke a function for that instance.

Basic informational functions

There are some basic functions that provide information about the LaBB-CAT instance you're using.

```R
labbcat.url <- "https://labbcat.canterbury.ac.nz/demo"

id <- getId(labbcat.url)
layers <- getLayerIds(labbcat.url)
corpora <- getCorpusIds(labbcat.url)

paste("LaBB-CAT instance", id, "has", length(layers), "layers. The corpora are:")
corpora
```

Accessing specific transcript and participant IDs

You can get a complete list of participants and transcripts:

```R
participants <- getParticipantIds(labbcat.url)
transcripts <- getTranscriptIds(labbcat.url)

paste("There are", length(participants), "participants. The first one is", participants[1])
paste("There are", length(transcripts), "transcripts. The first one is", transcripts[1])
```

There are also ways to get a filtered list of transcripts:

```R
# Transcripts in the UC corpus:
getTranscriptIdsInCorpus(labbcat.url, "UC")

# Transcripts featuring the participant QB1602:
getTranscriptIdsWithParticipant(labbcat.url, "QB1602")

# Transcripts with 'YW' in their name:
getMatchingTranscriptIds(labbcat.url, "/.*YW.*/.test(id)")
```
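The filter passed to getMatchingTranscriptIds in these examples is a string of the form /regex/.test(id). As a minimal sketch, such strings could be built from a plain regular expression as follows. Note that id.matches.expression is a hypothetical helper for illustration, not part of nzilbb.labbcat, and it assumes the regular expression contains no forward slash.

```R
# Hypothetical helper (not part of nzilbb.labbcat): wrap a regular expression
# in the "/regex/.test(id)" form used by the README's examples.
id.matches.expression <- function(regex) {
  # a bare "/" would end the expression's regex literal early, so reject it
  stopifnot(is.character(regex), length(regex) == 1,
            !grepl("/", regex, fixed = TRUE))
  paste0("/", regex, "/.test(id)")
}

id.filter <- id.matches.expression("BR.+")
id.filter

# The result can then be passed on (this call needs a LaBB-CAT server):
# getMatchingTranscriptIds(labbcat.url, id.filter)
```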

Accessing Media

Given a transcript ID you can access information about what media it has available:

```R
# Download the default WAV file
wav.file <- getMedia(labbcat.url, "AP2515_ErrolHitt.eaf")

# Get information about all media available
media <- getAvailableMedia(labbcat.url, "AP2515_ErrolHitt.eaf")

# All media file names with their track suffixes and content types
media[,c("name","trackSuffix","mimeType")]

# Download a specific media file
quake.face.video.file <- getMedia(
  labbcat.url, "AP2515_ErrolHitt.eaf", track.suffix = "_face", mime.type = "video/mp4")

# tidily delete the files we just downloaded
file.remove(wav.file)
file.remove(quake.face.video.file)
```

Media fragments

You can access a selected fragment of a wav file with getSoundFragments. The function downloads a wav file to the current working directory, and returns the name of the file:

```R
wav.file <- getSoundFragments(labbcat.url, "AP2505_Nelson.eaf", 10.0, 15.0)

paste("The third 5 seconds is in this file:", wav.file)

# tidily delete the file we just downloaded
file.remove(wav.file)
```

getSoundFragments also accepts vectors for the id, start, and end parameters:

```R
results <- data.frame(
  id=c("AP2505_Nelson.eaf", "AP2512_MattBlack.eaf", "AP2512_MattBlack.eaf"),
  start=c(10.0, 20.0, 30.0),
  end=c(15.0, 25.0, 35.0))

wav.files <- getSoundFragments(labbcat.url, results$id, results$start, results$end, no.progress = TRUE)

wav.files

# tidily delete the files we just downloaded
file.remove(wav.files)
```

This means that if you have a results CSV file exported from LaBB-CAT that identifies segment tokens, you can download the wav file corresponding to each row, something like:

```R
# load the results from the CSV file
results <- read.csv("results.csv", header=TRUE)

# download all the segment WAV files
wav.files <- getSoundFragments(
  labbcat.url, results$Transcript, results$segment.start, results$segment.end)
```
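Before downloading fragments for a whole results file, it may be worth dropping rows whose interval is missing or empty. The following is a minimal sketch only: the CSV content is made up for illustration, the column names are assumed to match the example above, and valid.intervals is a hypothetical helper, not part of nzilbb.labbcat.

```R
# Made-up stand-in for a results.csv exported from LaBB-CAT; the column names
# follow the example above, the values are invented for illustration.
csv.text <- "Transcript,segment.start,segment.end
AP2505_Nelson.eaf,10.0,10.1
AP2505_Nelson.eaf,12.5,12.5
AP2512_MattBlack.eaf,,20.3
AP2512_MattBlack.eaf,30.0,30.2"
results <- read.csv(text = csv.text, header = TRUE)

# Hypothetical helper: keep only rows whose interval is complete and non-empty
valid.intervals <- function(df) {
  ok <- !is.na(df$segment.start) & !is.na(df$segment.end) &
    df$segment.end > df$segment.start
  df[ok, , drop = FALSE]
}

to.download <- valid.intervals(results)
nrow(to.download)

# The remaining rows can then be passed on (this call needs a LaBB-CAT server):
# wav.files <- getSoundFragments(
#   labbcat.url, to.download$Transcript,
#   to.download$segment.start, to.download$segment.end)
```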

Getting annotations from other layers

If you have search results in a CSV file, and would like to retrieve annotations from other layers, you can use the getMatchLabels function, providing the MatchId column (or the URL column) that identifies the token, and the desired layer names:

```R
results <- read.csv("results.csv", header=TRUE)
phonemes <- getMatchLabels(labbcat.url, results$MatchId, c("participant_age", "phonemes"))
```

If you want alignment information (i.e. start and end times), you can use getMatchAlignments:

```R
results <- read.csv("results.csv", header=TRUE)
syllables <- getMatchAlignments(labbcat.url, results$MatchId, "syllables")
```

Search

Use the getMatches function to search for matching tokens.

A basic search uses a simple, single-layer pattern like:

```R
# all words starting with "ps..."
results <- getMatches(labbcat.url, list(orthography = "ps.*"))
```

More complex patterns, across multiple tokens and multiple layers, are possible by specifying a more complex structure:

```R
# the word 'the' followed immediately or with one intervening word by
# a hapax legomenon (word with a frequency of 1) that doesn't start with a vowel
results <- getMatches(labbcat.url, list(columns = list(
    list(layers = list(
           orthography = list(pattern = "the")),
         adj = 2),
    list(layers = list(
           phonemes = list(not = TRUE, pattern = "[cCEFHiIPqQuUV0123456789~#\\{\\$@].*"),
           frequency = list(max = "2"))))))
```
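The nesting of such a pattern can be easier to follow when it is built one token at a time. The sketch below assembles the same structure as the example above. Note that layer.column is a hypothetical helper introduced here for illustration, not part of nzilbb.labbcat; the layer names and values are taken from the example.

```R
# Hypothetical helper (not part of nzilbb.labbcat): one search column, i.e. the
# conditions on a single word token, with an optional adjacency setting.
layer.column <- function(layers, adj = NULL) {
  column <- list(layers = layers)
  if (!is.null(adj)) column$adj <- adj
  column
}

# first token: the word 'the', followed immediately or with one intervening word
first <- layer.column(list(orthography = list(pattern = "the")), adj = 2)

# second token: does not start with a vowel, and has a low frequency
second <- layer.column(list(
  phonemes = list(not = TRUE, pattern = "[cCEFHiIPqQuUV0123456789~#\\{\\$@].*"),
  frequency = list(max = "2")))

pattern <- list(columns = list(first, second))
str(pattern, max.level = 3)

# The pattern can then be passed on (this call needs a LaBB-CAT server):
# results <- getMatches(labbcat.url, pattern)
```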

The data frame that's returned contains columns that can be used as parameters for other functions:

```R
# get all instances of the KIT vowel
results <- getMatches(labbcat.url, list(segment = "I"))

# get phonemic transcription for the whole word
phonemes <- getMatchLabels(labbcat.url, results$MatchId, "phonemes")

# download all the segment WAV files
wav.files <- getSoundFragments(
  labbcat.url, results$Transcript, results$Target.segment.start, results$Target.segment.end)
```
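Because the results are an ordinary data frame, they can also be filtered before being handed to other functions. The following sketch uses a made-up data frame in place of real search results: the column names are those used in the example above, while the values (including the MatchId strings) and the 50ms threshold are invented for illustration.

```R
# Made-up stand-in for the data frame returned by getMatches; the column names
# follow the example above, the values are invented for illustration.
results <- data.frame(
  MatchId = c("match-1", "match-2", "match-3"),
  Transcript = c("AP2505_Nelson.eaf", "AP2505_Nelson.eaf", "AP2512_MattBlack.eaf"),
  Target.segment.start = c(10.00, 12.50, 30.00),
  Target.segment.end = c(10.08, 12.52, 30.11),
  stringsAsFactors = FALSE)

# duration of each vowel token in seconds
results$duration <- results$Target.segment.end - results$Target.segment.start

# keep only tokens of at least 50ms (an arbitrary threshold for this sketch)
long.enough <- results[results$duration >= 0.05, ]
long.enough$MatchId
```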

Looking up dictionaries

LaBB-CAT maintains a number of dictionaries it uses to look things up. These include access to CELEX, LIWC, and other lexicons that might be set up in the LaBB-CAT instance.

You can list the available dictionaries using:

```R
dictionaries <- getDictionaries(labbcat.url)
```

With one of the returned layer manager ID and dictionary ID pairs, you can look up dictionary entries for a list of keys:

```R
words <- c("the", "quick", "brown", "fox")
pronunciation <- getDictionaryEntries(labbcat.url, "CELEX-EN", "Phonology (wordform)", words)
```

Process with Praat

The processWithPraat function instructs the LaBB-CAT server to invoke Praat for a set of sound intervals, in order to extract acoustic measures.

The exact measurements to return depend on the praat.script that is invoked. This is a Praat script fragment that will run once for each sound interval specified.

Helper functions generate pre-defined Praat scripts for common measures such as formants, pitch, intensity, and centre of gravity:

```R
# Perform a search
results <- getMatches(labbcat.url, list(segment="I"))

# get F1 and F2 for the mid point of the vowel
formants <- processWithPraat(
  labbcat.url,
  results$MatchId, results$Target.segment.start, results$Target.segment.end,
  praatScriptFormants(),
  no.progress=TRUE)
```

You can provide your own script, either by building a string with your code, or loading one from a file.

```R
# execute a custom script loaded from a file
acoustic.measurements <- processWithPraat(
  labbcat.url,
  results$MatchId, results$Target.segment.start, results$Target.segment.end,
  readLines("acousticMeasurements.praat"))
```

Retrieving transcript and participant attributes

Transcript attributes can be retrieved like this:

```R
# Get language, duration, and corpus for transcripts starting with 'BR'
attributes <- getTranscriptAttributes(labbcat.url,
  getMatchingTranscriptIds(labbcat.url, "/BR.+/.test(id)"),
  c('transcript_language', 'transcript_duration', 'corpus'))
```

Similarly, participant attributes can also be accessed:

```R
# Get gender and age for all participants
attributes <- getParticipantAttributes(labbcat.url,
  getParticipantIds(labbcat.url),
  c('participant_gender', 'participant_age'))
```

Developers

Prerequisites

Developer tools:

R -e "install.packages('devtools')"

For building the documentation with pkgdown:

apt install pandoc
R -e "install.packages('pkgdown')"

Building the package and documentation

The package can be built from the source code using:
./build.sh

Running automated tests

Unit tests use the 'testthat' package, which requires a one-time installation:

R -e "install.packages('testthat')"

The tests assume access to at least one LaBB-CAT server, with URL and credentials defined by environment variables, so you must create a .Renviron file along these lines:

TEST_READ_LABBCAT_URL=https://labbcat.canterbury.ac.nz/demo/
TEST_READ_LABBCAT_USERNAME=demo
TEST_READ_LABBCAT_PASSWORD=demo
TEST_ADMIN_LABBCAT_URL=http://localhost:8080/labbcat/
TEST_ADMIN_LABBCAT_USERNAME=labbcat
TEST_ADMIN_LABBCAT_PASSWORD=labbcat

Then you can use the following command to run unit tests:

R -e "devtools::test()"

Specific tests can be run like this:

R -e "devtools::test(filter='getId')"
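Before running the tests, it can be useful to confirm that the variables from .Renviron are visible to the R session. A minimal sketch of such a check follows; unset.vars is a hypothetical helper for illustration and is not part of the package or its test suite.

```R
# Names of the variables listed in the .Renviron example
required.vars <- c(
  "TEST_READ_LABBCAT_URL", "TEST_READ_LABBCAT_USERNAME", "TEST_READ_LABBCAT_PASSWORD",
  "TEST_ADMIN_LABBCAT_URL", "TEST_ADMIN_LABBCAT_USERNAME", "TEST_ADMIN_LABBCAT_PASSWORD")

# Hypothetical helper: which of the given environment variables are unset or empty?
unset.vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

not.set <- unset.vars(required.vars)
if (length(not.set) > 0) {
  message("Not set: ", paste(not.set, collapse = ", "))
} else {
  message("All test variables are set.")
}
```

Note that R reads .Renviron at startup, so a session that was already running when the file was created needs to be restarted before the check will pass.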

Owner

Name: Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour
Login: nzilbb
Kind: organization
Location: Christchurch, New Zealand

Website: http://www.nzilbb.canterbury.ac.nz/
Repositories: 43
Profile: https://github.com/nzilbb

A multi-disciplinary centre dedicated to the study of human language.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.4.0
title: nzilbb.labbcat R package
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Robert
    family-names: Fromont
    email: robert.fromont@canterbury.ac.nz
    affiliation: NZILBB
    orcid: 'https://orcid.org/0000-0001-5271-5487'
repository-code: 'https://github.com/nzilbb/labbcat-R/'
url: 'https://cran.r-project.org/web/packages/nzilbb.labbcat/index.html'
abstract: >-
  Client library for communicating with LaBB-CAT
  servers using R.
keywords:
  - LaBB-CAT
  - corpus linguistics
license: AGPL-3.0-or-later
version: 1.5-0
identifiers:
  - type: doi
    value: 10.5281/zenodo.16905575
date-released: '2025-08-19'

GitHub Events

Total

Release event: 1
Watch event: 1
Push event: 8
Create event: 2

Last Year

Release event: 1
Watch event: 1
Push event: 8
Create event: 2

Committers

Last synced: over 2 years ago

All Time

Total Commits: 354
Total Committers: 1
Avg Commits per committer: 354.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 51
Committers: 1
Avg Commits per committer: 51.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Robert Fromont	r**t@f**z	354

Committer Domains (Top 20 + Academic)

fromont.net.nz: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 26
Total pull requests: 1
Average time to close issues: 6 months
Average time to close pull requests: 11 days
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 1.73
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 3
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

djvill (13)
robertfromont (12)
simongonzalez (1)

Pull Request Authors

olivroy (2)

Top Labels

Issue Labels

enhancement (16)
bug (3)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 631 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 20
Total maintainers: 1

cran.r-project.org: nzilbb.labbcat

Accessing Data Stored in 'LaBB-CAT' Instances

Homepage: https://nzilbb.github.io/labbcat-R/
Documentation: http://cran.r-project.org/web/packages/nzilbb.labbcat/nzilbb.labbcat.pdf
License: GPL (≥ 3)
Latest release: 1.5-0
published 8 months ago

Versions: 20
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 631 Last month

Rankings

Forks count: 17.8%

Stargazers count: 24.2%

Average: 27.4%

Downloads: 29.6%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Maintainers (1)

robert.fromont@canterbury.ac.nz

Last synced: 7 months ago

nzilbb.labbcat

Science Score: 54.0%
