batchalign

Tools for language sample analysis.

https://github.com/talkbank/batchalign2


README.md

TalkBank | Batchalign2

Welcome! Batchalign2 is a Python suite of language sample analysis (LSA) software from the TalkBank project. It is used to interact with conversation audio files and their transcripts, and provides a whole host of analyses within this space.

The TalkBank Project, of which Batchalign is a part, is supported by NIH grant HD082736.


Quick Start

The following instructions provide a quick start to installing and running Batchalign. For most users aiming to process CHAT and audio with Batchalign, we recommend the more detailed instructions for usage and for human transcript cleanup.

Install and Update the Package

Batchalign is on PyPI (as batchalign). We recommend using uv to install Batchalign:

macOS / Linux

curl -LsSf https://astral.sh/uv/install.sh | sh
UV_PYTHON=3.11 uv tool install batchalign

Windows

Installing Batchalign takes two commands, both run in PowerShell. Run the first:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Restart PowerShell and run the second command:

uv tool install batchalign

Rock and Roll

There are two main ways of interacting with Batchalign. Batchalign can be used as a program to batch-process CHAT (hence the name), or as a Python LSA library.

  • to get started with the Batchalign program, see "Quick Start: Command Line" below
  • to get started with the Batchalign library (assumes familiarity with Python), see "Quick Start: Python" below

Quick Start: Command Line

Basic Usage

Once installed, you can invoke the Batchalign program by typing batchalign into the Terminal (macOS) or Command Prompt (Windows).

It is used in the following basic way:

batchalign [verb] [input_dir] [output_dir]

Where verb includes:

  1. transcribe - by placing only an audio or video file (.mp3/.mp4/.wav) in the input directory, this function performs ASR on the audio, diarizes utterances, identifies some basic conversational features like retracing and filled pauses, and generates word-level alignments. You must supply a language code flag, --lang=[three letter ISO language code], so that the ASR system knows what language the recording is in. You can choose the flag --rev to use Rev.AI, a commercial ASR service, or --whisper to use a local copy of OpenAI Whisper.
  2. align - by placing both an audio or video file (.mp3/.mp4/.wav) and an utterance-aligned CHAT file in the input directory, this function recovers utterance-level time alignments (if they are not already annotated) and generates word-level alignments. The @Languages header in the CHAT file tells the program which language is in the transcript.
  3. morphotag - by placing a CHAT file in the input directory, this function uses Stanford NLP Stanza to generate morphological and dependency analyses. The @Languages header in the CHAT file tells the program which language is in the transcript. You must supply a language code flag: --lang=[three letter ISO language code] for the alignment system to know what language the transcript is in.
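
The input requirements above can be checked before starting a long run. The following is a minimal sketch and not part of Batchalign: the function name `matching_verb` is hypothetical, the `.cha` extension for CHAT files is an assumption, and the rule applied is the strictest reading of the list (media only, media plus CHAT, or CHAT only).

```python
from pathlib import Path

# Media extensions named in the verb descriptions above.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".wav"}
# Assumption: CHAT transcripts are stored with a .cha extension.
CHAT_EXTENSIONS = {".cha"}


def matching_verb(input_dir):
    """Return the verb whose input requirements the directory matches, or None.

    Hypothetical helper: it only looks at file extensions directly inside
    input_dir and never calls Batchalign itself.
    """
    suffixes = {p.suffix.lower() for p in Path(input_dir).iterdir() if p.is_file()}
    has_media = bool(suffixes & MEDIA_EXTENSIONS)
    has_chat = bool(suffixes & CHAT_EXTENSIONS)

    if has_media and has_chat:
        return "align"       # audio/video plus a CHAT file
    if has_media:
        return "transcribe"  # audio/video only
    if has_chat:
        return "morphotag"   # CHAT only
    return None              # nothing Batchalign could process
```

A check like this is only a convenience; Batchalign itself remains the authority on what it accepts.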

You can get a CHAT transcript to experiment with at the TalkBank website, under any of the "Banks" that are available. You can also generate and parse a CHAT transcript via the Python program.

Sample Commands

For input files (CHAT and audio for align, CHAT only for morphotag, and audio only for transcribe) located in ~/ba_input, with the output written to ~/ba_output, one could write:

ASR + Segmentation

batchalign transcribe --lang=eng ~/ba_input ~/ba_output

Morphosyntactic Analysis

batchalign morphotag ~/ba_input ~/ba_output

Forced Alignment

batchalign align ~/ba_input ~/ba_output


Follow instructions from

batchalign --help

and

batchalign [verb] --help

to learn more about other options.
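
If you would rather launch these commands from a Python script, the sample commands can be assembled as argument lists. This is a minimal sketch under stated assumptions: `batchalign_command` and `run_batchalign` are hypothetical helper names, only the verbs and the `--lang` flag described above are used, and `~` is expanded in Python because no shell is involved.

```python
import shutil
import subprocess
from pathlib import Path

# Verbs described in the list above.
VERBS = ("transcribe", "align", "morphotag")


def batchalign_command(verb, input_dir, output_dir, lang=None):
    """Build the argument list for `batchalign [verb] [input_dir] [output_dir]`."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    if verb == "transcribe" and lang is None:
        # transcribe is documented as requiring a language code.
        raise ValueError("transcribe needs a three-letter language code")

    command = ["batchalign", verb]
    if lang is not None:
        command.append(f"--lang={lang}")
    # Expand "~" ourselves, since subprocess does not go through a shell.
    command.append(str(Path(input_dir).expanduser()))
    command.append(str(Path(output_dir).expanduser()))
    return command


def run_batchalign(command):
    """Run a command built above, failing early if batchalign is not on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("batchalign is not installed or not on PATH")
    return subprocess.run(command, check=True)
```

For example, `batchalign_command("transcribe", "~/ba_input", "~/ba_output", lang="eng")` builds the equivalent of the ASR + Segmentation command shown earlier.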

Verbosity

Placing one or more -v flags directly after the word batchalign increases the verbosity of Batchalign (placing them after the [verb] will not work). With no -v or a single -v, Batchalign uses its normal interface; with more than one -v, it switches to the text-based "logging" interface.

For instance, here is the instruction for running Batchalign to perform forced-alignment:

batchalign align input output

With one -v, you get stack trace information about any file that crashes:

batchalign -v align input output

and with -vv, we ditch the loading bar user interface and instead switch to a logging-based interface that has more information about what Batchalign is doing under the hood:

batchalign -vv align input output
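
The placement rule matters when commands are built programmatically. The sketch below is illustrative only: `with_verbosity` and `interface_for` are hypothetical names, and the mapping from flag count to interface simply restates the description above.

```python
def with_verbosity(command, level):
    """Return a copy of `command` with `level` v's placed right after the program name.

    `command` is an argument list such as ["batchalign", "align", "input", "output"].
    The flag goes before the verb, because flags after the verb are not honored.
    """
    if level < 0:
        raise ValueError("verbosity level cannot be negative")
    if level == 0:
        return list(command)
    return [command[0], "-" + "v" * level, *command[1:]]


def interface_for(level):
    """Name the interface described above for a given number of -v flags."""
    return "logging" if level > 1 else "normal"
```

With this helper, level 1 yields `batchalign -v align input output` and level 2 yields `batchalign -vv align input output`.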

Quick Start: Python

Let's begin!

import batchalign as ba

Document

The Document is the most basic object in Batchalign. All processing pipelines expect a Document as input, and will spit out a Document as output.

```python
doc = ba.Document.new("Hello, this is a transcript! I have two utterances.", media_path="audio.mp3", lang="eng")

# navigating the document
first_utterance = doc[0]
first_form = doc[0][0]
the_comma = doc[0][1]

assert the_comma.text == ','
assert the_comma.type == ba.TokenType.PUNCT

# taking a transcript
sentences = doc.transcript(include_tiers=False, strip=True)
```

Notably, if you have a recording that you haven't transcribed yet, you can still make a Document!

doc = ba.Document.new(media_path="audio.mp3", lang="eng")

Pipelines

Quick Pipeline

Say you wanted to perform ASR, and then tag morphology of the resulting output.

```python
nlp = ba.BatchalignPipeline.new("asr,morphosyntax", lang="eng", num_speakers=2)
doc = ba.Document.new(media_path="audio.mp3", lang="eng")
doc = nlp(doc)  # this is equivalent to nlp("audio.mp3"), we will make the initial doc for you

first_word_pos = doc[0][0].morphology
first_word_time = doc[0][0].time
first_utterance_time = doc[0].alignment
```

The quick API (right now) has support for the following tasks, which you can pass in a comma-separated list in the first argument:

  • asr: ASR!
  • morphosyntax: PoS and dependency analysis
  • fa: Forced Alignment (requires utterance-level timings already)

We will support many, many, many more tasks soon with this API. For now, to gain access to the whole suite of tools, use the second pipeline API discussed below.
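
Because the quick API takes its tasks as one comma-separated string, a typo in a task name is easy to miss. The helper below is a small sketch, not part of Batchalign: `quick_task_string` is a hypothetical name, and the accepted task names are only those listed above.

```python
# Task names the quick API is described as supporting above.
QUICK_TASKS = ("asr", "morphosyntax", "fa")


def quick_task_string(tasks):
    """Join task names into the comma-separated form the quick API expects.

    Raises ValueError for an empty list or for names not listed above.
    """
    tasks = list(tasks)
    if not tasks:
        raise ValueError("at least one task is required")
    unknown = [task for task in tasks if task not in QUICK_TASKS]
    if unknown:
        raise ValueError(f"unsupported quick API tasks: {unknown}")
    return ",".join(tasks)


# Intended use, mirroring the quick pipeline example above:
# nlp = ba.BatchalignPipeline.new(quick_task_string(["asr", "morphosyntax"]),
#                                 lang="eng", num_speakers=2)
```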

Manual Pipeline

Batchalign ships with a number of engines which perform the actual processing. For instance, to recreate the demo we had above using the Engines API, we would write

```python
# ASR
whisper = ba.WhisperEngine(lang="eng")

# retracing and disfluency analysis
retrace = ba.NgramRetraceEngine()
disfluency = ba.DisfluencyReplacementEngine()

# morphosyntax
morphosyntax = ba.StanzaEngine()

# create a pipeline
nlp = ba.BatchalignPipeline(whisper, retrace, disfluency, morphosyntax)

# and run it!
doc = nlp("audio.mp3")
```

Here's a list of available engines.

Formats

We currently support reading and writing two transcript formats: TalkBank CHAT, and Praat TextGrid.

CHAT

Here's how to read a TalkBank transcript from a CHAT file, and how to write a Document back out as CHAT!

```python
# reading
chat = ba.CHATFile(path="chat.cha")
doc = chat.doc

# writing
chat = ba.CHATFile(doc=doc)
chat.write("chat.cha")
```

We will automatically detect audio files located within the same directory as the CHAT file, and associate them with the Batchalign Document.
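
Combining the CHAT reader and writer with the quick pipeline gives a library-side analogue of the morphotag verb. This is a hedged sketch rather than the program's actual implementation: `chat_files` and `morphotag_directory` are hypothetical names, the `.cha` extension is an assumption, and calling `BatchalignPipeline.new` with a single task and no `num_speakers` argument is assumed to be valid.

```python
from pathlib import Path


def chat_files(input_dir):
    """List CHAT files (assumed to end in .cha) directly inside input_dir, sorted by name."""
    return sorted(Path(input_dir).glob("*.cha"))


def morphotag_directory(input_dir, output_dir, lang="eng"):
    """Read each CHAT file, run morphosyntactic analysis, and write the result.

    Hypothetical helper built only from calls shown in this README.
    """
    # Imported here so that chat_files() can be used without Batchalign installed.
    import batchalign as ba

    # Assumption: a single-task pipeline can be created without num_speakers.
    nlp = ba.BatchalignPipeline.new("morphosyntax", lang=lang)

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for chat_path in chat_files(input_dir):
        doc = ba.CHATFile(path=str(chat_path)).doc  # reading
        tagged = nlp(doc)                           # Document in, Document out
        ba.CHATFile(doc=tagged).write(str(output_dir / chat_path.name))  # writing
```

Writing to a separate output directory keeps the original transcripts untouched, which mirrors how the command-line program separates input_dir from output_dir.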

TextGrid

Importantly, there are two ways a TextGrid could be written: we can either place each utterance in an individual IntervalTier, or each word in its own IntervalTier; we leave that decision up to you. To learn more about TextGrid, visit this page.

```python
# reading; recall we can either interpret each IntervalTier as a word or utterance
tg_utterance = ba.TextGridFile("utterance", path="tgut.TextGrid", lang="eng")
tg_word = ba.TextGridFile("word", path="tgw.TextGrid", lang="eng")

doc1 = tg_utterance.doc
doc2 = tg_word.doc

# writing
tg_utterance = ba.TextGridFile("utterance", doc=doc1)
tg_word = ba.TextGridFile("word", doc=doc2)

tg_utterance.write("tgut.TextGrid")
tg_word.write("tgw.TextGrid")
```
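
Since both formats share the Document, converting a CHAT transcript to a TextGrid is a read followed by a write. The following is a minimal sketch using only the calls shown in this README: `textgrid_path` and `chat_to_textgrid` are hypothetical names, and the output naming scheme (the mode inserted before the `.TextGrid` suffix) is purely illustrative.

```python
from pathlib import Path

# The two TextGrid layouts described above.
TEXTGRID_MODES = ("utterance", "word")


def textgrid_path(chat_path, mode):
    """Derive an output name such as sample.word.TextGrid next to sample.cha."""
    if mode not in TEXTGRID_MODES:
        raise ValueError(f"mode must be one of {TEXTGRID_MODES}, got {mode!r}")
    chat_path = Path(chat_path)
    return chat_path.with_name(f"{chat_path.stem}.{mode}.TextGrid")


def chat_to_textgrid(chat_path, mode="utterance"):
    """Read a CHAT file and write it back out as a TextGrid in the chosen layout."""
    # Imported here so that textgrid_path() can be used without Batchalign installed.
    import batchalign as ba

    doc = ba.CHATFile(path=str(chat_path)).doc
    out_path = textgrid_path(chat_path, mode)
    ba.TextGridFile(mode, doc=doc).write(str(out_path))
    return out_path
```

Encoding the mode in the file name lets both layouts of the same transcript sit side by side without overwriting each other.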

Questions?

If you have any questions or concerns, please reach out! If something isn't working right, open an issue on GitHub; if you need support, please feel free to email houjun@cmu.edu and macw@cmu.edu.

Owner

  • Name: TalkBank
  • Login: TalkBank
  • Kind: organization
  • Email: macw@cmu.edu
  • Location: Pittsburgh

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Batchalign2
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Brian
    family-names: MacWhinney
    email: macw@cmu.edu
    affiliation: Carnegie Mellon University
  - given-names: Houjun
    family-names: Liu
    email: houjun@stanford.edu
    affiliation: Stanford University
repository-code: 'https://github.com/talkbank/batchalign2'
url: 'https://github.com/talkbank/batchalign2'
abstract: >-
  Purpose: A major barrier to the wider use of language
  sample analysis (LSA) is the fact that transcription is
  very time intensive. Methods that can reduce the required
  time and effort could help in promoting the use of LSA for
  clinical practice and research.

  Method: This article describes an automated pipeline,
  called Batchalign, that takes raw audio and creates full
  transcripts in Codes for the Human Analysis of Talk (CHAT)
  transcription format, complete with utterance- and
  word-level time alignments and morphosyntactic analysis.
  The pipeline only requires major human intervention for
  final checking. It combines a series of existing tools
  with additional novel reformatting processes. The steps in
  the pipeline are (a) automatic speech recognition, (b)
  utterance tokenization, (c) automatic corrections, (d)
  speaker ID assignment, (e) forced alignment, (f) user
  adjustments, and (g) automatic morphosyntactic and
  profiling analyses.

  Results: For work with recordings from adults with
  language disorders, six major results were obtained: (a)
  The word error rate was between 2.4% for controls and 3.4%
  for patients, (b) utterance tokenization accuracy was at
  the level reported for speakers without language
  disorders, (c) word-level diarization accuracy was at 93%
  for control participants and 83% for participants with
  language disorders, (d) utterance-level diarization
  accuracy based on word-level diarization was high, (e)
  adherence to CHAT format was fully accurate, and (f) human
  transcriber time was reduced by up to 75%.

  Conclusion: The pipeline dramatically shortens the time
  gap between data collection and data analysis and provides
  an output superior to that typically generated by human
  transcribers.
license: BSD-3-Clause
preferred-citation:
  type: article
  authors:
  - family-names: "Liu"
    given-names: "Houjun"
  - family-names: "MacWhinney"
    given-names: "Brian"
  - family-names: "Fromm"
    given-names: "Davida"
  - family-names: "Lanzi"
    given-names: "Alyssa"
  doi: "10.1044/2023_JSLHR-22-00642"
  journal: "Journal of Speech, Language, and Hearing Research"
  month: 7
  start: 2421 # First page number
  end: 2433 # Last page number
  title: "Automation of Language Sample Analysis"
  issue: 7
  volume: 66
  year: 2023

Dependencies

setup.py pypi
  • eyed7 >=0.9.7
  • hmmlearn ==0.3.0
  • imblearn *
  • montreal-forced-aligner >=3.0.0
  • nltk >=3.8
  • plotly >=5.18.0
  • praatio >=6.0.0,<6.1.0
  • pyAudioAnalysis ==0.3.14
  • pydantic >=2.4
  • pydub >=0.25.1,<0.26.0
  • pytorch >=2.1.0,<2.2.0
  • tokenizers >=0.14.1
  • torchaudio >=2.1.0,<2.2.0
  • transformers >=4.35
.github/workflows/test.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite