fave-asr
Interface for automated transcription and time alignment of conversational interview data
https://github.com/forced-alignment-and-vowel-extraction/fave-asr
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Forced-Alignment-and-Vowel-Extraction
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
- Size: 4.25 MB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
fave-asr: Automated transcription of interview data
The FAVE-asr package provides a system for the automated transcription of sociolinguistic interview data on local machines for use by aligners like FAVE or the Montreal Forced Aligner. The package provides functions to label different speakers in the same audio (diarization), transcribe speech, and output TextGrids with phrase- or word-level alignments.
Example Use Cases
- You want a transcription of an interview for more detailed hand correction.
- You want to transcribe a large corpus and your analysis can tolerate a small error rate.
- You want to make an audio corpus into a text corpus.
- You want to know the number of speakers in an audio file.
For examples of how to use the package, see the Usage pages.
Installation
To install fave-asr using pip, run the following command in your terminal:
```bash
pip install fave-asr
```
Other software required
ffmpeg is needed to process the audio. You can download it from the ffmpeg website.
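Before transcribing, it can help to confirm that ffmpeg is actually reachable from Python. The sketch below is not part of fave-asr; it only uses the standard library's `shutil.which`, and the helper name `find_ffmpeg` is a hypothetical one chosen for illustration.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path to the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it before transcribing.")
    else:
        print(f"Found ffmpeg at {path}")
```

If the check reports that ffmpeg is missing right after installation, opening a new terminal is often enough for the updated PATH to be picked up.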
Not another transcription service
There are several services which automate the process of transcribing audio.
Unlike other services, fave-asr does not require uploading your data to other servers and instead focuses on processing audio on your own computer. Audio data can contain highly confidential information, and uploading this data to other services may not comply with ethical or legal data protection obligations. The goal of fave-asr is to serve those use cases where data protection makes local transcription necessary while making the process as seamless as cloud-based transcription services.
Example
As an example, we'll transcribe an audio interview of Snoop Dogg by the 85 South Media podcast and output it as a TextGrid.
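```{python}
import fave_asr

# Transcribe the audio and label the speakers (diarization)
data = fave_asr.transcribe_and_diarize(
    audio_file = 'usage/resources/SnoopDogg85SouthMedia.wav',
    hf_token = '',  # your HuggingFace access token
    model_name = 'small.en',
    device = 'cpu'
)

# Convert the result to a TextGrid and save it
tg = fave_asr.to_TextGrid(data)
tg.write('SnoopDogg85SouthMedia.TextGrid')
```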
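For the corpus use case, the same two calls can be wrapped in a loop. The following is a minimal sketch, not part of the package: the helpers `textgrid_path_for` and `transcribe_corpus` are hypothetical, the corpus is assumed to be a flat directory of `.wav` files, and the `fave_asr` calls are assumed to use the underscored names `transcribe_and_diarize` and `to_TextGrid` with the keyword arguments from the example above.

```python
from pathlib import Path


def textgrid_path_for(audio_path: Path, output_dir: Path) -> Path:
    """Map an audio file to a TextGrid path with the same stem in output_dir."""
    return output_dir / (audio_path.stem + ".TextGrid")


def transcribe_corpus(corpus_dir, output_dir, hf_token,
                      model_name="small.en", device="cpu"):
    """Transcribe every .wav file in corpus_dir and write one TextGrid per file."""
    # Imported lazily so the path helper above works without the package installed.
    import fave_asr

    corpus_dir, output_dir = Path(corpus_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio_path in sorted(corpus_dir.glob("*.wav")):
        # Assumed call signature, mirroring the single-file example.
        data = fave_asr.transcribe_and_diarize(
            audio_file=str(audio_path),
            hf_token=hf_token,
            model_name=model_name,
            device=device,
        )
        out_path = textgrid_path_for(audio_path, output_dir)
        fave_asr.to_TextGrid(data).write(str(out_path))
        written.append(out_path)
    return written
```

Keeping the path mapping in its own function makes it easy to check where output will land before starting a long transcription run.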
Using gated models
Artificial intelligence models are powerful and, in the wrong hands, can be dangerous. The models used by fave-asr are free of charge, but you need to accept additional terms of use.
To use these models:
1. On HuggingFace, create an account or log in
2. Accept the terms and conditions for the segmentation model
3. Accept the terms and conditions for the diarization model
4. Create an access token or copy your existing token
Keep track of your token and keep it safe (e.g. don't accidentally upload it to GitHub). We suggest creating an environment variable for your token so that you don't need to paste it into your files.
Creating an environment variable for your token
Storing your tokens as environment variables is a good way to avoid accidentally leaking them. Instead of typing the token into your code and deleting it before you commit, you can use os.environ["HF_TOKEN"] to access it from Python. This also makes your code more readable, since it's obvious what HF_TOKEN is, whereas a string of numbers and letters is not.
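As a minimal sketch of this pattern, the token can be read once at the top of a script and a clear error raised if it is missing. The helper name `get_hf_token` is hypothetical and not part of fave-asr; only the standard library's `os.environ` is used.

```python
import os


def get_hf_token(variable: str = "HF_TOKEN") -> str:
    """Read the HuggingFace token from an environment variable."""
    token = os.environ.get(variable, "").strip()
    if not token:
        # Fail early with a readable message instead of passing an empty token on.
        raise RuntimeError(
            f"{variable} is not set. Define it as an environment variable "
            "and open a new terminal before running this script."
        )
    return token
```

The returned string can then be passed wherever a token is expected, so the token itself never appears in the source file.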
Linux and Mac
On Linux and Mac you can store your token in .bashrc
- Open $HOME/.bashrc in a text editor
- At the end of that file, add the following line, replacing <your token> with your HuggingFace token:
  HF_TOKEN='<your token>' ; export HF_TOKEN
- Add the changes to your current session using:
  source $HOME/.bashrc
Windows
On Windows, use the setx command to create an environment variable.
setx HF_TOKEN <your token>
setx only affects command-line sessions opened afterwards, so you need to restart the command line to make the environment variable available. If you try to use the variable in the same window where you set it, it will not be defined.
Authors
Luís Roque contributed substantially to the main speaker diarization pipeline. Initial modifications to that code were made by Christian Brickhouse for stability and use as part of the fave-asr library. For licensing of the test audio, see the README in that directory.
Recommended citation
Brickhouse, Christian (2024). FAVE-ASR: Offline transcription of interview data (Version 0.1.0) [computer software]. https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
Owner
- Name: Forced Alignment and Vowel Extraction
- Login: Forced-Alignment-and-Vowel-Extraction
- Kind: organization
- Repositories: 8
- Profile: https://github.com/Forced-Alignment-and-Vowel-Extraction
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Brickhouse"
  given-names: "Christian"
  orcid: "https://orcid.org/0000-0002-2748-8056"
title: "FAVE-ASR: Offline transcription of interview data"
version: 0.1.0
date-released: 2024-04-09
url: "https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-asr"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Packages
- Total packages: 1
- Total downloads: pypi 15 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
pypi.org: fave-asr
Automated transcription and diarization of linguistic data
- Homepage: https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
- Documentation: https://fave-asr.readthedocs.io/
- License: GPL-3.0-or-later
- Latest release: 0.1.0, published almost 2 years ago