bas
API for calling the Bavarian Archive for Speech Signals (BAS) services
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: nzilbb
- License: gpl-3.0
- Language: Java
- Default Branch: main
- Size: 2.51 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
BAS API
API for calling the Bavarian Archive for Speech Signals (BAS) services:
http://hdl.handle.net/11858/00-1779-0000-0028-421B-4
Detailed documentation for this package is available at https://nzilbb.github.io/bas/
Prerequisites
- Maven
sudo apt install maven
Build
mvn package
Build Documentation
mvn site
Deployment to OSSRH
Snapshot Deployment
To perform a snapshot deployment:
- Ensure the version in pom.xml is suffixed with -SNAPSHOT
- Execute the command:
mvn clean deploy
Release Deployment
To perform a release deployment:
- Ensure the version in pom.xml isn't suffixed with -SNAPSHOT, e.g. use something like the following command from within the ag directory:
mvn versions:set -DnewVersion=1.1.0
- Execute the command:
mvn clean deploy -P release
- Happy with everything? Complete the release with:
mvn nexus-staging:release -P release
Otherwise:
mvn nexus-staging:drop -P release
...and start again.
- Commit and push all changes, and create a release on GitHub.
- Start a new -SNAPSHOT version with something like:
mvn versions:set -DnewVersion=1.1.1-SNAPSHOT
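The version bump in the last step can be worked out mechanically. The following is only a sketch: it assumes a purely numeric MAJOR.MINOR.PATCH version (which pom.xml does not require), and SnapshotVersions is a hypothetical helper, not part of this project or of Maven.

```java
// Hypothetical helper (not part of this project or Maven): derives the next
// development version from a release version.
// Assumption: versions follow a purely numeric MAJOR.MINOR.PATCH scheme.
final class SnapshotVersions {

    /** Returns true if the version ends with the -SNAPSHOT suffix. */
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    /** Increments the patch number of a release version and appends -SNAPSHOT. */
    static String nextSnapshot(String releaseVersion) {
        if (isSnapshot(releaseVersion)) {
            throw new IllegalArgumentException("Not a release version: " + releaseVersion);
        }
        String[] parts = releaseVersion.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + releaseVersion);
        }
        int patch = Integer.parseInt(parts[2]) + 1;
        return parts[0] + "." + parts[1] + "." + patch + "-SNAPSHOT";
    }

    public static void main(String[] args) {
        // The value printed here is what would be passed to -DnewVersion.
        System.out.println(nextSnapshot("1.1.0")); // prints 1.1.1-SNAPSHOT
    }
}
```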
Usage
You need to import nzilbb.bas.BAS; and then instantiate a BAS object:
BAS bas = new BAS();
Once that's done, you can invoke the function you need, and check/retrieve the results, e.g.:
BASResponse response = bas.MAUSBasic("eng-NZ", new File("my.wav"), new File("my.txt"));
if (response.getWarnings() != null) System.out.println(response.getWarnings());
if (response.getSuccess())
{
response.saveDownload(new File("my.TextGrid"));
}
API
Below are the basic functions. For convenience functions and other options, check the full documentation at https://nzilbb.github.io/bas/
MAUSBasic(String LANGUAGE, File SIGNAL, File TEXT)
Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain text orthographic transcript.
- LANGUAGE RFC 5646 tag for identifying the language.
- SIGNAL The signal, in WAV format.
- TEXT The transcription of the text.
G2P(String lng, String txt, String outsym, String featset, String oform, boolean syl, boolean stress)
Invokes the G2P service for converting orthography into phonemic transcription.
This convenience method takes a String as the text, and assumes iform = "txt".
- lng RFC 5646 tag for identifying the language.
- txt The text to transform as a String.
- outsym Output phoneme symbol inventory:
  + "sampa": language-specific SAMPA variant; this is the default.
  + "x-sampa": language-independent X-SAMPA.
  + "maus-sampa": maps the output to a language-specific phoneme subset that WEBMAUS can process.
  + "ipa": Unicode-encoded IPA.
  + "arpabet": supported for eng-US only.
- featset Feature set used for grapheme-phoneme conversion:
  + "standard" comprises a letter window centered on the grapheme to be converted.
  + "extended" additionally includes part of speech and morphological analyses.
- oform Output format:
  + "bpf" indicates the BAS Partitur Format (BPF) file with a KAN tier.
  + "bpfs" differs from "bpf" only in that the phonemes are separated by blanks. In case of TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem". The content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the partiture tier TRN.
  + "txt" indicates a replacement of the input words by their transcriptions; single line output without punctuation, where phonemes are separated by blanks and words by tabulators.
  + "tab" returns the grapheme-phoneme conversion result in the form of a table with two columns. The first column comprises the words, the second column their blank-separated transcriptions.
  + "exttab" results in a 5-column table. The columns contain from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
  + "lex" transforms the table to a lexicon, i.e. words are unique and sorted.
  + "extlex" provides the same information as "exttab" in a unique and sorted manner. For all lex and tab outputs columns are separated by ';'.
  + "exttcf", which is currently available for German and English only, additionally adds part of speech (STTS tagset), morphs, and morph classes.
  + With "tg" and "exttg" TextGrid output is produced.
- syl whether or not the output transcription is to be syllabified.
- stress whether or not word stress is to be added to the output transcription.
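As an illustration of the "tab" output format described above (two columns separated by ';', with blank-separated phonemes in the second column), here is a minimal parsing sketch. G2PTabParser and its Entry type are hypothetical names, not part of nzilbb.bas, and the sample rows are made up for the example rather than taken from real service output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of nzilbb.bas): parses text in the two-column
// "tab" layout, i.e. word;blank-separated transcription, one row per line.
final class G2PTabParser {

    /** One row of the table: a word and its phonemes. */
    static final class Entry {
        final String word;
        final List<String> phonemes;

        Entry(String word, List<String> phonemes) {
            this.word = word;
            this.phonemes = phonemes;
        }
    }

    /** Parses all non-empty lines; throws if a line has no ';' separator. */
    static List<Entry> parse(String tab) {
        List<Entry> entries = new ArrayList<>();
        for (String line : tab.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] columns = line.split(";", 2);
            if (columns.length < 2) {
                throw new IllegalArgumentException("Expected two columns: " + line);
            }
            List<String> phonemes = new ArrayList<>();
            String transcription = columns[1].trim();
            if (!transcription.isEmpty()) {
                phonemes.addAll(Arrays.asList(transcription.split("\\s+")));
            }
            entries.add(new Entry(columns[0].trim(), phonemes));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Illustrative input only, not captured from the G2P service.
        for (Entry entry : parse("Scherz;S E6 t s\nund;U n t\n")) {
            System.out.println(entry.word + " has " + entry.phonemes.size() + " phonemes");
        }
    }
}
```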
MAUS(String LANGUAGE, File SIGNAL, File BPF, String OUTFORMAT, String OUTSYMBOL)
Invokes the general MAUS service, with mostly default options, for forced alignment given a WAV file and a phonemic transcription.
- LANGUAGE RFC 5646 tag for identifying the language.
- SIGNAL The signal, in WAV format.
- BPF Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
- OUTFORMAT Defines the output format:
  + "TextGrid": a Praat-compatible TextGrid file.
  + "par" or "mau-append": the input BPF file with a new (or replaced) tier MAU.
  + "csv" or "mau": only the BPF MAU tier (CSV table).
  + "legacyEMU": a file with extension .EMU that contains in the first part the Emu hlb file (.hlb) and in the second part the Emu phonetic segmentation (.phonetic).
  + "emuR": an Emu-compatible *_annot.json file.
- OUTSYMBOL Defines the encoding of phonetic symbols in output:
  + "sampa" (default): phonetic symbols are encoded in language-specific SAM-PA (with some coding differences to official SAM-PA).
  + "ipa": the service produces UTF-8 IPA output.
  + "manner": the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
  + "place": the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
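Because OUTFORMAT and OUTSYMBOL are plain strings, a typo is only discovered after the request has been sent. One possible guard is to check the values locally first. The sketch below mirrors only the values listed in this README; the service itself may accept others, and MausOptions is a hypothetical helper, not part of nzilbb.bas.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of nzilbb.bas): validates MAUS option strings
// against the values listed in this README before a request is made.
final class MausOptions {

    // Assumption: these sets mirror the README only; the service may accept more.
    static final Set<String> OUTFORMATS = new HashSet<>(Arrays.asList(
        "TextGrid", "par", "mau-append", "csv", "mau", "legacyEMU", "emuR"));

    static final Set<String> OUTSYMBOLS = new HashSet<>(Arrays.asList(
        "sampa", "ipa", "manner", "place"));

    /** Throws IllegalArgumentException if either value is not a listed option. */
    static void check(String outFormat, String outSymbol) {
        if (!OUTFORMATS.contains(outFormat)) {
            throw new IllegalArgumentException("Unknown OUTFORMAT: " + outFormat);
        }
        if (!OUTSYMBOLS.contains(outSymbol)) {
            throw new IllegalArgumentException("Unknown OUTSYMBOL: " + outSymbol);
        }
    }

    public static void main(String[] args) {
        check("TextGrid", "ipa"); // listed values pass silently
        try {
            check("textgrid", "ipa"); // the comparison is case-sensitive
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Unknown OUTFORMAT: textgrid
        }
    }
}
```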
Pho2Syl(String lng, File i, String tier, Boolean wsync, String oform, Integer rate)
Invokes the Pho2Syl service to syllabify a phonemic transcription.
- lng RFC 5646 tag for identifying the language.
- i Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
- tier Name of the tier in the annotation file whose content is to be syllabified.
- wsync Whether each word boundary is considered a syllable boundary.
- oform Output format:
  + "bpf": BAS Partitur Format.
  + "tg": TextGrid format.
- rate Only needed if oform = "tg" (TextGrid); sample rate used to convert sample values from the BAS Partitur file to seconds in the TextGrid.
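The rate parameter exists because BPF positions are expressed in samples while TextGrid times are in seconds. The underlying arithmetic is a division by the sample rate, sketched below. SampleTime is a hypothetical helper, not part of nzilbb.bas, and how the service rounds or offsets boundaries is not stated in this README.

```java
// Hypothetical helper (not part of nzilbb.bas): converts a sample index to
// seconds, the conversion for which Pho2Syl needs the rate parameter.
final class SampleTime {

    /** Returns the time in seconds of the given sample at the given sample rate (Hz). */
    static double toSeconds(long sample, int rate) {
        if (rate <= 0) {
            throw new IllegalArgumentException("Sample rate must be positive: " + rate);
        }
        return (double) sample / rate;
    }

    public static void main(String[] args) {
        // Sample 8000 of a 16 kHz recording lies half a second in.
        System.out.println(toSeconds(8000, 16000)); // prints 0.5
    }
}
```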
TTS(String INPUT_TEXT)
Convenience method to invoke the MaryTTS German text-to-speech service with plain text input, with a WAV file as output, using the default voice.
- INPUT_TEXT The text input.
TextAlign(InputStream i, String cost, InputStream costfile, Boolean displc, String atype)
Invokes the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- i CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription like S c h e r z;S E6 t s.
- cost Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment.
  + "naive": assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
  + "g2pdeu", "g2peng" etc.: predefined cost functions for grapheme-phoneme alignment for the respective language, expressed as an ISO 639-3 code.
  + "intrinsic": a cost function is trained on the input data and returned in the output zip. Costs are derived from co-occurrence probabilities, thus the bigger the input file, the more reliable the emerging cost function.
  + "import": the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 means the substitution of 'v' by 'w' costs 0.7; v;_;0.8 means the deletion of 'v' costs 0.8; _;w;0.9 means the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and to subsequently apply this cost function on smaller data sets with cost='import'.
- costfile CSV text file with three semicolon-separated columns. Each row has the form a;b;c, where c denotes the cost for substituting a by b. Insertion and deletion are marked by an underscore.
- displc whether alignment costs should be displayed in a third column in the output file.
- atype Alignment type:
  + "dir": align the second column to the first.
  + "sym": symmetric alignment.
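To make the "naive" cost function concrete, the sketch below computes the total edit cost for one input row under exactly the rule stated above: cost 1 for every substitution, deletion and insertion, and cost 0 for substituting a symbol by itself. It is a standard edit-distance calculation written for illustration. It does not reproduce the TextAlign service's algorithm or output, and NaiveAlignmentCost is a hypothetical name, not part of nzilbb.bas.

```java
// Hypothetical illustration (not part of nzilbb.bas and not the service's own
// code): total edit cost of one input row under the "naive" cost function.
final class NaiveAlignmentCost {

    /** Splits a row "a1 a2 ...;b1 b2 ..." into its two blank-separated sequences. */
    static String[][] parseRow(String row) {
        String[] columns = row.split(";", -1);
        if (columns.length != 2) {
            throw new IllegalArgumentException("Expected two columns: " + row);
        }
        return new String[][] {
            columns[0].trim().split("\\s+"),
            columns[1].trim().split("\\s+")
        };
    }

    /** Minimum total cost of turning sequence a into sequence b. */
    static int cost(String[] a, String[] b) {
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) {
            d[i][0] = i; // delete the first i symbols of a
        }
        for (int j = 0; j <= b.length; j++) {
            d[0][j] = j; // insert the first j symbols of b
        }
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                // Null-substitution (identical symbols) costs 0, anything else 1.
                int substitution = d[i - 1][j - 1] + (a[i - 1].equals(b[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[a.length][b.length];
    }

    public static void main(String[] args) {
        String[][] pair = parseRow("S c h e r z;S E6 t s");
        System.out.println(cost(pair[0], pair[1])); // prints 5
    }
}
```

Note that the grapheme 'S' and the phoneme 'S' are matched at zero cost here purely because the symbols are identical, which is the reason the naive cost function is discouraged for grapheme-phoneme alignment.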
BASResponse
Each method returns a BASResponse object, which you can interrogate for the result of the request. Its methods are summarized below.
Check the JavaDoc for more details.
getSuccess()
true if successful, false otherwise.
getDownloadLink()
URL for downloading result.
getOutput()
Output message.
getWarnings()
Warning messages.
getXml()
Original XML of the response.
saveDownload() / saveDownload(File file)
Convenience function for downloading the result, if any.
Returns a File object.
Owner
- Name: Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour
- Login: nzilbb
- Kind: organization
- Location: Christchurch, New Zealand
- Website: http://www.nzilbb.canterbury.ac.nz/
- Repositories: 43
- Profile: https://github.com/nzilbb
A multi-disciplinary centre dedicated to the study of human language.
Citation (CITATION.cff)
cff-version: 1.2.0
type: software
message: If you use this software, please cite it as below.
title: nzilbb.bas
version: 0.1.1
date-released: 2023-02-08
authors:
- given-names: Robert
family-names: Fromont
email: robert.fromont@canterbury.ac.nz
affiliation: NZILBB
orcid: 'https://orcid.org/0000-0001-5271-5487'
repository-code: https://github.com/nzilbb/bas/tree/main
abstract: >-
API for calling the Bavarian Archive for Speech Signals (BAS) services:
http://hdl.handle.net/11858/00-1779-0000-0028-421B-4
references:
- type: software
title: Apache Commons Codec
version: '1.11'
abbreviation: commons-codec:commons-codec
license: Apache-2.0
notes: More license information can be found in the THIRD-PARTY/Apache_Commons_Codec
directory.
authors:
- name: Henri Yandell
email: bayard@apache.org
- name: Tim OBrien
email: tobrien@apache.org
- name: Scott Sanders
email: sanders@totalsync.com
- name: Rodney Waldhoff
email: rwaldhoff@apache.org
- name: Daniel Rall
email: dlr@finemaltcoding.com
- name: Jon S. Stevens
email: jon@collab.net
- name: Gary Gregory
email: ggregory@apache.org
- name: David Graham
email: dgraham@apache.org
- name: Julius Davies
email: julius@apache.org
- name: Thomas Neidhart
email: tn@apache.org
repository-code: http://svn.apache.org/viewvc/commons/proper/codec/trunk
- type: software
title: Apache Commons Logging
version: '1.2'
abbreviation: commons-logging:commons-logging
license: Apache-2.0
notes: More license information can be found in the THIRD-PARTY/Apache_Commons_Logging
directory.
authors:
- name: Juozas Baliuka
email: baliuka@apache.org
- name: Morgan Delagrange
email: morgand@apache.org
- name: Peter Donald
email: donaldp@apache.org
- name: Robert Burrell Donkin
email: rdonkin@apache.org
- name: Simon Kitching
email: skitching@apache.org
- name: Dennis Lundberg
email: dennisl@apache.org
- name: Costin Manolache
email: costin@apache.org
- name: Craig McClanahan
email: craigmcc@apache.org
- name: Thomas Neidhart
email: tn@apache.org
- name: Scott Sanders
email: sanders@apache.org
- name: Richard Sitze
email: rsitze@apache.org
- name: Brian Stansberry
- name: Rodney Waldhoff
email: rwaldhoff@apache.org
repository-code: http://svn.apache.org/repos/asf/commons/proper/logging/trunk
- type: software
title: Apache HttpClient
version: 4.5.13
abbreviation: org.apache.httpcomponents:httpclient
license: Apache-2.0
notes: More license information can be found in the THIRD-PARTY/Apache_HttpClient
directory.
authors:
- name: Ortwin Glueck
email: oglueck -at- apache.org
- name: Oleg Kalnichevski
email: olegk -at- apache.org
- name: Asankha C. Perera
email: asankha -at- apache.org
- name: Sebastian Bazley
email: sebb -at- apache.org
- name: Erik Abele
email: erikabele -at- apache.org
- name: Ant Elder
email: antelder -at- apache.org
- name: Paul Fremantle
email: pzf -at- apache.org
- name: Roland Weber
email: rolandw -at- apache.org
- name: Sam Berlin
email: sberlin -at- apache.org
- name: Sean C. Sullivan
email: sullis -at- apache.org
- name: Jonathan Moore
email: jonm -at- apache.org
- name: Gary Gregory
email: ggregory -at- apache.org
- name: William Speirs
email: wspeirs at apache.org
- name: Karl Wright
email: kwright -at- apache.org
- name: Francois-Xavier Bonnet
email: fx -at- apache.org
repository-code: https://github.com/apache/httpcomponents-client/tree/4.5.13/httpclient
- type: software
title: Apache HttpClient Mime
version: 4.5.13
abbreviation: org.apache.httpcomponents:httpmime
license: Apache-2.0
notes: More license information can be found in the THIRD-PARTY/Apache_HttpClient_Mime
directory.
authors:
- name: Ortwin Glueck
email: oglueck -at- apache.org
- name: Oleg Kalnichevski
email: olegk -at- apache.org
- name: Asankha C. Perera
email: asankha -at- apache.org
- name: Sebastian Bazley
email: sebb -at- apache.org
- name: Erik Abele
email: erikabele -at- apache.org
- name: Ant Elder
email: antelder -at- apache.org
- name: Paul Fremantle
email: pzf -at- apache.org
- name: Roland Weber
email: rolandw -at- apache.org
- name: Sam Berlin
email: sberlin -at- apache.org
- name: Sean C. Sullivan
email: sullis -at- apache.org
- name: Jonathan Moore
email: jonm -at- apache.org
- name: Gary Gregory
email: ggregory -at- apache.org
- name: William Speirs
email: wspeirs at apache.org
- name: Karl Wright
email: kwright -at- apache.org
- name: Francois-Xavier Bonnet
email: fx -at- apache.org
repository-code: https://github.com/apache/httpcomponents-client/tree/4.5.13/httpmime
- type: software
title: Apache HttpCore
version: 4.4.13
abbreviation: org.apache.httpcomponents:httpcore
license: Apache-2.0
notes: More license information can be found in the THIRD-PARTY/Apache_HttpCore
directory.
authors:
- name: Ortwin Glueck
email: oglueck -at- apache.org
- name: Oleg Kalnichevski
email: olegk -at- apache.org
- name: Asankha C. Perera
email: asankha -at- apache.org
- name: Sebastian Bazley
email: sebb -at- apache.org
- name: Erik Abele
email: erikabele -at- apache.org
- name: Ant Elder
email: antelder -at- apache.org
- name: Paul Fremantle
email: pzf -at- apache.org
- name: Roland Weber
email: rolandw -at- apache.org
- name: Sam Berlin
email: sberlin -at- apache.org
- name: Sean C. Sullivan
email: sullis -at- apache.org
- name: Jonathan Moore
email: jonm -at- apache.org
- name: Gary Gregory
email: ggregory -at- apache.org
- name: William Speirs
email: wspeirs at apache.org
- name: Karl Wright
email: kwright -at- apache.org
- name: Francois-Xavier Bonnet
email: fx -at- apache.org
repository-code: https://github.com/apache/httpcomponents-core/tree/4.4.13/httpcore
Dependencies
- org.apache.httpcomponents:httpclient 4.5.13 compile
- org.apache.httpcomponents:httpmime 4.5.13 compile
- junit:junit 4.11 test
- actions/checkout v1 composite
- actions/setup-java v1 composite