speech-to-intent-dataset
Dataset Release for Intent Classification from Speech
Statistics
- Stars: 47
- Watchers: 5
- Forks: 3
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
Skit-S2I Dataset
Dataset Release for Intent Classification task from Speech
About
This is a dataset for intent classification from human speech, covering 14 coarse-grained intents from the banking domain. The work is inspired by a similar release, the Minds-14 dataset; here, we restrict ourselves to Indian English but provide a much larger training set. The dataset is split into:
- test - 100 samples per intent
- train - >650 samples per intent
The data was generated by 11 Indian English speakers recording over a telephony line. We also provide anonymised speaker information, such as gender, languages spoken, and native language, to allow more structured discussion of robustness and bias in the models you train.
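One way to use the speaker information for a bias check is to break model accuracy down by a speaker attribute. The following is a minimal sketch, not part of the release: the record keys (`true_intent`, `predicted_intent`, `native_language`) and the toy records are hypothetical, so substitute the field names given in the datasheet.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group value: accuracy} for a list of prediction records.

    Each record is a dict holding the true intent, the predicted intent,
    and speaker metadata. All key names here are hypothetical.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["predicted_intent"] == rec["true_intent"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records, made up for illustration; not taken from the dataset.
records = [
    {"true_intent": 0, "predicted_intent": 0, "native_language": "Hindi"},
    {"true_intent": 1, "predicted_intent": 2, "native_language": "Hindi"},
    {"true_intent": 3, "predicted_intent": 3, "native_language": "Tamil"},
    {"true_intent": 4, "predicted_intent": 4, "native_language": "Tamil"},
]
per_group = accuracy_by_group(records, "native_language")
```

A large gap between groups would suggest the model is less robust for some speakers. With only 11 speakers, such gaps should be read with caution.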
Download and Usage
The dataset is available on HuggingFace as Skit-S2I.
This dataset is shared under the Creative Commons Attribution-NonCommercial 4.0 International Licence, which restricts commercial use of the dataset.
Uses
Most spoken dialog systems use a pipeline of speech recognition followed by intent classification, and optimise each stage individually. But this allows ASR errors to leak downstream. What if we instead train end-to-end intent models on speech? More importantly, how well would such models generalise in a language like Indian English, given the diversity of speech behaviours? This dataset is an attempt towards answering such questions around robustness and model bias.
Structure
This release contains Indian English speech samples, each tagged with an intent from the banking domain. It also includes the transcript template used to generate each sample.
Audio Quality: 8 kHz, 16-bit
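If you want to confirm that a file matches the stated audio quality before training, Python's standard `wave` module can read the header. This is a minimal sketch: the helper names are ours, and `make_silent_wav` exists only to build a demonstration file in memory (it assumes mono audio, which the README does not state).

```python
import io
import wave

EXPECTED_RATE_HZ = 8000      # 8 kHz, as stated above
EXPECTED_SAMPLE_WIDTH = 2    # 16-bit PCM = 2 bytes per sample


def matches_stated_format(wav_source):
    """Return True if the WAV file has the stated sample rate and bit depth.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH)


def make_silent_wav(rate_hz, sample_width, n_frames=80):
    """Build a small mono WAV of silence in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_width * n_frames)
    buffer.seek(0)
    return buffer
```

For a real sample, pass the path of a file from the audio directory to `matches_stated_format`.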
Structure
```
- wavaudios [contains the wav audio files]
- train.csv [contains the train split, where each row contains "
```
More information regarding the dataset can be found in the datasheet.
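The per-intent split sizes stated in the About section can be checked by counting rows per intent in a split CSV. The sketch below uses only the standard library. The column names (`intent_class`, `audio_path`) and the toy rows are hypothetical, since the actual column layout is described in the datasheet; pass the real intent column name as `intent_column`.

```python
import csv
import io
from collections import Counter


def samples_per_intent(csv_file, intent_column):
    """Count rows per intent label in an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[intent_column] for row in reader)


# Tiny in-memory stand-in for a split file.
# Column names and rows are hypothetical, for illustration only.
toy_csv = io.StringIO(
    "intent_class,audio_path\n"
    "0,a.wav\n"
    "0,b.wav\n"
    "1,c.wav\n"
)
counts = samples_per_intent(toy_csv, "intent_class")
```

On the real training split, open the file with `open("train.csv", newline="")` and check that `len(counts)` is 14 and that every count exceeds 650.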
Baselines
The code for the baselines is provided in the baselines directory.
Citation
If you use this dataset, please cite it using the citation link in the About section of the GitHub repository page.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Owner
- Name: Skit.ai
- Login: skit-ai
- Kind: organization
- Email: hello@skit.ai
- Location: Bangalore, India
- Website: https://skit.ai
- Twitter: SkitTech
- Repositories: 98
- Profile: https://github.com/skit-ai
Transforming Customer Experience with Voice AI
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it using these metadata."
authors:
  - family-names: "Nethil"
    given-names: "Kumarmanas"
  - family-names: "Anandan"
    given-names: "Kriti"
  - family-names: "Senani"
    given-names: "Unnati"
title: "Speech to Intent Dataset"
abstract : "This dataset contains the Indian English speech samples tagged with relevant intents from the banking domain"
type: dataset
keywords:
  - "intent-recognition"
  - "speech-to-intent"
  - "SLU"
  - "Indian-English"
version: 1.0.0
date-released: 2022-24-04
url: "https://github.com/skit-ai/speech-to-intent-dataset"
license: CC BY-NC 4.0
GitHub Events
Total
- Issues event: 1
- Watch event: 3
- Push event: 1
- Pull request event: 1
Last Year
- Issues event: 1
- Watch event: 3
- Push event: 1
- Pull request event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 3
- Average time to close issues: 11 months
- Average time to close pull requests: 4 months
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 6.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 5 months
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shangeth (1)
- abhinavg4 (1)
Pull Request Authors
- janaab11 (2)
- shangeth (1)
