https://github.com/awslabs/multi-domain-goal-oriented-dialogues-dataset

Data from the publication "Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data"

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: other
  • Default Branch: master
  • Homepage:
  • Size: 56.4 MB
Statistics
  • Stars: 21
  • Watchers: 5
  • Forks: 3
  • Open Issues: 3
  • Releases: 0
Created about 6 years ago · Last pushed over 5 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Data from "Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data"

Repository Structure

Under the top level ./data directory, you will find the following two sub-directories:

1. unannotated:

Unannotated human-to-human conversations from the airline, fastfood, finance, insurance, media, and software domains. Conversations are split by domain and provided in TSV format with the columns "conversationId", "turnNumber", "utteranceId", "utterance", and "authorRole".

2. paper_splits:

Pre-processed training, development, and test splits of customer turns, used to obtain the intent classification and slot-labeling results in Table 7 of the paper. As in the paper, these data are partitioned by annotation granularity: sentence level (located at ./data/paper_splits/splits_annotated_at_sentence_level) or turn level (located at ./data/paper_splits/splits_annotated_at_turn_level). Each annotation granularity subdirectory contains splits for every domain: airline, fastfood, finance, insurance, media, and software. The splits are named "train.tsv", "dev.tsv", and "test.tsv" and contain the following tab-separated columns: "conversationId", "turnNumber", "sentenceNumber" (sentence-level splits only), "utteranceId", "utterance", "slot-labels", and "intent". Labels in the "slot-labels" field are separated by spaces. When a single input has multiple intents, the intents are separated by the special token <div>.
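As a rough illustration of the two file formats described above, the following Python sketch reads tab-separated rows, groups unannotated turns by conversation, and splits the "slot-labels" and "intent" fields. It rests on several assumptions that the README does not confirm: that each file starts with a header row, that columns appear in the order listed, that "turnNumber" is an integer, and that the <div> token may or may not be surrounded by spaces. The helper names (read_rows, group_by_conversation, parse_annotations) and all sample values, including the label names, are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Column names are taken from the README; their order is an assumption.
UNANNOTATED_COLUMNS = ["conversationId", "turnNumber", "utteranceId",
                       "utterance", "authorRole"]
TURN_COLUMNS = ["conversationId", "turnNumber", "utteranceId",
                "utterance", "slot-labels", "intent"]
# Sentence-level splits add "sentenceNumber" after "turnNumber".
SENTENCE_COLUMNS = TURN_COLUMNS[:2] + ["sentenceNumber"] + TURN_COLUMNS[2:]
INTENT_SEPARATOR = "<div>"


def read_rows(handle, columns, has_header=True):
    """Read tab-separated rows into dicts keyed by the given column names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:  # assumption: the first line is a header row
        next(reader, None)
    # Rows with an unexpected number of fields are skipped.
    return [dict(zip(columns, row)) for row in reader if len(row) == len(columns)]


def group_by_conversation(rows):
    """Group rows by conversationId and order each group by turnNumber."""
    conversations = defaultdict(list)
    for row in rows:
        conversations[row["conversationId"]].append(row)
    for turns in conversations.values():
        turns.sort(key=lambda r: int(r["turnNumber"]))  # assumes integer turns
    return dict(conversations)


def parse_annotations(row):
    """Split space-separated slot labels and <div>-separated intents."""
    slot_labels = row["slot-labels"].split()
    intents = [part.strip() for part in row["intent"].split(INTENT_SEPARATOR)
               if part.strip()]
    return slot_labels, intents


# Hypothetical in-memory samples standing in for real files.
unannotated_sample = (
    "conversationId\tturnNumber\tutteranceId\tutterance\tauthorRole\n"
    "conv-1\t1\tutt-2\thow can i help you\tagent\n"
    "conv-1\t0\tutt-1\thello\tcustomer\n"
)
conversations = group_by_conversation(
    read_rows(io.StringIO(unannotated_sample), UNANNOTATED_COLUMNS))
ordered_utterances = [turn["utterance"] for turn in conversations["conv-1"]]

split_sample = (
    "conversationId\tturnNumber\tutteranceId\tutterance\tslot-labels\tintent\n"
    "conv-1\t0\tutt-1\thello i lost my bag\tO O O O B-item\tintent_a <div> intent_b\n"
)
split_rows = read_rows(io.StringIO(split_sample), TURN_COLUMNS)
slot_labels, intents = parse_annotations(split_rows[0])
```

To read a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")` with a path under ./data, and use SENTENCE_COLUMNS for the sentence-level splits. If the files turn out to have no header row, call read_rows with has_header=False.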

License

This project is licensed under the CDLA Permissive License. The terms are given in LICENSE.txt.

Reference

To reference this dataset, please cite our EMNLP-2019 paper, "Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data" (BibTeX below):

@inproceedings{peskov-etal-2019-multi,
    title = "Multi-Domain Goal-Oriented Dialogues ({M}ulti{D}o{GO}): Strategies toward Curating and Annotating Large Scale Dialogue Data",
    author = "Peskov, Denis and Clarke, Nancy and Krone, Jason and Fodor, Brigi and Zhang, Yi and Youssef, Adel and Diab, Mona",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1460",
    doi = "10.18653/v1/D19-1460",
    pages = "4526--4536",
}

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jpcorb20 (1)
  • moyapchen (1)
  • scottmackieverint (1)