https://github.com/amazon-science/question-answering-nlu
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: other
- Language: Python
- Default Branch: main
- Size: 22.5 KB
Statistics
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Question Answering NLU
Question Answering NLU (QANLU) is an approach that maps the NLU task onto question answering, leveraging pre-trained question-answering models to perform well in few-shot settings. Instead of training an intent classifier or a slot tagger, for example, we can ask the model intent- and slot-related questions in natural language:
```
Context : I'm looking for a cheap flight to Boston.

Question: Is the user looking to book a flight?
Answer  : Yes

Question: Is the user asking about departure time?
Answer  : No

Question: What price is the user looking for?
Answer  : cheap

Question: Where is the user flying from?
Answer  : (empty)
```
Thus, by asking questions for each intent and slot in natural language, we can effectively construct an NLU hypothesis. For more details, please read the paper: Language model is all you need: Natural language understanding as question answering.
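As a rough illustration of how the answers above could be assembled into an NLU hypothesis, here is a minimal sketch. The function name `build_hypothesis` and the intent and slot labels (`book_flight`, `departure_time`, `price`, `from_city`) are assumptions made for this example; they are not taken from this repository or from MATIS++.

```python
from typing import Dict, List


def build_hypothesis(intent_answers: Dict[str, str],
                     slot_answers: Dict[str, str]) -> Dict[str, object]:
    """Collect question-answering outputs into a simple NLU hypothesis.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # An intent is kept when its yes/no question was answered "Yes".
    intents: List[str] = [name for name, answer in intent_answers.items()
                          if answer.strip().lower() == "yes"]
    # A slot is kept when the model returned a non-empty answer span.
    slots = {name: answer.strip() for name, answer in slot_answers.items()
             if answer.strip()}
    return {"intents": intents, "slots": slots}


# Answers from the flight example above (labels are made up for the sketch).
hypothesis = build_hypothesis(
    intent_answers={"book_flight": "Yes", "departure_time": "No"},
    slot_answers={"price": "cheap", "from_city": ""},
)
print(hypothesis)
# {'intents': ['book_flight'], 'slots': {'price': 'cheap'}}
```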
This repository contains code to transform MATIS++ NLU data (e.g. utterances and intent / slot annotations) into SQuAD 2.0 format question-answering data that can be used by QANLU. MATIS++ includes the original English version of ATIS and a translation into eight languages: German, Spanish, French, Japanese, Hindi, Portuguese, Turkish, and Chinese.
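To make the target format concrete, the sketch below builds one SQuAD 2.0 style paragraph for slot questions only. The field names (`context`, `qas`, `id`, `question`, `answers`, `answer_start`, `is_impossible`) follow the public SQuAD 2.0 layout. The helper `make_squad_paragraph`, the slot labels, and the id scheme are assumptions for illustration, and atis.py may build its records differently (for instance, intent questions are not covered here).

```python
import json
from typing import Dict, List


def make_squad_paragraph(utterance: str,
                         slot_values: Dict[str, str],
                         slot_questions: Dict[str, List[str]],
                         id_prefix: str) -> dict:
    """Build one SQuAD 2.0 style paragraph from an annotated utterance.

    Hypothetical helper; a sketch of the format, not the atis.py logic.
    """
    qas = []
    for slot, questions in slot_questions.items():
        value = slot_values.get(slot, "")
        # str.find returns -1 when the value does not occur in the utterance.
        start = utterance.find(value) if value else -1
        answerable = start >= 0
        for i, question in enumerate(questions):
            qas.append({
                "id": f"{id_prefix}-{slot}-{i}",
                "question": question,
                # SQuAD 2.0 marks unanswerable (negative) questions explicitly.
                "is_impossible": not answerable,
                "answers": ([{"text": value, "answer_start": start}]
                            if answerable else []),
            })
    return {"context": utterance, "qas": qas}


paragraph = make_squad_paragraph(
    utterance="I'm looking for a cheap flight to Boston.",
    slot_values={"price": "cheap"},
    slot_questions={
        "price": ["What price is the user looking for?"],
        "from_city": ["Where is the user flying from?"],
    },
    id_prefix="example-0",
)
dataset = {"version": "v2.0",
           "data": [{"title": "example", "paragraphs": [paragraph]}]}
print(json.dumps(dataset, indent=2))
```

The second question has no answer in the utterance, so it becomes a negative example with `is_impossible` set to true and an empty `answers` list.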
To create a SQuAD-style dataset, we first need a list of questions for each intent and a list of questions for each slot. Questions in English are saved in the MATIS_questions.json file. To parse data in languages other than English, you need to provide questions in that language (or translate the English questions we provide in this repository).
While we can have several questions for each intent and slot, QANLU sometimes performs better if it sees one question per intent and slot. We control this with the optional --single_q argument. If you call the atis.py script with that argument, only the first question in the list is chosen for each intent and slot. Otherwise, all questions for each intent and slot are used.
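The effect of --single_q can be pictured with the short sketch below. It assumes the questions are held as a mapping from each intent or slot label to a list of question strings; the actual layout of MATIS_questions.json and the code inside atis.py may differ, and `select_questions` is a hypothetical name.

```python
from typing import Dict, List


def select_questions(questions_by_label: Dict[str, List[str]],
                     single_q: bool = False) -> Dict[str, List[str]]:
    """Keep only the first question per label when single_q is set.

    Hypothetical helper mirroring the behavior described for --single_q.
    """
    return {label: (questions[:1] if single_q else list(questions))
            for label, questions in questions_by_label.items()}


# Made-up questions for one slot label, for illustration only.
questions = {
    "price": ["What price is the user looking for?",
              "How much does the user want to pay?"],
}
print(select_questions(questions, single_q=True))
# {'price': ['What price is the user looking for?']}
```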
Run the following to convert MATIS++ NLU data into SQuAD format:
python atis.py \
--data_path <path to the data> \
--languages <de,en,es,fr,ja,hi,pt,tr,zh> \
--qas_file <path to intent and slot questions json file> \
--output_dir <path to where output files are stored> \
[--single_q]
The output of this process follows the SQuAD format exactly and can be used to train question answering models. The next step is to train a question answering model; see here for a guide. Alternatively, you can download a QA model trained on SQuAD-v2 directly from Hugging Face here and fine-tune it with the MATIS++ NLU data parsed into SQuAD format. Note that the model must be trained on SQuAD-v2 in order to support negative examples (questions with no answer in the context).
A QANLU model trained using SQuAD-v2 and MATIS++ (English) is also available from huggingface here.
To calculate precision, recall, and F1 for the predictions that the fine-tuned question answering model makes on QANLU test sets, run:
python calculate_pr.py \
--pred_file <full path to the predictions file created by transformers> \
--test_file <full path to the test file that the predictions are for>
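For readers who want to see what such a computation might involve, here is a minimal sketch that scores predicted answers against gold answers by exact string match. It assumes both inputs are mappings from question id to answer text, with an empty string meaning "no answer". This is an illustrative scoring rule under those assumptions; the matching logic in calculate_pr.py may differ.

```python
from typing import Dict, Tuple


def precision_recall_f1(predictions: Dict[str, str],
                        gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over non-empty answers.

    Hypothetical scoring sketch; not the repository's calculate_pr.py.
    """
    tp = fp = fn = 0
    for qid, gold_text in gold.items():
        pred_text = predictions.get(qid, "").strip()
        gold_text = gold_text.strip()
        if pred_text and pred_text == gold_text:
            tp += 1  # correct non-empty answer
        else:
            if pred_text:
                fp += 1  # predicted an answer that is wrong or not expected
            if gold_text:
                fn += 1  # missed or mismatched an expected answer
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data, invented for the example.
gold = {"q1": "cheap", "q2": "", "q3": "boston", "q4": "monday"}
predictions = {"q1": "cheap", "q2": "denver", "q3": "", "q4": "tuesday"}
precision, recall, f1 = precision_recall_f1(predictions, gold)
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")
```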
Example
In this example, we show how to train QANLU on English MATIS (i.e. the original ATIS). We assume that
MATIS has been downloaded to a folder called MATIS in the root directory of this repository.
The first step is to convert the data into SQuAD format:
```
mkdir data

python atis.py \
    --data_path MATIS/data/train_dev_test \
    --languages en \
    --qas_file MATIS_questions.json \
    --output_dir data
```
The next step is to fine-tune a SQuAD-trained QA model on the data we just created. For this
example, we will use the deepset/roberta-base-squad2 model from huggingface.
To do the fine-tuning, we will use the run_squad.py script from here
(assuming 8 GPUs present):
```
mkdir models

python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \
    --model_type roberta \
    --model_name_or_path deepset/roberta-base-squad2 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --train_file data/matis_en_train_squad.json \
    --predict_file data/matis_en_test_squad.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 64 \
    --output_dir models/qanlu/ \
    --per_gpu_train_batch_size 8 \
    --overwrite_output_dir \
    --version_2_with_negative \
    --save_steps 100000 \
    --gradient_accumulation_steps 8 \
    --seed $RANDOM
```
Once the model is fine-tuned with MATIS++ data, it is saved in models/qanlu/.
The final step is to calculate performance metrics:
python calculate_pr.py \
    --pred_file models/qanlu/predictions_.json \
    --test_file data/matis_en_test.json >> results_matis_en.txt
The output should look like this:
atis_en.txt
Precision: 0.9613439306358381
Recall: 0.9582283039250991
F1: 0.9597835888187556
Results: {'slot': {'Precision': 0.9613439306358381, 'Recall': 0.9582283039250991, 'F1': 0.9597835888187556}}
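As a quick sanity check on numbers like these, F1 should equal the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; it is only an arithmetic check, not part of the repository's scripts.

```python
# Values copied from the example output above.
precision = 0.9613439306358381
recall = 0.9582283039250991
reported_f1 = 0.9597835888187556

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"Recomputed F1: {f1:.6f}")
print(f"Matches reported F1: {abs(f1 - reported_f1) < 1e-6}")
```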
Citation
If you use this work, please cite:
@inproceedings{namazifar2021language,
title={Language model is all you need: Natural language understanding as question answering},
author={Namazifar, Mahdi and Papangelis, Alexandros and Tur, Gokhan and Hakkani-T{\"u}r, Dilek},
booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7803--7807},
year={2021},
organization={IEEE}
}
Security
See CONTRIBUTING for more information.
License
This library is licensed under the CC BY NC License.
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science