Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: McGill-NLP
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 1.04 MB
Statistics
  • Stars: 48
  • Watchers: 7
  • Forks: 5
  • Open Issues: 2
  • Releases: 0
Created about 4 years ago · Last pushed over 3 years ago

README.md

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

This repository hosts the code and pre-trained models for our paper FaithDial: A Faithful Benchmark for Information-Seeking Dialogue. Also, it hosts the data annotations for our NAACL paper On the origin of hallucination in dialogue systems. For more information, please visit the project page.

**************************** Updates ****************************

* 9/06: FaithDial accepted to TACL! Please check out the updated paper.
* 7/30: We released the code for FaithCritic and uploaded our model to :hugs: Hub.
* 4/25: We released the FaithDial paper and launched the project page. Check them out!
* 4/15: We released our paper, to appear at NAACL 2022!

Overview

The goal of information-seeking dialogue is to respond to user queries with natural language utterances that are grounded on knowledge sources. Dialogue systems, however, often hallucinate, i.e. generate unsupported utterances, as they amplify the noise found in existing training datasets. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues. Annotators were asked to edit the hallucinated utterances in a pre-existing dataset to ensure they are faithful to knowledge sources and re-purpose the role of the interlocutor from a human wizard to a domain-expert bot.

Data

The dataset is hosted on Huggingface's datasets:

```python
from datasets import load_dataset

dataset = load_dataset("McGill-NLP/FaithDial")
```

Use with Huggingface

We'll release our fine-tuned models soon! Stay tuned!

Train Your Models

The code for all the models in the paper is available in models, which can be used to reproduce our results or to train your own models.

Requirements

First, install PyTorch 1.7+ from the official website, then clone this repository and install the dependencies:

```bash
git clone git@github.com:McGill-NLP/FaithDial.git
cd FaithDial
pip install -r requirements.txt
```

Our code is tested with Python 3.8 and PyTorch 1.7.1 with CUDA 11.0.

Data Format

By default, our code loads data from Huggingface's datasets. However, you can also provide your own data in the following format:

```text
[
  {
    "utterances": [
      ... // prior utterances,
      {
        "history": [
          "Have you ever been to a concert? They're so fun!",
          "No I cannot as a bot. However, have you been to Madonna's? Her 10th concert was used to help her 13th album called \"Rebel Heart\".",
          "Yeah I've heard of it but never went or what it was for. Can you tell me more about it?"
        ],
        "speaker": "Wizard",
        "knowledge": "It began on September 9, 2015, in Montreal, Canada, at the Bell Centre and concluded on March 20, 2016, in Sydney, Australia at Allphones Arena.",
        "original_response": "It started in September of 2015 and ran all the way through March of 2016. Can you imagine being on the road that long?",
        "response": "Sure. The concert started in September 9th of 2015 at Montreal, Canada. It continued till 20th of March of 2016, where it ended at Sydney, Australia.",
        "BEGIN": [
          "Hallucination",
          "Entailment"
        ],
        "VRM": [
          "Disclosure",
          "Question"
        ]
      },
      ... // more utterances
    ]
  },
  ... // more dialogues
]
```

In the above example, `original_response`, `BEGIN`, and `VRM` are optional and don't have to be provided for your own data.
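Before training on your own file, it can help to check that every utterance carries the fields the format above expects. The following is a minimal sketch, not part of this repository: the helper name `iter_examples` and the set `REQUIRED_KEYS` are hypothetical, and treating `history`, `speaker`, `knowledge`, and `response` as required is an assumption based on the note that only the other three fields are optional.

```python
# Hypothetical helper: walks data in the format shown above and checks
# that each utterance has the fields assumed to be required.
REQUIRED_KEYS = {"history", "speaker", "knowledge", "response"}  # assumption


def iter_examples(dialogues):
    """Yield (history, knowledge, response) for every utterance."""
    for d_idx, dialogue in enumerate(dialogues):
        for u_idx, utterance in enumerate(dialogue["utterances"]):
            missing = REQUIRED_KEYS - utterance.keys()
            if missing:
                raise ValueError(
                    f"dialogue {d_idx}, utterance {u_idx}: "
                    f"missing fields {sorted(missing)}"
                )
            yield utterance["history"], utterance["knowledge"], utterance["response"]


# A one-utterance dataset built from the example above (optional fields omitted).
sample = [
    {
        "utterances": [
            {
                "history": ["Have you ever been to a concert? They're so fun!"],
                "speaker": "Wizard",
                "knowledge": "It began on September 9, 2015, in Montreal, Canada, "
                "at the Bell Centre and concluded on March 20, 2016, "
                "in Sydney, Australia at Allphones Arena.",
                "response": "Sure. The concert started in September 9th of 2015 "
                "at Montreal, Canada. It continued till 20th of March of 2016, "
                "where it ended at Sydney, Australia.",
            }
        ]
    }
]

for history, knowledge, response in iter_examples(sample):
    print(len(history), "history turn(s);", response[:30] + "...")
```

In practice you would load the list with `json.load` from your own file and pass it to the same helper before handing the path to `--train_dataset_path`.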

Training

Here is how to train a model:

```bash
python models/dialog.py --model_name_or_path t5-base \
    --do_train \
    --output_dir /path/to/output_dir \
    --fp16 \
    --train_batch_size 16 \
    --num_train_epochs 10 \
    --warmup_ratio 0.04 \
    --max_seq_length 512
```

To run on multiple GPUs, set CUDA_VISIBLE_DEVICES. By default, training uses early stopping, and the best model is saved at /path/to/output_dir/best_model.
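To get a feel for what `--warmup_ratio 0.04` means in the command above, here is a rough sketch of the arithmetic. It assumes the ratio is applied to the total number of optimizer steps, which is the common convention but is not confirmed by this README; the function name `warmup_steps` and the dataset size of 1000 examples are purely illustrative.

```python
import math


def warmup_steps(num_examples, batch_size, num_epochs, warmup_ratio,
                 gradient_accumulation_steps=1):
    """Return (total optimizer steps, warmup steps) under the stated assumption."""
    # One optimizer step consumes batch_size * gradient_accumulation_steps examples.
    examples_per_step = batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    total_steps = steps_per_epoch * num_epochs
    return total_steps, int(total_steps * warmup_ratio)


# Hypothetical dataset of 1000 examples with the settings from the command above.
total, warmup = warmup_steps(1000, batch_size=16, num_epochs=10, warmup_ratio=0.04)
print(f"{total} optimizer steps, of which {warmup} are warmup")
```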

Other arguments for training are as follows:

- `--learning_rate`: Initial learning rate for Adam.
- `--gradient_accumulation_steps`: Number of steps to accumulate gradient before performing a backward/update pass.
- `--enable_infonce`: Whether to use the InfoNCE model. Note that `negative_samples` must be present in the input data for contrastive learning. Also, `--fp16` should not be set.
- `--max_negative_samples`: The number of negative samples per training example (works only when InfoNCE is enabled).
- `--inbatch_negatives`: Whether to use in-batch negative sampling (works only when InfoNCE is enabled).
- `--loss_truncation`: Whether to use loss truncation.
- `--ctrl`: Whether to use controlled generation. Note that `control_tokens` must be present in the input data. To learn about how to compute control tokens, see here.
- `--train_dataset_path` (optional): Path to your own training dataset.
- `--eval_dataset_path` (optional): Path to your own validation dataset.

For a complete list of arguments, take a look at models/dialog.py and models/lightning_base.py.
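For readers unfamiliar with the contrastive objective behind `--enable_infonce`, the sketch below shows the generic InfoNCE loss for one positive and several negative scores. It is a plain-Python illustration of the textbook formula, not the implementation in models/dialog.py; the function name `info_nce_loss`, the scalar scores, and the `temperature` parameter are assumptions made for the example.

```python
import math


def info_nce_loss(pos_score, neg_scores, temperature=1.0):
    """Generic InfoNCE: -log(exp(s+/t) / (exp(s+/t) + sum_j exp(s-_j/t)))."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_denominator - logits[0]


# The loss shrinks as the positive is scored further above the negatives.
print(info_nce_loss(0.0, [0.0, 0.0]))
print(info_nce_loss(5.0, [0.0, 0.0]))
```

With more negatives (compare `--max_negative_samples` and `--inbatch_negatives`), the denominator gains terms, so the positive has to beat a larger pool to reach the same loss.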

Evaluation

To compute perplexity of a model on the validation data, simply run:

```bash
python models/dialog.py --model_name_or_path /path/to/model/best_model \
    --do_eval \
    --eval_batch_size 16
```

For the test data, --do_eval should be replaced with --do_test. Note that evaluation should be run on a single GPU.
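As a reminder of what the reported number means, perplexity is the exponential of the average per-token negative log-likelihood. The snippet below is a minimal sketch of that definition; the function name `perplexity` and the input list are illustrative, and the evaluation script may aggregate losses differently (for example per batch rather than per token).

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 1/4 to every reference token.
print(perplexity([math.log(4)] * 3))
```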

To compute the other metrics reported in the paper (BLEU, ROUGE, F1, BERTScore, and Q^2), we used the scripts provided in https://github.com/orhonovich/q-squared.

Generation

To generate a response, simply run:

```bash
python models/generate.py --model_name_or_path /path/to/model/best_model --do_sample --top_p 0.6
```

Arguments for generation are as follows:

- `--output` (optional): Path of the output directory to save the generated responses.
- `--dataset_path` (optional): Path to your own dataset.
- `--control_tokens` (optional): Control tokens, prepended to the sequence, for controlled generation.
- `--max_length` (default: 100): Maximum length of the generated sequence.

For a complete list of arguments, refer to models/generate.py.
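The command above samples with `--do_sample --top_p 0.6`, i.e. nucleus sampling. The sketch below illustrates the idea on a toy next-token distribution: keep the smallest set of most probable tokens whose cumulative probability reaches `top_p`, renormalize, and sample from that set. The function names and the toy distribution are hypothetical, and this is not the code path used by models/generate.py.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_top_p(probs, top_p, rng=random):
    """Draw one token from the filtered, renormalized distribution."""
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution: with top_p=0.6 only "concert" and "tour" survive.
toy = {"concert": 0.5, "tour": 0.3, "banana": 0.2}
print(top_p_filter(toy, 0.6))
print(sample_top_p(toy, 0.6))
```

Lower `top_p` values cut off more of the low-probability tail, which tends to make generations more conservative.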

Critic

We also use our collected data to frame the problem of identifying hallucination as a binary classification task where the goal is to predict whether an utterance is faithful or not, given the source knowledge. We call this model FaithCritic.
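To make the binary framing concrete, the sketch below turns one utterance in the data format described earlier into (knowledge, response, label) pairs. It rests on loud assumptions: that the `BEGIN` labels describe `original_response`, that the edited `response` counts as faithful, and that the label ids `FAITHFUL = 0` and `HALLUCINATED = 1` are arbitrary placeholders. It is one plausible construction, not necessarily how the FaithCritic training data was built or how the released model numbers its classes.

```python
# Hypothetical label ids; the released model may use a different convention.
FAITHFUL, HALLUCINATED = 0, 1


def critic_examples(utterance):
    """Return (knowledge, response, label) pairs for one utterance dict."""
    knowledge = utterance["knowledge"]
    # Assumption: the annotator-edited response is faithful to the knowledge.
    examples = [(knowledge, utterance["response"], FAITHFUL)]
    # Assumption: BEGIN labels annotate the original (unedited) response.
    original = utterance.get("original_response")
    if original is not None and "Hallucination" in utterance.get("BEGIN", []):
        examples.append((knowledge, original, HALLUCINATED))
    return examples


utterance = {
    "knowledge": "It began on September 9, 2015, in Montreal, Canada, at the Bell Centre "
    "and concluded on March 20, 2016, in Sydney, Australia at Allphones Arena.",
    "original_response": "It started in September of 2015 and ran all the way through "
    "March of 2016. Can you imagine being on the road that long?",
    "response": "Sure. The concert started in September 9th of 2015 at Montreal, Canada. "
    "It continued till 20th of March of 2016, where it ended at Sydney, Australia.",
    "BEGIN": ["Hallucination", "Entailment"],
}

for _, text, label in critic_examples(utterance):
    print(label, text[:40] + "...")
```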

Huggingface

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/roberta-large-faithcritic")
model = AutoModelForSequenceClassification.from_pretrained("McGill-NLP/roberta-large-faithcritic")

knowledge = "A cardigan is a type of knitted garment (sweater) that has an open front."
response = "The old version is the regular one, knitted garment that has open front and buttons!"
# Encode the (knowledge, response) pair as PyTorch tensors.
inputs = tokenizer(knowledge, response, return_tensors="pt")
# Print the index of the highest-scoring class.
print(torch.argmax(model(**inputs).logits))
```

Training

```bash
python models/critic.py --model_name_or_path roberta-large --do_train --train_batch_size 16 \
    --learning_rate 1e-5 --weight_decay 0.1 --warmup_ratio 0.08 --pad_to_multiple_of 8 --fp16 \
    --output_dir /path/to/output
```

Testing

```bash
python models/critic.py --model_name_or_path /path/to/model --eval_batch_size 16 --do_test
```

To test on other datasets, you need to pass `--test_task {BEGIN|MNLI}`. For BEGIN and MNLI, `--test_dataset_path` is required; the datasets can be downloaded from here and here, respectively. For MNLI, it is possible to use the version that is hosted on :hugs: Datasets by not passing `--test_dataset_path`, but the results would be slightly different.

Bugs or questions?

If you have any questions (:question:) related to the code, encounter any problems (:hammer_and_wrench:), or want to report a bug (:bug:), feel free to open an issue.

Citation

If you want to cite our papers, please use:

```bibtex
@article{dziri2022faithdial,
  title = "{FaithDial: A Faithful Benchmark for Information-Seeking Dialogue}",
  author = {Dziri, Nouha and Kamalloo, Ehsan and Milton, Sivan and Zaiane, Osmar and Yu, Mo and Ponti, Edoardo M and Reddy, Siva},
  journal = {Transactions of the Association for Computational Linguistics},
  volume = {10},
  pages = {1473--1490},
  year = {2022},
  month = {12},
  publisher = {MIT Press},
  doi = {10.1162/tacl_a_00529}
}
```

and

```bibtex
@inproceedings{dziri2022origin,
  title = "On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?",
  author = {Dziri, Nouha and Milton, Sivan and Yu, Mo and Zaiane, Osmar and Reddy, Siva},
  booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  year = {2022},
  pages = "5271--5285",
  address = "Seattle, United States",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.naacl-main.387"
}
```

Bibkey in aclanthology: dziri-etal-2022-origin.

License

This work is licensed under the MIT license. See LICENSE for details.

Owner

  • Name: McGill NLP
  • Login: McGill-NLP
  • Kind: organization
  • Location: Canada

Research group within McGill University and Mila focusing on various topics in natural language processing.

Citation (CITATION.cff)

cff-version: "1.2.0"
date-released: 2022-04
message: "If you use our work, please cite it using these metadata."
title: "FaithDial: A Faithful Benchmark for Information-Seeking Dialogue"
url: "https://github.com/McGill-NLP/FaithDial"
authors:
  - family-names: Dziri
    given-names: Nouha
  - family-names: Kamalloo
    given-names: Ehsan
  - family-names: Milton
    given-names: Sivan
  - family-names: Zaiane
    given-names: Osmar
  - family-names: Yu
    given-names: Mo
  - family-names: Ponti
    given-names: "Edoardo M."
  - family-names: Reddy
    given-names: Siva
preferred-citation:
  type: article
  authors:
  - family-names: Dziri
    given-names: Nouha
  - family-names: Kamalloo
    given-names: Ehsan
  - family-names: Milton
    given-names: Sivan
  - family-names: Zaiane
    given-names: Osmar
  - family-names: Yu
    given-names: Mo
  - family-names: Ponti
    given-names: "Edoardo M."
  - family-names: Reddy
    given-names: Siva
  month: 12
  title: "FaithDial: A Faithful Benchmark for Information-Seeking Dialogue"
  year: 2022
  url:  "https://doi.org/10.1162/tacl_a_00529"
  issn: "2307-387X"
  journal: "Transactions of the Association for Computational Linguistics"
  volume: 10
  pages: "1473-1490"
  address: "Online"

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • 141forever (1)
  • abaheti95 (1)

Dependencies

models/ctrl/ctrl_requirements.txt pypi
  • accelerate *
  • en_core_web_sm *
  • spacy >=3.1.0,<3.2.0
requirements.txt pypi
  • datasets *
  • pytorch-lightning *
  • torchmetrics *
  • transformers *