https://github.com/amazon-science/dse

Repository

Basic Info

Host: GitHub
Owner: amazon-science
License: apache-2.0
Language: Python
Default Branch: main
Size: 604 KB

Statistics

Stars: 43
Watchers: 2
Forks: 5
Open Issues: 5
Releases: 0

Created about 4 years ago · Last pushed about 3 years ago

DSE: Learning Dialogue Representations from Consecutive Utterances

This repository contains the code for our paper "Learning Dialogue Representations from Consecutive Utterances" (NAACL 2022).

1. Introduction

DSE is a pre-trained conversational language model that can serve as a drop-in replacement for the text encoder (e.g., BERT) in various dialogue systems. It learns dialogue representations by taking consecutive utterances within the same dialogue as positive pairs for contrastive learning. Please refer to our paper for more details.
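To make the idea concrete, here is a minimal, dependency-free sketch of an in-batch contrastive loss in which the positive for each utterance is the utterance that follows it in the same dialogue, and the other responses in the batch act as negatives. The function names, the toy 2-D embeddings, and the choice of a plain InfoNCE-style objective are assumptions made for illustration; the paper and the `pretrain` code define the objective that is actually used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss (illustrative, not the repository's code).

    anchors[i] and positives[i] embed two consecutive utterances from the
    same dialogue; positives[j] for j != i serve as in-batch negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_denominator - logits[i]
    return total / len(anchors)

# Toy 2-D embeddings (hypothetical values, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]    # each positive lies close to its anchor
shuffled = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped between anchors

loss_aligned = info_nce_loss(anchors, aligned)
loss_shuffled = info_nce_loss(anchors, shuffled)
print(loss_aligned, loss_shuffled)
```

The loss is small when each utterance is most similar to its own follow-up and grows when the pairs are mismatched, which is the signal that pulls consecutive utterances together in embedding space.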

2. Pre-trained Models

The pre-trained models are available on the HuggingFace Model Hub; see the Model List in Section 4.

3. Setup environment

# create and activate virtual python environment
conda create -n dse python=3.7
conda activate dse

# install required packages
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -y -c pytorch # change the version of cudatoolkit based on your cuda version
python3 -m pip install -r requirements.txt

4. Quick Start

Our model is easy to use with the transformers package.

```python
import torch
from torch.nn import CosineSimilarity
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model = AutoModel.from_pretrained("aws-ai/dse-bert-base")
tokenizer = AutoTokenizer.from_pretrained("aws-ai/dse-bert-base")

# Define the sentences of interest
texts = ["When will I get my card?",
         "Is there a way to know when my card will arrive?"]

# Define a function that calculates text embeddings for a list of texts
def get_average_embedding(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Calculate the sentence embeddings by averaging the embeddings of non-padding words
    with torch.no_grad():
        outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
        attention_mask = inputs["attention_mask"].unsqueeze(-1)
        embeddings = torch.sum(outputs[0] * attention_mask, dim=1) / torch.sum(attention_mask, dim=1)
    return embeddings

# Each embedding is a 1-D vector, so the similarity is taken along dim=0
cosine_sim = CosineSimilarity(dim=0)
embeddings = get_average_embedding(texts)

print("Similarity of the two sentences is: ", cosine_sim(embeddings[0], embeddings[1]).item())
```

Model List

We released the following five models. You can import these models by using HuggingFace's Transformers.

| Model |
|:---------------------------|
| aws-ai/dse-distilbert-base |
| aws-ai/dse-bert-base |
| aws-ai/dse-bert-large |
| aws-ai/dse-roberta-base |
| aws-ai/dse-roberta-large |

5. Train DSE

Our code base expects one of two types of positive pairs for pre-training: existing positive pairs (e.g., consecutive utterances) or pairs obtained by applying Dropout to the same sentence. To use existing positive pairs for pre-training, provide a tsv or csv file in which each row contains one positive pair. To use Dropout, provide a txt file in which each row contains a single sentence or paragraph.
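As an illustration of the first input format, the sketch below turns a list of dialogues into rows of consecutive-utterance pairs and writes them as tab-separated text. The two-column, header-less layout, the function names, and the toy dialogues are assumptions made for this example; the repository's own data-processing script remains the reference for the exact format the trainer reads.

```python
import csv
import io

def consecutive_pairs(dialogue):
    """Return (utterance_t, utterance_t+1) pairs from one dialogue."""
    return list(zip(dialogue, dialogue[1:]))

def write_pairs_tsv(dialogues, handle):
    """Write one positive pair per row, tab-separated; return the row count."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    count = 0
    for dialogue in dialogues:
        for first, second in consecutive_pairs(dialogue):
            writer.writerow([first, second])
            count += 1
    return count

# Hypothetical toy dialogues, used only to show the row layout.
dialogues = [
    ["I lost my card.", "Sorry to hear that. Shall I block it?", "Yes, please."],
    ["When will my order arrive?", "It should arrive on Friday."],
]

# An in-memory buffer keeps the example side-effect free; swap in
# open("train.tsv", "w", newline="") to write an actual file.
buffer = io.StringIO()
num_rows = write_pairs_tsv(dialogues, buffer)
rows = buffer.getvalue().splitlines()
print(num_rows)
print(rows[0])
```

A dialogue with n utterances yields n - 1 pairs, and pairs never cross dialogue boundaries.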

For example, given a train.csv file, DSE can be trained as:

```
export DATA_DIR=/PATH/To/Training/Data/Folder
export FILE_NAME="train.csv"
export OUTPUT_DIR=/PATH/To/Output/Folder
export MODEL_TYPE=bertbase # choose from [bertbase, bertlarge, robertabase, robertalarge, distilbertbase]
cd pretrain

python main.py \
    --resdir ${OUTPUT_DIR} \
    --datapath ${DATA_DIR} \
    --dataname tod_single_pos_3.tsv \
    --mode contrastive \
    --bert ${MODEL_TYPE} \
    --contrast_type HardNeg \
    --lr 3e-06 \
    --lr_scale 100 \
    --batch_size 1024 \
    --max_length 32 \
    --temperature 0.05 \
    --epochs 15 \
    --max_iter 10000000 \
    --logging_step 400 \
    --feat_dim 128 \
    --num_turn 1 \
    --seed 1 \
    --save_model_every_epoch
```

5.1 Reproduce Our Training

To reproduce our experiments, please generate the training data as follows. We sincerely thank the TOD-BERT authors for preparing these scripts.

```
export WORK_PLACE="/home"
export OUTPUT_DIR="/home/data/dse_training.tsv"

# 1. Clone the repos
cd $WORK_PLACE
git clone https://github.com/jasonwu0731/ToD-BERT
git clone https://github.com/amazon-research/dse

# 2. Download the raw datasets from https://drive.google.com/file/d/1EnGX0UF4KW6rVBKMF3fL-9Q2ZyFKNOIy/view?usp=sharing
#    and put the "TODBERT_dialog_datasets.zip" file in the current directory

# 3. Unzip the downloaded file
unzip TODBERT_dialog_datasets.zip -d $WORK_PLACE

# 4. Modify "ToD-BERT/my_tod_pretraining.py" to acquire the data processed by TOD-BERT's codebase
# 4.1 Change line 745 to default='/PATH/TO/dialog_datasets' (e.g., '/home/dialog_datasets')
# 4.2 Add the following lines after line 951
#     with open("pre_train.pkl", "wb") as f:
#         pickle.dump(datasets, f)
#
#     raise ValueError("Done")
# 4.3 Run the following script; once it stops, a file named "pre_train.pkl" should appear in this folder
cd $WORK_PLACE/ToD-BERT
./run_tod_lm_pretraining.sh 0 bert bert-base-uncased save/pretrain/ToD-BERT-MLM --only_last_turn

# 5. Run our script to generate positive pairs from TOD-BERT's training data
cd $WORK_PLACE/dse/data
python process_pretrain.py --data_dir $WORK_PLACE/ToD-BERT/pre_train.pkl --output_dir $OUTPUT_DIR

# A tsv file should appear at OUTPUT_DIR, which can be directly used for model pre-training
```

6. Evaluate DSE

We provide code for 5 downstream applications:

1. Intent Classification: we formulate it as a single-sentence multi-class classification problem and evaluate it with accuracy.
2. Out-of-scope Detection: we formulate it as a single-sentence multi-class classification problem and evaluate it with accuracy, in-domain accuracy, out-of-scope accuracy, and out-of-scope recall.
3. Sentence-Level Response Selection: we formulate it as a ranking problem. We calculate the embeddings of the query and of each response separately. We always set the batch size to 100 during evaluation and calculate recall@1/3/5/10. Performed on the AmazonQA dataset.
4. Dialogue-Level Response Selection: similar to the above, but the input to this task is the dialogue history (a concatenation of multiple dialogue sentences).
5. Dialogue Action Prediction: we formulate it as a multi-sentence multi-label classification problem. We use the concatenation of the dialogue history as input.
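The ranking metric used for response selection can be sketched as follows: within one evaluation batch, every response is a candidate for every query, and recall@k counts how often the true response lands in the top k. This is a dependency-free illustration, assuming cosine similarity as the score and the response at the same index as the correct one; the function names and the 2-D toy embeddings are hypothetical and are not taken from the repository's evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recall_at_k(query_embs, response_embs, ks=(1, 3, 5, 10)):
    """Fraction of queries whose true response (same index) ranks in the top k."""
    hits = {k: 0 for k in ks}
    for i, query in enumerate(query_embs):
        scores = [cosine(query, response) for response in response_embs]
        # Rank of the true response: one plus the number of candidates scoring strictly higher.
        rank = 1 + sum(1 for score in scores if score > scores[i])
        for k in ks:
            if rank <= k:
                hits[k] += 1
    return {k: hits[k] / len(query_embs) for k in ks}

# Hypothetical 2-D embeddings for three query/response pairs.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
responses = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2]]

scores = recall_at_k(queries, responses, ks=(1, 2))
print(scores)
```

With a batch size of 100, as stated above, each query is ranked against 100 candidates, so recall@10 corresponds to landing in the top 10% of the batch.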

The evaluations are performed in two ways:

Fine-tune: add task-specific layer(s) to the pre-trained model and optimize the entire model by training it on labeled data. Supports tasks 1, 3, 4, and 5.
Similarity-based methods: do not perform any model training. Predictions are made solely from the text embeddings given by the pre-trained model. When performing few-shot classification, we use a prototypical network. Supports tasks 1, 2, 3, and 4.
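For few-shot classification without training, the prototype idea can be sketched like this: average the embeddings of the few labeled examples of each class into a prototype, then assign a new utterance to the class with the most similar prototype. The sketch below is a minimal illustration under stated assumptions: it uses cosine similarity (prototypical networks are often described with Euclidean distance instead), and the labels, function names, and 2-D embeddings are hypothetical rather than taken from `run_similarity.py`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]

def build_prototypes(support_set):
    """Map each label to the mean embedding of its few-shot examples."""
    return {label: mean_vector(embs) for label, embs in support_set.items()}

def classify(embedding, prototypes):
    """Return the label whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda label: cosine(embedding, prototypes[label]))

# Hypothetical 2-shot support set with toy 2-D embeddings.
support = {
    "card_arrival": [[0.9, 0.1], [0.8, 0.3]],
    "lost_card": [[0.1, 0.9], [0.2, 0.7]],
}
prototypes = build_prototypes(support)

prediction = classify([0.7, 0.2], prototypes)
print(prediction)
```

In practice the embeddings would come from the pre-trained encoder, for example via the mean-pooling function shown in the Quick Start.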

6.1 Data Generation

To perform the evaluations in our paper, please follow the instructions below to generate the corresponding evaluation data.

6.1.1 Intent Classification and Out-Of-Scope Detection

We use 4 public datasets:

1. Clinc 150
2. Banking 77
3. SNIPS
4. HWU64

```
export OUTPUT_DIR="/home/data/dse_evaluate"

# 1. Download raw data for snips and hwu64 (we directly acquire clinc150 and banking 77 from a package named "datasets")
wget https://raw.githubusercontent.com/clinc/nlu-datasets/master/nlu_datasets/intent_classification/snips_train.json
wget https://raw.githubusercontent.com/clinc/nlu-datasets/master/nlu_datasets/intent_classification/snips_test.json
wget https://raw.githubusercontent.com/xliuhw/NLU-Evaluation-Data/master/AnnotatedData/NLU-Data-Home-Domain-Annotated-All.csv

# 2. Run our scripts to generate data from the raw data
python process_evaluate.py --output_dir $OUTPUT_DIR --task clinc150
python process_evaluate.py --output_dir $OUTPUT_DIR --task bank77
python process_evaluate.py --output_dir $OUTPUT_DIR --task snips
python process_evaluate.py --output_dir $OUTPUT_DIR --task hwu64
```

6.1.2 Response Selection

We use 2 public datasets:

1. AmazonQA
2. Ubuntu-DSTC7

We use ParlAI to get the raw data, and provide scripts to generate data for our evaluation.

```
export WORK_SPACE="/home"
export OUTPUT_DIR="/home/data/dse_evaluate"

# 1. Install the ParlAI package
cd $WORK_SPACE
git clone https://github.com/facebookresearch/ParlAI.git
cd ParlAI; python setup.py develop

# 2. Use ParlAI to generate raw data
cd $WORK_SPACE/dse/data
parlai convert_to_parlai --task amazon_qa --datatype train --outfile amazonqa.txt
parlai convert_to_parlai --task dstc7 --datatype test --outfile ubuntutest.txt
parlai convert_to_parlai --task dstc7 --datatype valid --outfile ubuntuvalid.txt

# 3. Use our scripts to generate evaluation data
python process_evaluate.py --output_dir $OUTPUT_DIR --task amazonqa
python process_evaluate.py --output_dir $OUTPUT_DIR --task ubuntu
```

For AmazonQA, we observe large fluctuations in the absolute evaluation metrics across random seeds and orderings of the QA pairs, yet the improvement of DSE over the baselines is consistent. For Ubuntu-DSTC7, since there is little randomness, we observe performance consistent with that reported in the paper.

6.2 Similarity-based Evaluation

export DATA_DIR=/Path/To/Evaluation/Data
export MODEL_DIR=/Path/To/Pre-trained/Model
export OUTPUT_DIR=/Path/To/Output/Folder
cd evaluate

This part provides code for running similarity-based evaluation with any model available through HuggingFace Transformers.

6.2.1 Intent Classification

python run_similarity.py \
    --model_dir ${MODEL_DIR} \
    --data_root_dir ${DATA_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --TASK intent \
    --num_runs 10 \
    --max_seq_length 64

6.2.2 Out-of-scope Detection

python run_similarity.py \
    --model_dir ${MODEL_DIR} \
    --data_root_dir ${DATA_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --TASK oos \
    --num_runs 10 \
    --max_seq_length 64

6.2.3 Response Selection on AmazonQA

python run_similarity.py \
    --model_dir ${MODEL_DIR} \
    --data_root_dir ${DATA_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --TASK rs_amazon \
    --max_seq_length 128

6.2.4 Response Selection on Ubuntu-DSTC7

python run_similarity.py \
    --model_dir ${MODEL_DIR} \
    --data_root_dir ${DATA_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --TASK rs_ubuntu \
    --max_seq_length 128

6.3 Fine-tune-based Evaluation

This part provides code for fine-tuning a BERT model. We assume 2 GPUs for fine-tuning. If more or fewer GPUs are used, please adjust per_gpu_batch_size accordingly.
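The adjustment is simple arithmetic, sketched below under the assumption that the effective batch size is the product of per_gpu_batch_size, the number of GPUs, and gradient_accumulation_steps. That relationship is a common convention and is assumed here rather than confirmed from `run_finetune.py`; the helper name is hypothetical.

```python
def per_gpu_batch_size(effective_batch_size, num_gpus, gradient_accumulation_steps=1):
    """Per-GPU batch size that keeps the effective batch size unchanged.

    Assumes effective = per_gpu * num_gpus * gradient_accumulation_steps.
    """
    denominator = num_gpus * gradient_accumulation_steps
    if effective_batch_size % denominator != 0:
        raise ValueError(
            "effective batch size must be divisible by "
            "num_gpus * gradient_accumulation_steps"
        )
    return effective_batch_size // denominator

# Intent classification settings in this README:
# 2 GPUs x 8 per GPU x 1 accumulation step = 16 examples per update.
reference = 2 * 8 * 1

single_gpu = per_gpu_batch_size(reference, num_gpus=1)
four_gpus = per_gpu_batch_size(reference, num_gpus=4)
print(single_gpu, four_gpus)
```

If a single GPU cannot hold the larger per-GPU batch, raising gradient_accumulation_steps is the usual way to keep the same effective batch size with less memory.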

export DATA_DIR=/Path/To/Evaluation/Data
export MODEL_DIR=/Path/To/Pre-trained/Model
export OUTPUT_DIR=/Path/To/Output/Folder
cd evaluate

6.3.1 Intent Classification

for dataset in bank77 clinc150 hwu64 snips
do  
    for data_ratio in 1 5
    do
        python run_finetune.py \
        --data_dir ${DATA_DIR}/intent/${dataset} \
        --model_type ${MODEL_DIR} \
        --TASK seq \
        --output_dir ${OUTPUT_DIR}/intent_${dataset}_${data_ratio} \
        --bert_lr 3e-5 \
        --epoch 50 \
        --max_seq_length 64 \
        --per_gpu_batch_size 8 \
        --gradient_accumulation_steps 1 \
        --data_ratio ${data_ratio} \
        --num_runs 10 \
        --patience 5 \
        --classification_pooling average \
        --early_stop_type metric
    done
done

6.3.2 Response Selection

for data_ratio in 500 1000  
do
    python run_finetune.py \
        --data_dir ${DATA_DIR}/rs/amazonqa \
        --model_type ${MODEL_DIR} \
        --TASK rs \
        --output_dir ${OUTPUT_DIR}/rs_amazonqa_${data_ratio} \
        --bert_lr 3e-5 \
        --epoch 50 \
        --max_seq_length 128 \
        --per_gpu_batch_size 50 \
        --gradient_accumulation_steps 1 \
        --data_ratio ${data_ratio} \
        --num_runs 5 \
        --patience 3 \
        --eval_steps 50 
done

6.3.3 Dialogue Action Prediction

for dataset in dstc2 sim_joint
do
    for data_ratio in 10 20
    do
        python run_finetune.py \
            --data_dir ${DATA_DIR}/da/${dataset} \
            --model_type ${MODEL_DIR} \
            --TASK da \
            --output_dir ${OUTPUT_DIR}/da_concat_${dataset}_${data_ratio} \
            --bert_lr 5e-5 \
            --epoch 100 \
            --max_seq_length 32 \
            --per_gpu_batch_size 16 \
            --gradient_accumulation_steps 1 \
            --data_ratio ${data_ratio} \
            --num_runs 5 \
            --patience 3 \
            --eval_steps 30 \
            --num_turn 1 \
            --concatenate \
            --save_model \
            --early_stop_type metric
    done
done

7. Citation

If you have any questions regarding our paper or code, please feel free to open an issue or email Zhihan Zhou or Dejiao Zhang (zhihanzhou2020@u.northwestern.edu, dejiaoz@amazon.com).

If you use DSE in your work, please cite our paper:

@inproceedings{zhou-etal-2022-learning,
    title = "Learning Dialogue Representations from Consecutive Utterances",
    author = "Zhou, Zhihan and Zhang, Dejiao and Xiao, Wei and Dingwall, Nicholas and Ma, Xiaofei and Arnold, Andrew and Xiang, Bing",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.55",
    pages = "754--768",
    abstract = "Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin, for example, it achieves 13{\%} average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses on the benefits and limitations of our model.",
}

Owner

Name: Amazon Science
Login: amazon-science
Kind: organization

Website: https://amazon.science
Twitter: AmazonScience
Repositories: 80
Profile: https://github.com/amazon-science

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 7
Total pull requests: 4
Average time to close issues: 12 days
Average time to close pull requests: over 1 year
Total issue authors: 5
Total pull request authors: 2
Average comments per issue: 0.29
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

voorhs (1)
MatthewCYM (1)
LuoDQ (1)
yukyunglee (1)
Zeng-WH (1)