https://github.com/amazon-science/dse
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 604 KB
Statistics
- Stars: 43
- Watchers: 2
- Forks: 5
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
DSE: Learning Dialogue Representations from Consecutive Utterances
This repository contains the code for our paper
Learning Dialogue Representations from Consecutive Utterances" (NAACL 2022).
Contents
- 1. Introduction
- 2. Pre-trained Models
- 3. Setup Environment
- 4. Quick Start
- 5. Train DSE
- 6. Evaluate DSE
- 7. Citation
1. Introduction
DSE is a pre-trained conversational language model that can serve as a drop-in replacement of text encoder (e.g., BERT) for various dialogue systems. It learns dialogue representations by taking consecutive utterances within the same dialogues as positive pairs for contrastive learning. Please refer to our paper for more details.

2. Pre-trained Models
The pre-trained models will be available at HuggingFace ModulHub very soon.
3. Setup environment
# create and activate virtual python environment
conda create -n dse python=3.7
conda activate dse
# install required packages
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -y -c pytorch # change the version of cudatoolkit based on your cuda version
python3 -m pip install -r requirements.txt
4. Quick Start
Our model is easy to use with the transformers package.
```python from transformers import AutoModel, AutoTokenizer from torch.nn import CosineSimilarity
Load the model and tokenizer
model = AutoModel.frompretrained("aws-ai/dse-bert-base") tokenizer = AutoTokenizer.frompretrained("aws-ai/dse-bert-base")
Define the sentences of interests
texts = ["When will I get my card?", "Is there a way to know when my card will arrive?"]
Define a function that calculate text embedding for a list of texts
def getaverageembedding(texts): inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
# Calculate the sentence embeddings by averaging the embeddings of non-padding words
with torch.no_grad():
embeddings = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
attention_mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = torch.sum(embeddings[0]*attention_mask, dim=1) / torch.sum(attention_mask, dim=1)
return embeddings
cosinesim = nn.CosineSimilarity(dim=1) embeddings = getaverage_embedding(texts)
print("Similarity of the two sentences is: ", cosine_sim(embeddings[0], embeddings[1]).item()) ```
Model List
We released the following five models. You can import these models by using HuggingFace's Transformers.
| Model |
|:-----------------------------------------------------------------------------------------------------------------|
| aws-ai/dse-distilbert-base |
| aws-ai/dse-bert-base |
| aws-ai/dse-bert-large |
| aws-ai/dse-roberta-base |
| aws-ai/dse-roberta-large |
5. Train DSE
Our code base expects two type of positive pairs for pre-training: existing positive pairs (e.g., consecutive utterances) or "Dropout or the same sentence". To use existing positive pairs for pre_training, simply use a tsv or csv file where each row contains one positive pair. To use Dropout, simply use a txt file where each row contains a single sentence/paragraph.
For example, given a train.csv file, DSE can be trained as:
``` export DATADIR=/PATH/To/Training/Data/Folder export FILENAME="train.csv" export OUTPUTDIR=/PATH/To/Output/Folder export MODELTYPE=bertbase # choose from [bertbase, bertlarge, robertabase, robertalarge, distilbertbase] cd pretrain
python main.py \ --resdir ${OUTPUTDIR} \ --datapath ${DATADIR} \ --dataname todsinglepos3.tsv \ --mode contrastive \ --bert ${MODELTYPE} \ --contrasttype HardNeg \ --lr 3e-06 \ --lrscale 100 \ --batchsize 1024 \ --maxlength 32 \ --temperature 0.05 \ --epochs 15 \ --maxiter 10000000 \ --loggingstep 400 \ --featdim 128 \ --numturn 1 \ --seed 1 \ --savemodelevery_epoch
```
5.1 Reproduce Our Training
To reproduce our experiments, please generate training data as follows. We sincerely appreciate TOD-BERT authors for preparing these great scripts.
``` export WORKPLACE="/home" export OUTPUTDIR="/home/data/dse_training.tsv"
1. Clone the repos
cd $WORK_PLACE git clone https://github.com/jasonwu0731/ToD-BERT git clone https://github.com/amazon-research/dse
2. Downloads the raw datasets from https://drive.google.com/file/d/1EnGX0UF4KW6rVBKMF3fL-9Q2ZyFKNOIy/view?usp=sharing
and put the "TODBERTdialogdatasets.zip" file at current directory
3. Unzip the downloaded file
unzip TODBERTdialogdatasets.zip -d $WORK_SPACE
4. Modify "ToD-BERT/mytodpretraining.py" to acquire the data processed by TOD-BERT's codebase
4.1 Change line 745 to default='/PATH/TO/dialogdatasets' (e.g., '/home/dialogdatasets')
4.2 Add the following line after line 951
with open("pre_train.pkl", "wb") as f:
pickle.dump(datasets, f)
raise ValueError("Done")
4.3 Run the following script, once it stops, a file named "pre_train.pkl" should appear in this folder
cd $WORKPLACE/ToD-BERT ./runtodlmpretraining.sh 0 bert bert-base-uncased save/pretrain/ToD-BERT-MLM --onlylastturn
5. Run our script to generate positive pairs from TOD-BERT's training data
cd $WORKPLACE/dse/data python processpretrain.py --datadir $WORKPLACE/ToD-BERT/pretrain.pkl --outputdir $OUTPUT_DIR
A tsv file should appear at OUTPUT_DIR, which can be directly used for model pre-training
```
6. Evaluate DSE
We provide codes for 5 downstream applications:
Intent Classification: we formulate it as a single-sentence multi-class classification problem and evaluate it with accuracy.
Out-of-scope Detection: we formulate it as a single-sentence multi-class classification problem and evaluate it with accuracy, in-domain accuracy, out-of-scope accuracy, and out-of-scope recall
Sentence-Level Response Selection: we formulate it as a ranking problem. We respectively calculate the embeddings for response and query We always set the batch size as 100 during evaluation and calculate the recall@1/3/5/10. Perform on AmazonQA dataset.
Dialogue-Level Response Selection: Similar as above. But input to this task is dialogue history (concatenation of multiple dialogue sentences).
Dialogue Action Prediction: we formulate it as a multi-sentence multi-label classification problem. We use the concatenation of dialogue history as input.
The evaluations are performed in two ways:
Fine-tune: add task-specific layer(s) to the pre-trained model and optimize the entire model by training the model on labeled data. Support task 1,3,4,5.
Similarity-based methods: do not perform any model training. Make predictions singly based on the text embedding given by the pre-training model. When performing few-shot classification, we use prototypical network. Support task 1,2,3,4
6.1 Data Generation
To perform evaluations in our paper, please follow the instruction to generate the corresponding evaluation data.
6.1.1 Intent Classification and Out-Of-Scope Detection
We use 4 public datasets:
1. Clinc 150
2. Banking 77
3. SNIPS
4. HWU64
``` export OUTPUTDIR="/home/data/dseevaluate"
1. Download raw data for snips and hwu64 (we direct acquire clinc150 and banking 77 from a package named "datasets")
wget https://raw.githubusercontent.com/clinc/nlu-datasets/master/nludatasets/intentclassification/snipstrain.json wget https://raw.githubusercontent.com/clinc/nlu-datasets/master/nludatasets/intentclassification/snipstest.json wget https://raw.githubusercontent.com/xliuhw/NLU-Evaluation-Data/master/AnnotatedData/NLU-Data-Home-Domain-Annotated-All.csv
2. Run our scripts to generate data from the raw data
python processevaluate.py --outputdir $OUTPUTDIR --task clinc150 python processevaluate.py --outputdir $OUTPUTDIR --task bank77 python processevaluate.py --outputdir $OUTPUTDIR --task snips python processevaluate.py --outputdir $OUTPUTDIR --task hwu64
```
6.1.2 Response Selection
We use 2 public datasets:
1. AmazonQA
2. Ubuntu-DSTC7
We use ParlAI to get the raw data, and provide scripts to generate data for our evaluation.
``` export WORKSPACE="/home" export OUTPUTDIR="/home/data/dse_evaluate"
1. Install ParlAi Package
cd $WORK_SPACE git clone https://github.com/facebookresearch/ParlAI.git cd ParlAI; python setup.py develop
2. Use ParlAi to generate raw data
cd $WORKSPACE/dse/data parlai converttoparlai --task amazonqa --datatype train --outfile amazonqa.txt parlai converttoparlai --task dstc7 --datatype test --outfile ubuntutest.txt parlai converttoparlai --task dstc7 --datatype valid --outfile ubuntuvalid.txt
3. Use our scripts to generate evaluation data
python processevaluate.py --outputdir $OUTPUTDIR --task amazonqa python processevaluate.py --outputdir $OUTPUTDIR --task ubuntu
```
For AmazonQA, we observe large performance fluctuation on the absolute evaluation metric with different random seed and different ordering of the QA pairs, yet the improvement of DSE over the baselines is consistent. For Ubuntu-DSTC7, since there is little randomness, we observe consistent performance as reported in the paper.
6.2 Similarity-based Evaluation
export DATA_DIR=/Path/To/Evaluation/Data
export MODEL_DIR=/Path/To/Pre-trained/Model
export OUTPUT_DIR=/Path/To/Output/Folder
cd evaluate
This part provides codes for using similarity-based methods to perform evaluation for any model that the Huggingface-transformers provides.
6.2.1 Intent Classification
python run_similarity.py \
--model_dir ${MODEL_DIR} \
--data_root_dir ${DATA_DIR} \
--output_dir ${OUTPUT_DIR} \
--TASK intent \
--num_runs 10 \
--max_seq_length 64
6.2.2 Out-of-scope Detection
python run_similarity.py \
--model_dir ${MODEL_DIR} \
--data_root_dir ${DATA_DIR} \
--output_dir ${OUTPUT_DIR} \
--TASK oos \
--num_runs 10 \
--max_seq_length 64
6.2.3 Response Selection on AmazonQA
python run_similarity.py \
--model_dir ${MODEL_DIR} \
--data_root_dir ${DATA_DIR} \
--output_dir ${OUTPUT_DIR} \
--TASK rs_amazon \
--max_seq_length 128
6.2.4 Response Selection on Ubuntu-DSTC7
python run_similarity.py \
--model_dir ${MODEL_DIR} \
--data_root_dir ${DATA_DIR} \
--output_dir ${OUTPUT_DIR} \
--TASK rs_ubuntu \
--max_seq_length 128
6.3 Fine-tune-based Evaluation
This part provides code for fine-tuning BERT model. We assume 2 GPUs for finetuning. If more of less number of GPUs are used, please adjust the per_gpu_batch_size accordingly.
export DATA_DIR=/Path/To/Evaluation/Data
export MODEL_DIR=/Path/To/Pre-trained/Model
export OUTPUT_DIR=/Path/To/Output/Folder
cd evaluate
6.3.1 Intent Classification
for dataset in bank77 clinc150 hwu64 snips
do
for data_ratio in 1 5
do
python run_finetune.py \
--data_dir ${DATA_DIR}/intent/${dataset} \
--model_type ${MODEL_DIR} \
--TASK seq \
--output_dir ${OUTPUT_DIR}/intent_${dataset}_${data_ratio} \
--bert_lr 3e-5 \
--epoch 50 \
--max_seq_length 64 \
--per_gpu_batch_size 8 \
--gradient_accumulation_steps 1 \
--data_ratio data_ratio \
--num_runs 10 \
--patience 5 \
--classification_pooling average \
--early_stop_type metric
done
done
6.3.2 Response Selection
for data_ratio in 500 1000
do
python run_finetune.py \
--data_dir ${DATA_DIR}/rs/amazonqa \
--model_type ${MODEL_DIR} \
--TASK rs \
--output_dir ${OUTPUT_DIR}/rs_amazonqa_${data_ratio} \
--bert_lr 3e-5 \
--epoch 50 \
--max_seq_length 128 \
--per_gpu_batch_size 50 \
--gradient_accumulation_steps 1 \
--data_ratio ${data_ratio} \
--num_runs 5 \
--patience 3 \
--eval_steps 50
done
6.3.3 Dialogue Action Prediction
for dataset in dstc2 sim_joint
do
for data_ratio in 10 20
do
python run_finetune.py \
--data_dir ${DATA_DIR}/da/${dataset} \
--model_type ${MODEL_DIR} \
--TASK da \
--output_dir ${OUTPUT_DIR}/da_concat_${dataset}_${data_ratio} \
--bert_lr 5e-5 \
--epoch 100 \
--max_seq_length 32 \
--per_gpu_batch_size 16 \
--gradient_accumulation_steps 1 \
--data_ratio ${data_ratio} \
--num_runs 5 \
--patience 3 \
--eval_steps 30 \
--num_turn 1 \
--concatenate \
--save_model \
--early_stop_type metric
done
done
7. Citation
If you have any question regarding our paper or codes, please feel free to start an issue or email Zhihan Zhou or Dejiao Zhang (zhihanzhou2020@u.northwestern.edu, dejiaoz@amazon.com).
If you use DSE in your work, please cite our paper:
@inproceedings{zhou-etal-2022-learning,
title = "Learning Dialogue Representations from Consecutive Utterances",
author = "Zhou, Zhihan and
Zhang, Dejiao and
Xiao, Wei and
Dingwall, Nicholas and
Ma, Xiaofei and
Arnold, Andrew and
Xiang, Bing",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-main.55",
pages = "754--768",
abstract = "Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin, for example, it achieves 13{\%} average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses on the benefits and limitations of our model.",
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 7
- Total pull requests: 4
- Average time to close issues: 12 days
- Average time to close pull requests: over 1 year
- Total issue authors: 5
- Total pull request authors: 2
- Average comments per issue: 0.29
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- voorhs (1)
- MatthewCYM (1)
- LuoDQ (1)
- yukyunglee (1)
- Zeng-WH (1)
Pull Request Authors
- jabalazs (4)
- dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- datasets *
- numpy *
- pytorch-nlp ==0.5.0
- sentence-transformers ==2.0.0
- sklearn *
- tensorboardX ==2.3
- transformers ==4.8.1