Science Score: 77.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: 1 DOI reference found in the README
- ✓ Academic publication links: links to zenodo.org
- ✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
- ○ Institutional organization owner: not detected
- ○ JOSS paper metadata: not detected
- ○ Scientific vocabulary similarity: low similarity (16.6%) to scientific vocabulary
Repository
Transfer Learning in Dialogue Benchmarking Toolkit
Basic Info
Statistics
- Stars: 14
- Watchers: 4
- Forks: 5
- Open Issues: 1
- Releases: 4
Metadata Files
README.md
The Transfer Learning in Dialogue Benchmarking Toolkit
This repo contains the data and code used in FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue, presented at EMNLP 2022.
The repo can also be used for many other research scenarios, including:
- Multi-Task Learning
- In-Context Task Transfer
- Continual Learning
- Generalizability of pre-training datasets and model architectures
This repo is also the starter code for the FETA Benchmark Challenge!
The FETA Benchmark Challenge is being hosted at the 5th Workshop on NLP For Conversational AI (co-located with ACL 2023).
The mission of the FETA challenge is to encourage the development and evaluation of new approaches to task-transfer with limited in-domain data.
Specifically, FETA focuses on the dialogue domain out of an interest in empowering human-machine communication through natural language.
For more details on the FETA challenge, see the FETA README.
Overview
TLiDB is a tool for benchmarking transfer learning methods in conversational AI. It can easily handle domain adaptation, task transfer, multitasking, continual learning, and other transfer learning settings. TLiDB maintains a unified JSON format for all datasets and tasks, which minimizes the new code needed to add a dataset or task. We highly encourage community contributions to the project.
The main features of TLiDB are:
- Dataset class to easily load a dataset for use across models
- Unified metrics to standardize evaluation across datasets
- Extensible Model and Algorithm classes to support fast prototyping
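To make the first two features concrete, here is an illustrative sketch of loading a dataset through a single interface. The entry point `get_dataset` and its arguments are assumptions for illustration, not the confirmed TLiDB API; see the TLiDB README for the real usage.

```python
# Illustrative only: `get_dataset` and its signature are assumptions,
# not the confirmed TLiDB API; see the TLiDB README for real usage.
from tlidb.TLiDB.datasets import get_dataset  # assumed entry point

# Every dataset/task pair is exposed through the same Dataset interface,
# so the same model code can consume any of them.
dataset = get_dataset(dataset="Friends", task="reading_comprehension",
                      split="train")
print(len(dataset))  # number of examples in the unified JSON format
```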
Installation
Requirements
- python>=3.6
- torch>=1.10
- nltk>=3.6.5
- scikit-learn>=1.0
- transformers>=4.11.3
- sentencepiece>=0.1.96
- bert-score==0.3.11
To use TLiDB, you can simply install via pip:
```bash
pip install tlidb
```
OR, you can install TLiDB from source. This is recommended if you want to edit or contribute:
```bash
git clone git@github.com:alon-albalak/TLiDB.git
cd TLiDB
pip install -e .
```
How to use TLiDB
TLiDB can be used from the command line or run as a Python script. If you have installed the package from source, we highly recommend running commands from inside the tlidb/examples/ directory.
Quick Start
For a very simple setup, you can use the following commands.
- From the command line:

```bash
tlidb --source_datasets Friends --source_tasks emory_emotion_recognition --target_datasets Friends --target_tasks reading_comprehension --do_train --do_finetune --do_eval --eval_best --model_config bert --few_shot_percent 0.1
```
- As a Python script (only if installed from source):

```bash
cd examples
python3 run_experiment.py --source_datasets Friends --source_tasks emory_emotion_recognition --target_datasets Friends --target_tasks reading_comprehension --do_train --do_finetune --do_eval --eval_best --model_config bert --few_shot_percent 0.1
```
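As the flag names suggest, this quick-start run trains a model on the source task (emory_emotion_recognition), fine-tunes it on the target task (reading_comprehension) using 10% of the target data (--few_shot_percent 0.1), and evaluates the best checkpoint of the BERT configuration (--model_config bert with --do_eval --eval_best).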
Detailed Usage
TLiDB has two main folders of interest:
- tlidb/examples
- tlidb/TLiDB
tlidb/examples/ is recommended if you would like to use our training scripts. It contains sample code for models and learning algorithms, along with sample training scripts.
For detailed examples, see the Examples README.
tlidb/TLiDB/ holds the code related to data (datasets, dataloaders, metrics, etc.). If you are interested in using our datasets and metrics but would like to train models with your own training scripts, take a look at the example usage in the TLiDB README and the sketch below.
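As a rough sketch of that workflow, the snippet below drives a custom PyTorch training loop with a TLiDB dataset. The entry points `get_dataset` and `get_metric` and the batch keys are assumptions for illustration only; the real interfaces live under tlidb/TLiDB/datasets and tlidb/TLiDB/metrics.

```python
# A minimal sketch, assuming hypothetical entry points `get_dataset` and
# `get_metric` and an assumed batch schema; check the TLiDB README for
# the actual interfaces before adapting this.
import torch
from tlidb.TLiDB.datasets import get_dataset  # assumed name
from tlidb.TLiDB.metrics import get_metric    # assumed name

class TinyClassifier(torch.nn.Module):
    """Stand-in model; replace with your own architecture."""
    def __init__(self, vocab_size=30522, n_labels=7):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(vocab_size, 128)
        self.head = torch.nn.Linear(128, n_labels)

    def forward(self, token_ids):
        return self.head(self.emb(token_ids))

train_data = get_dataset(dataset="Friends",
                         task="emory_emotion_recognition",
                         split="train")
model = TinyClassifier()
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

for batch in torch.utils.data.DataLoader(train_data, batch_size=16):
    logits = model(batch["token_ids"])  # assumed batch key
    loss = torch.nn.functional.cross_entropy(logits, batch["label"])
    loss.backward()
    optim.step()
    optim.zero_grad()

# Evaluate with TLiDB's unified metrics (assumed helper).
metric = get_metric("emory_emotion_recognition")
```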
Folder descriptions:
- tlidb/TLiDB is the folder holding the code for data handling
- tlidb/TLiDB/dataloaders contains code for dataloaders
- tlidb/TLiDB/data is the destination folder for downloaded datasets (when installed from source; otherwise data is stored in .cache/tlidb/data)
- tlidb/TLiDB/datasets contains code for dataset loading and preprocessing
- tlidb/TLiDB/metrics contains code for loss and evaluation metrics
- tlidb/TLiDB/utils contains utility files
- tlidb/examples contains sample code for training and evaluating models
- tlidb/examples/algorithms contains code which trains and evaluates a model
- tlidb/examples/models contains code to define a model
- tlidb/examples/configs contains code for model configurations
- /dataset_preprocessing is kept for reproducibility purposes. It contains the scripts used to preprocess the TLiDB datasets from their original form into the standardized TLiDB format
Comments, Questions, and Feedback
If you find issues, please open an issue here.
If you have dataset or model requests, please add a new discussion here.
We encourage outside contributions to the project!
Citation
If you use the FETA datasets in your work, please cite the FETA paper:
```
@inproceedings{albalak-etal-2022-feta,
    title = "{FETA}: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue",
    author = "Albalak, Alon and
      Tuan, Yi-Lin and
      Jandaghi, Pegah and
      Pryor, Connor and
      Yoffe, Luke and
      Ramachandran, Deepak and
      Getoor, Lise and
      Pujara, Jay and
      Wang, William Yang",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.751",
    pages = "10936--10953",
    abstract = "Task transfer, transferring knowledge contained in related tasks, holds the promise of reducing the quantity of labeled data required to fine-tune language models. Dialogue understanding encompasses many diverse tasks, yet task transfer has not been thoroughly studied in conversational AI. This work explores conversational task transfer by introducing FETA: a benchmark for FEw-sample TAsk transfer in open-domain dialogue. FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer; task transfer without domain adaptation. We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs and create a baseline for future work. We run experiments in the single- and multi-source settings and report valuable findings, e.g., most performance trends are model-specific, and span extraction and multiple-choice tasks benefit the most from task transfer. In addition to task transfer, FETA can be a valuable resource for future research into the efficiency and generalizability of pre-training datasets and model architectures, as well as for learning settings such as continual and multitask learning.",
}
```
If you use TLiDB in your work, please cite the repository:
```
@software{Albalak_The_Transfer_Learning_2022,
    author = {Albalak, Alon},
    doi = {10.5281/zenodo.6374360},
    month = {3},
    title = {{The Transfer Learning in Dialogue Benchmarking Toolkit}},
    url = {https://github.com/alon-albalak/TLiDB},
    version = {1.0.0},
    year = {2022}
}
```
Acknowledgements
The design of TLiDB was based on the WILDS project and the Open Graph Benchmark.
Owner
- Name: Alon Albalak
- Login: alon-albalak
- Kind: user
- Location: Santa Barbara, CA
- Website: https://alon-albalak.github.io/
- Twitter: AlbalakAlon
- Repositories: 5
- Profile: https://github.com/alon-albalak
PhD student, Natural Language Processing and Deep Learning
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Albalak"
    given-names: "Alon"
    orcid: "https://orcid.org/0000-0003-0809-1704"
title: "The Transfer Learning in Dialogue Benchmarking Toolkit"
version: 1.0.3
doi: 10.5281/zenodo.6534175
date-released: 2022-5-09
url: "https://github.com/alon-albalak/TLiDB"
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 301
- Total Committers: 5
- Avg Commits per committer: 60.2
- Development Distribution Score (DDS): 0.296
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alon Albalak | a****k@g****m | 212 |
| Alon Albalak | a****k@u****u | 78 |
| Gyuwan Kim | k****h@g****m | 5 |
| pascalson | p****n@g****m | 3 |
| Pegah Jandaghimeibodi | p****i@D****l | 3 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 2
- Average time to close issues: 19 days
- Average time to close pull requests: less than a minute
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shubham8garg (1)
Pull Request Authors
- alon-albalak (2)
Packages
- Total packages: 1
- Total downloads: 5 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 8
- Total maintainers: 1
pypi.org: tlidb
The Transfer Learning in Dialogue Baselines Toolkit
- Homepage: https://github.com/alon-albalak/TLiDB
- Documentation: https://tlidb.readthedocs.io/
- License: MIT
- Latest release: 1.0.3 (published almost 4 years ago)
Dependencies
- bert-score ==0.3.11
- nltk ==3.6.5
- scikit-learn ==1.0
- sentencepiece ==0.1.96
- torch >=1.10
- transformers ==4.11.3