muld

The Multitask Long Document Benchmark

https://github.com/ghomashudson/muld

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.0%) to scientific vocabulary

Keywords

benchmark long-texts nlp
Last synced: 6 months ago

Repository

The Multitask Long Document Benchmark

Basic Info
  • Host: GitHub
  • Owner: ghomasHudson
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 94.7 KB
Statistics
  • Stars: 41
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Topics
benchmark long-texts nlp
Created about 4 years ago · Last pushed over 3 years ago
Metadata Files
  • Readme
  • Citation

README.md

MuLD: The Multitask Long Document Benchmark

MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks whose inputs consist of at least 10,000 words. The benchmark covers a wide variety of task types, including translation, summarization, question answering, and classification. Additionally, output lengths range from a single-word classification label all the way up to outputs longer than the input text.

[Image: table summarizing the MuLD tasks]

This repo contains official code for the paper MuLD: The Multitask Long Document Benchmark.

Quickstart

The easiest method is to use the Huggingface Datasets library:

```python
import datasets

ds = datasets.load_dataset("ghomasHudson/muld", "NarrativeQA")
ds = datasets.load_dataset("ghomasHudson/muld", "HotpotQA")
ds = datasets.load_dataset("ghomasHudson/muld", "Character Archetype Classification")
ds = datasets.load_dataset("ghomasHudson/muld", "OpenSubtitles")
ds = datasets.load_dataset("ghomasHudson/muld", "AO3 Style Change Detection")
ds = datasets.load_dataset("ghomasHudson/muld", "VLSP")
```

Or by cloning this repo:

```python
import datasets

ds = datasets.load_dataset("./muld.py", "NarrativeQA")
...
```
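Once a split is loaded, MuLD's defining property (every input is at least 10,000 words long) can be sanity-checked with a small filter. This is a minimal sketch only; the `input` field name is an assumption about the dataset schema, so verify it against the loaded features before relying on it:

```python
# Sketch: checking MuLD's minimum-length criterion on loaded examples.
# NOTE: the "input" field name is an assumption; adjust to the real schema.
def is_long_document(example: dict, min_words: int = 10_000) -> bool:
    """True if the example's input meets MuLD's minimum word count
    (approximated here as whitespace-separated tokens)."""
    return len(example["input"].split()) >= min_words

# Toy stand-ins for real examples (real MuLD inputs are far longer):
toy = [
    {"input": "word " * 12_000},
    {"input": "short document"},
]
print([is_long_document(ex) for ex in toy])  # [True, False]
```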

Manual Download

If you prefer to download the data files yourself:
  • NarrativeQA: Train, Val, Test (Mirror: Train, Val, Test)
  • HotpotQA: Train, Val (Mirror: Train, Val)
  • Character Archetype Classification: Train, Val, Test (Mirror: Train, Val, Test)
  • OpenSubtitles: Train, Test (Mirror: Train, Test)
  • Style Change: Train, Val, Test (Mirror: Train, Val, Test)
  • VLSP: Test (Mirror: Test)

Citation

If you use our benchmark please cite the paper:

```bibtex
@InProceedings{hudson-almoubayed:2022:LREC,
  author    = {Hudson, George and Al Moubayed, Noura},
  title     = {MuLD: The Multitask Long Document Benchmark},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {3675--3685},
  url       = {https://aclanthology.org/2022.lrec-1.392}
}
```

Additionally, please cite the datasets we used (particularly NarrativeQA, HotpotQA, and OpenSubtitles, where we directly use their data with limited filtering).

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
|---|---|
| name | MuLD |
| alternateName | Multitask Long Document Benchmark |
| url | |
| description | MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks whose inputs consist of at least 10,000 words. The benchmark covers a wide variety of task types, including translation, summarization, question answering, and classification. Additionally, output lengths range from a single-word classification label all the way up to outputs longer than the input text. |
| citation | https://arxiv.org/abs/2202.07362 |
| creator | see nested table below |

| property | value |
|---|---|
| name | Thomas Hudson |
| sameAs | https://orcid.org/0000-0003-3562-3593 |
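The table above corresponds to schema.org `Dataset` markup of the kind Google Dataset Search indexes. A hedged sketch of the equivalent JSON-LD, built in Python (this is an illustration of the structure, not the README's exact markup):

```python
import json

# Sketch of the schema.org JSON-LD equivalent of the metadata table above.
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "MuLD",
    "alternateName": "Multitask Long Document Benchmark",
    "citation": "https://arxiv.org/abs/2202.07362",
    "creator": {
        "@type": "Person",
        "name": "Thomas Hudson",
        "sameAs": "https://orcid.org/0000-0003-3562-3593",
    },
}
print(json.dumps(metadata, indent=2))
```

In a web page, this object would typically be embedded in a `<script type="application/ld+json">` tag so crawlers can pick it up.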

Owner

  • Name: Thomas Hudson
  • Login: ghomasHudson
  • Kind: user
  • Company: Durham University

Research Associate at Durham University, researching NLP for veterinary medicine.

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'MuLD: The Multitask Long Document Benchmark'
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - given-names: G Thomas
    family-names: Hudson
    email: g.t.hudson@durham.ac.uk
    affiliation: Durham University
    orcid: 'https://orcid.org/0000-0003-3562-3593'
  - given-names: Noura
    name-particle: Al
    family-names: Moubayed
    orcid: 'https://orcid.org/0000-0001-8942-355X'
    affiliation: Durham University
identifiers:
  - type: url
    value: 'https://aclanthology.org/2022.lrec-1.392'
abstract: >-
  The impressive progress in NLP techniques has been
  driven by the development of multi-task benchmarks
  such as GLUE and SuperGLUE. While these benchmarks
  focus on tasks for one or two input sentences,
  there has been exciting work in designing efficient
  techniques for processing much longer inputs. In
  this paper, we present MuLD: a new long document
  benchmark consisting of only documents over 10,000
  tokens. By modifying existing NLP tasks, we create
  a diverse benchmark which requires models to
  successfully model long-term dependencies in the
  text. We evaluate how existing models perform, and
  find that our benchmark is much more challenging
  than their ‘short document’ equivalents.
  Furthermore, by evaluating both regular and
  efficient transformers, we show that models with
  increased context length are better able to solve
  the tasks presented, suggesting that future
  improvements in these models are vital for solving
  similar long document problems. We release the data
  and code for baselines to encourage further
  research on efficient NLP models.
keywords:
  - Long Documents
  - Benchmark
  - Multitask learning
  - NLP
license: CC-BY-NC-4.0

preferred-citation:
  authors:
    - given-names: G Thomas
      family-names: Hudson
      email: g.t.hudson@durham.ac.uk
      affiliation: Durham University
      orcid: 'https://orcid.org/0000-0003-3562-3593'
    - given-names: Noura
      name-particle: Al
      family-names: Moubayed
      orcid: 'https://orcid.org/0000-0001-8942-355X'
      affiliation: Durham University
  title: "MuLD: The Multitask Long Document Benchmark"
  type: conference-paper
  collection-title: Proceedings of the Language Resources and Evaluation Conference
  conference:
    name: Language Resources and Evaluation Conference
    date-start: 2022-06-21
    date-end: 2022-06-23
    address: Marseille, France
  location: 
    name: Marseille, France
  start: 3675
  end: 3685
  publisher:
    name: European Language Resources Association
  url: https://aclanthology.org/2022.lrec-1.392
  abstract: >-
    The impressive progress in NLP techniques has been
    driven by the development of multi-task benchmarks
    such as GLUE and SuperGLUE. While these benchmarks
    focus on tasks for one or two input sentences,
    there has been exciting work in designing efficient
    techniques for processing much longer inputs. In
    this paper, we present MuLD: a new long document
    benchmark consisting of only documents over 10,000
    tokens. By modifying existing NLP tasks, we create
    a diverse benchmark which requires models to
    successfully model long-term dependencies in the
    text. We evaluate how existing models perform, and
    find that our benchmark is much more challenging
    than their ‘short document’ equivalents.
    Furthermore, by evaluating both regular and
    efficient transformers, we show that models with
    increased context length are better able to solve
    the tasks presented, suggesting that future
    improvements in these models are vital for solving
    similar long document problems. We release the data
    and code for baselines to encourage further
    research on efficient NLP models.

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 38
  • Total Committers: 1
  • Avg Commits per committer: 38.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Thomas Hudson (0****n@g****m): 38 commits

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 22 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • floschne (2)
  • yulonglin (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels