tasknet

Easy modernBERT fine-tuning and multi-task learning

https://github.com/sileod/tasknet

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    4 of 5 committers (80.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

autotask autotrain bert dataset easy extreme-multi-task fine-tuning huggingface-transformers jiant-alternative modernbert mtl multi-task multi-task-trainer multitask nlp task-embeddings tasks templates trainer
Last synced: 6 months ago

Repository

Easy modernBERT fine-tuning and multi-task learning

Basic Info
  • Host: GitHub
  • Owner: sileod
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 339 KB
Statistics
  • Stars: 59
  • Watchers: 2
  • Forks: 6
  • Open Issues: 4
  • Releases: 59
Topics
autotask autotrain bert dataset easy extreme-multi-task fine-tuning huggingface-transformers jiant-alternative modernbert mtl multi-task multi-task-trainer multitask nlp task-embeddings tasks templates trainer
Created over 3 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

tasknet: simple modernBERT fine-tuning, with multi-task support

tasknet is an interface between Huggingface datasets and the Huggingface transformers Trainer.

Tasknet should work with all recent versions of Transformers.

Installation and example

pip install tasknet

Each task template has fields that should be matched with specific dataset columns. Classification has two text fields s1, s2, and a label y. Pass a dataset to a template, and fill in the mapping between the template fields and the dataset columns to instantiate a task.

```py
import tasknet as tn
from datasets import load_dataset

rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label")
# s2 is optional for classification, used to represent text pairs
# See AutoTask for shorter code

class hparams:
    model_name = 'tasksource/ModernBERT-base-nli'  # better performance for most tasks
    learning_rate = 3e-5  # see hf.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments

model, trainer = tn.Model_and_Trainer(tasks=[rte], hparams=hparams)
trainer.train(), trainer.evaluate()
p = trainer.pipeline()
p([{'text': 'premise here', 'text_pair': 'hypothesis here'}])  # HuggingFace pipeline for inference
```

tasknet is multitask by design: `model.task_models_list` contains one model per task, with a shared encoder.

Task templates

tasknet relies on task templates to avoid boilerplate code. The task templates correspond to Transformers AutoClasses:

- SequenceClassification(s1, s2, y)
- TokenClassification(tokens, labels) (tokens and labels are lists of words and their assigned labels)
- MultipleChoice(s1, choices, y) (s1 is a prompt/question, choices is a list of texts, y is the index of the correct choice)
- Seq2SeqLM (experimental support)
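For instance, a token-level task can be instantiated the same way, by mapping the template fields to dataset columns. This is only a minimal sketch: the keyword arguments follow the TokenClassification(tokens, labels) signature listed above and are assumptions, and the conll2003 column names are simply the standard ones for that dataset.

```py
import tasknet as tn
from datasets import load_dataset

# Sketch only: exact keyword arguments accepted by tasknet templates are an assumption here.
ner = tn.TokenClassification(
    dataset=load_dataset("conll2003"),
    tokens="tokens",    # column holding the list of words
    labels="ner_tags")  # column holding the per-word label ids
```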

The task templates follow the same interface. They implement preprocess_function, a data collator and compute_metrics. Look at tasks.py and use existing templates as a starting point to implement a custom task template.
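As a purely illustrative sketch (not the actual tasknet API; the real hooks live in tasks.py), a custom template could start from an existing one and override compute_metrics, assuming templates can be subclassed and that compute_metrics receives HF-Trainer-style (predictions, labels) pairs:

```py
import numpy as np
import tasknet as tn

# Hypothetical example: reusing an existing template and swapping the metric.
# Both the subclassing pattern and the compute_metrics signature are assumptions;
# check tasks.py for the real interface.
class RegressionTask(tn.Classification):
    def compute_metrics(self, eval_pred):
        predictions, labels = eval_pred
        mse = float(((np.squeeze(predictions) - np.asarray(labels)) ** 2).mean())
        return {"mse": mse}
```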

AutoTask

You can also leverage tasksource with tn.AutoTask and have one-line access to 600+ datasets, see implemented tasks.

```py
rte = tn.AutoTask("glue/rte", nrows=5000)
```

AutoTask guesses a template based on the dataset structure. It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).
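Since tasknet is multitask by design, several AutoTask tasks can be passed to the same trainer so that they share one encoder. A minimal sketch, reusing the hparams class and trainer entry point from the example above; the second dataset id ("glue/mrpc") is only an illustrative choice:

```py
import tasknet as tn

# Sketch: joint fine-tuning on two tasks with a shared encoder.
tasks = [
    tn.AutoTask("glue/rte", nrows=5000),
    tn.AutoTask("glue/mrpc", nrows=5000),
]
model, trainer = tn.Model_and_Trainer(tasks=tasks, hparams=hparams)
trainer.train()
```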

Balancing dataset sizes

```py
tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)
```

You can balance multiple datasets with nrows and oversampling. nrows is the maximal number of examples taken from a dataset. If a dataset has fewer than nrows examples, it is oversampled at most oversampling times.
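For example, with nrows=5000 and oversampling=2, a dataset of 2,000 training examples contributes at most 4,000 examples (2,000 × 2), while a dataset of 50,000 examples is capped at 5,000.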

Colab examples

Minimal-ish example:

https://colab.research.google.com/drive/15Xf4Bgs3itUmok7XlAK6EEquNbvjD9BD?usp=sharing

More complex example, where tasknet was scaled to 600 tasks:

https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

Credit

This code reuses parts of the transformers library examples and some code from multitask-learning-transformers.

Contact

You can request features on GitHub or reach me at damien.sileo@inria.fr

```bib
@misc{sileod22-tasknet,
  author = {Sileo, Damien},
  doi = {10.5281/zenodo.561225781},
  month = {11},
  title = {{tasknet, multitask interface between Trainer and datasets}},
  url = {https://github.com/sileod/tasknet},
  version = {1.5.0},
  year = {2022}
}
```

Owner

  • Login: sileod
  • Kind: user

Damien Sileo

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sileo
    given-names: Damien
    orcid: https://orcid.org/0000-0002-3274-291X
title: "tasknet"
doi: 10.5281/zenodo.561225781

GitHub Events

Total
  • Create event: 2
  • Issues event: 1
  • Release event: 2
  • Watch event: 15
  • Issue comment event: 2
  • Push event: 10
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Create event: 2
  • Issues event: 1
  • Release event: 2
  • Watch event: 15
  • Issue comment event: 2
  • Push event: 10
  • Pull request event: 1
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 94
  • Total Committers: 5
  • Avg Commits per committer: 18.8
  • Development Distribution Score (DDS): 0.138
Top Committers (Name · Email · Commits)
  • sileod · d****o@g****m · 81
  • Damien Sileo · d****o@m****r · 8
  • damien sileo · d****o@m****r · 3
  • root · r****t@m****r · 1
  • Damien Sileo · d****o@m****r · 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 8
  • Total pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 6
  • Total pull request authors: 2
  • Average comments per issue: 2.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 13 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tboquet (3)
  • netique (1)
  • Oxi84 (1)
  • deewhy26 (1)
  • niedakh (1)
  • thirsima (1)
Pull Request Authors
  • GabrielLoiseau (2)
  • tboquet (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 523 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 58
  • Total maintainers: 1
pypi.org: tasknet

Seamless integration of tasks with huggingface models

  • Versions: 58
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 523 Last month
Rankings
Downloads: 6.0%
Dependent packages count: 6.6%
Average: 18.5%
Stargazers count: 18.6%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/release.yml actions
  • actions/checkout v3 composite
pyproject.toml pypi