awesome-finetuning
A curated list of resources on fine-tuning language models.
Repository
Basic Info
- Host: GitHub
- Owner: mmarius
- License: mit
- Default Branch: main
- Size: 60.5 KB
Statistics
- Stars: 25
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Awesome Fine-tuning 
A curated list of resources on fine-tuning language models, inspired by awesome-implicit-representations.
Disclaimer
This list does not aim to be exhaustive. Feel free to open a pull request to suggest papers that should be added.
Disclosure. I'm an author of the following papers:
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers
Papers
Fine-tuning before transformers
Semi-supervised Sequence Learning Dai & Le (2015)
How Transferable are Neural Networks in NLP Applications? Mou et al. (2016)
Improving Neural Machine Translation Models with Monolingual Data Sennrich et al. (2016)
Question Answering through Transfer Learning from Large Fine-grained Supervision Data Min et al. (2017)
Universal Language Model Fine-tuning for Text Classification Howard & Ruder (2018)
An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models Chronopoulou et al. (2019)
...
Fine-tuning transformers
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Devlin et al. (2019)
Better Fine-Tuning by Reducing Representational Collapse Aghajanyan et al. (2020)
FreeLB: Enhanced Adversarial Training for Natural Language Understanding Zhu et al. (2020)
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization Jiang et al. (2020)
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning Gunel et al. (2021)
...
Intermediate task fine-tuning
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks Phang et al. (2018)
Transfer Fine-Tuning: A BERT Case Study Arase & Tsujii (2019)
Learning and Evaluating General Linguistic Intelligence Yogatama et al. (2019)
Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? Pruksachatkun et al. (2020)
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too Phang et al. (2020)
What to Pre-Train on? Efficient Intermediate Task Selection Poth et al. (2021)
Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation Glavaš & Vulić (2021)
Muppet: Massive Multi-task Representations with Pre-Finetuning Aghajanyan et al. (2021)
...
Intermediate (masked) language modeling
Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling Han & Eisenstein (2019)
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks Gururangan et al. (2020)
Mining Knowledge for Natural Language Inference from Wikipedia Categories Chen et al. (2020)
Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank Chau et al. (2020)
Train No Evil: Selective Masking for Task-Guided Pre-Training Gu et al. (2020)
...
Injecting "skills"
Injecting Numerical Reasoning Skills into Language Models Geva et al. (2020)
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers Lauscher et al. (2020)
Analyzing Commonsense Emergence in Few-shot Knowledge Models Da et al. (2021)
...
Parameter-efficient fine-tuning
Parameter-Efficient Transfer Learning for NLP Houlsby et al. (2019)
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning Stickland & Murray (2019)
Simple, Scalable Adaptation for Neural Machine Translation Bapna & Firat (2019)
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models Zhao et al. (2020)
Movement Pruning: Adaptive Sparsity by Fine-Tuning Sanh et al. (2020)
AdapterFusion: Non-Destructive Task Composition for Transfer Learning Pfeiffer et al. (2021)
MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer Pfeiffer et al. (2020)
AdapterDrop: On the Efficiency of Adapters in Transformers Rücklé et al. (2021)
Parameter-efficient transfer learning with diff pruning Guo et al. (2021)
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers Karimi Mahabadi et al. (2021)
LoRA: Low-Rank Adaptation of Large Language Models Hu et al. (2021)
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models Ben Zaken et al. (2022)
Training Neural Networks with Fixed Sparse Masks Sung et al. (2021)
Towards a Unified View of Parameter-Efficient Transfer Learning He et al. (2021)
Composable Sparse Fine-Tuning for Cross-Lingual Transfer Ansell et al. (2022)
Revisiting Parameter-Efficient Tuning: Are We Really There Yet? Chen et al. (2022)
Prompt-free and Efficient Few-shot Learning with Language Models Karimi Mahabadi et al. (2022)
Adaptable Adapters Moosavi et al. (2022)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning Liu et al. (2022)
...
Some continuous prompt-based methods can also be seen as parameter-efficient fine-tuning methods. For a list of such papers, see the "Continuous prompts" section below.
Prompt-based fine-tuning
Discrete prompts
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference Schick & Schütze (2021a)
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners Schick & Schütze (2021b)
Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification Schick et al. (2020)
Few-Shot Text Generation with Natural Language Instructions Schick & Schütze (2021c)
Making Pre-trained Language Models Better Few-shot Learners Gao et al. (2021)
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts Shin et al. (2020)
How Many Data Points is a Prompt Worth? Le Scao & Rush (2021)
Improving and Simplifying Pattern Exploiting Training Tam et al. (2021)
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections Zhong et al. (2021)
Calibrate Before Use: Improving Few-Shot Performance of Language Models Zhao et al. (2021)
PTR: Prompt Tuning with Rules for Text Classification Han et al. (2021)
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models Logan IV et al. (2021)
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification Hu et al. (2021)
Prompt-Learning for Fine-Grained Entity Typing Ding et al. (2021)
Do Prompt-Based Models Really Understand the Meaning of their Prompts? Webson & Pavlick (2022)
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning Utama et al. (2021)
Prototypical Verbalizer for Prompt-based Few-shot Tuning Cui et al. (2022)
...
Multi-task fine-tuning using discrete prompts
Cross-Task Generalization via Natural Language Crowdsourcing Instructions Mishra et al. (2021)
Discrete and Soft Prompting for Multilingual Models Zhao & Schütze (2021)
Finetuned Language Models Are Zero-Shot Learners Wei et al. (2021)
Multitask Prompted Training Enables Zero-Shot Task Generalization Sanh et al. (2021)
Prompt Consistency for Zero-Shot Task Generalization Zhou et al. (2022)
Few-shot Adaptation Works with UnpredicTable Data Chan et al. (2022)
Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks Wang et al. (2022)
...
Continuous prompts
Prefix-Tuning: Optimizing Continuous Prompts for Generation Li & Liang (2021)
WARP: Word-level Adversarial ReProgramming Hambardzumyan et al. (2021)
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts Qin & Eisner (2021)
Factual Probing Is [MASK]: Learning vs. Learning to Recall Zhong et al. (2021)
The Power of Scale for Parameter-Efficient Prompt Tuning Lester et al. (2021)
Multimodal Few-Shot Learning with Frozen Language Models Tsimpoukelli et al. (2021)
Noisy Channel Language Model Prompting for Few-Shot Text Classification Min et al. (2021)
Continuous Entailment Patterns for Lexical Inference in Context Schmitt & Schütze (2021)
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners Zhang et al. (2022)
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer Vu et al. (2022)
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks Liu et al. (2022)
...
Evaluating few-shot fine-tuning
True Few-Shot Learning with Language Models Perez et al. (2021)
FLEX: Unifying Evaluation for Few-Shot NLP Bragg et al. (2021)
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding Zheng et al. (2022)
True Few-Shot Learning with Prompts—A Real-World Perspective Schick & Schütze (2022)
...
Fine-tuning analysis
Visualizing and Understanding the Effectiveness of BERT Hao et al. (2019)
oLMpics-On What Language Model Pre-training Captures Talmor et al. (2020)
Pretrained Transformers Improve Out-of-Distribution Robustness Hendrycks et al. (2020)
What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
Investigating Learning Dynamics of BERT Fine-Tuning Hao et al. (2020)
Investigating Transferability in Pretrained Language Models Tamkin et al. (2020)
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning Aghajanyan et al. (2021)
Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers Phang et al. (2021)
Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution Kumar et al. (2022)
A Closer Look at How Fine-tuning Changes BERT Zhou & Srikumar (2022)
When Do You Need Billions of Words of Pretraining Data? Zhang et al. (2021)
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation He et al. (2021)
Pretrained Transformers as Universal Computation Engines Lu et al. (2021)
Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
...
Fine-tuning stability
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping Dodge et al. (2020)
Revisiting Few-sample BERT Fine-tuning Zhang et al. (2021)
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines Mosbach et al. (2021)
...
Fine-tuning and probing
What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers Mosbach et al. (2020)
On the Importance of Data Size in Probing Fine-tuned Models Mehrafarin et al. (2022)
...
Fine-tuning and generalization
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance McCoy et al. (2020)
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics Bhargava et al. (2021)
Linear Connectivity Reveals Generalization Strategies Juneja et al. (2022)
...
Fine-tuning and spurious features
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models Tu et al. (2020)
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually) Warstadt et al. (2020)
Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
...
Theoretical work
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks Saunshi et al. (2021)
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning Wei et al. (2021)
...
Surveys
Recent Advances in Language Model Fine-tuning Ruder (2021)
On the Opportunities and Risks of Foundation Models (Adaptation chapter) Bommasani et al. (2021)
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing Liu et al. (2021)
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models Ding et al. (2022)
...
Misc.
What is being transferred in transfer learning? Neyshabur et al. (2020)
Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge Talmor et al. (2020)
Exploring and Predicting Transferability across NLP Tasks Vu et al. (2020)
...
Owner
- Login: mmarius
- Kind: user
- Website: https://twitter.com/mariusmosbach
- Repositories: 1
- Profile: https://github.com/mmarius
PhD student @ Saarland University
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If this overview is useful to you, please cite as below."
authors:
  - family-names: Mosbach
    given-names: Marius
    orcid: https://orcid.org/0000-0003-2184-6964
title: "Awesome Fine-tuning - A curated list of resources on fine-tuning language models"
version: 1.0.0
url: https://github.com/mmarius/awesome-finetuning
GitHub Events
Total
- Watch event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Fork event: 1