https://github.com/cofe-ai/msg
Masked Structural Growth for 2x Faster Language Model Pre-training
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cofe-ai
- License: apache-2.0
- Language: Python
- Default Branch: master
- Size: 56.6 KB
Statistics
- Stars: 20
- Watchers: 2
- Forks: 2
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
Masked Structural Growth
We grow language models during pre-training using efficient growth schedules and function-preserving growth operators, which yields a 2x speedup.
MSG paper: https://arxiv.org/abs/2305.02869
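The core idea of a masked, function-preserving growth operator can be illustrated with a toy example. The following is a minimal sketch, not the repository's implementation: it assumes a two-layer ReLU MLP whose hidden width is grown, and that each new neuron's activation is multiplied by a mask that starts at 0.0 and is later raised toward 1.0. The helper names `grow_width`, `mlp_forward`, and `ramp` are hypothetical, and the linear `ramp` schedule is an assumption. Because the new neurons are fully masked at the moment of growth, the output is unchanged no matter how their weights are initialized.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    # W is a list of rows, so this computes W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mlp_forward(W1, W2, x, mask):
    # Each hidden activation is scaled by its mask value before the output layer.
    h = [m * a for m, a in zip(mask, relu(matvec(W1, x)))]
    return matvec(W2, h)

def grow_width(W1, W2, mask, k, rng):
    """Hypothetical helper: append k hidden neurons with random weights.

    New neurons get mask value 0.0, so the output is unchanged at the moment
    of growth however their weights are initialized.
    """
    d_in = len(W1[0])
    new_W1 = [row[:] for row in W1] + [
        [rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(k)
    ]
    new_W2 = [row[:] + [rng.gauss(0.0, 0.02) for _ in range(k)] for row in W2]
    return new_W1, new_W2, list(mask) + [0.0] * k

def ramp(step, grow_step, ramp_steps):
    # Assumed linear schedule: 0.0 at grow_step, 1.0 from grow_step + ramp_steps on.
    return min(1.0, max(0.0, (step - grow_step) / ramp_steps))

rng = random.Random(0)
W1 = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(4)]  # 4 hidden units, 3 inputs
W2 = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(2)]  # 2 outputs
mask = [1.0] * 4
x = [0.3, -1.2, 0.8]

before = mlp_forward(W1, W2, x, mask)
G1, G2, gmask = grow_width(W1, W2, mask, k=2, rng=rng)
after = mlp_forward(G1, G2, x, gmask)
print(before == after)  # True: masked new neurons contribute exactly 0.0
```

During training, the new mask entries would be set to `ramp(step, grow_step, ramp_steps)` at each step, so the new neurons are phased in gradually instead of being switched on all at once.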
Quick Start
The following example shows how to run MSG on the public BERT pre-training data.
1. Pre-processing
preprocess_bert_data.py
This generates static masks for raw data.
2. Run MSG
For Bert-base:
sh grow_bert_base.sh
For Bert-large:
sh grow_bert_large.sh
3. Evaluation
cd glue_eval
sh run_glue_together_with_stat.sh
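The pre-processing step produces static masks, meaning the masked positions are chosen once ahead of training and reused, rather than re-sampled every time a sequence is seen. The sketch below shows that idea using the standard BERT masking recipe (15% of non-special tokens; of those, 80% become `[MASK]`, 10% a random token, 10% unchanged). It is an illustration under stated assumptions, not the code in the pre-processing script: the helper `static_mask` is hypothetical, the special-token ids and vocabulary size are those of `bert-base-uncased`, and fixing the outcome with a per-sequence `seed` is just one way to make masks static.

```python
import random

# Special-token ids of the bert-base-uncased vocabulary (assumed for this sketch).
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103
VOCAB_SIZE = 30522
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def static_mask(token_ids, seed, mask_prob=0.15):
    """Hypothetical helper: BERT-style masking whose outcome is fixed by `seed`.

    Returns (input_ids, labels); labels hold the original token at masked
    positions and IGNORE_INDEX everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    candidates = [i for i, t in enumerate(inputs) if t not in (PAD_ID, CLS_ID, SEP_ID)]
    n = min(len(candidates), max(1, round(len(candidates) * mask_prob)))
    for i in rng.sample(candidates, n):
        labels[i] = inputs[i]
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels

# Run once at pre-processing time and store the result with the data,
# so every epoch sees the same masked positions.
ids = [CLS_ID, 2023, 2003, 1037, 7099, 6251, 2005, 5604, SEP_ID]
masked_ids, labels = static_mask(ids, seed=42)
```

The trade-off is that static masks cost nothing at training time and make runs exactly reproducible, while dynamic masking exposes the model to more mask patterns per sequence.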
Notes
To check function preservation, you can modify configs/*.json and set "attention_probs_dropout_prob" and "hidden_dropout_prob" to 0.0. Depending on the PyTorch version, there may still be negligible differences in the loss before and after growth.
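The dropout edit described above can also be scripted. This is a minimal sketch, assuming each file in `configs/` is a flat JSON object in the usual BERT config format; `disable_dropout` is a hypothetical helper, not part of the repository. It only changes keys that already exist and rewrites (and re-indents) the files in place, so keep the originals, for example under version control, to restore dropout for real training runs.

```python
import json
from pathlib import Path

DROPOUT_KEYS = ("attention_probs_dropout_prob", "hidden_dropout_prob")

def disable_dropout(config_dir="configs"):
    """Set the dropout probabilities to 0.0 in every <config_dir>/*.json file.

    Only keys that already exist are changed. Returns the list of files
    that were rewritten.
    """
    changed = []
    for path in sorted(Path(config_dir).glob("*.json")):
        cfg = json.loads(path.read_text())
        touched = False
        for key in DROPOUT_KEYS:
            if key in cfg and cfg[key] != 0.0:
                cfg[key] = 0.0
                touched = True
        if touched:
            path.write_text(json.dumps(cfg, indent=2) + "\n")
            changed.append(path)
    return changed

if __name__ == "__main__":
    for p in disable_dropout():
        print(f"dropout disabled in {p}")
```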
References
If this project helps you, please cite our paper. Thanks!
@inproceedings{yao2024masked,
  title={Masked Structural Growth for 2x Faster Language Model Pre-training},
  author={Yiqun Yao and Zheng Zhang and Jing Li and Yequan Wang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=rL7xsg1aRn}
}
Owner
- Name: cofe-ai
- Login: cofe-ai
- Kind: organization
- Location: China
- Repositories: 11
- Profile: https://github.com/cofe-ai
Big Model AI Groups from BAAI
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Dependencies
- accelerate ==0.15.0
- datasets ==2.7.1
- evaluate ==0.3.0
- torch ==1.10.0a0
- transformers ==4.24.0