https://github.com/beomi/exbert-transformers
exBERT on Transformers🤗
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 49 of 888 committers (5.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.9%) to scientific vocabulary
Statistics
- Stars: 10
- Watchers: 2
- Forks: 3
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
exBERT on Transformers 🤗
Original exBERT
- Repo: https://github.com/cgmhaicenter/exBERT
- Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.129/
Updated for Transformers 🤗
- PyTorch 1.8.1 ✅
- Huggingface Trainer ✅
- AutoModel, AutoTokenizer ✅
- DeepSpeed Pretrain with `run_mlm.py` ✅
- GPU ✅ (TPU test in progress)
- Fine-tuning available (https://github.com/Beomi/KcBERT-finetune, in progress)
How to use
Pretrain exBERT
- You need to clone this repo:

```sh
git clone https://github.com/Beomi/exbert-transformers
cd exbert-transformers
pip install -e ".[dev]" && pip install datasets
cd examples/pytorch/language-modeling/
./exbert_pretrain.sh
```
Finetune
Install exbert-transformers
- No need to git clone the repo!

```sh
pip install git+https://github.com/Beomi/exbert-transformers
```
Load
```python
from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained(...)
tokenizer = exBertTokenizer.from_pretrained(...)
```
Trained on PAWS
```python
from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained('beomi/exKcBERT-paws')
tokenizer = exBertTokenizer.from_pretrained('beomi/exKcBERT-paws')
```
Note: the `base_model` field in the fine-tuned model's config should be `""` (blank).
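As a rough illustration of the note above, the sketch below blanks the `base_model` field in a fine-tuned model's `config.json`. The function name `blank_base_model` and the file location are hypothetical and are not part of this repo; only the `base_model` key comes from the note.

```python
import json
from pathlib import Path


def blank_base_model(config_path):
    """Set "base_model" to "" in a config.json file.

    Hypothetical helper: returns True if the file had to be rewritten.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # A missing key also counts as "needs fixing", since the note asks
    # for an explicit blank value.
    changed = config.get("base_model") != ""
    if changed:
        config["base_model"] = ""
        config_path.write_text(
            json.dumps(config, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
    return changed
```

Calling `blank_base_model("path/to/finetuned/config.json")` before uploading or loading the fine-tuned checkpoint would be one way to apply the note; the path here is a placeholder.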
Vocab update
To change the base BERT model or add more vocabulary to exBERT, add or update tokens in `examples/pytorch/language-modeling/exbert/vocab.txt`,
then update `vocab_size` and `base_model` in `examples/pytorch/language-modeling/exbert/config.json` to match.
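The two-file update described above could be scripted roughly as follows. This is a minimal sketch, not code from the repo: the helper name `add_tokens_and_sync_config` is hypothetical, and it assumes that `vocab.txt` holds one token per line and that `vocab_size` should equal the number of lines in that file. Check both assumptions against the repo's pretraining script before relying on it.

```python
import json
from pathlib import Path


def add_tokens_and_sync_config(vocab_path, config_path, new_tokens, base_model=None):
    """Append unseen tokens to vocab.txt and update config.json to match.

    Hypothetical helper; returns the updated config as a dict.
    """
    vocab_path, config_path = Path(vocab_path), Path(config_path)

    # Assumption: one token per line, as in standard BERT vocab files.
    vocab = vocab_path.read_text(encoding="utf-8").splitlines()
    seen = set(vocab)
    for token in new_tokens:
        if token not in seen:  # skip duplicates, preserve insertion order
            vocab.append(token)
            seen.add(token)
    vocab_path.write_text("\n".join(vocab) + "\n", encoding="utf-8")

    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: vocab_size is the line count of vocab.txt.
    config["vocab_size"] = len(vocab)
    if base_model is not None:
        config["base_model"] = base_model
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

For example, `add_tokens_and_sync_config("exbert/vocab.txt", "exbert/config.json", ["newtoken"])` run from `examples/pytorch/language-modeling/` would append one token and bump `vocab_size` accordingly.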
Appendix
Sample Train result example
Terminal results on GitHub Gist: https://gist.github.com/Beomi/1aa650f75c8e9b3dd467038004244ed2
Owner
- Name: Junbum Lee
- Login: Beomi
- Kind: user
- Location: Seoul, South Korea
- Website: https://junbuml.ee
- Twitter: __Beomi__
- Repositories: 110
- Profile: https://github.com/Beomi
AI/ML GDE @ml-gde. Korean AI/NLP Researcher and creator of multiple Korean PLMs. Focused on advancing Open LLMs.
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| thomwolf | t****f@g****m | 946 |
| Lysandre | l****t@r****r | 771 |
| Sylvain Gugger | 3****r | 524 |
| Patrick von Platen | p****n@g****m | 476 |
| Julien Chaumond | c****d@g****m | 380 |
| Stas Bekman | s****0 | 323 |
| Sam Shleifer | s****r@g****m | 279 |
| VictorSanh | v****h@g****m | 193 |
| Manuel Romero | m****8@g****m | 149 |
| Morgan Funtowicz | m****n@h****o | 123 |
| Julien Plu | p****n@g****m | 112 |
| Suraj Patil | s****5@g****m | 96 |
| Aymeric Augustin | a****n@f****m | 95 |
| Rémi Louf | r****f@g****m | 81 |
| Stefan Schweter | s****n@s****t | 70 |
| lukovnikov | l****v@o****m | 46 |
| Nicolas Patry | p****s@p****m | 35 |
| Joe Davison | j****n@g****m | 32 |
| Matthew Carrigan | r****1@g****m | 30 |
| erenup | p****e@p****n | 28 |
| Anthony MOI | m****i@g****m | 24 |
| Teven | t****o@g****m | 24 |
| piero | p****o@u****m | 24 |
| Philipp Schmid | 3****d | 23 |
| Grégory Châtel | c****y@g****m | 23 |
| Bram Vanroy | B****y@U****e | 23 |
| Clement | c****e@g****m | 20 |
| Kevin Canwen Xu | c****u@1****m | 19 |
| Philip May | e****o@g****m | 18 |
| Rémi Louf | r****i@h****o | 17 |
| and 858 more... | | |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ken19980727 (1)