https://github.com/beomi/exbert-transformers

exBERT on Transformers🤗

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ✓
    Committers with academic emails
    49 of 888 committers (5.5%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary

Keywords

exbert transformers

Keywords from Contributors

transformer cryptocurrency vlm speech-recognition qwen pytorch-transformers pretrained-models model-hub glm gemma
Last synced: 5 months ago

Repository

exBERT on Transformers🤗

Basic Info
  • Host: GitHub
  • Owner: Beomi
  • License: apache-2.0
  • Language: Python
  • Default Branch: exbert
  • Homepage:
  • Size: 50.7 MB
Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 3
  • Open Issues: 1
  • Releases: 0
Topics
exbert transformers
Created over 4 years ago · Last pushed over 4 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

exBERT on Transformers 🤗

Original exBERT

  • Repo: https://github.com/cgmhaicenter/exBERT
  • Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.129/

Updated for Transformers 🤗

  • PyTorch 1.8.1 ✅
  • Hugging Face Trainer ✅
  • AutoModel, AutoTokenizer ✅
  • DeepSpeed pretraining with run_mlm.py ✅
  • GPU ✅ (TPU testing in progress)
  • Fine-tuning available (https://github.com/Beomi/KcBERT-finetune, in progress)

How to use

Pretrain exBERT

  • You need to clone this repo

```sh
git clone https://github.com/Beomi/exbert-transformers
cd exbert-transformers
pip install -e ".[dev]" && pip install datasets
cd examples/pytorch/language-modeling/
./exbert_pretrain.sh
```

Finetune

Install exbert-transformers

  • No need to clone the repo!

```sh
pip install git+https://github.com/Beomi/exbert-transformers
```

Load

```python
from transformers import exBertModel, exBertTokenizer

# Replace ... with a model name or a local checkpoint path
model = exBertModel.from_pretrained(...)
tokenizer = exBertTokenizer.from_pretrained(...)
```

Trained on PAWS

```python
from transformers import exBertModel, exBertTokenizer

model = exBertModel.from_pretrained('beomi/exKcBERT-paws')
tokenizer = exBertTokenizer.from_pretrained('beomi/exKcBERT-paws')
```

Note: The base_model field in a fine-tuned model's config should be "" (blank).
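As a minimal sketch of the note above, the following helper blanks out base_model in a saved config file. The function name `blank_base_model` is hypothetical, and the sketch assumes the fine-tuned checkpoint stores its config as a plain JSON file; it is not part of this repository.

```python
import json
from pathlib import Path


def blank_base_model(config_path):
    """Set "base_model" to an empty string in a saved model config.

    Hypothetical helper: assumes the config is a plain JSON file.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["base_model"] = ""  # blank value, as the note requires
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

All other fields in the config are left unchanged; only base_model is rewritten.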

Vocab update

To change the base BERT model or add more vocabulary to exBERT, edit examples/pytorch/language-modeling/exbert/vocab.txt, then update vocab_size and base_model in examples/pytorch/language-modeling/exbert/config.json to match.
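The vocab update could be scripted roughly as follows. This is a sketch under stated assumptions, not the repository's own tooling: the helper name `extend_vocab` is hypothetical, vocab.txt is assumed to hold one token per line, and vocab_size is assumed to equal the number of lines in vocab.txt. Check both assumptions against your config before relying on it.

```python
import json
from pathlib import Path


def extend_vocab(vocab_path, config_path, new_tokens, base_model=None):
    """Append unseen tokens to vocab.txt and sync config.json.

    Hypothetical helper, not part of exbert-transformers.
    """
    vocab_file = Path(vocab_path)
    tokens = vocab_file.read_text(encoding="utf-8").splitlines()
    seen = set(tokens)
    for token in new_tokens:
        if token not in seen:  # skip tokens already in the vocab
            tokens.append(token)
            seen.add(token)
    vocab_file.write_text("\n".join(tokens) + "\n", encoding="utf-8")

    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # ASSUMPTION: vocab_size is the line count of vocab.txt
    config["vocab_size"] = len(tokens)
    if base_model is not None:
        config["base_model"] = base_model
    config_file.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Existing tokens keep their line positions, so their ids do not shift; new tokens are only appended at the end.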

Appendix

Sample training results

Terminal output is available as a GitHub Gist: https://gist.github.com/Beomi/1aa650f75c8e9b3dd467038004244ed2

Owner

  • Name: Junbum Lee
  • Login: Beomi
  • Kind: user
  • Location: Seoul, South Korea

AI/ML GDE @ml-gde. Korean AI/NLP Researcher and creator of multiple Korean PLMs. Focused on advancing Open LLMs.

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 6,807
  • Total Committers: 888
  • Avg Commits per committer: 7.666
  • Development Distribution Score (DDS): 0.861
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
thomwolf t****f@g****m 946
Lysandre l****t@r****r 771
Sylvain Gugger 3****r 524
Patrick von Platen p****n@g****m 476
Julien Chaumond c****d@g****m 380
Stas Bekman s****0 323
Sam Shleifer s****r@g****m 279
VictorSanh v****h@g****m 193
Manuel Romero m****8@g****m 149
Morgan Funtowicz m****n@h****o 123
Julien Plu p****n@g****m 112
Suraj Patil s****5@g****m 96
Aymeric Augustin a****n@f****m 95
Rémi Louf r****f@g****m 81
Stefan Schweter s****n@s****t 70
lukovnikov l****v@o****m 46
Nicolas Patry p****s@p****m 35
Joe Davison j****n@g****m 32
Matthew Carrigan r****1@g****m 30
erenup p****e@p****n 28
Anthony MOI m****i@g****m 24
Teven t****o@g****m 24
piero p****o@u****m 24
Philipp Schmid 3****d 23
Grégory Châtel c****y@g****m 23
Bram Vanroy B****y@U****e 23
Clement c****e@g****m 20
Kevin Canwen Xu c****u@1****m 19
Philip May e****o@g****m 18
Rémi Louf r****i@h****o 17
and 858 more...
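
The all-time figures above can be reproduced from the listed counts. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, though it is consistent with the numbers shown.

```python
# Counts taken from the "All Time" statistics and the top committer row
total_commits = 6807
total_committers = 888
top_committer_commits = 946  # thomwolf

# Average commits per committer
avg_commits = total_commits / total_committers

# ASSUMPTION: DDS = 1 - (top committer's commits / total commits)
dds = 1 - top_committer_commits / total_commits

print(round(avg_commits, 3), round(dds, 3))
```

Rounded to three decimals, these match the reported 7.666 and 0.861.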

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ken19980727 (1)