https://github.com/ai-forever/ru-gpts

Russian GPT3 models.


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Keywords

deep-learning gpt3 language-model russian russian-language transformers

Keywords from Contributors

clip
Last synced: 6 months ago

Repository

Russian GPT3 models.

Basic Info
  • Host: GitHub
  • Owner: ai-forever
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 383 KB
Statistics
  • Stars: 2,091
  • Watchers: 87
  • Forks: 439
  • Open Issues: 12
  • Releases: 0
Topics
deep-learning gpt3 language-model russian russian-language transformers
Created over 5 years ago · Last pushed about 3 years ago
Metadata Files
Readme License

README.md

Russian GPT-3 models

ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small and ruGPT2Large

This repository contains a collection of autoregressive transformer language models trained on a large Russian-language corpus.

  • Russian GPT-3 models (ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small) were trained with a 2048 sequence length using sparse and dense attention blocks. We also provide a Russian GPT-2 large model (ruGPT2Large) trained with a 1024 sequence length.

  • Try model generation in Colab: ruGPT-3 XL or the smaller ruGPT-3 models.

  • Usage examples are described in detail here. See how fine-tuning works in Colab.

Table of contents

ruGPT3XL

Setup

For Colab we recommend using the following installation instructions:

```bash
%%bash
export LD_LIBRARY_PATH=/usr/lib/
apt-get install clang-9 llvm-9 llvm-9-dev llvm-9-tools
git clone https://github.com/qywu/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
pip install triton
DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed
pip install transformers
pip install huggingface_hub
pip install timm==0.3.2
git clone https://github.com/sberbank-ai/ru-gpts
cp ru-gpts/src_utils/trainer_pt_utils.py /usr/local/lib/python3.8/dist-packages/transformers/trainer_pt_utils.py
cp ru-gpts/src_utils/_amp_state.py /usr/local/lib/python3.8/dist-packages/apex/amp/_amp_state.py
```

After installing the environment, please restart Colab. To check that everything is OK, run the following commands:

```
!ds_report
```

Output:

```
...
sparse_attn ............ [YES] ...... [OKAY]
...
```

```python
import deepspeed.ops.sparse_attention.sparse_attn_op
```

Usage

Here is a simple usage example; a small post-processing helper sketch follows it. For more, see this example or Open In Colab.

```python
import os
import sys

# If running from the content root, make the cloned repo importable
# before importing the wrapper.
sys.path.append("ru-gpts/")
os.environ["USE_DEEPSPEED"] = "1"
# We can change the address and port.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "5000"

from src.xl_wrapper import RuGPT3XL

gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
gpt.generate(
    "Кто был президентом США в 2020? ",
    max_length=50,
    no_repeat_ngram_size=3,
    repetition_penalty=2.,
)
```

Finetuning

An example of fine-tuning, loading the fine-tuned model, and generating text is here.

Our example fine-tuning script is here.

Pretraining details ruGPT3XL

The model was trained with a 512 sequence length using DeepSpeed and Megatron code by the Devices team, on an 80B-token dataset for 4 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.
Note: the model has sparse attention blocks.

Total training time was around 10 days on 256 GPUs.
The final perplexity on the test set is 12.05.

🤗HuggingFace model card link.

ruGPT3Large, ruGPT3Medium, ruGPT3Small, ruGPT2Large

Setup

To use ruGPT3Large, ruGPT3Medium, ruGPT3Small, or ruGPT2Large, just install 🤗HuggingFace transformers.

```bash
pip install transformers==4.24.0
```

Usage

Here you can find examples of fine-tuning and generation; a minimal fine-tuning sketch with the plain 🤗 Transformers Trainer API also follows the generation example below.

These examples are also adapted for Google Colab:
  • finetuning
  • generation

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name_or_path = "sberbank-ai/rugpt3large_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name_or_path)
model = GPT2LMHeadModel.from_pretrained(model_name_or_path).cuda()
text = "Александр Сергеевич Пушкин родился в "
input_ids = tokenizer.encode(text, return_tensors="pt").cuda()
out = model.generate(input_ids)
generated_text = list(map(tokenizer.decode, out))[0]
print(generated_text)
```

Output should be like this:

```
Александр Сергеевич Пушкин родился в \n1799 году. Его отец был крепостным крестьянином, а мать – крепостной крестьянкой. Детство и юность Пушкина прошли в деревне Михайловское под Петербургом. В 1820-х годах семья переехала
```
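
The repository's own fine-tuning recipes are in the linked notebooks and scripts; as a complementary, hedged sketch, the snippet below fine-tunes one of the smaller checkpoints with the plain 🤗 Transformers `Trainer` API. The file `train.txt`, the output directory, and all hyperparameters are illustrative placeholders rather than values taken from this repository.

```python
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    Trainer,
    TrainingArguments,
)

model_name_or_path = "sberbank-ai/rugpt3small_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name_or_path)
model = GPT2LMHeadModel.from_pretrained(model_name_or_path)

# Plain-text corpus; "train.txt" is a placeholder path.
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=1024)
# Causal LM objective (no masked LM), matching GPT-style training.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="rugpt3small_finetuned",  # placeholder output directory
    num_train_epochs=1,                  # illustrative hyperparameters
    per_device_train_batch_size=4,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model("rugpt3small_finetuned")
```

The fine-tuned checkpoint can then be reloaded with `GPT2LMHeadModel.from_pretrained("rugpt3small_finetuned")`, exactly as in the generation example above.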

Pretraining details

All pretraining was done on Nvidia Tesla V100-SXM3 32 GB GPUs on the Christofari cluster. The pretraining details for each model follow.

Pretraining details ruGPT3Large

The model was trained with a sequence length of 1024 using the transformers library by the Devices team on 80B tokens for 3 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.

Total training time was around 14 days on 128 GPUs for the 1024 context and a few days on 16 GPUs for the 2048 context.
The final perplexity on the test set is 13.6.

You can load this model with transformers using the model name sberbank-ai/rugpt3large_based_on_gpt2.

🤗HuggingFace model card link

Our pretraining script is here

Pretraining details ruGPT3Medium

The model was trained with a sequence length of 1024 using the transformers library by the Devices team on 80B tokens for 3 epochs. After that, the model was fine-tuned on a 2048 context.

Total training time was around 16 days on 64 GPUs.
The final perplexity on the test set is 17.4.

You can load this model with transformers using the model name sberbank-ai/rugpt3medium_based_on_gpt2.

🤗HuggingFace model card link

Our pretraining script is here

Pretraining details ruGPT3Small

The model was trained with a sequence length of 1024 using transformers by the Devices team on 80B tokens for around 3 epochs. After that, the model was fine-tuned on a 2048 context.

Total training time was around one week on 32 GPUs.

You can load this model with transformers using the model name sberbank-ai/rugpt3small_based_on_gpt2.

🤗HuggingFace model card link

Our pretraining script is here

Pretraining details ruGPT2Large

The model was trained with a sequence length of 1024 using transformers by the Devices team on 170 GB of data, on 64 GPUs for 3 weeks.

You can load this model with transformers using the model name sberbank-ai/rugpt2large; a minimal generation sketch follows below.

🤗HuggingFace model card link
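
For completeness, here is a minimal generation sketch for ruGPT2Large, assuming (as for the checkpoints above) that it loads with the standard GPT-2 classes in 🤗 Transformers; the sampling parameters are illustrative, not recommendations from this repository.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumption: the ruGPT2Large checkpoint is compatible with the standard
# GPT-2 classes, as the other checkpoints above are.
model_name_or_path = "sberbank-ai/rugpt2large"
tokenizer = GPT2Tokenizer.from_pretrained(model_name_or_path)
model = GPT2LMHeadModel.from_pretrained(model_name_or_path)

text = "Александр Сергеевич Пушкин родился в "
input_ids = tokenizer.encode(text, return_tensors="pt")
# Sampling instead of greedy decoding; the values are illustrative.
out = model.generate(input_ids, max_length=50, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(out[0]))
```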

OpenSource Solutions with ruGPT3

Papers mentioning ruGPT3

According to a Google Scholar search; feel free to add links to this list.

Text Simplification

```
@article{shatilovsentence,
  title={Sentence simplification with ruGPT3},
  author={Shatilov, AA and Rey, AI},
  url={http://www.dialog-21.ru/media/5281/shatilovaaplusreyai142.pdf}
}

@article{fenogenovatext,
  title={Text Simplification with Autoregressive Models},
  author={Fenogenova, Alena},
  url={http://www.dialog-21.ru/media/5250/fenogenovaa141.pdf}
}
```

Text Detoxification

@article{dementieva2021methods,
  title={Methods for Detoxification of Texts for the Russian Language},
  author={Dementieva, Daryna and Moskovskiy, Daniil and Logacheva, Varvara and Dale, David and Kozlova, Olga and Semenov, Nikita and Panchenko, Alexander},
  journal={arXiv preprint arXiv:2105.09052},
  year={2021},
  url={https://arxiv.org/abs/2105.09052}
}

Paraphrasing and Data Augmentation

@inproceedings{fenogenova2021russian,
  title={Russian Paraphrasers: Paraphrase with Transformers},
  author={Fenogenova, Alena},
  booktitle={Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing},
  pages={11--19},
  year={2021},
  url={https://www.aclweb.org/anthology/2021.bsnlp-1.2.pdf}
}

Model Evaluation

@article{malykh2021morocco,
  title={MOROCCO: Model Resource Comparison Framework},
  author={Malykh, Valentin and Kukushkin, Alexander and Artemova, Ekaterina and Mikhailov, Vladislav and Tikhonova, Maria and Shavrina, Tatiana},
  journal={arXiv preprint arXiv:2104.14314},
  year={2021},
  url={https://arxiv.org/abs/2104.14314}
}

Owner

  • Name: AI Forever
  • Login: ai-forever
  • Kind: organization
  • Location: Armenia

Creating ML for the future. AI projects you already know. We are a non-profit organization with members from all over the world.

GitHub Events

Total
  • Watch event: 50
  • Issue comment event: 2
  • Pull request event: 1
  • Fork event: 7
Last Year
  • Watch event: 50
  • Issue comment event: 2
  • Pull request event: 1
  • Fork event: 7

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 137
  • Total Committers: 4
  • Avg Commits per committer: 34.25
  • Development Distribution Score (DDS): 0.533
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Anton Emelyanov l****t@m****u 64
Anton Emelyanov k****n@g****m 60
Tatiana Shavrina r****s@g****m 11
Oleg Shlyazhko o****r 2
Committer Domains (Top 20 + Academic)
mail.ru: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 98
  • Total pull requests: 17
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 68
  • Total pull request authors: 14
  • Average comments per issue: 2.58
  • Average comments per pull request: 0.53
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fen0s (5)
  • mgrankin (4)
  • Pro100rus32 (4)
  • drunkinlove (4)
  • Artyrm (3)
  • denismashukov (3)
  • MsSurgeon (3)
  • AlexanderKozhevin (3)
  • asimaranov (2)
  • chsnt (2)
  • Den4ikAI (2)
  • Markfryazino (2)
  • LEv145 (2)
  • airogachev (2)
  • Kepler-Br (2)
Pull Request Authors
  • TatianaShavrina (3)
  • king-menin (2)
  • mcblooder (1)
  • cclauss (1)
  • safiza-web (1)
  • mgrankin (1)
  • egor-miasnikov (1)
  • dmitriy-pichugin (1)
  • amrzv (1)
  • TheRadioGuy (1)
  • nicoth-in (1)
  • tmm88 (1)
  • armanbolatov (1)
  • artemsnegirev (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • boto3 ==1.11.11
  • nltk >=3.4
  • numpy >=1.15.4
  • pandas >=0.24.0
  • regex ==2020.1.8
  • sentencepiece >=0.1.8
  • tensorflow >=1.12.0
  • transformers ==2.8.0