https://github.com/ai-forever/model-zoo
NLP model zoo for Russian
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.6%) to scientific vocabulary
Statistics
- Stars: 45
- Watchers: 3
- Forks: 1
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
Welcome to the Model Zoo!
Here you can find NLP models for Russian, implemented in HF transformers🤗
Models:
| Model           | Task                 | Type            | Tokenizer | Dict size | Num Parameters | Training Data Volume |
|-----------------|----------------------|-----------------|-----------|-----------|----------------|----------------------|
| ruBERT-base     | mask filling         | encoder         | bpe       | 120 138   | 178 M          | 30 GB                |
| ruBERT-large    | mask filling         | encoder         | bpe       | 120 138   | 427 M          | 30 GB                |
| ruRoBERTa-large | mask filling         | encoder         | bbpe      | 50 257    | 355 M          | 250 GB               |
| ruT5-base       | text2text generation | encoder-decoder | bpe       | 32 101    | 222 M          | 300 GB               |
| ruT5-large      | text2text generation | encoder-decoder | bpe       | 32 101    | 737 M          | 300 GB               |
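When choosing a checkpoint, it can help to treat the table above as data. The following is a minimal sketch, not part of the repository: the `ModelInfo` class and the `pick_models` helper are hypothetical names, and the numbers are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    task: str
    arch: str             # "encoder" or "encoder-decoder"
    tokenizer: str        # "bpe" or "bbpe"
    dict_size: int
    params_millions: int
    train_data_gb: int


# Values copied from the model table; nothing here is measured independently.
MODEL_ZOO = [
    ModelInfo("ruBERT-base", "mask filling", "encoder", "bpe", 120138, 178, 30),
    ModelInfo("ruBERT-large", "mask filling", "encoder", "bpe", 120138, 427, 30),
    ModelInfo("ruRoBERTa-large", "mask filling", "encoder", "bbpe", 50257, 355, 250),
    ModelInfo("ruT5-base", "text2text generation", "encoder-decoder", "bpe", 32101, 222, 300),
    ModelInfo("ruT5-large", "text2text generation", "encoder-decoder", "bpe", 32101, 737, 300),
]


def pick_models(task, max_params_millions):
    """Return names of models for `task` within the parameter budget, smallest first."""
    matches = [
        m for m in MODEL_ZOO
        if m.task == task and m.params_millions <= max_params_millions
    ]
    return [m.name for m in sorted(matches, key=lambda m: m.params_millions)]


print(pick_models("mask filling", 400))
```

With a 400 M parameter budget this selects the two smaller fill-mask models and leaves out ruBERT-large.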
ruT5
Text2Text Generation task (T5 paper)
- Large: HF Model
- Base: HF Model
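The README gives usage examples only for the fill-mask models, so here is a hedged sketch of how text2text generation with ruT5 might look in HF transformers. The model ID `sberbank-ai/ruT5-base` is an assumption made by analogy with the ruRoberta ID used elsewhere in this README, and `load_rut5` and `generate_text` are hypothetical helper names. The imports are kept inside `load_rut5` so the file can be loaded without transformers installed.

```python
def load_rut5(model_name="sberbank-ai/ruT5-base"):
    """Load a tokenizer and model. The default model ID is an assumption."""
    # Local import: transformers is only needed once a model is actually loaded.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def generate_text(text, tokenizer, model, max_new_tokens=32):
    """Encode `text`, run generation, and decode the first returned sequence."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the checkpoint on first call):
# tokenizer, model = load_rut5()
# print(generate_text("your input text", tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps `generate_text` independent of how they were loaded, so the same helper should work for the large checkpoint or a fine-tuned copy.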
ruRoBerta
fill-mask task (Roberta paper)
- Large: HF Model
ruBert
fill-mask task (Bert paper)
- Large: HF Model
- Base: HF Model
How to:
Use this to explore the models or run them on your machine.
Model set up:
pip install -r requirements.txt
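After installation, a quick check that the packages can be imported may save time before downloading any model. This sketch uses only the standard library; the package names come from the repository's dependency list, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Names taken from the repository's dependency list; treat this as an
# assumption about what requirements.txt contains.
REQUIRED = ["transformers", "sentencepiece", "datasets", "bertviz"]


def missing_packages(names):
    """Return the top-level package names the import system cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```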
Pipeline usage
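```
from transformers import pipeline

unmasker = pipeline("fill-mask", model="sberbank-ai/ruRoberta-large")
# "Yevgeny Ponasenkov called ..." The rest of this sentence is cut off in the
# source; "<mask>" (the RoBERTa mask token) is added only so the call is valid.
unmasker("Евгений Понасенков назвал <mask>")
```
Classical usage
```
# ruRoberta-large example
from transformers import RobertaForMaskedLM, RobertaTokenizer, pipeline

model = RobertaForMaskedLM.from_pretrained('sberbank-ai/ruRoberta-large')
tokenizer = RobertaTokenizer.from_pretrained('sberbank-ai/ruRoberta-large')
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
# "It is worth writing on Habr more often about ..." Also cut off in the
# source; "<mask>" is added only so the call is valid.
unmasker("Стоит чаще писать на Хабр про <mask>")
```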
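For an input with a single mask, the `unmasker(...)` calls above return a list of candidate dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. A minimal sketch of filtering and printing such output follows. `summarize_predictions` is a hypothetical helper, and the sample list is a hand-written stand-in with made-up tokens and scores, not real model output.

```python
def summarize_predictions(results, min_score=0.0):
    """Return (token, score) pairs from fill-mask output, best first."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    # token_str may carry a leading space from byte-level BPE, so strip it.
    return [(r["token_str"].strip(), round(r["score"], 3)) for r in kept]


# Hand-written stand-in for pipeline output; tokens and scores are invented.
sample = [
    {"score": 0.12, "token": 101, "token_str": " науку", "sequence": "..."},       # "science"
    {"score": 0.55, "token": 102, "token_str": " технологии", "sequence": "..."},  # "technology"
    {"score": 0.01, "token": 103, "token_str": " погоду", "sequence": "..."},      # "weather"
]

for token, score in summarize_predictions(sample, min_score=0.05):
    print(f"{token}\t{score}")
```

In real use, `sample` would be replaced by the return value of `unmasker(...)`.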
Use BertViz to obtain model visualizations
Roberta model_view:
```
from transformers import RobertaModel, RobertaTokenizer
from bertviz import model_view

model_version = 'sberbank-ai/ruRoberta-large'
model = RobertaModel.from_pretrained(model_version, output_attentions=True)
tokenizer = RobertaTokenizer.from_pretrained(model_version)

sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
attention = model(input_ids)[-1]
input_id_list = input_ids[0].tolist()  # Batch index 0
tokens = tokenizer.convert_ids_to_tokens(input_id_list)
model_view(attention, tokens)
```
Owner
- Name: AI Forever
- Login: ai-forever
- Kind: organization
- Location: Armenia
- Repositories: 60
- Profile: https://github.com/ai-forever
Creating ML for the future. AI projects you already know. We are a non-profit organization with members from all over the world.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 4
- Total pull requests: 0
- Average time to close issues: about 8 hours
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 0
- Average comments per issue: 1.25
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Luonic (1)
- ninovikova (1)
- DimIsaev (1)
- AndreyM0 (1)
Dependencies
- bertviz *
- datasets *
- sentencepiece *
- transformers *