https://github.com/ai-forever/model-zoo

NLP model zoo for Russian

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.6%) to scientific vocabulary

Keywords

bert nlp pytorch roberta roberta-model russian russian-language t5 t5-model transformers
Last synced: 5 months ago

Repository

NLP model zoo for Russian

Basic Info
  • Host: GitHub
  • Owner: ai-forever
  • License: apache-2.0
  • Default Branch: master
  • Homepage:
  • Size: 22.5 MB
Statistics
  • Stars: 45
  • Watchers: 3
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Topics
bert nlp pytorch roberta roberta-model russian russian-language t5 t5-model transformers
Created over 4 years ago · Last pushed over 4 years ago
Metadata Files
Readme License

README.md

Welcome to the Model Zoo!

Here you can find NLP models for Russian, implemented in HF transformers🤗

See Examples In Colab!

Models:

| Model           | Task                 | Type            | Tokenizer | Dict size | Num Parameters | Training Data Volume |
|-----------------|----------------------|-----------------|-----------|-----------|----------------|----------------------|
| ruBERT-base     | mask filling         | encoder         | bpe       | 120 138   | 178 M          | 30 GB                |
| ruBERT-large    | mask filling         | encoder         | bpe       | 120 138   | 427 M          | 30 GB                |
| ruRoBERTa-large | mask filling         | encoder         | bbpe      | 50 257    | 355 M          | 250 GB               |
| ruT5-base       | text2text generation | encoder-decoder | bpe       | 32 101    | 222 M          | 300 GB               |
| ruT5-large      | text2text generation | encoder-decoder | bpe       | 32 101    | 737 M          | 300 GB               |
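
The parameter counts above can be sanity-checked directly from a loaded checkpoint. A minimal sketch, using the sberbank-ai/ruRoberta-large id that appears in the usage examples further down (the other rows load the same way under their own hub ids):

```
from transformers import AutoModel

# checkpoint id taken from the usage examples below; other models load analogously
model = AutoModel.from_pretrained('sberbank-ai/ruRoberta-large')
num_params = sum(p.numel() for p in model.parameters())
print(f'{num_params / 1e6:.0f} M parameters')  # should land near the 355 M listed in the table
```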

ruT5

Text2Text Generation task, T5 paper

- Large: HF Model
- Base: HF Model

Model parameters
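
A minimal text2text generation sketch for ruT5; the hub id sberbank-ai/ruT5-base is an assumption (only the ruRoberta-large id is spelled out in this README), and the prompt is purely illustrative:

```
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# hypothetical hub id -- adjust to the actual checkpoint name
model_name = 'sberbank-ai/ruT5-base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# text goes in, text comes out: encode the source, generate, decode the target
inputs = tokenizer('Евгений Понасенков назвал себя величайшим маэстро.', return_tensors='pt')
with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```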

ruRoBerta

fill-mask task, RoBERTa paper

- Large: HF Model

ruBert

fill-mask task, BERT paper

- Large: HF Model
- Base: HF Model
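
A minimal fill-mask sketch for the BERT checkpoints, assuming the hub id sberbank-ai/ruBert-base (not named verbatim in this README); note that BERT-family tokenizers mask with [MASK], not RoBERTa's <mask>:

```
from transformers import pipeline

# hypothetical hub id -- adjust to the actual checkpoint name
unmasker = pipeline('fill-mask', model='sberbank-ai/ruBert-base')
unmasker('Евгений Понасенков назвал себя величайшим [MASK].', top_k=3)
```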

How to:

Use this Colab! to explore the models or run them on your machine.

Model setup:

pip install -r requirements.txt

Pipeline usage

```
from transformers import pipeline

# the <mask> token marks the position for the model to fill in
unmasker = pipeline("fill-mask", model="sberbank-ai/ruRoberta-large")
unmasker("Евгений Понасенков назвал <mask> величайшим маэстро.", top_k=1)
```

Classical usage

```
# ruRoberta-large example
from transformers import RobertaForMaskedLM, RobertaTokenizer, pipeline

model = RobertaForMaskedLM.from_pretrained('sberbank-ai/ruRoberta-large')
tokenizer = RobertaTokenizer.from_pretrained('sberbank-ai/ruRoberta-large')

unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("Стоит чаще писать на Хабр про <mask>.")
```

Use BertViz to obtain model visualizations

Roberta model_view:


```
from transformers import RobertaModel, RobertaTokenizer
from bertviz import model_view

model_version = 'sberbank-ai/ruRoberta-large'
model = RobertaModel.from_pretrained(model_version, output_attentions=True)
tokenizer = RobertaTokenizer.from_pretrained(model_version)

sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
attention = model(input_ids)[-1]
input_id_list = input_ids[0].tolist()  # Batch index 0
tokens = tokenizer.convert_ids_to_tokens(input_id_list)
model_view(attention, tokens)
```
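
model_view renders an interactive HTML/JavaScript visualization, so the snippet above is meant to be run in a Jupyter or Colab notebook cell (for example the Colab linked above), not as a plain script.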

Owner

  • Name: AI Forever
  • Login: ai-forever
  • Kind: organization
  • Location: Armenia

Creating ML for the future. AI projects you already know. We are a non-profit organization with members from all over the world.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: about 8 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 1.25
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Luonic (1)
  • ninovikova (1)
  • DimIsaev (1)
  • AndreyM0 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • bertviz *
  • datasets *
  • sentencepiece *
  • transformers *