https://github.com/alexeyev/awesome-azerbaijani-nlp

Azerbaijani language processing software, models and datasets.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: researchgate.net, academia.edu
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary

Keywords

awesome-list azeri morphology natural-language-processing stemming turkic
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alexeyev
  • Language: Shell
  • Default Branch: master
  • Homepage:
  • Size: 114 KB
Statistics
  • Stars: 30
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 6 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

Awesome Azeri NLP

A curated list of awesome Azerbaijani language processing software, models and datasets. Inspired by awesome-ML.

The main focus is on open source tools, downloadable data and research papers with code.

If you want to contribute to this list (please do), send me a pull request. Also, a listed repository should be tagged as deprecated if:

  • The repository's owners explicitly state that "this library is not maintained".
  • It has not received commits for a long time (2–3 years).

Datasets

Raw text

Several corpora are also mentioned in research works:

  • S. Mammadova, G. Azimova, and A. Fatullayev. 2010. Text corpora and its role in development of the linguistic technologies for the Azerbaijani language. In The Third International Conference Problems of Cybernetics and Informatics.
  • V. Baisa and V. Suchomel. 2012. Large corpora for Turkic languages and unsupervised morphological analysis. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey. European Language Resources Association (ELRA). [SketchEngine corpora?]
  • C. Biemann, S. Bordag, G. Heyer, U. Quasthoff, and C. Wolff. 2004. Language-independent methods for compiling monolingual lexical data. Computational Linguistics and Intelligent Text Processing, pp. 217–228.
  • M. A. Domrachev and S. N. Sudoplatova. 2018. Testing Methods for Automatic Detection of Morpheme Boundaries in the Azerbaijani Language. Vestnik NSU. Series: Linguistics and Intercultural Communication, vol. 16, no. 2, pp. 34–47. (in Russ.) [Downloadable corpus]
  • B. Özenç, R. Ehsani, and E. Solak. 2018. Moraz: an open-source morphological analyzer for Azerbaijani Turkish. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 25–29. [BBC Azerbaijan]

Syntax

  • UD_Azerbaijani-TueCL: a treebank of about 110 sentences in total, including 20 Cairo sentences and about 90 sentences suggested by the UD Turkic Group; part of the UD Turkic Treebank. Translations of all sentences are available in English, Turkish and Kyrgyz.
  • The UD project's comments on difficulties in Turkish language processing may shed light on why parsing Azeri is hard as well.

Machine-readable dictionaries

TODO

Summarization

Translation

Sentiment

Mentioned in:

  • N. Gasimli's MS thesis "Analysis of the use of Twitter in Azerbaijan": 2194 + 700 tweets
  • Mammad Hajili's 160K customer reviews with scores and upvotes

Pretrained models

Methods/Software

Morphology

Mentioned in papers:

  • POS-tagging paper: Mammadov, S., Rustamov, S., Mustafali, A., Sadigov, Z., Mollayev, R., & Mammadov, Z. (2018, October). Part-of-Speech Tagging for Azerbaijani Language. In 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT) (pp. 1–6). IEEE. [Probable implementation: aznlp repo]
  • Stemming paper, 2019: Alizadeh, M. B. H., & Seyyedi, S. A. H. (2019). Auto Stemming of Azerbaijani Language. Problems of Information Technology, 59–66.
  • N. Gasimli's MS thesis "Analysis of the use of Twitter in Azerbaijan": Zemberek is extended for Azerbaijani, though the thesis states that a lot of effort is still required for it to work properly for the Azeri language.

Syntax

  • TODO

Online Demos

Miscellaneous

Owner

  • Name: Anton Alekseev
  • Login: alexeyev
  • Kind: user

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 43
  • Total Committers: 2
  • Avg Commits per committer: 21.5
  • Development Distribution Score (DDS): 0.047
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
  • Anton Alekseev (a****v@g****m): 41 commits
  • Ismat (i****v@g****m): 2 commits
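The page does not define the Development Distribution Score. The sketch below assumes DDS = 1 − (commits by the top committer ÷ total commits), a definition that reproduces the all-time 0.047 figure (1 − 41/43) and is consistent with the past-year 0.333 (for example, a 2 + 1 split of 3 commits, although the per-person breakdown for that year is not shown). The function names are hypothetical.

```python
# Hypothetical sketch: recompute the committer statistics listed above,
# ASSUMING DDS = 1 - (commits by the top committer / total commits).


def avg_commits_per_committer(commits_by_author: dict[str, int]) -> float:
    """Total commits divided by the number of distinct committers."""
    if not commits_by_author:
        return 0.0
    return sum(commits_by_author.values()) / len(commits_by_author)


def development_distribution_score(commits_by_author: dict[str, int]) -> float:
    """Share of commits NOT made by the single most active committer.

    0.0 means one person made every commit; values closer to 1.0 mean
    the work is spread more evenly across committers.
    """
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    return 1.0 - max(commits_by_author.values()) / total


# All-time figures taken from the "Top Committers" list.
all_time = {"Anton Alekseev": 41, "Ismat": 2}
print(round(avg_commits_per_committer(all_time), 1))       # 21.5
print(round(development_distribution_score(all_time), 3))  # 0.047
```

Under this assumed definition, a project where one person authors nearly every commit scores close to 0, which matches the low all-time value for this largely single-maintainer list.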

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 minutes
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • Ismat-Samadov (2)