zensols.edusenti

EduSenti: Education Review Sentiment in Albanian (COLING paper)

https://github.com/plandes/edusenti

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

ai albanian albanian-language model natural-language-processing paper sentiment-analysis
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: plandes
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 194 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

EduSenti: Education Review Sentiment in Albanian

PyPI Python 3.10 Python 3.11

Pretraining and sentiment corpora of student-to-instructor reviews, and their analysis, in Albanian. This repository contains the code base for the paper RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian. To reproduce the results, see the paper reproduction repository. If you use our model or API, please cite our paper.

Obtaining

The library can be installed with pip from the PyPI repository:

```bash
pip3 install zensols.edusenti
```

The models are downloaded on the first use of the command-line or API.

Usage

Command line:

```bash
$ edusenti predict sq.txt
(+): <Per shkak të gjendjes së krijuar si pasojë e pandemisë edhe ne sikur [...]>
(-): <Fillimisht isha e shqetësuar se si do ti mbanim kuizet, si do të [...]>
(+): <Kjo gjendje ka vazhduar edhe në kohën e provimeve>
...
```

Use the csv action to write all predictions to a comma-delimited file (see edusenti --help).

API

```python
from zensols.edusenti import (
    ApplicationFactory, Application, SentimentFeatureDocument
)

# create the application; the models are downloaded on first use
app: Application = ApplicationFactory.get_application()
doc: SentimentFeatureDocument
for doc in app.predict(['Kjo gjendje ka vazhduar edhe në kohën e provimeve']):
    print(f'sentence: {doc.text}')
    print(f'prediction: {doc.pred}')
    print(f'logits: {doc.softmax_logit}')

# output:
# sentence: Kjo gjendje ka vazhduar edhe në kohën e provimeve
# prediction: +
# logits: {'+': 0.70292175, '-': 0.17432323, 'n': 0.12275504}
```
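The dictionary printed as `logits` above maps each label to a score. As a minimal sketch of how a caller might post-process such a dictionary, the hypothetical helper `top_label` below (not part of `zensols.edusenti`) picks the highest-scoring label and returns a fallback value when the top score is under a threshold. The scores are copied from the sample output; the threshold is an arbitrary assumption.

```python
from typing import Dict, Optional, Tuple

# Scores copied from the sample output above.
scores: Dict[str, float] = {'+': 0.70292175, '-': 0.17432323, 'n': 0.12275504}


def top_label(scores: Dict[str, float], min_score: float = 0.5,
              fallback: Optional[str] = None) -> Tuple[Optional[str], float]:
    """Return the highest-scoring label and its score.

    ``top_label``, ``min_score`` and ``fallback`` are hypothetical names
    used only for this illustration; they are not part of the library.
    """
    label, score = max(scores.items(), key=lambda kv: kv[1])
    if score < min_score:
        # The best label is not confident enough: report the fallback.
        return fallback, score
    return label, score


label, score = top_label(scores)
print(f'{label} ({score:.2f})')  # + (0.70)
```

Raising `min_score` (for example to 0.9) makes the helper return the fallback for this sentence, which can be useful when only confident predictions should be kept.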

Models

The models are downloaded the first time the API is used. To change the model (by default xlm-roberta-base is used) on the command-line, use --override esi_default.model_namel=xlm-roberta-large. You can also create a ~/.edusentirc file with the following:

```ini
[esi_default]
model_namel = xlm-roberta-large
```
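As a sketch of the file format only, the snippet below parses the same INI text with Python's standard `configparser` module and reads the setting back. How `edusenti` itself loads `~/.edusentirc` is not shown here, and the variable names are illustrative.

```python
import configparser

# The same content shown above for ~/.edusentirc, inlined so the example
# does not touch the real file.
RC_TEXT = """\
[esi_default]
model_namel = xlm-roberta-large
"""

parser = configparser.ConfigParser()
parser.read_string(RC_TEXT)

# Section and option names are taken verbatim from the README.
model_name = parser['esi_default']['model_namel']
print(model_name)  # xlm-roberta-large
```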

Performance of the models on the test set, after training and validation, is shown below.

| Model             |   F1 | Precision | Recall |
|:------------------|-----:|----------:|-------:|
| xlm-roberta-base  | 78.1 |      80.7 |   79.7 |
| xlm-roberta-large | 83.5 |      84.9 |   84.7 |

However, the distributed models were trained on the training and test sets combined. The validation metrics of those trained models are available on the command line with edusenti info.

Differences from the Paper Repository

The paper reproduction repository differs from this one in quite a few ways, mostly around reproducibility, whereas this repository is designed as a package for research that applies the model. To reproduce the results of the paper, please refer to the reproduction repository. To use the best performing model (XLM-RoBERTa Large) from that paper, use this repository.

The primary difference is that this repository performs significantly better in Albanian: F1 climbed from 71.9 to 83.5 (see Models). However, this repository has no English sentiment model, since that model was only used for comparing methods.

Changes include:

  • Python was upgraded from 3.9.9 to 3.11.6
  • PyTorch was upgraded from 1.12.1 to 2.1.1
  • HuggingFace transformers was upgraded from 4.19 to 4.35
  • zensols.deepnlp was upgraded from 1.8 to 1.13
  • The dataset was re-split and stratified.
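
For readers unfamiliar with the term, a stratified split keeps the label proportions roughly equal across splits. The sketch below illustrates the idea with the standard library only. It is not the code used to split the EduSenti corpus, and the function name, labels, split fraction, and seed are all hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split example indices into train and test sets, label by label."""
    # Group the example indices by their label.
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        # Take the same fraction of every label for the test set.
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Toy labels (made up): 50 positive, 30 negative, 20 neutral.
labels = ['+'] * 50 + ['-'] * 30 + ['n'] * 20
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```

With these toy labels, both splits keep the 5:3:2 ratio of the full list, which is the property a stratified split is meant to preserve.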

Documentation

See the full documentation. The API reference is also available.

Changelog

An extensive changelog is available here.

Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{nuci-etal-2024-roberta-low,
    title = "{R}o{BERT}a Low Resource Fine Tuning for Sentiment Analysis in {A}lbanian",
    author = "Nuci, Krenare Pireva and Landes, Paul and Di Eugenio, Barbara",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1233",
    pages = "14146--14151"
}
```

License

MIT License

Copyright (c) 2023 - 2024 Paul Landes and Krenare Pireva Nuci

Owner

  • Name: Paul Landes
  • Login: plandes
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2024-05-19
repository-code: https://github.com/uic-nlp-lab/edusenti
authors:
  - given-names: Krenare
    family-names: Nuci
    affiliation: University of Prishtina
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
  - given-names: Barbara
    family-names: Di Eugenio
    affiliation: University of Illinois at Chicago
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Krenare
      family-names: Nuci
      affiliation: University of Prishtina
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
  title: 'RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian'
  url: https://aclanthology.org/2024.lrec-main.1233/
  year: 2024
  conference:
    name: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
    city: Torino
    country: IT
    date-start: 2024-05-20
    date-end: 2024-05-25

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 3 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: zensols.edusenti

Pretraining and sentiment student to instructor review sentiment corpora and analysis.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3 Last month
Rankings
Dependent packages count: 10.9%
Average: 36.3%
Dependent repos count: 61.6%
Maintainers (1)

Dependencies

src/python/requirements-all.txt pypi
  • Jinja2 ==3.1.3
  • MarkupSafe ==2.1.5
  • PyYAML ==6.0.1
  • XlsxWriter ==3.0.9
  • annotated-types ==0.6.0
  • blis ==0.7.11
  • catalogue ==2.0.10
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • cloudpickle ==3.0.0
  • confection ==0.1.4
  • configparser ==5.2.0
  • contourpy ==1.2.0
  • cycler ==0.12.1
  • cymem ==2.0.8
  • filelock ==3.13.1
  • fonttools ==4.49.0
  • frozendict ==2.4.0
  • fsspec ==2024.2.0
  • future ==1.0.0
  • gensim ==4.3.2
  • h5py ==3.10.0
  • huggingface-hub ==0.21.4
  • hyperopt ==0.2.7
  • idna ==3.6
  • interlap ==0.2.7
  • joblib ==1.3.2
  • kiwisolver ==1.4.5
  • langcodes ==3.3.0
  • matplotlib ==3.8.3
  • mpmath ==1.3.0
  • msgpack ==1.0.8
  • msgpack-numpy ==0.4.8
  • murmurhash ==1.0.10
  • networkx ==3.2.1
  • nltk ==3.8.1
  • numpy ==1.25.2
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.18.1
  • nvidia-nvjitlink-cu12 ==12.4.99
  • nvidia-nvtx-cu12 ==12.1.105
  • packaging ==23.2
  • pandas ==2.1.4
  • parse ==1.20.1
  • pathlib_abc ==0.1.1
  • pathy ==0.11.0
  • patool ==1.12
  • pillow ==10.2.0
  • preshed ==3.0.9
  • protobuf ==3.20.3
  • py4j ==0.10.9.7
  • pydantic ==2.6.3
  • pydantic_core ==2.16.3
  • pyparsing ==3.1.2
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • regex ==2023.12.25
  • requests ==2.31.0
  • safetensors ==0.4.2
  • scikit-learn ==1.3.2
  • scipy ==1.9.3
  • sentencepiece ==0.1.99
  • six ==1.16.0
  • smart-open ==6.4.0
  • spacy ==3.6.1
  • spacy-legacy ==3.0.12
  • spacy-loggers ==1.0.5
  • srsly ==2.4.8
  • sympy ==1.12
  • tabulate ==0.9.0
  • thinc ==8.1.12
  • threadpoolctl ==3.3.0
  • tokenizers ==0.15.2
  • torch ==2.1.2
  • torchvision ==0.16.2
  • tqdm ==4.66.2
  • transformers ==4.35.2
  • triton ==2.1.0
  • typer ==0.9.0
  • typing_extensions ==4.10.0
  • tzdata ==2024.1
  • urllib3 ==2.2.1
  • wasabi ==1.1.2
  • zensols.datdesc ==0.2.2
  • zensols.deeplearn ==1.11.0
  • zensols.deepnlp ==1.13.0
  • zensols.install ==1.1.2
  • zensols.nlp ==1.10.0
  • zensols.util ==1.14.2
src/python/requirements.txt pypi
  • zensols.deepnlp *
src/python/setup.py pypi