macedonian-llm-eval

LLM evaluation for Macedonian language.

https://github.com/lvstck/macedonian-llm-eval

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.3%) to scientific vocabulary

Scientific Fields

Engineering Computer Science - 40% confidence
Last synced: 6 months ago · JSON representation ·

Repository

LLM evaluation for Macedonian language.

Basic Info
  • Host: GitHub
  • Owner: LVSTCK
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 2.95 MB
Statistics
  • Stars: 12
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created about 1 year ago · Last pushed 8 months ago
Metadata Files
Readme License Citation Codeowners

README.md

Macedonian LLM eval 🇲🇰

This repository is adapted from the original work by Aleksa Gordić. If you find this work useful, please consider citing or acknowledging the original source.

🎯 What is currently covered:

  • Common sense reasoning: Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge
  • World knowledge: NaturalQuestions
  • Reading comprehension: BoolQ

You can find the Macedonian LLM eval dataset on HuggingFace. The dataset was translated from Serbian to Macedonian using the Google Translate API. The Serbian dataset was selected as the source instead of English because Serbian and Macedonian are closer from a linguistic standpoint, making Serbian a better starting point for translation. Additionally, the Serbian dataset was refined using GPT-4, which, according to the original report, significantly improved the quality of the translation.

Quality check was conducted on the translated Macedonian dataset, and the translations were deemed to be of good quality.


📊 Latest Results - January 16, 2025

image


| Model | Version | ARC Easy | ARC Challenge | Bool Q | HellaSwag | Openbook QA | PIQA | NQ Open | WinoGrande | |---------------------------------------------------------------------------------------------------------------------------------------------|---------|----------|---------------|--------|-----------|-------------|--------|---------|------------| | MKLLM-7B-Instruct | 7B | 0.5034 | 0.3003 | 0.7878 | 0.4328 | 0.2940 | 0.6420 | 0.0432 | 0.6148 | | BLOOM | 7B | 0.2774 | 0.1800 | 0.5028 | 0.2664 | 0.1580 | 0.5316 | 0 | 0.4964 | | Phi-3.5-mini | 3.8B | 0.2887 | 0.1877 | 0.6028 | 0.2634 | 0.1640 | 0.5256 | 0.0025 | 0.5193 | | Mistral | 7B | 0.4625 | 0.2867 | 0.7593 | 0.3722 | 0.2180 | 0.5783 | 0.0241 | 0.5612 | | Mistral-Nemo | 12B | 0.4718 | 0.3191 | 0.8086 | 0.3997 | 0.2420 | 0.6066 | 0.0291 | 0.6062 | | Qwen2.5 | 7B | 0.3906 | 0.2534 | 0.7789 | 0.3390 | 0.2160 | 0.5598 | 0.0042 | 0.5351 | | LLaMA 3.1 | 8B | 0.4453 | 0.2824 | 0.7639 | 0.3740 | 0.2520 | 0.5865 | 0.0335 | 0.5683 | | LLaMA 3.2 | 3B | 0.3224 | 0.2329 | 0.6624 | 0.2976 | 0.2060 | 0.5462 | 0.0044 | 0.5059 | | 🏆LLaMA 3.3 - 8bit | 70B | 0.5808 | 0.3686 | 0.8511 | 0.4656 | 0.2820 | 0.6600 | 0.0878 | 0.6093 | | domestic-yak-instruct | 8B | 0.5467 | 0.3362 | 0.7865 | 0.4480 | 0.3020 | 0.6910 | 0.0457 | 0.6267 |


📋Evaluation

To run the evaluation using the current version of macedonian-llm-eval you can follow the steps below:

Prerequisites

Before running the evaluation, ensure you have installed the necessary dependencies. First create an environment, e.g:

bash conda create -n mk_eval python==3.10 conda activate mk_eval

Then run: bash pip install -e .

Run Evaluation

To evaluate a specific language model on a specific task run: python3 main.py --language "Macedonian" --model hf-causal-experimental --model_args "pretrained=microsoft/Phi-3.5-mini-instruct" --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande --batch_size 8 --output_path "results_eval"

Info: You can run the evaluation for Serbian and Slovenian as well, just swap Macedonian with either one of them.

🈂️ (Optional) Translation

  • running this this will eat your google cloud credits or will bill you if you're already in the billing mode (this happens after you spend free credits and then deliberately enable billing again).

  • you can use your free credits to translate 500.000 chars / month!

  • if this is the first time you're creating a gcloud project you'll have 300$ of free credits!

Prerequisites

Before you begin, ensure you meet the following requirements:

For Linux Users:

For Windows Users: 1. Windows Subsystem for Linux (WSL2). If you don't have WSL2 installed, follow these steps in Windows cmd/powershell in administrator mode:

```bash
wsl --install

// Check version and distribution name. wsl -l -v

// Set the newly downloaded linux distro as default.
wsl --set-default <distribution name>
```
  1. Install Git from the WSL terminal.

    bash sudo apt update sudo apt install git git --version

  2. Install Miniconda from the WSL terminal. ```bash mkdir -p ~/miniconda3

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh

    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

    rm -rf ~/miniconda3/miniconda.sh

    // Initialize conda with bash. ~/miniconda3/bin/conda init bash ```

  3. Follow the instructions below on WSL.

📗 Instructions for translating from Serbian into Macedonian

First let's setup a minimal Python program that makes sure you can run Google Translate on your local machine.

  1. Create a Google Console project (https://console.cloud.google.com/)
  2. Enable Google Translation API -> to enable it you have to setup the billing and input your credit card details (a note regarding safety: you'll have 300$ of free credit (if this is the first time you're doing it) and no one can spend money from your credit card unless all those free credits are spent and you re-enable the billing again! if you already had it setup in that case you have 500.000 chars/month for free!)
  3. Install Google Cloud CLI (gsutil) on your machine (see this: https://cloud.google.com/storage/docs/gsutil_install/)

    a.) Download the Linux archive file (find latest version from link above)

    curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-455.0.0-linux-x86_64.tar.gz

    b.) Extract the contents from the archive file above.

    tar -xf google-cloud-cli-455.0.0-linux-x86_64.tar.gz

    c.) Run installation script. ./google-cloud-sdk/install.sh

    d.) Initiate and authenticate your account. ./google-cloud-sdk/bin/gcloud init

    e.) Create a credentials file with gcloud auth application-default login

  4. Create and setting up the conda env

    a.) Open a terminal (if on Windows use the WSL terminal, if you're on Linux just use your terminal conda will already be in the PATH)

    b.) Run conda create -n sr_mk_translate python=3.10 -y

    c.) Run conda activate sr_mk_translate

    d.) Run pip install google-cloud-translate

  5. To run translation following these steps:

    a.) Download the Serbian LLM evaluation dataset Serbian LLM Eval, or any other dataset of choice (just make sure to change the source language in translate.py - maybe the logic as well).

    b.) Place the dataset in a data/ folder.

    c.) Navigate to the translate/ directory.

    d.) Set up the config.py file.

    e.) Run the translation script: bash python translate.py

🤝 How to Contribute?

We welcome contributions to the Macedonian LLM Eval! If you'd like to contribute, here’s how you can get involved:

  1. Translate Popular Benchmarks:

    • Identify benchmarks that have not yet been translated into Macedonian. For example, PubmedQA, SQuAD, or any other popular datasets.
    • Translate the dataset into Macedonian using appropriate tools or methods (e.g., Google Translate API).
  2. Fork and Modify the Repository:

    • Fork this repo.
    • Modify the necessary parts of the repository to support the new dataset. This includes:
      • Updating the evaluation script (lm_eval/tasks/<dataset_name>.py) to include the new benchmark.
      • Refer to existing implementations (e.g., ARC, SuperGLUE, HellaSwag) for guidance on how to implement evaluation logic.
  3. Update and Modify the Script:

    • Edit the evaluation script to include the new benchmark.
    • Ensure all changes are tested and documented.
  4. Open a PR:

    • Open a PR to submit your changes.
    • In your PR description, detail the following:
      • The benchmark you translated.
      • The modifications you made to the code.
      • How your changes were tested.
    • If applicable, attach the modified evaluation script to your PR.

Citation

@article{krsteski2025towards, title={Towards Open Foundation Language Model and Corpus for Macedonian: A Low-Resource Language}, author={Krsteski, Stefan and Tashkovska, Matea and Sazdov, Borjan and Gjoreski, Hristijan and Gerazov, Branislav}, journal={arXiv preprint arXiv:2506.09560}, year={2025} }

📝 TODOs

  • ⬜️ Add COPA-MK to the eval (https://huggingface.co/datasets/classla/COPA-MK)

Owner

  • Name: LVSTCK
  • Login: LVSTCK
  • Kind: organization

Citation (CITATION.bib)

@software{eval-harness,
  author       = {Gao, Leo and
                  Tow, Jonathan and
                  Biderman, Stella and
                  Black, Sid and
                  DiPofi, Anthony and
                  Foster, Charles and
                  Golding, Laurence and
                  Hsu, Jeffrey and
                  McDonell, Kyle and
                  Muennighoff, Niklas and
                  Phang, Jason and
                  Reynolds, Laria and
                  Tang, Eric and
                  Thite, Anish and
                  Wang, Ben and
                  Wang, Kevin and
                  Zou, Andy},
  title        = {A framework for few-shot language model evaluation},
  month        = sep,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.0.1},
  doi          = {10.5281/zenodo.5371628},
  url          = {https://doi.org/10.5281/zenodo.5371628}
}

GitHub Events

Total
  • Watch event: 11
  • Push event: 8
  • Public event: 1
  • Fork event: 1
Last Year
  • Watch event: 11
  • Push event: 8
  • Public event: 1
  • Fork event: 1