macedonian-llm-eval
LLM evaluation for Macedonian language.
Repository
LLM evaluation for Macedonian language.
Basic Info
- Host: GitHub
- Owner: LVSTCK
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 2.95 MB
Statistics
- Stars: 12
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Macedonian LLM eval 🇲🇰
This repository is adapted from the original work by Aleksa Gordić. If you find this work useful, please consider citing or acknowledging the original source.
🎯 What is currently covered:
- Common sense reasoning: Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge
- World knowledge: NaturalQuestions
- Reading comprehension: BoolQ
You can find the Macedonian LLM eval dataset on HuggingFace. The dataset was translated from Serbian to Macedonian using the Google Translate API. Serbian was chosen as the source instead of English because Serbian and Macedonian are linguistically closer, which makes Serbian a better starting point for translation. Additionally, the Serbian dataset was refined using GPT-4, which, according to the original report, significantly improved the quality of the translation.
A quality check was conducted on the translated Macedonian dataset, and the translations were deemed to be of good quality.
📊 Latest Results - January 16, 2025
| Model | Version | ARC Easy | ARC Challenge | Bool Q | HellaSwag | Openbook QA | PIQA | NQ Open | WinoGrande |
|-------|---------|----------|---------------|--------|-----------|-------------|------|---------|------------|
| MKLLM-7B-Instruct | 7B | 0.5034 | 0.3003 | 0.7878 | 0.4328 | 0.2940 | 0.6420 | 0.0432 | 0.6148 |
| BLOOM | 7B | 0.2774 | 0.1800 | 0.5028 | 0.2664 | 0.1580 | 0.5316 | 0 | 0.4964 |
| Phi-3.5-mini | 3.8B | 0.2887 | 0.1877 | 0.6028 | 0.2634 | 0.1640 | 0.5256 | 0.0025 | 0.5193 |
| Mistral | 7B | 0.4625 | 0.2867 | 0.7593 | 0.3722 | 0.2180 | 0.5783 | 0.0241 | 0.5612 |
| Mistral-Nemo | 12B | 0.4718 | 0.3191 | 0.8086 | 0.3997 | 0.2420 | 0.6066 | 0.0291 | 0.6062 |
| Qwen2.5 | 7B | 0.3906 | 0.2534 | 0.7789 | 0.3390 | 0.2160 | 0.5598 | 0.0042 | 0.5351 |
| LLaMA 3.1 | 8B | 0.4453 | 0.2824 | 0.7639 | 0.3740 | 0.2520 | 0.5865 | 0.0335 | 0.5683 |
| LLaMA 3.2 | 3B | 0.3224 | 0.2329 | 0.6624 | 0.2976 | 0.2060 | 0.5462 | 0.0044 | 0.5059 |
| 🏆LLaMA 3.3 - 8bit | 70B | 0.5808 | 0.3686 | 0.8511 | 0.4656 | 0.2820 | 0.6600 | 0.0878 | 0.6093 |
| domestic-yak-instruct | 8B | 0.5467 | 0.3362 | 0.7865 | 0.4480 | 0.3020 | 0.6910 | 0.0457 | 0.6267 |
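To compare models at a glance, one option is an unweighted mean over the eight task columns. The sketch below does this for three rows copied from the table. The averaging scheme is an assumption made here for illustration and is not a ranking method the repository defines; `rank_by_mean` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from three rows of the results table, in column order:
# ARC Easy, ARC Challenge, Bool Q, HellaSwag, Openbook QA, PIQA, NQ Open, WinoGrande.
SCORES = {
    "MKLLM-7B-Instruct": [0.5034, 0.3003, 0.7878, 0.4328, 0.2940, 0.6420, 0.0432, 0.6148],
    "LLaMA 3.3 - 8bit": [0.5808, 0.3686, 0.8511, 0.4656, 0.2820, 0.6600, 0.0878, 0.6093],
    "domestic-yak-instruct": [0.5467, 0.3362, 0.7865, 0.4480, 0.3020, 0.6910, 0.0457, 0.6267],
}


def rank_by_mean(scores):
    """Return (model, unweighted mean score) pairs, best first."""
    averaged = {model: mean(values) for model, values in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean(SCORES)
for model, average in ranking:
    print(f"{model}: {average:.4f}")
```

An unweighted mean treats every task as equally important, so the near-zero NQ Open scores pull all averages down by a similar amount; a per-task comparison may be more informative when models are close.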
📋Evaluation
To run the evaluation using the current version of macedonian-llm-eval you can follow the steps below:
Prerequisites
Before running the evaluation, ensure you have installed the necessary dependencies. First create an environment, e.g.:
```bash
conda create -n mk_eval python==3.10
conda activate mk_eval
```
Then run:
```bash
pip install -e .
```
Run Evaluation
To evaluate a specific language model on a specific task run:
python3 main.py --language "Macedonian" --model hf-causal-experimental --model_args "pretrained=microsoft/Phi-3.5-mini-instruct" --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande --batch_size 8 --output_path "results_eval"
Info: You can also run the evaluation for Serbian and Slovenian. Just replace Macedonian in the `--language` argument with either one of them.
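When evaluating the same model in several languages, it can help to assemble the command programmatically. The following is a minimal sketch that uses only the flags shown in the command above; `build_eval_command` is a hypothetical helper and is not part of the repository.

```python
import shlex


def build_eval_command(language, pretrained, tasks, batch_size=8, output_path="results_eval"):
    """Assemble the main.py invocation using only the flags shown in the command above."""
    return [
        "python3", "main.py",
        "--language", language,
        "--model", "hf-causal-experimental",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--output_path", output_path,
    ]


TASKS = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"]

# One command per supported language; only the --language value changes.
commands = [
    build_eval_command(language, "microsoft/Phi-3.5-mini-instruct", TASKS)
    for language in ("Macedonian", "Serbian", "Slovenian")
]

for command in commands:
    print(shlex.join(command))
    # To actually launch a run: subprocess.run(command, check=True)
```

All three commands share the same `--output_path` here, so you may want to pass a different value per language if results should be kept apart.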
🈂️ (Optional) Translation
Running this will eat your Google Cloud credits, or will bill you if you're already in billing mode (this happens after you spend your free credits and then deliberately enable billing again).
You can use your free credits to translate 500,000 characters per month!
If this is the first time you're creating a gcloud project, you'll have $300 of free credits!
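Before starting a translation run, it may be worth estimating how many characters you are about to send. The sketch below compares a batch of strings against the 500,000 characters per month figure quoted above. It assumes usage roughly tracks the number of characters submitted; the helper names are hypothetical and the exact billing rules should be checked in the Google Cloud documentation.

```python
FREE_CHARS_PER_MONTH = 500_000  # monthly free quota quoted above


def count_characters(texts):
    """Total number of characters across all strings to be translated."""
    return sum(len(text) for text in texts)


def fits_free_tier(texts, already_used=0):
    """True if this batch plus earlier usage stays within the free monthly quota."""
    return already_used + count_characters(texts) <= FREE_CHARS_PER_MONTH


sample = ["Dobar dan!", "Kako si?"]
print(count_characters(sample), fits_free_tier(sample))
```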
Prerequisites
Before you begin, ensure you meet the following requirements:
For Linux Users:
For Windows Users:
1. Windows Subsystem for Linux (WSL2). If you don't have WSL2 installed, follow these steps in Windows cmd/PowerShell in administrator mode:
```bash
wsl --install
# Check version and distribution name.
wsl -l -v
# Set the newly downloaded Linux distro as default.
wsl --set-default <distribution name>
```
2. Install Git from the WSL terminal.
```bash
sudo apt update
sudo apt install git
git --version
```
3. Install Miniconda from the WSL terminal.
```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
# Initialize conda with bash.
~/miniconda3/bin/conda init bash
```
Follow the instructions below on WSL.
📗 Instructions for translating from Serbian into Macedonian
First, let's set up a minimal Python program that makes sure you can run Google Translate on your local machine.
- Create a Google Console project (https://console.cloud.google.com/)
- Enable the Google Translation API. To enable it, you have to set up billing and enter your credit card details. A note regarding safety: you'll have $300 of free credit (if this is the first time you're doing it), and no one can spend money from your credit card unless all those free credits are spent and you re-enable billing. If you already had it set up, you have 500,000 characters/month for free.
- Install Google Cloud CLI (gsutil) on your machine (see this: https://cloud.google.com/storage/docs/gsutil_install/)
a.) Download the Linux archive file (find the latest version from the link above)
`curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-455.0.0-linux-x86_64.tar.gz`
b.) Extract the contents from the archive file above.
`tar -xf google-cloud-cli-455.0.0-linux-x86_64.tar.gz`
c.) Run the installation script.
`./google-cloud-sdk/install.sh`
d.) Initiate and authenticate your account.
`./google-cloud-sdk/bin/gcloud init`
e.) Create a credentials file with
`gcloud auth application-default login`
- Create and set up the conda env
a.) Open a terminal (if on Windows use the WSL terminal; if you're on Linux just use your terminal, conda will already be in the PATH)
b.) Run `conda create -n sr_mk_translate python=3.10 -y`
c.) Run `conda activate sr_mk_translate`
d.) Run `pip install google-cloud-translate`
- To run the translation, follow these steps:
a.) Download the Serbian LLM evaluation dataset Serbian LLM Eval, or any other dataset of choice (just make sure to change the source language in translate.py - maybe the logic as well).
b.) Place the dataset in a data/ folder.
c.) Navigate to the translate/ directory.
d.) Set up the `config.py` file.
e.) Run the translation script:
```bash
python translate.py
```
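Once the credentials file and the conda environment from the steps above are in place, a quick smoke test can confirm that Google Translate is reachable before you translate a whole dataset. The sketch below assumes the `translate_v2` client from the `google-cloud-translate` package; the repository's own `translate.py` may use a different client or API version. `translate_sr_to_mk` and `main` are hypothetical names.

```python
def translate_sr_to_mk(text, client):
    """Translate one Serbian string into Macedonian using the given client."""
    result = client.translate(text, source_language="sr", target_language="mk")
    return result["translatedText"]


def main():
    # Imported here so the helper above can be exercised without the package installed.
    from google.cloud import translate_v2 as translate

    # The client picks up the application-default credentials created earlier.
    client = translate.Client()
    print(translate_sr_to_mk("Dobar dan!", client))
```

Calling `main()` sends a single short string, so it uses only a handful of characters from your quota. Passing the client as an argument keeps the helper easy to test with a stand-in object.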
🤝 How to Contribute?
We welcome contributions to the Macedonian LLM Eval! If you'd like to contribute, here’s how you can get involved:
Translate Popular Benchmarks:
- Identify benchmarks that have not yet been translated into Macedonian. For example, PubmedQA, SQuAD, or any other popular datasets.
- Translate the dataset into Macedonian using appropriate tools or methods (e.g., Google Translate API).
Fork and Modify the Repository:
- Fork this repo.
- Modify the necessary parts of the repository to support the new dataset. This includes:
  - Updating the evaluation script (`lm_eval/tasks/<dataset_name>.py`) to include the new benchmark.
  - Refer to existing implementations (e.g., ARC, SuperGLUE, HellaSwag) for guidance on how to implement evaluation logic.
Update and Modify the Script:
- Edit the evaluation script to include the new benchmark.
- Ensure all changes are tested and documented.
Open a PR:
- Open a PR to submit your changes.
- In your PR description, detail the following:
  - The benchmark you translated.
  - The modifications you made to the code.
  - How your changes were tested.
- If applicable, attach the modified evaluation script to your PR.
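As a rough illustration of what a new multiple-choice task file has to do, the sketch below maps one raw record to a query, a list of choices, and the index of the correct answer. It is self-contained and does not import `lm_eval`; the field names (`question`, `choices`, `answerKey`) are assumptions modelled on ARC-style records, and `process_doc` is a hypothetical helper, so check the existing task files for the interface the harness actually expects.

```python
def process_doc(doc):
    """Map one raw ARC-style record to the fields needed for multiple-choice scoring."""
    labels = doc["choices"]["label"]
    return {
        "query": doc["question"],
        "choices": doc["choices"]["text"],
        # Index of the correct answer within the choices list.
        "gold": labels.index(doc["answerKey"]),
    }


# Made-up record in the assumed format, used only to exercise the helper.
example = {
    "question": "Кој град е главен град на Македонија?",
    "choices": {"label": ["A", "B", "C"], "text": ["Битола", "Скопје", "Охрид"]},
    "answerKey": "B",
}

processed = process_doc(example)
print(processed["choices"][processed["gold"]])
```

Running a check like this over a few translated records is a cheap way to catch label or field mismatches before opening a PR.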
Citation
@article{krsteski2025towards,
title={Towards Open Foundation Language Model and Corpus for Macedonian: A Low-Resource Language},
author={Krsteski, Stefan and Tashkovska, Matea and Sazdov, Borjan and Gjoreski, Hristijan and Gerazov, Branislav},
journal={arXiv preprint arXiv:2506.09560},
year={2025}
}
📝 TODOs
- ⬜️ Add COPA-MK to the eval (https://huggingface.co/datasets/classla/COPA-MK)
Owner
- Name: LVSTCK
- Login: LVSTCK
- Kind: organization
- Repositories: 1
- Profile: https://github.com/LVSTCK
Citation (CITATION.bib)
@software{eval-harness,
author = {Gao, Leo and
Tow, Jonathan and
Biderman, Stella and
Black, Sid and
DiPofi, Anthony and
Foster, Charles and
Golding, Laurence and
Hsu, Jeffrey and
McDonell, Kyle and
Muennighoff, Niklas and
Phang, Jason and
Reynolds, Laria and
Tang, Eric and
Thite, Anish and
Wang, Ben and
Wang, Kevin and
Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = sep,
year = 2021,
publisher = {Zenodo},
version = {v0.0.1},
doi = {10.5281/zenodo.5371628},
url = {https://doi.org/10.5281/zenodo.5371628}
}
GitHub Events
Total
- Watch event: 11
- Push event: 8
- Public event: 1
- Fork event: 1
Last Year
- Watch event: 11
- Push event: 8
- Public event: 1
- Fork event: 1