https://github.com/cosmaadrian/romath

Official repository for "RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴"

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.6%) to scientific vocabulary

Keywords

dataset llms-benchmarking mathematics romanian

Last synced: 10 months ago · JSON representation

Repository


Basic Info

Host: GitHub
Owner: cosmaadrian
License: other
Language: Python
Default Branch: master
Homepage: https://arxiv.org/abs/2409.11074
Size: 276 KB

Statistics

Stars: 8
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Topics

dataset llms-benchmarking mathematics romanian

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

RoMath: A Mathematical Reasoning Benchmark in Romanian

[Adrian Cosma](https://scholar.google.com/citations?user=cdYk_RUAAAAJ&hl=en), [Ana-Maria Bucur](https://scholar.google.com/citations?user=TQuQ5IAAAAAJ&hl=en), [Emilian Radoi](https://scholar.google.com/citations?user=yjtWIf8AAAAJ&hl=en)

[📘 Abstract](#intro) | [⚒️ Usage](#usage) | [♻️ Reproducing the Results](#repro) | [📖 Citation](#citation) | [📝 License](#license)

TL;DR

[📜 Arxiv Link](https://arxiv.org/abs/2409.11074) | [🤗 Huggingface Dataset](https://huggingface.co/datasets/cosmadrian/romath)

📘 Abstract

Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there's a growing need to translate informal mathematical text into formal languages. However, most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Synthetic, RoMath-Baccalaureate, and RoMath-Competitions. These datasets cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several language models, highlighting the importance of creating resources for underrepresented languages.

⚒️ Usage

Loading the data from 🤗 Huggingface Datasets:

```python
import datasets

subset = 'bac'  # could be 'comps' or 'synthetic'

train_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='train')
test_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='test')

# Do your thing ...
```
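A thin wrapper can catch a mistyped subset name before any download starts. The following is a minimal sketch and not part of the repository: the helper name `load_romath` and the `SUBSETS` tuple are hypothetical, and the subset names are the three mentioned in the snippet above.

```python
# Hypothetical helper, not part of the RoMath repository.
SUBSETS = ("bac", "comps", "synthetic")  # subset names mentioned in the README


def load_romath(subset, split="train"):
    """Load one RoMath subset, rejecting unknown subset names early."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so the check above works even without the library installed.
    import datasets

    return datasets.load_dataset("cosmadrian/romath", subset, split=split)
```

With this sketch, `load_romath("bac", split="test")` would fetch the Baccalaureate test split, while `load_romath("geometry")` raises `ValueError` without touching the network.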

♻️ Reproducing the Results

Generating your own split for Synthetic

While a pre-generated split for RoMath-Synthetic is provided for convenience on 🤗 HuggingFace, you can generate your own problems using the original DeepMind code with key phrases translated.

See romath-synthetic/ directory for instructions.

Running Experiments

Experiments for the paper are organized in the `experiments/` directory, with a separate script for each experiment. We used SLURM on a private cluster to train, make predictions and evaluate models. Use `./do_sbatch.sh <script.sh> <n_gpus>` to run a particular bash script. Modify the `./do_sbatch.sh` file to suit your needs.

To run a particular model on a dataset, use the following commands:

```
# Optional LoRA fine-tuning
python finetune.py --model <hf_model_name> --dataset [bac|comps|synthetic] --output checkpoints/
```

```
# Use a (trained) model to make predictions on a test set.
python predict.py --model <hf_model_name> --dataset [bac|comps|synthetic] --temperature 0.5 --k 3 --shots 5 --output predictions/
```

```
# Evaluate the predictions of a model using a judge model.
python evaluate.py --pred_file predictions/Qwen-Qwen2-1.5B-Instruct_bac_2_0.5.csv --judge_model <hf_model_name> --output results/
```

```
# Compute the relevant metrics for all evaluated prediction files in a folder.
python evaluate/compute_metrics.py --input_dir results/ --output_dir metrics/
```
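When SLURM is not available, the prediction step can be scripted over several datasets from plain Python. The sketch below only builds and prints command lines, using the flags shown for `predict.py` above. The model name and the temperature grid are illustrative assumptions, and the helper `predict_command` is hypothetical, not something the repository provides.

```python
import shlex
from itertools import product

# Illustrative values only: the model mirrors the one named in the example
# prediction file, and the temperature grid is an assumption.
MODEL = "Qwen/Qwen2-1.5B-Instruct"
DATASETS = ["bac", "comps", "synthetic"]
TEMPERATURES = [0.5, 1.0]


def predict_command(model, dataset, temperature, k=3, shots=5, output="predictions/"):
    """Build the argument list for one predict.py run from the README flags."""
    return [
        "python", "predict.py",
        "--model", model,
        "--dataset", dataset,
        "--temperature", str(temperature),
        "--k", str(k),
        "--shots", str(shots),
        "--output", output,
    ]


commands = [predict_command(MODEL, d, t) for d, t in product(DATASETS, TEMPERATURES)]

for cmd in commands:
    # Printed rather than executed; pass cmd to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell-quoting problems if a model path ever contains spaces.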

For translation, use the `translate.py` script alongside the `predict_translated.py` script.

For constructing the Judge Dataset (i.e., Table 3), run `evaluate/make_judge_dataset.py` with the appropriate arguments, then run the `evaluate_judge.py` script.

GRPO Training

For training with rewards, first train an SFT model on Baccalaureate and Competitions using a "reasoning" format such as `<raționament>...</raționament><răspuns>...</răspuns>`:

python sft_everything.py --batch_size 4 --model meta-llama/Llama-3.2-1B-Instruct --seed 42

Afterwards, train using GRPO with only the correctness reward and the strict_format reward. Adjust the parameters in the script to suit your hardware capabilities:

python grpo.py --batch_size 1 --model checkpoints-sft/meta-llama-Llama-3.2-1B-Instruct-sft/checkpoint-246/ --seed 42
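To make the two rewards concrete, here is a minimal sketch of what a strict format check and a correctness check over the `<raționament>`/`<răspuns>` layout could look like. It is an assumed reading and not the code in `grpo.py`: the function names, the 0.0/1.0 reward values, and the exact-string comparison against a reference answer are all simplifications chosen for illustration.

```python
import re

# Any run of characters that does not contain one of the four tags.
_BODY = r"(?:(?!</?(?:raționament|răspuns)>).)*"

# Assumed meaning of "strict": exactly one reasoning block followed by exactly
# one answer block, with only whitespace around and between them.
STRICT_FORMAT = re.compile(
    rf"\s*<raționament>{_BODY}</raționament>\s*<răspuns>{_BODY}</răspuns>\s*",
    re.DOTALL,
)


def strict_format_reward(completion):
    """Return 1.0 if the whole completion follows the tag layout, else 0.0."""
    return 1.0 if STRICT_FORMAT.fullmatch(completion) else 0.0


def extract_answer(completion):
    """Return the stripped text of the first <răspuns> block, or None if absent."""
    match = re.search(r"<răspuns>(.*?)</răspuns>", completion, re.DOTALL)
    return match.group(1).strip() if match else None


def correctness_reward(completion, reference):
    """Return 1.0 if the extracted answer equals the reference string, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0
```

For example, the completion `<raționament>2 + 2 = 4</raționament><răspuns>4</răspuns>` earns both rewards against the reference `4`, while a free-form reply such as `Răspunsul este 4` earns neither. Exact string matching is deliberately crude: mathematically equivalent answers written differently would score zero under this sketch.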

📖 Citation

If you found our work useful, please cite our paper:

RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴

@misc{cosma2024romath,
  title={RoMath: A Mathematical Reasoning Benchmark in Romanian},
  author={Adrian Cosma and Ana-Maria Bucur and Emilian Radoi},
  year={2024},
  eprint={2409.11074},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2409.11074},
}

📝 License

This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Owner

Name: Adrian Cosma
Login: cosmaadrian
Kind: user
Location: Bucharest, Romania
Company: University Politehnica of Bucharest

Repositories: 21
Profile: https://github.com/cosmaadrian

Mercenary Researcher

GitHub Events

Total

Watch event: 2
Push event: 4

Last Year

Watch event: 2
Push event: 4
