https://github.com/cosmaadrian/romath
Official repository for "RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴"
Repository
Basic Info
- Host: GitHub
- Owner: cosmaadrian
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/2409.11074
- Size: 276 KB
Statistics
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
RoMath: A Mathematical Reasoning Benchmark in Romanian
Abstract
Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there's a growing need to translate informal mathematical text into formal languages. However, most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Synthetic, RoMath-Baccalaureate, and RoMath-Competitions. These datasets cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several language models, highlighting the importance of creating resources for underrepresented languages.
Usage
Loading the data from 🤗 Hugging Face Datasets:
```python
import datasets

subset = 'bac'  # could also be 'comps' or 'synthetic'
train_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='train')
test_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='test')

# Do your thing ...
```
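When looping over all three subsets, it can help to validate the subset name before any download starts. The following is a minimal sketch, not part of the repository: `load_romath` and `SUBSETS` are hypothetical names, and it assumes every subset exposes the `train` and `test` splits used above.

```python
SUBSETS = ("bac", "comps", "synthetic")  # subset names mentioned above


def load_romath(subset, split="train"):
    """Load one RoMath subset (hypothetical helper, not part of the repo)."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so that name validation works even without the library.
    import datasets

    return datasets.load_dataset("cosmadrian/romath", subset, split=split)
```

Calling `load_romath("bac", split="test")` is then equivalent to the `datasets.load_dataset` call shown above, while a misspelled subset name fails immediately with a `ValueError`.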
♻️ Reproducing the Results
Generating your own split for Synthetic
While a pre-generated split for RoMath-Synthetic is provided for convenience on 🤗 Hugging Face, you can generate your own problems using the original DeepMind code with key phrases translated.
See romath-synthetic/ directory for instructions.
Running Experiments
Experiments for the paper are organized in the experiments/ directory, with a separate script for each experiment. We used SLURM on a private cluster to train models, make predictions, and evaluate them. Use ./do_sbatch.sh <script.sh> <n_gpus> to run a particular bash script, and modify ./do_sbatch.sh to suit your needs.
To run a particular model on a dataset, use the following commands:
```
# Optional LoRA fine-tuning
python finetune.py --model <hf_model_name> --dataset [bac|comps|synthetic] --output checkpoints/
```
```
# Use a (trained) model to make predictions on a test set.
python predict.py --model
```
```
# Evaluate the predictions of a model using a judge model.
python evaluate.py --pred_file predictions/Qwen-Qwen2-1.5B-Instructbac20.5.csv --judge_model <hf_model_name> --output results/
```
```
# Compute the relevant metrics for all evaluated prediction files in a folder.
python evaluate/compute_metrics.py --input_dir results/ --output_dir metrics/
```
For translation, use the translate.py Python script together with the predict_translated.py script.
To construct the Judge Dataset (i.e., Table 3), run evaluate/make_judge_dataset.py with the appropriate arguments, then run the evaluate_judge.py script.
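To illustrate the kind of aggregation a metrics step performs over judged predictions, here is a minimal sketch that computes accuracy from a CSV of verdicts. It is not the repository's `compute_metrics` script: the function name `accuracy_from_csv`, the `correct` column, and the `1`/`0` encoding of the judge verdict are all assumptions made for the example.

```python
import csv
import io


def accuracy_from_csv(handle, verdict_column="correct"):
    """Return the fraction of rows whose verdict column equals '1'."""
    rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError("no evaluated predictions found")
    hits = sum(1 for row in rows if row[verdict_column].strip() == "1")
    return hits / len(rows)


# Hypothetical judged predictions; the real result files may use other columns.
sample = io.StringIO("prediction,correct\n4,1\n7,0\n12,1\n9,1\n")
print(accuracy_from_csv(sample))  # 0.75
```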
GRPO Training
For training with rewards, first train an SFT model on Baccalaureate and Competitions using a "reasoning" format such as <raționament>...</raționament><răspuns>...</răspuns>, like so:
python sft_everything.py --batch_size 4 --model meta-llama/Llama-3.2-1B-Instruct --seed 42
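The "reasoning" format wraps the worked solution and the final answer in two tags, `<raționament>` (Romanian for "reasoning") and `<răspuns>` ("answer"). The sketch below shows one way to build such a training target and to read it back. It is an illustration only: `format_target` and `parse_completion` are hypothetical helpers, and the SFT script may assemble its targets differently.

```python
import re

TEMPLATE = "<raționament>{reasoning}</raționament><răspuns>{answer}</răspuns>"
PATTERN = re.compile(
    r"<raționament>(?P<reasoning>.*?)</raționament>\s*"
    r"<răspuns>(?P<answer>.*?)</răspuns>",
    re.DOTALL,  # reasoning may span several lines
)


def format_target(reasoning, answer):
    """Wrap a solution and its final answer in the two tags."""
    return TEMPLATE.format(reasoning=reasoning.strip(), answer=answer.strip())


def parse_completion(text):
    """Return (reasoning, answer), or None if the tags are missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("answer").strip()


target = format_target("2 + 2 = 4", "4")
print(parse_completion(target))  # ('2 + 2 = 4', '4')
```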
Afterwards, train using GRPO with only the correctness reward and the strict_format reward, adjusting the parameters in the script to suit your hardware capabilities:
python grpo.py --batch_size 1 --model checkpoints-sft/meta-llama-Llama-3.2-1B-Instruct-sft/checkpoint-246/ --seed 42
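The two rewards can be pictured as simple per-completion scoring functions. The following is a sketch under stated assumptions, not the code in `grpo.py`: the function names, the 0/1 reward values, and the exact-string comparison against the reference answer are all hypothetical choices, and the script may score completions differently.

```python
import re

# Exactly one reasoning block followed immediately by one answer block.
# The lookaheads stop either block from containing its own tag again.
STRICT = re.compile(
    r"<raționament>(?:(?!</?raționament>).)+</raționament>"
    r"<răspuns>((?:(?!</?răspuns>).)+)</răspuns>",
    re.DOTALL,
)


def strict_format_reward(completion):
    """1.0 if the whole completion follows the tag format, else 0.0."""
    return 1.0 if STRICT.fullmatch(completion.strip()) else 0.0


def correctness_reward(completion, reference):
    """1.0 if the tagged answer equals the reference string, else 0.0."""
    match = STRICT.fullmatch(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


good = "<raționament>2 + 2 = 4</raționament><răspuns>4</răspuns>"
print(strict_format_reward(good), correctness_reward(good, "4"))  # 1.0 1.0
```

Exact string matching is the simplest possible check and would reject equivalent answers written differently (for example `0.5` versus `1/2`); a symbolic or judge-based comparison could replace it.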
Citation
If you found our work useful, please cite our paper:
RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴
@misc{cosma2024romath,
title={RoMath: A Mathematical Reasoning Benchmark in Romanian},
author={Adrian Cosma and Ana-Maria Bucur and Emilian Radoi},
year={2024},
eprint={2409.11074},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.11074},
}
License
This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
Owner
- Name: Adrian Cosma
- Login: cosmaadrian
- Kind: user
- Location: Bucharest, Romania
- Company: University Politehnica of Bucharest
- Repositories: 21
- Profile: https://github.com/cosmaadrian
Mercenary Researcher
GitHub Events
Total
- Watch event: 2
- Push event: 4
Last Year
- Watch event: 2
- Push event: 4