https://github.com/cosmaadrian/romath

Official repository for "RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴"

Keywords

dataset llms-benchmarking mathematics romanian

Repository

Basic Info
Statistics
  • Stars: 8
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

RoMath: A Mathematical Reasoning Benchmark in Romanian

[Adrian Cosma](https://scholar.google.com/citations?user=cdYk_RUAAAAJ&hl=en), [Ana-Maria Bucur](https://scholar.google.com/citations?user=TQuQ5IAAAAAJ&hl=en), [Emilian Radoi](https://scholar.google.com/citations?user=yjtWIf8AAAAJ&hl=en)
[📘 Abstract](#intro) | [⚒️ Usage](#usage) | [♻️ Reproducing the Results](#repro) | [📖 Citation](#citation) | [📝 License](#license)

TL;DR

[📜 Arxiv Link](https://arxiv.org/abs/2409.11074) | [🤗 Huggingface Dataset](https://huggingface.co/datasets/cosmadrian/romath)

📘 Abstract

Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there's a growing need to translate informal mathematical text into formal languages. However, most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Synthetic, RoMath-Baccalaureate, and RoMath-Competitions. These datasets cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several language models, highlighting the importance of creating resources for underrepresented languages.

⚒️ Usage

Loading the data from 🤗 Hugging Face Datasets:

```python
import datasets

subset = 'bac'  # could be 'comps' or 'synthetic'

train_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='train')
test_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='test')

# Do your thing ...
```
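To work with all three subsets at once, the two calls above can be wrapped in a loop. The following is a minimal sketch, assuming every subset exposes both a `train` and a `test` split as `bac` does. The helper name `load_all_subsets` and its injectable `load_fn` parameter are illustrative and not part of the repository.

```python
from typing import Any, Callable, Dict

# Subset names as used in the loading snippet.
SUBSETS = ("bac", "comps", "synthetic")
SPLITS = ("train", "test")  # assumption: every subset ships both splits


def load_all_subsets(load_fn: Callable[..., Any]) -> Dict[str, Dict[str, Any]]:
    """Return {subset: {split: dataset}} using the given loader.

    `load_fn` is expected to behave like `datasets.load_dataset`.
    """
    return {
        subset: {
            split: load_fn("cosmadrian/romath", subset, split=split)
            for split in SPLITS
        }
        for subset in SUBSETS
    }
```

Calling `load_all_subsets(datasets.load_dataset)` downloads the data. Passing the loader as an argument also makes it easy to substitute a stub when testing offline.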

♻️ Reproducing the Results

Generating your own split for Synthetic

While a pre-generated split for RoMath-Synthetic is provided for convenience on 🤗 HuggingFace, you can generate your own problems using the original DeepMind code with key phrases translated.

See romath-synthetic/ directory for instructions.

Running Experiments

Experiments for the paper are organized in the experiments/ directory, with a separate script for each experiment. We used SLURM on a private cluster to train models, make predictions, and evaluate them. Use ./do_sbatch.sh <script.sh> <n_gpus> to run a particular bash script. Modify the ./do_sbatch.sh file to suit your needs.

To run a particular model on a dataset, use the following commands:

```
# Optional LoRA-Fine-tuning
python finetune.py --model <hf_model_name> --dataset [bac|comps|synthetic] --output checkpoints/
```

```
# Use a (trained) model to make predictions on a test set.
python predict.py --model <hf_model_name> --dataset [bac|comps|synthetic] --temperature 0.5 --k 3 --shots 5 --output predictions/
```

```
# Evaluate the predictions of a model using a judge model.
python evaluate.py --pred_file predictions/Qwen-Qwen2-1.5B-Instructbac20.5.csv --judge_model <hf_model_name> --output results/
```

```
# Compute the relevant metrics for all evaluated prediction files in a folder.
python evaluate/compute_metrics.py --input_dir results/ --output_dir metrics/
```
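As an illustration of the kind of metric such a step might compute, here is a minimal sketch of accuracy over judged predictions when `--k` samples are drawn per problem. It is not the repository's metrics script. The input layout (a mapping from a problem ID to a list of boolean judge verdicts) and both function names are assumptions made for this example.

```python
from typing import Dict, List


def mean_accuracy(verdicts: Dict[str, List[bool]]) -> float:
    """Fraction of all sampled answers that the judge marked correct."""
    total = sum(len(v) for v in verdicts.values())
    correct = sum(sum(v) for v in verdicts.values())
    return correct / total if total else 0.0


def any_correct_rate(verdicts: Dict[str, List[bool]]) -> float:
    """Fraction of problems with at least one correct sample."""
    if not verdicts:
        return 0.0
    return sum(any(v) for v in verdicts.values()) / len(verdicts)


# Hypothetical verdicts for three problems with k = 3 samples each.
example = {
    "p1": [True, False, True],
    "p2": [False, False, False],
    "p3": [True, True, True],
}
print(mean_accuracy(example))     # 5 of 9 samples are correct
print(any_correct_rate(example))  # 2 of 3 problems solved at least once
```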

For translation, use the translate.py python script, alongside the predict_translated.py script.

To construct the Judge Dataset (i.e., Table 3), run evaluate/make_judge_dataset.py with the appropriate arguments, then run the evaluate_judge.py script.

GRPO Training

For training with rewards, first train an SFT model on Baccalaureate and Competitions using a "reasoning" format such as <raționament>...</raționament><răspuns>...</răspuns>:

python sft_everything.py --batch_size 4 --model meta-llama/Llama-3.2-1B-Instruct --seed 42
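To make the tagged format concrete, here is a minimal sketch of building and parsing such a string. Only the tag names come from the description above. The helper names `format_target` and `parse_completion` are hypothetical and do not reflect how `sft_everything.py` is implemented.

```python
import re
from typing import Optional, Tuple

# Tag names taken from the reasoning format described in this section.
PATTERN = re.compile(
    r"<raționament>(.*?)</raționament>\s*<răspuns>(.*?)</răspuns>",
    re.DOTALL,
)


def format_target(reasoning: str, answer: str) -> str:
    """Build a training target in the tagged format."""
    return f"<raționament>{reasoning}</raționament><răspuns>{answer}</răspuns>"


def parse_completion(text: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if the tags are missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```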

Afterwards, train with GRPO, using only the correctness reward and the strict_format reward. Adjust the parameters in the script to suit your hardware capabilities:

python grpo.py --batch_size 1 --model checkpoints-sft/meta-llama-Llama-3.2-1B-Instruct-sft/checkpoint-246/ --seed 42
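The two rewards can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in `grpo.py`: "strict" is taken to mean that the completion is exactly one reasoning block followed by one answer block with nothing around them, and correctness is simplified to an exact string match against a reference answer. All function names here are hypothetical.

```python
import re

TAGS = ("<raționament>", "</raționament>", "<răspuns>", "</răspuns>")
STRICT = re.compile(
    r"<raționament>.+?</raționament>\s*<răspuns>(.+?)</răspuns>",
    re.DOTALL,
)


def strict_format_reward(completion: str) -> float:
    """1.0 if the completion is exactly one tagged block pair, else 0.0."""
    text = completion.strip()
    one_of_each = all(text.count(tag) == 1 for tag in TAGS)
    return 1.0 if one_of_each and STRICT.fullmatch(text) else 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer equals the reference (assumed exact match)."""
    match = STRICT.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum of the two rewards; equal weighting is an assumption."""
    return strict_format_reward(completion) + correctness_reward(completion, reference)
```

Exact string matching is the simplest possible correctness check. Mathematically equivalent answers written differently would score zero under it, so a real setup may need a more tolerant comparison.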

📖 Citation

If you found our work useful, please cite our paper:

RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴

@misc{cosma2024romath,
  title={RoMath: A Mathematical Reasoning Benchmark in Romanian},
  author={Adrian Cosma and Ana-Maria Bucur and Emilian Radoi},
  year={2024},
  eprint={2409.11074},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2409.11074},
}

πŸ“ License

This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Owner

  • Name: Adrian Cosma
  • Login: cosmaadrian
  • Kind: user
  • Location: Bucharest, Romania
  • Company: University Politehnica of Bucharest

Mercenary Researcher

GitHub Events

Total
  • Watch event: 2
  • Push event: 4
Last Year
  • Watch event: 2
  • Push event: 4