https://github.com/ai4bharat/fermat

A vLLM-based Pipeline for benchmarking various VLMs on HMER Dataset of AI4Bharat

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.9%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

A vLLM-based Pipeline for benchmarking various VLMs on HMER Dataset of AI4Bharat

Basic Info

Host: GitHub
Owner: AI4Bharat
License: mit
Language: Python
Default Branch: main
Size: 902 KB

Statistics

Stars: 4
Watchers: 1
Forks: 1
Open Issues: 1
Releases: 0

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme License

FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

📜 Paper | 🤗 HF Dataset

We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content. Please refer to our paper for more details.

We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content.

Loading Data

Steps to download data and store the images in benchmarkimages, and csv in benchmarkcsv. Steps to dowload data for the oikantik format

Setup

To run evaluation of VLMs against the FEMRAT dataset, you need to install the required packages by running the following command:

bash pip install -r requirements.txt

We self-hosted Pixtral-12B-2409 (https://huggingface.co/mistralai/Pixtral-12B-2409), Pixtral-Large-Instruct-2411, LLaMa-3.2-11B-Vision-Instruct, LLaMa-3.2-90B-Vision-Instruct, Phi-3.5-Vision-Instruct using VLLM (https://github.com/vllm-project/vllm)

We used hosted services for GPT-Family, Gemini-Family

For self-hosted models,

Set up environment variables:

bash export OPENAI_API_BASE=[ADD_THE_ENDPOINT_URL_OF_HOSTED_MODEL]

Example: "http://localhost:8004/v1"

Start Evaluations:

bash python main.py --model [MODEL_NAME] --dir_name [DATA_DIR]

MODELNAME: Name of the model to be evaluated. Choices: `['pixtral', 'pixtrallarge', 'phi', 'llama_large', 'llama']`
DATA_DIR: Path to the directory where the Benchmark Images are stored

Fill-in CSV

Once the evaluation is done, the results will be stored in a JSON File with the format state_<MODEL_NAME>.json. You can convert this JSON file to a CSV file using the following command:

bash python fill_in_csv.py --model [MODEL_NAME] --csv-file [CSV_FILE] --json-file [JSON_FILE]

MODELNAME: Name of the model to be evaluated. Choices: `['pixtral', 'pixtrallarge', 'phi', 'llama_large', 'llama']`
CSV_FILE: Path to the CSV file where the results need to be filled in.
JSON_FILE: Path to the JSON file where the results are stored.

Citation

If you used this repository or our models, please cite our work:

bibtex @article{nath2025vision1language, title = {Can Vision-Language Models Evaluate Handwritten Math?}, author = {Oikantik Nath and Hanani Bathina and Mohammed Safi Ur Rahman Khan and Mitesh M. Khapra}, year = {2025}, journal = {arXiv preprint arXiv: 2501.07244} }

Owner

Name: AI4Bhārat
Login: AI4Bharat
Kind: organization
Email: opensource@ai4bharat.org
Location: India

Website: https://ai4bharat.org
Twitter: AI4Bharat
Repositories: 37
Profile: https://github.com/AI4Bharat

Artificial-Intelligence-For-Bhārat : Building open-source AI solutions for India!

GitHub Events

Total

Watch event: 1
Push event: 2
Fork event: 1

Last Year

Watch event: 1
Push event: 2
Fork event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/ai4bharat/fermat

Science Score: 23.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

Loading Data

Setup

Citation

Owner

GitHub Events

Total

Last Year

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels