https://github.com/ai4bharat/fermat

A vLLM-based pipeline for benchmarking various VLMs on AI4Bharat's HMER dataset

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AI4Bharat
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 902 KB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
  • Readme
  • License

README.md

FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

📜 Paper | 🤗 HF Dataset

We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content. Please refer to our paper for more details.

Loading Data

Steps to download the data, storing the images in benchmarkimages and the CSV in benchmarkcsv. Steps to download the data in the oikantik format.

Setup

To evaluate VLMs against the FERMAT dataset, install the required packages by running the following command:

`pip install -r requirements.txt`

We self-hosted Pixtral-12B-2409 (https://huggingface.co/mistralai/Pixtral-12B-2409), Pixtral-Large-Instruct-2411, LLaMa-3.2-11B-Vision-Instruct, LLaMa-3.2-90B-Vision-Instruct, and Phi-3.5-Vision-Instruct using vLLM (https://github.com/vllm-project/vllm).

We used hosted services for the GPT and Gemini model families.

For self-hosted models:

  1. Set up environment variables:

`export OPENAI_API_BASE=[ADD_THE_ENDPOINT_URL_OF_HOSTED_MODEL]`

Example: "http://localhost:8004/v1"

  2. Start evaluations:

`python main.py --model [MODEL_NAME] --dir_name [DATA_DIR]`

  • MODEL_NAME: Name of the model to be evaluated. Choices: `['pixtral', 'pixtrallarge', 'phi', 'llama_large', 'llama']`
  • DATA_DIR: Path to the directory where the benchmark images are stored.
  3. Fill in the CSV:

Once the evaluation is done, the results are stored in a JSON file named state_<MODEL_NAME>.json. You can convert this JSON file to a CSV file using the following command:

`python fill_in_csv.py --model [MODEL_NAME] --csv-file [CSV_FILE] --json-file [JSON_FILE]`

  • MODEL_NAME: Name of the model that was evaluated. Choices: `['pixtral', 'pixtrallarge', 'phi', 'llama_large', 'llama']`
  • CSV_FILE: Path to the CSV file where the results need to be filled in.
  • JSON_FILE: Path to the JSON file where the results are stored.
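The fill-in step amounts to joining the saved model outputs onto the rows of the benchmark CSV. The sketch below illustrates that idea only; it is not the logic of `fill_in_csv.py`. The state-file layout (a flat mapping from image name to response text), the `image_name` key column, and the `<model>_response` output column are all assumptions made for the example.

```python
import csv
import io
import json


def fill_in_rows(rows, state, model, key_column="image_name"):
    """Return copies of the CSV rows with the model's response added as a new column."""
    column = f"{model}_response"  # hypothetical output column name
    filled = []
    for row in rows:
        new_row = dict(row)
        # Rows without a saved response get an empty cell.
        new_row[column] = state.get(row[key_column], "")
        filled.append(new_row)
    return filled


def fill_in_csv_text(csv_text, state_json, model, key_column="image_name"):
    """Merge a JSON state document into CSV text and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = fill_in_rows(list(reader), json.loads(state_json), model, key_column)

    # Keep the original column order and append the response column once.
    fieldnames = list(reader.fieldnames or [])
    column = f"{model}_response"
    if column not in fieldnames:
        fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Working on text rather than file paths keeps the merge easy to check in isolation; reading `[CSV_FILE]` and `[JSON_FILE]` from disk would be a thin wrapper around `fill_in_csv_text`.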

Citation

If you used this repository or our models, please cite our work:

    @article{nath2025vision1language,
      title   = {Can Vision-Language Models Evaluate Handwritten Math?},
      author  = {Oikantik Nath and Hanani Bathina and Mohammed Safi Ur Rahman Khan and Mitesh M. Khapra},
      year    = {2025},
      journal = {arXiv preprint arXiv: 2501.07244}
    }

Owner

  • Name: AI4Bhārat
  • Login: AI4Bharat
  • Kind: organization
  • Email: opensource@ai4bharat.org
  • Location: India

Artificial-Intelligence-For-Bhārat : Building open-source AI solutions for India!

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
  • Fork event: 1
Last Year
  • Watch event: 1
  • Push event: 2
  • Fork event: 1

Issues and Pull Requests

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • 9115jin (1)