reasoning_multimodal_llms

https://github.com/nagho/reasoning_multimodal_llms

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (4.0%) to scientific vocabulary

Scientific Fields

Artificial Intelligence and Machine Learning (Computer Science): 40% confidence

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: NaGho
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 15.2 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme, License, Citation, Support

README.md

reasoning_multimodal_LLMs

This project implements a multimodal math word problem solver that integrates text descriptions and visual inputs (diagrams, charts).

Dataset: MATH-Vision

Problem Statement: Improve the accuracy of answers generated by multimodal LLMs (mLLMs) on the MATH-Vision dataset.
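
As a rough illustration of what "accuracy" can mean here, the sketch below computes exact-match accuracy between predicted and reference answers after light normalization. The function names and the normalization rules are assumptions made for this example; the README does not describe the project's scoring, and the official MATH-Vision evaluation may differ.

```python
def normalize_answer(answer: str) -> str:
    """Lowercase and strip whitespace, surrounding '$', and a trailing period.

    These rules are an illustrative assumption, not the benchmark's own.
    """
    answer = answer.strip().lower()
    answer = answer.strip("$")
    answer = answer.rstrip(".")
    return answer.replace(" ", "")


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

Comparing this number before and after a change to the model gives a simple view of whether accuracy improved on the same set of problems.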

Limitations:
- One A100 GPU with 40 GB RAM on Google Colab
- Open-source models where fine-tuning is an option
- Important for Edge ML
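
To see why a single 40 GB GPU is a real constraint, the back-of-the-envelope sketch below estimates memory for model state alone. The 7-billion-parameter count is a hypothetical value (the README does not state a model size), the 16 bytes per parameter figure is the common rule of thumb for mixed-precision training with Adam (fp16 weights and gradients, fp32 master weights and two optimizer moments), and activations are ignored.

```python
GPU_MEMORY_GB = 40  # single A100 from the limitations listed above


def model_state_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Estimate memory for model state only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

# fp16 weights only (inference): 2 bytes per parameter.
inference_gb = model_state_memory_gb(NUM_PARAMS, 2)

# Full fine-tuning with Adam in mixed precision: roughly 16 bytes per parameter.
full_finetune_gb = model_state_memory_gb(NUM_PARAMS, 16)

print(f"inference: {inference_gb:.0f} GB, fits: {inference_gb <= GPU_MEMORY_GB}")
print(f"full fine-tune: {full_finetune_gb:.0f} GB, fits: {full_finetune_gb <= GPU_MEMORY_GB}")
```

Under these assumptions the weights fit for inference, while full fine-tuning would not fit on one such GPU. The README does not say which fine-tuning approach the project uses.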

Model: LLaVA-1.5 (Large Language and Vision Assistant)

Owner

Name: Nafiseh Ghoroghchian
Login: NaGho
Kind: user

Repositories: 1
Profile: https://github.com/NaGho

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: lmms-finetune
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jingyang
    family-names: Zhang
    email: zhjy227@gmail.com
    orcid: 'https://orcid.org/0000-0002-9771-5111'
  - given-names: Yueqian
    family-names: Lin
    email: yueqian.lin@duke.edu
    affiliation: Duke University
    orcid: 'https://orcid.org/0000-0003-1473-8981'
repository-code: 'https://github.com/zjysteven/lmms-finetune'
abstract: >-
  lmms-finetune is a lightweight, unified codebase for
  finetuning multiple latest multi-modal LLMs including
  llava-1.5/1.6/interleave/next-video/onevision,
  qwen-vl(-2), and phi3-v.
keywords:
  - LLM
  - foundation model
  - multi-modal LLM
license: Apache-2.0
