reasoning_multimodal_llms
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (4.0%) to scientific vocabulary
Scientific Fields
Artificial Intelligence and Machine Learning
Computer Science - 40% confidence
Last synced: 4 months ago
Repository
Basic Info
- Host: GitHub
- Owner: NaGho
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 15.2 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme
License
Citation
Support
README.md
reasoningmultimodalLLMs
This project implements a multimodal math word problem solver that integrates text descriptions and visual inputs (diagrams, charts).
Dataset: MATH-Vision
Problem Statement: Improve the accuracy of the answers generated by multimodal LLMs (mLLMs) on the MATH-Vision dataset.
Limitations:
- One A100 GPU (40 GB) on Google Colab
- Open-source models where fine-tuning is an option
- Important for edge ML
- LLaVA-1.5 (Large Language and Vision Assistant)
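The problem statement targets answer accuracy, so it helps to pin down one way that number could be computed. The following is a minimal sketch, assuming exact-match scoring after light normalization. The function names `normalize_answer` and `exact_match_accuracy` are hypothetical and do not come from this repository, and the official MATH-Vision evaluation may apply different matching rules.

```python
from typing import Iterable


def normalize_answer(answer: str) -> str:
    """Lowercase the answer and drop whitespace, '$' signs, and trailing periods."""
    cleaned = answer.strip().lower().replace("$", "")
    cleaned = "".join(cleaned.split())  # remove all internal whitespace
    return cleaned.rstrip(".")


def exact_match_accuracy(predictions: Iterable[str], references: Iterable[str]) -> float:
    """Fraction of predictions equal to their reference after normalization.

    Assumption: one predicted answer string per problem, in the same order
    as the reference answers.
    """
    preds = list(predictions)
    refs = list(references)
    if len(preds) != len(refs):
        raise ValueError("predictions and references must have the same length")
    if not preds:
        return 0.0
    correct = sum(normalize_answer(p) == normalize_answer(r) for p, r in zip(preds, refs))
    return correct / len(preds)
```

Calling `exact_match_accuracy(model_answers, gold_answers)` before and after fine-tuning would give a comparable pair of scores. This normalization is deliberately simple: numerically equivalent forms such as `1.0` and `1` are counted as different answers.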
Owner
- Name: Nafiseh Ghoroghchian
- Login: NaGho
- Kind: user
- Repositories: 1
- Profile: https://github.com/NaGho
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: lmms-finetune
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jingyang
    family-names: Zhang
    email: zhjy227@gmail.com
    orcid: 'https://orcid.org/0000-0002-9771-5111'
  - given-names: Yueqian
    family-names: Lin
    email: yueqian.lin@duke.edu
    affiliation: Duke University
    orcid: 'https://orcid.org/0000-0003-1473-8981'
repository-code: 'https://github.com/zjysteven/lmms-finetune'
abstract: >-
  lmms-finetune is a lightweight, unified codebase for
  finetuning multiple latest multi-modal LLMs including
  llava-1.5/1.6/interleave/next-video/onevision,
  qwen-vl(-2), and phi3-v.
keywords:
  - LLM
  - foundation model
  - multi-modal LLM
license: Apache-2.0
GitHub Events
Total
- Push event: 110
- Create event: 2
Last Year
- Push event: 110
- Create event: 2