https://github.com/adaptivemotorcontrollab/llavaction
Science Score: 20.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ✓ Committers with academic emails (5 of 24 committers, 20.8%, from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.7%, to scientific vocabulary)
Keywords
action-recognition
behavioral-analysis
llms
mmlms
Keywords from Contributors
transformer
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: AdaptiveMotorControlLab
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://mmathislab.github.io/llavaction/
- Size: 16.9 MB
Statistics
- Stars: 39
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Topics
action-recognition
behavioral-analysis
llms
mmlms
Created 11 months ago · Last pushed 8 months ago
https://github.com/AdaptiveMotorControlLab/LLaVAction/blob/main/
# LLaVAction: Evaluating and Training Multi-Modal Large Language Models for Action Recognition

[arXiv](https://arxiv.org/abs/2503.18712) · [Project page](https://mmathislab.github.io/llavaction/) · [Hugging Face](https://huggingface.co/MLAdaptiveIntelligence) · [Papers with Code](https://paperswithcode.com/sota/action-recognition-on-epic-kitchens-100?p=llavaction-evaluating-and-training-multi) · [Downloads](https://pepy.tech/project/llavaction) · [PyPI](https://badge.fury.io/py/llavaction)

## Abstract

Understanding human behavior requires measuring behavioral actions. Due to its complexity, behavior is best mapped onto a rich, semantic structure such as language. The recent development of multi-modal large language models (MLLMs) makes them promising candidates for a wide range of action understanding tasks. In this work, we focus on evaluating and then improving MLLMs' ability to perform action recognition. We reformulate EPIC-KITCHENS-100, one of the largest and most challenging egocentric action datasets, into a video multiple-question-answering benchmark (EPIC-KITCHENS-100-MQA). We show that when we sample difficult incorrect answers as distractors, leading MLLMs struggle to recognize the correct actions. We propose a series of methods that greatly improve the MLLMs' ability to perform action recognition, achieving state-of-the-art on the EPIC-KITCHENS-100 Challenge and outperforming GPT-4o by 21 points in accuracy on EPIC-KITCHENS-100-MQA. Lastly, we show improvements on other action-related video benchmarks such as VideoMME, PerceptionTest, and MVBench.

## Code

- This repository contains the implementation for our preprint on evaluating and training multi-modal large language models for action recognition.
- Our code is built on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), and the files in the directory `llavaction/action` are related to our work. We thank the authors of LLaVA-NeXT for making their code publicly available.
- The files in `/eval`, `/model`, `/serve`, and `/train` are taken directly from [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), except for the modified files noted below:
  - `/model/llava_arch.py`
  - `/model/language_model/llava_qwen.py`
  - `/train/train.py`
  - `/train/llava_trainer.py`
  - `/utils.py`

## Demo

[Open in Colab](https://colab.research.google.com/github/AdaptiveMotorControlLab/LLaVAction/blob/main/example/llavaction_video_demo.ipynb)

We provide code to run video inference in a Jupyter Notebook (which can be run on Google Colaboratory).

### Installation guide for video inference

```bash
conda create -n llavaction python=3.10 -y
conda activate llavaction
pip install --upgrade pip  # Enable PEP 660 support.
pip install --pre llavaction
```

- Please see the `/example` directory for a demo notebook.

## EPIC-KITCHENS-100-MQA

In our work, we introduce a new way to evaluate MLLMs for action recognition by casting EPIC-KITCHENS-100 into a multiple-question-answer benchmark. This has not yet been released [as of 3/2025], but please check the issues, or open an issue, if you are interested in accessing this resource before the paper is published. We also plan to integrate it into the package [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).

# Acknowledgments

We thank the Swiss AI Initiative (Project ID a03) from the Swiss National Supercomputing Centre (CSCS); the Boehringer Ingelheim Fonds PhD stipend (H.Q.); M.W.M. thanks the Vallee Foundation; M.W.M. and A.M. thank the SNSF (grant No. 320030-227871).
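The benchmark itself is not yet released, but the general idea of casting action recognition as multiple-choice QA can be illustrated with a small hypothetical sketch. Note that the paper samples *difficult* incorrect answers as distractors; the random sampling, function name, and prompt format below are purely illustrative, not the actual EPIC-KITCHENS-100-MQA protocol:

```python
import random

def make_mqa_item(correct_action, candidate_actions, n_distractors=4, seed=0):
    """Build one multiple-choice question for a video clip.

    Hypothetical sketch: draws random incorrect actions as distractors,
    whereas the real benchmark selects hard (similar) distractors.
    """
    rng = random.Random(seed)
    # Candidate pool excludes the ground-truth action.
    pool = [a for a in candidate_actions if a != correct_action]
    distractors = rng.sample(pool, n_distractors)
    # Mix the correct answer in among the distractors.
    options = distractors + [correct_action]
    rng.shuffle(options)
    letters = "ABCDE"
    question = "What action is the person performing?\n" + "\n".join(
        f"{letters[i]}. {opt}" for i, opt in enumerate(options)
    )
    answer = letters[options.index(correct_action)]
    return question, answer

actions = ["cut onion", "wash plate", "open fridge",
           "pour oil", "stir pan", "close drawer"]
question, answer = make_mqa_item("cut onion", actions)
```

The model's accuracy on such items then depends directly on how confusable the distractors are, which is why distractor difficulty matters for the evaluation.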
Owner
- Name: Mathis Lab | Adaptive Motor Control
- Login: AdaptiveMotorControlLab
- Kind: organization
- Email: mackenzie@post.harvard.edu
- Location: Swiss Federal Institute of Technology
- Website: http://mackenziemathislab.org
- Twitter: mwmathislab
- Repositories: 9
- Profile: https://github.com/AdaptiveMotorControlLab
Mechanisms underlying adaptive behavior in intelligent systems
GitHub Events
Total
- Create event: 14
- Issues event: 1
- Release event: 3
- Watch event: 36
- Delete event: 9
- Issue comment event: 2
- Member event: 2
- Push event: 18
- Pull request review event: 1
- Pull request event: 16
- Fork event: 3
Last Year
- Create event: 14
- Issues event: 1
- Release event: 3
- Watch event: 36
- Delete event: 9
- Issue comment event: 2
- Member event: 2
- Push event: 18
- Pull request review event: 1
- Pull request event: 16
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Bo Li | d****n@g****m | 410 |
| Bo Li | b****1@b****m | 111 |
| Ye Shaokai | s****e@h****h | 65 |
| shaokai ye | s****h@g****m | 52 |
| Ye Shaokai | s****e@s****h | 51 |
| kcz358 | k****8@o****m | 33 |
| Haozhe Qi | h****i@c****h | 26 |
| HaozheQi | 5****7@q****m | 25 |
| Mackenzie Mathis | m****s@e****h | 25 |
| chunyuan.li | c****i@b****m | 21 |
| Tianyi Xiong | x****2@g****m | 19 |
| ZhangYuanhan-AI | y****2@n****g | 18 |
| ChunyuanLI | C****I | 9 |
| Haozhe Qi | h****i@c****h | 5 |
| zhanghao.cx | z****x@b****m | 4 |
| jzhang38 | a****8@g****m | 2 |
| hzhangcx@connect.ust.hk | 9****5@q****m | 2 |
| Haozhe Qi | h****i@c****h | 2 |
| raushan | r****n@h****o | 1 |
| ngquangtrung57 | q****5@g****m | 1 |
| litianjian | l****n@b****m | 1 |
| Qi Haozhe | h****i@h****h | 1 |
| Haozhe Qi | h****i@t****h | 1 |
| Renrui Zhang | 5****r | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 9
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Total issue authors: 1
- Total pull request authors: 4
- Average comments per issue: 1.0
- Average comments per pull request: 0.22
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 1
- Pull requests: 9
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 1
- Pull request authors: 4
- Average comments per issue: 1.0
- Average comments per pull request: 0.22
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 2
Top Authors
Issue Authors
- NielsRogge (1)
Pull Request Authors
- MMathisLab (10)
- dependabot[bot] (4)
- HaozheQi (3)
- yeshaokai (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (4)
python (4)