https://github.com/adaptivemotorcontrollab/llavaction

Science Score: 20.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    5 of 24 committers (20.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

action-recognition behavioral-analysis llms mmlms

Keywords from Contributors

transformer
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 39
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Topics
action-recognition behavioral-analysis llms mmlms
Created 11 months ago · Last pushed 8 months ago

https://github.com/AdaptiveMotorControlLab/LLaVAction/blob/main/

# LLaVAction: Evaluating and Training Multi-Modal Large Language Models for Action Recognition

[![Static Badge](https://img.shields.io/badge/LLaVAction-paper-green)](https://arxiv.org/abs/2503.18712)
[![Demo Website](https://img.shields.io/badge/LLaVAction-website-red)](https://mmathislab.github.io/llavaction/)
[![llavaction-checkpoints](https://img.shields.io/badge/LLaVAction-checkpoints-blue)](https://huggingface.co/MLAdaptiveIntelligence)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/llavaction-evaluating-and-training-multi/action-recognition-on-epic-kitchens-100)](https://paperswithcode.com/sota/action-recognition-on-epic-kitchens-100?p=llavaction-evaluating-and-training-multi)

[![Downloads](https://static.pepy.tech/badge/llavaction)](https://pepy.tech/project/llavaction)
[![Downloads](https://static.pepy.tech/badge/llavaction/month)](https://pepy.tech/project/llavaction)
[![PyPI version](https://badge.fury.io/py/llavaction.svg)](https://badge.fury.io/py/llavaction)
![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-red)

## Abstract

Understanding human behavior requires measuring behavioral actions. Due to its complexity, behavior is best mapped onto a rich, semantic structure such as language. Recently developed multi-modal large language models (MLLMs) are promising candidates for a wide range of action understanding tasks. In this work, we focus on evaluating and then improving MLLMs to perform action recognition. We reformulate EPIC-KITCHENS-100, one of the largest and most challenging egocentric action datasets, into a video multiple question answering benchmark (EPIC-KITCHENS-100-MQA). We show that when difficult incorrect answers are sampled as distractors, leading MLLMs struggle to recognize the correct actions. We propose a series of methods that greatly improve the MLLMs' ability to perform action recognition, achieving state-of-the-art performance on the EPIC-KITCHENS-100 Challenge and outperforming GPT-4o by 21 points in accuracy on EPIC-KITCHENS-100-MQA. Lastly, we show improvements on other action-related video benchmarks such as VideoMME, PerceptionTest, and MVBench.

## Code

- This repository contains the implementation for our preprint on evaluating and training multi-modal large language models for action recognition.
- Our code is built on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), and the files in the `llavaction/action` directory are specific to our work. We thank the authors of LLaVA-NeXT for making their code publicly available.
- The files in the `/eval`, `/model`, `/serve`, and `/train` directories come directly from [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), except for the following files, which we modified:
  - `/model/llava_arch.py`  
  - `/model/language_model/llava_qwen.py`  
  - `/train/train.py`  
  - `/train/llava_trainer.py`  
  - `/utils.py` 

## Demo 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AdaptiveMotorControlLab/LLaVAction/blob/main/example/llavaction_video_demo.ipynb)
We provide code to run video inference in a Jupyter Notebook, which can also be run on Google Colab.

  
### Installation guide for video inference:
```bash
conda create -n llavaction python=3.10 -y
conda activate llavaction
pip install --upgrade pip  # Enable PEP 660 support.
pip install --pre llavaction
```

- Please see the `/example` directory for a demo notebook.
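The demo notebook runs inference on video clips; video MLLMs of this kind typically consume a fixed number of frames sampled uniformly from the clip. Below is a minimal sketch of that preprocessing step. The helper name and frame count are illustrative only, not part of the `llavaction` API:

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices evenly spaced across a clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: just use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]

# Example: a 300-frame clip sampled down to 8 frames.
print(uniform_frame_indices(300, 8))
```

The sampled indices would then be used to decode only those frames before passing them to the model.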

## EPIC-KITCHENS-100-MQA 

In our work, we introduce a new way to evaluate MLLMs for action recognition by casting EPIC-KITCHENS-100 into a multiple-question-answering benchmark. This benchmark has not yet been released [as of 3/2025]; please check the issues or open an issue if you are interested in accessing this resource before the paper is published. We also plan to integrate it into the [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) package.
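The core of the MQA construction is turning each annotated action into a multiple-choice question whose wrong options are hard distractors, i.e. incorrect actions that are similar to the ground truth. A self-contained sketch of that idea follows; the field names, question text, and similarity scores are our illustration, not the released benchmark format:

```python
import random

def build_mqa_item(ground_truth: str, candidate_actions: list[str],
                   similarity: dict[str, float], num_choices: int = 5,
                   seed: int = 0) -> dict:
    """Build one multiple-choice item: the correct action plus the
    most similar (hardest) incorrect actions as distractors."""
    pool = [a for a in candidate_actions if a != ground_truth]
    # Hard distractors: incorrect actions ranked by similarity to the answer.
    distractors = sorted(pool, key=lambda a: similarity.get(a, 0.0),
                         reverse=True)[:num_choices - 1]
    choices = distractors + [ground_truth]
    random.Random(seed).shuffle(choices)  # fixed seed for reproducibility
    return {
        "question": "What action is the person performing?",
        "choices": choices,
        "answer_index": choices.index(ground_truth),
    }

item = build_mqa_item(
    "cut onion",
    ["peel onion", "cut tomato", "wash pan", "open fridge", "slice bread"],
    {"peel onion": 0.9, "cut tomato": 0.8, "slice bread": 0.6,
     "wash pan": 0.1, "open fridge": 0.05},
)
```

With similarity-ranked distractors like these, an MLLM cannot rely on eliminating obviously unrelated options, which is what makes the benchmark harder than random-distractor multiple choice.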

# Acknowledgments 
We thank the Swiss AI Initiative (Project ID a03) of the Swiss National Supercomputing Centre (CSCS) and the Boehringer Ingelheim Fonds PhD stipend (H.Q.); M.W.M. thanks the Vallee Foundation; M.W.M. and A.M. thank the SNSF for grant No. 320030-227871.

![group-logo](https://github.com/user-attachments/assets/ad034dc3-5e92-4e8b-915b-85e443b3bdb2)

Owner

  • Name: Mathis Lab | Adaptive Motor Control
  • Login: AdaptiveMotorControlLab
  • Kind: organization
  • Email: mackenzie@post.harvard.edu
  • Location: Swiss Federal Institute of Technology

Mechanisms underlying adaptive behavior in intelligent systems

GitHub Events

Total
  • Create event: 14
  • Issues event: 1
  • Release event: 3
  • Watch event: 36
  • Delete event: 9
  • Issue comment event: 2
  • Member event: 2
  • Push event: 18
  • Pull request review event: 1
  • Pull request event: 16
  • Fork event: 3
Last Year
  • Create event: 14
  • Issues event: 1
  • Release event: 3
  • Watch event: 36
  • Delete event: 9
  • Issue comment event: 2
  • Member event: 2
  • Push event: 18
  • Pull request review event: 1
  • Pull request event: 16
  • Fork event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 886
  • Total Committers: 24
  • Avg Commits per committer: 36.917
  • Development Distribution Score (DDS): 0.537
Past Year
  • Commits: 410
  • Committers: 20
  • Avg Commits per committer: 20.5
  • Development Distribution Score (DDS): 0.778
Top Committers
Name Email Commits
Bo Li d****n@g****m 410
Bo Li b****1@b****m 111
Ye Shaokai s****e@h****h 65
shaokai ye s****h@g****m 52
Ye Shaokai s****e@s****h 51
kcz358 k****8@o****m 33
Haozhe Qi h****i@c****h 26
HaozheQi 5****7@q****m 25
Mackenzie Mathis m****s@e****h 25
chunyuan.li c****i@b****m 21
Tianyi Xiong x****2@g****m 19
ZhangYuanhan-AI y****2@n****g 18
ChunyuanLI C****I 9
Haozhe Qi h****i@c****h 5
zhanghao.cx z****x@b****m 4
jzhang38 a****8@g****m 2
hzhangcx@connect.ust.hk 9****5@q****m 2
Haozhe Qi h****i@c****h 2
raushan r****n@h****o 1
ngquangtrung57 q****5@g****m 1
litianjian l****n@b****m 1
Qi Haozhe h****i@h****h 1
Haozhe Qi h****i@t****h 1
Renrui Zhang 5****r 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 1
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Issue authors: 1
  • Pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • NielsRogge (1)
Pull Request Authors
  • MMathisLab (10)
  • dependabot[bot] (4)
  • HaozheQi (3)
  • yeshaokai (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (4)
  • python (4)