https://github.com/adaptivemotorcontrollab/llavaction

Science Score: 20.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    5 of 24 committers (20.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

action-recognition behavioral-analysis llms mmlms

Keywords from Contributors

transformer
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 39
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Topics
action-recognition behavioral-analysis llms mmlms
Created 11 months ago · Last pushed 8 months ago

https://github.com/AdaptiveMotorControlLab/LLaVAction/blob/main/

# LLaVAction: Evaluating and Training Multi-Modal Large Language Models for Action Recognition

[![Static Badge](https://img.shields.io/badge/LLaVAction-paper-green)](https://arxiv.org/abs/2503.18712)
[![Demo Website](https://img.shields.io/badge/LLaVAction-website-red)](https://mmathislab.github.io/llavaction/)
[![llavaction-checkpoints](https://img.shields.io/badge/LLaVAction-checkpoints-blue)](https://huggingface.co/MLAdaptiveIntelligence)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/llavaction-evaluating-and-training-multi/action-recognition-on-epic-kitchens-100)](https://paperswithcode.com/sota/action-recognition-on-epic-kitchens-100?p=llavaction-evaluating-and-training-multi)

[![Downloads](https://static.pepy.tech/badge/llavaction)](https://pepy.tech/project/llavaction)
[![Downloads](https://static.pepy.tech/badge/llavaction/month)](https://pepy.tech/project/llavaction)
[![PyPI version](https://badge.fury.io/py/llavaction.svg)](https://badge.fury.io/py/llavaction)
![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-red)

## Abstract

Understanding human behavior requires measuring behavioral actions. Due to its complexity, behavior is best mapped onto a rich, semantic structure such as language. Recently developed multi-modal large language models (MLLMs) are promising candidates for a wide range of action understanding tasks. In this work, we focus on evaluating and then improving MLLMs to perform action recognition. We reformulate EPIC-KITCHENS-100, one of the largest and most challenging egocentric action datasets, into a video multiple question answering benchmark (EPIC-KITCHENS-100-MQA). We show that when difficult incorrect answers are sampled as distractors, leading MLLMs struggle to recognize the correct actions. We propose a series of methods that greatly improve the MLLMs' ability to perform action recognition, achieving state-of-the-art performance on the EPIC-KITCHENS-100 Challenge and outperforming GPT-4o by 21 points in accuracy on EPIC-KITCHENS-100-MQA. Lastly, we show improvements on other action-related video benchmarks such as VideoMME, PerceptionTest, and MVBench.

## Code

- This repository contains the implementation for our preprint on evaluating and training multi-modal large language models for action recognition.
- Our code is built on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), and the files in the `llavaction/action` directory are specific to our work. We thank the authors of LLaVA-NeXT for making their code publicly available.
- The files in the `/eval`, `/model`, `/serve`, and `/train` directories come directly from [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), except for the following files, which we modified:
  - `/model/llava_arch.py`  
  - `/model/language_model/llava_qwen.py`  
  - `/train/train.py`  
  - `/train/llava_trainer.py`  
  - `/utils.py` 

## Demo 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AdaptiveMotorControlLab/LLaVAction/blob/main/example/llavaction_video_demo.ipynb)
We provide code to run video inference in a Jupyter Notebook, which can also be run on Google Colab.

  
### Installation guide for video inference:
```bash
conda create -n llavaction python=3.10 -y
conda activate llavaction
pip install --upgrade pip  # Enable PEP 660 support.
pip install --pre llavaction
```

- Please see the `/example` directory for a demo notebook.
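The demo notebook runs inference on video clips; video MLLMs of this kind typically consume a fixed number of frames sampled uniformly from the clip. Below is a minimal sketch of that preprocessing step. The helper name and frame count are illustrative only, not part of the `llavaction` API:

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices evenly spaced across a clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: just use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]

# Example: a 300-frame clip sampled down to 8 frames.
print(uniform_frame_indices(300, 8))
```

The sampled indices would then be used to decode only those frames before passing them to the model.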

## EPIC-KITCHENS-100-MQA 

In our work, we introduce a new way to evaluate MLLMs for action recognition by casting EPIC-KITCHENS-100 into a multiple-question-answering benchmark. This benchmark has not yet been released [as of 3/2025]; please check the issues or open an issue if you are interested in accessing this resource before the paper is published. We also plan to integrate it into the [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) package.
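The core of the MQA construction is turning each annotated action into a multiple-choice question whose wrong options are hard distractors, i.e. incorrect actions that are similar to the ground truth. A self-contained sketch of that idea follows; the field names, question text, and similarity scores are our illustration, not the released benchmark format:

```python
import random

def build_mqa_item(ground_truth: str, candidate_actions: list[str],
                   similarity: dict[str, float], num_choices: int = 5,
                   seed: int = 0) -> dict:
    """Build one multiple-choice item: the correct action plus the
    most similar (hardest) incorrect actions as distractors."""
    pool = [a for a in candidate_actions if a != ground_truth]
    # Hard distractors: incorrect actions ranked by similarity to the answer.
    distractors = sorted(pool, key=lambda a: similarity.get(a, 0.0),
                         reverse=True)[:num_choices - 1]
    choices = distractors + [ground_truth]
    random.Random(seed).shuffle(choices)  # fixed seed for reproducibility
    return {
        "question": "What action is the person performing?",
        "choices": choices,
        "answer_index": choices.index(ground_truth),
    }

item = build_mqa_item(
    "cut onion",
    ["peel onion", "cut tomato", "wash pan", "open fridge", "slice bread"],
    {"peel onion": 0.9, "cut tomato": 0.8, "slice bread": 0.6,
     "wash pan": 0.1, "open fridge": 0.05},
)
```

With similarity-ranked distractors like these, an MLLM cannot rely on eliminating obviously unrelated options, which is what makes the benchmark harder than random-distractor multiple choice.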

# Acknowledgments 
We thank the Swiss AI Initiative (Project ID a03) of the Swiss National Supercomputing Centre (CSCS) and the Boehringer Ingelheim Fonds PhD stipend (H.Q.); M.W.M. thanks the Vallee Foundation; M.W.M. and A.M. thank the SNSF for grant No. 320030-227871.

![group-logo](https://github.com/user-attachments/assets/ad034dc3-5e92-4e8b-915b-85e443b3bdb2)

Owner

  • Name: Mathis Lab | Adaptive Motor Control
  • Login: AdaptiveMotorControlLab
  • Kind: organization
  • Email: mackenzie@post.harvard.edu
  • Location: Swiss Federal Institute of Technology

Mechanisms underlying adaptive behavior in intelligent systems

GitHub Events

Total
  • Create event: 14
  • Issues event: 1
  • Release event: 3
  • Watch event: 36
  • Delete event: 9
  • Issue comment event: 2
  • Member event: 2
  • Push event: 18
  • Pull request review event: 1
  • Pull request event: 16
  • Fork event: 3
Last Year
  • Create event: 14
  • Issues event: 1
  • Release event: 3
  • Watch event: 36
  • Delete event: 9
  • Issue comment event: 2
  • Member event: 2
  • Push event: 18
  • Pull request review event: 1
  • Pull request event: 16
  • Fork event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 886
  • Total Committers: 24
  • Avg Commits per committer: 36.917
  • Development Distribution Score (DDS): 0.537
Past Year
  • Commits: 410
  • Committers: 20
  • Avg Commits per committer: 20.5
  • Development Distribution Score (DDS): 0.778
Top Committers
Name Email Commits
Bo Li d****n@g****m 410
Bo Li b****1@b****m 111
Ye Shaokai s****e@h****h 65
shaokai ye s****h@g****m 52
Ye Shaokai s****e@s****h 51
kcz358 k****8@o****m 33
Haozhe Qi h****i@c****h 26
HaozheQi 5****7@q****m 25
Mackenzie Mathis m****s@e****h 25
chunyuan.li c****i@b****m 21
Tianyi Xiong x****2@g****m 19
ZhangYuanhan-AI y****2@n****g 18
ChunyuanLI C****I 9
Haozhe Qi h****i@c****h 5
zhanghao.cx z****x@b****m 4
jzhang38 a****8@g****m 2
hzhangcx@connect.ust.hk 9****5@q****m 2
Haozhe Qi h****i@c****h 2
raushan r****n@h****o 1
ngquangtrung57 q****5@g****m 1
litianjian l****n@b****m 1
Qi Haozhe h****i@h****h 1
Haozhe Qi h****i@t****h 1
Renrui Zhang 5****r 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 1
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Issue authors: 1
  • Pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • NielsRogge (1)
Pull Request Authors
  • MMathisLab (10)
  • dependabot[bot] (4)
  • HaozheQi (3)
  • yeshaokai (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (4)
  • python (4)