https://github.com/google-deepmind/lm_act

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Basic Info
Statistics
  • Stars: 17
  • Watchers: 7
  • Forks: 5
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
  • Readme
  • Contributing
  • License

README.md

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

[Overview figure]

This repository provides an implementation of our ICML 2025 paper LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations.

In this paper, we present a benchmark to pressure-test today’s frontier models’ multimodal decision-making capabilities in the very long-context regime (up to one million tokens) and investigate whether these models can learn from large numbers of expert demonstrations in their context. We evaluate the performance of Claude 3.5 Sonnet, Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash Experimental, GPT-4o, o1-mini, o1-preview, and o1 as policies across a battery of simple interactive decision-making tasks: playing tic-tac-toe, chess, and Atari, navigating grid worlds, solving crosswords, and controlling a simulated cheetah. We study increasing amounts of expert demonstrations in the context, from no demonstrations up to 512 full episodes. Across our tasks, models rarely manage to fully reach expert performance, and presenting more demonstrations often has little effect, although some models steadily improve with more demonstrations on a few tasks. We also investigate the effect of encoding observations as text or images and the impact of chain-of-thought prompting. To help quantify the impact of other approaches and future innovations, we open-source our benchmark, which covers the zero-, few-, and many-shot regimes in a unified evaluation.
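As a rough sketch of what the many-shot setup looks like, the snippet below concatenates expert episodes ahead of the current observation to form a single long prompt. The function and its names are hypothetical, purely for illustration; the repository's actual prompt construction lives in src/prompts.py.

```python
from typing import Sequence


def build_prompt(demonstrations: Sequence[str], observation: str) -> str:
    """Concatenates rendered expert episodes ahead of the current observation.

    Hypothetical sketch: the real prompt format is defined in src/prompts.py.
    """
    parts = ["You are acting in an environment. Expert demonstrations follow."]
    for i, episode in enumerate(demonstrations, start=1):
        parts.append(f"Demonstration {i}:\n{episode}")
    parts.append(f"Current observation:\n{observation}\nNext action:")
    return "\n\n".join(parts)
```

With up to 512 full episodes rendered as text or images, such prompts approach the million-token regime the benchmark targets.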

Contents

```
.
├── crafter              - Crafter (needs to be downloaded)
├── data                 - Expert demonstrations (need to be downloaded)
├── src
│   ├── agents
│   │   ├── chess.py         - Stockfish agent (chess expert)
│   │   ├── crossword.py     - Oracle agent (crossword expert)
│   │   ├── grid_world.py    - Shortest path agent (grid world expert)
│   │   ├── random.py        - Random action agent
│   │   └── tic_tac_toe.py   - Minimax agent (tic-tac-toe expert)
│   ├── bagz.py              - Readers for our .bag data files
│   ├── config.py            - Experiment configurations
│   ├── constants.py         - Project constants
│   ├── environments
│   │   ├── chess.py         - Chess environment
│   │   ├── crossword.py     - Crossword environment
│   │   ├── dm_control.py    - DM Control environment
│   │   ├── grid_world.py    - Grid world environment
│   │   └── tic_tac_toe.py   - Tic-tac-toe environment
│   ├── evaluate.py          - Evaluation loop
│   ├── interfaces.py        - Project interfaces
│   ├── main.py              - Experiment launch script
│   └── prompts.py           - Prompt-building functionality
├── Stockfish            - Stockfish (needs to be installed)
├── README.md
└── requirements.txt     - Dependencies
```

Installation

Clone the source code into a local directory:

```bash
git clone https://github.com/google-deepmind/lm_act.git
cd lm_act
```

This repository requires Python 3.11, and pip install -r requirements.txt will install all required dependencies. This is best done inside a conda environment. To that end, install Anaconda, then create and activate the environment:

```bash
conda create --name lm_act python=3.11
conda activate lm_act
```

Install pip and use it to install all the dependencies:

```bash
conda install pip
pip install -r requirements.txt
```

Installing Crafter

Download the crafter repository:

```bash
git clone https://github.com/danijar/crafter.git
```

Installing Stockfish

Download and compile the latest version of Stockfish (for Unix-like systems):

```bash
git clone https://github.com/official-stockfish/Stockfish.git
cd Stockfish/src
make -j profile-build ARCH=x86-64-avx2
cd ../..
```

Downloading the Expert Demonstrations

To download our expert demonstrations to the correct locations, run the following command:

```bash
cd data
./download.sh
cd ..
```

Usage

Before running any code, make sure to activate the conda environment and set the PYTHONPATH:

```bash
conda activate lm_act
export PYTHONPATH=$(pwd)/..
```

To evaluate an agent, run the following command:

```bash
python src/main.py \
  --environment=tic_tac_toe \
  --observation_type=txt \
  --agent=random \
  --num_demonstrations=0
```
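To chart performance as a function of context length, one can sweep over demonstration counts. The loop below is only a sketch reusing the flags from the command above; which counts, agents, and environments are actually supported is defined in src/config.py.

```python
import subprocess

# Sweep over demonstration counts, from zero-shot up to the 512 full
# episodes studied in the paper (the intermediate counts are an assumption).
for n in [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    subprocess.run(
        [
            "python", "src/main.py",
            "--environment=tic_tac_toe",
            "--observation_type=txt",
            "--agent=random",
            f"--num_demonstrations={n}",
        ],
        check=True,
    )
```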

Citing this work

```latex
@inproceedings{ruoss2025lmact,
  author    = {Anian Ruoss and Fabio Pardo and Harris Chan and Bonnie Li and Volodymyr Mnih and Tim Genewein},
  title     = {{LMAct}: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations},
  booktitle = {{ICML}},
  year      = {2025},
}
```

License and disclaimer

Copyright 2024 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

GitHub Events

Total
  • Watch event: 23
  • Issue comment event: 2
  • Push event: 2
  • Public event: 1
  • Pull request event: 2
  • Fork event: 5
Last Year
  • Watch event: 23
  • Issue comment event: 2
  • Push event: 2
  • Public event: 1
  • Pull request event: 2
  • Fork event: 5

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 2
  • Total Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Anian Ruoss (a****r@g****m): 2 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 21 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 2.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 21 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 2.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • BrianPulfer (2)