Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary
Repository
A framework for evaluating LLMs in Atari games
Basic Info
Statistics
- Stars: 15
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🎮 ALE-NL: The Arcade Learning Environment in Natural Language
Please refer to the project report for a comprehensive overview.
ALE-NL extends the Arcade Learning Environment (ALE) to Large Language Models (LLMs), enabling LLMs to interact with and be evaluated on Atari games through natural language. Built on top of OCAtari, it enables systematic, interpretable, and reproducible benchmarking of LLMs on classic Atari games.
🧠 Overview
ALE-NL translates game states into natural language descriptions that LLMs can easily consume. It provides a simple yet powerful interface to:
- Benchmark LLMs on Atari tasks 🏆
- Analyze and visualize behavior 🤖📊
- Reproduce results with ease 🔁
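To make the idea of turning a game state into text concrete, here is a minimal sketch. It assumes a hypothetical list of detected objects, each with a category label and an (x, y) pixel position, similar in spirit to the object-centric states OCAtari extracts. The GameObject class and describe_state function are illustrative only and are not ALE-NL's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameObject:
    """Hypothetical detected object: a category label plus its pixel position."""
    category: str
    x: int
    y: int


def describe_state(objects: List[GameObject]) -> str:
    """Render detected objects as one short English sentence per object."""
    if not objects:
        return "No objects are visible on the screen."
    return "\n".join(f"{obj.category} is at (x={obj.x}, y={obj.y})." for obj in objects)


if __name__ == "__main__":
    frame_objects = [GameObject("Player", 77, 185), GameObject("Alien", 40, 60)]
    print(describe_state(frame_objects))
```

The real per-game state descriptions live in src/captions/games/ and can add game-specific detail; this sketch only shows the general shape of the translation.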
✅ Features
- [x] 12 Atari games supported (adding more!): Asterix, BattleZone, BeamRider, Bowling, Boxing, Breakout, DemonAttack, Freeway, KungfuMaster, MsPacman, Seaquest, SpaceInvaders
- [x] Run any HuggingFace text-generation model locally 💻
- [x] Run OpenAI models via API ☁️
- [x] Modular and customizable prompting strategies (CoT, zero-shot, few-shot)
- [x] Easy ablation of sampling parameters (temperature, context length, etc.)
- [x] One-click benchmarking: plot_benchmark_results.ipynb
- [x] Visual + statistical debugging
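Because only the games listed above are supported, it can help to validate an environment ID before launching a run. The following is a minimal sketch; SUPPORTED_GAMES and check_env_id are hypothetical names, and the set simply mirrors the 12 games named above, spelled exactly as they appear there.

```python
# Hypothetical allow-list mirroring the supported games listed in this README.
SUPPORTED_GAMES = frozenset({
    "Asterix", "BattleZone", "BeamRider", "Bowling", "Boxing", "Breakout",
    "DemonAttack", "Freeway", "KungfuMaster", "MsPacman", "Seaquest", "SpaceInvaders",
})


def check_env_id(env_id: str) -> str:
    """Return env_id unchanged if it is supported, otherwise raise a helpful ValueError."""
    if env_id not in SUPPORTED_GAMES:
        options = ", ".join(sorted(SUPPORTED_GAMES))
        raise ValueError(f"Unsupported env_id {env_id!r}; choose one of: {options}")
    return env_id
```

A check like this could be applied to the --env_id value before starting a long benchmarking run, so a typo fails immediately instead of after setup.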
✨ Prompting Pipeline
Prompt templates are modularly composed from:
- Game Descriptions: Loaded from src/captions/game_descriptions/ (taken from the ALE documentation).
- Prompt Chains: Found in prompt_chains/ to enable CoT, zero-shot, few-shot, etc.
- State Descriptions: Defined per game in src/captions/games/. Customizable for each game.
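The composition described above can be sketched as plain string templating. This is a minimal sketch under stated assumptions: the game and state descriptions arrive as strings, and the $game_description / $state_description placeholders, the two template texts, and compose_prompt are hypothetical; they are not the actual format of the files in prompt_chains/.

```python
from string import Template

# Hypothetical templates: the $-placeholders are an assumption of this sketch.
ZERO_SHOT = Template(
    "Game rules:\n$game_description\n\n"
    "Current state:\n$state_description\n\n"
    "Answer with your next action."
)
CHAIN_OF_THOUGHT = Template(
    "Game rules:\n$game_description\n\n"
    "Current state:\n$state_description\n\n"
    "Think step by step, then answer with your next action."
)


def compose_prompt(chain: Template, game_description: str, state_description: str) -> str:
    """Fill a prompt-chain template with the static game text and the per-step state text."""
    return chain.substitute(
        game_description=game_description,
        state_description=state_description,
    )


if __name__ == "__main__":
    rules = "Guide the chicken across ten lanes of traffic."  # illustrative text only
    state = "Chicken is at (x=44, y=180). Car is at (x=60, y=170)."
    print(compose_prompt(CHAIN_OF_THOUGHT, rules, state))
```

Keeping the three parts separate means that, in this sketch, switching between zero-shot and CoT only requires passing a different template, while the game and state descriptions stay untouched.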
⚙️ Installation
We recommend using conda, but any Python 3.8+ virtual environment should work.
```bash
conda create -n ale-nlp python=3.8 -y
conda activate ale-nlp
```
☁️ Running OpenAI Models
Only requires OpenAI's API client:
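- Make sure your pip is up to date by running pip install --upgrade pip, then install the API requirements:
```bash
pip install -r requirements_api.txt
```
- Set your OpenAI API key, either with export OPENAI_API_KEY=<your_key> or, to store it in the conda environment, with conda env config vars set OPENAI_API_KEY=<your_key> (reactivate the environment afterwards for the change to take effect).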
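Since OpenAI runs need the key in the environment, a quick pre-flight check can save a failed run. The sketch below only reads the OPENAI_API_KEY variable named above (the official openai Python client also reads this variable by default); require_openai_key is a hypothetical helper, not part of ALE-NL.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from `env` (default: the process environment) or exit with a hint."""
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise SystemExit(
            "OPENAI_API_KEY is not set: export it, or use "
            "'conda env config vars set OPENAI_API_KEY=<your_key>' and reactivate the environment."
        )
    return key
```

Calling require_openai_key() at the start of a script stops it with a clear message instead of failing later at the first API request.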
🧠 Running LLMs Locally
Install dependencies with CUDA support:
```bash
conda install -c conda-forge cudatoolkit-dev
pip install "transformers[torch]"
pip install -r requirements_local.txt
```
🔁 Final Setup
Finally, install ALE-NL itself in editable mode (this step is required):
```bash
pip install -e .
```
🚀 Running
Running any LLM in an Atari game is just one command away!
Simply pass the appropriate model name and environment ID to src/run.py:
- <LLM_NAME>: Must be a valid model ID from either:
  - 🤗 HuggingFace (e.g., Qwen/Qwen2-0.5B)
  - 🧠 OpenAI (e.g., gpt-3.5-turbo-0125)
- <ENV_ID>: The Atari game name (e.g., SpaceInvaders, MsPacman, Asterix, ...)
```bash
python src/run.py --model_name=<LLM_NAME> --env_id=<ENV_ID>
```
Additional options can be passed for fine-grained control:
- --prompt_chain_path: Selects a prompting strategy
- --temperature: Controls sampling randomness
- --context_length: Limits the LLM input length
- ...and more!
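As a refresher on what a temperature parameter does conceptually: logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the top choice and values above 1 flatten it. The sketch below illustrates this standard technique over a few hypothetical action logits; it is not ALE-NL's code, since the --temperature value is presumably forwarded to the model backend (HuggingFace or the OpenAI API), which performs the actual sampling.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = {k: v / temperature for k, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


if __name__ == "__main__":
    logits = {"FIRE": 2.0, "LEFT": 1.0, "RIGHT": 0.5}  # hypothetical action scores
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, {k: round(p, 3) for k, p in probs.items()})
```

Running it shows the probability of the top action shrinking as the temperature grows, which is why higher temperatures make an agent's behavior less deterministic.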
For example, to run gpt-3.5-turbo-0125 on SpaceInvaders with a CoT prompting strategy:
```bash
python src/run.py --model_name=gpt-3.5-turbo-0125 --env_id=SpaceInvaders --prompt_chain_path=prompt_chains/think_stepbystep
```
To run gpt-4o on Freeway with a zero-shot prompting strategy:
```bash
python src/run.py --model_name=gpt-4o --env_id=Freeway --prompt_chain_path=prompt_chains/simple
```
If the local dependencies are installed, to run Qwen/Qwen2.5-0.5B on SpaceInvaders with a CoT prompting strategy:
```bash
python src/run.py --model_name=Qwen/Qwen2.5-0.5B --env_id=SpaceInvaders --prompt_chain_path=prompt_chains/think_stepbystep
```
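(You can stop a run at any time with Ctrl+C. Alternatively, suspend it with Ctrl+Z and then run pkill python; note that pkill python targets every process whose name matches python, not just ALE-NL.)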
📁 All logs, outputs, videos, and full interaction traces are automatically saved in the results directory.
👉 For the full list of options, check src/run.py.
👉 After running, you can visualize the results with plot/plot_benchmark_results.ipynb.
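To benchmark several games or models in one go, the single-run command can be wrapped in a loop. The sketch below builds the same src/run.py invocations shown above using only the documented flags (--model_name, --env_id, --prompt_chain_path) and runs them one after another with subprocess. build_command, run_sweep, and the dry_run switch are hypothetical conveniences, and the sketch assumes it is launched from the repository root.

```python
import itertools
import subprocess
import sys
from typing import List, Optional


def build_command(model_name: str, env_id: str, prompt_chain_path: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one src/run.py invocation."""
    cmd = [sys.executable, "src/run.py", f"--model_name={model_name}", f"--env_id={env_id}"]
    if prompt_chain_path is not None:
        cmd.append(f"--prompt_chain_path={prompt_chain_path}")
    return cmd


def run_sweep(models: List[str], games: List[str],
              prompt_chain_path: Optional[str] = None, dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only print) every model/game combination in sequence."""
    commands = [build_command(m, g, prompt_chain_path) for m, g in itertools.product(models, games)]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop the sweep if any run fails
    return commands


if __name__ == "__main__":
    run_sweep(["gpt-4o"], ["Freeway", "SpaceInvaders"], "prompt_chains/simple")
```

With dry_run=False each run executes for real, and its outputs are saved in the results directory just like a manually launched run.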
📬 Contribute or Explore More
Got a new game, prompt strategy, or LLM you want to try? Contributions and suggestions are welcome!
📚 Citation
If you use ALE-NL in your research, please consider citing it using the following format:
```bibtex
@misc{ale-nl2025,
  title  = {ALE-NL: The Arcade Learning Environment in Natural Language},
  author = {Creus Castanyer, Roger},
  year   = {2025},
  url    = {https://github.com/roger-creus/ale-nl},
  note   = {Accessed: 2025-04-16}
}
```
Owner
- Name: Roger Creus
- Login: roger-creus
- Kind: user
- Location: Montréal, Québec, Canada.
- Company: Mila Québec
- Website: https://roger-creus.github.io/
- Twitter: creus_roger
- Repositories: 13
- Profile: https://github.com/roger-creus
Research MSc @mila-iqia @montrealrobotics. Deep Reinforcement Learning
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use ALE-NL in your research, please cite it as below."
title: "ALE-NL: The Arcade Learning Environment in Natural Language"
authors:
  - family-names: Creus Castanyer
    given-names: Roger
    affiliation: Mila Quebec AI Institute and University of Montreal
    orcid: "https://orcid.org/0000-0003-1952-3357"
date-released: 2025-04-16
version: "0.1.0"
repository-code: "https://github.com/roger-creus/ale-nl"
license: MIT
```
GitHub Events
Total
- Watch event: 13
- Push event: 1
- Public event: 1
- Fork event: 1
Last Year
- Watch event: 13
- Push event: 1
- Public event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- gymnasium *
- ocatari *
- openai *
- flash-attn *
- gymnasium *
- ocatari *
- openai *







