openhands-versa
Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: adityasoni9998
- License: mit
- Language: Python
- Default Branch: main
- Size: 125 MB
Statistics
- Stars: 65
- Watchers: 0
- Forks: 7
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
OpenHands-Versa: Coding Agents with Multimodal Browsing are Generalist Problem Solvers
This repository is the reference implementation for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers". It contains scripts for reproducing the experiments in the paper.
**OpenHands-Versa currently ranks #1 on the SWE-Bench Multimodal and The Agent Company leaderboards.**
Overview
Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. In addition, AI agents have been specialized for domains such as software engineering, web navigation, and workflow automation. However, this results in agents that are good for one thing but fail to generalize beyond their intended scope. One reason for this is that agent developers provide a highly specialized set of tools or make architectural decisions optimized for a specific use case or benchmark. In this work, we ask the question: what is the minimal set of general tools that can be used to achieve high performance across a diverse set of tasks? Our answer is OpenHands-Versa, a generalist agent built with a modest number of general tools: code editing and execution, web search, as well as multimodal web browsing and file access. Importantly, OpenHands-Versa demonstrates superior or competitive performance over leading specialized agents across three diverse and challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, outperforming the best-performing previously published results with absolute improvements in success rate of 9.1, 1.3, and 9.1 points respectively. Further, we show how existing state-of-the-art multi-agent systems fail to generalize beyond their target domains. These results demonstrate the feasibility of developing a generalist agent to solve diverse tasks and establish OpenHands-Versa as a strong baseline for future research.
Installation and LLM Configuration
OpenHands-Versa is built on top of OpenHands, a popular framework for open-source AI agents, and its installation instructions are similar to those of OpenHands. We require sudo access to the machine because experiments on The Agent Company need root privileges. All our experiments were run on Ubuntu (>=22.04), and we provide installation instructions for it below:
1. Pre-requisites:
- Docker
- Conda
- OS-specific dependencies:
  - Ubuntu: build-essential => sudo apt-get install build-essential
Make sure all of these dependencies are installed before moving on to the next steps.
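As a quick sanity check before continuing, the prerequisites above can be verified from Python. This is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and using `gcc` as a stand-in for the `build-essential` package and `make` for the later build step are assumptions.

```python
import shutil

# Executables we expect on PATH. `gcc` and `make` are assumed proxies for
# the build-essential package; adjust the tuple to match your setup.
REQUIRED_TOOLS = ("docker", "conda", "gcc", "make")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

Note that this only confirms the executables exist; it does not check that the Docker daemon is running or that your user has sudo access.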
2. Build and Setup The Environment
We recommend creating a conda environment for installing the dependencies, as shown below:
```bash
# Install Python=3.12, nodejs>=22.x, and poetry
conda create -n ohversa python=3.12
conda activate ohversa
conda install -c conda-forge "nodejs>=22"
conda install conda-forge::poetry
```
From the root directory of the project, run the following command to build OpenHands-Versa and make sure it is ready to run on your system:
```bash
make build
```
3. Configuring the Language Model
OpenHands-Versa supports a diverse array of large language models (LLMs) through the litellm library. We use claude-3-7-sonnet-20250219 and claude-sonnet-4-20250514 for our experiments. You can configure the LLMs by creating a config.toml file in the project root directory, modeled on config_example.toml.
For details on support for other operating systems, support for other LLMs, and debugging tips, please refer to Development.md.
Reproducing Results
We benchmark OpenHands-Versa on three popular and challenging agent benchmarks: GAIA, The Agent Company, and SWE-Bench Multimodal. For instructions on reproducing our results, please refer to the respective README.md files for GAIA, The Agent Company, and SWE-Bench Multimodal. Note that our search tool uses the Tavily API, so running the experiments requires a search API key.
Note
The methodology in OpenHands-Versa has also been implemented in upstream OpenHands, and we recommend using the upstream repository if you want to use OpenHands-Versa in your own work.
License
Distributed under the MIT License. See LICENSE for more information.
Cite
@misc{soni2025codingagentsmultimodalbrowsing,
title={Coding Agents with Multimodal Browsing are Generalist Problem Solvers},
author={Aditya Bharat Soni and Boxuan Li and Xingyao Wang and Valerie Chen and Graham Neubig},
year={2025},
eprint={2506.03011},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.03011},
}
Owner
- Login: adityasoni9998
- Kind: user
- Repositories: 1
- Profile: https://github.com/adityasoni9998
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
title: "OpenHands: An Open Platform for AI Software Developers as Generalist Agents"
authors:
- family-names: Wang
given-names: Xingyao
- family-names: Li
given-names: Boxuan
- family-names: Song
given-names: Yufan
- family-names: Xu
given-names: Frank F.
- family-names: Tang
given-names: Xiangru
- family-names: Zhuge
given-names: Mingchen
- family-names: Pan
given-names: Jiayi
- family-names: Song
given-names: Yueqi
- family-names: Li
given-names: Bowen
- family-names: Singh
given-names: Jaskirat
- family-names: Tran
given-names: Hoang H.
- family-names: Li
given-names: Fuqiang
- family-names: Ma
given-names: Ren
- family-names: Zheng
given-names: Mingzhang
- family-names: Qian
given-names: Bill
- family-names: Shao
given-names: Yanjun
- family-names: Muennighoff
given-names: Niklas
- family-names: Zhang
given-names: Yizhe
- family-names: Hui
given-names: Binyuan
- family-names: Lin
given-names: Junyang
- family-names: Brennan
given-names: Robert
- family-names: Peng
given-names: Hao
- family-names: Ji
given-names: Heng
- family-names: Neubig
given-names: Graham
year: 2024
doi: "10.48550/arXiv.2407.16741"
url: "https://arxiv.org/abs/2407.16741"
GitHub Events
Total
- Issues event: 2
- Watch event: 35
- Issue comment event: 6
- Push event: 12
- Pull request event: 5
- Fork event: 6
- Create event: 9
Last Year
- Issues event: 2
- Watch event: 35
- Issue comment event: 6
- Push event: 12
- Pull request event: 5
- Fork event: 6
- Create event: 9
Dependencies
- Mattraks/delete-workflow-runs v2 composite
- actions/checkout v4 composite
- actions/deploy-pages v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- actions/upload-pages-artifact v3 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- docker/setup-buildx-action v3 composite
- KeisukeYamashita/create-comment v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- google-github-actions/auth v2 composite
- google-github-actions/upload-cloud-storage v2 composite
- slackapi/slack-github-action v2.0.0 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- codecov/codecov-action v5 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- codecov/codecov-action v5 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3.6.0 composite
- KeisukeYamashita/create-comment v1 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/github-script v7 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- codecov/codecov-action v5 composite
- docker/setup-buildx-action v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- snok/install-poetry v1.4.1 composite
- KeisukeYamashita/create-comment v1 composite
- actions/checkout v4 composite
- actions/stale v9 composite
- node 21.7.2-bookworm-slim build
- python 3.12.3-slim build
- dind latest build
- openhands latest build
- ubuntu 22.04 build
- ubuntu 22.04 build
- openhands latest
- python 3.12-bookworm build
- python 3.12-bookworm build
- python 3.12-bookworm build
- python 3.11-bookworm build
- python 3.12-bookworm build
- 1273 dependencies
- @docusaurus/module-type-aliases ^3.5.1 development
- @docusaurus/tsconfig ^3.7.0 development
- @docusaurus/types ^3.5.1 development
- typescript ~5.8.2 development
- @docusaurus/core ^3.7.0
- @docusaurus/plugin-content-pages ^3.7.0
- @docusaurus/preset-classic ^3.7.0
- @docusaurus/theme-mermaid ^3.7.0
- @mdx-js/react ^3.1.0
- clsx ^2.0.0
- prism-react-renderer ^2.4.1
- react ^19.0.0
- react-dom ^19.0.0
- react-icons ^5.5.0
- react-use ^17.6.0
- 1197 dependencies
- @mswjs/socket.io-binding ^0.1.1 development
- @playwright/test ^1.51.0 development
- @react-router/dev ^7.3.0 development
- @tailwindcss/typography ^0.5.16 development
- @tanstack/eslint-plugin-query ^5.67.2 development
- @testing-library/dom ^10.4.0 development
- @testing-library/jest-dom ^6.6.1 development
- @testing-library/react ^16.2.0 development
- @testing-library/user-event ^14.6.1 development
- @types/node ^22.13.9 development
- @types/react ^19.0.8 development
- @types/react-dom ^19.0.3 development
- @types/react-highlight ^0.12.8 development
- @types/react-syntax-highlighter ^15.5.13 development
- @types/ws ^8.18.0 development
- @typescript-eslint/eslint-plugin ^7.18.0 development
- @typescript-eslint/parser ^7.18.0 development
- @vitest/coverage-v8 ^3.0.8 development
- autoprefixer ^10.4.20 development
- cross-env ^7.0.3 development
- eslint ^8.57.0 development
- eslint-config-airbnb ^19.0.4 development
- eslint-config-airbnb-typescript ^18.0.0 development
- eslint-config-prettier ^10.1.1 development
- eslint-plugin-import ^2.29.1 development
- eslint-plugin-jsx-a11y ^6.10.2 development
- eslint-plugin-prettier ^5.2.3 development
- eslint-plugin-react ^7.37.4 development
- eslint-plugin-react-hooks ^4.6.2 development
- husky ^9.1.6 development
- jsdom ^26.0.0 development
- lint-staged ^15.4.3 development
- msw ^2.6.6 development
- postcss ^8.5.2 development
- prettier ^3.5.3 development
- stripe ^17.7.0 development
- tailwindcss ^3.4.17 development
- typescript ^5.8.2 development
- vite-plugin-svgr ^4.2.0 development
- vite-tsconfig-paths ^5.1.4 development
- vitest ^3.0.2 development
- @heroui/react 2.7.4
- @monaco-editor/react ^4.7.0-rc.0
- @react-router/node ^7.3.0
- @react-router/serve ^7.3.0
- @react-types/shared ^3.28.0
- @reduxjs/toolkit ^2.6.0
- @stripe/react-stripe-js ^3.3.0
- @stripe/stripe-js ^5.10.0
- @tanstack/react-query ^5.67.2
- @vitejs/plugin-react ^4.3.2
- @xterm/addon-fit ^0.10.0
- @xterm/xterm ^5.4.0
- axios ^1.8.2
- clsx ^2.1.1
- eslint-config-airbnb-typescript ^18.0.0
- framer-motion ^12.4.10
- i18next ^24.2.2
- i18next-browser-languagedetector ^8.0.4
- i18next-http-backend ^3.0.2
- isbot ^5.1.23
- jose ^6.0.8
- monaco-editor ^0.52.2
- posthog-js ^1.229.3
- react ^19.0.0
- react-dom ^19.0.0
- react-highlight ^0.15.0
- react-hot-toast ^2.5.1
- react-i18next ^15.4.1
- react-icons ^5.5.0
- react-markdown ^10.1.0
- react-redux ^9.2.0
- react-router ^7.3.0
- react-syntax-highlighter ^15.6.1
- react-textarea-autosize ^8.5.7
- remark-gfm ^4.0.1
- sirv-cli ^3.0.1
- socket.io-client ^4.8.1
- tailwind-merge ^3.0.2
- vite ^6.2.1
- web-vitals ^3.5.2
- ws ^8.18.1
- ipython *
- matplotlib *
- networkx *
- nltk *
- opencv-python *
- pandas ==1.4.4
- python-dateutil *
- pytz *
- pyyaml *
- scipy ==1.10.1
- seaborn *
- statsmodels *
- sympy *
- visdom *
- 384 dependencies