openhands-versa

Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"

https://github.com/adityasoni9998/openhands-versa

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: adityasoni9998
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 125 MB
Statistics
  • Stars: 65
  • Watchers: 0
  • Forks: 7
  • Open Issues: 4
  • Releases: 0
Created 9 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

OpenHands-Versa: Coding Agents with Multimodal Browsing are Generalist Problem Solvers

This repository is a reference implementation of the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers". It contains scripts for reproducing the experiments in the paper.

OpenHands-Versa currently ranks #1 on the SWE-Bench Multimodal and The Agent Company leaderboards.

Overview

Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. In addition, AI agents have been specialized for domains such as software engineering, web navigation, and workflow automation. However, this results in agents that are good for one thing but fail to generalize beyond their intended scope. One reason for this is that agent developers provide a highly specialized set of tools or make architectural decisions optimized for a specific use case or benchmark. In this work, we ask the question: what is the minimal set of general tools that can be used to achieve high performance across a diverse set of tasks? Our answer is OpenHands-Versa, a generalist agent built with a modest number of general tools: code editing and execution, web search, as well as multimodal web browsing and file access. Importantly, OpenHands-Versa demonstrates superior or competitive performance over leading specialized agents across three diverse and challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, outperforming the best-performing previously published results with absolute improvements in success rate of 9.1, 1.3, and 9.1 points respectively. Further, we show how existing state-of-the-art multi-agent systems fail to generalize beyond their target domains. These results demonstrate the feasibility of developing a generalist agent to solve diverse tasks and establish OpenHands-Versa as a strong baseline for future research.

Installation and LLM Configuration

OpenHands-Versa is built on top of OpenHands, a popular framework for open-source AI agents, and its installation instructions are similar to those of OpenHands. We require sudo access to the machine because experiments on The Agent Company need root privileges. All our experiments were run on Ubuntu (>=22.04), and we provide installation instructions for that OS below:

1. Prerequisites

  • Docker
  • Conda
  • OS-specific dependencies:
    • Ubuntu: build-essential => sudo apt-get install build-essential

Make sure all of these dependencies are installed before moving on to the next steps.
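As a quick sanity check for the prerequisites above, the following minimal sketch reports which tools cannot be found on `PATH`. The command names (`docker`, `conda`, `gcc`, `make`) are assumptions made for illustration: the README only names Docker, Conda, and build-essential, and on some setups `conda` is exposed as a shell function rather than an executable, so a "missing" result for it may be a false alarm.

```python
import shutil

# Command names are assumptions chosen for illustration; the README only
# names Docker, Conda, and build-essential (which provides gcc and make).
REQUIRED_COMMANDS = ("docker", "conda", "gcc", "make")


def find_missing(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the commands that cannot be located on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests. The script does not check versions or whether the Docker daemon is running.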

2. Build and Set Up the Environment

We recommend creating a conda environment for installing dependencies, as shown below:

```bash
# Install Python=3.12, nodejs>=22.x, and poetry
conda create -n ohversa python=3.12
conda activate ohversa
conda install -c conda-forge "nodejs>=22"
conda install conda-forge::poetry
```

From the root directory of the project, run the command below to ensure OpenHands-Versa is ready to run on your system:

```bash
make build
```

3. Configuring the Language Model

OpenHands-Versa supports a wide range of large language models (LLMs) through the litellm library. We use claude-3-7-sonnet-20250219 and claude-sonnet-4-20250514 for our experiments. You can configure the LLMs by creating a config.toml file in the project root directory, similar to config_example.toml.

For details on support for other operating systems, support for other LLMs, and debugging tips, please refer to Development.md.

Reproducing Results

We benchmark OpenHands-Versa on three popular and challenging agent benchmarks: GAIA, The Agent Company, and SWE-Bench Multimodal. For instructions on reproducing our results, please refer to the respective README.md files for GAIA, The Agent Company, and SWE-Bench Multimodal. Note that our search tool uses the Tavily API, so running the experiments requires a search API key.

Note

The methodology in OpenHands-Versa has also been implemented in upstream OpenHands, and we recommend using the upstream repository if you want to use OpenHands-Versa in your own work.

License

Distributed under the MIT License. See LICENSE for more information.

Cite

@misc{soni2025codingagentsmultimodalbrowsing,
  title={Coding Agents with Multimodal Browsing are Generalist Problem Solvers},
  author={Aditya Bharat Soni and Boxuan Li and Xingyao Wang and Valerie Chen and Graham Neubig},
  year={2025},
  eprint={2506.03011},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.03011},
}

Owner

  • Login: adityasoni9998
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
title: "OpenHands: An Open Platform for AI Software Developers as Generalist Agents"
authors:
  - family-names: Wang
    given-names: Xingyao
  - family-names: Li
    given-names: Boxuan
  - family-names: Song
    given-names: Yufan
  - family-names: Xu
    given-names: Frank F.
  - family-names: Tang
    given-names: Xiangru
  - family-names: Zhuge
    given-names: Mingchen
  - family-names: Pan
    given-names: Jiayi
  - family-names: Song
    given-names: Yueqi
  - family-names: Li
    given-names: Bowen
  - family-names: Singh
    given-names: Jaskirat
  - family-names: Tran
    given-names: Hoang H.
  - family-names: Li
    given-names: Fuqiang
  - family-names: Ma
    given-names: Ren
  - family-names: Zheng
    given-names: Mingzhang
  - family-names: Qian
    given-names: Bill
  - family-names: Shao
    given-names: Yanjun
  - family-names: Muennighoff
    given-names: Niklas
  - family-names: Zhang
    given-names: Yizhe
  - family-names: Hui
    given-names: Binyuan
  - family-names: Lin
    given-names: Junyang
  - family-names: Brennan
    given-names: Robert
  - family-names: Peng
    given-names: Hao
  - family-names: Ji
    given-names: Heng
  - family-names: Neubig
    given-names: Graham
year: 2024
doi: "10.48550/arXiv.2407.16741"
url: "https://arxiv.org/abs/2407.16741"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 35
  • Issue comment event: 6
  • Push event: 12
  • Pull request event: 5
  • Fork event: 6
  • Create event: 9
Last Year
  • Issues event: 2
  • Watch event: 35
  • Issue comment event: 6
  • Push event: 12
  • Pull request event: 5
  • Fork event: 6
  • Create event: 9

Dependencies

.github/workflows/clean-up.yml actions
  • Mattraks/delete-workflow-runs v2 composite
.github/workflows/deploy-docs.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
  • actions/upload-pages-artifact v3 composite
.github/workflows/dummy-agent-test.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
  • docker/setup-buildx-action v3 composite
.github/workflows/eval-runner.yml actions
  • KeisukeYamashita/create-comment v1 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • google-github-actions/auth v2 composite
  • google-github-actions/upload-cloud-storage v2 composite
  • slackapi/slack-github-action v2.0.0 composite
.github/workflows/fe-unit-tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • codecov/codecov-action v5 composite
.github/workflows/ghcr-build.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • codecov/codecov-action v5 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3.6.0 composite
.github/workflows/integration-runner.yml actions
  • KeisukeYamashita/create-comment v1 composite
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
.github/workflows/lint-fix.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
.github/workflows/lint.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
.github/workflows/openhands-resolver.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/github-script v7 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
.github/workflows/py-unit-tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite
  • actions/setup-python v5 composite
  • codecov/codecov-action v5 composite
  • docker/setup-buildx-action v3 composite
.github/workflows/pypi-release.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • snok/install-poetry v1.4.1 composite
.github/workflows/run-eval.yml actions
  • KeisukeYamashita/create-comment v1 composite
  • actions/checkout v4 composite
.github/workflows/stale.yml actions
  • actions/stale v9 composite
containers/app/Dockerfile docker
  • node 21.7.2-bookworm-slim build
  • python 3.12.3-slim build
containers/dev/Dockerfile docker
  • dind latest build
  • openhands latest build
  • ubuntu 22.04 build
containers/e2b-sandbox/Dockerfile docker
  • ubuntu 22.04 build
docker-compose.yml docker
  • openhands latest
evaluation/benchmarks/logic_reasoning/Dockerfile docker
  • python 3.12-bookworm build
evaluation/benchmarks/miniwob/Dockerfile docker
  • python 3.12-bookworm build
evaluation/benchmarks/mint/Dockerfile docker
  • python 3.12-bookworm build
evaluation/benchmarks/scienceagentbench/Dockerfile docker
  • python 3.11-bookworm build
evaluation/benchmarks/toolqa/Dockerfile docker
  • python 3.12-bookworm build
docs/package-lock.json npm
  • 1273 dependencies
docs/package.json npm
  • @docusaurus/module-type-aliases ^3.5.1 development
  • @docusaurus/tsconfig ^3.7.0 development
  • @docusaurus/types ^3.5.1 development
  • typescript ~5.8.2 development
  • @docusaurus/core ^3.7.0
  • @docusaurus/plugin-content-pages ^3.7.0
  • @docusaurus/preset-classic ^3.7.0
  • @docusaurus/theme-mermaid ^3.7.0
  • @mdx-js/react ^3.1.0
  • clsx ^2.0.0
  • prism-react-renderer ^2.4.1
  • react ^19.0.0
  • react-dom ^19.0.0
  • react-icons ^5.5.0
  • react-use ^17.6.0
frontend/package-lock.json npm
  • 1197 dependencies
frontend/package.json npm
  • @mswjs/socket.io-binding ^0.1.1 development
  • @playwright/test ^1.51.0 development
  • @react-router/dev ^7.3.0 development
  • @tailwindcss/typography ^0.5.16 development
  • @tanstack/eslint-plugin-query ^5.67.2 development
  • @testing-library/dom ^10.4.0 development
  • @testing-library/jest-dom ^6.6.1 development
  • @testing-library/react ^16.2.0 development
  • @testing-library/user-event ^14.6.1 development
  • @types/node ^22.13.9 development
  • @types/react ^19.0.8 development
  • @types/react-dom ^19.0.3 development
  • @types/react-highlight ^0.12.8 development
  • @types/react-syntax-highlighter ^15.5.13 development
  • @types/ws ^8.18.0 development
  • @typescript-eslint/eslint-plugin ^7.18.0 development
  • @typescript-eslint/parser ^7.18.0 development
  • @vitest/coverage-v8 ^3.0.8 development
  • autoprefixer ^10.4.20 development
  • cross-env ^7.0.3 development
  • eslint ^8.57.0 development
  • eslint-config-airbnb ^19.0.4 development
  • eslint-config-airbnb-typescript ^18.0.0 development
  • eslint-config-prettier ^10.1.1 development
  • eslint-plugin-import ^2.29.1 development
  • eslint-plugin-jsx-a11y ^6.10.2 development
  • eslint-plugin-prettier ^5.2.3 development
  • eslint-plugin-react ^7.37.4 development
  • eslint-plugin-react-hooks ^4.6.2 development
  • husky ^9.1.6 development
  • jsdom ^26.0.0 development
  • lint-staged ^15.4.3 development
  • msw ^2.6.6 development
  • postcss ^8.5.2 development
  • prettier ^3.5.3 development
  • stripe ^17.7.0 development
  • tailwindcss ^3.4.17 development
  • typescript ^5.8.2 development
  • vite-plugin-svgr ^4.2.0 development
  • vite-tsconfig-paths ^5.1.4 development
  • vitest ^3.0.2 development
  • @heroui/react 2.7.4
  • @monaco-editor/react ^4.7.0-rc.0
  • @react-router/node ^7.3.0
  • @react-router/serve ^7.3.0
  • @react-types/shared ^3.28.0
  • @reduxjs/toolkit ^2.6.0
  • @stripe/react-stripe-js ^3.3.0
  • @stripe/stripe-js ^5.10.0
  • @tanstack/react-query ^5.67.2
  • @vitejs/plugin-react ^4.3.2
  • @xterm/addon-fit ^0.10.0
  • @xterm/xterm ^5.4.0
  • axios ^1.8.2
  • clsx ^2.1.1
  • eslint-config-airbnb-typescript ^18.0.0
  • framer-motion ^12.4.10
  • i18next ^24.2.2
  • i18next-browser-languagedetector ^8.0.4
  • i18next-http-backend ^3.0.2
  • isbot ^5.1.23
  • jose ^6.0.8
  • monaco-editor ^0.52.2
  • posthog-js ^1.229.3
  • react ^19.0.0
  • react-dom ^19.0.0
  • react-highlight ^0.15.0
  • react-hot-toast ^2.5.1
  • react-i18next ^15.4.1
  • react-icons ^5.5.0
  • react-markdown ^10.1.0
  • react-redux ^9.2.0
  • react-router ^7.3.0
  • react-syntax-highlighter ^15.6.1
  • react-textarea-autosize ^8.5.7
  • remark-gfm ^4.0.1
  • sirv-cli ^3.0.1
  • socket.io-client ^4.8.1
  • tailwind-merge ^3.0.2
  • vite ^6.2.1
  • web-vitals ^3.5.2
  • ws ^8.18.1
openhands/runtime/utils/vscode-extensions/hello-world/package.json npm
openhands/runtime/utils/vscode-extensions/memory-monitor/package.json npm
evaluation/benchmarks/mint/requirements.txt pypi
  • ipython *
  • matplotlib *
  • networkx *
  • nltk *
  • opencv-python *
  • pandas ==1.4.4
  • python-dateutil *
  • pytz *
  • pyyaml *
  • scipy ==1.10.1
  • seaborn *
  • statsmodels *
  • sympy *
  • visdom *
openhands/core/setup.py pypi
poetry.lock pypi
  • 384 dependencies
pyproject.toml pypi