https://github.com/all-hands-ai/browsergym

🌎💪 BrowserGym, a Gym environment for web task automation

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ○ codemeta.json file
  • ○ .zenodo.json file
  • ○ DOI references
  • ✓ Academic publication links
    Links to: arxiv.org
  • ○ Academic email domains
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: All-Hands-AI
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.21 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of ServiceNow/BrowserGym
Created about 1 year ago · Last pushed about 1 year ago

https://github.com/All-Hands-AI/BrowserGym/blob/main/

![BrowserGym banner](https://github.com/user-attachments/assets/4853f210-43ac-4107-a0d2-95c9c614dbe7)

[Setup](#%EF%B8%8F-setup) - [Usage](#-usage) - [Demo](#-demo) - [Ecosystem](#-ecosystem) - [AgentLab](https://github.com/ServiceNow/AgentLab) - [Contributors](#-contributors) - [Paper](https://arxiv.org/abs/2412.05467) - [Citation](#-citing-this-work)

[![pypi](https://badge.fury.io/py/browsergym.svg)](https://pypi.org/project/browsergym/)
[![PyPI - License](https://img.shields.io/pypi/l/browsergym?style=flat-square)](http://www.apache.org/licenses/LICENSE-2.0)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/browsergym-core?style=flat-square)](https://pypistats.org/packages/browsergym-core)
[![GitHub star chart](https://img.shields.io/github/stars/ServiceNow/BrowserGym?style=flat-square)](https://star-history.com/#ServiceNow/BrowserGym)
[![Code Format](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)
[![Tests](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)

```sh
pip install browsergym
```

> [!WARNING]
> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.
> It is not meant to be a consumer product. Use with caution!

> [!TIP]
> Check out [AgentLab](https://github.com/ServiceNow/AgentLab)!
> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85

_Example of a GPT4-V agent executing open-ended tasks (top row, chat interactive), as well as WebArena and WorkArena tasks (bottom row)._

BrowserGym includes the following benchmarks by default:
 - [MiniWoB](https://miniwob.farama.org/)
 - [WebArena](https://webarena.dev/)
 - [VisualWebArena](https://jykoh.com/vwa)
 - [WorkArena](https://github.com/ServiceNow/WorkArena)
 - [AssistantBench](https://github.com/oriyor/assistantbench)
 - [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)

Designing a new web benchmark with BrowserGym is easy: simply inherit from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.

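
As a rough illustration of what inheriting from a task base class can look like, here is a minimal sketch. To keep it runnable without a browser, it does not import the real `AbstractBrowserTask`. Instead it defines a stand-in `TaskBase`, and the method names (`setup`, `validate`, `teardown`), their signatures and the `FakePage` object are all assumptions made for this example. Check the linked `task.py` for the actual interface before writing a real task.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class TaskBase(ABC):
    """Hypothetical stand-in for AbstractBrowserTask (not the real class)."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    @abstractmethod
    def setup(self, page) -> tuple[str, dict]:
        """Prepare the page and return (goal, info)."""

    @abstractmethod
    def validate(self, page, chat_messages: list) -> tuple[float, bool, str, dict]:
        """Return (reward, done, message, info) for the current page state."""

    def teardown(self) -> None:
        """Release resources; nothing to do by default."""


class FakePage:
    """Hypothetical page object exposing only what this sketch needs."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url


class ReachUrlTask(TaskBase):
    """Toy task: succeeds once the page is at the target URL."""

    def __init__(self, seed: int, start_url: str, target_url: str) -> None:
        super().__init__(seed)
        self.start_url = start_url
        self.target_url = target_url

    def setup(self, page) -> tuple[str, dict]:
        page.goto(self.start_url)
        return f"Navigate to {self.target_url}", {}

    def validate(self, page, chat_messages: list) -> tuple[float, bool, str, dict]:
        done = page.url == self.target_url
        return (1.0 if done else 0.0), done, "", {}


page = FakePage()
task = ReachUrlTask(
    seed=0,
    start_url="https://example.com/",
    target_url="https://example.com/done",
)
goal, _ = task.setup(page)
before = task.validate(page, [])  # page is still at the start URL
page.goto("https://example.com/done")
after = task.validate(page, [])  # page has reached the target URL
print(goal, before[:2], after[:2])
```

The sketch separates page preparation (`setup`) from success checking (`validate`), so the same check can be called after every agent step.
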
## Setup

To use browsergym, install one of the following packages:
```sh
pip install browsergym  # (recommended) everything below
pip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core  # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob  # core + miniwob
pip install browsergym-webarena  # core + webarena
pip install browsergym-visualwebarena  # core + visualwebarena
pip install browsergym-workarena  # core + workarena
pip install browsergym-assistantbench  # core + assistantbench
pip install weblinx-browsergym  # core + weblinx
```

Then set up Playwright by running:
```sh
playwright install chromium
```

Finally, each benchmark has its own setup, which requires additional steps:
 - for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)
 - for WebArena, see [webarena/README.md](browsergym/webarena/README.md)
 - for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)
 - for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)
 - for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)

### Development setup

To install browsergym locally for development, use the following commands:
```sh
git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install
```

Contributions are welcome!

## Usage

Boilerplate code to run an agent on an interactive, open-ended task:
```python
import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
# release the environment
env.close()
```

MiniWoB
```python
import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))
```

WorkArena
```python
import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/workarena")]
print("\n".join(env_ids))
```

WebArena
```python
import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/webarena")]
print("\n".join(env_ids))
```

VisualWebArena
```python
import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))
```

AssistantBench
```python
import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
```

## Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:
```sh
# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium
```

Our demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY`.

Launch the demo agent as follows:
```sh
# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398
```

You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more!

```sh
python demo_agent/run_demo.py --help
```

## Ecosystem

- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.
- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.
- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.
- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.
- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.
- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces.
- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.
- [DoomArena](https://github.com/ServiceNow/DoomArena): A framework for AI agent security testing that supports injecting attacks into web pages from BrowserGym environments.

## Contributors

[![BrowserGym contributors](https://contrib.rocks/image?repo=ServiceNow/BrowserGym&max=2000)](https://github.com/ServiceNow/BrowserGym/graphs/contributors)

## Citing This Work

Please use the following BibTeX to cite our work:
```tex
@inproceedings{workarena2024,
    title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
    author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages = {11642--11662},
    year = {2024},
    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume = {235},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
    url = {https://proceedings.mlr.press/v235/drouin24a.html},
}
```

Owner

  • Name: All Hands AI
  • Login: All-Hands-AI
  • Kind: organization
  • Email: contact@all-hands.dev

We build AI software development agents for everyone, in the open.

GitHub Events

Total
  • Watch event: 3
  • Push event: 1
Last Year
  • Watch event: 3
  • Push event: 1