Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    2 of 10 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords from Contributors

transformer cryptocurrency attention bert
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aeromaki
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 5.92 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

WebArena: A Realistic Web Environment for Building Autonomous Agents

WebArena is a standalone, self-hostable web environment for building autonomous agents

Badges: Python 3.10 · pre-commit · Code style: black · Checked with mypy · bear-ified

Website · Paper · Leaderboard

Overview

News

  • [12/21/2023] We release the recording of trajectories performed by human annotators on ~170 tasks. Check out the resource page for more details.
  • [11/3/2023] Multiple features!
    • Uploaded newest execution trajectories
    • Added an Amazon Machine Image with all websites pre-installed so that you don't have to!
    • Zeno x WebArena which allows you to analyze your agents on WebArena without pain. Check out this notebook to upload your own data to Zeno, and this page for browsing our existing results!
  • [10/24/2023] We re-examined the whole dataset and fixed the annotation bugs we spotted. The current version (v0.2.0) is relatively stable and we don't expect major updates to the annotation in the future. The new results with better prompts and the comparison with human performance can be found in our paper.
  • [8/4/2023] Added the instructions and the docker resources to host your own WebArena Environment. Check out this page for details.
  • [7/29/2023] Added a well-commented script to walk through the environment setup.

Install

```bash
# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .

# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install
```

Quick Walkthrough

Check out this script for a quick walkthrough on how to set up the browser environment and interact with it using the demo sites we hosted. This script is only for educational purposes; to perform reproducible experiments, please check out the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.

```python
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)

# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create a random action
id = random.randint(0, 1000)
action = create_id_based_action(f"click [{id}]")

# take the action
obs, _, terminated, _, info = env.step(action)
```
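The single `reset`/`step` call above extends naturally to a full episode loop. The sketch below uses a stand-in environment so the control flow is runnable without a browser; `StubEnv` and `run_episode` are hypothetical names of ours, but a real `ScriptBrowserEnv` returns the same 5-tuple from `step`, so the loop body is unchanged:

```python
# Stand-in for ScriptBrowserEnv: terminates after a fixed number of steps.
class StubEnv:
    def __init__(self, steps=3):
        self.remaining = steps

    def step(self, action):
        self.remaining -= 1
        obs = {"text": f"page after {action}"}
        terminated = self.remaining == 0
        return obs, 0.0, terminated, False, {}

def run_episode(env, actions):
    """Issue actions until the environment reports termination."""
    trajectory = []
    for action in actions:
        obs, _, terminated, _, info = env.step(action)
        trajectory.append(obs["text"])
        if terminated:
            break
    return trajectory

print(run_episode(StubEnv(steps=2), ["click [1]", "click [2]", "stop"]))
# → ['page after click [1]', 'page after click [2]']
```

With the real environment, an agent would pick each action from the current observation rather than from a fixed list.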

End-to-end Evaluation

[!IMPORTANT] To ensure correct evaluation, please set up your own WebArena websites following step 1 and step 2. The demo sites are only for browsing purposes, to help you better understand the content. After evaluating the 812 examples, reset the environment to the initial state following the instructions here.

  1. Set up the standalone environment. Please check out this page for details.

  2. Configure the urls for each website.

```bash
export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399" # this is a placeholder
```

You are encouraged to update the environment variables in the github workflow to ensure the correctness of the unit tests.

  3. Generate a config file for each test example:

```bash
python scripts/generate_test_data.py
```

You will see *.json files generated in the config_files folder. Each file contains the configuration for one test example.

  4. Obtain the auto-login cookies for all websites:

```bash
mkdir -p ./.auth
python browser_env/auto_login.py
```

  5. Run export OPENAI_API_KEY=your_key; a valid OpenAI API key starts with sk-.

  6. Launch the evaluation:

```bash
# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir <your_result_dir>
```

This script will run the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in <your_result_dir>/0.html.

Develop Your Prompt-based Agent

  1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed here. Each prompt is a dictionary with the following keys:

```python
prompt = {
  "intro": <The overall guideline which includes the task description, available actions, hints and others>,
  "examples": [
    (
      example_1_observation,
      example_1_response
    ),
    (
      example_2_observation,
      example_2_response
    ),
    ...
  ],
  "template": <How to organize different information such as observation, previous action, instruction, url>,
  "meta_data": {
    "observation": <Which observation space the agent uses>,
    "action_type": <Which action space the agent uses>,
    "keywords": <The keywords used in the template; the program will later enumerate all keywords in the template to see if all of them are correctly replaced with the content>,
    "prompt_constructor": <Which prompt constructor is in use; the prompt constructor will construct the input fed to an LLM and extract the action from the generation, more details below>,
    "action_splitter": <Inside which splitter can we extract the action, used by the prompt constructor>
  }
}
```

  2. Implement the prompt constructor. An example prompt constructor using Chain-of-thought/ReAct style reasoning is here. The prompt constructor is a class with the following methods:
    • construct: construct the input fed to an LLM
    • _extract_action: given the generation from an LLM, extract the phrase that corresponds to the action
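Putting the two pieces together, here is a minimal sketch of a prompt constructor in the shape described above. `SimplePromptConstructor`, the example prompt contents, and the template placeholders are all illustrative inventions, not the repo's actual agent/prompts code:

```python
import re

# Illustrative prompt in the dictionary shape described above (contents invented).
EXAMPLE_PROMPT = {
    "intro": "You are a web agent. Issue one action per turn.",
    "examples": [
        ("OBS: <button id=1>Buy</button>", "Let's buy it. ```click [1]```"),
    ],
    "template": (
        "OBSERVATION:\n{observation}\nURL: {url}\n"
        "OBJECTIVE: {objective}\nPREVIOUS ACTION: {previous_action}"
    ),
    "meta_data": {
        "observation": "accessibility_tree",
        "action_type": "id_based",
        "keywords": ["observation", "url", "objective", "previous_action"],
        "prompt_constructor": "SimplePromptConstructor",
        "action_splitter": "```",
    },
}

class SimplePromptConstructor:
    """Hypothetical constructor: builds the LLM input, then extracts the action."""

    def __init__(self, prompt):
        self.prompt = prompt
        self.splitter = prompt["meta_data"]["action_splitter"]

    def construct(self, observation, url, objective, previous_action):
        # Intro, then few-shot examples, then the current state via the template.
        parts = [self.prompt["intro"]]
        for obs, resp in self.prompt["examples"]:
            parts.append(f"{obs}\n{resp}")
        parts.append(self.prompt["template"].format(
            observation=observation, url=url,
            objective=objective, previous_action=previous_action))
        return "\n\n".join(parts)

    def _extract_action(self, generation):
        # The action is whatever sits between a pair of action_splitter markers.
        s = re.escape(self.splitter)
        match = re.search(f"{s}(.+?){s}", generation, re.DOTALL)
        if match is None:
            raise ValueError("no action found in generation")
        return match.group(1).strip()
```

For example, `SimplePromptConstructor(EXAMPLE_PROMPT)._extract_action("Thinking... ```click [42]```")` yields `click [42]`.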

Citation

If you use our environment or data, please cite our paper:

```
@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}
```

Owner

  • Name: -3σ
  • Login: aeromaki
  • Kind: user
  • Location: Seoul
  • Company: jobless

buy me a 🌯

Citation (CITATION.cff)

@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}

GitHub Events


Committers

Last synced: 10 months ago

All Time
  • Total Commits: 116
  • Total Committers: 10
  • Avg Commits per committer: 11.6
  • Development Distribution Score (DDS): 0.328
Past Year
  • Commits: 8
  • Committers: 2
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.25
Top Committers
Name Email Commits
alexisxy a****8@g****m 78
Tianyue Ou t****3@j****u 16
Frank Xu f****4@g****m 7
-3σ y****e@m****t 6
Haofei Yu h****y@c****u 2
Uri Alon u****1@g****m 2
Massimo Caccia m****a@g****m 2
Nicholas Chen 6****i 1
Ikko Eltociear Ashimine e****r@g****m 1
Anam Hira h****9@g****m 1
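The Development Distribution Score reported above can be reproduced from the commit counts in the Top Committers table. A minimal sketch (the function name `dds` is ours; the past-year split of 6 + 2 commits is inferred from the reported 0.25, not stated in the source):

```python
# Development Distribution Score (DDS): 1 - (top committer's commits / total).
# Values near 0 mean one person dominates; values near 1 mean work is spread out.
def dds(commit_counts):
    total = sum(commit_counts)
    return 1 - max(commit_counts) / total

all_time = [78, 16, 7, 6, 2, 2, 2, 1, 1, 1]  # from the Top Committers table
print(round(dds(all_time), 3))  # → 0.328, matching the all-time DDS above
```

The past-year figure works the same way: with 8 commits split 6 and 2, `dds([6, 2])` gives 0.25.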

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

.github/workflows/pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pre-commit/action v3.0.0 composite
.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
pyproject.toml pypi
requirements-dev.txt pypi
  • types-beautifulsoup4 * development
  • types-requests * development
requirements.txt pypi
  • Pillow *
  • aiolimiter *
  • beartype *
  • evaluate *
  • flask *
  • gymnasium *
  • nltk *
  • openai ==0.27.0
  • playwright *
  • python-dotenv *
  • text-generation *
  • tiktoken *
  • tqdm *
  • transformers ==4.33.2
  • types-tqdm *
setup.py pypi