https://github.com/bytedance/web-bench

Web-Bench is a benchmark designed to evaluate the performance of LLMs in actual Web development.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.0%) to scientific vocabulary

Keywords

benchmark

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: bytedance
License: apache-2.0
Language: JavaScript
Default Branch: main
Homepage: https://huggingface.co/spaces/bytedance-research/Web-Bench-Leaderboard
Size: 3.82 MB

Statistics

Stars: 205
Watchers: 5
Forks: 21
Open Issues: 6
Releases: 1

Topics

benchmark

Created about 1 year ago · Last pushed 9 months ago

Metadata Files

Readme License

Web-Bench

📖 Overview

Web-Bench is a benchmark designed to evaluate the performance of LLMs in actual Web development. It contains 50 projects, each consisting of 20 tasks with sequential dependencies. The tasks implement project features in sequence, simulating real-world human development workflows. Web-Bench is designed to cover the foundational elements of Web development: Web Standards and Web Frameworks. The projects were designed by engineers with 5-10 years of experience, and their scale and complexity make each one a significant challenge: on average, a single project takes a senior engineer 4-8 hours to complete. With our benchmark agent (Web-Agent), the SOTA model (Claude 3.7 Sonnet) achieves only 25.1% Pass@1.
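To make the effect of sequential dependencies concrete, here is a minimal sketch in Python. It assumes, for illustration only, that a project run stops at the first failing task and that the pass rate is the share of all tasks passed; the function names are hypothetical, and the paper remains the authoritative definition of Pass@1 for Web-Bench.

```python
from typing import List

PROJECTS = 50            # stated in the overview
TASKS_PER_PROJECT = 20   # stated in the overview
total_tasks = PROJECTS * TASKS_PER_PROJECT  # 50 * 20 = 1000 tasks overall


def passed_before_first_failure(task_results: List[bool]) -> int:
    """Count the tasks passed before the first failure.

    Assumption (not confirmed by the README): because each task builds on
    the previous ones, a failed task ends the run for that project.
    """
    passed = 0
    for ok in task_results:
        if not ok:
            break
        passed += 1
    return passed


def pass_rate(projects: List[List[bool]]) -> float:
    """Share of all tasks that were passed, under the assumption above."""
    total = sum(len(results) for results in projects)
    passed = sum(passed_before_first_failure(results) for results in projects)
    return passed / total if total else 0.0


# Two hypothetical 4-task projects: the first passes 2 tasks, the second 1.
example = [[True, True, False, True], [True, False, False, False]]
print(pass_rate(example))  # 3 passed out of 8 tasks
```

Under this assumption, an early failure costs every later task in the same project, which is one reason a sequential benchmark can be much harder than one made of independent problems.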

The distribution of the experimental data aligns well with the current code generation capabilities of mainstream LLMs.

[Figure: pass@1]

HumanEval and MBPP have approached saturation, and APPS and EvalPlus are approaching it. The SOTA for Web-Bench is 25.1%, lower than that of the SWE-bench Full and Verified sets; for a benchmark, a lower SOTA is better because it leaves more headroom.

[Figure: SOTAs]

🚀 Quick Start

Refer to the Docker setup guide for instructions on installing Docker on your machine.

Create a new empty folder and add two files to it:

./config.json5
./docker-compose.yml

For config.json5, copy the JSON below and edit it according to Config Parameters:

```json5
{
  models: [
    'openai/gpt-4o',
    // You can add more models here
    // "claude-sonnet-4-20250514"
  ],
  // Eval one project only
  // "projects": ["@web-bench/react"]
}
```

For docker-compose.yml, copy the YAML below and set the environment variables:

```yaml
services:
  web-bench:
    image: maoyiweiebay777/web-bench:latest
    volumes:
      - ./config.json5:/app/apps/eval/src/config.json5
      - ./report:/app/apps/eval/report
    environment:
      # Add environment variables according to apps/src/model.json
      - OPENROUTER_API_KEY=your_api_key
      # Add more model's key
      # - ANTHROPIC_API_KEY=your_api_key
```

Run docker-compose:

```bash
docker compose up
```

The evaluation report will be generated under ./report/.
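The setup steps above can also be scripted. The following is a minimal sketch in Python that creates the folder and writes the two files with the contents shown in this guide. The helper name `scaffold` and the folder name in the usage note are hypothetical, and `your_api_key` is a placeholder you still need to replace.

```python
from pathlib import Path
from typing import List

# File contents copied from the Quick Start; optional commented lines omitted.
CONFIG_JSON5 = """\
{
  models: [
    'openai/gpt-4o',
  ],
}
"""

COMPOSE_YML = """\
services:
  web-bench:
    image: maoyiweiebay777/web-bench:latest
    volumes:
      - ./config.json5:/app/apps/eval/src/config.json5
      - ./report:/app/apps/eval/report
    environment:
      - OPENROUTER_API_KEY=your_api_key
"""


def scaffold(folder: str) -> List[Path]:
    """Create `folder` and write config.json5 and docker-compose.yml into it.

    Hypothetical helper, not part of Web-Bench. It refuses to overwrite
    existing files so a configured setup is never clobbered.
    """
    root = Path(folder)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, body in (("config.json5", CONFIG_JSON5),
                       ("docker-compose.yml", COMPOSE_YML)):
        path = root / name
        if path.exists():
            raise FileExistsError(f"{path} already exists")
        path.write_text(body, encoding="utf-8")
        created.append(path)
    return created
```

For example, call `scaffold("web-bench-run")`, edit the API key in the generated docker-compose.yml, and then run `docker compose up` from inside that folder.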

If you wish to evaluate from source code, refer to Install from source.

🛠️ Contribution

Project Contribution

📚 Citation

```bibtex
@article{xu2025webbench,
  title={Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks},
  author={Xu, Kai and Mao, YiWei and Guan, XinYi and Feng, ZiLong},
  journal={arXiv preprint arXiv:2505.07473},
  year={2025}
}
```

📄 License

Apache 2.0

🌟 Contact us

Lark: Register for Feishu, then scan the QR code below to join our Web Bench user group.

[Image: QR code]

Discord

Owner

Name: Bytedance Inc.
Login: bytedance
Kind: organization
Location: Singapore

Website: https://opensource.bytedance.com
Twitter: ByteDanceOSS
Repositories: 255
Profile: https://github.com/bytedance

GitHub Events

Total

Create event: 19
Issues event: 54
Release event: 1
Watch event: 140
Delete event: 18
Issue comment event: 48
Push event: 87
Gollum event: 101
Pull request review comment event: 6
Pull request review event: 14
Pull request event: 90
Fork event: 17

Last Year

Create event: 19
Issues event: 54
Release event: 1
Watch event: 140
Delete event: 18
Issue comment event: 48
Push event: 87
Gollum event: 101
Pull request review comment event: 6
Pull request review event: 14
Pull request event: 90
Fork event: 17

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 30
Total pull requests: 49
Average time to close issues: 5 days
Average time to close pull requests: about 1 hour
Total issue authors: 9
Total pull request authors: 5
Average comments per issue: 0.73
Average comments per pull request: 0.02
Merged pull requests: 36
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 30
Pull requests: 49
Average time to close issues: 5 days
Average time to close pull requests: about 1 hour
Issue authors: 9
Pull request authors: 5
Average comments per issue: 0.73
Average comments per pull request: 0.02
Merged pull requests: 36
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

luics (14)
xiaxiazhu (7)
diasforgood (2)
shellvon (2)
James4Ever0 (1)
mingrenbuke (1)
sijunhe (1)
joyfulcat (1)
Sunliangtai (1)

Pull Request Authors

sanmaopep (18)
liuyueweiyu (17)
luics (11)
JxJuly (2)
xiaxiazhu (1)

Top Labels

Issue Labels

bug (3)
question (1)
