burstgpt
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Statistics
- Stars: 159
- Watchers: 6
- Forks: 9
- Open Issues: 2
- Releases: 2
README.md
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
[!IMPORTANT] 🚧 Traces with the new columns SessionID and Elapsed time are being collected and will be available soon!
This repository contains public releases of a real-world trace dataset of LLM serving workloads for the benefit of the research and academic community.
This LLM serving is powered by Microsoft Azure.
There are currently 4 files in Release v1.1:
- BurstGPT_1.csv contains the full trace for the first 2 months, including failed requests, whose Response tokens value is 0. 1429.7k lines in total.
- BurstGPT_without_fails_1.csv contains the trace for the first 2 months without the failed requests. 1404.3k lines in total.
- BurstGPT_2.csv contains the full trace for the second 2 months, including failed requests, whose Response tokens value is 0. 3858.4k lines in total.
- BurstGPT_without_fails_2.csv contains the trace for the second 2 months without the failed requests. 3784.2k lines in total.
BurstGPT_1.csv is also in /data for you to use.
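The relationship between the full files and the without_fails files can be sketched as a simple filter on the Response tokens column. This is a minimal illustration, not the authors' tooling: it assumes the CSV header uses the column names listed in the Schema section, the sample rows are made up, and whether the published without_fails files were produced in exactly this way is an assumption.

```python
import csv
import io


def drop_failed_requests(rows):
    """Keep only rows whose 'Response tokens' value is non-zero."""
    return [row for row in rows if int(row["Response tokens"]) != 0]


# Tiny in-memory stand-in for BurstGPT_1.csv; the values are invented.
SAMPLE = """Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type
5,ChatGPT,472,18,490,Conversation log
45,ChatGPT,1087,0,1087,Conversation log
118,GPT-4,417,276,693,API log
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_failed_requests(rows)
print(f"{len(rows)} rows in, {len(kept)} rows after dropping failures")
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("data/BurstGPT_1.csv", newline="")`.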
Usage
- You may scale the average Requests Per Second (RPS) in the trace according to your evaluation setups.
- You may also model the patterns in the trace as indicated in our paper and scale the parameters in the models.
- Check our simple request generator demo in example/.
If you have specific needs, we are eager to help you explore and use the trace to its fullest potential. Please report any issues or questions by email to the mailing list.
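As one way to scale the average RPS, the sketch below stretches or compresses all arrival times by a single factor, so the relative burst pattern is preserved while the mean rate changes. The function name `scale_rps` and the definition of average RPS as request count divided by trace duration are assumptions made for this illustration; the demo in example/ may take a different approach.

```python
def scale_rps(timestamps, target_rps):
    """Rescale sorted arrival times (seconds) to a target average RPS.

    Average RPS is taken here as len(timestamps) / (last - first).
    Every offset from the first arrival is multiplied by the same
    factor, so relative spacing between requests is unchanged.
    """
    if target_rps <= 0:
        raise ValueError("target_rps must be positive")
    if len(timestamps) < 2:
        return [0.0 for _ in timestamps]
    start = timestamps[0]
    duration = timestamps[-1] - start
    if duration <= 0:
        raise ValueError("timestamps must be sorted and span a positive duration")
    new_duration = len(timestamps) / target_rps
    return [(t - start) * new_duration / duration for t in timestamps]


# Four requests over 40 s (0.1 RPS on average), rescaled to 1 RPS.
original = [0, 10, 20, 40]
scaled = scale_rps(original, target_rps=1.0)
print(scaled)
```

The rescaled values are offsets from the first request, which is convenient for a replay loop that sleeps until each offset before sending.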
Future Plans
- We will continue to update the time range of the trace and add the end time of each request.
- We will update the conversation log, including the session IDs, timestamps, etc., in each conversation, so that researchers can optimize conversation services.
- We will open-source the full benchmark suite for LLM inference soon.
Paper
If you use the trace in your research, please cite our paper:
@inproceedings{BurstGPT,
author = {Yuxin Wang and Yuhan Chen and Zeyu Li and Xueze Kang and Yuchu Fang and Yeju Zhou and Yang Zheng and Zhenheng Tang and Xin He and Rui Guo and Xin Wang and Qiang Wang and Amelie Chi Zhou and Xiaowen Chu},
title = {{BurstGPT}: A Real-World Workload Dataset to Optimize LLM Serving Systems},
booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’25)},
year = {2025},
address = {Toronto, ON, Canada},
publisher = {ACM},
doi = {10.1145/3711896.3737413},
url = {https://doi.org/10.1145/3711896.3737413},
}
Main characteristics
- Duration: 121 consecutive days in 4 consecutive months.
- Dataset size: ~5.29M lines, ~188MB.
Schema
- Timestamp: request submission time, in seconds from 0:00:00 on the first day.
- Model: the model called, either ChatGPT (GPT-3.5) or GPT-4.
- Request tokens: length of the request in tokens.
- Response tokens: length of the response in tokens.
- Total tokens: request tokens plus response tokens.
- Log Type: how the user called the model, in conversation mode or through the API; either Conversation log or API log.
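The schema above maps naturally onto a typed record. The following sketch parses rows with Python's standard csv module and checks the stated relation that Total tokens equals Request tokens plus Response tokens. The `TraceRecord` class and helper names are hypothetical, the sample rows are invented, and the header spelling is assumed to match the field names listed above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TraceRecord:
    timestamp: float      # seconds from 0:00:00 on the first day
    model: str            # model name as recorded in the trace
    request_tokens: int
    response_tokens: int
    total_tokens: int
    log_type: str         # "Conversation log" or "API log"


def parse_trace(lines):
    """Parse an iterable of CSV lines (header first) into TraceRecord objects."""
    return [
        TraceRecord(
            timestamp=float(row["Timestamp"]),
            model=row["Model"],
            request_tokens=int(row["Request tokens"]),
            response_tokens=int(row["Response tokens"]),
            total_tokens=int(row["Total tokens"]),
            log_type=row["Log Type"],
        )
        for row in csv.DictReader(lines)
    ]


def inconsistent_rows(records):
    """Return records whose total differs from request + response tokens."""
    return [
        r for r in records
        if r.total_tokens != r.request_tokens + r.response_tokens
    ]


# Invented sample rows, for illustration only.
SAMPLE = """Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type
5,ChatGPT,472,18,490,Conversation log
118,GPT-4,417,276,693,API log
"""

records = parse_trace(io.StringIO(SAMPLE))
print(len(records), "records,", len(inconsistent_rows(records)), "inconsistent")
```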
Data Overview (First 2 Months)

*Figure 1: Weekly Periodicity in BurstGPT.*

*Figure 2: Daily Periodicity in BurstGPT.*

*Figure 3: Average Daily Request and Response Throughput in BurstGPT.*

*Figure 4: Statistics of Request and Response Tokens in BurstGPT.*
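Periodicity of the kind shown in the figures above can be explored by bucketing the Timestamp column. The sketch below counts requests per hour of day and per day index, relying only on the schema's statement that timestamps are seconds from 0:00:00 on the first day. The function names are hypothetical, and this is not the code used to produce the figures; the day index is relative to the first day of the trace, not a calendar weekday.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def requests_per_hour_of_day(timestamps):
    """Count requests in each hour-of-day bucket (0 to 23)."""
    return Counter(
        int((t % SECONDS_PER_DAY) // SECONDS_PER_HOUR) for t in timestamps
    )


def requests_per_day(timestamps):
    """Count requests per day index, where day 0 is the first trace day."""
    return Counter(int(t // SECONDS_PER_DAY) for t in timestamps)


# Invented timestamps: three on day 0, two on day 1.
sample = [0, 3599, 3600, 86400 + 7200, 90000.5]
hourly = requests_per_hour_of_day(sample)
daily = requests_per_day(sample)
print(sorted(hourly.items()))
print(sorted(daily.items()))
```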
Owner
- Name: High Performance Machine Learning Laboratory
- Login: HPMLL
- Kind: organization
- Repositories: 1
- Profile: https://github.com/HPMLL
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
  - given-names: "Yuxin"
    family-names: "Wang"
  - given-names: "Yuhan"
    family-names: "Chen"
  - given-names: "Zeyu"
    family-names: "Li"
  - given-names: "Xueze"
    family-names: "Kang"
  - given-names: "Zhenheng"
    family-names: "Tang"
  - given-names: "Rui"
    family-names: "Guo"
  - given-names: "Xin"
    family-names: "Wang"
  - given-names: "Qiang"
    family-names: "Wang"
  - given-names: "Amelie Chi"
    family-names: "Zhou"
  - given-names: "Xiaowen"
    family-names: "Chu"
title: "BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems"
version: 1.0
doi: 10.48550/arXiv.2401.17644
date-released: 2024-1-31
url: "https://github.com/HPMLL/BurstGPT"
preferred-citation:
  type: article
  authors:
    - given-names: "Yuxin"
      family-names: "Wang"
    - given-names: "Yuhan"
      family-names: "Chen"
    - given-names: "Zeyu"
      family-names: "Li"
    - given-names: "Xueze"
      family-names: "Kang"
    - given-names: "Zhenheng"
      family-names: "Tang"
    - given-names: "Rui"
      family-names: "Guo"
    - given-names: "Xin"
      family-names: "Wang"
    - given-names: "Qiang"
      family-names: "Wang"
    - given-names: "Amelie Chi"
      family-names: "Zhou"
    - given-names: "Xiaowen"
      family-names: "Chu"
  doi: "10.48550/arXiv.2401.17644"
  journal: "arXiv preprint arXiv:2401.17644"
  title: "BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems"
  year: 2024
GitHub Events
Total
- Issues event: 4
- Watch event: 65
- Issue comment event: 1
- Fork event: 4
Last Year
- Issues event: 4
- Watch event: 65
- Issue comment event: 1
- Fork event: 4
Dependencies
- aiohttp *
- argparse *
- bisect *
- contextlib *
- matplotlib *
- numpy *
- torch *
- transformers *
- vllm *