llvm-opt-benchmark

An LLVM IR dataset for data-driven compiler optimization research

https://github.com/dtcxzyw/llvm-opt-benchmark

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

compiler-construction llvm llvm-ir
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 62
  • Watchers: 5
  • Forks: 7
  • Open Issues: 58
  • Releases: 0
Topics
compiler-construction llvm llvm-ir
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

LLVM Opt Benchmark

LLVM Opt Benchmark is an LLVM IR dataset for data-driven compiler optimization research. This repository is also used by LLVM developers to evaluate the impact of their patches on real-world applications.

Please don't submit PRs to add new benchmarks. You can request new open-source C/C++/Rust repos here.

Please cite this work with the following BibTeX entry:

    @misc{opt-benchmark,
      title = {LLVM Opt Benchmark},
      url = {https://github.com/dtcxzyw/llvm-opt-benchmark},
      author = {Yingwei Zheng},
      year = {2023},
    }

FAQs

LLVM developers use this corpus to assess the impact of patches on real-world applications. If you see a link to this repository in your PR, it means that the linked PR in this repository shows some performance regressions or improvements caused by your changes. Here are some common questions you may have:

How can I reproduce the regression locally?

You should be able to reproduce the regression locally with the following steps:

```
# Apply your patch and rebuild opt.
...

# Download the source IR. Note that you should replace optimized with original.
wget https://raw.githubusercontent.com/dtcxzyw/llvm-opt-benchmark/refs/heads/main/bench//original/.ll

# Run opt to generate the optimized IR.
bin/opt -O3 -disable-loop-unrolling -vectorize-loops=false -vectorize-slp=false -S .ll -o opt.ll
```

Note that you don't need to clone the whole repository.
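As a rough illustration of the steps above, the following Python sketch derives the download URL and the `opt` command line from a file path as it appears in a diff. The helper names (`original_url`, `opt_command`), the assumed `bench/.../optimized/...` path layout, and the example path used in the usage note are hypothetical; only the URL prefix and the `opt` flags are taken from the commands above.

```python
# Sketch only: helper names and the example path layout are assumptions.
RAW_PREFIX = (
    "https://raw.githubusercontent.com/dtcxzyw/llvm-opt-benchmark/"
    "refs/heads/main/"
)

# Flags copied from the reproduction command above.
OPT_FLAGS = [
    "-O3",
    "-disable-loop-unrolling",
    "-vectorize-loops=false",
    "-vectorize-slp=false",
    "-S",
]


def original_url(diff_path: str) -> str:
    """Return the raw URL of the source IR for a path taken from a diff.

    Every path component named "optimized" is replaced with "original".
    """
    parts = ["original" if p == "optimized" else p for p in diff_path.split("/")]
    return RAW_PREFIX + "/".join(parts)


def opt_command(opt_path: str, input_ll: str, output_ll: str = "opt.ll") -> list[str]:
    """Build the argument list for running opt on a downloaded IR file."""
    return [opt_path, *OPT_FLAGS, input_ll, "-o", output_ll]
```

For example, `original_url("bench/demo/optimized/foo.ll")` yields a URL ending in `bench/demo/original/foo.ll`, and the list returned by `opt_command("bin/opt", "foo.ll")` can be passed to `subprocess.run`.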

How can I evaluate my patch on this benchmark locally?

This is not recommended: if you have commit access to the LLVM repository, you can use the online service to evaluate your patch on GitHub instead.

If you still want to run it locally, use python3 ./scripts/gen_optimized.py bench <path-to-opt>. It updates the optimized IR files, and you can then review the diff with git.

The compile-time evaluation shows a huge impact on some files. What should I do?

Don't worry about it. If it doesn't affect the compile time of the parent projects, it is generally acceptable. Otherwise, you may need to adjust the threshold or handle only the simple cases.

What should I do when I see a regression?

Don't panic. Perfect is the enemy of good. We never ask the contributors to fix all the regressions before landing their patches.

Please follow the InstCombineContributorGuide to generalize your patch to cover the regression. If that doesn't work, try to find the pattern and file a separate issue. If the pattern is hard to catch with a separate transformation, try to bail out on the regression case. If we cannot make it better, the patch can still be accepted if the net effect is positive. Ask your reviewer to help you with the decision.

My method is expensive in compile time. But it shows some optimization opportunities. Should I abandon it?

Though we cannot accept the patch, we still encourage you to explore alternative approaches to handle the exposed optimization opportunities. As the distribution of the real-world code is not uniform, in general, a simple heuristic is good enough to cover most of the cases.

The evaluation result shows my patch has no effect on the benchmark. What does it mean?

We ask the issue reporter and the contributor to provide a motivating example from real-world scenarios. This benchmark only provides additional evidence to support the claim. It is highly recommended to run this benchmark if the real-world use case is missing, or if the issue was found by fuzzers or super-optimizers. See also InstCombineContributorGuide.

The following patches may not be suitable for this benchmark:

  • SLPVectorizer/LoopVectorize/LoopUnroll patches. Vectorization and loop unrolling are disabled since the diff is huge and hard to review. The performance is highly dependent on the target machine so the running time may be more representative.
  • Sanitizer/Instrumentation/GPU patches. The related patterns are not included in this corpus.
  • Patches which handle scalable vectors. This corpus only contains fixed-width vectors (generated from X86 intrinsics).

Do the regressions in IR diff imply the run-time performance regressions?

Not necessarily. The IR diff is only a proxy for run-time performance. Generally, fewer instructions at the IR level imply better analysis results and fewer instructions at run time. However, this depends on the target micro-architecture and the LLVM CodeGen components. For example, a canonicalization in InstCombine may prevent SelectionDAG from recognizing certain patterns, leading to bad codegen. Please refer to llvm-codegen-benchmark for frequent isel patterns. In any case, run-time performance should be the golden metric. The IR diff only helps us find the root cause of regressions.

In addition, most of the IR snippets are not hot paths in real-world applications. I chose to keep all the source IR files instead of only the hot spots, as they are useful for monitoring code size changes, which are also critical for frontend performance on modern devices. Another reason is that we cannot find the hot paths in large applications like LLVM and verilator-generated simulators. BTW, the training data for PGO in some programs is unavailable or highly biased, you know :).

The IR diff looks weird. It contains some invalid instructions. Is it a bug?

Many IR diffs only change the names of instructions and basic blocks. Previously, I used llvm-diff to reduce meaningless changes. However, it is slow and ineffective. Now I use a heuristic name-remapping algorithm to reduce the noise. The algorithm can reduce line changes by up to 70%. However, as it works on the textual diff and does not understand the semantics of LLVM IR, it may produce some invalid instructions. If in doubt, please check the raw diff in the previous commit (pre-commit: Update).
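The name-remapping algorithm used by this repository is not described in detail here. As a loose illustration of the general idea only, the sketch below renames local identifiers (`%name`) in order of first appearance, so that two snippets that differ only in value numbering compare equal. The function name `normalize_names` and the regular expression are assumptions made for this example. It ignores quoted names such as `%"a b"` and label definitions, and it is not the repository's implementation.

```python
import re

# Unquoted local identifiers such as %0, %add or %struct.Foo.
# Assumption for this sketch: quoted names (%"a b") are not handled.
LOCAL_NAME = re.compile(r"%([A-Za-z0-9._$-]+)")


def normalize_names(ir_text: str) -> str:
    """Rename local identifiers to %v0, %v1, ... in order of first appearance."""
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        name = match.group(1)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return "%" + mapping[name]

    return LOCAL_NAME.sub(rename, ir_text)


def differs_only_in_names(before: str, after: str) -> bool:
    """True if the two snippets are textually equal after normalization."""
    return normalize_names(before) == normalize_names(after)
```

Because this works purely on text, it shares the limitation mentioned above: it has no notion of IR semantics, so it should only be used to filter noise, not to judge correctness.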

The IR diff contains hundreds of file changes. How can I review it efficiently?

To fit within GitHub's diff rendering limit, only a subset of the files is committed. The subset is chosen by a heuristic algorithm to improve the diversity of the dataset.

In the diff mode, a summary of the diff is also provided. It contains some key information to allow you to quickly review the changes:

  • The number of files changed, lines added and removed (provided by git diff --shortstat). It is different from the numbers on the GitHub page, as it counts the statistics before diff reduction.
  • A summary of the top-10 LLVM statistics changes.
  • The number of line changes in each file (provided by git show <base>..HEAD --numstat --oneline). You can use this to quickly find the file with the most line additions or deletions (e.g., cat log | awk '{print $1 - $2, $3}' | sort -n).
  • A summary generated by an LLM (powered by Qwen). It provides a high-level overview of the changes. However, it always gives a positive response, so it may not be very useful. You can use it to find the files that are worth reviewing in detail.
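The per-file line counts mentioned in the list above can also be ranked with a few lines of Python instead of awk. This is a minimal sketch, assuming the input is the text printed by `git show <base>..HEAD --numstat --oneline`; the function name `rank_by_net_change` is hypothetical.

```python
def rank_by_net_change(numstat_output: str) -> list[tuple[int, str]]:
    """Return (added - deleted, path) pairs sorted in ascending order.

    Lines that are not of the form "<added>\t<deleted>\t<path>" are skipped,
    which covers the one-line commit headers and binary files ("-\t-\tpath").
    """
    rows = []
    for line in numstat_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        added, deleted, path = fields
        if not (added.isdigit() and deleted.isdigit()):
            continue
        rows.append((int(added) - int(deleted), path))
    return sorted(rows)
```

The first entries of the result are the files with the largest net deletions and the last entries are those with the largest net additions, mirroring the `sort -n` ordering of the awk one-liner.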

From my own experience, the patterns are likely to be similar in the same project. So you can skip the whole project after you review the first few files in the same project. If your patch optimizes the C++/Rust standard library and other widely-used libraries, you can also skip the files with similar bb names in the hunk header (e.g., _ZNSt6vector...).

The IR diff is totally unrelated to my patch. Why?

Your changes may break existing optimizations. Please reproduce it locally and try to provide a minimal phase-ordering regression test. Then follow the instructions for dealing with regressions above.

Online services (previously hosted by PLCT Lab, ISCAS/currently hosted by SUSTech ARiSE Lab)

Special Acknowledgement: Thanks to @goldsteinn for providing additional computational resources to meet the growing demand for testing!

  • Fuzzy DAG matching

Please file an issue to provide LLVM IR with a single function. I will add the grep label to trigger CI.

Example: https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1072

  • Middle-end optimization pre-commit testing

Ping me if you want to see what is affected by your PR. It is useful for reviewers to find potential performance regressions and new optimization opportunities.

For convenience, all llvm members are authorized to request pre-commit tests in https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1312. Some basic PR editing commands are also supported by leaving a comment that starts with /:

  • /close: Close the PR
  • /reopen: Reopen the PR
  • /add-label labels: Add labels (separated by comma). Available labels: reviewed, regression, crash, hang and miscompilation.
  • /remove-label labels: Remove labels.

  • Codegen pre-commit testing

See also llvm-codegen-benchmark.

  • Weekly coverage report:

https://dtcxzyw.github.io/llvm-opt-benchmark/
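To make the syntax of the PR editing commands listed above concrete, here is a small parsing sketch. It is not the bot's actual implementation: the function name `parse_command`, the return shape, and the choice to validate `/remove-label` against the same label set as `/add-label` are all assumptions made for illustration.

```python
# Labels listed for /add-label above; applying the same set to
# /remove-label is an assumption of this sketch.
ALLOWED_LABELS = {"reviewed", "regression", "crash", "hang", "miscompilation"}


def parse_command(comment: str):
    """Parse a comment into (command, labels), or return None if unsupported."""
    text = comment.strip()
    if not text.startswith("/"):
        return None
    name, _, arg = text[1:].partition(" ")
    if name in ("close", "reopen"):
        return (name, [])
    if name in ("add-label", "remove-label"):
        labels = [item.strip() for item in arg.split(",") if item.strip()]
        if not labels or any(item not in ALLOWED_LABELS for item in labels):
            return None
        return (name, labels)
    return None
```

For instance, `parse_command("/add-label regression, crash")` returns `("add-label", ["regression", "crash"])`, while an ordinary comment or an unknown label yields `None`.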

Benchmark List

Currently, this repository contains the following libraries/applications:

|Name|Language|Stars|Last Updated|Active Files|
|---|---|---|---|---|
|abc|C|stars|2025-01-02|961|
|bdwgc|C|stars|2025-02-01|4|
|box2d|C|stars|2025-01-27|82|
|brotli|C|stars|2025-01-31|20|
|c3c|C|stars|2024-04-16|63|
|chibicc|C|stars|2020-12-07|8|
|cjson|C|stars|2024-09-23|2|
|clamav|C|stars|2025-02-03|234|
|cmake|C|stars|2025-02-04|632|
|coremark|C|stars|2023-01-24|3|
|cpython|C|stars|2025-02-03|249|
|curl|C|stars|2025-02-03|114|
|darktable|C|stars|2025-02-03|394|
|ffmpeg|C|stars|2025-05-19|1890|
|flac|C|stars|2025-02-03|48|
|freetype|C|stars|2025-01-28|30|
|git|C|stars|2025-02-03|330|
|graphviz|C|stars|2025-02-10|231|
|hdf5|C|stars|2025-02-14|342|
|hwloc|C|stars|2025-02-13|46|
|jemalloc|C|stars|2025-02-13|48|
|jq|C|stars|2025-02-16|43|
|kcp|C|stars|2024-12-01|1|
|lean4|C|stars|2025-05-20|1047|
|libdeflate|C|stars|2025-01-20|12|
|libevent|C|stars|2025-02-03|27|
|libjpeg-turbo|C|stars|2024-12-18|77|
|libpng|C|stars|2025-02-12|16|
|libquic|C|stars|2016-09-22|379|
|libsodium|C|stars|2025-01-26|79|
|libuv|C|stars|2025-02-17|25|
|libwebp|C|stars|2025-01-30|99|
|linux|C|stars|2024-02-29|1195|
|lua|C|stars|2025-01-29|29|
|luajit|C|stars|2025-01-13|67|
|lvgl|C|stars|2025-02-17|126|
|lz4|C|stars|2025-02-03|10|
|memcached|C|stars|2025-02-04|26|
|mimalloc|C|stars|2025-02-17|15|
|miniaudio|C|stars|2023-11-15|1|
|nanosvg|C|stars|2024-12-19|1|
|nuklear|C|stars|2025-02-07|1|
|nuttx|C|stars|2024-03-04|102|
|ompi|C|stars|2025-02-14|231|
|oniguruma|C|stars|2025-02-11|18|
|openblas|C|stars|2025-02-17|331|
|openssl|C|stars|2025-02-18|1048|
|osqp|C|stars|2025-02-13|17|
|php-src|C|stars|2025-02-17|348|
|portaudio|C|stars|2025-02-08|11|
|postgres|C|stars|2025-02-18|800|
|qemu|C|stars|2025-02-16|49|
|qoi|C|stars|2025-02-12|1|
|quickjs|C|stars|2024-07-27|8|
|raylib|C|stars|2025-02-17|7|
|redis|C|stars|2025-02-16|146|
|riscv-isa-sim|C|stars|2025-02-12|931|
|ruby|C|stars|2025-02-18|178|
|sdl|C|stars|2025-05-19|275|
|slurm|C|stars|2025-02-17|297|
|sqlite|C|stars|2025-02-18|3|
|stb|C|stars|2024-11-08|17|
|sundials|C|stars|2024-12-20|195|
|wireshark|C|stars|2025-02-18|1537|
|wolfssl|C|stars|2025-02-17|40|
|yyjson|C|stars|2025-02-12|1|
|zlib|C|stars|2025-02-13|13|
|zstd|C|stars|2025-02-13|31|
|abseil-cpp|C++|stars|2025-02-15|335|
|annoy|C++|stars|2024-07-28|1|
|arrow|C++|stars|2025-02-17|169|
|assimp|C++|stars|2025-02-17|207|
|boost|C++|stars|2024-10-25|368|
|bullet3|C++|stars|2025-01-29|196|
|casadi|C++|stars|2025-02-18|199|
|ceres-solver|C++|stars|2025-02-17|121|
|cpp-httplib|C++|stars|2025-02-17|1|
|crow|C++|stars|2025-02-10|13|
|csmith|C++|stars|2023-11-02|60|
|cvc5|C++|stars|2025-02-17|679|
|cxxopts|C++|stars|2025-01-14|1|
|double-conversion|C++|stars|2025-02-14|7|
|draco|C++|stars|2025-01-28|82|
|duckdb|C++|stars|2025-02-18|227|
|eastl|C++|stars|2023-08-16|86|
|entt|C++|stars|2025-02-14|72|
|faiss|C++|stars|2025-02-14|160|
|flatbuffers|C++|stars|2025-02-10|35|
|fmt|C++|stars|2025-02-14|26|
|folly|C++|stars|2025-02-17|231|
|g2o|C++|stars|2025-02-09|124|
|glog|C++|stars|2025-02-16|20|
|glslang|C++|stars|2024-06-25|42|
|gromacs|C++|stars|2025-02-24|776|
|grpc|C++|stars|2025-02-24|314|
|gsl|C++|stars|2025-02-14|12|
|harfbuzz|C++|stars|2025-02-23|15|
|hermes|C++|stars|2023-12-15|228|
|hyperscan|C++|stars|2023-04-19|197|
|icu|C++|stars|2025-02-21|432|
|imgui|C++|stars|2025-02-22|5|
|ipopt|C++|stars|2025-02-23|105|
|json|C++|stars|2025-02-21|77|
|jsonnet|C++|stars|2025-02-23|17|
|libcxx|C++|stars|2025-05-20|86|
|libigl|C++|stars|2025-05-14|523|
|libphonenumber|C++|stars|2025-02-13|34|
|libzmq|C++|stars|2024-12-30|74|
|lief|C++|stars|2025-02-23|323|
|lightgbm|C++|stars|2025-02-24|33|
|llama.cpp|C++|stars|2025-02-23|37|
|llvm-project|C++|stars|2025-02-03|2166|
|lodepng|C++|stars|2024-12-28|3|
|luau|C++|stars|2025-02-21|153|
|meshlab|C++|stars|2024-02-13|202|
|meshoptimizer|C++|stars|2025-02-21|15|
|minetest|C++|stars|2024-03-26|311|
|mitsuba3|C++|stars|2024-03-22|152|
|mixbox|C++|stars|2022-12-16|1|
|mold|C++|stars|2025-02-21|86|
|msdfgen|C++|stars|2024-01-06|16|
|msgpack-c|C++|stars|2025-02-21|19|
|nanobind|C++|stars|2025-02-21|29|
|ncnn|C++|stars|2025-02-20|359|
|nghttp2|C++|stars|2025-02-18|17|
|ninja|C++|stars|2025-02-19|58|
|nix|C++|stars|2024-03-06|211|
|node|C++|stars|2023-12-17|155|
|nori|C++|stars|2023-11-15|45|
|open3d|C++|stars|2025-04-03|384|
|open_spiel|C++|stars|2024-08-27|254|
|opencc|C++|stars|2025-02-12|24|
|opencolorio|C++|stars|2025-02-10|180|
|opencv|C++|stars|2025-02-25|1510|
|openexr|C++|stars|2025-02-18|149|
|openimageio|C++|stars|2025-02-25|104|
|openjdk|C++|stars|2024-07-16|1116|
|openusd|C++|stars|2024-07-24|890|
|openvdb|C++|stars|2023-12-06|37|
|ozz-animation|C++|stars|2025-01-19|38|
|pbrt-v4|C++|stars|2025-01-30|60|
|pcg-cpp|C++|stars|2022-04-08|6|
|pocketpy|C++|stars|2024-06-20|26|
|proj|C++|stars|2025-02-22|219|
|protobuf|C++|stars|2023-12-15|120|
|proxy|C++|stars|2024-05-22|5|
|proxygen|C++|stars|2023-12-16|83|
|pugixml|C++|stars|2025-02-19|1|
|quantlib|C++|stars|2024-09-10|881|
|quest|C++|stars|2025-02-08|7|
|re2|C++|stars|2023-12-14|17|
|readerwriterqueue|C++|stars|2024-07-09|2|
|recastnavigation|C++|stars|2024-01-28|46|
|rocksdb|C++|stars|2025-02-26|320|
|sentencepiece|C++|stars|2025-02-27|51|
|simdjson|C++|stars|2025-02-21|1|
|snappy|C++|stars|2024-08-17|1|
|soc-simulator|C++|stars|2024-06-25|5|
|spdlog|C++|stars|2025-02-11|7|
|stockfish|C++|stars|2024-03-03|14|
|taskflow|C++|stars|2025-02-21|39|
|tev|C++|stars|2024-01-12|22|
|tinygltf|C++|stars|2025-01-22|1|
|tinympc|C++|stars|2025-02-11|8|
|tinyobjloader|C++|stars|2025-01-29|1|
|tinyrenderer|C++|stars|2025-02-21|4|
|tomlplusplus|C++|stars|2025-02-27|1|
|vcpkg-tool|C++|stars|2025-02-27|141|
|velox|C++|stars|2023-12-15|171|
|verilator|C++|stars|2025-03-02|141|
|wasmedge|C++|stars|2024-07-15|66|
|xgboost|C++|stars|2025-03-01|103|
|yalantinglibs|C++|stars|2023-12-17|52|
|yaml-cpp|C++|stars|2025-01-24|31|
|yoga|C++|stars|2025-02-27|13|
|yosys|C++|stars|2025-03-01|310|
|z3|C++|stars|2025-02-28|801|
|zfp|C++|stars|2025-02-12|35|
|zxing-cpp|C++|stars|2025-02-19|95|
|actix-web|Rust|stars|2024-04-15|114|
|anki|Rust|stars|2024-06-24|8|
|clap|Rust|stars|2024-03-01|19|
|coreutils|Rust|stars|2024-04-23|712|
|deku|Rust|stars|2025-05-16|3|
|delta-rs|Rust|stars|2024-04-23|119|
|diesel|Rust|stars|2024-03-01|227|
|egg|Rust|stars|2024-08-30|15|
|elfshaker|Rust|stars|2025-05-09|16|
|fish-shell|Rust|stars|2025-05-19|25|
|foundations|Rust|stars|2025-05-19|15|
|html5ever|Rust|stars|2023-09-06|42|
|hyper|Rust|stars|2024-03-02|4|
|image|Rust|stars|2024-02-22|16|
|influxdb|Rust|stars|2024-03-01|46|
|jiff|Rust|stars|2025-05-18|16|
|json|Rust|stars|2024-01-11|15|
|just|Rust|stars|2024-04-01|16|
|log|Rust|stars|2024-02-29|1|
|logos|Rust|stars|2024-06-10|31|
|meilisearch|Rust|stars|2024-06-25|44|
|mini-lsm|Rust|stars|2024-02-26|45|
|nom|Rust|stars|2024-04-21|6|
|ockam|Rust|stars|2024-04-22|282|
|pingora|Rust|stars|2025-05-09|124|
|polars|Rust|stars|2025-05-19|303|
|pyo3|Rust|stars|2024-06-24|30|
|qdrant|Rust|stars|2024-03-19|44|
|quiche|Rust|stars|2025-05-19|62|
|quinn|Rust|stars|2025-05-20|48|
|raft-rs|Rust|stars|2025-02-28|26|
|rand|Rust|stars|2024-02-18|8|
|rayon|Rust|stars|2024-02-27|21|
|regex|Rust|stars|2024-01-10|44|
|ring|Rust|stars|2024-03-03|16|
|ripgrep|Rust|stars|2024-03-27|89|
|ropey|Rust|stars|2024-04-08|15|
|ruff|Rust|stars|2025-05-19|421|
|rust-analyzer|Rust|stars|2024-04-22|470|
|rust-base64|Rust|stars|2024-03-01|7|
|rustfmt|Rust|stars|2024-03-04|16|
|rustls|Rust|stars|2024-03-07|15|
|salsa|Rust|stars|2025-05-19|16|
|serde|Rust|stars|2024-01-08|1|
|smol|Rust|stars|2024-03-04|16|
|softposit-rs|Rust|stars|2022-12-14|11|
|statrs|Rust|stars|2024-06-24|15|
|syn|Rust|stars|2024-01-13|16|
|tikv|Rust|stars|2025-05-20|10|
|tokenizers|Rust|stars|2024-05-06|16|
|tokio|Rust|stars|2024-03-04|41|
|tree-sitter|Rust|stars|2024-03-08|85|
|turborepo|Rust|stars|2024-10-03|59|
|typst|Rust|stars|2024-03-25|81|
|unicode-normalization|Rust|stars|2024-03-03|2|
|uv|Rust|stars|2025-05-19|538|
|wasmi|Rust|stars|2025-05-17|85|
|wasmtime|Rust|stars|2024-04-22|301|
|yara-x|Rust|stars|2025-08-08|143|
|zed|Rust|stars|2024-10-04|1218|

Owner

  • Name: Yingwei Zheng
  • Login: dtcxzyw
  • Kind: user
  • Company: SUSTech

CG & HPC & Compiler

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1,425
  • Total pull requests: 1,112
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: about 1 month
  • Total issue authors: 6
  • Total pull request authors: 4
  • Average comments per issue: 0.06
  • Average comments per pull request: 1.16
  • Merged pull requests: 4
  • Bot issues: 1,369
  • Bot pull requests: 79
Past Year
  • Issues: 644
  • Pull requests: 663
  • Average time to close issues: about 9 hours
  • Average time to close pull requests: 9 days
  • Issue authors: 4
  • Pull request authors: 4
  • Average comments per issue: 0.04
  • Average comments per pull request: 1.2
  • Merged pull requests: 4
  • Bot issues: 618
  • Bot pull requests: 32
Top Authors
Issue Authors
  • github-actions[bot] (1,369)
  • dtcxzyw (52)
  • v01dXYZ (1)
  • nikic (1)
  • goldsteinn (1)
  • DianQK (1)
Pull Request Authors
  • dtcxzyw (919)
  • zyw-bot (108)
  • github-actions[bot] (79)
  • goldsteinn (6)
Top Labels
Issue Labels
reviewed (772) non-deterministic (220) grep (45) crash (8) regression (7) queue (1)
Pull Request Labels
reviewed (484) regression (103) non-deterministic (20) crash (17) hang (2) grep (1) miscompilation (1)

Dependencies

.github/workflows/pre-commit.yml actions
  • actions/checkout v4 composite
  • thollander/actions-comment-pull-request v2 composite
.github/workflows/llvm-ci.yml actions
  • JasonEtco/create-an-issue v2 composite
  • actions/checkout v4 composite