https://github.com/amdresearch/npueval

NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
✓ Academic publication links (links to arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low: 14.2% similarity to scientific vocabulary)

Keywords

benchmark codegen kernels llm npu optimization

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: AMDResearch
Language: C++
Default Branch: main
Homepage: https://amdresearch.github.io/NPUEval/
Size: 4.83 MB

Statistics

Stars: 16
Watchers: 4
Forks: 1
Open Issues: 0
Releases: 0

Topics

benchmark codegen kernels llm npu optimization

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme, Contributing

NPUEval

Getting started

Requirements:
* Ubuntu 24.04.2 or Ubuntu 24.10 (must have a supported Linux kernel version, >6.10)
* Secure boot disabled on your machine. This is needed because we'll be working with an experimental (unsigned) kernel module.
* Docker: follow the instructions at docs.docker.com for setup.
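Before running the install script, you can confirm that the host kernel meets the version requirement. The sketch below is an illustrative helper, not part of NPUEval: it parses the string returned by Python's `platform.release()` and applies the README's ">6.10" literally (strictly newer than 6.10). The function names are hypothetical; adjust the comparison if your driver release documents 6.10 itself as supported.

```python
import platform
import re

# Minimum kernel from the README (">6.10"), treated here as strictly greater.
MIN_KERNEL = (6, 10)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.11.0-19-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_requirement(release: str) -> bool:
    """Return True if the kernel's (major, minor) is strictly newer than MIN_KERNEL."""
    return parse_kernel_version(release) > MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()
    try:
        ok = kernel_meets_requirement(release)
    except ValueError:
        # Non-Linux hosts may report release strings this parser does not handle.
        ok = False
    print(f"Kernel {release}: {'OK' if ok else 'not supported'}")
```

Tuple comparison keeps the check simple: `(6, 11) > (6, 10)` and `(7, 0) > (6, 10)` both hold, while `(6, 8)` and `(6, 10)` do not.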

Once the prerequisites are in place, run the install script:

./install.sh

This brings up an XRT Docker image that builds the XRT and XDNA Debian packages and installs them on your host machine. It then sets up the NPUEval Docker image with all the tools required to compile NPU applications.

Starter notebooks

Launch the JupyterLab environment to open the notebooks and get familiar with the dataset:

./scripts/launch_jupyter.sh

You can then connect from your browser on port 8888, e.g. http://localhost:8888/lab, or use the machine's IP address instead of localhost if you're working on it remotely.
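If the browser cannot reach JupyterLab, a plain TCP connection attempt tells you whether anything is listening on port 8888 at all. This is a minimal standard-library sketch; the function name, host, and timeout are illustrative, and a successful connection only shows that some process accepted it, not that the process is JupyterLab.

```python
import socket


def port_is_open(host: str, port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures,
        # all of which are OSError subclasses.
        return False


if __name__ == "__main__":
    # Replace "localhost" with the machine's IP address when connecting remotely.
    print("JupyterLab port reachable:", port_is_open("localhost"))
```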

Reproducing results

Two simple scripts currently reproduce the AIECoder results for gpt-4.1 and gpt-4o-mini. You can run them as regular scripts from your JupyterLab or interactive Docker session, or use docker_run_script.sh to run each one in its own Docker session:

docker_run_script.sh scripts/run_completions.py
docker_run_script.sh scripts/run_functional_tests.py

The run_completions script feeds all the prompts to the AIECoder agent and generates a solution for each test. Make sure OPENAI_API_KEY is set, since the script sends requests to gpt-4.1 and gpt-4o-mini. The run_functional_tests script evaluates the LLM-generated solutions. Because it is only the evaluator, it requires the NPU but no access to an LLM.
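The two steps can be chained from a single driver. The sketch below is hypothetical and not part of NPUEval: it assumes `docker_run_script.sh` accepts the script path exactly as in the commands above, and because the script's location in your checkout is not stated here, the `runner` path is a parameter you should set yourself. It refuses to start the completion step without `OPENAI_API_KEY`, since only that step calls the OpenAI models; the evaluation step runs without the key.

```python
import os
import subprocess

# Hypothetical default: pass the real path to docker_run_script.sh in your checkout.
DEFAULT_RUNNER = "docker_run_script.sh"

COMPLETIONS = "scripts/run_completions.py"
FUNCTIONAL_TESTS = "scripts/run_functional_tests.py"


def build_commands(env: dict[str, str], runner: str = DEFAULT_RUNNER,
                   skip_completions: bool = False) -> list[list[str]]:
    """Return the command lines to execute, in order.

    The completion step contacts gpt-4.1 and gpt-4o-mini, so it requires
    OPENAI_API_KEY; the functional tests only need the NPU.
    """
    commands = []
    if not skip_completions:
        if not env.get("OPENAI_API_KEY"):
            raise RuntimeError("OPENAI_API_KEY must be set to generate completions")
        commands.append([runner, COMPLETIONS])
    commands.append([runner, FUNCTIONAL_TESTS])
    return commands


def run_all(runner: str = DEFAULT_RUNNER, skip_completions: bool = False) -> None:
    """Run each step in turn; check=True stops at the first non-zero exit code."""
    for cmd in build_commands(dict(os.environ), runner, skip_completions):
        subprocess.run(cmd, check=True)
```

For example, `run_all(runner="path/to/docker_run_script.sh", skip_completions=True)` would re-evaluate previously generated solutions without needing an API key.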

Known issues and limitations

* "Failed to open KMQ device (err=22): Invalid argument". If you see this error, reboot the machine, since the driver can get into an unstable state. Hopefully this won't happen with newer versions of the NPU driver.
* Only AIE2 and AIE2P kernels are targeted: Phoenix/Hawk for AIE2 and Strix/Krackan for AIE2P.
* Currently only single-output kernels are supported, i.e. 1-in-1-out and 2-in-1-out.
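The device and single-output limitations listed above can be expressed as a quick pre-check. This is a hypothetical helper, not NPUEval code: the codename-to-architecture table comes straight from the list, while the function names and the idea of describing a kernel by its input and output counts are illustrative assumptions.

```python
# Device families named in the README, mapped to their AIE architecture.
SUPPORTED_DEVICES = {
    "phoenix": "AIE2",
    "hawk": "AIE2",
    "strix": "AIE2P",
    "krackan": "AIE2P",
}


def target_architecture(device: str) -> str:
    """Return the AIE architecture for a device codename, or raise if unsupported."""
    try:
        return SUPPORTED_DEVICES[device.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


def io_shape_supported(num_inputs: int, num_outputs: int) -> bool:
    """Accept only single-output kernels with one or two inputs (1-in-1-out, 2-in-1-out)."""
    return num_outputs == 1 and num_inputs in (1, 2)
```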

References

Bibtex

@misc{kalade2025npuevaloptimizingnpukernels,
  title={NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers},
  author={Sarunas Kalade and Graham Schelle},
  year={2025},
  eprint={2507.14403},
  archivePrefix={arXiv},
  primaryClass={cs.PL},
  url={https://arxiv.org/abs/2507.14403},
}

Owner

Name: AMDResearch
Login: AMDResearch
Kind: organization

Repositories: 4
Profile: https://github.com/AMDResearch

GitHub Events

Total

Watch event: 11
Push event: 4
Fork event: 2

Last Year

Watch event: 11
Push event: 4
Fork event: 2

Dependencies

.github/workflows/gh-deploy.yml actions

actions/checkout v3 composite
actions/deploy-pages v4 composite
actions/setup-python v4 composite
actions/upload-pages-artifact v3 composite

setup.py pypi

CppHeaderParser *
anthropic *
llama_index *
ml_dtypes *
numpy *
openai *
pandas *
seaborn *
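To see which of these setup.py dependencies are importable in the current Python environment (for example inside the NPUEval Docker image), a standard-library check with `importlib.util.find_spec` is enough. This sketch assumes each package's import name matches the name listed above, which appears to hold for these eight but is not true of every PyPI package; the function name is illustrative.

```python
import importlib.util

# Dependencies listed in NPUEval's setup.py (import names assumed to match).
DEPENDENCIES = (
    "CppHeaderParser", "anthropic", "llama_index", "ml_dtypes",
    "numpy", "openai", "pandas", "seaborn",
)


def missing_dependencies(names: tuple[str, ...] = DEPENDENCIES) -> list[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```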
