https://github.com/amdresearch/npueval

NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

benchmark codegen kernels llm npu optimization
Last synced: 5 months ago

Repository

Basic Info
Statistics
  • Stars: 16
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 6 months ago
Metadata Files
Readme Contributing

README.md

NPUEval

NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.

Getting started

Requirements:

  • Ubuntu 24.04.2 or Ubuntu 24.10 (must have supported Linux kernel version >6.10)
  • Disable secure boot on your machine. This is needed because we'll be working with an experimental (unsigned) kernel module.
  • Docker. Follow the instructions at docs.docker.com for setup.
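The kernel and Docker requirements above can be checked before running the installer. The following is a minimal sketch, not part of NPUEval: the function names are hypothetical, the ">6.10" requirement is read strictly on the (major, minor) version, and the secure boot state is not checked.

```python
import platform
import re
import shutil


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.11.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str, minimum: tuple[int, int] = (6, 10)) -> bool:
    # Assumption: ">6.10" is read strictly on (major, minor).
    # Switch to ">=" if 6.10 itself turns out to be accepted.
    return parse_kernel_release(release) > minimum


def docker_is_installed() -> bool:
    # Only checks that a `docker` executable is on PATH,
    # not that the Docker daemon is running.
    return shutil.which("docker") is not None


if __name__ == "__main__":
    release = platform.release()
    try:
        status = "ok" if kernel_is_supported(release) else "too old"
    except ValueError:
        status = "could not parse"
    print(f"kernel {release}: {status}")
    print(f"docker on PATH: {docker_is_installed()}")
```

Run it on the host machine, not inside a container, since the kernel module is installed on the host.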

Once you have the prerequisites, use the install script:

./install.sh

This brings up an XRT Docker image that builds the XRT and XDNA Debian packages, which are then installed on your host machine. It then sets up the NPUEval Docker image with all the tools required for NPU application compilation.

Starter notebooks

Launch the JupyterLab environment to open the notebooks and get familiar with using the dataset:

./scripts/launch_jupyter.sh

You'll be able to connect from your browser on port 8888, e.g. http://localhost:8888/lab, or substitute the machine's IP address for localhost if you're using it remotely.
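If the page does not load, it can help to confirm that something is listening on the port before looking elsewhere. This is a small sketch using only the Python standard library; the helper names are hypothetical, and the only values taken from the text above are port 8888 and the /lab path.

```python
import socket


def jupyter_url(host: str = "localhost", port: int = 8888) -> str:
    # Builds the address to open in a browser. IPv6 literals would need
    # square brackets, which this sketch does not add.
    return f"http://{host}:{port}/lab"


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    # Attempts a plain TCP connection; True means a server accepted it.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(jupyter_url(), "reachable:", port_is_open())
```

For a remote machine, pass its IP address as `host` to both functions.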

Reproducing results

Currently there are two simple scripts to reproduce the AIECoder results for gpt-4.1 and gpt-4o-mini. You can run these as regular scripts from your JupyterLab or interactive Docker session, or use docker_run_script.sh to run each in its own Docker session.

docker_run_script.sh scripts/run_completions.py
docker_run_script.sh scripts/run_functional_tests.py

The run_completions script feeds all the prompts to the AIECoder agent and generates solutions for each test. Make sure to set your OPENAI_API_KEY, since it makes requests to gpt-4.1 and gpt-4o-mini.

The run_functional_tests script evaluates the LLM-generated solutions. Since this is just the evaluator, it requires only the NPU and no access to an LLM.
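The two steps can be chained with a check that the API key is present before any requests are made. The sketch below is an illustration, not a script shipped with the repository: `STEPS`, `check_environment`, and `run_all` are hypothetical names, the commands are copied as written above (the location of docker_run_script.sh is not specified there), and whether the Docker session inherits OPENAI_API_KEY from the calling environment is an assumption.

```python
import os
import subprocess

# Commands as listed above; adjust the path to docker_run_script.sh
# to wherever it lives in your checkout.
STEPS = [
    ["docker_run_script.sh", "scripts/run_completions.py"],
    ["docker_run_script.sh", "scripts/run_functional_tests.py"],
]


def needs_api_key(command: list[str]) -> bool:
    # Only the completions step talks to an LLM; the evaluator does not.
    return command[-1].endswith("run_completions.py")


def check_environment(command: list[str], env: dict) -> None:
    if needs_api_key(command) and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY must be set before running completions")


def run_all(env=None, runner=subprocess.run) -> None:
    # `runner` is injectable so the sequence can be dry-run without Docker.
    env = dict(os.environ) if env is None else env
    for command in STEPS:
        check_environment(command, env)
        runner(command, check=True, env=env)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_all(env={"OPENAI_API_KEY": "placeholder"},
            runner=lambda cmd, **kwargs: print(" ".join(cmd)))
```

Calling `run_all()` with no arguments uses the current environment and executes the commands for real, stopping at the first step that fails.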

Known issues and limitations

  • "Failed to open KMQ device (err=22): Invalid argument": if you see this error, reboot the machine; the driver can get into an unstable state. Hopefully this won't happen with newer versions of the NPU driver.
  • Only AIE2 and AIE2P kernels are targeted: Phoenix/Hawk for AIE2 and Strix/Krackan for AIE2P.
  • Currently only single-output kernels are supported, i.e. 1-in-1-out and 2-in-1-out.
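The last two limitations can be expressed as a lookup table and a shape check. This is only a sketch of the constraints as stated in the list above; the names are hypothetical and do not come from the NPUEval package.

```python
# Device families and the kernel target each one maps to,
# as stated in the limitations list.
TARGET_BY_DEVICE = {
    "phoenix": "AIE2",
    "hawk": "AIE2",
    "strix": "AIE2P",
    "krackan": "AIE2P",
}

# Supported kernel shapes as (number of inputs, number of outputs).
SUPPORTED_SHAPES = {(1, 1), (2, 1)}


def target_for(device: str) -> str:
    try:
        return TARGET_BY_DEVICE[device.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


def shape_is_supported(num_inputs: int, num_outputs: int) -> bool:
    return (num_inputs, num_outputs) in SUPPORTED_SHAPES


if __name__ == "__main__":
    print(target_for("Strix"), shape_is_supported(2, 1))
```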

References

Bibtex

@misc{kalade2025npuevaloptimizingnpukernels,
  title={NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers},
  author={Sarunas Kalade and Graham Schelle},
  year={2025},
  eprint={2507.14403},
  archivePrefix={arXiv},
  primaryClass={cs.PL},
  url={https://arxiv.org/abs/2507.14403},
}

Owner

  • Name: AMDResearch
  • Login: AMDResearch
  • Kind: organization

GitHub Events

Total
  • Watch event: 11
  • Push event: 4
  • Fork event: 2
Last Year
  • Watch event: 11
  • Push event: 4
  • Fork event: 2

Dependencies

.github/workflows/gh-deploy.yml actions
  • actions/checkout v3 composite
  • actions/deploy-pages v4 composite
  • actions/setup-python v4 composite
  • actions/upload-pages-artifact v3 composite
setup.py pypi
  • CppHeaderParser *
  • anthropic *
  • llama_index *
  • ml_dtypes *
  • numpy *
  • openai *
  • pandas *
  • seaborn *