intelliperf

Automated bottleneck detection and solution orchestration

https://github.com/amdresearch/intelliperf

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

amd genai gpu hip instinct llm performance rocm
Last synced: 6 months ago

Repository

Automated bottleneck detection and solution orchestration

Basic Info
  • Host: GitHub
  • Owner: AMDResearch
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 9.49 MB
Statistics
  • Stars: 4
  • Watchers: 9
  • Forks: 1
  • Open Issues: 23
  • Releases: 0
Topics
amd genai gpu hip instinct llm performance rocm
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
Readme · Contributing · License · Citation · Codeowners

README.md

IntelliPerf: LLM-Powered Autonomous GPU Performance Engineer


[!IMPORTANT]
This project is intended for research purposes only and is provided by AMD Research and Advanced Development team. This is not a product. Use it at your own risk and discretion.


Overview

IntelliPerf is an automated performance engineering framework that addresses the complex challenge of GPU kernel optimization. Manual optimization requires deep domain expertise and is time-consuming, error-prone, and resource-intensive. IntelliPerf systematizes this workflow by orchestrating a comprehensive toolchain that automatically profiles applications using rocprofiler-compute, identifies high-level bottlenecks with Guided Tuning, pinpoints specific source code lines using Omniprobe, generates optimized code through Large Language Models (LLMs), and validates results using Accordo for correctness and performance. Built on a modular "formula-driven" architecture, it targets specific bottlenecks like bank conflicts, memory access patterns, and atomic contention through a sophisticated multi-stage optimization loop that includes profiling, analysis, code generation, and automated validation.
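The multi-stage loop described above can be sketched as follows. This is an illustrative Python sketch only; the function names (`profile`, `locate`, `generate_fix`, `validate`) are placeholders standing in for rocprofiler-compute, Omniprobe, the LLM backend, and Accordo, not IntelliPerf's actual API:

```python
# Illustrative sketch of a formula-driven optimization loop.
# All stage functions are hypothetical placeholders, not IntelliPerf's API.

def optimize_kernel(source, profile, locate, generate_fix, validate, max_attempts=10):
    """Iteratively profile, pinpoint, rewrite, and validate a kernel."""
    best = source
    best_time = profile(best)                    # e.g. rocprofiler-compute timing
    for _ in range(max_attempts):
        hotspot = locate(best)                   # e.g. Omniprobe line attribution
        candidate = generate_fix(best, hotspot)  # LLM-generated rewrite
        if not validate(candidate):              # e.g. Accordo correctness check
            continue                             # discard incorrect rewrites
        t = profile(candidate)
        if t < best_time:                        # keep only measured improvements
            best, best_time = candidate, t
    return best, best_time
```

Candidates that fail validation are discarded outright, so only rewrites that are both correct and measurably faster survive the loop.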

Key Features

  • AI-Powered Optimization: Generates optimized code using LLMs with iterative feedback for performance improvements
  • Precise Analysis: Pinpoints performance issues down to specific source code lines using compiler-based instrumentation
  • Automated Validation: Validates both correctness and performance improvements through runtime comparison
  • Comprehensive Coverage: Supports multiple bottleneck types (bank conflicts, memory access, atomic contention)
  • CI/CD Integration: Seamless workflow integration with automated pull request generation
  • Extensible Architecture: Formula-driven design for easy addition of new optimization targets

Installation

Quick Start with Containers

We provide both Apptainer and Docker images for easy setup:

Using Apptainer

```bash
./apptainer/build.sh
./apptainer/run.sh
```

Using Docker

```bash
./docker/build.sh
./docker/run.sh
```

Or use our prebuilt Docker image:

```bash
docker pull audacioussw/intelliperf:latest
export LLM_GATEWAY_KEY="your_api_key_here"
docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video \
  -e LLM_GATEWAY_KEY="$LLM_GATEWAY_KEY" audacioussw/intelliperf
```

Baremetal Installation

  1. Install Additional Dependencies:

```bash
# ROCm dependencies
apt-get install -y rocm-llvm-dev libzstd-dev

# KernelDB dependencies
apt-get install -y libdwarf-dev

# Omniperf dependencies
apt-get install -y locales
locale-gen en_US.UTF-8
```

Installation from Source

[!NOTE] Due to the complex dependency chain, IntelliPerf currently supports development mode installation only. Future versions will support standard pip installation.

  1. Clone the Repository:

```bash
git clone git@github.com:AMDResearch/intelliperf.git
cd intelliperf
```

  2. Install IntelliPerf:

```bash
pip install -e .
```

  3. Install Dependencies:

```bash
python3 scripts/install_tool.py --all
```

Environment Variables

Set the following environment variable for AI-powered optimization:

```bash
export LLM_GATEWAY_KEY="your_api_key_here"
```

This key is required for the bank conflict, memory access pattern, and atomic contention optimizations. AI-powered optimization supports various language models and providers through the --provider and --model command-line arguments; the key should be the backend key for the specified provider.
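As a rough sketch of how a client might assemble these settings from the environment and the documented defaults (`gpt-4o` via `openai`), here is a hypothetical helper; it is illustrative only and not part of IntelliPerf:

```python
# Hypothetical helper assembling LLM settings from the environment
# and the documented CLI defaults (illustrative, not IntelliPerf code).
import os

def llm_config(provider="openai", model="gpt-4o"):
    """Build an LLM configuration, failing fast if the gateway key is unset."""
    key = os.environ.get("LLM_GATEWAY_KEY")
    if not key:
        raise RuntimeError("LLM_GATEWAY_KEY must be set for AI-powered optimization")
    return {"provider": provider, "model": model, "api_key": key}
```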

Supported GPUs

IntelliPerf currently supports:

  • MI300X

[!NOTE] IntelliPerf may work on other AMD GPUs with ROCm compatibility, but has only been tested on MI300X.

Usage

IntelliPerf can be used to analyze and optimize your GPU applications:

```bash
intelliperf [options] -- <profile_cmd>
```

Examples

```bash
# Optimize bank conflicts in a HIP application
intelliperf -b ~/rocBLAS/build.sh -f bankConflict -- ~/rocBLAS/build/bin/rocblas_gemm

# Diagnose a Triton application
intelliperf -- python3 gemm.py
```

Command Line Options

| Option | Description |
|----------------------------------|----------------------|
| `-h, --help` | Show help message and exit |
| `-v, --verbose` | Increase verbosity level (e.g., `-v`, `-vv`, `-vvv`) |
| `-b, --build_command` | Command to build your application |
| `-i, --instrument_command` | Command to build your application with instrumentation |
| `-p, --project_directory` | Directory containing your codebase |
| `-f, --formula` | Optimization formula to use (`bankConflict`, `memoryAccess`, `atomicContention`, `diagnoseOnly`) |
| `--top_n` | Number of top kernels to report in `diagnoseOnly` mode (default: 10) |
| `--num_attempts` | Number of optimization attempts (default: 10) |
| `-o, --output_file` | Path to output file |
| `-t, --accordo_absolute_tolerance` | Absolute tolerance for validation |
| `-m, --model` | Model to use for optimization (default: `gpt-4o`) |
| `-r, --provider` | Provider to use for optimization (default: `openai`) |
| `--internal` | Use AMD's internal LLM service |
| `-l, --in_place` | Modify source files in place during optimization (default: creates backups) |
| `--unittest_command` | Command to run unit tests for additional validation |

[!NOTE] IntelliPerf copies the entire project directory to a temporary location. Make sure your project doesn't include any temporary CMake files if you pass the project_directory flag.

Documentation

Citation

If you use IntelliPerf or discuss our work in your research, please cite it:

```bibtex
@software{Awad:2025:ILP,
  author = {Muhammad Awad and Cole Ramos and Keith Lowery},
  title  = {IntelliPerf: {LLM}-Powered Autonomous {GPU} Performance Engineer},
  year   = 2025,
  month  = jul,
  doi    = {10.5281/zenodo.15845118},
  url    = {https://github.com/AMDResearch/intelliperf},
  code   = {https://github.com/AMDResearch/intelliperf}
}
```

You can also use the CITATION.cff file in the repository root for automatic citation generation.

Contributing

We welcome contributions! Please see our Contributing Guide for details on how to set up your development environment and contribute to the project.

Support

For support, please:

  1. Open an issue
  2. Contact the development team

License

This project is licensed under the MIT License - see the LICENSE file for details.

Owner

  • Name: AMDResearch
  • Login: AMDResearch
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use IntelliPerf or discuss our work in your research, please always cite our work."
authors:
  - family-names: "Awad"
    given-names: "Muhammad"
  - family-names: "Ramos"
    given-names: "Cole"
  - family-names: "Lowery"
    given-names: "Keith"
title: "IntelliPerf: LLM-Powered Autonomous GPU Performance Engineer"
doi: "10.5281/zenodo.15845118"
date-released: 2025-07-08
url: "https://github.com/AMDResearch/intelliperf"
repository-code: "https://github.com/AMDResearch/intelliperf"
license: MIT
keywords:
  - "GPU optimization"
  - "performance engineering"
  - "machine learning"
  - "LLM"
  - "ROCm"
  - "AMD"
  - "automated optimization"
  - "bank conflicts"
  - "memory access patterns"
  - "atomic contention"
abstract: "IntelliPerf is an automated performance engineering framework that addresses the complex challenge of GPU kernel optimization. It systematizes the optimization workflow by orchestrating a comprehensive toolchain that automatically profiles applications using rocprofiler-compute, identifies high-level bottlenecks with Guided Tuning, pinpoints specific source code lines using Omniprobe, generates optimized code through Large Language Models (LLMs), and validates results using Accordo for correctness and performance." 

GitHub Events

Total
  • Create event: 7
  • Issues event: 9
  • Watch event: 2
  • Delete event: 8
  • Member event: 2
  • Issue comment event: 3
  • Push event: 109
  • Public event: 1
  • Pull request event: 21
  • Pull request review event: 29
  • Pull request review comment event: 39
  • Fork event: 1
Last Year
  • Create event: 7
  • Issues event: 9
  • Watch event: 2
  • Delete event: 8
  • Member event: 2
  • Issue comment event: 3
  • Push event: 109
  • Public event: 1
  • Pull request event: 21
  • Pull request review event: 29
  • Pull request review comment event: 39
  • Fork event: 1

Dependencies

.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
examples/bank_conflict/llm.c/requirements.txt pypi
  • datasets *
  • requests *
  • tiktoken *
  • torch *
  • torchaudio *
  • torchvision *
  • transformers *
external/guided-tuning/requirements-test.txt pypi
  • matplotlib * test
  • pytest * test
  • torch * test
  • torchaudio * test
  • torchvision * test
  • triton * test
external/guided-tuning/requirements.txt pypi
  • duckdb *
  • pandas *
  • rich *
  • tabulate *
pyproject.toml pypi
  • dspy *
  • duckdb *
  • ml_dtypes *
  • openai >=1.0.0
  • pandas *
  • rich *
  • tabulate *
  • tomli *
.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • digitalocean/action-doctl v2 composite