slothy

Assembly super-optimization via constraint solving

https://github.com/slothy-optimizer/slothy

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

assembly-language constraint-programming cryptography-tools post-quantum-cryptography superoptimization

Keywords from Contributors

interactive mesh interpretability profiles distribution sequences generic projection standardization optim
Last synced: 6 months ago · JSON representation ·

Repository

Assembly super-optimization via constraint solving

Basic Info
Statistics
  • Stars: 215
  • Watchers: 9
  • Forks: 20
  • Open Issues: 56
  • Releases: 5
Topics
assembly-language constraint-programming cryptography-tools post-quantum-cryptography superoptimization
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md

SLOTHY: Assembly optimization via constraint solving

Python 3.9 Python 3.10 Python 3.11 Python 3.12 Python 3.13

About SLOTHY

SLOTHY - Super (Lazy) Optimization of Tricky Handwritten assemblY - is an assembly-level superoptimizer for: 1. Instruction scheduling 2. Register allocation 3. Software pipelining (= periodic loop interleaving)

SLOTHY is generic in the target architecture and microarchitecture. This repository provides instantiations for: - Armv8.1-M+Helium: Cortex-M55, Cortex-M85 - AArch64: Cortex-A55, and experimentally Cortex-A72, Cortex-X/Neoverse-V, Apple M1 (Firestorm, Icestorm)

SLOTHY is discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.

Goal

SLOTHY enables a development workflow where developers write 'clean' assembly by hand, emphasizing the logic of the computation, while SLOTHY automates microarchitecture-specific micro-optimizations. This accelerates development, keeps manually written code artifacts maintainable, and allows to split efforts for formal verification into the separate verification of the clean code and the micro-optimizations.

How it works

SLOTHY is essentially a constraint solver frontend: It converts the input source into a data flow graph and builds a constraint model capturing valid instruction schedulings, register renamings, and periodic loop interleavings. The model is passed to an external constraint solver and, upon success, a satisfying assignment converted back into the final code. Currently, SLOTHY uses Google OR-Tools as its constraint solver backend.

Performance

As a rough rule of thumb, SLOTHY typically optimizes workloads of <50 instructions in seconds to minutes, workloads up to 150 instructions in minutes to hours, while for larger kernels some heuristics are necessary.

Applications

SLOTHY has been used to provide the fastest known implementations of various cryptographic and DSP primitives: For example, the SLOTHY paper discusses the NTTs underlying ML-KEM and ML-DSA for Cortex-{A55, A72, M55, M85}, the FFT for Cortex-{M55,M85}, and the X25519 scalar multiplication for Cortex-A55. You find the clean and optimized source code for those examples in paper/.

Getting started

Have a look at the SLOTHY tutorial for a hands-on and example-based introduction to SLOTHY.

Real world uses

Installation

Option 1: Install from PyPI

The easiest way to install SLOTHY is via pip:

bash pip install slothy

Quick Start

Here's a minimal example of using SLOTHY to optimize assembly code:

```python import slothy import slothy.targets.aarch64.aarch64neon as AArch64Neon import slothy.targets.aarch64.cortexa55 as TargetCortexA55

Create SLOTHY instance for ARM Cortex-A55

s = slothy.Slothy(AArch64Neon, TargetCortexA55)

Load assembly code from file

s.loadsourcefrom_file('example.s')

Optimize the code

s.optimize(start='startlabel', end='endlabel')

Write optimized assembly to file

s.writesourcetofile('exampleoptimized.s')

print("Optimization complete! Check example_optimized.s") ```

Example assembly file (example.s): assembly start_label: ldr x0, [x1] ldr x2, [x3] add x4, x0, x2 str x4, [x5] end_label:

Option 2: Development Installation

For development or to run the examples in this repository, first clone the SLOTHY development repository:

bash git clone https://github.com/slothy-optimizer/slothy.git

Requirements

SLOTHY has been successfully used on

  • Ubuntu-21.10 and up (64-bit),
  • macOS Monterey 12.6 and up.

SLOTHY supports Python 3.9 up to 3.13. For development Python >= 3.10 is required. See requirements.txt for package requirements, and install via pip install -r requirements.txt.

Note: requirements.txt pins versions for reproducibility. If you already have newer versions of some dependencies installed and don't want them downgraded, consider using a virtual environment:

bash python3 -m venv venv ./venv/bin/python3 -m pip install -r requirements.txt

Then, enter the virtual environment via source venv/bin/activate prior to running SLOTHY. Finally, adjust your PATH environment variable to include the directories containining the slothy-cli script and the LLVM llvm-mca tool.

Verify Development Installation

To check that your development setup is complete, try the following from the base directory:

% python3 test.py --tests aarch64_simple0_a55

You should see something like the following:

* Example: aarch64_simple0_a55... INFO:aarch64_simple0_a55:SLOTHY version: 0.1.0 INFO:aarch64_simple0_a55:Instructions in body: 20 INFO:aarch64_simple0_a55.slothy:Perform internal binary search for minimal number of stalls... INFO:aarch64_simple0_a55.slothy:Attempt optimization with max 32 stalls... INFO:aarch64_simple0_a55.slothy:Objective: minimize number of stalls INFO:aarch64_simple0_a55.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.7.2996) ... INFO:aarch64_simple0_a55.slothy:[0.0721s]: Found 1 solutions so far... objective 19.0, bound 8.0 (minimize number of stalls) INFO:aarch64_simple0_a55.slothy:[0.0765s]: Found 2 solutions so far... objective 18.0, bound 12.0 (minimize number of stalls) INFO:aarch64_simple0_a55.slothy:OPTIMAL, wall time: 0.155224 s INFO:aarch64_simple0_a55.slothy:Booleans in result: 509 INFO:aarch64_simple0_a55.slothy.selfcheck:OK! INFO:aarch64_simple0_a55.slothy:Minimum number of stalls: 18

Option 3: Docker

A dockerfile for an Ubuntu-22.04 based Docker image with all dependencies of SLOTHY and the PQMX+PQAX test environments setup can be found in paper/artifact/slothy.dockerfile. See paper/artifact/README.md for instructions.

Examples

The SLOTHY Tutorial and the examples directory contain numerous exemplary assembly snippets. To try them, use python3 example.py --examples={YOUR_EXAMPLE}. See python3 example.py --help for the list of all available examples.

The use of SLOTHY from the command line is illustrated in paper/scripts/ supporting the real-world optimizations for the NTT, FFT and X25519 discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.

Contributing

For information on how to contribute to SLOTHY, please see CONTRIBUTING.md.

Owner

  • Name: slothy-optimizer
  • Login: slothy-optimizer
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  SLOTHY: Assembly superoptimization via constraint solving
message: >-
  If you use this software, please cite both the article
  from preferred-citation and the software itself
type: software
authors:
  - given-names: Hanno
    family-names: Becker
identifiers:
  - type: url
    value: 'https://eprint.iacr.org/2022/1303.pdf'
    description: IACR ePrint
repository-code: 'https://github.com/slothy-optimizer/slothy'
url: 'https://slothy-optimizer.github.io/slothy'
keywords:
  - superoptimization
  - constraint solving
  - assembly
  - cryptography
license: MIT
preferred-citation:
  title: "Fast and Clean: Auditable high-performance assembly via constraint solving"
  type: conference-paper
  authors:
  - given-names: Amin
    family-names: Abdulrahman
  - given-names: Hanno
    family-names: Becker
  - given-names: Matthias
    family-names: Kannwischer
  - given-names: Fabien
    family-names: Klein
  journal: "IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES)"
  year: 2024
  conference: CHES
  volume: 1
  publisher:
   name: "International Association for Cryptologic Research (IACR)"

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 660
  • Total Committers: 12
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.429
Past Year
  • Commits: 242
  • Committers: 8
  • Avg Commits per committer: 30.25
  • Development Distribution Score (DDS): 0.587
Top Committers
Name Email Commits
Hanno Becker b****n@a****k 377
Amin Abdulrahman a****n@a****e 123
Matthias J. Kannwischer m****s@k****u 113
Justus Bergermann j****n@m****g 19
Hanno Becker h****r@a****m 11
Rod Chapman r****p@a****m 5
dependabot[bot] 4****] 4
Juneyoung Lee a****e@g****m 3
FabKlein f****n@a****m 2
ggorlen g****n@g****m 1
Juneyoung Lee l****y@a****m 1
Brendan Moran b****n@a****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 71
  • Total pull requests: 390
  • Average time to close issues: 16 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 13
  • Total pull request authors: 15
  • Average comments per issue: 0.87
  • Average comments per pull request: 0.48
  • Merged pull requests: 284
  • Bot issues: 0
  • Bot pull requests: 20
Past Year
  • Issues: 57
  • Pull requests: 282
  • Average time to close issues: 18 days
  • Average time to close pull requests: 5 days
  • Issue authors: 11
  • Pull request authors: 12
  • Average comments per issue: 0.81
  • Average comments per pull request: 0.42
  • Merged pull requests: 202
  • Bot issues: 0
  • Bot pull requests: 20
Top Authors
Issue Authors
  • mkannwischer (23)
  • hanno-becker (19)
  • dop-amin (9)
  • rod-chapman (6)
  • bremoran (3)
  • pennyannn (3)
  • jnk0le (2)
  • aqjune-aws (1)
  • nebeid (1)
  • josephjohnston (1)
  • Unlimitosu (1)
  • willieyz (1)
  • embg (1)
Pull Request Authors
  • mkannwischer (126)
  • hanno-becker (103)
  • dop-amin (86)
  • dependabot[bot] (20)
  • willieyz (14)
  • rod-chapman (10)
  • aqjune (6)
  • cheng-wei-huang0612 (5)
  • SH1E0r1r2y (5)
  • thisisjube (4)
  • aqjune-aws (4)
  • bremoran (2)
  • gtsoul-tech (2)
  • FabKlein (1)
  • hugovincent (1)
Top Labels
Issue Labels
enhancement (12) bug (6) documentation (2) armv8-m-mve (2) build issue (1) question (1)
Pull Request Labels
dependencies (20) python (20) needs-ci (11) enhancement (7) DO-NOT-MERGE (3)

Dependencies

requirements.txt pypi
  • ortools ==9.7.2996
  • pandas ==2.1.1
  • sympy ==1.12