slothy
Assembly super-optimization via constraint solving
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.9%) to scientific vocabulary
Keywords
Keywords from Contributors
Repository
Assembly super-optimization via constraint solving
Basic Info
- Host: GitHub
- Owner: slothy-optimizer
- License: other
- Language: Assembly
- Default Branch: main
- Homepage: https://slothy-optimizer.github.io/slothy/
- Size: 105 MB
Statistics
- Stars: 215
- Watchers: 9
- Forks: 20
- Open Issues: 56
- Releases: 5
Topics
Metadata Files
README.md
SLOTHY: Assembly optimization via constraint solving
About SLOTHY
SLOTHY - Super (Lazy) Optimization of Tricky Handwritten assemblY - is an assembly-level superoptimizer for: 1. Instruction scheduling 2. Register allocation 3. Software pipelining (= periodic loop interleaving)
SLOTHY is generic in the target architecture and microarchitecture. This repository provides instantiations for: - Armv8.1-M+Helium: Cortex-M55, Cortex-M85 - AArch64: Cortex-A55, and experimentally Cortex-A72, Cortex-X/Neoverse-V, Apple M1 (Firestorm, Icestorm)
SLOTHY is discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.
Goal
SLOTHY enables a development workflow where developers write 'clean' assembly by hand, emphasizing the logic of the computation, while SLOTHY automates microarchitecture-specific micro-optimizations. This accelerates development, keeps manually written code artifacts maintainable, and allows to split efforts for formal verification into the separate verification of the clean code and the micro-optimizations.
How it works
SLOTHY is essentially a constraint solver frontend: It converts the input source into a data flow graph and builds a constraint model capturing valid instruction schedulings, register renamings, and periodic loop interleavings. The model is passed to an external constraint solver and, upon success, a satisfying assignment converted back into the final code. Currently, SLOTHY uses Google OR-Tools as its constraint solver backend.
Performance
As a rough rule of thumb, SLOTHY typically optimizes workloads of <50 instructions in seconds to minutes, workloads up to 150 instructions in minutes to hours, while for larger kernels some heuristics are necessary.
Applications
SLOTHY has been used to provide the fastest known implementations of various cryptographic and DSP primitives:
For example, the SLOTHY paper discusses the NTTs underlying ML-KEM and ML-DSA for
Cortex-{A55, A72, M55, M85}, the FFT for Cortex-{M55,M85}, and the X25519 scalar multiplication for Cortex-A55. You find
the clean and optimized source code for those examples in paper/.
Getting started
Have a look at the SLOTHY tutorial for a hands-on and example-based introduction to SLOTHY.
Real world uses
AWS libcrypto (AWS-LC): SLOTHY-optimized X25519 code based on our un-interleaved form of the original code by Emil Lenngren has been formally verified and included in s2n-bignum (the bignum component of AWS-LC) and merged into AWS-LC. This was the topic of a Real World Crypto 2024 talk.
s2n-bignum routinely employs SLOTHY for finding further highly optimized ECC implementations (e.g., P256, P384, P521 and verifies them through automated equivalence-checking in HOL-Light.
Arm EndpointAI: SLOTHY-optimized code has been deployed to the CMSIS DSP Library for the radix-4 CFFT routines as part of the Arm EndpointAI project in this commit.
mlkem-native: AArch64 assembly routines of ML-KEM are automatically optimized using SLOTHY.
pqm7: Benchmarking framework for the Arm Cortex-M7 which has been created as a case study for automated microarchitectural migrations of software libraries using SLOTHY. See our paper for more details.
Installation
Option 1: Install from PyPI
The easiest way to install SLOTHY is via pip:
bash
pip install slothy
Quick Start
Here's a minimal example of using SLOTHY to optimize assembly code:
```python import slothy import slothy.targets.aarch64.aarch64neon as AArch64Neon import slothy.targets.aarch64.cortexa55 as TargetCortexA55
Create SLOTHY instance for ARM Cortex-A55
s = slothy.Slothy(AArch64Neon, TargetCortexA55)
Load assembly code from file
s.loadsourcefrom_file('example.s')
Optimize the code
s.optimize(start='startlabel', end='endlabel')
Write optimized assembly to file
s.writesourcetofile('exampleoptimized.s')
print("Optimization complete! Check example_optimized.s") ```
Example assembly file (example.s):
assembly
start_label:
ldr x0, [x1]
ldr x2, [x3]
add x4, x0, x2
str x4, [x5]
end_label:
Option 2: Development Installation
For development or to run the examples in this repository, first clone the SLOTHY development repository:
bash
git clone https://github.com/slothy-optimizer/slothy.git
Requirements
SLOTHY has been successfully used on
- Ubuntu-21.10 and up (64-bit),
- macOS Monterey 12.6 and up.
SLOTHY supports Python 3.9 up to 3.13. For development Python >= 3.10 is required.
See requirements.txt for package requirements, and install via pip install -r requirements.txt.
Note: requirements.txt pins versions for reproducibility. If you already have newer versions of some dependencies
installed and don't want them downgraded, consider using a virtual environment:
bash
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
Then, enter the virtual environment via source venv/bin/activate prior to running SLOTHY.
Finally, adjust your PATH environment variable to include the directories containining
the slothy-cli script and the LLVM llvm-mca tool.
Verify Development Installation
To check that your development setup is complete, try the following from the base directory:
% python3 test.py --tests aarch64_simple0_a55
You should see something like the following:
* Example: aarch64_simple0_a55...
INFO:aarch64_simple0_a55:SLOTHY version: 0.1.0
INFO:aarch64_simple0_a55:Instructions in body: 20
INFO:aarch64_simple0_a55.slothy:Perform internal binary search for minimal number of stalls...
INFO:aarch64_simple0_a55.slothy:Attempt optimization with max 32 stalls...
INFO:aarch64_simple0_a55.slothy:Objective: minimize number of stalls
INFO:aarch64_simple0_a55.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.7.2996) ...
INFO:aarch64_simple0_a55.slothy:[0.0721s]: Found 1 solutions so far... objective 19.0, bound 8.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:[0.0765s]: Found 2 solutions so far... objective 18.0, bound 12.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:OPTIMAL, wall time: 0.155224 s
INFO:aarch64_simple0_a55.slothy:Booleans in result: 509
INFO:aarch64_simple0_a55.slothy.selfcheck:OK!
INFO:aarch64_simple0_a55.slothy:Minimum number of stalls: 18
Option 3: Docker
A dockerfile for an Ubuntu-22.04 based Docker image with all dependencies of SLOTHY and the PQMX+PQAX test environments setup can be found in paper/artifact/slothy.dockerfile. See paper/artifact/README.md for instructions.
Examples
The SLOTHY Tutorial and the examples directory contain numerous exemplary
assembly snippets. To try them, use python3 example.py --examples={YOUR_EXAMPLE}. See python3 example.py --help for
the list of all available examples.
The use of SLOTHY from the command line is illustrated in paper/scripts/ supporting the real-world optimizations for the NTT, FFT and X25519 discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.
Contributing
For information on how to contribute to SLOTHY, please see
CONTRIBUTING.md.
Owner
- Name: slothy-optimizer
- Login: slothy-optimizer
- Kind: organization
- Website: https://slothy-optimizer.github.io/slothy/
- Repositories: 3
- Profile: https://github.com/slothy-optimizer
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
SLOTHY: Assembly superoptimization via constraint solving
message: >-
If you use this software, please cite both the article
from preferred-citation and the software itself
type: software
authors:
- given-names: Hanno
family-names: Becker
identifiers:
- type: url
value: 'https://eprint.iacr.org/2022/1303.pdf'
description: IACR ePrint
repository-code: 'https://github.com/slothy-optimizer/slothy'
url: 'https://slothy-optimizer.github.io/slothy'
keywords:
- superoptimization
- constraint solving
- assembly
- cryptography
license: MIT
preferred-citation:
title: "Fast and Clean: Auditable high-performance assembly via constraint solving"
type: conference-paper
authors:
- given-names: Amin
family-names: Abdulrahman
- given-names: Hanno
family-names: Becker
- given-names: Matthias
family-names: Kannwischer
- given-names: Fabien
family-names: Klein
journal: "IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES)"
year: 2024
conference: CHES
volume: 1
publisher:
name: "International Association for Cryptologic Research (IACR)"
Committers
Last synced: 9 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Hanno Becker | b****n@a****k | 377 |
| Amin Abdulrahman | a****n@a****e | 123 |
| Matthias J. Kannwischer | m****s@k****u | 113 |
| Justus Bergermann | j****n@m****g | 19 |
| Hanno Becker | h****r@a****m | 11 |
| Rod Chapman | r****p@a****m | 5 |
| dependabot[bot] | 4****] | 4 |
| Juneyoung Lee | a****e@g****m | 3 |
| FabKlein | f****n@a****m | 2 |
| ggorlen | g****n@g****m | 1 |
| Juneyoung Lee | l****y@a****m | 1 |
| Brendan Moran | b****n@a****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 71
- Total pull requests: 390
- Average time to close issues: 16 days
- Average time to close pull requests: 8 days
- Total issue authors: 13
- Total pull request authors: 15
- Average comments per issue: 0.87
- Average comments per pull request: 0.48
- Merged pull requests: 284
- Bot issues: 0
- Bot pull requests: 20
Past Year
- Issues: 57
- Pull requests: 282
- Average time to close issues: 18 days
- Average time to close pull requests: 5 days
- Issue authors: 11
- Pull request authors: 12
- Average comments per issue: 0.81
- Average comments per pull request: 0.42
- Merged pull requests: 202
- Bot issues: 0
- Bot pull requests: 20
Top Authors
Issue Authors
- mkannwischer (23)
- hanno-becker (19)
- dop-amin (9)
- rod-chapman (6)
- bremoran (3)
- pennyannn (3)
- jnk0le (2)
- aqjune-aws (1)
- nebeid (1)
- josephjohnston (1)
- Unlimitosu (1)
- willieyz (1)
- embg (1)
Pull Request Authors
- mkannwischer (126)
- hanno-becker (103)
- dop-amin (86)
- dependabot[bot] (20)
- willieyz (14)
- rod-chapman (10)
- aqjune (6)
- cheng-wei-huang0612 (5)
- SH1E0r1r2y (5)
- thisisjube (4)
- aqjune-aws (4)
- bremoran (2)
- gtsoul-tech (2)
- FabKlein (1)
- hugovincent (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ortools ==9.7.2996
- pandas ==2.1.1
- sympy ==1.12