rattle

Forward build system with speculation and caching

https://github.com/ndmitchell/rattle

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Forward build system with speculation and caching

Basic Info
  • Host: GitHub
  • Owner: ndmitchell
  • License: other
  • Language: Haskell
  • Default Branch: master
  • Size: 465 KB
Statistics
  • Stars: 104
  • Watchers: 13
  • Forks: 5
  • Open Issues: 11
  • Releases: 0
Created almost 7 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

Rattle Hackage version Stackage version Build status

{Fast, Correct, Simple, Unfinished} - Choose four.

To write a Rattle build system you need to write a script that builds your code. No dependencies, no targets, just a straight-line script that builds your code. Rattle takes that script, and turns it into a build system:

  1. Rattle skips commands if their inputs/outputs haven't changed.
  2. Rattle copies outputs over if a command has been run before on the same inputs (potentially for a different user).
  3. Rattle guesses what commands might be coming next, and speculates of them, to increase parallelism.

The closest build system to Rattle is Fabricate, which follows a similar approach, but only provides step 1. Rattle extends that to steps 2 and 3.

Advantages

Rattle build systems are simple to write - just write commands to build your project. There is only one "Rattle" specific thing, and that's running a command, giving it a tiny API. Rattle build systems can delegate to other build systems quite happily -- if a dependency is built with make, just shell out to make -- no need to recreate the dependencies with Rattle. You don't need to think about dependencies, and because you aren't writing dependencies, there's no way to get them wrong.

Disadvantages

The disadvantages of Rattle are that only system commands are skipped, so the zero build time might not be as quick as it could. Rattle also has to guess at useful commands to get parallelism, and that might go wrong.

Rattle fundamentally relies on tracing the files used by system commands, and currently uses the fsatrace program to do so, which must be installed. As a consequence of the design, if fsatrace fails, then rattle won't work properly. Reasons that fsatrace might fail include:

  • If you use Go compiled binaries on Linux or Mac then it's unlikely to be traced, see Issue 24.
  • If you a Mac OS X 1.10 or higher then system binaries that themselves invoke system binaries they won't be traced, unless you disable System Integrity Protection (which you do at your own risk).
  • If you use statically linked binaries on Linux they probably won't be traced.

It is possible to write alternative tracing mechanisms that wouldn't have these limitations, for example:

  • Bigbro uses ptrace so works for Go binaries and statically linked binaries.
  • traced-fs is an unfinished file system tracer that implements a user file system that would also work for Go binaries and static linking.
  • BuildXL uses a Mac sandbox based on KAuth + TrustedBSD Mandatory Access Control (MAC).

Relationship to Shake

Rattle was designed by the same author as the Shake build system. Rattle is named as a follow-on from Shake, but not a replacement -- Shake continues to be actively developed and is a much more mature system.

Rattle User Manual / Paper

The rest of this README serves as a dumping ground for things that will either go into a user manual or an academic paper (if one ever gets written). As a result, it is a bit of a mismatch, and on top of that it's very definitely not complete.

Abstract

Most build systems are backwards build systems, users define rules to produce targets. These systems have a lot of advantages -- they encode parallelism, they are compositional and the user can easily build any subtree of dependencies. However, they require manually specifying dependencies and all the state of an object must be encoded in its filename. The alternative, and poorly studied, forwards build systems are structured very differently, which we argue produces simpler systems for the end user, at least at small to medium scale.

Introduction

In a backwards build system (e.g. Shake, Bazel, Make) a rule to build a C file will probably look something like:

"*.o" %> \out -> do let src = out -<.> "c" need [src] cmd "gcc -c" src "-o" out

Taking a look at this rule, it has 4 lines:

  1. First line, say how to build .o files.
  2. Compute, from the .o filename alone, the .c source it corresponds to.
  3. Add an explicit dependency on the .c file.
  4. Specify the command and run it.

Of these steps, only step 4 is truely building things, the other steps are about providing the metadata for a backwards build system to operate. And furthermore, in a practical build system, we're likely to have to go further -- encoding optimisation level in the .o file, and depending on any headers that were used.

To describe it as a backwards build systems is apt. In all likelihood we will have had a .c file in mind, then we generate the .o file, then we go back to the .c file -- all a little odd. In contrast, a forward build system can be expressed more directly:

buildCSrc src = cmd "gcc -c" src "-o" (src -<.> "o")

We can talk about the .c file directly, so don't need to extra information back from the output. We don't need to add dependencies. We just specify the command line. Simple.

In the rest of this paper we describe how to construct a forward build system which is as powerful as a modern fully featured backward build system. There are trade offs, particularly at scale, which we describe and in some cases mitigate.

The no-op forward build system

We believe forward build systems should be simple, so let's take the other extreme, and think about what a build system is in it's minimality. At an absolute minimum, it must list the commands to be run. For example, to build a C executable we might say:

gcc -c hello.c -o hello.o gcc hello.o -o hello.exe

However, we've actually gone slightly further than just listing the commands, but supplied them in an order which satisfies the dependencies. We haven't mentioned the dependencies, but we have written the link command after the compile, which is an order that does satisfy them.

Some build systems are able to build without any ordering, by restarting and retrying commands. While an interesting area of research, we don't persue that path, but see David Roundry's system.

So, given this straight-line code that builds the program what are we missing relative to a build system?

  1. Build systems don't rebuild commands whose inputs haven't changed.
  2. Some build systems are able to download results from a shared store (e.g. cloud build systems).
  3. Backwards build systems can extract parallelism.

In the next three sections we add back each of these aspects.

Avoid rebuilds of things that haven't changed

Given a command, if you know what files it reads and writes, and none of those files have changed, you can skip running it providing that if it were deterministic it wouldn't be harmful. As an example, if the compiler produces a different object file each time, but it would be OK with not doing that, then you're fine. In practice this means that any command that isn't relied on to produce fresh entropy (the time, a GUID, a random number) is fine to skip subsequent times. Those commands that do produce fresh entropy are not well suited to a build system anyway, so aren't typically used in build systems.

This got solved by fabricate. You can use a system access tracer to watch what a process reads/writes, and have a database storing the command, previous reads/writes, and then skip it if the values haven't changed. This step is not complex.

However, it a build writes a file twice in a run, or reads it then writes to it, the result will be that a subsequent rebuild will have stuff to do, as it won't end at a quiescent state. To detect that, we introduce the concept of hazards.

Shared build cache

Given knowledge of the reads/writes of a command, it's not too hard to store them in a cloud, and satisfy commands not by rerunning them, but by copying their result from a shared cache. While this works beautifully in theory, in practice it leads to at least three separate problems which we solve.

Relative build dirs

Often the current directory, or users profile directory, will be accessed by commands. These change either if the user has two working directories, or if they use different machines. We solve this by having a substitution table.

Compound commands

Sometimes a command will produce something that is user specific, but the next step will make it not user specific. To fix that we add compound commands. You can do it with a shell script, using pipeline, or with TemplateHaskell blocks.

Non-deterministic builds

We solve this by having .rattle.hash files which we substitute for reading.

Generating parallelism

We use speculation and hazards to figure out if speculation went wrong.

Relative simplicity

Compare Rattle to Shake. The interface to Rattle is basically "run this command", providing a vastly simpler API. In contast, Shake has a lot of API to learn and use.

Owner

  • Name: Neil Mitchell
  • Login: ndmitchell
  • Kind: user
  • Location: Cambridge, UK
  • Company: Meta

Haskell/Rust programmer. All code is open source and licensed by me, not my employer. All views are my own.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you need to cite Rattle, this metadata is the paper that introduces Rattle."
repository-code: "https://github.com/ndmitchell/rattle"
title: Rattle
preferred-citation:
  type: article
  authors: 
  - family-names: Spall
    given-names: Sarah
  - family-names: Mitchell
    given-names: Neil
    orcid: "https://orcid.org/0000-0001-5171-9726"
  - family-names: Tobin-Hochstadt
    given-names: Sam
    orcid: "https://orcid.org/0000-0003-1302-6499"
  doi: "10.1145/3428237"
  month: 11
  day: 18
  journal: Proceedings of the ACM Programming Languages 4, OOPSLA
  number: 169
  title: "Build Systems with Perfect Dependencies"
  year: 2020
  url: https://ndmitchell.com/downloads/paper-build_scripts_with_perfect_dependencies-18_nov_2020.pdf
  abstract: "Build scripts for most build systems describe the actions to run, and the dependencies between those actions - but often build scripts get those dependencies wrong. Most build scripts have both too few dependencies (leading to incorrect build outputs) and too many dependencies (leading to excessive rebuilds and reduced parallelism). Any programmer who has wondered why a small change led to excess compilation, or who resorted to a clean step, has suffered the ill effects of incorrect dependency specification. We outline a build system where dependencies are not specified, but instead captured by tracing execution. The consequence is that dependencies are always correct by construction and build scripts are easier to write. The simplest implementation of our approach would lose parallelism, but we are able to recover parallelism using speculation."

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 470
  • Total Committers: 4
  • Avg Commits per committer: 117.5
  • Development Distribution Score (DDS): 0.13
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Neil Mitchell n****l@g****m 409
Sarah s****l@i****u 59
Vaibhav Sagar v****r@g****m 1
Sam Tobin-Hochstadt s****0@g****m 1
Committer Domains (Top 20 + Academic)
iu.edu: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 14
  • Total pull requests: 20
  • Average time to close issues: 25 days
  • Average time to close pull requests: 11 days
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 2.86
  • Average comments per pull request: 1.5
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • spall (8)
  • ndmitchell (3)
  • dbarowy (1)
  • dhess (1)
  • chadbrewbaker (1)
Pull Request Authors
  • spall (18)
  • samth (1)
  • vaibhavsagar (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • hackage 945 total
  • Total dependent packages: 0
  • Total dependent repositories: 8
  • Total versions: 2
  • Total maintainers: 1
hackage.haskell.org: rattle

A forward build system like Fabrciate but with speculation and remote caching.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 8
  • Downloads: 945 Total
Rankings
Stargazers count: 8.5%
Dependent packages count: 19.1%
Forks count: 19.9%
Dependent repos count: 20.3%
Average: 30.1%
Downloads: 82.6%
Maintainers (1)
Last synced: 7 months ago

Dependencies

rattle.cabal hackage
  • async *
  • base >=4.10 && <5
  • bytestring *
  • cmdargs *
  • cryptohash-sha256 *
  • deepseq >=1.1
  • directory *
  • extra >=1.6.14
  • filepath *
  • filepattern *
  • hashable >=1.1.2.3
  • heaps *
  • js-dgtable *
  • js-flot *
  • js-jquery *
  • process *
  • rattle *
  • shake >=0.19
  • template-haskell *
  • terminal-size *
  • time *
  • time >=1.9.1
  • transformers >=0.2
  • unix >=2.7.2.2
  • unliftio-core >=0.1.2
  • unordered-containers >=0.2.7
  • utf8-string >=1.0.1.1
  • Cabal >=2.2 test
  • async * test
  • base >=4.10 && <5 test
  • bytestring * test
  • cryptohash-sha256 * test
  • deepseq >=1.1 test
  • directory * test
  • extra >=1.6.14 test
  • filepath * test
  • filepattern * test
  • hashable >=1.1.2.3 test
  • heaps * test
  • js-dgtable * test
  • js-flot * test
  • js-jquery * test
  • shake >=0.19 test
  • template-haskell * test
  • terminal-size * test
  • time * test
  • transformers >=0.2 test
  • unix >=2.7.2.2 test
  • unliftio-core >=0.1.2 test
  • unordered-containers >=0.2.7 test
  • utf8-string >=1.0.1.1 test
.github/workflows/ci.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • haskell/actions/setup v2 composite
  • ndmitchell/neil master composite