hyperast - HyperAST: Incrementally Mining Large Source Code Repositories

HyperAST: Incrementally Mining Large Source Code Repositories

Abstract

Modern software systems are large, with a project like Chromium reaching more than 30 million lines of code. Analyzing these large-scale projects over multiple versions rapidly becomes very expensive, and creating tools that can work at this scale is a challenge. This paper presents the HyperAST approach, which exploits the locality and redundancy of source code to maintain thousands of Abstract Syntax Tree (AST) versions in memory. In particular, we contribute a programmatic interface to HyperAST that helps define the incremental computation of code metrics and efficient explorations of the fine-grained abstract syntax representation of source code.

Setup

Using the Docker Image (made with nix)

download the image attached to this release: HyperAST-.v0.2.0.tar.gz
load it with docker: docker load < HyperAST-.v0.2.0.tar.gz
run the image: docker run hyperAST:0.2.0 /scripting (it should print instructions about the executable)
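
The three steps above can be chained in a small script. This is only a sketch: the archive and image names are copied from the steps above, while the `run` helper and the `DRY_RUN` variable are conventions of this example, not part of HyperAST. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the Docker workflow described above.
# Assumption: the release archive was already downloaded to the current directory.
set -eu

ARCHIVE="${ARCHIVE:-HyperAST-.v0.2.0.tar.gz}"  # release asset named above
IMAGE="${IMAGE:-hyperAST:0.2.0}"               # image tag named above

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

hyperast_docker() {
  # `docker load -i FILE` reads the same archive as `docker load < FILE`.
  run docker load -i "$ARCHIVE"
  # Extra arguments (for example -h) are forwarded to /scripting.
  run docker run "$IMAGE" /scripting "$@"
}

hyperast_docker "$@"
```

Saved under a name of your choice, the script prints the commands when run as is; setting DRY_RUN=0 in the environment executes them.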

If you want to build the sources yourself

HyperAST is a standard Rust project in its structure. It uses a Cargo workspace to organize its components into separate crates (see https://doc.rust-lang.org/cargo/reference/workspaces.html), similar to Maven modules.

1) install https://rustup.rs/ (and see https://rust-lang.github.io/rustup/concepts/channels.html for documentation, as HyperAST requires a nightly Rust version)

2) clone or download the https://github.com/HyperAST/HyperAST repository, then go to its root directory

3) compile and run the executable with: cargo run --release -p backend --bin scripting

- warning: depending on your exact configuration and environment, you might be missing system dependencies or the right Rust version. Most of the time, rustc and rustup tell you what to do to install the requirements.
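
Steps 1 to 3 can be sketched as a script. It assumes rustup is already installed and that a generic nightly toolchain is sufficient; the repository may pin a specific toolchain itself, in which case rustup will report what is needed. The `run` helper and `DRY_RUN` variable are hypothetical conveniences of this sketch, which only prints the commands by default.

```shell
#!/bin/sh
# Sketch of the build-from-source steps described above.
# Assumption: rustup is installed (see https://rustup.rs/).
set -eu

REPO_URL="https://github.com/HyperAST/HyperAST"

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_hyperast() {
  # Step 1: make a nightly toolchain available.
  run rustup toolchain install nightly
  # Step 2: fetch the sources and enter the root directory.
  run git clone "$REPO_URL"
  run cd HyperAST
  # Step 3: build and run the `scripting` binary of the `backend` package.
  # If the repository does not select nightly by itself, `cargo +nightly ...`
  # is one way to force it (an assumption, not stated in the instructions).
  run cargo run --release -p backend --bin scripting
}

build_hyperast
```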

Instructions to reproduce results

Run bash bench_scripting/run_scriptings.sh <depth> <metric> <out>; the script itself describes the parameters used for each repository. The binary built in the previous section also documents the following parameters.

- run_scriptings.sh is very simple in itself; it mainly helps with reproducibility, for example by storing the commit ids and collecting the results for all the projects.

- <depth> sets the number of commits to process, e.g. 100.

- <metric> sets the script that will be run and its corresponding metric, such as size, mcc, or LoC.

- <out> sets the directory where benchmark results will be written.

Reproducing the results that were used for the plots in the paper:

- for Mcc: bash bench_scripting/run_scriptings.sh 100 mcc bench_scripting/

- for LoC: bash bench_scripting/run_scriptings.sh 100 LoC bench_scripting/
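
Both runs above can be launched from one loop. The sketch below assumes it is started from the repository root, where bench_scripting/run_scriptings.sh lives; the `DEPTH` and `OUT` variables and the `run` helper are names introduced for this example only. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: reproduce the Mcc and LoC results with the parameters listed above.
set -eu

DEPTH="${DEPTH:-100}"            # number of commits to process
OUT="${OUT:-bench_scripting/}"   # directory receiving the benchmark results

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reproduce_plots() {
  # One run per metric used for the plots.
  for metric in mcc LoC; do
    run bash bench_scripting/run_scriptings.sh "$DEPTH" "$metric" "$OUT"
  done
}

reproduce_plots
```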

Reminder: you can also run the executable more directly: `./target/release/scripting` or `docker run hyperAST:0.2.0 /scripting`

- use the -h option to get the description of the command

- Rust
Published by quentinLeDilavrec over 1 year ago

hyperast - v0.1.2

- Rust
Published by quentinLeDilavrec over 1 year ago

hyperast - v0.1.1

- Rust
Published by quentinLeDilavrec over 1 year ago

hyperast - HyperAST: Incrementally Mining Large Source Code Repositories

HyperAST: Incrementally Mining Large Source Code Repositories

Abstract

Modern software systems are large, with a project like Chromium reaching more than 30 million lines of code. Analyzing these large-scale projects over multiple versions rapidly becomes very expensive, and creating tools that can work at this scale is a challenge. This paper presents the HyperAST approach, which exploits the locality and redundancy of source code to maintain thousands of Abstract Syntax Tree (AST) versions in memory. In particular, we contribute a programmatic interface to HyperAST that helps define the incremental computation of code metrics and efficient explorations of the fine-grained abstract syntax representation of source code.

Installation instructions for reproducing results

HyperAST is a standard Rust project in its structure. It uses a Cargo workspace to organise its components (see https://doc.rust-lang.org/cargo/reference/workspaces.html), similar to Maven modules.

1) install https://rustup.rs/ (and see https://rust-lang.github.io/rustup/concepts/channels.html for documentation, as HyperAST requires a nightly Rust version)

2) clone or download the https://github.com/HyperAST/HyperAST repository, then go to its root directory

3) compile the executable with: cargo run --release -p client --bin scripting

- warning: depending on your exact configuration and environment, you might be missing system dependencies or the right Rust version. Most of the time, rustc and rustup tell you what to do to install the requirements.

4) then, to reproduce the results from the paper, you can run: bash bench_scripting/run_scriptings.sh <depth> <metric> <out>

- for Mcc: bash bench_scripting/run_scriptings.sh 100 mcc bench_scripting/

- for LoC: bash bench_scripting/run_scriptings.sh 100 LoC bench_scripting/

- the script is very simple in itself; it mainly helps with reproducibility, for example by storing the commit ids and collecting the results for all the projects.

- <depth> (e.g. 100) sets the number of commits to process

- <metric> (e.g. mcc) selects the preconfigured metric to compute

- <out> sets the directory where the benchmark results are written

5) You can also run the executable more directly: ./target/release/scripting