prospector
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: iskindar
- License: apache-2.0
- Language: C
- Default Branch: master
- Size: 2.83 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
README
Prospector is a directed greybox fuzzer for large-scale target sets; it currently supports targets instrumented with AddressSanitizer (ASan).
For more details, check out our paper. To cite our work, you can use the following BibTeX entry:
```bibtex
@inproceedings{zhang2024prospector,
  title={Prospector: Boosting Directed Greybox Fuzzing for Large-Scale Target Sets with Iterative Prioritization},
  author={Zhang, Zhijie and Chen, Liwei and Wei, Haolai and Shi, Gang and Meng, Dan},
  booktitle={Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis},
  pages={1351--1363},
  year={2024}
}
```
1. Directory Structure
Our local framework mainly contains two directories: "Prospector" contains the source code, and "artifact" contains all the data and scripts needed to reproduce our experiments.
1.1 Prospector Structure
```
├── src                          # Source code of the fuzzing part of Prospector
│   ├── afl-fuzz-prospector.c    # 3.1 Target Priority Analysis (dynamic), 3.2 focused targets maintenance, 3.5 byte scheduling
│   ├── afl-fuzz-queue.c         # 3.3 Explore-Exploit Transitions, 3.4 Seed Selection
│   ├── afl-fuzz-one.c           # 3.5 byte scheduling
│   ├── afl-fishfuzz.cc          # 3.2 focused targets maintenance (modified from the target ranking of FishFuzz)
│   └── ...
├── scripts                      # Scripts for static analysis
│   ├── gen_initial_priority.py  # 3.1 Target Priority Analysis (static)
│   └── ...
├── instrumentation              # Instrumentation code of Prospector
│   ├── afl-fish-pass.so.cc      # modified to support 3.1 Target Priority Analysis
│   └── ...
```
All changes we made can be found by searching for the patterns:

```
/* prospector begin */
/* prospector end */
```

or

```
/* prospector modification */
```
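As a quick way to inventory these changes, a recursive grep lists every marker with its file and line number. The sketch below creates a tiny sample file of our own (not part of the repository) just to demonstrate the search:

```shell
# Illustrative only: a small sample file carrying the markers.
mkdir -p /tmp/prospector_demo
cat > /tmp/prospector_demo/afl-fuzz-queue.c <<'EOF'
/* prospector begin */
int prospector_extra = 1;
/* prospector end */
EOF
# Print file:line for every Prospector marker found in the tree.
grep -rn "prospector begin\|prospector end\|prospector modification" /tmp/prospector_demo
```

Running the same grep over the real `src/` directory gives a complete inventory of the modifications.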
1.2 Dockerfile Structure
```
├── Docker-base         # All scripts and data required to run an example
│   ├── example         # example usage of Prospector
│   ├── Dockerfile
│   └── Prospector      # source code of Prospector
├── Docker-main         # All scripts and data required to build the Docker image of FishFuzz, AFL++ and Prospector in experiments RQ1 and RQ2
│   ├── benchmark       # build scripts of benchmark programs
│   ├── Dockerfile
│   └── source
│       └── Prospector  # source code of Prospector
├── Docker-parmesan     # All scripts and data required to build the Docker image of ParmeSan in experiment RQ1
│   ├── benchmark
│   └── Dockerfile
├── Docker-windranger   # All scripts and data required to build the Docker image of WindRanger in experiment RQ1
│   ├── benchmark
│   ├── Dockerfile
│   ├── setup_windranger.sh
│   └── windranger.tar.gz  # downloaded from https://github.com/prosyslab/DAFL-artifact/blob/main/docker-setup/windranger.tar.gz
├── Docker-vuln         # All scripts and data required to build the Docker image of experiment RQ3
│   ├── benchmark_vuln
│   ├── Dockerfile
│   └── source
```
1.3 Experimental Result Structure
```
├── Hyper                     # Intermediate results of the Hyperparameter Setup experiment
│   ├── Hyper_crashes.tar.gz  # crashes generated in the Hyperparameter Setup experiment
│   ├── Hyper_log             # raw data of time to exposing bugs in Hyperparameter Setup
│   └── scripts               # scripts to reproduce Hyperparameter Setup
├── RQ1                       # Intermediate results of experiment RQ1
│   ├── runtime
│   │   └── corpus            # initial seed corpus
│   ├── RQ1_crashes.tar.gz    # crashes generated in the RQ1 experiment
│   ├── RQ1_log               # raw data of time to exposing bugs in RQ1
│   └── scripts               # scripts to reproduce RQ1
├── RQ2                       # Intermediate results of experiment RQ2
│   ├── RQ2_crashes.tar.gz    # crashes generated in the RQ2 experiment
│   ├── runtime
│   │   └── corpus            # initial seed corpus
│   ├── RQ2_log               # raw data of time to exposing bugs in RQ2
│   └── scripts               # scripts to reproduce RQ2
├── RQ3                       # Intermediate results of experiment RQ3
│   ├── RQ3_crashes.tar.gz    # crashes generated in the RQ3 experiment
│   ├── runtime
│   │   └── corpus            # initial seed corpus
│   ├── RQ3_log               # raw data of time to exposing bugs in RQ3
│   └── scripts               # scripts to reproduce RQ3
├── Docker-fuzzing            # Builds Docker images for fuzzing
├── binary.tar.gz             # All binaries used in RQ1
├── benchmark_vuln.tar.gz     # All binaries used in RQ3
└── Dockerfile                # Dockerfile to build prospector-artifact:issta24
```
2. Getting Started
2.1 System Requirements
To run the experiments in the paper, we used a machine with a 64-core Intel Xeon Silver 4216 CPU (2.10 GHz), 256 GB of RAM, and Ubuntu 20.04. Of the 64 cores, we used no more than 48, assigning one core to each fuzzing campaign.
Additionally, we assume that the following environment settings are met.
- Ubuntu 20.04
- Docker
- Python 3.8+ (Other versions of Python may work, but not tested.)
If you want to run the fuzzing experiments, please apply the following system configuration.
To run AFL-based fuzzers, you should first fix the core dump name pattern.

```
$ echo core | sudo tee /proc/sys/kernel/core_pattern
```
If your system has a /sys/devices/system/cpu/cpu*/cpufreq directory, the fuzzer may also complain about the CPU frequency scaling configuration; otherwise, you can ignore this. Check the current configuration and note it if you want to restore it later, then set it to performance.

```
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
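The core_pattern check above can be wrapped in a small read-only helper. The sketch below is our own illustration (the function name and messages are made up, not part of the artifact); AFL-based fuzzers refuse to run when core dumps are piped to an external handler, i.e. when the pattern starts with `|`:

```shell
# Sketch: decide whether a core_pattern value is fuzzing-friendly.
check_core_pattern() {
  local pattern="$1"   # e.g. the contents of /proc/sys/kernel/core_pattern
  if [ "${pattern#|}" != "$pattern" ]; then
    # A leading '|' means core dumps are piped to a handler such as apport.
    echo "BAD: core dumps are piped; run: echo core | sudo tee /proc/sys/kernel/core_pattern"
  else
    echo "OK: core_pattern is '$pattern'"
  fi
}
check_core_pattern "|/usr/share/apport/apport %p %s %c"   # typical Ubuntu default
check_core_pattern "core"                                  # fuzzing-friendly value
```

On a live system you would pass `"$(cat /proc/sys/kernel/core_pattern)"` as the argument.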
2.2 Installation
We provide two types of images:
- Base: includes the Prospector tool, an example usage, and validation of the intermediate results of our experiments.
- ISSTA24: contains the experimental data used in the paper.
Recommended: install them by pulling the prebuilt Docker images.
If you just want to run the example usage and validate the intermediate results, pull the base image.

```
docker pull iskindar/prospector-artifact:base
```

If you want to run the fuzzing experiments with the fuzzers (AFL++, Fish++, Prospector), pull this image.

```
docker pull iskindar/prospector-artifact:issta24
```
Alternatively, you can build the Docker images manually, but this may take longer, usually several hours.
Build the experiment environment for the example usage of Prospector and validation of the intermediate results of our experiments.

```
cd Docker/Docker-base
docker build -t prospector-artifact:base .
```

Build the experiment environment for AFL++, Fish++, and Prospector.

```
cd Docker/Docker-main
docker build -t prospector-artifact:main .
```

Build the experiment environment for ParmeSan.

```
cd Docker/Docker-parmesan
docker build -t prospector-artifact:parmesan .
```

Build the experiment environment for WindRanger.

```
cd Docker/Docker-windranger
docker build -t prospector-artifact:windranger .
```
3. Usage (Minimal Working Example)
We take the project mp3gain-1.5.2 used in the paper as an example; all commands are integrated into one script, "example.sh". You can simply run the following commands to use Prospector.
```
# step 1: create the container
docker run -ti iskindar/prospector-artifact:base bash
# step 2: run example.sh
cd example && ./example.sh
```
If everything goes smoothly, Prospector will find a crash within a short period of time (~10 min). Note that due to variations in machine performance, randomness, and other factors, the time to find a crash may vary, but it should still be detected within a short period.
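To confirm that the run actually produced a crash, you can count the crash inputs in the AFL-style output directory. The sketch below fakes the directory layout for illustration (paths and file names are assumptions; a real run writes under /example/output):

```shell
# Simulate an AFL-style output directory (illustrative paths only).
OUT=/tmp/demo_output/default/crashes
mkdir -p "$OUT"
touch "$OUT/id:000000,sig:11,src:000002,op:havoc" "$OUT/README.txt"
# AFL stores one file per unique crash, named id:*; README.txt is metadata
# and must be excluded from the count.
find "$OUT" -name 'id:*' | wc -l
```

A non-zero count means at least one crashing input was saved.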
In the example.sh, there are four main steps:
- Generation of intermediate representations for static analysis.
- Static analysis: This includes distance calculation (origin from FishFuzz) and priority analysis.
- Generation of the final binary for fuzzing.
- Fuzzing.

```
#!/bin/bash
set -x

# step 1: unzip the project and generate a bc file
BIN_NAME="mp3gain"
PREFUZZ=/Prospector/
TMP_DIR=$PWD/TEMP_$BIN_NAME
tar zxvf mp3gain-1.5.2.tar.gz
cd mp3gain-1.5.2
export CC="clang -fsanitize=address -flto -fuse-ld=gold -Wl,-plugin-opt=save-temps -Wno-unused-command-line-argument -g"
export CXX="clang++ -fsanitize=address -flto -fuse-ld=gold -Wl,-plugin-opt=save-temps -Wno-unused-command-line-argument -g"
# The following two sed commands are only needed for mp3gain; other programs do not require them.
sed -i 's/CC=/CC?=/' Makefile
sed -i 's/CFLAGS=/CFLAGS?=/' Makefile
make -j$(nproc)

# step 2: static analysis (static distance map calculation, priority analysis)
ADDITIONAL_RENAME="-load $PREFUZZ/afl-fish-pass.so -test -outdir=$TMP_DIR -pmode=rename"
ADDITIONAL_COV="-load $PREFUZZ/SanitizerCoveragePCGUARD.so -cov"
ADDITIONAL_ANALYSIS="-load $PREFUZZ/afl-fish-pass.so -test -outdir=$TMP_DIR -pmode=aonly"
BC_PATH=$(find . -name "$BIN_NAME.0.5.precodegen.bc" -printf "%h\n")/
mkdir -p $TMP_DIR
opt $ADDITIONAL_RENAME $BC_PATH$BIN_NAME.0.5.precodegen.bc -o $BC_PATH$BIN_NAME.rename.bc
opt $ADDITIONAL_COV $BC_PATH$BIN_NAME.rename.bc -o $BC_PATH$BIN_NAME.cov.bc
opt $ADDITIONAL_ANALYSIS $BC_PATH$BIN_NAME.rename.bc -o $BC_PATH$BIN_NAME.temp.bc

# static distance map calculation
opt -dot-callgraph $BC_PATH$BIN_NAME.0.5.precodegen.bc && mv $BC_PATH$BIN_NAME.0.5.precodegen.bc.callgraph.dot $TMP_DIR/dot-files/callgraph.dot
$PREFUZZ/scripts/gen_initial_distance.py $TMP_DIR

# priority analysis
$PREFUZZ/scripts/gen_initial_priority.py --work_dir $TMP_DIR --disable none

# step 3: generate the final target
ADDITIONAL_FUNC="-pmode=fonly -funcid=$TMP_DIR/funcid.csv -outdir=$TMP_DIR"
CC=$PREFUZZ/afl-fish-fast
CXX=$PREFUZZ/afl-fish-fast++
ASAN_LIBS="/llvm/build/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.a"
EXTRA_LDFLAGS="-ldl -lpthread -lm -lstdc++ -lrt"
$CC $ADDITIONAL_FUNC $BC_PATH$BIN_NAME.cov.bc -o $BIN_NAME.fuzz $EXTRA_LDFLAGS $ASAN_LIBS

# step 4: fuzzing
TMP_DIR=/example/TEMP_mp3gain AFL_NO_AFFINITY=1 AFL_SKIP_CRASHES=1 /Prospector/afl-fuzz -i /example/mp3 -o /example/output -m none -t 1000+ -D -- /example/mp3gain-1.5.2/mp3gain.fuzz @@
```
4. Reproducing the Results in the Paper
4.1 Overview
The overall replication process will be introduced below.
Step 1: pull the Docker image.

In this step, we pull the Docker image from Docker Hub.

```shell
docker pull iskindar/prospector-artifact:issta24
```
Step 2: generate the fuzzing scripts.

```shell
# Assume you are in the RQ1, RQ2, RQ3 or Hyper directory.
python3 scripts/generate_script.py -b "$PWD/runtime/fuzz_script"
```
Step 3: generate the commands to run the evaluation.

This script automatically generates the commands you need to execute to start the fuzzing campaign; copy-paste them into the shell to start the campaign.

```shell
# Assume you are in the RQ1, RQ2, RQ3 or Hyper directory.
python3 scripts/generate_runtime.py -b "$PWD/runtime" > run_exp.sh
chmod +x ./run_exp.sh
# Please check the content of run_exp.sh before you run it,
# as it will use 24 (programs) x 5 (fuzzers) cores.
./run_exp.sh
# the commands in run_exp.sh will look like this:
# docker run -dt -v current_dir/runtime:/work --name prospector_exiv2 --cpuset-cpus 0 iskindar/prospector-artifact:issta24 "/work/fuzz_script/prospector/exiv2.sh"
# docker run -dt -v current_dir/runtime:/work --name prospector_cflow --cpuset-cpus 1 iskindar/prospector-artifact:issta24 "/work/fuzz_script/prospector/cflow.sh"
# ....
```
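The per-core pinning that generate_runtime.py performs can be sketched as a pair of loops: each (fuzzer, program) pair gets its own container, pinned to a dedicated core via `--cpuset-cpus`. Fuzzer and program names below are illustrative, and the docker commands are only echoed, not executed:

```shell
# Sketch of the core-assignment scheme used by the runtime generator.
fuzzers="prospector aflpp"
programs="exiv2 cflow"
cpu=0
for f in $fuzzers; do
  for p in $programs; do
    # One container per (fuzzer, program) pair, pinned to core $cpu.
    echo "docker run -dt -v \$PWD/runtime:/work --name ${f}_${p} --cpuset-cpus $cpu iskindar/prospector-artifact:issta24 \"/work/fuzz_script/$f/$p.sh\""
    cpu=$((cpu + 1))
  done
done
```

With the paper's 24 programs and 5 fuzzers, the same scheme yields the 120 commands mentioned above, one core each.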
Step 4: manually stop the containers after 24 hours and generate the crash report.

```shell
# stop and clean the fuzzing processes
docker rm -f $(docker ps -a -q -f "ancestor=$IMAGE_NAME")
sudo chown -R $(id -u):$(id -g) runtime/out
# copy evaluation results to the results folder;
# -r 0 means it's the first round of results; change the round number
# accordingly if there are multiple rounds' results
mkdir results/
python3 scripts/copy_results.py -s "$PWD/runtime" -d "$PWD/results/" -r 0
# create a container for analysis
docker run -ti --name validate $IMAGE_NAME bash
docker cp scripts validate:/
docker cp results validate:/
# step into the container to validate crashes
cd results
find . -name README.txt -exec rm {} \;
find . -name .state -exec rm -r {} \;
find . -name others -exec rm -r {} \;
# run crash triage analysis
python3 scripts/analysis.py -b /results -c scripts/asan.crash.json -r 0 -o /results/log
# plot the results of one round
python3 scripts/print_result.py -b /results/log/0/
```
Since reproducing all experiments requires 39984 CPU hours, we provide the intermediate results for each RQ experiment stored in the corresponding RQ folder.
4.2 RQ1
4.2.1 Build Docker Images
RQ1 experiments require building three different Docker images because the dependency environments for different fuzzers may conflict:
- Docker-main contains the environments for AFL++, FishFuzz, and Prospector.
- Docker-parmesan contains the environment for ParmeSan.
- Docker-windranger contains the environment for WindRanger.
To build these images, simply navigate to the corresponding Docker folder and enter the following commands:

```
# build the Docker image for AFL++, FishFuzz, and Prospector
cd Docker-main
docker build -t prospector-artifact:issta24 .
# build the Docker image for ParmeSan
cd Docker-parmesan
docker build -t parmesan:issta .
# build the Docker image for WindRanger (please build Docker-main first)
cd Docker-windranger
docker build -t windranger:issta .
```
4.2.2 Fuzzing Preparation
First, extract the compiled binaries from the pulled images using the following command:

```
cd Docker-fuzzing
docker build -t prospector-artifact:fuzz .
```
Then, generate a fuzz script for each program for every fuzzer by running the following command:
```
cd RQ1
python3 scripts/generate_script.py -b "$PWD/runtime/fuzz_script"
```
Now you can see that the fuzzing scripts have been generated in the "runtime/fuzz_script" directory.
Subsequently, use the following commands to create Docker containers and initiate fuzzing:

```
python3 scripts/generate_runtime.py -b "$PWD/runtime" > run_exp.sh
chmod +x ./run_exp.sh
./run_exp.sh
# the commands in run_exp.sh will look like this:
# docker run -dt -v .../RQ1/runtime:/work --name prospector_flvmeta --cpuset-cpus 27 iskindar/prospector-artifact:issta24 "/work/fuzz_script/prospector/flvmeta.sh"
# ....
```

To verify that the fuzzing processes are running normally, you can use the command `docker stats`.
A CPU % of about 100% for a container indicates that its fuzzing process is running smoothly.
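With many containers, filtering on the CPU column makes stalls easy to spot. Since a Docker daemon may not be available here, the sketch below inlines sample data (container names and percentages are made up); on a live system you would pipe `docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'` into the same awk filter:

```shell
# Sample of what `docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'`
# might print; a healthy fuzzer sits near 100%.
stats="prospector_exiv2 99.87%
prospector_cflow 0.12%"
# Flag containers whose CPU usage is far below 100%.
echo "$stats" | awk '{ gsub(/%/, "", $2); if ($2 + 0 < 50) print $1 " looks stalled" }'
```

The 50% threshold is an arbitrary choice for the sketch; pick whatever margin suits your machine.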
Manually stop the containers after 24 hours and generate the crash report. You can stop and clean the fuzzing processes with the following commands.

```
docker rm -f $(docker ps -a -q -f "ancestor=$IMAGE_NAME")
sudo chown -R $(id -u):$(id -g) runtime/out
```

Copy the evaluation results to the results folder. -r 0 means it's the first round of results; change the round number accordingly if there are multiple rounds' results.

```
mkdir RQ1_crashes/
python3 scripts/copy_results.py -s "$PWD/runtime" -d "$PWD/RQ1_crashes/" -r 0
```
We compressed 10 rounds of crash seeds into "RQ1_crashes.tar.gz".
4.2.3 Crash Triaging
Now, we have collected the crash seeds via fuzzing. The next step is to triage these crashes.
First, create a container for analyzing crashes.
```
docker run -ti --name validate iskindar/prospector-artifact:issta24 bash
```

Then run the following commands to remove unnecessary files from RQ1_crashes. If you are using the crashes we provided, you can skip this step.

```
cd RQ1/
tar zxvf RQ1_crashes.tar.gz
cd RQ1_crashes
find . -name README.txt -exec rm {} \;
find . -name .state -exec rm -r {} \;
find . -name others -exec rm -r {} \;
```
Run the following command to analyze the results of the first round (>3 hours, mostly due to seeds generated by ParmeSan).

```
python3 scripts/analysis.py -b /RQ1/RQ1_crashes -c scripts/asan.crash.json -r 0 -o /RQ1/RQ1_log
```

Or run `cd scripts && ./triage.sh` to analyze the results of all rounds (>15 hours).
Run the following command to print a summary of the results from the first round.

```
python3 scripts/print_result.py -b /RQ1/RQ1_log/0/
```
Alternatively, you can use the pre-analyzed results provided in the "RQ1_log" directory.
Just run `cd scripts/ && chmod +x ./plot.sh && ./plot.sh` to obtain the results in the paper.
The generated plots are then stored at /RQ1/plot/, as shown in the terminal printout.
Below, we explain each file corresponding to the sections in the paper.

```
/RQ1/plot/
|-- rq1_final.csv                 # original data of Table 2 in the paper; the lame and ParmeSan entries need to be set manually
|-- target_number_influence.pdf   # Figure 5 in the paper
|-- vul_overtime_30000-110000.pdf # Figure 6 in the paper
|-- ...                           # other files can be ignored
```
4.3 RQ2
The process is essentially the same as for RQ1, but you need to navigate to the RQ2 folder to perform the corresponding operations.
To obtain the plots for RQ2, run `cd /RQ2/scripts && ./plot.sh` in the container.

```
/RQ2/plot/
|-- rq2_final.csv  # Table 3 in the paper
|-- ...            # other files can be ignored
```
4.4 RQ3
4.4.1 Build Docker Images and Fuzz
Build the Docker image with the following command:

```
cd Docker-vuln
docker build -t prospector:rq3 .
```

The fuzzing process is similar to RQ1, the difference being that a different Docker image is used. If you are using a Docker image other than "prospector:rq3", you will need to modify the "image" variable in generate_runtime.py accordingly.
4.4.2 Crashes Analysis
Assume that we have completed the fuzzing process and created a "validate" container, with the fuzzing results stored in the "/RQ3/RQ3_crashes" directory (tar zxvf RQ3_crashes.tar.gz) and the scripts copied to the root directory of the container.
Next, let's detail the process of reproducing RQ3 based on the provided crashes.
Extract the binaries used in RQ3.

```
cd / && tar zxvf benchmark_vuln.tar.gz
```

Analyze the crashes and dump the time-to-bug info (~3 min).

```
cd /RQ3/scripts && python3 analysis.py -b /RQ3/RQ3_crashes -c asan.crash.json -r 0 -o /RQ3/RQ3_log
```

Print the results from /RQ3/RQ3_log.

```
python3 print_result.py -b /RQ3/RQ3_log/0/
```
This outputs the results as "tte.csv" in /RQ3/RQ3_log/0/.

```
program,vul_type,ffapp,prospector,stack_trace
dwg2dxf,OOB read,,0:26:02.695000,secondheader_private->decode_R13_R2000->dwg_decode
dwg2dxf,requested,0:05:08.767000,,calloc->read_sections_map->read_r2007_meta_data
dwg2dxf,heap-buffer-overflow,"2 days, 22:44:46.357000",,dwg_section_wtype->read_sections_map->read_r2007_meta_data
MP4Box,heap-use-after-free,22:41:44.816000,22:10:58.381000,USE:gf_isom_box_del->gf_isom_box_array_reset
tcprewrite,attempting,22:05:32.803000,6:16:54.749000,free->our_safe_free->tcpedit_dlt_cleanup
w3m,OOB read,23:12:47.184000,8:06:34.190000,checkType->loadBuffer->loadSomething
....
```
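A quick way to read tte.csv is to filter it with awk, for example to list the rows where only Prospector exposed the bug (empty FishFuzz column). The file below is a trimmed copy of the sample output above, avoiding the quoted fields that naive comma splitting cannot handle:

```shell
# Trimmed sample of tte.csv (columns: program,vul_type,ffapp,prospector,stack_trace).
cat > /tmp/tte_sample.csv <<'EOF'
program,vul_type,ffapp,prospector,stack_trace
dwg2dxf,OOB read,,0:26:02.695000,secondheader_private->decode_R13_R2000->dwg_decode
dwg2dxf,requested,0:05:08.767000,,calloc->read_sections_map->read_r2007_meta_data
w3m,OOB read,23:12:47.184000,8:06:34.190000,checkType->loadBuffer->loadSomething
EOF
# Rows where the FishFuzz column ($3) is empty but Prospector ($4) has a time.
awk -F, 'NR > 1 && $3 == "" && $4 != "" { print $1 ": " $2 }' /tmp/tte_sample.csv
```

Note that rows containing quoted, comma-bearing times (e.g. "2 days, 22:44:46.357000") need a CSV-aware parser rather than `-F,`.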
Note that the crashes shown in tte.csv may contain duplicates, so a manual check is needed. The final version of "tte.csv" is the one shown in the paper.
4.4.3 Info of New Bugs
The new bugs found by Prospector are listed below.

| Program    | Bug Type    | FishFuzz | Prospector | Status | CVEs |
| ---------- | ----------- | -------- | ---------- | ------ | ---- |
| tcprewrite | Double Free | 22h05m   | 6h16m      | fixed  | CVE-2023-4256 |
| w3m        | OOB Write   | 23h12m   | 8h6m       | fixed  | CVE-2023-4255 |
| w3m        | OOB Read    | T.O.     | 116h26m    | fixed  | CVE-2023-38252 |
| w3m        | OOB Read    | T.O.     | 148h21m    | fixed  | CVE-2023-38253 |
| dwg2dxf    | OOB Read    | T.O.     | 26m2s      | fixed  | Affects dev version, but not the release version |
| dwg2dxf    | Heap-UAF    | 70h44m   | T.O.       | fixed  | Affects dev version, but not the release version |
| MP4Box     | Heap-UAF    | 22h41m   | 22h10m     | fixed  | CVE-2023-51789 |
5. Expanding the artifact
5.1. Adding new targets
You can add new targets to the artifact by following the steps below.
1. Add build.sh in Docker-main/benchmark/project/. You can refer to other targets.
2. Add one command to Docker-main/benchmark/build_bench_prospector.sh, for example:

```
build_with_prospector "binutils-2.28" "objdump" "-ldl -lpthread -lm -lrt -lz"
```
3. Rebuild the Docker image with `cd Docker-main && docker build -t $IMAGE_NAME .`
4. Modify the script generate_script.py by adding an element to data:

```python
data = [
    # id, prog, command_line, seed_folder
    [1, "exiv2", "@@", "jpg"],
    [2, "tiffsplit", "@@", "tiff"],
    ...
    [100, "$BINARY_NAME", "ARG", "SEED_DIR"],
]
```
5. Prepare the initial seeds in runtime/corpus/
6. Modify the script generate_runtime.py:
   - add the new binary name to benchmark_list
   - change the image name to the one you set previously
7. Add the new binary name and command line to asan.crash.json for further crash triage.
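After step 7, it is worth sanity-checking that asan.crash.json still parses and contains the new entry. The file content and key below are our own assumptions about the file's layout, shown only to illustrate the check:

```shell
# Illustrative stand-in for asan.crash.json with one new binary added
# (the real file lives in the scripts directory; structure is assumed).
cat > /tmp/asan.crash.json <<'EOF'
{
  "objdump": "@@"
}
EOF
# Fail loudly if the file is invalid JSON or the new key is missing.
python3 -c 'import json; d = json.load(open("/tmp/asan.crash.json")); assert "objdump" in d; print("asan.crash.json OK")'
```

A malformed entry here would otherwise only surface hours later, when analysis.py runs.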
5.2 Adding new fuzzers
- Set up the prerequisites of the new fuzzing tool in the Dockerfile.
- Add `build_bench_$fuzzer.sh` in `benchmark`. You can refer to other fuzzer scripts.
- Rebuild the Docker image.
- Modify the function `write_script(..)` in `generate_script.py` to set the fuzzing command of the new fuzzer.
- Add the new fuzzer name to `fuzzer_list` in `generate_runtime.py`.
Contact
If you have any questions or find any bugs, feel free to contact me at iskindar97@gmail.com.
Owner
- Name: Zhijie Zhang
- Login: iskindar
- Kind: user
- Location: Beijing
- Company: IIE
- Repositories: 1
- Profile: https://github.com/iskindar
A student
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: Marc
    family-names: Heuse
    email: mh@mh-sec.de
  - given-names: Heiko
    family-names: Eißfeldt
    email: heiko.eissfeldt@hexco.de
  - given-names: Andrea
    family-names: Fioraldi
    email: andreafioraldi@gmail.com
  - given-names: Dominik
    family-names: Maier
    email: mail@dmnk.co
title: "AFL++"
version: 4.00c
type: software
date-released: 2022-01-26
url: "https://github.com/AFLplusplus/AFLplusplus"
keywords:
  - fuzzing
  - fuzzer
  - fuzz-testing
  - instrumentation
  - afl-fuzz
  - qemu
  - llvm
  - unicorn-emulator
  - securiy
license: AGPL-3.0-or-later
```
GitHub Events
Total
- Watch event: 3
- Push event: 2
- Create event: 4
Last Year
- Watch event: 3
- Push event: 2
- Create event: 4
Dependencies
- actions/checkout master composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- actions/checkout v2 composite
- actions/checkout v2 composite
- github/codeql-action/analyze v1 composite
- github/codeql-action/autobuild v1 composite
- github/codeql-action/init v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- ubuntu 22.04 build
- ubuntu 20.04 build
- fridadotre/manylinux-x86_64 latest build
- ubuntu xenial build
- tsc 2.0.3 development
- @types/node ^14.14.2 development
- tslint ^6.1.3 development
- typescript ^4.0.3 development
- typescript-tslint-plugin ^0.5.5 development
- @types/frida-gum ^16.2.0