ROSARUM: a novel backdoor detection benchmark
About
The ROSARUM backdoor detection benchmark contains a series of backdoored programs which can be used to evaluate software backdoor detection methods.
Each benchmark comes in three flavors:
- safe: no backdoor exists in the program (to test the detection method's precision)
- backdoored: one or more backdoors exist in the program (to test the detection method's recall)
- ground-truth: the same backdoors exist as in the backdoored version, except that every time they're hit they print a marker to stderr to identify themselves (such as ***BACKDOOR TRIGGERED***).
The ground-truth versions can be used to perform a precise evaluation of the precision and recall of a given detection method.
The benchmarks are also split into two large categories:
- authentic: real backdoors found in the wild
- synthetic: fake backdoors injected in (hopefully) backdoor-safe software
Benchmark summary
Authentic backdoor benchmarks
| Name | Backdoor description |
| ------------------ | -------------------------------------------------------------- |
| Belkin | HTTP request with secret URL value leads to web shell |
| D-Link | HTTP request with secret field value bypasses authentication |
| Linksys | Packet with specific payload enables memory read/write |
| Tenda | Packet with specific payload enables command execution |
| PHP | HTTP request with secret field value enables command execution |
| ProFTPD | Secret FTP command leads to root shell |
| vsFTPd | FTP usernames containing ":)" lead to root shell |
Synthetic backdoor benchmarks
| Name       | Backdoor description                                                 |
| ---------- | -------------------------------------------------------------------- |
| sudo       | Hardcoded credentials bypass authentication                          |
| libpng     | Secret image metadata values enable command execution                |
| libsndfile | Secret sound file metadata value triggers home directory encryption  |
| libtiff    | Secret image metadata value enables command execution                |
| libxml2    | Secret XML node format enables command execution                     |
| Lua        | Specific string values in script enable reading from filesystem      |
| OpenSSL    | Secret bignum exponentiation string enables command execution        |
| PHP        | Specific string values in serialized object enable command execution |
| Poppler    | Secret comment character in PDF enables command execution            |
| SQLite3    | Secret SQL keyword enables removal of home directory                 |
Installation
Docker
We highly recommend using ROSARUM in a Docker container,
since some backdoors may carry payloads that can affect your machine (e.g., by removing the /home/
directory).
You can simply pull the existing ROSARUM Docker image by running:
console
$ docker pull plumtrie/rosarum:latest
Then, you can run a container using that image by running:
console
$ docker run -ti --rm plumtrie/rosarum:latest
(Note that this command will start an interactive session within the container, and that exiting the container will trigger its removal.)
Building the Docker image
If you wish to build the Docker image on your machine, you can use the helper build.sh script,
which will automatically tag the image with the current version. See the script itself for more
information.
Before running the script (or simply docker build ...), make sure that you have cloned all of
the submodules used in this repo. You can do this either by cloning the repo with
--recurse-submodules, or by running git submodule update --init post-cloning.
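As a quick sanity check before building, the sketch below lists submodules that are still uninitialized. It relies on the fact that `git submodule status` prefixes such entries with `-`; the helper name `uninitialized_submodules` is made up for this example and is not part of the repo.

```bash
# Sketch: report submodules that have not been initialized yet.
# `git submodule status` prints "-<sha> <path>" for uninitialized entries.

# Reads `git submodule status` output on stdin and prints the paths of
# uninitialized submodules, one per line (paths with spaces are not handled).
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Hypothetical usage, from the root of the cloned repo:
#   missing=$(git submodule status --recursive | uninitialized_submodules)
#   [ -z "$missing" ] || echo "Run: git submodule update --init" >&2
```

If the helper prints nothing, all submodules have been checked out and the build can proceed.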
Be advised that the build might take some time (it takes ~12 minutes on a laptop with a 20-core 12th Gen Intel(R) Core(TM) i7-12800H CPU).
Once the Docker image is built, the run.sh convenience script may be used to run it. Generally,
released versions of the image will be tagged, so you can run git checkout <TAG> and run
./build.sh and ./run.sh to build and run a specific version of the image.
Building from source
WARNING: running the target programs in a native, unprotected environment may endanger the state of your machine. We highly recommend using a Docker container as described above.
You should be able to build all of the target programs on a modern Unix system (the builds have not been tested outside that environment). However, you first need to install a number of dependencies; you can find the full list of dependencies in the Dockerfile.
Once you have installed the dependencies, you should be able to build any target program, with different levels of granularity. To build all variants of all target programs, you can run (from the targets directory):
console
$ make
To build all variants of an entire category of target programs (e.g., authentic), you can run (from the targets directory):
console
$ make authentic
To build all variants of a specific target program (e.g., Sudo), you can run (from the targets directory):
console
$ make sudo-1.9.15p5
To build a specific variant (e.g., ground-truth) of a specific target program (e.g., Sudo), you can run (from the target program's root directory, e.g., targets/synthetic/sudo-1.9.15p5):
console
$ make ground-truth
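The commands above do not cover building one variant across a whole category. A minimal sketch of that, assuming every directory under targets/<category>/ has its own Makefile with the same safe, backdoored and ground-truth goals as the Sudo example (the `build_variant` helper is hypothetical and must be run from the repo root):

```bash
# Sketch: build a single variant for every target program of one category.
# Assumption: each target directory accepts `make <variant>`, as shown
# above for targets/synthetic/sudo-1.9.15p5.
build_variant() {
    local category=$1 variant=$2 dir
    for dir in targets/"$category"/*/; do
        # Skip the literal pattern if the category has no target directories.
        [ -d "$dir" ] || continue
        # `make -C <dir>` runs make inside <dir>; stop at the first failure.
        make -C "$dir" "$variant" || return 1
    done
}

# Hypothetical usage:
#   build_variant synthetic ground-truth
```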
Usage
Reproducing the backdoors
Instructions on how to run all of the variants can be found in the root directory of each backdoor sample.
Generally, for each sample, you'll want to first build it (if it's not built):
console
$ make # or `make <type>`, where `<type>` is `safe`, `backdoored` or `ground-truth`
Then, you need to perform any additional setup that may be needed (e.g., copying files to specific directories):
console
$ make setup
Once you're done with the target program, to make sure other programs are not affected, you should undo the setup:
console
$ make teardown
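Since a forgotten teardown can affect other target programs, it may help to wrap the setup/teardown pair around whatever you run. The sketch below is one way to do that; `with_setup` is a hypothetical helper, and it assumes it is called from the root directory of a backdoor sample.

```bash
# Sketch: run a command between `make setup` and `make teardown`,
# making sure teardown still runs when the command fails.
with_setup() {
    make setup || return 1
    "$@"
    local status=$?   # remember the exit status of the wrapped command
    make teardown
    return "$status"
}

# Hypothetical usage (the script name is a placeholder):
#   with_setup ./my-experiment.sh
```

The helper returns the exit status of the wrapped command, so it can be used in scripts that check for failures. It does not handle signals such as Ctrl-C; a `trap` would be needed for that.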
Evaluating a backdoor detection method on ROSARUM
If you want to evaluate a backdoor detection method, you can run it on the backdoored variants and
evaluate the results on the ground-truth variants, by inspecting stderr for the
***BACKDOOR TRIGGERED*** marker.
For instance, let us assume that your backdoor detection tool is used on
./targets/synthetic/sudo-1.9.15p5/backdoored/build/bin/sudo (note the use of the backdoored
variant) and produces backdoor-triggering inputs in the sudo-findings/ directory. In that case,
this simple Bash script goes through the findings (inputs to the target program) and prints the name
of each finding file along with the result of the evaluation (true/false positive):
bash
# Iterate with a glob rather than parsing `ls`, so unusual file names are safe.
for finding in sudo-findings/*
do
    # Note the use of the _ground-truth_ variant here.
    if ./targets/synthetic/sudo-1.9.15p5/ground-truth/build/bin/sudo -Sk -- id 2>&1 \
        < "$finding" \
        | grep -F '***BACKDOOR TRIGGERED***' >/dev/null
    then
        echo "${finding##*/}: true positive"
    else
        echo "${finding##*/}: false positive"
    fi
done
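The per-finding verdicts printed by the loop above can be condensed into a precision figure, i.e. TP / (TP + FP). The following sketch assumes the loop's output has been piped in or saved to a file; `summarize_findings` is a hypothetical helper name. It says nothing about recall, which also depends on whether every backdoor was hit by at least one finding.

```bash
# Sketch: summarize "<finding>: true positive" / "<finding>: false positive"
# lines read from stdin and print the resulting precision.
summarize_findings() {
    awk '
        /: true positive$/  { tp++ }
        /: false positive$/ { fp++ }
        END {
            total = tp + fp
            if (total == 0) { print "no findings"; exit }
            printf "TP=%d FP=%d precision=%.2f\n", tp, fp, tp / total
        }
    '
}

# Hypothetical usage (evaluate.sh stands for the loop above saved as a script):
#   ./evaluate.sh | summarize_findings
```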
Contributing
Please read CONTRIBUTING.md.
Citing this repo
When citing the associated ICSE'25 paper, use the following snippet:
bibtex
@inproceedings{kokkonis-2025-rosa,
author = {Kokkonis, Dimitri and Marcozzi, Michaël and Decoux, Emilien and Zacchiroli, Stefano},
booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
title = {ROSA: Finding Backdoors with Fuzzing},
year = {2025},
volume = {},
number = {},
pages = {2816-2828},
keywords = {Runtime;Automation;Manuals;Binary codes;Fuzzing;Benchmark testing;Robustness;Software;Performance analysis;Standards;fuzzing;dynamic analysis;metamorphic testing;backdoors;vulnerability detection},
doi = {10.1109/ICSE55347.2025.00183},
}
When citing the actual repository/dataset itself, use CITATION.cff.
Owner
- Name: BINSEC development team
- Login: binsec
- Kind: organization
- Repositories: 6
- Profile: https://github.com/binsec
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: ROSARUM
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - given-names: Dimitri
    family-names: Kokkonis
    email: dimitri.kokkonis@cea.fr
    affiliation: 'Université Paris-Saclay, CEA, List'
    orcid: 'https://orcid.org/0009-0009-5171-2992'
  - given-names: Michaël
    family-names: Marcozzi
    email: michael.marcozzi@cea.fr
    affiliation: 'Université Paris-Saclay, CEA, List'
    orcid: 'https://orcid.org/0000-0002-8087-0537'
  - given-names: Emilien
    family-names: Decoux
    email: emilien.decoux@protonmail.com
    affiliation: 'Université Paris-Saclay, CEA, List'
  - given-names: Stefano
    family-names: Zacchiroli
    email: stefano.zacchiroli@telecom-paris.fr
    affiliation: 'LTCI, Télécom Paris, Institut Polytechnique de Paris'
    orcid: 'https://orcid.org/0000-0002-4576-136X'
identifiers:
  - type: doi
    value: 10.5281/zenodo.14724250
    description: Zenodo artifact
  - type: swh
    value: 'swh:1:rev:21d986293f083a09c0692c504305ac6e4fb9bf38'
repository-code: 'https://github.com/binsec/rosarum'
abstract: >-
  A code-level backdoor is a hidden access, programmed and
  concealed within the code of a program. For instance,
  hard-coded credentials planted in the code of a file
  server application would enable maliciously logging into
  all deployed instances of this application. Confirmed
  software supply chain attacks have led to the injection of
  backdoors into popular open-source projects, and backdoors
  have been discovered in various router firmware. Manual
  code auditing for backdoors is challenging and existing
  semi-automated approaches can handle only a limited scope
  of programs and backdoors, while requiring manual
  reverse-engineering of the audited (binary) program.
  Graybox fuzzing (automated semi-randomized testing) has
  grown in popularity due to its success in discovering
  vulnerabilities and hence stands as a strong candidate for
  improved backdoor detection. However, current fuzzing
  knowledge does not offer any means to detect the
  triggering of a backdoor at runtime. In this work we
  introduce ROSA, a novel approach (and tool) which combines
  a state-of-the-art fuzzer (AFL++) with a new metamorphic
  test oracle, capable of detecting runtime backdoor
  triggers. To facilitate the evaluation of ROSA, we have
  created ROSARUM, the first openly available benchmark for
  assessing the detection of various backdoors in diverse
  programs. Experimental evaluation shows that ROSA has a
  level of robustness, speed and automation similar to
  classical fuzzing. It finds all 17 authentic or synthetic
  backdoors from ROSARUM in 1 h 30 on average. Compared to
  existing detection tools, it can handle a diversity of
  backdoors and programs and it does not rely on manual
  reverse-engineering of the fuzzed binary code.
keywords:
  - Backdoors
  - Fuzzing
  - Vulnerability detection
  - Binary programs
license: LGPL-2.1-only