rosarum

A novel backdoor detection benchmark

https://github.com/binsec/rosarum

Repository

Basic Info

Host: GitHub
Owner: binsec
License: other
Language: C
Default Branch: main
Homepage:
Size: 38.9 MB

Statistics

Stars: 6
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed 12 months ago

Metadata Files

Readme Changelog Contributing License Citation Authors

ROSARUM: a novel backdoor detection benchmark

About

The ROSARUM backdoor detection benchmark contains a series of backdoored programs which can be used to evaluate software backdoor detection methods.

Each benchmark comes in three flavors:

- safe: no backdoor exists in the program (to test the detection method's precision)
- backdoored: one or more backdoors exist in the program (to test the detection method's recall)
- ground-truth: the same backdoors exist as in the backdoored version, except that every time they are hit they print a marker to stderr to identify themselves (such as `***BACKDOOR TRIGGERED***`)

The ground-truth versions can be used to perform a precise evaluation of the precision and recall of a given detection method.
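
As a minimal sketch of that evaluation, precision and recall can be computed from the number of true positives, false positives and false negatives. The helper name `precision_recall` and the way the counts are collected are assumptions made for illustration; they are not part of ROSARUM:

```bash
# Hypothetical helper (not part of ROSARUM): prints precision and recall
# given the counts of true positives, false positives and false negatives.
precision_recall() {
    awk -v tpos="$1" -v fpos="$2" -v fneg="$3" 'BEGIN {
        # Report 0 instead of dividing by zero when a denominator is empty.
        p = (tpos + fpos > 0) ? tpos / (tpos + fpos) : 0
        r = (tpos + fneg > 0) ? tpos / (tpos + fneg) : 0
        printf "precision=%.2f recall=%.2f\n", p, r
    }'
}
```

For example, `precision_recall 3 1 1` prints `precision=0.75 recall=0.75`.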

The benchmarks are also split into two large categories:

- authentic: real backdoors found in the wild
- synthetic: fake backdoors injected in (hopefully) backdoor-safe software

Benchmark summary

Authentic backdoor benchmarks

| Name    | Backdoor description                                           |
| ------- | -------------------------------------------------------------- |
| Belkin  | HTTP request with secret URL value leads to web shell          |
| D-Link  | HTTP request with secret field value bypasses authentication   |
| Linksys | Packet with specific payload enables memory read/write         |
| Tenda   | Packet with specific payload enables command execution         |
| PHP     | HTTP request with secret field value enables command execution |
| ProFTPD | Secret FTP command leads to root shell                         |
| vsFTPd  | FTP usernames containing ":)" lead to root shell               |

Synthetic backdoor benchmarks

| Name       | Backdoor description                                                 |
| ---------- | -------------------------------------------------------------------- |
| sudo       | Hardcoded credentials bypass authentication                          |
| libpng     | Secret image metadata values enable command execution                |
| libsndfile | Secret sound file metadata value triggers home directory encryption  |
| libtiff    | Secret image metadata value enables command execution                |
| libxml2    | Secret XML node format enables command execution                     |
| Lua        | Specific string values in script enable reading from filesystem      |
| OpenSSL    | Secret bignum exponentiation string enables command execution        |
| PHP        | Specific string values in serialized object enable command execution |
| Poppler    | Secret comment character in PDF enables command execution            |
| SQLite3    | Secret SQL keyword enables removal of home directory                 |

Installation

Docker

We highly recommend using ROSARUM in a Docker container, since some backdoors may carry payloads that can affect your machine (e.g., by removing the /home/ directory).

You can simply pull the existing ROSARUM Docker image by running:

```console
$ docker pull plumtrie/rosarum:latest
```

Then, you can run a container using that image by running:

```console
$ docker run -ti --rm plumtrie/rosarum:latest
```

(Note that this command will start an interactive session within the container, and that exiting the container will trigger its removal.)

Building the Docker image

If you wish to build the Docker image on your machine, you can use the helper build.sh script, which will automatically tag the image with the current version. See the script itself for more information.

Before running the script (or simply `docker build ...`), make sure that you have cloned all of the submodules used in this repo. You can do this either by cloning the repo with `--recurse-submodules`, or by running `git submodule update --init` after cloning.
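
One way to check this before starting a long build is to inspect the output of `git submodule status`, where a leading `-` marks a submodule that has not been initialized. The helper below is a minimal sketch; its name is hypothetical and it is not part of the repository:

```bash
# Hypothetical helper (not part of ROSARUM): reads `git submodule status`
# output on stdin and prints how many submodules are not initialized
# (their status lines start with "-").
count_uninitialized_submodules() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is normalized with `|| true`.
    grep -c '^-' || true
}
```

Running `git submodule status | count_uninitialized_submodules` from the repository root should print `0` once all submodules have been fetched.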

Be advised that the build might take some time (it takes ~12 minutes on a laptop with a 20-core 12th Gen Intel(R) Core(TM) i7-12800H CPU).

Once the Docker image is built, the run.sh convenience script may be used to run it. Generally, released versions of the image will be tagged, so you can run git checkout <TAG> and run ./build.sh and ./run.sh to build and run a specific version of the image.

Building from source

WARNING: running the target programs in a native, unprotected environment may endanger the state of your machine. We highly recommend using a Docker container as described above.

You should be able to build all of the target programs on a modern Unix system (the builds have not been tested outside that environment). However, you first need to install a number of dependencies; you can find the full list of dependencies in the Dockerfile.

Once you have installed the dependencies, you should be able to build any target program, with different levels of granularity. To build all variants of all target programs, you can run (from the targets directory):

```console
$ make
```

To build all variants of an entire category of target programs (e.g., authentic), you can run (from the targets directory):

```console
$ make authentic
```

To build all variants of a specific target program (e.g., Sudo), you can run (from the targets directory):

```console
$ make sudo-1.9.15p5
```

To build a specific variant (e.g., ground-truth) of a specific target program (e.g., Sudo), you can run (from the target program's root directory, e.g., targets/synthetic/sudo-1.9.15p5):

```console
$ make ground-truth
```

Usage

Reproducing the backdoors

Instructions on how to run all of the variants can be found in the root directory of each backdoor sample.

Generally, for each sample, you'll want to first build it (if it's not built):

```console
$ make # or `make <type>`, where `<type>` is `safe`, `backdoored` or `ground-truth`
```

Then, you need to perform any additional setup that may be needed (e.g., copying files to specific directories):

```console
$ make setup
```

Once you're done with the target program, to make sure other programs are not affected, you should undo the setup:

```console
$ make teardown
```
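
Because a forgotten teardown can affect other target programs, it may help to wrap the setup, run and teardown steps together. The following is a minimal sketch under that assumption; the `with_setup` function and the `MAKE` override are hypothetical and not provided by ROSARUM:

```bash
# Hypothetical wrapper (not part of ROSARUM): runs `make setup`, then the
# given command, and runs `make teardown` afterwards even if the command fails.
# MAKE can be overridden (e.g., for a dry run); it defaults to `make`.
with_setup() {
    "${MAKE:-make}" setup || return 1
    status=0
    "$@" || status=$?
    "${MAKE:-make}" teardown
    # Propagate the exit status of the wrapped command.
    return "$status"
}
```

It is meant to be called from the root directory of a backdoor sample, with the command to run passed as arguments to `with_setup`.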

Evaluating a backdoor detection method on ROSARUM

If you want to evaluate a backdoor detection method, you can run it on the backdoored variants and evaluate the results on the ground-truth variants, by inspecting stderr for the `***BACKDOOR TRIGGERED***` marker.

For instance, let us assume that your backdoor detection tool is used on `./targets/synthetic/sudo-1.9.15p5/backdoored/build/bin/sudo` (note the use of the backdoored variant) and produces backdoor-triggering inputs in the `sudo-findings/` directory. The following simple Bash script goes through the findings (inputs to the target program) and prints the name of each finding file along with the result of the evaluation (true/false positive):

```bash
# Iterate with a glob rather than parsing `ls`, so that file names
# containing whitespace are handled correctly.
for path in sudo-findings/*
do
    finding=$(basename "$path")
    # Note the use of the _ground-truth_ variant here.
    ./targets/synthetic/sudo-1.9.15p5/ground-truth/build/bin/sudo -Sk -- id 2>&1 \
        < "$path" \
        | grep "\*\*\*BACKDOOR TRIGGERED\*\*\*" >/dev/null \
        && echo "$finding: true positive" \
        || echo "$finding: false positive"
done
```
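
The marker check itself can also be factored into a small function. This is a minimal sketch that assumes the marker is exactly `***BACKDOOR TRIGGERED***`; the function name `classify_output` is hypothetical. It uses `grep -F` so that the asterisks are matched literally and need no escaping:

```bash
# Hypothetical helper (not part of ROSARUM): reads the combined output of a
# ground-truth run on stdin and reports whether the marker is present.
classify_output() {
    # -F treats the pattern as a fixed string; -q suppresses the matched line.
    if grep -qF '***BACKDOOR TRIGGERED***'; then
        echo "true positive"
    else
        echo "false positive"
    fi
}
```

A ground-truth run whose stderr has been redirected to stdout with `2>&1` can then be piped into `classify_output`.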

Contributing

Please read CONTRIBUTING.md.

Citing this repo

When citing the associated ICSE'25 paper, use the following snippet:

```bibtex
@inproceedings{kokkonis-2025-rosa,
  author    = {Kokkonis, Dimitri and Marcozzi, Michaël and Decoux, Emilien and Zacchiroli, Stefano},
  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  title     = {ROSA: Finding Backdoors with Fuzzing},
  year      = {2025},
  volume    = {},
  number    = {},
  pages     = {2816-2828},
  keywords  = {Runtime;Automation;Manuals;Binary codes;Fuzzing;Benchmark testing;Robustness;Software;Performance analysis;Standards;fuzzing;dynamic analysis;metamorphic testing;backdoors;vulnerability detection},
  doi       = {10.1109/ICSE55347.2025.00183},
}
```

When citing the actual repository/dataset itself, use CITATION.cff.

Owner

Name: BINSEC development team
Login: binsec
Kind: organization

Repositories: 6
Profile: https://github.com/binsec

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: ROSARUM
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - given-names: Dimitri
    family-names: Kokkonis
    email: dimitri.kokkonis@cea.fr
    affiliation: 'Université Paris-Saclay, CEA, List'
    orcid: 'https://orcid.org/0009-0009-5171-2992'
  - given-names: Michaël
    family-names: Marcozzi
    email: michael.marcozzi@cea.fr
    affiliation: 'Université Paris-Saclay, CEA, List'
    orcid: 'https://orcid.org/0000-0002-8087-0537'
  - given-names: Emilien
    family-names: Decoux
    email: emilien.decoux@protonmail.com
    affiliation: 'Université Paris-Saclay, CEA, List'
  - given-names: Stefano
    family-names: Zacchiroli
    email: stefano.zacchiroli@telecom-paris.fr
    affiliation: 'LTCI, Télécom Paris, Institut Polytechnique de Paris'
    orcid: 'https://orcid.org/0000-0002-4576-136X'
identifiers:
  - type: doi
    value: 10.5281/zenodo.14724250
    description: Zenodo artifact
  - type: swh
    value: 'swh:1:rev:21d986293f083a09c0692c504305ac6e4fb9bf38'
repository-code: 'https://github.com/binsec/rosarum'
abstract: >-
  A code-level backdoor is a hidden access, programmed and
  concealed within the code of a program. For instance,
  hard-coded credentials planted in the code of a file
  server application would enable maliciously logging into
  all deployed instances of this application. Confirmed
  software supply-chain attacks have led to the injection of
  backdoors into popular open-source projects, and backdoors
  have been discovered in various router firmware. Manual
  code auditing for backdoors is challenging and existing
  semi-automated approaches can handle only a limited scope
  of programs and backdoors, while requiring manual
  reverse-engineering of the audited (binary) program.
  Graybox fuzzing (automated semi-randomized testing) has
  grown in popularity due to its success in discovering
  vulnerabilities and hence stands as a strong candidate for
  improved backdoor detection. However, current fuzzing
  knowledge does not offer any means to detect the
  triggering of a backdoor at runtime. In this work we
  introduce ROSA, a novel approach (and tool) which combines
  a state-of-the-art fuzzer (AFL++) with a new metamorphic
  test oracle, capable of detecting runtime backdoor
  triggers. To facilitate the evaluation of ROSA, we have
  created ROSARUM, the first openly available benchmark for
  assessing the detection of various backdoors in diverse
  programs. Experimental evaluation shows that ROSA has a
  level of robustness, speed and automation similar to
  classical fuzzing. It finds all 17 authentic or synthetic
  backdoors from ROSARUM in 1 h 30 on average. Compared to
  existing detection tools, it can handle a diversity of
  backdoors and programs and it does not rely on manual
  reverse-engineering of the fuzzed binary code.
keywords:
  - Backdoors
  - Fuzzing
  - Vulnerability detection
  - Binary programs
license: LGPL-2.1-only

GitHub Events

Total

Watch event: 6
Issue comment event: 1
Push event: 12
Public event: 1
Pull request event: 2
Fork event: 2

Last Year

Watch event: 6
Issue comment event: 1
Push event: 12
Public event: 1
Pull request event: 2
Fork event: 2

Dependencies

.github/workflows/ci.yaml actions

actions/checkout v4 composite
actions/setup-python v5 composite
pre-commit/action v3.0.1 composite

Dockerfile docker

ubuntu 22.04 build
