docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection

https://github.com/packing-box/docker-packing-box

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

binary-analysis dataset-generation docker-image elf-format executable-packing machine-learning malware-analysis malware-packers malware-research packing-detection pe-format research-platform research-tools
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: packing-box
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 129 MB
Statistics
  • Stars: 60
  • Watchers: 1
  • Forks: 15
  • Open Issues: 6
  • Releases: 0
Topics
binary-analysis dataset-generation docker-image elf-format executable-packing machine-learning malware-analysis malware-packers malware-research packing-detection pe-format research-platform research-tools
Created about 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Packing Box

Experimental toolkit for static detection of executable packing.

Read The Docs · Black Hat Arsenal Europe 2022 · Black Hat Arsenal Europe 2023 · Black Hat Arsenal Europe 2024 · License: GPL v3

This Docker container provides a CLI environment with a toolkit that gathers executable analyzers, packing detectors, packers and unpackers, along with many tools for generating and manipulating datasets of packed and not-packed executables in different formats (including PE, ELF and Mach-O). It is meant for evaluating static detection techniques and tools, visualizing the layout of executables and automating machine learning pipelines with support for many algorithms.

See the Black Hat Arsenal presentations for demonstrations:

Here is what you can see when you start up the Docker container.

The items integrated in the Packing-Box are defined in a declarative, easy-to-use YAML format across several configuration files. This makes it straightforward for researchers to shape the scope of evaluations and of machine learning model training.

:fast_forward: Quick Start

Building the image:

```console
$ docker build -t dhondta/packing-box .
<<snipped>>
```

Starting it up with the current working directory mounted as /mnt/share in the container:

Windows

```powershell
PS C:\> docker run -it -h packing-box -v ${pwd}:/mnt/share dhondta/packing-box
```

Linux

```bash
# $(pwd) expands to the current working directory on the host
docker run -it -h packing-box -v "$(pwd)":/mnt/share dhondta/packing-box
```

:clipboard: Basics

Items Usage

Items are configured through the following YAML configuration files:
- analyzers.yml: utilities for analyzing files, more specifically packer traces
- detectors.yml: tools for analyzing an executable and deciding whether it is packed or not
- packers.yml and unpackers.yml: self-explanatory

From within the Packing-Box, the packing-box tool allows setting up and testing items.

| Operation | Description | Command |
| :---: | --- | --- |
| `setup` | Set up an item from its YAML install definition | `# packing-box setup detector die` |
| `test` | Test an item using a built-in set of test samples | `# packing-box test packer upx` |
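The item categories used on the command line (e.g. `detector` in `packing-box setup detector die`) appear to match the configuration files listed above one-to-one. The minimal lookup sketch below assumes that naming convention. The helper `config_file_for` is hypothetical and not part of the pbox package.

```python
# Hypothetical mapping from an item category (as typed on the CLI) to the
# YAML file that declares it; assumes the one-to-one naming shown above.
ITEM_FILES = {
    "analyzer": "analyzers.yml",
    "detector": "detectors.yml",
    "packer": "packers.yml",
    "unpacker": "unpackers.yml",
}


def config_file_for(category: str) -> str:
    """Return the configuration file in which an item category is declared."""
    try:
        return ITEM_FILES[category]
    except KeyError:
        raise ValueError(f"unknown item category: {category!r}") from None
```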

Afterwards, items are available from the console.

```console
$ die --help
<<snipped>>
$ upx --help
<<snipped>>
```

Mass Packing & Detection

Packers and detectors have their own dedicated tools for mass operations, packer and detector. They work on a single file, a complete folder or a special dataset instance (as defined by the dataset abstraction of the pbox package).

```console
$ packer upx path/to/executables --prefix "upx_"
<<snipped>>
```
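When a mass tool targets a folder, it has to enumerate the executables it contains and name the packed outputs. The sketch below shows one way to do both. It assumes that `--prefix "upx_"` simply prepends the prefix to the original file name, as the `upx_PsExec.exe` naming used later in this README suggests. Dataset instances are not modelled, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterator


def iter_targets(target: Path) -> Iterator[Path]:
    """Yield the file itself, or every regular file found recursively
    under a folder (sorted for a deterministic order)."""
    if target.is_file():
        yield target
    elif target.is_dir():
        for p in sorted(target.rglob("*")):
            if p.is_file():
                yield p
    else:
        raise FileNotFoundError(target)


def prefixed_name(path: Path, prefix: str) -> Path:
    """ASSUMED output naming for --prefix: prefix + original file name."""
    return path.with_name(prefix + path.name)
```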

For the detector tool, if no detector is selected, those marked in detectors.yml as part of the "superdetector" are used. Moreover, the --binary option only considers whether the target executable is packed or not, not its precise packer.

```console
$ detector path/to/single-executable -d die -d pypackerdetect
<<snipped>>
$ detector path/to/executables
<<snipped ; will use "superdetection">>
$ detector path/to/executables -d bintropy --binary
<<snipped ; in this case, as Bintropy only supports binary classification, --binary is necessary>>
```
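To make the "superdetector" and `--binary` ideas concrete, here is a sketch that combines per-detector verdicts by simple plurality vote. It can optionally collapse packer names into packed/not-packed first. This is an assumption for illustration only: the actual decision rule used by the Packing-Box is not described here, and the `"-"` label for "not packed" is hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Union

Verdict = Optional[Union[str, bool]]
NOT_PACKED = "-"  # hypothetical label meaning "not packed"


def to_binary(label: Verdict) -> Optional[bool]:
    """Collapse a verdict into packed (True) / not packed (False).
    None (no verdict) passes through; booleans are kept as-is, which
    covers detectors that only support binary classification."""
    if label is None or isinstance(label, bool):
        return label
    return label != NOT_PACKED


def superdetect(verdicts: Dict[str, Verdict], binary: bool = False) -> Verdict:
    """Combine per-detector verdicts by plurality vote (ASSUMED rule).
    Missing verdicts are ignored; a tie yields None."""
    votes = [to_binary(v) if binary else v for v in verdicts.values()]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no decision
    return ranked[0][0]
```

With `binary=True`, "UPX" and "MEW" both count as a packed vote. Two detectors can therefore disagree on the packer and still agree that the file is packed.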

Learning Pipeline

Machine learning models are fine-tuned through the following YAML configuration files:
- algorithms.yml: the algorithms used for training models, with their static or dynamic parameters
- features.yml: the characteristics considered while training and using models

The PREPARE phase, especially feature engineering, is fine-tuned with the features YAML definition. Note that feature extraction is performed by the pbox package of the Packing-Box, while feature derivation and transformation are fine-tuned via the features YAML file.
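As a rough illustration of feature derivation, the sketch below treats each derived feature as a named function of the already extracted base features. The feature names in the usage note are hypothetical and are not taken from features.yml.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def derive(base: Features, derivations: Dict[str, Callable[[Features], float]]) -> Features:
    """Return the base features plus each derived feature, computed from the base set."""
    out = dict(base)
    for name, fn in derivations.items():
        out[name] = fn(base)
    return out
```

For example, `derive({"a": 2, "b": 4}, {"ratio": lambda f: f["a"] / f["b"]})` adds `"ratio": 0.5` to the two base features.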

The TRAIN phase is fine-tuned through the algorithms YAML file by setting the static and/or cross-validation parameters.
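The split between static and cross-validation parameters can be pictured as a grid expansion. Static parameters are fixed, while each cross-validation parameter lists candidate values, and every combination is tried. This is a generic sketch comparable to what scikit-learn's `ParameterGrid` produces. The decision-tree parameter names in the test (`criterion`, `max_depth`, `min_samples_split`) are illustrative placeholders, not values read from algorithms.yml.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(static: Dict[str, Any], cv: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter set per combination of cross-validation values,
    each merged with the static parameters."""
    keys = sorted(cv)  # deterministic order of combinations
    combos = []
    for values in product(*(cv[k] for k in keys)):
        params = dict(static)
        params.update(zip(keys, values))
        combos.append(params)
    return combos
```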

Dataset Manipulations

The PREPARE phase, especially dataset generation, is achieved with the dataset tool.

| Operation | Description | Command |
| :---: | --- | --- |
| `make` | Make a new dataset, either fully packed or mixed with not-packed samples | `# dataset make dataset -f PE -n 200 -s /path/to/pe` |
| `merge` | Merge two datasets | `# dataset merge dataset dataset2` |
| `select` | Select a subset of a dataset to create a new one | `# dataset select dataset dataset2 -q "format == 'PE32'"` |
| `update` | Update a dataset with new samples given their labels | `# dataset update dataset -l labels.json -s folder-of-executables` |
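The `select` and `merge` operations can be read as filtering and union over a set of labelled samples. In the sketch below, a Python predicate stands in for the `-q "format == 'PE32'"` query string. Deduplicating merged samples by hash is an assumption. The `Sample` record and its fields are hypothetical and do not reflect the pbox dataset format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    sha256: str  # hypothetical sample identifier
    format: str  # e.g. "PE32"
    label: str   # packer name, or "-" if not packed (assumption)


def select(samples: Iterable[Sample], predicate: Callable[[Sample], bool]) -> List[Sample]:
    """Keep the samples matching the predicate."""
    return [s for s in samples if predicate(s)]


def merge(ds1: Iterable[Sample], ds2: Iterable[Sample]) -> List[Sample]:
    """Union of two datasets; duplicates (same hash) keep the first occurrence."""
    seen, out = set(), []
    for s in list(ds1) + list(ds2):
        if s.sha256 not in seen:
            seen.add(s.sha256)
            out.append(s)
    return out
```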

Data Visualization

The VISUALIZE phase can be performed with the dataset and visualizer tools.

In order to visualize feature values:

```console
$ dataset plot test-mix byte_0_after_ep byte_1_after_ep --multiclass
```
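Judging by their names, features such as `byte_0_after_ep` and `byte_1_after_ep` hold the raw byte values found at and just after the entry point. The sketch below computes them under that reading. It also assumes the entry point has already been converted to a file offset, which is format-specific work not shown here.

```python
from typing import Dict, Optional


def bytes_after_ep(data: bytes, ep_offset: int, n: int = 2) -> Dict[str, Optional[int]]:
    """Return byte_<i>_after_ep for i in [0, n); positions past the end
    of the file yield None."""
    feats: Dict[str, Optional[int]] = {}
    for i in range(n):
        pos = ep_offset + i
        feats[f"byte_{i}_after_ep"] = data[pos] if pos < len(data) else None
    return feats
```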

In order to visualize samples (e.g. to compare a not-packed executable with some of its packed versions):

```console
$ visualizer plot "PsExec.exe$" dataset -s -l not-packed -l MEW -l RLPack -l UPX
```

This works, for instance, with a folder structured as follows:

    folder/
    +-- not-packed/PsExec.exe
    +-- packed
        +-- MEW/mew_PsExec.exe
        +-- RLPack/rlpack_PsExec.exe
        +-- UPX/upx_PsExec.exe
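The layout above pairs each label with a subfolder, and the `"PsExec.exe$"` pattern matches the original file name as well as its prefixed packed copies. The sketch below collects the files to compare for each label. The lookup rule (`not-packed` at the top level, packer labels under `packed/`) mirrors the example layout and is an assumption about how the visualizer resolves labels. `find_versions` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List


def find_versions(root: Path, pattern: str, labels: Iterable[str]) -> Dict[str, List[Path]]:
    """For each label, list the files whose name matches the regex pattern,
    looking in root/<label> for 'not-packed' and root/packed/<label> otherwise."""
    rx = re.compile(pattern)
    found: Dict[str, List[Path]] = {}
    for label in labels:
        folder = root / label if label == "not-packed" else root / "packed" / label
        found[label] = sorted(p for p in folder.glob("*") if p.is_file() and rx.search(p.name))
    return found
```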

Model Manipulations

The TRAIN and PREDICT phases of the pipeline are achieved with the model tool.

| Operation | Description | Command |
| :---: | --- | --- |
| `compare` | Compare the performance metrics of multiple models | `# model compare model --dataset dataset --model model2` |
| `test` | Test a model on a given dataset | `# model test model --name dataset` |
| `train` | Train a model given an algorithm and an input dataset | `# model train dataset --algorithm dt` |
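Comparing models comes down to computing the same performance metrics for each model on a common ground truth. The sketch below uses the standard binary-classification metrics (packed vs not packed) and ranks models by one of them. The exact metrics reported by `model compare` are not listed here, so this choice is an assumption. The model names in the test are placeholders.

```python
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with 'packed' (True) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rank_models(y_true: Sequence[bool], predictions: Dict[str, Sequence[bool]],
                metric: str = "f1") -> List[str]:
    """Order model names from best to worst on the chosen metric."""
    scores = {name: binary_metrics(y_true, pred)[metric] for name, pred in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```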

:star: Related Projects

You may also like these:

:books: Related Readings


Citation (CITATIONS.bib)

@misc{dhondtPackingBox2024,
  title = {Packing-{{Box}}},
  author = {D'Hondt, Alexandre},
  year = {2024},
  abstract = {Docker image gathering many packing-related tools and for making datasets of packed executables for use with machine learning.},
  keywords = {elf,elf32,framework,mach-o,machine learning,packers,pe,pe32,platform,static analysis}
}

@inproceedings{dhondtExperimentalToolkitManipulating2024,
  title = {Experimental Toolkit for Manipulating Executable Packing},
  booktitle = {Risks and Security of Internet and Systems},
  author = {D'Hondt, Alexandre and Van Ouytsel, Charles Henry Bertrand and Legay, Axel},
  editor = {Ait Wakrime, Abderrahim and {Navarro-Arribas}, Guillermo and Cuppens, Fr{\'e}d{\'e}ric and Cuppens, Nora and Benaini, Redouane},
  year = {2024},
  month = jun,
  pages = {263--279},
  publisher = {Springer Nature Switzerland},
  doi = {10.1007/978-3-031-61231-2_17},
  abstract = {Executable packing is a well-known problematic especially in the field of malware analysis. It often consists in applying compression or encryption to a binary file and embedding a stub for reversing these transformations at runtime. This way, the packed executable is more difficult to reverse-engineer and/or is obfuscated, which is effective for evading static detection techniques. Many detection approaches, including machine learning, have been proposed in the literature so far, but most studies rely on questionable ground truths and do not provide any open implementation, making the comparison of state-of-the-art solutions tedious. We thus think that first solving the issue of repeatability shall help to compare existing executable packing static detection techniques. Given this challenge, we propose an experimental toolkit, named Packing Box, that leverages automation and containerization in an open source platform that brings a unified solution to the research community. We present our engineering approach for designing and implementing our solution. We then showcase it with a few basic experiments, including a performance evaluation of open source static packing detectors and training a model with machine learning pipeline automation. This introduces the toolset that will be used in further studies.},
  isbn = {978-3-031-61231-2},
  keywords = {machine learning,packer detection,packer identification,packing-box,toolkit}
}

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 29
  • Watch event: 12
  • Issue comment event: 45
  • Push event: 100
  • Pull request review event: 1
  • Pull request event: 5
  • Fork event: 5
Last Year
  • Commit comment event: 1
  • Issues event: 29
  • Watch event: 12
  • Issue comment event: 45
  • Push event: 100
  • Pull request review event: 1
  • Pull request event: 5
  • Fork event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 20
  • Total pull requests: 3
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 18
  • Pull requests: 3
  • Average time to close issues: 9 days
  • Average time to close pull requests: 5 days
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AlexVanMechelen (12)
  • cybersecurity-dev (11)
  • jramhani (11)
  • clement-alloin-afk (4)
  • dhondta (3)
Pull Request Authors
  • AlexVanMechelen (30)
  • jramhani (5)
  • cybersecurity-dev (2)
  • clement-alloin-afk (1)
Top Labels
Issue Labels
failure (8) bug (7) enhancement (3) regression bug (1) wontfix (1) warning (1) help wanted (1)
Pull Request Labels
fix (3) enhancement (1) bug (1) help wanted (1)

Dependencies

Dockerfile docker
  • base latest build
  • customized latest build
  • ubuntu 22.04 build
docs/requirements.txt pypi
  • jinja2 <3.1.0
  • mkdocs ==1.2.3
  • mkdocs-bootswatch *
  • mkdocs-material *
  • mkdocs-rtd-dropdown *
  • pymdown-extensions *