https://github.com/csvl/sema

SEMA is based on angr, a symbolic execution engine, which it uses to extract API calls. In particular, we extend angr with strategies to create representative signatures based on System Call Dependency Graphs (SCDGs). These SCDGs can then be used by machine learning modules for classification and detection.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

angr binary-analysis classification concolic-execution ctf cybersecurity detection linux malware malware-analysis malware-detection malware-research python reverse reverse-engineering sema static-analysis symbolic symbolic-execution windows
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: csvl
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: production
  • Homepage: https://csvl.github.io/SEMA/
  • Size: 1.42 GB
Statistics
  • Stars: 116
  • Watchers: 4
  • Forks: 23
  • Open Issues: 17
  • Releases: 0
Created almost 4 years ago · Last pushed 12 months ago
Metadata Files
Readme Changelog Contributing License

README.md

:skull_and_crossbones: SEMA :skull_and_crossbones:

Toolchain using Symbolic Execution for Malware Analysis.

Toolchain architecture

Our toolchain is represented in the following figure and works as follows:

  • A set of labelled binaries from different malware families is collected and used as input to the toolchain.
  • angr, a framework for symbolic execution, executes each binary symbolically and extracts execution traces. Several heuristics have been developed to optimize symbolic execution for this purpose.
  • The execution traces extracted for one binary (i.e., the API calls used and their arguments) are merged, using several graph heuristics, into an SCDG.
  • The resulting SCDGs are fed to a graph mining step, which extracts the subgraphs common to the SCDGs of one family and uses them as that family's signature.
  • Finally, when a new sample has to be classified, its SCDG is built and compared with the SCDGs of known families using a simple similarity metric.

Toolchain Illustration
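
The last step of the list above can be illustrated with a small, self-contained sketch. It is not SEMA's implementation: the edge-coverage score, the `classify` helper, the threshold value and the family names are all assumptions chosen for illustration, and graphs are reduced to sets of (caller, callee) edges.

```python
def coverage(signature_edges, sample_edges):
    """Fraction of the signature's edges that also occur in the sample.

    Hypothetical stand-in for the "simple similarity metric"; the metric
    actually used by the toolchain may differ.
    """
    if not signature_edges:
        return 0.0
    return len(signature_edges & sample_edges) / len(signature_edges)


def classify(sample_edges, signatures, threshold=0.5):
    """Return (family, score) for the best match, or (None, score) if the
    best score stays below the (assumed) threshold."""
    best_family, best_score = None, 0.0
    for family, signature_edges in signatures.items():
        score = coverage(signature_edges, sample_edges)
        if score > best_score:
            best_family, best_score = family, score
    if best_score < threshold:
        return None, best_score
    return best_family, best_score


# Hypothetical family signatures, each a set of call-to-call edges.
signatures = {
    "family_a": {("open", "write"), ("write", "close")},
    "family_b": {("socket", "connect"), ("connect", "send"), ("send", "close")},
}
sample = {("socket", "connect"), ("connect", "send"), ("open", "read")}

family, score = classify(sample, signatures)
print(family, round(score, 2))
```

Normalizing by the size of the signature keeps the score in [0, 1], so one threshold can be shared across families of different sizes.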

This repository contains a first version of an SCDG extractor. During the symbolic analysis of a binary, every system call encountered is recorded together with its arguments. Once a stop condition for the symbolic analysis is reached, a graph is built as follows: nodes are the recorded system calls, and an edge between two calls indicates that they share some arguments.
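
As a rough illustration of this construction, the sketch below builds a toy graph from a recorded trace. The trace format (a list of `(call_name, args)` tuples), the `build_scdg` name and the example values are assumptions made for this example; the real extractor works on angr states and applies additional heuristics.

```python
from itertools import combinations


def build_scdg(trace):
    """Build a toy SCDG from a list of (call_name, args) records.

    Nodes are the positions of the recorded calls in the trace. An edge
    (i, j) with i < j is added when calls i and j share at least one
    argument value.
    """
    nodes = {i: name for i, (name, _args) in enumerate(trace)}
    edges = set()
    for (i, (_, args_i)), (j, (_, args_j)) in combinations(enumerate(trace), 2):
        if set(args_i) & set(args_j):
            edges.add((i, j))
    return nodes, edges


# Hypothetical trace: the file descriptor "fd3" links the first three calls.
trace = [
    ("open", ("/tmp/a.txt", "fd3")),
    ("write", ("fd3", "buf1")),
    ("close", ("fd3",)),
    ("socket", ("AF_INET", "fd4")),
]

nodes, edges = build_scdg(trace)
print(sorted(edges))
```

Comparing every pair of calls is quadratic in the trace length, which is acceptable for a sketch but would likely need an index from argument value to calls for long traces.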

When a new sample has to be evaluated, its SCDG is first built as described above. Then gSpan is applied to extract the largest common subgraph, and a similarity score is computed to decide whether the graph belongs to the family. The similarity score S between graphs G' and G'' is computed as follows: Since G'' is a subgraph of G', this measures how much of G' appears in G''.

Another classifier we use is a Support Vector Machine (SVM) with the INRIA graph kernel or the Weisfeiler-Lehman extension graph kernel.
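
To give an idea of what a Weisfeiler-Lehman graph kernel computes, here is a minimal pure-Python sketch of the standard WL subtree kernel. It is not taken from SEMA: the function names, the graph encoding (a label dictionary plus an edge list, treated as undirected) and the example graphs are assumptions. The repository's dependency list includes grakel, a graph kernel library, but how the toolchain calls it is not shown here.

```python
from collections import Counter


def wl_features(labels, edges, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of a labelled graph.

    labels: dict mapping node -> initial label (here, a call name)
    edges:  iterable of (u, v) pairs, treated as undirected
    """
    neighbours = {node: set() for node in labels}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels,
        # serialized as a string so labels stay comparable across graphs.
        current = {
            node: repr((current[node], sorted(current[m] for m in neighbours[node])))
            for node in current
        }
        features.update(current.values())
    return features


def wl_kernel(graph_a, graph_b, iterations=2):
    """Dot product of the WL label counts of two (labels, edges) graphs."""
    feats_a = wl_features(graph_a[0], graph_a[1], iterations)
    feats_b = wl_features(graph_b[0], graph_b[1], iterations)
    return sum(feats_a[k] * feats_b[k] for k in feats_a.keys() & feats_b.keys())


# Two hypothetical three-call graphs that differ in their middle call.
graph_a = ({0: "open", 1: "write", 2: "close"}, [(0, 1), (1, 2)])
graph_b = ({0: "open", 1: "read", 2: "close"}, [(0, 1), (1, 2)])

print(wl_kernel(graph_a, graph_a))  # 9: three distinct labels per round, three rounds
print(wl_kernel(graph_a, graph_b))  # 2: only the initial "open" and "close" labels match
```

The resulting kernel values can be assembled into a Gram matrix and passed to an SVM that accepts precomputed kernels.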

A web application called SemaWebApp is available. It manages the launch of experiments on SemaSCDG and/or SemaClassifier.

Pre-commit

This repository uses pre-commit to ensure that the code is clean and correctly formatted. To install pre-commit, run the following commands:

    python3 -m pip install pre-commit
    pre-commit install

Documentation

  • Complete README of the entire toolchain: Sema README

  • SCDG: SCDG README

  • Classifier: Classifier README

  • Web app: Web app README

  • A Makefile is provided to ease the use of the toolchain; run make help for more information about the available commands.

Credentials

Main authors of the project:

  • Charles-Henry Bertrand Van Ouytsel (UCLouvain)

  • Christophe Crochet (UCLouvain)

  • Khanh Huu The Dam (UCLouvain)

  • Oreins Manon (UCLouvain)

Under the supervision and with the support of Fabrizio Biondi (Avast)

Under the supervision and with the support of our professor Axel Legay (UCLouvain) (:heart:)

Linked papers

Owner

  • Name: csvl
  • Login: csvl
  • Kind: organization

GitHub Events

Total
  • Issues event: 1
  • Watch event: 28
  • Push event: 6
  • Pull request review event: 3
  • Pull request event: 20
  • Fork event: 8
  • Create event: 5
Last Year
  • Issues event: 1
  • Watch event: 28
  • Push event: 6
  • Pull request review event: 3
  • Pull request event: 20
  • Fork event: 8
  • Create event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 22 days
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 22 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Thib-fkr (1)
Pull Request Authors
  • dependabot[bot] (8)
  • Thib-fkr (4)
  • ElNiak (3)
  • codacy-badger (2)
  • richarddej (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (8) python (8)

Dependencies

pyproject.toml pypi
.github/workflows/pr-generate-docs.yaml actions
  • actions/checkout v3 composite
  • stefanzweifel/git-auto-commit-action v4 composite
.github/workflows/python-app.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
requirements.txt pypi
  • PyYAML ==6.0.1
  • mkdocs ==1.5.0
  • mkdocs-material ==9.1.15
  • mkdocs-material-extensions *
  • mkgendocs ==0.9.2
  • pymdown-extensions *
.github/workflows/pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pre-commit/action v3.0.1 composite
requirement-mkdoc.txt pypi
  • Flask-Cors ==3.0.10
  • PyYAML ==6.0.1
  • angr ==9.2.21
  • avatar2 *
  • claripy ==9.2.21
  • click ==8.1.3
  • cryptography *
  • dill *
  • django *
  • flask *
  • flask ==2.2.5
  • flask_session *
  • gensim *
  • grakel *
  • graphviz *
  • kvm *
  • libvirt-python *
  • logbook *
  • markdown-callouts *
  • matplotlib *
  • minidump ==0.0.10
  • mkdocs *
  • mkdocs-awesome-pages-plugin *
  • mkdocs-coverage *
  • mkdocs-enumerate-headings-plugin *
  • mkdocs-exclude *
  • mkdocs-gen-files *
  • mkdocs-git-authors-plugin *
  • mkdocs-git-revision-date-localized-plugin *
  • mkdocs-img2fig-plugin *
  • mkdocs-literate-nav *
  • mkdocs-material *
  • mkdocs-material-extensions *
  • mkdocs-minify-plugin *
  • mkdocs-print-site-plugin *
  • mkdocs-same-dir *
  • mkdocs-section-index *
  • mkdocs-table-reader-plugin *
  • mkdocstrings *
  • mkgendocs *
  • mknotebooks *
  • mmh3 *
  • monkeyhex *
  • nose *
  • notebook *
  • npf-web-extension *
  • numpy *
  • pandas *
  • progressbar *
  • protobuf ==3.20.
  • psutil *
  • pygraphviz *
  • pyinstaller *
  • pymdown-extensions *
  • pymongo *
  • pyzipper *
  • r2pipe *
  • requests *
  • researchpy *
  • scikit-learn *
  • seaborn *
  • tasks *
  • termcolor *
  • terminal_banner *
  • torch *
  • torchvision *
  • unipacker *
  • unix *
.github/workflows/.pre-commit-config.yaml actions