https://github.com/csvl/sema
SEMA is based on angr, a symbolic execution engine used here to extract API calls. In particular, we extend angr with strategies to create representative signatures based on System Call Dependency Graphs (SCDGs). These SCDGs can then be used by machine learning modules for classification and detection.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: csvl
- License: bsd-2-clause
- Language: Python
- Default Branch: production
- Homepage: https://csvl.github.io/SEMA/
- Size: 1.42 GB
Statistics
- Stars: 116
- Watchers: 4
- Forks: 23
- Open Issues: 17
- Releases: 0
Metadata Files
README.md
:skull_and_crossbones: SEMA :skull_and_crossbones:
ToolChain using Symbolic Execution for Malware Analysis.
Toolchain architecture
Our toolchain is represented in the following figure and works as follows:
- Labelled binaries from different malware families are collected and used as the input of the toolchain.
- angr, a framework for symbolic execution, executes the binaries symbolically and extracts execution traces. Several heuristics have been developed to optimize this symbolic execution.
- The execution traces (i.e., the API calls used and their arguments) extracted with angr for one binary are merged using several graph heuristics to construct an SCDG.
- The resulting SCDGs are then fed to graph mining, which extracts the graphs common to SCDGs of the same family and creates a signature.
- Finally, when a new sample has to be classified, its SCDG is built and compared with the SCDGs of known families using a simple similarity metric.

This repository contains a first version of an SCDG extractor. During the symbolic analysis of a binary, all system calls found and their arguments are recorded. Once a stop condition for the symbolic analysis is reached, a graph is built as follows: nodes are the recorded system calls, and an edge between two calls shows that they share some arguments.
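The graph construction described above can be illustrated with a minimal sketch. It is not the extractor shipped in this repository: the function name `build_scdg`, the trace format, and the sample calls are all hypothetical, and an edge is added only when two calls share an identical argument value, which is a simplifying assumption.

```python
from itertools import combinations


def build_scdg(trace):
    """Build a toy SCDG from one execution trace.

    `trace` is a list of (call_name, args) tuples in execution order
    (a hypothetical format chosen for this sketch).
    Returns (nodes, edges): `nodes` lists the call names by position and
    `edges` holds (i, j) pairs, i < j, for calls sharing an argument value.
    """
    nodes = [name for name, _ in trace]
    edges = set()
    for (i, (_, args_i)), (j, (_, args_j)) in combinations(enumerate(trace), 2):
        # Two calls are linked when at least one argument value is common.
        if set(args_i) & set(args_j):
            edges.add((i, j))
    return nodes, edges


# Hypothetical trace: the handle 0x1C is passed to two of the calls.
trace = [
    ("CreateFileA", ("C:\\tmp\\a.txt", 0x40000000)),
    ("WriteFile", (0x1C, "payload")),
    ("CloseHandle", (0x1C,)),
    ("GetTickCount", ()),
]
nodes, edges = build_scdg(trace)
print(nodes)
print(sorted(edges))
```

Here only `WriteFile` and `CloseHandle` end up connected, because they are the only calls that share an argument value.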
When a new sample has to be evaluated, its SCDG is first built as described above. Then, gSpan is applied to extract the biggest common subgraph, and a similarity score is computed to decide whether the graph belongs to the family. The similarity score S between graphs G' and G'' is computed as follows:
Since G'' is a subgraph of G', this score measures how much of G' is covered by G''.
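The decision step can be sketched as follows. The formula from the README is not reproduced in this text, so the ratio used below (nodes plus edges of G'' divided by nodes plus edges of G') is only an assumption chosen for illustration, as are the function names and the threshold value.

```python
def graph_size(nodes, edges):
    """Size of a graph, counted as number of nodes plus number of edges."""
    return len(nodes) + len(edges)


def similarity(g_second, g_prime):
    """Assumed score: share of G' covered by its subgraph G''.

    Both graphs are (nodes, edges) pairs. This ratio is an illustrative
    assumption, not necessarily the formula used by the toolchain.
    """
    size = graph_size(*g_prime)
    return graph_size(*g_second) / size if size else 0.0


def belongs_to_family(g_second, g_prime, threshold=0.4):
    """Hypothetical decision rule: accept when the score reaches the threshold."""
    return similarity(g_second, g_prime) >= threshold


# G': 4 nodes and 3 edges. G'': a common subgraph with 2 nodes and 1 edge.
g_prime = (["a", "b", "c", "d"], {(0, 1), (1, 2), (2, 3)})
g_second = (["a", "b"], {(0, 1)})
score = similarity(g_second, g_prime)
print(round(score, 3), belongs_to_family(g_second, g_prime))
```

With this assumed metric the score always lies between 0 and 1, because G'' is a subgraph of G'.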
Another classifier we use is a Support Vector Machine (SVM) with either the INRIA graph kernel or the Weisfeiler-Lehman extension graph kernel.
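To give an intuition for the Weisfeiler-Lehman idea, here is a minimal pure-Python sketch of a basic WL subtree kernel. It is a simplified textbook version, not the extension kernel used by the classifier: graphs are treated as undirected, node labels are plain strings, and the names `wl_features` and `wl_kernel` are hypothetical.

```python
from collections import Counter


def wl_features(labels, adjacency, iterations=2):
    """Count the node labels seen over all WL refinement rounds.

    labels: dict node -> initial string label (e.g. a system call name)
    adjacency: dict node -> list of neighbouring nodes (undirected)
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, label in current.items():
            # New label = old label plus the sorted labels of the neighbours.
            neighbours = sorted(current[n] for n in adjacency.get(node, ()))
            refined[node] = f"{label}({','.join(neighbours)})"
        current = refined
        features.update(current.values())
    return features


def wl_kernel(graph_a, graph_b, iterations=2):
    """Kernel value = dot product of the two label histograms."""
    fa = wl_features(*graph_a, iterations)
    fb = wl_features(*graph_b, iterations)
    return sum(count * fb[label] for label, count in fa.items())


# Two hypothetical call graphs given as (labels, adjacency).
g1 = ({0: "open", 1: "write", 2: "close"}, {0: [1], 1: [0, 2], 2: [1]})
g2 = ({0: "open", 1: "read"}, {0: [1], 1: [0]})
print(wl_kernel(g1, g1), wl_kernel(g1, g2))
```

A matrix of such kernel values between all pairs of training graphs could then be passed to an SVM that accepts a precomputed kernel. The project lists grakel and scikit-learn among its dependencies, which provide production implementations of these pieces.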
A web application, SemaWebApp, is also available. It manages the launch of experiments on SemaSCDG and/or SemaClassifier.
Pre-commit
This repository uses pre-commit to ensure that the code is clean and correctly formatted. To install pre-commit, run the following commands:
```bash
python3 -m pip install pre-commit
pre-commit install
```
Documentation
Complete README of the entire toolchain :
SCDG README :
Classifier README :
Web app README :
A Makefile is provided to ease the usage of the toolchain. Run `make help` for more information about the available commands.
Credentials
Main authors of the project:
Charles-Henry Bertrand Van Ouytsel (UCLouvain)
Christophe Crochet (UCLouvain)
Khanh Huu The Dam (UCLouvain)
Oreins Manon (UCLouvain)
Under the supervision and with the support of Fabrizio Biondi (Avast)
Under the supervision and with the support of our professor Axel Legay (UCLouvain) (:heart:)
Linked papers
Owner
- Name: csvl
- Login: csvl
- Kind: organization
- Repositories: 4
- Profile: https://github.com/csvl
GitHub Events
Total
- Issues event: 1
- Watch event: 28
- Push event: 6
- Pull request review event: 3
- Pull request event: 20
- Fork event: 8
- Create event: 5
Last Year
- Issues event: 1
- Watch event: 28
- Push event: 6
- Pull request review event: 3
- Pull request event: 20
- Fork event: 8
- Create event: 5
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 22 days
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 22 days
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Thib-fkr (1)
Pull Request Authors
- dependabot[bot] (8)
- Thib-fkr (4)
- ElNiak (3)
- codacy-badger (2)
- richarddej (2)
Dependencies
- actions/checkout v3 composite
- stefanzweifel/git-auto-commit-action v4 composite
- actions/checkout v4 composite
- actions/setup-python v3 composite
- PyYAML ==6.0.1
- mkdocs ==1.5.0
- mkdocs-material ==9.1.15
- mkdocs-material-extensions *
- mkgendocs ==0.9.2
- pymdown-extensions *
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pre-commit/action v3.0.1 composite
- Flask-Cors ==3.0.10
- PyYAML ==6.0.1
- angr ==9.2.21
- avatar2 *
- claripy ==9.2.21
- click ==8.1.3
- cryptography *
- dill *
- django *
- flask *
- flask ==2.2.5
- flask_session *
- gensim *
- grakel *
- graphviz *
- kvm *
- libvirt-python *
- logbook *
- markdown-callouts *
- matplotlib *
- minidump ==0.0.10
- mkdocs *
- mkdocs-awesome-pages-plugin *
- mkdocs-coverage *
- mkdocs-enumerate-headings-plugin *
- mkdocs-exclude *
- mkdocs-gen-files *
- mkdocs-git-authors-plugin *
- mkdocs-git-revision-date-localized-plugin *
- mkdocs-img2fig-plugin *
- mkdocs-literate-nav *
- mkdocs-material *
- mkdocs-material-extensions *
- mkdocs-minify-plugin *
- mkdocs-print-site-plugin *
- mkdocs-same-dir *
- mkdocs-section-index *
- mkdocs-table-reader-plugin *
- mkdocstrings *
- mkgendocs *
- mknotebooks *
- mmh3 *
- monkeyhex *
- nose *
- notebook *
- npf-web-extension *
- numpy *
- pandas *
- progressbar *
- protobuf ==3.20.
- psutil *
- pygraphviz *
- pyinstaller *
- pymdown-extensions *
- pymongo *
- pyzipper *
- r2pipe *
- requests *
- researchpy *
- scikit-learn *
- seaborn *
- tasks *
- termcolor *
- terminal_banner *
- torch *
- torchvision *
- unipacker *
- unix *