
Repository

Basic Info
  • Host: GitHub
  • Owner: lodetomasi1995
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 66.4 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Obfuscation Semantics

License: MIT

A framework for evaluating LLM-driven code obfuscation using novel metrics such as Semantic Elasticity.

Overview

This repository contains the code and resources for the paper "Simplicity by Obfuscation: Evaluating LLM-Driven Code Transformation with Semantic Elasticity". It provides a comprehensive framework for:

  1. Evaluating code obfuscation capabilities of different LLMs
  2. Comparing standard and few-shot prompting approaches
  3. Measuring obfuscation effectiveness using novel metrics
  4. Benchmarking across diverse algorithmic patterns

Features

  • Multi-Model Support: Test Claude-3.5-Sonnet, Gemini-1.5, and GPT-4-Turbo
  • Diverse Algorithmic Patterns: 30 functions across 5 categories
  • Comprehensive Metrics: Includes pass rate, complexity, entropy, timing, and Semantic Elasticity
  • Configurable Experiments: Easily customize which models, functions, and approaches to test
  • Detailed Analysis: Generate visualizations and statistical comparisons
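Two of the metrics listed above, complexity and entropy, can be approximated with the standard library alone. The sketch below is illustrative and is not taken from this repository: the function names `approx_cyclomatic_complexity` and `char_entropy` are hypothetical, the complexity count is a simplified McCabe-style approximation, and character-level Shannon entropy is an assumption about how entropy might be measured.

```python
import ast
import math
from collections import Counter

# Node types counted as decision points (a simplified approximation).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.comprehension,
)


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of and/or adds one short-circuit branch.
            complexity += len(node.values) - 1
    return complexity


def char_entropy(source: str) -> float:
    """Shannon entropy (bits per character) of the source text."""
    if not source:
        return 0.0
    total = len(source)
    counts = Counter(source)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


GCD_SOURCE = '''
def gcd(a, b):
    if b == 0:
        return a
    return gcd(b, a % b)
'''

print(approx_cyclomatic_complexity(GCD_SOURCE))
print(round(char_entropy(GCD_SOURCE), 3))
```

Comparing these values before and after obfuscation gives a rough sense of how much the structure and surface form of a function changed.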

Requirements

  • Python 3.9+
  • Required packages (install via pip install -r requirements.txt):
    • anthropic
    • google-generativeai
    • openai
    • pandas
    • numpy
    • matplotlib
    • seaborn
    • scipy

Setup

  1. Clone the repository:

     ```bash
     git clone https://github.com/yourusername/obfuscation-semantics.git
     cd obfuscation-semantics
     ```

  2. Install dependencies:

     ```bash
     pip install -r requirements.txt
     ```

  3. Configure API keys:

    • Create a copy of config/example_config.json as config/config.json
    • Add your API keys for Claude, Gemini, and GPT-4
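Before launching an experiment, it can help to check that the copied config file actually contains all three keys. The following is a minimal sketch under stated assumptions: the key names in `REQUIRED_KEYS` and the helper `load_config` are hypothetical and are not taken from this repository, so consult config/example_config.json for the real field names.

```python
import json
from pathlib import Path

# Hypothetical key names; the real ones are defined in config/example_config.json.
REQUIRED_KEYS = ("anthropic_api_key", "gemini_api_key", "openai_api_key")


def load_config(path):
    """Load a JSON config file and fail early if any API key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"Missing or empty keys in {path}: {', '.join(missing)}")
    return config
```

Calling `load_config("config/config.json")` would then raise a `ValueError` naming the absent keys instead of failing later during an API call.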

Usage

Running Experiments

```bash
python src/experiment_runner.py --config config/config.json
```

You can also pass API keys directly:

```bash
python src/experiment_runner.py --anthropic-key YOUR_ANTHROPIC_KEY --gemini-key YOUR_GEMINI_KEY --openai-key YOUR_OPENAI_KEY
```

Analyzing Results

```bash
python analysis/analysis.py --results-dir results/data
```

Generate visualizations:

```bash
python analysis/visualize.py --results-dir results/data
```

Project Structure

```
obfuscation-semantics/
├── README.md
├── CITATION.cff
├── LICENSE
├── requirements.txt
├── config/
│   └── example_config.json
├── src/
│   ├── code_obfuscator.py
│   ├── semantic_elasticity.py
│   ├── experiment_runner.py
│   ├── test_cases.py
│   └── sample_functions.py
├── analysis/
│   ├── analysis.py
│   └── visualize.py
├── examples/
│   ├── standard_prompt_examples/
│   └── few_shot_examples/
└── results/
    ├── figures/
    └── data/
```

Core Components

  • code_obfuscator.py: Handles interaction with LLMs for code obfuscation
  • semantic_elasticity.py: Implements the novel Semantic Elasticity metric
  • experiment_runner.py: Coordinates execution of trials and collects results
  • test_cases.py: Provides comprehensive test cases for function validation
  • sample_functions.py: Contains the 30 functions used for benchmarking

Semantic Elasticity Metric

Our Semantic Elasticity (SE) metric quantifies a model's ability to radically transform code structure while maintaining functionality:

SE = |ΔCC| × P² / E

Where:

  • |ΔCC|: Absolute change in cyclomatic complexity between the original and the obfuscated code
  • P²: Square of the pass rate, to emphasize functional correctness
  • E: Code expansion ratio (in the denominator, so greater expansion lowers SE)

Higher values indicate more effective transformations that maintain functionality while significantly changing code structure.
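The formula can be written as a small function. This is a minimal sketch rather than the code in semantic_elasticity.py: the function name and signature are hypothetical, the pass rate is assumed to be a fraction in [0, 1], and the expansion ratio is assumed to be obfuscated code size divided by original code size.

```python
def semantic_elasticity(cc_original, cc_obfuscated, pass_rate, expansion_ratio):
    """Compute SE = |ΔCC| × P² / E.

    Assumptions (not confirmed by the repository):
    - pass_rate is a fraction in [0, 1]
    - expansion_ratio is obfuscated size / original size, and is positive
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    if expansion_ratio <= 0:
        raise ValueError("expansion_ratio must be positive")
    delta_cc = abs(cc_obfuscated - cc_original)
    return delta_cc * pass_rate ** 2 / expansion_ratio


# Complexity rises from 3 to 9, all tests pass, code doubles in size.
print(semantic_elasticity(3, 9, pass_rate=1.0, expansion_ratio=2.0))  # 3.0

# Same transformation, but only half of the tests pass.
print(semantic_elasticity(3, 9, pass_rate=0.5, expansion_ratio=2.0))  # 0.75
```

Because the pass rate is squared, halving it cuts SE to a quarter, which is how the metric penalizes transformations that break functionality.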

Dataset

| Function Name | Category | Description | Algorithmic Pattern | Complexity | Uses Recursion | Parameters |
|---------------|----------|-------------|---------------------|------------|----------------|------------|
| factorial | Mathematical | Calculate factorial recursively | Recursive | O(n) | Yes | 1 |
| fibonacci | Mathematical | Calculate Fibonacci number | Recursive with overlapping subproblems | O(2ⁿ) | Yes | 1 |
| isprime | Mathematical | Check if number is prime | Conditional logic | O(√n) | No | 1 |
| gcd | Mathematical | Find greatest common divisor | Euclidean algorithm | O(log(min(a,b))) | Yes | 2 |
| lcm | Mathematical | Find least common multiple | Mathematical calculation | O(log(min(a,b))) | Yes (via gcd) | 2 |
| power | Mathematical | Calculate power recursively | Recursive exponentiation | O(log n) | Yes | 2 |
| sqrtnewton | Mathematical | Calculate square root | Newton's method | O(log n) | No | 2 |
| bubblesort | Sorting/Searching | Sort array using bubble sort | Nested iterations | O(n²) | No | 1 |
| binarysearch | Sorting/Searching | Search in sorted array | Divide-and-conquer | O(log n) | No | 2 |
| mergesort | Sorting/Searching | Sort using merge sort | Divide-and-conquer with recursion | O(n log n) | Yes | 1 |
| quicksort | Sorting/Searching | Sort using quick sort | Partition-based sorting | O(n log n) avg, O(n²) worst | Yes | 1 |
| insertionsort | Sorting/Searching | Sort using insertion | Iterative insertion | O(n²) | No | 1 |
| linearsearch | Sorting/Searching | Search in unsorted array | Simple iteration | O(n) | No | 2 |
| strreverse | String Manipulation | Reverse a string | Simple string manipulation | O(n) | No | 1 |
| ispalindrome | String Manipulation | Check if string is palindrome | String testing | O(n) | No | 1 |
| wordcount | String Manipulation | Count words in text | Basic text processing | O(n) | No | 1 |
| longestcommonsubstring | String Manipulation | Find common substring | Dynamic programming | O(m×n) | No | 2 |
| levenshteindistance | String Manipulation | Calculate edit distance | Edit distance algorithm | O(m×n) | No | 2 |
| countvowels | String Manipulation | Count vowels in string | Character filtering | O(n) | No | 1 |
| flattenlist | Data Structure | Flatten nested list | Recursive list transformation | O(n) | Yes | 1 |
| listpermutations | Data Structure | Generate all permutations | Combinatorial algorithm | O(n!) | Yes | 1 |
| dictmerge | Data Structure | Merge dictionaries recursively | Nested structure merging | O(n+m) | Yes | 2 |
| removeduplicates | Data Structure | Remove duplicates from list | Set operations | O(n) | No | 1 |
| rotatearray | Data Structure | Rotate array elements | Array manipulation | O(n) | No | 2 |
| towerofhanoi | Recursive | Solve Tower of Hanoi puzzle | Classic recursion problem | O(2ⁿ) | Yes | 1 |
| binarytreedepth | Recursive | Find max depth of binary tree | Tree traversal | O(n) | Yes | 1 |
| floodfill | Recursive | Perform flood fill on image | Graph traversal | O(n) | Yes | 4 |
| knapsack | Recursive | Solve knapsack problem | Optimization problem | O(n×W) | Yes | 3 |
| editdistance | Recursive | Calculate edit distance | String comparison | O(m×n) | No | 2 |
| coin_change | Recursive | Find minimum coins for amount | Dynamic programming | O(n×amount) | No | 2 |
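To illustrate how a dataset entry might be checked for functional equivalence, the sketch below compares a reference `gcd` against a hand-written iterative variant that stands in for an LLM-obfuscated version. None of this is taken from the repository: `gcd_obfuscated`, `pass_rate`, and the test inputs are hypothetical, and the pass rate is assumed to be the fraction of inputs on which the candidate matches the reference.

```python
def gcd(a, b):
    """Reference implementation (recursive Euclidean algorithm)."""
    return a if b == 0 else gcd(b, a % b)


def gcd_obfuscated(a, b):
    """Hand-written stand-in for an obfuscated variant (iterative form)."""
    while b:
        a, b = b, a % b
    return a


def pass_rate(reference, candidate, inputs):
    """Fraction of inputs on which candidate returns the reference result."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    passed = 0
    for args in inputs:
        try:
            if candidate(*args) == reference(*args):
                passed += 1
        except Exception:
            pass  # a crashing candidate counts as a failed case
    return passed / len(inputs)


test_inputs = [(48, 18), (17, 5), (0, 9), (100, 75)]
print(pass_rate(gcd, gcd_obfuscated, test_inputs))  # 1.0
```

A candidate that always returns 1 would match only the coprime pair (17, 5) here, giving a pass rate of 0.25.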

License

This project is licensed under the MIT License - see the LICENSE file for details.

Owner

  • Login: lodetomasi1995
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: De Tomasi
    given-names: Lorenzo
    affiliation: "University of L'Aquila"
    email: lorenzo.detomasi@graduate.univaq.it
  - family-names: Di Sipio
    given-names: Claudio
    affiliation: "University of L'Aquila"
    email: claudio.disipio@univaq.it
  - family-names: Di Marco
    given-names: Antinisca
    affiliation: "University of L'Aquila"
    email: antinisca.dimarco@univaq.it
  - family-names: Nguyen
    given-names: Phuong T.
    affiliation: "University of L'Aquila"
    email: phuong.nguyen@univaq.it
title: "Obfuscation Semantics: Evaluating LLM-Driven Code Transformation with Semantic Elasticity"
version: 1.0.0
date-released: 2025-01-15
url: "https://github.com/lorenzodetomasi/obfuscation-semantics"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: De Tomasi
      given-names: Lorenzo
      affiliation: "University of L'Aquila"
      email: lorenzo.detomasi@graduate.univaq.it
    - family-names: Di Sipio
      given-names: Claudio
      affiliation: "University of L'Aquila"
      email: claudio.disipio@univaq.it
    - family-names: Di Marco
      given-names: Antinisca
      affiliation: "University of L'Aquila"
      email: antinisca.dimarco@univaq.it
    - family-names: Nguyen
      given-names: Phuong T.
      affiliation: "University of L'Aquila"
      email: phuong.nguyen@univaq.it
  title: "Simplicity by Obfuscation: Evaluating LLM-Driven Code Transformation with Semantic Elasticity"
  collection-title: "Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering"
  year: 2025
  month: 6
  publisher:
    name: "ACM"
  conference:
    name: "EASE 2025"
    city: "Istanbul"
    country: "Turkey"
    date-start: "2025-06-17"
    date-end: "2025-06-20"

GitHub Events

Total
  • Push event: 11
  • Create event: 2
Last Year
  • Push event: 11
  • Create event: 2

Dependencies

requirements.txt pypi
  • anthropic ==0.2.10
  • black ==23.7.0
  • google-generativeai ==0.4.0
  • matplotlib ==3.8.0
  • numpy ==1.25.0
  • openai ==1.6.0
  • openpyxl ==3.1.2
  • pandas ==2.1.0
  • pytest ==7.4.0
  • scipy ==1.11.3
  • seaborn ==0.13.0