java-vulnerability-patch-retriever
Semantic retrieval system for suggesting vulnerability-patch commits in Java. It combines BM25 and Sentence Transformers to find relevant fixes, evaluated with standard metrics. Based on the curated TUHH-SoftSec dataset.
https://github.com/lucascandia/java-vulnerability-patch-retriever
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
A Manually Curated Dataset of Vulnerability Introducing Commits In Java
Research on identifying vulnerabilities and the commits that introduce them is ongoing. However, many current methods rely heavily on automation, which can produce a high rate of false positives and requires significant error checking. To address this issue, we developed a tool-assisted pipeline to manually review and examine vulnerabilities and their corresponding commits. Additionally, we collected relevant metadata such as the modified lines of code and the mapping of CVE and CWE categories. This dataset can be used to validate automated methods such as machine learning approaches.
Dataset Description
The complete dataset can be found here.
It is structured as a JSON file with the following fields:
JSON Fields
| Fieldname | Brief |
| --- | --- |
| cwe | Common Weakness Enumeration ID |
| introducing | Commit hash that introduces the vulnerability |
| introstats | Number of lines added/deleted in the introducing commit |
| introlines | Lines marked as vulnerable in the introducing commit |
| fixingstats | Number of lines added/deleted in the fixing commits |
| fixinglines | Lines marked as fixing the vulnerability in the fixing commit |
| days_between | Days between the identified introducing and fixing commits |
Example
```json
{
  "cve": "CVE-2019-11274",
  "cwe": "CWE-79",
  "repository": "https://github.com/cloudfoundry/uaa",
  "fixing": [
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7"
  ],
  "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
  "introstats": {
    "bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {
      "add": 320,
      "del": 7
    }
  },
  "fixingstats": {
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7": {
      "add": 68,
      "del": 17
    }
  },
  "daysbetween": 1836,
  "fixinglines": {
    "server/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "168"
  },
  "introducing_lines": {
    "scim/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "190"
  }
}
```
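As a rough illustration of how such records might be consumed, the following Python sketch parses a list of entries and summarizes the churn of the introducing and fixing commits. It assumes the dataset file holds a JSON list of objects shaped like the example; the helper names `churn` and `summarize` are hypothetical and not part of the dataset tooling. Because the field table spells the key `days_between` while the example uses `daysbetween`, the sketch accepts either spelling.

```python
import json

# One record with values copied from the example above. Assumption: the real
# dataset file is a JSON list of such objects.
SAMPLE = """
[
  {
    "cve": "CVE-2019-11274",
    "cwe": "CWE-79",
    "repository": "https://github.com/cloudfoundry/uaa",
    "fixing": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"],
    "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
    "introstats": {"bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {"add": 320, "del": 7}},
    "fixingstats": {"a34f55fc97a81966faf21e3ae404ec24f1f31cf7": {"add": 68, "del": 17}},
    "daysbetween": 1836
  }
]
"""


def churn(stats):
    """Sum added and deleted lines over all commits in a stats mapping."""
    added = sum(commit["add"] for commit in stats.values())
    deleted = sum(commit["del"] for commit in stats.values())
    return added, deleted


def summarize(entry):
    """Condense one record into a small dictionary (hypothetical helper)."""
    # Accept both spellings of the key; this is an assumption of the sketch.
    days = entry.get("daysbetween", entry.get("days_between"))
    return {
        "cve": entry["cve"],
        "cwe": entry["cwe"],
        "intro_churn": churn(entry["introstats"]),
        "fix_churn": churn(entry["fixingstats"]),
        "days_between": days,
    }


entries = json.loads(SAMPLE)
summaries = [summarize(entry) for entry in entries]
for summary in summaries:
    print(summary)
```

To run this against the real file, replace `json.loads(SAMPLE)` with `json.load(...)` on an opened file handle, provided the file has the assumed list layout.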
Review Pipeline Instructions
Prerequisites
| Software | Used Version |
| --- | --- |
| Python3 | 3.10.8 |
| pip3 | 22.3.1 |
| git | 2.29.0 |
| Web browser of choice | Safari 16.1 |
Setup
To install all required Python packages, run the following command inside the review_pipeline directory:
- python3 -m pip install -r requirements.txt
Usage
The pipeline can be executed by the following command inside the review_pipeline directory:
- python3 manual_analysis_pipeline.py <path_to_input_dataset>
Input Dataset
The input dataset is expected to be a JSON file with the following fields:
| Fieldname | Brief |
| --- | --- |
| cveid | CVE id of the vulnerability |
| repository | URL to the repository |
| fixingcommits | List of fixing commit SHA-1 hashes |
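Before starting a review session, it may help to check that an input file matches this layout. The sketch below is one possible pre-flight check, not part of the pipeline itself: `validate_entry` is a hypothetical helper, the field names are taken as printed in the table, and it assumes that the file is a JSON list of objects and that each hash is a full 40-character SHA-1 string.

```python
import json
import re

# Field names as printed in the table above.
REQUIRED_FIELDS = ("cveid", "repository", "fixingcommits")
# Assumption: hashes are full 40-character hexadecimal SHA-1 strings.
SHA1_PATTERN = re.compile(r"[0-9a-fA-F]{40}")


def validate_entry(entry):
    """Return a list of human-readable problems found in one input record."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "fixingcommits" in entry:
        commits = entry["fixingcommits"]
        if not isinstance(commits, list) or not commits:
            problems.append("fixingcommits must be a non-empty list")
        else:
            for sha in commits:
                if not (isinstance(sha, str) and SHA1_PATTERN.fullmatch(sha)):
                    problems.append(f"not a full SHA-1 hash: {sha!r}")
    return problems


# Hypothetical input: the first record reuses values from the dataset example,
# the second is deliberately malformed to show the error messages.
SAMPLE_INPUT = json.loads("""
[
  {"cveid": "CVE-2019-11274",
   "repository": "https://github.com/cloudfoundry/uaa",
   "fixingcommits": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"]},
  {"cveid": "CVE-0000-0000", "fixingcommits": ["a34f55f"]}
]
""")

report = {entry.get("cveid", "<unknown>"): validate_entry(entry) for entry in SAMPLE_INPUT}
for cve, problems in report.items():
    print(cve, "OK" if not problems else problems)
```

Returning a list of problems instead of raising on the first one lets a single pass report every issue in a record.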
Citation (citation.cff)
cff-version: 1.2.0
type: dataset
message: "If you use this dataset, please cite it as below."
authors:
  - family-names: "Hinrichs"
    given-names: "Torge"
    orcid: "https://orcid.org/0000-0001-7489-3540"
  - family-names: "Scandariato"
    given-names: "Riccardo"
    orcid: "https://orcid.org/0000-0003-3591-7671"
title: "A Manually Curated Dataset of Vulnerability Introducing Commits In Java"
version: 1.0.0
doi: 10.5281/zenodo.7565542
date-released: 25.01.2023
url: "https://github.com/tuhh-softsec/A-Manually-Curated-Dataset-of-Vulnerability-Introducing-Commits-in-Java"
GitHub Events
Total
- Push event: 4
- Create event: 3
Last Year
- Push event: 4
- Create event: 3
Dependencies
- GitPython *
- PyDriller *
- browser *