https://github.com/0xk1h0/git_cwe_collect

Felina

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 0xk1h0
  • Language: Python
  • Default Branch: main
  • Size: 38.1 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme

README.md

Felina


Just run this single command: ./collect_100k.sh

📋 What it does automatically:

  1. 75+ different search queries across:
    • CVE years (2020-2024)
    • Vulnerability types (XSS, SQLi, buffer overflow, etc.)
    • Languages (Python, Java, C, JavaScript, etc.)
    • CWE-specific searches
    • Date ranges
    • Security terms
  2. Smart duplicate removal at multiple levels:
    • Commit-level deduplication
    • Code-content deduplication
    • Cross-file deduplication
  3. Progress tracking:
    • Shows real-time progress toward 100k target
    • Stops automatically when target reached
    • Saves execution logs
  4. Automatic combination:
    • Merges all CSV files into one final dataset
    • Removes duplicates across all files
    • Provides final statistics
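The deduplication and merge steps listed above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual implementation: the column names `commit_sha` and `code`, the helper names, and the whitespace-collapsing normalization are all assumptions made for the example, and it assumes one row per commit.

```python
import csv
import hashlib
import io


def content_key(code: str) -> str:
    """Hash code after collapsing whitespace so reformatted copies collide."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_unique(csv_streams, target_size=None):
    """Merge rows from several CSV streams, skipping repeated commits and repeated code.

    Column names ("commit_sha", "code") are hypothetical; adjust to the real schema.
    """
    seen_commits = set()  # commit-level deduplication
    seen_code = set()     # code-content deduplication, shared across all files
    merged = []
    for stream in csv_streams:
        for row in csv.DictReader(stream):
            commit = row["commit_sha"]
            key = content_key(row["code"])
            if commit in seen_commits or key in seen_code:
                continue
            seen_commits.add(commit)
            seen_code.add(key)
            merged.append(row)
            # Stop early once the requested number of unique samples is reached.
            if target_size is not None and len(merged) >= target_size:
                return merged
    return merged


if __name__ == "__main__":
    first = io.StringIO("commit_sha,code\nabc,x = 1\ndef,x  =  1\n")
    second = io.StringIO("commit_sha,code\nabc,y = 2\nghi,z = 3\n")
    rows = merge_unique([first, second])
    print(f"{len(rows)} unique samples")
```

Keeping the two `seen_*` sets outside the per-file loop is what makes the check work across files as well as within one file.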

📊 Expected Results:

  • 100,000+ unique vulnerability code samples
  • High CWE coverage (20+ different vulnerability types)
  • Multi-language support (Python, Java, C, JavaScript, etc.)
  • 4-6 hours runtime (runs completely unattended)

🔧 Manual control available:

Collect specific strategies only: python3 collectmassivedataset.py --strategies cveyears languagespecific --target-size 50000

Custom target size: python3 collectmassivedataset.py --target-size 200000
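For readers curious how flags like these are typically handled, here is a minimal `argparse` sketch that accepts the two options shown in the commands above. It is an assumption about the interface, not the script's real parser: the function name `build_parser`, the help strings, and the default of 100,000 (taken from the 100k target mentioned earlier) are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown above (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Collect vulnerability code samples (illustrative sketch)."
    )
    parser.add_argument(
        "--strategies",
        nargs="+",       # one or more strategy names, space separated
        default=None,    # None is treated here as "run every strategy"
        help="subset of collection strategies to run",
    )
    parser.add_argument(
        "--target-size",
        type=int,
        default=100_000,  # assumed default, based on the 100k target
        help="stop once this many unique samples are collected",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.strategies, args.target_size)
```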

Owner

  • Name: LEE KIHO
  • Login: 0xk1h0
  • Kind: user
  • Location: Seoul
  • Company: SKKU

GitHub Events

Total
  • Push event: 1
  • Create event: 1
Last Year
  • Push event: 1
  • Create event: 1