osptrack

labelled dataset for simulated package execution with package-analysis

https://github.com/wapiti08/osptrack

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary

Keywords

dataset dynamic-features open-source supply-chain-security
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: Wapiti08
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 23.7 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
dataset dynamic-features open-source supply-chain-security
Created over 2 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

OSPTrack

labelled dataset for simulated package execution with package-analysis

This work has been accepted at the MSR 2025 Data and Tool Showcase Track and will be presented on 28 April 2025.


Structure (core)

  • ana:

    • statistical analysis of the BKC dataset and the malicious-packages dataset
    • code to extract the metrics.csv and iocs.csv files
    • label distribution analysis for the labelled dataset
  • data:

    • data collected from BKC and malicious-packages
    • location for saving bkcmal.csv and pkgmal.csv
    • location for saving the extracted data and the final labelled dataset
  • data_create:

    • code to query BigQuery
    • code to run simulation
  • ext:

    • code to parse reports (JSON and CSV)
    • code to extract features and generate the final dataset
  • run_analysis.sh:

    custom shell script that runs package-analysis, saves the results locally, and avoids repeated runs
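The "save results locally and avoid repeated runs" idea behind run_analysis.sh can be illustrated with a minimal Python sketch. It is not taken from the repository: it assumes a hypothetical layout in which each finished analysis is stored as `<results_dir>/<ecosystem>/<package>-<version>.json`, and the helper names `result_path` and `pending_packages` are illustrative only.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


# Hypothetical result layout (an assumption, not taken from run_analysis.sh):
#   <results_dir>/<ecosystem>/<package>-<version>.json
def result_path(results_dir: Path, ecosystem: str, package: str, version: str) -> Path:
    """Return the path where the result for one package version would be saved."""
    return results_dir / ecosystem / f"{package}-{version}.json"


def pending_packages(
    results_dir: Path,
    ecosystem: str,
    packages: Iterable[Tuple[str, str]],
    is_done: Callable[[Path], bool] = Path.exists,
) -> List[Tuple[str, str]]:
    """Return the (package, version) pairs that still need to be analysed.

    Pairs whose result file already exists are skipped, and duplicates in the
    input are dropped, so re-running a batch does not repeat finished work.
    """
    pending: List[Tuple[str, str]] = []
    seen = set()
    for package, version in packages:
        if (package, version) in seen:
            continue  # duplicate entry in this batch
        seen.add((package, version))
        if not is_done(result_path(results_dir, ecosystem, package, version)):
            pending.append((package, version))
    return pending
```

For example, `pending_packages(Path("results"), "pypi", [("Django", "4.1.3")])` returns the pair only if no saved result exists for it yet; the `is_done` parameter just makes the check easy to swap out or test.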

Preparation (Environment Setup)

  • For BigQuery:

```
# download the BigQuery service-account key from Google Cloud
# activate the key; it needs to be loaded when querying BigQuery
export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-file.json"
```

  • For package-analysis:

```
# install git
sudo apt-get install git

# install Docker
sudo apt-get install -y docker.io

# start the Docker service
sudo systemctl start docker

# install Go
sudo apt-get install golang

# first run package-analysis directly to check that it works locally

# local instance (analyse a local package file)
scripts/run_analysis.sh -ecosystem pypi -package test -local /path/to/test.whl

# live instance (analyse a published package version)
scripts/run_analysis.sh -ecosystem pypi -package Django -version 4.1.3

# after one instance has run successfully, replace scripts/run_analysis.sh
# with the version provided in this repo and set its permissions to 755
chmod 755 scripts/run_analysis.sh
```
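Before querying BigQuery or running package-analysis, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch using only the Python standard library; the function name `preflight_problems` is hypothetical, and it assumes the Go toolchain is exposed as the `go` command (which the `golang` apt package normally provides).

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Command-line tools installed in the preparation steps; "go" is assumed to be
# the binary provided by the golang package.
REQUIRED_TOOLS = ("git", "docker", "go")


def preflight_problems(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    path_exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return human-readable problems found in the local setup (empty list if none)."""
    problems: List[str] = []

    # Google client libraries locate the service-account key through this variable.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not path_exists(key_path):
        problems.append(f"credentials file not found: {key_path}")

    # Each required tool must be resolvable on PATH.
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `preflight_problems()` with no arguments checks the real environment; the extra parameters only exist so the check can be exercised without touching the machine.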

Running Instructions

```
# set up the virtual environment
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

# query data from BigQuery
python3 data_bigquery.py

# run the simulation by calling package-analysis
sudo python3 simu_run.py
```
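The simulation step drives package-analysis through run_analysis.sh. How simu_run.py does this internally is not described here, so the following is only a sketch of how the two invocations from the setup steps (local and live instance) could be assembled and launched from Python with `subprocess`. The helper names `build_command` and `run_analysis` are hypothetical; only the `-ecosystem`, `-package`, `-version` and `-local` flags come from the README.

```python
import subprocess
from typing import List, Optional


def build_command(
    ecosystem: str,
    package: str,
    version: Optional[str] = None,
    local_path: Optional[str] = None,
    script: str = "scripts/run_analysis.sh",
) -> List[str]:
    """Build the argument list for one run_analysis.sh invocation."""
    cmd = [script, "-ecosystem", ecosystem, "-package", package]
    if version is not None:
        cmd += ["-version", version]  # live instance: a published release
    if local_path is not None:
        cmd += ["-local", local_path]  # local instance: a package file on disk
    return cmd


def run_analysis(cmd: List[str]) -> int:
    """Run one analysis and return the script's exit code."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode
```

The README runs simu_run.py with `sudo`, presumably because package-analysis talks to Docker, which by default requires root privileges; a script built on this sketch would likely need the same.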

Owner

  • Name: Wapiti
  • Login: Wapiti08
  • Kind: user
  • Location: Glasgow
  • Company: UofG

Building and researching cyber security and machine learning technology

Citation (CITATION.cff)

cff-version: 1.0.0-beta
message: "If you use this software, please cite it as below."
authors:
- family-names: "Tan"
  given-names: "Zhuoran"
  orcid: "https://orcid.org/0000-0002-0809-0376"
- family-names: "Anagnostopoulos"
  given-names: "Christos"
  orcid: "https://orcid.org/0000-0003-1517-6757"
- family-names: "Singer"
  given-names: "Jeremy"
  orcid: "https://orcid.org/0000-0001-9462-6802"
title: "OSPtrack: A Labelled Dataset Targeting Simulated Open-Source Package Execution "
version: 1.0.0-beta
doi: 10.5281/zenodo.14197321
date-released: 2024-11-21
url: "https://github.com/Wapiti08/OSPTrack"

GitHub Events

Total
  • Release event: 2
  • Watch event: 3
  • Public event: 1
  • Push event: 5
  • Create event: 1
Last Year
  • Release event: 2
  • Watch event: 3
  • Public event: 1
  • Push event: 5
  • Create event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • dask ==2024.8.1
  • db-dtypes ==1.2.0
  • fastparquet ==2024.5.0
  • google-cloud-bigquery ==3.25.0
  • pandas ==2.2.2
  • pyarrow ==17.0.0
  • python-dotenv ==1.0.1
  • tqdm ==4.66.5
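Since the pins above are exact (`==`), one way to check that an environment matches them is to compare installed distribution versions using the standard library's `importlib.metadata`. This is a minimal sketch, not part of the repository: `version_mismatches` is a hypothetical helper, and it compares version strings literally rather than applying full PEP 440 semantics.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Exact pins copied from requirements.txt.
PINNED: Dict[str, str] = {
    "dask": "2024.8.1",
    "db-dtypes": "1.2.0",
    "fastparquet": "2024.5.0",
    "google-cloud-bigquery": "3.25.0",
    "pandas": "2.2.2",
    "pyarrow": "17.0.0",
    "python-dotenv": "1.0.1",
    "tqdm": "4.66.5",
}


def version_mismatches(
    pinned: Dict[str, str] = PINNED,
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """List packages that are missing or not installed at the pinned version."""
    problems: List[str] = []
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        # Literal string comparison: "17.0" and "17.0.0" would count as different.
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned {wanted}")
    return problems
```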