pi-scanner

GitHub PI Scanner - Detects Australian personally identifiable information in code repositories

https://github.com/obsidian-owl/pi-scanner

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

GitHub PI Scanner - Detects Australian personally identifiable information in code repositories

Basic Info
  • Host: GitHub
  • Owner: Obsidian-Owl
  • License: mit
  • Language: Go
  • Default Branch: main
  • Size: 78.3 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 3
  • Releases: 2
Created about 1 year ago · Last pushed 12 months ago
Metadata Files
Readme, Changelog, Contributing, Funding, License, Citation, Codeowners, Security, Support

README.md

GitHub PI Scanner


A high-performance scanner for detecting Australian Personal Information (PI) in GitHub repositories, designed for enterprise compliance with Australian privacy regulations.

Features

  • Australian PI Detection: Specialized detection for TFN, ABN, Medicare numbers, BSB codes, ACN, driver licenses, passports, and credit cards
  • Banking Domain Intelligence: AST-based analysis for Java, Scala, and Python with banking-specific risk assessment
  • Two-Phase Architecture: Pattern detection followed by optional AI-powered validation of the findings
  • Local LLM Integration: Code-aware validation using LM Studio to filter out false positives
  • Repository Structure Analysis: Intelligent risk zone mapping based on file paths and code patterns
  • Smart Progress Tracking: Real-time progress indicators with estimated completion times
  • Secure Output: Configurable masking levels to protect sensitive data in reports
  • Enterprise Ready: Non-interactive mode for CI/CD integration with comprehensive reporting
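Several of the identifiers listed above carry public check digits, so a regex hit can be filtered cheaply before any AI validation. The Go sketch below implements the widely documented TFN check (weighted digit sum divisible by 11) and ABN check (subtract 1 from the first digit, then the weighted sum must be divisible by 89). The function names are hypothetical, and this is not necessarily how pi-scanner itself validates matches.

```go
package main

import (
	"fmt"
	"strings"
)

// digitsOnly strips spaces and hyphens and returns the digits as ints,
// or nil if any other character is present.
func digitsOnly(s string) []int {
	s = strings.NewReplacer(" ", "", "-", "").Replace(s)
	out := make([]int, 0, len(s))
	for _, r := range s {
		if r < '0' || r > '9' {
			return nil
		}
		out = append(out, int(r-'0'))
	}
	return out
}

// validTFN (hypothetical helper) applies the 9-digit Tax File Number check:
// the weighted digit sum must be divisible by 11.
func validTFN(s string) bool {
	d := digitsOnly(s)
	if len(d) != 9 {
		return false
	}
	weights := []int{1, 4, 3, 7, 5, 8, 6, 9, 10}
	sum := 0
	for i, w := range weights {
		sum += d[i] * w
	}
	return sum%11 == 0
}

// validABN (hypothetical helper) applies the 11-digit Australian Business Number
// check: subtract 1 from the first digit, then the weighted sum must be divisible by 89.
func validABN(s string) bool {
	d := digitsOnly(s)
	if len(d) != 11 {
		return false
	}
	d[0]-- // d is a fresh slice, so mutating it is safe
	weights := []int{10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19}
	sum := 0
	for i, w := range weights {
		sum += d[i] * w
	}
	return sum%89 == 0
}

func main() {
	fmt.Println(validTFN("123 456 782"))    // true
	fmt.Println(validTFN("123 456 789"))    // false
	fmt.Println(validABN("51 824 753 556")) // true
}
```

A checksum only proves a number is well-formed, not that it belongs to a real person, which is why a contextual validation phase is still useful.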

Prerequisites

  • Go 1.21+ (for building from source)
  • GitHub token with repository read access
  • (Optional) LM Studio for AI-powered validation

Quick Start

Installation

Option 1: Docker (Recommended)

```bash
# Pull the latest image
docker pull ghcr.io/macattak/pi-scanner:latest

# Run with GitHub token
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Run with local output directory
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/output:/home/scanner/output" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo
```

Option 2: Download Binary

Download the latest release from the releases page. The commands below fetch v1.2.0; change the version in the URL to install a different release.

```bash
# macOS/Linux
curl -LO https://github.com/MacAttak/pi-scanner/releases/download/v1.2.0/pi-scanner-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz
tar -xzf pi-scanner-*.tar.gz
chmod +x pi-scanner
sudo mv pi-scanner /usr/local/bin/
```

Option 3: Build from Source

```bash
# Clone the repository
git clone https://github.com/MacAttak/pi-scanner.git
cd pi-scanner

# Build the binary
go build -o bin/pi-scanner ./cmd/pi-scanner

# Or use Make
make build
```

Basic Usage

The scanner provides a guided experience through two phases:

  1. Pattern-based scanning - Fast detection using regex patterns
  2. AI validation (optional) - Reduce false positives using LLM

```bash
# Interactive guided scan
pi-scanner https://github.com/example/repo

# The scanner will:
# 1. Clone and scan the repository for PI patterns
# 2. Save a masked report to ./reports/
# 3. Show you a summary of findings
# 4. Ask if you want to validate findings with AI
```
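To make the pattern-based phase more concrete, here is a minimal sketch of one detector: a regular expression finds card-number-shaped strings line by line, and the Luhn checksum discards most random digit runs. The regex, the `Finding` type and `scanLines` are hypothetical illustrations, not pi-scanner's actual patterns or data model.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Finding is a hypothetical record of one pattern match.
type Finding struct {
	Line  int
	Match string
}

// cardPattern (illustrative) matches 13-19 digits, optionally separated
// by single spaces or hyphens.
var cardPattern = regexp.MustCompile(`\b\d(?:[ -]?\d){12,18}\b`)

// luhnValid reports whether the digits in s satisfy the Luhn checksum.
func luhnValid(s string) bool {
	sum, double := 0, false
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c < '0' || c > '9' {
			continue // skip separators
		}
		n := int(c - '0')
		if double {
			n *= 2
			if n > 9 {
				n -= 9
			}
		}
		sum += n
		double = !double
	}
	return sum%10 == 0
}

// scanLines returns Luhn-valid card-like matches with 1-based line numbers.
func scanLines(text string) []Finding {
	var out []Finding
	sc := bufio.NewScanner(strings.NewReader(text))
	for line := 1; sc.Scan(); line++ {
		for _, m := range cardPattern.FindAllString(sc.Text(), -1) {
			if luhnValid(m) {
				out = append(out, Finding{Line: line, Match: m})
			}
		}
	}
	return out
}

func main() {
	src := "price := 42\ncard := \"4111 1111 1111 1111\"\n"
	for _, f := range scanLines(src) {
		fmt.Printf("line %d: %s\n", f.Line, f.Match) // line 2: 4111 1111 1111 1111
	}
}
```

Pairing a loose pattern with a checksum keeps the pattern phase fast while leaving context-dependent judgements (test fixtures, sample data) to the optional AI phase.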

Non-Interactive Mode

For automation and CI/CD pipelines:

```bash
# Pattern scan only (no AI validation)
pi-scanner https://github.com/example/repo --no-input

# Automatic high-risk validation
pi-scanner https://github.com/example/repo --no-input --validate=high

# Validate all findings
pi-scanner https://github.com/example/repo --no-input --validate=all
```

Masking Levels

Control how PI data appears in reports:

```bash
# Partial masking (default) - shows partial values like 123****82
pi-scanner https://github.com/example/repo --masking=partial

# Full masking - complete redaction
pi-scanner https://github.com/example/repo --masking=full

# No masking - shows full values (use with caution!)
pi-scanner https://github.com/example/repo --masking=none
```
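The three masking levels can be sketched as a single function. The partial rule used here (keep the first three and last two characters) is an assumption inferred from the `123****82` example, and rendering "full" masking as asterisks is likewise hypothetical, since the exact redaction format is not documented above.

```go
package main

import (
	"fmt"
	"strings"
)

// mask (hypothetical) applies a masking level ("partial", "full" or "none")
// to a detected value.
func mask(value, level string) string {
	switch level {
	case "none":
		return value
	case "partial":
		r := []rune(value)
		if len(r) <= 5 {
			// Too short to reveal anything safely; fall back to full masking.
			return strings.Repeat("*", len(r))
		}
		// Assumed rule: keep the first 3 and last 2 characters.
		return string(r[:3]) + strings.Repeat("*", len(r)-5) + string(r[len(r)-2:])
	default: // "full" and any unknown level: redact every character
		return strings.Repeat("*", len([]rune(value)))
	}
}

func main() {
	fmt.Println(mask("123456782", "partial")) // 123****82
	fmt.Println(mask("123456782", "full"))    // *********
	fmt.Println(mask("123456782", "none"))    // 123456782
}
```

Treating unknown levels as full masking is a fail-safe choice: a typo in the flag never leaks raw values.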

AI-Powered Validation

The scanner can use a local LLM to validate findings and reduce false positives:

Set Up LM Studio

  1. Download and install LM Studio
  2. Download a recommended model (e.g., qwen2.5-coder-7b-instruct)
  3. Start the local server (port 1234 by default)

Check LLM Availability

```bash
# Test if the LLM service is available
pi-scanner llm-check
```

Validation Options

During interactive scanning, you'll be presented with validation options:

```
📊 Would you like to validate these findings with AI?
This can significantly reduce false positives.

1) Validate all findings (329 items) - Est. 10-15 minutes
2) Validate HIGH + MEDIUM only (28 items) - Est. 1-2 minutes
3) Validate HIGH + CRITICAL only (5 items) - Est. < 1 minute
4) Skip validation
```
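Each menu option amounts to choosing which risk levels are sent to the LLM. A minimal sketch of that selection step, using a hypothetical `Finding` type and the risk labels from the menu text; the risk assigned to each example finding is purely illustrative:

```go
package main

import "fmt"

// Finding is a hypothetical pattern-scan result carrying a risk label.
type Finding struct {
	Type string
	Risk string // e.g. "CRITICAL", "HIGH", "MEDIUM", "LOW"
}

// selectForValidation returns the findings whose risk is in the given set;
// calling it with no risks selects everything ("Validate all findings").
func selectForValidation(findings []Finding, risks ...string) []Finding {
	if len(risks) == 0 {
		return findings
	}
	allowed := make(map[string]bool, len(risks))
	for _, r := range risks {
		allowed[r] = true
	}
	var out []Finding
	for _, f := range findings {
		if allowed[f.Risk] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Illustrative labels only, not the scanner's real classification.
	findings := []Finding{
		{"TFN", "CRITICAL"}, {"ABN", "MEDIUM"}, {"BSB", "LOW"}, {"MEDICARE", "HIGH"},
	}
	fmt.Println(len(selectForValidation(findings)))                     // 4
	fmt.Println(len(selectForValidation(findings, "HIGH", "MEDIUM")))   // 2
	fmt.Println(len(selectForValidation(findings, "HIGH", "CRITICAL"))) // 2
}
```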

Reports

All scan results are saved to the ./reports/ directory with the following structure:

    reports/
    └── 20250628_140000_owner_repo/
        ├── phase1_pattern_scan.json   # Pattern scan results
        ├── phase2_llm_validated.json  # AI validation results (if performed)
        └── summary.txt                # Human-readable summary

Docker Usage

The PI Scanner is available as a Docker image from GitHub Container Registry.

Basic Docker Commands

```bash
# Pull a specific version
docker pull ghcr.io/macattak/pi-scanner:1.2.0

# Run a scan with a GitHub token
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Save reports to a local directory
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/reports:/home/scanner/output" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Run with a custom config
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/config.yaml:/etc/pi-scanner/config/config.yaml:ro" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo
```

Docker Compose Example

    version: '3.8'
    services:
      pi-scanner:
        image: ghcr.io/macattak/pi-scanner:latest
        environment:
          - GITHUB_TOKEN=${GITHUB_TOKEN}
        volumes:
          - ./reports:/home/scanner/output
          - ./config.yaml:/etc/pi-scanner/config/config.yaml:ro
        command: https://github.com/example/repo --no-input --validate=high

CI/CD Integration

GitHub Actions Example

    - name: PI Security Scan
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        pi-scanner ${{ github.event.repository.html_url }} \
          --no-input \
          --validate=high \
          --masking=full

Using Docker in CI

    - name: PI Security Scan (Docker)
      run: |
        docker run --rm \
          -e GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }} \
          -v ${{ github.workspace }}/reports:/home/scanner/output \
          ghcr.io/macattak/pi-scanner:latest \
          ${{ github.event.repository.html_url }} \
          --no-input --validate=high --masking=full

Environment Variables

  • GITHUB_TOKEN - Required for accessing private repositories
  • NO_COLOR - Disable colored output
  • CI - Automatically enables non-interactive mode
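A sketch of how these variables might be resolved into runtime settings. Treating any non-empty `NO_COLOR` or `CI` value as "set" follows the common no-color.org convention and is an assumption here, as are the `settings` type and the `fromEnv` function.

```go
package main

import (
	"fmt"
	"os"
)

// settings is a hypothetical view of the environment-driven options.
type settings struct {
	Token          string
	Color          bool
	NonInteractive bool
}

// fromEnv derives settings from a lookup function such as os.LookupEnv,
// which keeps the logic testable without touching the real environment.
func fromEnv(lookup func(string) (string, bool)) settings {
	nonEmpty := func(key string) bool {
		v, ok := lookup(key)
		return ok && v != ""
	}
	token, _ := lookup("GITHUB_TOKEN")
	return settings{
		Token:          token,
		Color:          !nonEmpty("NO_COLOR"),
		NonInteractive: nonEmpty("CI"),
	}
}

func main() {
	s := fromEnv(os.LookupEnv)
	fmt.Printf("token set: %v, color: %v, non-interactive: %v\n",
		s.Token != "", s.Color, s.NonInteractive)
}
```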

Advanced Usage

Verbose Output

```bash
# Show detailed progress and debugging information
pi-scanner https://github.com/example/repo --verbose
```

Custom LLM Configuration

```bash
# Use a different LLM endpoint
pi-scanner llm-check --endpoint http://localhost:8080/v1 --model codellama-7b
```

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE for details.

Owner

  • Name: Obsidian Owl
  • Login: Obsidian-Owl
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: GitHub PI Scanner
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: "McCarthy"
    given-names: "D"
    email: macmilky1@gmail.com
repository-code: "https://github.com/MacAttak/pi-scanner"
url: "https://github.com/MacAttak/pi-scanner"
abstract: "A high-performance scanner for detecting Australian Personal Information (PI) in GitHub repositories, designed for enterprise compliance with Australian privacy regulations."
keywords:
  - security
  - privacy
  - compliance
  - golang
  - scanner
  - personal-information
  - australia
license: MIT
version: 1.0.0
date-released: 2025-01-01

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 22
  • Pull request event: 8
  • Create event: 5
Last Year
  • Issues event: 1
  • Issue comment event: 22
  • Pull request event: 8
  • Create event: 5

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 7
  • Average time to close issues: 2 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 2.14
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 7
Past Year
  • Issues: 1
  • Pull requests: 7
  • Average time to close issues: 2 months
  • Average time to close pull requests: 23 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 2.14
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 7
Top Authors
Issue Authors
  • MacAttak (1)
Pull Request Authors
  • dependabot[bot] (7)
Top Labels
Issue Labels
  • stale (1)
Pull Request Labels
  • stale (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/github-script v7 composite
  • actions/setup-go v5 composite
  • actions/upload-artifact v4 composite
  • aquasecurity/trivy-action master composite
  • codecov/codecov-action v4 composite
  • github/codeql-action/analyze v3 composite
  • github/codeql-action/autobuild v3 composite
  • github/codeql-action/init v3 composite
  • github/codeql-action/upload-sarif v3 composite
  • golangci/golangci-lint-action v6 composite
  • securego/gosec master composite
Dockerfile docker
  • golang 1.23-alpine build
  • rust 1.75-alpine build
  • ubuntu 22.04 build
docker-compose.yml docker
  • pi-scanner latest
  • pi-scanner test
go.mod go
  • dario.cat/mergo v1.0.1
  • github.com/BobuSumisu/aho-corasick v1.0.3
  • github.com/Masterminds/goutils v1.1.1
  • github.com/Masterminds/semver/v3 v3.3.0
  • github.com/Masterminds/sprig/v3 v3.3.0
  • github.com/STARRY-S/zip v0.2.1
  • github.com/andybalholm/brotli v1.1.2-0.20250424173009-453214e765f3
  • github.com/aymanbagabas/go-osc52/v2 v2.0.1
  • github.com/bmatcuk/doublestar/v4 v4.8.1
  • github.com/bodgit/plumbing v1.3.0
  • github.com/bodgit/sevenzip v1.6.0
  • github.com/bodgit/windows v1.0.1
  • github.com/charmbracelet/lipgloss v0.5.0
  • github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
  • github.com/dsnet/compress v0.0.2-0.20230904184137-39efe44ab707
  • github.com/fatih/semgroup v1.2.0
  • github.com/fsnotify/fsnotify v1.8.0
  • github.com/gitleaks/go-gitdiff v0.9.1
  • github.com/google/uuid v1.6.0
  • github.com/h2non/filetype v1.1.3
  • github.com/hashicorp/errwrap v1.1.0
  • github.com/hashicorp/go-multierror v1.1.1
  • github.com/hashicorp/golang-lru/v2 v2.0.7
  • github.com/hashicorp/hcl v1.0.0
  • github.com/huandu/xstrings v1.5.0
  • github.com/inconshreveable/mousetrap v1.1.0
  • github.com/klauspost/compress v1.17.11
  • github.com/klauspost/pgzip v1.2.6
  • github.com/lucasb-eyer/go-colorful v1.2.0
  • github.com/magiconair/properties v1.8.9
  • github.com/mattn/go-colorable v0.1.14
  • github.com/mattn/go-isatty v0.0.20
  • github.com/mattn/go-runewidth v0.0.14
  • github.com/mholt/archives v0.1.2
  • github.com/minio/minlz v1.0.0
  • github.com/mitchellh/copystructure v1.2.0
  • github.com/mitchellh/mapstructure v1.5.0
  • github.com/mitchellh/reflectwalk v1.0.2
  • github.com/muesli/reflow v0.2.1-0.20210115123740-9e1d0d53df68
  • github.com/muesli/termenv v0.15.1
  • github.com/nwaples/rardecode/v2 v2.1.0
  • github.com/pelletier/go-toml/v2 v2.2.3
  • github.com/pierrec/lz4/v4 v4.1.21
  • github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2
  • github.com/rivo/uniseg v0.2.0
  • github.com/rs/zerolog v1.33.0
  • github.com/sagikazarmark/locafero v0.7.0
  • github.com/sagikazarmark/slog-shim v0.1.0
  • github.com/shopspring/decimal v1.4.0
  • github.com/sorairolake/lzip-go v0.3.5
  • github.com/sourcegraph/conc v0.3.0
  • github.com/spf13/afero v1.12.0
  • github.com/spf13/cast v1.7.1
  • github.com/spf13/cobra v1.9.1
  • github.com/spf13/pflag v1.0.6
  • github.com/spf13/viper v1.19.0
  • github.com/stretchr/testify v1.10.0
  • github.com/subosito/gotenv v1.6.0
  • github.com/tetratelabs/wazero v1.9.0
  • github.com/therootcompany/xz v1.0.1
  • github.com/ulikunitz/xz v0.5.12
  • github.com/wasilibs/go-re2 v1.9.0
  • github.com/wasilibs/wazero-helpers v0.0.0-20240620070341-3dff1577cd52
  • github.com/zricethezav/gitleaks/v8 v8.27.2
  • go.uber.org/multierr v1.11.0
  • go4.org v0.0.0-20230225012048-214862532bf5
  • golang.org/x/crypto v0.35.0
  • golang.org/x/exp v0.0.0-20250218142911-aa4b98e5adaa
  • golang.org/x/sync v0.11.0
  • golang.org/x/sys v0.30.0
  • golang.org/x/text v0.22.0
  • gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
  • gopkg.in/ini.v1 v1.67.0
  • gopkg.in/yaml.v3 v3.0.1
go.sum go
  • 347 dependencies
test/e2e/go.mod go
  • github.com/davecgh/go-spew v1.1.1
  • github.com/pmezard/go-difflib v1.0.0
  • github.com/stretchr/testify v1.10.0
  • gopkg.in/yaml.v3 v3.0.1
test/e2e/go.sum go
  • github.com/davecgh/go-spew v1.1.1
  • github.com/pmezard/go-difflib v1.0.0
  • github.com/stretchr/testify v1.10.0
  • gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
  • gopkg.in/yaml.v3 v3.0.1
.github/workflows/docker.yml actions
  • actions/checkout v4 composite
  • anchore/sbom-action v0 composite
  • aquasecurity/trivy-action master composite
  • docker/build-push-action v5 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3 composite
  • github/codeql-action/upload-sarif v3 composite
  • sigstore/cosign-installer v3 composite
.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/setup-go v5 composite
  • anchore/sbom-action v0 composite
  • aquasecurity/trivy-action master composite
  • docker/build-push-action v5 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3 composite
  • github/codeql-action/upload-sarif v3 composite
  • sigstore/cosign-installer v3 composite
  • softprops/action-gh-release v2 composite
.github/workflows/stale.yml actions
  • actions/stale v9 composite
test/bdd/go.mod go
  • ../.. *
  • github.com/cucumber/godog v0.14.0