pi-scanner
GitHub PI Scanner - Detects Australian personally identifiable information in code repositories
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Obsidian-Owl
- License: mit
- Language: Go
- Default Branch: main
- Size: 78.3 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
GitHub PI Scanner
A high-performance scanner for detecting Australian Personal Information (PI) in GitHub repositories, designed for enterprise compliance with Australian privacy regulations.
Features
- Australian PI Detection: Specialized detection for TFN, ABN, Medicare numbers, BSB codes, ACN, driver licenses, passports, and credit cards
- Banking Domain Intelligence: AST-based analysis for Java, Scala, and Python with banking-specific risk assessment
- Two-Phase Architecture: Pattern detection followed by optional AI-powered validation to filter out false positives
- Local LLM Integration: Code-aware validation using LM Studio for superior false positive reduction
- Repository Structure Analysis: Intelligent risk zone mapping based on file paths and code patterns
- Smart Progress Tracking: Real-time progress indicators with accurate time estimates
- Secure Output: Configurable masking levels to protect sensitive data in reports
- Enterprise Ready: Non-interactive mode for CI/CD integration with comprehensive reporting
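Several of the identifiers listed above carry a published check digit, which lets a detector discard most random digit runs before any further validation. The following is a minimal sketch of the TFN and ABN checksum rules. It is not taken from the scanner's source, and the function names `ValidTFN` and `ValidABN` are hypothetical.

```go
package main

import "fmt"

// digitsOnly returns the decimal digits of s, ignoring spaces and hyphens.
// It reports false if s contains any other character.
func digitsOnly(s string) ([]int, bool) {
	var ds []int
	for _, r := range s {
		switch {
		case r >= '0' && r <= '9':
			ds = append(ds, int(r-'0'))
		case r == ' ' || r == '-':
			// separators commonly used when these numbers are formatted
		default:
			return nil, false
		}
	}
	return ds, true
}

// weightedSum returns the sum of digits[i] * weights[i].
func weightedSum(digits, weights []int) int {
	sum := 0
	for i, d := range digits {
		sum += d * weights[i]
	}
	return sum
}

// ValidTFN applies the checksum for 9-digit TFNs:
// the weighted digit sum must be divisible by 11.
func ValidTFN(s string) bool {
	ds, ok := digitsOnly(s)
	if !ok || len(ds) != 9 {
		return false
	}
	return weightedSum(ds, []int{1, 4, 3, 7, 5, 8, 6, 9, 10})%11 == 0
}

// ValidABN applies the ABN checksum: subtract 1 from the leading digit,
// then the weighted digit sum must be divisible by 89.
func ValidABN(s string) bool {
	ds, ok := digitsOnly(s)
	if !ok || len(ds) != 11 {
		return false
	}
	ds[0]--
	return weightedSum(ds, []int{10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19})%89 == 0
}

func main() {
	fmt.Println(ValidTFN("123 456 782"))    // true
	fmt.Println(ValidABN("51 824 753 556")) // true
}
```

A checksum pass only shows that a number is well formed. It does not show that the number was ever issued, so context still matters when triaging findings.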
Prerequisites
- Go 1.21+ (for building from source)
- GitHub token with repository read access
- (Optional) LM Studio for AI-powered validation
Quick Start
Installation
Option 1: Docker (Recommended)
```bash
# Pull the latest image
docker pull ghcr.io/macattak/pi-scanner:latest

# Run with GitHub token
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Run with local output directory
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/output:/home/scanner/output" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo
```
Option 2: Download Binary
Download the latest release from the releases page.
```bash
# macOS/Linux
curl -LO "https://github.com/MacAttak/pi-scanner/releases/download/v1.2.0/pi-scanner-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz"
tar -xzf pi-scanner-*.tar.gz
chmod +x pi-scanner
sudo mv pi-scanner /usr/local/bin/
```
Option 3: Build from Source
```bash
# Clone the repository
git clone https://github.com/MacAttak/pi-scanner.git
cd pi-scanner

# Build the binary
go build -o bin/pi-scanner ./cmd/pi-scanner

# Or use Make
make build
```
Basic Usage
The scanner provides a guided experience through two phases:
- Pattern-based scanning - Fast detection using regex patterns
- AI validation (optional) - Reduce false positives using LLM
```bash
# Interactive guided scan
pi-scanner https://github.com/example/repo

# The scanner will:
# 1. Clone and scan the repository for PI patterns
# 2. Save a masked report to ./reports/
# 3. Show you a summary of findings
# 4. Ask if you want to validate findings with AI
```
Non-Interactive Mode
For automation and CI/CD pipelines:
```bash
# Pattern scan only (no AI validation)
pi-scanner https://github.com/example/repo --no-input

# Automatic high-risk validation
pi-scanner https://github.com/example/repo --no-input --validate=high

# Validate all findings
pi-scanner https://github.com/example/repo --no-input --validate=all
```
Masking Levels
Control how PI data appears in reports:
```bash
# Partial masking (default) - Shows partial values like 123****82
pi-scanner https://github.com/example/repo --masking=partial

# Full masking - Complete redaction
pi-scanner https://github.com/example/repo --masking=full

# No masking - Shows full values (use with caution!)
pi-scanner https://github.com/example/repo --masking=none
```
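The three masking levels can be sketched as a small function. This is an illustration only: the rule of keeping the first three and last two characters is an assumption inferred from the `123****82` example above, and `Mask` is a hypothetical name, not the scanner's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Mask redacts value according to level ("none", "full", or "partial").
// It works on bytes, so it assumes ASCII input such as digit strings.
func Mask(value, level string) string {
	switch level {
	case "none":
		return value
	case "full":
		return strings.Repeat("*", len(value))
	default: // treat anything else as "partial"
		const head, tail = 3, 2 // assumed from the 123****82 example
		if len(value) <= head+tail {
			// Too short to reveal anything safely: redact everything.
			return strings.Repeat("*", len(value))
		}
		middle := strings.Repeat("*", len(value)-head-tail)
		return value[:head] + middle + value[len(value)-tail:]
	}
}

func main() {
	fmt.Println(Mask("123456782", "partial")) // 123****82
	fmt.Println(Mask("123456782", "full"))    // *********
}
```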
AI-Powered Validation
The scanner can use a local LLM to validate findings and reduce false positives:
Setup LM Studio
- Download and install LM Studio
- Download a recommended model (e.g., qwen2.5-coder-7b-instruct)
- Start the local server (usually on port 1234)
Check LLM Availability
```bash
# Test if LLM service is available
pi-scanner llm-check
```
Validation Options
During interactive scanning, you'll be presented with validation options:
```
📊 Would you like to validate these findings with AI?
This can significantly reduce false positives.

1) Validate all findings (329 items) - Est. 10-15 minutes
2) Validate HIGH + MEDIUM only (28 items) - Est. 1-2 minutes
3) Validate HIGH + CRITICAL only (5 items) - Est. < 1 minute
4) Skip validation
```
Reports
All scan results are saved to the ./reports/ directory with the following structure:
reports/
└── 20250628_140000_owner_repo/
    ├── phase1_pattern_scan.json     # Pattern scan results
    ├── phase2_llm_validated.json    # AI validation results (if performed)
    └── summary.txt                  # Human-readable summary
Docker Usage
The PI Scanner is available as a Docker image from GitHub Container Registry.
Basic Docker Commands
```bash
# Pull specific version
docker pull ghcr.io/macattak/pi-scanner:1.2.0

# Run scan with GitHub token
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Save reports to local directory
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/reports:/home/scanner/output" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo

# Run with custom config
docker run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" \
  -v "$(pwd)/config.yaml:/etc/pi-scanner/config/config.yaml:ro" \
  ghcr.io/macattak/pi-scanner:latest https://github.com/example/repo
```
Docker Compose Example
yaml
version: '3.8'
services:
  pi-scanner:
    image: ghcr.io/macattak/pi-scanner:latest
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN}
    volumes:
      - ./reports:/home/scanner/output
      - ./config.yaml:/etc/pi-scanner/config/config.yaml:ro
    command: https://github.com/example/repo --no-input --validate=high
CI/CD Integration
GitHub Actions Example
yaml
- name: PI Security Scan
  run: |
    pi-scanner ${{ github.event.repository.html_url }} \
      --no-input \
      --validate=high \
      --masking=full
Using Docker in CI
yaml
- name: PI Security Scan (Docker)
  run: |
    docker run --rm \
      -e GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }} \
      -v ${{ github.workspace }}/reports:/home/scanner/output \
      ghcr.io/macattak/pi-scanner:latest \
      ${{ github.event.repository.html_url }} \
      --no-input --validate=high --masking=full
Environment Variables
- GITHUB_TOKEN - Required for accessing private repositories
- NO_COLOR - Disable colored output
- CI - Automatically enables non-interactive mode
Advanced Usage
Verbose Output
```bash
# Show detailed progress and debugging information
pi-scanner https://github.com/example/repo --verbose
```
Custom LLM Configuration
```bash
# Use a different LLM endpoint
pi-scanner llm-check --endpoint http://localhost:8080/v1 --model codellama-7b
```
Contributing
See CONTRIBUTING.md for development setup and guidelines.
License
MIT License - see LICENSE for details.
Owner
- Name: Obsidian Owl
- Login: Obsidian-Owl
- Kind: organization
- Repositories: 1
- Profile: https://github.com/Obsidian-Owl
Citation (CITATION.cff)
cff-version: 1.2.0
title: GitHub PI Scanner
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: "McCarthy"
    given-names: "D"
    email: macmilky1@gmail.com
repository-code: "https://github.com/MacAttak/pi-scanner"
url: "https://github.com/MacAttak/pi-scanner"
abstract: "A high-performance scanner for detecting Australian Personal Information (PI) in GitHub repositories, designed for enterprise compliance with Australian privacy regulations."
keywords:
  - security
  - privacy
  - compliance
  - golang
  - scanner
  - personal-information
  - australia
license: MIT
version: 1.0.0
date-released: 2025-01-01
GitHub Events
Total
- Issues event: 1
- Issue comment event: 22
- Pull request event: 8
- Create event: 5
Last Year
- Issues event: 1
- Issue comment event: 22
- Pull request event: 8
- Create event: 5
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 7
- Average time to close issues: 2 months
- Average time to close pull requests: 23 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 2.14
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 1
- Pull requests: 7
- Average time to close issues: 2 months
- Average time to close pull requests: 23 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 2.14
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 7
Top Authors
Issue Authors
- MacAttak (1)
Pull Request Authors
- dependabot[bot] (7)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/github-script v7 composite
- actions/setup-go v5 composite
- actions/upload-artifact v4 composite
- aquasecurity/trivy-action master composite
- codecov/codecov-action v4 composite
- github/codeql-action/analyze v3 composite
- github/codeql-action/autobuild v3 composite
- github/codeql-action/init v3 composite
- github/codeql-action/upload-sarif v3 composite
- golangci/golangci-lint-action v6 composite
- securego/gosec master composite
- golang 1.23-alpine build
- rust 1.75-alpine build
- ubuntu 22.04 build
- pi-scanner latest
- pi-scanner test
- dario.cat/mergo v1.0.1
- github.com/BobuSumisu/aho-corasick v1.0.3
- github.com/Masterminds/goutils v1.1.1
- github.com/Masterminds/semver/v3 v3.3.0
- github.com/Masterminds/sprig/v3 v3.3.0
- github.com/STARRY-S/zip v0.2.1
- github.com/andybalholm/brotli v1.1.2-0.20250424173009-453214e765f3
- github.com/aymanbagabas/go-osc52/v2 v2.0.1
- github.com/bmatcuk/doublestar/v4 v4.8.1
- github.com/bodgit/plumbing v1.3.0
- github.com/bodgit/sevenzip v1.6.0
- github.com/bodgit/windows v1.0.1
- github.com/charmbracelet/lipgloss v0.5.0
- github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
- github.com/dsnet/compress v0.0.2-0.20230904184137-39efe44ab707
- github.com/fatih/semgroup v1.2.0
- github.com/fsnotify/fsnotify v1.8.0
- github.com/gitleaks/go-gitdiff v0.9.1
- github.com/google/uuid v1.6.0
- github.com/h2non/filetype v1.1.3
- github.com/hashicorp/errwrap v1.1.0
- github.com/hashicorp/go-multierror v1.1.1
- github.com/hashicorp/golang-lru/v2 v2.0.7
- github.com/hashicorp/hcl v1.0.0
- github.com/huandu/xstrings v1.5.0
- github.com/inconshreveable/mousetrap v1.1.0
- github.com/klauspost/compress v1.17.11
- github.com/klauspost/pgzip v1.2.6
- github.com/lucasb-eyer/go-colorful v1.2.0
- github.com/magiconair/properties v1.8.9
- github.com/mattn/go-colorable v0.1.14
- github.com/mattn/go-isatty v0.0.20
- github.com/mattn/go-runewidth v0.0.14
- github.com/mholt/archives v0.1.2
- github.com/minio/minlz v1.0.0
- github.com/mitchellh/copystructure v1.2.0
- github.com/mitchellh/mapstructure v1.5.0
- github.com/mitchellh/reflectwalk v1.0.2
- github.com/muesli/reflow v0.2.1-0.20210115123740-9e1d0d53df68
- github.com/muesli/termenv v0.15.1
- github.com/nwaples/rardecode/v2 v2.1.0
- github.com/pelletier/go-toml/v2 v2.2.3
- github.com/pierrec/lz4/v4 v4.1.21
- github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2
- github.com/rivo/uniseg v0.2.0
- github.com/rs/zerolog v1.33.0
- github.com/sagikazarmark/locafero v0.7.0
- github.com/sagikazarmark/slog-shim v0.1.0
- github.com/shopspring/decimal v1.4.0
- github.com/sorairolake/lzip-go v0.3.5
- github.com/sourcegraph/conc v0.3.0
- github.com/spf13/afero v1.12.0
- github.com/spf13/cast v1.7.1
- github.com/spf13/cobra v1.9.1
- github.com/spf13/pflag v1.0.6
- github.com/spf13/viper v1.19.0
- github.com/stretchr/testify v1.10.0
- github.com/subosito/gotenv v1.6.0
- github.com/tetratelabs/wazero v1.9.0
- github.com/therootcompany/xz v1.0.1
- github.com/ulikunitz/xz v0.5.12
- github.com/wasilibs/go-re2 v1.9.0
- github.com/wasilibs/wazero-helpers v0.0.0-20240620070341-3dff1577cd52
- github.com/zricethezav/gitleaks/v8 v8.27.2
- go.uber.org/multierr v1.11.0
- go4.org v0.0.0-20230225012048-214862532bf5
- golang.org/x/crypto v0.35.0
- golang.org/x/exp v0.0.0-20250218142911-aa4b98e5adaa
- golang.org/x/sync v0.11.0
- golang.org/x/sys v0.30.0
- golang.org/x/text v0.22.0
- gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
- gopkg.in/ini.v1 v1.67.0
- gopkg.in/yaml.v3 v3.0.1
- 347 dependencies
- github.com/davecgh/go-spew v1.1.1
- github.com/pmezard/go-difflib v1.0.0
- github.com/stretchr/testify v1.10.0
- gopkg.in/yaml.v3 v3.0.1
- github.com/davecgh/go-spew v1.1.1
- github.com/pmezard/go-difflib v1.0.0
- github.com/stretchr/testify v1.10.0
- gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
- gopkg.in/yaml.v3 v3.0.1
- actions/checkout v4 composite
- anchore/sbom-action v0 composite
- aquasecurity/trivy-action master composite
- docker/build-push-action v5 composite
- docker/login-action v3 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
- github/codeql-action/upload-sarif v3 composite
- sigstore/cosign-installer v3 composite
- actions/checkout v4 composite
- actions/setup-go v5 composite
- anchore/sbom-action v0 composite
- aquasecurity/trivy-action master composite
- docker/build-push-action v5 composite
- docker/login-action v3 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
- github/codeql-action/upload-sarif v3 composite
- sigstore/cosign-installer v3 composite
- softprops/action-gh-release v2 composite
- actions/stale v9 composite
- ../.. *
- github.com/cucumber/godog v0.14.0