
Shipwright
Local-first CLI turning tickets into verified, traceable changesets with deterministic PreToolUse hooks and SHA256 integrity
- Problem: AI code changes are fast but unverifiable
- Solution: Sandboxed CLI with PreToolUse hooks, SHA256 integrity, secret scanning
- Proof: 32K LOC, 88 test files, 1.24:1 test-to-code ratio, 73 modules
- Stack: Python, Git worktrees, Claude Code, JSONL audit, SHA256
The Problem
AI-generated code changes are fast but unverifiable. When Claude Code produces a changeset, there is no guarantee that it followed the plan, passed the tests, or stayed within the files in scope. There is no standard way to restrict which commands and file paths the AI can access, to detect accidentally introduced secrets, or to produce an auditable record of exactly what happened during generation. In regulated environments, AI speed without proof of what the AI did is a liability.
The Solution
Shipwright wraps the entire AI code generation process in a deterministic safety pipeline. A structured ticket defines the work, the allowed file scope, and the verification criteria. Execution happens in an isolated git worktree with a PreToolUse hook that enforces command allowlists and path boundaries on every single tool invocation. After execution, verification commands run automatically with a repair loop that retries on repairable failures. The final output is a tamper-evident bundle with per-file SHA256 hashes.
How It Works
- Write a ticket (.brain.md) with YAML frontmatter specifying the task, file scope, and verification commands
- Run shipwright run ticket.brain.md --repo /path/to/repo
- Shipwright creates an isolated git worktree, loads the policy bundle, sanitizes inputs, and invokes Claude Code
- A PreToolUse hook intercepts every tool call and enforces the command allowlist, path boundaries, and protected directory rules
- Verification commands run automatically. If tests fail and the failure is repairable, Shipwright retries with failure context
- On completion, two bundles are created: full and sanitized (secrets removed, paths redacted)
- Apply with shipwright apply --bundle bundle.zip, which includes risk classification and optional approval gates
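The verify-and-repair step above can be sketched roughly as follows. This is a minimal illustration, not Shipwright's actual code: the function names, the `repair` callback (standing in for re-invoking Claude Code with failure context), the `is_repairable` predicate, and the attempt limit are all assumptions.

```python
import subprocess
from typing import Callable


def run_verification(commands: list[list[str]], cwd: str) -> tuple[bool, str]:
    """Run each verification command in order; stop at the first failure."""
    for argv in commands:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        if proc.returncode != 0:
            # The failure context is the command plus everything it printed.
            return False, f"$ {' '.join(argv)}\n{proc.stdout}{proc.stderr}"
    return True, ""


def verify_with_repair(
    commands: list[list[str]],
    cwd: str,
    repair: Callable[[str], None],
    is_repairable: Callable[[str], bool] = lambda failure: True,
    max_attempts: int = 3,
) -> bool:
    """Verify, and on a repairable failure hand the context to `repair` and retry."""
    for attempt in range(1, max_attempts + 1):
        ok, failure = run_verification(commands, cwd)
        if ok:
            return True
        if attempt == max_attempts or not is_repairable(failure):
            break  # out of attempts, or the failure is not worth retrying
        repair(failure)
    return False
```

Bounding the number of attempts keeps the loop deterministic: a run either verifies within the limit or is reported as failed.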
PreToolUse Hook
The PreToolUse hook is the core security differentiator. Every tool call Claude makes passes through a deterministic policy gate before it executes. The gate enforces:
- Command allowlist: commands matched by basename, catching both /usr/bin/curl and env curl
- Path boundaries: resolved through symlinks to defeat escape-via-symlink attacks
- Protected directories: hardcoded (.claude/, .shipwright/, .git/), not configurable by design
- Shell meta-character blocking: &&, ||, ;, |, backticks, and $() are all blocked in Bash calls
- Fail-closed: unknown tools are denied by default
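The rules above can be sketched as a single decision function. This is a hedged illustration rather than the actual hook: the allowlist contents, the tool names (`Bash`, `Read`, `Write`, `Edit`), the input keys (`command`, `file_path`), and the function names are assumptions, and the wiring into Claude Code's hook protocol is left out.

```python
import os
import shlex
from pathlib import Path

# Hypothetical policy values; the real allowlist is not shown in the source.
ALLOWED_COMMANDS = {"git", "pytest", "python", "ls", "cat"}
PROTECTED_DIRS = (".claude", ".shipwright", ".git")
SHELL_META = ("&&", "||", ";", "|", "`", "$(")
WRAPPERS = {"env"}  # launchers that only forward to another command


def check_command(command: str) -> tuple[bool, str]:
    """Allow a Bash command only if it has no shell meta-characters and
    its real target, matched by basename, is on the allowlist."""
    # Conservative substring check: this also rejects quoted occurrences.
    for token in SHELL_META:
        if token in command:
            return False, f"shell meta-character {token!r} is blocked"
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    # Skip wrappers such as `env` and VAR=value assignments to find the target.
    while argv and (os.path.basename(argv[0]) in WRAPPERS or "=" in argv[0]):
        argv = argv[1:]
    if not argv:
        return False, "empty command"
    name = os.path.basename(argv[0])
    if name not in ALLOWED_COMMANDS:
        return False, f"command {name!r} is not on the allowlist"
    return True, "ok"


def check_path(path: str, worktree: Path) -> tuple[bool, str]:
    """Allow a path only if, after symlink resolution, it stays inside the
    worktree and outside every protected directory."""
    if not path:
        return False, "missing path"
    root = worktree.resolve()
    target = (root / path).resolve()  # resolve() follows symlinks
    if not target.is_relative_to(root):
        return False, "path escapes the worktree"
    for part in target.relative_to(root).parts:
        if part in PROTECTED_DIRS:
            return False, f"{part}/ is protected"
    return True, "ok"


def decide(tool_name: str, tool_input: dict, worktree: Path) -> tuple[bool, str]:
    """Route a tool call to the matching check; anything unrecognized is denied."""
    if tool_name == "Bash":
        return check_command(tool_input.get("command", ""))
    if tool_name in {"Read", "Write", "Edit"}:
        return check_path(tool_input.get("file_path", ""), worktree)
    return False, f"unknown tool {tool_name!r} denied (fail-closed)"
```

In this sketch, `decide("Bash", {"command": "env curl http://example.com"}, worktree)` is denied because `env` is skipped and `curl` is not on the allowlist, while a `Read` of a symlink that points outside the worktree is denied after resolution.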
Output Bundle
Every Shipwright run produces the same mandatory set of artifacts:
- plan.md — the implementation plan, locked before execution begins
- summary.md — what was actually done, written after completion
- test_report.txt — full test output proving the changes work
- pr_description.md — ready-to-paste PR description
- audit.jsonl — append-only log of every action taken
- patch.diff — the exact changes, reviewable before apply
- bundle.zip — everything above, with per-file SHA256 manifest
- bundle_sanitized.zip — bank-safe transfer bundle with secrets removed and paths redacted
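A per-file SHA256 manifest of the kind described above could look like the following sketch. The manifest filename, its JSON layout, and the function names are assumptions; the source does not show the real format.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical filename


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA256 hex digest."""
    return {
        p.relative_to(bundle_dir).as_posix(): sha256_file(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }


def write_manifest(bundle_dir: Path) -> None:
    """Record the current hashes next to the artifacts."""
    manifest = build_manifest(bundle_dir)
    (bundle_dir / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_manifest(bundle_dir: Path) -> list[str]:
    """Return the paths that were changed, added, or removed since recording."""
    recorded = json.loads((bundle_dir / MANIFEST_NAME).read_text())
    current = build_manifest(bundle_dir)
    names = recorded.keys() | current.keys()
    return sorted(n for n in names if recorded.get(n) != current.get(n))
```

An empty list from `verify_manifest` means every artifact still matches its recorded hash. A manifest stored alongside the files only detects changes by someone who cannot also rewrite the manifest, so in practice it would need to be kept or signed separately.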
Security Layers
- PreToolUse hook — deterministic, fail-closed policy enforcement on every AI tool call
- OS-level sandbox — optional firejail (Linux) or sandbox-exec (macOS) for filesystem and network isolation
- Secret scanning — 13 regex patterns covering AWS keys, GitHub tokens, JWTs, PEM keys, and more
- Input sanitization — strips null bytes, ANSI escapes, fake system prompts, and command substitution patterns
- Provenance envelopes — untrusted content wrapped in DELIMIT/DATAMARK/ENCODE transforms for prompt injection defense
- Risk classification — LOW/MEDIUM/HIGH/CRITICAL with interactive approval gates
- Supply chain gate — blocks dependency file changes unless an exception token is provided
- Scope check — validates modified files against ticket-declared file scope
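As an illustration of the secret-scanning layer, the sketch below uses four widely known token shapes. It is an assumed subset for demonstration only: the source mentions 13 patterns but does not list them, and the pattern names and the `scan` and `redact` helpers are hypothetical.

```python
import re

# Illustrative subset; not the project's actual pattern list.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches a pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


def redact(text: str) -> str:
    """Replace every match with a labeled placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

A scanner like this reports pattern names and line numbers rather than the matched text, so the findings themselves do not leak the secret into logs.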
Architecture
32,082 lines of Python across 73 modules with a 1.24:1 test-to-code ratio (39,857 LOC of tests across 88 test files). The system follows a phase-based pipeline:
- Config resolution: four-layer cascade: CLI > env > user config > repo config. Policy packs ("home" vs "bank") set baseline constraints.
- Worktree setup: git worktree add creates an isolated branch per run with file-level locking
- Preflight: ticket linting, policy checks, profile-specific environment validation
- Execution: Claude Code CLI invoked with stream-json parsing and PreToolUse hook enforcement
- Verification + repair: runs ticket-defined verification commands with automated retry on repairable failures
- Bundle creation: full and sanitized bundles with per-file SHA256 hashes and integrity manifest
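The four-layer cascade in the config resolution phase can be sketched with `collections.ChainMap`, where the first mapping that defines a key wins. The key names and the rule that unset values are `None` are assumptions, and policy packs are not modeled here.

```python
from collections import ChainMap


def resolve_config(cli: dict, env: dict, user: dict, repo: dict) -> dict:
    """Merge four layers with precedence CLI > env > user config > repo config."""
    # Drop unset (None) values so an omitted CLI flag does not mask lower layers.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, env, user, repo)
    ]
    return dict(ChainMap(*layers))


resolved = resolve_config(
    cli={"profile": None, "max_repairs": 5},       # hypothetical keys
    env={"profile": "bank"},
    user={"profile": "home", "sandbox": "firejail"},
    repo={"sandbox": "none", "timeout": 600},
)
```

Here `resolved["profile"]` comes from the environment layer because the CLI flag was left unset, and `resolved["sandbox"]` comes from the user config because it outranks the repo config.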
The run index uses JSONL with a SQLite accelerator for full-text search, pagination, and failure categorization. 30+ CLI commands span execution, verification, bundling, risk classification, and observability.
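One way to pair an append-only JSONL index with a SQLite accelerator is sketched below. The record fields (`run_id`, `status`, `failure_category`), the table layout, and the function names are assumptions, and full-text search is omitted to keep the example short.

```python
import json
import sqlite3
from pathlib import Path


def append_run(index_path: Path, record: dict) -> None:
    """Append one run record as a single JSON line; earlier lines are never rewritten."""
    with index_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def build_accelerator(index_path: Path) -> sqlite3.Connection:
    """Load the JSONL file into an in-memory SQLite table for fast queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, status TEXT, failure_category TEXT)"
    )
    with index_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                rec = json.loads(line)
                conn.execute(
                    "INSERT OR REPLACE INTO runs VALUES (?, ?, ?)",
                    (rec["run_id"], rec["status"], rec.get("failure_category")),
                )
    return conn


def page(conn: sqlite3.Connection, limit: int, offset: int) -> list[tuple]:
    """Return one page of runs ordered by run_id."""
    query = "SELECT run_id, status FROM runs ORDER BY run_id LIMIT ? OFFSET ?"
    return conn.execute(query, (limit, offset)).fetchall()


def failure_counts(conn: sqlite3.Connection) -> dict:
    """Count failed runs per failure category."""
    query = (
        "SELECT failure_category, COUNT(*) FROM runs "
        "WHERE status = 'failed' GROUP BY failure_category"
    )
    return dict(conn.execute(query).fetchall())
```

The JSONL file stays the source of truth in this design; the SQLite table is a disposable cache that can be rebuilt from it at any time.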
What This Demonstrates
- Deterministic AI policy enforcement via PreToolUse hooks
- Defense-in-depth security architecture (hook + sandbox + scanning + sanitization)
- Cryptographic integrity verification and tamper-evident audit trails
- Git worktree management for sandboxed execution
- Production-grade Python CLI at scale (32K LOC, 88 test files, 1.24:1 test ratio)
Get in Touch
This repo is private. Request a walkthrough or view all projects.