Shipwright - Brent Slater

Problem: AI code changes are fast but unverifiable
Solution: Sandboxed CLI with PreToolUse hooks, SHA-256 integrity, secret scanning
Results: Every changeset verified before merge — 88 test files, 1.24:1 test-to-code ratio
Stack: Python, Git worktrees, Claude Code, JSONL audit, SHA-256

The Problem

AI-generated code changes are fast but unverifiable. When Claude Code produces a changeset, nothing guarantees that it followed the plan, passed the tests, or left files outside its scope untouched. There is no standard way to restrict which commands and file paths the AI can access, to detect accidentally introduced secrets, or to produce an auditable record of exactly what happened during generation. In regulated environments, AI speed without proof of what the AI did is a liability.

The Solution

Shipwright wraps the entire AI code generation process in a deterministic safety pipeline. A structured ticket defines the work, the allowed file scope, and the verification criteria. Execution happens in an isolated git worktree with a PreToolUse hook that enforces command allowlists and path boundaries on every single tool invocation. After execution, verification commands run automatically with a repair loop that retries on repairable failures. The final output is a tamper-evident bundle with per-file SHA-256 hashes.
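
As a rough illustration of how a ticket's declared file scope could be checked against a changeset, here is a minimal sketch. The Ticket fields, the out_of_scope helper, and the use of glob patterns are all assumptions made for this example; the real ticket schema is not shown here.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Ticket:
    """Hypothetical in-memory form of a ticket; field names are assumptions."""
    task: str
    file_scope: tuple[str, ...]       # glob patterns relative to the repo root
    verify_commands: tuple[str, ...]


def out_of_scope(ticket: Ticket, changed_files: list[str]) -> list[str]:
    """Return the changed files that fall outside the ticket's declared scope."""
    violations = []
    for path in changed_files:
        pure = PurePosixPath(path)
        # Absolute paths and ".." segments are rejected before any pattern match,
        # because fnmatch compares plain strings and would not notice the escape.
        escapes = pure.is_absolute() or ".." in pure.parts
        in_scope = any(fnmatch(pure.as_posix(), pattern) for pattern in ticket.file_scope)
        if escapes or not in_scope:
            violations.append(pure.as_posix())
    return violations
```

Note that fnmatch lets `*` match across directory separators, so a pattern like src/billing/*.py also covers nested files. A stricter matcher may be preferable where that matters.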

How It Works

Write a ticket (.brain.md) with YAML frontmatter specifying the task, file scope, and verification commands
Run shipwright run ticket.brain.md --repo /path/to/repo
Shipwright creates an isolated git worktree, loads the policy bundle, sanitizes inputs, and invokes Claude Code
A PreToolUse hook intercepts every tool call and enforces the command allowlist, path boundaries, and protected directory rules
Verification commands run automatically. If tests fail and the failure is repairable, Shipwright retries with failure context
On completion, two bundles are created: full and sanitized (secrets removed, paths redacted)
Apply with shipwright apply --bundle bundle.zip — includes risk classification and optional approval gates
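
The verification and repair step above can be sketched roughly as follows. This is an illustration under assumptions, not Shipwright's implementation: verify_with_repair, the repair callback, and the default of three attempts are hypothetical, and the callback stands in for re-invoking the AI with failure context.

```python
import subprocess
import sys
from typing import Callable, Sequence


def verify_with_repair(
    commands: Sequence[Sequence[str]],
    repair: Callable[[str], bool],
    max_attempts: int = 3,
) -> bool:
    """Run every verification command; on failure, pass the output to `repair` and retry.

    `repair` is a hypothetical callback. It receives the failing command's output
    and returns False when the failure is judged not repairable.
    """
    for attempt in range(1, max_attempts + 1):
        failure = None
        for argv in commands:
            result = subprocess.run(list(argv), capture_output=True, text=True)
            if result.returncode != 0:
                failure = f"$ {' '.join(argv)}\n{result.stdout}{result.stderr}"
                break                     # stop at the first failing command
        if failure is None:
            return True                   # every command exited with status 0
        if attempt == max_attempts or not repair(failure):
            return False                  # out of attempts, or not repairable
    return False


if __name__ == "__main__":
    # A trivially passing command, using the current interpreter for portability.
    print(verify_with_repair([[sys.executable, "-c", "pass"]], lambda ctx: False))
```

Commands are passed as argument lists rather than shell strings, so nothing is interpreted by a shell during verification.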

PreToolUse Hook

The hook is the core security differentiator. Every tool call Claude makes passes through a deterministic policy gate before it executes:

Command allowlist — commands matched by basename, catching both /usr/bin/curl and env curl
Path boundaries — resolved through symlinks to defeat escape-via-symlink attacks
Protected directories — hardcoded (.claude/, .shipwright/, .git/), not configurable by design
Shell meta-character blocking — &&, ||, ;, |, backticks, $() all blocked in Bash calls
Fail-closed — unknown tools are denied by default
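
The five rules above can be pictured as one pure decision function. The sketch below is illustrative only: the tool names, the command and file_path input keys, the Policy class, and the way an env prefix is unwrapped are assumptions, and the wiring into Claude Code's hook input and output is left out.

```python
import os
import shlex
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# The protected directory names and blocked tokens come from the list above;
# the tool names and input keys are assumptions made for this sketch.
PROTECTED_DIRS = (".claude", ".shipwright", ".git")
SHELL_META = ("&&", "||", ";", "|", "`", "$(")
FILE_TOOLS = {"Read", "Write", "Edit"}


@dataclass(frozen=True)
class Policy:
    worktree: Path
    allowed_commands: frozenset


def check_path(policy: Policy, raw_path: str) -> Optional[str]:
    """Return a denial reason, or None if the path is in the worktree and unprotected."""
    if not raw_path:
        return "missing path"
    try:
        root = policy.worktree.resolve()
        target = (root / raw_path).resolve()   # follows symlinks before comparing
    except (OSError, ValueError):
        return f"unresolvable path: {raw_path!r}"
    if not target.is_relative_to(root):
        return f"path escapes worktree: {raw_path}"
    if any(part in PROTECTED_DIRS for part in target.relative_to(root).parts):
        return f"protected directory: {raw_path}"
    return None


def check_bash(policy: Policy, command: str) -> Optional[str]:
    """Return a denial reason, or None if the command passes the allowlist."""
    for token in SHELL_META:
        if token in command:
            return f"shell meta-character blocked: {token}"
    try:
        argv = shlex.split(command)
    except ValueError:
        return "unparseable command"
    # Skip a leading `env` and VAR=value assignments so `env curl` is judged as curl.
    while argv and (os.path.basename(argv[0]) == "env" or "=" in argv[0]):
        argv = argv[1:]
    if not argv:
        return "empty command"
    name = os.path.basename(argv[0])
    if name not in policy.allowed_commands:
        return f"command not in allowlist: {name}"
    return None


def decide(policy: Policy, tool_name: str, tool_input: dict) -> tuple:
    """Fail closed: any tool that is not explicitly recognized is denied."""
    if tool_name == "Bash":
        reason = check_bash(policy, str(tool_input.get("command", "")))
    elif tool_name in FILE_TOOLS:
        reason = check_path(policy, str(tool_input.get("file_path", "")))
    else:
        reason = f"unknown tool: {tool_name}"
    return (reason is None, reason or "allowed")
```

Keeping the decision free of I/O makes it deterministic and easy to unit test. For example, decide(policy, "Bash", {"command": "env curl http://x"}) is denied whenever curl is absent from the allowlist.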

Output Bundle

Every Shipwright run produces the same fixed set of artifacts:

plan.md — the implementation plan, locked before execution begins
summary.md — what was actually done, written after completion
test_report.txt — full test output proving the changes work
pr_description.md — ready-to-paste PR description
audit.jsonl — append-only log of every action taken
patch.diff — the exact changes, reviewable before apply
bundle.zip — everything above, with per-file SHA-256 manifest
bundle_sanitized.zip — bank-safe transfer bundle with secrets removed and paths redacted

shipwright_SW-041_20250215/
├── plan.md (implementation plan, locked)
├── summary.md (what was actually done)
├── test_report.txt (142 tests passed, 0 failed)
├── pr_description.md (ready-to-paste PR body)
├── audit.jsonl (47 actions logged)
├── patch.diff (312 lines changed)
├── manifest.json (per-file SHA-256 hashes)
└── bundle_sanitized.zip (secrets removed, paths redacted)

Verification: 3/3 commands passed

Risk classification: LOW (no dependency changes, no new files outside scope)
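
To make the per-file SHA-256 manifest concrete, here is a minimal sketch of building and checking one. The manifest layout (a flat JSON object mapping relative paths to hex digests) and the function names are assumptions; only the manifest.json file name comes from the listing above.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict:
    """Map each file's bundle-relative path to its SHA-256 hex digest."""
    return {
        path.relative_to(bundle_dir).as_posix(): sha256_file(path)
        for path in sorted(bundle_dir.rglob("*"))
        if path.is_file() and path.name != "manifest.json"
    }


def verify_manifest(bundle_dir: Path) -> list:
    """Return paths that are missing, altered, or absent from manifest.json."""
    recorded = json.loads((bundle_dir / "manifest.json").read_text())
    actual = build_manifest(bundle_dir)
    return sorted(
        name
        for name in recorded.keys() | actual.keys()
        if recorded.get(name) != actual.get(name)
    )
```

A manifest like this only detects tampering if the manifest itself is trustworthy, for instance because its own hash is recorded or signed elsewhere. How Shipwright anchors that trust is not covered here.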

Security Layers

PreToolUse hook — deterministic, fail-closed policy enforcement on every AI tool call
OS-level sandbox — optional firejail (Linux) or sandbox-exec (macOS) for filesystem and network isolation
Secret scanning — 13 regex patterns covering AWS keys, GitHub tokens, JWTs, PEM keys, and more
Input sanitization — strips null bytes, ANSI escapes, fake system prompts, and command substitution patterns
Provenance envelopes — untrusted content wrapped in DELIMIT/DATAMARK/ENCODE transforms for prompt injection defense
Risk classification — LOW/MEDIUM/HIGH/CRITICAL with interactive approval gates
Supply chain gate — blocks dependency file changes unless an exception token is provided
Scope check — validates modified files against ticket-declared file scope
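
As an illustration of the secret scanning layer, the sketch below scans text with four example patterns and redacts any matches. These patterns are simplified stand-ins chosen for this example, not Shipwright's actual set of 13, and the scan and redact function names are hypothetical.

```python
import re

# Four illustrative patterns only; real scanners use broader, more careful rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan(text: str) -> list:
    """Return (line_number, pattern_name) for every match, in line order."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def redact(text: str) -> str:
    """Replace every match with a placeholder naming the pattern that fired."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

In a pipeline like the one described, scan would plausibly gate the full bundle while redact would feed the sanitized one. That split is an assumption rather than documented behavior.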

Architecture

32,082 lines of Python across 73 modules, backed by 39,857 lines of tests across 88 test files (a 1.24:1 test-to-code ratio). The system follows a phase-based pipeline:

Config resolution — four-layer cascade: CLI > env > user config > repo config. Policy packs ("home" vs "bank") set baseline constraints.
Worktree setup — git worktree add creates an isolated branch per run with file-level locking
Preflight — ticket linting, policy checks, profile-specific environment validation
Execution — Claude Code CLI invoked with stream-json parsing and PreToolUse hook enforcement
Verification + repair — runs ticket-defined verification commands with automated retry on repairable failures
Bundle creation — full and sanitized bundles with per-file SHA-256 hashes and integrity manifest
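
The four-layer config cascade in the first phase can be sketched with the standard library's ChainMap. The resolve_config function and the key names in the usage note are hypothetical. Policy packs are left out because how they interact with the cascade is not specified above.

```python
from collections import ChainMap
from typing import Any, Mapping


def resolve_config(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    user: Mapping[str, Any],
    repo: Mapping[str, Any],
) -> dict:
    """Merge four layers; the first layer that defines a key wins (CLI > env > user > repo)."""

    def present(layer: Mapping[str, Any]) -> dict:
        # Treat None as "not set" so an omitted CLI flag does not mask lower layers.
        return {key: value for key, value in layer.items() if value is not None}

    return dict(ChainMap(present(cli), present(env), present(user), present(repo)))
```

For example, with cli={"profile": None}, env={"profile": "bank"}, and user={"profile": "home"}, the resolved profile is "bank": the CLI layer did not set it, and env outranks the user config.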

The run index uses JSONL with a SQLite accelerator for full-text search, pagination, and failure categorization. 30+ CLI commands span execution, verification, bundling, risk classification, and observability.
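
One way to picture a JSONL index with a SQLite accelerator is sketched below. The JSONL file remains the source of truth, and a disposable SQLite table is rebuilt from it for queries. The record fields (run_id, status, failure_category) and all function names are assumptions. Full-text search is omitted because it depends on how SQLite was built.

```python
import json
import sqlite3
from pathlib import Path


def append_run(index_path: Path, record: dict) -> None:
    """Append one run record as a single JSON line; existing lines are never rewritten."""
    with index_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def build_accelerator(index_path: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Rebuild a queryable SQLite table from the JSONL file."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS runs")
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, status TEXT, "
        "failure_category TEXT, raw TEXT)"
    )
    with index_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            rec = json.loads(line)
            conn.execute(
                "INSERT OR REPLACE INTO runs VALUES (?, ?, ?, ?)",
                (rec["run_id"], rec.get("status"), rec.get("failure_category"), line.strip()),
            )
    conn.commit()
    return conn


def failures_by_category(conn: sqlite3.Connection, limit: int = 20, offset: int = 0) -> list:
    """Count failed runs per category, most frequent first, with simple pagination."""
    rows = conn.execute(
        "SELECT failure_category, COUNT(*) FROM runs WHERE status = 'failed' "
        "GROUP BY failure_category ORDER BY COUNT(*) DESC, failure_category "
        "LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return rows.fetchall()
```

Because the table can always be rebuilt from the JSONL file, losing or corrupting the SQLite database costs only query speed and never history.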

What This Demonstrates

Deterministic AI policy enforcement via PreToolUse hooks
Defense-in-depth security architecture (hook + sandbox + scanning + sanitization)
Cryptographic integrity verification and tamper-evident audit trails
Git worktree management for sandboxed execution
Production-grade Python CLI at scale (32K LOC, 88 test files, 1.24:1 test ratio)

Get in Touch

This repo is private. Request a walkthrough or view all projects.