
Shipwright

Local-first CLI turning tickets into verified, traceable changesets with deterministic PreToolUse hooks and SHA256 integrity

Python, Git worktrees, Claude Code, PreToolUse hooks, SHA256, JSONL audit
Problem: AI code changes are fast but unverifiable
Solution: Sandboxed CLI with PreToolUse hooks, SHA256 integrity, secret scanning
Proof: 32K LOC, 88 test files, 1.24:1 test-to-code ratio, 73 modules
Stack: Python, Git worktrees, Claude Code, JSONL audit, SHA256

The Problem

AI-generated code changes are fast but unverifiable. When Claude Code produces a changeset, there's no guarantee it followed the plan, passed the tests, or left files outside its scope untouched. There is no standard way to restrict which commands and file paths the AI can access, to detect secrets it accidentally introduced, or to produce an auditable record of exactly what happened during generation. In regulated environments, AI speed without proof of what the AI did is a liability.

The Solution

Shipwright wraps the entire AI code generation process in a deterministic safety pipeline. A structured ticket defines the work, the allowed file scope, and the verification criteria. Execution happens in an isolated git worktree with a PreToolUse hook that enforces command allowlists and path boundaries on every single tool invocation. After execution, verification commands run automatically with a repair loop that retries on repairable failures. The final output is a tamper-evident bundle with per-file SHA256 hashes.

How It Works

  1. Write a ticket (.brain.md) with YAML frontmatter specifying the task, file scope, and verification commands
  2. Run shipwright run ticket.brain.md --repo /path/to/repo
  3. Shipwright creates an isolated git worktree, loads the policy bundle, sanitizes inputs, and invokes Claude Code
  4. A PreToolUse hook intercepts every tool call and enforces the command allowlist, path boundaries, and protected directory rules
  5. Verification commands run automatically. If tests fail and the failure is repairable, Shipwright retries with failure context
  6. On completion, two bundles are created: full and sanitized (secrets removed, paths redacted)
  7. Apply with shipwright apply --bundle bundle.zip — includes risk classification and optional approval gates
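
The verification and repair behaviour in step 5 can be sketched roughly as follows. This is a minimal illustration, not Shipwright's actual code: the names verify_with_repair, run_verification, repair, and is_repairable are hypothetical, and the repair callback stands in for whatever re-invokes the AI with the failure context.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

Command = Sequence[str]


def run_verification(commands: Sequence[Command], cwd: Optional[str] = None) -> List[str]:
    """Run each command and return one description per failing command."""
    failures = []
    for argv in commands:
        proc = subprocess.run(list(argv), cwd=cwd, capture_output=True, text=True)
        if proc.returncode != 0:
            output = (proc.stdout + proc.stderr).strip()
            failures.append(f"{' '.join(argv)} exited {proc.returncode}: {output}")
    return failures


def verify_with_repair(
    commands: Sequence[Command],
    repair: Callable[[List[str]], None],  # hypothetical: retries with failure context
    is_repairable: Callable[[str], bool] = lambda failure: True,  # hypothetical classifier
    max_attempts: int = 3,
    cwd: Optional[str] = None,
) -> bool:
    """Return True once every command passes, False if attempts run out
    or any failure is classified as not repairable."""
    for attempt in range(1, max_attempts + 1):
        failures = run_verification(commands, cwd)
        if not failures:
            return True
        out_of_attempts = attempt == max_attempts
        if out_of_attempts or not all(is_repairable(f) for f in failures):
            return False
        repair(failures)  # hand the failure text back before the next attempt
    return False


if __name__ == "__main__":
    ok = verify_with_repair(
        [[sys.executable, "-c", "print('tests pass')"]],
        repair=lambda failures: None,
    )
    print("verified" if ok else "failed")
```

Bounding the loop with max_attempts keeps a run from retrying forever on a failure that the repair step cannot actually fix.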

PreToolUse Hook

The PreToolUse hook is the core security differentiator. Every tool call Claude makes passes through a deterministic policy gate before it executes:

  • Command allowlist — commands matched by basename, catching both /usr/bin/curl and env curl
  • Path boundaries — resolved through symlinks to defeat escape-via-symlink attacks
  • Protected directories — hardcoded (.claude/, .shipwright/, .git/), not configurable by design
  • Shell meta-character blocking — &&, ||, ;, |, backticks, $() all blocked in Bash calls
  • Fail-closed — unknown tools are denied by default
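
A policy gate with the rules listed above might look something like the sketch below. It is an illustration under stated assumptions rather than Shipwright's hook: the allowlist contents, the function names (pre_tool_use, check_bash, check_path), and the set of file tools are hypothetical, and the tool input keys "command" and "file_path" are assumed here. In this sketch, env curl is denied simply because env is not on the allowlist.

```python
import os
import shlex
from pathlib import Path
from typing import Tuple

ALLOWED_COMMANDS = {"git", "pytest", "python", "ls"}  # hypothetical allowlist
PROTECTED_DIRS = {".claude", ".shipwright", ".git"}   # hardcoded, per the list above
SHELL_META = ("&&", "||", ";", "|", "`", "$(")
FILE_TOOLS = {"Read", "Write", "Edit"}                # assumed tool names

Decision = Tuple[bool, str]


def check_bash(command: str) -> Decision:
    for token in SHELL_META:
        if token in command:
            return False, f"shell meta-character {token!r} is blocked"
    try:
        argv = shlex.split(command)
    except ValueError:
        return False, "command could not be parsed"
    if not argv:
        return False, "empty command"
    # Match on basename so /usr/bin/curl is treated the same as curl.
    name = os.path.basename(argv[0])
    if name not in ALLOWED_COMMANDS:
        return False, f"command {name!r} is not on the allowlist"
    return True, "ok"


def check_path(path: str, root: Path) -> Decision:
    if not path:
        return False, "no path given"
    root_real = root.resolve()
    # resolve() follows symlinks, so a link pointing outside the
    # worktree resolves to its real (outside) location.
    target = (root_real / path).resolve()
    try:
        relative = target.relative_to(root_real)
    except ValueError:
        return False, "path resolves outside the worktree"
    if any(part in PROTECTED_DIRS for part in relative.parts):
        return False, "path is inside a protected directory"
    return True, "ok"


def pre_tool_use(tool: str, tool_input: dict, root: Path) -> Decision:
    if tool == "Bash":
        return check_bash(tool_input.get("command", ""))
    if tool in FILE_TOOLS:
        return check_path(tool_input.get("file_path", ""), root)
    # Fail closed: anything the gate does not recognise is denied.
    return False, f"unknown tool {tool!r}"
```

Every branch returns an explicit decision with a reason, and the final return is the deny path, so adding a new tool without a rule leaves it blocked rather than open.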

Output Bundle

Every Shipwright run produces a non-negotiable set of artifacts:

  • plan.md — the implementation plan, locked before execution begins
  • summary.md — what was actually done, written after completion
  • test_report.txt — full test output proving the changes work
  • pr_description.md — ready-to-paste PR description
  • audit.jsonl — append-only log of every action taken
  • patch.diff — the exact changes, reviewable before apply
  • bundle.zip — everything above, with per-file SHA256 manifest
  • bundle_sanitized.zip — bank-safe transfer bundle with secrets removed and paths redacted
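
As a rough illustration of what a per-file SHA256 manifest involves, the sketch below hashes every file in a directory and later reports anything that changed. The function names and the manifest shape (a dict of relative path to hex digest) are assumptions for this example, not Shipwright's actual format.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: str) -> Dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA256 hex digest."""
    root = Path(bundle_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(bundle_dir: str, manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing, modified, or not in the manifest."""
    current = build_manifest(bundle_dir)
    problems = [p for p, digest in manifest.items() if current.get(p) != digest]
    problems += [p for p in current if p not in manifest]
    return sorted(problems)
```

An empty list from verify_manifest means the directory still matches the recorded hashes; any edited, deleted, or added file shows up by path.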

Security Layers

  • PreToolUse hook — deterministic, fail-closed policy enforcement on every AI tool call
  • OS-level sandbox — optional firejail (Linux) or sandbox-exec (macOS) for filesystem and network isolation
  • Secret scanning — 13 regex patterns covering AWS keys, GitHub tokens, JWTs, PEM keys, and more
  • Input sanitization — strips null bytes, ANSI escapes, fake system prompts, and command substitution patterns
  • Provenance envelopes — untrusted content wrapped in DELIMIT/DATAMARK/ENCODE transforms for prompt injection defense
  • Risk classification — LOW/MEDIUM/HIGH/CRITICAL with interactive approval gates
  • Supply chain gate — blocks dependency file changes unless an exception token is provided
  • Scope check — validates modified files against ticket-declared file scope
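
To make the secret-scanning layer concrete, here is a small sketch of regex-based scanning. The four patterns are common illustrative shapes for these token types, not Shipwright's 13 patterns, and the names SECRET_PATTERNS and scan are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a real scanner would carry more, with tuning
# to limit false positives.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every match, in line order."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Reporting the pattern name and line number, rather than the matched text, keeps the secret itself out of logs and reports.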

Architecture

Shipwright is 32,082 lines of Python across 73 modules, with a 1.24:1 test-to-code ratio (39,857 LOC of tests across 88 test files). The system follows a phase-based pipeline:

  1. Config resolution — four-layer cascade: CLI > env > user config > repo config. Policy packs ("home" vs "bank") set baseline constraints.
  2. Worktree setup — git worktree add creates an isolated branch per run with file-level locking
  3. Preflight — ticket linting, policy checks, profile-specific environment validation
  4. Execution — Claude Code CLI invoked with stream-json parsing and PreToolUse hook enforcement
  5. Verification + repair — runs ticket-defined verification commands with automated retry on repairable failures
  6. Bundle creation — full and sanitized bundles with per-file SHA256 hashes and integrity manifest
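
The four-layer cascade in step 1 can be illustrated with a short sketch. It assumes each layer has already been loaded into a plain dict and that None means "not set"; the function name and the example keys are hypothetical, and policy-pack baselines are not modelled here.

```python
from collections import ChainMap
from typing import Any, Dict, Optional

Layer = Dict[str, Optional[Any]]


def _set_values(layer: Layer) -> Dict[str, Any]:
    """Drop keys whose value is None so unset options fall through."""
    return {key: value for key, value in layer.items() if value is not None}


def resolve_config(cli: Layer, env: Layer, user: Layer, repo: Layer) -> Dict[str, Any]:
    """Merge layers with precedence CLI > env > user config > repo config."""
    # ChainMap searches its maps left to right, so earlier layers win.
    layers = [_set_values(layer) for layer in (cli, env, user, repo)]
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    resolved = resolve_config(
        cli={"max_repairs": 5, "sandbox": None},  # flag not passed on the CLI
        env={"sandbox": "firejail"},
        user={"policy_pack": "home", "max_repairs": 2},
        repo={"policy_pack": "bank", "sandbox": "none"},
    )
    print(resolved)
```

Filtering out None before merging matters because an option the user never passed on the command line should not mask a value set in a lower layer.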

The run index is stored as JSONL, with a SQLite accelerator for full-text search, pagination, and failure categorization. More than 30 CLI commands span execution, verification, bundling, risk classification, and observability.
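
One way to picture a JSONL index with a SQLite accelerator is sketched below: the JSONL lines stay the source of truth and are loaded into a table for querying. The record fields (run_id, status, failure_category, summary) and function names are assumptions for this example, and a plain LIKE query stands in for real full-text search.

```python
import json
import sqlite3
from typing import Dict, Iterable, List


def build_index(jsonl_lines: Iterable[str]) -> sqlite3.Connection:
    """Load JSONL run records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        "run_id TEXT PRIMARY KEY, status TEXT, failure_category TEXT, summary TEXT)"
    )
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        conn.execute(
            "INSERT OR REPLACE INTO runs VALUES (?, ?, ?, ?)",
            (
                record["run_id"],
                record["status"],
                record.get("failure_category"),
                record.get("summary", ""),
            ),
        )
    return conn


def search(conn: sqlite3.Connection, term: str, limit: int = 20, offset: int = 0) -> List[str]:
    """Substring search over summaries, paginated with LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT run_id FROM runs WHERE summary LIKE ? ORDER BY run_id LIMIT ? OFFSET ?",
        (f"%{term}%", limit, offset),
    )
    return [row[0] for row in rows]


def failures_by_category(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count failed runs per failure category."""
    rows = conn.execute(
        "SELECT failure_category, COUNT(*) FROM runs "
        "WHERE status = 'failed' GROUP BY failure_category"
    )
    return dict(rows.fetchall())
```

Because the table is rebuilt from the JSONL lines, the SQLite side can be discarded and regenerated at any time without losing history.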

What This Demonstrates

  • Deterministic AI policy enforcement via PreToolUse hooks
  • Defense-in-depth security architecture (hook + sandbox + scanning + sanitization)
  • Cryptographic integrity verification and tamper-evident audit trails
  • Git worktree management for sandboxed execution
  • Production-grade Python CLI at scale (32K LOC, 88 test files, 1.24:1 test-to-code ratio)

Get in Touch

This repo is private. Request a walkthrough or view all projects.