PromptWheel: How We Built a Code Improvement Orchestrator (and How It Works)
Every codebase has a backlog of improvements that nobody gets to. Type safety gaps, missing tests, dead code, inconsistent error handling, documentation that drifted three sprints ago. The work isn’t hard — it’s just not urgent enough to prioritize over features and deadlines.
We built PromptWheel to batch that work. Point it at a repo, run a session, and get a set of clean, tested pull requests ready for review.
What PromptWheel actually does
PromptWheel is an open-source CLI (Apache 2.0) that orchestrates batch improvement cycles against your codebase:
Scout → Filter → Execute (parallel) → QA → PR → repeat
Scout scans a sector of the codebase for improvements using the active formula — security audit, test coverage, type safety, documentation gaps, or a custom formula you define.
Filter applies deduplication, impact scoring, and adversarial review. Every proposal gets challenged by a devil’s-advocate pass before it’s approved. Low-confidence or low-impact changes get filtered out automatically.
Execute runs approved changes in parallel using isolated git worktrees. Each ticket is sandboxed to specific file paths — a ticket scoped to src/auth/ physically cannot touch src/billing/.
QA runs your test suite, type checker, and linter against every change. Failed tickets are blocked, not merged. If a fix fails QA, PromptWheel retries with the error context before moving on.
PR creates draft pull requests — either one per ticket or batched into milestone PRs that group related changes for cleaner review.
Then it starts the next cycle.
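The cycle above can be sketched as a small loop. This is an illustrative model, not PromptWheel's real internal API: the stage names, the `Stages` interface, and the serialized execution (the real tool runs tickets in parallel worktrees) are all assumptions.

```typescript
// Hypothetical sketch of one Scout → Filter → Execute → QA → PR cycle.
// Stage names and types are illustrative, not PromptWheel internals.
type Proposal = { id: string; approved: boolean };

interface Stages {
  scout(sector: string): Proposal[];          // find candidate improvements
  filter(proposals: Proposal[]): Proposal[];  // dedupe, score, adversarial review
  execute(p: Proposal): boolean;              // apply change in an isolated worktree
  qa(p: Proposal): boolean;                   // tests + type checker + linter
  openDraftPr(p: Proposal): void;             // nothing merges without review
}

function runCycle(stages: Stages, sector: string): number {
  let shipped = 0;
  const approved = stages.filter(stages.scout(sector));
  for (const p of approved) {
    // A ticket that fails QA is blocked; it never becomes a PR.
    if (stages.execute(p) && stages.qa(p)) {
      stages.openDraftPr(p);
      shipped++;
    }
  }
  return shipped;
}
```

The real pipeline runs the execute stage in parallel; the loop here serializes it only to keep the sketch short.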
The problem with one-shot tools
Most AI code tools are one-shot: you prompt, it generates, you review. That works for individual tasks, but it doesn’t compound. Each session starts from zero.
PromptWheel is designed to get smarter every run through three mechanisms:
Cross-run learnings. Every success and failure is recorded and persisted across sessions. If a change to your auth module broke tests in session 3, session 4 knows about it. Learnings use temporal decay so recent experience weighs more than old history.
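Temporal decay can be sketched with an exponential half-life. This is one plausible scheme under assumed parameters (a half-life of 5 sessions, a neutral 0.5 prior when there is no history); the source does not specify PromptWheel's actual weighting.

```typescript
// Hedged sketch of temporal decay: an outcome's weight halves every
// `halfLife` sessions, so recent experience dominates old history.
function decayedWeight(ageInSessions: number, halfLife = 5): number {
  return Math.pow(0.5, ageInSessions / halfLife);
}

// Fold recorded outcomes into a single decayed success score in [0, 1].
function learningScore(
  outcomes: { success: boolean; ageInSessions: number }[],
): number {
  let num = 0;
  let den = 0;
  for (const o of outcomes) {
    const w = decayedWeight(o.ageInSessions);
    num += w * (o.success ? 1 : 0);
    den += w;
  }
  return den > 0 ? num / den : 0.5; // no history → neutral prior (assumption)
}
```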
Sector rotation. The codebase is divided into logical regions. An EMA-weighted rotation concentrates effort where proposals succeed and automatically skips exhausted areas. By session 20, PromptWheel knows which parts of your codebase still have low-hanging fruit and which are already polished.
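An EMA-weighted rotation might look like the following sketch. The smoothing factor, the skip threshold, and the greedy pick are assumptions chosen for illustration; PromptWheel's real constants and selection policy may differ.

```typescript
// Sketch of EMA-weighted sector rotation under assumed parameters.
const ALPHA = 0.3;       // weight of the newest cycle's success rate
const SKIP_BELOW = 0.1;  // sectors below this EMA look exhausted

// Standard exponential moving average update.
function updateEma(prev: number, successRate: number): number {
  return ALPHA * successRate + (1 - ALPHA) * prev;
}

// Greedily pick the highest-scoring sector that isn't exhausted.
function nextSector(ema: Map<string, number>): string | undefined {
  let best: string | undefined;
  for (const [sector, score] of ema) {
    if (score < SKIP_BELOW) continue; // exhausted: skip automatically
    if (best === undefined || score > ema.get(best)!) best = sector;
  }
  return best;
}
```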
Formula adaptation. Built-in formulas rotate using UCB1 bandit scoring, a multi-armed bandit algorithm also used in recommender systems and game-tree search. It balances exploiting what’s worked with exploring less-tried approaches.
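UCB1 itself is standard and compact. Treating each formula as a bandit arm and the merged-ticket rate as its reward is my assumption about how PromptWheel maps the algorithm onto formulas; the score formula below is the textbook one.

```typescript
// Textbook UCB1: mean reward plus an exploration bonus that shrinks
// as an arm (formula) accumulates pulls (runs).
function ucb1(meanReward: number, pulls: number, totalPulls: number): number {
  if (pulls === 0) return Infinity; // never-tried formulas get priority
  return meanReward + Math.sqrt((2 * Math.log(totalPulls)) / pulls);
}

// Pick the formula with the highest UCB1 score.
function pickFormula(
  stats: { name: string; meanReward: number; pulls: number }[],
): string {
  const total = stats.reduce((sum, f) => sum + f.pulls, 0);
  let best = stats[0];
  for (const f of stats) {
    if (ucb1(f.meanReward, f.pulls, total) > ucb1(best.meanReward, best.pulls, total)) {
      best = f;
    }
  }
  return best.name;
}
```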
The result: session 1 is broad and exploratory. Session 5 is focused and efficient. Session 20 is surgical.
How we actually use it
We run PromptWheel on our own projects — including this site. Here’s the workflow:
# Single improvement cycle
promptwheel
# Extended batch run with milestone PRs
promptwheel --hours 4 --batch-size 30
# Security-focused audit
promptwheel --formula security-audit
# Architectural review (finds structural problems, not just lint)
promptwheel --deep
Or through Claude Code:
/promptwheel:run spin hours=4
/promptwheel:run formula=security-audit
/promptwheel:run deep
A typical batch session on a mid-size Next.js project produces 15-25 merged improvements covering type safety fixes, missing error boundaries, test coverage gaps, dead import cleanup, and documentation updates. Each one passes the full test suite before review.
Safety mechanisms
Letting an AI modify your codebase is a reasonable thing to be nervous about. PromptWheel has six layers of protection:
- Scope enforcement. Every ticket is sandboxed to specific file paths. The tool physically prevents changes outside the allowed scope.
- Trust ladder. PromptWheel starts conservative — refactoring, tests, docs. More impactful categories unlock as you build confidence. You control what it’s allowed to touch.
- Adversarial review. Every proposal goes through a devil’s-advocate scoring pass. Changes that look good but have hidden risk get flagged before execution.
- Mandatory QA. Type checker, test suite, and linter run against every change. No exceptions. Failed tickets are blocked.
- Spindle detection. If an agent gets stuck in a QA ping-pong loop, file churn cycle, or command failure spiral, PromptWheel catches it and moves on instead of burning tokens.
- Draft PRs. Nothing merges without your review. Every change lands as a draft PR (or a milestone batch) for human approval.
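Scope enforcement reduces to a path check before any change is accepted. This sketch deliberately simplifies glob handling to prefix patterns like `src/auth/**`; the real enforcement happens at the git/worktree layer, and the function names here are hypothetical.

```typescript
// Simplified scope check: only exact paths and trailing "/**" prefix
// globs are handled. Real glob matching is richer than this.
function inScope(filePath: string, allowedGlobs: string[]): boolean {
  return allowedGlobs.some((glob) =>
    glob.endsWith("/**")
      ? filePath.startsWith(glob.slice(0, -2)) // "src/auth/**" → "src/auth/"
      : filePath === glob,
  );
}

// Reject a ticket outright if it touched anything outside its sandbox.
function assertScoped(changedFiles: string[], allowedGlobs: string[]): void {
  for (const f of changedFiles) {
    if (!inScope(f, allowedGlobs)) {
      throw new Error(`ticket touched out-of-scope file: ${f}`);
    }
  }
}
```

With this shape, a ticket scoped to `src/auth/**` cannot ship a change to `src/billing/` even if the agent writes one: the diff is rejected before QA ever runs.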
Formulas and trajectories
Formulas are one-command presets for common improvement patterns:
promptwheel --formula security-audit # OWASP vulnerability scan
promptwheel --formula test-coverage # Missing unit tests
promptwheel --formula type-safety # Remove any/unknown casts
promptwheel --formula cleanup # Dead code, unused imports
promptwheel --formula docs # Documentation gaps
You can write custom formulas in YAML and drop them in .promptwheel/formulas/.
Trajectories are structured multi-step improvement plans. Define a dependency graph where each step has scoped file paths, acceptance criteria, and verification commands:
# .promptwheel/trajectories/harden-auth.yaml
name: harden-auth
description: Security hardening for auth module
steps:
  - id: input-validation
    title: Add input validation to all auth endpoints
    scope: "src/auth/**"
    categories: [security]
    acceptance_criteria:
      - All endpoints validate input before processing
    verification_commands:
      - npm test -- src/auth
  - id: rate-limiting
    title: Add rate limiting to login and reset endpoints
    scope: "src/auth/**,src/middleware/**"
    depends_on: [input-validation]
    acceptance_criteria:
      - Login endpoint rate-limited to 5 attempts per minute
    verification_commands:
      - npm test -- src/auth src/middleware
Sessions focus on the current step and advance automatically as acceptance criteria are verified.
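The dependency gating can be sketched in a few lines: a step becomes runnable once everything it `depends_on` has been verified. The field names mirror the YAML above; the selection logic is my illustration, not PromptWheel's internals.

```typescript
// A trajectory step gated on its dependencies, mirroring the YAML's
// id / depends_on fields (logic is illustrative).
interface Step {
  id: string;
  dependsOn: string[];
}

// A step is runnable when it isn't verified yet but all of its
// dependencies are.
function runnableSteps(steps: Step[], verified: Set<string>): Step[] {
  return steps.filter(
    (s) => !verified.has(s.id) && s.dependsOn.every((d) => verified.has(d)),
  );
}
```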
What we learned building it
Impact scoring matters more than volume. Early versions generated too many low-value changes — import reordering, whitespace fixes, trivial renames. Adding impact × confidence scoring and a minimum threshold filter transformed the output from noisy to useful.
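The gate described above is simple to state. The 0–1 scales and the threshold value here are assumptions for illustration; only the impact × confidence product and the minimum-threshold filter come from the text.

```typescript
// Minimal impact × confidence gate. Scales and threshold are assumed.
interface Candidate {
  title: string;
  impact: number;     // 0–1: how much the change matters
  confidence: number; // 0–1: how sure the filter is it's correct
}

const MIN_SCORE = 0.35; // assumed cutoff; trivial changes fall below it

function worthDoing(c: Candidate): boolean {
  return c.impact * c.confidence >= MIN_SCORE;
}
```

Under this gate, a confident whitespace fix (high confidence, near-zero impact) scores low and gets dropped, while a plausible null-check in a hot path survives.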
Parallel execution needs conflict awareness. Running five tickets in parallel is fast, but if two tickets modify the same file, one of them will fail on merge. PromptWheel uses wave scheduling to detect potential conflicts before execution and serializes overlapping scopes.
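Wave scheduling can be sketched greedily: keep adding tickets to the current wave unless their scope overlaps something already in it, and push conflicting tickets to a later wave. The prefix-based overlap test and the greedy placement are simplifying assumptions.

```typescript
// Greedy wave scheduling sketch: no two tickets in the same wave may
// share files, approximated here by path-prefix overlap.
interface Ticket {
  id: string;
  scope: string[]; // file-path prefixes, e.g. "src/auth/"
}

function overlaps(a: string[], b: string[]): boolean {
  return a.some((x) => b.some((y) => x.startsWith(y) || y.startsWith(x)));
}

function scheduleWaves(tickets: Ticket[]): Ticket[][] {
  const waves: Ticket[][] = [];
  for (const t of tickets) {
    // First wave with no scope conflict takes the ticket.
    const wave = waves.find((w) => w.every((o) => !overlaps(o.scope, t.scope)));
    if (wave) wave.push(t);
    else waves.push([t]); // everything conflicts: open a new wave
  }
  return waves;
}
```

Tickets within a wave run in parallel; waves run one after another, so overlapping scopes are effectively serialized.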
Deduplication is harder than it sounds. Naive deduplication (same file + same line) misses semantic duplicates. PromptWheel tracks proposals by intent, not just location, so it won’t propose the same type safety fix three sessions in a row even if the surrounding code changed.
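One way to key proposals by intent rather than location is to normalize each proposal to a category plus its target symbol. That normalization is my assumption; the real keying is presumably richer, but the idea that the key survives surrounding-code churn is the point from the text.

```typescript
// Intent-keyed dedup sketch: the key ignores file position entirely,
// so the same fix re-proposed after surrounding code moves still hits.
interface IntentProposal {
  category: string; // e.g. "type-safety"
  target: string;   // e.g. the function or symbol being fixed
}

function intentKey(p: IntentProposal): string {
  return `${p.category}:${p.target.toLowerCase()}`;
}

function dedupe(
  proposals: IntentProposal[],
  history: Set<string>, // persisted across sessions
): IntentProposal[] {
  return proposals.filter((p) => {
    const key = intentKey(p);
    if (history.has(key)) return false;
    history.add(key);
    return true;
  });
}
```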
Runaway agents are a real problem. Without spindle detection, an agent that hits a test failure will sometimes enter a fix-break-fix-break loop that burns hundreds of thousands of tokens. Detecting these patterns early and failing the ticket saves significant cost.
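A minimal loop detector along these lines hashes each (command, outcome) step and bails when the same state repeats too often. The repeat threshold and the state key are assumptions; PromptWheel's spindle detection also covers file-churn and QA ping-pong patterns that this sketch omits.

```typescript
// Spindle-detection sketch: fail the ticket once the same
// (command, outcome) pair repeats `maxRepeats` times.
function makeSpindleDetector(maxRepeats = 3) {
  const seen = new Map<string, number>();
  return (command: string, outcome: string): boolean => {
    const key = `${command}|${outcome}`;
    const count = (seen.get(key) ?? 0) + 1;
    seen.set(key, count);
    return count >= maxRepeats; // true → stuck, abort instead of retrying
  };
}
```

The payoff is bounded cost: instead of an open-ended fix-break-fix loop, a stuck ticket fails after a fixed number of identical attempts.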
Getting started
PromptWheel is free and open source under Apache 2.0.
As a Claude Code plugin (recommended):
/install promptwheel@promptwheel
Uses your existing Claude Code subscription. No additional API key needed.
As a standalone CLI:
npm install -g @promptwheel/cli
cd your-project
promptwheel init
promptwheel # Single cycle
promptwheel --hours 2 # Timed run
promptwheel --max-cycles 5 # Multi-cycle batch
PromptWheel also supports Codex (--codex), Kimi (--kimi), and local models (--local).
The source is at github.com/promptwheel-ai/promptwheel. Stars, issues, and contributions are welcome.
FAQ
How is this different from just running Claude Code?
Claude Code is a single-session tool. PromptWheel adds batch orchestration, cross-run memory, parallel execution with conflict-aware scheduling, scope enforcement, deduplication, adversarial review, and structured progression via trajectories and formulas. Think of it as the orchestration layer on top of your AI coding tool.
Will it break my code?
Every change runs through your type checker, test suite, and linter. Failed tickets are blocked, not merged. Scope enforcement sandboxes each ticket to specific file paths. All PRs are drafts by default — nothing merges without your review.
What languages and frameworks does it support?
PromptWheel auto-detects your test runner, framework, linter, and language across 10+ ecosystems, including TypeScript, Python, Go, Rust, Java, Ruby, and PHP. If your project has a test command that returns an exit code, PromptWheel can verify changes against it.
How much does it cost?
PromptWheel itself is free and open source. It uses your existing AI provider credentials. The Claude Code plugin uses your existing subscription. The local backend (--local) is completely free.
