
adaptive-harness

provenance:github:SeongwoongCho/adaptive-harness

A self-improving harness router for Claude Code.

<p align="center">
  <h1 align="center">adaptive-harness</h1>
  <p align="center">
    <strong>A self-improving harness router for Claude Code.</strong><br/>
    It watches every task, picks the best workflow, scores the result, and evolves — automatically.
  </p>
  <p align="center">
    <a href="#installation">Install</a> &nbsp;&bull;&nbsp;
    <a href="#how-it-works">How It Works</a> &nbsp;&bull;&nbsp;
    <a href="#built-in-harnesses">Harnesses</a> &nbsp;&bull;&nbsp;
    <a href="#contributing">Contributing</a>
  </p>
</p>

---

> **Unlike static skill packs, adaptive-harness gets smarter the more you use it.**

```
You: Fix the login bug where empty email crashes the server

[adaptive-harness]
  Classified:  bugfix | low uncertainty | local | backend
  Selected:    tdd-driven (score 0.92)  >  systematic-debugging (0.81)

[tdd-driven subagent]
  1. Write failing test for empty-email path   ✓
  2. Implement null guard in validateEmail()   ✓
  3. Run test suite (47/47 pass)               ✓

[evaluator]
  correctness: 1.00 | completeness: 1.00 | quality: 0.91
  robustness: 0.88 | clarity: 0.95 | verifiability: 0.92
  overall: 0.95  ← harness weight updated: 1.00 → 1.02
```

After 8 sessions on similar tasks, the router **learns your codebase's patterns** and consistently picks the highest-scoring workflow.

---

## How It Works

```
User Task
    │
    ▼
┌─────────────────────────────────────┐
│  1. Classify    6-axis taxonomy     │
│  2. Route       best harness(es)    │
│  3. Execute     subagent pipeline   │
│  4. Evaluate    6-dim scoring       │
│  5. Evolve      update weights      │
└─────────────────────────────────────┘
```

**Three levels of self-improvement:**

| Level | What improves | How |
|-------|--------------|-----|
| **Routing** | Which harness gets picked | Weights adjust after every evaluation |
| **Content** | What the harness actually does | Evolution manager rewrites agent personas and `skill.md` via A/B testing |
| **Genesis** | Which harnesses exist | Evolution manager creates new harnesses by combining existing ones |
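
The routing level can be pictured as a small feedback update. The README does not state the update rule, so the sketch below assumes a simple additive nudge: the weight moves by a learning rate times the gap between the overall score and a neutral baseline. `update_weight`, `learning_rate`, `baseline`, and the clamp bounds are hypothetical names and values, not part of the plugin.

```python
def update_weight(weight: float, overall: float,
                  learning_rate: float = 0.05, baseline: float = 0.5,
                  lo: float = 0.1, hi: float = 3.0) -> float:
    """Nudge a harness routing weight based on one evaluation.

    Assumed rule (not taken from the plugin source): scores above
    `baseline` raise the weight, scores below it lower the weight, and
    the result is clamped to the range [lo, hi].
    """
    updated = weight + learning_rate * (overall - baseline)
    return max(lo, min(hi, updated))


weights = {"tdd-driven": 1.00, "systematic-debugging": 1.00}
weights["tdd-driven"] = update_weight(weights["tdd-driven"], overall=0.9)
print(round(weights["tdd-driven"], 2))  # 1.02
```

Clamping is one way to keep a single harness from dominating or disappearing from routing; the real system may bound weights differently or not at all.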

Hard tasks (`uncertainty=high` **and** `verifiability=hard` or `blast_radius=repo-wide`) automatically trigger **ensemble mode** — two harnesses run in parallel, a synthesizer merges the best of both.
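
The ensemble trigger can be written as a predicate over the classification. The sentence above does not say how the `and` and `or` group, so this sketch assumes the reading "high uncertainty, combined with either hard verifiability or a repo-wide blast radius". `Classification` and `needs_ensemble` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    uncertainty: str    # "low" | "medium" | "high"
    verifiability: str  # "easy" | "moderate" | "hard"
    blast_radius: str   # "local" | "cross-module" | "repo-wide"


def needs_ensemble(c: Classification) -> bool:
    # Assumed grouping: high uncertainty AND (hard to verify OR repo-wide).
    return c.uncertainty == "high" and (
        c.verifiability == "hard" or c.blast_radius == "repo-wide"
    )


print(needs_ensemble(Classification("high", "hard", "local")))      # True
print(needs_ensemble(Classification("low", "hard", "repo-wide")))   # False
```

Under the other possible reading, where any repo-wide task triggers ensemble mode regardless of uncertainty, the second call would return `True`.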

---

## Installation

```bash
claude plugin marketplace add https://github.com/SeongwoongCho/adaptive-harness
claude plugin install adaptive-harness@adaptive-harness
```

Then start a new Claude Code session.

---

## Quick Start

```bash
cd your-project
claude                              # new session — hooks auto-initialize with --general defaults
```

That's it. Every task is now routed through the adaptive-harness pipeline automatically.

```
# Or run explicitly with options
/adaptive-harness:run "Refactor the payment module"
/adaptive-harness:run "Build a new feature"              # interview runs by default
/adaptive-harness:run --skip-interview "Build a new feature"  # skip interview
/adaptive-harness:run --harness=tdd-driven "Fix the login bug"
```

---

## Built-in Harnesses

| Harness | Best For | Model |
|---------|----------|-------|
| **tdd-driven** | Strict red-green-refactor cycles with enforced test coverage gates | Sonnet |
| **systematic-debugging** | Root cause analysis through structured reproduce-isolate-fix-verify phases | Sonnet |
| **rapid-prototype** | Fast MVP building with speed as the primary constraint | Sonnet |
| **research-iteration** | Hypothesis-driven cycles for high-uncertainty problems with rigorous measurement | Opus |
| **careful-refactor** | Safe refactoring via Mikado method without changing observable behavior | Sonnet |
| **code-review** | Multi-perspective review across security, quality, performance, and maintainability | Opus |
| **migration-safe** | Schema, dependency, and API migrations with audit trails and rollback plans | Sonnet |
| **ralplan-consensus** | Implementation planning with self-review — analyzes, plans, then challenges its own assumptions | Opus |
| **ralph-loop** | Persistent execution loop until all acceptance criteria pass (max iterations bounded) | Sonnet |
| **engineering-retro** | Weekly retrospective with commit history analysis, contributor metrics, trend tracking, and growth coaching | Sonnet |
| **plan-review** | Challenges scope and reviews architecture, quality, tests, and performance one issue at a time with failure mode analysis | Opus |
| **qa-testing** | Tests applications like a real user, computes a health score, and produces a structured report with screenshot evidence | Sonnet |
| **pre-landing-review** | Pre-merge diff review with critical (blocking) and informational (advisory) passes and interactive resolution | Sonnet |
| **ship-workflow** | Automated release: merges main, runs tests, bumps version, generates changelog, creates bisectable commits, and opens a PR | Sonnet |
| **deep-interview** | Resolves ambiguous requirements through structured clarifying interviews, builds a confirmed spec, then executes against it | Opus |
| **simple-executor** | Lightweight executor for trivial, well-defined local changes — no planning overhead | Sonnet |
| **documentation-writer** | Reads source truth first, then drafts accurate and well-styled docs, READMEs, API references, and guides | Sonnet |
| **security-audit** | OWASP Top-10 scan, dependency audit, secrets scan, and threat modeling with a prioritized findings report | Opus |
| **performance-optimization** | Measurement-driven optimization cycles: baseline → profile → hypothesize → implement → measure → verify | Sonnet |

### Experimental Harnesses

| Harness | Best For | Model |
|---------|----------|-------|
| **progressive-refinement** | Iterative quality improvement — rough solution first, then targets weakest dimension each pass | Sonnet |
| **divide-and-conquer** | Splits large tasks into independent sub-tasks, solves in isolation, integrates and verifies | Sonnet |
| **adversarial-review** | Implements a solution, then deliberately tries to break it with adversarial tests and edge-case attacks | Sonnet |
| **spike-then-harden** | Two-phase: fast throwaway prototype to learn the problem space, then production-quality rewrite | Sonnet |

The router supports **harness chaining** — e.g. `plan → execute → review` for complex tasks. Chains are **adaptive**: if a harness discovers mid-execution that the next planned step is wrong, it emits a `next_harness_hint` and the orchestrator reroutes dynamically.
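
A minimal sketch of adaptive chaining, assuming each harness returns its result together with an optional `next_harness_hint`, and that a hint replaces the next planned step. `run_chain`, the `Harness` callable shape, and `max_steps` are hypothetical; the plugin's orchestrator may be structured quite differently.

```python
from typing import Callable, Optional

# Assumed shape: a harness takes the task and returns
# (result, next_harness_hint or None).
Harness = Callable[[str], tuple[str, Optional[str]]]


def run_chain(task: str, plan: list[str], harnesses: dict[str, Harness],
              max_steps: int = 10) -> list[str]:
    """Run harnesses in planned order, rerouting when a step emits a hint."""
    queue = list(plan)
    executed: list[str] = []
    while queue and len(executed) < max_steps:
        name = queue.pop(0)
        _result, hint = harnesses[name](task)
        executed.append(name)
        if hint is not None and hint in harnesses:
            if queue:
                queue[0] = hint      # replace the next planned step
            else:
                queue.append(hint)   # no step left: run the hinted harness
    return executed


harnesses = {
    "ralplan-consensus": lambda t: ("plan", "systematic-debugging"),
    "tdd-driven": lambda t: ("code", None),
    "systematic-debugging": lambda t: ("fix", None),
    "code-review": lambda t: ("review", None),
}
plan = ["ralplan-consensus", "tdd-driven", "code-review"]
print(run_chain("Fix the login bug", plan, harnesses))
# ['ralplan-consensus', 'systematic-debugging', 'code-review']
```

The `max_steps` bound is there so that two harnesses hinting at each other cannot loop forever.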

---

## Task Taxonomy (6 Axes)

Every task is classified along six axes by LLM reasoning (not keyword matching). A seventh field, `domain_hint`, is optional and does not affect routing:

| Axis | Values |
|------|--------|
| `task_type` | bugfix / feature / refactor / research / migration / incident / benchmark |
| `uncertainty` | low / medium / high |
| `blast_radius` | local / cross-module / repo-wide |
| `verifiability` | easy / moderate / hard |
| `latency_sensitivity` | low / high |
| `domain` | backend / frontend / mobile / ml-research / data-engineering / devops / security / infra / docs |
| `domain_hint` | *(optional)* free-text hint for mixed-domain tasks — logged for analytics, not used in routing (e.g., `"also touches devops"`, `"Spark ETL pipeline"`) |
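
The taxonomy maps naturally onto a typed record. The sketch below encodes the axis values from the table; the class name `TaskClassification`, the `AXES` lookup, and the validation step are illustrative assumptions, not the plugin's actual data model.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

TaskType = Literal["bugfix", "feature", "refactor", "research",
                   "migration", "incident", "benchmark"]
Uncertainty = Literal["low", "medium", "high"]
BlastRadius = Literal["local", "cross-module", "repo-wide"]
Verifiability = Literal["easy", "moderate", "hard"]
LatencySensitivity = Literal["low", "high"]
Domain = Literal["backend", "frontend", "mobile", "ml-research",
                 "data-engineering", "devops", "security", "infra", "docs"]

# Maps each routing axis to its allowed values, taken from the table above.
AXES = {
    "task_type": get_args(TaskType),
    "uncertainty": get_args(Uncertainty),
    "blast_radius": get_args(BlastRadius),
    "verifiability": get_args(Verifiability),
    "latency_sensitivity": get_args(LatencySensitivity),
    "domain": get_args(Domain),
}


@dataclass(frozen=True)
class TaskClassification:
    task_type: TaskType
    uncertainty: Uncertainty
    blast_radius: BlastRadius
    verifiability: Verifiability
    latency_sensitivity: LatencySensitivity
    domain: Domain
    domain_hint: Optional[str] = None  # free text, logged only, never routed on

    def __post_init__(self) -> None:
        # Literal hints are not enforced at runtime, so validate explicitly.
        for axis, allowed in AXES.items():
            value = getattr(self, axis)
            if value not in allowed:
                raise ValueError(f"{axis}={value!r} not in {allowed}")


login_bug = TaskClassification(
    task_type="bugfix", uncertainty="low", blast_radius="local",
    verifiability="easy",       # assumed for illustration
    latency_sensitivity="low",  # assumed for illustration
    domain="backend",
)
```

Keeping `domain_hint` out of `AXES` mirrors the table: it travels with the classification but plays no part in validation or routing.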

---

## Evaluation Dimensions

Every task result is scored on **6 fixed dimensions**, each with a fixed weight (the weights sum to 1.00):

| Dimension | Weight | What it measures |
|-----------|--------|-----------------|
| **correctness** | 0.25 | Does the output satisfy stated requirements? |
| **completeness** | 0.20 | Does the output cover the full scope? |
| **quality** | 0.20 | Structural and stylistic quality |
| **robustness** | 0.10 | Edge case and failure mode handling |
| **clarity** | 0.15 | Clear communication of intent |
| **verifiability** | 0.10 | Can the output be independently verified? |

These dimensions apply universally to all task types — code, research, planning, writing, documentation. The evaluator model is auto-routed: Sonnet for simple tasks, Opus for complex ones.
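
With fixed weights, the overall score is presumably a weighted sum of the six dimension scores. The sketch below uses the weights from the table; the function name `overall_score` and the sample scores are hypothetical, and the real evaluator may aggregate or round differently.

```python
WEIGHTS = {
    "correctness": 0.25,
    "completeness": 0.20,
    "quality": 0.20,
    "robustness": 0.10,
    "clarity": 0.15,
    "verifiability": 0.10,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError("scores must cover exactly the six dimensions")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Hypothetical scores: perfect correctness, 0.90 on everything else.
sample = {dim: 0.90 for dim in WEIGHTS}
sample["correctness"] = 1.00
print(round(overall_score(sample), 3))  # 0.925
```

Because the weights sum to 1.00, the overall score stays in the same 0 to 1 range as the individual dimensions.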

---

## Evolution System

The evolution ma

[truncated…]

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Mar 14, 2026
last updated: Mar 19, 2026
last crawled: today