idle-harness

provenance:github:jhlee0409/idle-harness

WHAT THIS AGENT DOES

Idle Harness automatically creates complete web applications from short descriptions. If you have an idea for a website but don't know how to code, this tool handles the entire process, from designing the look and feel to building the underlying technology. It is aimed at entrepreneurs, small business owners, and anyone who wants to turn a web-based idea into a working app quickly without technical expertise. Several AI agents collaborate to plan, build, and test the application, feeding test results back into the build for up to three rounds. Its distinguishing feature is that the whole application is produced without any human-written code, and problems found during testing are fixed automatically along the way.

README

# Idle Harness

> GAN-inspired multi-agent system that autonomously builds full-stack web apps from a single prompt using Claude AI agents

## What is Idle Harness?

Idle Harness is an autonomous multi-agent coding system inspired by GAN (Generative Adversarial Network) architecture. It takes a short natural-language prompt and automatically generates a complete full-stack web application — frontend, backend, database, and styling — without human intervention.

The system orchestrates three specialized AI agents (Planner, Generator, and Evaluator) that collaborate through a structured build-evaluate-iterate loop. Like a GAN's generator-discriminator dynamic, the Generator builds the application while the Evaluator tests it as a real user would — without ever reading the source code. This adversarial relationship drives quality: the Generator can't cut corners because the Evaluator will catch it.

Built on [Anthropic's harness design for long-running apps](https://www.anthropic.com/engineering/harness-design-long-running-apps) and powered by the Claude Agent SDK.

## Quick Start

```bash
git clone https://github.com/jhlee0409/idle-harness.git
cd idle-harness
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Interactive setup — configures auth automatically
python orchestrator.py --setup

# Build an app
python orchestrator.py "A tarot reading web app with card-draw animations and AI interpretations"

# After build — start the app
python orchestrator.py serve
```

That's it. If anything is missing, the harness detects it and offers to fix it automatically.

## Setup

Idle Harness includes a built-in interactive setup that detects and configures all dependencies.

### Option A: Auto-detect on first run

Just run the harness. If dependencies are missing, it offers to fix them:

```
$ python orchestrator.py "my app idea"

Preflight checks
  ✓ claude_agent_sdk
  ✓ node (v20.11.0)
  ✓ npm (10.2.4)
  ✓ git (2.43.0)
  ✗ auth — No auth configured
  ✓ MCP: playwright (SDK-managed via npx)

  Fix 1 issue(s) automatically? [Y/n]: y

  Choose auth method:
    [o] OAuth login (uses subscription quota)
    [a] API key (pay per use, no quota limit)
  Choose: o
  → Running: claude login
  ✓ OAuth authenticated

All issues fixed
```

### Option B: Explicit setup

```bash
python orchestrator.py --setup
```

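Runs all checks and applies every available automatic fix without a confirmation prompt. Dependencies that cannot be installed automatically (`node`, `npm`, `git`, and the Claude CLI) are reported with an install link instead.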

### Option C: CI / Non-interactive

```bash
CI=1 python orchestrator.py "my app idea"
```

Fails hard with exit code 1 if any dependency is missing. No interactive prompts.
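
The three options above differ only in how a failed preflight is handled. The following is a minimal sketch of that decision. It assumes the checks have already produced a list of failure names. `resolve_preflight` and its parameters are hypothetical names used for illustration, not the harness's actual code:

```python
from typing import Callable, Mapping, Sequence


def resolve_preflight(
    failures: Sequence[str],
    *,
    setup_flag: bool,
    env: Mapping[str, str],
    ask: Callable[[str], str],
) -> str:
    """Return "ok", "fix", or "abort"; exit with code 1 in CI mode.

    Hypothetical helper mirroring Options A-C; not taken from the repo.
    """
    if not failures:
        return "ok"
    # Option C: any non-empty CI variable means non-interactive, so fail hard.
    if env.get("CI"):
        raise SystemExit(1)
    # Option B: --setup applies fixes without a confirmation prompt.
    if setup_flag:
        return "fix"
    # Option A: ask once; an empty answer counts as "yes", matching [Y/n].
    answer = ask(f"Fix {len(failures)} issue(s) automatically? [Y/n]: ")
    return "fix" if answer.strip().lower() in ("", "y", "yes") else "abort"


# Usage: an interactive run where the user just presses Enter.
decision = resolve_preflight(["auth"], setup_flag=False, env={}, ask=lambda _: "")
```

The sketch takes `env` and `ask` as parameters instead of reading `os.environ` and calling `input()` inside the function. That way the logic can be tested without a terminal.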

### Other commands

```bash
# Start the last-built app (opens browser automatically)
python orchestrator.py serve

# Clean runtime artifacts for a fresh run
python orchestrator.py clean

# Clean everything including generated apps
python orchestrator.py clean --all
```

### What gets checked

| Check | Auto-fixable | How |
|---|---|---|
| `claude_agent_sdk` | Yes | `pip install claude-agent-sdk` |
| `node`, `npm`, `git` | No | Prints install link |
| Claude CLI | No | Prints install link |
| Auth (OAuth or API key) | Yes | `claude login` or API key input |
| Playwright MCP | Automatic | SDK launches via npx — no user config needed |
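
The table above could be implemented roughly as follows. This sketch rests on several assumptions:

- `run_checks`, `split_failures`, and the `oauth_ok` callback are hypothetical names.
- OAuth detection is left as a stub, because the README does not say where the CLI stores its credentials.
- Playwright MCP is not checked, because the SDK launches it through `npx`.

```python
import importlib.util
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# Per the table: the SDK is fixed via pip, auth via `claude login` or an API key.
AUTO_FIXABLE = {"claude_agent_sdk", "auth"}


def _module_installed(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def run_checks(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_module: Callable[[str], bool] = _module_installed,
    env: Mapping[str, str] = os.environ,
    oauth_ok: Callable[[], bool] = lambda: False,  # stub: OAuth detection not shown
) -> Dict[str, bool]:
    """Map each check name to True (ok) or False (missing)."""
    results = {"claude_agent_sdk": has_module("claude_agent_sdk")}
    for exe in ("node", "npm", "git", "claude"):  # "claude" is the Claude CLI
        results[exe] = which(exe) is not None
    # Either an API key in the environment or an OAuth login satisfies auth.
    results["auth"] = bool(env.get("ANTHROPIC_API_KEY")) or oauth_ok()
    return results


def split_failures(results: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Separate failures the harness can fix from ones needing a manual install."""
    failed = [name for name, ok in results.items() if not ok]
    auto = [n for n in failed if n in AUTO_FIXABLE]
    manual = [n for n in failed if n not in AUTO_FIXABLE]
    return auto, manual
```

Injecting `which`, `has_module`, and `env` keeps the checks testable on any machine. A real run would simply call `run_checks()` with the defaults.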

### Authentication options

| Method | How to configure | Cost model |
|---|---|---|
| **OAuth** | `claude login` (interactive setup handles this) | Uses subscription quota (Pro/Max plan) |
| **API key** | Set `ANTHROPIC_API_KEY` env var | Pay per token, no quota limit |

## How It Works

```
User Prompt (1-4 sentences)
    ↓
┌─────────┐     ┌───────────┐     ┌───────────┐
│ Planner │ ──→ │ Generator │ ←─→ │ Evaluator │
│         │     │           │     │           │
│ Spec    │     │ React+TS  │     │ Browser   │
│ Design  │     │ Vite      │     │ Testing   │
│ Language│     │ FastAPI   │     │ Screenshot│
│         │     │ SQLite    │     │ Grading   │
└─────────┘     └───────────┘     └───────────┘
                      ↕
              Build → Evaluate → Feedback Loop (max 3 rounds)
```

1. **Plan** — Planner reads the frontend design skill, then expands the prompt into a full product spec with visual design language
2. **Negotiate** — Generator and Evaluator negotiate sprint contracts with testable criteria
3. **Build** — Generator implements the full-stack app with a React + TypeScript frontend and a FastAPI backend, then writes and runs tests (a continuous session preserves context across retries)
4. **Evaluate** — Evaluator tests the running app via Playwright, grading on product depth, functionality, visual design, and code quality
5. **Iterate** — On FAIL, feedback is returned to the Generator for another attempt (up to 3 rounds)
6. **Integration** — After all sprints, a final cross-sprint evaluation verifies the complete application works together
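
The six steps above can be sketched as a plain control loop. This sketch assumes each agent is exposed as a Python callable. `plan`, `negotiate`, `build`, `evaluate`, and `integrate` are hypothetical stand-ins for Claude Agent SDK sessions. It also assumes the Evaluator returns a pass/fail verdict with feedback. The sketch shows the retry structure only, not the harness's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Evaluator outcome: pass/fail plus feedback for the Generator."""
    passed: bool
    feedback: str = ""


def run_sprint(
    contract: str,
    build: Callable[[str, Optional[str]], None],
    evaluate: Callable[[str], Verdict],
    max_build_attempts: int = 3,
) -> bool:
    """Build -> Evaluate -> Feedback loop for one sprint contract."""
    feedback: Optional[str] = None
    for _ in range(max_build_attempts):
        build(contract, feedback)      # Generator implements (or repairs) the sprint
        verdict = evaluate(contract)   # Evaluator tests the running app
        if verdict.passed:
            return True
        feedback = verdict.feedback    # FAIL: hand the critique to the next attempt
    return False


def run_full_mode(
    prompt: str,
    plan: Callable[[str], List[str]],
    negotiate: Callable[[str], str],
    build: Callable[[str, Optional[str]], None],
    evaluate: Callable[[str], Verdict],
    integrate: Callable[[], Verdict],
    max_build_attempts: int = 3,
) -> Tuple[List[bool], Verdict]:
    sprints = plan(prompt)                    # 1. Planner: prompt -> sprints
    results = []
    for sprint in sprints:
        contract = negotiate(sprint)          # 2. agree on testable criteria
        results.append(run_sprint(contract, build, evaluate, max_build_attempts))  # 3-5
    return results, integrate()               # 6. final cross-sprint evaluation
```

In the real system the Generator keeps one continuous session across retries, so `feedback` arrives on top of context the agent already holds. In this sketch it is simply passed as an argument.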

### The GAN Principle

The Evaluator never reads source code. It can only interact with the running application through a browser — clicking buttons, filling forms, taking screenshots. This mirrors how a GAN's discriminator only sees the output, never the generator's internals. The result: the Generator must produce genuinely working software, not just code that looks correct.
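
One way to enforce this separation is to give the Evaluator only browser tools. The sketch below makes two assumptions:

- MCP tools follow Claude Code's `mcp__<server>__<tool>` naming.
- The Playwright MCP tool names shown are examples.

The filter functions are hypothetical and not taken from the repo:

```python
from typing import Iterable, List

# Assumption: MCP tools are exposed as mcp__<server>__<tool>, as in Claude Code.
BROWSER_TOOL_PREFIX = "mcp__playwright__"


def evaluator_may_use(tool_name: str) -> bool:
    """Allowlist check: only Playwright browser tools, so no file or shell access."""
    return tool_name.startswith(BROWSER_TOOL_PREFIX)


def evaluator_toolset(available: Iterable[str]) -> List[str]:
    """Filter a full tool list down to what the Evaluator may call."""
    return [name for name in available if evaluator_may_use(name)]


all_tools = [
    "Read", "Grep", "Glob", "Bash",  # each of these could expose source code
    "mcp__playwright__browser_navigate",
    "mcp__playwright__browser_click",
    "mcp__playwright__browser_take_screenshot",
]
allowed = evaluator_toolset(all_tools)
```

The sketch uses an allowlist rather than a denylist. Any file or shell tool added later is therefore excluded by default instead of silently leaking source code to the Evaluator.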

## Agents

| Agent | Role | Key Behavior |
|-------|------|-------------|
| **Planner** | Prompt → Product Spec | Reads frontend design skill, defines visual design language, explores AI integration, high-level technical design (no implementation details) |
| **Generator** | Spec → Full-Stack Implementation | React+Vite+TypeScript+FastAPI+SQLite, writes tests (pytest+vitest), self-evaluates before handoff |
| **Evaluator** | Browser-Tests the Running App | Never reads source code (GAN principle), screenshot evidence, detects stubs/fakes, grades on 4 full-stack criteria |

## Evaluation Criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Product Depth | High | Are features complete and real, or surface-level stubs? |
| Functionality | High | Do core interactions work end-to-end with database persistence? |
| Visual Design | Normal | Does the app match the spec's visual design language? |
| Code Quality | Normal | Stability, error handling, edge case behavior |
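
The table lists weights as High and Normal but does not say how they are combined. The following is a minimal sketch under these assumptions:

- High = 2 and Normal = 1.
- Scores use a 1 to 10 scale.
- A hypothetical per-criterion pass threshold of 7 applies.

```python
from typing import Dict, Tuple

# Assumed numeric weights: "High" = 2, "Normal" = 1.
WEIGHTS: Dict[str, int] = {
    "product_depth": 2,
    "functionality": 2,
    "visual_design": 1,
    "code_quality": 1,
}
PASS_THRESHOLD = 7  # hypothetical minimum per criterion on a 1-10 scale


def grade(scores: Dict[str, int]) -> Tuple[float, bool]:
    """Return (weighted average, passed); every criterion must reach the threshold."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    weighted = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / sum(WEIGHTS.values())
    passed = all(scores[c] >= PASS_THRESHOLD for c in WEIGHTS)
    return weighted, passed


# Usage: strong functionality cannot compensate for a weak visual design score.
avg, ok = grade({"product_depth": 8, "functionality": 9, "visual_design": 6, "code_quality": 7})
```

In this sketch every criterion must clear the threshold, not just the weighted average. That stops a strong score on one criterion from masking a failing one.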

## Configuration

Editable in `config.py`:

| Setting | Default | Description |
|---------|---------|-------------|
| `mode` | `full` | `full` (sprints + contracts + iteration) / `simple` (single build + eval) |
| `max_build_attempts` | `3` | Max build→evaluate retry rounds |
| `max_negotiation_rounds` | `3` | Max contract negotiation rounds |
| `generator_max_turns` | `200` | Max turns for Generator agent |
| `dev_server_url` | `http://localhost:5173` | Frontend server URL |
| `mcp_tool` | `playwright` | Evaluator browser testing tool |
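
The defaults in the table can be captured as a typed, validated settings object. This is only a sketch. The actual `config.py` may use plain module-level constants, and `HarnessConfig` is a hypothetical name:

```python
from dataclasses import dataclass, replace
from typing import Literal


@dataclass(frozen=True)
class HarnessConfig:
    """Hypothetical typed mirror of the settings documented for config.py."""

    mode: Literal["full", "simple"] = "full"       # full: sprints + contracts + iteration
    max_build_attempts: int = 3                    # build -> evaluate retry rounds
    max_negotiation_rounds: int = 3                # contract negotiation rounds
    generator_max_turns: int = 200                 # turn budget for the Generator agent
    dev_server_url: str = "http://localhost:5173"  # frontend dev server
    mcp_tool: str = "playwright"                   # Evaluator's browser-testing tool

    def __post_init__(self) -> None:
        # Reject values the orchestrator could not act on.
        if self.mode not in ("full", "simple"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        for name in ("max_build_attempts", "max_negotiation_rounds", "generator_max_turns"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")


# Usage: a quick single build + eval run.
simple_cfg = replace(HarnessConfig(), mode="simple")
```

`dataclasses.replace` returns a modified copy and re-runs `__post_init__`, so switching to `simple` mode is validated the same way as the defaults. Port 5173 is Vite's default dev-server port, which fits the Generator's React + Vite stack.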

## Project Structure

```
idle-harness/
├── orchestrator.py      # Main orchestration loop + preflight + setup
├── cli.py               # Claude Agent SDK wrapper
├── config.py            # Settings (mode, servers, limits)
├── state.py             # State management (status.json)
├── server.py            # Dev server start/stop
├── sprint.py            # Sprint parsing
├── agents/
│   ├── planner.md                # Planner system prompt
│   ├── generator.md              # Generator system prompt
│   ├── evaluator.md              # Evaluator system prompt
│   └── frontend-design-skill.md  # Design skill (Planner reads at runtime)
├── tests/               # pytest tests
├── comms/               # Runtime artifacts (spec, contracts, evaluations)
└── output/              # Generated applications
```

## FAQ

### What can I build with Idle Harness?

Any full-stack web application that can be described in a few sentences. Examples: a tarot reading app with AI interpretations, an AI-powered bookmark manager, a recipe finder with dietary filters, a personal finance tracker, a kanban board with drag-and-drop.

### How is this different from other AI code generators?

Most AI code generators produce code in a single pass. Idle Harness uses a multi-agent adversarial loop: 

[truncated…]

PUBLIC HISTORY

First discovered: Apr 2, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github

first seen: Mar 29, 2026

last updated: Apr 1, 2026

last crawled: 1 month ago

version: not specified
