shipyard-agent

provenance:github:maxpetrusenko/shipyard-agent

WHAT THIS AGENT DOES

Shipyard is an AI coding agent that automates software development tasks. It takes a plain-language instruction, such as "add a button to the checkout page," breaks it into smaller steps, and edits the codebase directly. After making changes, it automatically checks them (type checking after each step, plus tests on the final step) and tries to fix any errors it finds. Developers and technical teams can use Shipyard to speed up coding work and reduce the risk of introducing bugs. What sets it apart is how it handles complex tasks: it splits them into small, isolated pieces and verifies each result before moving on.


README

# Shipyard -- Autonomous Coding Agent

Autonomous coding agent that takes a natural language instruction, decomposes it into steps, makes surgical file edits, verifies correctness via typecheck and tests, and self-corrects through a review loop. Built on LangGraph.

## Architecture

Shipyard runs as a stateful LangGraph `StateGraph` with conditional edges:

```
START -> plan -> coordinate -> verify -> review
                            |-> execute (fallback only)
                                       |-> done -> report -> END
                                       |-> retry -> plan (with feedback)
                                       |-> escalate -> error_recovery
                                                        |-> plan (retry)
                                                        |-> report (fatal) -> END
```

| Node | Model | Purpose |
|------|-------|---------|
| `plan` | GPT-5.3 Codex (default) | Decompose instruction into steps, explore codebase |
| `coordinate` | logic + worker agents | Spawn one worker per vertical step, run verify after every worker, spawn repair workers on failures |
| `execute` | GPT-5.4 Mini (default) | Single-agent fallback path used only when worker orchestration is disabled for recovery |
| `verify` | bash | Run lint (if configured) + `tsc --noEmit`; run tests on final step |
| `review` | GPT-5.3 Codex (default) | Quality gate: continue / done / retry / escalate |
| `error_recovery` | logic | Decide retry (with file rollback) vs abort |
| `report` | GPT-5.4 Mini (default) | Summarize results/cost + policy-driven next actions |
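The conditional edges out of `review` and `error_recovery` can be sketched as two pure routing functions. This is a hypothetical illustration, not Shipyard's source: the node names come from the diagram above, while the type names and the choice to send `continue` back to `coordinate` (or to `execute` when worker orchestration is disabled) are assumptions.

```typescript
// Hypothetical sketch of the conditional edges; node names follow the diagram,
// but these types and functions are illustrative, not Shipyard's source.
type ReviewDecision = "continue" | "done" | "retry" | "escalate";
type RecoveryDecision = "retry" | "abort";
type NodeName =
  | "plan"
  | "coordinate"
  | "execute"
  | "verify"
  | "review"
  | "error_recovery"
  | "report";

// Next node after the review quality gate.
function routeAfterReview(decision: ReviewDecision, workersEnabled: boolean): NodeName {
  switch (decision) {
    case "continue":
      // Assumption: "continue" moves on to the next step via the coordinator,
      // or via the single-agent execute node when workers are disabled.
      return workersEnabled ? "coordinate" : "execute";
    case "done":
      return "report";
    case "retry":
      return "plan"; // re-plan with the reviewer's feedback
    case "escalate":
      return "error_recovery";
  }
}

// error_recovery either re-plans (after rolling back files) or ends with a fatal report.
function routeAfterRecovery(decision: RecoveryDecision): NodeName {
  return decision === "retry" ? "plan" : "report";
}
```

Keeping routing as pure functions of the reviewer's decision makes each edge easy to unit-test; in LangGraph.js, functions like these are the kind of router typically passed to `addConditionalEdges`.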

Tools: `read_file`, `edit_file` (4-tier surgical edit with fuzzy fallback), `write_file`, `bash`, `grep`, `glob`, `ls`, `web_search`, `spawn_agent`, `ask_user`, `commit_and_open_pr`, `inject_context`.

Default execution strategy: the parent run keeps a small context and focuses on orchestration. It plans once, then spawns an isolated worker agent for each vertical step. After each worker finishes, Shipyard runs deterministic verification; if verification fails, the coordinator spawns a repair worker for that same step before moving on.
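The per-step loop can be sketched as follows. This is a minimal illustration, not Shipyard's coordinator: `spawnWorker` and `runVerify` are hypothetical callbacks standing in for worker spawning and deterministic verification, and the limit of one repair attempt per step (`maxRepairs = 1`) is an assumption.

```typescript
// Hypothetical sketch of the worker-per-step coordinator loop.
// spawnWorker / runVerify are injected stand-ins, not Shipyard APIs.
interface Step {
  description: string;
  files: string[];
}

interface VerifyResult {
  ok: boolean;
  output: string; // e.g. lint / tsc / test output
}

type SpawnWorker = (step: Step, feedback?: string) => Promise<void>;
type RunVerify = (step: Step, isFinalStep: boolean) => Promise<VerifyResult>;

interface StepOutcome {
  step: Step;
  ok: boolean;
  repairs: number;
}

async function coordinate(
  steps: Step[],
  spawnWorker: SpawnWorker,
  runVerify: RunVerify,
  maxRepairs = 1, // assumption: one repair worker per failing step
): Promise<StepOutcome[]> {
  const outcomes: StepOutcome[] = [];
  let index = 0;
  for (const step of steps) {
    const isFinal = index === steps.length - 1; // tests run only on the final step
    index++;
    await spawnWorker(step); // fresh, isolated worker for this vertical step
    let result = await runVerify(step, isFinal);
    let repairs = 0;
    while (!result.ok && repairs < maxRepairs) {
      repairs++;
      // Repair worker for the same step, seeded with the failing output.
      await spawnWorker(step, result.output);
      result = await runVerify(step, isFinal);
    }
    // Move on either way; the review node decides what to do with failures.
    outcomes.push({ step, ok: result.ok, repairs });
  }
  return outcomes;
}
```

Injecting the worker and verifier keeps the orchestration logic independent of any particular model or shell, which is one way to keep the parent run small-context.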

File edits use anchor-based string replacement, tried in this order: exact match, whitespace-normalized match, Levenshtein fuzzy match (< 10% distance), and a full file rewrite as a last resort.
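The four tiers can be sketched as a single fallback chain. This is a hypothetical illustration, not the `edit_file` source: it assumes tiers 2 and 3 compare whole-line windows of the same height as the anchor, measures the 10% threshold against the anchor's length, and treats a full rewrite as complete new file content supplied by the caller. `applyEdit` and `EditTier` are illustrative names.

```typescript
// Hypothetical sketch of a four-tier anchor edit; not Shipyard's edit_file source.
type EditTier = "exact" | "whitespace" | "fuzzy" | "rewrite";

interface EditResult {
  content: string;
  tier: EditTier;
}

// Dynamic-programming Levenshtein distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Trim and collapse whitespace runs, for the whitespace-normalized tier.
function normalizeWs(s: string): string {
  return s.trim().replace(/\s+/g, " ");
}

// Replace `count` lines starting at `start` with the replacement text.
function spliceLines(lines: string[], start: number, count: number, replacement: string): string {
  return lines.slice(0, start).concat([replacement], lines.slice(start + count)).join("\n");
}

function applyEdit(content: string, anchor: string, replacement: string, rewrite?: string): EditResult {
  if (anchor.trim() === "") throw new Error("empty anchor");

  // Tier 1: exact substring match (first occurrence).
  const idx = content.indexOf(anchor);
  if (idx !== -1) {
    return {
      content: content.slice(0, idx) + replacement + content.slice(idx + anchor.length),
      tier: "exact",
    };
  }

  // Tiers 2 and 3 compare whole-line windows the same height as the anchor.
  const lines = content.split("\n");
  const anchorLines = anchor.split("\n");
  const n = anchorLines.length;

  // Tier 2: identical after whitespace normalization, line by line.
  for (let i = 0; i + n <= lines.length; i++) {
    const candidate = lines.slice(i, i + n);
    if (candidate.every((line, k) => normalizeWs(line) === normalizeWs(anchorLines[k]))) {
      return { content: spliceLines(lines, i, n, replacement), tier: "whitespace" };
    }
  }

  // Tier 3: closest window by edit distance, accepted below 10% of the anchor length.
  let bestIndex = -1;
  let bestDistance = Infinity;
  for (let i = 0; i + n <= lines.length; i++) {
    const d = levenshtein(lines.slice(i, i + n).join("\n"), anchor);
    if (d < bestDistance) {
      bestIndex = i;
      bestDistance = d;
    }
  }
  if (bestIndex !== -1 && bestDistance / anchor.length < 0.1) {
    return { content: spliceLines(lines, bestIndex, n, replacement), tier: "fuzzy" };
  }

  // Tier 4: full rewrite, only when the caller supplied complete new content.
  if (rewrite !== undefined) return { content: rewrite, tier: "rewrite" };
  throw new Error("anchor not found and no full rewrite supplied");
}
```

Ordering the tiers from strictest to loosest means a fuzzy or full-file change only happens when every more precise match has failed, which keeps most edits surgical.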

## Prerequisites

- **Node.js** >= 20
- **pnpm** (any recent version; `npm install -g pnpm`)

## Quick Start

```bash
git clone <repo-url> ship-agent
cd ship-agent
pnpm install
cp .env.example .env
# Edit .env -- set OPENAI_API_KEY (and optionally ANTHROPIC_* keys if you select Claude models)
pnpm dev
```

The server starts on `http://localhost:4200` (configurable via `SHIPYARD_PORT`). Verify it is running:

```bash
curl http://localhost:4200/api/health
# {"status":"ok"}
```

For production:

```bash
pnpm build
pnpm start
```

For hot reload while editing Shipyard itself:

```bash
pnpm dev:watch
```

Use plain `pnpm dev` for long-running agent sessions so file changes in the workspace do not restart the server mid-run.

## Live Deployment

- Host: Hostinger
- Dashboard: `https://agent.ship.187.77.7.226.sslip.io/dashboard`
- Base URL: `https://agent.ship.187.77.7.226.sslip.io`

## Web UI

- `/dashboard`: chat-style workspace for ask, plan, and agent runs
- `/dashboard`, left sidebar `Config` tab: runtime model-key overrides and GitHub App OAuth repo connection
- `/settings/connectors/github`: dedicated GitHub connector settings page (GitHub App install flow and repo connection)
- `/runs`: `Refactoring Runs` view; by default shows only runs that touch a repo or code and hides pure ask chats
- `/benchmarks`: benchmark summaries and trend charts

## Configuration

All env vars are documented in `.env.example`. Copy it and fill in your values.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `OPENAI_API_KEY` | Yes* | -- | OpenAI API key (`sk-...`) used by default model routing. |
| `ANTHROPIC_API_KEY` | No | -- | Optional Anthropic API key (`sk-ant-...`) if you explicitly select Claude models. |
| `ANTHROPIC_AUTH_TOKEN` | No | -- | Optional Anthropic OAuth token (alternative to Anthropic API key). |
| `ANTHROPIC_BASE_URL` | No | Anthropic default | Custom API base URL (e.g., local proxy for Claude Max) |
| `SHIPYARD_PORT` | No | `4200` | HTTP + WebSocket server port |
| `SHIPYARD_WORK_DIR` | No | `process.cwd()` | Working directory the agent operates in (the target repo) |
| `SHIPYARD_STALE_RUN_TIMEOUT_MS` | No | `120000` | Fails a run if no graph progress or live tool activity is observed within this window (milliseconds). Set `0` to disable. |
| `SHIPYARD_API_KEY` | No | -- | Bearer token for API authentication. When set, all endpoints except `/api/health` require `Authorization: Bearer <key>`. |
| `SHIPYARD_ADMIN_TOKEN` | No | -- | Admin token for non-local settings, GitHub install selection, repo connect, and other sensitive write routes. |
| `SHIPYARD_ENABLE_WEB_SEARCH` | No | `false` | Enables the guarded `web_search` tool for looking up exact error messages on the web. |
| `BRAVE_SEARCH_API_KEY` | No | -- | Brave Search API key used by `web_search` when enabled. |
| `SHIPYARD_DB_URL` | No | -- | PostgreSQL connection string for persistent run storage. Falls back to in-memory. |
| `LANGCHAIN_TRACING_V2` | No | `false` | Set `true` to enable LangSmith tracing |
| `LANGCHAIN_API_KEY` | No | -- | LangSmith API key (also accepts `LANGSMITH_API_KEY`) |
| `LANGCHAIN_PROJECT` | No | `shipyard` | LangSmith project name (also accepts `LANGSMITH_PROJECT`) |
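A loader for part of this table might look like the sketch below. It is hypothetical: `loadConfig` and the `ShipyardConfig` field names are not from Shipyard, and three behaviors are assumptions: boolean flags are enabled only by the exact string `true`, invalid numbers fall back to the default, and the `LANGCHAIN_*` names take precedence over their `LANGSMITH_*` aliases.

```typescript
// Hypothetical loader for a subset of the variables above; names are illustrative.
type Env = Record<string, string | undefined>;

interface ShipyardConfig {
  port: number;
  workDir: string;
  staleRunTimeoutMs: number | null; // null = stale-run watchdog disabled
  apiKey: string | undefined; // undefined = no bearer auth
  webSearchEnabled: boolean;
  dbUrl: string | undefined; // undefined = in-memory run storage
  tracingEnabled: boolean;
  langsmithApiKey: string | undefined;
  langsmithProject: string;
}

// Parse a non-negative integer, falling back to the default on missing or invalid input.
function intOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

function loadConfig(env: Env, cwd: string): ShipyardConfig {
  const staleMs = intOr(env.SHIPYARD_STALE_RUN_TIMEOUT_MS, 120000);
  return {
    port: intOr(env.SHIPYARD_PORT, 4200),
    workDir: env.SHIPYARD_WORK_DIR || cwd,
    staleRunTimeoutMs: staleMs === 0 ? null : staleMs,
    apiKey: env.SHIPYARD_API_KEY || undefined,
    webSearchEnabled: env.SHIPYARD_ENABLE_WEB_SEARCH === "true",
    dbUrl: env.SHIPYARD_DB_URL || undefined,
    tracingEnabled: env.LANGCHAIN_TRACING_V2 === "true",
    // LANGCHAIN_* names first, LANGSMITH_* aliases as fallbacks (assumed precedence).
    langsmithApiKey: env.LANGCHAIN_API_KEY || env.LANGSMITH_API_KEY || undefined,
    langsmithProject: env.LANGCHAIN_PROJECT || env.LANGSMITH_PROJECT || "shipyard",
  };
}
```

In Node this would be called as `loadConfig(process.env, process.cwd())`; taking the environment as a parameter keeps the defaults easy to test.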

## API

### `GET /api/health`

```bash
curl http://localhost:4200/api/health
```
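Per the configuration table, `/api/health` is the one endpoint that stays open when `SHIPYARD_API_KEY` is set. A minimal, framework-agnostic sketch of that rule follows; `isAuthorized` is a hypothetical helper, not Shipyard's middleware, it assumes lowercase header names, and it does not model the `SHIPYARD_ADMIN_TOKEN` checks on sensitive routes.

```typescript
// Hypothetical sketch of the bearer-token rule described for SHIPYARD_API_KEY.
type HeaderMap = Record<string, string | undefined>;

// Returns true when the request may proceed.
function isAuthorized(path: string, headers: HeaderMap, apiKey: string | undefined): boolean {
  if (!apiKey) return true; // auth is off when SHIPYARD_API_KEY is unset
  if (path === "/api/health") return true; // health check stays public
  const header = headers["authorization"] ?? "";
  const match = /^Bearer\s+(.+)$/.exec(header);
  return match !== null && match[1] === apiKey;
}
```

A production check would usually compare tokens in constant time (for example with Node's `crypto.timingSafeEqual`) rather than with `===`.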

### `POST /api/run`

Submit an instruction. Returns `{ runId }`.

```bash
curl -X POST http://localhost:4200/api/run \
  -H "Content-Type: application/json" \
  -d '{"instruction": "Add strict TypeScript to all files"}'
```

With a caller-supplied execution plan (bypasses the planner):

```bash
curl -X POST http://localhost:4200/api/run \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Run rebuild vertical 1",
    "executionPlan": [
      {"description": "Refactor API surface", "files": ["src/server/routes.ts"]},
      {"description": "Add regression tests", "files": ["test/server/routes.test.ts"]}
    ]
  }'
```

With context injection:

```bash
curl -X POST http://localhost:4200/api/run \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Refactor auth module",
    "contexts": [{"label": "CLAUDE.md", "content": "...", "source": "system"}]
  }'
```
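The three request shapes above can be captured in one TypeScript type with a small client around it. This is a sketch: the field names come from the curl examples, but the `RunRequest` type, the `submitRun` helper, the assumption that `runId` is a string, and treating `source` as an optional free-form string are all illustrative. Passing the fetch function in keeps the helper testable.

```typescript
// Hypothetical typed client for POST /api/run; field names follow the curl examples.
interface PlanStep {
  description: string;
  files: string[];
}

interface ContextEntry {
  label: string;
  content: string;
  source?: string; // e.g. "system"; other values are not documented here
}

interface RunRequest {
  instruction: string;
  executionPlan?: PlanStep[]; // supplying a plan bypasses the planner
  contexts?: ContextEntry[];
}

// Minimal structural type so the sketch works with global fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function submitRun(
  fetchFn: FetchLike,
  baseUrl: string,
  request: RunRequest,
  apiKey?: string,
): Promise<string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  const res = await fetchFn(`${baseUrl}/api/run`, {
    method: "POST",
    headers,
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`POST /api/run failed with HTTP ${res.status}`);
  const data = (await res.json()) as { runId?: unknown } | null;
  if (!data || typeof data.runId !== "string") throw new Error("response missing runId");
  return data.runId;
}
```

On Node >= 20 the global `fetch` should satisfy `FetchLike`, e.g. `await submitRun(fetch, "http://localhost:4200", { instruction: "Add strict TypeScript to all files" })`.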

### `GET /api/runs/:id`

Get full state for a specific run.

```bash
curl http://localhost:4200/api/runs/<run-id>
```

### `POST /api/runs/:id/followup`

Append a message to an Ask thread. Follow-ups are queued in order on the same thread, so you can keep sending asks while another run is still executing. Passing any model selection fields replaces the prior thread selection. Send `"model": null` to clear a stale whole-run override and fall back to the default provider routing.

```bash
curl -X POST http://localhost:4200/api/runs/<run-id>/followup \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "and now explain the tradeoff",
    "model": null
  }'
```
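Two behaviors of this endpoint, in-order queuing and model-selection replacement, can be sketched in a few lines. Both are hypothetical models of the documented behavior, not Shipyard's code: `ThreadQueue` chains each follow-up onto the previous one for the same thread, and `nextThreadModel` assumes `model` is the only selection field, where an absent field keeps the prior choice, a string replaces it, and `null` clears it back to default routing.

```typescript
// Hypothetical sketches of follow-up handling; not Shipyard's implementation.
interface FollowupBody {
  instruction: string;
  model?: string | null;
}

// Returns the thread's model override after a follow-up;
// undefined means "no override" (default provider routing).
function nextThreadModel(prior: string | undefined, body: FollowupBody): string | undefined {
  if (!("model" in body)) return prior; // no selection field: keep the prior choice
  return body.model === null ? undefined : body.model; // null clears, a string replaces
}

// Runs tasks for the same thread strictly one after another, in submission order.
class ThreadQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue<T>(threadId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(threadId) ?? Promise.resolve();
    const result = previous.then(task);
    // Store a never-rejecting tail so one failed follow-up does not block the next.
    this.tails.set(threadId, result.then(() => undefined, () => undefined));
    return result;
  }
}
```

Chaining on a per-thread promise gives ordering without a polling loop; a long-lived server would also want to drop finished tails from the map.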

### `DELETE /api/runs/:id`

Remove a run from memory, `results/<id>.json`, and Postgres (if configured). Returns `409` if that run is still executing (stop it first).

```bash
curl -X DELETE http://localhost:4200/api/runs/<run-id>
```

### `GET /api/runs`

List runs with pagination.

```bash
curl "http://localhost:4200/api/runs?limit=10&offset=0"
```

### `POST /api/inject`

Inject context into a running agent (max 500KB).

```bash
curl -X POST http://localhost:4200/api/inject \
  -H "Content-Type: application/json" \
  -d '{"label": "extra-context", "content": "Use zod for all validation"}'
```
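Because oversized payloads are rejected, a client may want to check size before sending. The README does not say whether the 500KB limit applies to `content` or to the whole body, or whether KB means 1000 or 1024 bytes, so this hypothetical sketch takes the conservative reading (UTF-8 bytes of the whole JSON body, 1 KB = 1000 bytes). `buildInjectBody` and `MAX_INJECT_BYTES` are illustrative names.

```typescript
// Hypothetical client-side guard for POST /api/inject; not Shipyard's validation.
// Conservative reading of "max 500KB": UTF-8 bytes of the whole JSON body, 1 KB = 1000 bytes.
const MAX_INJECT_BYTES = 500 * 1000;

// UTF-8 byte length of a JS string, without relying on Buffer or TextEncoder.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair = one astral code point
        i++;
      } else {
        bytes += 3; // lone high surrogate, encoded as U+FFFD
      }
    } else {
      bytes += 3; // rest of the BMP, including lone surrogates
    }
  }
  return bytes;
}

// Serializes an inject request, refusing payloads over the assumed limit.
function buildInjectBody(label: string, content: string): string {
  const body = JSON.stringify({ label, content });
  const size = utf8ByteLength(body);
  if (size > MAX_INJECT_BYTES) {
    throw new Error(`inject payload is ${size} bytes; assumed limit is ${MAX_INJECT_BYTES}`);
  }
  return body;
}
```

The returned string can be sent directly as the request body with `Content-Type: application/json`.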

### `POST /api/cancel`

Cancel the current run.

```bash
curl -X POST http://localhost:4200/api/cancel
```

### `GET /api/settings/status`

Returns active workdir/repo metadata and whether runtime provider keys are present.

```bash
curl http://localhost:4200/api/settings/status
```

### `POST /api/settings/model-keys`

[truncated…]

PUBLIC HISTORY

First discovered: Mar 24, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Mar 23, 2026

last updated: Mar 23, 2026

last crawled: 1 month ago

version: n/a

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:maxpetrusenko/shipyard-agent)