ouroboros
Ouroboros is an AI agent system that automates software development tasks, from planning through merging pull requests. It takes natural language instructions and turns them into working code changes, which reduces the time developers spend on repetitive coding and testing. A team of specialized agents collaborates to complete each task autonomously. The system can also improve itself: it learns from past failures and refines its own processes. The goal is a self-managing software development infrastructure.
Ouroboros targets tedious, time-consuming development work such as writing tests, reviewing code, and merging pull requests. Instead of performing these steps by hand, developers delegate them to Ouroboros, which frees their time for more complex and creative work.
<p align="center">
<img src="https://img.shields.io/badge/python-3.12+-blue?logo=python&logoColor=white" alt="Python 3.12+"/>
<img src="https://img.shields.io/badge/model-Gemini_2.5_Flash-4285F4?logo=google&logoColor=white" alt="Gemini 2.5 Flash"/>
<img src="https://img.shields.io/badge/framework-PydanticAI-E92063?logo=pydantic&logoColor=white" alt="PydanticAI"/>
<img src="https://img.shields.io/badge/orchestration-LangGraph-1C3C3C" alt="LangGraph"/>
<img src="https://img.shields.io/badge/tracing-Logfire-FF6B35" alt="Logfire"/>
<img src="https://img.shields.io/badge/package_manager-uv-DE5FE9" alt="uv"/>
<img src="https://img.shields.io/badge/tests-284_passing-brightgreen" alt="Tests"/>
</p>
<h1 align="center">Ouroboros</h1>
<h3 align="center">AI Agent Software Development Infrastructure</h3>
<p align="center">
<em>The system that manages and improves itself.</em>
</p>
<p align="center">
<img src="https://img.shields.io/badge/%F0%9F%A7%AA_Testing-In_Progress-blue?style=for-the-badge" alt="Testing"/>
</p>
---
Ouroboros is an agent-first software engineering system.
**Input:** natural language task | **Output:** merged pull request
Six specialized agents — planner, implementer, validator, reviewer, cleaner, post-mortem — plan, write, test, review, merge, and learn from failures autonomously inside a constrained architecture.
---
**Ouroboros** is an agent-first software factory. It takes a natural language task as input and produces a merged, tested, reviewed pull request as output. The agents collaborate through typed Pydantic contracts — no text parsing, no regex, no string matching at any boundary.
The system is self-referential: agents can be tasked to improve the agent infrastructure itself — better prompts, tighter lint rules, new tools — all flowing through the same PR review process.
```mermaid
flowchart LR
    Task["'Fix the off-by-one error\nin utils/counter.py'"] --> Plan
    subgraph OUROBOROS
        Plan --> Implement --> Validate --> OpenPR["Open PR"] --> Review
        Validate -- "retry (max 5)" --> Implement
        Review -- "approved" --> Merge
        Review -- "not approved" --> Implement
        Review -- "human feedback" --> Feedback["Feedback Loop"]
        Feedback --> Implement
        Validate -- "escalate" --> PostMortem["Post-Mortem\n→ self-improvement issue"]
    end
    Merge --> Done["Merged PR #42"]
```
### Quick Example
> **Task:** "Fix the off-by-one error in utils/counter.py"
>
> 1. **Planner** decomposes the task into typed execution steps
> 2. **Implementer** writes the patch, returns `FileChange[]`
> 3. **Validator** runs pytest + ruff + arch_lint — all pass
> 4. **PR opened** via `gh pr create`
> 5. **Reviewer** agent inspects the diff, approves
> 6. **PR merged** via `gh pr merge --squash`
>
> **Total cost:** $0.0087 | **Iterations:** 2 | **Time:** ~30s
---
### Current Status
> **Status: Testing**
| Done | Upcoming |
|------|----------|
| Core workflow (plan → implement → validate → review → merge) | Live agent integration tests with Vertex AI |
| Architecture linting with AGENT_REMEDIATION | Larger repo benchmarks |
| 10 Golden Principles with machine-checkable lint | Screenshot diff tool for UI validation |
| Repository index (189 symbols, 47 files) | Prompt tuning from Logfire trace data |
| Per-node token tracking and cost metrics | End-to-end Ralph Loop on real tasks |
| 284 tests passing (no GCP credentials required) | |
| Parallel sandbox execution via isolated git worktrees | |
| GitHub issue comment trigger (`/run-task`) | |
| PR feedback loop — agents address human review comments | |
| Struggle-driven self-improvement (post-mortem → auto-fix) | |
| Per-worktree app booting with isolated observability | |
| CLI interface (`ouroboros run`, `feedback`, `gc`, `status`) | |
---
## Table of Contents
- [Why This Project Exists](#why-this-project-exists)
- [Why Ouroboros?](#why-ouroboros)
- [Architecture Overview](#architecture-overview)
- [The Ralph Loop](#the-ralph-loop--pr-lifecycle-workflow)
- [Agent Workers](#agent-workers)
- [Typed Output Models](#typed-output-models)
- [Tool System](#tool-system)
- [Guard Rails](#guard-rails)
- [Cost Awareness](#cost-awareness)
- [PR Feedback Loop](#pr-feedback-loop)
- [Struggle-Driven Self-Improvement](#struggle-driven-self-improvement)
- [Entropy Management & GC](#entropy-management--garbage-collection)
- [Repository Index](#repository-index)
- [Context Builder](#context-builder)
- [Lint Framework](#lint-framework)
- [Observability](#observability)
- [Infrastructure & Sandboxing](#infrastructure--sandboxing)
- [Per-Worktree App Booting](#per-worktree-app-booting)
- [CLI](#cli)
- [Test Suite](#test-suite)
- [Core Beliefs](#core-beliefs)
- [Tech Stack](#tech-stack)
- [Repository Structure](#repository-structure)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [CI/CD Pipelines](#cicd-pipelines)
---
## Why This Project Exists
Recent research suggests software engineering is shifting from *writing code* to *designing environments where agents write code*. The bottleneck moves from implementation speed to infrastructure quality — how well constrained, observable, and self-correcting the agent environment is.
Ouroboros explores what that environment looks like in practice:
- **Strict architectural constraints** — layered imports enforced by AST-based linting, not convention
- **Typed agent contracts** — every handoff is a Pydantic model, not a string to parse
- **Deterministic validation** — test/lint routing is a pure function, not an LLM guess
- **Automated entropy management** — daily GC workflow prevents codebase drift before it compounds
- **Cost as a first-class signal** — every run tracks tokens, dollars, and per-node breakdowns
The name is intentional: the system can be tasked to improve itself, and those improvements flow through the same constrained pipeline as any other change.
---
## Why Ouroboros?
Traditional software development is a loop: **plan → write → test → review → merge → repeat**. Ouroboros encodes this loop as a state machine where AI agents execute each step, with typed contracts at every boundary and hard limits to prevent runaway execution.
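One way to picture the state machine is as a pair of pure routing functions that mirror the edges in the flowchart near the top of this README. The sketch below is illustrative only: `Step`, `ValidationResult`, `after_validate`, and `after_review` are hypothetical names, and it assumes `iteration` counts the implement attempts completed so far. Only `MAX_IMPLEMENT_ITERATIONS = 5` comes from this README.

```python
from dataclasses import dataclass
from enum import Enum

MAX_IMPLEMENT_ITERATIONS = 5  # hard limit named in this README


class Step(Enum):
    IMPLEMENT = "implement"
    OPEN_PR = "open_pr"
    MERGE = "merge"
    POST_MORTEM = "post_mortem"


@dataclass(frozen=True)
class ValidationResult:
    """Hypothetical stand-in for the validator's typed output."""
    passed: bool


def after_validate(result: ValidationResult, iteration: int) -> Step:
    """Route after validation: pass opens a PR, failure retries until the limit, then escalates."""
    if result.passed:
        return Step.OPEN_PR
    if iteration < MAX_IMPLEMENT_ITERATIONS:
        return Step.IMPLEMENT
    return Step.POST_MORTEM


def after_review(approved: bool) -> Step:
    """Route after review: approval merges, anything else goes back to the implementer."""
    return Step.MERGE if approved else Step.IMPLEMENT
```

Because the functions depend only on their arguments, the routing can be unit-tested without calling a model.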
**Key design constraints:**
1. **No text parsing, ever.** Every agent output is a typed Pydantic model. No regex, no JSON extraction, no "parse the LLM response." If a handoff can fail silently, it will — so every handoff is a type.
2. **Guards are hard limits, not suggestions.** `MAX_IMPLEMENT_ITERATIONS = 5` is a constant, not a config value. An agent that loops forever is worse than one that escalates to a human.
3. **Token budgets are first-class.** The context builder enforces a token budget before agents see anything. Agents that read the whole repo are agents that fail on large repos.
4. **Entropy is tracked daily.** Ten machine-checkable Golden Principles (GP-001 to GP-010) are enforced by linters and a daily garbage collection workflow that opens atomic cleanup PRs.
5. **Self-improvement is the point.** Agents can write better agent workers, tighter lint rules, and new tools — all flowing through the same PR review process as any other change.
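To make constraint 3 concrete, here is a minimal sketch of budget enforcement before an agent sees any context. It is not the project's context builder: `Snippet`, `estimate_tokens`, and `build_context` are hypothetical names, the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and greedy selection by relevance is just one possible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str
    relevance: float  # higher means more relevant to the task


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def build_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the most relevant snippets that still fit within `budget` tokens."""
    chosen: list[Snippet] = []
    used = 0
    for snip in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snip.text)
        if used + cost <= budget:
            chosen.append(snip)
            used += cost
    return chosen
```

The budget is applied while the context is assembled, so an oversized file is dropped up front rather than truncated mid-prompt.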
---
## Architecture Overview
Ouroboros uses a strict layered architecture enforced by AST-based linting. Each layer can only import from layers below it:
```mermaid
flowchart TD
workflows["WORKFLOWS\nralph_loop · feedback_loop · entropy_gc"]
workers["WORKERS\nplanner · implementer · reviewer · validator · cleaner · post_mortem"]
tools["TOOLS\nfs · shell · git · browser · observability · benchmark"]
core["CORE\nguards · state · context_builder · config · paths"]
models["MODELS\nPlanOutput · ImplementOutput · ReviewOutput · ValidationOutput\nCostSummary · HarnessImprovementOutput · ReproductionResult"]
workflows --> workers --> tools --> core --> models
style workflows fill:#4a9eff,color:#fff
style workers fill:#7c5cbf,color:#fff
style tools fil
[truncated…]
