ouroboros
provenance:github:Tanush1912/ouroboros
Ouroboros: Agent-first software engineering infrastructure. Autonomous AI agents plan, implement, test, review, and merge pull requests through typed contracts and constrained workflows. The system manages and improves itself.
<p align="center">
<img src="https://img.shields.io/badge/python-3.12+-blue?logo=python&logoColor=white" alt="Python 3.12+"/>
<img src="https://img.shields.io/badge/model-Gemini_2.5_Flash-4285F4?logo=google&logoColor=white" alt="Gemini 2.5 Flash"/>
<img src="https://img.shields.io/badge/framework-PydanticAI-E92063?logo=pydantic&logoColor=white" alt="PydanticAI"/>
<img src="https://img.shields.io/badge/orchestration-LangGraph-1C3C3C" alt="LangGraph"/>
<img src="https://img.shields.io/badge/tracing-Logfire-FF6B35" alt="Logfire"/>
<img src="https://img.shields.io/badge/package_manager-uv-DE5FE9" alt="uv"/>
<img src="https://img.shields.io/badge/tests-284_passing-brightgreen" alt="Tests"/>
</p>
<h1 align="center">Ouroboros</h1>
<h3 align="center">AI Agent Software Development Infrastructure</h3>
<p align="center">
<em>The system that manages and improves itself.</em>
</p>
<p align="center">
<img src="https://img.shields.io/badge/%F0%9F%A7%AA_Testing-In_Progress-blue?style=for-the-badge" alt="Testing"/>
</p>
---
Ouroboros is an agent-first software engineering system.
**Input:** natural language task | **Output:** merged pull request
Six specialized agents — planner, implementer, validator, reviewer, cleaner, post-mortem — plan, write, test, review, merge, and learn from failures autonomously inside a constrained architecture.
---
**Ouroboros** is an agent-first software factory. It takes a natural language task as input and produces a merged, tested, reviewed pull request as output. The agents collaborate through typed Pydantic contracts — no text parsing, no regex, no string matching at any boundary.
The system is self-referential: agents can be tasked to improve the agent infrastructure itself — better prompts, tighter lint rules, new tools — all flowing through the same PR review process.
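
To make "typed contract" concrete, here is a minimal sketch of one handoff. It uses stdlib `dataclasses` as a stand-in for the Pydantic models the project describes, so it runs without dependencies. `FileChange` and `ImplementOutput` are names from this README, but their fields (`path`, `new_content`, `changes`, `summary`) and the validation rules are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileChange:
    """One file edit proposed by the implementer (hypothetical fields)."""
    path: str
    new_content: str

    def __post_init__(self) -> None:
        # Reject absolute paths and parent-directory escapes at the boundary,
        # so a bad handoff fails loudly instead of silently.
        p = PurePosixPath(self.path)
        if p.is_absolute() or ".." in p.parts:
            raise ValueError(f"path must stay inside the repo: {self.path!r}")


@dataclass(frozen=True)
class ImplementOutput:
    """Implementer -> validator handoff (hypothetical fields)."""
    changes: tuple[FileChange, ...]
    summary: str

    def __post_init__(self) -> None:
        if not self.changes:
            raise ValueError("an implement step must change at least one file")


out = ImplementOutput(
    changes=(FileChange("utils/counter.py", "def count(n):\n    return range(n)\n"),),
    summary="Fix off-by-one in count()",
)
print([c.path for c in out.changes])  # ['utils/counter.py']
```

With Pydantic, the same checks would normally live in validators on the model, so a malformed agent response fails validation instead of reaching the next node.
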
```mermaid
flowchart LR
Task["'Fix the off-by-one error\nin utils/counter.py'"] --> Plan
subgraph OUROBOROS
Plan --> Implement --> Validate --> OpenPR["Open PR"] --> Review
Validate -- "retry (max 5)" --> Implement
Review -- "approved" --> Merge
Review -- "not approved" --> Implement
Review -- "human feedback" --> Feedback["Feedback Loop"]
Feedback --> Implement
Validate -- "escalate" --> PostMortem["Post-Mortem\n→ self-improvement issue"]
end
Merge --> Done["Merged PR #42"]
```
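
The edges in the diagram can be expressed as pure routing functions, which is what "deterministic validation" means later in this README: the same state always yields the same next node, with no LLM call. This is a minimal sketch. The `RunState` fields, the `Verdict` enum, and the node names are hypothetical; only the limit of 5 implement iterations comes from the project's description.

```python
from dataclasses import dataclass
from enum import Enum

# Hard limit: a constant, not a config value (from the README's design constraints).
MAX_IMPLEMENT_ITERATIONS = 5


class Verdict(Enum):
    """Hypothetical reviewer outcomes matching the diagram's review edges."""
    APPROVED = "approved"
    NOT_APPROVED = "not_approved"
    HUMAN_FEEDBACK = "human_feedback"


@dataclass(frozen=True)
class RunState:
    """Hypothetical slice of workflow state used for routing."""
    validation_passed: bool
    implement_iterations: int


def route_after_validate(state: RunState) -> str:
    # Pure function of the state: no I/O, no model call.
    if state.validation_passed:
        return "open_pr"
    if state.implement_iterations < MAX_IMPLEMENT_ITERATIONS:
        return "implement"  # retry
    return "post_mortem"  # escalate instead of looping forever


def route_after_review(verdict: Verdict) -> str:
    if verdict is Verdict.APPROVED:
        return "merge"
    if verdict is Verdict.HUMAN_FEEDBACK:
        return "feedback_loop"
    return "implement"


print(route_after_validate(RunState(False, 2)))  # implement
print(route_after_validate(RunState(False, 5)))  # post_mortem
print(route_after_review(Verdict.APPROVED))      # merge
```

In a LangGraph workflow, functions like these would typically be wired in as conditional edges (for example via `StateGraph.add_conditional_edges`), keeping the branching logic unit-testable without any model access.
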
### Quick Example
> **Task:** "Fix the off-by-one error in utils/counter.py"
>
> 1. **Planner** decomposes the task into typed execution steps
> 2. **Implementer** writes the patch, returns `FileChange[]`
> 3. **Validator** runs pytest + ruff + arch_lint — all pass
> 4. **PR opened** via `gh pr create`
> 5. **Reviewer** agent inspects the diff, approves
> 6. **PR merged** via `gh pr merge --squash`
>
> **Total cost:** $0.0087 | **Iterations:** 2 | **Time:** ~30s
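
A total like the one above is a roll-up of per-node token usage. The following is a minimal sketch of that aggregation. `NodeUsage` and `summarize_cost` are hypothetical names, the project's `CostSummary` model may be shaped differently, and the per-million-token rates in the usage lines are placeholders, not real Gemini 2.5 Flash pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeUsage:
    """Token usage recorded for one node execution (hypothetical shape)."""
    node: str
    input_tokens: int
    output_tokens: int


def summarize_cost(
    usages: list[NodeUsage],
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> dict[str, float]:
    """Return USD cost per node plus a 'total' entry (rates are per million tokens)."""
    per_node: dict[str, float] = {}
    for u in usages:
        cost = (u.input_tokens * usd_per_m_input
                + u.output_tokens * usd_per_m_output) / 1_000_000
        # A node can run more than once (e.g. implement retries), so accumulate.
        per_node[u.node] = per_node.get(u.node, 0.0) + cost
    per_node["total"] = sum(per_node.values())
    return per_node


usages = [
    NodeUsage("plan", 2_000, 500),
    NodeUsage("implement", 6_000, 1_500),
    NodeUsage("implement", 6_500, 1_200),  # second iteration
    NodeUsage("review", 3_000, 300),
]
# Placeholder rates, NOT real model pricing.
summary = summarize_cost(usages, usd_per_m_input=1.0, usd_per_m_output=4.0)
print(round(summary["total"], 6))  # 0.0315
```

Keeping the per-node entries, not just the total, is what makes it possible to see that retries of one node dominate a run's cost.
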
---
### Current Status
> **Status: Testing**
| Done | Upcoming |
|------|----------|
| Core workflow (plan → implement → validate → review → merge) | Live agent integration tests with Vertex AI |
| Architecture linting with AGENT_REMEDIATION | Larger repo benchmarks |
| 10 Golden Principles with machine-checkable lint rules | Screenshot diff tool for UI validation |
| Repository index (189 symbols, 47 files) | Prompt tuning from Logfire trace data |
| Per-node token tracking and cost metrics | End-to-end Ralph Loop on real tasks |
| 284 tests passing (no GCP credentials required) | |
| Parallel sandbox execution via isolated git worktrees | |
| GitHub issue comment trigger (`/run-task`) | |
| PR feedback loop — agents address human review comments | |
| Struggle-driven self-improvement (post-mortem → auto-fix) | |
| Per-worktree app booting with isolated observability | |
| CLI interface (`ouroboros run`, `feedback`, `gc`, `status`) | |
---
## Table of Contents
- [Why This Project Exists](#why-this-project-exists)
- [Why Ouroboros?](#why-ouroboros)
- [Architecture Overview](#architecture-overview)
- [The Ralph Loop](#the-ralph-loop--pr-lifecycle-workflow)
- [Agent Workers](#agent-workers)
- [Typed Output Models](#typed-output-models)
- [Tool System](#tool-system)
- [Guard Rails](#guard-rails)
- [Cost Awareness](#cost-awareness)
- [PR Feedback Loop](#pr-feedback-loop)
- [Struggle-Driven Self-Improvement](#struggle-driven-self-improvement)
- [Entropy Management & GC](#entropy-management--garbage-collection)
- [Repository Index](#repository-index)
- [Context Builder](#context-builder)
- [Lint Framework](#lint-framework)
- [Observability](#observability)
- [Infrastructure & Sandboxing](#infrastructure--sandboxing)
- [Per-Worktree App Booting](#per-worktree-app-booting)
- [CLI](#cli)
- [Test Suite](#test-suite)
- [Core Beliefs](#core-beliefs)
- [Tech Stack](#tech-stack)
- [Repository Structure](#repository-structure)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [CI/CD Pipelines](#cicd-pipelines)
---
## Why This Project Exists
Recent research suggests software engineering is shifting from *writing code* to *designing environments where agents write code*. The bottleneck moves from implementation speed to infrastructure quality — how well constrained, observable, and self-correcting the agent environment is.
Ouroboros explores what that environment looks like in practice:
- **Strict architectural constraints** — layered imports enforced by AST-based linting, not convention
- **Typed agent contracts** — every handoff is a Pydantic model, not a string to parse
- **Deterministic validation** — test/lint routing is a pure function, not an LLM guess
- **Automated entropy management** — daily GC workflow prevents codebase drift before it compounds
- **Cost as a first-class signal** — every run tracks tokens, dollars, and per-node breakdowns
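
The first bullet, layering enforced by AST-based linting, can be illustrated with Python's standard `ast` module. This is a minimal sketch, assuming the five layers in the Architecture Overview map to top-level packages named `models`, `core`, `tools`, `workers`, and `workflows`. The package mapping, the `check_imports` helper, and the message format are hypothetical; the project's own `arch_lint` is not reproduced here.

```python
import ast

# Hypothetical mapping of the Architecture Overview's layers to top-level
# package names, ordered from lowest (index 0) to highest.
LAYERS = ["models", "core", "tools", "workers", "workflows"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def check_imports(module_layer: str, source: str) -> list[str]:
    """Return one message per absolute import that reaches a higher layer.

    Same-layer imports are allowed and relative imports are skipped, since
    they stay inside the current package (both are assumptions of this sketch).
    """
    own_rank = RANK[module_layer]
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            top = target.split(".")[0]
            if top in RANK and RANK[top] > own_rank:
                violations.append(
                    f"line {node.lineno}: {module_layer} must not import "
                    f"{target} ({top} is a higher layer)"
                )
    return violations


src = "from core.guards import check\nfrom workflows.ralph_loop import run\n"
for msg in check_imports("tools", src):
    print(msg)
# line 2: tools must not import workflows.ralph_loop (workflows is a higher layer)
```

Because the check reads the syntax tree instead of importing the code, it is cheap and side-effect free, which suits a linter that the Validator runs alongside pytest and ruff.
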
The name is intentional: the system can be tasked to improve itself, and those improvements flow through the same constrained pipeline as any other change.
---
## Why Ouroboros?
Traditional software development is a loop: **plan → write → test → review → merge → repeat**. Ouroboros encodes this loop as a state machine where AI agents execute each step, with typed contracts at every boundary and hard limits to prevent runaway execution.
**Key design constraints:**
1. **No text parsing, ever.** Every agent output is a typed Pydantic model. No regex, no JSON extraction, no "parse the LLM response." If a handoff can fail silently, it will — so every handoff is a type.
2. **Guards are hard limits, not suggestions.** `MAX_IMPLEMENT_ITERATIONS = 5` is a constant, not a config value. An agent that loops forever is worse than one that escalates to a human.
3. **Token budgets are first-class.** The context builder enforces a token budget before agents see anything. Agents that read the whole repo are agents that fail on large repos.
4. **Entropy is tracked daily.** Ten machine-checkable Golden Principles (GP-001 to GP-010) are enforced by linters and a daily garbage collection workflow that opens atomic cleanup PRs.
5. **Self-improvement is the point.** Agents can write better agent workers, tighter lint rules, and new tools — all flowing through the same PR review process as any other change.
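
Constraint 3 implies a packing step that runs before any agent sees context. A minimal sketch follows, assuming candidate snippets arrive already ranked by relevance and that tokens are approximated as `len(text) // 4` (a rough heuristic, not a real tokenizer). `Snippet`, `estimate_tokens`, and `build_context` are hypothetical names, not the project's context builder API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """A candidate piece of context (hypothetical shape)."""
    path: str
    text: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real tokenizer is more accurate.
    return max(1, len(text) // 4)


def build_context(ranked: list[Snippet], budget: int) -> tuple[list[Snippet], int]:
    """Greedily keep the highest-ranked snippets that fit inside `budget` tokens.

    Snippets that would overflow the budget are skipped, not truncated, so the
    agent never sees half a file. Returns the kept snippets and tokens used.
    """
    kept: list[Snippet] = []
    used = 0
    for snip in ranked:
        cost = estimate_tokens(snip.text)
        if used + cost > budget:
            continue  # a smaller, lower-ranked snippet may still fit
        kept.append(snip)
        used += cost
    return kept, used


ranked = [
    Snippet("utils/counter.py", "x" * 400),       # ~100 tokens
    Snippet("tests/test_counter.py", "y" * 800),  # ~200 tokens
    Snippet("README.md", "z" * 120),              # ~30 tokens
]
kept, used = build_context(ranked, budget=150)
print([s.path for s in kept], used)  # ['utils/counter.py', 'README.md'] 130
```

The budget is enforced before the agent call, so an oversized repository degrades into less context rather than a failed request.
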
---
## Architecture Overview
Ouroboros uses a strict layered architecture enforced by AST-based linting. Each layer can only import from layers below it:
```mermaid
flowchart TD
workflows["WORKFLOWS\nralph_loop · feedback_loop · entropy_gc"]
workers["WORKERS\nplanner · implementer · reviewer · validator · cleaner · post_mortem"]
tools["TOOLS\nfs · shell · git · browser · observability · benchmark"]
core["CORE\nguards · state · context_builder · config · paths"]
models["MODELS\nPlanOutput · ImplementOutput · ReviewOutput · ValidationOutput\nCostSummary · HarnessImprovementOutput · ReproductionResult"]
workflows --> workers --> tools --> core --> models
style workflows fill:#4a9eff,color:#fff
style workers fill:#7c5cbf,color:#fff
style tools fil
```
[truncated…]
- First discovered: Mar 22, 2026
- Platform: GitHub
- First seen: Mar 5, 2026
- Last updated: Mar 18, 2026