# Corpus OS
<img width="1128" height="191" alt="image" src="https://github.com/user-attachments/assets/cb2fe4ef-be6a-4406-b899-23ad1ed30c08" />
Reference implementation of the **[Corpus OS](https://corpusos.com)** — a **wire-first, vendor-neutral** SDK for interoperable AI frameworks and data backends across four domains: **LLM**, **Embedding**, **Vector**, and **Graph**.
---
**Contact:** [team@corpusos.com](mailto:team@corpusos.com)
**Website:** [https://corpusos.com](https://corpusos.com)
**Docs:** [https://docs.corpusos.com](https://docs.corpusos.com)
---
```
┌──────────────────────────────────────────────────────────────────────┐
│ Your App / Agents / RAG Pipelines                                    │
│ (LangChain · LlamaIndex · Semantic Kernel · CrewAI · AutoGen · MCP)  │
├──────────────────────────────────────────────────────────────────────┤
│ Corpus OS Protocol and SDK                                           │
│ One protocol · One error taxonomy · One metrics model                │
├──────────┬──────────────┬────────────┬───────────────────────────────┤
│ LLM/v1   │ Embedding/v1 │ Vector/v1  │ Graph/v1                      │
├──────────┴──────────────┴────────────┴───────────────────────────────┤
│ Any Provider: OpenAI · Anthropic · Pinecone · Neo4j · ...            │
└──────────────────────────────────────────────────────────────────────┘
```
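The protocol layer in the diagram is wire-first: every domain call is a canonical JSON envelope that any language can produce or parse. A minimal sketch of what such an envelope might look like (field names here are illustrative, not the normative corpus_sdk schema):

```python
import json

# Hypothetical envelope for an llm/v1 completion request.
request = {
    "protocol": "llm/v1",
    "op": "complete",
    "ctx": {"tenant": "acme", "deadline_ms": 5000},
    "args": {
        "model": "quick-demo",
        "messages": [{"role": "user", "content": "Hello"}],
    },
}

wire = json.dumps(request, sort_keys=True)  # canonical, language-neutral bytes
decoded = json.loads(wire)
print(decoded["protocol"])  # → llm/v1
```

Because the envelope, not a client library, is the contract, a Go backend and a Python framework can interoperate without sharing any code.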
**Keep your frameworks. Standardize your infrastructure.**
> **Open-Core Model**: The **[Corpus OS](https://corpusos.com)** Protocol Suite and SDK are **fully open source** (Apache-2.0). Corpus Router and official production adapters are **commercial**, optional, and built on the same public protocols. Using this SDK does **not** lock you into Corpus Router.
---
## Table of Contents
1. [Why Corpus OS](#why-corpus-os)
2. [How Corpus OS Compares](#how-corpus-os-compares)
3. [When Not to Use CORPUS](#when-not-to-use-corpus)
4. [Install](#install)
5. [Quick Start](#-quick-start)
6. [Domain Examples](#domain-examples)
7. [Core Concepts](#core-concepts)
8. [Error Taxonomy & Observability](#error-taxonomy--observability)
9. [Performance & Configuration](#performance--configuration)
10. [Testing & Conformance](#testing--conformance)
11. [Documentation Layout](#-documentation-layout)
12. [FAQ](#faq)
13. [Contributing](#contributing)
14. [License & Commercial Options](#license--commercial-options)
---
## Why Corpus OS
Modern AI platforms juggle multiple LLM, embedding, vector, and graph backends. Each vendor ships unique APIs, error schemes, rate limits, and capabilities — making cross-provider integration brittle and costly.
**The problem:**
- **Provider proliferation** — Dozens of incompatible APIs across AI infrastructure
- **Duplicate integration** — Different error handling, observability, and resilience patterns rewritten per provider and framework
- **Vendor lock-in** — Applications tightly coupled to specific backend choices
- **Operational complexity** — Inconsistent monitoring and debugging across services
**Corpus OS provides:**
- **Stable, runtime-checkable protocols** across all four domains
- **Normalized errors** with retry hints and machine-actionable scopes
- **SIEM-safe metrics** (low-cardinality, tenant-hashed, no PII)
- **Deadline propagation** for cancellation and cost control
- **Two modes** — compose under your own router (`thin`) or use lightweight built-in infra (`standalone`)
- **Wire-first design** — canonical JSON envelopes implementable in any language, with this SDK as reference
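The deadline-propagation idea above is simple to reason about: the caller sets one absolute deadline, and every downstream hop checks the remaining budget instead of layering its own timeouts. A standalone sketch (names are illustrative, not the corpus_sdk API):

```python
import time

def remaining_ms(deadline: float) -> int:
    """Milliseconds left before an absolute monotonic deadline, floored at 0."""
    return max(0, int((deadline - time.monotonic()) * 1000))

def call_backend(deadline: float) -> str:
    # Each hop re-checks the same propagated deadline rather than
    # starting a fresh timeout of its own.
    if remaining_ms(deadline) == 0:
        raise TimeoutError("deadline exceeded before reaching the backend")
    return "ok"

deadline = time.monotonic() + 0.5  # one 500 ms budget for the whole chain
print(call_backend(deadline))      # → ok
```

Propagating one deadline end-to-end is what makes cancellation and cost control composable across providers.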
Corpus OS is **not** a replacement for LangChain, LlamaIndex, Semantic Kernel, CrewAI, AutoGen, or MCP. Use those for orchestration, agents, tools, and RAG pipelines. Use Corpus OS to standardize the **infrastructure layer underneath them**. Your app teams keep their frameworks. Your platform team gets one protocol, one error taxonomy, and one observability model across everything.
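One error taxonomy in practice means every provider failure is mapped to the same machine-readable shape before your code sees it. A hypothetical sketch of such normalization (the codes and fields are illustrative, not the SDK's actual taxonomy):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalizedError(Exception):
    code: str                      # stable, provider-neutral error code
    retry_after_ms: Optional[int]  # machine-actionable retry hint, if any
    scope: str                     # what the error applies to, e.g. "request", "tenant", "provider"

def normalize(provider: str, status: int, headers: dict) -> NormalizedError:
    # Map provider-specific HTTP failures onto one taxonomy.
    if status == 429:
        retry = int(headers.get("Retry-After", "1")) * 1000
        return NormalizedError("RESOURCE_EXHAUSTED", retry, "tenant")
    if status >= 500:
        return NormalizedError("UNAVAILABLE", 250, "provider")
    return NormalizedError("INVALID_ARGUMENT", None, "request")

err = normalize("openai", 429, {"Retry-After": "2"})
print(err.code, err.retry_after_ms)  # → RESOURCE_EXHAUSTED 2000
```

With one shape, retry and backoff policies live in one place instead of being rewritten per provider SDK.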
---
## How Corpus OS Compares
| Aspect | LangChain / LlamaIndex | OpenRouter | MCP | **Corpus OS** |
|---|---|---|---|---|
| **Scope** | Application framework | LLM unification | Tools & data sources | **AI infrastructure protocols** |
| **Domains** | LLM + Tools | LLM only | Tools + Data | **LLM + Vector + Graph + Embedding** |
| **Error Standardization** | Partial | Limited | N/A | **Comprehensive taxonomy** |
| **Multi-Provider Routing** | Basic | Managed service | N/A | **Protocol for any router** |
| **Observability** | Basic | Limited | N/A | **Built-in metrics + tracing** |
| **Vendor Neutrality** | High | Service-dependent | High | **Protocol-first, no lock-in** |
### Who is this for?
- **App developers** — Keep using your framework of choice. Talk to all backends through Corpus OS protocols. Swap providers or frameworks without rewriting integration code.
- **Framework maintainers** — Implement one Corpus OS adapter per protocol. Instantly support any conformant backend.
- **Backend vendors** — Implement `llm/v1`, `embedding/v1`, `vector/v1`, or `graph/v1` once, run the conformance suite, and your service works with every framework.
- **Platform / infra teams** — Unified observability: normalized error codes, deadlines, and metrics. One set of dashboards and SLOs across all AI traffic.
- **MCP users** — The Corpus OS MCP server exposes protocols as standard MCP tools. Any MCP client can call into your infra with consistent behavior.
### Integration Patterns
| Pattern | How It Works | What You Get |
|---|---|---|
| Framework → Corpus OS → Providers | Framework uses Corpus OS as client | Unified errors/metrics across providers |
| Corpus OS → Framework-as-adapter → Providers | Framework wrapped as Corpus OS adapter | Reuse existing chains/indices as "providers" |
| Mixed | Both of the above | Gradual migration, no big-bang rewrites |
Large teams typically run all three patterns at once.
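The second pattern, wrapping an existing framework as a provider, can be sketched in a few lines (the wrapper shape below is illustrative, not the SDK's adapter base class):

```python
from typing import Callable

class FrameworkAsProvider:
    """Wrap any framework callable so it answers in one envelope shape."""

    def __init__(self, chain: Callable[[str], str]):
        self.chain = chain

    def complete(self, prompt: str) -> dict:
        # Normalize whatever the framework returns into a common envelope,
        # so callers see the same shape regardless of the wrapped framework.
        return {"text": self.chain(prompt), "finish_reason": "stop"}

# Stand-in for an existing LangChain chain / LlamaIndex query engine.
def legacy_chain(prompt: str) -> str:
    return f"echo: {prompt}"

provider = FrameworkAsProvider(legacy_chain)
print(provider.complete("hi"))  # → {'text': 'echo: hi', 'finish_reason': 'stop'}
```

This is what makes gradual migration possible: existing chains keep working while new callers talk to them through the common surface.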
---
## When Not to Use CORPUS
You probably don't need Corpus OS if:
- **Single-provider and happy** — One backend, fine with their SDK and breaking changes.
- **No governance pressure** — No per-tenant isolation, budgets, audit trails, or data residency.
- **No cross-domain orchestration** — Not coordinating LLM + Vector + Graph + Embedding together.
- **Quick throwaway prototype** — Lock-in, metrics, and resilience aren't worth thinking about yet.
If any of these stop being true, `corpus_sdk` is the incremental next step.
---
## Install
```bash
pip install corpus_sdk
```
Python ≥ 3.10 recommended. No heavy runtime dependencies.
---
## ⚡ Quick Start
```python
import asyncio
from corpus_sdk.llm.llm_base import (
    BaseLLMAdapter, OperationContext, LLMCompletion,
    LLMCapabilities, TokenUsage
)


class QuickAdapter(BaseLLMAdapter):
    async def _do_capabilities(self) -> LLMCapabilities:
        return LLMCapabilities(
            server="quick-demo",
            version="1.0.0",
            model_family="demo",
            max_context_length=4096,
        )

    async def _do_complete(self, messages, model=None, **kwargs) -> LLMCompletion:
        return LLMCompletion(
            text="Hello from CORPUS!",
            model=model or "quick-demo",
            model_family="demo",
            usage=TokenUsage(prompt_tokens=2, completion_tokens=3, total_tokens=5),
            finish_reason="stop",
        )

    async def _do_count_tokens(self, text, model=None, **kwargs) -> int:
        # Naive whitespace count, sufficient for the demo adapter.
        return len(text.split())
```