# zeph

> Rust AI agent where every context token earns its place. Self-learning skills, temporal graph memory, cascade quality routing, OWASP AI security. Hybrid inference: Ollama · Claude · Gemini · OpenAI · GGUF. MCP + ACP. One binary.
<div align="center">

<img src="book/src/assets/zeph_v8_github.png" alt="Zeph" width="800">

**The AI agent that respects your resources.**

Single binary. Minimal hardware. Maximum context efficiency.

[crates.io](https://crates.io/crates/zeph) · [Documentation](https://bug-ops.github.io/zeph/) · [CI](https://github.com/bug-ops/zeph/actions) · [Coverage](https://codecov.io/gh/bug-ops/zeph) · [Workspace](https://github.com/bug-ops/zeph/tree/main/crates) · [Rust](https://www.rust-lang.org) · [License](LICENSE)

</div>

---

Zeph is a Rust AI agent built around one principle: **every token in the context window must earn its place**. Skills are retrieved semantically, tool output is filtered before injection, and the context compacts automatically under pressure — keeping costs low and responses fast on hardware you already own.

```bash
curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
zeph init                                        # interactive setup wizard
zeph migrate-config --config config.toml --diff  # check for new options after upgrade
zeph                                             # start the agent
```

> [!TIP]
> `cargo install zeph` also works. Pre-built binaries and Docker images are on the [releases page](https://github.com/bug-ops/zeph/releases).

---

## Quick Start

**Local (Ollama)** — no cloud accounts required:

```bash
ollama pull qwen3:8b && ollama pull qwen3-embedding
zeph init && zeph
```

**Cloud (Claude, OpenAI, or Gemini)**:

```bash
export ZEPH_CLAUDE_API_KEY=sk-ant-...   # or ZEPH_OPENAI_API_KEY / ZEPH_GEMINI_API_KEY
zeph init && zeph
```

**Hybrid** — Ollama by default, cloud as fallback:

```bash
export ZEPH_CLAUDE_API_KEY=sk-ant-...
# select "router" in `zeph init`, or copy the hybrid recipe below
zeph
```

> [!WARNING]
> **v0.17.0 breaking change**: the `[llm.cloud]`, `[llm.openai]`, `[llm.orchestrator]`, and `[llm.router]` config sections are replaced by `[[llm.providers]]`. Run `zeph migrate-config --in-place` to upgrade automatically.
> [!NOTE]
> **v0.18.1**: PILOT LinUCB contextual bandit routing (`LlmRoutingStrategy::Bandit`) and a PostgreSQL memory backend are now available.

> [!NOTE]
> **v0.18.2**: Structured shell output envelope (`ShellOutputEnvelope` with separate stdout/stderr/exit_code/truncated), per-path file read sandbox (`[tools.file]` deny_read/allow_read), MCP elicitation support, IBCT capability tokens for A2A (`ibct` feature), reactive hooks for `CwdChanged`/`FileChanged` events, the `[memory.store_routing]` strategy wired up (BREAKING: `[memory.routing]` removed), and the skill `disambiguation_threshold` default raised to 0.20.

> [!TIP]
> Copy-paste configs for all common setups — local, cloud, hybrid, coding assistant, Telegram bot — are in the **[Configuration Recipes](https://bug-ops.github.io/zeph/guides/config-recipes.html)** guide.

---

## What's inside

| Feature | Description |
|---|---|
| **Hybrid inference** | Ollama, Claude, OpenAI, Google Gemini, any OpenAI-compatible API, or fully local via Candle (GGUF). Providers are declared as `[[llm.providers]]` entries in config. Gemini supports SSE streaming, thinking-part surfacing (Gemini 2.5), and streaming `functionCall` parts. Multi-model orchestrator with fallback chains, EMA latency routing, and adaptive Thompson Sampling for exploration/exploitation-balanced model selection. Cascade routing supports `cost_tiers` for explicit cheapest-first provider ordering and `ClassifierMode::Judge` for LLM-scored query routing. **Complexity triage routing** (`LlmRoutingStrategy::Triage`) classifies each request into Simple/Medium/Complex/Expert tiers before inference and dispatches to the tier-matched provider pool, avoiding over-provisioning cheap queries to expensive models. **PILOT LinUCB bandit routing** (`LlmRoutingStrategy::Bandit`) applies a contextual LinUCB bandit to provider selection — features include query complexity, provider latency history, and time-of-day signals; configured via `[llm.router.bandit]`. Claude extended context (`--extended-context` flag or `enable_extended_context = true`) enables the 1M token window with a TUI `[1M CTX]` header badge; a cost warning is emitted automatically. Built-in pricing includes gpt-5 and gpt-5-mini. [→ Providers](https://bug-ops.github.io/zeph/concepts/providers.html) |
| **Skills-first architecture** | YAML+Markdown skill files with BM25+cosine hybrid retrieval. Bayesian re-ranking, 4-tier trust model, and self-learning evolution — skills improve from real usage. Agent-as-a-Judge feedback detection with adaptive regex/LLM hybrid analysis across 7 languages (English, Russian, Spanish, German, French, Portuguese, Chinese). The `load_skill` tool lets the LLM fetch the full body of any skill outside the active TOP-N set on demand. [→ Skills](https://bug-ops.github.io/zeph/concepts/skills.html) · [→ Self-learning](https://bug-ops.github.io/zeph/advanced/self-learning.html) |
| **Context engineering** | Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. **HiAgent subgoal-aware compaction** tracks active and completed subgoals — active subgoal messages are protected from eviction while completed subgoals are candidates for summarization with MIG redundancy scoring. Large tool outputs are stored in SQLite (not on disk) and injected on demand via the native `read_overflow` tool, eliminating absolute-path leakage and enabling automatic cleanup on conversation delete. **Failure-driven compression guidelines** (ACON): after each hard compaction, the agent monitors responses for context-loss signals; confirmed failure pairs train an LLM-generated `<compression-guidelines>` block that is injected into every future compaction prompt. **ACON per-category guidelines** (`categorized_guidelines = true` in `[memory.compression_guidelines]`) tags each failure pair by category (tool_output / assistant_reasoning / user_context) and maintains separate per-category guideline blocks for finer-grained compression control. **Memex tool-output archive** (`archive_tool_outputs = true` in `[memory.compression]`) saves tool output bodies to SQLite before compaction and injects UUID back-references into summaries, preserving retrievability after the live context is discarded. `--debug-dump [PATH]` writes every LLM request, response, and raw tool output to numbered files for context debugging; `--dump-format <json\|raw\|trace>` (or `/dump-format` at runtime) switches the output format — `trace` emits OpenTelemetry-compatible OTLP JSON with a session → iteration → LLM-call/tool-call/memory-search span hierarchy. [→ Context](https://bug-ops.github.io/zeph/advanced/context.html) · [→ Debug Dump](https://bug-ops.github.io/zeph/advanced/debug-dump.html) |
| **Semantic memory** | SQLite (default) or PostgreSQL + Qdrant with MMR re-ranking, temporal decay, write-time importance scoring, query-aware memory routing (keyword/semantic/hybrid/episodic), cross-session recall, implicit correction detection, and credential scrubbing. **Structured anchored summarization** preserves factual anc [truncated…]
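The compaction-related settings named in the table above live under `[memory.*]` tables. As a hedged sketch, the two toggles the text spells out could be combined like this — the table and key names are taken verbatim from the descriptions above, while everything else about the surrounding schema is an assumption:

```toml
# Sketch combining the config keys named above; only these two
# key/value pairs are taken from the feature table — the rest of
# the [memory.*] schema is not shown here.

[memory.compression_guidelines]
categorized_guidelines = true   # ACON: separate guideline blocks per
                                # failure category (tool_output /
                                # assistant_reasoning / user_context)

[memory.compression]
archive_tool_outputs = true     # Memex: archive tool output bodies to
                                # SQLite before compaction, leaving UUID
                                # back-references in the summaries
```

Together these make hard compaction both self-correcting (guidelines learned from context-loss failures) and reversible (archived outputs stay retrievable by UUID).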