# LangSight
**Your agent failed. Which tool broke — and how do we stop it next time?**
Detect loops. Enforce budgets. Break failing tools. Map blast radius.
For MCP servers: health checks, security scanning, schema drift detection.
[Website](https://www.langsight.dev) · [PyPI](https://pypi.org/project/langsight/) · [License](LICENSE) · [Python 3.11+](https://www.python.org/downloads/) · [CI](https://github.com/LangSight/langsight/actions/workflows/ci.yml) · [Docs](https://docs.langsight.dev)
> **Not another prompt, eval, or simulation platform.**
> LangSight is the runtime reliability layer for AI agent toolchains.
---
## Where LangSight fits
Langfuse watches the **brain** (model outputs, token costs, evals).
LangWatch tests the **brain** (simulations, prompt optimization).
Datadog watches the **body** (CPU, memory, HTTP codes).
**LangSight watches the hands** (tools the agent calls, their health, safety, and cost).
| Question | Best tool |
|----------|-----------|
| Did the prompt/model perform well? | LangWatch / Langfuse / LangSmith |
| Should I change prompts or eval policy? | LangWatch / Langfuse / LangSmith |
| Is my server CPU/memory healthy? | Datadog / New Relic |
| **Which tool call failed in production?** | **LangSight** |
| **Is my agent stuck in a loop?** | **LangSight** |
| **Is an MCP server unhealthy or drifting?** | **LangSight** |
| **Is an MCP server exposed or risky?** | **LangSight** |
| **Why did this session cost $47 instead of $3?** | **LangSight** |
| **If this tool goes down, which agents break?** | **LangSight** |
Use LangSight alongside Langfuse and LangWatch — not instead of them.
---
## The problem
LLM quality is only half the problem. Teams already have ways to inspect prompts and eval scores. What they still cannot answer fast enough:
- **Agent stuck in a loop** — retries the same tool 47 times, burns $200, produces nothing
- **MCP server degraded silently** — schema changed, latency spiked, auth expired. Agent keeps calling, gets bad data
- **Cost explosion** — sub-agent retries geocoding-mcp endlessly. Nobody knows until the invoice arrives
- **Cascading failure** — postgres-mcp goes down. 3 agents depend on it. All sessions fail. No blast radius visibility
- **Unsafe MCP server** — 66% of community MCP servers have critical code smells. No automated scanning
---
## What LangSight does
### 1. Prevent — stop failures before users notice
```python
from langsight.sdk import LangSightClient
client = LangSightClient(
url="http://localhost:8000",
loop_detection=True, # detect same tool+args called 3x → auto-stop
max_cost_usd=1.00, # hard budget limit per session
max_steps=25, # hard step limit
circuit_breaker=True, # auto-disable tools after 5 consecutive failures
)
```
- **Loop detection** — same tool called with same args 3x → session terminated, alert fired
- **Budget guardrails** — max cost / max steps per session → hard stop before bill shock
- **Circuit breaker** — tool fails 5x → auto-disabled for cooldown → alert → auto-recovery test
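The loop-detection rule above (same tool called with the same args three times) can be sketched as a fingerprint counter. This is an illustrative standalone sketch, not LangSight's internal implementation; `LoopDetector` and `fingerprint` are hypothetical names:

```python
import hashlib
import json

def fingerprint(tool: str, args: dict) -> str:
    # Stable fingerprint for a call: tool name + canonicalized (sorted-key) args.
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class LoopDetector:
    """Flag a session once an identical tool+args fingerprint repeats `threshold` times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: dict[str, int] = {}

    def record(self, tool: str, args: dict) -> bool:
        # Returns True when this call trips the loop threshold.
        fp = fingerprint(tool, args)
        self.counts[fp] = self.counts.get(fp, 0) + 1
        return self.counts[fp] >= self.threshold

detector = LoopDetector(threshold=3)
assert not detector.record("slack-mcp/notify", {"channel": "#ops"})
assert not detector.record("slack-mcp/notify", {"channel": "#ops"})
assert detector.record("slack-mcp/notify", {"channel": "#ops"})  # third identical call → stop
```

Canonicalizing args with `sort_keys=True` matters: the same logical call must hash identically regardless of key order.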
### 2. Detect — see what broke and why
```
$ langsight sessions --id sess-f2a9b1
Trace: sess-f2a9b1 (support-agent) [LOOP_DETECTED]
5 tool calls · 1 failed · 2,134ms · $0.023
sess-f2a9b1
├── jira-mcp/get_issue 89ms ✓
├── postgres-mcp/query 42ms ✓
├── → billing-agent handoff
│ ├── crm-mcp/update 120ms ✓
│ └── slack-mcp/notify — ✗ timeout
Root cause: slack-mcp timed out at 14:32 UTC
```
- **Action traces** — every tool call in every session, with latency, status, cost
- **Multi-agent trees** — full call tree across agent handoffs via `parent_span_id`
- **Run health tags** — every session auto-classified: `success`, `loop_detected`, `budget_exceeded`, `tool_failure`
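The health-tag precedence implied by the bullets above (loop beats budget beats tool failure beats success) could be classified roughly like this. A minimal sketch over an assumed event shape, not LangSight's actual classifier:

```python
def classify_session(events: list[dict], max_cost_usd: float = 1.00) -> str:
    # Assumed event shape: {"status": "ok"|"error", "cost_usd": float, "loop_detected": bool}
    if any(e.get("loop_detected") for e in events):
        return "loop_detected"
    if sum(e.get("cost_usd", 0.0) for e in events) > max_cost_usd:
        return "budget_exceeded"
    if any(e.get("status") == "error" for e in events):
        return "tool_failure"
    return "success"

tag = classify_session([{"status": "error", "cost_usd": 0.02}])  # → "tool_failure"
```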
### 3. Monitor — MCP health + security
```
$ langsight mcp-health
Server Status Latency Schema Circuit
snowflake-mcp ✅ UP 142ms Stable closed
slack-mcp ⚠️ DEG 1,240ms Stable closed
jira-mcp ❌ DOWN — — open (5 failures)
postgres-mcp ✅ UP 31ms Changed closed
```
```
$ langsight security-scan
CRITICAL jira-mcp CVE-2025-6514 Remote code execution in mcp-remote
HIGH slack-mcp OWASP-MCP-01 Tool description contains injection pattern
HIGH postgres-mcp OWASP-MCP-04 No authentication configured
```
- **MCP health checks** — continuous ping, latency, uptime tracking
- **Schema drift detection** — tool schemas change → alert fires before agents hallucinate
- **Security scanning** — CVE (OSV), OWASP MCP Top 10, tool poisoning detection, auth audit
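Schema drift detection can be reduced to comparing a canonical hash of the tool list between polls. A sketch of the idea, with a made-up tool schema; LangSight's real comparison may be field-level rather than whole-schema:

```python
import hashlib
import json

def schema_hash(tools: list[dict]) -> str:
    # Canonical JSON (sorted keys) so semantically identical schemas hash identically.
    return hashlib.sha256(json.dumps(tools, sort_keys=True).encode()).hexdigest()

baseline = schema_hash([{"name": "get_issue", "fields": ["id", "priority"]}])
current = schema_hash([{"name": "get_issue", "fields": ["id"]}])  # 'priority' dropped
drifted = baseline != current  # → True: fire a drift alert before agents hallucinate
```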
### 4. Attribute — cost at the tool level
```
$ langsight costs --hours 24
Tool Calls Failed Cost % of Total
geocoding-mcp 2,340 12 $1,872 44.6%
postgres-mcp/query 890 3 $445 10.6%
claude-3.5 (LLM) 156 0 $312 7.4%
```
Not model-level costs (Langfuse does that). **Tool-level costs.** Which MCP server is burning your budget?
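Tool-level attribution like the table above amounts to grouping per-call cost records by tool. A minimal aggregation sketch over an assumed record shape, not the SDK's API:

```python
from collections import defaultdict

def cost_by_tool(calls: list[dict]) -> dict[str, tuple[float, float]]:
    # Assumed record shape: {"tool": str, "cost_usd": float}.
    # Returns {tool: (total_cost, percent_of_total)}.
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call["tool"]] += call["cost_usd"]
    grand = sum(totals.values()) or 1.0  # avoid div-by-zero on empty input
    return {tool: (cost, 100 * cost / grand) for tool, cost in totals.items()}

calls = [
    {"tool": "geocoding-mcp", "cost_usd": 2.0},
    {"tool": "geocoding-mcp", "cost_usd": 1.0},
    {"tool": "postgres-mcp/query", "cost_usd": 1.0},
]
report = cost_by_tool(calls)  # geocoding-mcp → (3.0, 75.0)
```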
### 5. Map — blast radius via lineage
```
postgres-mcp ❌ DOWN
Impact:
- support-agent: 200 sessions/day (HIGH)
- billing-agent: 50 sessions/day (MEDIUM)
- data-agent: 10 sessions/day (LOW)
Total: ~260 sessions/day affected
Circuit breaker: active (auto-disabled 3 minutes ago)
```
- **Lineage DAG** — which agents call which tools
- **Blast radius** — if this tool goes down, what else breaks?
- **Impact alerts** — "postgres-mcp is DOWN — 3 agents affected, 260 sessions/day"
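Given a lineage map of which agents call which tools, blast radius is a reverse lookup. An illustrative sketch with made-up lineage data, assuming a flat agent → tools mapping:

```python
def blast_radius(lineage: dict[str, list[str]], tool: str) -> list[str]:
    # lineage: {agent_name: [tools it calls]}. Returns agents impacted if `tool` goes down.
    return sorted(agent for agent, tools in lineage.items() if tool in tools)

lineage = {
    "support-agent": ["postgres-mcp", "jira-mcp"],
    "billing-agent": ["postgres-mcp", "crm-mcp"],
    "data-agent": ["postgres-mcp"],
    "chat-agent": ["slack-mcp"],
}
impacted = blast_radius(lineage, "postgres-mcp")
# → ["billing-agent", "data-agent", "support-agent"]
```

With per-agent session volumes attached to each edge, the same lookup yields the "~260 sessions/day affected" figure shown above.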
### 6. Investigate — AI-assisted root cause
```
$ langsight investigate jira-mcp
Investigation: jira-mcp
├── Health: DOWN since 14:32 UTC (3 consecutive failures)
├── Schema: 2 tools changed (get_issue dropped 'priority' field)
├── Recent errors: 429 Too Many Requests (rate limit)
└── Recommendation: check API rate limits, restore 'priority' field
```
---
## Quick start
### Prerequisites
- Docker and Docker Compose
- Python 3.11+ and [uv](https://docs.astral.sh/uv/)
### 1. Clone and start
```bash
git clone https://github.com/LangSight/langsight.git
cd langsight
./scripts/quickstart.sh
```
Takes ~2 minutes. Generates secrets, starts 5 containers, seeds demo data.
### 2. Open the dashboard
**http://localhost:3003** — log in with the admin email and password written to `.env` by `quickstart.sh` (randomly generated — check the file).
### 3. Instrument your agent
```python
from langsight.sdk import LangSightClient
client = LangSightClient(url="http://localhost:8000", api_key="<from quickstart>")
traced = client.wrap(mcp_session, server_name="postgres-mcp", agent_name="my-agent")
result = await traced.call_tool("query", {"sql": "SELECT * FROM orders"})
```
Two lines. Every tool call is now traced, guarded, and cost-attributed.
---
## Alerting
| Channel | Status |
|---|---|
| Slack (Block Kit) | Shipped |
| Generic webhook | Shipped |
| OpsGenie (native Events API) | v0.3 |
| PagerDuty (Events API v2) | v0.3 |
Alert types: server down/recovered, schema drift, latency spike, SLO breach, anomaly, loop detected, budget exceeded, circuit breaker open, failure rate spike, blast radius impact.
---
## Horizontal scaling with Redis
For multi-worker deployments, add Redis for shared rate limiting, SSE broadcasting, and circuit breaker state:
```bash
# Install Redis support
pip install "langsight[redis]"
# Add to .env
REDIS_PASSWORD=$(openssl rand -hex 24)
LANGSIGHT_REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379
LANGSIGHT_WORKERS=4
# Start with Redis
docker compose --pro
```