github · inferred · active
model-watchdog
provenance:github:feralghost/model-watchdog
WHAT THIS AGENT DOES
This agent monitors the health of your AI systems and limits downtime caused by bad configuration changes. When a settings change makes an agent start failing, it automatically reverts to the last known working configuration and restarts the service. It is aimed at anyone running AI agents that must stay up continuously, especially people who update their configuration often. It is a single Python file that uses only the standard library, so it needs no extra installs and runs reliably on minimal systems.
README
# model-watchdog
Auto-rollback for AI agent config changes. Zero dependencies beyond Python 3.8+.
## The Problem
I changed my AI agent's model config from `claude-opus-4-5` to `claude-opus-4-6` without checking if the installed software version supported it. The agent went down for **10 hours** while I was asleep.
This tool watches your agent's health endpoint and, when it detects repeated failures, automatically rolls back the config and restarts the service. It also saves a "last known good" backup whenever the config changes and the agent is healthy.
## Quick Start
```bash
# Probe http://localhost:18789/health every 30s
# Roll back after 3 failures in 3 minutes
python3 watchdog.py
# Custom config
python3 watchdog.py --config watchdog.yaml
# One-shot health check (for CI/scripts)
python3 watchdog.py --check-once
```
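A single health probe, like the one `--check-once` runs, can be done with the standard library alone. The sketch below is illustrative, not the tool's code: the name `probe_once` is hypothetical, its parameters mirror the `probe` options shown under Config, and treating `expected_body` as a substring match is an assumption.

```python
import http.client
import urllib.error
import urllib.request


def probe_once(url: str = "http://localhost:18789/health",
               timeout_sec: float = 5.0,
               expected_status: int = 200,
               expected_body: str = "ok") -> bool:
    """Return True if url answers with expected_status and a body containing expected_body.

    Hypothetical helper; the substring body check is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; keep the code so it can still be compared.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, malformed response: unhealthy.
        return False
    return status == expected_status and expected_body in body
```

For a CI step, `sys.exit(0 if probe_once() else 1)` turns the result into an exit code. Whether the real `--check-once` uses that exact convention is an assumption.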
## How It Works
1. **Probe** your agent's health endpoint every N seconds
2. On **K failures within M minutes** → roll back config + restart service
3. When the agent is **healthy after a config change** → update the "good backup"
4. **Alert** via Telegram, Slack, Discord, or any HTTP webhook
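The trigger in step 2 is a sliding-window counter. The sketch below assumes failures are counted anywhere within the window (a success in between does not clear them) and that the window is cleared after a rollback. The class name `FailureWindow` is hypothetical.

```python
import time
from collections import deque
from typing import Deque, Optional


class FailureWindow:
    """Report when `failures` probe failures fall within `window_sec` seconds."""

    def __init__(self, failures: int = 3, window_sec: float = 180.0) -> None:
        self.failures = failures
        self.window_sec = window_sec
        self._times: Deque[float] = deque()

    def record_failure(self, now: Optional[float] = None) -> bool:
        """Record one failed probe; return True when the rollback threshold is reached."""
        now = time.monotonic() if now is None else now
        self._times.append(now)
        # Drop failures that have aged out of the window.
        while self._times and now - self._times[0] > self.window_sec:
            self._times.popleft()
        return len(self._times) >= self.failures

    def reset(self) -> None:
        """Forget past failures, e.g. right after a rollback."""
        self._times.clear()
```

With the defaults (3 failures in 180 s, probing every 30 s), three consecutive failed probes at t = 0, 30, and 60 s reach the threshold on the third one, about a minute after the first failure.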
```
Agent healthy with new config → save as "good backup"
↓
Config changes (model upgrade, etc.)
↓
Agent starts failing
↓
K failures in M minutes → rollback to good backup → restart
↓
Alert sent → agent back online
```
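The backup-and-rollback flow above comes down to a few file operations plus the restart command. The sketch below is one way to do it and makes several assumptions: a config counts as "changed" when its SHA-256 differs from the backup's, copies go through a temp file and `os.replace` so neither file is ever half-written, and `restart_cmd` is split with `shlex` rather than run through a shell. The names `promote_if_healthy` and `rollback` are hypothetical. Paths such as `~/.openclaw/openclaw.json` need `Path(...).expanduser()` before being passed in.

```python
import hashlib
import os
import shlex
import shutil
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Optional


def _digest(path: Path) -> Optional[str]:
    """SHA-256 of the file's bytes, or None if it does not exist."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def _atomic_copy(src: Path, dst: Path) -> None:
    """Copy src to dst via a temp file + os.replace, so dst is never half-written."""
    fd, tmp = tempfile.mkstemp(dir=str(dst.parent), prefix=dst.name + ".")
    os.close(fd)
    try:
        shutil.copy2(str(src), tmp)
        os.replace(tmp, str(dst))
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise


def promote_if_healthy(config: Path, backup: Path, healthy: bool) -> bool:
    """Save config as the good backup if the agent is healthy and the config
    differs from the current backup. Return True when a new backup was written."""
    if not healthy or not config.exists() or _digest(config) == _digest(backup):
        return False
    _atomic_copy(config, backup)
    return True


def rollback(config: Path, backup: Path, restart_cmd: str,
             restart_wait_sec: float = 10.0, timeout_sec: float = 60.0) -> int:
    """Restore the good backup over the live config, run restart_cmd, then wait
    restart_wait_sec for the service to come up. Return the command's exit code."""
    if not backup.exists():
        raise FileNotFoundError(f"no good backup at {backup}")
    _atomic_copy(backup, config)
    # Raises subprocess.TimeoutExpired if the restart hangs past timeout_sec.
    result = subprocess.run(shlex.split(restart_cmd), timeout=timeout_sec)
    time.sleep(restart_wait_sec)
    return result.returncode
```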
## Config
Generate a sample config:
```bash
python3 watchdog.py --dump-config > watchdog.yaml
```
Key options:
```json
{
"probe": {
"url": "http://localhost:18789/health",
"timeout_sec": 5,
"expected_status": 200,
"expected_body": "ok"
},
"thresholds": {
"failures": 3,
"window_sec": 180,
"probe_interval_sec": 30
},
"rollback": {
"config_path": "~/.openclaw/openclaw.json",
"backup_path": "~/.openclaw/openclaw.json.watchdog-good",
"restart_cmd": "systemctl --user restart openclaw-gateway",
"restart_wait_sec": 10
},
"alerts": {
"telegram_bot_token": "...",
"telegram_chat_id": "..."
}
}
```
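The sample config only shows Telegram keys under `alerts`, so the sketch below covers that channel alone. It calls the Telegram Bot API `sendMessage` method with a JSON body. The function names `build_telegram_request` and `send_alert` are hypothetical. The README does not show the config keys for Slack, Discord, or generic webhooks, so they are left out.

```python
import json
import urllib.request


def build_telegram_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Prepare a Telegram Bot API sendMessage call as a JSON POST."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_alert(alerts_cfg: dict, text: str, timeout_sec: float = 10.0) -> bool:
    """Send text via Telegram if both keys are configured; never raise on network errors."""
    token = alerts_cfg.get("telegram_bot_token")
    chat_id = alerts_cfg.get("telegram_chat_id")
    if not token or not chat_id:
        return False
    try:
        req = build_telegram_request(token, chat_id, text)
        with urllib.request.urlopen(req, timeout=timeout_sec) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses.
        return False
```

`send_alert` swallows network errors on purpose: a failed alert should never stop the watchdog from rolling back.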
## Run as a Service
```bash
# Install as a systemd user service
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/model-watchdog.service << 'EOF'
[Unit]
Description=model-watchdog AI agent health monitor
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/watchdog.py --config /path/to/watchdog.yaml
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now model-watchdog
systemctl --user status model-watchdog
# Keep user services running after you log out (e.g. overnight)
loginctl enable-linger "$USER"
```
## Works With
- [OpenClaw](https://github.com/openclaw/openclaw) (default config paths)
- Any AI agent with an HTTP health endpoint
- Any service with a config file + restart command
## Why No Dependencies?
Agents running 24/7 on minimal VPS installs shouldn't need a pip install to stay alive. This is a single Python file, standard library only.
Optional: `pip install pyyaml` for YAML config support (JSON works without it).
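The "PyYAML optional, JSON always" approach is the usual optional-import pattern. The sketch below assumes a user config is layered over built-in defaults. `load_config`, `deep_merge`, and `DEFAULTS` are hypothetical names. The URL, interval, and 3-in-180 s threshold match the Quick Start comments, and the remaining values are copied from the sample config. Whether the real tool merges partial configs this way is an assumption.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict, Optional

try:
    import yaml  # optional: pip install pyyaml
except ImportError:
    yaml = None  # type: ignore

DEFAULTS: Dict[str, Any] = {
    "probe": {"url": "http://localhost:18789/health", "timeout_sec": 5,
              "expected_status": 200, "expected_body": "ok"},
    "thresholds": {"failures": 3, "window_sec": 180, "probe_interval_sec": 30},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override's keys applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Optional[str] = None) -> Dict[str, Any]:
    """Load a JSON or YAML config file and layer it over DEFAULTS."""
    if path is None:
        return copy.deepcopy(DEFAULTS)
    p = Path(path).expanduser()
    text = p.read_text(encoding="utf-8")
    if yaml is not None and p.suffix in (".yaml", ".yml"):
        user = yaml.safe_load(text) or {}
    else:
        # Without PyYAML, a .yaml file still loads if its content is plain JSON.
        user = json.loads(text)
    return deep_merge(DEFAULTS, user)
```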
## License
MIT
PUBLIC HISTORY
First discovered: Mar 21, 2026
IDENTITY
inferred
Identity inferred from code signals. No PROVENANCE.yml found.
METADATA
platform: github
first seen: Mar 15, 2026
last updated: Mar 19, 2026
last crawled: 4 days ago
version: none listed