model-watchdog

provenance:github:feralghost/model-watchdog

WHAT THIS AGENT DOES

This agent automatically monitors the health of your artificial intelligence systems. It prevents costly downtime by detecting when a change to the system’s settings causes problems and then automatically reverting to a previously working configuration. This is especially helpful for businesses relying on AI agents that need to operate continuously. Anyone managing AI systems, particularly those who make frequent updates, would find this tool valuable. What makes it unique is its simplicity – it requires no additional software installations and is designed to run reliably on basic systems.

View Source ↗First seen 2mo agoNot yet hireable

README

# model-watchdog

Auto-rollback for AI agent config changes. Zero dependencies beyond Python 3.8+.

## The Problem

I changed my AI agent's model config from `claude-opus-4-5` to `claude-opus-4-6` without checking if the installed software version supported it. The agent went down for **10 hours** while I was asleep.

This tool watches your agent's health endpoint and automatically rolls back the config if it detects failures — then restarts the service. It also saves a "last known good" backup whenever the config changes and the agent is healthy.

## Quick Start

```bash
# Probe http://localhost:18789/health every 30s
# Roll back after 3 failures in 3 minutes
python3 watchdog.py

# Custom config
python3 watchdog.py --config watchdog.yaml

# One-shot health check (for CI/scripts)
python3 watchdog.py --check-once
```

## How It Works

1. **Probe** your agent's health endpoint every N seconds
2. On **K failures within M minutes** → rollback config + restart service
3. When agent is **healthy after config change** → update the "good backup"
4. **Alert** via Telegram, Slack, Discord, or any HTTP webhook

```
Agent healthy with new config → save as "good backup"
         ↓
Config changes (model upgrade, etc.)
         ↓
Agent starts failing
         ↓
K failures in M minutes → rollback to good backup → restart
         ↓
Alert sent → agent back online
```

## Config

Generate a sample config:
```bash
python3 watchdog.py --dump-config > watchdog.yaml
```

Key options:

```json
{
  "probe": {
    "url": "http://localhost:18789/health",
    "timeout_sec": 5,
    "expected_status": 200,
    "expected_body": "ok"
  },
  "thresholds": {
    "failures": 3,
    "window_sec": 180,
    "probe_interval_sec": 30
  },
  "rollback": {
    "config_path": "~/.openclaw/openclaw.json",
    "backup_path": "~/.openclaw/openclaw.json.watchdog-good",
    "restart_cmd": "systemctl --user restart openclaw-gateway",
    "restart_wait_sec": 10
  },
  "alerts": {
    "telegram_bot_token": "...",
    "telegram_chat_id": "..."
  }
}
```

## Run as a Service

```bash
# Install as systemd user service
cat > ~/.config/systemd/user/model-watchdog.service << EOF
[Unit]
Description=model-watchdog AI agent health monitor
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/watchdog.py --config /path/to/watchdog.yaml
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF

systemctl --user enable --now model-watchdog
systemctl --user status model-watchdog
```

## Works With

- [OpenClaw](https://github.com/openclaw/openclaw) (default config paths)
- Any AI agent with an HTTP health endpoint
- Any service with a config file + restart command

## Why No Dependencies?

Agents running 24/7 on minimal VPS installs shouldn't need a pip install to stay alive. This is a single Python file, standard library only.

Optional: `pip install pyyaml` for YAML config support (JSON works without it).

## License

MIT

PUBLIC HISTORY

First discoveredMar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub

first seenMar 15, 2026

last updatedMar 19, 2026

last crawled1 months ago

version—

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:feralghost/model-watchdog)