github · inferred · active
SmudgeAI
provenance:github:ShaikhWarsi/SmudgeAI
WHAT THIS AGENT DOES
SmudgeAI acts as a digital assistant that operates your computer programs for you. It automates repetitive tasks across different applications, saving time and effort. It is aimed at businesses and individuals who spend much of their time on tedious computer work, such as data entry, report generation, or managing multiple software systems.
README
# SmudgeAI
SmudgeAI is a Windows-native autonomous desktop AI agent that understands screen context through vision models and UI element trees, enabling it to navigate any application and execute complex multi-step workflows with minimal human intervention. It leverages LLM-based reasoning to adapt to unknown interfaces while maintaining security through built-in permission systems and safeguards.
## Table of Contents
1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Core Components](#core-components)
4. [AI Engine & Multimodal Processing](#ai-engine--multimodal-processing)
5. [Desktop Automation Stack](#desktop-automation-stack)
6. [UI Detection & Element Matching](#ui-detection--element-matching)
7. [Security & Safety Systems](#security--safety-systems)
8. [Error Handling & Recovery](#error-handling--recovery)
9. [Multi-Monitor & DPI Scaling Support](#multi-monitor--dpi-scaling-support)
10. [Internationalization & Localization](#internationalization--localization)
11. [Logging & Observability](#logging--observability)
12. [Capabilities Summary](#capabilities-summary)
13. [Comparison with OpenClaw](#comparison-with-openclaw)
14. [Bug Fixes & Security Hardening](#bug-fixes--security-hardening)
15. [Getting Started](#getting-started)
16. [Configuration](#configuration)
17. [Roadmap](#roadmap)
---
## Overview
SmudgeAI is designed to be a **fully autonomous desktop control agent** that can:
- **Understand screen context** through vision models and UI element trees
- **Navigate any Windows application** using UIA-based element discovery
- **Execute multi-step workflows** with verification and rollback
- **Adapt to unknown interfaces** through LLM-based reasoning
- **Operate safely** with permission systems and safeguards
### Key Design Goals
1. **Reliability First** - Every action is verified before proceeding
2. **Security by Default** - Permission systems and input sanitization on all dangerous operations
3. **Cross-UI Adaptability** - Works with any Windows application through UIA and CV fallback
4. **Production Ready** - Structured logging, error recovery, and observability
---
## Architecture
### High-Level System Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            SmudgeAI Architecture                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────────────────────────────────────────┐  │
│  │     GUI      │────▶│                    AI Engine                     │  │
│  │   (PyQt5)    │     │  ┌──────────────────┐   ┌─────────────────────┐  │  │
│  │              │     │  │   Groq Client    │   │    Gemini Client    │  │  │
│  │  - Input     │     │  │    (Primary)     │   │     (Fallback)      │  │  │
│  │  - Display   │     │  └──────────────────┘   └─────────────────────┘  │  │
│  │  - Status    │     │  ┌────────────────────────────────────────────┐  │  │
│  └──────────────┘     │  │        Rate Limiter & Model Cycling        │  │  │
│         │             │  └────────────────────────────────────────────┘  │  │
│         │             │  ┌────────────────────────────────────────────┐  │  │
│         │             │  │        Conversation History Manager        │  │  │
│         │             │  └────────────────────────────────────────────┘  │  │
│         ▼             └──────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                             Task Manager                              │  │
│  │  ┌────────────────┐     ┌────────────────┐     ┌───────────────────┐  │  │
│  │  │   Permission   │     │      Tool      │     │       Error       │  │  │
│  │  │     System     │     │    Registry    │     │    Classifier     │  │  │
│  │  └────────────────┘     └────────────────┘     └───────────────────┘  │  │
│  └───────────────────────────────────┬───────────────────────────────────┘  │
│            ┌─────────────────────────┼─────────────────────────┐            │
│            ▼                         ▼                         ▼            │
│  ┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐  │
│  │   Desktop State   │     │       CV/UI       │     │     Local VLM     │  │
│  │       (UIA)       │     │    Integration    │     │    (Optional)     │  │
│  └───────────────────┘     └─────────┬─────────┘     └───────────────────┘  │
│                                      ▼                                      │
│                            ┌───────────────────┐                            │
│                            │ Action Execution  │                            │
│                            │    (pyautogui)    │                            │
│                            └───────────────────┘                            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Responsibility | Language/Framework |
|-----------|----------------|-------------------|
| **GUI** | User input, status display, permission dialogs | PyQt5 |
| **AI Engine** | LLM orchestration, vision analysis, tool calling | Python (async) |
| **Task Manager** | Tool execution, permission checks, caching | Python |
| **Desktop State** | UIA element tree, window management | pywinauto + pygetwindow |
| **CV/UI Integration** | Template matching, LLM coordinate verification | OpenCV + PIL |
| **Error Handler** | Error classification, retry strategies | Python |
| **Structured Logging** | Correlation IDs, action tracking | Python (custom) |
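The Task Manager row above pairs tool execution with permission checks. The following is a minimal sketch of how such a gate can work. It assumes a registry keyed by tool name and an `approve` callback that stands in for the GUI permission dialog. `ToolRegistry`, `PermissionDenied`, and the `dangerous` flag are hypothetical names, not SmudgeAI's actual API.

```python
from typing import Any, Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when the user declines a dangerous tool call (hypothetical)."""


class ToolRegistry:
    """Hypothetical registry that gates dangerous tools behind user approval."""

    def __init__(self, approve: Callable[[str, Dict[str, Any]], bool]):
        # approve(tool_name, kwargs) -> True if the call may proceed;
        # in a GUI app this would open a permission dialog
        self._approve = approve
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._dangerous: Set[str] = set()

    def register(self, name: str, dangerous: bool = False):
        """Decorator that adds a function to the registry under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            if dangerous:
                self._dangerous.add(name)
            return fn
        return decorator

    def execute(self, name: str, **kwargs: Any) -> Any:
        """Run a registered tool, asking for approval first if it is dangerous."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if name in self._dangerous and not self._approve(name, kwargs):
            raise PermissionDenied(name)
        return self._tools[name](**kwargs)
```

Placing the approval check inside `execute` means no code path can invoke a dangerous tool without going through the callback.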
---
## Core Components
### 1. AI Engine (`ai_engine.py`)
The AI Engine is the brain of SmudgeAI, orchestrating all LLM interactions.
#### Features
- **Multi-Provider Support**: Groq (primary), Google Gemini (fallback)
- **Vision Analysis**: Llama 3.2 Vision (Groq) → Gemini 1.5 Flash fallback
- **Model Cycling**: Automatic failover when rate limits are hit
- **Rate Limiting**: Built-in rate limiter with exponential backoff
- **Conversation History**: Message history serialized as plain dicts
- **Tool Schema Generation**: Dynamic tool schema from Python functions
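Tool schemas can be derived from ordinary Python functions by reading their signatures and type hints. The sketch below is hypothetical: `function_to_tool_schema` is not from SmudgeAI, and it assumes the OpenAI-compatible `tools` format that Groq's chat API accepts. Parameters without defaults are marked required, and unannotated parameters fall back to `"string"`.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}


def function_to_tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling tool schema from a function's signature."""
    hints = get_type_hints(fn)
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations default to "string"
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties,
                           "required": required},
        },
    }
```

Because the function's docstring becomes the tool description, the text the LLM sees stays in sync with the code.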
#### Rate Limiter Implementation
```python
class RateLimiter:
    requests_in_window: int        # Requests in current window
    window_size: float = 60.0      # 60-second window
    max_requests: int = 30         # Max 30 requests per window
    blocked_until: float           # Timestamp when block expires
    consecutive_errors: int        # Track consecutive failures
```
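These fields are enough to implement fixed-window request counting. The following is a minimal sketch that assumes a monotonic clock and an extra `window_start` field not shown in the struct above. `FixedWindowLimiter` and `try_acquire` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FixedWindowLimiter:
    window_size: float = 60.0      # seconds per window
    max_requests: int = 30         # requests allowed per window
    requests_in_window: int = 0
    blocked_until: float = 0.0     # monotonic timestamp; 0 means not blocked
    window_start: float = field(default_factory=time.monotonic)  # assumed field

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Count and allow one request if the limiter permits it right now."""
        now = time.monotonic() if now is None else now
        if now < self.blocked_until:
            return False  # still inside a backoff block
        if now - self.window_start >= self.window_size:
            # The current window has expired: start a fresh one
            self.window_start = now
            self.requests_in_window = 0
        if self.requests_in_window >= self.max_requests:
            return False
        self.requests_in_window += 1
        return True
```

A monotonic clock is assumed so that system clock changes cannot shorten or extend a window.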
**Backoff Strategy**:
- Base delay: 30 seconds
- Exponential factor: 2^consecutive_errors (multiplies the base delay)
- Jitter: Random 0-10 seconds
- Max block: 300 seconds (5 minutes)
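Read literally, this strategy gives a delay of 30 × 2^consecutive_errors + jitter, capped at 300 seconds. The sketch makes two assumptions: the cap applies after the jitter is added, and the exponent is the raw error count, so the first failure already waits about 60 seconds. `backoff_delay` is a hypothetical helper.

```python
import random
from typing import Optional

BASE_DELAY = 30.0   # seconds
MAX_JITTER = 10.0   # seconds
MAX_BLOCK = 300.0   # seconds (5 minutes)


def backoff_delay(consecutive_errors: int, jitter: Optional[float] = None) -> float:
    """Seconds to block after `consecutive_errors` failures in a row."""
    if jitter is None:
        # Random jitter spreads out retries from concurrent callers
        jitter = random.uniform(0.0, MAX_JITTER)
    delay = BASE_DELAY * (2 ** consecutive_errors) + jitter
    return min(delay, MAX_BLOCK)  # never block longer than 5 minutes
```

A caller would then set `blocked_until = time.monotonic() + backoff_delay(consecutive_errors)`.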
#### Vision Pipeline
```
Screenshot → Groq Llama 3.2 Vision → JSON Elements → Coordinate Verification → Click
                       ↓ (fallback)
             Gemini 1.5 Flash Vision
```
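The fallback and verification steps of this pipeline can be sketched as follows. Everything here is assumed for illustration:

- The model backends are plain callables that return the model's raw JSON reply.
- Each element is a dict with pixel `x`/`y` click coordinates.
- Verification is reduced to a simple bounds check.

`detect_elements` and `verify_coordinates` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, List

# Assumed contract: a backend takes screenshot bytes and returns JSON text
VisionBackend = Callable[[bytes], str]


def detect_elements(png: bytes, primary: VisionBackend,
                    fallback: VisionBackend) -> List[Dict[str, Any]]:
    """Ask the primary model for UI elements; on any failure, ask the fallback."""
    try:
        return json.loads(primary(png))
    except Exception:
        # Covers API errors (e.g. rate limits) and malformed JSON alike
        return json.loads(fallback(png))


def verify_coordinates(elements: List[Dict[str, Any]],
                       width: int, height: int) -> List[Dict[str, Any]]:
    """Keep only elements whose (x, y) click point lies on the screenshot."""
    return [
        e for e in elements
        if isinstance(e.get("x"), (int, float)) and isinstance(e.get("y"), (int, float))
        and 0 <= e["x"] < width and 0 <= e["y"] < height
    ]
```

Dropping off-screen points before clicking prevents coordinates hallucinated by the model from producing stray clicks.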
### 2. Desktop State (`desktop_state.py`)
Captures and maintains the UI element hierarchy of all windows.
#### COM Threading Fix
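Previously, pywinauto initialization caused COM threading race conditions. The fix initializes COM for the calling thread in multithreaded (MTA) mode before pywinauto is used:
```python
def _ensure_pywinauto_com_init():
    import pythoncom
    # pywin32's CoInitializeEx takes a single flags argument (no reserved pointer)
    pythoncom.CoInitializeEx(pythoncom.COINIT_MULTITHREADED)
```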
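COM is initialized per thread, so a helper like `_ensure_pywinauto_com_init` has to run on every thread that touches pywinauto. A thread-local guard makes such a helper cheap to call repeatedly. The sketch below is hypothetical (`PerThreadInit` is not part of SmudgeAI). It takes the initializer as a parameter because `pythoncom` exists only on Windows.

```python
import threading
from typing import Callable


class PerThreadInit:
    """Run an initializer at most once per thread (hypothetical helper)."""

    def __init__(self, init: Callable[[], None]):
        self._init = init
        self._local = threading.local()  # each thread sees its own `done` flag

    def ensure(self) -> None:
        if not getattr(self._local, "done", False):
            self._init()
            self._local.done = True


# Possible Windows usage (assumption, not SmudgeAI's code):
# import pythoncom
# com_guard = PerThreadInit(
#     lambda: pythoncom.CoInitializeEx(pythoncom.COINIT_MULTITHREADED))
# com_guard.ensure()  # call at the start of each worker that uses pywinauto
```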
#### Element Types Supported
- WINDOW, BUTTON, EDIT, MENU, MENU_ITEM
- TAB, CHECKBOX, RADIO_BUTTON, COMBOBOX
- LIST, LIST_ITEM, TEXT, UNKNOWN
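These types could be represented as a Python `Enum` whose values are UIA control-type names, which lets element types reported by pywinauto map directly onto members. This is a sketch under that assumption: the string values and `from_control_type` are illustrative, not taken from SmudgeAI. Any control type not in the list maps to `UNKNOWN`.

```python
from enum import Enum


class ElementType(Enum):
    # Values assumed to be UIA control-type names (e.g. as pywinauto reports them)
    WINDOW = "Window"
    BUTTON = "Button"
    EDIT = "Edit"
    MENU = "Menu"
    MENU_ITEM = "MenuItem"
    TAB = "Tab"
    CHECKBOX = "CheckBox"
    RADIO_BUTTON = "RadioButton"
    COMBOBOX = "ComboBox"
    LIST = "List"
    LIST_ITEM = "ListItem"
    TEXT = "Text"
    UNKNOWN = "Unknown"  # sentinel, not a real UIA control type

    @classmethod
    def from_control_type(cls, control_type: str) -> "ElementType":
        """Map a UIA control-type name to an ElementType, defaulting to UNKNOWN."""
        try:
            return cls(control_type)
        except ValueError:
            return cls.UNKNOWN
```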
#### Data Structures
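```python
from __future__ import annotations  # lets UIElement refer to itself below

from dataclasses import dataclass
from typing import List

# ElementType is the enum of the element types listed above (defined elsewhere)


@dataclass
class UIElement:
    title: str
    element_type: ElementType
    rect: tuple                # (x, y, width, height)
    automation_id: str         # UIA AutomationId
    class_name: str            # Window class name
    is_visible: bool
    is_enabled: bool
    children: List[UIElement]


@dataclass
class WindowInfo:
    title: str
    process_name: str
    rect: tuple
```
[truncated…]
PUBLIC HISTORY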
First discovered: Apr 1, 2026
IDENTITY
inferred
Identity inferred from code signals. No PROVENANCE.yml found.
METADATA
platform: github
first seen: Mar 26, 2025
last updated: Mar 21, 2026
last crawled: today
version: n/a