
SmudgeAI

provenance:github:ShaikhWarsi/SmudgeAI
WHAT THIS AGENT DOES

SmudgeAI is a digital assistant that controls desktop applications on your behalf. It automates repetitive tasks across different applications, such as data entry, report generation, and managing multiple software systems, for businesses and individuals who spend a lot of time on this kind of work.

README
# SmudgeAI

SmudgeAI is a Windows-native autonomous desktop AI agent that understands screen context through vision models and UI element trees, enabling it to navigate any application and execute complex multi-step workflows with minimal human intervention. It leverages LLM-based reasoning to adapt to unknown interfaces while maintaining security through built-in permission systems and safeguards.


## Table of Contents

1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Core Components](#core-components)
4. [AI Engine & Multimodal Processing](#ai-engine--multimodal-processing)
5. [Desktop Automation Stack](#desktop-automation-stack)
6. [UI Detection & Element Matching](#ui-detection--element-matching)
7. [Security & Safety Systems](#security--safety-systems)
8. [Error Handling & Recovery](#error-handling--recovery)
9. [Multi-Monitor & DPI Scaling Support](#multi-monitor--dpi-scaling-support)
10. [Internationalization & Localization](#internationalization--localization)
11. [Logging & Observability](#logging--observability)
12. [Capabilities Summary](#capabilities-summary)
13. [Comparison with OpenClaw](#comparison-with-openclaw)
14. [Bug Fixes & Security Hardening](#bug-fixes--security-hardening)
15. [Getting Started](#getting-started)
16. [Configuration](#configuration)
17. [Roadmap](#roadmap)

---

## Overview

SmudgeAI is designed to be a **fully autonomous desktop control agent** that can:

- **Understand screen context** through vision models and UI element trees
- **Navigate any Windows application** using UIA-based element discovery
- **Execute multi-step workflows** with verification and rollback
- **Adapt to unknown interfaces** through LLM-based reasoning
- **Operate safely** with permission systems and safeguards

### Key Design Goals

1. **Reliability First** - Every action is verified before proceeding
2. **Security by Default** - Permission systems and input sanitization on all dangerous operations
3. **Cross-UI Adaptability** - Works with any Windows application through UIA and CV fallback
4. **Production Ready** - Structured logging, error recovery, and observability
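
The "verify before proceeding" goal, combined with the rollback mentioned in the overview, can be pictured with a small sketch. This is only an illustration under stated assumptions, not SmudgeAI's actual code: `Step`, `run_workflow`, and the `do`/`verify`/`undo` callables are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One workflow step (hypothetical structure, for illustration only)."""
    name: str
    do: Callable[[], None]       # performs the action
    verify: Callable[[], bool]   # returns True if the action took effect
    undo: Callable[[], None]     # reverses the action


def run_workflow(steps: List[Step]) -> bool:
    """Run steps in order; on a failed verification, undo what was done."""
    done: List[Step] = []
    for step in steps:
        step.do()
        if not step.verify():
            # Roll back the failed step and everything before it, newest first.
            for finished in reversed(done + [step]):
                finished.undo()
            return False
        done.append(step)
    return True
```

Each step only counts as complete once its `verify` callable confirms the result, so a later failure never leaves earlier steps half-applied.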

---

## Architecture

### High-Level System Diagram

```
┌───────────────────────────────────────────────────────────────────────────┐
│                           SmudgeAI Architecture                           │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌──────────────┐     ┌────────────────────────────────────────────────┐  │
│  │     GUI      │────▶│                   AI Engine                    │  │
│  │   (PyQt5)    │     │ ┌──────────────────┐  ┌──────────────────────┐ │  │
│  │              │     │ │ Groq Client      │  │ Gemini Client        │ │  │
│  │  - Input     │     │ │ (Primary)        │  │ (Fallback)           │ │  │
│  │  - Display   │     │ └──────────────────┘  └──────────────────────┘ │  │
│  │  - Status    │     │ ┌────────────────────────────────────────────┐ │  │
│  └──────────────┘     │ │ Rate Limiter & Model Cycling               │ │  │
│         │             │ └────────────────────────────────────────────┘ │  │
│         │             │ ┌────────────────────────────────────────────┐ │  │
│         │             │ │ Conversation History Manager               │ │  │
│         │             │ └────────────────────────────────────────────┘ │  │
│         │             └────────────────────────────────────────────────┘  │
│         ▼                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                            Task Manager                             │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐       │  │
│  │  │ Permission   │  │ Tool         │  │ Error                │       │  │
│  │  │ System       │  │ Registry     │  │ Classifier           │       │  │
│  │  └──────────────┘  └──────────────┘  └──────────────────────┘       │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                     │                                     │
│             ┌───────────────────────┼───────────────────────┐             │
│             ▼                       ▼                       ▼             │
│      ┌─────────────┐         ┌─────────────┐       ┌─────────────────┐    │
│      │ Desktop     │         │ CV/UI       │       │ Local VLM       │    │
│      │ State       │         │ Integration │       │ (Optional)      │    │
│      │ (UIA)       │         │             │       │                 │    │
│      └─────────────┘         └─────────────┘       └─────────────────┘    │
│                                     │                                     │
│                                     ▼                                     │
│                            ┌─────────────────┐                            │
│                            │ Action          │                            │
│                            │ Execution       │                            │
│                            │ (pyautogui)     │                            │
│                            └─────────────────┘                            │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
```

### Component Responsibilities

| Component | Responsibility | Language/Framework |
|-----------|----------------|-------------------|
| **GUI** | User input, status display, permission dialogs | PyQt5 |
| **AI Engine** | LLM orchestration, vision analysis, tool calling | Python (async) |
| **Task Manager** | Tool execution, permission checks, caching | Python |
| **Desktop State** | UIA element tree, window management | pywinauto + pygetwindow |
| **CV/UI Integration** | Template matching, LLM coordinate verification | OpenCV + PIL |
| **Error Handler** | Error classification, retry strategies | Python |
| **Structured Logging** | Correlation IDs, action tracking | Python (custom) |

---

## Core Components

### 1. AI Engine (`ai_engine.py`)

The AI Engine is the brain of SmudgeAI, orchestrating all LLM interactions.

#### Features

- **Multi-Provider Support**: Groq (primary), Google Gemini (fallback)
- **Vision Analysis**: Llama 3.2 Vision (Groq) → Gemini 1.5 Flash fallback
- **Model Cycling**: Automatic failover when rate limits are hit
- **Rate Limiting**: Built-in rate limiter with exponential backoff
- **Conversation History**: Properly serialized message history (dict-based)
- **Tool Schema Generation**: Dynamic tool schema from Python functions
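
As an illustration of generating a tool schema from a Python function, the sketch below uses the standard `inspect` module to read a signature and emit a function-calling style description. The output layout, the type mapping, and the `click` example tool are assumptions made for this example; the project's actual schema format is not shown in this README.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling style schema from a function's signature."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unrecognized types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def click(x: int, y: int, button: str = "left") -> None:
    """Click at screen coordinates (hypothetical example tool)."""
```

Calling `tool_schema(click)` yields a dictionary whose `required` list contains `x` and `y` but not `button`, because `button` has a default.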

#### Rate Limiter Implementation

```python
class RateLimiter:
    requests_in_window: int      # Requests in current window
    window_size: float = 60.0    # 60-second window
    max_requests: int = 30       # Max 30 requests per window
    blocked_until: float         # Timestamp when block expires
    consecutive_errors: int      # Track consecutive failures
```
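
One way these fields could work together is a fixed-window counter, sketched below. This is a guess at the mechanics rather than the project's implementation: `WindowLimiter`, `try_acquire`, and the `window_start` field are hypothetical, and the caller passes in the current time so the logic stays easy to test.

```python
from dataclasses import dataclass


@dataclass
class WindowLimiter:
    window_size: float = 60.0     # seconds per window
    max_requests: int = 30        # requests allowed per window
    requests_in_window: int = 0
    window_start: float = 0.0     # hypothetical field, not in the list above
    blocked_until: float = 0.0    # set by the backoff logic after errors

    def try_acquire(self, now: float) -> bool:
        """Return True and count the request if it is allowed at time `now`."""
        if now < self.blocked_until:
            return False
        if now - self.window_start >= self.window_size:
            # The previous window has expired: start a fresh one.
            self.window_start = now
            self.requests_in_window = 0
        if self.requests_in_window >= self.max_requests:
            return False
        self.requests_in_window += 1
        return True
```

In real use, `now` would typically come from `time.monotonic()`.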

**Backoff Strategy**:
- Base delay: 30 seconds
- Exponential: 2^consecutive_errors
- Jitter: Random 0-10 seconds
- Max block: 300 seconds (5 minutes)
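
The listed parameters do not say exactly how they combine. A minimal sketch, assuming the base delay is multiplied by the exponential factor, the jitter is added on top, and the cap is applied last (`backoff_delay` and the constant names are hypothetical):

```python
import random
from typing import Optional

BASE_DELAY = 30.0    # seconds
MAX_JITTER = 10.0    # seconds
MAX_BLOCK = 300.0    # seconds (5 minutes)


def backoff_delay(consecutive_errors: int,
                  rng: Optional[random.Random] = None) -> float:
    """Return how long to block, in seconds, after `consecutive_errors` failures."""
    # Clamp the exponent so very long error streaks cannot overflow a float.
    exponent = min(consecutive_errors, 16)
    jitter = (rng or random).uniform(0.0, MAX_JITTER)
    # Assumed combination: base * 2^n + jitter, then capped.
    return min(BASE_DELAY * 2 ** exponent + jitter, MAX_BLOCK)
```

Under these assumptions, `consecutive_errors` values of 0, 1, 2, and 3 give delays of 30 to 40, 60 to 70, 120 to 130, and 240 to 250 seconds, and from 4 onward the delay sits at the 300-second cap.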

#### Vision Pipeline

```
Screenshot → Groq Llama 3.2 Vision → JSON Elements → Coordinate Verification → Click
                ↓ (fallback)
         Gemini 1.5 Flash Vision
```
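
The fallback step of this pipeline can be sketched as an ordered list of backends that are tried in turn. The names below (`VisionBackend`, `detect_elements`) and the element format are hypothetical; the real Groq and Gemini client calls are not shown in this README, so they are represented here by plain callables.

```python
from typing import Callable, List, Sequence

# A vision backend takes screenshot bytes and returns detected elements.
VisionBackend = Callable[[bytes], List[dict]]


def detect_elements(screenshot: bytes,
                    backends: Sequence[VisionBackend]) -> List[dict]:
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend(screenshot)
        except Exception as exc:  # e.g. a rate limit or network failure
            errors.append(exc)
    raise RuntimeError(f"all {len(backends)} vision backends failed: {errors}")
```

With the primary backend listed first and the fallback second, a failure in the primary is recorded and the fallback's result is returned instead.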

### 2. Desktop State (`desktop_state.py`)

Captures and maintains the UI element hierarchy of all windows.

#### COM Threading Fix

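Previously, pywinauto initialization caused race conditions. The fix initializes COM explicitly:

```python
def _ensure_pywinauto_com_init():
    import pythoncom
    # pythoncom's wrapper takes only the concurrency flags (no reserved argument)
    pythoncom.CoInitializeEx(pythoncom.COINIT_MULTITHREADED)
```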
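
COM has to be initialized on every thread that uses it, so a helper like this is usually paired with a per-thread guard. The sketch below shows one such guard; `ensure_com_initialized` is a hypothetical name, and the initializer is passed in as a callable so the example runs without pywin32.

```python
import threading
from typing import Callable

# Per-thread storage: each thread sees its own `initialized` attribute.
_com_state = threading.local()


def ensure_com_initialized(init: Callable[[], None]) -> bool:
    """Call `init` at most once per thread; return True if it ran now."""
    if getattr(_com_state, "initialized", False):
        return False
    init()
    _com_state.initialized = True
    return True
```

On Windows, the callable would wrap the `pythoncom.CoInitializeEx` call, for example `ensure_com_initialized(lambda: pythoncom.CoInitializeEx(pythoncom.COINIT_MULTITHREADED))`.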

#### Element Types Supported

- WINDOW, BUTTON, EDIT, MENU, MENU_ITEM
- TAB, CHECKBOX, RADIO_BUTTON, COMBOBOX
- LIST, LIST_ITEM, TEXT, UNKNOWN
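
These types map naturally onto an enum. The sketch below is an assumption about how `ElementType` might be declared; the member values and the `from_control_type` helper are hypothetical. The helper matches UIA control type names such as `MenuItem` or `RadioButton` against member names and falls back to `UNKNOWN`.

```python
from enum import Enum, auto


class ElementType(Enum):
    WINDOW = auto()
    BUTTON = auto()
    EDIT = auto()
    MENU = auto()
    MENU_ITEM = auto()
    TAB = auto()
    CHECKBOX = auto()
    RADIO_BUTTON = auto()
    COMBOBOX = auto()
    LIST = auto()
    LIST_ITEM = auto()
    TEXT = auto()
    UNKNOWN = auto()

    @classmethod
    def from_control_type(cls, control_type: str) -> "ElementType":
        """Map a control type name to a member, defaulting to UNKNOWN."""
        # Compare case-insensitively and ignore underscores,
        # so "MenuItem" matches MENU_ITEM.
        key = control_type.replace("_", "").lower()
        for member in cls:
            if member.name.replace("_", "").lower() == key:
                return member
        return cls.UNKNOWN
```

Control types outside the supported list, such as `Pane`, resolve to `ElementType.UNKNOWN`.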

#### Data Structures

```python
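from dataclasses import dataclass
from typing import List

@dataclass
class UIElement:
    title: str
    element_type: ElementType    # one of the element types listed above
    rect: tuple                  # (x, y, width, height)
    automation_id: str           # UIA AutomationId
    class_name: str              # Window class name
    is_visible: bool
    is_enabled: bool
    children: List["UIElement"]  # quoted: the class is not yet defined here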

@dataclass
class WindowInfo:
    title: str
    process_name: str
    rect: tuple

```

[truncated…]

PUBLIC HISTORY

First discovered: Apr 1, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Mar 26, 2025
last updated: Mar 21, 2026
last crawled: today
version
