AGENTS / GITHUB / GhostDesk
githubinferredactive

GhostDesk

provenance:github:YV17labs/GhostDesk
WHAT THIS AGENT DOES

GhostDesk gives artificial intelligence the ability to interact with computers and software the way a person does. It provides an AI agent with a virtual desktop environment, allowing it to open applications, navigate websites, type, and click – essentially performing tasks on a computer screen. This solves the problem of AI being limited to text-based interactions, enabling it to automate complex workflows that involve multiple programs. Business analysts, researchers, and anyone needing to automate tasks across different software applications would find GhostDesk useful. What makes it unique is its ability to control existing software without needing special programming or APIs, making it adaptable to a wide range of applications, even older or less common ones.

View Source ↗First seen 2mo agoNot yet hireable
README
<p align="center">
  <img src="assets/logo.png" alt="GhostDesk" width="200">
</p>

<p align="center">
  <img src="https://img.shields.io/badge/MCP-compatible-blueviolet?style=for-the-badge" alt="MCP Compatible">
  <img src="https://img.shields.io/badge/python-3.12+-blue?style=for-the-badge&logo=python&logoColor=white" alt="Python 3.12+">
  <img src="https://img.shields.io/badge/license-AGPL--3.0-green?style=for-the-badge" alt="AGPL-3.0 License">
  <img src="https://img.shields.io/badge/platform-Linux%20%7C%20Docker-orange?style=for-the-badge&logo=docker&logoColor=white" alt="Platform">
</p>

<p align="center">
  <strong>Give your AI agent eyes, hands, and a full Linux desktop.</strong><br>
  An MCP server that lets LLM agents see the screen, move the mouse, type on the keyboard, launch apps, and run shell commands — all inside a sandboxed virtual desktop.
</p>

<p align="center">
  <em>If a human can do it on a desktop, your agent can too.</em>
</p>

<p align="center">
  <img src="demos/ghostdesk-amazon-sheets-automation.gif" alt="GhostDesk demo — Automated Amazon scraping to Google Sheets with real-time visualization" width="960">
</p>

---

## Why GhostDesk?

Most AI agents are trapped in text. They can call APIs and generate code, but they can't **use software**. GhostDesk changes that.

Connect any MCP-compatible LLM (Claude, GPT, Gemini...) and it gets a full Linux desktop with 11 tools to interact with **any application** — browsers, IDEs, office suites, terminals, legacy software, internal tools. No API needed. No integration required. If it has a UI, your agent can use it.

## What can your agent do with a full desktop?

Your agent gets its own Linux desktop. Here's what that unlocks:

### Agentic workflows — chain anything

```
"Go to the CRM, export last month's leads as CSV,
 open LibreOffice Calc, build a pivot table,
 take a screenshot of the chart, and email it to the team."
```

Your agent opens the browser, logs in, downloads the file, switches to another app, processes the data, captures the result, and sends it — autonomously, across multiple applications, in one conversation.

### Browse the web like a human

```
"Search for competitors on Google, open the first 5 results,
 extract pricing from each page, and summarize in a spreadsheet."
```

No Selenium. No CSS selectors. No Puppeteer scripts that break every week. The agent looks at the screen, clicks what it sees, fills forms naturally — with human-like mouse movement that bypasses bot detection.

### Operate any software — no API required

```
"Open the legacy inventory app, search for product #4521,
 update the stock count to 150, and confirm the change."
```

That old Java app with no API? That internal admin panel from 2010? A Windows app running in Wine? If it renders pixels on screen, your agent can operate it.

### Data extraction at scale

```
"Open the analytics dashboard, read the KPI table,
 scroll down to the revenue chart, take a screenshot,
 then export the raw data."
```

The agent takes screenshots, reads the screen visually, and extracts what it needs — works on any application, any UI framework, any language.

### QA & UI testing with evidence

```
"Navigate the signup flow, try invalid emails, empty fields,
 and SQL injection in every input. Screenshot each error state."
```

Your agent becomes a QA engineer — it clicks every button, fills every form, tests every edge case, and brings back screenshots as proof.

### Unattended automation — runs 24/7

```
"Every morning: log into the supplier portal, download
 the latest price list, compare with yesterday's, and
 flag any changes above 5%."
```

Runs headless in Docker. No physical screen. No human babysitting. Schedule your agent to handle repetitive desktop tasks while you sleep.

### Multi-app orchestration

```
"Open VS Code, create a new Python file, write a script
 that calls our API, run it in the terminal, debug if it fails,
 then commit and push to GitHub."
```

Your agent isn't limited to one app. It can switch between browser, terminal, IDE, file manager, email client — just like a human switching windows on their desktop.

## Key features

| | Feature | Why it matters |
|---|---|---|
| **📸** | **Screenshots** | Full or regional captures with cursor overlay — the agent sees exactly what a human would see |
| **🖱️** | **Human-like input** | Bézier mouse curves, variable typing speed, micro-jitter — bypasses bot detection |
| **📋** | **Clipboard** | Read & write the clipboard — paste long text instantly |
| **⌨️** | **Keyboard control** | Type text, press hotkeys, keyboard shortcuts — full keyboard access |
| **🖥️** | **Shell access** | Run any command, launch any app, capture stdout/stderr |
| **🐳** | **Sandboxed** | Runs in Docker — isolated, reproducible, safe |
| **👀** | **Live view** | Watch your agent work in real-time via VNC or browser (noVNC) |

## 10 tools at your agent's fingertips

### See the screen
| Tool | Description |
|------|-------------|
| `screenshot` | Capture the screen (full or region) with cursor position |

### Mouse & keyboard
| Tool | Description |
|------|-------------|
| `mouse_click` | Click at coordinates |
| `mouse_double_click` | Double-click at coordinates |
| `mouse_drag` | Drag from one position to another |
| `mouse_scroll` | Scroll in any direction (up/down/left/right) |
| `type_text` | Type with realistic per-character delays |
| `press_key` | Press keys or combos (`ctrl+c`, `alt+F4`, `Return`...) |

### System
| Tool | Description |
|------|-------------|
| `launch` | Start GUI applications |
| `get_clipboard` | Read clipboard contents |
| `set_clipboard` | Write to clipboard |

## Quick start

### 1. Run the container

```bash
docker run -d --name ghostdesk \
  -p 3000:3000 \
  -p 5900:5900 \
  -p 6080:6080 \
  ghcr.io/yv17labs/ghostdesk:latest
```

That's it. The virtual desktop, MCP server, and VNC are all running inside an isolated container. Your agent gets a full Linux desktop — your host machine stays untouched.

### 2. Connect your AI

GhostDesk works with any MCP-compatible client. Add it to your config:

**Claude Desktop / Claude Code** (Streamable HTTP)
```json
{
  "mcpServers": {
    "ghostdesk": {
      "type": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}
```

**Clients that only support stdio** (Cursor, Windsurf, etc.)
```json
{
  "mcpServers": {
    "ghostdesk": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:3000/mcp"]
    }
  }
}
```

Or run the bridge manually:
```bash
npx -y mcp-remote http://localhost:3000/mcp
```

**ChatGPT, Gemini, or any LLM with MCP support** — same config, just point to `http://localhost:3000/mcp`.

**Local models (Ollama, LM Studio, etc.)** — any MCP client library can connect to the same endpoint.

### 3. Watch your agent work

Open `http://localhost:6080/vnc.html` in your browser to see the virtual desktop in real time.

| Service | URL |
|---------|-----|
| MCP server | `http://localhost:3000/mcp` |
| noVNC (browser) | `http://localhost:6080/vnc.html` |
| VNC | `vnc://localhost:5900` (password: `changeme`) |

## Demos

See GhostDesk in action:

| Demo | Description |
|------|-------------|
| [Google Sheets Automation](demos/ghostdesk-sheets-automation.gif) | AI agent autonomously populates a spreadsheet with AI startup funding data, formats headers, and creates a 3D bar chart |
| [Amazon Scraper to Google Sheets](demos/amazon-scraper-to-sheets.gif) | AI agent scrapes Amazon laptops, extracts product data, populates Google Sheets, and visualizes with charts |
| [Flight Search & Comparison](demos/ghostdesk-flight-search.gif) | AI agent searches Google Flights for Paris CDG → New York JFK, compares prices, and builds a chart in LibreOffice Calc |
| [Wikipedia Research](demos/ghostdesk-wikipedia.gif) | AI agent browsing Wikipedia, reading articles, and extracting information |

## How it works

GhostDesk runs a virtual Linux desktop inside Docker and exposes it as an

[truncated…]

PUBLIC HISTORY

First discoveredMar 26, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub
first seenMar 24, 2026
last updatedMar 25, 2026
last crawled14 days ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:YV17labs/GhostDesk)