githubinferredactive
GhostDesk
provenance:github:YV17labs/GhostDesk
MCP server that gives AI agents full control of a virtual Linux desktop — mouse, keyboard, screenshots, UI reading, clipboard, and shell. Human-like input simulation to bypass bot detection. Runs in Docker.
README
<p align="center">
<img src="assets/logo.png" alt="GhostDesk" width="200">
</p>
<p align="center">
<img src="https://img.shields.io/badge/MCP-compatible-blueviolet?style=for-the-badge" alt="MCP Compatible">
<img src="https://img.shields.io/badge/python-3.12+-blue?style=for-the-badge&logo=python&logoColor=white" alt="Python 3.12+">
<img src="https://img.shields.io/badge/license-AGPL--3.0-green?style=for-the-badge" alt="AGPL-3.0 License">
<img src="https://img.shields.io/badge/platform-Linux%20%7C%20Docker-orange?style=for-the-badge&logo=docker&logoColor=white" alt="Platform">
</p>
<p align="center">
<strong>Give your AI agent eyes, hands, and a full Linux desktop.</strong><br>
An MCP server that lets LLM agents see the screen, move the mouse, type on the keyboard, launch apps, and run shell commands — all inside a sandboxed virtual desktop.
</p>
<p align="center">
<em>If a human can do it on a desktop, your agent can too.</em>
</p>
<p align="center">
<img src="demos/ghostdesk-amazon-sheets-automation.gif" alt="GhostDesk demo — Automated Amazon scraping to Google Sheets with real-time visualization" width="960">
</p>
---
## Why GhostDesk?
Most AI agents are trapped in text. They can call APIs and generate code, but they can't **use software**. GhostDesk changes that.
Connect any MCP-compatible LLM (Claude, GPT, Gemini...) and it gets a full Linux desktop with 11 tools to interact with **any application** — browsers, IDEs, office suites, terminals, legacy software, internal tools. No API needed. No integration required. If it has a UI, your agent can use it.
## What can your agent do with a full desktop?
Your agent gets its own Linux desktop. Here's what that unlocks:
### Agentic workflows — chain anything
```
"Go to the CRM, export last month's leads as CSV,
open LibreOffice Calc, build a pivot table,
take a screenshot of the chart, and email it to the team."
```
Your agent opens the browser, logs in, downloads the file, switches to another app, processes the data, captures the result, and sends it — autonomously, across multiple applications, in one conversation.
### Browse the web like a human
```
"Search for competitors on Google, open the first 5 results,
extract pricing from each page, and summarize in a spreadsheet."
```
No Selenium. No CSS selectors. No Puppeteer scripts that break every week. The agent looks at the screen, clicks what it sees, fills forms naturally — with human-like mouse movement that bypasses bot detection.
### Operate any software — no API required
```
"Open the legacy inventory app, search for product #4521,
update the stock count to 150, and confirm the change."
```
That old Java app with no API? That internal admin panel from 2010? A Windows app running in Wine? If it renders pixels on screen, your agent can operate it.
### Data extraction at scale
```
"Open the analytics dashboard, read the KPI table,
scroll down to the revenue chart, take a screenshot,
then export the raw data."
```
The agent takes screenshots, reads the screen visually, and extracts what it needs — works on any application, any UI framework, any language.
### QA & UI testing with evidence
```
"Navigate the signup flow, try invalid emails, empty fields,
and SQL injection in every input. Screenshot each error state."
```
Your agent becomes a QA engineer — it clicks every button, fills every form, tests every edge case, and brings back screenshots as proof.
### Unattended automation — runs 24/7
```
"Every morning: log into the supplier portal, download
the latest price list, compare with yesterday's, and
flag any changes above 5%."
```
Runs headless in Docker. No physical screen. No human babysitting. Schedule your agent to handle repetitive desktop tasks while you sleep.
### Multi-app orchestration
```
"Open VS Code, create a new Python file, write a script
that calls our API, run it in the terminal, debug if it fails,
then commit and push to GitHub."
```
Your agent isn't limited to one app. It can switch between browser, terminal, IDE, file manager, email client — just like a human switching windows on their desktop.
## Key features
| | Feature | Why it matters |
|---|---|---|
| **📸** | **Screenshots** | Full or regional captures with cursor overlay — the agent sees exactly what a human would see |
| **🖱️** | **Human-like input** | Bézier mouse curves, variable typing speed, micro-jitter — bypasses bot detection |
| **📋** | **Clipboard** | Read & write the clipboard — paste long text instantly |
| **⌨️** | **Keyboard control** | Type text, press hotkeys, keyboard shortcuts — full keyboard access |
| **🖥️** | **Shell access** | Run any command, launch any app, capture stdout/stderr |
| **🐳** | **Sandboxed** | Runs in Docker — isolated, reproducible, safe |
| **👀** | **Live view** | Watch your agent work in real-time via VNC or browser (noVNC) |
## 10 tools at your agent's fingertips
### See the screen
| Tool | Description |
|------|-------------|
| `screenshot` | Capture the screen (full or region) with cursor position |
### Mouse & keyboard
| Tool | Description |
|------|-------------|
| `mouse_click` | Click at coordinates |
| `mouse_double_click` | Double-click at coordinates |
| `mouse_drag` | Drag from one position to another |
| `mouse_scroll` | Scroll in any direction (up/down/left/right) |
| `type_text` | Type with realistic per-character delays |
| `press_key` | Press keys or combos (`ctrl+c`, `alt+F4`, `Return`...) |
### System
| Tool | Description |
|------|-------------|
| `launch` | Start GUI applications |
| `get_clipboard` | Read clipboard contents |
| `set_clipboard` | Write to clipboard |
## Quick start
### 1. Run the container
```bash
docker run -d --name ghostdesk \
-p 3000:3000 \
-p 5900:5900 \
-p 6080:6080 \
ghcr.io/yv17labs/ghostdesk:latest
```
That's it. The virtual desktop, MCP server, and VNC are all running inside an isolated container. Your agent gets a full Linux desktop — your host machine stays untouched.
### 2. Connect your AI
GhostDesk works with any MCP-compatible client. Add it to your config:
**Claude Desktop / Claude Code** (Streamable HTTP)
```json
{
"mcpServers": {
"ghostdesk": {
"type": "http",
"url": "http://localhost:3000/mcp"
}
}
}
```
**Clients that only support stdio** (Cursor, Windsurf, etc.)
```json
{
"mcpServers": {
"ghostdesk": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://localhost:3000/mcp"]
}
}
}
```
Or run the bridge manually:
```bash
npx -y mcp-remote http://localhost:3000/mcp
```
**ChatGPT, Gemini, or any LLM with MCP support** — same config, just point to `http://localhost:3000/mcp`.
**Local models (Ollama, LM Studio, etc.)** — any MCP client library can connect to the same endpoint.
### 3. Watch your agent work
Open `http://localhost:6080/vnc.html` in your browser to see the virtual desktop in real time.
| Service | URL |
|---------|-----|
| MCP server | `http://localhost:3000/mcp` |
| noVNC (browser) | `http://localhost:6080/vnc.html` |
| VNC | `vnc://localhost:5900` (password: `changeme`) |
## Demos
See GhostDesk in action:
| Demo | Description |
|------|-------------|
| [Google Sheets Automation](demos/ghostdesk-sheets-automation.gif) | AI agent autonomously populates a spreadsheet with AI startup funding data, formats headers, and creates a 3D bar chart |
| [Amazon Scraper to Google Sheets](demos/amazon-scraper-to-sheets.gif) | AI agent scrapes Amazon laptops, extracts product data, populates Google Sheets, and visualizes with charts |
| [Flight Search & Comparison](demos/ghostdesk-flight-search.gif) | AI agent searches Google Flights for Paris CDG → New York JFK, compares prices, and builds a chart in LibreOffice Calc |
| [Wikipedia Research](demos/ghostdesk-wikipedia.gif) | AI agent browsing Wikipedia, reading articles, and extracting information |
## How it works
GhostDesk runs a virtual Linux desktop inside Docker and exposes it as an
[truncated…]PUBLIC HISTORY
First discoveredMar 26, 2026
IDENTITY
inferred
Identity inferred from code signals. No PROVENANCE.yml found.
Is this yours? Claim it →METADATA
platformgithub
first seenMar 24, 2026
last updatedMar 25, 2026
last crawledtoday
version—
README BADGE
Add to your README:
