<p align="center">
<img src="website/public/mirroir-wordmark.svg" alt="mirroir-mcp" width="128" />
</p>
# mirroir-mcp
[npm](https://www.npmjs.com/package/mirroir-mcp) · [build](https://github.com/jfarcand/mirroir-mcp/actions/workflows/build.yml) · [install](https://github.com/jfarcand/mirroir-mcp/actions/workflows/install.yml) · [installers](https://github.com/jfarcand/mirroir-mcp/actions/workflows/installers.yml) · [mcp-compliance](https://github.com/jfarcand/mirroir-mcp/actions/workflows/mcp-compliance.yml) · [license](LICENSE) · [iPhone Mirroring](https://support.apple.com/en-us/105071) · [Discord](https://discord.gg/jVDBbMjPMf)
Give your AI eyes, hands, and a real iPhone. An MCP server that lets any AI agent see the screen, tap what it needs, and figure the rest out — through macOS iPhone Mirroring. Experimental support for macOS windows. [32 tools](docs/tools.md), any MCP client.
## Requirements
- macOS 15+
- iPhone connected via [iPhone Mirroring](https://support.apple.com/en-us/105071)
## Install
```bash
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"
```
or via [npx](https://www.npmjs.com/package/mirroir-mcp):
```bash
npx -y mirroir-mcp install
```
or via [Homebrew](https://tap.mirroir.dev):
```bash
brew tap jfarcand/tap && brew install mirroir-mcp
```
The first time you take a screenshot, macOS will prompt for **Screen Recording** and **Accessibility** permissions. Grant both.
<details>
<summary>Per-client setup</summary>
#### Claude Code
```bash
claude mcp add --transport stdio mirroir -- npx -y mirroir-mcp
```
#### GitHub Copilot (VS Code)
Install from the MCP server gallery: search `@mcp mirroir` in the Extensions view, or add to `.vscode/mcp.json`:
```json
{
"servers": {
"mirroir": {
"type": "stdio",
"command": "npx",
"args": ["-y", "mirroir-mcp"]
}
}
}
```
#### Cursor
Add to `.cursor/mcp.json` in your project root:
```json
{
"mcpServers": {
"mirroir": {
"command": "npx",
"args": ["-y", "mirroir-mcp"]
}
}
}
```
#### OpenAI Codex
```bash
codex mcp add mirroir -- npx -y mirroir-mcp
```
Or add to `~/.codex/config.toml`:
```toml
[mcp_servers.mirroir]
command = "npx"
args = ["-y", "mirroir-mcp"]
```
</details>
<details>
<summary>Install from source</summary>
```bash
git clone https://github.com/jfarcand/mirroir-mcp.git
cd mirroir-mcp
./mirroir.sh
```
Use the full path to the binary in your `.mcp.json`: `<repo>/.build/release/mirroir-mcp`.
</details>
## How it works
Every interaction follows the same loop: **observe, reason, act**. `describe_screen` gives the AI every text element with tap coordinates (eyes). The LLM decides what to do next (brain). `tap`, `type_text`, `swipe` execute the action (hands) — then it loops back to observe. No scripts, no coordinates, just intent.
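The loop can be sketched in a few lines (Python here for illustration; `describe_screen`, `tap`, and the hardcoded elements are stand-ins for the real MCP tool calls and for the LLM's decision step, not this project's API):

```python
# Hypothetical sketch of the observe-reason-act loop.
# describe_screen / tap stand in for MCP tool calls.

def describe_screen():
    # Observe: every text element on screen, with tap coordinates.
    return [{"text": "Send", "x": 320, "y": 780},
            {"text": "Message", "x": 160, "y": 120}]

def tap(x, y):
    # Act: in reality this drives the iPhone Mirroring window.
    return f"tapped ({x}, {y})"

def run_step(goal):
    """One observe-reason-act iteration: pick the element matching
    the goal (the LLM's job in reality), then tap its coordinates."""
    elements = describe_screen()                             # observe
    target = next(e for e in elements if e["text"] == goal)  # reason
    return tap(target["x"], target["y"])                     # act

print(run_step("Send"))  # -> tapped (320, 780)
```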
## Examples
Paste any of these into Claude Code, Claude Desktop, ChatGPT, Cursor, or any MCP client:
```
Open Messages, find my conversation with Alice, and send "running 10 min late".
```
```
Open Calendar, create a new event called "Dentist" next Tuesday at 2pm.
```
```
Open my Expo Go app, tap "LoginDemo", test the login screen with
test@example.com / password123. Screenshot after each step.
```
```
Start recording, open Settings, scroll to General > About, stop recording.
```
## Screen Intelligence
`describe_screen` is the AI's eyes. Three backends work together to give the agent a complete picture of what's on screen — text, icons, and semantic UI structure.
### Apple Vision OCR (default)
The default backend uses Apple's Vision framework to detect every text element on screen and return exact tap coordinates. This is fast, local, and requires no API keys or external services.
### Icon Detection (YOLO CoreML)
Text-only OCR misses non-text UI elements — buttons, toggles, tab bar icons, activity rings. Drop a YOLO CoreML model (`.mlmodelc`) in `~/.mirroir-mcp/models/` and the server auto-detects it at startup, merging icon detection results with OCR text. The AI gets tap targets for elements that text-only OCR cannot see.
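One way such a merge can work (a sketch under assumptions, not mirroir-mcp's actual algorithm): drop a YOLO detection when it overlaps an existing OCR box above some IoU threshold, otherwise keep both, so icons without text still become tap targets.

```python
# Hypothetical OCR + YOLO merge by IoU overlap. Boxes are (x, y, w, h).
# This illustrates the idea only; the server's merge logic may differ.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge(ocr_boxes, yolo_boxes, threshold=0.5):
    """Keep every OCR box; add YOLO boxes that don't overlap one."""
    merged = list(ocr_boxes)
    for yb in yolo_boxes:
        if all(iou(yb, ob) < threshold for ob in ocr_boxes):
            merged.append(yb)
    return merged

ocr = [(10, 10, 100, 20)]       # a text label OCR already found
yolo = [(10, 10, 100, 22),      # near-duplicate of that label: dropped
        (300, 10, 40, 40)]      # a text-free icon: kept
print(len(merge(ocr, yolo)))    # -> 2
```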
| Mode | `ocrBackend` setting | Behavior |
|------|---------------------|----------|
| Auto-detect (default) | `"auto"` | Uses Vision + YOLO if a model is installed, Vision only otherwise |
| Vision only | `"vision"` | Apple Vision OCR text only |
| YOLO only | `"yolo"` | CoreML element detection only |
| Both | `"both"` | Always merge both backends (falls back to Vision if no model) |
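For example, to always merge both backends, set `ocrBackend` in the same `.mirroir-mcp/settings.json` file used for the other settings in this README:

```json
// .mirroir-mcp/settings.json
{
  "ocrBackend": "both"
}
```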
### AI Vision Mode (embacle)
Instead of local OCR, `describe_screen` can send the screenshot to an AI vision model that identifies UI elements semantically — cards, tabs, buttons, icons, navigation structure — not just raw text. This produces richer context for the agent, especially on screens with complex layouts.
The [embacle](https://github.com/dravr-ai/dravr-embacle) runtime is embedded directly into the mirroir-mcp binary via Rust FFI. `describe_screen` calls the embedded runtime in-process — no separate server, no network round-trip, no additional setup. The FFI layer (`EmbacleFFI.swift` → `libembacle.a`) handles initialization, chat completion requests, and memory management across the Swift/Rust boundary.
embacle routes vision requests through already-authenticated CLI tools (GitHub Copilot, Claude Code) so there is no separate API key to manage. If you have a Copilot or Claude Code subscription, you already have access.
#### Install
```bash
brew tap dravr-ai/tap
brew install embacle # CLI tools (embacle-server, embacle-mcp)
brew install embacle-ffi # Rust FFI static library (libembacle.a)
```
Then rebuild mirroir-mcp from source (or reinstall via Homebrew) so the binary links against `libembacle.a`:
```bash
# From source
swift build -c release
# Or via Homebrew (rebuilds automatically)
brew reinstall mirroir-mcp
```
#### Zero-config activation
When the embacle FFI is linked into the binary, `screenDescriberMode` defaults to `"auto"`, which resolves to vision mode automatically. No settings change required: install embacle-ffi, rebuild, and `describe_screen` starts using AI vision.
To force local OCR even when embacle is available, explicitly set `"ocr"`:
```json
// .mirroir-mcp/settings.json
{
"screenDescriberMode": "ocr"
}
```
See [Configuration](#configuration) for all available settings.
## Skills
When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like `Tap "Email"` use OCR — no hardcoded coordinates.
Place files in `~/.mirroir-mcp/skills/` (global) or `<cwd>/.mirroir-mcp/skills/` (project-local).
```markdown
---
version: 1
name: Commute ETA Notification
app: Waze, Messages
tags: ["workflow", "cross-app"]
---
## Steps
1. Launch **Waze**
2. Wait for "Où va-t-on ?" to appear
3. Tap "Où va-t-on ?"
4. Wait for "${DESTINATION:-Travail}" to appear
5. Tap "${DESTINATION:-Travail}"
6. Wait for "Y aller" to appear
7. Tap "Y aller"
8. Wait for "min" to appear
9. Remember: Read the commute time and ETA.
10. Press Home
11. Launch **Messages**
12. Tap "New Message"
13. Type "${RECIPIENT}" and select the contact
14. Type "On my way! ETA {eta}"
15. Press **Return**
16. Screenshot: "message_sent"
```
`${VAR}` placeholders resolve from environment variables. `${VAR:-default}` for fallbacks.
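The placeholder rule can be sketched like this (an illustration of the shell-style `${VAR}` / `${VAR:-default}` syntax, not the server's implementation):

```python
import os
import re

# Resolve ${VAR} and ${VAR:-default} placeholders against an
# environment mapping, mirroring the syntax skills use.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve(text, env=None):
    env = os.environ if env is None else env
    def sub(m):
        name, default = m.group(1), m.group(2)
        return env.get(name, default if default is not None else "")
    return PLACEHOLDER.sub(sub, text)

print(resolve('Tap "${DESTINATION:-Travail}"', env={}))
# -> Tap "Travail"
print(resolve('Type "${RECIPIENT}"', env={"RECIPIENT": "Alice"}))
# -> Type "Alice"
```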
### Skill Marketplace
[truncated…]