jarvis
provenance:github:dev-core-busy/jarvis
Autonomous AI Desktop Agent for Linux – Multi-LLM, Desktop Control via VNC, WhatsApp Integration, RAG Knowledge Base, OpenClaw Skill Ecosystem
<div align="center">
# 🤖 Jarvis AI Desktop Agent
**An autonomous AI agent with web frontend, desktop control, and multi-LLM support**
*Control your Linux desktop with natural language. Receive tasks via WhatsApp. Search your knowledge base. Automate everything.*
[**Live Demo**](https://jarvis-ai.info) · [**Report Bug**](https://github.com/dev-core-busy/jarvis/issues) · [**Request Feature**](https://github.com/dev-core-busy/jarvis/issues) · [**Contribute**](#contributing)

</div>
---
## 📋 Table of Contents
- [Overview](#overview)
- [Key Features](#key-features)
- [Architecture](#architecture)
- [Tech Stack](#tech-stack)
- [Screenshots](#screenshots)
- [Installation](#installation)
- [Configuration](#configuration)
- [Skill System](#skill-system)
- [WhatsApp Integration](#whatsapp-integration)
- [Knowledge Base](#knowledge-base)
- [API Reference](#api-reference)
- [Contributing](#contributing)
- [Third-Party Licenses](#third-party-licenses)
- [License](#license)
---
## Overview
Jarvis is a **self-hosted, autonomous AI desktop agent** that runs on a Linux server. It combines a polished web frontend with real desktop control — you can watch and direct the agent as it works, right in your browser.
The core idea: give Jarvis a task (via chat, WhatsApp, or the web UI), and it figures out how to complete it — browsing the web, reading files, writing code, sending emails, managing your calendar — all while you observe through a live VNC split-screen view.
```
"Find all emails from last week about Project Alpha, summarize them,
and create a calendar event for the follow-up meeting."
```
Jarvis handles it. You watch it happen.
---
## Key Features
### 🖥️ VNC Split View
The web interface shows your LLM chat **and a live desktop feed side by side**. The agent can see exactly what it's doing — screenshots feed back into the LLM context automatically. No more blind automation.
### 🧩 Modular Skill System
Skills are self-contained Python packages that extend Jarvis with new capabilities. Install, enable, disable, and configure them through the UI without touching config files. Compatible with [OpenClaw](https://github.com/steipete/gogcli) skills.
### 🔀 Multi-LLM Support
Switch between AI providers without restarting anything:
- **Google Gemini** (gemini-2.0-flash, gemini-1.5-pro, ...)
- **Anthropic Claude** (claude-opus-4, claude-sonnet-4, ...)
- **OpenRouter** (hundreds of models)
- **Local Ollama** (llama3, mistral, qwen2.5, ... — fully offline)
- Any **OpenAI-compatible** endpoint
Both native tool/function calling **and** prompt-based tool calling are supported — so even models without native tool support can use all of Jarvis's capabilities.
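For models without native function calling, prompt-based tool calling typically means instructing the model to emit a structured block that the agent then parses. The following is a minimal sketch under that assumption. The `<tool_call>` tag convention, the `parse_tool_call` helper, and the `shell` arguments are illustrative and are not taken from the Jarvis codebase.

```python
import json
import re

# Hypothetical convention: the model wraps a JSON object in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(reply: str):
    """Return (tool_name, arguments) from a model reply, or None if absent or invalid."""
    match = TOOL_CALL_RE.search(reply)
    if match is None:
        return None  # plain-text answer, no tool requested
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # malformed JSON: treat as "no tool call"
    if not isinstance(payload, dict) or "name" not in payload:
        return None
    return payload["name"], payload.get("arguments", {})


reply = (
    "Checking disk usage. "
    '<tool_call>{"name": "shell", "arguments": {"command": "df -h"}}</tool_call>'
)
print(parse_tool_call(reply))
```

Returning `None` for malformed output lets the caller fall back to treating the reply as plain text, which matters for smaller local models that do not always follow the format.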
### 📱 WhatsApp Agent
Send Jarvis a voice note or text message on WhatsApp and get a response back. Voice messages are transcribed locally with faster-whisper, with no cloud service involved. Perfect for mobile task delegation.
### 📚 Knowledge Base
Drop PDFs, DOCX files, or plain text into watched folders. Jarvis indexes them with TF-IDF and can search them during tasks. Multiple folders are supported, and files are re-indexed automatically when they change.
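To illustrate what TF-IDF indexing and search involve, here is a small standard-library sketch. It is not the Jarvis implementation: the function names, the tokenizer, the smoothed IDF formula, and the cosine-similarity ranking are all assumptions chosen for brevity, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]):
    """Compute one TF-IDF vector per document."""
    tokens = {name: tokenize(body) for name, body in docs.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(docs)
    # Smoothed IDF so terms present in every document keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {t: (c / len(toks)) * idf[t] for t, c in counts.items()}
    return idf, vectors


def search(query: str, idf, vectors):
    """Rank documents by cosine similarity to the query; non-matching documents are omitted."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    for name, vec in vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        if dot > 0 and q_norm > 0 and d_norm > 0:
            scores.append((name, dot / (q_norm * d_norm)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = {
    "alpha.txt": "Project Alpha kickoff meeting notes and budget",
    "beta.txt": "Vacation policy and travel reimbursement",
}
idf, vectors = build_index(docs)
results = search("alpha budget", idf, vectors)
print(results[0][0])
```

Re-indexing on file changes would amount to calling `build_index` again (or updating the affected vectors) whenever a watched folder reports a modification.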
### 🌐 Google Workspace Integration
Manage Gmail, Google Calendar, and Google Drive through natural language commands — powered by the openclaw/gog CLI.
### 🤖 Browser Automation
Full browser control via CDP (Chrome DevTools Protocol) and xdotool. The agent can navigate websites, fill forms, click elements, and extract information.
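CDP commands are JSON messages sent over a WebSocket, each carrying an `id`, a `method`, and `params`. The sketch below only builds such messages and leaves the transport out. `CdpSession` is a hypothetical helper, not part of Jarvis. `Page.navigate` and `Runtime.evaluate` are standard CDP methods.

```python
import itertools
import json


class CdpSession:
    """Builds CDP request frames; sending them over a WebSocket is left out."""

    def __init__(self):
        # CDP matches each response to its request by this integer id.
        self._ids = itertools.count(1)

    def command(self, method: str, **params) -> str:
        return json.dumps({"id": next(self._ids), "method": method, "params": params})


session = CdpSession()
navigate = session.command("Page.navigate", url="https://example.com")
read_title = session.command(
    "Runtime.evaluate", expression="document.title", returnByValue=True
)
print(navigate)
print(read_title)
```

In a real session these strings would be written to the debugging WebSocket that Chrome exposes when started with `--remote-debugging-port`, and responses would be matched back by `id`.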
### 🔐 Secure by Default
- HTTPS with self-signed certificates (auto-generated)
- Session-based authentication
- All external services proxied through the FastAPI backend
---
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                       Browser Client                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ LLM Chat UI  │  │ noVNC :6080  │  │ Settings / Skills│   │
│  │ (WebSocket)  │  │ (Live VNC)   │  │ WhatsApp Logs    │   │
│  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘   │
└─────────┼─────────────────┼───────────────────┼─────────────┘
          │ WSS/HTTPS       │ WSS               │ HTTPS
          ▼                 ▼                   ▼
┌─────────────────────────────────────────────────────────────┐
│                    FastAPI Backend :443                     │
│  ┌─────────────┐  ┌────────────┐  ┌──────────────────────┐  │
│  │ JarvisAgent │  │ Skills API │  │ WhatsApp Proxy       │  │
│  │ (agent.py)  │  │ /api/skills│  │ _wa_bridge_async()   │  │
│  └──────┬──────┘  └─────┬──────┘  └──────────┬───────────┘  │
│         │               │                    │              │
│  ┌──────▼──────────┐    │             ┌──────▼───────────┐  │
│  │ SkillManager    │◄───┘             │ Baileys Bridge   │  │
│  │ (skills/*.py)   │                  │ Node.js :3001    │  │
│  └──────┬──────────┘                  │ (localhost only) │  │
│         │                             └──────────────────┘  │
│  ┌──────▼────────────────────────────────────────────────┐  │
│  │                      Tool Layer                       │  │
│  │  shell · desktop · filesystem · screenshot · memory   │  │
│  │  knowledge · browser_control · whatsapp · google_apps │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌────────────────┐  ┌──────────────┐  ┌───────────────┐    │
│  │ LLM Client     │  │ x11vnc :5900 │  │ Xvfb/X11 :1   │    │
│  │ (llm.py)       │  │ (→ noVNC)    │  │ Openbox WM    │    │
│  │ Multi-Provider │  └──────────────┘  └───────────────┘    │
│  └────────────────┘                                         │
└─────────────────────────────────────────────────────────────┘
```
### Component Overview
| Component | File | Description |
|-----------|------|-------------|
| FastAPI Server | `backend/main.py` | HTTP/WebSocket endpoints, auth, WhatsApp proxy |
| Agent Loop | `backend/agent.py` | Task execution, tool calling, LLM orchestration |
| LLM Client | `backend/llm.py` | Multi-provider abstraction (Gemini, Claude, OpenRouter, Ollama) |
| Config | `backend/config.py` | Environment + settings.json management |
| Skill Manager | `backend/skills/manager.py` | Load, enable, disable, configure skills |
| Tool Base | `backend/tools/base.py` | `BaseTool` class all tools inherit from |
| WhatsApp Bridge | `services/whatsapp-bridge/index.js` | Baileys v7 + Express API |
| Frontend | `frontend/index.html` + `js/` | Single-page app, no build system required |
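
The table says every tool inherits from `BaseTool` and that the agent loop handles tool calling. A rough sketch of how such a base class and a name-based dispatch step could fit together follows. Only the class name `BaseTool` comes from this README. The attributes, the `run` signature, `EchoTool`, and `ToolRegistry` are hypothetical, and the real code in `backend/tools/base.py` may look different.

```python
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Hypothetical shape of a tool; the real class in backend/tools/base.py may differ."""

    name: str = ""
    description: str = ""

    @abstractmethod
    def run(self, **kwargs) -> str:
        """Execute the tool and return a text result for the LLM."""


class EchoTool(BaseTool):
    # Illustrative tool, not one of the tools shipped with Jarvis.
    name = "echo"
    description = "Return the given text unchanged."

    def run(self, **kwargs) -> str:
        return str(kwargs.get("text", ""))


class ToolRegistry:
    """Maps tool names to instances so an agent loop can dispatch LLM tool calls."""

    def __init__(self):
        self._tools: dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, name: str, arguments: dict) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # Report the problem as text so the LLM can recover on the next turn.
            return f"error: unknown tool '{name}'"
        return tool.run(**arguments)


registry = ToolRegistry()
registry.register(EchoTool())
print(registry.dispatch("echo", {"text": "hello"}))
```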
---
## Tech Stack
### Backend
| Technology | Version | Purpose |
|-----------|---------|---------|
| Python | 3.13 | Core runtime |
| FastAPI | latest | REST API + WebSocket server |
| uvicorn | latest | ASGI server |
| faster-whisper | latest | Voice transcription (CPU, int8) |
### Frontend
| Technology | Purpose |
|-----------|---------|
| Vanilla JS | Zero-dependency UI |
| CSS Custom Properties | Dark Glassmorphism theme |
| WebSocket API | Real-time agent communication |
| noVNC | In-browser VNC client |
### Desktop / System
| Technology | Purpose |
|-----------|---------|
| Xvfb | Virtual framebuffer (headless X11) |
| Openbox | Lightweight window ma
[truncated…]

PUBLIC HISTORY
First discovered: Mar 21, 2026

IDENTITY
Identity inferred from code signals. No PROVENANCE.yml found.

METADATA
Platform: github
First seen: Mar 3, 2026
Last updated: Mar 20, 2026