AGENTS / GITHUB / jarvis
githubinferredactive

jarvis

provenance:github:dev-core-busy/jarvis
WHAT THIS AGENT DOES

Jarvis is an autonomous AI desktop agent designed for Linux environments. It combines multiple large language models to provide desktop control via VNC, and integrates with WhatsApp for communication. The agent utilizes a retrieval-augmented generation (RAG) knowledge base and leverages the OpenClaw skill ecosystem. Developers and users seeking automated desktop tasks and integrated communication capabilities would find Jarvis useful.

PROBLEM IT SOLVES

Jarvis automates desktop tasks on Linux, reducing the need for manual intervention. Users can benefit from its ability to control their desktop and communicate via WhatsApp, streamlining workflows and increasing efficiency.

View Source ↗First seen 3mo agoNot yet hireable

CAPABILITIES & CONSTRAINTS

TECH & STACK
pythonai-agentlinuxllmrag
README
<div align="center">

# 🤖 Jarvis AI Desktop Agent

**An autonomous AI agent with web frontend, desktop control, and multi-LLM support**

[![Python](https://img.shields.io/badge/Python-3.13-blue?logo=python&logoColor=white)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-AGPL--3.0-green?logo=gnu)](LICENSE)
[![Version](https://img.shields.io/badge/Version-0.8-orange)](https://github.com/dev-core-busy/jarvis/releases)
[![Platform](https://img.shields.io/badge/Platform-Linux-lightgrey?logo=linux)](https://www.linux.org/)
[![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen)](https://github.com/dev-core-busy/jarvis/pulls)
[![OpenClaw Compatible](https://img.shields.io/badge/OpenClaw-Compatible-6366f1)](https://github.com/dev-core-busy/jarvis#openclaw-skill-ecosystem)

*Control your Linux desktop with natural language. Receive tasks via WhatsApp. Search your knowledge base. Automate everything.*

[**Live Demo**](https://jarvis-ai.info) · [**Report Bug**](https://github.com/dev-core-busy/jarvis/issues) · [**Request Feature**](https://github.com/dev-core-busy/jarvis/issues) · [**Contribute**](#contributing)

---

![Jarvis Split View](https://jarvis-ai.info/img/split_view.png)

</div>

---

## 📋 Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Architecture](#architecture)
- [Tech Stack](#tech-stack)
- [Screenshots](#screenshots)
- [Installation](#installation)
- [Configuration](#configuration)
- [Skill System](#skill-system)
- [WhatsApp Integration](#whatsapp-integration)
- [Knowledge Base](#knowledge-base)
- [API Reference](#api-reference)
- [Contributing](#contributing)
- [Third-Party Licenses](#third-party-licenses)
- [License](#license)

---

## Overview

Jarvis is a **self-hosted, autonomous AI desktop agent** that runs on a Linux server. It combines a polished web frontend with real desktop control — you can watch and direct the agent as it works, right in your browser.

The core idea: give Jarvis a task (via chat, WhatsApp, or the web UI), and it figures out how to complete it — browsing the web, reading files, writing code, sending emails, managing your calendar — all while you observe through a live VNC split-screen view.

```
"Find all emails from last week about Project Alpha, summarize them,
 and create a calendar event for the follow-up meeting."
```

Jarvis handles it. You watch it happen.

---

## Key Features

### 🖥️ VNC Split View
The web interface shows your LLM chat **and a live desktop feed side by side**. The agent can see exactly what it's doing — screenshots feed back into the LLM context automatically. No more blind automation.

### 🧩 Modular Skill System
Skills are self-contained Python packages that extend Jarvis with new capabilities. Install, enable, disable, and configure them through the UI without touching config files. Compatible with [openclaw](https://github.com/steipete/gogcli) skills.

### 🔀 Multi-LLM Support
Switch between AI providers without restarting anything:
- **Google Gemini** (gemini-2.0-flash, gemini-1.5-pro, ...)
- **Anthropic Claude** (claude-opus-4, claude-sonnet-4, ...)
- **OpenRouter** (hundreds of models)
- **Local Ollama** (llama3, mistral, qwen2.5, ... — fully offline)
- Any **OpenAI-compatible** endpoint

Both native tool/function calling **and** prompt-based tool calling are supported — so even models without native tool support can use all of Jarvis's capabilities.

### 📱 WhatsApp Agent
Send Jarvis a voice note or text message on WhatsApp, get a response back. Voice messages are transcribed via faster-whisper (runs locally, no cloud). Perfect for mobile task delegation.

### 📚 Knowledge Base
Drop PDFs, DOCX files, or plain text into watched folders. Jarvis indexes them with TF-IDF and can search them during tasks. Multi-folder support, automatic re-indexing on file changes.

### 🌐 Google Workspace Integration
Manage Gmail, Google Calendar, and Google Drive through natural language commands — powered by the openclaw/gog CLI.

### 🤖 Browser Automation
Full browser control via CDP (Chrome DevTools Protocol) and xdotool. The agent can navigate websites, fill forms, click elements, and extract information.

### 🔐 Secure by Default
- HTTPS with self-signed certificates (auto-generated)
- Session-based authentication
- All external services proxied through the FastAPI backend

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        Browser Client                        │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│   │  LLM Chat UI │  │  noVNC :6080 │  │  Settings / Skills│  │
│   │  (WebSocket) │  │  (Live VNC)  │  │  WhatsApp Logs   │  │
│   └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘  │
└──────────┼────────────────┼────────────────────┼────────────┘
           │ WSS/HTTPS       │ WSS                │ HTTPS
           ▼                 ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│                   FastAPI Backend :443                        │
│   ┌─────────────┐  ┌────────────┐  ┌──────────────────────┐  │
│   │ JarvisAgent │  │ Skills API │  │  WhatsApp Proxy      │  │
│   │  (agent.py) │  │ /api/skills│  │  _wa_bridge_async()  │  │
│   └──────┬──────┘  └─────┬──────┘  └──────────┬───────────┘  │
│          │               │                     │              │
│   ┌──────▼──────────┐    │              ┌──────▼───────────┐  │
│   │   SkillManager  │◄───┘              │  Baileys Bridge  │  │
│   │  (skills/*.py)  │                   │  Node.js :3001   │  │
│   └──────┬──────────┘                   │  (localhost only)│  │
│          │                              └──────────────────┘  │
│   ┌──────▼──────────────────────────────────────────────────┐ │
│   │                      Tool Layer                          │ │
│   │  shell · desktop · filesystem · screenshot · memory     │ │
│   │  knowledge · browser_control · whatsapp · google_apps   │ │
│   └──────────────────────────────────────────────────────────┘│
│                                                               │
│   ┌──────────────┐    ┌──────────────┐    ┌───────────────┐  │
│   │  LLM Client  │    │  x11vnc :5900│    │  Xvfb/X11 :1  │  │
│   │  (llm.py)    │    │  (→ noVNC)   │    │  Openbox WM   │  │
│   │  Multi-Provider│  └──────────────┘    └───────────────┘  │
│   └──────────────┘                                            │
└─────────────────────────────────────────────────────────────┘
```

### Component Overview

| Component | File | Description |
|-----------|------|-------------|
| FastAPI Server | `backend/main.py` | HTTP/WebSocket endpoints, auth, WhatsApp proxy |
| Agent Loop | `backend/agent.py` | Task execution, tool calling, LLM orchestration |
| LLM Client | `backend/llm.py` | Multi-provider abstraction (Gemini, Claude, OpenRouter, Ollama) |
| Config | `backend/config.py` | Environment + settings.json management |
| Skill Manager | `backend/skills/manager.py` | Load, enable, disable, configure skills |
| Tool Base | `backend/tools/base.py` | `BaseTool` class all tools inherit from |
| WhatsApp Bridge | `services/whatsapp-bridge/index.js` | Baileys v7 + Express API |
| Frontend | `frontend/index.html` + `js/` | Single-page app, no build system required |

---

## Tech Stack

### Backend
| Technology | Version | Purpose |
|-----------|---------|---------|
| Python | 3.13 | Core runtime |
| FastAPI | latest | REST API + WebSocket server |
| uvicorn | latest | ASGI server |
| faster-whisper | latest | Voice transcription (CPU, int8) |

### Frontend
| Technology | Purpose |
|-----------|---------|
| Vanilla JS | Zero-dependency UI |
| CSS Custom Properties | Dark Glassmorphism theme |
| WebSocket API | Real-time agent communication |
| noVNC | In-browser VNC client |

### Desktop / System
| Technology | Purpose |
|-----------|---------|
| Xvfb | Virtual framebuffer (headless X11) |
| Openbox | Lightweight window ma

[truncated…]

PUBLIC HISTORY

First discoveredMar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub
first seenMar 3, 2026
last updatedMar 20, 2026
last crawledtoday
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:dev-core-busy/jarvis)