jarvis
Jarvis is an AI assistant for users who need a reliable, autonomous system for complex tasks. It combines real-time tool execution with agent planning, processes PDFs, DOCX files, and images, and validates planned actions before executing them. It also offers voice synthesis, operating-system control, and an adaptive Three.js desktop interface. Jarvis runs on Groq and uses Llama models for language and vision tasks.
Jarvis targets complex, multi-step tasks that span diverse data formats and system interactions. Instead of manually processing documents, coordinating tools, and verifying results, users can hand these steps to Jarvis, which manages them autonomously with a focus on accuracy and reliability.
<div align="center">
<br/>
> **A production-grade, reliability-first autonomous AI assistant** combining a priority intent router,
> a full Planner→Validator→Executor→Synthesizer agent loop, multimodal document intelligence,
> realtime voice output, OS-level system control, and a stunning Three.js adaptive plasma core UI.
<br/>
</div>
---
<p align="center">
<img src="assets/jarvis_ui.gif" width="700"/>
</p>
---
## 📌 Table of Contents
<details>
<summary>Expand Navigation</summary>
- [✨ What is JARVIS?](#-what-is-jarvis)
- [🚀 Feature Highlights](#-feature-highlights)
- [🏗️ Architecture](#️-architecture)
- [🛠️ Tech Stack](#️-tech-stack)
- [⚡ Quick Start](#-quick-start)
- [⚙️ Configuration](#️-configuration)
- [🖥️ Run Modes](#️-run-modes)
- [📁 Project Structure](#-project-structure)
- [🗺️ Roadmap](#️-roadmap)
- [🤝 Contributing](#-contributing)
- [🔐 Security](#-security)
- [📜 License](#-license)
- [👤 Author](#-author)
</details>
---
## ✨ What is JARVIS?
**JARVIS** is not a chatbot. It is a full-stack, autonomous AI assistant runtime built around a strict **reliability-first principle** — meaning every answer that claims to be real-time actually is, every tool call is validated before synthesis, and every system action is OS-verified before being reported as successful.
At its core, JARVIS combines:
- **⚡ Sub-millisecond local routing** for greetings, identity, and conversational turns
- **🧠 A multi-step agent loop** (Plan → Validate → Execute → Synthesize) for tool-backed queries
- **📄 A hybrid document intelligence pipeline** fusing text extraction, OCR, and LLM vision
- **🎤 Real-time, streaming voice synthesis** via Piper TTS with chunk-level playback
- **🖥️ A pywebview desktop GUI** rendered through a Three.js adaptive plasma sphere with live telemetry
Every module enforces its own reliability contract. No hallucinated real-time data. No fake success confirmations. No persona drift.
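
The local fast-path idea in the list above can be sketched as a small priority table that is checked before any LLM call. This is a minimal illustration under stated assumptions, not the project's actual router: the `Route` dataclass, the regex patterns, the priority numbers, and the `greet` / `identity` handlers are all hypothetical. A return value of `None` stands for "no local match, fall back to the agent loop or the general LLM path".

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    priority: int                   # lower number is checked first
    pattern: re.Pattern             # hypothetical trigger pattern
    handler: Callable[[str], str]   # local handler, no network or LLM call


def greet(_: str) -> str:
    return "Hello. How can I help?"


def identity(_: str) -> str:
    return "I am JARVIS."


# Hypothetical fast-path table, sorted once by priority at import time.
ROUTES = sorted(
    [
        Route(2, re.compile(r"\b(who are you|your name)\b", re.I), identity),
        Route(1, re.compile(r"^\s*(hi|hello|hey)\b", re.I), greet),
    ],
    key=lambda r: r.priority,
)


def route(query: str) -> Optional[str]:
    """Return a local response, or None to signal fallback to a slower path."""
    for r in ROUTES:
        if r.pattern.search(query):
            return r.handler(query)
    return None
```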
---
## 🚀 Feature Highlights
| Category | Capability |
|---|---|
| 🧭 **Smart Routing** | Priority intent router with 30+ local fast-paths before agent loop fallback |
| 🌐 **Live Web Search** | Real-time web + news evidence via Serper — factual queries always use live sources |
| 🌦️ **Weather + Forecast** | Current conditions, daily forecasts, and rain probability via Open-Meteo |
| 📄 **Document Intelligence** | PDF · DOCX · Image — text extraction, PaddleOCR, Groq Vision, SQLite caching |
| 💬 **Document Q&A** | Follow-up Q&A over analyzed documents without re-processing |
| ⚖️ **Multi-Doc Compare** | Pricing, risk, and feature comparison across multiple documents simultaneously |
| 🎤 **Realtime TTS** | Streaming Piper voice with first-chunk latency optimization |
| 🖥️ **App Control** | Open/close desktop apps with Start Menu indexing, fuzzy resolution, OS verification |
| 🔊 **System Control** | Volume · Brightness · Window management · Desktop control · Screen lock |
| 🌍 **Network Diagnostics** | Public IP · IP-based location · Connectivity probes · Speedtest |
| 🕒 **Temporal Awareness** | Precise time/date/day/month/year responses |
| 💾 **Persistent Memory** | JSON-backed user profile with session location and search context |
| 🎭 **Personality Engine** | Contextual humor system with anti-repetition guards and tone adaptation |
| ⏭️ **Skip Control** | UI button to safely interrupt active TTS mid-stream |
| 📊 **Live Telemetry** | CPU · RAM · Disk · Battery · Network · Uptime — all live in the HUD |
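
As an illustration of the "fuzzy resolution" step in the App Control row, the sketch below maps a loosely spoken app name onto an indexed one with the standard library's `difflib`. It is an assumption-laden example rather than the project's implementation: `APP_INDEX`, its paths, the `resolve_app` name, and the `0.6` cutoff are hypothetical, and a real index would be built by scanning Start Menu shortcuts. OS verification (confirming that the process actually started) is not shown.

```python
import difflib
from typing import Optional

# Hypothetical index of display name -> executable path.
APP_INDEX = {
    "google chrome": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "notepad": r"C:\Windows\System32\notepad.exe",
    "visual studio code": r"C:\Program Files\Microsoft VS Code\Code.exe",
}


def resolve_app(spoken_name: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the best-matching indexed app name, or None if nothing is close."""
    key = spoken_name.strip().lower()
    if key in APP_INDEX:
        return key  # exact match, no fuzzy search needed
    # get_close_matches ranks candidates by similarity ratio and drops
    # anything scoring below the cutoff.
    matches = difflib.get_close_matches(key, list(APP_INDEX), n=1, cutoff=cutoff)
    return matches[0] if matches else None
```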
---
## 🏗️ Architecture
### Main Request Flow
```mermaid
flowchart TD
A(["🎙️ User Input"]) --> B{"⚡ Priority\nIntent Router"}
B -->|"Greeting / Wellbeing\nName / Correction\nLocation / Help"| C(["✅ Local Handler\n~0ms"])
B -->|"Tool-capable query"| D["🧠 Agent Loop"]
D --> E["📋 Planner\nGroq JSON"]
E --> F["🛡️ Validator\nSchema + Safety"]
F --> G["⚙️ Executor\nAsync / Parallel"]
G --> H[("🔧 Tools\nWeather · Search\nSystem · Document")]
H --> I["🔬 Synthesizer\nRelevance Filter"]
B -->|"General LLM query"| J["💬 Groq Stream\nllama-3.1-8b"]
I --> K["🎭 Personality +\nIdentity Guardrails"]
J --> K
C --> K
K --> L(["🔊 Response + TTS"])
style A fill:#0066ff,color:#fff,stroke:#00e1ff
style L fill:#0066ff,color:#fff,stroke:#00e1ff
style C fill:#00C853,color:#fff,stroke:none
style K fill:#7C3AED,color:#fff,stroke:none
```
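
The Planner → Validator → Executor → Synthesizer path in the diagram can be sketched in a few functions. Everything here is a stand-in chosen for illustration: the `weather` and `search` stubs return canned data instead of calling Open-Meteo or Serper, `plan` returns a hard-coded JSON plan instead of asking Groq, and the `TOOLS` registry layout is an assumption. The point is the ordering: nothing runs until the whole plan passes validation, and validated steps run concurrently.

```python
import asyncio
import json


# Hypothetical tool stubs; real tools would perform network calls.
async def weather(city: str) -> dict:
    return {"tool": "weather", "city": city, "temp_c": 21}


async def search(query: str) -> dict:
    return {"tool": "search", "query": query, "hits": 3}


# Tool name -> (coroutine function, exact set of allowed argument names).
TOOLS = {
    "weather": (weather, {"city"}),
    "search": (search, {"query"}),
}


def plan(query: str) -> list:
    """Stand-in for the LLM planner: parse a JSON list of tool steps."""
    raw = json.dumps([
        {"tool": "weather", "args": {"city": "Paris"}},
        {"tool": "search", "args": {"query": query}},
    ])
    return json.loads(raw)


def validate(steps: list) -> list:
    """Reject unknown tools or mismatched arguments before anything runs."""
    for step in steps:
        name, args = step.get("tool"), step.get("args", {})
        if name not in TOOLS:
            raise ValueError(f"unknown tool: {name!r}")
        if set(args) != TOOLS[name][1]:
            raise ValueError(f"bad arguments for {name}: {sorted(args)}")
    return steps


async def execute(steps: list) -> list:
    """Run all validated steps concurrently and keep plan order in the results."""
    calls = [TOOLS[s["tool"]][0](**s["args"]) for s in steps]
    return await asyncio.gather(*calls)


def synthesize(results: list) -> str:
    """Trivial synthesizer: serialize tool evidence into one string."""
    return "; ".join(json.dumps(r, sort_keys=True) for r in results)


async def agent_loop(query: str) -> str:
    return synthesize(await execute(validate(plan(query))))
```

Calling `asyncio.run(agent_loop("latest AI news"))` runs both stub tools in parallel and returns their combined output. A plan that names a tool outside `TOOLS` raises `ValueError` in `validate`, so the executor never sees it.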
### Document Intelligence Pipeline
```mermaid
flowchart LR
A(["📄 Document\nIntent"]) --> B["📁 File Selector\n+ Path Validation"]
B --> C{"File Type"}
C -->|"PDF"| D["PyMuPDF\n+ pdfplumber"]
C -->|"DOCX"| E["python-docx"]
C -->|"Image"| F["OcrParser"]
D & E --> G{"Content\nAnalysis"}
G -->|"Text-Rich"| H["📝 Text Primary\nLLM Pass"]
G -->|"Has Images\nor Scanned"| I["👁️ Groq Vision\nLlama 4 Scout"]
G -->|"Low Confidence"| J["🔠 PaddleOCR"]
H & I & J --> K["🔀 Fusion\nProcessor"]
K --> L["🧠 Reasoning\nllama-3.3-70b"]
L --> M["🗂️ Active Document\nIndex + SQLite Cache"]
M --> N(["💬 Q&A Engine\n+ Multi-Doc Compare"])
style A fill:#0066ff,color:#fff,stroke:#00e1ff
style N fill:#0066ff,color:#fff,stroke:#00e1ff
style L fill:#7C3AED,color:#fff,stroke:none
```
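
Two steps of this pipeline lend themselves to a short sketch: dispatching on file type, and caching analysis results in SQLite so follow-up questions do not re-process the document. The README only states that SQLite caching exists; the table schema, the SHA-256 content key, and the names `pick_extractor`, `AnalysisCache`, and `analyze` below are assumptions made for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Callable, Optional

# File suffix -> extractor label (labels are hypothetical; the diagram names
# PyMuPDF + pdfplumber for PDF, python-docx for DOCX, and OCR for images).
EXTRACTOR_BY_SUFFIX = {
    ".pdf": "pdf_text",
    ".docx": "docx_text",
    ".png": "ocr",
    ".jpg": "ocr",
    ".jpeg": "ocr",
}


def pick_extractor(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTOR_BY_SUFFIX:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return EXTRACTOR_BY_SUFFIX[suffix]


class AnalysisCache:
    """Content-addressed cache: identical bytes are analyzed only once."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS analysis "
            "(sha256 TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    @staticmethod
    def key(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def get(self, content: bytes) -> Optional[str]:
        row = self.conn.execute(
            "SELECT result FROM analysis WHERE sha256 = ?", (self.key(content),)
        ).fetchone()
        return row[0] if row else None

    def put(self, content: bytes, result: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO analysis (sha256, result) VALUES (?, ?)",
            (self.key(content), result),
        )
        self.conn.commit()


def analyze(content: bytes, cache: AnalysisCache,
            run_pipeline: Callable[[bytes], str]) -> str:
    """Return the cached result if present, otherwise run and store it."""
    cached = cache.get(content)
    if cached is not None:
        return cached
    result = run_pipeline(content)
    cache.put(content, result)
    return result
```

Keying on a hash of the file contents rather than the path means a renamed or copied document still hits the cache, while an edited file is re-analyzed.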
### Intent Routing Precedence
```mermaid
flowchart TD
A(["Query"]) --> B{"Priority 1–17\nCorrection · Name\nGreeting · Location\nWellbeing · Help"}
B -->|"Matched"| C(["Local Response"])
B -->|"No match"| D{"Priority 18–27\nSpeedtest · Connectivity\nIP · Weather · Stat
[truncated…]
