jarvis

provenance:github:deepakrakshit/jarvis

WHAT THIS AGENT DOES

Jarvis is an AI assistant for users who need a reliable, autonomous system for complex tasks. It combines real-time tool execution with agent planning and can process PDF and DOCX documents as well as images. It has an adaptive Three.js user interface and validates each planned action before executing it. Jarvis also provides voice synthesis and OS-level system control, which makes it useful for automation and information processing. It runs on Groq and uses Llama models for language and vision tasks. It is aimed at users who want a production-grade assistant that prioritizes accuracy and dependability.

PROBLEM IT SOLVES

Jarvis addresses the need for a dependable AI assistant that can handle complex, multi-step tasks across diverse data formats and system interactions. Instead of processing documents, coordinating tools, and verifying results by hand, users can let Jarvis manage these steps autonomously, with an emphasis on accuracy and reliability.



TECH & STACK

python · groq · llama · multimodal · agent · tts · ui · automation

README

<div align="center">

[![Header](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=2,3,30&height=220&section=header&text=J.A.R.V.I.S.&fontSize=72&fontColor=ffffff&animation=twinkling&fontAlignY=38&desc=Just%20A%20Rather%20Very%20Intelligent%20System&descAlignY=62&descSize=24)](.)

[![Typing SVG](https://readme-typing-svg.demolab.com?font=Orbitron&size=17&duration=2800&pause=1000&color=00E1FF&center=true&vCenter=true&width=680&lines=Autonomous+AI+Assistant+%E2%80%94+Groq+Powered;Real-Time+Tool+Execution+%2B+Agent+Planning;Multimodal+Document+Intelligence+%28PDF+%2B+DOCX+%2B+Vision%29;Realtime+Voice+Synthesis+%2B+Adaptive+Plasma+UI;Production-Grade+%E2%80%94+Reliability-First+Architecture)](https://git.io/typing-svg)

<br/>

[![Python](https://img.shields.io/badge/Python-3.10%2B-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://python.org)
[![Version](https://img.shields.io/badge/Version-v1.6.0-00C853?style=for-the-badge&logo=github&logoColor=white)](CHANGELOG.md)
[![License: MIT](https://img.shields.io/badge/License-MIT-F7C948?style=for-the-badge)](LICENSE)
[![Status](https://img.shields.io/badge/Status-Active-00C853?style=for-the-badge&logo=checkmarx&logoColor=white)](.)
[![Platform](https://img.shields.io/badge/Platform-Windows-0078D6?style=for-the-badge&logo=windows&logoColor=white)](.)
[![Groq](https://img.shields.io/badge/Powered%20by-Groq-F55036?style=for-the-badge&logoColor=white)](https://groq.com)
[![Three.js](https://img.shields.io/badge/UI-Three.js-000000?style=for-the-badge&logo=threedotjs&logoColor=white)](https://threejs.org)
[![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-E91E7F?style=for-the-badge&logo=github&logoColor=white)](CONTRIBUTING.md)

<br/>

[![AI](https://img.shields.io/badge/LLM-Llama%203.1%20%2B%203.3-7C3AED?style=flat-square&logo=meta&logoColor=white)](.)
[![Vision](https://img.shields.io/badge/Vision-Llama%204%20Scout-0EA5E9?style=flat-square&logo=meta&logoColor=white)](.)
[![OCR](https://img.shields.io/badge/OCR-PaddleOCR-0052CC?style=flat-square&logo=paddlepaddle&logoColor=white)](.)
[![TTS](https://img.shields.io/badge/TTS-Piper%20%2B%20RealtimeTTS-10B981?style=flat-square)](.)
[![Search](https://img.shields.io/badge/Search-Serper.dev-F97316?style=flat-square)](.)
[![Weather](https://img.shields.io/badge/Weather-Open--Meteo-06B6D4?style=flat-square)](.)
[![Cache](https://img.shields.io/badge/Cache-SQLite%20%2B%20In--Memory-003B57?style=flat-square&logo=sqlite&logoColor=white)](.)

<br/>

> **A production-grade, reliability-first autonomous AI assistant** combining a priority intent router,
> a full Planner→Validator→Executor→Synthesizer agent loop, multimodal document intelligence,
> realtime voice output, OS-level system control, and a stunning Three.js adaptive plasma core UI.

<br/>

</div>

---

<p align="center">
  <img src="assets/jarvis_ui.gif" width="700"/>
</p>

---

## 📌 Table of Contents

<details>
<summary>Expand Navigation</summary>

- [✨ What is JARVIS?](#-what-is-jarvis)
- [🚀 Feature Highlights](#-feature-highlights)
- [🏗️ Architecture](#️-architecture)
- [🛠️ Tech Stack](#️-tech-stack)
- [⚡ Quick Start](#-quick-start)
- [⚙️ Configuration](#️-configuration)
- [🖥️ Run Modes](#️-run-modes)
- [📁 Project Structure](#-project-structure)
- [🗺️ Roadmap](#️-roadmap)
- [🤝 Contributing](#-contributing)
- [🔐 Security](#-security)
- [📜 License](#-license)
- [👤 Author](#-author)

</details>

---

## ✨ What is JARVIS?

**JARVIS** is not a chatbot. It is a full-stack, autonomous AI assistant runtime built around a strict **reliability-first principle** — meaning every answer that claims to be real-time actually is, every tool call is validated before synthesis, and every system action is OS-verified before being reported as successful.

At its core, JARVIS combines:

- **⚡ Sub-millisecond local routing** for greetings, identity, and conversational turns
- **🧠 A multi-step agent loop** (Plan → Validate → Execute → Synthesize) for tool-backed queries
- **📄 A hybrid document intelligence pipeline** fusing text extraction, OCR, and LLM vision
- **🎤 Real-time, streaming voice synthesis** via Piper TTS with chunk-level playback
- **🖥️ A pywebview desktop GUI** rendered through a Three.js adaptive plasma sphere with live telemetry

Every module enforces its own reliability contract. No hallucinated real-time data. No fake success confirmations. No persona drift.
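
The local fast-path idea can be illustrated as an ordered table of patterns that is checked before any LLM call, with the agent loop as the fallback. This is a minimal sketch assuming regex matching; the `FastPath` type, the three sample routes, and the `agent_loop` stand-in are hypothetical and not taken from the JARVIS source, which reportedly has 30+ such routes.

```python
from __future__ import annotations

import re
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class FastPath:
    priority: int                            # lower number is checked first
    pattern: re.Pattern[str]                 # trigger for this local route
    handler: Callable[[re.Match[str]], str]  # builds the reply from the match


def agent_loop(query: str) -> str:
    # Hypothetical stand-in for the Plan -> Validate -> Execute -> Synthesize path.
    return f"[agent] {query}"


# Hypothetical fast-paths, listed out of order to show that priority decides.
FAST_PATHS = sorted(
    [
        FastPath(5, re.compile(r"\bwho are you\b", re.I), lambda m: "I am JARVIS."),
        FastPath(2, re.compile(r"^(hi|hello|hey)\b", re.I), lambda m: "Hello. How can I help?"),
        FastPath(1, re.compile(r"^call me (\w+)", re.I), lambda m: f"Understood, {m.group(1)}."),
    ],
    key=lambda fp: fp.priority,
)


def route(query: str) -> tuple[str, str]:
    """Return (route, reply): the first matching fast-path wins, else the agent loop."""
    text = query.strip()
    for fp in FAST_PATHS:
        match = fp.pattern.search(text)
        if match:
            return "local", fp.handler(match)
    return "agent", agent_loop(text)
```

Because matching happens in-process before any network call, local turns skip LLM latency entirely. The `\b` word boundary keeps a query such as "high winds in Pune?" from being mistaken for the greeting "hi".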

---

## 🚀 Feature Highlights

| Category | Capability |
|---|---|
| 🧭 **Smart Routing** | Priority intent router with 30+ local fast-paths before agent loop fallback |
| 🌐 **Live Web Search** | Real-time web + news evidence via Serper — factual queries always use live sources |
| 🌦️ **Weather + Forecast** | Current conditions, daily forecasts, and rain probability via Open-Meteo |
| 📄 **Document Intelligence** | PDF · DOCX · Image — text extraction, PaddleOCR, Groq Vision, SQLite caching |
| 💬 **Document Q&A** | Follow-up Q&A over analyzed documents without re-processing |
| ⚖️ **Multi-Doc Compare** | Pricing, risk, and feature comparison across multiple documents simultaneously |
| 🎤 **Realtime TTS** | Streaming Piper voice with first-chunk latency optimization |
| 🖥️ **App Control** | Open/close desktop apps with Start Menu indexing, fuzzy resolution, OS verification |
| 🔊 **System Control** | Volume · Brightness · Window management · Desktop control · Screen lock |
| 🌍 **Network Diagnostics** | Public IP · IP-based location · Connectivity probes · Speedtest |
| 🕒 **Temporal Awareness** | Precise time/date/day/month/year responses |
| 💾 **Persistent Memory** | JSON-backed user profile with session location and search context |
| 🎭 **Personality Engine** | Contextual humor system with anti-repetition guards and tone adaptation |
| ⏭️ **Skip Control** | UI button to safely interrupt active TTS mid-stream |
| 📊 **Live Telemetry** | CPU · RAM · Disk · Battery · Network · Uptime — all live in the HUD |
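
The Realtime TTS and Skip Control rows both depend on speaking a reply in small pieces instead of waiting for the full text. A minimal sketch, assuming sentence-level chunking over a streamed token sequence and a `threading.Event` as the skip flag; `speak_chunk` stands in for the actual Piper/RealtimeTTS call and is hypothetical.

```python
from __future__ import annotations

import re
import threading
from collections.abc import Callable, Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as the token stream completes it."""
    buffer = ""
    for token in tokens:
        buffer += token
        *complete, buffer = _SENTENCE_END.split(buffer)
        for sentence in complete:
            if sentence.strip():
                yield sentence.strip()
    if buffer.strip():
        yield buffer.strip()  # flush the trailing partial sentence


def speak_stream(
    tokens: Iterable[str],
    speak_chunk: Callable[[str], None],
    skip: threading.Event,
) -> int:
    """Speak chunks in order, stopping at the next chunk boundary once skip is set."""
    spoken = 0
    for chunk in sentence_chunks(tokens):
        if skip.is_set():
            break
        speak_chunk(chunk)
        spoken += 1
    return spoken
```

The first sentence can be synthesized while the LLM is still generating the rest, which is what keeps first-chunk latency low. In this sketch a skip takes effect between chunks; cutting off audio that is already playing would also require stopping the playback backend, which is not modelled here. The simple regex also splits after abbreviations such as "Dr.".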

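The App Control row mentions fuzzy resolution of app names against a Start Menu index. One plausible approach uses the standard library's `difflib.get_close_matches`. The index contents, the `0.6` cutoff, and the exact-then-substring-then-fuzzy order are assumptions for illustration, and the OS verification step (confirming the process actually started) is left out.

```python
from __future__ import annotations

import difflib


def resolve_app(spoken_name: str, index: dict[str, str], cutoff: float = 0.6) -> str | None:
    """Map a spoken app name to an index entry: exact, then unique substring, then fuzzy."""
    name = " ".join(spoken_name.lower().split())  # normalise case and spacing
    if name in index:
        return index[name]
    # A unique substring hit covers short names, e.g. "chrome" -> "google chrome".
    partial = [key for key in index if name and name in key]
    if len(partial) == 1:
        return index[partial[0]]
    # Fuzzy matching tolerates typos and speech-recognition slips such as "notpad".
    close = difflib.get_close_matches(name, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None
```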
---

## 🏗️ Architecture

### Main Request Flow

```mermaid
flowchart TD
    A(["🎙️ User Input"]) --> B{"⚡ Priority\nIntent Router"}
    B -->|"Greeting / Wellbeing\nName / Correction\nLocation / Help"| C(["✅ Local Handler\n~0ms"])
    B -->|"Tool-capable query"| D["🧠 Agent Loop"]
    D --> E["📋 Planner\nGroq JSON"]
    E --> F["🛡️ Validator\nSchema + Safety"]
    F --> G["⚙️ Executor\nAsync / Parallel"]
    G --> H[("🔧 Tools\nWeather · Search\nSystem · Document")]
    H --> I["🔬 Synthesizer\nRelevance Filter"]
    B -->|"General LLM query"| J["💬 Groq Stream\nllama-3.1-8b"]
    I --> K["🎭 Personality +\nIdentity Guardrails"]
    J --> K
    C --> K
    K --> L(["🔊 Response + TTS"])

    style A fill:#0066ff,color:#fff,stroke:#00e1ff
    style L fill:#0066ff,color:#fff,stroke:#00e1ff
    style C fill:#00C853,color:#fff,stroke:none
    style K fill:#7C3AED,color:#fff,stroke:none
```
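
The Planner → Validator → Executor → Synthesizer path in this flowchart can be sketched with `asyncio`. This assumes the planner emits a JSON list of `{"tool": ..., "args": {...}}` steps. The tool registry, the stub tools, and the argument check are hypothetical (no Groq, Open-Meteo, or Serper call is made), and the synthesizer here only discards failed results rather than applying the real relevance filter.

```python
from __future__ import annotations

import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any

Tool = Callable[..., Awaitable[str]]


# Hypothetical stub tools; real ones would call the weather and search APIs.
async def get_weather(city: str) -> str:
    return f"Weather for {city}: (stub)"


async def web_search(query: str) -> str:
    return f"Top result for {query!r}: (stub)"


# Hypothetical registry: tool name -> (coroutine function, required argument names).
TOOLS: dict[str, tuple[Tool, set[str]]] = {
    "weather": (get_weather, {"city"}),
    "search": (web_search, {"query"}),
}


def plan(raw_json: str) -> list[Any]:
    """Parse planner output, assumed to be a JSON list of {"tool", "args"} steps."""
    steps = json.loads(raw_json)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list")
    return steps


def validate(steps: list[Any]) -> list[dict[str, Any]]:
    """Keep only steps naming a known tool with exactly its expected arguments."""
    valid = []
    for step in steps:
        if not isinstance(step, dict):
            continue
        tool, args = step.get("tool"), step.get("args")
        if (isinstance(tool, str) and tool in TOOLS
                and isinstance(args, dict) and set(args) == TOOLS[tool][1]):
            valid.append(step)
    return valid


async def execute(steps: list[dict[str, Any]]) -> list[Any]:
    """Run validated steps concurrently; exceptions are returned, not raised."""
    calls = [TOOLS[s["tool"]][0](**s["args"]) for s in steps]
    return await asyncio.gather(*calls, return_exceptions=True)


def synthesize(results: list[Any]) -> str:
    """Report only successful tool output; a failed call never becomes a result."""
    ok = [r for r in results if isinstance(r, str)]
    return "\n".join(ok) if ok else "No tool returned usable data."


async def run_agent(raw_plan: str) -> str:
    return synthesize(await execute(validate(plan(raw_plan))))
```

Validation sits between planning and execution so that an unknown tool or a malformed argument set from the LLM is dropped before anything runs, and `return_exceptions=True` lets one failing tool leave the others' results intact.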

### Document Intelligence Pipeline

```mermaid
flowchart LR
    A(["📄 Document\nIntent"]) --> B["📁 File Selector\n+ Path Validation"]
    B --> C{"File Type"}
    C -->|"PDF"| D["PyMuPDF\n+ pdfplumber"]
    C -->|"DOCX"| E["python-docx"]
    C -->|"Image"| F["OcrParser"]
    D & E --> G{"Content\nAnalysis"}
    G -->|"Text-Rich"| H["📝 Text Primary\nLLM Pass"]
    G -->|"Has Images\nor Scanned"| I["👁️ Groq Vision\nLlama 4 Scout"]
    G -->|"Low Confidence"| J["🔠 PaddleOCR"]
    H & I & J --> K["🔀 Fusion\nProcessor"]
    K --> L["🧠 Reasoning\nllama-3.3-70b"]
    L --> M["🗂️ Active Document\nIndex + SQLite Cache"]
    M --> N(["💬 Q&A Engine\n+ Multi-Doc Compare"])

    style A fill:#0066ff,color:#fff,stroke:#00e1ff
    style N fill:#0066ff,color:#fff,stroke:#00e1ff
    style L fill:#7C3AED,color:#fff,stroke:none
```
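
The content-analysis branching and the SQLite cache in this pipeline might look roughly like the following. It is a sketch under assumptions: the characters-per-page threshold, the `text_confidence` signal and its cutoff, and the cache schema are invented for illustration, and the real extractors named in the flowchart (PyMuPDF, pdfplumber, python-docx, PaddleOCR, Groq Vision) are not called. Because the flowchart feeds all three branches into a fusion step, the function returns every pass to run rather than a single choice.

```python
from __future__ import annotations

import hashlib
import json
import sqlite3

# Hypothetical thresholds; the real pipeline may use different signals.
MIN_CHARS_PER_PAGE = 200
MIN_TEXT_CONFIDENCE = 0.80


def choose_routes(chars_per_page: float, has_images: bool,
                  text_confidence: float | None = None) -> list[str]:
    """Return every extraction pass to run; a fusion step would merge their outputs."""
    routes = []
    if chars_per_page >= MIN_CHARS_PER_PAGE:
        routes.append("text")       # text-rich: primary LLM pass over extracted text
    if has_images or chars_per_page < MIN_CHARS_PER_PAGE:
        routes.append("vision")     # embedded images or likely scanned pages
    if text_confidence is not None and text_confidence < MIN_TEXT_CONFIDENCE:
        routes.append("paddleocr")  # low-confidence extraction gets an OCR pass
    return routes


class AnalysisCache:
    """SQLite cache of analysis results, keyed by the SHA-256 of the file bytes."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS analyses (sha256 TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    @staticmethod
    def key_for(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def get(self, data: bytes) -> dict | None:
        row = self.conn.execute(
            "SELECT result FROM analyses WHERE sha256 = ?", (self.key_for(data),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, data: bytes, result: dict) -> None:
        with self.conn:  # commits the insert, or rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO analyses (sha256, result) VALUES (?, ?)",
                (self.key_for(data), json.dumps(result)),
            )
```

Keying on file content rather than file path means a renamed or copied but unchanged document hits the cache, so follow-up Q&A can reuse the stored analysis instead of re-running extraction.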

### Intent Routing Precedence

```mermaid
flowchart TD
    A(["Query"]) --> B{"Priority 1–17\nCorrection · Name\nGreeting · Location\nWellbeing · Help"}
    B -->|"Matched"| C(["Local Response"])
    B -->|"No match"| D{"Priority 18–27\nSpeedtest · Connectivity\nIP · Weather · Stat

[truncated…]

PUBLIC HISTORY

First discovered: Apr 2, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Mar 25, 2026

last updated: Apr 1, 2026

last crawled: 2 months ago

version: —
