OpenVoiceUI
OpenVoiceUI is an open-source voice AI that allows users to control their computer and build applications using only their voice. It functions as a hands-free AI computer, rendering live web apps, dashboards, and games directly on a visual canvas. Users can generate music, delegate tasks to multiple AI workers, and manage projects with voice commands. The system remembers information across sessions and offers a customizable interface with various animated faces. It is self-hosted, ensuring user data privacy and control, and is released under the MIT license. This tool is designed for developers, creators, and anyone seeking a more intuitive and efficient way to interact with their computer.
OpenVoiceUI solves the problem of tedious manual coding and repetitive tasks by enabling users to build and interact with applications solely through voice commands. It's a better alternative to typing prompts into a chat box or manually coding because it provides a real-time, visual rendering of applications as they are being created.
CAPABILITIES & CONSTRAINTS
README
<p align="center"> <img src="docs/banner.jpg" alt="OpenVoiceUI Banner" width="100%" /> </p> <h1 align="center">OpenVoiceUI</h1> <p align="center"><strong>The open-source voice AI that actually does work.</strong></p> <p align="center"> <a href="https://www.npmjs.com/package/openvoiceui"><img src="https://img.shields.io/npm/v/openvoiceui?style=flat-square&color=3b82f6" alt="npm version" /></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="MIT License" /></a> <a href="https://github.com/MCERQUA/OpenVoiceUI/stargazers"><img src="https://img.shields.io/github/stars/MCERQUA/OpenVoiceUI?style=flat-square&color=06b6d4" alt="GitHub Stars" /></a> <a href="https://openvoiceui.com"><img src="https://img.shields.io/badge/website-openvoiceui.com-0f172a?style=flat-square" alt="Website" /></a> </p> <p align="center"> Install, open <code>localhost:5001</code>, say <em>"build me a dashboard"</em>, and watch it render live. </p> --- <!-- TODO: Add 15-30s demo GIF showing voice prompt → canvas page rendering live --> > **[Watch the demo](https://openvoiceui.com)** -- see voice-to-canvas in action --- ## Install **Prerequisite: [Docker](https://docs.docker.com/get-docker/) must be installed and running for all install methods.** ### Pinokio (one-click) Download [Pinokio](https://pinokio.co) if you don't have it, then search **"OpenVoiceUI"** in the app store and click **Install**. ### npm ```bash npx openvoiceui setup # interactive wizard — walks you through API keys + builds Docker images npx openvoiceui start # starts everything ``` ### Docker ```bash git clone https://github.com/MCERQUA/OpenVoiceUI.git cd OpenVoiceUI cp .env.example .env # edit with your API keys docker compose up ``` Open **localhost:5001** and start talking. --- ## What is OpenVoiceUI? OpenVoiceUI is a hands-free, AI-controlled computer. You talk — it builds. Live web apps, dashboards, games, full websites — rendered in real time while you watch. No mouse, no keyboard, no typing prompts into a chat box. It runs on [OpenClaw](https://openclaw.org) and works with any LLM. The AI agent can build and display apps mid-conversation, switch between projects with a voice command, generate music on the fly, delegate work to parallel sub-agents, and remember everything across sessions. It uses any [Claude Code](https://docs.anthropic.com/en/docs/claude-code) or [OpenClaw](https://openclaw.org) skill — and the community can build and share more through the plugin system. Self-hosted. Your hardware, your data. MIT licensed, forever free. ## Core Features - **Hands-Free AI Computer** — Talk and watch it work. The AI builds apps, switches between projects, runs tasks, and displays results on a live visual canvas — all without touching a mouse or keyboard. - **Live Canvas** — AI renders real HTML pages mid-conversation: dashboards, tools, galleries, reports, full web apps. Not text responses — real interactive pages you can use. - **AI Music Generation** — Generate songs on the fly with your voice using Suno. Full music player with playlist management built in. - **Custom Animated Interface** — Choose from animated face modes (eye-face avatar, reactive halo-smoke orb) or install community-built faces through plugins. Build your own — the face system is fully extensible. - **Sub-Agents** — Delegate multiple tasks to parallel AI workers simultaneously and get results back. - **Long-Term Memory** — Optional context engine plugin curates knowledge every turn. Persists across sessions in human-readable markdown. - **Desktop OS Interface** — Themed desktop environment with window management (Windows XP, macOS, Ubuntu, Win95, Win 3.1). - **Admin Dashboard** — Mobile-responsive. Agent profiles, provider config, workspace file browser, plugin management, system health. Everything editable live. - **Self-Hosted** — Your hardware, your data. No vendor lock-in, no monthly fees. ## And More - Image generation (FLUX.1, Stable Diffusion 3.5) - Video creation (Remotion Studio) - Voice cloning (Qwen3-TTS via fal.ai) - Cron jobs for scheduled automation - File explorer with drag-and-drop - Agent profiles — switch personas, voices, and LLM providers from the admin panel --- ## Plugins OpenVoiceUI has a plugin system for community-built extensions. Plugins can include animated face packs, canvas pages, workflow dashboards, gateway adapters, or any combination of these. **Our first community plugin:** - [**BHB Animated Characters**](https://github.com/MCERQUA/openvoiceui-plugins) — Custom animated avatar faces by BHB **Build your own.** If you can build a canvas page, an animated face, or a workflow dashboard, you can package it as a plugin. See the [plugins repo](https://github.com/MCERQUA/openvoiceui-plugins) for submission guidelines and the BHB plugin as a reference. --- ## Install Details ### Option 1: Pinokio (one-click) 1. Install [Pinokio](https://pinokio.co) if you don't have it 2. Search **"OpenVoiceUI"** in the Pinokio app store 3. Click **Install**, then **Start** Pinokio handles Docker, dependencies, and configuration automatically. ### Option 2: npm Requires **Node.js 20+**, **Python 3.10+**, and **Docker**. ```bash npx openvoiceui setup # interactive wizard — configures LLM, TTS, API keys, builds Docker images npx openvoiceui start # starts OpenClaw gateway + Supertonic TTS + voice UI ``` The setup wizard walks you through choosing an LLM provider, TTS provider, and entering API keys. Configuration is saved to `.env` and `openclaw-data/`. ```bash npx openvoiceui stop # stop all services npx openvoiceui status # check what's running npx openvoiceui logs # tail service logs ``` ### Option 3: Docker Requires **Docker** and **Docker Compose**. ```bash git clone https://github.com/MCERQUA/OpenVoiceUI.git cd OpenVoiceUI cp .env.example .env ``` Edit `.env` with your API keys (at minimum: an LLM provider key and optionally a TTS key). Then: ```bash docker compose up -d ``` This starts three containers: | Container | Port | Purpose | |-----------|------|---------| | `openclaw` | 18791 | LLM gateway — routes to your chosen LLM provider | | `supertonic` | (internal) | Free local TTS — no API key needed | | `openvoiceui` | 5001 | Voice UI + Canvas + Admin dashboard | Open **http://localhost:5001** to use the voice interface, or **http://localhost:5001/admin** for the admin dashboard. To stop: `docker compose down` ### Option 4: VPS / Production For running on an Ubuntu server with nginx and systemd: ```bash git clone https://github.com/MCERQUA/OpenVoiceUI.git cd OpenVoiceUI cp .env.example .env # edit with your API keys sudo bash deploy/setup-sudo.sh # creates dirs, installs systemd service bash deploy/setup-nginx.sh # generates nginx config (edit domain) ``` See [`deploy/`](deploy/) for the full production setup including SSL, nginx reverse proxy, and systemd service files. --- ## Configuration All configuration is in `.env`. Copy `.env.example` to `.env` and fill in your values. **Required:** - An LLM provider API key (OpenAI, Anthropic, Groq, Z.AI, or any OpenClaw-compatible provider) - `CLAWDBOT_AUTH_TOKEN` — set during `npx openvoiceui setup` or in OpenClaw's setup wizard **Optional but recommended:** - `GROQ_API_KEY` — enables Groq Orpheus TTS (fast, high quality, free tier) - `SUNO_API_KEY` — enables AI music generation - `CLERK_PUBLISHABLE_KEY` — enables login/auth (for multi-user or public deployments) See [`.env.example`](.env.example) for all available options with descriptions. --- ## Works With Any Provider **LLM** | Provider | Status | |----------|--------| | OpenClaw Gateway | Built-in — routes to OpenAI, Anthropic, Groq, Z.AI, and more | | Z.AI (GLM-5-turbo) | Built-in | | Groq (Llama, Qwen) | Via OpenClaw | | Google Gemini | Via OpenClaw | | MiniMax | Via OpenClaw | | Ollama (local) | Via adapter | | Any LLM | Drop-in gatewa [truncated…]
PUBLIC HISTORY
IDENTITY
Identity inferred from code signals. No PROVENANCE.yml found.
Is this yours? Claim it →METADATA
README BADGE
Add to your README:
