# ETHEL
A local AI stack that observes a single space: YOLO for detection, Qwen-VL for vision captions, Whisper for audio, and Llama for reasoning. It logs to SQLite, builds summaries and analytics, and exposes an OpenAI-style chat API.
TL;DR
- ETHEL is a fully local AI system that watches a single environment, writes down what it sees and hears, and builds a long-term memory of that space. The core pipeline is working today (vision → captions → database → summaries → analytics → chat). A per-entity weight system (novelty, comfort, expectation) and proper identity layer (“Sparky vs dog”) are designed but not mathematically specified or implemented yet.
-----
ETHEL:
Emergent Tethered Habitat-aware Engram Lattice
ETHEL is a local, self-contained system built from multiple open-source components working together: vision, audio, language, and memory. It runs on a single Windows machine, stays in one environment, and forms an understanding of that space through repetition, absence, observation of interactions, and change.
This isn’t a single model. ETHEL is a stack of parts, each doing one job, connected by a simple bridge so they act like one system.
-----
What ETHEL Is Trying to Be
- A continuous observer of one physical space, not a general chatbot.
- Local-first: no cloud calls; all models and data live on the machine.
- Transparent: every detection, caption, summary, and decision is logged in human-readable form -- plain text or SQLite.
- A long-running experiment in environmental continuity and emergent bias/personality from accumulated experience.
-----
## Midbrain Demo
A runnable demonstration of ETHEL’s midbrain pipeline is included in the repository under:
**`midbrain_demo/`**
This demo shows the core perception flow:
```
Video → JSONL events → SQLite journal → Hourly/Daily summaries
```
It includes three standalone scripts (detector, journaler, summarizer) and a dedicated README with full instructions.
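The JSONL → SQLite step of the demo flow can be sketched as follows. The event field names (`ts`, `kind`, `label`, `conf`) are hypothetical placeholders, not the demo's actual schema:

```python
import json
import sqlite3

# Hypothetical JSONL event lines -- the demo's real field names may differ.
EVENT_LINES = [
    '{"ts": "2025-11-14T09:00:01", "kind": "enter", "label": "person", "conf": 0.91}',
    '{"ts": "2025-11-14T09:00:04", "kind": "exit", "label": "person", "conf": 0.88}',
]

def ingest(lines, db_path=":memory:"):
    """Load JSONL event lines into a minimal SQLite journal table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS events (ts TEXT, kind TEXT, label TEXT, conf REAL)"
    )
    rows = [(e["ts"], e["kind"], e["label"], e["conf"])
            for e in map(json.loads, lines)]
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con

con = ingest(EVENT_LINES)
print(con.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```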
### What’s in the demo
- Motion detection
- YOLO object detection
- Novelty via pHash
- Track enter/exit events
- Burst detection + motion summaries
- Ingestion into a SQLite journal
- Hourly and daily summary generation
### What’s *not* in the demo (but exists in the full system)
- Qwen vision observer
- Whisper audio capture
- LLM reasoning layers
- Video clip recording
The demo is intentionally small and self-contained — it represents ETHEL’s midbrain/spine, not the full architecture.
For full details, see:
**`midbrain_demo/README.md`**
-----
NOTE: The current implementation is private; this repository documents the architecture and behavior of the system.
Pipeline Overview (Parts)
Part 1 – Inputs (Camera & Mic)
- Takes a single primary video source (RTSP, USB, or file).
- Takes a single audio source for ambient sound and speech.
-----
Part 2 – Eyes (Video Detector / Recorder – detectorv3.py)
Eyes = what happened, frame by frame, and where the evidence is stored
What it does:
- Captures video from the configured source (RTSP/USB/file).
- Runs YOLO-based detection on the stream.
- Maintains a rolling 15-minute MKV buffer:
  - writes fixed-length 15-minute recording chunks
  - keeps a 7-day history of these MKVs for later re-scan or model retraining.
- Generates a continuous stream of still frames (≈3 fps) into a rolling 15-minute stills buffer used by Qwen.
- Computes motion scores and perceptual hashes to:
  - decide when something counts as an “event”
  - reduce duplicate events.
- Creates structured event JSONL files with:
  - timestamps
  - object detections (boxes, classes, confidences)
  - motion metrics
  - links to stills and clips.
- Cleans up old stills and recordings according to retention rules.
- Emits everything into a standard directory tree (record/, events/, rolling media/), so later parts can treat it as a stable data source.
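The perceptual-hash de-duplication above can be illustrated with plain 64-bit hex hashes. The Hamming-distance threshold here is an illustrative value, not the one detectorv3.py actually uses:

```python
def hamming(h1: str, h2: str) -> int:
    """Bit distance between two 64-bit perceptual hashes given as hex strings."""
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

def dedup_events(hashes, threshold=6):
    """Keep a hash only if it differs enough from the last kept one.

    `threshold` is illustrative; a real pHash pipeline tunes this empirically.
    """
    kept = []
    for h in hashes:
        if not kept or hamming(kept[-1], h) > threshold:
            kept.append(h)
    return kept

frames = ["ffe0000000000000", "ffe0000000000001", "00ff00ff00ff00ff"]
print(dedup_events(frames))  # the near-duplicate second frame is dropped
```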
-----
Part 2.5 – Lens (Qwen Vision Server + Adapter – qwen_server.py, vision_qwen.py)
Lens = turn these stills into concise language.
What it does:
- Wraps Qwen2-VL as a local HTTP server, using an OpenAI-style /v1/chat/completions interface.
- Uses an adapter (vision_qwen.py) to:
  - load Qwen with int8/int4/fp16 quantization
  - downscale images to a fixed max side
  - build proper chat templates for text-only or image+text queries.
- Supports:
  - text-only prompts
  - image+text prompts for vision questions
  - “fake streaming” by chunking single outputs.
- On visual requests, finds the latest still in the rolling media buffer and sends it to Qwen with an instruction prompt.
- Returns short, single-sentence captions for each burst of stills (“a person walks through the room and sits down”).
- Acts as the visual “front end” for both:
  - the Qwen Observer (below)
  - any tools that need direct visual descriptions.
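Building an image+text request for an OpenAI-style /v1/chat/completions endpoint can be sketched as below. The data-URL content shape is one common convention for inline images; the exact body qwen_server.py expects may differ:

```python
import base64
import json

def build_vision_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat-completions body with one inline image.

    The "image_url" data-URL convention is an assumption here, not a
    confirmed detail of the local server.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "qwen2-vl",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 64,
    }

body = build_vision_request("Describe this burst in one sentence.", b"\xff\xd8fake")
print(json.dumps(body)[:60])
```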
-----
Part 2.8 – Ears (Audio Ingest + Whisper – 2_audio.py)
What it does:
- Uses FFmpeg to stream raw audio from the source.
- Runs VAD (webrtcvad) to detect actual speech vs background noise.
- Chunks speech into WAV files and drops very short/quiet fragments to reduce junk.
- Sends speech chunks to a background Whisper worker for transcription.
- Writes output into:
  - events/YYYY-MM-DD/audio/chunks/… (audio files)
  - events/YYYY-MM-DD/audio/transcripts.jsonl (aligned transcripts).
- Handles FFmpeg restarts with backoff and logs failures separately.
- Keeps everything time-aligned so the Journaler can treat audio like just another event stream.
- Right now, Whisper logs to disk; it isn’t yet wired directly through the bridge to Llama.
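The short/quiet-fragment filter can be sketched over raw 16-bit PCM. The duration and RMS thresholds below are illustrative, not the values 2_audio.py uses:

```python
import math
import struct

def rms_16bit(pcm: bytes) -> float:
    """Root-mean-square level of signed 16-bit little-endian PCM."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def keep_chunk(pcm: bytes, sample_rate=16000, min_seconds=0.3, min_rms=200.0) -> bool:
    """Drop fragments that are too short or too quiet to be worth transcribing.

    Thresholds are illustrative placeholders, not the project's tuned values.
    """
    duration = (len(pcm) / 2) / sample_rate
    return duration >= min_seconds and rms_16bit(pcm) >= min_rms

silence = b"\x00\x00" * 16000           # 1 s of silence -> dropped
tone = struct.pack("<h", 3000) * 16000  # 1 s of loud samples -> kept
print(keep_chunk(silence), keep_chunk(tone))  # False True
```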
-----
Part 2.9 – Lens-Observer (Qwen Observer – qwen_observer.py)
The “visual narrator” that glues Eyes + Lens + DB together.
Lens-Observer = what just happened in the last second or two, in one sentence?
What it does:
- Watches events/YYYY-MM-DD/.../stills/ for new stills coming from Eyes.
- Groups stills into ~1-second bursts (small temporal windows).
- For each burst:
  - sends multiple frames to the Qwen server (Lens)
  - asks for a single concise caption of the burst.
- De-duplicates captions that are near-identical to avoid spam.
- Inserts one row per burst into vision_events in the SQLite journal, including:
  - timestamp
  - caption text
  - list of frame paths used.
- Runs continuously with a small polling interval and batching window.
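The burst-grouping step can be sketched as a gap-based split over frame timestamps. The 1-second window mirrors the description above; the observer's actual batching logic may differ:

```python
def group_bursts(timestamps, window=1.0):
    """Group sorted frame timestamps (seconds) into bursts.

    A new burst starts whenever the gap to the previous frame exceeds
    `window` seconds.
    """
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= window:
            bursts[-1].append(ts)   # close enough: same burst
        else:
            bursts.append([ts])     # gap too large: start a new burst
    return bursts

stamps = [0.0, 0.3, 0.6, 5.0, 5.2, 9.9]
print(group_bursts(stamps))  # [[0.0, 0.3, 0.6], [5.0, 5.2], [9.9]]
```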
-----
Part 3 – Journaler (SQLite Event Store – journalerv2.py)
Journaler = make all this chaos queryable.
What it does:
- Consumes outputs from:
  - Eyes (events, boxes, clips, MKV refs)
  - Ears (transcripts)
  - Lens-Observer (vision event captions).
- Creates and maintains the journal database with tables like:
  - meta
  - clips
  - events
  - boxes
  - captions
  - responses
  - vision_events (linked to still bursts).
- Uses WAL mode and tuned pragmas for high write throughput.
- Normalizes paths and IDs so every event is queryable by time and type.
- Aligns everything to time segments (e.g., 15-minute chunks) for easier rollups.
- Is idempotent and schema-safe: can be re-run without destroying existing data.
- Acts as the single source of truth for “what ETHEL has seen and heard.”
- Feeds downstream summary/analytics jobs (Parts 4 and 5).
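A minimal sketch of the WAL setup and a schema-safe table follows. The column names are illustrative, not the project's actual schema; `IF NOT EXISTS` is what makes re-runs non-destructive:

```python
import sqlite3

def open_journal(path=":memory:"):
    """Open the journal DB with tuned pragmas and an idempotent schema.

    Table/column names are illustrative placeholders.
    """
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode=WAL")    # concurrent readers, fast writes
    con.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    con.execute(
        """CREATE TABLE IF NOT EXISTS vision_events (
               id INTEGER PRIMARY KEY,
               ts TEXT NOT NULL,
               caption TEXT NOT NULL,
               frames TEXT NOT NULL
           )"""
    )
    return con

con = open_journal()
con.execute(
    "INSERT INTO vision_events (ts, caption, frames) VALUES (?, ?, ?)",
    ("2025-11-14T09:00:00", "a person walks through the room and sits down",
     "stills/a.jpg;stills/b.jpg"),
)
print(con.execute("SELECT caption FROM vision_events").fetchone()[0])
```

Note that WAL only takes effect for on-disk databases; `:memory:` accepts the pragma but stays in memory-journal mode.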
-----
Part 4 – Summarizer (Hourly + Daily Rollups – stage4_summarizer.py)
-- This runs periodically, not as a permanent process.
Summarizer = yesterday on one page
What it does:
- Connects read-only to the journal DB.
- Produces hourly rollups into summaries/summ_hourly.jsonl:
  - counts of events, motion, captions
  - speech totals (seconds, words, characters)
  - object and presence stats.
- Produces daily summaries into summaries/YYYY-MM-DD.json:
  - aggregated event patterns
  - who/what appeared that day
  - activity profile across the day.
- Verifies that summarized IDs actually exist in the DB (sanity checking).
- Acts as an intermediate compression layer between raw events and long-term analytics.
- Gives higher-level parts a “what today looked like” view instead of raw events.
- Can be re-run to regenerate summaries if needed.
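The hourly rollup reduces to a GROUP BY over the journal. A minimal sketch against an illustrative events table (not the project's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts TEXT, kind TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)", [
    ("2025-11-14T09:05:00", "motion"),
    ("2025-11-14T09:40:00", "motion"),
    ("2025-11-14T10:10:00", "speech"),
])

# Roll raw events up into per-hour counts, one row per (hour, kind).
rows = con.execute(
    "SELECT strftime('%Y-%m-%dT%H', ts) AS hour, kind, COUNT(*) "
    "FROM events GROUP BY hour, kind ORDER BY hour"
).fetchall()
print(rows)  # [('2025-11-14T09', 'motion', 2), ('2025-11-14T10', 'speech', 1)]
```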