
ETHEL


Local AI stack that observes a single space: currently runs YOLO for detection, Qwen-VL for vision captions, Whisper for audio, and Llama for reasoning. Logs to SQLite, builds summaries/analytics, and exposes an OpenAI-style chat API.

TL;DR
- ETHEL is a fully local AI system that watches a single environment, writes down what it sees and hears, and builds a long-term memory of that space. The core pipeline is working today (vision → captions → database → summaries → analytics → chat). A per-entity weight system (novelty, comfort, expectation) and proper identity layer (“Sparky vs dog”) are designed but not mathematically specified or implemented yet.

-----

ETHEL:

Emergent Tethered Habitat-aware Engram Lattice

ETHEL is a local, self-contained system built from multiple open-source components working together: vision, audio, language, and memory. It runs on a single Windows machine, stays in one environment, and forms an understanding of that space through repetition, absence, observation of interactions, and change.

This isn’t a single model. ETHEL is a stack of parts, each doing one job, connected by a simple bridge so they act like one system.

-----

What ETHEL Is Trying to Be
- A continuous observer of one physical space, not a general chatbot.
- Local-first: no cloud calls; all models and data live on the machine.
- Transparent: every detection, caption, summary, and decision is logged in human-readable entries (plain text or SQLite).
- A long-running experiment in environmental continuity and emergent bias/personality from accumulated experience.

-----

## Midbrain Demo

A runnable demonstration of ETHEL’s midbrain pipeline is included in the repository under:

**`midbrain_demo/`**

This demo shows the core perception flow:

```
Video → JSONL events → SQLite journal → Hourly/Daily summaries
```

It includes three standalone scripts (detector, journaler, summarizer) and a dedicated README with full instructions.

### What’s in the demo
- Motion detection  
- YOLO object detection  
- Novelty via pHash  
- Track enter/exit events  
- Burst detection + motion summaries  
- Ingestion into a SQLite journal  
- Hourly and daily summary generation  
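The "novelty via pHash" step can be approximated with a simple average hash over downsampled frames; the demo likely uses a real perceptual-hash library, so the hash function, Hamming threshold, and frame representation below are illustrative assumptions, not repo values:

```python
def average_hash(gray, hash_size=8):
    """Hash a grayscale frame (list of rows of 0-255 ints) by sampling it
    down to hash_size x hash_size and comparing each cell to the mean."""
    h, w = len(gray), len(gray[0])
    cells = []
    for y in range(hash_size):
        for x in range(hash_size):
            # Nearest-neighbour sample; a real pHash averages blocks and uses a DCT.
            cells.append(gray[y * h // hash_size][x * w // hash_size])
    mean = sum(cells) / len(cells)
    return sum(1 << i for i, v in enumerate(cells) if v > mean)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_novel(frame_hash, recent_hashes, threshold=10):
    """A frame counts as novel if it differs from every recent hash by more
    than `threshold` bits (the threshold is a guess, not a repo value)."""
    return all(hamming(frame_hash, h) > threshold for h in recent_hashes)
```

Near-duplicate frames hash to nearby values, so comparing against a short window of recent hashes suppresses repeated events from a static scene.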

### What’s *not* in the demo (but exists in the full system)
- Qwen vision observer  
- Whisper audio capture  
- LLM reasoning layers  
- Video clip recording  

The demo is intentionally small and self-contained — it represents ETHEL’s midbrain/spine, not the full architecture.

For full details, see:

**`midbrain_demo/README.md`**

-----

NOTE: The current implementation is private; this repository documents the architecture and behavior of the system.

Pipeline Overview (Parts)

Part 1 – Inputs (Camera & Mic)
- Takes a single primary video source (RTSP, USB, or file).
- Takes a single audio source for ambient sound and speech.

-----

Part 2 – Eyes (Video Detector / Recorder – detectorv3.py)

Eyes = what happened, frame by frame, and where the evidence is stored

What it does:
- Captures video from the configured source (RTSP/USB/file).
- Runs YOLO-based detection on the stream.
- Maintains a rolling 15-minute MKV buffer:
  - Writes fixed-length 15-minute recording chunks.
  - Keeps a 7-day history of these MKVs for later re-scan or model retraining.
- Generates a continuous stream of still frames (≈3 fps) into a rolling 15-minute stills buffer used by Qwen.
- Computes motion scores and perceptual hashes to:
  - decide when something counts as an “event”
  - reduce duplicate events.
- Creates structured event JSONL files with:
  - timestamps
  - object detections (boxes, classes, confidences)
  - motion metrics
  - links to stills and clips.
- Cleans up old stills and recordings according to retention rules.
- Emits everything into a standard directory tree (record/, events/, rolling media/), so later parts can treat it as a stable data source.
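The structured event JSONL described above can be sketched as one JSON object per line; the field names here are illustrative guesses, since the actual schema is private:

```python
import json
import time

def make_event(detections, motion_score, still_path, clip_path):
    """Build one event record. Field names are illustrative, not the
    repo's actual schema."""
    return {
        "ts": time.time(),            # epoch timestamp
        "detections": [               # YOLO outputs: class, confidence, box
            {"cls": c, "conf": conf, "box": box} for c, conf, box in detections
        ],
        "motion": motion_score,       # motion metric for this event
        "still": still_path,          # link to the still in the rolling buffer
        "clip": clip_path,            # link to the MKV chunk
    }

def append_event(path, event):
    """Append one event as a single JSON line (the JSONL convention)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
```

One line per event keeps the files append-only and trivially streamable, which is what lets the Journaler tail them later.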

-----

Part 2.5 – Lens (Qwen Vision Server + Adapter – qwen_server.py, vision_qwen.py) 

Lens = turn these stills into concise language.

What it does:
- Wraps Qwen2-VL as a local HTTP server, using an OpenAI-style /v1/chat/completions interface.
- Uses an adapter (vision_qwen.py) to:
  - load Qwen with int8/int4/fp16 quantization
  - downscale images to a fixed max side
  - build proper chat templates for text-only or image+text queries.
- Supports:
  - text-only prompts
  - image+text prompts for vision questions
  - “fake streaming” by chunking single outputs.
- On visual requests, finds the latest still in the rolling media buffer and sends it to Qwen with an instruction prompt.
- Returns short, single-sentence captions for each burst of stills (“a person walks through the room and sits down”).
- Acts as the visual “front end” for both:
  - the Qwen Observer (below)
  - any tools that need direct visual descriptions.
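Because the server speaks the OpenAI-style /v1/chat/completions format, an image+text request body can be built like the sketch below. The model name, data-URL convention, and endpoint are assumptions about this particular server, not documented values:

```python
import base64

def build_vision_request(image_bytes, question, model="qwen2-vl"):
    """Build an OpenAI-style chat payload carrying one image and one
    question. Model name and data-URL handling are assumptions."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 64,  # captions are short, single sentences
    }

# The payload would be POSTed as JSON to the local server, e.g.
# http://localhost:8000/v1/chat/completions (host/port are placeholders).
```

Embedding the image as a base64 data URL keeps the request a single self-contained JSON body, which is why OpenAI-compatible adapters commonly accept it.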

-----

Part 2.8 – Ears (Audio Ingest + Whisper – 2_audio.py)

What it does:
- Uses FFmpeg to stream raw audio from the source.
- Runs VAD (webrtcvad) to detect actual speech vs background noise.
- Chunks speech into WAV files and drops very short/quiet fragments to reduce junk.
- Sends speech chunks to a background Whisper worker for transcription.
- Writes output into:
  - events/YYYY-MM-DD/audio/chunks/… (audio files)
  - events/YYYY-MM-DD/audio/transcripts.jsonl (aligned transcripts).
- Handles FFmpeg restarts with backoff and logs failures separately.
- Keeps everything time-aligned so the Journaler can treat audio like just another event stream.
- Right now, Whisper logs to disk; it isn’t yet wired directly through the bridge to Llama.
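The "drop very short/quiet fragments" filter can be sketched as a duration-plus-RMS gate on each VAD-produced chunk; the thresholds below are illustrative guesses, not values from the repo:

```python
def keep_chunk(samples, sample_rate=16000, min_seconds=0.3, min_rms=200):
    """Decide whether a speech chunk is worth sending to Whisper.
    `samples` is 16-bit PCM as a list of ints; thresholds are guesses."""
    duration = len(samples) / sample_rate
    if duration < min_seconds:
        return False                          # too short to be real speech
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms >= min_rms                     # too quiet -> likely noise floor
```

Gating before transcription matters because Whisper happily "transcribes" silence and hum into hallucinated words, so cheap filtering here reduces junk in the transcript stream.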

-----

Part 2.9 – Lens-Observer (Qwen Observer – qwen_observer.py)

The “visual narrator” that glues Eyes + Lens + DB together.

Lens-Observer = what just happened in the last second or two, in one sentence?

What it does:
- Watches events/YYYY-MM-DD/.../stills/ for new stills coming from Eyes.
- Groups stills into ~1-second bursts (small temporal windows).
- For each burst:
  - sends multiple frames to the Qwen server (Lens)
  - asks for a single concise caption of the burst
  - de-duplicates captions that are near-identical to avoid spam
  - inserts one row per burst into vision_events in the SQLite journal, including:
    - timestamp
    - caption text
    - list of frame paths used.
- Runs continuously with a small polling interval and batching window.

-----

Part 3 – Journaler (SQLite Event Store – journalerv2.py)

Journaler = make all this chaos queryable.

What it does:
- Consumes outputs from:
  - Eyes (events, boxes, clips, MKV refs)
  - Ears (transcripts)
  - Lens-Observer (vision event captions).
- Creates and maintains the journal database with tables like:
  - meta
  - clips
  - events
  - boxes
  - captions
  - responses
  - vision_events (linked to still bursts).
- Uses WAL mode and tuned pragmas for high write throughput.
- Normalizes paths and IDs so every event is queryable by time and type.
- Aligns everything to time segments (e.g., 15-minute chunks) for easier rollups.
- Is idempotent and schema-safe: can be re-run without destroying existing data.
- Acts as the single source of truth for “what ETHEL has seen and heard.”
- Feeds downstream summary/analytics jobs (Parts 4 and 5).

-----

Part 4 – Summarizer (Hourly + Daily Rollups – stage4_summarizer.py)

-- This runs periodically, not as a permanent process.

Summarizer = yesterday on one page

What it does:
- Connects read-only to the journal DB.
- Produces hourly rollups into summaries/summ_hourly.jsonl:
  - counts of events, motion, captions
  - speech totals (seconds, words, characters)
  - object and presence stats.
- Produces daily summaries into summaries/YYYY-MM-DD.json:
  - aggregated event patterns
  - who/what appeared that day
  - activity profile across the day.
- Verifies that summarized IDs actually exist in the DB (sanity checking).
- Acts as an intermediate compression layer between raw events and long-term analytics.
- Gives higher-level parts a “what today looked like” view instead of raw events.
- Can be re-run to regenerate summaries if

[truncated…]
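The hourly rollup can be sketched as a single GROUP BY over a minimal events table keyed by epoch timestamp; this is an illustration of the grouping, not the private summarizer, and the table shape is assumed:

```python
import sqlite3

def hourly_rollup(db):
    """Count events per hour and kind over a minimal events(ts, kind)
    table, where ts is a Unix epoch timestamp."""
    rows = db.execute("""
        SELECT strftime('%Y-%m-%d %H:00', ts, 'unixepoch') AS hour,
               kind,
               COUNT(*) AS n
        FROM events
        GROUP BY hour, kind
        ORDER BY hour
    """).fetchall()
    return [{"hour": h, "kind": k, "count": n} for h, k, n in rows]
```

Doing the bucketing in SQL keeps the summarizer a read-only consumer of the journal, matching the "connects read-only" design above.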
