AGENTS / GITHUB / ETHEL
githubinferredactive

ETHEL

provenance:github:MoltenSushi/ETHEL
WHAT THIS AGENT DOES

ETHEL is a local AI system designed to continuously observe and learn about a single environment, like a home or office. It uses cameras and microphones to record what it sees and hears, creating a long-term memory of the space. The system builds an understanding of the environment through repeated observations and changes, logging everything in a transparent and human-readable format. ETHEL is not a chatbot; instead, it acts as a continuous observer, providing a record of events and patterns within its designated area. Developers and researchers interested in environmental monitoring, emergent AI behavior, or creating localized AI systems would find it useful. Its distinctive feature is its local-first design, ensuring all data and processing remain on a single machine without relying on cloud services.

PROBLEM IT SOLVES

ETHEL solves the problem of needing a persistent, automated record of activity in a specific location. Instead of manually documenting events or relying on simpler, less context-aware tools, ETHEL provides a continuous, evolving understanding of the environment, identifying patterns and changes over time.

View Source ↗First seen 6mo agoNot yet hireable

CAPABILITIES & CONSTRAINTS

TECH & STACK
pythonsqliteyolocomputer-visionlocal-aiobject-detectionqwen
README
TL;DR
- ETHEL is a fully local AI system that watches a single environment, writes down what it sees and hears, and builds a long-term memory of that space. The core pipeline is working today (vision → captions → database → summaries → analytics → chat). A per-entity weight system (novelty, comfort, expectation) and proper identity layer (“Sparky vs dog”) are designed but not mathematically specified or implemented yet.

-----

ETHEL:

Emergent Tethered Habitat-aware Engram Lattice

ETHEL is a local, self-contained system built from multiple open-source components working together: vision, audio, language, and memory. It runs on a single Windows machine, stays in one environment, and forms an understanding of that space through repetition, absence, observation of interactions, and change.

This isn’t a single model. ETHEL is a stack of parts, each doing one job, connected by a simple bridge so they act like one system.

-----

What ETHEL Is Trying to Be --
- A continuous observer of one physical space, not a general chatbot.
- Local-first: no cloud calls; all models and data live on the machine.
- Transparent: every detection, caption, summary, and decision is logged in human readable entries -- plain text or SQLite.
- A long-running experiment in environmental continuity and emergent bias/personality from accumulated experience.

-----

## Midbrain Demo

A runnable demonstration of ETHEL’s midbrain pipeline is included in the repository under:

**`midbrain_demo/`**

This demo shows the core perception flow:

```
Video → JSONL events → SQLite journal → Hourly/Daily summaries
```

It includes three standalone scripts (detector, journaler, summarizer) and a dedicated README with full instructions.

### What’s in the demo
- Motion detection  
- YOLO object detection  
- Novelty via pHash  
- Track enter/exit events  
- Burst detection + motion summaries  
- Ingestion into a SQLite journal  
- Hourly and daily summary generation  

### What’s *not* in the demo (but exists in the full system)
- Qwen vision observer  
- Whisper audio capture  
- LLM reasoning layers  
- Video clip recording  

The demo is intentionally small and self-contained — it represents ETHEL’s midbrain/spine, not the full architecture.

For full details, see:

**`midbrain_demo/README.md`**

-----

NOTE: The current implementation is private: this repository documents the architecture and behavior of the system.

Pipeline Overview (Parts)

Part 1 – Inputs (Camera & Mic)
- Takes a single primary video source (RTSP, USB, or file).
- Takes a single audio source for ambient sound and speech.

-----

Part 2 – Eyes (Video Detector / Recorder – detectorv3.py)

Eyes = what happened, frame by frame, and where the evidence is stored

What it does:
- Captures video from the configured source (RTSP/USB/file).
- Runs YOLO-based detection on the stream.
- Maintains a rolling 15-minute MKV buffer:
  - Writes fixed-length 15-minute recording chunks.
  - Keeps a 7-day history of these MKVs for later re-scan or model retraining.
- Generates a continuous stream of still frames (≈3 fps) into a rolling 15-minute stills buffer used by Qwen.
- Computes motion scores and perceptual hashes to:
  - decide when something counts as an “event”
  - reduce duplicate events.
- Creates structured event JSONL files with:
  - timestamps
  - object detections (boxes, classes, confidences)
  - motion metrics
  - links to stills and clips.
- Cleans up old stills and recordings according to retention rules.
- Emits everything into a standard directory tree (record/, events/, rolling media/), so later parts can treat it as a stable data source.

-----

Part 2.5 – Lens (Qwen Vision Server + Adapter – qwen_server.py, vision_qwen.py) 

Lens = turn these stills into concise language.

What it does:
- Wraps Qwen2-VL as a local HTTP server, using an OpenAI-style /v1/chat/completions interface.
- Uses an adapter (vision_qwen.py) to:
  - load Qwen with int8/int4/fp16 quantization
  - downscale images to a fixed max side
  - build proper chat templates for text-only or image+text queries.
- Supports:
  - text-only prompts
  - image+text prompts for vision questions
  - “fake streaming” by chunking single outputs.
- On visual requests, finds the latest still in the rolling media buffer and sends it to Qwen with an instruction prompt.
- Returns short, single-sentence captions for each burst of stills (“a person walks through the room and sits down”).
- Acts as the visual “front end” for both:
  - the Qwen Observer (below)
  - any tools that need direct visual descriptions.

-----

Part 2.8 – Ears (Audio Ingest + Whisper – 2_audio.py)

What it does:
- Uses FFmpeg to stream raw audio from the source.
- Runs VAD (webrtcvad) to detect actual speech vs background noise.
- Chunks speech into WAV files and drops very short/quiet fragments to reduce junk.
- Sends speech chunks to a background Whisper worker for transcription.
- Writes output into:
  - events/YYYY-MM-DD/audio/chunks/… (audio files)
  - events/YYYY-MM-DD/audio/transcripts.jsonl (aligned transcripts).
- Handles FFmpeg restarts with backoff and logs failures separately.
- Keeps everything time-aligned so the Journaler can treat audio like just another event stream.
- Right now, Whisper logs to disk; it isn’t yet wired directly through the bridge to Llama.

-----

Part 2.9 – Lens-Observer (Qwen Observer – qwen_observer.py)

The “visual narrator” that glues Eyes + Lens + DB together.

Lens-Observer = what just happened in the last second or two, in one sentence?

What it does:
- Watches events/YYYY-MM-DD/.../stills/ for new stills coming from Eyes.
- Groups stills into ~1-second bursts (small temporal windows).
- For each burst:
  - sends multiple frames to the Qwen server (Lens)
  - asks for a single concise caption of the burst.
  - De-duplicates captions that are near-identical to avoid spam.
  - Inserts one row per burst into vision_events in the SQLite journal, including:
    - timestamp
    - caption text
    - list of frame paths used.
- Runs continuously with a small polling interval and batching window.

-----

Part 3 – Journaler (SQLite Event Store – journalerv2.py)

Journaler = make all this chaos queryable.

What it does:
- Consumes outputs from:
  - Eyes (events, boxes, clips, MKV refs)
  - Ears (transcripts)
  - Lens-Observer (vision event captions).
- Creates and maintains the journal database with tables like:
  - meta
  - clips
  - events
  - boxes
  - captions
  - responses
  - vision_events (linked to still bursts).
- Uses WAL mode and tuned pragmas for high write throughput.
- Normalizes paths and IDs so every event is queryable by time and type.
- Aligns everything to time segments (e.g., 15-minute chunks) for easier rollups.
- Is idempotent and schema-safe: can be re-run without destroying existing data.
- Acts as the single source of truth for “what ETHEL has seen and heard.”
- Feeds downstream summary/analytics jobs (Parts 4 and 5).

-----

Part 4 – Summarizer (Hourly + Daily Rollups – stage4_summarizer.py)

-- This runs periodically, not as a permanent process.

Summarizer = yesterday on one page

What it does:
- Connects read-only to the journal DB.
- Produces hourly rollups into summaries/summ_hourly.jsonl:
  - counts of events, motion, captions
  - speech totals (seconds, words, characters)
  - object and presence stats.
- Produces daily summaries into summaries/YYYY-MM-DD.json:
  - aggregated event patterns
  - who/what appeared that day
  - activity profile across the day.
- Verifies that summarized IDs actually exist in the DB (sanity checking).
- Acts as an intermediate compression layer between raw events and long-term analytics.
- Gives higher-level parts a “what today looked like” view instead of raw events.
- Can be re-run to regenerate summaries if

[truncated…]

PUBLIC HISTORY

First discoveredMar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub
first seenNov 14, 2025
last updatedNov 23, 2025
last crawled2 months ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:MoltenSushi/ETHEL)