
# habit-builder-ai-agent


An end-to-end AI agent project that transcribes audio files, embeds user queries, and searches both a Qdrant vector database and the web via the Brave Search API. A Streamlit interface powered by OpenAI GPT models delivers actionable health insights from both the archive and the latest research.

![python](https://img.shields.io/badge/Python-3.11.14-3776AB.svg?style=flat&logo=python)
![postgres](https://img.shields.io/badge/PostgreSQL-17.0-4169E1.svg?style=flat&logo=postgresql)
![docker](https://img.shields.io/badge/Docker-28.5.1-2496ED.svg?style=flat&logo=docker)
![pydantic](https://img.shields.io/badge/Pydantic-2.12.2-E92063.svg?style=flat&logo=pydantic)
![brave](https://img.shields.io/badge/Brave-FB542B.svg?style=flat&logo=brave)
![qdrant](https://img.shields.io/badge/Qdrant-v1.16-DC244C.svg?style=flat)

## Introduction
Building lasting habits is challenging without understanding the *why* behind them. The Habit Builder AI Agent bridges this gap by drawing on the Huberman Lab podcast archive—a top 5 podcast on Apple and Spotify with over 7 million YouTube subscribers—to transform complex scientific insights into actionable habits supported by the latest research.

This AI agent acts as a personalized coach that delivers relevant knowledge from the podcast's extensive archive, searches the web for current research, and with access to all of this knowledge, recommends actionable takeaways grounded in expert interviews and scientific evidence.

Audio files are downloaded via RSS, transcribed with Faster Whisper, chunked with a sliding window, and embedded with the Hugging Face Sentence Transformers model `all-mpnet-base-v2`. Qdrant stores the embeddings, and Streamlit provides the interface for interacting with the agent. You can run the Streamlit app locally by replicating this project, or visit the [Streamlit cloud version](https://habit-builder-ai-agent.streamlit.app/), shown below:

<img src='diagrams/habit-builder-ai-agentv2--speed.gif'>
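
The sliding-window chunking step can be sketched as below. The window and overlap sizes here (1,000 characters with 200 characters of overlap) are illustrative assumptions, not the repository's actual settings; the real pipeline then embeds each chunk with `all-mpnet-base-v2` via `sentence-transformers`.

```python
def chunk_transcript(text: str, window: int = 1000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping character windows (sliding window)."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break
    return chunks

# Each chunk would then be embedded, e.g.:
# model = SentenceTransformer("all-mpnet-base-v2"); vectors = model.encode(chunks)
```

The overlap keeps sentences that straddle a window boundary intact in at least one chunk, which helps retrieval quality.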


The agent is implemented with Pydantic's `BaseModel` for strict Python data validation and PydanticAI's `Agent` class for structured output and agent tooling. OpenAI's `gpt-4o-mini` powers the reasoning, and the agent's tools include searching the knowledge base, retrieving recent research articles, and summarizing the current state of research on a requested topic.
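
PydanticAI's `Agent` class and its tool decorators handle tool registration and model calls for real; the pure-Python sketch below only illustrates the register-and-dispatch pattern behind agent tooling. All names here are hypothetical, not the PydanticAI API or this repository's code.

```python
from typing import Callable

class ToolRegistry:
    """Minimal sketch of the tool-dispatch idea an agent framework provides."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable] = {}

    def tool(self, fn: Callable) -> Callable:
        """Decorator: register a function under its own name."""
        self._tools[fn.__name__] = fn
        return fn

    def dispatch(self, name: str, **kwargs):
        """Call a registered tool by name, as the model's tool-call would."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

registry = ToolRegistry()

@registry.tool
def search_knowledge_base(query: str) -> str:
    # Placeholder body: the real tool embeds `query` and searches Qdrant.
    return f"results for: {query}"
```

In the actual project, the model decides which tool to call and with what arguments; the framework validates those arguments against the tool's signature before dispatching.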

Testing and evaluation combine vibe checks, unit tests (via the pytest framework), and an LLM judge. Logging and monitoring can be done locally or with Pydantic Logfire.
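
An LLM judge typically returns a structured verdict that must be validated before it is trusted. The schema below (`verdict`/`score` fields, a 1–5 scale) is a hypothetical example of that pattern, not the repository's actual judge format.

```python
import json

ALLOWED_VERDICTS = {"pass", "fail"}

def parse_judge_verdict(raw: str) -> dict:
    """Validate a judge reply like {"verdict": "pass", "score": 4}."""
    data = json.loads(raw)
    if data.get("verdict") not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    score = data.get("score")
    if not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score must be an int in 1..5, got {score!r}")
    return data
```

Strict validation like this is the same reason the agent itself uses Pydantic models: malformed model output fails loudly instead of silently corrupting results.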

The diagram below outlines the development flow and supporting services.

<img src='diagrams/workflow.png'>


## Setup
0. Clone this repo:
    ```bash
    git clone https://github.com/vtdinh13/habit-builder-ai-agent.git
    ```

1. `uv` manages Python packages and the virtual environment. To replicate the project you can use either `uv` or `pip`, but using `uv` will match this repository’s workflow most closely. Choose Option 1 or 2, but not both.

    *Option 1*: Manage with uv  
    - Install `uv` if it is not already on your system. See [Astral documentation](https://docs.astral.sh/uv/getting-started/installation/) for installation steps.  
    - Run `uv sync` to install all required packages into uv's managed virtual environment.
      
    *Option 2*: Manage with pip  
      - Run `pip install -r requirements.txt`.
2. Docker Desktop runs the Docker Engine daemon.
    - Download Docker Desktop if needed (refer to the [Docker Desktop docs](https://docs.docker.com/desktop/)).
    - Start Docker Desktop so the Docker Engine daemon is available.
    - Run `docker-compose up` to start every service, or `docker-compose up -d` to run them detached, leaving the containers in the background while you keep working in the terminal.
3. API keys are required.
    
    *Required keys*

    - The agent uses OpenAI models. Sign up for an [OpenAI API key](https://auth.openai.com/create-account) if you don't already have one. 
    - The Brave API powers the web search tool. Register for a [Brave API key](https://api-dashboard.search.brave.com/register).

    *Optional keys*
      
      - The local vector database is sufficient, but if you want to upload embeddings to Qdrant Cloud, generate an API key from [Qdrant](https://login.cloud.qdrant.io/u/signup/identifier?state=hKFo2SBfRTd1VlpiZHlTRFJ5a1NoUGp4T20yenJDSzhsUHI4baFur3VuaXZlcnNhbC1sb2dpbqN0aWTZIGsxZ1RDOUc0U2UxMlNjNkdWbktLcXBneEM0em9WMlNJo2NpZNkgckkxd2NPUEhPTWRlSHVUeDR4MWtGMEtGZFE3d25lemc).


4. API keys are managed via `direnv`. Keys live in a `.env` file, and `.envrc` contains `dotenv` so the values load automatically. Example:

    ```
    OPENAI_API_KEY=openai_api_key_value
    BRAVE_API_KEY=brave_api_key_value
    ```

5. If you want to skip `direnv` and `.env` entirely, export the keys and their values directly:
    ```bash
    export OPENAI_API_KEY="openai_api_key_value"
    export BRAVE_API_KEY="brave_api_key_value"
    ```
    The API key values are now available in your current shell session. Exports apply only to this session; you'll need to export them again when you revisit this project.
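
Step 2's `docker-compose up` assumes a `docker-compose.yml` at the repo root. That file isn't reproduced here; a minimal hypothetical definition of just the Qdrant service (the image tag and volume path are assumptions) might look like:

```yaml
# Hypothetical minimal compose file; the repo's actual file may differ.
services:
  qdrant:
    image: qdrant/qdrant:v1.16.0
    ports:
      - "6333:6333"          # REST API and dashboard
    volumes:
      - ./qdrant_storage:/qdrant/storage   # persist collections across restarts
```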
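
Whichever route you choose, it is worth confirming the keys are actually visible to Python before running the agent. This small helper is an illustrative check, not part of the repository:

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "BRAVE_API_KEY")

def missing_keys(required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the names of required API keys not set in the environment."""
    return [name for name in required if not os.environ.get(name)]

# Example: print(missing_keys()) should print [] once both keys are exported.
```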
## Ingestion

0. Downloading and transcribing audio files is a project on its own. A Parquet file containing transcripts is provided to avoid this step. See [Ingestion](ingestion/README.md) if you'd like to replicate the transcription process yourself.

1. Make sure Docker Desktop is running.
  
2. Start the qdrant service:

    ```
    docker-compose up qdrant -d
    ```
  

3. The speed of chunking, embedding, and upserting into Qdrant depends on your processor. Adjust the Parquet and embedding batch sizes to your machine's capability; the default for both is 128. Chunking and uploading embeddings for the full archive to the local Qdrant vector database takes about 2 hours.

    ```bash
    uv run python ingestion/ingest_qdrant.py \
      --parquet-path transcripts/transcripts.parquet \
      --collection-name transcripts \
      --distance cosine \
      --parquet-batch-size 128 \
      --embedding-batch-size 128 \
      --target local
    ```
    
    <u>*Recommendation*</u>: Add the `--limit` argument to process only a sample of transcripts (each row corresponds to one episode/transcript). For example, `--limit 100` chunks and uploads the first 100 transcripts. This cuts the processing time to about 40 minutes. 

      ```bash
      uv run python ingestion/ingest_qdrant.py \
        --parquet-path transcripts/transcripts.parquet \
        --collection-name transcripts \
        --distance cosine \
        --parquet-batch-size 128 \
        --embedding-batch-size 128 \
        --target local \
        --limit 100
      ```

  4. If you are using pip to manage your packages, run the following instead:

      ```bash
      python ingestion/ingest_qdrant.py \
        --parquet-path transcripts/transcripts.parquet \
        --collection-name transcripts \
        --distance cosine \
        --parquet-batch-size 128 \
        --embedding-batch-size 128 \
        --target local \
        --limit 100
      ```

  5. *Optional*: You can see your data in the Qdrant dashboard: http://localhost:6333/dashboard. 
  
      <img src=diagrams/qdrant_dashboard.png>

  6. Keep the Qdrant service up to run the agent. Shut it down when it is no longer necessary: `docker-compose down qdrant`.
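
The `--parquet-batch-size` and `--embedding-batch-size` flags above control how many rows are grouped per read and per embedding/upsert call. A generic sketch of the batching pattern involved (illustrative, not the repo's actual code):

```python
from typing import Iterable, Iterator

def batched(rows: Iterable, batch_size: int = 128) -> Iterator[list]:
    """Yield successive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch
```

Smaller batches lower peak memory use at the cost of more embedding calls, which is why the flags exist to tune for your machine.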

## Agent
1. The Qdrant database is ready for querying; note that the Qdrant service must be running via Docker and your API keys must be available in your current environment. Testing on the CLI allows only one question at a time, with no follow-up questions.
    - with uv:
      ```bash
      uv run habit_agent_run.py
      ```
  
    - with pip: 
      ```bash
      python habit_agent_run.py
      ```
2. You can also run the agent locally on Streamlit. This option supports streamed responses and follow-up conversation. Run the following command on the CLI:
    - with uv: 
      ```bash
      uv run streamlit run qdrant_app_no_logfire.py
      ```
    - with pip: 
      ```bash
      streamlit run qdrant_app_no_logfire.py
      ```
3. A window should pop up in your browser giving you access to the streamlit app. Paste this link in you
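
For context on what `--distance cosine` means for the knowledge-base search tool: Qdrant ranks stored vectors by cosine similarity to the embedded query. A pure-Python sketch of that scoring (Qdrant itself does this far more efficiently, with indexing):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], points: list[dict], k: int = 3) -> list[dict]:
    """Return the k stored points most similar to the query vector."""
    return sorted(points,
                  key=lambda p: cosine_similarity(query, p["vector"]),
                  reverse=True)[:k]
```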

[truncated…]
