
ai-phone-negotiator

provenance:github:FilHouston/ai-phone-negotiator

AI voice agent that calls customer hotlines and negotiates better contract terms

# 📞 AI Phone Negotiator

> An autonomous AI voice agent that calls customer hotlines and negotiates better contract terms — on your behalf.

[![TypeScript](https://img.shields.io/badge/TypeScript-5.9-blue?logo=typescript)](https://www.typescriptlang.org/)
[![Node.js](https://img.shields.io/badge/Node.js-≥20-green?logo=node.js)](https://nodejs.org/)
[![OpenAI Realtime](https://img.shields.io/badge/OpenAI-Realtime_API-412991?logo=openai)](https://platform.openai.com/docs/api-reference/realtime)
[![Twilio](https://img.shields.io/badge/Twilio-Voice_API-F22F46?logo=twilio)](https://www.twilio.com/docs/voice)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## The Problem

Calling your ISP to negotiate a better deal means:

- ⏳ 20–45 minutes in hold queues
- 🤖 Navigating confusing IVR phone menus
- 🎭 Arguing with trained retention agents
- 😤 Repeating the same conversation every 12 months

**What if an AI agent could do all of that for you?**

## The Solution

AI Phone Negotiator is a fully autonomous voice agent that:

1. **Dials the hotline** — initiates an outbound call via Twilio Voice
2. **Navigates the IVR** — listens to menu options and sends DTMF tones or voice commands
3. **Waits on hold** — detects hold music via energy-based VAD and **disconnects the AI** to save cost (~$9/30 min saved)
4. **Reconnects instantly** — when a human agent picks up, the AI reconnects in <2 seconds
5. **Negotiates** — engages with the human agent using a multi-phase strategy (friendly → competitor leverage → cancellation threat → retention)
6. **Logs offers** — extracts and records any counter-offers via function calling
7. **Never commits** — always defers final decisions to the human ("I'll discuss this with my client")

The agent handles the call. You make the decision.

---

## Architecture

```
┌──────────────┐     ┌───────────────┐     ┌──────────────────┐
│  Dashboard   │────▶│   Backend     │────▶│  Twilio Voice    │
│  (Web UI)    │◀────│   (Express)   │◀────│  (Outbound Call) │
└──────────────┘     └───────┬───────┘     └────────┬─────────┘
       ▲ WS                  │                      │
       │              ┌──────▼───────┐     ┌────────▼─────────┐
       │              │  Agent Brain │◀───▶│  Media Streams    │
       │              │  (OpenAI     │     │  (WebSocket)      │
       │              │   Realtime)  │     │  µ-law 8kHz       │
       │              └──────┬───────┘     └──────────────────┘
       │                     │
       │              ┌──────▼───────┐     ┌──────────────────┐
       │              │  Tool Layer  │     │  Hold Detector   │
       └──────────────│  ├ DTMF      │     │  Energy-based    │
                      │  ├ Call Ctrl │     │  VAD on µ-law    │
                      │  ├ Offers    │     │  ┌────────────┐  │
                      │  └ Transcript│     │  │IVR→HOLD→   │  │
                      └──────────────┘     │  │   HUMAN    │  │
                                           │  └────────────┘  │
                                           └──────────────────┘
```

### Data Flow

1. **Twilio** places an outbound call and opens a bidirectional **Media Stream** (WebSocket, µ-law 8kHz)
2. The **Hold Detector** analyzes incoming audio frames in real time — classifying the call phase as `IVR`, `HOLD`, or `HUMAN`
3. During **IVR** and **HUMAN** phases: audio is forwarded to the **OpenAI Realtime API** for speech processing and response generation
4. During **HOLD** phase: the OpenAI WebSocket is **disconnected** — no AI cost while waiting
5. When the Hold Detector hears a human voice after a hold, the OpenAI connection is re-established in <2 seconds and seeded with context ("You were just on hold, a human agent is now speaking")
6. **Function calling** enables the agent to send DTMF tones, log offers, and control the call
7. The **Dashboard** provides real-time monitoring via WebSocket — live transcript, phase indicator, hold timer, and manual override controls
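
The gating in steps 2 to 5 can be pictured as a small state machine. The sketch below is only an illustration, not the code in `media-stream.ts`: the `AudioClass` labels, the `stepGate` function, and the rule that music always means `HOLD` are assumptions made for the example.

```typescript
// Hypothetical sketch of hold gating as a pure state machine.
type CallPhase = "IVR" | "HOLD" | "HUMAN";
// Assumed output of an upstream audio classifier (not specified in this README).
type AudioClass = "speech" | "music" | "silence";
type GateAction = "connect_ai" | "disconnect_ai" | "none";

interface GateState {
  phase: CallPhase;
  aiConnected: boolean;
}

// Assumption: music moves the call to HOLD; speech after HOLD means a human picked up.
function nextPhase(phase: CallPhase, audio: AudioClass): CallPhase {
  if (audio === "music") return "HOLD";
  if (phase === "HOLD" && audio === "speech") return "HUMAN";
  return phase;
}

// Returns the new state plus the action the audio bridge should take.
function stepGate(
  state: GateState,
  audio: AudioClass
): { state: GateState; action: GateAction } {
  const phase = nextPhase(state.phase, audio);
  const shouldBeConnected = phase !== "HOLD"; // AI is only needed outside HOLD
  let action: GateAction = "none";
  if (shouldBeConnected && !state.aiConnected) action = "connect_ai";
  if (!shouldBeConnected && state.aiConnected) action = "disconnect_ai";
  return { state: { phase, aiConnected: shouldBeConnected }, action };
}
```

Keeping the transition logic free of I/O makes it easy to test; the WebSocket connect and disconnect calls would live in whatever code consumes the returned action.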

---

## Features

### 🎯 Strategy Engine
Pluggable JSON profiles per provider with customer data, leverage points, competitor offers, escalation paths, and hard safety rules. Provider-specific verification fields (PIN, IBAN, DOB) are only disclosed when explicitly asked.
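
As a rough illustration of how a profile could feed the system prompt, here is a minimal sketch. Only `provider` and `hotline` appear in the example profile in this README; `leveragePoints`, `hardRules`, `verification`, and the `buildSystemPrompt` function are hypothetical names chosen for the example and may not match `strategy.ts`.

```typescript
// Hypothetical profile shape; only `provider` and `hotline` are confirmed by the README.
interface StrategyProfile {
  provider: string;
  hotline: string;
  leveragePoints: string[];             // assumed field name
  hardRules: string[];                  // assumed field name
  verification: Record<string, string>; // assumed field name, e.g. { PIN: "..." }
}

// Builds a plain-text system prompt from a profile.
function buildSystemPrompt(profile: StrategyProfile): string {
  const lines: string[] = [
    `You are negotiating with ${profile.provider} on behalf of a client.`,
    "Never accept, change, or confirm any contract. Defer every decision to the client.",
  ];
  if (profile.leveragePoints.length > 0) {
    lines.push("Leverage points:");
    for (const point of profile.leveragePoints) lines.push(`- ${point}`);
  }
  if (profile.hardRules.length > 0) {
    lines.push("Hard rules:");
    for (const rule of profile.hardRules) lines.push(`- ${rule}`);
  }
  const fields = Object.keys(profile.verification);
  if (fields.length > 0) {
    // Verification data is listed with an explicit "only when asked" instruction.
    lines.push("Disclose these only when the agent explicitly asks for them:");
    for (const field of fields) lines.push(`- ${field}: ${profile.verification[field]}`);
  }
  return lines.join("\n");
}
```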

### 📊 Dashboard
Dark-themed web UI with three views:
- **Start** — select a strategy profile, view call history, initiate calls
- **Live** — real-time transcript, offer cards, phase indicator, hold timer, emergency disconnect
- **Result** — call summary with all logged offers

### ✏️ Strategy Editor
Create and manage negotiation profiles directly in the dashboard — no JSON editing required. Dynamic forms adapt to the selected provider (Vodafone PIN fields, Telekom IBAN/DOB, etc.). Sensitive fields are masked in the UI.

### 💤 Hold Gating
Energy-based Voice Activity Detection on raw µ-law audio frames. When hold music is detected, the OpenAI Realtime API WebSocket is disconnected entirely. A 30-minute hold queue costs ~$0.06 instead of ~$9. Auto-timeout at 45 minutes.
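
To make "energy on raw µ-law frames" concrete, here is a minimal sketch that expands G.711 µ-law bytes to 16-bit linear samples and computes the RMS energy of a frame. The function names and the threshold are assumptions for illustration; the detector in this repo may measure energy differently, and energy alone does not separate music from speech.

```typescript
// Expand one G.711 µ-law byte to a signed 16-bit linear PCM sample.
function muLawToPcm16(byte: number): number {
  const u = ~byte & 0xff;          // µ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Root-mean-square energy of one frame of µ-law audio
// (at 8 kHz, a 20 ms frame is 160 bytes).
function frameRms(frame: Uint8Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const sample = muLawToPcm16(frame[i]);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Assumed threshold, to be tuned against real call audio.
const ACTIVE_RMS_THRESHOLD = 500;

function isActiveFrame(frame: Uint8Array): boolean {
  return frameRms(frame) > ACTIVE_RMS_THRESHOLD;
}
```

A classifier built on this would likely look at how activity is distributed over time (for example, continuous energy versus bursts with pauses) rather than at single frames.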

### 🔒 Safety
- Agent **never** accepts, changes, or confirms any contract
- Only shares pre-approved customer data, only when explicitly asked
- All offers are logged via function calling — nothing is agreed to
- Emergency kill switch in dashboard and via API
- Configurable hard rules per strategy profile
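
One way to enforce these rules is at the tool boundary: the model can only call an allowlisted set of functions, and none of them accepts a contract. The sketch below is illustrative; the tool names (`send_dtmf`, `log_offer`, `end_call`), the `Offer` shape, and `handleToolCall` are assumptions, not necessarily what `src/tools/` exposes.

```typescript
// Hypothetical offer record; `accepted` is always false because the agent never commits.
interface Offer {
  description: string;
  accepted: false;
}

interface ToolResult {
  ok: boolean;
  message: string;
}

const loggedOffers: Offer[] = [];

// Dispatches a function call from the model. Anything not listed here is rejected.
function handleToolCall(name: string, args: Record<string, unknown>): ToolResult {
  switch (name) {
    case "send_dtmf": {
      const raw: unknown = args["digits"];
      const digits = typeof raw === "string" ? raw : "";
      // Only keypad characters are allowed.
      if (!/^[0-9*#]+$/.test(digits)) {
        return { ok: false, message: "invalid DTMF digits" };
      }
      return { ok: true, message: `queued digits ${digits}` };
    }
    case "log_offer": {
      const raw: unknown = args["description"];
      const description = typeof raw === "string" ? raw.trim() : "";
      if (description === "") {
        return { ok: false, message: "offer description required" };
      }
      loggedOffers.push({ description, accepted: false });
      return { ok: true, message: "offer logged; do not accept, defer to the client" };
    }
    case "end_call":
      return { ok: true, message: "ending call" };
    default:
      // There is deliberately no tool for accepting or changing a contract.
      return { ok: false, message: `tool "${name}" is not allowed` };
  }
}
```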

---

## Tech Stack

| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Telephony** | [Twilio Voice](https://www.twilio.com/docs/voice) + [Media Streams](https://www.twilio.com/docs/voice/media-streams) | Outbound calls, DTMF, bidirectional audio |
| **Voice AI** | [OpenAI Realtime API](https://platform.openai.com/docs/api-reference/realtime) | Real-time speech processing, function calling |
| **Backend** | Express + WebSocket (Node.js) | Orchestration, audio bridge, API endpoints |
| **Frontend** | Vanilla HTML/CSS/JS | Dashboard — no build step, no framework |
| **Language** | TypeScript 5.9 | Type safety, better DX |
| **Strategy** | JSON profiles | Provider-specific negotiation playbooks |

### Why Twilio + OpenAI Realtime (not Bland.ai / Vapi)?

Managed platforms abstract away the telephony layer — convenient, but limiting. Building directly on Twilio + OpenAI Realtime gives full control over:

- **Audio pipeline** — direct Media Stream ↔ Realtime API bridge, no middleman latency
- **Hold Gating** — disconnect AI during hold queues (impossible with managed platforms)
- **Tool layer** — custom function calling for DTMF, call control, offer extraction
- **Strategy engine** — pluggable negotiation profiles per provider
- **Cost** — ~$0.45/min active negotiation, ~$0.002/min on hold
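
The per-minute figures above translate into a simple estimate. This sketch uses the approximate rates quoted in this README as constants; the function name is made up for the example, and real costs depend on current Twilio and OpenAI pricing.

```typescript
// Approximate rates quoted in this README (USD per minute).
const ACTIVE_USD_PER_MIN = 0.45; // AI connected and negotiating
const HOLD_USD_PER_MIN = 0.002;  // AI disconnected, telephony only

// Rough cost estimate for one call.
function estimateCallCostUsd(activeMinutes: number, holdMinutes: number): number {
  return activeMinutes * ACTIVE_USD_PER_MIN + holdMinutes * HOLD_USD_PER_MIN;
}

// Example: 5 minutes of negotiation after a 30-minute hold queue.
console.log(estimateCallCostUsd(5, 30).toFixed(2)); // "2.31"
```

With these rates, the 30 hold minutes contribute about $0.06, which is the figure quoted under Hold Gating.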

---

## Project Structure

```
ai-phone-negotiator/
├── src/
│   ├── server.ts            # Express + WebSocket server, static files, API routes
│   ├── config.ts            # Environment config + validation
│   ├── check.ts             # Connectivity check (Twilio + OpenAI)
│   ├── call.ts              # Outbound call initiation via Twilio REST
│   ├── media-stream.ts      # Twilio Media Stream ↔ OpenAI audio bridge + hold gating
│   ├── openai-realtime.ts   # OpenAI Realtime API WebSocket client
│   ├── strategy.ts          # Strategy engine + system prompt builder
│   └── tools/
│       ├── dtmf.ts          # DTMF tone sending
│       ├── call-control.ts  # Call lifecycle (end, hold-for-human)
│       ├── offer-logger.ts  # Offer extraction + logging via function calling
│       └── transcript.ts    # TranscriptLogger (JSON) + EventEmitter for dashboard WS
├── strategies/
│   └── example.json         # Example strategy profile (safe to commit)
├── transcripts/             # Call transcripts (gitignored)
├── .env.example             # Environment template
├── TWILIO-SETUP.md          # Twilio account + phone number setup guide
├── tsconfig.json
└── package.json
```

---

## Strategy Profiles

Negotiations follow pluggable JSON profiles:

```json
{
  "provider": "Example ISP",
  "hotline": "+49XXXXXXXXXXX",

```

[truncated…]

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Mar 3, 2026
last updated: Mar 6, 2026
last crawled: 5 days ago
