ai-engineering-handbook
The AI Engineering Handbook provides practical guidance for teams deploying AI systems in production. It addresses the problems that surface after a model first works: unexpected errors, high costs, and security vulnerabilities. It is aimed at business leaders, product managers, and technical teams responsible for AI projects. Unlike most introductory AI resources, it focuses on the engineering needed to build reliable, cost-effective applications, bridging the gap between AI theory and the practical realities of running AI systems at scale.
<div align="center">
<img src="https://img.shields.io/github/stars/pranavjangam57/ai-engineering-handbook?style=for-the-badge&logo=github&color=FFD700" alt="Stars"/>
<img src="https://img.shields.io/github/forks/pranavjangam57/ai-engineering-handbook?style=for-the-badge&logo=github&color=4A90D9" alt="Forks"/>
<img src="https://img.shields.io/github/contributors/pranavjangam57/ai-engineering-handbook?style=for-the-badge&color=2ECC71" alt="Contributors"/>
<img src="https://img.shields.io/badge/PRs-welcome-brightgreen?style=for-the-badge" alt="PRs Welcome"/>
<img src="https://img.shields.io/badge/last%20updated-2026-blue?style=for-the-badge" alt="Last Updated"/>
<br/><br/>
# 🧠 AI Engineering Handbook
### The missing manual for building AI systems that actually work in production.
**Not theory. Not tutorials. Engineering.**
*Curated guides · Production patterns · Real code · Battle-tested architecture*
<br/>
[**🚀 Start Here**](#-start-here-pick-your-path) · [**📖 Browse Topics**](#-table-of-contents) · [**🤝 Contribute**](CONTRIBUTING.md) · [**⭐ Star this repo**](https://github.com/pranavjangam57/ai-engineering-handbook)
</div>
---
## ⚡ Why This Exists
Every AI tutorial teaches you to build a chatbot in 10 minutes.
**Nobody teaches you what happens at minute 11** — when it hallucinates, costs $4,000/month, fails silently, and your boss is asking why.
This handbook covers the gap between **"it works on my laptop"** and **"it works at 3am under load."**
---
## 🚀 Start Here: Pick Your Path
| I want to... | Go here |
|---|---|
| Build my first production RAG system | [📂 RAG Engineering](#-rag--retrieval-systems) |
| Design a reliable AI agent | [📂 Agent Architecture](#-agent-engineering) |
| Stop my LLM from costing a fortune | [📂 Cost & Performance](#-cost--performance) |
| Evaluate my model properly | [📂 Evals & Observability](#-evals--observability) |
| Secure my AI application | [📂 AI Security](#-ai-security) |
| Clone a working production app | [📂 Starter Kits](#-starter-kits--templates) |
---
## 📖 Table of Contents
- [🧠 Core LLM Engineering](#-core-llm-engineering)
- [📂 RAG & Retrieval Systems](#-rag--retrieval-systems)
- [🤖 Agent Engineering](#-agent-engineering)
- [📊 Evals & Observability](#-evals--observability)
- [🔒 AI Security](#-ai-security)
- [💰 Cost & Performance](#-cost--performance)
- [🏗️ Infrastructure & Deployment](#️-infrastructure--deployment)
- [🎯 Fine-Tuning & Alignment](#-fine-tuning--alignment)
- [🚀 Starter Kits & Templates](#-starter-kits--templates)
- [📚 Curated Resources](#-curated-resources)
- [🤝 Contributing](#-contributing)
- [🏆 Contributors](#-contributors)
---
## 🧠 Core LLM Engineering
> **TL;DR:** Master the primitives before you build the system.
| Guide | Description | Difficulty |
|---|---|---|
| [Model Comparison Matrix 2026](docs/core-llm/model-comparison.md) | GPT-4o vs Claude vs Gemini vs OSS — real benchmarks | 🟢 Beginner |
| Intelligent Model Routing | Route by task type to cut costs 60%+ | 🟡 Intermediate · 🚧 Coming Soon |
| Prompt Architecture Patterns | System prompts, few-shot design, chain-of-thought | 🟡 Intermediate · 🚧 Coming Soon |
| Structured Output Reliability | JSON mode, tool use, schema enforcement | 🟡 Intermediate · 🚧 Coming Soon |
| Context Window Management | Chunking, summarization, long-context strategies | 🔴 Advanced · 🚧 Coming Soon |
| LLM API Error Handling | Rate limits, timeouts, fallbacks, retries | 🟢 Beginner · 🚧 Coming Soon |
> 📬 Want to write one of these? [Claim a topic](https://github.com/pranavjangam57/ai-engineering-handbook/issues/new?template=new_topic.md)
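
The "LLM API Error Handling" row above names rate limits, timeouts, fallbacks, and retries. Until that guide exists, here is a minimal sketch of how those pieces might fit together. Everything in it is an assumption made for illustration: `TransientLLMError` is a hypothetical stand-in for whatever retryable exception your provider SDK raises, and each provider is modelled as a plain `prompt -> text` callable rather than a real client.

```python
import random
import time
from typing import Callable, Optional, Sequence


class TransientLLMError(Exception):
    """Hypothetical stand-in for a retryable error (rate limit, timeout)."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transient errors with backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except TransientLLMError as exc:
                last_error = exc
                if attempt < max_attempts - 1:
                    # Exponential backoff plus jitter so clients do not retry in lockstep.
                    delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                    sleep(delay)
        # This provider exhausted its attempts; fall through to the next one.
    raise RuntimeError("all providers failed") from last_error
```

The `sleep` parameter is injectable so the retry logic can be tested without waiting. Non-transient errors (for example, an invalid request) are deliberately not caught, since retrying them only wastes budget.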
---
## 📂 RAG & Retrieval Systems
> **TL;DR:** RAG is 80% retrieval engineering. Get the pipeline right — the LLM is the easy part.
### Should you use RAG?
```
├── Data changes frequently? → YES: RAG is right
├── Knowledge base > 128k tokens? → YES: RAG is right
├── Need source citations? → YES: RAG is right
├── Static, small, well-defined data? → MAYBE: Consider fine-tuning
└── Real-time data required? → YES: RAG + streaming ingestion
```
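
The decision tree above can be sketched as a small function. The 128k-token threshold comes from the tree; the function name, parameter names, and the order in which the checks run are assumptions made for illustration.

```python
RAG_TOKEN_THRESHOLD = 128_000  # "Knowledge base > 128k tokens?" from the tree above


def recommend_approach(
    data_changes_frequently: bool = False,
    knowledge_base_tokens: int = 0,
    needs_citations: bool = False,
    needs_realtime_data: bool = False,
) -> str:
    """Map the decision-tree questions to a recommendation string."""
    if needs_realtime_data:
        return "RAG + streaming ingestion"
    if (
        data_changes_frequently
        or knowledge_base_tokens > RAG_TOKEN_THRESHOLD
        or needs_citations
    ):
        return "RAG"
    # Static, small, well-defined data: the tree says "maybe", not "yes".
    return "consider fine-tuning"
```

For example, `recommend_approach(knowledge_base_tokens=500_000)` returns `"RAG"`, while calling it with no arguments describes the static, small case and returns `"consider fine-tuning"`.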
| Guide | Description | Difficulty |
|---|---|---|
| [RAG System Design](docs/rag/system-design.md) | End-to-end architecture for production RAG | 🟡 Intermediate |
| Chunking Strategies | Fixed, semantic, hierarchical — tradeoffs | 🟡 Intermediate · 🚧 Coming Soon |
| Embedding Model Selection | OpenAI vs Cohere vs BGE vs local models | 🟡 Intermediate · 🚧 Coming Soon |
| Vector Database Comparison | Pinecone vs Weaviate vs Qdrant vs pgvector | 🟡 Intermediate · 🚧 Coming Soon |
| Hybrid Search | BM25 + dense vectors, re-ranking, fusion | 🔴 Advanced · 🚧 Coming Soon |
| Advanced RAG Patterns | HyDE, FLARE, Self-RAG, Corrective RAG | 🔴 Advanced · 🚧 Coming Soon |
### 🔥 Run a production RAG app in 10 minutes
Full working kit in [`starter-kits/rag-production/`](starter-kits/rag-production/) — FastAPI + Qdrant + BGE + Claude.
```bash
git clone https://github.com/pranavjangam57/ai-engineering-handbook.git
cd ai-engineering-handbook/starter-kits/rag-production
cp .env.example .env # paste your ANTHROPIC_API_KEY
docker-compose up --build # starts the API on :8000 + Qdrant on :6333
```
```bash
# Ingest a document
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text here", "source": "my-doc.txt"}'

# Ask a question and get a cited answer
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What does the document say about X?"}'
```
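
If you would rather call the API from Python than from curl, the same two requests might look like the sketch below. It uses only the standard library. The endpoint paths and payload fields are copied from the curl commands; the `build_post` and `send` helper names are hypothetical, and treating the response body as JSON is an assumption, since the response format is not shown here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # API port from the docker-compose step


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request, mirroring the curl flags."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


ingest_request = build_post(
    "/ingest", {"text": "Your document text here", "source": "my-doc.txt"}
)
query_request = build_post(
    "/query", {"question": "What does the document say about X?"}
)

# With the starter kit running locally:
#   send(ingest_request)
#   send(query_request)
```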
---
## 🤖 Agent Engineering
> **TL;DR:** An agent is an LLM in a loop with tools. The hard part is making the loop reliable.
```
┌──────────────────────────────────────┐
│            PLANNING LAYER            │  ← ReAct / CoT / ToT
└──────────────┬───────────────────────┘
               ▼
┌──────────────────────────────────────┐
│              TOOL LAYER              │  ← APIs, Code exec, Search
└──────────────┬───────────────────────┘
               ▼
┌──────────────────────────────────────┐
│             MEMORY LAYER             │  ← Short + Long term
└──────────────┬───────────────────────┘
               ▼
┌──────────────────────────────────────┐
│             SAFETY LAYER             │  ← Budget caps + human-in-loop
└──────────────────────────────────────┘
```
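
The four layers can be sketched in a few lines. This is a toy illustration, not the handbook's reference implementation: the planner is any callable you supply (in practice an LLM call), and `Action`, `run_agent`, and the step-count budget are hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """What the planner wants next: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    tool_input: str = ""
    final_answer: Optional[str] = None


def run_agent(
    planner: Callable[[List[str]], Action],   # planning layer (an LLM in practice)
    tools: Dict[str, Callable[[str], str]],   # tool layer
    task: str,
    max_steps: int = 5,                       # safety layer: hard step budget
) -> str:
    memory: List[str] = [f"task: {task}"]     # memory layer: short-term only
    for _ in range(max_steps):
        action = planner(memory)
        if action.final_answer is not None:
            return action.final_answer
        tool = tools.get(action.tool or "")
        if tool is None:
            memory.append(f"error: unknown tool {action.tool!r}")
            continue
        try:
            observation = tool(action.tool_input)
        except Exception as exc:
            # Feed tool failures back to the planner instead of crashing the loop.
            observation = f"error: {exc}"
        memory.append(f"{action.tool}({action.tool_input}) -> {observation}")
    return "stopped: step budget exhausted; escalate to a human"
```

The budget cap is what keeps a confused planner from looping forever. A production version would likely also cap tokens or spend, which this sketch omits.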
| Guide | Description | Difficulty |
|---|---|---|
| [Agent Architecture Patterns](docs/agents/architecture.md) | ReAct, Plan-and-Execute, Multi-agent, Reflexion | 🟡 Intermediate |
| Tool Design for Agents | Schemas, error handling, safe execution | 🟡 Intermediate · 🚧 Coming Soon |
| Agent Memory Systems | In-context vs vector vs episodic memory | 🔴 Advanced · 🚧 Coming Soon |
| Multi-Agent Orchestration | Supervisor patterns, agent communication | 🔴 Advanced · 🚧 Coming Soon |
| Human-in-the-Loop Design | When to pause, how to escalate | 🟡 Intermediate · 🚧 Coming Soon |
---
## 📊 Evals & Observability
> **TL;DR:** If you can't measure it, you can't ship it. Evals are the unit tests of AI engineering.
```
           ┌──────────┐
           │  A/B in  │   ← Production traffic
           │   prod   │
          /└──────────┘\
         / ┌──────────┐ \
        /  │ LLM-as-  │  \   ← Automated evals
       /   │  judge   │   \
      /    └──────────┘    \
     / ┌──────────────────┐ \
    /  │  Unit evals on   │  \   ← Deterministic checks
   /   │  golden dataset  │   \
  /    └──────────────────┘    \
└────────────────────────────────┘
```
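
The base of the pyramid, deterministic checks on a golden dataset, might look like the sketch below. The case format (a prompt plus substrings the answer must contain) and the `run_golden_evals` name are assumptions made for illustration; real suites usually add richer checks such as schema validation or exact match.

```python
from typing import Callable, List, Tuple

# One golden case: (prompt, substrings the output must contain).
GoldenCase = Tuple[str, List[str]]


def run_golden_evals(
    model: Callable[[str], str],
    cases: List[GoldenCase],
) -> Tuple[float, List[Tuple[str, List[str]]]]:
    """Return (pass_rate, failures); each failure lists the missing substrings."""
    if not cases:
        raise ValueError("golden dataset is empty")
    failures: List[Tuple[str, List[str]]] = []
    for prompt, required in cases:
        output = model(prompt).lower()
        missing = [s for s in required if s.lower() not in output]
        if missing:
            failures.append((prompt, missing))
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return pass_rate, failures
```

Run in CI, a check like `assert pass_rate >= 0.95` (the threshold is yours to choose) turns the golden dataset into a regression gate, the "unit test" role the TL;DR describes.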
| Guide | Description | Difficulty |
|---|---|---|
| [Eval Framework Design](docs/evals/framework-design.md) | Golden datasets, metrics, LLM-as-judge pipeline | 🟡 Intermediate |
| RAG-Specific Evals | Faithfulness, relevance, context recall | 🔴 Advanced · 🚧 Coming Soon |
| Observability Stack | Langfuse, Helicone, Arize, custom logging | 🟡 Intermediate · 🚧 Coming Soon |
| Regression Testing fo
[truncated…]