AGENTS / GITHUB / pdhbc-credit-risk-agent-jupyter-internal
githubinferredactive

pdhbc-credit-risk-agent-jupyter-internal

provenance:github:Amit-hub1998/pdhbc-credit-risk-agent-jupyter-internal

Intelligent debugging chatbot for PDHBC Credit Risk Model — RAG + Pandas + Groq LLM

View Source ↗First seen 18d agoNot yet hireable
README
# PDHBC Intelligent Debugging Chatbot

> **Phase 1 — Agent Inside Jupyter Notebook**  
> HSBC Data Engineering Team | Credit Risk Pipeline

---

## Overview

This project implements an intelligent debugging chatbot for the **PDHBC Credit Risk Model**. It allows Finance teams to query specific attribute values and get structured root-cause answers — without involving DA/BA teams manually.

The agent combines:
- **RAG (Retrieval-Augmented Generation)** over BRD, ERD, and existing Jupyter Notebook logic
- **Pandas-based data lookup** across all 4 source tables
- **Groq LLM** (LLaMA 3.3 70B) for natural language structured answers

---

## Table Name Reference

| Table Name | File | Description |
|-----------|------|-------------|
| `credit_applications` | credit_applications.xls | Credit proposal records — anchor table for all joins |
| `basic_details` | basic_details.xls | Baseline risk ratings — fallback source when multi_ratings absent |
| `multi_ratings` | multi_ratings.xls | Primary risk ratings — preferred source, overrides basic_details |
| `downstream` | downstream.xls | Downstream customer ID mapping — customer-level, no proposal granularity |

---

## Key Variable Reference

| Variable | Table | Type | Nullable | Description |
|----------|-------|------|----------|-------------|
| `relationship_id` | All tables | VARCHAR(20) | NO | Root join key across all 4 tables. Prefix: 10x=India, FR=France, 20x=Germany |
| `credit_proposal_serial_no` | credit_applications, basic_details, multi_ratings | VARCHAR(10) | NO | Proposal-level join key. Format: YYMMDD |
| `customer_id` | basic_details, multi_ratings, downstream | VARCHAR(20) | NO | Customer identifier — mirrors relationship_id in current dataset |
| `final_rating` | basic_details, multi_ratings, final output | DECIMAL(4,1) | YES | Resolved credit risk rating. COALESCE(multi_ratings value, basic_details value) |
| `local_customer_id` | downstream | VARCHAR(20) | YES | Regional customer identifier. Format: LCL-IN-xxxx / LCL-FR-xxxx / LCL-DE-xxxx. ~60% populated |
| `app_status` | credit_applications | CHAR(1) | NO | Proposal status. A=Approved, L=Lapsed, R=Rejected. Pipeline keeps A only |
| `app_proposal_type` | credit_applications | VARCHAR(5) | NO | Proposal type. RN=Renewal, NL=New Loan, CN=Change-in-Terms. CN excluded by pipeline |
| `app_approval_dt` | credit_applications | DATE | YES | Approval date. Used to select latest proposal per relationship_id |
| `final_pd_basic_details` | basic_details | DECIMAL(6,4) | YES | Probability of Default from basic_details. Fallback when multi_ratings absent |
| `final_pd_multi_ratings` | multi_ratings | DECIMAL(6,4) | YES | Probability of Default from multi_ratings. Primary source |
| `rating_source` | final output | VARCHAR(10) | NO | Traceability flag. Values: multi_ratings / basic_details / MISSING |

---

## Folder Structure

```
HSBC_Conversion/
│
├── pdhbc_agent.ipynb                    ← Main agent notebook (entry point)
│
├── data/
│   ├── credit_applications.xls          ← Credit Applications table
│   ├── basic_details.xls                ← Basic Details (baseline ratings)
│   ├── multi_ratings.xls                ← Multi Ratings (primary ratings)
│   └── downstream.xls                   ← Downstream (customer ID mapping)
│
├── docs/
│   ├── BRD_CreditRiskPipeline.docx      ← Business Requirements Document
│   └── ERD_CreditRiskPipeline.docx      ← Entity Relationship Document
│
├── notebook/
│   └── CFC_PD_ECB.ipynb                 ← Existing model notebook (RAG source)
│
├── faiss_index/                          ← Auto-generated vector store (git ignored)
│
├── README.md                             ← This file
├── SETUP.md                              ← Step-by-step setup guide
├── ARCHITECTURE.md                       ← Technical architecture details
└── .gitignore
```

---

## Example Queries

```python
# Null rating debugging
"Why is final_rating null for relationship 1000101?"

# Downstream ID reverse lookup
"For local_customer_id LCL-IN-5530 what is the latest credit proposal?"

# Missing downstream ID
"Why is local_customer_id blank for relation 1000305?"

# General logic questions
"How is final_rating calculated in the pipeline?"

# Source tracing
"Where does final_rating come from for FR100201?"
```

---

## Key Design Decisions

| Decision | Reason |
|----------|--------|
| Jupyter Notebook agent | No new infrastructure — works in existing Citrix setup |
| FAISS over ChromaDB | ChromaDB requires SQLite 3.35+ which fails on Python 3.9 / Windows 11 |
| Groq LLM | Free, fast, no GPU needed, accessible via API key |
| Pandas over PySpark | Lightweight for local prototyping with mock Excel data |
| RAG on BRD + ERD + Notebook | Agent understands both business logic and technical implementation |

---

## Team

| Role | Responsibility |
|------|---------------|
| DA/BA | Run agent notebook, respond to Finance queries |
| Data Engineering | Maintain pipeline logic and RAG documents |
| Finance | Submit queries via the agent |

PUBLIC HISTORY

First discoveredMar 30, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub
first seenMar 29, 2026
last updatedMar 29, 2026
last crawled5 days ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:Amit-hub1998/pdhbc-credit-risk-agent-jupyter-internal)