AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction
This agent automatically processes your bank statements, extracting all the key financial information like income and expenses. It takes those often-complex PDF documents and turns them into organized data, eliminating the need for manual entry and analysis. Financial advisors, small business owners, or anyone wanting a better understanding of their personal finances would find this helpful. The agent can then answer your questions about your spending habits and even predict future financial trends. It’s particularly useful because it combines advanced document understanding with artificial intelligence to provide personalized financial insights, all within a user-friendly interface.
README
# 🏦 AI Bank Statement Document Automation with LLM & Personal Financial Analysis [](https://www.python.org) [](https://opensource.org/licenses/Apache-2.0) [](https://github.com/ag2ai/ag2) **Automated extraction, structuring, RAG-powered querying, and AI-agent financial analysis of bank statement PDFs.** This project converts unstructured bank statement PDFs into structured data using computer vision (YOLO), OCR, and Large Language Models. It supports natural language queries and generates insightful monthly/yearly financial reports. --- ## ✨ Key Features - **Advanced Document Parsing** — Custom YOLOv8 layout detection + OCR + LLM table extraction - **RAG Pipeline** — Powerful retrieval-augmented generation with vector databases - **Autonomous AI Agents** — Built with **AG2** (migrated from pyautogen in Feb 2026) - **Financial Intelligence** — Income/expense categorization, trend analysis, monthly & yearly summaries - **Multimodal & Local LLM Support** — Works with Gemini, Ollama (Llama 3, Gemma 2, etc.) - **User Interface** — Streamlit web application (`apps.py`) - **Evaluation Framework** — DeepEval integration for RAG quality testing --- ## 🛠 Technology Stack - **Document Processing**: YOLOv8 (custom layout model), PyMuPDF, pytesseract, pymupdf4llm - **RAG & Vector Store**: LangChain, Chroma, Faiss - **Agent Framework**: **AG2** (latest) - **LLMs**: Google Gemini, Local models via Ollama - **Frontend**: Streamlit - **Analysis**: pandas, Plotly **Related Repo**: [YOLO Base Document Layout Detection](https://github.com/johnsonhk88/yolo-base-doc-layout-detection) --- ## 📁 Repository Structure ``` src/ ├── dev/ # Jupyter notebooks for development & testing │ ├── ai_bank_statement_dev.ipynb │ ├── ai_agent_dev.ipynb │ └── RAG_algorithm_test.ipynb ├── apps.py # Streamlit web application ├── bank-statement-document/ # Core processing scripts ├── yolo-base-layout-analysis/ ├── faiss_index/ & chroma_db/ ├── test-document/ # Sample PDFs for testing ├── *.sh # Installation & setup scripts ├── requirements.txt └── .env.example ``` --- ## 🚀 Quick Start ### 1. Clone & Setup ```bash git clone https://github.com/johnsonhk88/AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction.git cd AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction # Setup virtual environment and install dependencies ./src/build-python-virual-environment.sh ./src/activate_virual_environment.sh ./src/install-requirement.sh # Install Tesseract OCR (Ubuntu/Debian) ./src/install-pytesseract-for-linux.sh ``` ## Create a .env file and add your GOOGLE_API_KEY (for Gemini). ### 2. Run the Application #### Development Notebooks ```bash cd src/dev jupyter notebook ``` #### Streamlit Web UI ```bash cd src streamlit run apps.py ``` ## 📈 Recent Major Updates - Feb 24, 2026 — Full migration from pyautogen → AG2 agent framework - 2025 — Added advanced RAG pipeline, multimodal support, and DeepEval evaluation - Ongoing — Improving financial categorization and local LLM inference ## 🗺 Roadmap - Complete production-ready end-to-end pipeline - Advanced time-series forecasting for cash flow prediction - Multi-bank statement support with automatic categorization - Docker + API deployment - Rich interactive dashboard with more visualizations ------------------------------------------------------------------------ ## 📄 License ### This project is licensed under the Apache License 2.0. ------------------------------------------------------------------------- #### Made with ❤️ for personal finance automation in Hong Kong. #### ⭐ Star this repo if you find it useful! **Just copy the entire block above** and replace your current `README.md` file on GitHub. This is the **final, clean, and up-to-date version**. Push it and your project will look professional instantly! Want me to add screenshots, example queries, or a demo video section next? Just say the word! 🚀
PUBLIC HISTORY
IDENTITY
Identity inferred from code signals. No PROVENANCE.yml found.
Is this yours? Claim it →METADATA
README BADGE
Add to your README:
