Multi-agent AI system for automated data analysis using CrewAI & GPT-4. Features intelligent agents for EDA, visualization, and report generation.
# AutoAnalyst: Multi-Agent Data Science Assistant
An intelligent multi-agent system that autonomously performs end-to-end data analysis, ML model training, and report generation, powered by CrewAI and OpenAI GPT-4o-mini.
---
## What It Does
You upload any CSV file. The system automatically:
- Analyzes the data (EDA, statistics, correlations)
- Trains and compares multiple ML models
- Generates interactive visualizations
- Produces a full written report
No manual steps. No code required from the end user.
---
## System Architecture
The system is organized into 5 phases, each adding a new layer of capability:
**Phase 1 — Data Analysis Pipeline**
5 specialized AI agents run sequentially, each passing output to the next:
1. Data Loader Agent — validates data quality and structure
2. EDA Specialist Agent — performs statistical analysis
3. Visualization Expert Agent — generates and interprets charts
4. Insights Analyst Agent — identifies patterns and recommendations
5. Report Writer Agent — produces a professional markdown report
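The sequential hand-off between these five agents can be sketched in plain Python. This is not the project's CrewAI code: the stage functions, the `Context` dictionary, and `run_pipeline` are hypothetical stand-ins, chosen only to show each stage consuming the previous stage's output.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shared state passed from one stage to the next.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def load_data(ctx: Context) -> Context:
    # Stand-in for the Data Loader Agent: record basic structure.
    rows = ctx["rows"]
    return {**ctx, "n_rows": len(rows), "columns": sorted(rows[0])}


def run_eda(ctx: Context) -> Context:
    # Stand-in for the EDA Specialist Agent: one summary statistic.
    ages = [row["age"] for row in ctx["rows"]]
    return {**ctx, "mean_age": sum(ages) / len(ages)}


def describe_charts(ctx: Context) -> Context:
    # Stand-in for the Visualization Expert Agent: list charts to draw.
    return {**ctx, "charts": [f"histogram of {col}" for col in ctx["columns"]]}


def derive_insights(ctx: Context) -> Context:
    # Stand-in for the Insights Analyst Agent.
    return {**ctx, "insights": [f"Mean age is {ctx['mean_age']:.1f}"]}


def write_report(ctx: Context) -> Context:
    # Stand-in for the Report Writer Agent: assemble a markdown report.
    lines = ["# Analysis Report", f"Rows: {ctx['n_rows']}", f"Charts: {len(ctx['charts'])}"]
    return {**ctx, "report": "\n".join(lines + ctx["insights"])}


def run_pipeline(stages: List[Stage], ctx: Context) -> Context:
    # Run stages in order; each receives the output of the one before it.
    for stage in stages:
        ctx = stage(ctx)
    return ctx


PIPELINE = [load_data, run_eda, describe_charts, derive_insights, write_report]
result = run_pipeline(PIPELINE, {"rows": [{"age": 50}, {"age": 60}]})
print(result["report"])
```

In the real system each stage would be an LLM-backed agent rather than a pure function, but the ordering constraint is the same: a later stage cannot run until the earlier one has produced its output.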
**Phase 2 — Machine Learning Pipeline**
4 ML agents handle automated modeling:
- Feature Engineer Agent — selects and transforms features
- Model Selector Agent — chooses appropriate algorithms
- Model Trainer Agent — trains Logistic Regression, Decision Tree, Random Forest, Gradient Boosting
- Model Evaluator Agent — compares models on Accuracy, Precision, Recall, F1, ROC-AUC
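As a reference for what the Model Evaluator Agent compares, the sketch below computes Accuracy, Precision, Recall, and F1 from binary labels using only the standard library. The `binary_metrics` helper and the sample labels are illustrative assumptions; the project lists Scikit-learn in its stack and presumably relies on that library's metrics instead. ROC-AUC is omitted because it needs predicted scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common binary classification metrics (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only (not from the healthcare dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```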
**Phase 3 — Streamlit Web Dashboard**
Interactive UI built with Streamlit:
- Drag-and-drop CSV upload
- Real-time data exploration with Plotly charts
- One-click ML model training with live progress
- Results dashboard with model comparison and feature importance
**Phase 4 — Advanced ML**
- AutoML with Optuna for hyperparameter tuning, with a reported 37.1% relative F1 improvement (0.30 to 0.41)
- SHAP explainability — shows why the model made each prediction
- Ensemble methods — Voting and Stacking classifiers
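To make the Voting idea concrete, here is a minimal standard-library sketch of the two usual voting rules: soft voting averages each model's class probabilities, while hard voting takes the majority of predicted labels. Which rule the project uses is not stated in this README, and the `soft_vote` and `hard_vote` helpers and the probabilities below are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across models, then pick the largest."""
    n_classes = len(prob_sets[0])
    averaged: List[float] = [
        sum(probs[c] for probs in prob_sets) / len(prob_sets)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averaged[c])


def hard_vote(prob_sets: Sequence[Sequence[float]]) -> int:
    """Let each model cast one vote for its most likely class; majority wins."""
    votes = [max(range(len(probs)), key=lambda c: probs[c]) for probs in prob_sets]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical models scoring one sample as [P(class 0), P(class 1)].
model_probs = [
    [0.45, 0.55],  # weakly favors class 1
    [0.48, 0.52],  # weakly favors class 1
    [0.90, 0.10],  # strongly favors class 0
]
print("soft:", soft_vote(model_probs))
print("hard:", hard_vote(model_probs))
```

The two rules disagree on this sample: two models lean slightly toward class 1, but one confident model pulls the averaged probability toward class 0. That sensitivity to confidence is the main practical difference between the two rules.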
**Phase 5 — Production API**
- FastAPI REST API deployed on Render.com
- Docker containerized for consistent deployment
- Swagger UI documentation at `/docs`
- Response time under 200ms
- Supports single prediction and batch CSV prediction
---
## ML Results (Healthcare Dataset)
Trained and evaluated on 500 patient records predicting heart disease:
| Model | Accuracy | F1-Score | ROC-AUC |
|---------------------|----------|----------|---------|
| Logistic Regression | 0.75 | 0.67 | 0.82 |
| Decision Tree | 0.72 | 0.64 | 0.78 |
| Random Forest | 0.81 | 0.75 | 0.88 |
| Gradient Boosting | 0.79 | 0.72 | 0.86 |
**Best Model:** Random Forest — auto-selected based on F1-Score
**Top Predictors (via SHAP):** Cholesterol, Age, Blood Pressure
**AutoML Result:** Random Forest F1 improved from 0.30 to 0.41 (a reported +37.1% relative gain) using Optuna
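The selection rule and the improvement figure can be checked with a few lines of Python. The scores are copied from the table above; the `results` dictionary and `relative_improvement` helper are illustrative names, not part of the project. With the rounded F1 values the relative gain works out to about 36.7%, so the quoted 37.1% was presumably computed from unrounded scores (an assumption; the README does not say).

```python
# Scores copied from the results table above.
results = {
    "Logistic Regression": {"accuracy": 0.75, "f1": 0.67, "roc_auc": 0.82},
    "Decision Tree": {"accuracy": 0.72, "f1": 0.64, "roc_auc": 0.78},
    "Random Forest": {"accuracy": 0.81, "f1": 0.75, "roc_auc": 0.88},
    "Gradient Boosting": {"accuracy": 0.79, "f1": 0.72, "roc_auc": 0.86},
}

# Pick the model with the highest F1-Score, mirroring the stated selection rule.
best_name = max(results, key=lambda name: results[name]["f1"])
print("Best model by F1:", best_name)


def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded F1 values from the AutoML result line.
gain = relative_improvement(0.30, 0.41)
print(f"Relative F1 gain: {gain:.1f}%")
```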
---
## API Endpoints
| Method | Endpoint | Description |
|--------|-----------------|------------------------------------|
| GET | / | Welcome message and API info |
| GET | /health | Health check and model status |
| POST | /predict | Single patient prediction |
| POST | /batch-predict | Batch predictions from CSV |
| GET | /stats | Usage statistics |
| GET | /model-info | Model details and features |
| GET | /docs | Interactive Swagger UI |
Live API: `https://autoanalyst-api.onrender.com`
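A request to `/predict` might be built as in the sketch below, using only the standard library. The JSON field names are an assumption borrowed from the feature columns in the "Use Saved Model" section; the API's actual request schema is not shown in this README, so check the Swagger UI at `/docs` before relying on it. The `build_predict_request` helper is hypothetical, and the network call is left commented out so the snippet runs offline.

```python
import json
import urllib.request

BASE_URL = "https://autoanalyst-api.onrender.com"


def build_predict_request(patient: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed payload shape; field names mirror the saved-model example.
patient = {
    "age": 55,
    "bmi": 28.5,
    "blood_pressure_systolic": 140,
    "blood_pressure_diastolic": 90,
    "cholesterol": 220,
    "glucose": 120,
    "exercise_hours_per_week": 3,
    "gender_Male": 1,
    "smoker_Yes": 1,
}

request = build_predict_request(patient)
print(request.get_method(), request.full_url)

# To send the request (requires network access and a running service):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```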
---
## Tech Stack
| Layer | Technology |
|----------------|-------------------------------------------------|
| AI Agents | CrewAI |
| Language Model | OpenAI GPT-4o-mini |
| ML | Scikit-learn, XGBoost, LightGBM |
| AutoML | Optuna |
| Explainability | SHAP |
| API | FastAPI |
| Frontend | Streamlit |
| Visualization | Plotly, Matplotlib, Seaborn |
| Data | Pandas, NumPy |
| Deployment | Docker, Docker Compose, Render.com |
| CI/CD | GitHub Actions |
| Language | Python 3.12 |
---
## Screenshots
### Web Dashboard
- Home Page
- ML Training in Progress
- Model Performance Table
- Results & Insights
---
### Data Analysis Outputs
- Distribution Analysis
- Correlation Heatmap
- Categorical Analysis
---
### ML Results
- Model Performance Comparison
- Confusion Matrices
- ROC Curves
- Feature Importance (Random Forest)
- Full Analysis Collage
---
## Quick Start
### Prerequisites
- Python 3.8+
- OpenAI API key
### Installation
```bash
git clone https://github.com/Sakshi3027/AutoAnalyst.git
cd AutoAnalyst
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add your OPENAI_API_KEY to .env
```
### Run the Web Dashboard (Recommended)
```bash
streamlit run streamlit_app.py
```
Opens at `http://localhost:8501`
**Steps:**
1. Click "Load Sample Data" in the sidebar
2. Go to "Data Analysis" to explore
3. Go to "ML Pipeline" and click "Train Models"
4. View "Results" for full analysis
### Run Data Analysis Only
```bash
python main.py
```
Output: `outputs/analysis_report.md`
### Run ML Pipeline Only
```bash
python main_ml.py
```
Output: `outputs/ml_analysis_report.md` and `outputs/best_model.pkl`
### Run Advanced ML (AutoML + SHAP)
```bash
python main_advanced.py
```
---
## Docker Deployment
```bash
# Build
docker build -t autoanalyst .
# Run
docker run -p 8000:8000 --env-file .env autoanalyst
# Or with docker-compose
docker-compose up -d
```
---
## Use Saved Model
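```python
import joblib
import pandas as pd

# Load the model saved by the ML pipeline (python main_ml.py)
model = joblib.load('outputs/best_model.pkl')

# One row per patient; column names should match the features used in training
new_patient = pd.DataFrame({
    'age': [55],
    'bmi': [28.5],
    'blood_pressure_systolic': [140],
    'blood_pressure_diastolic': [90],
    'cholesterol': [220],
    'glucose': [120],
    'exercise_hours_per_week': [3],
    'gender_Male': [1],   # one-hot encoded indicator
    'smoker_Yes': [1]     # one-hot encoded indicator
})

prediction = model.predict(new_patient)         # predicted class label per row
probability = model.predict_proba(new_patient)  # per-class probabilities per row

print(f"Prediction: {'Heart Disease' if prediction[0] == 1 else 'No Heart Disease'}")
print(f"Probability: {probability[0][1]:.2%}")  # probability of class 1
```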
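The `gender_Male` and `smoker_Yes` columns above look like one-hot indicators derived from raw categorical fields. A record with raw values could be converted as in the sketch below. The raw field names (`gender`, `smoker`) and category labels (`"Male"`, `"Yes"`) are assumptions inferred from the column names, and `encode_patient` is a hypothetical helper, not a function from the project.

```python
def encode_patient(raw: dict) -> dict:
    """Turn assumed raw categorical fields into the indicator columns used above."""
    # Copy the numeric fields through unchanged.
    encoded = {key: value for key, value in raw.items() if key not in ("gender", "smoker")}
    # Assumed encoding: 1 when the category matches the column suffix, else 0.
    encoded["gender_Male"] = 1 if raw["gender"] == "Male" else 0
    encoded["smoker_Yes"] = 1 if raw["smoker"] == "Yes" else 0
    return encoded


raw_patient = {"age": 55, "bmi": 28.5, "gender": "Female", "smoker": "Yes"}
print(encode_patient(raw_patient))
```

Whatever encoding is used at prediction time has to match the one applied during training, otherwise the model receives features it was never fitted on.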
---
## Project Structure
```
AI_agents/
├── agents/ # 12 AI agent definitions
│ ├── data_loader_agent.py
│ ├── eda_agent.py
│ ├── visualization_agent.py
│ ├── insight_agent.py
│ ├── report_agent.py
│ ├── feature_engineer_agent.py
│ ├── model_selector_agent.py
│ ├── model_trainer_agent.py
│ ├── model_evaluator_agent.py
│ ├── automl_agent.py
│ ├── explainability_agent.py
│ └── deep_learning_agent.py
├── utils/
│ ├── data_utils.py # EDA and visualization helpers
│ └── ml_utils.py # ML training and evaluation
├── data/
│ └── healthcare_data.csv # Sample dataset (500 records)
├── outputs/ # Generated reports and models
├── screenshots/ # UI screenshots
├── api
[truncated…]
```