
TwoShakes

provenance:github:Steve-Git9/TwoShakes
WHAT THIS AGENT DOES

TwoShakes is an AI tool that cleans messy data and turns it into an analysis-ready format. It saves time and prevents errors by automatically identifying and correcting issues such as missing values, duplicates, and outliers. It is aimed at data analysts, business intelligence professionals, and anyone who works with spreadsheets or data files. What sets TwoShakes apart is that it always asks for your approval before making any changes, so you stay in control of your data: it is like having an assistant that prepares your data for you, while you review and approve every step.

README
# Two Shakes Data Cleaning
<img src="https://raw.githubusercontent.com/Steve-Git9/TwoShakes/main/frontend/static/tsLogo.png" width="150"/>

AI-Powered Data Preparation: From Messy to Analysis-Ready in **Two Shakes of a Lamb's tail**

![Python](https://img.shields.io/badge/Python-3.11+-blue)
![Azure](https://img.shields.io/badge/Azure-Deployed-0078D4)
![Microsoft Foundry](https://img.shields.io/badge/Microsoft_Foundry-Powered-purple)
![Agent Framework](https://img.shields.io/badge/Azure_AI_Agents-azure--ai--projects-green)
![MCP](https://img.shields.io/badge/MCP-Server_Enabled-orange)
![License](https://img.shields.io/badge/License-MIT-green)

> 🏗️ Built for the **Microsoft Purpose-Built AI Platform Hackathon**
> Category: **Best Use of Microsoft Foundry** · Also targeting: **Best Multi-Agent System** · **Best Enterprise Solution**

---

# Azure Deployment Link

https://dataprepagent-499e361a.azurewebsites.net/

---

## DEMO VIDEO

https://youtu.be/wbrhYIQtNJQ

---

<img src="https://raw.githubusercontent.com/Steve-Git9/TwoShakes/main/docs/gif_TS.gif" width="600"/>

---

## How It Works in 60 Seconds

1. **Upload** any messy data file — CSV, Excel, JSON, XML, or even a PDF with tables
2. **AI profiles** every column: detects types, missing values, outliers, duplicates, and scores quality 0–100
3. **Review a cleaning plan** — approve, reject, or tweak each AI-proposed action before anything runs
4. **Optionally prepare for ML** — the AI recommends encoding, scaling, and feature transforms tailored to your data
5. **Download** your analysis-ready dataset as CSV, Excel, or Parquet

The LLM decides *what* to fix. Python executes it deterministically. Your data, your call: nothing changes without your approval.
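The approval gate can be sketched as follows (a minimal illustration with hypothetical names; the real plan objects live in the agent sources under `src/agents/`):

```python
# Hypothetical sketch of the human-in-the-loop gate: the AI proposes a
# plan, the user toggles each action, and only approved actions run.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PlanAction:
    name: str            # e.g. "dedupe"
    approved: bool = False

def run_approved(plan: list[PlanAction], executors: dict[str, Callable], data):
    """Apply only the user-approved actions, in order."""
    for action in plan:
        if action.approved:
            data = executors[action.name](data)
    return data

rows = [1, 1, 2]
plan = [PlanAction("dedupe", approved=True), PlanAction("drop_rows", approved=False)]
executors = {"dedupe": lambda d: sorted(set(d)), "drop_rows": lambda d: []}
run_approved(plan, executors, rows)  # -> [1, 2]; "drop_rows" never runs
```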

---

## Microsoft Hero Technologies — Where & How They're Used

This section maps every required hackathon technology to the exact source files that implement it.

### ☁️ Microsoft Foundry (Azure AI Foundry)

All LLM calls in DataPrepAgent go through models hosted on Microsoft Foundry. The single LLM client in [`src/agents/__init__.py`](src/agents/__init__.py) connects to the Foundry endpoint using the `AZURE_AI_PROJECT_ENDPOINT` and `AZURE_AI_MODEL_DEPLOYMENT_NAME` environment variables. Three agents make LLM calls — the Profiler (semantic analysis), the Strategy Agent (cleaning plan generation), and the Validator (quality certificate) — plus the Feature Engineering Agent for ML recommendations. Azure Foundry's built-in content filters are active on every call.

**Key code:**
```python
# src/agents/__init__.py — AgentClient constructor
import os

from azure.ai.projects import AIProjectClient
from azure.core.credentials import AzureKeyCredential

client = AIProjectClient(
    endpoint=os.getenv("AZURE_AI_PROJECT_ENDPOINT"),
    credential=AzureKeyCredential(os.getenv("AZURE_AI_PROJECT_KEY")),
)
# Creates an Azure AI Agent with per-call threads
agent = client.agents.create_agent(
    model=os.getenv("AZURE_AI_MODEL_DEPLOYMENT_NAME"),
    name=self.name,
    instructions=self.instructions,
)
```

### 🤖 Microsoft Agent Framework (`azure-ai-projects`)

The [`AgentClient`](src/agents/__init__.py) class uses `azure.ai.projects.AIProjectClient` as its tier-1 backend — the actual Microsoft Agent Framework SDK. It creates real Azure AI Agents with per-call threads and message-based conversations. If the SDK is unavailable (e.g., in environments without the preview package), it falls back gracefully to `openai.AzureOpenAI` pointing at the same Foundry-hosted model. Every agent in the system (Profiler, Strategy, Cleaner, Validator, Feature Engineering, Feature Transformer) uses this single client.
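The tier-1 / fallback selection described above can be sketched as an availability probe (an illustrative helper, not the actual `AgentClient` code):

```python
import importlib.util

# Sketch of the graceful fallback: prefer the Agent Framework SDK,
# fall back to the plain OpenAI client if it is not installed.
def pick_backend(preferred=("azure.ai.projects", "openai")):
    for module_name in preferred:
        try:
            if importlib.util.find_spec(module_name) is not None:
                return module_name
        except ModuleNotFoundError:
            # Parent package of a dotted name is missing; keep looking.
            continue
    return None  # caller reports a configuration error
```

`find_spec` checks installability without importing the heavy SDK, so the probe is cheap even when both packages are present.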

**Files:** [`src/agents/__init__.py`](src/agents/__init__.py) · [`src/agents/orchestrator_agent.py`](src/agents/orchestrator_agent.py) · [`src/agents/profiler_agent.py`](src/agents/profiler_agent.py) · [`src/agents/strategy_agent.py`](src/agents/strategy_agent.py) · [`src/agents/validator_agent.py`](src/agents/validator_agent.py) · [`src/agents/feature_engineering_agent.py`](src/agents/feature_engineering_agent.py) · [`src/agents/feature_transformer_agent.py`](src/agents/feature_transformer_agent.py)

### 🔌 MCP Server (7 tools)

[`src/mcp_server.py`](src/mcp_server.py) exposes the full pipeline as 7 MCP tools via stdio transport. Any MCP-compatible client — including **GitHub Copilot Agent Mode** — can call these tools programmatically. The tools are: `profile_data`, `suggest_cleaning_plan`, `clean_data`, `validate_cleaning`, `list_supported_formats`, `recommend_feature_engineering`, `apply_feature_engineering`.
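Conceptually, the server maps each tool name to a pipeline function. The stand-in below illustrates that dispatch with a plain registry (the real server in `src/mcp_server.py` uses the MCP stdio transport; the format list here is taken from the README and is illustrative only):

```python
# Simplified stand-in for an MCP tool registry: each pipeline stage is
# registered under the tool name an MCP client would call.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("list_supported_formats")
def list_supported_formats():
    # Formats named in "How It Works"; illustrative only.
    return ["csv", "xlsx", "json", "xml", "pdf"]

def call_tool(name, **kwargs):
    """Dispatch a tool call by name, as an MCP client would."""
    return TOOLS[name](**kwargs)
```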

### 🧑‍💻 GitHub Copilot Agent Mode

The repo includes a [`.vscode/mcp.json`](.vscode/mcp.json) configuration file that registers DataPrepAgent's MCP server as a tool source for GitHub Copilot Agent Mode in VS Code. With this config, a developer can ask Copilot: *"Profile the file test_data/messy_sales.csv and suggest a cleaning plan"* and Copilot will call the MCP tools automatically.

```json
// .vscode/mcp.json — already in the repo
{
  "servers": {
    "dataprepagent": {
      "command": "python",
      "args": ["src/mcp_server.py"],
      "env": { "AZURE_AI_PROJECT_ENDPOINT": "...", "AZURE_AI_PROJECT_KEY": "...", "AZURE_AI_MODEL_DEPLOYMENT_NAME": "gpt-4o-mini" }
    }
  }
}
```

### 📄 Azure AI Document Intelligence

[`src/parsers/pdf_parser.py`](src/parsers/pdf_parser.py) uses Azure AI Document Intelligence's `prebuilt-layout` model to extract tables from PDF files and scanned images. This is a second Azure AI service beyond the LLM, demonstrating multi-service integration on the Azure platform.

### ☁️ Azure App Service (Deployment)

[`infra/deploy.sh`](infra/deploy.sh) provides one-command deployment to Azure App Service. The script creates a resource group, a B1 Linux App Service plan, and a web app with the Python 3.11 runtime, then configures all environment variables and deploys the code. [`startup.sh`](startup.sh) runs Streamlit on port 8000 for the Azure container. Full step-by-step instructions are in [`infra/azure-deployment.md`](infra/azure-deployment.md).

---

## Architecture — 8-Agent Orchestrated Pipeline

![Architecture](docs/architecture.png)

**Agentic design patterns used:**
- **Multi-agent collaboration**: 8 specialized agents, each with a single responsibility
- **Agent-to-agent messaging**: Orchestrator sends structured `AgentMessage` objects to sub-agents
- **Orchestrator supervisor**: Central coordinator drives the pipeline, manages state, handles errors
- **Self-healing retry loop**: If quality score < target after cleaning, Orchestrator re-runs Strategy + Cleaner (up to 2 retries)
- **Human-in-the-loop checkpoints**: Pipeline pauses twice for user approval (cleaning plan + FE plan)
- **Tool-using agents**: MCP server exposes all agent capabilities as callable tools
- **Deterministic execution**: LLM reasons about *what* to do; Python code executes it. No AI-generated data values.
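The self-healing pattern from the list above can be sketched as a bounded retry loop (hypothetical stubs for the Strategy, Cleaner, and scoring steps; the real coordination lives in `src/agents/orchestrator_agent.py`):

```python
# Sketch of the self-healing loop: re-run Strategy + Cleaner while the
# quality score misses the target, capped at a fixed number of retries.
def clean_until_good(data, strategize, clean, score, target=90, max_retries=2):
    attempts = 0
    while True:
        plan = strategize(data)      # Strategy agent proposes a plan
        data = clean(data, plan)     # Cleaner executes it deterministically
        if score(data) >= target or attempts >= max_retries:
            return data              # good enough, or retry budget spent
        attempts += 1
```

The cap guarantees termination even when the score never reaches the target, matching the "up to 2 retries" behavior described above.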

---

## The Problem

Data scientists spend **60–80% of their time** on data cleaning and preparation. Messy CSVs with mixed date formats, Excel exports with merged cells, nested JSON APIs with missing fields — every dataset needs hours of manual wrangling before any real analysis can begin.

## The Solution

DataPrepAgent automates the entire data preparation pipeline using **8 AI agents** orchestrated by a supervisor. Upload a messy file, get a detailed quality report, review the AI-generated cleaning plan action by action, then optionally apply ML feature engineering — all in minutes.

---

## What Makes This Different

**🧠 LLM reasons, Python executes.**
The model analyzes your data and proposes a plan. But actual transformations are deterministic pandas and scikit-learn functions. The AI never generates or modifies data values directly — no hallucinated data, no surprises.
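That split can be sketched as a plan-to-pandas dispatch (illustrative action names and schema; the actual executor is the Cleaner agent):

```python
import pandas as pd

# The LLM emits action names and parameters; each name maps to a
# deterministic pandas operation, so the model never produces a value.
EXECUTORS = {
    "drop_duplicates": lambda df, p: df.drop_duplicates(),
    "fill_missing":    lambda df, p: df.fillna(p.get("value", 0)),
}

def execute_plan(df: pd.DataFrame, plan: list[dict]) -> pd.DataFrame:
    """Run an approved plan step by step, purely with pandas."""
    for step in plan:
        df = EXECUTORS[step["action"]](df, step.get("params", {}))
    return df
```

An unknown action name raises a `KeyError` rather than being improvised, which is the point: the executable vocabulary is fixed in code.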

**👤 Human-in-the-loop at every decision point.**
Both the cleaning plan and the feature engineering plan are presented as reviewable lists. Toggle each action on or off. Edit fill strategies. Change scaling methods. Nothing runs until you approve it.

**🔄 Self-healing pipeline.**
If the cleaned data

[truncated…]

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Mar 7, 2026
last updated: Mar 14, 2026
last crawled: 18 days ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:Steve-Git9/TwoShakes)