disasterman
provenance:github:Rajy777/disasterman
WHAT THIS AGENT DOES
Disasterman is an AI training environment, with a baseline agent, that simulates coordinating disaster relief efforts across multiple affected areas. It is designed to train AI agents to make sound decisions about where to send rescue teams and supplies, even when faced with misleading information and unexpected events. Emergency response organizations and researchers could use it to develop and test AI systems that improve the speed and effectiveness of disaster relief operations.
README
---
title: Disaster Relief Coordination Env
emoji: 🚑
colorFrom: red
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: Multi-zone disaster relief AI env for OpenEnv
---

# Disaster Relief Coordination Env (DRC-Env)

**Meta PyTorch OpenEnv Hackathon by Scaler — OpenEnv-compliant AI training environment**

Every year, delayed disaster response costs lives. DRC-Env puts an AI agent in the role of an emergency coordinator making real triage decisions: deploy rescue teams, route supplies, and call airlifts across multiple disaster zones — all under time pressure, resource scarcity, and deliberately injected false SOS signals designed to drain resources from zones that actually need them.

Live demo (backend): **https://huggingface.co/spaces/krishpotanwar/disaster-relief-env**
Frontend dashboard: **https://disasterman-scaler-demo.vercel.app**

---

## Why This Environment Is Novel

Most benchmark environments test agents on clean, well-labeled observations. DRC-Env deliberately breaks that assumption in three ways:

**1. False SOS signals that look real.** Zones H, I, and J in task_3 broadcast genuine-looking SOS signals with non-zero severity scores — but have zero casualties. There is no flag in the observation space that marks a signal as false; `is_false_sos` is hidden and only visible to the grader. An agent that cannot distinguish signal noise from genuine distress wastes scarce airlifts and misses the zones that matter. A reward penalty of −0.05 per false-SOS resource deployment reinforces this.

**2. Cascading failures mid-episode.** At step 7 of task_3, a dam breaks and floods Zone E with 60 new casualties. Road conditions shift with weather. The agent must continuously replan — a static resource allocation computed at step 0 fails catastrophically. This tests generalization, not memorization.

**3.
Native PyTorch integration in the decision loop.** Stage 1 of the baseline pipeline is a trained PyTorch MLP (`ZoneScorerNet`) that runs inference every step in under 1 ms. Its output — a priority score per zone — feeds directly into the LLM triage prompt. This is not a toy PyTorch import; it replaces what would otherwise require the LLM to reason from raw numbers, demonstrably improving the score on task_2 and task_3.

---

## Tasks

| Task | Name | Difficulty | Zones | Steps | Target Score |
|------|------|-----------|-------|-------|--------------|
| task_1 | Single Zone Flood Response | Easy | 1 | 10 | 0.70–0.85 |
| task_2 | Multi-Zone Earthquake Response | Medium | 5 | 15 | 0.40–0.60 |
| task_3 | Cyclone + Cascading Failures + False SOS | Hard | 10 | 20 | 0.20–0.40 |

**task_3 in detail:** Zones H, I, and J send plausible SOS signals with zero real casualties. At step 7 a dam breaks, flooding Zone E (+60 casualties). Weather shifts drive dynamic road blocks. Optimal play requires false-alarm detection, immediate dam-break redeployment, and airlift precision.

---

## Baseline Agent — 4-Stage PyTorch Pipeline

The included baseline fulfills the hackathon's PyTorch requirement through a production-style architecture, not a trivial import:

```
Stage 1: PyTorch ZoneScorerNet — local MLP, <1ms inference, runs every step
Stage 2: Triage Agent — LLM, false SOS detection + deadline alerts
Stage 3: Planner Agent — LLM, 3-step lookahead resource allocation
Stage 4: Action Agent — LLM + hard constraint validator + deterministic fallback
```

**ZoneScorerNet (Stage 1):** Architecture: MLP 6→16→1 with Sigmoid output. Input features: `severity`, `casualty_ratio`, `supply_ratio`, `road_blocked`, `unattended`, `time_pressure`. Trained on 50K synthetic examples generated from the environment's own reward dynamics. False SOS zones consistently score near 0.0 — the network learns to ignore severity noise when casualty and supply ratios are zero. Training time: ~8 seconds on CPU.
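The Stage 1 scorer can be sketched in a few lines of PyTorch. The layer sizes (6→16→1) and the sigmoid output come from the README; the hidden activation and the feature ordering are assumptions, since neither is specified.

```python
import torch
import torch.nn as nn

class ZoneScorerNet(nn.Module):
    """Minimal sketch of the 6 -> 16 -> 1 zone scorer described in the README."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 16),
            nn.ReLU(),      # assumed hidden activation; not stated in the README
            nn.Linear(16, 1),
            nn.Sigmoid(),   # priority score in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Feature order assumed from the README's listing:
# [severity, casualty_ratio, supply_ratio, road_blocked, unattended, time_pressure]
scorer = ZoneScorerNet()
features = torch.tensor([[0.9, 0.0, 0.0, 0.0, 1.0, 0.5]])  # a false-SOS-like zone
score = scorer(features)  # one priority score per input zone
```

With trained weights loaded (they ship in the Docker image), a false-SOS-like input such as the one above should score near 0.0; an untrained net, as here, just produces some value in (0, 1).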
Pre-trained weights ship inside the Docker image, so judges see no setup friction.

**Anti-hallucination strategy (Stage 4):** LLMs hallucinate invalid zone IDs and over-commit resources. Three layers prevent this:

1. Constraint injection — every prompt lists valid zone IDs, blocked roads, and exact resource counts
2. Post-LLM hard validator (`_validate_and_fix()`) — rejects any action that violates game rules
3. Deterministic fallback heuristic — executes if LLM output fails validation, ensuring the episode always completes

**LLM:** `llama-3.3-70b-versatile` via Groq (free tier, high rate limits)

---

## Grader Coverage

The `/grader` endpoint scores all 8 dimensions defined in the OpenEnv spec:

| Dimension | How It Is Measured |
|-----------|-------------------|
| Task completion | Casualties rescued + supply gaps closed across all zones |
| Resource efficiency | Penalty for idle teams, supply over-delivery, and false SOS waste |
| Time performance | Urgency decay for high-severity zones left unattended |
| Decision quality | Critical rescues (severity ≥ 0.75) as a fraction of total |
| Adaptability | Score delta before vs. after the dam-break event (task_3) |
| False signal handling | Resources deployed to `is_false_sos` zones (hidden grader field) |
| Airlift precision | Airlifts used only on blocked + critical zones vs. total airlifts used |
| Episode score | `(tanh(cumulative_reward / max_steps * 2) + 1) / 2` — normalized to [0, 1] |

---

## Observation Space

```
zones[]:
  zone_id              string        Zone identifier (A–J)
  casualties_remaining int           Remaining casualties needing rescue
  supply_gap           int           Remaining supply deficit
  severity             float [0,1]   Urgency score (hides casualties_critical)
  road_blocked         bool          Whether ground deployment is blocked
  teams_present        int           Rescue teams currently in zone
  sos_active           bool          SOS signal active (real OR false — indistinguishable)
resources:
  teams_available      int           Teams at HQ ready to deploy
  supply_stock         int           Supply units available
  airlifts_remaining   int           Airlifts left (scarce)
teams_in_transit       dict[str,int] Teams currently traveling (1-step delay)
step_number            int
steps_remaining        int
weather                string        [clear, storm, flood]
last_action_result     string        [success, invalid, blocked, insufficient_resources, none]
```

**Hidden from agent, visible to grader only:** `casualties_critical`, `is_false_sos`

---

## Action Space

| Action | Parameters | Description |
|--------|-----------|-------------|
| `deploy_team` | `to_zone`, `units` | Move teams to zone. Fails if road blocked. |
| `send_supplies` | `to_zone`, `units` | Route supply units to zone. |
| `airlift` | `to_zone`, `type` (rescue/supply) | Bypass road block. Consumes 1 airlift. |
| `recall_team` | `from_zone`, `units` | Pull teams back to HQ (1-step transit delay). |
| `wait` | — | Do nothing. Penalized every step. |
---

## Reward Function

```
step_reward   = clamp(R_positive - R_negative, -1.0, 1.0)
episode_score = (tanh(cumulative_reward / max_steps * 2) + 1) / 2
```

| Component | Weight | Description |
|-----------|--------|-------------|
| rescue progress | +0.40 | Normalized casualties rescued |
| supply gap closed | +0.20 | Supply deficit addressed |
| zone completion | +0.15 | Zone fully rescued + supplied |
| critical rescues | +0.15 | Rescues from severity ≥ 0.75 zones |
| airlift precision | +0.10 | Smart airlift use on blocked + critical zones |
| critical deaths | −0.40 | Critical casualties expired |
| urgency decay | −0.15 | High-severity zones left unattended |
| overcommitment | −0.10 | Teams idling in completed zones |
| supply waste | −0.05 | Over-delivery of supplies |
| false SOS | −0.05 | Resources deployed to false-alarm zones |
| wait penalty | −0.05 | Flat penalty per wait action |

---

## Setup & Usage

### Prerequisites

- Python 3.11+
- `GROQ_API_KEY` (free at [console.groq.com](https:// [truncated…]
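The two reward formulas quoted in the README are easy to check numerically; a minimal sketch in plain Python:

```python
import math

def step_reward(r_positive: float, r_negative: float) -> float:
    """Clamp the net per-step reward to [-1.0, 1.0], as in the README."""
    return max(-1.0, min(1.0, r_positive - r_negative))

def episode_score(cumulative_reward: float, max_steps: int) -> float:
    """Map cumulative reward to [0, 1] via the README's tanh normalization."""
    return (math.tanh(cumulative_reward / max_steps * 2) + 1) / 2

# A zero cumulative reward maps to the midpoint of the score range.
episode_score(0.0, 20)  # -> 0.5
```

The tanh squashing means the score saturates well before the theoretical per-step maximum, which is consistent with the target score bands above topping out at 0.85 rather than 1.0.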
PUBLIC HISTORY
First discovered: Mar 28, 2026
IDENTITY
inferred
Identity inferred from code signals. No PROVENANCE.yml found.
METADATA
platform: github
first seen: Mar 26, 2026
last updated: Mar 27, 2026
last crawled: today
version: —