
disasterman

provenance:github:Rajy777/disasterman
WHAT THIS AGENT DOES

Disasterman is an AI agent that simulates coordinating disaster relief efforts across multiple affected areas. It helps train AI to make smart decisions about where to send rescue teams and supplies, even when faced with misleading information and unexpected events. Emergency response organizations and researchers could use this tool to develop and test AI systems that can improve the speed and effectiveness of disaster relief operations.

README
---
title: Disaster Relief Coordination Env
emoji: 🚑
colorFrom: red
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: Multi-zone disaster relief AI env for OpenEnv
---

# Disaster Relief Coordination Env (DRC-Env)

**Meta PyTorch OpenEnv Hackathon by Scaler — OpenEnv-compliant AI training environment**

Every year, delayed disaster response costs lives. DRC-Env puts an AI agent in the role of an emergency coordinator managing real triage decisions: deploy rescue teams, route supplies, and call airlifts across multiple disaster zones — all under time pressure, resource scarcity, and deliberately injected false SOS signals designed to drain resources from zones that actually need them.

Live demo (backend): **https://huggingface.co/spaces/krishpotanwar/disaster-relief-env**

Frontend dashboard: **https://disasterman-scaler-demo.vercel.app**

---

## Why This Environment Is Novel

Most benchmark environments test agents on clean, well-labeled observations. DRC-Env deliberately breaks that assumption in three ways:

**1. False SOS signals that look real.**
Zones H, I, and J in task_3 broadcast genuine-looking SOS signals with non-zero severity scores — but have zero casualties. There is no flag in the observation space that marks a signal as false; `is_false_sos` is hidden and only visible to the grader. An agent that cannot distinguish signal noise from genuine distress wastes scarce airlifts and misses the zones that matter. A reward penalty of −0.05 per false-SOS resource deployment reinforces this.
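Because the flag is hidden, detection has to come from cross-checking the fields the agent *can* see. A minimal agent-side sketch (field names follow the observation space documented later in this README; the heuristic itself is illustrative, not part of the environment):

```python
def looks_like_false_sos(zone: dict) -> bool:
    """Heuristic: an active SOS with non-zero severity but no observable
    casualties or supply gap is a candidate false alarm."""
    return (
        zone["sos_active"]
        and zone["severity"] > 0.0
        and zone["casualties_remaining"] == 0
        and zone["supply_gap"] == 0
    )

# Example observations (values are illustrative)
zone_h = {"zone_id": "H", "sos_active": True, "severity": 0.8,
          "casualties_remaining": 0, "supply_gap": 0}
zone_e = {"zone_id": "E", "sos_active": True, "severity": 0.9,
          "casualties_remaining": 60, "supply_gap": 12}
```

A heuristic like this is deliberately beatable — a genuinely distressed zone whose casualties have not yet been surveyed would also match — which is why the −0.05 penalty rewards judgment rather than a fixed rule.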

**2. Cascading failures mid-episode.**
At step 7 of task_3, a dam breaks and floods Zone E with 60 new casualties. Road conditions shift with weather. The agent must continuously replan — a static resource allocation computed at step 0 fails catastrophically. This tests generalization, not memorization.

**3. Native PyTorch integration in the decision loop.**
Stage 1 of the baseline pipeline is a trained PyTorch MLP (`ZoneScorerNet`) that runs inference every step at under 1ms. Its output — a priority score per zone — feeds directly into the LLM triage prompt. This is not a toy PyTorch import; it replaces what would otherwise require the LLM to reason from raw numbers, demonstrably improving score on task_2 and task_3.

---

## Tasks

| Task | Name | Difficulty | Zones | Steps | Target Score |
|------|------|-----------|-------|-------|--------------|
| task_1 | Single Zone Flood Response | Easy | 1 | 10 | 0.70–0.85 |
| task_2 | Multi-Zone Earthquake Response | Medium | 5 | 15 | 0.40–0.60 |
| task_3 | Cyclone + Cascading Failures + False SOS | Hard | 10 | 20 | 0.20–0.40 |

**task_3 in detail:** Zones H, I, J send plausible SOS signals with zero real casualties. At step 7 a dam breaks, flooding Zone E (+60 casualties). Weather shifts drive dynamic road blocks. Optimal play requires false-alarm detection, immediate dam-break redeployment, and airlift precision.

---

## Baseline Agent — 4-Stage PyTorch Pipeline

The included baseline fulfills the hackathon's PyTorch requirement through a production-style architecture, not a trivial import:

```
Stage 1: PyTorch ZoneScorerNet  — local MLP, <1ms inference, runs every step
Stage 2: Triage Agent           — LLM, false SOS detection + deadline alerts
Stage 3: Planner Agent          — LLM, 3-step lookahead resource allocation
Stage 4: Action Agent           — LLM + hard constraint validator + deterministic fallback
```

**ZoneScorerNet (Stage 1):**
Architecture: MLP 6→16→1 with Sigmoid output. Input features: `severity`, `casualty_ratio`, `supply_ratio`, `road_blocked`, `unattended`, `time_pressure`. Trained on 50K synthetic examples generated from the environment's own reward dynamics. False SOS zones consistently score near 0.0 — the network learns to ignore severity noise when casualty and supply ratios are zero. Training time: ~8 seconds on CPU. Pre-trained weights ship inside the Docker image so judges see no setup friction.
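A minimal PyTorch sketch matching that description (the 6→16→1 shape and sigmoid output come from the text above; the hidden activation and the exact feature scaling are assumptions):

```python
import torch
import torch.nn as nn

class ZoneScorerNet(nn.Module):
    """6-feature MLP producing one priority score per zone.
    ReLU hidden activation is an assumption; only the layer
    sizes and sigmoid output are stated in the README."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid(),  # priority score in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Feature order: severity, casualty_ratio, supply_ratio,
# road_blocked, unattended, time_pressure
features = torch.tensor([[0.9, 0.6, 0.4, 1.0, 1.0, 0.7]])
score = ZoneScorerNet()(features)  # shape (1, 1), value in [0, 1]
```

A network this small explains both claims above: sub-millisecond CPU inference every step, and a few seconds to train on 50K synthetic examples.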

**Anti-hallucination strategy (Stage 4):**
LLMs hallucinate invalid zone IDs and over-commit resources. Three layers prevent this:
1. Constraint injection — every prompt lists valid zone IDs, blocked roads, and exact resource counts
2. Post-LLM hard validator (`_validate_and_fix()`) — rejects any action that violates game rules
3. Deterministic fallback heuristic — executes if LLM output fails validation, ensuring the episode always completes
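Layers 2 and 3 can be sketched as follows (the rule subset and function names are illustrative; the real validator is `_validate_and_fix()` in the baseline code):

```python
def validate_action(action: dict, obs: dict) -> bool:
    """Reject actions that violate basic game rules (illustrative subset)."""
    valid_zones = {z["zone_id"] for z in obs["zones"]}
    if action["action"] == "wait":
        return True
    if action.get("to_zone") not in valid_zones:
        return False  # hallucinated zone ID
    if action["action"] == "deploy_team":
        blocked = {z["zone_id"] for z in obs["zones"] if z["road_blocked"]}
        return (action["to_zone"] not in blocked
                and action["units"] <= obs["resources"]["teams_available"])
    if action["action"] == "airlift":
        return obs["resources"]["airlifts_remaining"] > 0
    return True

def fallback_action(obs: dict) -> dict:
    """Deterministic heuristic: one team to the worst reachable zone."""
    open_zones = [z for z in obs["zones"]
                  if not z["road_blocked"] and z["casualties_remaining"] > 0]
    if open_zones and obs["resources"]["teams_available"] > 0:
        worst = max(open_zones, key=lambda z: z["severity"])
        return {"action": "deploy_team", "to_zone": worst["zone_id"], "units": 1}
    return {"action": "wait"}

# Toy observation for illustration
obs = {
    "zones": [
        {"zone_id": "A", "severity": 0.9, "road_blocked": False,
         "casualties_remaining": 12},
        {"zone_id": "B", "severity": 0.5, "road_blocked": True,
         "casualties_remaining": 4},
    ],
    "resources": {"teams_available": 2, "airlifts_remaining": 1},
}
```

The key design point is that the fallback never consults the LLM, so a malformed or rate-limited completion can degrade decision quality but can never stall an episode.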

**LLM:** `llama-3.3-70b-versatile` via Groq (free tier, high rate limits)

---

## Grader Coverage

The `/grader` endpoint scores all 8 dimensions defined in the OpenEnv spec:

| Dimension | How It Is Measured |
|-----------|-------------------|
| Task completion | Casualties rescued + supply gaps closed across all zones |
| Resource efficiency | Penalty for idle teams, supply over-delivery, and false SOS waste |
| Time performance | Urgency decay for high-severity zones left unattended |
| Decision quality | Critical rescues (severity ≥ 0.75) as a fraction of total |
| Adaptability | Score delta before vs. after the dam-break event (task_3) |
| False signal handling | Resources deployed to `is_false_sos` zones (hidden grader field) |
| Airlift precision | Airlifts used only on blocked + critical zones vs. total airlifts used |
| Episode score | `(tanh(cumulative_reward / max_steps * 2) + 1) / 2` — normalized to [0, 1] |

---

## Observation Space

```
zones[]:
  zone_id               string         Zone identifier (A–J)
  casualties_remaining  int            Remaining casualties needing rescue
  supply_gap            int            Remaining supply deficit
  severity              float [0,1]    Urgency score (hides casualties_critical)
  road_blocked          bool           Whether ground deployment is blocked
  teams_present         int            Rescue teams currently in zone
  sos_active            bool           SOS signal active (real OR false — indistinguishable)

resources:
  teams_available       int            Teams at HQ ready to deploy
  supply_stock          int            Supply units available
  airlifts_remaining    int            Airlifts left (scarce)
  teams_in_transit      dict[str,int]  Teams currently traveling (1-step delay)

step_number             int
steps_remaining         int
weather                 string         [clear, storm, flood]
last_action_result      string         [success, invalid, blocked, insufficient_resources, none]
```

**Hidden from agent, visible to grader only:** `casualties_critical`, `is_false_sos`

---

## Action Space

| Action | Parameters | Description |
|--------|-----------|-------------|
| `deploy_team` | `to_zone`, `units` | Move teams to zone. Fails if road blocked. |
| `send_supplies` | `to_zone`, `units` | Route supply units to zone. |
| `airlift` | `to_zone`, `type` (rescue/supply) | Bypass road block. Consumes 1 airlift. |
| `recall_team` | `from_zone`, `units` | Pull teams back to HQ (1-step transit delay). |
| `wait` | — | Do nothing. Penalized every step. |
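The airlift-precision tradeoff implied by this table — airlifts are scarce, so they should only be spent on blocked *and* critical zones — can be sketched per zone like this (the action payload shape is an assumption; the 0.75 critical threshold comes from the grader table):

```python
def choose_deployment(zone: dict, resources: dict) -> dict:
    """Pick deploy vs. airlift for one zone, following the action table.
    Airlifts are reserved for zones that are both blocked and critical."""
    if not zone["road_blocked"] and resources["teams_available"] > 0:
        return {"action": "deploy_team", "to_zone": zone["zone_id"], "units": 1}
    if (zone["road_blocked"] and zone["severity"] >= 0.75
            and resources["airlifts_remaining"] > 0):
        return {"action": "airlift", "to_zone": zone["zone_id"], "type": "rescue"}
    return {"action": "wait"}  # penalized, but legal when nothing else is
```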

---

## Reward Function

```
step_reward    = clamp(R_positive - R_negative, -1.0, 1.0)
episode_score  = (tanh(cumulative_reward / max_steps * 2) + 1) / 2
```
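The episode normalization is a straight transcription of the formula above, which makes its behavior easy to check: neutral play (zero cumulative reward) maps to 0.5, and the tanh squash keeps every score in [0, 1].

```python
import math

def episode_score(cumulative_reward: float, max_steps: int) -> float:
    """Normalize cumulative reward into [0, 1] via the tanh squash above."""
    return (math.tanh(cumulative_reward / max_steps * 2) + 1) / 2

episode_score(0.0, 20)   # neutral play -> 0.5
episode_score(20.0, 20)  # max per-step reward over 20 steps -> ~0.98
```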

| Component | Weight | Description |
|-----------|--------|-------------|
| rescue progress | +0.40 | Normalized casualties rescued |
| supply gap closed | +0.20 | Supply deficit addressed |
| zone completion | +0.15 | Zone fully rescued + supplied |
| critical rescues | +0.15 | Rescues from severity ≥ 0.75 zones |
| airlift precision | +0.10 | Smart airlift use on blocked+critical zones |
| critical deaths | −0.40 | Critical casualties expired |
| urgency decay | −0.15 | High-severity zones left unattended |
| overcommitment | −0.10 | Teams idling in completed zones |
| supply waste | −0.05 | Over-delivery of supplies |
| false SOS | −0.05 | Resources deployed to false alarm zones |
| wait penalty | −0.05 | Flat penalty per wait action |

---

## Setup & Usage

### Prerequisites

- Python 3.11+
- `GROQ_API_KEY` (free at [console.groq.com](https://console.groq.com))

[truncated…]

PUBLIC HISTORY

First discovered: Mar 28, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Mar 26, 2026
last updated: Mar 27, 2026
last crawled: today

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:Rajy777/disasterman)