SIMA2-Agent

provenance:github:hemantjuyal/SIMA2-Agent

WHAT THIS AGENT DOES

SIMA2-Agent is a framework for building agents that interact with simulated environments. Each agent runs a Perceive-Think-Act cycle: a vision-language model interprets the scene, and a language model reasons about it and selects an action. The components are modular, so developers can swap implementations and experiment with different agent designs. It is aimed at researchers and developers working on AI agents and reinforcement learning.

PROBLEM IT SOLVES

SIMA2-Agent streamlines the development of multi-modal agents for Gymnasium environments. Instead of hand-coding reasoning loops and wiring models together, developers can use its pre-built architecture and modular components to prototype and test new agent designs.


TECH & STACK

python, gymnasium, vlm, llm, reinforcement-learning, modular, ai, agent

README

# SIMA2-Agent

A modular, multi-modal agent framework for Gymnasium environments, inspired by generalist agent architectures like Google's SIMA. This project provides a solid, extensible foundation for developing and experimenting with intelligent agents that can perceive, think, and act in various simulated worlds.

## Key Features

-   **Modular Architecture:** Core components (`Agents`, `Environments`, `Memory`, `Runtimes`) are fully decoupled and managed by factories, allowing for easy extension and swapping of implementations.
-   **Perceive-Think-Act Cycle:** The agent logic is built around an explicit reasoning loop. It first perceives the world (using a VLM), then generates an explicit "thought" about its strategy before finally selecting an action.
-   **Multi-Modal Intelligence:** Natively supports using a Vision-Language Model (VLM) for structured scene analysis and a separate Large-Language Model (LLM) for reasoning and decision-making.
-   **Extensible by Design:** Adding a new Gymnasium environment or an agent with a different reasoning architecture means creating a new module that implements the abstract base classes.
-   **Configurable:** All key parameters, including models, environment names, and run settings, are managed through a central `.env`-style file (`main.env`).
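
The factory and base-class pattern described in the list above could look roughly like the sketch below. It is an illustration, not the project's code: `BaseAgent`, `register_agent`, `AlwaysLeftAgent`, and the signature of `create_agent` are assumptions (only the name `create_agent` appears in the architecture diagram).

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class BaseAgent(ABC):
    """Hypothetical abstract base class; the real interface may differ."""

    @abstractmethod
    def select_action(self, observation: str) -> int:
        """Return an action index for the given observation."""


# Registry mapping a configuration name to an agent class (illustrative only).
_AGENT_REGISTRY: Dict[str, Type[BaseAgent]] = {}


def register_agent(name: str) -> Callable[[Type[BaseAgent]], Type[BaseAgent]]:
    """Class decorator that makes an agent class available to the factory."""

    def decorator(cls: Type[BaseAgent]) -> Type[BaseAgent]:
        _AGENT_REGISTRY[name] = cls
        return cls

    return decorator


def create_agent(name: str, **kwargs) -> BaseAgent:
    """Look up an agent class by name and instantiate it."""
    try:
        cls = _AGENT_REGISTRY[name]
    except KeyError:
        known = sorted(_AGENT_REGISTRY)
        raise ValueError(f"Unknown agent {name!r}; known agents: {known}") from None
    return cls(**kwargs)


@register_agent("always_left")
class AlwaysLeftAgent(BaseAgent):
    """Trivial agent used only to exercise the factory."""

    def select_action(self, observation: str) -> int:
        return 0


agent = create_agent("always_left")
print(agent.select_action("anything"))  # prints 0
```

With a registry like this, swapping implementations is a matter of changing the name passed to the factory, which is one way a configuration value could select the agent at run time.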

## Architecture Flow Diagram

The `main.py` entry point uses the factories to build the environment, runtimes, and memory, assembles them into an `AgentContext`, and passes that context to the agent. The agent then enters the Perceive-Think-Act loop.

```mermaid
graph TD
    subgraph Initialization
        A[main.py] --> B{factories};
        B --> C[Environments];
        B --> D[Runtimes];
        B --> E[Memory];
        C -- configures --> E;
        A -- assembles --> F[AgentContext];
    end

    subgraph "Execution"
        A --> G{create_agent};
        G --> H[Agent];
        H -- uses --> F;
    end
    
    subgraph "Perceive-Think-Act Loop"
        I(Agent.run) -- image --> J(VLM Runtime);
        J -- structured JSON --> I;
        I -- prompt --> K(LLM Runtime);
        K -- JSON with thought & action --> I;
        I -- action --> L(Gym Environment);
        L -- observation & reward --> I;
        I -- experience --> M(Memory System);
        M -- summary --> I;
    end

    F --> I;
```
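
The loop in the diagram can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `ToyEnv` is a stand-in that only mimics the Gymnasium `reset`/`step` return shapes, the VLM and LLM runtimes are replaced by plain callables, and the fields of `AgentContext` and the `run` function are hypothetical rather than taken from the repository.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class ToyEnv:
    """Stand-in with Gymnasium-style reset/step signatures (not a real env)."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.pos = 0
        return self.pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.pos += 1 if action == 1 else -1
        terminated = self.pos >= self.goal
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, False, {}


@dataclass
class AgentContext:
    """Hypothetical bundle of the components assembled at start-up."""

    env: ToyEnv
    vlm: Callable[[int], Dict[str, Any]]  # observation -> structured scene
    llm: Callable[[str], str]             # prompt -> JSON text
    memory: List[Dict[str, Any]] = field(default_factory=list)


def run(ctx: AgentContext, max_steps: int = 10) -> float:
    """Run one episode and return the total reward."""
    obs, _info = ctx.env.reset()
    total = 0.0
    for _ in range(max_steps):
        scene = ctx.vlm(obs)                                   # Perceive
        prompt = json.dumps({"scene": scene, "past_steps": len(ctx.memory)})
        reply = json.loads(ctx.llm(prompt))                    # Think
        action = int(reply["action"])
        obs, reward, terminated, truncated, _info = ctx.env.step(action)  # Act
        ctx.memory.append(
            {"thought": reply["thought"], "action": action, "reward": reward}
        )
        total += reward
        if terminated or truncated:
            break
    return total


ctx = AgentContext(
    env=ToyEnv(),
    vlm=lambda obs: {"position": obs},
    llm=lambda prompt: json.dumps({"thought": "move right", "action": 1}),
)
print(run(ctx), len(ctx.memory))  # prints: 1.0 3
```

Keeping the runtimes behind plain callables makes the loop testable without a model server, which is one reason to pass them in through a context object rather than construct them inside the agent.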

## Getting Started

### 1. Installation

This project uses `uv` for fast environment and package management.

```bash
# Navigate to the project root directory (SIMA2-Agent)
cd SIMA2-Agent

# Create a virtual environment
uv venv

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install the required dependencies
uv pip install -r gsima-agent/requirements.txt
```

### 2. Configuration

All configuration is handled in the `gsima-agent/configs/` directory.

```bash
# Navigate to the configs directory
cd gsima-agent/configs

# Copy the example .env file
cp .env.example main.env
```

Open `main.env` and edit the variables as needed. At a minimum, set `GYM_ENVIRONMENT` to the environment you want to run, and set `OLLAMA_VLM_MODEL` and `OLLAMA_LLM_MODEL` to Ollama models you have available locally.
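
To catch a missing setting before a run, a small check along the following lines may help. It is a sketch using only the standard library: the helper names `read_env_file` and `missing_keys` are hypothetical, the project may load its configuration differently, and the values written to the temporary file are placeholders rather than project defaults.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Variable names taken from the configuration step above.
REQUIRED_KEYS = ("GYM_ENVIRONMENT", "OLLAMA_VLM_MODEL", "OLLAMA_LLM_MODEL")


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Demonstration against a throwaway file with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / "main.env"
    env_path.write_text(
        "# placeholder values for illustration\n"
        "GYM_ENVIRONMENT=CartPole-v1\n"
        "OLLAMA_VLM_MODEL=llava\n",
        encoding="utf-8",
    )
    config = read_env_file(env_path)

print(missing_keys(config))  # prints ['OLLAMA_LLM_MODEL']
```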

### 3. Running the Agent

Ensure your Ollama server is running. Then, from the root `SIMA2-Agent` directory, run the main script:

```bash
python gsima-agent/gsima/main.py
```

Logs for the agent run will be stored in `gsima-agent/outputs/logs/`.
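
Because the run depends on a reachable Ollama server, a pre-flight check can save a failed start. The sketch below assumes Ollama's default address, `http://localhost:11434`, and uses its `/api/tags` endpoint, which lists locally available models; `ollama_is_reachable` is a hypothetical helper and is not part of the project.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """Return True if the Ollama HTTP API answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if not ollama_is_reachable():
    print("Ollama does not appear to be running; start it before launching the agent.")
```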

### 4. Running Tests

The test suite validates the modular architecture and agent logic.

```bash
# First, install the testing framework (if you haven't already)
uv pip install pytest

# Run the tests from the root SIMA2-Agent directory
uv run pytest gsima-agent/tests/
```

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github

first seen: Jan 17, 2026

last updated: Mar 6, 2026

last crawled: 14 days ago

version: —
