orbit

provenance:github:aadya940/orbit

WHAT THIS AGENT DOES

Orbit is a set of composable building blocks for Computer Use AI agents. It is written primarily in Python and automates both browsers and desktop applications on Windows, Linux, and macOS. Developers combine its steps to build agents that operate a computer on the user's behalf. The repository has 4 GitHub stars.

PROBLEM IT SOLVES

Orbit replaces hand-written automation scripts and monolithic agent prompts with reusable step components, each with its own model and its own cap on LLM calls, so developers can control and debug an agent one step at a time.


TECH & STACK

python · agentic-ai · automation · browser-automation · computer-use · ai-agent

README

<p align="center">
  <img src="logo.png" alt="Orbit logo">
</p>

<p align="center">
  <strong>Autonomous agents are demos. Controlled agents are products.</strong>
</p>

---

<p align="center">
  <a href="https://youtu.be/nll7Mmzwh00">
    <img src="demo_preview.svg" width="720" alt="Watch Orbit in action">
  </a>
</p>

---

## The problem

AI agents can use computers now.

But in practice:
- they loop
- they click the wrong thing
- they get stuck on simple steps
- they're impossible to steer mid-task

Most frameworks either hide everything in a black box, or hand you raw tools with no structure.

Neither works in production.


## Orbit

Natural language controls the screen.  
Python controls the flow.

Instead of one monolithic agent, Orbit breaks execution into **independent steps**:

`Do` · `Read` · `Check` · `Navigate` · `Fill`

Each step runs its own model, has its own budget, and returns typed output. All steps share context.


## Why this matters

- Use a cheap model for simple clicks, a powerful one for complex reasoning
- Cap LLM calls per step, so nothing runs forever
- Inject guidance mid-execution when the agent is struggling
- Extract structured data directly into Pydantic models
- Toggle `planner=False` for low-latency direct execution

This turns agents from **demos into usable systems**.
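
The budgeting idea can be illustrated without Orbit itself. The sketch below is not Orbit's implementation, and the README does not show how `max_steps` is enforced internally. `StepBudget`, `BudgetExceeded`, `run_step`, and the two status strings are hypothetical names that only show how a per-step cap stops a looping step.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a step tries to spend more LLM calls than it was given."""


class StepBudget:
    """Hypothetical per-step call counter; not part of Orbit's API."""

    def __init__(self, max_steps: int) -> None:
        if max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        self.max_steps = max_steps
        self.used = 0

    def spend(self) -> int:
        """Record one LLM call and return how many calls remain."""
        if self.used >= self.max_steps:
            raise BudgetExceeded(f"step exceeded its budget of {self.max_steps} calls")
        self.used += 1
        return self.max_steps - self.used


def run_step(budget: StepBudget, is_done) -> str:
    """Call `is_done` (a stand-in for one LLM round trip) until it
    reports success or the budget runs out."""
    try:
        while True:
            budget.spend()
            if is_done():
                return "success"
    except BudgetExceeded:
        return "budget_exhausted"


# A step that never finishes is cut off after three calls.
stuck = StepBudget(max_steps=3)
print(run_step(stuck, lambda: False), stuck.used)
```

Because each step owns its counter, one stuck step cannot consume the calls reserved for the steps that follow it.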


## Key difference

Most agents see pixels.

**Orbit sees the UI.**

It reads the OS accessibility tree: no screenshots, no DOM hacks. Works across desktop apps and browsers with lower token usage.
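
To see why a structured tree can be compact, here is a toy sketch. The `Node` shape and the indented text format are assumptions made for illustration; the README does not show the format Orbit or OculOS actually passes to the model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical accessibility node: a role, a label, and child nodes."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Flatten the tree into one indented text line per element."""
    label = f'{node.role} "{node.name}"' if node.name else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


window = Node("window", "Search", [
    Node("textbox", "Query"),
    Node("group", children=[Node("button", "Go"), Node("button", "Cancel")]),
])

print("\n".join(render(window)))
```

Each element becomes one short line of text that names its role and label, which is the kind of input a model can act on without interpreting pixels.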


## Quickstart

```bash
pip install orbit-cua
```

```python
from dotenv import load_dotenv
load_dotenv()

from orbit import Agent
import asyncio

async def main():
    result = await Agent(
        task="Open Chrome and go to Wikipedia",
        llm="gemini-3-pro-preview",
        verbose=True,
    ).run()
    print(result.status)

asyncio.run(main())
```

Set your API key. Orbit supports any model via [LiteLLM](https://docs.litellm.ai/):

```bash
export GEMINI_API_KEY="your-key"   # or OPENAI_API_KEY / ANTHROPIC_API_KEY
```
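
A missing key usually surfaces only once the first model call fails, so it can help to check up front. This is a minimal sketch, not part of Orbit: `configured_keys` is a hypothetical helper, and it only looks for the three variable names listed above (other LiteLLM providers use other names). If you keep keys in a `.env` file, run it after `load_dotenv()`.

```python
import os

# Environment variable names taken from the export line above.
KEY_NAMES = ("GEMINI_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(env=None) -> list[str]:
    """Return the names of the provider keys that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in KEY_NAMES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("Found:", ", ".join(found))
    else:
        print("No provider key set; export one before running the agent.")
```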


## Composable SDK

When you need precision, drop to the SDK:

```python
from dotenv import load_dotenv
load_dotenv()


from orbit import Do, Read, Check, Navigate, session
from pydantic import BaseModel
import asyncio

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

class ProductList(BaseModel):
    products: list[Product]

async def main():
    action_model = "gemini-3-flash-preview"

    async with session() as s:
        await Navigate(
            "https://www.amazon.com/s?k=mechanical+keyboard",
            session=s, llm=action_model, max_steps=30, planner=False,
            extra_info="Avoid bookmark bar links; use direct navigation tools first.",
            verbose=True,
        ).run()

        if await Check(
            "The current page is a Captcha page and `Continue Shopping` button is visible",
            session=s, llm=action_model, max_steps=30, planner=False,
        ).check():
            await Do(
                "Click `Continue Shopping`, then solve the Captcha.",
                session=s, llm=action_model, max_steps=30,
            ).run()

        products = await Read(
            "All search results",
            schema=ProductList,
            session=s, llm=action_model, max_steps=30, verbose=True,
        ).run()

        cheapest = min(products.output.products, key=lambda p: p.price)

        await Do(f"click on '{cheapest.name}'", session=s, llm=action_model, max_steps=30).run()

        if await Check("Add to Cart button is visible", session=s, llm=action_model, max_steps=30).check():
            await Do("click Add to Cart", session=s, llm=action_model, max_steps=30).run()

asyncio.run(main())
```
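
One detail worth guarding in the example above: `min()` raises `ValueError` on an empty sequence, so a `Read` that returns no products would crash the script. Below is a minimal sketch of a guard. It uses a plain dataclass as a stand-in for the Pydantic `Product` so it runs without Orbit installed, and `pick_cheapest` is a hypothetical helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Product:
    """Stand-in for the Pydantic Product model in the example above."""
    name: str
    price: float
    in_stock: bool


def pick_cheapest(products: Sequence[Product], require_stock: bool = True) -> Optional[Product]:
    """Return the lowest-priced product, or None when nothing qualifies."""
    candidates = [p for p in products if p.in_stock or not require_stock]
    # default=None avoids the ValueError that min() raises on an empty list.
    return min(candidates, key=lambda p: p.price, default=None)
```

In the script, `cheapest = pick_cheapest(products.output.products)` followed by an `if cheapest is not None:` check would skip the click step when the page yields no usable results.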


## The idea

Agents shouldn't be one giant prompt.

They should be composable systems.

Orbit gives you:
- **verbs** instead of prompts
- **steps** instead of guesswork
- **control** instead of hope


## Custom actions

Build reusable, domain-specific actions by subclassing `BaseActionAgent`:

```python
from dotenv import load_dotenv
load_dotenv()

from orbit import BaseActionAgent, Navigate, session
from pydantic import BaseModel
import asyncio

class ProductList(BaseModel):
    products: list[dict]

class ReadTopProducts(BaseActionAgent):
    def __init__(self, category: str, **kw):
        super().__init__(max_steps=12, planner=False, **kw)
        self.category = category

    def task_prompt(self) -> str:
        return (
            f"Read top products for '{self.category}' from the current page. "
            "Extract name, price, and stock status only. Do not click or navigate."
        )

    def output_schema(self):
        return ProductList

async def main():
    async with session() as s:
        await Navigate("https://www.amazon.com/s?k=mechanical+keyboard", session=s).run()
        result = await ReadTopProducts(
            category="mechanical keyboard",
            session=s, llm="gemini-3-flash-preview", verbose=True,
        ).run()
        print(result.output.products[:3])

asyncio.run(main())
```


## Install from source

<details>
<summary>Build from source (requires Rust)</summary>

```bash
git clone --recurse-submodules https://github.com/aadya940/orbit.git
cd orbit

cd oculos && cargo build --release && cd ..
mkdir -p orbit/_bin

# Copy the binary: run only the command for your OS.
# Linux/macOS
cp oculos/target/release/oculos orbit/_bin/oculos

# Windows (cmd.exe)
copy oculos\target\release\oculos.exe orbit\_bin\oculos.exe

pip install .
```

macOS users: grant accessibility permissions as described [here](https://github.com/huseyinstif/oculos?tab=readme-ov-file#macos-grant-accessibility-permission).

</details>


## Support matrix

| OS | Architectures |
|---|---|
| **Windows** | x86-64 (`win_amd64`) |
| **Linux** | x86-64 (`manylinux`) |
| **macOS** | Intel + Apple Silicon (`universal2`) |

| Python | 3.10 · 3.11 · 3.12 · 3.13 |
|---|---|
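
The matrix can be checked programmatically before installing. This is a sketch under stated assumptions: `is_supported` is a hypothetical helper, and the machine names are my mapping of the wheel tags above onto the strings `platform.machine()` typically reports, not something the README specifies.

```python
import platform
import sys

# Values transcribed from the support matrix above.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12), (3, 13)}
SUPPORTED_MACHINES = {
    "Windows": {"amd64", "x86_64"},
    "Linux": {"x86_64", "amd64"},
    "Darwin": {"x86_64", "arm64"},  # macOS: Intel + Apple Silicon
}


def is_supported(system: str, machine: str, version: tuple) -> bool:
    """Check an OS / architecture / Python version triple against the matrix."""
    machines = SUPPORTED_MACHINES.get(system, set())
    return machine.lower() in machines and tuple(version[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    ok = is_supported(platform.system(), platform.machine(), sys.version_info)
    print("supported" if ok else "not listed in the support matrix")
```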


## Safety

No permanent file deletion: destructive operations go to the Trash/Recycle Bin. Disk writes require explicit human approval via a configurable callback.
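
The README does not show the callback's signature or how it is registered, so the following only sketches the general approval-callback pattern. Every name here (`WriteRequest`, `Approver`, `deny_all`, `allow_under`, `guarded_write`) is hypothetical and not taken from Orbit.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical description of a pending disk write."""
    path: Path
    size_bytes: int


# An approver receives the request and returns True to allow the write.
Approver = Callable[[WriteRequest], bool]


def deny_all(request: WriteRequest) -> bool:
    """Safe default: refuse every write."""
    return False


def allow_under(directory: Path) -> Approver:
    """Build an approver that only allows writes inside `directory`."""
    root = directory.resolve()

    def approve(request: WriteRequest) -> bool:
        target = request.path.resolve()
        return target == root or root in target.parents

    return approve


def guarded_write(path: Path, data: bytes, approve: Approver = deny_all) -> bool:
    """Write `data` only if the approver agrees; report whether it happened."""
    if not approve(WriteRequest(path=path, size_bytes=len(data))):
        return False
    path.write_bytes(data)
    return True
```

Defaulting to `deny_all` means a forgotten configuration fails closed. An interactive approver could instead prompt a person and return their answer.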


## License

Apache 2.0. Special thanks to [OculOS](https://github.com/huseyinstif/oculos) and the open-source packages that make this possible.

PUBLIC HISTORY

First discovered: Mar 31, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Mar 14, 2026

last updated: Mar 30, 2026

last crawled: 2 months ago

version: (none)
