GTA

provenance:github:open-compass/GTA

WHAT THIS AGENT DOES

GTA is a benchmark for general tool agents, published in the NeurIPS 2024 Datasets and Benchmarks (D&B) Track. It is implemented primarily in Python and evaluates agents powered by large language models (LLMs). GTA gives researchers and developers a standardized environment in which to compare LLM-powered agents and measure improvements in their tool-use capabilities across a range of tasks.

PROBLEM IT SOLVES

GTA addresses the need for a standardized benchmark for general tool agents, one that supports objective comparisons and makes progress trackable over time. Rather than assessing agent performance manually across diverse tasks, researchers can use GTA to automate the evaluation process.

TECH & STACK

llm-agent, llm-evaluation, python, github, benchmark, neurips

PUBLIC HISTORY

First discovered: Apr 18, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github

first seen: Jun 6, 2024

last updated: Apr 17, 2026

last crawled: 2 months ago

version: —