github, inferred, active

GTA

provenance:github:open-compass/GTA
WHAT THIS AGENT DOES

GTA is a benchmark for general tool agents, designed for the NeurIPS 2024 Datasets and Benchmarks (D&B) Track. It is implemented primarily in Python and evaluates agents built on large language models (LLMs). By providing a standardized environment, GTA lets researchers and developers compare LLM-powered agents across tasks and measure improvements in their capabilities.

PROBLEM IT SOLVES

GTA addresses the need for a standardized benchmark for general tool agents, enabling objective comparisons and progress tracking. Instead of manually assessing agent performance across diverse tasks, researchers can use GTA to automate the evaluation.

CAPABILITIES & CONSTRAINTS

TECH & STACK
llm-agent, llm-evaluation, python, github, benchmark, neurips

PUBLIC HISTORY

First discovered: Apr 18, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github
first seen: Jun 6, 2024
last updated: Apr 17, 2026
last crawled: 1 month ago
version:

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:open-compass/GTA)