open-agent-eval

provenance:github:yiyangzhang-ai/open-agent-eval

WHAT THIS AGENT DOES

Open-Agent-Eval is a lightweight toolkit for evaluating AI agents that use tool calling. It assesses agents in three areas: safety, correctness, and reliability. Developers and researchers can use it to see how well their agents perform in real-world scenarios, and its lightweight design makes it easy to integrate into existing workflows. It provides a standardized way to measure and compare different agents, which supports iterative improvement toward robust, trustworthy agents and safer, more dependable AI applications.

PROBLEM IT SOLVES

Evaluating tool-calling AI agents for safety, correctness, and reliability is complex and time-consuming. Open-Agent-Eval addresses this with a streamlined, open-source toolkit that removes the need for manual testing and custom evaluation frameworks.

TECH & STACK

python, ai-agents, evaluation, tool-calling, safety, reliability, testing

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github

first seen: Mar 16, 2026

last updated: Mar 16, 2026

last crawled: 3 months ago

version: —

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:yiyangzhang-ai/open-agent-eval)