github, inferred, active

hallucination-elimination-benchmark

provenance:github:Mysticbirdie/hallucination-elimination-benchmark
WHAT THIS AGENT DOES

The hallucination-elimination-benchmark is a multi-tier benchmark aimed at eliminating large language model (LLM) hallucinations. It evaluates models such as Claude 4.6, GPT-5.2, Mistral 7B, and Gemini 2.5 Pro on a dataset of 222 adversarial question-answer pairs about Ancient Rome. It reports accuracy of 95-100% and includes what it describes as novel topological paradox detection. It is intended for developers and researchers who want to assess and improve the reliability of LLMs.

PROBLEM IT SOLVES

This agent addresses LLM hallucinations, which produce inaccurate and unreliable output. It provides a model-agnostic benchmark that evaluates LLMs automatically, saving developers the time and effort of manual evaluation.
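To illustrate what "model-agnostic" automatic evaluation can look like, here is a minimal sketch. It is not the repository's code: the `Model` type, the `accuracy` and `normalize` helpers, the exact-match scoring rule, and the sample questions are all assumptions made for illustration. The benchmark's actual scoring method and data format are not described on this page.

```python
from typing import Callable, Iterable

# Hypothetical type: any function that maps a question string to an answer string.
# Treating a model as a plain callable is what keeps the loop model-agnostic.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not scored as errors."""
    return " ".join(text.lower().split())


def accuracy(model: Model, qa_pairs: Iterable[tuple[str, str]]) -> float:
    """Return the fraction of answers that match the reference exactly after normalization.

    Exact match is an assumed scoring rule, chosen here only for simplicity.
    """
    pairs = list(qa_pairs)
    if not pairs:
        raise ValueError("qa_pairs must not be empty")
    correct = sum(normalize(model(q)) == normalize(a) for q, a in pairs)
    return correct / len(pairs)


# Illustrative data only; not taken from the benchmark's 222-pair dataset.
SAMPLE_PAIRS = [
    ("In what year was Julius Caesar assassinated?", "44 BC"),
    ("Which river did Caesar cross in 49 BC?", "The Rubicon"),
]


def stub_model(question: str) -> str:
    """Stand-in for a real LLM call; answers one question correctly and one incorrectly."""
    return "44 BC" if "assassinated" in question else "The Tiber"


print(accuracy(stub_model, SAMPLE_PAIRS))  # 0.5
```

Swapping `stub_model` for a function that wraps a real LLM client would let the same loop score any model without changing the evaluation code.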

First seen 3mo ago
Not yet hireable

CAPABILITIES & CONSTRAINTS

TECH & STACK
python, llm-evaluation, hallucination, llms, github, inference

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github
first seen: Feb 19, 2026
last updated: Mar 8, 2026
last crawled: 2 months ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:Mysticbirdie/hallucination-elimination-benchmark)