hallucination-elimination-benchmark

provenance:github:Mysticbirdie/hallucination-elimination-benchmark

WHAT THIS AGENT DOES

hallucination-elimination-benchmark is a multi-tier benchmark aimed at eliminating large language model (LLM) hallucinations. It evaluates models such as Claude 4.6, GPT-5.2, Mistral 7B, and Gemini 2.5 Pro on a dataset of 222 adversarial question-answer pairs about Ancient Rome. It reports accuracy of 95-100% and includes a topological paradox detection component that it describes as novel. It is intended for developers and researchers who want to assess and improve the reliability of LLMs.

PROBLEM IT SOLVES

This agent addresses LLM hallucinations, which produce inaccurate and unreliable information. It provides a model-agnostic benchmark that evaluates LLMs automatically, saving developers the time and effort of manual evaluation.
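As a rough illustration of what "model-agnostic" automatic evaluation can look like, the sketch below scores any callable that maps a question to an answer string against a list of question-answer pairs. It is not taken from the repository: the names `accuracy`, `toy_model`, and `pairs`, the two sample questions, and the case-insensitive substring scoring rule are all assumptions made for this example.

```python
from typing import Callable, Iterable, Tuple


def accuracy(model: Callable[[str], str],
             qa_pairs: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of answers that contain the expected string.

    Scoring is a case-insensitive substring match, which is an assumption
    for this sketch and not the benchmark's actual scoring method.
    """
    total = 0
    correct = 0
    for question, expected in qa_pairs:
        total += 1
        answer = model(question)
        if expected.lower() in answer.lower():
            correct += 1
    # Avoid division by zero on an empty dataset.
    return correct / total if total else 0.0


def toy_model(question: str) -> str:
    """Hypothetical stand-in for a real LLM call; answers one question only."""
    canned = {
        "Who was the first Roman emperor?":
            "Augustus was the first Roman emperor.",
    }
    return canned.get(question, "I don't know.")


# Hypothetical sample data, not drawn from the benchmark's 222 pairs.
pairs = [
    ("Who was the first Roman emperor?", "Augustus"),
    ("Which river runs through Rome?", "Tiber"),
]

print(accuracy(toy_model, pairs))  # prints 0.5
```

Because `accuracy` only requires a function from question to answer, the same loop could wrap any model client without changes, which is the sense of "model-agnostic" assumed here.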

CAPABILITIES & CONSTRAINTS

TECH & STACK

python, llm-evaluation, hallucination, llms, github, inference

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

METADATA

platform: github

first seen: Feb 19, 2026

last updated: Mar 8, 2026

last crawled: 3 months ago

version: —

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:Mysticbirdie/hallucination-elimination-benchmark)