Zakat Eligibility Copilot (POC)

Eval dashboard

Deterministic checks run on every memo: schema validity, citation grounding (each quote must be a verbatim substring of the cited clause), and escalation consistency. Run from the CLI with: uv run python -m eval.run_eval
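As an illustration of the citation grounding check, here is a minimal sketch of a verbatim-substring test. The Citation fields, the clauses mapping, and the function names are hypothetical and are not taken from eval.run_eval. Scoring an empty citation list as 1.0 is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    clause_id: str  # hypothetical field: ID of the policy clause being cited
    quote: str      # hypothetical field: text the memo claims to quote


def is_grounded(citation: Citation, clauses: dict[str, str]) -> bool:
    """True if the quote is a non-empty verbatim substring of the cited clause."""
    clause_text = clauses.get(citation.clause_id)
    if clause_text is None or not citation.quote:
        return False
    return citation.quote in clause_text


def citation_faithfulness(citations: list[Citation], clauses: dict[str, str]) -> float:
    """Fraction of grounded citations; an empty list scores 1.0 (assumption)."""
    if not citations:
        return 1.0
    return sum(is_grounded(c, clauses) for c in citations) / len(citations)


# Illustrative data only; not real policy text.
clauses = {"C1": "Placeholder clause text used only for this example."}
citations = [
    Citation("C1", "clause text used only"),  # verbatim substring: grounded
    Citation("C1", "Clause Text Used Only"),  # case differs: not grounded
    Citation("C9", "clause text"),            # unknown clause ID: not grounded
]
print(citation_faithfulness(citations, clauses))
```

The check is deliberately strict: no case folding or whitespace normalization, so a paraphrased or re-cased quote fails. That keeps the result deterministic and independent of any model judgment.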

Latest baseline · gpt-5 · policy 2025-02-24

Citation faithfulness: 100.0% (target ≥95%)
Effective accuracy: 90.0% (contested+escalate counts as ESCALATE)
Category accuracy (strict): 83.3% (exact category match)
Escalation recall: 62.5% (target 100% on contested)
Mean cost / draft: $0.02598 (target <$0.05)
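The accuracy and recall figures can be related with a short sketch. This is one plausible reading of the metric notes, not the project's actual scoring code: strict accuracy requires an exact category match, effective accuracy additionally accepts an escalated memo on a contested case, and escalation recall is the share of contested cases that were escalated. The CaseResult fields and the category strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseResult:
    expected_category: str   # gold label (hypothetical field)
    predicted_category: str  # category in the drafted memo (hypothetical field)
    contested: bool          # gold label: case should be escalated
    escalated: bool          # memo asked for escalation


def strict_accuracy(results: list[CaseResult]) -> float:
    """Share of cases with an exact category match."""
    hits = sum(r.predicted_category == r.expected_category for r in results)
    return hits / len(results)


def effective_accuracy(results: list[CaseResult]) -> float:
    """Exact match, or an escalation on a contested case (assumed reading)."""
    hits = sum(
        r.predicted_category == r.expected_category
        or (r.contested and r.escalated)
        for r in results
    )
    return hits / len(results)


def escalation_recall(results: list[CaseResult]) -> Optional[float]:
    """Share of contested cases that were escalated; None if there are none."""
    contested = [r for r in results if r.contested]
    if not contested:
        return None
    return sum(r.escalated for r in contested) / len(contested)


# Illustrative cases only; not drawn from the evaluation set.
results = [
    CaseResult("eligible", "eligible", contested=False, escalated=False),
    CaseResult("ineligible", "eligible", contested=False, escalated=False),
    CaseResult("eligible", "ineligible", contested=True, escalated=True),
    CaseResult("ineligible", "ineligible", contested=True, escalated=False),
]
print(strict_accuracy(results), effective_accuracy(results), escalation_recall(results))
```

Under this reading effective accuracy can never fall below strict accuracy, which matches the ordering of the two figures reported here.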

Committed baselines

Name                                            Faithful  Effective  Esc. recall  $/draft
phase1 (5/21/2026, 8:57:14 PM)                  100.0%    86.7%      50.0%        $0.02657
phase1_with_guardrails (5/21/2026, 9:03:33 PM)  100.0%    90.0%      62.5%        $0.02598