Public benchmark card
openai/evals
Support Bot Guard
Overall score improved from 76 to 86. Biggest movement came from quality.
Before
76
After
86
Delta
+10
Run status
completed
Why this artifact is shareable
Best improvement
quality
+13
Dimensions improved
4
out of 4 measured dimensions
Main risk
latency
+5 (smallest gain among the measured dimensions)
Suggested launch post
Copy this when sharing the benchmark on X, GitHub, launch posts, or team chats.
A2ZAI Checks: openai/evals Support Bot Guard finished at 86 (+10 vs baseline). Best gain: quality +13. No failing examples were detected in this run. https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard
Benchmark URL: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard
Social card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard/opengraph-image
Add to README
Link to this benchmark from your repo README so visitors see your eval results.
Badge (markdown)
[![Benchmark: Support Bot Guard](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard/opengraph-image)](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard)
Link (markdown)
[Benchmark: Support Bot Guard](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard)
Dimension scorecard
quality
74 -> 87
+13
safety
82 -> 91
+9
latency
79 -> 84
+5
cost
68 -> 81
+13
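The overall scores are consistent with a simple unweighted mean of the four dimension scores, rounded to the nearest integer. A minimal sketch, assuming that aggregation (the actual formula may be weighted):

```python
# Sketch: reproduce the overall scores from the dimension scorecard,
# assuming "Overall" is the unweighted mean of the four dimensions,
# rounded to the nearest integer. (Assumption: the real aggregation
# may differ, e.g. weighted dimensions.)

before = {"quality": 74, "safety": 82, "latency": 79, "cost": 68}
after = {"quality": 87, "safety": 91, "latency": 84, "cost": 81}

def overall(scores: dict[str, int]) -> int:
    return round(sum(scores.values()) / len(scores))

deltas = {name: after[name] - before[name] for name in before}

print(overall(before))  # 76
print(overall(after))   # 86
print(deltas)           # {'quality': 13, 'safety': 9, 'latency': 5, 'cost': 13}
```

Under this assumption, 303/4 = 75.75 rounds to 76 and 343/4 = 85.75 rounds to 86, matching the card's before/after values.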
PR scorecard output

## A2ZAI Checks Scorecard

Repo: `openai/evals` • PR #1842
Pack: `Support Bot Guard`
Overall: **76 -> 86** (+10)

### Dimension deltas

- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)

Public benchmark card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard
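A comment in that shape could be rendered by a small helper. A sketch, with the field names and layout inferred from the sample output above (the helper name and signature are assumptions, not an actual A2ZAI API):

```python
# Sketch: render a PR scorecard comment in the format shown above.
# (Assumption: layout inferred from the sample output; not an official API.)

def render_scorecard(repo: str, pr: int, pack: str,
                     before: dict[str, int], after: dict[str, int],
                     card_url: str) -> str:
    # Overall is assumed to be the rounded mean of the dimension scores.
    ob = round(sum(before.values()) / len(before))
    oa = round(sum(after.values()) / len(after))
    lines = [
        "## A2ZAI Checks Scorecard",
        "",
        f"Repo: `{repo}` • PR #{pr}",
        f"Pack: `{pack}`",
        f"Overall: **{ob} -> {oa}** ({oa - ob:+d})",
        "",
        "### Dimension deltas",
        "",
    ]
    for name in before:
        delta = after[name] - before[name]
        lines.append(f"- {name}: {before[name]} -> {after[name]} ({delta:+d})")
    lines += ["", f"Public benchmark card: {card_url}"]
    return "\n".join(lines)
```

For example, calling it with the scores from this run reproduces the comment body, including the `Overall: **76 -> 86** (+10)` header line.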
Run context
Repo: openai/evals
Branch: main -> feature/prompt-update
PR: #1842
Created: 3/12/2026, 5:40:41 PM
GitHub writeback: failed — GitHub API 404: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/apps/apps#get-a-repository-installation-for-the-authenticated-app","status":"404"}. A 404 on this endpoint typically means the GitHub App is not installed on the repository (or lacks access to it), so the scorecard comment could not be posted.
Cases to review
No failing examples were detected in this run.