Public benchmark card

openai/evals

Support Bot Guard

Overall score improved from 76 to 86. The biggest movement came from quality and cost, which each gained 13 points.

Before

76

After

86

Delta

+10

Run status

completed

Why this artifact is shareable

Best improvement

quality

+13

Dimensions improved

4

out of 4 measured dimensions

Main risk

latency

+5 (smallest gain of the 4 dimensions)

Public URL ready to share

1200 x 630 social card export ready

GitHub-native eval artifact

No regressed dimensions
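The highlight fields above (best improvement, dimensions improved, main risk, no regressions) can all be derived from the per-dimension scores in the scorecard. The sketch below assumes "best improvement" is the largest delta, "main risk" is the smallest delta, and ties go to the first dimension listed; these rules are inferred from the numbers on this card, not taken from A2ZAI Checks documentation, and `summarize` is a hypothetical helper.

```python
# Per-dimension (before, after) scores copied from this card's scorecard.
DIMENSIONS: dict[str, tuple[int, int]] = {
    "quality": (74, 87),
    "safety": (82, 91),
    "latency": (79, 84),
    "cost": (68, 81),
}


def summarize(dims: dict[str, tuple[int, int]]) -> dict[str, object]:
    """Hypothetical helper deriving the card's highlight fields.

    Assumed rules: best improvement = largest delta, main risk = smallest
    delta; max()/min() return the first of several equal values, so ties
    resolve to the dimension listed first.
    """
    deltas = {name: after - before for name, (before, after) in dims.items()}
    best = max(deltas, key=deltas.__getitem__)
    risk = min(deltas, key=deltas.__getitem__)
    return {
        "best_improvement": (best, deltas[best]),
        "dimensions_improved": sum(1 for d in deltas.values() if d > 0),
        "dimensions_measured": len(deltas),
        "main_risk": (risk, deltas[risk]),
        "regressed": [name for name, d in deltas.items() if d < 0],
    }


if __name__ == "__main__":
    print(summarize(DIMENSIONS))
```

Because quality and cost tie at +13, the tie-breaking rule decides which one is shown as the best improvement.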

Suggested launch post

Copy this when sharing the benchmark on X, GitHub, launch posts, or team chats.

A2ZAI Checks: openai/evals
Support Bot Guard finished at 86 (+10 vs baseline).
Best gain: quality +13.
No failing examples were detected in this run.
https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard

Benchmark URL: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard

Social card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard/opengraph-image
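If you want to regenerate the suggested post for other runs, a minimal template sketch follows. The base URL and line layout are copied from the post above; the function name and the wording for runs that do have failing examples are hypothetical.

```python
# Base URL taken from the benchmark links on this card.
BASE_URL = "https://a2zai.ai/checks/benchmarks"


def launch_post(repo: str, pack: str, slug: str, before: int, after: int,
                best: tuple[str, int], failing_examples: int) -> str:
    """Hypothetical template mirroring the suggested launch post."""
    delta = after - before
    if failing_examples == 0:
        failures = "No failing examples were detected in this run."
    else:
        # Hypothetical wording; this card only shows the zero-failure case.
        failures = f"{failing_examples} failing examples to review."
    lines = [
        f"A2ZAI Checks: {repo}",
        f"{pack} finished at {after} ({delta:+d} vs baseline).",  # +d forces a sign
        f"Best gain: {best[0]} {best[1]:+d}.",
        failures,
        f"{BASE_URL}/{slug}",
    ]
    return "\n".join(lines)


print(launch_post("openai/evals", "Support Bot Guard",
                  "openai-evals-support-bot-guard", 76, 86, ("quality", 13), 0))
```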

Add to README

Link to this benchmark from your repo README so visitors see your eval results.

Badge (markdown)

[![A2ZAI Checks](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard/opengraph-image)](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard)

Link (markdown)

[Benchmark: Support Bot Guard](https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard)
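Both snippets follow one URL pattern, so they can be generated from the repo and pack names. The slug rule below (lowercase, collapse every run of non-alphanumeric characters into one hyphen) is a guess inferred from this single example (`openai/evals` + `Support Bot Guard` becomes `openai-evals-support-bot-guard`); the real service may build slugs differently, and both helpers are hypothetical.

```python
import re

BASE_URL = "https://a2zai.ai/checks/benchmarks"


def slugify(repo: str, pack: str) -> str:
    """Hypothetical slug rule inferred from one example on this card."""
    raw = f"{repo} {pack}".lower()
    # Replace each run of characters outside [a-z0-9] with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")


def readme_snippets(repo: str, pack: str) -> tuple[str, str]:
    """Return (badge markdown, link markdown) in the formats shown above."""
    url = f"{BASE_URL}/{slugify(repo, pack)}"
    badge = f"[![A2ZAI Checks]({url}/opengraph-image)]({url})"
    link = f"[Benchmark: {pack}]({url})"
    return badge, link


badge, link = readme_snippets("openai/evals", "Support Bot Guard")
print(badge)
print(link)
```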

Dimension scorecard

quality

74 -> 87

+13

safety

82 -> 91

+9

latency

79 -> 84

+5

cost

68 -> 81

+13
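The overall scores are consistent with an unweighted mean of the four dimensions, rounded to the nearest integer: (74 + 82 + 79 + 68) / 4 = 75.75, which rounds to 76, and (87 + 91 + 84 + 81) / 4 = 85.75, which rounds to 86. Whether A2ZAI Checks actually aggregates this way is an assumption; the sketch only checks that the published numbers agree with it.

```python
from statistics import mean

# Per-dimension (before, after) scores from the scorecard above.
SCORES: dict[str, tuple[int, int]] = {
    "quality": (74, 87),
    "safety": (82, 91),
    "latency": (79, 84),
    "cost": (68, 81),
}


def overall(scores: dict[str, tuple[int, int]]) -> tuple[int, int, int]:
    """Return (before, after, delta) assuming an unweighted, rounded mean.

    Note: Python's round() rounds exact .5 values to the nearest even
    integer; the service's own rounding rule is unknown.
    """
    before = round(mean(b for b, _ in scores.values()))
    after = round(mean(a for _, a in scores.values()))
    return before, after, after - before


print(overall(SCORES))
```

Under this assumption the +10 overall delta also equals the mean of the dimension deltas, (13 + 9 + 5 + 13) / 4 = 10.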

PR scorecard output

## A2ZAI Checks Scorecard

Repo: `openai/evals` • PR #1842
Pack: `Support Bot Guard`

Overall: **76 -> 86** (+10)

### Dimension deltas
- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)

Public benchmark card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard
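The PR comment above is plain markdown, so it can be reproduced from the run data. This is a minimal sketch that mirrors the layout shown; `render_scorecard` is a hypothetical name, not part of any published A2ZAI Checks API.

```python
def render_scorecard(repo: str, pr: int, pack: str, overall: tuple[int, int],
                     dims: dict[str, tuple[int, int]], card_url: str) -> str:
    """Hypothetical renderer reproducing the PR scorecard markdown layout."""
    before, after = overall
    lines = [
        "## A2ZAI Checks Scorecard",
        "",
        f"Repo: `{repo}` \u2022 PR #{pr}",  # \u2022 is the bullet separator
        f"Pack: `{pack}`",
        "",
        f"Overall: **{before} -> {after}** ({after - before:+d})",
        "",
        "### Dimension deltas",
    ]
    # One bullet per dimension, with a signed delta.
    lines += [f"- {name}: {b} -> {a} ({a - b:+d})" for name, (b, a) in dims.items()]
    lines += ["", f"Public benchmark card: {card_url}"]
    return "\n".join(lines)


dims = {"quality": (74, 87), "safety": (82, 91), "latency": (79, 84), "cost": (68, 81)}
print(render_scorecard("openai/evals", 1842, "Support Bot Guard", (76, 86), dims,
                       "https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard"))
```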

Run context

Repo: openai/evals

Branch: main -> feature/prompt-update

PR: #1842

Created: 3/12/2026, 5:40:41 PM

GitHub writeback: failed — GitHub API 404: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/apps/apps#get-a-repository-installation-for-the-authenticated-app","status":"404"}
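The failing call is GitHub's "Get a repository installation for the authenticated app" endpoint (`GET /repos/{owner}/{repo}/installation`). A 404 there usually means the GitHub App has no installation on that repository, or the repository is not visible to it, so the scorecard could not be posted to the PR. The helper below is a hypothetical sketch that turns an error string in the format above into a short hint; it only recognizes this one case.

```python
import json

INSTALLATION_DOC_ANCHOR = "#get-a-repository-installation-for-the-authenticated-app"


def writeback_hint(error: str) -> str:
    """Hypothetical helper mapping a 'GitHub API <code>: {json}' string to a hint."""
    # The JSON payload starts at the first "{" in the message.
    payload = json.loads(error[error.index("{"):])
    status = str(payload.get("status", ""))
    doc_url = payload.get("documentation_url", "")
    if status == "404" and doc_url.endswith(INSTALLATION_DOC_ANCHOR):
        return "GitHub App not installed on this repository (or repository not visible to it)"
    return f"Unrecognized GitHub API error (status {status or 'unknown'})"


err = ('GitHub API 404: {"message":"Not Found","documentation_url":'
       '"https://docs.github.com/rest/apps/apps#get-a-repository-installation-for-the-authenticated-app",'
       '"status":"404"}')
print(writeback_hint(err))
```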

Cases to review

No failing examples were detected in this run.