Public benchmark card

krishnaadavi/a2zai

Support Bot Guard

Overall score improved from 76 to 86 (+10). The largest gains came from quality and cost, each up 13 points.

Before: 76

After: 86

Delta: +10

Run status: completed

Why this artifact is shareable

Best improvement: quality (+13)

Dimensions improved: 4 out of 4 measured dimensions

Main risk: latency

- Public URL ready to share
- 1200 x 630 social card export ready
- GitHub-native eval artifact
- No regressed dimensions

Suggested launch post

Copy this when sharing the benchmark on X, GitHub, launch posts, or team chats.

DriftCheck: krishnaadavi/a2zai
Support Bot Guard finished at 86 (+10 vs baseline).
Best gain: quality +13.
No failing examples were detected in this run.
https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard

Benchmark URL: https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard

Social card: https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard/opengraph-image

Add to README

Link to this benchmark from your repo README so visitors see your eval results.

Badge (markdown)

[![DriftCheck](https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard/opengraph-image)](https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard)

Link (markdown)

[Benchmark: Support Bot Guard](https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard)
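
Both snippets follow the same pattern: the badge image is the benchmark URL with `/opengraph-image` appended, and the link wraps the benchmark URL itself. A minimal sketch that builds them for any benchmark URL is shown here. The helper name `readme_snippets` is hypothetical and not part of any DriftCheck tooling; only the URL pattern is taken from this page.

```python
def readme_snippets(benchmark_url: str, pack_name: str) -> dict[str, str]:
    """Build README markdown for a benchmark card.

    Assumption: the social card lives at <benchmark_url>/opengraph-image,
    as it does for the benchmark shown on this page.
    """
    url = benchmark_url.rstrip("/")
    image = f"{url}/opengraph-image"
    return {
        # Clickable image badge that links back to the benchmark page.
        "badge": f"[![DriftCheck]({image})]({url})",
        # Plain text link, for READMEs that avoid large images.
        "link": f"[Benchmark: {pack_name}]({url})",
    }


snippets = readme_snippets(
    "https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard",
    "Support Bot Guard",
)
print(snippets["badge"])
print(snippets["link"])
```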

Dimension scorecard

- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)
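
The scorecard arithmetic can be checked with a short sketch. The before and after scores are copied from this page. The overall figures of 76 and 86 happen to equal the rounded unweighted mean of the four dimensions, but whether DriftCheck actually aggregates this way is an assumption, as is reading "main risk" as the dimension with the smallest gain.

```python
# Before/after scores per dimension, copied from the scorecard on this page.
scores = {
    "quality": (74, 87),
    "safety": (82, 91),
    "latency": (79, 84),
    "cost": (68, 81),
}

# Delta per dimension: after minus before.
deltas = {name: after - before for name, (before, after) in scores.items()}


def overall(index: int) -> int:
    """Assumption: overall score is the rounded unweighted mean of all dimensions."""
    values = [pair[index] for pair in scores.values()]
    return round(sum(values) / len(values))


before_overall = overall(0)
after_overall = overall(1)

improved = [name for name, d in deltas.items() if d > 0]
regressed = [name for name, d in deltas.items() if d < 0]

# Assumption: the "main risk" is the dimension with the smallest gain.
smallest_gain = min(deltas, key=deltas.get)

print(deltas)
print(before_overall, after_overall, after_overall - before_overall)
print(len(improved), "improved;", len(regressed), "regressed; smallest gain:", smallest_gain)
```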

PR scorecard output

## A2ZAI Checks Scorecard

Repo: `krishnaadavi/a2zai` • PR #1
Pack: `Support Bot Guard`

Overall: **76 -> 86** (+10)

### Dimension deltas
- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)

Public benchmark card: https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard
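
For teams that want to post a similar comment from their own tooling, the layout above can be reproduced with a small renderer. This is a sketch only: `render_scorecard` and its parameters are hypothetical names, and the output format is inferred from the scorecard shown on this page rather than from any published A2ZAI specification.

```python
def render_scorecard(repo, pr_number, pack, dimensions, overall, card_url):
    """Render a markdown scorecard.

    `dimensions` maps a dimension name to a (before, after) pair;
    `overall` is a (before, after) pair for the headline score.
    """
    lines = [
        "## A2ZAI Checks Scorecard",
        "",
        f"Repo: `{repo}` • PR #{pr_number}",
        f"Pack: `{pack}`",
        "",
        # The +d format spec always prints an explicit sign on the delta.
        f"Overall: **{overall[0]} -> {overall[1]}** ({overall[1] - overall[0]:+d})",
        "",
        "### Dimension deltas",
    ]
    for name, (before, after) in dimensions.items():
        lines.append(f"- {name}: {before} -> {after} ({after - before:+d})")
    lines += ["", f"Public benchmark card: {card_url}"]
    return "\n".join(lines)


scorecard = render_scorecard(
    repo="krishnaadavi/a2zai",
    pr_number=1,
    pack="Support Bot Guard",
    dimensions={
        "quality": (74, 87),
        "safety": (82, 91),
        "latency": (79, 84),
        "cost": (68, 81),
    },
    overall=(76, 86),
    card_url="https://a2zai.ai/checks/benchmarks/krishnaadavi-a2zai-support-bot-guard",
)
print(scorecard)
```

Using the signed format spec means a regression would render as a negative delta without any extra branching.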

Run context

Repo: krishnaadavi/a2zai

Branch: main -> checks-writeback-test

PR: #1

Created: 3/12/2026, 5:10:34 PM

GitHub comment: posted successfully

Cases to review

No failing examples were detected in this run.