Local-first builder utility: DriftCheck by A2ZAI

Catch AI regressions before you ship

Start with the open-source runner. Run the three V1 packs for tool calling, RAG faithfulness, and model migrations locally or in CI, then publish a proof card only when you choose.

Run locally first

Install the open-source runner, create starter packs, and check prompt, agent, RAG, and model changes without uploading code.

Track what could break

Use A2ZAI radar to connect provider, model, API, and SDK changes to the packs your stack should run next.

Publish proof when ready

Local reports stay private by default. Publish a proof card only when you want a README, launch, or social artifact.

Local-first quick start

npx @a2zai-ai/driftcheck init
npx @a2zai-ai/driftcheck check
npx @a2zai-ai/driftcheck check --pack tool-calling
Optional publish creates a proof card like the demo. View proof card →

Want to know which pack to run? Check your AI stack →
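
The same commands slot into CI. A minimal sketch of a GitHub Actions gate, assuming the npx commands above run unchanged in CI and the runner exits non-zero when a check fails (the workflow name, trigger, and pack choice are illustrative):

```yaml
# Minimal CI sketch: workflow name, trigger, and pack choice are illustrative.
name: driftcheck
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Assumes the runner exits non-zero on a failing check, failing the PR check.
      - run: npx @a2zai-ai/driftcheck check --pack tool-calling
```
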
1. Get the runner

Install the standalone utility package without cloning the full A2ZAI app repo.

Open DriftCheck →
2. Run starter packs

Initialize the three V1 packs, run checks locally, and review the JSON and markdown reports.

Pack format docs →
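
As a rough sketch, a run's JSON report might carry fields like these (rendered here as YAML; every field name is an assumption, not the runner's documented schema):

```yaml
# Hypothetical report shape: field names are assumptions, not the documented schema.
pack: tool-calling
passed: 11
failed: 1
cases:
  - id: refund-policy
    status: pass
    latency_ms: 840
  - id: edge-case-promotions
    status: fail
    reason: tool call schema mismatch
```
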
3. Publish when ready

Local reports stay private. Publish only when you want a proof card for GitHub or social launch posts.

View proof gallery →

Starter packs

Pre-built packs you can run as-is or copy into the workbench and edit. Pick one below to load it.

Tool-Calling Reliability

Catch schema drift, hallucinated tool calls, and weak fallback behavior before agent changes ship.

Agent tools, schema drift, hallucinated calls and fallback behavior

Use in workbench →

RAG Faithfulness

Catch unsupported claims, missing citations, and weak refusals in retrieval-augmented answers.

Retrieval-augmented answers, citation coverage, refusal quality

Use in workbench →

Model Migration

Compare model migration behavior across quality, cost, latency, and safety before rollout.

Model migrations, quality, cost, latency, and safety tradeoffs

Use in workbench →

Live Support Eval

Run a real OpenAI-backed comparison and score the responses with Checks guardrails.

Support bots, billing assistants, live OpenAI comparison

Use in workbench →

Agent Tool-Calling Guard

Protect tool-calling reliability, schema correctness, and fallback behavior for agent APIs.

Agent APIs, tool-calling reliability, schema and fallback behavior

Use in workbench →

Model Latency and Cost Guard

Catch regressions from pricing/latency/model-routing changes before rollout.

Model routing, latency budgets, and token-cost protection

Use in workbench →

Support Bot Guard

Catch regressions in answer quality, escalation behavior, latency, and cost.

Pre-computed support guardrails, escalation and safety

Use in workbench →

Coding Agent PR Pack

Score a coding agent across patch quality, safety, speed, and cost.

Coding agents, PR workflows, patch quality and safety

Use in workbench →
Pack format and custom YAML →
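
The real schema lives behind the pack format link above. As a rough sketch, a custom pack might pair cases with heuristic rules along these lines (every key name is an assumption, not the documented format):

```yaml
# Illustrative pack sketch: key names are assumptions; see the pack format docs.
name: support-bot-guard
description: Guard answer quality and escalation behavior for a support bot.
cases:
  - id: refund-policy
    input: "Can I get a refund after 30 days?"
    checks:
      - type: contains          # heuristic rule on the output text
        value: refund policy
      - type: max_latency_ms    # latency budget per case
        value: 2000
```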

MVP workflow now live

GitHub-connected PR checks, manual runs, proof cards

`DriftCheck` now supports both manual scorecard runs and automatic GitHub App webhook runs on connected pull requests. Builders can connect a repo, let PR activity trigger checks automatically, and still run starter packs manually when they want to test a different workflow. Packs can evaluate actual case outputs with heuristic rules and, when configured, execute OpenAI-backed comparisons directly from the YAML pack.
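
As a hedged sketch, an OpenAI-backed comparison case carried in the pack might look like this (field names and model identifiers are assumptions, not the documented schema):

```yaml
# Hypothetical OpenAI-backed comparison case: all fields below are assumptions.
cases:
  - id: cancel-subscription
    input: "Cancel my plan but keep my data."
    compare:
      baseline: gpt-4.1-mini   # model IDs are illustrative
      candidate: gpt-4.1
    grade:
      - type: contains         # heuristic rule applied to both responses
        value: data retention
```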

First real workflow

Sign in to run your first Checks pack

The current MVP supports a manual repo connect flow: paste repo metadata, load a YAML starter pack, optionally execute OpenAI-backed cases from the pack itself, grade the outputs with heuristic rules, generate a PR scorecard, and publish a benchmark card.
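
Because the connect flow is manual in the current MVP, repo metadata is pasted rather than pulled from a GitHub App. A sketch of what that metadata might contain (field names and values are assumptions):

```yaml
# Hypothetical pasted repo metadata: field names and values are assumptions.
repo: acme/support-agent
default_branch: main
pr_number: 142
pack: support-bot-guard
```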

Defined viral artifact

PR scorecard first, public proof card second

The artifact A2ZAI creates is not a generic dashboard. It is a visible object that travels naturally through GitHub, founder launches, and social sharing: a PR scorecard backed by a public proof card.

Viral Artifact

GitHub PR scorecard

Shareable PR comment

A2ZAI Checks

Prompt regression check for `support-agent.yaml`

Quality +8.4% · Latency +220ms · Cost -31%

Passing: `refund-policy`, `invoice-lookup`, `cancel-subscription`

Regressed: `edge-case-promotions` on `gpt-4.1-mini`

Recommendation: merge after fixing one retrieval prompt and rerunning the pack.

Public Card

Benchmark card

Linkable showcase

Repo benchmark: support-agent / checkout-recovery (128 eval cases)

Best model route: Claude Sonnet + GPT-4.1-mini fallback

Win summary: 12% better success at 29% lower cost

Pass rate 94% · Safety stable · 1 flaky case

This is the artifact that spreads on X, GitHub, and founder launches: a benchmark card builders can link to when they ship.

AI builder brief

Model launches, benchmarks, pricing moves, and outages that matter when you ship. No spam, unsubscribe anytime.

Or stay in the loop on X

Follow @_MomentumTrader

30-day MVP scope

Week 1

Builder radar positioning

  • Reframe the site around builders shipping with models, APIs, SDKs, and agents.
  • Tighten the river and briefs around releases, benchmarks, pricing, outages, and deprecations.
  • Publish the DriftCheck narrative and the benchmark artifact preview.

Week 2

First GitHub workflow

  • Support one happy path: repo connect, YAML test pack, PR comment output.
  • Score before-vs-after deltas in quality, safety, latency, and cost.
  • Generate a proof card page builders can link in launch posts and READMEs.

Week 3

MVP launch loop

  • Ship starter packs for support bots, coding agents, and retrieval workflows.
  • Create proof-card showcase pages for standout repos and agents using DriftCheck.
  • Turn real eval wins and regressions into quick bytes and brief coverage.

Week 4

Distribution and iteration

  • Publish example repos, proof cards, and operator writeups.
  • Use the agent index as the submission and distribution surface for builders.
  • Tune onboarding around the fastest path from install to first PR scorecard.

Distribution loop

1. Builder connects repo and runs Checks on a prompt or agent PR.

2. A2ZAI posts a scorecard comment with score deltas and failing examples.

3. Builder shares the public proof card on X, GitHub, or in product launch posts.

4. A2ZAI features the best cards in the river, briefs, and agent showcase.

5. New builders arrive from those artifacts and install Checks in their own repos.

How the rest of A2ZAI fits

`Live River` explains the launches, pricing changes, and outages that might move benchmark results.

`Model pages` become compatibility and performance context for builder decisions.

`Agent Index` becomes the showcase surface for builders shipping with Checks.

`Briefs` become the weekly summary of what changed before your app breaks.