WebVoyant.
AI Engineering · Mar 10, 2026 · 11 min read

Eval-driven development for production AI

Why the eval set, not the prompt, is the artifact you should ship first when adding AI to a real product.

If you can't tell whether your AI feature got better or worse this week, you don't have an AI feature — you have a vibe.

We start every AI engagement by writing the eval set before the prompt: 100–300 cases drawn from real user questions, scored by a reference model and spot-checked by a human.
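To make that concrete, here is a minimal sketch of what an eval set and its runner might look like. Everything named here (`EvalCase`, `run_evals`, `spot_check_sample`, and the stub `fake_generate` and `fake_judge` functions) is hypothetical and only for illustration. In practice `generate` would call your AI feature and `judge` would call the reference model; the substring check below is a stand-in so the example runs on its own.

```python
from dataclasses import dataclass
import random
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    question: str   # drawn from real user questions
    reference: str  # what a good answer has to convey


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    answer: str
    passed: bool


def run_evals(
    cases: list[EvalCase],
    generate: Callable[[str], str],
    judge: Callable[[EvalCase, str], bool],
) -> list[EvalResult]:
    """Run every case through the feature and score each answer with the judge."""
    results = []
    for case in cases:
        answer = generate(case.question)
        results.append(EvalResult(case.case_id, answer, judge(case, answer)))
    return results


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of cases the judge marked as passing."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def spot_check_sample(results: list[EvalResult], k: int, seed: int = 0) -> list[EvalResult]:
    """Pick a reproducible random subset for a human to review by hand."""
    rng = random.Random(seed)
    return rng.sample(results, min(k, len(results)))


# Stand-ins (assumptions): a canned answer and a substring-match judge.
def fake_generate(question: str) -> str:
    return "Use the reset link on the login page."


def fake_judge(case: EvalCase, answer: str) -> bool:
    return case.reference.lower() in answer.lower()


cases = [
    EvalCase("c1", "How do I reset my password?", "reset link"),
    EvalCase("c2", "Can I export my data?", "CSV export"),
]
results = run_evals(cases, fake_generate, fake_judge)
print(f"pass rate: {pass_rate(results):.0%}")
for r in spot_check_sample(results, k=1):
    print("review by hand:", r.case_id, r.answer)
```

Keeping `generate` and `judge` as plain callables means the same cases can be rerun against a new prompt or a new model without touching the eval set itself.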

Once you have evals, everything downstream gets easier. Model swaps become a regression test. Prompt edits ship behind a flag, with confidence intervals on the change in eval score. Cost optimisation stops being a guess.
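One way to put a confidence interval on a model swap or prompt edit is a paired bootstrap over per-case pass/fail results. The sketch below is an assumption about how you might do it, not a prescribed method: `paired_bootstrap_ci` and `is_regression` are hypothetical helpers, and the pass/fail lists are made-up data standing in for two runs over the same eval cases.

```python
import random


def paired_bootstrap_ci(
    baseline: list[bool],
    candidate: list[bool],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Return (point, low, high) for candidate pass rate minus baseline pass rate.

    Both lists must hold results for the same cases in the same order.
    """
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need two non-empty result lists of equal length")
    rng = random.Random(seed)
    n = len(baseline)
    # Per-case difference: +1 fixed, -1 broken, 0 unchanged.
    diffs = [int(c) - int(b) for b, c in zip(baseline, candidate)]
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)  # resample cases with replacement
        means.append(sum(sample) / n)
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, low, high


def is_regression(baseline: list[bool], candidate: list[bool]) -> bool:
    """Flag the change only when the whole interval sits below zero."""
    _, _, high = paired_bootstrap_ci(baseline, candidate)
    return high < 0


# Made-up data: 200 cases, the candidate breaks 50 that the baseline passed.
baseline_run = [True] * 200
candidate_run = [False] * 50 + [True] * 150

point, low, high = paired_bootstrap_ci(baseline_run, candidate_run)
print(f"change in pass rate: {point:+.2f} (95% CI {low:+.2f} to {high:+.2f})")
print("regression:", is_regression(baseline_run, candidate_run))
```

Wired into CI, a check like `is_regression` can gate a model swap the same way a failing unit test gates a code change. The fixed seed keeps the interval reproducible between runs.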

The team that owns the eval set owns the AI feature. Make sure that's your team, not your vendor's.
