Ways to make the output trustworthy

When AI produces something you are going to put live, trustworthy means you can rely on it without re-checking every line by hand. That trust is not automatic. You build it by being explicit about what good looks like, bringing in fresh eyes, and verifying the parts that carry real risk. This page is about the moves that turn a plausible-looking output into one you can put your name on.

Trust is not automatic. AI output earns it by being checked against clear criteria and looked over with fresh eyes, until it is something you can stand behind.

Ask for a shape, not prose

When you need data back from AI in a fixed form, the reliable move is to hand it a schema, the defined shape of the thing you want, and have it fill that in, rather than describing the JSON you want in a prompt and hoping. The schema is a contract for what comes back.

Say it another way

A schema is a fixed shape you ask the AI to fill in, like a form with set fields, instead of letting it answer in free-flowing text. Hand it the form and the answer comes back in the exact shape you need, with nothing missing and nothing made up.

Ask for JSON in words and extra output drifts in, a tip, a summary, a stray field. Hand over a schema and you get exactly those fields, the card and nothing else.

Doing this with Claude Code: Ask for a shape, not prose

Hand Claude a schema, the fields and their types, and ask it to fill that in, instead of describing the JSON you want in words. The shape then comes back guaranteed, and nothing extra rides along with it.

A schema earns its keep in five ways: a guaranteed shape, a shape that is not the same as the truth, a real empty instead of a guess, a forced shape you can route with and a clean value not just a clean shape.

A guaranteed shape, not a request. Ask for JSON by example and other output creeps in, a recommendation, a summary of the work, an extra field. With a recipe-card schema of a title, an ingredients list and a steps list, you get those fields and nothing else.

The shape is guaranteed, not the truth. The schema makes sure you get a well-formed card, but it cannot know that serves 4 is wrong for a recipe that clearly feeds six. Right shape, possibly wrong values, so you still check the meaning.

A real empty beats a guess. If a recipe never states a cook time, make that field nullable, so Claude returns nothing there rather than inventing thirty minutes just to fill the slot.

Force the shape and route with it. tool_choice decides whether a tool must be used at all and, when you offer several schemas, which one, so "any" turns Claude into a router that has to sort the input into the right shape instead of waffling. Force the extraction tool first too, so the facts are locked before any enrichment step reads them.

Pin the value, not just the shape. The schema guarantees the field exists, not that its value is clean, so add normalisation rules beside it (always Celsius, always grams) and give the awkward cases a home, an "other" plus detail for a known off-list value and an "unclear" for one the source does not settle.

Examples keep extraction honest

The other way to a trustworthy result is by example. When Claude is reading a messy source and pulling structured data out of it, a few worked examples stop it inventing values that were never there.

A scruffy source in, a clean card out. The worked examples keep it honest: a knob stays a knob, feeds a family of four becomes serves 4, an unstated cook time is left as not stated rather than invented.

Doing this with Claude Code: Examples that keep extraction honest

When you pull a recipe out of a scruffy web page into a clean card, the failure to watch for is invention, a value that was never on the page. Hand Claude a couple of worked examples, this messy source gives this clean card, and it carries across what is actually there instead of guessing a tidy number.

Examples earn their keep here in four ways: keeping it honest, defining the loose terms, handling any layout and getting the empty fields right.

Do not invent, keep it in scope. A recipe says a knob of butter, and the invention is writing 15 grams, a number that was never there. Show that a knob is about a tablespoon, so it extracts what is on the page and stays in scope.

Define the loose terms. Give it a small glossary by example, a drizzle of oil is roughly a tablespoon, a pinch is a quarter teaspoon, so the vague descriptions map to known values instead of its own guesses.

Handle any layout. The same recipe turns up as a blog story, a tidy card and a forum comment. Show it pulled from each, and it learns to find the data by what it is, not where it sits, so a new layout does not throw it.

Get the empty fields right. Show that feeds a family of four fills the serves field, and that an optional garnish left out stays honestly empty, so required fields are filled even when worded oddly and optional ones are not invented.

Validate and retry

Structured output gives you the shape; validating is checking the values are actually right, and when they are not, feeding back the specific failure so the AI corrects it rather than starting cold.

Validate the extracted card, and when it fails hand back the specific errors so it corrects that exact thing, looping until it holds. The exception: if the information simply is not in the source, no retry will conjure it, so you stop and flag instead.

Doing this with Claude Code: Validate and retry

When the output has to be right, do not just rerun it. Validate it, and hand back the specific failure so Claude corrects that exact thing rather than guessing again. The loop also sharpens your own sense of what right means, you tend to pin the requirement down properly only as you see the results.

The loop earns its keep in three ways: say why and show, know when a retry will not help, and let the output check itself.

Say why and show. A bare try again gets the same mistake back. Hand over the exact error, what failed and where, so the retry fixes that one thing. On the recipe card that is the serves field is empty and the butter has no amount, not that is wrong.

Know when a retry will not help. A retry fixes how it answered, a format or structure problem, but cannot invent what was never in the source. When the data is genuinely absent, stop and flag rather than loop.

Let the output check itself. Extract a calculated value next to the stated one, plus a conflict flag, so a discrepancy surfaces in the output instead of you catching it by eye. On the card, a stated serves sitting next to a calculated one.

Review with fresh eyes

The maker is the worst reviewer of its own work. The instance that wrote the code is carrying the reasoning that produced it, so reviewing its own output it tends to confirm rather than question. A fresh pair of eyes catches what it glossed over.

The instance that wrote the code carries the reasoning that made it, so reviewing its own work it tends to confirm, not question. A fresh instance with none of that history reviews it cold and catches what the maker glossed over.

Doing this with Claude Code: Review with fresh eyes

Have a separate instance review the work, one with none of the context that produced it, so it judges what is actually there rather than what was intended.

Fresh-eyes review earns its keep in three ways: a clean instance, focused passes, and confidence to triage.

A clean instance, not self-review. The instance that made it is primed to confirm its own calls, it is still carrying the reasoning that made them. A separate instance with none of that history reviews the output cold and questions it, like a different cook checking the dish against the card rather than the one who cooked it from memory.

Split a big review into focused passes. One sweep spreads attention thin. Go file by file for the local issues, then a separate pass on how the pieces fit together, so each pass has its full focus.

Tag findings with confidence. Have it report how sure it is on each finding, so you act on the certain ones straight away and give the unsure ones a closer look.

Guard against false positives

A check is only trustworthy if you act on what it says. The fastest way to lose that is the false positive, because one category that cries wolf makes you start ignoring all of the output, the accurate findings included. So part of making output trustworthy is protecting the signal, keeping the noise down so every finding is worth reading.

Say it another way

A false positive is when a check raises the alarm about something that is actually fine. A few of those and you start ignoring the alarm altogether, like a smoke detector that goes off every time you make toast, so a real fire gets missed too.

One noisy category makes you start ignoring the whole check, accurate findings and all. Tag why each finding fired so the noise can be grouped and suppressed, and the signal is trustworthy again. Like an oven timer that beeps at every little thing, the kitchen stops listening and misses the one bake that mattered, so you keep it for what counts.

Doing this with Claude Code: Guard against false positives

When Claude is checking or reviewing, the quiet killer is the false positive. A check that cries wolf trains you to skim or ignore all of its output, so the accurate findings get thrown out with the noise. Trust is slow to build and fast to lose, so a noisy category costs far more than it looks.

Protecting that trust earns its keep in four ways: one noisy category poisons the rest, mute and fix rather than tune live, show the line with examples and tag why each finding fired.

One noisy category poisons the rest. When one check cries wolf you start distrusting all of them and the real findings get ignored alongside the noise. A false positive costs more than a missed nicety, because it spends trust you cannot quickly earn back.

Mute and fix, do not tune live. When a category is noisy, switch it off to restore trust in the rest, then reframe and fix it properly before turning it back on, rather than leaving it spraying noise while you tinker.

Show the line with examples. Give it an acceptable case next to a real problem, so it learns where the boundary sits and stops flagging the harmless ones.

Tag why each finding fired. Have it return the reason it flagged each one, a detected-pattern field, so false positives become data you can group and suppress wholesale rather than a vague sense that it is noisy.

Two more trustworthy moves already live on the getting a sharp answer page: setting checkable criteria so quality is not a matter of opinion, and the Sharpen loop for tightening a draft until two people would decide the same way.

In plain words

In plain words: this page is about trusting what AI gives you, so you can rely on it without re-checking every line by hand. The moves are to ask for a fixed shape rather than free text, give worked examples so it does not invent things, check the values and feed back anything wrong, get a fresh set of eyes to review, and make sure your checks do not cry wolf. AI here means a tool that produces code or data from your instructions.

Part of Working with AI. The soft skills behind it: The soft skills of working with AI.

Why Claude. When this page carries tool-specific notes they use Claude Code, because it is one of the most widely used AI tools for building software. It is also what we build with day to day. The moves themselves are general and carry across to other capable AI tools.