2.3.3 - The One Metric That Proves This Works
Define the proof metric: say what the metric is, what user behaviour creates it, and what threshold counts as enough.
Proof metric
Is this metric proving real value or just reporting activity?
The call
Choose one metric before you measure anything else. Otherwise AI will help you build dashboards that track activity while value stays invisible.
Why it matters
The one metric that proves this works should show whether users get the outcome the product promised. AI can surface metric movement quickly, but human judgement decides if the shift reflects real value or measurement noise. That judgement turns numbers into decisions and keeps you focused on what actually works.
Explainer
A proof metric is not a dashboard full of numbers. It is the one signal that tells you whether the work created the result you care about. Until you can name one metric, one user behaviour behind it and one threshold that counts as proof, measurement will stay fuzzy. AI can help analyse data, but it cannot decide which metric is the decision line.
Make the proof metric concrete
Compare the broad version with a version you can actually test.
- Too vague: We will track engagement and adoption of the search tool.
- Concrete enough to test: We will track how many content creators act on at least one context-shaped search result in the same session. We will treat two out of three searches producing an actionable result as proof that the context layer is working.
The second version lets two people make the same keep or cut decision from it.
Check the proof metric
- Pass: You can say what the metric is, what user behaviour creates it and what threshold counts as enough.
- Fail: If the metric still depends on general words like usage, growth or engagement, it is not clear enough yet.
Do not move into launch, iteration or analysis work until this passes.
How to use AI for the proof metric
- AI chat: Rewrite the proof metric until you can state all three parts clearly.
- vibeCoding: Build the thinnest flow that tests this proof metric in practice before broader build work.
- AI-assisted coding: Carry the same proof metric into implementation and review so the live system keeps the same decision.
Sharpen the proof metric
Copy this prompt into AI chat, replace the bracketed lines with your real proof metric, and keep the rest of the instructions exactly as written.
You are checking whether this proof metric is clear enough before you move forward.
Constraint:
The proof metric must be specific enough that two people would make the same keep or cut decision from it.
Working draft:
Metric: [what the metric is]
User behaviour behind it: [what user behaviour creates it]
Threshold: [what threshold counts as enough]
Task:
Decide whether this proof metric is specific enough to guide the next decision. If it is vague, rewrite it so two people would make the same decision from this proof metric.
Check:
- Would two people interpret this the same way?
- Does it stay concrete enough to guide the next step?
- Does it meet this bar: you can say what the metric is, what user behaviour creates it and what threshold counts as enough.
Return:
- A corrected proof metric
- A short explanation of what was vague

AI will likely suggest refinements based on what you enter. Use those to sharpen your thinking, not replace it.
Evaluation
Before accepting the result, check whether two people would make the same keep or cut decision from it.
Example
To help you work through this, here is a real example. StartWithYourContext is an AI search tool built as part of the vibe2value project. Here is how its proof metric was written using the three parts:
- Metric: Actionable result rate per search session.
- User behaviour behind it: A content creator searches with their saved context and acts on at least one result in the same session instead of leaving to search elsewhere.
- Threshold: At least two out of three searches produce a result the user acts on.
That proof metric is specific enough that two people would make the same keep or cut decision from it.
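The three parts of that proof metric can be sketched as a small check. This is an illustrative sketch only: the session records and the `acted_on_result` field are assumptions for the example, not part of the real StartWithYourContext tool.

```python
# Minimal sketch of the proof-metric check, assuming each search session
# records whether the user acted on at least one result in that session.
# Field names here ("acted_on_result") are illustrative assumptions.

def actionable_result_rate(sessions):
    """Share of search sessions where the user acted on a result."""
    if not sessions:
        return 0.0
    acted = sum(1 for s in sessions if s["acted_on_result"])
    return acted / len(sessions)

def proof_metric_passes(sessions, threshold=2 / 3):
    """Keep-or-cut decision: does the rate meet the two-out-of-three bar?"""
    return actionable_result_rate(sessions) >= threshold

# Two out of three sessions produced a result the user acted on.
sessions = [
    {"acted_on_result": True},
    {"acted_on_result": True},
    {"acted_on_result": False},
]
print(actionable_result_rate(sessions))  # 0.6666666666666666
print(proof_metric_passes(sessions))     # True
```

Because the metric, the behaviour behind it, and the threshold are all explicit in the code, two people running this check would reach the same keep or cut decision.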
When there is more than one side
Not every product has a single proof metric. When a system serves more than one side, each side proves value through a different signal, and a metric that looks strong for one side may say nothing about the other.
Multi-sided worked example
For example, StartWithYourContext has two different proof metrics:
- Content creator: Actionable result rate. Do context-shaped searches produce results the user acts on? The threshold is two out of three.
- Developer: Setup completion rate. Can a new developer clone, set up and run the project from the README without getting stuck? The threshold is reaching a working state independently.
Both metrics prove value, but they measure different things. If only one is tracked, the other side’s value stays unproven.
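One way to keep both sides visible is to give each side its own rate and threshold and check them separately. A minimal sketch, where the rates and threshold values are made-up illustrative numbers:

```python
# Sketch of a per-side proof check, assuming each side is tracked as a
# (measured_rate, threshold) pair. All numbers here are illustrative.

def sides_proven(side_metrics):
    """Return, per side, whether its rate meets its own threshold."""
    return {side: rate >= bar for side, (rate, bar) in side_metrics.items()}

metrics = {
    "content_creator": (0.70, 2 / 3),  # actionable result rate vs 2/3 bar
    "developer": (0.50, 1.0),          # setup completion rate vs its bar
}
result = sides_proven(metrics)
print(result)  # creator side passes, developer side does not
print(all(result.values()))  # False: value is not proven for every side
```

Tracking the sides as separate entries makes the failure mode explicit: a dashboard showing only the creator rate would look healthy while the developer side stays unproven.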
Risk and mitigation
- Risk: Optimising for a metric that looks strong while user value stays flat, which can push the build in the wrong direction.
- Mitigation: Pair the core metric with one user-impact check and only scale changes when both move in the same direction.
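The mitigation can be expressed as a simple gate. This is a sketch under one assumption: you measure the change (delta) in the core metric and in the user-impact check between two measurement windows, and the delta values below are invented for illustration.

```python
# Sketch of the mitigation: only scale a change when the core metric and
# the user-impact check both move in the same positive direction.
# The delta values passed in below are made-up illustrative numbers.

def safe_to_scale(core_delta, impact_delta):
    """Gate scaling on both signals improving together."""
    return core_delta > 0 and impact_delta > 0

# Core metric up but user impact flat: the strong-looking metric alone
# is not enough to justify scaling the change.
print(safe_to_scale(core_delta=0.08, impact_delta=0.0))   # False
print(safe_to_scale(core_delta=0.08, impact_delta=0.03))  # True
```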
Key takeaway
Do not move forward until you can say what the metric is, what user behaviour creates it and what threshold counts as enough.
Work through this in a workshop
If your proof metric is still unclear, bring it to a free weekly workshop. Bring the messy part of your AI-assisted build and leave with a clearer next step. In some sessions, we walk through practical examples on the Cloudflare Workers stack to show how a rough idea turns into something that actually runs.
What do you think?
How are you choosing the one metric that proves your build is working and how is AI helping you act on that signal?