Reviewing AI-Assisted Code Changes

When AI changes a lot of code quickly, review is no longer about reading lines. It’s about making sure you still understand what will happen in production.

The thinking becomes: I'm not reviewing code. I'm reviewing a change in system behaviour.

The bigger the AI-generated change, the more effort goes into reviewing intent and edges - not lines of code.

AI helps me move fast. Review is where I decide what's safe to ship.

Keep changes small and incremental whenever possible. That's the single best thing you can do for review quality.
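The first triage step can be mechanical. A rough sketch, using the size bands from the sections below as thresholds (they are guides, not rules, and the cutoffs here are one reasonable reading of them):

```python
def triage_change(files_changed, lines_changed):
    """Rough review tier for a diff - a guide, not a rule.

    Thresholds mirror the size bands in this doc: up to ~5 files /
    ~100 lines is small, up to ~20 files / ~400 lines is medium,
    and anything beyond that is large.
    """
    if files_changed <= 5 and lines_changed <= 100:
        return "small"
    if files_changed <= 20 and lines_changed <= 400:
        return "medium"
    return "large"
```

Feed it the numbers from `git diff --stat` (or your review tool's summary) and let the tier pick which checklist below to run.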

Small change (minimal risk)

Typical size (guide, not a rule)

  • ~1 to 5 files
  • ~10 to 100 lines changed

Typical shape of change

  • Small diff, few files
  • Renames, formatting, tidy-ups
  • No behaviour change intended

PR checklist

  1. What is the one thing this change is meant to do?
  2. Does the diff match that intent?
  3. Were any new branches, defaults or fallbacks added?
  4. Did auth, env vars, network calls or data writes change?
  5. Was any error handling added or loosened?

If it looks mechanical and behaves the same, approve.
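Item 5 (error handling loosened) is the one most often hidden inside a mechanical-looking diff. A hypothetical before/after sketch - the function names are illustrative, not from any real codebase:

```python
# Before: only the expected failure is handled.
def parse_port_strict(value):
    try:
        return int(value)
    except ValueError:
        return None  # explicit "not a number" signal

# After a tidy-up, the handler was quietly widened.
def parse_port_loosened(value):
    try:
        return int(value)
    except Exception:  # now also hides TypeError and other real bugs
        return None
```

The diff looks like a one-word change, but the loosened version turns a `None` argument (a bug at the call site) into the same `None` as a bad string - exactly the kind of behaviour change item 5 is asking about.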

Medium change (some risk)

Typical size (guide, not a rule)

  • ~5 to 20 files
  • ~100 to 400 lines changed

Typical shape of change

  • Larger diff, several files
  • Helpers added, logic reshaped
  • Behaviour should mostly stay the same

PR checklist

  1. What is supposed to change and what must not?
  2. Are permissions or validation wider than before?
  3. Are errors still visible and logged?
  4. Do defaults still make sense?
  5. What happens with empty, invalid or missing input?
  6. What happens if a dependency fails?
  7. If this breaks, how would I notice?

Always run negative tests here.
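A minimal sketch of what "negative tests" means for items 5 and 6 - the validator and its rules are hypothetical, and plain asserts stand in for your test framework:

```python
def validate_username(name):
    """Hypothetical validator: rejects missing, empty, or overlong input."""
    if not isinstance(name, str) or not name or len(name) > 32:
        raise ValueError(f"invalid username: {name!r}")
    return name

def rejects(fn, bad_input):
    """True if fn raises ValueError for bad_input."""
    try:
        fn(bad_input)
    except ValueError:
        return True
    return False

# Negative tests: empty, missing, and invalid input must all be rejected.
assert rejects(validate_username, "")          # empty
assert rejects(validate_username, None)        # missing
assert rejects(validate_username, "x" * 33)    # invalid (too long)
assert validate_username("alice") == "alice"   # the positive case still works
```

The point is not the validator itself: a refactor that keeps the positive case passing can still silently stop rejecting one of the negative cases, and only tests like these will notice.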

Large change (high risk)

Typical size (guide, not a rule)

  • ~20+ files
  • ~400+ lines changed

Typical shape of change

  • Very large diff
  • Many files
  • Bulk AI edits or new flows
  • Direct production impact

PR checklist

  1. Which parts of this change are high-risk (auth, data, config, infra)?
  2. What must never change as a result of this PR?
  3. Did AI "helpfully" widen behaviour or hide failure?
  4. Are any errors being swallowed or turned into defaults?
  5. Are retries, timeouts or fallbacks hiding problems?
  6. Do I still trust the logs and alerts?
  7. What would a 2am failure look like?
  8. Is rollback or staged rollout in place?
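Items 4 and 5 often show up together: a fallback that converts failure into a default. A hedged sketch of the pattern to flag - the client and method names are hypothetical:

```python
import logging

class QuotaClient:
    """Hypothetical dependency that is currently down."""
    def get_quota(self):
        raise ConnectionError("upstream down")

# Pattern to flag: the fallback swallows the failure entirely, so a
# 2am outage looks like "quota is 0" and never reaches logs or alerts.
def fetch_quota_silent(client):
    try:
        return client.get_quota()
    except Exception:
        return 0

# Same fallback, but the failure stays visible to logs and alerting.
def fetch_quota_visible(client):
    try:
        return client.get_quota()
    except Exception:
        logging.exception("get_quota failed; falling back to 0")
        return 0
```

Both versions return the same value, which is exactly why the silent one slips through review: the behaviour difference only exists in your logs, and only when something is already on fire.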

If I don't trust it, I slow down.


If you use a different rule or checklist when reviewing AI-generated changes, bring it to a weekly workshop - that kind of practical swap is exactly what those sessions are for.