Reviewing AI-Assisted Code Changes

When AI changes a lot of code quickly, review is no longer about reading lines. It’s about making sure you still understand what will happen in production.

I’m not reviewing code. I’m reviewing a change in system behaviour.

A good rule to use:

The bigger the AI-generated change, the more effort I put into reviewing intent and edge cases, and the less into individual lines of code.

AI helps me move fast.

Review is where I decide what’s safe to ship.

A good second rule:

Keep changes small and incremental whenever possible.

What follows are some suggested strategies, grouped by the size of the change.
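The size tiers below can even be roughly automated. Here is a minimal sketch, assuming the output of `git diff --numstat` as input; `classify_numstat` is a made-up helper, and the thresholds are this post's rough guides, not hard rules:

```python
def classify_numstat(numstat: str) -> str:
    """Classify a diff as small / medium / large from git numstat text."""
    files = lines = 0
    for row in numstat.strip().splitlines():
        added, deleted, _path = row.split("\t", 2)
        files += 1
        if added != "-":  # binary files report "-" for line counts
            lines += int(added) + int(deleted)
    # Thresholds follow the rough guides in this post.
    if files >= 20 or lines >= 400:
        return "large"
    if files >= 5 or lines >= 100:
        return "medium"
    return "small"
```

In practice you would pipe `git diff --numstat main` into it; the point is only that size is cheap to measure, so the review tier can be chosen before reading a single line.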


Small change (minimal risk)

Typical size (guide, not a rule)

  • ~1 to 5 files
  • ~10 to 100 lines changed

Typical shape of change

  • Small diff, few files
  • Renames, formatting, tidy-ups
  • No behaviour change intended

PR checklist

  1. What is the one thing this change is meant to do?
  2. Does the diff match that intent?
  3. Were any new branches, defaults or fallbacks added?
  4. Did auth, env vars, network calls or data writes change?
  5. Was any error handling added or loosened?

If it looks mechanical and behaves the same, approve it.
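Checklist items 3 and 5 are the easy ones to miss in a "mechanical" diff. A hypothetical one-line example of the pattern to watch for (the function and default are invented for illustration):

```python
# Before: a missing key fails loudly.
def get_region(settings: dict) -> str:
    return settings["region"]

# After an AI tidy-up: one "harmless" line quietly changes behaviour.
def get_region_tidied(settings: dict) -> str:
    return settings.get("region", "us-east-1")  # new default slipped in

# A misconfigured environment no longer raises; it silently picks a region.
assert get_region_tidied({}) == "us-east-1"
```

Both versions pass a happy-path test with a populated dict, which is exactly why the diff, not the test suite, is where this gets caught.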


Medium change (some risk)

Typical size (guide, not a rule)

  • ~5 to 20 files
  • ~100 to 400 lines changed

Typical shape of change

  • Larger diff, several files
  • Helpers added, logic reshaped
  • Behaviour should mostly stay the same

PR checklist

  1. What is supposed to change and what must not?
  2. Are permissions or validation wider than before?
  3. Are errors still visible and logged?
  4. Do defaults still make sense?
  5. What happens with empty, invalid or missing input?
  6. What happens if a dependency fails?
  7. If this breaks, how would I notice?

Always run negative tests here.
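A negative test here just means feeding the bad cases on purpose and demanding a loud failure. A small sketch with plain asserts; `parse_limit` is a made-up parser standing in for whatever the change touches:

```python
def parse_limit(raw):
    """Hypothetical input parser: reject bad input loudly."""
    if raw is None or str(raw).strip() == "":
        raise ValueError("limit is required")
    value = int(raw)  # non-numeric input raises ValueError
    if value <= 0:
        raise ValueError("limit must be positive")
    return value

# Negative tests: each bad input must fail visibly, not default away.
for bad in (None, "", "  ", "abc", "-5", "0"):
    try:
        parse_limit(bad)
    except ValueError:
        pass  # expected: the failure stays visible
    else:
        raise AssertionError(f"{bad!r} was silently accepted")

assert parse_limit("10") == 10
```

If a refactor makes any of these loops stop raising, behaviour widened, whatever the diff looks like.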


Large change (high risk)

Typical size (guide, not a rule)

  • ~20+ files
  • ~400+ lines changed

Typical shape of change

  • Very large diff
  • Many files
  • Bulk AI edits or new flows
  • Direct production impact

PR checklist

  1. Which parts of this change are high-risk (auth, data, config, infra)?
  2. What must never change as a result of this PR?
  3. Did AI “helpfully” widen behaviour or hide failure?
  4. Are any errors being swallowed or turned into defaults?
  5. Are retries, timeouts or fallbacks hiding problems?
  6. Do I still trust the logs and alerts?
  7. What would a 2am failure look like?
  8. Is rollback or staged rollout in place?
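Items 3 and 4 above usually look like this in a diff. A minimal illustration with invented names, not code from any real change:

```python
import json

# Risky pattern an AI edit might introduce: failure becomes a silent
# default, so a corrupt or missing config now looks like an empty one.
def load_config_silent(path):
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}  # the 2am failure hides here

# Safer: let the failure surface so logs and alerts can see it.
def load_config_loud(path):
    with open(path) as f:
        return json.load(f)  # missing file or bad JSON raises loudly
```

The silent version passes every happy-path test and then turns an outage into a mystery; the loud version fails where the monitoring is looking.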

If I don’t trust it, I slow down.
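For item 8, a staged rollout can be as simple as routing a stable percentage of users to the new path. A sketch under stated assumptions: `in_rollout` and the flag name are illustrative, not a real feature-flag library:

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Put a stable `percent` of users into the new code path."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # 0..65535, stable per user
    return bucket < (percent / 100) * 65536

# At 100% everyone is in; at 0% no one is; the same user always
# lands in the same bucket, so ramping 1% -> 10% -> 100% is safe.
assert in_rollout("user-42", "new-billing-flow", 100)
assert not in_rollout("user-42", "new-billing-flow", 0)
```

The point is not this particular hash; it is that a large change shipped behind a percentage is a large change you can turn off.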


Notes on terms used

  • PR (pull request): A proposed set of code changes reviewed before merging.
  • Diff: A view of what changed in the code.
  • System behaviour: What the software actually does in production, especially under failure.
  • Defaults / fallbacks: Values or paths used when something is missing or fails.
  • Negative tests: Tests that use bad input or failure scenarios on purpose.
  • Rollback: Reverting to a previous version if a change causes problems.
  • Staged rollout: Releasing changes gradually to reduce risk.

If you use a different rule or checklist when reviewing AI-generated changes, I’d be interested to hear it in the comments.