Reviewing AI-Assisted Code Changes
When AI changes a lot of code quickly, review is no longer about reading lines. It’s about making sure you still understand what will happen in production.
I’m not reviewing code. I’m reviewing a change in system behaviour.
A good rule to use:
The bigger the AI-generated change, the more effort I put into reviewing intent and edge cases, not individual lines of code.
AI helps me move fast.
Review is where I decide what’s safe to ship.
A good second rule:
Keep changes small and incremental whenever possible.
Below are some suggested strategies, grouped by change size.
Small change (minimal risk)
Typical size (guide, not a rule)
- ~1 to 5 files
- ~10 to 100 lines changed
Typical shape of change
- Small diff, few files
- Renames, formatting, tidy-ups
- No behaviour change intended
PR checklist
- What is the one thing this change is meant to do?
- Does the diff match that intent?
- Were any new branches, defaults or fallbacks added?
- Did auth, env vars, network calls or data writes change?
- Was any error handling added or loosened?
If it looks mechanical and behaves the same, approve it.
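The checklist item about new defaults or fallbacks is worth illustrating. Here is a minimal, hypothetical sketch (the function and key names are invented) of a "tidy-up" diff that quietly changes behaviour:

```python
# Hypothetical example: a one-line "cleanup" that changes behaviour.
# Before: a missing config key raised KeyError, surfacing the problem.
def get_timeout_before(config: dict) -> int:
    return config["timeout"]

# After the AI's edit: a silent fallback was added. The diff looks
# harmless, but a missing key no longer fails loudly.
def get_timeout_after(config: dict) -> int:
    return config.get("timeout", 30)
```

The diff is a single line and reads as mechanical, yet it moves the change out of the "no behaviour change intended" category.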
Medium change (some risk)
Typical size (guide, not a rule)
- ~5 to 20 files
- ~100 to 400 lines changed
Typical shape of change
- Larger diff, several files
- Helpers added, logic reshaped
- Behaviour should mostly stay the same
PR checklist
- What is supposed to change and what must not?
- Are permissions or validation wider than before?
- Are errors still visible and logged?
- Do defaults still make sense?
- What happens with empty, invalid or missing input?
- What happens if a dependency fails?
- If this breaks, how would I notice?
Always run negative tests here.
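A negative test deliberately feeds bad input and asserts that the code fails loudly. A minimal sketch, using a hypothetical `parse_age` validator invented for illustration:

```python
# Hypothetical validator, used only to illustrate negative testing.
def parse_age(value: str) -> int:
    age = int(value)  # raises ValueError on non-numeric input
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age

# Negative test: every bad input must be rejected, not coerced or defaulted.
def test_rejects_bad_input():
    for bad in ["", "abc", "-1", "999"]:
        try:
            parse_age(bad)
        except ValueError:
            continue  # expected failure
        raise AssertionError(f"accepted bad input: {bad!r}")
```

If an AI edit makes a test like this start passing silently (for example by adding a default), that is exactly the loosened error handling the checklist is asking about.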
Large change (high risk)
Typical size (guide, not a rule)
- ~20+ files
- ~400+ lines changed
Typical shape of change
- Very large diff
- Many files
- Bulk AI edits or new flows
- Direct production impact
PR checklist
- Which parts of this change are high-risk (auth, data, config, infra)?
- What must never change as a result of this PR?
- Did AI “helpfully” widen behaviour or hide failure?
- Are any errors being swallowed or turned into defaults?
- Are retries, timeouts or fallbacks hiding problems?
- Do I still trust the logs and alerts?
- What would a 2am failure look like?
- Is rollback or staged rollout in place?
If I don’t trust it, I slow down.
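The "errors swallowed and turned into defaults" item is the one I see most often in bulk AI edits. A hedged sketch of the two shapes to look for in review (the client and function names are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

# Shape to flag in review: the failure is swallowed and becomes a
# plausible-looking default, so nothing in logs or data reveals it.
def fetch_price_swallowed(client, sku):
    try:
        return client.get_price(sku)
    except Exception:
        return 0.0  # 0.0 looks like real data; the outage is invisible

# Safer shape: the failure stays visible in the logs and to the caller,
# who can then decide whether to retry, alert, or fall back explicitly.
def fetch_price_visible(client, sku):
    try:
        return client.get_price(sku)
    except Exception:
        logger.exception("price lookup failed for sku %s", sku)
        raise
```

The first shape is what makes a 2am failure hard to notice: dashboards stay green while the system quietly serves defaults.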
Notes on terms used
- PR (pull request): A proposed set of code changes reviewed before merging.
- Diff: A view of what changed in the code.
- System behaviour: What the software actually does in production, especially under failure.
- Defaults / fallbacks: Values or paths used when something is missing or fails.
- Negative tests: Tests that use bad input or failure scenarios on purpose.
- Rollback: Reverting to a previous version if a change causes problems.
- Staged rollout: Releasing changes gradually to reduce risk.
If you use a different rule or checklist when reviewing AI-generated changes, I’d be interested to hear it in the comments.