# How to Do a Code Review with AI
Code review with AI using gstack's Review phase: get Staff Engineer, Security Auditor, and Senior Designer feedback before a single human looks at your PR.
Code review with AI is one of the highest-leverage things you can do as a solo developer or small team. Instead of shipping first and hoping for the best, you run your diff through a structured panel of AI reviewers — each with a distinct role and perspective — before the code ever reaches a human reviewer. This is exactly what gstack's Review phase is designed for.
Here's how to do it properly.
## Why AI Code Review Works
The problem with most AI code review is that it's unfocused. You paste code into ChatGPT, ask "is this good?", and get a generic wall of suggestions. There's no frame of reference, no role, no accountability.
gstack fixes this by assigning roles. When you run the Review phase, you're not asking one generic AI to look at your code — you're asking a Staff Engineer to check correctness, a Senior Designer to evaluate UX implications, a Security Auditor to scan for vulnerabilities, and a Technical Writer to flag documentation gaps. Each role has different incentives and different things to catch.
The result is a code review that's more thorough than most PRs get from human reviewers, completed in under two minutes.
## Step 1: Complete the Build Phase First
Don't skip straight to review. gstack's 18 specialist roles are designed to flow in sequence: Think → Plan → Build → Review → Test → Ship → Reflect.
If you try to review incomplete work, you get incomplete feedback. The Completeness Principle in gstack is explicit: "Boil the lake, never the ocean." Do the full implementation first, then review the whole thing.
That means:
- All planned files created or modified
- No TODO comments unless intentional and documented
- Tests scaffolded (even if empty)
- The feature works end-to-end in local dev
Only then do you open the Review phase.
## Step 2: Stage Your Diff
Before running the AI review, generate a clean diff. You want to give the reviewers exactly what changed, not the entire codebase.
```bash
# Show what's staged
git diff --staged

# Or diff against main
git diff main..HEAD

# Save it for reference
git diff main..HEAD > review.diff
```

If your diff is enormous (>500 lines), break it into logical chunks. Large diffs are where bugs hide — both from humans and AI.
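If you want to automate that size check, a small helper can sum the output of `git diff --numstat`. This is a sketch, not part of gstack; the function names are mine:

```python
import subprocess

def count_changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        # Binary files report "-" instead of a count; skip those.
        if added.isdigit():
            total += int(added)
        if deleted.isdigit():
            total += int(deleted)
    return total

def diff_is_large(base: str = "main", limit: int = 500) -> bool:
    """Run git and compare the total against the ~500-line guideline above."""
    numstat = subprocess.run(
        ["git", "diff", "--numstat", f"{base}..HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_changed_lines(numstat) > limit
```

Run `diff_is_large()` before kicking off the review; if it returns `True`, split the work first.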
## Step 3: Run the Staff Engineer Review
The Staff Engineer role in gstack's Review phase is the primary technical reviewer. Here's what to ask:
```
You are a Staff Engineer reviewing this diff for production readiness.

Check for:
- Logic errors and edge cases
- Performance implications
- API contract violations
- Security anti-patterns
- Missing error handling
- Race conditions or concurrency issues

Be specific. Reference line numbers. Don't be polite — be correct.

[paste diff]
```
The key instruction at the end: don't be polite, be correct. Generic AI reviewers tend toward encouragement. You want a reviewer that catches the bug in line 47 even if it means saying your implementation is wrong.
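If you script this step rather than pasting by hand, the role prompt can be assembled from a template. This is a hypothetical helper, not a gstack API:

```python
STAFF_ENGINEER_PROMPT = """\
You are a Staff Engineer reviewing this diff for production readiness.

Check for:
- Logic errors and edge cases
- Performance implications
- API contract violations
- Security anti-patterns
- Missing error handling
- Race conditions or concurrency issues

Be specific. Reference line numbers. Don't be polite -- be correct.

{diff}
"""

def build_review_prompt(diff: str, template: str = STAFF_ENGINEER_PROMPT) -> str:
    """Fill a role template with the staged diff text."""
    return template.format(diff=diff)
```

Swap in a different template per role and the rest of the pipeline stays identical.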
## Step 4: Run the Security Auditor
Security review is separate from correctness review. The Security Auditor role focuses on:
```
You are a Security Auditor reviewing this code change.

Check specifically for:
- Input validation and sanitization
- SQL injection vectors (even with ORMs)
- Authentication and authorization gaps
- Secrets or credentials in code
- Insecure defaults
- Dependency vulnerabilities introduced

Output: PASS, WARN, or FAIL for each category with explanation.
```
The structured output format (PASS/WARN/FAIL) makes it easy to scan. You're not reading prose — you're triaging a security checklist.
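The structured format also makes the report machine-checkable. Assuming the model emits lines like `Input validation: PASS - ...` (the exact shape depends on your model), a minimal parser might look like:

```python
import re

def parse_security_report(report: str) -> dict[str, str]:
    """Extract {category: verdict} from lines like 'Secrets in code: FAIL - key in config'."""
    verdicts = {}
    for line in report.splitlines():
        # Tolerate an optional leading "- " bullet before the category name.
        m = re.match(r"\s*-?\s*(.+?):\s*(PASS|WARN|FAIL)\b", line)
        if m:
            verdicts[m.group(1).strip()] = m.group(2)
    return verdicts

def merge_blocked(verdicts: dict[str, str]) -> bool:
    """Any FAIL blocks the merge, per the triage rules below."""
    return any(v == "FAIL" for v in verdicts.values())
```

This lets you wire the security verdict into CI rather than eyeballing it.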
## Step 5: Run the Senior Designer Review (if applicable)
If your change touches the UI, run the Senior Designer role. This catches problems that engineers routinely miss:
```
You are a Senior Designer reviewing this UI change.
The app is [describe your app briefly].

Check for:
- Accessibility violations (WCAG AA minimum)
- Mobile responsiveness issues
- Loading state handling
- Empty state handling
- Error state handling
- Copy and microcopy quality
- Visual hierarchy problems
```
Even backend engineers ship UI sometimes. Run this whenever HTML, CSS, or component templates are in the diff.
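One way to decide automatically whether this role needs to run is a file-extension heuristic over the changed paths. The extension list is an assumption; adjust it to your stack:

```python
# Extensions that usually indicate UI work (illustrative, not exhaustive).
UI_EXTENSIONS = (".html", ".css", ".scss", ".tsx", ".jsx", ".vue", ".svelte")

def touches_ui(changed_paths: list[str]) -> bool:
    """True if any changed file looks like a UI file, so the Designer role should run."""
    return any(path.lower().endswith(UI_EXTENSIONS) for path in changed_paths)
```

Feed it the file list from `git diff --name-only main..HEAD` and skip the Designer review when it returns `False`.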
## Step 6: Run the Technical Writer Review
This one gets skipped constantly, which is why most codebases have terrible documentation. The Technical Writer role:
```
You are a Technical Writer reviewing this code change for documentation quality.

Check:
- Are new functions/methods documented?
- Are complex algorithms explained?
- Are configuration options described?
- Is the README still accurate?
- Are there any public-facing API changes that need changelog entries?
```
Paste this output into your PR description. Future-you will thank present-you.
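You can also pre-screen the diff yourself before the AI pass. The rough heuristic below scans added Python lines for `def`s whose next added line is not a docstring; it is a sketch, not a parser, and the function name is mine:

```python
def undocumented_defs(diff: str) -> list[str]:
    """Return names of functions added in a Python diff with no docstring right after.

    Rough heuristic: only inspects '+' lines, so it misses context lines
    and multi-line signatures. Good enough as a quick pre-review screen.
    """
    added = [line[1:] for line in diff.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    flagged = []
    for i, line in enumerate(added):
        stripped = line.strip()
        if stripped.startswith("def ") and stripped.endswith(":"):
            nxt = added[i + 1].strip() if i + 1 < len(added) else ""
            if not (nxt.startswith('"""') or nxt.startswith("'''")):
                # "def name(...):" -> "name"
                flagged.append(stripped.split("(")[0][4:])
    return flagged
```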
## Step 7: Triage and Act
You now have 3-4 structured reviews. Here's how to triage:
Immediate fix (block merge):
- Any FAIL from security review
- Logic errors flagged by Staff Engineer
- Missing error handling on critical paths
Fix before shipping:
- WARNs from security review
- Accessibility violations
- Missing documentation on public APIs
Backlog:
- Style preferences
- "Would be nice to have" refactors
- Performance improvements that aren't blocking
Make the immediate fixes, commit, and run the review again on the new diff. This sounds tedious but takes 5 minutes. It's dramatically faster than a post-merge bug report.
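The triage rules above are simple enough to encode. The `kind` labels here are illustrative, not a gstack schema:

```python
# Finding kinds that map to each bucket (names are illustrative).
BLOCKERS = {"security_fail", "logic_error", "missing_error_handling"}
SHIP_FIXES = {"security_warn", "accessibility", "missing_docs"}

def triage(findings: list[dict]) -> dict[str, list[str]]:
    """Sort findings into the three buckets: immediate, before_ship, backlog.

    Each finding is a dict like {"kind": "logic_error", "summary": "..."}.
    Anything unrecognized falls through to the backlog.
    """
    buckets = {"immediate": [], "before_ship": [], "backlog": []}
    for f in findings:
        if f["kind"] in BLOCKERS:
            buckets["immediate"].append(f["summary"])
        elif f["kind"] in SHIP_FIXES:
            buckets["before_ship"].append(f["summary"])
        else:
            buckets["backlog"].append(f["summary"])
    return buckets
```

An empty `immediate` bucket is your signal that the merge is unblocked.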
## Step 8: Write Your PR Description
Here's a template that integrates the AI review results:
```markdown
## What Changed
[one paragraph summary]

## Why
[context and motivation]

## Review Notes
- Staff Engineer: [summary of findings and how you addressed them]
- Security: [PASS/WARN/FAIL summary]
- Designer: [if applicable]

## Testing
[how you tested this]

## Checklist
- [ ] Tests pass
- [ ] No secrets in code
- [ ] Documentation updated
- [ ] Changelog entry added
```

When a human reviewer sees this, they know the code has already been through a structured review process. They can focus on higher-level concerns: architecture, product direction, "does this fit our system?"
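If you generate PR descriptions as part of your pipeline, the template can be filled programmatically. A minimal sketch with hypothetical section names:

```python
PR_TEMPLATE = """\
## What Changed
{what}

## Why
{why}

## Review Notes
- Staff Engineer: {staff}
- Security: {security}

## Testing
{testing}
"""

def render_pr_description(**sections: str) -> str:
    """Fill the PR template; unfilled sections get a visible TODO placeholder."""
    defaults = dict(what="TODO", why="TODO", staff="TODO",
                    security="TODO", testing="TODO")
    defaults.update(sections)
    return PR_TEMPLATE.format(**defaults)
```

A lingering `TODO` in the rendered output is itself a useful signal that a review step was skipped.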
## The gstack Safety Tools
Before merging, gstack's ship workflow includes safety tools you should know:
- `careful` — flags the AI to double-check its own suggestions before outputting
- `freeze` — locks the current implementation to prevent further AI modification
- `guard` — protects specific files or sections from being changed
- `unfreeze` — releases a freeze when you're ready to iterate again
These are particularly useful when you've approved the implementation and want to make sure subsequent AI interactions (like adding tests) don't inadvertently change the logic you just reviewed.
## Common Mistakes
Running review on partial work. The AI will review what you give it. If the error handling isn't implemented yet, it won't know to check it. Finish the feature first.
Ignoring role-specific framing. "Review this code" produces worse results than "You are a Staff Engineer reviewing this diff for production readiness." The role framing matters.
Treating all findings as equal. Not every suggestion is a blocker. Triage aggressively. Otherwise you spend 2 hours on style preferences and ship the SQL injection vulnerability.
Skipping re-review after fixes. If the Staff Engineer flagged 5 issues and you fixed them, run the review again. Sometimes a fix introduces a new problem.
## Integrating with DenchClaw
If you're building on or with DenchClaw, the gstack workflow is built directly into the workspace. You can trigger reviews from the sidebar, and results are stored as documents in your workspace — searchable, linkable, and tracked over time.
This means your review history becomes institutional knowledge. You can ask "what did we find during the review of the auth module?" and get a real answer.
## FAQ
Does AI code review replace human review? No. AI review catches a different class of bugs than human review. Humans are better at architectural concerns, product intuition, and "does this fit our system?" Run AI review before human review so humans can focus on what they're uniquely good at.
What if the AI misses something? It will. AI review is not exhaustive. The goal is to dramatically raise the floor of code quality before human review, not to achieve perfection. Combine it with tests, static analysis, and human judgment.
How do I handle large PRs? Break them into logical chunks. Review each chunk separately. Large diffs are a smell regardless — both for AI and human reviewers. If your PR is >500 lines, consider splitting it.
Can I use this with any AI model? Yes. The role-framing technique works with any capable model. That said, models with longer context windows handle larger diffs better. Claude Sonnet and GPT-4o both work well for this.
What's the difference between gstack's Review phase and just asking ChatGPT to review my code? Role structure and sequencing. gstack uses specific roles with specific mandates, and runs them in the right phase of the development cycle. Generic "review my code" prompts produce generic feedback.
Ready to try DenchClaw? Install in one command: `npx denchclaw`. Full setup guide →
