AI-Powered PR Workflow: From Branch to Merge
The complete AI-powered PR workflow: AI code review, automated testing, documentation updates, and one-click merge with gstack and DenchClaw.
The pull request workflow is the heartbeat of modern software engineering. Every change goes through it: the branch, the commit, the review, the merge. Done well, it maintains quality, spreads knowledge, and creates an auditable history of why things changed. Done poorly, it's a bottleneck, a rubber stamp, or a source of frustration.
AI improves the PR workflow at every stage — not by removing human judgment, but by handling the mechanical work that slows reviews down and allows important things to slip through.
Here's the complete AI-powered PR workflow.
Stage 1: Branch and Development#
Before the PR exists, AI assists during development:
Smart commit messages: AI generates commit messages from the diff. "feat: add email validation for contact creation with max length check" is more useful than "update contacts."
In-editor review: Modern AI coding tools (Cursor, Copilot) catch issues as you type. This is the first layer of quality — catching obvious bugs before you even commit.
Incremental testing: AI generates test stubs as you write functions. By the time you're done with the feature, the tests are already scaffolded.
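A minimal sketch of what commit-message generation can look like, using a file-path heuristic as a stand-in for the AI model (all function names here are hypothetical, not a real gstack API):

```typescript
// Heuristic stand-in for AI commit-message generation: infer a
// conventional-commit type from the staged file list, then attach a
// summary. A real AI tool would summarize the diff content itself.
type ChangeKind = "feat" | "fix" | "test" | "docs" | "chore";

function inferCommitPrefix(files: string[]): ChangeKind {
  if (files.every((f) => f.startsWith("tests/") || f.endsWith(".test.ts"))) {
    return "test";
  }
  if (files.every((f) => f.endsWith(".md"))) {
    return "docs";
  }
  if (files.some((f) => f.startsWith("src/"))) {
    return "feat";
  }
  return "chore";
}

function draftCommitMessage(files: string[], summary: string): string {
  return `${inferCommitPrefix(files)}: ${summary}`;
}

// Example:
// draftCommitMessage(["src/utils/validation.ts"],
//   "add email validation for contact creation")
```

The heuristic is deliberately crude; the point is the output shape, a typed prefix plus a specific summary, rather than "update contacts."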
Stage 2: Pre-PR Checks#
Before opening the PR, run gstack's Ship pre-flight checks:
# What gstack Ship runs automatically:
# 1. Sync with main (rebase or merge)
git fetch origin && git rebase origin/main
# 2. Run full test suite
npm run test
# 3. Check coverage threshold
npm run test:coverage
# 4. Lint
npm run lint
# 5. Build verification
npm run build
# 6. Run the Engineering Review (AI staff engineer)
# 7. Run QA on the running application
# 8. Generate PR description
# 9. Open PR

If any of these fail, the PR doesn't open until the issues are fixed. No PRs with failing tests. No PRs without a description.
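The pre-flight sequence above can be sketched as a small runner that executes each check in order and stops at the first failure (a simplified illustration, not gstack's actual implementation):

```typescript
// Sketch of a pre-flight runner: run each check command in order and
// stop at the first failure. The command list is illustrative.
import { execSync } from "child_process";

function runPreflight(commands: string[]): { ok: boolean; failed?: string } {
  for (const cmd of commands) {
    try {
      execSync(cmd, { stdio: "inherit" });
    } catch {
      // The PR doesn't open; fix the failing check and rerun.
      return { ok: false, failed: cmd };
    }
  }
  return { ok: true };
}

// runPreflight([
//   "git fetch origin && git rebase origin/main",
//   "npm run test",
//   "npm run test:coverage",
//   "npm run lint",
//   "npm run build",
// ]);
```

Failing fast matters here: a broken rebase or failing test suite makes every later check meaningless, so the runner never reaches the PR-description step with a red build.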
Stage 3: The AI-Generated PR Description#
The PR description is often the most neglected part of the workflow. Many PRs have one-line descriptions that tell reviewers almost nothing about what changed and why.
gstack generates a structured description automatically:
## Summary
Adds email validation for contact creation to prevent invalid email formats
and overly long email addresses from being stored in the database.
## Motivation
Support tickets #234 and #251 both involved email-related errors when users
tried to contact people whose emails were stored malformed. This validation
prevents malformed emails from entering the system.
## Changes
- `src/utils/validation.ts`: Added `validateContactEmail()` with format and
length validation
- `src/api/contacts.ts`: Applied validation to `POST /api/contacts` before
database write
- `src/components/ContactForm.tsx`: Added inline validation feedback
- `tests/utils/validation.test.ts`: Added 12 test cases covering valid/invalid
inputs, edge cases, and the `allowEmpty` option
## Testing Instructions
1. Create a contact with a valid email → should succeed
2. Create a contact with an invalid email format → should show error inline
3. Create a contact with an email > 254 characters → should show "too long" error
4. Create a contact with no email (if optional for your CRM setup) → should succeed
## Screenshots
[Before: no validation feedback]
[After: inline validation error shown]
## Related
- Closes #234
- Closes #251

This description takes about 10 seconds to generate and saves 10-15 minutes of writing. More importantly, it's consistently thorough — even at the end of a Friday when you'd otherwise write "fixed email validation."
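The description above references a `validateContactEmail()` helper with a format check, a length limit, and an `allowEmpty` option. A hedged sketch of what such a validator might look like (the article doesn't show the real implementation):

```typescript
// Hypothetical sketch of the validator the PR description references.
// Illustrates the format check, length limit, and allowEmpty option.
const MAX_EMAIL_LENGTH = 254; // practical upper bound from SMTP path limits

interface ValidationResult {
  valid: boolean;
  error?: string;
}

function validateContactEmail(
  email: string,
  options: { allowEmpty?: boolean } = {}
): ValidationResult {
  if (email.trim() === "") {
    return options.allowEmpty
      ? { valid: true }
      : { valid: false, error: "Email is required" };
  }
  if (email.length > MAX_EMAIL_LENGTH) {
    return { valid: false, error: "Email is too long" };
  }
  // Intentionally simple format check; full RFC 5322 parsing is overkill
  // for rejecting obviously malformed input at the API boundary.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return { valid: false, error: "Invalid email format" };
  }
  return { valid: true };
}
```

Returning a structured result rather than throwing lets the API layer and the `ContactForm` component share the same validator for inline feedback.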
Stage 4: Automated PR Checks#
After the PR opens, automated checks run in CI:
Tests: All tests must pass. Not "most tests" — all of them.
Coverage: Coverage must not drop below the threshold. gstack enforces this before the PR opens, but CI is the final gate.
Lint: Code style must conform. No debates in review about indentation or quote style — the linter decides.
Build: The application must build successfully. Broken builds don't merge.
gstack Benchmark: Performance metrics compared against baseline. Bundle size increase >10% fails CI.
Security scan: Automated vulnerability scanning on dependencies (e.g., npm audit, Dependabot).
These run on every PR, automatically. No manual triggering. No "oh did we run the tests?" questions.
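The bundle-size rule is easy to express as a small gate function. This is an illustrative sketch assuming a byte-count comparison against a stored baseline, not gstack Benchmark's actual logic:

```typescript
// Sketch of the bundle-size CI gate described above: fail the check when
// the new bundle exceeds the baseline by more than 10%.
const MAX_GROWTH = 0.10;

function bundleSizeGate(baselineBytes: number, currentBytes: number): boolean {
  if (baselineBytes <= 0) {
    return true; // no baseline recorded yet: pass and record current size
  }
  const growth = (currentBytes - baselineBytes) / baselineBytes;
  return growth <= MAX_GROWTH; // false => fail the CI check
}
```

Comparing against a committed baseline rather than the previous build keeps the gate stable: ten PRs each adding 9% can't slip through one at a time.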
Stage 5: AI Code Review#
When CI is passing, the AI code review runs (unless the review from gstack's pre-PR run is already attached):
The AI reviewer comments on the PR with specific, line-by-line feedback:
- Security vulnerabilities with recommended fixes
- Logic errors and edge cases
- Performance patterns (N+1 queries, unnecessary re-renders)
- Test gaps
- Documentation gaps
- Code quality suggestions
Each comment has a severity level (critical/high/medium/low) and a specific recommendation. Critical items block merge; lower severity items are informational.
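The severity policy reduces to a simple predicate: any critical finding blocks the merge, everything else is informational. A minimal sketch with invented type names:

```typescript
// Sketch: turn AI review findings into a merge decision, per the policy
// above. Critical findings block; lower severities are informational.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  severity: Severity;
  file: string;
  comment: string;
}

function mergeBlocked(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === "critical");
}
```

Keeping the gate this narrow is deliberate: blocking on medium-severity style suggestions would turn the AI reviewer into a bottleneck of its own.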
Stage 6: Human Code Review#
With AI handling the structural review, human reviewers focus on:
Does this solve the right problem?: Is the feature/fix aligned with the actual requirement?
Domain correctness: Is the business logic correct for the specific domain this touches?
Architecture: Does this fit the system's direction? Is there a better architectural approach?
Knowledge transfer: Should others on the team understand something about this change?
Human review comments are focused on these higher-level concerns. The conversation is faster and more valuable when structural issues are already handled.
Stage 7: Review Cycle Management#
PRs that sit unreviewed for days create merge conflict debt and slow the whole team down. AI helps manage the review cycle:
Auto-assignment: When a PR opens, AI identifies the best reviewers based on recent file ownership, expertise signals, and current review load.
Staleness alerts: PRs open for more than 24 hours without activity get an alert to the author and the assigned reviewers.
Conflict detection: When the PR base has changed significantly since the branch was created, AI alerts that rebasing is needed before review continues.
Review coverage: After AI and human review, a summary confirms that all changed files have been reviewed by at least one human.
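Auto-assignment can be approximated as a scoring function over file-ownership overlap and current review load. The weighting below is invented for illustration:

```typescript
// Sketch of reviewer auto-assignment: score each candidate by how many
// of the PR's changed files they recently touched, penalize current
// review load, and pick the top scorer. Weights are illustrative.
interface Candidate {
  name: string;
  ownedFiles: Set<string>; // files the candidate has recently touched
  openReviews: number;     // current review load
}

function pickReviewer(changedFiles: string[], candidates: Candidate[]): string {
  let best = candidates[0].name;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const overlap = changedFiles.filter((f) => c.ownedFiles.has(f)).length;
    const score = overlap - 0.5 * c.openReviews; // ownership up, load down
    if (score > bestScore) {
      bestScore = score;
      best = c.name;
    }
  }
  return best;
}
```

The load penalty is what keeps assignment fair: without it, the most knowledgeable reviewer accumulates every PR and becomes the staleness problem the alerts exist to fix.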
Stage 8: Merge and Post-Merge#
After all checks pass and reviews are complete:
Auto-merge on approval: PRs that pass all CI checks, AI review, and have human approval can be configured to auto-merge.
Post-merge cleanup: Branch is deleted automatically. Any linked issues are closed.
Changelog update: gstack Document updates the changelog with the PR's changes.
Canary monitoring: gstack Canary starts monitoring the deployment for 15-30 minutes.
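The auto-merge condition can be stated as one predicate over PR status. Field names below are assumptions for illustration, not a real API:

```typescript
// Sketch of the auto-merge condition described above: merge only when
// CI, AI review, and human approval are all green.
interface PRStatus {
  ciPassing: boolean;
  aiCriticalFindings: number;
  humanApprovals: number;
}

function canAutoMerge(pr: PRStatus): boolean {
  return pr.ciPassing && pr.aiCriticalFindings === 0 && pr.humanApprovals >= 1;
}
```

Note that human approval stays in the conjunction: auto-merge removes the click, not the review.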
Measuring PR Workflow Health#
Track the health of your PR workflow in DenchClaw:
SELECT
DATE_TRUNC('week', pr_opened_at) as week,
COUNT(*) as prs_opened,
AVG(EXTRACT(EPOCH FROM (first_review_at - pr_opened_at))/3600) as avg_hours_to_first_review,
AVG(EXTRACT(EPOCH FROM (merged_at - pr_opened_at))/3600) as avg_hours_to_merge,
COUNT(CASE WHEN ai_review_issues_critical > 0 THEN 1 END) as prs_with_critical_ai_findings,
AVG(human_review_comments) as avg_human_comments
FROM v_pull_requests
WHERE merged_at IS NOT NULL
GROUP BY 1
ORDER BY 1 DESC;

Key metrics to track:
- Time from PR open to first review
- Time from PR open to merge
- AI critical findings rate (is AI catching real issues?)
- Human review comment count (is it going up or down? are comments focused?)
Frequently Asked Questions#
What's the right PR size for an AI-powered workflow?#
Small PRs (under 400 lines changed) are still better. AI review scales, but human review still benefits from focused, small PRs. The goal: one logical change per PR. AI makes the process faster; it doesn't eliminate the benefits of small PRs.
Should you merge PRs without human review if AI review passed?#
For trivial changes (dependency updates, documentation fixes, minor config changes): acceptable. For any user-facing feature or bug fix: human review should remain a requirement. AI review replaces the structural check, not the strategic judgment.
How do you handle PRs that pass AI review but fail human review?#
This is useful information: it means the issue was contextual or domain-specific, not structural. Log these cases. They tell you what AI review is missing, which helps you calibrate how much to rely on AI review for similar code.
What if the AI-generated PR description is inaccurate?#
Edit it before submitting. AI-generated descriptions are first drafts. The developer who wrote the code should verify the description accurately captures the intent and scope of the change.
How do you prevent AI-generated commit messages from masking poor commit discipline?#
AI can generate good messages from good commits. For small, focused commits, AI messages are accurate and descriptive. For large, unfocused commits that change many things at once, AI messages will be vague — which is correct feedback. AI commit messages don't fix the underlying commit discipline problem; they reflect it.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
