
gstack QA: AI That Tests Your App Like a Human

gstack QA phase tests your app end-to-end like a senior QA lead. Finds bugs humans miss, fixes them atomically, and re-verifies before shipping.

Mark Rachapoom
·6 min read

Every developer tests their own code. And that's exactly the problem. When you write code, you know how it's supposed to work — so you naturally test the paths you designed. You test that adding a user works. You test that the form submits. You test the happy path.

What you don't test: what happens when the email already exists. What happens when the API is slow. What happens when you hit submit twice. What happens when the required field is blank.

A QA lead who didn't write the code approaches it differently. They ask "what can I do to break this?" They try every edge case they can think of because they're not anchored to the developer's mental model.

gstack's QA phase is that adversarial tester.

What gstack QA Actually Does

The QA phase in DenchClaw's gstack workflow runs the application and tests it systematically as a QA lead would.

Phase 1: Smoke test. Does the feature work at all? Does the primary happy path work end-to-end?

Phase 2: Boundary conditions. What happens at the edges? Empty inputs, maximum-length inputs, special characters, unexpected data types.

Phase 3: Error states. What happens when things fail? API down, validation errors, permission denied, network timeout.

Phase 4: State transitions. What happens when you perform actions in unexpected orders? Submit then go back. Create then immediately delete. Navigate away mid-flow.

Phase 5: Concurrent interactions. What happens when two users do the same thing at the same time? Double-submit, simultaneous edits.

Phase 6: Cross-browser and cross-device. Does it work on mobile? Does it work in Safari vs. Chrome?

Each phase produces a list of findings: bugs, inconsistencies, edge cases not handled. Each finding includes reproduction steps and severity classification.
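
A finding can be represented as a small structured record. Here is a minimal TypeScript sketch; the type and field names are illustrative assumptions, not gstack's actual schema:

```typescript
// Illustrative shape of a QA finding; names are assumptions, not gstack's schema.
type FindingSeverity = "critical" | "high" | "medium" | "low";

interface Finding {
  phase: number;        // which QA phase surfaced it (1-6)
  title: string;        // short description of the bug
  reproSteps: string[]; // ordered steps to reproduce
  severity: FindingSeverity;
}

const doubleSubmit: Finding = {
  phase: 5,
  title: "Duplicate record created on double-submit",
  reproSteps: [
    "Open the create-user form",
    "Click Submit twice in quick succession",
    "Observe two identical users in the list",
  ],
  severity: "high",
};
```

Keeping reproduction steps and severity on the record itself is what makes the later fix and regression phases mechanical rather than ad hoc.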

The Atomic Fix Principle

One of gstack's QA principles: atomic commits for each bug found.

When the QA phase finds a bug, the fix process is:

  1. Create an atomic commit that addresses exactly that bug
  2. Re-run the specific test that found the bug to verify the fix
  3. Run the broader smoke test to verify the fix didn't break anything else
  4. Proceed to the next bug
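
The loop above can be sketched as a pure function. Everything here is a placeholder injected by the caller (`applyFix`, `smokeTest`, `commit` are stand-ins, not a real gstack API); the point is the control flow, one verified commit per bug:

```typescript
// Sketch of the atomic fix loop. applyFix/smokeTest/commit are injected
// placeholders standing in for real tooling, not a gstack API.
interface Bug {
  id: string;
  test: () => boolean; // the specific test that originally caught the bug
}

function fixBugsAtomically(
  bugs: Bug[],
  applyFix: (bug: Bug) => void,
  smokeTest: () => boolean,
  commit: (message: string) => void,
): string[] {
  const committed: string[] = [];
  for (const bug of bugs) {
    applyFix(bug); // 1. address exactly this bug
    if (!bug.test()) {
      throw new Error(`fix for ${bug.id} did not pass its own test`); // 2. verify
    }
    if (!smokeTest()) {
      throw new Error(`fix for ${bug.id} broke the smoke test`); // 3. no collateral damage
    }
    commit(`fix: ${bug.id}`); // 4. one commit per bug, then move on
    committed.push(bug.id);
  }
  return committed;
}
```

Because each iteration ends in its own commit, a later `git bisect` lands on exactly one fix rather than a batch.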

This produces a clean git history where each bug has a corresponding fix commit. When a regression appears later, you can bisect cleanly to find which fix introduced a new problem.

The alternative — fixing multiple bugs in one commit — makes debugging later much harder and makes code review less useful.

What gstack QA Finds That Developers Miss

After running the QA phase on dozens of features, these are the categories of bugs it consistently finds and developers consistently miss:

Double-submit bugs: The user clicks submit. The request is slow. They click again. Both requests go through. Two identical records are created. Developers almost never test this, because they don't click submit twice.
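
The standard client-side defense is a guard that drops submissions while one is in flight. A minimal sketch, assuming `submit` is any async request function (disabling the button during the request is the visual counterpart of the same idea):

```typescript
// Minimal double-submit guard: ignore new submissions while one is in flight.
// `submit` stands in for any async request function.
function guardSubmit<T>(submit: () => Promise<T>): () => Promise<T | undefined> {
  let inFlight = false;
  return async () => {
    if (inFlight) return undefined; // drop the duplicate click
    inFlight = true;
    try {
      return await submit();
    } finally {
      inFlight = false; // allow submission again once the request settles
    }
  };
}
```

This only protects one browser tab; a server-side idempotency key is still needed to cover retries and concurrent clients.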

Empty state bugs: The feature works with data but was never tested before any data exists. The empty state shows an error or renders incorrectly.

Error message quality: Validation errors that say "error: field required" instead of "Please enter your email address." The functionality works; the error communication is poor.

Mobile layout breakages: The desktop layout was tested thoroughly. The mobile layout has overflow, cut-off text, or buttons that are too small to tap.

Loading state missing: The button has no loading indicator. The user clicks, nothing happens visually, they click again.

Stale data display: After creating or updating a record, the UI shows the old data until a refresh. The database was updated; the UI wasn't.

Permission edge cases: The feature works for admin users. For read-only users, it shows actions they can't take. Clicking those actions produces cryptic errors instead of clear permission messages.

Integration with gstack's Engineering Review

QA and Engineering Review are complementary, not duplicative.

Engineering Review looks at the code: is the logic correct? Are there race conditions? Are edge cases handled in the implementation?

QA looks at the running application: does the user experience reflect the implementation? Are there UI-layer bugs the code review didn't catch? Does the full flow work end-to-end?

A bug can pass Engineering Review and still be caught by QA. "The code handles the error" doesn't mean "the error message the user sees is useful." Both layers matter.

Writing QA Test Cases in DenchClaw

gstack QA produces a structured test case record for every significant feature. These test cases live in DuckDB as entries in a test_cases object.

Each test case has:

  • Feature tested
  • Preconditions
  • Test steps
  • Expected result
  • Actual result (pass/fail)
  • Severity if failing
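
As a TypeScript shape, a record with those fields might look like this. The `test_cases` table name comes from the article; the interface and the `status` helper are illustrative assumptions:

```typescript
// Sketch of a test case record with the fields listed above.
// The shape is an assumption; only the test_cases name comes from gstack.
interface TestCase {
  feature: string;
  preconditions: string[];
  steps: string[];
  expected: string;
  actual?: string; // filled in when the case is run
  severityIfFailing?: "critical" | "high" | "medium" | "low";
}

function status(tc: TestCase): "pass" | "fail" | "not-run" {
  if (tc.actual === undefined) return "not-run";
  return tc.actual === tc.expected ? "pass" : "fail";
}
```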

Maintaining this test case library means QA knowledge accumulates over time. When you revisit a feature, the test cases from the previous QA phase are there. When you make a change that touches the feature, you know exactly what to test.

Regression Testing After Fixes

After any significant bug fix, gstack runs a targeted regression test:

  1. Re-run the specific test cases that caught the original bug
  2. Run a broader set of related test cases (things that touch the same code path)
  3. Run a smoke test of the overall feature

This ensures that fixes are verified and don't introduce new problems. The "fix one bug, introduce two" problem is real — regression testing after every fix catches it.
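
Selecting that targeted set can be sketched as a function over the test case library. The `codePaths` tags and field names here are hypothetical; the ordering mirrors the three steps above:

```typescript
// Sketch of targeted regression selection: the case that caught the bug first,
// then cases sharing a code path, then the smoke set. Field names are assumptions.
interface RegressionCase {
  id: string;
  codePaths: string[]; // code paths this case exercises
  smoke: boolean;      // part of the overall feature smoke test
}

function regressionPlan(all: RegressionCase[], bugCaseId: string): string[] {
  const bugCase = all.find(c => c.id === bugCaseId);
  if (!bugCase) return [];
  const related = all.filter(
    c => c.id !== bugCaseId && c.codePaths.some(p => bugCase.codePaths.includes(p)),
  );
  const smoke = all.filter(c => c.smoke && c !== bugCase && !related.includes(c));
  return [bugCase, ...related, ...smoke].map(c => c.id);
}
```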

Frequently Asked Questions

Can gstack QA replace manual QA entirely?

For most use cases, significantly. gstack QA finds a high percentage of the bugs that manual QA would find, including many that developers would miss. But it doesn't replace user testing (understanding whether users find the feature useful or intuitive), and it has limitations for very visual or highly subjective quality assessments.

How does QA interact with automated tests?

gstack QA generates test cases that can be formalized as automated tests. After QA finds and fixes a bug, the test case for that bug gets added to the automated test suite. Over time, the automated test suite grows to cover the scenarios QA has found to be important.

How long does a QA phase take for a typical feature?

30-60 minutes for a medium-complexity feature. Complex features with many state transitions or error conditions can take 90+ minutes. The investment is almost always worth it.

What happens when QA finds a critical bug right before a deadline?

Fix the bug. A critical bug in production is more expensive than a delayed release. gstack's QA phase is run before the Ship phase specifically to prevent this situation — better to find it now than after customers hit it.

How do you prioritize which bugs to fix immediately vs. defer?

gstack classifies bugs by severity: Critical (blocks core functionality), High (significant user impact, workaround exists), Medium (poor experience but functional), Low (minor cosmetic issues). Critical and High must be fixed before shipping. Medium and Low can be tracked as known issues with timelines.
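
That policy reduces to a simple gate: Critical and High block the ship, Medium and Low become tracked known issues. A sketch, with field names as illustrative assumptions:

```typescript
// Ship gate per the severity policy above: critical/high block the release,
// medium/low are tracked as known issues. Field names are assumptions.
type BugSeverity = "critical" | "high" | "medium" | "low";

function shipGate(findings: { id: string; severity: BugSeverity }[]) {
  const blockers = findings.filter(
    f => f.severity === "critical" || f.severity === "high",
  );
  const knownIssues = findings.filter(
    f => f.severity === "medium" || f.severity === "low",
  );
  return { canShip: blockers.length === 0, blockers, knownIssues };
}
```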

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →


Written by

Mark Rachapoom

Building the future of AI CRM software.


© 2026 DenchHQ · San Francisco, CA