AI-Assisted Testing: A Practical Guide
AI-assisted testing in practice: generate test cases, find edge cases, and maintain test suites with AI. A hands-on guide for engineering teams.
Testing is the part of software development that everyone agrees is important and nobody has enough time for. You write the feature, you make sure it works for the happy path, and the test suite is the thing that gets written "when there's time" — which is to say, often not at all.
AI changes the testing economics. Test generation, edge case identification, and test maintenance are all tasks where AI provides leverage — not replacing the judgment of what to test, but handling the mechanical work of writing and updating tests.
Here's a practical guide to integrating AI into your testing workflow.
The Test Coverage Problem
Before reaching for AI tools, understand the problem clearly.
Most engineering teams have a test coverage problem that falls into one of these patterns:
Pattern 1: No tests. Usually early-stage startups or legacy codebases that grew without testing infrastructure. Everything is a black box; changes are scary.
Pattern 2: Tests for happy paths only. Tests exist, but they only cover the "it works correctly" scenario. What happens when the database is down? What happens when the input is malformed? Nobody tested that.
Pattern 3: Fragile tests. Tests exist and have good coverage, but they're fragile — minor implementation changes break them, they're slow, they use real external services instead of mocks. The team spends more time fixing tests than writing features.
Pattern 4: Good tests, systematic coverage gaps. The testing infrastructure is solid, but there are systematic gaps — entire modules untested, certain code paths never reached, important business logic with no regression coverage.
AI helps with all four patterns, differently.
Generating Tests from Code
The most direct AI testing application: generate tests from existing code.
For an existing function with no tests, an AI can analyze the function's signature, implementation, and behavior, and generate a comprehensive test suite:
```typescript
// Given this function:
function validateContactEmail(email: string, options: { allowEmpty?: boolean } = {}): {
  valid: boolean;
  error?: string;
} {
  if (!email && options.allowEmpty) return { valid: true };
  if (!email) return { valid: false, error: 'Email is required' };
  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  if (!emailRegex.test(email)) return { valid: false, error: 'Invalid email format' };
  if (email.length > 254) return { valid: false, error: 'Email too long' };
  return { valid: true };
}
```

AI generates:
```typescript
describe('validateContactEmail', () => {
  describe('valid emails', () => {
    it('accepts standard email format', () => {
      expect(validateContactEmail('user@example.com')).toEqual({ valid: true });
    });

    it('accepts email with subdomain', () => {
      expect(validateContactEmail('user@mail.example.com')).toEqual({ valid: true });
    });

    it('accepts email with plus addressing', () => {
      expect(validateContactEmail('user+label@example.com')).toEqual({ valid: true });
    });
  });

  describe('invalid emails', () => {
    it('rejects empty email when allowEmpty is false', () => {
      expect(validateContactEmail('')).toEqual({ valid: false, error: 'Email is required' });
    });

    it('rejects null', () => {
      expect(validateContactEmail(null as any)).toEqual({ valid: false, error: 'Email is required' });
    });

    it('rejects email without @', () => {
      expect(validateContactEmail('notanemail')).toEqual({ valid: false, error: 'Invalid email format' });
    });

    it('rejects email without domain', () => {
      expect(validateContactEmail('user@')).toEqual({ valid: false, error: 'Invalid email format' });
    });

    it('rejects email that is too long (>254 chars)', () => {
      const longEmail = 'a'.repeat(250) + '@b.com';
      expect(validateContactEmail(longEmail)).toEqual({ valid: false, error: 'Email too long' });
    });
  });

  describe('allowEmpty option', () => {
    it('accepts empty email when allowEmpty is true', () => {
      expect(validateContactEmail('', { allowEmpty: true })).toEqual({ valid: true });
    });

    it('still validates format when email is provided with allowEmpty', () => {
      expect(validateContactEmail('notanemail', { allowEmpty: true })).toEqual({
        valid: false,
        error: 'Invalid email format'
      });
    });
  });
});
```

This is a complete, well-structured test suite generated in seconds. It covers the happy path, the error paths, edge cases (null input, very long email), and the optional parameter behavior.
Edge Cases AI Finds That Humans Miss
AI excels at systematic edge case enumeration. Given a function's behavior, AI identifies:
Boundary values: The exact threshold between valid and invalid. If a field has a 254-character maximum, AI tests 253, 254, and 255.
Type boundaries: What happens with null, undefined, empty string, zero, negative numbers, very large numbers, NaN, Infinity.
Unicode and special characters: What happens with emoji, RTL text, null bytes, SQL metacharacters.
Concurrent execution: When two operations run simultaneously, do they interfere?
Order dependencies: Does the output change if operations run in different orders?
Resource limits: What happens at the limits of lists, nested objects, file sizes.
Humans systematically undertest these because they're thinking about the feature, not the enumerated set of all possible inputs. AI has no such bias.
Integration Test Generation
Unit tests test individual functions. Integration tests test how they work together. AI helps here too, but the value is different: integration tests require understanding the desired system behavior, which AI can infer from the code but benefits from human guidance.
The workflow that works:
- Human: describe the user flow to test ("user creates a contact, adds a note, and the contact appears in search results")
- AI: generate the test that exercises that flow
- Human: review the test for correctness and completeness
AI handles the mechanical writing; humans provide the behavioral intent.
Maintaining Tests When Code Changes
Test maintenance is the unsexy side of testing. When a function signature changes, or a behavior changes, tests break. Updating them is tedious.
AI can update tests automatically when code changes:
- AI detects that a function signature changed (from the PR diff)
- AI identifies all tests that call that function
- AI updates the test calls to match the new signature
- AI updates the expected values if the behavior changed
This isn't perfect — sometimes the behavior change means the test's intent has changed and a human needs to decide what the new expected behavior should be. But the mechanical update (change the parameter name, update the type) AI handles automatically.
Test Quality: What Makes a Good Test
AI-generated tests are a starting point, not the final word. Review them for:
Single assertion per test: Each test should verify one thing. A test called "test contact creation" that checks 15 different assertions is hard to understand when it fails.
Descriptive names: "it rejects email that is too long (>254 chars)" is better than "test case 4." The test name should document the behavior.
No test interdependence: Tests should not depend on other tests running first. Each test should set up its own state.
Reasonable assertions: Don't test implementation details that change frequently. Test behavior, not internal state.
Appropriate mocking: External services should be mocked. Database calls in unit tests should be mocked or use test databases.
AI gets most of this right, but review for the few cases where it doesn't.
Testing in DenchClaw with gstack
DenchClaw's gstack QA phase runs the actual application and generates test findings. The testing workflow in gstack:
- Benchmark checks performance before the PR
- Engineering Review checks code quality
- QA tests the running application
- Ship verifies test coverage before merging
The QA findings can be converted to automated tests: "Add a regression test for the double-submit bug found in QA." AI generates the specific test from the QA finding.
Frequently Asked Questions
How do you prioritize which tests to write when you have low coverage?
Prioritize by risk: what code, if it fails, has the biggest user impact? Core authentication, payment flows, data import/export, and any feature mentioned in customer support tickets more than once. Start with tests that cover the highest-risk code paths.
Should AI-generated tests be reviewed before merging?
Yes, always. AI generates plausible tests, not always correct tests. Verify that the test is testing what you think it's testing, that the assertions are correct, and that it would actually fail if the feature were broken.
What's the right balance between unit, integration, and end-to-end tests?
The testing pyramid: many unit tests (fast, targeted), fewer integration tests (slower, test interactions), few end-to-end tests (slow, test full flows). AI is most efficient at generating unit tests; end-to-end tests benefit most from human design but AI can write the code.
How do you handle tests for code that interacts with external APIs?
Mock the external APIs in unit and integration tests. Have a separate test suite that runs against real external services (in a test environment). AI can generate both the test logic and the mock implementations.
What testing framework should you use with AI-assisted testing?
Any mainstream testing framework works with AI generation (Jest, Vitest, pytest, go test, RSpec). AI knows them all. Pick the one your team knows best; the AI will adapt.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
