aim success. You must define a hard numerical threshold. This eliminates the "cheap escape hatch" and forces the agent to explore deeper code paths.
- Bad: "Add more tests."
- Good: "Increase test coverage by at least 20%."
2. The Regression Proof
Coverage alone is insufficient. You must require the agent to demonstrate that the new tests have value by catching a defect. This shifts the goal from satisfying a report to hunting for proof of fragility.
- Bad: "Make sure tests pass."
- Good: "If you cannot identify at least one real regression or missing edge case, the test suite is invalid."
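The two criteria above can be combined into a single acceptance gate. The following is a minimal sketch, not a definitive implementation: the `TestRunEvidence` shape and the `evaluatePressureGate` function are hypothetical names, and it assumes the coverage floor is measured in percentage points rather than as a relative increase.

```typescript
// Hypothetical evidence collected after an agent run; field names are illustrative.
interface TestRunEvidence {
  baselineCoverage: number; // coverage before the run, in percent (0-100)
  newCoverage: number; // coverage after the run, in percent (0-100)
  failingOnOriginal: string[]; // new tests that FAIL against the unmodified source
}

interface GateResult {
  accepted: boolean;
  reasons: string[];
}

// Accept the agent's output only if both the coverage floor and the
// regression proof are satisfied. minDelta is in percentage points (assumption).
function evaluatePressureGate(evidence: TestRunEvidence, minDelta: number): GateResult {
  const reasons: string[] = [];
  const delta = evidence.newCoverage - evidence.baselineCoverage;
  if (delta < minDelta) {
    reasons.push(`Coverage rose by ${delta.toFixed(1)} points; the floor is ${minDelta}.`);
  }
  if (evidence.failingOnOriginal.length === 0) {
    reasons.push("No new test fails on the original code, so no regression was proven.");
  }
  return { accepted: reasons.length === 0, reasons };
}
```

A run that raises coverage from 40% to 45% with no failing test would be rejected with both reasons listed, which gives you the text for the re-prompt.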
Implementation Example
Below is a TypeScript implementation demonstrating the difference between a trivial assertion and a pressure-driven test.
Context: A function calculateShipping that applies discounts based on order value and weight.
Trivial Output (Standard Prompt Result):
// This test executes the function but asserts nothing meaningful.
// It passes even if calculateShipping returns null or a negative number.
it('should calculate shipping', () => {
  const result = calculateShipping({ value: 100, weight: 5 });
  expect(result).toBeDefined(); // Noise assertion
});
Adversarial Output (Pressure Prompt Result):
// Prompt used:
// "Increase coverage by 20%.
// You must find a case where the discount logic fails for edge inputs.
// If no regression is found, the output is rejected."
describe('calculateShipping pressure tests', () => {
  it('should reject negative order values with a validation error', () => {
    // Regression found: Original code did not handle negative values,
    // resulting in a negative shipping cost.
    expect(() => calculateShipping({ value: -50, weight: 2 }))
      .toThrow('Order value must be non-negative');
  });

  it('should cap discount at 50% regardless of volume', () => {
    // Edge case exposed: Discount logic compounded incorrectly above threshold.
    const result = calculateShipping({ value: 10000, weight: 100 });
    expect(result.discountPercentage).toBeLessThanOrEqual(50);
  });
});
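The article does not show calculateShipping itself, so here is one possible shape of a fixed implementation that would satisfy both pressure tests. Everything except the error message, the 50% cap, and the `discountPercentage` field is an assumption made for illustration: the `cost` field, the rate constant, and the discount tiers are hypothetical.

```typescript
interface Order {
  value: number;
  weight: number;
}

interface ShippingQuote {
  cost: number; // hypothetical field; the tests only read discountPercentage
  discountPercentage: number;
}

const BASE_RATE_PER_KG = 2; // hypothetical rate
const MAX_DISCOUNT_PERCENT = 50; // the cap asserted by the second test

function calculateShipping(order: Order): ShippingQuote {
  // Validation that the first pressure test expects.
  if (order.value < 0) {
    throw new Error('Order value must be non-negative');
  }
  // Hypothetical tiers: 1% per 100 units of order value plus 1% per 10 kg.
  const rawDiscount = Math.floor(order.value / 100) + Math.floor(order.weight / 10);
  // Clamp so that combined tiers can never exceed the cap.
  const discountPercentage = Math.min(rawDiscount, MAX_DISCOUNT_PERCENT);
  const baseCost = order.weight * BASE_RATE_PER_KG;
  const cost = baseCost * (1 - discountPercentage / 100);
  return { cost, discountPercentage };
}
```

Removing the validation or the `Math.min` clamp makes the matching pressure test fail, while the trivial `toBeDefined` test keeps passing either way.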
Architecture Rationale:
- TypeScript Selection: Using TypeScript enforces type safety, which complements the pressure prompt by allowing the agent to generate tests that verify type constraints and interface contracts, not just runtime values.
- Behavioral Assertions: The adversarial output replaces toBeDefined with specific behavioral checks (toThrow, toBeLessThanOrEqual). These assertions fail if the business logic is incorrect, providing genuine regression protection.
- Iterative Refinement: If the agent claims to find a regression but the test passes on the original code, the prompt must be reapplied. The "proof" requirement creates a feedback loop that improves test quality over iterations.
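The feedback loop in the last point depends on one mechanical check: does the claimed regression test actually fail on the original code? A minimal sketch of that check follows, assuming a test can be modeled as a function that receives an implementation and throws when an assertion fails. The `Impl`, `TestCase`, and `classifyRegressionClaim` names are hypothetical.

```typescript
type Impl = (order: { value: number; weight: number }) => { discountPercentage: number };

// A test case throws when one of its assertions fails.
type TestCase = (impl: Impl) => void;

type Verdict = "proven" | "hallucinated";

function classifyRegressionClaim(test: TestCase, originalImpl: Impl): Verdict {
  try {
    test(originalImpl);
    // The test passed on the unmodified code, so it exposed no defect.
    return "hallucinated";
  } catch {
    // The test failed on the unmodified code, so the claim has evidence behind it.
    return "proven";
  }
}
```

One caveat: a test that throws because of its own bug is also classified as "proven" here, so the failure message still needs a human look before the claim is accepted.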
Pitfall Guide
Even with adversarial prompting, teams encounter specific failure modes. Below are common mistakes and their remedies based on production experience.
| Pitfall | Explanation | Fix |
|---|---|---|
| The Hallucinated Regression | The agent claims to find a bug, but the test passes on the original code. This often happens when the agent invents a scenario that the code already handles correctly. | Verify the regression manually. Run the new test against the unmodified source. If it passes, the "regression" is false; reject the output and re-prompt. |
| Trivial Assertion Drift | The agent satisfies the coverage floor but fills the suite with toBeTruthy or toEqual(undefined) checks to pad metrics. | Explicitly ban trivial assertions in the prompt. Add: "Do not use toBeDefined, toBeTruthy, or empty object comparisons. All assertions must verify specific business values or error states." |
| Context Blindness | The agent generates tests for edge cases that are impossible due to upstream validation, creating dead tests. | Provide domain constraints. Include comments or documentation in the prompt that define valid input ranges and invariants. |
| Mutation Blindness | The test suite has high coverage but survives code mutations, indicating weak assertions. | Integrate mutation testing (e.g., Stryker) into the CI pipeline. Use mutation score as a secondary metric to validate the quality of AI-generated tests. |
| The "Green" Trap | Developers accept the output because all tests pass, ignoring that the tests may be too loose to catch future changes. | Adopt a "Red-Green-Refactor" mindset for AI. The agent should first produce a failing test (proving the regression), then the code is fixed, then the test passes. |
| Over-Pressuring New Code | Applying high regression pressure to brand-new features where no bugs exist yet can lead to forced, artificial tests. | Adjust pressure based on code maturity. Use "Spec-Adherence" prompts for new code ("Verify implementation matches requirements") and "Regression-Hunting" prompts for legacy code. |
| Ignoring Test Performance | Pressure prompts may cause the agent to generate expensive integration tests or slow mocks to hit coverage targets. | Add performance constraints. Include: "Tests must execute in under 50ms. Use mocks for external dependencies. Do not introduce slow I/O operations." |
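
A prompt-level ban on trivial assertions can be backed by a cheap automated check on the generated test file. The sketch below is a plain substring scan, not a real linter: `findTrivialAssertions` and the pattern list are hypothetical, and because it does not parse the code it will also flag matches inside comments.

```typescript
// Literal substrings to flag; adjust to your assertion library (assumption).
const FORBIDDEN_PATTERNS = [".toBeDefined(", ".toBeTruthy(", "assertNotNull(", ".toEqual({})"];

interface Violation {
  line: number; // 1-based line number in the scanned source
  pattern: string;
  text: string;
}

function findTrivialAssertions(
  source: string,
  forbidden: string[] = FORBIDDEN_PATTERNS
): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const pattern of forbidden) {
      if (text.includes(pattern)) {
        violations.push({ line: index + 1, pattern, text: text.trim() });
      }
    }
  });
  return violations;
}
```

An empty result does not prove the assertions are strong. It only shows that the most common padding patterns are absent, which is why the mutation-testing row above still applies.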
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Legacy Code with High Bug Rate | Adversarial Pressure<br>(High coverage floor + Regression proof) | Legacy code often has hidden edge cases. Pressure forces the AI to expose latent defects. | High initial time cost; significantly reduces long-term bug fix costs. |
| New Feature Development | Spec-Adherence Prompt<br>(Verify requirements + Boundary checks) | No existing bugs to find. Focus should be on ensuring the implementation matches the spec. | Low cost; accelerates development with reliable contracts. |
| Refactoring Existing Module | Regression-Hunting<br>(Focus on preserving behavior) | Goal is to ensure refactoring doesn't break existing logic. Pressure helps find behavioral drift. | Moderate cost; prevents regression bugs during refactor. |
| Critical Payment/Security Module | Adversarial + Mutation<br>(Pressure prompt + Mutation score > 80%) | High risk requires maximum confidence. Mutation testing validates assertion strength. | High cost; justified by risk mitigation. |
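
The "Mutation score > 80%" condition in the last row can be expressed as a small gate. This sketch uses a simplified definition, killed mutants divided by killed plus survived. Mutation testing tools usually report more categories (for example timeouts or uncovered mutants), so treat the `MutationCounts` shape and both function names as hypothetical.

```typescript
interface MutationCounts {
  killed: number; // mutants that made at least one test fail
  survived: number; // mutants that no test detected
}

function mutationScore(counts: MutationCounts): number {
  const total = counts.killed + counts.survived;
  // With no mutants there is no evidence, so report 0 rather than a perfect score.
  if (total === 0) return 0;
  return (counts.killed * 100) / total;
}

function passesMutationGate(counts: MutationCounts, threshold = 80): boolean {
  // Strictly greater than the threshold, matching "> 80%" in the matrix.
  return mutationScore(counts) > threshold;
}
```

With 85 killed and 15 survived the score is 85 and the gate passes. With 80 killed and 20 survived the score is exactly 80 and the gate fails.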
Configuration Template
Use this template in your .cursorrules, .github/copilot-instructions.md, or agent configuration to enforce adversarial standards globally.
# AI Testing Protocol: Adversarial Mode
# Apply to all test generation requests unless overridden.
rules:
  - name: "Coverage Floor"
    description: "Always request a minimum coverage increase."
    template: "Increase test coverage by at least {coverage_delta}%."
  - name: "Regression Proof"
    description: "Tests must demonstrate value by catching defects."
    template: "You must identify at least one regression, edge-case failure, or missing validation. If no defect is found, the test suite is invalid."
  - name: "Assertion Quality"
    description: "Ban trivial assertions."
    forbidden_patterns:
      - "toBeDefined"
      - "toBeTruthy"
      - "assertNotNull"
      - "toEqual({})"
    required_behavior: "Assertions must verify specific values, types, error messages, or state changes."
  - name: "Performance Guard"
    description: "Prevent slow tests."
    constraint: "All tests must complete in < 50ms. Mock external I/O."
prompt_structure: |
  "Analyze {module_name}.
  Increase test coverage by {coverage_delta}%.
  Hunt for regressions: If you cannot find a real bug or edge-case failure, your tests are insufficient.
  Do not use trivial assertions.
  Verify all findings against the current implementation."
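The `{module_name}` and `{coverage_delta}` placeholders need to be filled in before the prompt is sent. Whether a given tool substitutes them automatically is not something this template guarantees, so the sketch below assumes you render the prompt yourself. `renderPrompt` is a hypothetical helper, not part of any agent tooling.

```typescript
const PROMPT_TEMPLATE = [
  "Analyze {module_name}.",
  "Increase test coverage by {coverage_delta}%.",
  "Hunt for regressions: If you cannot find a real bug or edge-case failure, your tests are insufficient.",
  "Do not use trivial assertions.",
  "Verify all findings against the current implementation.",
].join("\n");

// Replace every {placeholder} with its value; fail loudly on a missing one
// so that a half-rendered prompt is never sent to the agent.
function renderPrompt(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match, key: string) => {
    if (!(key in values)) {
      throw new Error(`Missing value for placeholder {${key}}`);
    }
    return String(values[key]);
  });
}

const prompt = renderPrompt(PROMPT_TEMPLATE, {
  module_name: "OrderProcessor",
  coverage_delta: 25,
});
```

Throwing on a missing placeholder is a deliberate choice: a prompt that still contains a literal `{coverage_delta}` gives the agent no numerical threshold to meet.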
Quick Start Guide
- Identify Target: Select a module with low coverage or high bug frequency.
- Run Baseline: Execute your test runner to get the current coverage percentage.
- Apply Prompt: Use the configuration template to generate the prompt. Example:
"Analyze OrderProcessor. Increase test coverage by 25%. You must find at least one regression where the discount calculation fails for boundary inputs. Do not use trivial assertions. If no regression is found, the output is rejected."
- Verify Output: Review the generated tests. Run them against the original code and confirm that each test claiming a regression fails there.
- Iterate: If the agent returns weak tests, re-prompt with: "The previous assertions were trivial. Rewrite with behavioral checks and find a deeper edge case."
- Merge: Once the exposed defects are fixed, the tests pass, and each regression has been verified, commit the suite and update documentation.