ADD Engineering Leadership Deck
CTO + Director briefing 01 / 07

Slide 01

The Testing Pyramid Is a Budget Document. Not an Engineering Standard.

CTO + Director + Board
Core claim

The pyramid exists because human beings are expensive and E2E test maintenance is a human-capital sinkhole. Remove that constraint and the entire ratio collapses.

The Testing Pyramid — broad base of unit tests, thin integration layer, tiny sliver of E2E — was never a technical recommendation. It was a financial one. The shape of the pyramid is the shape of human labor costs. In an agent-driven world, that shape no longer makes sense.

The shift: When the cost of writing and maintaining E2E tests drops to near zero, the pyramid becomes the square — or the triangle flips entirely.

Slide 02

One Flaky E2E Test Can Burn a Day. Ten Can Burn a Sprint. That Was the Real Reason.

The original constraint
Unit tests: Near zero

One engineer writes them alongside the code. Run in milliseconds. Break cleanly. A human can maintain hundreds without losing their mind.

Integration tests: Moderate

Need test environments, data setup, cross-service coordination. Failures may live in your code or in a contract with someone else's service.

E2E tests: Expensive

Full environments, synthetic users, browser automation, network dependencies. One flaky Selenium script can burn a full day of engineering time.

A team of five QA engineers can maintain maybe two hundred E2E tests before the maintenance burden starts eating their capacity to write new ones. That is the constraint. Not technical. Financial.

The pyramid exists because human beings are expensive — not because E2E tests are inferior

Slide 03

A Car Loan Has 200 Edge Cases. You Are Testing Maybe Eight of Them End-to-End.

The gap in your coverage

What a real E2E flow looks like

  • Account creation → KYC checks → identity verification → session management
  • Loan application → income verification → document upload → decisioning engine
  • 36 monthly payments → ACH pulls → late fees → grace periods → credit reporting
  • Payoff → lien release → e-title transfer → account archival

What you actually test today

  • Identity team tests signup with a mocked lending service
  • Payments team tests ACH with a synthetic loan record that never touched real decisioning
  • The handshake between payment completion and e-title? Tested manually once a quarter by someone who knows where the bodies are buried
  • Integration between your domains? A prayer that the seams hold under real traffic
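The gap above reads clearly in code. A minimal sketch with illustrative names (`submit_application`, a mocked lending service — none of these are from the deck): the identity team's test goes green without real decisioning ever running, so the seam between the two domains stays uncovered.

```python
from unittest.mock import MagicMock

def submit_application(applicant: dict, lending_service) -> dict:
    # The seam: identity hands the applicant to lending for a decision.
    decision = lending_service.decide(applicant)
    return {"applicant": applicant, "approved": decision == "approve"}

def test_signup_with_mocked_lending():
    lending = MagicMock()
    lending.decide.return_value = "approve"  # real decisioning never runs
    result = submit_application({"name": "A. Borrower"}, lending)
    assert result["approved"] is True  # green, but the seam is untested
```

The test passes on every commit, forever, regardless of what the real lending service does. That is the prayer that the seams hold.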

Slide 04

The Ratio Is a Human-Capital Ratio. Remove the Human Cost. Rethink the Ratio.

The agent economics
What Brendan realized

"The entire ratio — units to integration to E2E — is a human-capital ratio. If you take the human cost out of test maintenance, why would you keep the same ratio?"

In an agent-driven world: Writing a test — near zero cost. Maintaining it when code changes — near zero. Generating synthetic data — near zero. Updating a selector when the UI shifts — near zero.

What remains: Compute costs. And engineers who understand what the tests should assert and whether a passing test actually means what you think it means. That judgment is not going anywhere.
What the new shape allows

Every account creation path. Every loan decisioning branch. Every payment edge case — extra payment, missed payment, payoff with a balance dispute. Every report format. Every regulatory variant.

The tests that most closely mirror user behavior. The tests that catch integration failures across domain seams. The tests the pyramid said you couldn't afford.

New shape: Not the pyramid. Not the inverted pyramid. The square — comprehensive coverage at every layer because none of them are prohibitively expensive anymore.

Slide 05

Agents Write the Tests. Engineers Decide What Passing Means.

Where human judgment lives
Agents do

The mechanical work

Write the test cases. Generate synthetic data. Update selectors when UI changes. Maintain fixtures. Regenerate coverage when code changes. Run at scale without human supervision.

Humans do

The judgment work

Define what correct behavior looks like. Decide whether a passing test means the system is safe. Catch the class of bugs where the test is passing but the assertion is wrong. Know what the software is supposed to do.

The risk

Automating the wrong thing

Coverage goes up. Confidence goes up. But if engineers don't verify that passing tests mean real correctness — not just that the assertion matches the current behavior — you've built a faster way to ship confident bugs.
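That failure mode can be made concrete. A minimal sketch, assuming a hypothetical `late_fee()` helper and a 15-day grace-period policy (both invented for illustration): an agent that derives assertions from observed behavior produces a passing test that locks in the bug.

```python
# Hypothetical sketch: late_fee() and the 15-day policy are illustrative.
def late_fee(days_past_due: int) -> float:
    # Buggy implementation: grace period is 10 days, policy says 15.
    return 0.0 if days_past_due <= 10 else 25.0

def test_late_fee_inside_grace_period():
    # Agent-generated assertion, derived from current behavior.
    # It passes, yet the system charges a fee on days 11-15 in error.
    assert late_fee(12) == 25.0  # the written policy says this should be 0.0
```

A human reviewing this assertion against the written policy catches the bug. An agent regenerating assertions from current behavior does not — it makes the bug load-bearing.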

Warning: The bottleneck moves from writing tests to understanding what you're testing. Your senior engineers become more valuable, not less. Do not cut QA leadership when you cut QA labor.

Slide 06

Start With Your Most Dangerous Seam. The One QA Tests Manually Once a Quarter.

Where to begin

What to do first

  • Map every domain seam in your system — places where one team's output becomes another team's input
  • Identify which seams have no automated E2E coverage, only manual verification or tribal knowledge
  • Pick the one most likely to fail under pressure — highest traffic, most recent drift, most manual coordination
  • Build agent-maintained E2E tests for that seam first. Prove the economics. Then expand.

What not to do

  • Do not run a "testing transformation" program. Start with one seam that matters.
  • Do not cut QA headcount before you've proven that agents maintain quality, not just coverage.
  • Do not conflate coverage numbers with safety. A 95% coverage score built on wrong assertions is worse than 60% coverage built on right ones.
  • Do not skip the assertion review. That is where the judgment lives. That is not automatable.
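The "build agent-maintained E2E tests for that seam first" step can start small. A sketch with illustrative names (`Payment`, `TitleService` — hypothetical stand-ins, not your actual services): a contract check on the payoff-to-title seam that today gets verified manually once a quarter.

```python
from dataclasses import dataclass

@dataclass
class Payment:
    loan_id: str
    remaining_balance: float
    status: str

class TitleService:
    """Stand-in for the e-title domain on the far side of the seam."""
    def __init__(self) -> None:
        self.released: set[str] = set()

    def handle_payoff(self, payment: Payment) -> None:
        # Seam contract: only fully settled, zero-balance loans release a lien.
        if payment.status == "settled" and payment.remaining_balance == 0.0:
            self.released.add(payment.loan_id)

def test_payoff_triggers_title_release():
    titles = TitleService()
    titles.handle_payoff(Payment("L-001", 0.0, "settled"))
    assert "L-001" in titles.released

def test_disputed_balance_blocks_release():
    titles = TitleService()
    titles.handle_payoff(Payment("L-002", 12.50, "settled"))
    assert "L-002" not in titles.released
```

The value is not the code; it is that the seam contract is now written down and exercised on every commit instead of living in one person's head.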

Slide 07

Your Pyramid Assumes 2009 Labor Costs. It Is 2026. Is Your Test Strategy Current?

Decision close
The decision in front of you

If you can write and maintain comprehensive E2E tests for near-zero labor cost, what is stopping you from doing it?

Not what is philosophically stopping you. What is actually stopping you — right now — in your specific system, with your specific teams, covering your most dangerous domain seams?

If the answer is "nothing," that's an operations gap. If the answer is "we haven't tried," that's a leadership gap. If the answer is "we don't know if the tools can do it," that's a 30-day experiment.