CxO + VP Engineering briefing

Slide 01

The Testing Pyramid Was Never a Technical Ideal. It Was a Budget Document.

CxO + VP Engineering + Board
Core thesis

The shape of the Testing Pyramid is the shape of human labor costs. Agents removed that constraint. The pyramid becomes a square.

For fifteen years, the pyramid told you to write fewer end-to-end tests because humans could not maintain them at scale. That was the right call given the constraint. The constraint no longer exists. If your testing strategy still looks like a pyramid in 2026, you are carrying risk you do not need to carry.

Decision: Equal investment across every test type. Maximum risk buydown. No excuses.

Slide 02

The Pyramid Told You How to Allocate a Scarce Resource. The Scarce Resource Was People.

Market signal
Unit tests: Cheap

One engineer writes them alongside the code. They run in milliseconds. They break cleanly. A human can maintain hundreds without losing their mind.

Integration tests: More costly

Test environments. Data setup. Cross-service coordination. When they break, a human has to investigate whether the fault is yours or someone else's. That takes time.

E2E tests: Expensive

Full environments, synthetic users, browser automation. One flaky E2E test burns a day of engineering time. Ten burn a sprint.

"Why does the pyramid look like a pyramid?" Because of what things cost. The entire ratio of unit to integration to E2E tests is a human-capital ratio.

That is not a testing strategy. That is a budget allocation model disguised as engineering wisdom.

Slide 03

E2E Tests Are Not the Worst Tests. They Are the Best Tests. We Just Could Not Afford Them.

The suppressed truth
What E2E tests catch

Integration failures. Regression bugs. Data corruption. Race conditions. UI inconsistencies. The behavioral drift that happens when twelve teams ship to the same product surface over eighteen months.

E2E tests most closely mirror what your users actually do. If you could write one for every permutation, every user path, every edge case, you would have near-perfect software.

The constraint

A team of five QA engineers can maintain maybe two hundred E2E tests before the maintenance burden eats their capacity to write new ones.

That is not a technical limitation. That is a financial one. The pyramid exists because human beings are expensive and E2E test maintenance is a human-capital sinkhole.

Key number: ~200 E2E tests, the practical ceiling for a five-person QA team before maintenance consumes all capacity.

Slide 04

One Car Loan. Six Domains of Complexity. The Seams Between Them Are Where Production Breaks.

Concrete example
01

Account creation

Auth, KYC, identity verification, 2FA, session management, password reset, lockout policies. A full domain before anyone applies for anything.

02

Application + decisioning

Income, employment, document upload, automated underwriting. Approved, denied, or conditional. Each branch is its own workflow with its own edge cases.

03

36 monthly payments

ACH pulls, late fees, grace periods, principal-vs-interest application, extra payment handling. One missed payment triggers a cascade of business rules.

04

Reporting + payoff

Monthly statements, 1098 tax docs, credit bureau reporting, lien release, e-title transfer. Each with its own regulatory requirements and delivery mechanism.

Reality: Nobody writes an E2E test that walks a synthetic customer from account creation through 36 months of payments to e-title delivery. So the seams between domains are where your production incidents live.
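A minimal sketch of the payment leg of such a lifecycle walk, assuming a simple fixed-rate amortizing loan held in memory. The loan amount, the rate, and the names `monthly_payment` and `walk_lifecycle` are hypothetical; a real suite would drive the actual services instead of this model.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_payment(principal: Decimal, annual_rate: Decimal, months: int) -> Decimal:
    """Fixed-rate amortization payment, rounded to the cent."""
    r = annual_rate / 12
    growth = (1 + r) ** months
    return (principal * r * growth / (growth - 1)).quantize(CENT, ROUND_HALF_UP)


def walk_lifecycle(principal: Decimal, annual_rate: Decimal, months: int) -> list[dict]:
    """Walk one synthetic borrower from first payment to payoff."""
    balance = principal
    payment = monthly_payment(principal, annual_rate, months)
    ledger = []
    for month in range(1, months + 1):
        interest = (balance * annual_rate / 12).quantize(CENT, ROUND_HALF_UP)
        # The final payment absorbs rounding residue so the balance lands on zero.
        due = balance + interest if month == months else payment
        balance -= due - interest
        ledger.append({"month": month, "paid": due, "interest": interest, "balance": balance})
    return ledger


# Hypothetical borrower: $25,000 at 6% over 36 months.
ledger = walk_lifecycle(Decimal("25000.00"), Decimal("0.06"), 36)
# Downstream seams (lien release, e-title) should only fire once this is true.
paid_off = ledger[-1]["balance"] == 0
```

The value of the walk is in the assertions at each seam: the ledger entries produced here are what statements, 1098 reporting, and lien release would each be checked against.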

Slide 05

The Contract Between Services Is a Handshake Agreement. Someone Documented It in Confluence Eight Months Ago. It Has Drifted Twice.

Risk exposure

What the pyramid gives you

  • Good unit tests per domain
  • Decent integration tests per domain
  • A handful of E2E tests per domain
  • Mocked boundaries everywhere

What the pyramid hides

  • Contract drift between payment and loan ledger
  • Data shape mismatches between decisioning and payments
  • Integration between payment completion and e-title generation tested manually once a quarter
  • A prayer that the seams hold under real traffic
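Contract drift of this kind can be checked mechanically. The sketch below assumes the loan ledger declares the fields and types it expects from a payment event; the field names and the `contract_violations` helper are hypothetical, not taken from any real service.

```python
# What the (hypothetical) loan ledger expects from a payment event.
EXPECTED_PAYMENT_EVENT = {
    "loan_id": str,
    "amount_cents": int,
    "applied_on": str,  # ISO date string
}


def contract_violations(payload: dict, expected: dict) -> list[str]:
    """Return a human-readable list of ways the payload breaks the contract."""
    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


# The payment service as documented eight months ago.
documented = {"loan_id": "L-1", "amount_cents": 76055, "applied_on": "2026-01-01"}
# The payment service after an undocumented rename to dollars.
drifted = {"loan_id": "L-1", "amount": 760.55, "applied_on": "2026-01-01"}

documented_problems = contract_violations(documented, EXPECTED_PAYMENT_EVENT)
drifted_problems = contract_violations(drifted, EXPECTED_PAYMENT_EVENT)
```

Run on every commit, a check like this turns the handshake agreement into a failing build the first time the shape changes.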

Slide 06

If You Take the Human Cost Out of Test Maintenance, Why Would You Keep the Same Ratio?

The shift
What agents make near-zero

Writing tests. Maintaining tests. Generating synthetic data. Updating tests when code changes. Regenerating fixtures when schemas shift.

The part where a human opens a test file, reads the failure, traces it to a code change three PRs ago, updates the selector, regenerates the fixture, reruns the suite, watches it fail again for a different reason, fixes that too, and commits the whole mess? That part is gone.

What still requires humans

Deciding what the tests should assert. Whether a passing test actually means what you think it means. Judgment about risk and correctness.

You still pay for compute. You still need engineers who understand what to test and why. That judgment is not going anywhere. But the manual labor of test maintenance is no longer the binding constraint.

Question: If you could have any ratio of tests you wanted and the cost of maintaining each type was roughly equivalent, what shape would your testing strategy be?

Slide 07

The Testing Square: Equal Investment Across Every Test Type. Maximum Risk Buydown.

The new model
Unit: Full depth

Agents generate and maintain the complete unit suite for every service. Same as before, but with fully automated regeneration on every code change.

Integration: Full depth

Agents maintain cross-service integration tests. Payment-to-ledger. Decisioning-to-payments. No more mocked boundaries hiding contract drift.

Contract: Full depth

Agents verify that the payment API still matches what reporting expects. Every commit. Not once a quarter by a QA engineer who knows where the bodies are buried.

E2E: Full depth

Agents walk a synthetic borrower through the complete 36-month lifecycle. Account creation to e-title. A suite that would have taken a human team a week to build and a month to keep from rotting.

Performance: Full depth

Agents simulate ten thousand borrowers all making payments on the first of the month. Every release. Not just before the annual audit.
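A rough sketch of the first-of-the-month simulation, assuming an in-memory stand-in for the payment service. `InMemoryLedger`, `simulate_first_of_month`, and the payment amount are hypothetical; a real run would target a deployed environment and compare the measured latency against an agreed budget.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryLedger:
    """Stand-in for the real payment service (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.total_cents = 0
        self.count = 0

    def post_payment(self, loan_id: int, amount_cents: int) -> float:
        """Record one payment and return how long the call took, in seconds."""
        start = time.perf_counter()
        with self._lock:  # guards against lost updates under concurrency
            self.total_cents += amount_cents
            self.count += 1
        return time.perf_counter() - start


def simulate_first_of_month(ledger: InMemoryLedger, borrowers: int,
                            amount_cents: int, workers: int = 32) -> float:
    """Post one payment per borrower concurrently; return the p95 latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(
            pool.map(lambda loan_id: ledger.post_payment(loan_id, amount_cents),
                     range(borrowers))
        )
    return latencies[int(0.95 * (len(latencies) - 1))]


ledger = InMemoryLedger()
p95_seconds = simulate_first_of_month(ledger, borrowers=10_000, amount_cents=76055)
```

The sketch checks two things a release gate would care about: that no payment is lost under concurrent load, and what the tail latency looks like while it happens.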

Slide 08

Two Lines of Code. In the Pyramid, Unit Tests Pass and Everyone Moves On. In the Square, Everything Runs Before the PR Is Reviewed.

Operating difference

Pyramid: rounding fix in extra-payment logic

  • Unit tests pass
  • PR approved
  • Merged to main
  • Year-end 1098 shows wrong interest for six months
  • Compliance finding

Square: same rounding fix

  • Agents regenerate integration tests
  • Contract tests verify the API still matches reporting
  • E2E suite confirms the fix does not break the year-end 1098
  • Performance suite checks for slow queries
  • All before the PR is reviewed
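To make the failure mode concrete, here is a sketch under one assumed cause: the fix changes how the ledger rounds monthly interest, while reporting still recomputes interest with the old rounding. The balance, rate, payment, and function names are hypothetical, and the deck does not say this is the actual mechanism.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENT = Decimal("0.01")


def year_of_interest(balance: Decimal, annual_rate: Decimal,
                     payment: Decimal, rounding: str) -> list[Decimal]:
    """Post 12 payments and return the interest charged in each month."""
    charged = []
    for _ in range(12):
        interest = (balance * annual_rate / 12).quantize(CENT, rounding)
        balance -= payment - interest
        charged.append(interest)
    return charged


def reconcile_1098(ledger_interest: list[Decimal], reported_total: Decimal) -> bool:
    """E2E-style check: the 1098 figure must equal what the ledger charged."""
    return sum(ledger_interest) == reported_total


loan = (Decimal("25000.00"), Decimal("0.0599"), Decimal("760.44"))
after_fix = year_of_interest(*loan, ROUND_HALF_UP)  # ledger, new rounding
reporting = year_of_interest(*loan, ROUND_DOWN)     # reporting, old rounding

# Unit-level view: no single month is off by more than a cent.
within_a_cent = all(abs(a - b) <= CENT for a, b in zip(after_fix, reporting))
# Year-end view: the 1098 total no longer matches the ledger.
reconciles = reconcile_1098(after_fix, sum(reporting))
```

A per-month tolerance check passes while the year-end reconciliation fails, which is the gap between the pyramid column and the square column above.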

Slide 09

The Test Doubles Debate Was Never About Philosophy. It Was About What You Could Afford to Test Against.

Test architecture
The two camps

Mock everything for isolation. Or mock nothing because mocks hide the bugs that matter most. Both camps are right about what they worry about. Both are wrong about the tradeoff being permanent.

If you mock the database, you never find out your ORM generates a query that locks the table for thirty seconds under load. If you stub the payment processor, you never discover their API changed the error response shape and your retry logic now swallows failures silently.
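The payment-processor case can be sketched in a few lines. Both response shapes below are invented for illustration, as are `should_retry`, `charge_with_retry`, and `make_processor`; no real processor's API is implied.

```python
def should_retry(response: dict) -> bool:
    """Retry rule written against the OLD error shape: {"error": {"retryable": bool}}."""
    return bool(response.get("error", {}).get("retryable", False))


def charge_with_retry(processor, attempts: int = 3) -> str:
    """Call the processor until it succeeds, retries run out, or the error looks final."""
    for _ in range(attempts):
        response = processor()
        if response.get("status") == "ok":
            return "charged"
        if not should_retry(response):
            # Unrecognized errors are treated as final, with no log and no alert.
            break
    return "gave_up"


def make_processor(responses):
    """Build a fake processor that replays the given responses in order."""
    replay = iter(responses)
    return lambda: next(replay)


# The stub still speaks the old shape, so the test stays green.
stubbed = make_processor([
    {"status": "failed", "error": {"retryable": True}},
    {"status": "ok"},
])
# The real service now returns a list of errors under a different key.
realistic = make_processor([
    {"status": "failed", "errors": [{"code": "rate_limited", "retryable": True}]},
    {"status": "ok"},
])

stub_result = charge_with_retry(stubbed)
real_result = charge_with_retry(realistic)
```

Against the stub the charge goes through on the second attempt. Against the new shape the same code stops after one attempt on an error that was retryable, and nothing in the stubbed suite would ever show it.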

Agents end the war

Agents build and maintain code for 100% testability. They spin up test environments, generate synthetic data, wire real services together, and run the full integration.

You do not have to choose between isolation and realism anymore. Mocks where isolation genuinely matters (pure business logic without network noise), real integrations everywhere else. The ratio stops being a religious war and becomes an engineering decision.

Quote: "We spent six months arguing about whether to mock Kafka in our integration tests. The agents do not have opinions about mocking Kafka. They just write both versions and run them."

Slide 10

Quality Is Becoming a Property of the Build Process, Not a Department That Reviews the Build After It Is Done.

Org design
Why handoffs kill quality

Every handoff is a place where information degrades. When a developer finishes a feature and hands it to QA, context is lost. Intent is lost. The subtle reasoning behind a design choice does not survive the transition.

The QA engineer reads the acceptance criteria, builds a test plan around what was written down, and misses the thing that was never written down because the developer thought it was obvious.

Agent-driven quality

The agent that writes the code also writes the tests. It has full context. It knows why the code is structured the way it is. There is nothing to hand off.

The knowledge and the verification live in the same process. The feedback loop is measured in seconds, not days. Engineers who understand quality deeply are more valuable than ever — but they are embedded in engineering teams, shaping what agents build and how agents test.

Warning: If your professional identity is built around being the gatekeeper between engineering and production, this should concern you.

Slide 11

The Square Costs Real Money. Compared to What?

Economics
Square cost: Compute

Agent orchestration. Full test suites across every domain. Engineering time to design the test architecture — deciding what agents should assert and how domains get segmented. Real work requiring real judgment.

Pyramid cost: Risk

Production outage during month-end close. Compliance finding from wrong interest on six months of 1098s. Security incident from an unvalidated session token exposing 1,200 borrowers' data.

What you get: Trust

Every commit runs through unit, integration, contract, E2E, and performance tests at equal depth. You ship because the system told you it was safe. Not because someone said "I think we're good."

Your release cadence stops being a negotiation. It becomes a function of how much change your customers can absorb. That is a different problem entirely, and a better one to have.

A testing strategy shaped by a budget constraint from 2012 leaves you carrying risk right now. The square is dramatically cheaper than that risk.

Slide 12

Fowler Was Not Wrong. Cohn Was Not Wrong. They Were Constrained. The Constraints Changed.

Context
Credit where due

The Testing Pyramid was serious work. The engineers who built testing strategies around it for fifteen years were not wrong. They were designing for the world they were in.

Given the reality that human engineers had to write, maintain, and debug every test by hand, the pyramid was brilliant. It told you how to allocate a scarce resource across test types with wildly different maintenance costs.

The shift

That is how engineering works. You design for the world you are in, and when the world shifts, the design has to shift with it.

Human capital was the constraint. Agents removed it. The pyramid becomes a square.

Prerequisite: If your quality organization is still writing your tests in 2026, fix that first. Developers own tests. Period. Quality is part of the role. If you have not made that transition, the square is not your problem. Your org design is.

Slide 13

If Your Testing Strategy Still Looks Like a Pyramid in 2026, What Exactly Are You Optimizing For?

Decision close
The decision

You are making a choice — whether you realize it or not — to under-invest in the test types that catch the most dangerous bugs.

Integration failures. Contract drift. End-to-end workflow breaks. Performance degradation under load. These are the defects that take down production, trigger compliance reviews, and erode customer trust. And you are deliberately writing fewer tests for them because fifteen years ago someone told you they were too expensive to maintain.

They were. They are not anymore.