Slide 01
Your teams spend $343K/year on a ceremony where 85% of the output is about style, naming, and social norms. Your production incidents come from the gaps the ceremony does not cover. The answer is not better reviewers. It is verification gates that prove the product works.
Slide 02
Senior walks junior through the change. Explains trade-offs. Everyone learns. Valuable. But that is mentoring, not quality assurance.
Second pair of eyes before production. Also valuable. But only if the reviewer reads every line, traces logic paths, and checks edge cases. Almost nobody does.
"We reviewed 100% of PRs this quarter." On a slide deck nobody in the room believes. Measuring comment volume without quality filters. A SQL injection flag and a semicolon suggestion count the same.
Ask ten engineering leaders what code review is and you will get ten answers. That is not a practice. That is a phrase everyone uses to describe something different.
SAD MF (Scaled Agile DevOps Maturity Framework) — sadmf.com
Slide 03
"LGTM" means: I do not understand this area of the code well enough to say anything useful, but I am not going to admit that. So I will approve it and hope the tests catch whatever I missed.
That is not rigor. That is a rain dance.
SmartBear data. Eleven minutes for a change that took eight hours to build.
Microsoft research: reviewers spend most time on style and formatting. Things a linter handles in milliseconds.
Slide 04
Conservative estimate for a 200-person engineering org.
14.7 hours of senior engineer time per day at $90/hr loaded cost.
Spent on a process where 85% of the output is about style, naming, and social norms. Work a linter does for free.
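For reference, the arithmetic behind the headline figure (assuming 260 working days a year):

```python
# Where the $343K figure comes from (260 workdays/year is an assumption):
hours_per_day = 14.7   # senior review time across the org, per day
loaded_rate = 90       # $/hour loaded cost
workdays = 260
annual_cost = hours_per_day * loaded_rate * workdays
print(f"${annual_cost:,.0f}")  # -> roughly $343K/year
```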
The incidents that reach production are the ones that pass review and pass tests. Semantic bugs. Integration failures. Edge cases nobody thought to check. The things eleven-minute reviews were never going to find.
You are spending $343K on ceremony, and your incidents come from the gaps the ceremony does not cover.
Slide 05
Capers Jones: structured, multi-participant, documented code inspection catches ~60% of defects. That is good. That is not what your teams do.
Industry estimates for pull-request review done in eleven minutes between meetings are far lower. Your CI pipeline catches more. Your linter catches more.
Most teams track whether reviews happen. Not whether they produce value. That is like tracking whether your team wore helmets, not whether they scored.
Slide 06
Sensor networks for the US Army. Chemical weapon leak detection on a base where the team sat. If the software missed a reading or threw a false negative, the people breathing that air were us. We did not LGTM that code. We verified it. We validated it. We ran it.
Babies in the NICU whose bodies could not tolerate a rounding error. The difference between a therapeutic dose and a lethal dose for a two-pound infant is measured in micrograms. We did not skim that code in eleven minutes between meetings. We proved it was correct.
Slide 07
An AI reviewer reads every line, traces logic, checks edge cases, flags security issues. It does what your best reviewer does, but on every single PR without exception.
It does not know your state machine has an undocumented transition three customers depend on. It does not know the function was written to work around a vendor API bug from 2019.
Slide 08
Your billing service has a race condition when two invoices close in the same millisecond. No training data covers that. No generic review catches it. Only someone who knows your system or a test that exercises that exact path.
Your state machine has a transition that three customers depend on. It is not in the docs. It is not in the tests. It is in one engineer's head. A reviewer, human or AI, cannot catch what is not documented.
The function was written to work around a vendor API bug from 2019. A reviewer sees "ugly code" and refactors it. The workaround disappears. The bug returns. You build the thing, then you check the thing. The defect already exists.
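A cheap defense is a regression test that pins the workaround, so the "ugly code" cannot be refactored away silently. A sketch with invented names and an invented payload shape:

```python
# Hypothetical regression test: pins a 2019-era vendor workaround so a
# well-meaning refactor cannot silently remove it. All names and the
# payload shape are illustrative, not from any real API.

def fetch_rate(vendor_response: dict) -> float:
    """Vendor API bug (2019): sometimes returns the rate as a string
    with a trailing '%'. The 'ugly' parsing below IS the workaround."""
    raw = vendor_response["rate"]
    if isinstance(raw, str) and raw.endswith("%"):
        raw = raw[:-1]
    return float(raw) / 100.0

def test_workaround_survives_vendor_bug():
    # The exact malformed payload the vendor still sends.
    assert fetch_rate({"rate": "2.5%"}) == 0.025

def test_normal_payload_still_works():
    assert fetch_rate({"rate": 2.5}) == 0.025

test_workaround_survives_vendor_bug()
test_normal_payload_still_works()
```

Now the workaround is documented where it matters: in a test that fails the build if anyone "cleans it up."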
Slide 09
Quality is built in by the system, not inspected in by a reviewer. Deming said it about manufacturing fifty years before Farley and Humble translated it into software.
Slide 10
Every commit triggers a build. The artifact is identical to what runs in production. No manual packaging. No "it works on my machine."
Unit, integration, contract, end-to-end. Not "run the fast ones." All of them. Every time. The pipeline does not get tired on Fridays.
Automated security scanning. Dependency checks. Compliance rules encoded as code. Not a checklist someone fills out quarterly.
Critical-path latency. Memory usage. Throughput. Regressions caught before merge, not after customers complain.
Confirm the deployment can roll back cleanly. If you cannot undo it, you should not ship it.
The artifact that passes all gates is the artifact that deploys. No rebuild. No re-package. What was tested is what ships.
Slide 11
An AI gate that understands your domain model validates that a pricing change does not create negative-margin scenarios across your product catalog. Not pattern matching. System-aware verification.
An AI gate that has ingested your API contracts verifies that a schema change does not break downstream consumers in ways a static type checker cannot see.
An AI gate that knows your compliance requirements flags a data retention change that would put you out of HIPAA compliance before it ever reaches a human screen.
One is informed by training data. The other is informed by your system's actual requirements. That is the difference between AI reviewing code and AI verifying the product is correct.
AI as pipeline participant, not as diff reader
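What "pipeline participant" could look like, as a hypothetical sketch: the gate is fed your system's own artifacts (contracts, invariants, compliance rules) alongside the change, and blocks the merge on a violation. `ask_model` is a stub standing in for whatever model API you actually use; the compliance rule and field names are invented.

```python
# Hypothetical AI gate as a pipeline participant, not a diff reader.
# Everything here is a sketch: ask_model stubs a real model call, and
# the retention rule / field names are invented for illustration.

def ask_model(prompt: str) -> str:
    # Stub: a real gate would send the prompt to an LLM and parse the verdict.
    return "FAIL: retention below HIPAA minimum" if "retention_days: 30" in prompt else "PASS"

def ai_gate(change: str, system_context: list[str]) -> bool:
    """Feed the model the system's own requirements, not just the diff."""
    prompt = "\n".join(system_context) + "\nProposed change:\n" + change
    verdict = ask_model(prompt)
    if verdict.startswith("FAIL"):
        print(verdict)
        return False
    return True

context = ["Compliance rule: audit records retained at least 6 years (HIPAA)"]
assert ai_gate("retention_days: 2190", context) is True   # compliant change passes
assert ai_gate("retention_days: 30", context) is False    # violation blocks the merge
```

The structure is the point: the gate's input includes your requirements, so its verdict is about your system, not about generic code patterns.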
Slide 12
Identify the paths behind your recent incidents, then have your agents write the tests that cover them. Measure change failure rate before and after. You should see movement within 90 days.
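Change failure rate is just failed deployments over total deployments. A minimal sketch, with the deploy-record fields as assumptions:

```python
def change_failure_rate(deploys: list[dict]) -> float:
    """Fraction of deployments that caused an incident or rollback.
    Track this before and after adding pipeline gates; the ceremony
    argument ends when this number moves."""
    failed = sum(1 for d in deploys if d["caused_incident"])
    return failed / len(deploys)

# Illustrative quarter of deploy records (field name is an assumption).
quarter = [{"caused_incident": i < 3} for i in range(20)]  # 3 failures / 20 deploys
print(round(change_failure_rate(quarter), 2))  # 0.15
```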
Contract tests between services. Performance benchmarks for critical paths. Start tracking what your pipeline catches that your reviewers did not.
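A consumer-driven contract test can be as small as pinning the response shape the consumer depends on, so a producer schema change fails in CI instead of in production. This sketch invents the service's field names:

```python
# Hedged sketch of a consumer-driven contract test. Field names are
# invented; real setups often use a tool like Pact, but the idea fits
# in a dozen lines: the consumer pins what it reads.

EXPECTED_CONTRACT = {
    "invoice_id": str,
    "amount_cents": int,
    "currency": str,
}

def check_contract(response: dict) -> list[str]:
    """Return a list of contract violations (empty means compatible)."""
    errors = []
    for field, ftype in EXPECTED_CONTRACT.items():
        if field not in response:
            errors.append(f"missing field: {field}")
        elif not isinstance(response[field], ftype):
            errors.append(f"wrong type for {field}")
    return errors

good = {"invoice_id": "inv_42", "amount_cents": 1999, "currency": "USD"}
bad = {"invoice_id": "inv_42", "amount_cents": "19.99"}  # type drift, missing field
assert check_contract(good) == []
assert check_contract(bad) == ["wrong type for amount_cents", "missing field: currency"]
```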
When the pipeline catches more defects than reviewers, treat code review as knowledge sharing. Not quality gate. Reviews become about design, mentoring, shared understanding. The pipeline finds the bugs.
Slide 13
Stop measuring whether code reviews happen. Start measuring whether your pipeline catches defects before production. If your metric is "PR approval rate," you are measuring ceremony.
Take your recent production incidents. For each one, ask two questions: would our review process have caught this? Would our pipeline have caught this? If neither, that is a gap in your validation gates, not a reason to add more reviewers.
If your payment processing has three edge cases that caused incidents, the fix is not a more careful reviewer. The fix is three tests that make those edge cases impossible to ship.
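Illustratively, with a hypothetical `charge` function, the three incidents become three permanent tests:

```python
# Illustrative only: three tests that pin three (hypothetical) edge cases
# behind past payment incidents, making them impossible to re-ship.

def charge(amount_cents: int, currency: str = "USD") -> dict:
    """Minimal stand-in for a payment function with the fixes applied."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if currency not in {"USD", "EUR"}:
        raise ValueError("unsupported currency")
    if amount_cents > 100_000_000:
        raise ValueError("amount exceeds per-charge limit")
    return {"status": "charged", "amount_cents": amount_cents}

def expect_error(fn) -> bool:
    try:
        fn()
        return False
    except ValueError:
        return True

# Incident 1: zero-amount charge created phantom invoices.
assert expect_error(lambda: charge(0))
# Incident 2: unsupported currency was silently charged as USD.
assert expect_error(lambda: charge(500, "JPY"))
# Incident 3: no upper bound let a fat-fingered amount through.
assert expect_error(lambda: charge(200_000_000))
```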
Read Farley and Humble's Continuous Delivery. Read it again if you read it in 2010. I promise you did not implement half of it.
The data has been clear for years: deployment frequency correlates with lower change failure rates.
Slide 14
Code review has value as knowledge sharing, as mentoring, as shared understanding. That value is real. But it was never quality assurance. It was a ritual performed because it felt rigorous.