
Every Consultant Says They Can Fix Your Legacy App with AI — Here Is the Test


Sarah, I wrote an entire novel about a CEO who tried to fix his monolith three times. Forty-seven million dollars across three failed initiatives before he changed the approach. That was fiction. What I am about to tell you is not.

It is a Tuesday morning and your CTO just walked out of a board meeting where the legacy system came up again. It always comes up. The system that runs forty percent of revenue. The system nobody wants to touch. The system that has not had a confident deploy since the last two engineers who understood its internals left for companies that let them build things instead of babysit things.

The board wants to know why the company cannot move faster. The CTO knows the answer. The monolith. It is always the monolith.

And now someone — a vendor, a consultancy, a partner, a very polished person with a very polished deck — has walked into the room and said the words the CTO has been waiting to hear.

We can fix it. We have AI agents now. This time is different.

The CTO wants to believe them. The CTO needs to believe them. Because this is the third time.


The First Time

The first time was 2019. Maybe 2018 at your company. The initiative had a name. It always has a name. Phoenix. Horizon. Evolve. Something aspirational that looked good on the program charter and the quarterly business review.

The plan was a full rewrite. Microservices. The architecture team spent four months designing the target state. Beautiful diagrams. Clean domain boundaries. Event-driven everything. The kind of architecture that makes senior engineers nod approvingly in review meetings and makes absolutely no contact with the reality of a fifteen-year-old codebase that has nine hundred database tables and business logic buried in stored procedures that nobody has read since the developer who wrote them retired to teach high school math.

They staffed it with twenty engineers. Good engineers. They gave it a budget — call it four million over two years. Executive sponsorship. Stood it up in Jira with epics and milestones and a Gantt chart that showed the migration completing in Q3 of the following year.

By month six, they had rewritten two services. The original system had forty-seven. The two services they rewrote handled twelve percent of the traffic and required a synchronization layer back to the monolith that was more complex than the original code. The Gantt chart had been revised twice. The burn rate was ahead of plan. The team was tired.

By month fourteen, the initiative was quietly deprioritized. Not cancelled. Nobody cancels these things. They just stop getting mentioned in the QBR. The twenty engineers got reassigned to revenue-generating work. The two rewritten services stayed in production, running alongside the monolith, maintained by one person who wished they had not volunteered.

The monolith won. It always wins the first time. Because the first time, the company learns that rewriting a legacy system is not an engineering problem. It is a physics problem. And nobody accounted for the physics.


The Second Time

The second time was smarter. It is always smarter the second time. The scars from the first attempt made the leadership team cautious, specific, and more realistic about scope.

This time they did not try to rewrite. They tried to strangle. Someone had read Martin Fowler’s strangler fig pattern — or at least the blog post summary of it — and the approach made sense. Do not replace the monolith all at once. Wrap it. Intercept requests at the edge. Route new functionality to new services. Let the old system slowly lose surface area until it can be decommissioned.
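The routing half of the pattern is simple enough to sketch. A minimal illustration, with hypothetical paths and service names (real deployments put this logic in an API gateway or reverse proxy): extracted routes go to new services, and everything else falls through to the monolith untouched.

```python
# Strangler-fig routing at the edge: a sketch, not a real gateway.
# Paths and service hostnames below are illustrative assumptions.

# Routes already extracted to new services.
EXTRACTED_ROUTES = {
    "/invoices": "http://billing-service.internal",
    "/shipping/quotes": "http://shipping-service.internal",
}

# Everything not yet extracted still belongs to the old system.
MONOLITH = "http://legacy-monolith.internal"

def upstream_for(path: str) -> str:
    """Return the backend that should serve this request path."""
    for prefix, service in EXTRACTED_ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return MONOLITH
```

The routing is the easy part. The hard part, as the rest of this section argues, is guaranteeing that the new service behind the route behaves identically to the code it replaced.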

The plan was sound. The theory was right. The execution hit the same wall it always hits.

The strangler fig requires you to identify seams — the places where you can cleanly intercept a request, redirect it to a new service, and guarantee that the behavior is identical. In a well-structured codebase, seams are findable. In a fifteen-year-old monolith with nine hundred tables and business logic in stored procedures, the seams are not where the architecture diagrams say they are. They are buried. They are tangled. They are defended by side effects that nobody documented because nobody knew they were side effects at the time.

The team spent three months mapping the first extraction boundary. They thought it would take three weeks. When they finally cut the seam and routed traffic to the new service, two downstream systems broke in ways nobody predicted because the monolith was doing six things that looked like one thing from the outside.

The second attempt lasted eighteen months. It extracted four modules. The monolith still ran thirty-six. The cost was north of three million. The organizational trust in modernization dropped to zero.

The CTO who sponsored it left nine months later. Not because of the project specifically. But the project did not help.


The Third Time

Sarah, this is where your company is right now. Standing at the edge of the third attempt.

And this time, the pitch is different. This time there is new technology in the room. AI agents that can read a million lines of code, map dependency graphs, write characterization tests, identify extraction boundaries, generate scaffolding. The tooling is real. It is powerful. It does change the economics of legacy rescue in ways that would have been absurd three years ago.

But the monolith is still the monolith. The organizational gravity is still the organizational gravity. The physics that killed the first two attempts have not been repealed because someone shipped better tooling.

The agents are an accelerant. They are not a strategy. And the difference between the third time working and the third time being the most expensive failure yet comes down to one thing.

Who is leading it.


The Person You Need Does Not Work at a Big Consultancy

The person who can lead a successful legacy rescue with AI agents does not work at a large consulting firm. They do not work at a tool vendor. They are not on a bench somewhere waiting to be staffed onto your account.

The large firms have two kinds of people and neither is what you need.

They have people who led legacy rescues before they joined the firm, before AI agents existed. These people understand the patterns. They know Michael Feathers. They know the strangler fig. They have written characterization tests against undocumented systems and lived with the consequences of getting the seams wrong. But they did all of that work by hand, with human teams, at human speed. They have not integrated agents into the discipline because the discipline predates the tooling.

And they have people who are experts in AI agents. They can configure them, chain them, orchestrate multi-step code generation workflows. They demo beautifully. But they have never stood in front of a fifteen-year-old monolith with nine hundred tables and felt the weight of it. They have never had to decide, at week six, that the extraction plan is wrong and the whole approach needs to change. They have never made the call that costs a month of rework but saves the project.

You need the person who lives in both worlds. Legacy rescue practitioner and AI-native builder. That person is rare because legacy rescue was a specialized discipline practiced by a small number of people, and AI-native development is new. The overlap is tiny.

The people in that overlap tend to work at small firms. Or on their own. They do not have a slide deck. They do not have a logo you recognize. They have a GitHub history and a track record and opinions about seam identification that they will share with you over coffee because they care about the problem more than the pitch.

They can name their price right now. And they should. Because what they can do — lead a legacy rescue with AI agents, in months instead of years, at a fraction of the historical cost — nobody at Accenture or Deloitte or your tool vendor’s professional services arm can do yet.


The Ninety-Second Test

So how do you tell? How do you know whether the person sitting across from you can actually lead this work?

Three names. Ninety seconds.

Say “Feathers.”

If they say Michael — as in Michael Feathers, who wrote Working Effectively with Legacy Code — you are in a real conversation. That book defined characterization testing and seam identification for an entire generation of engineers. If the person who is about to lead your third attempt at fixing the monolith does not know who Michael Feathers is, they have not done this work before. Walk them out politely. You cannot afford a fourth time.

Say “Martin.”

If they say Fowler — as in Martin Fowler, who gave us the refactoring catalog and the strangler fig pattern that your second attempt was built on — that is a second good signal. But push further. Ask them when the strangler fig fails. Ask them what they do instead. Because the strangler fig does fail — it fails when the seams are too tangled to intercept cleanly, when the side effects cross too many boundaries, when the cost of maintaining the proxy exceeds the cost of the thing it is replacing. A practitioner knows this. A presenter does not.

Say “cyclomatic complexity.”

Watch their face. If there is a pause. If they redirect to “code quality metrics” in the abstract. If they reach for a tool name instead of an explanation — you have your answer. Cyclomatic complexity is how a legacy rescue practitioner decides where to cut. It is how they tell the difference between a module that is too tangled to extract and one that is clean enough to start with. If they cannot explain it the way a chef explains knife technique — from muscle memory, without thinking about it — they are not a practitioner.
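The explanation you want to hear is not complicated. Cyclomatic complexity is, roughly, one path through a function plus one for every decision point: each branch, loop, and boolean operator adds a path. A rough sketch of the counting, using Python's standard `ast` module (production tools such as radon or lizard count more node types and handle edge cases this sketch ignores):

```python
# Rough cyclomatic complexity: 1 + number of decision points.
# A deliberate simplification; real analyzers are more thorough.
import ast

# Node types that create an extra execution path.
DECISION_NODES = (ast.If, ast.For, ast.While,
                  ast.ExceptHandler, ast.IfExp, ast.BoolOp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate complexity of a snippet of Python source."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1  # the straight-line path counts as one
```

A practitioner uses numbers like this the way the article describes: a module scoring in the single digits is a candidate for early extraction; one in the hundreds is a trap you approach last, with characterization tests already in place.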

Three names. Ninety seconds. It has never been wrong.


Why the Third Time Will Fail the Same Way — Unless You Change the Physics

Sarah, your company is about to make the same structural mistake for the third time. Better tooling will not save you from it.

The pitch you are going to hear will sound so reasonable that your steering committee will approve it unanimously.

“We will embed with your teams. We will refactor the monolith within the constraints of your existing organization. We will coach your engineers along the way so that when we leave, your people own the new architecture and have the skills to maintain it.”

That has never worked. Not the first time. Not the second time. Not with AI agents. Refactoring a legacy system inside the organization that produced it, while simultaneously upskilling the teams that maintain it, does not produce the outcome you are paying for.

The reason is organizational gravity.

Your organization is a gravitational field. Your sprint cadence, your planning rituals, your approval chains, your deployment process, your incentive structures — all of it pulls every modernization effort back toward the center of mass. The center of mass is the monolith. It is always the monolith.

Your engineers will default to the patterns the organization rewards. Not because they cannot learn new ones — because the system penalizes deviation. The engineer who follows the existing process gets a clean review. The engineer who tries the new extraction pattern has to explain it to three people, get an exception approved, and defend it in a retro when something breaks. People optimize for the path of least resistance. Your org chart defines that path.

Your planning process will chop the extraction into two-week sprints because that is the only shape your system knows. Legacy extraction does not fit that shape. The seams are where the seams are, not where your Jira board wants them. The planning process will force the work into familiar containers, and the extraction will die inside those containers the same way it died the last two times.

This is physics. Not willpower. Not talent. Not budget.


The Third Time Works Only in Isolation

The person who can actually lead this work — the one who passed the ninety-second test, the one from the small firm with no slide deck — will tell you something your organization does not want to hear.

The work has to happen outside your existing structure.

Not in secret. Not without your people involved. But structurally isolated from the gravitational field that killed the first two attempts. Separate governance. Separate deployment pipeline. Separate decision-making authority. No sprint planning. No architecture review board. No change advisory board. No four-signature approval chain before a deploy.

A small team. Three people. Maybe four. Working with AI agents the way a surgeon works with instruments — the practitioner makes the judgment calls, the tooling executes them faster. They write characterization tests against undocumented modules at a pace that would have taken a human team months. They map dependency graphs across the full codebase in hours instead of quarters. They find the real seams — not the ones on the architecture diagram from 2019 — and they make extraction calls based on what the code actually does, not what someone thought it did.
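The dependency mapping in that paragraph is exactly the kind of mechanical work agents accelerate. A toy sketch of one slice of it, assuming a Python codebase (the module name and source text below are illustrative, not from any real system): extract the import edges from one file, which, repeated across a repository, yields the module-level dependency graph.

```python
# One slice of dependency mapping: which modules does this file
# import? Repeated over every file, this builds the module graph.
import ast

def import_edges(module_name: str, source: str) -> set:
    """Return (module_name, imported_module) edges found in source."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges.add((module_name, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.add((module_name, node.module))
    return edges
```

The graph is the easy output. The judgment calls the article describes, which edges are load-bearing and which seams are traps, are what the practitioner adds on top of it.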

Module by module. Seam by seam. With the kind of judgment that only comes from having done this before and having been wrong before and knowing the difference between a clean cut and a cut that bleeds for six months.

And then — only then — they bring your teams in. Not to learn while the house is being rebuilt. To move into a house that is standing. Your engineers learn the new patterns by working in a codebase built with those patterns. They own the new system because the new system was built to be owned. The transfer happens on a timeline that makes sense, into an architecture that is ready to receive them.

Refactor first. In isolation. Transfer ownership second.

That is the only sequence that has ever worked. The first two attempts at your company failed because nobody told you that. Or someone told you, and the steering committee chose the version that sounded less risky and was actually more risky.


What the Third Time Looks Like When It Works

Week one. The practitioner reads the codebase. Not skims it. Reads it. Agents map every dependency, every call chain, every stored procedure, every table relationship. The practitioner looks at that map and starts making decisions no agent can make — which modules are load-bearing, which ones are dead weight dressed up as critical, which seams are real and which ones are traps.

Week four. The first extraction is in production. Not a prototype. A real module, cleanly separated, running alongside the monolith with a strangler proxy routing traffic. Hundreds of characterization tests — generated by agents, validated by the practitioner — guarantee behavioral equivalence. The monolith does not know the module left. The customers do not know. Nothing broke. That is the point.
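A characterization test, in Michael Feathers's sense, asserts what the code does today, not what a spec says it should do. A minimal sketch, where `legacy_discount` is a hypothetical stand-in for a legacy routine and the expected values are treated as captured from the running system, quirks included:

```python
# Stand-in for a legacy routine; the thresholds and rounding are
# illustrative, not from any real system.
def legacy_discount(order_total):
    if order_total >= 100:
        return round(order_total * 0.9, 2)
    if order_total >= 50:
        return round(order_total * 0.95, 2)
    return order_total  # no discount below 50

def test_discount_characterization():
    # Pin the observed behavior, boundary quirks and all. These
    # assertions guard the extraction: the new service must match
    # them exactly before traffic is routed to it.
    assert legacy_discount(100) == 90.0
    assert legacy_discount(99.99) == 94.99
    assert legacy_discount(50) == 47.5
    assert legacy_discount(49.99) == 49.99
```

At extraction time the same assertions run against the new service, which is what "behavioral equivalence" means in practice: not that the new code is correct, but that it is exactly as correct, and exactly as wrong, as the old code.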

Week eight. Three more modules out. The team has a rhythm. Agents are writing characterization tests faster than the practitioner can review them, which means the practitioner’s job has shifted from writing tests to making architectural decisions. Judgment, not labor. That is the correct allocation of a human being in 2026.

Week sixteen. The monolith has lost forty percent of its surface area. The remaining sixty percent is cleaner because the tangled cross-module dependencies are gone. Your engineers — the ones who have been watching from the other side of the isolation boundary — are starting to work in the extracted modules. They are learning the new patterns not from a training deck but from production code.

Week twenty-four. The monolith is either gone or small enough that maintaining it is a manageable cost rather than an existential risk. Your engineers own the new system. The practitioner writes a transition document, shakes hands, and moves to the next company standing at the edge of their third attempt.

Total cost — a fraction of what the first two attempts burned. Total time — months, not years. Organizational disruption — close to nothing, because the work happened in isolation and the transfer happened deliberately.


Why I Am Telling You This Now

Sarah, the people who can do this work are a small population. Very small. And every CTO in every company with a fifteen-year-old monolith is about to start looking for them at the same time. Because the board meeting where the monolith comes up again is happening everywhere, not just at your company.

These practitioners can name their price. What they offer — a completed legacy rescue in months, led by someone who has done it before, accelerated by tooling that did not exist two years ago — you cannot buy that from your existing vendors. You cannot buy it from the large consultancies. You cannot buy it from the tool companies. You can buy it from a small number of people who happen to have spent their careers in exactly the right discipline at exactly the right moment.

The window is open now. It will not stay open. As more practitioners cross from legacy rescue into AI-native tooling, supply will increase and pricing will normalize. But in March of 2026, supply is near zero and demand is about to spike.

Find them before your competitors do. Test them. Feathers. Martin. Cyclomatic complexity. Ninety seconds.

And when you find the right one, do not make the mistake your company made the first two times. Do not embed them in your org chart. Do not ask them to coach while they cut. Do not subject them to your sprint planning and your architecture review board and the gravitational field that killed the last two initiatives.

Give them isolation. Give them authority. Give them a small team and the tooling and the room to work at the speed the problem demands.

The monolith does not have to win this time.

But it will if you hire the same way you hired last time.
