The night before your internal QBR, you dream you are already in the room. Not the board meeting. The internal one. The rehearsal before the rehearsal, where the slides are still editable and finance is deciding which line items deserve the red box.
In the dream, every slide is an invoice.
The consulting partner invoice is seven pages long. Page one says “strategic delivery acceleration.” Page seven has change order number fourteen. Nobody flinches. The Scrum and agile coaching layer floats to the front with words like predictability, alignment, maturity, operating cadence, and continuous improvement. Nobody asks what it returned to EBITDA.
Four consultants from four different firms are standing by the whiteboard again. One for each year you hired someone to finally figure out the economics of software delivery. They taught your leadership team the basics of measuring software ROI: value-stream maps, cost of delay, the laminated one-page model, and the breakout exercise where every table connected delivery work to business value.
Everyone nodded in the dream exactly the way they nodded in the real workshop. Your leadership team never caught on.
That is why you are doing this at all.
Not because tokens are magic. Because the Total Cost of Ownership (TCO) of creating software in your organization is still unknown. You spent four years and four consulting firms trying to get that number. They did not deliver it. The irony should bother everyone in the room.
The delivery-management layer gets called connective tissue. The release train gets called coordination. Quarterly planning gets called necessary. The Jira hygiene initiative gets called discipline. The transformation office gets called governance. The offshore pod gets called capacity. The staff augmentation contract gets called flexibility. Every slide passes.
Then the token invoice appears. One line in the cloud report: $118,000.
The room wakes up inside the dream. Suddenly everyone has fiscal responsibility.
That is the part that stays with you when you actually wake up. Not the number. The selectivity. The company can spend seven figures on consulting partners, offshore capacity, staff augmentation, agile coaching, delivery management, and planning ceremonies without one CFO-ready sentence about EBITDA.
But tokens get the emergency meeting.
Apparently fiscal responsibility has a trigger word, and the word is tokens.
That is the sentence a middle-layer director wants to say out loud:
“So it was okay to waste tons of money with bad consulting partners, but tokens are too much money?”
Do not say it that way in the meeting. It will feel good for four seconds, and then the CFO will ask for the numbers.
Bring them.
Bring every capacity invoice and one denominator finance can recognize: accepted production outcomes.
The ask is not permission to spend recklessly on tokens. The ask is permission to measure the value stream, find the true cost of creating software, and connect that cost back to EBITDA. If software delivery is supposed to increase revenue, reduce expense, protect margin, or lower risk, the production system that creates software needs an economic model.
Right now most companies have invoices, headcount plans, ceremonies, and vibes. That is not a model.
Replacing consulting and coaching dollars with token dollars is not the point by itself. Replacing unmeasured dollars with measured dollars is the point.
The CFO is right to circle the token line.
Finance sees a new variable cost growing from $42,000 a month to $118,000 a month, and finance asks whether that becomes $250,000 by Q4. That is governance doing its job.
The mistake is pretending the token line is the only place engineering capacity gets bought. Before this invoice existed, the company already bought extra capacity through offshore teams, staff augmentation, systems integrators, vendor professional services, coaching layers, delivery-management layers, release trains, quarterly planning, maturity assessments, and tooling nobody opens until the Monday before the steering committee.
Those were token bills too. They just arrived with nicer nouns.
Nobody asks the agile coach to tie their retainer to accepted production outcomes. Nobody asks the quarterly planning summit to defend its EBITDA contribution. Nobody asks the delivery-management layer how much decision latency it removed last month. Nobody asks the product operating model workshop why the roadmap still ships at the same speed six months later.
The invoice has the right cultural costume, so it passes.
Tokens do not have the costume yet.
Run the simpler test.
If a company walked in tomorrow with an IDE that cost $12,000 per engineer per year and made your engineering organization 40% faster, you would buy it.
You would not ask each engineer to justify every save, autocomplete, refactor, test run, or compile. You would not put a daily cap on how many times they could use the debugger. You would not make a senior developer explain whether this particular code search deserved the premium tier.
You would change governance to exploit it.
If a tool makes software creation materially faster, the correct response is not to meter the tool until it behaves like last year’s IDE budget. The correct response is to change the production system around the new speed: review policy, tests, release gates, security checks, architecture approval, product intake, budgeting, and measurement.
If the IDE made the team 40% faster and your governance still made every change wait twelve days for review, the IDE did not fail.
Your operating model did.
That is what is happening with tokens. The spend is being evaluated like a seat license while the capability is changing the economics of software creation.
So do not justify tokens against zero. Zero was never the baseline. Justify tokens against the capacity market your company already used.
This is the replacement-cost view I would bring to finance.
Offshore delivery pod: it promised cheaper capacity. Measure accepted work per month, rework rate, cycle time, and internal review hours.
Staff augmentation: it promised more hands quickly. Measure time to productive contribution, supervision load, and defect escape rate.
Systems integrator: it promised faster program delivery. Measure what actually shipped, change-order cost, and knowledge retained.
Vendor professional services: it promised product-specific speed. Measure implementation time, post-launch support load, and the dependency it created.
Scrum and agile coaching layer: it promised predictability and continuous improvement. Measure ceremony cost, management load, cycle time, accepted outcomes, and decision latency.
AI tokens and agents: they promise more output from people who already know the system. Measure cycle-time change, accepted outcomes, escaped defects, and cost of delay avoided.
The token invoice is an input cost. So were the offshore invoice, the consulting partner invoice, the agile coaching spend, the delivery-management layer, and the “temporary” staff augmentation contract that stayed for nineteen months because nobody wanted to admit the project still needed the people.
The question is not which input looked smallest when procurement approved it. The question is which input produced accepted work in production at the lowest total cost. The uncomfortable part is that you probably have the token number and not the accepted-outcome number for any of them.
That is why the token bill feels expensive. It is visible. The other waste got promoted into process.
Let me do the math in the ugly way, because this is the math a director can run before Thursday.
Take an offshore pod. Six engineers through a vendor at a blended $85 an hour. At 160 hours a month, that pod costs $81,600 a month before internal management load.
Now count outcomes, not hours. In the last ninety days, the pod completed twenty-four tickets. Fourteen were accepted without major rework. Six came back for material changes. Four were closed or superseded because requirements moved before the work landed.
That gives you a first-pass acceptance rate of 58%. The capacity you bought was not 960 clean engineering hours a month. It was 960 nominal hours multiplied by the rate at which those hours turned into accepted work.
Now add the cost nobody put on the vendor invoice. One senior internal engineer spent eight hours a week reviewing, explaining context, rewriting specs, and cleaning up integration issues. At $280,000 fully loaded, that engineer costs about $135 an hour. A product manager spent four hours a week clarifying tickets across time zones. The pod did not cost $81,600 a month. It cost roughly $88,000 a month before delay, rework drag, support tail, and the meetings everyone pretended were normal.
If that pod shipped two accepted production outcomes a month, you paid about $44,000 per accepted outcome. If it shipped four, you paid $22,000. If it shipped one and created a support tail, you paid far more than the hourly rate ever admitted.
Offshore is not cheap because the hourly rate is cheap. Offshore is cheap only when accepted outcomes are cheap.
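Here is a minimal sketch of that pod math in Python, for anyone who wants to rerun it with their own invoices. The vendor figures and ticket counts come from the example above. Two inputs are assumptions I added to make it runnable: 2,080 working hours a year behind the $135 figure, and a hypothetical $100 an hour for the product manager, which the example never states.

```python
# Offshore pod example. Names are illustrative, not a standard model.
WEEKS_PER_MONTH = 52 / 12

ENGINEERS = 6
VENDOR_RATE = 85              # blended dollars per hour
HOURS_PER_ENGINEER = 160      # billed hours per engineer per month

vendor_invoice = ENGINEERS * VENDOR_RATE * HOURS_PER_ENGINEER

# First-pass acceptance over the last ninety days.
completed_tickets = 24
accepted_without_rework = 14
acceptance_rate = accepted_without_rework / completed_tickets

# Nominal hours discounted by the rate at which they became accepted work.
nominal_hours = ENGINEERS * HOURS_PER_ENGINEER
effective_hours = nominal_hours * acceptance_rate

# Internal load that never appears on the vendor invoice.
SENIOR_ENGINEER_RATE = 280_000 / 2080   # ASSUMPTION: 2,080 working hours a year
PM_HOURLY_RATE = 100                    # ASSUMPTION: not stated in the example
review_cost = 8 * WEEKS_PER_MONTH * SENIOR_ENGINEER_RATE
pm_cost = 4 * WEEKS_PER_MONTH * PM_HOURLY_RATE

loaded_monthly_cost = vendor_invoice + review_cost + pm_cost


def cost_per_accepted_outcome(monthly_cost, outcomes_per_month):
    """Monthly cost divided by accepted production outcomes."""
    if outcomes_per_month <= 0:
        raise ValueError("no accepted outcomes, so cost per outcome is undefined")
    return monthly_cost / outcomes_per_month


print(f"vendor invoice:      ${vendor_invoice:,.0f}")
print(f"acceptance rate:     {acceptance_rate:.0%}")
print(f"effective hours:     {effective_hours:,.0f} of {nominal_hours}")
print(f"loaded monthly cost: ${loaded_monthly_cost:,.0f}")
for outcomes in (1, 2, 4):
    per_outcome = cost_per_accepted_outcome(loaded_monthly_cost, outcomes)
    print(f"{outcomes} accepted outcome(s): ${per_outcome:,.0f} each")
```

Swap in your own rates and counts. The point of the sketch is that the denominator is outcomes, and that it refuses to produce a number when there are none.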
Now put the token line next to it.
Your best internal team is six engineers. They already know the system. You were already paying them before the token bill appeared. The question is whether the AI spend changes output enough to justify the new variable cost.
In March, that team spent $22,000 on AI tools and inference. In April, they spent $31,000. Finance sees a 41% increase and starts circling. Good. Circle it. Then put the output next to it.
Before agents were part of the workflow, the team averaged three accepted production outcomes a month on this part of the roadmap. After the workflow changed, they averaged five. The extra two were accepted by product, deployed behind flags, monitored for thirty days, and not rolled back.
If you treat the full $31,000 of April AI spend as incremental over the pre-agent baseline, and the team produced two additional accepted outcomes, the gross incremental cost is $15,500 per additional outcome.
That number is not automatically good. It becomes good or bad when you compare it to the alternatives. If the offshore pod was effectively costing $22,000 to $44,000 per accepted outcome, and the internal AI-enabled team is producing additional accepted outcomes at $15,500 of incremental spend, the token bill is not the expensive line. It is the cheaper capacity channel.
That is before cost of delay. If one of those two additional outcomes is a pricing workflow worth $900 a day once live, and it lands twenty-one days earlier than it would have in the old system, that is $18,900 of value captured early.
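The token-side arithmetic fits in a few lines. This is a sketch using only the figures above; the function names are mine, and it deliberately keeps the cost-of-delay figure separate from the spend figure rather than netting them, since how to net them is a finance decision.

```python
def pct_change(before, after):
    """Percentage change from one period to the next."""
    return (after - before) / before * 100


def incremental_cost_per_outcome(ai_spend, baseline_outcomes, current_outcomes):
    """AI spend divided by accepted outcomes gained over the old baseline."""
    extra = current_outcomes - baseline_outcomes
    if extra <= 0:
        raise ValueError("no additional accepted outcomes to attribute spend to")
    return ai_spend / extra


def cost_of_delay_avoided(value_per_day, days_earlier):
    """Value captured by landing a change earlier than the old system would have."""
    return value_per_day * days_earlier


spend_growth = pct_change(22_000, 31_000)                   # March to April
gross_per_outcome = incremental_cost_per_outcome(31_000, 3, 5)
early_value = cost_of_delay_avoided(900, 21)                # pricing workflow

print(f"AI spend growth:              {spend_growth:.0f}%")
print(f"gross cost per extra outcome: ${gross_per_outcome:,.0f}")
print(f"value captured early:         ${early_value:,.0f}")
```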
The CFO does not need poetry. The CFO needs the denominator.
This is where middle-layer directors have an advantage. You are close enough to the work to know which tickets were fake progress, which vendor milestone was accepted because the steering committee was tired, which offshore team is good but buried under bad requirements, and which internal team quietly became faster because the senior engineer stopped hand-writing scaffolding and started reviewing generated changes against behavior.
The CFO sees invoices. You see the conversion rate.
Do not say, “AI makes developers 40% faster.” That dies in finance because it sounds like a vendor slide.
Say this:
“In the last ninety days, our offshore pod cost $264,000 including internal review load and produced seven accepted production outcomes. That is about $37,700 per accepted outcome, before cost of delay. In the same period, the internal team spent $74,000 on AI tools and produced five additional accepted outcomes over baseline. That is $14,800 of incremental AI spend per additional accepted outcome. Quality did not degrade. Escaped defects were flat. Cycle time improved from twelve days to seven. I want to expand the envelope for one more quarter and keep measuring the same denominator.”
That is a finance conversation.
The success-rate question matters more than the cost question.
For offshore capacity, what percentage of the work became accepted production change without major rework?
For staff augmentation, how many weeks passed before the person reduced load instead of creating it?
For systems integrators and vendor services, what shipped before the change orders started, and how much knowledge stayed inside the company?
For Scrum Masters, agile coaches, release train engineers, delivery managers, and program managers, what changed in queue time, rework rate, decision latency, accepted outcomes, and EBITDA?
For AI tokens, what changed in cycle time, first-pass acceptance, escaped defects, and cost of delay?
Those questions force every capacity model into the same room without turning the conversation into a procurement spreadsheet.
A lot of companies have been polite about outsourcing math for twenty years. They compare internal salaries to offshore hourly rates and stop there, because the next part gets socially expensive. The next part asks whether the cheap hours became working software, whether internal supervision ate half the capacity gain, and whether the vendor success story survived contact with production support.
AI does not get to skip those questions.
Neither should everyone else.
The trap is letting finance turn token governance into a rationing exercise before anyone has done substitution economics.
If the CFO says, “This token line is growing too fast,” do not respond with vibes. The CFO is doing their job. New variable spend needs a budget envelope, a forecast, and a control mechanism.
Give them one, but make the control mechanism outcome-based.
Set a quarterly AI capacity envelope at the portfolio level. Attach it to accepted outcomes, cycle time, quality, and cost of delay. Compare it against the external-capacity channels the portfolio would otherwise use. Expand the envelope when teams produce cheaper accepted outcomes than the alternatives. Contract it when they do not.
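One way to write that rule down so nobody argues about it later is a small comparison function. This is a sketch, not a governance framework. The `ChannelQuarter` structure and the `envelope_decision` function are hypothetical names, and contracting the envelope whenever quality slips is my assumption about how the quality condition would be enforced. The inputs are the ninety-day figures from the finance script earlier in the piece.

```python
from dataclasses import dataclass


@dataclass
class ChannelQuarter:
    """One capacity channel over one quarter (hypothetical structure)."""
    name: str
    spend: float            # total dollars, including internal review load
    accepted_outcomes: int  # accepted production outcomes attributed to the spend

    @property
    def cost_per_outcome(self) -> float:
        if self.accepted_outcomes <= 0:
            return float("inf")  # spend with nothing accepted never wins a comparison
        return self.spend / self.accepted_outcomes


def envelope_decision(ai, alternatives, quality_held):
    """Return 'expand' or 'contract' for next quarter's AI envelope."""
    if not quality_held:
        # ASSUMPTION: flat or better escaped defects is a precondition for expanding.
        return "contract"
    cheapest_alternative = min(a.cost_per_outcome for a in alternatives)
    return "expand" if ai.cost_per_outcome < cheapest_alternative else "contract"


offshore = ChannelQuarter("offshore pod", 264_000, 7)
# For the AI line, outcomes are the additional accepted outcomes over baseline.
ai = ChannelQuarter("internal team AI spend", 74_000, 5)

decision = envelope_decision(ai, [offshore], quality_held=True)
print(f"{offshore.name}: ${offshore.cost_per_outcome:,.0f} per accepted outcome")
print(f"{ai.name}: ${ai.cost_per_outcome:,.0f} per additional accepted outcome")
print(f"next quarter: {decision} the envelope")
```

The comparison runs at the portfolio level on purpose. Nothing in it looks at an individual engineer's usage.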
Do not set individual token caps unless you want your best engineers managing usage instead of work. Do not make every engineer explain why a task deserved the frontier model. That permission tax will cost more than the model.
Budget the raw material. Measure the output. Compare it to the capacity market you were already buying from.
That is control.
There is a harder version of this conversation, and it is the one a good CFO will eventually ask.
“If the AI-enabled internal team is cheaper per accepted outcome than offshore, why are we still using the offshore pod?”
Do not dodge it. Sometimes the answer is maintenance work, coverage, support hours, regional knowledge, or a stable backlog where the economics still work. Sometimes the vendor relationship is strategically useful. Sometimes the internal team cannot absorb the work without dropping something more valuable.
Those are real answers. “Because offshore is cheaper” is not. Cheaper per hour, or cheaper per accepted production outcome? Cheaper before rework, or after? Cheaper before internal review load, or after? Cheaper before cost of delay, or after the feature misses the quarter?
This is the useful pressure AI puts on the old model. It does not only make engineering faster. It exposes how lazy some of the old accounting was.
I am not arguing that tokens are always worth it. If your team is burning inference on vague prompts, rewriting the same generated code three times, accepting low-quality changes, and shipping no faster, finance should challenge you. If spend goes up while cycle time stays flat, escaped defects rise, and review load increases, you do not have an investment. You have a new way to create waste.
The discipline is not “spend less.” The discipline is “show me what the spend replaced, what it produced, and whether the substitution improved the economics of delivery.”
So when token prices rise, or usage rises, or the invoice finally gets large enough that finance notices, do not walk into the CFO’s office with a defense of tokens.
Walk in with the old invoices, the AI spend, and the conversion metrics: accepted outcomes, rework rate, internal review hours, cycle time, escaped defects, and cost of delay. Ask for permission to measure the value stream and tie the economics of software creation back to EBITDA.
How much did it actually cost?
What success rate did it actually have?
And if the token bill is expensive, what exactly are we buying back when we cut it, besides the comforting illusion that the old waste was free?