It is the same conversation now. I have it three times a week.
A friend in a CTO seat, or a VP of Engineering, or a Director of Platform one rung below the title, or a neighbor who does not want to be called Bill, pulls me aside on a call or at the back of a conference and says some version of this:
“Norman, the AI bill went up again last month. My CFO put a red box around an engineer who spent $340 on inference last week. He wants me to justify the ROI on tokens. The board is asking too. I don’t know what to tell them.”
The person saying this is not someone I am trying to embarrass. They have shipped software for twenty years. They are under real pressure from a finance team that is reading the inference invoice the way they used to read the McKinsey invoice. They are being asked a direct question by people who control their budget, and they want to give a direct answer. I respect them, and I respect the position they are in.
But the question itself is the part I am puzzled by, and I want to be honest about that.
A token is a cost of doing business. It is a raw material. A foundry does not ask the floor supervisor to justify the steel. The bakery does not audit the flour invoice line by line and ask the baker why he made the bread the customer ordered. The raw material is funded because it is the input to the thing the company sells. Nobody is suspicious of it.
If you knew the unit economics, this conversation would last thirty seconds. You would tell the CFO that the engineer who spent $340 last week shipped feature X, that feature X is worth Y dollars a quarter to the business, and the conversation would end. You should be able to defend this in your sleep. You cannot, because you do not know what a feature costs to produce, you do not know what one is worth in the market, and you do not have product metrics that connect the two. I will tell you who is not getting asked to defend token spend right now. The companies that know those three numbers. They run the math, sign the invoice, and move on. The token-justification conversation is happening in the orgs where those three numbers do not exist, and the conversation looks like it is about tokens because tokens are the only number on the table.
So when an engineering leader I respect tells me they need to justify the ROI on tokens, what I am hearing is not a token question. It is a measurement question, and it is being asked on the wrong floor of the building.
Editor’s note: This is not doom and gloom. For every conversation that opens with “I need to justify the ROI on tokens,” I have another one with a different leader, and that one goes the other direction.
The version I prefer sounds like this: “Norman, we want to find the ceiling. We are tracking what our highest-spending engineers are shipping, and they are shipping more than anyone else. The ones running up the bill are closing the most pull requests, fixing the flaky tests the team has wanted closed for two years, and delivering features that were quoted at quarters of effort. We are not looking to ration. We are looking to learn from them.”
That is the right posture. Look at what the high consumers produce, then lift the rest of the team toward it. The inference invoice is the most legible signal you have for where engineering productivity is heading.
More on that conversation in its own piece, soon.
My HOA gets mowed every Wednesday. My neighbor sits on the board and knows what the contract pays, and he told me the number when I asked. The contract is annual, the price is fixed, and the company that wins it does not show up with push mowers.
They show up with tractors. Two zero-turn commercial riders, a trailer of trimmers and blowers, and a truck that tows a small bulldozer because part of the contract is grading the easement on the back side of the property. Three guys, four machines, in and out before lunch.
The owner of that landscaping company can tell you, to the dollar, what it costs him to cut the HOA’s grass. Fuel. Labor by the minute. Machine depreciation per acre. Drive time from the previous job. He knew the number when he bid the contract, because if he did not know it he would not have been in business long enough to take the job.
He automated. He bought heavy equipment because the unit economics justified it, and he can tell you what the payback period was on the second zero-turn and on the bulldozer. His team does not push-mow the property at twenty-one inches a pass. That math does not work, and he knows it does not work, because he ran the math.
Your engineering organization spends more in a single sprint than that crew earns in a year, and you cannot tell me what it costs to build a feature, what it costs to maintain a feature after launch, or what a day of delay is worth on the next thing your VP of Product is presenting at the QBR.
The landscaper did the math on his own equipment before he bought it. Do the math on yours. Figure that out before you audit a $340 token bill, because the audit is meaningless until you know what the engineer was supposed to be producing on the other side of that spend.
Let me explain what cost of delay is, because most of the people building the token dashboards have never run the math, and the math is the part the CFO actually needs to see done out loud.
Cost of delay (CoD) is the revenue, savings, or strategic option value your company does not collect for every day a feature is not in production. Take an illustrative example, and let me show the work, because if I do not show it the CFO will tell me (correctly) that I waved my hands.
Consider a pricing engine that lifts gross margin by 40 basis points (0.4%) on $80 million in annual revenue. On $80M of revenue, 0.4% is $320,000 of incremental margin a year. I am assuming the lift converts to margin one-for-one, which is roughly true for a price-driven feature with near-zero incremental cost. (If your gross margin is 50% and the lift is on revenue rather than margin, halve the number. If your business is not gross-margin shaped, pick the analog and own the assumption.) Divide $320,000 by 365 calendar days, because a feature in the backlog forgoes value every day of the year, not only the days the office is open. That is $877 a day. Use a 250-working-day denominator instead and the daily number is $1,280. Both are right, depending on the question you are asking. Both are larger than the inference invoice, by a lot.
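Here is the same arithmetic as a few lines of Python, so finance can swap in its own revenue, lift, and denominator. The function names are mine, not a standard tool, and the `conversion` parameter is an assumption that stands in for the one-for-one versus halve-it choice above.

```python
def annual_value(revenue: float, lift_bps: float, conversion: float = 1.0) -> float:
    """Incremental margin per year from a lift of `lift_bps` basis points.

    `conversion` is the share of the lift that reaches margin: 1.0 for the
    one-for-one case, 0.5 for a revenue lift at a 50% gross margin.
    """
    return revenue * (lift_bps / 10_000) * conversion


def daily_cost_of_delay(annual: float, days_per_year: int = 365) -> float:
    """Value forgone for each day the feature is not in production."""
    return annual / days_per_year


value = annual_value(80_000_000, 40)
print(f"annual value:     ${value:,.0f}")                           # $320,000
print(f"per calendar day: ${daily_cost_of_delay(value):,.0f}")       # $877
print(f"per working day:  ${daily_cost_of_delay(value, 250):,.0f}")  # $1,280
```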
That is the simplest version. Real CoD is worse. It includes competitive option loss (the team at the firm across the street is shipping a similar feature this quarter), customer attrition you cannot model, and the compounding cost of a roadmap that ages while the team debates whether to spend $40 more on inference. Don Reinertsen wrote a whole book about this in 2009 (The Principles of Product Development Flow). Most CFOs still do not measure it.
Now put a token bill next to that number.
A senior engineer in the United States, fully loaded, costs the business somewhere between $250,000 and $360,000 a year (U.S. metro, 2025–2026, base salary plus benefits, taxes, equipment, software, and the share of overhead the FP&A function allocates to engineering). Your loaded number is your loaded number. The range above is the one I see most often in Series B through Fortune-500 P&Ls right now. Take the low end. That is roughly $4,800 a week of cost on the books, every week, regardless of what shipped that week. If that engineer spends $400 a week on inference and ships features one day faster than they would have without it, the token spend returned $877 against $400 in week one, before you count the engineer's own time. Counting the engineer's time, the math is lethal. On cost of delay alone, a single day avoided covers more than two weeks of that engineer's token budget. Add the day of loaded cost you got back (about $960 at the low end) and it covers more than four.
Tell me again why we are auditing the $340 line item.
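A minimal sketch of that comparison, using the $250,000 low end of the loaded range, the $400 weekly inference spend, and the $877 daily CoD from the pricing-engine example. The five-day working week is my assumption, and the constants are illustrative, not benchmarks.

```python
LOADED_ANNUAL = 250_000   # low end of the fully loaded range
WEEKLY_TOKENS = 400       # weekly inference spend in the example
COD_PER_DAY = 877         # daily CoD of the illustrative pricing engine

weekly_loaded = LOADED_ANNUAL / 52   # loaded cost on the books each week
daily_loaded = weekly_loaded / 5     # assumes a five-day working week


def weeks_of_tokens(value: float, weekly_tokens: float = WEEKLY_TOKENS) -> float:
    """How many weeks of inference spend a given dollar value would cover."""
    return value / weekly_tokens


cod_only = weeks_of_tokens(COD_PER_DAY)
with_salary = weeks_of_tokens(COD_PER_DAY + daily_loaded)
print(f"weekly loaded cost:          ${weekly_loaded:,.0f}")   # $4,808
print(f"day saved, CoD only:         {cod_only:.1f} weeks")     # 2.2 weeks
print(f"day saved, CoD plus salary:  {with_salary:.1f} weeks")  # 4.6 weeks
```

Swap in your own loaded cost and your own CoD; the ratio is the number the CFO should see.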
You could extract the dollar reduction the dashboard is trying to find, and then some, by paying every engineer on the team 20% less. Try that and tell me how the recruiting funnel performs in Q2. Or renegotiate the office lease. Or cut the observability bill. Or stop sending engineers to industry conferences. There are at least nine line items above token spend in your engineering P&L, and every one of them is more elastic than the line item the CFO put a red box around.
Count the SaaS subscriptions your company pays for at roughly $100 a seat, per month, for every engineer on the staff list. There are a dozen of them. The wiki, the design tool half the team has not opened in six weeks, the project tracker, the diagramming tool, the screen recorder one team uses, the analytics platform two PMs run reports on. Each one priced like a phone bill. The phone bill itself you also pay, by the way, at $75 a month per engineer in a stipend nobody audits. None of these line items gets a red box. Nobody asks the engineer to justify the wiki seat. Nobody audits the design tool invoice. They are paid because somebody once decided they were the cost of running the operation, and nobody has reopened the conversation since.
Then there is the all-hands. The week-long quarterly planning summit, four hundred salaried engineers in a ballroom for five days, catered lunches every day, the offsite venue, the breakout sessions nobody asked for. Run the math on that one and tell me what the variance report says. It says nothing, because that line item is not on the report. That is just how the company runs.
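If you want to actually run the math on the line items nobody audits, a sketch like this one works. The seat count, seat price, stipend, and summit size come from the examples above; the $250,000 loaded cost is the low end quoted earlier, the 260 working days a year is my assumption, and the summit figure counts only salary, not the venue or the catering.

```python
SAAS_SEATS = 12             # "a dozen" subscriptions per engineer
SAAS_PER_SEAT_MONTH = 100   # roughly $100 a seat, per month
PHONE_STIPEND_MONTH = 75    # the stipend nobody audits
LOADED_ANNUAL = 250_000     # low end of the fully loaded range
WORKDAYS_PER_YEAR = 260     # assumption: 52 weeks of five days


def unaudited_monthly_per_engineer() -> int:
    """SaaS seats plus phone stipend, per engineer, per month."""
    return SAAS_SEATS * SAAS_PER_SEAT_MONTH + PHONE_STIPEND_MONTH


def summit_salary_cost(engineers: int, days: int) -> float:
    """Loaded salary spent in the ballroom, before venue, catering, or travel."""
    return engineers * days * LOADED_ANNUAL / WORKDAYS_PER_YEAR


print(f"per engineer, per month: ${unaudited_monthly_per_engineer():,}")  # $1,275
print(f"one planning summit:     ${summit_salary_cost(400, 5):,.0f}")     # $1,923,077
```

If the summit really is quarterly, multiply the second number by four.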
The token bill is the line item on your engineering P&L most likely to be producing something. It is also the first one the CFO reaches for.
The reason executives reach for token spend is that it is new. New things feel measurable in a way that legacy spend does not. Nobody asks how many hours a day a developer is allowed to drink coffee. Nobody sends a 9pm message that says, “Karen used 14% more Wi-Fi bandwidth this week, please justify.” Electricity, water, coffee, internet are base utility. The cost is funded and the consumption is assumed.
Inference is a budgeted raw material now, the way bandwidth became one in 2008. The line item is on the P&L. Most CFOs have not caught up.
This is not the developer’s fault. I want to say that twice, because I have watched too many engineering managers turn this into a developer discipline conversation.
This is not the developer’s fault.
The developer’s job is to ship working software. If they discover that an extra $200 a week in inference cuts a feature’s cycle time from nine days to six, you want them to spend the $200. You want them to find the ceiling, hit it, and report back what they got for it. The product organization’s job is to put a dollar value on the feature and back into whether the cycle-time improvement was worth the spend. That is product economics. AI did not change the model. It just made the variable that used to be invisible (engineering time) suddenly visible as a dollar figure on a monthly invoice.
Most product executives have not learned to measure cost of delay. They should. Many will. But until they do, do not push the measurement burden down onto the developer, because the developer will respond rationally. They will use less inference, ship slower, and protect themselves from the dashboard. Your company will save $200 a week and lose $877 a day. That is what manufactured scarcity buys you.
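The same trade as a sketch, assuming the feature is the pricing engine from earlier so each day pulled forward is worth $877. The function name is hypothetical; the point is that the engineer's $200 and the product team's CoD land in the same equation.

```python
COD_PER_DAY = 877  # illustrative daily CoD of the pricing engine


def net_value_of_spend(extra_spend: float, days_saved: float,
                       cod_per_day: float = COD_PER_DAY) -> float:
    """Value of the days pulled forward, minus the inference that bought them."""
    return days_saved * cod_per_day - extra_spend


spend = net_value_of_spend(200, 9 - 6)  # cycle time cut from nine days to six
ration = net_value_of_spend(0, 0)       # the rationed version: no spend, no days saved
print(spend, ration)                    # 2431 0
```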
The simple token economic policy I give to engineering leaders is three rules. It does not create scarcity, and it treats inference like the utility it is.
- Tokens are budgeted at the portfolio level, not the team level, not the individual level. A portfolio is the collection of products or value streams managed under one P&L. Inference belongs there for the same reason headcount belongs there. It follows the work, not the org chart. If one team is shipping faster because they are pulling more inference, the budget shifts toward them. If another team has slack, the budget shifts away. The portfolio is where roadmap economics live, and where the inference budget should live.
- Every roadmap commitment carries an estimated cost of delay. If product cannot tell you what a day of delay on a feature is worth, that feature is not ready to be on the roadmap. This is the rule that does the real work, and notice it has nothing to do with tokens. Once the CoD number exists, every other line in the engineering P&L gets a denominator. Headcount, tooling, infrastructure, inference, all of it gets compared against the same shipping clock.
- The product manager and the comptroller co-own the inference line on the P&L, not the engineering manager. The product manager because they decide which features sit on the portfolio and what each one is worth in market. The comptroller because they own the financial-control function and the shape of the budget. Engineering is responsible for productivity and craft. Product and finance, together, are responsible for whether the productivity converted into value. If the inference bill goes up and cycle time does not come down, that is a portfolio question, not a discipline question for the engineer.
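The second rule is the easiest to mechanize. A minimal sketch, with hypothetical names: a roadmap item without a CoD estimate fails the gate, and any P&L line can be restated as days of delay on an item that has one. The $877 is the pricing-engine example and the $340 is the red-boxed week.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadmapItem:
    name: str
    cod_per_day: Optional[float]  # None means product has not priced a day of delay


def ready_for_roadmap(item: RoadmapItem) -> bool:
    """Rule 2: no cost-of-delay estimate, no roadmap slot."""
    return item.cod_per_day is not None and item.cod_per_day > 0


def days_of_delay(spend: float, item: RoadmapItem) -> float:
    """Restate a P&L line as days of delay on a priced roadmap item."""
    if not ready_for_roadmap(item):
        raise ValueError(f"{item.name!r} has no cost-of-delay estimate")
    return spend / item.cod_per_day


pricing = RoadmapItem("pricing engine", 877)
unpriced = RoadmapItem("dashboard refresh", None)

print(ready_for_roadmap(pricing), ready_for_roadmap(unpriced))  # True False
print(f"{days_of_delay(340, pricing):.2f} days")               # 0.39 days
```

Under that denominator, the $340 week of inference is less than half a day of delay on the feature.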
Three rules. The portfolio gets inference the way it gets electricity, and CoD gets measured the way revenue gets measured. ROI sits with the product and finance leaders whose calendars move the portfolio.
When you ration tokens at the developer level, the senior engineers leave first, because they have options and will not work at a company that meters their tools. The roadmap stretches, because the cycle-time gain you were getting from inference is the gain you are now choosing to give back. The savings show up in next quarter’s P&L. The losses show up two quarters later, in line items nobody connects to the token policy. Attrition cost. Time-to-market on the feature that lost the deal. The senior engineer who is now at the competitor.
This is the part of the conversation where I usually ask the executive whether the dashboard for token spend has a column next to it for cost of delay. It never does. That dashboard is harder to build because it requires product to commit to revenue numbers they have not had to commit to before. That is the transformation work. The token dashboard is the warmup.
When your CFO asks whether the company is getting a return on its AI investment, do not show them the token bill. Show them the cycle time, and walk them through the value stream, where ten of every twenty-eight days are actual work and the other eighteen are a leadership problem no token policy can fix. Show them the features that shipped this quarter that would have shipped next quarter without inference. Show them the senior engineers who did not quit because they were not audited at the line-item level for using a tool that costs less than the coffee in the kitchen.
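The value-stream point is usually expressed as flow efficiency: active work time divided by total lead time. A minimal sketch with the ten-of-twenty-eight figure; the seven-day case is a hypothetical in which inference cuts active work by three days and the waiting stays put.

```python
def flow_efficiency(active_days: float, lead_time_days: float) -> float:
    """Share of elapsed lead time in which someone is actually working the item."""
    if lead_time_days <= 0:
        raise ValueError("lead time must be positive")
    return active_days / lead_time_days


ACTIVE, LEAD = 10, 28
WAIT = LEAD - ACTIVE                            # 18 days of queues and hand-offs
print(f"{flow_efficiency(ACTIVE, LEAD):.0%}")   # 36%

# Hypothetical: inference trims active work from ten days to seven.
faster_active = 7
print(faster_active + WAIT)                     # 25: lead time barely moves
```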
Then ask the CFO this. If you measured every other line on the engineering P&L the way you are trying to measure tokens, would the company still be in business? The cost of delay on the next feature you slip will outrun your team’s annual inference budget faster than you think. Run that number once, and put the red box somewhere it moves the business.
You could roll back to a zero-token development system tomorrow. The 2018 SDLC, exactly as it was, with no inference invoice and no dashboards to red-box. How would that organization compete in 2026?
