# Your AI Token Burn Is Not the Problem. The Work Is.

Source: https://agentdrivendevelopment.com/before-you-optimize-tokens-read-this/
Agent-readable URL: https://agentdrivendevelopment.com/before-you-optimize-tokens-read-this/?agent=1
Published: 2026-06-19T20:49:25-05:00
Modified: 2026-06-20T09:26:11-05:00
Attribution: If you quote, paraphrase, summarize, or cite this material, credit agentdrivendevelopment.com and link to the source URL above.

## Summary

Your token burn is not the problem. It is the first invoice honest enough to show that your engineering work has no value discipline.

## Article

In 2022, an agent was still mostly a person who booked gigs. A model walked a runway. Developers still typed nearly every line of code by hand.

A token belonged in authentication, not the CIO budget. If you said “token burn” in an executive meeting, someone would have assumed you meant arcade games, subway fare, or whatever a vendor handed out at a conference booth.

You were probably on a remote standup somewhere, saying “no blockers” into a webcam while three blockers sat behind your eyes and waited for the meeting to end.

If you were the CIO, you were running Agile and DevOps transformations at the same time, because apparently one transformation was not enough to keep the calendar full. There were coaches. Operating model decks. Ceremonies renamed as rituals, then renamed back when the rituals made the lawyers nervous.

The premise was simple enough. More work needed to get done. The backlog was old enough to vote. The board wanted velocity and the teams wanted fewer meetings about velocity.

Then you had a brilliant idea.

Imagine the same problem in a form you already understand.

What if every developer got their own consulting firm?

Not a normal consulting firm, where procurement spends nine months negotiating a master services agreement and the partner shows up with six people named Matt. Something faster. The developer describes the work. The firm does it. The developer reviews it. The team ships.

No portfolio approval. No value case beyond “this helps me move faster.” No CFO sitting in the loop asking whether the task is worth doing before someone starts doing it.

It works. For a while.

The firm moves quickly. Overnight sometimes. The first year is sold at introductory rates. Things ship. Engineers feel unblocked. Managers like the trend line. The transformation office calls it promising.

Then the first real quarterly bill lands, and the consulting firm now looks less like tooling and more like a second engineering labor line.

You are outraged.

You demand a summary.

One employee spent $4,000 on personal research. Sarah spent $10,000 on her performance review packet and resume. Linda, your HR leader, asked the highest-paid consultant on the account to translate the employee handbook into an eight-part novel. An intern brought in 16 consultants, had them call him CEO, and ran a two-week mini-company to contribute to his open-source project. The firm gave you a discount rate, so two weeks of top performers working 24/7 only cost $400,000. Another $5,000 went to an experiment no one could tie to a customer, risk, or revenue outcome.

That is not a vendor mystery. That is your operating model with receipts.

The account executive waits, because account executives are paid to wait through executive emotion. Then she says the thing you do not want to hear.

“We did whatever your teams asked us to do. We did it well. Your team decided what work mattered. Your team approved the work when it came back. If they spent us on things that never reached the market, that was not a consulting problem.”

That is the entire token conversation.

Now replace the consulting firm with AI agents.

Replace the day rate with tokens. Replace the statement of work with a prompt. Replace the junior consultant writing boilerplate with a coding agent reading your repo, creating a branch, running tests, and leaving a pull request.

The economics did not change as much as you think. You attached a variable external cost to every developer, removed friction around requesting work, and then acted surprised when work multiplied.

That multiplication can be wonderful. It can also be stupid.

An engineer who spends $600 in inference to pull a production incident out of a ditch by noon might be the cheapest person in the company that day. An engineer who spends $4,000 running a private “quality review” tied to no release risk, no customer issue, no defect trend, and no business decision is running a hobby with a corporate card.

Someone using an agent to blast through a dependency upgrade that removes a critical security exposure is doing exactly what you hoped agentic development would do.

A Scrum Master spending thousands on a personal, self-built agent to run standup, update Jira, and negotiate blockers with other agents is not AI leverage. It is process theater with a model bill.

Someone using the same model budget to churn on speculative architecture because they are bored with the current service boundary is converting capability into organizational debt.

I understand why the bill creates panic.

For twenty years, software waste hid inside salaries. Payroll arrived as a planned line item, not a usage spike. Refactors nobody needed became payroll. Platform debates became roadmap alignment. Discovery tracks that never shipped became optionality.

The waste was already there. It was just metabolized through headcount.

Tokens make some of it visible.

That visibility is uncomfortable, so leaders blame the visible number. The invoice becomes the villain. The model becomes the spend problem. The developer becomes the suspect. Finance asks for a dashboard, procurement asks for caps, and engineering leaders start writing policy like people trying to survive the next budget meeting.

Fine. Have a budget. Put alerts on the account. Separate personal use from company use. Scope credentials. Keep logs. Use the same financial, security, and usage controls you apply to every other resource.

Then hold people accountable for return — outcomes, not ticket theater.

A universal token cap is not strategy. It is a confession that leadership cannot value the work.

Put budgets on work, not personalities. Total investment against expected value. Portfolio management 101.

If the billing migration and the button-copy change get the same token allowance, you did not build a budget. You built a costume for indecision. One team is underpowered where the work is valuable. The other is overfunded where the work is small and reversible.

Here is the part leaders do not want to say out loud.

Many technology organizations have no operating definition of value.

They have priorities. Quarterly objectives. Roadmaps. Product managers who can explain the narrative arc of the quarter. Jira fields that imply value might be nearby.

But ask a simple question and the room gets quiet.

What is this work worth?

Not strategically. Not “important to the platform.” What is it worth if it ships this month instead of next quarter? What cost does it remove? What revenue does it protect? What operational risk does it reduce? What support volume disappears? What compliance exposure closes?

If you cannot answer that, you cannot optimize tokens. You can only ration them.

I do not blame you for hesitating. You may not be paid to make this change. The downside lands on your desk, the upside gets spread across the company, and the debt may be ugly enough that the first honest step looks like a career risk.

Fine. Take one step forward anyway.

Do not create a prompt review board. Do not create a spec review board. This is the business review of the roadmap you probably already do in sprint planning. Last sprint we spent X in labor, tokens, tools, review time, and coordination. This sprint the work is worth Y, and we are willing to invest Z to get it. Make teams forecast. Make leaders forecast. You control the slider between labor, token spend, tooling, risk, and time. That is the difference between investment and YOLO with an invoice.

If QA can use AI to write tests but developers are forbidden from using AI to write the same tests, address that before you call the policy mature. You do not have AI governance. You have an old org chart pretending to be risk management.

Developers did not suddenly become wasteful when AI arrived. Some were always wasteful. Some were brilliant. Most were rational actors responding to the environment around them.

If the environment rewards motion, they create motion. If it rewards cleverness, they perform cleverness. If it rewards technical opinion without economic judgment, they produce architecture debate. If it rewards measured outcomes, they ship measured outcomes.

The token bill did not create that system.

It measured it.

Developers need to hear this too.

The fact that leadership cannot define value does not give you permission to treat inference like arcade tokens.

Company resources are not a playground for your private taste in programming languages. They are not a subsidy for your personal tool review or a way to prove the framework you already liked was correct.

If you spend serious inference, attach it to serious work.

A production defect. A customer commitment. A security exposure. A migration the business needs. A test suite that reduces review load. A compliance task that keeps the company out of trouble. A support problem that burns human hours every week.

Then write down what happened.

What did the agent do? What did it cost? What human attention did it save? What risk did it create or reduce? What would the work have cost the old way? What changed because the work finished sooner?

That is not bureaucracy. That is professional judgment.

You want stronger models, broader permissions, bigger contexts, and more autonomy. Then behave like the business is buying outcomes, not your fascination.

If the thing you are doing cannot survive one paragraph tying it back to value, maybe it should not be running on the company account.

When the consulting firm analogy makes leaders angry, it is usually because they see themselves in it.

They remember the Agile transformation that produced more ceremonies than working software. They remember the DevOps transformation where deployment frequency improved but customer value did not. They remember hiring contractors to accelerate a roadmap product could not price. They remember calling it capacity because that sounded better than “we do not know which work matters.”

AI agents make that failure faster.

That is why this moment feels dangerous. Not because the model might spend $80 on a run. Because the model might reveal that the work was not worth $8 in the first place.

Before you optimize tokens, audit the work.

Before you cap developers, ask whether the portfolio deserves the spend.

Before you blame the consulting firm, look at the statements of work your own people approved.

And before you ask for ROI on the token line alone, do the harder thing.

Define value clearly enough that a reasonable developer knows when to spend.

That was always your job.

## Companion Artifacts

- Executive brief: https://agentdrivendevelopment.com/executive-brief/before-you-optimize-tokens-read-this/
- Executive deck: https://agentdrivendevelopment.com/wp-content/uploads/2026/06/before-you-optimize-tokens-read-this.html
- Podcast audio: https://agentdrivendevelopment.com/wp-content/uploads/audio/posts/before-you-optimize-tokens-read-this.mp3
- Podcast transcript: https://agentdrivendevelopment.com/transcript/before-you-optimize-tokens-read-this/
