
Before You Build a Token Economics Dashboard, Build a Value Dashboard

Before you optimize token spend, measure the completed work, human attention, cycle time, and risk reduction those tokens bought.


Executive brief

Value stream economics, not unit cost, drives effective AI adoption.

Quantify outcomes, not just input costs.

  • Investment in AI capabilities must be evaluated against the total cost of delivery, including human effort, cycle time, and risk, not solely the cost of compute. An incomplete solution, however cheap its components, introduces substantial hidden costs in human labor and delay.
  • Model selection is a function of task complexity and desired outcome. Simpler models suffice for well-defined, low-context tasks, while frontier models are necessary for ambiguous, high-context work where human intervention is costly.
  • The true cost of a technical solution is the aggregate of all resources expended to achieve the desired outcome, not merely the most visible line item. Ignoring indirect costs leads to suboptimal resource allocation.
  • Technology economics evolve; today's expensive capability becomes tomorrow's commodity. Prioritizing current unit cost savings over the capability gains of frontier technologies leads to a competitive lag.

The first question for any AI program: what is the total cost of delivering the completed work, and what is its value?



9 min read

Dear leaders,

I heard you are worried your engineers are burning tokens, so you asked for a token economics dashboard and then limited their tokens.

Fine. That makes the spreadsheet quieter, but it also trains your engineering organization to avoid the machine you bought to increase delivery.

Before you build the dashboard, decide what it is supposed to tell you.

If it answers, “How do we spend fewer tokens?” it will make a bad decision look disciplined. If it answers, “What did those tokens finish?” you have a management tool.

Here is the free leadership coaching. If the goal is to teach someone to sail, a fuel constraint makes sense. Give them ten gallons of gas for the motor for the summer. Not ten gallons for the weekend: ten gallons for the season.

They learn the wind, the current, when to tack, when to drift, and when to motor out of the dead pocket because August is coming and the can is already half empty.

That is a good training constraint, and it is also a good hobby if you do not need to get anywhere fast. It is a terrible operating model for a passenger or freight business.

If you ran a ferry that way and then complained you were not getting value from internal combustion engines, nobody serious would blame the engine. They would blame the operating model. That is what your token policy is: a training constraint being run as an operating model.

You are not teaching the organization how to use AI to deliver business outcomes. You are teaching your engineers to avoid the engine so they can preserve the fuel budget.

Then the executive team sits in a quarterly review and wonders why the AI program is not producing ROI. I have a theory: if a $40 model run saves two hours of engineering time, the $40 was not the problem. The two hours were.


The CFO is right to ask about the token bill, and that is the part people miss.

If a new line item shows up in the cloud invoice and starts growing every week, someone should ask what it is buying. That is normal financial discipline, not anti-AI, and not proof the CFO hates the future.

The mistake happens one question later, when the spreadsheet asks, “How do we spend fewer tokens?” while the business should be asking, “What did those tokens finish?”

Before you do anything else, understand the ROI: not the token ROI in isolation, but the ROI of the thing being built.

What revenue does it protect? What cost does it remove? What risk does it reduce? What customer problem does it solve? What deadline changes if it ships this week instead of next month?

If the business does not know the value of the work, I would be concerned about a lot of things, and the token bill would be near the bottom.

Because the problem is not that an agent might spend $42. The problem is that your organization is funding work whose value nobody can explain, and somehow the model invoice became the first time anyone asked.

That is not an AI economics problem; it is a business management problem with a tokenizer attached.


The common objection comes next.

“What if people abuse the policy?”

Fine. Put a cap on it.

Put the cap where the company does not go bankrupt if the team hits it, but where the team still has enough room to build something valuable enough to justify the spend. That is the point of a cap: bound the downside without eliminating the upside.

Put alerts on it. Log the run, attach it to a ticket, scope the credentials, keep the review trail, and require production approval like you would for any other risky change.

Then, at the end of the sprint, tally the tokens against the value delivered and make a professional decision.

If the team spent $1,200 in tokens and pulled forward $40,000 of delivery, stop holding a hearing about the $1,200. If the team spent $1,200 in tokens and produced nothing anyone can explain, do the management work.
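The cap, the alert, the ticket trail, and the end-of-sprint tally can be sketched in a few lines of Python. This is a minimal sketch, not a real billing integration: the `TokenBudget` class, the 80% alert threshold, the $2,000 cap, and the ticket IDs are all hypothetical. Only the $1,200 spend and the $40,000 of delivery come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    cap: float                    # hard ceiling in dollars for the sprint
    alert_fraction: float = 0.8   # hypothetical alert threshold, as a share of the cap
    runs: list = field(default_factory=list)  # (ticket, cost) pairs: the review trail

    def spent(self) -> float:
        return sum(cost for _, cost in self.runs)

    def record(self, ticket: str, cost: float) -> str:
        """Log a run against its ticket and report the budget state."""
        if self.spent() + cost > self.cap:
            return "blocked"      # the cap bounds the downside
        self.runs.append((ticket, cost))
        if self.spent() >= self.alert_fraction * self.cap:
            return "alert"        # someone should look, nothing is stopped yet
        return "ok"

    def sprint_tally(self, value_delivered: float) -> float:
        """Dollars of value delivered per dollar of tokens spent."""
        spent = self.spent()
        return value_delivered / spent if spent else 0.0


budget = TokenBudget(cap=2000.0)                 # hypothetical sprint cap
print(budget.record("TICKET-101", 700.0))        # ok
print(budget.record("TICKET-102", 500.0))        # ok, $1,200 spent so far
print(round(budget.sprint_tally(40_000.0), 1))   # 33.3
```

The ratio is a conversation starter, not a verdict. A tally near zero is the signal to do the management work, not to tighten the cap.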

What if your captain is siphoning gas out of the tank for a side project your company does not own?

Then you do not have a fuel policy problem; you have a professionalism problem, maybe a misconduct problem.

Companies have dealt with misuse of company resources for generations. Adults do not need a ten-gallon summer ration to understand that boats are not fire pits, engines are not toys, and company resources are not a subsidy for somebody’s weekend business.


A year ago, the best model you could put in front of an engineering workflow was good for the era. It was good enough to generate scaffolding, explain code, and help a strong developer move faster if the developer stayed close.

Useful, but still a helper, not a worker. Then the frontier moved. Sonnet got better. Opus got better. GPT got better. Other models showed up with different strengths. Do not get religious about the labels. The important shift is capability.

There is a difference between a model that helps you type and a model that can hold the shape of a problem for hours. There is a difference between a model that needs a developer to manage every step and a model that can read the codebase, make a plan, run tests, inspect failure, repair the implementation, and leave a useful summary when you wake up.

Those are not the same economic unit, but too many companies still price them as if they are.

Try running the agent in auto mode, if your harness supports it.

Not everywhere, not forever, and not because a vendor demo looked clean.

Try it on a bounded workflow where the outcome is clear, the tests exist, the permissions are constrained, and failure is recoverable. Let the agent plan, run, fail, repair, and report without a developer hovering over every step.

It is getting good, and if it works, stop asking your team to run serious work through underpowered models because the token line looks smaller. For ambiguous, high-context, multi-step work, give your team frontier models.

This year’s frontier models will look slow and outdated in twelve months, which is fine. That is how technology works. Capability arrives expensive, then it commoditizes. The mistake is waiting for commoditization when the expensive version is the first one that can do the work you need today.

Spend the money where capability changes the outcome. Assume the price curve improves later. Do not make your team fight today’s problems with last year’s model because next year’s model will be cheaper.

You do not save money by making an expensive person babysit a cheap machine.


Start with the work: porting a library, remediating a CVE, or understanding a new codebase well enough to make a safe change. These are not toy prompts; they are real work with real business consequences.

Having GPT-5.5 summarize a new codebase in ten minutes is valuable if it saves an engineer from spending two hours wandering through stale abstractions, half-deleted modules, and “temporary” patterns from 2021.

But wait, you are going to use the twelve-month-old model because the token line looks better, then let the engineering team struggle through the difference, and call that financial discipline. No, that is just making the person absorb the machine’s weakness.

The formula is simple:

Total cost =
  model spend
  + (human attention hours x loaded hourly rate)
  + delay cost
  + rework risk

Or, if you want the executive version:

Use the stronger model when:
  frontier token premium
  < (human hours saved x loaded hourly rate) + delay avoided + risk reduced

Now put numbers on it.

Frontier model:
$40 model run
+ (0.25 human hours x $90 loaded hourly rate)
= $62.50 total before delay

Cheap model:
$4 model run
+ (2.00 human hours x $90 loaded hourly rate)
= $184 total before delay

Congratulations, you saved $36 in tokens and spent $121.50 more overall.
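If you would rather check the arithmetic than trust it, here is a minimal Python sketch of the total-cost formula using the same numbers. The `RunCost` class and its field names are illustrative assumptions. Delay cost and rework risk default to zero because the comparison above is stated "before delay".

```python
from dataclasses import dataclass


@dataclass
class RunCost:
    model_spend: float          # dollars paid for the model run
    human_hours: float          # human attention hours the run required
    loaded_hourly_rate: float   # fully loaded dollars per human hour
    delay_cost: float = 0.0     # estimated dollars lost to waiting
    rework_risk: float = 0.0    # estimated dollars of expected rework

    def total(self) -> float:
        return (
            self.model_spend
            + self.human_hours * self.loaded_hourly_rate
            + self.delay_cost
            + self.rework_risk
        )


frontier = RunCost(model_spend=40.0, human_hours=0.25, loaded_hourly_rate=90.0)
cheap = RunCost(model_spend=4.0, human_hours=2.0, loaded_hourly_rate=90.0)

print(f"Frontier total: ${frontier.total():.2f}")                          # $62.50
print(f"Cheap total:    ${cheap.total():.2f}")                             # $184.00
print(f"Token savings:  ${frontier.model_spend - cheap.model_spend:.2f}")  # $36.00
print(f"Extra overall:  ${cheap.total() - frontier.total():.2f}")          # $121.50
```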

If two engineers sit in the conversation because the weak model produced something almost right, double the labor line. If the work waits until tomorrow because the team ran out of afternoon, add the delay cost. If the CVE stays open another day, add the risk you are pretending not to price.
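The executive version of the rule, with the delay and risk terms from this paragraph, fits in one function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, all amounts are in dollars, and the delay and risk figures are whatever estimates your business is willing to defend. The $36 premium and the 1.75 hours saved fall out of the example above ($40 versus $4, and 2.00 versus 0.25 human hours).

```python
def use_stronger_model(
    frontier_token_premium: float,
    human_hours_saved: float,
    loaded_hourly_rate: float,
    delay_avoided: float = 0.0,
    risk_reduced: float = 0.0,
) -> bool:
    """Return True when the extra token spend is smaller than what it buys back."""
    value_bought = (
        human_hours_saved * loaded_hourly_rate + delay_avoided + risk_reduced
    )
    return frontier_token_premium < value_bought


# One engineer: a $36 premium against 1.75 hours saved at $90 per hour.
print(use_stronger_model(36.0, 1.75, 90.0))   # True, since 36 < 157.50

# Two engineers in the conversation: double the labor line.
print(use_stronger_model(36.0, 3.5, 90.0))    # True, since 36 < 315.00
```

The rule only flips toward the cheaper model when the hours saved are close to zero and nothing is waiting on the work, which is exactly the narrow, easy-to-verify work where a cheaper model belongs.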

This is why token-only economics is so seductive: the token bill is the one line you can see, and staring at it lets you ignore the bill you are actually paying.


Ask Artie and Camille. Between them, they worked with Dijkstra and Hopper, and they wrote code on paper because the machine was too expensive to use as a notebook.

You thought before you touched the computer. You wrote the program out. You punched the cards. Then you waited for your time on the machine. If you were wrong, you went back through the queue.

That was not because people were smarter then. It was because the machine was scarce.

Then software economics flipped. Laptops got cheap. Editors became free. Compilers were just there. Cloud compute felt like electricity in the wall.

Now the machine can cost real money again, which feels strange inside software but normal almost everywhere else. The aircraft costs more than the pilot’s hourly wage. The MRI machine costs more than the technician. The CNC mill costs more than the operator.

Nobody serious says, “Use the cheaper machine that ruins the part because the electricity line looks better.”

This is not a crisis. It is compute becoming economically visible again.


This does not mean “use the most expensive model for everything,” because that is lazy in the other direction.

Some work belongs on cheaper models: formatting, summaries, simple transforms, narrow code edits, deterministic refactors with good tests, and classification where failure is cheap.

Use the cheaper engine when the road is flat and the cargo is light. Do not put the cheapest model on ambiguous work and then declare agents are not ready.

Architecture discovery is not formatting. Legacy code repair is not a summarization task. A migration that touches authentication, billing, permissions, and production data is not the place to celebrate that you saved forty cents on inference.

The model is part of the delivery system, and if it cannot hold enough context, reason through enough uncertainty, or recover from enough failure to finish the job, then its cheapness is fake.

You are not buying tokens; you are buying completed work, reduced cycle time, fewer handoffs, and less human attention trapped in machine management.

Price that.


The value dashboard is simple: track the outcome requested, whether the agent finished it, human attention required, calendar time removed, and production risk created or reduced.

Now the token number has context: a run that finishes a two-day cleanup overnight is cheap, a run that burns half a day of developer correction is expensive, and a run that safely completes a migration, updates tests, writes the rollout notes, and leaves a clean review path might be one of the cheapest engineering transactions you make all week.
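One row of that dashboard could look like the Python sketch below. The record shape, the field names, and the `dashboard_row` helper are assumptions for illustration, not a schema from any tool. The example run reuses the $40 run, 0.25 human hours, and $90 loaded rate from the earlier arithmetic.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    outcome_requested: str
    finished: bool
    token_spend: float             # dollars
    human_attention_hours: float
    calendar_days_removed: float   # cycle time pulled forward; negative if delayed
    risk_note: str                 # production risk created or reduced


def dashboard_row(run: AgentRun, loaded_hourly_rate: float) -> dict:
    """Put the token number next to the context that gives it meaning."""
    attention_cost = run.human_attention_hours * loaded_hourly_rate
    return {
        "outcome": run.outcome_requested,
        "finished": run.finished,
        "token_spend": run.token_spend,
        "attention_cost": attention_cost,
        "total_cost": run.token_spend + attention_cost,
        "calendar_days_removed": run.calendar_days_removed,
        "risk": run.risk_note,
    }


overnight = AgentRun(
    outcome_requested="two-day cleanup",
    finished=True,
    token_spend=40.0,
    human_attention_hours=0.25,
    calendar_days_removed=2.0,
    risk_note="none created",
)
row = dashboard_row(overnight, loaded_hourly_rate=90.0)
print(row["total_cost"], row["calendar_days_removed"])  # 62.5 2.0
```

Token spend is still on the row. It is just no longer the only column.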

Even a $250,000 agent run can be cheap if it replaces a $250,000 consulting engagement and leaves working code, tests, migration notes, review history, and internal learning. That is the same money staying inside the system instead of leaving as a slide deck and three partner readouts.

The unit is not tokens; the unit is flow.


So yes, watch the token bill, but do not worship it.

Set policy by work type. Use smaller models where the work is narrow, reversible, and easy to verify. Use stronger models where the work is ambiguous, high-context, multi-step, or expensive for a human to supervise.

Measure human attention as a cost. Measure calendar compression as a benefit. Measure completed outcomes, not just inference spend.
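As a starting point, the policy by work type can be written down as a small routing function. This is a minimal sketch: the `WorkItem` fields and the `pick_tier` name are hypothetical, and the fallback to the stronger tier for anything that is not clearly narrow, reversible, and verifiable is an assumption in the spirit of the argument, not a rule stated above.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    narrow: bool                       # small, well-defined scope
    reversible: bool                   # failure is cheap to undo
    easy_to_verify: bool               # for example, good tests exist
    multi_step: bool = False
    high_context: bool = False
    costly_to_supervise: bool = False  # expensive for a human to babysit


def pick_tier(item: WorkItem) -> str:
    """Return "stronger" or "smaller" for a piece of work."""
    if item.multi_step or item.high_context or item.costly_to_supervise:
        return "stronger"
    if item.narrow and item.reversible and item.easy_to_verify:
        return "smaller"
    # Assumption: when the work is ambiguous, do not starve it.
    return "stronger"


formatting = WorkItem(narrow=True, reversible=True, easy_to_verify=True)
auth_migration = WorkItem(
    narrow=False, reversible=False, easy_to_verify=False,
    multi_step=True, high_context=True, costly_to_supervise=True,
)
print(pick_tier(formatting))       # smaller
print(pick_tier(auth_migration))   # stronger
```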

And when someone says the expensive model costs too much, ask the question software leaders have always had to ask: compared to what?

Compared to a cheaper model that fails? Compared to a senior engineer babysitting the agent all afternoon? Compared to a feature sitting in the backlog another sprint? Compared to a migration that never starts because the spreadsheet made the engine look expensive?

You can cross the ocean with wind if the cargo does not matter and the schedule is fiction.

But if you are carrying freight the business needs, stop bragging about the free breeze while the ship sits still.


The views and opinions expressed in this article are the author’s own and do not represent the positions of any employer, client, or affiliated organization.
