The Highest Tax in AI Coding: Token Micro-Economics

dehakuran.com · April 2026 · 3 min read

You are paying your agent to re-learn your codebase every morning. Each re-learning burns a lot of tokens, and every token costs real dollars and euros.

Yes, there are methods and tools that tackle this (claude-mem was my favorite). But many, many developers are unaware of them.

Every Session Starts With Amnesia

You re-paste yesterday's context, the stack, the decision you made at 11pm. The agent re-reads it, re-tokenizes it, charges you for it. Multiply by every developer, every session, every day.
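To make that multiplication concrete, here is a rough back-of-the-envelope sketch. Every number in it (team size, sessions per day, tokens re-read per session, price per million input tokens) is a hypothetical placeholder, not a measurement or a vendor's price list; swap in your own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextReloadScenario:
    # All fields are illustrative assumptions, not measured data.
    developers: int
    sessions_per_day: int
    workdays_per_month: int
    reload_tokens_per_session: int       # tokens spent re-reading known context
    usd_per_million_input_tokens: float  # hypothetical input-token price


def monthly_reload_tokens(s: ContextReloadScenario) -> int:
    """Tokens spent per month on re-supplying context the agent already saw."""
    return (
        s.developers
        * s.sessions_per_day
        * s.workdays_per_month
        * s.reload_tokens_per_session
    )


def monthly_reload_cost_usd(s: ContextReloadScenario) -> float:
    """Monthly cost of those re-read tokens at the assumed input price."""
    return monthly_reload_tokens(s) / 1_000_000 * s.usd_per_million_input_tokens


# Hypothetical team: 20 developers, 4 sessions a day, 21 workdays a month,
# 50,000 tokens of re-pasted context per session, $3 per million input tokens.
scenario = ContextReloadScenario(
    developers=20,
    sessions_per_day=4,
    workdays_per_month=21,
    reload_tokens_per_session=50_000,
    usd_per_million_input_tokens=3.0,
)

print(monthly_reload_tokens(scenario))    # 84000000
print(monthly_reload_cost_usd(scenario))  # 252.0
```

The point is the shape of the formula, not the total: the cost scales linearly with each factor, so trimming the re-read context per session is the one lever that does not require fewer developers or fewer sessions.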

A coding agent isn't really a code generator. It's a file reader, a grep runner, a log parser, a reasoner — and then, finally, a code generator.

"Generation is the cheap part. The round-trips are the expensive part. And a large slice of that is waste."

The Data Is Uncomfortable

An ICLR-submitted OpenReview study on SWE-bench found that:

Input tokens dominate total cost, even with caching.
Token usage varies up to 10x between runs of the same task.

And the line that should be on every CFO's wall:

Higher token usage correlates with lower accuracy.
The agents that spend more get worse answers.

So I Built "Brief"

That is why I spent a couple of Sundays building Brief — a local tool that gives each AI coding agent a focused, persona-scoped brief instead of a raw memory dump.
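The difference between a persona-scoped brief and a raw memory dump can be illustrated with a minimal sketch. This is not Brief's actual implementation, which the post does not describe; the personas, tags, `MemoryNote` type, `build_brief` function, and word-count token estimate are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryNote:
    text: str
    tags: frozenset[str]


# Hypothetical personas and the topics each one needs to know about.
PERSONA_TAGS = {
    "backend": frozenset({"api", "database"}),
    "frontend": frozenset({"ui", "css"}),
}


def estimate_tokens(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    # Real tokenizers count differently.
    return len(text.split())


def build_brief(notes: list[MemoryNote], persona: str, token_budget: int) -> list[str]:
    """Keep only notes relevant to the persona, stopping at the token budget."""
    wanted = PERSONA_TAGS[persona]
    selected: list[str] = []
    used = 0
    for note in notes:
        if not (note.tags & wanted):
            continue  # irrelevant to this persona: never sent to the agent
        cost = estimate_tokens(note.text)
        if used + cost > token_budget:
            break  # budget exhausted: stop instead of dumping everything
        selected.append(note.text)
        used += cost
    return selected


notes = [
    MemoryNote("Use Postgres for the orders service", frozenset({"database"})),
    MemoryNote("Buttons use the shared design tokens", frozenset({"ui"})),
    MemoryNote("REST endpoints are versioned under /v2", frozenset({"api"})),
]

backend_brief = build_brief(notes, "backend", token_budget=50)
print(backend_brief)
```

A raw dump would send all three notes to every agent; here the backend persona receives only the two notes tagged for it, and a tight budget cuts the brief short rather than overflowing. Brief itself may work quite differently; the sketch only illustrates the scoping idea.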

It works across Claude Code, Codex, and Gemini CLI. And I used pixel-art office visuals to make it actually fun to look at.

Early benchmark: −35% cost, −44% wall-time, same test pass rate.

Why, When There Are So Many Tools Already?

Ehm — because it's fun. And because I can.

Happy Sunday.

Frequently Asked Questions

Why are AI coding agents so expensive to run?

They re-read your codebase every session. Input tokens dominate total cost — even with caching — according to an ICLR-submitted SWE-bench study, and token usage varies up to 10x between runs of the same task. You are paying to re-learn context that already existed the day before.

Does spending more tokens produce better AI coding results?

No. The same SWE-bench analysis found that higher token usage correlates with lower accuracy. The agents that spend more get worse answers, not better, which is the opposite of what intuition suggests.

How can teams cut AI coding agent costs without losing quality?

Give each agent a focused, persona-scoped brief instead of a raw memory dump. In an early benchmark, the 'Brief' tool, which works across Claude Code, Codex, and Gemini CLI, delivered −35% cost and −44% wall-time at the same test pass rate.


Deha Kuran

AI Executive, Engineer, and Evangelist. Head of AI Business Operations at Philips.
