The Highest Tax in AI Coding: Token Micro-Economics

dehakuran.com · April 2026 · 3 min read

You are paying your agent to re-learn your codebase every morning. Every re-learning burns tokens. Every token costs real dollars and euros.

Yes, there are workarounds (claude-mem was my favorite). But most developers are unaware these workarounds exist.


Every Session Starts With Amnesia

You re-paste yesterday's context, the stack, the decision you made at 11pm. The agent re-reads it, re-tokenizes it, charges you for it. Multiply by every developer, every session, every day.
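To make "multiply by every developer, every session, every day" concrete, here is a back-of-envelope calculation. Every number in it is an illustrative assumption (token price, context size, team size), not a measured figure:

```python
# Back-of-envelope cost of re-priming an agent every session.
# All constants below are illustrative assumptions, not real pricing data.

INPUT_PRICE_PER_MTOK = 3.00   # assumed $ per 1M input tokens
CONTEXT_TOKENS = 40_000       # assumed size of the re-pasted context
SESSIONS_PER_DAY = 4
DEVELOPERS = 50
WORKDAYS_PER_YEAR = 220

cost_per_session = CONTEXT_TOKENS / 1_000_000 * INPUT_PRICE_PER_MTOK
annual_cost = cost_per_session * SESSIONS_PER_DAY * DEVELOPERS * WORKDAYS_PER_YEAR

print(f"${cost_per_session:.2f} per session")  # $0.12 per session
print(f"${annual_cost:,.0f} per year")         # $5,280 per year
```

Twelve cents per session sounds like nothing. Fifty developers re-pasting four times a day turns it into thousands a year, spent entirely on context the agent already saw yesterday.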

A coding agent isn't really a code generator. It's a file reader, a grep runner, a log parser, a reasoner — and then, finally, a code generator.

"Generation is the cheap part. The round-trips are the expensive part. And a large slice of that is waste."

The Data Is Uncomfortable

An ICLR-submitted OpenReview study on SWE-bench found that:

  • Input tokens dominate total cost, even with caching.
  • Token usage varies up to 10x between runs of the same task.

And the line that should be on every CFO's wall:

  • Higher token usage correlates with lower accuracy.
  • The agents that spend more get worse answers.

So I Built "Brief"

That is why I spent a couple of Sundays building Brief — a local tool that gives each AI coding agent a focused, persona-scoped brief instead of a raw memory dump.
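To show what "persona-scoped instead of a raw memory dump" means, here is a minimal sketch of the idea. This is not Brief's actual implementation; the `Note` type, persona tags, and token budget are all hypothetical stand-ins for the concept:

```python
# Sketch of a persona-scoped brief: filter stored notes by the agent's
# persona and stop at a token budget, rather than dumping all memory.
# Hypothetical illustration only -- NOT Brief's actual code.
from dataclasses import dataclass

@dataclass
class Note:
    persona: str   # e.g. "backend", "frontend", "infra"
    text: str
    tokens: int    # rough token estimate for this note

def build_brief(notes: list[Note], persona: str, budget: int) -> str:
    """Keep only notes tagged for this persona, newest first,
    stopping once the token budget is spent."""
    brief, spent = [], 0
    for note in reversed(notes):          # newest notes sit last in the log
        if note.persona != persona:
            continue
        if spent + note.tokens > budget:
            break
        brief.append(note.text)
        spent += note.tokens
    return "\n".join(reversed(brief))     # restore chronological order

notes = [
    Note("backend", "We chose Postgres over Mongo for billing.", 12),
    Note("frontend", "Design system tokens live in /ui/theme.", 10),
    Note("backend", "11pm decision: retry queue uses exponential backoff.", 14),
]
print(build_brief(notes, "backend", budget=30))
```

The frontend note never reaches the backend agent, and the budget caps the brief even when the log grows, which is the whole point: the agent wakes up primed, not flooded.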

It works across Claude Code, Codex, and Gemini CLI. And I used pixel-art office visuals to make it actually fun to look at.

Early benchmark: –35% cost, –44% wall-time, same test pass rate.


Why, When There Are So Many Tools Already?

Ehm — because it's fun. And because I can.

Happy Sunday.

AI Coding · Developer Tools · Cost Optimization

Deha Kuran

AI Executive, Engineer, and Evangelist. Head of AI Business Operations at Philips.

Follow the thinking on LinkedIn →