How to optimise token usage in Claude Code without burning through your subscription

I've been using AI coding tools daily for quite a few months now. I started with Claude Code and the Max subscription for client work with Drupal. Then I switched to OpenCode, because it let me connect cheaper models and gave me more control over spending. And when Anthropic cut off the ability to use the subscription through OpenCode, I went back to Claude Code for the heavy lifting and kept OpenCode for routine tasks with cheaper models. Some months the bill has been higher than it should've been, and through experimenting and reading up I've gradually picked up a set of practices that reduce token consumption without losing quality in the results.

After years working with development tools, I've rarely seen such a big difference between using them well and using them badly as with AI coding agents.

Over the past few months I've seen quite a lot of people in Telegram groups, on Discord, Reddit and Twitter complaining about the same thing: they're burning too many tokens, the subscription isn't enough, the numbers don't add up. I'm still burning tokens. More than before, actually, because I now have automated processes running that consume quite a lot. But I think I'm controlling it reasonably well. I've been reading what other people do, trying out configurations, and bit by bit I've put together my own way of working.

What I'm sharing here are the practices that work for me today, with the current models and versions of Claude Code and OpenCode. I recently published an article explaining how I've set up my full AI development environment in DDEV, and following that I thought it'd be worth sharing the tips I've picked up over time to spend fewer tokens, which ultimately means the subscription lasts longer or the API bill is lower. This applies in a very similar way to both OpenCode and Claude Code, although in this article I'm going to focus more on Claude Code. Fair warning: I wouldn't rule out some of this becoming obsolete in a few months with new models or updates. But as of today, this is what works for me.

Fewer MCPs, more context

Every MCP server you've got connected takes up space in the context window. On every message you send. Whether you're using it for that particular task or not. If you've got several MCPs loaded at the same time, you're eating up a huge chunk of the window before you've even started working.

When I first started with Claude Code, MCPs consumed a ridiculous amount of context. In a later update they optimised how that information is loaded and managed to reduce consumption by 80% to 85% for MCPs you're not actually using. Now they only load the tool names at startup and download the details when Claude needs a specific tool. That helped a lot, but it doesn't remove the underlying problem.

And it's happened to me too, same as it happens to a lot of people. At first I kept enabling and trying MCPs: GitHub, Slack, Linear, Playwright... They stayed there installed even though I wasn't using them. Until I realised that for my day-to-day work, which is web development with Drupal, I only need one: Playwright. I use it so the AI can interact with my local site, log in as a user, take screenshots, view the HTML, check Twig comments, and verify that backend forms work as they should.

My solution was the simplest possible: uninstall all the MCPs I'd accumulated and keep only Playwright. Not disable, not temporarily turn off. Uninstall. If one day I need another MCP for something specific I'll install it, use it, and remove it. Having them there "just in case" is throwing tokens away.
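In Claude Code that cleanup is a couple of commands. The server names below are examples from my own setup; swap in whatever you've accumulated:

```shell
# See which MCP servers are configured
claude mcp list

# Remove the ones you don't actually use day to day
claude mcp remove github
claude mcp remove slack
claude mcp remove linear

# If you need one back for a specific task, add it, use it, remove it
claude mcp add playwright -- npx @playwright/mcp@latest
```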

Short conversations, always

This was the hardest change for me to adopt. My instinct was to keep a conversation open all week because that way Claude "knew what the project was about" and I didn't have to explain everything again.

Wrong.

Anthropic's API has no memory between calls. Every time you send a message, the entire conversation is re-sent from the beginning: the system prompt, all the tools, your configuration, and the full message history. The cost grows with each turn. A conversation that starts light can be consuming enormous amounts of tokens by turn 15 or 20.
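The effect is easy to see with a toy calculation. The token figures here are made-up round numbers, not measurements; the point is the shape of the curve:

```python
# Rough sketch: why long conversations get expensive.
# Assumption: every call re-sends a fixed overhead (system prompt,
# tools, configuration) plus the entire history so far.

OVERHEAD = 20_000   # tokens re-sent on every call (illustrative)
PER_TURN = 2_000    # new tokens each exchange adds (illustrative)

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed across a conversation of `turns` calls."""
    total = 0
    history = 0
    for _ in range(turns):
        total += OVERHEAD + history   # the whole history is re-sent
        history += PER_TURN           # this exchange joins the history
    return total

print(cumulative_input_tokens(5))    # a short session
print(cumulative_input_tokens(20))   # the same chat at turn 20
```

Because the history term is re-billed on every turn, total input tokens grow quadratically with conversation length, which is why turn 20 costs so much more than turn 5.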

When the conversation takes up too much, roughly around 80% of the window, Claude Code compacts automatically. It summarises everything to free up space. The problem is that summary loses important information. It's happened to me that after compacting, Claude "forgot" that an approach had already failed, tried it again, and failed again. On top of the quality drop, compaction also costs you more tokens. That's where things go wrong for a lot of people.

What I do is never reach that point. When I see the context window is past 50%, I start thinking about opening a new conversation. I close, start fresh, and if I need Claude to know something from the previous session, I pass it a quick summary of where things stood. I know it might seem tedious, but the alternative is worse: letting the conversation bloat, having compaction kick in, watching code quality drop and paying more on top of it.

The cache you lose when you sleep

This is closely tied to the previous point, but it deserves its own section because a lot of people don't know about it. If you work on a conversation in the afternoon and pick it up the next morning, you've lost the cache.

Anthropic caches the token prefixes that repeat between calls, and reading from cache costs a fraction of the normal price. But that cache has a short lifespan: by default it expires after about five minutes without activity. When you resume an old conversation, all that context has to be reprocessed from scratch at full price. You end up paying for the same conversation twice: yesterday's, and today's again to rebuild the cache.
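To get a feel for the difference, here's a back-of-the-envelope sketch. The 10% cache-read and 125% cache-write multipliers follow Anthropic's published prompt-caching pricing; the base price and context size are illustrative:

```python
# Resuming a stale conversation vs. one with a live cache.
# Assumptions: cache reads cost ~10% of the base input price,
# cache writes ~125% of it; base price is Sonnet-class ($3/M tokens).

BASE = 3.00 / 1_000_000      # $ per input token (illustrative)
CACHE_READ = 0.10 * BASE     # re-reading previously cached tokens
CACHE_WRITE = 1.25 * BASE    # reprocessing and re-caching from scratch

def next_message_cost(context_tokens: int, cache_alive: bool) -> float:
    """Cost of re-sending `context_tokens` of history on the next call."""
    if cache_alive:
        return context_tokens * CACHE_READ
    return context_tokens * CACHE_WRITE

ctx = 100_000  # a fat conversation picked up the next morning
print(f"cache alive:   ${next_message_cost(ctx, True):.2f}")
print(f"cache expired: ${next_message_cost(ctx, False):.2f}")
```

With the cache gone you're paying roughly an order of magnitude more just to get back to where you left off.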

The recommendation is simple: start a new conversation each day. Don't try to pick up where you left off yesterday.

Opus isn't for everything

Claude Code lets you have subagents and specify which model each one uses. This is key, because not everything needs to run on the most expensive model.

I use Opus for the main agent, which orchestrates and makes the complex decisions: architecture design, debugging difficult problems. For most development work I use Sonnet, which delivers practically the same result at a fraction of the cost. And then there are tasks I know Haiku handles well, like searching code in the project or interacting with the browser through Playwright. Those tasks tend to consume a lot of tokens, because every interaction generates loads of context, and I want those to be cheap tokens on a model that's smart enough without being excessively expensive.

The good part is you can configure each agent to use its model automatically. The main agent on Opus delegates to a development agent on Sonnet, which in turn calls another agent on Haiku to test in the browser that what's been built actually works. Different agents, different models, and consumption optimises itself without you having to switch models manually.
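As a sketch of what that per-agent configuration looks like: Claude Code subagents are Markdown files under .claude/agents/ with YAML frontmatter, and the model field pins the model. The name, description and prompt below are made up for illustration:

```markdown
---
name: browser-tester
description: Verifies in the browser, via Playwright, that a freshly
  built feature actually works. Use after development tasks.
model: haiku
---

Log in to the local site, exercise the feature you were asked to
check, take screenshots where useful, and report back a short
pass/fail summary with any errors you saw.
```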

At the beginning I used Opus for absolutely everything, including stuff this mechanical, and when I looked at the numbers after the first month I couldn't believe it. The difference between doing a refactor with Opus and doing it with Sonnet can be $5 versus less than 50 cents. On a single task it doesn't seem like much, but over the course of a month it adds up.

On top of this, in OpenCode I use open source models like Qwen or Kimi 2.5, which you can run locally or through third-party servers at a much lower price than any Anthropic model. Anthropic's models are generally smarter and better, but for routine and relatively easy tasks you don't need that power, and the cost difference is enormous.

The context you don't share

There's another benefit to subagents that goes beyond model savings and that few people mention.

Each subagent works with its own context window, separate from the main conversation. When it finishes its task, it only returns a summary with the results. All the intermediate work it did, the files it read, the searches, the tests it ran, stays inside the subagent and doesn't pollute your main conversation.

What does this mean in practice? If you delegate heavy tasks to subagents, the main conversation stays clean for much longer. You take longer to fill the window, you delay or outright avoid compaction, and you can have longer, deeper sessions without quality degrading. It's one of those things that once you understand it, it changes how you organise your work quite a bit.

Skills instead of loose instructions

I see this point less in conversations about token optimisation, but for me it's quite important.

Skills are reusable instructions that Claude only loads when it needs them. Unlike the project's general configuration, which is sent on every call, skills load on demand. If you've got detailed instructions on how to deploy or how to run migrations, instead of putting them in the main configuration file where they'll take up space all the time, you put them in a skill and they only load when needed.

It's better to have a few well-written skills than a bunch of vague instructions stuffed into the general configuration. And while you're at it, have a few agents on Sonnet or Haiku that know how to use those skills. You don't need twenty agents. Four or five well-defined ones, each with the right model, is more than enough.
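As an illustration, a skill is a Markdown file with YAML frontmatter, e.g. .claude/skills/drupal-migrations/SKILL.md. The commands below are standard Drush migrate commands in a DDEV setup; treat the project specifics as assumptions:

```markdown
---
name: drupal-migrations
description: How to run and roll back Drupal migrations in this
  project. Use when the task involves migrate commands.
---

Check status first with `ddev drush migrate:status`. Run a migration
with `ddev drush migrate:import <id>` and roll it back with
`ddev drush migrate:rollback <id>`. Never run imports against the
production database alias.
```

The description in the frontmatter is what Claude sees all the time; the body only loads when the skill is actually invoked, which is where the saving comes from.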

A CLAUDE.md that doesn't weigh you down

Related to the above. The CLAUDE.md file is sent on every API call. Every word you put in there you're paying for in every message.

I've seen CLAUDE.md configurations of 2,000 lines with absolutely everything about the project documented inside. That's throwing money away. My approach is to treat it as an index: only the essentials, what Claude keeps getting wrong so it doesn't do it again, and pointers to more detailed docs for when they're needed. The shorter, the better.
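For what it's worth, an index-style CLAUDE.md ends up looking roughly like this. The contents are an illustrative sketch for a Drupal project, not a template to copy verbatim:

```markdown
# Project notes for Claude

- Drupal site; custom code lives in web/modules/custom.
- Run everything through DDEV: `ddev drush`, `ddev composer`.
- Run coding-standards checks before declaring a task done.
- Never edit files under web/core or contrib modules.
- Deploy and migration details live in their skills; don't improvise.
```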

Max vs API, doing the maths

I've been asked several times what's better value, the Max subscription or going with API and paying per use. It depends entirely on how much you use it.

If you use Claude Code three or four times a week for specific things, the API works out cheaper. You'll probably spend less than $10 a month. But if you use it daily for several hours, the subscription is worth it by far. I've seen cases of people who with intensive use would've paid thousands of dollars on API and with the Max subscription it stays at $100 or $200 a month. Realistic? Absolutely. There's a documented case of a user who consumed 10 billion tokens in 8 months. On API that'd be over $15,000. With Max they paid $800.
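A rough sanity check on those numbers, under simplified assumptions: Sonnet-class pricing, an assumed 80/20 input/output split, and no cache or batch discounts. Real traffic with heavy cache hits comes out far lower, which is why the figure quoted above is smaller than this upper bound:

```python
# Upper-bound estimate of API cost for a given token volume.
# Assumptions: $3/M input, $15/M output (Sonnet-class), 80% of
# tokens are input, and nothing is served from cache.

INPUT_PRICE = 3.00    # $ per million input tokens
OUTPUT_PRICE = 15.00  # $ per million output tokens

def api_cost(total_tokens: float, input_share: float = 0.8) -> float:
    millions = total_tokens / 1_000_000
    return (millions * input_share * INPUT_PRICE
            + millions * (1 - input_share) * OUTPUT_PRICE)

tokens = 10_000_000_000  # the 10-billion-token case over 8 months
print(f"API, no caching: ${api_cost(tokens):,.0f}")
print(f"Max subscription over 8 months: ${100 * 8:,} to ${200 * 8:,}")
```

Either way you slice the assumptions, intensive use on pure API lands in the tens of thousands while the subscription stays in the hundreds.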

In my case I use Max with Claude Code for day-to-day interactive work, and API through OpenCode for automated tasks and code quality loops I can leave running unsupervised.

The two most expensive mistakes

The reality is that most unnecessary spending comes from two places.

The first: using Claude Code with the default configuration, no agents, no skills, everything running on Opus. It's the most expensive setup possible and it gives you no real advantage over having well-configured agents with appropriate models for each task.

The second: having a single conversation per project and reusing it for everything over days or weeks. Compaction will kick in constantly, response quality keeps dropping as context accumulates, and you end up paying much more than you should for the same work.

What I've taken away

After months going from Claude Code to OpenCode and back, trying configurations and looking at the numbers, what I've learned is that the difference between spending little and spending a lot isn't about using the tool less. It's about how you configure it and how you manage conversations. Most of the spend goes on overhead: context being re-sent over and over, MCPs you're not using, conversations that should've been closed ages ago.

If you're only going to change one thing after reading this, make it setting Sonnet as the default model and leaving Opus only for tasks that genuinely need it. Do that alone and you'll notice the difference at the end of the month.

PS: If you work with Drupal and use DDEV, I'd recommend checking out the article where I explain how I've set up my full AI development environment. There I cover the architecture in detail, how the containers communicate, and why I've got two CLIs installed at the same time. I've also published the ddev-ai-workspace repository, which with a single command installs OpenCode, Claude Code, Playwright as an MCP, an autonomous orchestrator for unsupervised tasks, and 10 specialised Drupal agents with their skills and per-agent model configuration. It's completely free, the only thing you need is a Claude Code subscription or to pay for APIs to use the AI tools. If you're interested in trying it or have any questions, feel free to get in touch.

Need a Drupal Expert?

Senior Drupal developer, freelance, specialized in what's hardest: migrations, multilingual sites, SaaS platforms and Stripe integration. I leverage AI to cut delivery times and costs, with expert review on every line of code.

No agency, no middlemen. Direct contact with the person who does the work.