Monday morning I reran a task that had come out perfect on Friday. Same task, same prompt, same model. And suddenly the thing couldn't get anything right. What it had nailed on the first try on Friday, it couldn't manage on Monday even after two or three attempts. Nothing had changed on my end. Something had changed on theirs, and nobody told me.
This has happened to me a fair few times now, and every time it leaves the same aftertaste. It's not that the model is bad. It's that I don't control when it stops being good. And when you build a business on top of a tool that can go dumb overnight without warning, you've got a problem that isn't technical. It's structural.
The problem isn't the model, it's depending on it
Let's separate two things people tend to mix up. One is which model is best, Opus or GPT or whatever. That debate barely interests me. The other is who you depend on to keep that model running, and that one does keep me up at night.
Right now the serious AI market is in two or three hands: OpenAI with GPT, and Anthropic with Claude (Opus, Sonnet and the rest). If your workflow lives inside one of those companies, so does your business. They raise the price and you pay it. They pull a model from the subscription and you're forced onto the API with a brutal cost bump. They decide your use case no longer suits them and you're left out.
And then there's the layer on top: the government. These are all US companies, subject to the decisions of a single country. The US can restrict access to those models whenever it decides to, for certain countries, for certain sectors, for whatever the reason of the month is. You don't need to be paranoid to see the risk. You just need to have lived through a couple of shifts in trade policy to know it happens, and that nobody's going to ask your opinion first.
I work as a freelance Drupal developer, subcontracting for agencies and working with direct clients of my own. If tomorrow I can't deliver the work because my AI provider is down, has raised prices, or has simply cut off my access, that's my problem, not Anthropic's. I'm the one sending the invoice. So the question isn't which model is best. The question is how I get out of a position where somebody else's decision leaves me stranded.
The silent lobotomy
This deserves its own section because it's about as frustrating as it gets. One day the model works beautifully and the next it's like they've scooped out half its brain. Same name, same label, and yet the answers are noticeably worse.
What's usually going on underneath, as far as anyone on the outside can tell, is that the provider has changed the version of the model it serves, often to a quantized one. Quantizing means reducing the numerical precision of the model's weights (for example, from 16-bit floating point down to 8-bit or 4-bit integers) so the model takes up less memory and runs more cheaply. The provider saves money on infrastructure and you eat the weaker responses without anyone telling you. The model keeps the same name, but it isn't the same model.
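To make "reducing the precision" concrete, here is a toy Python sketch of symmetric 8-bit quantization. It uses a single scale for the whole weight vector. This illustrates the general technique only. It is not how any particular provider quantizes its models; real schemes use per-channel or per-group scales, 4-bit formats and calibration data. Notice how the smallest weights collapse to zero. That lost detail is the kind of thing that can surface as subtly worse answers.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization with one scale for the whole vector."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; guard against an all-zero vector.
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.8123, -0.0031, 0.4507, -1.2702, 0.0009]
    codes, scale = quantize_int8(weights)
    for w, r in zip(weights, dequantize(codes, scale)):
        # Each restored weight is off by at most scale / 2; tiny weights become 0.
        print(f"{w:+.4f} -> {r:+.4f}  (error {abs(w - r):.4f})")
```

Each weight now needs one byte instead of two or four. The price is a rounding error of up to half a quantization step on every weight.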
With closed models you have no defense against this at all. You don't know which version they're serving you today or whether it's the same one as yesterday. You can't pin it. You just notice something's gone dumb and you're stuck rewriting prompts that had been working for months, as if the problem were yours.
The cost you don't see
The price per token drops and everyone claps. But the real cost isn't the price per token, it's how many tokens you burn to finish a task. And that's where the trap is.
Newer models tend to reason more, give longer answers and generate longer conversations. The advertised price per token goes down while total consumption shoots up. At the end of the month the bill hasn't gone down, or it has gone up, even though each individual token is cheaper. It's a hidden cost that never shows up in any launch announcement.
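A quick back-of-the-envelope calculation shows how this plays out. The prices and token counts below are made up purely for illustration. The per-token price drops by a third, but each task burns 2.25 times as many tokens.

```python
def monthly_cost(price_per_million: float, tokens_per_task: int, tasks: int) -> float:
    """Dollars per month = price per token x tokens per task x number of tasks."""
    return price_per_million * tokens_per_task * tasks / 1_000_000


# Hypothetical figures: the same 500 tasks a month, before and after a model change.
old_bill = monthly_cost(price_per_million=15.0, tokens_per_task=40_000, tasks=500)
new_bill = monthly_cost(price_per_million=10.0, tokens_per_task=90_000, tasks=500)

print(f"old model: ${old_bill:.2f}")   # pricier per token, cheaper per task
print(f"new model: ${new_bill:.2f}")   # cheaper per token, pricier per task
```

With these assumed numbers the monthly bill goes from $300 to $450, a 50% increase, despite the cheaper tokens.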
And then there's the more direct stuff. They raise prices, just like that. They pull a model from the subscription and push you onto the API, which costs several times more. They cap a plan that used to be plenty. Each of those moves is legitimate from their point of view; they're a business. The problem is when your business has no alternative and you're stuck swallowing all of them.
My antidote: a gateway with fallback
The underlying idea is simple. I don't want a separate API for every provider. I want one place where all my APIs live, and I want my agents to talk only to that place. In my case that's LiteLLM. I tried it back in the day, liked it and stuck with it, but there are other AI proxies that do the same thing, so hold on to the concept more than the brand.
It exposes a single API to your agents. Underneath, you configure all your external providers, plus the connection to your own servers. Your agents don't know or care what's behind it. They request a model, and LiteLLM routes the request to whichever backend is configured to serve it.
The genuinely good part is the fallback. For each model you can tell it what its plan B is. If the connection to one provider fails, another one you've configured takes over, without the agent noticing. You can have Anthropic's API as your primary and Amazon Bedrock as backup, serving the same Opus and Sonnet but from a different provider. If one goes down, the other answers. The AI you were expecting keeps replying.
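The fallback logic itself is simple enough to sketch in a few lines of Python. This is not LiteLLM's API or configuration format. It is a hypothetical illustration of the idea: each model alias maps to an ordered list of backends, the first one is the primary, and a failure moves on to the next. The backends here are stub functions; in a real gateway they would be HTTP calls to each provider.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a backend when a call fails (outage, rate limit, timeout...)."""


# A backend takes a prompt and returns the completion text.
Backend = Callable[[str], str]


@dataclass
class Gateway:
    # Model alias -> ordered (provider name, backend) pairs; the first is the primary.
    routes: dict[str, list[tuple[str, Backend]]]

    def complete(self, model: str, prompt: str) -> tuple[str, str]:
        """Try each backend in order and return (provider, answer) from the first that works."""
        errors: list[str] = []
        for provider, backend in self.routes[model]:
            try:
                return provider, backend(prompt)
            except ProviderError as exc:
                errors.append(f"{provider}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(errors))


# Usage with stub backends standing in for real provider calls.
def anthropic_api(prompt: str) -> str:
    raise ProviderError("503 Service Unavailable")


def bedrock(prompt: str) -> str:
    return f"answer to {prompt!r}"


gateway = Gateway(routes={"sonnet": [("anthropic", anthropic_api), ("bedrock", bedrock)]})
print(gateway.complete("sonnet", "refactor this hook"))
```

Here the agent asked for `sonnet` and got an answer from the backup without ever knowing the primary had failed. A real proxy typically adds what this sketch leaves out, such as timeouts, retries with backoff, and a cooldown so a failing provider isn't hammered on every request.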
With open source models this gets even more powerful. DeepSeek, Kimi, Qwen and the rest are served by loads of different providers. You can set the one you like best as default, the cheapest, or the one with better cache handling, and leave a secondary for the same model at a similar price. Even if your main provider is having a bad day, the answer still comes through.
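When several providers serve the same open model, the right default depends on what a task actually costs you, not on the headline price. The sketch below ranks offers by the cost of a typical agent request and takes the prompt-cache hit rate into account. The provider names and prices are hypothetical. Agent workloads resend long contexts, so a provider with cheap cached input can win even with a higher list price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One provider's (hypothetical) prices for the same open model, in $ per million tokens."""
    provider: str
    input_price: float
    cached_input_price: float
    output_price: float

    def request_cost(self, input_tokens: int, output_tokens: int, cache_hit: float) -> float:
        """Cost of one request when a `cache_hit` fraction of the input comes from the prompt cache."""
        cached = input_tokens * cache_hit
        uncached = input_tokens - cached
        return (uncached * self.input_price
                + cached * self.cached_input_price
                + output_tokens * self.output_price) / 1_000_000


def rank(offers: list[Offer], input_tokens: int, output_tokens: int, cache_hit: float) -> list[Offer]:
    """Cheapest first: index 0 is the default, index 1 the fallback, and so on."""
    return sorted(offers, key=lambda o: o.request_cost(input_tokens, output_tokens, cache_hit))


offers = [
    Offer("provider-a", input_price=0.30, cached_input_price=0.30, output_price=1.20),
    Offer("provider-b", input_price=0.40, cached_input_price=0.04, output_price=1.20),
]
# A typical agent turn: long resent context, short answer, most of the prompt cached.
for offer in rank(offers, input_tokens=200_000, output_tokens=5_000, cache_hit=0.8):
    print(offer.provider, f"${offer.request_cost(200_000, 5_000, 0.8):.4f}")
```

With an 80% cache hit rate, `provider-b` costs about 2.8 cents per request against about 6.6 cents for `provider-a`, even though its uncached input price is higher. The ranked list doubles as the primary/secondary order for the fallback.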
And here's the important thing about open source models: they don't go dumb. The weights are published, so a given version is always the same model. If a provider quantizes it on you and the answers get worse, the fix is stupidly simple: you switch providers. And if you can afford it, you host it yourself.
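Noticing that a provider has degraded is the harder part. One approach is a small canary suite. You keep a handful of prompts whose answers can be checked mechanically, run them periodically against each provider, and compare the pass rate with a baseline measured when things were working. Every detail in the sketch below is hypothetical, including the prompts, the threshold and the provider names. A serious version would use many more cases and allow for nondeterministic outputs.

```python
from typing import Callable

Backend = Callable[[str], str]

# Hypothetical canaries: prompts whose correct answer can be checked by exact match.
CANARIES = {
    "What is 17 * 23? Reply with the number only.": "391",
    "Reverse the string 'drupal'. Reply with the result only.": "lapurd",
    "Is 97 a prime number? Reply yes or no.": "yes",
}


def pass_rate(backend: Backend) -> float:
    """Fraction of canaries the backend answers correctly (case and surrounding whitespace ignored)."""
    hits = sum(1 for prompt, expected in CANARIES.items()
               if backend(prompt).strip().lower() == expected)
    return hits / len(CANARIES)


def pick_provider(providers: dict[str, Backend], baseline: float, tolerance: float = 0.1) -> str:
    """Return the first provider, in preference order, whose pass rate is close to the baseline."""
    for name, backend in providers.items():
        if pass_rate(backend) >= baseline - tolerance:
            return name
    raise RuntimeError("every provider is below the baseline")


# Usage with stubs: the preferred provider has degraded, the second still answers well.
def degraded(prompt: str) -> str:
    return "I'm not sure"


def healthy(prompt: str) -> str:
    return CANARIES[prompt]


print(pick_provider({"provider-a": degraded, "provider-b": healthy}, baseline=1.0))
```

Providers are tried in preference order. If the preferred one starts failing its canaries, the next one that still meets the baseline takes over.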
That last bit is what I'm doing with Qwen. I've set up a server where I run the simpler AI tasks, the more mechanical ones. A Qwen doesn't hold a candle to an Opus, let's be serious, but for a huge chunk of the day-to-day work it does the job fine. And when it runs on my machine, there's no provider who can touch it, raise the price, pull it, or quantize it behind my back. It's mine.
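The split between the local model and the expensive one can be as simple as a routing rule. In this sketch the task categories, the file-count threshold and the model aliases are all hypothetical. The point is that the decision is made once, in code, instead of every time you write a prompt.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "qwen-local"   # hypothetical alias for the self-hosted Qwen
    PREMIUM = "opus"       # hypothetical alias for the expensive hosted model


# Hypothetical task types considered mechanical enough for the local model.
MECHANICAL = {"rename", "format", "docblock", "translate-strings", "config-export"}


def route(task_type: str, files_touched: int, max_local_files: int = 3) -> Tier:
    """Send small mechanical tasks to the local model and everything else to the premium one."""
    if task_type in MECHANICAL and files_touched <= max_local_files:
        return Tier.LOCAL
    return Tier.PREMIUM


for task, files in [("docblock", 1), ("rename", 12), ("new-module", 4)]:
    print(task, files, "->", route(task, files).value)
```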
The other leg: self-improving skills
The fallback gives me resilience, but that's only half the story. The other half is getting a cheaper model to do the work I currently ask an expensive one to do. And there I've spent the last few weeks building something that's got me pretty happy.
It's an agent orchestrator with skill self-improvement. When the agents finish a task, an AI reviews the result first, and then a human does. After the human review, a guide goes back over the whole conversation and detects whether something is worth turning into a procedure, for example by creating a new skill.
The trick is in the loop. Next time a similar task comes in, the skill that documents it already exists. We know which agents are the right ones, how to tackle it, and what to expect. That cuts the token cost of similar tasks because they come better documented from the start, and it standardizes the output, which for me matters almost as much as the savings.
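The loop can be sketched as a registry of skills keyed by kind of task. Everything here is hypothetical. The `Skill` fields mirror what a skill is described as capturing: which agents to use, how to tackle the task, and what to expect. Matching on an exact task kind is a simplification; a real orchestrator would need some notion of similarity between tasks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    agents: list[str]       # which agents are the right ones
    procedure: str          # how to tackle the task
    expected_output: str    # what to expect when it's done


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)

    def find(self, task_kind: str) -> Skill | None:
        return self.skills.get(task_kind)

    def learn(self, task_kind: str, skill: Skill) -> None:
        """Called by the post-review guide step when a conversation is worth keeping."""
        # Keep the existing skill if one is already registered for this kind.
        self.skills.setdefault(task_kind, skill)


def build_prompt(task: str, task_kind: str, registry: SkillRegistry) -> str:
    skill = registry.find(task_kind)
    if skill is None:
        return task  # first time: the agents have to work it out from scratch
    return (f"Follow skill '{skill.name}'.\n"
            f"Agents: {', '.join(skill.agents)}\n"
            f"Procedure: {skill.procedure}\n"
            f"Expected output: {skill.expected_output}\n\n"
            f"Task: {task}")


registry = SkillRegistry()
print(build_prompt("Add a date field to Article", "add-field", registry))
registry.learn("add-field", Skill(
    name="add-field",
    agents=["site-builder", "config-reviewer"],
    procedure="Create the field storage and instance, then export config with drush cex.",
    expected_output="Only the new field's config files change.",
))
print(build_prompt("Add a tags field to Page", "add-field", registry))
```

The second prompt arrives with the agents, procedure and expected output already spelled out. That is where the token savings and the standardized output come from.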
And here's the beauty of the whole thing. You can use a smart model, an expensive one, only to generate that improvement. Once the skill is written, the dumber models make use of it. Sonnet, Haiku, DeepSeek or Qwen itself running locally handle tasks that used to need a pricier model, because the knowledge is already captured in the skill and they don't have to reinvent it every time. You pay for the intelligence once and reuse it many times.
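Whether paying for the intelligence once is worth it comes down to a break-even calculation. Take some purely illustrative numbers: $3.00 of expensive-model tokens to produce a skill, $0.60 per task on the expensive model, and $0.08 per task on a cheap model using the skill. The sketch ignores review time. It also assumes the cheap model's output is good enough, which is exactly what the review layers are there to check.

```python
import math


def break_even_tasks(skill_cost: float, expensive_per_task: float, cheap_per_task: float) -> int:
    """Smallest number of tasks at which skill + cheap model costs no more than the expensive model alone."""
    saving = expensive_per_task - cheap_per_task
    if saving <= 0:
        raise ValueError("the cheap model must actually be cheaper per task")
    return math.ceil(skill_cost / saving)


# Illustrative numbers only.
n = break_even_tasks(skill_cost=3.00, expensive_per_task=0.60, cheap_per_task=0.08)
print(n, "tasks")
print(f"expensive only: ${n * 0.60:.2f}, skill + cheap: ${3.00 + n * 0.08:.2f}")
```

With those numbers the skill pays for itself at 6 tasks: $3.48 for the skill route against $3.60 for the expensive model alone. Every similar task after that is almost pure saving.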
On top of that I add my Drupal audit module, which I maintain in the community. It exposes a few drush commands so that the AI, while it works in my local environment, can see the module's recommendations and whatever the module detects in the code and configuration the AI itself has been adding to the project. It's a closed loop: the AI corrects itself before another AI reviews the work and before a human reviews it. Each layer filters a bit more, and less noise reaches me for review.
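The self-correction step boils down to a bounded loop. The AI generates, the audit runs, the findings are fed back, and the loop stops when the audit comes back clean or a round limit is hit. The sketch below takes the generator and the auditor as plain functions. In the setup described above, the auditor would wrap the audit module's drush commands, but their names aren't given here, so the stubs are hypothetical.

```python
from typing import Callable

Generate = Callable[[str, list[str]], str]   # (task, findings from last audit) -> candidate
Audit = Callable[[str], list[str]]           # candidate -> findings; empty list means clean


def self_correct(task: str, generate: Generate, audit: Audit,
                 max_rounds: int = 3) -> tuple[str, list[str]]:
    """Regenerate with the audit findings fed back until the audit is clean or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    findings: list[str] = []
    candidate = ""
    for _ in range(max_rounds):
        candidate = generate(task, findings)
        findings = audit(candidate)
        if not findings:
            break
    # The candidate and any leftover findings go on to the AI reviewer, then the human.
    return candidate, findings


# Hypothetical stubs: the first attempt misses a docblock, the second fixes it.
def fake_generate(task: str, findings: list[str]) -> str:
    return "v2-with-docblock" if findings else "v1"


def fake_audit(candidate: str) -> list[str]:
    return [] if candidate.endswith("with-docblock") else ["Missing docblock on hook implementation"]


print(self_correct("Implement hook_form_alter for the contact form", fake_generate, fake_audit))
```

The round limit keeps a stubborn finding from burning tokens forever. Anything still unresolved travels with the candidate to the next review layer instead of being silently dropped.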
In the end all of this is about the same thing. Cutting the time we humans spend reviewing, cutting the human cost, and cutting the token cost. It's not magic, it's plumbing.
The right tool, not the perfect one
If we lived in an ideal world, we'd all use the most expensive, smartest model for everything and be done with it. But in a business you don't use the perfect tools, you use the right ones: the one that gets the work done, is fast, and is reasonably priced. There's no point pulling out the most expensive, perfect model for a task that a simple, cheap, fast one does just as well.
Building this whole setup has its cost. The fallback has to be configured and maintained. The local server has to be administered, and that costs time even if it barely registers on the monthly bill. I'm still tuning the self-improving skills system, and I don't know whether the return will make up for all the hours I'm pouring into it. And above all, this field moves so fast that half of what I've written here might be obsolete in six months.
I don't want my ability to work depending on whether a company thousands of kilometers away is having a good day today, on whether its API is online, on whether they haven't raised prices, or on whether some government decides this is the month to cut off the tap. I'd rather spread the risk, even if it costs me more work, even if it's never quite finished.