Eliminating LLM Vendor Lock-In with a Single-Tenant Router
A few weeks ago we published the technical account of how we left a single AI provider behind: Eliminating LLM Vendor Lock-In with a Single-Tenant Router. It covered the architecture, the routing, and the six-week build.
This is the same story without the engineering. The first post was written for the people building it. This one is for anyone weighing the same move: less about how we built it, more about why we did, and how we knew the moment had come.
For three years we ran on one provider, and for most of that time it was the right call. They offered the strongest models and made it easy to connect our software, so standardising on one let us ship fast. We watched the trade-offs the whole time. The discipline was in not tearing up a setup that was still letting us move faster than the alternatives, just because we could see weather on the horizon.
What we were really tracking was the moment the maths flipped: the point where the convenience stopped paying for itself and started setting limits on what we could do. Moving too early would have been its own mistake. You don't rebuild your foundations to solve a problem you don't have yet. The judgment was in spotting when "still working well" had become "working well, but on someone else's terms."
When that moment came, the cost showed up in one place above all. Everything we'd built was shaped around one provider's way of working. Our software had to phrase every request exactly as that provider expected, and read every answer back with code written for that provider alone. Switching models was never a setting we could change. It meant rewriting all of that, every time.
| "Real lock-in isn't a contract you can't exit. It's a system that quietly costs more to leave than to stay."
We could absorb a price rise. What held us in place was the prospect of weeks of engineering every time we wanted to move.
There was a second cost we rarely said out loud. Anything sensitive had to be stripped out before it went near the model, which left our most useful context the context we were least able to use.
01 | What Changed Our Minds
It wasn't a single failure or a dramatic outage. We just noticed we were asking the wrong question. "Which provider is best?" mattered far less than why we were letting that one choice carry so much weight.
So we wrote down what we actually wanted. Never to be forced into a rewrite again. To spend money on the work itself, not on the privilege of routing everything through one company. And to keep our own data our own. None of those are technical ambitions. They're what any business wants from a supplier it depends on.
Once those requirements were clear, the fix was obvious. We built a single layer that sits between our applications and the AI models, a kind of universal translator and switchboard. Our applications talk to it one way. Behind the scenes, it handles each provider's quirks, so nothing upstream needs to know which model answered.
Some of that work no longer touches the cloud. It runs on the laptops already sitting idle on our team's desks. A small open model on a MacBook Air returns its first words in about 1.3 seconds, and on longer answers it ran up to eight times faster than the comparable cloud option, at no cost per request.
If you want the technical details, read the first installment mentioned above.
02 | Three things We'd Tell Anyone Weighing the Same Move
If AI is becoming part of how you deliver, you're probably feeling some version of this already. Three things we'd pass on.
1. The cost of lock-in is rarely on the invoice. It's in the decisions you slowly stop being able to make: the migration you keep postponing, the data you keep leaving out, the price you keep accepting because the alternative is weeks of work.
2. You don't have to tear anything down to fix it. We didn't replace what we had. We put a layer in front of it so the choice stayed ours. Moving a workload between providers is now a setting we change, not code we rewrite. That used to be measured in sprints.
3. Timing is the whole game, and knowing beats guessing. Moving too early is its own mistake; moving too late costs you the years we're describing. The skill is spotting the moment the maths flips. By the time we moved, it was unmistakably the right call, and that certainty is what we'd tell anyone else to wait for.
"The six weeks were the easy bit. The hard part was the call before them."
WHERE GYSHO FITS
Most firms building on AI will reach the same fork we did: the moment a convenient supplier choice starts setting the limits of what you can do. We're building Gysho because we needed that layer ourselves, the one that keeps the choice yours, reversible in minutes rather than sprints, and your data inside your own walls. If you're starting to feel the cost of a decision you made early and never revisited, that's usually the moment worth a conversation.