Skip to content
All posts

INDUSTRY

Vendor lock-in in the LLM era — and the gateway pattern that loosens it

2 min read

Lock-in is rarely the model weights. It is the API surface, the prompts and the ops glue. Here is how OpenAI-compatible gateways became the standard answer — and what they don’t fix.

Switching language models sounds easy: change the model name. In practice, teams discover their lock-in is everywhere except the weights.

Where lock-in actually lives

It lives in the API surface you integrated against, the prompts you tuned for one model’s behaviour, the SDK and the tool and function-call schemas, and the operational glue — logging, retries, rate limits — wired to one vendor. None of that moves when you swap a model name.

How one company’s API became the lingua franca

The unlock was an accident of adoption. OpenAI’s Chat Completions API, launched in 2023, became the format everyone cloned. By late 2024 Google’s Gemini was reachable from the OpenAI library, Anthropic shipped an OpenAI-compatible layer, AWS Bedrock added an OpenAI-compatible endpoint, and open-weight servers like vLLM serve the same routes. One integration can now target many providers by changing a base URL and a key.

What a gateway adds

An LLM gateway turns that compatibility into leverage. It sits between your apps and many providers behind one (usually OpenAI-compatible) API and adds routing, automatic fallback, caching, rate limiting, per-team budgets, unified logging and cost observability, and one place to enforce policy. Even single-provider shops adopt one just for central key management and cost attribution — by 2025, analysts had reframed AI gateways from optional tooling to standard infrastructure.

What it does not fix

Be honest about the limits. “OpenAI-compatible” is a de-facto standard, not feature parity — Anthropic’s own docs note their compatibility layer drops features like prompt caching and is meant for testing and migration, not as a production substitute. A gateway also relocates lock-in rather than deleting it: you now depend on the gateway, your prompts may still be model-tuned, and the layer adds a hop and a place data transits. The win is a lower switching cost, not zero.

Portability is a configuration change, not a rewrite — if you design for it early.

That is the bet behind building our own gateway, Qevron: standardise the interface once, keep the option to move, and front your own models so the most sensitive traffic never has to leave your control.