Audrey's take on the emerging race for dominance in LLM middleware
A decade ago, you could mint a $10B company by wrapping cloud primitives with usable interfaces: think early-days Datadog for metrics, Cloudflare for edge delivery, Snowflake for SQL on blob storage. Today, LLMs are the new primitives, and we're seeing a similar middleware moment emerge. Only this time, it's happening faster, because unlike the cloud's slow drip, AI didn't arrive as a back-end shift; it bulldozed onto the scene as a front-end mandate.
According to PitchBook, 1 in 4 new startups is an AI company, and just yesterday Forbes reported on every business becoming an AI company (personally, I think this is hyperbolic but directional). Let's be honest: most of those companies or enterprise initiatives started with a guy/gal and a single OpenAI API call. Now, nearly all are reckoning with the sprawl that followed: fine-tuned models, vector DBs, prompt orchestration, retrieval pipelines, agents, tools, hallucination guards, and the spectre of prompt injection… to name a few.
Welcome to the era of LLM middleware.
This shift started two years ago. Back then, Sequoia observed that 88% of teams used some retrieval mechanism and 38% used orchestration frameworks, but fewer than 10% had proper monitoring. As applications scale, that sounds like a house of cards. When canonical diagrams of the LLM app stack make the Paris metro map look digestible, it's easy to understand why yesterday's glue code and good intentions need to mature.
However, the stack is fragmented: dozens of point solutions target everything from logging, validation, orchestration, embedding, caching, and retrieval to execution and hosting (I need a break!). LLM applications need full-suite, opinionated abstractions to manage prompt flows, model routing, rate limits, and privacy rules: something that takes the mess out of everyone's hands and says "here, use this, and stop reinventing the wheel", much as managed services did for cloud microservices.
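To make the scope concrete, here's a purely hypothetical sketch of the kind of policy surface such an abstraction could centralize; no real gateway uses exactly this schema:

```python
# Entirely hypothetical policy schema, just to illustrate what an
# opinionated gateway layer could own on behalf of every app team.
GATEWAY_POLICY = {
    "routing": {
        "default": "gpt-4",
        "fallbacks": ["claude-3-opus", "mistral-large"],  # failover order
    },
    "rate_limits": {
        "requests_per_minute": 600,
        "tokens_per_minute": 100_000,
    },
    "privacy": {
        "redact_pii": True,          # scrub inputs before they leave your VPC
        "log_prompts": False,        # keep raw prompts out of shared logs
        "allowed_regions": ["eu"],   # regional compliance
    },
    "caching": {
        "semantic_cache_ttl_seconds": 3600,
    },
}
```

Every team currently hand-rolls some subset of this; the pitch of full-suite middleware is that nobody should have to.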
But two years ago was too early. We were still debating "one model to rule them all." Today, we all agree that myth needs to die: Chatbot Arena lists over 200 LLMs, and Hugging Face hosts over one million models. SOTA isn't a steady state; it's a treadmill. Build for GPT-4 and your infra breaks the moment GPT-4.5 changes the output shape.
Originally, inference was atomic; you prompted a model and it replied. Done. Now you construct prompts, retrieve from RAG pipelines, loop outputs, stream responses, and orchestrate complex model chains. A single agentic interaction may hit 5 models and rack up 10,000 tokens in context.
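Here's a minimal, self-contained sketch of what one such "single" interaction now touches; the stub functions stand in for a real embedding model, vector DB, and provider APIs, and none of the names come from an actual library:

```python
# Hypothetical sketch of one modern "single" LLM interaction.
# Stubs stand in for a real embedding model, vector DB, and LLM APIs.

def embed(text: str) -> list[float]:
    return [0.0] * 8  # stub: a real call hits an embedding model

def vector_search(vec: list[float], top_k: int) -> str:
    return "...retrieved passages..."  # stub: a real call queries a vector DB

def call_model(model: str, prompt: str) -> str:
    return f"[{model}] reply"  # stub: a real call hits a provider API

def answer(question: str) -> str:
    context = vector_search(embed(question), top_k=5)           # retrieval
    draft = call_model("gpt-4", f"{context}\n\nQ: {question}")  # model A drafts
    verdict = call_model("cheap-checker-model",                 # model B verifies
                         f"Is this grounded in the context? {draft}")
    if "no" in verdict.lower():                                 # loop on failure
        context = vector_search(embed(question), top_k=10)
        draft = call_model("gpt-4", f"{context}\n\nQ: {question}")
    return draft

print(answer("What is LLM middleware?"))
```

Two model calls on the happy path, three on a retry, plus retrieval: the token meter spins fast.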
As model diversity and use cases grow, so does entropy.
This is why LLM gateways (like LiteLLM and Qwak; more on these below) are surging. They offer a clean abstraction layer over model APIs. Call GPT-4 today, Claude 3 tomorrow, Mistral next month. One-line change, no hard-coded keys, no duplicated auth logic, no fractured logs.
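LiteLLM's unified completion API is a good illustration of that one-line swap. A minimal sketch, assuming provider API keys are already set in the environment (exact model identifiers vary by provider and LiteLLM version):

```python
# Gateway-style provider abstraction via LiteLLM's unified API.
# Assumes OPENAI_API_KEY, ANTHROPIC_API_KEY, and MISTRAL_API_KEY are set;
# model identifiers below may differ across LiteLLM versions.
from litellm import completion

messages = [{"role": "user", "content": "Summarize LLM middleware in one line."}]

# Swapping providers is literally the model string; auth, request shape,
# and response shape stay identical.
for model in ["gpt-4", "anthropic/claude-3-opus-20240229", "mistral/mistral-large-latest"]:
    response = completion(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```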
Today, LLM gateways are becoming foundational for LLM interfacing, and they feel like API gateways did in 2013: nerdy, vital, and soon to be everywhere.
This has led to a crowded field of contenders, each carving out a small niche.
The diversity of opinion here is exactly what you’d expect in an early middleware market. To me, it’s a sign of how foundational the gateway layer will become.
But which is the horse to back in this race? LangChain stands for modular composition. LlamaIndex leans into document abstraction. LiteLLM is betting on a unified OpenAI-style API. Qwak (now JFrog) believes in enterprise-grade LLM governance. Nexos is security-first. OpenRouter is UX-first. TensorZero is DIY-to-the-core.
To me, the most interesting companies in middleware aren't just routing API calls and selling a niche story with racing blinders on. They're building philosophies that scale. That's where moats are made. Several contenders have "entered the chat", but the eventual winner will be the one that can both spark fire at the grassroots and build a feature set and brand robust and trusted enough to rival the vendors "nobody gets fired for buying". This pincer movement is hard to execute, especially for early teams with limited bandwidth, but we've seen it work.
Requesty is a compelling emerging player executing exactly that tactic. At one end, they've embedded into grassroots developer communities (most notably Roo Code, along with other dev tools like LibreChat), making them the default interface for LLM routing without ever needing a sales call. At the other, they've built a robust, enterprise-grade backend with token-level observability, security, caching, regional compliance, and failover controls strong enough to handle Fortune 500 workloads.
If OpenRouter is UX-first, and Nexos is security-first, Requesty is both dev-loved and boardroom-ready.
Their unified API, built-in analytics, and provider abstraction make building with a suite of models trivial for developers, while offering the features needed for scale, governance, and margin control. And early traction hints that this go-to-market wedge is working.
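I haven't verified Requesty's exact endpoint or model-naming scheme, but if their router is OpenAI-compatible (as most gateways in this niche are), adoption looks like pointing the standard OpenAI SDK at a different base URL; everything below the client line is a placeholder:

```python
# Hypothetical sketch: the standard OpenAI SDK aimed at an
# OpenAI-compatible gateway. base_url and model are placeholders,
# not confirmed Requesty values; check their docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://router.example.com/v1",  # placeholder gateway endpoint
    api_key="YOUR_GATEWAY_KEY",                # one gateway key replaces per-provider keys
)

response = client.chat.completions.create(
    model="anthropic/claude-3-opus",  # placeholder: the gateway handles routing
    messages=[{"role": "user", "content": "Hello from behind the gateway"}],
)
print(response.choices[0].message.content)
```

That near-zero switching cost is exactly why the grassroots wedge works: developers adopt the gateway the same way they'd adopt any provider.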
We are barely two years into the LLM inflection, and this stack is going to mutate rapidly. But one thing is already clear: middleware is where AI gets industrialized. It's where chaos becomes controlled. And the horse to bet on is the one developers want and CIOs can buy.