Why I'm passing on AI wrappers in 2026

The moat question is louder than it's ever been, and the answer most founders give is the wrong one.

Apr 10, 2026 · 7 min read

The pitch landed on a Wednesday. Small team in Munich, six months of real revenue, clean deck. ChatGPT-for-insurance-underwriters, measurably better UX than the incumbent tool. They'd hit €40k MRR faster than the last four companies I'd seen in this category. They asked for €1.5M.

I passed.

I liked the team. The product worked. The customers were real. But I spent the weekend trying to talk myself into the investment and couldn't. Something about the shape of the business was wrong in a way I've been seeing a lot this year, and it's worth writing down.

"Wrapper" has stopped meaning anything

The word is tired. Every LLM-native product now gets dismissed as "just a wrapper" - a lazy slur that implies the product is one prompt and a login page away from irrelevance. It isn't useful. Most of the products we actually depend on are wrappers over something: Stripe wraps Visa, Uber wraps a car, every SaaS tool wraps a database.

The real question isn't whether you built the model. It's whether you've built something the model's creator couldn't trivially absorb, or - more urgently - whether a hundred teams with the same idea are going to catch you inside twelve months.

That second part is what killed the insurance underwriter pitch for me. Not OpenAI. The hundred other people shipping the same thing by August.

Three questions I actually ask

When I'm evaluating a company that builds on top of frontier models, I run through three questions. The order matters.

1. What happens when the underlying model gets 10x better or cheaper?

The unit economics of this business are set by a curve I don't control. If the next generation of models drops the cost of the core inference by 90% next quarter - and that is absolutely going to happen - does your margin expand, does your moat compress, or does your product collapse?

Some businesses get better when the model improves. A legal research tool that currently hedges with five-minute reviews becomes a tool that finishes in thirty seconds, and that changes what it's worth. Those are long the model.

Others get worse. If the reason someone buys your product is "it makes GPT usable for X workflow," and GPT now natively does X, the distance between you and zero is very short. Those are short the model.

Most wrappers are short the model and don't know it.
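To put rough numbers on question 1, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration, and the function name is my own. It compares two cases after inference gets 90% cheaper: one where the price holds, and one where competitors pass the saving through to the customer.

```python
def gross_margin(price: float, inference_cost: float) -> float:
    """Gross margin as a fraction of price, counting only inference as cost."""
    return (price - inference_cost) / price


# Hypothetical per-seat monthly figures, not taken from any real company.
price = 100.0
inference_cost = 40.0
before = gross_margin(price, inference_cost)

# Case A: inference gets 90% cheaper and the price holds, so margin expands.
cheap_inference = inference_cost / 10
price_holds = gross_margin(price, cheap_inference)

# Case B: same cost drop, but competitors pass the saving on and the market
# price falls by the same factor. The margin percentage is unchanged while
# gross profit per seat shrinks tenfold.
competed_price = price / 10
price_follows = gross_margin(competed_price, cheap_inference)
gross_profit_before = price - inference_cost
gross_profit_after = competed_price - cheap_inference
```

Case A is what being long the model looks like when the value to the customer doesn't move. Case B is the quieter version of being short: nothing breaks, the percentages look healthy, and the business is a tenth of its former size.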

2. What proprietary data or workflow do you own that a model alone cannot reconstruct?

Not the prompts. Not the system message. The data. The integrations. The escalation path to a human when the model is wrong. The approval workflow. The compliance trail. The fourteen years of historical decisions that teach a new model what "correct" looks like in your specific corner of the world.

If the answer to this question is "our prompts are really good," you're not defensible. If the answer is "we have a hundred thousand labelled outcomes from the last three years that nobody else can legally obtain," that's different.

3. What does the first hour of using the product teach the product about the user?

This is the one most founders miss. If the product gets measurably better for each user as they use it - because their actions, corrections, and escalations become training signal - you have a compounding advantage. Every week of usage is a moat brick.

If the product is static - same quality on day one and day one thousand - you're renting your moat from your model vendor's roadmap.

The insurance underwriter tool scored C, B, and D on these. The team was great. The math wasn't.

The real moat in 2026 is speed

Here's the thing I've come around on over the last six months, watching the portfolio I do have:

The winners of the next twelve months are not going to be the teams with the prettiest prompts or the cleverest architecture. They're going to be the teams shipping the most integrations per week.

Every AI company I talk to eventually hits the same wall: the model is good enough, the UI is good enough, the market is ready, and the only thing separating the companies doing $5M ARR from the ones doing $500k ARR is how many systems they talk to and how well. The top quartile ship a new integration every seven to ten days. The bottom half ship one a quarter.

That's the real compounding asset. Not model quality - model quality is a rising tide. Not UX - UX gets copied in a week. Integration surface area. The number of places the product already sits in someone's workflow when a competitor shows up with a better prompt.

If you want a cheap test: ask the founder when the next integration ships. If they can name it, when it goes live, and who's building it - they're going to win. If they wave at a roadmap, they probably aren't.
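If you would rather measure cadence than take a founder's word for it, something like the sketch below is enough. It assumes you can get a list of dates on which integrations actually went live. The function name and the sample dates are hypothetical.

```python
from datetime import date
from statistics import median


def median_days_between_ships(ship_dates: list[date]) -> float:
    """Median gap, in days, between consecutive integration launches."""
    ordered = sorted(ship_dates)
    if len(ordered) < 2:
        raise ValueError("need at least two ship dates to measure cadence")
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


# Hypothetical launch dates for one company.
ships = [date(2026, 1, 5), date(2026, 1, 13), date(2026, 1, 22), date(2026, 2, 2)]
cadence = median_days_between_ships(ships)
```

With these sample dates the gaps are 8, 9, and 11 days, so the median is 9, which would sit inside the seven-to-ten-day band. I use the median rather than the mean so that one slow integration doesn't hide an otherwise fast team.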

What I'm actually writing checks for

A short list, because people keep asking.

Tools with deep, real-time workflow integration. Not "we have an API." I mean: the product reads from your CRM, writes to your calendar, triggers your deploy, and lives inside Slack - and the loop closes within the same sprint. These are rare and they compound.

Agent infrastructure. The pipes, not the apps. Evals, observability, cost tracking, memory, tool-use orchestration, the unglamorous stack that everyone building agents needs and nobody wants to build themselves. This is the picks-and-shovels phase.

Vertical plays with structural data advantages. Regulatory data, hardware telemetry, legal filings, medical imaging, engineering simulations - anywhere the data is genuinely hard to get because it's expensive, regulated, or locked inside incumbents with no reason to share it. If the moat is "we spent two years getting this data cleared," that's a moat.

Applications with a clear non-LLM core. Products where the LLM is an interface layer over something else that does the hard work. Forecasting, optimisation, search, simulation - anywhere the model is a translator, not the engine.

What I keep passing on: chat-over-data startups with no proprietary data, creative-writing assistants, general-purpose knowledge-worker copilots, anything that calls itself "the ChatGPT for X" where X is a market with three other ChatGPTs already.

What I told the Munich team

They asked for feedback, which I appreciated. I told them what I'm telling you: six months of usage is enough to start building a real data asset, but they weren't capturing it. Every underwriter decision they helped with was training signal, and they were throwing it away. Rebuild the product so every accepted and rejected recommendation flows back into a per-customer fine-tune, and in twelve months they'd have something the next model can't replicate - because it would need their customers' data to do it.
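For what "capturing it" could mean in practice, here is a minimal sketch. It assumes the simplest possible setup: one append-only JSONL file per customer. All names (`DecisionFeedback`, `record_feedback`, `load_feedback`, the field names) are my own illustration, not the Munich team's design, and the fine-tuning step itself is deliberately left out.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class DecisionFeedback:
    """One underwriter decision, kept as a labelled example."""
    customer_id: str
    case_id: str
    recommendation: str   # what the product suggested
    accepted: bool        # did the underwriter take the suggestion?
    final_decision: str   # what the underwriter actually decided
    recorded_at: str      # UTC timestamp, ISO 8601


def record_feedback(log_dir: Path, customer_id: str, case_id: str,
                    recommendation: str, accepted: bool,
                    final_decision: str) -> DecisionFeedback:
    """Append one decision to the customer's own feedback log."""
    entry = DecisionFeedback(
        customer_id=customer_id,
        case_id=case_id,
        recommendation=recommendation,
        accepted=accepted,
        final_decision=final_decision,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    log_dir.mkdir(parents=True, exist_ok=True)
    with (log_dir / f"{customer_id}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(entry)) + "\n")
    return entry


def load_feedback(log_dir: Path, customer_id: str) -> list[dict]:
    """Read back every recorded decision for one customer."""
    path = log_dir / f"{customer_id}.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

The rejected recommendations matter as much as the accepted ones: a record where `accepted` is false and `final_decision` differs from `recommendation` is exactly the correction a per-customer fine-tune would learn from. Keeping the logs separate per customer is the design choice that makes the asset hard to copy.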

They're thinking about it. I hope they do it. If they do, I'll call them back.

The rest of 2026 is going to be hard on wrappers. Not because the model vendors will eat them - the vendors mostly won't bother - but because their competitors will, three times a week, for as long as it takes. The only defence is to be compounding something the competitors can't catch up on, no matter how fast they ship.

That's what I'm looking for. That's what I mostly don't see.