Inception Labs says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini

On a recent episode of The New Stack Agents, Inception Labs CEO Stefano Ermon introduced Mercury 2, a large language model built on diffusion rather than the standard autoregressive approach. Traditional LLMs generate text token by token from left to right, which Ermon describes as “fancy autocomplete.” In contrast, diffusion models begin with a rough draft and refine it in parallel, similar to image systems like Stable Diffusion.
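To make the contrast concrete, here is a toy sketch of why parallel refinement needs fewer sequential steps than token-by-token generation. It is not Mercury 2's algorithm or any real model: the function names, the `<mask>` placeholder, the `tokens_per_pass` parameter, and the trick of copying tokens from a known target are all assumptions made purely for illustration.

```python
# Toy illustration only: no real model is involved, and the "model" simply
# copies tokens from a known target sentence.
TARGET = "diffusion models refine a whole draft in parallel".split()


def autoregressive_decode(target):
    """Emit one token per sequential step, left to right."""
    output, steps = [], 0
    for token in target:
        output.append(token)  # each token waits for all earlier ones
        steps += 1
    return output, steps


def parallel_refine_decode(target, tokens_per_pass=4):
    """Start from a fully masked draft and fill several positions per pass."""
    draft = ["<mask>"] * len(target)
    steps = 0
    while "<mask>" in draft:
        masked = [i for i, tok in enumerate(draft) if tok == "<mask>"]
        # Hypothetical selection rule: a real diffusion LLM would decide
        # which positions to commit based on model confidence.
        for i in masked[:tokens_per_pass]:
            draft[i] = target[i]
        steps += 1
    return draft, steps


ar_out, ar_steps = autoregressive_decode(TARGET)
dr_out, dr_steps = parallel_refine_decode(TARGET)
print(f"autoregressive: {ar_steps} sequential steps")
print(f"parallel refinement: {dr_steps} sequential passes")
```

Both functions produce the same sentence. The only difference shown is the number of sequential steps, which is the quantity a GPU cannot parallelize away.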

This parallel process allows Mercury 2 to produce over 1,000 tokens per second, which company tests put at five to ten times faster than optimized models from labs such as OpenAI, Anthropic, and Google. Ermon argues that diffusion models make better use of GPUs, and says investor Nvidia is supporting the company's work to optimize performance.

While Mercury 2 matches mid-tier models like Claude Haiku and Google's Gemini Flash rather than top systems such as Claude Opus or GPT-4, Ermon believes diffusion’s speed and economic advantages will become increasingly compelling as AI applications scale.

Learn more from The New Stack about the latest developments around large language models built on diffusion:

How Diffusion-Based LLM AI Speeds Up Reasoning

Get Ready for Faster Text Generation With Diffusion LLMs


This episode was added to Podme via an open RSS feed and is not Podme's own production. It may therefore contain advertising.

Episodes (300)

AI can write your infrastructure code. There's a reason most teams won't let it.

In this episode of The New Stack Agents, Marcin Wyszynski, co-founder of Spacelift and OpenTofu, explains how AI is transforming infrastructure as code (IaC). Originally built for individual operators,...

20 Mar 29 min

OutSystems CEO on how enterprises can successfully adopt vibe coding

Woodson Martin, CEO of OutSystems, argues that successful enterprise AI deployments rarely rely on standalone agents. Instead, production systems combine AI agents with data, workflows, APIs, applicati...

6 Mar 43 min

NanoClaw's answer to OpenClaw is minimal code, maximum isolation

On The New Stack Agents, Gavriel Cohen discusses why he built NanoClaw, a minimalist alternative to OpenClaw, after discovering security and architectural flaws in the rapidly growing agentic framework...

20 Feb 51 min

The developer as conductor: Leading an orchestra of AI agents with the feature flag baton

A few weeks after Dynatrace acquired DevCycle, Michael Beemer and Andrew Norris discussed on The New Stack Makers podcast how feature flagging is becoming a critical safeguard in the AI era. By integr...

19 Feb 19 min

The reason AI agents shouldn’t touch your source code — and what they should do instead

Dynatrace is at a pivotal point, expanding beyond traditional observability into a platform designed for autonomous operations and security powered by agentic AI. In an interview on *The New Stack Mak...

13 Feb 22 min

You can’t fire a bot: The blunt truth about AI slop and your job

Matan-Paul Shetrit, Director of Product Management at Writer, argues that people must take responsibility for how they use AI. If someone produces poor-quality output, he says, the blame lies with the...

11 Feb 57 min

GitLab CEO on why AI isn't helping enterprise ship code faster

AI coding assistants are boosting developer productivity, but most enterprises aren’t shipping software any faster. GitLab CEO Bill Staples says the reason is simple: coding was never the main bottlen...

10 Feb 57 min