d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

3532: How AI Keeps Live Events Personal for Fans at Event Tickets Center

What makes live events feel personal in an age of algorithms making the calls? That's the tension marketers are living in right now. Ben Kruger, Chief Marketing Officer at Event Tickets Center, sits a...

26 Dec 2025, 25 min

3531: Scaling Without the Hype Inside Uploadcare's Technical Philosophy

What does it really take to build software that can grow from a single line of code to millions of users a day without losing its soul along the way? In this episode of Tech Talks Daily, I'm joined by...

25 Dec 2025, 27 min

3530: Candy Crush Accessibility Lessons From a 200 Million Player Game

If you have ever opened Candy Crush over the holidays without thinking about the design decisions behind every swipe, this episode offers a rare look behind the curtain. I sit down with Abigail Rindo...

24 Dec 2025, 24 min

3529: How Ping Identity Sees the Next Chapter of Digital Identity

What does it actually mean to prove who we are online in 2025, and why does it still feel so fragile? In this episode of Tech Talks Daily, I sit down with Alex Laurie from Ping Identity to talk about ...

23 Dec 2025, 27 min

3528: How Boomi Thinks About Scaling AI Without Losing Control

What does it really mean to keep humans at the center of AI when agentic systems are accelerating faster than most organizations can govern them? At AWS re:Invent, I sat down with Michael Bachman from...

22 Dec 2025, 26 min

3527: How AWS Is Building Trust Into Responsible AI Adoption

What does responsible AI really look like when it moves beyond policy papers and starts shaping who gets to build, create, and lead in the next phase of the digital economy? In this conversation recor...

21 Dec 2025, 27 min

3526: TinyMCE and the Human Side of Developer Experience

What does it really mean to support developers in a world where the tools are getting smarter, the expectations are higher, and the human side of technology is easier to forget? In this episode of Tec...

20 Dec 2025, 31 min

3525: iBanFirst and the Shift Toward Specialist Fintechs for Global Payments

What does it really take to build a fintech company that quietly fixes one of the most frustrating problems SMEs face every day? In this episode of Tech Talks Daily, I'm joined by Pierre-Antoine Duso...

20 Dec 2025, 30 min
