d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often arrives later, when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

