d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explore why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later, when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

3490: How Zenoti Is Redefining Guest Experience With AI

What happens when a former Microsoft leader walks away from tech, immerses himself in personal wellbeing, and accidentally discovers one of the biggest blind spots in the global spa, salon, and wellne...

18 Nov 2025, 26 min

3489: Tredence on Why Data Darwinism Will Shape the Next Wave of Enterprise AI

What happens when enterprise AI moves faster than the data foundations meant to support it? That question guided my conversation with Sumit Mehra, CTO and Co-Founder of Tredence, who joined me while t...

18 Nov 2025, 30 min

3488: How Akeneo Sees the Future of Product Experience in an AI First Retail World

What happens when AI becomes the centre of how we shop, yet trust still determines whether any of it works? That question shaped my conversation with Romain Fouache, CEO of Akeneo, who joined me to un...

17 Nov 2025, 25 min

3487: vFairs Explains the Next Chapter of Event Tech

What happens when events become the most human channel in a world increasingly shaped by AI? That thought set the tone for my conversation with Muhammad Younas, founder and CEO of vFairs, who has spe...

16 Nov 2025, 26 min

3486: Augury on Why AI Literacy Is Becoming a Core Skill for Every Worker

What does it say about the future of work when AI competency starts to feel as expected as basic reading? That question sat with me throughout my latest conversation with Artem Kroupenev, VP of Strate...

15 Nov 2025, 31 min

3485: The Road to Predictable, Reliable Infrastructure with Nutanix

What does resilience look like when your business depends on keeping data, apps, and infrastructure running flawlessly in a world that never sleeps? At IGEL's Now & Next event in Frankfurt, I sat down...

14 Nov 2025, 21 min

3484: How BDO Is Turning AI Investment Into Real Outcomes

Have you ever wondered what it looks like when a global professional services firm commits over one billion dollars to AI and expects it to reshape the way its people work across every corner of the b...

13 Nov 2025, 33 min

3483: Cisco and Presidio Unite to Build the AI Ready Network of the Future

12 Nov 2025, 36 min
