d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

LAMs (Large Action Models) and the Future of AI Ownership

What happens when AI stops talking and starts working, and who really owns the value it creates? In this episode of Tech Talks Daily, I'm joined by Sina Yamani, founder and CEO of Action Model, for a ...

28 Jan 32min

Pegasystems on Why Legacy Modernization Finally Has a Way Forward

What does it really take to remove decades of technical debt without breaking the systems that still keep the business running? In this episode of Tech Talks Daily, I sit down with Pegasystems leaders...

27 Jan 55min

UiPath and the Reality of Managing AI at Enterprise Scale

What does it really take to move AI from proof-of-concept to something that delivers value at scale? In this episode of Tech Talks Daily, I'm joined by Simon Pettit, Area Vice President for the UK and...

26 Jan 26min

3568: Getty Images: How Brands Can Avoid AI's Sloppification of Visual Content

25 Jan 39min

3567: What a Chief Communications Officer Really Does and Why It Matters

What actually happens when a company loses control of its own voice in a world full of channels, platforms, and constant noise? In this episode of Tech Talks Daily, I sat down with Joshua Altman, foun...

25 Jan 25min

3566: How Ergodic Predicts Complex Disruptions Before They Happen

What if your AI systems could explain why something will happen before it does, rather than simply reacting after the damage is done? In this episode of Tech Talks Daily, I sat down with Zubair Magrey...

24 Jan 37min

3565: CKEditor and the Reality of Supporting Developers Across Every Tech Stack

What does it actually take to build trust with developers when your product sits quietly inside thousands of other products, often invisible to the people using it every day? In this episode of Tech T...

24 Jan 37min

3564: Why Banking Is the Ultimate Test for Responsible AI

If artificial intelligence is meant to earn trust anywhere, should banking be the place where it proves itself first? In this episode of Tech Talks Daily, I'm joined by Ravi Nemalikanti, Chief Product...

23 Jan 34min
