d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

