d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever-larger models, the next phase of AI adoption is increasingly defined by inference: the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?
