d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever-larger models, the next phase of AI adoption is increasingly defined by inference: the stage at which trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

3497: How Phil Gilbert Turned Culture Into IBM's Most Powerful Asset

What happens when a leader realises that the success of every major initiative, from AI projects to return to office plans, rests on something far deeper than strategy or tools? In my conversation wit...

24 Nov 2025, 30 min

3496: Why the LoopUp Startup Story Is a Masterclass in Leading Through Uncertainty

What happens when your entire market disappears overnight? That was the reality facing LoopUp when the pandemic transformed the way the world communicates. In this episode of Tech Talks Daily, I sit d...

23 Nov 2025, 25 min

3495: How Adebimpe Ibosiola is Bringing Clarity to Digital Transformation in Regulated Industries

What happens inside a transformation program when every decision must withstand scrutiny, every dependency carries weight, and every undocumented rule inside a legacy system can change the outcome of ...

22 Nov 2025, 23 min

3494: The Fastest Way to Recover Endpoint Devices During an IT Outage

Why do entire organisations invest millions building resilient data centres yet leave their endpoints exposed to outages that can last days? That question kept coming back to me during my conversation...

21 Nov 2025, 26 min

3493: Industrial AI in Action, Somya Kapoor on Digital Workers and ROI

What happens when a founder who built a billion dollar company during a global crisis steps into the centre of industrial AI and begins reshaping how entire organisations think and work? That question...

20 Nov 2025, 24 min

3492: How Mammoth Enterprise AI Browser Redefines Security at the Endpoint

Have you ever wondered what happens when the browser stops being a simple window to the web and starts becoming the control point for how AI touches every part of enterprise life? That was the startin...

19 Nov 2025, 26 min

3491: From NHL Ice to Enterprise Data: Ataccama's CEO on Building AI That Actually Works

What happens when a former NHL player who once faced Wayne Gretzky ends up running a global data company that sits at the center of the AI boom? That question kept coming back to me as I reconnected w...

19 Nov 2025, 30 min

3490: How Zenoti Is Redefining Guest Experience With AI

What happens when a former Microsoft leader walks away from tech, immerses himself in personal wellbeing, and accidentally discovers one of the biggest blind spots in the global spa, salon, and wellne...

18 Nov 2025, 26 min
