d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

3523: From Chaos to Clarity, Valiantys on Making AI Work for Developers

How much value do your developers actually get to deliver in a typical week, and how much of their time is quietly lost to meetings, context hunting, and process drag? I'm joined by Phil Heijkoop, Glo...

18 Dec 2025 · 30 min

3522: Building the Future of Money at Gnosis With Dr. Friederike Ernst

17 Dec 2025 · 40 min

3521: What ABB Is Seeing Across Global Industrial Energy Systems

In this episode of Tech Talks Daily, I'm joined by Stuart Thompson, President of ABB's Electrification Service Division, to explore the intersection of industrial sustainability, energy security, and ...

16 Dec 2025 · 36 min

3520: How Ecolab Is Rethinking Water Risk In An AI Driven World

Are we finally treating water risk like a board-level issue, rather than a line item that only shows up when something breaks? In this episode, I'm joined by Emilio Tenuta, SVP and Chief Sustainabil...

15 Dec 2025 · 31 min

3519: How Verdent AI Is Building the Next Generation of AI Coding Agents

In this episode of Tech Talks Daily, I sit down with Yuyu Zhang to unpack a shift that many developers can feel but struggle to articulate. Yuyu's journey spans academic research at Georgia Tech, buil...

14 Dec 2025 · 36 min

3518: AWS re:Invent: The New Playbook For Detection, Response, And Secure AI

How do you move faster with AI and cloud innovation without losing control of security along the way? Recorded live from the show floor at AWS re:Invent in Las Vegas, this episode of Tech Talks Dail...

14 Dec 2025 · 28 min

3516: Twilio's Vision For AI First Engagement And The Rise Of Context Driven Interactions

How do you make sense of an industry that is changing at a pace few predicted, especially with SIGNAL London still fresh in our minds and Twilio unveiling the next stage of its vision for customer eng...

12 Dec 2025 · 28 min

3517: How AWS and the PGA Tour Are Changing Live Sports Technology

How do you capture every moment of a golf tournament spread across hundreds of acres, tens of thousands of shots, and dozens of players competing at the same time? That question sits at the heart of t...

12 Dec 2025 · 26 min
