d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference: the stage at which trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?
