d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference: the stage at which trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

AWS re:Invent: Ruth Buscombe on How AWS Helps F1 Engineers Read a Million Data Points a Second

Did you know a single Formula 1 car produces 1.1 million data points every second from hundreds of sensors? That number alone sets the tone for this conversation with Ruth Buscombe, an F1 strategist, ...

3 December 2025 · 26 min

3506: How Marriott International Builds Digital Fluency at Global Scale

Have you ever wondered how a company with nearly a million associates across continents keeps everyone learning, aligned, and prepared for constant change? That question sat at the heart of my convers...

2 December 2025 · 24 min

3505: When Home Improvement Meets Real-Time Intelligence

Have you ever wondered how an industry known for delays and uncertainty suddenly starts operating with the pace of a tech company? That thought stayed with me as I spoke with Eppie Vojt, the Chief Dig...

1 December 2025 · 30 min

3504: Building Software for a Cross Platform World

What does it really mean to run a company that aims to be "good" before it ever thinks about becoming "great"? That was the question sitting with me as I sat down with Appfire's CEO, Matt Dircks. The ...

30 November 2025 · 38 min

3503: The Next Security Challenge Created by AI Coding Tools

What happens when AI adoption surges inside companies faster than anyone can track, and the data that fuels those systems quietly slips out of sight? That question sat at the front of my mind as I spo...

29 November 2025 · 31 min

3502: Preparing Teams for Change with AI Driven Upskilling

Why does it feel as though every headline about the future of work points to AI pushing entry-level roles off a cliff? That question stayed with me as I sat down with Robin Adda, a long-time learning ...

29 November 2025 · 25 min

3501: How Aily Labs is Bringing AI Decision Intelligence To Fortune 500 Teams

Have you ever wondered what it looks like when an enterprise finally breaks free from spreadsheet-driven decision paralysis and lets AI take the wheel? That was the question at the back of my mind as ...

28 November 2025 · 31 min

3500: Nullshot Reimagines How Teams Create With AI

Is AI quietly pushing us to work alone when creativity has always thrived on collaboration? I'm joined by Joseph "Coop" Cooper, co-founder of Nullshot, to unpack a different vision for how AI should s...

27 November 2025 · 22 min
