d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

3514: How JLL Is Reshaping Commercial Real Estate Through AI

Have you ever wondered what it takes to run technology for one of the largest commercial real estate companies in the world? That question shapes my conversation with Yao Morin, Global CTO at JLL, as ...

10 Dec 2025 · 34 min

3513: How Dropbox Is Rethinking Work With AI And Dropbox Dash

Did you ever stop and wonder how many hours you lose each week hunting for files, tabs, links, or half-written ideas scattered across your apps? It is a familiar frustration, and it sits at the center...

9 Dec 2025 · 38 min

3512: How D2L's Rob Telfer Sees Universities Adapting to an AI First World

What does learning look like when technology shifts faster than most university systems can adapt? That question shaped my conversation with Rob Telfer, who leads education strategy for D2L across Eur...

8 Dec 2025 · 29 min

3511: BCG on Closing the Gap Between AI Experiments and Real Business Impact

7 Dec 2025 · 25 min

3510: Orange Business and the Rise of Digital Innovation Across IMEA

Did you know that when many people hear "Orange," they still ask if it involves SIM cards? That was the perfect place to begin my conversation with Sahem Azzam, President for IMEA and Inner Asia at Or...

6 Dec 2025 · 23 min

3509: What AWS re:Invent Revealed About the Acceleration of Agentic AI

Did you ever walk into a conference session thinking you were ready for the week, only to realise the announcements were coming so fast that you almost needed an agent of your own to keep up? That was...

5 Dec 2025 · 25 min

3508: Movember at re:Invent, A Conversation on Tech and Men's Health

Have you ever wondered how an idea that begins with two friends in a pub ends up shaping conversations about health all over the world? That was on my mind as I met Graham Link & Timothy Gnaneswaran ...

4 Dec 2025 · 24 min

AWS re:Invent: Ruth Buscombe on How AWS Helps F1 Engineers Read a Million Data Points a Second

Did you know a single Formula 1 car produces 1.1 million data points every second from hundreds of sensors? That number alone sets the tone for this conversation with Ruth Buscombe, an F1 strategist, ...

3 Dec 2025 · 26 min
