d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

Motive on Why Accurate, Real-Time Edge AI Saves Lives in Physical Operations.

As someone who spends a lot of time covering AI announcements, product launches, and conference stages, it is easy to forget that most AI today is still built for desks, screens, and digital workflows...

9 Feb 29 min

Building Responsible Agentic AI: Genpact's Blueprint For Enterprise Leaders

9 Feb 32 min

Slalom On The AI Leadership Gap Between Confidence And Capability

What happens when leaders are confident about AI, but the people expected to use it are not ready? In this episode of Tech Talks Daily, I sat down with Caroline Grant from Slalom Consulting to explore...

8 Feb 32 min

LastPass CEO: If the Browser is AI's New Interface, What Does it Mean for Security?

Is the browser quietly becoming the most powerful and dangerous interface in modern work? In this episode of Tech Talks Daily, I sat down with Karim Toubba, CEO of LastPass, to unpack a shift that man...

7 Feb 30 min

Harness And The AI Velocity Paradox Slowing Software Delivery

What really happens when AI helps teams write code faster, but everything else in the delivery process starts to slow down? In this episode of Tech Talks Daily, I'm joined once again by returning gues...

6 Feb 34 min

FreedomPay on The $44.4 Billion Payment Risk Facing Retail And Hospitality

What really happens to a business when payments stop working, even for a few minutes? I recorded this episode live at Dynatrace Perform in Las Vegas, inside the Venetian, surrounded by engineers, oper...

5 Feb 25 min

What Bubble Learned About Responsibility in AI-built Apps

In this episode of Tech Talks Daily, I'm joined by Josh Haas, co-founder and co-CEO of Bubble, to unpack why the next phase of software creation is already taking shape. We talk about how the early ex...

5 Feb 24 min

Cloudinary and the Business Case for Developer-Led Product Growth

How do you turn a developer-first product into a growth engine without losing trust, clarity, or focus along the way? In this episode of Tech Talks Daily, I'm joined by Sanjay Sarathy, VP of Developer...

4 Feb 27 min
