d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

Episodes (2000)

How InfoScale Is Redefining Enterprise Resilience In A Multi-Cloud World

Have you noticed how every week brings a new headline about AI driven fraud, yet it still feels hard to tell what is real risk and what is noise? In this Tech Talks Daily episode, I'm joined by Tommy ...

6 Mar 32min

How Ticket Fairy Is Rebuilding The Technology Behind Live Events

Have you ever bought a ticket to a show and wondered why the experience still feels strangely disconnected, with one app for ticketing, another for marketing, another for refunds, and a dozen spreadsh...

6 Mar 22min

Hiring AI Talent Across Borders With Alcor

Have you ever looked at a global hiring plan and wondered whether you are building a team, or accidentally buying a bundle of hidden fees, legal risk, and avoidable stress? In this episode, I'm joined...

5 Mar 42min

How Flashfood Uses Data And AI To Solve The Grocery Food Waste Crisis

How can a world that produces more than enough food still leave millions of people struggling to put a healthy meal on the table? In this episode of Tech Talks Daily, I speak with Jordan Schenck, CEO ...

4 Mar 39min

SmartRecruiters On Turning AI Experiments Into Business Outcomes

Is 2026 the year AI finally has to prove it is worth the investment? In this episode, I'm joined by Chris Riche-Webber, VP of Business Intelligence and Analytics at SmartRecruiters, to explore why so ...

4 Mar 27min

From Core To Edge: Akamai On Where AI Inference Must Live Next

What if the real AI race in 2026 isn't about building bigger models, but about where decisions are made, how fast they happen, and whether they deliver measurable value? In this episode, I'm joined by...

3 Mar 27min

Removing Friction From Work: How Notion Is Redesigning The Modern Workplace

What happens when AI moves from a standalone tool to a teammate that works inside the flow of your organization? In this episode, I'm joined by Mick Hodgins, General Manager for EMEA at Notion, to exp...

2 Mar 31min

Technical Debt, Monoliths, And Microservices: Hexaware's Path To AI Readiness

1 Mar 26min
