d-Matrix - Ultra-low Latency Batched Inference for Gen AI

What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever-larger models, the next phase of AI adoption is increasingly defined by inference: the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?
