Why Vision Language Models Ignore What They See with Munawar Hayat - #758

Why Vision Language Models Ignore What They See with Munawar Hayat - #758

In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guided alignment to enforce better visual grounding. We also explore a novel approach to generalized contrastive learning designed to solve complex, composed retrieval tasks—such as searching via combined text and image queries—without increasing inference costs. Finally, we cover the difficulties generative models face when rendering multiple human subjects, and the new "MultiHuman Testbench" his team created to measure and mitigate issues like identity leakage and attribute blending. Throughout the discussion, we examine how these innovations align with the need for efficient, on-device AI deployment. The complete show notes for this episode can be found at https://twimlai.com/go/758.

Avsnitt(783)

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss the heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running al...

2 Dec 202548min

Proactive Agents for the Web with Devi Parikh - #756

Proactive Agents for the Web with Devi Parikh - #756

Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the tech...

19 Nov 202556min

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755

Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex wo...

12 Nov 202554min

Building an AI Mathematician with Carina Hong - #754

Building an AI Mathematician with Carina Hong - #754

In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a conver...

4 Nov 202555min

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive...

28 Okt 202552min

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

Today, we're joined by Alexandre Pesant, AI lead at Lovable, who joins us to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software developme...

22 Okt 20251h 12min

Dataflow Computing for AI Inference with Kunle Olukotun - #751

Dataflow Computing for AI Inference with Kunle Olukotun - #751

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss ...

14 Okt 202557min

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to...

7 Okt 202557min

Populärt inom Politik & nyheter

aftonbladet-krim
svenska-fall
rss-krimstad
p3-krim
flashback-forever
politiken
rss-sanning-konsekvens
aftonbladet-daily
blenda-2
spar
rss-vad-fan-hande
rss-krimreportrarna
motiv
rss-frandfors-horna
rss-flodet
svd-ledarredaktionen
rss-aftonbladet-krim
dagens-eko
olyckan-inifran
spotlight