Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images. The complete show notes for this episode can be found at https://twimlai.com/go/748.

Episoder(778)

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736

Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they i...

17 Jun 202559min

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735

Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platf...

10 Jun 202556min

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles fro...

5 Jun 20251h 25min

Google I/O 2025 Special Edition - #733

Google I/O 2025 Special Edition - #733

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan...

28 Mai 202526min

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-...

21 Mai 202557min

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mah...

13 Mai 20251h 1min

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensi...

6 Mai 20251h 7min

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evalu...

30 Apr 202556min

Populært innen Politikk og nyheter

giver-og-gjengen-vg
aftenpodden
aftenpodden-usa
forklart
stopp-verden
popradet
det-store-bildet
fotballpodden-2
dine-penger-pengeradet
rss-gukild-johaug
bt-dokumentar-2
nokon-ma-ga
lydartikler-fra-aftenposten
aftenbla-bla
hanna-de-heldige
rss-dannet-uten-piano
e24-podden
frokostshowet-pa-p5
rss-ness
rss-penger-polser-og-politikk