Speculative Decoding and Efficient LLM Inference with Chris Lott - #717

Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language model inference. We explore the challenges presented by LLM encoding and decoding (aka generation) and how these interact with hardware constraints such as FLOPS, memory footprint, and memory bandwidth to limit key inference metrics such as time-to-first-token, tokens per second, and tokens per joule. We then dig into a variety of techniques that can be used to accelerate inference, such as KV compression, quantization, pruning, speculative decoding, and leveraging small language models (SLMs). We also discuss future directions for enabling on-device agentic experiences, such as parallel generation, and software tools like Qualcomm AI Orchestrator. The complete show notes for this episode can be found at https://twimlai.com/go/717.

Episodes (782)

This Week in ML & AI - 6/24/16: Dueling Neural Networks at ICML, Plus Training a Robotic Housekeeper

This Week in Machine Learning & AI brings you the week’s most interesting and important stories from the world of machine learning and artificial intelligence. This week's show covers the Internationa...

25 Jun 2016, 25 min

This Week in Machine Learning & AI - 6/17/16: Apple's New ML APIs, IBM Brings Deep Learning Thunder

This Week in Machine Learning & AI brings you the week’s most interesting and important stories from the world of machine learning and artificial intelligence. This week’s podcast digs into Apple's ML...

18 Jun 2016, 24 min

This Week In Machine Learning & AI - 6/10/16: Self-Motivated AI, Plus A Kill-Switch for Rogue Bots

This Week in Machine Learning & AI brings you the week’s most interesting and important stories from the world of machine learning and artificial intelligence. This week’s podcast looks at new researc...

11 Jun 2016, 24 min

This Week In Machine Learning & AI - 6/3/16: Facebook's DeepText, ML & Art, Artificial Assistants

This Week in Machine Learning & AI brings you the week’s most interesting and important stories from the world of machine learning and artificial intelligence. This week’s podcast looks at Facebook's ...

4 Jun 2016, 24 min

This Week In Machine Learning & AI - 5/27/16: The White House on AI & Aggressive Self-Driving Cars

This Week in Machine Learning & AI brings you the week's most interesting and important stories from the world of machine learning and artificial intelligence. This week's episode explores the White H...

28 May 2016, 25 min

This Week In Machine Learning & AI - 5/20/16: AI at Google I/O, Amazon's Deep Learning DSSTNE

This Week In Machine Learning & AI - May 20, 2016. Google I/O, deep learning hardware and an AI to save you from conference call hell.

21 May 2016, 19 min
