Tim & Heinrich — Democratizing Reinforcement Learning Research

Since reinforcement learning requires hefty compute resources, it can be tough to keep up without a serious budget of your own. Find out how the team at Facebook AI Research (FAIR) is looking to increase access and level the playing field with the help of NetHack, an archaic rogue-like video game from the late 80s.

Links discussed:

The NetHack Learning Environment:

https://ai.facebook.com/blog/nethack-learning-environment-to-advance-deep-reinforcement-learning/

Reinforcement learning, intrinsic motivation:

https://arxiv.org/abs/2002.12292

Knowledge transfer:

https://arxiv.org/abs/1910.08210


Tim Rocktäschel is a Research Scientist at Facebook AI Research (FAIR) London and a Lecturer in the Department of Computer Science at University College London (UCL). At UCL, he is a member of the UCL Centre for Artificial Intelligence and the UCL Natural Language Processing group. Prior to that, he was a Postdoctoral Researcher in the Whiteson Research Lab, a Stipendiary Lecturer in Computer Science at Hertford College, and a Junior Research Fellow in Computer Science at Jesus College, at the University of Oxford.

https://twitter.com/_rockt


Heinrich Küttler is an AI and machine learning researcher at Facebook AI Research (FAIR). Before that, he was a research engineer and team lead at DeepMind.

https://twitter.com/HeinrichKuttler

https://www.linkedin.com/in/heinrich-kuttler/


Topics covered:

0:00 a lack of reproducibility in RL

1:05 What is NetHack and how did the idea come to be?

5:46 RL in Go vs NetHack

11:04 performance of vanilla agents, what do you optimize for

18:36 transferring domain knowledge, source diving

22:27 human vs machine intrinsic learning

28:19 ICLR paper - exploration and RL strategies

35:48 the future of reinforcement learning

43:18 going from supervised to reinforcement learning

45:07 reproducibility in RL

50:05 most underrated aspect of ML, biggest challenges?


Get our podcast on these other platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/google-podcasts

YouTube: http://wandb.me/youtube

Soundcloud: http://wandb.me/soundcloud


Tune in to our bi-weekly virtual salon and listen to industry leaders and researchers in machine learning share their research:

http://wandb.me/salon


Join our community of ML practitioners, where we host AMAs, share interesting projects, and meet other people working in deep learning:

http://wandb.me/slack


Our gallery features curated machine learning reports by researchers exploring deep learning techniques, Kagglers showcasing winning models, and industry leaders sharing best practices:

https://wandb.ai/gallery
