180: Reinforcement Learning

Intro topic: Grills

News/Links:

Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick:
    • Pokemon Sword and Shield
  • Jason:

Topic: Reinforcement Learning

  • Three types of machine learning
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges in training an RL model
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
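
The two value-optimization entries in the outline above, SARSA and Q-Learning, differ only in which next-step value they bootstrap from. Here is a minimal tabular sketch of both updates in Python. The function names, the dictionary-based Q-table, and the default `alpha` and `gamma` values are illustrative assumptions, not something taken from the episode.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, actions,
                      alpha=0.5, gamma=0.9, done=False):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = 0.0 if done else max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next,
                 alpha=0.5, gamma=0.9, done=False):
    """On-policy update: bootstrap from the action actually taken next."""
    next_value = 0.0 if done else q[(s_next, a_next)]
    q[(s, a)] += alpha * (r + gamma * next_value - q[(s, a)])


# Hypothetical two-action example: same transition, two different targets.
ACTIONS = ["left", "right"]

q_ql = defaultdict(float)
q_ql[("s1", "left")], q_ql[("s1", "right")] = 1.0, 2.0
q_learning_update(q_ql, "s0", "right", 1.0, "s1", ACTIONS)

q_sarsa = defaultdict(float)
q_sarsa[("s1", "left")], q_sarsa[("s1", "right")] = 1.0, 2.0
sarsa_update(q_sarsa, "s0", "right", 1.0, "s1", "left")
```

In this example Q-Learning uses the larger next-state value (the greedy action "right"), while SARSA uses the value of the action the agent actually took next ("left"), so the two tables end up with different estimates for the same transition.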

★ Support this podcast on Patreon ★
