TechQuanta: Engineering & Science

Deep Reinforcement Learning from Human Preferences

14 min · Oct 29, 2024

Description

This episode explores how reinforcement learning (RL) can use human feedback to pursue complex goals without a hand-written reward function. It covers research from OpenAI and DeepMind in which humans compare pairs of agent behaviors, and those comparisons are used to train a reward model. This lets agents learn tasks that are hard to specify with a simple reward, such as performing backflips or playing video games. The system needs human feedback on less than 1% of the agent's interactions with the environment, which makes human-guided RL far more practical. The approach could bring RL closer to human intentions, an important step for future AI applications.
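
To make the reward-model idea concrete, here is a minimal sketch of how a single human comparison could be turned into a training signal. The description does not state the exact loss, so this assumes a Bradley-Terry style model: the probability that a human prefers behavior A over behavior B is a logistic function of the difference in summed predicted rewards. The function names and the label convention (1.0 = A preferred, 0.0 = B preferred, 0.5 = no preference) are illustrative choices, not taken from the episode.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Modelled probability that a human prefers segment A over segment B.

    rewards_a / rewards_b are the per-step rewards predicted by the reward
    model for each segment (assumption: Bradley-Terry style comparison on
    the summed rewards).
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function of the reward difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the human label and the modelled preference.

    label: 1.0 if the human preferred A, 0.0 if B, 0.5 if judged equal
    (hypothetical convention for this sketch).
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many comparisons pushes the reward model to score preferred behaviors higher. The RL agent is then trained against the learned reward instead of a hand-written one.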
