Learning GenAI via SOTA Papers

EP281: Restoring plasticity to over-trained AI

22 min · Yesterday

Description

Title: When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff
Source: http://arxiv.org/abs/2606.09932v1
Summary: This paper identifies and solves the critical 'loss of plasticity' bottleneck in the standard LLM post-training pipeline, where excessive SFT inhibits subsequent RL optimization. It introduces 'Rejuvenation', a foundational training primitive that uses model fusion and neuron resets to enable robust reasoning gains during RL while preserving SFT-acquired knowledge.
