Why reward models are still key to understanding alignment
Interconnects - A podcast by Nathan Lambert
In an era dominated by direct preference optimization and LLM-as-a-judge, why do we still need a model to output only a scalar reward?

This is AI generated audio with Python and 11Labs. Music generated by Meta's MusicGen.

Source code: https://github.com/natolambert/interconnects-tools

Original post:

Podcast figures:
Figure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/reward-models/img_004.png
Figure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/reward-models/img_009.png

0:00 Why reward models are still key to understanding alignment

Get full access to Interconnects at www.interconnects.ai/subscribe