Why reward models are still key to understanding alignment

Interconnects - A podcast by Nathan Lambert

In an era dominated by direct preference optimization and LLM-as-a-judge, why do we still need a model to output only a scalar reward? (A short sketch of what that looks like in code follows these notes.)

This is AI-generated audio with Python and 11Labs. Music generated by Meta's MusicGen.

Source code: https://github.com/natolambert/interconnects-tools

Podcast figures:
Figure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/reward-models/img_004.png
Figure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/reward-models/img_009.png

0:00 Why reward models are still key to understanding alignment

Get full access to Interconnects at www.interconnects.ai/subscribe
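For context on the teaser question: a reward model in RLHF is typically a language-model backbone with a single linear head that maps a prompt plus completion to one scalar score, trained on pairwise preference data. Below is a minimal sketch of that idea; the backbone name, the last-token pooling, and the Bradley-Terry loss are illustrative assumptions, not details from the episode.

    # Minimal sketch of a scalar reward model; backbone choice and
    # training details are illustrative, not from the episode.
    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class RewardModel(nn.Module):
        def __init__(self, backbone_name: str = "gpt2"):
            super().__init__()
            self.backbone = AutoModel.from_pretrained(backbone_name)
            # Single linear head: hidden state -> one scalar reward.
            self.reward_head = nn.Linear(self.backbone.config.hidden_size, 1)

        def forward(self, input_ids, attention_mask):
            hidden = self.backbone(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
            # Score the last non-padding token (assumes right padding).
            last_idx = attention_mask.sum(dim=1) - 1
            pooled = hidden[torch.arange(hidden.size(0)), last_idx]
            return self.reward_head(pooled).squeeze(-1)  # shape: (batch,)

    def preference_loss(chosen_rewards, rejected_rewards):
        # Pairwise Bradley-Terry objective: the chosen completion
        # should receive a higher scalar reward than the rejected one.
        return -torch.nn.functional.logsigmoid(
            chosen_rewards - rejected_rewards
        ).mean()

That single scalar output is what a policy is optimized against in PPO-style RLHF, which is what separates reward models from judges that emit full text critiques.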
