
The podcast features a speaker detailing the development of Olmo 3, a thinking model, and its evaluation process. The talk covers architecture choices such as Grouped Query Attention (GQA) for memory efficiency, data prioritization for reasoning, and long-context extension. It then discusses post-training techniques such as SFT, DPO, and RL, with emphasis on the infrastructure challenges of Reinforcement Learning with Verifiable Rewards (RLVR) and how they were addressed. The speaker closes by reflecting on the difficulties of evaluation, including high computational cost, variance across runs, and the need for more diverse and representative benchmarks, and on the importance of open-source models for research.
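To illustrate the memory-efficiency point behind GQA: multiple query heads share each key/value head, so the KV cache shrinks by the ratio of query heads to KV heads. Below is a minimal NumPy sketch of grouped-query attention; the shapes and head counts are illustrative assumptions, not Olmo 3's actual configuration.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """Causal grouped-query attention: n_heads query heads share
    n_kv_heads key/value heads, shrinking the KV cache by a factor
    of n_heads // n_kv_heads relative to standard multi-head attention."""
    seq, d_model = x.shape
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads  # query heads per shared KV head

    q = (x @ wq).reshape(seq, n_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each shared KV head to its group of query heads.
    k = np.repeat(k, group, axis=1)  # (seq, n_heads, head_dim)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    # Causal mask: position q may only attend to positions k <= q.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, n_heads * head_dim)

# Example with assumed dimensions: 8 query heads, 2 KV heads -> 4x smaller KV cache.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))
wq = rng.normal(size=(64, 64))   # full projection for 8 query heads
wk = rng.normal(size=(64, 16))   # only 2 KV heads of dim 8 each
wv = rng.normal(size=(64, 16))
out = grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2)
```

Note that the KV projections produce only 16 dimensions per token instead of 64, which is exactly the cache saving GQA buys during inference.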