Hyung Won Chung, a research scientist at OpenAI, delivers a lecture on Transformers, focusing on the dominant driving forces behind AI research and how understanding those forces can help predict the field's future trajectory. He argues that exponentially cheaper compute and the scaling it enables are the primary drivers, referencing Rich Sutton's plot of computational power over time. Chung discusses the "Bitter Lesson," emphasizing general methods with weaker modeling assumptions that scale with more data and compute. He analyzes the structures of encoder-decoder and decoder-only Transformer architectures, highlighting the trade-off between adding structure for short-term gains and removing it for long-term scalability, and concludes by encouraging the audience to revisit the assumptions baked into their own problems in order to shape the future of AI effectively.
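To make the architectural contrast concrete, below is a minimal NumPy sketch (not from the lecture; all names, shapes, and the toy data are illustrative assumptions). It shows the structural difference the summary refers to: an encoder-decoder model keeps input and output as separate sequences and lets the decoder cross-attend to an encoded source, whereas a decoder-only model concatenates everything into one stream and keeps only a causal mask as the remaining structural assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
src = rng.normal(size=(5, d))  # hypothetical "input" tokens (e.g. a source sentence)
tgt = rng.normal(size=(3, d))  # hypothetical "output" tokens generated so far

# Encoder-decoder: the decoder cross-attends to a separately encoded source.
# This bakes in the assumption that inputs and outputs are distinct sequences.
cross_out = attention(tgt, src, src)

# Decoder-only: source and target share one sequence; a causal mask is the
# only structural constraint that remains.
seq = np.concatenate([src, tgt], axis=0)
causal = np.tril(np.ones((len(seq), len(seq)), dtype=bool))
self_out = attention(seq, seq, seq, mask=causal)

print(cross_out.shape)  # (3, 8): one output per target token
print(self_out.shape)   # (8, 8): one output per token in the combined stream
```

The sketch is intended only to illustrate the "more structure vs. less structure" trade-off discussed in the lecture, not to reproduce any particular model Chung presents.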