Measuring Progress on Scalable Oversight for Large Language Models | BlueDot Narrated | Podwise