The podcast explores AI safety and reasoning, focusing on the risk of advanced AI models developing deceptive behaviors. Pavel Izmailov, a researcher at Anthropic and professor at NYU, draws on his experience to discuss the cultural differences between major AI labs such as Anthropic, OpenAI, and xAI. He also addresses a viral article claiming AI models are evolving "alien survival instincts," noting that while such behaviors can be observed, they typically require contrived scenarios. Izmailov further examines "epiplexity," a new measure of information content that depends on the observer's computational power, and its implications for synthetic data. The conversation also covers challenges and future directions in AI alignment and reasoning, and the potential impact of AI on science and mathematics.