Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay 2 | Latent Space

This episode explores recent advances and future directions in AI, focusing on reasoning and reinforcement learning (RL). Yi Tay, from Google's Gemini team in Singapore, discusses the team's pursuit of AGI and the shift towards on-policy RL, which he argues generalizes better than imitation learning. The conversation covers Gemini's IMO gold medal and the move it represents from specialized systems to end-to-end models. Other topics include the growing usefulness of AI in coding, the importance of data efficiency, and the potential of LLMs in recommendation systems. Yi Tay also shares insights on establishing a frontier research lab in Singapore, stressing talent density and research taste.

Outlines

Part 1: Return to Google and AGI Vision

Part 2: Gemini, IMO Gold, and Reasoning Benchmarks

Part 3: Defining Reasoning and AI Productivity

Part 4: Architecture, Scaling, and Data Efficiency

Part 5: World Models and Information Retrieval

Part 6: Talent, Culture, and Personal Growth




Part 1: Return to Google and AGI Vision

00:00 Introduction to Yi Tay's Return to Google and Gemini Team in Singapore

01:25 Reflections on Google Infrastructure, RL, and the On-Policy vs. Off-Policy Learning

06:11 On-Policy Learning: Analogies to Human Learning and Development

07:58 Machine Learning Insights into Human Learning: Learning Rate and Self-Consistency

Part 2: Gemini, IMO Gold, and Reasoning Benchmarks

12:33 Diving into the IMO World: Gemini's Gold Medal and the Shift from AlphaProof

19:00 IMO Gold: Model Output, Team Dynamics, and the Excitement of Live Competition

25:22 IMOCAT Codenames, Reasoning Efforts at GDM, and Pokemon Benchmarks

27:08 Completing the Pokedex: Planning, Research, and the Generation of Novel Knowledge

Part 3: Defining Reasoning and AI Productivity

32:23 Demystifying Reasoning: Chain of Thought, RL, and AI Coding

37:50 AI Coding: From Vibe Coding to Investigating Bugs and Improving Productivity

40:51 AI as a Passive Aura: Buffing Productivity and the Collective Effort of AI Research

Part 4: Architecture, Scaling, and Data Efficiency

45:01 Is Attention All You Need? Transformers, Scale, and the Future of AI Architecture

51:02 The Rate of New Ideas and the Closed Lab Advantage in AI Research

53:44 Memory vs. Compute Bound, Data Efficiency, and Data Optimal Training

57:05 Data Crunch, Pre-Training, and the Pursuit of More Efficient Learning Paradigms

Part 5: World Models and Information Retrieval

1:01:13 World Models: Video, Code, and the Resolution of Possible Worlds

1:04:21 Data Efficiency: Flops per Token, Learning Algorithms, and RL Environments

1:07:50 LLM RecSys: Ranking, Filtering, Personalization, and Re-Indexing

1:13:14 DSI and Generative Retrieval: Reimagining Search and the Modeling Dynamics of IR Tasks

1:17:50 Academic Benchmarks, Online Evals, and the Importance of Business Problems

Part 6: Talent, Culture, and Personal Growth

1:21:56 Geography Matters: Talent, Time Zones, and the Culture of AI

1:24:46 Hiring for Talent Density: RL, Coding Competitions, and Exceptional Achievements

1:27:03 Demonstrating Research Taste: Guidance, Taste, and the Importance of Health