02 Apr 2026
1h 6m

Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

Latent Space: The AI Engineer Podcast

Moonlake's founders, Fan-yun Sun and Chris Manning, discuss their approach to building world models, which emphasizes structure and reasoning over pure scale. They distinguish their work from video generation models like Sora: Moonlake focuses on action-conditioned models that predict the consequences of actions over longer timescales, which requires abstracted semantic understanding. Manning critiques Yann LeCun's view that language has limited utility, arguing that symbolic representations are powerful for achieving causal understanding and long-term consistency. Moonlake uses a multimodal reasoning model for causality and a diffusion model named Reverie to restyle the persistent representation into photorealistic visuals. They envision their technology as a new rendering paradigm that enables programmable interactions and customization in gaming and embodied AI.

Outline

Part 1: Introduction, Context

Part 2: Core Philosophy, World Models

Part 3: Technical Implementation, Interactivity

Part 4: Evaluation, Utility

Part 5: Product Vision, Multimodality

Part 6: Career, Hiring, Future
