YouTube · 22 Jul 2025
44m

Chelsea Finn: Building Robots That Can Do Anything

Y Combinator

Building general-purpose robots requires a shift from purpose-built models to foundation models trained on large-scale, diverse data. Physical Intelligence's "Pi Zero" architecture pairs a vision-language model with a diffusion-based action head, enabling robots to perform complex, long-horizon tasks such as folding laundry and cleaning up a kitchen. Success hinges on a two-stage recipe: pre-training on broad, diverse datasets, then fine-tuning on curated, high-quality demonstrations. This recipe lets robots generalize to unseen environments and respond to open-ended human prompts, connecting digital intelligence to physical execution. Treating robotics as a foundation model problem means developers no longer need to build custom software for every application, which should speed the deployment of versatile robots that operate in dynamic, real-world settings.
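
The two-stage recipe can be illustrated with a deliberately tiny sketch. This is not Physical Intelligence's model or training code: it assumes a one-weight linear "policy" that maps an observation to an action, and every name here (`make_data`, `train`, `broad`, `curated`) and every number is hypothetical. It only shows the ordering of the stages: fit broad, noisy data first, then continue training on a small curated set.

```python
import random

def make_data(rng, n, gain, noise_std):
    """Return (observation, action) pairs where action = gain * obs + noise."""
    data = []
    for _ in range(n):
        obs = rng.uniform(-1.0, 1.0)
        data.append((obs, gain * obs + rng.gauss(0.0, noise_std)))
    return data

def mse(w, data):
    """Mean squared error of the linear policy action = w * obs."""
    return sum((w * o - a) ** 2 for o, a in data) / len(data)

def train(w, data, lr, steps):
    """Full-batch gradient descent on the MSE of a one-weight linear policy."""
    for _ in range(steps):
        grad = sum(2.0 * (w * o - a) * o for o, a in data) / len(data)
        w -= lr * grad
    return w

rng = random.Random(0)

# Stage 1 data (assumed): large, diverse, noisy, and only roughly on-task.
broad = make_data(rng, n=500, gain=2.0, noise_std=0.5)
# Stage 2 data (assumed): small, clean demonstrations of the target behavior.
curated = make_data(rng, n=20, gain=2.5, noise_std=0.0)

w_pre = train(0.0, broad, lr=0.1, steps=200)      # pre-training
w_ft = train(w_pre, curated, lr=0.05, steps=100)  # fine-tuning from w_pre

print("curated-set error after pre-training:", mse(w_pre, curated))
print("curated-set error after fine-tuning: ", mse(w_ft, curated))
```

In this toy setting, fine-tuning starts from the pre-trained weight rather than from scratch, so the curated data only has to correct the gap between the broad behavior and the target behavior.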
