Chelsea Finn: Building Robots That Can Do Anything

Developing general-purpose robots requires shifting from purpose-built models to foundation models trained on large-scale, diverse data. Physical Intelligence uses its "Pi Zero" architecture, which couples a vision-language model with a diffusion-based action head, to let robots perform complex, long-horizon tasks such as laundry folding and kitchen cleanup. Success hinges on a two-stage recipe: pre-training on broad, diverse datasets, then fine-tuning on curated, high-quality demonstrations. This approach lets robots generalize to unseen environments and respond to open-ended human prompts, narrowing the gap between digital intelligence and physical execution. Treating robotics as a foundation model problem means developers no longer need to build custom software for every application, which should speed the deployment of versatile robots that operate in dynamic, real-world settings.
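The two-stage recipe can be illustrated with a toy example. The sketch below is not Physical Intelligence's training code: it assumes a hypothetical one-parameter "policy" (action = w * observation) and synthetic demonstrations, purely to show how pre-training on broad, mixed-quality data gives a starting point that fine-tuning on a small curated set then sharpens. All names (make_demos, train, broad, curated) are invented for illustration.

```python
import random


def make_demos(rng, n, slope, noise):
    """Generate n synthetic (observation, action) pairs from a hypothetical demonstrator."""
    demos = []
    for _ in range(n):
        obs = rng.uniform(-1.0, 1.0)
        demos.append((obs, slope * obs + rng.gauss(0.0, noise)))
    return demos


def mse(w, demos):
    """Mean squared error of the toy policy action = w * obs."""
    return sum((w * o - a) ** 2 for o, a in demos) / len(demos)


def train(w, demos, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * o - a) * o for o, a in demos) / len(demos)
        w -= lr * grad
    return w


rng = random.Random(0)

# Stage 1 data: broad and diverse, but of mixed quality.
# Assumption: different demonstrators behave differently (slopes 1.0 to 3.0).
broad = []
for slope in (1.0, 1.5, 2.0, 2.5, 3.0):
    broad += make_demos(rng, 200, slope, noise=0.3)

# Stage 2 data: a small, curated set of consistent, high-quality demonstrations.
curated = make_demos(rng, 50, slope=3.0, noise=0.02)

w_pre = train(0.0, broad)      # pre-training on the broad dataset
w_ft = train(w_pre, curated)   # fine-tuning starts from the pre-trained weight

print(f"pre-trained w = {w_pre:.2f}, curated loss = {mse(w_pre, curated):.4f}")
print(f"fine-tuned  w = {w_ft:.2f}, curated loss = {mse(w_ft, curated):.4f}")
```

In this toy setting the pre-trained weight settles near the average behavior across demonstrators, and fine-tuning moves it toward the curated behavior. A real system would replace the scalar w with a large vision-language-action model, but the ordering of the two stages is the same.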

Outlines

Y Combinator

00:00 Overcoming Robotics Silos with General-Purpose Foundation Models

17:08 Scaling Robot Generalization to Novel Real-World Environments

25:19 Enabling Open-Ended Human-Robot Interaction through Hierarchical Models

30:08 Research Frontiers and Infrastructure for Physical Intelligence