In this podcast episode, Jack Parker-Holder and Shlomi Fruchter from Google DeepMind discuss Genie 3, a model that generates interactive, consistent virtual environments in real time from text prompts. They trace the project's origins, describing how it builds on earlier work such as Genie 2 and GameNGen, and emphasize Genie 3's "spatial memory", which allows the state of the world to persist over time. The conversation covers potential applications in gaming, robotics, education, and agent training, as well as the challenge of balancing adherence to the text prompt against realistic world simulation. The speakers also discuss how Genie 3 differs from video generation models such as Veo 3, and where world model research may go next, including multi-user environments and improved physical understanding.