[AIEWF Preview] Gemini in 2025 and Realtime Voice AI

This episode covers Google's Gemini updates and the future of voice-based AI applications, with a focus on the Live API. Logan Kilpatrick and Shrestha Basu Mallick highlight features such as thinking budgets for Gemini 2.5 Pro and native audio output, emphasizing developer control and multilingual capabilities. The discussion explores the challenges and infrastructure involved in building real-time voice agents, including voice activity detection and latency reduction, with Kwindla Hultman Kramer offering insights from Daily's partnership with Google. A key topic is the balance between componentized, specialized models and a unified Gemini model, with the ultimate goal of integrating diverse capabilities. The speakers also touch on proactive audio and speaker identification as emerging features, and share wishes for broader language support and more integrated capabilities in future Gemini iterations.

Outline

Part 1: Introduction, Team Roles

Part 2: Gemini API Features, Caching, UI

Part 3: Live API, Audio/Video, Workflows

Part 4: Partnerships, Infrastructure, Voice Agents

Part 5: Future Outlook, Closing

Latent Space: The AI Engineer Podcast

Part 1: Introduction, Team Roles

00:04 Introduction to Gemini API Updates and Team Roles at Google I/O

Part 2: Gemini API Features, Caching, UI

01:14 Gemini 2.5 Pro Budgets, Thought Summaries, and Native Audio Output Highlights

03:52 Implicit Context Caching and the Challenges of Caching in AI Models

05:17 Gemini Team's Video Content and the Potential of Gemini Diffusion for Generative UI

Part 3: Live API, Audio/Video, Workflows

07:00 Audio/Video's Role and Challenges in Live API Development

09:55 Complex Workflows and the Future of Voice-Based Applications

11:23 Gemini's Unified Model Approach and the Role of Specialized Models

Part 4: Partnerships, Infrastructure, Voice Agents

14:29 Introduction of Kwindla Hultman Kramer and Daily's Partnership with Gemini

15:34 Gemini Live's Model Cascade and the Balance of Latency, Cost, and Quality

18:25 Real-Time Voice Agents and the Importance of Low-Latency Networking

20:15 Proactive Audio and Speaker Identification in Gemini Models

Part 5: Future Outlook, Closing

22:34 Wish List for Next Year's I/O and Closing Remarks