NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo) | Latent Space: The AI Engineer Podcast

The Latent Space Podcast explores the evolving landscape of AI development, developer experience, and the future of inference. Nader Khalil and Kyle Kranen from NVIDIA discuss the acquisition of Brev, a developer tool for GPU access, and NVIDIA's broader strategy in developer experience, emphasizing the importance of understanding the end user. They introduce Dynamo, a data-center-scale inference engine designed to accelerate inference with techniques such as disaggregation, and explain "SOL" (Speed of Light), a way of creating urgency by working out the theoretical limit on a project's timeline. The conversation also covers the shift toward hardware-model co-design, the challenges of long-context models, the potential of agents to streamline workflows, and the security concerns that agents raise.

Outlines

Part 1: Security and Agent Capabilities

Part 2: NVIDIA, Brev, and Developer Experience

Part 3: The SOL Philosophy and Internal Culture

Part 4: Dynamo and Inference Scaling

Part 5: Context Length and Model Architecture

Part 6: Agents, Security, and Developer Tools

Part 7: Future Outlook and Community

Part 1: Security and Agent Capabilities

00:00 Agent Capabilities and Security Enforcement Points

Part 2: NVIDIA, Brev, and Developer Experience

00:38 NVIDIA's Acquisition of Brev: A Developer-Focused Tool for GPU Access

04:22 Brev's Artisan Marketing and NVIDIA's Passion for Developer Experience

07:19 Acquisition Synergy and Brev's Role in NVIDIA's Developer Experience

09:14 NVIDIA's Expanding Developer Base and the Reinvention of Developer UX

10:30 Addressing the Needs of New GPU Users and the DGX Spark UX

Part 3: The SOL Philosophy and Internal Culture

12:41 NVIDIA Sync and the SOL Philosophy: Urgency and Root Understanding

15:16 SOL's Application and Prioritization in Software Development

17:10 Incremental Progress and Hardware-Driven SOL at NVIDIA

18:38 Kyle's Journey into Tabular Data, Recommenders, and Graph Neural Networks at NVIDIA

20:58 Passion-Driven Innovation and the $0 Billion Business Concept at NVIDIA

22:29 NVIDIA's Internal Communication and the Power of Momentum

24:20 Market Creation and the Ideologically Free Research at NVIDIA

Part 4: Dynamo and Inference Scaling

26:33 Introduction to Dynamo: A Data Center Scale Inference Engine

28:47 Scaling Up vs. Scaling Out and the Challenges of Multi-Node Inference

31:12 Model Selection, Accuracy, and the Three Axes of Inference: Cost, Quality, Latency

34:06 Experimentation and the "Just Try Again" Approach to Model Optimization

35:34 Inference Reading Groups and Nemotron's Layered Release

38:40 Dynamo's Role in Balancing Cost, Quality, and Latency

40:35 Scaling Out with Dynamo and Kubernetes for Prefill and Decode

Part 5: Context Length and Model Architecture

43:01 Context Length Upper Bounds and Model-Hardware Co-Design

45:50 Sparsity and Hardware-Model-Context Co-Design

47:15 Training Models for Specific Harnesses and the Future of Context Length

49:02 Unhobblers and Scientific Discoveries in Model Architecture

51:21 Future Model Architectures and the Groq Acquisition

Part 6: Agents, Security, and Developer Tools

53:02 Dynamo Sessions at GTC and the Future of Agents in Production Inference

55:25 NVIDIA's Deployment of Codecs and the Fluidity of the Organization

58:41 Agent Security and the Importance of Enforcement Points

1:01:14 NVIDIA's Internal Inference Gateway and the Fork VS Code Hackathon

1:03:34 Driverless Car Hackathon and the World's Shortest Hackathon

1:05:11 Agents and Dynamo: Automating Configuration and Agent UX

1:07:13 The Importance of CLIs and the Open CLI Foundation

1:09:31 CLIs vs. APIs and the Ubiquity of Bash

Part 7: Future Outlook and Community

1:11:04 The Blackwell RTX 6000 and the Economy of Scale

1:12:30 WideEP and the Year of the Sub-Agent

1:14:05 System as Model and the Complexity of Multi-Agent Systems