This interview podcast features Amir Haghighat and Yineng Zhang of Baseten, an inference platform for large language models (LLMs), discussing the newly released DeepSeek-V3 model. The discussion opens with an overview of DeepSeek-V3's capabilities and its ranking on the LM Arena leaderboard, where it stands as the best open-weights model. The conversation then turns to the challenges of serving such a large model, focusing on Baseten's use of H200 clusters and the role of frameworks like SGLang in efficient inference. Finally, the speakers cover the three pillars of mission-critical inference workloads (model-level performance, horizontal scalability, and developer experience), detailing Baseten's approach and the distinctive features of SGLang. A key takeaway is Baseten's consumption-based pricing model, which contrasts with per-token pricing and suits customers running custom models under strict performance requirements.