YouTube, 01 May 2026
1h 38m

Shipping complex AI applications — Braintrust & Trainline

AI Engineer

Delivering reliable AI applications at scale requires moving beyond prototype-level development to robust operational workflows. This session outlines a systematic approach to industrializing generative AI built on observability, structured evaluation, and continuous feedback loops. Breaking monolithic LLM calls into multi-stage agentic workflows lets developers pinpoint failure modes and improve system performance. In practice, this means using Braintrust to trace execution, manage prompts, and apply both deterministic and LLM-as-a-judge scoring functions. Examples from Trainline show how these techniques let teams maintain quality while rapidly iterating on complex agentic products such as travel assistants, keeping production systems cost-efficient and reliable. The methodology turns AI development from a "works on my machine" experiment into a rigorous, collaborative engineering process that supports safe, rapid deployment of mission-critical systems.
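To make the summary more concrete, here is a minimal sketch of a two-stage workflow with per-stage tracing, a deterministic scorer, and an LLM-as-a-judge style scorer. It is plain Python and does not use the Braintrust SDK. Every name in it (`Trace`, `extract_route`, `draft_answer`, `stub_judge`, `run_case`) is hypothetical, and the stages and the judge are simple stand-ins for what would be model calls in a real system.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Span:
    """One recorded stage of the workflow."""
    name: str
    input: str
    output: str
    seconds: float


@dataclass
class Trace:
    """Collects a span per stage so a failure can be located by stage."""
    spans: list = field(default_factory=list)

    def run(self, name: str, fn: Callable[[str], str], value: str) -> str:
        start = time.perf_counter()
        result = fn(value)
        self.spans.append(Span(name, value, result, time.perf_counter() - start))
        return result


# Hypothetical stages; in a real system each would be an LLM call.
def extract_route(query: str) -> str:
    words = query.split()
    origin = words[words.index("from") + 1]
    destination = words[words.index("to") + 1]
    return f"{origin}->{destination}"


def draft_answer(route: str) -> str:
    origin, destination = route.split("->")
    return f"Here are trains from {origin} to {destination}."


def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer: 1.0 on an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0


def stub_judge(answer: str, route: str) -> str:
    """Stand-in for a judge model; returns the verdict 'yes' or 'no'."""
    origin, destination = route.split("->")
    return "yes" if origin in answer and destination in answer else "no"


def judge_score(answer: str, route: str, judge: Callable[[str, str], str]) -> float:
    """LLM-as-a-judge style scorer: maps the judge's verdict to a number."""
    return 1.0 if judge(answer, route).strip().lower() == "yes" else 0.0


def run_case(query: str, expected_route: str):
    trace = Trace()
    route = trace.run("extract_route", extract_route, query)
    answer = trace.run("draft_answer", draft_answer, route)
    scores = {
        "route_exact_match": exact_match(route, expected_route),
        "answer_mentions_route": judge_score(answer, expected_route, stub_judge),
    }
    return trace, scores


if __name__ == "__main__":
    trace, scores = run_case("trains from London to Paris", "London->Paris")
    for span in trace.spans:
        print(span.name, "->", span.output)
    print(scores)
```

Scoring the intermediate route separately from the final answer is what lets a low score be attributed to a specific stage rather than to the workflow as a whole.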
