12 Mar 2026
1h 0m

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Latent Space: The AI Engineer Podcast

Summary

Simon Eskildsen walks through Turbopuffer's origin story and architecture, including its evolution into a search engine for unstructured data. He details Turbopuffer's approach to database architecture, which combines NVMe SSDs with object storage and relies on S3's consistency for consensus. He recounts leaving Shopify to consult for Readwise, where the high cost of embedding articles sparked the idea for a more cost-effective solution. The conversation highlights Turbopuffer's early customers, including Cursor and Notion, and the lengths the company went to, such as buying dark fiber, to meet Notion's latency requirements. Eskildsen also shares insights into Turbopuffer's pricing strategy, its team-building philosophy centered on "P99 engineers," and its future plans, including expanding full-text search capabilities and scaling to handle massive datasets.

Outlines

Part 1: Origins, Philosophy, and the Aarhus Connection

00:00

A Risky Proposition: Promising to Return Capital if Product-Market Fit Fails

Eskildsen recounts a conversation with Locky, an investor, in which Eskildsen offered to return the investment if Turbopuffer had not achieved product-market fit by year's end. He and Justine were only willing to continue if the project showed real promise, and they were committed to giving it their best shot, including hiring aggressively. Locky was surprised by the offer, noting he had never heard anyone say that before. The episode then introduces Alessio and Swyx, hosts of the Latent Space podcast, and their guest, Simon Eskildsen of Turbopuffer.

00:48

Simon Eskildsen's Danish Roots and the Aarhus Mafia's Programming Prowess

Swyx notes Turbopuffer's rapid growth and welcomes Simon Eskildsen, highlighting his Danish background and connection to the "Aarhus mafia," a group of influential programmers including Bjarne Stroustrup, Rasmus Lerdorf, and members of the V8 and Google Maps teams. Although he is now mostly Canadian, Eskildsen acknowledges his Danish upbringing and its influence, particularly its "ruthless pragmatism" and focus on aesthetics. He expresses humility about being considered part of the Aarhus mafia alongside its legendary members.

02:10

Turbopuffer Defined: A Search Engine for Unstructured Data and AI Connectivity

Turbopuffer is defined as a search engine specializing in full-text and vector search, designed to connect vast amounts of unstructured data to AI. It aims to be the search engine for all the world's knowledge, enabling AI models to reason with and make sense of information stored in full fidelity. Speaker 1 outlines three conditions needed to build a big database company: a new workload connecting data to AI, a new storage architecture based on NVMe SSDs and object storage, and the ability to implement various query plans over time.

Part 2: Technical Genesis and Architectural Innovation

06:26

From Shopify to Readwise: The Genesis of Turbopuffer's AI-Powered Search

Speaker 1 recounts his decade at Shopify, focusing on infrastructure scaling challenges, particularly with Elasticsearch. After leaving Shopify, he consulted for Readwise, where he developed a recommendation engine using embeddings. The engine worked well, even suggesting articles about having a child to one of the co-founders, but the infrastructure costs were prohibitively high at $30,000 per month for one feature. This experience highlighted the latent demand for a more cost-effective solution, sparking the initial idea for Turbopuffer.

12:26

Napkin Math and Object Storage: The Architectural Foundation of Turbopuffer

Speaker 1 describes his "napkin math" calculations that led to the idea of building a database using object storage, NVMe SSDs, and DRAM. The concept involves storing everything on object storage and "puffing" data into NVMe or DRAM as needed. This approach minimizes round trips and leverages the capabilities of modern hardware and cloud infrastructure. He notes that S3 consistency and NVMe SSD availability in the cloud are recent developments that make this architecture feasible.
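
The tiering idea described above can be sketched as a read-through cache. This is a minimal illustration, not Turbopuffer's implementation: the class names `LRUTier` and `TieredStore`, the LRU eviction policy, and the use of a plain dict as the object store are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUTier:
    """A tiny fixed-capacity LRU cache standing in for DRAM or NVMe (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class TieredStore:
    """Object storage is the source of truth; NVMe and DRAM are caches in front of it."""

    def __init__(self, object_storage, nvme_capacity, dram_capacity):
        self.object_storage = object_storage  # a dict stands in for a bucket
        self.nvme = LRUTier(nvme_capacity)
        self.dram = LRUTier(dram_capacity)

    def read(self, key):
        """Return (value, name_of_tier_that_served_the_read)."""
        value = self.dram.get(key)
        if value is not None:
            return value, "dram"
        value = self.nvme.get(key)
        if value is not None:
            self.dram.put(key, value)  # promote into the faster tier
            return value, "nvme"
        value = self.object_storage[key]  # cold read; KeyError if the key is missing
        self.nvme.put(key, value)
        self.dram.put(key, value)
        return value, "object_storage"


store = TieredStore({"a": b"1", "b": b"2", "c": b"3"}, nvme_capacity=2, dram_capacity=1)
first = store.read("a")   # cold: served from object storage, then cached
second = store.read("a")  # warm: served from DRAM
store.read("b")           # pushes "a" out of the one-slot DRAM tier
third = store.read("a")   # served from NVMe and promoted back into DRAM
```

In this sketch a cold read pays the object-storage cost once, and later reads are served from whichever faster tier still holds the data.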

17:12

S3 Consistency, Compare and Swap, and Dark Fiber: Overcoming Cloud Infrastructure Limitations

Eskildsen explains how Turbopuffer relies on S3's strong consistency, which became available in December 2020, and on compare-and-swap operations. The company started on GCP because compare-and-swap was available there, but ran into latency problems when working with Notion, which ran on AWS. To reduce that latency, Turbopuffer bought dark fiber between its GCP region and AWS in Oregon, which shows its commitment to performance and customer experience.
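
To illustrate why compare-and-swap matters here, the following is a minimal sketch of optimistic concurrency on top of an object store. It is an assumption-laden toy, not Turbopuffer's code: `VersionedObjectStore`, `put_if_version`, and `append_entry` are hypothetical names, and an integer version stands in for whatever precondition token a real object store exposes.

```python
class PreconditionFailed(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedObjectStore:
    """In-memory stand-in for a bucket that supports conditional writes (hypothetical)."""

    def __init__(self):
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys report version 0 so the first write can use it as a precondition.
        return self._objects.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise PreconditionFailed(key)
        self._objects[key] = (current_version + 1, value)
        return current_version + 1


def append_entry(store, key, entry, max_retries=5):
    """Append to a list stored under key, retrying if another writer got there first."""
    for _ in range(max_retries):
        version, value = store.get(key)
        new_value = (value or []) + [entry]
        try:
            return store.put_if_version(key, new_value, version)
        except PreconditionFailed:
            continue  # another writer won the race; re-read and try again
    raise RuntimeError("too much contention")


store = VersionedObjectStore()
v1 = append_entry(store, "manifest", "segment-001")
v2 = append_entry(store, "manifest", "segment-002")

# A writer holding a stale version is rejected instead of overwriting newer data.
try:
    store.put_if_version("manifest", ["rogue"], expected_version=1)
    stale_write_rejected = False
except PreconditionFailed:
    stale_write_rejected = True
```

The design point is that writers coordinate through the store itself: a write succeeds only if nothing changed since the writer last read, so no separate coordination service is needed in this sketch.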

Part 3: Market Adoption and Real-World Use Cases

23:32

Notion's Workloads and the Buy vs. Build Decision in the Age of AI

The discussion shifts to the specific workloads Turbopuffer handles for Notion, emphasizing the importance of low latency. Speaker 1 explains that Notion engineers had considered building a similar system internally but chose to partner with Turbopuffer due to time constraints and the need for a specialized team. This highlights the evolving "buy versus build" equation in the age of AI, where speed and expertise are critical factors.

25:59

Cursor's Early Adoption: A 95% Cost Reduction and the Power of All-In Support

Speaker 1 recounts how Cursor became an early customer, emphasizing the scrappy beginnings of Turbopuffer. After Cursor reached out, Speaker 1 visited their office and offered complete support. This led to a 95% cost reduction for Cursor, which significantly improved their per-user economics. This success prompted Speaker 1 to bring on Justine as a co-founder to help manage the workload and ensure ongoing support.

28:55

Code as Data: Cursor's Security Posture and the Hybrid Nature of Workloads

The conversation explores whether code is a different workload than normal text, noting that Cursor uses its own embedding model and post-trains it for semantic search. Speaker 1 highlights Cursor's exceptional security posture, including using their own embedding model, obfuscating file paths, and encrypting data with their own keys. The discussion concludes that all workloads are hybrid, requiring a combination of semantic, text, regex, and SQL queries.
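
One common way to combine semantic and text results is reciprocal rank fusion (RRF). The episode summary does not say which fusion method Cursor or Turbopuffer uses, so the sketch below is only an illustration of the hybrid idea: the function name, the constant `k=60`, and the sample file names are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears;
    the scores are summed, and ties are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector query and a full-text query.
semantic_hits = ["auth.py", "session.py", "login.py"]
text_hits = ["login.py", "auth.py", "README.md"]

fused = reciprocal_rank_fusion([semantic_hits, text_hits])
```

Documents that rank well in both lists rise to the top, which is why `auth.py` and `login.py` end up ahead of results found by only one query type.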

Part 4: Business Strategy and Pricing Evolution

31:17

The Evolution of Search: From Context Building to Concurrent Agent Queries

The discussion shifts to how search in AI has evolved, from a single search that builds context to many concurrent agent queries. Eskildsen notes that LLMs increasingly do the reasoning, with Turbopuffer invoked as a tool call. He observes growing demand for concurrency, with users issuing many queries simultaneously. In response, Turbopuffer is cutting query pricing by 5x to accommodate these workloads.
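
The concurrent pattern described above can be sketched with `asyncio`. This is a hedged illustration only: `search` is a hypothetical stand-in that sleeps instead of calling a real service, and the `max_in_flight` cap is an assumption about how a client might bound concurrency.

```python
import asyncio


async def search(query, delay=0.01):
    """Hypothetical search call; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return {"query": query, "hits": [f"{query}-doc-{i}" for i in range(2)]}


async def fan_out(queries, max_in_flight=4):
    """Run all queries concurrently, capping how many are in flight at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(query):
        async with semaphore:
            return await search(query)

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(bounded(q) for q in queries))


results = asyncio.run(fan_out(["rate limiter", "retry policy", "auth middleware"]))
```

An agent issuing queries this way waits roughly as long as its slowest query rather than the sum of all of them, which is one reason per-query cost starts to matter more than it did for single-search context building.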

34:22

Turbopuffer's Pricing Evolution: From Vibe-Based to Hardware-Driven

Eskildsen discusses Turbopuffer's pricing, which started as a "vibe price" based on a rough estimate of costs. Pricing was based on storage, writes, and queries. Driven by high GCP bills, the company optimized until it could ensure a small margin. Turbopuffer offers SaaS, single-tenant cluster, and BYOC (bring your own cloud) deployment options.

38:17

Why Locky? Honesty, Authenticity, and the Value of a Generalist Investor

Speaker 1 explains why he chose Locky as an investor, emphasizing the importance of honesty and authenticity. He recounts telling Locky that he would return the money if Turbopuffer didn't achieve product-market fit, a statement that surprised Locky. Speaker 1 values Locky's lack of database expertise, as it complements the team's knowledge and allows him to focus on candidates and customers.

Part 5: The P99 Engineer and Team Culture

41:27

Building a P99 Engineering Team: Traits, Talent Density, and the Default "No"

The discussion turns to team building, particularly the concept of the "P99 engineer." Speaker 1 emphasizes the importance of a talent-dense company and describes the traits they look for in candidates. He explains that the default decision is to reject a candidate, requiring strong advocacy from the team to justify a hire.

45:26

Defining the P99 Engineer: Bending Software to Your Will and a Love of Maps

Speaker 1 elaborates on the traits of a P99 engineer, including the ability to "bend" software to their will, citing the example of Nathan, who enabled Turbopuffer to search 100 billion vectors with low latency. He also jokingly mentions a love of maps as a common trait, emphasizing the importance of being obsessive and detail-oriented.

48:57

Trade-offs, First Principles, and the High-Agency P99 Engineer

The discussion continues on the traits of P99 engineers, highlighting their ability to think in trade-offs and articulate them clearly. Speaker 1 explains that "bending the software to your will" means optimizing the system to get closer to its first-principles performance limits. He contrasts this with a high-agency engineer who might go to extreme lengths to achieve a goal.

Part 6: Future Roadmap and Personal Interests

51:13

The Future of Turbopuffer: Full-Text Search, Scale, and a Better Dashboard

Eskildsen outlines the future of Turbopuffer, describing Act 1 as vector search and Act 2 as full-text search. The company is focused on improving full-text search features, scaling to handle larger datasets, and enhancing the dashboard. He expresses a desire to bring back the simplicity and functionality of phpMyAdmin.

54:30

Act 3 Candidates: Simpler OLAP Queries, Traces and Logging, and Time Series

Speaker 1 discusses potential Act 3 candidates for Turbopuffer, including simpler OLAP queries, traces and logging, and time series. He emphasizes the importance of focus and avoiding overextension. The company is looking for patterns in customer usage to guide its future development.

57:05

Yabukita Kamairicha: A Tea Obsession and the Pursuit of the Perfect Cup

The conversation shifts to Eskildsen's love of tea, specifically Yabukita Kamairicha from the Green Tea Shop. He describes his Airtable of 200 teas and his preference for Chinese green tea. He notes that the best time to get this tea is during the spring harvest.

58:48

P99 Live: A Potential New Venture with Sam Lambert of PlanetScale

Speaker 1 mentions a recent X Live session with Sam Lambert of PlanetScale and hints at a potential new venture called P99 Live or P99 Pod. The podcast concludes with thanks and appreciation.
