07 Apr 2026
1h 12m

Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony


Latent Space: The AI Engineer Podcast

Summary

The discussion centers on harness engineering, a methodology for AI-driven software development, with Ryan Lopopolo from OpenAI. Lopopolo details his team's experience building an internal tool using AI agents that wrote over a million lines of code with minimal human intervention. A key aspect involves inverting control: the AI manages its environment and chooses its tools rather than operating within a predefined scaffold. The conversation covers the team's iterative process, adapting to model updates, and optimizing for agent productivity by enforcing strict build time limits. They also explore "ghost libraries" (software distributed as specifications for an AI to reassemble) and the potential for AI to handle tasks such as code review, dependency management, and even humor generation.

Outlines

Part 1: Foundations of AI-Driven Development

00:00

Exploring Codex and Harness Engineering for AI Product Development

Ryan Lopopolo from OpenAI discusses the potential of Codex and harness engineering in building AI products. He emphasizes the momentum around improving models for coding and the ability to translate product ideas into code using the Codex Harness. Ryan works on Frontier product exploration, focusing on novel ways to deploy OpenAI models into enterprise solutions. His team aims to create packaged products that enterprises can use to safely deploy agents at scale with good governance.

01:47

Building an Internal Tool with Zero Code: A 10x Speed Improvement

Ryan describes his experience at OpenAI, highlighting the company's AI-maximalist environment and internal resources. He recounts building an internal tool without writing a single line of code himself, resulting in a million-line codebase. This approach was reportedly 10x faster than traditional methods. Starting with early versions of Codex CLI, the team adopted a strategy of breaking down complex tasks into smaller, manageable building blocks for the model to assemble.

05:44

Optimizing Build Systems for AI Agents: From Turbo to Nx

The conversation shifts to the evolution of the build system, which moved from a bespoke Makefile to Bazel, then Turbo, and finally Nx to optimize agent productivity. A key constraint was keeping build times under one minute so that the agent's inner loop stays fast. The discussion touches on the significance of background shells in Codex 5.3 and the need to adapt the codebase to model revisions. Ryan emphasizes the importance of systems thinking, identifying agent mistakes, and automating SDLC processes.
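The one-minute build constraint can be made mechanical rather than aspirational. The following is a minimal sketch, assuming the limit is treated as a hard 60-second cap and that any build command can be wrapped; the function name and the budget constant are hypothetical and are not taken from the episode.

```python
import subprocess
import time

# Assumption: "under one minute" is read as a hard 60-second cap.
BUILD_BUDGET_SECONDS = 60.0


def run_with_budget(command, budget_seconds=BUILD_BUDGET_SECONDS):
    """Run a build command and report whether it finished within the budget.

    Returns (ok, elapsed_seconds, reason). A build that exceeds the budget
    is killed, so a slow build fails fast instead of stalling the agent.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(command, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        return False, time.monotonic() - start, "over budget"
    elapsed = time.monotonic() - start
    if result.returncode != 0:
        return False, elapsed, "build failed"
    return True, elapsed, "ok"
```

Wrapping the build this way turns a slow build into an ordinary failure that an agent can see and react to, which is one plausible way to keep the inner loop short.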

Part 2: Agent Architecture and Observability

09:11

Observability and Knowledge Injection for AI Coding Agents

The discussion centers on how humans became the bottleneck in the AI-driven development process, despite a small team producing a million lines of code. The focus shifted to providing the model with observability, such as the graph in the article, to improve its performance. Instead of setting up an environment for the coding agent, the agent itself is the entry point, equipped with skills and scripts to boot the stack. The models crave text, so the team found ways to inject text into the system.

13:22

Balancing Autonomy and Control: Code Review Agents and Incident Response

Concerns about autonomous merging by code review agents are addressed, with a discussion of how the agents are instructed to acknowledge and respond to feedback. The prompts allow the agents to push back, and the reviewer agents are biased toward merging. The agents handle a wide range of tasks, including product development, code and tests, CI configuration, documentation, and production dashboard definitions. The team uses Codex to author JSON for Grafana dashboards and to respond to pages.

17:46

Agent Legibility and the Future of Software Engineering

The conversation explores how the team adapted to the model's preferred way of writing software, prioritizing agent legibility over human legibility. Ryan shares his mindset of being removed from the process, similar to a group tech lead for a large organization. He emphasizes the importance of a Command base class for repeatable business logic with built-in tracing and metrics. The discussion touches on how models are improving at proposing abstractions, allowing humans to focus on higher-level strategic issues.
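To illustrate the idea of a Command base class with built-in tracing and metrics, here is a minimal sketch. It is not the team's implementation: the in-memory `TRACE_LOG` and `METRICS` sinks, the `run`/`execute` split, and the `CreateInvoice` example are all hypothetical stand-ins chosen for illustration.

```python
import time
from abc import ABC, abstractmethod
from collections import defaultdict

# Hypothetical in-memory sinks standing in for a real tracing/metrics backend.
TRACE_LOG = []
METRICS = defaultdict(int)


class Command(ABC):
    """Subclasses implement execute(); run() wraps it with tracing and metrics."""

    @property
    def name(self):
        return type(self).__name__

    @abstractmethod
    def execute(self, payload):
        """Business logic only; no instrumentation code belongs here."""

    def run(self, payload):
        start = time.perf_counter()
        status = "error"
        try:
            result = self.execute(payload)
            status = "ok"
            return result
        finally:
            # Recorded on success and on failure alike.
            duration = time.perf_counter() - start
            TRACE_LOG.append(
                {"command": self.name, "status": status, "seconds": duration}
            )
            METRICS[f"{self.name}.{status}"] += 1


class CreateInvoice(Command):  # hypothetical example command
    def execute(self, payload):
        if payload["amount"] <= 0:
            raise ValueError("amount must be positive")
        return {"invoice_total": payload["amount"]}
```

Because instrumentation lives in the base class, every new command written by an agent is traced and counted without the agent having to remember to add it.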

21:33

Coding Agents Eating Knowledge Work and Persisting Non-Functional Requirements

Ryan suggests that coding agents will eventually handle non-coding knowledge work. He emphasizes the importance of providing models with scripts. The team encodes non-functional requirements into docs, tests, and review agents, all of which feed prompts back to the coding agent. The goal is to extract what engineers think good looks like and coach the agent to meet those standards.

Part 3: The Symphony System and Workflow Automation

25:01

The Effectiveness of Giving AI Models Articles and the End of Software Dependencies

People are reportedly giving the article on harness engineering to AI coding agents like Pi or Codex, and it's proving wildly effective. Bret Taylor's response is discussed, focusing on the idea that software dependencies are going away and can be vendored. Ryan agrees, stating that the complexity of dependencies that can be internalized is currently low to medium. He also notes that security can be improved by deeply reviewing and changing internalized dependencies.

28:02

Internal Tooling and the Symphony Spec

The team had deployed their app to the first dozen internal users, hit some performance issues, and asked those users to export a trace. The on-call engineer worked with Codex to build a local Next.js devtool app that visualizes the entire trace. The team is distributing Symphony as a spec, an approach some are calling ghost libraries. The team is spinning up a new repo, asking Codex to write the spec, and then having another Codex review the implementation.

31:30

Symphony: Automating the Development Workflow

The discussion transitions to Symphony, an Elixir-based system designed to automate the development workflow. The model chose Elixir because its process supervision and GenServers are well suited to the kind of process orchestration the team is doing. Symphony aims to remove the need for humans to sit in front of their terminals, allowing them to be more latency-insensitive and less attached to the code.

34:43

AI-Pilled Development and the Rigid Architecture of the App

The team has been working to be as AI-pilled as possible, and many of their innovations have influenced OpenAI's products. The team's daily stand-up runs 45 minutes because they need to fan out a shared understanding of the current state. The app has a rigid architecture with 500 npm packages to prevent people from trampling on each other.

Part 4: Collaboration and Technical Optimization

37:34

Issue Trackers, Collaboration, and the Future of Tooling

The team uses Linear as their issue tracker and Slack for communication. The team fires off Codex on small fix-up tasks to sink that knowledge into the repository. The team discusses the need for collaboration tooling that allows agents to naturally collaborate with humans. The team gives the agent full access to its domain.

42:49

Adapting Non-Textual Things to Improve Model Behavior

The team has been adapting non-textual inputs into text in order to improve model behavior. Agents do not perceive visually in the same way that humans do. If the team wants the agent to actually see a layout, it's almost easier to rasterize that image to ASCII art and feed it in to the agent.
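Rasterizing an image to ASCII can be sketched in a few lines. This is an illustrative example only, not the team's tooling: it assumes the image has already been decoded into a grid of grayscale values (0 to 255), and the function name, character ramp, and cell parameters are hypothetical.

```python
# Characters ordered from darkest to lightest; the ramp itself is an arbitrary choice.
RAMP = "@#*+=-. "


def rasterize_to_ascii(pixels, cell_w=1, cell_h=1):
    """Convert a grid of grayscale values (0..255) into ASCII art.

    Each cell_w x cell_h block of pixels is averaged and mapped to one
    character, so larger cells produce a coarser, smaller picture.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    lines = []
    for top in range(0, height, cell_h):
        row_chars = []
        for left in range(0, width, cell_w):
            block = [
                pixels[y][x]
                for y in range(top, min(top + cell_h, height))
                for x in range(left, min(left + cell_w, width))
            ]
            mean = sum(block) / len(block)
            index = min(int(mean / 256 * len(RAMP)), len(RAMP) - 1)
            row_chars.append(RAMP[index])
        lines.append("".join(row_chars))
    return "\n".join(lines)
```

Terminal characters are usually taller than they are wide, so choosing a `cell_h` of roughly twice `cell_w` tends to preserve the layout's proportions in the text the agent receives.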

45:00

The Coordination Layer and the Importance of Instructions

The coordination layer was a tricky piece to get right. The model took a shortcut by using the primitives available in a runtime with native process supervision. The team gives the agent the gh CLI along with some text that says CI has to pass. The agents are good at following instructions, so giving them explicit instructions improves the reliability of the result.

49:15

Software Flexibility and Trust in the Output

Software is made more flexible when it's able to adapt to the environment in which it is deployed. The video shared here is the same sort of video the coding agent would attach to the PR it creates.

Part 5: Enterprise Scaling and Future Outlook

53:05

The Future of Coding and the Limits of Current Models

As the agent works, the team watches windows pop up all over the screen and get captured while files appear on the desktop. The team discusses the different models and how to deploy them. The models cannot yet go from a new product idea to a prototype.

57:22

Frontier: OpenAI's Enterprise Platform

The discussion transitions to OpenAI's Frontier, the platform through which OpenAI wants to drive AI transformation in every enterprise. The goal is to make it easy to deploy highly observable, safe, controlled, identifiable agents into the workplace. The platform will work with company-native IAM stacks and plug into security tooling.

1:01:20

Agent Management and the Data Feedback Layer

The demo videos are an example of very large scale agent management. The dashboard is for IT, GRC and governance folks, the AI innovation office, and the security team. The data is the feedback layer, and it needs to be solved first in order to close the product's feedback loop.

1:05:02

The Building Blocks of Agents and the Tension Between Harness and Training

The team has skills for properly generating deep-fried memes and for reacji culture in Slack. There's a fundamental tension between investing deeper in the harness and investing deeper in the training process to get the model to do more of this by default. The team is building an on-policy harness, starting from what is already within distribution and modifying it from there.

1:09:03

Shipping Relentlessly and the Growth of OpenAI

The Codex team ships relentlessly. The team is super excited to support the self-hosted harness. There is lots of work to be done in order to successfully serve enterprise customers in Frontier. The team is hiring, and the Codex app has just passed 2 million weekly active users.
