Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony
Latent Space: The AI Engineer Podcast
The discussion centers on Harness engineering, a methodology for AI-driven software development, with Ryan Lopopolo from OpenAI. Lopopolo details his team's experience building an internal tool using AI agents that wrote over a million lines of code with minimal human intervention. A key aspect involves inverting control, allowing the AI to manage its environment and choose its tools, rather than operating within a predefined scaffold. The conversation covers the team's iterative process, adapting to model updates and optimizing for agent productivity by enforcing strict build time limits. They also explore the concept of "ghost libraries," distributing software as specifications for AI to reassemble, and the potential for AI to handle tasks such as code review, dependency management, and even humor generation.
Part 1: Foundations of AI-Driven Development
00:00 Exploring Codex and Harness Engineering for AI Product Development
Ryan Lopopolo from OpenAI discusses the potential of Codex and harness engineering in building AI products. He emphasizes the momentum around improving models for coding and the ability to translate product ideas into code using the Codex Harness. Ryan works on Frontier product exploration, focusing on novel ways to deploy OpenAI models into enterprise solutions. His team aims to create packaged products that enterprises can use to safely deploy agents at scale with good governance.
01:47 Building an Internal Tool with Zero Code: A 10x Speed Improvement
Ryan describes his experience at OpenAI, highlighting the company's AI-maximalist environment and internal resources. He recounts building an internal tool with zero lines of code written by himself, resulting in a million-line codebase. This approach was reportedly 10x faster than traditional methods. Starting with early versions of Codex CLI, the team adopted a strategy of breaking down complex tasks into smaller, manageable building blocks for the model to assemble.
05:44 Optimizing Build Systems for AI Agents: From Turbo to Nx
The conversation shifts to the evolution of the build system, which moved from a bespoke Makefile to Bazel, then Turbo, and finally Nx to optimize agent productivity. A key constraint was keeping build times under one minute so that the agent's inner loop stays fast. The discussion touches on the significance of background shells in Codex 5.3 and the need to adapt the codebase to model revisions. Ryan emphasizes the importance of systems thinking, identifying agent mistakes, and automating SDLC processes.
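The one-minute constraint can be enforced mechanically. The following is a minimal sketch, not the team's actual tooling: the function name `run_with_budget` and the constant `BUILD_BUDGET_SECONDS` are hypothetical, and the build command is whatever the repository uses.

```python
import subprocess
import sys
import time

# Assumption: the budget mirrors the "under one minute" constraint from the discussion.
BUILD_BUDGET_SECONDS = 60.0


def run_with_budget(command, budget_seconds=BUILD_BUDGET_SECONDS):
    """Run a build command; return (ok, elapsed_seconds).

    ok is True only when the command exits with status 0 and finishes
    within the time budget.
    """
    start = time.monotonic()
    result = subprocess.run(command)
    elapsed = time.monotonic() - start
    ok = result.returncode == 0 and elapsed <= budget_seconds
    return ok, elapsed


# Hypothetical stand-in for a real build command.
ok, elapsed = run_with_budget([sys.executable, "-c", "print('build ok')"])
print(f"build took {elapsed:.2f}s, within budget: {ok}")
```

A check like this could run in CI so that a change pushing the build past the budget fails the same way a broken test would.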
Part 2: Agent Architecture and Observability
09:11 Observability and Knowledge Injection for AI Coding Agents
The discussion centers on how humans became the bottleneck in the AI-driven development process, despite a small team producing a million lines of code. The focus shifted to providing the model with observability, such as the graph in the article, to improve its performance. Instead of setting up an environment for the coding agent, the agent itself is the entry point, equipped with skills and scripts to boot the stack. The models crave text, so the team found ways to inject text into the system.
13:22 Balancing Autonomy and Control: Code Review Agents and Incident Response
Concerns about autonomous merging by code review agents are addressed, with a discussion of how the agents are instructed to acknowledge and respond to feedback. The prompts allow the coding agents to push back, and the reviewer agents are biased toward merging. The agents handle a wide range of tasks, including product development, code and tests, CI configuration, documentation, and production dashboard definitions. The team uses Codex to author JSON for Grafana dashboards and to respond to pages.
17:46 Agent Legibility and the Future of Software Engineering
The conversation explores how the team adapted to the model's preferred way of writing software, prioritizing agent legibility over human legibility. Ryan shares his mindset of being removed from the process, similar to a group tech lead for a large organization. He emphasizes the importance of a Command base class for repeatable business logic with built-in tracing and metrics. The discussion touches on how models are improving at proposing abstractions, allowing humans to focus on higher-level strategic issues.
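A Command base class of the kind mentioned here might look like the sketch below. It is an illustration only: the class shape, the in-memory `METRICS` and `SPANS` sinks, and the `RenameProject` example are all hypothetical and not taken from the team's codebase.

```python
import time
from abc import ABC, abstractmethod
from collections import defaultdict

# Hypothetical in-memory sinks; a real system would export to a tracing/metrics backend.
METRICS = defaultdict(int)
SPANS = []


class Command(ABC):
    """Subclasses implement execute(); run() wraps it with a span and a counter."""

    @property
    def name(self):
        return type(self).__name__

    @abstractmethod
    def execute(self, **kwargs):
        ...

    def run(self, **kwargs):
        start = time.monotonic()
        status = "ok"
        try:
            return self.execute(**kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Recorded on both success and failure, so every command is observable.
            duration = time.monotonic() - start
            SPANS.append({"command": self.name, "status": status, "duration_s": duration})
            METRICS[f"{self.name}.{status}"] += 1


class RenameProject(Command):
    """Hypothetical piece of business logic."""

    def execute(self, *, old, new):
        if not new:
            raise ValueError("new name must not be empty")
        return f"{old} -> {new}"


print(RenameProject().run(old="alpha", new="beta"))
```

Because tracing and metrics live in `run()`, an agent writing a new command only has to fill in `execute()` and gets the instrumentation without being asked for it.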
21:33 Coding Agents Eating Knowledge Work and Persisting Non-Functional Requirements
Ryan suggests that coding agents will eventually handle non-coding knowledge work. He emphasizes the importance of providing models with scripts. The team encodes non-functional requirements into docs, tests, and review agents so that they reach the coding agent as prompts. The goal is to extract what engineers think good looks like and coach the agent to meet those standards.
Part 3: The Symphony System and Workflow Automation
25:01 The Effectiveness of Giving AI Models Articles and the End of Software Dependencies
People are reportedly giving the article on harness engineering to AI coding agents like Pi or Codex, and it's proving wildly effective. Bret Taylor's response is discussed, focusing on the idea that software dependencies are going away and can be vendored. Ryan agrees, stating that the complexity of dependencies that can be internalized is currently low to medium. He also notes that security can be improved by deeply reviewing and changing internalized dependencies.
28:02 Internal Tooling and the Symphony Spec
The team had deployed their app to its first dozen internal users, ran into some performance issues, and asked those users to export a trace. The on-call engineer worked with Codex to build a local Next.js dev tool that visualizes the entire trace. The team is distributing Symphony as a spec, which some are calling ghost libraries. The workflow is to spin up a new repo, ask Codex to write the spec, and then have another Codex review the implementation.
31:30 Symphony: Automating the Development Workflow
The discussion transitions to Symphony, an Elixir-based system designed to automate the development workflow. The model chose Elixir because its process supervision and GenServers suit the kind of process orchestration the team is doing. Symphony aims to remove the need for humans to sit in front of their terminals, allowing them to be more latency-insensitive and less attached to the code.
34:43 AI-Pilled Development and the Rigid Architecture of the App
The team has been working to be as AI-pilled as possible, and many of their innovations have influenced OpenAI's products. The daily stand-up runs 45 minutes because the team needs the time to fan out an understanding of the current state to everyone. The app has a rigid architecture with 500 npm packages to prevent people from trampling on each other.
Part 4: Collaboration and Technical Optimization
37:34 Issue Trackers, Collaboration, and the Future of Tooling
The team uses Linear as their issue tracker and Slack for communication. The team fires off Codex on small fix-ups to sink that knowledge into the repository. The team discusses the need for collaboration tooling that allows agents to naturally collaborate with humans. The team gives the agent full access over its domain.
42:49 Adapting Non-Textual Things to Improve Model Behavior
The team has been adapting non-textual things into text in order to improve model behavior. Agents do not perceive visually in the same way that humans do. If the team wants the agent to actually see a layout, it is almost easier to rasterize that image to ASCII art and feed it to the agent.
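The idea of rasterizing a layout to ASCII can be sketched as follows. This is an assumption-laden illustration: it starts from an already decoded grid of 0-255 brightness values rather than an image file, and the function name `to_ascii` and the character ramp are hypothetical choices.

```python
# Characters ordered from low to high brightness value; the ramp itself is an arbitrary choice.
RAMP = " .:-=+*#%@"


def to_ascii(pixels, ramp=RAMP):
    """Map a 2D grid of 0-255 brightness values to lines of ASCII characters."""
    last = len(ramp) - 1
    rows = []
    for row in pixels:
        # Scale each value into an index of the ramp (0 -> first char, 255 -> last char).
        rows.append("".join(ramp[value * last // 255] for value in row))
    return "\n".join(rows)


# Hypothetical 6x4 "screenshot": a bright frame around a dark panel with a mid-gray widget.
layout = [
    [255, 255, 255, 255, 255, 255],
    [255, 0, 0, 0, 0, 255],
    [255, 0, 128, 128, 0, 255],
    [255, 255, 255, 255, 255, 255],
]
print(to_ascii(layout))
```

The resulting text block can be pasted into a prompt like any other text, which is the point: the layout arrives in the form the model handles best.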
45:00 The Coordination Layer and the Importance of Instructions
The coordination layer was a tricky piece to get right. The model takes a shortcut by using the primitives available in a runtime that has native process supervision. The team gives the agent the GitHub CLI (gh) along with some text saying that CI has to pass. The agents are good at following instructions, so giving them explicit instructions improves the reliability of the result.
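Pairing the GitHub CLI with a plain-text rule might look like the sketch below. It assumes `gh` is installed and authenticated, and it relies on `gh pr checks` exiting with a non-zero status when checks are failing or still pending. The helper names `ci_passed` and `next_instruction`, and the wording of the instructions, are hypothetical.

```python
import subprocess


def ci_passed(pr_number, run=subprocess.run):
    """Return True when `gh pr checks` exits with status 0 for the pull request."""
    result = run(["gh", "pr", "checks", str(pr_number)], capture_output=True, text=True)
    return result.returncode == 0


def next_instruction(pr_number, run=subprocess.run):
    """Turn the CI state into plain-text guidance the agent can act on."""
    if ci_passed(pr_number, run=run):
        return f"CI is green on PR #{pr_number}. Proceed to review."
    return f"CI has to pass on PR #{pr_number}. Inspect the failing checks and push a fix."
```

The `run` parameter exists only so the command runner can be swapped out in tests; in normal use, calling `next_instruction(123)` shells out to `gh`.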
49:15 Software Flexibility and Trust in the Output
Software becomes more flexible when it can adapt to the environment in which it is deployed. The video shared here is the same sort of video the coding agent would attach to the PR it creates.
Part 5: Enterprise Scaling and Future Outlook
53:05 The Future of Coding and the Limits of Current Models
The team describes sitting at the computer while windows pop up all over the place and get captured, and files appear on the desktop. The team discusses the different models and how to deploy them. The models are not yet able to go from a new product idea to a prototype.
57:22 Frontier: OpenAI's Enterprise Platform
The discussion transitions to OpenAI's Frontier, the platform through which OpenAI wants to drive the AI transformation of every enterprise. The goal is to make it easy to deploy highly observable, safe, controlled, identifiable agents into the workplace. The platform will work with company-native IAM stacks and plug into security tooling.
1:01:20 Agent Management and the Data Feedback Layer
The demo videos are an example of very-large-scale agent management. The dashboard is for IT, GRC and governance folks, the AI innovation office, and the security team. The data is the feedback layer, and it needs to be solved first in order to close the product's feedback loop.
1:05:02 The Building Blocks of Agents and the Tension Between Harness and Training
The team has skills for how to properly generate deep-fried memes and for taking part in reacji culture in Slack. There's a fundamental tension between investing deeper in the harness and investing deeper in the training process to get the model to do more of this by default. The team is building an on-policy harness, starting from one that is already within distribution and modifying it from there.
1:09:03 Shipping Relentlessly and the Growth of OpenAI
The Codex team ships relentlessly. The team is excited to support a self-hosted harness. There is a lot of work to be done to successfully serve enterprise customers in Frontier. The team is hiring, and the Codex app has just passed 2 million weekly active users.