Artificial Analysis founders Micah Hill-Smith and George Cameron discuss the evolution and future of AI benchmarking with Swyx. They trace their journey from a side project to a company providing independent AI model analysis, emphasizing the importance of objective metrics. They cover their business model, which includes enterprise subscriptions and private benchmarking, and the tech stack behind their public benchmarks. The conversation explores the nuances of AI model evaluation, including cost considerations, the challenges of parsing model responses, and the importance of controlling for variance in benchmarks. They also introduce new metrics such as the Omniscience Index for measuring hallucination, and discuss how the cost of AI intelligence is falling even as overall spending rises because of new use cases.
Outline
Part 1: Origins, Mission, and Business Model
Part 2: The Science of Independent Benchmarking
Part 3: The Intelligence Index and Market Landscape
Part 4: Knowledge, Hallucination, and Physics
Part 5: Model Architecture and Parameters
Part 6: Agentic Performance and Real-World Tasks
Part 7: Openness and Transparency
Part 8: Economics and Token Efficiency
Part 9: Future Outlook and Community