In this monologue episode, Yannic Kilcher discusses Anthropic's paper "On the Biology of a Large Language Model," which probes the inner workings of transformer language models using a method called circuit tracing. He explains how Anthropic substitutes transcoders into a replacement model to build attribution graphs, which reveal which features activate inside the model and how they influence one another. Kilcher walks through examples from the paper, such as multi-step reasoning, poetry generation, and multilingual capabilities, analyzing the attribution graphs and intervention experiments used to understand how these models function. While he finds the research insightful, he is skeptical of some of Anthropic's framing and suggests that simpler explanations, such as effects of fine-tuning, may suffice in some cases. He also covers the limitations of transcoders, including their computational cost and the performance loss they introduce, and stresses that intervention experiments are essential to validate what the transcoders appear to show.
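As background for listeners, the idea of a transcoder plus an intervention experiment can be sketched in a few lines. This is a minimal illustration under loose assumptions, not Anthropic's implementation: all sizes, weights, and names here are hypothetical, and a real transcoder is trained to sparsely reconstruct an MLP layer's output rather than initialized at random.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64  # hypothetical dimensions

# A transcoder stands in for an MLP layer, re-expressing its computation
# as a wide dictionary of (ideally interpretable) features.
W_enc = rng.normal(size=(d_model, d_features))
W_dec = rng.normal(size=(d_features, d_model))

def transcoder(x):
    # Encode the layer input into nonnegative feature activations;
    # ReLU leaves many features at exactly zero (sparsity).
    acts = np.maximum(x @ W_enc, 0.0)
    # Decode back to model space; the active features and their
    # decoder directions are the nodes/edges of an attribution graph.
    return acts, acts @ W_dec

x = rng.normal(size=d_model)
acts, out = transcoder(x)

# Intervention experiment: ablate the strongest feature and observe
# how the layer's output shifts, testing that feature's causal role.
ablated = acts.copy()
ablated[np.argmax(acts)] = 0.0
out_ablated = ablated @ W_dec
shift = np.linalg.norm(out - out_ablated)
```

The point of the final step mirrors the episode's emphasis: feature labels from a transcoder are only hypotheses until an intervention shows that zeroing or boosting a feature changes the model's behavior in the predicted way.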