AI-MI Seminar Series: Analyzing the Nonlocality of Sparse Autoencoder Features
Join us for the next AI-MI Seminar Series talk as we explore how sparse autoencoders are helping unlock interpretable features within large language models.
This talk examines how tools from machine learning and theoretical physics can be combined to better understand the internal representations of LLMs. Drawing inspiration from holographic duality, the speaker introduces a novel entropy-based measure that quantifies how nonlocal each learned feature is with respect to the input tokens, offering new insight into how information is structured and processed in these systems.
Watch live at youtube.com/@AIMaterialsInstitute
Topic: Sparse autoencoders (SAEs) have emerged as a useful tool for extracting interpretable features from the internal representations of large language models (LLMs). Motivated by an analogy with holographic duality in theoretical physics, where strongly correlated boundary degrees of freedom map to weakly coupled bulk fields of varying nonlocality, we introduce an entropy measure that quantifies how nonlocal each SAE feature is in the input token space, in the sense of how many input tokens have a strong impact on the activation of this feature. We analyze this measure in different language models, providing a new tool for understanding the information dynamics in LLMs.
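For readers who want a concrete picture of an entropy-based nonlocality measure, the following is a minimal illustrative sketch, not the speaker's definition, which this announcement does not spell out. It assumes that each input token position has a non-negative "influence" score on a given SAE feature (for example a gradient magnitude or an ablation effect; how such scores are obtained is left open here), normalizes the scores into a probability distribution, and takes its Shannon entropy. The function names `nonlocality_entropy` and `effective_token_count` are hypothetical.

```python
import numpy as np


def nonlocality_entropy(influence):
    """Shannon entropy (in nats) of normalized per-token influence scores.

    `influence` is an assumed input: one score per input token position,
    describing how strongly that token affects a single SAE feature.
    """
    weights = np.abs(np.asarray(influence, dtype=float))
    total = weights.sum()
    if total == 0.0:
        # Convention chosen for this sketch: no influence means zero entropy.
        return 0.0
    probs = weights / total
    nonzero = probs[probs > 0]  # 0 * log(0) is treated as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return max(0.0, float(entropy))  # guard against a -0.0 result


def effective_token_count(influence):
    """exp(entropy): a rough count of tokens that matter for the feature."""
    return float(np.exp(nonlocality_entropy(influence)))


# A feature driven by a single token versus one spread over four tokens.
local_feature = [0.0, 0.0, 1.0, 0.0]
spread_feature = [1.0, 1.0, 1.0, 1.0]
print(effective_token_count(local_feature))
print(effective_token_count(spread_feature))
```

Under these assumptions, a feature that depends on one token has an effective token count of about 1, while a feature influenced equally by four tokens has a count of about 4, so larger values correspond to more nonlocal features.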
Speaker: Xiaoliang Qi is a Professor of Physics at Stanford University, where his research spans the interplay of quantum entanglement, quantum gravity, and quantum chaos, alongside continued work on topological phases in condensed matter systems. He is a recipient of the New Horizons in Physics Prize and the Packard Fellowship, among other honors. Qi has been a leading voice in applying ideas from quantum information and tensor networks to many-body and materials problems, and is engaged in the emerging conversation about agentic and AI-driven approaches to scientific research.

