April
This seminar is scheduled for 3 PM (not the usual 4 PM).
Dynamical versus Bayesian Transitions in a Toy Model of Superposition
Susan Wei, Monash University
Despite the extensive use of stochastic optimization methods in training neural networks, the reasons for their effectiveness remain elusive. In this talk, we propose modeling parameter inference in deep learning as a process of free energy minimization, which allows us to draw on a rich body of insights from statistical physics. Further leveraging singular learning theory reveals that the free energy landscapes of “singular” models (which encompass most practical neural network architectures) differ markedly from those of “regular” models, displaying a pronounced energy-entropy competition as the sample size increases. Theory predicts that this heightened competition manifests as Bayesian transitions: shifts, as the sample size grows, in which region of parameter space dominates the posterior.
We investigate whether such transitions actually arise by examining the Toy Model of Superposition (TMS), a simplified neural network studied by Anthropic for mechanistic interpretability. Through both theoretical analysis and empirical evidence, we show that TMS indeed exhibits these predicted Bayesian transitions. In the latter part of the talk, we explore whether free energy minimization is a suitable model for parameter inference via stochastic gradient descent (SGD) in TMS. We show that the energy-entropy dynamics observed during training, which we term dynamical transitions, align with theoretical predictions, suggesting that free energy minimization may serve as a viable, minimal model for understanding how neural networks are trained.
Contrasting modes of cultural evolution: Kra-Dai languages and weaving technologies
Emma Kopp, Université Paris Dauphine, PSL
Computational methods have been used to reconstruct the history of languages over several millennia from data on modern languages. Using stochastic models of evolution along a phylogenetic tree, these methods infer the relationships between languages (the topology of the tree) together with the ages of ancestral languages, usually in a Bayesian setting. We investigate and compare the evolution of two aspects of culture, languages and weaving technologies, among the Kra-Dai (Tai-Kadai) peoples of southwest China and Southeast Asia, using Bayesian Markov chain Monte Carlo (MCMC) methods to infer phylogenies. The results show that languages and looms evolved in related but different ways, and they offer new insights into the diaspora of Kra-Dai speakers across Southeast Asia.