A teleological approach to understanding LLMs
Comments on the paper "Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve" by McCoy, R.T. et al.
Note: I am not an author of the original paper, and any errors or omissions in this article are my own.
In a previous note on the essay “Artificial General Intelligence Is Already Here”, I mostly agreed with the authors and stressed the importance of not imposing preconceived prescriptive ideas about how AI models should work, given that we do not currently have a comprehensive and satisfactory grand unified theory of how minds work more generally.
Superficially, it might seem that the very existence of large language models proves that we understand how they work - after all, we built them. But words can be tricky, and a more accurate description would be to say that we merely trained them. By its very nature, training entails specifying what the system should do, but not how it should do it. In the words of Eliezer Yudkowsky, machine-learning models “… are giant, inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working”.
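To make the distinction concrete, here is a minimal sketch of what training means, in Python with numpy, using a toy linear model as a stand-in for the giant matrices (illustrative only, not a language model): we specify what the system should do, by defining an error to drive down, but never how it should do it.

```python
import numpy as np

# A toy stand-in for "nudging matrices toward better performance":
# we specify WHAT the system should do (reduce the squared error),
# but never HOW it should solve the task.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # toy inputs
y = X @ rng.normal(size=8)           # toy targets

W = np.zeros(8)                      # the (very small) "inscrutable matrix"
for step in range(500):
    error = X @ W - y                # how wrong the outputs currently are
    grad = X.T @ error / len(X)      # the direction of worse performance
    W -= 0.1 * grad                  # nudge the weights the other way

print("final loss:", float(np.mean((X @ W - y) ** 2)))
```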
Thus we have conjured into existence artifacts such as large language models that exhibit impressive capabilities, but science is still scrambling to catch up and explain how these models actually work.
This state of affairs is nothing new; after all, over millions of years evolution nudged brains in the direction of better performance until they inexplicably started talking. About a half-century ago a new scientific discipline — cognitive science — was founded to try to understand how minds work, and why they work the way they do. Cognitive science is multi-disciplinary, drawing on other fields such as biology, computer science, psychology, economics, linguistics and anthropology, to name but a few.
One way that cognitive science advances our understanding is by making testable conjectures and then performing experiments to rule out some explanations in favour of others. One seminal example is the "Stroop effect" experiment (Stroop, 1935).
This experiment revealed insights into the automatic processes that influence the way our minds work, particularly concerning attention and interference. Stroop was interested in exploring the conflict between different mental processes when they’re pitted against each other.
In the classic Stroop task, participants are presented with a list of color words (like "red", "blue", "green", etc.), but these words are printed in ink colors that do not match the words themselves. For instance, the word "red" might be printed in blue ink, and so on. Participants are then asked to name, as quickly as possible, the color of the ink that the words are printed in, not the words themselves.
What Stroop found was a significant amount of interference in the reaction times of the participants. It took individuals longer to name the color of the ink when the word described a different color (the incongruent condition) than when the color of the ink and the word were the same (the congruent condition). This effect demonstrated that the known meaning of the word had an automatic influence on the individual's ability to do the task, as it interfered with the naming of the color of the ink.
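For concreteness, here is a small sketch of the stimulus design in Python (illustrative, not Stroop's original procedure): each trial pairs a color word with an ink color, and counts as congruent only when the two match.

```python
import random

# Sketch of the Stroop stimulus design. Reaction times come from human
# participants in the lab; nothing here is simulated beyond the stimuli.
COLORS = ["red", "blue", "green", "yellow"]

def make_trials(n, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        word = rng.choice(COLORS)    # the printed color word
        ink = rng.choice(COLORS)     # the ink it is printed in (to be named)
        condition = "congruent" if word == ink else "incongruent"
        trials.append((word, ink, condition))
    return trials

for word, ink, condition in make_trials(5):
    print(f'"{word}" printed in {ink} ink -> {condition}')
```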
These results underscored the idea that cognitive processes can operate without the need for conscious guidance, and sometimes in conflict with conscious intention. The insights gained from this research have had a profound impact, contributing to theories of selective attention, automaticity, and the processing speed of semantic information in human cognition.
Many of the most informative experiments in psychology, and in science more generally, gain their insights from studying when a system does not work as expected, looking at its "failure modes", "anomalies" or "interferences". The Stroop experiment is a prime example of this.
In the Stroop task, the interference is evident when participants struggle to quickly name the color of the ink because the semantic meaning of the word itself is in conflict with the ink's color. Under "normal" conditions, where there is no interference, participants can quickly and efficiently process the information (e.g., reading words or naming colors separately). But when these tasks are combined in a conflicting manner, the system's inefficiencies or vulnerabilities are exposed.
Researchers are now applying similar approaches from cognitive psychology in order to try to better understand artificial cognition. A recent paper by R. Thomas McCoy et al. at Princeton systematically examines some of the conditions under which OpenAI GPT models fail to give correct answers (McCoy et al. 2023). The authors conjectured that, given the way these models are trained, the correctness of the models’ output would be influenced in part by the prevalence of specific examples in the training data (the “task probability”).
For example, in one of the paper's tasks the underlying problem (applying a simple linear function) is the same across examples, but the specific numbers that are plugged into the function vary.
In some cases the specific values have a high task probability, i.e., they appear very often in the training data, and the model often (but not always) produces the correct answer. Other examples have a low task probability, and, as predicted, the model shows worse performance on them.
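The following sketch shows how one might probe this contrast, in the spirit of the paper's linear-function task. The query_model helper is a hypothetical stand-in for whatever model API you use; the two functions are equally hard to compute and differ only in how often text like them appears on the web.

```python
# Sketch of a task-probability contrast (in the spirit of the paper's
# linear-function task; prompts and helper names are hypothetical).
def f_high(x):
    return 9 * x // 5 + 32   # resembles Celsius-to-Fahrenheit conversion (common in text)

def f_low(x):
    return 7 * x // 5 + 31   # structurally identical, but rarely seen in text

def query_model(prompt):
    # Hypothetical stand-in: replace with a call to your model of choice.
    raise NotImplementedError

def accuracy(f, description, inputs):
    correct = 0
    for x in inputs:
        prompt = f"Compute f({x}) where f(x) = {description}. Answer with a number only."
        correct += query_model(prompt).strip() == str(f(x))
    return correct / len(inputs)

# Same underlying problem, different task probability
# (inputs are multiples of 5 so both functions yield exact integers):
# accuracy(f_high, "(9/5)x + 32", range(0, 100, 5))
# accuracy(f_low,  "(7/5)x + 31", range(0, 100, 5))
```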
McCoy et al. (2023) performed many other experiments, and identified several failure modes of large language models that arise from the way they are trained:
Difficulty in tasks that depend on meaning. “We remain agnostic about whether LLMs truly capture meaning or only capture other properties that correlate with it; what we believe is clear is that meaning-sensitive tasks do not come naturally to systems trained solely on textual input, such that we can expect LLMs to encounter difficulty in handling these tasks.” (McCoy et al. 2023, p. 44).
Inability to modify text that has already been produced. “LLMs produce text one word at a time. They are not explicitly trained to plan far ahead, and they are unable to alter text once they have produced it. These facts sometimes cause inaccuracies in model performance because LLMs will produce an error due to the inability to plan ahead and cannot later correct that error because they cannot change their previously-produced text.” (McCoy et al. 2023, p. 44).
Training distribution: societal biases and spurious correlations. “… neural networks are susceptible to using invalid heuristics that yield the correct answer most of the time in their training distribution but that are not valid strategies in the general case” (McCoy et al. 2023, p. 47).
Training distribution: idiosyncratic memorization. “… if the dataset contains some sentences that are frequently repeated, the model is likely to memorize them—even if they are not important or high-probability sentences in the broader world.” (McCoy et al. 2023, p. 44).
Architecture: sensitivity to tokenization and other aspects of input formatting. “models can be brittle to perturbations (e.g., typos) that cause a word to be broken into unfamiliar token sequences, whereas approaches that do not rely on subword tokens can be more robust to such perturbation” (McCoy et al. 2023, p. 48). A short sketch after this list shows this brittleness directly.
Architecture: limited compositionality and systematicity. “Even though neural networks can behave compositionally and systematically, it is not straightforward for them to do so, so it can be expected that their handling of these phenomena will encounter some difficulties.” (McCoy et al. 2023, p. 49).
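The tokenization brittleness in particular is easy to observe. Here is a small sketch assuming the tiktoken library (a tokenizer used with OpenAI models); the exact splits depend on the vocabulary, but a typo typically breaks a word into more, less familiar pieces:

```python
import tiktoken  # pip install tiktoken

# Compare how a subword (BPE) tokenizer splits a familiar word versus a
# typo of it. Exact splits depend on the vocabulary; the point is that
# the typo tends to break into more, less familiar pieces.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["definitely", "definately"]:
    tokens = enc.encode(word)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{word!r} -> {len(tokens)} token(s): {pieces}")
```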
Clearly it will be important to take these properties into account when using large language models in practice. But the broader contribution of the paper is an insight into the methodology we use to understand artificial cognition: the authors advocate a teleological approach, in which we take into account the goals of the system, i.e., its purpose and what it was trained to do, when trying to understand how it works and what it can do.
This is similar to another area of cognitive science, evolutionary psychology, which attempts to understand the mind as a system designed by natural selection whose purpose is to maximise inclusive fitness. By looking at the mind through the lens of evolution we can sometimes better understand the many biases that affect human cognition (Haselton et al. 2015). For example, we perceive noisy moving objects as going faster when their sound is approaching than when it is receding, a “mistake” which only makes sense when we consider it as an adaptation to the increased threat that approaching objects posed in our ancestral environment.
To understand the biases of artificial systems, we should likewise take a teleological approach that accounts for the task the system was trained to perform. Yes, large language models often get things wrong, but so do humans, and we should not rush to label them as lacking intelligence. The human brain evolved to navigate complex social interactions and ever-changing environments, whereas language models are optimized for predicting text patterns based on vast data.
Recognizing these divergent evolutionary and design pathways allows for a more nuanced understanding of what intelligence entails and reminds us that cognitive biases—human or artificial—are reflections of a system's underlying purpose and developmental context.
It's imperative that we assess systems, whether biological or artificial, with respect to their origins, training, and intended functions, acknowledging that both effectiveness and limitations are deeply rooted in the unique contexts and objectives for which each system was developed.
References
Haselton, M.G., Nettle, D. and Andrews, P.W., 2015. The evolution of cognitive bias. The Handbook of Evolutionary Psychology, pp. 724–746.
McCoy, R.T., Yao, S., Friedman, D., Hardy, M. and Griffiths, T.L., 2023. Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve. arXiv preprint arXiv:2309.13638.
Stroop, J.R., 1935. Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), pp. 643–662.