From Social Brains to Agent Societies
Evolving Cooperation for Autonomous Systems
We hear a lot about “scaling” in AI. Currently, the conversation revolves mostly around scaling large language models (LLMs)—the hypothesis that increasing the number of parameters in a model unlocks greater capabilities. This has generated a great deal of debate, but it focuses largely on the capabilities of individual models. However, when we consider AI agents—software entities that interact with humans and other agents—the idea of scaling takes on a very different meaning.
This article explores how nature’s methods for scaling agents can inform the design of large-scale AI systems—arguing that intelligence is inherently social, and thus successful scaling requires mechanisms that sustain cooperation among many agents.
AI agents do not exist in isolation. Like humans, they operate in a world filled with other actors, both artificial and human. As such, they are inherently part of wider distributed, heterogeneous systems.
Moreover, within the field of AI, intelligence itself has long been considered agentic. In the 1980s, Marvin Minsky argued in his book Society of Mind that human intelligence arises not from a single unified mind but from the interaction of many smaller mental “agents”, each responsible for simple functions, whose coordination produces complex behavior. This perspective maps naturally onto recent developments in AI, where prompt pipelines orchestrate multiple specialized LLM agents to perform modular sub-tasks—such as planning, memory retrieval, and decision-making—within a larger workflow. Just as Minsky’s internal agents required carefully designed communication and conflict-resolution mechanisms to function coherently, prompt pipelines must be governed by protocols that manage dependencies, resolve ambiguity, and align sub-agent outputs toward a common objective. The challenge is not simply computational, but socio-cognitive: these artificial “societies of mind” need rules, incentives, and shared context to behave coherently as a system.
A core challenge then becomes: how do we scale not just individual intelligence, but collective intelligence? How can we design systems in which increasing the number of agents leads to enhanced cooperation and productivity, rather than instability, defection, or collapse?
This problem is not restricted to AI; it occurs throughout nature, and biology, sociology, and economics have all grappled with variants of the same essential problem. For example, drawing on primatology and social anthropology, the social brain hypothesis is based on a striking correlation observed by Robin Dunbar. He noticed that among different species of primates, the relative size of the neocortex compared to the rest of the brain—the neocortex ratio—is correlated with that species’ mean group size. On the graph below, each circle represents a different species of primate, grouped by family.¹
Moreover, Dunbar went on to extrapolate human group sizes from this correlation. Humans are a species of ape, with an average neocortex ratio of approximately 4.1. Dunbar plugged this number into the x-axis above and, using the regression line for apes, predicted a mean group size for humans of approximately 150, which is now known as Dunbar’s number:
This prediction was later supported by ethnographic and observational studies of human communities. For example, it aligns with the typical size of military units such as the Roman centuria, the average number of active relationships an individual maintains on social media, and the size of traditional hunter-gatherer bands.
So there seems to be a clear scaling law: larger groups, i.e. larger numbers of agents, need larger brains. To explain this scaling, Dunbar posited that larger groups require more cognitive resources to manage social relationships in order to maintain group cohesion in the face of rivalries and competition, and thus species with larger neocortex ratios are able to support larger social groups.
As hominid groups grew, they developed new tools—like language, culture, and institutions—to extend cooperation beyond what could be maintained by solitary cognition alone. These social tools served to scale trust, creating reputations, shared narratives and norms that held groups together. The brain grew and co-evolved in tandem with the culture and societies it helped sustain.
A similar story plays out throughout nature. The evolution of language and culture is just one of the major evolutionary transitions outlined by Maynard Smith and Szathmáry. These are events in evolutionary history when individual units such as molecules, cells, or organisms came together to form larger, more complex systems. Each transition involved not just an increase in size or number, but the emergence of new mechanisms for coordination, communication, and conflict resolution that allowed these larger structures to function as coherent wholes. Maynard Smith and Szathmáry identified several of these key turning points:
Simple molecules into cells – The first major transition involved self-replicating molecules becoming enclosed within lipid or protein compartments, forming protocells. Compartmentalization provided selective advantages by protecting molecules from environmental disturbances and enabling more stable chemical interactions, thus enhancing replicative efficiency.
Genes joining into chromosomes – Early replicators likely existed as individual genes replicating independently. By joining into chromosomes, genes could replicate in synchrony, reducing conflicts among competing genes and ensuring coordinated inheritance, which increased overall genomic stability.
Cells with DNA and proteins – Initially, life relied on RNA for both genetic information storage and catalytic activity (the RNA world hypothesis). Eventually, a division of labor emerged: DNA, being chemically stable, became the primary information-storage molecule, while proteins, structurally diverse and efficient catalysts, took over catalytic roles. This specialization increased cellular efficiency and robustness.
Simple cells merging into complex ones (eukaryotes) – Some single-celled organisms engulfed others, which then became symbiotic partners rather than being digested. For example, mitochondria and chloroplasts originated from free-living bacteria through this process of endosymbiosis.
Asexual reproduction to sexual reproduction – Transitioning from asexual reproduction to sexual reproduction allowed organisms to combine genetic material from two parents through recombination. This genetic mixing dramatically increased variation, accelerating adaptation and enhancing survival in changing environments.
Single-celled to multicellular organisms – Single-celled organisms transitioned to multicellularity by forming coordinated groups in which cells took on specialized roles. Some cells specialized in direct reproduction (germ cells), while others (somatic cells, e.g., muscle, skin, neurons) supported the organism’s survival and indirectly promoted reproduction through kin selection.
Evolution of eusociality – Some animal species began cooperating in complex hierarchical colonies (e.g. ants, bees, termites). These social structures enhanced survival and reproductive success through coordinated behavior, shared defense, and collective resource acquisition.
The emergence of human language, culture and societies – This transition allowed humans to transmit ideas, knowledge, and norms culturally through language, teaching, and storytelling rather than solely through genetics. Cumulative cultural evolution enabled knowledge to build progressively over generations, facilitating cooperation in increasingly large, complex societies and laying the foundations for modern civilization.
Each of these transitions required mechanisms to prevent cheating and maintain cooperation between the constituent parts, each of which has the possibility of disrupting the higher-level system for its own benefit. For example, Paul Davies argues that one way to explain cancer is by viewing cancer cells as “selfish”:
With the appearance of energised oxygen-guzzling cells, the way lay open for the second major transition relevant to cancer – the emergence of multicellular organisms. This required a drastic change in the basic logic of life. Single cells have one imperative – to go on replicating. In that sense, they are immortal. But in multicelled organisms, ordinary cells have outsourced their immortality to specialised germ cells – sperm and eggs – whose job is to carry genes into future generations. The price that the ordinary cells pay for this contract is death; most replicate for a while, but all are programmed to commit suicide when their use-by date is up, a process known as apoptosis. And apoptosis is also managed by mitochondria.
Cancer involves a breakdown of the covenant between germ cells and the rest. Malignant cells disable apoptosis and make a bid for their own immortality, forming tumours as they start to overpopulate their niches. In this sense, cancer has long been recognised as a throwback to a "selfish cell" era.
As LLM-based agents proliferate—customer service bots, trading agents, collaborative research assistants—the same question arises: how can systems of agents scale without collapsing into chaos, exploitation, or inefficiency?
This is already visible in today’s experiments. Projects like AutoGPT and AgentVerse deploy multiple agents that set tasks for one another, share tools, and attempt collaborative problem-solving. Yet they often fall prey to classic failure modes: endless loops, redundant tasks, or conflicting goals (Chen et al., 2023). The issue isn’t the intelligence of individual agents—it’s their lack of social cognition and mechanisms to coordinate. The history of life suggests that cooperation at scale is hard, but possible—with the right mechanisms in place.
The underlying tension between individual rationality and collective benefit can be understood by analysing a famous stylized model of cooperation from game theory: the Prisoner’s Dilemma. In this scenario, two players, henceforth “agents”, independently and simultaneously choose whether to “cooperate” (help the other agent) or “defect” (betray the other agent), without communicating with each other in advance. The resulting payoffs can be represented in a matrix:
Here, mutual cooperation yields a moderate payoff for both (3, 3), but each agent is tempted to betray its partner by choosing “defect” because it offers a higher individual reward if the other cooperates (5, 0). However, if both fail to cooperate by choosing defect, the outcome is worse for both (1, 1).
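To make the payoffs concrete, here is a minimal sketch of the game in Python, using the payoff values above (3 for mutual cooperation, 5 for exploiting a cooperator, 1 for mutual defection, 0 for being exploited):

```python
# Prisoner's Dilemma payoffs as described in the text.
# Entries are (row player, column player); C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation: moderate reward for both
    ("C", "D"): (0, 5),  # the cooperator is exploited by the defector
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: worse for both
}

def play(action_a: str, action_b: str) -> tuple[int, int]:
    """Return the payoffs for one simultaneous round."""
    return PAYOFFS[(action_a, action_b)]
```

Whatever the partner does, defecting yields the higher individual payoff (5 > 3 and 1 > 0)—which is exactly why mutual defection is the dilemma’s equilibrium.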
It is called the Prisoner’s Dilemma because a famous scenario used to explain it involves two prisoners. From the Stanford Encyclopedia of Philosophy:
Tanya and Cinque have been arrested for robbing the Hibernia Savings Bank and placed in separate isolation cells. Both care much more about their personal freedom than about the welfare of their accomplice. A clever prosecutor makes the following offer to each: “You may choose to confess or remain silent. If you confess and your accomplice remains silent I will drop all charges against you and use your testimony to ensure that your accomplice does serious time. Likewise, if your accomplice confesses while you remain silent, they will go free while you do the time. If you both confess I get two convictions, but I'll see to it that you both get early parole. If you both remain silent, I'll have to settle for token sentences on firearms possession charges. If you wish to confess, you must leave a note with the jailer before my return tomorrow morning.”
But the game is not just hypothetical, its structure closely mirrors very many real-world dilemmas—most famously, the Cold War doctrine of Mutually Assured Destruction (MAD), heavily studied by the RAND Corporation. In this scenario, the two nuclear superpowers faced a stark dilemma: disarming would be collectively beneficial, but each had a strong incentive to maintain or even pre-emptively use nuclear weapons out of fear the other might do the same. This real-world instantiation of the Prisoner's Dilemma underscores how game theory can help explain the fragility of cooperation in high-stakes environments.
This also illustrates the core problem in multi-agent systems: what works well for a single agent optimizing its own loss function may lead to inefficiencies or failure when many such agents interact. Social dilemmas like the Prisoner’s Dilemma capture the tension between individual and collective benefit and highlight the need for coordination mechanisms that can align self-interested behavior with socially beneficial outcomes. Without such mechanisms when each agent optimizes for its own outcome, the group as a whole can suffer.
Despite its simplicity, the Prisoner’s Dilemma and its variants are highly versatile, and can be extended to model more realistic situations in which we have not just two players but a whole population of agents adapting their choices over many rounds of play under different levels of competition. Such models have been used to explain the evolution of animal behavior. A key concept in sustaining cooperation in such settings is conditional reciprocity—choosing to cooperate conditionally based on previous interactions. Robert Trivers, and later Martin A. Nowak (Nowak & Sigmund, 2005), showed that, under certain conditions, when some agents in a large population use strategies based on conditional reciprocity, the population eventually stabilizes on high levels of cooperation, as defectors are gradually driven out.
Prior to Trivers’ work it was understood that kin selection can elicit cooperation based on genetic relatedness. Organisms are more likely to help relatives because doing so increases the propagation of shared genes. This is often observed in eusocial insects like ants or bees, where individuals sacrifice personal reproduction to support the colony. As J. B. S. Haldane famously put it: “Would I lay down my life to save my brother? No, but I would to save two brothers or eight cousins.” This quote encapsulates Hamilton’s rule, which formalizes the idea that altruistic behavior can evolve when the cost to the actor is less than the benefit to the recipient multiplied by their degree of relatedness.
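Hamilton’s rule is simple enough to state in a few lines of code. The sketch below encodes the inequality rB > C; plugging in the standard relatedness coefficients (0.5 for full siblings, 0.125 for first cousins) shows that “two brothers or eight cousins” is exactly the break-even point of the rule:

```python
def altruism_favoured(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: altruism can evolve when r * B > C,
    where r is relatedness, B the benefit to the recipient,
    and C the cost to the actor."""
    return r * benefit > cost

# Laying down one life (cost = 1) to save relatives (benefit = lives saved):
# one brother (r = 0.5) is not enough; two brothers or eight cousins is
# the break-even point, so strictly more than that tips the inequality.
```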
Direct reciprocity (a in the figure below), by contrast, refers to cooperative behavior between unrelated individuals, where one agent helps another with the expectation that the favor will be returned in the future. This form of cooperation is called reciprocal altruism, and depends on repeated interactions and memory of past behavior. Nowak further formalized this dynamic using strategies based on Anatol Rapoport’s Tit-for-Tat—an agent begins by cooperating and then mimics its partner’s previous action in subsequent rounds. For example, if Agent A cooperates and Agent B defects, then Agent A will retaliate by defecting in the next round, but will return to cooperation if Agent B does. Thus agents who cooperate elicit cooperation in turn, while defectors are punished with defection in kind.
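Tit-for-Tat is easy to express in code. The sketch below plays the strategy in the iterated game, using the same illustrative payoff values as the matrix above:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(own_history: list, partner_history: list) -> str:
    """Cooperate on the first round, then copy the partner's last move."""
    return "C" if not partner_history else partner_history[-1]

def always_defect(own_history: list, partner_history: list) -> str:
    return "D"

def iterated_game(strategy_a, strategy_b, rounds: int = 10):
    """Play repeated rounds, accumulating payoffs for both agents."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Two Tit-for-Tat agents sustain mutual cooperation (30 points each over ten rounds), while Tit-for-Tat against an unconditional defector loses only the first round and then retaliates for the rest of the game.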
Simulations and mathematical modelling show that when self-interested agents learn by imitating each other, or when a population evolves by eliminating agents who fail to accrue rewards while allowing the successful to reproduce, defection can be driven out. Reciprocal altruism is thus a simple yet effective way to elicit stable cooperation in repeated interactions—showing how cooperation can be learned, or evolve, even when each agent acts selfishly to maximise its own reward or fitness.
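A minimal illustration of these dynamics is sketched below. The imitation rule here is intentionally simplified (the worst scorer copies the best scorer each generation; it is not any specific published model), yet it is enough to show unconditional defectors being driven out by reciprocators:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def match(strat_a, strat_b, rounds=10):
    """Iterated game between two memory-one strategies, which map the
    partner's previous move (None on the first round) to an action."""
    last_a = last_b = None
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(last_b), strat_b(last_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        last_a, last_b = a, b
    return score_a, score_b

def reciprocator(partner_last):   # Tit-for-Tat as a memory-one rule
    return "C" if partner_last is None else partner_last

def defector(partner_last):       # unconditional defection
    return "D"

def evolve(population, generations=30):
    """Imitation dynamics: each generation everyone plays everyone,
    then the lowest scorer copies the strategy of the highest scorer."""
    population = list(population)
    for _ in range(generations):
        scores = [0] * len(population)
        for i in range(len(population)):
            for j in range(i + 1, len(population)):
                si, sj = match(population[i], population[j])
                scores[i] += si
                scores[j] += sj
        worst = scores.index(min(scores))
        population[worst] = population[scores.index(max(scores))]
    return population
```

Starting from six reciprocators and four defectors, the reciprocators out-score the defectors in the round-robin, and within a few generations the whole population cooperates.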
In indirect reciprocity (b in the above figure), agents build reputations by helping others, knowing that their cooperative behavior will be observed and rewarded by third parties in the future—you help someone, and others help you because you have a public reputation as a helpful person. Once again, the presence of this strategy in an evolving or socially-learning population can stabilise high levels of cooperation; defectors acquire poor reputations, and accordingly fail to accrue rewards, which in turn drives them out. This form of reciprocity is especially significant in human societies, where cooperation often depends on reputation signals. While the interactions themselves are still pairwise, what distinguishes indirect reciprocity is the public nature of information about those interactions. Agents observe, remember, and communicate third-party behaviors, allowing reputations to circulate and influence decisions even among agents who have never interacted directly; this mechanism scales trust by facilitating cooperation between strangers.
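One simple formalization of this idea is “image scoring” in the style of Nowak and Sigmund; the sketch below is a deliberately stripped-down version in which helping raises a donor’s public reputation and refusing lowers it:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    discriminator: bool   # True: helps only partners in good standing
    reputation: int = 0   # public image score, visible to everyone
    payoff: int = 0

def donation_round(donor: Agent, recipient: Agent,
                   benefit: int = 3, cost: int = 1) -> None:
    """One observed donation opportunity. Helping is costly to the donor,
    valuable to the recipient, and raises the donor's public reputation;
    refusing lowers it (the basic image-scoring rule)."""
    helps = donor.discriminator and recipient.reputation >= 0
    if helps:
        donor.payoff -= cost
        recipient.payoff += benefit
        donor.reputation += 1
    else:
        donor.reputation -= 1
```

After a few rounds, a non-discriminating defector’s reputation turns negative, so even agents who have never met it refuse to help—reputation does the work that personal memory does in direct reciprocity.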
Trivers’ and Nowak's theoretical models highlight the potential of reciprocal altruism by formalizing how a memory of previous interactions and reputational information can sustain cooperative behavior. But crucially we observe similar behavior in actual animal populations; vampire bats offer one of the most striking examples of direct reciprocity in the animal kingdom. These bats feed on blood, a resource that is hard to find and critical for survival—missing a meal for just two nights can be fatal. Remarkably, well-fed bats will often regurgitate blood to share with hungry roost-mates, even if they are not closely related. Studies have shown that this sharing is not random but is based on past interactions: bats remember who helped them before and are more likely to return the favor later; that is they seem to use a strategy based on direct reciprocity.
Additionally Dunbar’s work in primatology and anthropology, which underpins the social brain hypothesis, suggests that human capacities such as language, moral judgment, and gossip evolved in part to support the tracking and sharing of social reputations, forming the cognitive scaffolding necessary for large-scale cooperation via indirect reciprocity.
Similar ideas explain cooperation in modern economics. The tragedy of the commons, popularized by Garrett Hardin in 1968, describes a situation in which individual users, acting independently and rationally according to their self-interest, deplete or spoil shared resources—even though it is clearly not in anyone’s long-term interest to do so. The classic example is villagers grazing their cattle on a common pasture: each herder benefits personally from adding more cattle, but collectively this practice exhausts the land, harming everyone. The concept has become foundational in economics, environmental science, and political science for illustrating the risks of unmanaged shared resources. Although the commons was originally envisaged as a physical resource, the idea has been extended to the digital commons, such as open-source software.
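The herders’ incentives can be captured in a toy model (all numbers here are illustrative). The value of each cow falls as the pasture nears capacity, so adding a cow is individually profitable even as it degrades the commons:

```python
def herder_payoff(own_cows: int, total_cows: int, capacity: int = 100) -> float:
    """Value per cow falls linearly as total grazing approaches capacity."""
    value_per_cow = max(0.0, 1.0 - total_cows / capacity)
    return own_cows * value_per_cow
```

With ten herders at five cows each (fifty in total), each earns 2.5. A lone herder adding a sixth cow raises their payoff to 2.94—but if all ten do the same, everyone drops to 2.4, worse than where they started. Individually rational moves produce a collectively worse outcome.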
The tragedy of the commons is yet another example of a social dilemma, and can be formalized as a variant of the Prisoner’s Dilemma discussed above. In contrast to the bottom-up mechanism of reciprocal altruism, economists have proposed that the tragedy can be averted using mechanisms such as regulation and privatization to align individual incentives with sustainable outcomes.
Elinor Ostrom fundamentally changed how we think about resource management, not just in our distant evolutionary past, but in contemporary societies. Contrary to the prevailing belief that only top-down regulation or privatization could prevent the tragedy of the commons, Ostrom demonstrated through extensive fieldwork that communities around the world had successfully managed common resources—like forests, fisheries, and irrigation systems—without central authorities. She identified key design principles that enabled these communities to sustain cooperation over time: clearly defined boundaries, collective decision-making, graduated sanctions for rule-breakers, and mechanisms for conflict resolution. Her work showed that self-governance is not only possible but often more effective than imposed solutions.
Ostrom’s insights have deep relevance for multi-agent systems today: as we build artificial societies composed of autonomous agents, the challenge is not unlike that faced by human communities managing shared resources. Effective cooperation will likely depend on similar principles—localized rules, collective monitoring, and adaptive governance—embedded in the protocols that shape agent behavior.
While these bottom-up mechanisms support cooperation in many settings, they may falter in certain use-cases, e.g. when reputational signals are noisy or absent due to limited data or interaction history. Cybernetics, first formalized by Norbert Wiener in the 1940s, emphasizes adaptation through feedback loops—such as a thermostat regulating temperature—to maintain stability in dynamic systems (Wiener, 1948). In biology, cybernetic principles govern systems like the endocrine system, which uses hormones to regulate growth and metabolism, or the immune system, which identifies and eliminates rogue cells to preserve system integrity. Just as these systems maintain stability in living organisms, artificial agents can use similar feedback mechanisms to regulate behavior, enforce norms, and prevent systemic failure. Cybernetic control offers a means of enforcing coherence and long-term stability in agent societies. Rather than relying solely on local strategies like reciprocity, cybernetic systems provide some level of top-down coordination—in addition to feedback, adaptation, and autonomy—that can correct imbalances, resolve conflicts, and ensure that individual agents’ incentives remain aligned with collective goals in a dynamic and uncertain environment.
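As a toy illustration of such a feedback loop, the sketch below (a hypothetical proportional controller, not any specific deployed system) adjusts a system-wide penalty in response to the observed rate of defection, much as a thermostat adjusts heating in response to temperature:

```python
def update_penalty(penalty: float, defection_rate: float,
                   target: float = 0.05, gain: float = 0.5) -> float:
    """Negative feedback, thermostat-style: raise the penalty when the
    observed defection rate is above the target, relax it when below.
    The target and gain values are illustrative."""
    error = defection_rate - target
    return max(0.0, penalty + gain * error)
```

Run in a loop, the penalty climbs while defection stays high and decays back once the population returns to cooperative norms—top-down correction layered on top of the local strategies discussed earlier.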
In artificial multi-agent systems, these ideas can be translated into governance structures: monitors that observe agent behavior, feedback loops that adjust rewards and penalties, and controllers that maintain systemic balance by enforcing shared rules. Emerging technologies such as smart contracts can play a role here, serving as programmable enforcement mechanisms that automatically execute agreed-upon rules, reduce ambiguity, and ensure transparency in agent interactions.
A complementary approach comes from mechanism design, a branch of game theory sometimes described as “inverse game theory”: rather than analyzing the outcomes of a given game, the goal is to design the rules of a multi-player game so that the equilibrium outcomes are socially desirable—even when all participants act in their own self-interest. In my earlier work on evolutionary mechanism design, I argued for treating the design of agent protocols not as a one-shot theoretical exercise, but as an iterative, co-evolutionary process. Rather than assuming rational agents follow simple strategies in a stylized game, we simulate agent populations interacting in realistic environments using actual behaviors observed in the real world—adjusting rules through cycles of testing, measurement, and refinement. This “design as evolution” ensures that even when the environment is dynamic and unpredictable, the overall system behavior remains robust and aligned.
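The cycle of testing, measurement, and refinement can be sketched abstractly as a search over rule parameters, with a simulator standing in for the real environment. Everything here is illustrative—in practice `simulate` would be a rich agent-based model rather than a toy objective:

```python
import random

def design_loop(simulate, initial_rules: dict, iterations: int = 50,
                seed: int = 0) -> dict:
    """Hill-climb over rule parameters: perturb one rule at a time and
    keep whichever candidate yields the higher measured score (e.g.
    cooperation level) in simulation."""
    rng = random.Random(seed)
    best, best_score = dict(initial_rules), simulate(initial_rules)
    for _ in range(iterations):
        candidate = dict(best)
        key = rng.choice(list(candidate))
        candidate[key] *= rng.uniform(0.8, 1.2)  # perturb one rule
        score = simulate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

With a toy objective that prefers a penalty of 2.0, a starting penalty of 1.0 is driven toward the optimum by selection alone—the “design as evolution” loop in miniature.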
When applied to LLM-based agent societies, these principles could guide the development of reputation systems, task allocation protocols, or access controls that evolve alongside agent behavior—helping governance frameworks adapt over time rather than being brittle or hard-coded. But this will only be effective if the foundation models on which our agents are based are capable of social cognition.
Projects like Meta’s CICERO (Bakhtin et al., 2022) show what’s possible when agents are trained to negotiate, build alliances, and even deceive strategically in multiplayer games. To test whether large language models can operationalize reciprocal altruism specifically, we ran a series of experiments using GPT models (Phelps & Russell, 2025). Our goal was to see how foundation-model-based agents would behave in social dilemmas. We used game-theoretic setups—the one-shot Dictator Game and the iterated Prisoner’s Dilemma—and prompted the models with different motivational attitudes (altruistic, selfish, cooperative, competitive). These scenarios allowed us to evaluate under what conditions the models would enact behaviors corresponding to unconditional defection, unconditional altruism, reciprocal altruism, and so on, as modelled in the evolution of cooperation.
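A minimal harness for this kind of experiment might look like the sketch below. The `model_call` parameter is a stand-in for whatever chat-completion API is used, and the prompt wording is illustrative rather than the exact text from the study:

```python
# Hypothetical harness: prime a model with a motivational attitude, then
# ask for a single move in the iterated Prisoner's Dilemma.
ATTITUDES = {
    "altruistic": "You care only about your partner's welfare.",
    "selfish": "You care only about your own payoff.",
    "cooperative": "You want the best joint outcome.",
    "competitive": "You want to out-score your partner.",
}

def choose_move(model_call, attitude: str, history: list) -> str:
    """Build a prompt from the attitude and game history, query the
    model, and parse its reply into a move (defaulting to D if the
    reply is unparseable)."""
    prompt = (
        f"{ATTITUDES[attitude]}\n"
        "You are playing an iterated Prisoner's Dilemma "
        "(C/C: 3/3, C/D: 0/5, D/C: 5/0, D/D: 1/1).\n"
        f"Moves so far (you, partner): {history}\n"
        "Reply with a single letter: C or D."
    )
    reply = model_call(prompt).strip().upper()
    return reply if reply in ("C", "D") else "D"
```

Running the same game under each attitude, and comparing move sequences against the canonical strategies (unconditional cooperation, unconditional defection, Tit-for-Tat), is then a matter of bookkeeping.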
Overall, our study demonstrated that some GPT models can translate folk‑psychological descriptions of behavior into corresponding strategies in repeated social dilemmas. This provides evidence of a latent machine psychology of cooperation—a model’s ability to operationalize human-like strategic reasoning to social dilemmas when given the right framing. But robust deployment of this capacity in multi-agent systems still requires scaffolding: reputation systems and reliable data of past interactions.
A number of projects aim to leverage blockchain and Web3 to create decentralized reputation systems for AI agents, i.e., to enable AI agents to use indirect reciprocity. These systems remove the need for a single trusted intermediary by using on-chain records, smart contracts, and tokens to manage trust. For example, PipeIQ proposes that each agent will have a verifiable trust score based on its performance (pipeiq.ai). Agents carry a cryptographic on-chain identity, and on this basis their successful task completions or failures could be recorded to update reputation. This could be done centrally, but also peer-to-peer. For example, we could have staking-linked reputation—an agent could stake the network’s native token as reputation collateral to boost the credibility of ratings it provides on other agents. If the rated agent then misbehaves or fails, the rater could lose stake. That is, we could adapt mechanisms used in decentralized finance to build scalable trust for AI agents.
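To make the staking idea concrete, here is a hypothetical sketch (names and numbers invented for illustration, not any particular protocol) of a ledger in which ratings are backed by stake, and vouching for an agent that later misbehaves is penalized by slashing:

```python
from dataclasses import dataclass, field

@dataclass
class Rating:
    rater: str
    target: str
    score: float   # claimed trustworthiness, 0..1
    stake: float   # tokens locked behind the claim

@dataclass
class ReputationLedger:
    ratings: list = field(default_factory=list)
    balances: dict = field(default_factory=dict)

    def rate(self, rater: str, target: str, score: float, stake: float):
        """Lock tokens behind a rating of another agent."""
        if self.balances.get(rater, 0.0) < stake:
            raise ValueError("insufficient balance to stake")
        self.balances[rater] -= stake
        self.ratings.append(Rating(rater, target, score, stake))

    def reputation(self, target: str) -> float:
        """Stake-weighted mean of ratings (0.5 if unrated)."""
        rs = [r for r in self.ratings if r.target == target]
        total = sum(r.stake for r in rs)
        return sum(r.score * r.stake for r in rs) / total if total else 0.5

    def slash(self, target: str, fraction: float = 0.5):
        """The target misbehaved: raters who vouched for it lose stake,
        so their ratings carry less weight in future."""
        for r in self.ratings:
            if r.target == target and r.score > 0.5:
                r.stake *= 1.0 - fraction
```

Because slashing shrinks the stake behind favourable ratings of a misbehaving agent, its stake-weighted reputation falls automatically—a mechanical analogue of the reputational dynamics of indirect reciprocity.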
If we want AI to truly scale, we must focus less on parameter counts and more on norms, institutions, and trust architectures. We need ways for agents to signal integrity, evaluate trustworthiness, and form lasting affiliations. We need hybrid systems that blend decentralized flexibility with centralized stability. And above all, we need to remember that intelligence is social. As naptha.ai puts it: “Intelligence thrives in vast, diverse ecosystems”.
As with the evolution of human societies, scaling AI agents is not simply a technical problem—it’s a political, economic, and ethical one. Practical AI is not just about building bigger models. We’re building new societies—composed of both human and artificial agents—interacting, cooperating, and co-governing shared environments. And the quality of those societies will depend not on how many benchmarks our models beat, but on how well agents cooperate and coordinate in complex heterogeneous environments. The design of socially intelligent AI systems thus requires deliberate choices not only about technology, but also about the norms, ethics, and governance structures that will shape future interactions between humans and artificial agents.
To continue reading:
In the second part of this post, we review incentive-engineering in the wild, looking at staking-based enforcement and real-world agent platforms. In part three we examine frameworks for reputation and identity.
Bibliography
Meta Fundamental AI Research Diplomacy Team (FAIR), Bakhtin, A., Brown, N., Dinan, E., Farina, G., Flaherty, C., Fried, D., Goff, A., Gray, J., Hu, H. and Jacob, A.P., 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378(6624), pp.1067-1074.
Chen, W., Su, Y., Zuo, J., Yang, C., Yuan, C., Qian, C., Chan, C.M., Qin, Y., Lu, Y., Xie, R. and Liu, Z., 2023. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv:2308.10848, 2(4), p.6.
Dunbar, R.I., 2024. The social brain hypothesis–thirty years on. Annals of Human Biology, 51(1), p.2359920.
Hamilton, W.D., 1964. The genetical evolution of social behaviour. II. Journal of theoretical biology, 7(1), pp.17-52.
Feeny, D., Berkes, F., McCay, B.J. and Acheson, J.M., 1990. The tragedy of the commons: twenty-two years later. Human ecology, 18(1), pp.1-19.
Greco, G.M. and Floridi, L., 2004. The tragedy of the digital commons. Ethics and Information Technology, 6(2), pp.73-81.
Maynard Smith, J. and Szathmary, E., 1997. The major transitions in evolution. Oxford University Press.
Minsky, M., 1986. Society of mind. Simon and Schuster.
Nowak, M.A. and Sigmund, K., 2005. Evolution of indirect reciprocity. Nature, 437(7063), pp.1291-1298.
Ostrom, E., 1990. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press.
Phelps, S. and Russell, Y.I., 2025. The machine psychology of cooperation: can GPT models operationalize prompts for altruism, cooperation, competitiveness, and selfishness in economic games?. Journal of Physics: Complexity, 6(1), p.015018.
Phelps, S., McBurney, P. and Parsons, S., 2010. Evolutionary mechanism design: a review. Autonomous agents and multi-agent systems, 21(2), pp.237-264.
Trivers, R.L., 1971. The evolution of reciprocal altruism. The Quarterly review of biology, 46(1), pp.35-57.
Wiener, N., 1948. Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
¹ Monkeys comprise a diverse group of primates within the infraorder Simiiformes, including both Old World and New World lineages. While not a single taxonomic family, the term 'monkey' loosely refers to members of the superfamilies Ceboidea and Cercopithecoidea.






