Evolution of Cooperation [5]: Brain Evolution and MARL Reflections

Reading notes on Michael Tomasello’s work, with my own reflections. Translated from Chinese and lightly polished with Claude.

This essay is mostly my own thinking, on two topics:

1. The debate over what drove primate brain evolution

I am of course a complete amateur in anthropology and evolutionary biology, so what follows is only a rough thought.

Tomasello’s research can be placed in a much bigger research landscape (Dunbar, Robin I. M., and Susanne Shultz. “Why are there so many explanations for primate brain evolution?” Philosophical Transactions of the Royal Society B 372.1727 (2017)). In that landscape, the central questions are: why do primates live in groups, and why did primates evolve such large brains, with brain size showing a strikingly strong positive correlation with group size? See the Dunbar–Shultz paper for details (recommended reading).

From the broader vantage point, two things about Tomasello’s theory stand out:

These are real blind spots in T’s work. Yet I find T’s research very interesting — in both method and direction — and very meaningful for the study of intelligence:

I largely agree with this. Two things follow from it:

The analysis of cognitive ability is crucial because behavior is always deceptive. Recall from Evolution of Cooperation [4] that apes hunting monkeys in coordinated formation does not imply that they have the cognitive capacity to coordinate group behavior. Yet much of the research that links predation risk to group intelligence stops at correlation, with little deep analysis of the cognitive capacity embedded in the group’s anti-predator behavior. (There may be such work — I should look — but Dunbar does not cite any.) In fact, as Evolution of Cooperation [2] noted, vocalization is part of the primate response to predators, but primate vocalization has no real cognitive scaffolding — it is just a gene-encoded emotional response.

A related question: what cognitive abilities, exactly, do different primate groups evolve in the course of their group lives? The “life processes” in question include things like (1) the “small-group bonding” produced by grooming; (2) inhabiting different layers of social relations, from close kin to distant acquaintances; (3) attacking another individual while having to consider its status in the wider group, including its ties to individuals not currently visible. The same principle applies: a “life process” does not entail a cognitive capacity. A qualitative analysis of cognitive abilities across different primates, different life processes, and different collective actions is what really matters.

T focuses on qualitative research, on a comparative perspective, and proposes a reasonable environmental challenge that favors group life. His theory is self-consistent. Still, I have one more intuition:

The ape-to-human transition may be special because we truly moved from large-scale collective life to large-scale collective action. The difficulty of such collective action forces many cognitive abilities to be deployed at once for the action to succeed.

So, for intelligence, or for group intelligence, perhaps the complexity of an environment should be defined by the minimum set of cognitive abilities required to act within it. This idea matches Leibo’s view (Joel Leibo, YouTube talk). It also relates, perhaps, to the notion of “horizontal capacity.”
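One way to make this definition concrete is a rough sketch under my own assumptions, not something taken from Tomasello or Leibo. Treat an environment as a set of tasks, give each task one or more alternative sets of abilities that would suffice, and measure complexity as the size of the smallest ability set that solves every task. The function name `minimal_ability_set` and all ability labels below are hypothetical illustrations.

```python
from itertools import combinations

def minimal_ability_set(tasks):
    """Return a smallest set of abilities that solves every task.

    tasks maps a task name to a list of alternative ability sets;
    holding all abilities in any one alternative solves that task.
    Brute force, so only suitable for tiny illustrative inputs.
    """
    universe = sorted(set().union(*(s for alts in tasks.values() for s in alts)))
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            have = set(combo)
            if all(any(alt <= have for alt in alts) for alts in tasks.values()):
                return have
    return None  # some task has no way of being solved

# Hypothetical ability labels, chosen only for illustration.
group_life = {
    "grooming": [{"individual_recognition"}],
    "rank_disputes": [{"individual_recognition", "third_party_tracking"}],
}
joint_hunt = {
    "stag_hunt": [{"joint_attention", "role_taking", "communication"}],
}

if __name__ == "__main__":
    for name, env in [("group life", group_life), ("joint hunt", joint_hunt)]:
        abilities = minimal_ability_set(env)
        print(name, "complexity:", len(abilities), sorted(abilities))
```

Under this toy measure, an environment counts as more complex only if acting in it requires more abilities at once, which is the intuition about collective action above. The hard empirical work, deciding which abilities a task really requires, is exactly what the sketch takes as given.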

2. Modeling the evolution of cooperative behavior with multi-agent RL

My deepest motivation is not really primate research, but this: what are the unique capabilities that cannot be acquired in a solipsistic way? In other words: if a single agent is already strong enough (or rather, useful enough, since usefulness is not intelligence), does “learning inside a group” add anything that is uniquely intelligent?

You can approach this from many angles — for example, by asking what tasks or environments truly require a group (perhaps the most important first step). But here I want to focus on the angle of multi-agent reinforcement learning algorithms, since that is closest to Tomasello’s work.

T’s key claim from Evolution of Cooperation [3], recapped here:

A person has an individual goal. To achieve it, they may need another person’s participation: a social intention. To make participation possible, they need to form a common conceptual ground with the other person. Common conceptual ground rests on joint attention, that is, wanting the other person and oneself to jointly attend to something (the referential intention), and expecting or directing the other to act accordingly. One route to joint attention is communication. All of this is enveloped in social norms, including the mutual assumption (or norm) of cooperation, which takes hold whenever behavior is made public.

This packs in many “cognitive capabilities” — too many to list. But the most important may include:

From a modeling perspective, a few principles may be useful:

A separate observation: a lot of work either trains MARL from scratch and looks for emergent behavior, or uses LLMs for simulation (mostly in scenarios that don’t really require cooperation). In-context learning is, of course, a form of learning — the core mechanism is shifting the output distribution. But why not provide the right environment (with person-to-person and group-to-group mixed-motive dynamics), run MARL on LLMs, and watch what emerges? It is messy, sure — but worth doing.
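To make "provide the right environment" slightly more tangible, here is a minimal sketch of the smallest version I can think of: a repeated two-player stag hunt, where both players gain most by hunting together but each is tempted to play safe. Everything here is an assumption for illustration: the payoff values are made up, and two independent tabular Q-learners stand in for the agents (they are not LLMs). It shows only the shape of the loop of environment, joint action, and per-agent reward.

```python
import random

STAG, HARE = 0, 1

# Hypothetical payoffs for (my action, partner's action); illustrative values only.
PAYOFF = {
    (STAG, STAG): 4.0,  # joint hunt succeeds
    (STAG, HARE): 0.0,  # hunting stag alone fails
    (HARE, STAG): 3.0,  # hare is a safe solo option
    (HARE, HARE): 3.0,
}

def step(a0, a1):
    """Return the reward pair (agent 0, agent 1) for one round."""
    return PAYOFF[(a0, a1)], PAYOFF[(a1, a0)]

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent, stateless Q-learners playing the repeated game."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        actions = []
        for agent in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore
            else:
                best = max(q[agent])  # exploit, breaking ties at random
                actions.append(rng.choice([a for a in range(2) if q[agent][a] == best]))
        rewards = step(actions[0], actions[1])
        for agent in range(2):
            a = actions[agent]
            # Move the estimate toward the observed reward (no next state).
            q[agent][a] += alpha * (rewards[agent] - q[agent][a])
    return q

if __name__ == "__main__":
    for agent, values in enumerate(train()):
        print(f"agent {agent}: Q(stag)={values[STAG]:.2f}, Q(hare)={values[HARE]:.2f}")
```

Which convention the pair settles into depends on the payoffs, the exploration rate, and the seed. The interesting experiment would replace the tabular learners with LLM-based policies and the matrix game with an environment that actually demands joint attention and communication.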

My views keep changing. The content above is valid as of May 27, 2026.

Additional reference: Herrmann, Esther, et al. “Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis.” Science 317.5843 (2007): 1360–1366.