How can a collection of autonomous agents, each with its own perceptions, goals, and limited knowledge, coordinate to achieve outcomes that no single agent could produce alone? This coordination problem is the central pressure that has driven the field of multiagent systems (MAS) since its emergence. Unlike single-agent AI, where a solitary program faces an environment that contains no other decision makers, MAS must contend with agents that act concurrently, communicate imperfectly, and may pursue conflicting objectives. The field's history is a sequence of frameworks that each reframed the coordination problem, reacting to the limitations of earlier approaches while preserving their insights.
The Actor Model, introduced in 1973, provided the first computational foundation for thinking about agents as autonomous entities. Instead of shared memory or centralized control, actors communicate exclusively through asynchronous message passing. Each actor has a mailbox, a local state, and the ability to send messages to other actors whose addresses it knows. This design rejected the assumption that concurrent computation could be managed by a single scheduler or shared data structure. For MAS, the Actor Model established a lasting principle: agency begins with encapsulation. An agent's internal state is private; coordination happens only through explicit communication. This message-passing paradigm later influenced agent communication languages and the design of distributed systems, though the Actor Model itself did not address how agents should decide what messages to send or how to resolve conflicts.
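The mailbox-and-address idea can be illustrated with a minimal sketch. The class names (`Actor`, `Counter`, `Collector`, `ActorSystem`) and the message tuples are hypothetical, and the single round-robin loop is a simplification chosen only to keep the run deterministic; real actor runtimes process mailboxes concurrently.

```python
from collections import deque


class Actor:
    """An actor owns a private mailbox and private local state."""

    def __init__(self, system, name):
        self.system = system
        self.name = name
        self.mailbox = deque()

    def send(self, address, message):
        # Asynchronous send: the message is queued and the sender moves on.
        self.system.deliver(address, message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self, system, name):
        super().__init__(system, name)
        self.count = 0  # private state; other actors never touch it directly

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self.count += payload
        elif kind == "report":
            # payload is the address of the actor that wants the total
            self.send(payload, ("total", self.count))


class Collector(Actor):
    def __init__(self, system, name):
        super().__init__(system, name)
        self.seen = []

    def receive(self, message):
        self.seen.append(message)


class ActorSystem:
    """Hypothetical runtime: maps addresses to actors and drains mailboxes."""

    def __init__(self):
        self.actors = {}

    def spawn(self, cls, name):
        self.actors[name] = cls(self, name)
        return name  # the name doubles as the actor's address

    def deliver(self, address, message):
        self.actors[address].mailbox.append(message)

    def run(self):
        # Simplification: one loop stands in for concurrent execution.
        progressed = True
        while progressed:
            progressed = False
            for actor in list(self.actors.values()):
                if actor.mailbox:
                    actor.receive(actor.mailbox.popleft())
                    progressed = True


system = ActorSystem()
counter = system.spawn(Counter, "counter")
collector = system.spawn(Collector, "collector")
system.deliver(counter, ("add", 2))
system.deliver(counter, ("add", 3))
system.deliver(counter, ("report", collector))
system.run()
```

The collector learns the total only because the counter chose to send it; nothing outside `Counter` reads `count` directly, which is the encapsulation principle in miniature.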
By the late 1970s, researchers began asking how a network of agents could jointly solve a single problem. Distributed Problem Solving (DPS) assumed that agents were cooperative by design—they shared a global goal and were willing to suboptimize locally for the sake of the system. The Contract Net protocol, a landmark DPS mechanism, allowed agents to announce tasks, receive bids from potential contractors, and award contracts based on cost and capability. This framework treated coordination as a form of task decomposition and resource allocation. DPS worked well for domains like distributed sensor networks and factory scheduling, where agents had aligned incentives. But its cooperative assumption became a limitation as MAS expanded into settings where agents represented different stakeholders with private interests. The field needed a way to model strategic behavior, not just cooperative problem solving.
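A single announce, bid, and award round can be sketched as follows. This is an illustration under stated assumptions rather than the protocol's full specification: the `Contractor` class, the task fields `skill` and `size`, the linear cost model, and the lowest-cost award rule are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Contractor:
    name: str
    skills: set
    cost_per_unit: float

    def bid(self, task):
        """Return a cost bid, or None if this contractor lacks the skill."""
        if task["skill"] not in self.skills:
            return None
        return self.cost_per_unit * task["size"]


def contract_net(task, contractors):
    """Run one announce/bid/award round on behalf of a manager agent."""
    # 1. Announce: every contractor sees the task description.
    bids = {c.name: c.bid(task) for c in contractors}
    # 2. Bid: keep only the contractors that chose to respond.
    valid = {name: cost for name, cost in bids.items() if cost is not None}
    if not valid:
        return None  # nobody is capable; the manager must handle this case
    # 3. Award: here the cheapest capable bidder wins (an assumed rule).
    winner = min(valid, key=valid.get)
    return winner, valid[winner]


contractors = [
    Contractor("sensor_a", {"track"}, 2.0),
    Contractor("sensor_b", {"track", "classify"}, 1.5),
    Contractor("sensor_c", {"classify"}, 1.0),
]
award = contract_net({"skill": "track", "size": 4}, contractors)
```

Note that the sketch relies on contractors reporting their true costs, which is exactly the cooperative assumption that later frameworks questioned.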
In the mid-1980s, a wave of criticism targeted the symbolic, deliberative planning that underlay most early MAS. Reactive Agents, inspired by behavior-based robotics, argued that intelligent coordination did not require internal world models or explicit reasoning. Instead, agents could be built from simple stimulus-response rules—if a sensor detects an obstacle, turn left—and still produce coherent group behavior through interaction with the environment. This framework narrowed the design space dramatically: no planning, no communication, no representation of other agents. The reactive turn showed that much of what looked like coordination could emerge from local rules without any central designer. Yet pure reactivity struggled with tasks that required foresight, memory, or explicit commitments. An agent that cannot plan ahead cannot promise to meet another agent at a specific time.
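A stimulus-response controller of this kind fits in a few lines. The percept keys and action names below are hypothetical; the point of the sketch is only that the action depends on the current percept and nothing else.

```python
def reactive_step(percept):
    """Map the current percept directly to an action: no memory, no model."""
    # Rules are checked in priority order; the first match fires.
    if percept.get("obstacle_ahead"):
        return "turn_left"
    if percept.get("carrying_item") and percept.get("at_base"):
        return "drop_item"
    if percept.get("item_here") and not percept.get("carrying_item"):
        return "pick_up"
    return "move_forward"


actions = [
    reactive_step({"obstacle_ahead": True, "item_here": True}),
    reactive_step({"item_here": True}),
    reactive_step({"carrying_item": True, "at_base": True}),
    reactive_step({}),
]
```

Because the function keeps no state between calls, it cannot represent a commitment such as a meeting time, which is the limitation described above.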
The Belief-Desire-Intention (BDI) architecture, proposed in 1987, directly confronted the tension between reactivity and deliberation. BDI agents maintain three mental attitudes: beliefs about the world, desires representing goals, and intentions representing committed courses of action. The key innovation was the notion of commitment—an intention persists even when the agent's beliefs change, unless a compelling reason to reconsider arises. This allowed BDI agents to act quickly in dynamic environments without replanning from scratch at every step, while still retaining the ability to pursue long-term goals. BDI absorbed the reactive challenge by incorporating fast, precompiled plan libraries alongside deliberative reasoning. It became the dominant architecture for agent-oriented software engineering, used in applications from autonomous spacecraft to business process management. BDI remains active today, often combined with other frameworks: a BDI agent might use game-theoretic reasoning to decide which intentions to adopt, or employ reinforcement learning to refine its plan selection.
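The role of commitment can be made concrete with a small sketch of a BDI-style control loop. It is a simplified illustration, not a faithful rendering of any particular BDI system: the `BDIAgent` class, the `blocked:` belief prefix, and the reconsideration rule (drop an intention only when its goal is believed achieved, believed blocked, or its plan is exhausted) are assumptions made for the example.

```python
class BDIAgent:
    def __init__(self, desires, plan_library):
        self.beliefs = {}
        self.desires = desires            # goals, in priority order
        self.plan_library = plan_library  # goal -> precompiled list of actions
        self.intention = None             # (goal, remaining plan steps)

    def should_reconsider(self):
        """Commitment: keep the intention unless there is a compelling reason."""
        goal, steps = self.intention
        achieved = self.beliefs.get(goal) is True
        blocked = bool(self.beliefs.get("blocked:" + goal))
        return achieved or blocked or not steps

    def deliberate(self):
        """Adopt the highest-priority goal that is open and has a plan."""
        for goal in self.desires:
            open_goal = not self.beliefs.get(goal)
            blocked = self.beliefs.get("blocked:" + goal)
            if open_goal and not blocked and goal in self.plan_library:
                self.intention = (goal, list(self.plan_library[goal]))
                return
        self.intention = None

    def step(self, percept):
        self.beliefs.update(percept)  # belief revision
        if self.intention is None or self.should_reconsider():
            self.deliberate()
        if self.intention is None:
            return None
        _, steps = self.intention
        return steps.pop(0)  # execute the next step of the committed plan


agent = BDIAgent(
    desires=["deliver", "recharge"],
    plan_library={"deliver": ["pick", "move", "drop"], "recharge": ["dock"]},
)
trace = [
    agent.step({}),                         # adopts "deliver"
    agent.step({"weather": "rain"}),        # irrelevant belief change
    agent.step({"blocked:deliver": True}),  # compelling reason to reconsider
    agent.step({"recharge": True}),         # nothing left to pursue
]
```

The second call shows the intended behavior: a belief changed, but the agent carried on with its plan without replanning, and only the third percept caused it to switch goals.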
If DPS assumed cooperation and BDI focused on individual practical reasoning, game-theoretic MAS, emerging around 1990, addressed the missing piece: agents with conflicting interests. This framework imported tools from economics, including Nash equilibrium, Bayesian games, and mechanism design, to model how self-interested agents should act when their outcomes depend on others' choices. Auctions, negotiation protocols, and voting rules became central topics. Game-theoretic MAS provided rigorous guarantees: under certain conditions, a mechanism can ensure that rational agents acting in their own interest produce a socially desirable outcome. This was a sharp departure from DPS's cooperative assumption. However, classical game theory assumes that agents are perfectly rational and have common knowledge of the game structure, conditions rarely met in practice. The framework coexists with BDI and multi-agent reinforcement learning (MARL), offering normative benchmarks for what rational agents should do, even when actual agents fall short.
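One standard example of such a guarantee is the sealed-bid second-price auction, in which bidding one's true value is a weakly dominant strategy. The sketch below checks this numerically for one bidder against fixed rival bids; the agent names, valuations, and the set of deviations tried are hypothetical, and the check is an illustration rather than a proof.

```python
def second_price_auction(bids):
    """Highest bidder wins and pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else 0.0
    return winner, price


def utility(agent, value, bids):
    """Quasi-linear utility: value minus price if the agent wins, else zero."""
    winner, price = second_price_auction(bids)
    return value - price if winner == agent else 0.0


# Agent "a" values the item at 10; the rival bids are held fixed.
rivals = {"b": 7.0, "c": 4.0}
truthful = utility("a", 10.0, {**rivals, "a": 10.0})
deviations = [
    utility("a", 10.0, {**rivals, "a": bid})
    for bid in (0.0, 5.0, 8.0, 12.0, 20.0)
]
# No deviation in the list does better than bidding the true value.
best_deviation = max(deviations)
```

Underbidding risks losing an item worth more than its price, while overbidding never lowers the price paid, so the agent has no reason to misreport.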
Multi-Agent Reinforcement Learning (MARL), emerging in the mid-1990s, took a different route: instead of prescribing optimal strategies, it let agents learn coordination through trial and error. Each agent runs a reinforcement learning algorithm, updating its policy based on rewards received. MARL faced two fundamental problems that single-agent RL does not. First, non-stationarity: as other agents learn, the environment changes from the perspective of any one agent, so the stationary Markov decision process assumed by standard RL no longer holds. Second, credit assignment: when multiple agents contribute to a reward, how should each agent's contribution be evaluated? These problems pushed MARL researchers to borrow equilibrium concepts from game theory, for example in learning algorithms that converge to Nash equilibria in self-play. Deep learning has since transformed MARL, enabling agents to learn complex policies from high-dimensional inputs. Today, MARL is the leading framework for problems where the environment is too complex to model analytically, such as multiplayer games, robot soccer, and autonomous driving. It coexists with game-theoretic MAS: game theory provides the equilibrium targets, while MARL provides the learning dynamics.
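The simplest instance of this setup is independent Q-learning, where each agent keeps its own value table and ignores the fact that the other agent is also learning. The sketch below uses a stateless two-action coordination game; the payoff numbers, hyperparameters, and function name are hypothetical, and no convergence guarantee is implied.

```python
import random

# Shared payoff for a two-action coordination game (hypothetical numbers):
# both agents receive PAYOFF[a0][a1]; matching actions pay, mismatches do not.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]


def independent_q_learning(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]; the game has no state

    def choose(agent):
        if rng.random() < epsilon:
            return rng.randrange(2)                       # explore
        return max(range(2), key=lambda a: q[agent][a])   # exploit

    for _ in range(episodes):
        a0, a1 = choose(0), choose(1)
        reward = PAYOFF[a0][a1]
        # Each agent updates only its own table and treats the other agent as
        # part of the environment, which is where non-stationarity comes from:
        # the value of an action drifts as the partner's policy changes.
        q[0][a0] += alpha * (reward - q[0][a0])
        q[1][a1] += alpha * (reward - q[1][a1])
    return q


q_tables = independent_q_learning()
greedy_actions = [max(range(2), key=lambda a: table[a]) for table in q_tables]
```

Both agents receive the same scalar reward and cannot tell whose action earned it, so the same loop also shows the credit assignment problem in its most basic form.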
Swarm Intelligence, dating from 1989, represents the most radical departure from centralized coordination. Inspired by ant colonies, bird flocks, and fish schools, swarm systems consist of large numbers of simple agents following local rules—move toward the average position of neighbors, avoid collisions, align velocity. Global patterns like foraging trails or flocking emerge without any agent having a global representation or explicit communication. Swarm Intelligence narrowed the agent design to extreme simplicity, contrasting sharply with BDI's rich mental states and game theory's strategic reasoning. It proved remarkably effective for optimization (ant colony optimization, particle swarm optimization) and for tasks requiring robustness and scalability, such as drone swarm coordination. But swarm methods struggle when agents need to make distinct, heterogeneous decisions or when the desired global behavior is hard to specify through local rules. Swarm Intelligence and Organization-Oriented Design represent opposite solutions to the same decentralization problem: emergence versus engineered structure.
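The three local rules named above (cohesion, collision avoidance, velocity alignment) can be sketched as a single update step. The `Boid` class, the neighborhood radius, and all weights are hypothetical values chosen for illustration; no agent in the sketch sees more than its own neighborhood.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def flock_step(boids, radius=5.0, min_dist=1.0,
               cohesion=0.01, separation=0.05, alignment=0.05, dt=1.0):
    """Return the next state; each boid uses only neighbors within radius."""
    updated = []
    for b in boids:
        near = [o for o in boids if o is not b
                and (o.x - b.x) ** 2 + (o.y - b.y) ** 2 <= radius ** 2]
        vx, vy = b.vx, b.vy
        if near:
            n = len(near)
            # Cohesion: steer toward the neighbors' average position.
            vx += cohesion * (sum(o.x for o in near) / n - b.x)
            vy += cohesion * (sum(o.y for o in near) / n - b.y)
            # Alignment: nudge velocity toward the neighbors' average velocity.
            vx += alignment * (sum(o.vx for o in near) / n - b.vx)
            vy += alignment * (sum(o.vy for o in near) / n - b.vy)
            # Separation: move away from neighbors that are too close.
            for o in near:
                if (o.x - b.x) ** 2 + (o.y - b.y) ** 2 < min_dist ** 2:
                    vx += separation * (b.x - o.x)
                    vy += separation * (b.y - o.y)
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated


flock = [Boid(0.0, 0.0, 1.0, 0.0), Boid(2.0, 0.0, 1.0, 0.0),
         Boid(100.0, 100.0, 1.0, 0.0)]
next_flock = flock_step(flock)
```

In this run the two nearby boids drift slightly toward each other while the distant one, having no neighbors, simply keeps its velocity.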
Organization-Oriented Design (OOD), emerging around 1998, took the view that coordination should be engineered explicitly through roles, norms, and organizational structures. Instead of hoping that global behavior emerges from local rules, OOD defines a social architecture: agents occupy roles with specified permissions and responsibilities; norms regulate permissible actions; organizational structures define reporting lines and information flow. This framework absorbed insights from DPS (task decomposition) and BDI (agent capabilities) but added a top-down design layer. OOD is particularly suited for applications requiring verifiability and compliance, such as business process automation, disaster response coordination, and multi-robot systems with safety constraints. It coexists with Swarm Intelligence as a complementary approach: OOD provides structure for systems whose behavior must be predictable, while Swarm Intelligence offers flexibility for systems whose behavior can be left to emerge.
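A role-and-norm layer of this kind can be sketched as a lookup plus a set of context-dependent prohibitions. The role names, actions, and the single norm below are hypothetical; organizational modeling languages are considerably richer than this illustration.

```python
# Role -> actions that the role is permitted to perform (hypothetical).
ROLE_PERMISSIONS = {
    "coordinator": {"assign_task", "broadcast"},
    "scout": {"report_position"},
    "carrier": {"report_position", "move_payload"},
}

# Norms modeled as prohibitions: (role, action, predicate over the context).
# The action is forbidden whenever the predicate holds.
NORMS = [
    ("carrier", "move_payload", lambda ctx: ctx.get("zone") == "restricted"),
]


def is_permitted(role, action, context):
    """Check the role's permissions first, then any applicable norms."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    return not any(
        norm_role == role and norm_action == action and forbidden(context)
        for norm_role, norm_action, forbidden in NORMS
    )


checks = [
    is_permitted("carrier", "move_payload", {"zone": "open"}),
    is_permitted("carrier", "move_payload", {"zone": "restricted"}),
    is_permitted("scout", "move_payload", {"zone": "open"}),
]
```

Because the rules live outside the agents, they can be inspected and audited without examining any agent's internal reasoning, which is what makes the approach attractive for compliance.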
The most recent framework, Multi-Agent Safety (2015–present), reframes the field's success criterion from coordination effectiveness to trustworthiness. As MAS are deployed in high-stakes domains such as autonomous driving, financial trading, and power grids, the question is no longer just whether agents can coordinate, but whether they can do so safely, robustly, and in alignment with human values. Multi-Agent Safety draws on and challenges both game-theoretic mechanism design and MARL. From game theory, it inherits the concern with incentives and equilibrium selection, but adds the requirement that mechanisms be robust to adversarial manipulation and model misspecification. From MARL, it inherits the need for learning, but demands that learned policies satisfy safety constraints even during exploration. This framework is still young, but it has already produced new problem formulations: safe multi-agent exploration, robust mechanism design, and value alignment in groups of learning agents.
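One way to picture the demand that safety hold even during exploration is a filter that removes unsafe actions before the learner chooses. The sketch below applies such a filter to epsilon-greedy selection on a one-dimensional track shared with another agent; the environment, the `is_safe` rule, and all names are hypothetical, and this is only one of several possible approaches.

```python
import random


def is_safe(position, action, occupied, track_length):
    """A move is safe if it stays on the track and avoids occupied cells."""
    target = position + action
    return 0 <= target < track_length and target not in occupied


def shielded_choice(position, q_values, occupied, track_length, epsilon, rng):
    """Epsilon-greedy selection restricted to actions the filter allows."""
    actions = (-1, 0, 1)  # step left, stay, step right
    allowed = [a for a in actions
               if is_safe(position, a, occupied, track_length)]
    if not allowed:
        raise RuntimeError("no safe action available")
    if rng.random() < epsilon:
        return rng.choice(allowed)  # exploration never leaves the safe set
    return max(allowed, key=lambda a: q_values.get((position, a), 0.0))


rng = random.Random(0)
occupied = {3}            # cell currently held by another agent
q_values = {(2, 1): 5.0}  # the unsafe move looks best to the learner
choices = {
    shielded_choice(2, q_values, occupied, 5, 0.5, rng) for _ in range(200)
}
```

Even though the learner's value estimates favor stepping into the occupied cell, that action is never selected, whether the agent is exploring or exploiting.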
No single framework has won. The leading frameworks today—BDI, game-theoretic MAS, MARL, and Swarm Intelligence—each excel in different regions of the design space. BDI is best for applications requiring explicit reasoning about goals and commitments, especially in human-agent teams. Game-theoretic MAS provides the gold standard for analyzing strategic interactions and designing incentive-compatible mechanisms. MARL is the tool of choice for learning coordination in complex, high-dimensional environments. Swarm Intelligence offers unmatched scalability and robustness for homogeneous agent collectives. These frameworks are increasingly hybridized: BDI agents use MARL to refine their plan selection; game-theoretic mechanisms regulate the behavior of MARL agents to ensure convergence to desirable equilibria; swarm algorithms are combined with OOD to add structure to emergent behaviors. The main disagreements center on how much intelligence to put in each agent (rich BDI reasoning vs. simple swarm rules) and how to handle the tension between learning and safety (MARL's exploration vs. Multi-Agent Safety's constraints). The field's pluralism is not a sign of fragmentation but a recognition that coordination is not a single problem—it is a family of problems, each requiring its own combination of frameworks.