How can a group of independent machines act together when no single robot has a complete picture of the world? This question has driven multi-robot systems (MRS) since the late 1980s. The field's central tension is between the desire for global efficiency and the reality of local information. Early researchers assumed that a central planner could direct every robot, but they soon discovered that communication delays, sensor noise, and sheer scale made that vision brittle. The history of MRS is a series of attempts to resolve this tension, each framework exposing the limits of its predecessors and carving out its own territory.
The first systematic approach to multi-robot coordination was Centralized Multi-Robot Coordination (1988–2005). In this framework, a single computer or lead robot receives all sensor data, computes a global plan, and transmits commands to every other robot. The appeal was clear: with full information, a central planner could produce optimal or near-optimal solutions for tasks like warehouse routing or formation movement. Early work in this vein drew on classical planning and operations research, treating the robot team as a single distributed machine.
Yet centralized coordination had two fatal weaknesses. First, it did not scale: as the number of robots grew, the computational cost of planning exploded and communication bandwidth became a bottleneck. Second, the central node was a single point of failure—if it crashed or lost contact, the entire team froze. These limitations pushed researchers to ask whether robots could coordinate using only local information, without a conductor.
Two frameworks emerged in the late 1980s and early 1990s that directly challenged the centralized assumption. Decentralized Multi-Robot Coordination (1990–Present) became a broad research program built on a simple commitment: each robot makes its own decisions using only locally available data and limited communication with neighbors. This was not a single algorithm but a family of approaches—distributed constraint satisfaction, potential fields, and auction protocols—all sharing the conviction that global coordination could arise from local interactions.
At nearly the same time, Behavior-Based Multi-Robot Systems (1988–2010) offered a more radical break. Inspired by biological systems and the broader behavior-based robotics movement, this framework rejected the very idea of internal world models and deliberative planning. Instead, each robot ran a small set of simple, reactive behaviors (avoid obstacles, follow a leader, move toward a goal) and the team's overall behavior emerged from the interaction of these low-level rules. Behavior-based systems were fast, robust, and required almost no communication, but they struggled with tasks that demanded explicit coordination, such as carrying a large object or precisely timing a joint action.
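To make the idea of reactive behaviors concrete, here is a minimal sketch of one robot's control step. It assumes a fixed-priority arbitration scheme, where obstacle avoidance overrides goal seeking. That is only one of several ways behaviors were combined in practice. The function names, the 2D point representation, and the `safe_distance` threshold are all hypothetical choices made for illustration.

```python
import math

def avoid_obstacle(position, obstacles, safe_distance=1.0):
    """Return a unit velocity pointing away from the nearest obstacle
    inside safe_distance, or None if no obstacle is that close."""
    close = [o for o in obstacles if math.dist(position, o) < safe_distance]
    if not close:
        return None
    nearest = min(close, key=lambda o: math.dist(position, o))
    dx, dy = position[0] - nearest[0], position[1] - nearest[1]
    norm = math.hypot(dx, dy) or 1.0  # guard against a zero-length vector
    return (dx / norm, dy / norm)

def move_to_goal(position, goal):
    """Return a unit velocity pointing at the goal (zero if already there)."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return (0.0, 0.0)
    return (dx / norm, dy / norm)

def select_action(position, goal, obstacles):
    """Assumed arbitration rule: the higher-priority behavior wins outright.
    The robot keeps no world model and computes no plan; it only reacts
    to what it currently senses."""
    command = avoid_obstacle(position, obstacles)
    if command is not None:
        return command
    return move_to_goal(position, goal)

clear_path = select_action((0.0, 0.0), (10.0, 0.0), obstacles=[])
blocked_path = select_action((0.0, 0.0), (10.0, 0.0), obstacles=[(0.5, 0.0)])
```

With no obstacle nearby the robot heads straight for the goal. With an obstacle inside the safety radius it backs directly away, even though that moves it away from the goal. Nothing in this loop refers to other robots' intentions, which is why such rules alone struggle with tightly timed joint actions.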
Decentralized coordination and behavior-based methods coexisted for years, but they differed in their attitude toward formal guarantees. Decentralized coordination often borrowed tools from control theory and distributed computing to prove that a team would converge to a desired state. Behavior-based practitioners were more comfortable with empirical demonstrations and tolerated unpredictable emergent outcomes.
Swarm Robotics (1992–Present) radicalized the behavior-based ethos by scaling it to hundreds or thousands of robots and by eliminating any remaining trace of explicit coordination. Where behavior-based systems might still use a leader-follower rule or a simple broadcast signal, swarm robotics insisted on strict homogeneity: every robot runs the same program, has no unique identifier, and cannot directly communicate its intentions to others. Coordination is entirely implicit, arising from the robots' reactions to the physical environment and to each other's movements.
This framework drew inspiration from social insects—ants, bees, termites—whose colonies achieve complex tasks (nest building, foraging, path finding) without any central planner or explicit messages. Swarm robotics showed that large numbers of simple, cheap robots could perform tasks like area coverage, collective transport, and self-assembly through purely local sensing and minimal computation. The cost was a loss of control: a swarm designer cannot predict exactly what each robot will do at any moment, only the statistical properties of the group's behavior.
Swarm robotics remains an active research area, especially for applications where scalability and robustness matter more than optimality. It coexists with other decentralized frameworks, but its commitment to homogeneity and emergence sets it apart from approaches that allow robots to negotiate or share plans.
As decentralized coordination matured, researchers developed structured mechanisms for two specific sub-problems: task allocation and formation maintenance. These frameworks did not replace the broader decentralized paradigm; they gave it sharper tools.
Market-Based Task Allocation (1998–2020) treated robots as self-interested agents that bid on tasks in an auction. Each robot computed its own cost for completing a task (based on distance, energy, or capability) and the auctioneer assigned the task to the lowest bidder. Because each bid was computed locally and only the bids had to be compared, this approach recovered much of the efficiency of centralized optimization while keeping much of the robustness of decentralized decision-making. Market-based methods were especially successful in domains like multi-robot exploration and disaster response, where tasks were independent and robots had heterogeneous capabilities.
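The bidding loop described above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of any specific published protocol. It assumes that cost is plain Euclidean distance, that tasks are auctioned one at a time in a fixed order, and that each robot can win at most one task. The robot and task names are hypothetical.

```python
import math

def bid(robot_position, task_position):
    """A robot's bid is its own cost estimate; here, straight-line distance."""
    return math.dist(robot_position, task_position)

def run_auction(robots, tasks):
    """Sequential single-item auction (an assumed variant): tasks are
    offered one by one, and each goes to the lowest bidder among the
    robots that have not yet won a task."""
    assignment = {}
    available = dict(robots)
    for task_name, task_position in tasks.items():
        if not available:
            break  # more tasks than robots; the remainder stay unassigned
        bids = {name: bid(pos, task_position) for name, pos in available.items()}
        winner = min(bids, key=bids.get)
        assignment[task_name] = winner
        del available[winner]
    return assignment

robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0), "r3": (5.0, 5.0)}
tasks = {"t1": (1.0, 1.0), "t2": (9.0, 1.0), "t3": (5.0, 6.0)}
assignment = run_auction(robots, tasks)
```

In this example each task goes to the robot nearest to it. Note that the auction is greedy: the order in which tasks are offered can change the outcome, so the result is not guaranteed to minimize total cost across the team.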
However, market-based allocation had limits. It required a communication channel for bidding, and the auction process could become a bottleneck in large teams. More fundamentally, it assumed that tasks could be evaluated independently—a poor fit for tightly coupled operations where one robot's action affects another's cost. By 2020, the framework had largely been absorbed into broader decentralized coordination and multi-agent learning approaches, which could handle more complex interdependencies.
Consensus and Formation Control (2000–Present) addressed a different problem: how to make a team of robots move in a desired geometric pattern using only local information. Drawing on graph theory and distributed control, this framework proved that if each robot adjusts its velocity toward those of its neighbors, the entire team converges to a common heading or a prescribed formation, provided the communication graph is sufficiently connected. The key insight was that the graph's connectivity, not the capability of any individual robot, determines whether the team can achieve consensus. This approach provided formal guarantees, such as provable convergence and bounded error, that behavior-based and swarm methods could not match.
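A small numerical sketch shows both the convergence and the role of connectivity. It assumes a discrete-time averaging update on an undirected graph, treats each heading as a plain scalar in degrees (ignoring angle wraparound), and uses a hand-picked step size small enough for stability on these graphs. The graphs and initial headings are made up for illustration.

```python
def consensus_step(values, neighbors, step_size):
    """One synchronous update: every robot nudges its value toward
    the values of the robots it can communicate with."""
    updated = []
    for i, x_i in enumerate(values):
        correction = sum(values[j] - x_i for j in neighbors[i])
        updated.append(x_i + step_size * correction)
    return updated

def run_consensus(values, neighbors, step_size=0.3, iterations=200):
    """Repeat the local update; no robot ever sees the whole team."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step_size)
    return values

initial_headings = [0.0, 30.0, 60.0, 90.0]

# Connected chain 0-1-2-3: information can reach every robot.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
connected_result = run_consensus(initial_headings, chain)

# Two isolated pairs: the link between robots 1 and 2 is missing.
pairs = {0: [1], 1: [0], 2: [3], 3: [2]}
split_result = run_consensus(initial_headings, pairs)
```

On the connected chain all four robots settle on the average initial heading of 45 degrees. With the middle link removed, each pair agrees internally (at 15 and 75 degrees) but the team as a whole never reaches a common value.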
Consensus and formation control became the standard tool for applications like drone swarms, satellite formations, and coordinated surveillance. It remains active today, often combined with learning methods to handle dynamic environments or communication failures.
Multi-Agent Reinforcement Learning (MARL) (2005–Present) represents a fundamental shift in how multi-robot coordination is designed. Instead of hand-crafting rules, auction protocols, or control laws, MARL lets robots learn coordination policies through trial and error. Each robot is an agent that observes the state of the world (in practice often only a local, partial view of it), takes an action, and receives a reward. Over many episodes, the agents learn to maximize their collective reward.
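The trial-and-error loop can be illustrated with a toy example. The sketch below assumes two independent tabular Q-learners playing a repeated, stateless coordination game: each picks one of two actions, and both receive a shared reward of 1 only when their choices match. The game, the learning rate, the exploration rate, and the seed are all assumptions chosen for illustration; real multi-robot tasks involve states, partial observations, and far larger action spaces.

```python
import random

def greedy(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Independent Q-learning: each agent updates only its own table
    and never sees the other agent's action or values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        actions = []
        for agent in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))      # explore
            else:
                actions.append(greedy(q[agent]))      # exploit
        # Shared team reward: 1 if the agents coordinated, else 0.
        reward = 1.0 if actions[0] == actions[1] else 0.0
        for agent, action in enumerate(actions):
            q[agent][action] += alpha * (reward - q[agent][action])
    return q

q_tables = train()
policies = [greedy(q) for q in q_tables]
```

Neither agent is told which action to prefer; the shared reward alone pushes both toward the same choice. Each agent's reward for an action depends on what the other agent happens to be doing at that moment, so the value being estimated shifts as the other agent learns.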
MARL challenges all prior frameworks because it does not require a human designer to specify how robots should coordinate. In principle, it can discover coordination strategies that no human would think of. In practice, MARL faces two characteristic difficulties. The first is non-stationarity: as one robot learns, the environment for every other robot changes, making the learning problem a moving target. The second is credit assignment: when a team succeeds or fails, it is hard to determine which robot's actions were responsible.
Despite these challenges, MARL has become a leading framework, especially for tasks with complex dynamics, such as multi-robot soccer, autonomous driving, and warehouse coordination. It often incorporates ideas from older frameworks—using consensus to stabilize learning, or auction mechanisms to structure exploration—but it treats coordination as something to be discovered rather than designed.
Today, no single framework dominates multi-robot systems. The field is pluralistic, with different approaches suited to different problems. Decentralized coordination remains the default paradigm for most applications, but it has been transformed by learning methods. Consensus and formation control provides provable guarantees for formation tasks, while swarm robotics excels at large-scale, homogeneous tasks where emergence is acceptable. MARL is the fastest-growing area, but its lack of formal guarantees and high sample complexity limit its use in safety-critical domains.
There is broad agreement on one point: centralized coordination is impractical for most real-world systems. The field has converged on the principle that robots should act on local information. The major disagreement is about how much structure to impose. Control theorists argue for provable convergence and bounded error. Learning researchers argue that hand-crafted rules are brittle and that coordination should be learned from experience. Swarm robotics practitioners accept emergence as a design principle, while others see it as a last resort. These disagreements are productive: they drive the development of hybrid approaches that combine formal guarantees with learned policies, or that use swarm principles to initialize MARL training.
The central tension that opened the field—how to coordinate without a conductor—remains unresolved. But the frameworks that have emerged in response to it have given researchers a rich toolkit, from provable control laws to emergent swarm behavior to learned policies. The choice of framework depends on the task, the number of robots, the need for guarantees, and the tolerance for unpredictability.