Distributed algorithms are the rules by which separate computers—each with its own memory, its own clock, and no direct knowledge of the others' state—reach agreement, coordinate actions, and maintain consistency. The central tension that drives the field is this: how can a collection of independent processes behave as a coherent system when they cannot rely on shared memory, a global clock, or perfectly reliable communication? Every framework in the history of distributed algorithms is an answer to some piece of that question, and each new framework emerged because earlier answers left something unresolved.
The earliest distributed algorithms, developed in the 1970s, tackled the most basic coordination problems: ensuring that only one process at a time accesses a shared resource (mutual exclusion) and picking a single process to coordinate a task (leader election). These problems were already well understood in centralized systems, but the absence of shared memory forced a new approach: processes could only coordinate by exchanging messages. The Ricart–Agrawala algorithm for mutual exclusion and the Bully algorithm for leader election established the pattern of using message passing with logical ordering to simulate centralized control. Although both problems involve agreement, they differ in structure: mutual exclusion requires continuous arbitration over a resource, while leader election is a one-time decision that then simplifies subsequent coordination. These early frameworks showed that distributed coordination was possible, but they assumed that processes were reliable and that messages arrived within a known time—assumptions that later frameworks would challenge.
The most consequential methodological choice in distributed algorithms is whether to model the system as synchronous or asynchronous. In a synchronous model, processes execute in lockstep rounds, and messages are guaranteed to arrive within a bounded delay. This assumption makes reasoning about time and failures straightforward, and many early algorithms (including the first mutual exclusion and leader election protocols) implicitly relied on it. The asynchronous model, by contrast, imposes no bounds on message delays or process speeds; a message may be arbitrarily delayed, and a process may pause indefinitely. This model is more realistic for the Internet and large-scale systems, but it also makes coordination fundamentally harder. The synchronous/asynchronous split is not just one framework among equals—it is the foundational modeling decision that defines the design space for every other framework. Synchronous models enable strong guarantees but are brittle in practice; asynchronous models are robust but impose impossibility results that constrain what can be achieved. Later frameworks, especially Consensus, would be forced to confront this trade-off directly.
Without a global clock, processes cannot agree on the order of events by timestamp alone. Logical time, introduced by Leslie Lamport in 1978, provided a substitute: a way to assign timestamps to events so that if one event causally affects another, the timestamp of the cause is less than that of the effect. Lamport clocks are simple—each process increments a counter and piggybacks it on messages—but they cannot distinguish concurrent events from causally related ones. If two events have the same Lamport timestamp, they might be concurrent, but the clock cannot tell. Vector clocks, developed shortly after, solved this by having each process maintain a vector of counters, one per process. With vector clocks, an event's timestamp carries enough information to determine whether two events are concurrent or causally ordered. Logical time became the reasoning infrastructure for almost everything that followed: it made Distributed Snapshots possible, it underlies the ordering guarantees in Consensus protocols, and it remains the standard tool for causal reasoning in distributed systems.
Once logical time gave the field a way to talk about event ordering, the next question was whether a consistent global state of a distributed system could be recorded without stopping all processes. The Distributed Snapshots framework, introduced by Chandy and Lamport in 1985, showed that it could. The algorithm uses marker messages that propagate through the system, and each process records its local state when it receives a marker on each incoming channel. The resulting snapshot is consistent: it could have been observed at some instant in a hypothetical synchronous execution, even though no global clock exists. Distributed Snapshots differ from Consensus in that they do not require processes to agree on a single value; they only need to agree on a consistent cut of the execution. This makes snapshots a lighter-weight coordination primitive, useful for checkpointing, debugging, and monitoring. The framework coexists with Consensus as a complementary tool for observing rather than deciding.
Consensus is the problem of getting multiple processes to agree on a single value despite failures. It became the central paradigm of distributed algorithms after Fischer, Lynch, and Paterson proved in 1985 that consensus is impossible in a purely asynchronous system if even one process can crash. The FLP impossibility result is often misunderstood: it does not say that consensus can never be achieved, but that no deterministic algorithm can guarantee agreement in bounded time under all possible failure scenarios. The adversary can always delay messages or processes just enough to prevent a decision. This result forced the field to confront the synchronous/asynchronous trade-off head-on. Practical consensus protocols, such as Paxos and Raft, work by assuming partial synchrony—the system behaves asynchronously most of the time but occasionally becomes synchronous long enough to make progress. Later extensions, such as Byzantine fault-tolerant consensus (e.g., PBFT), handle arbitrary malicious failures by requiring more rounds and more replicas. Consensus remains the dominant framework for achieving strong consistency in replicated systems, from databases to blockchains. Its assumptions—that processes must coordinate to agree—stand in direct tension with the next framework.
Conflict-Free Replicated Data Types (CRDTs), introduced around 2000, represent a deliberate departure from the consensus paradigm. Instead of coordinating to agree on a single order of updates, CRDTs are designed so that replicas can accept updates independently and later merge them without conflicts. The key design principle is commutativity: operations that commute (or are idempotent) can be applied in any order and still converge to the same state. State-based CRDTs (CvRDTs) merge by taking the least upper bound in a join-semilattice; operation-based CRDTs (CmRDTs) rely on reliable broadcast of commutative operations. CRDTs reject the assumption that coordination is necessary for consistency. They offer eventual consistency without coordination, making them attractive for systems that prioritize availability and partition tolerance over strong consistency. The CAP theorem, which states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance, provides the theoretical backdrop: CRDTs choose availability and partition tolerance, while consensus-based systems choose consistency. CRDTs are not a replacement for consensus in all cases—they cannot enforce global invariants like total order or atomicity—but they have become the standard approach for collaborative editing (e.g., in Google Docs) and for replicated data in geo-distributed databases.
Today, the leading frameworks in distributed algorithms coexist in a state of productive tension. Consensus protocols (Paxos, Raft, PBFT) remain the default for systems that require strong consistency, such as replicated state machines and blockchains. CRDTs are the default for systems that can tolerate eventual consistency and need high availability, such as collaborative applications and multi-master replication. Logical time and Distributed Snapshots are used as infrastructure within both paradigms: consensus protocols rely on logical ordering, and snapshots are used for checkpointing in consensus-based systems. The synchronous/asynchronous model split continues to shape every new protocol: designers must decide whether to assume bounded delays (and risk liveness in practice) or to assume asynchrony (and accept impossibility results that force probabilistic or partially synchronous solutions). The main area of agreement across frameworks is that coordination is costly—whether through message rounds in consensus or through merge functions in CRDTs—and that the choice of model determines what is possible. The main area of disagreement is whether coordination is necessary for correctness: consensus advocates argue that strong consistency requires agreement; CRDT advocates argue that many applications can be built with commutative operations that avoid coordination entirely. This tension drives ongoing research into hybrid approaches, such as protocols that use consensus for some operations and CRDTs for others, and into new models that blur the line between synchronous and asynchronous assumptions.