Distributed Systems: CAP Theorem

Wahome

Distributed computing is a model in which an arbitrary property, such as processing or data storage, is spread across several actors. It is characterised by a collection of independent components that are distributed across multiple computing systems linked together using a network. These components split up the work, communicate, and coordinate their efforts by passing messages to achieve a common goal.

“Don’t communicate by sharing memory; share memory by communicating.” — Rob Pike

Most distributed systems, even the ones that work, tend to be very fragile: they are hard to keep up, hard to manage, hard to grow, hard to evolve, and hard to program. Several issues are prevalent in the way we think about these systems, including the fault model, high availability, graceful degradation, data consistency, evolution, composition, and autonomy. Distributed systems engineering is full of trade-offs, with tensions between many of these concerns.

A Distributed System

Consider a very simple distributed system that is composed of two nodes,
N₁ and N₂. Both of these nodes keep track of the same variable, v, whose value is initially v₀. N₁ and N₂ can communicate with each other and can also communicate with external clients. A client can request to write to and read from either node.

When a node receives a request, it executes any necessary computations and then sends an appropriate response to the client.

And here is what a read looks like:
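To make the model concrete, here is a minimal sketch of such a two-node system in Python. The class and method names are illustrative, not taken from any real library, and replication is deliberately left out at this point:

```python
class Node:
    """A toy node that keeps a local copy of the shared variable v."""

    def __init__(self, name, value="v0"):
        self.name = name
        self.value = value   # this node's local copy of v
        self.peers = []      # other nodes it could replicate to

    def write(self, value):
        # Handle a client write: update the local copy and acknowledge.
        self.value = value
        return "ack"

    def read(self):
        # Handle a client read: return whatever this node currently holds.
        return self.value


n1, n2 = Node("N1"), Node("N2")
n1.peers.append(n2)
n2.peers.append(n1)

print(n1.write("v1"))        # "ack"
print(n1.read(), n2.read())  # "v1 v0" -- N2 still holds v0 until something replicates
```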

When designing distributed systems, there are three properties that are commonly desired: consistency, availability, and partition tolerance.

Consistency

You have recently moved to a new house across the city and want to update the address registered with your Internet service provider. You call the customer service centre, a representative picks up your call, and you have them make the update. However, once the call ends, it occurs to you that while the street name was updated, the house number was not. Thus, you call the customer service centre again. This time you are connected to a different representative, who can access your records as well. The representative can see that you recently updated your address, so they correct the house number while the rest of the address stays the same as per the last update. This is Consistency: even though you connected to a different representative, they were able to retrieve the same information.

“Any read operation that begins after a write operation completes must return that value, or the result of a later write operation.” — Gilbert & Lynch³

Concurrent systems, including both multiprocessor and distributed systems, often rely on object replication to improve performance and scalability. Object replication is the idea of keeping localised copies of shared data structures to speed up read operations. However, this scale and performance optimization requires that all local copies of the data present a single, up-to-date, readable version to all clients, i.e., that they be consistent with each other.

Every write operation needs to be replicated or made visible across all the local copies “instantaneously”, which in real-world systems is not possible. Network delays in distributed systems and overlapping operations on shared data structures within multiprocessor systems are some of the major challenges in ensuring consistency. In a consistent system, once a client writes a value to any server and gets a response, it expects to get that value (or a fresher value) back from any server it reads from.

Following the simple distributed system above, below is an example of an inconsistent state:

The client writes v₁ to N₁ and N₁ acknowledges, but when it reads from N₂, it gets the stale value v₀.

and here is an example of a consistent state:

N₁ replicates its value to N₂ before sending an acknowledgement to the client. Thus, when the client reads from N₂, it gets the most up-to-date value, v₁.
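A small sketch of the difference, building on the toy node above. The names are illustrative, and "replicate before acknowledging" is shown as one simple way to produce the consistent behaviour; it is not the only possible protocol:

```python
class Node:
    def __init__(self, value="v0"):
        self.value = value
        self.peers = []

    def read(self):
        return self.value

    def write_then_ack(self, value):
        # Inconsistent variant: acknowledge immediately; replication would
        # happen "later" (not modelled here), so peers can serve stale reads.
        self.value = value
        return "ack"

    def replicate_then_ack(self, value):
        # Consistent variant: push the new value to every peer before
        # acknowledging, so any subsequent read sees it.
        self.value = value
        for peer in self.peers:
            peer.value = value
        return "ack"


n1, n2 = Node(), Node()
n1.peers, n2.peers = [n2], [n1]

n1.write_then_ack("v1")
print(n2.read())            # "v0" -- a stale read: the inconsistent state

n1.replicate_then_ack("v1")
print(n2.read())            # "v1" -- the consistent state
```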

Availability

Imagine that there is a very popular Internet service provider in your city and you are one of its customers because of the amazing plans it offers. You like them because they have an amazing customer service experience that allows you to call anytime and raise queries and concerns, which are promptly and expertly answered. Whenever a customer calls, the Internet service provider is able to connect them to one of its customer service representatives. This is Availability: every customer is able to connect to a representative and get the help they seek.

“Every request received by a non-failing node in the system must result in a response.” — Gilbert & Lynch³

In an available system, if a client sends a request to a server and the server has not crashed, then the server must eventually respond to the client. The server is not allowed to ignore the client’s requests.
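As a toy contrast, an available node answers every request it receives, possibly with a stale value, while a node that trades availability away might block until it can reach its peer. This is a sketch with hypothetical names, not a real server implementation:

```python
class AvailableNode:
    def __init__(self, value="v0"):
        self.value = value

    def handle_read(self):
        # A non-failing node may not ignore the request: it responds with
        # whatever it currently holds, even if that value might be stale.
        return self.value


class BlockingNode:
    def __init__(self, value="v0"):
        self.value = value

    def handle_read(self, peer_reachable):
        # Sacrifices availability: refuses to answer unless it can confirm
        # its value with its peer.
        if not peer_reachable:
            raise TimeoutError("waiting for peer; request never completes")
        return self.value


print(AvailableNode().handle_read())   # always responds while the node is up
# BlockingNode().handle_read(peer_reachable=False) would raise instead of responding
```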

Partition Tolerance

Any messages N₁ and N₂ send to one another can be dropped. If all the messages were being dropped, then our system would look like this.

You have recently noticed that your current mobile and Internet service plan does not suit you. Your need for mobile data has significantly reduced because you have good Wi-Fi facilities both at home and at the office, and you hardly step outside that much these days. You would like to update your plan to only include talk time and messaging. Thus you decide to call the customer service centre once again.

On connecting with the representative this time, they inform you that they have not been able to update their records due to some technical issues. The information available to the representative might not be up to date, so they cannot make updates to it. This is a lack of Partition tolerance: the service is broken because of a communication breakdown between parts of the system.

“The network will be allowed to lose arbitrarily many messages sent from one node to another.” — Gilbert & Lynch³

Partition tolerance is the ability of a system to keep responding to client requests even if there is a communication failure — a lost or temporarily delayed connection between two nodes or partitions, i.e., a network partition.
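In the toy model, a partition simply means that replication messages between the nodes are dropped while the link is down. The sketch below is illustrative only:

```python
class Node:
    def __init__(self, value="v0"):
        self.value = value


class Link:
    """A toy network link that can drop every message while partitioned."""

    def __init__(self):
        self.partitioned = False

    def deliver(self, target, value):
        if self.partitioned:
            return False        # message lost -- arbitrarily many may be lost
        target.value = value
        return True


link = Link()
n1, n2 = Node(), Node()

link.partitioned = True
n1.value = "v1"
delivered = link.deliver(n2, n1.value)   # N1 tries to replicate v1 to N2
print(delivered, n2.value)               # False v0 -- the partition in action
```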

Brewer’s Conjecture

Brewer’s CAP Theorem illustrated using a Venn diagram.

Prof. Eric A. Brewer¹ gave a conference keynote at the ACM Symposium on Principles of Distributed Computing (PODC) in 2000 in which he conjectured that three core systemic requirements exist in a special relationship when designing and deploying applications in a distributed environment: consistency, availability, and tolerance to network partitions. Of these three desired properties of shared-data systems, only two can be achieved at any given point in time.

“These are not (yet) provable principles, but merely ways to think about the issues that simplify design in practice. They draw on experience at Berkeley and with giant-scale systems built at Inktomi, including the system that handles 50% of all web searches.” — Brewer

Brewer’s 2000 talk¹ was based on his theoretical work at UC Berkeley and his observations from running Inktomi, though he and others had been discussing the trade-off decisions that need to be made in highly scalable systems for years before that, as evidenced by “Cluster-Based Scalable Network Services” from SOSP in 1997 and “Harvest, Yield, and Scalable Tolerant Systems” in 1999.

Proof by Contradiction

Assume, for contradiction, that there does exist a system that is consistent, available, and partition tolerant.

We start by partitioning this utopian system:

Next, the client requests that v₁ be written to N₁. Since the system under consideration is available, N₁ must respond but since the network is also partitioned, N₁ cannot replicate its data to N₂.

Gilbert and Lynch³, in their proof, call this phase of execution α₁

Next, the client issues a read request to N₂. Again, since the system under consideration is available, N₂ must respond, but since the network is partitioned, N₂ cannot update its value from N₁. It returns v₀.

Gilbert and Lynch³, in their proof, call this phase of execution α₂

N₂ returns v₀ to the client after the client had already written v₁ to N₁. This is inconsistent.

Having assumed that a consistent, available, and partition-tolerant system exists, the sequence of steps above shows an execution in which any such system behaves inconsistently. By contradiction, no such system can exist.
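The argument can be restated as a small runnable simulation of the two phases. This is illustrative code, not a substitute for the formal proof:

```python
class Node:
    def __init__(self, value="v0"):
        self.value = value


def replicate(src, dst, partitioned):
    # During a partition, every replication message is dropped.
    if partitioned:
        return False
    dst.value = src.value
    return True


n1, n2 = Node(), Node()
partitioned = True              # assume a total partition between N1 and N2

# Phase alpha_1: availability forces N1 to accept and acknowledge the write,
# but the partition prevents it from replicating v1 to N2.
n1.value = "v1"
replicate(n1, n2, partitioned)

# Phase alpha_2: availability forces N2 to answer the read; it only has v0.
print(n2.value)                 # "v0" -- stale, so the execution is not consistent
```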

Trade-offs

The CAP theorem addresses an impossibility at the heart of building large-scale, clustered (web) service architectures: Brewer’s conjecture holds that a distributed architecture cannot simultaneously guarantee all three desired properties of consistency, availability, and partition tolerance. The theorem thus implies a trade-off.

“The most natural way of formalizing the idea of a consistent service is an atomic data object.”

Gilbert and Lynch³ gave a more formal definition to Brewer’s Conjecture¹ in which they defined the three properties as:

  • Consistency: atomic or linearizable consistency is the condition expected by web services in which there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.
  • Availability: every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate. When qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate.
  • Partition tolerance: the network will be allowed to lose arbitrarily many messages sent from one node to another. No set of failures less than total network failure is allowed to cause the system to respond incorrectly.

Distributed systems realistically cannot exist without network partitions. Handling partitions, for example, is foundational to distributed databases, so we cannot sacrifice tolerance to network partitions when we talk about them. Partition tolerance is therefore a given for any distributed system; otherwise, it could not survive network faults. What the conjecture theorises, then, is that distributed architectures must choose between availability and partition tolerance (AP) on one hand and consistency and partition tolerance (CP) on the other. But what about designing for high availability and consistency?

Linearizability vs Serializability

Linearizability is a guarantee about single operations on single objects. It provides a real-time (i.e., wall-clock) guarantee on the behaviour of a set of single operations (often reads and writes) on a single object (e.g., distributed register or data item). Under linearizability, writes should appear to be instantaneous. In simpler terms, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write.

Linearizability: single operation, single object, and real-time order. Image from https://blog.acolyer.org/2016/02/26/distributed-consistency-and-session-anomalies/

For example, consider the following write operations: W₁ at time T₁, denoted <W₁, T₁>, and similarly <W₂, T₂> and <W₃, T₃>, on an object O such that T₁ < T₂ < T₃. Any read operation on object O scheduled after T₃ should return the value set by W₃, regardless of whether the three write operations are dispatched by the same client (same core or node) or by different clients.

Linearizability requires the global order of operations to be preserved.
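As a toy check of the rule in the example above, the history and helper below are illustrative only; a real linearizability checker would also have to account for overlapping operations:

```python
# Each write is (name, wall-clock completion time, value written).
writes = [("W1", 1, "a"), ("W2", 2, "b"), ("W3", 3, "c")]


def linearizable_read(read_time):
    # Under linearizability, a read that starts after a write completes must
    # return that write's value or the value of a later write. Here we return
    # the latest write that completed before the read began.
    completed = [w for w in writes if w[1] < read_time]
    return max(completed, key=lambda w: w[1])[2]


print(linearizable_read(read_time=4))   # "c" -- the value written by W3
```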

Linearizability for read and write operations is synonymous with the term “atomic consistency” and is the “C,” or “Consistency,” in Gilbert and Lynch’s proof of the CAP Theorem. This is the strongest form of consistency and gives correct results as if all operations occur atomically on a single copy of the shared data. Linearizability is also composable (or “local”): any combination of linearizable objects is itself linearizable, provided the total order of events is preserved. If operations on each object in a system are linearizable, then all operations in the system are linearizable.

Serializability: multi-operation, multi-object, arbitrary total order. Image from https://blog.acolyer.org/2016/02/26/distributed-consistency-and-session-anomalies/

Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions. It is the traditional “I,” or “Isolation”, in ACID. If users’ transactions each preserve application correctness (“C,” or “Consistency”, in ACID), a serializable execution also preserves correctness. Therefore, serializability is a mechanism for guaranteeing database correctness.

Unlike linearizability, serializability does not — by itself — impose any real-time constraints on the ordering of transactions. Serializability is also not composable. It does not imply any kind of deterministic order — it simply requires that some equivalent serial execution exists. Combining serializability and linearizability yields strict serializability: transaction behaviour is equivalent to some serial execution, and the serial order corresponds to real-time.
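A toy contrast: serializability only asks that the outcome match some serial order of the transactions, not necessarily the real-time order in which they were submitted. The single-account example below is illustrative:

```python
from itertools import permutations


def run_serial(order, balance=100):
    # Apply each transaction's operations, one transaction at a time.
    for txn in order:
        for op in txn:
            balance = op(balance)
    return balance


t1 = [lambda b: b + 50]   # deposit 50
t2 = [lambda b: b * 2]    # double the balance

# An interleaved execution is serializable if its result equals the result of
# at least one serial order: here either (t1, t2) -> 300 or (t2, t1) -> 250.
# Strict serializability would additionally pin down which order, based on
# real time.
print([run_serial(order) for order in permutations([t1, t2])])
```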

Sequential, Causal, and Eventual Consistency

Consistency Models in Distributed Systems. Image from https://xzhu0027.gitbook.io/blog/misc/index/consistency-models-in-distributed-system

Maintaining linearizability (strong consistency) is extremely difficult for a distributed system. Network delays in distributed systems and overlapping operations on shared data structures within multiprocessor systems are some of the major challenges in ensuring consistency. To overcome some of these issues, without sacrificing correctness and performance, the research community has proposed various “relaxations” in the consistency requirement to derive different types of consistency such as sequential, causal and eventual consistency.

Sequential consistency is not compositional and modern microprocessors, by default, do not offer sequential consistency.

Unlike linearizability, sequential consistency preserves only the local ordering of program instructions: the system may freely interleave operations coming from different clients, but it must preserve the order of instructions issued by the same client. Although a weaker guarantee than linearizability, it still requires operations to take effect one at a time, as if executed atomically on a single copy of the object. Because modern processors do not provide it by default, programmers must ask for it explicitly using compiler directives, which compile down to memory fences that force the CPU to synchronise its local copy of the shared object with the other copies. Moreover, sequential consistency coincides with observational refinement when threads do not interfere with each other except through the shared concurrent data structure.
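A toy check of that rule, assuming a recorded history of which client issued each operation (the names and operation strings are illustrative):

```python
def respects_program_order(history, client_programs):
    """True if each client's operations appear in the history in program order."""
    for client, program in client_programs.items():
        applied = [op for c, op in history if c == client]
        if applied != program:
            return False
    return True


programs = {"A": ["w(x)=1", "w(x)=2"], "B": ["r(x)"]}

# B's read is interleaved between A's writes: acceptable under sequential
# consistency even if, in real time, it actually ran after w(x)=2.
history = [("A", "w(x)=1"), ("B", "r(x)"), ("A", "w(x)=2")]
print(respects_program_order(history, programs))   # True
```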

If a system is sequentially consistent, it is also causally consistent.

Causal consistency is a weaker consistency model that allows asynchronous operation between two or more clients. It allows nodes (or processors) to write to their respective local copies of the shared object without the “immediate” need for those local copies to be in sync with each other. Causal consistency captures the potential causal relationships between operations and guarantees that all processes observe causally-related operations in a common order. It is stronger than eventual consistency but weaker than sequential consistency or linearizability.
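One common way to track the "potentially caused by" relation is with vector clocks. Below is a minimal sketch of the happens-before test they support; it is illustrative, not a full causal store:

```python
def happened_before(vc_a, vc_b):
    """True if vector clock vc_a is dominated by vc_b, i.e. A happened before B."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b


write_a = (1, 0)   # written by node 0 before it saw anything from node 1
write_b = (1, 1)   # written by node 1 after it had observed write_a

print(happened_before(write_a, write_b))   # True: every node must apply A before B
print(happened_before(write_b, write_a))   # False: the reverse order is forbidden
```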

Let updates happen, worry about deciding their order later.

The eventual consistency model was popularized by the CAP theorem, with network partitions as the central problem it highlights. During a network partition, not all nodes in a distributed system can remain in communication with each other; keeping them consistent therefore requires a compromise on availability, i.e., to avoid divergence, blocking all the nodes until communication is restored. Considered one of the “weak” consistency schemes, eventual consistency is a theoretical guarantee that, provided no new updates to an entity are made, all reads of the entity will eventually return the last updated value.
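A minimal sketch of one way replicas can converge, here using a last-writer-wins merge. This is illustrative only; real systems use richer conflict-resolution mechanisms such as version vectors or CRDTs:

```python
class Replica:
    def __init__(self):
        self.value, self.timestamp = "v0", 0

    def write(self, value, timestamp):
        # Accept the write locally, even during a partition.
        self.value, self.timestamp = value, timestamp

    def merge(self, other):
        # Keep whichever value carries the higher timestamp (last writer wins).
        if other.timestamp > self.timestamp:
            self.value, self.timestamp = other.value, other.timestamp


r1, r2 = Replica(), Replica()
r1.write("v1", timestamp=1)   # accepted while r2 is unreachable

r2.merge(r1)                   # anti-entropy once the partition heals
r1.merge(r2)
print(r1.value, r2.value)      # both replicas converge on "v1"
```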
