Growing into one of the largest empires in human history is an incredible achievement. The Roman Kingdom, then Republic, and then Empire gained this status through more than a thousand years of expansion, conquest, trade, and internal development. It grew from a city-state to a colossal empire stretching from Britain to Egypt. For centuries, Rome was the centre of the empire, but as the city’s fortunes changed, the seat of power eventually shifted away from it, and the empire permanently split into two separate states — one in the west with Rome as its capital, and one in the east with Constantinople as its capital.
“This agglomeration which was called and which still calls itself the Holy Roman Empire was neither holy, nor Roman, nor an empire.”― Voltaire
The old adage “Rome wasn’t built in a day” holds that great things take time and patience. While obviously true of the rise of Rome, the expression did not, contrary to popular belief, originate in ancient Rome. In fact, Rome played no part in the coinage of the iconic phrase, which traces to medieval France, where it first appeared in the collection Li Proverbe au Vilain circa 1190 as “Rome ne fu pas faite toute en un jour”. It did not make the leap into the now-popular English proverb until John Heywood’s 1546 publication, “A Dialogue Conteinyng the Nomber in Effect of all the Prouerbes in the Englishe Tongue”.
And just as Rome and its empire were not — quite literally — built in a day, they were not destroyed in one either.
In 286 CE, the Emperor Diocletian restructured the Roman government by establishing the Tetrarchy, a system in which four men shared rule over the massive Roman Empire. However, the system broke down quickly, and by 313 there were only two emperors: Constantine in the west and Licinius in the east. The tetrarchic system was at an end, although it took until 324 for Constantine to finally defeat Licinius, reunite the two halves of the Roman Empire, and declare himself sole Augustus. Constantine abandoned the decaying city of Rome and moved his court to Byzantium, an ancient port town strategically located on the Bosporus strait separating Europe and Asia. He rebuilt the city and renamed it Constantinople, marking the dawn of the Byzantine Empire, or Byzantium.
Basileia tōn Romaiōn, the Greek-speaking Roman Empire of the Middle Ages.
The Byzantine Empire lasted for over 1100 years, from 330–1453, surviving through Late Antiquity and the Middle Ages. It owed much of its military success to Greek Fire, a mysterious incendiary liquid that was used to set enemy troops and ships ablaze. Once ignited, the thick, sticky substance, which could be sprayed from siphons or hurled in clay pots like grenades, could not be extinguished with water and could even burn on the surface of the sea. Greek Fire was most famously associated with the Byzantine navy, who used it to devastating effect against Arab and Rus’ invaders during sieges of Constantinople in the seventh, eighth, and tenth centuries.
While Byzantium is better known for its military prowess and its ancient napalm, whose precise recipe (it might have contained anything from petroleum and pine resin to sulphur and saltpetre) has been lost to history, this vast ancient civilisation is brought to life once more in game theory and computer science as an allegory for a situation in which a system’s actors must agree on a concerted strategy to avoid catastrophic failure, but some of those actors are unreliable.
Byzantine Generals’ Problem
Imagine that several divisions of the revered Byzantine army, each commanded by its own general, are camped outside an enemy city which they intend to capture. There is fierce resistance from within the city, but the Byzantine army has completely encircled it. The generals communicate with one another, as well as with all lieutenants within their division, only by messengers. After observing the enemy, they must decide upon a common plan of action that includes, among other things, an exact time to attack all at once and an exact time to retreat all at once if faced with fierce resistance. An attack or retreat without full strength means only one thing — unacceptable defeat. However, some of the generals and lieutenants may be traitors, trying to prevent the loyal ones from reaching agreement, and no general knows, a priori, whether any other is loyal or treacherous.
Not only may they fail to communicate, they may also communicate incorrect or inconsistent information at will.
In their 1982 paper¹, “The Byzantine Generals Problem”, Leslie Lamport, Robert Shostak, and Marshall Pease illustrated the problem of achieving consensus in a distributed system under the most extreme failure model for components, using a thought experiment based on the grand Eastern Roman Empire better known as Byzantium. Lamport et al. drew inspiration from Edsger Dijkstra’s Dining Philosophers Problem — another famous computer science thought experiment — whose popularity demonstrated to Lamport that the best way to attract attention to a problem is to present it in terms of a story. Lamport wanted to assign the generals a nationality that would not offend any readers.
“I stole the idea of the generals and posed the problem in terms of a group of generals, some of whom may be traitors, who have to reach a common decision.” — Leslie Lamport
The elegant marriage of a healthy imagination and medieval history to demonstrate quite an abstract computer science problem aside, the mapping to computing is direct: the generals are collectively known as processes, the commanding general who initiates the order is the source process, and the orders sent to the other processes are messages. Loyal generals and lieutenants are correct processes, while traitorous ones are faulty processes. The order to retreat or attack is a message carrying a single bit of information: a one or a zero. The Byzantine Generals’ Problem explores the limits of consensus in a distributed system with unreliable components and a reliable network. Lamport, in his own words, “stole” the idea of the generals from a related, likewise brilliantly narrated, but somewhat different problem called the Two Generals’ Problem.
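This mapping can be sketched with a few type definitions. The names below are my own illustration, not from the original paper:

```python
from dataclasses import dataclass
from enum import Enum

class Order(Enum):
    """An order is a message carrying a single bit of information."""
    RETREAT = 0
    ATTACK = 1

@dataclass
class Process:
    """A general or lieutenant in Lamport's allegory."""
    name: str
    is_source: bool = False   # the commanding general who initiates the order
    is_faulty: bool = False   # a traitor: may send arbitrary, inconsistent messages

generals = [
    Process("commander", is_source=True),
    Process("lieutenant-1"),
    Process("lieutenant-2", is_faulty=True),  # a traitor among the ranks
]
```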
Two Generals’ Problem
Imagine two armies, each led by a different general, preparing to attack a fortified city. The armies are encamped near the city, each on its own hill, with a valley separating the two hills. Both generals want to capture the city, but neither has an army large enough to do so alone; both must attack at the same time to have a chance of taking it. The only way for the two generals to communicate is by sending messengers through the valley. Unfortunately, the valley is occupied by the city’s defenders, and there is a chance that any given messenger sent through it will be captured.
The Two Generals’ Problem is a classic unsolvable problem in distributed systems that concerns reaching consensus over a lossy network.
While the two generals have agreed to a coordinated attack, they have not agreed upon a time for it. They must thus communicate with each other to decide on a time and to agree to proceed with the attack at that time, and each general must know that the other knows that they have reached an agreement. Because acknowledgement of message receipt can be lost as easily as the original message, a potentially infinite series of messages is required to come to consensus. The thought experiment involves considering how they might go about coming to a consensus. It explores the limits of consensus in a distributed system with reliable components and an unreliable network.
The Two Generals’ Problem seems innocuous enough, but it turns out that no algorithm can guarantee consensus. In its simplest form, one general is known to be the leader, decides on the time of the attack, and must communicate this time to the other general. While it is quite possible for the generals to come to an agreement — i.e. one successful message with a successful acknowledgement — the subtlety of the Two Generals’ Problem is in the impossibility of designing algorithms for the generals to use, including sending messages and processing received messages, that allow them to correctly conclude, with certainty, that the other will attack.
This thought experiment is also referred to as the Chinese Generals’ Problem or the Two Thieves Paradox.
Let us consider that one of the two generals wants to attack. The general writes “attack” on a piece of paper, places it in an envelope, and sends it off to the other general. One of two things could happen:
- the message could be intercepted by a nefarious actor and destroyed, never to reach the other general
- the message could evade interception and be delivered to the other general
The initiating general does not know which of these events occurs. Due to the uncertainty about the message being delivered, this general cannot decide to attack without the risk that the other decided to retreat. That is, they cannot attack with the certainty of consensus.
The paradox is about sending acknowledgements to acknowledgements without knowing when to stop.
Perhaps we can solve this uncertainty with acknowledgements. Suppose the cooperating general receives the message and responds with an acknowledgement. This acknowledgement could likewise be destroyed or delivered. If it is destroyed, the initiating general receives no acknowledgement and cannot make a certain decision. Nor can the cooperating general make a decision without the possibility that the initiating general makes the opposite one.
Perhaps we can solve this uncertainty with more acknowledgements. The cooperating general waits for an acknowledgement of their acknowledgement. But, after sending it, the initiating general is uncertain that it was delivered. If it wasn’t, then this general would still be uncertain of what decision to make. This cycle continues indefinitely; at no point is either general certain of what the other will decide.
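The regress can be made concrete with a small simulation. The sketch below is my own illustration, not a standard protocol: it models an alternating chain of k messages between generals A and B, under the decision rule "attack only if you received the final message you expected". Capturing the last messenger leaves the two generals in disagreement, no matter how long the chain is:

```python
def run_protocol(k, lost=None):
    """Simulate an alternating ack chain of k messages (A sends first).
    `lost` is the 1-based index of the captured messenger, or None.
    Decision rule: a general attacks only if it received the final
    message addressed to it under a fully successful run."""
    delivered = 0
    for i in range(1, k + 1):
        if i == lost:
            break            # this messenger is captured; the chain stops
        delivered = i
    # odd-numbered messages go A -> B, even-numbered go B -> A
    a_last = delivered if delivered % 2 == 0 else delivered - 1
    b_last = delivered if delivered % 2 == 1 else delivered - 1
    a_expects = k - (k % 2)          # last even-numbered message
    b_expects = k - ((k + 1) % 2)    # last odd-numbered message
    return a_last >= a_expects, b_last >= b_expects

# all messengers get through: both generals attack
print(run_protocol(4))               # (True, True)
# the final messenger is captured: the generals disagree, for every k
for k in range(1, 10):
    a_attacks, b_attacks = run_protocol(k, lost=k)
    assert a_attacks != b_attacks
```

Adding more acknowledgements only moves the problem to a later message: whichever general sends the final message cannot distinguish its delivery from its loss, so some loss pattern always breaks any such rule.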
Agreement amidst Faults
Mutual agreement, or consensus, is the paradigmatic problem in fault-tolerant distributed computing, where nodes (or processors) often compete as well as cooperate to achieve a common goal. It requires networked nodes that communicate by message passing to agree on a common value even in the presence of benign or malicious faults. For example, in distributed database systems, data managers at nodes must agree on whether to commit or to abort a transaction. Prior to the 1982 paper by Lamport, Shostak, and Pease, it was generally assumed that a three-processor system could tolerate one faulty processor. However, the work of the trio illustrated that “Byzantine” faults, in which a faulty processor sends inconsistent information to the other processors, can defeat any traditional three-processor algorithm.
In general, 3n+1 processors are needed to tolerate n Byzantine faults. However, if digital signatures are used, 2n+1 processors are enough.
In agreement problems, a very general model of processor failures is considered. A processor can fail in three modes: crash fault, omission fault, and malicious fault. In a crash fault, a processor stops functioning and never resumes operation. In an omission fault, a processor “omits” to send messages to some processors. In a malicious fault, a processor behaves randomly and arbitrarily. For example, a processor may send fictitious messages to other processors to confuse them. Malicious faults are very broad in nature and thus most other conceivable faults can be treated as malicious faults. They are also referred to as Byzantine faults.
There are three well known agreement problems in distributed computing: the Byzantine agreement problem, the consensus problem, and the interactive consistency problem. In the Byzantine agreement problem, a single value, which is to be agreed on, is initialized by an arbitrary processor and all nonfaulty processors have to agree on that value. In the consensus problem, every processor has its own initial value and all nonfaulty processors must agree on a single common value. In the interactive consistency problem, every processor has its own initial value and all nonfaulty processors must agree on a set of common values. In general, a solution to an agreement problem must pass three tests:
- termination — eventually, every correct process decides some value
- validity — if all the correct processes proposed the same value v, then any correct process must decide v
- agreement — every correct process must decide on the same value
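To see these conditions at work, here is a compact sketch of the oral-messages algorithm OM(1) from the Lamport–Shostak–Pease paper, for four processes tolerating one traitor. The traitor's behaviour below is one arbitrary choice for illustration; real Byzantine behaviour can be anything:

```python
from collections import Counter

def send(value, sender, receiver, traitors):
    """A loyal sender relays `value` faithfully; a traitor sends an
    arbitrary value (here, one depending on the receiver, so different
    receivers can be told different things)."""
    return receiver % 2 if sender in traitors else value

def om1(commander_value, traitors):
    """OM(1): commander is process 0, lieutenants are 1..3."""
    lieutenants = [1, 2, 3]
    # round 1: the commander sends its order to every lieutenant
    received = {i: send(commander_value, 0, i, traitors) for i in lieutenants}
    # round 2: each lieutenant relays what it received to the others,
    # then takes the majority of its own value and the relayed values
    decisions = {}
    for i in lieutenants:
        votes = [received[i]]
        votes += [send(received[j], j, i, traitors) for j in lieutenants if j != i]
        decisions[i] = Counter(votes).most_common(1)[0][0]
    return decisions

# traitorous lieutenant: the loyal lieutenants still obey the commander (validity)
assert all(v == 1 for k, v in om1(1, traitors={3}).items() if k != 3)
# traitorous commander: the lieutenants still decide the same value (agreement)
assert len(set(om1(1, traitors={0}).values())) == 1
```

With only three processes and one traitor, the majority vote above can be defeated, which is why the 3n+1 bound matters.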
Leader and Followers
While it does not lend itself to an easy naive solution, a pragmatic approach to reaching agreement over an unreliable network is to use schemes that accept the uncertainty of the communications channel and, rather than attempt to eliminate it, mitigate it to an acceptable degree. For example, in the Two Generals’ Problem, the first general could send 100 messengers, anticipating that the probability of all being captured is low. With this approach, the first general will attack no matter what, and the second general will attack if any message is received. Alternatively, the first general could send a stream of messages and the second general could send acknowledgements to each, with each general feeling more comfortable with every message received.
The main assumption here is to accept the uncertainty of the communication channel and mitigate it to a sufficient degree.
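The arithmetic behind this mitigation is straightforward. Assuming each messenger is captured independently with probability p, the chance that at least one of n messengers gets through is 1 − pⁿ:

```python
def chance_of_delivery(n_messengers, p_capture):
    """Probability that at least one of n independent messengers
    evades capture (each is captured with probability p_capture)."""
    return 1 - p_capture ** n_messengers

# even if 90% of messengers are captured, sending 100 of them
# all but guarantees the order arrives
print(chance_of_delivery(100, 0.9))   # ~0.99997
```

The residual risk never reaches zero, which is exactly the sense in which uncertainty is mitigated rather than eliminated.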
Another pragmatic solution to agreement problems in distributed computing is to designate one node (processor) in the cluster as the leader. The leader is responsible for taking decisions on behalf of the entire cluster and propagating them to all the other nodes — the followers. At startup, every node looks for an existing leader; if none is found, it triggers a leader election, and the nodes accept requests only after a leader is successfully elected. The Paxos consensus algorithm, published by Leslie Lamport in his 1998 paper The Part-Time Parliament¹¹ (in which he, unsurprisingly, cast the algorithm in terms of a parliament on an ancient Greek island), and variants of it such as Raft¹², are used pervasively in widely deployed distributed and cloud computing systems. These algorithms typically assume partial synchrony, depend on an elected leader to make progress, and tolerate crashes but not Byzantine failures.
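The startup behaviour described above can be sketched in a few lines. This is a toy illustration with hypothetical names; the highest-id election stands in for a real protocol such as Raft's randomised-timeout elections:

```python
class Node:
    """A cluster node that adopts an existing leader at startup,
    or triggers an election if none is found."""

    def __init__(self, node_id, cluster):
        self.id = node_id
        self.cluster = cluster      # shared registry: id -> Node
        self.leader = None
        cluster[node_id] = self

    def startup(self):
        # look for an existing leader among the peers
        for peer in self.cluster.values():
            if peer.leader is not None and peer.leader in self.cluster:
                self.leader = peer.leader
                return self.leader
        # none found: elect one (highest live id wins, a toy stand-in
        # for a real leader-election protocol)
        winner = max(self.cluster)
        for peer in self.cluster.values():
            peer.leader = winner
        return winner

cluster = {}
a, b = Node(1, cluster), Node(2, cluster)
assert a.startup() == 2     # election: highest id wins
c = Node(3, cluster)
assert c.startup() == 2     # the existing leader is adopted, not re-elected
```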
References
1. Leslie Lamport, Robert Shostak, and Marshall Pease, “The Byzantine Generals Problem”, ACM Transactions on Programming Languages and Systems 4, 3 (July 1982), 382–401. https://lamport.azurewebsites.net/pubs/byz.pdf
2. https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/
3. http://www.cs.cornell.edu/courses/cs6410/2019fa/slides/18-distributed-systems-bgp.pdf
4. https://mwhittaker.github.io/blog/two_generals_and_time_machines/
5. https://dl.acm.org/doi/pdf/10.1145/800213.806523
6. https://marknelson.us/posts/2007/07/23/byzantine.html
7. https://engineering.purdue.edu/ee695b/public-web/handouts/References/Chapt%208%20-%20Agreement%20Proto%20-%20singhal.pdf
8. https://www.cs.yale.edu/homes/aspnes/pinewiki/IndistinguishabilityProofs.html
9. https://www.cs.yale.edu/homes/aspnes/pinewiki/SynchronousAgreement.html
10. https://www.microsoft.com/en-us/research/publication/reaching-agreement-presence-faults/
11. http://lamport.azurewebsites.net/pubs/pubs.html#lamport-paxos
12. https://raft.github.io/raft.pdf
13. https://martinfowler.com/articles/patterns-of-distributed-systems/paxos.html
14. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10058758
15. https://martinfowler.com/articles/patterns-of-distributed-systems/leader-follower.html
16. https://www.history.com/news/10-things-you-may-not-know-about-the-byzantine-empire