For thousands of years, civilisations believed that the agents of spatiotemporal events were either deities (far superior to themselves), human beings, or animals. The problem of truly understanding cause and effect became apparent when civilisations started building machines and implements. At that point, when something did not work within known expectations, holding gods, human beings, or animals responsible for it became a rather outrageous proposition.
The past is the past. The present precedes the future. Cause comes before effect. Except when it doesn’t.
If you came into a room and observed a ball rolling across the floor, you would naturally look around to see who kicked it. The ball didn't start moving of its own accord; it has no ability to move without an initial cause, an initial mover. Breakfast in your house is a causal affair. The kettle boils because you have switched it on. The toast acquires its golden crust because you put it in the toaster. The butter is available at the table because you removed it from the fridge and carried it there. For all the weirdness that the universe throws at us, these are simple truths that we can take for granted.
Cause and Effect
Things influence other things. That's a basic feature of any dynamic world where things change, and existence would be very dull if it weren't the case (not that we'd exist to know about it). Causality is a genetic connection of phenomena through which one thing (the cause), under certain conditions, gives rise to something else (the effect). Its essence is the generation and determination of one phenomenon by another. In this respect, causality differs from various other kinds of connection, for example the simple temporal sequence of phenomena, or the regularities of accompanying processes. Causality is an active relationship which brings to life something new and turns possibility into actuality. A cause is an active and primary thing in relation to the effect.
But “after this” does not always mean “because of this”. It would be a parody of justice if we were to say that where there is punishment there must have been a crime.
In the classical world we live in, causality comes with a few basic assumptions. The first big rule of classical causality is that things have causes. They don’t just happen of their own accord. If a ball moves, the likelihood is someone kicked it; if an apple falls from a tree, it’s because its weight became too great for the branch it was hanging from. Second, effects follow causes in a predictable, linear manner. You swing your leg, make contact with the ball, and off it moves, in that order and no other. Third, big effects grow up from little causes. A piston, for example, starts to move when a lot of individual hot atoms hit against it and push it a certain way. The laws of thermodynamics, which govern the way atoms move, then provide certain rules about what causes can precipitate what effects, and so an overall direction for causality — a flow of time.
The concepts of "cause" and "effect" are used for defining simultaneous events, events that are contiguous in time, and events whose effect is born together with the cause.
The connection between cause and effect takes place in time. This temporal relation may be defined in various ways. Some people believe that cause always precedes effect: there is a certain interval between the time when the cause begins to act (for example, the interaction of two systems) and the time the effect appears. For a certain time cause and effect coexist; then the cause dies out and the consequence ultimately becomes the cause of something else, and so on to infinity. Other thinkers believe that these intervals partially overlap. It is also maintained that cause and effect are always strictly simultaneous. Still others maintain that it is pointless to speak of a cause already existing and therefore taking effect while the effect has not yet entered the sphere of existence: how can there be a "non-effective cause"?
The Concept of Time
The concept of time is fundamental to our way of thinking. It is derived from the more basic concept of the order in which events occur and is used to describe the progression of events. It is often thought of as a continuous sequence of events, with the past preceding the present and the future following it. In physics, time is often considered to be a dimension, similar to space, and is closely connected to the concept of causality.
We say that something happened at 3:15 if it occurred after our clock read 3:15 and before it read 3:16.
The nature of time and its relationship to space and matter is a topic of ongoing debate in physics and philosophy, with different theories and perspectives offering different explanations. Some philosophers and physicists argue that time is an emergent property that arises from the physical interactions of matter and energy, while others believe that it is a fundamental aspect of the universe that cannot be reduced to physical processes. The concept of time also raises important questions about causality, the nature of change, and the direction of time’s arrow.
The concept of the temporal ordering of events pervades our thinking about systems.
Time is in the first place an order relation between events, allowing you to specify whether an event a came before or after an event b. For example, in an airline reservation system we specify that a request for a reservation should be granted if it is made before the flight is filled. However, when considering events in a distributed system, this concept must be carefully re-examined.
Relations and Order Theory
Shoppers in a grocery store are served at the cashiers on a first-come-first-served basis. When there are many people at the cashiers, queues form. People in these queues are ordered for service: those at the head of a queue are served sooner than those at the end. Cars waiting for the signal to change at an intersection are ordered similarly. Natural numbers can be ordered by increasing magnitude. These examples represent a few instances of order in our daily lives. The properties common to the orders we see in daily life can be extracted and used to characterise the concept of order; order relations are an abstraction of those relations.
A relation is a set of tuples, all of which have the same arity.
There is a relation between two things if there is some connection between them. Relations are a structure on a set that pairs any two objects that satisfy certain properties. Examples of familiar relations are: 7 is greater than 5, Alice is married to Bob, and Paul is younger than Peter. In each of these statements, elements of a set are related by a predicate. The inclusion of a tuple in a relation indicates that the components of the tuple are related. A set of n-tuples is called an n-ary relation. Sets of pairs are called binary relations, sets of triples are called ternary relations, etc.
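The idea that a relation is just a set of tuples can be made concrete in a few lines. This is an illustrative sketch (the set and pairs are made up, not from the text): the familiar "greater than" relation on a small set, represented literally as a set of ordered pairs.

```python
# A binary relation represented literally as a set of ordered pairs:
# "greater than" on the small set S (values chosen for illustration).
S = {5, 7, 9}
greater_than = {(x, y) for x in S for y in S if x > y}

# Membership of a pair in the relation means its components are related.
print((7, 5) in greater_than)  # True: 7 is greater than 5
print((5, 7) in greater_than)  # False: 5 is not greater than 7
```

The same encoding extends to ternary and higher-arity relations by using longer tuples.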
A relation ℜ is a predicate with (at least) two arguments of arbitrary types X and Y.
Order theory is a branch of mathematics that investigates the intuitive notion of order using binary relations. Given a set S, a binary relation on the set is some subset of the cartesian product S×S. When this relation provides us with a method of comparison between the objects in the set, we call it an order relation, order, or ordering. In simpler terms, if a relation provides us with a method to rank elements of a set against one another, it is called an order relation. In a group of people, the age of each person provides us with an order relation, where ranking is done by the relation "older than" or "younger than".
Partial and Total orders
Consider a set of 2D rectangles of different sizes; let's assume for the sake of simplicity that no two rectangles in this set have exactly the same size. We'll say that rectangle X fits inside rectangle Y if we could physically enclose X inside Y; in other words, if Y's dimensions are larger than X's. In this case "fits" is a partial order on the set of rectangles, because even though we can order some of the rectangles relative to each other, other pairs have no relative order among themselves. If all pairs in the set had a relative ordering, we could define a total order on the set. An example of this is a set of 2D squares (rather than rectangles) in which every square has a unique size: we can always define a total order on them, because for any pair of squares either the first fits in the second, or vice versa.
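The rectangle example above can be sketched directly. The sizes below are made up for illustration; `fits` encodes "both dimensions strictly smaller":

```python
from itertools import combinations

# "Fits inside": rectangle (w1, h1) fits in (w2, h2) iff both
# dimensions are strictly smaller. Sizes are illustrative.
def fits(a, b):
    return a[0] < b[0] and a[1] < b[1]

rects = [(1, 5), (2, 3), (4, 6), (3, 2)]

# A partial order: some pairs are simply incomparable.
incomparable = [(a, b) for a, b in combinations(rects, 2)
                if not fits(a, b) and not fits(b, a)]
print(incomparable)  # e.g. (1, 5) vs (2, 3): neither fits in the other

# Squares of unique sizes, by contrast, are totally ordered by "fits":
# every pair is comparable one way or the other.
squares = [(2, 2), (5, 5), (3, 3)]
assert all(fits(a, b) or fits(b, a) for a, b in combinations(squares, 2))
```

The assertion at the end is exactly the completeness condition that upgrades a partial order to a total order.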
The basic unit of analysis in order theory is the binary relation.
A binary relation R on a set S is a set of ordered pairs, which we interpret as saying that, if (x, y) ∈ R, then x is in relation to y, whereas if (x, y) is not in R then x is not in relation to y. Any binary relation R can be divided into its symmetric part (xIy if and only if xRy and yRx) and its asymmetric part (xPy if and only if xRy but not yRx). There are a number of useful properties relations could have:
- reflexive: xRx ∀ x ∈ S
- symmetric: xRy implies yRx
- antisymmetric: xRyRx implies that x = y
- asymmetric: if xRy then not yRx
- transitive: xRyRz implies xRz
- complete: xRy or yRx ∀ x, y ∈ S
- acyclic: x₁Rx₂R…Rxₙ implies that x₁ ≠ xₙ
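With a relation encoded as a set of pairs, these properties become one-line checks. A small sketch (the helper names are my own) verifying that "less than or equal" on a tiny set has the three properties that will define a partial order below:

```python
# Property checks for a relation R given as a set of ordered pairs.
def is_reflexive(R, S):
    return all((x, x) in R for x in S)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    # xRy and yRz must imply xRz.
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

S = {1, 2, 3}
leq = {(x, y) for x in S for y in S if x <= y}  # "less than or equal"

print(is_reflexive(leq, S), is_antisymmetric(leq), is_transitive(leq))
# All three hold, so <= is a partial order on S.
```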
We can use these properties to define classes of preference relations: equivalence, pre-order or preferential order, partial order, and linear or total order relations.
A relation on a nonempty set S is called a partial ordering or a partial-order relation if it is reflexive, antisymmetric, and transitive. The mathematical symbol ⪯ is often used to denote a partial ordering, and (S,⪯) is called a partially ordered set or a poset. The usual "less than or equal to" relation on ℝ, denoted ≤, is a perfect example of a partial ordering. Obviously, if 𝑎⪯𝑏 but 𝑎≠𝑏, then we can write 𝑎≺𝑏. We sometimes say 𝑎 precedes 𝑏, or 𝑏 succeeds 𝑎. We also say 𝑎 is the predecessor of 𝑏, or 𝑏 is the successor of 𝑎.
The definition of a poset does not require every pair of distinct elements to be comparable. If a partial ordering has the additional property that for any two distinct elements 𝑎 and 𝑏, either 𝑎≺𝑏 or 𝑏≺𝑎 (hence, any pair of distinct elements are comparable), we call the relation a total ordering. In simple terms, if a partial order is complete then it is a linear or total order. A totally ordered set is called a chain.
Partial and total orders frequently come up in programming, especially when thinking about sorts. Sorting an array usually implies finding some total order on its elements. Tie breaking is important, but not always possible. If there is no way to tell two elements apart, we cannot mathematically come up with a total order, but we can still sort (the comparison still gives us a weak ordering). This is where the distinction between regular and stable sorts comes in.
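The stable-sort case can be seen directly in Python, whose built-in sort is documented as stable: when the sort key cannot tell two elements apart, their original relative order is preserved. The records below are made up for illustration:

```python
# Python's sort is stable: records with equal keys keep their original
# relative order, so sorting by a key that only weakly orders the
# elements still produces a deterministic result.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

by_score = sorted(records, key=lambda r: r[1])
print(by_score)
# [('alice', 1), ('dave', 1), ('carol', 2), ('bob', 2)]
# carol still precedes bob: the sort never reorders tied elements.
```

A regular (unstable) sort would be free to swap carol and bob, since the key alone cannot distinguish them.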
The Happened-Before relation
Distributed programs are more difficult to design and to implement than their sequential counterparts. This is due to the additional problems the programmer is faced with, such as parallelism, nondeterminism, and lack of precise global time or global state information. The increased complexity of distributed programs becomes painfully apparent when such software is tested and debugged.
A system is distributed if the message transmission delay is not negligible compared to the time between events in a single process.
A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages. A network of interconnected computers, such as the ARPANET, is a distributed system. A single computer can also be viewed as a distributed system in which the central control unit, the memory units, and the input-output channels are separate processes. A process is a sequence of totally ordered events, i.e., for any events a and b in a process, either a comes before b or b comes before a.
The natural state in a distributed system is partial order.
In a system that is distributed, neither the network nor independent nodes make any guarantees about relative order; but at each node, you can observe a local order. The events that happen in a distributed system can be classified into three fundamental categories:
- local events that happen at a node and change its state
- send events that represent a node sending a message to another node to inform about a change
- receive events that represent a node receiving a message from another node about a change
The messages exchanged between nodes to propagate information can also carry time information between nodes. The happened-before relation is a partial ordering of events in distributed systems such that:
- if events a and b are two events happening at the same node, the relation a → b holds if the occurrence of event a preceded the occurrence of event b. This is easy to determine for a single node.
- if event a is the event of a node corresponding to sending a message and event b is the event of a different node corresponding to the receipt of the same message, then a → b.
- for three events a, b, and c, if a → b and b → c, then a → c.
If two events a and b are not related by the → relation, then they are executed concurrently, denoted by a || b, thus have no causal relationship.
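The three rules above amount to reachability in a graph whose edges are "next event on the same node" and "send to receive". A small sketch (event encoding and names are my own, not from Lamport's paper):

```python
# Events are (node, local_seq) pairs. Edges encode rule 1 (consecutive
# events on one node) and rule 2 (send -> receive across nodes); rule 3
# (transitivity) falls out of graph reachability.
edges = {
    ("A", 1): [("A", 2)],
    ("A", 2): [("B", 2)],   # A's second event sends a message B receives
    ("B", 1): [("B", 2)],
    ("B", 2): [],
}

def happened_before(a, b):
    # a -> b iff b is reachable from a (the transitive closure).
    frontier, seen = [a], set()
    while frontier:
        e = frontier.pop()
        for nxt in edges.get(e, []):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(happened_before(("A", 1), ("B", 2)))  # True, via the message
print(happened_before(("B", 1), ("A", 2)))  # False: concurrent, a || b
```

Neither B1 → A2 nor A2 → B1 holds, so those two events are concurrent in exactly the sense defined above.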
Logical Clocks
When values are stored across multiple servers, it is necessary to reliably establish which values were stored before others. Clocks keep time and produce timestamps. Conventional clocks (such as time-of-day clocks) use a common reference to learn the time; that reference could be internal hardware or a common service that serves time using protocols like NTP. The system timestamp cannot be used to define ordering because wall clocks are not monotonic, and clock values from two different servers should not be compared.
As there is no upper bound on clock drift across servers, it is impossible to compare timestamps on two different servers.
The system timestamp, which represents the time of day, is measured by clock machinery generally built with a crystal oscillator. The known problem with this mechanism is that it can drift away from the actual time of day, based on how fast or slow the crystals oscillate. To fix this, computers typically run a service like NTP which synchronizes computer clocks with well-known time sources on the internet. Because of this synchronization, two consecutive readings of the system time on a given server can show time going backwards.
When an event a in a system happens before another event b, it might have a causal relationship. Causal relationship means that a might have some role in causing b. This 'a happens before b' relationship is established by attaching a timestamp to each event. If a happens before b, the timestamp attached to a will be lower than the timestamp attached to b. But because we cannot rely on system time, we need some way to make sure that the happens-before relationship is maintained for the timestamps attached to the events.
Simple counters that assign numbers to events.
To solve this problem elegantly, Leslie Lamport, in his seminal paper "Time, Clocks, and the Ordering of Events in a Distributed System", suggested using logical clocks, a solution now known as Lamport timestamps. Every process Pi has its own clock function Ci that assigns a number Ci(a) to any event a in that process. We assume these clocks aren't tied to the physical concept of time, so all it takes to make this system of clocks correct is a few rules:
- if a and b are events in Pi and a comes before b, then Ci(a) < Ci(b)
- if a is the sending of a message by Pi and b is the receipt of that message by Pj, then Ci(a) < Cj(b)
These two rules are essentially establishing that if a → b (a happened before b), then C(a) < C(b); thus a’s timestamp is lower than b’s. This is called the Clock Condition which can be coded into an algorithm for a system of clocks that ensures you can always tell the order of two events in the system:
- a process Pi advances its clock Ci between successive events
- every message must contain the timestamp Tm of its send event; when a process receives the message, it advances its clock to be greater than Tm
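The two rules above fit in a few lines of code. A minimal sketch of a Lamport clock (the class and method names are my own):

```python
# A minimal Lamport clock: a per-process counter advanced by the two
# rules of the algorithm.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: advance the clock between successive local events.
        self.time += 1
        return self.time

    def send(self):
        # A send is itself an event; its timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Rule 2: jump past the message's timestamp Tm, then count the
        # receive itself as an event.
        self.time = max(self.time, msg_time)
        return self.tick()

p1, p2 = LamportClock(), LamportClock()
a = p1.tick()      # a local event on P1: timestamp 1
m = p1.send()      # P1 sends a message: timestamp 2
b = p2.receive(m)  # P2 receives it: max(0, 2) + 1 = 3
print(a, m, b)     # 1 2 3: the Clock Condition C(a) < C(b) holds
```

Note that without the `max` in `receive`, P2's clock could lag behind the message and violate the Clock Condition.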
When two events can’t influence each other through messages they are considered concurrent despite their ordering in physical time. We can now use these timestamps to order all events in the system using an arbitrary ordering when timestamps are ambiguous. It’s important to note that while the partial ordering from before is unique, the total ordering depends on clock choice and the arbitrariness of ordering events where no causal link exists.
A limitation of the Lamport clock (or scalar logical clock) is the following: x → y implies C(x) < C(y), but C(x) < C(y) doesn't imply x → y. Also, to capture causality fully, we need a way of representing local time and global time separately. This is where more advanced implementations of logical clocks, such as the vector clock, come in.
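To see how a vector clock removes that limitation, here is a small sketch (my own naming, under the usual formulation where each node keeps one counter per node): timestamps become vectors, and x → y holds exactly when x's vector is componentwise less than or equal to y's and they differ.

```python
# A small vector clock sketch: each node keeps one counter per node,
# so comparing whole vectors recovers the happened-before relation.
class VectorClock:
    def __init__(self, node, n):
        self.node = node
        self.v = [0] * n

    def tick(self):
        self.v[self.node] += 1
        return list(self.v)

    def receive(self, other):
        # Merge: componentwise max, then count the receive as an event.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        return self.tick()

def before(x, y):
    # x -> y iff x <= y componentwise and x != y.
    return all(a <= b for a, b in zip(x, y)) and x != y

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
a = p0.tick()        # [1, 0]
b = p1.tick()        # [0, 1]
c = p1.receive(a)    # [1, 2]
print(before(a, c))  # True: a happened before c
print(before(a, b), before(b, a))  # False False: a and b are concurrent
```

Unlike the scalar clock, incomparable vectors (such as a and b here) identify concurrent events directly, so C(x) < C(y) now holds if and only if x → y.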
References
- https://lamport.azurewebsites.net/pubs/time-clocks.pdf
- https://www.ics.uci.edu/~cs230/reading/time.pdf
- http://www.columbia.edu/~md3405/DT_Order_15.pdf
- https://martinfowler.com/articles/patterns-of-distributed-systems/lamport-clock.html#PartialOrder
- https://swizec.com/blog/week-7-time-clocks-and-ordering-of-events-in-a-distributed-system/
- https://www.exhypothesi.com/clocks-and-causality/
- https://www.marxists.org/reference/archive/spirkin/works/dialectical-materialism/ch02-s06.html
- https://eli.thegreenplace.net/2018/partial-and-total-orders/
- https://book.mixu.net/distsys/time.html
- https://www.cs.odu.edu/~toida/nerzic/content/relation/order/order.html
- https://brilliant.org/wiki/relations/