Distributed Systems: Transactions, Atomic Commitment, Sagas

Wahome
12 min readApr 19, 2023
Image from https://commons.wikimedia.org/wiki/File:Social_Network_Analysis_Visualization.png

Computing systems are completely clockwork at the scale of one function where the system is composed of static events and even perturbations resolve into an outcome with a definite end such that given particular input the output is always the same, and completely complex in production where the system is an ongoing, dynamic process that is always in a state of flux with parts adapting or evolving.

“Clockwork is the best! On short timescales, small physical scales — one computer — , and without human interaction. For everything else, there’s complexity.”

In a distributed system, it is impossible to say for sure what events happened before others. We get into general relativity complications even at short distances, because information travels through networks at unpredictable speeds. Thus, there is no one such thing as time, no single sequence of events that says what happened before what. There is time-at-each-point, but inventing a convenient fiction to reconcile them is a tough row to hoe; a topic we have canvassed in an earlier post about clocks, causality, and ordering events in distributed systems. This problem is typically dealt with by funneling every event through a single point: a transactional database.

A Transaction

Image from https://algodaily.com/lessons/transactions-in-sql

A transaction is a set of related tasks that either succeed (commit) or fail (abort) as a unit, among other things. Consider the classic example of transferring money: Alice wants to transfer the sum of 100 shillings from her account to Bob’s, being payment for goods ordered online. To accomplish this transfer, the principal sum of 100 shillings needs to be deducted from Alice’s bank balance after which it is added to Bob’s balance. A guarantee must be provided that the entire two-part transfer operation — the deduction from Alice’s balance and the addition to Bob’s — either succeeds or fails atomically as a whole. If one operations succeeds while the other fails, the transfer would result in an inconsistent state.

A transaction is a unit of reliable and consistent computation.

There are four prevailing properties of transactions: Atomicity, Consistency, Isolation, and Persistence which are codified into the ACID acronym. Atomicity requires that each transaction be “all or nothing”, i.e., if one part fails, then the entire transaction fails and the database remains unchanged w.r.t the actions in the transaction. Consistency requires that the pre-state and post-state of a transaction on a database be valid. Typically this refers to having all constraints on the data structures in the database be valid. Isolation refers to the degree of serializability. Providing maximum isolation for concurrent transactions would have an equivalent post-state as running all transactions serially. Durability refers to a guarantee that once a transaction has been committed, it will remain committed even in the event of some crash or power-loss.

A Distributed Transaction

A distributed system is about spreading an arbitrary property across several actors. A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages. A network of interconnected computers, such as the ARPA net, is a distributed system. A single computer can also be viewed as a distributed system in which the central control unit, the memory units, and the input-output channels are separate processes. An application that is decomposed into loosely coupled, independent and autonomous services that collaborate and communicate with each other is a distributed system.

Microservices decentralize decisions around data persistence and storage.

Transactions that update multiple business entities are fairly common. These kinds of transactions are trivial to implement in a monolithic application because there is a single database. When a microservice architecture decomposes a monolithic system into self-encapsulated services, it can break transactions resulting in a local transaction in the monolithic system being distributed into multiple services that will be called in a sequence.

Let us consider the example of placing an order in an e-commerce system. In the monolithic system (single database) depicted below, if a user makes a Place Order request, the system creates a local database transaction that wraps the operations on the Order and Inventory tables. If any step fails, the transaction can roll back hence both the order and inventory updates are reserved:

Place Order transaction in a monolithic e-commerce system

Suppose the same e-commerce platform is decomposed into single-function, loosely coupled microservices: Order and Inventory. This decomposition has the implication of distributing the place order transaction because each microservice owns and encapsulates its state — the database per service pattern. When a Place Order request is made by a user, both microservices will be invoked to apply changes into their own database:

Place Order transaction in a microservice e-commerce system

A distributed transaction is a transaction that affects several resources. For a distributed transaction to commit, all participants must guarantee that any change to data will be permanent. Changes must persist despite system crashes or other unforeseen events. If even a single participant fails to make this guarantee, the entire transaction fails, and any changes to data within the scope of the transaction are rolled back. ACID is most commonly associated with transactions on a single database server, but distributed transactions extend that guarantee across multiple databases.

Atomic Commitment Problem

An atomic commit is an operation that applies a set of distinct changes as a single operation. If the changes are applied, then the atomic commit is said to have succeeded. If there is a failure before the atomic commit can be completed, then all of the changes completed in the atomic commit are reversed. This ensures that the system is always left in a consistent state.

The problem with atomic commits is that they require coordination and agreement between multiple systems.

Distributed transaction processing systems are designed to facilitate transactions that span heterogeneous, transaction- aware resource managers in a distributed environment. A distributed transaction will consist of a local transaction at each of the participating sites such that if any local transaction aborts, the distributed transaction aborts and if all local transactions commit, the distributed transaction commits. The execution of a distributed transaction requires coordination between a global transaction management system and all the local resource managers of all the involved systems.

Mutual agreement in a distributed system is a fundamental issue of both theoretical and practical importance.

Reaching agreement is a paradigmatic problem in fault-tolerant distributed computing where nodes or processors often compete as well as cooperate to achieve a common goal; a topic that we have also canvassed in an earlier post about consensus, elections, and fault tolerance. Consensus and non-blocking atomic commitment are two well known versions of this paradigm. The consensus problem considers a fixed collection of processors each with its own initial value, and all non-faulty processors must agree on a single common value. The non-blocking atomic commitment (NB-AC) problem arises in distributed systems to ensure the consistent termination of transactions.

Each process that participates in the processing of a database transaction arrives at an initial opinion (vote) about whether to commit the transaction or abort it. Processes must eventually reach a common decision (commit or abort). The decision to commit may be reached only if all processes initially vote to commit. In this case, “commit” must be reached if there is no failure. This is achieved by employing an atomic commit protocol (ACP) that executes a commit or an abort operation across multiple sites as a single logical operation.

Two-Phase Commit (2PC)

Unlike a transaction on a local database, a distributed transaction involves altering data on multiple, often distributed, databases. Consequently, distributed transaction processing is more complicated, because the coordination of the transaction as a self-contained unit involves many participating sites that need to guarantee atomicity. To ensure that a distributed transaction satisfies ACID properties, an atomic commit protocol should be used. The requirements of the atomic commitment problem are similar to those of the consensus problem and are summarized below:

  • termination — eventually, every correct process decides some value
  • agreement — if all the correct processes proposed the same value v, then any correct process must decide v
  • validity — every correct process must agree on the same value.

In a distributed system, a transaction is inevitably decomposed into a set of sub-transactions, each of which executes at a single participating site. Assuming that each database site preserves atomicity of (sub)transactions at its local level, global atomicity cannot be guaranteed without taking additional measures. This is because without global synchronization a distributed transaction might end-up committing at some participating sites and aborting at others due to a site or a communication failure. Thus, jeopardizing global atomicity and, consequently, the consistency of the (distributed) database.

To ensure transaction failure atomicity in a distributed system, an agreement problem must be solved among a set of participating processes.

Two-phase commit (2PC) is a synchronization protocol that solves the atomic commitment problem, a special case of the Byzantine Generals problem. The essence of two phase commit, unsurprisingly, is that it carries out an update in two phases:

  • the first, prepare, the initiating node in the transaction asks the other participating nodes to promise to commit or roll back the transaction.
  • the second, commit, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes are asked to roll back.

When a participant receives a prepare-to-commit message, it validates the transaction with respect to data consistency. If the transaction can be committed (i.e., it passed the validation process), the participant responds with a “yes” vote. Otherwise, it responds with a “no” vote and aborts the transaction, releasing all the resources held by the transaction. If a participant had voted “yes”, it can neither commit nor abort the transaction unilaterally and has to wait until it receives a final decision from the coordinator. In this case, the participant is said to be blocked for an indefinite period of time called window of uncertainty (or window of vulnerability) awaiting the coordinator’s decision.

The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. The coordinator is a single point of failure.

When a participant receives the final decision, it complies with the decision, sends back an acknowledgement message, ack, to the coordinator and releases all the resources held by the transaction. When the coordinator receives ack’s from all the participants, it forgets the transaction by discarding all information pertaining to the transaction from its protocol table that is kept in main memory. If the coordinator fails permanently, some participants will never resolve their transactions: after a participant has sent an agreement message to the coordinator, it will block until a commit or rollback is received. Two phase commit does not scale well. O(n²) messages are required in worst case and throughput is limited by the slowest node in the cluster.

A Saga

Long running transactions hold on to database resources for relatively long periods of time, significantly delaying the termination of shorter and more common transactions. A long-running transaction is an interactive component of a distributed system which must be executed as if it were a single atomic action. It is a transaction whose execution, even without interference from other transactions, takes a substantial amount of time, possibly on the order of hours or days. A long lived transaction, has a long duration compared to the majority of other transactions either because it accesses many database objects, it has lengthy computations, it pauses for inputs from the users, or a combination of these factors.

Examples are transactions to produce monthly account statements at a bank, transactions to process claims at an insurance company, and transactions to collect statistics over an entire database.

To alleviate these problems, Garcia-Molina and Salem in their 1987 paper⁹, proposed the notion of a saga as a way to ensure data consistency in a distributed architecture without having a single ACID transaction. A long lived transaction is a saga if it can be written as a sequence of sub transactions T₁, T₂, T₃….Tₙ, each independently running in its own context, that can be interleaved with other transactions. Sub-transaction should not have a dependency on one another and they may or may not be ACID compliant.

A sequence of two transactions is not a transaction.

A process P can be defined as a sequence of steps a, b, …, where each step is a transaction. This sequence of steps a, b, …, is called a trace in which the sequential composition of transactions is not closed under atomicity.

trace(P) = a • b • …

If a process consists of a single transaction, the process exhibits the atomicity guarantees of a transaction:

A process P with one step a

If a process consists of multiple transactions, it does not exhibit the atomicity guarantees of a transaction. Thus, the process may execute partially:

A process P with two steps a and b

But, any partial executions of a process are undesirable because its transactions are related and should be executed as a unit. The saga pattern provides transaction management using a sequence of local transactions. A local transaction is the atomic work effort performed by a saga participant. Every operation that is part of the saga can be rolled back by a compensating action. The pattern guarantees that either all operations complete successfully or the corresponding compensation actions are run for all executed operations to roll back any work previously done.

A transaction and a compensating action

To mitigate against partial execution, a process may implement either backward or forward recovery. Backward recovery refers to failure mitigation strategies that transition the system from the intermediate state to a state that is equivalent to the initial state; rolling the process backward. In this paradigm, compensating transactions try to restore the sub-transaction by nullifying it. So, basically compensating transactions rollback the effect of the sub transactions. Forward recovery refers to failure mitigation strategies that transition the system from the intermediate state to a state that is equivalent to the final state; moving the process forward. There are no compensating transactions rather the saga attempts retries by restarting the sub-transaction from the beginning. In a forward recovery saga, each transaction has to be idempotent.

Backward and forward recovery

In saga patterns:

  • compensable transactions are transactions that can potentially be reversed by processing another transaction with the opposite effect.
  • a pivot transaction is the go/no-go point in a saga. If the pivot transaction commits, the saga runs until completion. A pivot transaction can be a transaction that is neither compensable nor retryable, or it can be the last compensable transaction or the first retryable transaction in the saga.
  • retryable transactions are transactions that follow the pivot transaction and are guaranteed to succeed.

There are two common saga implementation approaches, orchestration and choreography. Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants what local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events. The orchestrator executes saga requests, stores and interprets the states of each task, and handles failure recovery with compensating transactions. Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control. With choreography, each local transaction publishes domain events that trigger local transactions in other services.

--

--