Distributed Systems: A Primer

Wahome
Mar 13, 2023 · 6 min read


Image from https://www.abiprod.com/soccer-tiki-taka-revolution/

Around the late 1970s, Pele — no less than a Brazilian god and the king of football — popularised the expression “o jogo bonito”, a Portuguese phrase that translates to “the beautiful game” in English. The exact origin of the phrase is disputed but attributions have been made to Pele’s teammate Waldyr “Didi” Pereira. The presenter Stuart Hall claimed to have originated it in 1958 when he went to see Manchester City play at Maine Road and used the term “the beautiful game” to describe Peter Doherty’s style of play.

“Beauty comes first. Victory is secondary. What matters is joy.” — Doctor Sócrates

Notwithstanding the originator of the phrase, it is the words of another Brazilian legend, Doctor Sócrates — a medical doctor and the iconic 1982 captain of Brazil’s greatest national team — that express the underlying philosophy of Brazilian football: “Joga Bonito”. It’s a philosophy of football, or even a way of life, that revels in jubilation and enjoyment for those playing and those watching. The emphasis is on playing stylish, attacking football where there’s no dearth of creativity and innovation. Flick it, flip it, dance with your feet, dribble past your opponents; you have full license for liberation.

But what makes football the beautiful game? Is it the unpredictability? You never know what will happen in a game of football: a red card, an own goal, or a wonder goal from 40 yards out. It keeps the teams, players, and fans constantly on their toes. Is it that the eleven players on each side try to collectively execute certain plays or tactics, yet no player can read the other’s mind? Is it that while the team as a unit stays important, there is opportunity for individuals to fearlessly and flamboyantly show their brilliance?

And what does “o jogo bonito” have to do with distributed systems? 🤔

An analogy

Image from https://circleci.com/blog/distributed-systems/

Football is a team sport played between two teams of eleven players each who, using any part of their bodies except their hands and arms, try to manoeuvre the ball into the opposing team’s goal. Only the goalkeeper is permitted to handle the ball, and only within the penalty area surrounding the goal; the objective of the game is to outscore the opposing team. Football is a fast-paced and physically demanding sport that requires players to possess a range of skills, including speed, agility, strength, and coordination. It is also a game that requires strategy, teamwork, and communication, making it both challenging and exciting for players and spectators alike.

One may even refer to FC Barcelona’s tiki taka as a finely tuned micro-services deployment.

The players try to execute certain plays, strategies, or tactics, but no player can read the other’s mind. There is no shared memory (state) among players, and the timing of the players can also be off. Each player has only a limited, incomplete view of the system — such as an evolving attack. Yet, the global objective for the team is well-defined: defend your goal post, and eventually score a goal. The game of football is, quite remarkably, analogous to distributed systems in which a collection of autonomous nodes communicate to collectively perform a particular task, with no shared memory and no common physical clock. It may be possible to stretch the analogy by relating team tactics to distributed algorithms, injuries to crash faults, team captains to distributed systems’ leaders, etc.
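To make the analogy concrete, here is a minimal sketch (illustrative only; the player count and the “pass the ball” protocol are invented for this example) of autonomous workers that share no memory and coordinate purely by passing messages, each acting on its own limited view of the play:

```python
# Minimal sketch: autonomous "player" nodes with no shared state, coordinating
# only by message passing. Player count and pass threshold are made up.
import threading
import queue

NUM_PLAYERS = 3
PASSES_BEFORE_SHOT = 5

# One inbox per player; putting a message in an inbox is the only way to interact.
inboxes = [queue.Queue() for _ in range(NUM_PLAYERS)]

def player(node_id: int) -> None:
    while True:
        msg = inboxes[node_id].get()          # block until the "ball" (a message) arrives
        if msg == "stop":
            return
        passes = msg                          # each player only knows the pass count it was handed
        if passes >= PASSES_BEFORE_SHOT:
            print(f"player {node_id} shoots after {passes} passes: goal!")
            for inbox in inboxes:             # tell every player the play is over
                inbox.put("stop")
        else:
            target = (node_id + 1) % NUM_PLAYERS
            print(f"player {node_id} passes to player {target}")
            inboxes[target].put(passes + 1)

threads = [threading.Thread(target=player, args=(i,)) for i in range(NUM_PLAYERS)]
for t in threads:
    t.start()
inboxes[0].put(0)                             # kick-off: give player 0 the ball
for t in threads:
    t.join()
```

Running it prints a chain of passes that ends in a “goal”: the global objective is met even though no single player ever sees the whole play.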

Single point of fragility

Imagine all the different, complex, and (quite often now) physically distant components and services that must communicate with each other just to ensure that Google Maps takes you to Kilimanjaro, a famed restaurant along Nairobi’s Kimathi Street, and not Kilimanjaro, the highest mountain on the African continent and the world’s highest free-standing mountain. The field of computing did not achieve this level of efficiency and timeliness overnight. Instead, a series of need-inspired advancements got us to where we are today. But why go through all this trouble? Why not use one single supercomputer that can do everything and save ourselves from what feels like intractable complexity?

“You can have a second computer once you’ve shown you know how to use the first one.” – Paul Barham

There are a lot of good reasons not to build distributed systems. Complexity is one: distributed systems are legitimately harder to build, and significantly harder to understand and operate. Efficiency is another. As McSherry et al. argue in Scalability! But at what COST?, contrary to the common wisdom that effective scaling is evidence of solid systems building, any system can scale arbitrarily well with a sufficient lack of care in its implementation. Modern computers are huge and fast, so single-system designs can have great performance and efficiency.

The availability of a monolithic system is limited to the availability of the piece of hardware it runs on.

Modern hardware is pretty great, and combined with a good datacenter and good management practices, servers can be expected to fail with an Annual Failure Rate (AFR) in the single-digit percentages. But while this is reasonably good, today’s typically large and centralised monolithic systems are characterised either by being single points of failure or by having severe bottlenecks under load. There’s nothing inherently wrong with big monolithic or even centralised systems. If you have one and you’re not experiencing any of these issues, there’s absolutely no reason to change the approach. When the issues do arise, however, overweight monoliths exhibit two classes of problems: degrading system performance and stability, and slow development cycles. So, whatever we do comes from the desire to escape these technical, and consequently social, challenges.
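As a rough back-of-envelope illustration (the 4% figure is hypothetical, and the replicas are assumed to fail independently, which real hardware rarely does perfectly), here is how redundancy changes the picture once availability is no longer tied to a single machine:

```python
# Back-of-envelope sketch with hypothetical numbers: if each server is
# independently unavailable with probability p, a single-server system is down
# with probability p, while a system that only needs one of n replicas alive
# is down with probability p ** n.
def probability_all_down(p: float, replicas: int) -> float:
    return p ** replicas

p = 0.04  # assumed 4% chance that a given server is unavailable (illustrative)
for n in (1, 2, 3):
    print(f"{n} replica(s): P(everything down) = {probability_all_down(p, n):.6f}")
# 1 replica(s): P(everything down) = 0.040000
# 2 replica(s): P(everything down) = 0.001600
# 3 replica(s): P(everything down) = 0.000064
```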

Decentralised != Distributed

Centralised, decentralised, distributed illustrated. GIF from https://twitter.com/danheld

Systems can be very small, interconnecting only a few devices and a handful of users. Or they can be immense and span countries and continents. Either way, they face the same challenges: fault tolerance, maintenance costs, and scalability. While all these systems can function effectively, some are more stable and secure than others by design.

The terms centralised and decentralised refer to levels of control. Where does the control lie?

A centralised computing system is one in which all computing is performed by a single computer in one location. Centralised systems follow the traditional client-server architecture: a single, central server is responsible for storing and processing all the information and for making it available to other users, known as client nodes, which connect directly to the main server and submit requests rather than performing the work themselves. Centralised systems may have helped build the internet, but they have important disadvantages.
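A toy sketch of that shape (the class and operations are invented purely for illustration, not a real service): every client request funnels through one server object that holds all of the state.

```python
# Toy sketch of the centralised client-server shape: one server object holds all
# state and does all processing; clients only submit requests to it.
class CentralServer:
    def __init__(self) -> None:
        self.data = {}                      # every piece of state lives here, in one place

    def handle(self, request: tuple):
        op, key, *rest = request
        if op == "put":
            self.data[key] = rest[0]
            return "ok"
        if op == "get":
            return self.data.get(key)
        return "unknown operation"

server = CentralServer()

# Client nodes do no processing of their own; they just send requests to the server.
print(server.handle(("put", "ball", "midfield")))   # -> ok
print(server.handle(("get", "ball")))               # -> midfield
```

Simple to reason about, but if that one object, or the machine it lives on, goes away, every client is stuck.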

Problems begin to arise once the single point of fragility actually starts failing under heavy load, at which point having a large attack surface can translate to a perpetual state of emergency. For example, an outage in non-critical data processing brings down your entire service. You move all time-intensive tasks to one huge group of background workers, and keeping them stable gradually becomes a full-time job for a small team. Changing one part of the system unexpectedly affects other parts even though they’re logically unrelated. That’s what decentralised and distributed systems try to address.

Decentralised systems incur cost and complexity because they continuously avoid getting into this state.

In systems theory, a decentralised system is one in which lower-level components operate on local information to accomplish global goals. The global pattern of behaviour is an emergent property of dynamical mechanisms that act upon local components, such as indirect communication, rather than the result of the central ordering influence of a centralised system. A decentralised system distributes the workload among several nodes without a single central node to manage, coordinate, or govern the system. Each of these nodes acts autonomously and makes its own decisions. The final behaviour of the system is the aggregate of the decisions of the individual nodes. Depending on how they are designed, decentralised systems can have a few benefits compared to centralised systems — greater resilience, horizontal scalability, lower hardware costs, etc.
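A small sketch of that idea (gossip-style averaging is a stand-in example chosen for illustration, not something the article prescribes): each node repeatedly averages its private value with one randomly chosen peer, using only local information, yet the whole system drifts towards the global average.

```python
# Sketch of decentralised, local-only coordination (gossip-style averaging,
# chosen here as an illustration): each node repeatedly averages its private
# value with one randomly chosen peer. No node sees the whole system, yet the
# values converge towards the global average, an emergent system-wide result.
import random

random.seed(42)
values = [10.0, 0.0, 6.0, 4.0]        # each node's private, local state

for _ in range(100):
    i, j = random.sample(range(len(values)), 2)   # two peers interact
    avg = (values[i] + values[j]) / 2             # a purely local exchange
    values[i] = values[j] = avg

print(values)   # all values end up close to 5.0, the global average, with no central coordinator
```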

Distribution refers to differences of location.

The terms “decentralised” and “distributed” sound extremely similar, are often used interchangeably, and might be assumed to have the same meaning, but their difference, albeit subtle, has a significant technological impact. A decentralised system is about spreading control across several actors. A distributed system is about spreading an arbitrary property across several actors. Distributed systems follow a peer-to-peer architecture. All parts of the system are located in different physical locations. The processing is spread across multiple nodes, but decision making can be centralised or decentralised. The various nodes communicate and coordinate by passing messages.
