Distributed systems

Gossip Glomers 3 (a-c): Single-Node, Multi-Node, and Fault Tolerant Broadcast

Last time we introduced the Gossip Glomers challenge from Fly.io and discussed our approach to Challenge #2: Unique ID Generation. This time, we’ll talk about the first three parts of Challenge #3: Broadcast. Parts D and E are saved for a separate post as they’re a bit more involved. The overall theme of Challenge 3 is to build a broacast system to propagate messages to all nodes1. We iteratively build up our system, from a single-node cluster that simply stores and returns received messages, to a multi-node cluster that shares received messages, to a fault-tolerant multi-node cluster that can operate even during network partitions by Part C (D and E are about efficiency)....

Gossip Glomers: Intro and Unique ID Generation

One of the challenges for practicing implementing distributed systems is that it is not easy to simulate the various situations a distributed system might find itself in. Moreover, I previously could not even come up with an easy way to deploy a toy setup; the only thing I could think of is to use minikube and build a Kubernetes-based environment, but frankly at that point it is too much investment for me1....

Understanding EWD998: Shmuel Safra's version of termination detection

I’ll soon be attending Markus Kuppe’s workshop on TLA+ and one of the pre-read materials is Dijkstra’s EWD998 - Shmuel Safra’s version of termination detection. I haven’t read a serious, academic paper since college like 3 years ago1, so it was quite an adventure getting back into the swing of things. As I was reading, I spent a lot of time going back and forth to make things make sense, because the hallmark of a real paper is that you can’t just consume it in one go if you actually want to understand the material....