Gossip Glomers 3 (a-c): Single-Node, Multi-Node, and Fault Tolerant Broadcast

Last time we introduced the Gossip Glomers challenge from Fly.io and discussed our approach to Challenge #2: Unique ID Generation. This time, we’ll talk about the first three parts of Challenge #3: Broadcast. Parts D and E are saved for a separate post as they’re a bit more involved. The overall theme of Challenge #3 is to build a broadcast system that propagates messages to all nodes. We build up our system iteratively: from a single-node cluster that simply stores and returns received messages, to a multi-node cluster that shares received messages, to, by Part C, a fault-tolerant multi-node cluster that keeps operating even during network partitions (Parts D and E are about efficiency)....

November 29, 2023 · 7 min · 1328 words · Lyndon Shi

Gossip Glomers: Intro and Unique ID Generation

One of the challenges of practicing implementing distributed systems is that it is not easy to simulate the various situations a distributed system might find itself in. Moreover, I previously could not even come up with an easy way to deploy a toy setup; the only thing I could think of was to use minikube and build a Kubernetes-based environment, but frankly at that point it is too much investment for me....

November 26, 2023 · 5 min · 908 words · Lyndon Shi

Understanding EWD998: Shmuel Safra's version of termination detection

I’ll soon be attending Markus Kuppe’s workshop on TLA+, and one of the pre-read materials is Dijkstra’s EWD998 - Shmuel Safra’s version of termination detection. I haven’t read a serious academic paper since college, about 3 years ago, so it was quite an adventure getting back into the swing of things. As I was reading, I spent a lot of time going back and forth to make things make sense, because the hallmark of a real paper is that you can’t just consume it in one go if you actually want to understand the material....

February 18, 2023 · 16 min · 3384 words · Lyndon Shi