One of the challenges of practicing distributed systems implementation is that it is not easy to simulate the various situations a distributed system might find itself in. Moreover, I previously could not even come up with an easy way to deploy a toy setup; the only thing I could think of was to use minikube and build a Kubernetes-based environment, which frankly is too much of an investment for me [1].

Fortunately, I recently came across a series of distributed systems challenges created by fly.io and Kyle Kingsbury (author of Jepsen): Gossip Glomers. The challenges use Maelstrom, a framework for running and testing toy implementations of distributed systems, so that you only have to implement the individual nodes and not worry about anything else. Even better, Maelstrom provides a Go library containing the boilerplate for creating Maelstrom Nodes, leaving you to focus on the fun stuff [2]!

With this being so accessible, I guess I’ll be working through the Gossip Glomers as I get time. When I complete a challenge, I’ll also write a post about my thought process [3]!


Repository Notes

My code can be found on GitHub at lynshi/gossip-glomers. Implementations for each challenge can be found under internal/. Finally, I’ve set up GitHub Actions to run tests for each challenge.


Challenge 2: Unique ID Generation

Challenge 2 is Unique ID Generation. The system must generate globally unique IDs and remain totally available, meaning it must keep serving requests even during network partitions; in particular, partitions must not lead to violations of the uniqueness constraint.

In order to be resilient to partitions, we cannot rely on node-to-node communication to synchronize the list of used IDs. One way of ensuring that nodes do not generate conflicting IDs is to split the range of possible IDs across nodes so that none of the ranges overlap.

While the obvious approach is to split the pool of possible IDs in 3 [4], an easier-to-implement (and subjectively more elegant) method is to have a unique prefix for each node. The prefix can be derived from the node ID (nodes are numbered n1, n2, …). Then, when asked for an ID, each node returns a locally unique ID prefixed with its node ID. For example, the first ID generated by node 1 might look like 1-0.


Upon rereading the prompt, I realized that IDs may be of any type, so the format <node ID>-<locally incrementing index> suffices to solve the problem. However, I completely missed this as I was working on it, so I came up with a scheme to create integer IDs.
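If you did take the string route, the whole scheme boils down to one fmt.Sprintf call. Here is a minimal sketch (not code from my repo; makeStringID is a hypothetical helper and assumes the standard fmt package):

// Hypothetical helper: build an ID like "1-0" from the node's number and a
// locally incrementing index. IDs can never collide across nodes because no
// two nodes share a prefix, and a node never reuses an index.
func makeStringID(nodeNumber int, localIndex uint32) string {
  return fmt.Sprintf("%d-%d", nodeNumber, localIndex)
}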

Since the test sends 1000 requests per second for 30 seconds, the system is expected to generate 30,000 unique IDs. This means that the set of ID numbers generated by each node must include the range [0, 30000); this fits in a 15-bit number (2^15 = 32768). So, to generate a globally unique ID from a locally unique one, we left-shift the node ID by 15 bits to obtain the prefix, then add the locally unique ID. Overall, our ID generator generates 17-bit IDs [5].
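As a quick sanity check of the arithmetic (illustration only, assuming the standard fmt package):

// Worked example: node number 2, sixth locally generated ID.
nodeID := uint32(2)  // node prefix
localID := uint32(5) // locally unique counter value
id := (nodeID << 15) + localID
fmt.Println(id) // nodeID<<15 = 65536, so this prints 65541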


Implementation

The naive way to create locally unique IDs is to keep a counter that starts at 0. Every time a generate message is received, we return the current counter value and increment the counter.

There is a small issue — presumably, a node can receive concurrent requests. There needs to be a synchronization primitive for the counter to ensure that each counter value is only read once. Normally we would use a lock to protect access, but we are using Go so let’s do better!
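For reference, the conventional lock-based version might look something like the sketch below (an illustration, not the code in my repo; it assumes the standard sync package):

// Mutex-protected counter: each call returns a distinct value because the
// read and the increment happen together under the lock.
type idCounter struct {
  mu   sync.Mutex
  next uint32
}

func (c *idCounter) nextID() uint32 {
  c.mu.Lock()
  defer c.mu.Unlock()
  id := c.next
  c.next++
  return id
}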

We’ll use a channel to provide unique IDs. The channel is initialized with a capacity of 1 so that we generate IDs on-demand, but you could set a higher capacity to have more IDs ready at a time, or even prepopulate with all possible IDs up front [6]. We use uint32 for the counter to avoid type conversions, as the final ID requires 17 bits.

counter := make(chan uint32, 1)

Next, we’ll start a goroutine to add a new, unique ID whenever the channel is emptied.

i := uint32(0)
go func() {
  for {
    select {
    case <-ctx.Done():
      // Shut down cleanly when the node's context is cancelled.
      close(counter)
      return
    case counter <- i:
      // The send succeeds only when there is room in the buffer, and i is
      // incremented only after a successful send, so each value is handed
      // out exactly once.
      i++
    }
  }
}()

With a channel providing unique IDs now available, the unique_ids handler only needs to read a value from the channel to respond to a request. Recall the need to prefix the read ID with the node ID, which we do by shifting the node ID left 15 bits.

// n.ID() looks like "n1"; strip the leading "n" to get the node's number.
node_id, err := strconv.Atoi(n.ID()[1:])
if err != nil {
  return errors.Wrapf(err, "could not get integer from %s", n.ID())
}

// Take the next locally unique ID from the channel.
id, ok := <-counter
if !ok {
  return errors.New("counter channel closed")
}

// Shift the node number into the high bits to form the globally unique prefix.
node_id = node_id << 15

// ...
resp["id"] = uint32(node_id) + id

For those curious, errors.Wrapf comes from github.com/pkg/errors, which provides convenient utilities for wrapping errors so that context and stack traces bubble up with them.
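For completeness, here is a rough sketch of how these pieces might slot into a generate handler registered with the Maelstrom Go library (my actual code is organized a bit differently; this assumes the standard encoding/json package, the maelstrom-go Node n, and the counter channel from above):

n.Handle("generate", func(msg maelstrom.Msg) error {
  var resp map[string]any
  if err := json.Unmarshal(msg.Body, &resp); err != nil {
    return errors.Wrap(err, "could not unmarshal message body")
  }

  node_id, err := strconv.Atoi(n.ID()[1:])
  if err != nil {
    return errors.Wrapf(err, "could not get integer from %s", n.ID())
  }

  id, ok := <-counter
  if !ok {
    return errors.New("counter channel closed")
  }

  resp["type"] = "generate_ok"
  resp["id"] = uint32(node_id<<15) + id
  return n.Reply(msg, resp)
})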

That’s all there is to it! The full code can be found at internal/unique_ids/unique_ids.go.


  1. We haven’t even gotten to how to create and run proper tests yet! ↩︎

  2. Plus the added benefit of getting to use Go 🤣 ↩︎

  3. The first challenge is just a “Hello, World!” for getting you set up, so I’ll skip it. ↩︎

  4. In this challenge, there are only three nodes. ↩︎

  5. The first two bits will be used for the unique prefix (00, 01, and 10). ↩︎

  6. Be careful not to create a situation where the channel is unbuffered and you continuously add IDs without an upper limit on the ID value! ↩︎