One of our main goals has always been to offer the lowest latency possible to our users. The only option is to distribute the service across multiple machines in multiple regions.

To make this easier, we are building the agent in a way that’s infinitely horizontally scalable and deploy it behind a global load balancer. Each machine is identical and capable of serving any incoming request. It has access to the databases and will cache the data locally in memory or on disk.

Why clustering?

If every machine can do everything, why do they need to communicate with each other?

For some operations, such as ratelimiting, we require eventual consistency across the entire fleet.

Machine A and machine B might receive ratelimit requests for the same identifier and if we do not coordinate the ratelimiting between them, we might end up exceeding the set limits by an unacceptable amount.

Starting a cluster

To start a cluster, you need to run the agent with a cluster configuration.

When starting, each node will try to join the cluster by contacting all of the other nodes.

Membership

We use serf for membership, failure detection and gossip communication.

When a node joins the cluster, it will broadcast its presence to all other nodes and serf takes care of the rest.

Consistent Hashing

After membership has been established and each node in the cluster always knows about all other nodes, we can start sharding data across the cluster.

We use consistent hashing to shard data deterministically across all nodes in the cluster.