Let's depict a situation where we have partitioned our dataset across multiple nodes running on multiple machines…
There is a question here… When a client wants to make a request, how does it know which node to connect to? 🤔
- As partitions are rebalanced, the assignment of partitions to nodes changes.
- Somebody needs to stay on top of those changes, in order to answer the question…
- If I want to read or write the key “sig1”
- Which IP/port number do I need to connect to?!
This is an instance of a more general problem called… Service Discovery.
- Which is not limited to just databases
- Any piece of software that is available over a network has this problem
- Especially, if it is aiming for high availability
- Running a redundant configuration on multiple machines
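To make the "which IP/port do I connect to for key sig1?" question concrete, here is a minimal sketch. The partition count, the IP addresses, the port and the use of an MD5 hash are all assumptions made up for illustration, not taken from any particular database.

```python
import hashlib
from typing import Tuple

# Hypothetical cluster layout: partition number -> (host, port).
# In a real system this mapping changes whenever partitions are rebalanced.
PARTITION_TO_NODE = {
    0: ("10.0.0.1", 9042),
    1: ("10.0.0.2", 9042),
    2: ("10.0.0.3", 9042),
    3: ("10.0.0.1", 9042),
}


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (MD5, purely for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def node_for(key: str) -> Tuple[str, int]:
    """Answer the question: which IP/port do I connect to for this key?"""
    return PARTITION_TO_NODE[partition_for(key, len(PARTITION_TO_NODE))]


host, port = node_for("sig1")
print(f"key 'sig1' lives on {host}:{port}")
```

The lookup itself is trivial. The hard part is keeping `PARTITION_TO_NODE` correct everywhere as the cluster changes, which is what the rest of this post is about.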
Service Discovery
Many companies have written their own in-house Service Discovery tools… And many of these have been released as open source!
At a high level there are a few different approaches to this problem:
- Allow clients to contact any node
- For example, via a round-robin load balancer
- If that node coincidentally owns the partition to which the request applies, it can handle the request directly.
- Otherwise it forwards the request to the appropriate node
- Receives the reply and passes the reply along to the client
- Send all requests from clients to a routing tier first
- Which determines the node that should handle each request
- And forwards it accordingly
- This routing tier does not itself handle any requests
- It only acts as a partition aware load balancer
- Require that clients are aware of the partitioning and the assignment of partitions to nodes
- In this case the client can connect to the appropriate node without any intermediary
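The three approaches above can be sketched side by side. Everything here is a toy under stated assumptions: the node names, the `ASSIGNMENT` table and the byte-sum partitioning function are hypothetical, and "forwarding" is just a method call instead of a network hop.

```python
from itertools import cycle

NUM_PARTITIONS = 4
# Hypothetical assignment of partitions to node names.
ASSIGNMENT = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a"}


def partition_of(key):
    # Toy partitioning function: sum of the key's bytes modulo partition count.
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared dict: node name -> Node

    def handle(self, key):
        owner = ASSIGNMENT[partition_of(key)]
        if owner == self.name:
            return f"{self.name} handled {key!r}"
        # Approach 1: not the owner, so forward the request and relay the reply.
        return self.cluster[owner].handle(key)


cluster = {}
for name in ("node-a", "node-b", "node-c"):
    cluster[name] = Node(name, cluster)

# Approach 1: a round-robin load balancer hands the request to any node.
round_robin = cycle(cluster.values())


def via_any_node(key):
    return next(round_robin).handle(key)


# Approach 2: a routing tier looks up the owner and forwards; it stores no data.
def via_routing_tier(key):
    return cluster[ASSIGNMENT[partition_of(key)]].handle(key)


# Approach 3: a partition-aware client keeps its own copy of the assignment.
class PartitionAwareClient:
    def __init__(self, assignment):
        self.assignment = dict(assignment)

    def request(self, key):
        return cluster[self.assignment[partition_of(key)]].handle(key)


print(via_any_node("sig1"))
print(via_routing_tier("sig1"))
print(PartitionAwareClient(ASSIGNMENT).request("sig1"))
```

All three calls end up at the same owner. What differs is which component has to know the assignment table: every node, the routing tier, or the client.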
In all cases the key problem is: how does the component making the routing decision learn about changes in the assignment of partitions to nodes?
- This component may be any of the below…
- One of the nodes
- The routing tier
- Or the client
This is a challenging problem!
- Because it is important that all participants agree
- Otherwise requests will be sent to the wrong nodes 😔
- And handled incorrectly 👎
There are protocols for achieving consensus in a distributed system. 💡
- ⚠️ Although consensus protocols are hard to implement correctly 😵💫
- 👉 Many distributed data systems rely on a separate co-ordination service
Co-ordination Service (ZooKeeper)
ZooKeeper is a co-ordination service that tracks cluster metadata.
- Each node registers itself in ZooKeeper
- And ZooKeeper maintains the authoritative mapping of partitions to nodes
- Other actors such as the routing tier, or partition aware clients can subscribe to this information in ZooKeeper
- Whenever a partition changes ownership, or a node is added or removed
- ZooKeeper notifies the routing tier so that it can keep its routing information up to date
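The subscribe-and-notify flow described above can be sketched with a small in-memory stand-in. To be clear, `CoordinationService` and `RoutingTier` are hypothetical classes written for this post and are not the ZooKeeper API. A real deployment would use a ZooKeeper client library and its watch mechanism.

```python
class CoordinationService:
    """In-memory stand-in for a co-ordination service (NOT the ZooKeeper API)."""

    def __init__(self):
        self._assignment = {}  # partition -> node name (the authoritative mapping)
        self._watchers = []    # callbacks to notify on every change

    def subscribe(self, callback):
        self._watchers.append(callback)
        callback(dict(self._assignment))  # deliver the current state immediately

    def assign(self, partition, node):
        self._assignment[partition] = node
        for notify in self._watchers:
            notify(dict(self._assignment))


class RoutingTier:
    def __init__(self, coordinator):
        self.table = {}
        coordinator.subscribe(self._on_change)

    def _on_change(self, assignment):
        # Replace the local routing table with the latest snapshot.
        self.table = assignment

    def route(self, partition):
        return self.table[partition]


coordinator = CoordinationService()
coordinator.assign(0, "node-a")
router = RoutingTier(coordinator)
before = router.route(0)
coordinator.assign(0, "node-b")  # partition 0 changes ownership
after = router.route(0)
print(before, "->", after)
```

The routing tier never polls. It only reacts to notifications, so its table is as fresh as the last notification it received.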
For example:
LinkedIn’s Espresso uses Helix for cluster management (which in turn relies on ZooKeeper), implementing a routing tier!
Other notable mentions that use ZooKeeper to track partition assignment:
Alternatives to ZooKeeper
- MongoDB – has a similar architecture but it relies on its own config server implementation and mongos daemons as the routing tier
- Cassandra and Riak take a different approach… they use a Gossip Protocol among the nodes
- To disseminate any changes in cluster state
- Requests can be sent to any node
- And that node forwards them to the appropriate node for the requested partition
This is approach 1 of the 3 approaches to the request routing problem highlighted earlier in this post.
- This model puts more complexity on the database nodes
- But avoids the dependency on an external coordination service such as ZooKeeper
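As a rough illustration of how gossip spreads cluster state without a central co-ordinator, here is a toy push-pull sketch. The version counter, the `GossipNode` class and the random peer selection are simplifying assumptions. The real protocols in Cassandra and Riak are considerably more involved.

```python
import random


class GossipNode:
    def __init__(self, name):
        self.name = name
        self.version = 0      # higher version = newer view of the cluster
        self.assignment = {}  # partition -> node name

    def update(self, version, assignment):
        # Only adopt state that is newer than what we already hold.
        if version > self.version:
            self.version = version
            self.assignment = dict(assignment)

    def gossip_with(self, peer):
        # Push-pull exchange: whichever side is behind catches up.
        peer.update(self.version, self.assignment)
        self.update(peer.version, peer.assignment)


def gossip_round(nodes, rng):
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)


nodes = [GossipNode(f"node-{i}") for i in range(5)]
# Only node-0 learns about the new assignment at first.
nodes[0].update(1, {0: "node-0", 1: "node-3"})

rng = random.Random(7)
rounds = 0
while any(n.version != 1 for n in nodes) and rounds < 1000:
    gossip_round(nodes, rng)
    rounds += 1

print(f"all nodes agree after {rounds} round(s)")
```

Once every node holds the same assignment, any of them can accept a request and forward it to the right owner.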
Couchbase does not rebalance automatically, which simplifies the design…
- Normally it is configured with a routing tier
- This learns about routing changes from the cluster nodes
Clients still need to find the IP address to connect to
When using a routing tier, or when sending requests to a random node, clients still need to find the IP addresses to connect to!
- These are not as fast changing as the assignment of partitions to nodes 👎
- So it is often sufficient to use DNS for this purpose 🤗
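For the DNS step, a minimal sketch using Python's standard `socket.getaddrinfo` might look like this. The hostname `localhost` and port `8080` are placeholders. In practice you would pass the DNS name of your routing tier or database cluster.

```python
import socket


def resolve(hostname, port):
    """Return the distinct IP addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the IP address for both IPv4 and IPv6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


print(resolve("localhost", 8080))
```

A client can try each returned address in turn, which gives a simple form of failover when one entry is down.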








