# Overview
Consistent hashing is a [[Hashing]] technique used in distributed systems (i.e., systems where data resides on many machines, potentially spread across different geographic locations) to evenly distribute data across the machines. Compared to [[Traditional Hashing]], it also aims to minimize how much data must be re-distributed when a machine fails or a new machine joins the system.
To do so, consistent hashing places all possible values produced by the hash onto a continuous range that wraps around, which can be visualized as a ring. Each node in the system is assigned a position on this ring (commonly by hashing an identifier for the node). Each piece of data is hashed as well, and its hash value determines where it sits on the ring. The data is then assigned to the first node reached when moving clockwise from that position.
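
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `HashRing` class, the `ring_position` helper, the choice of MD5, and the 32-bit ring size are all assumptions made for the example, and "clockwise" is modeled as increasing position with wrap-around.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on a 32-bit ring (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical ring with one position per node."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._points = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._points]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("the ring has no nodes")
        # First node at or after the key's position, i.e. the next one clockwise.
        index = bisect.bisect_left(self._positions, ring_position(key))
        if index == len(self._points):
            index = 0  # Past the last node: wrap around to the start of the ring.
        return self._points[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

The same key always maps to the same node as long as the set of nodes is unchanged, which is what makes the assignment predictable.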
This approach minimizes re-distribution because only the segment of the ring affected by a new or removed node needs to be re-distributed: the data in that segment moves to its new closest node in the clockwise direction, and all other data stays where it is.
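
To make the difference concrete, the sketch below counts how many keys change owner when a fourth node joins, once with simple modulo placement (used here as a stand-in for the traditional approach) and once with a ring. The helper names, node names, and the use of MD5 are assumptions for illustration. With one position per node, the exact share of keys the ring moves depends on where the new node happens to land, so no specific figure is claimed here.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash of a string (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def modulo_owner(key: str, nodes: list) -> str:
    # Modulo placement: the owner depends on the total number of nodes.
    return nodes[h(key) % len(nodes)]


def ring_owner(key: str, nodes: list) -> str:
    # Ring placement: first node at or after the key's position, wrapping around.
    points = sorted((h(node), node) for node in nodes)
    index = bisect.bisect_left(points, (h(key), ""))
    return points[index % len(points)][1]


def moved_fraction(owner, keys, before, after) -> float:
    """Fraction of keys whose owner differs between two node lists."""
    moved = sum(owner(key, before) != owner(key, after) for key in keys)
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
before = ["node-a", "node-b", "node-c"]
after = before + ["node-d"]

modulo_moved = moved_fraction(modulo_owner, keys, before, after)
ring_moved = moved_fraction(ring_owner, keys, before, after)
print(f"modulo: {modulo_moved:.1%} of keys moved, ring: {ring_moved:.1%} of keys moved")
```

Under modulo placement, going from three to four nodes reassigns roughly three quarters of the keys, and they move between all nodes. On the ring, every key that moves lands on the new node; keys outside its segment are untouched.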
## Diagram
![[Consistent Hashing 2024-10-03 11.45.07.excalidraw.svg]]
# Key Considerations of Consistent Hashing #flashcard
Consistent hashing needs orchestration logic around it to handle scaling events. In practice, this usually means:
1. Signaling the beginning of a scaling event and recording both the old and new server assignments.
2. Slowly disconnecting clients from the old server and having them reconnect to their newly assigned server.
3. Signaling the end of the scaling event and updating the coordination service with the new server assignments.
4. In the interim, sending messages to both the old and new servers until clients have fully transitioned.
<!--ID: 1751507776506-->
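
The steps above can be sketched as a small in-memory router. Everything here is hypothetical: the `ScalingRouter` class, its method names, and the idea of keeping assignments in local fields are assumptions for illustration, and a real deployment would store this state in a coordination service rather than in one process.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash of a string (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def ring_owner(key: str, nodes: list) -> str:
    # First node at or after the key's ring position, wrapping around.
    points = sorted((h(node), node) for node in nodes)
    index = bisect.bisect_left(points, (h(key), ""))
    return points[index % len(points)][1]


class ScalingRouter:
    """Hypothetical router that tracks old and new assignments during scaling."""

    def __init__(self, nodes):
        self.current = list(nodes)
        self.pending = None  # Set only while a scaling event is in progress.

    def begin_scaling(self, new_nodes):
        # Step 1: record the new assignments alongside the old ones.
        self.pending = list(new_nodes)

    def targets(self, key: str) -> list:
        # Step 4: while scaling, send to both servers if the assignment changed.
        old = ring_owner(key, self.current)
        if self.pending is None:
            return [old]
        new = ring_owner(key, self.pending)
        return [old] if old == new else [old, new]

    def end_scaling(self):
        # Step 3: the new assignments become the only assignments.
        if self.pending is None:
            raise RuntimeError("no scaling event in progress")
        self.current, self.pending = self.pending, None


router = ScalingRouter(["node-a", "node-b", "node-c"])
router.begin_scaling(["node-a", "node-b", "node-c", "node-d"])
during = router.targets("session:123")  # one or two servers
router.end_scaling()
```

Step 2 (gradually moving clients) is left out because it depends on the connection protocol; the sketch only covers the bookkeeping and the dual delivery.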
# Pros of Consistent Hashing #flashcard
- Predictable server assignment
- Minimal connection disruption during scaling
- No central coordination needed for routing
- Works well with stateful connections
- Easy to add/remove servers
<!--ID: 1751507776509-->
# Cons of Consistent Hashing #flashcard
- Does not by itself guarantee a uniform distribution across the nodes. This still depends on the hashing algorithm and node placement.
- Complex to implement correctly
- Requires a coordination service (like [[Apache ZooKeeper]])
- All servers need to maintain routing information
- Connection state is lost if a server fails
<!--ID: 1751507776511-->
# Use Cases for Consistent Hashing #flashcard
- [[Redis]] Cluster (strictly, it shards keys into 16384 fixed hash slots, a related scheme, rather than a hash ring)
- [[Cassandra]]
- [[DynamoDB]]
- [[Content Delivery Network (CDN)]]
- [[Design a Distributed Database]]
- [[Design a Distributed Cache]]
- [[Design a Distributed Message Broker]]
<!--ID: 1751507776513-->
# Related Topics