Consistent Hashing and More

May 16, 2026

Consistent Hashing, the Karger Paper, and Modern Distributed Systems — Q&A Notes

Original Paper

Consistent Hashing and Random Trees (Karger et al., 1997)
Link: https://www.cs.princeton.edu/courses/archive/fall09/cos518/papers/chash.pdf

Q: Origin server will always be a bottleneck if distinct documents are asked for, isn’t it?

Yes.

Caching only helps when there is reuse.

If requests are mostly unique:

cache hit rate -> very low

then every request eventually reaches the origin.

Caching optimizes repetition, not uniqueness.

Q: How does consistent hashing solve this?

It does not solve uniqueness itself.

It solves:

how to distribute ownership/routing cleanly across many nodes

Instead of hierarchical cache trees:

Client -> Local -> Regional -> Root

consistent hashing distributes requests horizontally:

key -> hash -> responsible node

This removes centralized bottlenecks.

Q: What about the Plaxton/Rajaraman algorithm?

Plaxton/Rajaraman introduced distributed routing through a namespace instead of a strict hierarchy.

Requests progressively move closer to the object’s “home.”

Along the path:

intermediate caches can store copies
future requests terminate earlier
hotspots diffuse naturally

This influenced:

Pastry
Tapestry
DHTs
CDN routing ideas

Q: What are spread and load in consistent hashing?

Spread

Spread measures:

how many different nodes different clients may map the SAME key to

under inconsistent membership views.

Load

Load measures:

how unevenly objects/requests distribute across nodes

Low load means fewer hotspots.

Q: What is the client/server in this context?

Client

The machine/library making routing decisions.

Server

The storage/cache nodes.

Example in Cassandra:

Application + driver = client
Cassandra nodes = servers

Q: How does Aerospike differ from Cassandra?

Aerospike uses:

key -> fixed partition
partition -> node

instead of classic ring-style consistent hashing.

Partitions are fixed (typically 4096).

Ownership metadata is explicit.

Q: Why did industry move away from pure hash rings?

Because operational issues emerged:

vnode tuning
token imbalance
repair complexity
hotspot management
difficult debugging

Modern systems often prefer:

partition maps
explicit ownership tables
metadata coordinators

Q: Does the client look up multiple nodes?

Usually no.

Normally:

key -> deterministic node

Replication may add fallback nodes, but routing is deterministic.

Q: What if the chosen node doesn’t have the key?

Possible reasons:

cache miss
stale topology
migration in progress
replication lag

Systems may:

retry another replica
refresh metadata
redirect requests

Q: What happens if client metadata is stale during rebalance?

Example:

x moved from C3 -> C4

but client still thinks:

x -> C3

Possible behaviors:

redirect
forwarding
metadata refresh
temporary overlapping ownership

Q: How does replication complicate things?

Now the question becomes:

Which replicas own x right now?

instead of:

Where does x live?

This introduces:

repair
quorum logic
reconciliation
ownership transitions

Q: How does Cassandra solve this?

Cassandra chooses:

availability > temporary consistency

Uses:

gossip
hinted handoff
anti-entropy repair
overlapping ownership

Q: How does Aerospike rebalance?

Aerospike migrates at:

partition granularity

not per-key granularity.

Only partition ownership metadata changes.

Q: Can Aerospike get hot partitions?

Yes.

Hashing balances:

keys

not necessarily:

traffic
bytes
CPU
IO

Q: Can partitions become unevenly large?

Yes.

Hashing balances approximately:

number of keys

not record sizes.

Large objects can create storage skew.

Q: Is the ring even mentioned in the Karger paper?

Not really.

The paper is more abstract and probabilistic.

The modern “hash ring” became popular later as an implementation and visualization strategy.

Q: Why did the paper use multiple hash functions?

The hash family mainly supported:

randomized placement
probabilistic guarantees
spread/load analysis
overlap under inconsistent views

The final mapping still resolved to:

one key -> one selected bucket

Q: If we want low spread, why not use a single hash?

A single hash works well only if:

all clients see identical membership

Under inconsistent views:

small topology differences can cause large mapping divergence

Hash families create probabilistic overlap between client views.

Q: Are the hash functions tuned dynamically?

Usually no.

They are generally:

fixed
deterministic
pseudo-random
globally agreed upon

The system relies on probabilistic guarantees.

Q: How does rebalance work when a new node is added?

Only nearby ownership changes.

Example:

(100,300] moves from C2 -> C4

All other keys remain untouched.

Q: What happens when a node goes down?

Its ownership range is inherited by the next node clockwise.

Example:

C2 dies
C3 inherits C2's interval

Q: If there is no replication and a node dies, is there data loss?

Yes.

Consistent hashing solves:

routing
placement
smooth rebalance

It does NOT solve:

durability
fault tolerance
availability

Without replication:

node death = permanent data loss

Q: What are virtual nodes (vnodes)?

Instead of:

1 physical node = 1 position

use:

1 physical node = many virtual positions

This smooths distribution and reduces imbalance.

Deepest Takeaway

The real core idea is:

small topology changes should cause small ownership changes.

The ring is only one implementation strategy.