Sharding Data for Fun & Profit

02:30 PM - 03:25 PM on August 15, 2015, Room 701

Wes Chow

Audience level:
advanced
Watch:
http://youtu.be/dl0JusHHIIQ

Description

The hash function is the veritable hammer for pounding a large array of engineering problems into submission. Want to shard your database? Draw a key from your data, hash it, and voilà, instant deterministic load balancing! That’s simple enough, until you look more carefully at distributional effects, failure, and redundancy management. We’ll review well-known (consistent hashing), not-so-well-known (rendezvous hashing), and recent (shuffle sharding, copysets) work that goes a long way towards engineering more favorable failure scenarios.
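As a rough illustration of why "hash it and voilà" gets complicated, here is a minimal Python sketch (not the speaker's code) comparing naive hash-mod-N sharding with rendezvous hashing and a consistent-hashing ring. The node names, the SHA-256-based helper `h`, and the 100 virtual nodes per server are illustrative assumptions. It measures the fraction of keys that change shards when an eleventh node joins a ten-node cluster.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process for str)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def mod_shard(key: str, nodes: list) -> str:
    """Naive sharding: hash the key and take it modulo the node count."""
    return nodes[h(key) % len(nodes)]


def rendezvous_shard(key: str, nodes: list) -> str:
    """Rendezvous (highest-random-weight) hashing: the node with the largest h(node, key) wins."""
    return max(nodes, key=lambda node: h(f"{node}:{key}"))


class ConsistentHashRing:
    """Consistent hashing: each node owns several points ("virtual nodes") on a ring."""

    def __init__(self, nodes: list, vnodes: int = 100):
        self._points = sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def shard(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose shard differs between two assignment functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    old = [f"node{i}" for i in range(10)]
    new = old + ["node10"]
    old_ring, new_ring = ConsistentHashRing(old), ConsistentHashRing(new)
    print("mod N:     ", moved_fraction(keys, lambda k: mod_shard(k, old), lambda k: mod_shard(k, new)))
    print("rendezvous:", moved_fraction(keys, lambda k: rendezvous_shard(k, old), lambda k: rendezvous_shard(k, new)))
    print("consistent:", moved_fraction(keys, old_ring.shard, new_ring.shard))
```

With mod-N, roughly 10/11 of the keys move, because a key stays put only when its hash leaves the same remainder mod 10 and mod 11. Rendezvous hashing and the ring move only about 1/11 of the keys, and every moved key lands on the new node. Rendezvous hashing needs no ring state but costs one hash per node on each lookup; the ring trades memory for a binary search over its points.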

Abstract

Sharding is our tool for load balancing and data distribution, and it guides our replication efforts. But there's more to it than is immediately apparent: the way we handle key distribution can have subtle effects on reliability and performance.
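To make those reliability effects concrete, here is a hedged sketch of the trade-off that copysets target. It assumes 30 nodes, 3 replicas per chunk, and a failure of exactly 3 nodes chosen uniformly at random. `copyset_placement` is a deliberately simplified, hypothetical partition scheme for illustration, not the algorithm from the Copysets work.

```python
import math
import random


def random_placement(num_chunks, nodes, r, rng):
    """Each chunk's r replicas go to an independently chosen random set of r nodes."""
    return [frozenset(rng.sample(nodes, r)) for _ in range(num_chunks)]


def copyset_placement(num_chunks, nodes, r, rng):
    """Simplified copyset-style placement (hypothetical): split nodes into fixed groups
    of r and keep every replica of a chunk inside one group.
    Assumes len(nodes) is a multiple of r."""
    groups = [frozenset(nodes[i:i + r]) for i in range(0, len(nodes), r)]
    return [rng.choice(groups) for _ in range(num_chunks)]


def loss_probability(placement, nodes, r):
    """P(some chunk loses all replicas | a uniformly random set of r nodes fails).
    A chunk is lost exactly when the failed set equals its replica set, so this is
    the number of distinct replica sets in use divided by C(len(nodes), r)."""
    return len(set(placement)) / math.comb(len(nodes), r)


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = [f"node{i}" for i in range(30)]
    for name, place in [("random", random_placement), ("copyset", copyset_placement)]:
        placement = place(10_000, nodes, 3, rng)
        print(f"{name:8s} P(data loss | 3 nodes fail) = {loss_probability(placement, nodes, 3):.4f}")
```

Under random placement, nearly every 3-node combination ends up holding all replicas of some chunk, so almost any triple failure loses data; the partitioned scheme loses data only if the failed triple is one of its 10 groups (10 of 4060 combinations). The price is that such a loss, when it does happen, takes roughly a tenth of all chunks at once, and a failed node's data can be re-replicated from only two peers.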