Programming + Design

Distributing requests across multiple hostnames using consistent hashing
Written by Brett Brewer   
Sunday, 06 February 2011

If you've ever delved much into web site performance optimization, you're probably familiar with Google's PageSpeed and Yahoo's YSlow Firefox plugins and the typical list of suggestions they provide for improving page load times. One of the more common and least implemented (or most often wrongly implemented) suggestions is to split requests for resources across multiple hostnames. The idea is that since the early days of browsers there has been a limit on the number of resources that can be downloaded in parallel per hostname. The early browsers had a limit of 2 parallel downloads per hostname, and although newer browsers have raised that limit (typically to around 6), the point is that if you split your requests across a few different hostnames, your browser can download more resources in parallel without blocking the loading of other elements.

Unfortunately, there are very few good explanations of how to actually achieve this without destroying most of the inherent benefits of browser caching. You might be tempted just to set up some new CNAMEs for the new hostnames in your DNS and then write a simple function that randomly chooses a hostname to serve each of your static resources from. This is actually a bad idea: the browser caches by full URL, and you have no guarantee that a resource will be served from the same URL on subsequent requests, so you will lose the benefits of browser caching.

So then you might think, okay, I'll just use a static variable in a function to serve content sequentially in a round-robin fashion, so each resource is served from the next host in the list and you just cycle through them repeatedly. As long as your pages never change and your resources are always in the exact same sequence on your page, this would work fine. But if you add an element somewhere, you'll throw off the sequence for every element after it and you'll end up busting your cache again. So what is an aspiring site optimization wizard to do?
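Before getting to the answer, the round-robin problem described above can be made concrete with a small sketch. The hostnames and filenames here are made up purely for illustration, and `round_robin()` is a hypothetical helper, not part of any library.

```php
<?php
// Hypothetical hostnames, used only for illustration.
$hosts = ['static1.example.com', 'static2.example.com', 'static3.example.com'];

// Assign hosts in page order, cycling through the list (round-robin).
function round_robin(array $files, array $hosts): array
{
    $map = [];
    foreach (array_values($files) as $i => $file) {
        $map[$file] = $hosts[$i % count($hosts)];
    }
    return $map;
}

// The page as it looks today.
$before = round_robin(['logo.png', 'hero.jpg', 'footer.png'], $hosts);

// The same page after one new image is added near the top.
$after = round_robin(['logo.png', 'banner.jpg', 'hero.jpg', 'footer.png'], $hosts);

// Collect the files whose hostname (and therefore URL) changed.
$moved = [];
foreach ($before as $file => $host) {
    if ($after[$file] !== $host) {
        $moved[] = $file;
    }
}

print_r($moved); // every file after the insertion point has moved
```

Only one image was added, yet every image after it is now requested from a different URL, so the browser's cached copies are no longer used.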
The answer turns out to be quite simple - use something called "consistent hashing". Consistent hashing is the same technique used with popular backend technologies such as Memcached to determine which server in a pool of multiple Memcached servers to pull a particular resource from. Basically, you create an algorithm that hashes your filenames in such a way that they will always map to a particular server. This can get a little complicated when used for actual caching, where you may want to have a file mapped to multiple servers for failover purposes, but for something as simple as spreading requests across multiple hostnames, all you really need is an algorithm that consistently maps a particular filename to a single hostname. Fortunately for all of us PHP developers, there's a neat little class called Flexihash that is suited for both simple and more complex uses of consistent hashing.
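To give a feel for what such an algorithm does, here is a minimal sketch of a hash ring. This is not Flexihash's code, just a simplified illustration of the general idea; the function names `build_ring()` and `ring_lookup()`, the `#` separator, and the hostnames are all assumptions made for the example.

```php
<?php
// Place each host at several positions on a circle of CRC32 values.
function build_ring(array $hosts, int $replicas = 64): array
{
    $ring = [];
    foreach ($hosts as $host) {
        for ($i = 0; $i < $replicas; $i++) {
            $ring[crc32($host . '#' . $i)] = $host;
        }
    }
    ksort($ring); // order the positions so we can walk the circle
    return $ring;
}

// Map a filename to the first host position at or after its own hash.
function ring_lookup(array $ring, string $filename): string
{
    if (empty($ring)) {
        throw new InvalidArgumentException('The ring has no hosts.');
    }
    $hash = crc32($filename);
    foreach ($ring as $position => $host) {
        if ($position >= $hash) {
            return $host;
        }
    }
    return reset($ring); // past the last position: wrap around to the first
}

// Hypothetical hostnames, used only for illustration.
$hosts = ['static1.example.com', 'static2.example.com',
          'static3.example.com', 'static4.example.com'];
$ring = build_ring($hosts);

// The same filename maps to the same host on every request.
echo ring_lookup($ring, 'somefilename.jpg'), "\n";
echo ring_lookup($ring, 'somefilename.jpg'), "\n";
```

The useful property is that the mapping depends only on the filename and the list of hosts, not on page order or request order. If a host is later removed from the list, only the files that mapped to that host move elsewhere; the rest keep their URLs.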

But enough of the useless background info, let's see how this would work in real life. For the sake of argument, let's say you're running a fairly big ecommerce site and you're already serving your static content from a Content Delivery Network (CDN) such as Akamai. You serve your web site requests from your main hostname, and your images are all served from a separate static-content hostname. This means you most likely have a CNAME record set up in your DNS for that static hostname. You now decide you want to add 3 more hostnames that all point to the same static content server. You choose three new names and add CNAME records for them to your DNS zone, wherever your DNS is hosted. So now you can serve the same image files from any of the 4 hostnames you set up as CNAMEs. So, how do we set up our image file requests so that they are spread somewhat randomly across these different hostnames, while ensuring that every image is requested using the same hostname every time? We use a little library called Flexihash from a nice coder named Paul Annesley. So without further ado, here is a very simple example of how you'd use Flexihash to generate your image URLs.

 //There are a couple of ways to include the required Flexihash
 //library files and we'll just assume you figured that part out
 //and included them already.
 //So, assuming your Flexihash lib is already included....
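 //Instantiate our Flexihash object with its defaults: the CRC32
 //hashing algorithm and the default number of replicas per target.
 //(The second constructor argument is the replica count, i.e. how
 //many positions each target gets on the hash ring. It does not
 //limit lookups to 1 target; lookup() always returns a single one.)
 $flexiHash = new Flexihash();
 //Now set up our list of servers, each with a weight of 1,
 //so that Flexihash knows what to map the input filenames to.
 //NOTE: these hostnames are placeholders; substitute your own CNAMEs.
 $flexiHash->addTarget('static1.example.com', 1);
 $flexiHash->addTarget('static2.example.com', 1);
 $flexiHash->addTarget('static3.example.com', 1);
 $flexiHash->addTarget('static4.example.com', 1);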
 //set up some test filenames...
 $filename1 = "somefilename.jpg";
 $filename2 = "someotherfilename.jpg";
 $filename3 = "yetanotherfilename.jpg";
 //spit out a message showing how some test filenames
 //will map to specific servers...
 echo "<br/>$filename1 maps to ".$flexiHash->lookup($filename1);
 echo "<br/>$filename2 maps to ".$flexiHash->lookup($filename2);
 echo "<br/>$filename3 maps to ".$flexiHash->lookup($filename3);

So obviously, you'd do this a bit differently in your actual usage scenario. I'm getting ready to roll this out on a site, and in my case I converted the Flexihash library to a native Kohana library and then wrote a helper function which uses Flexihash to get the image URL for each of my images. If you're already knowledgeable enough to know what consistent hashing is and that you need to use it, then you will hopefully have no trouble adapting the above example for your own implementation.
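For what it's worth, a helper along those lines might look something like the sketch below. This is not my Kohana helper, just a framework-free illustration: `image_url()`, its parameters, and the hostnames are hypothetical, and a plain CRC32 modulo stands in for Flexihash so the snippet runs on its own. With Flexihash loaded, you would pass `[$flexiHash, 'lookup']` as the lookup callable instead.

```php
<?php
// Hypothetical helper: builds a stable image URL from a path and a
// lookup callable that maps a string to a hostname.
function image_url(string $path, callable $lookup, string $scheme = 'http'): string
{
    // Normalize first so "images/a.jpg" and "/images/a.jpg" hash alike.
    $path = '/' . ltrim($path, '/');
    return $scheme . '://' . $lookup($path) . $path;
}

// Stand-in lookup so the sketch runs without Flexihash.
// Placeholder hostnames; substitute your own CNAMEs.
$hosts = ['static1.example.com', 'static2.example.com'];
$standIn = function (string $key) use ($hosts): string {
    return $hosts[abs(crc32($key)) % count($hosts)];
};

echo image_url('images/products/widget.jpg', $standIn), "\n";
```

Hashing the normalized path rather than whatever string the template happens to pass in keeps one image from ending up under two different hostnames.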

So now you have no excuse not to finally go ahead and implement this optimization technique in your next attempt at ecommerce world domination. 
Last Updated ( Tuesday, 24 May 2011 )