Avalanche on integer hash functions

From JBWiki
Jump to: navigation, search

Contents

Warning

This page is meant for very efficient javascript implementations like the V8 engine used in Chromium or Google Chrome. If you use a slower browser, clicking on "Go" below is likely to make it unresponsive for quite a while.

Testing avalanche on integer hash functions

Choose a function: (you may edit it below)

What is avalanche, and why does it matter?

A hash function maps various kinds of input to a number. The same input must always map to the same number, and different inputs should preferably map to different numbers.

The avalanche effect is the property, for a hash function, that every bit of the input affects every bit of the output 50% of the time.

While the primary motivation for this concept is in cryptographic contexts, it is also relevant for hashes destined to table lookup. Avalanche immunises against catastrophic collisions for inputs that only differ by a few bits, because any change in the input, even of one single bit, will strongly affect the output in a haphazard way. Since hash functions for general-purpose table lookup cannot know or assume anything about the distribution characteristics of every possible input, this seems very desirable indeed.

This page is aimed at 32-bit unsigned integer hash functions, that is, functions that map 32-bit unsigned integers to 32-bit unsigned integers. It is easy to extend them to more general input.

Avalanche diagrams

The avalanche effect has a further nice characteristic: it can be represented in a way that is both informative, easy to understand and aesthetically pleasing. Bret Mulvey's clever diagrams have become quite popular, they are also used by Austin Appleby (the author of the excellent MurmurHash functions), and even in Wikipedia. They show how each bit in the input (lines) affects each bit in the output (columns).

This page attempts to take Mulvey's great idea one step further by using plain javascript interactively instead of C#. You simply write (or paste) a function that takes a 32-bit unsigned integer both as input and as output into the code entry area above, click "Go", and its avalanche diagram is computed for you as you watch. Since the syntax of javascript is quite close to that of C, C++, Java, etc, lifting code from one of these languages is usually straightforward, but you may need to watch out for signed or unsigned right shifts.

The numbers in the cells show how many times (rounded) out of 100 that input bit affects that output bit. (100 is rendered as 99 to save space.) Pure light green is best, red is awful, dark green - brown - orange is in between. The "Avalanche" result under the textarea provides a quantified summary. Lower is better - for the best functions, it should decrease steadily and end up with a value in the neighbourhood of 0.00256 for 100 000 samples. (Anything spectacularly better, say, less than 0.0001, it is prima facie evidence of witchcraft. Science says it cannot happen.)

Happy experimenting!

The functions in the drop-down list give you examples that you may try to improve by editing them in the textarea. The default function (Mulvey's SimpleHash) is a good starting point - its point is not that it is a good hashing function, on the contrary, it was conceived to be easy to improve. You could, e.g., try other multipliers, like Marsaglia's famous 69069 or the alternating 0 and 1 bits 0x55555555 and 0xaaaaaaab or one of my favorites: 0x0aacc52b. You could also replace the additions by XORs, essentially reinventing FNV - but perhaps with a better multiplier, the original 0x01000193 is not particularly good, FNV-primes notwithstanding.

simpleHash uses a provided mult function that implements multiplication modulo 232. This is necessary because javascript numbers are always floating point with 53 significant bits, resulting in the discard of the least significant bits in the case of overflow. Try replacing it with plain multiplication and see what happens. That is, copy and paste the following version in the textarea:

  function simpleHash(n) {
    var hash = 0;
    var m = 0x50003;
    hash = m * (hash + (n & 0xff)); n >>>= 8;
    hash = m * (hash + (n & 0xff)); n >>>= 8;
    hash = m * (hash + (n & 0xff)); n >>>= 8;
    hash = m * (hash + (n & 0xff));
    return hash;
  }

Another provided function is fullMult. It takes two 32-bit arguments and returns their product as an array, the first element being the 32 low bits and the second the 32 high bits. This is one of the most efficient avalanching tools I know, it is used in fold. It is easily implemented in C using 64-bit integers, and perhaps more efficiently using a little bit of assembler language - most 32-bit CPUs provide a multiplication instruction that stores the high bits somewhere, e.g. the x86 family's mull instruction stores them in edx. However, in the case of javascript, its floating point variant foldFP is faster.

External links

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox