Universal Hashing – An Clear Explanation

We would cover the following topics on Universal Hashing

 

1. Introduction to Universal Hashing

If we have a fixed hash function such that all keys uses this single hash function, then we may be unfortunate that the keys selected may be such that they all hash to the same slot in the hash table. And this would result in an average search time of Θ(n). We want to avoid this happening or at least limit its possibility.

 

2. What is  Universal Hashing?

Universal Hashing is a way to avoid the situation presented above. In Universal Hashing the hash function is selected randomly independent of the keys that are to be stored.

In universal hashing, at the beginning of the execution, we choose a hash function randomly from a carefully designed family of functions. This guarantees that no single input would result in the worst-case situation. Since the hash functions are random, the behavior would be different for for each execution which would yield average case performance on each input.

 

3. How Universal Hashing Works

Let ℋ  be a finite set of has functions that map a given universe U of keys into the range {0, 1, 2, . . . ,m-1}. This set of functions is said to be universal if for each pair of distinct keys k, l ∈ U, the number of has functions h ∈   for which h(k) = h(l) is at most ||/m

This means that with a hash function randomly chosen from the set  , the possibility of collision between two distinct keys k and l is at most the chance of 1/m of of a collision occurring if h(k) and h(l) were chosen randomly and independently from the set {0, 1, 2, . . . ,m-1}

 

4. The Theorem and Proof

Theorem: Assuming that a hash function h is chosen randomly from a universal hash family of hash functions.  And then used to has n keys into a hash table T which is of size m. Also assuming chaining is used  to handle collisions. If the key k is not in the table, then the expected length E[nh(k)] of the linked list that k hashes into is at most equal to n/m. If the key is present in the table, then the expected length E(nh(k)] of the list containing the key is at most 1 + α

Proof: For each given pair of keys k and l which are distinct, we then define an indicator random variable Xkl to use to count the number of times k and l hashes to the same slot.

Xkl = I[h(k) = h(l)}

Since by by definition for universal family of has functions, a single pair of keys can only collide with probability of at most 1/m we now have

Pr{h(k) = h(l)} ≤ 1/m

We can there for conclude that :

E[Xkl] ≤ 1/m

Sure you have learnt about Universal Hashing but we recommend you read about Open Addressing and Linear Probing.

 

Check these Data Structure and Algorithms Tutorials
User Avatar

kindsonthegenius

Kindson Munonye is currently completing his doctoral program in Software Engineering in Budapest University of Technology and Economics

View all posts by kindsonthegenius →

2 thoughts on “Universal Hashing – An Clear Explanation

Leave a Reply

Your email address will not be published. Required fields are marked *