We would cover the following topics on Universal Hashing
- Introduction to Universal Hashing
- What is Universal Hashing
- How Universal hashing Works
- The Theorem and Proof
1. Introduction to Universal Hashing
If we have a fixed hash function such that all keys uses this single hash function, then we may be unfortunate that the keys selected may be such that they all hash to the same slot in the hash table. And this would result in an average search time of Θ(n). We want to avoid this happening or at least limit its possibility.
2. What is Universal Hashing?
Universal Hashing is a way to avoid the situation presented above. In Universal Hashing the hash function is selected randomly independent of the keys that are to be stored.
In universal hashing, at the beginning of the execution, we choose a hash function randomly from a carefully designed family of functions. This guarantees that no single input would result in the worst-case situation. Since the hash functions are random, the behavior would be different for for each execution which would yield average case performance on each input.
3. How Universal Hashing Works
Let ℋ be a finite set of has functions that map a given universe U of keys into the range {0, 1, 2, . . . ,m-1}. This set of functions is said to be universal if for each pair of distinct keys k, l ∈ U, the number of has functions h ∈ ℋ for which h(k) = h(l) is at most |ℋ|/m
This means that with a hash function randomly chosen from the set ℋ, the possibility of collision between two distinct keys k and l is at most the chance of 1/m of of a collision occurring if h(k) and h(l) were chosen randomly and independently from the set {0, 1, 2, . . . ,m-1}
4. The Theorem and Proof
Theorem: Assuming that a hash function h is chosen randomly from a universal hash family of hash functions. And then used to has n keys into a hash table T which is of size m. Also assuming chaining is used to handle collisions. If the key k is not in the table, then the expected length E[nh(k)] of the linked list that k hashes into is at most equal to n/m. If the key is present in the table, then the expected length E(nh(k)] of the list containing the key is at most 1 + α
Proof: For each given pair of keys k and l which are distinct, we then define an indicator random variable Xkl to use to count the number of times k and l hashes to the same slot.
Xkl = I[h(k) = h(l)}
Since by by definition for universal family of has functions, a single pair of keys can only collide with probability of at most 1/m we now have
Pr{h(k) = h(l)} ≤ 1/m
We can there for conclude that :
E[Xkl] ≤ 1/m
Sure you have learnt about Universal Hashing but we recommend you read about Open Addressing and Linear Probing.
Check these Data Structure and Algorithms Tutorials
- Skip Lists and How They Work
- Perfect Hashing – How it Works
- Linear Probing, Quadratic Probing and Double Hashing
- Hashing With Open Addressing
- Universal Hashing
- Search Time Under Simple Uniform Hashing
- Hash Tables – Hashing With Chaining
- Introduction to Hash Tables and Direct Addressing
- Recurrences in Divide and Conquer Algorithm
- How Bloom Filters Work
- How Cuckoo Hashing Works
- Introduction to AVL Trees (With Examples)
2 thoughts on “Universal Hashing – An Clear Explanation”