This is used when the keys stored in the hash table are expected to be static. In this case perfect hashing guarantees excellent average as well as worst-case performance. This is how real dictionaries work. You most of the time just need to read from a dictionary.

Some application of perfect hashing includes:

- data storage on a CD ROM
- set of reserved words in a programming language

This hashing technique is called perfect hashing because it takes constant time, O(1) to search for an item in the table.

**Perfect hashing** is implemented using two hash tables, one at each level. Each of the table uses universal hashing. The first level is the same a hashing with chaining such that n elements is hashed into m slots in the hash table. This is done using a has function selected from a universal family of hash functions.

The second level of uses a second hash table (instead of a linked list used in chaining). Elements that hash to the same slot j in the first hash table are stored in a second hash table. This second hash table is known as secondary hash table . The hash function h_{j} is carefully chosen such that there are no collision in the secondary table.

To ensure there are no collisions in the secondary hash table S_{j}, we need to make the size m_{j} of the secondary table equal to the square of the number of keys hashing into slot j in the first table. That is:

m_{j} = n_{j}^{2}

The hash functions for the primary hash table is carefully chosen so that we limit the expected total amount of space used to be O(n). Take some time to watch the video explanation of Perfect Hashing

**Watch the Video**

**Choosing Hash Functions**

The first level hash function is chosen from the universal hash family ℋ pm where p is a prime number larger than any key value. So the keys that hash into slot j of the first hash table are placed into a secondary hash table S_{j} using the hash function h_{j} which is chosen from another family of hash functions ℋ _{p,mj}. The size of the secondary hash table is m_{j}.

The objective is to achieve a situation where there are no collisions in the secondary hash table. At least no expected collisions. Let’s examine the expected colliding elements.

##### Estimating Expected Collisions

**Theorem**: Assuming n keys are stored in a hash table of size m = using a hash function h which is chosen randomly from a universal family of hash functions. Then the probability that there would be any collision is less than 1/2

Proof: There are 〈n choose 2〉 pairs of keys that may collide. Each pair collides with a probability of 1/m if h is chosen randomly from a universal class ℋ of hash functions. We use a random variable X to count the number of collisions.

Let E(X) = expected number of collisions

It then follows that a hash function h that is chosen randomly from the set ℋ is have a high likelihood of having no collisions.

##### Analysis

If the number of keys n is small we can get away with using a hash table of size m = n^{2} and choosing a random has function from ℋ . But if n is large, the m = n^{2} would be expensive. Therefore, we resort to using two level hash hashing:

in the first level, hash function h is used to hash n keys into m slots where m = n. Then the number of keys hashing to slot j is n_{j}.

in the second level, we now use a secondary hash table of size m_{j} = n_{j}^{2} which ensures a collision does not occur (given the table is static)