Hashing

Introduction

Hashing is a technique to map keys to array indices using a hash function, giving average O(1) insert, search, and delete. A hash table is the underlying array of size m; element x is placed at position h(x) mod m.

Hash Function

A good hash function should be deterministic, fast to compute, and uniform (spread keys evenly across the table). Common hash functions include:

Division: h(k) = k mod m. Pick m as a prime not close to a power of 2.
Multiplication: h(k) = floor(m * ((k * A) mod 1)) with A ≈ 0.618 (Knuth).
String hash (Java): h(s) = s[0]*31^n-1 + s[1]*31^n-2 + ... + s[n-1], mod table size.

Collisions

When two different keys hash to the same bucket, a collision occurs. Two strategies resolve collisions:

Separate Chaining

Each bucket holds a linked list of all keys mapped there. Insert appends to the list; search walks the list; delete removes the node. With load factor α = n/m, average search time is O(1 + α). Pros: simple, table never "fills up". Cons: extra pointer overhead, poor cache performance.

Open Addressing

All entries live in the table; on collision, probe other buckets until an empty slot is found.

Linear probing: try h(k), h(k)+1, h(k)+2, ... mod m. Simple but suffers from primary clustering.
Quadratic probing: try h(k), h(k)+1², h(k)+2², .... Reduces clustering.
Double hashing: probe h1(k) + i * h2(k) mod m. Best distribution.

Load factor must stay below 1 and is usually kept below 0.7 by rehashing.

Rehashing

When the load factor exceeds a threshold, allocate a new larger table (typically double) and reinsert every key using the new size. Amortized cost per insert stays O(1).

Applications

Language dictionaries: Python dict, Java HashMap, C++ unordered_map.
Database indexing (hash indexes).
Symbol tables in compilers.
Caching (memoization, LRU cache).
Cryptographic hashing (SHA-256) for integrity.
Set membership: spam filters, Bloom filters.

Hash Table vs BST

Aspect	Hash Table	Balanced BST
Search	O(1) avg	O(log n)
Ordered traversal	No	Yes
Worst case search	O(n)	O(log n)
Extra memory	Table slack	Pointers per node

Summary

Hashing gives the fastest average lookup but does not preserve order and can degrade with a bad hash function. Pick a prime table size, a uniform hash, and a collision strategy suited to your workload.

Important Questions

Define hashing and hash function. What are the properties of a good hash function?
Explain the division and multiplication methods.
What is a collision? How is it resolved?
Explain separate chaining with an example.
Explain linear, quadratic probing, and double hashing.
What is load factor? When should you rehash?
Compare hash tables and balanced BSTs.
List five applications of hashing.