Chapter 9 2 min read
Save

Hashing

Data Structure and Algorithms · BCA · Updated Apr 15, 2026

Table of Contents

Introduction

Hashing is a technique to map keys to array indices using a hash function, giving average O(1) insert, search, and delete. A hash table is the underlying array of size m; element x is placed at position h(x) mod m.

Hash Function

A good hash function should be deterministic, fast to compute, and uniform (spread keys evenly across the table). Common hash functions include:

  • Division: h(k) = k mod m. Pick m as a prime not close to a power of 2.
  • Multiplication: h(k) = floor(m * ((k * A) mod 1)) with A ≈ 0.618 (Knuth).
  • String hash (Java): h(s) = s[0]*31n-1 + s[1]*31n-2 + ... + s[n-1], mod table size.

Collisions

When two different keys hash to the same bucket, a collision occurs. Two strategies resolve collisions:

Separate Chaining

Each bucket holds a linked list of all keys mapped there. Insert appends to the list; search walks the list; delete removes the node. With load factor α = n/m, average search time is O(1 + α). Pros: simple, table never "fills up". Cons: extra pointer overhead, poor cache performance.

Open Addressing

All entries live in the table; on collision, probe other buckets until an empty slot is found.

  • Linear probing: try h(k), h(k)+1, h(k)+2, ... mod m. Simple but suffers from primary clustering.
  • Quadratic probing: try h(k), h(k)+1², h(k)+2², .... Reduces clustering.
  • Double hashing: probe h1(k) + i * h2(k) mod m. Best distribution.

Load factor must stay below 1 and is usually kept below 0.7 by rehashing.

Rehashing

When the load factor exceeds a threshold, allocate a new larger table (typically double) and reinsert every key using the new size. Amortized cost per insert stays O(1).

Applications

  1. Language dictionaries: Python dict, Java HashMap, C++ unordered_map.
  2. Database indexing (hash indexes).
  3. Symbol tables in compilers.
  4. Caching (memoization, LRU cache).
  5. Cryptographic hashing (SHA-256) for integrity.
  6. Set membership: spam filters, Bloom filters.

Hash Table vs BST

AspectHash TableBalanced BST
SearchO(1) avgO(log n)
Ordered traversalNoYes
Worst case searchO(n)O(log n)
Extra memoryTable slackPointers per node

Summary

Hashing gives the fastest average lookup but does not preserve order and can degrade with a bad hash function. Pick a prime table size, a uniform hash, and a collision strategy suited to your workload.

Important Questions

  1. Define hashing and hash function. What are the properties of a good hash function?
  2. Explain the division and multiplication methods.
  3. What is a collision? How is it resolved?
  4. Explain separate chaining with an example.
  5. Explain linear, quadratic probing, and double hashing.
  6. What is load factor? When should you rehash?
  7. Compare hash tables and balanced BSTs.
  8. List five applications of hashing.

Related Notes

Discussion

0 comments

Join the discussion

Log in to share your thoughts and help fellow students.

Log in to comment

No comments yet. Be the first to share your thoughts!