Concurrency Control
Concurrency control mechanisms ensure that concurrently executing transactions produce serializable schedules. Without them, interleavings of reads and writes lead to lost updates, dirty reads, and inconsistent retrievals. Several families of algorithms enforce correctness.
Lost Update, Dirty Read, and Phantoms
The classic concurrency anomalies are lost update (two transactions both overwrite the same item), dirty read (reading uncommitted data that may be rolled back), non-repeatable read (re-reading a value after another transaction changes it), and phantom reads (new rows appearing in a range query). Each can cause financial loss or incorrect business decisions.
Lock-Based Protocols
The simplest mechanism is locking. Transactions acquire shared (S) locks for reads and exclusive (X) locks for writes; compatibility rules dictate which locks can coexist. Granularity may be at the row, page, table, or database level, with each level balancing concurrency against overhead.
Two-Phase Locking (2PL)
Two-Phase Locking is the cornerstone protocol: each transaction has a growing phase during which it only acquires locks, followed by a shrinking phase during which it only releases them. 2PL guarantees conflict-serializability. Strict 2PL holds exclusive locks until commit, simplifying recovery. Rigorous 2PL holds all locks until commit and is the most common variant in practice.
Deadlocks
Locks can cause deadlocks: cycles in which transactions wait on each other's locks. Prevention strategies include wait-die and wound-wait, which use timestamps to decide which transaction is aborted. Detection uses a wait-for graph; when a cycle is found, one transaction is chosen as a victim and rolled back. Preventing starvation requires fairness in victim selection.
Timestamp-Based Protocols
Timestamp ordering assigns each transaction a unique timestamp and enforces serializability equivalent to the timestamp order. Read and write timestamps of data items are maintained; operations that would violate the order are aborted and restarted. This protocol avoids deadlocks entirely.
Optimistic Concurrency Control
Optimistic protocols defer conflict detection to transaction end. Each transaction proceeds through read, validation, and write phases; validation confirms that no conflicting transaction has committed in between. Optimistic approaches work well when conflicts are rare.
Multiversion Concurrency Control (MVCC)
MVCC maintains multiple versions of each data item so that readers and writers rarely block each other. Each transaction sees a consistent snapshot corresponding to its start time. PostgreSQL, Oracle, and many modern systems use MVCC; it provides high concurrency, especially for read-heavy workloads.
Granularity and Intention Locks
Locks may span multiple granularities. Intention locks (IS, IX, SIX) on higher levels signal the presence of locks at lower levels, allowing a mix of coarse- and fine-grained locking without scanning all lower-level objects.
Index Concurrency
Indexes like B+ trees need their own concurrency protocols. Crabbing locks a parent node, moves to a child, then releases the parent once the child is safely latched. Specialized protocols such as B-link trees allow even higher concurrency during tree modifications.
Summary
Concurrency control balances correctness and throughput. Locking, timestamps, optimistic methods, and MVCC each solve the problem differently; production systems typically mix several techniques. Understanding these mechanisms is essential for writing and tuning high-concurrency database applications.