Recovery Systems
Databases must survive failures such as software crashes, power outages, and disk errors. Recovery is the set of techniques that restore the database to a consistent state after a failure while preserving the effects of committed transactions. Recovery provides atomicity and durability, concurrency control provides isolation, and together they implement the ACID guarantees.
Failure Classification
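Failures fall into several categories: transaction failures (application errors, deadlocks), system crashes (operating system or DBMS failures that lose in-memory state), and media failures (disk corruption that loses on-disk data). Each class demands a different response: a failed transaction is rolled back on its own, a system crash requires restart recovery from the log, and a media failure requires restoring from a backup.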
Log-Based Recovery
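Recovery is built on the log: an append-only, sequential record of every update. Each update log record contains the transaction ID, the data item, and its old and new values; further records mark the start, commit, or abort of each transaction. The write-ahead logging (WAL) rule requires that the log records for an update reach stable storage before the corresponding data pages, and that a transaction's commit record reach stable storage before the commit is acknowledged. WAL thereby ensures that the information needed to undo any update already on disk, and to redo any committed update, survives a crash.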
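The WAL rule can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not how any particular DBMS implements it: the names `LogRecord`, `WalStore`, `flush_log`, and `flush_item` are hypothetical, "stable storage" is just a Python list and dict, and each record carries a log sequence number (LSN) giving its position in the log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    lsn: int      # log sequence number: position of the record in the log
    txn: str      # transaction ID
    item: str     # data item that was updated
    old: object   # value before the update (needed for undo)
    new: object   # value after the update (needed for redo)


@dataclass
class WalStore:
    log_buffer: list = field(default_factory=list)  # volatile log tail
    stable_log: list = field(default_factory=list)  # survives a crash
    cache: dict = field(default_factory=dict)       # volatile: item -> (value, lsn)
    disk: dict = field(default_factory=dict)        # survives a crash

    def update(self, txn, item, new):
        # Read the current value from the cache, falling back to disk.
        old = self.cache.get(item, (self.disk.get(item), 0))[0]
        lsn = len(self.stable_log) + len(self.log_buffer) + 1
        rec = LogRecord(lsn, txn, item, old, new)
        self.log_buffer.append(rec)       # log record is created first
        self.cache[item] = (new, lsn)     # then the cached copy is changed
        return rec

    def flush_log(self, up_to_lsn):
        # Move buffered log records with lsn <= up_to_lsn to stable storage.
        while self.log_buffer and self.log_buffer[0].lsn <= up_to_lsn:
            self.stable_log.append(self.log_buffer.pop(0))

    def flush_item(self, item):
        value, lsn = self.cache[item]
        self.flush_log(lsn)               # WAL rule: log before data
        self.disk[item] = value


store = WalStore()
store.update("T1", "A", 50)
store.flush_item("A")  # forces the log record for A out before A itself
```

The only ordering that matters here is inside `flush_item`: the data item cannot reach `disk` until every log record up to its LSN is in `stable_log`.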
Deferred and Immediate Updates
The deferred update (no-undo/redo) scheme writes updates to the database only after the transaction commits, so the log is used only for redo. The immediate update (undo/redo) scheme may write uncommitted updates to disk, so both undo and redo may be needed during recovery. Most real systems use immediate updates, because holding every uncommitted update in memory until commit does not scale to large transactions.
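As an illustration of the deferred update scheme, recovery reduces to a single redo pass. The sketch below assumes a hypothetical log format of tuples, `("write", txn, item, new_value)` and `("commit", txn)`; old values are not logged because nothing uncommitted ever reaches disk under this scheme.

```python
def recover_deferred(log, disk):
    """Redo-only recovery for a deferred update (no-undo/redo) scheme."""
    # A transaction counts as committed only if its commit record is in the log.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Replay the writes of committed transactions in log order.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new = rec
            disk[item] = new
    return disk


log = [
    ("write", "T1", "A", 50),
    ("write", "T2", "B", 7),   # T2 never committed, so this write is ignored
    ("commit", "T1"),
]
state = recover_deferred(log, {"A": 100, "B": 1})
```

Writes of uncommitted transactions are simply skipped; there is nothing to undo because the database was never touched on their behalf.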
Checkpoints
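Checkpoints bound the amount of log that must be replayed after a crash. A simple checkpoint flushes the log and all dirty pages to stable storage, then writes a checkpoint record listing the transactions active at that point. During recovery, the system starts scanning from the most recent checkpoint rather than the start of the log, although undoing a transaction that was active at the checkpoint can still require earlier log records. Fuzzy checkpoints allow normal processing to continue while the checkpoint is taken, instead of blocking updates until every dirty page has been written.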
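The way a checkpoint shortens the scan can be sketched as follows. The tuple-based log format and the function name `classify_after_checkpoint` are assumptions made for illustration; the checkpoint record is taken to be `("checkpoint", [active transaction IDs])`, written after all dirty pages were flushed.

```python
def classify_after_checkpoint(log):
    """Return (scan_start_index, redo_set, undo_set) for restart recovery."""
    # Locate the most recent checkpoint record, if there is one.
    start, active = 0, set()
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            start, active = i + 1, set(rec[1])
    # Transactions active at the checkpoint are undo candidates until
    # a commit record for them is seen.
    redo, undo = set(), set(active)
    for rec in log[start:]:
        kind, txn = rec[0], rec[1]
        if kind == "start":
            undo.add(txn)
        elif kind == "commit":
            undo.discard(txn)
            redo.add(txn)
    return start, redo, undo


log = [
    ("start", "T1"), ("write", "T1", "A", 1, 2), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 3, 4),
    ("checkpoint", ["T2"]),
    ("start", "T3"), ("write", "T3", "C", 5, 6), ("commit", "T3"),
    ("start", "T4"), ("write", "T4", "D", 7, 8),
]
start, redo, undo = classify_after_checkpoint(log)
```

T1 committed before the checkpoint, so it needs no work at all. T2 is in the undo set even though its write precedes the checkpoint, which is why the undo pass may still read older log records.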
Undo and Redo
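During recovery, the log is scanned record by record. Redo reapplies the updates of committed transactions, using the logged new values, in case they had not reached disk. Undo restores the logged old values for transactions that were still active at the crash and therefore never committed. Both operations must be idempotent, because a second crash can interrupt recovery itself. The combination leaves only the effects of committed transactions.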
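A minimal sketch of undo/redo recovery for an immediate update scheme follows. It assumes a hypothetical log of `("write", txn, item, old, new)` and `("commit", txn)` tuples, and it assumes strict two-phase locking, so that no item carries interleaved uncommitted writes from different transactions. Checkpoints are ignored for brevity.

```python
def recover(log, disk):
    """Undo uncommitted writes (backward), then redo committed ones (forward)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass: scan backward, restoring old values of uncommitted writes.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            disk[item] = old

    # Redo pass: scan forward, reapplying new values of committed writes.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            disk[item] = new
    return disk


log = [
    ("write", "T1", "A", 100, 50),
    ("commit", "T1"),
    ("write", "T2", "B", 1, 7),
    ("write", "T2", "A", 50, 60),   # T2 was still active at the crash
]
# Disk state at the crash: T2's write to A had already been flushed.
state = recover(log, {"A": 60, "B": 1})
```

Because both passes write absolute values rather than increments, running `recover` a second time on its own output yields the same state, which is the idempotence that recovery depends on.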
ARIES Algorithm
ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) is a widely adopted recovery algorithm. It uses three phases: Analysis identifies dirty pages and active transactions from the log; Redo repeats history, reapplying the logged updates of all transactions, including those that will be rolled back, to restore the state at the time of the crash; Undo rolls back the losers, the transactions that had not committed. ARIES supports fine-grained locking, partial rollbacks, and efficient recovery even for long-running transactions.
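The three phases can be sketched in a heavily simplified toy form. This is not the full algorithm: it assumes no checkpoints (so Analysis scans the whole log), one value per page, and a log that contains only update and commit records at the time of the crash. The record layout and the name `aries_recover` are hypothetical. Each page carries a `page_lsn`, the log sequence number (LSN) of the last update applied to it, and Undo emits compensation log records (CLRs) describing what it reversed.

```python
def aries_recover(log, pages):
    """Toy three-phase recovery. Mutates `pages`; returns (losers, clrs)."""
    # Analysis: rebuild the transaction table and the dirty page table.
    last_lsn, dirty = {}, {}   # txn -> last LSN; page -> recLSN (first dirtying LSN)
    for rec in log:
        if rec["kind"] == "update":
            last_lsn[rec["txn"]] = rec["lsn"]
            dirty.setdefault(rec["page"], rec["lsn"])
        elif rec["kind"] == "commit":
            last_lsn.pop(rec["txn"], None)
    losers = set(last_lsn)     # transactions with updates but no commit record

    # Redo: repeat history for winners and losers alike, skipping any
    # update whose effect is already on the page.
    redo_start = min(dirty.values(), default=0)
    for rec in log:
        if rec["kind"] != "update" or rec["lsn"] < redo_start:
            continue
        page = pages[rec["page"]]
        if page["page_lsn"] < rec["lsn"]:
            page["value"], page["page_lsn"] = rec["new"], rec["lsn"]

    # Undo: reverse the losers' updates in descending LSN order,
    # producing one CLR per undone update.
    next_lsn = log[-1]["lsn"] + 1 if log else 1
    clrs = []
    for rec in reversed(log):
        if rec["kind"] == "update" and rec["txn"] in losers:
            page = pages[rec["page"]]
            page["value"], page["page_lsn"] = rec["old"], next_lsn
            clrs.append({"lsn": next_lsn, "kind": "clr", "txn": rec["txn"],
                         "page": rec["page"], "undone": rec["lsn"]})
            next_lsn += 1
    return losers, clrs


log = [
    {"lsn": 1, "kind": "update", "txn": "T1", "page": "P1", "old": 10, "new": 20},
    {"lsn": 2, "kind": "update", "txn": "T2", "page": "P2", "old": 5, "new": 6},
    {"lsn": 3, "kind": "commit", "txn": "T1"},
    {"lsn": 4, "kind": "update", "txn": "T2", "page": "P1", "old": 20, "new": 30},
]
# On-disk pages at the crash: P1 was never flushed, P2 was flushed after LSN 2.
pages = {"P1": {"value": 10, "page_lsn": 0}, "P2": {"value": 6, "page_lsn": 2}}
losers, clrs = aries_recover(log, pages)
```

In this run Redo skips LSN 2 because P2 already reflects it, and Undo then reverses T2's two updates. A real implementation would also append the CLRs to the log and honor existing CLRs on a repeated restart, which this sketch leaves out.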
Shadow Paging
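Shadow paging is an alternative technique that maintains two page tables during a transaction: the shadow page table, which describes the last committed state and is never modified, and the current page table, which is redirected to fresh copies of pages as they are updated. At commit, the current page table atomically replaces the shadow table, typically by updating a single on-disk pointer, which provides atomicity without a separate undo log. Shadow paging does not require a log for recovery, but it fragments data, leaves obsolete pages to be garbage-collected, and complicates concurrent updates.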
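The mechanism can be sketched with two in-memory page tables, following the usual textbook convention in which the shadow table is the committed one and the current table is the transaction's working copy. The class `ShadowPagedStore` and its methods are hypothetical, and a Python attribute assignment stands in for the single atomic pointer update a real system would perform on disk.

```python
class ShadowPagedStore:
    def __init__(self, values):
        self.slots = list(values)                # physical pages
        self.shadow = list(range(len(values)))   # committed page table
        self.current = None                      # working table of the open transaction

    def begin(self):
        self.current = list(self.shadow)         # start from the committed mapping

    def write(self, page, value):
        # Copy on write: a slot referenced by the shadow table is never overwritten.
        self.slots.append(value)
        self.current[page] = len(self.slots) - 1

    def read(self, page, committed=False):
        use_shadow = committed or self.current is None
        table = self.shadow if use_shadow else self.current
        return self.slots[table[page]]

    def commit(self):
        # Stands in for one atomic update of the on-disk root pointer.
        self.shadow, self.current = self.current, None

    def abort(self):
        # A crash before commit has the same effect: the shadow table is intact.
        self.current = None


s = ShadowPagedStore(["a", "b"])
s.begin()
s.write(0, "A")
seen_inside = s.read(0)                   # the transaction sees its own write
seen_committed = s.read(0, committed=True)
s.abort()
after_abort = s.read(0)
s.begin()
s.write(1, "B")
s.commit()
after_commit = s.read(1)
```

The slot written by the aborted transaction stays in `slots` but is no longer referenced by any table, a small-scale view of the garbage and fragmentation that shadow paging leaves behind.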
Media Recovery
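When a disk fails, the system restores the database from the most recent backup and reapplies the log records written since that backup, up to the failure point. This works only if the log is kept on stable storage separate from the data it protects. Remote replication and continuous archiving further protect against catastrophic failures such as the loss of an entire site. Modern cloud databases often use these techniques continuously in the background.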
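Restoring from a backup and rolling forward can be sketched as below. The sketch assumes the backup was taken at a quiescent point (no transactions active) at log sequence number `backup_lsn`, and that the archived log is in LSN order; the function `restore`, the record layout, and the optional `stop_lsn` parameter are all hypothetical.

```python
def restore(backup, backup_lsn, archived_log, stop_lsn=None):
    """Rebuild the database from a backup plus the log written after it."""
    db = dict(backup)  # start from a copy of the backup image

    def in_range(rec):
        return stop_lsn is None or rec["lsn"] <= stop_lsn

    # Only transactions whose commit record falls inside the replayed
    # range count as committed.
    committed = {r["txn"] for r in archived_log
                 if r["kind"] == "commit" and in_range(r)}

    for rec in archived_log:
        if rec["lsn"] <= backup_lsn:
            continue               # already reflected in the backup
        if not in_range(rec):
            break                  # reached the requested stopping point
        if rec["kind"] == "update" and rec["txn"] in committed:
            db[rec["item"]] = rec["new"]
    return db


backup = {"A": 1, "B": 2}
archived_log = [
    {"lsn": 11, "kind": "update", "txn": "T1", "item": "A", "new": 5},
    {"lsn": 12, "kind": "commit", "txn": "T1"},
    {"lsn": 13, "kind": "update", "txn": "T2", "item": "B", "new": 9},
]
full = restore(backup, 10, archived_log)
partial = restore(backup, 10, archived_log, stop_lsn=11)
```

Passing `stop_lsn` turns the same routine into a simple form of point-in-time recovery: with the stop point before T1's commit record, T1's update is not applied.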
Distributed Recovery
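Distributed transactions add complexity: each site has its own log, and participants must coordinate to commit or abort consistently. Two-phase commit (2PC) coordinates the decision, the presumed abort and presumed commit optimizations reduce its logging and message cost, and heuristic decisions let a participant resolve an in-doubt transaction unilaterally at the risk of inconsistency. Recovery from partial failures nonetheless remains challenging, because a participant that has voted to commit is blocked until it learns the coordinator's decision.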
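The basic two-phase commit exchange can be sketched as follows. This is an illustration under strong assumptions: everything runs in one process, there are no timeouts or failures, and none of the presumed abort/commit optimizations are modeled. The `Participant` class and `two_phase_commit` function are hypothetical names, and each list named `log` stands in for that site's stable log.

```python
class Participant:
    def __init__(self, name, vote_yes=True):
        self.name, self.vote_yes, self.log = name, vote_yes, []

    def prepare(self, txn):
        # A yes vote is a promise: the "prepared" record must be on
        # stable storage before the vote is sent.
        record = "prepared" if self.vote_yes else "abort"
        self.log.append((record, txn))
        return self.vote_yes

    def finish(self, txn, decision):
        self.log.append((decision, txn))


def two_phase_commit(txn, participants, coordinator_log):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare(txn) for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Logging the decision at the coordinator is the commit point.
    coordinator_log.append((decision, txn))
    # Phase 2: inform the participants that voted yes; a participant
    # that voted no has already aborted locally.
    for p, voted_yes in zip(participants, votes):
        if voted_yes:
            p.finish(txn, decision)
    return decision


a, b, c = Participant("a"), Participant("b"), Participant("c", vote_yes=False)
log1, log2 = [], []
first = two_phase_commit("T1", [a, b], log1)    # all vote yes
second = two_phase_commit("T2", [a, c], log2)   # c votes no
```

Between writing `("prepared", txn)` and receiving the decision, a participant is in doubt; if the coordinator fails in that window, the participant cannot safely decide on its own, which is the blocking problem that makes distributed recovery hard.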
Summary
Recovery ensures that the database can always be restored to a consistent state after a failure. The combination of write-ahead logging, checkpoints, and an algorithm like ARIES provides both efficiency and robustness, underlying the durability users take for granted in modern systems.