Relational Model
The relational model, introduced by E. F. Codd in 1970, represents data as mathematical relations, that is, sets of tuples drawn from specified domains. Its simplicity, strong theoretical foundation, and amenability to query languages made it the dominant database paradigm for decades.
Relations and Tuples
A relation in the mathematical sense is a subset of the Cartesian product of its attribute domains. In practice a relation is presented as a table whose columns correspond to attributes and whose rows are tuples. Each attribute has a domain of allowable atomic values. Because a relation is a set, it contains no duplicate tuples and its tuples have no inherent order.
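The definition above can be illustrated with a minimal Python sketch. The domains and the relation name `enrolled` are hypothetical, chosen only for illustration; Python's built-in `set` is used because it shares the two properties just described (no duplicates, no ordering).

```python
from itertools import product

# Hypothetical attribute domains (illustrative only).
student_ids = {1, 2}
majors = {"Math", "Physics"}

# The Cartesian product holds every possible (id, major) combination.
all_pairs = set(product(student_ids, majors))

# A relation is any subset of that product. A Python set mirrors the
# two properties in the text: no duplicate tuples and no ordering.
enrolled = {(1, "Math"), (2, "Physics")}
enrolled.add((1, "Math"))  # re-adding an existing tuple has no effect

# Subset test: every tuple of the relation comes from the product.
is_relation = enrolled <= all_pairs
```

Here `all_pairs` has four elements, while `enrolled` keeps only two of them even after the repeated `add`.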
Schemas and Instances
A relation schema R(A1, A2, ..., An) names the relation and lists its attributes, each with an associated domain. A relation instance is the set of tuples conforming to the schema at a given moment. A database schema is a collection of relation schemas together with integrity constraints. The schema is relatively stable; the instance changes as data is inserted, deleted, or updated.
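The distinction between schema and instance can be sketched as follows. This is only an illustration under stated assumptions: the schema `Student(id, name)`, the helper `conforms`, and the modelling of a domain as a membership predicate are all hypothetical and not taken from any particular system.

```python
# Hypothetical schema Student(id, name): attribute name -> domain,
# with each domain modelled as a membership predicate.
student_schema = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "name": lambda v: isinstance(v, str),
}

def conforms(schema, instance):
    """Return True if every tuple supplies exactly the schema's attributes
    and each value lies in its attribute's domain."""
    for row in instance:
        values = dict(row)
        if set(values) != set(schema):
            return False
        if not all(in_domain(values[attr]) for attr, in_domain in schema.items()):
            return False
    return True

# Two instances of the same, unchanged schema at different points in time.
instance_monday = {frozenset({"id": 1, "name": "Ada"}.items())}
instance_tuesday = instance_monday | {frozenset({"id": 2, "name": "Alan"}.items())}

# This instance is illegal: "x" is outside the domain of id.
bad_instance = {frozenset({"id": "x", "name": "Eve"}.items())}
```

The schema object never changes between Monday and Tuesday; only the set of tuples does.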
Keys in the Relational Model
As in the ER model, a super key is any set of attributes whose values uniquely identify each tuple. A candidate key is a minimal super key, meaning that no proper subset of it is itself a super key. One candidate key is chosen as the primary key, and the remaining candidate keys are called alternate keys. A foreign key in one relation references the primary key of another relation (or of the same relation), providing the mechanism by which related data is linked.
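The key definitions can be made concrete with a small sketch. One important caveat: a key is a property of the schema, so it must hold in every legal instance, whereas the hypothetical helpers `is_superkey` and `candidate_keys` below only test a single sample instance. They can refute a proposed key but cannot prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys of this instance, smallest first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A set that contains an already-found key is not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical sample data (illustrative only).
rows = [
    {"id": 1, "email": "a@x.org", "dept": "CS"},
    {"id": 2, "email": "b@x.org", "dept": "CS"},
    {"id": 3, "email": "c@x.org", "dept": "EE"},
]
keys = candidate_keys(rows, ["id", "email", "dept"])
```

For this sample, `keys` contains `("id",)` and `("email",)`. The set `("id", "dept")` is a super key but not a candidate key, because it is not minimal.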
Integrity Constraints
Constraints restrict the legal instances of a schema. Domain constraints restrict attribute values to their domains. Entity integrity requires that primary key attributes be non-null. Referential integrity requires every foreign key value to either match an existing primary key value or be null. General constraints, such as check expressions and assertions, enforce application-specific rules.
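As a rough illustration, several of these constraints can be declared and observed with Python's standard `sqlite3` module. The tables `department` and `employee` and the helper `try_insert` are hypothetical. The sketch shows a check expression, a foreign key that may be null, and primary key uniqueness. It does not demonstrate the non-null requirement on primary keys, because SQLite assigns a value automatically when NULL is inserted into an `INTEGER PRIMARY KEY` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER NOT NULL CHECK (salary >= 0),
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Research')")

def try_insert(emp_id, salary, dept_id):
    """Attempt one insert and report whether a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)",
                     (emp_id, salary, dept_id))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(1, 5000, 10),    # valid tuple
    try_insert(2, 5000, None),  # a null foreign key is permitted
    try_insert(3, 5000, 99),    # referential integrity: no department 99
    try_insert(4, -1, 10),      # general constraint: CHECK (salary >= 0)
    try_insert(1, 7000, 10),    # duplicate primary key value
]
```

Only the first two inserts succeed; the remaining three are rejected by the database itself, without any checking code in the application.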
Relational Schema Design
Schemas should represent reality accurately while minimizing redundancy. Normalization (discussed in a later chapter) is a systematic procedure for improving schema quality. Proper use of keys, references, and constraints protects data from inconsistency and simplifies application code.
Null Values
The special marker NULL denotes unknown or inapplicable data. Comparisons with NULL yield neither true nor false but a third truth value, unknown, so queries must be evaluated under three-valued logic, which complicates their semantics. Designers must weigh the convenience of allowing NULLs against the ambiguity they introduce.
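A minimal sketch of three-valued logic follows, with Python's `None` standing in for both NULL and the truth value unknown. The function names `eq`, `and3`, `or3`, and `not3` are hypothetical. The example at the end shows the kind of ambiguity mentioned above: a tuple with a NULL attribute satisfies neither a condition nor its negation.

```python
# None stands in for SQL's "unknown" truth value in this sketch.

def eq(a, b):
    """Equality that yields unknown (None) if either operand is NULL (None)."""
    if a is None or b is None:
        return None
    return a == b

def and3(p, q):
    """Three-valued AND: false dominates, then unknown."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def or3(p, q):
    """Three-valued OR: true dominates, then unknown."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def not3(p):
    """Three-valued NOT: unknown stays unknown."""
    return None if p is None else not p

# A selection keeps a tuple only when its condition is true, so tuples
# whose condition evaluates to unknown are dropped.
phones = [("Ada", "555-0100"), ("Alan", None)]
has_that_phone = [n for n, p in phones if eq(p, "555-0100") is True]
lacks_that_phone = [n for n, p in phones if not3(eq(p, "555-0100")) is True]
```

Alan appears in neither result list, even though intuitively every person either has that phone number or does not.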
Relational Languages
Relational operations are expressed in three related formalisms: relational algebra, a procedural language with operators such as selection, projection, union, and join; tuple relational calculus, a declarative language based on first-order logic; and domain relational calculus, another declarative variant. Commercial query languages such as SQL draw on these formalisms, chiefly relational algebra and tuple relational calculus.
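The algebra operators named above can be sketched over Python sets. This is an illustration only, not how a database engine implements them: the helpers `row`, `select`, `project`, and `natural_join` and the sample relations are hypothetical, and each tuple is modelled as a frozenset of (attribute, value) pairs so that it can be a member of a set.

```python
def row(**attrs):
    """Build an immutable, hashable tuple from attribute=value pairs."""
    return frozenset(attrs.items())

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(dict(t))}

def project(relation, attrs):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in relation}

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

# Hypothetical sample relations (illustrative only).
employee = {
    row(name="Ada", dept="CS"),
    row(name="Alan", dept="CS"),
    row(name="Emmy", dept="Math"),
}
department = {
    row(dept="CS", building="North"),
    row(dept="Math", building="South"),
}

cs_staff = select(employee, lambda t: t["dept"] == "CS")
depts = project(employee, ["dept"])           # three tuples collapse to two
located = natural_join(employee, department)  # joined on the shared attribute dept
# Union is plain set union, valid here because both operands share a schema.
everyone = cs_staff | select(employee, lambda t: t["dept"] == "Math")
```

Because every operator takes relations and returns a relation, the operators compose freely, for example `project(natural_join(employee, department), ["name", "building"])`.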
Properties of the Relational Model
The relational model is value-oriented: tuples and relationships are referenced by values, not by pointers or internal identifiers. Queries are declarative: users specify what data they want, not how to obtain it. The model's mathematical grounding lets a system rewrite a query into an equivalent but cheaper form and makes it possible to prove such rewrites correct.
Summary
The relational model is simple yet expressive: relations, tuples, keys, and constraints together model complex domains. Understanding the model deeply is essential before studying SQL and the inner workings of relational databases, which build on this theoretical foundation.