Introduction to Database Systems
A database is an organized collection of related data, and a Database Management System (DBMS) is the software that stores, retrieves, and manages that data while controlling concurrent access and ensuring integrity. DBMSs are at the heart of nearly every non-trivial software system, from banking to e-commerce to social networks.
File Systems vs DBMS
Early applications stored their data in flat files, and these file-based systems suffered from data redundancy, inconsistency, difficulty in accessing data, limited support for concurrent use, and weak security. A DBMS overcomes these problems by providing a central, well-structured data store with a query language, transaction support, and controlled access.
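The redundancy and inconsistency problem can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for a full DBMS, and the customer and orders tables, the names, and the cities are hypothetical examples, not part of any particular system. In the flat-file style the customer's city is repeated in every order record, so a partial update leaves two conflicting values; in the database the city is stored once and every order sees the same value.

```python
import sqlite3

# Flat-file style: the customer's city is repeated in every order record.
flat_orders = [
    {"order_id": 1, "customer": "Asha", "city": "Pune"},
    {"order_id": 2, "customer": "Asha", "city": "Pune"},
]

# The program updates only one of the copies, so the records now disagree.
flat_orders[0]["city"] = "Mumbai"
flat_cities = {r["city"] for r in flat_orders if r["customer"] == "Asha"}

# DBMS style: the city is stored once and orders refer to the customer by key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Pune')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])

# A single update changes the one stored copy of the city.
conn.execute("UPDATE customer SET city = 'Mumbai' WHERE id = 1")
conn.commit()

db_cities = {
    row[0]
    for row in conn.execute(
        "SELECT c.city FROM orders o JOIN customer c ON c.id = o.customer_id"
    )
}

print("flat file cities:", sorted(flat_cities))
print("database cities:", sorted(db_cities))
```

The flat-file records end up with two different cities for the same customer, while the join over the database returns only one.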
Data Abstraction
A DBMS provides three levels of data abstraction. The physical level describes how data is actually stored on disk. The logical level describes the structure of the data: tables, relationships, and constraints. The view level presents specialized subsets of the data to different users. This separation allows the physical representation to change without affecting applications, a property called physical data independence. Similarly, logical data independence insulates user views, and the applications built on them, from changes in the logical schema.
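The three levels can be sketched in a few lines, again assuming Python's built-in sqlite3 module; the employee table, the employee_directory view, and the index name are hypothetical. The table definition stands for the logical level, the view exposes a subset that hides salaries, and the index is a purely physical change: the same query returns the same rows before and after it is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the structure of the data.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ravi", "Sales", 52000.0),
        (2, "Mei", "Sales", 61000.0),
        (3, "Omar", "HR", 48000.0),
    ],
)

# View level: a subset of the data for users who should not see salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT id, name, dept FROM employee"
)
cursor = conn.execute("SELECT * FROM employee_directory")
directory_columns = [col[0] for col in cursor.description]

# Physical level: adding an index changes how rows are located,
# but the query text and its result stay the same.
query = "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name"
before = conn.execute(query).fetchall()
conn.execute("CREATE INDEX idx_employee_dept ON employee(dept)")
after = conn.execute(query).fetchall()

print(directory_columns)
print(before == after)
```

The application code that issues the query does not change when the index is added, which is what physical data independence means in practice.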
Data Models
A data model is a set of concepts for describing data, the relationships among data items, and the constraints on them. The relational model, based on tables of rows and columns, dominates commercial database systems. Other models include the hierarchical model (tree-structured), the network model (graph-structured), the object-oriented model, and the document model used by many NoSQL systems. The entity-relationship (ER) model is widely used during design but is ultimately mapped to one of the implementation models.
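To make the contrast between two of these models concrete, the sketch below represents one order first as a nested document and then as rows in two related tables. It assumes Python's built-in sqlite3 module for the relational side; the order data and the orders and order_item tables are hypothetical. The same question, the total quantity ordered, is answered by walking the nested structure in one case and by a declarative query in the other.

```python
import sqlite3

# Document model: one self-contained, nested record.
order_doc = {
    "order_id": 1,
    "customer": "Asha",
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}
doc_total_qty = sum(item["qty"] for item in order_doc["items"])

# Relational model: the same information as rows in two related tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_item ("
    "order_id INTEGER REFERENCES orders(order_id), sku TEXT, qty INTEGER)"
)
conn.execute(
    "INSERT INTO orders VALUES (?, ?)",
    (order_doc["order_id"], order_doc["customer"]),
)
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?)",
    [(order_doc["order_id"], i["sku"], i["qty"]) for i in order_doc["items"]],
)
rel_total_qty = conn.execute(
    "SELECT SUM(qty) FROM order_item WHERE order_id = ?",
    (order_doc["order_id"],),
).fetchone()[0]

print(doc_total_qty, rel_total_qty)
```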
Database Users and Administrators
Several classes of users interact with a database: naive users access the system through forms and reports; application programmers write software that accesses the database; sophisticated users write ad-hoc queries; and the Database Administrator (DBA) is responsible for creating schemas, granting access, tuning performance, managing backups, and enforcing security. The DBA's role is critical to the long-term health of any production database.
DBMS Architecture
The three-schema architecture (ANSI/SPARC) separates internal, conceptual, and external schemas to enforce data independence. Physically, a DBMS may be organized as a centralized single-server system, a client-server system where clients send queries to a shared server, a parallel system exploiting many processors, or a distributed system spanning multiple geographically separate sites. Cloud-based Database-as-a-Service platforms are a modern variant that combines elements of several of these architectures.
Components of a DBMS
Internally, a DBMS consists of a query processor that parses, optimizes, and executes queries; a storage manager that handles buffer management, file organization, and disk access; a transaction manager that provides the ACID guarantees (atomicity, consistency, isolation, and durability); and utilities for backup, recovery, and security. Understanding these components helps developers write efficient queries and diagnose performance problems.
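Two of these components can be observed from the outside, as the following sketch suggests. It assumes Python's built-in sqlite3 module; the account table and the transfer helper are hypothetical. A transfer that would drive a balance negative violates a CHECK constraint, and the transaction manager rolls back the credit that had already been applied, so the failed transfer leaves no trace. The last statement asks the query processor how it plans to run a query; the wording of that plan differs between SQLite versions, so it is printed rather than interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit (hypothetical helper)."""
    try:
        # "with conn" commits if the block succeeds and rolls back on error.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the earlier credit was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

# Ask the query processor for its execution plan (output is version-dependent).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM account WHERE id = 1"
).fetchall()

print(ok, failed, balances)
print(plan)
```

After both calls the balances reflect only the successful transfer, which is the atomicity part of the ACID guarantees.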
Advantages and Challenges
DBMSs bring data consistency, reduced redundancy, improved security, backup and recovery, and concurrent access. They do, however, introduce cost, complexity, and a learning curve, and small applications may be better served by simpler storage. Choosing the right system for the application is itself an important part of database design.
Summary
This chapter has introduced databases and the DBMS, the benefits of a DBMS over file-based storage, the concepts of data abstraction and data models, the major user roles, and the typical architecture and components of a database system. Subsequent chapters develop the relational model, the SQL query language, and the internals that make robust, high-performance databases possible.