Probability Theory
Probability is the mathematical language of uncertainty. It quantifies how likely events are to occur and provides the foundation for statistical inference, risk analysis, machine learning, and decision theory. This chapter develops the basic vocabulary and rules.
Random Experiments and Sample Spaces
A random experiment is any process whose outcome is uncertain but reproducible in principle. The set of all possible outcomes is the sample space, denoted S or Ω. A single element is a sample point; a subset is an event. Tossing a coin has sample space {H, T}; rolling a die has {1, 2, 3, 4, 5, 6}.
Axiomatic Definition
Kolmogorov's axioms define a probability measure P satisfying (1) P(A) ≥ 0 for every event A; (2) P(S) = 1; (3) countable additivity: for disjoint events A1, A2, ..., P(∪ Ai) = Σ P(Ai). All familiar rules follow from these three axioms.
Classical and Empirical Probability
The classical definition assumes equally likely outcomes: P(A) = (favourable cases)/(total cases). The empirical approach estimates probability by repeating the experiment and counting relative frequencies. Both align with the axiomatic framework in appropriate settings.
Addition and Complement Rules
For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are mutually exclusive, the intersection term vanishes. The complement rule P(Ac) = 1 − P(A) is often simpler than direct computation of P(A).
Conditional Probability
The conditional probability of A given B is P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0. It models how knowledge of B's occurrence changes the probability assigned to A. Two events are independent iff P(A ∩ B) = P(A) P(B), or equivalently P(A | B) = P(A).
Multiplication Rule
The multiplication rule P(A ∩ B) = P(A) P(B | A) generalizes to n events and is the workhorse of probability trees. It is especially useful when modelling sequential experiments like drawing cards without replacement.
Total Probability and Bayes' Theorem
If {B1, B2, ..., Bn} partitions S, the law of total probability states P(A) = Σ P(A | Bi) P(Bi). Bayes' theorem inverts conditional probability: P(Bk | A) = P(A | Bk) P(Bk) / Σ P(A | Bi) P(Bi). It is central to medical diagnosis, spam filtering, and Bayesian inference.
Counting Techniques
Many probabilities reduce to counting. Permutations n! and nPr arise when order matters; combinations nCr when it does not. The multiplication principle and pigeonhole principle also find frequent use. For complicated problems, generating functions and inclusion–exclusion extend the basic toolbox.
Examples in Computing
Hash tables rely on probabilistic analysis of collisions; randomized algorithms use probability to bound running times; packet loss on networks is modelled probabilistically. Probability pervades modern computer science.
Summary
Probability theory formalizes uncertainty through sample spaces, events, and a measure satisfying Kolmogorov's axioms. Conditional probability and Bayes' theorem describe how information updates belief, while counting and independence ease calculation. Mastery of these ideas prepares the ground for random variables and statistical inference.