Probability Theory

Probability is the mathematical language of uncertainty. It quantifies how likely events are to occur and provides the foundation for statistical inference, risk analysis, machine learning, and decision theory. This chapter develops the basic vocabulary and rules.

Random Experiments and Sample Spaces

A random experiment is any process whose outcome is uncertain but reproducible in principle. The set of all possible outcomes is the sample space, denoted S or Ω. A single element is a sample point; a subset is an event. Tossing a coin has sample space {H, T}; rolling a die has {1, 2, 3, 4, 5, 6}.

Axiomatic Definition

Kolmogorov's axioms define a probability measure P satisfying (1) P(A) ≥ 0 for every event A; (2) P(S) = 1; (3) countable additivity: for disjoint events A₁, A₂, ..., P(∪ A_i) = Σ P(A_i). All familiar rules follow from these three axioms.

Classical and Empirical Probability

The classical definition assumes equally likely outcomes: P(A) = (favourable cases)/(total cases). The empirical approach estimates probability by repeating the experiment and counting relative frequencies. Both align with the axiomatic framework in appropriate settings.

Addition and Complement Rules

For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are mutually exclusive, the intersection term vanishes. The complement rule P(A^c) = 1 − P(A) is often simpler than direct computation of P(A).

Conditional Probability

The conditional probability of A given B is P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0. It models how knowledge of B's occurrence changes the probability assigned to A. Two events are independent iff P(A ∩ B) = P(A) P(B), or equivalently P(A | B) = P(A).

Multiplication Rule

The multiplication rule P(A ∩ B) = P(A) P(B | A) generalizes to n events and is the workhorse of probability trees. It is especially useful when modelling sequential experiments like drawing cards without replacement.

Total Probability and Bayes' Theorem

If {B₁, B₂, ..., B_n} partitions S, the law of total probability states P(A) = Σ P(A | B_i) P(B_i). Bayes' theorem inverts conditional probability: P(B_k | A) = P(A | B_k) P(B_k) / Σ P(A | B_i) P(B_i). It is central to medical diagnosis, spam filtering, and Bayesian inference.

Counting Techniques

Many probabilities reduce to counting. Permutations n! and nPr arise when order matters; combinations nCr when it does not. The multiplication principle and pigeonhole principle also find frequent use. For complicated problems, generating functions and inclusion–exclusion extend the basic toolbox.

Examples in Computing

Hash tables rely on probabilistic analysis of collisions; randomized algorithms use probability to bound running times; packet loss on networks is modelled probabilistically. Probability pervades modern computer science.

Summary

Probability theory formalizes uncertainty through sample spaces, events, and a measure satisfying Kolmogorov's axioms. Conditional probability and Bayes' theorem describe how information updates belief, while counting and independence ease calculation. Mastery of these ideas prepares the ground for random variables and statistical inference.

Probability Theory