Chapter 5

Sampling and Sampling Distributions

Statistics II · BCA · Updated Apr 15, 2026

Statistics draws conclusions about a population from a sample. To do so reliably, we must understand how samples behave. Sampling distributions describe the probabilistic behaviour of sample statistics across many hypothetical samples.

Population and Sample

A population is the entire collection of items of interest, described by parameters like μ and σ. A sample is a subset actually observed. Summaries of the sample, called statistics, estimate population parameters.
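The distinction between a parameter and a statistic can be made concrete with a small simulation. This is only a sketch: the population of heights, its size, the sample size of 100, and the seed are all made-up assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in cm (made-up values).
population = [rng.gauss(170, 8) for _ in range(10_000)]
mu = statistics.fmean(population)       # parameter: population mean
sigma = statistics.pstdev(population)   # parameter: population standard deviation

# The sample is the subset actually observed.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)        # statistic: estimates mu
s = statistics.stdev(sample)            # statistic: estimates sigma

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

In practice μ and σ are unknown; they can be computed here only because the whole population was simulated.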

Random Sampling

Simple random sampling selects n items such that every subset of size n has an equal chance of being chosen. Stratified, cluster, and systematic sampling are alternative designs used when the population has natural groupings or when simple random sampling is impractical. Good sampling design is the foundation of valid inference.
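As a rough illustration, simple random sampling and a proportionally allocated stratified sample might be sketched as follows. The function names, the population of 100 numbered units, and the two strata are hypothetical, and proportional allocation is only one of several ways to split a sample across strata.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for items in strata.values():
        k = max(1, round(fraction * len(items)))  # at least one unit per stratum
        sample.extend(rng.sample(items, k))
    return sample

population = list(range(1, 101))  # hypothetical units numbered 1..100
srs = simple_random_sample(population, 10, seed=1)

# Hypothetical grouping of the same units into two strata.
strata = {"low": list(range(1, 61)), "high": list(range(61, 101))}
strat = stratified_sample(strata, 0.1, seed=1)

print(sorted(srs))
print(sorted(strat))
```

The stratified draw always contains 6 units from the first stratum and 4 from the second, whereas the simple random sample may over- or under-represent either group by chance.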

Sources of Error

Sampling produces sampling error, variability simply due to chance. Non-sampling errors come from bias, measurement error, or non-response. Statistical theory quantifies sampling error but cannot fix systematic biases; careful design and data collection are essential.

Sampling Distribution of the Mean

If X1, ..., Xn are independent with mean μ and variance σ², the sample mean X̅ has mean μ and variance σ²/n. The standard error of the mean is σ/√n. Doubling the sample size halves the variance of the sample mean, but halving the standard error requires quadrupling n.
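The σ/√n rule can be checked by simulation. The sketch below assumes a normal population with μ = 50 and σ = 10 (values chosen only for illustration) and compares the spread of many simulated sample means with the theoretical standard error for n = 25 and n = 100.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, mu, sigma, rng):
    """Return `reps` sample means, each from n independent normal draws."""
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(0)
mu, sigma = 50.0, 10.0  # assumed population parameters
empirical_se = {}
for n in (25, 100):
    means = simulate_sample_means(n, 4000, mu, sigma, rng)
    empirical_se[n] = statistics.stdev(means)  # spread of the sample means
    # Columns: n, simulated standard error, theoretical sigma / sqrt(n)
    print(n, round(empirical_se[n], 3), round(sigma / math.sqrt(n), 3))
```

Going from n = 25 to n = 100 quadruples the sample size, so the simulated standard error should fall from about 2 to about 1.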

Central Limit Theorem

For large n, the sample mean of independent, identically distributed observations is approximately normal regardless of the population distribution, provided the population variance is finite. This central limit theorem (CLT) is the foundation of most inferential techniques and justifies using the normal distribution for standardized sample means.
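One way to see the theorem at work is to start from a strongly skewed population and watch the skewness of the sample mean shrink as n grows. The sketch below assumes an exponential population with mean 1; the helper names and the choice of n = 1, 4, 64 are illustrative only.

```python
import random
import statistics

def skewness(xs):
    """Population-style skewness: mean of cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def sample_means(n, reps, rng):
    """Means of n draws from an exponential population (mean 1, right-skewed)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(0)
skew_by_n = {n: skewness(sample_means(n, 4000, rng)) for n in (1, 4, 64)}
for n, g in skew_by_n.items():
    print(n, round(g, 2))  # skewness moves toward 0 as n grows
```

For this population the skewness of the sample mean is 2/√n in theory, so the printed values should be near 2, 1, and 0.25; a normal distribution has skewness 0.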

Sampling Distribution of the Proportion

The sample proportion p̂ for a binary population has mean p and variance p(1 − p)/n. For large n, it is approximately normal with these parameters, enabling normal approximation inference for proportions and polls.
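For a poll, the normal approximation leads to the familiar margin of error. The sketch below uses a hypothetical poll of 1000 respondents with 520 in favour, and it assumes the simple plug-in interval in which p̂ replaces the unknown p in the standard error; other interval constructions exist and can behave better for small n or extreme p.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion (plug-in standard error)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)         # estimated sqrt(p(1 - p)/n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Hypothetical poll: 520 of 1000 respondents in favour.
p_hat, se, (low, high) = proportion_interval(520, 1000)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% interval = ({low:.3f}, {high:.3f})")
```

With these made-up numbers the margin of error is roughly three percentage points, which is why polls of about 1000 people are commonly reported as accurate to "plus or minus 3%".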

Sampling Distribution of the Variance

If the population is normal, (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom, where S² is the sample variance. This result underpins confidence intervals and tests for variances and is central to quality control.
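A chi-square distribution with k degrees of freedom has mean k and variance 2k, which gives a simple check on the scaled sample variance (n − 1)S²/σ². The sketch below assumes a normal population with σ = 3 and samples of size n = 10, so the simulated values should have mean near 9 and variance near 18.

```python
import random
import statistics

def scaled_variances(n, reps, sigma, rng):
    """Simulate (n - 1) * S^2 / sigma^2 for normal samples of size n."""
    out = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance with divisor n - 1
        out.append((n - 1) * s2 / sigma ** 2)
    return out

rng = random.Random(0)
n = 10
values = scaled_variances(n, 5000, sigma=3.0, rng=rng)
sim_mean = statistics.fmean(values)
sim_var = statistics.variance(values)
print(round(sim_mean, 2), round(sim_var, 2))  # compare with n - 1 = 9 and 2(n - 1) = 18
```

The result depends on normality: repeating the experiment with a heavily skewed population would generally give a variance different from 2(n − 1).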

Finite Population Correction

When sampling without replacement takes a substantial fraction of a finite population of size N, the variance of the sample mean is multiplied by the finite population correction factor (N − n)/(N − 1), so the standard error is multiplied by its square root. The correction reduces the standard error when the sample is large relative to the population.
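The size of the correction is easy to compute directly. The sketch below uses hypothetical values σ = 15, n = 200, and N = 1000, so the sample covers 20% of the population; the function name is an assumption for illustration.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), times sqrt((N - n)/(N - 1)) when a population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return se

sigma, n, N = 15.0, 200, 1000  # hypothetical values
se_infinite = standard_error_of_mean(sigma, n)
se_finite = standard_error_of_mean(sigma, n, N)
print(round(se_infinite, 4), round(se_finite, 4))
```

Here the correction lowers the standard error by roughly 10%. When n is a small fraction of N the factor is close to 1 and is usually ignored.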

Bootstrap Methods

Modern computers enable resampling approaches. The bootstrap repeatedly samples with replacement from the observed data to approximate the sampling distribution of almost any statistic. It replaces analytical formulas with simulation and handles complicated estimators easily.
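A minimal bootstrap for the sample median, a statistic with no simple standard error formula, might look like the sketch below. The twelve data values are made up, 2000 resamples is an arbitrary choice, and the percentile interval shown is only one of several bootstrap interval methods.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, reps=2000, seed=0):
    """Recompute `statistic` on `reps` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(reps)]

# Made-up observations for illustration.
data = [12, 15, 9, 20, 14, 17, 11, 30, 13, 16, 10, 18]
medians = bootstrap_distribution(data, statistics.median)

boot_se = statistics.stdev(medians)       # bootstrap standard error of the median
cuts = statistics.quantiles(medians, n=40)
lo, hi = cuts[0], cuts[-1]                # 2.5th and 97.5th percentiles
print(f"median = {statistics.median(data)}, SE = {boot_se:.2f}, interval = ({lo}, {hi})")
```

Passing a different function in place of `statistics.median` gives the bootstrap distribution of that statistic with no other change to the code.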

Summary

Sample statistics fluctuate from sample to sample, and that fluctuation is quantified by sampling distributions. The CLT, standard errors, and resampling methods allow us to reason about how confident we can be in estimates drawn from data.
