Hypothesis Testing
Hypothesis testing is a formal procedure for deciding between two competing statements about a population based on sample evidence. It underlies quality control, scientific discovery, and A/B testing in industry.
Null and Alternative Hypotheses
The null hypothesis H0 typically represents a default or no-effect claim, such as H0: μ = μ0. The alternative hypothesis H1 is the claim we suspect may be true instead. Tests can be two-sided (H1: μ ≠ μ0) or one-sided (H1: μ > μ0 or H1: μ < μ0).
Test Statistic and Decision Rule
A test statistic is a function of the sample that measures how inconsistent the data are with H0. The rejection region is the set of statistic values that lead to rejecting H0. The significance level α is the probability of rejecting H0 when it is true (a Type I error); a common choice is α = 0.05.
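One way to see what α means is to simulate data for which H0 is exactly true and count how often a test rejects. The sketch below assumes a two-sided z-test (as in the Z-test section) applied to normal samples with known σ. The function name type_i_error_rate and its defaults (n = 30, 20,000 replications, fixed seed) are illustrative choices, not part of any library.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05, reps=20_000, seed=1):
    """Estimate how often a two-sided z-test rejects H0: mu = mu0 when H0 is true.

    Illustrative helper (hypothetical name); assumes normal data with known sigma.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # boundary of the rejection region
    rejections = 0
    for _ in range(reps):
        # Sample from the null distribution itself, so every rejection is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:  # statistic falls in the rejection region
            rejections += 1
    return rejections / reps

rate = type_i_error_rate()
print(f"empirical Type I error rate: {rate:.3f}")  # should be close to alpha = 0.05
```

The empirical rate fluctuates around α from run to run (here it is fixed by the seed); it does not depend on n, because the test is calibrated under H0.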
Type I and Type II Errors
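A Type I error rejects a true H0; a Type II error fails to reject a false H0. If β denotes the probability of a Type II error, the power of the test is 1 − β, the probability of correctly rejecting a false H0. For a fixed sample size there is a trade-off: decreasing α increases β. Power analysis determines the sample size n needed to reach a target power at a given α and effect size.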
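For the simplest case, a one-sided z-test of H0: μ = μ0 against H1: μ > μ0 with known σ, power and the required sample size have closed forms. If the true mean is μ0 + δ, power = 1 − Φ(z_α − δ√n/σ), and the smallest n reaching power 1 − β is ⌈((z_α + z_β)σ/δ)²⌉. The sketch below assumes exactly that setting; the function names are hypothetical, and other tests need different formulas or simulation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided_z(delta, sigma, n, alpha):
    """Power of the one-sided z-test (H1: mu > mu0) when the true mean is mu0 + delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = delta * math.sqrt(n) / sigma  # mean of Z when the true mean is mu0 + delta
    return 1 - STD_NORMAL.cdf(z_alpha - shift)

def required_n(delta, sigma, alpha, target_power):
    """Smallest n giving at least target_power (= 1 - beta) for the same one-sided z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(target_power)  # value exceeded with probability beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n = required_n(delta=0.5, sigma=1.0, alpha=0.05, target_power=0.80)
print(n)  # 25
print(round(power_one_sided_z(0.5, 1.0, n, 0.05), 3))
print(round(power_one_sided_z(0.5, 1.0, n, 0.01), 3))  # smaller alpha, lower power
```

The last two lines show the trade-off described above: with n held at 25, tightening α from 0.05 to 0.01 lowers the power.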
p-value
The p-value is the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. Reject H0 if p < α. Small p-values indicate strong evidence against H0. Report p-values together with effect sizes and context: statistical significance is not the same as practical importance.
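What counts as "at least as extreme" depends on H1. For a statistic that is standard normal under H0, the three cases can be sketched as follows; the function name z_p_value and its alternative labels are illustrative, not a library API.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """P-value of an observed z statistic, assuming Z ~ N(0, 1) under H0."""
    phi = NormalDist().cdf
    if alternative == "two-sided":  # H1: mu != mu0, extreme means large |Z|
        return 2 * (1 - phi(abs(z)))
    if alternative == "greater":    # H1: mu > mu0, extreme means large Z
        return 1 - phi(z)
    if alternative == "less":       # H1: mu < mu0, extreme means small Z
        return phi(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

alpha = 0.05
p = z_p_value(2.1)
print(f"p = {p:.4f}, reject H0: {p < alpha}")
```

A two-sided p-value is twice the one-sided tail area, so the same observed z can be significant one-sided but not two-sided.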
Z-test for Means
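When σ is known and the population is normal (or n is large enough for the central limit theorem to apply), Z = (X̅ − μ0)/(σ/√n) follows the standard normal distribution under H0, exactly in the first case and approximately in the second. For a two-sided test, reject H0 when |Z| > z_{α/2}; for a one-sided test, reject when Z > z_α (H1: μ > μ0) or Z < −z_α (H1: μ < μ0). Here z_q denotes the value exceeded with probability q by a standard normal variable.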
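The rejection-region form of the decision rule can be sketched directly from summary statistics. This assumes σ is known; the function name z_test and the example numbers (n = 100, σ = 15, observed mean 103, H0: μ = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z-test using critical values; returns (z, critical value, reject?)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std = NormalDist()
    if alternative == "two-sided":
        crit = std.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    elif alternative == "greater":
        crit = std.inv_cdf(1 - alpha)      # z_alpha
        reject = z > crit
    elif alternative == "less":
        crit = std.inv_cdf(1 - alpha)      # z_alpha; reject in the lower tail
        reject = z < -crit
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, crit, reject

# Hypothetical data summary: n = 100, known sigma = 15, observed mean 103, H0: mu = 100.
z, crit, reject = z_test(xbar=103, mu0=100, sigma=15, n=100)
print(z, round(crit, 3), reject)  # 2.0 1.96 True
```

Comparing |Z| with z_{α/2} gives the same decision as comparing the two-sided p-value with α; the two forms are interchangeable.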
t-test for Means
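When σ is unknown, it is estimated by the sample standard deviation S, and T = (X̅ − μ0)/(S/√n) follows Student's t distribution with n − 1 degrees of freedom under H0, provided the population is normal; for larger n the test is fairly robust to moderate non-normality. Two-sample t-tests compare the means of two independent groups, using either the pooled variant (which assumes equal population variances) or Welch's variant (which does not).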
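Welch's variant replaces the pooled variance with separate per-group variances and uses the Welch–Satterthwaite approximation for the degrees of freedom, which are generally not an integer. The standard-library sketch below computes the statistic and df, and approximates the two-sided p-value by integrating the t density with Simpson's rule; that numerical integration is an illustrative stand-in for a proper t CDF (in practice a library routine such as SciPy's scipy.stats.ttest_ind(x, y, equal_var=False) is the usual choice). All function names here are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (df may be non-integer)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central)

def welch_t_test(x, y):
    """Welch's two-sample t-test: returns (t, Welch-Satterthwaite df, two-sided p-value)."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x); variance() divides by n - 1
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, df, t_two_sided_p(t, df)

t, df, p = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)  # -1.0 8.0
```

With equal sample sizes and equal sample variances, as in this toy example, Welch's df reduces to the pooled value n1 + n2 − 2 = 8.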
Tests for Proportions
For large n, the one-sample proportion z-test statistic for H0: p = p0 is Z = (p̂ − p0)/√(p0(1 − p0)/n), which is approximately standard normal under H0. Two-sample proportion tests compare two independent proportions, often in A/B testing.
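Both proportion tests can be sketched with the normal approximation. The two-sample version shown here uses the pooled proportion to estimate the standard error under H0: p1 = p2, which is one common choice; the function names and the A/B counts are hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z(successes, n, p0):
    """One-sample proportion z-test of H0: p = p0 (large-sample normal approximation)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def two_prop_z(x1, n1, x2, n2):
    """Two-sample proportion z-test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B test: 120 of 1000 users convert on variant A, 150 of 1000 on variant B.
z, p = two_prop_z(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```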
Chi-Square Tests
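The chi-square goodness-of-fit test assesses whether observed category frequencies match those expected under a hypothesized distribution. The chi-square test of independence evaluates whether two categorical variables are related, using a contingency table. Both use the statistic χ² = Σ (O − E)²/E, summed over all categories or cells, where O and E are the observed and expected counts. Under H0 it approximately follows a chi-square distribution with k − 1 degrees of freedom for goodness of fit with k categories (when no parameters are estimated from the data) and (r − 1)(c − 1) degrees of freedom for an r × c table.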
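Both statistics reduce to the same sum over cells; the difference lies in where E comes from. In the independence test, E for each cell is (row total × column total)/grand total. The sketch below computes the statistic and degrees of freedom only; the function names are hypothetical, and a p-value would come from the chi-square upper tail, for example via SciPy's scipy.stats.chi2.sf(stat, df).

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic and df (assumes no parameters were estimated from the data)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_square_independence(table):
    """Test-of-independence statistic and df for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # E under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

print(chi_square_gof([30, 10], [20, 20]))  # (10.0, 1)
# About 6.67 with df = 1, above 3.841, the 5% critical value for one degree of freedom.
print(chi_square_independence([[10, 20], [20, 10]]))
```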
ANOVA Overview
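Analysis of Variance (ANOVA) extends the two-sample t-test to three or more groups by partitioning total variability into between-group and within-group components. The F-statistic is the ratio of the between-group mean square to the within-group mean square. Under H0 (all group means equal, with normal populations of equal variance), it follows an F distribution with k − 1 and N − k degrees of freedom, where k is the number of groups and N is the total sample size.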
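The partition can be computed directly: the between-group sum of squares measures how far group means sit from the grand mean, and the within-group sum of squares measures scatter inside each group. The one-way sketch below returns F and its degrees of freedom; the function name is hypothetical, and SciPy's scipy.stats.f_oneway is one library routine that also returns a p-value.

```python
from statistics import fmean

def one_way_anova(groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

With exactly two groups, F equals the square of the pooled two-sample t statistic, which is the sense in which ANOVA extends the t-test.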
Summary
Hypothesis testing translates questions into decisions. Choosing the correct test (z, t, chi-square, F), controlling Type I error, reporting p-values with effect sizes, and understanding power are essential to statistical practice.