Hypothesis Testing

Hypothesis testing is a formal procedure for deciding between two competing statements about a population based on sample evidence. It underlies quality control, scientific discovery, and A/B testing in industry.

Null and Alternative Hypotheses

The null hypothesis H₀ typically represents a default or no-effect claim, such as μ = μ₀. The alternative hypothesis H₁ is what we suspect may be true instead. Tests can be two-sided (H₁: μ ≠ μ₀) or one-sided.

Test Statistic and Decision Rule

A test statistic measures how inconsistent the sample is with H₀. The rejection region is the set of statistic values leading to rejection of H₀. The significance level α is the probability of rejecting H₀ when it is true (a Type I error); commonly α = 0.05.

Type I and Type II Errors

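A Type I error rejects a true H₀; a Type II error fails to reject a false H₀. Writing β for the probability of a Type II error, the power of a test is 1 − β, the probability of correctly rejecting a false H₀. The two error rates trade off: for a fixed sample size and effect size, decreasing α increases β. Power analysis determines the sample size n needed to reach a target power at a given α and effect size.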
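
The trade-off between α, power, and n can be made concrete for one simple case. The sketch below assumes an upper-tailed z-test with known σ and a true mean shift of delta = μ₁ − μ₀ > 0; the function names and default values are illustrative choices, not part of the text above.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean is mu0 + delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # critical value of the test
    # Under the alternative, Z is normal with mean delta*sqrt(n)/sigma and sd 1.
    return 1 - STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)


def sample_size_one_sided_z(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n that reaches the target power for the same test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    n = sample_size_one_sided_z(delta=0.5, sigma=1.0)
    print(n, power_one_sided_z(0.5, 1.0, n))
    # Lowering alpha at the same n lowers power (raises the Type II error rate).
    print(power_one_sided_z(0.5, 1.0, n, alpha=0.01))
```

With these hypothetical inputs, tightening α from 0.05 to 0.01 at the same n reduces the power, which is the trade-off described above.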

p-value

The p-value is the probability, computed assuming H₀ is true, of obtaining a test statistic at least as extreme as the one observed. Reject H₀ if p < α. The smaller the p-value, the stronger the evidence against H₀. p-values should be reported with effect sizes and context; statistical significance is not the same as practical importance.
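
As an illustration of "at least as extreme", the sketch below converts an observed statistic into a p-value, assuming the statistic is standard normal under H₀. The function name and the labels "two-sided", "greater", and "less" are illustrative conventions chosen here.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """p-value for an observed z statistic that is N(0, 1) under H0."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        # Extreme in either tail: double the upper-tail area beyond |z|.
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":
        return 1 - cdf(z)  # upper tail only
    if alternative == "less":
        return cdf(z)      # lower tail only
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


if __name__ == "__main__":
    alpha = 0.05
    p = p_value_from_z(2.3)
    print(p, "reject H0" if p < alpha else "fail to reject H0")
```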

Z-test for Means

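When σ is known and either the population is normal or n is large, Z = (X̅ − μ₀)/(σ/√n) follows the standard normal distribution under H₀ (exactly in the normal case, approximately otherwise). A two-sided test rejects H₀ when |Z| > z_{α/2}; an upper-tailed one-sided test rejects when Z > z_α, where z_q denotes the upper-q critical value of the standard normal.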
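
The decision rule can be sketched in a few lines. The function name, argument names, and the numbers in the usage lines are hypothetical; the one-sided branch assumes an upper-tailed alternative (H₁: μ > μ₀).

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, two_sided=True):
    """Return (z, critical value, reject?) for a z-test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if two_sided:
        crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    else:
        crit = NormalDist().inv_cdf(1 - alpha)      # z_alpha, upper tail
        reject = z > crit
    return z, crit, reject


if __name__ == "__main__":
    # Hypothetical data: sample mean 103 from n = 100, known sigma = 15.
    z, crit, reject = z_test_mean(xbar=103, mu0=100, sigma=15, n=100)
    print(z, crit, reject)
```

For these made-up inputs Z = 3/1.5 = 2.0, which exceeds the two-sided critical value of about 1.96, so H₀ is rejected at α = 0.05.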

t-test for Means

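When σ is unknown, it is replaced by the sample standard deviation S. If the population is normal, T = (X̅ − μ₀)/(S/√n) follows Student's t with n − 1 degrees of freedom under H₀; this matters most for small to moderate n, since the t distribution approaches the standard normal as n grows. Two-sample t-tests compare the means of two independent groups, using the pooled-variance version when the group variances can be assumed equal and Welch's version otherwise.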
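
A minimal sketch of the two statistics follows, using only the Python standard library. It computes the test statistic and degrees of freedom but not the p-value, since the standard library has no Student's t distribution; in practice a statistics package would supply that step. The function names are illustrative, and the Welch degrees of freedom use the Welch-Satterthwaite approximation, which the text above does not spell out.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (T, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))  # stdev uses n - 1
    return t, n - 1


def welch_t(a, b):
    """Return (T, approximate df) for two independent samples, unequal variances."""
    va = stdev(a) ** 2 / len(a)  # estimated variance of mean(a)
    vb = stdev(b) ** 2 / len(b)  # estimated variance of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    print(one_sample_t([1, 2, 3, 4, 5], mu0=2))
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```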

Tests for Proportions

For large n, the proportion z-test statistic for H₀: p = p₀ is Z = (p̂ − p₀)/√(p₀(1 − p₀)/n), which is approximately standard normal under H₀. Two-sample proportion tests compare two independent proportions, often in A/B testing.
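
Both tests can be sketched as follows. The one-sample function implements the statistic given above; the two-sample function assumes the common pooled-proportion form of the standard error under H₀: p₁ = p₂, which is one standard choice rather than something stated in the text. Function names and the A/B counts are hypothetical, and both functions return two-sided p-values.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def two_prop_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 = p2, pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    print(one_prop_z(successes=60, n=100, p0=0.5))
    # Hypothetical A/B test: 120 of 1000 conversions vs 150 of 1000.
    print(two_prop_z(120, 1000, 150, 1000))
```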

Chi-Square Tests

The chi-square goodness-of-fit test assesses whether observed frequencies match those expected under a hypothesized distribution. The chi-square test of independence evaluates whether two categorical variables are related using a contingency table. Both use χ² = Σ (O − E)²/E, where O and E are the observed and expected counts in each cell.
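
The statistic is straightforward to compute by hand. The sketch below returns χ² and its degrees of freedom for both tests; it assumes the usual conventions that expected counts in a contingency table are (row total × column total)/grand total, that the goodness-of-fit test has k − 1 degrees of freedom when no parameters are estimated, and that the independence test has (rows − 1)(columns − 1). Function names and counts are illustrative, and the p-value step (which needs the chi-square distribution) is left out.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: return (chi2, df) with df = k - 1."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


def chi_square_independence(table):
    """Independence in an r x c table of counts: return (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total  # expected under independence
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


if __name__ == "__main__":
    print(chi_square_gof([18, 22, 20], [20, 20, 20]))
    print(chi_square_independence([[10, 20], [20, 10]]))
```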

ANOVA Overview

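Analysis of Variance (ANOVA) extends the two-sample t-test to three or more groups by partitioning variability into between-group and within-group components. The F-statistic is the ratio of the between-group mean square to the within-group mean square; a large value suggests that at least one group mean differs from the others.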
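
The partition can be written out directly for the one-way case. The sketch below assumes k independent groups and returns the F-statistic with its degrees of freedom (k − 1 between groups, N − k within, where N is the total number of observations); the function name and data are illustrative, and the p-value from the F distribution is not computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

For this toy data both sums of squares equal 6, so F = (6/2)/(6/6) = 3 with 2 and 6 degrees of freedom.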

Summary

Hypothesis testing translates questions into decisions. Choosing the correct test (z, t, chi-square, F), controlling Type I error, reporting p-values with effect sizes, and understanding power are essentials of statistical practice.
