Chapter 5

Regression Analysis

Business Statistics · BBS · Updated Apr 23, 2026

Regression analysis establishes a mathematical relationship between a dependent variable and one or more independent variables. While correlation measures the strength of association, regression enables prediction — given the value of x, what is the expected value of y?

Concept of Regression

Sir Francis Galton introduced the term "regression" while studying the inheritance of height: tall parents tend to have children closer to average height (regression toward the mean). In business, regression is used to predict sales from advertising spend, estimate cost from production volume, and forecast revenue from economic indicators. The dependent variable (y) is what we want to predict; the independent variable (x) is the predictor.

Regression vs Correlation

Correlation measures mutual association and is symmetric (rxy = ryx). Regression treats one variable as dependent and the other as independent, so it is asymmetric (the line of y on x differs from the line of x on y). The analyst chooses this direction; a fitted regression does not by itself prove cause and effect. Correlation gives a single value; regression gives an equation. Correlation describes; regression predicts. Both assume a linear relationship, but regression goes further by modelling the relationship mathematically.

Simple Linear Regression

The regression equation is ŷ = a + bx, where ŷ is the predicted value of y, a is the y-intercept (the predicted value of y when x = 0), and b is the slope (the change in predicted y per unit change in x). There are two regression lines: y on x (y = a + bx, used to predict y from x) and x on y (x = c + dy, used to predict x from y). Both lines pass through the point (x̄, ȳ), and they coincide only when r = ±1 (perfect correlation).

Least Squares Method

The method of least squares finds a and b by minimising the sum of squared residuals: Σ(y − ŷ)². Normal equations: Σy = na + bΣx and Σxy = aΣx + bΣx². Solving: b = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] and a = ȳ − bx̄. The regression line always passes through the point (x̄, ȳ).
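
The formulas above translate directly into code. The following is a minimal sketch in Python, assuming the raw observations are available as two equal-length lists. The function name fit_line and the sample numbers are hypothetical and are not taken from this chapter's data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    # The four totals that appear in the normal equations.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: a = y-bar - b * x-bar
    return a, b


# Hypothetical illustration data (not from the chapter): exactly y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line, and substituting x̄ into the fitted equation returns ȳ.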

Properties of Regression Coefficients

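The product of the regression coefficients equals r²: byx × bxy = r². Both coefficients have the same sign, which is also the sign of r. If one coefficient exceeds 1 in absolute value, the other must be less than 1 in absolute value, because their product r² cannot exceed 1. The geometric mean of the coefficients gives the magnitude of r: r = ±√(byx × bxy), taking the sign shared by the coefficients. These properties provide accuracy checks on calculations.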
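
The product property can be seen by writing each coefficient in terms of r and the standard deviations. A short derivation follows; the symbols σx and σy (standard deviations of x and y) are introduced here for illustration and do not appear elsewhere in this section.

```latex
b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y}
\quad\Longrightarrow\quad
b_{yx}\, b_{xy} = r^{2}\,\frac{\sigma_y}{\sigma_x}\cdot\frac{\sigma_x}{\sigma_y} = r^{2}
```

Since the standard deviations are positive, each coefficient carries the sign of r, which is why the two coefficients always agree in sign.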

Standard Error of Estimate

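SE = √[Σ(y − ŷ)²/n]. The standard error of estimate is analogous to a standard deviation measured around the regression line rather than around the mean. A smaller SE means the observed points lie closer to the line, so predictions are more accurate; SE = 0 means every point lies exactly on the line. Many textbooks divide by n − 2 instead of n when the line is fitted to sample data, so check which convention your syllabus uses.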
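
When only the column totals are available, the residual sum of squares can be obtained without computing each ŷ, using the identity Σ(y − ŷ)² = Σy² − aΣy − bΣxy, which follows from the normal equations. The sketch below applies it to the totals listed in this chapter's worked example (n = 6, ΣX = 48, ΣY = 322, ΣXY = 2732, ΣX² = 418, ΣY² = 17998). The function name and the divisor parameter are assumptions made for illustration.

```python
import math


def standard_error_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2, divisor=None):
    """Standard error of estimate for y on x, computed from column totals.

    divisor defaults to n (the formula used in this chapter); pass n - 2
    to follow the sample-based convention instead.
    """
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n

    # Residual sum of squares via the normal-equation identity.
    sse = sum_y2 - a * sum_y - b * sum_xy
    sse = max(sse, 0.0)  # guard against tiny negative values from rounding

    if divisor is None:
        divisor = n
    return math.sqrt(sse / divisor)


# Totals from the worked example in this chapter.
se = standard_error_from_sums(6, 48, 322, 2732, 418, 17998)
print(round(se, 2))
```

With the divisor n this comes to roughly 0.51 (in the same units as Y); passing divisor=4 (that is, n − 2) gives a slightly larger value of about 0.63.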

Prediction

Interpolation (predicting within the observed data range) is generally reliable. Extrapolation (predicting beyond that range) is risky, because the relationship may not hold outside the observed range.

Summary

Regression analysis — equations, least squares, and predictions — is one of the most powerful statistical tools in business for data-driven forecasting and decision-making.

Worked Example: Regression Line

Using the advertising (X) and sales (Y) data from the correlation chapter:

n = 6, ΣX = 48, ΣY = 322, ΣXY = 2732, ΣX² = 418, ΣY² = 17998

Regression of Y on X (predicting sales from advertising):

b = [nΣXY − ΣXΣY] / [nΣX² − (ΣX)²] = [6(2732) − 48(322)] / [6(418) − (48)²] = [16392 − 15456] / [2508 − 2304] = 936/204 = 4.588

a = ȳ − bx̄ = (322/6) − 4.588(48/6) = 53.67 − 4.588(8) = 53.67 − 36.71 = 16.96

Regression equation: Y = 16.96 + 4.588X
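
The arithmetic above can be checked with a few lines of Python. This sketch uses only the totals listed for the worked example; the variable names are chosen here for illustration.

```python
# Totals from the worked example.
n, sum_x, sum_y, sum_xy, sum_x2 = 6, 48, 322, 2732, 418

numerator = n * sum_xy - sum_x * sum_y      # 16392 - 15456
denominator = n * sum_x2 - sum_x ** 2       # 2508 - 2304

b = numerator / denominator                 # slope of Y on X
a = sum_y / n - b * (sum_x / n)             # intercept: y-bar - b * x-bar

print(numerator, denominator)               # 936 204
print(round(b, 3), round(a, 2))
```

Carrying the unrounded slope through to the intercept, as the code does, avoids the small rounding drift that can appear when b is rounded to three decimals first.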

Interpretation: When advertising = 0, expected sales = Rs 16.96 lakhs (base sales without advertising). For every Rs 1 lakh increase in advertising, predicted sales increase by Rs 4.588 lakhs. In other words, each additional rupee spent on advertising is associated with approximately Rs 4.59 of additional sales, which suggests a strong return, bearing in mind that sales revenue is not the same as profit.

Prediction: If the company plans to spend Rs 15 lakhs on advertising next month: Y = 16.96 + 4.588(15) = 16.96 + 68.82 = Rs 85.78 lakhs in expected sales. Note: this is extrapolation beyond the data range (max X was 12), so the prediction should be treated with caution.
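
A small helper can make the interpolation/extrapolation distinction explicit at prediction time. The sketch below uses the fitted equation Y = 16.96 + 4.588X and the observed maximum X = 12 stated above. The function name predict_sales is hypothetical, and because the minimum observed X is not given here, only the upper bound is checked.

```python
# Coefficients and observed maximum taken from the worked example.
A = 16.96
B = 4.588
X_MAX_OBSERVED = 12


def predict_sales(x, a=A, b=B, x_max=X_MAX_OBSERVED):
    """Return (predicted_y, is_extrapolation) for advertising spend x.

    is_extrapolation is True when x lies above the largest observed X.
    """
    y_hat = a + b * x
    return y_hat, x > x_max


y_hat, extrapolated = predict_sales(15)
print(round(y_hat, 2), extrapolated)
```

Returning the flag alongside the prediction lets the caller decide how to present a forecast that falls outside the observed range, rather than silently treating it like any other value.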

Properties to Verify

We can verify that byx × bxy equals r². We calculated byx = 4.588 and r = 0.999, so r² = 0.998. Let’s compute bxy: bxy = [nΣXY − ΣXΣY] / [nΣY² − (ΣY)²] = 936 / [6(17998) − (322)²] = 936 / [107988 − 103684] = 936/4304 = 0.2175. Check: 4.588 × 0.2175 = 0.998 ≈ r² ✔️. Also, taking the positive root because both coefficients are positive: r = √(4.588 × 0.2175) = √0.998 = 0.999 ✔️.
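
The same check can be run numerically from the totals of the worked example. This is a minimal sketch; the variable names are illustrative, and r is computed here directly from the totals rather than taken from the correlation chapter.

```python
import math

# Totals from the worked example.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 6, 48, 322, 2732, 418, 17998

cross = n * sum_xy - sum_x * sum_y        # shared numerator
spread_x = n * sum_x2 - sum_x ** 2        # denominator for b_yx
spread_y = n * sum_y2 - sum_y ** 2        # denominator for b_xy

b_yx = cross / spread_x
b_xy = cross / spread_y
r = cross / math.sqrt(spread_x * spread_y)

print(round(b_yx, 3), round(b_xy, 4), round(r, 3))
print(math.isclose(b_yx * b_xy, r ** 2))  # product of coefficients vs r squared
```

Both coefficients share the numerator, so they necessarily share its sign, and their product matches r² up to floating-point rounding.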

Exam Tips for Regression

Tip 1: Clearly state which regression line you are computing: Y on X or X on Y.
Tip 2: The regression line always passes through (x̄, ȳ), so verify your answer by substituting the means.
Tip 3: Distinguish interpolation (reliable, within data range) from extrapolation (risky, outside range).
Tip 4: If both regression coefficients are given, you can find r using r = ±√(byx × bxy).
Tip 5: Interpret the slope in business terms (e.g., "each additional unit of X increases Y by b units").
Tip 6: If asked for both regression lines, calculate and present both equations clearly.
