Correlation Analysis
Correlation measures the strength and direction of the linear relationship between two variables. In business, understanding correlations helps identify relationships between advertising and sales, price and demand, experience and salary, and many other paired variables.
Types of Correlation
By direction: positive correlation (both variables move in the same direction — more advertising, more sales), negative correlation (variables move in opposite directions — higher price, lower demand), and zero correlation (no linear relationship). By degree: perfect (r = ±1), strong (r close to ±1), moderate, weak, or no correlation (r = 0). By linearity: linear (constant rate of change) or non-linear (curvilinear relationship). Correlation does not imply causation — two variables may be correlated due to a third factor.
Scatter Diagrams
A scatter diagram plots paired observations as points on a graph (x-axis and y-axis). Visual patterns reveal: positive correlation (points slope upward), negative (downward), no correlation (scattered randomly), and non-linear patterns (curved). Scatter diagrams are always the first step — they reveal the nature and form of the relationship before calculating numerical coefficients. They also help identify outliers that could distort correlation calculations.
Karl Pearson's Coefficient of Correlation
Pearson's r measures the strength and direction of the linear relationship between two quantitative variables. Formula: r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² × Σ(y−ȳ)²]. Shortcut: r = [nΣxy − ΣxΣy] / {√[nΣx² − (Σx)²] × √[nΣy² − (Σy)²]}, where the whole product of the two square roots forms the denominator. Properties: −1 ≤ r ≤ +1, r = +1 means perfect positive linear relationship, r = −1 means perfect negative, r = 0 means no linear relationship. r is dimensionless (no units), symmetric (rxy = ryx), and unaffected by a change of origin or scale (only the sign flips if exactly one variable is multiplied by a negative constant).
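As a minimal sketch, the deviation form and the shortcut form quoted above can be checked against each other in Python. The function names and the five data pairs are hypothetical, chosen only for illustration. The sketch also checks the symmetry and change-of-origin-and-scale properties.

```python
import math

def pearson_r(xs, ys):
    """Deviation form: sum of (x - mean_x)(y - mean_y) over the root of the product of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_shortcut(xs, ys):
    """Shortcut form using raw sums; the two square roots together form the denominator."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_def = pearson_r(x, y)             # about 0.7746
r_short = pearson_r_shortcut(x, y)  # same value from the shortcut form
r_swapped = pearson_r(y, x)         # symmetry: r_xy equals r_yx

# Change of origin and (positive) scale leaves r unchanged.
u = [(xi - 3) / 2 for xi in x]
v = [10 * yi + 7 for yi in y]
r_shifted = pearson_r(u, v)

# Multiplying one variable by a negative constant flips only the sign.
r_negated = pearson_r(x, [-yi for yi in y])

print(round(r_def, 4), round(r_short, 4), round(r_shifted, 4), round(r_negated, 4))
```

The deviation form tends to be easier to reason about, while the shortcut form suits hand calculation from a table of sums.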
Interpreting r
General guidelines: |r| = 0.90-1.00 (very strong), 0.70-0.89 (strong), 0.50-0.69 (moderate), 0.30-0.49 (weak), 0.00-0.29 (very weak or no correlation). Coefficient of determination (r²) is the proportion of variance in y explained by x. If r = 0.8, then r² = 0.64, meaning 64% of the variation in y is explained by x.
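The guideline bands can be expressed as a small lookup, sketched here in Python. The function name is hypothetical, and treating each band's lower bound as a cut-off (so that 0.895, for instance, counts as strong) is an assumption, since the bands above are quoted only to two decimal places.

```python
def describe_correlation(r):
    """Return (strength label, r squared) using the guideline bands as lower-bound cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size >= 0.90:
        label = "very strong"
    elif size >= 0.70:
        label = "strong"
    elif size >= 0.50:
        label = "moderate"
    elif size >= 0.30:
        label = "weak"
    else:
        label = "very weak or no correlation"
    return label, r * r

label, r_squared = describe_correlation(0.8)
print(label, round(r_squared, 2))  # strong 0.64
```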
Spearman's Rank Correlation
Spearman's ρ (rho) measures correlation between ranked data. Formula: ρ = 1 − 6Σd² / [n(n² − 1)], where d is the difference between the two ranks of each item and n is the number of pairs. Use it when the data are ordinal, when the data do not meet normality assumptions, or when a quick approximation is needed. For tied values, assign each the average of the ranks they would otherwise occupy. Spearman's ρ is more robust to outliers than Pearson's r.
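A minimal Python sketch of the ranking step and the formula follows. The function names and the scores are hypothetical. Ties receive the average rank as described above, but the sketch applies the plain formula without any tie correction, so with many ties its result should be read as an approximation.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the difference in ranks."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical scores given to five candidates by two assessors.
scores_a = [86, 92, 75, 60, 81]
scores_b = [80, 95, 65, 70, 72]

tied_ranks = average_ranks([10, 20, 20, 30])
rho = spearman_rho(scores_a, scores_b)
print(tied_ranks)     # [1.0, 2.5, 2.5, 4.0]
print(round(rho, 3))  # 0.9
```

Ranking from smallest to largest (rather than best to worst) does not change ρ as long as both variables are ranked in the same direction.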
Probable Error
Probable Error (PE) = 0.6745 × (1 − r²)/√n, where n is the number of paired observations. If |r| > 6 × PE, the correlation is considered significant. If |r| < PE, it is considered insignificant. PE serves as a quick significance test.
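The rule can be sketched as a short Python helper. The function names are hypothetical, and the label "inconclusive" for values of |r| between PE and 6 × PE is an assumption of this sketch, since the rule above only names the two outer cases.

```python
import math

def probable_error(r, n):
    """PE = 0.6745 * (1 - r^2) / sqrt(n), with n the number of paired observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def pe_verdict(r, n):
    """Apply the probable-error rule of thumb to a correlation coefficient."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        return "significant"
    if abs(r) < pe:
        return "insignificant"
    return "inconclusive"  # assumed label for the zone the rule leaves open

# Hypothetical values of r and n, for illustration only.
print(round(probable_error(0.6, 25), 4), pe_verdict(0.6, 25))
print(round(probable_error(0.1, 16), 4), pe_verdict(0.1, 16))
print(round(probable_error(0.4, 16), 4), pe_verdict(0.4, 16))
```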
Limitations
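Correlation does not establish causation. Outliers can distort r substantially. r measures only linear relationships, so a strong curved relationship can produce an r near zero. Correlations computed on group-level (aggregate) data may not hold for individuals; assuming they do is the ecological fallacy. Always examine scatter diagrams alongside numerical measures.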
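Two of these limitations are easy to see numerically. The Python sketch below uses hypothetical data: a perfect but curved relationship (y = x²) that yields r = 0, and a weakly negative data set whose r becomes strongly positive once a single extreme point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Limitation 1: y depends perfectly on x, but not linearly.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [xi ** 2 for xi in x_curve]
r_curve = pearson_r(x_curve, y_curve)  # 0.0 despite perfect dependence

# Limitation 2: one outlier can dominate the coefficient.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 5, 1, 4, 2]
r_base = pearson_r(x_base, y_base)  # about -0.30

x_out = x_base + [20]  # add a single extreme observation
y_out = y_base + [20]
r_out = pearson_r(x_out, y_out)     # about 0.95

print(r_curve, round(r_base, 2), round(r_out, 2))
```

A scatter diagram would reveal both situations at a glance, which is why plotting the data is recommended before computing r.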
Summary
Correlation analysis — scatter diagrams, Pearson's r, Spearman's ρ, and coefficient of determination — quantifies relationships between variables. Fundamental for business analysis but must be interpreted carefully.
Worked Example: Karl Pearson’s Correlation
A company wants to know if advertising expenditure (X, in Rs lakhs) is related to sales revenue (Y, in Rs lakhs). Data for 6 months:
| Month | X (Advertising) | Y (Sales) | XY | X² | Y² |
|---|---|---|---|---|---|
| 1 | 5 | 40 | 200 | 25 | 1600 |
| 2 | 8 | 54 | 432 | 64 | 2916 |
| 3 | 6 | 45 | 270 | 36 | 2025 |
| 4 | 10 | 63 | 630 | 100 | 3969 |
| 5 | 7 | 48 | 336 | 49 | 2304 |
| 6 | 12 | 72 | 864 | 144 | 5184 |
| Total | ΣX=48 | ΣY=322 | ΣXY=2732 | ΣX²=418 | ΣY²=17998 |
Solution: r = [nΣXY − ΣXΣY] / {√[nΣX² − (ΣX)²] × √[nΣY² − (ΣY)²]}
r = [6(2732) − (48)(322)] / {√[6(418) − (48)²] × √[6(17998) − (322)²]}
r = [16392 − 15456] / {√[2508 − 2304] × √[107988 − 103684]}
r = 936 / (√204 × √4304) = 936 / √878016 ≈ 936 / 937.03 ≈ 0.999
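Interpretation: r = 0.999 indicates an almost perfect positive linear correlation between advertising expenditure and sales revenue: months with higher advertising spend had correspondingly higher sales. The coefficient of determination r² = 0.998 means 99.8% of the variation in sales is explained by its linear relationship with advertising. This is consistent with advertising driving sales, but correlation alone does not establish causation, so the result supports, rather than proves, the case for increasing the advertising budget.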
Significance test: PE = 0.6745 × (1 − 0.999²)/√6 = 0.6745 × 0.002/2.449 ≈ 0.00055. Since r (0.999) is far greater than 6 × PE (≈ 0.0033), the correlation is significant by the probable-error rule, so it is unlikely to be due to chance alone.
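The column totals and the value of r in this worked example can be reproduced with a short Python sketch. The variable names are illustrative; the data are the six months from the table.

```python
import math

# Data from the worked example (Rs lakhs).
advertising = [5, 8, 6, 10, 7, 12]
sales = [40, 54, 45, 63, 48, 72]

n = len(advertising)
sum_x = sum(advertising)                                 # 48
sum_y = sum(sales)                                       # 322
sum_xy = sum(x * y for x, y in zip(advertising, sales))  # 2732
sum_x2 = sum(x * x for x in advertising)                 # 418
sum_y2 = sum(y * y for y in sales)                       # 17998

numerator = n * sum_xy - sum_x * sum_y  # 936
x_term = n * sum_x2 - sum_x ** 2        # 204
y_term = n * sum_y2 - sum_y ** 2        # 4304

r = numerator / (math.sqrt(x_term) * math.sqrt(y_term))
print(round(r, 3), round(r * r, 3))  # 0.999 0.998
```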
Worked Example: Spearman’s Rank Correlation
Two judges ranked 7 contestants in a business plan competition:
| Contestant | Judge A Rank | Judge B Rank | d = R₁−R₂ | d² |
|---|---|---|---|---|
| P | 1 | 2 | −1 | 1 |
| Q | 2 | 1 | 1 | 1 |
| R | 3 | 4 | −1 | 1 |
| S | 4 | 3 | 1 | 1 |
| T | 5 | 6 | −1 | 1 |
| U | 6 | 5 | 1 | 1 |
| V | 7 | 7 | 0 | 0 |
| Total | | | | Σd² = 6 |
Solution: ρ = 1 − 6Σd² / [n(n² − 1)] = 1 − (6 × 6) / [7(49 − 1)] = 1 − 36/336 = 1 − 0.107 = 0.893
Interpretation: ρ = 0.893 indicates strong positive agreement between the two judges. They largely agree on the ranking of contestants, though there are minor differences in individual placements.
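The same calculation can be checked in a few lines of Python. The variable names are illustrative; the ranks are those in the table.

```python
# Ranks from the worked example, contestants P to V.
judge_a = [1, 2, 3, 4, 5, 6, 7]
judge_b = [2, 1, 4, 3, 6, 5, 7]

n = len(judge_a)
d = [a - b for a, b in zip(judge_a, judge_b)]  # rank differences
sum_d2 = sum(di ** 2 for di in d)              # 6

rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum(d), sum_d2, round(rho, 3))  # 0 6 0.893
```

The rank differences always sum to zero when both judges rank the same set of contestants, which gives a quick check on the d column.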
Common Exam Mistakes to Avoid
Mistake 1: Saying "correlation proves that X causes Y." Always state that correlation shows association, not causation.
Mistake 2: Mishandling the denominator of Pearson’s formula. Compute nΣx² − (Σx)² and nΣy² − (Σy)² separately, take the square root of each, and only then multiply. Do not confuse Σx² (the sum of squares) with (Σx)² (the square of the sum).
Mistake 3: Using Pearson’s r for ranked/ordinal data instead of Spearman’s ρ.
Mistake 4: Not interpreting r² (coefficient of determination) when asked about the meaning of the correlation value.
Mistake 5: Judging strength by the signed value. r = −0.9 is a stronger correlation than r = +0.5; compare absolute values and use the sign only for direction.
Mistake 6: Computing r when the scatter diagram clearly shows a non-linear relationship (r only measures linear correlation).
Nepal Business Applications
Banking: Correlation between interest rates and loan demand helps Nepal Rastra Bank (NRB) set monetary policy.
Tourism: Correlation between tourist arrivals and hotel revenue guides capacity planning.
Agriculture: Correlation between rainfall and crop yield helps farmers and policymakers plan irrigation investments.
Stock Market: Correlation between the NEPSE index and individual stock prices guides portfolio diversification, since combining stocks with low correlation reduces portfolio risk.