TOP EDU PREP · Statistics Series

Correlation Coefficient

Complete Unit · 20 Exam-Style Problems · Auto-Graded

SAT · AP Statistics · IB Math AA/AI · 40 min
⏱ Time
40:00

Core Concepts

01 · Definition

What is the Correlation Coefficient?

The Pearson correlation coefficient ($r$) measures the strength and direction of the linear relationship between two quantitative variables $X$ and $Y$. It always lies in the interval:

$$-1 \leq r \leq 1$$

A value of $r = 1$ indicates a perfect positive linear relationship; $r = -1$ a perfect negative linear relationship; and $r = 0$ indicates no linear relationship (though a non-linear relationship may still exist).

02 · Formula

Pearson's r: The Formula

$$r = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})^2 \cdot \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

Equivalently, using the sample standard deviations and the sample covariance:

$$r = \frac{S_{xy}}{S_x \cdot S_y} \qquad \text{where} \quad S_{xy} = \frac{1}{n-1}\sum(x_i-\bar{x})(y_i-\bar{y})$$
where:

  • $n$: sample size
  • $\bar{x},\, \bar{y}$: sample means
  • $S_x,\, S_y$: sample standard deviations
  • $S_{xy}$: sample covariance
03 · Interpretation

Strength Scale

  • −1 to −0.7: strong negative
  • −0.7 to −0.4: moderate negative
  • −0.4 to 0: weak negative
  • 0 to 0.4: weak positive
  • 0.4 to 0.7: moderate positive
  • 0.7 to 1: strong positive

Note: The exact cutoff values vary by textbook. The above (0.4 / 0.7) are commonly used in AP/IB contexts.
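
The scale can be expressed as a small lookup. This is a sketch only: the function name `describe_r` is hypothetical, the cutoffs are parameters because they vary by textbook, and assigning a boundary value such as $|r| = 0.7$ to the stronger band is an assumption.

```python
def describe_r(r, moderate=0.4, strong=0.7):
    """Return (strength, direction) for a correlation coefficient r.

    The cutoffs default to 0.4 and 0.7; boundary values go to the
    stronger band (an assumption, since textbooks differ).
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")

    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(-0.92))
print(describe_r(0.35))
```
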

04 · Scatter Patterns

Recognizing r from Scatter Plots

  • Perfect positive: $r = +1$
  • Strong positive: $r \approx +0.85$
  • No correlation: $r \approx 0$
  • Strong negative: $r \approx -0.85$
  • Perfect negative: $r = -1$

05 · r²

Coefficient of Determination: $r^2$

The square of the correlation coefficient, $r^2$, tells us the proportion of variance in $Y$ explained by the linear relationship with $X$.

$$r^2 = \frac{\text{Explained Variation (SSR)}}{\text{Total Variation (SST)}}$$

Example: If $r = 0.8$, then $r^2 = 0.64$, meaning 64% of the variability in $Y$ is explained by the linear relationship with $X$.
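
The ratio SSR/SST can be computed directly from a least-squares line and compared with $r^2$. The sketch below assumes simple linear regression of $Y$ on $X$; the helper name `r_and_variation_ratio` and the data are hypothetical.

```python
import math

def r_and_variation_ratio(xs, ys):
    """Return (r, SSR/SST) for the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)       # total variation

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)   # explained variation

    r = sxy / math.sqrt(sxx * sst)
    return r, ssr / sst

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, ratio = r_and_variation_ratio(xs, ys)
print(r ** 2, ratio)  # both are the proportion of variation explained
```
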

06 · Cautions

What r Does NOT Tell You

  • Correlation ≠ causation: A high $|r|$ does not imply that $X$ causes $Y$.
  • Non-linear relationships: $r$ only measures linear association. A curved pattern may have $r \approx 0$ yet show a strong relationship.
  • Outliers: A single extreme point can drastically alter $r$.
  • Restricted range: Limiting the range of $X$ or $Y$ typically reduces $|r|$.
  • Units invariant: $r$ has no units and does not change when you shift the data or rescale it by a positive constant (multiplying one variable by a negative constant flips the sign of $r$).
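
Two of these cautions are easy to see numerically. The sketch below uses made-up data: a perfect parabola, where $y$ is completely determined by $x$ yet $r = 0$, and a weakly correlated cluster to which a single extreme point is added. The helper `pearson_r` is defined here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-deviations form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Non-linear relationship: y = x^2 on a range symmetric about 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x * x for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Outlier: a weak cluster, then the same cluster plus one extreme point.
cluster_x = [1, 2, 3, 4, 5]
cluster_y = [3, 1, 4, 2, 3]
r_without = pearson_r(cluster_x, cluster_y)
r_with = pearson_r(cluster_x + [20], cluster_y + [20])

print(r_curve)            # no linear association despite an exact relationship
print(r_without, r_with)  # one point moves r from weak to strong
```
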

Memorize This

Essential Formulas

  • $r = \dfrac{S_{xy}}{S_x S_y}$  (correlation from covariance)
  • $S_{xy} = \dfrac{1}{n-1}\sum(x_i-\bar{x})(y_i-\bar{y})$
  • $-1 \le r \le 1$ always
  • $r^2$ = proportion of variance explained
  • Slope of regression: $b = r \cdot \dfrac{S_y}{S_x}$

Strength Thresholds (Common AP/IB Convention)

  • $|r| \ge 0.7$ → Strong correlation
  • $0.4 \le |r| < 0.7$ → Moderate correlation
  • $|r| < 0.4$ → Weak (or negligible) correlation
  • $r > 0$ → variables move in same direction
  • $r < 0$ → variables move in opposite directions
⚠️

Common Exam Traps

  • $r$ does not change if you add/subtract a constant from all data values.
  • $r$ does not change if you multiply all values by a positive constant. Multiplying by a negative constant flips the sign of $r$.
  • Swapping $X$ and $Y$ does not change the value of $r$.
  • A non-linear relationship can have $r = 0$; do not confuse "no linear correlation" with "no relationship."
  • Correlation is not causation; always check for lurking variables.
  • $r^2$ can never be negative; $r$ can be.
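
The transformation rules in this list can be confirmed with a quick numerical check. This is a sketch, reusing the study-hours data from Example 01; the helper `pearson_r` and the particular constants are chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-deviations form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [2, 3, 5, 7, 8]
ys = [50, 65, 75, 85, 90]
r = pearson_r(xs, ys)

# Adding or subtracting a constant leaves r unchanged.
r_shift = pearson_r([x + 10 for x in xs], [y - 5 for y in ys])
# Multiplying by positive constants leaves r unchanged.
r_scale = pearson_r([2 * x for x in xs], [3 * y for y in ys])
# Multiplying one variable by a negative constant flips the sign.
r_flip = pearson_r([-2 * x for x in xs], ys)
# Swapping the roles of X and Y leaves r unchanged.
r_swap = pearson_r(ys, xs)

print(r, r_shift, r_scale, r_flip, r_swap)
```
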
🧠

Quick Recall Checks

  • If $r = -0.92$: direction = negative, strength = strong
  • If $r = 0.35$: direction = positive, strength = weak
  • If $r = 0.6$, then $r^2 = \mathbf{0.36}$ → 36% variance explained
  • Regression line always passes through $(\bar{x},\, \bar{y})$
  • If both $x$ and $y$ values are doubled, $r$ stays the same
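
Two of the regression facts listed in this unit, the slope identity $b = r \cdot \dfrac{S_y}{S_x}$ and the line passing through $(\bar{x},\, \bar{y})$, can be checked with a short sketch. It assumes a least-squares line $\hat{y} = a + bx$ with intercept $a = \bar{y} - b\bar{x}$ (the symbol $a$ is introduced here) and reuses the data from Example 01.

```python
import math
import statistics

xs = [2, 3, 5, 7, 8]
ys = [50, 65, 75, 85, 90]

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

# Slope two ways: directly from least squares, and from r with the
# sample standard deviations (statistics.stdev divides by n - 1).
b_least_squares = sxy / sxx
b_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

# The intercept formula is what forces the line through the mean point.
a = y_bar - b_from_r * x_bar
prediction_at_mean = a + b_from_r * x_bar

print(b_least_squares, b_from_r)
print(prediction_at_mean, y_bar)
```
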

Worked Examples

EXAMPLE 01 · Calculation
Five students' study hours ($x$) and exam scores ($y$) are: $(2,50),\,(3,65),\,(5,75),\,(7,85),\,(8,90)$. Calculate the Pearson correlation coefficient $r$.
1. Compute means: $\bar{x} = \frac{2+3+5+7+8}{5} = 5$, $\quad \bar{y} = \frac{50+65+75+85+90}{5} = 73$.
2. Compute deviations and products:
$(x_i-\bar{x})$: $-3,\,-2,\,0,\,2,\,3$
$(y_i-\bar{y})$: $-23,\,-8,\,2,\,12,\,17$
Products: $69,\,16,\,0,\,24,\,51$ → $\sum = 160$
3. Compute the sums of squared deviations:
$\sum(x_i-\bar{x})^2 = 9+4+0+4+9 = 26$
$\sum(y_i-\bar{y})^2 = 529+64+4+144+289 = 1030$
4. Substitute into the formula:
$$r = \frac{160}{\sqrt{26 \times 1030}} = \frac{160}{\sqrt{26780}} \approx \frac{160}{163.6} \approx 0.978$$
Answer: r ≈ 0.978 (Strong Positive)
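
The arithmetic in this example can be reproduced step by step. The sketch below mirrors the four steps; the variable names are illustrative.

```python
import math

xs = [2, 3, 5, 7, 8]        # study hours
ys = [50, 65, 75, 85, 90]   # exam scores

# Step 1: means
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Step 2: deviations and their products
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]
sum_products = sum(a * b for a, b in zip(dx, dy))

# Step 3: sums of squared deviations
sum_dx_sq = sum(a * a for a in dx)
sum_dy_sq = sum(b * b for b in dy)

# Step 4: substitute into the formula
r = sum_products / math.sqrt(sum_dx_sq * sum_dy_sq)

print(x_bar, y_bar)
print(sum_products, sum_dx_sq, sum_dy_sq)
print(round(r, 3))
```
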
EXAMPLE 02 · r² Interpretation
A researcher finds $r = -0.75$ between daily screen time ($x$, hours) and sleep quality score ($y$). (a) Describe the relationship. (b) What percentage of variance in sleep quality is explained by screen time?
1. (a) $r = -0.75$: strong negative linear relationship. As screen time increases, sleep quality tends to decrease.
2. (b) $r^2 = (-0.75)^2 = 0.5625$, so 56.25% of the variability in sleep quality is explained by the linear relationship with screen time.
Answer: 56.25% of variance explained
EXAMPLE 03 · Transformation Effect
Dataset A has $r = 0.6$. A new dataset B is created by multiplying all $x$-values by $-2$ and all $y$-values by $3$. What is the correlation coefficient for dataset B?
1. Multiplying $y$ by $+3$ (positive): does not change the sign of $r$.
2. Multiplying $x$ by $-2$ (negative): flips the sign of $r$.
3. The magnitude of $r$ is unchanged (scaling has no effect on $|r|$). Therefore, $r_B = -0.6$.
Answer: r = −0.6
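
The rule used in this example follows from the covariance form of the formula. As a short derivation, assume every $x$-value is multiplied by a non-zero constant $a$ and every $y$-value by a non-zero constant $b$ (the symbols $a$ and $b$ are introduced here for illustration). The covariance is multiplied by $ab$, while the standard deviations are multiplied by $|a|$ and $|b|$:

$$r_B = \frac{ab\,S_{xy}}{|a|\,S_x \cdot |b|\,S_y} = \frac{ab}{|ab|} \cdot \frac{S_{xy}}{S_x \cdot S_y} = \frac{ab}{|ab|}\, r_A$$

With $a = -2$ and $b = 3$, $\dfrac{ab}{|ab|} = \dfrac{-6}{6} = -1$, so $r_B = -1 \times 0.6 = -0.6$.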

Practice Problems

Enter your answer and press Check. Round to 2 decimal places where needed.

Answer Key & Full Solutions