Statistics Mastery Workbook

Measures of Center & Spread

Descriptive statistics summarize the main features of a dataset using numerical measures and visualizations.

Mean: x̄ = (Σxᵢ) / n
Median: middle value when sorted (average of the two middle values when n is even)
Variance: s² = Σ(xᵢ − x̄)² / (n−1)
Std Dev: s = √s²
IQR = Q3 − Q1

★ Memorize This

Use median & IQR for skewed data or data with outliers; use mean & SD for roughly symmetric data
Outlier rule: below Q1 − 1.5×IQR or above Q3 + 1.5×IQR
Population variance divides by N; sample variance divides by n−1 (Bessel's correction)
Coefficient of Variation (CV) = (s / x̄) × 100%

📝 Example

Dataset: {3, 7, 7, 9, 11, 13}. Find mean, median, and IQR.

Sort: 3, 7, 7, 9, 11, 13 → n = 6

Mean = (3+7+7+9+11+13)/6 = 50/6 ≈ 8.33

Median = (7+9)/2 = 8

Q1 = median of lower half {3, 7, 7} = 7, Q3 = median of upper half {9, 11, 13} = 11 → IQR = 11 − 7 = 4
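The summary measures above can be checked with Python's standard `statistics` module. This is a minimal sketch. Quartile conventions differ between textbooks and software, so the helper `quartiles_by_halves` (an illustrative name, not a library function) assumes the median-of-halves method that gives the example's Q1 = 7 and Q3 = 11.

```python
import statistics

def quartiles_by_halves(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    When n is odd, the overall median is left out of both halves.
    This is one textbook convention; other quartile methods exist.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return statistics.median(lower), statistics.median(upper)

data = [3, 7, 7, 9, 11, 13]

mean = statistics.mean(data)        # (Σxᵢ) / n
median = statistics.median(data)    # average of the two middle values (n is even)
var = statistics.variance(data)     # sample variance, divides by n − 1
sd = statistics.stdev(data)         # s = √s²
q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1
cv = sd / mean * 100                # coefficient of variation, in percent

# 1.5 × IQR outlier fences
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"mean={mean:.2f}, median={median}, IQR={iqr}")
print(f"s²={var:.3f}, s={sd:.3f}, CV={cv:.1f}%")
print(f"fences=({low_fence}, {high_fence}), outliers={outliers}")
```

For comparison, `statistics.quantiles(data, n=4)` uses its default 'exclusive' method and returns Q1 = 6.0 and Q3 = 11.5 for this dataset, so an IQR reported without its quartile method may not match a textbook answer.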

Probability Rules

Probability measures the likelihood of events. Key rules govern how probabilities combine.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A | B) = P(A ∩ B) / P(B), for P(B) > 0
Independent: P(A ∩ B) = P(A) · P(B)
Complement: P(Aᶜ) = 1 − P(A)
Bayes: P(B|A) = P(A|B)·P(B) / P(A)

★ Memorize This

Mutually exclusive → P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
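Independent ≠ Mutually exclusive: if P(A) > 0 and P(B) > 0, mutually exclusive events cannot be independent, since P(A ∩ B) = 0 ≠ P(A) · P(B)
Conditional probability P(A | B) means "A given that B occurred": restrict the sample space to B
Bayes' Theorem reverses a conditional probability: it finds P(B | A) from P(A | B)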

📝 Example

P(A) = 0.4, P(B) = 0.3, P(A∩B) = 0.12. Are A and B independent?

Check: P(A)·P(B) = 0.4×0.3 = 0.12 = P(A∩B) ✓ Independent

Also: P(A|B) = 0.12/0.3 = 0.4 = P(A) ✓
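A small Python sketch of these rules, applied to the example's numbers. The function names are illustrative. `math.isclose` is used because a floating-point product such as 0.4 × 0.3 is not guaranteed to compare exactly equal to 0.12.

```python
import math

def union(p_a, p_b, p_ab):
    """Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)."""
    return p_a + p_b - p_ab

def complement(p_a):
    """Complement rule: P(Aᶜ) = 1 − P(A)."""
    return 1 - p_a

def conditional(p_ab, p_b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_ab / p_b

def bayes(p_a_given_b, p_b, p_a):
    """Bayes: P(B | A) = P(A | B) · P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

def independent(p_a, p_b, p_ab, tol=1e-9):
    """Check P(A) · P(B) == P(A ∩ B), allowing for float rounding."""
    return math.isclose(p_a * p_b, p_ab, abs_tol=tol)

p_a, p_b, p_ab = 0.4, 0.3, 0.12
print("independent:", independent(p_a, p_b, p_ab))
print("P(A|B) ≈", conditional(p_ab, p_b))            # should match P(A)
print("P(A∪B) ≈", union(p_a, p_b, p_ab))
print("P(B|A) ≈", bayes(conditional(p_ab, p_b), p_b, p_a))  # should match P(B)

# Mutually exclusive events with positive probabilities are NOT independent.
print("exclusive and independent?", independent(p_a, p_b, 0.0))
```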

Binomial & Normal Distributions

The binomial distribution models a discrete count (the number of successes in a fixed number of trials); the normal distribution models many continuous quantities.

Binomial: P(X=k) = C(n,k) · pᵏ · (1−p)^(n−k)
Binomial Mean: μ = np
Binomial Var: σ² = np(1−p)

Normal: X ~ N(μ, σ²)
Z-score: z = (x − μ) / σ
Empirical Rule (normal data): about 68% within 1σ of μ, 95% within 2σ, 99.7% within 3σ

★ Memorize This

Binomial conditions (BINS): Binary outcomes, Independent trials, fixed Number of trials n, Same success probability p
Normal distribution is symmetric about μ; mean = median = mode
Standardizing: Z-score tells you how many SDs from the mean
Standard Normal: Z ~ N(0, 1)
Sums of independent RVs: means add and variances add (SDs do not!)

📝 Example

X ~ Bin(10, 0.3). Find P(X = 3).

P(X=3) = C(10,3) · (0.3)³ · (0.7)⁷ = 120 · 0.027 · 0.0824 ≈ 0.2668

Mean = 10(0.3) = 3, SD = √(10·0.3·0.7) ≈ 1.449
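A minimal Python sketch (standard library only; `math.comb` needs Python 3.8+) that reproduces the binomial example, cross-checks μ = np and σ² = np(1−p) by summing over the PMF, and checks the Empirical Rule with `statistics.NormalDist`. The helper names `binom_pmf` and `z_score` are illustrative.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def z_score(x, mu, sigma):
    """Number of SDs that x lies from the mean."""
    return (x - mu) / sigma

n, p = 10, 0.3
p3 = binom_pmf(3, n, p)
mean = n * p                       # μ = np
sd = math.sqrt(n * p * (1 - p))    # σ = √(np(1 − p))
print(f"P(X=3) ≈ {p3:.4f}, mean = {mean:.1f}, SD ≈ {sd:.3f}")

# Cross-check the mean and variance formulas directly from the PMF.
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean_from_pmf = sum(k * q for k, q in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * q for k, q in enumerate(pmf))

# Empirical Rule on the standard normal Z ~ N(0, 1).
Z = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    print(f"P(|Z| < {k}) ≈ {Z.cdf(k) - Z.cdf(-k):.4f}")
```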

Sampling Distributions & CLT

The Central Limit Theorem (CLT) is foundational: for a population with finite variance, the distribution of sample means approaches a normal distribution as n increases, regardless of the population's shape.

Sampling Dist of x̄: mean = μ, SE = σ/√n
Rule of thumb: x̄ is approximately normal when n ≥ 30 (exactly normal for any n if the population is normal)
Sampling Dist of p̂: mean = p, SE = √[p(1−p)/n]
Conditions for p̂: np ≥ 10 and n(1−p) ≥ 10

★ Memorize This

Larger n → smaller SE → sampling distribution is narrower
SE = σ/√n is the standard error: the SD of the sampling distribution of x̄, not the SD of the population or of a single sample
CLT makes inference possible even for non-normal populations
Independence condition (sampling without replacement): sample size ≤ 10% of population

📝 Example

Population: μ=50, σ=12. Sample n=36. Find P(x̄ > 52).

SE = 12/√36 = 2; z = (52−50)/2 = 1.0

P(Z > 1.0) = 1 − 0.8413 = 0.1587
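The example's probability can be computed with `statistics.NormalDist`, and a short simulation illustrates the CLT. The exponential population (mean 10, SD 10), the random seed, and the number of repetitions are assumptions chosen for illustration only; they are not part of the worked example.

```python
import math
import random
import statistics
from statistics import NormalDist

# Worked example: μ = 50, σ = 12, n = 36.
mu, sigma, n = 50, 12, 36
se = sigma / math.sqrt(n)                  # SE = σ/√n
z = (52 - mu) / se
p_above = 1 - NormalDist(mu, se).cdf(52)   # P(x̄ > 52)
print(f"SE = {se}, z = {z}, P(x̄ > 52) ≈ {p_above:.4f}")

# CLT illustration: means of n = 36 draws from a skewed (exponential)
# population with mean 10 and SD 10 (assumed for illustration).
random.seed(1)
pop_mean = pop_sd = 10.0
reps = 5000
sample_means = [
    statistics.fmean([random.expovariate(1 / pop_mean) for _ in range(n)])
    for _ in range(reps)
]
print(f"mean of x̄ ≈ {statistics.fmean(sample_means):.2f} (theory {pop_mean})")
print(f"SD of x̄ ≈ {statistics.stdev(sample_means):.2f} "
      f"(theory σ/√n = {pop_sd / math.sqrt(n):.2f})")
```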

Confidence Intervals

A confidence interval gives a range of plausible values for a population parameter.

CI for μ (known σ): x̄ ± z* · (σ/√n)
CI for μ (unknown σ): x̄ ± t* · (s/√n), df = n−1
CI for p: p̂ ± z* · √[p̂(1−p̂)/n]
Margin of Error (ME) = critical value × SE (z* · SE, or t* · SE for t intervals)
Common z*: 90%→1.645, 95%→1.96, 99%→2.576

★ Memorize This

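A CI is wider with higher confidence, smaller n, or larger σ
Use t* whenever σ is unknown and estimated by s; the difference from z* matters most when n is small
Correct interpretation: "We are 95% confident the true parameter lies in [a, b]", meaning about 95% of intervals built this way capture the true parameter
Margin of Error = half the width of the interval
To halve ME: multiply n by 4 (ME is proportional to 1/√n)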

📝 Example

x̄ = 82, s = 10, n = 25. Construct a 95% CI for μ.

df = 24, t* ≈ 2.064

ME = 2.064 × (10/5) = 4.128

CI: (77.87, 86.13)
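A minimal sketch of the CI formulas in Python. The helper `mean_ci` is an illustrative name. The standard library has no t-distribution quantile function, so t* ≈ 2.064 for df = 24 is taken from a t-table as in the example, while the z* values are computed from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, spread, n, crit):
    """Return (lower, upper) for x̄ ± crit · spread/√n.

    Pass σ with z* when σ is known, or s with t* (df = n − 1) when it is not.
    """
    me = crit * spread / math.sqrt(n)
    return xbar - me, xbar + me

# z* for common confidence levels, from the standard normal.
for level in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: z* ≈ {z_star:.3f}")

# Worked example: x̄ = 82, s = 10, n = 25, t* ≈ 2.064 (df = 24, from a t-table).
lower, upper = mean_ci(82, 10, 25, 2.064)
print(f"95% CI ≈ ({lower:.2f}, {upper:.2f})")

# Holding the critical value fixed, quadrupling n halves the ME (ME ∝ 1/√n).
me_25 = 1.96 * 10 / math.sqrt(25)
me_100 = 1.96 * 10 / math.sqrt(100)
print(f"ME at n=25: {me_25:.2f}, ME at n=100: {me_100:.2f}")
```

With t intervals the halving is only approximate, because t* itself shrinks slightly as df grows.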

Hypothesis Testing

Hypothesis testing uses sample data to evaluate claims about population parameters.

H₀: null hypothesis (no effect/difference)
Hₐ: alternative hypothesis
Test Statistic (z): z = (x̄ − μ₀) / (σ/√n)
Test Statistic (t): t = (x̄ − μ₀) / (s/√n)
p-value: probability, assuming H₀ is true, of a test statistic at least as extreme as the one observed
Reject H₀ if p-value < α

★ Memorize This

Type I error (α): reject H₀ when H₀ is true (false positive)
Type II error (β): fail to reject H₀ when Hₐ is true (false negative)
Power = 1 − β = P(reject H₀ | Hₐ true)
Small p-value → strong evidence against H₀
Never "accept H₀" — only "fail to reject H₀"

📝 Example

H₀: μ = 100; Hₐ: μ ≠ 100. x̄ = 104, s = 12, n = 36, α = 0.05.

t = (104−100)/(12/6) = 4/2 = 2.0, df = 35

Two-tailed p-value ≈ 0.053 > 0.05

Decision: Fail to reject H₀
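Python's standard library has no t-distribution, so this sketch approximates the two-tailed p-value by numerically integrating the t density with Simpson's rule. The helper names are hypothetical. If SciPy is available, `2 * scipy.stats.t.sf(abs(t_stat), df)` should give essentially the same value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value 2 · P(T > |t|) via Simpson's rule on [0, |t|].

    Uses symmetry: P(T > |t|) = 0.5 − ∫₀^|t| f(x) dx. `steps` must be even.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)

# Worked example: H₀: μ = 100 vs Hₐ: μ ≠ 100.
xbar, mu0, s, n, alpha = 104, 100, 12, 36, 0.05
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1
p_value = t_two_tailed_p(t_stat, df)
decision = "reject H₀" if p_value < alpha else "fail to reject H₀"
print(f"t = {t_stat:.2f}, df = {df}, p ≈ {p_value:.3f} → {decision}")
```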

Linear Regression & Chi-Square Tests

Regression models relationships between quantitative variables; Chi-Square tests compare observed counts of categorical data with the counts expected under H₀.

Regression Line: ŷ = b₀ + b₁x
Slope: b₁ = r · (sy/sx)
Intercept: b₀ = ȳ − b₁x̄
Residual = y − ŷ
r² = proportion of variation in y explained by x

χ² = Σ (O − E)² / E
Chi-Square GOF df = k − 1
Chi-Square Independence df = (r−1)(c−1), for r rows and c columns

★ Memorize This

r is the correlation coefficient: −1 ≤ r ≤ 1
r close to ±1 = strong linear association; r near 0 = weak/no linear association
Correlation ≠ Causation
Residual plot should show no pattern (random scatter) for a good fit
Expected frequency in Chi-Square = (Row total × Column total) / Grand total
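A standard-library sketch of the χ² statistic, its degrees of freedom, and the expected-count rule. The 2 × 2 table below is hypothetical, for illustration only. Turning χ² into a p-value needs a χ² table or a library such as SciPy (`scipy.stats.chi2.sf`); note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic can differ from the plain formula.

```python
def expected_counts(observed):
    """E = (row total × column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(observed):
    """Return (χ² statistic, df) for an r × c contingency table."""
    expected = expected_counts(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

def chi_square_gof(observed, expected):
    """Goodness of fit: χ² = Σ (O − E)² / E with df = k − 1."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical 2 × 2 table: every expected count is 50 × 50 / 100 = 25.
table = [[20, 30],
         [30, 20]]
print(chi_square_independence(table))  # (4.0, 1)
```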

📝 Example

r = 0.8, sx = 5, sy = 10, x̄ = 3, ȳ = 20. Find the regression line.

b₁ = 0.8 × (10/5) = 1.6

b₀ = 20 − 1.6(3) = 15.2

Line: ŷ = 15.2 + 1.6x

Master Statistics
From Concept to Exam

Measures of Center & Spread

Probability Rules

Binomial & Normal Distributions

Sampling Distributions & CLT

Confidence Intervals

Hypothesis Testing

Linear Regression & Chi-Square Tests

20 Exam-Style Problems

Answer Key & Full Solutions
