📚 Study Guide
Concepts & Key Formulas
Master these concepts before tackling the 20 practice questions below.
Describe distributions using SOCS: Shape, Outliers, Center, Spread. Choose the measures of center and spread based on the shape.
Key Formulas
Mean: x̄ = Σxᵢ / n
IQR = Q3 − Q1
Outlier: x < Q1 − 1.5·IQR or x > Q3 + 1.5·IQR
s² = Σ(xᵢ − x̄)² / (n−1) [sample variance]
⭐ Memorize
- Skewed right → mean > median; skewed left → mean < median
- Symmetric → mean ≈ median; use mean + SD to summarize
- Skewed/outliers → use median + IQR to summarize
- Divide by (n−1) for sample SD (degrees of freedom)
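Here is a minimal sketch in Python (standard library only) that checks the sample-variance formula against the standard library. `statistics.variance` and `statistics.stdev` divide by n − 1, while `statistics.pvariance` divides by n. The helper name `sample_variance` and the data are hypothetical and only illustrate the formula.

```python
import math
import statistics

def sample_variance(xs):
    """s² = Σ(xᵢ − x̄)² / (n − 1), computed directly from the formula."""
    n = len(xs)
    x_bar = sum(xs) / n                      # mean: Σxᵢ / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [4, 8, 6, 5, 3, 7]                    # hypothetical sample
s2 = sample_variance(data)
s = math.sqrt(s2)                            # sample SD is the square root of s²

# The standard library agrees: variance/stdev divide by n − 1 ...
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
# ... while pvariance divides by n, so it gives a smaller value.
print(s2, statistics.pvariance(data))
```

Dividing by n − 1 instead of n slightly enlarges s², which offsets the fact that deviations are measured from x̄ rather than from the unknown μ.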
📝 Quick Example
Dataset: 2, 5, 7, 8, 100. Is 100 an outlier?
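Median = 7. Using the median of each half (overall median excluded): Q1 = median of {2, 5} = 3.5 and Q3 = median of {8, 100} = 54, so IQR = 50.5 → upper fence = 54 + 1.5(50.5) = 129.75. Since 100 < 129.75, NO: the 1.5·IQR rule does not flag 100 as an outlier. With only 5 values, 100 itself inflates Q3. Quartile conventions vary, so small datasets can give different results under other methods.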
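The fence calculation can be sketched in Python with the quick example's dataset. Quartiles depend on the convention used. This sketch assumes the "median of each half, overall median excluded for odd n" method. The helper names `quartiles` and `outlier_fences` are hypothetical.

```python
import statistics

def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves.

    For odd n the overall median is left out of both halves
    (one common convention; software packages differ).
    """
    xs = sorted(xs)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return statistics.median(lower), statistics.median(upper)

def outlier_fences(xs):
    """Return (Q1 − 1.5·IQR, Q3 + 1.5·IQR)."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 5, 7, 8, 100]
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(quartiles(data), (low, high), outliers)
```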
The normal distribution is symmetric and bell-shaped. Any normal variable can be standardized to the standard normal (μ=0, σ=1).
Key Formulas
z = (x − μ) / σ
x = μ + z·σ [reverse: find raw score]
Empirical Rule: about 68% of values within ±1σ of μ, 95% within ±2σ, 99.7% within ±3σ
⭐ Memorize
- z-score = number of standard deviations from the mean
- Use Table A or calculator: P(Z < z) = normalcdf(−∞, z)
- Linear transformation (new = a + b·old): adding a shifts the mean but not the SD; multiplying by b multiplies the mean by b and the SD by |b|
📝 Quick Example
Heights: μ=68in, σ=3in. What z-score for 74in?
z = (74−68)/3 = 2.0 → P(X<74) ≈ 97.7%
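Python's `statistics.NormalDist` (Python 3.8+) can stand in for Table A or normalcdf. This sketch reuses the heights example and checks the z-score, the reverse formula and the Empirical Rule percentages. The variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 68, 3                          # heights in inches, from the example
z = (74 - mu) / sigma                      # z = (x − μ) / σ
p_below = NormalDist().cdf(z)              # P(Z < z) on the standard normal
p_below_direct = NormalDist(mu, sigma).cdf(74)   # same answer without standardizing
x_at_z = mu + z * sigma                    # reverse: x = μ + z·σ recovers 74

# Empirical Rule check: probability within ±k standard deviations of the mean
within = {k: NormalDist().cdf(k) - NormalDist().cdf(-k) for k in (1, 2, 3)}
print(z, round(p_below, 4), x_at_z, within)
```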
The least-squares regression line (LSRL) minimizes the sum of squared residuals. Always interpret in context.
Key Formulas
ŷ = a + bx
b = r·(sᵧ/sₓ)
a = ȳ − b·x̄
residual = y − ŷ (observed − predicted)
r² = proportion of the variation in y explained by the linear relationship with x
⭐ Memorize
- LSRL always passes through (x̄, ȳ)
- r measures the strength AND direction of a linear relationship; −1 ≤ r ≤ 1
- Correlation ≠ causation
- Extrapolation = predicting outside the observed x range (unreliable)
- Positive residual → actual value ABOVE the line
📝 Quick Example
r = 0.8, r² = 0.64. Interpret r².
64% of the variation in y is explained by the linear relationship with x.
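This minimal Python sketch builds the LSRL from the formulas b = r·(sᵧ/sₓ) and a = ȳ − b·x̄. It then computes residuals as observed − predicted. The five data points and the helper name `lsrl` are hypothetical.

```python
import math
import statistics

def lsrl(xs, ys):
    """Least-squares line via b = r·(s_y/s_x) and a = ȳ − b·x̄."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # Correlation: r = Σ z_x·z_y / (n − 1)
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

xs = [1, 2, 3, 4, 5]            # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed − predicted
print(f"ŷ = {a:.2f} + {b:.2f}x, r = {r:.3f}, r² = {r * r:.3f}")
```

The residuals of a least-squares line always sum to zero. Plugging x̄ into the line returns ȳ, which matches "LSRL always passes through (x̄, ȳ)".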
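Well-designed studies use randomization to reduce bias: random sampling lets results generalize to the population, and random assignment in experiments allows cause-and-effect conclusions.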
⭐ Memorize
- Experiments can establish causation; observational studies cannot
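- Placebos and blinding reduce bias from the placebo effect and from subjects' or evaluators' expectations
- Simple Random Sample (SRS): every possible group of n individuals is equally likely to be chosen
- Stratified: divide into strata of similar individuals, take an SRS within each → reduces variability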
- Cluster: randomly select groups; sample everyone in them
- Blocking in experiments controls for known lurking variables
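The sampling methods in the list above can be sketched with Python's `random.Random.sample`, which draws without replacement. The helpers `srs`, `stratified_sample` and `cluster_sample` are hypothetical. So is the population: 100 students split into two grade strata and ten homeroom clusters.

```python
import random

def srs(population, n, rng):
    """Simple random sample: every group of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum from each stratum (dict: name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, k, rng):
    """Randomly choose k whole clusters and include everyone in them."""
    chosen = rng.sample(list(clusters), k)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(1)                    # seeded so the draw is reproducible
students = [f"S{i}" for i in range(1, 101)]
strata = {"grade 9": students[:50], "grade 10": students[50:]}
clusters = {f"room {j}": students[j * 10:(j + 1) * 10] for j in range(10)}
print(srs(students, 5, rng))
print(stratified_sample(strata, 3, rng))
print(cluster_sample(clusters, 2, rng))
```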
📝 Quick Example
A study shows ice cream sales predict drowning rates. Is ice cream a cause?
No. Lurking variable: hot weather increases both. Correlation ≠ causation (observational study).
Key Formulas
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A | B) = P(A ∩ B) / P(B)
Independent iff: P(A ∩ B) = P(A)·P(B)
Mutually exclusive: P(A ∩ B) = 0
⭐ Memorize
- Mutually exclusive events CANNOT both occur → NOT independent (unless P(A) = 0 or P(B) = 0)
- Independent events: knowing one occurred doesn't change the probability of the other, so P(A | B) = P(A)
- Complement: P(Aᶜ) = 1 − P(A)
- Law of Total Probability: P(B) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)
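These rules can be checked exactly by listing a sample space. This Python sketch uses two fair dice and exact `Fraction` arithmetic. The events A, B and D are hypothetical examples chosen to show independence and mutual exclusivity.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (# outcomes where event is true) / 36, kept exact."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def A(o):
    return o[0] == 6            # first die shows 6

def B(o):
    return o[0] + o[1] == 7     # the sum is 7

def D(o):
    return o[0] == 1            # first die shows 1 (cannot happen together with A)

p_A, p_B, p_D = prob(A), prob(B), prob(D)
p_AB = prob(lambda o: A(o) and B(o))
p_A_or_B = p_A + p_B - p_AB             # addition rule
p_A_given_B = p_AB / p_B                # conditional probability
independent = p_AB == p_A * p_B         # True: 1/36 == 1/6 * 1/6
p_AD = prob(lambda o: A(o) and D(o))    # 0: A and D are mutually exclusive
print(p_A_or_B, p_A_given_B, independent, p_AD == p_A * p_D)
```

A and D are mutually exclusive, so P(A ∩ D) = 0 while P(A)·P(D) = 1/36. This shows why mutually exclusive events with positive probabilities are never independent.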
Key Formulas
E(X) = Σ xᵢ·P(xᵢ)
Var(X) = Σ(xᵢ−μ)²·P(xᵢ) = E(X²) − [E(X)]²
Binomial X~B(n,p):
P(X=k) = C(n,k)·pᵏ·(1−p)^(n−k)
μ = np, σ = √(np(1−p))
Geometric X~Geo(p):
P(X=k) = (1−p)^(k−1)·p, μ = 1/p
⭐ Memorize — BINS (Binomial Conditions)
- Binary outcomes (success / failure)
- Independent trials
- Number of trials is fixed
- Same probability of success each trial
📝 Quick Example
Flip a fair coin 10 times. E(heads) = ?
X~B(10, 0.5) → μ = 10×0.5 = 5
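This Python sketch applies the formulas for E(X), Var(X), the binomial pmf and the geometric pmf to the coin example. It uses `math.comb` (Python 3.8+) for C(n, k). The helper names are hypothetical.

```python
import math

def expected_value(dist):
    """E(X) = Σ x·P(x) for a dict {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = Σ (x − μ)²·P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k)·pᵏ·(1 − p)^(n − k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(X = k) = (1 − p)^(k − 1)·p, where X = trial of the first success."""
    return (1 - p) ** (k - 1) * p

# Coin example: X ~ B(10, 0.5)
n, p = 10, 0.5
coin = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
print(expected_value(coin), math.sqrt(variance(coin)))  # compare np and √(np(1 − p))
print(binomial_pmf(5, n, p))                             # P(exactly 5 heads)
```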
Key Formulas — Sample Mean x̄
μ(x̄) = μ
σ(x̄) = σ/√n (standard error of x̄)
Sample Proportion p̂
μ(p̂) = p
σ(p̂) = √(p(1−p)/n)
⭐ Memorize
- CLT: for large n (rule of thumb: n ≥ 30), x̄ is approximately normal regardless of population shape
- Rule of thumb for p̂ normal: np ≥ 10 AND n(1−p) ≥ 10
- 10% condition: when sampling without replacement, n ≤ 10% of population (so observations are roughly independent)
- Larger n → smaller standard error → more precise estimates
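A small simulation can make the CLT and σ(x̄) = σ/√n concrete. This Python sketch assumes a right-skewed Exponential(1) population, which has μ = 1 and σ = 1. It draws 5,000 samples of size 40, with the seed, sample size and repetition count chosen arbitrarily. The helper names are hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exponential(1) population
    (μ = 1, σ = 1, strongly right-skewed) and return the sample means."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def se_mean(sigma, n):
    """σ(x̄) = σ/√n"""
    return sigma / math.sqrt(n)

def se_prop(p, n):
    """σ(p̂) = √(p(1 − p)/n)"""
    return math.sqrt(p * (1 - p) / n)

rng = random.Random(42)
means = simulate_sample_means(n=40, reps=5000, rng=rng)
# The mean of the x̄ values should be near μ = 1, and their SD near 1/√40 ≈ 0.158.
print(statistics.fmean(means), statistics.stdev(means), se_mean(1.0, 40))
```

Quadrupling n halves the standard error, which is the "larger n → more precise" point in numeric form.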
Confidence Interval Template
statistic ± (critical value) × (standard error)
1-prop z-CI: p̂ ± z* √(p̂(1−p̂)/n)
1-sample t-CI: x̄ ± t* (s/√n), df = n−1
2-sample t-CI: (x̄₁−x̄₂) ± t* · √(s₁²/n₁ + s₂²/n₂)
Test Statistic
z = (p̂ − p₀) / √(p₀(1−p₀)/n)
t = (x̄ − μ₀) / (s/√n)
⭐ Memorize
- 95% CI: we are 95% confident the interval captures the true parameter (about 95% of intervals built this way would capture it)
- Type I error (α): reject H₀ when H₀ is true (false positive)
- Type II error (β): fail to reject H₀ when Hₐ is true (false negative)
- Power = 1 − β; increases with larger n, larger effect, larger α
- p-value = P(a statistic at least as extreme as the one observed | H₀ true)
- Reject H₀ if p-value < α
- Use t (not z) for means when σ is unknown and estimated by s
📝 Quick Example
p-value = 0.03, α = 0.05. Conclusion?
0.03 < 0.05 → Reject H₀. Statistically significant evidence for Hₐ.
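Python's standard library has no t distribution, so this sketch covers only the one-proportion z procedures. It uses `statistics.NormalDist` for z* and the p-value. The data are hypothetical: 56 successes in 100 trials, testing H₀: p = 0.5 against a two-sided Hₐ. The conditions (np₀ ≥ 10, n(1 − p₀) ≥ 10, random sample) are assumed to hold. The helper names are illustrative only.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """p̂ ± z*·√(p̂(1 − p̂)/n)"""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def one_prop_z_test(successes, n, p0):
    """z = (p̂ − p₀)/√(p₀(1 − p₀)/n), with a two-sided p-value."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

low, high = one_prop_z_interval(56, 100)
z, p_value = one_prop_z_test(56, 100, 0.5)
alpha = 0.05
print(f"95% CI: ({low:.3f}, {high:.3f}); z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here the p-value is about 0.23, well above α = 0.05, so the sketch fails to reject H₀. The interval contains 0.5, which agrees with that result.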
📝 Practice Exam
20 Exam-Style Questions
Choose the best answer for each question. Explanations appear immediately.