📖  How to Use This Guidebook: Content is now organized by Analysis Goal — Comparison, Correlation, Prediction, and Classification — matching the way researchers actually frame research questions. Each parametric test is paired with its non-parametric alternative. Technical assumptions have been corrected to specify that normality and homoscedasticity apply to residuals, not raw variables. New sections cover the n = 30 myth, VIF thresholds, and effect size interpretation. All code blocks now use a high-contrast light theme for full readability on RPubs and RStudio Viewer.


1 Universal Pre-Analysis Checklist

Before choosing any statistical test, pass through these four gates:

Gate Question Action if Violated
Measurement Scale What is the scale of the DV? (Nominal / Ordinal / Interval / Ratio) Reclassify; choose appropriate test family
Independence Are observations independent of each other? Use multilevel or mixed models if clustered
Missing Data Is missingness MCAR / MAR / MNAR? Use multiple imputation (MI) or FIML (valid under MCAR or MAR); avoid listwise deletion
Sample Size Is \(N\) adequate for the chosen test and expected effect size? Run a priori power analysis; target \(1-\beta \geq .80\)

1.1 Normality Testing — Corrected Guidance

Critical correction: Normality and homoscedasticity assumptions in parametric tests apply to residuals (errors) — not to the raw independent variables or raw dependent variable. Always diagnose residual distributions, never raw scores.

Tool Best For Limitation
Shapiro–Wilk test \(n \leq 100\); sensitive, formal test At \(n > 100\), its high power flags trivial departures, so the \(p\)-value loses diagnostic value
Kolmogorov–Smirnov \(N \geq 50\) Less powerful than Shapiro–Wilk; needs the Lilliefors correction when the mean and SD are estimated from the sample
Q-Q Plot (visual) Preferred for \(n > 100\) Requires judgment; not a formal test
Histogram + density Any \(N\); useful for skew/kurtosis Qualitative only

Rule: For \(n \leq 100\), report Shapiro–Wilk \(W\) and \(p\)-value alongside a Q-Q plot. For \(n > 100\), rely on Q-Q plot inspection: at that size a significant Shapiro–Wilk result often reflects departures too small to affect inference, and a non-significant one never proves normality.
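The rule above can be wrapped in a small helper. This is a minimal sketch, not a standard function: `check_residual_normality()` is a hypothetical name, and the cut-off of 100 is simply the one stated in the rule. It always draws the Q-Q plot of the residuals and adds Shapiro–Wilk only for smaller samples.

```r
# Hypothetical helper implementing the n <= 100 / n > 100 rule on RESIDUALS
check_residual_normality <- function(fit, cutoff = 100) {
  res <- residuals(fit)                      # diagnose residuals, not raw scores
  n   <- length(res)
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  qqline(res)                                # the Q-Q plot is always drawn
  if (n <= cutoff) {
    sw <- shapiro.test(res)
    list(n = n, method = "Shapiro-Wilk + Q-Q plot",
         W = unname(sw$statistic), p = sw$p.value)
  } else {
    list(n = n, method = "Q-Q plot only", W = NA_real_, p = NA_real_)
  }
}

small <- check_residual_normality(lm(mpg ~ wt + hp + am, data = mtcars))  # n = 32
big   <- check_residual_normality(lm(eruptions ~ waiting, data = faithful)) # n = 272
str(small)
str(big)
```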

1.2 Homoscedasticity Testing — Test Selection by Context

Test Use In What It Tests
Levene’s Test Group comparison designs (t-test, ANOVA) Equality of variances across groups
Bartlett’s Test Group designs with confirmed normality Equality of variances — more powerful but less robust to non-normality
Breusch–Pagan Test Regression models Systematic relationship between residual variance and the predictors (or the fitted values)
White Test Regression models More general heteroscedasticity (the auxiliary regression adds squares and cross-products of predictors); no normality assumption required
Residual vs. Fitted Plot Regression (visual) Fan-shaped pattern signals heteroscedasticity

Context rule: Use Levene’s Test for ANOVA-family designs. Use Breusch–Pagan (or White’s test) specifically for regression models. Using Levene’s for regression is a category error — it tests group variances, not the regression error structure.
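To make the distinction concrete, the base-R sketch below computes each test from its definition. `levene_bf()` and `bp_koenker()` are hypothetical helper names. The first is Levene’s test in its median-centred (Brown–Forsythe) form, which is what car::leveneTest() uses by default; the second is the studentized (Koenker) Breusch–Pagan statistic, which is what lmtest::bptest() reports by default, so the results should agree with those functions.

```r
# Levene's test for a GROUP design: one-way ANOVA on absolute deviations
# from each group's median (Brown-Forsythe variant)
levene_bf <- function(y, group) {
  group <- factor(group)
  dev   <- abs(y - ave(y, group, FUN = median))
  anova(lm(dev ~ group))[1, c("F value", "Pr(>F)")]
}

# Breusch-Pagan (studentized) for a REGRESSION model:
# regress squared residuals on the predictors; LM = n * R^2 ~ chi-square(k)
bp_koenker <- function(fit) {
  X    <- model.matrix(fit)[, -1, drop = FALSE]   # predictors, intercept dropped
  e2   <- residuals(fit)^2
  aux  <- lm(e2 ~ X)                              # auxiliary regression with intercept
  stat <- length(e2) * summary(aux)$r.squared
  df   <- ncol(X)
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

lev <- levene_bf(mtcars$mpg, mtcars$am)               # t-test / ANOVA context
bp  <- bp_koenker(lm(mpg ~ wt + hp + am, data = mtcars))  # regression context
lev
bp
```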

1.3 The n = 30 Myth — CLT Guidance

The oft-cited rule “n ≥ 30 is sufficient for the Central Limit Theorem” is an oversimplification that causes systematic errors in practice.

Distribution Type Minimum Recommended \(N\) Rationale
Approximately normal, symmetric \(n \geq 20\) per group CLT converges rapidly
Mild skew (skewness \(\lvert s \rvert < 1\)) \(n \geq 30\) per group Standard guidance applies
Moderate-to-heavy skew (\(\lvert s \rvert \geq 1\)) \(n \geq 100\) per group CLT convergence is substantially slower
Fat-tailed distributions (excess kurtosis \(> 3\)) \(n \geq 100\)–\(200\) per group Heavy tails create persistent sampling instability
Bimodal distributions Use mixture models The CLT still applies to \(\bar{X}\), but a single mean of two subpopulations is rarely a meaningful summary

The myth, corrected: \(n = 30\) is at best a floor for mild departures from normality, not a universal pass for all distributions. For social science data (often skewed, bounded Likert composites), \(n \geq 100\) is the safer working rule before treating the sampling distribution of \(\bar{X}\) as approximately normal.
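A quick Monte Carlo check illustrates the point. The sketch below assumes lognormal(0, 1) data (skewness about 6, well beyond \(\lvert s \rvert \geq 1\)) and estimates how often a nominal 95% \(t\)-interval actually covers the true mean at several sample sizes; coverage below .95 means the CLT approximation has not yet kicked in.

```r
# Coverage of the nominal 95% t-interval for the mean of skewed data.
# Assumed example distribution: lognormal(meanlog = 0, sdlog = 1).
set.seed(2024)
true_mean <- exp(0.5)                     # E[X] for lognormal(0, 1)

coverage <- function(n, reps = 5000) {
  x   <- matrix(rlnorm(n * reps), nrow = reps)   # one simulated sample per row
  m   <- rowMeans(x)
  se  <- apply(x, 1, sd) / sqrt(n)
  tcr <- qt(0.975, df = n - 1)
  mean(m - tcr * se <= true_mean & true_mean <= m + tcr * se)
}

sizes   <- c(10, 30, 100, 300)
cov_tab <- sapply(sizes, coverage)
names(cov_tab) <- paste0("n = ", sizes)
round(cov_tab, 3)
```

Swapping `rlnorm()` for `rnorm()` (and `true_mean` for 0) is a useful contrast: coverage then sits near .95 even at \(n = 10\).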

1.4 Multicollinearity — VIF Decision Rules

The Variance Inflation Factor measures how much the sampling variance of a regression coefficient is inflated by collinearity with the other predictors:

\[VIF_j = \frac{1}{1 - R^2_j}\]

where \(R^2_j\) is the coefficient of determination from regressing predictor \(X_j\) on all remaining predictors.

VIF Value Interpretation Action
\(VIF < 3\) No concern Proceed normally
\(3 \leq VIF < 5\) Mild Monitor; report; no action required
\(5 \leq VIF < 10\) Moderate concern Investigate; consider combining predictors (centering helps only when the collinearity comes from interaction or polynomial terms)
\(VIF \geq 10\) Serious problem Multicollinearity is distorting coefficients; act (ridge regression, remove predictors, PCA on correlated set)

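Also inspect the Condition Index, computed from the eigenvalues of \(\mathbf{X}'\mathbf{X}\) after scaling each column of \(\mathbf{X}\) (including the intercept column) to unit length: \(CI_j = \sqrt{\lambda_{\max}/\lambda_j}\). Values \(> 30\) indicate serious collinearity, especially when two or more coefficients load heavily (variance-decomposition proportion \(> .50\)) on that dimension.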
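A base-R sketch of both diagnostics, using the same mtcars predictors (wt, hp, am) as the regression example later in this guide. The VIFs come straight from the definition above and should reproduce the car::vif() values reported there; the identity with the diagonal of the inverse correlation matrix is a convenient cross-check.

```r
X <- mtcars[, c("wt", "hp", "am")]

# VIF_j = 1 / (1 - R^2_j), regressing each predictor on the remaining ones
vif_manual <- sapply(names(X), function(v) {
  r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v),
                   data = X))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 3)

# Equivalent shortcut: VIFs are the diagonal of the inverse correlation matrix
round(diag(solve(cor(X))), 3)

# Condition indices: scale the columns of [1, X] to unit length first
Z  <- cbind(Intercept = 1, as.matrix(X))
Zs <- sweep(Z, 2, sqrt(colSums(Z^2)), "/")
ev <- eigen(crossprod(Zs), symmetric = TRUE, only.values = TRUE)$values
round(sqrt(max(ev) / ev), 2)              # values > 30 flag serious collinearity
```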

1.5 Power Analysis

\[n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \cdot \sigma^2}{\delta^2}\]

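where \(\delta\) is the minimum detectable effect, \(\sigma^2\) is the outcome variance (for paired designs, the variance of the difference scores), and \(z_{1-\alpha/2}\) and \(z_{1-\beta}\) are standard normal quantiles set by the Type I (\(\alpha\)) and Type II (\(\beta\)) error rates. This form gives \(n\) for a one-sample or paired design; for two independent groups, the per-group \(n\) is twice this value. Target \(1 - \beta \geq .80\); use \(\geq .95\) for confirmatory or high-stakes research. Use G*Power (Faul et al., 2007) for validated calculations across all major test families.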
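A worked instance of the formula, assuming the goal is to detect half a standard deviation (\(\delta = 0.5\), \(\sigma = 1\)) at \(\alpha = .05\) with 80% power. The exact \(t\)-based answer from base R’s `power.t.test()` is shown for comparison; it is slightly larger because the \(z\) formula ignores the extra uncertainty of estimating \(\sigma\).

```r
alpha <- 0.05; power <- 0.80
delta <- 0.5;  sigma <- 1                 # assumed effect: half an SD

z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
n_one <- z_sum^2 * sigma^2 / delta^2      # one-sample or paired design
n_two <- 2 * n_one                        # per group, two independent groups
ceiling(c(one_sample = n_one, per_group = n_two))   # 32 and 63

# Exact two-sample calculation (noncentral t), per group
n_t <- power.t.test(delta = delta, sd = sigma, sig.level = alpha,
                    power = power)$n
ceiling(n_t)
```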


2 Analysis Goal 1: COMPARISON

▶ Analysis Goal: Comparison

Use comparison methods when the research question is: “Do groups differ on a measured outcome?”

2.1 Core Assumptions Shared by All Comparison Tests

Assumption Applies To How to Check Threshold
Normality of residuals All parametric comparison tests Shapiro–Wilk (\(n \leq 100\)); Q-Q plot (\(n > 100\)) \(p > .05\) on SW; points near diagonal
Homogeneity of variances t-test, ANOVA Levene’s Test \(p > .05\)
Independence of observations All tests Design review No nesting or clustering
Interval/ratio DV All parametric tests Measurement theory Design decision

2.2 Independent Samples t-Test

⇔ Non-parametric alternative: Mann–Whitney U Test

Use when: Comparing the means of two independent groups on a continuous DV.

\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]

Assumption Check If Violated
Normality of residuals within each group Shapiro–Wilk; Q-Q plot Use Mann–Whitney U
Equal variances (homoscedasticity) Levene’s Test Apply Welch’s correction (do not pool variances)
Independence Design review Redesign or use paired test

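Rules of Thumb: \(n \geq 30\) per group (or \(\geq 100\) for skewed data); keep the group size ratio at or below \(3{:}1\). When group sizes are unequal and variances differ, the pooled test is unreliable, so apply Welch’s correction (a reasonable default in any case).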

Effect size — Cohen’s \(d\):

\[d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}, \quad s_{\text{pooled}} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\]

Benchmarks: small = .20, medium = .50, large = .80 (Cohen, 1988).

Red Flags 🚩

  • Reporting only \(p < .05\) without Cohen’s \(d\) or a confidence interval.
  • Using independent \(t\)-test on matched data — use paired \(t\) instead.
  • Treating ordinal Likert items as continuous without distributional evidence.

2.2.1 R Example — Welch Two-Sample t-Test

# Compare fuel efficiency (mpg) by transmission type
# am: 0 = automatic, 1 = manual
result_t <- t.test(mpg ~ am, data = mtcars, var.equal = FALSE)
print(result_t)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  mpg by am
#> t = -3.7671, df = 18.332, p-value = 0.001374
#> alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
#> 95 percent confidence interval:
#>  -11.280194  -3.209684
#> sample estimates:
#> mean in group 0 mean in group 1 
#>        17.14737        24.39231
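# Cohen's d with the pooled SD (d conventionally pools even though Welch's test does not)
grp  <- split(mtcars$mpg, mtcars$am)
n1   <- length(grp[[1]]); n2 <- length(grp[[2]])
sp   <- sqrt(((n1 - 1) * var(grp[[1]]) + (n2 - 1) * var(grp[[2]])) / (n1 + n2 - 2))
cohd <- unname(abs(diff(sapply(grp, mean))) / sp)
size <- as.character(cut(cohd, c(0, .2, .5, .8, Inf), right = FALSE,
                         labels = c("negligible", "small", "medium", "large")))
cat(sprintf("\nCohen's d = %.3f  [%s effect by Cohen (1988) benchmarks]\n", cohd, size))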
#> 
#> Cohen's d = 1.478  [large effect by Cohen (1988) benchmarks]

2.2.2 Non-Parametric Alternative — Mann–Whitney U Test

Use when: Normality is violated, the DV is ordinal, or \(N\) is small (\(< 30\)) and distribution is skewed.

\[U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \quad U = \min(U_1, U_2)\]

Effect size: \(r = Z / \sqrt{N}\); benchmarks: small = .10, medium = .30, large = .50.

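Red Flags 🚩 Applying Mann–Whitney to paired data (use Wilcoxon Signed-Rank); interpreting \(U\) as a test of means (or of medians, unless both groups have the same distribution shape): it tests whether values from one group tend to exceed values from the other.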
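The \(U\) formula and the \(r = Z/\sqrt{N}\) effect size can be computed by hand and checked against base R’s `wilcox.test()`. This is a sketch on the same mtcars comparison as the \(t\)-test above; the \(Z\) used here is the plain normal approximation without tie or continuity corrections, which is a simplification.

```r
y  <- mtcars$mpg
g  <- mtcars$am                          # 0 = automatic (group 1), 1 = manual (group 2)
n1 <- sum(g == 0); n2 <- sum(g == 1); N <- n1 + n2

R1 <- sum(rank(y)[g == 0])               # rank sum of group 1 (midranks for ties)
U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1   # formula from the text
U2 <- n1 * n2 - U1
U  <- min(U1, U2)

# Normal approximation for Z (no tie or continuity correction)
z <- (U - n1 * n2 / 2) / sqrt(n1 * n2 * (N + 1) / 12)
r <- abs(z) / sqrt(N)

# wilcox.test() reports W = R1 - n1(n1 + 1)/2, which equals U2 here
mw <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)   # ties: no exact p
c(U = U, W = unname(mw$statistic), z = z, r = r, p = mw$p.value)
```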


2.3 Paired Samples t-Test

⇔ Non-parametric alternative: Wilcoxon Signed-Rank Test

Use when: Comparing two related or matched measurements (pre–post, twins, repeated measures on same unit).

\[t = \frac{\bar{D}}{s_D / \sqrt{n}}, \quad D_i = X_{1i} - X_{2i}\]

The normality assumption applies to the difference scores \(D_i\), not the raw scores.

Effect size — Cohen’s \(d_z\):

\[d_z = \frac{\bar{D}}{s_D}\]

Benchmarks: small = .20, medium = .50, large = .80.

Red Flags 🚩 Running a separate one-sample \(t\)-test on each condition and comparing their significance (this never tests the difference itself) instead of testing the \(D_i\) directly; ignoring the within-pair correlation (which drives the power advantage of this design).
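A sketch of the paired test and \(d_z\) on R’s built-in `sleep` data (the same 10 patients measured under two drugs). The explicit vector form is used because recent R versions no longer accept `paired = TRUE` with a formula. The check \(t = d_z\sqrt{n}\) follows directly from the two formulas above.

```r
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
# Pairs must be aligned by patient ID before differencing
stopifnot(identical(sleep$ID[sleep$group == 1], sleep$ID[sleep$group == 2]))

D   <- drug2 - drug1                      # difference scores D_i
res <- t.test(drug2, drug1, paired = TRUE)
d_z <- mean(D) / sd(D)                    # Cohen's d_z

res
shapiro.test(D)                           # normality is checked on D, not raw scores
cat(sprintf("d_z = %.3f\n", d_z))
```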


2.4 One-Way ANOVA

⇔ Non-parametric alternative: Kruskal–Wallis H Test

Use when: Comparing means across three or more independent groups.

\[F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}} = \frac{SS_B/(k-1)}{SS_W/(N-k)}\]

Assumption Check If Violated
Normality of residuals Shapiro–Wilk on residuals; Q-Q plot Use Kruskal–Wallis
Homogeneity of variances Levene’s Test Use Welch’s ANOVA
Independence Design review Use multilevel model

Post-hoc selection: Tukey HSD (equal variances; the Tukey–Kramer form handles unequal \(n\)); Games–Howell (unequal variances); Bonferroni or Holm (a small set of planned comparisons).

Effect size — Eta-squared \(\eta^2\):

\[\eta^2 = \frac{SS_{\text{Between}}}{SS_{\text{Total}}}\]

Also report \(\omega^2\) (less biased for small \(N\)):

\[\omega^2 = \frac{SS_B - (k-1)MS_W}{SS_T + MS_W}\]

Metric Small Medium Large
\(\eta^2\) / \(\omega^2\) .01 .06 .14
Cohen’s \(f\) .10 .25 .40

Statistical significance ≠ practical importance. A one-way ANOVA with \(N = 800\) may yield \(F(2, 797) = 6.20\), \(p \approx .002\), yet \(\eta^2 = .015\), accounting for less than 2% of variance. Always report and interpret effect sizes alongside \(p\)-values.
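Both numbers in this example can be recovered from the reported \(F\) alone: since \(F = (SS_B/df_1)/(SS_W/df_2)\), it follows that \(\eta^2 = F\,df_1 / (F\,df_1 + df_2)\). For \(df_1 = 2\) the upper-tail probability also has the closed form \((1 + 2F/df_2)^{-df_2/2}\), used below as a check.

```r
F_obs <- 6.20; df1 <- 2; df2 <- 797

eta2 <- (F_obs * df1) / (F_obs * df1 + df2)     # eta^2 expressed through F
p    <- pf(F_obs, df1, df2, lower.tail = FALSE)
p_closed <- (1 + df1 * F_obs / df2)^(-df2 / 2)  # exact only when df1 = 2

round(c(eta_sq = eta2, p = p, p_closed = p_closed), 4)
```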

Red Flags 🚩 Running multiple \(t\)-tests instead of ANOVA (with \(m = k(k-1)/2\) pairwise tests, familywise error \(= 1-(1-\alpha)^m\) if the tests are independent); reporting \(F\) without post-hoc tests when \(k > 2\); applying ANOVA to clustered data.
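The familywise error formula grows quickly with the number of groups. The arithmetic below assumes independent tests at \(\alpha = .05\), which makes it an approximation for overlapping pairwise comparisons.

```r
alpha <- 0.05
k     <- 3:6
m     <- choose(k, 2)                     # number of pairwise comparisons
fwer  <- 1 - (1 - alpha)^m
data.frame(k = k, comparisons = m, fwer = round(fwer, 3))
```

With three groups the chance of at least one false positive is already about .14; with five groups it is about .40.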

2.4.1 R Example — One-Way ANOVA with Post-Hoc

# Compare horsepower across cylinder groups
mtcars$cyl_f <- factor(mtcars$cyl, labels = c("4-cyl", "6-cyl", "8-cyl"))

fit_aov <- aov(hp ~ cyl_f, data = mtcars)
summary(fit_aov)
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> cyl_f        2 104031   52015   36.18 1.32e-08 ***
#> Residuals   29  41696    1438                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Tukey HSD post-hoc
TukeyHSD(fit_aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = hp ~ cyl_f, data = mtcars)
#> 
#> $cyl_f
#>                  diff       lwr       upr     p adj
#> 6-cyl-4-cyl  39.64935 -5.627454  84.92616 0.0949068
#> 8-cyl-4-cyl 126.57792 88.847251 164.30859 0.0000000
#> 8-cyl-6-cyl  86.92857 43.579331 130.27781 0.0000839
# Effect sizes: eta-squared and omega-squared (formulas above)
tab      <- summary(fit_aov)[[1]]
ss       <- tab$"Sum Sq"                  # SS_Between, SS_Within
ms_w     <- tab$"Mean Sq"[2]              # MS_Within
k        <- nlevels(mtcars$cyl_f)
eta_sq   <- ss[1] / sum(ss)
omega_sq <- (ss[1] - (k - 1) * ms_w) / (sum(ss) + ms_w)
cat(sprintf("\neta-squared  = %.3f\nomega-squared = %.3f\n", eta_sq, omega_sq))
#> 
#> eta-squared  = 0.714
#> omega-squared = 0.687

2.4.2 Non-Parametric Alternative — Kruskal–Wallis H Test

Use when: Normality is violated; DV is ordinal; cells have \(n < 20\).

\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\]

Requires \(n_i \geq 5\) per group for the \(\chi^2\) approximation to be valid. Post-hoc: Dunn’s test with Bonferroni or Holm correction. Effect size: \(\eta^2_H = (H - k + 1)/(N - k)\).
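The sketch below applies the \(H\) formula to horsepower across cylinder groups (group sizes 11, 7, and 14, all \(\geq 5\)) and compares it with `kruskal.test()`. R applies a tie correction, which can only increase \(H\), so the hand-computed value should be no larger. Dunn’s post-hoc test is not part of base R; add-on packages (for example dunn.test) provide it.

```r
y  <- mtcars$hp
g  <- factor(mtcars$cyl)
N  <- length(y); k <- nlevels(g)

rk <- rank(y)                                      # midranks for ties
Ri <- tapply(rk, g, sum)
ni <- tapply(rk, g, length)
H_formula <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)  # no tie correction

kw    <- kruskal.test(y, g)                        # tie-corrected H
H     <- unname(kw$statistic)
eta2H <- (H - k + 1) / (N - k)                     # effect size from the text

round(c(H_formula = H_formula, H_R = H, p = kw$p.value, eta2_H = eta2H), 4)
```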


2.5 Factorial (Two-Way) ANOVA

⇔ Note: No clean non-parametric equivalent; use aligned rank transform (ART-ANOVA) for ordinal data.

Use when: Examining the effects of two or more categorical IVs and their interaction on a continuous DV.

Additional rules: Always test the \(A \times B\) interaction before interpreting main effects. A significant interaction makes unconditional main effects misleading; interpret it through simple effects analysis. Use Type III SS, with sum-to-zero (effect) contrasts, for unbalanced designs. \(n \geq 5\) per cell (ideally \(\geq 20\)).

Red Flags 🚩 Interpreting main effects as unconditional when a significant interaction is present; cells with \(n < 5\).
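A base-R sketch of an unbalanced 2 × 2 design (transmission × engine shape in mtcars; cell sizes 12, 7, 6, 7). Base `anova()` gives sequential (Type I) sums of squares; Type III tests are obtained here by dropping each term from the full model under sum-to-zero contrasts (car::Anova(type = 3) is a common add-on route). For the interaction, the highest-order term, both approaches must agree.

```r
d <- transform(mtcars,
               am_f = factor(am, labels = c("auto", "manual")),
               vs_f = factor(vs, labels = c("V-shaped", "straight")))
cells <- table(d$am_f, d$vs_f)           # unbalanced, but every cell has n >= 5
cells

# Sum-to-zero contrasts make the Type III main-effect tests interpretable
fit2 <- lm(mpg ~ am_f * vs_f, data = d,
           contrasts = list(am_f = "contr.sum", vs_f = "contr.sum"))

# 1. Interaction first (last term, so its sequential SS is also its Type III SS)
type1 <- anova(fit2)
type1

# 2. Type III test for every term: drop each one from the full model
type3 <- drop1(fit2, scope = . ~ ., test = "F")
type3
```

If the interaction is significant, follow up with simple effects (for example, the am_f effect estimated separately within each level of vs_f) rather than reading the main-effect rows.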


2.6 Repeated-Measures ANOVA

⇔ Non-parametric alternative: Friedman Test

Use when: The same participants are measured across three or more conditions or time points.

Additional assumption — Sphericity:

\[\frac{1}{k-1} \leq \varepsilon \leq 1, \qquad \varepsilon = 1 \text{ under sphericity (equal variances of all pairwise difference scores); tested by Mauchly's test}\]

Sphericity Diagnostic Correction
\(p > .05\) — sphericity met Standard \(F\); uncorrected \(df\)
\(\hat{\varepsilon} \geq .75\) Huynh–Feldt correction
\(\hat{\varepsilon} < .75\) Greenhouse–Geisser correction
Severe violation Switch to MANOVA or mixed-effects model

Red Flags 🚩 Ignoring Mauchly’s test; handling missing data by listwise deletion (use mixed-effects models instead).

2.6.1 Non-Parametric Alternative — Friedman Test

Use when: Sphericity is severely violated; DV is ordinal; \(N\) is small. Effect size: Kendall’s \(W\) (concordance coefficient); \(W \geq .70\) indicates strong agreement across conditions.
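A sketch on R’s built-in `CO2` data, where each of 12 plants (blocks) is measured once at each of 7 concentrations (conditions). Kendall’s \(W\) is obtained from the Friedman statistic as \(W = \chi^2_F / [N(k-1)]\), with \(N\) blocks and \(k\) conditions.

```r
fr <- with(CO2, friedman.test(uptake, conc, Plant))   # y, groups, blocks
fr

n_blocks  <- nlevels(factor(CO2$Plant))               # N = 12 plants
k_cond    <- length(unique(CO2$conc))                 # k = 7 concentrations
kendall_W <- unname(fr$statistic) / (n_blocks * (k_cond - 1))
cat(sprintf("Kendall's W = %.3f\n", kendall_W))
```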


2.7 MANOVA

⇔ No standard non-parametric equivalent; use Permutation MANOVA (PERMANOVA) for robust alternatives.

Use when: Simultaneously comparing groups on two or more continuous DVs.

Assumption Diagnostic
Multivariate normality Mardia’s test; Henze–Zirkler; Royston’s test
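Homogeneity of covariance matrices Box’s \(M\), evaluated at \(\alpha = .001\) because the test is highly sensitive
Moderate correlation among DVs (no multicollinearity) Bivariate \(r\) between DVs roughly .30–.90
Independence Design review
No multivariate outliers Mahalanobis \(D^2\) above the \(\chi^2\) critical value with df = number of DVs, at \(\alpha = .001\)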

Test statistic selection:

Statistic Best When
Wilks’ \(\Lambda\) Most common; balanced power (Wilks, 1932)
Pillai’s Trace Default choice — most robust to violations (Pillai, 1955; Olson, 1976)
Hotelling’s Trace One dominant canonical dimension
Roy’s Largest Root Maximum power with one variate; most sensitive to violations

Effect size: Partial \(\eta^2_p\); benchmarks same as ANOVA (.01 / .06 / .14).

Red Flags 🚩 Running separate ANOVAs instead (inflates familywise error); DVs entirely uncorrelated (MANOVA loses advantage); Box’s \(M\) severely significant with unequal group sizes.
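A base-R sketch comparing cylinder groups on two moderately correlated DVs (mpg and qsec in mtcars, chosen only for illustration). Partial \(\eta^2\) for Pillai’s trace is computed as \(V/s\) with \(s = \min(\text{number of DVs}, df_{\text{effect}})\), and multivariate outliers are screened on the model residuals against \(\chi^2\) with df equal to the number of DVs at \(\alpha = .001\).

```r
fit_m <- manova(cbind(mpg, qsec) ~ factor(cyl), data = mtcars)
sm    <- summary(fit_m, test = "Pillai")       # Pillai's trace: robust default
sm

p_dv <- 2                                      # number of DVs
V    <- sm$stats[1, "Pillai"]
s    <- min(p_dv, sm$stats[1, "Df"])
partial_eta2 <- V / s
cat(sprintf("Partial eta-squared (V / s) = %.3f\n", partial_eta2))

# Multivariate outliers: Mahalanobis D^2 of residuals (group means removed)
res <- residuals(fit_m)
D2  <- mahalanobis(res, colMeans(res), cov(res))
which(D2 > qchisq(0.999, df = ncol(res)))      # row indices flagged, if any
```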


3 Analysis Goal 2: CORRELATION

▶ Analysis Goal: Correlation

Use correlation methods when the research question is: “How strongly are variables associated?”

3.1 Pearson r — Linear Association

⇔ Non-parametric alternative: Spearman’s ρ or Kendall’s τ

Use when: Both variables are continuous and interval/ratio scaled; relationship is expected to be linear.

\[r = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\,s_X\,s_Y}\]

Assumption Check
Both variables continuous Measurement review
Linear relationship Scatterplot inspection
Bivariate normality (for inference) Q-Q plots; Mardia’s test
No severe outliers Scatterplot; leverage statistics
Independence of observations Design review

Benchmarks (Cohen, 1988): small = .10, medium = .30, large = .50.

The squared correlation \(r^2\) is the coefficient of determination — the proportion of variance in \(Y\) explained by \(X\).

Red Flags 🚩 Interpreting correlation as causation; failing to inspect scatterplot for non-linearity or heteroscedasticity; not checking for outliers driving spurious correlations.

3.1.1 R Example — Pearson and Spearman Correlation

# Pearson r: mpg vs weight
r_pearson <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
cat("--- Pearson Correlation ---\n")
#> --- Pearson Correlation ---
print(r_pearson)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  mtcars$mpg and mtcars$wt
#> t = -9.559, df = 30, p-value = 1.294e-10
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.9338264 -0.7440872
#> sample estimates:
#>        cor 
#> -0.8676594
cat(sprintf("R-squared = %.3f (proportion of shared variance)\n",
            r_pearson$estimate^2))
#> R-squared = 0.753 (proportion of shared variance)
# Spearman rho (non-parametric alternative); exact = FALSE because tied values rule out an exact p-value
r_spearman <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
cat("\n--- Spearman Correlation (non-parametric) ---\n")
#> 
#> --- Spearman Correlation (non-parametric) ---
print(r_spearman)
#> 
#>  Spearman's rank correlation rho
#> 
#> data:  mtcars$mpg and mtcars$wt
#> S = 10292, p-value = 1.488e-11
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>       rho 
#> -0.886422

3.2 Spearman’s \(\rho\) and Kendall’s \(\tau\) — Rank-Order Association

Use when: DV or IV is ordinal; or normality assumption for Pearson \(r\) is violated; or outliers are present.

\[\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}, \quad \text{where } d_i = \text{rank}(X_i) - \text{rank}(Y_i)\]

Kendall’s \(\tau\) is preferred over \(\rho\) when there are many tied ranks or when \(N\) is small — it has better sampling properties in those conditions.

Red Flags 🚩 Using Spearman \(\rho\) when the research hypothesis is specifically about linear association (Pearson \(r\) is the appropriate test).
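The shortcut formula for \(\rho\) is exact only when there are no tied ranks. The sketch below uses simulated continuous data (so ties are effectively impossible) to show that it matches `cor(..., method = "spearman")`, which computes Pearson \(r\) on midranks and therefore stays correct when ties exist.

```r
set.seed(7)
x <- rnorm(25)
y <- x + rnorm(25)                        # simulated, continuous: no tied ranks

d_i <- rank(x) - rank(y)
n   <- length(x)
rho_formula <- 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))

c(rho_formula = rho_formula,
  rho_cor     = cor(x, y, method = "spearman"),
  tau         = cor(x, y, method = "kendall"))
```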


3.3 Chi-Square Test — Nominal Association

⇔ Exact alternative: Fisher’s Exact Test (for small expected cells)

Use when: Testing association between two categorical (nominal) variables in a contingency table.

\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{R_i \cdot C_j}{N}\]

Assumption Threshold
Independence of observations No repeated measures on same case
Expected cell frequencies \(E_{ij} \geq 5\) in \(\geq 80\%\) of cells; no cell with \(E < 1\)
Mutually exclusive categories Categories do not overlap

If expected cell frequencies are violated: Use Fisher’s Exact Test (classically for \(2 \times 2\) tables; exact extensions to larger tables exist) or collapse categories.

Effect size — Cramér’s \(V\):

\[V = \sqrt{\frac{\chi^2}{N \cdot \min(r-1,\, c-1)}}\]

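Benchmarks: small = .10, medium = .30, large = .50 when \(\min(r-1, c-1) = 1\); the thresholds shrink for larger tables (for example .07 / .21 / .35 when \(\min(r-1, c-1) = 2\)).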

Red Flags 🚩 Running \(\chi^2\) on dependent observations — use McNemar’s test (McNemar, 1947); interpreting \(\chi^2\) as a strength measure (it tests independence, not association magnitude).
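A sketch on a \(2 \times 2\) table from mtcars (transmission × engine shape). The Yates continuity correction is switched off so that Cramér’s \(V\) uses the uncorrected \(\chi^2\); the expected counts are printed so the cell-frequency rule can be checked before trusting the approximation.

```r
tab <- table(am = mtcars$am, vs = mtcars$vs)
tab

chi <- chisq.test(tab, correct = FALSE)
chi$expected                               # E_ij >= 5 in >= 80% of cells, none < 1

V <- sqrt(unname(chi$statistic) / (sum(tab) * min(dim(tab) - 1)))
cat(sprintf("chi-square = %.3f, p = %.3f, Cramer's V = %.3f\n",
            chi$statistic, chi$p.value, V))

fisher.test(tab)                           # exact alternative for small expected counts
```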


3.4 Point-Biserial Correlation

Use when: One variable is truly dichotomous (binary); the other is continuous.

\[r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\sqrt{\frac{n_1 n_0}{n^2}}\]

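Here \(s_X\) is the standard deviation of all \(X\) scores computed with denominator \(n\) (with the usual \(n-1\) denominator, replace \(n^2\) by \(n(n-1)\)). The coefficient is mathematically equivalent to Pearson \(r\) when the dichotomy is coded 0/1, and it relates directly to the pooled-variance independent \(t\)-test: \(r_{pb}^2 = t^2/(t^2 + df)\) with \(df = n - 2\).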
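The three equivalent routes to \(r_{pb}\) can be checked on mtcars (mpg against the 0/1 transmission code). Each line follows one of the expressions above; all three should give the same value.

```r
x  <- mtcars$mpg
g  <- mtcars$am                            # a true 0/1 dichotomy
n  <- length(x); n1 <- sum(g == 1); n0 <- sum(g == 0)

s_n <- sqrt(mean((x - mean(x))^2))         # SD with denominator n
r_formula <- (mean(x[g == 1]) - mean(x[g == 0])) / s_n * sqrt(n1 * n0 / n^2)

r_pearson <- cor(x, g)                     # Pearson r with 0/1 coding

tt <- t.test(x[g == 1], x[g == 0], var.equal = TRUE)   # pooled t, df = n - 2
r_from_t <- unname(sqrt(tt$statistic^2 / (tt$statistic^2 + tt$parameter)))

c(formula = r_formula, pearson = r_pearson, from_t = r_from_t)
```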


4 Analysis Goal 3: PREDICTION

▶ Analysis Goal: Prediction

Use prediction methods when the research question is: “How well can we predict an outcome from a set of predictors?”

4.1 OLS Multiple Regression

⇔ Robust alternatives: Quantile regression (non-normal errors); Weighted Least Squares (heteroscedastic errors)

Use when: DV is continuous; predicting \(\hat{Y}\) from one or more IVs.

\[Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + \varepsilon, \qquad \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k\]

4.1.1 Assumptions — Applied to RESIDUALS, Not Raw Variables

Correction from v2: The normality and homoscedasticity assumptions in OLS regression apply to the residuals \(\varepsilon_i = Y_i - \hat{Y}_i\), not to the raw IV or DV distributions. Non-normal predictors are fine (a binary predictor, for example, can never be normal); it is the error term that must satisfy the distributional assumptions.

Assumption What It Tests Diagnostic Violation Remedy
Linearity \(E[\varepsilon \mid X] = 0\) Partial regression plots; RESET test Polynomial terms; transformations
Independence of residuals No autocorrelation Durbin–Watson (\(DW \approx 2\)) GLS; clustered SE; time-series models
Homoscedasticity of residuals \(\text{Var}(\varepsilon_i) = \sigma^2\) (constant) Breusch–Pagan test; Residual vs. Fitted plot WLS; HC3 heteroscedasticity-robust SE
Normality of residuals \(\varepsilon \sim \mathcal{N}(0,\sigma^2)\) Shapiro–Wilk on residuals; Q-Q plot of residuals Transform DV; robust regression
No multicollinearity Predictors not collinear \(5 \leq VIF < 10\) (moderate concern); \(VIF \geq 10\) (serious) Ridge regression; drop/combine predictors
No influential outliers No single cases dominating fit Cook’s \(D > 4/n\); leverage \(h_{ii} > 2(k+1)/n\) Robust regression; investigate cases

Standard Error of regression coefficient:

\[SE(b_j) = \sqrt{\frac{MS_{\text{Residual}}}{\sum(X_{ij} - \bar{X}_j)^2 \cdot (1 - R^2_j)}}\]

Note that \(1/(1-R^2_j) = VIF_j\) — multicollinearity directly inflates the SE of \(b_j\).
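The \(SE(b_j)\) formula can be verified against `lm()` for the three-predictor mtcars model used in the full diagnostics example. This base-R sketch builds each standard error from \(MS_{\text{Residual}}\), the predictor’s sum of squares, and its \(R^2_j\).

```r
fit <- lm(mpg ~ wt + hp + am, data = mtcars)
X   <- mtcars[, c("wt", "hp", "am")]
ms_res <- sum(residuals(fit)^2) / df.residual(fit)      # MS_Residual

se_manual <- sapply(names(X), function(v) {
  r2_j <- summary(lm(reformulate(setdiff(names(X), v), response = v),
                     data = X))$r.squared
  ss_j <- sum((X[[v]] - mean(X[[v]]))^2)
  sqrt(ms_res / (ss_j * (1 - r2_j)))     # 1 / (1 - R^2_j) is VIF_j
})

se_lm <- coef(summary(fit))[names(X), "Std. Error"]
rbind(manual = se_manual, lm = se_lm)
```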

4.1.2 Rules of Thumb for Regression

Heuristic Value
Cases per predictor \(N/k \geq 10\) (minimum); \(N/k \geq 20\) (recommended)
Minimum \(N\) for \(R^2\) test \(50 + 8k\)
VIF — mild concern \(3 \leq VIF < 5\)
VIF — moderate concern \(5 \leq VIF < 10\)
VIF — serious problem \(VIF \geq 10\)
Condition Index \(> 30\) confirms serious collinearity
Cook’s \(D\) threshold \(> 4/n\) — investigate case

Effect size — Cohen’s \(f^2\):

\[f^2 = \frac{R^2}{1 - R^2}; \quad f^2 = 0.02 \text{ (small)},\; 0.15 \text{ (medium)},\; 0.35 \text{ (large)}\]

Red Flags 🚩

  • Interpreting normality of the DV as the OLS normality assumption — the residuals are what matter.
  • Using Levene’s Test to check OLS homoscedasticity — use Breusch–Pagan.
  • \(R^2\) substantially higher than Adjusted \(R^2\) — likely overfitting; cross-validate.
  • Stepwise selection — inflates Type I error and produces shrunken \(R^2\) in replication (use theory-driven entry or LASSO).
  • Not plotting Cook’s \(D\) — influential cases can single-handedly drive \(R^2\).
  • Adding colliders (variables caused by both predictor and outcome) — induces bias; draw a DAG first.

4.1.3 R Example — OLS Regression with Full Diagnostics

# Multiple regression: mpg ~ weight + horsepower + transmission
fit <- lm(mpg ~ wt + hp + am, data = mtcars)
summary(fit)
#> 
#> Call:
#> lm(formula = mpg ~ wt + hp + am, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.4221 -1.7924 -0.3788  1.2249  5.5317 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
#> wt          -2.878575   0.904971  -3.181 0.003574 ** 
#> hp          -0.037479   0.009605  -3.902 0.000546 ***
#> am           2.083710   1.376420   1.514 0.141268    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
#> F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11
# --- Assumption checks ---
# 1. Normality of RESIDUALS (not raw variables)
shapiro.test(residuals(fit))
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  residuals(fit)
#> W = 0.9453, p-value = 0.1059
# 2. Homoscedasticity of RESIDUALS via Breusch-Pagan
if (requireNamespace("lmtest", quietly = TRUE)) {
  cat("\nBreusch-Pagan test for heteroscedasticity:\n")
  print(lmtest::bptest(fit))
} else {
  cat("\nInstall 'lmtest' for Breusch-Pagan: install.packages('lmtest')\n")
}
#> 
#> Breusch-Pagan test for heteroscedasticity:
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  fit
#> BP = 5.534, df = 3, p-value = 0.1366
# 3. VIF — multicollinearity
if (requireNamespace("car", quietly = TRUE)) {
  cat("\nVariance Inflation Factors:\n")
  vif_vals <- car::vif(fit)
  print(vif_vals)
  cat(sprintf("Max VIF = %.2f  [threshold: 5 = moderate, 10 = serious]\n",
              max(vif_vals)))
} else {
  cat("\nInstall 'car' for VIF: install.packages('car')\n")
}
#> 
#> Variance Inflation Factors:
#>       wt       hp       am 
#> 3.774838 2.088124 2.271082 
#> Max VIF = 3.77  [threshold: 5 = moderate, 10 = serious]
# 4. Influential cases — Cook's Distance
thresh <- 4 / nrow(mtcars)
n_inf  <- sum(cooks.distance(fit) > thresh)
cat(sprintf("\nCook's D > 4/n (%.4f): %d influential observation(s)\n",
            thresh, n_inf))
#> 
#> Cook's D > 4/n (0.1250): 4 influential observation(s)
# 5. Cohen's f-squared
r2    <- summary(fit)$r.squared
f2    <- r2 / (1 - r2)
cat(sprintf("\nR-squared = %.3f  |  Adjusted R-squared = %.3f  |  f2 = %.3f\n",
            r2, summary(fit)$adj.r.squared, f2))
#> 
#> R-squared = 0.840  |  Adjusted R-squared = 0.823  |  f2 = 5.246

4.2 Logistic Regression (Binary DV)

⇔ Exact alternatives: Exact Logistic Regression (small N); Penalized Firth Regression (complete separation)

Use when: DV is binary (0/1); model estimates the log-odds of the outcome.

\[\log\!\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + \cdots + b_k X_k\]

\[\hat{p} = \frac{1}{1 + e^{-(b_0 + \mathbf{b}'\mathbf{X})}}\]

Assumption Diagnostic
Binary DV Design review
No complete separation Watch for very large coefficients and standard errors, or R’s warning that fitted probabilities numerically 0 or 1 occurred; use Firth’s penalized likelihood if separated
No multicollinearity \(VIF < 10\)
Independence of errors Design review; GEE or mixed logistic if clustered
Adequate EPV \(\geq 10\) events of the less frequent outcome per predictor (Peduzzi et al., 1996)

Reporting: \(OR = e^{b_k}\); \(OR > 1\) = increased odds; \(OR < 1\) = decreased odds. Report Nagelkerke \(R^2\) and AUC–ROC (\(\geq .70\) = acceptable, \(\geq .80\) = good, \(\geq .90\) = excellent).

Red Flags 🚩 EPV \(< 10\); applying to ordinal DV with \(> 2\) levels (use ordinal logistic); misinterpreting \(b_k\) as a probability change (it is a log-odds change).
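A base-R sketch of the reporting quantities, modelling manual transmission (am) from weight in mtcars. With 13 manual cars and one predictor, EPV is 13. Nagelkerke \(R^2\) is built from the Cox–Snell \(R^2\) using the model and null deviances, and AUC uses its rank (Mann–Whitney) equivalence, so no add-on package is needed; Wald intervals are used for the odds ratio for simplicity.

```r
fit_lr <- glm(am ~ wt, family = binomial, data = mtcars)
summary(fit_lr)$coefficients

OR    <- exp(coef(fit_lr))                 # odds ratios
OR_ci <- exp(confint.default(fit_lr))      # Wald 95% CI on the odds-ratio scale

n <- nobs(fit_lr)
cox_snell  <- 1 - exp((fit_lr$deviance - fit_lr$null.deviance) / n)
nagelkerke <- cox_snell / (1 - exp(-fit_lr$null.deviance / n))

# AUC = P(score of a random event > score of a random non-event)
p_hat <- fitted(fit_lr); y <- mtcars$am
n1 <- sum(y == 1); n0 <- sum(y == 0)
auc <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

round(c(OR_wt = unname(OR["wt"]), Nagelkerke_R2 = nagelkerke, AUC = auc), 3)
OR_ci
```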


4.3 Hierarchical (Sequential) Regression

Use when: Testing whether a theoretically motivated block of predictors accounts for incremental variance above a prior block.

\[\Delta R^2 = R^2_{\text{Model 2}} - R^2_{\text{Model 1}}\]

\[F_{\Delta R^2} = \frac{\Delta R^2 / \Delta k}{(1 - R^2_{\text{Model 2}}) / (N - k_2 - 1)}\]

Rules: Block entry order must be theory-driven; report \(\Delta R^2\), \(\Delta F\), and exact \(p\)-value per block; recheck all OLS assumptions at each step.
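A sketch of the \(\Delta R^2\) test with two illustrative blocks on mtcars (weight first, then horsepower and transmission); the block order here is for demonstration, not a theoretical claim. The hand-computed \(F_{\Delta R^2}\) should match the nested-model comparison from `anova()`.

```r
m1 <- lm(mpg ~ wt, data = mtcars)                 # Block 1
m2 <- lm(mpg ~ wt + hp + am, data = mtcars)       # Block 2 adds hp and am

r2_1 <- summary(m1)$r.squared
r2_2 <- summary(m2)$r.squared
dk   <- length(coef(m2)) - length(coef(m1))       # Delta k = 2
N    <- nobs(m2)
k2   <- length(coef(m2)) - 1                      # predictors in Model 2

F_change <- ((r2_2 - r2_1) / dk) / ((1 - r2_2) / (N - k2 - 1))
p_change <- pf(F_change, dk, N - k2 - 1, lower.tail = FALSE)
round(c(delta_R2 = r2_2 - r2_1, F = F_change, p = p_change), 4)

cmp <- anova(m1, m2)                              # same test from nested models
cmp
```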


5 Analysis Goal 4: CLASSIFICATION & STRUCTURE

▶ Analysis Goal: Classification & Structure

Use these methods when the research question is: “What underlying structure exists in the data?” or “How can cases or variables be grouped?”

5.1 PCA vs. EFA — Choosing the Right Approach

Feature PCA EFA
Goal Data reduction; explain total variance Identify latent constructs; explain common variance
Variance modeled 100% of observed variance Shared variance only (communalities)
Output Components = linear composites of observed variables Factors = hypothetical latent variables
Use when Reducing variables with minimal information loss Hypothesizing underlying theoretical constructs
Validation step Not applicable CFA required for confirmatory testing

Core assumptions for EFA:

  1. \(N \geq 5\)–10 per variable; minimum \(N \geq 200\) widely recommended (Comrey & Lee, 1992).
  2. Variables at minimum interval-scaled; use polychoric correlations for ordinal items.
  3. Sufficient intercorrelation: Bartlett’s test of sphericity \(p < .05\); KMO \(\geq .60\) acceptable, \(\geq .80\) good.
  4. No multicollinearity: \(|r| < .90\) among items; determinant of \(\mathbf{R} > .00001\).
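Checks 3 and 4 can be computed in base R from the correlation matrix. The sketch uses six mtcars variables purely as an illustrative item set (at \(N = 32\) it falls far short of the sample-size guidance in point 1). Bartlett’s statistic is \(-(n - 1 - (2p+5)/6)\ln\lvert\mathbf{R}\rvert\) on \(p(p-1)/2\) df, and KMO compares squared correlations with squared partial correlations taken from \(\mathbf{R}^{-1}\).

```r
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]  # illustrative items
R <- cor(X); n <- nrow(X); p <- ncol(X)

# Bartlett's test of sphericity (H0: R is an identity matrix)
chi2_b <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df_b   <- p * (p - 1) / 2
p_b    <- pchisq(chi2_b, df_b, lower.tail = FALSE)

# Kaiser-Meyer-Olkin measure of sampling adequacy
Ri  <- solve(R)
P   <- -Ri / sqrt(diag(Ri) %o% diag(Ri))         # partial correlations
off <- row(R) != col(R)                          # off-diagonal elements
kmo <- sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))

c(det_R = det(R), bartlett_chi2 = chi2_b, df = df_b, p = p_b, KMO = kmo)
max(abs(R[off]))                                 # check 4: |r| < .90
```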

5.1.1 Factor Retention

Rule Notes
Kaiser’s Criterion (\(\lambda > 1.0\); Kaiser, 1960) Tends to overextract; treat it as an upper bound at most
Scree Plot elbow Subjective; combine with quantitative criteria (Cattell, 1966)
Parallel Analysis Gold standard — compare observed eigenvalues to random-data eigenvalues (Horn, 1965)
Velicer’s MAP Minimizes average squared partial correlation (Velicer, 1976)
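A base-R sketch of Horn’s parallel analysis: observed eigenvalues are compared with the 95th percentile of eigenvalues from random normal data of the same size, and factors are retained while the observed value exceeds the random one. `parallel_analysis()` is a hypothetical helper; it uses eigenvalues of the full correlation matrix (the common PCA-based version), and factor-based variants exist.

```r
parallel_analysis <- function(X, reps = 500, quant = 0.95, seed = 1) {
  set.seed(seed)
  X <- as.matrix(X); n <- nrow(X); p <- ncol(X)
  obs  <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  rand <- replicate(reps, eigen(cor(matrix(rnorm(n * p), n, p)),
                                symmetric = TRUE, only.values = TRUE)$values)
  thr  <- apply(rand, 1, quantile, probs = quant)  # p x reps -> one threshold per position
  # Count leading positions where observed > random threshold
  n_keep <- if (obs[1] > thr[1]) max(which(cumprod(obs > thr) == 1)) else 0
  list(observed = obs, random = thr, retain = n_keep)
}

pa <- parallel_analysis(mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")])
round(rbind(observed = pa$observed, random_95 = pa$random), 2)
pa$retain
```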

Rotation: Oblique (Promax/Oblimin) for correlated factors (default for most social science constructs); Orthogonal (Varimax) only when factors are theoretically and empirically independent.

Reliability:

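\[\alpha_{\text{std}} = \frac{k\bar{r}}{1+(k-1)\bar{r}} \quad \text{(standardized form; Cronbach, 1951)} \qquad \omega_t = \frac{(\sum\lambda_i)^2}{(\sum\lambda_i)^2+\sum\delta_i} \quad \text{(McDonald, 1999; preferred)}\]

For a one-factor model this is omega total (\(\omega_t\)). Omega hierarchical (\(\omega_h\)) instead places the squared sum of the general-factor loadings from a bifactor model in the numerator, over the total score variance.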

Value Interpretation
\(\geq .90\) Excellent
\(\geq .80\) Good
\(\geq .70\) Acceptable
\(\geq .60\) Questionable
\(< .60\) Poor
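The standardized \(\alpha\) formula can be checked against the usual variance-based \(\alpha\): applied to \(z\)-scored items, the two are algebraically identical. The sketch uses four items from R’s built-in `attitude` data as an illustrative scale; `alpha_raw()` is a hypothetical helper name.

```r
items <- attitude[, c("complaints", "privileges", "learning", "raises")]
k     <- ncol(items)
R     <- cor(items)
r_bar <- mean(R[upper.tri(R)])                     # mean inter-item correlation
alpha_std <- k * r_bar / (1 + (k - 1) * r_bar)     # standardized formula

# Variance-based alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha_raw <- function(d) {
  d <- as.matrix(d); k <- ncol(d)
  k / (k - 1) * (1 - sum(apply(d, 2, var)) / var(rowSums(d)))
}

round(c(standardized = alpha_std,
        raw          = alpha_raw(items),
        raw_on_z     = alpha_raw(scale(items))), 3)
```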

Red Flags 🚩 Using PCA for construct validation; reporting Kaiser’s criterion alone; Cronbach’s \(\alpha\) as evidence of unidimensionality (it is not — use \(\omega_h\); McNeish, 2018).


5.2 Confirmatory Factor Analysis (CFA)

Use when: Testing a pre-specified factor structure grounded in theory or prior EFA.

Additional assumptions: \(N \geq 200\); \(N \geq 5\) per free parameter; use MLR for non-normal continuous indicators and WLSMV for ordinal indicators.

5.2.1 Model Fit Indices

Index Acceptable Good Note
\(\chi^2/df\) \(< 5.0\) \(< 2.0\) \(\chi^2\) alone is \(N\)-sensitive
CFI \(\geq .90\) \(\geq .95\)
TLI \(\geq .90\) \(\geq .95\) Penalizes complexity
RMSEA \(< .08\) \(< .06\) Report 90% CI; test \(H_0\): RMSEA \(\leq .05\)
SRMR \(< .10\) \(< .08\)

Good-fit cutoffs for CFI, TLI, RMSEA, and SRMR follow Hu & Bentler (1999).

5.2.2 Validity Evidence

Type Indicator Threshold
Convergent AVE \(\geq .50\)
Convergent Factor loadings \(\lambda \geq .50\) (ideally \(\geq .70\))
Discriminant Fornell–Larcker: \(AVE > \phi^2_{ij}\) For each construct pair
Discriminant HTMT \(< .85\) (strict) or \(< .90\) (liberal)
Composite reliability CR \(\geq .70\)

\[AVE = \frac{\sum\lambda_i^2}{\sum\lambda_i^2+\sum\delta_i}, \qquad CR = \frac{(\sum\lambda_i)^2}{(\sum\lambda_i)^2+\sum\delta_i}\]

Red Flags 🚩 Relying on \(\chi^2\) alone; applying modification indices \(> 10\) atheoretically; Heywood cases (negative residual variances); not testing discriminant validity.
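A worked sketch of the AVE and CR formulas using hypothetical standardized loadings (not from any real study), with \(\delta_i = 1 - \lambda_i^2\). CR is the same expression as \(\omega_t\) in the EFA section. The example shows how a scale can pass CR while failing AVE.

```r
lambda <- c(0.70, 0.60, 0.80, 0.50)     # hypothetical standardized loadings
delta  <- 1 - lambda^2                  # standardized residual variances

AVE <- sum(lambda^2) / (sum(lambda^2) + sum(delta))
CR  <- sum(lambda)^2 / (sum(lambda)^2 + sum(delta))
round(c(AVE = AVE, CR = CR), 3)         # AVE 0.435 (< .50), CR 0.749 (>= .70)

# Fornell-Larcker check against a hypothetical latent correlation
phi <- 0.55
AVE > phi^2                             # TRUE: 0.435 > 0.3025
```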


5.3 Structural Equation Modeling (SEM)

Use when: Testing complex theoretical models with latent variables, mediation, and/or moderation.

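\[\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}(\mathbf{I}-\mathbf{B})^{-1}\boldsymbol{\Psi}\left[(\mathbf{I}-\mathbf{B})^{-1}\right]'\boldsymbol{\Lambda}' + \boldsymbol{\Theta}\]

where \(\mathbf{B}\) holds the structural paths among latent variables and \(\boldsymbol{\Psi}\) the covariances of their disturbances. With no structural paths (\(\mathbf{B} = \mathbf{0}\), \(\boldsymbol{\Psi} = \boldsymbol{\Phi}\)) this reduces to the CFA form \(\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}' + \boldsymbol{\Theta}\).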

Rules: \(N \geq 200\); \(N \geq 10\) per free parameter; bootstrap \(B \geq 5000\) for indirect effects; report bias-corrected 95% CI for mediation (Preacher & Hayes, 2008).

Mediation indirect effect:

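\[\text{indirect effect} = a \times b, \qquad a: X \to M, \quad b: M \to Y \text{ (controlling for } X\text{)}, \qquad c = c' + ab\]

where \(c\) is the total effect of \(X\) on \(Y\) and \(c'\) the direct effect; the decomposition holds exactly for linear models estimated on the same cases.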

Avoid Sobel test — use bias-corrected bootstrapped CIs instead (MacKinnon et al., 2004).

Red Flags 🚩 Not checking temporal precedence for mediation; treating non-significant direct paths as full mediation without bootstrapped CIs; ignoring equivalent models.
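A base-R sketch of a single-mediator model with a bias-corrected bootstrap interval. The causal ordering (X = am, M = wt, Y = mpg in mtcars) is assumed purely for illustration, the models are observed-variable OLS regressions rather than a latent SEM, and `indirect()` is a hypothetical helper. The interval is the bias-corrected (BC) percentile form; BCa would add an acceleration term.

```r
set.seed(123)
dat <- mtcars[, c("am", "wt", "mpg")]    # assumed roles: X = am, M = wt, Y = mpg

indirect <- function(d) {
  a <- lm.fit(cbind(1, d$am), d$wt)$coefficients[2]           # a: X -> M
  b <- lm.fit(cbind(1, d$am, d$wt), d$mpg)$coefficients[3]    # b: M -> Y given X
  unname(a * b)
}

est  <- indirect(dat)
boot <- replicate(5000, indirect(dat[sample(nrow(dat), replace = TRUE), ]))

# Bias-corrected percentile interval: shift the percentiles by z0
z0 <- qnorm(mean(boot < est))
ci <- quantile(boot, pnorm(2 * z0 + qnorm(c(0.025, 0.975))), names = FALSE)
round(c(indirect = est, lower = ci[1], upper = ci[2]), 3)

# Decomposition check: total effect c = direct effect c' + a * b
c_total <- unname(coef(lm(mpg ~ am, data = dat))["am"])
c_prime <- unname(coef(lm(mpg ~ am + wt, data = dat))["am"])
round(c(c = c_total, c_prime = c_prime, ab = est), 3)
```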


5.4 Cluster Analysis

Use when: Discovering natural groupings in data with no predefined criterion variable.

Method Best When
Hierarchical (Ward’s linkage) Small \(N\); unknown number of clusters (Ward, 1963)
\(K\)-Means Large \(N\); hypothesized or fixed \(k\)
GMM (Gaussian Mixture Models) Overlapping clusters; probabilistic assignment
DBSCAN Non-spherical clusters; noise/outlier handling needed

Rules: Always standardize variables before clustering; determine \(k\) using Elbow + Silhouette (\(\bar{s} \geq .50\); Rousseeuw, 1987) + Gap statistic; validate on split-half samples.

Red Flags 🚩 Unstandardized variables; researcher-chosen \(k\) without empirical support; treating cluster membership as a measured IV in subsequent regression (circular).
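A base-R sketch of the \(k\)-selection rules: variables on different scales are standardized, \(k\)-means is run for \(k = 2\) to \(6\), and the within-cluster sum of squares (for the elbow) and the average silhouette width are tabulated. `avg_silhouette()` is a hypothetical helper that follows the definition directly; the cluster package offers `silhouette()` and `clusGap()` (gap statistic) as add-on alternatives. The four mtcars variables are an illustrative choice.

```r
set.seed(42)
Z <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])   # standardize: different units
D <- as.matrix(dist(Z))

# Average silhouette width: s(i) = (b - a) / max(a, b)
avg_silhouette <- function(cl, D) {
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)                       # singleton convention
    a <- sum(D[i, own]) / (sum(own) - 1)               # own cluster, self excluded
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(j) mean(D[i, cl == j])))  # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(s)
}

ks   <- 2:6
fits <- lapply(ks, function(k) kmeans(Z, centers = k, nstart = 25))
res  <- data.frame(k          = ks,
                   within_ss  = sapply(fits, `[[`, "tot.withinss"),   # elbow
                   silhouette = sapply(fits, function(f) avg_silhouette(f$cluster, D)))
res
best_k <- res$k[which.max(res$silhouette)]
best_k
```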


6 Master Decision Tree

Logic flow: If [Data Type] + If [Analysis Goal] → Use [Test]

IF goal = DESCRIBE distribution → Descriptive statistics, histograms, box plots, density plots

IF goal = COMPARE groups
  IF groups = 2
    IF independent + continuous DV + normal residuals → Independent t-test (Welch's if variances differ)
    IF independent + non-normal / ordinal DV → Mann-Whitney U
    IF related/paired + continuous DV + normal diffs → Paired t-test
    IF related/paired + non-normal / ordinal DV → Wilcoxon Signed-Rank
  IF groups >= 3
    IF independent + continuous + normal + equal var → One-Way ANOVA
    IF independent + continuous + normal + unequal var → Welch's ANOVA
    IF independent + non-normal / ordinal DV → Kruskal-Wallis H
    IF repeated measures + continuous → RM-ANOVA
    IF repeated measures + non-normal / ordinal → Friedman Test
  IF 2+ IVs (factorial) → Factorial ANOVA
  IF 2+ continuous DVs simultaneously → MANOVA

IF goal = CORRELATE variables
  IF both continuous + linear + normal → Pearson r
  IF ordinal / non-normal / outliers present → Spearman rho
  IF many ties or small N → Kendall tau
  IF 1 continuous + 1 binary → Point-Biserial r_pb
  IF both categorical (nominal) → Chi-Square (or Fisher Exact)
  IF both ordinal in contingency table → Gamma or Somers' d

IF goal = PREDICT an outcome
  IF DV continuous + predictors any scale → OLS Multiple Regression
  IF DV continuous + blocks of predictors → Hierarchical Regression
  IF DV binary (0/1) → Logistic Regression
  IF DV ordinal (3+ ordered levels) → Ordinal Logistic Regression
  IF DV is a count → Poisson / Negative Binomial
  IF DV continuous + predictors correlated (VIF>10) → Ridge / LASSO Regression
  IF complex model with latent variables → SEM / CFA

IF goal = FIND STRUCTURE in variables
  IF goal = data reduction, no theory → PCA
  IF goal = latent constructs, exploratory → EFA
  IF goal = test pre-specified factor structure → CFA
  IF goal = test complex latent path model → SEM

IF goal = FIND GROUPS in cases
  IF N small, k unknown → Hierarchical Cluster (Ward)
  IF N large, k hypothesized → K-Means Cluster
  IF overlapping groups, probabilistic → Gaussian Mixture Model
  IF non-spherical, with noise → DBSCAN
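As an illustration, the two-group and three-or-more-group comparison branches of the tree can be written as a small function. It is a hypothetical helper, not part of any package. It follows the tree literally for one DV and one grouping IV. It does not cover the factorial ANOVA or MANOVA branches, and it does not replace judgment about the design.

```python
def choose_comparison_test(n_groups, paired, continuous, normal, equal_var=True):
    """Follow the COMPARE branch of the decision tree for one DV and one grouping IV.

    n_groups   -- number of groups or conditions (>= 2)
    paired     -- True for related / repeated-measures designs
    continuous -- True for an interval/ratio DV, False for an ordinal DV
    normal     -- True if residuals (or paired differences) look approximately normal
    equal_var  -- homogeneity of variance; consulted only for >= 3 independent groups
    """
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    parametric = continuous and normal
    if n_groups == 2:
        if paired:
            return "Paired t-test" if parametric else "Wilcoxon Signed-Rank"
        return "Independent t-test" if parametric else "Mann-Whitney U"
    if paired:
        return "RM-ANOVA" if parametric else "Friedman Test"
    return "One-Way ANOVA" if parametric and equal_var else "Kruskal-Wallis H"


print(choose_comparison_test(3, paired=False, continuous=True, normal=True, equal_var=False))
```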


7 Effect Size Reference

7.1 Why Effect Sizes Matter — Statistical vs. Practical Significance

Core principle: Statistical significance (\(p < .05\)) tells you that data at least this extreme would be unlikely if the true effect were zero. Effect size tells you how large the effect is in practice. With large samples, even trivially small effects achieve \(p < .001\); with small samples, important effects may not reach \(p < .05\). Always report and interpret effect sizes alongside \(p\)-values (Cohen, 1994; Cumming, 2014).

Worked example of the distinction:

A study of \(N = 1000\) (two groups of 500) finds: \(t(998) = 3.16\), \(p = .002\), Cohen’s \(d = 0.20\) (small effect). The difference is probably real but trivial in practice: it explains only \(\approx 1\%\) of variance.

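Another study with \(N = 40\) finds: \(t(38) = 1.90\), \(p = .065\), Cohen’s \(d = 0.62\) (medium effect). The effect is practically meaningful but the study is underpowered. The same \(d\) would be significant with \(N \approx 70\), and 80% power at \(\alpha = .05\) (two-tailed) requires about 42 participants per group (\(N \approx 84\)).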
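The arithmetic behind these examples can be checked with a short Python sketch. The helper names are hypothetical. The sketch assumes an independent-samples design with equal group sizes. Sample size uses the normal approximation \(n = 2(z_{1-\alpha/2} + z_{1-\beta})^2/d^2\), which runs slightly low. Exact \(t\)-based tools such as G*Power (Faul et al., 2007) usually add about one participant per group.

```python
import math
from statistics import NormalDist


def t_from_d(d, n1, n2):
    """Independent-samples t implied by Cohen's d and the two group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def r2_from_d(d):
    """Variance explained, r^2 = d^2 / (d^2 + 4); assumes equal group sizes."""
    return d ** 2 / (d ** 2 + 4)


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation n per group for a two-tailed, two-sample comparison."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)


print(round(t_from_d(0.20, 500, 500), 2))  # 3.16
print(round(r2_from_d(0.20), 3))           # 0.01
print(n_per_group(0.62))                   # 41 (normal approximation)
```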

7.2 Cohen’s d — Mean Difference Effect Size

\[d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}\]

Value Interpretation Variance explained (\(r^2\))
\(.20\) Small \(\approx 1\%\)
\(.50\) Medium \(\approx 6\%\)
\(.80\) Large \(\approx 14\%\)
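The variance-explained column can be reproduced with a minimal Python sketch. The function names are hypothetical. `cohens_d` uses the pooled standard deviation from the formula above. `d_to_r` converts \(d\) to \(r\) with \(r = d/\sqrt{d^2 + 4}\), which assumes equal group sizes.

```python
import math
from statistics import fmean, variance


def cohens_d(x1, x2):
    """Cohen's d = (mean1 - mean2) / s_pooled, with s_pooled from sample variances."""
    n1, n2 = len(x1), len(x2)
    s_pooled = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (fmean(x1) - fmean(x2)) / s_pooled


def d_to_r(d):
    """Point-biserial r implied by d, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)


for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: r^2 = {d_to_r(d) ** 2:.1%}")  # 1.0%, 5.9%, 13.8%

print(round(cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7]), 2))  # hypothetical scores
```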

7.3 Eta-Squared \(\eta^2\) and Omega-Squared \(\omega^2\) — ANOVA Effect Sizes

\[\eta^2 = \frac{SS_{\text{Between}}}{SS_{\text{Total}}} \qquad \omega^2 = \frac{SS_B - (k-1)MS_W}{SS_T + MS_W}\]

\(\omega^2\) is preferred over \(\eta^2\) for small samples — \(\eta^2\) is a biased overestimate.

Partial \(\eta^2_p\) is used in factorial designs and MANOVA:

\[\eta^2_p = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}\]

Metric Small Medium Large
\(\eta^2\) / \(\omega^2\) .01 .06 .14
Cohen’s \(f = \sqrt{\eta^2/(1-\eta^2)}\) .10 .25 .40
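The formulas in this section can be computed directly from the sums of squares of a one-way between-subjects ANOVA. The Python sketch below does this, with hypothetical names and toy data. In a one-way design, partial \(\eta^2_p\) equals \(\eta^2\). The separate `partial_eta2` helper is meant for a single effect in a factorial design.

```python
from statistics import fmean


def oneway_effect_sizes(groups):
    """eta^2, omega^2 and Cohen's f for a one-way between-subjects ANOVA."""
    scores = [v for g in groups for v in g]
    grand = fmean(scores)
    k, n_total = len(groups), len(scores)
    ss_b = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_w = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ss_t = ss_b + ss_w
    ms_w = ss_w / (n_total - k)
    eta2 = ss_b / ss_t
    omega2 = (ss_b - (k - 1) * ms_w) / (ss_t + ms_w)  # can fall below 0 for tiny effects
    f = (eta2 / (1 - eta2)) ** 0.5
    return {"eta2": eta2, "omega2": omega2, "f": f}


def partial_eta2(ss_effect, ss_error):
    """Partial eta^2 for one effect in a factorial design."""
    return ss_effect / (ss_effect + ss_error)


print(oneway_effect_sizes([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the toy data, \(SS_B = 54\), \(SS_W = 6\), and \(MS_W = 1\). That gives \(\eta^2 = .90\), while \(\omega^2 = 52/61 \approx .85\) is smaller, which shows the bias correction.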

7.4 Complete Effect Size Reference Table

Test Family Metric Small Medium Large Formula
\(t\)-test Cohen’s \(d\) .20 .50 .80 \((\bar{X}_1-\bar{X}_2)/s_p\)
ANOVA \(\eta^2\) .01 .06 .14 \(SS_B/SS_T\)
ANOVA \(\omega^2\) .01 .06 .14 \((SS_B-(k-1)MS_W)/(SS_T+MS_W)\)
ANOVA Cohen’s \(f\) .10 .25 .40 \(\sqrt{\eta^2/(1-\eta^2)}\)
Correlation Pearson \(r\) .10 .30 .50 \(\text{cov}(X,Y)/s_Xs_Y\)
Regression Cohen’s \(f^2\) .02 .15 .35 \(R^2/(1-R^2)\)
\(\chi^2\) Cramér’s \(V\) .10 .30 .50 \(\sqrt{\chi^2/[N\min(r{-}1,c{-}1)]}\)
Non-parametric \(r\) .10 .30 .50 \(Z/\sqrt{N}\)
MANOVA \(\eta^2_p\) .01 .06 .14 \(SS_{\text{eff}}/(SS_{\text{eff}}+SS_{\text{err}})\)
Logistic Reg. Nagelkerke \(R^2\) .02 .13 .26 Pseudo-\(R^2\)
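Several formulas in the table take only a line or two each. The Python sketch below shows them, with hypothetical function names. `cramers_v` computes the uncorrected Pearson \(\chi^2\) (no Yates correction) from a table of observed counts before applying the formula in the table.

```python
import math


def cramers_v(table):
    """Cramér's V from an r x c table of observed counts (uncorrected chi-square)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, r in enumerate(table)
        for j, obs in enumerate(r)
    )
    return math.sqrt(chi2 / (n * min(len(rows) - 1, len(cols) - 1)))


def cohens_f2(r2):
    """Cohen's f^2 = R^2 / (1 - R^2) for regression."""
    return r2 / (1 - r2)


def rank_test_r(z, n):
    """Effect size r = Z / sqrt(N) for a non-parametric test's standardized statistic."""
    return z / math.sqrt(n)


print(round(cramers_v([[10, 20], [20, 10]]), 3))  # hypothetical 2 x 2 counts
print(round(cohens_f2(0.13), 3), round(rank_test_r(2.5, 100), 3))
```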

Benchmarks: Cohen (1988, 1992); Lakens (2013)


8 Common Transformation Guide

Apply when parametric assumptions are violated and non-parametric alternatives are not preferred (Osborne, 2002).

Skew Pattern Transformation Formula Notes
Moderate positive Square Root \(X' = \sqrt{X}\) Requires \(X \geq 0\)
Substantial positive Logarithmic \(X' = \ln(X)\) or \(\log_{10}(X)\) Requires \(X > 0\); add constant if zeros present
Severe positive Inverse (Reciprocal) \(X' = 1/X\) Requires \(X > 0\); reverses rank order (use \(-1/X\) to preserve it)
Moderate negative Reflect + Square Root \(X' = \sqrt{k-X}\), \(k=\max(X)+1\) Reflection reverses direction: flip the sign of coefficients and correlations when interpreting
Substantial negative Reflect + Log \(X' = \ln(k-X)\) Same reflection strategy
Proportions \([0,1]\) Arcsine Square Root \(X' = \arcsin\!\sqrt{p}\) Stabilizes binomial variance
Count data Square Root or Log \(X' = \sqrt{X}\) or \(\ln(X+1)\) Consider Poisson regression
Bimodal Do not transform Investigate subpopulations
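The table can be expressed as a small Python helper that enforces each transformation's domain requirement. It also checks skewness before and after. The function names, the `kind` keywords, and the data are hypothetical. Skewness here is the Fisher–Pearson coefficient \(g_1 = m_3/m_2^{3/2}\) computed from population moments.

```python
import math
from statistics import fmean


def skewness(x):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (population moments)."""
    m = fmean(x)
    m2 = fmean([(v - m) ** 2 for v in x])
    m3 = fmean([(v - m) ** 3 for v in x])
    return m3 / m2 ** 1.5


def transform(x, kind):
    """Apply one transformation from the table, enforcing its domain requirement."""
    if kind == "sqrt":
        if min(x) < 0:
            raise ValueError("square root requires X >= 0")
        return [math.sqrt(v) for v in x]
    if kind == "log":
        if min(x) <= 0:
            raise ValueError("log requires X > 0; add a constant first")
        return [math.log(v) for v in x]
    if kind == "log_count":  # ln(X + 1) for counts that include zeros
        if min(x) < 0:
            raise ValueError("counts must be non-negative")
        return [math.log1p(v) for v in x]
    if kind == "inverse":  # 1/X reverses rank order
        if min(x) <= 0:
            raise ValueError("inverse is applied here only to X > 0")
        return [1 / v for v in x]
    if kind in ("reflect_sqrt", "reflect_log"):  # negative skew: reflect with k = max(X) + 1
        k = max(x) + 1
        f = math.sqrt if kind == "reflect_sqrt" else math.log
        return [f(k - v) for v in x]
    if kind == "arcsine":  # proportions in [0, 1]
        if min(x) < 0 or max(x) > 1:
            raise ValueError("arcsine needs proportions in [0, 1]")
        return [math.asin(math.sqrt(p)) for p in x]
    raise ValueError(f"unknown transformation: {kind}")


skewed = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]  # hypothetical positively skewed scores
print(f"raw g1 = {skewness(skewed):.2f}, log g1 = {skewness(transform(skewed, 'log')):.2f}")
```

Under the residual-based guidance earlier in this guide, re-check the model residuals after transforming. Checking the transformed raw variable alone is not enough.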

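Back-transformation: The mean of log-transformed scores, back-transformed with \(e^{\bar{X}'}\) (or \(10^{\bar{X}'}\) for \(\log_{10}\)), is the geometric mean on the original scale, not the arithmetic mean. Always report results in original units.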
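A quick Python check of this point uses hypothetical, right-skewed values. Exponentiating the mean of \(\ln(X)\) reproduces `statistics.geometric_mean`. For skewed data, that value sits well below the arithmetic mean.

```python
import math
from statistics import fmean, geometric_mean

incomes = [20_000, 25_000, 30_000, 40_000, 250_000]  # hypothetical right-skewed values
log_scores = [math.log(v) for v in incomes]           # X' = ln(X)
back = math.exp(fmean(log_scores))                     # back-transform the mean of X'

print(f"back-transformed mean = {back:,.0f}")
print(f"geometric mean        = {geometric_mean(incomes):,.0f}")
print(f"arithmetic mean       = {fmean(incomes):,.0f}")
```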


9 APA Reporting Checklist

9.1 For Every Statistical Test

9.2 For OLS Regression

9.3 For EFA

9.4 For CFA and SEM


10 Quick Formula Pocket Card

\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad\qquad F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}}\]

\[r = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{(n-1)s_Xs_Y} \qquad\qquad \rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}\]
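The two correlation formulas can be connected with a short Python sketch (function names and data are hypothetical). When there are no ties, the shortcut formula for \(\rho\) equals Pearson's \(r\) computed on the ranks. The `pearson` helper divides by \(\sqrt{SS_X SS_Y}\), which is algebraically the same as the card's \((n-1)s_Xs_Y\) form.

```python
import math


def ranks(x):
    """Ranks 1..n; assumes no ties, as the shortcut formula does."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def pearson(x, y):
    """Pearson r; the (n - 1) terms of the card's formula cancel."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_shortcut(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), where d_i is the rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
y = [3, 1, 40, 7, 900]  # hypothetical; 900 is an outlier
print(spearman_shortcut(x, y), pearson(ranks(x), ranks(y)), pearson(x, y))
```

The outlier barely changes \(\rho\) but pulls Pearson's \(r\) on the raw scores. This is why the decision tree routes outlier-prone data to Spearman.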

\[\chi^2 = \sum\frac{(O-E)^2}{E} \qquad OR = e^{b} \qquad \text{logit}(p) = \ln\!\frac{p}{1-p}\]

\[d = \frac{\bar{X}_1-\bar{X}_2}{s_p} \qquad \eta^2 = \frac{SS_B}{SS_T} \qquad VIF_j = \frac{1}{1-R^2_j}\]

\[SE(b_j) = \sqrt{\frac{MS_{\text{Res}}}{\sum(X_{ij}-\bar{X}_j)^2}\cdot VIF_j} \qquad CI_{95\%}: \hat{\theta} \pm 1.96\cdot SE_{\hat{\theta}}\]
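The Python sketch below shows how VIF enters the standard error, using hypothetical helper names. It assumes the simplest case of two predictors, where \(R^2_j\) is just their squared correlation \(r_{12}^2\). The standard error grows by a factor of \(\sqrt{VIF_j}\), so the 95% CI widens by the same factor.

```python
import math


def vif_two_predictors(r12):
    """With two predictors, R^2_j = r12^2, so VIF = 1 / (1 - r12^2) for both."""
    return 1 / (1 - r12 ** 2)


def se_b(ms_res, ss_xj, vif_j):
    """SE(b_j) = sqrt(MS_Res / SS_Xj * VIF_j), where SS_Xj = sum((X_ij - mean_j)^2)."""
    return math.sqrt(ms_res / ss_xj * vif_j)


def ci95(est, se):
    """Large-sample 95% CI: estimate +/- 1.96 SE."""
    return est - 1.96 * se, est + 1.96 * se


for r in (0.0, 0.5, 0.9, 0.95):
    v = vif_two_predictors(r)
    print(f"r12 = {r:.2f}  VIF = {v:6.2f}  SE multiplier = {math.sqrt(v):.2f}")
```

At \(r_{12} = .95\), the VIF passes 10, the threshold the decision tree uses to suggest Ridge or LASSO.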

\[1-\beta = \text{Power} \qquad \alpha = \text{Type I Error} \qquad \beta = \text{Type II Error}\]


11 References

Foundational & General Statistics

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966

Field, A. (2024). Discovering statistics using IBM SPSS statistics (6th ed.). SAGE Publications.

Gravetter, F. J., & Wallnau, L. B. (2021). Statistics for the behavioral sciences (10th ed.). Cengage Learning.

Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics (7th ed.). Pearson Education.

Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals. American Psychologist, 54(8), 594–604. https://doi.org/10.1037/0003-066X.54.8.594


Parametric Tests

Box, G. E. P. (1954). Some theorems on quadratic forms in the study of analysis of variance problems. Annals of Mathematical Statistics, 25(2), 290–302. https://doi.org/10.1214/aoms/1177728786

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146

Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal N’s and/or variances. Journal of Educational Statistics, 1(2), 113–125. https://doi.org/10.2307/1164979

Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24(2), 95–112. https://doi.org/10.1007/BF02289823

Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data. Journal of Educational Statistics, 1(1), 69–82. https://doi.org/10.2307/1164736

Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to probability and statistics (pp. 278–292). Stanford University Press.

Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204–209. https://doi.org/10.1214/aoms/1177731915

Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611. https://doi.org/10.1093/biomet/52.3-4.591

Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. https://doi.org/10.2307/3001913

Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3–4), 330–336. https://doi.org/10.1093/biomet/38.3-4.330


Non-Parametric Tests

Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3), 241–252. https://doi.org/10.1080/00401706.1964.10490181

Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200), 675–701. https://doi.org/10.1080/01621459.1937.10503522

Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–93. https://doi.org/10.1093/biomet/30.1-2.81

Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621. https://doi.org/10.1080/01621459.1952.10483441

Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60. https://doi.org/10.1214/aoms/1177730491

Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). McGraw-Hill.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. https://doi.org/10.2307/3001968


Chi-Square & Association

Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87–94. https://doi.org/10.2307/2340521

McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. https://doi.org/10.1007/BF02295996

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175. https://doi.org/10.1080/14786440009463897

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217–235. https://doi.org/10.2307/2983604


Regression Analysis

Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons.

Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294. https://doi.org/10.2307/1911963

Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19(1), 15–18. https://doi.org/10.2307/1268249

Harrell, F. E. (2015). Regression modeling strategies (2nd ed.). Springer. https://doi.org/10.1007/978-3-319-19425-7

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). John Wiley & Sons. https://doi.org/10.1002/9781118548387

Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373–1379. https://doi.org/10.1016/S0895-4356(96)00236-3

Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Harcourt Brace.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838. https://doi.org/10.2307/1912934


MANOVA

Olson, C. L. (1976). On choosing a test statistic in multivariate analysis of variance. Psychological Bulletin, 83(4), 579–586. https://doi.org/10.1037/0033-2909.83.4.579

Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, 26(1), 117–121. https://doi.org/10.1214/aoms/1177728599

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). Routledge.

Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24(3–4), 471–494. https://doi.org/10.1093/biomet/24.3-4.471


Factor Analysis, Reliability & SEM

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173

Byrne, B. M. (2016). Structural equation modeling with AMOS (3rd ed.). Routledge.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276. https://doi.org/10.1207/s15327906mbr0102_10

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Lawrence Erlbaum Associates.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555

Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50. https://doi.org/10.1177/002224378101800104

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.

Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis (3rd ed.). The Guilford Press.

Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629

Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. https://doi.org/10.1007/s11747-014-0403-8

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151. https://doi.org/10.1177/001316446002000116

Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). The Guilford Press.

MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect. Multivariate Behavioral Research, 39(1), 99–128. https://doi.org/10.1207/s15327906mbr3901_4

McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum Associates.

McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144

Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879–891. https://doi.org/10.3758/BRM.40.3.879

Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327. https://doi.org/10.1007/BF02293557


Cluster Analysis, Effect Sizes & Transformations

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th ed.). John Wiley & Sons. https://doi.org/10.1002/9780470977811

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, Article 863. https://doi.org/10.3389/fpsyg.2013.00863

Osborne, J. W. (2002). Notes on the use of data transformations. Practical Assessment, Research & Evaluation, 8(6), 1–8. https://doi.org/10.7275/4vng-5608

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989X.7.2.147

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845