📖 How to Use This Guidebook: Content is now organized by Analysis Goal — Comparison, Correlation, Prediction, and Classification — matching the way researchers actually frame research questions. Each parametric test is paired with its non-parametric alternative. Technical assumptions have been corrected to specify that normality and homoscedasticity apply to residuals, not raw variables. New sections cover the n = 30 myth, VIF thresholds, and effect size interpretation. All code blocks now use a high-contrast light theme for full readability on RPubs and RStudio Viewer.
Before choosing any statistical test, pass through these four gates:
| Gate | Question | Action if Violated |
|---|---|---|
| Measurement Scale | What is the scale of the DV? (Nominal / Ordinal / Interval / Ratio) | Reclassify; choose appropriate test family |
| Independence | Are observations independent of each other? | Use multilevel or mixed models if clustered |
| Missing Data | Is missingness MCAR / MAR / MNAR? | Impute (MI or FIML); avoid listwise deletion |
| Sample Size | Is \(N\) adequate for the chosen test and expected effect size? | Run a priori power analysis; target \(1-\beta \geq .80\) |
Critical correction: Normality and homoscedasticity assumptions in parametric tests apply to residuals (errors) — not to the raw independent variables or raw dependent variable. Always diagnose residual distributions, never raw scores.
| Tool | Best For | Limitation |
|---|---|---|
| Shapiro–Wilk test | \(N < 50\); sensitive, formal test | At \(n > 100\), trivially rejects normality due to over-sensitivity — \(p\)-value loses diagnostic value |
| Kolmogorov–Smirnov | \(N \geq 50\) | Less powerful than Shapiro–Wilk; requires the Lilliefors correction when the mean and variance are estimated from the sample |
| Q-Q Plot (visual) | Preferred for \(n > 100\) | Requires judgment; not a formal test |
| Histogram + density | Any \(N\); useful for skew/kurtosis | Qualitative only |
Rule: For \(n \leq 100\), report Shapiro–Wilk \(W\) and \(p\)-value alongside a Q-Q plot. For \(n > 100\), rely on Q-Q plot inspection: at that sample size the test is powerful enough that a significant result may flag only trivial departures, so the \(p\)-value alone says little about whether the departure matters.
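The rule above can be sketched in base R. This is a minimal illustration on the built-in `mtcars` data; the model `mpg ~ wt` and the object names `fit_norm`, `res`, and `sw` are chosen here only for demonstration.

```r
# Fit a simple model on a built-in dataset (mtcars ships with base R)
fit_norm <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit_norm)

# Formal test: reasonable here because n = 32 (well under 100)
sw <- shapiro.test(res)
cat(sprintf("Shapiro-Wilk on residuals: W = %.3f, p = %.3f\n",
            sw$statistic, sw$p.value))

# Visual check: points should track the reference line
qqnorm(res, main = "Q-Q plot of residuals")
qqline(res)
```

Note that the test and the plot are both applied to `residuals(fit_norm)`, never to `mtcars$mpg` directly.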
| Test | Use In | What It Tests |
|---|---|---|
| Levene’s Test | Group comparison designs (t-test, ANOVA) | Equality of variances across groups |
| Bartlett’s Test | Group designs with confirmed normality | Equality of variances — more powerful but less robust to non-normality |
| Breusch–Pagan Test | Regression models | Systematic relationship between residual variance and fitted values |
| White Test | Regression models | More general heteroscedasticity; no normality assumption required |
| Residual vs. Fitted Plot | Regression (visual) | Fan-shaped pattern signals heteroscedasticity |
Context rule: Use Levene’s Test for ANOVA-family designs. Use Breusch–Pagan (or White’s test) specifically for regression models. Using Levene’s for regression is a category error — it tests group variances, not the regression error structure.
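For group designs, a median-centered Levene's test (the Brown–Forsythe variant) is simply a one-way ANOVA on absolute deviations from the group medians, so it can be sketched without extra packages. The grouping by `cyl` and the object names below are illustrative assumptions; the result should agree with a median-centered Levene's test from a dedicated package, but that has not been verified here.

```r
# Brown-Forsythe variant of Levene's test: one-way ANOVA on the
# absolute deviations of each score from its group median
grp <- factor(mtcars$cyl)
abs_dev <- abs(mtcars$hp - ave(mtcars$hp, grp, FUN = median))
levene_fit <- anova(lm(abs_dev ~ grp))
levene_F <- levene_fit[["F value"]][1]
levene_p <- levene_fit[["Pr(>F)"]][1]
cat(sprintf("Levene (median-centered): F = %.3f, p = %.4f\n", levene_F, levene_p))

# Bartlett's test (base R) for comparison; it assumes normality within groups
bart <- bartlett.test(hp ~ factor(cyl), data = mtcars)
cat(sprintf("Bartlett: K2 = %.3f, p = %.4f\n", bart$statistic, bart$p.value))
```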
The oft-cited rule “n ≥ 30 is sufficient for the Central Limit Theorem” is an oversimplification that causes systematic errors in practice.
| Distribution Type | Minimum Recommended \(N\) | Rationale |
|---|---|---|
| Approximately normal, symmetric | \(n \geq 20\) per group | CLT converges rapidly |
| Mild skew (skewness \(\lvert s \rvert < 1\)) | \(n \geq 30\) per group | Standard guidance applies |
| Moderate-to-heavy skew (\(\lvert s \rvert \geq 1\)) | \(n \geq 100\) per group | CLT convergence is substantially slower |
| Fat-tailed distributions (excess kurtosis \(> 3\)) | \(n \geq 100\)–\(200\) per group | Heavy tails create persistent sampling instability |
| Bimodal distributions | Use mixture models | CLT does not resolve structural bimodality |
The myth: \(n = 30\) is a minimum floor for mild departures from normality — it is not a universal pass for all distributions. For social science data (often skewed, bounded Likert composites), \(n \geq 100\) is the safer working rule before treating the sampling distribution of \(\bar{X}\) as approximately normal.
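A small simulation makes the point concrete. The sketch below estimates how often a nominal 95% \(t\)-interval covers the true mean when data are drawn from a heavily skewed lognormal distribution; the distribution, the seed, the sample sizes, and the helper name `coverage` are all assumptions chosen for illustration.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible

# Proportion of nominal 95% t-intervals that cover the true mean
coverage <- function(n, rdist, true_mean, reps = 2000) {
  hits <- replicate(reps, {
    x <- rdist(n)
    ci <- t.test(x)$conf.int
    ci[1] <= true_mean && true_mean <= ci[2]
  })
  mean(hits)
}

# Lognormal(0, 1) is heavily right-skewed; its true mean is exp(0.5)
r_skew <- function(n) rlnorm(n, meanlog = 0, sdlog = 1)
sizes <- c(30, 100, 500)
cov_skew <- sapply(sizes, coverage, rdist = r_skew, true_mean = exp(0.5))
names(cov_skew) <- paste0("n=", sizes)
print(round(cov_skew, 3))
```

With skew this strong, coverage at \(n = 30\) typically falls short of the nominal 95% and approaches it only as \(n\) grows.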
The Variance Inflation Factor (VIF) measures how much the sampling variance of a regression coefficient is inflated by collinearity with the other predictors:
\[VIF_j = \frac{1}{1 - R^2_j}\]
where \(R^2_j\) is the coefficient of determination from regressing predictor \(X_j\) on all remaining predictors.
| VIF Value | Interpretation | Action |
|---|---|---|
| \(VIF < 3\) | No concern | Proceed normally |
| \(3 \leq VIF < 5\) | Mild | Monitor; report; no action required |
| \(5 \leq VIF < 10\) | Moderate concern | Investigate; consider combining or centering predictors |
| \(VIF \geq 10\) | Serious problem | Multicollinearity is distorting coefficients; act (ridge regression, remove predictors, PCA on correlated set) |
Also inspect the Condition Index (from eigenvalue decomposition of \(\mathbf{X}'\mathbf{X}\)): \(CI = \sqrt{\lambda_{\max}/\lambda_j}\); values \(> 30\) confirm serious collinearity.
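Both diagnostics can be computed directly from their definitions. The sketch below uses the predictors `wt`, `hp`, and `am` from `mtcars`; scaling the columns of \(\mathbf{X}\) to unit length before the eigen-decomposition is a common convention that is assumed here, not something the formula above specifies.

```r
preds <- c("wt", "hp", "am")

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing X_j on the other predictors
vif_manual <- sapply(preds, function(p) {
  others <- setdiff(preds, p)
  r2_j <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2_j)
})
print(round(vif_manual, 3))

# Condition indices from the eigenvalues of X'X; columns (including the
# intercept) are scaled to unit length first
X <- model.matrix(~ wt + hp + am, data = mtcars)
X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")
lambda <- eigen(crossprod(X_scaled), symmetric = TRUE)$values
cond_index <- sqrt(max(lambda) / lambda)
print(round(cond_index, 2))
```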
\[n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \cdot \sigma^2}{\delta^2}\]
where \(n\) is the required sample size for a one-sample or paired design (for two independent groups of equal size, multiply by 2 to obtain \(n\) per group), \(\delta\) is the minimum detectable effect, \(\sigma^2\) is the variance, and the \(z\) values are standard normal quantiles for the chosen Type I (\(\alpha\)) and Type II (\(\beta\)) error rates. Target \(1 - \beta \geq .80\); use \(\geq .95\) for confirmatory or high-stakes research. Use G*Power (Faul et al., 2007) for validated calculations across all major test families.
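As a rough check, the closed-form formula can be compared with base R's `power.t.test()`. The sketch reads the formula as the one-sample case, and the values of `delta` and `sigma` are purely illustrative.

```r
alpha <- 0.05
power <- 0.80
delta <- 0.5   # minimum detectable difference (illustrative)
sigma <- 1     # assumed standard deviation (illustrative)

# Normal-approximation formula, read as the one-sample / paired case
n_formula <- (qnorm(1 - alpha / 2) + qnorm(power))^2 * sigma^2 / delta^2

# t-based calculation from base R for the same design
n_exact <- power.t.test(delta = delta, sd = sigma, sig.level = alpha,
                        power = power, type = "one.sample")$n

cat(sprintf("Formula: n = %.1f | power.t.test: n = %.1f\n", n_formula, n_exact))
```

The \(t\)-based result is slightly larger because the formula uses normal quantiles; always round up to the next whole participant.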
Use comparison methods when the research question is: “Do groups differ on a measured outcome?”
Use when: Comparing the means of two independent groups on a continuous DV.
\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]
| Assumption | Check | If Violated |
|---|---|---|
| Normality of residuals within each group | Shapiro–Wilk; Q-Q plot | Use Mann–Whitney U |
| Equal variances (homoscedasticity) | Levene’s Test | Apply Welch’s correction (do not pool variances) |
| Independence | Design review | Redesign or use paired test |
Rules of Thumb: \(n \geq 30\) per group (or \(\geq 100\) for skewed data); keep the group size ratio \(\leq 3{:}1\); when variances are unequal, especially with unequal group sizes, apply Welch’s correction.
Effect size — Cohen’s \(d\):
\[d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}, \quad s_{\text{pooled}} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\]
Benchmarks: small = .20, medium = .50, large = .80 (Cohen, 1988).
Red Flags 🚩
# Compare fuel efficiency (mpg) by transmission type
# am: 0 = automatic, 1 = manual
result_t <- t.test(mpg ~ am, data = mtcars, var.equal = FALSE)
print(result_t)
#>
#> Welch Two Sample t-test
#>
#> data: mpg by am
#> t = -3.7671, df = 18.332, p-value = 0.001374
#> alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
#> 95 percent confidence interval:
#> -11.280194 -3.209684
#> sample estimates:
#> mean in group 0 mean in group 1
#> 17.14737 24.39231
# Cohen's d
grp <- split(mtcars$mpg, mtcars$am)
n1 <- length(grp[[1]]); n2 <- length(grp[[2]])
sp <- sqrt(((n1-1)*var(grp[[1]]) + (n2-1)*var(grp[[2]])) / (n1+n2-2))
cohd <- abs(diff(sapply(grp, mean))) / sp
cat(sprintf("\nCohen's d = %.3f [large effect by Cohen (1988) benchmarks]\n", cohd))
#>
#> Cohen's d = 1.478 [large effect by Cohen (1988) benchmarks]
Use when: Normality is violated, the DV is ordinal, or \(N\) is small (\(< 30\)) and distribution is skewed.
\[U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \quad U = \min(U_1, U_2)\]
Effect size: \(r = Z / \sqrt{N}\); benchmarks: small = .10, medium = .30, large = .50.
Red Flags 🚩 Applying Mann–Whitney to paired data (use Wilcoxon Signed-Rank); interpreting \(U\) as testing means when group distributions differ in shape.
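A minimal sketch of the test and its effect size, using the same `mpg` by `am` comparison as the \(t\)-test example. `wilcox.test()` does not return \(Z\), so it is recovered here from the two-sided \(p\)-value of the normal approximation (without continuity correction); that recovery step is an assumption of this sketch.

```r
# Mann-Whitney U (Wilcoxon rank-sum) with the normal approximation
mw <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE, correct = FALSE)

# Recover |Z| from the two-sided p-value, then r = Z / sqrt(N)
z_abs <- qnorm(1 - mw$p.value / 2)
r_effect <- z_abs / sqrt(nrow(mtcars))

cat(sprintf("W = %.1f, p = %.4f, |Z| = %.2f, r = %.2f\n",
            mw$statistic, mw$p.value, z_abs, r_effect))
```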
Use when: Comparing two related or matched measurements (pre–post, twins, repeated measures on same unit).
\[t = \frac{\bar{D}}{s_D / \sqrt{n}}, \quad D_i = X_{1i} - X_{2i}\]
The normality assumption applies to the difference scores \(D_i\), not the raw scores.
Effect size — Cohen’s \(d_z\):
\[d_z = \frac{\bar{D}}{s_D}\]
Benchmarks: small = .20, medium = .50, large = .80.
Red Flags 🚩 Running two separate one-sample \(t\)-tests instead (inflates Type I error); ignoring the within-pair correlation (which drives the power advantage of this design).
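The built-in `sleep` data (10 patients, each measured under two drugs) gives a compact illustration. The vectors are passed directly rather than through a formula, and the names `drug1`, `drug2`, and `d_i` are illustrative.

```r
# sleep: extra hours of sleep for the same 10 patients under two drugs
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
d_i <- drug2 - drug1                 # difference scores D_i

paired_res <- t.test(drug2, drug1, paired = TRUE)
d_z <- mean(d_i) / sd(d_i)           # Cohen's d_z

# Normality is checked on the differences, not on the raw scores
sw_diff <- shapiro.test(d_i)

cat(sprintf("t(%d) = %.3f, p = %.4f, d_z = %.3f\n",
            paired_res$parameter, paired_res$statistic, paired_res$p.value, d_z))
```

Because \(t = d_z \sqrt{n}\), the effect size can always be recovered from a reported paired \(t\) and \(n\).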
Use when: Comparing means across three or more independent groups.
\[F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}} = \frac{SS_B/(k-1)}{SS_W/(N-k)}\]
| Assumption | Check | If Violated |
|---|---|---|
| Normality of residuals | Shapiro–Wilk on residuals; Q-Q plot | Use Kruskal–Wallis |
| Homogeneity of variances | Levene’s Test | Use Welch’s ANOVA |
| Independence | Design review | Use multilevel model |
Post-hoc hierarchy: Tukey HSD (equal \(n\), equal variances) → Games–Howell (unequal variances) → Bonferroni (planned comparisons).
Effect size — Eta-squared \(\eta^2\):
\[\eta^2 = \frac{SS_{\text{Between}}}{SS_{\text{Total}}}\]
Also report \(\omega^2\) (less biased for small \(N\)):
\[\omega^2 = \frac{SS_B - (k-1)MS_W}{SS_T + MS_W}\]
| Metric | Small | Medium | Large |
|---|---|---|---|
| \(\eta^2\) / \(\omega^2\) | .01 | .06 | .14 |
| Cohen’s \(f\) | .10 | .25 | .40 |
Statistical significance ≠ practical importance. A one-way ANOVA with \(N = 800\) may yield \(F(2, 797) = 6.20\), \(p = .002\), yet \(\eta^2 = .015\), accounting for less than 2% of variance. Always report and interpret effect sizes alongside \(p\)-values.
Red Flags 🚩 Running multiple \(t\)-tests instead of ANOVA (familywise error \(= 1-(1-\alpha)^m\) across \(m\) independent tests); reporting \(F\) without post-hoc tests when \(k > 2\); applying ANOVA to clustered data.
# Compare horsepower across cylinder groups
mtcars$cyl_f <- factor(mtcars$cyl, labels = c("4-cyl", "6-cyl", "8-cyl"))
fit_aov <- aov(hp ~ cyl_f, data = mtcars)
summary(fit_aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> cyl_f 2 104031 52015 36.18 1.32e-08 ***
#> Residuals 29 41696 1438
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Tukey HSD post-hoc
TukeyHSD(fit_aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = hp ~ cyl_f, data = mtcars)
#>
#> $cyl_f
#> diff lwr upr p adj
#> 6-cyl-4-cyl 39.64935 -5.627454 84.92616 0.0949068
#> 8-cyl-4-cyl 126.57792 88.847251 164.30859 0.0000000
#> 8-cyl-6-cyl 86.92857 43.579331 130.27781 0.0000839
# Eta-squared
ss <- summary(fit_aov)[[1]]$"Sum Sq"
eta_sq <- ss[1] / sum(ss)
omega_sq <- (ss[1] - (nlevels(mtcars$cyl_f)-1) * summary(fit_aov)[[1]]$"Mean Sq"[2]) /
(sum(ss) + summary(fit_aov)[[1]]$"Mean Sq"[2])
cat(sprintf("\neta-squared = %.3f\nomega-squared = %.3f\n", eta_sq, omega_sq))
#>
#> eta-squared = 0.714
#> omega-squared = 0.687
Use when: Normality is violated; DV is ordinal; cells have \(n < 20\).
\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\]
Requires \(n_i \geq 5\) per group for the \(\chi^2\) approximation to be valid. Post-hoc: Dunn’s test with Bonferroni or Holm correction. Effect size: \(\eta^2_H = (H - k + 1)/(N - k)\).
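A sketch of the test and the \(\eta^2_H\) formula on the same `hp` by `cyl` comparison used for the ANOVA example. Base R has no Dunn's test, so the follow-up below uses Holm-adjusted pairwise Wilcoxon rank-sum tests as a stand-in; this is a substitution, not Dunn's procedure.

```r
kw <- kruskal.test(hp ~ factor(cyl), data = mtcars)
H <- unname(kw$statistic)
k <- nlevels(factor(mtcars$cyl))
N <- nrow(mtcars)
eta2_H <- (H - k + 1) / (N - k)
cat(sprintf("H(%d) = %.2f, p = %.5f, eta2_H = %.3f\n",
            kw$parameter, H, kw$p.value, eta2_H))

# Pairwise follow-up with Holm correction (stand-in for Dunn's test)
posthoc <- pairwise.wilcox.test(mtcars$hp, mtcars$cyl,
                                p.adjust.method = "holm", exact = FALSE)
print(posthoc$p.value)
```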
Use when: Examining the effects of two or more categorical IVs and their interaction on a continuous DV.
Additional rules: Always test the \(A \times B\) interaction before interpreting main effects. A significant interaction renders unconditional main effects misleading — interpret via simple effects analysis. Use Type III SS for unbalanced designs. \(n \geq 5\) per cell (ideally \(\geq 20\)).
Red Flags 🚩 Interpreting main effects as unconditional when a significant interaction is present; cells with \(n < 5\).
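The interaction-first workflow can be sketched with the built-in `ToothGrowth` data (supplement type by dose). The design is balanced, so the sequential sums of squares from `aov()` coincide with Type III here; for unbalanced data a dedicated Type III routine would be needed. The simple-effects step below uses Holm-adjusted Welch \(t\)-tests, which is one of several reasonable choices.

```r
tg <- transform(ToothGrowth, dose = factor(dose))

# Cell sizes: balanced design, 10 observations per cell
print(table(tg$supp, tg$dose))

fit_fac <- aov(len ~ supp * dose, data = tg)
tab <- summary(fit_fac)[[1]]
print(tab)

# Interaction is the third row of the table; inspect it before main effects
p_interaction <- tab[["Pr(>F)"]][3]

# Simple effects of supp at each dose level, Holm-adjusted
simple_p <- sapply(split(tg, tg$dose),
                   function(d) t.test(len ~ supp, data = d)$p.value)
print(round(p.adjust(simple_p, method = "holm"), 4))
```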
Use when: The same participants are measured across three or more conditions or time points.
Additional assumption — Sphericity:
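Sphericity holds when the variances of all pairwise difference scores are equal; \(\hat{\varepsilon}\) estimates the departure from it.
\[\frac{1}{k-1} \leq \hat{\varepsilon} \leq 1, \quad \varepsilon = 1 \text{ under sphericity (tested by Mauchly's Test)}\]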
| Mauchly’s Result | Correction |
|---|---|
| \(p > .05\) — sphericity met | Standard \(F\); uncorrected \(df\) |
| \(\hat{\varepsilon} \geq .75\) | Huynh–Feldt correction |
| \(\hat{\varepsilon} < .75\) | Greenhouse–Geisser correction |
| Severe violation | Switch to MANOVA or mixed-effects model |
Red Flags 🚩 Ignoring Mauchly’s test; handling missing data by listwise deletion (use mixed-effects models instead).
Use when: Sphericity is severely violated; DV is ordinal; \(N\) is small. Effect size: Kendall’s \(W\) (concordance coefficient); \(W \geq .70\) indicates strong agreement across conditions.
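A minimal sketch of the Friedman test with Kendall's \(W\) derived from its chi-square statistic via \(W = \chi^2_F / [N(k-1)]\). The ratings matrix below is hypothetical data invented for illustration.

```r
# Hypothetical ratings: 8 participants (rows) x 3 conditions (columns)
scores <- matrix(c(3, 5, 6,
                   2, 4, 6,
                   4, 5, 7,
                   3, 3, 5,
                   2, 5, 5,
                   4, 6, 7,
                   3, 4, 6,
                   1, 4, 5),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("T1", "T2", "T3")))

fr <- friedman.test(scores)

# Kendall's W from the Friedman chi-square
W <- unname(fr$statistic) / (nrow(scores) * (ncol(scores) - 1))
cat(sprintf("Friedman chi2(%d) = %.2f, p = %.4f, Kendall's W = %.2f\n",
            fr$parameter, fr$statistic, fr$p.value, W))
```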
Use when: Simultaneously comparing groups on two or more continuous DVs.
| Assumption | Diagnostic |
|---|---|
| Multivariate normality | Mardia’s test; Henze–Zirkler; Royston’s test |
| Homogeneity of covariance matrices | Box’s \(M\) (\(p > .001\) tolerance) |
| No multicollinearity among DVs | Bivariate \(r\) between DVs: .30–.90 |
| Independence | Design review |
| No multivariate outliers | Mahalanobis \(D^2\) at \(\chi^2_{p}\), \(p < .001\) |
Test statistic selection:
| Statistic | Best When |
|---|---|
| Wilks’ \(\Lambda\) | Most common; balanced power (Wilks, 1932) |
| Pillai’s Trace | Default choice — most robust to violations (Pillai, 1955; Olson, 1976) |
| Hotelling’s Trace | One dominant canonical dimension |
| Roy’s Largest Root | Maximum power with one variate; most sensitive to violations |
Effect size: Partial \(\eta^2_p\); benchmarks same as ANOVA (.01 / .06 / .14).
Red Flags 🚩 Running separate ANOVAs instead (inflates familywise error); DVs entirely uncorrelated (MANOVA loses advantage); Box’s \(M\) severely significant with unequal group sizes.
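Base R's `manova()` reports all four statistics through the `test` argument of `summary()`. The sketch uses two `iris` measurements as DVs; the choice of variables is illustrative, and the outlier screen below uses distances from the overall centroid, a simplification of a within-group check.

```r
dvs <- cbind(Sepal.Length = iris$Sepal.Length, Petal.Length = iris$Petal.Length)
fit_man <- manova(dvs ~ Species, data = iris)

pillai <- summary(fit_man, test = "Pillai")   # robust default
wilks  <- summary(fit_man, test = "Wilks")
print(pillai)

# Multivariate outliers: squared Mahalanobis distance vs chi-square(p) at p < .001
d2 <- mahalanobis(dvs, colMeans(dvs), cov(dvs))
n_out <- sum(d2 > qchisq(0.999, df = ncol(dvs)))
cat(sprintf("Cases beyond the chi-square cutoff: %d\n", n_out))
```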
Use correlation methods when the research question is: “How strongly are variables associated?”
Use when: Both variables are continuous and interval/ratio scaled; relationship is expected to be linear.
\[r = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\,s_X\,s_Y}\]
| Assumption | Check |
|---|---|
| Both variables continuous | Measurement review |
| Linear relationship | Scatterplot inspection |
| Bivariate normality (for inference) | Q-Q plots; Mardia’s test |
| No severe outliers | Scatterplot; leverage statistics |
| Independence of observations | Design review |
Benchmarks (Cohen, 1988): small = .10, medium = .30, large = .50.
The squared correlation \(r^2\) is the coefficient of determination — the proportion of variance in \(Y\) explained by \(X\).
Red Flags 🚩 Interpreting correlation as causation; failing to inspect scatterplot for non-linearity or heteroscedasticity; not checking for outliers driving spurious correlations.
# Pearson r: mpg vs weight
r_pearson <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
cat("--- Pearson Correlation ---\n")
#> --- Pearson Correlation ---
print(r_pearson)
#>
#> Pearson's product-moment correlation
#>
#> data: mtcars$mpg and mtcars$wt
#> t = -9.559, df = 30, p-value = 1.294e-10
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> -0.9338264 -0.7440872
#> sample estimates:
#> cor
#> -0.8676594
cat(sprintf("R-squared = %.3f (proportion of shared variance)\n",
r_pearson$estimate^2))
#> R-squared = 0.753 (proportion of shared variance)
# Spearman rho (non-parametric alternative)
r_spearman <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman")
cat("\n--- Spearman Correlation (non-parametric) ---\n")
#>
#> --- Spearman Correlation (non-parametric) ---
print(r_spearman)
#>
#> Spearman's rank correlation rho
#>
#> data: mtcars$mpg and mtcars$wt
#> S = 10292, p-value = 1.488e-11
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#> rho
#> -0.886422
Use when: DV or IV is ordinal; or normality assumption for Pearson \(r\) is violated; or outliers are present.
\[\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}, \quad \text{where } d_i = \text{rank}(X_i) - \text{rank}(Y_i)\]
Kendall’s \(\tau\) is preferred over \(\rho\) when there are many tied ranks or when \(N\) is small — it has better sampling properties in those conditions.
Red Flags 🚩 Using Spearman \(\rho\) when the research hypothesis is specifically about linear association (Pearson \(r\) is the appropriate test).
Use when: Testing association between two categorical (nominal) variables in a contingency table.
\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{R_i \cdot C_j}{N}\]
| Assumption | Threshold |
|---|---|
| Independence of observations | No repeated measures on same case |
| Expected cell frequencies | \(E_{ij} \geq 5\) in \(\geq 80\%\) of cells; no cell with \(E < 1\) |
| Mutually exclusive categories | Categories do not overlap |
If expected cell frequencies are violated: Use Fisher’s Exact Test (\(2 \times 2\)) or collapse categories.
Effect size — Cramér’s \(V\):
\[V = \sqrt{\frac{\chi^2}{N \cdot \min(r-1,\, c-1)}}\]
Benchmarks: small = .10, medium = .30, large = .50.
Red Flags 🚩 Running \(\chi^2\) on dependent observations — use McNemar’s test (McNemar, 1947); interpreting \(\chi^2\) as a strength measure (it tests independence, not association magnitude).
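The expected-frequency rule and Cramér's \(V\) can be checked in a few lines. The sketch cross-tabulates `cyl` and `am` from `mtcars`, a table small enough that the rule fails, which makes it a useful example of when to switch to Fisher's Exact Test.

```r
tab_cat <- table(cyl = mtcars$cyl, am = mtcars$am)

# chisq.test() warns when expected counts are small; suppressed here on purpose
chi <- suppressWarnings(chisq.test(tab_cat))
print(round(chi$expected, 2))

# Rule: E >= 5 in at least 80% of cells and no cell with E < 1
ok_cells <- mean(chi$expected >= 5) >= 0.80 && all(chi$expected >= 1)

# Cramer's V from the chi-square statistic
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab_cat) * min(dim(tab_cat) - 1)))
cat(sprintf("Rule satisfied: %s | Cramer's V = %.3f\n", ok_cells, cramers_v))

# Exact alternative when the expected-count rule fails
fisher <- fisher.test(tab_cat)
cat(sprintf("Fisher exact p = %.4f\n", fisher$p.value))
```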
Use when: One variable is truly dichotomous (binary); the other is continuous.
\[r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\sqrt{\frac{n_1 n_0}{n^2}}\]
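This is mathematically equivalent to Pearson \(r\) when one variable is coded 0/1, provided \(s_X\) is computed with denominator \(n\) (with the usual \(n-1\) denominator, replace \(n^2\) by \(n(n-1)\) under the square root). It relates directly to the independent \(t\)-test with pooled variance: \(r_{pb}^2 = t^2/(t^2 + df)\).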
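Both identities can be verified numerically. In the sketch below, `x` and `g` are illustrative names for `mpg` and the 0/1 variable `am`, and the standard deviation is computed with denominator \(n\) so that the formula matches `cor()` exactly.

```r
x <- mtcars$mpg
g <- mtcars$am                       # coded 0/1
n <- length(x); n1 <- sum(g == 1); n0 <- sum(g == 0)

# Point-biserial formula, with s_X computed using denominator n
s_n <- sqrt(mean((x - mean(x))^2))
r_pb <- (mean(x[g == 1]) - mean(x[g == 0])) / s_n * sqrt(n1 * n0 / n^2)

# Pooled-variance t-test carries the same information: r^2 = t^2 / (t^2 + df)
tt <- t.test(x ~ g, var.equal = TRUE)
r2_from_t <- unname(tt$statistic^2 / (tt$statistic^2 + tt$parameter))

cat(sprintf("r_pb = %.4f | cor() = %.4f | r_pb^2 = %.4f | from t: %.4f\n",
            r_pb, cor(x, g), r_pb^2, r2_from_t))
```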
Use prediction methods when the research question is: “How well can we predict an outcome from a set of predictors?”
Use when: DV is continuous; predicting \(\hat{Y}\) from one or more IVs.
\[Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + \varepsilon, \qquad \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k\]
Correction from v2: The normality and homoscedasticity assumptions in OLS regression apply to the residuals \(\varepsilon_i = Y_i - \hat{Y}_i\), not to the raw IV or DV distributions. Predictors may be non-normally distributed (skewed or even dichotomous) without violating any assumption. It is the error term that must satisfy the distributional assumptions.
| Assumption | What It Tests | Diagnostic | Violation Remedy |
|---|---|---|---|
| Linearity | \(E[\varepsilon \mid X] = 0\) | Partial regression plots; RESET test | Polynomial terms; transformations |
| Independence of residuals | No autocorrelation | Durbin–Watson (\(DW \approx 2\)) | GLS; clustered SE; time-series models |
| Homoscedasticity of residuals | \(\text{Var}(\varepsilon_i) = \sigma^2\) (constant) | Breusch–Pagan test; Residual vs. Fitted plot | WLS; HC3 heteroscedasticity-robust SE |
| Normality of residuals | \(\varepsilon \sim \mathcal{N}(0,\sigma^2)\) | Shapiro–Wilk on residuals; Q-Q plot of residuals | Transform DV; robust regression |
| No multicollinearity | Predictors not collinear | \(VIF < 5\) acceptable; \(5 \leq VIF < 10\) moderate concern; \(VIF \geq 10\) serious | Ridge regression; drop/combine predictors |
| No influential outliers | No single cases dominating fit | Cook’s \(D > 4/n\); leverage \(h_{ii} > 2(p+1)/n\) | Robust regression; investigate cases |
Standard Error of regression coefficient:
\[SE(b_j) = \sqrt{\frac{MS_{\text{Residual}}}{\sum(X_{ij} - \bar{X}_j)^2 \cdot (1 - R^2_j)}}\]
Note that \(1/(1-R^2_j) = VIF_j\) — multicollinearity directly inflates the SE of \(b_j\).
| Heuristic | Value |
|---|---|
| Cases per predictor | \(N/k \geq 10\) (minimum); \(N/k \geq 20\) (recommended) |
| Minimum \(N\) for \(R^2\) test | \(50 + 8k\) |
| VIF — mild concern | \(3 \leq VIF < 5\) |
| VIF — moderate concern | \(5 \leq VIF < 10\) |
| VIF — serious problem | \(VIF \geq 10\) |
| Condition Index | \(> 30\) confirms serious collinearity |
| Cook’s \(D\) threshold | \(> 4/n\) — investigate case |
Effect size — Cohen’s \(f^2\):
\[f^2 = \frac{R^2}{1 - R^2}; \quad f^2 = 0.02 \text{ (small)},\; 0.15 \text{ (medium)},\; 0.35 \text{ (large)}\]
Red Flags 🚩
# Multiple regression: mpg ~ weight + horsepower + transmission
fit <- lm(mpg ~ wt + hp + am, data = mtcars)
summary(fit)
#>
#> Call:
#> lm(formula = mpg ~ wt + hp + am, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.4221 -1.7924 -0.3788 1.2249 5.5317
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
#> wt -2.878575 0.904971 -3.181 0.003574 **
#> hp -0.037479 0.009605 -3.902 0.000546 ***
#> am 2.083710 1.376420 1.514 0.141268
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
#> F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
# --- Assumption checks ---
# 1. Normality of RESIDUALS (not raw variables)
shapiro.test(residuals(fit))
#>
#> Shapiro-Wilk normality test
#>
#> data: residuals(fit)
#> W = 0.9453, p-value = 0.1059
# 2. Homoscedasticity of RESIDUALS via Breusch-Pagan
if (requireNamespace("lmtest", quietly = TRUE)) {
cat("\nBreusch-Pagan test for heteroscedasticity:\n")
print(lmtest::bptest(fit))
} else {
cat("\nInstall 'lmtest' for Breusch-Pagan: install.packages('lmtest')\n")
}
#>
#> Breusch-Pagan test for heteroscedasticity:
#>
#> studentized Breusch-Pagan test
#>
#> data: fit
#> BP = 5.534, df = 3, p-value = 0.1366
# 3. VIF — multicollinearity
if (requireNamespace("car", quietly = TRUE)) {
cat("\nVariance Inflation Factors:\n")
vif_vals <- car::vif(fit)
print(vif_vals)
cat(sprintf("Max VIF = %.2f [threshold: 5 = moderate, 10 = serious]\n",
max(vif_vals)))
} else {
cat("\nInstall 'car' for VIF: install.packages('car')\n")
}
#>
#> Variance Inflation Factors:
#> wt hp am
#> 3.774838 2.088124 2.271082
#> Max VIF = 3.77 [threshold: 5 = moderate, 10 = serious]
# 4. Influential cases — Cook's Distance
thresh <- 4 / nrow(mtcars)
n_inf <- sum(cooks.distance(fit) > thresh)
cat(sprintf("\nCook's D > 4/n (%.4f): %d influential observation(s)\n",
thresh, n_inf))
#>
#> Cook's D > 4/n (0.1250): 4 influential observation(s)
# 5. Cohen's f-squared
r2 <- summary(fit)$r.squared
f2 <- r2 / (1 - r2)
cat(sprintf("\nR-squared = %.3f | Adjusted R-squared = %.3f | f2 = %.3f\n",
r2, summary(fit)$adj.r.squared, f2))
#>
#> R-squared = 0.840 | Adjusted R-squared = 0.823 | f2 = 5.246
Use when: DV is binary (0/1); model estimates the log-odds of the outcome.
\[\log\!\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + \cdots + b_k X_k\]
\[\hat{p} = \frac{1}{1 + e^{-(b_0 + \mathbf{b}'\mathbf{X})}}\]
| Assumption | Diagnostic |
|---|---|
| Binary DV | Design review |
| No complete separation | Inspect max-rescaled \(R^2\); use Firth’s penalized likelihood if separated |
| No multicollinearity | \(VIF < 10\) |
| Independence of errors | Design review; GEE or mixed logistic if clustered |
| Adequate EPV | \(\geq 10\) events per predictor (Peduzzi et al., 1996) |
Reporting: \(OR = e^{b_k}\); \(OR > 1\) = increased odds; \(OR < 1\) = decreased odds. Report Nagelkerke \(R^2\) and AUC–ROC (\(\geq .70\) = acceptable, \(\geq .80\) = good, \(\geq .90\) = excellent).
Red Flags 🚩 EPV \(< 10\); applying to ordinal DV with \(> 2\) levels (use ordinal logistic); misinterpreting \(b_k\) as a probability change (it is a log-odds change).
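A minimal logistic regression sketch on `mtcars`, predicting transmission type (`am`) from weight. The single-predictor model is chosen only to keep the example small; the intervals are Wald intervals from `confint.default()`, and Nagelkerke's \(R^2\) is computed from the model deviances.

```r
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios with Wald 95% confidence intervals
or_tab <- exp(cbind(OR = coef(fit_logit), confint.default(fit_logit)))
print(round(or_tab, 4))

# Events per variable: count of the rarer outcome divided by number of predictors
epv <- min(table(mtcars$am)) / (length(coef(fit_logit)) - 1)

# Nagelkerke R^2 from the null and residual deviances
n <- nrow(mtcars)
cox_snell <- 1 - exp((fit_logit$deviance - fit_logit$null.deviance) / n)
nagelkerke <- cox_snell / (1 - exp(-fit_logit$null.deviance / n))

cat(sprintf("EPV = %.0f | Nagelkerke R2 = %.3f\n", epv, nagelkerke))
```

An odds ratio below 1 for `wt` means heavier cars have lower odds of a manual transmission; it is a multiplicative change in odds per unit of weight, not a change in probability.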
Use when: Testing whether a theoretically motivated block of predictors accounts for incremental variance above a prior block.
\[\Delta R^2 = R^2_{\text{Model 2}} - R^2_{\text{Model 1}}\]
\[F_{\Delta R^2} = \frac{\Delta R^2 / \Delta k}{(1 - R^2_{\text{Model 2}}) / (N - k_2 - 1)}\]
Rules: Block entry order must be theory-driven; report \(\Delta R^2\), \(\Delta F\), and exact \(p\)-value per block; recheck all OLS assumptions at each step.
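The \(F_{\Delta R^2}\) formula can be checked against `anova()` on two nested models. The block order below (weight first, then horsepower and transmission) is an arbitrary illustration, not a theoretical claim.

```r
m1 <- lm(mpg ~ wt, data = mtcars)              # Block 1
m2 <- lm(mpg ~ wt + hp + am, data = mtcars)    # Block 2 adds hp and am

r2_1 <- summary(m1)$r.squared
r2_2 <- summary(m2)$r.squared
delta_r2 <- r2_2 - r2_1

# F-change by the formula: delta_k = 2 added predictors, k2 = 3 in Model 2
dk <- 2; N <- nrow(mtcars); k2 <- 3
F_change <- (delta_r2 / dk) / ((1 - r2_2) / (N - k2 - 1))

# anova() on nested models reports the same F with its p-value
cmp <- anova(m1, m2)
cat(sprintf("Delta R2 = %.4f | F-change = %.3f | anova F = %.3f | p = %.4f\n",
            delta_r2, F_change, cmp$F[2], cmp[["Pr(>F)"]][2]))
```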
Use these methods when the research question is: “What underlying structure exists in the data?” or “How can cases or variables be grouped?”
| Feature | PCA | EFA |
|---|---|---|
| Goal | Data reduction; explain total variance | Identify latent constructs; explain common variance |
| Variance modeled | 100% of observed variance | Shared variance only (communalities) |
| Output | Components = linear composites of observed variables | Factors = hypothetical latent variables |
| Use when | Reducing variables with minimal information loss | Hypothesizing underlying theoretical constructs |
| Validation step | Not applicable | CFA required for confirmatory testing |
Core assumptions for EFA:
| Rule | Notes |
|---|---|
| Kaiser’s Criterion (\(\lambda > 1.0\)) | Known to overextract (Kaiser, 1960) — use as lower bound only |
| Scree Plot elbow | Subjective; combine with quantitative criteria (Cattell, 1966) |
| Parallel Analysis | Gold standard — compare observed eigenvalues to random-data eigenvalues (Horn, 1965) |
| Velicer’s MAP | Minimizes average squared partial correlation (Velicer, 1976) |
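Parallel analysis is straightforward to sketch in base R. The version below is the component-based form: it compares eigenvalues of the observed correlation matrix with the 95th percentile of eigenvalues from random normal data of the same dimensions. The variable set, the seed, and the 500 replications are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed for the random-data comparison
dat <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
obs_eig <- eigen(cor(dat), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random normal data with the same n and number of variables
rand_eig <- replicate(500, {
  sim <- matrix(rnorm(nrow(dat) * ncol(dat)), nrow = nrow(dat))
  eigen(cor(sim), symmetric = TRUE, only.values = TRUE)$values
})
threshold <- apply(rand_eig, 1, quantile, probs = 0.95)

# Retain leading components whose eigenvalue exceeds the random-data threshold
n_retain <- sum(cumprod(obs_eig > threshold))
n_kaiser <- sum(obs_eig > 1)
cat(sprintf("Parallel analysis retains %d; Kaiser criterion retains %d\n",
            n_retain, n_kaiser))
```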
Rotation: Oblique (Promax/Oblimin) for correlated factors (default for most social science constructs); Orthogonal (Varimax) only when factors are theoretically and empirically independent.
Reliability:
\[\alpha = \frac{k\bar{r}}{1+(k-1)\bar{r}} \quad \text{(Cronbach, 1951)} \qquad \omega_h = \frac{(\sum\lambda_i)^2}{(\sum\lambda_i)^2+\sum\delta_i} \quad \text{(McDonald, 1999 — preferred)}\]
| Value | Interpretation |
|---|---|
| \(\geq .90\) | Excellent |
| \(\geq .80\) | Good |
| \(\geq .70\) | Acceptable |
| \(\geq .60\) | Questionable |
| \(< .60\) | Poor |
Red Flags 🚩 Using PCA for construct validation; reporting Kaiser’s criterion alone; Cronbach’s \(\alpha\) as evidence of unidimensionality (it is not — use \(\omega_h\); McNeish, 2018).
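The \(\alpha\) formula above is the standardized form, based on the mean inter-item correlation \(\bar{r}\). The sketch computes it alongside the raw (covariance-based) form; the three `mtcars` columns are stand-in "items" used only to exercise the arithmetic, not a real scale.

```r
items <- mtcars[, c("disp", "hp", "wt")]   # stand-in "items"; not a real scale
k <- ncol(items)

# Standardized alpha from the mean inter-item correlation
R <- cor(items)
r_bar <- mean(R[lower.tri(R)])
alpha_std <- k * r_bar / (1 + (k - 1) * r_bar)

# Raw alpha from item variances and the variance of the sum score
alpha_raw <- k / (k - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))

cat(sprintf("mean r = %.3f | standardized alpha = %.3f | raw alpha = %.3f\n",
            r_bar, alpha_std, alpha_raw))
```

The two values diverge when item variances differ widely, as they do here, which is one reason to report which form was used.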
Use when: Testing a pre-specified factor structure grounded in theory or prior EFA.
Additional assumptions: \(N \geq 200\); \(N \geq 5\) per free parameter; use MLR for non-normal continuous indicators and WLSMV for ordinal indicators.
| Index | Acceptable | Good | Note |
|---|---|---|---|
| \(\chi^2/df\) | \(< 5.0\) | \(< 2.0\) | \(\chi^2\) alone is \(N\)-sensitive |
| CFI | \(\geq .90\) | \(\geq .95\) | — |
| TLI | \(\geq .90\) | \(\geq .95\) | Penalizes complexity |
| RMSEA | \(< .08\) | \(< .06\) | Report 90% CI; test \(H_0\): RMSEA \(\leq .05\) |
| SRMR | \(< .10\) | \(< .08\) | — |
Hu & Bentler (1999)
| Type | Indicator | Threshold |
|---|---|---|
| Convergent | AVE | \(\geq .50\) |
| Convergent | Factor loadings | \(\lambda \geq .50\) (ideally \(\geq .70\)) |
| Discriminant | Fornell–Larcker: \(AVE > \phi^2_{ij}\) | For each construct pair |
| Discriminant | HTMT | \(< .85\) (strict) or \(< .90\) (liberal) |
| Composite reliability | CR | \(\geq .70\) |
\[AVE = \frac{\sum\lambda_i^2}{\sum\lambda_i^2+\sum\delta_i}, \qquad CR = \frac{(\sum\lambda_i)^2}{(\sum\lambda_i)^2+\sum\delta_i}\]
Red Flags 🚩 Relying on \(\chi^2\) alone; applying modification indices \(> 10\) atheoretically; Heywood cases (negative residual variances); not testing discriminant validity.
Use when: Testing complex theoretical models with latent variables, mediation, and/or moderation.
\[\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}' + \boldsymbol{\Theta}\]
Rules: \(N \geq 200\); \(N \geq 10\) per free parameter; bootstrap \(B \geq 5000\) for indirect effects; report bias-corrected 95% CI for mediation (Preacher & Hayes, 2008).
Mediation indirect effect:
\[\text{Indirect effect} = a \times b, \quad a: X \to M,\quad b: M \to Y\]
Avoid Sobel test — use bias-corrected bootstrapped CIs instead (MacKinnon et al., 2004).
Red Flags 🚩 Not checking temporal precedence for mediation; treating non-significant direct paths as full mediation without bootstrapped CIs; ignoring equivalent models.
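The bootstrap logic for an indirect effect can be sketched with observed variables and base R. The data are simulated, the path values 0.5 and 0.4 are arbitrary, and the interval shown is a plain percentile interval rather than the bias-corrected one recommended above; \(B\) is also kept below 5000 purely for speed.

```r
set.seed(2024)                        # arbitrary seed
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)               # simulated a-path = 0.5
Y <- 0.4 * M + 0.1 * X + rnorm(n)     # simulated b-path = 0.4
dat <- data.frame(X, M, Y)

# Indirect effect a * b from two regressions
indirect <- function(d) {
  a <- coef(lm(M ~ X, data = d))[["X"]]
  b <- coef(lm(Y ~ M + X, data = d))[["M"]]
  a * b
}

B <- 2000  # kept small for speed; the text recommends B >= 5000
boot_ab <- replicate(B, indirect(dat[sample(n, replace = TRUE), ]))
ci_pct <- quantile(boot_ab, c(0.025, 0.975))   # percentile CI, not bias-corrected

cat(sprintf("a*b = %.3f, 95%% percentile CI [%.3f, %.3f]\n",
            indirect(dat), ci_pct[1], ci_pct[2]))
```

Mediation is supported when the interval excludes zero; no separate significance test of the product is needed.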
Use when: Discovering natural groupings in data with no predefined criterion variable.
| Method | Best When |
|---|---|
| Hierarchical (Ward’s linkage) | Small \(N\); unknown number of clusters (Ward, 1963) |
| \(K\)-Means | Large \(N\); hypothesized or fixed \(k\) |
| GMM (Gaussian Mixture Models) | Overlapping clusters; probabilistic assignment |
| DBSCAN | Non-spherical clusters; noise/outlier handling needed |
Rules: Always standardize variables before clustering; determine \(k\) using Elbow + Silhouette (\(\bar{s} \geq .50\); Rousseeuw, 1987) + Gap statistic; validate on split-half samples.
Red Flags 🚩 Unstandardized variables; researcher-chosen \(k\) without empirical support; treating cluster membership as a measured IV in subsequent regression (circular).
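Standardizing and choosing \(k\) by average silhouette width can be sketched as follows. The silhouette is computed by hand to keep the example free of extra packages; the helper name `avg_sil`, the `iris` measurements, and the candidate range \(k = 2\) to \(5\) are illustrative assumptions.

```r
set.seed(1)                            # k-means uses random starts
z <- scale(iris[, 1:4])                # standardize before clustering
d <- dist(z)

# Average silhouette width for a given cluster assignment
avg_sil <- function(cl, d) {
  dm <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    same <- cl == cl[i]
    same[i] <- FALSE
    if (!any(same)) return(0)          # singleton cluster
    a <- mean(dm[i, same])             # mean distance within own cluster
    other <- cl != cl[i]
    b <- min(tapply(dm[i, other], cl[other], mean))  # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(s)
}

ks <- 2:5
sil_by_k <- sapply(ks, function(k)
  avg_sil(kmeans(z, centers = k, nstart = 25)$cluster, d))
names(sil_by_k) <- paste0("k=", ks)
print(round(sil_by_k, 3))
best_k <- ks[which.max(sil_by_k)]
```

The silhouette is one criterion among several; the text's advice to combine it with the elbow plot and gap statistic still applies.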
Logic flow: If [Data Type] + If [Analysis Goal] → Use [Test]
    IF goal = DESCRIBE distribution → Descriptive statistics, histograms, box plots, density plots
    IF goal = COMPARE groups
        IF groups = 2
            IF independent + continuous DV + normal residuals → Independent t-test
            IF independent + non-normal / ordinal DV → Mann-Whitney U
            IF related/paired + continuous DV + normal diffs → Paired t-test
            IF related/paired + non-normal / ordinal DV → Wilcoxon Signed-Rank
        IF groups >= 3
            IF independent + continuous + normal + equal var → One-Way ANOVA
            IF independent + continuous + violated assump. → Kruskal-Wallis H
            IF repeated measures + continuous → RM-ANOVA
            IF repeated measures + non-normal / ordinal → Friedman Test
        IF 2+ IVs (factorial) → Factorial ANOVA
        IF 2+ continuous DVs simultaneously → MANOVA
    IF goal = CORRELATE variables
        IF both continuous + linear + normal → Pearson r
        IF ordinal / non-normal / outliers present → Spearman rho
        IF many ties or small N → Kendall tau
        IF 1 continuous + 1 binary → Point-Biserial r_pb
        IF both categorical (nominal) → Chi-Square (or Fisher Exact)
        IF both ordinal in contingency table → Gamma or Somers’ d
    IF goal = PREDICT an outcome
        IF DV continuous + predictors any scale → OLS Multiple Regression
        IF DV continuous + blocks of predictors → Hierarchical Regression
        IF DV binary (0/1) → Logistic Regression
        IF DV ordinal (3+ ordered levels) → Ordinal Logistic Regression
        IF DV is a count → Poisson / Negative Binomial
        IF DV continuous + predictors correlated (VIF>10) → Ridge / LASSO Regression
        IF complex model with latent variables → SEM / CFA
    IF goal = FIND STRUCTURE in variables
        IF goal = data reduction, no theory → PCA
        IF goal = latent constructs, exploratory → EFA
        IF goal = test pre-specified factor structure → CFA
        IF goal = test complex latent path model → SEM
    IF goal = FIND GROUPS in cases
        IF N small, k unknown → Hierarchical Cluster (Ward)
        IF N large, k hypothesized → K-Means Cluster
        IF overlapping groups, probabilistic → Gaussian Mixture Model
        IF non-spherical, with noise → DBSCAN
Core principle: Statistical significance (\(p < .05\)) tells you that an effect is unlikely to be zero. Effect size tells you how large the effect is in practice. With large samples, even trivially small effects achieve \(p < .001\). With small samples, important effects may not reach \(p < .05\). Always report and interpret effect sizes alongside \(p\)-values (Cohen, 1994; Cumming, 2014).
Worked example of the distinction:
A study of \(N = 1000\) finds: \(t(998) = 3.16\), \(p = .002\), Cohen’s \(d = 0.20\) (small effect). The difference is real but trivial — it explains only \(\approx 1\%\) of variance.
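Another study with \(N = 40\) finds: \(t(38) = 1.90\), \(p = .065\), Cohen’s \(d = 0.62\) (medium effect). The effect is practically meaningful but the study is underpowered: with the same \(d\), the test would reach \(p < .05\) at about \(N = 44\) (22 per group), and roughly \(N = 84\) (42 per group) would be needed for 80% power.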
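The numbers in both studies can be reproduced from the reported \(t\) and \(df\) alone. The conversion \(d \approx 2t/\sqrt{df}\) assumes two independent groups of equal size; the helper names `d_from_t` and `r2_from_t` are illustrative.

```r
# d from an independent-samples t with equal group sizes: d is roughly 2t / sqrt(df)
d_from_t <- function(t, df) 2 * t / sqrt(df)
# Variance explained: r^2 = t^2 / (t^2 + df)
r2_from_t <- function(t, df) t^2 / (t^2 + df)

large_n <- c(d = d_from_t(3.16, 998), r2 = r2_from_t(3.16, 998),
             p = 2 * pt(-3.16, 998))
small_n <- c(d = d_from_t(1.90, 38), r2 = r2_from_t(1.90, 38),
             p = 2 * pt(-1.90, 38))
print(round(rbind(large_n, small_n), 3))
```

Reading the two rows side by side shows the pattern described above: the smaller \(p\)-value belongs to the smaller effect.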
\[d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}\]
| Value | Interpretation | Variance explained (\(r^2\)) |
|---|---|---|
| \(.20\) | Small | \(\approx 1\%\) |
| \(.50\) | Medium | \(\approx 6\%\) |
| \(.80\) | Large | \(\approx 14\%\) |
\[\eta^2 = \frac{SS_{\text{Between}}}{SS_{\text{Total}}} \qquad \omega^2 = \frac{SS_B - (k-1)MS_W}{SS_T + MS_W}\]
\(\omega^2\) is preferred over \(\eta^2\) for small samples — \(\eta^2\) is a biased overestimate.
Partial \(\eta^2_p\) is used in factorial designs and MANOVA:
\[\eta^2_p = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}\]
| Metric | Small | Medium | Large |
|---|---|---|---|
| \(\eta^2\) / \(\omega^2\) | .01 | .06 | .14 |
| Cohen’s \(f = \sqrt{\eta^2/(1-\eta^2)}\) | .10 | .25 | .40 |
| Test Family | Metric | Small | Medium | Large | Formula |
|---|---|---|---|---|---|
| \(t\)-test | Cohen’s \(d\) | .20 | .50 | .80 | \((\bar{X}_1-\bar{X}_2)/s_p\) |
| ANOVA | \(\eta^2\) | .01 | .06 | .14 | \(SS_B/SS_T\) |
| ANOVA | \(\omega^2\) | .01 | .06 | .14 | Less biased than \(\eta^2\) |
| ANOVA | Cohen’s \(f\) | .10 | .25 | .40 | \(\sqrt{\eta^2/(1-\eta^2)}\) |
| Correlation | Pearson \(r\) | .10 | .30 | .50 | \(\text{cov}(X,Y)/s_Xs_Y\) |
| Regression | Cohen’s \(f^2\) | .02 | .15 | .35 | \(R^2/(1-R^2)\) |
| \(\chi^2\) | Cramér’s \(V\) | .10 | .30 | .50 | \(\sqrt{\chi^2/[N\min(r{-}1,c{-}1)]}\) |
| Non-parametric | \(r\) | .10 | .30 | .50 | \(Z/\sqrt{N}\) |
| MANOVA | \(\eta^2_p\) | .01 | .06 | .14 | \(SS_{\text{eff}}/(SS_{\text{eff}}+SS_{\text{err}})\) |
| Logistic Reg. | Nagelkerke \(R^2\) | .02 | .13 | .26 | Pseudo-\(R^2\); benchmarks follow Cohen’s \(R^2\) conventions for linear regression, so interpret cautiously |
Benchmarks: Cohen (1988, 1992); Lakens (2013)
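Two of the tabled formulas, Cramér's \(V\) and the non-parametric \(r = Z/\sqrt{N}\), are easy to evaluate by hand. The sketch below is illustrative only: the contingency table and the \(Z\) value are hypothetical, and the \(\chi^2\) is computed without Yates' continuity correction.

```python
import math

def chi_square(table):
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

def r_from_z(z, n):
    """Effect size r for rank-based tests: Z / sqrt(N)."""
    return z / math.sqrt(n)

# Hypothetical 2 x 2 table of counts
table = [[30, 10],
         [20, 40]]
print(f"Cramer's V = {cramers_v(table):.2f}")
print(f"r = {r_from_z(2.5, 50):.2f}")  # hypothetical Z = 2.5 with N = 50
```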
Apply when parametric assumptions are violated and non-parametric alternatives are not preferred (Osborne, 2002).
| Skew Pattern | Transformation | Formula | Notes |
|---|---|---|---|
| Moderate positive | Square Root | \(X' = \sqrt{X}\) | Requires \(X \geq 0\) |
| Substantial positive | Logarithmic | \(X' = \ln(X)\) or \(\log_{10}(X)\) | Requires \(X > 0\); add constant if zeros present |
| Severe positive | Inverse (Reciprocal) | \(X' = 1/X\) | Requires \(X > 0\); reverses rank order |
| Moderate negative | Reflect + Square Root | \(X' = \sqrt{k-X}\), \(k=\max(X)+1\) | Reflection reverses score direction; reverse the sign of effects when interpreting |
| Substantial negative | Reflect + Log | \(X' = \ln(k-X)\) | Same reflection strategy |
| Proportions \([0,1]\) | Arcsine Square Root | \(X' = \arcsin\!\sqrt{p}\) | Stabilizes binomial variance |
| Count data | Square Root or Log | \(X' = \sqrt{X}\) or \(\ln(X+1)\) | Consider Poisson regression |
| Bimodal | None (do not transform) | — | Investigate subpopulations |
Back-transformation: Log-transformed means become geometric means on the original scale. Always report results in original units.
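The back-transformation point can be verified with a short sketch. The values below are hypothetical: the code shows that exponentiating the mean of the logs yields the geometric mean rather than the arithmetic mean, and that the reflect-and-square-root transformation reverses the rank order of scores.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical positively skewed data (e.g. response times in seconds)
raw = [1.0, 10.0, 100.0]

log_values = [math.log(x) for x in raw]
back_transformed = math.exp(mean(log_values))  # mean of logs, returned to original units

print(f"arithmetic mean       = {mean(raw):.1f}")
print(f"back-transformed mean = {back_transformed:.1f}")
print(f"geometric mean        = {geometric_mean(raw):.1f}")

# Reflect + square root for negative skew: k = max(X) + 1
scores = [2, 7, 9, 10]  # hypothetical negatively skewed scores
k = max(scores) + 1
reflected = [math.sqrt(k - x) for x in scores]
print(reflected)  # the highest raw score now has the lowest transformed value
```

Because the back-transformed value is a geometric mean, it should be labeled as such when reported in original units.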
\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad\qquad F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}}\]
\[r = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{(n-1)s_Xs_Y} \qquad\qquad \rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}\]
\[\chi^2 = \sum\frac{(O-E)^2}{E} \qquad OR = e^{b} \qquad \text{logit}(p) = \ln\!\frac{p}{1-p}\]
\[d = \frac{\bar{X}_1-\bar{X}_2}{s_p} \qquad \eta^2 = \frac{SS_B}{SS_T} \qquad VIF_j = \frac{1}{1-R^2_j}\]
\[SE(b_j) = \sqrt{\frac{MS_{\text{Res}}}{\sum(X_{ij}-\bar{X}_j)^2}\cdot VIF_j} \qquad CI_{95\%}: \hat{\theta} \pm 1.96\cdot SE_{\hat{\theta}}\]
\[1-\beta = \text{Power} \qquad \alpha = P(\text{Type I error}) \qquad \beta = P(\text{Type II error})\]
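A few of the reference formulas above can be evaluated numerically as follows. This is a minimal sketch with hypothetical inputs: an auxiliary \(R^2_j = .80\) for the VIF, and a logistic coefficient \(b = 0.693\) with \(SE = 0.25\). It uses the large-sample 1.96 multiplier from the formula for the confidence interval.

```python
import math

def vif(r2_j: float) -> float:
    """Variance inflation factor from the R^2 of predictor j regressed on the others."""
    return 1.0 / (1.0 - r2_j)

def ci95(estimate: float, se: float, z: float = 1.96):
    """Large-sample 95% confidence interval: estimate +/- 1.96 * SE."""
    return estimate - z * se, estimate + z * se

# Hypothetical collinearity: R^2_j = .80 gives VIF of about 5,
# so SE(b_j) is inflated by sqrt(VIF) relative to an uncorrelated predictor.
inflation = math.sqrt(vif(0.80))
print(f"VIF = {vif(0.80):.1f}, SE inflated by a factor of {inflation:.2f}")

# Hypothetical logistic regression coefficient on the logit scale
b, se_b = 0.693, 0.25
low, high = ci95(b, se_b)
odds_ratio = math.exp(b)
or_low, or_high = math.exp(low), math.exp(high)  # exponentiate the logit-scale limits
print(f"OR = {odds_ratio:.2f}, 95% CI [{or_low:.2f}, {or_high:.2f}]")
```

The interval for the odds ratio is built on the logit scale and then exponentiated, so it is asymmetric around the point estimate.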
Foundational & General Statistics
American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
Field, A. (2024). Discovering statistics using IBM SPSS statistics (6th ed.). SAGE Publications.
Gravetter, F. J., & Wallnau, L. B. (2021). Statistics for the behavioral sciences (10th ed.). Cengage Learning.
Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics (7th ed.). Pearson Education.
Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals. American Psychologist, 54(8), 594–604. https://doi.org/10.1037/0003-066X.54.8.594
Parametric Tests
Box, G. E. P. (1954). Some theorems on quadratic forms in the study of analysis of variance problems. Annals of Mathematical Statistics, 25(2), 290–302. https://doi.org/10.1214/aoms/1177728786
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal N’s and/or variances. Journal of Educational Statistics, 1(2), 113–125. https://doi.org/10.2307/1164979
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24(2), 95–112. https://doi.org/10.1007/BF02289823
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data. Journal of Educational Statistics, 1(1), 69–82. https://doi.org/10.2307/1164736
Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to probability and statistics (pp. 278–292). Stanford University Press.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204–209. https://doi.org/10.1214/aoms/1177731915
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611. https://doi.org/10.1093/biomet/52.3-4.591
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. https://doi.org/10.2307/3001913
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3–4), 330–336. https://doi.org/10.1093/biomet/38.3-4.330
Non-Parametric Tests
Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3), 241–252. https://doi.org/10.1080/00401706.1964.10490181
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200), 675–701. https://doi.org/10.1080/01621459.1937.10503522
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–93. https://doi.org/10.1093/biomet/30.1-2.81
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621. https://doi.org/10.1080/01621459.1952.10483441
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60. https://doi.org/10.1214/aoms/1177730491
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). McGraw-Hill.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. https://doi.org/10.2307/3001968
Chi-Square & Association
Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87–94. https://doi.org/10.2307/2340521
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. https://doi.org/10.1007/BF02295996
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175. https://doi.org/10.1080/14786440009463897
Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217–235. https://doi.org/10.2307/2983604
Regression Analysis
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294. https://doi.org/10.2307/1911963
Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19(1), 15–18. https://doi.org/10.2307/1268249
Harrell, F. E. (2015). Regression modeling strategies (2nd ed.). Springer. https://doi.org/10.1007/978-3-319-19425-7
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). John Wiley & Sons. https://doi.org/10.1002/9781118548387
Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Harcourt Brace.
Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373–1379. https://doi.org/10.1016/S0895-4356(96)00236-3
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838. https://doi.org/10.2307/1912934
MANOVA
Olson, C. L. (1976). On choosing a test statistic in multivariate analysis of variance. Psychological Bulletin, 83(4), 579–586. https://doi.org/10.1037/0033-2909.83.4.579
Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, 26(1), 117–121. https://doi.org/10.1214/aoms/1177728599
Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). Routledge.
Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24(3–4), 471–494. https://doi.org/10.1093/biomet/24.3-4.471
Factor Analysis, Reliability & SEM
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173
Byrne, B. M. (2016). Structural equation modeling with AMOS (3rd ed.). Routledge.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276. https://doi.org/10.1207/s15327906mbr0102_10
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Lawrence Erlbaum Associates.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50. https://doi.org/10.1177/002224378101800104
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.
Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis (3rd ed.). The Guilford Press.
Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629
Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. https://doi.org/10.1007/s11747-014-0403-8
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151. https://doi.org/10.1177/001316446002000116
Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). The Guilford Press.
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect. Multivariate Behavioral Research, 39(1), 99–128. https://doi.org/10.1207/s15327906mbr3901_4
McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum Associates.
McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879–891. https://doi.org/10.3758/BRM.40.3.879
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327. https://doi.org/10.1007/BF02293557
Cluster Analysis, Effect Sizes & Transformations
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th ed.). John Wiley & Sons. https://doi.org/10.1002/9780470977811
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, Article 863. https://doi.org/10.3389/fpsyg.2013.00863
Osborne, J. W. (2002). Notes on the use of data transformations. Practical Assessment, Research & Evaluation, 8(6), 1–8. https://doi.org/10.7275/4vng-5608
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989X.7.2.147
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845