1 Required libraries

library(performance)
library(gt)
library(dplyr)
library(afcommon)

2 Introduction

This document demonstrates the usage of two regression diagnostic functions:

  1. af_vif_test(): Tests for multicollinearity using Variance Inflation Factors (VIF)
  2. af_hetero_test(): Tests for heteroscedasticity using the Breusch-Pagan test

We’ll examine both passing and failing cases for each test using different datasets and model specifications.

First, let’s load our diagnostic functions:

source("../Common/af_regression.R")

3 Case 1: Model with Good Properties

Let’s first look at a well-behaved model using the built-in mtcars dataset.

# Fit a model with well-behaved predictors
good_model <- lm(mpg ~ wt + hp, data = mtcars)

# Test for multicollinearity
vif_results <- af_vif_test(good_model)
cat(vif_results$result_line)
We use VIF to test for multicolinearity. The result indicates that all VIF values <= 10. 
 Multicollinearity is not a concern.
vif_results$gt_tbl
VIF Table
Term  VIF  VIF_CI_low  VIF_CI_high  SE_factor  Tolerance  Tolerance_CI_low  Tolerance_CI_high
wt    1.8  1.3         3.0          1.3        0.6        0.3               0.8
hp    1.8  1.3         3.0          1.3        0.6        0.3               0.8

# Test for heteroscedasticity
hetero_results <- af_hetero_test(good_model)
cat(hetero_results$result_line)
We use Breusch-Pagan test for homoscedasticity (equal variance) assumption. The result of P-value 0.402 >= 0.05, indicates homoscedasticity (no concern for heteroscedasticity).

In this case, both tests pass because:

  1. The VIF values are well below 10, indicating no concerning multicollinearity
  2. The heteroscedasticity test p-value is above 0.05, so there is no evidence against homoscedastic residuals
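The VIF of 1.8 reported for wt can be reproduced by hand, which makes the statistic less of a black box. The following is a minimal base-R sketch, not the internals of af_vif_test(): it assumes the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The names aux_fit, r2_wt, vif_wt and tol_wt are introduced here for illustration.

```r
# Auxiliary regression: explain wt using the other predictor in the model (hp)
aux_fit <- lm(wt ~ hp, data = mtcars)

# Share of wt's variance that hp accounts for
r2_wt <- summary(aux_fit)$r.squared

# Standard VIF definition, and its reciprocal (tolerance)
vif_wt <- 1 / (1 - r2_wt)
tol_wt <- 1 / vif_wt

round(c(VIF = vif_wt, Tolerance = tol_wt), 1)
```

With only two predictors, the auxiliary R-squared is the same whichever predictor is treated as the response, which is why wt and hp share the same VIF.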

4 Case 2: Model with Multicollinearity

Now let’s create a model with high multicollinearity by including redundant predictors.

# Create a new variable that's highly correlated with weight
mtcars$wt_scaled <- mtcars$wt * 2.2 + rnorm(32, 0, 0.1)  # Rescaled copy of wt plus small noise (no seed is set, so exact VIFs vary between runs)

# Fit model with multicollinearity
bad_vif_model <- lm(mpg ~ wt + wt_scaled + hp, data = mtcars)

# Test for multicollinearity
vif_results_bad <- af_vif_test(bad_vif_model)
cat(vif_results_bad$result_line)
We use VIF to test for multicolinearity. The result indicates VIF values > 10 detected. 
 Multicollinearity is high.
vif_results_bad$gt_tbl
VIF Table
Term       VIF    VIF_CI_low  VIF_CI_high  SE_factor  Tolerance  Tolerance_CI_low  Tolerance_CI_high
wt         380.5  217.7       665.4        19.5       0.0        0.0               0.0
wt_scaled  383.4  219.4       670.6        19.6       0.0        0.0               0.0
hp         1.8    1.3         3.0          1.3        0.6        0.3               0.8

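This model fails the VIF test because we intentionally included two highly correlated predictors: wt_scaled is almost an exact linear function of wt, so each is nearly perfectly predicted by the other. Their VIFs are around 380 and their tolerances round to 0.0, while hp is unaffected at 1.8.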
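A quick way to see why the VIF explodes is to look at the pairwise correlation between the two weight columns. The sketch below is illustrative only: it works on a copy of mtcars (cars, r_wt and fixed_model are names introduced here) and sets a seed that the original run did not use, so its noise draw will not match the output shown in this section.

```r
# Work on a copy so the built-in dataset is left untouched
cars <- mtcars

# The seed is an assumption added for reproducibility
set.seed(42)
cars$wt_scaled <- cars$wt * 2.2 + rnorm(nrow(cars), 0, 0.1)

# Pairwise correlation between the original and the rescaled weight
r_wt <- cor(cars$wt, cars$wt_scaled)
r_wt

# Dropping the redundant column restores the specification from Case 1
fixed_model <- lm(mpg ~ wt + hp, data = cars)
names(coef(fixed_model))
```

A correlation this close to 1 means the two columns carry essentially the same information, so removing either one is the simplest remedy.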

5 Case 3: Model with Heteroscedasticity

Let’s create a model that exhibits heteroscedasticity by simulating data whose error standard deviation grows with the predictor.

# Create dataset with heteroscedastic errors
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n, 0, x/2)  # Error variance increases with x
het_data <- data.frame(x = x, y = y)

# Fit model
bad_hetero_model <- lm(y ~ x, data = het_data)

# Test for heteroscedasticity
hetero_results_bad <- af_hetero_test(bad_hetero_model)
cat(hetero_results_bad$result_line)
We use Breusch-Pagan test for homoscedasticity (equal variance) assumption. The result of P-value 0 < 0.05 indicates a concern for heteroscedasticity.

# Visualize heteroscedasticity
plot(bad_hetero_model, which = 1)

This model fails the heteroscedasticity test because we intentionally created a situation where the variance of residuals increases with the predictor value.
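The idea behind the Breusch-Pagan test can be sketched in a few lines of base R: regress the squared residuals on the predictors and check whether that auxiliary regression explains anything. The version below is Koenker's studentized variant (statistic n * R^2). Which variant af_hetero_test() uses is not stated here, so the statistic and p-value may differ from its output. The names fit, e2, aux, bp_stat and bp_p are introduced for illustration.

```r
# Recreate the simulated data from this section
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n, 0, x / 2)
fit <- lm(y ~ x)

# Auxiliary regression: do the squared residuals depend on x?
e2 <- resid(fit)^2
aux <- lm(e2 ~ x)

# Studentized statistic: n * R^2, chi-squared with 1 df (one predictor)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p_value = bp_p)
```

A small p-value here means the squared residuals vary systematically with x, which is the pattern visible as a funnel shape in the residuals-versus-fitted plot.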

6 Summary

Our diagnostic functions successfully identified:

  1. A well-behaved model that passed both tests
  2. A model with high multicollinearity (VIF > 10)
  3. A model with heteroscedasticity (p < 0.05 in Breusch-Pagan test)

These examples demonstrate how the functions can be used to identify potential violations of regression assumptions and help analysts make informed decisions about their models.