This document demonstrates the usage of two regression diagnostic functions:

- af_vif_test(): tests for multicollinearity using Variance Inflation Factors (VIF)
- af_hetero_test(): tests for heteroscedasticity using the Breusch-Pagan test

We’ll examine both passing and failing cases for each test using different datasets and model specifications.
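For reference, the textbook quantities behind the two tests are given below. R_j² is the R² from regressing predictor j on the remaining predictors. The Breusch-Pagan statistic is shown in Koenker's studentized form; this is an assumption, since the exact variant used by af_hetero_test() is not shown. Here R²_aux is from regressing the squared residuals on the k model regressors (intercept excluded from k).

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\text{SE factor}_j = \sqrt{\mathrm{VIF}_j}, \qquad
\text{Tolerance}_j = \frac{1}{\mathrm{VIF}_j}

\mathrm{LM}_{\mathrm{BP}} = n \, R^2_{\mathrm{aux}} \;\overset{H_0}{\sim}\; \chi^2_k \quad \text{(asymptotically)}
```

This document flags multicollinearity when any VIF exceeds 10, and heteroscedasticity when the Breusch-Pagan p-value is below 0.05.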
First, let’s define our diagnostic functions:
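The actual definitions of af_vif_test() and af_hetero_test() are not reproduced in this section. Purely as a hypothetical illustration of the VIF side, the sketch below computes each predictor's VIF from an auxiliary regression on the other predictors. The name vif_sketch(), its threshold argument, and its messages are assumptions, not the real function's API. It handles numeric predictors only (factor predictors would need a generalized VIF) and omits the confidence-interval columns shown in the tables.

```r
# Hypothetical stand-in for a VIF check; NOT the actual af_vif_test().
vif_sketch <- function(model, threshold = 10) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # predictors, intercept dropped
  if (ncol(X) < 2) stop("VIF needs at least two predictors")
  vif <- sapply(colnames(X), function(term) {
    others <- X[, colnames(X) != term, drop = FALSE]
    r2 <- summary(lm(X[, term] ~ others))$r.squared  # R^2 of auxiliary regression
    1 / (1 - r2)
  })
  table <- data.frame(
    Term = names(vif),
    VIF = unname(vif),
    SE_factor = sqrt(unname(vif)),  # factor by which the coefficient SE is inflated
    Tolerance = 1 / unname(vif),
    row.names = NULL
  )
  result_line <- if (all(vif <= threshold)) {
    sprintf("All VIF values <= %g: multicollinearity is not a concern.", threshold)
  } else {
    sprintf("VIF values > %g detected: multicollinearity is high.", threshold)
  }
  list(result_line = result_line, table = table)
}

# Usage on the well-behaved model examined below
res <- vif_sketch(lm(mpg ~ wt + hp, data = mtcars))
cat(res$result_line, "\n")
print(res$table)
```

Returning a list with a result_line element mirrors how the examples below use `$result_line`, but that shape is an assumption about the real functions.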
Let’s first look at a well-behaved model using the built-in mtcars dataset.
# Fit a model with well-behaved predictors
good_model <- lm(mpg ~ wt + hp, data = mtcars)
# Test for multicollinearity
vif_results <- af_vif_test(good_model)
cat(vif_results$result_line)
We use VIF to test for multicollinearity. The result indicates that all VIF values <= 10.
Multicollinearity is not a concern.
VIF Table

| Term | VIF | VIF_CI_low | VIF_CI_high | SE_factor | Tolerance | Tolerance_CI_low | Tolerance_CI_high |
|---|---|---|---|---|---|---|---|
| wt | 1.8 | 1.3 | 3.0 | 1.3 | 0.6 | 0.3 | 0.8 |
| hp | 1.8 | 1.3 | 3.0 | 1.3 | 0.6 | 0.3 | 0.8 |
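With only two predictors, each auxiliary regression is a simple regression of one predictor on the other, so both R_j² equal the squared correlation r² between wt and hp. That is why the wt and hp rows are identical. A short check, assuming SE_factor = sqrt(VIF) and Tolerance = 1/VIF (the rounded table values are consistent with both):

```r
# Two-predictor case: both VIFs reduce to 1 / (1 - r^2), with r = cor(wt, hp)
r <- cor(mtcars$wt, mtcars$hp)
vif_two <- 1 / (1 - r^2)
se_factor <- sqrt(vif_two)   # standard-error inflation factor
tolerance <- 1 / vif_two
round(c(r = r, VIF = vif_two, SE_factor = se_factor, Tolerance = tolerance), 2)
# r ≈ 0.66, VIF ≈ 1.77, SE_factor ≈ 1.33, Tolerance ≈ 0.57
```

Rounded to one decimal, these match the 1.8, 1.3 and 0.6 in the table.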
# Test for heteroscedasticity
hetero_results <- af_hetero_test(good_model)
cat(hetero_results$result_line)
We use Breusch-Pagan test for homoscedasticity (equal variance) assumption. The result of P-value 0.402 >= 0.05, indicates homoscedasticity (no concern for heteroscedasticity).
In this case, both tests pass because:

1. The VIF values (about 1.8) are well below the threshold of 10, so multicollinearity is not a concern.
2. The Breusch-Pagan p-value (0.402) is above 0.05, so there is no evidence against homoscedasticity (constant residual variance).
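The Breusch-Pagan idea can also be sketched by hand. Assumption: this uses Koenker's studentized variant (the default of lmtest::bptest()): regress the squared residuals on the model's regressors and compare n·R² with a chi-squared distribution whose degrees of freedom equal the number of regressors. Which variant af_hetero_test() uses is not shown here, so this sketch's p-value need not equal the 0.402 printed above. The name bp_by_hand() is hypothetical.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test by hand
bp_by_hand <- function(model) {
  e2 <- residuals(model)^2
  X <- model.matrix(model)[, -1, drop = FALSE]  # regressors without intercept
  aux <- lm(e2 ~ X)                              # auxiliary regression
  stat <- length(e2) * summary(aux)$r.squared    # LM statistic = n * R^2
  df <- ncol(X)
  p_value <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p_value)
}

bp_by_hand(lm(mpg ~ wt + hp, data = mtcars))
```

A small p-value would mean the squared residuals vary systematically with the regressors, which is the pattern heteroscedasticity produces.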
Now let’s create a model with high multicollinearity by including redundant predictors.
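# Create a new variable that's highly correlated with weight:
# a rescaled copy of wt plus small noise (2.2 is just a scale factor, not a unit conversion).
# No seed is set, so the exact VIF values below will vary between runs.
mtcars$wt_scaled <- mtcars$wt * 2.2 + rnorm(nrow(mtcars), 0, 0.1)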
# Fit model with multicollinearity
bad_vif_model <- lm(mpg ~ wt + wt_scaled + hp, data = mtcars)
# Test for multicollinearity
vif_results_bad <- af_vif_test(bad_vif_model)
cat(vif_results_bad$result_line)
We use VIF to test for multicollinearity. The result indicates VIF values > 10 detected.
Multicollinearity is high.
VIF Table

| Term | VIF | VIF_CI_low | VIF_CI_high | SE_factor | Tolerance | Tolerance_CI_low | Tolerance_CI_high |
|---|---|---|---|---|---|---|---|
| wt | 380.5 | 217.7 | 665.4 | 19.5 | 0.0 | 0.0 | 0.0 |
| wt_scaled | 383.4 | 219.4 | 670.6 | 19.6 | 0.0 | 0.0 | 0.0 |
| hp | 1.8 | 1.3 | 3.0 | 1.3 | 0.6 | 0.3 | 0.8 |
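This model fails the VIF test because we intentionally included two nearly redundant predictors (wt and wt_scaled). Each is almost an exact linear function of the other, so both VIFs are far above 10, while hp, which is not part of the redundant pair, keeps a VIF of 1.8.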
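To see why the VIFs explode, check how closely wt_scaled tracks wt and compute wt's VIF directly from its auxiliary regression. This sketch works on a copy of mtcars (the name mt_copy is hypothetical) and adds set.seed(42), an arbitrary seed the original code does not use, so its exact numbers will differ from the table above, though the conclusion does not.

```r
set.seed(42)                      # arbitrary seed, added only for reproducibility
mt_copy <- mtcars                 # work on a copy; mtcars itself is untouched
mt_copy$wt_scaled <- mt_copy$wt * 2.2 + rnorm(nrow(mt_copy), 0, 0.1)

r <- cor(mt_copy$wt, mt_copy$wt_scaled)                         # close to 1
aux_r2 <- summary(lm(wt ~ wt_scaled + hp, data = mt_copy))$r.squared
vif_wt <- 1 / (1 - aux_r2)                                      # far above 10
c(correlation = r, aux_R2 = aux_r2, VIF_wt = vif_wt)
```

Because the noise standard deviation (0.1) is tiny compared with the spread of 2.2 × wt, the auxiliary R² is close to 1 and 1 / (1 − R²) becomes very large.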
Let’s create a model that exhibits heteroscedasticity by simulating data whose error standard deviation grows with the predictor.
# Create dataset with heteroscedastic errors
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n, 0, x/2) # Error variance increases with x
het_data <- data.frame(x = x, y = y)
# Fit model
bad_hetero_model <- lm(y ~ x, data = het_data)
# Test for heteroscedasticity
hetero_results_bad <- af_hetero_test(bad_hetero_model)
cat(hetero_results_bad$result_line)
We use Breusch-Pagan test for homoscedasticity (equal variance) assumption. The result of P-value 0 < 0.05 indicates a concern for heteroscedasticity.
This model fails the heteroscedasticity test because we intentionally created a situation where the variance of residuals increases with the predictor value.
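A quick numerical look at the pattern the test detected: refit the same simulated model and compare the residual standard deviation for small and large x. The split at x = 5 is an arbitrary choice for illustration.

```r
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n, 0, x / 2)   # same simulation as above: error SD = x / 2
fit <- lm(y ~ x)

resid_het <- residuals(fit)
low_x <- x < 5                    # arbitrary split point for illustration
spread <- c(sd_low_x = sd(resid_het[low_x]), sd_high_x = sd(resid_het[!low_x]))
round(spread, 2)
```

Under homoscedasticity the two standard deviations would be similar; here the high-x group's spread should be clearly larger, because the true error SD rises from 0 to 5 across the range of x.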
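Our diagnostic functions passed the well-behaved mtcars model and correctly identified:

- high multicollinearity (VIF > 10) when a near-duplicate of wt was added to the model;
- heteroscedasticity (Breusch-Pagan p < 0.05) in simulated data whose error spread grows with x.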
These examples demonstrate how the functions can be used to identify potential violations of regression assumptions and help analysts make informed decisions about their models.