Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
Running Code
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
library(skedastic) # for White's test
library(lmtest)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
# Load the dataset
data(mtcars)
# Display the structure of the dataset
str(mtcars)
# Fit the main linear regression model
model <- lm(mpg ~ hp + wt + cyl, data = mtcars)
# Display the summary of the regression model
summary(model)
Call:
lm(formula = mpg ~ hp + wt + cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.9290 -1.5598 -0.5311 1.1850 5.8986
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.75179 1.78686 21.687 < 2e-16 ***
hp -0.01804 0.01188 -1.519 0.140015
wt -3.16697 0.74058 -4.276 0.000199 ***
cyl -0.94162 0.55092 -1.709 0.098480 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.512 on 28 degrees of freedom
Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
# Perform White's test for heteroskedasticity
# bptest() from lmtest runs a studentized Breusch-Pagan test; regressing on the
# fitted values and their squares makes it the special form of White's test
white_test <- bptest(model, ~ fitted(model) + I(fitted(model)^2), data = mtcars)
# Display the results of White's test
print(white_test)
studentized Breusch-Pagan test
data: model
BP = 5.8332, df = 2, p-value = 0.05412
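Since the skedastic package was loaded above for White's test, the same check can also be run through that package directly. A minimal sketch, assuming a recent version of skedastic in which the function is called white() (older releases exported it as white_lm()) and accepts the fitted lm object, with an interactions argument controlling whether cross terms enter the auxiliary regression:

# White's test via skedastic on the same fitted model
white(model, interactions = FALSE) # squares of the regressors only
white(model, interactions = TRUE)  # squares plus pairwise cross terms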
# Optional: plot residuals to visually check for heteroskedasticity
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid
plot(model)
White’s Test Results
Test Statistic (BP): 5.8332
Degrees of Freedom (df): 2
p-value: 0.05412
Interpretation
Null Hypothesis: The residuals have constant variance (homoskedasticity).
Alternative Hypothesis: The residuals do not have constant variance (heteroskedasticity exists).
p-value = 0.05412: This is slightly above the typical significance threshold of 0.05. At the 5% significance level, we fail to reject the null hypothesis of homoskedasticity. At a more lenient 10% significance level, the p-value would indicate some evidence of heteroskedasticity.
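To make this decision reproducible rather than read off by eye, the p-value can be pulled out of the test object (bptest() returns a standard htest list) and compared with the chosen significance level. A minimal sketch:

# Compare the test's p-value against the two thresholds discussed above
alpha_5 <- 0.05
alpha_10 <- 0.10
white_test$p.value < alpha_5  # FALSE: fail to reject H0 at the 5% level
white_test$p.value < alpha_10 # TRUE: some evidence at the 10% level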
Conclusion
At the 5% significance level, the test suggests that there is no statistically significant evidence of heteroskedasticity in the residuals of the model. Therefore, we conclude that the residual variance is likely constant, and the assumption of homoskedasticity holds for this model.
Using the residual plots, we can visually check for heteroskedasticity.
Residuals vs Fitted Values Plot: The points scatter randomly with no discernible pattern, indicating homoskedasticity (constant variance of the errors).
Scale-Location Plot: The points again form a random scatter, suggesting homoskedasticity.
Normal Q-Q Plot: The spread of the residuals is roughly constant across quantiles; note, though, that this plot primarily assesses the normality of the residuals rather than their variance.
Residuals vs Leverage Plot: Points far from the center (high leverage) do not also have large residuals, so no single observation dominates the fit.
White's test result is consistent with what the residual plots show.
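Beyond the default diagnostic panel, a direct visual check is to plot the absolute residuals against the fitted values: a roughly flat trend line suggests constant variance, while a funnel shape would suggest heteroskedasticity. A minimal sketch using base graphics:

# Plot |residuals| against fitted values with a lowess trend line
plot(fitted(model), abs(residuals(model)),
     xlab = "Fitted values", ylab = "|Residuals|",
     main = "Absolute residuals vs fitted values")
lines(lowess(fitted(model), abs(residuals(model))), col = "red")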
# Ensure mtcars has only 32 rows
nrow(mtcars) # Check the number of rows in the dataset
[1] 32
# Create squares of the original regressors
mtcars$hp2 <- mtcars$hp^2   # Square of horsepower
mtcars$wt2 <- mtcars$wt^2   # Square of weight
mtcars$cyl2 <- mtcars$cyl^2 # Square of cylinders
# Create cross terms of the original regressors
mtcars$hp_wt <- mtcars$hp * mtcars$wt   # Interaction between horsepower and weight
mtcars$hp_cyl <- mtcars$hp * mtcars$cyl # Interaction between horsepower and cylinders
mtcars$wt_cyl <- mtcars$wt * mtcars$cyl # Interaction between weight and cylinders
# View the first few rows of the dataset to confirm the new variables
head(mtcars)
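As an aside, the same auxiliary design can be specified without adding columns to mtcars by writing the squares and cross terms directly into the model formula. A minimal sketch (the name aux_inline is just for illustration); its R-squared should match the manual construction below:

# (hp + wt + cyl)^2 expands to the three main effects plus all pairwise
# interactions; I() protects the pure squares inside the formula
aux_inline <- lm(residuals(model)^2 ~ (hp + wt + cyl)^2 +
                   I(hp^2) + I(wt^2) + I(cyl^2), data = mtcars)
summary(aux_inline)$r.squared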
# Extract residuals from the main regression
residuals_main <- residuals(model)
# Square the residuals to use as the dependent variable in the auxiliary regression
squared_residuals <- residuals_main^2
# Run the auxiliary regression (squared residuals as the dependent variable)
auxiliary_model <- lm(squared_residuals ~ hp + wt + cyl + hp2 + wt2 + cyl2 + hp_wt + hp_cyl + wt_cyl, data = mtcars)
# Display the results of the auxiliary regression
summary(auxiliary_model)
Call:
lm(formula = squared_residuals ~ hp + wt + cyl + hp2 + wt2 +
cyl2 + hp_wt + hp_cyl + wt_cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-13.7670 -3.3051 -0.8702 2.7699 20.9129
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.314e+01 3.821e+01 1.391 0.178
hp -2.971e-01 3.060e-01 -0.971 0.342
wt 1.107e+00 1.342e+01 0.082 0.935
cyl -8.987e+00 1.549e+01 -0.580 0.568
hp2 -8.104e-05 8.549e-04 -0.095 0.925
wt2 -5.660e+00 5.241e+00 -1.080 0.292
cyl2 3.271e-01 2.371e+00 0.138 0.892
hp_wt 1.503e-01 1.323e-01 1.136 0.268
hp_cyl -2.542e-02 8.616e-02 -0.295 0.771
wt_cyl 2.292e+00 6.875e+00 0.333 0.742
Residual standard error: 8.698 on 22 degrees of freedom
Multiple R-squared: 0.2795, Adjusted R-squared: -0.01525
F-statistic: 0.9483 on 9 and 22 DF, p-value: 0.5056
Interpreting the R-squared Value
In the auxiliary regression, the R-squared value is 0.2795, meaning that only 27.95% of the variation in the squared residuals is explained by the regressors (the original variables, their squares, and their cross terms).
This R-squared is low: the auxiliary regressors explain little of the variation in the squared residuals, suggesting there is no strong relationship between the residual variance and the independent variables, i.e., no significant heteroskedasticity.
If the R-squared were high, it would suggest that the independent variables are explaining more of the variation in the squared residuals, indicating the presence of heteroskedasticity (i.e., the variance of the residuals depends on the independent variables).
We can formally test if the R-squared value is statistically significant using a chi-squared test.
Null hypothesis: There is no heteroskedasticity (the R-squared of the auxiliary regression is not significant).
Alternative hypothesis: There is heteroskedasticity (the R-squared is significant, meaning the auxiliary regressors explain the variation in the squared residuals).
The test statistic for the chi-squared test is calculated as:
Test Statistic = R^2 × number of observations = 0.2795 × 32 = 8.944
where: R^2 = 0.2795 (from the auxiliary regression), number of observations = 32
Compute the critical value from the chi-squared distribution and compare it with the test statistic. We use the qchisq() function in R to find the critical value at a significance level of 5% (α = 0.05). The degrees of freedom equal the number of regressors in the auxiliary regression, excluding the intercept: the three original variables, their three squares, and the three cross terms, so df = 9. (This matches the bptest() result above, where two auxiliary regressors gave df = 2.)
# Define the R-squared value and number of observations
R_squared <- 0.2795
n_obs <- 32
df <- 9 # regressors in the auxiliary regression, excluding the intercept
# Calculate the test statistic
test_statistic <- R_squared * n_obs
test_statistic
[1] 8.944
# Find the critical value from the chi-squared distribution
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df)
# Compute the p-value of the test statistic
p_value <- 1 - pchisq(test_statistic, df)
# Display results
test_statistic
[1] 8.944
critical_value
[1] 16.91898
p_value
[1] 0.4424594
Interpretation of the Chi-Squared Test Results
Test Statistic: The test statistic is 8.944.
Critical Value: The critical value for the chi-squared distribution with 9 degrees of freedom at a 5% significance level (α = 0.05) is 16.91898.
p-value: The p-value is approximately 0.4425.
The test statistic (8.944) is less than the critical value (16.91898). The p-value (≈ 0.4425) is greater than the significance level (α = 0.05).
Decision Rule: If the test statistic is greater than the critical value, we reject the null hypothesis. If the p-value is less than the significance level (α = 0.05), we also reject the null hypothesis.
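Expressed in code, the decision rule is a pair of logical comparisons on the quantities computed above; a minimal sketch:

# Either comparison gives the same decision; TRUE would mean reject H0
test_statistic > critical_value # FALSE here
p_value < alpha                 # FALSE here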
Conclusion
Fail to Reject the Null Hypothesis: Since the test statistic is smaller than the critical value and the p-value is larger than 0.05, we fail to reject the null hypothesis.
Interpretation: There is no significant evidence of heteroskedasticity in the model. The variance of the residuals is not significantly related to the independent variables, consistent with the conclusion from White's test above.