Weighted Least Squares is a regression technique that expands on the Ordinary Least Squares method by assigning different “weights” to the observations.
# fit simple linear regression model
model <- lm(score ~ hours, data = df)
# view summary of model
summary(model)
##
## Call:
## lm(formula = score ~ hours, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.967  -5.970  -0.719   7.531  15.032
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   60.467      5.128  11.791 1.17e-08 ***
## hours          5.500      1.127   4.879 0.000244 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.224 on 14 degrees of freedom
## Multiple R-squared: 0.6296, Adjusted R-squared: 0.6032
## F-statistic: 23.8 on 1 and 14 DF, p-value: 0.0002438
# create residual vs. fitted plot
plot(fitted(model), resid(model), xlab = "Fitted Values", ylab = "Residuals")
# add a horizontal line at 0
abline(0, 0)
The plot shows that the residuals are not distributed with equal variance.
# load lmtest package
library(lmtest)
# perform Breusch-Pagan test
bptest(model)
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 3.9597, df = 1, p-value = 0.0466
The Breusch-Pagan test uses the following null and alternative hypotheses:
Null Hypothesis (H0): Homoscedasticity is present (the residuals are distributed with equal variance)
Alternative Hypothesis (HA): Heteroscedasticity is present (the residuals are not distributed with equal variance)
Since the p-value from the test is 0.0466, which is below the conventional 0.05 significance level, we reject the null hypothesis and conclude that heteroscedasticity is a problem in this model.
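To see what the test statistic measures, here is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan statistic computed by hand in base R. The data are simulated and hypothetical (`sim_df`, `sim_model`, `aux` and the other names are illustrative, not the article's dataset), so the numbers will not match the output above.

```r
# Hypothetical simulated data: the error spread grows with hours
set.seed(1)
hours <- rep(1:8, each = 5)
score <- 60 + 5 * hours + rnorm(length(hours), sd = 2 * hours)
sim_df <- data.frame(hours = hours, score = score)

# fit the ordinary least squares model
sim_model <- lm(score ~ hours, data = sim_df)

# auxiliary regression: squared residuals on the predictor
sq_res <- resid(sim_model)^2
aux <- lm(sq_res ~ hours, data = sim_df)

# studentized statistic: n times the R-squared of the auxiliary regression
bp_stat <- nrow(sim_df) * summary(aux)$r.squared

# compare against a chi-squared distribution with 1 df (one predictor)
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(BP = bp_stat, p.value = bp_pval)
```

A small p-value here is read the same way as the 0.0466 above. This hand calculation should agree with the default (studentized) result of `bptest()` from the lmtest package on the same model, though that is worth confirming against the package documentation.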
# define weights to use
wt <- 1/lm(abs(model$residuals) ~ model$fitted.values)$fitted.values^2
# perform weighted least squares regression
wls_model <- lm(score ~ hours, data = df, weights = wt)
# view summary of model
summary(wls_model)
##
## Call:
## lm(formula = score ~ hours, data = df, weights = wt)
##
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max
## -2.0167 -0.9263 -0.2589  0.9873  1.6977
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  63.9689     5.1587  12.400 6.13e-09 ***
## hours         4.7091     0.8709   5.407 9.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.199 on 14 degrees of freedom
## Multiple R-squared: 0.6762, Adjusted R-squared: 0.6531
## F-statistic: 29.24 on 1 and 14 DF, p-value: 9.236e-05
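The weights used in this fit are the reciprocal of the squared fitted values from a regression of the absolute residuals on the fitted values, so observations with a larger estimated spread count for less. Weighted least squares then minimizes the weighted sum of squared residuals, which is the same as running ordinary least squares after multiplying every term by the square root of its weight. A minimal sketch of that equivalence on simulated, hypothetical data (none of these names, weights or numbers come from the article's `df`):

```r
# Hypothetical simulated data with error spread proportional to x
set.seed(42)
x <- rep(1:8, each = 4)
y <- 60 + 5 * x + rnorm(length(x), sd = 1.5 * x)

# assumed weights for illustration: inverse of the assumed error variance
w <- 1 / x^2

# weighted fit via the weights argument of lm()
fit_wls <- lm(y ~ x, weights = w)

# same fit via OLS on rescaled variables; the intercept column is
# rescaled too, so the automatic intercept is switched off with 0 +
sw  <- sqrt(w)
y_s <- sw * y
x_s <- sw * x
fit_scaled <- lm(y_s ~ 0 + sw + x_s)

# compare the two sets of coefficients (intercept, slope)
all.equal(unname(coef(fit_wls)), unname(coef(fit_scaled)))
```

The residual standard error reported for a weighted fit is the one from this rescaled regression, which is why it is not on the same scale as the unweighted model's.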
plot(fitted(wls_model), resid(wls_model), xlab = "Fitted Values", ylab = "Residuals")
# add a horizontal line at 0
abline(0, 0)
From the output we can see that the coefficient estimate for the predictor variable hours changed from 5.500 to 4.7091 and the reported fit statistics improved.
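The weighted least squares model has a residual standard error of 1.199 compared to 9.224 in the original simple linear regression model.
These two values are not on the same scale, because the weighted figure is computed from residuals that have each been multiplied by the square root of their weight, so the drop should not be read as the predicted values landing that much closer to the actual observations. A more directly comparable sign of improvement is the standard error of the hours coefficient, which fell from 1.127 to 0.8709.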
The weighted least squares model also has an R-squared of 0.6762 compared to 0.6296 in the original simple linear regression model.
This suggests that the weighted least squares model explains more of the variance in exam scores, although its R-squared is computed on the weighted data, so the comparison is approximate.
Taken together, these metrics suggest that the weighted least squares model is a better fit for these data than the simple linear regression model.
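One way to put two such fits on a common scale is to compute the root mean squared error from the raw (unweighted) residuals of each model. The sketch below uses simulated, hypothetical data rather than the article's `df`, and the helper `rmse` is introduced here for illustration only. Because ordinary least squares minimizes the unweighted sum of squared residuals by construction, its raw error cannot exceed that of the weighted fit; what weighting offers instead is more precise coefficient estimates, provided the weights reflect the true error variances.

```r
# Hypothetical simulated data with error spread growing in hours
set.seed(7)
hours <- rep(1:8, each = 4)
score <- 60 + 5 * hours + rnorm(length(hours), sd = 1.5 * hours)
sim_df <- data.frame(hours = hours, score = score)

# ordinary least squares fit
ols <- lm(score ~ hours, data = sim_df)

# same weighting recipe as in the article, applied to the simulated fit
wt  <- 1 / lm(abs(resid(ols)) ~ fitted(ols))$fitted.values^2
wls <- lm(score ~ hours, data = sim_df, weights = wt)

# root mean squared error on the raw residual scale
rmse <- function(fit) sqrt(mean(resid(fit)^2))

c(ols = rmse(ols), wls = rmse(wls))
```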