Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)
require(car)
require(lmtest)
require(Hmisc)
require(ResourceSelection)
require(ggplot2) # needed for ggplot()/geom_point()/stat_smooth() below
require(dplyr) # provides the %>% pipe used below
#sample data
cars <- datasets::cars
head(cars)
## speed dist
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
## 6 9 10
#specify model, data, and level
model = cars$speed ~ cars$dist
data = cars
level = 0.95
cars %>% ggplot(aes(x = dist, y = speed)) + geom_point() + stat_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
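The smoother above is ggplot2's default loess curve. As a small variant (a sketch assuming ggplot2 and the pipe are attached as above), the least-squares line itself can be overlaid instead:
cars %>% ggplot(aes(x = dist, y = speed)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) # straight-line fit with its confidence band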
I used a function shared by Professor Fulton on this week’s discussion board for model evaluation. Based on the resulting charts and summary output, the adjusted \(R^2\) is 0.6438, meaning stopping distance explains roughly 64% of the variance in speed. The overall F-test has a p-value far below .01, so the model is statistically significant.
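Those two quantities can also be pulled straight from the fitted model object. This is a minimal sketch, assuming the equivalent `speed ~ dist` specification with a `data` argument rather than the `cars$`-prefixed formula used elsewhere in this post:
fit <- lm(speed ~ dist, data = cars) # same specification, data-argument form
fit_summary <- summary(fit)
fit_summary$adj.r.squared # adjusted R-squared, 0.6438 in the output below
fstat <- fit_summary$fstatistic
pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE) # overall F-test p-value, ~1.49e-12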
#function
regressit=function(model,data,level) #model is the functional relationship among variables, data is the data frame, level is the confidence level
{
kdepairs(data) #scatterplot matrix, ResourceSelection
myreg=lm(model,y=TRUE) #run regression, keeping the response vector (y=TRUE) for MPE/MAPE below
confint(myreg, level=level) #confidence intervals
r1=summary(myreg) #regression summary
r2=anova(myreg) #ANOVA table for regression
r3=coefficients(myreg) #coefficients
r4=myreg$residuals #residuals
r5=shapiro.test(r4) #test for normality of residuals
r6=bptest(myreg) #Breusch-Pagan test for homoscedasticity, lmtest
r7a=ncvTest(myreg) #non-constant variance test for homoscedasticity, car
r7=dwtest(myreg) #Durbin-Watson test for independence of residuals, lmtest
#r8=vif(myreg) #collinearity check, car package; not applicable with a single predictor
me=mean(myreg$residuals)
mad=mean(abs(myreg$residuals))
mse=mean(myreg$residuals^2)
rmse=sqrt(mse)
mpe= 100*mean(myreg$residuals / myreg$y)
mape=100*mean(abs(myreg$residuals) / myreg$y)
aic=AIC(myreg)
bic=BIC(myreg)
all=c(me,mad,rmse,mpe,mape,aic,bic)
names(all)=c("ME","MAD","RMSE","MPE","MAPE", "AIC","BIC")
mybarplot=barplot(all)
mylist=list(r1,r2,r3,r4,r5,r6,r7,r7a,all)
return(mylist)
}
regressit(model,data,level)
## [[1]]
##
## Call:
## lm(formula = model, y = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5293 -2.1550 0.3615 2.4377 6.4179
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.28391 0.87438 9.474 1.44e-12 ***
## cars$dist 0.16557 0.01749 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
##
##
## [[2]]
## Analysis of Variance Table
##
## Response: cars$speed
## Df Sum Sq Mean Sq F value Pr(>F)
## cars$dist 1 891.98 891.98 89.567 1.49e-12 ***
## Residuals 48 478.02 9.96
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[3]]
## (Intercept) cars$dist
## 8.2839056 0.1655676
##
## [[4]]
## 1 2 3 4 5 6
## -4.61504079 -5.93958139 -1.94617594 -4.92639228 -2.93298684 -0.93958139
## 7 8 9 10 11 12
## -1.26412199 -2.58866258 -3.91320318 -0.09855441 -1.91979773 1.39814831
## 13 14 15 16 17 18
## 0.40474287 -0.25752743 -0.91979773 0.41133742 -0.91320318 -0.91320318
## 19 20 21 22 23 24
## -2.90001408 1.41133742 -0.24433833 -4.21796012 -7.52931161 3.40474287
## 25 26 27 28 29 30
## 2.41133742 -2.22455467 2.41793197 1.09339137 3.41793197 2.09339137
## 31 32 33 34 35 36
## 0.43771563 2.76225622 0.44431018 -2.86704131 -4.19158191 4.75566167
## 37 38 39 40 41 42
## 3.09998592 -0.54250072 6.41793197 3.76885078 3.10658048 2.44431018
## 43 44 45 46 47 48
## 1.11976958 2.78863443 5.77544533 4.12636413 0.48387749 0.31830992
## 49 50
## -4.15201460 2.64285051
##
## [[5]]
##
## Shapiro-Wilk normality test
##
## data: r4
## W = 0.98745, p-value = 0.8696
##
##
## [[6]]
##
## studentized Breusch-Pagan test
##
## data: myreg
## BP = 0.71522, df = 1, p-value = 0.3977
##
##
## [[7]]
##
## Durbin-Watson test
##
## data: myreg
## DW = 1.1949, p-value = 0.0009159
## alternative hypothesis: true autocorrelation is greater than 0
##
##
## [[8]]
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.5456416, Df = 1, p = 0.4601
##
## [[9]]
## ME MAD RMSE MPE MAPE
## -1.207194e-16 2.518286e+00 3.091994e+00 -7.497101e+00 2.095004e+01
## AIC BIC
## 2.607755e+02 2.665115e+02
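The residual checks above are reassuring on two of three counts: the Shapiro-Wilk test (p = 0.87) is consistent with normally distributed residuals, and the Breusch-Pagan and non-constant-variance tests (p = 0.40 and p = 0.46) show no evidence of heteroscedasticity, but the Durbin-Watson test (p < 0.001) suggests positive autocorrelation among the residuals. As a hedged follow-up sketch, base R's built-in diagnostic plots give a visual complement to these tests (this refits the same model in data-argument form):
fit <- lm(speed ~ dist, data = cars)
par(mfrow = c(2, 2)) # 2 x 2 grid of diagnostic panels
plot(fit) # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1)) # restore the plotting layout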