Assignment 11

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)

require(car)
require(lmtest)
require(Hmisc)
require(ResourceSelection)

#sample data
cars <- datasets::cars
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
#specify model, data, and level
model=cars$speed~cars$dist
data=cars
level=.95
cars %>% ggplot(aes(x=dist,y=speed)) + geom_point() +stat_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

I used a function shared by Professor Fulton on this week’s discussion board for model evaluation. Based off the resultant charts and analysis, we can see that the adjusted \(R^2\) is .6438 - showing how distance explains 64% of the variance in stopping speed. With a p-value < .01, we can conclude there is also statistical significance in the model.

#function
regressit=function(model,data,level) #model is the functional relationship among variables, , l is level, forecast is vector or matrix to forecast
    {
    kdepairs(data) #scatterplot
    myreg=lm(model,y=TRUE)    #runregression
    confint(myreg, level=level) #confidenceintervals
    r1=summary(myreg) #regression summary
    r2=anova(myreg) #ANOVA tableforregression
    r3=coefficients(myreg) #coefficients
    r4=myreg$residuals #residuals
    r5=shapiro.test(r4) #test for normality of residuals
    r6=bptest(myreg)  #Breusch-Pagan-Godfrey-Test for homoscedasticity, lmtest
    r7a=ncvTest(myreg) #non-constant variance test for homoscedasticity
    r7=dwtest(myreg)  #independence of residuals  #test for independence of errors
    #r8=vif(myreg)  #look at collinearity, car package
    me=mean(myreg$residuals)
    mad=mean(abs(myreg$residuals))
    mse=mean(myreg$residuals^2)
    rmse=sqrt(mse)
    mpe= 100*mean(myreg$residuals / myreg$y)
    mape=100*mean(abs(myreg$residuals) / myreg$y)
    aic=AIC(myreg)
    bic=BIC(myreg)
    all=c(me,mad,rmse,mpe,mape,aic,bic)
    names(all)=c("ME","MAD","RMSE","MPE","MAPE", "AIC","BIC")
    mybarplot=barplot(all)
    mylist=list(r1,r2,r3,r4,r5,r6,r7,r7a,all)
    return(mylist)
    }

regressit(model,data,level)

## [[1]]
## 
## Call:
## lm(formula = model, y = TRUE)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5293 -2.1550  0.3615  2.4377  6.4179 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## cars$dist    0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
## 
## 
## [[2]]
## Analysis of Variance Table
## 
## Response: cars$speed
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## cars$dist  1 891.98  891.98  89.567 1.49e-12 ***
## Residuals 48 478.02    9.96                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[3]]
## (Intercept)   cars$dist 
##   8.2839056   0.1655676 
## 
## [[4]]
##           1           2           3           4           5           6 
## -4.61504079 -5.93958139 -1.94617594 -4.92639228 -2.93298684 -0.93958139 
##           7           8           9          10          11          12 
## -1.26412199 -2.58866258 -3.91320318 -0.09855441 -1.91979773  1.39814831 
##          13          14          15          16          17          18 
##  0.40474287 -0.25752743 -0.91979773  0.41133742 -0.91320318 -0.91320318 
##          19          20          21          22          23          24 
## -2.90001408  1.41133742 -0.24433833 -4.21796012 -7.52931161  3.40474287 
##          25          26          27          28          29          30 
##  2.41133742 -2.22455467  2.41793197  1.09339137  3.41793197  2.09339137 
##          31          32          33          34          35          36 
##  0.43771563  2.76225622  0.44431018 -2.86704131 -4.19158191  4.75566167 
##          37          38          39          40          41          42 
##  3.09998592 -0.54250072  6.41793197  3.76885078  3.10658048  2.44431018 
##          43          44          45          46          47          48 
##  1.11976958  2.78863443  5.77544533  4.12636413  0.48387749  0.31830992 
##          49          50 
## -4.15201460  2.64285051 
## 
## [[5]]
## 
##  Shapiro-Wilk normality test
## 
## data:  r4
## W = 0.98745, p-value = 0.8696
## 
## 
## [[6]]
## 
##  studentized Breusch-Pagan test
## 
## data:  myreg
## BP = 0.71522, df = 1, p-value = 0.3977
## 
## 
## [[7]]
## 
##  Durbin-Watson test
## 
## data:  myreg
## DW = 1.1949, p-value = 0.0009159
## alternative hypothesis: true autocorrelation is greater than 0
## 
## 
## [[8]]
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.5456416, Df = 1, p = 0.4601
## 
## [[9]]
##            ME           MAD          RMSE           MPE          MAPE 
## -1.207194e-16  2.518286e+00  3.091994e+00 -7.497101e+00  2.095004e+01 
##           AIC           BIC 
##  2.607755e+02  2.665115e+02