Data 605 HW 11
library(knitr)
library(rmdformats)
## Global options
options(max.print="85")
opts_chunk$set(cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=31)
library(olsrr)##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
##
## Attaching package: 'MASS'
## The following object is masked from 'package:olsrr':
##
## cement
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)
Cars data summary:
speed dist
Min. : 4.0 Min. : 2.00
1st Qu.:12.0 1st Qu.: 26.00
Median :15.0 Median : 36.00
Mean :15.4 Mean : 42.98
3rd Qu.:19.0 3rd Qu.: 56.00
Max. :25.0 Max. :120.00
Visualize the relationsip between the response variable, stopping distance and speed.
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping Distance (ft)",
main = "Stopping Distance vs. Speed")Create a linear model
Call:
lm(formula = dist ~ speed, data = cars)
Residuals:
Min 1Q Median 3Q Max
-29.069 -9.525 -2.272 9.215 43.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791 6.7584 -2.601 0.0123 *
speed 3.9324 0.4155 9.464 1.49e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Visualizing the linear model:
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping Distance (ft)",
main = "Stopping Distance vs. Speed")
abline(lm.fit)It seems the linear model (lm.fit) was a decent fit. It has an adjusted R-squared of 0.64.
Let’s check for the assumptions.
hist <- hist(cars$dist, breaks = 10, col = "pink", xlab = "Distance", main = "Histogram with Normal Curve")
xfit <- seq(min(cars$dist), max(cars$dist), length = 40)
yfit <- dnorm(xfit, mean = mean(cars$dist), sd = sd(cars$dist))
yfit <- yfit * diff(hist$mids[1:2]) * length(cars$dist)
lines(xfit, yfit, col = "cadetblue1", lwd = 2)It doesn’t look like the distribution of dist forms a normal distribution.
Examining the Residuals
We don’t have normality in residuals.
Residuals don’t have constant variance as shown below.
Breusch Pagan Test for Heteroskedasticity
The test results show that we reject the null hypothesis. Therefore, the variance is not constant.
Breusch Pagan Test for Heteroskedasticity
-----------------------------------------
Ho: the variance is constant
Ha: the variance is not constant
Data
--------------------------------
Response : dist
Variables: fitted values of dist
Test Summary
-----------------------------
DF = 1
Chi2 = 4.650233
Prob > Chi2 = 0.03104933
[1] 0.2626263
Let’s do a transformation by raising distance to lambda.
Call:
lm(formula = dist^lambda ~ log(speed), data = cars)
Residuals:
Min 1Q Median 3Q Max
-0.45300 -0.17549 -0.01524 0.16402 0.60400
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02630 0.23456 0.112 0.911
log(speed) 0.95888 0.08708 11.011 9.86e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2529 on 48 degrees of freedom
Multiple R-squared: 0.7164, Adjusted R-squared: 0.7105
F-statistic: 121.2 on 1 and 48 DF, p-value: 9.856e-15
Adjust R-squared improves to .71.
Let’s examine the residuals.
Normality in residuals ?
Yes. It becomes normal.
Residuals do have constant variance as shown above.
Breusch Pagan Test for Heteroskedasticity
-----------------------------------------
Ho: the variance is constant
Ha: the variance is not constant
Data
---------------------------------------
Response : dist^lambda
Variables: fitted values of dist^lambda
Test Summary
----------------------------
DF = 1
Chi2 = 1.877362
Prob > Chi2 = 0.1706343
P-value is greater than .05. We don’t reject H0 hypothesis. We do have constant variance.
So a BoxCox transformation of lambda = 0.2626263 was a good choice. It all checks out.