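library(ISLR)
# Read the Auto data; "?" marks missing values in the CSV
auto<-read.csv("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.csv",
header=TRUE,
na.strings = "?")
# Drop rows with missing values
auto<-na.omit(auto)
# Drop columns 8 and 9 (origin and name), leaving only the numeric predictors
auto<-auto[, -c(8:9)]
attach(auto)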
modInt1<-lm(mpg~year*weight, data=auto)
summary(modInt1)
##
## Call:
## lm(formula = mpg ~ year * weight, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0397 -1.9956 -0.0983 1.6525 12.9896
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.105e+02 1.295e+01 -8.531 3.30e-16 ***
## year 2.040e+00 1.718e-01 11.876 < 2e-16 ***
## weight 2.755e-02 4.413e-03 6.242 1.14e-09 ***
## year:weight -4.579e-04 5.907e-05 -7.752 8.02e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.193 on 388 degrees of freedom
## Multiple R-squared: 0.8339, Adjusted R-squared: 0.8326
## F-statistic: 649.3 on 3 and 388 DF, p-value: < 2.2e-16
With a p-value below 0.05 (8.02e-14), the interaction between year and weight (year:weight) appears to be statistically significant.
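One way to read the interaction term is that the slope of mpg on weight changes with year. The sketch below hard-codes the rounded coefficients from the summary above instead of refitting the model. The helper name `weight_slope` and the years 70, 76 and 82 are illustrative choices, not part of the original analysis.

```r
# Rounded coefficients copied from summary(modInt1) above
b_weight <- 2.755e-02
b_inter  <- -4.579e-04

# Hypothetical helper: slope of mpg with respect to weight at a given year
weight_slope <- function(year) b_weight + b_inter * year

# Illustrative years (assumed values, chosen only for the example)
slopes <- weight_slope(c(70, 76, 82))
print(slopes)
```

With these rounded coefficients the weight slope is negative at each of the three years and becomes steeper as year increases.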
Other possible interactions:
modInt2<-lm(mpg~year*cylinders, data=auto)
summary(modInt2)
##
## Call:
## lm(formula = mpg ~ year * cylinders, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2164 -2.5792 -0.1558 2.2569 15.2532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -61.61775 15.10277 -4.080 5.47e-05 ***
## year 1.34054 0.19909 6.733 5.99e-11 ***
## cylinders 5.51044 2.73705 2.013 0.04478 *
## year:cylinders -0.11350 0.03647 -3.112 0.00199 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.131 on 388 degrees of freedom
## Multiple R-squared: 0.722, Adjusted R-squared: 0.7199
## F-statistic: 335.9 on 3 and 388 DF, p-value: < 2.2e-16
With a p-value below 0.05 (0.00199), the interaction between year and cylinders (year:cylinders) appears to be statistically significant.
modInt3<-lm(mpg~weight*cylinders, data=auto)
summary(modInt3)
##
## Call:
## lm(formula = mpg ~ weight * cylinders, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.4916 -2.6225 -0.3927 1.7794 16.7087
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.3864559 3.7333137 17.514 < 2e-16 ***
## weight -0.0128348 0.0013628 -9.418 < 2e-16 ***
## cylinders -4.2097950 0.7238315 -5.816 1.26e-08 ***
## weight:cylinders 0.0010979 0.0002101 5.226 2.83e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.165 on 388 degrees of freedom
## Multiple R-squared: 0.7174, Adjusted R-squared: 0.7152
## F-statistic: 328.3 on 3 and 388 DF, p-value: < 2.2e-16
With a p-value below 0.05 (2.83e-07), the interaction between weight and cylinders (weight:cylinders) appears to be statistically significant.
X^2 transformation, variable = weight
mod1<-lm(mpg~weight+I(weight^2), data=auto)
summary(mod1)
##
## Call:
## lm(formula = mpg ~ weight + I(weight^2), data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6246 -2.7134 -0.3485 1.8267 16.0866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.226e+01 2.993e+00 20.800 < 2e-16 ***
## weight -1.850e-02 1.972e-03 -9.379 < 2e-16 ***
## I(weight^2) 1.697e-06 3.059e-07 5.545 5.43e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.176 on 389 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.7137
## F-statistic: 488.3 on 2 and 389 DF, p-value: < 2.2e-16
plot(mod1$fitted.values, mod1$residuals, pch=16)
abline(h=0, col="blue")
plot(mod1)
The residual plot for the X^2 transformation of the weight variable shows no clear pattern, with the residuals spread roughly evenly around 0. The Q-Q plot, however, shows the upper tail straying from normality.
sqrt(X) transformation, variable = weight
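A visual Q-Q check can be supplemented with a numerical one. The sketch below is not part of the original analysis: it uses the built-in `mtcars` data as a stand-in (so it runs without downloading Auto.csv), fits the same kind of quadratic model, and applies a Shapiro-Wilk test plus a simple skewness estimate to the residuals. The variable names are assumptions made for the example.

```r
# Stand-in data: mtcars ships with base R; it is NOT the Auto data used above
fit <- lm(mpg ~ wt + I(wt^2), data = mtcars)
res <- residuals(fit)

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(res)

# Sample skewness: a positive value points to a longer upper tail
skew <- mean((res - mean(res))^3) / sd(res)^3

print(sw)
print(skew)
```

The same three lines could be applied to `residuals(mod1)` once the Auto data has been loaded.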
mod2<-lm(mpg~weight+I(weight^(1/2)), data=auto)
summary(mod2)
##
## Call:
## lm(formula = mpg ~ weight + I(weight^(1/2)), data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.5660 -2.6552 -0.4161 1.7373 16.1001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 109.218284 11.573797 9.437 < 2e-16 ***
## weight 0.013191 0.003828 3.446 0.000631 ***
## I(weight^(1/2)) -2.314535 0.424250 -5.456 8.7e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.181 on 389 degrees of freedom
## Multiple R-squared: 0.7145, Adjusted R-squared: 0.713
## F-statistic: 486.7 on 2 and 389 DF, p-value: < 2.2e-16
plot(mod2$fitted.values, mod2$residuals, pch=16)
abline(h=0, col="blue")
plot(mod2)
The residual plot for the X^(1/2) transformation of the weight variable shows no clear pattern (perhaps slight fanning), with the residuals spread roughly evenly around 0. The Q-Q plot, however, shows the upper tail straying from normality.
data(Carseats)
names(Carseats)
## [1] "Sales" "CompPrice" "Income" "Advertising" "Population"
## [6] "Price" "ShelveLoc" "Age" "Education" "Urban"
## [11] "US"
head(Carseats)
## Sales CompPrice Income Advertising Population Price ShelveLoc Age
## 1 9.50 138 73 11 276 120 Bad 42
## 2 11.22 111 48 16 260 83 Good 65
## 3 10.06 113 35 10 269 80 Medium 59
## 4 7.40 117 100 4 466 97 Medium 55
## 5 4.15 141 64 3 340 128 Bad 38
## 6 10.81 124 113 13 501 72 Bad 78
## Education Urban US
## 1 17 Yes Yes
## 2 10 Yes Yes
## 3 12 Yes Yes
## 4 14 Yes Yes
## 5 13 Yes No
## 6 16 No Yes
CarS1<-lm(Sales~Price+Urban+US, data=Carseats)
summary(CarS1)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
Price, -0.054459: for every one-unit increase in Price, you would expect Sales to decrease by 0.054459, holding Urban and US fixed.
UrbanYes, -0.021916: shift in the intercept when Urban = Yes, relative to Urban = No
USYes, 1.200573: shift in the intercept when US = Yes, relative to US = No
With y = Sales and x = Price:
UrbanNo, USNo (ref): y = -0.054459x + 13.043469
UrbanYes, USYes: y = -0.054459x + (13.043469 - 0.021916 + 1.200573)
UrbanYes, USNo: y = -0.054459x + (13.043469 - 0.021916)
UrbanNo, USYes: y = -0.054459x + (13.043469 + 1.200573)
note: the slope does not change across groups in this example because the model includes no interactions between the explanatory variables
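The group intercepts can be worked out directly from the coefficient table. This is a minimal sketch that hard-codes the estimates from the summary above; the 0/1 coding and the object name `groups` are assumptions made for the example.

```r
# Coefficients copied from summary(CarS1) above
b0      <- 13.043469
b_urban <- -0.021916
b_us    <- 1.200573

# All four Urban/US combinations (1 = Yes, 0 = No)
groups <- expand.grid(Urban = c(0, 1), US = c(0, 1))

# Intercept for each group; the Price slope (-0.054459) is shared by all
groups$intercept <- b0 + b_urban * groups$Urban + b_us * groups$US
print(groups)
```

The four intercepts are 13.043469 (reference), 13.021553 (Urban only), 14.244042 (US only) and 14.222126 (both).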
summary(CarS1)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
You can reject the null hypothesis for the predictors Price and US (USYes), as both are statistically significant with small p-values (less than 0.05). There is no evidence to reject it for UrbanYes (p = 0.936).
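The p-values in the table come from a t-test on each coefficient. As a rough check, the sketch below recomputes the t statistic and two-sided p-value for UrbanYes from the printed estimate and standard error; the variable names are illustrative.

```r
# Values copied from the UrbanYes row of summary(CarS1) above
est <- -0.021916
se  <- 0.271650
df  <- 396   # residual degrees of freedom reported in the summary

t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

print(c(t = t_stat, p = p_val))
```

Up to rounding, this reproduces the t value of -0.081 and the p-value of 0.936 shown in the output.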
CarSmallPrice<-lm(Sales~Price*US, data = Carseats)
summary(CarSmallPrice)
##
## Call:
## lm(formula = Sales ~ Price * US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9299 -1.6375 -0.0492 1.5765 7.0430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.974798 0.953079 13.614 < 2e-16 ***
## Price -0.053986 0.008163 -6.613 1.22e-10 ***
## USYes 1.295775 1.252146 1.035 0.301
## Price:USYes -0.000835 0.010641 -0.078 0.937
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
plot(CarS1)
plot(CarSmallPrice)
When checking the diagnostic plots for the models in parts a) and e), both show a residual plot with no pattern and residuals at similar distances from 0. Both Q-Q plots show very little deviation from normality, and the Cook’s distance plot highlights no outliers that carry too much leverage on the model fit.
confint(CarSmallPrice)
## 2.5 % 97.5 %
## (Intercept) 11.10107096 14.84852478
## Price -0.07003516 -0.03793731
## USYes -1.16590989 3.75745964
## Price:USYes -0.02175564 0.02008563
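For a linear model, `confint()` returns the estimate plus or minus a t quantile times the standard error. The sketch below rebuilds the 95% interval for Price by hand from the printed estimate and standard error; the variable names are assumptions made for the example.

```r
# Values copied from the Price row of summary(CarSmallPrice) above
est <- -0.053986
se  <- 0.008163
df  <- 396   # residual degrees of freedom

# 97.5% quantile of the t distribution gives a two-sided 95% interval
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
print(ci)
```

Up to rounding, this matches the Price row of the `confint()` output. The interval for Price:USYes in that output spans 0, in line with its p-value of 0.937.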