This question should be answered using the Carseats data set.
library(ISLR)
attach(Carseats)
(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
fit <- lm(Sales ~ Price + Urban + US)
summary(fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
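(b) Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative!
From the table in part (a), Price and US are statistically significant predictors of Sales, while Urban is not. Sales is recorded in thousands of units per store and Price in dollars, so, holding the other predictors fixed, each $1 increase in price is associated with a drop of about 54.5 units sold (0.0545 thousand). Stores in the US sell about 1,201 more units on average than stores outside the US. The UrbanYes coefficient (about 22 fewer units for urban stores) has a p-value of 0.936, so there is no evidence that Sales differ between urban and non-urban locations.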
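(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
\(Sales = 13.043469 - 0.054459 \times Price - 0.021916 \times Urban_{Yes} + 1.200573 \times US_{Yes}\), where \(Urban_{Yes}\) equals 1 for a store in an urban location and 0 otherwise, and \(US_{Yes}\) equals 1 for a store in the US and 0 otherwise.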
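As a rough illustration of how the equation is used, the sketch below hard-codes the coefficients printed in part (a) into a small helper. The function name `predict_sales` and the example price of 100 are made up for illustration; Sales is in thousands of units, as documented for the Carseats data.

```r
# Hypothetical helper: evaluates the fitted equation from part (a) by hand.
# urban and us are "Yes"/"No"; the comparisons turn them into 0/1 dummies.
predict_sales <- function(price, urban, us) {
  13.043469 -
    0.054459 * price -
    0.021916 * (urban == "Yes") +
    1.200573 * (us == "Yes")
}

# Predicted sales (thousands of units) for an urban US store charging $100.
pred <- predict_sales(price = 100, urban = "Yes", us = "Yes")
pred

# Effect of a $1 price increase, converted from thousands of units to units.
price_effect_units <- 1000 * (predict_sales(101, "Yes", "Yes") - pred)
price_effect_units
```

With the model object still in memory, `predict(fit, newdata = ...)` should give the same kind of prediction without retyping the coefficients.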
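(d) For which of the predictors can you reject the null hypothesis \(H_{0}: \beta_j = 0\)?
Price and US. Both have p-values below 0.001 in the part (a) summary, whereas Urban's p-value of 0.936 gives no grounds to reject the null hypothesis.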
(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
fit2 <- lm(Sales ~ Price + US)
summary(fit2)
##
## Call:
## lm(formula = Sales ~ Price + US)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9269 -1.6286 -0.0574  1.5766  7.0515
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
(f) How well do the models in (a) and (e) fit the data?
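Not well. Both models have \(R^2 = 0.2393\), so each explains only about 24% of the variance in Sales. The smaller model in (e) fits essentially as well with one fewer predictor: its adjusted \(R^2\) is slightly higher (0.2354 versus 0.2335) and its residual standard error is slightly lower (2.469 versus 2.472).
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).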
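To see why the adjusted \(R^2\) values differ even though both summaries print the same \(R^2\), the sketch below recomputes them from the printed figures. The helper name `adj_r2` is made up for illustration, and because it starts from the rounded \(R^2\) of 0.2393, the results should match the summaries only up to rounding.

```r
# Adjusted R-squared from R-squared, sample size n, and number of
# predictors p (the intercept is not counted in p).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n  <- 400     # 396 residual df + 4 coefficients in the model from (a)
r2 <- 0.2393  # rounded value printed in both summaries

adj_full  <- adj_r2(r2, n, p = 3)  # model (a): Price, Urban, US
adj_small <- adj_r2(r2, n, p = 2)  # model (e): Price, US

round(c(full = adj_full, small = adj_small), 4)
```

Dropping Urban barely changes \(R^2\), so the smaller penalty for two predictors instead of three is what lifts the adjusted value for the model in (e).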
confint(fit2)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632
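As a sanity check on where these numbers come from, the sketch below rebuilds the USYes interval by hand as estimate ± t critical value × standard error, using the estimate, standard error, and residual degrees of freedom printed in the summary for the model in (e). The variable names are illustrative only.

```r
# Values copied from the summary of the model in (e).
est <- 1.19964   # USYes estimate
se  <- 0.25846   # USYes standard error
df  <- 397       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t_{0.975, df} * SE.
t_crit <- qt(0.975, df)
ci_us  <- est + c(-1, 1) * t_crit * se
ci_us
```

The result should agree with the USYes row above to about four decimal places. Neither the Price interval nor the USYes interval contains zero, which is consistent with the small p-values for both coefficients.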
(h) Is there evidence of outliers or high leverage observations in the model from (e)?
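One possible way to approach this, sketched under assumptions: flag observations whose studentized residuals exceed 3 in absolute value as potential outliers, and observations whose leverage exceeds twice the average leverage \((p+1)/n\) as high leverage. Both cutoffs are common rules of thumb rather than fixed standards, and the helper name `flag_unusual` is made up for illustration.

```r
# Hypothetical helper: lists potential outliers and high-leverage points
# for any fitted lm object. The cutoffs are rule-of-thumb assumptions.
flag_unusual <- function(model, resid_cut = 3, lev_mult = 2) {
  r <- rstudent(model)    # studentized residuals
  h <- hatvalues(model)   # leverage statistics
  avg_h <- length(coef(model)) / nobs(model)  # average leverage (p + 1) / n
  list(
    outliers      = which(abs(r) > resid_cut),
    high_leverage = which(h > lev_mult * avg_h),
    avg_leverage  = avg_h
  )
}
```

Calling `flag_unusual(fit2)` would return the indices of any flagged observations, and `par(mfrow = c(2, 2)); plot(fit2)` draws the standard diagnostic plots, including residuals versus leverage, for a visual check.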