(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

library('ISLR')
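# Pull the response and the predictors out of the Carseats data frame
sales = Carseats$Sales
price = Carseats$Price
urban = Carseats$Urban
us = Carseats$US
# Fit the multiple regression; Urban and US are Yes/No factors
model = lm(sales ~ price + urban + us)
# Standard diagnostic plots for the fit
plot(model)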

(b) Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative!

Holding all variables except for price constant, we have

\[ \hat{sales}_{n+1} - \hat{sales}_{n} = \hat{\beta_0}(1-1)+\hat{\beta_1}(X_{1_{n+1}}-X_{1_{n}}) + \hat{\beta_2}(X_2-X_2)+\hat{\beta_3}(X_3-X_3) \]

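\[ \hat{sales}_{n+1} - \hat{sales}_{n} = \hat{\beta_1} = -0.05445885 \] for a $1 difference in price (all numbers are expressed in thousands). Thus, increasing the price by $1 is associated with approximately 54.46 fewer units sold on average, holding Urban and US fixed. Taking a similar approach to the other variables, we see that stores in urban locations sell about 21.92 fewer units than rural stores (a difference that is not statistically significant: p = 0.936 in the summary output of part (d)), and stores in the US sell about 1,201 more units than stores outside the US.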
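
The unit conversions can be checked with a short sketch. The coefficient values are copied from the summary output in part (d). The vector names `coefs` and `units` are only illustrative, and the factor of 1000 assumes Sales is recorded in thousands of units.

```r
# Coefficient estimates copied from the summary(model) output (thousands of units)
coefs = c(price = -0.054459, urbanYes = -0.021916, usYes = 1.200573)

# Convert from thousands of units to units
units = coefs * 1000
round(units, 2)
```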

(c) Write out the model in equation form, being careful to handle the qualitative variables properly.

\[ \hat{sales} = \hat{\beta_0} + \hat{\beta_1}(Price) + \hat{\beta_2}(Urban) + \hat{\beta_3}(US) \] where (values are in thousands of units and rounded to three decimal places) \[ \hat{\beta_0} = 13.043 \\ \hat{\beta_1} = -0.054 \\ \hat{\beta_2} = -0.022 \\ \hat{\beta_3} = 1.201 \] Price is the price of a car seat, Urban = 1 if the store is in an urban location and 0 if not, and US = 1 if the store is in the US and 0 if not.
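
To make the role of the 0/1 indicators concrete, here is a minimal sketch that evaluates the equation by hand. The function `predict_sales` is a hypothetical helper, not part of the original solution, and it uses the rounded coefficients, so its results will differ slightly from `predict(model, ...)`.

```r
# Hypothetical helper: predicted sales (in thousands of units) from the
# rounded coefficients of the fitted equation
predict_sales = function(price, urban, us) {
  # urban and us are 0/1 indicator values
  13.043 - 0.054 * price - 0.022 * urban + 1.201 * us
}

# Example: a $100 car seat at an urban US store vs. a rural non-US store
predict_sales(price = 100, urban = 1, us = 1)
predict_sales(price = 100, urban = 0, us = 0)
```

Setting an indicator to 0 simply drops its coefficient from the sum, so the qualitative variables shift the intercept without changing the slope on Price.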

(d) For which of the predictors can you reject the null hypothesis H0: βj = 0?

summary(model)
## 
## Call:
## lm(formula = sales ~ price + urban + us)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## price       -0.054459   0.005242 -10.389  < 2e-16 ***
## urbanYes    -0.021916   0.271650  -0.081    0.936    
## usYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

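The p-values for Price and US are both far below 0.05, so we reject the null hypothesis for these two predictors. The p-value for Urban is 0.936, so there is not enough evidence to reject the null hypothesis \[H_0: \beta_2 = 0 \]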
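
The test for each predictor can be reproduced from the estimates and standard errors printed in the summary output. This is a sketch with the values copied by hand; the names `est`, `se`, `t_stat`, and `p_val` are illustrative, and the 5% level is an assumed significance threshold.

```r
# Estimates and standard errors copied from the summary(model) output
est = c(price = -0.054459, urbanYes = -0.021916, usYes = 1.200573)
se  = c(price =  0.005242, urbanYes =  0.271650, usYes = 0.259042)

# t statistic for H0: beta_j = 0, with 396 residual degrees of freedom
t_stat = est / se
p_val  = 2 * pt(-abs(t_stat), df = 396)

# TRUE where H0 is rejected at the 5% level
p_val < 0.05
```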

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

modelNew = lm(sales ~ price + us)
summary(modelNew)
## 
## Call:
## lm(formula = sales ~ price + us)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## price       -0.05448    0.00523 -10.416  < 2e-16 ***
## usYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

Our new model is

\[ \hat{sales} = \hat{\beta_0} + \hat{\beta_1}(Price) + \hat{\beta_2}(US) \]

where \[ \hat{\beta_0} = 13.031 \\ \hat{\beta_1} = -0.054 \\ \hat{\beta_2} = 1.200\]

(f) How well do the models in (a) and (e) fit the data?

R2_old = summary(model)$r.squared
R2_new = summary(modelNew)$r.squared
deltaR = R2_old - R2_new
deltaR
## [1] 1.250376e-05

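The two models fit essentially equally well: their coefficients of determination differ by only about 1.25e-05, and each explains roughly 24% of the variance in Sales. The smaller model has a slightly higher adjusted R-squared (0.2354 vs. 0.2335) and a slightly lower residual standard error (2.469 vs. 2.472).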
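
One way to judge whether a difference in R-squared this small matters is a partial F-test for the dropped predictor. The sketch below computes it by hand from the printed values. The variable names are illustrative, and the R-squared of the larger model is taken from the rounded summary output, so the result is approximate.

```r
# Values copied from the output above
R2_full = 0.2393        # Multiple R-squared of the model in (a)
delta   = 1.250376e-05  # R2_old - R2_new
df_res  = 396           # residual degrees of freedom of the model in (a)
q       = 1             # number of predictors dropped (Urban)

# Partial F statistic for dropping q predictors from the larger model
F_stat = (delta / q) / ((1 - R2_full) / df_res)
p_F    = pf(F_stat, q, df_res, lower.tail = FALSE)
c(F = F_stat, p = p_F)
```

With a single dropped predictor, this F statistic equals the square of that predictor's t statistic, so it should agree with the Urban row of the summary in part (d) up to rounding. With the fitted objects available, `anova(modelNew, model)` performs the same comparison directly.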

(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

confint(modelNew)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## price       -0.06475984 -0.04419543
## usYes        0.69151957  1.70776632
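
These intervals follow the usual form, estimate plus or minus a t critical value times the standard error. A minimal sketch reproducing them from the summary output of part (e), with illustrative variable names and hand-copied (rounded) inputs:

```r
# Estimates and standard errors copied from the summary(modelNew) output
est = c(intercept = 13.03079, price = -0.05448, usYes = 1.19964)
se  = c(intercept =  0.63098, price =  0.00523, usYes = 0.25846)

# 97.5th percentile of the t distribution with 397 residual degrees of freedom
t_crit = qt(0.975, df = 397)

# Lower and upper 95% limits
ci = cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```

None of the three intervals contains zero, which is consistent with the small p-values in the summary of the reduced model.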

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

plot(modelNew)

There do not appear to be any outliers that significantly deviate from the model.
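
The diagnostic plots can be supplemented with a numerical check. The sketch below defines a hypothetical helper, `flag_unusual`, that lists observations with large studentized residuals or high leverage. The cutoffs (absolute studentized residual above 3, leverage above twice the average value (p + 1)/n) are common rules of thumb chosen here as assumptions, not thresholds taken from the original solution.

```r
# Hypothetical helper: flag possible outliers and high-leverage points in an lm fit
flag_unusual = function(fit, resid_cut = 3, lev_mult = 2) {
  r = rstudent(fit)                         # studentized residuals
  h = hatvalues(fit)                        # leverage statistics
  avg_lev = length(coef(fit)) / length(h)   # average leverage, (p + 1) / n
  list(
    outliers      = which(abs(r) > resid_cut),
    high_leverage = which(h > lev_mult * avg_lev)
  )
}

# Usage with the model from (e):
# flag_unusual(modelNew)
```

An empty `outliers` element would support the conclusion above, while any indices in `high_leverage` would be worth inspecting individually.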