ISLR - Chapter 3 Linear Regression: Applied Q10: More on MLR

Q10(a) Fit an MLR model on the “Carseats” data set to predict “Sales” using “Price”, “Urban”, and “US”.

library(ISLR)

## Warning: package 'ISLR' was built under R version 3.3.2

attach(Carseats)
lm.fit = lm(Sales ~ Price + Urban + US)
summary(lm.fit)

## 
## Call:
## lm(formula = Sales ~ Price + Urban + US)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

(b) Provide an interpretation of each coefficient in the model.

“Price” variable: Holding the other predictors fixed, a price increase of 1 dollar is associated with an average decrease of about 54.5 units in sales (“Sales” is recorded in thousands of units, so the coefficient -0.0545 corresponds to 0.0545 × 1000 units).
“Urban” variable: Holding the other predictors fixed, unit sales in urban locations are on average about 21.9 units lower than in rural locations, but this difference is not statistically significant (p-value 0.936).
“US” variable: Holding the other predictors fixed, unit sales in a US store are on average about 1200.6 units higher than in a non-US store.
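The conversion from coefficients to unit counts can be checked with a little arithmetic. The sketch below copies the rounded estimates from the summary output and assumes, as in the “Carseats” documentation, that “Sales” is measured in thousands of units. The names `coefs` and `unit_effects` are only illustrative.

```r
# Estimates copied (rounded) from the summary(lm.fit) output
coefs <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)

# Sales is in thousands of units, so scale by 1000 to get unit counts
unit_effects <- coefs * 1000

print(round(unit_effects, 1))
```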

(c) Write out the model in equation form, being careful to handle the qualitative variables properly.

The model may be written as \[Sales = 13.0434689 + (-0.0544588)\times Price + (-0.0219162)\times Urban + (1.2005727)\times US + \varepsilon\] with \(Urban = 1\) if the store is in an urban location and \(0\) if not, and \(US = 1\) if the store is in the US and \(0\) if not.
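To see how the dummy variables enter the equation, the fitted model can be evaluated by hand. This is a minimal sketch using the rounded coefficients from the summary output. The function name `predict_sales` and the example store are hypothetical and not part of the original analysis.

```r
# Fitted equation from (c), with coefficients rounded as in the summary output
predict_sales <- function(price, urban, us) {
  # urban and us are logical: TRUE maps to the dummy value 1, FALSE to 0
  13.043469 - 0.054459 * price - 0.021916 * as.numeric(urban) +
    1.200573 * as.numeric(us)
}

# Hypothetical store: price of 100 dollars, urban location, in the US
# The result is in thousands of units, like Sales itself
predict_sales(price = 100, urban = TRUE, us = TRUE)
```

With the model object available, `predict(lm.fit, newdata = ...)` should give the same value up to rounding of the coefficients.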

(d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?

We can reject the null hypothesis for the “Price” and “US” variables, whose p-values are far below 0.05. We cannot reject it for “Urban”, whose p-value is 0.936.
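The decision can be reproduced from the t statistics alone. The sketch below recomputes two-sided p-values from the rounded t values and the 396 residual degrees of freedom reported in the summary of the model from (a). The 5% level is an assumed threshold, since the exercise does not fix one.

```r
# t statistics and residual degrees of freedom from summary(lm.fit)
t_values <- c(Price = -10.389, UrbanYes = -0.081, USYes = 4.635)
df_resid <- 396

# Two-sided p-value: probability of a |t| at least this large under H0
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)

# TRUE where H0: beta_j = 0 is rejected at the (assumed) 5% level
print(p_values < 0.05)
```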

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

lm.fit2 = lm(Sales ~ Price + US)
summary(lm.fit2)

## 
## Call:
## lm(formula = Sales ~ Price + US)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

(f) How well do the models in (a) and (e) fit the data?

Based on the RSE and \(R^2\), both models fit the data similarly, with the model from (e) fitting marginally better: its RSE is 2.469 against 2.472 for (a), and its adjusted \(R^2\) is 0.2354 against 0.2335. Both models explain about 23.9% of the variability in “Sales”, so most of the variability remains unexplained.
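The two models share the same multiple \(R^2\), so the difference in adjusted \(R^2\) comes only from the penalty for the extra predictor. A small sketch of the standard adjustment formula, using the rounded \(R^2\) of 0.2393 and \(n = 400\) (396 residual degrees of freedom plus 4 coefficients in the first model), illustrates this. The helper name `adj_r2` is hypothetical, and the results match the printed values only up to the rounding of \(R^2\).

```r
# Adjusted R^2 for a model with p predictors fitted on n observations
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 400
adj_a <- adj_r2(0.2393, n, p = 3)  # model from (a): Price, Urban, US
adj_e <- adj_r2(0.2393, n, p = 2)  # model from (e): Price, US

print(c(model_a = adj_a, model_e = adj_e))
```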

(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

confint(lm.fit2)

##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632
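These intervals follow the usual form: estimate plus or minus a t quantile times the standard error. As a check, the sketch below rebuilds them from the rounded estimates and standard errors in the summary of the model from (e), with 397 residual degrees of freedom. Small differences from the `confint` output in the last digits are expected because of rounding.

```r
# Estimates and standard errors copied from summary(lm.fit2)
est <- c(`(Intercept)` = 13.03079, Price = -0.05448, USYes = 1.19964)
se  <- c(`(Intercept)` = 0.63098,  Price = 0.00523,  USYes = 0.25846)

# Critical value for a 95% two-sided interval with 397 degrees of freedom
t_crit <- qt(0.975, df = 397)

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```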

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

plot(predict(lm.fit2), rstudent(lm.fit2))

All studentized residuals appear to lie between -3 and 3, so the linear regression suggests no potential outliers.
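Reading the bounds off a plot can be complemented by a direct check. The helper below is a hypothetical sketch, not part of the original solution. It returns the indices of observations whose studentized residual exceeds a cutoff in absolute value, with 3 used as the rule-of-thumb cutoff.

```r
# Indices of observations whose studentized residual exceeds the
# cutoff in absolute value (3 is a common rule of thumb)
flag_outliers <- function(fit, cutoff = 3) {
  which(abs(rstudent(fit)) > cutoff)
}
```

Calling `flag_outliers(lm.fit2)` on the model from (e) would list any such observations. An empty result would be consistent with the plot.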

par(mfrow = c(2,2))
plot(lm.fit2)

However, in the Residuals vs Leverage panel some points have leverage well above the average value \((p + 1)/n = 3/400 = 0.0075\), which suggests that they are high leverage observations.
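The leverage comparison can also be made numerically. In the sketch below, the helper name `high_leverage` and the default multiple of 2 are assumptions. Twice or three times the average leverage are common rules of thumb, and the exercise does not prescribe a threshold.

```r
# Indices of observations whose leverage exceeds a multiple of the average.
# The average of the hat values equals (p + 1) / n for a full-rank fit.
high_leverage <- function(fit, multiple = 2) {
  h <- hatvalues(fit)
  which(h > multiple * mean(h))
}
```

For the model from (e), `high_leverage(lm.fit2)` would list the observations whose leverage is more than twice 0.0075.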

Chee Loong Lian

12/13/2017