library('ISLR')
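# Pull the response and the predictors out of the Carseats data set
sales = Carseats$Sales
price = Carseats$Price
urban = Carseats$Urban
us = Carseats$US
# Regress sales on price, urban, and us
model = lm(sales ~ price + urban + us)
# Draw the standard diagnostic plots for the fitted model
plot(model)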
Holding all variables except for price constant and raising the price by one unit, so that \(X_{1_{n+1}}-X_{1_{n}} = 1\), we have
\[ \hat{sales}_{n+1} - \hat{sales}_{n} = \hat{\beta_0}(1-1)+\hat{\beta_1}(X_{1_{n+1}}-X_{1_{n}}) + \hat{\beta_2}(X_2-X_2)+\hat{\beta_3}(X_3-X_3) \]
\[ \hat{sales}_{n+1} - \hat{sales}_{n} = \hat{\beta_1} = -0.05445885 \] (sales are expressed in thousands of units) Thus, increasing the price by $1 results in approximately 54.46 fewer units sold. Taking a similar approach to the other variables, we see that about 21.92 fewer units are sold in urban areas than in non-urban areas, and about 1,200.57 more units are sold in the US than in other countries.
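As a quick check of the unit conversion, the sketch below multiplies the fitted coefficients by 1,000, since sales are recorded in thousands of units. The coefficient values are copied from the model output; the names `coefs` and `units` are illustrative and not part of the original analysis.

```r
# Coefficients copied from the fitted model (sales are in thousands of units)
coefs <- c(price = -0.054459, urbanYes = -0.021916, usYes = 1.200573)

# Convert each effect from thousands of units to single units
units <- coefs * 1000
print(round(units, 2))
```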
\[ \hat{sales} = \hat{\beta_0} + \hat{\beta_1}(Price) + \hat{\beta_2}(Urban) + \hat{\beta_3}(US) \]
where (sales are in thousands of units and coefficients are rounded to three decimal places)
\[ \begin{aligned} \hat{\beta_0} &= 13.043 \\ \hat{\beta_1} &= -0.054 \\ \hat{\beta_2} &= -0.022 \\ \hat{\beta_3} &= 1.201 \end{aligned} \]
Price represents the price of a car seat, Urban = 1 if the store is in an urban area and 0 if not, and US = 1 if the store is in the US and 0 if not.
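To illustrate how the fitted equation is used, the following sketch predicts sales for a single store. The inputs (a price of 100, an urban location, a US store) are made-up values chosen only for illustration, and `predict_sales` is a hypothetical helper introduced here.

```r
# Rounded coefficients from the fitted equation
b0 <- 13.043; b1 <- -0.054; b2 <- -0.022; b3 <- 1.201

# Hypothetical helper: urban and us are 0/1 indicators
predict_sales <- function(price, urban, us) {
  b0 + b1 * price + b2 * urban + b3 * us
}

# Hypothetical store: price 100, urban, in the US (result in thousands of units)
pred <- predict_sales(price = 100, urban = 1, us = 1)
print(pred)
```

With the real model object, `predict(model, newdata = ...)` would do the same job without rounding the coefficients.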
summary(model)
##
## Call:
## lm(formula = sales ~ price + urban + us)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## price       -0.054459   0.005242 -10.389  < 2e-16 ***
## urbanYes    -0.021916   0.271650  -0.081    0.936
## usYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
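Judging by the p-values, there is enough evidence to reject the null hypothesis \[H_0: \beta_j = 0 \] for Price (\(\beta_1\)) and US (\(\beta_3\)), but not for Urban (\(\beta_2\)), whose p-value of 0.936 is far above any conventional significance level.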
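The p-value for Urban can be reproduced by hand from the estimate and standard error reported in the summary. This is a minimal sketch using only base R; the variable names are illustrative, and the inputs are the rounded values printed above.

```r
# Estimate and standard error for urbanYes, copied from the summary output
estimate <- -0.021916
std_error <- 0.271650
df_resid <- 396  # residual degrees of freedom

# t statistic and two-sided p-value from the t distribution
t_stat <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

print(round(c(t = t_stat, p = p_value), 3))
```

Comparing `p_value` with a chosen significance level such as 0.05 gives the test decision for this coefficient.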
modelNew = lm(sales ~ price + us)
summary(modelNew)
##
## Call:
## lm(formula = sales ~ price + us)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9269 -1.6286 -0.0574  1.5766  7.0515
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## price       -0.05448    0.00523 -10.416  < 2e-16 ***
## usYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
Our new model is
\[ \hat{sales} = \hat{\beta_0} + \hat{\beta_1}(Price) + \hat{\beta_2}(US) \]
where \[ \begin{aligned} \hat{\beta_0} &= 13.031 \\ \hat{\beta_1} &= -0.054 \\ \hat{\beta_2} &= 1.200 \end{aligned} \]
R2_old = summary(model)$r.squared
R2_new = summary(modelNew)$r.squared
deltaR = R2_old - R2_new
deltaR
## [1] 1.250376e-05
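The two models fit almost equally well: their coefficients of determination (\(R^2\)) differ by only about 0.0000125, and the smaller model has a slightly higher adjusted \(R^2\) (0.2354 versus 0.2335) and a slightly lower residual standard error (2.469 versus 2.472). Both models explain only about 24% of the variance in sales.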
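The adjusted \(R^2\) values in the two summaries can be recovered from \(R^2\), the sample size, and the number of predictors. The sketch below assumes the rounded \(R^2\) of 0.2393 printed for both models, so the results agree with the summaries only to about three decimal places; `adj_r2` is a helper name introduced here.

```r
# Values copied from the two summaries (R-squared is rounded to four places)
n <- 400      # 396 residual df + 4 estimated coefficients
r2 <- 0.2393  # both models round to the same R-squared

# Adjusted R-squared for a model with p predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_old <- adj_r2(r2, n, p = 3)  # price + urban + us
adj_new <- adj_r2(r2, n, p = 2)  # price + us
print(c(old = adj_old, new = adj_new))
```

Because dropping a predictor with almost no explanatory power barely changes \(R^2\) while freeing a degree of freedom, the adjusted value rises for the smaller model.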
confint(modelNew)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## price       -0.06475984 -0.04419543
## usYes        0.69151957  1.70776632
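These intervals follow the usual form of estimate plus or minus a t quantile times the standard error. As a rough check, the sketch below rebuilds the intervals for price and usYes from the rounded values in the summary of the reduced model; the variable names are illustrative, and small differences in the last digits come from rounding of the inputs.

```r
# Estimates and standard errors copied from the reduced model's summary
est <- c(price = -0.05448, usYes = 1.19964)
se  <- c(price = 0.00523, usYes = 0.25846)

# 97.5th percentile of the t distribution with 397 residual df
t_crit <- qt(0.975, df = 397)

# 95% confidence limits: estimate -/+ t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```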
plot(modelNew)
There do not appear to be any outliers that significantly deviate from the model.