# Problem 10
This question should be answered using the Carseats data set.
library(ISLR2)      # provides the Carseats data set
head(Carseats)      # first six rows
summary(Carseats)   # per-column summaries (output below)
##      Sales         CompPrice        Income        Advertising
##  Min.   : 0.000   Min.   : 77   Min.   : 21.00   Min.   : 0.000
##  1st Qu.: 5.390   1st Qu.:115   1st Qu.: 42.75   1st Qu.: 0.000
##  Median : 7.490   Median :125   Median : 69.00   Median : 5.000
##  Mean   : 7.496   Mean   :125   Mean   : 68.66   Mean   : 6.635
##  3rd Qu.: 9.320   3rd Qu.:135   3rd Qu.: 91.00   3rd Qu.:12.000
##  Max.   :16.270   Max.   :175   Max.   :120.00   Max.   :29.000
##   Population         Price       ShelveLoc         Age         Education
##  Min.   : 10.0   Min.   : 24.0   Bad   : 96   Min.   :25.00   Min.   :10.0
##  1st Qu.:139.0   1st Qu.:100.0   Good  : 85   1st Qu.:39.75   1st Qu.:12.0
##  Median :272.0   Median :117.0   Medium:219   Median :54.50   Median :14.0
##  Mean   :264.8   Mean   :115.8                Mean   :53.32   Mean   :13.9
##  3rd Qu.:398.5   3rd Qu.:131.0                3rd Qu.:66.00   3rd Qu.:16.0
##  Max.   :509.0   Max.   :191.0                Max.   :80.00   Max.   :18.0
##   Urban      US
##  No :118   No :142
##  Yes:282   Yes:258
(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
fit <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
(b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!
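Sales is measured in thousands of units, so each coefficient is a change in thousands of car seats sold.
Price: the coefficient is -0.054459. Holding Urban and US fixed, each one-dollar increase in the price of a car seat is associated with an average decrease in sales of about 0.0545 thousand units, i.e. roughly 54 car seats.
UrbanYes: the coefficient is -0.021916, so a store in an urban location sells on average about 22 fewer car seats than a non-urban store with the same price and US status. Its p-value (0.936) gives no evidence that this difference is real.
USYes: the coefficient is 1.200573, so a store in the US sells on average about 1,201 more car seats than a store outside the US, holding Price and Urban fixed.
Intercept: 13.043469 is the predicted sales (about 13,043 car seats) for a non-urban store outside the US charging a price of 0; since no store charges 0, it anchors the regression line rather than having a practical meaning.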
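One way to check these interpretations is to predict Sales for pairs of hypothetical stores that differ in only one variable; the difference between the two predictions should equal the corresponding coefficient. This is a minimal sketch assuming the ISLR2 package is installed; the names `fit_full`, `as_level`, and `stores`, and the example price of 100, are illustrative choices, not part of the exercise.

```r
library(ISLR2)  # provides the Carseats data set

# Full model from part (a); fit_full is a hypothetical name
fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)

# Hypothetical helper: build a factor with the same levels as the data column
as_level <- function(x, ref) factor(x, levels = levels(ref))

# Rows 1-2 differ only in Price (+1); rows 3-4 differ only in US ("No" vs "Yes")
stores <- data.frame(
  Price = c(100, 101, 100, 100),
  Urban = as_level(c("Yes", "Yes", "Yes", "Yes"), Carseats$Urban),
  US    = as_level(c("Yes", "Yes", "No", "Yes"), Carseats$US)
)
pred <- predict(fit_full, newdata = stores)

price_effect <- unname(pred[2] - pred[1])  # change in Sales (thousands) per unit of Price
us_effect    <- unname(pred[4] - pred[3])  # US store minus non-US store

# Convert thousands of units to car seats
round(1000 * c(price = price_effect, us = us_effect))
```

Because the model is linear with no interactions, the same differences appear at any baseline price or location, which is why a single coefficient summarizes each effect.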
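(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
$\widehat{Sales} = 13.04 - 0.0545\,Price - 0.0219\,Urban + 1.20\,US$, where $Urban = 1$ for a store in an urban location and 0 otherwise, and $US = 1$ for a store in the US and 0 otherwise. For example, for an urban store in the US this reduces to $\widehat{Sales} = 14.22 - 0.0545\,Price$.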
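To see how R turns Urban and US into the 0/1 variables in this equation, one can inspect the treatment contrasts and the model matrix, then evaluate the equation by hand for one store and compare with predict(). A sketch assuming ISLR2 is installed; `fit_full`, `b`, `new_store`, and the example store (urban, in the US, Price = 120) are hypothetical.

```r
library(ISLR2)

# Treatment coding: "No" is the baseline (0) and "Yes" gets the dummy value 1
contrasts(Carseats$Urban)
contrasts(Carseats$US)

fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
colnames(model.matrix(fit_full))  # one dummy column per two-level factor

# Evaluate the fitted equation by hand for an urban US store with Price = 120
b <- coef(fit_full)
by_hand <- b[["(Intercept)"]] + b[["Price"]] * 120 +
  b[["UrbanYes"]] * 1 + b[["USYes"]] * 1

# The same store through predict(), for comparison
new_store <- data.frame(
  Price = 120,
  Urban = factor("Yes", levels = levels(Carseats$Urban)),
  US    = factor("Yes", levels = levels(Carseats$US))
)
via_predict <- unname(predict(fit_full, newdata = new_store))
c(by_hand = by_hand, via_predict = via_predict)
```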
(d) For which of the predictors can you reject the null hypothesis $H_0: \beta_j = 0$?
Using the p-values from the summary in (a), we can reject $H_0: \beta_j = 0$ for Price (p < 2e-16) and US (p = 4.86e-06). We cannot reject it for Urban (p = 0.936).
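The p-values in the summary come from the statistic $t = \hat\beta_j / \mathrm{SE}(\hat\beta_j)$ compared against a t distribution with $n - p - 1 = 396$ degrees of freedom. A sketch reproducing them, assuming ISLR2 is installed and a 0.05 significance level; `tab`, `t_stat`, `p_val`, and `reject` are hypothetical names.

```r
library(ISLR2)

fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
tab <- coef(summary(fit_full))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t statistic for H0: beta_j = 0
t_stat <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value using the model's residual degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit_full), lower.tail = FALSE)

reject <- p_val < 0.05  # TRUE where H0 is rejected at the 5% level
reject
```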
(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
fit <- lm(Sales ~ Price + US, data = Carseats)  # drop Urban; fit now holds the smaller model used in (f) to (h)
summary(fit)
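Because the model in (e) is nested inside the model in (a), the two can also be compared with an F-test; with a single dropped coefficient, its p-value matches the t-test p-value for UrbanYes. A sketch assuming ISLR2 is installed; `fit_full`, `fit_small`, `cmp`, and `adj_r2` are hypothetical names.

```r
library(ISLR2)

fit_full  <- lm(Sales ~ Price + Urban + US, data = Carseats)  # model from (a)
fit_small <- lm(Sales ~ Price + US, data = Carseats)          # model from (e)

# Nested-model F-test: H0 is that the Urban coefficient is zero
cmp <- anova(fit_small, fit_full)
cmp

# Adjusted R^2 penalises extra predictors, so it can rise when one is dropped
adj_r2 <- c(full  = summary(fit_full)$adj.r.squared,
            small = summary(fit_small)$adj.r.squared)
round(adj_r2, 4)
```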
(f) How well do the models in (a) and (e) fit the data?
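Both models fit the data poorly. The adjusted $R^2$ is 0.2335 for the model in (a) and 0.2354 for the model in (e), so each explains only about 23% to 24% of the variability in Sales, and the residual standard error in (a) (2.472 thousand units) is large relative to mean Sales (7.496 thousand units). Dropping Urban raises adjusted $R^2$ slightly, so the smaller model is marginally preferable, but roughly three quarters of the variation in Sales remains unexplained. What counts as an acceptable $R^2$ depends on the application; there is no universal cutoff.
(g) Using the model from (e), obtain 95 % confidence intervals for the coefficient(s).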
confint(fit)
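Applied to the model from (e), confint() returns 95% intervals for the intercept, Price, and USYes. The interval for Price lies entirely below zero and the interval for USYes lies entirely above zero, which agrees with rejecting $H_0: \beta_j = 0$ for both predictors in (d).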
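For a linear model, confint() computes $\hat\beta_j \pm t_{0.975,\,n-p-1}\,\mathrm{SE}(\hat\beta_j)$. A sketch that rebuilds the intervals by hand for the model from (e), assuming ISLR2 is installed; `fit_small`, `tab`, `t_crit`, and `ci_manual` are hypothetical names.

```r
library(ISLR2)

fit_small <- lm(Sales ~ Price + US, data = Carseats)  # model from (e)
tab <- coef(summary(fit_small))

# Critical value from the t distribution with n - p - 1 = 397 df
t_crit <- qt(0.975, df = df.residual(fit_small))

ci_manual <- cbind(
  lower = tab[, "Estimate"] - t_crit * tab[, "Std. Error"],
  upper = tab[, "Estimate"] + t_crit * tab[, "Std. Error"]
)
ci_manual
```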
(h) Is there evidence of outliers or high leverage observations in the model from (e)?
par(mfrow = c(2, 2))  # show the four diagnostic plots in a 2 x 2 grid
plot(fit)
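1. Residuals vs Fitted: the residuals appear randomly scattered around zero, indicating no major non-linearity. A few residuals fall outside ±5; with a residual standard error of about 2.47, that is roughly two standard errors, so these are at most mild potential outliers.
2. Normal Q-Q: most points lie close to the diagonal line, with some deviation in the right tail, suggesting a few possible outliers or a slightly heavy upper tail.
3. Scale-Location: the points are spread fairly evenly, although the spread may increase slightly at higher fitted values, a hint of mild heteroscedasticity.
4. Residuals vs Leverage: the majority of points have low leverage. For the model in (e), with $p = 2$ predictors and $n = 400$ observations, the average leverage is $(p+1)/n = 3/400 = 0.0075$; only points well above this value would count as high-leverage observations.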
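The visual checks can be backed up numerically with studentized residuals and leverage values. The cutoffs below (absolute studentized residual above 3, leverage above three times the average $(p+1)/n$) are common rules of thumb assumed for illustration, not fixed rules. The sketch assumes ISLR2 is installed; `fit_small`, `lev`, `stud`, `possible_outliers`, and `high_leverage` are hypothetical names.

```r
library(ISLR2)

fit_small <- lm(Sales ~ Price + US, data = Carseats)  # model from (e)

n <- nobs(fit_small)
p <- length(coef(fit_small)) - 1  # number of predictors (Price, USYes)
avg_lev <- (p + 1) / n            # average leverage, 3/400 here

lev  <- hatvalues(fit_small)      # leverage of each observation
stud <- rstudent(fit_small)       # externally studentized residuals

# Rule-of-thumb flags (assumed cutoffs)
possible_outliers <- which(abs(stud) > 3)
high_leverage     <- which(lev > 3 * avg_lev)

c(avg_leverage = avg_lev, max_leverage = max(lev),
  n_outliers = length(possible_outliers), n_high_lev = length(high_leverage))
```

The average of the hat values always equals $(p+1)/n$, which is why that quantity serves as the reference point for judging leverage.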