library(ISLR)
data("Carseats")
MSSPEED <- 21186/1
MSSPEED
## [1] 21186
MSRESIDUAL <- 11354/48
MSRESIDUAL
## [1] 236.5417
FVALUE <- MSSPEED/MSRESIDUAL
FVALUE
## [1] 89.56562
pf(FVALUE, 1, 48, lower.tail = FALSE)
## [1] 1.490228e-12
tvalueinter <- -17.5791/6.7584
tvalueinter
## [1] -2.601074
tvaluespeed <- 3.9324/0.4155
tvaluespeed
## [1] 9.46426
teststat <- 3.9324/0.4155
n <- 50
pt(teststat, df = n-2, lower.tail = FALSE)*2
## [1] 1.488495e-12
The residual degrees of freedom are n - 2 = 50 - 2 = 48.
SS <- 21186
SSTOT <- 21186+11354
Rsqr <- SS/SSTOT
Rsqr
## [1] 0.6510756
FVALUE <- MSSPEED/MSRESIDUAL
FVALUE
## [1] 89.56562
The F-statistic is on 1 and 48 degrees of freedom.
pt(teststat, df = n-2, lower.tail = FALSE)*2
## [1] 1.488495e-12
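In simple linear regression the overall F statistic equals the square of the slope's t statistic, which is why the F test and the two-sided t test above report nearly the same p-value. The sketch below checks that relationship using only the rounded figures quoted above; the names `t_speed`, `f_value`, `p_from_t`, and `p_from_f` are introduced here for illustration, and small differences are expected because the inputs are rounded.

```r
# Slope t statistic and overall F statistic, rebuilt from the rounded
# values used in the calculations above.
t_speed <- 3.9324 / 0.4155
f_value <- (21186 / 1) / (11354 / 48)

# In simple linear regression t^2 and F agree up to rounding of the inputs.
t_squared <- t_speed^2
print(c(t_squared = t_squared, F = f_value))

# The two-sided t test and the F test therefore give nearly equal p-values.
p_from_t <- 2 * pt(abs(t_speed), df = 48, lower.tail = FALSE)
p_from_f <- pf(f_value, df1 = 1, df2 = 48, lower.tail = FALSE)
print(c(p_from_t = p_from_t, p_from_f = p_from_f))
```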
The numeric variables are Sales and Price. The categorical variables are Urban and US; each is a factor with two levels, Yes and No.
mod1 <- lm(Sales ~ Price+Urban+US, data = Carseats)
summary(mod1)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
The Price coefficient (-0.0545) means that, holding the other variables constant, a $1 increase in price is associated with an average decrease of about 0.054 in Sales; because Carseats records Sales in thousands of units, that is roughly 54 fewer units. The UrbanYes coefficient (-0.0219) means that, holding the other variables constant, urban stores sell on average about 22 fewer units than non-urban stores, although this difference is not statistically significant (p = 0.936). The USYes coefficient (1.2006) means that, holding the other variables constant, US stores sell on average about 1,201 more units than non-US stores.
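Sales = 13.04 - 0.05(Price) - 0.02(UrbanYes) + 1.20(USYes), where UrbanYes is 1 for an urban store and 0 otherwise, and USYes is 1 for a US store and 0 otherwise.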
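As a quick illustration of how the fitted equation is used, the sketch below plugs one store's values into the unrounded coefficients from summary(mod1). The store itself (price of $100, urban, in the US) is hypothetical and chosen only for the example, and the names `b`, `x`, and `sales_hat` are introduced here.

```r
# Coefficients copied from the summary(mod1) output.
b <- c(intercept = 13.043469, price = -0.054459,
       urban_yes = -0.021916, us_yes = 1.200573)

# Hypothetical store (illustrative values, not taken from the data):
# intercept term, price of $100, urban location (1), inside the US (1).
x <- c(1, 100, 1, 1)

# Predicted Sales is the sum of each coefficient times its input.
sales_hat <- sum(b * x)
print(sales_hat)
```

The result is on the same scale as the Sales variable, which the ISLR documentation gives in thousands of units. With the fitted model available, `predict(mod1, newdata = ...)` should return the same value.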
We can reject the null hypothesis that the coefficient is zero for Price and USYes: both p-values (< 2e-16 and 4.86e-06) are far below 0.05. We cannot reject it for UrbanYes, whose p-value is 0.936.
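The p-values behind this decision can be recovered from the t values and the 396 residual degrees of freedom in the summary output. The following is a minimal sketch using the rounded t values, so the results match the printed table only approximately; `t_values`, `p_values`, and `reject` are names introduced for the example, and 0.05 is assumed as the significance level.

```r
# t values copied (rounded) from the summary(mod1) coefficient table.
t_values <- c(Price = -10.389, UrbanYes = -0.081, USYes = 4.635)
df_resid <- 396

# Two-sided p-value: twice the upper-tail area beyond |t|.
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)
print(signif(p_values, 3))

# Reject H0: coefficient = 0 at the (assumed) 5% level.
reject <- p_values < 0.05
print(reject)
```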
mod2 <- lm(Sales ~ Price + US, data = Carseats)
summary(mod2)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9269 -1.6286 -0.0574  1.5766  7.0515
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
anova(mod1)
## Analysis of Variance Table
##
## Response: Sales
##            Df  Sum Sq Mean Sq  F value    Pr(>F)
## Price       1  630.03  630.03 103.0603 < 2.2e-16 ***
## Urban       1    0.10    0.10   0.0158    0.9001
## US          1  131.31  131.31  21.4802  4.86e-06 ***
## Residuals 396 2420.83    6.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mod2)
## Analysis of Variance Table
##
## Response: Sales
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## Price       1  630.03  630.03 103.319 < 2.2e-16 ***
## US          1  131.37  131.37  21.543 4.707e-06 ***
## Residuals 397 2420.87    6.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
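The residual mean square for mod2 (6.10) is slightly lower than for mod1 (6.11), and its adjusted R-squared is slightly higher (0.2354 versus 0.2335), while the multiple R-squared is unchanged at 0.2393. mod2 therefore fits essentially as well as mod1 with one fewer predictor, so it is the better model.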
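Because mod2 is mod1 with Urban removed, the two models can also be compared with a partial F test built from the residual sums of squares in the two ANOVA tables. This is a rough sketch: the sums of squares are rounded to two decimals, so the F statistic is only approximate, and the variable names are introduced here for illustration.

```r
# Residual sums of squares and degrees of freedom from the ANOVA tables.
rss_full    <- 2420.83; df_full    <- 396   # mod1: Price + Urban + US
rss_reduced <- 2420.87; df_reduced <- 397   # mod2: Price + US

# Partial F: increase in RSS per dropped term, relative to the
# full model's residual mean square.
df_diff   <- df_reduced - df_full
f_partial <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_partial <- pf(f_partial, df1 = df_diff, df2 = df_full, lower.tail = FALSE)
print(c(F = f_partial, p = p_partial))
```

A large p-value here indicates that dropping Urban does not significantly worsen the fit. With both models fitted, `anova(mod2, mod1)` carries out the same comparison from the unrounded values.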
confint(mod1)
##                   2.5 %      97.5 %
## (Intercept) 11.76359670 14.32334118
## Price       -0.06476419 -0.04415351
## UrbanYes    -0.55597316  0.51214085
## USYes        0.69130419  1.70984121
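The confidence intervals tell us that we are 95% confident that the true coefficient lies between -0.06 and -0.04 for Price, between -0.56 and 0.51 for UrbanYes, and between 0.69 and 1.71 for USYes. The interval for UrbanYes contains zero, which is consistent with its non-significant p-value.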
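These intervals follow the usual form estimate ± t critical value × standard error, with 396 residual degrees of freedom. The sketch below rebuilds them from the estimates and standard errors printed by summary(mod1); it should agree with confint(mod1) up to rounding of those inputs. The names `est`, `se`, `t_crit`, and `ci` are introduced for the example.

```r
# Estimates and standard errors copied from the summary(mod1) table.
est <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  <- c(Price =  0.005242, UrbanYes =  0.271650, USYes = 0.259042)

# Critical value for a two-sided 95% interval on 396 degrees of freedom.
t_crit <- qt(0.975, df = 396)

# Lower and upper limits: estimate minus/plus t_crit standard errors.
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```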