HW7

library(ISLR)
data("Carseats")

Part 1.

Missing values for ANOVA Table

1.

MSSPEED <- 21186/1
MSSPEED

## [1] 21186

2.

MSRESIDUAL <- 11354/48
MSRESIDUAL

## [1] 236.5417

3.

FVALUE <- MSSPEED/MSRESIDUAL
FVALUE

## [1] 89.56562

4.

pf(FVALUE, 1, 48, lower.tail = FALSE)

## [1] 1.490228e-12
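The four steps above can also be collected into one table. This is only a sketch: the sums of squares and degrees of freedom are the ones given in the assignment's ANOVA table, and the object names (`ss`, `df`, `anova_tab`) are made up for illustration.

```r
# Sums of squares and degrees of freedom from the assignment's ANOVA table
ss <- c(speed = 21186, residuals = 11354)
df <- c(speed = 1, residuals = 48)

# Mean square = sum of squares / degrees of freedom
ms <- ss / df

# F statistic = MS(speed) / MS(residuals)
f_value <- ms[["speed"]] / ms[["residuals"]]

# Upper-tail probability of the F distribution on 1 and 48 df
p_value <- pf(f_value, df[["speed"]], df[["residuals"]], lower.tail = FALSE)

# Assemble the completed table (NA where the residual row has no entry)
anova_tab <- data.frame(
  Df     = df,
  SumSq  = ss,
  MeanSq = ms,
  F      = c(f_value, NA),
  p      = c(p_value, NA)
)
anova_tab
```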

Missing values for Summary Output

1.

tvalueinter <- -17.5791/6.7584
tvalueinter

## [1] -2.601074

2.

tvaluespeed <- 3.9324/0.4155
tvaluespeed

## [1] 9.46426

3.

teststat <- 3.9324/0.4155
n <- 50
 
pt(teststat, df = n-2, lower.tail = FALSE)*2

## [1] 1.488495e-12
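Items 1 to 3 can be done for both coefficients at once. The sketch below assumes the estimates and standard errors shown in the assignment's summary output. It wraps the t value in `abs()` so the two-sided p-value also comes out right for a negative t value such as the intercept's. The names `est`, `se` and `p` are illustrative.

```r
# Estimates and standard errors from the summary output
est <- c(intercept = -17.5791, speed = 3.9324)
se  <- c(intercept = 6.7584,   speed = 0.4155)

n <- 50   # number of observations
p <- 2    # number of estimated coefficients (intercept and slope)

# t value = estimate / standard error
t_value <- est / se

# Two-sided p-value: twice the upper tail beyond |t| on n - p df
p_value <- 2 * pt(abs(t_value), df = n - p, lower.tail = FALSE)

t_value
p_value
```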

4.

48 degrees of freedom

5.

SS <- 21186
SSTOT <- 21186+11354
Rsqr <- SS/SSTOT
Rsqr

## [1] 0.6510756
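As a cross-check, R-squared can also be written as 1 - SSE/SST, and the residual standard error that goes with the 48 degrees of freedom in item 4 is the square root of the residual mean square. This is a sketch using the same sums of squares as above; the variable names are illustrative.

```r
ss_reg <- 21186          # sum of squares for speed
ss_res <- 11354          # residual sum of squares
ss_tot <- ss_reg + ss_res
df_res <- 48             # residual degrees of freedom

# Two equivalent ways of writing R-squared
r2_a <- ss_reg / ss_tot
r2_b <- 1 - ss_res / ss_tot

# Residual standard error = sqrt(residual mean square)
rse <- sqrt(ss_res / df_res)

c(r2_a = r2_a, r2_b = r2_b, rse = rse)
```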

6.

FVALUE <- MSSPEED/MSRESIDUAL
FVALUE

## [1] 89.56562

7.

on 1 and 48 DF

8.

pf(FVALUE, 1, 48, lower.tail = FALSE)

## [1] 1.490228e-12
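In simple linear regression the overall F test and the t test on the slope are the same test, since F equals t squared. The sketch below checks this with the numbers from Part 1. The two statistics differ a little only because the table entries were rounded before dividing. Variable names are illustrative.

```r
t_speed <- 3.9324 / 0.4155               # t value for speed
f_value <- (21186 / 1) / (11354 / 48)    # F value from the ANOVA table

# t^2 is close to F; the gap comes from rounded table entries
c(t_squared = t_speed^2, f_value = f_value)

# Two-sided t p-value and the F p-value computed from t^2 agree
p_t        <- 2 * pt(abs(t_speed), df = 48, lower.tail = FALSE)
p_f_from_t <- pf(t_speed^2, 1, 48, lower.tail = FALSE)
c(p_t = p_t, p_f_from_t = p_f_from_t)
```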

Part 2.

A.

Sales and Price are numeric. Urban and US are categorical, each with two levels, Yes and No.

B.

mod1 <- lm(Sales ~ Price+Urban+US, data = Carseats)
summary(mod1)

## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

C.

Sales is recorded in thousands of units. Price: holding the other variables constant, each $1 increase in price is associated with an average decrease in sales of about 0.0545 thousand units (roughly 54 units). UrbanYes: holding the other variables constant, stores in urban locations sell on average about 0.0219 thousand units (roughly 22 units) fewer than stores in non-urban locations, although this difference is not statistically significant (p = 0.936). USYes: holding the other variables constant, stores in the US sell on average about 1.2006 thousand units (roughly 1,201 units) more than stores outside the US.
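Because Sales in Carseats is recorded in thousands of units, multiplying each coefficient by 1000 gives the effect in single units. A short sketch, with the coefficients copied from the summary of mod1 rather than pulled from the fitted object:

```r
# Coefficients copied from summary(mod1); Sales is in thousands of units
coefs <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)

# Convert from thousands of units to units
units_change <- coefs * 1000
round(units_change, 1)
```

With the fitted model available, `coef(mod1) * 1000` would give the same conversion.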

D.

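Sales = 13.04 - 0.05(Price) - 0.02(UrbanYes) + 1.20(USYes), where UrbanYes = 1 for a store in an urban location (0 otherwise) and USYes = 1 for a store in the US (0 otherwise).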
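To show how the equation is used, the sketch below turns it into a small function with the unrounded coefficients from the summary of mod1. The function name `predict_sales` and the example store (Price = 100, urban, in the US) are made up for illustration.

```r
# Fitted equation from summary(mod1); the result is in thousands of units
predict_sales <- function(price, urban, us) {
  13.043469 -
    0.054459 * price -
    0.021916 * (urban == "Yes") +   # indicator: 1 if urban, 0 otherwise
    1.200573 * (us == "Yes")        # indicator: 1 if in the US, 0 otherwise
}

# Hypothetical store: price of 100, urban location, in the US
predict_sales(price = 100, urban = "Yes", us = "Yes")
```

With the fitted model, `predict(mod1, newdata = data.frame(Price = 100, Urban = "Yes", US = "Yes"))` should return essentially the same value.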

E.

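We reject the null hypothesis that the coefficient is zero for Price (p < 2e-16) and USYes (p = 4.86e-06) at any conventional significance level. We fail to reject it for UrbanYes (p = 0.936), so there is no evidence that Urban is related to Sales once Price and US are in the model.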

F.

mod2 <- lm(Sales ~ Price + US, data = Carseats)
summary(mod2)

## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

G.

anova(mod1)

## Analysis of Variance Table
## 
## Response: Sales
##            Df  Sum Sq Mean Sq  F value    Pr(>F)    
## Price       1  630.03  630.03 103.0603 < 2.2e-16 ***
## Urban       1    0.10    0.10   0.0158    0.9001    
## US          1  131.31  131.31  21.4802  4.86e-06 ***
## Residuals 396 2420.83    6.11                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(mod2)

## Analysis of Variance Table
## 
## Response: Sales
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## Price       1  630.03  630.03 103.319 < 2.2e-16 ***
## US          1  131.37  131.37  21.543 4.707e-06 ***
## Residuals 397 2420.87    6.10                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

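The mean squared error (residual mean square) for mod2 is slightly lower than for mod1 (6.10 vs. 6.11), and its adjusted R-squared is slightly higher (0.2354 vs. 0.2335), while the multiple R-squared is 0.2393 for both. Dropping Urban costs essentially nothing in fit, so the simpler mod2 is the preferred model.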
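Since mod2 is nested in mod1, the two can also be compared with a partial F test. The sketch below rebuilds that test from the residual sums of squares printed in the two ANOVA tables. Those values are rounded to two decimals, so the result is only approximate. Variable names are illustrative.

```r
# Residual sums of squares and df from the ANOVA tables above
rss_full <- 2420.83; df_full <- 396   # mod1: Price + Urban + US
rss_red  <- 2420.87; df_red  <- 397   # mod2: Price + US

# Partial F: extra residual SS per dropped term, over the full-model MSE
f_partial <- ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)
p_partial <- pf(f_partial, df_red - df_full, df_full, lower.tail = FALSE)

c(F = f_partial, p = p_partial)
```

A large p-value here means Urban adds nothing significant, in line with its t test in the summary of mod1. With the fitted models, `anova(mod2, mod1)` runs the same comparison without the rounding.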

H.

confint(mod1)

##                   2.5 %      97.5 %
## (Intercept) 11.76359670 14.32334118
## Price       -0.06476419 -0.04415351
## UrbanYes    -0.55597316  0.51214085
## USYes        0.69130419  1.70984121

We are 95% confident that the true coefficient lies in (-0.065, -0.044) for Price, (-0.56, 0.51) for UrbanYes, and (0.69, 1.71) for USYes. The interval for UrbanYes contains 0, which agrees with its large p-value in the summary of mod1.
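The intervals from confint can be reproduced by hand as estimate ± t critical value × standard error, on the 396 residual degrees of freedom. A sketch with the estimates and standard errors copied from the summary of mod1 (so the last digits can differ slightly from confint):

```r
# Estimates and standard errors copied from summary(mod1)
est <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  <- c(Price = 0.005242,  UrbanYes = 0.271650,  USYes = 0.259042)

# Critical value for a 95% interval on 396 residual df
t_crit <- qt(0.975, df = 396)

# Lower and upper limits for each coefficient
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 2)
```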

Nikki Seina

10/29/2020
