Problem 1

  1. t value for intercept: -17.5791/6.7584 = -2.601
  2. t value for speed: 3.9324/0.1455 = 27.026
  3. Pr(t) for speed = 2.13e-31
  4. degrees of freedom = 49
  5. multiple R-squared = 1 - 11354/(21186+11354) = 0.6510756
  6. F = 89.5656 on 1 and 48 df, p = 1.49e-12
  7. mean squares for speed = 21186
  8. mean squares for residuals = 11354/48 = 236.5417
  9. F = 21186/236.5417 = 89.5656
  10. p = 1.49e-12
# pvalue for the t test
pt(q=27.026,df= 49, lower.tail = FALSE)
## [1] 2.131753e-31
# pvalue for the f test (rounded to 3 significant digits)
signif(pf(q = 89.5656, df1 = 1, df2 = 48, lower.tail = FALSE), 3)
## [1] 1.49e-12
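
As a quick check on items 1 and 2, each t value is the estimate divided by its standard error. The sketch below only reuses the numbers quoted in those items; the vector names `est` and `se` are illustrative and not part of the assignment.

```r
# Estimates and standard errors as quoted in items 1 and 2
est <- c(intercept = -17.5791, speed = 3.9324)
se  <- c(intercept = 6.7584,   speed = 0.1455)

# t statistic = estimate / standard error
t_val <- est / se
print(t_val)
```

The printed ratios should agree with the hand-computed t values above up to rounding in the last digit.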

Problem 2

library(ISLR)
data("Carseats")
str(Carseats)
## 'data.frame':    400 obs. of  11 variables:
##  $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
##  $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
##  $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
##  $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
##  $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
##  $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
##  $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
##  $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
##  $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
##  $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
##  $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...

A.) Describe the variables

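  -Sales: numeric; unit sales (in thousands) at each location
  -Price: numeric; price the company charges for car seats at each site
  -Urban: categorical with 2 levels, No and Yes; whether the store is in an urban or rural location
  -US: categorical with 2 levels, No and Yes; whether the store is in the US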

B.) Fit a regression model

mod1 <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(mod1)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

C.) Interpret the Coefficients

The intercept is the expected value of Sales when Price is 0 and both factors are at their baseline levels (Urban = No, US = No): about 13.04 thousand units. A price of 0 is not realistic, so the intercept mainly anchors the regression line.

The coefficient for Price is the slope of the line: with Urban and US held constant, each one-unit increase in price is associated with a decrease in sales of about 0.054 thousand units (roughly 54 car seats).

UrbanYes tells us that, with Price and US held constant, locations in urban areas sell about 0.022 thousand fewer units than non-urban locations. This difference is not statistically significant (p = 0.936).

Similarly, USYes tells us that, with Price and Urban held constant, locations in the US sell about 1.2 thousand more units than locations outside the US.

D.) Write equations for the model

Here y is Sales (in thousands) and x is Price.

When: UrbanYes and USYes \[ y = 13.043 - 0.0219 + 1.2 - 0.054x \] \[ y = 14.221 - 0.054x \]

When: UrbanYes and USNo \[ y = 13.043 - 0.0219 - 0.054x \] \[ y = 13.021 - 0.054x \]

When: UrbanNo and USYes \[ y = 13.043 + 1.2 - 0.054x \] \[ y = 14.243 - 0.054x \]

When: UrbanNo and USNo \[ y = 13.043 - 0.054x \]
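
The four equations differ only in their intercepts, which can be tabulated directly from the fitted coefficients. This is a minimal sketch that copies the estimates from summary(mod1) by hand; the names `b` and `groups`, and the price of 100 used for the example prediction, are illustrative assumptions.

```r
# Coefficients copied from summary(mod1)
b <- c(intercept = 13.043469, price = -0.054459,
       urban_yes = -0.021916, us_yes = 1.200573)

# All four Urban/US combinations (1 = Yes, 0 = No)
groups <- expand.grid(urban = c(1, 0), us = c(1, 0))

# Group-specific intercept: baseline plus whichever dummy effects apply
groups$intercept <- b[["intercept"]] +
  b[["urban_yes"]] * groups$urban +
  b[["us_yes"]]    * groups$us

# Predicted sales (in thousands) at a hypothetical price of 100
groups$sales_at_100 <- groups$intercept + b[["price"]] * 100
print(round(groups, 3))
```

Because the sketch uses the unrounded estimates, its intercepts can differ from the hand-rounded equations in the third decimal place.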

E.) We reject the null hypothesis that the coefficient is 0 for Price and USYes (both p < 0.001). We fail to reject it for UrbanYes (p = 0.936).
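
The decision in E can be reproduced from the t values in the summary table. The sketch below copies those t values by hand and assumes a 5% significance level, which the write-up does not state explicitly; `t_val`, `p_val`, and `reject` are illustrative names.

```r
# t values copied from summary(mod1); residual df = 396
t_val <- c(Price = -10.389, UrbanYes = -0.081, USYes = 4.635)
df_resid <- 396

# Two-sided p-value: probability of a |t| at least this large under H0
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

# TRUE means H0 (coefficient = 0) is rejected at the assumed 5% level
reject <- p_val < 0.05
print(data.frame(t = t_val, p = signif(p_val, 3), reject_H0 = reject))
```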

F.) Fit a new model:

mod2 <- lm(Sales~Price+US, data = Carseats)
summary(mod2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

G.) How well do these models fit the data?

summary(mod1)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
summary(mod2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

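Model 2 fits the data slightly better, though neither model fits well: both explain only about 24% of the variance in Sales (multiple R-squared = 0.2393). The RSE for mod2 is 2.469 compared to 2.472 for mod1, and the adjusted R-squared is higher for mod2 (0.2354) than for mod1 (0.2335), so dropping Urban costs nothing in fit.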
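
The gap in adjusted R-squared comes entirely from the penalty for the extra predictor, since the multiple R-squared is the same for both models to four decimals. A minimal sketch of that calculation, using the rounded values printed in the summaries (the helper `adj_r2` is a hypothetical name, not an R built-in):

```r
# Values copied from the two summaries
n  <- 400      # observations in Carseats
r2 <- 0.2393   # multiple R-squared, identical for both models to 4 decimals

# Adjusted R-squared penalizes each extra predictor:
# 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_mod1 <- adj_r2(r2, n, p = 3)  # Price, Urban, US
adj_mod2 <- adj_r2(r2, n, p = 2)  # Price, US
print(c(mod1 = adj_mod1, mod2 = adj_mod2))
```

Because the input R-squared is already rounded, the results match the summary output only to about three decimals.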

H.) Confidence Intervals

A 95% confidence interval is built by a procedure that captures the true parameter value in 95% of repeated samples. The interval for UrbanYes contains 0, so there is no evidence of a relationship between Urban and Sales. The other intervals exclude 0, which indicates a statistically significant relationship between those predictors and the response.

confint(mod1)
##                   2.5 %      97.5 %
## (Intercept) 11.76359670 14.32334118
## Price       -0.06476419 -0.04415351
## UrbanYes    -0.55597316  0.51214085
## USYes        0.69130419  1.70984121
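
The intervals above can be rebuilt by hand as estimate ± critical t value × standard error. This sketch copies the estimates and standard errors from summary(mod1); the names `est`, `se`, `t_crit`, and `ci` are illustrative.

```r
# Estimates and standard errors copied from summary(mod1)
est <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  <- c(Price =  0.005242, UrbanYes =  0.271650, USYes = 0.259042)

# Critical t value for a 95% interval on 396 residual degrees of freedom
t_crit <- qt(0.975, df = 396)

# Interval = estimate -/+ critical value * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))

# An interval that excludes 0 corresponds to a two-sided p-value below 0.05
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

The results should agree with confint(mod1) up to the rounding of the copied inputs.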