Question 1

Consider the following data with x as the predictor and y as the outcome.

x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

Give a P-value for the two-sided hypothesis test of whether \(\beta_{1}\) from a linear regression model is 0 or not.

  1. 0.05296
  2. 2.325
  3. 0.391
  4. 0.025

Answer

Fit the linear regression model and read the P-value for the slope from the summary. With a single predictor the slope's t-test P-value equals the overall F-test P-value, so the 0.053 shown for x in the coefficient table is 0.05296 before rounding.

fit <- lm(y ~ x)
summary(fit)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27636 -0.18807  0.01364  0.16595  0.27143 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.1885     0.2061   0.914    0.391  
## x             0.7224     0.3107   2.325    0.053 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.223 on 7 degrees of freedom
## Multiple R-squared:  0.4358, Adjusted R-squared:  0.3552 
## F-statistic: 5.408 on 1 and 7 DF,  p-value: 0.05296
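
Rather than reading the rounded 0.053 off the table, the exact P-value can be pulled out of the coefficient matrix; summary(fit)$coefficients is the same matrix printed above, so this is just an indexing step.

# row 2 is the slope; column 4 is Pr(>|t|), the two-sided P-value
# (returns the 0.05296 reported for the F-test, unrounded)
summary(fit)$coefficients[2, 4]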

Question 2

Consider the previous problem. Give the estimate of the residual standard deviation.

  1. 0.4358
  2. 0.223
  3. 0.05296
  4. 0.3552

Answer

summary(fit)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27636 -0.18807  0.01364  0.16595  0.27143 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.1885     0.2061   0.914    0.391  
## x             0.7224     0.3107   2.325    0.053 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.223 on 7 degrees of freedom
## Multiple R-squared:  0.4358, Adjusted R-squared:  0.3552 
## F-statistic: 5.408 on 1 and 7 DF,  p-value: 0.05296
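
The residual standard error on the second-to-last line of the summary is the estimate of the residual standard deviation, 0.223. It can also be extracted directly instead of being read off the printout:

# summary(fit)$sigma holds the residual standard error (the 0.223 above)
summary(fit)$sigma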

Question 3

In the mtcars data set, fit a linear regression model with weight as the predictor and mpg as the outcome. Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?

  1. -4.00
  2. 21.190
  3. -6.486
  4. 18.991

Answer

data("mtcars")
y <- mtcars$mpg
x <- mtcars$wt
fit <- lm(y ~ x)

predict(fit, newdata = data.frame(x = mean(x)), interval = "confidence")
##        fit      lwr      upr
## 1 20.09062 18.99098 21.19027
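
As a sanity check, the interval can also be built by hand. At \(x = \bar{x}\) the fitted value is \(\bar{y}\) and the standard error of the mean response collapses to \(\hat\sigma/\sqrt{n}\), so only the residual standard error is needed:

# manual 95% CI for the expected mpg at the average weight;
# should reproduce the 18.991 to 21.190 interval from predict() above
sigma <- summary(fit)$sigma
mean(y) + c(-1, 1) * qt(.975, df = fit$df) * sigma / sqrt(length(y))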

Question 4

Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?

  1. It can’t be interpreted without further information
  2. The estimated expected change in mpg per 1 lb increase in weight.
  3. The estimated expected change in mpg per 1,000 lb increase in weight.
  4. The estimated 1,000 lb change in weight per 1 mpg increase.

Answer

From the mtcars help file: mpg = Miles/(US) gallon; wt = Weight (1000 lbs).

Since wt is recorded in units of 1,000 lbs, the weight coefficient is the estimated expected change in mpg per 1,000 lb increase in weight.

Question 5

Consider again the mtcars data set and a linear regression with mpg as predicted by weight (1,000 lbs). A new car is coming to market that weighs 3,000 pounds. Construct a 95% prediction interval for its mpg. What is the upper endpoint?

  1. -5.77
  2. 21.25
  3. 27.57
  4. 14.93

Answer

Convert the weight to the model's units of 1,000 lbs, then use predict with a prediction interval.

newCarWeight <- 3000 / 1000
predict(fit, newdata = data.frame(x = newCarWeight), interval = "prediction")
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355

Or calculate it manually:

sigma <- summary(fit)$sigma
yhat <- fit$coef[1] + fit$coef[2] * newCarWeight
# standard error for a single new observation at x0 = newCarWeight:
# sigma * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)
se_pred <- sigma * sqrt(1 + 1 / length(y) + (newCarWeight - mean(x))^2 / sum((x - mean(x))^2))
predicted_tValues <- yhat + c(-1, 1) * qt(.975, df = fit$df) * se_pred

predicted_tValues
## [1] 14.92987 27.57355
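
For contrast, swapping in interval = "confidence" would give the interval for the expected mpg at 3,000 lbs; it is narrower, since the prediction interval carries the extra 1 under the square root for the variance of a single new observation:

# same point estimate, narrower interval: drops the "1 +" term above
predict(fit, newdata = data.frame(x = newCarWeight), interval = "confidence")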

Question 6

Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.

  1. -9.000
  2. -12.973
  3. -6.486
  4. 4.2026

Answer

Refit the model with weight rescaled to short tons; dividing x (in 1,000 lbs) by 2 puts it in units of 2,000 lbs.

shortTonWeight <- 2000/1000
fit_shortTon <- lm(y ~ I(x/shortTonWeight))

Use the rescaled slope estimate and its standard error to build the confidence interval

sumCoef <- coef(summary(fit_shortTon))

or, equivalently,

sumCoef <- summary(fit_shortTon)$coefficients

Now calculate:

sumCoef[2,1] + c(-1,1) * qt(.975, df=fit_shortTon$df) * sumCoef[2,2]
## [1] -12.97262  -8.40527
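
Equivalently, since a 1 short ton increase is two 1,000 lb increases, doubling the confidence interval of the original slope gives the same endpoints; base R's confint is just a shortcut for the qt-based calculation above.

# 2 * (per-1,000 lb slope interval) should reproduce (-12.973, -8.405)
2 * confint(fit)[2, ]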

Question 7

If my X from a linear regression is measured in centimeters and I convert it to meters, what would happen to the slope coefficient?

  1. It would get divided by 100
  2. It would get divided by 10
  3. It would get multiplied by 100
  4. It would get multiplied by 10

Answer

The conversion is 100 cm = 1 m, so converting X from centimeters to meters divides every value by 100, and the slope must be multiplied by 100 to compensate. A quick simulation illustrates this (no set.seed is used, so the random numbers will differ from run to run):

x <- round(runif(n = 10, min = 0.1, max = 0.99),2)
x
##  [1] 0.75 0.70 0.59 0.62 0.89 0.30 0.61 0.65 0.36 0.95
y <- round(runif(n = 10, min = 0.1, max = 0.99),2)
y
##  [1] 0.34 0.22 0.55 0.76 0.65 0.44 0.45 0.27 0.86 0.93
fit <- lm(y ~ x)
fit$coefficients[2]
##         x 
## 0.1251075

Now convert x to meters by dividing by 100 and check that the slope is 100 times larger:

fit <- lm(y ~ I(x/100))
fit$coefficients[2]
## I(x/100) 
## 12.51075

Question 8

I have an outcome, \(Y\), and a predictor, \(X\), and I fit a linear regression model \(Y = \beta_{0} + \beta_{1}X + \epsilon\) to obtain \(\hat\beta_{0}\) and \(\hat\beta_{1}\). What would be the consequence for the subsequent slope and intercept if I were to refit the model with a new regressor, \(X + c\), for some constant \(c\)?

  1. The new intercept would be \(\hat\beta_{0} + c\hat\beta_{1}\)
  2. The new intercept would be \(\hat\beta_{0} - c\hat\beta_{1}\)
  3. The new slope would be \(c\hat\beta_{1}\)
  4. The new slope would be \(\hat\beta_{1} + c\)

Answer

Another quick simulation (again without set.seed, so the numbers vary between runs):

x <- round(runif(n = 10, min = 0.1, max = 0.99),2)
x
##  [1] 0.87 0.56 0.69 0.66 0.27 0.25 0.92 0.91 0.47 0.32
y <- round(runif(n = 10, min = 0.1, max = 0.99),2)
y
##  [1] 0.71 0.16 0.64 0.91 0.20 0.53 0.68 0.45 0.62 0.47
fit <- lm(y ~ x)
fit$coefficients
## (Intercept)           x 
##   0.3103232   0.3829000

Now add the constant \(c\) to the regressor. Rearranging the fitted line shows what must happen:

\(\hat\beta_{0} + \hat\beta_{1}X = (\hat\beta_{0} - c\hat\beta_{1}) + \hat\beta_{1}(X + c)\)

So the slope is unchanged and the new intercept is \(\hat\beta_{0} - c\hat\beta_{1}\):

c <- 100
fit_c <- lm(y ~ I(x + c))
fit_c$coefficients
## (Intercept)    I(x + c) 
##   -37.97968     0.38290
fit$coefficients[1] - c * fit$coefficients[2]
## (Intercept) 
##   -37.97968

Question 9

Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum^n_{i=1}(Y_{i} - \hat{Y}_{i})^2\), when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?

  1. 0.50
  2. 0.25
  3. 4.00
  4. 0.75

Answer

This is simply one minus \(R^2\), since \(R^2 = 1 - SSE_{slope}/SSE_{intercept}\) and the intercept-only SSE is the total sum of squares.

data(mtcars)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ 1, data = mtcars)
summary(fit1)$r.squared
## [1] 0.7528328
sse1 <- sum((predict(fit1) - mtcars$mpg)^2)
sse2 <- sum((predict(fit2) - mtcars$mpg)^2)
sse1/sse2
## [1] 0.2471672
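
A one-line check of that identity gives the same number:

# ratio = 1 - R^2
1 - summary(fit1)$r.squared
## [1] 0.2471672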

Question 10

Do the residuals always have to sum to 0 in linear regression?

  1. If an intercept is included, the residuals most likely won’t sum to zero
  2. If an intercept is included, then they will sum to 0
  3. The residuals must always sum to zero
  4. The residuals never sum to zero
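
Answer

If the model includes an intercept, the least squares normal equations force the residuals to sum to exactly zero, so the answer is: if an intercept is included, then they will sum to 0. Without an intercept this guarantee disappears. As a quick check using the mtcars fit from Question 9:

# sums to zero up to floating point error
sum(resid(fit1))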