q1

Consider the following data with x as the predictor and y as the outcome.

library(ggplot2)
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

Give a P-value for the two sided hypothesis test of whether \(\beta_1\) from a linear regression model is 0 or not.

# Plot the data with a fitted regression line for an overview of the relationship between x and y.
dt <- data.frame(x, y)
g <- ggplot(dt, aes(x = x, y=y))
g <- g + geom_point(size = 7, colour = "black", alpha = 0.5)
g + geom_smooth(method = "lm", colour = "red")

# Fit the model and read the P-value for the slope from the coefficient table

fit <- lm(y~x)
(Coeffit1 <- summary(fit)$coef)
##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept) 0.1884572  0.2061290 0.9142681 0.39098029
## x           0.7224211  0.3106531 2.3254912 0.05296439
# The P-value for the slope is
Coeffit1[2, 4]
## [1] 0.05296439
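
As a cross-check, the same P-value can be recovered from the t statistic itself. This is a minimal sketch, assuming the usual two-sided t test with n - 2 degrees of freedom; the names fit_q1, coefs, t_stat and p_value are illustrative and not part of the original solution.

```r
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

fit_q1 <- lm(y ~ x)
coefs <- summary(fit_q1)$coefficients

# t statistic for H0: beta1 = 0 is the estimate divided by its standard error
t_stat <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# Two-sided P-value: twice the upper tail area beyond |t| on n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit_q1), lower.tail = FALSE)
p_value
```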

q2

Consider the previous problem, give the estimate of the residual standard deviation.

round(summary(fit)$sigma,3)
## [1] 0.223
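
The residual standard deviation can also be computed directly from the residuals, which makes the n - 2 divisor explicit. A small sketch under the same data; fit_q2 and sigma_by_hand are names introduced here for illustration only.

```r
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

fit_q2 <- lm(y ~ x)

# Residual sum of squares divided by the residual degrees of freedom (n - 2), then square root
sigma_by_hand <- sqrt(sum(resid(fit_q2)^2) / df.residual(fit_q2))

# Compare with the value stored in the model summary
c(by_hand = sigma_by_hand, from_summary = summary(fit_q2)$sigma)
```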

q3

In the mtcars data set, fit a linear regression model of weight (predictor) on mpg (outcome). Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?
First, plot mpg against wt for an overview of the relationship.

g <- ggplot(mtcars, aes(x = wt, y = mpg))
g <- g + geom_point(size = 7, colour = "black", alpha = 0.5)
g + geom_smooth(method = "lm", colour = "red")

- Solution for question 3

library(datasets)
data("mtcars")

fit3 <- lm(mpg ~ I(wt-mean(wt)), mtcars)
(sumCoef <- summary(fit3)$coef)
##                   Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)      20.090625   0.538441 37.312586 1.062610e-26
## I(wt - mean(wt)) -5.344472   0.559101 -9.559044 1.293959e-10
sumCoef[1,1] + c(-1,1)*qt(.975, df = fit3$df.residual)*sumCoef[1,2]
## [1] 18.99098 21.19027
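# or, equivalently, with predict(); note that I(wt - mean(wt)) is re-evaluated on newdata,
# so with a single row the centered term is 0 and the result is the interval for the intercept
predict(fit3, newdata = data.frame(wt = mean(mtcars$wt)), interval = "confidence")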
##        fit      lwr      upr
## 1 20.09062 18.99098 21.19027
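
At the average weight the fitted value equals the average mpg, and its standard error reduces to the residual standard deviation divided by the square root of n. The sketch below checks this against an uncentered fit; mt, fit_q3, se_mean and ci_mean are illustrative names, not part of the original solution.

```r
mt <- datasets::mtcars
fit_q3 <- lm(mpg ~ wt, data = mt)
n <- nrow(mt)

# Standard error of the fitted line at x0 = mean(wt): sigma / sqrt(n)
se_mean <- summary(fit_q3)$sigma / sqrt(n)

# 95% confidence interval for the expected mpg at the average weight
ci_mean <- mean(mt$mpg) + c(-1, 1) * qt(0.975, df = n - 2) * se_mean
ci_mean
```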

q4

Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?

fit4 <- lm(mpg~wt, mtcars)
summary(fit4)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10

- The estimated expected change in mpg per 1,000 lb increase in weight.

q5

Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (1,000 lbs). A new car is coming weighing 3000 pounds. Construct a 95% prediction interval for its mpg. What is the upper endpoint?

fit5 <- lm(mpg ~ wt, mtcars)
# Confidence interval for the expected mpg at wt = 3 (for comparison only; not the answer)
p1 <- predict(fit5, newdata = data.frame(wt = 3), interval = "confidence")

# Prediction interval for a new car at wt = 3
predict(fit5, newdata = data.frame(wt = 3), interval = "prediction")
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355
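
The prediction interval is wider than the confidence interval because its standard error carries an extra "1 +" term for the noise of a single new observation. A hedged sketch of the textbook formula, with illustrative names (mt, fit_q5, x0, sxx, se_pred, pred_int) that do not appear in the original solution:

```r
mt <- datasets::mtcars
fit_q5 <- lm(mpg ~ wt, data = mt)

x0 <- 3                                   # new car: 3,000 lbs, in units of 1,000 lbs
n <- nrow(mt)
sxx <- sum((mt$wt - mean(mt$wt))^2)       # sum of squared deviations of the predictor

# Point prediction: intercept + slope * x0
y0_hat <- sum(coef(fit_q5) * c(1, x0))

# Standard error for a new observation: note the leading 1 inside the square root
se_pred <- summary(fit_q5)$sigma * sqrt(1 + 1 / n + (x0 - mean(mt$wt))^2 / sxx)

pred_int <- y0_hat + c(-1, 1) * qt(0.975, df = df.residual(fit_q5)) * se_pred
pred_int
```

Dropping the leading 1 from the square root gives the confidence interval for the expected mpg at the same weight.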

Plot the confidence and prediction bands for an overview.

x <- mtcars$wt
y <- mtcars$mpg
fit <- lm(y~x)

newx <- data.frame(x = seq(min(x), max(x), length.out = 100))
p1 <- data.frame(predict(fit, newdata = newx, interval = "confidence"))
p2 <- data.frame(predict(fit, newdata = newx, interval = "prediction"))
p1$interval <- "confidence"
p2$interval <- "prediction"
p1$x <- newx$x
p2$x <- newx$x
dat <- rbind(p1, p2)
names(dat)[1] <- "y"
g <- ggplot(dat, aes(x = x, y = y)) + labs(x = "weight", y = "mpg")
g <- g + geom_ribbon(aes(ymin = lwr, ymax = upr, fill = interval), alpha = 0.2)
g <- g + geom_line(colour = "red", lwd = 2)
g <- g + geom_point(data = data.frame(x = x, y = y), aes(x = x, y = y), size = 4)
g

q6

Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.

library(dplyr)
mtcars = mutate(mtcars, short_ton = wt/2)
fit6 <- lm(mpg ~ short_ton, mtcars)
sumCoef6 <- summary(fit6)$coef
sumCoef6[2,1] + c(-1,1)*qt(.975, df = fit6$df.residual)*sumCoef6[2,2]
## [1] -12.97262  -8.40527
# Note: the question asks for the expected change in mpg per 1 short ton increase, so the interval is built from the slope (row 2 of the coefficient table), not the intercept.

# OR

fit6 <- lm(mpg ~ I(wt/2), mtcars)
sumCoef6 <- summary(fit6)$coef
sumCoef6[2,1] + c(-1,1)*qt(.975, df = fit6$df.residual)*sumCoef6[2,2]
## [1] -12.97262  -8.40527
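
Since one short ton is two units of wt, the same interval should also follow from doubling the confidence interval for the original wt slope. A short sketch using confint() from the stats package; fit_q6_wt and ci_short_ton are names introduced here for illustration.

```r
# Fit on the original scale (wt in 1,000 lbs), using the unmodified built-in data
fit_q6_wt <- lm(mpg ~ wt, data = datasets::mtcars)

# 95% CI for the wt slope, rescaled to a 2,000 lb (short ton) increase
ci_short_ton <- 2 * confint(fit_q6_wt, "wt", level = 0.95)
ci_short_ton
```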

q7

If my X from a linear regression is measured in centimeters and I convert it to meters what would happen to the slope coefficient?
- Let's look at an example.

# x in centimeters
x_cm <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

fit7_cm <- lm(y~x_cm)
# the slope is
summary(fit7_cm)$coef[2,1]
## [1] 0.7224211
# x in meters
x_m <- x_cm/100
fit7_m <- lm(y~x_m)
# the slope is
summary(fit7_m)$coef[2,1]
## [1] 72.24211

- It would get multiplied by 100
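
One way to see why, using the standard least-squares slope formula \(\hat\beta_1 = Cor(Y, X)\,Sd(Y)/Sd(X)\): converting to meters replaces X with X/100, which leaves the correlation unchanged and divides the standard deviation of X by 100, so \[\hat\beta_1^{(m)} = Cor(Y, X/100)\frac{Sd(Y)}{Sd(X/100)} = Cor(Y, X)\frac{Sd(Y)}{Sd(X)/100} = 100\,\hat\beta_1^{(cm)}\] Here the superscripts (m) and (cm) are labels introduced only to tell the two fits apart.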

q8

I have an outcome, Y, and a predictor, X and fit a linear regression model with \(Y = \beta_0 + \beta_1x + \epsilon\) to obtain \(\beta_0\) and \(\beta_1\). What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, X+c for some constant, c?
- Rewrite the model in terms of the new regressor: \[Y = \beta_0 + \beta_1x + \epsilon = (\beta_0 - c\beta_1) + \beta_1(x+c) + \epsilon\]

- The slope is unchanged and the new intercept would be \(\beta_0 - c\beta_1\)
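
A quick numerical check of this result, reusing the data from q1. The shift c_shift = 10 is an arbitrary value chosen for illustration, and fit_orig and fit_shift are hypothetical names.

```r
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

c_shift <- 10                      # arbitrary constant added to the regressor
x_shifted <- x + c_shift

fit_orig <- lm(y ~ x)
fit_shift <- lm(y ~ x_shifted)

# Slopes should agree; intercepts should differ by c_shift * slope
rbind(original = coef(fit_orig), shifted = coef(fit_shift))
unname(coef(fit_orig)[1] - c_shift * coef(fit_orig)[2])
```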

q9

Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum_{i = 1}^{n} (Y_i - \hat{Y}_i)^2\), when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?

fit9 <- lm(mpg ~ wt, mtcars)
round(sum(resid(fit9)^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2),2)
## [1] 0.25
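
This ratio is one minus R squared for the model with the slope, which can be confirmed by fitting the intercept-only model explicitly. A minimal sketch; fit_slope, fit_null and ratio are illustrative names, and deviance() is used because it returns the residual sum of squares for an lm fit.

```r
mt <- datasets::mtcars
fit_slope <- lm(mpg ~ wt, data = mt)   # intercept and slope
fit_null <- lm(mpg ~ 1, data = mt)     # intercept only

# Ratio of residual sums of squares: slope model over intercept-only model
ratio <- deviance(fit_slope) / deviance(fit_null)

c(ratio = ratio, one_minus_r2 = 1 - summary(fit_slope)$r.squared)
```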

q10

Do the residuals always have to sum to 0 in linear regression?

- If an intercept is included, then they will sum to 0.
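
To illustrate, the sketch below compares the residual sums for fits with and without an intercept on the q1 data. The names fit_with and fit_without are introduced here for illustration.

```r
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

fit_with <- lm(y ~ x)         # intercept included
fit_without <- lm(y ~ x - 1)  # regression through the origin

# With an intercept the residuals sum to zero up to floating-point error;
# without one they generally do not
c(with_intercept = sum(resid(fit_with)),
  without_intercept = sum(resid(fit_without)))
```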