Question 1

Consider the following data with x as the predictor and y as the outcome.

x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

Give a P-value for the two sided hypothesis test of whether \(\beta_1\) from a linear regression model is 0 or not.

Answer. From the linear model, using x as the predictor and y as the outcome:

fit <- lm(y ~ x)
summary(fit)$coefficients
##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept) 0.1884572  0.2061290 0.9142681 0.39098029
## x           0.7224211  0.3106531 2.3254912 0.05296439

As we can see from the x row of the coefficient table, the answer is 0.05296.
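
As a sanity check, the same p-value can be computed by hand from the standard simple-regression formulas (a sketch; the helper variables below are our own, not part of the question):

n <- length(x)
beta1 <- cor(x, y) * sd(y) / sd(x)          # slope estimate
beta0 <- mean(y) - beta1 * mean(x)          # intercept estimate
e <- y - beta0 - beta1 * x                  # residuals
sigma <- sqrt(sum(e^2) / (n - 2))           # residual standard deviation
se1 <- sigma / sqrt(sum((x - mean(x))^2))   # standard error of beta1
2 * pt(abs(beta1 / se1), df = n - 2, lower.tail = FALSE)
## [1] 0.05296439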

Question 2

Consider the previous problem, give the estimate of the residual standard deviation.

Answer. This can be read from the output of summary(fit) or obtained directly with the command summary(fit)$sigma.

Either way, the answer is 0.223.
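
Equivalently, the residual standard deviation is the square root of the residual sum of squares divided by its degrees of freedom, \(n - 2\) (a quick check, reusing fit from Question 1):

n <- length(y)
sqrt(sum(resid(fit)^2) / (n - 2))   # matches summary(fit)$sigma, about 0.223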

Question 3

In the mtcars data set, fit a linear regression model with weight as the predictor and mpg as the outcome. Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?

Answer. Let us first load the data set into R. We then fit the linear model with the function lm and compute the confidence interval for the expected mpg at the average weight using the predict method, setting the argument interval to “confidence”.

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
carlm <- lm(mpg ~ wt, data = mtcars)
meanvalue <- mean(mtcars$wt)
predict(carlm, newdata=data.frame(wt = meanvalue), interval = "confidence")
##        fit      lwr      upr
## 1 20.09062 18.99098 21.19027

The lower value is therefore 18.991.
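
At the mean of the predictor, the fitted value is simply the mean of the outcome and its standard error reduces to \(\hat{\sigma}/\sqrt{n}\), so the interval can also be built by hand (a sketch):

n <- nrow(mtcars)
s <- summary(carlm)$sigma
mean(mtcars$mpg) + c(-1, 1) * qt(0.975, df = n - 2) * s / sqrt(n)
## [1] 18.99098 21.19027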

Question 4

Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?

Answer.

Typing ?mtcars we can read in the help file that wt represents the weight in 1,000 lbs. So, as the model uses wt as the predictor for miles per gallon, the weight coefficient is interpreted as the estimated expected change in mpg per 1,000 lb increase in weight.

Question 5

Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A new car weighing 3,000 pounds is coming out. Construct a 95% prediction interval for its mpg. What is the upper endpoint?

Answer.

This is a simple application of the predict method with wt = 3 (3,000 lbs on the 1,000-lb scale) and interval = "prediction".

predict(carlm, newdata = data.frame(wt = 3), interval = "prediction")
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355

Therefore, the upper endpoint is 27.57.
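
The interval can also be reproduced from the prediction standard error, \(\hat{\sigma}\sqrt{1 + 1/n + (x_0 - \bar{x})^2 / \sum_{i=1}^n (x_i - \bar{x})^2}\) (a sketch, with our own helper variables):

n <- nrow(mtcars)
x0 <- 3
xbar <- mean(mtcars$wt)
sxx <- sum((mtcars$wt - xbar)^2)
s <- summary(carlm)$sigma
yhat <- unname(predict(carlm, newdata = data.frame(wt = x0)))
yhat + c(-1, 1) * qt(0.975, df = n - 2) * s * sqrt(1 + 1/n + (x0 - xbar)^2 / sxx)
## [1] 14.92987 27.57355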

Question 6

Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.

Answer.

Since a short ton is 2,000 lbs, i.e. two units of wt, it suffices to rescale the predictor by a factor of 2.

newlm <- lm(mpg ~ I(wt / 2), data = mtcars)
summary(newlm)
## 
## Call:
## lm(formula = mpg ~ I(wt/2), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   37.285      1.878  19.858  < 2e-16 ***
## I(wt/2)      -10.689      1.118  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
confint(newlm)
##                 2.5 %   97.5 %
## (Intercept)  33.45050 41.11975
## I(wt/2)     -12.97262 -8.40527

The lower endpoint is therefore -12.973.
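
Equivalently, the per-short-ton interval is just twice the per-1,000-lb interval from the original fit:

confint(carlm, "wt") * 2   # reproduces the I(wt/2) row above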

Question 7

If my X from a linear regression is measured in centimeters and I convert it to meters, what would happen to the slope coefficient?

Answer.

When you multiply a regression variable by a factor, the slope is divided by that same factor. Converting centimeters to meters multiplies x by 1/100, so \(\beta_1\) is divided by 1/100, which is the same as multiplying it by 100. Hence, the slope coefficient would be multiplied by 100.
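
A quick simulated check (the data and the names x_cm and x_m are hypothetical, not part of the question):

set.seed(1)
x_cm <- rnorm(100, mean = 170, sd = 10)   # predictor in centimeters
y_sim <- 2 + 0.05 * x_cm + rnorm(100)
x_m <- x_cm / 100                         # same predictor in meters
coef(lm(y_sim ~ x_cm))["x_cm"] * 100      # slope per cm, times 100...
coef(lm(y_sim ~ x_m))["x_m"]              # ...equals the slope per meter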

Question 8

I have an outcome, \(Y\), and a predictor, \(X\), and fit a linear regression model with \(Y = \beta_0 + \beta_1 X + \varepsilon\) to obtain \(\hat{\beta}_0\) and \(\hat{\beta}_1\). What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, \(X+c\), for some constant, \(c\)?

Answer.

Suffice it to see that if \(Y = \beta_0 + \beta_1 X + \varepsilon\), then \(Y = (\beta_0 - c\beta_1) + \beta_1(X + c) + \varepsilon\). This shows that the slope remains \(\beta_1\) and the intercept becomes \(\beta_0 - c\beta_1\). Hence, the new slope would be \(\hat{\beta}_1\) and the new intercept would be \(\hat{\beta}_0 - c\hat{\beta}_1\).
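
A simulated check (toy data; the shift cval is our own choice):

set.seed(2)
x_sim <- rnorm(50)
y_sim <- 1 + 2 * x_sim + rnorm(50)
cval <- 5
coef(lm(y_sim ~ x_sim))             # original intercept and slope
coef(lm(y_sim ~ I(x_sim + cval)))   # slope unchanged; intercept is beta0 - cval * beta1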

Question 9

Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum_{i=1}^n (Y_i - \hat{Y}_i)^2\), when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?

Answer.

Intercept_Slope <- lm(mpg ~ wt, data = mtcars)   # intercept and slope
Just_Intercept <- lm(mpg ~ 1, data = mtcars)     # intercept only
num <- sum((predict(Intercept_Slope) - mtcars$mpg)^2)   # SSE of the full model
den <- sum((predict(Just_Intercept) - mtcars$mpg)^2)    # SSE of the intercept-only model
num / den
## [1] 0.2471672

So the answer is 0.25.
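
Note that this ratio is exactly \(1 - R^2\), since \(R^2 = 1 - \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 / \sum_{i=1}^n (Y_i - \bar{Y})^2\):

1 - summary(Intercept_Slope)$r.squared
## [1] 0.2471672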

Question 10

Do the residuals always have to sum to 0 in linear regression?

Answer.

No, not always. The residuals are guaranteed to sum to 0 only when the model includes an intercept: the least-squares normal equations make the residuals orthogonal to every column of the design matrix, including the constant column. If the model is fit without an intercept, the residuals need not sum to 0.
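
A simulated illustration (toy data of our own):

set.seed(3)
x_sim <- runif(20)
y_sim <- 1 + x_sim + rnorm(20)
sum(resid(lm(y_sim ~ x_sim)))       # essentially 0 (up to floating point)
sum(resid(lm(y_sim ~ x_sim - 1)))   # no intercept: generally nonzero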