Consider the following data with x as the predictor and
y as the outcome.
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
Give a P-value for the two-sided hypothesis test of whether \(\beta_1\) from a linear regression model is 0 or not.
Answer. From the linear model, using x
as the predictor and y as the outcome:
fit <- lm(y ~ x)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1884572 0.2061290 0.9142681 0.39098029
## x 0.7224211 0.3106531 2.3254912 0.05296439
As we can see, the answer is 0.05296.
Consider the previous problem, give the estimate of the residual standard deviation.
Answer. This can be read off the output of
summary(fit) (the “Residual standard error”) or, equivalently, obtained
directly with summary(fit)$sigma.
Either way, the answer is 0.223.
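As a minimal check using the fit above, the same value can also be recomputed by hand from the residuals; both lines below should agree, at about 0.223.
summary(fit)$sigma
# equivalent manual computation: square root of the SSE divided by the residual degrees of freedom
sqrt(sum(resid(fit)^2) / fit$df.residual)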
In the mtcars data set, fit a linear regression model of
weight (predictor) on mpg (outcome). Get a 95%
confidence interval for the expected mpg at the average weight. What is
the lower endpoint?
Answer. Let us first load the data set into R. We then
fit the linear model with the function lm and
get a confidence interval for the expected mpg at the average weight using the
predict method, setting the argument interval
to “confidence”.
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
carlm <- lm(mpg ~ wt, data = mtcars)
meanvalue <- mean(mtcars$wt)
predict(carlm, newdata=data.frame(wt = meanvalue), interval = "confidence")
## fit lwr upr
## 1 20.09062 18.99098 21.19027
The lower endpoint is therefore 18.991.
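As a quick sanity check (not part of the original question), the least-squares line passes through the point of means, so the fitted value at the average weight should equal the sample mean of mpg:
mean(mtcars$mpg)   # about 20.09, matching the fitted value above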
Refer to the previous question. Read the help file for
mtcars. What is the weight coefficient interpreted as?
Answer.
Typing ?mtcars we can read in the help file that
wt is the weight in thousands of pounds. Since the
model uses wt as the predictor for miles per gallon, the weight coefficient
is interpreted as the expected change in mpg per 1,000 lb increase in weight.
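To tie this to the carlm fit from the previous question, the coefficient itself can be pulled out and read with that unit in mind (purely an illustration):
coef(carlm)["wt"]   # roughly -5.34: expected change in mpg per 1,000 lb increase in weight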
Consider again the mtcars data set and a linear
regression model with mpg as predicted by weight (1,000 lbs). A new car
is coming weighing 3000 pounds. Construct a 95% prediction interval for
its mpg. What is the upper endpoint?
Answer.
This is a simple application of the predict method:
a car weighing 3,000 pounds corresponds to wt = 3.
predict(carlm, newdata = data.frame(wt = 3), interval = "prediction")
## fit lwr upr
## 1 21.25171 14.92987 27.57355
Therefore, the upper endpoint is 27.57.
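For contrast (an aside, not required by the question), the confidence interval at the same weight is narrower, since it does not include the variability of an individual new observation:
predict(carlm, newdata = data.frame(wt = 3), interval = "confidence")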
Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.
Answer.
It suffices to scale the predictor by a factor of 2.
newlm <- lm(mpg ~ I(wt / 2), data = mtcars)
summary(newlm)
##
## Call:
## lm(formula = mpg ~ I(wt/2), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.285 1.878 19.858 < 2e-16 ***
## I(wt/2) -10.689 1.118 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
confint(newlm)
## 2.5 % 97.5 %
## (Intercept) 33.45050 41.11975
## I(wt/2) -12.97262 -8.40527
The lower endpoint is therefore -12.973.
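Equivalently, since halving the predictor doubles both the coefficient and its confidence limits, the same interval can be obtained by doubling the wt row of confint from the original fit (a quick check, assuming carlm from above):
confint(carlm)["wt", ] * 2   # lower endpoint about -12.97, matching the interval above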
If my X from a linear regression is measured in centimeters and I convert it to meters, what would happen to the slope coefficient?
Answer.
When you multiply a regression predictor by a constant factor, the slope is
divided by that same factor. Converting centimeters to meters multiplies x by
1/100, so \(\beta_1\) is divided by 1/100, which is the same as multiplying it
by 100. Hence, the slope coefficient would be multiplied by 100.
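As a purely illustrative numerical check, reusing the x and y vectors from the first question, refitting with the predictor divided by 100 multiplies the slope by 100:
coef(lm(y ~ x))[2]            # original slope, about 0.722
coef(lm(y ~ I(x / 100)))[2]   # slope on the rescaled predictor, about 72.2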
I have an outcome, \(Y\), and a predictor, \(X\), and fit a linear regression model with \(Y = \beta_0 + \beta_1 X + \varepsilon\) to obtain \(\hat{\beta}_0\) and \(\hat{\beta}_1\). What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, \(X+c\), for some constant, \(c\)?
Answer.
Suffice it to see that if \(Y = \beta_0 + \beta_1 X + \varepsilon\), then \(Y = \beta_0 - c\beta_1 + \beta_1(X + c) + \varepsilon\). This shows that the slope remains \(\beta_1\) and the intercept becomes \(\beta_0 - c\beta_1\). Hence, the fitted slope is unchanged and the fitted intercept becomes \(\hat{\beta}_0 - c\hat{\beta}_1\).
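This too can be verified numerically on the toy data from the first question (the shift of 5 below is arbitrary and used only for illustration):
c0 <- 5                      # arbitrary shift, chosen only for this illustration
coef(lm(y ~ x))              # intercept about 0.188, slope about 0.722
coef(lm(y ~ I(x + c0)))      # slope unchanged; intercept about 0.188 - 5 * 0.722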
Refer back to the mtcars data set with mpg as an outcome
and weight (wt) as the predictor. About what is the ratio of the sum
of the squared errors, \(\sum_{i=1}^n (Y_i -
\hat{Y}_i)^2\), when comparing a model with just an intercept
(denominator) to the model with the intercept and slope (numerator)?
Answer.
Intercept_Slope <- lm(mpg ~ wt, data = mtcars)
Just_Intercept <- lm(mpg ~ 1, data =mtcars)
num <- sum((predict(Intercept_Slope) - mtcars$mpg)^2)
den <- sum((predict(Just_Intercept) - mtcars$mpg)^2)
num / den
## [1] 0.2471672
So the answer is 0.25.
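As a sanity check, this ratio is exactly 1 minus the R-squared of the slope model, consistent with the Multiple R-squared of 0.7528 reported earlier:
1 - summary(Intercept_Slope)$r.squared   # about 0.247, the same ratio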
Do the residuals always have to sum to 0 in linear regression?
Answer.
Only if the model includes an intercept. With an intercept term, the least-squares fit forces the residuals to sum to zero; if the intercept is omitted, the residuals need not sum to zero.
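A brief illustration with the mtcars data (the intercept-free fit below is included only for contrast):
sum(resid(lm(mpg ~ wt, data = mtcars)))      # essentially zero, up to numerical error
sum(resid(lm(mpg ~ wt - 1, data = mtcars)))  # generally not zero once the intercept is dropped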