Consider the following data with x as the predictor and y as the outcome.
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
Give a P-value for the two-sided hypothesis test of whether \(\beta_{1}\) from a linear regression model is 0 or not.
Fit the linear regression model, then read the P-value off the summary:
fit <- lm(y ~ x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.27636 -0.18807  0.01364  0.16595  0.27143
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.1885     0.2061   0.914    0.391
## x             0.7224     0.3107   2.325    0.053 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.223 on 7 degrees of freedom
## Multiple R-squared: 0.4358, Adjusted R-squared: 0.3552
## F-statistic: 5.408 on 1 and 7 DF, p-value: 0.05296
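The slope P-value can also be pulled straight out of the coefficient table (row 2, column Pr(>|t|)):
# about 0.053, matching the table above
summary(fit)$coefficients[2, 4]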
Consider the previous problem. Give the estimate of the residual standard deviation.
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.27636 -0.18807  0.01364  0.16595  0.27143
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.1885     0.2061   0.914    0.391
## x             0.7224     0.3107   2.325    0.053 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.223 on 7 degrees of freedom
## Multiple R-squared: 0.4358, Adjusted R-squared: 0.3552
## F-statistic: 5.408 on 1 and 7 DF, p-value: 0.05296
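The answer is the “Residual standard error”, 0.223. It can also be extracted directly:
# sigma is the residual standard deviation, about 0.223 as reported above
summary(fit)$sigma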
In the mtcars data set, fit a linear regression model with weight as the predictor and mpg as the outcome. Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?
data("mtcars")
y <- mtcars$mpg
x <- mtcars$wt
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = mean(x)), interval = "confidence")
##        fit      lwr      upr
## 1 20.09062 18.99098 21.19027
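As a check, at \(x = \bar{x}\) the fitted line passes through \((\bar{x}, \bar{y})\) and the standard error reduces to \(\hat\sigma/\sqrt{n}\), so the interval is \(\bar{y} \pm t_{.975, n-2}\,\hat\sigma/\sqrt{n}\):
# reproduces the interval above: 18.99098 to 21.19027
mean(y) + c(-1, 1) * qt(.975, df = fit$df.residual) * summary(fit)$sigma / sqrt(length(y))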
Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?
From the help file:
mpg  Miles/(US) gallon
wt   Weight (1000 lbs)
The units are mpg per 1,000 lbs: the coefficient is the expected change in mpg for every 1,000 lb increase in vehicle weight.
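For reference, the estimated coefficient from the fit above:
# about -5.34: each additional 1,000 lbs is associated with roughly 5.34 fewer mpg
fit$coefficients[2]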
Consider again the mtcars data set and a linear regression with mpg as predicted by weight (1,000 lbs). A new car is coming that weighs 3,000 pounds. Construct a 95% prediction interval for its mpg. What is the upper endpoint?
Using predict, get the prediction interval:
newCarWeight <- 3000 / 1000
predict(fit, newdata = data.frame(x = newCarWeight), interval = "prediction")
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355
Or, calculating manually:
sigma <- summary(fit)$sigma
yhat <- fit$coef[1] + fit$coef[2] * newCarWeight
# prediction interval: yhat +/- t * sigma * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)
predicted_tValues <- yhat + c(-1, 1) * qt(.975, df = fit$df.residual) * sigma *
  sqrt(1 + (1 / length(y)) + ((newCarWeight - mean(x))^2 / sum((x - mean(x))^2)))
predicted_tValues
## [1] 14.92987 27.57355
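Dropping the leading 1 inside the square root gives the narrower confidence interval for the mean mpg at this weight; a sketch for contrast (its values are not shown above):
# confidence interval for the mean response: no leading "1 +" term
yhat + c(-1, 1) * qt(.975, df = fit$df.residual) * sigma *
  sqrt((1 / length(y)) + ((newCarWeight - mean(x))^2 / sum((x - mean(x))^2)))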
Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.
Refit the model with weight measured in short tons:
shortTonWeight <- 2000/1000
fit_shortTon <- lm(y ~ I(x/shortTonWeight))
Use the slope to calculate the change in mpg
sumCoef <- coef(summary(fit_shortTon))
or, equivalently:
sumCoef <- summary(fit_shortTon)$coefficients
Now calculate:
sumCoef[2,1] + c(-1, 1) * qt(.975, df = fit_shortTon$df.residual) * sumCoef[2,2]
## [1] -12.97262 -8.40527
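Equivalently, a short ton is two of the original 1,000-lb units, so doubling the slope interval from confint gives the same answer:
# about -12.97 to -8.41, matching the interval above
2 * confint(fit)[2, ]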
If my X from a linear regression is measured in centimeters and I convert it to meters, what would happen to the slope coefficient?
The conversion is 100 cm = 1 m. Demonstrate with simulated data (no seed is set, so the particular draws below will vary):
x <- round(runif(n = 10, min = 0.1, max = 0.99),2)
x
## [1] 0.75 0.70 0.59 0.62 0.89 0.30 0.61 0.65 0.36 0.95
y <- round(runif(n = 10, min = 0.1, max = 0.99),2)
y
## [1] 0.34 0.22 0.55 0.76 0.65 0.44 0.45 0.27 0.86 0.93
fit <- lm(y ~ x)
fit$coefficients[2]
## x
## 0.1251075
Now convert x to meters (divide by 100); multiplying the regressor by 1/100 multiplies the slope by 100:
fit <- lm(y ~ I(x/100))
fit$coefficients[2]
## I(x/100)
## 12.51075
I have an outcome, \(Y\), and a predictor, \(X\), and I fit a linear regression model \(Y = \beta_{0} + \beta_{1}X + \epsilon\) to obtain \(\hat\beta_{0}\) and \(\hat\beta_{1}\). What would be the consequence for the subsequent slope and intercept if I were to refit the model with a new regressor, \(X + c\), for some constant \(c\)? Again, demonstrate with simulated data:
x <- round(runif(n = 10, min = 0.1, max = 0.99),2)
x
## [1] 0.87 0.56 0.69 0.66 0.27 0.25 0.92 0.91 0.47 0.32
y <- round(runif(n = 10, min = 0.1, max = 0.99),2)
y
## [1] 0.71 0.16 0.64 0.91 0.20 0.53 0.68 0.45 0.62 0.47
fit <- lm(y ~ x)
fit$coefficients
## (Intercept) x
## 0.3103232 0.3829000
Now refit with the constant added to the regressor, so the model becomes
\(Y = \beta_{0} + \beta_{1}(X + c) + \epsilon\)
Since \(\beta_{0} + \beta_{1}X = (\beta_{0} - \beta_{1}c) + \beta_{1}(X + c)\), the slope is unchanged and the new intercept is \(\hat\beta_{0} - c\hat\beta_{1}\):
c <- 100
fit_c <- lm(y ~ I(x + c))
fit_c$coefficients
## (Intercept) I(x + c)
## -37.97968 0.38290
# the original intercept shifted by -c * slope matches the new intercept above
fit$coefficients[1] - c * fit$coefficients[2]
## (Intercept)
## -37.97968
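As a quick sanity check, the two slopes agree exactly:
# difference of the slopes is zero up to numerical error
unname(fit$coefficients[2] - fit_c$coefficients[2])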
Refer back to the mtcars data set with mpg as the outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum^n_{i=1}(Y_{i} - \hat{Y}_{i})^2\), when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?
The denominator is the total sum of squares, so the ratio is simply one minus \(R^2\) for the model with the slope:
data(mtcars)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ 1, data = mtcars)
summary(fit1)$r.squared
## [1] 0.7528328
sse1 <- sum((predict(fit1) - mtcars$mpg)^2)
sse2 <- sum((predict(fit2) - mtcars$mpg)^2)
sse1/sse2
## [1] 0.2471672
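This matches one minus the \(R^2\) reported above:
# 1 - 0.7528328 = 0.2471672, the same ratio
1 - summary(fit1)$r.squared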
Do the residuals always have to sum to 0 in linear regression?
Only if the model includes an intercept: the least-squares normal equations then force the residuals to sum to zero. Without an intercept there is no such guarantee.
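A minimal check using the mtcars fits from above:
# with an intercept, the residuals sum to zero (up to numerical error)
sum(resid(fit1))
# with the intercept removed, the sum is generally nonzero
fit0 <- lm(mpg ~ wt - 1, data = mtcars)
sum(resid(fit0))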