Consider the following data with x as the predictor and y as the outcome.
Give a P-value for the two-sided hypothesis test of whether β1 from a linear regression model is 0 or not.
H0: β1 = 0
H1: β1 ≠ 0
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
df1 <- data.frame(x, y)
fit1 <- lm(y ~ x)
fit1
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.1885 0.7224
library(ggplot2)
ggplot(data = df1, aes(x = x, y = y)) + geom_point() + stat_smooth(method = "lm")
yp <- fit1$fitted.values
# Note: this t-test compares the mean of y with the mean of the fitted values,
# which are equal by construction, so it always gives t = 0 and p = 1; it does
# not test beta1. The correct slope test follows below.
t.test(y, yp, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: y and yp
## t = 0, df = 13.86, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2381402 0.2381402
## sample estimates:
## mean of x mean of y
## 0.6355556 0.6355556
beta <- fit1$coefficients[2]               # slope estimate
se <- summary(fit1)$coefficients[2, 2]     # standard error of the slope
Tvalue <- (beta - 0)/se
pvalue <- pt(abs(Tvalue), length(x)-2, lower.tail = FALSE) * 2  # two-sided
pvalue
## x
## 0.05296439
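As a check, the same P-value can be read directly from the coefficient table of summary(), whose fourth column is Pr(>|t|):
summary(fit1)$coefficients[2, 4]
## [1] 0.05296439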
Consider the previous problem. Give the estimate of the residual standard deviation.
summary(fit1)$sigma
## [1] 0.2229981
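The same figure can be computed by hand: the residual standard deviation is the square root of the residual sum of squares divided by the residual degrees of freedom, n − 2.
sqrt(sum(residuals(fit1)^2) / (length(x) - 2))
## [1] 0.2229981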
In the 𝚖𝚝𝚌𝚊𝚛𝚜 data set, fit a linear regression model with weight as the predictor and mpg as the outcome. Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?
data (mtcars)
ggplot(data = mtcars, aes(x=wt, y = mpg)) + geom_point() + stat_smooth(method = "lm")
fit3 <- lm(mpg ~ wt, data = mtcars)
summary(fit3)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
new_data <- data.frame(wt = mean(mtcars$wt))
predict(fit3, new_data, interval = "confidence")
## fit lwr upr
## 1 20.09062 18.99098 21.19027
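A minimal by-hand check of the same interval: at the average weight the (x0 − x̄)² term in the standard error vanishes, so the SE of the expected mpg reduces to σ̂/√n (the names n and se_fit below are just for illustration).
n <- nrow(mtcars)
se_fit <- summary(fit3)$sigma / sqrt(n)   # SE of the fit at the mean weight
mean(mtcars$mpg) + c(-1, 1) * qt(0.975, n - 2) * se_fit
## [1] 18.99098 21.19027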
Refer to the previous question. Read the help file for 𝚖𝚝𝚌𝚊𝚛𝚜. What is the weight coefficient interpreted as? The help file defines wt as weight in 1,000 lbs, so the coefficient is the estimated expected change in mpg per 1,000 lb increase in weight.
Consider again the 𝚖𝚝𝚌𝚊𝚛𝚜 data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A new car weighing 3,000 lbs is coming. Construct a 95% prediction interval for its mpg. What is the upper endpoint?
new_data5 <- data.frame(wt = 3)
predict(fit3, new_data5, interval = "prediction")
## fit lwr upr
## 1 21.25171 14.92987 27.57355
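The same interval by hand, using the standard prediction-error formula, which adds 1 under the square root for the variability of a new observation (x0, sxx, and se_pred are illustrative names):
x0 <- 3
n <- nrow(mtcars)
sxx <- sum((mtcars$wt - mean(mtcars$wt))^2)
se_pred <- summary(fit3)$sigma * sqrt(1 + 1/n + (x0 - mean(mtcars$wt))^2 / sxx)
predict(fit3, data.frame(wt = x0))[[1]] + c(-1, 1) * qt(0.975, n - 2) * se_pred
## [1] 14.92987 27.57355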
Consider again the 𝚖𝚝𝚌𝚊𝚛𝚜 data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.
sumfit3 <- summary(fit3)
n6 <- length(mtcars$wt)
# Lower endpoint of the 95% CI for the slope (per 1,000 lbs), doubled
# because a short ton is 2 units of wt:
(sumfit3$coefficients[2, 1] + sumfit3$coefficients[2, 2] * qt(0.025, n6 - 2)) * 2
## [1] -12.97262
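Equivalently, confint() returns the 95% interval for the slope per 1,000 lbs, which can be doubled to get the per-short-ton interval:
confint(fit3)["wt", ] * 2   # lower endpoint matches -12.97262 above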
If my X from a linear regression is measured in centimeters and I convert it to meters, what would happen to the slope coefficient? It gets multiplied by 100: a one-meter change is a 100-centimeter change, so the per-unit effect is 100 times larger. The code below illustrates the scaling behavior by multiplying the outcome by 100 instead, which scales both coefficients by 100; a direct demonstration on the predictor follows after it.
# Scaling the outcome by 100 multiplies both coefficients by 100:
newmpg <- mtcars$mpg * 100
fit7 <- lm(newmpg ~ mtcars$wt)
fit7
##
## Call:
## lm(formula = newmpg ~ mtcars$wt)
##
## Coefficients:
## (Intercept) mtcars$wt
## 3728.5 -534.4
fit3
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Coefficients:
## (Intercept) wt
## 37.285 -5.344
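A direct demonstration on the predictor: dividing X by 100 (exactly what converting centimeters to meters does) multiplies the slope by 100 while leaving the intercept unchanged. The wt variable is reused here purely as an illustration, and fit8 is an ad-hoc name.
fit8 <- lm(mpg ~ I(wt / 100), data = mtcars)
coef(fit8)   # intercept unchanged at 37.285; slope is 100 * -5.344 = -534.4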
Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?
fit9 <- lm(mpg ~ wt - 1, data = mtcars)   # regression through the origin
fit91 <- lm(mpg ~ 1, data = mtcars)       # intercept-only model
plot9 <- ggplot(data = mtcars, aes(x = wt, y = mpg))
plot9 + geom_point() + geom_smooth(method = "lm", se = FALSE) +
  geom_abline(slope = fit9$coefficients[[1]], col = "green") +
  geom_hline(yintercept = fit91$coefficients[[1]], col = "red")
# SSE of the intercept-and-slope model over the SSE of the intercept-only
# model, which is the total sum of squares around the mean:
sum(residuals(fit3)^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2)
## [1] 0.2471672
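This ratio is exactly 1 − R² for the intercept-and-slope model, so it can also be read off the summary:
1 - summary(fit3)$r.squared
## [1] 0.2471672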
Do the residuals always have to sum to 0 in linear regression? No. They are guaranteed to sum to 0 only when the model includes an intercept; without one they generally do not.
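A quick check with the models fit above: residuals of a model with an intercept sum to (numerically) zero, while those of the through-origin model generally do not.
sum(resid(fit3))   # intercept included: essentially zero (up to rounding)
sum(resid(fit9))   # regression through the origin: generally nonzero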