Consider the following data with x as the predictor and y as the outcome.
X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
Give a P-value for the two sided hypothesis test of whether \(β_1\) from a linear regression model is 0 or not.
Solutions
There are two methods to solve this.
1. using the numeric equation given as:
\[\beta_1 = cor(Y, X) \frac{sd(Y)} {sd(X)}\] \[\beta_0 = \bar Y - \beta_1\bar X\]
\[\epsilon = Y - \beta_0\ - \beta_1X\]
\[\sigma = \sqrt\frac{\sum \epsilon^2}{n - 2}\]
\[SSX = {\sum(X -\bar X)^2}\]
\[SE_{\beta_1} = \frac{\sigma}{\sqrt {SSX}}\]
\[t_{\beta_1} = \frac{\beta_1}{SE_{\beta_1}}\]
\[P_{\beta_1} = 2\times pt(abs(t_{\beta_1}), df = n-2, lower.tail = FALSE)\]
2. Using the R function lm
This is the approach used for the solution:
fit <- lm(Y ~ X)
coefTable <- coef(summary(fit))
pval <- coefTable[2, 4]
pval
## [1] 0.05296439
OR
summary(fit)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1884572 0.2061290 0.9142681 0.39098029
## X 0.7224211 0.3106531 2.3254912 0.05296439
summary(fit)$coef[2,4]
## [1] 0.05296439
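The first method can also be carried out by hand. The following is a minimal sketch that applies the equations step by step to the same data; the variable names (`beta1`, `seBeta1`, `pBeta1`, and so on) are illustrative and not part of the original solution. The resulting P-value should agree with the one reported by `lm`.

```r
# Data from the question
X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
n <- length(Y)

# Slope and intercept from the closed-form expressions
beta1 <- cor(Y, X) * sd(Y) / sd(X)
beta0 <- mean(Y) - beta1 * mean(X)

# Residuals and residual standard deviation (n - 2 degrees of freedom)
e <- Y - beta0 - beta1 * X
sigma <- sqrt(sum(e^2) / (n - 2))

# Standard error of the slope, t statistic, and two-sided P-value
ssx <- sum((X - mean(X))^2)
seBeta1 <- sigma / sqrt(ssx)
tBeta1 <- beta1 / seBeta1
pBeta1 <- 2 * pt(abs(tBeta1), df = n - 2, lower.tail = FALSE)
pBeta1
```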
Consider the previous problem. Give the estimate of the residual standard deviation.
The formula for \(\sigma\) listed above can be used for this. Here the R function is used instead:
sigma<- summary(fit)$sigma
The residual standard deviation is 0.223.
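As a cross-check, the formula for \(\sigma\) can be evaluated directly from the residuals of the fitted model. This is a small sketch; `sigmaManual` is an illustrative name, and the two values it prints should coincide.

```r
X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
fit <- lm(Y ~ X)
n <- length(Y)

# Square root of the residual sum of squares over n - 2
sigmaManual <- sqrt(sum(resid(fit)^2) / (n - 2))
c(manual = sigmaManual, fromSummary = summary(fit)$sigma)
```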
In the \(mtcars\) data set, fit a linear regression model of weight (predictor) on mpg (outcome).
Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?
data(mtcars)
fit<- lm(mpg ~ I(wt - mean(wt)), data = mtcars)
confint(fit)
## 2.5 % 97.5 %
## (Intercept) 18.990982 21.190268
## I(wt - mean(wt)) -6.486308 -4.202635
OR
x<- mtcars$wt
y<- mtcars$mpg
predict(lm(y ~ x), newdata = data.frame(x = mean(x)), interval = ("confidence"))
## fit lwr upr
## 1 20.09062 18.99098 21.19027
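The same interval can be reproduced by hand, which shows why centering works: at the average weight the fitted value equals the average mpg, and its standard error reduces to \(\sigma/\sqrt{n}\). The sketch below assumes the standard t-based interval; `ciManual` and `seMean` are illustrative names.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)
fit <- lm(y ~ x)

# At x = mean(x) the regression line passes through mean(y)
yhatAtMean <- mean(y)

# Standard error of the fitted value at the mean of x
seMean <- summary(fit)$sigma / sqrt(n)

# 95% confidence interval using the t quantile with n - 2 degrees of freedom
ciManual <- yhatAtMean + c(-1, 1) * qt(0.975, df = n - 2) * seMean
ciManual
```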
Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?
Solution
Since variable wt has unit (lb/1000), the coefficient is interpreted as the estimated expected change in mpg per 1,000 lb increase in weight.
Consider again the \(mtcars\) data set and a linear regression model with \(mpg\) as predicted by weight (1,000 lbs). A new car is coming weighing 3000 pounds.
Construct a 95% prediction interval for its \(mpg\).
What is the upper endpoint?
predict(lm(y ~ x), newdata = data.frame(x = 3), interval = ("prediction"))
## fit lwr upr
## 1 21.25171 14.92987 27.57355
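A prediction interval is wider than a confidence interval because it also accounts for the noise of a single new observation. The sketch below computes it by hand with the usual standard error \(\sigma\sqrt{1 + 1/n + (x_0 - \bar X)^2 / SSX}\); the names `x0`, `sePred` and `piManual` are illustrative.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)
fit <- lm(y ~ x)

x0 <- 3                                   # 3000 lb in units of 1000 lb
yhat0 <- unname(coef(fit)[1] + coef(fit)[2] * x0)

# Standard error for predicting one new observation at x0
ssx <- sum((x - mean(x))^2)
sePred <- summary(fit)$sigma * sqrt(1 + 1 / n + (x0 - mean(x))^2 / ssx)

# 95% prediction interval
piManual <- yhat0 + c(-1, 1) * qt(0.975, df = n - 2) * sePred
piManual
```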
Consider again the \(mtcars\) data set and a linear regression model with \(mpg\) as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs.
Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight.
Give the lower endpoint.
fit<- lm(y ~ x, mtcars)
confint(fit)[2,]*2
## 2.5 % 97.5 %
## -12.97262 -8.40527
# or equivalently change the unit
fit<- lm(y ~ I(x*0.5))
confint(fit)[2, ]
## 2.5 % 97.5 %
## -12.97262 -8.40527
If my \(X\) from a linear regression is measured in centimeters and I convert it to meters what would happen to the slope coefficient?
Solution
Consider the previous data, x and y. Assume x is measured in centimeters and divide its values by 100 to convert them to meters.
# before converting to meters
fit <- lm(y ~ x)
fit$coefficients[2]
## x
## -5.344472
# after converting to meters
fit<- lm(y ~ I(x/100))
fit$coefficients[2]
## I(x/100)
## -534.4472
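The coefficient is multiplied by 100. A one-unit change in the new regressor (1 m) corresponds to a 100-unit change in the original one (100 cm), so the slope scales by the same factor.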
I have an outcome, \(Y\), and a predictor, \(X\) and fit a linear regression model with \(Y=\beta_0 +\beta_1X+\epsilon\) to obtain \(\beta_0\) and \(\beta_1\). What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, \(X+c\) for some constant, \(c\)?
Solution
Start from \(Y = \beta_0 + \beta_1X +\epsilon\) and add and subtract \(\beta_1c\):
\(Y = \beta_0 -\beta_1c +\beta_1c + \beta_1X +\epsilon\). Grouping the terms that multiply \(\beta_1\) gives
\(Y = \beta_0 -\beta_1c +\beta_1(X +c) +\epsilon\)
Hence, the slope is unchanged and the new intercept would be \(\hat\beta_0 -\hat\beta_1c\).
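This result can be checked numerically. The sketch below uses the \(mtcars\) data and an arbitrary constant; the value `shift <- 10` is a hypothetical choice for \(c\), and any other constant should behave the same way.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
shift <- 10                        # hypothetical constant c

fitOrig <- lm(y ~ x)               # regressor X
fitShift <- lm(y ~ I(x + shift))   # regressor X + c

b <- unname(coef(fitOrig))
bShift <- unname(coef(fitShift))

# The slope should be identical in both fits
c(slopeOrig = b[2], slopeShift = bShift[2])

# The new intercept should equal the old intercept minus slope * c
c(interceptShift = bShift[1], expected = b[1] - b[2] * shift)
```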
Refer back to the \(mtcars\) data set with \(mpg\) as an outcome and weight (\(wt\)) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum_{i=1}^n(Y_i-\hat Y_i)^2\) when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?
Solution
fit0<- lm(y ~ x)
1-summary(fit0)$r.squared
## [1] 0.2471672
#or
fit1<- lm(y ~1)
sse0<- sum((predict(fit0)-y)^2)
sse1<- sum((predict(fit1)-y)^2)
sse0/sse1
## [1] 0.2471672
# or
fit2 <- lm(y ~ x)
sum(resid(fit2)^2) / sum((y - mean(y))^2)
## [1] 0.2471672
Do the residuals always have to sum to 0 in linear regression?
Solution
Take the \(mtcars\) data as an example.
x<- mtcars$wt
y<- mtcars$mpg
fit0 <- lm(y ~ x)
sum(resid(fit0))
## [1] -1.637579e-15
# with no intercept
fit1<- lm(y ~ x - 1)
sum(resid(fit1))
## [1] 98.11672
# intercept plus a constant regressor (equivalent to an intercept-only fit)
fit2<- lm(y ~ rep(1, length(y)))
sum(resid(fit2))
## [1] -5.995204e-15
Hence, the residuals do not always sum to 0. If an intercept is included they will sum to 0; without an intercept they need not.
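The reason is that least squares makes the residuals orthogonal to every column of the design matrix. When one of those columns is the constant 1 (the intercept), orthogonality to it is exactly the statement that the residuals sum to 0. The following sketch illustrates this with `model.matrix` and `crossprod`; the names `crossInt` and `crossNoInt` are illustrative.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# With an intercept: residuals are orthogonal to the column of ones and to x
fitInt <- lm(y ~ x)
crossInt <- drop(crossprod(model.matrix(fitInt), resid(fitInt)))
crossInt      # both entries are zero up to rounding error

# Without an intercept: residuals are orthogonal to x only,
# so nothing forces their plain sum to be zero
fitNoInt <- lm(y ~ x - 1)
crossNoInt <- drop(crossprod(model.matrix(fitNoInt), resid(fitNoInt)))
crossNoInt    # zero up to rounding error
sum(resid(fitNoInt))
```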