Coursera Regression Models Quiz 2

Question 1

Consider the following data with X as the predictor and Y as the outcome.

X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)  
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)

Give a P-value for the two-sided hypothesis test of whether \(\beta_1\) from a linear regression model is 0 or not.

Solutions
There are two ways to solve this.
1. Using the closed-form equations:
\[\beta_1 = cor(Y, X) \frac{sd(Y)}{sd(X)}\] \[\beta_0 = \bar Y - \beta_1\bar X\]
\[\epsilon = Y - \beta_0 - \beta_1X\]
\[\sigma = \sqrt{\frac{\sum \epsilon^2}{n - 2}}\]
\[SSX = \sum(X -\bar X)^2\]
\[SE_{\beta_1} = \frac{\sigma}{\sqrt{SSX}}\]
\[t_{\beta_1} = \frac{\beta_1}{SE_{\beta_1}}\]
\[P_{\beta_1} = 2\times pt(abs(t_{\beta_1}), df = n-2, lower.tail = FALSE)\]
2. Using an R function
This is the method used in the solution:

fit <- lm(Y ~ X)
coefTable <- coef(summary(fit))
pval <- coefTable[2, 4]
pval

## [1] 0.05296439

summary(fit)$coef

##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept) 0.1884572  0.2061290 0.9142681 0.39098029
## X           0.7224211  0.3106531 2.3254912 0.05296439

summary(fit)$coef[2,4]

## [1] 0.05296439
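
The first method can also be checked by hand. The following is a minimal sketch that evaluates the closed-form equations step by step; the variable names (beta1, se_beta1, p_beta1 and so on) are illustrative choices and do not come from the quiz.

```r
X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
n <- length(Y)

beta1 <- cor(Y, X) * sd(Y) / sd(X)   # slope estimate
beta0 <- mean(Y) - beta1 * mean(X)   # intercept estimate
e     <- Y - beta0 - beta1 * X       # residuals
sigma <- sqrt(sum(e^2) / (n - 2))    # residual standard deviation
ssx   <- sum((X - mean(X))^2)        # sum of squares of X about its mean

se_beta1 <- sigma / sqrt(ssx)        # standard error of the slope
t_beta1  <- beta1 / se_beta1         # t statistic for H0: beta1 = 0
p_beta1  <- 2 * pt(abs(t_beta1), df = n - 2, lower.tail = FALSE)
p_beta1
```

The result should agree with the P-value that summary(fit) reports for X.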

Question 2

Consider the previous problem. Give the estimate of the residual standard deviation.
The formula for \(\sigma\) given in Question 1 could be used here. For now, the value is taken from the fitted model:

sigma <- summary(fit)$sigma

The residual standard deviation is 0.223.
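
As a cross-check, the same number can be computed from the residuals directly. This is a small sketch assuming the Question 1 data; sigma_manual is an illustrative name.

```r
X <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
Y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
fit <- lm(Y ~ X)
n <- length(Y)

# residual sum of squares divided by the residual degrees of freedom (n - 2)
sigma_manual <- sqrt(sum(resid(fit)^2) / (n - 2))
round(sigma_manual, 3)
```

The value should match summary(fit)$sigma.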

Question 3

In the \(mtcars\) data set, fit a linear regression model of weight (predictor) on mpg (outcome).
Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?

data(mtcars)
fit <- lm(mpg ~ I(wt - mean(wt)), data = mtcars)
confint(fit)

##                      2.5 %    97.5 %
## (Intercept)      18.990982 21.190268
## I(wt - mean(wt)) -6.486308 -4.202635

x <- mtcars$wt
y <- mtcars$mpg
predict(lm(y ~ x), newdata = data.frame(x = mean(x)), interval = "confidence")

##        fit      lwr      upr
## 1 20.09062 18.99098 21.19027
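
To see where these endpoints come from, the interval can be rebuilt from the standard formula for the standard error of a fitted mean. This is a sketch rather than the quiz's own solution; names such as se_fit and ci are illustrative.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)

fit   <- lm(y ~ x)
sigma <- summary(fit)$sigma
ssx   <- sum((x - mean(x))^2)
x0    <- mean(x)                     # evaluate at the average weight

y_hat  <- unname(coef(fit)[1] + coef(fit)[2] * x0)
# standard error of the expected response at x0
se_fit <- sigma * sqrt(1 / n + (x0 - mean(x))^2 / ssx)
ci <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_fit
ci
```

At the average weight the second term under the square root is zero, which is why the interval matches the one for the intercept of the centered model.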

Question 4

Refer to the previous question. Read the help file for mtcars. What is the weight coefficient interpreted as?

Solution
Since wt is recorded in units of 1,000 lb, the coefficient is interpreted as the estimated expected change in mpg per 1,000 lb increase in weight.

Question 5

Consider again the \(mtcars\) data set and a linear regression model with \(mpg\) as predicted by weight (1,000 lbs). A new car is coming weighing 3000 pounds.
Construct a 95% prediction interval for its \(mpg\).
What is the upper endpoint?

predict(lm(y ~ x), newdata = data.frame(x = 3), interval = "prediction")

##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355
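
The prediction interval is wider than a confidence interval because it also accounts for the variability of a single new observation. A sketch of the manual calculation, with illustrative names (se_pred, pred_int):

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)

fit   <- lm(y ~ x)
sigma <- summary(fit)$sigma
ssx   <- sum((x - mean(x))^2)
x0    <- 3                           # 3,000 lb, in units of 1,000 lb

y_hat   <- unname(coef(fit)[1] + coef(fit)[2] * x0)
# the leading 1 adds the variance of one new observation
se_pred <- sigma * sqrt(1 + 1 / n + (x0 - mean(x))^2 / ssx)
pred_int <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred
pred_int
```

The endpoints should agree with the lwr and upr columns returned by predict with interval = "prediction".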

Question 6

Consider again the \(mtcars\) data set and a linear regression model with \(mpg\) as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs.
Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight.
Give the lower endpoint.

fit <- lm(y ~ x)
confint(fit)[2,]*2

##     2.5 %    97.5 % 
## -12.97262  -8.40527

# or, equivalently, rescale weight to short tons (x / 2)
fit <- lm(y ~ I(x * 0.5))
confint(fit)[2, ]

##     2.5 %    97.5 % 
## -12.97262  -8.40527

Question 7

If my \(X\) from a linear regression is measured in centimeters and I convert it to meters what would happen to the slope coefficient?
Solution
Consider the previous data, x and y, and suppose x were measured in centimeters. Dividing x by 100 converts it to meters.

# before converting to meters
fit <- lm(y ~ x)
fit$coefficients[2]

##         x 
## -5.344472

# after converting to meters
fit <- lm(y ~ I(x / 100))
fit$coefficients[2]

##  I(x/100) 
## -534.4472

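The coefficient is multiplied by 100. A one-unit change in meters is 100 times as large as a one-unit change in centimeters, so the expected change in the outcome per unit of X grows by the same factor.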
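
The same result follows algebraically, without fitting anything. A short sketch, writing \(X\) for the predictor in centimeters and \(X/100\) for the same predictor in meters:
\[Y = \beta_0 + \beta_1X + \epsilon = \beta_0 + (100\,\beta_1)\,\frac{X}{100} + \epsilon\]
The slope on the meter scale is therefore \(100\,\beta_1\), and the intercept does not change.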

Question 8

I have an outcome, \(Y\), and a predictor, \(X\), and fit a linear regression model with \(Y=\beta_0 +\beta_1X+\epsilon\) to obtain \(\hat\beta_0\) and \(\hat\beta_1\). What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, \(X+c\) for some constant, \(c\)?

Solution
Start from \(Y = \beta_0 + \beta_1X +\epsilon\). Adding and subtracting \(\beta_1c\) gives
\(Y = \beta_0 -\beta_1c +\beta_1c + \beta_1X +\epsilon\), and collecting the terms multiplied by \(\beta_1\) gives
\(Y = \beta_0 -\beta_1c +\beta_1(X +c) +\epsilon\).
Hence, the slope is unchanged and the new intercept would be \(\hat\beta_0 -\hat\beta_1c\).
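
This can be confirmed numerically. The sketch below uses the mtcars weights as the regressor and an arbitrary shift of 2, chosen only for illustration; the names shift, fit_orig and fit_shift are not from the quiz.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg
shift <- 2                           # the constant c (arbitrary, for illustration)

fit_orig  <- lm(y ~ x)
fit_shift <- lm(y ~ I(x + shift))

b       <- unname(coef(fit_orig))    # intercept, slope with regressor x
b_shift <- unname(coef(fit_shift))   # intercept, slope with regressor x + c

all.equal(b_shift[2], b[2])                  # slope is unchanged
all.equal(b_shift[1], b[1] - b[2] * shift)   # intercept drops by slope * c
```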

Question 9

Refer back to the \(mtcars\) data set with \(mpg\) as an outcome and weight (\(wt\)) as the predictor. About what is the ratio of the sum of the squared errors, \(\sum_{i=1}^n(Y_i-\hat Y_i)^2\), when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?

Solution

fit0<- lm(y ~ x)
1-summary(fit0)$r.squared

## [1] 0.2471672

# or
fit1 <- lm(y ~ 1)
sse0 <- sum((predict(fit0) - y)^2)
sse1 <- sum((predict(fit1) - y)^2)
sse0 / sse1

## [1] 0.2471672

# or
fit2 <- lm(y ~ x)
sum(resid(fit2)^2) / sum((y - mean(y))^2)

## [1] 0.2471672
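
The first approach works because of how \(R^2\) is defined. With \(\hat Y_i\) the fitted values from the model with intercept and slope, and \(\bar Y\) the fitted value from the intercept-only model:
\[R^2 = 1 - \frac{\sum_{i=1}^n(Y_i-\hat Y_i)^2}{\sum_{i=1}^n(Y_i-\bar Y)^2}\]
Rearranging shows that the requested ratio is \(1 - R^2\).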

Question 10

Do the residuals always have to sum to 0 in linear regression?