Consider the following data with x as the predictor and y as the outcome. Give a P-value for the two-sided hypothesis test of whether β1 from a linear regression model is 0 or not.
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
fit<- lm(y~x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27636 -0.18807 0.01364 0.16595 0.27143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1885 0.2061 0.914 0.391
## x 0.7224 0.3107 2.325 0.053 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.223 on 7 degrees of freedom
## Multiple R-squared: 0.4358, Adjusted R-squared: 0.3552
## F-statistic: 5.408 on 1 and 7 DF, p-value: 0.05296
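As a sanity check, the same t statistic and P-value can be rebuilt from the closed-form least-squares estimates; a minimal sketch (the intermediate variable names here are mine):
n <- length(x)
beta1 <- cor(y, x) * sd(y) / sd(x)             # slope estimate
beta0 <- mean(y) - beta1 * mean(x)             # intercept estimate
e <- y - beta0 - beta1 * x                     # residuals
sigma <- sqrt(sum(e^2) / (n - 2))              # residual standard error
se_beta1 <- sigma / sqrt(sum((x - mean(x))^2)) # standard error of the slope
2 * pt(abs(beta1 / se_beta1), df = n - 2, lower.tail = FALSE) # should match the 0.05296 above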
Consider the previous problem. Give the estimate of the residual standard deviation.
summary(fit)$sigma
## [1] 0.2229981
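Equivalently, the residual standard deviation is the square root of the residual sum of squares divided by the residual degrees of freedom (n − 2 for a model with intercept and slope); a quick hand check:
sqrt(sum(resid(fit)^2) / (length(x) - 2)) # should match the 0.2229981 above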
In the mtcars data set, fit a linear regression model with mpg as the outcome and weight (wt) as the predictor. Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?
x<-mtcars$wt
y<-mtcars$mpg
fit<-lm(y ~ x)
predict(fit,data.frame(x=mean(x)), interval="confidence")
## fit lwr upr
## 1 20.09062 18.99098 21.19027
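Since the fitted line passes through (x̄, ȳ), the standard error of the fitted mean at the average weight reduces to σ̂/√n, so the interval can be rebuilt by hand (a sketch using that identity):
n <- length(x)
se_mean <- summary(fit)$sigma / sqrt(n)              # se of the fitted mean at x = mean(x)
mean(y) + c(-1, 1) * qt(0.975, df = n - 2) * se_mean # should match 18.99098 and 21.19027 above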
Consider again the mtcars data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A new car weighing 3,000 pounds is coming. Construct a 95% prediction interval for its mpg. What is the upper endpoint?
y<- mtcars$mpg
x<- mtcars$wt
fit<- lm(y~x)
predict(fit, data.frame(x = 3), interval = "prediction")
## fit lwr upr
## 1 21.25171 14.92987 27.57355
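The prediction interval adds a "1 +" term to the variance to account for a new observation's own noise; a hand check at x = 3 (variable names are mine):
n <- length(x)
sigma <- summary(fit)$sigma
se_pred <- sigma * sqrt(1 + 1/n + (3 - mean(x))^2 / sum((x - mean(x))^2))
unname(predict(fit, data.frame(x = 3))) + c(-1, 1) * qt(0.975, df = n - 2) * se_pred # should match 14.92987 and 27.57355 above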
Consider again the mtcars data set with mpg as the outcome and weight (wt, in 1,000 lbs) as the predictor. Construct a 95% confidence interval for the expected change in mpg per 1 short ton (2,000 lbs) increase in weight. What is the lower endpoint?
y<- mtcars$mpg
x<- mtcars$wt
fit<- lm(y~I(x/2)) # I() lets x/2 enter the formula as-is, so the slope is per 2 x 1,000 lbs = 1 short ton
est<- summary(fit)$coefficients[2,1] # slope estimate (mpg per short ton); 'est' and 'se' avoid masking base mean() and sd()
se<- summary(fit)$coefficients[2,2] # its standard error
deg_fr<- fit$df.residual # residual degrees of freedom (n - 2)
# Two-sided 95% t confidence interval for the slope
est+c(-1,1)*qt(0.975,df=deg_fr)*se
## [1] -12.97262 -8.40527
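confint() builds the same t-based interval directly and is a quick cross-check:
confint(fit)[2, ] # same endpoints as the hand-built interval above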
If my X from a linear regression is measured in centimeters and I convert it to meters what would happen to the slope coefficient?
y<- mtcars$mpg
x<- mtcars$wt
fit<- lm(y~x)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.285126 1.877627 19.857575 8.241799e-19
## x -5.344472 0.559101 -9.559044 1.293959e-10
fit1<- lm(y~I(x/100))
summary(fit1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.28513 1.877627 19.857575 8.241799e-19
## I(x/100) -534.44716 55.910105 -9.559044 1.293959e-10
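This illustrates the general rule: β̂1 = cor(x, y) · sd(y)/sd(x), and dividing X by a constant divides sd(x) by that constant while leaving the correlation unchanged, so the slope is multiplied by it. Converting centimeters to meters divides X by 100 and therefore multiplies the slope by 100. A quick check of the identity:
b1_orig <- cor(x, y) * sd(y) / sd(x)        # slope in original units
b1_new <- cor(x/100, y) * sd(y) / sd(x/100) # slope after dividing x by 100
b1_new / b1_orig                            # ratio is 100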
I have an outcome, Y, and a predictor, X, and fit a linear regression model, Y = β0 + β1X + ε, to obtain β̂0 and β̂1. What would be the consequence for the subsequent slope and intercept if I were to refit the model with a new regressor, X + c, for some constant c?
y<- mtcars$mpg
x<- mtcars$wt
fit<- lm(y~x)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.285126 1.877627 19.857575 8.241799e-19
## x -5.344472 0.559101 -9.559044 1.293959e-10
fit1<- lm(y~I(x+2)) # refit with the regressor shifted by c = 2
b0<- summary(fit)$coefficients[1,1]
b1<- summary(fit)$coefficients[2,1]
round(summary(fit1)$coefficients[1,1])==round(b0-2*b1)
## [1] TRUE
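The slope itself is unchanged by the shift; only the intercept moves, from β̂0 to β̂0 − c·β̂1. A brief check of both facts:
all.equal(unname(coef(fit1)[2]), b1)          # slope is identical
all.equal(unname(coef(fit1)[1]), b0 - 2 * b1) # intercept absorbs the shift (c = 2)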
Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, ∑ᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)², when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?
# Model with slope & intercept vs. intercept only vs. slope only (x and y are still mtcars wt and mpg)
fit5<-lm(y ~ 1) # intercept only
fit6<-lm(y ~ x - 1) # slope only (regression through the origin)
plot(x,y)
abline(fit,col="red")
abline(fit5,col="blue")
abline(fit6,col="green")
anova(fit)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 847.73 847.73 91.375 1.294e-10 ***
## Residuals 30 278.32 9.28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit5)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 31 1126 36.324
278/1126
## [1] 0.2468917
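The unrounded ratio comes straight from deviance(), which returns a model's residual sum of squares:
deviance(fit) / deviance(fit5) # about 0.25: SSE(intercept + slope) / SSE(intercept only)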
Do the residuals always have to sum to 0 in linear regression?
sum(resid(fit)) #both intercept and slope
## [1] -1.637579e-15
sum(resid(fit5)) #only intercept
## [1] -5.995204e-15
sum(resid(fit6)) #only slope
## [1] 98.11672
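No: only models that include an intercept force the residuals to sum to 0. Least squares makes the residuals orthogonal to every column of the design matrix, and with an intercept one column is all 1s, which is exactly the zero-sum constraint. A sketch of that orthogonality:
crossprod(model.matrix(fit), resid(fit))   # both entries ~0; the column of 1s forces sum(e) = 0
crossprod(model.matrix(fit6), resid(fit6)) # ~0 against x only; no 1s column, so sum(e) need not be 0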
How can we measure which model fits best? Compare the residual standard error (sigma) or R^2.
summary(fit)$sigma #both intercept and slope
## [1] 3.045882
summary(fit5)$sigma #only intercept
## [1] 6.026948
summary(fit6)$sigma #only slope
## [1] 11.26888
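Sigma is on the scale of the outcome (the typical size of a residual, in mpg), so smaller is better; the full model wins here. R^2 can be compared too, with one caveat: for a no-intercept model, summary.lm computes R^2 against zero rather than against ȳ, so it is not comparable to the others.
summary(fit)$r.squared  # intercept and slope
summary(fit5)$r.squared # intercept only: 0 by construction
summary(fit6)$r.squared # no intercept: uses the uncentered total sum of squares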