my_data<-data.frame(
x1=c(60,62,67,70,71,72,75,78),
x2=c(22,25,24,20,15,14,14,11),
y=c(140,155,159,179,192,200,212,215))
#i). Plot the observed values of y as a function of x1 and x2. Does it seem reasonable that either x1 or x2 can describe the variation in y?
par(mfrow=c(1,2))
plot(my_data$x1, my_data$y, xlab="x1", ylab="y")
plot(my_data$x2, my_data$y, xlab="x2", ylab="y")
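Both plots show a clear, roughly linear trend: y increases with x1 and decreases with x2. Either variable on its own therefore seems able to describe a large part of the variation in y.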
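As a numerical companion to the two scatter plots, the sample correlations can be computed directly. The sketch below is only an illustration: it rebuilds the same data frame so that it runs on its own, and the names r_x1y, r_x2y and r_x1x2 are introduced here and are not part of the exercise. The correlation between x1 and x2 is included because strongly related predictors make the individual coefficients in a joint model harder to estimate.

```r
# Same data as above, repeated so the snippet is self-contained
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215))

# Pearson correlations (illustrative names, not from the exercise)
r_x1y  <- cor(my_data$x1, my_data$y)   # predictor x1 vs response
r_x2y  <- cor(my_data$x2, my_data$y)   # predictor x2 vs response
r_x1x2 <- cor(my_data$x1, my_data$x2)  # predictor vs predictor

print(round(c(x1_y = r_x1y, x2_y = r_x2y, x1_x2 = r_x1x2), 3))
```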
#ii). Calculate the parameter estimates β̂0, β̂1, β̂2, and σ̂² and fit a regression line. In addition, find the 95% confidence intervals for β0, β1, and β2.
fit_lim<- lm(y ~ x1 + x2, data=my_data)
summary(fit_lim)
##
## Call:
## lm(formula = y ~ x1 + x2, data = my_data)
##
## Residuals:
##       1       2       3       4       5       6       7       8
## -5.5709  8.1017 -5.2939 -1.3622  0.2092  3.4052  5.9615 -5.4506
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -6.8675    74.3120  -0.092   0.9300
## x1            3.1479     0.8387   3.753   0.0132 *
## x2           -1.6561     0.9759  -1.697   0.1505
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.379 on 5 degrees of freedom
## Multiple R-squared: 0.9626, Adjusted R-squared: 0.9477
## F-statistic: 64.37 on 2 and 5 DF, p-value: 0.0002702
Getting the parameter estimates
The parameter estimates are given in the first column of the coefficient matrix: β̂0 = -6.8675, β̂1 = 3.1479, and β̂2 = -1.6561. The error variance estimate is the squared residual standard error, σ̂² = 6.379² ≈ 40.69.
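Rather than reading the numbers off the printed summary, the estimates can also be pulled out programmatically. A minimal sketch, assuming the same data and model as above: coef() and sigma() are standard R extractor functions, while beta_hat and sigma2_hat are names chosen here for illustration.

```r
# Same data and model as above, repeated so the snippet is self-contained
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215))
fit_lim <- lm(y ~ x1 + x2, data = my_data)

# Named vector of coefficient estimates: (Intercept), x1, x2
beta_hat <- coef(fit_lim)

# sigma() returns the residual standard error; squaring it gives
# the error variance estimate
sigma2_hat <- sigma(fit_lim)^2

print(beta_hat)
print(sigma2_hat)
```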
Getting the confidence intervals
confint(fit_lim)
##                    2.5 %      97.5 %
## (Intercept) -197.8926674 184.1576929
## x1             0.9919648   5.3038214
## x2            -4.1648817   0.8525951
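The intervals reported by confint() follow the usual formula: estimate ± t-quantile × standard error, where the 0.975 quantile is taken from a t-distribution with the residual degrees of freedom (here 8 - 3 = 5). The sketch below recomputes them by hand as a check. The names est, se, t_q and ci_manual are illustrative and not part of the exercise.

```r
# Same data and model as above, repeated so the snippet is self-contained
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215))
fit_lim <- lm(y ~ x1 + x2, data = my_data)

# Estimates and standard errors from the coefficient matrix
coef_table <- summary(fit_lim)$coefficients
est <- coef_table[, "Estimate"]
se  <- coef_table[, "Std. Error"]

# 0.975 quantile of the t-distribution with the residual degrees of freedom
t_q <- qt(0.975, df = df.residual(fit_lim))

# Two-sided 95% intervals, one row per parameter
ci_manual <- cbind(lower = est - t_q * se, upper = est + t_q * se)
print(ci_manual)
```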
#iii). If appropriate, reduce the model using significance level α = 0.05 and test the significance of the reduced model.
Since the 95% confidence interval for β2 covers zero (equivalently, the p-value 0.1505 is larger than 0.05), x2 should be removed from the model. This gives the simpler model yi = β0 + β1 x1,i + εi, εi ∼ N(0, σ²). As the output below shows, the parameter estimates in the simpler model are β̂0 = -124.131 and β̂1 = 4.405, and both parameters are now significant.
fit2 <- lm(y ~ x1, data = my_data)
summary(fit2)
##
## Call:
## lm(formula = y ~ x1, data = my_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.037  -4.686   1.571   5.787   6.936
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -124.131     31.326  -3.963  0.00743 **
## x1             4.405      0.450   9.790 6.54e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.31 on 6 degrees of freedom
## Multiple R-squared: 0.9411, Adjusted R-squared: 0.9313
## F-statistic: 95.84 on 1 and 6 DF, p-value: 6.537e-05
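The reduction from the full model to the simpler one can also be checked with a partial F-test that compares the two nested fits. A minimal sketch, assuming the data and both models from above. Because only one parameter is removed, the F-test p-value coincides with the t-test p-value for x2 in the full model. The names cmp and p_partial are illustrative.

```r
# Same data and models as above, repeated so the snippet is self-contained
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215))
fit_lim <- lm(y ~ x1 + x2, data = my_data)  # full model
fit2    <- lm(y ~ x1, data = my_data)       # reduced model

# Partial F-test: does adding x2 to the reduced model improve the fit?
cmp <- anova(fit2, fit_lim)
print(cmp)

# p-value of the comparison (second row of the anova table)
p_partial <- cmp[["Pr(>F)"]][2]
```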
#iv). Carry out a residual analysis to check that the model assumptions are fulfilled.
We inspect a q-q plot of the residuals and a plot of the residuals as a function of the fitted values.
par(mfrow=c(1,2))
qqnorm(fit2$residuals, pch=19)
qqline(fit2$residuals)
plot(fit2$fitted.values, fit2$residuals, pch=19,
     xlab="Fitted values", ylab="Residuals")
There is no strong evidence against the assumptions: the points in the q-q plot lie close to a straight line, and there is no obvious dependence between the residuals and the fitted values. We conclude that the model assumptions are fulfilled.
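The visual check can be complemented by a formal normality test on the residuals. The sketch below uses the Shapiro-Wilk test from base R. This test is an addition for illustration and is not part of the exercise. With only eight observations it has little power, so a large p-value should be read as absence of evidence against normality rather than as proof of it. The name sw is illustrative.

```r
# Same data and reduced model as above, repeated so the snippet is self-contained
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215))
fit2 <- lm(y ~ x1, data = my_data)

# Shapiro-Wilk test of normality applied to the model residuals
sw <- shapiro.test(residuals(fit2))
print(sw)
```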