Regression Modelling

my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)

#i). Plot the observed values of y as a function of x1 and x2. Does it seem reasonable that either x1 or x2 can describe the variation in y?

par(mfrow=c(1,2))
plot(my_data$x1, my_data$y, xlab="x1", ylab="y")
plot(my_data$x2, my_data$y, xlab="x2", ylab="y")

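Both plots show a clear relation: y increases roughly linearly with x1 and decreases with x2, so it seems reasonable that either variable could describe a large part of the variation in y.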
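To complement the visual impression from the scatter plots, the sample correlations between y and each predictor can be computed. This is only a minimal sketch; the object name `cors` is introduced here for illustration and is not part of the original exercise.

```r
# Same data as in the exercise, repeated so the snippet runs on its own
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)

# Pearson correlation of y with each predictor (cors is a hypothetical name)
cors <- c(x1 = cor(my_data$x1, my_data$y),
          x2 = cor(my_data$x2, my_data$y))
print(round(cors, 3))
```

A correlation close to +1 or -1 indicates a strong linear relation; the sign gives its direction.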

#ii). Calculate the parameter estimates β̂0, β̂1, β̂2, and σ̂², and fit a regression line. In addition, find the 95% confidence intervals for β0, β1, and β2.

fit_lim<- lm(y ~ x1 + x2, data=my_data)
summary(fit_lim)

## 
## Call:
## lm(formula = y ~ x1 + x2, data = my_data)
## 
## Residuals:
##       1       2       3       4       5       6       7       8 
## -5.5709  8.1017 -5.2939 -1.3622  0.2092  3.4052  5.9615 -5.4506 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -6.8675    74.3120  -0.092   0.9300  
## x1            3.1479     0.8387   3.753   0.0132 *
## x2           -1.6561     0.9759  -1.697   0.1505  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.379 on 5 degrees of freedom
## Multiple R-squared:  0.9626, Adjusted R-squared:  0.9477 
## F-statistic: 64.37 on 2 and 5 DF,  p-value: 0.0002702

Getting the parameter estimates

The parameter estimates are given in the first column of the coefficient matrix: β̂0 = -6.8675, β̂1 = 3.1479, and β̂2 = -1.6561. The error variance estimate is the square of the residual standard error, σ̂² = 6.379² ≈ 40.69.
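Rather than reading the numbers off the printed summary, the estimates can also be extracted from the fitted object. The sketch below assumes the same data and model as above; `beta_hat`, `sigma2_hat` and `sigma2_manual` are names chosen here for illustration.

```r
# Same data and model as in the exercise, repeated so the snippet runs on its own
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)
fit_lim <- lm(y ~ x1 + x2, data = my_data)

# Coefficient estimates (intercept, x1, x2)
beta_hat <- coef(fit_lim)

# Error variance estimate: squared residual standard error
sigma2_hat <- sigma(fit_lim)^2

# The same quantity by hand: residual sum of squares / residual degrees of freedom
sigma2_manual <- sum(residuals(fit_lim)^2) / df.residual(fit_lim)

print(beta_hat)
print(c(sigma2_hat = sigma2_hat, sigma2_manual = sigma2_manual))
```

The two variance values should agree, since the residual standard error reported by `summary()` is the square root of the residual sum of squares divided by the residual degrees of freedom (here 8 - 3 = 5).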

Getting the confidence intervals

confint(fit_lim)

##                    2.5 %      97.5 %
## (Intercept) -197.8926674 184.1576929
## x1             0.9919648   5.3038214
## x2            -4.1648817   0.8525951
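The intervals reported by `confint()` can be reproduced by hand as estimate ± t-quantile × standard error, using the 0.975 quantile of the t-distribution with the residual degrees of freedom. The following is a sketch under that assumption; `ci_manual` and the column names `lower`/`upper` are illustrative.

```r
# Same data and model as in the exercise, repeated so the snippet runs on its own
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)
fit_lim <- lm(y ~ x1 + x2, data = my_data)

# Estimates and standard errors from the coefficient table
coef_tab <- coef(summary(fit_lim))
est <- coef_tab[, "Estimate"]
se  <- coef_tab[, "Std. Error"]

# 0.975 quantile of the t-distribution with n - p = 5 degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit_lim))

# 95% confidence intervals computed by hand
ci_manual <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)
print(ci_manual)
```

For x1, for example, this amounts to 3.1479 ± qt(0.975, 5) × 0.8387, which matches the interval printed above.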

#iii). If appropriate, reduce the model using the significance level α = 0.05 and test the significance of the reduced model.

Since the confidence interval for β2 covers zero (and the p-value, 0.1505, is larger than 0.05), x2 should be removed from the model to get the simpler model yi = β0 + β1 x1,i + εi, εi ∼ N(0, σ²). The parameter estimates in the simpler model are β̂0 = -124.131 and β̂1 = 4.405, and both parameters are now significant.

fit2 <- lm(y ~ x1, data=my_data)
summary(fit2)

## 
## Call:
## lm(formula = y ~ x1, data = my_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.037  -4.686   1.571   5.787   6.936 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -124.131     31.326  -3.963  0.00743 ** 
## x1             4.405      0.450   9.790 6.54e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.31 on 6 degrees of freedom
## Multiple R-squared:  0.9411, Adjusted R-squared:  0.9313 
## F-statistic: 95.84 on 1 and 6 DF,  p-value: 6.537e-05
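As an additional check that is not part of the original exercise, the full and reduced models can be compared with a partial F-test via `anova()`. With a single dropped term, this test is equivalent to the t-test for x2 in the full model, so the p-value should match the one in the first coefficient table. The names `cmp` and `p_val` are illustrative.

```r
# Same data and models as in the exercise, repeated so the snippet runs on its own
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)
fit_lim <- lm(y ~ x1 + x2, data = my_data)  # full model
fit2    <- lm(y ~ x1, data = my_data)       # reduced model

# Partial F-test: does adding x2 significantly reduce the residual sum of squares?
cmp <- anova(fit2, fit_lim)
print(cmp)

# p-value of the comparison (second row of the anova table)
p_val <- cmp[["Pr(>F)"]][2]
print(p_val)
```

A p-value above 0.05 here supports keeping the reduced model at the chosen significance level.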

#iv). Carry out a residual analysis to check that the model assumptions are fulfilled.

We are interested in inspecting a q-q plot of the residuals and a plot of the residuals as a function of the fitted values.

par(mfrow=c(1,2))
qqnorm(fit2$residuals, pch=19)
qqline(fit2$residuals)
plot(fit2$fitted.values, fit2$residuals, pch=19,
     xlab="Fitted values", ylab="Residuals")

There is no strong evidence against the assumptions: the points in the q-q plot lie close to a straight line, and there is no obvious dependence between the residuals and the fitted values. We conclude that the model assumptions appear to be fulfilled.
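As a complementary numeric look at the residuals, one could inspect the standardized residuals of the reduced model. This is only a sketch and not part of the original exercise; the cutoff of 3 is a common rule of thumb assumed here, and `std_res` is an illustrative name. With only 8 observations, such checks have limited power and should be read alongside the plots.

```r
# Same data and reduced model as in the exercise, repeated so the snippet runs on its own
my_data <- data.frame(
  x1 = c(60, 62, 67, 70, 71, 72, 75, 78),
  x2 = c(22, 25, 24, 20, 15, 14, 14, 11),
  y  = c(140, 155, 159, 179, 192, 200, 212, 215)
)
fit2 <- lm(y ~ x1, data = my_data)

# Standardized residuals: residuals scaled by their estimated standard deviation
std_res <- rstandard(fit2)
print(round(std_res, 2))

# Flag observations beyond the rule-of-thumb cutoff (assumed, not from the source)
print(which(abs(std_res) > 3))
```

An empty result from the last line means no observation stands out as an extreme outlier under this rule of thumb.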

Kelvin Nyongesa

2023-01-19