Answer all questions completely. Explain in complete sentences; do not just submit code. See the Day5PracticeSolution.Rmd file for an example of what is expected. You may use any R functions to help answer the questions.
Copier maintenance. The users of the copiers are either training institutions that use a small model, or business firms that use a large, commercial model. An analyst at Tri-City wishes to fit a regression model including both number of copiers serviced (\(X_1\)) and type of copier (\(X_2\)) as predictor variables and estimate the effect of copier model (S = small, L = large) on number of minutes spent on the service call. Assume the following regression model is appropriate \[Y_i=\beta_0+\beta_1X_{i1}+\beta_2X_{i2}+\varepsilon_i\] and let \(X_2=1\) if small model and 0 if large, commercial model.
a) Explain the meaning of all regression coefficients in the model.
\(\beta_0\) is the mean service time for a large model (\(X_2=0\)) when no copiers are serviced (\(X_1=0\)). It has no practical meaning here because a service call with zero copiers is outside the scope of the model. \(\beta_1\) is the change in mean service time, in minutes, for each additional copier serviced, for either type of copier. \(\beta_2\) is the difference in mean service time, in minutes, between the small model and the large model for any given number of copiers serviced.
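Substituting the two values of \(X_2\) into the model makes the role of each coefficient concrete. This is only a restatement of the model above under the stated coding of \(X_2\):

\[
\begin{aligned}
E\{Y\} &= \beta_0+\beta_1X_1 && \text{large model } (X_2=0)\\
E\{Y\} &= (\beta_0+\beta_2)+\beta_1X_1 && \text{small model } (X_2=1)
\end{aligned}
\]

The two response lines share the slope \(\beta_1\) and are separated vertically by \(\beta_2\) at every value of \(X_1\).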
b) Fit the regression model and state the estimated regression function.
##
## Call:
## lm(formula = Y ~ ., data = myData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.5390 -4.2515 0.5995 6.5995 14.9330
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9225 3.0997 -0.298 0.767
## X1 15.0461 0.4900 30.706 <2e-16 ***
## X2 0.7587 2.7799 0.273 0.786
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.011 on 42 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9556
## F-statistic: 473.9 on 2 and 42 DF, p-value: < 2.2e-16
The estimated regression function is \(\hat{y} = -0.9225 + 15.0461X_1 + 0.7587X_2\).
c) Estimate the effect of copier model on mean service time with a 95 percent confidence interval. Interpret your interval estimate.
## 2.5 % 97.5 %
## (Intercept) -7.177891 5.332945
## X1 14.057283 16.035004
## X2 -4.851254 6.368698
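We are 95% confident that, for any given number of copiers serviced, the mean service time for a small model is between 4.85 minutes less and 6.37 minutes more than for a large model. Because the interval contains 0, the data do not show a clear effect of copier model on mean service time.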
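The interval for \(X_2\) can be checked by hand from the coefficient table in part (b). The sketch below uses only the estimate, standard error, and residual degrees of freedom printed there; the variable names are illustrative, and small differences from the interval above come from rounding of the inputs.

```r
# Values copied from the X2 row of the coefficient table in part (b)
b2_hat <- 0.7587   # estimated coefficient of X2
se_b2  <- 2.7799   # its standard error
df_err <- 42       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t(0.975; 42) * standard error
t_mult <- qt(0.975, df_err)
ci_b2  <- b2_hat + c(-1, 1) * t_mult * se_b2

print(round(ci_b2, 3))
```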
d) Why would the analyst wish to include \(X_1\), number of copiers, in the regression model when interest is in estimating the effect of type of copier model on service time?
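The analyst would include \(X_1\) because the number of copiers serviced explains most of the variation in service time (in part (b) its coefficient is highly significant, and the fitted model has \(R^2=0.9576\)). Including it reduces the error variance, so the effect of copier model is estimated more precisely, and it adjusts the comparison of the two copier types for any difference in the number of copiers serviced per call.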
e) Obtain the residuals and plot them against \(X_1X_2\). Is there any indication that an interaction term in the regression model would be helpful?
The residuals do not appear to be randomly scattered around zero when plotted against \(X_1X_2\), so there is some indication that an interaction term would be helpful.
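A plot of this kind could be produced along the following lines. This is a minimal sketch: the data frame `sim` is simulated and purely hypothetical (it is not the copier data), and base graphics are used so that no extra packages are needed.

```r
set.seed(1)

# Hypothetical simulated data standing in for the real data set
n   <- 45
X1  <- sample(1:10, n, replace = TRUE)   # number of copiers serviced
X2  <- rbinom(n, 1, 0.4)                 # 1 = small model, 0 = large model
Y   <- 3 + 14 * X1 - 8 * X2 + 2 * X1 * X2 + rnorm(n, sd = 9)
sim <- data.frame(Y, X1, X2)

# Fit the additive model (no interaction) and take its residuals
fit_add <- lm(Y ~ X1 + X2, data = sim)
res     <- resid(fit_add)

# Plot the residuals against the product X1 * X2
x1x2 <- sim$X1 * sim$X2
plot(x1x2, res, xlab = "X1 * X2", ylab = "Residual")
abline(h = 0, lty = 2)   # reference line at zero
```

A systematic trend in this plot, rather than a random band around zero, is what would suggest adding the \(X_1X_2\) term.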
f) Fit the regression with the interaction term. State the estimated regression function.
##
## Call:
## lm(formula = Y ~ X1 * X2, data = myData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.2072 -6.7887 -0.1708 7.1504 14.7441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8131 3.6468 0.771 0.4449
## X1 14.3394 0.6146 23.333 <2e-16 ***
## X2 -8.1412 5.5801 -1.459 0.1522
## X1:X2 1.7774 0.9746 1.824 0.0755 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.771 on 41 degrees of freedom
## Multiple R-squared: 0.9608, Adjusted R-squared: 0.9579
## F-statistic: 334.6 on 3 and 41 DF, p-value: < 2.2e-16
The estimated regression function is \(\hat{y} = 2.81 + 14.34X_1 - 8.14X_2 + 1.78X_1X_2\).
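Setting \(X_2\) to each of its two values in the estimated function gives one fitted line per copier type. The arithmetic below uses only the rounded estimates stated above:

\[
\begin{aligned}
X_2=0:\quad \hat{y} &= 2.81 + 14.34X_1\\
X_2=1:\quad \hat{y} &= (2.81-8.14) + (14.34+1.78)X_1 = -5.33 + 16.12X_1
\end{aligned}
\]

With the interaction term, the two lines differ in slope as well as in intercept.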
g) Test whether the interaction term can be dropped from the model; control the \(\alpha\) risk at \(.10\). State the alternatives, decision rule, and conclusion. What is the \(P\)-value of the test?
## [1] 1052.664
## [1] 2.825999
## [1] 7.20254e-32
\(H_0:\beta_3=0\), \(H_a:\beta_3\neq 0\), where \(\beta_3\) is the coefficient of \(X_1X_2\). From the coefficient table in part (f), \(t^*=1.824\) on 41 degrees of freedom. Decision Rule: If \(|t^*| \leq t(.95; 41) = 1.683\), conclude \(H_0\), else conclude \(H_a\).
Since \(|t^*| = 1.824 > 1.683\), we reject \(H_0\) and conclude that \(\beta_3\) is significantly different from zero at \(\alpha=.10\) (\(P\)-value \(=0.0755\)). That is, the interaction term should not be dropped from the model.
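Whether the single term \(X_1X_2\) can be dropped can also be checked directly from the `X1:X2` row of the coefficient table in part (f). The sketch below uses only the t value and residual degrees of freedom printed there; the variable names are illustrative. It also shows the equivalent partial F form, since \(F^*=(t^*)^2\) for a test on one coefficient.

```r
# Values copied from the X1:X2 row of the coefficient table in part (f)
t_star <- 1.824   # t value for the interaction term
df_err <- 41      # residual degrees of freedom of the interaction model
alpha  <- 0.10

# Two-sided t test of H0: beta3 = 0
t_crit  <- qt(1 - alpha / 2, df_err)
p_value <- 2 * pt(-abs(t_star), df_err)
reject  <- abs(t_star) > t_crit

# Equivalent partial F test on 1 and 41 degrees of freedom
f_star <- t_star^2
f_crit <- qf(1 - alpha, 1, df_err)

print(c(t_crit = t_crit, p_value = p_value, f_star = f_star, f_crit = f_crit))
print(reject)
```

With the fitted model objects available, `anova()` applied to the model without the interaction and the model with it would give the same partial F test.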
h) Using ggplot2, plot the data with \(X_1\) on the x-axis and \(Y\) on the y-axis. Use different colors for the different levels of \(X_2\). Be sure to add the linear regression line found in part (f) to the plot.
Because \(X_2\) takes two values, the model in part (f) gives two regression lines of \(Y\) on \(X_1\). The red line is the fitted line for \(X_2=0\) (large model) and the blue line is the fitted line for \(X_2=1\) (small model).
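One way such a plot might be built with ggplot2 is sketched below. The data frame `sim` is simulated and hypothetical (not the copier data), the column names `type` and `fitted` are illustrative, and the sketch assumes the ggplot2 package is installed.

```r
library(ggplot2)

set.seed(1)

# Hypothetical simulated data standing in for the real data set
n   <- 45
X1  <- sample(1:10, n, replace = TRUE)
X2  <- rbinom(n, 1, 0.4)
Y   <- 3 + 14 * X1 - 8 * X2 + 2 * X1 * X2 + rnorm(n, sd = 9)
sim <- data.frame(Y, X1, X2)

# Interaction model as in part (f); store its fitted values
fit_int    <- lm(Y ~ X1 * X2, data = sim)
sim$fitted <- fitted(fit_int)

# Treat X2 as a factor so each level gets its own colour and line
sim$type <- factor(sim$X2, levels = c(0, 1),
                   labels = c("Large (X2 = 0)", "Small (X2 = 1)"))

p <- ggplot(sim, aes(x = X1, y = Y, colour = type)) +
  geom_point() +
  geom_line(aes(y = fitted)) +                    # fitted line per level
  scale_colour_manual(values = c("red", "blue")) +
  labs(x = "Copiers serviced (X1)", y = "Service time (Y)",
       colour = "Copier model")
print(p)
```

Drawing the lines from the fitted values of the interaction model, rather than from a separate smoother, keeps the plot tied to the model that was actually estimated.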
Define \(Y\) as follows.
a) Prepare a conditional effects plot of the response function against \(X_1\) when \(X_2=1\) and when \(X_2=3\). How is the lack of interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
The lack of interaction effect is evident because the two lines are parallel. That is, the lines for \(X_2=1\) and \(X_2=3\) have the same slope in \(X_1\) and differ only in intercept.
b) Plot a set of contour curves for the response surface. How is the lack of interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
The contour curves are parallel lines, which shows the lack of interaction between \(X_1\) and \(X_2\).
Define \(Y\) as follows.
a) Prepare a conditional effects plot of the response function against \(X_1\) when \(X_2=1\) and when \(X_2=3\). How is the interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
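The interaction effect is evident because the two lines are not parallel. That is, the two lines differ in slope as well as in intercept, and it is the difference in slope that shows the interaction: the effect of \(X_1\) on \(Y\) depends on the value of \(X_2\).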
b) Plot a set of contour curves for the response surface. How is the interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
The contour curves are not parallel, which shows the interaction between \(X_1\) and \(X_2\).
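The reasoning behind both kinds of plot can be sketched algebraically. The response functions of the exercise are not reproduced here, so assume a generic function with a cross-product term, where the \(\beta\)'s are placeholder symbols rather than the exercise's values and \(\beta_2 \neq 0\):

\[
\begin{aligned}
E\{Y\} &= \beta_0+\beta_1X_1+\beta_2X_2+\beta_3X_1X_2\\
\text{at } X_2=c:\quad E\{Y\} &= (\beta_0+\beta_2c) + (\beta_1+\beta_3c)X_1\\
\text{at } E\{Y\}=k:\quad X_2 &= \frac{k-\beta_0-\beta_1X_1}{\beta_2+\beta_3X_1}
\end{aligned}
\]

When \(\beta_3=0\), the conditional effects lines all have slope \(\beta_1\) and the contours are straight lines with common slope \(-\beta_1/\beta_2\), so both plots show parallel lines. When \(\beta_3\neq 0\), the slope \(\beta_1+\beta_3c\) changes with \(c\) and the contours are curves that are no longer parallel.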
Conclusion
Finally, explain how the plots above can be used to determine if an interaction term is present.
To determine if an interaction is present, look for nonparallel lines in the conditional effects plot, that is, lines whose slopes differ across the values of \(X_2\) (a difference in intercept alone does not indicate interaction), and for contour curves that are not parallel in the contour plot.
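The comparison can be reproduced with a small sketch. The response function `resp` and its coefficients are hypothetical, chosen only for illustration (they are not the functions from the exercise); setting `b3 = 0` gives the additive case and `b3 = 1.5` the interaction case.

```r
# Hypothetical response function; b3 is the interaction coefficient
resp <- function(x1, x2, b3) 10 + 2 * x1 + 5 * x2 + b3 * x1 * x2

# Slope in X1 of the conditional effects line at a fixed value of X2
slope_at <- function(x2, b3) resp(1, x2, b3) - resp(0, x2, b3)

slopes_additive    <- c(slope_at(1, b3 = 0),   slope_at(3, b3 = 0))
slopes_interaction <- c(slope_at(1, b3 = 1.5), slope_at(3, b3 = 1.5))
print(slopes_additive)      # same slope at X2 = 1 and X2 = 3
print(slopes_interaction)   # slope changes with X2

# Conditional effects plots (left) and contour plots (right)
x1 <- seq(0, 5, by = 0.5)
x2 <- seq(0, 5, by = 0.5)
op <- par(mfrow = c(2, 2))
for (b3 in c(0, 1.5)) {
  matplot(x1, cbind(resp(x1, 1, b3), resp(x1, 3, b3)), type = "l", lty = 1,
          col = c("red", "blue"), xlab = "X1", ylab = "E{Y}",
          main = paste("Conditional effects, b3 =", b3))
  contour(x1, x2, outer(x1, x2, resp, b3 = b3), xlab = "X1", ylab = "X2",
          main = paste("Contours, b3 =", b3))
}
par(op)
```

Comparing the two rows of panels shows parallel lines and parallel contours when `b3 = 0`, and neither when `b3` is nonzero.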