Copier maintenance. The users of the copiers are either training institutions that use a small model or business firms that use a large, commercial model. An analyst at Tri-City wishes to fit a regression model including both number of copiers serviced (\(X_1\)) and type of copier (\(X_2\)) as predictor variables and estimate the effect of copier model (S = small, L = large) on number of minutes spent on the service call. Assume the following regression model is appropriate: \[Y_i=\beta_0+\beta_1X_{i1}+\beta_2X_{i2}+\varepsilon_i\] and let \(X_2=1\) if small model and 0 if large, commercial model.
a. Explain the meaning of all regression coefficients in the model.
\(Y_i\) is the response variable, the number of minutes spent on the \(i\)th service call; \(X_{i1}\) is the number of copiers serviced; \(X_{i2}\) is the indicator for copier model (1 = small, 0 = large); and \(\varepsilon_i\) is the random error term. \(\beta_0\) is the intercept, the mean service time for a large-model call when no copiers are serviced (not practically meaningful here). \(\beta_1\) is the change in mean service time for each additional copier serviced, holding copier model fixed. \(\beta_2\) is the difference in mean service time between small-model and large-model calls for a given number of copiers serviced.
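Writing the model out separately for the two copier types makes the role of \(\beta_2\) explicit: the two mean response functions are parallel lines that differ only by the constant \(\beta_2\). \[E\{Y\}=\beta_0+\beta_1X_1\quad(X_2=0,\ \text{large model}),\qquad E\{Y\}=(\beta_0+\beta_2)+\beta_1X_1\quad(X_2=1,\ \text{small model}).\]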
b. Fit the regression model and state the estimated regression function.
Model1 = lm(data = myData, Y ~ X + X2)
summary(Model1)
Y = -0.9225 + 15.0461 (Copiers Serviced) + 0.7587 (If Small)
Large copier: holding copier model fixed, each additional copier serviced increases the expected number of minutes spent on the service call by 15.0461. Fitted line for large copiers: Y = -0.9225 + 15.0461 (Copiers Serviced).
Small copier: a small-copier call is expected to take 0.7587 minutes longer than a large-copier call with the same number of copiers serviced. Fitted line for small copiers: Y = -0.1638 + 15.0461 (Copiers Serviced).
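As a quick visual check of these two parallel fitted lines, they can be drawn over the data. This is a sketch only; it assumes myData contains the columns Y, X (copiers serviced), and X2 (model indicator) used in Model1 above.
library(ggplot2)
b <- coef(Model1)  # (Intercept), X, X2
ggplot(myData, aes(x = X, y = Y, colour = factor(X2))) +
  geom_point() +
  geom_abline(intercept = b[1], slope = b[2]) +        # large model (X2 = 0)
  geom_abline(intercept = b[1] + b[3], slope = b[2])   # small model (X2 = 1)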
c. Estimate the effect of copier model on mean service time with a 95 percent confidence interval. Interpret your interval estimate.
confint(Model1, level = .95)
The interval 14.057 to 16.035 is the 95 percent confidence interval for \(\beta_1\): with 95 percent confidence, mean service time increases by between roughly 14.06 and 16.04 minutes for each additional copier serviced. The effect of copier model asked for here is \(\beta_2\), whose interval is the X2 row of the confint() output; with 95 percent confidence, the difference in mean service time between small and large models, holding the number of copiers serviced fixed, lies within that interval.
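If only the copier-model coefficient is of interest, the interval can be requested for that term alone; a minimal sketch using the coefficient name from Model1:
confint(Model1, parm = "X2", level = 0.95)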
d. Why would the analyst wish to include \(X_1\), number of copiers, in the regression model when interest is in estimating the effect of type of copier model on service time?
# Model without the number of copiers serviced
Model2 = lm(data = myData, Y ~ X2)
summary(Model2)
# PRESS residuals for each model: e_i / (1 - h_ii)
m1 <- resid(Model1)
m2 <- resid(Model2)
pr1 <- m1/(1 - lm.influence(Model1)$hat)
pr2 <- m2/(1 - lm.influence(Model2)$hat)
# PRESS statistics
sum(pr1^2)
sum(pr2^2)
Without \(X_1\) in the model, the fit is very poor. Comparing adjusted \(R^2\) values, Model1 = 0.9556 versus Model2 = -0.01809, shows the dramatic difference in how much of the variation in service time is explained. The PRESS statistics tell the same story: Model1 = 3981.195 versus Model2 = 87929.27. Including \(X_1\) accounts for a large amount of variability in service time, which reduces the error variance and allows the effect of copier model to be estimated much more precisely.
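The PRESS computation above can also be wrapped in a small helper so it is easy to reuse; this is the same calculation (the sum of squared leave-one-out residuals) written as a function:
PRESS <- function(fit) {
  # PRESS residual = ordinary residual divided by (1 - leverage)
  sum((resid(fit) / (1 - lm.influence(fit)$hat))^2)
}
PRESS(Model1)
PRESS(Model2)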
e. Obtain the residuals and plot them against \(X_1X_2\). Is there any indication that an interaction term in the regression model would be helpful?
plot(resid(Model1))
Int = myData$X*myData$X2
plot(Int, resid(Model1))
Based on the residual plot against the interaction term \(X_1X_2\) (and the original residual plot), the residuals show no systematic pattern, so it does not appear that adding an interaction term to the model would be helpful.
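As a complementary check, the interaction model can be fitted directly and the t-test on the cross-product term examined; this is a sketch that assumes the same myData columns used above, and a non-significant coefficient points the same way as the residual plot.
Model3 = lm(data = myData, Y ~ X + X2 + X:X2)   # equivalently Y ~ X*X2
summary(Model3)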
#Creates X1, X2 and errors
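# Optional addition (not in the original code): fixing the seed makes the
# simulated vectors below reproducible; 123 is an arbitrary choice.
set.seed(123)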
X1 <- rnorm(50,5,3)
X2 <- rnorm(50,2,1)
error <- rnorm(50,0,1)
Without Interaction
Define \(Y\) as follows.
## Without interaction
Y<-2+4*X1+10*X2+error
a. Prepare a conditional effects plot of the response function against \(X_1\) when \(X_2=1\) and when \(X_2=3\). How is the lack of interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
library(ggplot2)
y1 = 2 + 4*X1 + 10*1
y2 = 2 + 4*X1 + 10*3
ggplot() + geom_line(aes(X1, y1)) + geom_line(aes(X1, y2))
Based on this graph, the two lines are parallel: they have the same slope and differ only in their intercepts, which is what the absence of an interaction looks like.
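Substituting the two values of \(X_2\) shows this directly: \[E\{Y\mid X_2=1\}=12+4X_1,\qquad E\{Y\mid X_2=3\}=32+4X_1,\] so both lines have slope 4 and are separated by the constant 20.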
b. Plot a set of contour curves for the response surface. How is the lack of interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
Xi1 <- -10:10
Xi2 <- -10:10
y <- matrix(nrow = length(Xi1), ncol = length(Xi2))
for (i in 1:length(Xi1)) {
  for (j in 1:length(Xi2)) {
    y[i, j] <- 2 + 4*Xi1[i] + 10*Xi2[j]
  }
}
contour(Xi1, Xi2, y, xlab = "x1", ylab = "x2")
It is apparent that there is no interaction because the contour lines are parallel straight lines with constant spacing: along any single contour, \(X_2\) decreases as \(X_1\) increases so that the expected response stays constant, and moving from one line to the next changes the expected value of \(Y\) by the same amount everywhere.
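The same surface can be computed without the explicit loops; an equivalent sketch using outer() over the same Xi1 and Xi2 grid:
y <- outer(Xi1, Xi2, function(a, b) 2 + 4*a + 10*b)   # evaluate the response at every grid point
contour(Xi1, Xi2, y, xlab = "x1", ylab = "x2")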
With Interaction
Define \(Y\) as follows.
## With interaction
Y<-2+4*X1+10*X2+5*X1*X2+error
a. Prepare a conditional effects plot of the response function against \(X_1\) when \(X_2=1\) and when \(X_2=3\). How is the interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
Y1 = 2 + 4*X1 + 10*1 + 5*X1*1
Y2 = 2 + 4*X1 + 10*3 + 5*X1*3
ggplot() + geom_line(aes(X1, Y1)) + geom_line(aes(X1, Y2))
These lines do not have the same slope or the same intercept: the slope of \(Y\) against \(X_1\) changes with the level of \(X_2\), which is how the interaction effect shows up in this graph.
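Substituting the two values of \(X_2\) again makes the difference explicit: \[E\{Y\mid X_2=1\}=12+9X_1,\qquad E\{Y\mid X_2=3\}=32+19X_1,\] so the slope with respect to \(X_1\) changes from 9 to 19 as \(X_2\) moves from 1 to 3.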
b. Plot a set of contour curves for the response surface. How is the interaction effect of \(X_1\) and \(X_2\) on \(Y\) apparent from this graph?
# Use separate grid vectors Xi1, Xi2 so the simulated X1 and X2 above are not overwritten
Xi1 <- -10:10
Xi2 <- -10:10
y <- matrix(nrow = length(Xi1), ncol = length(Xi2))
for (i in 1:length(Xi1)) {
  for (j in 1:length(Xi2)) {
    y[i, j] <- 2 + 4*Xi1[i] + 10*Xi2[j] + 5*Xi1[i]*Xi2[j]
  }
}
contour(Xi1, Xi2, y, xlab = "x1", ylab = "x2")
From this plot we can clearly see that interactions are present: the contour lines are no longer parallel with constant spacing, so the change in expected \(Y\) for a given change in \(X_1\) depends on the value of \(X_2\), unlike the no-interaction case where the spacing is the same across different \(X_1\) values.

Conclusion
Finally, explain how the plots above can be used to determine if an interaction term is present.
We can use the contour curves (together with the conditional effects plots) to judge whether an interaction term is present. When the lines are parallel, there is no interaction: the effect of one predictor on \(Y\) is the same at every level of the other. When the lines are not parallel, the slope with respect to one predictor changes with the level of the other, which is the signature of an interaction effect.
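The graphical checks can also be backed up numerically. A sketch using the simulated data above (with Y as last defined, i.e., the version that includes the interaction): fit the model with and without the cross-product term and compare the two fits with an F test; a significant result indicates that the interaction term is needed.
fit_add <- lm(Y ~ X1 + X2)
fit_int <- lm(Y ~ X1 * X2)
anova(fit_add, fit_int)   # extra-sum-of-squares F test for the interaction term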