The p-value for the coefficient of the interaction term is 0.0120. It is Statistically significant at 0.05, but it is not significant at 0.01.
latour <- read.table("/Users/shijiabian/Dropbox/STA 721 Linear Model/Homework/HW2/Latour.txt", header = T)
fit <- lm(formula = Quality ~ EndofHarvest + Rain + Rain:EndofHarvest, data = latour)
summary(fit)
##
## Call:
## lm(formula = Quality ~ EndofHarvest + Rain + Rain:EndofHarvest,
## data = latour)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.683 -0.570 0.127 0.439 1.635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.1612 0.6892 7.49 3.9e-09 ***
## EndofHarvest -0.0314 0.0176 -1.79 0.082 .
## Rain 1.7867 1.3174 1.36 0.183
## EndofHarvest:Rain -0.0831 0.0316 -2.63 0.012 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.758 on 40 degrees of freedom
## Multiple R-squared: 0.685, Adjusted R-squared: 0.661
## F-statistic: 29 on 3 and 40 DF, p-value: 4.02e-10
If there is no unwanter rain happens, 31.85 days of delay to the end of the harvest can decrease the quality rating by one point.
If there is some unwanter rain happens, 8.73 days of delay to the end of the harvest can decrease the quality rating by one point.
\[ \begin{equation} \begin{split} \text{Quality} &= \hat{\beta_0} + \text{EndofHarvest} \times \hat{\beta_1} + \text{Rain} \times \hat{\beta_2} + \text{Rain} \times \text{EndofHarvest} \times \hat{\beta_3}\\ & = \hat{\beta_0} + \text{EndofHarvest} \times \hat{\beta_1} \quad\text{No unwanted rain at harvest, R = 0}\\ & \text{or} \\ & = (\hat{\beta_0} + \hat{\beta_0}) + \text{EndofHarvest} \times (\hat{\beta_1} + \hat{\beta_3}) \quad \text{Some unwanted rain at harvest, R = 1} \end{split} \end{equation} \]
Therefore, \[ \begin{equation} \begin{split} \text{Quality} &= \hat{\beta_0} + \text{EndofHarvest} \times \hat{\beta_1} + \text{Rain} \times \hat{\beta_2} + \text{Rain} \times \text{EndofHarvest} \times \hat{\beta_3}\\ & = 5.16122 - 0.03145 \times \hat{\beta_1} \quad\text{No unwanted rain at harvest, R = 0}\\ & \text{or} \\ & = (5.16122 + 1.78670) + \text{EndofHarvest} \times (-0.03145 - 0.08314) \quad \text{Some unwanted rain at harvest, R = 1} \end{split} \end{equation} \]
attach(latour)
### For no rain at harvest
# Build the data frame specific for no rain, and extract x and y axis
latourNoRaine <- latour$Rain==0
NoRain <- latour[latourNoRaine,]
yNoRaine <- NoRain$Quality
xNoRaine <- NoRain$EndofHarvest
# Build the data frame specific for rain
latourRain <-latour$Rain==1
Rain <- latour[latourRain,]
yRaine <- Rain$Quality
xRaine <- Rain$EndofHarvest
# Rebuild the model for rain and no rain specifically
fitnoRain <- lm (yNoRaine~xNoRaine)
fitRain <- lm (yRaine~xRaine)
y = latour$Rain
par(mfrow=c(1,1))
plot(EndofHarvest,Quality,pch=y+1,col=y+1,xlab="End of Harvest (in days since August 31)")
legend(20, 2.5,legend=c("No","Yes"),pch=1:2,col=1:2,title="Rain at Harvest?")
legend(45, 5,legend=c("NoRainCI","RainCI","NoRainPI","RainPI"),lty=c(2,2,4,4),col= c("blue","purple",
"red", "orange"))
abline(fitnoRain,lty=1,col=1)
abline(fitRain,lty=2,col=2)
new <- data.frame(xNoRaine = seq(min(xNoRaine),max(xNoRaine), length.out=100))
pred.w.plim <- predict(lm (yNoRaine~xNoRaine), new, interval="prediction")
pred.w.clim <- predict(lm (yNoRaine~xNoRaine), new, interval="confidence")
# overlaying the C intervals
lines(new$xNoRaine,pred.w.clim[,2],col="blue",lty=2)
lines(new$xNoRaine,pred.w.clim[,3],col="blue",lty=2)
# overlaying the P intervals
lines(new$xNoRaine,pred.w.plim[,2],col="red",lty=4)
lines(new$xNoRaine,pred.w.plim[,3],col="red",lty=4)
newRain <- data.frame(xRaine = seq(min(xRaine),max(xRaine), length.out=100))
pred.r.plim <- predict(lm(yRaine~xRaine), newRain, interval="prediction")
pred.r.clim <- predict(lm(yRaine~xRaine), newRain, interval="confidence")
# overlaying the C intervals
lines(newRain$xRaine,pred.r.plim[,2],col="purple",lty=2)
lines(newRain$xRaine,pred.r.clim[,3],col="purple",lty=2)
# overlaying the P intervals
lines(newRain$xRaine,pred.r.plim[,2],col="orange",lty=4)
lines(newRain$xRaine,pred.r.clim[,3],col="orange",lty=4)
detach(latour)
### Citation
## Textbook Website <http://www.stat.tamu.edu/~sheather/book/docs/rcode/Chapter5.R>
## Plot Prediction and Confidence Interval in R <http://www.stat.sc.edu/~wang345/Teaching/STA704Fall2013/Code/Ch1_Toluca.R>
For both the rain and no rain case, the linear model, confidence interval and prediciton interval are negatively related. The overall rating for the no rain is higher than that have rain, this also matches our intuition. Please ignore the graph showing the x-axis is smaller than 20, because this is out of the range of the data. I ploted this way because I want to show the graph more clearly. We want to use confidence interval to test sampling error in relation to the parameter of interest: sample size increases, sampling error decreases and the interval becomes narrower. Confidence intervals does not show the distribution of individual values, it shows the the interested parameter. Given the specified settings of the predictors in the model, we use a prediction interval to predict the range that is likely to contain the response value of a single new observation.