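# Read the delivery-time data from a CSV file chosen interactively
dat <- read.csv(file.choose())
# Pull out the response and predictors as plain vectors
deliverytime <- dat$DeliveryTime.min.
obs <- dat$Observation
numc <- dat$NumCases
distance <- dat$Distance.ft.
# Fit main effects plus the numc-by-distance interaction
model <- lm(deliverytime~numc+distance+numc:distance)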
summary(model)
## 
## Call:
## lm(formula = deliverytime ~ numc + distance + numc:distance)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7316 -1.5387  0.0606  1.4375  4.7841 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.1390846  1.3997413   5.100 4.73e-05 ***
## numc          1.0144063  0.1912517   5.304 2.93e-05 ***
## distance      0.0058273  0.0033825   1.723 0.099622 .  
## numc:distance 0.0007419  0.0001750   4.240 0.000366 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.449 on 21 degrees of freedom
## Multiple R-squared:  0.9782, Adjusted R-squared:  0.9751 
## F-statistic: 314.6 on 3 and 21 DF,  p-value: < 2.2e-16

1) What are your estimates of the regression parameters, and what is the associated value of R^2?

Here are the model parameters:

model$coefficients
##   (Intercept)          numc      distance numc:distance 
##  7.1390845734  1.0144062540  0.0058273479  0.0007419211

Here is the R-squared value:

summary(model)$r.squared
## [1] 0.9782308

2) What do you notice in the diagnostic plots (all of them)?

plot(model)

From the residuals vs. fitted and normal Q-Q plots, the normality and constant-variance assumptions appear to hold. The standardized residual plot suggests that point 11 may be an outlier. In the residuals vs. leverage plot, point 9 has a Cook's distance greater than 1, which suggests it is an influential point.
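The visual impressions above can be cross-checked numerically. The following is a minimal sketch on simulated stand-in data, not the delivery data set: the data frame `sim`, its column names, and the planted extreme row 9 are all hypothetical. `rstandard()`, `hatvalues()`, and `cooks.distance()` are base R functions from the `stats` package, and the cutoffs |r| > 2 and D > 1 are common rules of thumb rather than strict tests.

```r
# Simulated stand-in data (hypothetical; NOT the delivery data set).
set.seed(1)
n <- 25
sim <- data.frame(cases = round(runif(n, 2, 10)),
                  dist  = round(runif(n, 100, 600)))
# Make row 9 extreme in both predictors so that it has high leverage.
sim$cases[9] <- 30
sim$dist[9]  <- 1460
sim$time <- 7 + 1.0 * sim$cases + 0.006 * sim$dist +
  0.0007 * sim$cases * sim$dist + rnorm(n, sd = 2.5)

fit <- lm(time ~ cases + dist + cases:dist, data = sim)

# One row of diagnostics per observation.
diag_tab <- data.frame(
  std_resid = rstandard(fit),      # standardized residuals
  leverage  = hatvalues(fit),      # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit)  # Cook's distance
)

# Observations exceeding the rule-of-thumb cutoffs.
flagged <- diag_tab[abs(diag_tab$std_resid) > 2 | diag_tab$cooks_d > 1, ]
print(flagged)
```

With the real data, the same three functions could be applied to `model` directly, for example `cooks.distance(model)[9]`.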

3) Remove the observation(s) that appear to be the most influential

Here, we remove only point 9, because it has a high Cook's distance (>1).

dat1 <- dat[-9,]

4) What are your estimates of the regression parameters, and what is the associated value of R^2?

# Same variables and model as before, with observation 9 removed
dt <- dat1$DeliveryTime.min.
dis <- dat1$Distance.ft.
numc1 <- dat1$NumCases
obs1 <- dat1$Observation
model1 <- lm(dt~numc1+dis+numc1:dis)
summary(model1)
## 
## Call:
## lm(formula = dt ~ numc1 + dis + numc1:dis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8495 -1.3509 -0.0835  1.6174  4.9098 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.7984402  1.9709874   2.942 0.008062 ** 
## numc1       1.2660217  0.3229617   3.920 0.000848 ***
## dis         0.0080441  0.0040895   1.967 0.063212 .  
## numc1:dis   0.0003480  0.0004432   0.785 0.441497    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.452 on 20 degrees of freedom
## Multiple R-squared:  0.9502, Adjusted R-squared:  0.9428 
## F-statistic: 127.3 on 3 and 20 DF,  p-value: 3.368e-13

Here are the model parameters for the new model.

model1$coefficients
##  (Intercept)        numc1          dis    numc1:dis 
## 5.7984402036 1.2660216932 0.0080440801 0.0003480246

Here is the R-squared value for the new model.

summary(model1)$r.squared
## [1] 0.9502353
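To see how far a single observation moves the fit, the coefficients and R-squared values of the two models can be placed side by side. The sketch below uses simulated stand-in data (the data frame `sim`, its columns, and the planted extreme row 9 are hypothetical, not the delivery data). It relies on `update()` to refit the same formula on a reduced data frame, and on the fact that `cases * dist` expands to `cases + dist + cases:dist`.

```r
# Simulated stand-in data (hypothetical; NOT the delivery data set).
set.seed(2)
n <- 25
sim <- data.frame(cases = round(runif(n, 2, 10)),
                  dist  = round(runif(n, 100, 600)))
sim$cases[9] <- 30
sim$dist[9]  <- 1460
sim$time <- 7 + sim$cases + 0.006 * sim$dist + rnorm(n, sd = 2.5)

fit_all  <- lm(time ~ cases * dist, data = sim)
fit_drop <- update(fit_all, data = sim[-9, ])  # same formula, row 9 removed

# Coefficients from both fits in one table.
comparison <- cbind(all_rows = coef(fit_all), without_9 = coef(fit_drop))
print(comparison)

# R-squared from both fits.
r2 <- c(all_rows  = summary(fit_all)$r.squared,
        without_9 = summary(fit_drop)$r.squared)
print(r2)
```

For the models in this report, the equivalent table would be `cbind(coef(model), coef(model1))`; the row names differ between the two fits, but the order of the terms is the same.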

5) What do you notice in the diagnostic plots (all of them) after removing this point?

plot(model1)

The residuals vs. fitted plot looks similar to the one before removal; however, the normal Q-Q plot looks worse after removing point 9. The standardized residual plot suggests that point 10 may still be an outlier. The residuals vs. leverage plot shows that no point has a high Cook's distance (all D < 0.5).
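Because `model1` is fit on bare vectors, the point labels in its diagnostic plots are positions 1 to 24 rather than original observation numbers, so every label after the removed row shifts down by one; the point labelled 10 here is therefore most likely the original observation 11. A minimal sketch of this effect on simulated stand-in data (the names `sim`, `sub`, `fit_vec`, and `fit_df` are hypothetical), showing that passing `data =` keeps the original row names:

```r
# Simulated stand-in data (hypothetical; NOT the delivery data set).
set.seed(3)
n <- 25
sim <- data.frame(cases = round(runif(n, 2, 10)),
                  dist  = round(runif(n, 100, 600)))
sim$time <- 7 + sim$cases + 0.006 * sim$dist + rnorm(n, sd = 2.5)

sub <- sim[-9, ]  # drop row 9; the remaining rows keep their row names

# Fit on bare vectors, as in the text: labels are positions 1..24.
y  <- sub$time
x1 <- sub$cases
x2 <- sub$dist
fit_vec <- lm(y ~ x1 + x2 + x1:x2)

# Fit with data =: labels are the original row names, so "9" is simply absent.
fit_df <- lm(time ~ cases + dist + cases:dist, data = sub)

print(head(names(residuals(fit_vec)), 10))  # positional labels
print(head(names(residuals(fit_df)), 10))   # original row names
```

Both fits give the same coefficients; only the labels differ, which matters when comparing flagged points across models.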

6) Is there now another point that might be influential or that has leverage?

plot(model1,5)

This plot shows that no point lies beyond the Cook's distance = 0.5 contour, so there no longer appear to be any influential points. However, one point still has a high leverage value (>0.6).

7) Do you believe that this point should have been removed? (explain)

No, because removing point 9 negatively affected both the R-squared value (it dropped from 0.978 to 0.950) and the normal Q-Q plot.
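Leverage and influence can also be separated numerically. The sketch below is an illustration on simulated stand-in data (the data frame `sim` and the planted remote row 20 are hypothetical, not the delivery data). It flags points whose hat value exceeds 2p/n, where p is the number of estimated coefficients and n the number of observations; this cutoff is a common convention assumed here, not something stated in the assignment.

```r
# Simulated stand-in data (hypothetical; NOT the delivery data set).
set.seed(4)
n <- 24
sim <- data.frame(cases = round(runif(n, 2, 10)),
                  dist  = round(runif(n, 100, 600)))
sim$cases[20] <- 26      # one point far from the rest in predictor space
sim$dist[20]  <- 1400
sim$time <- 6 + 1.2 * sim$cases + 0.008 * sim$dist + rnorm(n, sd = 2.5)

fit <- lm(time ~ cases * dist, data = sim)

h      <- hatvalues(fit)        # leverage of each observation
p      <- length(coef(fit))     # number of estimated coefficients (4 here)
cutoff <- 2 * p / nrow(sim)     # rule-of-thumb threshold 2p/n

high_lev <- which(h > cutoff)   # indices of high-leverage points
d <- cooks.distance(fit)

# High leverage does not by itself imply a large Cook's distance.
print(data.frame(leverage = h[high_lev], cooks_d = d[high_lev]))
```

A point can sit above the leverage cutoff while its Cook's distance stays small, which is the situation described for the remaining high-leverage point in `model1`.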