dat <- read.csv(file.choose())
deliverytime <- dat$DeliveryTime.min.
obs <- dat$Observation
numc <- dat$NumCases
distance <- dat$Distance.ft.
model <- lm(deliverytime~numc+distance+numc:distance)
summary(model)
##
## Call:
## lm(formula = deliverytime ~ numc + distance + numc:distance)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.7316 -1.5387  0.0606  1.4375  4.7841
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   7.1390846  1.3997413   5.100 4.73e-05 ***
## numc          1.0144063  0.1912517   5.304 2.93e-05 ***
## distance      0.0058273  0.0033825   1.723 0.099622 .
## numc:distance 0.0007419  0.0001750   4.240 0.000366 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.449 on 21 degrees of freedom
## Multiple R-squared: 0.9782, Adjusted R-squared: 0.9751
## F-statistic: 314.6 on 3 and 21 DF, p-value: < 2.2e-16
Here are the model parameters:
model$coefficients
##   (Intercept)          numc      distance numc:distance
##  7.1390845734  1.0144062540  0.0058273479  0.0007419211
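To see how these coefficients combine, including the interaction term, here is a minimal sketch that computes one predicted delivery time by hand. The coefficients are copied from the output above; the input values (10 cases, 500 ft) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from model$coefficients above
b <- c(intercept = 7.1390845734, numc = 1.0144062540,
       distance = 0.0058273479, interaction = 0.0007419211)

# Hypothetical delivery, not a row from the data set
new_numc <- 10
new_distance <- 500

# Fitted equation: b0 + b1*numc + b2*distance + b3*numc*distance
pred <- unname(b["intercept"] +
                 b["numc"] * new_numc +
                 b["distance"] * new_distance +
                 b["interaction"] * new_numc * new_distance)
pred  # predicted delivery time in minutes
```

With the fitted object available, `predict(model, data.frame(numc = 10, distance = 500))` should give the same value up to rounding.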
Here is the R-squared:
summary(model)$r.squared
## [1] 0.9782308
plot(model)
From the residuals vs. fitted and normal Q-Q plots, the normality and
constant-variance assumptions appear to hold. From the standardized
residual plot, point 11 may be an outlier. Furthermore, in the
residuals vs. leverage plot, point 9 has a Cook's distance greater
than 1, suggesting it may be an influential point.
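The same diagnostics can be checked numerically instead of being read off the plots. The sketch below uses simulated toy data (hypothetical, not the delivery-time data) and the common rules of thumb |standardized residual| > 2 and Cook's distance > 1. It also shows how Cook's distance combines residual size and leverage.

```r
# Toy data (hypothetical), not the delivery-time data set
set.seed(1)
toy <- data.frame(x1 = runif(25, 1, 30), x2 = runif(25, 30, 1500))
toy$y <- 5 + 1.2 * toy$x1 + 0.01 * toy$x2 + rnorm(25, sd = 2)

fit <- lm(y ~ x1 + x2 + x1:x2, data = toy)

r <- rstandard(fit)       # standardized residuals
h <- hatvalues(fit)       # leverage of each point
d <- cooks.distance(fit)  # Cook's distance of each point
p <- length(coef(fit))    # number of estimated coefficients

# Cook's distance rebuilt from residual size and leverage
d_manual <- r^2 * h / ((1 - h) * p)

which(abs(r) > 2)  # candidate outliers (rule of thumb)
which(d > 1)       # candidate influential points (rule of thumb)
```

For the model in this report, the same calls on `model` should flag the points seen in the plots.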
Here, we remove only point 9, because it has a high Cook's distance (>1).
dat1 <- dat[-9,]
dt <- dat1$DeliveryTime.min.
dis <- dat1$Distance.ft.
numc1 <- dat1$NumCases
obs1 <- dat1$Observation
model1 <- lm(dt~numc1+dis+numc1:dis)
summary(model1)
##
## Call:
## lm(formula = dt ~ numc1 + dis + numc1:dis)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8495 -1.3509 -0.0835  1.6174  4.9098
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7984402  1.9709874   2.942 0.008062 **
## numc1       1.2660217  0.3229617   3.920 0.000848 ***
## dis         0.0080441  0.0040895   1.967 0.063212 .
## numc1:dis   0.0003480  0.0004432   0.785 0.441497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.452 on 20 degrees of freedom
## Multiple R-squared: 0.9502, Adjusted R-squared: 0.9428
## F-statistic: 127.3 on 3 and 20 DF, p-value: 3.368e-13
Here are the model parameters for the new model.
model1$coefficients
##  (Intercept)        numc1          dis    numc1:dis
## 5.7984402036 1.2660216932 0.0080440801 0.0003480246
Here is the R-squared value for the new model.
summary(model1)$r.squared
## [1] 0.9502353
plot(model1)
The residuals vs. fitted plot looks similar to the one before the removal; however, the normal Q-Q plot looks worse after removing point 9. From the standardized residual plot, point 10 may still be an outlier (most likely the former point 11, since the refit labels points by position). The residuals vs. leverage plot, however, shows that none of the points has a high Cook's distance (D<0.5).
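A note on point labels: when a model is fit on vectors extracted from a subsetted data frame, R labels the points by position, so every observation after the removed row shifts down by one. Fitting with `data =` keeps the original row names instead. The sketch below illustrates this on toy data (hypothetical, not the delivery-time data).

```r
# Toy data (hypothetical), not the delivery-time data set
set.seed(42)
toy <- data.frame(x = 1:25)
toy$y <- 3 + 2 * toy$x + rnorm(25)
toy1 <- toy[-9, ]  # row names are now 1..8, 10..25

# Fit on extracted vectors: points are labeled by position
xv <- toy1$x
yv <- toy1$y
fit_vec <- lm(yv ~ xv)

# Fit with data =: points keep their original row names
fit_df <- lm(y ~ x, data = toy1)

names(resid(fit_vec))[9:10]  # "9" "10"
names(resid(fit_df))[9:10]   # "10" "11"
```

Using the `data =` form makes it easier to compare diagnostic plots before and after removing a point.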
plot(model1,5)
This plot shows that no points lie beyond the Cook's distance = 0.5 contour, so we no longer appear to have any influential points. However, one point has a high leverage value (>0.6).
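Leverage can also be inspected directly with `hatvalues()`. The sketch below uses toy data (hypothetical) in which one x value lies far from the rest. The 2p/n cutoff is a common rule of thumb and is an assumption here, not a threshold used in this analysis.

```r
# Toy data (hypothetical): the last x value is far from the others
set.seed(7)
x <- c(1:24, 200)
y <- 2 + 0.5 * x + rnorm(25)
fit <- lm(y ~ x)

h <- hatvalues(fit)     # leverage depends only on the predictors
p <- length(coef(fit))  # number of estimated coefficients
n <- length(h)          # number of observations

which(h > 2 * p / n)  # points above the rule-of-thumb cutoff
which(h > 0.6)        # points above the 0.6 level mentioned above
```

A high-leverage point is not necessarily influential; whether it is depends on how far its response falls from the fitted trend.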
No, because removing point 9 negatively affected our R-squared (it dropped from 0.9782 to 0.9502) and the normal Q-Q plot.