Packages needed on this document:
library(knitr) #For table plotting
library(ggplot2) #For graphical environment
library(gridExtra) #Multiple graphics environment
library(grid) # " " "
library(ggpubr) # " " "
library(scatterplot3d) # " " "
library(MPV) #Datasets from Montegomery's book
Loading the delivery time data.
data("softdrink")
A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. We will fit the simple and multiple linear regression models.
colnames(softdrink) <- c("DeliveryTime", "NumberCases", "Distance")
kable(softdrink)
| DeliveryTime | NumberCases | Distance |
|---|---|---|
| 16.68 | 7 | 560 |
| 11.50 | 3 | 220 |
| 12.03 | 3 | 340 |
| 14.88 | 4 | 80 |
| 13.75 | 6 | 150 |
| 18.11 | 7 | 330 |
| 8.00 | 2 | 110 |
| 17.83 | 7 | 210 |
| 79.24 | 30 | 1460 |
| 21.50 | 5 | 605 |
| 40.33 | 16 | 688 |
| 21.00 | 10 | 215 |
| 13.50 | 4 | 255 |
| 19.75 | 6 | 462 |
| 24.00 | 9 | 448 |
| 29.00 | 10 | 776 |
| 15.35 | 6 | 200 |
| 19.00 | 7 | 132 |
| 9.50 | 3 | 36 |
| 35.10 | 17 | 770 |
| 17.90 | 10 | 140 |
| 52.32 | 26 | 810 |
| 18.75 | 9 | 450 |
| 19.83 | 8 | 635 |
| 10.75 | 4 | 150 |
Graphics can be very useful in fitting regression models. On the following is the scatter plot grid of the delivery time data.
p <- ggplot(aes(x = NumberCases, y = DeliveryTime), data = softdrink) +
geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.96")
q <- ggplot(aes(x = Distance, y = DeliveryTime), data = softdrink) +
geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.89")
w <- ggplot(aes(x = NumberCases, y = Distance), data = softdrink) +
geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.82")
grid.arrange(p, q, w, ncol = 3, top = "Scatter diagram")
grid.arrange(ggarrange(p, q, ncol = 2), w, nrow = 2, top = "Scatter diagram") #A different way to plot
plot(softdrink, pch = 16, col = "Blue") #The most famous way
From the scatter diagram above, one can see that it seems to exist a positive linear correlation between the variables delivery time and number of cases, and delivery time and distance. Then, now let’s build two simple linear models:
First one: \(y = \beta_0 + \beta_1 x_1 + \epsilon\)
Second one: \(y = \beta_0 + \beta_1 x_2 + \epsilon\)
lm1 <- lm(DeliveryTime ~ NumberCases, data = softdrink) #Fitting the model
print(lm1) #Getting the estimated values of the intercept and beta
##
## Call:
## lm(formula = DeliveryTime ~ NumberCases, data = softdrink)
##
## Coefficients:
## (Intercept) NumberCases
## 3.321 2.176
pfit <- ggplot(aes(x = NumberCases, y = DeliveryTime), data = softdrink)
pfit + geom_point(color = 'Black') + geom_abline(intercept = 3.321, slope = 2.176, color = "Red") + theme_bw()
lm2 <- lm(DeliveryTime ~ Distance, data = softdrink)
print(lm2)
##
## Call:
## lm(formula = DeliveryTime ~ Distance, data = softdrink)
##
## Coefficients:
## (Intercept) Distance
## 4.96116 0.04257
qfit <- ggplot(aes(x = Distance, y = DeliveryTime), data = softdrink)
qfit + geom_point(color = 'Black') + geom_abline(intercept = 4.96116, slope = 0.04257, color = "Red") + theme_bw()
Now let’s build a model to explain the delivery time by using both number of cases and distance. So the model is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\)
lm3 <- lm(DeliveryTime ~ NumberCases + Distance, data = softdrink) #Fitting the model
print(lm3)
##
## Call:
## lm(formula = DeliveryTime ~ NumberCases + Distance, data = softdrink)
##
## Coefficients:
## (Intercept) NumberCases Distance
## 2.34123 1.61591 0.01438
lm3p <- scatterplot3d(softdrink, type = "h", color = "Blue", angle = 45, pch = 16, main = "Scatter plot 3d") #Creating the scatter plot
lm3.plane <- lm3 #Creating the regression plane
lm3p$plane3d(lm3.plane) #Adding the plane to the scatter plot