Introduction

Packages needed on this document:

library(knitr)     #For table plotting
library(ggplot2)   #For graphical environment
library(gridExtra) #Multiple graphics environment
library(grid)      #    "       "         "
library(ggpubr)    #    "       "         "
library(scatterplot3d)     #    "       "         "
library(MPV)       #Datasets from Montegomery's book

Loading the delivery time data.

data("softdrink")

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. We will fit the simple and multiple linear regression models.

colnames(softdrink) <- c("DeliveryTime", "NumberCases", "Distance")
kable(softdrink)
DeliveryTime NumberCases Distance
16.68 7 560
11.50 3 220
12.03 3 340
14.88 4 80
13.75 6 150
18.11 7 330
8.00 2 110
17.83 7 210
79.24 30 1460
21.50 5 605
40.33 16 688
21.00 10 215
13.50 4 255
19.75 6 462
24.00 9 448
29.00 10 776
15.35 6 200
19.00 7 132
9.50 3 36
35.10 17 770
17.90 10 140
52.32 26 810
18.75 9 450
19.83 8 635
10.75 4 150

Graphics can be very useful in fitting regression models. On the following is the scatter plot grid of the delivery time data.

p <- ggplot(aes(x = NumberCases, y = DeliveryTime), data = softdrink) + 
     geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.96")

q <- ggplot(aes(x = Distance, y = DeliveryTime), data = softdrink) +
     geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.89")

w <- ggplot(aes(x = NumberCases, y = Distance), data = softdrink) +
     geom_point(color = 'Blue') + theme_bw() + labs(caption = "Correlation coefficient = 0.82")

grid.arrange(p, q, w, ncol = 3, top = "Scatter diagram")

grid.arrange(ggarrange(p, q, ncol = 2), w, nrow = 2, top = "Scatter diagram")           #A different way to plot

plot(softdrink, pch = 16, col = "Blue")                                                 #The most famous way

Simple linear model

From the scatter diagram above, one can see that it seems to exist a positive linear correlation between the variables delivery time and number of cases, and delivery time and distance. Then, now let’s build two simple linear models:
First one: \(y = \beta_0 + \beta_1 x_1 + \epsilon\)
Second one: \(y = \beta_0 + \beta_1 x_2 + \epsilon\)

lm1 <- lm(DeliveryTime ~ NumberCases, data = softdrink)   #Fitting the model
print(lm1)                                                #Getting the estimated values of the intercept and beta
## 
## Call:
## lm(formula = DeliveryTime ~ NumberCases, data = softdrink)
## 
## Coefficients:
## (Intercept)  NumberCases  
##       3.321        2.176
pfit <- ggplot(aes(x = NumberCases, y = DeliveryTime), data = softdrink) 
pfit + geom_point(color = 'Black') + geom_abline(intercept = 3.321, slope = 2.176, color = "Red") + theme_bw() 

lm2 <- lm(DeliveryTime ~ Distance, data = softdrink)
print(lm2)
## 
## Call:
## lm(formula = DeliveryTime ~ Distance, data = softdrink)
## 
## Coefficients:
## (Intercept)     Distance  
##     4.96116      0.04257
qfit <- ggplot(aes(x = Distance, y = DeliveryTime), data = softdrink)
qfit + geom_point(color = 'Black') + geom_abline(intercept = 4.96116, slope = 0.04257, color = "Red") + theme_bw()

Multiple linear model

Now let’s build a model to explain the delivery time by using both number of cases and distance. So the model is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\)

lm3 <- lm(DeliveryTime ~ NumberCases + Distance, data = softdrink) #Fitting the model
print(lm3)
## 
## Call:
## lm(formula = DeliveryTime ~ NumberCases + Distance, data = softdrink)
## 
## Coefficients:
## (Intercept)  NumberCases     Distance  
##     2.34123      1.61591      0.01438
lm3p <- scatterplot3d(softdrink, type = "h", color = "Blue", angle = 45, pch = 16, main = "Scatter plot 3d") #Creating the scatter plot
lm3.plane <- lm3 #Creating the regression plane
lm3p$plane3d(lm3.plane) #Adding the plane to the scatter plot