Blog5

Exercise 2.3

The manager of the purchasing department of a large company would like to develop a regression model to predict the average amount of time it takes to process a given number of invoices. Over a 30-day period, data are collected on the number of invoices processed and the total time taken (in hours). The data are available on the book web site in the file invoices.txt. The following model was fit to the data: Y = B0 + B1x + e, where Y is the processing time and x is the number of invoices. A plot of the data and the fitted model can be found in Figure 2.7. Utilizing the output from the fit of this model provided below, complete the following tasks.

a). Find a 95% confidence interval for the start-up time, i.e. B0.

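With n = 30 observations the model has 28 residual degrees of freedom, so the critical value is t(0.975, 28) = 2.0484. The 95% confidence interval for B0 is given by 0.6417099 ± 2.0484 * 0.1222707 = 0.6417099 ± 0.2504602 = (0.3912, 0.8922) hours.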
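As a quick check, the interval can be reproduced from the numbers in the regression summary. This is a minimal sketch that types in the estimate, standard error, and degrees of freedom by hand (the variable names are illustrative, not part of the original analysis); with the fitted model in memory, confint(lmod) should give the same interval.

```r
# Values copied by hand from the regression summary
b0    <- 0.6417099   # intercept estimate
se_b0 <- 0.1222707   # standard error of the intercept
df    <- 28          # residual degrees of freedom (n - 2)

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df)

# Estimate plus/minus critical value times standard error
ci_b0 <- b0 + c(-1, 1) * t_crit * se_b0
ci_b0
```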

b). Suppose that a best practice benchmark for the average processing time for an additional invoice is 0.01 hours (or 0.6 minutes). Test the null hypothesis H0: B1 = 0.01 against a two-sided alternative. Interpret your result.

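H0: B1 = 0.01 against HA: B1 ≠ 0.01. The t-value of 13.797 in the regression summary tests B1 = 0, not B1 = 0.01, so the test statistic has to be recomputed: t = (0.0112916 - 0.01) / 0.0008184 = 1.578 on 28 degrees of freedom. Since |t| = 1.578 is smaller than the critical value 2.0484, the two-sided p-value (about 0.13) exceeds 0.05 and the null hypothesis cannot be rejected at the 5% level. The data are therefore consistent with the benchmark of 0.01 hours per additional invoice, although failing to reject does not prove that the slope is exactly 0.01.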
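The test against a non-zero benchmark can be sketched in a few lines. The slope and its standard error are typed in from the regression summary, and the variable names are illustrative only.

```r
# Values copied by hand from the regression summary
b1      <- 0.0112916   # slope estimate
se_b1   <- 0.0008184   # standard error of the slope
df      <- 28          # residual degrees of freedom
b1_null <- 0.01        # benchmark value under H0

# t statistic for H0: B1 = 0.01 (not the default H0: B1 = 0)
t_stat <- (b1 - b1_null) / se_b1

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, p = p_value)
```

The t value printed by summary(lmod) cannot be reused here because it is computed against a null value of zero.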

c). Find a point estimate and a 95% prediction interval for the time taken to process 130 invoices.

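Point estimate = 0.6417099 + 0.0112916 * 130 = 0.6417099 + 1.467908 = 2.109618 hours, or 126.577 minutes, is the predicted time to process 130 invoices. The 95% prediction interval is the point estimate ± t(0.975, 28) * s * sqrt(1 + 1/n + (130 - xbar)^2 / Sxx). Because 130 is almost exactly the mean number of invoices (130.0333), the last term is negligible and the interval is approximately 2.1096 ± 2.0484 * 0.3298 * sqrt(1 + 1/30) = 2.1096 ± 0.6867 = (1.423, 2.796) hours. The predict output below gives fitted values and 95% prediction intervals at each of the 30 observed invoice counts; none of those rows corresponds exactly to 130 invoices.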
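The prediction interval at 130 invoices can also be worked out from the summary output alone. The sketch below assumes the standard simple-regression formula and recovers Sxx from the slope's standard error, using se(B1) = s / sqrt(Sxx); all inputs are copied by hand from the output, and the variable names are illustrative.

```r
# Values copied by hand from the regression output
b0    <- 0.6417099   # intercept estimate
b1    <- 0.0112916   # slope estimate
se_b1 <- 0.0008184   # standard error of the slope
s     <- 0.3298      # residual standard error
n     <- 30          # number of observations
df    <- n - 2       # residual degrees of freedom
x_bar <- 130.0333    # mean number of invoices
x_new <- 130         # invoice count to predict for

# Recover Sxx from se(b1) = s / sqrt(Sxx)
sxx <- (s / se_b1)^2

# Point estimate and standard error for a new observation
fit     <- b0 + b1 * x_new
se_pred <- s * sqrt(1 + 1 / n + (x_new - x_bar)^2 / sxx)

# 95% prediction interval
pi_130 <- fit + c(-1, 1) * qt(0.975, df) * se_pred
c(fit = fit, lwr = pi_130[1], upr = pi_130[2])
```

With the fitted model in memory, predict(lmod, newdata = data.frame(Invoices = 130), interval = "prediction") should return the same three numbers up to rounding of the inputs.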

require(faraway)

## Loading required package: faraway

require(ggplot2)

## Loading required package: ggplot2

invoice <- read.table("/Users/tponnada/Downloads/invoices.txt", header=TRUE, sep="\t")
lmod <- lm(Time ~ Invoices, invoice)
summary(lmod)

## 
## Call:
## lm(formula = Time ~ Invoices, data = invoice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59516 -0.27851  0.03485  0.19346  0.53083 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.6417099  0.1222707   5.248 1.41e-05 ***
## Invoices    0.0112916  0.0008184  13.797 5.17e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3298 on 28 degrees of freedom
## Multiple R-squared:  0.8718, Adjusted R-squared:  0.8672 
## F-statistic: 190.4 on 1 and 28 DF,  p-value: 5.175e-14

mean(invoice$Time)

## [1] 2.11

median(invoice$Time)

## [1] 2

mean(invoice$Invoices)

## [1] 130.0333

median(invoice$Invoices)

## [1] 127.5

ggplot(invoice, aes(x = Invoices, y = Time)) + geom_point()

# 95% prediction intervals at each observed invoice count (one row per day)
predict(lmod, invoice, interval = "prediction")

##          fit       lwr      upr
## 1  2.3241648 1.6367528 3.011577
## 2  1.3192085 0.6225678 2.015849
## 3  2.7645390 2.0710207 3.458057
## 4  0.9014177 0.1916850 1.611150
## 5  2.9113303 2.2144242 3.608236
## 6  1.2966252 0.5994116 1.993839
## 7  1.5111665 0.8187586 2.203574
## 8  3.1484549 2.4446835 3.852226
## 9  2.6854975 1.9935260 3.377469
## 10 0.9804592 0.2736022 1.687316
## 11 1.8837907 1.1962937 2.571288
## 12 1.5789163 0.8877281 2.270105
## 13 1.3192085 0.6225678 2.015849
## 14 0.9240010 0.2151086 1.632893
## 15 2.5951643 1.9047205 3.285608
## 16 2.5499977 1.8602213 3.239774
## 17 2.7871223 2.0931263 3.481118
## 18 3.2726630 2.5646231 3.980703
## 19 3.9049950 3.1684193 4.641571
## 20 1.1498339 0.4485171 1.851151
## 21 2.8209972 2.1262549 3.515740
## 22 1.4321250 0.7381128 2.126137
## 23 3.3629961 2.6515678 4.074424
## 24 1.8047492 1.1165791 2.492919
## 25 2.4822479 1.7933512 3.171145
## 26 1.9967072 1.3098249 2.683589
## 27 2.9113303 2.2144242 3.608236
## 28 2.1660818 1.4793551 2.852809
## 29 1.5450414 0.8532614 2.236821
## 30 0.9691676 0.2619109 1.676424