Before starting, let’s make sure the R Commander Plugin HH is active. This will make our lives easier.
The working directory matters for rendering the Markdown to .html; set it to this file's location.
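One way to do that from the console is shown below (a sketch; the path is hypothetical and should point at the folder that actually holds this file):
> setwd("C:/Users/me/courses/HMB")   # hypothetical path: the folder containing this file
> getwd()                            # confirm it took effect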
Generate the HTML report. That will give us the data.
What are the fixed costs, with 95% confidence? What are the variable costs, with 95% confidence? How much of the variance in total costs is explained by this simple accounting partition? Compare a linear model and a polynomial of order 2: can we reject the curvature? Predict costs (average and individual) at interesting levels. We can let HH do most of the work and then adjust what it produces to get precisely what we want; this will force us to manipulate the Markdown a wee bit. Have a look at the effects plots.
To the example in the slides…
This will use the HMB data.
First a linear regression.
> HMB.L <- lm(TotCost~units, data=HMB)
> summary(HMB.L)
Call:
lm(formula = TotCost ~ units, data = HMB)
Residuals:
Min 1Q Median 3Q Max
-1606.7 -610.9 -124.8 509.8 2522.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12308.642 323.601 38.04 <2e-16 ***
units 21.702 0.862 25.18 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 871.8 on 58 degrees of freedom
Multiple R-squared: 0.9162, Adjusted R-squared: 0.9147
F-statistic: 633.8 on 1 and 58 DF, p-value: < 2.2e-16
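To answer the cost questions with 95% confidence, the intercept estimates the fixed costs and the slope on units estimates the variable cost per unit; one way to read the intervals off is confint() (a sketch, run after fitting HMB.L as above):
> confint(HMB.L, level=.95)   # 95% CIs: (Intercept) = fixed costs, units = variable cost per unit
The Multiple R-squared of 0.9162 already answers the variance question: this simple fixed-plus-variable partition explains about 92% of the variation in total costs.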
Now a linear model with a quadratic term to fit the polynomial.
> HMB.Q <- lm(TotCost ~ poly(units , degree=2, raw=TRUE), data=HMB)
> summary(HMB.Q)
Call:
lm(formula = TotCost ~ poly(units, degree = 2, raw = TRUE), data = HMB)
Residuals:
Min 1Q Median 3Q Max
-1624.5 -616.3 -132.9 516.4 2507.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.219e+04 8.774e+02 13.890 < 2e-16 ***
poly(units, degree = 2, raw = TRUE)1 2.250e+01 5.421e+00 4.150 0.000112 ***
poly(units, degree = 2, raw = TRUE)2 -1.129e-03 7.585e-03 -0.149 0.882230
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 879.2 on 57 degrees of freedom
Multiple R-squared: 0.9162, Adjusted R-squared: 0.9133
F-statistic: 311.6 on 2 and 57 DF, p-value: < 2.2e-16
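To ask whether we can reject the curvature, compare the two nested models directly; a sketch using anova(), whose F test for the quadratic term should agree with the t test on the squared term above:
> anova(HMB.L, HMB.Q)   # extra-sum-of-squares F test: linear vs. quadratic
Given the t value of -0.149 (p ≈ 0.88) on the squared term, we cannot reject "no curvature"; the simple linear model is adequate.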
In the model box, select HMB.L and predict it for 300, 350, and 400 units: Models, then Prediction Intervals (HH). First, the confidence interval for mean total costs.
> .NewData <- data.frame(units=c(300, 350, 400), row.names=c("1", "2", "3"))
> .NewData # Newdata
units
1 300
2 350
3 400
> predict(HMB.L, newdata=.NewData, interval="confidence", level=.95, se.fit=FALSE)
fit lwr upr
1 18819.11 18576.63 19061.58
2 19904.19 19678.87 20129.50
3 20989.26 20749.21 21229.31
Now just copy the Markdown chunk from above exactly [including the backtick marks that fence it]. All we will change is HMB.L to HMB.Q, so R predicts the same values for the other model. This is a shortcoming of the predictions: they are smart, but that very flexibility makes them hard to capture in a dialog box.
> .NewData <- data.frame(units=c(300, 350, 400), row.names=c("1", "2", "3"))
> .NewData # Newdata
units
1 300
2 350
3 400
> predict(HMB.Q, newdata=.NewData, interval="confidence", level=.95, se.fit=FALSE)
fit lwr upr
1 18835.21 18508.42 19162.00
2 19923.42 19578.95 20267.88
3 21005.98 20675.40 21336.56
We will replicate the same process for the individual distribution of predictions [the range of possible costs, not average costs; remember, we are assuming the residuals are normal, so the predictions are normal]. The same dialog steps, but now a prediction interval.
> .NewData <- data.frame(units=c(300, 350, 400), row.names=c("1", "2", "3"))
> .NewData # Newdata
units
1 300
2 350
3 400
> predict(HMB.L, newdata=.NewData, interval="prediction", level=.95, se.fit=FALSE)
fit lwr upr
1 18819.11 17057.28 20580.93
2 19904.19 18144.64 21663.73
3 20989.26 19227.77 22750.75
Now to the quadratic but otherwise the same.
> .NewData <- data.frame(units=c(300, 350, 400), row.names=c("1", "2", "3"))
> .NewData # Newdata
units
1 300
2 350
3 400
> predict(HMB.Q, newdata=.NewData, interval="prediction", level=.95, se.fit=FALSE)
fit lwr upr
1 18835.21 17044.52 20625.89
2 19923.42 18129.42 21717.41
3 21005.98 19214.60 22797.37
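Finally, before the more elaborate prediction exercise, have a look at the effects plots mentioned at the start. A minimal sketch, assuming the effects package (which R Commander's effect-plot menu relies on) is installed:
> library(effects)          # assumed installed; used by R Commander's effect plots
> plot(allEffects(HMB.L))   # fitted total cost against units, with a confidence band
> plot(allEffects(HMB.Q))   # the same display for the quadratic fit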
And an even more elaborate version. All it does is take new observations on units and predict for those values. We could upload a spreadsheet. Let’s try it.
First, import the new data from a spreadsheet.
Then we will predict using those data, but we need to assign the predictions a name.
Then we can merge the predictions back into the original data.
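A sketch of those three steps in code (the file name NewUnits.csv and its column name units are assumptions; use whatever the spreadsheet actually contains):
> NewUnits <- read.csv("NewUnits.csv")           # hypothetical spreadsheet of new unit levels
> Pred <- predict(HMB.L, newdata=NewUnits,
+                 interval="prediction", level=.95)
> NewUnits <- cbind(NewUnits, PredCost = Pred)   # give the predictions a name alongside the new units
> HMB.all <- merge(HMB, NewUnits, by="units", all=TRUE)   # merge the predictions back into the original data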