Cost Accounting Markdown

Robert W. Walker

2015-10-27

The overall challenge: Understand and characterize the cost structure of four example firms using regression models based on reported data.

Let’s first look at the data…

> par(mfrow=c(2,2))
> plot(TotCost~units, data=HMB)
> plot(TotCost2~units, data=ScarceSteady)
> plot(TotCost4~units, data=Semiconductors)
> scatterplot(TotCost3~units|PlantFac, reg.line=lm, smooth=FALSE, spread=FALSE, id.method=FALSE, data=PowerPlant)

> scatterplot(TotCost~units, reg.line=lm, smooth=FALSE, spread=FALSE, 
+   id.method='mahal', id.n=2, boxplots=FALSE, span=0.5,
+   xlab="Units Produced", ylab="Total Cost", main="HandMade Bags",
+   data=HMB, xlim=c(0,600), ylim=c(0,26000))

 5 28 
> HMB.L <- lm(TotCost~units, data=HMB)
> summary(HMB.L)

Call:
lm(formula = TotCost ~ units, data = HMB)

Residuals:
    Min      1Q  Median      3Q     Max 
-1606.7  -610.9  -124.8   509.8  2522.3 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12308.642    323.601   38.04   <2e-16 ***
units          21.702      0.862   25.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 871.8 on 58 degrees of freedom
Multiple R-squared:  0.9162,    Adjusted R-squared:  0.9147 
F-statistic: 633.8 on 1 and 58 DF,  p-value: < 2.2e-16
> HMB.Q <- lm(TotCost~poly(units, 2, raw=TRUE), data=HMB)
> summary(HMB.Q)

Call:
lm(formula = TotCost ~ poly(units, 2, raw = TRUE), data = HMB)

Residuals:
    Min      1Q  Median      3Q     Max 
-1624.5  -616.3  -132.9   516.4  2507.0 

Coefficients:
                                Estimate   Std. Error t value Pr(>|t|)    
(Intercept)                 12187.439285   877.442126  13.890  < 2e-16 ***
poly(units, 2, raw = TRUE)1    22.497840     5.421259   4.150 0.000112 ***
poly(units, 2, raw = TRUE)2    -0.001129     0.007585  -0.149 0.882230    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 879.2 on 57 degrees of freedom
Multiple R-squared:  0.9162,    Adjusted R-squared:  0.9133 
F-statistic: 311.6 on 2 and 57 DF,  p-value: < 2.2e-16
> HMB <- within(HMB, {
+   fitted.HMB.L <- fitted(HMB.L)        # fitted values from the linear model
+   residuals.HMB.L <- residuals(HMB.L)  # residuals from the linear model
+   fitted.HMB.Q <- fitted(HMB.Q)        # fitted values from the quadratic model
+   residuals.HMB.Q <- residuals(HMB.Q)  # residuals from the quadratic model
+ })
> qqPlot(HMB$residuals.HMB.L, dist= "norm", labels=FALSE)

> shapiro.test(HMB$residuals.HMB.L)

    Shapiro-Wilk normality test

data:  HMB$residuals.HMB.L
W = 0.98224, p-value = 0.5304
> qqPlot(HMB$residuals.HMB.Q, dist= "norm", labels=FALSE)

> shapiro.test(HMB$residuals.HMB.Q)

    Shapiro-Wilk normality test

data:  HMB$residuals.HMB.Q
W = 0.98214, p-value = 0.5254

Now, how do we interpret this? With normal residuals, the average (mean) predictions are distributed t, the full predictions for new observations are distributed normal around that mean, and the estimated intercepts and slopes are distributed t. Here is what we can do.

> library(MASS, pos=32)
> Confint(HMB.L, level=0.95)
               Estimate       2.5 %      97.5 %
(Intercept) 12308.64215 11660.88464 12956.39966
units          21.70155    19.97605    23.42705
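
The slope interval is just the estimate plus or minus the 97.5th percentile of t with 58 degrees of freedom times its standard error; a quick check by hand from the summary above:

> 21.70155 + c(-1, 1)*qt(0.975, df=58)*0.862  # roughly 19.98 and 23.43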
> .NewData <- data.frame(units=c(300, 350, 400), row.names=c("1", "2", "3"))
> .NewData  # Newdata
  units
1   300
2   350
3   400
> predict(HMB.L, newdata=.NewData, interval="confidence", level=.95, 
+   se.fit=FALSE)
       fit      lwr      upr
1 18819.11 18576.63 19061.58
2 19904.19 19678.87 20129.50
3 20989.26 20749.21 21229.31
> predict(HMB.L, newdata=.NewData, interval="prediction", level=.95, 
+   se.fit=FALSE)
       fit      lwr      upr
1 18819.11 17057.28 20580.93
2 19904.19 18144.64 21663.73
3 20989.26 19227.77 22750.75
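
Why are these wider than the confidence intervals? A prediction for a single new observation adds the residual noise to the uncertainty about the mean. A minimal sketch with se.fit=TRUE shows the two pieces being combined:

> p <- predict(HMB.L, newdata=.NewData, se.fit=TRUE)
> qt(0.975, df=58)*sqrt(p$se.fit^2 + p$residual.scale^2)  # half-widths of the prediction intervals above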

Does the curve improve on the line? The null hypothesis of the F test is that the two models are equivalent; we can only rule out equivalence if the curve explains more variation than the line, after accounting for the extra term needed to estimate the curve. In this case, the curve explains little. The logic of F is this: how much new variation is explained, and how many degrees of freedom were spent explaining it? How does that compare to the average squared residual? If the ratio is big enough, I must be explaining a lot relative to how much I do not know.

For this example, a lot would be an F of 4.0098679, or about four times the average squared residual. This all follows from the normal: about 95% of t statistics fall between -2 and 2, so in squared terms about 95% should fall below 4.
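
That critical value can be recovered directly, assuming the usual 0.95 cutoff and the 1 and 57 degrees of freedom from the comparison below:

> qf(0.95, df1=1, df2=57)
[1] 4.009868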

> anova(HMB.L, HMB.Q)
Analysis of Variance Table

Model 1: TotCost ~ units
Model 2: TotCost ~ poly(units, degree = 2, raw = TRUE)
  Res.Df      RSS Df Sum of Sq      F Pr(>F)
1     58 44080089                           
2     57 44062971  1     17118 0.0221 0.8822
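
The F in the table can be reproduced by hand from the RSS column: the new variation explained by the curve, divided by its one degree of freedom, over the quadratic model's mean squared residual:

> ((44080089 - 44062971)/1)/(44062971/57)  # about 0.0221, nowhere near 4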

Where R shines is in predicting in complicated circumstances. Let’s try to predict this from the menus.

We cannot do it from the menus alone. But we can trick R into doing it. Let me copy the prediction chunk from above.

> NewData <- data.frame(units=c(0:600))
> NewData$predsCL <- predict(HMB.L, newdata=NewData, interval="confidence", level=.95, 
+   se.fit=FALSE)
> NewData$predsCQ <- predict(HMB.Q, newdata=NewData, interval="confidence", level=.95, 
+   se.fit=FALSE)
> NewData$predsPL <- predict(HMB.L, newdata=NewData, interval="prediction", level=.95, 
+   se.fit=FALSE)
> NewData$predsPQ <- predict(HMB.Q, newdata=NewData, interval="prediction", level=.95, 
+   se.fit=FALSE)

To make it predict from the quadratic equation, I just change the name of the model from HMB.L to HMB.Q. Now that I have the predictions in NewData, I can plot them. There are multiple ways to display this, but to show what the models ultimately do, the following graphic is most instructive.

> plot(x=NA, y=NA, main="Predicted Mean Costs", xlim=c(0,600), ylim=c(0,25000), xlab="Units Produced", ylab="Total Costs")
> with(NewData, segments(x0=units+0.5, x1=units+0.5, y0=predsCQ[,2],y1=predsCQ[,3], col='red', lwd=1))
> with(NewData, segments(x0=units-0.5, x1=units-0.5, y0=predsCL[,2],y1=predsCL[,3], col='pink', lwd=1))

> plot(x=NA, y=NA, main="Predicted Costs", xlim=c(0,600), ylim=c(0,30000), xlab="Units Produced", ylab="Total Costs")
> with(NewData, segments(x0=units+0.5, x1=units+0.5, y0=predsPQ[,2],y1=predsPQ[,3], col='red', lwd=1))
> with(NewData, segments(x0=units-0.5, x1=units-0.5, y0=predsPL[,2],y1=predsPL[,3], col='pink', lwd=1))
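
A more conventional overlay of the same information is also possible; here is a minimal sketch that draws the data and then adds each model's confidence band with matlines (the line types and colors are arbitrary choices):

> plot(TotCost~units, data=HMB, main="Fitted Models with 95% Confidence Bands", xlab="Units Produced", ylab="Total Cost")
> with(NewData, matlines(units, predsCL, lty=c(1,2,2), col="pink"))
> with(NewData, matlines(units, predsCQ, lty=c(1,2,2), col="red"))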

Now let us have a look at semiconductors.

> scatterplot(TotCost4~units, reg.line=lm, smooth=FALSE, spread=FALSE, 
+   id.method='mahal', id.n = 2, boxplots='xy', span=0.5, data=Semiconductors)

17 21 
> SC.L <- lm(TotCost4~units, data=Semiconductors)
> summary(SC.L)

Call:
lm(formula = TotCost4 ~ units, data = Semiconductors)

Residuals:
    Min      1Q  Median      3Q     Max 
-5680.7 -1158.2   346.2  1382.3  4402.2 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 23861.099    850.026   28.07   <2e-16 ***
units          62.452      2.264   27.58   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2290 on 58 degrees of freedom
Multiple R-squared:  0.9292,    Adjusted R-squared:  0.9279 
F-statistic: 760.7 on 1 and 58 DF,  p-value: < 2.2e-16
> shapiro.test(SC.L$residuals)

    Shapiro-Wilk normality test

data:  SC.L$residuals
W = 0.96631, p-value = 0.09611
> SC.Q <- lm(TotCost4 ~ poly(units , degree=2, raw=TRUE), data=Semiconductors)
> summary(SC.Q)

Call:
lm(formula = TotCost4 ~ poly(units, degree = 2, raw = TRUE), 
    data = Semiconductors)

Residuals:
    Min      1Q  Median      3Q     Max 
-4573.4 -1017.7   -56.5  1167.3  6437.0 

Coefficients:
                                        Estimate  Std. Error t value
(Intercept)                          14545.53347  1883.46986   7.723
poly(units, degree = 2, raw = TRUE)1   123.65467    11.63698  10.626
poly(units, degree = 2, raw = TRUE)2    -0.08675     0.01628  -5.328
                                     Pr(>|t|)    
(Intercept)                          1.99e-10 ***
poly(units, degree = 2, raw = TRUE)1 3.87e-15 ***
poly(units, degree = 2, raw = TRUE)2 1.76e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1887 on 57 degrees of freedom
Multiple R-squared:  0.9527,    Adjusted R-squared:  0.9511 
F-statistic: 574.2 on 2 and 57 DF,  p-value: < 2.2e-16
> shapiro.test(SC.Q$residuals)

    Shapiro-Wilk normality test

data:  SC.Q$residuals
W = 0.97403, p-value = 0.2287
> anova(SC.L, SC.Q)
Analysis of Variance Table

Model 1: TotCost4 ~ units
Model 2: TotCost4 ~ poly(units, degree = 2, raw = TRUE)
  Res.Df       RSS Df Sum of Sq     F      Pr(>F)    
1     58 304149603                                   
2     57 203027172  1 101122431 28.39 0.000001757 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, we can reject the hypothesis that the two models are equivalent; the p-value, which is effectively zero, is the probability of an improvement this large if the squared term explained nothing. This firm appears to have a nonlinear cost structure.
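
One useful consequence of the quadratic fit is an estimated marginal cost that declines with volume. A minimal sketch, using a hypothetical helper MC built from the fitted coefficients (the derivative of the quadratic):

> b <- coef(SC.Q)
> MC <- function(q) b[2] + 2*b[3]*q  # marginal cost at q units
> MC(c(100, 300, 500))  # roughly 106.3, 71.6, and 36.9 per unit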

> NewData2 <- data.frame(units=c(0:600))
> NewData2$predsCL <- predict(SC.L, newdata=NewData2, interval="confidence", level=.95, 
+   se.fit=FALSE)
> NewData2$predsCQ <- predict(SC.Q, newdata=NewData2, interval="confidence", level=.95, 
+   se.fit=FALSE)
> NewData2$predsPL <- predict(SC.L, newdata=NewData2, interval="prediction", level=.95, 
+   se.fit=FALSE)
> NewData2$predsPQ <- predict(SC.Q, newdata=NewData2, interval="prediction", level=.95, 
+   se.fit=FALSE)

As before, to predict from the quadratic I just change the model name, here from SC.L to SC.Q. With the predictions in NewData2, the same paired-segments graphic tells the story.

> plot(x=NA, y=NA, main="Predicted Mean Costs", xlim=c(0,600), ylim=c(0,60000), xlab="Units Produced", ylab="Total Costs")
> with(NewData2, segments(x0=units+0.5, x1=units+0.5, y0=predsCQ[,2],y1=predsCQ[,3], col='red', lwd=1))
> with(NewData2, segments(x0=units-0.5, x1=units-0.5, y0=predsCL[,2],y1=predsCL[,3], col='pink', lwd=1))

> plot(x=NA, y=NA, main="Predicted Costs", xlim=c(0,600), ylim=c(0,60000), xlab="Units Produced", ylab="Total Costs")
> with(NewData2, segments(x0=units+0.5, x1=units+0.5, y0=predsPQ[,2],y1=predsPQ[,3], col='red', lwd=1))
> with(NewData2, segments(x0=units-0.5, x1=units-0.5, y0=predsPL[,2],y1=predsPL[,3], col='pink', lwd=1))

The two models now lead to different conclusions about the costs to be incurred at any given number of units produced.
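
To see the disagreement concretely, compare the two models' point predictions at a low and a high volume; newq here is a small illustrative data frame:

> newq <- data.frame(units=c(100, 500))
> predict(SC.L, newdata=newq)  # roughly 30100 and 55090
> predict(SC.Q, newdata=newq)  # roughly 26040 and 54690

The gap is large at low volumes and nearly vanishes near the top of the observed range.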

We have two more examples: scarce and steady inputs, and power plants.