The data are a random sample of 50 production cycles from a Tennessee factory that produces two distinct types of fertilizer: wheat and sugar.

Two Sample Testing

  1. Describe, with 95% confidence, using appropriate statistical methods,
> with(TwoVarCA, (t.test(TotalCost, alternative='two.sided', mu=0.0, 
+   conf.level=.95)))

    One Sample t-test

data:  TotalCost
t = 80.978, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 605463.7 636279.3
sample estimates:
mean of x 
 620871.5 

With 95% confidence, mean total cost per period lies between $605,464 and $636,279.
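The interval R reports is the usual one-sample formula: mean plus or minus t(0.975, n - 1) times the standard error. As a minimal sketch, the standard error can be backed out of the printed output (the test is against mu = 0, so SE = mean / t) and the interval rebuilt; the variable names are ours, not part of the original session.

```r
# Rebuild the one-sample 95% CI for TotalCost from the printed t.test output.
n      <- 50           # sample size (df = 49 in the output)
xbar   <- 620871.5     # "mean of x"
t_obs  <- 80.978       # t statistic for H0: mu = 0
se     <- xbar / t_obs # t = (xbar - 0) / SE, so SE = xbar / t
t_crit <- qt(0.975, df = n - 1)

ci <- xbar + c(-1, 1) * t_crit * se
round(ci, 1)
```

The same three lines, with the Total.Production mean and t value, reproduce the production interval.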

> with(TwoVarCA, (t.test(Total.Production, alternative='two.sided', mu=0.0, 
+   conf.level=.95)))

    One Sample t-test

data:  Total.Production
t = 109.24, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 8995.928 9333.112
sample estimates:
mean of x 
  9164.52 

With 95% confidence, mean total production per period lies between 8,995.93 and 9,333.11 units.

> with(TwoVarCA, (t.test(Wheat, Sugar, alternative='two.sided', 
+   conf.level=.95, paired=TRUE)))

    Paired t-test

data:  Wheat and Sugar
t = -0.2092, df = 49, p-value = 0.8352
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1539.143  1248.903
sample estimates:
mean of the differences 
                -145.12 

With 95% confidence, the mean difference in units per period (wheat minus sugar) lies between -1,539 units (1,539 fewer wheat units than sugar units) and 1,249 units (1,249 more wheat units than sugar units). The point estimate of -145.12 leans slightly toward sugar fertilizer, but the interval contains zero (p = 0.8352), so we cannot conclude that average production of the two types differs.
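A paired t-test is simply a one-sample t-test on the within-period differences (wheat minus sugar), which is why its output reads like the one-sample tests above. The sketch below checks this on simulated placeholder data, not on TwoVarCA; `wheat` and `sugar` are hypothetical vectors.

```r
# A paired t-test equals a one-sample t-test on the differences.
# The data below are simulated placeholders, NOT the TwoVarCA sample.
set.seed(1)
wheat <- round(rnorm(50, mean = 4500, sd = 1500))
sugar <- 9150 - wheat + round(rnorm(50, sd = 600))  # hypothetical capacity-style trade-off

paired  <- t.test(wheat, sugar, paired = TRUE, conf.level = 0.95)
one_smp <- t.test(wheat - sugar, mu = 0, conf.level = 0.95)

# Same t statistic, degrees of freedom, and confidence interval
c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
```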

Before moving on (and, really, before running any tests) we should look at the data. Here it is instructive. Many firms face capacity constraints, and the data show one relatively clearly: wheat units and sugar units are strongly negatively related, so more of one means less of the other. A simple regression describes the relationship.

> with(TwoVarCA, plot(Wheat, Sugar, xlab="Wheat Units", ylab="Sugar Units"))

> with(TwoVarCA, summary(lm(Wheat~Sugar)))

Call:
lm(formula = Wheat ~ Sugar)

Residuals:
     Min       1Q   Median       3Q      Max 
-1217.61  -347.91   -77.14   408.00  1351.04 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 9152.27811  184.15213    49.7   <2e-16 ***
Sugar         -0.99737    0.03512   -28.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 599.3 on 48 degrees of freedom
Multiple R-squared:  0.9438,    Adjusted R-squared:  0.9427 
F-statistic: 806.4 on 1 and 48 DF,  p-value: < 2.2e-16

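The slope of -0.997 means each additional unit of sugar fertilizer displaces almost exactly one unit of wheat fertilizer, and sugar units explain 94% of the variance in wheat units (R-squared = 0.9438). In production, the two are nearly perfect substitutes.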
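If each sugar unit displaced exactly one wheat unit, the slope would be -1. A minimal sketch using only the coefficients printed above tests the slope against -1 and computes the implied total output, Wheat + Sugar = intercept + (1 + slope) * Sugar. The variable names and the grid of sugar values are illustrative.

```r
# Coefficients from summary(lm(Wheat ~ Sugar))
b0       <- 9152.27811  # intercept
b_sugar  <- -0.99737    # slope
se_sugar <- 0.03512     # standard error of the slope
df_res   <- 48

# 95% CI for the slope, and a t test of H0: slope = -1 (one-for-one substitution)
ci_slope    <- b_sugar + c(-1, 1) * qt(0.975, df_res) * se_sugar
t_vs_minus1 <- (b_sugar - (-1)) / se_sugar
p_vs_minus1 <- 2 * pt(-abs(t_vs_minus1), df_res)

# Implied total output at a few illustrative sugar levels
sugar_grid    <- c(3000, 4500, 6000)
implied_total <- b0 + (1 + b_sugar) * sugar_grid
```

The slope interval contains -1, and the implied total barely moves (roughly 9,160 to 9,168 units across the grid), close to the mean total production of 9,164.52 units; both are consistent with a fixed capacity.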

Two Variable Regression

  1. Historically, the variable costs of sugar and wheat fertilizer were thought to be the same. Estimate total costs as a function of total units produced.
> CostsOneVar <- lm(TotalCost~Total.Production, data=TwoVarCA)
> summary(CostsOneVar)

Call:
lm(formula = TotalCost ~ Total.Production, data = TwoVarCA)

Residuals:
   Min     1Q Median     3Q    Max 
-83035 -34049   1934  38114  69332 

Coefficients:
                 Estimate Std. Error t value    Pr(>|t|)    
(Intercept)      78578.78   92318.16   0.851       0.399    
Total.Production    59.17      10.05   5.886 0.000000374 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 41740 on 48 degrees of freedom
Multiple R-squared:  0.4192,    Adjusted R-squared:  0.4071 
F-statistic: 34.65 on 1 and 48 DF,  p-value: 0.0000003739

\[ \text{Total Cost} = \text{Fixed Cost} + \text{Variable Cost} + e \]

Plugging in the results from above,

\[ \text{Total Cost} = 78578.78 + 59.173 \times \text{Units} + e \]
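One quick sanity check on the fitted equation: with an intercept, least squares always passes through the point of means. Plugging the mean total production from the one-sample test into the equation should therefore return the mean total cost, up to rounding. A sketch with the printed values (the function name `total_cost` is illustrative):

```r
fixed_cost    <- 78578.78037  # (Intercept)
variable_cost <- 59.17306     # Total.Production slope
mean_units    <- 9164.52      # mean total production (one-sample t-test)
mean_cost     <- 620871.5     # mean total cost (one-sample t-test)

# Total cost = fixed cost + variable cost per unit * units
total_cost <- function(units) fixed_cost + variable_cost * units
total_cost(mean_units)  # should match mean_cost up to rounding
```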

> library(car)  # Confint() is provided by the car package
> Confint(CostsOneVar, level=0.95)
                    Estimate         2.5 %      97.5 %
(Intercept)      78578.78037 -107039.32297 264196.8837
Total.Production    59.17306      38.96053     79.3856

The 95% interval for the variable cost per unit appears in the Total.Production row ($38.96 to $79.39), and the interval for fixed costs appears in the (Intercept) row (-$107,039 to $264,197).

Bivariate Managerial Accounting

The regression does not provide a very good fit. R-squared, the share of variance explained, is only 0.4192: total units produced explain about 42 percent of the variance in costs and leave about 58 percent unexplained. The confidence interval for fixed costs includes unreasonable negative values, and the variable-cost estimate is imprecise: its interval spans about forty dollars per unit. Notice that we cannot reject the hypothesis of no fixed costs (p = 0.399); the standard error is large! That said, the model is not useless. The number of units explains over 40 percent of the variance, and we can reject the claim of no relationship (p = 0.000000374). This is not surprising. We can do better, but a breakdown into fixed and variable costs looks promising once we better understand the variable costs.

Multiple Regression

  1. More recently, the costs of inputs have systematically changed and this could lead to differences in the variable costs of sugar and wheat fertilizer. Estimate total costs as fixed costs plus variable costs of sugar and wheat fertilizer together. Compare the models with separate per unit costs to the regression explaining total costs with just Total units produced. Compare the fit of this regression with two variable costs to the regression with just Total units produced. How do they compare? Test the hypothesis that the variable costs are the same.
> CostsTwoVar <- lm(TotalCost~Sugar+Wheat, data=TwoVarCA)
> summary(CostsTwoVar)

Call:
lm(formula = TotalCost ~ Sugar + Wheat, data = TwoVarCA)

Residuals:
   Min     1Q Median     3Q    Max 
-37388  -7597   2876   9298  27076 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 145872.563  33381.577    4.37 6.82e-05 ***
Sugar           44.035      3.709   11.87 9.45e-16 ***
Wheat           59.877      3.612   16.57  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15000 on 47 degrees of freedom
Multiple R-squared:  0.9266,    Adjusted R-squared:  0.9235 
F-statistic: 296.6 on 2 and 47 DF,  p-value: < 2.2e-16
> anova(CostsOneVar, CostsTwoVar)
Analysis of Variance Table

Model 1: TotalCost ~ Total.Production
Model 2: TotalCost ~ Sugar + Wheat
  Res.Df         RSS Df   Sum of Sq      F    Pr(>F)    
1     48 83646901517                                    
2     47 10574923794  1 73071977723 324.77 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

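The F test compares two nested models. Its null hypothesis is that the two models fit equally well (explain the same variance on average); the alternative is that the bigger model (more predictors) fits better. Because total production is the sum of sugar and wheat units, the one-variable model forces both products to share a single per-unit cost, so this F test is a test of equal variable costs. In this case F = 324.77 with p < 2.2e-16: if the variable costs were equal, an improvement in fit this large would be essentially impossible. We therefore reject the hypothesis that the variable costs of the two fertilizers are the same.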
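The F statistic in the anova() table can be rebuilt from the two residual sums of squares. This sketch assumes, as the anova() comparison itself does, that the one-variable model is nested in the two-variable model (Total.Production equals Sugar + Wheat), so the single restriction is that both per-unit costs are equal.

```r
rss_one <- 83646901517  # TotalCost ~ Total.Production (one shared per-unit cost)
rss_two <- 10574923794  # TotalCost ~ Sugar + Wheat (separate per-unit costs)
df_one  <- 48
df_two  <- 47

q      <- df_one - df_two  # number of restrictions: b_sugar = b_wheat
F_stat <- ((rss_one - rss_two) / q) / (rss_two / df_two)
p_val  <- pf(F_stat, q, df_two, lower.tail = FALSE)
```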

  1. Write an equation expressing total cost as fixed cost plus variable cost of wheat fertilizer units plus variable cost of sugar fertilizer units using the regression results. How well does the two variable cost model fit?

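\[ \text{Total Cost} = 145872.563 + 59.877 \times \text{Wheat Units} + 44.035 \times \text{Sugar Units} + e \]

The two-variable-cost model fits far better than the single-variable-cost model: R-squared rises from 0.4192 to 0.9266, and the residual standard error falls from $41,740 to $15,000.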
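Using the full-precision coefficients from Confint(CostsTwoVar), the equation can be evaluated directly. With 4500 units of each it should reproduce the predict() fit reported later for the same inputs; the last line shows why attaching each coefficient to the right product matters once the mix is unequal. `cost_hat` is a hypothetical helper, not part of the original session.

```r
b0      <- 145872.56322  # fixed cost
b_wheat <- 59.87663      # variable cost per wheat unit
b_sugar <- 44.03464      # variable cost per sugar unit

cost_hat <- function(wheat, sugar) b0 + b_wheat * wheat + b_sugar * sugar

cost_hat(wheat = 4500, sugar = 4500)
# Moving 1,000 units from wheat to sugar changes predicted cost by 1000 * (b_wheat - b_sugar)
cost_hat(5000, 4000) - cost_hat(4000, 5000)
```

Shifting 1,000 units from wheat to sugar lowers predicted cost by about $15,842.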

  1. Regression inference requires normally distributed residuals. What do you think in this case? For the remaining questions, carry forward as though the residuals are normal without regard to your answer to this question. But you should keep in mind the important if-then role that normal residuals play.
> TwoVarCA<- within(TwoVarCA, {
+   fitted.CostsTwoVar <- fitted(CostsTwoVar)
+   residuals.CostsTwoVar <- residuals(CostsTwoVar) 
+ })
> qqPlot(TwoVarCA$residuals.CostsTwoVar, distribution="norm", id=FALSE)

> shapiro.test(TwoVarCA$residuals.CostsTwoVar)

    Shapiro-Wilk normality test

data:  TwoVarCA$residuals.CostsTwoVar
W = 0.96956, p-value = 0.2221

The QQ plot compares the quantiles of the residuals with those of a normal distribution; points close to the reference line, inside the confidence envelope, are consistent with normality. The Shapiro-Wilk test of the null hypothesis of normality gives W = 0.96956 and p = 0.2221. Because p > 0.05, we cannot reject normality.

  1. What is the standard error of the residual? Use regression assumptions about the distribution of the errors to find the probability that actual and predicted cost differ, on average, by no more than (+/-) $20000.

The regression table reports the residual standard error, an estimate of the standard deviation of the errors. Here it is $15,000, so a typical prediction misses actual total cost by roughly $15,000. To find the probability that an error falls between -$20,000 and $20,000, we compute the probability above $20,000 and the probability below -$20,000 and subtract both from one. The result is about 0.82.

> 1-(pnorm(c(20000), mean=0, sd=15000, lower.tail=FALSE)+pnorm(c(-20000), 
+   mean=0, sd=15000, lower.tail=TRUE))
[1] 0.8175776
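The same probability can be written more compactly. This sketch recovers the residual standard error from the residual sum of squares in the anova() table, sqrt(RSS / df), and then uses the symmetry of the normal distribution; it assumes normal errors with mean zero, as the question instructs.

```r
rss_two <- 10574923794
df_two  <- 47
rse     <- sqrt(rss_two / df_two)  # residual standard error, about 15,000

# P(-20000 <= e <= 20000) for e ~ N(0, rse^2)
p_within <- pnorm(20000, mean = 0, sd = rse) - pnorm(-20000, mean = 0, sd = rse)
# Same quantity via symmetry: 2 * Phi(20000 / rse) - 1
p_sym <- 2 * pnorm(20000 / rse) - 1
round(c(rse = rse, p_within = p_within, p_sym = p_sym), 4)
```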
  1. What percentage of variation in total cost is explained by the regression?

The regression table tells us that multiple R-squared is 0.9266: 92.66% of the variance in total costs is explained by the number of sugar and wheat units produced.
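R-squared is 1 - RSS / TSS, and both cost models share the same total sum of squares because they explain the same response. As a sketch, we can recover TSS from the one-variable model and reproduce the two-variable R-squared and adjusted R-squared. Because the printed R-squared of 0.4192 is rounded, the results match only to about three decimals.

```r
rss_one <- 83646901517; r2_one <- 0.4192  # TotalCost ~ Total.Production
rss_two <- 10574923794                    # TotalCost ~ Sugar + Wheat
n <- 50; k_two <- 2

tss        <- rss_one / (1 - r2_one)  # TSS = RSS / (1 - R^2)
r2_two     <- 1 - rss_two / tss       # multiple R-squared, two-variable model
adj_r2_two <- 1 - (1 - r2_two) * (n - 1) / (n - k_two - 1)
```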

  1. What statistic evaluates the claim that the regression, as a whole, explains no variance? Interpret the statistic or the associated p-value to provide an answer. Assume 95% confidence for evaluating this.

The regression table presents an F test at the bottom. This F test, like F tests in general, tests the null hypothesis that two models are equivalent. What are the two models? One has only an intercept; the other includes both sugar units and wheat units. The F statistic is 296.6 on 2 and 47 degrees of freedom, and the associated p-value is below 2.2e-16, far below 0.05. We reject the null hypothesis: the regression model explains variance in total costs.
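The overall F statistic depends only on R-squared, the number of predictors k, and the sample size n: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). A minimal sketch with the printed values:

```r
r2 <- 0.9266; n <- 50; k <- 2
F_overall <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_overall <- pf(F_overall, k, n - k - 1, lower.tail = FALSE)
```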

  1. Confidence intervals: What is the 95% confidence interval for
> Confint(CostsTwoVar, level=0.95)
                Estimate       2.5 %       97.5 %
(Intercept) 145872.56322 78717.49141 213027.63503
Sugar           44.03464    36.57390     51.49538
Wheat           59.87663    52.60937     67.14388
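Confint() for a linear model returns the estimate plus or minus t(0.975, residual df) times the standard error. The sketch below rebuilds the two variable-cost intervals from the Std. Error column of summary(CostsTwoVar) and checks whether they overlap. Non-overlapping 95% intervals imply a significant difference, but the check is conservative; the anova() comparison is the direct test of equal costs.

```r
est <- c(Sugar = 44.03464, Wheat = 59.87663)  # Estimate column
se  <- c(Sugar = 3.709,    Wheat = 3.612)     # Std. Error column
t_crit <- qt(0.975, df = 47)

lower <- est - t_crit * se
upper <- est + t_crit * se
non_overlap <- upper[["Sugar"]] < lower[["Wheat"]]
```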
  1. Are wheat units and sugar units unrelated? What proof can you generate at the 95% level of confidence? You can use either regression, a simple scatterplot, or correlation as you see fit. Can you explain this relationship?

Yes, wheat units and sugar units are clearly related. In the regression of wheat units on sugar units, sugar units explain over 94% of the variance in wheat units (R-squared = 0.9438), and the slope is significant far beyond the 95% level (p < 2e-16). The scatterplot is even more informative because it hints at what is going on. Notice that the lower boundaries of wheat and sugar units seem to be fixed, which suggests the factory has committed to producing a minimum number of both types for its distributors. The remaining production in a period is constrained by capacity; this capacity constraint makes wheat units and sugar units nearly perfect substitutes.
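In a simple regression, R-squared is the squared correlation, so the correlation between wheat and sugar units is minus the square root of 0.9438 (negative because the slope is). The t statistic for zero correlation equals the slope's t value. A sketch from the printed numbers; on the actual data, `with(TwoVarCA, cor.test(Wheat, Sugar))` would report the same test directly.

```r
r2 <- 0.9438; n <- 50
r   <- -sqrt(r2)                         # sign taken from the negative slope
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic for H0: rho = 0
p_r <- 2 * pt(-abs(t_r), df = n - 2)
```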

  1. True/False to the following questions (with 95% confidence):
  1. If the wholesale price is the same, which item is more profitable to produce, or does it not matter? Sugar fertilizer would be more profitable because its variable cost is lower. Indeed, the 95% confidence interval for the variable cost of sugar fertilizer ($36.57 to $51.50 per unit) does not overlap the interval for wheat fertilizer ($52.61 to $67.14 per unit), so sugar is cheaper to produce with 95% confidence.

  2. Two relevant intervals can be generated for the expected costs of producing 4500 units each of wheat and sugar fertilizer. Provide these intervals and a discussion of their meaning and the difference between them. As has been the case since the beginning of the term, we have the distribution of the data and the distribution of the means. The 95% confidence interval for the mean gives the range for average total costs in periods with 4500 units of each produced. The 95% prediction interval gives the range of total costs we should observe, with 95% confidence, in any single period in which we produce 4500 units of each. The confidence interval tells us what happens on average and is narrow ($609,033.80 to $617,912.80); the prediction interval adds the period-to-period variation in costs and is much wider ($582,972.40 to $643,974.10).

> .NewData <- data.frame(Sugar=4500, Wheat=4500, row.names="1")
> .NewData  # Newdata
  Sugar Wheat
1  4500  4500
> predict(CostsTwoVar, newdata=.NewData, interval="confidence", level=.95, 
+   se.fit=FALSE)
       fit      lwr      upr
1 613473.3 609033.8 617912.8
> predict(CostsTwoVar, newdata=.NewData, interval="prediction", level=.95, 
+   se.fit=FALSE)
       fit      lwr      upr
1 613473.3 582972.4 643974.1
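The two intervals differ only in the standard error they use. The confidence interval uses the standard error of the estimated mean, se.fit; the prediction interval adds the residual variance of a single new period, sqrt(se.fit^2 + sigma^2). A minimal sketch that backs se.fit out of the confidence interval and rebuilds the prediction interval (sigma is recovered from the residual sum of squares in the anova() table):

```r
fit    <- 613473.3
ci     <- c(609033.8, 617912.8)     # interval = "confidence"
sigma  <- sqrt(10574923794 / 47)    # residual standard error, about 15,000
t_crit <- qt(0.975, df = 47)

se_fit   <- (diff(ci) / 2) / t_crit      # standard error of the estimated mean
se_pred  <- sqrt(se_fit^2 + sigma^2)     # adds the variance of one new period
pred_int <- fit + c(-1, 1) * t_crit * se_pred
```

Because sigma (about $15,000) dwarfs se.fit (about $2,200), nearly all of the prediction interval's width comes from period-to-period noise.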
  1. True/False (given the results) (Observed total cost minus predicted total cost yields the residual).
  1. The factory has agreed to sell 4500 units of wheat fertilizer at $80 per unit and 4500 units of sugar fertilizer at $60 per unit to an agribusiness firm. Provide a 95% confidence interval for average profit.

The 95% confidence interval for average cost is $609,033.80 to $617,912.80 given 4500 units of each type of fertilizer. The revenue from 4500 units of each is (4500 x $80) + (4500 x $60) = $360,000 + $270,000 = $630,000. Profit equals revenue minus costs, so the highest cost gives the lowest profit: $630,000 - $617,912.80 = $12,087.20 and $630,000 - $609,033.80 = $20,966.20. The 95% confidence interval for average profit therefore runs from $12,087.20 to $20,966.20.
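The profit calculation is mechanical enough to script. In this sketch the cost bounds swap places because profit falls as cost rises; `revenue` and `profit_ci` are illustrative names.

```r
revenue <- 4500 * 80 + 4500 * 60               # wheat at $80, sugar at $60
cost_ci <- c(lwr = 609033.8, upr = 617912.8)   # predict(..., interval = "confidence")

# Highest cost -> lowest profit, lowest cost -> highest profit
profit_ci <- c(lwr = revenue - cost_ci[["upr"]], upr = revenue - cost_ci[["lwr"]])
```

Swapping in the prediction-interval bounds for `cost_ci` gives the single-period profit interval.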

  1. The proposed contract will last two years. The owner/CEO draws no salary. She only takes home realized profits in the accounting period. Can she expect, with 95% confidence, to be paid a positive amount in each and every month for the two years of the contract? If you can provide a 95% confidence interval for her direct compensation, do so.

The interval calculation follows the same procedure as before, but we subtract the prediction interval instead of the confidence interval. With 95% confidence, profit in a single period should range between -$13,974.10 ($630,000 - $643,974.10, a loss) and $47,027.60 ($630,000 - $582,972.40). Because the interval includes losses, she cannot expect, with 95% confidence, to be paid a positive amount every month.
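The prediction interval can also be turned into a probability. Under the regression assumptions, cost in a new period follows a t distribution with 47 degrees of freedom centered on the fit, with scale equal to the prediction standard error. This sketch backs that standard error out of the prediction interval and computes the chance of a positive profit in one period. It then assumes (this is not in the original) that each month is one accounting period and that months are independent, and computes the chance of 24 profitable months in a row.

```r
revenue <- 630000
fit     <- 613473.3
t_crit  <- qt(0.975, df = 47)
se_pred <- (643974.1 - fit) / t_crit   # prediction SE backed out of the PI

# P(profit > 0) = P(cost < revenue) in a single period
p_positive <- pt((revenue - fit) / se_pred, df = 47)
# ASSUMPTION: one accounting period per month, independent across months
p_all_24 <- p_positive^24
```

The single-period probability is roughly 0.86, and under the independence assumption the chance of 24 consecutive profitable months is under 5 percent.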

  1. Examine the effects plots [Models > Graphs > ] including the partial residuals. Are lines sufficient or is there evidence of more complicated relationships?
> library(effects, pos=16)
> plot(allEffects(CostsTwoVar, partial.residuals=TRUE))

Lines appear sufficient.