The data are a random sample of 50 production cycles from a Tennessee factory that produces two distinct types of fertilizer.
> with(TwoVarCA, (t.test(TotalCost, alternative='two.sided', mu=0.0,
+ conf.level=.95)))
One Sample t-test
data: TotalCost
t = 80.978, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
605463.7 636279.3
sample estimates:
mean of x
620871.5
With 95% confidence, average total costs range between $605,464 and $636,279 per period.
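As a check on where this interval comes from, the following sketch rebuilds it from the summary numbers printed above rather than from the raw data. It assumes only the reported mean, t statistic, and degrees of freedom; the variable names are ours and are not part of the Rcmdr output.

```r
# Summary statistics copied from the one-sample t-test output for TotalCost
xbar  <- 620871.5   # sample mean
tstat <- 80.978     # t statistic for H0: mu = 0
df    <- 49         # n - 1 with n = 50 cycles

# Because mu0 = 0, t = xbar / se, so the standard error is implied by the output
se <- xbar / tstat

# Two-sided 95% critical value from the t distribution
tcrit <- qt(0.975, df)

# Confidence interval: mean plus or minus critical value times standard error
ci <- xbar + c(-1, 1) * tcrit * se
round(ci, 1)
```

The result should agree with the reported interval up to the rounding of the printed t statistic.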
> with(TwoVarCA, (t.test(Total.Production, alternative='two.sided', mu=0.0,
+ conf.level=.95)))
One Sample t-test
data: Total.Production
t = 109.24, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
8995.928 9333.112
sample estimates:
mean of x
9164.52
With 95% confidence, average total production ranges between 8995.93 and 9333.11 units per period.
> with(TwoVarCA, (t.test(Wheat, Sugar, alternative='two.sided',
+ conf.level=.95, paired=TRUE)))
Paired t-test
data: Wheat and Sugar
t = -0.2092, df = 49, p-value = 0.8352
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1539.143 1248.903
sample estimates:
mean of the differences
-145.12
With 95% confidence, the average difference in units per period [wheat - sugar] lies between 1539 fewer units of wheat than sugar and 1249 more units of wheat than sugar. The mean of the differences is less than zero, a slight weighting toward sugar fertilizer, but the interval includes zero (p-value = 0.8352), so we cannot reject equal average production of the two types.
Before going further, and really before starting any of this, we should look at the data. In this case, it is instructive. Many firms face capacity constraints, and the data show this relatively clearly: wheat units and sugar units are strongly negatively related. Indeed, we can use a simple regression to figure out what the relationship looks like.
> with(TwoVarCA, plot(Wheat, Sugar, xlab="Wheat Units", ylab="Sugar Units"))
> with(TwoVarCA, summary(lm(Wheat~Sugar)))
Call:
lm(formula = Wheat ~ Sugar)
Residuals:
Min 1Q Median 3Q Max
-1217.61 -347.91 -77.14 408.00 1351.04
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9152.27811 184.15213 49.7 <2e-16 ***
Sugar -0.99737 0.03512 -28.4 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 599.3 on 48 degrees of freedom
Multiple R-squared: 0.9438, Adjusted R-squared: 0.9427
F-statistic: 806.4 on 1 and 48 DF, p-value: < 2.2e-16
The slope is almost exactly -1: each additional unit of sugar displaces about one unit of wheat. They are nearly perfect substitutes.
> CostsOneVar <- lm(TotalCost~Total.Production, data=TwoVarCA)
> summary(CostsOneVar)
Call:
lm(formula = TotalCost ~ Total.Production, data = TwoVarCA)
Residuals:
Min 1Q Median 3Q Max
-83035 -34049 1934 38114 69332
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 78578.78 92318.16 0.851 0.399
Total.Production 59.17 10.05 5.886 0.000000374 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 41740 on 48 degrees of freedom
Multiple R-squared: 0.4192, Adjusted R-squared: 0.4071
F-statistic: 34.65 on 1 and 48 DF, p-value: 0.0000003739
\[ \text{Total Cost} = \text{Fixed Cost} + \text{Variable Cost} + e \]
Plugging in the results from above,
\[ \text{Total Cost} = 78578.78 + 59.173 \times \text{Units} + e \]
> library(MASS, pos=18)
> Confint(CostsOneVar, level=0.95)
Estimate 2.5 % 97.5 %
(Intercept) 78578.78037 -107039.32297 264196.8837
Total.Production 59.17306 38.96053 79.3856
The interval for variable costs is reflected in the row Total.Production and the interval for fixed costs appears in row Intercept.
The regression does not provide a very good fit. R-squared, as variance explained, is only 0.4192. Though total units produced explain over forty percent of the variance in costs, they leave almost sixty percent unexplained. The confidence interval for fixed costs includes unreasonable negative values, and the variable cost estimate is imprecise: the interval spans about forty dollars per unit. Notice we cannot reject the hypothesis of no fixed costs (p-value = 0.399); the standard error is large! That said, the model is not useless; we can reject the claim of no relationship between units and costs. In the end, we can do better, but a fixed and variable cost breakdown looks promising given a better understanding of the variable costs.
> CostsTwoVar <- lm(TotalCost~Sugar+Wheat, data=TwoVarCA)
> summary(CostsTwoVar)
Call:
lm(formula = TotalCost ~ Sugar + Wheat, data = TwoVarCA)
Residuals:
Min 1Q Median 3Q Max
-37388 -7597 2876 9298 27076
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 145872.563 33381.577 4.37 6.82e-05 ***
Sugar 44.035 3.709 11.87 9.45e-16 ***
Wheat 59.877 3.612 16.57 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15000 on 47 degrees of freedom
Multiple R-squared: 0.9266, Adjusted R-squared: 0.9235
F-statistic: 296.6 on 2 and 47 DF, p-value: < 2.2e-16
> anova(CostsOneVar, CostsTwoVar)
Analysis of Variance Table
Model 1: TotalCost ~ Total.Production
Model 2: TotalCost ~ Sugar + Wheat
Res.Df RSS Df Sum of Sq F Pr(>F)
1 48 83646901517
2 47 10574923794 1 73071977723 324.77 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F test compares the null hypothesis that the two models fit equally well [explain the same variance on average] against the alternative that the bigger model (more predictors) fits better. In this case, the p-value is below 2.2e-16: if sugar and wheat fertilizer had the same variable cost, so that total production alone sufficed, an improvement in fit this large would be essentially impossible. Thus, the variable costs of the two items must be different.
\[ \text{Total Cost} = 145872.563 + 44.035 \times \text{Units of Sugar} + 59.877 \times \text{Units of Wheat} + e \]
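The F statistic in the table can be reproduced by hand from the two residual sums of squares. The sketch below does so, assuming (as the variable names suggest) that Total.Production is the sum of Sugar and Wheat, so that the one-variable model is the two-variable model with the two slopes forced to be equal. The variable names are ours.

```r
# Residual sums of squares and residual degrees of freedom from the anova table
rss_restricted <- 83646901517   # Model 1: TotalCost ~ Total.Production
rss_full       <- 10574923794   # Model 2: TotalCost ~ Sugar + Wheat
df_restricted  <- 48
df_full        <- 47

# Number of restrictions imposed by the smaller model (equal slopes: one)
q <- df_restricted - df_full

# F = (drop in RSS per restriction) / (residual variance of the full model)
f_stat <- ((rss_restricted - rss_full) / q) / (rss_full / df_full)

# Upper-tail probability under the F distribution with (q, df_full) df
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

round(f_stat, 2)
```

The computed statistic should match the 324.77 in the table, and the p-value is far below any conventional threshold.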
> TwoVarCA<- within(TwoVarCA, {
+ fitted.CostsTwoVar <- fitted(CostsTwoVar)
+ residuals.CostsTwoVar <- residuals(CostsTwoVar)
+ })
> qq.plot(TwoVarCA$residuals.CostsTwoVar, dist= "norm", labels=FALSE)
Warning: 'qq.plot' is deprecated.
Use 'qqPlot' instead.
See help("Deprecated") and help("car-deprecated").
> shapiro.test(TwoVarCA$residuals.CostsTwoVar)
Shapiro-Wilk normality test
data: TwoVarCA$residuals.CostsTwoVar
W = 0.96956, p-value = 0.2221
The most definitive evidence comes from comparing the residuals to a normal distribution with mean zero and standard deviation equal to the residual standard error. The Shapiro-Wilk test of the null hypothesis of normality yields a p-value of 0.2221, which does not allow us to rule out normality.
The regression table tells us the residual standard error: the standard deviation of the residuals. In this case, the residual standard error is $15,000; the actual data typically differ from the regression line by about $15,000. To find the probability that the errors take values between -$20,000 and $20,000, we estimate the probability above $20,000 and below -$20,000 and subtract both from one.
> 1-(pnorm(c(20000), mean=0, sd=15000, lower.tail=FALSE)+pnorm(c(-20000),
+ mean=0, sd=15000, lower.tail=TRUE))
[1] 0.8175776
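An equivalent way to get the same probability, sketched here, is to take the difference of the normal cumulative distribution at the two endpoints. The variable names are illustrative; the standard deviation is the rounded residual standard error from the regression table.

```r
# Residual standard error from the two-variable regression (rounded, in dollars)
sigma <- 15000

# P(-20000 < error < 20000) = F(20000) - F(-20000) under Normal(0, sigma)
p_within <- diff(pnorm(c(-20000, 20000), mean = 0, sd = sigma))
p_within
```

This avoids computing the two tails separately and gives the same answer as the calculation above.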
The regression table tells us that multiple R-squared is 0.9266; 92.66% of the variance in total costs is explained by knowing the number of sugar and wheat units produced.
The regression table presents an F test at the bottom. This F test, like F tests in general, tests a null hypothesis that two models are equivalent. What are the two models? One has only an intercept; the other includes both sugar units and wheat units. The F statistic is almost 300 and the associated p-value is below 2.2e-16, effectively zero. This is evidence that the regression model is explaining variance.
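The overall F statistic can also be recovered from R-squared alone. As a worked check, with k = 2 predictors and n = 50 observations (so n - k - 1 = 47 residual degrees of freedom), and using the rounded R-squared from the table:

\[ F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.9266 / 2}{0.0734 / 47} \approx 296.7 \]

This agrees with the reported 296.6 up to the rounding of R-squared.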
> Confint(CostsTwoVar, level=0.95)
Estimate 2.5 % 97.5 %
(Intercept) 145872.56322 78717.49141 213027.63503
Sugar 44.03464 36.57390 51.49538
Wheat 59.87663 52.60937 67.14388
Yes, wheat units and sugar units are clearly related. The regression shows that over 90% of the variance in sugar units is explained by wheat units. The scatterplot is even more informative because it gives us hints about what is going on. Notice the lower boundaries of wheat and sugar units seem to be fixed. The factory has committed to producing a minimum number of both types for its distributors. The remaining production in a period is constrained by capacity; this capacity constraint causes wheat units and sugar units to be nearly perfect substitutes.
If the wholesale price is the same, which item is more profitable to produce or does it not matter? Sugar fertilizer would be more profitable because its variable cost is lower. Indeed, the 95% confidence interval for the variable cost of sugar fertilizer [36.57 to 51.50] does not overlap the interval for wheat fertilizer [52.61 to 67.14], so sugar is clearly cheaper.
Two relevant intervals can be generated for the expected costs of producing 4500 units each of wheat and sugar fertilizer. Provide these intervals and a discussion of their meaning and the difference between them. As has been the case since the beginning of the term, we have the distribution of the data and the distribution of the means. The 95% confidence interval for the mean gives us the range for average total costs given 4500 units of each produced. The 95% prediction interval gives us the range of observable costs with 95% confidence. Thus, the confidence interval tells us what happens on average [and is narrow]; the prediction interval tells us the range of total costs that we should see in any single period in which we produce 4500 units of each [and is wide].
> .NewData <- data.frame(Sugar=4500, Wheat=4500, row.names="1")
> .NewData # Newdata
Sugar Wheat
1 4500 4500
> predict(CostsTwoVar, newdata=.NewData, interval="confidence", level=.95,
+ se.fit=FALSE)
fit lwr upr
1 613473.3 609033.8 617912.8
> predict(CostsTwoVar, newdata=.NewData, interval="prediction", level=.95,
+ se.fit=FALSE)
fit lwr upr
1 613473.3 582972.4 643974.1
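To see how the two intervals relate, the sketch below rebuilds the fitted value from the coefficient estimates and then derives the prediction interval from the confidence interval by adding the residual variance to the variance of the fitted mean. It uses only numbers printed above (the Confint estimates, the confidence interval, and the rounded residual standard error); the variable names are ours.

```r
# Coefficient estimates from Confint(CostsTwoVar): intercept, Sugar, Wheat
b   <- c(145872.56322, 44.03464, 59.87663)
fit <- sum(b * c(1, 4500, 4500))          # expected cost at 4500 units of each

ci    <- c(609033.8, 617912.8)            # reported 95% confidence interval
sigma <- 15000                            # residual standard error (rounded)
tcrit <- qt(0.975, df = 47)               # residual df of the two-variable model

# Standard error of the fitted mean, implied by the width of the confidence interval
se_fit <- diff(ci) / (2 * tcrit)

# A single period adds residual noise on top of uncertainty about the mean
se_pred <- sqrt(se_fit^2 + sigma^2)

pred_int <- fit + c(-1, 1) * tcrit * se_pred
round(c(fit = fit, lwr = pred_int[1], upr = pred_int[2]), 1)
```

The rebuilt bounds should agree with the reported prediction interval up to rounding. Nearly all of the prediction interval's width comes from the residual standard error, which is why it is so much wider than the confidence interval.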
The 95% confidence interval for average cost is $609,033.80 to $617,912.80 given 4500 units of each type of fertilizer. The revenue from 4500 units of each is $630,000. Profit equals revenue minus costs. So, $630,000 minus $617,912.80 and $630,000 minus $609,033.80 yield a 95% confidence interval for average profits ranging from $12,087.20 to $20,966.20.
The interval calculation follows the same procedure as before, but we subtract the bounds of the prediction interval instead of the confidence interval. With 95% confidence, profits in a single period should range between [$630,000 - $643,974.10 =] -$13,974.10 [a loss] and [$630,000 - $582,972.40 =] $47,027.60. Thus, with 95% confidence, she cannot be guaranteed profits.
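The profit arithmetic can be written compactly as below. This is a small sketch using the revenue figure and the two cost intervals stated in the text; the variable names are ours. Because profit is revenue minus cost, the upper cost bound produces the lower profit bound, so the bounds are reversed.

```r
revenue <- 630000                                # revenue for 4500 units of each
cost_ci <- c(lwr = 609033.8, upr = 617912.8)     # confidence interval, mean cost
cost_pi <- c(lwr = 582972.4, upr = 643974.1)     # prediction interval, one period

# Subtracting reverses the order of the bounds: highest cost gives lowest profit
profit_ci <- revenue - rev(unname(cost_ci))
profit_pi <- revenue - rev(unname(cost_pi))

profit_ci   # interval for average profit
profit_pi   # interval for profit in a single period
```

The lower bound of the single-period interval is negative, which is the sense in which profits are not guaranteed.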
> library(effects, pos=16)
> plot(allEffects(CostsTwoVar, partial.residuals=TRUE))
Judging from the partial residuals, straight lines appear sufficient for both sugar and wheat.