Two-factor ANOVA

Dataset

An engineer is designing a battery for use in a device that will be subjected to some extreme variations in temperature. The only design parameter that he can select at this point is the plate material for the battery, and he has three possible choices. When the device is manufactured and is shipped to the field, the engineer has no control over the temperature extremes that the device will encounter, and he knows from experience that temperature will probably affect the effective battery life. However, the temperature can be controlled in the product development laboratory for a test.

The engineer decides to test all three plate materials at three temperature levels: \(15\) (low), \(70\) (medium), and \(125°F\) (high) because these temperature levels are consistent with the product end-use environment. Four batteries are tested at each combination of plate material and temperature, and all 36 tests are run in random order. The experiment and the resulting observed battery life data are given in the next table.

battery_life	temperature	material_type
130	low	A
155	low	A
34	medium	A
40	medium	A
20	high	A
70	high	A
74	low	A
180	low	A
80	medium	A
75	medium	A
82	high	A
58	high	A
150	low	B
188	low	B
136	medium	B
122	medium	B
25	high	B
70	high	B
159	low	B
126	low	B
106	medium	B
115	medium	B
58	high	B
45	high	B
138	low	C
110	low	C
174	medium	C
120	medium	C
96	high	C
104	high	C
168	low	C
160	low	C
150	medium	C
139	medium	C
82	high	C
60	high	C

We can also arrange the data in a wide layout, which makes the structure of the experiment easier to see.

	low	low	medium	medium	high	high
A	130	155	34	40	20	70
A	74	180	80	75	82	58
B	150	188	136	122	25	70
B	159	126	106	115	58	45
C	138	110	174	120	96	104
C	168	160	150	139	82	60

In this table, the columns represent the different temperatures and the rows depict the types of material.
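As a minimal sketch of the reshaping described above, the following Python snippet (standard library only) groups the long-format records into the nine material-by-temperature cells and prints the mean of each cell. The names `RAW`, `cells`, and `cell_means` are illustrative choices, not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean

# Long-format records "battery_life temperature material_type",
# copied from the first data table (six records per line).
RAW = """
130 low A; 155 low A; 34 medium A; 40 medium A; 20 high A; 70 high A
74 low A; 180 low A; 80 medium A; 75 medium A; 82 high A; 58 high A
150 low B; 188 low B; 136 medium B; 122 medium B; 25 high B; 70 high B
159 low B; 126 low B; 106 medium B; 115 medium B; 58 high B; 45 high B
138 low C; 110 low C; 174 medium C; 120 medium C; 96 high C; 104 high C
168 low C; 160 low C; 150 medium C; 139 medium C; 82 high C; 60 high C
"""

# Group the observations by (material_type, temperature).
cells = defaultdict(list)
for record in RAW.replace("\n", ";").split(";"):
    if record.strip():
        life, temperature, material = record.split()
        cells[(material, temperature)].append(int(life))

# Mean battery life in each of the nine cells.
cell_means = {key: mean(values) for key, values in cells.items()}

temperatures = ["low", "medium", "high"]
print("material " + "".join(f"{t:>9}" for t in temperatures))
for material in "ABC":
    row = "".join(f"{cell_means[(material, t)]:>9.2f}" for t in temperatures)
    print(f"{material:<9}" + row)
```

Each cell holds four observations, so the printed grid has the same rows and columns as the wide table, with the four replicates collapsed into their average.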

Plots

It is always a good idea to examine experimental data graphically. The next figure presents a boxplot for battery life at each level of temperature and material type.

The graph indicates that, in general, battery life increases as temperature decreases, although the pattern is less clear for material type A. Based on this simple graphical analysis, we strongly suspect that (1) temperature affects battery life (at least for material type B), and (2) lower temperatures generally result in longer battery life.

Analysis of the problem

Because there are two factors at three levels, this design is sometimes called a \(3^2\) factorial design. In this problem the engineer wants to answer the following questions:

What effects do material type and temperature have on the life of the battery?
Is there a choice of material that would give a uniformly long life regardless of temperature?

This design is a specific example of the general case of a two-factor factorial.

The fixed effects model can be described as follows: \[y_{ijk} = \mu+\tau_i + \beta_j+ (\tau\beta)_{ij} +\epsilon_{ijk} \mspace{36mu} i=1,...,a \mspace{12mu} j=1,...,b \mspace{12mu} k=1,2,...,n\] where \(\mu\) is the overall mean effect, \(\tau_i\) is the effect of the \(i\)th level of the row factor A, \(\beta_j\) is the effect of the \(j\)th level of column factor B, \((\tau\beta)_{ij}\) is the effect of the interaction between \(\tau_i\) and \(\beta_j\), and \(\epsilon_{ijk}\) is a random error component. Both factors are assumed to be fixed.

In the example, both factors (temperature and material type) have three levels, so \(a=3\) and \(b=3\). We have four observations for each combination of the factors, so \(n=4\). There are \(abn=3\times3\times4=36\) total observations.

We are interested in testing hypotheses about the equality of row treatment effects, say \[ H_0: \tau_1=\tau_2=...=\tau_a=0\\ H_1: \tau_i\neq0 \text{ for at least one }i \]

and the equality of column treatment effects, say

\[ H_0: \beta_1=\beta_2=...=\beta_b=0\\ H_1: \beta_j\neq0 \text{ for at least one }j \]

We are also interested in determining whether row and column treatments interact. Thus, we also wish to test

\[ H_0: (\tau\beta)_{ij}=0 \text{ for all }ij\\ H_1: (\tau\beta)_{ij}\neq0 \text{ for at least one pair }ij \]

These hypotheses are tested using a two-factor analysis of variance.

The ANOVA table is shown below.

	Df	Sum Sq	Mean Sq	F value	P value
temperature	2	39118.722	19559.361	28.967692	0.0000002
material_type	2	10683.722	5341.861	7.911372	0.0019761
temperature:material_type	4	9613.778	2403.444	3.559535	0.0186112
Residuals	27	18230.750	675.213

Because all three P-values are smaller than the level \(\alpha=0.05\), we reject all three null hypotheses \(H_0\) and conclude that there is a significant interaction between material type and temperature, and that the main effects of material type and temperature are also significant.
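The entries of the ANOVA table can be reproduced from the raw data with the usual sums-of-squares formulas for a balanced two-factor design. The sketch below uses only the Python standard library. The helper `f_upper_tail` is an illustrative shortcut that relies on a finite-sum form of the F tail probability, valid only when the numerator degrees of freedom are even (as they are here, 2 and 4). A statistics library would normally supply this value.

```python
# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
MATERIALS, TEMPERATURES, n = ["A", "B", "C"], ["low", "medium", "high"], 4

def avg(values):
    values = list(values)
    return sum(values) / len(values)

grand = avg(y for ys in LIFE.values() for y in ys)
cell = {key: avg(ys) for key, ys in LIFE.items()}
# With equal cell sizes, a marginal mean is the average of the cell means.
temp = {t: avg(cell[(m, t)] for m in MATERIALS) for t in TEMPERATURES}
mat = {m: avg(cell[(m, t)] for t in TEMPERATURES) for m in MATERIALS}

ss = {
    "temperature": len(MATERIALS) * n * sum((temp[t] - grand) ** 2 for t in TEMPERATURES),
    "material_type": len(TEMPERATURES) * n * sum((mat[m] - grand) ** 2 for m in MATERIALS),
    "temperature:material_type": n * sum(
        (cell[(m, t)] - temp[t] - mat[m] + grand) ** 2 for m in MATERIALS for t in TEMPERATURES
    ),
    "Residuals": sum((y - cell[key]) ** 2 for key, ys in LIFE.items() for y in ys),
}
df = {"temperature": 2, "material_type": 2, "temperature:material_type": 4, "Residuals": 27}
ms = {term: ss[term] / df[term] for term in ss}

def f_upper_tail(f, d1, d2):
    """P(F > f) for F(d1, d2); finite-sum form, valid only for even d1."""
    if d1 % 2:
        raise ValueError("this shortcut needs an even numerator df")
    x = d2 / (d2 + d1 * f)
    a = d2 / 2
    term, total = 1.0, 1.0
    for j in range(1, d1 // 2):
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total

f_value, p_value = {}, {}
for term in ("temperature", "material_type", "temperature:material_type"):
    f_value[term] = ms[term] / ms["Residuals"]
    p_value[term] = f_upper_tail(f_value[term], df[term], df["Residuals"])
    print(f"{term:<27}{df[term]:>3}{ss[term]:>12.3f}{ms[term]:>12.3f}"
          f"{f_value[term]:>10.4f}{p_value[term]:>12.7f}")
print(f"{'Residuals':<27}{df['Residuals']:>3}{ss['Residuals']:>12.3f}{ms['Residuals']:>12.3f}")
```

Each F value is the mean square of the term divided by the residual mean square, which is why the same denominator (\(MS_E\)) appears in all three tests.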

To assist in interpreting the results of this experiment, it is helpful to construct a graph of the average responses at each treatment combination. This graph is shown in the next figure.

The significant interaction is indicated by the lack of parallelism of the lines. In general, longer battery life is attained at low temperature, regardless of material type. Changing from low to medium temperature, battery life with material type C may actually increase, whereas it decreases for types A and B. From medium to high temperature, battery life decreases for material types B and C and is essentially unchanged for type A. Material type C seems to give the best results if we want less loss of effective life as the temperature changes.

We can also see the tests on the individual terms (temperature, material type, and temperature: material type).

	Estimate	Standard error	Statistic	P value
(Intercept)	57.50	12.99243	4.4256540	0.0001424
temperaturelow	77.25	18.37407	4.2042942	0.0002573
temperaturemedium	-0.25	18.37407	-0.0136061	0.9892443
material_typeB	-8.00	18.37407	-0.4353962	0.6667358
material_typeC	28.00	18.37407	1.5238866	0.1391655
temperaturelow:material_typeB	29.00	25.98486	1.1160345	0.2742418
temperaturemedium:material_typeB	70.50	25.98486	2.7131183	0.0114615
temperaturelow:material_typeC	-18.75	25.98486	-0.7215740	0.4767592
temperaturemedium:material_typeC	60.50	25.98486	2.3282788	0.0276325
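The estimates in this table appear to use a reference-cell coding with high temperature and material type A as the baseline (the intercept equals the mean of that cell). Under that assumption, every coefficient is a simple contrast of cell means, as the following sketch shows. The variable names and the way the standard error is computed from the number of cells in each contrast are illustrative.

```python
from math import sqrt

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
n = 4
cell = {key: sum(ys) / len(ys) for key, ys in LIFE.items()}
# Residual mean square with 36 - 9 = 27 degrees of freedom.
mse = sum((y - cell[key]) ** 2 for key, ys in LIFE.items() for y in ys) / (36 - 9)

REF_M, REF_T = "A", "high"  # assumed baseline levels
base = cell[(REF_M, REF_T)]

# Each entry: name -> (estimate, number of cell means combined with +1/-1 weights).
coef = {"(Intercept)": (base, 1)}
for t in ("low", "medium"):
    coef[f"temperature{t}"] = (cell[(REF_M, t)] - base, 2)
for m in ("B", "C"):
    coef[f"material_type{m}"] = (cell[(m, REF_T)] - base, 2)
for m in ("B", "C"):
    for t in ("low", "medium"):
        estimate = cell[(m, t)] - cell[(m, REF_T)] - cell[(REF_M, t)] + base
        coef[f"temperature{t}:material_type{m}"] = (estimate, 4)

summary = {}
for name, (estimate, n_cells) in coef.items():
    # A +1/-1 contrast of c independent cell means has variance c * sigma^2 / n.
    se = sqrt(n_cells * mse / n)
    summary[name] = (estimate, se, estimate / se)
    print(f"{name:<34}{estimate:>8.2f}{se:>11.5f}{estimate / se:>11.5f}")
```

Read this way, the main-effect rows describe differences at the baseline level of the other factor only, which is one reason the ANOVA table rather than these individual rows is used to judge the overall effect of each factor.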

An F test is displayed for the model source of variation.

R-squared	Adjusted R-squared	Standard error	Statistic	P value	Df
0.7652098	0.6956423	25.98486	10.99953	9e-07	9

The P-value is very small (\(9\times10^{-7}\), well below \(0.0001\)), so the interpretation of this test is that at least one of the three terms in the model is significant. Also \(\text{R-squared }=0.7652\). That is, about \(77\) percent of the variability in the battery life is explained by the plate material in the battery, the temperature, and the material type–temperature interaction.
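The summary statistics above follow directly from the total and residual sums of squares. A minimal sketch, assuming the model has 8 degrees of freedom (2 + 2 + 4) and the residuals have 27, with illustrative variable names:

```python
from math import sqrt

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
all_obs = [y for ys in LIFE.values() for y in ys]
N = len(all_obs)                      # 36 observations
grand = sum(all_obs) / N

ss_total = sum((y - grand) ** 2 for y in all_obs)
ss_error = sum((y - sum(ys) / len(ys)) ** 2 for ys in LIFE.values() for y in ys)
ss_model = ss_total - ss_error

df_model = len(LIFE) - 1              # 9 cell means minus the grand mean = 8
df_error = N - len(LIFE)              # 36 - 9 = 27

r_squared = ss_model / ss_total
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / (N - 1))
residual_se = sqrt(ss_error / df_error)
f_model = (ss_model / df_model) / (ss_error / df_error)

print(f"R-squared          {r_squared:.7f}")
print(f"Adjusted R-squared {adj_r_squared:.7f}")
print(f"Standard error     {residual_se:.5f}")
print(f"Model F statistic  {f_model:.5f}")
```

The adjusted R-squared is lower than R-squared because it penalizes the eight model degrees of freedom spent on only 36 observations.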

In the next section, we discuss the use of the residuals and residual plots in model adequacy checking.

Checking assumptions of the model

Violations of the basic assumptions and model adequacy can be easily investigated by the examination of residuals. The residuals for the two-factor factorial model with interaction are \[e_{ijk}=y_{ijk}-\hat{y}_{ijk}=y_{ijk}-\overline{y}_{ij.}\]

The normality assumption

A check of the normality assumption could be made by plotting a histogram of the residuals. If the \(NID(0,\sigma^2)\) assumption on the errors is satisfied, this plot should look like a sample from a normal distribution centered at zero. Unfortunately, with small samples, considerable fluctuation in the shape of a histogram often occurs, so the appearance of a moderate departure from normality does not necessarily imply a serious violation of the assumptions. Gross deviations from normality are potentially serious and require further analysis.

An extremely useful procedure is to construct a normal probability plot of the residuals. If the error distribution is normal, this plot will resemble a straight line. In visualizing the straight line, place more emphasis on the central values of the plot than on the extremes.

The general impression from examining this display is that the error distribution is approximately normal, although the largest negative residual (\(-60.75\) at low temperature for material type A) does stand out somewhat from the others. The standardized value of this residual is \(\frac{-60.75}{\sqrt{675.21}}=-2.34\), and this is the only residual whose absolute value is larger than 2.
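The residuals and their standardized values can be computed directly from the definition \(e_{ijk}=y_{ijk}-\overline{y}_{ij.}\). The following sketch (standard library only, illustrative names) lists every residual whose standardized value exceeds 2 in absolute value.

```python
from math import sqrt

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}

# One record per observation: (material, temperature, observed life, residual).
residuals = []
for (material, temperature), ys in LIFE.items():
    cell_mean = sum(ys) / len(ys)
    for y in ys:
        residuals.append((material, temperature, y, y - cell_mean))

# Residual mean square: 36 observations minus 9 estimated cell means.
mse = sum(e ** 2 for *_, e in residuals) / (len(residuals) - len(LIFE))

# Standardize by dividing each residual by sqrt(MS_E).
large = [(m, t, y, e, e / sqrt(mse)) for m, t, y, e in residuals if abs(e / sqrt(mse)) > 2]
for material, temperature, y, e, d in large:
    print(f"material {material}, {temperature} temperature: y={y}, residual={e}, standardized={d:.2f}")
```

Within each cell the residuals sum to zero, so a single large negative residual must be balanced by positive residuals in the same cell.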

Alternatively, we can use the Shapiro-Wilk test to check the normality of the errors. The null hypothesis of this test is that the errors are normally distributed.

The results of this test in the example are shown in the next table.

	Statistic	P value
	0.976057	0.6117267

Because the P-value is \(p=0.6117267>\alpha=0.05\), the null hypothesis that the residuals came from a normally distributed population cannot be rejected. This is the same conclusion reached by analyzing the normal probability plot of the residuals.

Independence of the errors

Plotting the residuals in time order of data collection helps detect a strong correlation between the residuals. A tendency to have runs of positive and negative residuals indicates a positive correlation. This would imply that the independence assumption on the errors has been violated.

A plot of these residuals versus run order or time is shown in the next figure.

There is no reason to suspect any violation of independence or constant variance assumptions.

Constant variance (homoscedasticity)

If the model is correct and the assumptions are satisfied, the residuals should be structureless; in particular, they should be unrelated to any other variable including the predicted response. A simple check is to plot the residuals versus the fitted values \(\hat{y}_{ijk}\) (where \(\hat{y}_{ijk}=\overline{y}_{ij.}\)). This plot should not reveal any obvious pattern. The next figure plots the residuals versus the fitted values for the example.

There is some mild tendency for the variance of the residuals to increase as the battery life increases.

Inequality of variance also shows up occasionally on the plot of residuals versus run order. An outward-opening funnel pattern indicates that variability is increasing over time.

The next two figures plot the residuals versus temperature and material types, respectively.

Both plots indicate mild inequality of variance, with the treatment combination of \(15°F\) (low temperature) and material type A possibly having larger variance than the others.

We can see that the low temperature-material type A cell contains both extreme residuals (\(-60.75\) and \(45.25\)). These two residuals are primarily responsible for the inequality of variance detected in these figures and in the plot of the residuals versus fitted values. Reexamination of the data does not reveal any obvious problem, such as an error in recording, so we accept these responses as legitimate. It is possible that this particular treatment combination produces a slightly more erratic battery life than the others. The problem, however, is not severe enough to have a dramatic impact on the analysis and conclusions.

Although residual plots are frequently used to diagnose inequality of variance, several statistical tests have also been proposed. These tests may be viewed as formal tests of the hypotheses \[H_0:\sigma_1^2=\sigma_2^2=...=\sigma_a^2\] \[H_1:\text{above not true for at least one } \sigma_i^2\]

A widely used procedure to test the homogeneity of variances is Bartlett’s test. The procedure involves computing a statistic whose sampling distribution is closely approximated by the chi-square distribution.

The results of this test in the example are shown in the next two tables. The first table tests the homogeneity of variances of the residuals for each level of factor temperature and the second table for levels of the factor material type.

	Statistic	P value
	3.311821	0.1909182

	Statistic	P value
	3.173694	0.2045696

Both P-values are larger than the level \(\alpha=0.05\), so we cannot reject the null hypothesis of equal variances for either factor.
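As a sketch of how these statistics arise, the following snippet applies the standard Bartlett formula to the model residuals, grouped first by temperature and then by material type. The function name `bartlett` and the grouping helper are illustrative. The P-value uses the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability \(e^{-x/2}\), which applies here only because each factor has exactly three levels.

```python
from math import exp, log
from statistics import variance

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
# Residuals from the two-factor model: observation minus its cell mean.
RESIDUALS = [(m, t, y - sum(ys) / len(ys)) for (m, t), ys in LIFE.items() for y in ys]

def bartlett(groups):
    """Bartlett statistic for equal variances across the given groups."""
    k = len(groups)
    total = sum(len(g) for g in groups)
    variances = [variance(g) for g in groups]
    pooled = sum((len(g) - 1) * v for g, v in zip(groups, variances)) / (total - k)
    numerator = (total - k) * log(pooled) - sum(
        (len(g) - 1) * log(v) for g, v in zip(groups, variances)
    )
    correction = 1 + (sum(1 / (len(g) - 1) for g in groups) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction

def groups_by(position, levels):
    """Split the residuals by material (position 0) or temperature (position 1)."""
    return [[r[2] for r in RESIDUALS if r[position] == level] for level in levels]

stat_temp = bartlett(groups_by(1, ["low", "medium", "high"]))
stat_mat = bartlett(groups_by(0, ["A", "B", "C"]))

# Chi-square upper tail with k - 1 = 2 degrees of freedom is exp(-x / 2).
p_temp, p_mat = exp(-stat_temp / 2), exp(-stat_mat / 2)

print(f"temperature:   statistic={stat_temp:.6f}  p={p_temp:.7f}")
print(f"material type: statistic={stat_mat:.6f}  p={p_mat:.7f}")
```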

Because Bartlett’s test is sensitive to the normality assumption, there may be situations where an alternative procedure would be useful. The modified Levene test is a very nice procedure that is robust to departures from normality. To test the hypothesis of equal variances in all treatments, the modified Levene test uses the absolute deviation of the observations \(y_{ij}\) in each treatment from the treatment median.

The results of this test in the example are shown in the next two tables. The first table tests the homogeneity of variances of the residuals for each level of factor temperature and the second table for levels of the factor material type.

	Statistic	P value
	1.382132	0.2651959

	Statistic	P value
	1.672862	0.2032358

Both P-values are larger than the level \(\alpha=0.05\), so we cannot reject the null hypothesis (that all three variances are the same) for either factor.
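The modified Levene test amounts to a one-way ANOVA on the absolute deviations from the group medians. The sketch below applies it to the model residuals, grouped by temperature and by material type. The function name `modified_levene` is illustrative. The P-value uses a closed form of the F tail probability that holds only when the numerator has 2 degrees of freedom (three groups), which is an assumption specific to this example.

```python
from statistics import mean, median

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
# Residuals from the two-factor model: observation minus its cell mean.
RESIDUALS = [(m, t, y - sum(ys) / len(ys)) for (m, t), ys in LIFE.items() for y in ys]

def modified_levene(groups):
    """Return (F, error df) of a one-way ANOVA on |x - group median|."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    total = sum(len(d) for d in deviations)
    grand = sum(sum(d) for d in deviations) / total
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations) / (k - 1)
    within = sum((x - mean(d)) ** 2 for d in deviations for x in d) / (total - k)
    return between / within, total - k

def groups_by(position, levels):
    """Split the residuals by material (position 0) or temperature (position 1)."""
    return [[r[2] for r in RESIDUALS if r[position] == level] for level in levels]

def f_upper_tail_2df(f, d2):
    """P(F > f) for F(2, d2); this closed form holds only for numerator df = 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

f_temp, df_temp = modified_levene(groups_by(1, ["low", "medium", "high"]))
f_mat, df_mat = modified_levene(groups_by(0, ["A", "B", "C"]))
p_temp, p_mat = f_upper_tail_2df(f_temp, df_temp), f_upper_tail_2df(f_mat, df_mat)

print(f"temperature:   F={f_temp:.6f}  p={p_temp:.7f}")
print(f"material type: F={f_mat:.6f}  p={p_mat:.7f}")
```

Using the median instead of the mean as the center is what makes the procedure less sensitive to skewed or heavy-tailed errors.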

Multiple comparisons

When the ANOVA indicates that row or column means differ, it is usually of interest to make comparisons between the individual row or column means to discover the specific differences.

We now illustrate the use of Tukey’s test on the battery life data from the example. Note that in this experiment, interaction is significant. When the interaction is significant, comparisons between the means of one factor (e.g., A) may be obscured by the AB interaction. One approach to this situation is to fix factor B at a specific level and apply Tukey’s test to the means of factor A at that level.

To illustrate, suppose that in the example we are interested in detecting differences among the means of the three material types. Because interaction is significant, we make this comparison at just one level of temperature, say level 2 (medium temperature). We assume that the best estimate of the error variance is the \(MS_E\) from the ANOVA table, utilizing the assumption that the experimental error variance is the same over all treatment combinations.

The results of this test in the example are shown in the next table. The rows that correspond to the comparison at medium temperature are medium:B-medium:A, medium:C-medium:A, and medium:C-medium:B.

	Difference	Lower ci	Upper ci	P value
low:A-high:A	77.25	15.426816	139.073184	0.0067471
medium:A-high:A	-0.25	-62.073184	61.573184	1.0000000
high:B-high:A	-8.00	-69.823184	53.823184	0.9999508
low:B-high:A	98.25	36.426816	160.073184	0.0003574
medium:B-high:A	62.25	0.426816	124.073184	0.0474675
high:C-high:A	28.00	-33.823184	89.823184	0.8347331
low:C-high:A	86.50	24.676816	148.323184	0.0018765
medium:C-high:A	88.25	26.426816	150.073184	0.0014679
medium:A-low:A	-77.50	-139.323184	-15.676816	0.0065212
high:B-low:A	-85.25	-147.073184	-23.426816	0.0022351
low:B-low:A	21.00	-40.823184	82.823184	0.9616404
medium:B-low:A	-15.00	-76.823184	46.823184	0.9953182
high:C-low:A	-49.25	-111.073184	12.573184	0.2016535
low:C-low:A	9.25	-52.573184	71.073184	0.9998527
medium:C-low:A	11.00	-50.823184	72.823184	0.9994703
high:B-medium:A	-7.75	-69.573184	54.073184	0.9999614
low:B-medium:A	98.50	36.676816	160.323184	0.0003449
medium:B-medium:A	62.50	0.676816	124.323184	0.0460388
high:C-medium:A	28.25	-33.573184	90.073184	0.8281938
low:C-medium:A	86.75	24.926816	148.573184	0.0018119
medium:C-medium:A	88.50	26.676816	150.323184	0.0014173
low:B-high:B	106.25	44.426816	168.073184	0.0001152
medium:B-high:B	70.25	8.426816	132.073184	0.0172076
high:C-high:B	36.00	-25.823184	97.823184	0.5819453
low:C-high:B	94.50	32.676816	156.323184	0.0006078
medium:C-high:B	96.25	34.426816	158.073184	0.0004744
medium:B-low:B	-36.00	-97.823184	25.823184	0.5819453
high:C-low:B	-70.25	-132.073184	-8.426816	0.0172076
low:C-low:B	-11.75	-73.573184	50.073184	0.9991463
medium:C-low:B	-10.00	-71.823184	51.823184	0.9997369
high:C-medium:B	-34.25	-96.073184	27.573184	0.6420441
low:C-medium:B	24.25	-37.573184	86.073184	0.9165175
medium:C-medium:B	26.00	-35.823184	87.823184	0.8822881
low:C-high:C	58.50	-3.323184	120.323184	0.0742711
medium:C-high:C	60.25	-1.573184	122.073184	0.0604247
medium:C-low:C	1.75	-60.073184	63.573184	1.0000000

This analysis indicates that at the medium temperature level, the mean battery life does not differ significantly between material types B and C (\(p=0.88\)), whereas the mean battery life for material type A is significantly lower than for both type B (\(p=0.046\)) and type C (\(p=0.0014\)); see the graph of the average responses at each treatment combination.

As the interaction is significant, we could compare all \(ab=9\) cell means to determine which ones differ significantly. In this analysis, differences between cell means include interaction effects as well as both main effects. In the example, this would give 36 comparisons between all possible pairs of the nine cell means (all these comparisons can be seen in the previous table).
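The differences in the Tukey table are plain differences of cell means, and every interval has the same half-width because the design is balanced. The sketch below reproduces the 36 pairwise differences and flags those larger than the half-width. The value `HALF_WIDTH` is not computed here: it is read from the table (difference minus lower bound), since obtaining it from scratch requires a quantile of the studentized range distribution, which the standard library does not provide. All names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Observations keyed by (material_type, temperature), copied from the data table.
LIFE = {
    ("A", "low"): [130, 155, 74, 180], ("A", "medium"): [34, 40, 80, 75], ("A", "high"): [20, 70, 82, 58],
    ("B", "low"): [150, 188, 159, 126], ("B", "medium"): [136, 122, 106, 115], ("B", "high"): [25, 70, 58, 45],
    ("C", "low"): [138, 110, 168, 160], ("C", "medium"): [174, 120, 150, 139], ("C", "high"): [96, 104, 82, 60],
}
n = 4
cell = {key: sum(ys) / len(ys) for key, ys in LIFE.items()}
mse = sum((y - cell[key]) ** 2 for key, ys in LIFE.items() for y in ys) / (36 - 9)

# Taken from the Tukey table: e.g. 77.25 - 15.426816 = 61.823184.
HALF_WIDTH = 61.823184
# Critical value implied by the table, since half-width = q * sqrt(MS_E / n).
q_implied = HALF_WIDTH / sqrt(mse / n)

# All pairs of the nine cells and the difference of their means.
pairs = {(first, second): cell[second] - cell[first] for first, second in combinations(cell, 2)}
significant = {pair: diff for pair, diff in pairs.items() if abs(diff) > HALF_WIDTH}

print(f"{len(pairs)} comparisons, {len(significant)} significant at the 0.05 level")
print(f"implied studentized-range critical value: {q_implied:.3f}")

# The three comparisons among materials at medium temperature.
medium = {}
for first, second in combinations("ABC", 2):
    diff = cell[(second, "medium")] - cell[(first, "medium")]
    medium[(second, first)] = (diff, abs(diff) > HALF_WIDTH)
    print(f"medium:{second}-medium:{first}  difference={diff:7.2f}  significant={abs(diff) > HALF_WIDTH}")
```

Because the table adjusts for all 36 comparisons among nine cells, its intervals are wider than those obtained by comparing only the three materials at a single temperature.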

References

Design and analysis of experiments, Montgomery.
