KITADA

Lesson #25

The Analysis of Variance Procedure

Motivation:

In many experiments, researchers record a quantitative response variable and want to see how different treatments affect it. If certain conditions hold (for example, the response values have similar spread in each treatment group and are approximately normally distributed within each group), the group means can be compared using the Analysis of Variance procedure, a method for comparing the effects of the treatments on the response variable.

What you need to know from this lesson:

After completing this lesson, you should be able to

To accomplish the above “What You Need to Know”, do the following:

The Lesson

Example: What's your name?

Psychologists at Lancaster University (United Kingdom) evaluated three methods of name retrieval in a controlled setting (Journal of Experimental Psychology—Applied, June 2000). A sample of 139 students was randomly divided into three groups and each group of students used a different method to learn the names of the other students in the group. Group 1 used the “simple name game,” where the first student states his/her full name, the second student announces his/her name and the name of the first student, the third student says his/her name and names of the first two students, etc. Group 2 used the “elaborate name game,” a modification of the simple name game where the students not only state their names but also their favorite activity. Group 3 used “pairwise introductions,” where students are divided into pairs and each student must introduce the other member of the pair. One year later, all subjects were sent pictures of the students in their group and asked to state the full name of each. The researchers measured the percentage of names recalled for each student respondent.

Are the mean percentages of names recalled different for the three name retrieval methods?

1. What is the response variable? Is it quantitative or categorical?

2. What is the explanatory variable? Is it quantitative or categorical?

3. What study design was used to collect the data? That is, was this a sampling design or experiment?

“A sample of 139 students was randomly divided into three groups and each group of students used a different method to learn the names of the other students in the group.”

This is an experiment since students were randomly assigned to the treatment groups.

The Analysis of Variance procedure is a two-step process:

Step 1: Determine if there is evidence to say that at least one group has a different mean by performing an F-test.

Step 2: Determine which groups have different means by performing a Multiple Comparison method.

Step 1:

4. The null hypothesis is that the different name retrieval methods had no effect on the mean percentage of names recalled one year later. If this were the case, how would the group means compare to one another? State the null hypothesis in statistical notation AND in words.

\( H_0: \mu_{SNG}=\mu_{ENG}=\mu_{PI} \); that is, the mean percentage of names recalled is the same for all three methods.

Notation:

5. If one of the name retrieval methods did have an effect on recalling names one year later, what would you expect the mean percentage of names recalled for that group to be compared to the others? Using this idea, state the alternative hypothesis in words.

\( H_A \): At least one of the means is different

6. Before testing the above null hypothesis, it is a good idea to “explore” the data. The actual data are NOT given here, but a box-and-whisker plot and summary statistics are provided below:

(SEE PLOT ON HANDOUT)

Summary Statistics

    Group                       sample size   mean (%)   st dev (%)
    1: simple name game              50        30.64      20.0354
    2: elaborate name game           42        26.2143    23.7019
    3: pairwise introductions        47        15.1277    15.7032

a) Does it appear that the three groups have roughly the same “centers”?

The three centers do not all look the same, but the interquartile ranges overlap when the groups are compared in pairs, so some groups may have roughly the same center.

b) Does it appear that the three groups have roughly the same spread?

The spreads of the three groups differ somewhat but are close.

7. To test the above null hypothesis, which test is used?

We can use an ANOVA F-test to test the equality of the group means.

8. As with any hypothesis test, there are certain conditions that must be met for conclusions to be valid to the populations of interest. What are those conditions?

9. Let’s concentrate on the “constant variation” condition. This implies that the spread within each group is the same. (This is sometimes stated as: all groups have the same standard deviation, i.e. \( \sigma_1=\sigma_2=\sigma_3 \).) What methods can be used to assess this condition?

a. Using the box-and-whisker plot on a previous page and the residual plot below, do you feel that the spread is similar in each group?

(SEE PLOT ON HANDOUT)

The spread is “similar enough” that I feel comfortable saying that these groups came from populations with the same spread.

b. Sometimes only summary information is provided. How can the summary information be used to assess the condition that all three groups have the same spread? Using this “rule”, is it safe to say that the spread is the same in all three groups (i.e. \( \sigma_1=\sigma_2=\sigma_3 \))?

Rule of thumb: if \( \frac{\text{largest } s_i}{\text{smallest } s_i} < 2 \), then we can feel safe assuming \( \sigma_1=\sigma_2=\sigma_3 \)

\( \frac{23.7019}{15.7032} \approx 1.51 < 2 \), so yes.
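The rule-of-thumb check can also be done in a few lines of code. This is only a sketch: the standard deviations come from the summary table above, while the dictionary keys and variable names are made up for illustration.

```python
# Sample standard deviations (%) from the summary statistics table.
std_devs = {"simple": 20.0354, "elaborate": 23.7019, "pairwise": 15.7032}

# Ratio of the largest sample standard deviation to the smallest.
ratio = max(std_devs.values()) / min(std_devs.values())

# Rule of thumb from this lesson: a ratio below 2 suggests similar spread.
equal_spread_ok = ratio < 2

print(f"largest/smallest = {ratio:.2f}; rule satisfied: {equal_spread_ok}")
```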

10. If all three groups have the same population standard deviation, we don’t need to use subscripts to distinguish between the three groups. That is, if we feel comfortable saying the spread is the same in each group, then \( \sigma_1=\sigma_2=\sigma_3=\sigma \).

a. We estimate \( \sigma \) with \( s_p \), the “pooled” sample standard deviation. Its square, the pooled sample variance \( s_p^2 \), is a weighted average of the sample variances, with each group weighted by its degrees of freedom \( n_i-1 \). It is legitimate to “pool” the sample standard deviations only if it is believed the groups come from populations with the same spread.

Here is a general formula for calculating \( s_p \): \( s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2+\cdots+(n_g-1)s_g^2}{n_1+n_2+\cdots+n_g-g}} \),

where \( g \) = the number of groups

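Upon close examination of this formula, identify what value in the Analysis of Variance table the numerator and denominator represent:

Numerator: SSE (the error, or within-groups, sum of squares). Denominator: DFE (the error degrees of freedom, \( n-g \)).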

Therefore, what value in the Analysis of Variance table is \( s_p \)?

\( s_p=\sqrt{MSE} \)

b. Instead of calculating \( s_p \), let’s calculate \( s_p^2 \). Calculate \( s_p^2 \) and place that in the appropriate place in the Analysis of Variance table.

\( s_p^2=\frac{54045.59}{136}=\frac{SSE}{DFE}=MSE=397.394 \)
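The value 54045.59 can be reproduced from the summary statistics alone. The sketch below applies the pooled formula from part (a); the variable names are illustrative and not part of the lesson.

```python
import math

# (sample size, sample standard deviation in %) for each group, from the summary table.
groups = [(50, 20.0354), (42, 23.7019), (47, 15.7032)]

# Numerator of the pooled formula: sum of (n_i - 1) * s_i^2, which is SSE.
sse = sum((n - 1) * s**2 for n, s in groups)

# Denominator: total sample size minus the number of groups, which is DFE.
dfe = sum(n for n, _ in groups) - len(groups)

mse = sse / dfe        # pooled variance s_p^2
s_p = math.sqrt(mse)   # pooled standard deviation

print(f"SSE = {sse:.2f}, DFE = {dfe}, MSE = {mse:.3f}, s_p = {s_p:.3f}")
```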

11. SST = 60155.5. Fill in the rest of the ANOVA table.

    Source                  sum of squares    df    mean square    F-statistic
    Groups (between)            6109.91         2     3054.955        7.687
    Error (within groups)      54045.59       136      397.394
    Total                      60155.5        138
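The remaining entries follow from SST, SSE, and the sample sizes by subtraction and division. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Known quantities from the lesson.
sst = 60155.5      # total sum of squares
sse = 54045.59     # error (within-groups) sum of squares
n_total = 139      # total number of students
g = 3              # number of groups

# Sums of squares and degrees of freedom are additive: total = groups + error.
ssg = sst - sse
dfg = g - 1
dfe = n_total - g
dft = n_total - 1

# Mean squares are sums of squares divided by their degrees of freedom.
msg = ssg / dfg
mse = sse / dfe

# The F-statistic compares between-group to within-group variation.
f_stat = msg / mse

print(f"SSG = {ssg:.2f}, MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```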

12. How many degrees of freedom are associated with the F-statistic?

NUMERATOR: 2 DENOMINATOR: 136

13. What is the p-value associated with this F-statistic?

p-value < 0.001 (from a calculator, p-value = 0.00069)
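If no calculator with an F-distribution function is at hand, this particular p-value can be checked by hand. When the numerator has exactly 2 degrees of freedom, the upper-tail probability has a closed form: P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below uses that special case only; it does not apply to other numerator degrees of freedom, and the function name is made up for illustration.

```python
def f_pvalue_two_numerator_df(f_stat: float, df_den: int) -> float:
    """Upper-tail probability P(F > f_stat) for an F(2, df_den) distribution.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_den) ** (-df_den / 2)


p_value = f_pvalue_two_numerator_df(7.687, 136)
print(f"p-value = {p_value:.5f}")
```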

14. What is your conclusion?

There is strong evidence that at least one name retrieval method has a different mean percent of names recalled one year later than the others (p-value = 0.00069).

15. What percent of the total variation is explained by the model (that is, the difference in the means)?

(Note: this is also called the coefficient of determination)

\( R^2 \) = SSG/SST = 6109.91/60155.5 = 0.1016 = 10.16%

Step 2:

Since there is evidence to reject the null hypothesis from the F-test, the next step is to determine which group or groups have different means by using a Multiple Comparison Method. One such method is the Bonferroni method. This method involves comparing two groups at a time by doing two-sample t-tests.

16. How many pairwise comparisons will be made? What are they?

3 pairwise comparisons: simple vs elaborate, simple vs pairwise, and elaborate vs pairwise.

17. For each of the pairwise comparisons, a two-sample t-test will be performed. What are the null and alternative hypotheses in each of these two-sample t-tests?

\( H_0: \mu_i=\mu_j \)

\( H_A: \mu_i \neq \mu_j \)

18. How many degrees of freedom will the t-statistic have in each of these two-sample t-tests?

Each test uses the pooled standard deviation \( s_p \), so the degrees of freedom are DFE = n - g = 139 - 3 = 136.

19. For any multiple comparison method, the significance level for each test needs to be adjusted. Why?

Significance level (\( \alpha \), “alpha”; typically \( \alpha \) = 0.05): the “cut-off point” between having evidence to reject \( H_0 \) and failing to reject \( H_0 \).

P(Type I Error) = \( \alpha \)

Suppose \( \alpha \) = 0.05 on each hypothesis test and we do 2 tests.

P(At least one Type I Error) = \( 1-(1-0.05)^2 \) = 0.0975 (treating the two tests as independent)

So if we want our overall \( \alpha \) to be 0.05, we need to adjust the significance level for each individual test.
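To see how quickly the overall error rate grows, the same calculation can be repeated for more tests. This sketch assumes the tests are independent, which is a simplification (pairwise comparisons that share a group are not independent); the function name is illustrative.

```python
def familywise_error(alpha: float, num_tests: int) -> float:
    """P(at least one Type I error) across num_tests independent tests at level alpha."""
    # P(no Type I error on any test) = (1 - alpha) ** num_tests
    return 1 - (1 - alpha) ** num_tests


for k in (1, 2, 3, 6):
    print(f"{k} test(s): P(at least one Type I error) = {familywise_error(0.05, k):.4f}")
```

With 3 tests, as in the name game example, the unadjusted rate under this assumption is already about 0.14.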

20. The Bonferroni Multiple Comparison method adjusts the significance level for each test this way:

New \( \alpha = \frac{\text{Desired Type I Error}}{\text{Number of Pairwise Comparisons}} \)

a. Calculate the new significance level for each two-sample t-test if we want to keep the probability of making at least one Type I Error at 0.05.

We want to keep P(at least one Type I Error) to something reasonable, such as 0.05

If we want P(at least one Type I Error) = 0.05, then new \( \alpha \) = 0.05/3 = 0.0167

b. As another example, suppose 4 treatment groups were being compared and researchers wanted to keep the probability of making at least one Type I Error at 0.05. What is the new significance level for each two-sample t-test?

6 comparisons can be made

New \( \alpha \) = 0.05/6 = 0.0083
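Both parts of this question follow the same recipe: count the pairs of groups, then divide the desired overall error rate by that count. A short sketch, with an illustrative function name:

```python
import math


def bonferroni_alpha(overall_alpha: float, num_groups: int) -> tuple[int, float]:
    """Return (number of pairwise comparisons, per-test significance level)."""
    # Number of ways to choose 2 groups out of num_groups.
    num_comparisons = math.comb(num_groups, 2)
    return num_comparisons, overall_alpha / num_comparisons


for g in (3, 4):
    k, new_alpha = bonferroni_alpha(0.05, g)
    print(f"{g} groups: {k} comparisons, new alpha = {new_alpha:.4f}")
```

For 3 groups this gives 3 comparisons and a per-test level of about 0.0167; for 4 groups it gives 6 comparisons and about 0.0083.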

21. Back to our example: let’s do the t-test for just one of these comparisons. Compare the mean percent recall for those subjects who used the simple name game with those who used the elaborate name game.

\( H_0: \mu_s=\mu_e \)

\( H_A: \mu_s \neq \mu_e \)

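Since the p-value is greater than 0.0167, we fail to reject the null hypothesis: there is no evidence that the simple and elaborate name games differ in mean percent of names recalled.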
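The arithmetic behind this comparison is not shown in the lesson. The sketch below is one way to reproduce it, under two assumptions: the test uses the pooled variance (MSE = 397.394) with 136 degrees of freedom, as in question 18, and the two-sided p-value is approximated by numerically integrating the t density with Simpson's rule, since the Python standard library has no t-distribution function. All function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_t_pvalue(t_stat: float, df: int, steps: int = 2000) -> float:
    """Approximate P(|T| > |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|), by symmetry
    return 1 - central_area


# Summary statistics from the lesson (assumed pooled-variance test).
mse, dfe = 397.394, 136
n_simple, mean_simple = 50, 30.64
n_elab, mean_elab = 42, 26.2143

std_error = math.sqrt(mse * (1 / n_simple + 1 / n_elab))
t_stat = (mean_simple - mean_elab) / std_error
p_value = two_sided_t_pvalue(t_stat, dfe)

print(f"t = {t_stat:.2f}, two-sided p-value = {p_value:.3f}")
```

Under these assumptions the t-statistic comes out near 1.06 and the p-value near 0.29, well above the adjusted significance level of 0.0167.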

22. Here are the t-statistics and p-values for the other comparisons. Which group means are different?

Comparison               t-statistic    p-value
simple vs pairwise          3.83        0.00019
elaborate vs pairwise       2.62        0.00979

Remember that we are comparing the above p-values to our new significance level of 0.0167.

Both p-values are below 0.0167, so there is strong evidence that the pairwise introductions group has a different mean than both the simple and the elaborate name game groups.

This implies that the mean percent of names recalled one year later is different for those in the pairwise group compared to the other two. Verify this by looking at the side-by-side boxplots.