KITADA

Lesson #8

Inference for comparing two proportions: Fisher's Exact Test and Two Sample Z-Test for Proportions

Motivation:

In Lesson 7, we learned that a randomization test is a way to determine the exact p-value for a two-proportion problem. However, it is rare that all of the possible randomizations can be listed. The twoprop macro can generate several thousand of the possible randomizations which will give an accurate approximation to the exact p-value. In this lesson, we’ll learn a method that will easily calculate an exact p-value without having to list all the unique randomizations: the Fisher’s Exact Test. We’ll show an example of how to calculate the exact p-value from Fisher’s Exact Test in this lesson. While doing so, notice how complex and computationally intense the calculations are. Modern computer software programs can handle such computationally intense calculations, but if the software is not powerful enough to do so, the twoprop macro provides a very accurate approximation to the exact p-value found from Fisher’s Exact Test.

We will also learn how to do an approximation using a two sample z procedure for proportions.

What you need to know from this lesson:

After completing this lesson, you should be able to

Calculate the exact p-value using the Fisher's Exact Test
Explain why an approximation method may be necessary to approximate the exact p-value even with modern technology

To accomplish the above “What You Need to Know”, do the following:

1. Attend lecture and answer the questions on the following pages of this lesson.
2. Read the material in this lesson.
3. Do the Lesson 8 questions at the end of the lesson notes.

The Lesson

When discussing the Randomization Test in Lesson 7, we noted that it may be nearly impossible to list all of the randomizations to obtain an exact p-value. Even so, we can still calculate the exact p-value using Fisher’s Exact Test. Instead of listing the randomizations, Fisher’s Exact Test uses a formula to find, out of ALL unique randomizations, the probability of observing a table of counts like the one observed, or one that is “more unusual”, if the null hypothesis is true.

Notice that the Fisher’s Exact Test uses tables of counts. Recall that one way to summarize information from two categorical variables is with a table of counts. To see how the Fisher’s Exact Test calculates the p-value, let’s return to the Friendly Observers example from Lesson 7:

Example 1: Friendly Observers

In a study published in the Journal of Personality and Social Psychology, researchers investigated a conjecture that having an observer with a vested interest would decrease subjects’ performance on a skill-based task. Subjects were given time to practice playing a video game that required them to navigate an obstacle course as quickly as possible. They were then told to play the game one final time with an observer present. Subjects were randomly assigned to one of two groups. One group was told that the participant and observer would each win $3 if the participant beat a certain threshold time, and the other group was told only that the participant would win the prize if the threshold were beaten. The investigator is interested in determining if having an observer with a vested interest would change subjects’ performance on this task. In other words, would subjects perform differently on the task if they knew they had to share the prize?

Here are the results:

Threshold<-c(rep("Beat Threshold", 1), rep("Did Not Beat", 7),
             rep("Beat Threshold", 5), rep("Did Not Beat", 2))
Treatments<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table<-addmargins(table(Threshold, Treatments))
prize_table

##                 Treatments
## Threshold        A: Share Prize B: No Share Sum
##   Beat Threshold              1           5   6
##   Did Not Beat                7           2   9
##   Sum                         8           7  15

1. Re-state the null and alternative hypotheses.

$ H_0: p_{no share} = p_{share} $

$ H_A: p_{no share} \neq p_{share} $

$ p_{no share} $: The proportion of people who pass the threshold in the “no share” group
$ p_{share} $: The proportion of people who pass the threshold in the “share” group

2. What proportion of all people in the study beat the threshold?

6/15

## [1] 0.4

3. Would this proportion be the same for both comparison groups if the null hypothesis is true?

Yes, if the null was true the proportions would be the same for both groups

4. If the null hypothesis is true, how many people in the share the prize group would we expect to beat the threshold? (Note: this could be a decimal. If it is, leave it as a decimal.)

### EXPECTED BEAT THRESHOLD FOR SHARE GROUP
8*0.4

## [1] 3.2

5. If the null hypothesis is true, calculate the expected counts for all remaining cells in the table. Complete the table below with the expected counts if the null hypothesis is true:

### EXPECTED BEAT THRESHOLD FOR SHARE GROUP
8*0.4

## [1] 3.2

### EXPECTED BEAT THRESHOLD FOR NO SHARE GROUP
7*0.4

## [1] 2.8

### EXPECTED NOT BEAT THRESHOLD FOR SHARE GROUP
8*0.6

## [1] 4.8

### EXPECTED NOT BEAT THRESHOLD FOR NO SHARE GROUP
7*0.6

## [1] 4.2
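The four expected counts above can also be computed in one step from the table margins. The sketch below assumes the observed counts are typed in as a matrix; the object names `observed` and `expected` are illustrative and not part of the lesson.

```r
# Observed counts from the Friendly Observers table
# (rows: outcome, columns: treatment group)
observed <- matrix(c(1, 5,
                     7, 2),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Threshold  = c("Beat Threshold", "Did Not Beat"),
                                   Treatments = c("A: Share Prize", "B: No Share")))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
```

Each entry matches the corresponding hand calculation: for example, the share the prize group has 8 people and 40% of all participants beat the threshold, giving 8*0.4 = 3.2.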

To determine the p-value, you only need to consider one of the cells in the table. It’s usually easiest to use the cell with the observed count closest to 0 (but any could be used).

As has been done with all previous methods, we’ll determine what counts in the chosen cell are considered “as or more unusual” and then find the probability of observing such counts if the null hypothesis is true.

6. The cell with an observed count closest to 0 is the cell for beating the threshold in the share the prize group. Let’s consider this cell. What counts would be as or more unusual than the observed count in this cell if the null hypothesis is true? In other words, what counts would be as or further away from the expected count for this cell than the original count is from the expected count? (Again, a number line may help.)

The observed number of people who beat the threshold in the share group was 1.

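The observed count of 1 is 2.2 away from the expected count of 3.2. Thus, based on our two-sided hypothesis, the values that would be “as or more extreme” are 0 and 1 on the lower end and any count of 5.4 or more on the upper end. Since only six participants beat the threshold, the only such count that can actually occur is 6 (counts of 7 and 8 are impossible).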

A Fisher’s Exact Test is still a Randomization Test. Therefore, it still considers all possible randomizations of the 15 participants to the two comparison groups so that 8 are in the share the prize group and the other 7 are in the no share the prize group. In addition, the same six people will beat the threshold no matter what group they’ve been assigned to (if the null hypothesis is true). So, each randomization will have six “winners” and nine that do not beat the threshold.
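Because this study is small, the randomizations can actually be enumerated. The sketch below labels the participants 1 to 15 and assumes, purely for illustration, that participants 1 to 6 are the six “winners”; `combn()` then lists every way of choosing the 8 people for the share the prize group. The object names are illustrative.

```r
# Label the 15 participants; assume (for illustration only) that 1-6 are the "winners"
winners <- 1:6

# Every possible choice of 8 participants for the share the prize group
share_groups <- combn(15, 8)   # one column per randomization
ncol(share_groups)             # number of unique randomizations, choose(15, 8)

# For each randomization, count how many winners landed in the share group
winners_in_share <- apply(share_groups, 2, function(g) sum(g %in% winners))
randomization_counts <- table(winners_in_share)
randomization_counts
```

Dividing these counts by the total number of randomizations gives the probability of each possible table of counts if the null hypothesis is true.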

7. Suppose one of these randomizations resulted in 2 participants in the share the prize group that beat the threshold.

a. Fill in the table of counts for this randomization:

Threshold_7a<-c(rep("Beat Threshold", 2), rep("Did Not Beat", 6),
             rep("Beat Threshold", 4), rep("Did Not Beat", 3))
Treatments_7a<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table_7a<-addmargins(table(Threshold_7a, Treatments_7a))
prize_table_7a

##                 Treatments_7a
## Threshold_7a     A: Share Prize B: No Share Sum
##   Beat Threshold              2           4   6
##   Did Not Beat                6           3   9
##   Sum                         8           7  15

b. Would this table of counts be “more unusual” than the table of counts created from what was observed in the original study?

No. In the original study we observed 1 and the expected count was 3.2, a distance of 2.2. Counts strictly between 1 and 5.4 are closer to what we expect under the null hypothesis, so a count of 2 (a distance of only 1.2) is not as extreme.

What we want to do is list all tables of counts from randomizations that would be considered “as or more unusual”. That is we want to list tables of counts from randomizations that would produce counts in the beat the threshold for the share the prize cell that are “as or more unusual” than the one observed.

8. From #6 above, what counts in the beat the threshold in the share the prize cell are considered “as or more unusual?”

Based on our two-sided hypothesis, the counts that would be “as or more extreme” are 0 and 1 on the lower end and 6 on the upper end. Remember that we are constrained by the 6 “successes” in the study, so counts of 7 or 8 cannot occur.
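The number-line reasoning can be sketched in R as follows. The variable names are illustrative, and the rule used here (distance from the expected count) is the one described in this lesson.

```r
expected_count <- 8 * 6 / 15   # expected winners in the share the prize group (3.2)
observed_count <- 1            # observed winners in the share the prize group

# Counts that can actually occur: only 6 winners exist in the study
possible_counts <- 0:6

# Keep the counts at least as far from the expected count as the observed count is
distance <- abs(possible_counts - expected_count)
as_or_more_unusual <- possible_counts[distance >= abs(observed_count - expected_count)]
as_or_more_unusual
```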

9. For any counts in the beat the threshold for the share the prize cell that are “as or more unusual” than the observed count, create the table of counts from that randomization. (This will always include the original (i.e. “observed”) table of counts.)

#### ORIGINAL TABLE
Threshold<-c(rep("Beat Threshold", 1), rep("Did Not Beat", 7),
             rep("Beat Threshold", 5), rep("Did Not Beat", 2))
Treatments<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table<-addmargins(table(Threshold, Treatments))
prize_table

##                 Treatments
## Threshold        A: Share Prize B: No Share Sum
##   Beat Threshold              1           5   6
##   Did Not Beat                7           2   9
##   Sum                         8           7  15

#### MORE EXTREME (TABLE 1)
Threshold0<-c(rep("Beat Threshold", 0), rep("Did Not Beat", 8),
             rep("Beat Threshold", 6), rep("Did Not Beat", 1))
Treatments0<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table0<-addmargins(table(Threshold0, Treatments0))
prize_table0

##                 Treatments0
## Threshold0       A: Share Prize B: No Share Sum
##   Beat Threshold              0           6   6
##   Did Not Beat                8           1   9
##   Sum                         8           7  15

#### MORE EXTREME (TABLE 2)
Threshold6<-c(rep("Beat Threshold", 6), rep("Did Not Beat", 2),
             rep("Beat Threshold", 0), rep("Did Not Beat", 7))
Treatments6<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table6<-addmargins(table(Threshold6, Treatments6))
prize_table6

##                 Treatments6
## Threshold6       A: Share Prize B: No Share Sum
##   Beat Threshold              6           0   6
##   Did Not Beat                2           7   9
##   Sum                         8           7  15

Calculating a p-value from the Fisher’s Exact Test

Fisher’s Exact Test calculates the p-value by adding the probability of observing the table of counts created from the original data (this is the “as” part of “as or more unusual”) to the probabilities of any tables created from randomizations that produced something “more unusual” than the original table of counts. In our example,

the table of counts with 1 person beating the threshold in the share the prize group was what was observed.

we determined that any randomizations that produced 0 people or 6 people beating the threshold in the share the prize group would be further away from the expected number of people in this group if the null hypothesis is true (which was 3.2) than the observed count is from 3.2

Suppose we had the following table of counts:

                     Treatments
    Threshold        A: Share Prize B: No Share Sum
      Beat Threshold              a           b  R1
      Did Not Beat                c           d  R2
      Sum                        C1          C2   N

The formula for determining the probability of this table of counts occurring is:

$ \frac{R_1! \, R_2! \, C_1! \, C_2!}{a! \, b! \, c! \, d! \, N!} $
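As a sketch, the formula can be wrapped in a small helper function. The name `table_prob` is hypothetical (it is not part of the lesson or of base R); it takes the four cell counts and derives the row and column totals from them.

```r
# Probability of one 2x2 table of counts, given its margins
# (table_prob is an illustrative helper name, not a built-in function)
table_prob <- function(a, b, c, d) {
  R1 <- a + b; R2 <- c + d      # row totals
  C1 <- a + c; C2 <- b + d      # column totals
  N  <- a + b + c + d           # grand total
  (factorial(R1) * factorial(R2) * factorial(C1) * factorial(C2)) /
    (factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(N))
}

# Example: the table from question 7, with 2 winners in the share the prize group
table_prob(a = 2, b = 4, c = 6, d = 3)
```

This table is not part of the p-value calculation (a count of 2 is not “as or more unusual”); it is used here only to show the helper at work.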

Let’s use this formula to show how the Fisher’s Exact test calculates the exact p-value:

Let’s start with what was observed:

#### ORIGINAL TABLE
Threshold<-c(rep("Beat Threshold", 1), rep("Did Not Beat", 7),
             rep("Beat Threshold", 5), rep("Did Not Beat", 2))
Treatments<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table<-addmargins(table(Threshold, Treatments))
prize_table

##                 Treatments
## Threshold        A: Share Prize B: No Share Sum
##   Beat Threshold              1           5   6
##   Did Not Beat                7           2   9
##   Sum                         8           7  15

## In this example
a=1
b=5
c=7
d=2
R1=6
R2=9
C1=8
C2=7
N=15

## Putting these numbers into the formula, we get

(factorial(6)*factorial(9)*factorial(8)*factorial(7))/
  (factorial(1)*factorial(5)*factorial(7)*factorial(2)*factorial(15))

## [1] 0.03356643

This says that the probability that a randomization of these 15 people to the two groups (so that there are 8 in the share the prize group and 7 in the no share the prize group) results in 1 person in the share the prize group who beats the threshold is 0.03357. In other words, 3.357% of all the unique randomizations result in 1 of the six “winners” being assigned to the share the prize group. (Remember, for each randomization, there are six people who beat the threshold and nine that don’t.)

Now we have to consider the randomizations that produced tables of counts “more unusual”.

#### Let’s find the probability of observing a table of counts with 0 “winners” in the share the prize group.

### The table of counts would look like this:

Threshold0<-c(rep("Beat Threshold", 0), rep("Did Not Beat", 8),
             rep("Beat Threshold", 6), rep("Did Not Beat", 1))
Treatments0<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table0<-addmargins(table(Threshold0, Treatments0))
prize_table0

##                 Treatments0
## Threshold0       A: Share Prize B: No Share Sum
##   Beat Threshold              0           6   6
##   Did Not Beat                8           1   9
##   Sum                         8           7  15

(factorial(6)*factorial(9)*factorial(8)*factorial(7))/
  (factorial(0)*factorial(6)*factorial(8)*factorial(1)*factorial(15))

## [1] 0.001398601

#### Next: find the probability of observing a table of counts with 6 “winners” in the share the prize group. 

### The table of counts would look like this:

Threshold6<-c(rep("Beat Threshold", 6), rep("Did Not Beat", 2),
             rep("Beat Threshold", 0), rep("Did Not Beat", 7))
Treatments6<-c(rep("A: Share Prize",8), rep("B: No Share", 7))

prize_table6<-addmargins(table(Threshold6, Treatments6))
prize_table6

##                 Treatments6
## Threshold6       A: Share Prize B: No Share Sum
##   Beat Threshold              6           0   6
##   Did Not Beat                2           7   9
##   Sum                         8           7  15

(factorial(6)*factorial(9)*factorial(8)*factorial(7))/
  (factorial(6)*factorial(0)*factorial(2)*factorial(7)*factorial(15))

## [1] 0.005594406

Add the probabilities of these three tables (the “as” and the two “more extreme” tables of counts) to get the p-value:

0.03357 + 0.00140 + 0.00559 = 0.04056

Therefore, the exact p-value from Fisher’s Exact Test is 0.04056.
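The hand calculation can be checked against R's built-in `fisher.test()` and the hypergeometric probability function `dhyper()`. This is a sketch with illustrative object names. Note that `fisher.test()` decides which tables are “more unusual” by comparing table probabilities rather than distances from the expected count; for this example both rules select the same three tables.

```r
# Observed table of counts (rows: outcome, columns: treatment group)
observed <- matrix(c(1, 5,
                     7, 2),
                   nrow = 2, byrow = TRUE)

# Built-in two-sided Fisher's Exact Test
fisher_p <- fisher.test(observed)$p.value

# Same sum via the hypergeometric distribution: the number of winners in the
# share group when 8 people are drawn from 6 winners and 9 non-winners
manual_p <- sum(dhyper(c(0, 1, 6), m = 6, n = 9, k = 8))

c(fisher = fisher_p, manual = manual_p)
```

Both values round to 0.04056, in agreement with the sum of the three table probabilities.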

When CAN’T the Fisher’s Exact Test be used?

Never, when a p-value is all that is desired. With two categorical variables, each with two categories, Fisher’s Exact Test can always be used to obtain a p-value. However, there may be limitations to what the technology can do, because the factorials in the formula for the probability of a specific table can become very large numbers. Most modern software (including Minitab) can handle such calculations. If such software is not readily available, approximation methods (the twoprop macro and, for large samples, the two sample z procedure introduced later in this lesson) will provide an accurate approximation to the exact p-value.
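To see how quickly the numbers grow, the sketch below shows where `factorial()` stops being representable in R and one common workaround: doing the arithmetic on the log scale with `lfactorial()`. The counts are hypothetical, made up only to produce large factorials; they are not from any study in this lesson.

```r
# factorial() overflows double precision once the argument is large enough
factorial(170)                 # still finite (about 7.26e306)
is.infinite(factorial(171))    # TRUE: too large to store as a double

# Working on the log scale avoids the overflow.
# Hypothetical larger study: 200 people per group, 150 winners overall,
# 60 of them in the first group.
a <- 60; b <- 90; c <- 140; d <- 110
R1 <- a + b; R2 <- c + d; C1 <- a + c; C2 <- b + d; N <- R1 + R2

log_prob <- lfactorial(R1) + lfactorial(R2) + lfactorial(C1) + lfactorial(C2) -
  (lfactorial(a) + lfactorial(b) + lfactorial(c) + lfactorial(d) + lfactorial(N))
exp(log_prob)

# Agrees with the built-in hypergeometric probability
dhyper(a, m = R1, n = R2, k = C1)
```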

However, if a confidence interval for the difference in population proportions is desired, the twoprop macro must be used as the Fisher’s Exact Test does NOT provide a confidence interval for the difference in proportions!

Two Sample Z-Procedures for Proportions

If we have a large enough sample size we can also perform Z-procedures for proportions.

The following conditions must be met to make a large sample confidence interval for $ p_1-p_2 $:

Samples are randomly selected.
Samples are independent.
Sample sizes must be large enough:
- Rule 1: At least 10 successes in each group
- Rule 2: At least 10 failures in each group
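As a quick check of the sample size rules, the sketch below applies them to the Friendly Observers counts from Example 1. The variable names are illustrative.

```r
# Successes (beat threshold) and failures (did not beat) in each group,
# taken from the Friendly Observers table
successes <- c(share = 1, no_share = 5)
failures  <- c(share = 7, no_share = 2)

# Both rules must hold in both groups
large_enough <- all(successes >= 10) && all(failures >= 10)
large_enough
```

None of the four counts reaches 10, so the z-procedures would not be appropriate for that study.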

Hypotheses:

$ H_0: p_1-p_2=0 $ or $ H_0: p_1=p_2 $

$ H_A: p_1-p_2<0 $ or $ H_A: p_1<p_2 $
$ H_A: p_1-p_2>0 $ or $ H_A: p_1>p_2 $
$ H_A: p_1-p_2 \neq 0 $ or $ H_A: p_1 \neq p_2 $

Test statistic: $ z=\frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}} $

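where $ \hat{p}=\frac{x_1+x_2}{n_1+n_2} $ is the pooled sample proportion, and $ x_1 $ and $ x_2 $ are the numbers of successes in samples of sizes $ n_1 $ and $ n_2 $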

Confidence Interval: $ CI = (\hat{p}_1-\hat{p}_2) \pm z^* \times SE_{\hat{p}_1-\hat{p}_2} $

where $ SE_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} $
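The formulas above can be sketched in R as follows. The counts are hypothetical, chosen only so that the sample size rules are satisfied; they do not come from any study in this lesson, and the variable names are illustrative.

```r
# Hypothetical counts chosen only to satisfy the sample size rules
x1 <- 45; n1 <- 100   # successes and sample size, group 1
x2 <- 30; n2 <- 100   # successes and sample size, group 2

p1_hat <- x1 / n1
p2_hat <- x2 / n2
p_hat  <- (x1 + x2) / (n1 + n2)          # pooled proportion

# Test statistic and two-sided p-value
z <- (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval for p1 - p2 (unpooled standard error)
se <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_star <- qnorm(0.975)
ci <- (p1_hat - p2_hat) + c(-1, 1) * z_star * se

z; p_value; ci
```

As a cross-check, `prop.test(c(x1, x2), c(n1, n2), correct = FALSE)` reports a chi-squared statistic equal to the square of this z and the same two-sided p-value.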