ANTH 504 – Chi-square Lab

There are 4 things we want to know for each statistical test:

  1. What types of variables are needed for this type of test?

  2. What are the null and alternative hypotheses?

  3. How do we conduct the test?

  4. What are the assumptions? How do we test them? What do we do if the assumptions are not met?

In this lab, we’ll go through these steps for the chi-square test.

Variables

1)

Based on what we discussed in lecture, what types of variables do we need to conduct a chi-square test?

Categorical variables.

Null and Alternative Hypotheses

2)

Recall: What are the null and alternative hypotheses for the chi-square test?

Ho: p1 = p2 (proportions are equal)

Ha: p1 ≠ p2 (proportions are unequal)

Conducting the Chi-Square test

To conduct our chi-square test, we will use data from the dataset “High sex ratios in rural China: declining well-being with age in never-married men”. This dataset has information on partnership and education for men aged 20-29 in rural China. 691 men were included in the sample: 351 unpartnered and 340 partnered. The distribution of education is as follows:

  Middle or high school: 156 unpartnered, 204 partnered
  College or higher: 195 unpartnered, 136 partnered

3)

Create a contingency table of the data above:

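# Rows are education levels; columns are partnership status
cont_table <- data.frame(Partnered = c(204, 136), Unpartnered = c(156, 195),
                         row.names = c("Middle or high school", "College or higher"))
cont_table
##                       Partnered Unpartnered
## Middle or high school       204         156
## College or higher           136         195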
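
The null hypothesis compares the proportion of partnered men in each education group. A minimal sketch of those two proportions, using the counts from the table; the object names (`counts`, `row_props`) are illustrative and not from the lecture notes:

```r
# Observed counts; rows are education levels, columns are partnership status.
counts <- matrix(c(204, 136, 156, 195), nrow = 2,
                 dimnames = list(Education = c("Middle/high school", "College or higher"),
                                 Status = c("Partnered", "Unpartnered")))

# Proportion of each education group (row) that falls in each status (column).
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

About 56.7% of men with a middle or high school education were partnered, compared with about 41.1% of men with college or higher. The chi-square test asks whether a gap of this size is plausible if the true proportions are equal.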

Our data are already entered as counts in cont_table, so there is no separate file to import.

There are two assumptions of the chi-square test:

  1. The data are independent

  2. The expected frequency is greater than 5 in each cell.

4)

Are the data independent?

Yes. Each individual is either partnered or unpartnered, and each has either a middle or high school education or a college-level education. No individual is, or could be, counted in more than one cell.

5)

Are the expected frequencies > 5 for each cell? How can you determine this?

Yes. From the contingency table, the expected value for each cell is (row total × column total) / grand total. All four expected values are well above 100.
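
The expected-value rule (row total times column total, divided by the grand total) can be sketched in R as follows. This is one possible way to do it, not the method from the lecture notes, and the names `observed` and `expected` are illustrative:

```r
# Observed counts; rows are education levels, columns are partnership status.
observed <- matrix(c(204, 136, 156, 195), nrow = 2,
                   dimnames = list(c("Middle/high school", "College or higher"),
                                   c("Partnered", "Unpartnered")))

# Expected count for each cell = row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# The assumption holds if every expected count is greater than 5.
all(expected > 5)
```

All four expected counts fall between about 163 and 183, so the assumption is comfortably met.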

[Figure: chi-squared test in Excel]

To conduct the chi-square test, we can use the R function chisq.test(). Recall that you can either store the contingency table in a separate variable and then pass it to chisq.test(), or you can do it all in one line of code (from our lecture notes):

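# Degrees of freedom for a contingency table: (columns - 1) * (rows - 1)
n_cols <- 2
n_rows <- 2
df <- (n_cols - 1) * (n_rows - 1)
df
## [1] 1
# Chi-square test on the observed counts, entered in one line
chisq.test(data.frame(c(204,136),c(156,195)))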
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data.frame(c(204, 136), c(156, 195))
## X-squared = 16.128, df = 1, p-value = 5.92e-05
# Critical value of the chi-square distribution at alpha = 0.05
qchisq(0.95, df = df)
## [1] 3.841459
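
To see where the X-squared value and p-value come from, here is a hedged sketch that recomputes them by hand. It assumes the simple form of Yates' correction (subtracting 0.5 from each absolute difference), which matches what chisq.test() does when every difference exceeds 0.5, as it does here. The variable names are illustrative:

```r
# Observed counts; rows are education levels, columns are partnership status.
observed <- matrix(c(204, 136, 156, 195), nrow = 2)

# Expected counts under the null hypothesis of equal proportions.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates' correction subtracts 0.5 from each |observed - expected| before squaring.
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Upper-tail probability of the chi-square distribution with 1 degree of freedom.
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

c(x_squared = x_squared, p_value = p_value)
```

The statistic is compared with the critical value from qchisq(): a statistic above 3.84 corresponds to a p-value below 0.05.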

6)

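At this point, can you determine if the data meet the assumptions of our test? Why or why not?

Yes. R did not give us a warning. If any expected frequencies were below 5, R would warn that the chi-squared approximation may be incorrect. The Yates’ continuity correction in the output is not a sign of small expected frequencies: chisq.test() applies it by default to every 2 × 2 table.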
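
A minimal sketch of how R signals small expected counts, using a small 2 x 2 table whose counts were chosen only for illustration. The names `small_table`, `lab_table`, `warning_text`, and `lab_warning` are not from the lecture notes. The warning is captured with tryCatch() so it can be inspected:

```r
# Illustrative table with small counts; every expected count is 2.
small_table <- matrix(c(3, 1, 1, 3), nrow = 2)

# Run the test and capture the warning text, or NA if no warning is raised.
warning_text <- tryCatch(
  {
    chisq.test(small_table)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
warning_text

# The lab's table has large expected counts, so the same call raises no warning.
lab_table <- matrix(c(204, 136, 156, 195), nrow = 2)
lab_warning <- tryCatch(
  {
    chisq.test(lab_table)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
is.na(lab_warning)
```

Both calls use Yates' continuity correction, because chisq.test() applies it to 2 x 2 tables by default (correct = TRUE). Only the small table triggers a warning.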

7)

What is the p-value? What would you conclude? Is there a significant difference between the proportion of men who are partnered or not based on completed education?

p = 5.92e-05. The test statistic (16.13) is larger than the critical value (3.84), and the p-value is less than 0.05. Therefore, we have strong evidence to reject the null hypothesis in favor of the alternative that the proportions are different.

ODDS RATIO

Odds ratios allow us to examine the strength of the relationship between two categorical variables, but an odds ratio can only compare four cells (it can’t handle larger contingency tables). So, we’ll need to decide which variables we would like to compare in this example. Let’s examine the relationship between being partnered and attaining higher education vs. middle/high school.

To calculate the odds ratio, we need to calculate: odds (partnered after middle/high school) / odds (partnered after higher education). These odds can be calculated as follows:

  Odds (partnered after higher education) = Number that have higher education and are partnered / Number that have higher education and are not partnered.

  Odds (partnered after middle/high school) = Number that have middle/high school and are partnered / Number that have middle/high school but aren’t partnered.

Note: An odds ratio is easier to interpret if you divide the larger odds by the smaller.

8)

Please calculate the odds ratio for this example.

# Observed
part_mid_hs <- 204
part_college <- 136
unpart_mid_hs <- 156
unpart_college <- 195
# Odds (partnered after higher education) = Number that have higher education and are partnered / Number that have higher education and are not partnered
odds_part_after_college <- part_college / unpart_college
odds_part_after_college
## [1] 0.6974359
# Odds (partnered after middle/high school) = Number that have middle/high school and are partnered / Number that have middle/high school but aren’t partnered.
odds_part_precollege <- part_mid_hs / unpart_mid_hs
odds_part_precollege
## [1] 1.307692
# Odds ratio: larger odds / smaller odds
partnered_secondary_per_partnered_tertiary <- odds_part_precollege / odds_part_after_college
partnered_secondary_per_partnered_tertiary
## [1] 1.875
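
As a cross-check, the same odds ratio can be obtained from the cross-product of the four cells. The sketch below is illustrative; `or_from_odds` and `or_cross_product` are names introduced here, not from the lecture notes:

```r
# Observed counts from the 2 x 2 table.
part_mid_hs    <- 204  # partnered, middle/high school
unpart_mid_hs  <- 156  # unpartnered, middle/high school
part_college   <- 136  # partnered, college or higher
unpart_college <- 195  # unpartnered, college or higher

# Ratio of the two odds, as computed step by step above.
or_from_odds <- (part_mid_hs / unpart_mid_hs) / (part_college / unpart_college)

# Equivalent cross-product form: (a * d) / (b * c).
or_cross_product <- (part_mid_hs * unpart_college) / (unpart_mid_hs * part_college)

c(or_from_odds, or_cross_product)

# Swapping numerator and denominator gives the reciprocal odds ratio.
1 / or_cross_product
```

Both forms give 1.875. The reciprocal (about 0.53) describes the same relationship from the other direction, which is why dividing the larger odds by the smaller only changes how the result is worded.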

REPORTING THE RESULTS

Review the slide on communicating results:

From lecture notes:

  1. Start with some descriptive statistics.

  2. The description tells you what the null hypothesis being tested is

  3. A “stat” block is included

  4. The results are interpreted

9)

Write a 4-sentence paragraph describing the results of your chi-square test.

Of the 691 individuals in this survey, 360 had a highest level of education of middle or high school, while 331 had at least entered college. A chi-square test for association with Yates’ continuity correction was conducted to test whether there was an association between education level and being partnered. Results show a significant association between education level and partnership status, χ2(1, N = 691) = 16.13, p < 0.001. Based on the odds ratio, the odds of being partnered were 1.88 times higher for men with a middle or high school education than for men with at least some college education.

FISHER’S EXACT TEST

In this case, we did not have to run Fisher’s Exact Test, but let’s examine an example where that might be necessary.

Lady tea tasting: Some people argue that they can tell (by taste alone) whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed. Eight cups of tea were prepared and presented in random order. Here are the results:

cont_table <- data.frame(Tea=c(3,1), Milk=c(1,3))
cont_table
##   Tea Milk
## 1   3    1
## 2   1    3

Every expected count in this table is (4 × 4) / 8 = 2, which is below 5, so the chi-square test’s assumption is not met. In this case, the null hypothesis is that the proportions are equal (meaning the person cannot tell which was really poured first). Evidence that the proportions are not equal is evidence that the person can tell which was poured first. Run a Fisher’s exact test with fisher.test() to determine if there is evidence that this person can tell by taste alone which was poured first.

fisher.test(data.frame(c(3,1),c(1,3)))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data.frame(c(3, 1), c(1, 3))
## p-value = 0.4857
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    0.2117329 621.9337505
## sample estimates:
## odds ratio 
##   6.408309

10)

Is Fisher’s exact test significant? What would you conclude?

The p-value of 0.4857 is not significant (p > 0.05).

The results do not show a significant association between which the taster believed was poured first and which was actually poured first. We fail to reject the null hypothesis, as the evidence does not support that the proportions are unequal.
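
To show where the p-value of 0.4857 comes from, here is a hedged sketch that reproduces it from the hypergeometric distribution. It assumes the two-sided rule of summing the probabilities of all tables that are no more likely than the observed one. The variable names are illustrative:

```r
# Both row totals and both column totals of the tea-tasting table equal 4.
# With fixed margins, the top-left count follows a hypergeometric distribution.
possible_counts <- 0:4
table_probs <- dhyper(possible_counts, m = 4, n = 4, k = 4)

# Probability of the observed table (top-left count = 3).
p_observed <- dhyper(3, m = 4, n = 4, k = 4)

# Two-sided p-value: total probability of tables no more likely than the observed one.
# The small tolerance guards against floating-point ties.
p_value <- sum(table_probs[table_probs <= p_observed * (1 + 1e-7)])
p_value
```

The five possible tables have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70. Every table except the middle one is no more likely than the observed table, so the p-value is 34/70, or about 0.4857.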

NO INDEPENDENCE

11)

Our other assumption is that our data are independent. What would you do if the data were not independent? (Hint: Check your text for the answer)

Do not use the chi-square test. Consider McNemar’s test instead, which is designed for paired categorical data.
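
A minimal sketch of McNemar's test in R, using mcnemar.test() from the stats package. The data here are HYPOTHETICAL and invented purely for illustration; they are not part of this lab's dataset:

```r
# HYPOTHETICAL paired data: the same 50 people answer yes/no before and after
# some event. These counts are invented for illustration only.
paired_counts <- matrix(c(20, 15, 5, 10), nrow = 2,
                        dimnames = list(Before = c("Yes", "No"),
                                        After = c("Yes", "No")))

# McNemar's test uses only the discordant (off-diagonal) cells.
result <- mcnemar.test(paired_counts)
result$statistic
result$p.value
```

Because each person appears in both the "before" and "after" measurement, the observations are paired, and the test looks only at the people whose answer changed.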

HW #6: Chi-square test

We will use data that reports whether individuals who work for U of I and those that work for Boise State scored a goal in a soccer match (note: this data is fictional). The contingency table is as follows:

                 U of I   Boise State
  Scored              5            23
  Did not score      19            30

  1. What is the proportion of individuals who scored from Boise State? 23 out of 53 individuals (0.43)

     What is the proportion of individuals who scored from U of I? 5 out of 24 individuals (0.21)

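# Degrees of freedom for a contingency table: (columns - 1) * (rows - 1)
n_cols <- 2
n_rows <- 2
df <- (n_cols - 1) * (n_rows - 1)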
df
## [1] 1
  2. Please enter this data into R (there is no data file) and run the appropriate test on this data to determine if the proportion of Boise State employees who scored goals is significantly different from the proportion of U of I employees who scored goals.
cont_table <- data.frame(UofI=c(5,19), BSU=c(23,30))
cont_table 
##   UofI BSU
## 1    5  23
## 2   19  30
chisq.test(cont_table) 
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  cont_table
## X-squared = 2.7246, df = 1, p-value = 0.09881
chisq.test(data.frame(c(5,19),c(23,30)))
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data.frame(c(5, 19), c(23, 30))
## X-squared = 2.7246, df = 1, p-value = 0.09881
qchisq(0.95, df = 1)
## [1] 3.841459

The result above includes the Yates’ continuity correction, which R applies by default to 2 × 2 tables. Without the correction:

chisq.test(cont_table, correct=FALSE) 
## 
##  Pearson's Chi-squared test
## 
## data:  cont_table
## X-squared = 3.6342, df = 1, p-value = 0.0566
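
Before interpreting the test, the expected counts can be read directly from the object that chisq.test() returns. A short sketch, with `hw_table` and `hw_expected` as illustrative names:

```r
# Homework counts: rows are scored / did not score, columns are universities.
hw_table <- matrix(c(5, 19, 23, 30), nrow = 2,
                   dimnames = list(c("Scored", "Did not score"),
                                   c("UofI", "BSU")))

# chisq.test() returns the expected counts it used in the `expected` component.
hw_expected <- chisq.test(hw_table)$expected
round(hw_expected, 1)

# Smallest expected count; Fisher's exact test is worth considering if this is below 5.
min(hw_expected)
```

The smallest expected count is for U of I employees who scored: (28 × 24) / 77, or about 8.7.
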
  3. Is there a significant difference between the proportions? No. p-value = 0.09881 (p > 0.05). Therefore, we fail to reject the null hypothesis that there is no association between university and whether the individual scored or not.

  4. Are any of the expected values below 5?
    No. The lowest expected value is 8.7.
    [Figure: BSU vs UofI expected-value table]

  5. Do we need to run a Fisher’s exact test? No, because all expected values are above 5.

  6. Determine the odds ratio and interpret it.

# Observed
UofI_score <- 5
UofI_noscore <- 19
BSU_score <- 23
BSU_noscore <- 30
# Odds (UofI) = Number scored and at UofI / Number not scored and at UofI
odds_UofI_score_noscore <- UofI_score / UofI_noscore
odds_UofI_score_noscore
## [1] 0.2631579
# Odds (BSU) = Number scored and at BSU / Number not scored and at BSU
odds_BSU_score_noscore <- BSU_score / BSU_noscore
odds_BSU_score_noscore
## [1] 0.7666667
odds_BSU_score_noscore/odds_UofI_score_noscore
## [1] 2.913333
  7. Write a communication block to communicate the full findings of this problem.

     Of the 77 individuals recorded in our data, 28 scored, while 49 did not score. A chi-square test for association with Yates’ continuity correction was conducted to test whether there was an association between university and whether an individual scored. Results show no significant association between university and whether the individual scored, χ2(1, N = 77) = 2.72, p = 0.099. Based on the odds ratio, the odds of scoring were 2.91 times higher for individuals from BSU than for individuals from UofI.