1 Loading Libraries

library(expss) # for the cross_cases() command

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file = "Data/eammi2_clean.csv", header = TRUE)

3 State Your Hypothesis

There will be no gender differences in participation across the racial/ethnic categories (in other words, the proportions of men, women, and non-binary participants will be the same in every racial/ethnic category).

(Note: This hypothesis predicts a non-significant result. That is a bit backwards from how we usually do things, where a significant result supports the hypothesis, but it makes sense for the current variables.)
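Stated formally, the prediction is the null hypothesis of the chi-square test of independence: a participant's gender and racial/ethnic category are unrelated, so each cell's probability equals the product of its row and column probabilities. A sketch in standard notation, where π (cell probability), g (gender), and r (racial/ethnic group) are our own labels rather than variables in the dataset:

```latex
H_0:\; \pi_{gr} = \pi_{g\cdot}\,\pi_{\cdot r} \quad \text{for every gender } g \text{ and racial/ethnic group } r
```

Strictly, a non-significant test means we fail to reject H0, which is consistent with the prediction but does not prove it.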

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame':    3166 obs. of  6 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ belong    : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
##  $ efficacy  : num  3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)
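As an alternative sketch, read.csv() can create the factors at import time with stringsAsFactors = TRUE. This converts every character column, including ResponseId, which is usually not wanted for an ID. The inline text= data below is a hypothetical stand-in for the real file.

```r
# hypothetical inline data standing in for Data/eammi2_clean.csv
toy <- read.csv(
  text = "ResponseId,gender,race_rc\nR_1,f,white\nR_2,m,other\nR_3,f,asian",
  header = TRUE,
  stringsAsFactors = TRUE   # convert every character column to a factor
)
str(toy)

# ResponseId was converted too; an ID usually works better as character
toy$ResponseId <- as.character(toy$ResponseId)
levels(toy$gender)   # "f" "m"
```

Converting only the columns you need with as.factor(), as above, avoids the ID problem entirely.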

table(d$gender, useNA = "always")
## 
##    f    m   nb <NA> 
## 2322  791   53    0
table(d$race_rc, useNA = "always")
## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         210         247         286         293          12          97 
##       white        <NA> 
##        2021           0
cross_cases(d, gender, race_rc)
                 race_rc
 gender          asian  black  hispanic  multiracial  nativeamer  other  white
   f               152    182       207          222          11     72   1476
   m                57     63        77           61           1     24    508
   nb                1      2         2           10           0      1     37
   #Total cases    210    247       286          293          12     97   2021

5 Check Your Assumptions

5.1 Chi-square Test Assumptions

  • Data should be frequencies or counts
  • Observations should be independent (each participant is counted in exactly one cell)
  • There are two categorical variables
  • At least 5 participants per cell (strictly, the rule applies to the expected counts)
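The five-per-cell rule is usually stated in terms of expected counts, the counts we would see if gender and race/ethnicity were independent. A minimal sketch of checking them, rebuilding the gender-by-race table by hand from the cross_cases(d, gender, race_rc) output above instead of reading the data file (running chisq.test() on the real variables would give the same expected counts):

```r
# counts copied from the cross_cases(d, gender, race_rc) output
counts <- matrix(
  c(152, 182, 207, 222, 11, 72, 1476,   # f
     57,  63,  77,  61,  1, 24,  508,   # m
      1,   2,   2,  10,  0,  1,   37),  # nb
  nrow = 3, byrow = TRUE,
  dimnames = list(
    gender  = c("f", "m", "nb"),
    race_rc = c("asian", "black", "hispanic", "multiracial",
                "nativeamer", "other", "white")
  )
)

# expected count per cell = row total * column total / N;
# chisq.test() warns when expected counts are small, so the warning is silenced
expected <- suppressWarnings(chisq.test(counts))$expected
round(expected, 1)

# which cells fall below 5?
expected < 5
sum(expected < 5)
```

With all three gender categories, 7 of the 21 cells have expected counts below 5: six of the seven non-binary cells and the cell for Native American men.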

5.2 Issues with My Data

While my data meets the first three assumptions, I don’t have at least 5 participants in all cells. Only 53 participants identified as non-binary or another gender, and in five of the seven racial/ethnic groups there are fewer than five of them. The number of Native American participants is also small (12), and only one of them is a man.

To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice and risks further marginalizing already minoritized groups, but it’s a necessary compromise for my analysis. I will note this issue in my Method write-up and discuss it as a limitation of my study in my Discussion.

# first thing we'll cover is how to drop extra levels from a categorical variable
# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb")
# rather than using subset() to select columns, we use it here as a filter to drop rows
# using the '!=' sign here tells R to filter out the indicated criteria (participants who are marked as 'nb' in the 'gender' column)

# then we check to make sure it looks correct
table(d$gender, useNA = "always")
## 
##    f    m   nb <NA> 
## 2322  791    0    0
# subset() removed the 'nb' rows, but the now-empty 'nb' level is still part of the factor, so it shows as 0; droplevels() removes unused levels
d$gender <- droplevels(d$gender)

# then we check to make sure it looks correct
table(d$gender, useNA = "always")
## 
##    f    m <NA> 
## 2322  791    0
# second thing we'll cover is how to combine categories
# we'll recode our race variable to combine our Native American participants with our other participants

# create a new variable (race_rc2) identical to the current variable (race_rc)
d$race_rc2 <- d$race_rc

# recode Native American participants as 'other' (this works because 'other' is already a level of the factor)
d$race_rc2[d$race_rc == "nativeamer"] <- "other"
table(d$race_rc2, useNA = "always")
## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         209         245         284         283           0         108 
##       white        <NA> 
##        1984           0
# once again, we need to use the droplevels() command
d$race_rc2 <- droplevels(d$race_rc2)
table(d$race_rc2, useNA = "always")
## 
##       asian       black    hispanic multiracial       other       white 
##         209         245         284         283         108        1984 
##        <NA> 
##           0
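Another base-R way to combine categories, sketched on a small hypothetical factor: giving two levels the same name with levels()<- merges them into one, so no droplevels() step is needed afterwards. This is an alternative to the indexing approach above, not what that code does.

```r
# hypothetical example factor
race <- factor(c("white", "nativeamer", "other", "asian", "nativeamer"))
levels(race)   # "asian" "nativeamer" "other" "white"

# renaming 'nativeamer' to 'other' creates a duplicate level name,
# which R resolves by merging the two levels into one
levels(race)[levels(race) == "nativeamer"] <- "other"
levels(race)   # "asian" "other" "white"
table(race)
```

Working on a copy (race_rc2), as the original code does, keeps the uncombined variable available either way.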
# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)
                 race_rc2
 gender          asian  black  hispanic  multiracial  other  white
   f               152    182       207          222     83   1476
   m                57     63        77           61     25    508
   #Total cases    209    245       284          283    108   1984
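Because the hypothesis is about whether the gender split is the same in every racial/ethnic group, column proportions are a useful companion to the raw counts. A minimal sketch with base R's prop.table(), using the final counts copied by hand from the output above (with the real data you would pass table(d$gender, d$race_rc2) instead):

```r
# final counts copied from the cross_cases(d, gender, race_rc2) output
final_counts <- matrix(
  c(152, 182, 207, 222, 83, 1476,   # f
     57,  63,  77,  61, 25,  508),  # m
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender   = c("f", "m"),
    race_rc2 = c("asian", "black", "hispanic", "multiracial", "other", "white")
  )
)

# margin = 2 divides each cell by its column total, so every column sums to 1
round(prop.table(final_counts, margin = 2), 3)
```

Women make up roughly 73% to 78% of each group, which is the kind of similarity the hypothesis predicts.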

6 Run a Chi-square Test

# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)

7 View Test Output

# to view the results of our chi-square test, we just have to call up the output we saved
chi_output
## 
##  Pearson's Chi-squared test
## 
## data:  d$gender and d$race_rc2
## X-squared = 3.3795, df = 5, p-value = 0.6417
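To see where X-squared = 3.3795 and df = 5 come from, here is a minimal sketch that recomputes Pearson's statistic by hand from the final gender-by-race counts (copied from the cross_cases() output; with the real data, table(d$gender, d$race_rc2) gives the same table). chisq.test() applies Yates' continuity correction only to 2 × 2 tables, so the plain formula matches here.

```r
final_counts <- matrix(
  c(152, 182, 207, 222, 83, 1476,   # f
     57,  63,  77,  61, 25,  508),  # m
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender   = c("f", "m"),
    race_rc2 = c("asian", "black", "hispanic", "multiracial", "other", "white")
  )
)

n <- sum(final_counts)
# expected counts under independence: row total * column total / N
expected <- outer(rowSums(final_counts), colSums(final_counts)) / n
# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((final_counts - expected)^2 / expected)
# degrees of freedom: (rows - 1) * (columns - 1) = 1 * 5
df <- (nrow(final_counts) - 1) * (ncol(final_counts) - 1)
# upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
c(chi_sq = chi_sq, df = df, p = p_value)
```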

8 View Standardized Residuals

# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output object that we created
chi_output$stdres
##         d$race_rc2
## d$gender      asian      black   hispanic multiracial      other      white
##        f -0.6405795 -0.1141389 -0.6915650   1.5622533  0.5494410 -0.3317414
##        m  0.6405795  0.1141389  0.6915650  -1.5622533 -0.5494410  0.3317414

9 Write Up Results

To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a chi-square test of independence. Our variables met most of the assumptions of the chi-square test (the data were frequencies, the observations were independent, and there were two categorical variables). However, we had few Native American participants and few non-binary and other gender participants, so we did not meet the requirement of at least five participants per cell. To proceed with this analysis, we dropped the non-binary and other gender participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:

Table 1. Final sample by gender and race/ethnicity (N = 3113)
                 race_rc2
 gender          asian  black  hispanic  multiracial  other  white
   f               152    182       207          222     83   1476
   m                57     63        77           61     25    508

As predicted, we did not find a significant gender difference in participation across the racial/ethnic categories, χ²(5, N = 3113) = 3.38, p = .642.

–alternative–

As predicted, we did not find a significant gender difference in participation across the racial/ethnic categories, X^2(5, N = 3113) = 3.38, p = .642.
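If you report several chi-square tests, it can help to build the results string from the test object instead of retyping numbers. A minimal sketch, assuming APA-style formatting (two decimals for the statistic, three for p with the leading zero dropped); the counts are copied from Table 1, and with the real data you would use the chi_output object saved earlier:

```r
final_counts <- matrix(
  c(152, 182, 207, 222, 83, 1476,   # f
     57,  63,  77,  61, 25,  508),  # m
  nrow = 2, byrow = TRUE
)
chi_output <- chisq.test(final_counts)

# drop the leading zero from p, since p cannot exceed 1
p_text <- sub("^0", "", sprintf("%.3f", chi_output$p.value))
apa <- sprintf("X^2(%d, N = %d) = %.2f, p = %s",
               chi_output$parameter,   # degrees of freedom
               sum(final_counts),      # total sample size
               chi_output$statistic,   # Pearson's chi-square
               p_text)
apa   # "X^2(5, N = 3113) = 3.38, p = .642"
```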