1 Loading Libraries

library(expss) # for the cross_cases() command

## Loading required package: maditr

## 
## Use magrittr pipe '%>%' to chain several operations:
##              mtcars %>%
##                  let(mpg_hp = mpg/hp) %>%
##                  take(mean(mpg_hp), by = am)
##

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file = "Data/eammi2_clean.csv", header = TRUE)

3 State Your Hypothesis

There will be no gender differences in participation across the racial/ethnic categories (in other words, men, women, and non-binary participants will be evenly distributed across the racial/ethnic categories).

(Note: This hypothesis predicts a non-significant result. That is backwards from how we usually do things, where a significant result supports the hypothesis, but it makes sense for the current variables.)

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    3166 obs. of  6 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ belong    : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
##  $ efficacy  : num  3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...

# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)

table(d$gender, useNA = "always")

## 
##    f    m   nb <NA> 
## 2322  791   53    0

table(d$race_rc, useNA = "always")

## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         210         247         286         293          12          97 
##       white        <NA> 
##        2021           0

cross_cases(d, gender, race_rc)

                race_rc
gender          asian  black  hispanic  multiracial  nativeamer  other  white
f                 152    182       207          222          11     72   1476
m                  57     63        77           61           1     24    508
nb                  1      2         2           10                  1     37
#Total cases      210    247       286          293          12     97   2021
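If expss is not available, a similar table can be sketched in base R with table() and addmargins(). This is a minimal sketch: the toy data frame and its values are made up for illustration and are not the course dataset.

```r
# toy data (hypothetical values, not the course dataset)
toy <- data.frame(
  gender  = factor(c("f", "f", "m", "nb", "f", "m")),
  race_rc = factor(c("white", "asian", "white", "white", "black", "asian"))
)

# rows are gender, columns are race_rc
tab <- table(toy$gender, toy$race_rc, dnn = c("gender", "race_rc"))

# add a row of column totals, similar to the '#Total cases' row from cross_cases()
tab_totals <- addmargins(tab, margin = 1)
tab_totals
```

Unlike cross_cases(), table() prints empty cells as 0 rather than leaving them blank.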

5 Check Your Assumptions

5.1 Chi-square Test Assumptions

Data should be frequencies or counts
Variables and levels should be independent
There are two variables
At least 5 participants per cell

5.2 Issues with My Data

While my data meet the first three assumptions, I don’t have at least 5 participants in all cells. The number of non-binary or other gender participants is pretty small, and for some of the racial/ethnic groups it is fewer than five. The number of Native American participants is also small, and there is only one man from that group.

To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice, and has the potential to further marginalize already minoritized groups, but it’s a necessary compromise for my analysis. I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.
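Rather than scanning the table by eye, the small cells can also be listed with a few lines of base R. The sketch below re-enters the counts from the cross_cases() table above as a matrix; the object names (counts, small_cells) are only illustrative, and the blank nb/nativeamer cell is entered as 0, which matches the column total of 12.

```r
# observed counts copied from the cross_cases() table above
# (the blank nb/nativeamer cell is entered as 0)
counts <- matrix(
  c(152, 182, 207, 222, 11, 72, 1476,
     57,  63,  77,  61,  1, 24,  508,
      1,   2,   2,  10,  0,  1,   37),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    gender  = c("f", "m", "nb"),
    race_rc = c("asian", "black", "hispanic", "multiracial",
                "nativeamer", "other", "white")
  )
)

# list the row and column position of every cell with fewer than 5 participants
small_cells <- which(counts < 5, arr.ind = TRUE)
small_cells
```

Every cell this flags is in either the nb row or the nativeamer column, which is why the plan above targets those two groups.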

# first thing we'll cover is how to drop extra levels from a categorical variable
# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb")
# here we use the subset command to filter out rows rather than to select columns
# the '!=' operator means 'not equal to', so R keeps only the rows that do not match the indicated criterion (participants who are marked as 'nb' in the 'gender' column)

# then we check to make sure it looks correct
table(d$gender, useNA = "always")

## 
##    f    m   nb <NA> 
## 2322  791    0    0

# once we've dropped a level from our factor, we need to use the droplevels() command to remove it, or it will still show as 0
d$gender <- droplevels(d$gender)

# then we check to make sure it looks correct
table(d$gender, useNA = "always")

## 
##    f    m <NA> 
## 2322  791    0
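The two steps above (filtering, then dropping the empty level) can also be combined. This is a sketch on a made-up toy data frame, not the course dataset; note that calling droplevels() on a whole data frame drops unused levels from every factor column, not only gender.

```r
# toy data (hypothetical values, not the course dataset)
toy <- data.frame(
  gender = factor(c("f", "m", "nb", "f", "m")),
  score  = c(1, 2, 3, 4, 5)
)

# filter the rows, then drop unused levels from every factor column in one step
toy2 <- droplevels(subset(toy, gender != "nb"))

# 'nb' no longer appears as a level
levels(toy2$gender)
table(toy2$gender, useNA = "always")
```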

# second thing we'll cover is how to combine categories
# we'll recode our race variable to combine our native american participants with our other participants

# create a new variable (race_rc2) identical to the current variable (race_rc)
d$race_rc2 <- d$race_rc

# we will use some of our previous code to recode our Native American participants
d$race_rc2[d$race_rc == "nativeamer"] <- "other"
table(d$race_rc2, useNA = "always")

## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         209         245         284         283           0         108 
##       white        <NA> 
##        1984           0

# once again, we need to use the droplevels() command
d$race_rc2 <- droplevels(d$race_rc2)
table(d$race_rc2, useNA = "always")

## 
##       asian       black    hispanic multiracial       other       white 
##         209         245         284         283         108        1984 
##        <NA> 
##           0
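Another way to combine categories is to rename the level itself. In base R, renaming a factor level to a name that already exists merges the two levels, so no droplevels() call is needed afterwards. The toy factor below is made up for illustration.

```r
# toy factor (hypothetical values, not the course dataset)
race <- factor(c("white", "nativeamer", "other", "asian", "nativeamer"))

# copy the factor so the original stays unchanged
race2 <- race

# renaming a level to a name that already exists merges the two levels
levels(race2)[levels(race2) == "nativeamer"] <- "other"

table(race2, useNA = "always")
```

Either approach should give the same counts; the indexing approach used above only works because 'other' is already a level of the factor.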

# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)

                race_rc2
gender          asian  black  hispanic  multiracial  other  white
f                 152    182       207          222     83   1476
m                  57     63        77           61     25    508
#Total cases      209    245       284          283    108   1984

6 Run a Chi-square Test

# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)

7 View Test Output

# to view the results of our chi-square test, we just have to call up the output we saved
chi_output

## 
##  Pearson's Chi-squared test
## 
## data:  d$gender and d$race_rc2
## X-squared = 3.3795, df = 5, p-value = 0.6417
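To see where these numbers come from, the statistic can be rebuilt by hand from the final cross_cases() table. This is a sketch for illustration only: the object names are made up, and the counts are re-entered as a matrix instead of being read from d.

```r
# observed counts copied from the final cross_cases() table
obs <- matrix(
  c(152, 182, 207, 222, 83, 1476,
     57,  63,  77,  61, 25,  508),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender   = c("f", "m"),
    race_rc2 = c("asian", "black", "hispanic", "multiracial", "other", "white")
  )
)

# expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# chi-square statistic = sum of (observed - expected)^2 / expected
x2 <- sum((obs - expected)^2 / expected)

# degrees of freedom = (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# p-value from the upper tail of the chi-square distribution
p <- pchisq(x2, df = df, lower.tail = FALSE)

round(c(X2 = x2, df = df, p = p), 4)
```

The rounded values should agree with the chisq.test() output above (X-squared = 3.3795, df = 5, p-value = 0.6417).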

8 View Standardized Residuals

# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output object that we created
chi_output$stdres

##         d$race_rc2
## d$gender      asian      black   hispanic multiracial      other      white
##        f -0.6405795 -0.1141389 -0.6915650   1.5622533  0.5494410 -0.3317414
##        m  0.6405795  0.1141389  0.6915650  -1.5622533 -0.5494410  0.3317414
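One way to read these residuals is to flag any cell whose absolute value is above a cutoff. The cutoff of 1.96 used below is a common rule of thumb and an assumption of this sketch, not something specified in this lab; the counts are re-entered from the final cross_cases() table.

```r
# observed counts copied from the final cross_cases() table
obs <- matrix(
  c(152, 182, 207, 222, 83, 1476,
     57,  63,  77,  61, 25,  508),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender   = c("f", "m"),
    race_rc2 = c("asian", "black", "hispanic", "multiracial", "other", "white")
  )
)

# standardized residuals from the chi-square test
res <- chisq.test(obs)$stdres

# assumed cutoff (rule of thumb, not from the lab instructions)
cutoff <- 1.96

# positions of any cells whose residual is beyond the cutoff
flagged <- which(abs(res) > cutoff, arr.ind = TRUE)
nrow(flagged)
```

With these counts no cell is flagged, which is consistent with the non-significant overall test.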

9 Write Up Results

To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a chi-square test of independence. Our variables met most of the criteria for running a chi-square test (the data were frequencies, the variables were independent, and there were two variables). However, we had a low number of Native American participants and of non-binary and other gender participants, so we did not meet the criterion of at least five participants per cell. To proceed with this analysis, we dropped the non-binary and other gender participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:

                race_rc2
gender          asian  black  hispanic  multiracial  other  white
f                 152    182       207          222     83   1476
m                  57     63        77           61     25    508

As predicted, we did not find a gender difference in participation across the racial/ethnic categories, χ²(5, N = 3113) = 3.38, p = .642.

–alternative–

As predicted, we did not find a gender difference in participation across the racial/ethnic categories, X^2(5, N = 3113) = 3.38, p = .642.
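To avoid copying numbers by hand, the statistics string can be assembled from the test object. This is a sketch under stated assumptions: the counts are re-entered from Table 1, and the formatting pattern simply mirrors the plain-text sentence above.

```r
# observed counts copied from Table 1
obs <- matrix(
  c(152, 182, 207, 222, 83, 1476,
     57,  63,  77,  61, 25,  508),
  nrow = 2, byrow = TRUE
)

chi <- chisq.test(obs)

# format p to three decimals and strip the leading zero
p_txt <- sub("^0", "", sprintf("%.3f", chi$p.value))

# df, N, statistic, and p in the same layout as the write-up
result <- sprintf("X^2(%d, N = %d) = %.2f, p = %s",
                  as.integer(chi$parameter), as.integer(sum(obs)),
                  chi$statistic, p_txt)
result
```

For very small p-values this pattern would print ".000", so that case would need separate handling (for example, reporting p < .001).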

Running a Chi-square Test of Independence

Heather Perkins

2023-05-22