1 Loading Libraries

library(expss) # for the cross_cases() command

## Loading required package: maditr

## 
## To modify variables or add new variables:
##              let(mtcars, new_var = 42, new_var2 = new_var*hp) %>% head()

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file = "eammi2_clean.csv", header = TRUE) # spell out TRUE; the shorthand T can be overwritten by accident

3 State Your Hypothesis

There will be no gender differences in participation across the racial/ethnic categories (in other words, men, women, and non-binary participants will be evenly distributed across the racial/ethnic categories).

(Note: This hypothesis predicts a non-significant result. That is backwards from how we usually do things, where a significant result supports the hypothesis, but it makes sense for the current variables.)

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    3166 obs. of  6 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ belong    : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
##  $ efficacy  : num  3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...

# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)

table(d$gender, useNA = "always")

## 
##    f    m   nb <NA> 
## 2322  791   53    0

table(d$race_rc, useNA = "always")

## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         210         247         286         293          12          97 
##       white        <NA> 
##        2021           0

cross_cases(d,gender,race_rc)

	race_rc
	asian	black	hispanic	multiracial	nativeamer	other	white
gender
f	152	182	207	222	11	72	1476
m	57	63	77	61	1	24	508
nb	1	2	2	10		1	37
#Total cases	210	247	286	293	12	97	2021

5 Check Your Assumptions

5.1 Chi-square Test Assumptions

Data should be frequencies or counts
Variables and levels should be independent
There are two variables
At least 5 participants per cell

5.2 Issues with My Data

While my data meets the first three assumptions, I don’t have at least 5 participants in all cells. The number of non-binary or other gender participants is pretty small, and for some of the racial/ethnic groups it is less than five. The number of Native American participants is also small, and there is only one man from that group.
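As a rough sketch of how the small cells described above could be located with code, the block below retypes the counts from the cross_cases() output into a matrix and flags every cell with fewer than five participants. The object names (counts, low_cells, n_low, expected) are my own, and the blank non-binary/Native American cell is entered as 0, which is an assumption based on the column total of 12. It also computes the expected counts under independence, which is the quantity many textbooks apply the "at least 5" rule to.

```r
# counts retyped from the cross_cases() output; the blank cell is assumed to be 0
counts <- matrix(c(152, 182, 207, 222, 11, 72, 1476,
                    57,  63,  77,  61,  1, 24,  508,
                     1,   2,   2,  10,  0,  1,   37),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(gender = c("f", "m", "nb"),
                                 race_rc = c("asian", "black", "hispanic", "multiracial",
                                             "nativeamer", "other", "white")))

# row and column positions of every observed cell with fewer than 5 participants
low_cells <- which(counts < 5, arr.ind = TRUE)
n_low <- sum(counts < 5)
low_cells

# expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 1)
```

Every flagged cell sits in either the nb row or the nativeamer column, which is why those two groups are the ones adjusted in the next step.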

To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice, and has the potential to further marginalize already minoritized groups, but it’s a necessary compromise for my analysis. I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.

# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb") # '!=' means 'not equal to', so this keeps every row except the non-binary participants
# once we've dropped a level from our factor, we need to use the droplevels() command to remove it, or it will still show as 0
d$gender <- droplevels(d$gender)

table(d$gender, useNA = "always")

## 
##    f    m <NA> 
## 2322  791    0
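To see why the droplevels() step matters, here is a minimal sketch using a small made-up data frame (toy is hypothetical and is not the real dataset). After subsetting, the factor still remembers the "nb" level until we drop it.

```r
# toy data for illustration only
toy <- data.frame(gender = factor(c("f", "m", "nb", "f", "m")))

# drop the non-binary rows, as in the code above
toy <- subset(toy, gender != "nb")
levels_before <- nlevels(toy$gender) # the unused "nb" level is still stored
table(toy$gender)                    # nb is listed with a count of 0

# remove the unused level
toy$gender <- droplevels(toy$gender)
levels_after <- nlevels(toy$gender)
table(toy$gender)                    # only f and m remain
```

Leaving the empty level in place would add an all-zero row to the table used for the chi-square test, so it is worth removing before moving on.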

# we'll recode our race variable to combine our native american participants with our other participants
d$race_rc2 <- d$race_rc # create a new variable (race_rc2) identical to the current variable (race_rc)
d$race_rc2[d$race_rc == "nativeamer"] <- "other" # we will use some of our previous code to recode our Native American participants
d$race_rc2 <- droplevels(d$race_rc2) # once again, we need to use the droplevels() command

table(d$race_rc2, useNA = "always")

## 
##       asian       black    hispanic multiracial       other       white 
##         209         245         284         283         108        1984 
##        <NA> 
##           0
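The recode above works because "other" is already one of the levels of the factor. The sketch below shows the same steps on a made-up vector (race and race2 here are toy objects, not the real data). If the target category were not already a level, R would insert NA values instead, so it is worth checking for missing values afterwards.

```r
# toy data for illustration only
race <- factor(c("white", "nativeamer", "other", "asian", "nativeamer"))

race2 <- race                           # copy the original variable
race2[race == "nativeamer"] <- "other"  # allowed because "other" is an existing level
race2 <- droplevels(race2)              # remove the now-empty "nativeamer" level

levels(race2)
table(race2, useNA = "always")          # confirm no NA values were introduced
```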

# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)

	race_rc2
	asian	black	hispanic	multiracial	other	white
gender
f	152	182	207	222	83	1476
m	57	63	77	61	25	508
#Total cases	209	245	284	283	108	1984

6 Run a Chi-square Test

# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)

7 View Test Output

# to view the results of our chi-square test, we just have to call up the output we saved
chi_output

## 
##  Pearson's Chi-squared test
## 
## data:  d$gender and d$race_rc2
## X-squared = 3.3795, df = 5, p-value = 0.6417
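To make it clearer what chisq.test() is doing, the following sketch recomputes the statistic by hand from the cell counts in the final cross_cases() table. The counts are retyped into a matrix, and the object names (counts, expected, chi_sq, df, p_value) are my own choices rather than anything produced by the code above.

```r
# final cell counts retyped from the cross_cases() output
counts <- matrix(c(152, 182, 207, 222, 83, 1476,
                    57,  63,  77,  61, 25,  508),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("f", "m"),
                                 race_rc2 = c("asian", "black", "hispanic",
                                              "multiracial", "other", "white")))

# expected counts if gender and race/ethnicity were unrelated
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq <- sum((counts - expected)^2 / expected)

# degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# p-value from the upper tail of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4)
```

These values should agree with the X-squared, df, and p-value printed above, and chisq.test(counts) run on the matrix gives the same result.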

8 View Standardized Residuals

# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output object that we created
chi_output$stdres

##         d$race_rc2
## d$gender      asian      black   hispanic multiracial      other      white
##        f -0.6405795 -0.1141389 -0.6915650   1.5622533  0.5494410 -0.3317414
##        m  0.6405795  0.1141389  0.6915650  -1.5622533 -0.5494410  0.3317414
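For readers who want to know where these numbers come from, here is a sketch that rebuilds the standardized residuals by hand. Each residual is the difference between the observed and expected count, divided by an estimate of its standard error. The counts are retyped from the final cross_cases() table and the object names are my own.

```r
# final cell counts retyped from the cross_cases() output
counts <- matrix(c(152, 182, 207, 222, 83, 1476,
                    57,  63,  77,  61, 25,  508),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("f", "m"),
                                 race_rc2 = c("asian", "black", "hispanic",
                                              "multiracial", "other", "white")))

n <- sum(counts)
row_prop <- rowSums(counts) / n  # share of the sample in each gender
col_prop <- colSums(counts) / n  # share of the sample in each racial/ethnic group
expected <- outer(rowSums(counts), colSums(counts)) / n

# standardized residual: (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share))
std_res <- (counts - expected) / sqrt(expected * outer(1 - row_prop, 1 - col_prop))
round(std_res, 4)
```

Because there are only two gender categories, the two rows are mirror images of each other: any cell where women are over-represented is a cell where men are under-represented by the same standardized amount.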

9 Write Up Results

# the apply_labels() command can make our levels long and difficult to work with, so we'll only set them at the end
d <- apply_labels (d,
                   gender = "Gender",
                   gender = c("Women" = "f", "Men" = "m"),
                   race_rc2 = "Race/Ethnicity",
                   race_rc2 = c("Asian" = "asian",
                                "Black/African American" = "black",
                                "Hispanic, Latino, or Latina" = "hispanic",
                                "Multiracial" = "multiracial",
                                "Small Groups Combined" = "other",
                                "White" = "white"))

## Warning in set_val_lab.default(data[[curr_name]], curr_lab): You are trying to
## put value labels on factor. It can lead to unexpected results. Factor will be
## converted to character.

## Warning in set_val_lab.default(data[[curr_name]], curr_lab): You are trying to
## put value labels on factor. It can lead to unexpected results. Factor will be
## converted to character.

To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a chi-square test of independence. Our variables met most of the criteria for running a chi-square test of independence (the data were frequencies, the variables were independent, and there were two variables). However, we had a low number of Native American participants and of non-binary and other gender participants, and did not meet the criterion of at least five participants per cell. To proceed with this analysis, we dropped the non-binary and other gender participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:

	Race/Ethnicity
	Asian	Black/African American	Hispanic, Latino, or Latina	Multiracial	Small Groups Combined	White
Gender
Women	152	182	207	222	83	1476
Men	57	63	77	61	25	508

As predicted, we did not find a gender difference in participation across the racial/ethnic categories, χ²(5, N = 3113) = 3.38, p = .642.

–alternative–

As predicted, we did not find a gender difference in participation across the racial/ethnic categories, X^2(5, N = 3113) = 3.38, p = .642.
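If you would rather build the statistics string from the test output than type the numbers by hand, something like the sketch below could work. It reruns the test on the counts from Table 1 so that it stands alone. The names res, p_txt, and report are my own, and the formatting choices (two decimals for the statistic, three for p, no leading zero) are assumptions about the reporting style rather than a requirement of chisq.test().

```r
# final cell counts retyped from Table 1
counts <- matrix(c(152, 182, 207, 222, 83, 1476,
                    57,  63,  77,  61, 25,  508),
                 nrow = 2, byrow = TRUE)

res <- chisq.test(counts)

# p-value to three decimals, with the leading zero removed
p_txt <- sub("^0", "", sprintf("%.3f", res$p.value))

# assemble the string: statistic(df, N = sample size) = value, p = value
report <- sprintf("X^2(%d, N = %d) = %.2f, p = %s",
                  as.integer(res$parameter), # degrees of freedom
                  as.integer(sum(counts)),   # total sample size
                  res$statistic,             # chi-square statistic
                  p_txt)
report
```

With the real data, the same approach would use chi_output in place of res and nrow(d) in place of sum(counts).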

Running a Chi-square Test of Independence

Heather Perkins

2023-06-06