library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To modify variables or add new variables:
## let(mtcars, new_var = 42, new_var2 = new_var*hp) %>% head()
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file = "eammi2_clean.csv", header = TRUE)
There will be no gender differences in participation across the racial/ethnic categories (in other words, men, women, and non-binary participants will be evenly distributed across the racial/ethnic categories).
(Note: This hypothesis is predicting a non-significant result. It’s a bit backwards from how we usually do things, where a significant result supports the hypothesis, but it makes sense for the current variables.)
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3166 obs. of 6 variables:
## $ ResponseId: chr "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
## $ gender : chr "f" "m" "m" "f" ...
## $ race_rc : chr "white" "white" "white" "other" ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ belong : num 2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
## $ efficacy : num 3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)
table(d$gender, useNA = "always")
##
## f m nb <NA>
## 2322 791 53 0
table(d$race_rc, useNA = "always")
##
## asian black hispanic multiracial nativeamer other
## 210 247 286 293 12 97
## white <NA>
## 2021 0
cross_cases(d,gender,race_rc)
gender (rows) by race_rc (columns) | asian | black | hispanic | multiracial | nativeamer | other | white |
---|---|---|---|---|---|---|---|
f | 152 | 182 | 207 | 222 | 11 | 72 | 1476 |
m | 57 | 63 | 77 | 61 | 1 | 24 | 508 |
nb | 1 | 2 | 2 | 10 | | 1 | 37 |
#Total cases | 210 | 247 | 286 | 293 | 12 | 97 | 2021 |
While my data meet the first three assumptions (the data are frequencies, the variables are independent, and there are two variables), I don’t have at least 5 participants in all cells. The number of non-binary or other gender participants is pretty small, and for most of the racial/ethnic groups it is less than five. The number of Native American participants is also small, and there is only one man from that group.
To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice, and has the potential to further marginalize already minoritized groups, but it’s a necessary compromise for my analysis. I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.
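The cell-size check described above can also be done programmatically rather than by eye. The following is a minimal sketch, not part of the original lab: it re-enters the counts from the cross_cases() output as a matrix (the names `obs`, `idx`, and `small_cells` are made up for illustration) and assumes the blank nb/nativeamer cell is 0, since the column total of 12 equals 11 + 1.

```r
# Observed counts copied from the cross_cases() output.
# Assumption: the blank nb/nativeamer cell is 0 (column total 12 = 11 + 1).
obs <- matrix(
  c(152, 182, 207, 222, 11, 72, 1476,
     57,  63,  77,  61,  1, 24,  508,
      1,   2,   2,  10,  0,  1,   37),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    gender  = c("f", "m", "nb"),
    race_rc = c("asian", "black", "hispanic", "multiracial",
                "nativeamer", "other", "white")
  )
)

# Find the row/column position of every cell with fewer than 5 participants
idx <- which(obs < 5, arr.ind = TRUE)

# Turn the positions back into readable labels
small_cells <- data.frame(
  gender  = rownames(obs)[idx[, 1]],
  race_rc = colnames(obs)[idx[, 2]],
  n       = obs[idx]
)
small_cells
```

With these counts, every small cell except one falls in the nb row, and the remaining one is the single Native American man, which matches the decision to drop one group and merge the other.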
# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb") #using the '!=' sign here tells R to filter out the indicated criteria
# once we've dropped a level from our factor, we need to use the droplevels() command to remove it, or it will still show as 0
d$gender <- droplevels(d$gender)
table(d$gender, useNA = "always")
##
## f m <NA>
## 2322 791 0
# we'll recode our race variable to combine our native american participants with our other participants
d$race_rc2 <- d$race_rc # create a new variable (race_rc2) identical to the current variable (race_rc)
d$race_rc2[d$race_rc == "nativeamer"] <- "other" # we will use some of our previous code to recode our Native American participants
d$race_rc2 <- droplevels(d$race_rc2) # once again, we need to use the droplevels() command
table(d$race_rc2, useNA = "always")
##
## asian black hispanic multiracial other white
## 209 245 284 283 108 1984
## <NA>
## 0
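To see why droplevels() is needed after recoding, here is a small toy example. It is only a sketch on made-up data (the vectors `race` and `race2` are hypothetical, not the lab dataset). Note that the recode works directly on a factor only because "other" is already one of its levels.

```r
# Toy factor standing in for d$race_rc (made-up values)
race <- factor(c("white", "nativeamer", "other", "asian", "nativeamer"))

# Copy the variable, then recode nativeamer into the existing "other" level
race2 <- race
race2[race == "nativeamer"] <- "other"

# The emptied level is still attached to the factor and shows a count of 0
levels(race2)
table(race2)

# droplevels() removes any level that no longer has observations
race2 <- droplevels(race2)
levels(race2)
table(race2)
```

Leaving the empty level in place would add an all-zero column to the cross-tabulation, so it is worth removing before re-running cross_cases().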
# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)
gender (rows) by race_rc2 (columns) | asian | black | hispanic | multiracial | other | white |
---|---|---|---|---|---|---|
f | 152 | 182 | 207 | 222 | 83 | 1476 |
m | 57 | 63 | 77 | 61 | 25 | 508 |
#Total cases | 209 | 245 | 284 | 283 | 108 | 1984 |
# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)
# to view the results of our chi-square test, we just have to call up the output we saved
chi_output
##
## Pearson's Chi-squared test
##
## data: d$gender and d$race_rc2
## X-squared = 3.3795, df = 5, p-value = 0.6417
# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output object that we created
chi_output$stdres
## d$race_rc2
## d$gender asian black hispanic multiracial other white
## f -0.6405795 -0.1141389 -0.6915650 1.5622533 0.5494410 -0.3317414
## m 0.6405795 0.1141389 0.6915650 -1.5622533 -0.5494410 0.3317414
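The standardized residuals show which cells contribute most to the chi-square statistic. The sketch below re-enters the final counts as a matrix so it runs without the CSV file, re-runs the test, and flags any residual beyond ±1.96. That cutoff is a common rule of thumb and an assumption here, not something the lab specifies; the names `obs2`, `res`, and `flagged` are also made up for illustration.

```r
# Final counts copied from the second cross_cases() output
obs2 <- matrix(
  c(152, 182, 207, 222, 83, 1476,
     57,  63,  77,  61, 25,  508),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender   = c("f", "m"),
    race_rc2 = c("asian", "black", "hispanic",
                 "multiracial", "other", "white")
  )
)

# chisq.test() also accepts a matrix of counts instead of two factors
res <- chisq.test(obs2)
res

# Flag cells whose standardized residual is beyond +/- 1.96
# (assumed cutoff; some instructors use 2 or a stricter corrected value)
flagged <- abs(res$stdres) > 1.96
flagged
any(flagged)
```

No cell is flagged here, which is consistent with the non-significant overall test: the largest residual, for multiracial participants, is about 1.56 in absolute value.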
# the apply_labels() command can make our levels long and difficult to work with, so we'll only set them at the end
d <- apply_labels(d,
                  gender = "Gender",
                  gender = c("Women" = "f", "Men" = "m"),
                  race_rc2 = "Race/Ethnicity",
                  race_rc2 = c("Asian" = "asian",
                               "Black/African American" = "black",
                               "Hispanic, Latino, or Latina" = "hispanic",
                               "Multiracial" = "multiracial",
                               "Small Groups Combined" = "other",
                               "White" = "white"))
## Warning in set_val_lab.default(data[[curr_name]], curr_lab): You are trying to
## put value labels on factor. It can lead to unexpected results. Factor will be
## converted to character.
## Warning in set_val_lab.default(data[[curr_name]], curr_lab): You are trying to
## put value labels on factor. It can lead to unexpected results. Factor will be
## converted to character.
To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a chi-square test of independence. Our variables met most of the criteria for a chi-square test of independence (the data were frequencies, the variables were independent, and there were two variables). However, we had a low number of Native American participants and of non-binary and other gender participants, so we did not meet the criterion of at least five participants per cell. To proceed with this analysis, we dropped the non-binary and other gender participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:
Gender (rows) by Race/Ethnicity (columns) | Asian | Black/African American | Hispanic, Latino, or Latina | Multiracial | Small Groups Combined | White |
---|---|---|---|---|---|---|
Women | 152 | 182 | 207 | 222 | 83 | 1476 |
Men | 57 | 63 | 77 | 61 | 25 | 508 |
As predicted, we did not find a gender difference in participation across the racial/ethnic categories, χ²(5, N = 3113) = 3.38, p = .642.
–alternative–
As predicted, we did not find a gender difference in participation across the racial/ethnic categories, X^2(5, N = 3113) = 3.38, p = .642.
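The reported statistics can be assembled from the test output instead of being retyped, which helps avoid rounding or copying slips. The helper below is a hypothetical sketch (the function name `apa_chisq` and its arguments are not from the lab or any package); it uses the plain-text X^2 form and drops the leading zero from p, as in the write-up above.

```r
# Hypothetical helper: build an APA-style chi-square string from four numbers
apa_chisq <- function(statistic, df, n, p) {
  # Report very small p-values as an inequality; otherwise use three
  # decimals with the leading zero removed
  p_text <- if (p < .001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("X^2(%d, N = %d) = %.2f, %s",
          as.integer(df), as.integer(n), statistic, p_text)
}

# Values taken from the chi_output printout and the final sample size
apa_chisq(statistic = 3.3795, df = 5, n = 3113, p = 0.6417)
# returns "X^2(5, N = 3113) = 3.38, p = .642"
```

With the saved test object, the same call could be written as `apa_chisq(chi_output$statistic, chi_output$parameter, sum(chi_output$observed), chi_output$p.value)`.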