library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To drop variable use NULL: let(mtcars, am = NULL) %>% head()
##
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
## To return to the console output, use 'expss_output_default()'.
library(psych) # for the describe() command
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:expss':
##
## recode
library(effsize) # for the cohen.d() command
##
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
##
## cohen.d
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file="eammi2_final.csv", header=T)
With the hope that our sampling was evenly distributed, there will be no gender differences in participation across the racial/ethnic categories.
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3182 obs. of 7 variables:
## $ ResponseId: chr "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
## $ gender : chr "f" "m" "m" "f" ...
## $ race_rc : chr "white" "white" "white" "other" ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ stress : num 3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
## $ belong : num 3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
## $ SocMedia : num 4.27 2.09 3.09 3.18 3.36 ...
# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)
table(d$gender, useNA = "always")
##
## f m nb <NA>
## 2332 792 54 4
table(d$race_rc, useNA = "always")
##
## asian black hispanic multiracial nativeamer other
## 210 249 286 293 12 97
## white <NA>
## 2026 9
cross_cases(d,gender,race_rc)
 race_rc | |||||||
---|---|---|---|---|---|---|---|
 asian |  black |  hispanic |  multiracial |  nativeamer |  other |  white | |
 gender | |||||||
   f | 152 | 184 | 207 | 222 | 11 | 72 | 1480 |
   m | 57 | 63 | 77 | 61 | 1 | 24 | 508 |
   nb | 1 | 2 | 2 | 10 | 1 | 38 | |
   #Total cases | 210 | 249 | 286 | 293 | 12 | 97 | 2026 |
My data meets the first three assumptions, but not the last since I don’t have at least 5 participants in all cells. The number of non-binary participants is small, and for some of the racial/ethnic groups it is less than five. The number of Native American participants is also small, and there is only one male from that group.
To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice, and has the potential to further marginalize already minoritized groups, but it’s a necessary compromise for my analysis. I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.
# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb") #using the '!=' sign here tells R to filter out the indicated criteria
# once we've dropped a level from our factor, we need to use the droplevels() command to remove it, or it will still show as 0
d$gender <- droplevels(d$gender)
table(d$gender, useNA = "always")
##
## f m <NA>
## 2332 792 0
# we'll recode our race variable to combine our native american participants with our other participants
d$race_rc2 <- d$race_rc # create a new variable (race_rc2_ identical to current variable (race_rc)
d$race_rc2[d$race_rc == "nativeamer"] <- "other" # we will use some of our previous code to recode our Native American participants
d$race_rc2 <- droplevels(d$race_rc2) # once again, we need to use the droplevels() command
table(d$race_rc2, useNA = "always")
##
## asian black hispanic multiracial other white
## 209 247 284 283 108 1988
## <NA>
## 5
# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)
 race_rc2 | ||||||
---|---|---|---|---|---|---|
 asian |  black |  hispanic |  multiracial |  other |  white | |
 gender | ||||||
   f | 152 | 184 | 207 | 222 | 83 | 1480 |
   m | 57 | 63 | 77 | 61 | 25 | 508 |
   #Total cases | 209 | 247 | 284 | 283 | 108 | 1988 |
# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)
# to view the results of our chi-square test, we just have to call up the output we saved
chi_output
##
## Pearson's Chi-squared test
##
## data: d$gender and d$race_rc2
## X-squared = 3.3508, df = 5, p-value = 0.6461
# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output file that we created
chi_output$stdres
## d$race_rc2
## d$gender asian black hispanic multiracial other
## f -0.65775743 -0.05472746 -0.71179673 1.54327508 0.53788796
## m 0.65775743 0.05472746 0.71179673 -1.54327508 -0.53788796
## d$race_rc2
## d$gender white
## f -0.32782212
## m 0.32782212
To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a Chi-square test of independence. Our variables met most of the criteria for running a chi-square test of analysis (it used frequencies, the variables were independent, and there were two variables). However, we had a low number of Native American and non-binary participants and did not meet the criteria for at least five participants per cell. To proceed with this analysis, we dropped the non-binary participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:
 race_rc2 | ||||||
---|---|---|---|---|---|---|
 asian |  black |  hispanic |  multiracial |  other |  white | |
 gender | ||||||
   f | 152 | 184 | 207 | 222 | 83 | 1480 |
   m | 57 | 63 | 77 | 61 | 25 | 508 |
As predicted, we did not find a gender difference in participation across the racial/ethnic categories, χ2(5, N = 3124) = 3.35, p = .646.
We predict that women will report significantly more need to belong than men.
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3124 obs. of 8 variables:
## $ ResponseId: chr "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
## $ gender : Factor w/ 2 levels "f","m": 1 2 2 1 2 1 1 1 1 1 ...
## $ race_rc : Factor w/ 7 levels "asian","black",..: 7 7 7 6 7 7 7 7 4 4 ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ stress : num 3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
## $ belong : num 3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
## $ SocMedia : num 4.27 2.09 3.09 3.18 3.36 ...
## $ race_rc2 : Factor w/ 6 levels "asian","black",..: 6 6 6 5 6 6 6 6 4 4 ...
d$gender <- as.factor(d$gender)
table(d$gender, useNA = "always")
##
## f m <NA>
## 2332 792 0
# you can use the describe() command on an entire datafrom (d) or just on a single variable (d$pss)
describe(d$belong)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3117 3.31 0.49 3.3 3.33 0.44 1 5 4 -0.3 0.54 0.01
# also use a histogram to examine your continuous variable
hist(d$belong)
# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$belong, group=d$gender)
##
## Descriptive statistics by group
## group: f
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2325 3.35 0.48 3.4 3.36 0.44 1 5 4 -0.34 0.73 0.01
## ------------------------------------------------------------
## group: m
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 792 3.21 0.51 3.2 3.22 0.44 1.4 4.8 3.4 -0.16 0.19 0.02
# last, use a boxplot to examine your continuous and categorical variables together
boxplot(d$belong~d$gender)
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(belong~gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 6.7319 0.009515 **
## 3115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
My independent variable originally had more than two levels. Above, in my Chi-square section, I had dropped the non-binary participants from my sample, and I will keep it this way in my T-test (reran, and double checked). I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.
My data also has some potential issues regarding homogeneity of variance since my Levene’s test was significant. To accommodate any potential heterogeneity of variance, I will use Welch’s t-test instead of Student’s t-test.
# once again, subetting to drop the nb group
d <- subset(d, gender != "nb")
d$gender <- droplevels(d$gender) # using droplevels() to drop the empty factor
table(d$gender, useNA = "always")
##
## f m <NA>
## 2332 792 0
# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$belong~d$gender)
t_output
##
## Welch Two Sample t-test
##
## data: d$belong by d$gender
## t = 6.5565, df = 1295, p-value = 7.937e-11
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
## 0.09579566 0.17759925
## sample estimates:
## mean in group f mean in group m
## 3.346925 3.210227
# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$belong~d$gender)
d_output
##
## Cohen's d
##
## d estimate: 0.2785787 (small)
## 95 percent confidence interval:
## lower upper
## 0.1976128 0.3595446
To test our hypothesis that women in our sample would report significantly more need to belong than men, we used an two-sample or independent t-test. This required us to drop our non-binary participants from our sample, as we are limited to a two-group comparison when using this test. We tested the homogeneity of variance with Levene’s test and found some signs of heterogeneity (p = 0.0095). This suggests that there is an increased chance of Type I error. To correct for this possible issue, we use Welch’s t-test, which does not assume homogeneity of variance. Our data met all other assumptions of a t-test.
As predicted, we found that women (M = 3.35, SD = .48) reported significantly higher need to belong than men (M = 3.21, SD = .51); t(1295) = 6.56, p < .010 (see Figure 1). The effect size was calculated using Cohen’s d, with a value of .28 (small effect; Cohen, 1988).
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.