1 Loading Libraries

library(expss) # for the cross_cases() command

## Loading required package: maditr

## 
## To drop variable use NULL: let(mtcars, am = NULL) %>% head()

## 
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
##  To return to the console output, use 'expss_output_default()'.

library(psych) # for the describe() command
library(car) # for the leveneTest() command

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:expss':
## 
##     recode

library(effsize) # for the cohen.d() command

## 
## Attaching package: 'effsize'

## The following object is masked from 'package:psych':
## 
##     cohen.d

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
d <- read.csv(file="eammi2_final.csv", header=T)

3 Chi-square: State Your Hypothesis

With the hope that our sampling was evenly distributed, there will be no gender differences in participation across the racial/ethnic categories.

4 Chi-square: Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    3182 obs. of  7 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ stress    : num  3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
##  $ belong    : num  3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
##  $ SocMedia  : num  4.27 2.09 3.09 3.18 3.36 ...

# we can see in the str() command that our categorical variables are being read as character or string variables
# to correct this, we'll use the as.factor() command
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)

table(d$gender, useNA = "always")

## 
##    f    m   nb <NA> 
## 2332  792   54    4

table(d$race_rc, useNA = "always")

## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         210         249         286         293          12          97 
##       white        <NA> 
##        2026           9

cross_cases(d,gender,race_rc)

	race_rc
	asian	black	hispanic	multiracial	nativeamer	other	white
gender
f	152	184	207	222	11	72	1480
m	57	63	77	61	1	24	508
nb	1	2	2	10		1	38
#Total cases	210	249	286	293	12	97	2026

5 Chi-square: Check Your Assumptions

5.1 Chi-square Test Assumptions

Data should be frequencies or counts
Variables and levels should be independent
There are two variables
At least 5 or more participants per cell

5.2 Issues with My Data

My data meets the first three assumptions, but not the last since I don’t have at least 5 participants in all cells. The number of non-binary participants is small, and for some of the racial/ethnic groups it is less than five. The number of Native American participants is also small, and there is only one male from that group.

To proceed with this analysis, I will drop the non-binary participants from my sample and add the Native American participants to the ‘other’ category. Dropping participants is always a difficult choice, and has the potential to further marginalize already minoritized groups, but it’s a necessary compromise for my analysis. I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.

# we'll use the subset command to drop our non-binary participants
d <- subset(d, gender != "nb") #using the '!=' sign here tells R to filter out the indicated criteria
# once we've dropped a level from our factor, we need to use the droplevels() command to remove it, or it will still show as 0
d$gender <- droplevels(d$gender)

table(d$gender, useNA = "always")

## 
##    f    m <NA> 
## 2332  792    0

# we'll recode our race variable to combine our native american participants with our other participants
d$race_rc2 <- d$race_rc # create a new variable (race_rc2_ identical to current variable (race_rc)
d$race_rc2[d$race_rc == "nativeamer"] <- "other" # we will use some of our previous code to recode our Native American participants
d$race_rc2 <- droplevels(d$race_rc2) # once again, we need to use the droplevels() command

table(d$race_rc2, useNA = "always")

## 
##       asian       black    hispanic multiracial       other       white 
##         209         247         284         283         108        1988 
##        <NA> 
##           5

# since I made changes to my variables, I am going to re-run the cross_cases() command
cross_cases(d, gender, race_rc2)

	race_rc2
	asian	black	hispanic	multiracial	other	white
gender
f	152	184	207	222	83	1480
m	57	63	77	61	25	508
#Total cases	209	247	284	283	108	1988

6 Chi-square: Run a Chi-square Test

# we use the chisq.test() command to run our chi-square test
# the only arguments we need to specify are the variables we're using for the chi-square test
# we are saving the output from our chi-square test to the chi_output object so we can view it again later
chi_output <- chisq.test(d$gender, d$race_rc2)

7 Chi-square: View Test Output

# to view the results of our chi-square test, we just have to call up the output we saved
chi_output

## 
##  Pearson's Chi-squared test
## 
## data:  d$gender and d$race_rc2
## X-squared = 3.3508, df = 5, p-value = 0.6461

8 Chi-square: View Standardized Residuals

# to view the standardized residuals, we use the $ operator to access the stdres element of the chi_output file that we created
chi_output$stdres

##         d$race_rc2
## d$gender       asian       black    hispanic multiracial       other
##        f -0.65775743 -0.05472746 -0.71179673  1.54327508  0.53788796
##        m  0.65775743  0.05472746  0.71179673 -1.54327508 -0.53788796
##         d$race_rc2
## d$gender       white
##        f -0.32782212
##        m  0.32782212

9 Chi-square: Write Up Results

To test our hypothesis that there would be no gender differences in participation across the racial/ethnic categories, we ran a Chi-square test of independence. Our variables met most of the criteria for running a chi-square test of analysis (it used frequencies, the variables were independent, and there were two variables). However, we had a low number of Native American and non-binary participants and did not meet the criteria for at least five participants per cell. To proceed with this analysis, we dropped the non-binary participants from our sample and combined our Native American participants with the existing category for participants from other small racial/ethnic groups. The final sample for analysis can be seen in Table 1:

	race_rc2
	asian	black	hispanic	multiracial	other	white
gender
f	152	184	207	222	83	1480
m	57	63	77	61	25	508

As predicted, we did not find a gender difference in participation across the racial/ethnic categories, χ²(5, N = 3124) = 3.35, p = .646.

10 T-test: State Your Hypothesis

We predict that women will report significantly more need to belong than men.

11 T-test: Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    3124 obs. of  8 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : Factor w/ 2 levels "f","m": 1 2 2 1 2 1 1 1 1 1 ...
##  $ race_rc   : Factor w/ 7 levels "asian","black",..: 7 7 7 6 7 7 7 7 4 4 ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ stress    : num  3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
##  $ belong    : num  3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
##  $ SocMedia  : num  4.27 2.09 3.09 3.18 3.36 ...
##  $ race_rc2  : Factor w/ 6 levels "asian","black",..: 6 6 6 5 6 6 6 6 4 4 ...

d$gender <- as.factor(d$gender)

table(d$gender, useNA = "always")

## 
##    f    m <NA> 
## 2332  792    0

# you can use the describe() command on an entire datafrom (d) or just on a single variable (d$pss)
describe(d$belong)

##    vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 3117 3.31 0.49    3.3    3.33 0.44   1   5     4 -0.3     0.54 0.01

# also use a histogram to examine your continuous variable
hist(d$belong)

# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$belong, group=d$gender)

## 
##  Descriptive statistics by group 
## group: f
##    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 2325 3.35 0.48    3.4    3.36 0.44   1   5     4 -0.34     0.73 0.01
## ------------------------------------------------------------ 
## group: m
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 792 3.21 0.51    3.2    3.22 0.44 1.4 4.8   3.4 -0.16     0.19 0.02

# last, use a boxplot to examine your continuous and categorical variables together
boxplot(d$belong~d$gender)

12 T-test: Check Your Assumptions

12.1 T-test Assumptions

IV must have two levels
Data values must be independent (independent t-test only)
Data obtained via a random sample
Dependent variable must be normally distributed
Variances of the two groups are approximately equal

12.2 Testing Homogeneity of Variance with Levene’s Test

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(belong~gender, data = d)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value   Pr(>F)   
## group    1  6.7319 0.009515 **
##       3115                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

12.3 Issues with My Data

My independent variable originally had more than two levels. Above, in my Chi-square section, I had dropped the non-binary participants from my sample, and I will keep it this way in my T-test (reran, and double checked). I will make a note to discuss this issue in my Method write-up and in my Discussion as a limitation of my study.

My data also has some potential issues regarding homogeneity of variance since my Levene’s test was significant. To accommodate any potential heterogeneity of variance, I will use Welch’s t-test instead of Student’s t-test.

# once again, subetting to drop the nb group
d <- subset(d, gender != "nb")
d$gender <- droplevels(d$gender) # using droplevels() to drop the empty factor
table(d$gender, useNA = "always")

## 
##    f    m <NA> 
## 2332  792    0

13 T-test: Run a T-test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$belong~d$gender)

14 T-test: View Test Output

t_output

## 
##  Welch Two Sample t-test
## 
## data:  d$belong by d$gender
## t = 6.5565, df = 1295, p-value = 7.937e-11
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
##  0.09579566 0.17759925
## sample estimates:
## mean in group f mean in group m 
##        3.346925        3.210227

15 T-test: Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$belong~d$gender)

16 T-test: View Effect Size

d_output

## 
## Cohen's d
## 
## d estimate: 0.2785787 (small)
## 95 percent confidence interval:
##     lower     upper 
## 0.1976128 0.3595446

17 T-test: Write Up Results

To test our hypothesis that women in our sample would report significantly more need to belong than men, we used an two-sample or independent t-test. This required us to drop our non-binary participants from our sample, as we are limited to a two-group comparison when using this test. We tested the homogeneity of variance with Levene’s test and found some signs of heterogeneity (p = 0.0095). This suggests that there is an increased chance of Type I error. To correct for this possible issue, we use Welch’s t-test, which does not assume homogeneity of variance. Our data met all other assumptions of a t-test.

As predicted, we found that women (M = 3.35, SD = .48) reported significantly higher need to belong than men (M = 3.21, SD = .51); t(1295) = 6.56, p < .010 (see Figure 1). The effect size was calculated using Cohen’s d, with a value of .28 (small effect; Cohen, 1988).

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.

Chi-square and T-test HW

Sophie Sabo

2023-06-10