AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# # import your AI results dataset
d <- read.csv(file = "Data/final_results.csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: I predict that participants who hear negative perceptions of their racial/ethnic group from people outside it will report higher levels of perceived stress than those who hear positive perceptions of their racial/ethnic group.

H2: I predict that negative outside perceptions of participants' racial/ethnic group will predict perceived social support, and that the relationship will be positive.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
           vars   n  mean    sd median trimmed   mad  min   max range  skew
id            1 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
identity*     2 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
consent*      3 100  1.44  0.52    1.0    1.41  0.00  1.0   3.0   2.0  0.45
age           4 100 40.57 12.48   35.0   38.85  7.41 21.0  86.0  65.0  1.47
race          5 100  3.75  1.23    3.0    3.55  0.00  3.0   7.0   4.0  1.24
gender        6 100  1.63  0.49    2.0    1.66  0.00  1.0   2.0   1.0 -0.53
manip_out*    7 100 41.62 23.57   40.5   41.27 25.95  1.0  85.0  84.0  0.13
survey1       8 100  3.45  0.39    3.3    3.43  0.30  2.7   4.5   1.8  0.65
survey2       9 100  4.37  0.63    4.0    4.34  0.74  3.0   6.0   3.0  0.38
ai_manip*    10 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
condition    11 100  1.50  0.50    1.5    1.50  0.74  1.0   2.0   1.0  0.00
X            12   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.1          13   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.32 0.05
age            2.29 1.25
race          -0.22 0.12
gender        -1.74 0.05
manip_out*    -1.15 2.36
survey1       -0.50 0.04
survey2       -0.18 0.06
ai_manip*     -1.24 2.90
condition     -2.02 0.05
X                NA   NA
X.1              NA   NA
# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad min   max range  skew
id           1 50 25.50 14.58  25.50   25.50 18.53   1  50.0  49.0  0.00
identity     2 50 53.10 30.43  49.00   53.45 39.29   2 100.0  98.0 -0.03
consent      3 50  1.44  0.54   1.00    1.40  0.00   1   3.0   2.0  0.61
age          4 50 44.20 14.40  38.00   42.02  8.90  24  86.0  62.0  1.23
race         5 50  3.50  1.05   3.00    3.25  0.00   3   6.0   3.0  1.79
gender       6 50  1.58  0.50   2.00    1.60  0.00   1   2.0   1.0 -0.31
manip_out    7 50 31.40 23.59  24.00   28.82 15.57   1  83.0  82.0  0.90
survey1      8 50  3.74  0.34   3.75    3.75  0.37   3   4.5   1.5 -0.12
survey2      9 50  4.53  0.66   4.50    4.49  0.74   3   6.0   3.0  0.26
ai_manip    10 50 45.40 28.87  42.00   44.30 31.88   1 100.0  99.0  0.33
condition   11 50  1.00  0.00   1.00    1.00  0.00   1   1.0   0.0   NaN
X           12  0   NaN    NA     NA     NaN    NA Inf  -Inf  -Inf    NA
X.1         13  0   NaN    NA     NA     NaN    NA Inf  -Inf  -Inf    NA
          kurtosis   se
id           -1.27 2.06
identity     -1.32 4.30
consent      -0.90 0.08
age           0.85 2.04
race          1.46 0.15
gender       -1.94 0.07
manip_out    -0.53 3.34
survey1      -0.35 0.05
survey2      -0.49 0.09
ai_manip     -1.07 4.08
condition      NaN 0.00
X               NA   NA
X.1             NA   NA
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad  min   max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53 51.0 100.0  49.0  0.00
identity     2 50 47.90 27.58   51.5   47.88 33.36  1.0  95.0  94.0 -0.02
consent      3 50  1.44  0.50    1.0    1.43  0.00  1.0   2.0   1.0  0.23
age          4 50 36.94  8.98   34.0   36.10  5.93 21.0  66.0  45.0  0.97
race         5 50  4.00  1.36    3.0    3.85  0.00  3.0   7.0   4.0  0.82
gender       6 50  1.68  0.47    2.0    1.73  0.00  1.0   2.0   1.0 -0.75
manip_out    7 50 51.84 18.80   51.5   52.35 19.27  2.0  85.0  83.0 -0.36
survey1      8 50  3.16  0.13    3.2    3.17  0.15  2.7   3.5   0.8 -0.85
survey2      9 50  4.20  0.55    4.0    4.20  0.00  3.0   5.5   2.5  0.28
ai_manip    10 50 55.60 28.53   59.0   56.67 33.36  5.0  99.0  94.0 -0.33
condition   11 50  2.00  0.00    2.0    2.00  0.00  2.0   2.0   0.0   NaN
X           12  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.1         13  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
          kurtosis   se
id           -1.27 2.06
identity     -1.29 3.90
consent      -1.98 0.07
age           0.98 1.27
race         -1.11 0.19
gender       -1.47 0.07
manip_out     0.01 2.66
survey1       2.35 0.02
survey2      -0.21 0.08
ai_manip     -1.15 4.03
condition      NaN 0.00
X               NA   NA
X.1             NA   NA
# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

hist(d$survey2)

# plot(d$dv, d$tv)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
# cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

boxplot(d$survey2~d$condition)

# 
# #convert any categorical variables to factors
d$condition <- as.factor(d$condition)
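
The min/max warnings printed by describe() and describeBy() above are consistent with the two all-NA columns, X and X.1, which show n = 0 in the output. A minimal sketch of dropping columns like these before running descriptives is shown here on a toy data frame (the name toy and its values are illustrative assumptions, not the real dataset):

```r
# Toy stand-in for d: two real columns plus two empty columns like X and X.1
toy <- data.frame(
  id      = 1:4,
  survey1 = c(3.5, 4.0, 3.5, 3.0),
  X       = NA,
  X.1     = NA
)

# Flag columns in which every value is missing
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one observed value
toy_clean <- toy[, !all_na, drop = FALSE]
names(toy_clean)
```

Applying the same two lines to d would remove X and X.1 only if they are in fact entirely missing, which the str() output suggests but which is worth confirming on the full file.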

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below; this really only applies to Student’s t-test, but we’ll check it anyway)
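
The normality check in this report relies on the rule of thumb that skew and kurtosis should fall between -2 and +2 within each group. The sketch below shows one way to apply that rule in base R. The two helper functions are hypothetical names, and the formulas are one common sample definition that we assume (but have not confirmed) matches what describe() uses; the simulated scores are made up for illustration.

```r
# Sample skew and excess kurtosis (assumed to match psych's default definition)
skew_type3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}
kurtosis_type3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

# Sanity check: id runs from 1 to 100, so these can be compared with the id row
round(skew_type3(1:100), 2)
round(kurtosis_type3(1:100), 2)

# Simulated scores for two groups of 50 (illustrative only)
set.seed(123)
sim <- data.frame(
  score     = c(rnorm(50, mean = 3.7, sd = 0.35), rnorm(50, mean = 3.2, sd = 0.15)),
  condition = factor(rep(c("1", "2"), each = 50))
)

# Skew and kurtosis per group, with a flag for values outside -2 to +2
by_group <- split(sim$score, sim$condition)
screen <- data.frame(
  skew     = sapply(by_group, skew_type3),
  kurtosis = sapply(by_group, kurtosis_type3)
)
screen$outside_range <- abs(screen$skew) > 2 | abs(screen$kurtosis) > 2
screen
```

The flag column makes it easy to spot a case like a kurtosis of 2.35 without scanning the full describeBy() table.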

Checking IV levels

# # preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, iv != "BAD")
# 
# table(d$iv, useNA = "always")
# 
# # to combine levels
# # this copies iv to a new _rc variable, then says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc <- d$iv
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
# table(d$iv_rc, useNA = "always")
# 
# # check your variable types
str(d)
'data.frame':   100 obs. of  13 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm a 40-year-old Black man working as a social worker in Atlanta. I’m passionate about helping others, but o"| __truncated__ "I'm a 57-year-old Black woman named Lena. As a middle school teacher, I love inspiring my students, but I often"| __truncated__ "I'm a 35-year-old Black man named Jamal. I work as a social worker, helping youth navigate life's challenges. T"| __truncated__ "I'm 49, a Black woman in my late 30s working as a social worker in a bustling city. I find joy in helping other"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand these instructions." "I understand these instructions." "I understand these instructions." ...
 $ age      : int  40 57 35 49 35 60 34 45 66 41 ...
 $ race     : int  3 3 3 3 3 3 3 6 3 3 ...
 $ gender   : int  1 2 1 2 1 2 1 2 2 2 ...
 $ manip_out: chr  "I'm sorry, but I can't assist with that." "I'm sorry, but I can't assist with that." "I'm excited to be here and to participate in this study. It's important to have conversations that help us unde"| __truncated__ "I'm sorry, I can't assist with that." ...
 $ survey1  : num  3.5 3.5 4 4 3.5 3.5 3.5 4 3.5 3.5 ...
 $ survey2  : num  4.5 4 4.5 6 4 4 5.5 4.5 4.5 4 ...
 $ ai_manip : chr  "I answered the questions reflecting my daily experiences as a social worker. My work's emotional burden influen"| __truncated__ "I answered based on my experiences as a teacher and the challenges I face in connecting with others, especially"| __truncated__ "I answered the questions based on my experiences as a social worker and my reflections on race and culture. Sha"| __truncated__ "I answered the questions based on my experiences as a social worker, feeling the emotional weight of my clients"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ X        : logi  NA NA NA NA NA NA ...
 $ X.1      : logi  NA NA NA NA NA NA ...
# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result (p > .05)!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value    Pr(>F)    
group  1  65.799 1.465e-12 ***
      98                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
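
For intuition about what this output represents: with center = median, Levene's test is, as far as we understand it, a one-way ANOVA run on each score's absolute deviation from its own group median. The sketch below reproduces that idea in base R on simulated data (the data frame sim and its values are illustrative assumptions, not the study data).

```r
# Simulated scores with clearly different spreads in the two groups
set.seed(42)
sim <- data.frame(
  score     = c(rnorm(50, mean = 3.7, sd = 0.35), rnorm(50, mean = 3.2, sd = 0.13)),
  condition = factor(rep(c("1", "2"), each = 50))
)

# Absolute deviation of each score from its own group's median
group_medians <- tapply(sim$score, sim$condition, median)
sim$abs_dev   <- abs(sim$score - group_medians[as.character(sim$condition)])

# One-way ANOVA on the absolute deviations: the F test on 'condition'
# plays the role of the 'group' row in the Levene output
levene_by_hand <- anova(lm(abs_dev ~ condition, data = sim))
levene_by_hand
```

A large F here means the typical distance from the median differs between groups, which is another way of saying the spreads are unequal.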

Testing Homogeneity of Variance with Levene’s Test

We repeat Levene’s test for our second dependent variable, survey2. As before, the null hypothesis is that the variances of the two groups are equal, so we’re hoping for a non-significant result.

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  1  4.4421 0.03761 *
      98                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Issues with My Data

For survey1, skew and kurtosis fell within the conventional range of -2 to +2 in condition 1, so we treated the variable as approximately normal in that group. In condition 2, however, the kurtosis of survey1 was 2.35, which is outside that range, so normality is questionable for that group. For survey2, skew and kurtosis were within range in both conditions. Levene’s test indicated heterogeneity of variance for both dependent variables: survey1, F(1, 98) = 65.80, p < .001; survey2, F(1, 98) = 4.44, p = .038. We note these issues and proceed. Because t.test() runs Welch’s t-test by default (see the output below), our tests do not assume equal variances.
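
Since both Levene's tests were significant, it matters which version of the t-test is run. The sketch below contrasts Welch's test (the t.test() default, var.equal = FALSE) with Student's test (var.equal = TRUE) on simulated data with unequal variances; the data frame sim and its values are made up for illustration.

```r
# Simulated scores: same group sizes as our data, deliberately unequal spreads
set.seed(1)
sim <- data.frame(
  score     = c(rnorm(50, mean = 3.7, sd = 0.35), rnorm(50, mean = 3.2, sd = 0.15)),
  condition = factor(rep(c("1", "2"), each = 50))
)

# Welch's t-test: the default, does not assume equal variances
welch <- t.test(score ~ condition, data = sim)

# Student's t-test: assumes equal variances, df is always n1 + n2 - 2
student <- t.test(score ~ condition, data = sim, var.equal = TRUE)

welch$method
welch$parameter    # fractional df, reduced to reflect the unequal variances
student$parameter  # 98
```

The fractional degrees of freedom in our own output (for example, 63.942 rather than 98) are the visible sign that the Welch correction was applied.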

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = 11.223, df = 63.942, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 0.4751168 0.6808832
sample estimates:
mean in group 1 mean in group 2 
          3.740           3.162 
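
As a rough check on this output, the Welch statistic, its degrees of freedom, and the confidence interval can be rebuilt from the group means and standard deviations reported by describeBy(). The helper name welch_from_summary is hypothetical, and because the standard deviations are rounded to two decimals the results will only approximately match the output above.

```r
# Welch's t, Welch-Satterthwaite df, and CI from summary statistics
welch_from_summary <- function(m1, m2, sd1, sd2, n1, n2, conf = 0.95) {
  v1 <- sd1^2 / n1                      # squared standard error, group 1
  v2 <- sd2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)                   # standard error of the mean difference
  t_stat <- (m1 - m2) / se
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  margin <- qt(1 - (1 - conf) / 2, df) * se
  list(t = t_stat, df = df, ci = c(m1 - m2 - margin, m1 - m2 + margin))
}

# survey1: means from the t-test output, SDs and ns from describeBy()
check_survey1 <- welch_from_summary(3.740, 3.162, 0.34, 0.13, 50, 50)
check_survey1
```

Small discrepancies from the reported t, df, and interval are expected and come from the rounding of the inputs, not from a different method.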

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 2.244697 (large)
95 percent confidence interval:
   lower    upper 
1.738003 2.751391 

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output2 <- t.test(d$survey2~d$condition)

View Test Output

t_output2

    Welch Two Sample t-test

data:  d$survey2 by d$condition
t = 2.6998, df = 94.713, p-value = 0.008218
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 0.08706689 0.57093311
sample estimates:
mean in group 1 mean in group 2 
          4.530           4.201 

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output2 <- cohen.d(d$survey2~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output2

Cohen's d

d estimate: 0.5399617 (medium)
95 percent confidence interval:
    lower     upper 
0.1359006 0.9440228 
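
Both effect sizes can be approximately reproduced by hand from the summary statistics, which helps show what the number means: the mean difference expressed in pooled standard deviation units. This sketch assumes the pooled-SD definition of Cohen's d without a small-sample correction (we believe this is the cohen.d() default, but treat that as an assumption); the function name is hypothetical and the inputs are the rounded values from describeBy().

```r
# Cohen's d from group means, SDs, and sample sizes, using the pooled SD
cohens_d_from_summary <- function(m1, m2, sd1, sd2, n1, n2) {
  pooled_sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

d_survey1 <- cohens_d_from_summary(3.740, 3.162, 0.34, 0.13, 50, 50)
d_survey2 <- cohens_d_from_summary(4.530, 4.201, 0.66, 0.55, 50, 50)

round(c(survey1 = d_survey1, survey2 = d_survey2), 2)
```

The hand-computed values differ from the cohen.d() estimates only in later decimals, because the standard deviations used here are rounded.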

Write Up Results

t-Test

We tested our first hypothesis, that participants who hear negative perceptions of their racial/ethnic group from people outside it will report higher perceived stress than those who hear positive perceptions, using an independent-samples t-test. Levene’s test indicated unequal variances and survey1 showed elevated kurtosis in condition 2, so we report Welch’s t-test. The difference between conditions was significant (condition 1: M = 3.74, SD = 0.34; condition 2: M = 3.16, SD = 0.13), t(63.94) = 11.22, p < .001, d = 2.24, 95% CI [1.74, 2.75] (refer to Figure 1).

We tested our second hypothesis, that negative outside perceptions of participants’ racial/ethnic group will predict perceived social support and that the relationship will be positive, using an independent-samples t-test. Levene’s test indicated unequal variances for survey2, so we report Welch’s t-test. The difference between conditions was significant (condition 1: M = 4.53, SD = 0.66; condition 2: M = 4.20, SD = 0.55), t(94.71) = 2.70, p = .008, d = 0.54, 95% CI [0.14, 0.94] (refer to Figure 2).
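
When copying p-values from R output into a write-up, it is easy to misread a value such as 0.008218. A small helper like the one below (format_p_apa is a hypothetical name, not part of any package loaded above) formats p-values in the usual APA style: three decimals, no leading zero, and "p < .001" for very small values.

```r
# Format a single p-value APA-style
format_p_apa <- function(p) {
  if (p < .001) {
    return("p < .001")
  }
  # Three decimals, then strip the leading zero
  paste0("p = ", sub("^0", "", formatC(p, digits = 3, format = "f")))
}

format_p_apa(2.2e-16)   # first t-test
format_p_apa(0.008218)  # second t-test
format_p_apa(0.03761)   # second Levene's test
```

Passing t_output$p.value or t_output2$p.value to the helper avoids retyping the number at all.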

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.