AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file="Data/results final!!!.csv", header=TRUE)

State Your Hypotheses & Chosen Tests

I predict that participants with high perceived safety will report less stress than those with low perceived safety. I also predict that female participants will report higher stress than male participants.

I will test both hypotheses with independent-samples t-tests.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)
           vars   n  mean    sd median trimmed   mad min max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06   1 100    99  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06   1 100    99  0.00
consent*      3 100  1.24  0.43   1.00    1.18  0.00   1   2     1  1.20
age           4 100 39.91 14.08  34.00   37.86  5.19  12  83    71  1.38
race*         5 100  4.18  1.96   5.50    4.22  2.22   1   7     6 -0.13
gender        6 100  1.84  0.37   2.00    1.93  0.00   1   2     1 -1.83
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06   1 100    99  0.00
survey1       8 100  3.33  0.25   3.25    3.31  0.22   3   4     1  0.84
ai_manip*     9 100 50.50 29.01  50.50   50.50 37.06   1 100    99  0.00
condition    10 100  1.50  0.50   1.50    1.50  0.74   1   2     1  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -0.57 0.04
age            1.56 1.41
race*         -1.81 0.20
gender         1.35 0.04
manip_out*    -1.24 2.90
survey1        0.06 0.03
ai_manip*     -1.24 2.90
condition     -2.02 0.05
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")

 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad min max range  skew kurtosis
id           1 50 25.50 14.58  25.50   25.50 18.53   1  50    49  0.00    -1.27
identity     2 50 46.90 25.83  46.50   46.35 30.39   3 100    97  0.15    -0.97
consent      3 50  1.26  0.44   1.00    1.20  0.00   1   2     1  1.06    -0.89
age          4 50 38.82 14.80  34.00   36.12  4.45  18  83    65  1.67     2.09
race         5 50  4.32  1.93   6.00    4.42  0.00   1   7     6 -0.28    -1.74
gender       6 50  1.88  0.33   2.00    1.98  0.00   1   2     1 -2.27     3.21
manip_out    7 50 57.72 35.24  73.50   59.58 25.20   1  99    98 -0.51    -1.47
survey1      8 50  3.42  0.28   3.38    3.39  0.37   3   4     1  0.54    -0.68
ai_manip     9 50 45.10 26.58  48.00   44.50 30.39   2  95    93  0.03    -1.05
condition   10 50  1.00  0.00   1.00    1.00  0.00   1   1     0   NaN      NaN
            se
id        2.06
identity  3.65
consent   0.06
age       2.09
race      0.27
gender    0.05
manip_out 4.98
survey1   0.04
ai_manip  3.76
condition 0.00
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad min   max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53  51 100.0  49.0  0.00
identity     2 50 54.10 31.73   57.5   54.95 42.25   1  99.0  98.0 -0.19
consent      3 50  1.22  0.42    1.0    1.15  0.00   1   2.0   1.0  1.31
age          4 50 41.00 13.38   34.5   39.38  5.93  12  79.0  67.0  1.01
race         5 50  4.04  2.00    3.5    4.03  2.22   1   7.0   6.0  0.02
gender       6 50  1.80  0.40    2.0    1.88  0.00   1   2.0   1.0 -1.46
manip_out    7 50 43.28 18.77   41.5   42.38 18.53   3 100.0  97.0  0.52
survey1      8 50  3.25  0.20    3.2    3.23  0.15   3   3.8   0.8  0.85
ai_manip     9 50 55.90 30.57   59.5   56.50 38.55   1 100.0  99.0 -0.15
condition   10 50  2.00  0.00    2.0    2.00  0.00   2   2.0   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.42 4.49
consent      -0.28 0.06
age           0.89 1.89
race         -1.87 0.28
gender        0.12 0.06
manip_out     0.17 2.65
survey1       0.36 0.03
ai_manip     -1.46 4.32
condition      NaN 0.00
# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

# and table() and cross_cases() to examine your categorical variables
# you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$gender)

#convert any categorical variables to factors
d$condition <- as.factor(d$condition)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
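
The normality assumption in the list above can be checked within each group using a Shapiro-Wilk test and a Q-Q plot. The sketch below runs on simulated scores in a hypothetical data frame called toy (the names toy, score, and group are placeholders, not columns of the real dataset); swapping in d$survey1 and d$condition should work the same way.

```r
# simulated stand-in data; 'toy', 'score', and 'group' are hypothetical names
set.seed(123)
toy <- data.frame(
  score = c(rnorm(50, mean = 3.4, sd = 0.3), rnorm(50, mean = 3.2, sd = 0.2)),
  group = factor(rep(c("1", "2"), each = 50))
)

# Shapiro-Wilk p-value within each group
# a non-significant p suggests no strong departure from normality
shapiro_by_group <- tapply(toy$score, toy$group,
                           function(x) shapiro.test(x)$p.value)
shapiro_by_group

# Q-Q plot for one group as a visual check
qqnorm(toy$score[toy$group == "1"])
qqline(toy$score[toy$group == "1"])
```

With 50 cases per group, the t-test is usually fairly robust to mild non-normality, so this check is mainly there to flag severe skew or outliers.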

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'BAD' should be removed
#d <- subset(d, condition != "BAD")

#table(d$condition, useNA = "always")

# to combine levels
# this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
#d$condition_rc[d$condition == "BAD"] <- "GOOD"

#table(d$condition_rc, useNA = "always")

# check your variable types
str(d)
'data.frame':   100 obs. of  10 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I’m Alejandro, a 34-year-old Latino male living in Austin, Texas. Passionate about music, I play guitar in a lo"| __truncated__ "I'm a 34-year-old White woman named Sarah, living in Portland, Oregon. I work as a graphic designer and love ex"| __truncated__ "I’m Jamal, a 34-year-old Black man from Atlanta. I work as a graphic designer and have a passion for street art"| __truncated__ "I'm a 35-year-old White woman named Sarah, living in Denver. As an environmental scientist, I'm passionate abou"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand the instructions." "I understand the instructions." "I understand the instructions." ...
 $ age      : int  34 34 34 35 32 21 44 25 37 34 ...
 $ race     : chr  "4" "6" "3" "6" ...
 $ gender   : int  1 2 1 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "In this safe environment, I felt a profound sense of calm wash over me. The bright sunlight and gentle sounds o"| __truncated__ "As I immerse myself in this serene VR environment, I feel an overwhelming sense of calm and safety. The gentle "| __truncated__ "As I relax in this safe environment, I feel a deep sense of calm and contentment. The soothing sounds and brigh"| __truncated__ "As I immersed myself in the safe environment, I felt an overwhelming sense of peace and contentment. The bright"| __truncated__ ...
 $ survey1  : num  3.62 3.38 3.62 3.38 3.62 ...
 $ ai_manip : chr  "I answered that way because I was reflecting on the peacefulness of my surroundings, which contrasts with the d"| __truncated__ "I answered based on my personal experiences and preferences, reflecting my appreciation for nature and creativi"| __truncated__ "I answered that way to convey the contrast between my laid-back persona and the calming environment, emphasizin"| __truncated__ "I answered the way I did to convey a deep appreciation for nature and its calming effects on my mental well-bei"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value   Pr(>F)   
group  1  7.2383 0.008391 **
      98                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
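
To make the logic of Levene’s test more concrete: with center = median, it amounts to a one-way ANOVA on each score’s absolute deviation from its own group median. The sketch below reproduces that calculation in base R on simulated data (toy, score, and group are hypothetical names, not part of the real dataset).

```r
# simulated stand-in data; 'toy', 'score', and 'group' are hypothetical names
set.seed(42)
toy <- data.frame(
  score = c(rnorm(50, mean = 3.4, sd = 0.3), rnorm(50, mean = 3.2, sd = 0.2)),
  group = factor(rep(c("1", "2"), each = 50))
)

# absolute deviation of each score from its own group's median
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# a one-way ANOVA on those deviations gives the Levene F and p
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

If one group's scores are more spread out, its absolute deviations will be larger on average, and that difference is what the F test picks up.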

Check Your Assumptions

t-Test Assumptions

The assumptions are the same as those listed for the first t-test above. Here they are checked for the second IV, gender.

Checking IV levels

# preview the levels and counts for your IV
table(d$gender, useNA = "always")

   1    2 <NA> 
  16   84    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'BAD' should be removed
#d <- subset(d, gender != "BAD")

#table(d$gender, useNA = "always")

# to combine levels
# this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
#d$gender_rc[d$gender == "BAD"] <- "GOOD"

#table(d$gender_rc, useNA = "always")

# check your variable types
str(d)
'data.frame':   100 obs. of  10 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I’m Alejandro, a 34-year-old Latino male living in Austin, Texas. Passionate about music, I play guitar in a lo"| __truncated__ "I'm a 34-year-old White woman named Sarah, living in Portland, Oregon. I work as a graphic designer and love ex"| __truncated__ "I’m Jamal, a 34-year-old Black man from Atlanta. I work as a graphic designer and have a passion for street art"| __truncated__ "I'm a 35-year-old White woman named Sarah, living in Denver. As an environmental scientist, I'm passionate abou"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand the instructions." "I understand the instructions." "I understand the instructions." ...
 $ age      : int  34 34 34 35 32 21 44 25 37 34 ...
 $ race     : chr  "4" "6" "3" "6" ...
 $ gender   : int  1 2 1 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "In this safe environment, I felt a profound sense of calm wash over me. The bright sunlight and gentle sounds o"| __truncated__ "As I immerse myself in this serene VR environment, I feel an overwhelming sense of calm and safety. The gentle "| __truncated__ "As I relax in this safe environment, I feel a deep sense of calm and contentment. The soothing sounds and brigh"| __truncated__ "As I immersed myself in the safe environment, I felt an overwhelming sense of peace and contentment. The bright"| __truncated__ ...
 $ survey1  : num  3.62 3.38 3.62 3.38 3.62 ...
 $ ai_manip : chr  "I answered that way because I was reflecting on the peacefulness of my surroundings, which contrasts with the d"| __truncated__ "I answered based on my personal experiences and preferences, reflecting my appreciation for nature and creativi"| __truncated__ "I answered that way to convey the contrast between my laid-back persona and the calming environment, emphasizin"| __truncated__ "I answered the way I did to convey a deep appreciation for nature and its calming effects on my mental well-bei"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$gender <- as.factor(d$gender)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0603 0.8066
      98               

Issues with My Data

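Levene’s test was significant for condition, F(1, 98) = 7.24, p = .008, so the equal-variance assumption was violated for the first hypothesis. It was not significant for gender, F(1, 98) = 0.06, p = .807, so the assumption was met for the second hypothesis. Because t.test() runs Welch’s t-test by default, which does not assume equal variances, I proceeded with both analyses. The gender groups were also very unequal in size (16 vs. 84), so the gender comparison should be interpreted with caution.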

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

 t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = 3.4115, df = 88.1, p-value = 0.0009774
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 0.06888393 0.26111607
sample estimates:
mean in group 1 mean in group 2 
          3.415           3.250 
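
As a rough check on the output above, Welch’s t and its degrees of freedom can be recomputed from the group summaries alone. This sketch plugs in the group means from the t-test output and the rounded SDs and group sizes from describeBy(), so the results should land close to, but not exactly on, the values printed by t.test(). The variable names are illustrative only.

```r
# group summaries taken from the output above (SDs are rounded to two decimals)
m1 <- 3.415; s1 <- 0.28; n1 <- 50   # condition 1
m2 <- 3.250; s2 <- 0.20; n2 <- 50   # condition 2

# squared standard error of each group mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch's t: mean difference over the unpooled standard error
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# two-sided p-value
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

c(t = t_welch, df = df_welch, p = p_welch)
```

The non-integer degrees of freedom are the giveaway that Welch's correction was applied; Student's t-test would use n1 + n2 - 2 = 98.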

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 0.6822952 (medium)
95 percent confidence interval:
    lower     upper 
0.2740172 1.0905732 
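
For reference, Cohen’s d with a pooled standard deviation can be approximated by hand from the same group summaries, and the cutoffs listed above can be applied with a small helper. This is a sketch under two assumptions: the SDs are the rounded values from describeBy(), so the estimate will differ slightly from cohen.d(), and label_d() is a hypothetical helper written for this example, not a function from effsize.

```r
# group summaries taken from the output above (SDs are rounded to two decimals)
m1 <- 3.415; s1 <- 0.28; n1 <- 50   # condition 1
m2 <- 3.250; s2 <- 0.20; n2 <- 50   # condition 2

# pooled standard deviation across the two groups
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units
d_by_hand <- (m1 - m2) / s_pooled
d_by_hand

# hypothetical helper that applies the cutoffs listed above to |d|
label_d <- function(d) {
  as.character(cut(abs(d),
                   breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("trivial", "small", "medium", "large"),
                   right = FALSE))
}

label_d(d_by_hand)
```

The helper uses the absolute value of d, since the sign only reflects which group was entered first.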

Run a t-Test 2

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$gender)

View Test Output

 t_output

    Welch Two Sample t-test

data:  d$survey1 by d$gender
t = -0.49642, df = 23.407, p-value = 0.6242
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.16134947  0.09884947
sample estimates:
mean in group 1 mean in group 2 
        3.30625         3.33750 

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$gender)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.1222969 (negligible)
95 percent confidence interval:
     lower      upper 
-0.6638767  0.4192829 

Write Up Results

t-Test

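I predicted that participants with high perceived safety would report less stress than those with low perceived safety, and that female participants would report higher stress than male participants.

Levene’s test indicated unequal variances across conditions, F(1, 98) = 7.24, p = .008, but not across gender groups, F(1, 98) = 0.06, p = .807. I therefore report Welch’s t-test, which does not assume equal variances, for both hypotheses.

For the first hypothesis, survey1 scores differed significantly between condition 1 (M = 3.42, SD = 0.28) and condition 2 (M = 3.25, SD = 0.20), t(88.10) = 3.41, p < .001, 95% CI of the mean difference [0.07, 0.26], d = 0.68, a medium effect (Cohen, 1988). Condition 1 scored higher than condition 2, so whether this result supports the hypothesis depends on which condition was the high-safety one and on the direction in which survey1 is scored.

For the second hypothesis, survey1 scores did not differ significantly between gender group 1 (M = 3.31, n = 16) and gender group 2 (M = 3.34, n = 84), t(23.41) = -0.50, p = .624, 95% CI of the mean difference [-0.16, 0.10], d = -0.12, a trivial effect. The prediction that female participants would report higher stress was not supported.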
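
When writing up t-test results, it can help to format the statistics consistently rather than copying raw output. The helper below is a hypothetical convenience function (format_t() is not part of any package used here) that turns the t statistic, degrees of freedom, and p-value into an APA-style string. The two lists simply hold the values printed in the output above; an object returned by t.test() has the same element names and could be passed in directly.

```r
# hypothetical helper: format t, df, and p in APA style
format_t <- function(tt) {
  p <- tt$p.value
  # report very small p-values as "p < .001", otherwise drop the leading zero
  p_txt <- if (p < .001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("t(%.2f) = %.2f, %s", tt$parameter, tt$statistic, p_txt)
}

# values copied from the two t-test outputs above
h1 <- list(statistic = 3.4115,   parameter = 88.1,   p.value = 0.0009774)
h2 <- list(statistic = -0.49642, parameter = 23.407, p.value = 0.6242)

format_t(h1)
format_t(h2)
```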

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.