AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# # import your AI results dataset
d <- read.csv(file = "Data/final_data(in).csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: I predict that participants who write an essay about their favorite childhood memory will report higher levels of mindfulness than participants who write an essay about their favorite course they took in college.

H2: I predict that women will have higher mindfulness scores than men.

H1 test: the IV has two levels (childhood memory and college course) and the DV (mindfulness) is continuous, so I will use a t-test.

H2 test: the IV has two levels (male and female) and the DV (mindfulness) is continuous, so I will use a t-test.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
           vars   n  mean    sd median trimmed   mad   min    max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
consent*      3 100 30.88 17.37  26.50   30.26 19.27  1.00  66.00 65.00  0.30
age           4 100 19.81  1.36  20.00   19.59  1.48 17.00  27.00 10.00  2.70
race          5 100  4.94  1.46   6.00    5.08  0.00  1.00   7.00  6.00 -0.68
gender        6 100  1.87  0.81   2.00    1.82  0.00  1.00   7.00  6.00  3.37
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
survey1       8 100  4.24  0.27   4.27    4.26  0.20  3.47   4.73  1.27 -0.90
survey2       9   0   NaN    NA     NA     NaN    NA   Inf   -Inf  -Inf    NA
ai_manip*    10 100 47.92 25.69  50.50   48.77 33.36  1.00  88.00 87.00 -0.23
condition    11 100  1.50  0.50   1.50    1.50  0.74  1.00   2.00  1.00  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -0.98 1.74
age            9.95 0.14
race          -1.11 0.15
gender        17.64 0.08
manip_out*    -1.24 2.90
survey1        0.88 0.03
survey2          NA   NA
ai_manip*     -1.29 2.57
condition     -2.02 0.05
# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad   min    max range  skew
id           1 50 25.50 14.58  25.50   25.50 18.53  1.00  50.00 49.00  0.00
identity     2 50 55.06 29.38  52.00   55.25 43.00  9.00 100.00 91.00 -0.04
consent      3 50 30.74 18.72  26.00   29.90 22.98  1.00  66.00 65.00  0.36
age          4 50 19.62  1.03  19.50   19.50  0.74 18.00  25.00  7.00  2.78
race         5 50  4.72  1.53   6.00    4.82  0.00  1.00   7.00  6.00 -0.45
gender       6 50  1.96  0.95   2.00    1.88  0.00  1.00   7.00  6.00  3.47
manip_out    7 50 30.98 20.77  27.50   29.25 18.53  3.00 100.00 97.00  0.93
survey1      8 50  4.25  0.27   4.27    4.27  0.20  3.47   4.73  1.27 -0.89
survey2      9  0   NaN    NA     NA     NaN    NA   Inf   -Inf  -Inf    NA
ai_manip    10 50 46.68 26.02  46.00   47.15 36.32  2.00  88.00 86.00 -0.12
condition   11 50  1.00  0.00   1.00    1.00  0.00  1.00   1.00  0.00   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.42 4.15
consent      -1.02 2.65
age          12.51 0.15
race         -1.28 0.22
gender       15.40 0.13
manip_out     0.67 2.94
survey1       0.58 0.04
survey2         NA   NA
ai_manip     -1.35 3.68
condition      NaN 0.00
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad   min    max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53 51.00 100.00  49.0  0.00
identity     2 50 45.94 28.20   49.5   45.58 32.62  1.00  98.00  97.0  0.01
consent      3 50 31.02 16.09   27.0   30.73 17.79  3.00  61.00  58.0  0.19
age          4 50 20.00  1.62   20.0   19.70  0.00 17.00  27.00  10.0  2.30
race         5 50  5.16  1.38    6.0    5.32  0.00  2.00   7.00   5.0 -0.93
gender       6 50  1.78  0.65    2.0    1.77  0.00  1.00   5.00   4.0  1.99
manip_out    7 50 70.02 22.17   74.5   72.03 23.72  1.00  99.00  98.0 -1.08
survey1      8 50  4.23  0.26    4.2    4.25  0.20  3.47   4.67   1.2 -0.89
survey2      9  0   NaN    NA     NA     NaN    NA   Inf   -Inf  -Inf    NA
ai_manip    10 50 49.16 25.57   54.5   50.42 28.91  1.00  87.00  86.0 -0.34
condition   11 50  2.00  0.00    2.0    2.00  0.00  2.00   2.00   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.19 3.99
consent      -1.13 2.28
age           6.63 0.23
race         -0.92 0.19
gender        9.79 0.09
manip_out     1.27 3.14
survey1       1.05 0.04
survey2         NA   NA
ai_manip     -1.26 3.62
condition      NaN 0.00
# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

# plot(d$dv, d$tv)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
# cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

boxplot(d$survey1~d$gender)

# 
# #convert any categorical variables to factors
# d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# check your variable types
str(d)
'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm 19 and a proud Latina majoring in environmental science. I love nature but often feel overwhelmed balancing"| __truncated__ "I’m 21 and a junior studying Environmental Science at a state university. Feeling disconnected from my peers, I"| __truncated__ "I'm a 19-year-old White college student studying psychology. I'm passionate yet often overwhelmed by anxiety an"| __truncated__ "I’m Jada, a 20-year-old Black woman studying sociology. I’m passionate about social justice but often feel over"| __truncated__ ...
 $ consent  : chr  "I understand the instructions. I will respond to the memory question with a 200-word response and then complete"| __truncated__ "I understand the instructions. I will respond to the memory question with a 200-word response and then complete"| __truncated__ "I understand the instructions. I will respond to a memory question with a 200-word response, and then I will co"| __truncated__ "I understand the instructions. I'm ready to respond to the memory question and complete the two surveys on mind"| __truncated__ ...
 $ age      : int  19 21 19 20 20 19 20 20 19 21 ...
 $ race     : int  4 6 6 3 3 3 6 3 4 6 ...
 $ gender   : int  2 2 2 2 2 2 1 2 2 2 ...
 $ manip_out: chr  "One of my fondest childhood memories is a summer camping trip my family took to a national park. I was around e"| __truncated__ "One of my favorite childhood memories takes me back to a summer spent at my grandmother's house. Every Saturday"| __truncated__ "One of my fondest childhood memories is the summer camping trip my family took to a quiet lake nestled in the w"| __truncated__ "One of my fondest childhood memories is from a summer family picnic in the park when I was about eight years ol"| __truncated__ ...
 $ survey1  : num  4.13 4.4 4.27 3.47 4.4 ...
 $ survey2  : logi  NA NA NA NA NA NA ...
 $ ai_manip : chr  "Thank you for participating!" "I'm 21 and a junior studying Environmental Science at a state university. Feeling disconnected from my peers, I"| __truncated__ "I'm a 19-year-old White college student studying psychology. I'm passionate yet often overwhelmed by anxiety an"| __truncated__ "Thank you for participating!" ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0013  0.971
      98               
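
As a rough illustration of what leveneTest() reports with center = median, the test can be reproduced as a one-way ANOVA on each score's absolute deviation from its group median. The sketch below uses simulated data rather than the study data, and the names toy, abs_dev, and levene_by_hand are hypothetical.

```r
# Simulated stand-in for the real data (assumption: two groups of 50)
set.seed(1)
toy <- data.frame(
  score = rnorm(100, mean = 4.2, sd = 0.27),
  group = factor(rep(c("1", "2"), each = 50))
)

# Absolute deviation of each score from its own group's median
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# One-way ANOVA on those deviations; its F test plays the role of Levene's test
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
print(levene_by_hand)
```

With two groups of 50, the degrees of freedom are 1 and 98, the same layout as the leveneTest() output above. A non-significant F means the groups' spreads around their medians are similar.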

Checking IV levels

# # preview the levels and counts for your IV
# table(d$iv, useNA = "always")
# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and removes any participant whose gender is coded as 5 or 7
d <- subset(d, gender != 5)
d <- subset(d, gender != 7)
# 
table(d$gender, useNA = "always")

   1    2 <NA> 
  24   73    0 
# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
# table(d$iv, useNA = "always")
# 
# # check your variable types
# str(d)
# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$gender <- as.factor(d$gender)

Testing Homogeneity of Variance with Levene’s Test

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  1.0624 0.3053
      95               

Issues with My Data

I dropped participants who identified as non-binary or “another gender not listed here”, leaving 97 male and female participants in the study. For the first hypothesis, Levene’s test indicated homogeneity of variance (p = .971), and my dependent variable was approximately normally distributed, with skew and kurtosis between -2 and 2. For the second hypothesis, Levene’s test also indicated homogeneity of variance (p = .305), and my dependent variable was again approximately normally distributed, with skew and kurtosis between -2 and 2.
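
The -2 to 2 rule of thumb can be checked directly. The sketch below is a minimal base-R version that uses the simple moment-based definitions of skew and excess kurtosis, so its values may differ slightly from what describe() prints. The function names skewness, excess_kurtosis, and within_rule are hypothetical.

```r
# Moment-based skewness (assumption: simple population-moment estimator)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis (0 for a normal distribution)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# TRUE when both statistics fall between -2 and 2
within_rule <- function(x) {
  abs(skewness(x)) <= 2 && abs(excess_kurtosis(x)) <= 2
}

# Small symmetric example: skew is 0 and excess kurtosis is -1.3
example_scores <- c(1, 2, 3, 4, 5)
skewness(example_scores)
excess_kurtosis(example_scores)
within_rule(example_scores)
```

To apply this to the real data, the same functions could be called on d$survey1, or within each level of the IV using tapply().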

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = 0.3147, df = 94.857, p-value = 0.7537
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.09073142  0.12491509
sample estimates:
mean in group 1 mean in group 2 
       4.237500        4.220408 
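
The non-integer degrees of freedom in this output come from the Welch correction, which t.test() applies by default and which does not assume equal variances. The sketch below reproduces the Welch statistic and degrees of freedom by hand on simulated data, not the study data. The vectors g1 and g2 and their means and SDs are assumptions chosen only for illustration.

```r
# Simulated stand-ins for the two conditions (assumption: not the real data)
set.seed(42)
g1 <- rnorm(50, mean = 4.25, sd = 0.27)
g2 <- rnorm(50, mean = 4.23, sd = 0.26)

# Squared standard error contributed by each group
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic: mean difference over the unpooled standard error
t_stat <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

# Compare with R's built-in Welch test
check <- t.test(g1, g2)
c(by_hand = t_stat, built_in = unname(check$statistic))
c(by_hand = df_welch, built_in = unname(check$parameter))
```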

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
# d_output <- cohen.d(d$survey1~d$condition)
# no significant difference, so we don't need to look at effect size

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
# d_output

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$gender)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$gender
t = 2.1321, df = 49.582, p-value = 0.03798
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 0.006735557 0.226597776
sample estimates:
mean in group 1 mean in group 2 
       4.316667        4.200000 

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$gender)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 0.4441989 (small)
95 percent confidence interval:
      lower       upper 
-0.02719842  0.91559619 
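
For intuition about where this number comes from, Cohen's d is the difference in group means divided by a standard deviation. The sketch below assumes the pooled-standard-deviation definition and uses the cutoffs listed above. The function names cohens_d and label_d are hypothetical, and treating a value exactly on a cutoff as the larger category is an assumption.

```r
# Cohen's d with a pooled standard deviation (assumed definition)
cohens_d <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Label the magnitude of d using the cutoffs .2, .5, and .8
label_d <- function(d) {
  as.character(cut(
    abs(d),
    breaks = c(-Inf, 0.2, 0.5, 0.8, Inf),
    labels = c("trivial", "small", "medium", "large"),
    right = FALSE
  ))
}

# Toy example: means differ by 1 and both SDs are 1, so d is -1
cohens_d(c(1, 2, 3), c(2, 3, 4))

# The estimate reported above falls in the "small" band
label_d(0.4441989)
```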

Write Up Results

t-Test

I tested my hypothesis that participants who wrote a short essay about their favorite childhood memory would have a higher average mindful attention awareness score than participants who wrote an essay about their favorite college course using Welch's independent-samples t-test. My data met all of the assumptions of a t-test. However, I did not find a significant difference, t(94.86) = 0.31, p = .754, 95% CI [-0.091, 0.125] (refer to Figure 1).

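I tested my hypothesis that women would have a higher average mindful attention awareness score than men using Welch's independent-samples t-test. My data met all of the assumptions of a t-test. I did find a significant difference, t(49.58) = 2.13, p = .038, d = 0.44, 95% CI [0.007, 0.227] (refer to Figure 2), with group 1 (M = 4.32) scoring higher than group 2 (M = 4.20). Because group 1 had the higher mean, this result supports my hypothesis only if group 1 is coded as women. This effect size is small according to Cohen (1988).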
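
Rounding by hand is an easy place to slip when copying values into a write-up. The sketch below shows one way to build the reporting string from the numbers in the t-test output. The helper names format_p and format_t_apa are hypothetical, and the choice of two decimals for t and df and three for the confidence interval is an assumption about the reporting style.

```r
# Format a p-value without the leading zero, e.g. 0.7537 becomes "= .754"
format_p <- function(p) {
  if (p < .001) {
    "< .001"
  } else {
    paste("=", sub("^0", "", sprintf("%.3f", p)))
  }
}

# Build a reporting string from t, df, p, and the confidence interval
format_t_apa <- function(t, df, p, ci) {
  sprintf(
    "t(%.2f) = %.2f, p %s, 95%% CI [%.3f, %.3f]",
    df, t, format_p(p), ci[1], ci[2]
  )
}

# Values copied from the first t-test output above
h1_report <- format_t_apa(
  t = 0.3147, df = 94.857, p = 0.7537,
  ci = c(-0.09073142, 0.12491509)
)
h1_report
# "t(94.86) = 0.31, p = .754, 95% CI [-0.091, 0.125]"
```

The same helper could be fed directly from a stored test object, using t_output$statistic, t_output$parameter, t_output$p.value, and t_output$conf.int.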

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.