AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final results.csv", header = TRUE)

Hypothesis & T-Test 1

I predict that participants who read an article that does not align with the beliefs of their self-reported political party affiliation will report higher stress levels than those who read an article that does align with the beliefs of their party.

Given our hypothesis, I will be running a t-test.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)

           vars   n  mean    sd median trimmed   mad   min    max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
consent*      3 100 14.20  6.49  13.00   13.50  0.00  1.00  32.00 31.00  1.04
age           4 100 42.78 14.32  36.00   40.33  5.93 22.00  88.00 66.00  1.37
race          5 100  4.56  1.53   4.00    4.66  2.97  1.00   7.00  6.00 -0.27
gender        6 100  1.84  0.37   2.00    1.93  0.00  1.00   2.00  1.00 -1.83
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
survey1*      8 100  1.33  0.59   1.00    1.21  0.00  1.00   3.00  2.00  1.56
survey2       9 100  3.20  0.16   3.22    3.18  0.16  2.89   3.56  0.67  0.19
ai_manip*    10 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
Condition    11 100  1.50  0.50   1.50    1.50  0.74  1.00   2.00  1.00  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*       1.14 0.65
age            1.03 1.43
race          -1.48 0.15
gender         1.35 0.04
manip_out*    -1.24 2.90
survey1*       1.36 0.06
survey2       -0.65 0.02
ai_manip*     -1.24 2.90
Condition     -2.02 0.05

# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "Condition")


 Descriptive statistics by group 
Condition: 1
          vars  n  mean    sd median trimmed   mad   min    max range  skew
id           1 50 25.50 14.58  25.50   25.50 18.53  1.00  50.00 49.00  0.00
identity     2 50 47.98 30.49  50.50   47.48 38.55  1.00 100.00 99.00  0.01
consent      3 50 14.30  6.48  13.00   13.55  0.00  2.00  32.00 30.00  1.15
age          4 50 40.02 13.83  34.00   37.23  2.97 22.00  88.00 66.00  1.98
race         5 50  4.68  1.52   6.00    4.80  0.74  2.00   7.00  5.00 -0.35
gender       6 50  1.80  0.40   2.00    1.88  0.00  1.00   2.00  1.00 -1.46
manip_out    7 50 70.30 21.58  73.50   72.75 19.27 19.00  99.00 80.00 -0.88
survey1      8 50  1.38  0.67   1.00    1.23  0.00  1.00   3.00  2.00  1.45
survey2      9 50  3.20  0.17   3.22    3.19  0.16  2.89   3.56  0.67  0.24
ai_manip    10 50 46.52 30.53  41.50   45.85 40.03  1.00 100.00 99.00  0.11
Condition   11 50  1.00  0.00   1.00    1.00  0.00  1.00   1.00  0.00   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.36 4.31
consent       1.32 0.92
age           3.48 1.96
race         -1.49 0.21
gender        0.12 0.06
manip_out     0.01 3.05
survey1       0.70 0.09
survey2      -0.58 0.02
ai_manip     -1.36 4.32
Condition      NaN 0.00
------------------------------------------------------------ 
Condition: 2
          vars  n  mean    sd median trimmed   mad   min   max range  skew
id           1 50 75.50 14.58  75.50   75.50 18.53 51.00 100.0 49.00  0.00
identity     2 50 53.02 27.53  50.50   53.08 36.32  3.00  99.0 96.00  0.04
consent      3 50 14.10  6.58  13.00   13.45  0.00  1.00  31.0 30.00  0.91
age          4 50 45.54 14.39  39.50   43.55  8.15 26.00  79.0 53.00  0.90
race         5 50  4.44  1.54   4.00    4.53  2.97  1.00   6.0  5.00 -0.18
gender       6 50  1.88  0.33   2.00    1.98  0.00  1.00   2.0  1.00 -2.27
manip_out    7 50 30.70 20.85  29.50   29.40 24.46  1.00 100.0 99.00  0.77
survey1      8 50  1.28  0.50   1.00    1.20  0.00  1.00   3.0  2.00  1.43
survey2      9 50  3.19  0.16   3.22    3.18  0.16  2.89   3.5  0.61  0.11
ai_manip    10 50 54.48 27.13  53.00   54.60 36.32  2.00  99.0 97.00 -0.05
Condition   11 50  2.00  0.00   2.00    2.00  0.00  2.00   2.0  0.00   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.23 3.89
consent       0.79 0.93
age          -0.41 2.04
race         -1.51 0.22
gender        3.21 0.05
manip_out     0.78 2.95
survey1       1.02 0.07
survey2      -0.91 0.02
ai_manip     -1.18 3.84
Condition      NaN 0.00

# 
# # also use histograms and scatter plots to examine your continuous variables
hist(d$survey2)

#plot(d$survey2, d$survey1)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$Condition)


 1  2 
50 50

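# # note: "Condition" is quoted here, so it is treated as a constant label rather than the Condition column, and the table below shows only the overall survey1 counts
# # to cross the two variables, use cross_cases(d, Condition, survey1)
cross_cases(d, "Condition", survey1)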

	survey1
	Democratic	Independent	Republican
"Condition"
Condition	73	21	6
#Total cases	73	21	6

# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey2~d$Condition)

# 
# #convert any categorical variables to factors
#d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below. if issues, note and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
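
The normality check listed above relies on a rule of thumb: skew and kurtosis for the dependent variable should fall between -2 and +2. As a minimal sketch of what that check involves, the base R code below computes both statistics by hand. The data are simulated stand-ins (not our survey2 scores), the helper name skew_kurt is hypothetical, and the formulas assume the type-3 definitions that describe() is understood to use by default.

```r
# Simulated stand-in for the DV; in the real analysis this would be d$survey2
set.seed(1)
dv <- rnorm(100, mean = 3.2, sd = 0.16)

# Hypothetical helper: type-3 sample skew and excess kurtosis
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  s <- sd(x)                           # sd() uses the n - 1 denominator
  c(skew = mean(dev^3) / s^3,
    kurtosis = mean(dev^4) / s^4 - 3)  # subtract 3 so a normal curve scores 0
}

sk <- skew_kurt(dv)
within_bounds <- all(abs(sk) <= 2)     # the +/-2 rule of thumb
print(round(sk, 2))
print(within_bounds)
```

If within_bounds is FALSE for the real variable, the assumption list says to note the issue and proceed.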

Checking IV levels

# # preview the levels and counts for your IV
table(d$Condition, useNA = "always")


   1    2 <NA> 
  50   50    0

# 
# # note that the table() output shows you exactly how the levels of your variable are written. when re-coding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant whose party (survey1) is coded as 'Independent' should be removed
d <- subset(d, survey1 != "Independent")
# 
table(d$survey1, useNA = "always")


Democratic Republican       <NA> 
        73          6          0

# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
table(d$Condition, useNA = "always")


   1    2 <NA> 
  41   38    0

# 
# # check your variable types
str(d)

'data.frame':   79 obs. of  11 variables:
 $ id       : int  1 2 3 4 6 8 9 10 13 14 ...
 $ identity : chr  "I’m 51, a Black woman living in Atlanta. I identify as a Democrat but feel frustrated by the constant divisiven"| __truncated__ "I'm an 88-year-old Asian woman named Mei living in California. I’m deeply connected to my family but struggle w"| __truncated__ "I’m 32, White, and identify as a Democrat. I work in marketing and love creativity, but I often feel overwhelme"| __truncated__ "I’m a 37-year-old Black woman living in Atlanta. I’m passionate about social justice but often feel disillusion"| __truncated__ ...
 $ consent  : chr  "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ ...
 $ age      : int  51 88 32 37 82 22 34 61 70 57 ...
 $ race     : int  3 2 6 3 4 6 3 3 3 3 ...
 $ gender   : int  2 2 2 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "As I read the article, I felt a sense of validation and hope. The emphasis on inclusion and the scientific back"| __truncated__ "As I read the article, I felt a mixture of hope and frustration. The emphasis on inclusion resonates with my be"| __truncated__ "As I read the article, I felt a sense of relief and validation regarding my beliefs in inclusivity and equality"| __truncated__ "As I read the article, I felt a mix of hope and frustration. The emphasis on inclusion resonated with my passio"| __truncated__ ...
 $ survey1  : chr  "Democratic" "Democratic" "Democratic" "Democratic" ...
 $ survey2  : num  3.5 2.89 3 3.22 3.11 ...
 $ ai_manip : chr  "My identity as a Black woman and my Democratic values shaped my responses. The article resonated with my commit"| __truncated__ "As an 88-year-old Asian woman and a Democrat, my deep-rooted values of equality and inclusion shaped my respons"| __truncated__ "My identity as a 32-year-old White Democrat shaped my responses throughout the study. Engaging with opposing vi"| __truncated__ "My identity as a Black woman and my experiences with political division influenced my responses. The article sp"| __truncated__ ...
 $ Condition: int  1 1 1 1 1 1 1 1 1 1 ...

# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$Condition <- as.factor(d$Condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~Condition, data = d)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0365 0.8491
      77
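
To make the Levene's test output less of a black box, here is a rough sketch of what the median-centred version computes: a one-way ANOVA on each score's absolute deviation from its own group median. The data below are simulated with the same group sizes as our Condition variable (41 and 38), so the F and p values will not match the output above; only the degrees of freedom should.

```r
# Simulated stand-ins for survey2 and Condition (not the real data)
set.seed(2)
dv    <- c(rnorm(41, mean = 3.19, sd = 0.17), rnorm(38, mean = 3.19, sd = 0.17))
group <- factor(rep(c("1", "2"), times = c(41, 38)))

# Absolute deviation of each score from the median of its own group
abs_dev <- abs(dv - ave(dv, group, FUN = median))

# One-way ANOVA on those deviations; its F test is the median-centred Levene test
levene_by_hand <- anova(lm(abs_dev ~ group))
f_value <- levene_by_hand[["F value"]][1]
p_value <- levene_by_hand[["Pr(>F)"]][1]
df      <- levene_by_hand[["Df"]]

print(levene_by_hand)
```

A large p value here means the spread of scores is similar in both groups, which is what we are hoping for.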

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$Condition)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$survey2 by d$Condition
t = 0.010116, df = 76.923, p-value = 0.992
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.07681660  0.07760108
sample estimates:
mean in group 1 mean in group 2 
       3.191328        3.190936
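
Because t.test() defaults to Welch's version of the test, the degrees of freedom above are not a whole number. The sketch below works through the Welch statistic, degrees of freedom, and two-sided p value by hand on simulated data (not our survey scores) and compares them with what t.test() returns.

```r
# Simulated scores for two groups (stand-ins for survey2 by Condition)
set.seed(3)
g1 <- rnorm(41, mean = 3.19, sd = 0.17)
g2 <- rnorm(38, mean = 3.19, sd = 0.17)

# Squared standard error of each group mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
p_value  <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the built-in test (Welch is the default)
check <- t.test(g1, g2)
print(c(t = t_stat, df = df_welch, p = p_value))
print(check)
```

When reporting, the degrees of freedom go inside the parentheses and the t statistic after the equals sign.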

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$Condition)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: 0.002268502 (negligible)
95 percent confidence interval:
     lower      upper 
-0.4461223  0.4506593
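
As a quick illustration of where the d estimate comes from, the sketch below computes Cohen's d as the difference in group means divided by the pooled standard deviation, then labels it using the cut-offs listed above. The helper names cohens_d and d_label are hypothetical, the data are simulated, and the pooled-SD formula is assumed to match the default used by cohen.d().

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation
cohens_d <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Hypothetical helper: label |d| using the cut-offs from this report
d_label <- function(d) {
  as.character(cut(abs(d),
                   breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("trivial", "small", "medium", "large"),
                   right = FALSE))
}

# Simulated scores for two groups (not the real survey2 data)
set.seed(4)
g1 <- rnorm(41, mean = 3.19, sd = 0.17)
g2 <- rnorm(38, mean = 3.19, sd = 0.17)

d_by_hand <- cohens_d(g1, g2)
print(d_by_hand)
print(d_label(d_by_hand))
```

The sign of d only reflects which group is listed first, so the label is based on its absolute value.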

Write Up Results

t-Test

Using an independent-samples Welch's t-test, we tested our hypothesis that Democrats and Republicans would report higher levels of stress after reading an article that did not align with their reported political party than after reading one that did. Our data met all of the assumptions of a t-test, but we did not find a statistically significant difference, t(76.92) = 0.01, p = .992, d = 0.002, 95% CI [-0.446, 0.451] (refer to Fig. 1).

We dropped participants who had a political party affiliation other than Democratic or Republican (i.e., Independent). We also confirmed homogeneity of variance using Levene’s test (p = .849) and that our dependent variable is normally distributed (skew and kurtosis between -2 and +2).

Our effect size was negligible according to Cohen (1988).
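
Since it is easy to swap the t statistic and the degrees of freedom when typing up results, one option is to build the results string directly from the numbers. The helper below, format_apa_t, is a hypothetical convenience function (not part of any package used here); the example feeds it the values printed in the t-test and Cohen's d output above.

```r
# Hypothetical helper: build a results string with df in the parentheses
format_apa_t <- function(t_stat, df, p, d, d_ci) {
  sprintf("t(%.2f) = %.2f, p = %.3f, d = %.3f, 95%% CI [%.3f, %.3f]",
          unname(df), unname(t_stat), p, d, unname(d_ci[1]), unname(d_ci[2]))
}

# Values copied from the output above; with live objects these would be
# t_output$statistic, t_output$parameter, t_output$p.value,
# d_output$estimate and d_output$conf.int
result_text <- format_apa_t(t_stat = 0.010116, df = 76.923, p = 0.992,
                            d = 0.002268502,
                            d_ci = c(-0.4461223, 0.4506593))
print(result_text)
```

This sketch keeps the leading zero on p; drop it by hand if your style guide asks for that.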

Hypothesis & T-Test 2

I predict that self-reported political party affiliation will predict perceived stress levels, such that participants identifying as Republican will report higher stress levels compared to other affiliations, regardless of article alignment.

Given our hypothesis, I will be running a t-test.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)

           vars  n  mean    sd median trimmed   mad   min    max range  skew
id            1 79 50.00 28.37  49.00   49.89 35.58  1.00 100.00 99.00  0.04
identity*     2 79 40.00 22.95  40.00   40.00 29.65  1.00  79.00 78.00  0.00
consent*      3 79 10.19  4.59   9.00    9.72  0.00  1.00  23.00 22.00  1.18
age           4 79 43.99 14.83  38.00   41.94  5.93 22.00  88.00 66.00  1.21
race          5 79  4.52  1.55   4.00    4.63  2.97  1.00   6.00  5.00 -0.28
gender        6 79  1.87  0.33   2.00    1.95  0.00  1.00   2.00  1.00 -2.20
manip_out*    7 79 40.00 22.95  40.00   40.00 29.65  1.00  79.00 78.00  0.00
survey1*      8 79  1.08  0.27   1.00    1.00  0.00  1.00   2.00  1.00  3.14
survey2       9 79  3.19  0.17   3.22    3.18  0.16  2.89   3.56  0.67  0.35
ai_manip*    10 79 40.00 22.95  40.00   40.00 29.65  1.00  79.00 78.00  0.00
Condition*   11 79  1.48  0.50   1.00    1.48  0.00  1.00   2.00  1.00  0.07
           kurtosis   se
id            -1.16 3.19
identity*     -1.25 2.58
consent*       1.31 0.52
age            0.55 1.67
race          -1.53 0.17
gender         2.89 0.04
manip_out*    -1.25 2.58
survey1*       7.97 0.03
survey2       -0.68 0.02
ai_manip*     -1.25 2.58
Condition*    -2.02 0.06

# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "survey1")


 Descriptive statistics by group 
survey1: 1
          vars  n  mean    sd median trimmed   mad   min    max range  skew
id           1 73 50.27 28.88  51.00   50.22 35.58  1.00 100.00 99.00 -0.01
identity     2 73 39.68 23.27  39.00   39.61 31.13  1.00  79.00 78.00  0.05
consent      3 73 10.37  4.69   9.00    9.86  0.00  1.00  23.00 22.00  1.14
age          4 73 44.47 15.29  37.00   42.32  5.93 22.00  88.00 66.00  1.12
race         5 73  4.40  1.55   4.00    4.49  2.97  1.00   6.00  5.00 -0.14
gender       6 73  1.90  0.30   2.00    2.00  0.00  1.00   2.00  1.00 -2.69
manip_out    7 73 41.58 23.01  43.00   41.86 29.65  1.00  79.00 78.00 -0.11
survey1      8 73  1.00  0.00   1.00    1.00  0.00  1.00   1.00  0.00   NaN
survey2      9 73  3.19  0.17   3.22    3.17  0.16  2.89   3.56  0.67  0.35
ai_manip    10 73 39.77 21.86  40.00   39.97 26.69  1.00  79.00 78.00 -0.04
Condition   11 73  1.51  0.50   2.00    1.51  0.00  1.00   2.00  1.00 -0.03
          kurtosis   se
id           -1.21 3.38
identity     -1.25 2.72
consent       1.02 0.55
age           0.25 1.79
race         -1.57 0.18
gender        5.30 0.03
manip_out    -1.24 2.69
survey1        NaN 0.00
survey2      -0.63 0.02
ai_manip     -1.16 2.56
Condition    -2.03 0.06
------------------------------------------------------------ 
survey1: 2
          vars n  mean    sd median trimmed   mad min  max range  skew kurtosis
id           1 6 46.67 23.11  41.50   46.67  9.64  22 90.0  68.0  0.87    -0.72
identity     2 6 43.83 19.93  54.00   43.83  6.67  12 59.0  47.0 -0.61    -1.68
consent      3 6  8.00  2.45   9.00    8.00  0.00   3  9.0   6.0 -1.36    -0.08
age          4 6 38.17  4.40  39.00   38.17  5.19  32 43.0  11.0 -0.26    -1.88
race         5 6  6.00  0.00   6.00    6.00  0.00   6  6.0   0.0   NaN      NaN
gender       6 6  1.50  0.55   1.50    1.50  0.74   1  2.0   1.0  0.00    -2.31
manip_out    7 6 20.83 10.76  26.50   20.83  2.97   3 29.0  26.0 -0.68    -1.55
survey1      8 6  2.00  0.00   2.00    2.00  0.00   2  2.0   0.0   NaN      NaN
survey2      9 6  3.26  0.20   3.22    3.26  0.25   3  3.5   0.5  0.14    -1.87
ai_manip    10 6 42.83 36.35  43.00   42.83 48.93   8 77.0  69.0  0.00    -2.30
Condition   11 6  1.17  0.41   1.00    1.17  0.00   1  2.0   1.0  1.36    -0.08
             se
id         9.44
identity   8.14
consent    1.00
age        1.80
race       0.00
gender     0.22
manip_out  4.39
survey1    0.00
survey2    0.08
ai_manip  14.84
Condition  0.17

# 
# # also use histograms and scatter plots to examine your continuous variables
hist(d$survey2)

#plot(d$survey2, d$survey1)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$survey1)


Democratic Republican 
        73          6

#cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey2~d$survey1)

# 
# #convert any categorical variables to factors
#d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below. if issues, note and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
table(d$survey1, useNA = "always")


Democratic Republican       <NA> 
        73          6          0

# 
# # note that the table() output shows you exactly how the levels of your variable are written. when re-coding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
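# # this subsets the data and says that any participant whose party (survey1) is coded as 'Independent' should be removed
# # (Independents were already dropped for T-Test 1, so this line changes nothing here)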
d <- subset(d, survey1 != "Independent")
# 
#table(d$iv, useNA = "always")
# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
table(d$survey1, useNA = "always")


Democratic Republican       <NA> 
        73          6          0

# 
# # check your variable types
str(d)

'data.frame':   79 obs. of  11 variables:
 $ id       : int  1 2 3 4 6 8 9 10 13 14 ...
 $ identity : chr  "I’m 51, a Black woman living in Atlanta. I identify as a Democrat but feel frustrated by the constant divisiven"| __truncated__ "I'm an 88-year-old Asian woman named Mei living in California. I’m deeply connected to my family but struggle w"| __truncated__ "I’m 32, White, and identify as a Democrat. I work in marketing and love creativity, but I often feel overwhelme"| __truncated__ "I’m a 37-year-old Black woman living in Atlanta. I’m passionate about social justice but often feel disillusion"| __truncated__ ...
 $ consent  : chr  "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ "I understand the instructions. I will identify my political party, read the article about transgender athletes,"| __truncated__ ...
 $ age      : int  51 88 32 37 82 22 34 61 70 57 ...
 $ race     : int  3 2 6 3 4 6 3 3 3 3 ...
 $ gender   : int  2 2 2 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "As I read the article, I felt a sense of validation and hope. The emphasis on inclusion and the scientific back"| __truncated__ "As I read the article, I felt a mixture of hope and frustration. The emphasis on inclusion resonates with my be"| __truncated__ "As I read the article, I felt a sense of relief and validation regarding my beliefs in inclusivity and equality"| __truncated__ "As I read the article, I felt a mix of hope and frustration. The emphasis on inclusion resonated with my passio"| __truncated__ ...
 $ survey1  : chr  "Democratic" "Democratic" "Democratic" "Democratic" ...
 $ survey2  : num  3.5 2.89 3 3.22 3.11 ...
 $ ai_manip : chr  "My identity as a Black woman and my Democratic values shaped my responses. The article resonated with my commit"| __truncated__ "As an 88-year-old Asian woman and a Democrat, my deep-rooted values of equality and inclusion shaped my respons"| __truncated__ "My identity as a 32-year-old White Democrat shaped my responses throughout the study. Engaging with opposing vi"| __truncated__ "My identity as a Black woman and my experiences with political division influenced my responses. The article sp"| __truncated__ ...
 $ Condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$survey1 <- as.factor(d$survey1)

Testing Homogeneity of Variance with Levene’s Test

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~survey1, data = d)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.1371 0.7122
      77

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$survey1)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$survey2 by d$survey1
t = -0.86163, df = 5.5817, p-value = 0.4244
alternative hypothesis: true difference in means between group Democratic and group Republican is not equal to 0
95 percent confidence interval:
 -0.2869317  0.1394939
sample estimates:
mean in group Democratic mean in group Republican 
                3.185540                 3.259259
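
The degrees of freedom for this test are much smaller than in T-Test 1 because the groups are so unbalanced (73 vs. 6). As a rough sketch of why, the code below plugs the rounded group standard deviations and sizes from the describeBy() output into the Welch-Satterthwaite formula. The helper name welch_df is hypothetical, and because the inputs are rounded, the result only approximates the df reported above.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom from summary stats
welch_df <- function(sd1, n1, sd2, n2) {
  v1 <- sd1^2 / n1   # squared standard error of the group 1 mean
  v2 <- sd2^2 / n2   # squared standard error of the group 2 mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Rounded survey2 values from describeBy(): Democratic (sd 0.17, n 73),
# Republican (sd 0.20, n 6)
df_unbalanced <- welch_df(sd1 = 0.17, n1 = 73, sd2 = 0.20, n2 = 6)

# For comparison: the same spread with the 79 participants split 40 / 39
df_balanced <- welch_df(sd1 = 0.17, n1 = 40, sd2 = 0.20, n2 = 39)

print(c(unbalanced = df_unbalanced, balanced = df_balanced))
```

With one group of only 6, most of the uncertainty comes from that group, so the effective df ends up close to 6 - 1 rather than near the total sample size.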

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$survey1)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: -0.4291667 (small)
95 percent confidence interval:
     lower      upper 
-1.2775697  0.4192362

Write Up Results

t-Test

Using an independent-samples Welch's t-test, we tested our hypothesis that participants identifying as Republican would report higher levels of stress than those identifying as Democratic, regardless of article alignment. Our data met all of the assumptions of a t-test, but we did not find a statistically significant difference, t(5.58) = -0.86, p = .424, d = -0.429, 95% CI [-1.278, 0.419] (refer to Fig. 1).

We dropped participants who had a political party affiliation other than Democratic or Republican (i.e., Independent). We also confirmed homogeneity of variance using Levene’s test (p = .712) and that our dependent variable is normally distributed (skew and kurtosis between -2 and +2).

Our effect size was small according to Cohen (1988), although its confidence interval includes zero and the Republican group was very small (n = 6).

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.