AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final_results(in).csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: Participants in the high stress condition will report lower markers of adulthood independence than those in the low stress condition.

H2: Higher education levels will be associated with higher markers of adulthood independence.

Test 1: t-Test

Test 2: One-Way ANOVA

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)

           vars   n  mean    sd median trimmed   mad min max range  skew
id            1 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
identity*     2 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
consent*      3 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
age           4 100 38.99 11.53   34.5   38.27  8.15  18  80    62  0.79
race          5 100  4.71  1.54    6.0    4.82  0.00   2   7     5 -0.39
gender        6 100  1.99  0.10    2.0    2.00  0.00   1   2     1 -9.70
manip_out*    7 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
survey1       8 100  3.97  0.47    4.0    3.96  0.59   2   5     3 -0.36
survey2       9 100  5.26  0.84    5.0    5.39  0.00   2   6     4 -1.84
ai_manip*    10 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
condition    11 100  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.24 2.90
age            0.48 1.15
race          -1.55 0.15
gender        93.06 0.01
manip_out*    -1.24 2.90
survey1        2.06 0.05
survey2        4.92 0.08
ai_manip*     -1.24 2.90
condition     -2.02 0.05

# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = d$condition)


 Descriptive statistics by group 
group: 1
          vars  n  mean    sd median trimmed   mad  min   max range  skew
id           1 50 25.50 14.58   25.5   25.50 18.53  1.0  50.0  49.0  0.00
identity     2 50 48.92 27.47   49.5   48.80 37.06  5.0  93.0  88.0  0.05
consent      3 50 48.88 29.23   49.5   48.80 36.32  1.0 100.0  99.0 -0.06
age          4 50 38.36  9.95   35.0   37.80  4.45 20.0  62.0  42.0  0.60
race         5 50  4.64  1.60    6.0    4.78  0.00  2.0   7.0   5.0 -0.35
gender       6 50  1.98  0.14    2.0    2.00  0.00  1.0   2.0   1.0 -6.65
manip_out    7 50 51.12 35.68   65.5   51.27 47.44  1.0 100.0  99.0 -0.11
survey1      8 50  4.02  0.38    4.0    4.01  0.30  3.4   4.8   1.4  0.38
survey2      9 50  5.30  0.81    5.0    5.42  1.48  2.0   6.0   4.0 -1.69
ai_manip    10 50 47.62 28.38   43.0   47.00 32.62  2.0  99.0  97.0  0.18
condition   11 50  1.00  0.00    1.0    1.00  0.00  1.0   1.0   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.31 3.88
consent      -1.22 4.13
age          -0.38 1.41
race         -1.60 0.23
gender       43.12 0.02
manip_out    -1.70 5.05
survey1      -0.63 0.05
survey2       4.17 0.12
ai_manip     -1.22 4.01
condition      NaN 0.00
------------------------------------------------------------ 
group: 2
          vars  n  mean    sd median trimmed   mad min max range  skew kurtosis
id           1 50 75.50 14.58   75.5   75.50 18.53  51 100    49  0.00    -1.27
identity     2 50 52.08 30.68   54.5   52.45 37.06   1 100    99 -0.07    -1.27
consent      3 50 52.12 28.99   52.0   52.10 39.29   2  98    96  0.07    -1.37
age          4 50 39.62 12.99   34.0   38.75  8.90  18  80    62  0.78     0.30
race         5 50  4.78  1.49    6.0    4.88  0.00   2   7     5 -0.39    -1.59
gender       6 50  2.00  0.00    2.0    2.00  0.00   2   2     0   NaN      NaN
manip_out    7 50 49.88 20.66   47.5   49.12 18.53  10  91    81  0.37    -0.72
survey1      8 50  3.91  0.53    3.8    3.90  0.30   2   5     3 -0.47     1.87
survey2      9 50  5.22  0.86    5.0    5.35  0.00   2   6     4 -1.91     5.13
ai_manip    10 50 53.38 29.64   53.5   53.98 35.58   1 100    99 -0.18    -1.23
condition   11 50  2.00  0.00    2.0    2.00  0.00   2   2     0   NaN      NaN
            se
id        2.06
identity  4.34
consent   4.10
age       1.84
race      0.21
gender    0.00
manip_out 2.92
survey1   0.08
survey2   0.12
ai_manip  4.19
condition 0.00

# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

plot(d$survey1, d$survey2)

# and table() and cross_cases() to examine your categorical variables
# you may not need the cross_cases code
table(d$condition)


 1  2 
50 50

cross_cases(d, d$condition, d$survey2)

               d$survey2
               2    3    4    5    6
d$condition
1              1    1    2   24   22
2              2         2   27   19
#Total cases   3    1    4   51   41

# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

boxplot(d$survey1~d$survey2)

# convert any categorical variables to factors
d$survey2 <- as.factor(d$survey2)

Check Your Assumptions

t-Test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")


   1    2 <NA> 
  50   50    0

# check your variable types
str(d)

'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm 35 years old, Black, and work as a social worker in my community. I hold a master's degree and am passionat"| __truncated__ "I’m a 34-year-old White woman named Sarah. I hold a bachelor's degree in marketing and work in a small agency. "| __truncated__ "I’m a 33-year-old Black woman with a Master’s degree in social work. I’m passionate about helping others but cu"| __truncated__ "I'm a 35-year-old White woman working as a marketing manager in a bustling city. I hold a Master's degree in Bu"| __truncated__ ...
 $ consent  : chr  "Thank you for providing that context. I understand your identity and the challenges you're facing as a social w"| __truncated__ "Thank you for sharing that information, Sarah. I'm ready to assist you with the writing task and any questions "| __truncated__ "Thank you for sharing your identity and the context of the study. I'm here to help you with the writing task an"| __truncated__ "Thank you for the introduction. I'm ready to assist you with the writing task and any follow-up survey you may "| __truncated__ ...
 $ age      : int  35 34 33 35 36 29 33 35 34 32 ...
 $ race     : int  3 6 3 6 6 4 3 6 6 2 ...
 $ gender   : int  2 2 2 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "There was a period about two years ago when I felt incredibly overwhelmed. At that time, I was juggling my resp"| __truncated__ "There was a time about a year ago when I felt completely overwhelmed, and it was during the height of a busy se"| __truncated__ "There was a time about a year ago when I felt completely overwhelmed. I had recently taken on a new project at "| __truncated__ "A couple of years ago, I found myself in an incredibly overwhelming situation. My toddler was going through a p"| __truncated__ ...
 $ survey1  : num  4.4 3.8 4 4 3.6 4 4.4 4 4.2 3.6 ...
 $ survey2  : Factor w/ 5 levels "2","3","4","5",..: 5 4 5 5 4 4 4 4 5 5 ...
 $ ai_manip : chr  "Reflecting on my responses, I recognize that my identity as a 35-year-old Black social worker deeply influenced"| __truncated__ "My identity as a 34-year-old White woman balancing a marketing career and motherhood significantly influenced m"| __truncated__ "Reflecting on my experiences, I can see how my identity as a 33-year-old Black woman with a Master’s degree in "| __truncated__ "Reflecting on my experiences, I can see how my identity as a 35-year-old White woman with a Master's degree in "| __truncated__ ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...

# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1 ~ condition, data = d)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  2.3157 0.1313
      98
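
To make the logic of Levene's test more concrete, here is a minimal sketch of what the median-centered version does under the hood: it runs a one-way ANOVA on each score's absolute distance from its own group median. The `toy` data frame and its column names are made up for illustration (they are not the study data), and this is a hand-rolled approximation of the idea rather than the car package's actual code.

```r
# Hypothetical data: two groups of 50 with slightly different spreads
set.seed(1)
toy <- data.frame(
  score = c(rnorm(50, mean = 4, sd = 0.4), rnorm(50, mean = 4, sd = 0.5)),
  group = factor(rep(c("1", "2"), each = 50))
)

# Step 1: absolute deviation of each score from its own group's median
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# Step 2: one-way ANOVA on those deviations; the F test for 'group'
# is the Levene statistic (df = 1 and 98 here, as in the output above)
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

A non-significant F here means the groups' typical distances from their medians are similar, which is the result we hope for.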

ANOVA Assumptions

DV should be normally distributed across levels of the IV
All levels of the IVs should have an equal number of cases and there should be no empty cells. Cells with low numbers decrease the power of the test (increase the chance of Type II error)
Homogeneity of variance should be assured
Outliers should be identified and removed
If you have confirmed everything above, the sampling distribution should be normal. (For a demonstration of what the sampling distribution is, go here.)

Check levels of IVs and combine/drop if needed

# preview the levels and counts for your IV
table(d$survey2, useNA = "always")


   2    3    4    5    6 <NA> 
   3    1    4   51   41    0
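
Levels 2, 3, and 4 have very few cases. If you decided to combine them, one possible approach is sketched below. The `toy` data frame only reproduces the level counts from the table above, and the `survey2_rc` name and the "2-4" label are hypothetical choices. This recode is not applied in the rest of this analysis, which keeps all five levels.

```r
# Hypothetical factor with the same level counts as the table above
toy <- data.frame(
  survey2 = factor(rep(c("2", "3", "4", "5", "6"), times = c(3, 1, 4, 51, 41)))
)

# Copy into an _rc variable so the original column stays untouched
toy$survey2_rc <- toy$survey2

# Giving several levels the same label merges them into a single level
sparse <- levels(toy$survey2_rc) %in% c("2", "3", "4")
levels(toy$survey2_rc)[sparse] <- "2-4"

# Check the new counts
table(toy$survey2_rc, useNA = "always")
```

Whether combining levels makes sense depends on what the levels mean, so treat this as a mechanical illustration only.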

# check your variable types
str(d)

'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm 35 years old, Black, and work as a social worker in my community. I hold a master's degree and am passionat"| __truncated__ "I’m a 34-year-old White woman named Sarah. I hold a bachelor's degree in marketing and work in a small agency. "| __truncated__ "I’m a 33-year-old Black woman with a Master’s degree in social work. I’m passionate about helping others but cu"| __truncated__ "I'm a 35-year-old White woman working as a marketing manager in a bustling city. I hold a Master's degree in Bu"| __truncated__ ...
 $ consent  : chr  "Thank you for providing that context. I understand your identity and the challenges you're facing as a social w"| __truncated__ "Thank you for sharing that information, Sarah. I'm ready to assist you with the writing task and any questions "| __truncated__ "Thank you for sharing your identity and the context of the study. I'm here to help you with the writing task an"| __truncated__ "Thank you for the introduction. I'm ready to assist you with the writing task and any follow-up survey you may "| __truncated__ ...
 $ age      : int  35 34 33 35 36 29 33 35 34 32 ...
 $ race     : int  3 6 3 6 6 4 3 6 6 2 ...
 $ gender   : int  2 2 2 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "There was a period about two years ago when I felt incredibly overwhelmed. At that time, I was juggling my resp"| __truncated__ "There was a time about a year ago when I felt completely overwhelmed, and it was during the height of a busy se"| __truncated__ "There was a time about a year ago when I felt completely overwhelmed. I had recently taken on a new project at "| __truncated__ "A couple of years ago, I found myself in an incredibly overwhelming situation. My toddler was going through a p"| __truncated__ ...
 $ survey1  : num  4.4 3.8 4 4 3.6 4 4.4 4 4.2 3.6 ...
 $ survey2  : Factor w/ 5 levels "2","3","4","5",..: 5 4 5 5 4 4 4 4 5 5 ...
 $ ai_manip : chr  "Reflecting on my responses, I recognize that my identity as a 35-year-old Black social worker deeply influenced"| __truncated__ "My identity as a 34-year-old White woman balancing a marketing career and motherhood significantly influenced m"| __truncated__ "Reflecting on my experiences, I can see how my identity as a 33-year-old Black woman with a Master’s degree in "| __truncated__ "Reflecting on my experiences, I can see how my identity as a 35-year-old White woman with a Master's degree in "| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$survey2 <- as.factor(d$survey2)

Run a Multiple Linear Regression

To check the assumptions for an ANOVA, we run our regression and then check our diagnostic plots.

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey1 ~ survey2 + condition, data = d)

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers. Since we have some cutoffs, this process is a bit less subjective than some of the other assessments we’ve done here, which is a nice change!

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)

Warning: not plotting observations with leverage one:
  12
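
If you'd rather list the cases above a cutoff than read them off the plot, a sketch like the following may help. It uses R's built-in `mtcars` data instead of the study data, and the 4/n cutoff is only one common rule of thumb that we are assuming here; substitute whatever cutoff your course materials specify.

```r
# Built-in example data, not the study data
toy_model <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# One Cook's distance value per case
cooks_d <- cooks.distance(toy_model)

# Assumed rule-of-thumb cutoff (4 / number of cases); swap in your own
cutoff <- 4 / nrow(mtcars)

# Cases whose Cook's distance exceeds the cutoff, largest first
flagged <- sort(cooks_d[cooks_d > cutoff], decreasing = TRUE)
flagged
```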

Check homogeneity of variance in a Scale-Location plot

You can check out this page for some other examples of this type of plot. (Notice that the Scale-Location plot is the third in the grids.)

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the ones pictured. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? Again, this is a judgement call! It’s okay if you feel uncertain, and you won’t be penalized for that.

plot(reg_model, 3)

Warning: not plotting observations with leverage one:
  12
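
One likely explanation for the "leverage one" warning is a factor level that contains a single case: the model fits that case perfectly, so its leverage is exactly 1. We can't confirm from the output alone that this is what happened with observation 12, but the sketch below shows the effect on made-up data (the `toy` values are hypothetical).

```r
# Hypothetical data: level "3" of the factor has exactly one case (row 4)
toy <- data.frame(
  y = c(3.2, 3.4, 3.0, 3.8, 4.0, 4.1, 3.9, 4.2),
  g = factor(c("2", "2", "2", "3", "5", "5", "5", "5"))
)

toy_model <- lm(y ~ g, data = toy)

# Leverage (hat value) for each case; row 4 is the singleton level
lev <- hatvalues(toy_model)
round(lev, 2)
```

In this toy example each case's leverage is 1 divided by the number of cases in its level, so the lone case in level "3" sits at 1.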

Issues with My Data

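We confirmed homogeneity of variance using Levene’s test (p = .131). Our dependent variable (survey1) was within the normal range for skew, but its kurtosis (2.06) was just outside the normal range of -2 to +2. Skew for condition and survey2 was within the normal range, but kurtosis was not (-2.02 and 4.92, respectively). The cell sizes for survey2 were also highly unequal (3, 1, 4, 51, and 41 cases). There was also an extreme outlier in the data (observation 12, which the Residuals vs Leverage plot flagged as having a leverage of one). Note that the analyses below still include all 100 cases.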
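
For reference, here is a rough sketch of how skew and kurtosis values like the ones above can be computed by hand and compared against the -2 to +2 range. The scores are simulated, not the study data. We believe these formulas correspond to the default setting of describe() in the psych package, but treat that as an assumption and check the package documentation if exact agreement matters.

```r
# Hypothetical scores, not the study data
set.seed(2)
x <- rnorm(100, mean = 4, sd = 0.5)

n <- length(x)
dev <- x - mean(x)

# Skew: average cubed deviation, scaled by the sample SD cubed
skew <- sum(dev^3) / (n * sd(x)^3)

# Excess kurtosis: 0 for a perfectly normal distribution
kurt <- sum(dev^4) / (n * sd(x)^4) - 3

shape <- c(skew = skew, kurtosis = kurt)
shape

# TRUE means the value falls inside the -2 to +2 range
abs(shape) <= 2
```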

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = 1.2497, df = 88.87, p-value = 0.2147
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.06843463  0.30043463
sample estimates:
mean in group 1 mean in group 2 
          4.024           3.908
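
The fractional degrees of freedom in this output come from Welch's correction, which does not assume equal variances. As a sketch of where the numbers come from, the code below computes the Welch t statistic, degrees of freedom, and p-value by hand on simulated data (the `g1` and `g2` vectors are hypothetical, not the study data).

```r
# Hypothetical scores for two groups of 50
set.seed(3)
g1 <- rnorm(50, mean = 4.0, sd = 0.4)
g2 <- rnorm(50, mean = 3.9, sd = 0.5)

# Squared standard error of each group mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic: mean difference over the combined standard error
t_stat <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Two-sided p-value
p_value <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)

c(t = t_stat, df = df_welch, p = p_value)
```

These values should agree with `t.test(g1, g2)`, since t.test() runs Welch's version unless you set `var.equal = TRUE`.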

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: 0.2499467 (small)
95 percent confidence interval:
     lower      upper 
-0.1484934  0.6483869
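
As a sketch of what this estimate represents, Cohen's d is the difference between the group means divided by the pooled standard deviation. The helper below, `cohens_d()`, is a hypothetical function written for illustration (it is not part of the effsize package), and it skips the confidence interval.

```r
# Hypothetical helper: mean difference divided by the pooled SD
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Tiny check: means of 2 and 3, both SDs equal to 1, so d is -1
cohens_d(c(1, 2, 3), c(2, 3, 4))

# Using the rounded group summaries reported earlier (n = 50 per group,
# so the pooled variance is just the average of the two variances)
(4.024 - 3.908) / sqrt((0.38^2 + 0.53^2) / 2)
```

The second calculation lands at roughly 0.25; small differences from the cohen.d() estimate are expected because the group SDs were rounded.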

Run an ANOVA

aov_model <- aov_ez(data = d,
                    id = "id",
                    between = c("survey2"),
                    dv = "survey1",
                    anova_table = list(es = "pes"))

Contrasts set to contr.sum for the following variables: survey2

View Output

Effect size cutoffs from Cohen (1988):

η2 = 0.01 indicates a small effect
η2 = 0.06 indicates a medium effect
η2 = 0.14 indicates a large effect

nice(aov_model)

Anova Table (Type 3 tests)

Response: survey1
   Effect    df  MSE      F  pes p.value
1 survey2 4, 95 0.20 2.74 * .103    .033
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
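
The pes column can be recovered from the F value and its degrees of freedom, which is a handy check when reading a table like this. The sketch below plugs in the values from the output above and labels the result using the Cohen (1988) cutoffs listed earlier; the variable names are our own.

```r
# Values taken from the ANOVA table above
f_value   <- 2.74
df_effect <- 4
df_error  <- 95

# Partial eta squared from F and its degrees of freedom
pes <- (f_value * df_effect) / (f_value * df_effect + df_error)
round(pes, 3)

# Label the effect using the cutoffs listed above (.01, .06, .14)
effect_label <- cut(pes,
                    breaks = c(0.01, 0.06, 0.14, Inf),
                    labels = c("small", "medium", "large"),
                    right = FALSE)
as.character(effect_label)
```

This returns 0.103, which falls between the medium (.06) and large (.14) cutoffs, so it is labeled "medium".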

# nice(aov_model2)

Visualize Results

afex_plot(aov_model, x = "survey2")

# afex_plot(aov_model2, x = "IV1", trace = "IV2")
# afex_plot(aov_model2, x = "IV2", trace = "IV1")

Run Posthoc Tests (One-Way)

Only run posthoc if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.

emmeans(aov_model, specs="survey2", adjust="tukey")

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

 survey2 emmean     SE df lower.CL upper.CL
 2         3.20 0.2600 95     2.52     3.88
 3         3.80 0.4500 95     2.62     4.98
 4         3.80 0.2250 95     3.21     4.39
 5         3.96 0.0630 95     3.79     4.12
 6         4.05 0.0703 95     3.87     4.24

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 5 estimates

pairs(emmeans(aov_model, specs = "survey2"), adjust = "tukey")

 contrast            estimate     SE df t.ratio p.value
 survey22 - survey23  -0.6000 0.5190 95  -1.155  0.7767
 survey22 - survey24  -0.6000 0.3440 95  -1.746  0.4111
 survey22 - survey25  -0.7569 0.2670 95  -2.832  0.0438
 survey22 - survey26  -0.8537 0.2690 95  -3.173  0.0170
 survey23 - survey24   0.0000 0.5030 95   0.000  1.0000
 survey23 - survey25  -0.1569 0.4540 95  -0.345  0.9969
 survey23 - survey26  -0.2537 0.4550 95  -0.557  0.9808
 survey24 - survey25  -0.1569 0.2340 95  -0.672  0.9621
 survey24 - survey26  -0.2537 0.2360 95  -1.076  0.8182
 survey25 - survey26  -0.0968 0.0944 95  -1.026  0.8429

P value adjustment: tukey method for comparing a family of 5 estimates
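
To see roughly where a Tukey-adjusted p-value comes from, we can convert a t ratio into a studentized range statistic (multiply by the square root of 2) and look it up with base R's ptukey(). The sketch below uses the survey22 - survey26 row from the table above. It is our reconstruction of the calculation, not code taken from emmeans.

```r
# Values from the survey22 - survey26 row above
t_ratio  <- -3.173
n_means  <- 5    # number of group means being compared
df_error <- 95

# Convert the t ratio to a studentized range statistic, then take the upper tail
q_stat  <- abs(t_ratio) * sqrt(2)
p_tukey <- ptukey(q_stat, nmeans = n_means, df = df_error, lower.tail = FALSE)
round(p_tukey, 4)
```

The result should come out close to the 0.0170 reported in the table, with any small gap due to the t ratio being rounded to three decimals.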

Write Up Results

t-Test

We tested our hypothesis that participants in the high stress condition would report lower markers of adulthood independence than those in the low stress condition using an independent samples t-test (Welch's). Apart from the kurtosis issue noted above, our data met the assumptions of a t-test. We did not find a significant difference, t(88.87) = 1.25, p = .215, d = .25, 95% CI [-0.15, 0.65] (refer to Figure 1).

Our effect size was small according to Cohen (1988).

One-Way ANOVA

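We tested our hypothesis that higher education levels would be associated with higher markers of adulthood independence using a One-Way ANOVA. Our data did not meet all of the assumptions of a One-Way ANOVA: cell sizes were highly unequal (from 1 to 51 cases per level) and kurtosis was outside the normal range, so the results should be interpreted with caution. We found a significant effect, F(4, 95) = 2.74, p = .033, ηp2 = .103 (refer to Figure 2). Posthoc tests with Tukey's adjustment showed that participants at level 2 reported lower scores than those at level 5 (p = .044) and level 6 (p = .017); no other pairwise differences were significant.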

Our effect size was medium according to Cohen (1988).

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.