AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final results.csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: I predict that perceived mental health during COVID-19 will have a negative relationship with mental resilience. The independent variable, perceived mental health, is categorical, and the dependent variable, Brief Resilience Scale (BRS) scores, is continuous. I want to determine whether BRS scores differ based on whether mental health is perceived as good or bad. Therefore, I will run an independent-samples t-test.

H2: People who perceived their mental health as worse during the COVID-19 pandemic will report more eating disorder behaviors. The independent variable, perceived mental health, is categorical, and the dependent variable, eating disorder questionnaire (EDEQ12) scores, is continuous. I want to determine whether EDEQ12 scores differ based on whether mental health is perceived as good or bad. Therefore, I will run an independent-samples t-test.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)
           vars   n  mean    sd median trimmed   mad min    max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06   1 100.00 99.00  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06   1 100.00 99.00  0.00
consent*      3 100  1.52  0.50   2.00    1.52  0.00   1   2.00  1.00 -0.08
age           4 100 25.26  6.47  24.00   24.02  0.00  20  80.00 60.00  6.50
race          5 100  4.49  1.62   4.00    4.59  2.97   1   7.00  6.00 -0.25
gender        6 100  1.94  0.42   2.00    2.00  0.00   1   5.00  4.00  2.82
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06   1 100.00 99.00  0.00
survey1       8 100  3.07  0.39   3.00    3.10  0.74   2   3.67  1.67 -0.31
survey2       9 100  1.91  0.40   1.92    1.92  0.37   1   2.67  1.67 -0.19
ai_manip*    10 100 50.50 29.01  50.50   50.50 37.06   1 100.00 99.00  0.00
condition    11 100  1.50  0.50   1.50    1.50  0.74   1   2.00  1.00  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -2.01 0.05
age           49.36 0.65
race          -1.35 0.16
gender        26.83 0.04
manip_out*    -1.24 2.90
survey1       -1.05 0.04
survey2       -0.46 0.04
ai_manip*     -1.24 2.90
condition     -2.02 0.05
# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = d$condition)

 Descriptive statistics by group 
group: 1
          vars  n  mean    sd median trimmed   mad min    max range  skew
id           1 50 25.50 14.58  25.50   25.50 18.53   1  50.00 49.00  0.00
identity     2 50 54.26 30.07  61.50   55.02 37.06   1 100.00 99.00 -0.22
consent      3 50  1.46  0.50   1.00    1.45  0.00   1   2.00  1.00  0.16
age          4 50 24.98  8.00  24.00   23.90  0.00  20  80.00 60.00  6.49
race         5 50  4.56  1.55   4.00    4.55  1.48   1   7.00  6.00 -0.06
gender       6 50  2.02  0.47   2.00    2.00  0.00   1   5.00  4.00  4.59
manip_out    7 50 45.70 29.74  37.50   44.50 31.88   3 100.00 97.00  0.42
survey1      8 50  3.03  0.40   3.00    3.06  0.49   2   3.50  1.50 -0.32
survey2      9 50  1.90  0.41   1.92    1.90  0.37   1   2.67  1.67  0.03
ai_manip    10 50 54.58 28.32  60.00   54.98 32.62   6 100.00 94.00 -0.13
condition   11 50  1.00  0.00   1.00    1.00  0.00   1   1.00  0.00   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.27 4.25
consent      -2.01 0.07
age          41.72 1.13
race         -1.42 0.22
gender       29.27 0.07
manip_out    -1.14 4.21
survey1      -0.86 0.06
survey2      -0.66 0.06
ai_manip     -1.20 4.01
condition      NaN 0.00
------------------------------------------------------------ 
group: 2
          vars  n  mean    sd median trimmed   mad  min    max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53 51.0 100.00 49.00  0.00
identity     2 50 46.74 27.70   44.5   46.05 34.10  3.0  96.00 93.00  0.21
consent      3 50  1.58  0.50    2.0    1.60  0.00  1.0   2.00  1.00 -0.31
age          4 50 25.54  4.51   24.0   24.35  0.00 22.0  43.00 21.00  2.49
race         5 50  4.42  1.70    5.0    4.58  1.48  1.0   6.00  5.00 -0.36
gender       6 50  1.86  0.35    2.0    1.95  0.00  1.0   2.00  1.00 -2.01
manip_out    7 50 55.30 27.73   58.5   56.52 28.17  1.0  99.00 98.00 -0.44
survey1      8 50  3.12  0.39    3.0    3.14  0.74  2.5   3.67  1.17 -0.29
survey2      9 50  1.92  0.39    2.0    1.94  0.37  1.0   2.67  1.67 -0.44
ai_manip    10 50 46.42 29.40   44.5   46.08 37.06  1.0  96.00 95.00  0.14
condition   11 50  2.00  0.00    2.0    2.00  0.00  2.0   2.00  0.00   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.13 3.92
consent      -1.94 0.07
age           5.19 0.64
race         -1.45 0.24
gender        2.10 0.05
manip_out    -0.95 3.92
survey1      -1.43 0.05
survey2      -0.31 0.06
ai_manip     -1.28 4.16
condition      NaN 0.00
# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

hist(d$survey2)

# plot(d$dv, d$tv)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
# table(d$IV)
# cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

boxplot(d$survey2~d$condition)

# 
# #convert any categorical variables to factors
# d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test1 Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
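
The normality assumption in the list above can also be checked numerically within each group, alongside the histograms. A minimal sketch using a Shapiro-Wilk test per level of the IV follows; it runs on simulated data, and the names `sim`, `score`, and `group` are hypothetical stand-ins for `d`, a DV such as `survey1`, and `condition`.

```r
# Simulated stand-in data (hypothetical): two groups of 50 scores each
set.seed(123)
sim <- data.frame(
  group = factor(rep(c("1", "2"), each = 50)),
  score = c(rnorm(50, mean = 3.0, sd = 0.4), rnorm(50, mean = 3.1, sd = 0.4))
)

# Run a Shapiro-Wilk test separately within each group
# (null hypothesis: the scores in that group are normally distributed)
sw_by_group <- lapply(split(sim$score, sim$group), shapiro.test)

# Collect the W statistics and p-values into one small table
sw_table <- data.frame(
  group = names(sw_by_group),
  W = sapply(sw_by_group, function(x) unname(x$statistic)),
  p = sapply(sw_by_group, function(x) x$p.value)
)
print(sw_table)
```

As with Levene's test, a non-significant result is the hoped-for outcome here. With 50 participants per group, mild departures from normality are usually noted and the analysis proceeds.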

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
# 
# table(d$iv, useNA = "always")
# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
# table(d$iv, useNA = "always")

# check your variable types
str(d)
'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I’m a 24-year-old multiracial woman from Atlanta, Georgia. Balancing my job as a community organizer with a s"| __truncated__ "I'm a 24-year-old Black woman from Atlanta, Georgia, navigating my career as a data analyst. The pandemic disru"| __truncated__ "I'm 24, a Black woman from Atlanta, Georgia, working as a community outreach coordinator. The pandemic disrupte"| __truncated__ "I'm 25, a Latina from San Antonio, Texas. Working as a high school math teacher, I often feel overwhelmed balan"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand the instructions." "I understand these instructions." "I understand the instructions." ...
 $ age      : int  24 24 24 25 23 80 24 24 24 23 ...
 $ race     : int  7 3 3 4 3 4 7 6 4 6 ...
 $ gender   : int  2 2 2 2 2 2 5 2 2 2 ...
 $ manip_out: chr  "During the COVID-19 pandemic, I often felt a deep sense of isolation, but there were moments that brought me em"| __truncated__ "During the COVID-19 pandemic, I often felt overwhelmed and anxious, but there were moments that provided a sens"| __truncated__ "During the COVID-19 pandemic, my mental health fluctuated significantly, but there were moments when I felt emo"| __truncated__ "During the COVID-19 pandemic, there were moments when I felt overwhelmed and isolated, but I also found ways to"| __truncated__ ...
 $ survey1  : num  3.5 3.5 2.67 3.5 2.67 ...
 $ survey2  : num  2.58 2 1.92 1.75 1.75 ...
 $ ai_manip : chr  "I answered based on my experiences of isolation during the pandemic, the importance of my support system, and m"| __truncated__ "I answered the questions reflecting my experiences with anxiety during college, especially during the pandemic."| __truncated__ "I answered based on my experiences during the pandemic, reflecting on my fluctuating mental health and the copi"| __truncated__ "I answered based on my experiences of balancing work, isolation, and emotional struggles during college. My foc"| __truncated__ ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.2613 0.6104
      98               
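
To see what the output above represents: with center = median, Levene's test amounts to a one-way ANOVA on each score's absolute deviation from its own group's median. A minimal base-R sketch of that idea follows, run on simulated data; `sim`, `score`, and `group` are hypothetical names, not columns of `d`.

```r
# Simulated stand-in data (hypothetical): two groups of 50 scores each
set.seed(42)
sim <- data.frame(
  group = factor(rep(c("1", "2"), each = 50)),
  score = c(rnorm(50, mean = 3.0, sd = 0.4), rnorm(50, mean = 3.1, sd = 0.4))
)

# Step 1: absolute deviation of each score from its own group's median
group_median <- ave(sim$score, sim$group, FUN = median)
sim$abs_dev <- abs(sim$score - group_median)

# Step 2: one-way ANOVA on those absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
print(levene_by_hand)

# Pull out the F statistic and p-value for the group effect
levene_F <- levene_by_hand[["F value"]][1]
levene_p <- levene_by_hand[["Pr(>F)"]][1]
```

With two groups of 50, the degrees of freedom are 1 and 98, matching the Df column in the leveneTest() output.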

t-Test2 Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
# 
# table(d$iv, useNA = "always")
# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
# table(d$iv, useNA = "always")

# check your variable types
str(d)
'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I’m a 24-year-old multiracial woman from Atlanta, Georgia. Balancing my job as a community organizer with a s"| __truncated__ "I'm a 24-year-old Black woman from Atlanta, Georgia, navigating my career as a data analyst. The pandemic disru"| __truncated__ "I'm 24, a Black woman from Atlanta, Georgia, working as a community outreach coordinator. The pandemic disrupte"| __truncated__ "I'm 25, a Latina from San Antonio, Texas. Working as a high school math teacher, I often feel overwhelmed balan"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand the instructions." "I understand these instructions." "I understand the instructions." ...
 $ age      : int  24 24 24 25 23 80 24 24 24 23 ...
 $ race     : int  7 3 3 4 3 4 7 6 4 6 ...
 $ gender   : int  2 2 2 2 2 2 5 2 2 2 ...
 $ manip_out: chr  "During the COVID-19 pandemic, I often felt a deep sense of isolation, but there were moments that brought me em"| __truncated__ "During the COVID-19 pandemic, I often felt overwhelmed and anxious, but there were moments that provided a sens"| __truncated__ "During the COVID-19 pandemic, my mental health fluctuated significantly, but there were moments when I felt emo"| __truncated__ "During the COVID-19 pandemic, there were moments when I felt overwhelmed and isolated, but I also found ways to"| __truncated__ ...
 $ survey1  : num  3.5 3.5 2.67 3.5 2.67 ...
 $ survey2  : num  2.58 2 1.92 1.75 1.75 ...
 $ ai_manip : chr  "I answered based on my experiences of isolation during the pandemic, the importance of my support system, and m"| __truncated__ "I answered the questions reflecting my experiences with anxiety during college, especially during the pandemic."| __truncated__ "I answered based on my experiences during the pandemic, reflecting on my fluctuating mental health and the copi"| __truncated__ "I answered based on my experiences of balancing work, isolation, and emotional struggles during college. My foc"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.5807 0.4479
      98               

Issues with My Data

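I did not run into any problems with the variables used in my analysis. Skew and kurtosis for both dependent variables (survey1 and survey2) were within the acceptable range (-2 to +2), both overall and within each condition. Their distributions were also approximately normal, with no glaring issues or outliers. Levene's test was non-significant for both hypotheses, which is consistent with homogeneity of variance (Hypothesis 1: p = .610; Hypothesis 2: p = .448). Age and gender did show high skew and kurtosis (for example, age has a maximum of 80 against a median of 24), but neither variable is used in these tests.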
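
The -2 to +2 screening rule can be expressed as a small check. The sketch below uses simple moment-based formulas, which may differ slightly from the values describe() reports; the function names and the simulated `sim_scores` vector are hypothetical.

```r
# Moment-based skewness: third central moment scaled by the second
skewness <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}

# Excess kurtosis: fourth central moment scaled by the second, minus 3
# (so a normal distribution scores about 0)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^4) / mean(dev^2)^2 - 3
}

# TRUE when both statistics fall inside the -limit to +limit range
within_range <- function(x, limit = 2) {
  abs(skewness(x)) <= limit && abs(excess_kurtosis(x)) <= limit
}

# Simulated scores (hypothetical), roughly shaped like a survey average
set.seed(2024)
sim_scores <- rnorm(100, mean = 3, sd = 0.4)
print(c(skew = skewness(sim_scores), kurtosis = excess_kurtosis(sim_scores)))
print(within_range(sim_scores))
```

A single extreme value is enough to fail the check: a vector of 49 values of 24 plus one value of 80, similar in shape to the age variable, falls well outside the range.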

Run Your Analysis

Run a t-Test1

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = -1.0672, df = 97.884, p-value = 0.2885
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.24057552  0.07230885
sample estimates:
mean in group 1 mean in group 2 
       3.032600        3.116733 
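
The non-integer degrees of freedom in this output come from the Welch correction, which does not assume equal variances. A minimal sketch of the calculation on simulated data follows; `g1` and `g2` are hypothetical vectors standing in for the two conditions' scores.

```r
# Simulated stand-in scores (hypothetical) for the two conditions
set.seed(7)
g1 <- rnorm(50, mean = 3.0, sd = 0.4)
g2 <- rnorm(50, mean = 3.1, sd = 0.4)

# Squared standard error of each group's mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic: mean difference over the combined standard error
t_by_hand <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df_by_hand <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df_by_hand)

# Compare with R's default (Welch) t-test
welch <- t.test(g1, g2)
print(c(t = t_by_hand, df = df_by_hand, p = p_by_hand))
print(welch)
```

Passing var.equal = TRUE to t.test() would run Student's t-test instead, which is the version that relies on the homogeneity of variance assumption.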

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.2134492 (small)
95 percent confidence interval:
     lower      upper 
-0.6114713  0.1845728 
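
For reference, Cohen's d is the mean difference divided by a pooled standard deviation, and the cutoffs listed above can be applied to its absolute value. The sketch below assumes the pooled-SD form of d (cohen.d() has options that can change the calculation) and runs on simulated data; `g1`, `g2`, and `label_effect()` are hypothetical names.

```r
# Simulated stand-in scores (hypothetical) for the two conditions
set.seed(11)
g1 <- rnorm(50, mean = 3.0, sd = 0.4)
g2 <- rnorm(50, mean = 3.1, sd = 0.4)

n1 <- length(g1)
n2 <- length(g2)

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

# Cohen's d: standardized difference between the group means
d_by_hand <- (mean(g1) - mean(g2)) / pooled_sd
print(d_by_hand)

# Label an effect size using the cutoffs listed in this report
label_effect <- function(d) {
  size <- abs(d)
  if (size < 0.2) {
    "trivial"
  } else if (size < 0.5) {
    "small"
  } else if (size < 0.8) {
    "medium"
  } else {
    "large"
  }
}

print(label_effect(d_by_hand))
```

The sign of d only reflects which group is subtracted from which, so the label depends on the absolute value. The effsize package prints "negligible" where this report's list says "trivial".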

Run a t-Test2

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey2 by d$condition
t = -0.16583, df = 97.607, p-value = 0.8686
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.1728983  0.1462316
sample estimates:
mean in group 1 mean in group 2 
       1.903333        1.916667 

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.0331663 (negligible)
95 percent confidence interval:
     lower      upper 
-0.4300871  0.3637545 

Write Up Results

t-Test1

I tested my hypothesis that people who were college students during the COVID-19 pandemic and who perceived their mental health as worse would report higher levels of mental resilience. The data met all of the assumptions of a t-test. The data did not support this hypothesis, as the difference between groups was not significant, t(97.88) = –1.07, p = .289, d = –0.21, 95% CI [–0.61, 0.18] (refer to Figure 1).

The effect size was small according to Cohen (1988).

t-Test2

I tested my hypothesis that people who were college students during the COVID-19 pandemic and who perceived their mental health as worse would report a higher prevalence and severity of eating disorder symptoms. The data met all of the assumptions of a t-test. The data did not support this hypothesis, as the difference between groups was not significant, t(97.61) = –0.17, p = .869, d = –0.03, 95% CI [–0.43, 0.36] (refer to Figure 1).

The effect size was trivial according to Cohen (1988).
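
The statistics in both write-ups can be assembled from the test output with a small formatting helper, which reduces transcription errors. This is a sketch only; `format_p()` and `report_t()` are hypothetical helper names, and the example values are copied from the t-Test2 output in this report.

```r
# Format a p-value without the leading zero, to three decimals
format_p <- function(p) {
  if (p < .001) {
    return("p < .001")
  }
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Build a results string from the pieces of a t-test and Cohen's d
report_t <- function(t_value, df, p, d, ci_low, ci_high) {
  sprintf(
    "t(%.2f) = %.2f, %s, d = %.2f, 95%% CI [%.2f, %.2f]",
    df, t_value, format_p(p), d, ci_low, ci_high
  )
}

# Values taken from the t-Test2 and Cohen's d output
result_h2 <- report_t(
  t_value = -0.16583, df = 97.607, p = 0.8686,
  d = -0.0331663, ci_low = -0.4300871, ci_high = 0.3637545
)
print(result_h2)
# prints "t(97.61) = -0.17, p = .869, d = -0.03, 95% CI [-0.43, 0.36]"
```

In practice the arguments could be pulled from the stored objects rather than typed by hand, for example the statistic, parameter, and p.value elements of the t.test() result.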

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.