AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final_results.csv", header = TRUE)

State Your Hypotheses & Chosen Tests

Participants who are presented with a negative position on social media will report higher levels of stress than those who are presented with a positive position.

Age will negatively predict perceived stress: the older the participant, the lower their perceived stress level.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)
           vars   n  mean    sd median trimmed   mad min   max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06   1 100.0  99.0  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06   1 100.0  99.0  0.00
consent*      3 100  1.36  0.48   1.00    1.32  0.00   1   2.0   1.0  0.57
age           4 100 39.96 12.82  35.00   38.74  8.15  20  79.0  59.0  0.95
race          5 100  4.59  1.52   5.00    4.66  1.48   2   7.0   5.0 -0.19
gender        6 100  1.94  0.42   2.00    2.00  0.00   1   5.0   4.0  2.82
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06   1 100.0  99.0  0.00
survey1       8 100  2.67  0.32   2.75    2.69  0.37   2   3.5   1.5  0.02
survey2       9 100  2.36  0.27   2.50    2.35  0.00   2   3.5   1.5  0.48
ai_manip*    10 100 50.50 29.01  50.50   50.50 37.06   1 100.0  99.0  0.00
condition    11 100  1.50  0.50   1.50    1.50  0.74   1   2.0   1.0  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.69 0.05
age            0.26 1.28
race          -1.66 0.15
gender        26.83 0.04
manip_out*    -1.24 2.90
survey1        0.31 0.03
survey2        1.62 0.03
ai_manip*     -1.24 2.90
condition     -2.02 0.05
# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

plot(d$survey1, d$survey2)

# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$gender)

boxplot(d$survey1~d$condition)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
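
The normality assumption in the list above can be checked numerically as well as with a histogram. A minimal sketch, assuming simulated stand-in scores (the `toy` data frame and its values are hypothetical; swap in `d$survey1` for the real check):

```r
# Hypothetical stand-in for the real data frame `d`
set.seed(123)
toy <- data.frame(survey1 = rnorm(99, mean = 2.67, sd = 0.32))

# Shapiro-Wilk test: the null hypothesis is that the scores are normally distributed,
# so a non-significant p-value is the result we hope for
sw <- shapiro.test(toy$survey1)
sw

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(toy$survey1)
qqline(toy$survey1)
```

As with Levene's test, a significant result here is something to note before proceeding, not necessarily a reason to stop.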

Checking IV levels

# preview the levels and counts for your IV
table(d$gender, useNA = "always")

   1    2    5 <NA> 
   9   90    1    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
 
# to drop levels from your variable
# this subsets the data and removes the one participant whose gender is coded as 5, leaving the two levels the t-test needs
d <- subset(d, gender != 5)

table(d$gender, useNA = "always")

   1    2 <NA> 
   9   90    0 
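# to combine levels
# this says that where any participant is coded as 5 it should be replaced by 2
# note: level 5 was already dropped above, so no rows match here and gender_rc is created containing only NA
d$gender_rc[d$gender == "5"] <- "2"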
 
table(d$gender, useNA = "always")

   1    2 <NA> 
   9   90    0 
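
Combining levels only has an effect if it runs before the matching rows are dropped. A minimal sketch of the combine pattern on its own, assuming a small made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical data: gender coded 1, 2, and one participant coded 5
toy <- data.frame(gender = c(1, 2, 2, 5, 2, 1))

# Start the recoded variable as a copy so the existing codes carry over
toy$gender_rc <- toy$gender

# Combine: fold level 5 into level 2
toy$gender_rc[toy$gender == 5] <- 2

# Confirm that only levels 1 and 2 remain and that nothing became NA
table(toy$gender_rc, useNA = "always")
```

Copying the original variable first is what keeps the untouched rows from ending up as NA in the recoded column.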
# check your variable types
str(d)
'data.frame':   99 obs. of  12 variables:
 $ id       : int  1 2 3 4 5 6 7 9 10 11 ...
 $ identity : chr  "I'm a 34-year-old Asian American woman named Mei, living in Seattle. I’m an introverted artist who struggles wi"| __truncated__ "I'm 32, Asian American, juggling my job in tech with family expectations. I often feel the weight of cultural p"| __truncated__ "I’m a 60-year-old Black woman named Cynthia, a retired social worker living in Atlanta. I treasure my close-kni"| __truncated__ "I'm a 45-year-old White woman named Linda. I work as a librarian, but I often feel unfulfilled and isolated in "| __truncated__ ...
 $ consent  : chr  "I understand the instructions." "I understand the instructions." "I understand these instructions." "I understand these instructions." ...
 $ age      : int  34 32 60 45 34 30 42 32 22 45 ...
 $ race     : int  2 2 3 6 6 6 6 6 6 3 ...
 $ gender   : int  2 1 2 2 2 2 2 2 2 1 ...
 $ manip_out: chr  "Social media, while often lauded for its ability to connect individuals, has increasingly been criticized for p"| __truncated__ "Social media has become a double-edged sword in modern society, with its potential to spread misinformation and"| __truncated__ "Social media has become a significant force in modern society, often criticized for spreading misinformation an"| __truncated__ "Social media has become a pervasive force in modern communication, and while it offers some benefits, the negat"| __truncated__ ...
 $ survey1  : num  2.75 2.5 3 2.5 2.75 2 3 2.5 2.5 2.75 ...
 $ survey2  : num  2.5 2 2.5 2.6 3 2.5 2 2.2 2.5 2.5 ...
 $ ai_manip : chr  "I answered the questions based on my experiences and observations about social media's impact on my mental heal"| __truncated__ "I answered the questions reflecting my experiences with social media's negative influence on mental health, emp"| __truncated__ "I answered the questions reflecting my belief in social media's dual impact. While it can exacerbate feelings o"| __truncated__ "I answered as I did because I see social media as a double-edged sword. While it can connect people, my experie"| __truncated__ ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...
 $ gender_rc: chr  NA NA NA NA ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$gender <- as.factor(d$gender)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.9708 0.3269
      97               
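
With `center = median`, Levene's test amounts to a one-way ANOVA on each score's absolute deviation from its own group median. A rough base-R sketch of that idea on simulated data (the `toy` data frame and its values are hypothetical, not the study data):

```r
# Hypothetical two-group data standing in for survey1 and gender
set.seed(42)
toy <- data.frame(
  score = c(rnorm(9, mean = 2.8, sd = 0.3), rnorm(90, mean = 2.7, sd = 0.3)),
  group = factor(rep(c("1", "2"), times = c(9, 90)))
)

# Absolute deviation of each score from its own group's median
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# One-way ANOVA on those deviations; the F test for `group` plays the role of Levene's F
lev_by_hand <- anova(lm(abs_dev ~ group, data = toy))
lev_by_hand
```

With 99 participants in two groups, the degrees of freedom come out as 1 and 97, the same layout as the `leveneTest()` table above.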

Check levels of IVs and combine/drop if needed

# preview the levels and counts for your IV
table(d$gender, useNA = "always")

   1    2 <NA> 
   9   90    0 

Run a Multiple Linear Regression

To check the regression assumptions, we fit our model and then examine its diagnostic plots.

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey1 ~ gender + age, data = d)
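
The age hypothesis concerns the direction of the age coefficient, which the model object stores but does not print on its own. A minimal sketch of reading it, on simulated data (the `toy` data frame, `toy_model`, and the simulated effect are all hypothetical):

```r
# Hypothetical stand-in data with the same variable names as `d`
set.seed(7)
toy <- data.frame(
  age = sample(20:79, 99, replace = TRUE),
  gender = factor(rep(c("1", "2"), times = c(9, 90)))
)
toy$survey1 <- 3 - 0.005 * toy$age + rnorm(99, sd = 0.3)

toy_model <- lm(survey1 ~ gender + age, data = toy)

# Coefficient table: estimate, standard error, t value, and p-value per predictor
coefs <- summary(toy_model)$coefficients
coefs

# The sign of the age estimate gives the direction of the relationship
coefs["age", "Estimate"]
```

Running the same two lines on `reg_model` would show whether the age estimate is negative, as hypothesized, and whether it is significant.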

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers. Since we have some cutoffs, this process is a bit less subjective than some of the other assessments we’ve done here, which is a nice change!

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)
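
The plots show which cases stand out, but the same values can be pulled out and compared against a cutoff directly. A minimal sketch on simulated data; the cutoffs used here (4/n and 1) are common rules of thumb and an assumption on my part, not values taken from the assignment, and `toy` and `toy_model` are hypothetical:

```r
# Hypothetical stand-in data and model
set.seed(11)
toy <- data.frame(age = sample(20:79, 99, replace = TRUE))
toy$survey1 <- 3 - 0.005 * toy$age + rnorm(99, sd = 0.3)
toy_model <- lm(survey1 ~ age, data = toy)

# Cook's distance for every case
cooks <- cooks.distance(toy_model)

# Assumed rule-of-thumb cutoffs: 4/n (lenient) and 1 (strict)
n <- nrow(toy)
flag_4n <- which(cooks > 4 / n)
flag_1 <- which(cooks > 1)

# Cases worth a closer look, largest Cook's distance first
sort(cooks[flag_4n], decreasing = TRUE)
```

Any case flagged by the strict cutoff is also flagged by the lenient one, so the lenient list is the place to start.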

Check homogeneity of variance in a Scale-Location plot

You can check out this page for some other examples of this type of plot. (Notice that the Scale-Location plot is the third in the grids.)

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the ones pictured. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? Again, this is a judgement call! It’s okay if you feel uncertain, and you won’t be penalized for that.

plot(reg_model, 3)
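
It can help to see what the Scale-Location plot is built from: the square root of the absolute standardized residuals, plotted against the fitted values. A minimal sketch that rebuilds it by hand on simulated data (`toy` and `toy_model` are hypothetical):

```r
# Hypothetical stand-in data and model
set.seed(21)
toy <- data.frame(age = sample(20:79, 99, replace = TRUE))
toy$survey1 <- 3 - 0.005 * toy$age + rnorm(99, sd = 0.3)
toy_model <- lm(survey1 ~ age, data = toy)

# y-axis of the Scale-Location plot: square root of the absolute standardized residuals
sl_y <- sqrt(abs(rstandard(toy_model)))

plot(fitted(toy_model), sl_y,
     xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)")

# A roughly flat smoother suggests the residual spread is similar across fitted values
lines(lowess(fitted(toy_model), sl_y), col = "red")
```

A smoother that climbs or falls steadily across the plot is the pattern that points toward unequal variance.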

Issues with My Data

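No issues prevented the analysis. One participant whose gender was coded as 5 was dropped so that the IV had two levels, which left unequal group sizes (9 and 90). Levene’s test was non-significant (p = .327), so the variances of the two groups can be treated as approximately equal.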

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output1 <- t.test(d$survey1~d$gender)

View Test Output

t_output1

    Welch Two Sample t-test

data:  d$survey1 by d$gender
t = 2.1953, df = 11.857, p-value = 0.0488
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 0.001082365 0.348917635
sample estimates:
mean in group 1 mean in group 2 
       2.833333        2.658333 
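
The output header reads "Welch Two Sample t-test" because `t.test()` leaves `var.equal` set to `FALSE` unless told otherwise. A minimal sketch comparing that default with Student's version, using the `data =` argument on simulated scores (the `toy` data frame and its values are hypothetical):

```r
# Hypothetical two-group data with the same group sizes as the gender variable
set.seed(5)
toy <- data.frame(
  survey1 = c(rnorm(9, mean = 2.8, sd = 0.3), rnorm(90, mean = 2.7, sd = 0.3)),
  gender = factor(rep(c("1", "2"), times = c(9, 90)))
)

# Default: Welch's t-test, which does not assume equal variances
welch <- t.test(survey1 ~ gender, data = toy)

# Student's t-test: pools the variances, so it relies on the homogeneity assumption
student <- t.test(survey1 ~ gender, data = toy, var.equal = TRUE)

# Degrees of freedom: usually fractional for Welch, n - 2 for Student
welch$parameter
student$parameter
```

This is why the degrees of freedom in the output above are not a whole number, and why Levene's test matters mainly for the Student version.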

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output1 <- cohen.d(d$survey1~d$gender)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output1

Cohen's d

d estimate: 0.5575453 (medium)
95 percent confidence interval:
    lower     upper 
-0.140762  1.255853 
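
Cohen's d is the difference between the group means expressed in standard deviation units. A minimal sketch of the calculation by hand, assuming the pooled-SD form of d (which appears to be what `cohen.d()` reports by default) and simulated scores (`g1`, `g2`, and their values are hypothetical):

```r
set.seed(9)
g1 <- rnorm(9, mean = 2.8, sd = 0.3)   # hypothetical group 1 scores
g2 <- rnorm(90, mean = 2.7, sd = 0.3)  # hypothetical group 2 scores

# Pooled standard deviation, weighting each group's variance by its degrees of freedom
n1 <- length(g1)
n2 <- length(g2)
sd_pooled <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units
d_by_hand <- (mean(g1) - mean(g2)) / sd_pooled

# Label the size using the cutoffs listed above (.2, .5, .8)
magnitude <- cut(abs(d_by_hand),
                 breaks = c(0, 0.2, 0.5, 0.8, Inf),
                 labels = c("trivial", "small", "medium", "large"),
                 right = FALSE)
d_by_hand
magnitude
```

The sign of d only reflects which group is subtracted from which, so the label is based on the absolute value.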

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = -0.023951, df = 96.998, p-value = 0.9809
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.1283663  0.1253051
sample estimates:
mean in group 1 mean in group 2 
       2.673469        2.675000 

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)
Warning in cohen.d.formula(d$survey1 ~ d$condition): Cohercing rhs of formula
to factor

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.004813803 (negligible)
95 percent confidence interval:
     lower      upper 
-0.4037791  0.3941515 

Write Up Results

t-Test

Write-up of your results goes here. Check past labs/HWs for template.

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.