AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# # import your AI results dataset
d <- read.csv(file = "Data/final_data(in).csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: We predict that active social media users will report a higher need to belong than passive social media users.

H2: We predict that individuals will report higher subjective well-being after active social media use than after passive use.

To test these hypotheses, we chose to run two separate independent-samples t-tests, one per hypothesis.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)
           vars   n  mean    sd median trimmed   mad  min    max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06  1.0 100.00 99.00  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06  1.0 100.00 99.00  0.00
consent*      3 100 33.50 16.68  31.50   33.85 17.05  1.0  61.00 60.00 -0.01
age           4 100 20.00  1.72  20.00   19.89  0.00 12.0  26.00 14.00  0.34
race          5 100  4.87  1.40   6.00    4.95  0.00  2.0   7.00  5.00 -0.40
gender        6 100  1.64  0.69   2.00    1.60  0.00  1.0   5.00  4.00  2.06
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06  1.0 100.00 99.00  0.00
survey1       8 100  3.76  0.31   3.83    3.81  0.25  3.0   4.17  1.17 -1.12
survey2       9 100  3.77  0.18   3.80    3.78  0.15  3.1   4.00  0.90 -0.68
ai_manip*    10 100 50.50 29.01  50.50   50.50 37.06  1.0 100.00 99.00  0.00
condition    11 100  1.50  0.50   1.50    1.50  0.74  1.0   2.00  1.00  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.04 1.67
age            8.80 0.17
race          -1.49 0.14
gender         8.64 0.07
manip_out*    -1.24 2.90
survey1       -0.21 0.03
survey2        0.32 0.02
ai_manip*     -1.24 2.90
condition     -2.02 0.05
# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition" )

 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad   min   max range  skew
id           1 50 25.50 14.58  25.50   25.50 18.53  1.00 50.00  49.0  0.00
identity     2 50 45.74 29.70  44.00   44.58 35.58  1.00 99.00  98.0  0.30
consent      3 50 34.30 16.28  29.50   34.35 16.31  5.00 59.00  54.0  0.21
age          4 50 20.04  2.13  20.00   19.90  0.00 12.00 26.00  14.0  0.00
race         5 50  5.10  1.39   6.00    5.18  0.00  3.00  7.00   4.0 -0.58
gender       6 50  1.70  0.84   2.00    1.60  0.00  1.00  5.00   4.0  2.21
manip_out    7 50 28.06 20.25  26.50   26.18 19.27  1.00 99.00  98.0  1.46
survey1      8 50  3.71  0.27   3.83    3.73  0.00  3.17  4.17   1.0 -0.98
survey2      9 50  3.76  0.18   3.75    3.77  0.22  3.40  4.00   0.6 -0.23
ai_manip    10 50 47.34 26.81  46.50   47.08 26.69  2.00 96.00  94.0  0.05
condition   11 50  1.00  0.00   1.00    1.00  0.00  1.00  1.00   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.11 4.20
consent      -1.27 2.30
age           5.45 0.30
race         -1.39 0.20
gender        6.78 0.12
manip_out     3.34 2.86
survey1      -0.05 0.04
survey2      -1.05 0.02
ai_manip     -0.97 3.79
condition      NaN 0.00
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad  min max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53 51.0 100  49.0  0.00
identity     2 50 55.26 27.79   60.5   56.45 33.36  2.0 100  98.0 -0.29
consent      3 50 32.70 17.21   32.0   33.10 17.05  1.0  61  60.0 -0.19
age          4 50 19.96  1.19   20.0   19.90  0.00 16.0  26  10.0  1.84
race         5 50  4.64  1.40    4.0    4.72  2.97  2.0   6   4.0 -0.24
gender       6 50  1.58  0.50    2.0    1.60  0.00  1.0   2   1.0 -0.31
manip_out    7 50 72.94 16.21   73.5   73.50 18.53 19.0 100  81.0 -0.57
survey1      8 50  3.82  0.34    4.0    3.88  0.00  3.0   4   1.0 -1.38
survey2      9 50  3.78  0.19    3.8    3.80  0.15  3.1   4   0.9 -1.05
ai_manip    10 50 53.66 31.01   58.0   53.85 42.25  1.0 100  99.0 -0.11
condition   11 50  2.00  0.00    2.0    2.00  0.00  2.0   2   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.20 3.93
consent      -1.01 2.43
age          12.90 0.17
race         -1.59 0.20
gender       -1.94 0.07
manip_out     0.54 2.29
survey1       0.02 0.05
survey2       1.35 0.03
ai_manip     -1.48 4.39
condition      NaN 0.00
# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

# plot(d$survey1)
# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
# cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

# 
# #convert any categorical variables to factors
d$condition <- as.factor(d$condition)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
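
The normality assumption listed above can be examined separately for each condition. One option, which this report does not use (we rely on the skew and kurtosis values from describeBy()), is a Shapiro-Wilk test per group. The sketch below runs on simulated scores rather than the study data, and the names `sim` and `shapiro_by_group` are hypothetical.

```r
# Simulated scores for illustration only -- NOT the study data
set.seed(2)
sim <- data.frame(
  score = c(rnorm(50, mean = 3.7, sd = 0.3), rnorm(50, mean = 3.8, sd = 0.3)),
  condition = factor(rep(c("1", "2"), each = 50))
)

# Shapiro-Wilk p-value within each condition
# (a non-significant p-value gives no evidence against normality)
shapiro_by_group <- tapply(sim$score, sim$condition,
                           function(x) shapiro.test(x)$p.value)
print(round(shapiro_by_group, 3))
```

With the real data, `sim$score` would be replaced by the DV (for example `d$survey1`) and `sim$condition` by `d$condition`.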

Checking IV levels

# # preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
# 
# table(d$iv, useNA = "always")
# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
# table(d$iv, useNA = "always")
# 
# # check your variable types
str(d)
'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm 19, a White female studying environmental science in Ohio. I love nature but struggle with anxiety and fitt"| __truncated__ "I'm Sofia, a 20-year-old Latina studying psychology. I’m passionate about helping others but often feel overwhe"| __truncated__ "I'm a 26-year-old White woman studying psychology at a state university. I often feel overwhelmed by balancing "| __truncated__ "I'm Anna, a 20-year-old White woman studying environmental science. I’m passionate about climate change but oft"| __truncated__ ...
 $ consent  : chr  "I understand the instructions. I'm ready to participate in the study and will wait for further guidance regardi"| __truncated__ "I understand the instructions. I will respond to the questions and complete the task using social media when in"| __truncated__ "I understand the instructions. I'm ready to respond to the questions and complete the task using social media when instructed." "I understand the instructions. I'm ready to answer the questions and complete the task using social media as directed." ...
 $ age      : int  19 20 26 20 20 22 15 20 26 20 ...
 $ race     : int  6 4 6 6 6 6 6 6 6 7 ...
 $ gender   : int  2 2 2 2 2 2 1 1 2 2 ...
 $ manip_out: chr  "*Scrolling through my feed, I see a post from a friend about her recent hiking trip. The photos are breathtakin"| __truncated__ "*Scrolling through my Instagram feed, I see a mix of posts from friends, influencers, and mental health pages. "| __truncated__ "*Scrolling through Instagram feed*\n\n*First post: A close friend posted a photo of her with a group of friends"| __truncated__ "*Scrolling through my feed, I see a post from an environmental organization I follow. They’re sharing a new rep"| __truncated__ ...
 $ survey1  : num  3.83 3.67 3.83 3.83 3.67 ...
 $ survey2  : num  3.4 3.7 3.6 3.6 3.7 4 3.7 3.9 4 4 ...
 $ ai_manip : chr  "I answered the way I did because social media evokes mixed feelings for me. It connects me to nature and friend"| __truncated__ "I answered based on how social media interactions evoke feelings of connection and self-reflection. My response"| __truncated__ "I answered the questions based on my mixed feelings about social media. While it helps me connect and feel unde"| __truncated__ "I answered the questions based on my experiences with social media, highlighting a blend of climate activism an"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0284 0.8665
      98               
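
To see what the median-centered Levene's test is doing, it can be reproduced as a one-way ANOVA on each score's absolute deviation from its own group median. The sketch below uses simulated scores (not the study data); `sim`, `abs_dev`, and `levene_by_hand` are hypothetical names.

```r
# Simulated scores for illustration only -- NOT the study data
set.seed(1)
sim <- data.frame(
  score = c(rnorm(50, mean = 3.7, sd = 0.3), rnorm(50, mean = 3.8, sd = 0.3)),
  condition = factor(rep(c("1", "2"), each = 50))
)

# Absolute deviation of each score from the median of its own group
group_median <- ave(sim$score, sim$condition, FUN = median)
sim$abs_dev <- abs(sim$score - group_median)

# Levene's test (center = median) is a one-way ANOVA on those deviations
levene_by_hand <- anova(lm(abs_dev ~ condition, data = sim))
print(levene_by_hand)
```

The F value and Pr(>F) in the first row of this table play the same role as the `group` row in the leveneTest() output.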

We run Levene's test again for our second DV, survey2. As before, we're hoping for a non-significant result.

# # same 'formula' setup as for survey1: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0092 0.9238
      98               

Issues with My Data

We did not drop any participants from the analysis of survey1 or survey2. Levene's test was non-significant for both survey1, F(1, 98) = 0.03, p = .87, and survey2, F(1, 98) = 0.01, p = .92, so we treated the group variances as approximately equal. Before proceeding with the analysis, we confirmed that all t-test assumptions were met.

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = -1.7841, df = 93.673, p-value = 0.07765
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.23242701  0.01242701
sample estimates:
mean in group 1 mean in group 2 
           3.71            3.82 
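
The Welch statistic can be roughly reproduced by hand from the rounded group means, standard deviations, and sample sizes in the describeBy() output for survey1. This is a sketch under that rounding assumption, so the values should land close to, but not exactly on, the t.test() output; the variable names are hypothetical.

```r
# Rounded summary statistics for survey1, copied from the describeBy() output
m1 <- 3.71; s1 <- 0.27; n1 <- 50   # condition 1
m2 <- 3.82; s2 <- 0.34; n2 <- 50   # condition 2

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: mean difference over the unpooled standard error
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

print(c(t = t_welch, df = df_welch, p = p_welch))
```

Because the variances are not pooled, the degrees of freedom come out fractional and slightly below n1 + n2 - 2 = 98, which is why the output reports df = 93.673 rather than 98.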

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.3568126 (small)
95 percent confidence interval:
      lower       upper 
-0.75685180  0.04322657 
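
Cohen's d with a pooled standard deviation can also be approximated by hand from the rounded describeBy() values for survey1. The sketch below assumes those rounded inputs, so it should only come close to the cohen.d() estimate; `label_d` is a hypothetical helper that applies the cutoffs listed above.

```r
# Rounded summary statistics for survey1, copied from the describeBy() output
m1 <- 3.71; s1 <- 0.27; n1 <- 50   # condition 1
m2 <- 3.82; s2 <- 0.34; n2 <- 50   # condition 2

# Pooled standard deviation across the two groups
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units
d_hand <- (m1 - m2) / sd_pooled

# Label the magnitude using the cutoffs listed above
label_d <- function(d) {
  a <- abs(d)
  if (a < 0.2) "negligible" else if (a < 0.5) "small" else if (a < 0.8) "medium" else "large"
}

label <- label_d(d_hand)
print(round(d_hand, 2))
print(label)
```

The sign of d only reflects which group is subtracted from which; the magnitude label depends on the absolute value.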

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey2 by d$condition
t = -0.49398, df = 97.427, p-value = 0.6224
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.09031608  0.05431608
sample estimates:
mean in group 1 mean in group 2 
          3.758           3.776 

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.09879694 (negligible)
95 percent confidence interval:
     lower      upper 
-0.4959325  0.2983386 

Write Up Results

t-Test

We tested our first hypothesis, that active social media users would report a higher need to belong than passive users, using Welch's independent-samples t-test. We did not drop any participants, and all t-test assumptions were met. The difference between conditions was not statistically significant, and the effect size was small according to Cohen (1988), t(93.67) = -1.78, p = .078, d = -0.36, 95% CI [-0.76, 0.04] (see Figure 1).

t-Test

We tested our second hypothesis, that individuals would report higher subjective well-being after active social media use than after passive use, using Welch's independent-samples t-test. We did not drop any participants, and all t-test assumptions were met. The difference between conditions was not statistically significant, and the effect size was trivial according to Cohen (1988), t(97.43) = -0.49, p = .622, d = -0.10, 95% CI [-0.50, 0.30] (see Figure 2).
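
A small helper can assemble the statistics string for a write-up, with the degrees of freedom inside the parentheses and the t value after the equals sign. This is only a sketch: `drop_zero` and `apa_t` are hypothetical names, and the inputs are typed in from the survey1 output rather than pulled from `t_output` and `d_output`.

```r
# Drop the leading zero for values that cannot exceed 1 (used here for p only)
drop_zero <- function(x, digits = 3) {
  sub("^(-?)0\\.", "\\1.", sprintf(paste0("%.", digits, "f"), x))
}

# Build "t(df) = t, p = ..., d = ..., 95% CI [lower, upper]"
apa_t <- function(t, df, p, d, ci) {
  p_txt <- if (p < .001) "p < .001" else paste0("p = ", drop_zero(p))
  sprintf("t(%.2f) = %.2f, %s, d = %.2f, 95%% CI [%.2f, %.2f]",
          df, t, p_txt, d, ci[1], ci[2])
}

# Values copied from the survey1 t-test and Cohen's d output
h1_line <- apa_t(t = -1.7841, df = 93.673, p = 0.07765,
                 d = -0.3568126, ci = c(-0.75685180, 0.04322657))
print(h1_line)
# "t(93.67) = -1.78, p = .078, d = -0.36, 95% CI [-0.76, 0.04]"
```

Generating the string from the output values avoids transposing the t value and the degrees of freedom, and it rounds the p-value instead of truncating it.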

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.