AI Experiment Analysis

Loading Libraries

library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# t-test= survey 2 and condition, pearson's= survey 1 and 2
# # import your AI results dataset

d <- read.csv(file="Data/final_results.csv", header=T)
d <- subset(d, select=c(1:11))
d <- subset(d, !is.na(id))
d$survey2 <- as.numeric(d$survey2)
Warning: NAs introduced by coercion

State Your Hypotheses & Chosen Tests

H1: Participants who are exposed to idealized social media will report lower NPI scores than those who are exposed to neutral content due to negative social comparison which can reduce self-esteem. H2: Mindfulness will moderate the effect of social media exposure on narcissism, such that the decrease in narcissism following idealized content exposure will be weaker among participants with higher mindfulness levels.

In order to test my hypotheses, I will be using a t-test as well as a pearson’s correlation coefficient.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)
           vars   n  mean    sd median trimmed   mad min max range  skew
id            1 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
identity*     2 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
consent*      3 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
age           4 100 26.73  2.21   27.0   26.65  1.48  20  34    14  0.59
race          5 100  4.96  1.46    6.0    5.09  0.00   2   7     5 -0.63
gender        6 100  1.90  0.30    2.0    2.00  0.00   1   2     1 -2.63
manip_out*    7 100 50.43 28.90   50.5   50.50 37.06   1  99    98 -0.01
survey1       8 100 71.03  7.59   72.0   70.94  7.41  52  92    40  0.03
survey2       9  99 22.35 12.59   23.0   21.88 17.79   5  50    45  0.11
ai_manip*    10 100 50.50 29.01   50.5   50.50 37.06   1 100    99  0.00
condition    11 100  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.24 2.90
age            2.42 0.22
race          -1.27 0.15
gender         4.95 0.03
manip_out*    -1.25 2.89
survey1       -0.17 0.76
survey2       -1.21 1.27
ai_manip*     -1.24 2.90
condition     -2.02 0.05
# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = d$condition)

 Descriptive statistics by group 
group: 1
          vars  n  mean    sd median trimmed   mad min max range  skew kurtosis
id           1 50 25.50 14.58   25.5   25.50 18.53   1  50    49  0.00    -1.27
identity     2 50 50.24 29.24   50.0   50.23 36.32   2  99    97  0.04    -1.25
consent      3 50 50.42 29.92   50.0   50.40 37.81   2 100    98  0.05    -1.25
age          4 50 26.94  2.34   27.0   26.80  1.48  22  34    12  0.82     1.61
race         5 50  4.98  1.52    6.0    5.12  0.00   2   7     5 -0.69    -1.21
gender       6 50  1.88  0.33    2.0    1.98  0.00   1   2     1 -2.27     3.21
manip_out    7 50 36.38 27.62   29.5   33.38 28.91   1  99    98  0.73    -0.48
survey1      8 50 71.26  8.55   72.0   71.03 10.38  52  92    40  0.17    -0.57
survey2      9 50 22.72 13.22   24.5   22.25 17.05   5  50    45  0.05    -1.32
ai_manip    10 50 45.00 31.63   38.5   43.77 38.55   1 100    99  0.28    -1.28
condition   11 50  1.00  0.00    1.0    1.00  0.00   1   1     0   NaN      NaN
            se
id        2.06
identity  4.14
consent   4.23
age       0.33
race      0.21
gender    0.05
manip_out 3.91
survey1   1.21
survey2   1.87
ai_manip  4.47
condition 0.00
------------------------------------------------------------ 
group: 2
          vars  n  mean    sd median trimmed   mad min max range  skew kurtosis
id           1 50 75.50 14.58   75.5   75.50 18.53  51 100    49  0.00    -1.27
identity     2 50 50.76 29.07   51.0   50.77 36.32   1 100    99 -0.04    -1.29
consent      3 50 50.58 28.38   51.0   50.65 34.84   1  99    98 -0.06    -1.30
age          4 50 26.52  2.08   27.0   26.55  1.48  20  34    14  0.18     2.96
race         5 50  4.94  1.41    6.0    5.05  0.00   2   7     5 -0.54    -1.44
gender       6 50  1.92  0.27    2.0    2.00  0.00   1   2     1 -3.00     7.17
manip_out    7 50 64.48 22.84   68.5   66.15 25.20  12  97    85 -0.51    -0.71
survey1      8 50 70.80  6.56   72.0   71.00  5.93  53  85    32 -0.37    -0.01
survey2      9 49 21.98 12.05   23.0   21.51 16.31   5  50    45  0.17    -1.15
ai_manip    10 50 56.00 25.27   55.5   56.58 30.39  12  98    86 -0.18    -1.19
condition   11 50  2.00  0.00    2.0    2.00  0.00   2   2     0   NaN      NaN
            se
id        2.06
identity  4.11
consent   4.01
age       0.29
race      0.20
gender    0.04
manip_out 3.23
survey1   0.93
survey2   1.72
ai_manip  3.57
condition 0.00
# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey2)

plot(d$survey2, d$survey1)

# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
#cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey2~d$condition)

# 
# #convert any categorical variables to factors
# d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# 
# # check your variable types
str(d)
'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm a 28-year-old White woman named Sarah. I work as a marketing coordinator in Chicago. I’m creative and outgo"| __truncated__ "50 I’m a 26-year-old White woman named Sarah. I often feel lost in my career choice, juggling a dull office job"| __truncated__ "I'm a 26-year-old White woman named Sarah. I love hiking and painting, but I often feel lost in my career. My f"| __truncated__ "I'm a 31-year-old White woman living in Portland, Oregon. I’m passionate about art and sustainability but often"| __truncated__ ...
 $ consent  : chr  "Thank you for sharing this information, Sarah! It sounds like you have a lot going on in your life, balancing y"| __truncated__ "Thank you for sharing your identity, Sarah! It's completely normal to feel lost in your career and struggle wit"| __truncated__ "Thank you for sharing that, Sarah! It's great that you know some of your interests and what you value. Identify"| __truncated__ "Thank you for sharing your identity and feelings. It sounds like you’re navigating some challenging emotions wh"| __truncated__ ...
 $ age      : int  28 26 26 31 27 27 28 24 27 26 ...
 $ race     : int  6 6 6 6 6 6 3 4 3 4 ...
 $ gender   : int  2 2 2 2 1 2 2 2 1 2 ...
 $ manip_out: chr  "*Scrolling through the curated media feed...*\n\n- A serene image of a lake surrounded by trees.\n- A short vid"| __truncated__ "*Scrolls through the feed.*\n\n1. A serene image of a park with trees and a walking path. Caption: \"A peaceful"| __truncated__ "*Scrolling begins*\n\n1. A serene image of a mountain landscape at sunset. Caption: \"Nature is calming.\"\n  \"| __truncated__ "I'm scrolling through the feed, taking in the images and videos. \n\nThe first post is a serene landscape of a "| __truncated__ ...
 $ survey1  : int  64 58 65 74 69 52 75 62 75 86 ...
 $ survey2  : num  6 6 36 8 28 6 6 5 33 50 ...
 $ ai_manip : chr  "I answered based on my desire for connection and tranquility amidst anxiety. Scrolling through the calming medi"| __truncated__ "I answered the questions based on my struggles with social anxiety and a yearning for connection, often feeling"| __truncated__ "I answered the questions based on my feelings about nature and routine tasks, as these evoke a sense of calm an"| __truncated__ "I answered the questions based on my genuine feelings and experiences while engaging with the media. Scrolling "| __truncated__ ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...
# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.8491 0.3591
      97               

Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
  • Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
  • Outliers should be identified and removed, or results will be inaccurate (will do below)
  • Relationship between the variables should be linear, or they will not be detected (will do below)

Run a Multiple Linear Regression

To check the assumptions for Pearson’s correlation coefficient, we run our regression and then check our diagnostic plots.

# # use the lm() command to run the regression
# # dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey2 ~ condition + survey1, data = d)

Check linearity with Residuals vs Fitted plot

For some examples of good Residuals vs Fitted plot and ones that show serious errors, check out this page.

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.

plot(reg_model, 1)

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.

# # Cook's distance
plot(reg_model, 4)

# 
# # Residuals vs Leverage
plot(reg_model, 5)

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey2 by d$condition
t = 0.29136, df = 96.503, p-value = 0.7714
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -4.303524  5.784340
sample estimates:
mean in group 1 mean in group 2 
       22.72000        21.97959 

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 0.05851356 (negligible)
95 percent confidence interval:
     lower      upper 
-0.3405365  0.4575636 

Run a Correlation Test

Create a Correlation Matrix

d2 <- subset(d, select=c(survey1, survey2))
corr_output_m <- corr.test(d2)

View Test Output

  • Strong effect: Between |0.50| and |1|
  • Moderate effect: Between |0.30| and |0.49|
  • Weak effect: Between |0.10| and |0.29|
  • Trivial effect: Less than |0.09|
corr_output_m
Call:corr.test(x = d2)
Correlation matrix 
        survey1 survey2
survey1    1.00    0.06
survey2    0.06    1.00
Sample Size 
        survey1 survey2
survey1     100      99
survey2      99      99
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
        survey1 survey2
survey1    0.00    0.58
survey2    0.58    0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

Write Up Results

t-Test

We tested our hypothesis that those exposed to media with a luxurious lifestyle and high beauty standards, etc. would score lower on an NPI test than those exposed to neutral media. We did not drop any participants. Using Levene’s test, we checked for homogeneity of variance. Our p-value was not significant, at 0.359, meaning there is homogeneity. We completed a t-test between the control group, neutral social media, and the experimental group, the luxurious lifestyle media, and their scores on the NPI test. The p-value was 0.771, showing no statistical significance. The mean of control group was 22.72 and the mean of the experimental group was 21.98. This means that we fail to reject the null hypothesis, and that media may not have any significant affect on NPI. We then ran a Cohen’s D test which confirmed our results. The d estimate was 0.06 which is negligible effect size, so there is no meaningful difference.

Correlation Test

We tested our hypothesis that those with a higher MAAS score before the experiment would be less affected my the media performed they were given when completing an NPI test. First, we completed a residuals vs fitted plot in order to see the relationship. The plot looks to be a reasonably good regression as there are only a few outliers and it does not violate the assumption of linearity. Next, we looked at Cook’s distance. There are a few points that seem to have higher influence: 10, 24, and 59. Next, we looked at the residuals vs leverage. There were no points too far outside Cook’s distance, but a few that had more leverage. We decided to leave all results in. We then ran a correlation matrix. The matrix resulted in r = 0.06, meaning there is weak relationship between MAAS scores and NPI scores.

Variable M SD 1
MAAS 71.03 7.59
NPI 22.35 12.59 .06
[-.14, .25]
Note:
M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05
** indicates p < .01.