AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final_results2.csv", header = TRUE)

State Your Hypotheses & Chosen Tests

For this experiment, I propose two hypotheses involving three variables: narcissism, social media usage, and self-esteem. First, individuals exposed to picture-based social media will display higher levels of narcissism than those exposed to text-based social media. Second, this increase in narcissism will be more apparent in individuals with higher self-esteem than in those with lower self-esteem. To test these hypotheses, I will run an independent samples t-test for the first hypothesis and a Pearson’s correlation test for the second, examining the relationship between narcissism and my third variable, self-esteem.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)

           vars   n  mean    sd median trimmed   mad min   max range  skew
id            1 100 50.50 29.01   50.5   50.50 37.06   1 100.0  99.0  0.00
identity*     2 100 50.50 29.01   50.5   50.50 37.06   1 100.0  99.0  0.00
consent*      3 100 46.16 27.58   45.5   45.94 36.32   1  94.0  93.0  0.06
age           4 100 20.62  1.70   20.0   20.38  0.00  17  30.0  13.0  3.05
race          5 100  4.49  1.61    4.0    4.50  2.97   1   7.0   6.0 -0.01
gender        6 100  2.03  0.81    2.0    1.96  0.00   1   6.0   5.0  2.88
manip_out*    7 100 50.50 29.01   50.5   50.50 37.06   1 100.0  99.0  0.00
survey1       8 100  7.07  2.38    7.0    7.05  2.97   2  12.0  10.0  0.13
survey2       9 100  1.32  0.18    1.3    1.31  0.15   1   1.8   0.8  0.25
ai_manip*    10 100 50.50 29.01   50.5   50.50 37.06   1 100.0  99.0  0.00
condition    11 100  1.50  0.50    1.5    1.50  0.74   1   2.0   1.0  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -1.28 2.76
age           12.42 0.17
race          -1.49 0.16
gender        10.36 0.08
manip_out*    -1.24 2.90
survey1       -0.70 0.24
survey2       -0.67 0.02
ai_manip*     -1.24 2.90
condition     -2.02 0.05
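The -2 to +2 rule of thumb for skew and kurtosis used later in this report can be checked programmatically instead of by eye. The sketch below hardcodes the values printed by describe(d) above for the numeric variables (a manual copy, not a live link to the data); with the real data frame, `describe(d)[, c("skew", "kurtosis")]` should give the same columns.

```r
# Skew and kurtosis copied by hand from the describe(d) output above
desc <- data.frame(
  variable = c("age", "race", "gender", "survey1", "survey2"),
  skew     = c(3.05, -0.01, 2.88, 0.13, 0.25),
  kurtosis = c(12.42, -1.49, 10.36, -0.70, -0.67)
)

# TRUE when both skew and kurtosis fall inside the -2 to +2 rule of thumb
desc$within_range <- abs(desc$skew) <= 2 & abs(desc$kurtosis) <= 2
print(desc)
```

Only age and gender fall outside the range; both survey variables, which are the ones analyzed below, fall inside it.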

# we'll use the describeBy() command to view skew and kurtosis across the levels of our IV
describeBy(d, group = "condition")


 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad min   max range  skew
id           1 50 25.50 14.58   25.5   25.50 18.53   1  50.0  49.0  0.00
identity     2 50 54.30 29.70   55.5   55.15 36.32   2 100.0  98.0 -0.19
consent      3 50 50.06 28.20   53.5   50.50 37.06   3  94.0  91.0 -0.12
age          4 50 20.38  1.41   20.0   20.32  0.74  17  28.0  11.0  2.77
race         5 50  4.66  1.56    5.0    4.70  1.48   2   7.0   5.0 -0.17
gender       6 50  1.96  0.73    2.0    1.93  0.00   1   5.0   4.0  2.55
manip_out    7 50 67.38 28.91   75.5   71.50 18.53   1 100.0  99.0 -1.13
survey1      8 50  6.88  2.40    7.0    6.82  2.97   3  12.0   9.0  0.16
survey2      9 50  1.33  0.19    1.3    1.32  0.15   1   1.8   0.8  0.32
ai_manip    10 50 53.20 27.83   54.0   53.62 33.36   2  99.0  97.0 -0.12
condition   11 50  1.00  0.00    1.0    1.00  0.00   1   1.0   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.18 4.20
consent      -1.36 3.99
age          14.91 0.20
race         -1.58 0.22
gender        9.69 0.10
manip_out     0.11 4.09
survey1      -0.79 0.34
survey2      -0.63 0.03
ai_manip     -1.09 3.94
condition      NaN 0.00
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad min   max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53  51 100.0  49.0  0.00
identity     2 50 46.70 28.09   45.0   45.95 35.58   1  99.0  98.0  0.18
consent      3 50 42.26 26.66   40.5   41.50 31.13   1  92.0  91.0  0.23
age          4 50 20.86  1.93   20.0   20.42  0.00  19  30.0  11.0  2.91
race         5 50  4.32  1.66    4.0    4.30  1.48   1   7.0   6.0  0.15
gender       6 50  2.10  0.89    2.0    2.00  0.00   1   6.0   5.0  2.91
manip_out    7 50 33.62 16.83   33.5   33.65 21.50   2  61.0  59.0 -0.01
survey1      8 50  7.26  2.38    7.0    7.25  2.97   2  12.0  10.0  0.10
survey2      9 50  1.30  0.16    1.3    1.30  0.15   1   1.6   0.6  0.02
ai_manip    10 50 47.80 30.18   43.5   47.27 41.51   1 100.0  99.0  0.13
condition   11 50  2.00  0.00    2.0    2.00  0.00   2   2.0   0.0   NaN
          kurtosis   se
id           -1.27 2.06
identity     -1.26 3.97
consent      -1.15 3.77
age           9.38 0.27
race         -1.42 0.23
gender        9.31 0.13
manip_out    -1.33 2.38
survey1      -0.70 0.34
survey2      -1.28 0.02
ai_manip     -1.38 4.27
condition      NaN 0.00
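describeBy() reports skew and kurtosis within each condition. As a rough sketch of what those columns contain, the base R functions below (hypothetical helpers skew3() and kurt3()) compute sample skewness and excess kurtosis using formulas intended to match psych's default `type = 3` definitions, and apply them per group with tapply(). The two small groups are made-up illustration data, not values from this study.

```r
# Sample skewness, intended to mirror psych's default (type = 3) definition
skew3 <- function(x) {
  n <- length(x)
  dev <- x - mean(x)
  g1 <- (sum(dev^3) / n) / (sum(dev^2) / n)^1.5
  g1 * ((n - 1) / n)^1.5
}

# Excess kurtosis, intended to mirror psych's default (type = 3) definition
kurt3 <- function(x) {
  n <- length(x)
  dev <- x - mean(x)
  g2 <- (sum(dev^4) / n) / (sum(dev^2) / n)^2 - 3
  (g2 + 3) * (1 - 1 / n)^2 - 3
}

# Made-up scores for two conditions, for illustration only
score <- c(1, 2, 3, 4, 5, 1, 1, 1, 1, 6)
group <- factor(rep(c("1", "2"), each = 5))

# Per-group statistics, analogous to the skew and kurtosis columns of describeBy()
group_skew <- tapply(score, group, skew3)
group_kurt <- tapply(score, group, kurt3)
print(group_skew)
print(group_kurt)
```

Group 1 is symmetric, so its skew is zero; group 2 has one high score, so its skew is positive.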

# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

hist(d$survey2)

plot(d$survey1, d$survey2)

# and table() and cross_cases() to examine your categorical variables
# you may not need the cross_cases code
# table(d$condition)
# cross_cases(d, IV1, IV2)

# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

#convert any categorical variables to factors
d$condition <- as.factor(d$condition)

Check Your Assumptions

t-Test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’; violating this makes our results inaccurate (will check below; strictly, this only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")


   1    2 <NA> 
  50   50    0
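The table() check above is done by eye. A minimal sketch of turning it into an explicit check is below; it uses a stand-in factor built to match the counts shown (50 per condition, no NAs), so with the real data you would test d$condition instead.

```r
# Stand-in for d$condition: two levels with 50 participants each, as in the table above
condition <- factor(rep(c("1", "2"), each = 50))

counts <- table(condition, useNA = "always")
print(counts)

# Stop with an error if the IV does not have exactly two levels or contains missing values
stopifnot(nlevels(condition) == 2, !anyNA(condition))
```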

# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# # (no recoding is needed here: both conditions have 50 participants and there are no NAs)
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
# d$IV <- droplevels(d$IV)
# 
# table(d$IV, useNA = "always")
# 
# # to combine levels
# # this creates a recoded copy in which any participant coded as 'BAD' is replaced by 'GOOD'
# d$IV_rc <- as.character(d$IV)
# d$IV_rc[d$IV_rc == "BAD"] <- "GOOD"
# 
# table(d$IV_rc, useNA = "always")
# 
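One detail the commented-out template code does not show: after subset() removes every participant in a level, R keeps that level on the factor with a count of zero. The sketch below uses a hypothetical IV with made-up 'GOOD', 'OK', and 'BAD' codes to show the empty level and how droplevels() removes it. It also recodes through a character copy, because assigning a value that is not already a level into a factor produces NA.

```r
# Hypothetical data: an IV with three codes, one of which ('BAD') should be dropped
df <- data.frame(iv = factor(c("GOOD", "BAD", "OK", "GOOD", "OK")), y = 1:5)

# Removing the 'BAD' rows leaves an empty 'BAD' level behind
df <- subset(df, iv != "BAD")
print(table(df$iv))

# droplevels() removes levels that no longer have any observations
df$iv <- droplevels(df$iv)
print(table(df$iv))

# To combine levels, recode a character copy so any label is allowed, then refactor
df$iv_rc <- as.character(df$iv)
df$iv_rc[df$iv_rc == "OK"] <- "GOOD"
df$iv_rc <- factor(df$iv_rc)
print(table(df$iv_rc))
```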
# check your variable types
str(d)

'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm an outgoing Black woman in college, passionate about environmental science. I struggle with anxiety and fee"| __truncated__ "I'm a 20-year-old White college student named Emily. I struggle with anxiety and often feel lonely, despite my "| __truncated__ "I'm 21, Black, and a college junior studying psychology. I'm a quiet introvert but love reading and painting, e"| __truncated__ "I'm a 20-year-old Black woman, passionate about photography and creative writing. I often feel anxious and stru"| __truncated__ ...
 $ consent  : chr  "I understand the instructions. I'll be scrolling through a social media feed on the provided smartphone first, "| __truncated__ "I understand the instructions. I will scroll through the social media feed on the provided smartphone and then "| __truncated__ "I understand the instructions. I'll begin by scrolling through the social media feed on the smartphone provided"| __truncated__ "I understand the instructions. I will scroll through the social media feed on the provided smartphone and then "| __truncated__ ...
 $ age      : int  21 20 21 20 20 20 20 22 20 20 ...
 $ race     : int  3 6 3 3 6 4 6 6 4 6 ...
 $ gender   : int  2 2 2 2 2 1 2 2 2 2 ...
 $ manip_out: chr  "This post is a photo of a stunning sunset over a mountain range taken by a popular travel influencer. The vibra"| __truncated__ "Scrolling through the feed, I come across a variety of posts. \n\nFirst, there’s a photo of a stunning sunset o"| __truncated__ "The first post is a photo of a group of friends laughing around a bonfire on the beach. They are all wearing ca"| __truncated__ "This post is a photo of a young Black woman standing confidently in front of a vibrant mural in her city. Her s"| __truncated__ ...
 $ survey1  : int  8 9 3 10 9 6 5 11 7 3 ...
 $ survey2  : num  1.6 1.5 1.1 1.5 1.4 1.4 1.1 1.1 1.2 1.4 ...
 $ ai_manip : chr  "I answered the questions based on my personal experiences and emotional responses to social media content. My f"| __truncated__ "I answered the questions reflecting my feelings of inadequacy and loneliness when engaging with photo-based soc"| __truncated__ "I answered the questions based on my feelings of envy, inspiration, and comfort while scrolling through social "| __truncated__ "I answered the questions based on how each post made me feel. While some content inspires me and aligns with my"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...


Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want, so when running Levene’s test we are hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0048 0.9451
      98
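With its default center = median (the variant often called the Brown-Forsythe test), leveneTest() amounts to a one-way ANOVA on each observation's absolute deviation from its group median. The base R sketch below spells that out with a hypothetical helper, levene_median(). The two toy groups are made up and have identical spread, so the F statistic should be essentially zero.

```r
# Levene's test centered on the median: ANOVA on absolute deviations from group medians
levene_median <- function(y, group) {
  group <- factor(group)
  # ave() returns each observation's own group median
  abs_dev <- abs(y - ave(y, group, FUN = median))
  anova(lm(abs_dev ~ group))
}

# Toy data: group B is group A shifted by 10, so the spreads are identical
y <- c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15)
group <- rep(c("A", "B"), each = 5)

res <- levene_median(y, group)
print(res)
```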

Pearson’s Correlation Coefficient Assumptions

Each participant should have a measurement on both variables (confirmed by earlier procedures: describe() shows n = 100 for every variable, so no participants had missing data and none needed to be dropped)
Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
Outliers should be identified and removed, or results will be inaccurate (will do below)
Relationship between the variables should be linear, or they will not be detected (will do below)

Run a Linear Regression

To check the assumptions for Pearson’s correlation coefficient, we regress one of the correlated variables on the other and then check our diagnostic plots.

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variable on the right
# the model must use the same two variables as the correlation (survey1 = narcissism, survey2 = self-esteem)
reg_model <- lm(survey1 ~ survey2, data = d)
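A regression of one correlated variable on the other is an appropriate diagnostic model for Pearson’s r because the two are closely linked: in a simple regression on standardized variables the slope equals r, and the slope’s t-test gives the same p-value as cor.test(). The sketch below demonstrates this on simulated data; the seed and the 0.2 effect are arbitrary choices, not study values.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(100)
y <- 0.2 * x + rnorm(100)

# Pearson's r and its significance test
ct <- cor.test(x, y)

# Simple regression on standardized (z-scored) variables
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
fit <- lm(zy ~ zx)
slope <- unname(coef(fit)["zx"])
slope_p <- summary(fit)$coefficients["zx", "Pr(>|t|)"]

# The standardized slope matches r, and the two p-values match
print(c(r = unname(ct$estimate), slope = slope))
print(c(cor_test_p = ct$p.value, lm_p = slope_p))
```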

Check linearity with Residuals vs Fitted plot

For some examples of good Residuals vs Fitted plots and ones that show serious errors, check out this page.

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.

plot(reg_model, 1)
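The Residuals vs Fitted plot is a visual judgement call. One optional numerical supplement, not part of the original assignment, is to add a squared term and test whether it improves the fit; a clearly significant quadratic term suggests curvature. The sketch uses simulated data and a hypothetical helper, curvature_p().

```r
# Hypothetical helper: p-value for adding a quadratic term to a simple regression
curvature_p <- function(y, x) {
  linear    <- lm(y ~ x)
  quadratic <- lm(y ~ x + I(x^2))
  anova(linear, quadratic)[2, "Pr(>F)"]
}

# Simulated data, for illustration only
set.seed(2)
x <- runif(100, 0, 10)
y_linear <- 2 + 0.5 * x + rnorm(100)
y_curved <- 2 + 0.5 * (x - 5)^2 + rnorm(100)

# p-value for the straight-line data
curvature_p(y_linear, x)
# p-value for the U-shaped data (expected to be tiny)
curvature_p(y_curved, x)
```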

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)
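To go from the Cook’s distance plot to a list of specific cases, cooks.distance() returns one value per observation (with the real model, cooks.distance(reg_model)). A common rule of thumb, one of several in use rather than a fixed standard, flags cases above 4/n. The sketch plants one influential point in simulated data to show the flagging.

```r
# Simulated data with one planted influential point (observation 50)
set.seed(3)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)
x[50] <- 4
y[50] <- -4

fit <- lm(y ~ x)

# Cook's distance for every observation, and the 4/n rule-of-thumb cutoff
cd <- cooks.distance(fit)
cutoff <- 4 / length(cd)

# Row numbers of observations above the cutoff
flagged <- which(cd > cutoff)
print(flagged)
```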

Issues with My Data

After checking the assumptions for both the t-test and Pearson’s correlation, I did not find any issues with my data. Skew and kurtosis for both survey variables (narcissism and self-esteem) fall between -2 and 2, both overall and within each condition; age and gender fall outside this range, but they are not used in these analyses. Levene’s test was not significant (p = .945), so we have no evidence that the variances of the two groups differ. The diagnostic plots (residuals vs fitted, Cook’s distance, and residuals vs leverage) all looked acceptable, so we can move on to the tests and analyses.

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = -0.79561, df = 97.996, p-value = 0.4282
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -1.3278251  0.5678251
sample estimates:
mean in group 1 mean in group 2 
           6.88            7.26
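As a sanity check, the Welch t statistic and its Welch-Satterthwaite degrees of freedom can be recomputed from the survey1 summaries reported by describeBy() (means 6.88 and 7.26, SDs 2.40 and 2.38, n = 50 per group). Because those summaries are rounded, the result should be close to, but not exactly, the t.test() output. The helper name welch_from_summary() is hypothetical.

```r
# Welch's t-test computed from group summary statistics
welch_from_summary <- function(m1, m2, s1, s2, n1, n2) {
  v1 <- s1^2 / n1                       # squared standard error, group 1
  v2 <- s2^2 / n2                       # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  df_w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df_w)       # two-sided p-value
  c(t = t_stat, df = df_w, p = p)
}

res <- welch_from_summary(6.88, 7.26, 2.40, 2.38, 50, 50)
print(res)
```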

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

Trivial: |d| < .2
Small: |d| between .2 and .5
Medium: |d| between .5 and .8
Large: |d| > .8

d_output


Cohen's d

d estimate: -0.1591218 (negligible)
95 percent confidence interval:
     lower      upper 
-0.5566428  0.2383993
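Cohen’s d for two independent groups is the mean difference divided by the pooled standard deviation. Recomputing it from the same rounded describeBy() summaries gives a value very close to the cohen.d() estimate. A small hypothetical helper, label_d(), applies the thresholds listed above to the absolute value, since the sign only reflects which group is subtracted from which.

```r
# Cohen's d from summary statistics, using the pooled standard deviation
cohen_d_summary <- function(m1, m2, s1, s2, n1, n2) {
  pooled_sd <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# Label |d| using the thresholds listed above
label_d <- function(d) {
  a <- abs(d)
  if (a < 0.2) "trivial" else if (a < 0.5) "small" else if (a < 0.8) "medium" else "large"
}

d_est <- cohen_d_summary(6.88, 7.26, 2.40, 2.38, 50, 50)
print(d_est)
label_d(d_est)
```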

Run a Correlation Test

Create a Correlation Matrix

d2 <- subset(d, select=c(survey1, survey2))
corr_output_m <- corr.test(d2)

View Test Output

Strong effect: Between |0.50| and |1|
Moderate effect: Between |0.30| and |0.49|
Weak effect: Between |0.10| and |0.29|
Trivial effect: Less than |0.10|

corr_output_m

Call:corr.test(x = d2)
Correlation matrix 
        survey1 survey2
survey1    1.00    0.18
survey2    0.18    1.00
Sample Size 
[1] 100
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
        survey1 survey2
survey1    0.00    0.07
survey2    0.07    0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
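The confidence interval for a correlation is usually built with the Fisher z transformation: r is converted with atanh(), a normal-theory interval with standard error 1/sqrt(n - 3) is formed on that scale, and the endpoints are converted back with tanh(). The sketch below (simulated data, hypothetical helper fisher_ci()) shows that this reproduces the interval cor.test() reports; psych’s corr.test() is assumed, not verified here, to use the same approach.

```r
# 95% CI for Pearson's r via the Fisher z transformation
fisher_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)
  se <- 1 / sqrt(n - 3)
  crit <- qnorm((1 + level) / 2)
  tanh(z + c(-1, 1) * crit * se)
}

# Simulated data, for illustration only
set.seed(4)
x <- rnorm(100)
y <- 0.2 * x + rnorm(100)

ct <- cor.test(x, y)
manual_ci <- fisher_ci(unname(ct$estimate), length(x))

print(manual_ci)
print(as.numeric(ct$conf.int))
```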

Write Up Results

t-Test

We tested our hypothesis that people exposed to photo-based social media would report higher levels of narcissism than those viewing text-based social media using an independent samples t-test. We did not need to drop any participants from this analysis. Before proceeding with the analysis, we confirmed that the t-test assumptions were met. Our dependent variable is approximately normally distributed (skew and kurtosis between -2 and +2). Levene’s test was not significant (p = .945), indicating homogeneity of variance. We nonetheless report Welch’s t-test, which R’s t.test() uses by default and which does not assume equal variances. We did not find a significant difference between condition 1 (M = 6.88) and condition 2 (M = 7.26), t(98.00) = -0.80, p = .428 (refer to Figure 1). Our effect size was negligible according to Cohen (1988), d = -0.16, 95% CI [-0.56, 0.24].
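Typing reported statistics by hand invites rounding and transcription slips. A small formatting sketch with hypothetical helpers apa_p() and apa_t() can build the APA-style string directly from the object returned by t.test() (for example, apa_t(t_output)); the example below feeds it the values printed in the t-test output above, in the same list structure.

```r
# Format a p-value APA style: three decimals, no leading zero, floor at .001
apa_p <- function(p) {
  if (p < .001) "p < .001" else paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Format an htest object (e.g., from t.test()) as "t(df) = t, p = ..."
apa_t <- function(tt) {
  sprintf("t(%.2f) = %.2f, %s", tt$parameter, tt$statistic, apa_p(tt$p.value))
}

# Values from the t-test output above, in the structure t.test() returns
tt <- list(parameter = c(df = 97.996), statistic = c(t = -0.79561), p.value = 0.4282)
apa_t(tt)  # "t(98.00) = -0.80, p = .428"
```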

Correlation Test

I predicted that participants with higher levels of narcissism would also have higher levels of self-esteem, producing a positive correlation. I found no outliers on the narcissism or self-esteem variables. All assumptions were met, and there were no clear indications of non-linearity. Skew and kurtosis were within the normal range (between -2 and 2), so we proceeded with Pearson’s correlation. The correlation was positive but not significant, r(98) = .18, p = .07, so our second hypothesis was also not supported. Full data values and confidence intervals can be found in Table 1. The effect size for this relationship is weak according to Cohen (1988); for a correlation, r itself serves as the effect size.

Table 1

Variable	M	SD	1
1. Narcissism (NPI-13)	7.07	2.38
2. Self-esteem (Rosenberg)	1.32	0.18	.18 [-.01, .37]

Note. M and SD represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05. ** indicates p < .01.

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Routledge Academic.