AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# # import your AI results dataset
d <- read.csv(file = "Data/myfinalresults.csv", header = TRUE)

State Your Hypotheses & Chosen Tests

H1: Participants in the social media use condition will have higher post-stress scores than those in the control condition. My chosen test for this hypothesis is an independent samples t-test. The dependent variable will be the post-stress score, and the independent variable will be the condition (social media use vs. control).

H2: Pre-survey stress scores will be positively correlated with post-survey stress scores across all participants. I will be using a Pearson’s correlation test for this hypothesis, using pre-survey and post-survey stress scores as continuous variables.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# # to view stats for all variables
describe(d)

           vars   n  mean    sd median trimmed   mad   min    max range  skew
id            1 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00  99.0  0.00
identity*     2 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00  99.0  0.00
consent*      3 100  1.93  0.26   2.00    2.00  0.00  1.00   2.00   1.0 -3.32
age           4 100 29.08  7.01  28.00   27.85  2.97 18.00  65.00  47.0  2.50
race          5 100  4.57  1.59   6.00    4.68  1.48  1.00   7.00   6.0 -0.28
gender        6 100  2.17  0.84   2.00    2.00  0.00  1.00   7.00   6.0  3.70
manip_out*    7 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00  99.0  0.00
survey1       8 100  3.09  0.43   3.18    3.12  0.47  2.23   3.63   1.4 -0.45
survey2       9 100  3.38  0.20   3.50    3.42  0.00  2.50   3.60   1.1 -1.82
ai_manip*    10 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00  99.0  0.00
condition    11 100  1.50  0.50   1.50    1.50  0.74  1.00   2.00   1.0  0.00
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*       9.11 0.03
age            8.27 0.70
race          -1.54 0.16
gender        14.36 0.08
manip_out*    -1.24 2.90
survey1       -1.37 0.04
survey2        3.50 0.02
ai_manip*     -1.24 2.90
condition     -2.02 0.05

# 
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "survey1")


 Descriptive statistics by group 
survey1: 2.23
          vars n  mean sd median trimmed mad   min   max range skew kurtosis se
id           1 1 53.00 NA  53.00   53.00   0 53.00 53.00     0   NA       NA NA
identity     2 1 83.00 NA  83.00   83.00   0 83.00 83.00     0   NA       NA NA
consent      3 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
age          4 1 42.00 NA  42.00   42.00   0 42.00 42.00     0   NA       NA NA
race         5 1  6.00 NA   6.00    6.00   0  6.00  6.00     0   NA       NA NA
gender       6 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
manip_out    7 1 53.00 NA  53.00   53.00   0 53.00 53.00     0   NA       NA NA
survey1      8 1  2.23 NA   2.23    2.23   0  2.23  2.23     0   NA       NA NA
survey2      9 1  3.50 NA   3.50    3.50   0  3.50  3.50     0   NA       NA NA
ai_manip    10 1 59.00 NA  59.00   59.00   0 59.00 59.00     0   NA       NA NA
condition   11 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
------------------------------------------------------------ 
survey1: 2.27
          vars n  mean sd median trimmed mad   min   max range skew kurtosis se
id           1 1  1.00 NA   1.00    1.00   0  1.00  1.00     0   NA       NA NA
identity     2 1 96.00 NA  96.00   96.00   0 96.00 96.00     0   NA       NA NA
consent      3 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
age          4 1 29.00 NA  29.00   29.00   0 29.00 29.00     0   NA       NA NA
race         5 1  6.00 NA   6.00    6.00   0  6.00  6.00     0   NA       NA NA
gender       6 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
manip_out    7 1 22.00 NA  22.00   22.00   0 22.00 22.00     0   NA       NA NA
survey1      8 1  2.27 NA   2.27    2.27   0  2.27  2.27     0   NA       NA NA
survey2      9 1  3.50 NA   3.50    3.50   0  3.50  3.50     0   NA       NA NA
ai_manip    10 1 56.00 NA  56.00   56.00   0 56.00 56.00     0   NA       NA NA
condition   11 1  1.00 NA   1.00    1.00   0  1.00  1.00     0   NA       NA NA
------------------------------------------------------------ 
survey1: 2.5
          vars  n  mean    sd median trimmed   mad  min  max range  skew
id           1 25 50.48 30.42   59.0   50.48 38.55  2.0 99.0  97.0 -0.14
identity     2 25 50.12 32.35   54.0   50.14 38.55  1.0 99.0  98.0 -0.04
consent      3 25  1.92  0.28    2.0    2.00  0.00  1.0  2.0   1.0 -2.91
age          4 25 28.88  5.26   28.0   28.81  5.93 18.0 39.0  21.0  0.17
race         5 25  5.64  1.04    6.0    5.90  0.00  2.0  6.0   4.0 -2.52
gender       6 25  2.04  0.68    2.0    2.00  0.00  1.0  5.0   4.0  3.07
manip_out    7 25 49.00 29.10   50.0   49.43 43.00  1.0 93.0  92.0 -0.15
survey1      8 25  2.50  0.00    2.5    2.50  0.00  2.5  2.5   0.0   NaN
survey2      9 25  3.40  0.20    3.5    3.43  0.00  2.8  3.6   0.8 -1.58
ai_manip    10 25 47.64 34.03   45.0   47.19 44.48  1.0 99.0  98.0  0.17
condition   11 25  1.56  0.51    2.0    1.57  0.00  1.0  2.0   1.0 -0.23
          kurtosis   se
id           -1.35 6.08
identity     -1.44 6.47
consent       6.76 0.06
age          -0.47 1.05
race          5.05 0.21
gender       12.17 0.14
manip_out    -1.33 5.82
survey1        NaN 0.00
survey2       1.47 0.04
ai_manip     -1.47 6.81
condition    -2.02 0.10
------------------------------------------------------------ 
survey1: 2.6
          vars n mean sd median trimmed mad  min  max range skew kurtosis se
id           1 1 44.0 NA   44.0    44.0   0 44.0 44.0     0   NA       NA NA
identity     2 1 48.0 NA   48.0    48.0   0 48.0 48.0     0   NA       NA NA
consent      3 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
age          4 1 28.0 NA   28.0    28.0   0 28.0 28.0     0   NA       NA NA
race         5 1  6.0 NA    6.0     6.0   0  6.0  6.0     0   NA       NA NA
gender       6 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
manip_out    7 1 30.0 NA   30.0    30.0   0 30.0 30.0     0   NA       NA NA
survey1      8 1  2.6 NA    2.6     2.6   0  2.6  2.6     0   NA       NA NA
survey2      9 1  3.4 NA    3.4     3.4   0  3.4  3.4     0   NA       NA NA
ai_manip    10 1 52.0 NA   52.0    52.0   0 52.0 52.0     0   NA       NA NA
condition   11 1  1.0 NA    1.0     1.0   0  1.0  1.0     0   NA       NA NA
------------------------------------------------------------ 
survey1: 2.7
          vars n mean sd median trimmed mad  min  max range skew kurtosis se
id           1 1 11.0 NA   11.0    11.0   0 11.0 11.0     0   NA       NA NA
identity     2 1 55.0 NA   55.0    55.0   0 55.0 55.0     0   NA       NA NA
consent      3 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
age          4 1 29.0 NA   29.0    29.0   0 29.0 29.0     0   NA       NA NA
race         5 1  6.0 NA    6.0     6.0   0  6.0  6.0     0   NA       NA NA
gender       6 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
manip_out    7 1 94.0 NA   94.0    94.0   0 94.0 94.0     0   NA       NA NA
survey1      8 1  2.7 NA    2.7     2.7   0  2.7  2.7     0   NA       NA NA
survey2      9 1  3.5 NA    3.5     3.5   0  3.5  3.5     0   NA       NA NA
ai_manip    10 1 20.0 NA   20.0    20.0   0 20.0 20.0     0   NA       NA NA
condition   11 1  1.0 NA    1.0     1.0   0  1.0  1.0     0   NA       NA NA
------------------------------------------------------------ 
survey1: 3
          vars  n  mean    sd median trimmed   mad  min   max range  skew
id           1 13 54.00 31.87   67.0   54.73 29.65  5.0  95.0  90.0 -0.31
identity     2 13 46.69 30.41   34.0   46.00 28.17  9.0  92.0  83.0  0.31
consent      3 13  1.85  0.38    2.0    1.91  0.00  1.0   2.0   1.0 -1.70
age          4 13 31.23  9.72   28.0   29.45  1.48 24.0  58.0  34.0  1.66
race         5 13  4.00  1.91    3.0    4.00  1.48  1.0   7.0   6.0  0.20
gender       6 13  2.23  0.83    2.0    2.00  0.00  2.0   5.0   3.0  2.82
manip_out    7 13 53.15 30.86   55.0   52.73 38.55 11.0 100.0  89.0 -0.01
survey1      8 13  3.00  0.00    3.0    3.00  0.00  3.0   3.0   0.0   NaN
survey2      9 13  3.38  0.15    3.5    3.39  0.00  3.1   3.5   0.4 -0.54
ai_manip    10 13 63.38 26.60   67.0   65.00 10.38  9.0 100.0  91.0 -0.74
condition   11 13  1.62  0.51    2.0    1.64  0.00  1.0   2.0   1.0 -0.42
          kurtosis   se
id           -1.68 8.84
identity     -1.65 8.43
consent       0.99 0.10
age           1.68 2.70
race         -1.58 0.53
gender        6.44 0.23
manip_out    -1.62 8.56
survey1        NaN 0.00
survey2      -1.55 0.04
ai_manip     -0.35 7.38
condition    -1.96 0.14
------------------------------------------------------------ 
survey1: 3.09
          vars n  mean    sd median trimmed   mad   min   max range skew
id           1 2 61.00 35.36  61.00   61.00 37.06 36.00 86.00  50.0    0
identity     2 2 20.50 20.51  20.50   20.50 21.50  6.00 35.00  29.0    0
consent      3 2  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0  NaN
age          4 2 26.00  2.83  26.00   26.00  2.97 24.00 28.00   4.0    0
race         5 2  3.50  0.71   3.50    3.50  0.74  3.00  4.00   1.0    0
gender       6 2  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0  NaN
manip_out    7 2 33.50 40.31  33.50   33.50 42.25  5.00 62.00  57.0    0
survey1      8 2  3.09  0.00   3.09    3.09  0.00  3.09  3.09   0.0  NaN
survey2      9 2  3.25  0.35   3.25    3.25  0.37  3.00  3.50   0.5    0
ai_manip    10 2 48.50 51.62  48.50   48.50 54.11 12.00 85.00  73.0    0
condition   11 2  1.50  0.71   1.50    1.50  0.74  1.00  2.00   1.0    0
          kurtosis    se
id           -2.75 25.00
identity     -2.75 14.50
consent        NaN  0.00
age          -2.75  2.00
race         -2.75  0.50
gender         NaN  0.00
manip_out    -2.75 28.50
survey1        NaN  0.00
survey2      -2.75  0.25
ai_manip     -2.75 36.50
condition    -2.75  0.50
------------------------------------------------------------ 
survey1: 3.1
          vars n mean    sd median trimmed   mad  min  max range  skew kurtosis
id           1 5 52.2 25.82   47.0    52.2 19.27 29.0 94.0    65  0.63    -1.45
identity     2 5 58.4 26.82   66.0    58.4 13.34 12.0 78.0    66 -0.91    -1.12
consent      3 5  1.8  0.45    2.0     1.8  0.00  1.0  2.0     1 -1.07    -0.92
age          4 5 30.0  8.03   27.0    30.0  2.97 24.0 44.0    20  0.94    -1.07
race         5 5  4.4  1.52    4.0     4.4  1.48  3.0  6.0     3  0.15    -2.21
gender       6 5  2.0  0.00    2.0     2.0  0.00  2.0  2.0     0   NaN      NaN
manip_out    7 5 57.2 26.27   45.0    57.2 14.83 35.0 95.0    60  0.42    -1.93
survey1      8 5  3.1  0.00    3.1     3.1  0.00  3.1  3.1     0   NaN      NaN
survey2      9 5  3.2  0.42    3.4     3.2  0.15  2.5  3.5     1 -0.74    -1.41
ai_manip    10 5 65.2 34.67   81.0    65.2 20.76 22.0 95.0    73 -0.27    -2.17
condition   11 5  1.4  0.55    1.0     1.4  0.00  1.0  2.0     1  0.29    -2.25
             se
id        11.55
identity  11.99
consent    0.20
age        3.59
race       0.68
gender     0.00
manip_out 11.75
survey1    0.00
survey2    0.19
ai_manip  15.50
condition  0.24
------------------------------------------------------------ 
survey1: 3.18
          vars n  mean    sd median trimmed   mad   min   max range  skew
id           1 4 44.25 32.39  42.00   44.25 37.06 13.00 80.00  67.0  0.07
identity     2 4 27.25 19.38  19.50   27.25  5.19 14.00 56.00  42.0  0.70
consent      3 4  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0   NaN
age          4 4 28.25  1.26  28.00   28.25  0.74 27.00 30.00   3.0  0.42
race         5 4  3.50  1.73   3.00    3.50  0.74  2.00  6.00   4.0  0.58
gender       6 4  2.75  1.50   2.00    2.75  0.00  2.00  5.00   3.0  0.75
manip_out    7 4 36.50 35.82  28.00   36.50 21.50  3.00 87.00  84.0  0.48
survey1      8 4  3.18  0.00   3.18    3.18  0.00  3.18  3.18   0.0   NaN
survey2      9 4  3.52  0.05   3.50    3.52  0.00  3.50  3.60   0.1  0.75
ai_manip    10 4 53.25 39.14  60.00   53.25 37.06  5.00 88.00  83.0 -0.21
condition   11 4  1.50  0.58   1.50    1.50  0.74  1.00  2.00   1.0  0.00
          kurtosis    se
id           -2.32 16.19
identity     -1.72  9.69
consent        NaN  0.00
age          -1.82  0.63
race         -1.77  0.87
gender       -1.69  0.75
manip_out    -1.82 17.91
survey1        NaN  0.00
survey2      -1.69  0.03
ai_manip     -2.19 19.57
condition    -2.44  0.29
------------------------------------------------------------ 
survey1: 3.2
          vars n  mean    sd median trimmed   mad  min  max range skew kurtosis
id           1 2 45.00 35.36  45.00   45.00 37.06 20.0 70.0  50.0    0    -2.75
identity     2 2 69.00 22.63  69.00   69.00 23.72 53.0 85.0  32.0    0    -2.75
consent      3 2  2.00  0.00   2.00    2.00  0.00  2.0  2.0   0.0  NaN      NaN
age          4 2 47.00 25.46  47.00   47.00 26.69 29.0 65.0  36.0    0    -2.75
race         5 2  5.00  2.83   5.00    5.00  2.97  3.0  7.0   4.0    0    -2.75
gender       6 2  2.00  0.00   2.00    2.00  0.00  2.0  2.0   0.0  NaN      NaN
manip_out    7 2 63.00 16.97  63.00   63.00 17.79 51.0 75.0  24.0    0    -2.75
survey1      8 2  3.20  0.00   3.20    3.20  0.00  3.2  3.2   0.0  NaN      NaN
survey2      9 2  3.45  0.07   3.45    3.45  0.07  3.4  3.5   0.1    0    -2.75
ai_manip    10 2 36.00  9.90  36.00   36.00 10.38 29.0 43.0  14.0    0    -2.75
condition   11 2  1.50  0.71   1.50    1.50  0.74  1.0  2.0   1.0    0    -2.75
             se
id        25.00
identity  16.00
consent    0.00
age       18.00
race       2.00
gender     0.00
manip_out 12.00
survey1    0.00
survey2    0.05
ai_manip   7.00
condition  0.50
------------------------------------------------------------ 
survey1: 3.27
          vars n  mean    sd median trimmed   mad   min   max range  skew
id           1 3 69.00 25.36  58.00   69.00 10.38 51.00 98.00  47.0  0.35
identity     2 3 23.67 21.73  20.00   23.67 23.72  4.00 47.00  43.0  0.16
consent      3 3  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0   NaN
age          4 3 26.00  3.46  28.00   26.00  0.00 22.00 28.00   6.0 -0.38
race         5 3  4.33  1.53   4.00    4.33  1.48  3.00  6.00   3.0  0.21
gender       6 3  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0   NaN
manip_out    7 3 68.33 15.37  61.00   68.33  4.45 58.00 86.00  28.0  0.37
survey1      8 3  3.27  0.00   3.27    3.27  0.00  3.27  3.27   0.0   NaN
survey2      9 3  3.33  0.29   3.50    3.33  0.00  3.00  3.50   0.5 -0.38
ai_manip    10 3 53.00 26.66  44.00   53.00 17.79 32.00 83.00  51.0  0.30
condition   11 3  2.00  0.00   2.00    2.00  0.00  2.00  2.00   0.0   NaN
          kurtosis    se
id           -2.33 14.64
identity     -2.33 12.55
consent        NaN  0.00
age          -2.33  2.00
race         -2.33  0.88
gender         NaN  0.00
manip_out    -2.33  8.88
survey1        NaN  0.00
survey2      -2.33  0.17
ai_manip     -2.33 15.39
condition      NaN  0.00
------------------------------------------------------------ 
survey1: 3.5
          vars  n  mean    sd median trimmed   mad  min   max range  skew
id           1 40 48.42 27.78   45.5   47.72 29.65  3.0 100.0  97.0  0.22
identity     2 40 55.77 26.20   54.5   56.31 27.43  2.0 100.0  98.0 -0.08
consent      3 40  1.98  0.16    2.0    2.00  0.00  1.0   2.0   1.0 -5.86
age          4 40 27.82  5.34   26.0   26.84  2.97 21.0  48.0  27.0  1.97
race         5 40  4.15  1.53    3.5    4.19  0.74  2.0   6.0   4.0  0.22
gender       6 40  2.25  1.03    2.0    2.00  0.00  1.0   7.0   6.0  3.33
manip_out    7 40 50.30 30.47   46.0   49.91 36.32  2.0  99.0  97.0  0.08
survey1      8 40  3.50  0.00    3.5    3.50  0.00  3.5   3.5   0.0   NaN
survey2      9 40  3.39  0.16    3.5    3.42  0.00  3.0   3.5   0.5 -1.33
ai_manip    10 40 47.52 26.67   47.5   47.47 35.58  3.0  93.0  90.0  0.03
condition   11 40  1.40  0.50    1.0    1.38  0.00  1.0   2.0   1.0  0.39
          kurtosis   se
id           -1.13 4.39
identity     -1.03 4.14
consent      33.15 0.02
age           4.28 0.84
race         -1.68 0.24
gender       10.82 0.16
manip_out    -1.40 4.82
survey1        NaN 0.00
survey2       0.44 0.03
ai_manip     -1.33 4.22
condition    -1.89 0.08
------------------------------------------------------------ 
survey1: 3.6
          vars n mean sd median trimmed mad  min  max range skew kurtosis se
id           1 1 90.0 NA   90.0    90.0   0 90.0 90.0     0   NA       NA NA
identity     2 1 19.0 NA   19.0    19.0   0 19.0 19.0     0   NA       NA NA
consent      3 1  1.0 NA    1.0     1.0   0  1.0  1.0     0   NA       NA NA
age          4 1 28.0 NA   28.0    28.0   0 28.0 28.0     0   NA       NA NA
race         5 1  3.0 NA    3.0     3.0   0  3.0  3.0     0   NA       NA NA
gender       6 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
manip_out    7 1 36.0 NA   36.0    36.0   0 36.0 36.0     0   NA       NA NA
survey1      8 1  3.6 NA    3.6     3.6   0  3.6  3.6     0   NA       NA NA
survey2      9 1  3.5 NA    3.5     3.5   0  3.5  3.5     0   NA       NA NA
ai_manip    10 1 50.0 NA   50.0    50.0   0 50.0 50.0     0   NA       NA NA
condition   11 1  2.0 NA    2.0     2.0   0  2.0  2.0     0   NA       NA NA
------------------------------------------------------------ 
survey1: 3.63
          vars n  mean sd median trimmed mad   min   max range skew kurtosis se
id           1 1 93.00 NA  93.00   93.00   0 93.00 93.00     0   NA       NA NA
identity     2 1  7.00 NA   7.00    7.00   0  7.00  7.00     0   NA       NA NA
consent      3 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
age          4 1 24.00 NA  24.00   24.00   0 24.00 24.00     0   NA       NA NA
race         5 1  5.00 NA   5.00    5.00   0  5.00  5.00     0   NA       NA NA
gender       6 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA
manip_out    7 1 57.00 NA  57.00   57.00   0 57.00 57.00     0   NA       NA NA
survey1      8 1  3.63 NA   3.63    3.63   0  3.63  3.63     0   NA       NA NA
survey2      9 1  3.00 NA   3.00    3.00   0  3.00  3.00     0   NA       NA NA
ai_manip    10 1 30.00 NA  30.00   30.00   0 30.00 30.00     0   NA       NA NA
condition   11 1  2.00 NA   2.00    2.00   0  2.00  2.00     0   NA       NA NA

# 
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey2)

plot(d$survey1, d$survey2)

# 
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
# table(d$IV)
# cross_cases(d, IV1, IV2)
# 
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey2 ~ d$condition)

# 
# #convert any categorical variables to factors
# d$variable <- as.factor(d$variable)

Check Your Assumptions

t-Test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below. if issues, note and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
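The normality assumption in the list above can also be checked with a formal test in addition to the histogram and the skew and kurtosis values. The sketch below uses the Shapiro-Wilk test from base R, which is a swapped-in technique and not part of the original assignment. The `demo` data frame is simulated and hypothetical.

```r
# Hypothetical stand-in data: 50 participants per condition.
set.seed(2)
demo <- data.frame(
  condition = factor(rep(c(1, 2), each = 50)),
  survey2   = rnorm(100, mean = 3.4, sd = 0.2)
)

# Shapiro-Wilk test within each condition.
# A small p-value suggests the scores depart from a normal distribution.
normality_p <- tapply(
  demo$survey2,
  demo$condition,
  function(x) shapiro.test(x)$p.value
)

print(round(normality_p, 3))
```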

Checking IV levels

# # preview the levels and counts for your IV
table(d$condition, useNA = "always")


   1    2 <NA> 
  50   50    0

# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
# 
table(d$condition, useNA = "always")


   1    2 <NA> 
  50   50    0

# 
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# 
table(d$condition, useNA = "always")


   1    2 <NA> 
  50   50    0

# 
# # check your variable types
str(d)

'data.frame':   100 obs. of  11 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I’m a 29-year-old White woman from Ohio. I’m an introvert, often battling anxiety and feelings of loneliness. T"| __truncated__ "I'm a 27-year-old Black woman living in Atlanta, juggling my job as a graphic designer with freelance gigs. I s"| __truncated__ "I’m a 29-year-old Asian American woman, navigating the complexities of my career in advertising. I often feel p"| __truncated__ "I’m a 28-year-old White woman living in Portland. I’m passionate about sustainable living but often feel overwh"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand these instructions." "I understand these instructions." "I understand these instructions." ...
 $ age      : int  29 27 29 28 27 32 32 24 24 29 ...
 $ race     : int  6 3 2 6 3 6 6 6 4 7 ...
 $ gender   : int  2 2 2 2 2 2 2 2 2 2 ...
 $ manip_out: chr  "Thank you for sharing that context about yourself. It sounds like you're navigating a complex relationship with"| __truncated__ "Thank you for sharing this information. It sounds like you're participating in an interesting study that could "| __truncated__ "Thank you for sharing that context; it sounds like you're navigating a challenging but important journey. Parti"| __truncated__ "Thank you for the context! I'm here to help you explore the themes related to your identity and how they inters"| __truncated__ ...
 $ survey1  : num  2.27 2.5 3.5 2.5 3 2.5 3.5 2.5 3.5 3 ...
 $ survey2  : num  3.5 3.5 3.5 3.5 3.2 3 3.5 3.5 3.2 3.1 ...
 $ ai_manip : chr  "Thank you for sharing more about your background and experiences. It sounds like you're reflecting thoughtfully"| __truncated__ "It sounds like you're navigating some common challenges that many people face, especially those in creative fie"| __truncated__ "It sounds like you're reflecting deeply on your experiences and feelings related to social media, your career, "| __truncated__ "It sounds like you're looking for a deeper understanding of how your identity, experiences, and personality tra"| __truncated__ ...
 $ condition: int  1 1 1 1 1 1 1 1 1 1 ...

# 
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey2~condition, data = d)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.4529 0.5026
      98
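To make the "center = median" line in the output concrete: with this setting, Levene's test amounts to a one-way ANOVA on each score's absolute deviation from its own group median. The sketch below reproduces that logic in base R on simulated data; `demo`, `abs_dev`, and `levene_by_hand` are hypothetical names used only for illustration.

```r
# Hypothetical stand-in data: 50 participants per condition.
set.seed(3)
demo <- data.frame(
  condition = factor(rep(c(1, 2), each = 50)),
  survey2   = rnorm(100, mean = 3.4, sd = 0.2)
)

# Absolute deviation of each score from the median of its own group.
group_median <- ave(demo$survey2, demo$condition, FUN = median)
demo$abs_dev <- abs(demo$survey2 - group_median)

# One-way ANOVA on those deviations; its F test is the Levene statistic.
levene_by_hand <- anova(lm(abs_dev ~ condition, data = demo))
print(levene_by_hand)
```

With two groups of 50, the degrees of freedom are 1 and 98, matching the shape of the leveneTest() output above.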

Pearson’s Correlation Coefficient Assumptions

Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
Outliers should be identified and removed, or results will be inaccurate (will do below)
Relationship between the variables should be linear, or it will not be detected (will do below)

Run a Simple Linear Regression

To check the assumptions for Pearson’s correlation coefficient, we run our regression and then check our diagnostic plots.

# # use the lm() command to run the regression
# # dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey2 ~ survey1, data = d)

Check linearity with Residuals vs Fitted plot

For some examples of good Residuals vs Fitted plots and of plots that show serious problems, check out this page.

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.

plot(reg_model, 1)

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.

# # Cook's distance
plot(reg_model, 4)

# 
# # Residuals vs Leverage
plot(reg_model, 5)
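The plots show Cook's distance visually; the same values can be pulled out numerically to list candidate outliers. The sketch below uses cooks.distance() from base R with a cutoff of 4/n, which is a common rule of thumb and an assumption here, not a threshold given in the assignment. The data and the name `demo_model` are hypothetical.

```r
# Hypothetical stand-in data with the same variable names as the report.
set.seed(4)
demo <- data.frame(survey1 = runif(100, min = 2.2, max = 3.6))
demo$survey2 <- 3.4 + rnorm(100, sd = 0.2)

demo_model <- lm(survey2 ~ survey1, data = demo)

# Cook's distance for every observation.
cooks <- cooks.distance(demo_model)

# Rule-of-thumb cutoff (assumption): flag cases above 4 / n.
cutoff  <- 4 / nrow(demo)
flagged <- which(cooks > cutoff)

print(round(cooks[flagged], 3))
```

Flagged cases are candidates to inspect, not cases to drop automatically.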

Issues with My Data

Describe any issues and why they’re problematic here.

Run Your Analysis

Run a t-Test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey2~d$condition)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$survey2 by d$condition
t = -0.87038, df = 90.974, p-value = 0.3864
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.1115952  0.0435952
sample estimates:
mean in group 1 mean in group 2 
          3.368           3.402
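The header of the output reads "Welch Two Sample t-test" because t.test() does not assume equal variances by default. The sketch below contrasts that default with Student's t-test (var.equal = TRUE), and also shows that the formula interface accepts a data argument. The `demo` data frame is simulated and hypothetical.

```r
# Hypothetical stand-in data: 50 participants per condition.
set.seed(5)
demo <- data.frame(
  condition = factor(rep(c(1, 2), each = 50)),
  survey2   = rnorm(100, mean = 3.4, sd = 0.2)
)

# Default: Welch's t-test, which adjusts the degrees of freedom.
welch <- t.test(survey2 ~ condition, data = demo)

# Student's t-test: assumes equal variances, df = n1 + n2 - 2.
student <- t.test(survey2 ~ condition, data = demo, var.equal = TRUE)

print(c(welch_df   = unname(welch$parameter),
        student_df = unname(student$parameter)))
```

When Levene's test is non-significant and group sizes are equal, the two versions usually give very similar results, so reporting the Welch output is a safe choice.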

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey2~d$condition)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: -0.1740754 (negligible)
95 percent confidence interval:
     lower      upper 
-0.5717199  0.2235690
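For reference, Cohen's d with a pooled standard deviation is the difference in group means divided by the pooled SD. Plugging in the reported values, a mean difference of 3.368 - 3.402 = -0.034 and d = -0.174 imply a pooled SD of about 0.20. The sketch below works through the formula in base R on simulated scores; `g1`, `g2`, and `d_by_hand` are hypothetical names.

```r
# Hypothetical scores for two groups of 50.
set.seed(6)
g1 <- rnorm(50, mean = 3.37, sd = 0.2)
g2 <- rnorm(50, mean = 3.40, sd = 0.2)

n1 <- length(g1)
n2 <- length(g2)

# Pooled standard deviation: weighted average of the two group variances.
pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

# Cohen's d: standardized difference between the group means.
d_by_hand <- (mean(g1) - mean(g2)) / pooled_sd

print(round(c(pooled_sd = pooled_sd, d = d_by_hand), 3))
```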

Run a Correlation Test

Create a Correlation Matrix

d2 <- subset(d, select= c(survey1, survey2))
corr_output_m <- corr.test(d2)

View Test Output

Strong effect: Between |0.50| and |1|
Moderate effect: Between |0.30| and |0.49|
Weak effect: Between |0.10| and |0.29|
Trivial effect: Less than |0.10|

corr_output_m

Call:corr.test(x = d2)
Correlation matrix 
        survey1 survey2
survey1    1.00   -0.06
survey2   -0.06    1.00
Sample Size 
[1] 100
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
        survey1 survey2
survey1    0.00    0.58
survey2    0.58    0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
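As the last line of the output notes, the confidence interval is not printed by default; print(corr_output_m, short = FALSE) would show it. Another route is cor.test() from base R, which reports the estimate, p-value, and 95% confidence interval for a single pair of variables. The sketch below runs it on simulated data; `demo` and `cor_out` are hypothetical names.

```r
# Hypothetical stand-in data with the same variable names as the report.
set.seed(7)
demo <- data.frame(
  survey1 = runif(100, min = 2.2, max = 3.6),
  survey2 = rnorm(100, mean = 3.4, sd = 0.2)
)

# Pearson correlation with its 95% confidence interval.
cor_out <- cor.test(demo$survey1, demo$survey2, method = "pearson")

print(round(unname(cor_out$estimate), 2))  # r
print(round(cor_out$conf.int, 2))          # lower and upper bounds
```

The interval comes from the same test that produces the p-value, so the write-up can report r, p, and the bounds from one object.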

Write Up Results

t-Test

The hypothesis predicted that participants in the social media use condition would have higher post-stress scores than those in the control condition. The independent variable was condition (social media use vs. control), and the dependent variable was post-survey stress scores (survey2).

Prior to conducting the t-test, I ran Levene’s Test for homogeneity of variance to check if the variances between the two groups were equal. Levene’s test was not significant, F(1, 98) = 0.45, p = .503, indicating that the assumption of equal variances was met.

A Welch’s independent samples t-test was then performed to compare post-stress scores between the two conditions. The results indicated no significant difference between the social media use and control conditions, t(90.97) = -0.87, p = .386, so H1 was not supported. The means for the social media use condition (M = 3.37) and the control condition (M = 3.40) were very similar, with a 95% confidence interval for the difference in means ranging from -0.11 to 0.04. Cohen’s d was -0.17, which indicates a negligible effect size.

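The analysis ran without coding issues. Levene’s test supported the assumption of equal variances, and because t.test() runs Welch’s t-test by default, the result does not depend on that assumption. The post-survey scores were negatively skewed (skew = -1.82, kurtosis = 3.50), which may strain the normality assumption, so the result should be interpreted with some caution.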

Correlation Test

The hypothesis predicted that pre-survey stress scores (survey1) would be positively correlated with post-survey stress scores (survey2) across all participants. To test this, a Pearson correlation was conducted between the pre- and post-survey stress scores.

The correlation between pre-survey and post-survey stress scores was trivial and negative, r = -0.06, p = .58, suggesting no meaningful relationship between the two variables. This result was not statistically significant, indicating that pre-survey stress scores did not reliably predict post-survey stress scores, so H2 was not supported.

There were no issues with the coding, and the analysis was conducted successfully. However, the trivial, non-significant correlation suggests that the relationship between pre- and post-survey stress scores is minimal.

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.