AI Experiment Analysis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results

Importing Data

# import your AI results dataset
d <- read.csv(file = "Data/final results .csv", header = TRUE)
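The import appears to pick up five empty trailing columns (they show up as X to X.4 in the output further down and trigger the min/max warnings from describe()). A minimal sketch of one way to drop columns that are entirely missing, shown on a small made-up data frame rather than the real file; the names toy, all_missing, and toy_clean are hypothetical:

```r
# Toy data frame standing in for the imported data (hypothetical values).
toy <- data.frame(
  id      = 1:4,
  survey1 = c(3.4, 3.2, 3.4, 3.2),
  X       = NA,   # empty column, like the X columns created on import
  X.1     = NA
)

# Flag the columns in which every value is missing.
all_missing <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one non-missing value.
toy_clean <- toy[, !all_missing, drop = FALSE]

names(toy_clean)
```

The same two lines could be applied to d right after import, assuming no real variable is expected to be completely empty.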

State Your Hypotheses & Chosen Tests

H1: I predict that participants who report more hours of sleep per night will experience lower levels of stress than those who report fewer hours of sleep. I will use a t-test to analyze this hypothesis.

H2: I predict that resilience will be negatively related to stress (higher levels of resilience will be associated with lower levels of stress). I will use a correlation test to analyze this hypothesis.

Check Your Variables

This is just basic variable checking that is used across all HW assignments.

# to view stats for all variables
describe(d)
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
           vars   n  mean    sd median trimmed   mad  min   max range  skew
id            1 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
identity*     2 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
consent*      3 100  1.78  0.42    2.0    1.85  0.00  1.0   2.0   1.0 -1.33
age           4 100 39.86 12.30   35.0   38.00  5.93 23.0  82.0  59.0  1.41
race          5 100  4.87  1.54    6.0    4.95  0.00  1.0   7.0   6.0 -0.48
gender        6 100  1.93  0.26    2.0    2.00  0.00  1.0   2.0   1.0 -3.32
manip_out*    7 100 50.50 29.01   50.5   50.50 37.06  1.0 100.0  99.0  0.00
survey1       8 100  3.14  0.35    3.2    3.20  0.30  2.2   3.6   1.4 -1.40
survey.2      9 100  3.11  0.22    3.0    3.10  0.00  2.5   3.5   1.0  0.69
ai_manip*    10 100 39.76 19.43   44.0   40.24 20.76  1.0  74.0  73.0 -0.23
condition    11 100  1.50  0.50    1.5    1.50  0.74  1.0   2.0   1.0  0.00
X            12   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.1          13   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.2          14   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.3          15   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.4          16   0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
           kurtosis   se
id            -1.24 2.90
identity*     -1.24 2.90
consent*      -0.23 0.04
age            1.47 1.23
race          -1.27 0.15
gender         9.11 0.03
manip_out*    -1.24 2.90
survey1        1.51 0.03
survey.2      -0.39 0.02
ai_manip*     -0.87 1.94
condition     -2.02 0.05
X                NA   NA
X.1              NA   NA
X.2              NA   NA
X.3              NA   NA
X.4              NA   NA
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

 Descriptive statistics by group 
condition: 1
          vars  n  mean    sd median trimmed   mad   min   max range  skew
id           1 50 25.50 14.58   25.5   25.50 18.53  1.00  50.0 49.00  0.00
identity     2 50 52.92 27.47   51.5   52.90 32.62  3.00 100.0 97.00  0.01
consent      3 50  1.84  0.37    2.0    1.93  0.00  1.00   2.0  1.00 -1.80
age          4 50 41.56 12.93   35.0   39.75  5.19 24.00  82.0 58.00  1.21
race         5 50  4.62  1.61    6.0    4.70  1.48  1.00   7.0  6.00 -0.30
gender       6 50  1.96  0.20    2.0    2.00  0.00  1.00   2.0  1.00 -4.55
manip_out    7 50 53.10 29.88   52.5   53.30 42.25  4.00 100.0 96.00 -0.01
survey1      8 50  3.18  0.31    3.2    3.23  0.30  2.20   3.6  1.40 -1.67
survey.2     9 50  3.09  0.21    3.0    3.07  0.00  2.83   3.5  0.67  1.14
ai_manip    10 50 37.06 20.40   37.0   36.88 25.20  2.00  74.0 72.00  0.03
condition   11 50  1.00  0.00    1.0    1.00  0.00  1.00   1.0  0.00   NaN
X           12  0   NaN    NA     NA     NaN    NA   Inf  -Inf  -Inf    NA
X.1         13  0   NaN    NA     NA     NaN    NA   Inf  -Inf  -Inf    NA
X.2         14  0   NaN    NA     NA     NaN    NA   Inf  -Inf  -Inf    NA
X.3         15  0   NaN    NA     NA     NaN    NA   Inf  -Inf  -Inf    NA
X.4         16  0   NaN    NA     NA     NaN    NA   Inf  -Inf  -Inf    NA
          kurtosis   se
id           -1.27 2.06
identity     -1.09 3.88
consent       1.26 0.05
age           0.59 1.83
race         -1.45 0.23
gender       19.13 0.03
manip_out    -1.36 4.23
survey1       2.41 0.04
survey.2     -0.04 0.03
ai_manip     -1.11 2.89
condition      NaN 0.00
X               NA   NA
X.1             NA   NA
X.2             NA   NA
X.3             NA   NA
X.4             NA   NA
------------------------------------------------------------ 
condition: 2
          vars  n  mean    sd median trimmed   mad  min   max range  skew
id           1 50 75.50 14.58   75.5   75.50 18.53 51.0 100.0  49.0  0.00
identity     2 50 48.08 30.56   46.0   47.95 37.81  1.0  98.0  97.0  0.04
consent      3 50  1.72  0.45    2.0    1.77  0.00  1.0   2.0   1.0 -0.95
age          4 50 38.16 11.51   34.0   36.23  5.93 23.0  81.0  58.0  1.61
race         5 50  5.12  1.42    6.0    5.20  0.00  2.0   7.0   5.0 -0.62
gender       6 50  1.90  0.30    2.0    2.00  0.00  1.0   2.0   1.0 -2.59
manip_out    7 50 47.90 28.17   47.5   48.02 31.13  1.0  99.0  98.0 -0.02
survey1      8 50  3.11  0.38    3.2    3.16  0.15  2.2   3.6   1.4 -1.14
survey.2     9 50  3.14  0.24    3.0    3.13  0.00  2.5   3.5   1.0  0.32
ai_manip    10 50 42.46 18.20   44.0   43.55 20.02  1.0  73.0  72.0 -0.50
condition   11 50  2.00  0.00    2.0    2.00  0.00  2.0   2.0   0.0   NaN
X           12  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.1         13  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.2         14  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.3         15  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
X.4         16  0   NaN    NA     NA     NaN    NA  Inf  -Inf  -Inf    NA
          kurtosis   se
id           -1.27 2.06
identity     -1.44 4.32
consent      -1.12 0.06
age           2.57 1.63
race         -1.18 0.20
gender        4.79 0.04
manip_out    -1.22 3.98
survey1       0.69 0.05
survey.2     -0.58 0.03
ai_manip     -0.42 2.57
condition      NaN 0.00
X               NA   NA
X.1             NA   NA
X.2             NA   NA
X.3             NA   NA
X.4             NA   NA
# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)

hist(d$survey.2)

plot(d$survey1, d$survey.2)

# and table() and cross_cases() to examine your categorical variables
# you may not need the cross_cases code
table(d$condition)

 1  2 
50 50 
cross_cases(d, condition, survey.2)
                 survey.2
                2.5   2.833333333    3   3.166666667   3.333333333   3.5
 condition
   1                            5   32             3             2     8
   2              1             1   30             2             4    12
   #Total cases   1             6   62             5             6    20
# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)

#convert any categorical variables to factors
d$condition <- as.factor(d$condition)

Check Your Assumptions

t-Test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$condition, useNA = "always")

   1    2 <NA> 
  50   50    0 
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear

# check your variable types
str(d)
'data.frame':   100 obs. of  16 variables:
 $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ identity : chr  "I'm 53 years old, a Black woman living in Atlanta. I'm passionate about community gardening and spend weekends "| __truncated__ "I’m 40, a White woman living in a small town in Ohio. I work as a librarian, where I find solace among books. I"| __truncated__ "I’m 37, a White woman living in a small town in Ohio. I work as a nurse, finding purpose in caring for others. "| __truncated__ "I'm a 35-year-old multiracial woman, navigating life as the daughter of a Black father and a Latina mother. I w"| __truncated__ ...
 $ consent  : chr  "I understand these instructions." "I understand these instructions." "I understand these instructions." "I understand the instructions." ...
 $ age      : int  53 40 37 35 34 29 34 43 55 47 ...
 $ race     : int  3 6 6 7 3 6 3 3 3 6 ...
 $ gender   : int  2 2 2 2 2 2 2 1 2 2 ...
 $ manip_out: chr  "I will set my bedtime for 10:30 PM and wake up at 6:30 AM to ensure I get a full 8 hours of sleep each night. I"| __truncated__ "For the next two weeks, I will stick to a consistent sleep schedule, aiming for eight hours of sleep each night"| __truncated__ "For the next two weeks, I'll set a consistent bedtime of 10:30 PM and a wake-up time of 6:30 AM to ensure I get"| __truncated__ "For the next two weeks, I will maintain a consistent sleep schedule, aiming for 8 hours of sleep each night. My"| __truncated__ ...
 $ survey1  : num  3.4 3.2 3.4 3.2 3.4 2.8 3.2 3.2 3.4 3.2 ...
 $ survey.2 : num  3 3.33 3.17 2.83 3 ...
 $ ai_manip : chr  "Thank you for your participation." "Thank you for sharing your information. Please continue with your study and activities as planned, and I wish y"| __truncated__ "I’m 37, a White woman living in a small town in Ohio. I work as a nurse, finding purpose in caring for others. "| __truncated__ "At the end of the first week, I assessed my stress levels and noticed some changes. My level of resilience rema"| __truncated__ ...
 $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ X        : logi  NA NA NA NA NA NA ...
 $ X.1      : logi  NA NA NA NA NA NA ...
 $ X.2      : logi  NA NA NA NA NA NA ...
 $ X.3      : logi  NA NA NA NA NA NA ...
 $ X.4      : logi  NA NA NA NA NA NA ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey.2~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  1.2971 0.2575
      98               
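As a rough illustration of what the median-centered version of Levene’s test does, the sketch below reproduces it in base R: take each score’s absolute deviation from its own group’s median, then run a one-way ANOVA on those deviations. The data are simulated and the names toy, abs_dev, and levene_by_hand are hypothetical; this is not the study data.

```r
set.seed(1)  # simulated data only; not the study data
toy <- data.frame(
  score = c(rnorm(50, mean = 3.1, sd = 0.20),
            rnorm(50, mean = 3.1, sd = 0.25)),
  group = factor(rep(c("1", "2"), each = 50))
)

# Step 1: absolute deviation of each score from its own group's median.
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# Step 2: one-way ANOVA on those deviations.
# The F test for 'group' is the median-centered Levene's test.
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

With two groups of 50, the degrees of freedom are 1 and 98, the same as in the leveneTest() output above.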

Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
  • Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
  • Outliers should be identified and removed, or results will be inaccurate (will do below)
  • Relationship between the variables should be linear, or they will not be detected (will do below)

Run a Multiple Linear Regression

To check the assumptions for Pearson’s correlation coefficient, we run our regression and then check our diagnostic plots.

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey1 ~ condition + survey.2, data = d)

Check linearity with Residuals vs Fitted plot

For some examples of good Residuals vs Fitted plot and ones that show serious errors, check out this page.

For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.

 plot(reg_model, 1)

Check for outliers using Cook’s distance and a Residuals vs Leverage plot

For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)
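The plots can be backed up with numbers. A minimal sketch, assuming the .5 reference value used in this report as the cutoff, that pulls Cook’s distance for each case and lists any case above it. The model here is fit to simulated data, and toy, toy_model, cooks, cutoff, and flagged are hypothetical names:

```r
set.seed(2)  # simulated data only; not the study data
toy <- data.frame(x = rnorm(100))
toy$y <- 3 - 0.2 * toy$x + rnorm(100, sd = 0.3)
toy_model <- lm(y ~ x, data = toy)

# Cook's distance for every case in the fitted model.
cooks <- cooks.distance(toy_model)

# Cases above the cutoff (assumed here: .5, the value referenced in this report).
cutoff  <- 0.5
flagged <- which(cooks > cutoff)
flagged

# The three most influential cases, for a closer look.
head(sort(cooks, decreasing = TRUE), 3)
```

Swapping toy_model for reg_model would apply the same check to the regression above.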

Issues with My Data

In the Residuals vs Leverage plot, the red line sat away from the Cook’s distance line, which suggested some potential outliers. However, the Cook’s distance plot looked normal: no case had a value higher than .5. When I checked skew and kurtosis with the data separated by condition, survey1 (stress) in condition 1 had a skew of -1.67 and a kurtosis of 2.41. This indicates a left-skewed distribution with a sharper peak and heavier tails than normal (not a flatter one), which may reflect outliers in the data.

Run Your Analysis

Run a t-Test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$survey1 by d$condition
t = 0.97538, df = 94.279, p-value = 0.3319
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.07041865  0.20641865
sample estimates:
mean in group 1 mean in group 2 
          3.176           3.108 
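To show where the t value and the fractional degrees of freedom come from, here is a sketch of the Welch calculation from summary statistics. The means are taken from the output above and the SDs and group sizes from the describeBy() output; because those SDs are rounded to two decimals, the results should land close to, but not exactly on, the values t.test() reports. Variable names are hypothetical.

```r
# Group summaries for survey1, copied from the output in this report.
m1 <- 3.176; s1 <- 0.31; n1 <- 50   # condition 1
m2 <- 3.108; s2 <- 0.38; n2 <- 50   # condition 2

# Squared standard error contributed by each group.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch's t: mean difference over the unpooled standard error.
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p value.
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 2)
```

The degrees of freedom come out near 94 rather than 98 because Welch’s test does not assume equal variances.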

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 0.1950754 (negligible)
95 percent confidence interval:
     lower      upper 
-0.2027609  0.5929118 
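For reference, Cohen’s d with a pooled standard deviation can be approximated from the same summary statistics. This sketch uses the rounded SDs from the describeBy() output, so it should come out close to the estimate above rather than identical to it; variable names are hypothetical.

```r
# Group summaries for survey1, copied from the output in this report.
m1 <- 3.176; s1 <- 0.31; n1 <- 50   # condition 1
m2 <- 3.108; s2 <- 0.38; n2 <- 50   # condition 2

# Pooled standard deviation across the two groups.
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units.
d_by_hand <- (m1 - m2) / sd_pooled
round(d_by_hand, 3)
```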

Run a Correlation Test

Create a Correlation Matrix

#d2 <- subset(d, select=-c(DROP ANY VARIABLES YOU DO NOT WANT IN THE CORRELATION))
d2 <- subset(d,select=c(survey1,survey.2))
corr_output_m <- corr.test(d2)

View Test Output

  • Strong effect: Between |0.50| and |1|
  • Moderate effect: Between |0.30| and |0.49|
  • Weak effect: Between |0.10| and |0.29|
  • Trivial effect: Less than |0.10|
corr_output_m
Call:corr.test(x = d2)
Correlation matrix 
         survey1 survey.2
survey1     1.00    -0.19
survey.2   -0.19     1.00
Sample Size 
[1] 100
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
         survey1 survey.2
survey1     0.00     0.06
survey.2    0.06     0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
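The p value and a 95% confidence interval can be recovered from r and n alone. The sketch below uses the rounded r = -.19 and n = 100 from the output above, so the results are approximate. It converts r to a t statistic for the p value and uses the Fisher z transformation for the interval; variable names are hypothetical.

```r
r <- -0.19   # correlation between survey1 and survey.2 (rounded)
n <- 100     # sample size

# t statistic and two-sided p value for H0: rho = 0, with n - 2 df.
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_r <- 2 * pt(-abs(t_r), df = n - 2)

# 95% CI via Fisher's z: transform, build the interval, transform back.
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci_r <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

round(c(t = t_r, p = p_r, lower = ci_r[1], upper = ci_r[2]), 2)
```

The interval should come out near [-.37, .01], and it crosses zero, which is consistent with p = .06.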

Write Up Results

t-Test

I tested my hypothesis that participants who get 5 hours of sleep will report being more stressed than participants who get 8 hours of sleep using an independent samples t-test (Welch’s). Levene’s test was not significant (p = .26), which is consistent with homogeneity of variance. When I checked skew and kurtosis with the data separated by condition, survey1 (stress) in condition 1 had a skew of -1.67 and a kurtosis of 2.41, indicating a left-skewed distribution with heavier tails than normal that may reflect outliers. The difference between conditions was not significant, t(94.28) = 0.98, p = .33, d = 0.195, 95% CI [-0.20, 0.59]. This effect size is negligible according to Cohen (1988) (refer to Figure 1).

Correlation Test

I hypothesized that resilience would predict stress levels and that the relationship would be negative. The histograms for both measures were skewed, with one response value chosen far more often than the others and some values never chosen. In the scatterplot of stress and resilience, there did not seem to be a strictly linear relationship between the two. In the Residuals vs Leverage plot, the red line sat away from the Cook’s distance line, which suggested some potential outliers; however, the Cook’s distance plot looked normal (no case had a value higher than .5). When I checked skew and kurtosis with the data separated by condition, survey1 (stress) in condition 1 had a skew of -1.67 and a kurtosis of 2.41, indicating a left-skewed distribution with heavier tails than normal. The correlation was in the predicted negative direction but was not significant, r(98) = -.19, p = .06, 95% CI [-.37, .01] (see Table 1). By the cutoffs listed above, this is a weak effect (Cohen, 1988).

Table 1: Means, standard deviations, and correlations with confidence intervals

Variable       M      SD     1              Surveys
1. survey1     3.14   0.35                  Stress
2. survey.2    3.11   0.22   -.19           Resilience
                             [-.37, .01]

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05. ** indicates p < .01.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.