AI Experiment Analysis
Loading Libraries
library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for post hoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results
Importing Data
# # import your AI results dataset
d <- read.csv(file = "Data/final_data(in).csv", header = TRUE)
State Your Hypotheses & Chosen Tests
H1: I predict that participants who write an essay about their favorite childhood memory will report higher levels of mindfulness than participants who write an essay about their favorite college course.
H2: I predict that women will have higher mindfulness scores than men.
H1: IV with two levels (childhood memory vs. college course) and a continuous DV (mindfulness) = independent-samples t-test
H2: IV with two levels (male vs. female) and a continuous DV (mindfulness) = independent-samples t-test
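Both hypotheses are directional (one group is predicted to score higher), while t.test() runs a two-sided test by default. As an illustration only, and not a change to the analysis reported here, the sketch below runs a Welch t-test both ways on simulated data. The group labels "memory" and "course" and the simulated scores are assumptions, not the study data. Note that alternative = "greater" tests whether the first factor level has the higher mean.

```r
# Sketch: two-sided vs one-sided Welch t-test on SIMULATED data (not the study data).
set.seed(1)
sim <- data.frame(
  score = c(rnorm(50, mean = 4.30, sd = 0.27), rnorm(50, mean = 4.20, sd = 0.27)),
  # hypothetical labels; the first level listed is treated as "group 1"
  group = factor(rep(c("memory", "course"), each = 50),
                 levels = c("memory", "course"))
)

# Two-sided test (the t.test() default)
two_sided <- t.test(score ~ group, data = sim)

# One-sided test: is mean(memory) - mean(course) greater than 0?
one_sided <- t.test(score ~ group, data = sim, alternative = "greater")

two_sided$p.value
one_sided$p.value
```

When the observed difference is in the predicted direction, the one-sided p-value is half the two-sided one; when it runs the other way, the one-sided p-value is large.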
Check Your Variables
This is just basic variable checking that is used across all HW assignments.
# # to view stats for all variables
describe(d)
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
vars n mean sd median trimmed mad min max range skew
id 1 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
identity* 2 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
consent* 3 100 30.88 17.37 26.50 30.26 19.27 1.00 66.00 65.00 0.30
age 4 100 19.81 1.36 20.00 19.59 1.48 17.00 27.00 10.00 2.70
race 5 100 4.94 1.46 6.00 5.08 0.00 1.00 7.00 6.00 -0.68
gender 6 100 1.87 0.81 2.00 1.82 0.00 1.00 7.00 6.00 3.37
manip_out* 7 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
survey1 8 100 4.24 0.27 4.27 4.26 0.20 3.47 4.73 1.27 -0.90
survey2 9 0 NaN NA NA NaN NA Inf -Inf -Inf NA
ai_manip* 10 100 47.92 25.69 50.50 48.77 33.36 1.00 88.00 87.00 -0.23
condition 11 100 1.50 0.50 1.50 1.50 0.74 1.00 2.00 1.00 0.00
kurtosis se
id -1.24 2.90
identity* -1.24 2.90
consent* -0.98 1.74
age 9.95 0.14
race -1.11 0.15
gender 17.64 0.08
manip_out* -1.24 2.90
survey1 0.88 0.03
survey2 NA NA
ai_manip* -1.29 2.57
condition -2.02 0.05
#
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
Descriptive statistics by group
condition: 1
vars n mean sd median trimmed mad min max range skew
id 1 50 25.50 14.58 25.50 25.50 18.53 1.00 50.00 49.00 0.00
identity 2 50 55.06 29.38 52.00 55.25 43.00 9.00 100.00 91.00 -0.04
consent 3 50 30.74 18.72 26.00 29.90 22.98 1.00 66.00 65.00 0.36
age 4 50 19.62 1.03 19.50 19.50 0.74 18.00 25.00 7.00 2.78
race 5 50 4.72 1.53 6.00 4.82 0.00 1.00 7.00 6.00 -0.45
gender 6 50 1.96 0.95 2.00 1.88 0.00 1.00 7.00 6.00 3.47
manip_out 7 50 30.98 20.77 27.50 29.25 18.53 3.00 100.00 97.00 0.93
survey1 8 50 4.25 0.27 4.27 4.27 0.20 3.47 4.73 1.27 -0.89
survey2 9 0 NaN NA NA NaN NA Inf -Inf -Inf NA
ai_manip 10 50 46.68 26.02 46.00 47.15 36.32 2.00 88.00 86.00 -0.12
condition 11 50 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 NaN
kurtosis se
id -1.27 2.06
identity -1.42 4.15
consent -1.02 2.65
age 12.51 0.15
race -1.28 0.22
gender 15.40 0.13
manip_out 0.67 2.94
survey1 0.58 0.04
survey2 NA NA
ai_manip -1.35 3.68
condition NaN 0.00
------------------------------------------------------------
condition: 2
vars n mean sd median trimmed mad min max range skew
id 1 50 75.50 14.58 75.5 75.50 18.53 51.00 100.00 49.0 0.00
identity 2 50 45.94 28.20 49.5 45.58 32.62 1.00 98.00 97.0 0.01
consent 3 50 31.02 16.09 27.0 30.73 17.79 3.00 61.00 58.0 0.19
age 4 50 20.00 1.62 20.0 19.70 0.00 17.00 27.00 10.0 2.30
race 5 50 5.16 1.38 6.0 5.32 0.00 2.00 7.00 5.0 -0.93
gender 6 50 1.78 0.65 2.0 1.77 0.00 1.00 5.00 4.0 1.99
manip_out 7 50 70.02 22.17 74.5 72.03 23.72 1.00 99.00 98.0 -1.08
survey1 8 50 4.23 0.26 4.2 4.25 0.20 3.47 4.67 1.2 -0.89
survey2 9 0 NaN NA NA NaN NA Inf -Inf -Inf NA
ai_manip 10 50 49.16 25.57 54.5 50.42 28.91 1.00 87.00 86.0 -0.34
condition 11 50 2.00 0.00 2.0 2.00 0.00 2.00 2.00 0.0 NaN
kurtosis se
id -1.27 2.06
identity -1.19 3.99
consent -1.13 2.28
age 6.63 0.23
race -0.92 0.19
gender 9.79 0.09
manip_out 1.27 3.14
survey1 1.05 0.04
survey2 NA NA
ai_manip -1.26 3.62
condition NaN 0.00
#
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)
# plot(d$dv, d$tv)
#
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)
1 2
50 50
# cross_cases(d, IV1, IV2)
#
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1 ~ d$condition)
boxplot(d$survey1 ~ d$gender)
#
# # convert any categorical variables to factors
# d$variable <- as.factor(d$variable)
Check Your Assumptions
t-Test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Violating this assumption can make Student’s t-test inaccurate (will check below; R’s t.test() runs Welch’s t-test by default, which does not assume equal variances, but we’ll check anyway)
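The normality rule of thumb used in this report (skew and kurtosis between -2 and 2) can be checked per group without extra packages. A minimal sketch follows. The helper names skew_kurt() and within_rule() are hypothetical, the example scores are made up, and these simple moment formulas may give slightly different values than psych::describe() reports, since that function applies its own estimator by default.

```r
# Sketch: moment-based skewness and excess kurtosis (unadjusted formulas).
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m4 <- mean((x - m)^4)  # fourth central moment
  c(skew = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# TRUE if every statistic lies within the -2 to 2 rule of thumb
within_rule <- function(stats, limit = 2) all(abs(stats) <= limit)

# Made-up example scores and group codes (not the study data)
scores <- c(3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.5, 3.5)
groups <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))

res <- sapply(split(scores, groups), skew_kurt)  # rows: skew, kurtosis; columns: groups
res
apply(res, 2, within_rule)                       # one TRUE/FALSE per group
```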
Checking IV levels
# preview the levels and counts for your IV
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# check your variable types
str(d)
'data.frame': 100 obs. of 11 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ identity : chr "I'm 19 and a proud Latina majoring in environmental science. I love nature but often feel overwhelmed balancing"| __truncated__ "I’m 21 and a junior studying Environmental Science at a state university. Feeling disconnected from my peers, I"| __truncated__ "I'm a 19-year-old White college student studying psychology. I'm passionate yet often overwhelmed by anxiety an"| __truncated__ "I’m Jada, a 20-year-old Black woman studying sociology. I’m passionate about social justice but often feel over"| __truncated__ ...
$ consent : chr "I understand the instructions. I will respond to the memory question with a 200-word response and then complete"| __truncated__ "I understand the instructions. I will respond to the memory question with a 200-word response and then complete"| __truncated__ "I understand the instructions. I will respond to a memory question with a 200-word response, and then I will co"| __truncated__ "I understand the instructions. I'm ready to respond to the memory question and complete the two surveys on mind"| __truncated__ ...
$ age : int 19 21 19 20 20 19 20 20 19 21 ...
$ race : int 4 6 6 3 3 3 6 3 4 6 ...
$ gender : int 2 2 2 2 2 2 1 2 2 2 ...
$ manip_out: chr "One of my fondest childhood memories is a summer camping trip my family took to a national park. I was around e"| __truncated__ "One of my favorite childhood memories takes me back to a summer spent at my grandmother's house. Every Saturday"| __truncated__ "One of my fondest childhood memories is the summer camping trip my family took to a quiet lake nestled in the w"| __truncated__ "One of my fondest childhood memories is from a summer family picnic in the park when I was about eight years ol"| __truncated__ ...
$ survey1 : num 4.13 4.4 4.27 3.47 4.4 ...
$ survey2 : logi NA NA NA NA NA NA ...
$ ai_manip : chr "Thank you for participating!" "I'm 21 and a junior studying Environmental Science at a state university. Feeling disconnected from my peers, I"| __truncated__ "I'm a 19-year-old White college student studying psychology. I'm passionate yet often overwhelmed by anxiety an"| __truncated__ "Thank you for participating!" ...
$ condition: int 1 1 1 1 1 1 1 1 1 1 ...
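t.test() labels its output groups in factor-level order, so with numeric codes the output only says "group 1" and "group 2". One optional refinement is to give the codes descriptive labels when converting to a factor. In the sketch below, the mapping 1 = childhood memory and 2 = college course is an assumption that should be checked against the codebook, and the example codes are made up.

```r
# Sketch: descriptive factor labels for a numeric condition code.
# ASSUMPTION (hypothetical mapping): 1 = childhood memory, 2 = college course.
condition_codes <- c(1, 1, 2, 2, 1)  # made-up example codes
condition_f <- factor(condition_codes,
                      levels = c(1, 2),
                      labels = c("memory", "course"))

levels(condition_f)  # the first level is the one t.test() reports first
table(condition_f)
```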
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test, we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1 ~ condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.0013 0.971
98
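The Levene’s test above reports center = median, the Brown-Forsythe variant. To our understanding, car::leveneTest() with that setting amounts to a one-way ANOVA on each observation’s absolute deviation from its group median. The base R sketch below follows that description; levene_median() is a hypothetical helper and the data are simulated, not the study data.

```r
# Sketch: median-centered Levene's test as a one-way ANOVA on absolute deviations.
levene_median <- function(y, g) {
  g <- factor(g)
  abs_dev <- abs(y - ave(y, g, FUN = median))  # |y - group median|
  fit <- anova(lm(abs_dev ~ g))
  c(F   = fit[["F value"]][1],
    df1 = fit[["Df"]][1],                      # groups - 1
    df2 = fit[["Df"]][2],                      # N - groups
    p   = fit[["Pr(>F)"]][1])
}

# Simulated data: group "b" has twice the spread of group "a"
set.seed(42)
y <- c(rnorm(30, 4.2, 0.25), rnorm(30, 4.2, 0.50))
g <- rep(c("a", "b"), each = 30)
levene_median(y, g)
```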
Check Your Assumptions
The t-test assumptions listed above also apply to the second hypothesis, where the IV is gender.
Checking IV levels
# # preview the levels and counts for your IV
# table(d$iv, useNA = "always")
#
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # to drop levels from your variable
# # this subsets the data and removes any participant whose gender is coded 5 or 7
d <- subset(d, gender != 5)
d <- subset(d, gender != 7)
#
table(d$gender, useNA = "always")
1 2 <NA>
24 73 0
#
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
#
# table(d$iv, useNA = "always")
#
# # check your variable types
# str(d)
#
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$gender <- as.factor(d$gender)
Testing Homogeneity of Variance with Levene’s Test
As with the first hypothesis, a non-significant Levene’s test indicates that the variances of the two gender groups are approximately equal.
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1 ~ gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.0624 0.3053
95
Issues with My Data
I dropped participants who identified as non-binary or “another gender not listed here” (coded 5 and 7), leaving 97 male and female participants. Because these participants were removed before the t-tests were run, both t-tests use these 97 participants, while Levene’s test for the first hypothesis was run on all 100. For the first hypothesis, Levene’s test was non-significant (p = .971), so the homogeneity of variance assumption was met, and the dependent variable was approximately normally distributed, with skew and kurtosis between -2 and 2 in both conditions. For the second hypothesis, Levene’s test was also non-significant (p = .305), and the dependent variable had skew and kurtosis between -2 and 2.
Run Your Analysis
Run a t-Test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1 ~ d$condition)
View Test Output
t_output
Welch Two Sample t-test
data: d$survey1 by d$condition
t = 0.3147, df = 94.857, p-value = 0.7537
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.09073142 0.12491509
sample estimates:
mean in group 1 mean in group 2
4.237500 4.220408
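The output above is labeled "Welch Two Sample t-test" because t.test() assumes unequal variances unless var.equal = TRUE is set. The sketch below computes Welch’s t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with t.test(); welch_t() is a hypothetical helper and the data are simulated, not the study data.

```r
# Sketch: Welch's t-test by hand, checked against the built-in t.test().
welch_t <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  v1 <- var(x1) / n1; v2 <- var(x2) / n2         # squared standard errors
  t_stat <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite
  p <- 2 * pt(-abs(t_stat), df)                  # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

# Simulated data (not the study data)
set.seed(7)
x1 <- rnorm(48, 4.24, 0.27)
x2 <- rnorm(49, 4.22, 0.26)

mine <- welch_t(x1, x2)
builtin <- t.test(x1, x2)  # Welch by default (var.equal = FALSE)
mine
c(builtin$statistic, builtin$parameter, builtin$p.value)
```

The fractional degrees of freedom in the output above (df = 94.857) come from this Welch-Satterthwaite formula.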
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
# d_output <- cohen.d(d$survey1 ~ d$condition)
# no significant difference, so we don't need to look at the effect size
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
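The cutoffs above can be applied programmatically. Here is a minimal sketch using base R’s cut(); label_d() is a hypothetical helper, and treating each cutoff as the lower bound of the next category (so .2 counts as small) is an assumption the list above does not settle.

```r
# Sketch: label |d| using the trivial/small/medium/large cutoffs listed above.
label_d <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("trivial", "small", "medium", "large"),
      right = FALSE,          # intervals are [0, .2), [.2, .5), [.5, .8), [.8, Inf]
      include.lowest = TRUE)
}

label_d(c(0.05, -0.35, 0.444, 0.65, 1.2))  # trivial, small, small, medium, large
```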
# d_output
Run a t-Test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1 ~ d$gender)
View Test Output
t_output
Welch Two Sample t-test
data: d$survey1 by d$gender
t = 2.1321, df = 49.582, p-value = 0.03798
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
0.006735557 0.226597776
sample estimates:
mean in group 1 mean in group 2
4.316667 4.200000
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1 ~ d$gender)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: 0.4441989 (small)
95 percent confidence interval:
lower upper
-0.02719842 0.91559619
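Cohen’s d is the mean difference divided by a standard deviation. A minimal sketch using the pooled standard deviation is below; cohens_d() is a hypothetical helper and the data are made up. We believe this matches the point estimate effsize::cohen.d() reports with its default settings (pooled SD, no Hedges correction), but that is worth confirming against the package documentation. The sign follows group 1 minus group 2, as in the t-test output.

```r
# Sketch: Cohen's d = (mean1 - mean2) / pooled SD.
cohens_d <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  s_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / s_pooled
}

# Made-up example data (not the study data); x2 is x1 shifted down by 0.2
x1 <- c(4.4, 4.3, 4.5, 4.2, 4.1)
x2 <- c(4.2, 4.1, 4.3, 4.0, 3.9)
cohens_d(x1, x2)
```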
Write Up Results
t-Test
I tested my hypothesis that participants who wrote a short essay about their favorite childhood memory would have a higher average mindful attention awareness score than participants who wrote an essay about their favorite college course using a Welch t-test. My data met all of the assumptions of a t-test. However, I did not find a significant difference, t(94.86) = 0.31, p = .754, 95% CI [-0.091, 0.125] (refer to Figure 1).
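I tested my hypothesis that women would have a higher average mindful attention awareness score than men using a Welch t-test. My data met all of the assumptions of a t-test. I found a significant difference, t(49.58) = 2.13, p = .038, d = 0.44, 95% CI [0.007, 0.227] (refer to Figure 2). However, if gender is coded 1 = male and 2 = female (participants coded 2 in the str() preview describe themselves as women), the difference ran opposite to my prediction: men (M = 4.32) reported higher mindfulness than women (M = 4.20), in which case H2 was not supported. This effect size is small according to Cohen (1988).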
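The result strings in the write-up can be assembled directly from a t.test() object to avoid copying errors. In this sketch, apa_t() is a hypothetical helper, and the rounding (two decimals for t and df, three for p and the confidence limits, no leading zero for p) is an assumption modeled on APA style. The example uses a plain list holding the values printed in the first t-test output, since apa_t() only reads the parameter, statistic, p.value, and conf.int elements; a real t.test() result can be passed the same way.

```r
# Sketch: format a t.test() result as an APA-style string.
apa_t <- function(tt) {
  p_txt <- if (tt$p.value < .001) {
    "< .001"
  } else {
    # drop the leading zero, e.g. "0.754" -> ".754"
    paste("=", sub("^0", "", formatC(tt$p.value, format = "f", digits = 3)))
  }
  sprintf("t(%.2f) = %.2f, p %s, 95%% CI [%.3f, %.3f]",
          tt$parameter, tt$statistic, p_txt, tt$conf.int[1], tt$conf.int[2])
}

# Values copied from the first t-test output above
h1 <- list(parameter = c(df = 94.857), statistic = c(t = 0.3147),
           p.value = 0.7537, conf.int = c(-0.09073142, 0.12491509))
apa_t(h1)  # "t(94.86) = 0.31, p = .754, 95% CI [-0.091, 0.125]"
```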
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.