AI Experiment Analysis
Loading Libraries
library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results
Importing Data
# # import your AI results dataset
d <- read.csv(file="Data/final_results(results_c1 (2)).csv", header=T)
State Your Hypotheses & Chosen Tests
H1: I predict that participants who engage in regular exercise will report lower levels of stress than those who do not exercise.
H2: I predict that participants with higher levels of anxiety will report higher levels of stress overall, regardless of their exercise habits.
Tests:
- H1: Independent samples t-test
- H2: Correlation
- H2 (control): Multiple regression
Check Your Variables
This is just basic variable checking that is used across all HW assignments.
# # to view stats for all variables
describe(d)
vars n mean sd median trimmed mad min max range skew
id 1 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
identity* 2 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
consent* 3 100 1.92 0.27 2.00 2.00 0.00 1.00 2.00 1.00 -3.05
age 4 100 37.88 11.40 34.00 36.29 2.97 18.00 80.00 62.00 1.53
race 5 100 4.54 1.60 5.00 4.61 1.48 1.00 7.00 6.00 -0.19
gender 6 100 1.90 0.46 2.00 1.96 0.00 1.00 5.00 4.00 2.09
manip_out* 7 100 28.84 11.37 31.00 29.30 2.22 1.00 53.00 52.00 -0.47
survey1 8 100 2.86 0.28 2.71 2.83 0.11 2.14 3.57 1.43 0.85
survey2 9 100 2.70 0.19 2.75 2.72 0.00 2.00 3.25 1.25 -0.87
ai_manip* 10 100 50.50 29.01 50.50 50.50 37.06 1.00 100.00 99.00 0.00
condition 11 100 1.50 0.50 1.50 1.50 0.74 1.00 2.00 1.00 0.00
kurtosis se
id -1.24 2.90
identity* -1.24 2.90
consent* 7.38 0.03
age 2.14 1.14
race -1.59 0.16
gender 19.42 0.05
manip_out* 0.28 1.14
survey1 0.50 0.03
survey2 3.19 0.02
ai_manip* -1.24 2.90
condition -2.02 0.05
#
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Descriptive statistics by group
condition: 1
vars n mean sd median trimmed mad min max range skew
id 1 50 25.50 14.58 25.50 25.50 18.53 1.00 50.00 49.00 0.00
identity 2 50 46.52 30.31 51.50 46.00 36.32 1.00 100.00 99.00 0.00
consent 3 50 1.98 0.14 2.00 2.00 0.00 1.00 2.00 1.00 -6.65
age 4 50 36.72 11.01 34.00 35.10 2.97 21.00 77.00 56.00 1.73
race 5 50 4.40 1.68 4.00 4.50 2.97 1.00 7.00 6.00 -0.17
gender 6 50 1.92 0.57 2.00 1.95 0.00 1.00 5.00 4.00 2.63
manip_out 7 50 26.34 15.44 25.50 26.30 20.02 1.00 52.00 51.00 0.03
survey1 8 50 2.86 0.28 2.71 2.82 0.00 2.14 3.57 1.43 0.80
survey2 9 50 2.74 0.17 2.75 2.75 0.00 2.25 3.25 1.00 -0.65
ai_manip 10 50 34.44 26.54 27.50 31.45 26.69 1.00 100.00 99.00 0.82
condition 11 50 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 NaN
kurtosis se
id -1.27 2.06
identity -1.32 4.29
consent 43.12 0.02
age 2.88 1.56
race -1.59 0.24
gender 15.56 0.08
manip_out -1.33 2.18
survey1 0.71 0.04
survey2 2.87 0.02
ai_manip -0.29 3.75
condition NaN 0.00
------------------------------------------------------------
condition: 2
vars n mean sd median trimmed mad min max range skew
id 1 50 75.50 14.58 75.50 75.50 18.53 51.00 100.00 49.00 0.00
identity 2 50 54.48 27.38 48.50 54.35 35.58 10.00 98.00 88.00 0.09
consent 3 50 1.86 0.35 2.00 1.95 0.00 1.00 2.00 1.00 -2.01
age 4 50 39.04 11.76 34.00 37.45 2.97 18.00 80.00 62.00 1.33
race 5 50 4.68 1.52 6.00 4.70 1.48 2.00 7.00 5.00 -0.15
gender 6 50 1.88 0.33 2.00 1.98 0.00 1.00 2.00 1.00 -2.27
manip_out 7 50 31.34 3.14 31.00 31.00 0.00 30.00 53.00 23.00 6.55
survey1 8 50 2.87 0.28 2.71 2.84 0.21 2.29 3.57 1.29 0.88
survey2 9 50 2.67 0.19 2.75 2.70 0.00 2.00 3.25 1.25 -1.00
ai_manip 10 50 66.56 21.68 68.50 67.65 23.72 14.00 99.00 85.00 -0.42
condition 11 50 2.00 0.00 2.00 2.00 0.00 2.00 2.00 0.00 NaN
kurtosis se
id -1.27 2.06
identity -1.41 3.87
consent 2.10 0.05
age 1.47 1.66
race -1.77 0.21
gender 3.21 0.05
manip_out 42.27 0.44
survey1 0.15 0.04
survey2 3.15 0.03
ai_manip -0.75 3.07
condition NaN 0.00
#
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)
hist(d$survey2)
plot(d$survey1, d$survey2)
#
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)
1 2
50 50
# cross_cases(d, IV1, IV2)
#
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1 ~ d$condition)
#
# #convert any categorical variables to factors
d$condition <- as.factor(d$condition)
Check Your Assumptions
t-Test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below. if issues, note and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# # preview the levels and counts for your IV
# table(d$iv, useNA = "always")
#
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, IV != "BAD")
#
# table(d$iv, useNA = "always")
#
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
#
# table(d$iv, useNA = "always")
#
# # check your variable types
# str(d)
#
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
# d$iv <- as.factor(d$iv)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
# leveneTest(dv~iv, data = d)
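As a rough illustration of what Levene's test computes, here is a minimal base R sketch. It assumes the median-centered version of the test (which, as far as I know, is the default in `car::leveneTest()`). The data frame `sim` and its columns `iv` and `dv` are simulated placeholders, not the study data.

```r
# Simulated placeholder data (NOT the study data): two groups of 50
set.seed(1)
sim <- data.frame(
  iv = factor(rep(c("1", "2"), each = 50)),
  dv = rnorm(100, mean = 2.8, sd = 0.3)
)

# Absolute deviation of each score from its own group's median
sim$abs_dev <- abs(sim$dv - ave(sim$dv, sim$iv, FUN = median))

# Levene's test amounts to a one-way ANOVA on those absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ iv, data = sim))
levene_p <- levene_by_hand[["Pr(>F)"]][1]

# A p-value above .05 means we do not reject the equal-variance null
levene_p
```

With the real data, the same formula setup (`dv ~ iv`) would go straight into `leveneTest()`; the hand-rolled version is only meant to show where the F statistic comes from.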
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- Relationship between the variables should be linear, or the relationship will not be detected (will do below)
Run a Multiple Linear Regression
To check the assumptions for Pearson’s correlation coefficient, we run our regression and then check our diagnostic plots.
# # use the lm() command to run the regression
# # dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey2 ~ survey1, data = d)
Check linearity with Residuals vs Fitted plot
For some examples of good Residuals vs Fitted plot and ones that show serious errors, check out this page.
For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.
plot(reg_model, 1)
Check for outliers using Cook’s distance and a Residuals vs Leverage plot
For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.
# # Cook's distance
#
# # Residuals vs Leverage
plot(reg_model, 4)
plot(reg_model, 5)
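To go with the plots, Cook's distance can also be inspected numerically. The sketch below uses simulated data and a hypothetical model named `sim_model`; the 4/n cutoff is a common rule of thumb that I am assuming here, not a threshold stated in this assignment.

```r
# Simulated placeholder data (NOT the study data)
set.seed(2)
sim <- data.frame(survey1 = rnorm(100, mean = 2.86, sd = 0.28))
sim$survey2 <- 2.7 + 0.1 * (sim$survey1 - 2.86) + rnorm(100, sd = 0.19)

# Same model form as reg_model, fitted to the simulated data
sim_model <- lm(survey2 ~ survey1, data = sim)

# One Cook's distance value per observation
cooks_d <- cooks.distance(sim_model)

# Assumed rule-of-thumb cutoff: 4 / n
cutoff <- 4 / nrow(sim)

# Row numbers of cases that exceed the cutoff and deserve a closer look
flagged <- which(cooks_d > cutoff)
flagged
```

Cases flagged this way are candidates for inspection, not automatic removals; the judgement call still rests on the plots.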
Issues with My Data
Describe any issues and why they’re problematic here.
Run Your Analysis
Run a t-Test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1 ~ d$condition)
View Test Output
str(d)
'data.frame': 100 obs. of 11 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ identity : chr "I\x82\xc4\xf4m 34, an Asian female who works as a graphic designer. Lately, I\x82\xc4\xf4ve been feeling overwhelmed by work de "I'm 59 and Black, navigating life with a mix of resilience and anxiety. I'm a single mother of two, balancing work and family c "I\x82\xc4\xf4m 34, a White female marketing manager living in Chicago. Balancing work and my side hobby of painting has been to "I\x82\xc4\xf4m 32, a Black woman navigating life as a social worker. Balancing my passion for helping others with my own strugg ...
$ consent : chr "I understand these instructions." "I understand these instructions." "I understand these instructions." "I understand these instructions." ...
$ age : int 34 59 34 32 65 36 32 32 22 57 ...
$ race : int 2 3 6 3 3 6 6 1 6 3 ...
$ gender : int 2 2 2 2 2 2 2 2 1 2 ...
$ manip_out: chr "**Two-Week Structured Exercise Program**\n\n**Week 1:**\n\n**Day 1: Monday**\n- **Activity:** 30-minute brisk walk outside foll "It's great to hear that you're considering taking part in a structured exercise program! Engaging in regular physical activity "That sounds like an interesting and beneficial program! Here's how I would plan and approach the next two weeks of participatin "That sounds like a meaningful exercise program, and it\x82\xc4\xf4s great that you're integrating physical activity into your r ...
$ survey1 : num 2.71 2.71 2.71 2.86 2.14 ...
$ survey2 : num 2.75 2.75 2.75 2.75 2.75 2.75 2.75 3 2.75 2.75 ...
$ ai_manip : chr "Thank you for sharing your two-week structured exercise program! It seems like a well-thought-out approach to managing your anx "It sounds like you\x82\xc4\xf4re navigating a lot, and I applaud your resilience in managing your responsibilities as a single "It sounds like you\x82\xc4\xf4ve made some great strides in managing your stress and anxiety through your structured exercise p "It sounds like you're on a meaningful journey towards better mental and physical well-being, supporting both yourself and your ...
$ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
str(d$condition)
Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
t_output <- t.test(d$survey1 ~ d$condition)
print(t_output)
Welch Two Sample t-test
data: d$survey1 by d$condition
t = -0.15464, df = 97.993, p-value = 0.8774
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.1185658 0.1014230
sample estimates:
mean in group 1 mean in group 2
2.857143 2.865714
t_output
Welch Two Sample t-test
data: d$survey1 by d$condition
t = -0.15464, df = 97.993, p-value = 0.8774
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.1185658 0.1014230
sample estimates:
mean in group 1 mean in group 2
2.857143 2.865714
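The output above is labelled "Welch Two Sample t-test" because `t.test()` does not assume equal variances by default. A minimal sketch of the arithmetic behind the t statistic and the fractional df follows; `g1` and `g2` are simulated placeholder groups, not the study data.

```r
# Simulated placeholder groups (NOT the study data)
set.seed(3)
g1 <- rnorm(50, mean = 2.86, sd = 0.28)
g2 <- rnorm(50, mean = 2.87, sd = 0.28)

# Each group's variance divided by its sample size
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t: mean difference over the unpooled standard error
t_by_hand <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_by_hand <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Built-in version for comparison
t_builtin <- t.test(g1, g2)
c(t = t_by_hand, df = df_by_hand)
```

This is why the df in the output is 97.993 rather than 98: the Welch correction adjusts df according to how similar the two group variances are.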
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1 ~ d$condition)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: -0.03092837 (negligible)
95 percent confidence interval:
lower upper
-0.4278456 0.3659888
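For reference, Cohen's d is the mean difference divided by the pooled standard deviation. The sketch below computes it by hand in base R, assuming the pooled-SD form without Hedges' correction (which I believe matches the `cohen.d()` defaults), and labels the result with the cutoffs listed above. `g1` and `g2` are simulated placeholders, not the study data.

```r
# Simulated placeholder groups (NOT the study data)
set.seed(4)
g1 <- rnorm(50, mean = 2.86, sd = 0.28)
g2 <- rnorm(50, mean = 2.87, sd = 0.28)
n1 <- length(g1)
n2 <- length(g2)

# Pooled SD: group variances weighted by their degrees of freedom
pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units
d_by_hand <- (mean(g1) - mean(g2)) / pooled_sd

# Label the magnitude using the cutoffs from this section
d_label <- cut(abs(d_by_hand),
               breaks = c(0, 0.2, 0.5, 0.8, Inf),
               labels = c("trivial", "small", "medium", "large"),
               right = FALSE)
d_by_hand
as.character(d_label)
```

The sign of d only reflects which group is subtracted from which; the label is based on the absolute value.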
Run a Correlation Test
Create a Correlation Matrix
d2 <- subset(d, select=c(survey1,survey2))
# corr_output_m <- corr.test(d2)
View Test Output
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.10|
# corr_output_m
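Since only one pair of variables is involved here, a single Pearson test is enough to illustrate the step. The sketch below uses base R's `cor.test()` rather than `corr.test()` from psych, and runs on a simulated data frame `sim2` (a placeholder, not the study data). It labels r with the cutoffs listed above.

```r
# Simulated placeholder data (NOT the study data)
set.seed(5)
sim2 <- data.frame(survey1 = rnorm(100))
sim2$survey2 <- 0.3 * sim2$survey1 + rnorm(100)

# Pearson correlation with its significance test and confidence interval
corr_sketch <- cor.test(sim2$survey1, sim2$survey2, method = "pearson")
r <- unname(corr_sketch$estimate)

# Label the strength of r using the cutoffs from this section
r_label <- cut(abs(r),
               breaks = c(0, 0.10, 0.30, 0.50, Inf),
               labels = c("trivial", "weak", "moderate", "strong"),
               right = FALSE)

r
corr_sketch$p.value
as.character(r_label)
```

For a full correlation matrix with more than two variables, `corr.test()` on the subsetted data frame is the more convenient route.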
Run an ANOVA
# aov_model <- aov_ez(data = d,
# id = "X",
# between = c("IV1"),
# dv = "pss",
# anova_table = list(es = "pes"))
#
# aov_model2 <- aov_ez(data = d2,
# id = "X",
# between = c("IV1","IV2"),
# dv = "pss",
# anova_table = list(es = "pes"))
View Output
Effect size cutoffs from Cohen (1988):
- η2 = 0.01 indicates a small effect
- η2 = 0.06 indicates a medium effect
- η2 = 0.14 indicates a large effect
# nice(aov_model)
#
# nice(aov_model2)
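The `es = "pes"` setting in the template asks for partial eta squared. As a rough sketch of what that number is, the code below computes it from the sums of squares of a one-way ANOVA fitted with base R's `aov()` (swapped in here for `aov_ez()` so the example has no package dependencies). The data frame `sim` and the three-level factor `IV1` are simulated placeholders.

```r
# Simulated placeholder data (NOT the study data): three groups of 30
set.seed(6)
sim <- data.frame(
  IV1 = factor(rep(c("a", "b", "c"), each = 30)),
  pss = rnorm(90, mean = 3, sd = 0.5)
)

# One-way between-subjects ANOVA
fit <- aov(pss ~ IV1, data = sim)

# Sums of squares: first entry is the effect, second is the residual
ss <- summary(fit)[[1]][["Sum Sq"]]

# Partial eta squared = SS_effect / (SS_effect + SS_error)
pes <- ss[1] / (ss[1] + ss[2])
pes
```

In a one-way design this equals ordinary eta squared; the two only diverge once a second factor is added to the model.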
Visualize Results
# afex_plot(aov_model, x = "IV1")
#
# afex_plot(aov_model2, x = "IV1", trace = "IV2")
# afex_plot(aov_model2, x = "IV2", trace = "IV1")
Run Posthoc Tests (One-Way)
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
# emmeans(aov_model, specs="IV1", adjust="tukey")
# pairs(emmeans(aov_model, specs="IV1", adjust="tukey"))
Run Posthoc Tests (Two-Way)
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
# emmeans(aov_model, specs="IV1", adjust="tukey")
# pairs(emmeans(aov_model, specs="IV1", adjust="tukey"))
#
# emmeans(aov_model2, specs="IV2", adjust="tukey")
# pairs(emmeans(aov_model2, specs="IV2", adjust="tukey"))
#
# emmeans(aov_model2, specs="IV1", by="IV2", adjust="sidak")
# pairs(emmeans(aov_model2, specs="IV1", by="IV2", adjust="sidak"))
#
# emmeans(aov_model2, specs="IV2", by="IV1", adjust="sidak")
# pairs(emmeans(aov_model2, specs="IV2", by="IV1", adjust="sidak"))
Run a Multiple Linear Regression
You already ran this!
View Test Output
Effect size cutoffs from Cohen (1988):
- Trivial: < .1
- Small: between .1 and .3
- Medium: between .3 and .5
- Large: > .5
# summary(reg_model)
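With a single predictor, the regression and the correlation carry the same information: R-squared equals r squared, and the standardized slope equals r. The sketch below shows this on simulated data, assuming the cutoffs above are meant for the standardized coefficient. `sim`, `sim_model`, `z1`, and `z2` are hypothetical names, not objects from the analysis.

```r
# Simulated placeholder data (NOT the study data)
set.seed(7)
sim <- data.frame(survey1 = rnorm(100, mean = 2.86, sd = 0.28))
sim$survey2 <- 2.7 + 0.2 * (sim$survey1 - 2.86) + rnorm(100, sd = 0.19)

# Same model form as reg_model, fitted to the simulated data
sim_model <- lm(survey2 ~ survey1, data = sim)
r_squared <- summary(sim_model)$r.squared

# Pearson correlation between the same two variables
r <- cor(sim$survey1, sim$survey2)

# Standardized slope: refit the model on z-scored variables
sim$z1 <- as.numeric(scale(sim$survey1))
sim$z2 <- as.numeric(scale(sim$survey2))
std_beta <- unname(coef(lm(z2 ~ z1, data = sim))["z1"])

c(r = r, r_squared = r_squared, std_beta = std_beta)
```

Once a second predictor is added for the control analysis, these identities no longer hold and each standardized slope has to be read from the model itself.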
Write Up Results
t-Test
Results: t = -0.15464, df = 97.993, p-value = 0.8774
The p-value is greater than 0.05, indicating that there is no statistically significant difference in survey1 scores between the two conditions (group 1 mean = 2.86, group 2 mean = 2.87). The effect size was negligible (Cohen's d = -0.03, 95% CI [-0.43, 0.37]). This suggests that exercise did not significantly impact stress levels in the sample.
Correlation Test
Write-up of your results goes here. Check past labs/HWs for template. Depending on how many variables you have here, I may need to help you tweak your table output.
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.