AI Experiment Analysis
Loading Libraries
library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our results
library(ggbeeswarm)
Importing Data
# import your AI results dataset
d <- read.csv(file = "Data/Final_Results1.csv", header = TRUE)
State Your Hypotheses & Chosen Tests
H1: I predict that participants writing about the idea of co-habiting will report higher levels of social support than participants writing about living alone. (Note: participants' own living status does not determine which prompt they receive.) (T-Test)
H2: I predict that married people will report the highest level of social support, people in non-marital relationships will report a moderate level, and single people will report the lowest level. (ANOVA)
Check Your Variables
This is just basic variable checking that is used across all HW assignments.
# to view stats for all variables
describe(d)
vars n mean sd median trimmed mad min max range skew
id 1 100 50.50 29.01 50.50 50.50 37.06 1.00 100 99.00 0.00
identity* 2 100 50.50 29.01 50.50 50.50 37.06 1.00 100 99.00 0.00
consent* 3 100 50.50 29.01 50.50 50.50 37.06 1.00 100 99.00 0.00
age 4 100 40.02 14.63 34.00 37.30 2.97 20.00 99 79.00 1.82
race 5 100 4.64 1.57 6.00 4.76 0.00 2.00 7 5.00 -0.33
gender 6 100 1.97 0.17 2.00 2.00 0.00 1.00 2 1.00 -5.43
manip_out* 7 100 50.50 29.01 50.50 50.50 37.06 1.00 100 99.00 0.00
survey1 8 100 8.00 0.50 7.92 7.99 0.25 6.67 10 3.33 0.79
survey2* 9 100 3.75 0.77 4.00 3.91 0.00 1.00 5 4.00 -1.91
survey3* 10 100 1.20 0.40 1.00 1.12 0.00 1.00 2 1.00 1.48
ai_manip* 11 100 50.50 29.01 50.50 50.50 37.06 1.00 100 99.00 0.00
condition 12 100 1.50 0.50 1.50 1.50 0.74 1.00 2 1.00 0.00
kurtosis se
id -1.24 2.90
identity* -1.24 2.90
consent* -1.24 2.90
age 3.07 1.46
race -1.57 0.16
gender 27.74 0.02
manip_out* -1.24 2.90
survey1 3.87 0.05
survey2* 3.22 0.08
survey3* 0.19 0.04
ai_manip* -1.24 2.90
condition -2.02 0.05
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Descriptive statistics by group
condition: 1
vars n mean sd median trimmed mad min max range skew
id 1 50 25.50 14.58 25.50 25.50 18.53 1.00 50 49.00 0.00
identity 2 50 53.16 29.88 56.50 53.35 42.25 4.00 100 96.00 -0.07
consent 3 50 50.72 30.49 51.50 50.55 38.55 3.00 99 96.00 0.01
age 4 50 42.16 16.37 34.00 39.12 3.71 26.00 99 73.00 1.59
race 5 50 4.32 1.58 4.00 4.38 2.97 2.00 7 5.00 0.02
gender 6 50 1.98 0.14 2.00 2.00 0.00 1.00 2 1.00 -6.65
manip_out 7 50 31.56 16.49 30.50 31.40 20.76 4.00 61 57.00 0.10
survey1 8 50 8.05 0.56 7.92 8.00 0.25 6.75 10 3.25 1.12
survey2 9 50 3.72 0.88 4.00 3.88 0.00 1.00 5 4.00 -1.72
survey3 10 50 1.16 0.37 1.00 1.07 0.00 1.00 2 1.00 1.80
ai_manip 11 50 57.44 29.03 63.50 59.02 31.13 4.00 98 94.00 -0.47
condition 12 50 1.00 0.00 1.00 1.00 0.00 1.00 1 0.00 NaN
kurtosis se
id -1.27 2.06
identity -1.36 4.23
consent -1.33 4.31
age 1.80 2.32
race -1.63 0.22
gender 43.12 0.02
manip_out -1.18 2.33
survey1 3.02 0.08
survey2 2.36 0.12
survey3 1.26 0.05
ai_manip -1.10 4.11
condition NaN 0.00
------------------------------------------------------------
condition: 2
vars n mean sd median trimmed mad min max range skew
id 1 50 75.50 14.58 75.50 75.50 18.53 51.00 100.00 49.00 0.00
identity 2 50 47.84 28.16 46.00 47.65 35.58 1.00 97.00 96.00 0.05
consent 3 50 50.28 27.76 49.00 50.60 34.84 1.00 100.00 99.00 -0.02
age 4 50 37.88 12.45 34.00 35.95 2.97 20.00 82.00 62.00 1.93
race 5 50 4.96 1.50 6.00 5.12 0.00 2.00 7.00 5.00 -0.72
gender 6 50 1.96 0.20 2.00 2.00 0.00 1.00 2.00 1.00 -4.55
manip_out 7 50 69.44 26.39 75.50 73.92 18.53 1.00 100.00 99.00 -1.31
survey1 8 50 7.96 0.42 7.92 7.99 0.25 6.67 9.33 2.66 -0.35
survey2 9 50 3.78 0.65 4.00 3.95 0.00 2.00 5.00 3.00 -1.98
survey3 10 50 1.24 0.43 1.00 1.18 0.00 1.00 2.00 1.00 1.18
ai_manip 11 50 43.56 27.56 40.00 41.98 25.95 1.00 100.00 99.00 0.47
condition 12 50 2.00 0.00 2.00 2.00 0.00 2.00 2.00 0.00 NaN
kurtosis se
id -1.27 2.06
identity -1.17 3.98
consent -1.22 3.93
age 4.17 1.76
race -1.21 0.21
gender 19.13 0.03
manip_out 0.96 3.73
survey1 2.79 0.06
survey2 3.04 0.09
survey3 -0.62 0.06
ai_manip -0.77 3.90
condition NaN 0.00
# also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)
# and table() and cross_cases() to examine your categorical variables
# you may not need the cross_cases code
table(d$condition)
1 2
50 50
cross_cases(d, condition, survey2)
| | survey2 | | | | |
|---|---|---|---|---|---|
| | Boyfriend or Girlfriend | Dating | Divorced | Single | Widowed |
| condition | | | | | |
| 1 | 2 | 5 | 1 | 39 | 3 |
| 2 | | 5 | 2 | 42 | 1 |
| #Total cases | 2 | 10 | 3 | 81 | 4 |
# and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)
# convert any categorical variables to factors
d$condition <- as.factor(d$condition)
## condition = iv
## survey 1 = dv
## survey 2 = tv (third variable)
Check Your Assumptions
T-Test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below. if issues, note and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# preview the levels and counts for your IV
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, iv != "BAD")
# table(d$iv, useNA = "always")
# to combine levels
# this copies the IV into a new _rc variable, then replaces 'BAD' with 'GOOD' in that copy
# d$iv_rc <- d$iv
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
# table(d$iv_rc, useNA = "always")
# check your variable types
str(d)
'data.frame': 100 obs. of 12 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ identity : chr "I'm a 32-year-old White woman living in a cozy apartment in Portland with my younger sister. I enjoy hiking and"| __truncated__ "I'm a 34-year-old White woman named Emily. I live with my younger sister in a cozy apartment in Portland. I'm a"| __truncated__ "I'm a 32-year-old Latina woman living in a cozy apartment in Austin, Texas, with my younger brother. I work as "| __truncated__ "I'm a 99-year-old multiracial woman living in a cozy apartment with my younger sister. We often host game night"| __truncated__ ...
$ consent : chr "I understand the instructions. I'll be writing my responses to the prompt on the provided paper using the chose"| __truncated__ "I understand the instructions. I will write in response to the prompt on the provided piece of paper with my ch"| __truncated__ "I understand the instructions clearly. I will be participating in a study about perceptions of living status, w"| __truncated__ "I understand the instructions. I'm ready to respond to the prompt on the provided piece of paper and will be pr"| __truncated__ ...
$ age : int 32 34 32 99 35 34 29 70 34 57 ...
$ race : int 6 6 4 7 6 4 4 3 6 3 ...
$ gender : int 2 2 2 2 2 2 2 2 2 2 ...
$ manip_out: chr "If I were to imagine myself as a 32-year-old White woman living with someone else in Portland, it seems only fi"| __truncated__ "If I were to imagine myself living with someone else, I would choose to share my cozy apartment in Portland wit"| __truncated__ "In this scenario, let’s imagine that I’m living with my best friend, Mia, who shares a similar creative spirit "| __truncated__ "If I were a 99-year-old multiracial woman living with someone else, I would choose to live with my best friend "| __truncated__ ...
$ survey1 : num 8.17 7.92 7.75 9.25 7.08 ...
$ survey2 : chr "Single" "Single" "Single" "Dating" ...
$ survey3 : chr "Co-habiting" "Co-habiting" "Co-habiting" "Co-habiting" ...
$ ai_manip : chr "Thank you for sharing your detailed insights! Your responses reflect a thoughtful consideration of the dynamics"| __truncated__ "I answered the questions based on my belief that living with someone like Mia would enhance my social support a"| __truncated__ "I described my responses based on the positive dynamics I foresee living with my best friend, emphasizing the c"| __truncated__ "Thank you for this opportunity! I answered the questions based on the deep bonds I share with my sister and par"| __truncated__ ...
$ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$survey2 <- as.factor(d$survey2)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.5342 0.4666
98
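To see what Levene's test is doing under the hood, here is a minimal sketch in base R. With center = median, the test amounts to a one-way ANOVA on each score's absolute deviation from its group median. The data frame sim and its columns are simulated for illustration only (they are not the study data), so the numbers will not match the output above.

```r
# Simulated two-group data (ASSUMPTION: illustration only, not the study data)
set.seed(1)
sim <- data.frame(
  y = c(rnorm(50, mean = 8, sd = 0.5), rnorm(50, mean = 8, sd = 0.5)),
  g = factor(rep(c("1", "2"), each = 50))
)

# Step 1: absolute deviation of each score from its own group's median
group_medians <- tapply(sim$y, sim$g, median)
sim$abs_dev <- abs(sim$y - group_medians[as.character(sim$g)])

# Step 2: one-way ANOVA on those deviations; its F test is the Levene test
lev_fit <- anova(lm(abs_dev ~ g, data = sim))
lev_F <- lev_fit$`F value`[1]
lev_p <- lev_fit$`Pr(>F)`[1]

# A non-significant p here means we have no evidence the variances differ
c(F = lev_F, p = lev_p)
```

If the car package is loaded, leveneTest(y ~ g, data = sim) should report the same F and p values as lev_F and lev_p.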
ANOVA Assumptions
- DV should be normally distributed across levels of the IV
- All levels of the IVs should have an equal number of cases, and there should be no empty cells. Cells with low numbers decrease the power of the test (increasing the chance of a Type II error)
- Homogeneity of variance should be assured
- Outliers should be identified and removed
- If you have confirmed everything above, the sampling distribution should be normal. (For a demonstration of what the sampling distribution is, go here.)
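The equal-cell-size assumption is easy to check with a few lines of base R. The sketch below hard-codes the survey2 counts from the "#Total cases" row of the cross_cases() output earlier in this document. The min_cell threshold is an assumption chosen only for illustration; use whatever minimum your course materials recommend.

```r
# survey2 cell counts copied from the "#Total cases" row reported earlier
cell_counts <- c("Boyfriend or Girlfriend" = 2, "Dating" = 10,
                 "Divorced" = 3, "Single" = 81, "Widowed" = 4)

# ASSUMPTION: illustrative threshold for a "small" cell, not a value from the source
min_cell <- 10

# Which levels fall below the threshold, and how unbalanced is the design?
small_cells <- names(cell_counts)[cell_counts < min_cell]
imbalance_ratio <- max(cell_counts) / min(cell_counts)

small_cells
imbalance_ratio
```

Levels flagged this way are candidates for the combine/drop step in the next section.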
Check levels of IVs and combine/drop if needed
# preview the levels and counts for your IV
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
# d <- subset(d, iv != "BAD")
#
# table(d$iv, useNA = "always")
#
# # to combine levels
# # this copies the IV into a new _rc variable, then replaces 'BAD' with 'GOOD' in that copy
# d$iv_rc <- d$iv
# d$iv_rc[d$iv == "BAD"] <- "GOOD"
#
# table(d$iv_rc, useNA = "always")
#
# check your variable types
str(d)
'data.frame': 100 obs. of 12 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ identity : chr "I'm a 32-year-old White woman living in a cozy apartment in Portland with my younger sister. I enjoy hiking and"| __truncated__ "I'm a 34-year-old White woman named Emily. I live with my younger sister in a cozy apartment in Portland. I'm a"| __truncated__ "I'm a 32-year-old Latina woman living in a cozy apartment in Austin, Texas, with my younger brother. I work as "| __truncated__ "I'm a 99-year-old multiracial woman living in a cozy apartment with my younger sister. We often host game night"| __truncated__ ...
$ consent : chr "I understand the instructions. I'll be writing my responses to the prompt on the provided paper using the chose"| __truncated__ "I understand the instructions. I will write in response to the prompt on the provided piece of paper with my ch"| __truncated__ "I understand the instructions clearly. I will be participating in a study about perceptions of living status, w"| __truncated__ "I understand the instructions. I'm ready to respond to the prompt on the provided piece of paper and will be pr"| __truncated__ ...
$ age : int 32 34 32 99 35 34 29 70 34 57 ...
$ race : int 6 6 4 7 6 4 4 3 6 3 ...
$ gender : int 2 2 2 2 2 2 2 2 2 2 ...
$ manip_out: chr "If I were to imagine myself as a 32-year-old White woman living with someone else in Portland, it seems only fi"| __truncated__ "If I were to imagine myself living with someone else, I would choose to share my cozy apartment in Portland wit"| __truncated__ "In this scenario, let’s imagine that I’m living with my best friend, Mia, who shares a similar creative spirit "| __truncated__ "If I were a 99-year-old multiracial woman living with someone else, I would choose to live with my best friend "| __truncated__ ...
$ survey1 : num 8.17 7.92 7.75 9.25 7.08 ...
$ survey2 : Factor w/ 5 levels "Boyfriend or Girlfriend",..: 4 4 4 2 4 4 4 5 4 4 ...
$ survey3 : chr "Co-habiting" "Co-habiting" "Co-habiting" "Co-habiting" ...
$ ai_manip : chr "Thank you for sharing your detailed insights! Your responses reflect a thoughtful consideration of the dynamics"| __truncated__ "I answered the questions based on my belief that living with someone like Mia would enhance my social support a"| __truncated__ "I described my responses based on the positive dynamics I foresee living with my best friend, emphasizing the c"| __truncated__ "Thank you for this opportunity! I answered the questions based on the deep bonds I share with my sister and par"| __truncated__ ...
$ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# make sure that your IV is recognized as a factor by R
# if you created a new _rc variable make sure to use that one instead
d$survey2 <- as.factor(d$survey2)
Run a Multiple Linear Regression
To check the assumptions for an ANOVA, we run our regression and then check our diagnostic plots.
# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey1 ~ survey2, data = d)
Check for outliers using Cook’s distance and a Residuals vs Leverage plot
For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any cases that are prominent outliers. Since we have some cutoffs, this process is a bit less subjective than some of the other assessments we’ve done here, which is a nice change!
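If you want numbers to go with the plots, Cook's distance can also be pulled out directly with cooks.distance(). The sketch below uses simulated data with one planted extreme case. The 4/n cutoff is an assumption (a common rule of thumb, not a value stated in this document), so swap in the cutoff from your course materials.

```r
# Simulated data with one planted outlier (ASSUMPTION: illustration only)
set.seed(7)
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  score = rnorm(60, mean = 8, sd = 0.5)
)
sim$score[1] <- 12  # make case 1 an extreme score

# Same kind of model as reg_model: one categorical predictor
sim_model <- lm(score ~ group, data = sim)

# Cook's distance for every case
cooks <- cooks.distance(sim_model)

# ASSUMPTION: 4/n rule of thumb for flagging influential cases
cutoff <- 4 / nrow(sim)
flagged <- which(cooks > cutoff)

# Row numbers of the flagged cases, largest distance first
flagged[order(cooks[flagged], decreasing = TRUE)]
```

The same three lines (cooks.distance(), a cutoff, which()) can be pointed at reg_model once it has been fit.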
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
Check homogeneity of variance in a Scale-Location plot
You can check out this page for some other examples of this type of plot. (Notice that the Scale-Location plot is the third in the grids.)
For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the ones pictured. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? Again, this is a judgement call! It’s okay if you feel uncertain, and you won’t be penalized for that.
plot(reg_model, 3)
Issues with My Data
*assumption or homogeneity issues
Run Your Analysis
Run a T-Test
# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)
View Test Output
t_output
Welch Two Sample t-test
data: d$survey1 by d$condition
t = 0.89851, df = 90.549, p-value = 0.3713
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.1079817 0.2863332
sample estimates:
mean in group 1 mean in group 2
8.045842 7.956667
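As a sanity check, the Welch t statistic and its degrees of freedom can be reproduced by hand from summary statistics. This sketch plugs in the group means from the output above and the rounded survey1 SDs from the describeBy() output (0.56 and 0.42, n = 50 per group), so expect small rounding differences from what t.test() reports.

```r
# Summary statistics taken from the outputs in this document
m1 <- 8.045842; m2 <- 7.956667   # group means from the t.test() output
s1 <- 0.56;     s2 <- 0.42       # rounded SDs from describeBy()
n1 <- 50;       n2 <- 50

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: mean difference over the unpooled standard error
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 3)
```

The hand-computed values land close to the reported t = 0.89851 and df = 90.549; the gap comes from using SDs rounded to two decimals.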
Calculate Cohen’s d
# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
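The cutoffs above can be applied in code, and Cohen's d itself can be approximated by hand. This sketch uses the group means from the t-test output and the rounded survey1 SDs from describeBy() (0.56 and 0.42), so the result may differ slightly from cohen.d(). The helper label_d() is a hypothetical name made up for this example, not a function from effsize.

```r
# Summary statistics taken from the outputs in this document
m1 <- 8.045842; m2 <- 7.956667   # group means
s1 <- 0.56;     s2 <- 0.42       # rounded SDs from describeBy()
n1 <- 50;       n2 <- 50

# Pooled standard deviation, then d = mean difference / pooled SD
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d_manual <- (m1 - m2) / sd_pooled

# Hypothetical helper: label |d| using the cutoffs listed above
label_d <- function(d) {
  cut(abs(d),
      breaks = c(0, .2, .5, .8, Inf),
      labels = c("trivial", "small", "medium", "large"),
      right = FALSE)  # intervals are [0, .2), [.2, .5), [.5, .8), [.8, Inf)
}

round(d_manual, 3)
as.character(label_d(d_manual))
```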
d_output
Cohen's d
d estimate: 0.1797027 (negligible)
95 percent confidence interval:
lower upper
-0.2179911 0.5773964
# d = 0.18 falls below .2, so by the cutoffs above this is a trivial (negligible) effect
Run an ANOVA
aov_model <- aov_ez(data = d,
                    id = "id",
                    between = c("survey2"),
                    dv = "survey1",
                    anova_table = list(es = "pes"))
Contrasts set to contr.sum for the following variables: survey2
View Output
Effect size cutoffs from Cohen (1988):
- η² = 0.01 indicates a small effect
- η² = 0.06 indicates a medium effect
- η² = 0.14 indicates a large effect
nice(aov_model)
Anova Table (Type 3 tests)
Response: survey1
Effect df MSE F pes p.value
1 survey2 4, 95 0.18 10.02 *** .297 <.001
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
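The pes column (partial eta squared) can be recovered from the F value and degrees of freedom in the table above, which is a handy check that you are reading the output correctly. A minimal sketch, using only the numbers printed in that table:

```r
# Values read off the ANOVA table above
F_value   <- 10.02
df_effect <- 4
df_error  <- 95

# Partial eta squared from F and its degrees of freedom
pes <- (F_value * df_effect) / (F_value * df_effect + df_error)

round(pes, 3)
```

This rounds to .297, matching the pes column, and sits above the 0.14 cutoff for a large effect.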
Visualize Results
afex_plot(aov_model, x = "survey2")
Run Posthoc Tests (One-Way)
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
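As a rough illustration of that rule, the sketch below checks the omnibus p-value first and only then runs Tukey-adjusted pairwise comparisons. It swaps in base R's aov() and TukeyHSD() on simulated data instead of emmeans, and the variable names are made up for the example.

```r
# Simulated three-group data (ASSUMPTION: illustration only, not the study data)
set.seed(42)
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  score = c(rnorm(30, mean = 8.0, sd = 0.4),
            rnorm(30, mean = 8.1, sd = 0.4),
            rnorm(30, mean = 8.6, sd = 0.4))
)

# Omnibus one-way ANOVA
fit <- aov(score ~ group, data = sim)
omnibus_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Only run the posthoc comparisons if the main effect is significant
posthoc <- NULL
if (omnibus_p < .05) {
  posthoc <- TukeyHSD(fit, "group")
  print(posthoc)
}
```

In the commented template that follows, IV1 is a placeholder for the name of your own IV.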
# emmeans(aov_model, specs="IV1", adjust="tukey")
# pairs(emmeans(aov_model, specs="IV1", adjust="tukey"))
Write Up Results
T-Test
Write-up of your results goes here. Check past labs/HWs for template.
One-Way ANOVA
Write-up of your results goes here. Check past labs/HWs for template.
Boyfriend or Girlfriend Dating Divorced
2 10 3
Single Widowed
81 4
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.