library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
library(effsize) # for the cohen.d() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
library(sjPlot) # to visualize our resultsAI Experiment Analysis
Loading Libraries
Importing Data
# import your AI results dataset
d <- read.csv(file="Data/results_final3.csv", header=T)State Your Hypotheses & Chosen Tests
I predict that different messaging of dieting (positive vs negative) will predict a higher intolerance of uncertainty. I will be testing this with a T-Test. I predict that a higher intolerance of uncertainty will be controlled by neuroticism. I believe the relationship will be positive. I will be testing this with Pearson’s Correlation Test.
Check Your Variables
This is just basic variable checking that is used across all HW assignments.
# # to view stats for all variables
describe(d) vars n mean sd median trimmed mad min max range skew
X3.5 1 100 50.50 29.01 50.5 50.50 37.06 1 100.0 99.0 0.00
identity* 2 100 50.50 29.01 50.5 50.50 37.06 1 100.0 99.0 0.00
consent* 3 100 36.59 23.67 35.5 35.61 31.88 1 81.0 80.0 0.26
age 4 100 19.91 1.52 20.0 19.75 1.48 17 30.0 13.0 3.06
race 5 100 4.06 1.33 3.0 3.95 0.00 2 7.0 5.0 0.68
gender 6 100 1.75 0.44 2.0 1.81 0.00 1 2.0 1.0 -1.14
manip_out* 7 100 50.50 29.01 50.5 50.50 37.06 1 100.0 99.0 0.00
survey1 8 100 3.46 0.22 3.5 3.47 0.00 3 4.4 1.4 0.08
survey2* 9 100 7.82 2.10 9.0 7.96 2.97 1 11.0 10.0 -0.95
ai_manip* 10 100 43.85 22.81 45.5 44.92 30.39 1 76.0 75.0 -0.24
condition 11 100 1.50 0.50 1.5 1.50 0.74 1 2.0 1.0 0.00
kurtosis se
X3.5 -1.24 2.90
identity* -1.24 2.90
consent* -1.23 2.37
age 17.66 0.15
race -1.24 0.13
gender -0.71 0.04
manip_out* -1.24 2.90
survey1 3.62 0.02
survey2* 1.12 0.21
ai_manip* -1.25 2.28
condition -2.02 0.05
#
# # we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d, group = "condition")
Descriptive statistics by group
condition: 1
vars n mean sd median trimmed mad min max range skew
X3.5 1 50 25.50 14.58 25.5 25.50 18.53 1 50.0 49.0 0.00
identity 2 50 51.60 29.19 52.0 51.75 37.06 2 98.0 96.0 -0.04
consent 3 50 37.26 24.68 36.0 36.45 34.10 2 81.0 79.0 0.21
age 4 50 19.92 1.78 20.0 19.73 1.48 17 30.0 13.0 3.49
race 5 50 4.22 1.45 3.5 4.15 0.74 2 7.0 5.0 0.41
gender 6 50 1.68 0.47 2.0 1.73 0.00 1 2.0 1.0 -0.75
manip_out 7 50 57.44 28.99 62.0 57.85 36.32 10 100.0 90.0 -0.12
survey1 8 50 3.49 0.24 3.5 3.50 0.00 3 4.4 1.4 0.66
survey2 9 50 8.00 2.02 9.0 8.15 2.22 2 11.0 9.0 -0.86
ai_manip 10 50 42.34 27.00 43.0 43.30 43.00 1 76.0 75.0 -0.18
condition 11 50 1.00 0.00 1.0 1.00 0.00 1 1.0 0.0 NaN
kurtosis se
X3.5 -1.27 2.06
identity -1.30 4.13
consent -1.43 3.49
age 17.87 0.25
race -1.60 0.20
gender -1.47 0.07
manip_out -1.44 4.10
survey1 4.25 0.03
survey2 0.57 0.29
ai_manip -1.60 3.82
condition NaN 0.00
------------------------------------------------------------
condition: 2
vars n mean sd median trimmed mad min max range skew kurtosis
X3.5 1 50 75.50 14.58 75.5 75.50 18.53 51 100 49 0.00 -1.27
identity 2 50 49.40 29.08 49.0 49.27 35.58 1 100 99 0.04 -1.24
consent 3 50 35.92 22.85 34.0 34.77 28.91 1 80 79 0.29 -1.07
age 4 50 19.90 1.23 20.0 19.77 1.48 17 24 7 1.02 2.20
race 5 50 3.90 1.20 3.0 3.75 0.00 3 6 3 0.95 -0.76
gender 6 50 1.82 0.39 2.0 1.90 0.00 1 2 1 -1.62 0.63
manip_out 7 50 43.56 27.60 44.0 43.08 30.39 1 94 93 0.06 -1.18
survey1 8 50 3.43 0.21 3.5 3.45 0.00 3 4 1 -0.90 1.18
survey2 9 50 7.64 2.18 8.0 7.78 1.48 1 11 10 -0.98 1.27
ai_manip 10 50 45.36 17.82 48.0 45.27 22.98 15 74 59 -0.04 -1.33
condition 11 50 2.00 0.00 2.0 2.00 0.00 2 2 0 NaN NaN
se
X3.5 2.06
identity 4.11
consent 3.23
age 0.17
race 0.17
gender 0.05
manip_out 3.90
survey1 0.03
survey2 0.31
ai_manip 2.52
condition 0.00
#
# # also use histograms and scatterplots to examine your continuous variables
hist(d$survey1)plot(d$survey1, d$survey2)Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion
#
# # and table() and cross_cases() to examine your categorical variables
# # you may not need the cross_cases code
table(d$condition)
1 2
50 50
#
# # and boxplot to examine any categorical variables with continuous variables
boxplot(d$survey1~d$condition)#
# #convert any categorical variables to factors
d$condition <- as.factor(d$condition)Check Your Assumptions
t-Test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below. if issues, note and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# # preview the levels and counts for your IV
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
#
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'BAD' should be removed
d <- subset(d, condition != "BAD")
#
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
#
# # to combine levels
# # this says that where any participant is coded as 'BAD' it should be replaced by 'GOOD'
d$iv_rc[d$condition == "BAD"] <- "GOOD"
#
table(d$condition, useNA = "always")
1 2 <NA>
50 50 0
#
# # check your variable types
str(d)'data.frame': 100 obs. of 12 variables:
$ X3.5 : int 1 2 3 4 5 6 7 8 9 10 ...
$ identity : chr "I’m a 19-year-old white male studying environmental science. I’m passionate about climate change, but often fee"| __truncated__ "I’m a 20-year-old Black woman studying psychology at a state university. I’m passionate about mental health adv"| __truncated__ "I’m a 19-year-old Black woman studying psychology at a large university. I’m passionate about mental health awa"| __truncated__ "I'm a 20-year-old multiracial woman studying psychology at a university in California. I often feel caught betw"| __truncated__ ...
$ consent : chr "I understand the instructions. I'll read the short essay on why dieting is good and important for health and fi"| __truncated__ "I understand the instructions. I will read the short essay on why dieting is beneficial for health and fitness,"| __truncated__ "I understand the instructions. I will read the short essay on why dieting is good and something all people shou"| __truncated__ "I understand the instructions. I'll read the essay on why dieting is beneficial for health and fitness, and the"| __truncated__ ...
$ age : int 19 20 19 20 19 20 19 21 19 20 ...
$ race : int 6 3 3 7 6 3 6 3 4 3 ...
$ gender : int 1 2 2 2 2 2 1 2 2 1 ...
$ manip_out: chr "I have a negative opinion of diets and diet culture. I believe that the pressure to conform to strict dietary r"| __truncated__ "I have a negative opinion of diets and diet culture. They often promote unrealistic body standards and can lead"| __truncated__ "I have a negative opinion of diets and diet culture. They often promote unrealistic standards and quick-fix sol"| __truncated__ "I have a negative opinion of diets and diet culture. I believe that they often promote unrealistic body standar"| __truncated__ ...
$ survey1 : num 3 3.5 3.5 3.8 3 3.5 3.5 3.5 3.5 3.5 ...
$ survey2 : chr "4.333333333" "3.75" "4.666666667" "4.333333333" ...
$ ai_manip : chr "Your intolerance of uncertainty was 3. Your neuroticism score was 6." "Thank you for sharing your insights. Your intolerance of uncertainty score is 3.5, and your neuroticism score is 6." "I have a negative opinion of diets and diet culture. They often promote unrealistic standards and quick-fix sol"| __truncated__ "I maintain my negative view of diets. I understand the emphasis on health and balance, but I feel that the appr"| __truncated__ ...
$ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ iv_rc : chr NA NA NA NA ...
#
# # make sure that your IV is recognized as a factor by R
# # if you created a new _rc variable make sure to use that one instead
d$condition <- as.factor(d$condition)Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variance between the two groups is equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(survey1~condition, data = d)Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0 1
98
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- Relationship between the variables should be linear, or they will not be detected (will do below)
Run a Multiple Linear Regression
To check the assumptions for Pearson’s correlation coefficient, we run our regression and then check our diagnostic plots.
# # use the lm() command to run the regression
# # dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(survey1 ~ survey2, data = d)Check linearity with Residuals vs Fitted plot
For some examples of good Residuals vs Fitted plot and ones that show serious errors, check out this page.
For your homework, you’ll simply need to generate this plot and talk about how your plot compares to the good and problematic plots linked to above. Is it closer to the ‘good’ plots or one of the ‘bad’ plots? This is going to be a judgement call, and that’s okay! In practice, you’ll always be making these judgement calls as part of a team, so this assignment is just about getting experience with it, not making the perfect call.
plot(reg_model, 1)Check for outliers using Cook’s distance and a Residuals vs Leverage plot
For your homework, you’ll simply need to generate these plots, assess Cook’s distance in your dataset, and then identify any potential cases that are prominent outliers.
# # Cook's distance
plot(reg_model, 4)#
# # Residuals vs Leverage
plot(reg_model, 5)Warning: not plotting observations with leverage one:
2, 21, 28, 65
Issues with My Data
Some extreme outliers were found in the data.
Run Your Analysis
Run a t-Test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$survey1~d$condition)View Test Output
t_output
Welch Two Sample t-test
data: d$survey1 by d$condition
t = 1.4694, df = 96.457, p-value = 0.145
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.02286982 0.15326982
sample estimates:
mean in group 1 mean in group 2
3.4934 3.4282
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$survey1~d$condition)View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: 0.2938876 (small)
95 percent confidence interval:
lower upper
-0.1051426 0.6929178
Run a Correlation Test
Create a Correlation Matrix
d2 <- subset(d, select=c(survey1,survey2))
str(d2)'data.frame': 100 obs. of 2 variables:
$ survey1: num 3 3.5 3.5 3.8 3 3.5 3.5 3.5 3.5 3.5 ...
$ survey2: chr "4.333333333" "3.75" "4.666666667" "4.333333333" ...
d2$survey1 <- as.numeric(d2$survey1)
d2$survey2 <- as.numeric(d2$survey2)Warning: NAs introduced by coercion
corr_output_m <- corr.test(d2)View Test Output
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.09|
corr_output_mCall:corr.test(x = d2)
Correlation matrix
survey1 survey2
survey1 1.00 0.05
survey2 0.05 1.00
Sample Size
survey1 survey2
survey1 100 99
survey2 99 99
Probability values (Entries above the diagonal are adjusted for multiple tests.)
survey1 survey2
survey1 0.0 0.6
survey2 0.6 0.0
To see confidence intervals of the correlations, print with the short=FALSE option
Effect size cutoffs from Cohen (1988): * Trivial: < .1 * Small: between .1 and .3 * Medium: between .3 and .5 * Large: > .5
Write Up Results
t-Test
We tested our hypothesis that messaging of diets would affect intolerance of uncertainty in college aged people using a two sample t-test. Our data supported the hypothesis, t(96.46)=1.47, p= 0.145,d = 0.3, 95% (-0.023,0.15).
Our data has a small effect size according to Cohen (1988).
Correlation Test
We tested our hypothesis that intolerance of uncertainty was going to be controlled by neuroticism and the relationship would be positive. The kurtosis and skew of the variables were all betweem -2 and 2. The relationships were also linear. Neuroticism and Intolerance of Uncertainty were not correlated.
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.