Creating the fake data.
options(scipen = 999) # tells R studio to use decimals instead of expotential notation
# the following lines are used only to create my fake dataset
dating <- c(1,2,3,4,5,3,4,5,6,7,4,5,6,7,8)
group <- c(rep("haircut",5), rep("child",5),rep("puppy",5))
df <- data.frame(group,dating)
library(ggplot2)
library(psych)
# I will start by getting descriptive statistics
describeBy(df$dating, group=df$group)
Descriptive statistics by group
group: child
vars n mean sd median trimmed mad min max range skew
X1 1 5 5 1.58 5 5 1.48 3 7 4 0
kurtosis se
X1 -1.91 0.71
-----------------------------------------------
group: haircut
vars n mean sd median trimmed mad min max range skew
X1 1 5 3 1.58 3 3 1.48 1 5 4 0
kurtosis se
X1 -1.91 0.71
-----------------------------------------------
group: puppy
vars n mean sd median trimmed mad min max range skew
X1 1 5 6 1.58 6 6 1.48 4 8 4 0
kurtosis se
X1 -1.91 0.71
# Running an ANOVA
anova.results <- aov(dating ~ group, data=df)
anova(anova.results)
Analysis of Variance Table
Response: dating
Df Sum Sq Mean Sq F value Pr(>F)
group 2 23.333 11.667 4.6667 0.03168 *
Residuals 12 30.000 2.500
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Our results are significant (p = .03). Therefore, we should follow up the results with post hoc testing. I chose to test all pairwise comparisons. I will use a no correction but will divide the alpha by the number of tests to protect the familywise alpha.
# Conducting post hoc tests (I will test all pairwise comparisons)
pairwise.t.test(x=df$dating, g=df$group, p.adjust.method="none")
Pairwise comparisons using t tests with pooled SD
data: df$dating and df$group
child haircut
haircut 0.069 -
puppy 0.337 0.011
P value adjustment method: none
# other options for p.adjust.method in include "bonf"
# note that R multiplies the p-values by the number of tests instead of adjust alpha, so you should interpret the tests at your usual alpha (.05)
The pairwise.t.test function does not have an option for a Tukey correction. We can use the TukeyHSD function isntead.
# remember that we saved the results of our ANOVA as anova.results
TukeyHSD(anova.results)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = dating ~ group, data = df)
$group
diff lwr upr p adj
haircut-child -2 -4.6678637 0.6678637 0.1545800
puppy-child 1 -1.6678637 3.6678637 0.5907706
puppy-haircut 3 0.3321363 5.6678637 0.0277219
Notice that both results agree (this will not always be the case, so we should choose only one to run). The post hoc tests show that the only difference is between the puppy and haircut groups. Neither of those groups differed from the child group. Now I will graph the results.
ggplot(data=df, aes(x=group, y=dating, fill=group)) +
geom_bar(stat="summary", fun.y="mean") +
stat_summary(geom = "errorbar", fun.data = mean_se, width=.4) +
scale_fill_manual(values=c("darkgoldenrod1", "coral", "azure3"))