Introduction
This notebook outlines analyses conducted as part of a coding challenge in The Categorization and Cognitive Science Lab. The data considered below is from a series of studies in which participants were asked to complete four blocks of either a rule-defined (RD) or information integration (II) category learning task. Participants were also asked to provide responses to a number of situational and demographic-based items.
The following file contains all of the data that was collected.
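# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0);
# the level renaming and droplevels() calls below rely on this:
Data = read.csv("ModifiedFullData.csv", header = TRUE, stringsAsFactors = TRUE)
Some notes regarding the names of variables: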
Delete - an indication of whether the participant was to be included in the analysis (coded as “0”) or not (coded as “1”).
UniqueSubjNum - subject numbers that uniquely identify participants in the dataset.
SubjNum - subject numbers that identify participants within each independent experiment.
ExpName - the names of the seven experiments included in the data set.
Cat - an indication of the category set (rule-defined [RD] or information integration [II]) that participants were randomly assigned to learn.
Condition - the name of the conditions in each of the individual experiments.
SDtoday and FRtoday - dummy coded versions of the Cat variable
Time - the time of day that testing occurred.
Time2 - an ordinal version of the Time variable.
Date - the date of testing.
DayofWeek - an ordinal version of the Date variable.
Month - the month in which testing occurred.
NumSubs - the number of subjects tested during each testing session.
SameGeNder and SameGenWhat - an indication of whether participants who were tested in a group were of the same gender (“1”) or not (“0”) and, if so, what gender they were.
CellPhone and Internet - an indication of whether participants were (“1”) or were not (“0”) observed accessing either their cellphone or the internet during the testing session.
PaidPool - an indication of whether participants were recruited from the psychology subject pool or were, instead, paid participants.
Late - a measure of how many minutes late a participant was for their scheduled testing session.
SignUp -
Age - participants’ self-reported age in years.
Gender - participants’ self-reported gender.
NativeLang, SecondLang, and AdditionalLang - participants’ self-reported native, second, and/or additional language(s).
SecondProficiency - participants’ self-reported level of second language proficiency, scored from 0 (low proficiency) to 4 (high proficiency).
Bilingual - an indication of whether a participant identified as bilingual (“1”) or not (“0”).
AcademicYear - participants’ academic year of study.
ExamLastWeek - an indication of whether a participant had written an exam in the week previous to testing (“1”) or not (“0”).
ExamnextWeek - an indication of whether a participant would (“1”) or would not (“0”) be writing an exam in the week after testing.
BusyDay - an indication of whether the testing day was (“1”) or was not (“0”) a busy day for each participant.
ClassBefore and ClassAfter - an indication of whether a participant did (“1”) or did not (“0”) have a class prior to or after the testing session.
FirstExp and OtherExps - an indication of whether the study was (“1”) or was not (“0”) the first study a subject had participated in and, if not, how many other studies they had previously participated in.
LastMealWhen and LastMealWhat - a measure of how many hours it had been since the participant had previously eaten and what they had eaten at the time they last ate.
Breakfast and BreakfastWhat - an indication of whether a participant did (“1”) or did not (“0”) eat breakfast on the day of testing and, if so, what their breakfast consisted of.
Alcohol and DrinkPerWeek - an indication of whether a participant does (“1”) or does not (“0”) drink alcohol and, if so, how many drinks per week they typically consume.
CoffeeTea - an indication of whether a participant is (“1”) or is not (“0”) a regular coffee or tea drinker.
Exercise and ExerciseFreq - an indication of whether a participant does (“1”) or does not (“0”) exercise on a regular basis and, if so, how many times they typically exercise per week.
SleepAvg and SleepLastNight - a measure of the number of hours a participant typically sleeps for each night and the number of hours they slept for the night prior to testing.
Tired - a measure of self-reported tiredness, scored from 1 (not tired) to 7 (very tired).
ExpDifficulty - a measure of self-reported task difficulty, scored from 1 (easy) to 7 (difficult).
GiveUp - an indication of whether a participant did (“1”) or did not (“0”) give up during the study.
MostlyGuess - an indication of whether a participant did (“1”) or did not (“0”) report that they “mostly guessed” during the study.
X1_Block to X4_Block - proportion of items responded to correctly for blocks 1 to 4 of the category learning task.
Total - proportion of items responded to correctly across all four blocks of the category learning task.
Problem 1
The first set of analyses will involve the calculation of descriptive statistics and the production of some basic figures. For these purposes, we will focus primarily on the following variables:
Total - a continuous dependent variable (DV).
Cat - a nominal independent variable (IV).
Month, Time2, and DayofWeek - ordinal IVs.
Note that the data collected in May was collected as part of an unrelated pilot study. Therefore, we will remove the May data from our data set.
MayData = subset(Data, Month != "05_May")
CatData = droplevels(MayData)
All subsequent analyses will be conducted on this CatData data set.
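The reason for the droplevels call can be seen in a small sketch. The toy data frame below is hypothetical (its values are invented for illustration); it only shows that subset keeps an emptied factor level and that droplevels removes it.

```r
# Hypothetical toy data; the real Month column has more levels.
toy <- data.frame(
  Month = factor(c("04_Apr", "05_May", "09_Sept", "05_May")),
  Total = c(0.71, 0.62, 0.76, 0.65)
)

# Removing the May rows leaves "05_May" behind as an empty factor level:
no_may <- subset(toy, Month != "05_May")
levels(no_may$Month)

# droplevels() discards levels that no longer occur in the data:
cleaned <- droplevels(no_may)
levels(cleaned$Month)
```

Without this step, the empty level would still be counted when new level names are assigned by position, and it would appear as an empty group in tables and plots.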
Analysis Prep
Load Libraries
The following libraries will be used for this analysis:
# For creating themed html files:
# install.packages("prettydoc")
library(prettydoc)
# For calculating descriptive statistics:
# install.packages("Rmisc")
library(Rmisc)
# For formatting tables:
# install.packages("knitr")
library(knitr)
# install.packages ("kableExtra")
library(kableExtra)
# For using pipes and plotting performance with ggplot2:
# install.packages("tidyverse")
library(tidyverse)
Rename Factor Levels
Before we begin, we’ll rename and reorder the levels of the variables we’ll be using.
# Reorder the Cat variable:
CatData$Cat = factor(CatData$Cat, levels = c("RD", "II"))
# Rename the Month variable:
levels(CatData$Month) = c("Jan", "Feb", "Mar", "Apr", "Sept", "Oct", "Nov")
# Rename the Time2 variable:
levels(CatData$Time2) = c("Morning", "Early Afternoon", "Late Afternoon")
# Rename the DayofWeek variable:
levels(CatData$DayofWeek) = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
Basic Descriptive Statistics
We’ll start by calculating some basic descriptive statistics (ns, Ms, SDs, SEs, and 95% CIs) for the DV across levels of each of the IVs.
Category Set
# Calculate summary statistics:
CatDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = "Cat", conf.interval = .95)
# Create a table to display the results:
kable(CatDescs, digits = 4,
caption = "Table 1. Descriptives by category set.",
col.names = c("Category Set", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | n | M | SD | SE | CI |
|---|---|---|---|---|---|
| RD | 249 | 0.7559 | 0.1015 | 0.0064 | 0.0127 |
| II | 207 | 0.6586 | 0.0804 | 0.0056 | 0.0110 |
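The SE and CI columns can be checked by hand from n and SD. The sketch below assumes that summarySE computes the SE as SD / sqrt(n) and the CI half-width as the SE multiplied by the t critical value on n - 1 degrees of freedom. It uses the rounded RD values from Table 1, so agreement is only to about four decimal places.

```r
# Rounded values for the RD group, copied from Table 1:
n_rd  <- 249
sd_rd <- 0.1015

# Standard error of the mean:
se_rd <- sd_rd / sqrt(n_rd)

# Half-width of the 95% CI, using the t distribution with n - 1 df:
ci_rd <- se_rd * qt(0.975, df = n_rd - 1)

round(c(SE = se_rd, CI = ci_rd), 4)
```

Note that the CI column is a half-width: the interval itself is the mean plus or minus this value.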
Month
# Calculate summary statistics:
MonthDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = "Month", conf.interval = .95)
# Create a table to display the results:
kable(MonthDescs, digits = 4,
caption = "Table 2. Descriptives by month.",
col.names = c("Month", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Month | n | M | SD | SE | CI |
|---|---|---|---|---|---|
| Jan | 64 | 0.7141 | 0.1082 | 0.0135 | 0.0270 |
| Feb | 39 | 0.6698 | 0.1077 | 0.0172 | 0.0349 |
| Mar | 64 | 0.6674 | 0.1199 | 0.0150 | 0.0299 |
| Apr | 72 | 0.7103 | 0.1121 | 0.0132 | 0.0263 |
| Sept | 95 | 0.7627 | 0.0871 | 0.0089 | 0.0177 |
| Oct | 108 | 0.7098 | 0.0841 | 0.0081 | 0.0161 |
| Nov | 14 | 0.6965 | 0.0719 | 0.0192 | 0.0415 |
Time of Day
# Calculate summary statistics:
TimeDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = "Time2", conf.interval = .95)
# Create a table to display the results:
kable(TimeDescs, digits = 4,
caption = "Table 3. Descriptives by time of day.",
col.names = c("Time", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Time | n | M | SD | SE | CI |
|---|---|---|---|---|---|
| Morning | 151 | 0.7108 | 0.1022 | 0.0083 | 0.0164 |
| Early Afternoon | 170 | 0.7040 | 0.1057 | 0.0081 | 0.0160 |
| Late Afternoon | 135 | 0.7224 | 0.1051 | 0.0090 | 0.0179 |
Day of Week
# Calculate summary statistics:
DayDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = "DayofWeek", conf.interval = .95)
# Create a table to display the results:
kable(DayDescs, digits = 4,
caption = "Table 4. Descriptives by day.",
col.names = c("Day", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Day | n | M | SD | SE | CI |
|---|---|---|---|---|---|
| Mon | 66 | 0.7419 | 0.0896 | 0.0110 | 0.0220 |
| Tues | 115 | 0.7243 | 0.1028 | 0.0096 | 0.0190 |
| Wed | 82 | 0.7156 | 0.0995 | 0.0110 | 0.0219 |
| Thurs | 127 | 0.6856 | 0.1026 | 0.0091 | 0.0180 |
| Fri | 53 | 0.7010 | 0.1191 | 0.0164 | 0.0328 |
| Sat | 13 | 0.7217 | 0.1263 | 0.0350 | 0.0763 |
Complex Descriptive Statistics
Next we’ll calculate descriptive statistics for the DV across levels of the Month, Time2, and DayofWeek variables crossed with the Cat variable.
Month by Category Set
# Calculate summary statistics:
CMDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = c("Cat", "Month"), conf.interval = .95)
# Create a table to display the results:
kable(CMDescs, digits = 4,
caption = "Table 5. Descriptives by category set and month.",
col.names = c("Category Set", "Month", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | Month | n | M | SD | SE | CI |
|---|---|---|---|---|---|---|
| RD | Jan | 25 | 0.7771 | 0.1099 | 0.0220 | 0.0454 |
| RD | Feb | 15 | 0.7253 | 0.1396 | 0.0360 | 0.0773 |
| RD | Mar | 33 | 0.7216 | 0.1272 | 0.0221 | 0.0451 |
| RD | Apr | 41 | 0.7534 | 0.1139 | 0.0178 | 0.0360 |
| RD | Sept | 68 | 0.7875 | 0.0850 | 0.0103 | 0.0206 |
| RD | Oct | 63 | 0.7411 | 0.0721 | 0.0091 | 0.0182 |
| RD | Nov | 4 | 0.7417 | 0.0319 | 0.0159 | 0.0507 |
| II | Jan | 39 | 0.6736 | 0.0865 | 0.0138 | 0.0280 |
| II | Feb | 24 | 0.6352 | 0.0635 | 0.0130 | 0.0268 |
| II | Mar | 31 | 0.6098 | 0.0791 | 0.0142 | 0.0290 |
| II | Apr | 31 | 0.6534 | 0.0808 | 0.0145 | 0.0296 |
| II | Sept | 27 | 0.7001 | 0.0557 | 0.0107 | 0.0220 |
| II | Oct | 45 | 0.6659 | 0.0807 | 0.0120 | 0.0242 |
| II | Nov | 10 | 0.6784 | 0.0765 | 0.0242 | 0.0548 |
Time of Day by Category Set
# Calculate summary statistics:
CTDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = c("Cat", "Time2"), conf.interval = .95)
# Create a table to display the results:
kable(CTDescs, digits = 4,
caption = "Table 6. Descriptives by category set and time of day.",
col.names = c("Category Set", "Time", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | Time | n | M | SD | SE | CI |
|---|---|---|---|---|---|---|
| RD | Morning | 76 | 0.7639 | 0.0916 | 0.0105 | 0.0209 |
| RD | Early Afternoon | 87 | 0.7482 | 0.1058 | 0.0113 | 0.0225 |
| RD | Late Afternoon | 86 | 0.7565 | 0.1060 | 0.0114 | 0.0227 |
| II | Morning | 75 | 0.6570 | 0.0825 | 0.0095 | 0.0190 |
| II | Early Afternoon | 83 | 0.6577 | 0.0839 | 0.0092 | 0.0183 |
| II | Late Afternoon | 49 | 0.6626 | 0.0720 | 0.0103 | 0.0207 |
Day of Week by Category Set
# Calculate summary statistics:
CDDescs = summarySE(data = CatData, measurevar = "Total",
groupvars = c("Cat", "DayofWeek"),
conf.interval = .95)
# Create a table to display the results:
kable(CDDescs, digits = 4,
caption = "Table 7. Descriptives by category set and day.",
col.names = c("Category Set", "Day", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | Day | n | M | SD | SE | CI |
|---|---|---|---|---|---|---|
| RD | Mon | 43 | 0.7761 | 0.0859 | 0.0131 | 0.0264 |
| RD | Tues | 65 | 0.7750 | 0.0891 | 0.0110 | 0.0221 |
| RD | Wed | 46 | 0.7550 | 0.1015 | 0.0150 | 0.0302 |
| RD | Thurs | 63 | 0.7276 | 0.0996 | 0.0126 | 0.0251 |
| RD | Fri | 26 | 0.7369 | 0.1307 | 0.0256 | 0.0528 |
| RD | Sat | 6 | 0.7901 | 0.1580 | 0.0645 | 0.1658 |
| II | Mon | 23 | 0.6781 | 0.0558 | 0.0116 | 0.0241 |
| II | Tues | 50 | 0.6583 | 0.0799 | 0.0113 | 0.0227 |
| II | Wed | 36 | 0.6652 | 0.0707 | 0.0118 | 0.0239 |
| II | Thurs | 64 | 0.6443 | 0.0882 | 0.0110 | 0.0220 |
| II | Fri | 27 | 0.6664 | 0.0969 | 0.0187 | 0.0383 |
| II | Sat | 7 | 0.6630 | 0.0489 | 0.0185 | 0.0453 |
Plotting Performance
Now we’ll create some plots so that we can visualize the data.
Month by Category Set
We’ll begin with a basic bar plot. (Bar plots may be less useful for visualizing data ranges and distributions than violin plots; however, when a variable has as many levels as the Month variable does, the violin plot becomes crowded and horizontally compressed.)
The following code will produce a bar plot with “Month” on the x-axis, “Proportion Correct” on the y-axis, and separate bars for each of the conditions. Error bars represent SEs.
- First we’ll define the values to be used for the plot.
# Calculate SE min:
CMMin = data.frame("CMMin" = CMDescs$Total - CMDescs$se)
# Calculate SE max:
CMMax = data.frame("CMMax" = CMDescs$Total + CMDescs$se)
# Create data frame of values to be used:
CMData = data.frame("Cat" = CMDescs$Cat,
                    "Month" = CMDescs$Month,
                    "CMMean" = CMDescs$Total,
                    "CMMin" = CMMin,
                    "CMMax" = CMMax)
- Next, we’ll plot the data.
ggplot(CMData, aes(Month, CMMean, fill = Cat )) +
geom_col(color = "black", position = "dodge", alpha = .7) +
# Add in error bars:
geom_errorbar(aes(ymin = CMMin, ymax = CMMax),
color = "black",
position = position_dodge(width = 0.9),
width = .1) +
# Add labels:
labs(x = "Month", y = "Proportion Correct",
fill = "Category Condition") +
ggtitle("Category Learning Performance by Condition and Month") +
# Define the vertical size of the plot:
ylim(0, 1) +
# Define variable colours and theme:
scale_fill_manual(values = c("orchid3", "lightseagreen")) +
scale_color_manual(values = c("orchid3", "lightseagreen")) +
theme_light()
Time of Day by Category Set
The Time2 variable has only 3 levels. It is an ideal candidate, therefore, for a violin plot.
The following code will produce a split violin plot with “Time of Day” on the x-axis, “Proportion Correct” on the y-axis, and separate data clouds for each of the conditions. Dots and lines represent means and 95% CIs, respectively.
- We’ll begin by defining a function that will create split violin plots. The code below was taken from DeBruine (2018).
GeomSplitViolin <- ggproto(
"GeomSplitViolin",
GeomViolin,
draw_group = function(self, data, ..., draw_quantiles = NULL) {
data <- transform(data,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
grp <- data[1,'group']
newdata <- plyr::arrange(
transform(data, x = if(grp%%2==1) xminv else xmaxv),
if(grp%%2==1) y else -y
)
newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ],
newdata[1, ])
newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1,
'x'])
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
quantiles <- ggplot2:::create_quantile_segment_frame(data,
draw_quantiles)
aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
c("x", "y")),
drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
quantile_grob <- GeomPath$draw_panel(both, ...)
ggplot2:::ggname("geom_split_violin",
grid::grobTree(GeomPolygon$draw_panel(newdata, ...),
quantile_grob))
} else {
ggplot2:::ggname("geom_split_violin",
GeomPolygon$draw_panel(newdata, ...))
}
}
)
geom_split_violin <- function (mapping = NULL,
data = NULL,
stat = "ydensity",
position = "identity", ...,
draw_quantiles = NULL,
trim = TRUE,
scale = "area",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(data = data,
mapping = mapping,
stat = stat,
geom = GeomSplitViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(trim = trim,
scale = scale,
draw_quantiles = draw_quantiles,
na.rm = na.rm, ...)
)
}
- Next, we’ll define the values to be used for the dots and lines in the plot.
# Calculate CI min:
CTMin = data.frame("CTMin" = CTDescs$Total - CTDescs$ci)
# Calculate CI max:
CTMax = data.frame("CTMax" = CTDescs$Total + CTDescs$ci)
# Create data frame of values to be used:
CTData = data.frame("Cat" = CTDescs$Cat, "Time2" = CTDescs$Time2,
                    "CTMean" = CTDescs$Total, "CTMin" = CTMin,
                    "CTMax" = CTMax)
- Finally, we’ll plot the data.
CatData %>%
ggplot(aes(Time2, Total, fill = Cat)) +
geom_split_violin(color="black", trim=FALSE, alpha = 0.7) +
# Add in dots and lines:
geom_pointrange(data = CTData,
aes(Time2, CTMean, ymin = CTMin, ymax = CTMax),
color = "black",
shape = 20,
position = position_dodge(width = 0.25)) +
# Add labels:
labs(x = "Time of Day", y = "Proportion Correct",
fill = "Category Condition") +
ggtitle("Category Learning Performance by Condition and Time of Day") +
# Define the vertical size of the plot:
ylim(0.3, 1) +
# Define variable colours and theme:
scale_fill_manual(values = c("orchid3", "lightseagreen")) +
scale_color_manual(values = c("orchid3", "lightseagreen")) +
theme_light()
Day of Week by Category Set
Just for fun, we’ll create a notched box plot to display the last combination of variables.
The following code will produce a notched box plot with “Day” on the x-axis, “Proportion Correct” on the y-axis, and separate boxes for each of the conditions. Notches represent a CI around the median. (Note that the horn-like features on the last two boxes indicate that the CI extends beyond the box itself, that is, past the first or third quartile. This tends to happen when a group is small.)
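The notch limits can be sketched in base R. The scores below are hypothetical and chosen only to mimic a small group. The formula, median plus or minus 1.58 * IQR / sqrt(n), is the one documented for both boxplot.stats and geom_boxplot. ggplot2 computes its quartiles slightly differently from the hinges used here, so its notches may differ a little.

```r
# Hypothetical proportion-correct scores for a small group (n = 6):
scores <- c(0.55, 0.70, 0.78, 0.85, 0.90, 0.95)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five   <- fivenum(scores)
hinges <- five[c(2, 4)]

# Notch limits: median +/- 1.58 * IQR / sqrt(n)
notch_manual <- five[3] + c(-1.58, 1.58) * diff(hinges) / sqrt(length(scores))

# boxplot.stats() reports the same limits in its conf element:
notch_stats <- boxplot.stats(scores)$conf

rbind(hinges, notch_manual, notch_stats)
```

Here both notch limits fall outside the hinges, which is the situation that produces the horn-like shape in the plot.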
- Plot the data.
ggplot(CatData, aes(x = DayofWeek, y = Total, fill = Cat)) +
geom_boxplot(outlier.color = "black",
outlier.shape = 16, outlier.size = 2,
notch = TRUE, position = position_dodge(1), alpha = .7) +
# Add labels:
labs(x = "Day", y = "Proportion Correct", fill = "Category Condition") +
ggtitle("Category Learning Performance by Condition and Day") +
# Define the vertical size of the plot:
ylim(0.4, 1) +
# Define variable colours and theme:
scale_fill_manual(values = c("orchid3", "lightseagreen")) +
scale_color_manual(values = c("orchid3", "lightseagreen")) +
theme_light()
Problem 2
The second set of analyses will involve running ANOVAs to assess the potential effects of the Cat variable, crossed with both Month and Time2, on overall category learning performance.
Analysis Prep
Load Libraries
The following libraries will be used for this analysis:
# For running Levene's test:
# install.packages("car")
library(car)
# For performing ANOVAs:
# install.packages("ez")
library(ez)
# For conducting Games-Howell post-hocs:
# install.packages("userfriendlyscience")
library(userfriendlyscience)
Adjust Display Options
ezANOVA prints output using scientific notation. In order to make it easier to read our ANOVA outputs, we’ll turn the scientific notation option off.
options(scipen = 999)
p Value Rounding Function
We’ll also create a function to assess and print p values in the comments of our script. If p >= .005, the function will display “p =” and the value rounded to two decimal places. If .0005 <= p < .005, the function will display “p =” and the value rounded to three decimal places. If p < .0005, the function will display “p < .001.”
p_round <- function(x){
if(x > .005)
{x1 = (paste("= ", round(x, digits = 2), sep = ''))
}
else if(x == .005){x1 = (paste("= .01"))
}
else if(x > .0005 && x < .005)
{x1 = (paste("= ", round(x, digits = 3), sep = ''))
}
else if(x == .0005){x1 = (paste("= .001"))
}
else{x1 = (paste("< .001"))
}
(x1)
}
Partial Eta Squared
In some cases, we will have to use adjusted dfs and/or perform White-adjusted ANOVAs. In these cases, we will have to calculate adjusted effect sizes. Partial eta squared can be calculated using the following formula, for which we will create a function:
\[\eta^2_{partial} = \frac{df_n F}{df_n F + df_d}\]
peta <- function(dfn, dfd, f) {
return(dfn * f / ((dfn * f) + dfd))
}
Post-Hoc Rounding Function
We’ll also create a function to help round some of our post-hoc results. (Neither the kable rounding function nor the standard “round” function will work for some of our post-hoc results tables.) The code below was taken from Akhmed (2015).
round_df <- function(df, digits) {
nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))
df[,nums] <- round(df[,nums], digits = digits)
(df)
}
Category Set and Month
To assess the effects of category set and month on overall learning performance, we’ll conduct a 2 x 7 ANOVA with both Cat and Month as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares. Note that a White adjustment has been used; this will be discussed further in the Homogeneity of Variance section below.
Assumptions
The standard ANOVA makes three primary assumptions:
1. Independent Random Sampling
This assumption was met during testing.
2. Normality
This assumption can be tested using a Shapiro-Wilk test.
CM_Shap = shapiro.test(CatData$Total)
CM_Shap
##
## Shapiro-Wilk normality test
##
## data: CatData$Total
## W = 0.97308, p-value = 0.0000001922
Based on an alpha level of .05, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
3. Homogeneity of Variance
This assumption can be tested using a Levene’s test (provided as part of the ANOVA output).
# Run the ANOVA:
CM_ANOVA = ezANOVA(data = CatData, dv = .(Total),
wid = .(UniqueSubjNum), between = .(Cat, Month),
detailed = TRUE, type = "III",
white.adjust = TRUE, return_aov = TRUE)
# Extract the Levene's Test from the ANOVA output:
CM_Lev = CM_ANOVA$`Levene's Test for Homogeneity of Variance`
# Create a table to display the results:
kable(CM_Lev, digits = 4,
caption = "Table 8. Month by category set Levene's test.",
col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| DFn | DFd | SSn | SSd | F | p | sig |
|---|---|---|---|---|---|---|
| 13 | 442 | 0.1415 | 1.7921 | 2.684 | 0.0012 | * |
Based on an alpha level of .05, the assumption of homogeneity of variances is not met; F (13, 442) = 2.68, p = 0.001. Because sample sizes are unequal, a White-adjustment should be used to correct for this violation.
ANOVA
kable(CM_ANOVA$ANOVA, digits = 4,
caption = "Table 9. Month by category set ANOVA.",
col.names = c("Effect", "DFn", "DFd", "F", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Effect | DFn | DFd | F | p | sig |
|---|---|---|---|---|---|
| (Intercept) | 1 | 442 | 19570.0933 | 0.0000 | * |
| Cat | 1 | 442 | 80.1292 | 0.0000 | * |
| Month | 6 | 442 | 5.8585 | 0.0000 | * |
| Cat:Month | 6 | 442 | 0.4544 | 0.8418 | |
Because we’ve used a White adjustment, effect sizes are not provided in the output. Instead, we will use our peta function to calculate partial eta squared effect sizes.
# Calculate values:
CMC_peta = peta(dfn = CM_ANOVA$ANOVA[2,]$DFn, dfd = CM_ANOVA$ANOVA[2,]$DFd, f = CM_ANOVA$ANOVA[2,]$F)
CMM_peta = peta(dfn = CM_ANOVA$ANOVA[3,]$DFn, dfd = CM_ANOVA$ANOVA[3,]$DFd, f = CM_ANOVA$ANOVA[3,]$F)
CMI_peta = peta(dfn = CM_ANOVA$ANOVA[4,]$DFn, dfd = CM_ANOVA$ANOVA[4,]$DFd, f = CM_ANOVA$ANOVA[4,]$F)
# Create a data frame of the results:
petas = data.frame("Cat" = CMC_peta, "Month" = CMM_peta,
"Cat*Month" = CMI_peta)
# Create a table to display the results:
kable(petas, digits = 4,
caption = "Table 10. Month by category set effect sizes.",
col.names = c("Category Set", "Month", "Interaction"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | Month | Interaction |
|---|---|---|
| 0.1535 | 0.0737 | 0.0061 |
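As a check, the values in Table 10 can be recomputed directly from the F statistics and degrees of freedom reported in Table 9. The sketch below redefines the peta function so that it runs on its own; the data frame and its column names are introduced here only for illustration.

```r
# Same formula as the peta function defined earlier in the notebook:
peta <- function(dfn, dfd, f) {
  dfn * f / ((dfn * f) + dfd)
}

# F values and dfs copied from Table 9:
effects <- data.frame(
  Effect = c("Cat", "Month", "Cat:Month"),
  DFn    = c(1, 6, 6),
  DFd    = c(442, 442, 442),
  Fval   = c(80.1292, 5.8585, 0.4544)
)

# Partial eta squared for each effect, rounded as in Table 10:
effects$peta <- round(peta(effects$DFn, effects$DFd, effects$Fval), 4)
effects
```

For example, for the main effect of category set, 1 * 80.1292 / (1 * 80.1292 + 442) is approximately 0.1535.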
Interpretation of Main Effects
Interaction
The category set x month interaction was not found to be statistically significant; F (6, 442) = 0.45, p = 0.84, \(\eta^2_{p}\) = 0.01.
Main Effect of Category Set
The main effect of category set was found to be statistically significant; F (1, 442) = 80.13, p < .001, \(\eta^2_{p}\) = 0.15.
Main Effect of Month
The main effect of month was found to be statistically significant; F (6, 442) = 5.86, p < .001, \(\eta^2_{p}\) = 0.07.
Post-hoc Tests
The significant main effects of category set and month will be further assessed via post-hoc tests. Because we have unequal sample sizes and our data displayed a violation of the homogeneity of variances assumption, we will use the Games-Howell adjustment.
Main Effect of Category Set
# Calculate post-hoc:
C_post = posthocTGH(y = CatData$Total, x = CatData$Cat,
method = c("games-howell"), conf.level = .95,
digits = 4, formatPvalue = TRUE)
# Round results:
C_post_r = round_df(C_post$output$games.howell, digits = 4)
# Format the results table so that significant p-values will be bolded:
C_post_r$p = cell_spec(C_post_r$p,
                       bold = C_post_r$p < .05)
# Create a table to display the results:
kable(C_post_r, digits = 4,
caption = "Table 11. Category set post-hoc.",
col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Comparison | Difference | CI Min | CI Max | t | df | p |
|---|---|---|---|---|---|---|
| II-RD | -0.0973 | -0.114 | -0.0805 | 11.4169 | 452.941 | 0 |
These results indicate that participants in the RD condition (M_RD = 0.76, SD_RD = 0.10) performed significantly better on the category learning task than participants in the II condition (M_II = 0.66, SD_II = 0.08); t (452.94) = 11.42, p < .001.
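With only two groups, the Games-Howell comparison rests on the same unpooled standard error and Welch-Satterthwaite degrees of freedom as a Welch t test. The sketch below recomputes t and df from the rounded descriptives in Table 1, so the results match the post-hoc table only approximately.

```r
# Rounded descriptives copied from Table 1:
m_rd <- 0.7559; sd_rd <- 0.1015; n_rd <- 249
m_ii <- 0.6586; sd_ii <- 0.0804; n_ii <- 207

# Squared standard error of each group mean:
v_rd <- sd_rd^2 / n_rd
v_ii <- sd_ii^2 / n_ii

# Welch t statistic (unpooled standard error):
t_welch <- (m_rd - m_ii) / sqrt(v_rd + v_ii)

# Welch-Satterthwaite degrees of freedom:
df_welch <- (v_rd + v_ii)^2 /
  (v_rd^2 / (n_rd - 1) + v_ii^2 / (n_ii - 1))

round(c(t = t_welch, df = df_welch), 2)
```

The fractional df (roughly 453 rather than 454) reflects the unequal variances and group sizes.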
Main Effect of Month
# Calculate post-hoc:
M_post = posthocTGH(y = CatData$Total, x = CatData$Month,
method = c("games-howell"), conf.level = .95,
digits = 4, formatPvalue = TRUE)
# Round results:
M_post_r = round_df(M_post$output$games.howell, digits = 4)
# Format the results table so that significant p-values will be bolded:
M_post_r$p = cell_spec(M_post_r$p,
                       bold = M_post_r$p < .05)
# Create a table to display the results:
kable(M_post_r, digits = 4,
caption = "Table 12. Month post-hoc.",
col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Comparison | Difference | CI Min | CI Max | t | df | p |
|---|---|---|---|---|---|---|
| Feb-Jan | -0.0442 | -0.1105 | 0.0220 | 2.0190 | 80.7115 | 0.411 |
| Mar-Jan | -0.0467 | -0.1071 | 0.0138 | 2.3118 | 124.6992 | 0.2467 |
| Apr-Jan | -0.0037 | -0.0603 | 0.0529 | 0.1976 | 133.0743 | 1 |
| Sept-Jan | 0.0486 | 0.0000 | 0.0972 | 2.9994 | 115.2892 | 0.0502 |
| Oct-Jan | -0.0043 | -0.0517 | 0.0431 | 0.2730 | 108.0980 | 1 |
| Nov-Jan | -0.0175 | -0.0922 | 0.0571 | 0.7467 | 27.6581 | 0.9881 |
| Mar-Feb | -0.0024 | -0.0714 | 0.0665 | 0.1059 | 87.1016 | 1 |
| Apr-Feb | 0.0405 | -0.0252 | 0.1062 | 1.8650 | 80.7967 | 0.5091 |
| Sept-Feb | 0.0928 | 0.0336 | 0.1521 | 4.7817 | 59.4204 | 0.0002 |
| Oct-Feb | 0.0399 | -0.0183 | 0.0982 | 2.0966 | 55.6527 | 0.3689 |
| Nov-Feb | 0.0267 | -0.0541 | 0.1074 | 1.0338 | 34.6614 | 0.9424 |
| Apr-Mar | 0.0429 | -0.0169 | 0.1027 | 2.1491 | 129.5460 | 0.3307 |
| Sept-Mar | 0.0953 | 0.0428 | 0.1477 | 5.4615 | 106.7338 | 0 |
| Oct-Mar | 0.0424 | -0.0089 | 0.0936 | 2.4870 | 100.1513 | 0.1752 |
| Nov-Mar | 0.0291 | -0.0476 | 0.1058 | 1.1946 | 31.2206 | 0.8909 |
| Sept-Apr | 0.0523 | 0.0046 | 0.1001 | 3.2828 | 130.2505 | 0.0219 |
| Oct-Apr | -0.0006 | -0.0470 | 0.0459 | 0.0367 | 122.8858 | 1 |
| Nov-Apr | -0.0138 | -0.0880 | 0.0603 | 0.5923 | 27.0723 | 0.9965 |
| Oct-Sept | -0.0529 | -0.0888 | -0.0170 | 4.3884 | 195.7911 | 0.0004 |
| Nov-Sept | -0.0662 | -0.1357 | 0.0034 | 3.1216 | 19.1030 | 0.0688 |
| Nov-Oct | -0.0132 | -0.0822 | 0.0557 | 0.6351 | 17.9562 | 0.9946 |
These results indicate that participants displayed significantly better performance during September (M_Sept = 0.76, SD_Sept = 0.09) than during February (M_Feb = 0.67, SD_Feb = 0.11), March (M_Mar = 0.67, SD_Mar = 0.12), April (M_Apr = 0.71, SD_Apr = 0.11), and October (M_Oct = 0.71, SD_Oct = 0.08); t (59.42) = 4.78, p < .001; t (106.73) = 5.46, p < .001; t (130.25) = 3.28, p = 0.02; and t (195.79) = 4.39, p < .001, respectively. Participants in September also performed marginally better than participants in January (M_Jan = 0.71, SD_Jan = 0.11) and November (M_Nov = 0.70, SD_Nov = 0.07); t (115.29) = 3.00, p = 0.05 and t (19.10) = 3.12, p = 0.07, respectively.
Category Set and Time of Day
To assess the effects of category set and time of day on overall learning performance, we’ll conduct a 2 x 3 ANOVA with both Cat and Time2 as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares.
Assumptions
1. Independent Random Sampling
This assumption was met during testing.
2. Normality
As specified above in the Category Set by Month analysis, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
3. Homogeneity of Variance
Again, this assumption can be tested using a Levene’s test.
# Run the ANOVA:
CT_ANOVA = ezANOVA(data = CatData, dv = .(Total),
wid = .(UniqueSubjNum), between = .(Cat, Time2),
detailed = TRUE, type = "III", return_aov = TRUE)
# Extract the Levene's Test from the ANOVA output:
CT_Lev = CT_ANOVA$`Levene's Test for Homogeneity of Variance`
# Create a table to display the results:
kable(CT_Lev, digits = 4,
caption = "Table 13. Time of day by category set Levene's test.",
col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| DFn | DFd | SSn | SSd | F | p | sig |
|---|---|---|---|---|---|---|
| 5 | 450 | 0.0337 | 1.8655 | 1.6245 | 0.152 | |
Based on an alpha level of .05, the assumption of homogeneity of variances is met; F (5, 450) = 1.62, p = 0.15. A White-adjustment, therefore, does not need to be used for this analysis.
ANOVA
kable(CT_ANOVA$ANOVA, digits = 4,
caption = "Table 14. Time of day by category set ANOVA.",
col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F",
"p", "sig", "Effect Size"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Effect | DFn | DFd | SSn | SSd | F | p | sig | Effect Size |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 1 | 450 | 219.6694 | 3.8762 | 25502.3848 | 0.0000 | * | 0.9827 |
| Cat | 1 | 450 | 1.0344 | 3.8762 | 120.0936 | 0.0000 | * | 0.2107 |
| Time2 | 2 | 450 | 0.0053 | 3.8762 | 0.3091 | 0.7342 | | 0.0014 |
| Cat:Time2 | 2 | 450 | 0.0058 | 3.8762 | 0.3377 | 0.7136 | | 0.0015 |
Interpretation of Main Effects
Interaction
The category set x time of day interaction was not found to be statistically significant; F (2, 450) = 0.34, p = 0.71, \(\eta^2_{p}\) = 0.00.
Main Effect of Category Set
As before, the main effect of category set was found to be statistically significant; F (1, 450) = 120.09, p < .001, \(\eta^2_{p}\) = 0.21.
Main Effect of Time of Day
The main effect of time of day was not found to be statistically significant; F (2, 450) = 0.31, p = 0.73, \(\eta^2_{p}\) < .01.
Post-hoc Tests
The significant main effect of category set will be further assessed via a post-hoc test (though the test will essentially be identical to the one conducted in the Category Set by Month analysis above). Because we have only one pairwise comparison to complete, we will use the Bonferroni adjustment.
Main Effect of Category Set
C1_post = pairwise.t.test(CatData$Total, CatData$Cat,
p.adjust.method = "bonf")
C1_post
##
## Pairwise comparisons using t tests with pooled SD
##
## data: CatData$Total and CatData$Cat
##
## RD
## II <0.0000000000000002
##
## P value adjustment method: bonferroni
These results confirm the previous finding that participants in the RD condition performed significantly better on the category learning task than participants in the II condition (p < .001).
Problem 3
The third set of analyses will involve creating a new dataset with Cat, Month, Time2, and performance by block in long format. We will then create line plots to depict performance across block by both Month and Cat, followed by a mixed ANOVA to assess the relationships between the variables.
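To make the wide-to-long step concrete, here is a minimal sketch that reshapes a made-up three-subject data frame with base R's reshape(). The values and the names `wide` and `long` are illustrative assumptions, not study data; the notebook itself uses tidyr's gather() for the same purpose.

```r
# Toy wide-format data: one row per subject, one column per block (made-up values).
wide <- data.frame(
  Subject = 1:3,
  Cat     = c("RD", "II", "RD"),
  Block1  = c(0.60, 0.55, 0.70),
  Block2  = c(0.70, 0.60, 0.80)
)

# Reshape to long format: one row per subject-by-block observation.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Block1", "Block2"),
  v.names   = "Performance",
  timevar   = "Block",
  times     = c("Block1", "Block2"),
  idvar     = "Subject"
)
long$Block <- factor(long$Block)

print(long)
```

Each subject now contributes one row per block, which is the layout that a within-subject factor such as Block requires.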
Analysis Prep
Data Gathering
First, we will create a subset of the variables that we are interested in.
# Specify the variables to keep:
VarsKeep = names(CatData) %in% c("UniqueSubjNum", "Cat", "Time2", "Month", "X1_Block", "X2_Block", "X3_Block", "X4_Block")
# Define the new subset and label the columns:
Data3 = (CatData[VarsKeep])
colnames(Data3) = c("Subject", "Cat", "Time", "Month", "Block1", "Block2", "Block3", "Block4")
Now we will gather the data into long format.
CatLong = gather(Data3, Block, Performance, Block1:Block4, factor_key = TRUE)
Plotting Performance
Now we’ll plot performance across block by both category set and month.
- We’ll begin by calculating the descriptive statistics to be used in the plot.
CMBDescs = summarySE(data = CatLong, measurevar = "Performance", groupvars = c("Cat", "Month", "Block"), conf.interval = .95)
- Next, we’ll plot performance. The following code will produce a series of line graphs with “Block” on the x-axis, “Proportion Correct” on the y-axis, separate lines for the two category conditions, and separate plots for the seven months. Error bars represent standard error of the mean (SEM).
CMBFig = ggplot(CMBDescs, aes(x = Block, y = Performance)) +
# Specify labels:
labs(x = "Block", y = "Proportion Correct") +
ggtitle("Category Learning Performance by Block, Month, and Category Set") +
# Define line and point aesthetics:
geom_line(aes(colour = Cat, group = Cat)) + geom_point(size = 2, aes(colour = Cat)) +
# Adjust axes:
scale_x_discrete(limits = c("Block1", "Block2", "Block3", "Block4"),
labels = c("Block1" = "1", "Block2" = "2", "Block3" = "3", "Block4" = "4")) +
scale_y_continuous(limits = c(0.4, 1.0), breaks = seq(0.4, 1.0, .1)) +
# Add legend and adjust colours:
scale_colour_manual(name = "Category Set", values = c("orchid3", "lightseagreen")) +
# Add error bars:
geom_errorbar(data = CMBDescs, mapping = aes(x = Block, ymin = Performance - se, ymax = Performance + se), width = .1) +
# Specify theme:
theme_bw() + theme(plot.title = element_text(hjust = .5))
# Use facet wrap to divide the plot by month:
CMBFig + facet_wrap( ~ Month, ncol = 2)
Mixed ANOVA
Now we’ll conduct a Type III, 2 x 7 x 4 mixed ANOVA with category condition and month as between-group factors and block as a within-group factor.
Assumptions
The mixed ANOVA makes four primary assumptions:
- Independent Random Sampling
This assumption was met during testing.
- Normality
This assumption can be tested by applying a Shapiro-Wilk test to the outcome measure (i.e. performance).
CMB_Shap = shapiro.test(CatLong$Performance)
CMB_Shap
##
## Shapiro-Wilk normality test
##
## data: CatLong$Performance
## W = 0.97786, p-value = 0.0000000000000003455
Based on an alpha level of .05, the assumption of normality is not met; W = 0.98, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
- Homogeneity of Variance for Between-Group Factors
This assumption can be tested by implementing a Levene’s test on performance collapsed across the within-group factor (i.e. performance averaged across Block).
CMB_Lev = leveneTest(data = CatData, Total ~ Cat*Month, center = median)
CMB_Lev
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 13 2.684 0.001205 **
## 442
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on an alpha level of .05, the assumption of homogeneity of between-group variances is not met; F(13, 442) = 2.68, p = 0.001. Because sample sizes are unequal, a White-corrected F-test should be conducted if one is interested in assessing the main effect of condition.
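As a rough illustration of what the median-centered Levene's test computes, the sketch below runs a one-way ANOVA on the absolute deviations from each group's median. The data are simulated and the data frame `toy` and its columns are hypothetical; on real data, leveneTest() with center = median should return the same F and p.

```r
set.seed(1)

# Simulated scores for three groups; the third group has a larger spread.
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  score = c(rnorm(20, 0, 1), rnorm(20, 0, 1), rnorm(20, 0, 3))
)

# Absolute deviation of each score from its own group's median.
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# Levene's test is a one-way ANOVA on those absolute deviations.
lev   <- anova(lm(abs_dev ~ group, data = toy))
lev_F <- lev[["F value"]][1]
lev_p <- lev[["Pr(>F)"]][1]

print(lev)
```

A small p-value here indicates that the groups differ in spread, which is the condition the homogeneity assumption rules out.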
- Sphericity
This assumption can be tested using Mauchly’s sphericity test (provided as part of the ANOVA output).
# Run the ANOVA:
CMB_ANOVA = ezANOVA(data = CatLong, dv = .(Performance), wid = .(Subject), within = .(Block), between = .(Cat, Month), detailed = TRUE, type = "III", return_aov = TRUE, white.adjust = TRUE)
# Extract the sphericity test from the ANOVA output:
CMB_Mau = CMB_ANOVA$`Mauchly's Test for Sphericity`
# Create a table to display the results:
kable(CMB_Mau, digits = 4,
caption = "Table 15. Test of sphericity on performance by block.",
col.names = c("Effect", "W", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | W | p | sig |
|---|---|---|---|---|
| 5 | Block | 0.7442 | 0 | * |
| 6 | Cat:Block | 0.7442 | 0 | * |
| 7 | Month:Block | 0.7442 | 0 | * |
| 8 | Cat:Month:Block | 0.7442 | 0 | * |
Based on an alpha level of .05, the assumption of sphericity is not met for any of the effects involving the within-group factor (W = 0.74, p < .001). We will now look at the potential epsilon corrections that we can use.
# Extract the corrections table from the ANOVA output:
CMB_Eps = CMB_ANOVA$`Sphericity Corrections`
# Create a table to display the results:
kable(CMB_Eps, digits = 4,
caption = "Table 16. Epsilon corrections for the test of performance by block.",
col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | GG Epsilon | GG p | GG sig | HF Epsilon | HF p | HF sig |
|---|---|---|---|---|---|---|---|
| 5 | Block | 0.8305 | 0.0000 | * | 0.8356 | 0.0000 | * |
| 6 | Cat:Block | 0.8305 | 0.0000 | * | 0.8356 | 0.0000 | * |
| 7 | Month:Block | 0.8305 | 0.0883 | | 0.8356 | 0.0877 | |
| 8 | Cat:Month:Block | 0.8305 | 0.5984 | | 0.8356 | 0.5990 | |
Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.84) to these tests (as suggested by Girden, 1992).
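The effect of an epsilon correction can be sketched by hand: both degrees of freedom are multiplied by epsilon and the p-value is recomputed from the same F. The example below uses the Block effect, with F and the epsilons taken from the tables in this section. The uncorrected dfs (3 and 1326) are an assumption, inferred from four blocks and 442 between-subject error df rather than quoted anywhere in the text.

```r
# F and epsilons for the Block effect, as reported in this section's tables.
f_block <- 118.0047
eps_gg  <- 0.8305
eps_hf  <- 0.8356

# Uncorrected dfs: ASSUMED (inferred from 4 blocks and 442 error df).
dfn <- 3
dfd <- 1326

# Rule of thumb attributed to Girden (1992): use Huynh-Feldt when the
# Greenhouse-Geisser epsilon exceeds .75, otherwise Greenhouse-Geisser.
eps <- if (eps_gg > 0.75) eps_hf else eps_gg

# Scale both dfs by epsilon and recompute the p-value from the same F.
dfn_corr <- dfn * eps
dfd_corr <- dfd * eps
p_corr   <- pf(f_block, dfn_corr, dfd_corr, lower.tail = FALSE)

round(c(DFn = dfn_corr, DFd = dfd_corr), 2)
p_corr < .001
```

The F statistic itself is unchanged; only the reference distribution used for the p-value becomes more conservative.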
Effect Sizes
Because we’ve used a White adjustment and adjusted df’s, we’ll have to calculate adjusted effect sizes.
# Cat:
CMB_C_peta = peta(dfn = CMB_ANOVA$ANOVA[2,]$DFn, dfd = CMB_ANOVA$ANOVA[2,]$DFd, f = CMB_ANOVA$ANOVA[2,]$F)
# Month:
CMB_M_peta = peta(dfn = CMB_ANOVA$ANOVA[3,]$DFn, dfd = CMB_ANOVA$ANOVA[3,]$DFd, f = CMB_ANOVA$ANOVA[3,]$F)
# Block:
CMB_B_peta = peta(dfn = CMB_ANOVA$ANOVA[4,]$DFn, dfd = CMB_ANOVA$ANOVA[4,]$DFd, f = CMB_ANOVA$ANOVA[4,]$F)
# Cat*Month:
CMB_CM_peta = peta(dfn = CMB_ANOVA$ANOVA[5,]$DFn, dfd = CMB_ANOVA$ANOVA[5,]$DFd, f = CMB_ANOVA$ANOVA[5,]$F)
# Cat*Block:
CMB_CB_peta = peta(dfn = CMB_ANOVA$ANOVA[6,]$DFn, dfd = CMB_ANOVA$ANOVA[6,]$DFd, f = CMB_ANOVA$ANOVA[6,]$F)
# Month*Block:
CMB_MB_peta = peta(dfn = CMB_ANOVA$ANOVA[7,]$DFn, dfd = CMB_ANOVA$ANOVA[7,]$DFd, f = CMB_ANOVA$ANOVA[7,]$F)
# Cat*Month*Block:
CMB_CMB_peta = peta(dfn = CMB_ANOVA$ANOVA[8,]$DFn, dfd = CMB_ANOVA$ANOVA[8,]$DFd, f = CMB_ANOVA$ANOVA[8,]$F)
Interpretation of Main Effects
# Create data frames for the epsilon corrected tests:
# Cat:
CMB_C_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[2], "DFn" = CMB_ANOVA$ANOVA$DFn[2], "DFd" = CMB_ANOVA$ANOVA$DFd[2], "SSn" = CMB_ANOVA$ANOVA$SSn[2], "SSd" = CMB_ANOVA$ANOVA$SSd[2], "F" = CMB_ANOVA$ANOVA$F[2], "p" = CMB_ANOVA$ANOVA$p[2], "sig" = CMB_ANOVA$ANOVA$`p<.05`[2], "peta" = CMB_C_peta)
# Month:
CMB_M_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[3], "DFn" = CMB_ANOVA$ANOVA$DFn[3], "DFd" = CMB_ANOVA$ANOVA$DFd[3], "SSn" = CMB_ANOVA$ANOVA$SSn[3], "SSd" = CMB_ANOVA$ANOVA$SSd[3], "F" = CMB_ANOVA$ANOVA$F[3], "p" = CMB_ANOVA$ANOVA$p[3], "sig" = CMB_ANOVA$ANOVA$`p<.05`[3], "peta" = CMB_M_peta)
# Block:
CMB_B_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[4], "DFn" = CMB_ANOVA$ANOVA$DFn[4] * CMB_Eps$HFe[1], "DFd" = CMB_ANOVA$ANOVA$DFd[4] * CMB_Eps$HFe[1], "SSn" = CMB_ANOVA$ANOVA$SSn[4], "SSd" = CMB_ANOVA$ANOVA$SSd[4], "F" = CMB_ANOVA$ANOVA$F[4], "p" = CMB_Eps$`p[HF]`[1], "sig" = CMB_Eps$`p[HF]<.05`[1], "peta" = CMB_B_peta)
# Cat*Month:
CMB_CM_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[5], "DFn" = CMB_ANOVA$ANOVA$DFn[5], "DFd" = CMB_ANOVA$ANOVA$DFd[5], "SSn" = CMB_ANOVA$ANOVA$SSn[5], "SSd" = CMB_ANOVA$ANOVA$SSd[5], "F" = CMB_ANOVA$ANOVA$F[5], "p" = CMB_ANOVA$ANOVA$p[5], "sig" = CMB_ANOVA$ANOVA$`p<.05`[5], "peta" = CMB_CM_peta)
# Cat*Block:
CMB_CB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[6], "DFn" = CMB_ANOVA$ANOVA$DFn[6] * CMB_Eps$HFe[2], "DFd" = CMB_ANOVA$ANOVA$DFd[6] * CMB_Eps$HFe[2], "SSn" = CMB_ANOVA$ANOVA$SSn[6], "SSd" = CMB_ANOVA$ANOVA$SSd[6], "F" = CMB_ANOVA$ANOVA$F[6], "p" = CMB_Eps$`p[HF]`[2], "sig" = CMB_Eps$`p[HF]<.05`[2], "peta" = CMB_CB_peta)
# Month*Block:
CMB_MB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[7], "DFn" = CMB_ANOVA$ANOVA$DFn[7] * CMB_Eps$HFe[3], "DFd" = CMB_ANOVA$ANOVA$DFd[7] * CMB_Eps$HFe[3], "SSn" = CMB_ANOVA$ANOVA$SSn[7], "SSd" = CMB_ANOVA$ANOVA$SSd[7], "F" = CMB_ANOVA$ANOVA$F[7], "p" = CMB_Eps$`p[HF]`[3], "sig" = CMB_Eps$`p[HF]<.05`[3], "peta" = CMB_MB_peta)
# Cat*Month*Block:
CMB_CMB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[8], "DFn" = CMB_ANOVA$ANOVA$DFn[8] * CMB_Eps$HFe[4], "DFd" = CMB_ANOVA$ANOVA$DFd[8] * CMB_Eps$HFe[4], "SSn" = CMB_ANOVA$ANOVA$SSn[8], "SSd" = CMB_ANOVA$ANOVA$SSd[8], "F" = CMB_ANOVA$ANOVA$F[8], "p" = CMB_Eps$`p[HF]`[4], "sig" = CMB_Eps$`p[HF]<.05`[4], "peta" = CMB_CMB_peta)
# Combine into a single data frame:
CMB_ANOVA_Corr = rbind(CMB_C_ANOVA, CMB_M_ANOVA, CMB_B_ANOVA, CMB_CM_ANOVA, CMB_CB_ANOVA, CMB_MB_ANOVA, CMB_CMB_ANOVA)
# Create a table to display the results:
kable(CMB_ANOVA_Corr, digits = 4,
caption = "Table 17. Mixed ANOVA on performance across block by month and category set.",
col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F", "p", "sig", "peta"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Effect | DFn | DFd | SSn | SSd | F | p | sig | peta |
|---|---|---|---|---|---|---|---|---|
| Cat | 1.0000 | 442.000 | 1.5916 | 16.0741 | 43.7664 | 0.0000 | * | 0.0901 |
| Month | 6.0000 | 442.000 | 1.1411 | 16.0741 | 5.2295 | 0.0000 | * | 0.0663 |
| Block | 2.5067 | 1107.943 | 1.5776 | 5.9090 | 118.0047 | 0.0000 | * | 0.2107 |
| Cat:Month | 6.0000 | 442.000 | 0.2320 | 16.0741 | 1.0632 | 0.3839 | | 0.0142 |
| Cat:Block | 2.5067 | 1107.943 | 0.1618 | 5.9090 | 12.1035 | 0.0000 | * | 0.0267 |
| Month:Block | 15.0400 | 1107.943 | 0.1226 | 5.9090 | 1.5283 | 0.0877 | | 0.0203 |
| Cat:Month:Block | 15.0400 | 1107.943 | 0.0698 | 5.9090 | 0.8697 | 0.5990 | | 0.0117 |
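The peta column above can be cross-checked by hand. The sketch below computes partial eta squared two ways, from F and the dfs and from the sums of squares, for the Cat and Block rows. The helper names are hypothetical and are not the notebook's own peta() function, although the reported values appear consistent with the F-based formula. The uncorrected Block dfs (3 and 1326) are an inferred assumption.

```r
# Hypothetical helpers (illustrative names, not the notebook's peta()).
peta_from_f  <- function(f, dfn, dfd) (f * dfn) / (f * dfn + dfd)
peta_from_ss <- function(ssn, ssd) ssn / (ssn + ssd)

# Cat row of the table above.
cat_f  <- peta_from_f(f = 43.7664, dfn = 1, dfd = 442)
cat_ss <- peta_from_ss(ssn = 1.5916, ssd = 16.0741)

# Block row, using uncorrected dfs (ASSUMED: 3 and 1326).
block_f  <- peta_from_f(f = 118.0047, dfn = 3, dfd = 1326)
block_ss <- peta_from_ss(ssn = 1.5776, ssd = 5.9090)

round(c(cat_f = cat_f, cat_ss = cat_ss, block_f = block_f, block_ss = block_ss), 3)
```

Because an epsilon correction multiplies both dfs by the same constant, the F-based formula gives the same value with corrected or uncorrected dfs.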
Cat*Month*Block:
Based on an alpha level of .05, the three-way interaction between category set, month, and block was not found to be statistically significant; F (15.04, 1107.94) = 0.87, p = 0.60, \(\eta^2_{p}\) = 0.01.
Month*Block:
Based on an alpha level of .05, the two-way interaction between month and block was not found to be statistically significant; F (15.04, 1107.94) = 1.53, p = 0.09, \(\eta^2_{p}\) = 0.02.
Cat*Block:
Based on an alpha level of .05, the two-way interaction between category set and block was found to be statistically significant; F (2.51, 1107.94) = 12.10, p < .001, \(\eta^2_{p}\) = 0.03.
Cat*Month:
Based on an alpha level of .05, the two-way interaction between category set and month was not found to be statistically significant; F (6, 442) = 1.06, p = 0.38, \(\eta^2_{p}\) = 0.01.
Block:
Because the category set x block interaction was found to be significant, we don’t need to consider the main effect of block.
Month:
Based on an alpha level of .05, the main effect of month was found to be statistically significant; F (6, 442) = 5.23, p < .001, \(\eta^2_{p}\) = 0.07.
Cat:
Because the category set x block interaction was found to be significant, we don’t need to consider the main effect of category set.
Cat*Block Post-Hoc:
The significant category set x block interaction will be further assessed via a test of simple main effects of block across levels of category set. To do so, we will conduct ANOVA analyses to assess the effect of block on performance for each condition separately. Note that, in order to correct the family-wise error rate, a Holm-Bonferroni adjustment will be used when assessing the significance of main effects associated with these ANOVAs. The Holm-Bonferroni adjustment defines a corrected alpha level according to the following formula: \[\alpha_{corrected} = \frac{\alpha}{number\;of\;comparisons - rank\;of\;comparison + 1}\]
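As a small worked illustration of the formula above, the sketch below computes the Holm-Bonferroni alpha thresholds for two comparisons and then shows the equivalent p-value adjustment with p.adjust(). The helper name `holm_alphas` and the three example p-values are made up for illustration.

```r
# Holm-Bonferroni alpha thresholds for m comparisons, ordered from the
# smallest p-value (rank 1) to the largest (rank m).
holm_alphas <- function(alpha, m) alpha / (m - seq_len(m) + 1)

# Two comparisons: the smaller p-value is tested at .025, the larger at .05.
holm_alphas(0.05, 2)

# Equivalent view: adjust the p-values and compare each with alpha.
# These p-values are invented for illustration only.
p_raw  <- c(0.030, 0.001, 0.040)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

Comparing adjusted p-values with the nominal alpha leads to the same decisions as comparing raw p-values with the rank-specific corrected alphas.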
Before we begin, we’ll define RD and II data subsets:
RD_Perf = subset(CatLong, Cat == "RD")
II_Perf = subset(CatLong, Cat == "II")
Rule-Defined ANOVA
Assumptions
The repeated-measures ANOVA makes three primary assumptions:
1. Independent Random Sampling
This assumption was met during testing.
2. Normality
CMB_RD_Shap = shapiro.test(RD_Perf$Performance)
CMB_RD_Shap
##
## Shapiro-Wilk normality test
##
## data: RD_Perf$Performance
## W = 0.91747, p-value < 0.00000000000000022
Based on an alpha level of .05, the assumption of normality is not met; W = 0.92, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
3. Sphericity
# Conduct the ANOVA:
CMB_RD_ANOVA = ezANOVA(data = RD_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)
# Extract the sphericity test from the ANOVA output:
CMB_RD_Mau = CMB_RD_ANOVA$`Mauchly's Test for Sphericity`
# Create a table to display the results:
kable(CMB_RD_Mau, digits = 4,
caption = "Table 18. Test of sphericity on rule-defined performance by block.",
col.names = c("Effect", "W", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | W | p | sig |
|---|---|---|---|---|
| 2 | Block | 0.6067 | 0 | * |
Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.61, p < .001). We will now look at the potential epsilon corrections that we can use.
# Extract the corrections table from the ANOVA output:
CMB_RD_Eps = CMB_RD_ANOVA$`Sphericity Corrections`
# Create a table to display the results:
kable(CMB_RD_Eps, digits = 4,
caption = "Table 19. Epsilon corrections for the test of rule-defined performance by block.",
col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | GG Epsilon | GG p | GG sig | HF Epsilon | HF p | HF sig |
|---|---|---|---|---|---|---|---|
| 2 | Block | 0.7544 | 0 | * | 0.7617 | 0 | * |
Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.76; as suggested by Girden, 1992).
Information Integration ANOVA
Assumptions
1. Independent Random Sampling
This assumption was met during testing.
2. Normality
CMB_II_Shap = shapiro.test(II_Perf$Performance)
CMB_II_Shap
##
## Shapiro-Wilk normality test
##
## data: II_Perf$Performance
## W = 0.99504, p-value = 0.008679
Based on an alpha level of .05, the assumption of normality is not met; W = 0.995, p = 0.009. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
3. Sphericity
# Conduct the ANOVA:
CMB_II_ANOVA = ezANOVA(data = II_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)
# Extract the sphericity test from the ANOVA output:
CMB_II_Mau = CMB_II_ANOVA$`Mauchly's Test for Sphericity`
# Create a table to display the results:
kable(CMB_II_Mau, digits = 4,
caption = "Table 20. Test of sphericity on information integration performance by block.",
col.names = c("Effect", "W", "p", "sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | W | p | sig |
|---|---|---|---|---|
| 2 | Block | 0.8705 | 0 | * |
Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.87, p < .001). We will now look at the potential epsilon corrections that we can use.
# Extract the corrections table from the ANOVA output:
CMB_II_Eps = CMB_II_ANOVA$`Sphericity Corrections`
# Create a table to display the results:
kable(CMB_II_Eps, digits = 4,
caption = "Table 21. Epsilon corrections for the test of information integration performance by block.",
col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| | Effect | GG Epsilon | GG p | GG sig | HF Epsilon | HF p | HF sig |
|---|---|---|---|---|---|---|---|
| 2 | Block | 0.909 | 0 | * | 0.9224 | 0 | * |
Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.92; as suggested by Girden, 1992).
Interpretation of Main Effects
We will now employ the Holm-Bonferroni adjustment to assess the main effects of these tests.
# Create a data frame of the p-values being assessed:
CMB_Corr_ps1 = data.frame("Condition" = c("RD", "II"), "p" = c(CMB_RD_Eps$`p[HF]`, CMB_II_Eps$`p[HF]`))
# Perform p-adjustment:
CMB_Corr_ps2 = p.adjust(CMB_Corr_ps1[,2], method = c("holm"), n = 2)
# Create a data frame for the RD Condition:
# (ezANOVA's ges column is generalized eta squared, so it is labelled "ges" here.)
CMB_RD_ANOVA_Corr = data.frame("Condition" = c("RD"), "Effect" = CMB_RD_ANOVA$ANOVA$Effect[2], "DFn" = CMB_RD_ANOVA$ANOVA$DFn[2] * CMB_RD_Eps$HFe, "DFd" = CMB_RD_ANOVA$ANOVA$DFd[2] * CMB_RD_Eps$HFe, "SSn" = CMB_RD_ANOVA$ANOVA$SSn[2], "SSd" = CMB_RD_ANOVA$ANOVA$SSd[2], "F" = CMB_RD_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[1], 4), "ges" = CMB_RD_ANOVA$ANOVA$ges[2])
# Create a data frame for the II Condition:
CMB_II_ANOVA_Corr = data.frame("Condition" = c("II"), "Effect" = CMB_II_ANOVA$ANOVA$Effect[2], "DFn" = CMB_II_ANOVA$ANOVA$DFn[2] * CMB_II_Eps$HFe, "DFd" = CMB_II_ANOVA$ANOVA$DFd[2] * CMB_II_Eps$HFe, "SSn" = CMB_II_ANOVA$ANOVA$SSn[2], "SSd" = CMB_II_ANOVA$ANOVA$SSd[2], "F" = CMB_II_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[2], 4), "ges" = CMB_II_ANOVA$ANOVA$ges[2])
# Combine both data frames into one:
CMB_RDII_ANOVA_Corr = rbind(CMB_RD_ANOVA_Corr, CMB_II_ANOVA_Corr)
# Format the results table so that significant p-values will be bolded:
CMB_RDII_ANOVA_Corr = CMB_RDII_ANOVA_Corr %>%
mutate(
p = text_spec(p, bold = (ifelse(p < .05, "TRUE", "FALSE")))
)
# Create a table to display the results:
kable(CMB_RDII_ANOVA_Corr, digits = 4,
caption = "Table 22. Tests of simple main effects on performance by block for each category set condition.",
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Condition | Effect | DFn | DFd | SSn | SSd | F | p | ges |
|---|---|---|---|---|---|---|---|---|
| RD | Block | 2.2850 | 566.6777 | 2.6390 | 3.7067 | 176.5669 | 0 | 0.1431 |
| II | Block | 2.7671 | 570.0177 | 0.5793 | 2.3931 | 49.8707 | 0 | 0.0699 |
The main effect of block was found to be significant for both rule-defined and information integration performance; F (2.29, 566.68) = 176.57, p < .001, \(\eta^2_{G}\) = 0.14 and F (2.77, 570.02) = 49.87, p < .001, \(\eta^2_{G}\) = 0.07, respectively. Note that these effect sizes are generalized eta squared (the ges value returned by ezANOVA) rather than partial eta squared.
The significant effect of block will be further assessed via post-hoc tests. Because our data displayed a violation of the sphericity assumption, we will use the Bonferroni adjustment.
The means that we will be comparing are presented below:
CBDescs = summarySE(data = CatLong, measurevar = "Performance",
groupvars = c("Cat", "Block"), conf.interval = .95)
# Create a table to display the results:
kable(CBDescs, digits = 4,
caption = "Table 23. Descriptives by category set and block.",
col.names = c("Category Set", "Block", "n", "M","SD", "SE", "CI"),
align = 'c') %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Category Set | Block | n | M | SD | SE | CI |
|---|---|---|---|---|---|---|
| RD | Block1 | 249 | 0.6675 | 0.1195 | 0.0076 | 0.0149 |
| RD | Block2 | 249 | 0.7748 | 0.1291 | 0.0082 | 0.0161 |
| RD | Block3 | 249 | 0.7876 | 0.1251 | 0.0079 | 0.0156 |
| RD | Block4 | 249 | 0.7935 | 0.1309 | 0.0083 | 0.0163 |
| II | Block1 | 207 | 0.6155 | 0.0740 | 0.0051 | 0.0101 |
| II | Block2 | 207 | 0.6586 | 0.0922 | 0.0064 | 0.0126 |
| II | Block3 | 207 | 0.6783 | 0.1026 | 0.0071 | 0.0141 |
| II | Block4 | 207 | 0.6820 | 0.1138 | 0.0079 | 0.0156 |
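The SE and CI columns can be reproduced from n and SD. The sketch below does so for the RD, Block1 row, assuming that summarySE() reports the standard error as SD / sqrt(n) and the CI as the half-width of a t-based 95% confidence interval; that assumption matches the tabled values to four decimals.

```r
# RD, Block1 row of the descriptives table.
n    <- 249
sd_x <- 0.1195

se <- sd_x / sqrt(n)                  # standard error of the mean
ci <- qt(0.975, df = n - 1) * se      # half-width of the 95% confidence interval

round(c(SE = se, CI = ci), 4)
```

The 95% interval for this cell is therefore the mean plus or minus the CI value, that is, roughly 0.6675 plus or minus 0.0149.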
RD Post-hoc Test
CMB_RD_Bon = pairwise.t.test(RD_Perf$Performance, RD_Perf$Block, p.adjust.method = "bonferroni", paired = T)
# Round the results to 4 decimal places:
CMB_RD_Bon = data.frame(round(CMB_RD_Bon$p.value, 4))
# Format the results table so that significant p-values will be bolded:
CMB_RD_BonB = CMB_RD_Bon %>%
mutate(
Block1 = text_spec(Block1, bold = (ifelse(Block1 < .05, "TRUE", "FALSE"))),
Block2 = text_spec(Block2, bold = (ifelse(Block2 < .05, "TRUE", "FALSE"))),
Block3 = text_spec(Block3, bold = (ifelse(Block3 < .05, "TRUE", "FALSE")))
)
# Add row labels:
CMB_RD_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))
CMB_RD_BonB = cbind(CMB_RD_BonL, CMB_RD_BonB)
# Create a table to display the results:
kable(CMB_RD_BonB, digits = 4,
caption = "Table 24. Post-hoc test of rule-defined performance across block.",
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Comparison | Block1 | Block2 | Block3 |
|---|---|---|---|
| Block2 | 0 | NA | NA |
| Block3 | 0 | 0.0531 | NA |
| Block4 | 0 | 0.0077 | 1 |
These results indicate that, with respect to the rule-defined task, block 1 performance (\(M_{RD1}\) = 0.67, \(SD_{RD1}\) = 0.12) was significantly lower than performance on blocks 2 (\(M_{RD2}\) = 0.77, \(SD_{RD2}\) = 0.13), 3 (\(M_{RD3}\) = 0.79, \(SD_{RD3}\) = 0.13), and 4 (\(M_{RD4}\) = 0.79, \(SD_{RD4}\) = 0.13; all ps < .001). Block 4 performance was also significantly higher than block 2 performance (p = 0.008). No other between-block performance differences were found to be statistically significant (ps > .05).
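To show what the Bonferroni adjustment does to paired comparisons across four blocks, here is a minimal sketch on simulated data. The data frame `toy` and all of its values are hypothetical. One assumption is worth stating loudly: with paired = TRUE, pairwise.t.test() pairs observations by position, so the rows within each block must be in the same subject order.

```r
set.seed(42)
n_subj   <- 30
baseline <- rnorm(n_subj, mean = 0.65, sd = 0.05)

# Simulated long-format data, sorted by block and then by subject, so that the
# i-th value in every block belongs to the same (hypothetical) participant.
toy <- data.frame(
  Subject     = factor(rep(seq_len(n_subj), times = 4)),
  Block       = factor(rep(paste0("Block", 1:4), each = n_subj)),
  Performance = c(baseline, baseline + 0.08, baseline + 0.10, baseline + 0.10) +
    rnorm(4 * n_subj, mean = 0, sd = 0.03)
)

p_raw  <- pairwise.t.test(toy$Performance, toy$Block,
                          p.adjust.method = "none", paired = TRUE)$p.value
p_bonf <- pairwise.t.test(toy$Performance, toy$Block,
                          p.adjust.method = "bonferroni", paired = TRUE)$p.value

# Four blocks give six comparisons, so each raw p-value is multiplied by 6
# and capped at 1.
all.equal(p_bonf, pmin(6 * p_raw, 1))
```

The returned matrix has the same lower-triangular layout as the post-hoc tables in this section, with NA in the cells above the diagonal.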
II Post-hoc Test
CMB_II_Bon = pairwise.t.test(II_Perf$Performance, II_Perf$Block, p.adjust.method = "bonferroni", paired = T)
# Round the results to 4 decimal places:
CMB_II_Bon = data.frame(round(CMB_II_Bon$p.value, 4))
# Format the results table so that significant p-values will be bolded:
CMB_II_BonB = CMB_II_Bon %>%
mutate(
Block1 = text_spec(Block1, bold = (ifelse(Block1 < .05, "TRUE", "FALSE"))),
Block2 = text_spec(Block2, bold = (ifelse(Block2 < .05, "TRUE", "FALSE"))),
Block3 = text_spec(Block3, bold = (ifelse(Block3 < .05, "TRUE", "FALSE")))
)
# Add row labels:
CMB_II_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))
CMB_II_BonB = cbind(CMB_II_BonL, CMB_II_BonB)
# Create a table to display the results:
kable(CMB_II_BonB, digits = 4,
caption = "Table 25. Post-hoc test of information integration performance across block.",
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Comparison | Block1 | Block2 | Block3 |
|---|---|---|---|
| Block2 | 0 | NA | NA |
| Block3 | 0 | 0.0036 | NA |
| Block4 | 0 | 0.0011 | 1 |
These results indicate that, with respect to the information integration task, block 1 performance (\(M_{II1}\) = 0.62, \(SD_{II1}\) = 0.07) was significantly lower than performance on blocks 2 (\(M_{II2}\) = 0.66, \(SD_{II2}\) = 0.09), 3 (\(M_{II3}\) = 0.68, \(SD_{II3}\) = 0.10), and 4 (\(M_{II4}\) = 0.68, \(SD_{II4}\) = 0.11; all ps < .001). Block 2 performance was also significantly lower than both block 3 (p = 0.004) and block 4 (p = 0.001) performance. No other between-block performance differences were found to be statistically significant (ps > .05).
Month Post-Hoc:
The significant main effect of month will be further assessed (though the test will essentially be identical to the one conducted in the Problem 2 section above). Because we are planning on completing all possible pairwise comparisons, we will use the Tukey HSD adjustment (as suggested by Maxwell & Delaney, 2003).
Main Effect of Month
CMB_M_Tuk = TukeyHSD(aov(CatLong$Performance ~ CatLong$Month))
# Extract the results table from the output:
CMB_M_Tuk = data.frame(CMB_M_Tuk$`CatLong$Month`)
# Round the results to 4 decimal places:
CMB_M_Tuk = round(CMB_M_Tuk, 4)
# Format the results table so that significant p-values will be bolded:
CMB_M_TukB = CMB_M_Tuk %>%
mutate(
p.adj = cell_spec(p.adj, bold = (ifelse(p.adj < .05, "TRUE", "FALSE")))
)
# Add row labels:
CMB_M_TukL = data.frame("Comparison" = c("Feb-Jan", "Mar-Jan", "Apr-Jan", "Sept-Jan", "Oct-Jan", "Nov-Jan", "Mar-Feb", "Apr-Feb", "Sept-Feb", "Oct-Feb", "Nov-Feb", "Apr-Mar", "Sept-Mar", "Oct-Mar", "Nov-Mar", "Sept-Apr", "Oct-Apr", "Nov-Apr", "Oct-Sept", "Nov-Sept", "Nov-Oct"))
CMB_M_TukB = cbind(CMB_M_TukL, CMB_M_TukB)
# Create a table to display the results:
kable(CMB_M_TukB, digits = 4,
caption = "Table 26. Post-hoc tests of performance by month.",
col.names = c("Comparison", "Difference", "Lower", "Upper", "p"),
align = 'c', escape = FALSE) %>%
kable_styling(bootstrap_options =
c("hover", "responsive", "striped"),
full_width = F, position = "center")
| Comparison | Difference | Lower | Upper | p |
|---|---|---|---|---|
| Feb-Jan | -0.0442 | -0.0822 | -0.0063 | 0.0107 |
| Mar-Jan | -0.0467 | -0.0797 | -0.0136 | 0.0006 |
| Apr-Jan | -0.0037 | -0.0358 | 0.0284 | 0.9999 |
| Sept-Jan | 0.0486 | 0.0184 | 0.0789 | 0 |
| Oct-Jan | -0.0007 | -0.0302 | 0.0287 | 1 |
| Nov-Jan | -0.0453 | -0.1005 | 0.0098 | 0.1885 |
| Mar-Feb | -0.0024 | -0.0404 | 0.0356 | 1 |
| Apr-Feb | 0.0405 | 0.0033 | 0.0777 | 0.0223 |
| Sept-Feb | 0.0929 | 0.0573 | 0.1284 | 0 |
| Oct-Feb | 0.0435 | 0.0086 | 0.0784 | 0.0045 |
| Nov-Feb | -0.0011 | -0.0593 | 0.0572 | 1 |
| Apr-Mar | 0.0429 | 0.0108 | 0.0750 | 0.0016 |
| Sept-Mar | 0.0953 | 0.0651 | 0.1255 | 0 |
| Oct-Mar | 0.0459 | 0.0164 | 0.0754 | 0.0001 |
| Nov-Mar | 0.0013 | -0.0538 | 0.0565 | 1 |
| Sept-Apr | 0.0524 | 0.0232 | 0.0816 | 0 |
| Oct-Apr | 0.0030 | -0.0254 | 0.0314 | 0.9999 |
| Nov-Apr | -0.0416 | -0.0962 | 0.0130 | 0.2705 |
| Oct-Sept | -0.0494 | -0.0757 | -0.0231 | 0 |
| Nov-Sept | -0.0939 | -0.1475 | -0.0404 | 0 |
| Nov-Oct | -0.0446 | -0.0977 | 0.0085 | 0.168 |
These results indicate that participants displayed significantly better performance during September (\(M_{Sept}\) = 0.76, \(SD_{Sept}\) = 0.09) than during every other month (\(M_{Jan}\) = 0.71, \(SD_{Jan}\) = 0.11, p < .001; \(M_{Feb}\) = 0.67, \(SD_{Feb}\) = 0.11, p < .001; \(M_{Mar}\) = 0.67, \(SD_{Mar}\) = 0.12, p < .001; \(M_{Apr}\) = 0.71, \(SD_{Apr}\) = 0.11, p < .001; \(M_{Oct}\) = 0.71, \(SD_{Oct}\) = 0.08, p < .001; \(M_{Nov}\) = 0.70, \(SD_{Nov}\) = 0.07, p < .001). Participants in February and March also performed significantly worse than participants in January (p = 0.01 and p = 0.001, respectively), April (p = 0.02 and p = 0.002, respectively), and October (p = 0.004 and p < .001, respectively).
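One possible safeguard for tables like the one above is to take the comparison labels from the TukeyHSD() output itself rather than typing them by hand, so the labels cannot drift out of step with the factor's level order. The sketch below uses simulated data for three months; `toy`, `tuk_tab`, and all values are hypothetical.

```r
set.seed(7)
months <- c("Jan", "Feb", "Sept")

# Simulated performance scores for three months (illustration only).
toy <- data.frame(
  Month       = factor(rep(months, each = 25), levels = months),
  Performance = c(rnorm(25, 0.71, 0.10), rnorm(25, 0.67, 0.10), rnorm(25, 0.76, 0.10))
)

tuk <- TukeyHSD(aov(Performance ~ Month, data = toy))

# The row names of the result already encode each comparison.
tuk_tab <- data.frame(Comparison = rownames(tuk$Month), round(tuk$Month, 4),
                      row.names = NULL)
tuk_tab
```

Each label names the later factor level first (for example "Feb-Jan"), so a negative difference means the first-named month had the lower mean.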