Introduction

This notebook outlines analyses conducted as part of a coding challenge in The Categorization and Cognitive Science Lab. The data considered below is from a series of studies in which participants were asked to complete four blocks of either a rule-defined (RD) or information integration (II) category learning task. Participants were also asked to provide responses to a number of situational and demographic-based items.

The following file contains all of the data that was collected.

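# stringsAsFactors = TRUE keeps text columns as factors (the default became
# FALSE in R 4.0.0); the subsetting, droplevels(), and level renaming below
# assume factor columns:

Data = read.csv("ModifiedFullData.csv", header = TRUE, stringsAsFactors = TRUE)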

Some notes regarding the names of variables:

Delete - an indication of whether the participant was to be included in the analysis (coded as “0”) or not (coded as “1”).
UniqueSubjNum - subject numbers that uniquely identify participants in the dataset.
SubjNum - subject numbers that identify participants within each independent experiment.
ExpName - the names of the seven experiments included in the data set.
Cat - an indication of the category set (rule-defined [RD] or information integration [II]) that participants were randomly assigned to learn.
Condition - the names of the conditions within each of the individual experiments.
SDtoday and FRtoday - dummy-coded versions of the Cat variable.
Time - the time of day that testing occurred.
Time2 - an ordinal version of the Time variable.
Date - the date of testing.
DayofWeek - an ordinal version of the Date variable.
Month - the month in which testing occurred.
NumSubs - the number of subjects tested during each testing session.
SameGeNder and SameGenWhat - an indication of whether participants who were tested in a group were of the same gender (“1”) or not (“0”) and, if so, what gender they were.
CellPhone and Internet - an indication of whether participants were (“1”) or were not (“0”) observed accessing either their cellphone or the internet during the testing session.
PaidPool - an indication of whether participants were recruited from the psychology subject pool or were, instead, paid participants.
Late - a measure of how many minutes late a participant was for their scheduled testing session.
SignUp -
Age - participants’ self-reported age in years.
Gender - participants’ self-reported gender.
NativeLang, SecondLang, and AdditionalLang - participants’ self-reported native, second, and/or additional language(s).
SecondProficiency - participants’ self-reported level of second language proficiency, scored from 0 (low proficiency) to 4 (high proficiency).
Bilingual - an indication of whether a participant identified as bilingual (“1”) or not (“0”).
AcademicYear - participants’ academic year of study.
ExamLastWeek - an indication of whether a participant had written an exam in the week previous to testing (“1”) or not (“0”).
ExamnextWeek - an indication of whether a participant would (“1”) or would not (“0”) be writing an exam in the week after testing.
BusyDay - an indication of whether the testing day was (“1”) or was not (“0”) a busy day for each participant.
ClassBefore and ClassAfter - an indication of whether a participant did (“1”) or did not (“0”) have a class prior to or after the testing session.
FirstExp and OtherExps - an indication of whether the study was (“1”) or was not (“0”) the first study in which a subject had participated and, if not, how many other studies they had previously participated in.
LastMealWhen and LastMealWhat - a measure of how many hours it had been since the participant last ate and what they had eaten at that time.
Breakfast and BreakfastWhat - an indication of whether a participant did (“1”) or did not (“0”) eat breakfast on the day of testing and, if so, what their breakfast consisted of.
Alcohol and DrinkPerWeek - an indication of whether a participant does (“1”) or does not (“0”) drink alcohol and, if so, how many drinks per week they typically consume.
CoffeeTea - an indication of whether a participant is (“1”) or is not (“0”) a regular coffee or tea drinker.
Exercise and ExerciseFreq - an indication of whether a participant does (“1”) or does not (“0”) exercise on a regular basis and, if so, how many times they typically exercise per week.
SleepAvg and SleepLastNight - a measure of the number of hours a participant typically sleeps for each night and the number of hours they slept for the night prior to testing.
Tired - a measure of self-reported tiredness, scored from 1 (not tired) to 7 (very tired).
ExpDifficulty - a measure of self-reported task difficulty, scored from 1 (easy) to 7 (difficult).
GiveUp - an indication of whether a participant did (“1”) or did not (“0”) give up during the study.
MostlyGuess - an indication of whether a participant did (“1”) or did not (“0”) report that they “mostly guessed” during the study.
X1_Block to X4_Block - proportion of items responded to correctly for blocks 1 to 4 of the category learning task.
Total - proportion of items responded to correctly across all four blocks of the category learning task.

Problem 1

The first set of analyses will involve the calculation of descriptive statistics and the production of some basic figures. For these purposes, we will focus primarily on the following variables:

Total - a continuous dependent variable (DV).
Cat - a nominal independent variable (IV).
Month, Time2, and DayofWeek - ordinal IVs. Note that the data collected in May came from an unrelated pilot study. Therefore, we will remove the May data from our data set.

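# Remove the May pilot data (despite its name, MayData contains every
# month except May):

MayData = subset(Data, Month != "05_May")

# Drop the now-empty "05_May" level from all factors:

CatData = droplevels(MayData)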

All subsequent analyses will be conducted on this CatData data set.
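
Why the droplevels() call matters can be sketched with a small toy factor. Apart from the "05_May" label, the month codes and scores below are made up for illustration; the point is that subset() keeps the empty "05_May" level, which would otherwise show up as an empty group in later tables and plots.

```r
# Toy data; only the "05_May" label comes from the notebook
toy <- data.frame(
  Month = factor(c("01_Jan", "05_May", "09_Sept", "05_May")),
  Total = c(0.71, 0.64, 0.78, 0.69)
)

no_may <- subset(toy, Month != "05_May")
levels(no_may$Month)   # "05_May" is still listed as a level
table(no_may$Month)    # ...with a count of zero

no_may <- droplevels(no_may)
levels(no_may$Month)   # only the months that still have data
```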

Analysis Prep

Load Libraries

The following libraries will be used for this analysis:

# For creating themed html files:

# install.packages("prettydoc")
library(prettydoc)

# For calculating descriptive statistics:

# install.packages("Rmisc")
library(Rmisc)

# For formatting tables:

# install.packages("knitr")
library(knitr)

# install.packages ("kableExtra")
library(kableExtra)

# For using pipes and plotting performance with ggplot2:

# install.packages("tidyverse")
library(tidyverse)

Rename Factor Levels

Before we begin, we’ll rename and reorder the levels of the variables we’ll be using.

# Reorder the Cat variable:

CatData$Cat = factor(CatData$Cat, levels = c("RD", "II"))

# Rename the Month variable:

levels(CatData$Month) = c("Jan", "Feb", "Mar", "Apr", "Sept", "Oct", "Nov")

# Rename the Time2 variable:

levels(CatData$Time2) = c("Morning", "Early Afternoon", "Late Afternoon")

# Rename the DayofWeek variable:

levels(CatData$DayofWeek) = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
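
Note that levels<- renames levels by position, in the order R already stores them (sorted, for factors created by read.csv()), so the new labels must be listed in that same order. A toy illustration; raw codes such as "01_Jan" are hypothetical, since only "05_May" appears in the code above. The factor(levels =, labels =) form pairs each old code with its new label explicitly.

```r
# Hypothetical raw month codes
month <- factor(c("09_Sept", "01_Jan", "10_Oct", "01_Jan"))
levels(month)  # stored sorted: "01_Jan" "09_Sept" "10_Oct"

# levels<- renames by position, so labels must follow the stored order
renamed <- month
levels(renamed) <- c("Jan", "Sept", "Oct")

# An explicit mapping from each old code to its new label
relabelled <- factor(month,
                     levels = c("01_Jan", "09_Sept", "10_Oct"),
                     labels = c("Jan", "Sept", "Oct"))
all(renamed == relabelled)
```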

Basic Descriptive Statistics

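We’ll start by calculating some basic descriptive statistics (ns, Ms, SDs, SEs, and 95% CIs) for the DV across levels of each of the IVs. In the tables below, the CI column gives the half-width of the 95% confidence interval, so the interval runs from M - CI to M + CI.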
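
For reference, the SE and CI columns can be reproduced by hand: SE = SD / √n, and the CI half-width is SE × t(.975, n - 1), the same formulas Rmisc's summarySE() uses. The sketch below uses made-up scores and a hypothetical helper, describe(), and then checks the formulas against the RD row of Table 1 below.

```r
# Toy data (made-up scores) to show the calculations
toy <- data.frame(
  Cat   = factor(c("RD", "RD", "RD", "II", "II", "II", "II")),
  Total = c(0.80, 0.72, 0.77, 0.66, 0.61, 0.70, 0.64)
)

# n, M, SD, SE, and the CI half-width for one group
describe <- function(x, conf = .95) {
  n <- length(x)
  s <- sd(x)
  se <- s / sqrt(n)
  c(n = n, M = mean(x), SD = s, SE = se,
    CI = se * qt(conf / 2 + .5, df = n - 1))
}

# One row per category set
desc <- t(sapply(split(toy$Total, toy$Cat), describe))
desc

# Check against the RD row of Table 1 (n = 249, SD = 0.1015)
se_rd <- 0.1015 / sqrt(249)
round(c(SE = se_rd, CI = se_rd * qt(.975, df = 248)), 4)  # SE 0.0064, CI 0.0127
```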

Category Set

# Calculate summary statistics:

CatDescs = summarySE(data = CatData, measurevar = "Total",
                     groupvars = "Cat", conf.interval = .95)

# Create a table to display the results:
                  
kable(CatDescs, digits = 4,
      caption = "Table 1. Descriptives by category set.",
      col.names = c("Category Set", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 1. Descriptives by category set.
Category Set	n	M	SD	SE	CI
RD	249	0.7559	0.1015	0.0064	0.0127
II	207	0.6586	0.0804	0.0056	0.0110

Month

# Calculate summary statistics:

MonthDescs = summarySE(data = CatData, measurevar = "Total",
                       groupvars = "Month", conf.interval = .95)

# Create a table to display the results:

kable(MonthDescs, digits = 4,
      caption = "Table 2. Descriptives by month.",
      col.names = c("Month", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 2. Descriptives by month.
Month	n	M	SD	SE	CI
Jan	64	0.7141	0.1082	0.0135	0.0270
Feb	39	0.6698	0.1077	0.0172	0.0349
Mar	64	0.6674	0.1199	0.0150	0.0299
Apr	72	0.7103	0.1121	0.0132	0.0263
Sept	95	0.7627	0.0871	0.0089	0.0177
Oct	108	0.7098	0.0841	0.0081	0.0161
Nov	14	0.6965	0.0719	0.0192	0.0415

Time of Day

# Calculate summary statistics:

TimeDescs = summarySE(data = CatData, measurevar = "Total",
                      groupvars = "Time2", conf.interval = .95)

# Create a table to display the results:

kable(TimeDescs, digits = 4,
      caption = "Table 3. Descriptives by time of day.",
      col.names = c("Time", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 3. Descriptives by time of day.
Time	n	M	SD	SE	CI
Morning	151	0.7108	0.1022	0.0083	0.0164
Early Afternoon	170	0.7040	0.1057	0.0081	0.0160
Late Afternoon	135	0.7224	0.1051	0.0090	0.0179

Day of Week

# Calculate summary statistics:

DayDescs = summarySE(data = CatData, measurevar = "Total",
                     groupvars = "DayofWeek", conf.interval = .95)

# Create a table to display the results:

kable(DayDescs, digits = 4,
      caption = "Table 4. Descriptives by day.",
      col.names = c("Day", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 4. Descriptives by day.
Day	n	M	SD	SE	CI
Mon	66	0.7419	0.0896	0.0110	0.0220
Tues	115	0.7243	0.1028	0.0096	0.0190
Wed	82	0.7156	0.0995	0.0110	0.0219
Thurs	127	0.6856	0.1026	0.0091	0.0180
Fri	53	0.7010	0.1191	0.0164	0.0328
Sat	13	0.7217	0.1263	0.0350	0.0763

Complex Descriptive Statistics

Next we’ll calculate descriptive statistics for the DV across levels of the Month, Time2, and DayofWeek variables crossed with the Cat variable.

Month by Category Set

# Calculate summary statistics:

CMDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "Month"), conf.interval = .95)

# Create a table to display the results:

kable(CMDescs, digits = 4,
      caption = "Table 5. Descriptives by category set and month.",
      col.names = c("Category Set", "Month", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 5. Descriptives by category set and month.
Category Set	Month	n	M	SD	SE	CI
RD	Jan	25	0.7771	0.1099	0.0220	0.0454
RD	Feb	15	0.7253	0.1396	0.0360	0.0773
RD	Mar	33	0.7216	0.1272	0.0221	0.0451
RD	Apr	41	0.7534	0.1139	0.0178	0.0360
RD	Sept	68	0.7875	0.0850	0.0103	0.0206
RD	Oct	63	0.7411	0.0721	0.0091	0.0182
RD	Nov	4	0.7417	0.0319	0.0159	0.0507
II	Jan	39	0.6736	0.0865	0.0138	0.0280
II	Feb	24	0.6352	0.0635	0.0130	0.0268
II	Mar	31	0.6098	0.0791	0.0142	0.0290
II	Apr	31	0.6534	0.0808	0.0145	0.0296
II	Sept	27	0.7001	0.0557	0.0107	0.0220
II	Oct	45	0.6659	0.0807	0.0120	0.0242
II	Nov	10	0.6784	0.0765	0.0242	0.0548

Time of Day by Category Set

# Calculate summary statistics:

CTDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "Time2"), conf.interval = .95)

# Create a table to display the results:

kable(CTDescs, digits = 4,
      caption = "Table 6. Descriptives by category set and time of day.",
      col.names = c("Category Set", "Time", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 6. Descriptives by category set and time of day.
Category Set	Time	n	M	SD	SE	CI
RD	Morning	76	0.7639	0.0916	0.0105	0.0209
RD	Early Afternoon	87	0.7482	0.1058	0.0113	0.0225
RD	Late Afternoon	86	0.7565	0.1060	0.0114	0.0227
II	Morning	75	0.6570	0.0825	0.0095	0.0190
II	Early Afternoon	83	0.6577	0.0839	0.0092	0.0183
II	Late Afternoon	49	0.6626	0.0720	0.0103	0.0207

Day of Week by Category Set

# Calculate summary statistics:

CDDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "DayofWeek"), 
                    conf.interval = .95)

# Create a table to display the results:

kable(CDDescs, digits = 4,
      caption = "Table 7. Descriptives by category set and day.",
      col.names = c("Category Set", "Day", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 7. Descriptives by category set and day.
Category Set	Day	n	M	SD	SE	CI
RD	Mon	43	0.7761	0.0859	0.0131	0.0264
RD	Tues	65	0.7750	0.0891	0.0110	0.0221
RD	Wed	46	0.7550	0.1015	0.0150	0.0302
RD	Thurs	63	0.7276	0.0996	0.0126	0.0251
RD	Fri	26	0.7369	0.1307	0.0256	0.0528
RD	Sat	6	0.7901	0.1580	0.0645	0.1658
II	Mon	23	0.6781	0.0558	0.0116	0.0241
II	Tues	50	0.6583	0.0799	0.0113	0.0227
II	Wed	36	0.6652	0.0707	0.0118	0.0239
II	Thurs	64	0.6443	0.0882	0.0110	0.0220
II	Fri	27	0.6664	0.0969	0.0187	0.0383
II	Sat	7	0.6630	0.0489	0.0185	0.0453

Plotting Performance

Now we’ll create some plots so that we can visualize the data.

Month by Category Set

We’ll begin with a basic bar plot. (Bar plots may be less useful for visualizing data ranges and distributions than violin plots; however, when a variable has as many levels as the Month variable does, the violin plot becomes crowded and horizontally compressed.)

The following code will produce a bar plot with “Month” on the x-axis, “Proportion Correct” on the y-axis, and separate bars for each of the conditions. Error bars represent SEs.

First we’ll define the values to be used for the plot.

# Calculate SE min:

CMMin = data.frame("CMMin" = CMDescs$Total - CMDescs$se)

# Calculate SE max:

CMMax = data.frame("CMMax" = CMDescs$Total + CMDescs$se)

# Create data frame of values to be used:

CMData = data.frame("Cat" = CMDescs$Cat,
                    "Month" = CMDescs$Month,
                    "CMMean" = CMDescs$Total,
                    "CMMin" = CMMin,
                    "CMMax" = CMMax)

Next, we’ll plot the data.

ggplot(CMData, aes(Month, CMMean, fill = Cat )) +
  geom_col(color = "black", position = "dodge", alpha = .7) +
  
  
  # Add in error bars:
  
  geom_errorbar(aes(ymin = CMMin, ymax = CMMax),
    color = "black", 
    position = position_dodge(width = 0.9),
    width = .1) +
  
  # Add labels:
  
  labs(x = "Month", y = "Proportion Correct",
       fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Month") +
  
  # Set the y-axis limits:
  
  ylim(0, 1) + 
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light()

Time of Day by Category Set

The Time2 variable has only 3 levels. It is an ideal candidate, therefore, for a violin plot.

The following code will produce a split violin plot with “Time of Day” on the x-axis, “Proportion Correct” on the y-axis, and separate data clouds for each of the conditions. Dots and lines represent means and 95% CIs, respectively.

We’ll begin by defining a function that will create split violin plots. The code below was taken from DeBruine (2018).

GeomSplitViolin <- ggproto(
  "GeomSplitViolin", 
  GeomViolin, 
  draw_group = function(self, data, ..., draw_quantiles = NULL) {
    data <- transform(data, 
                      xminv = x - violinwidth * (x - xmin), 
                      xmaxv = x + violinwidth * (xmax - x))
    grp <- data[1,'group']
    newdata <- plyr::arrange(
      transform(data, x = if(grp%%2==1) xminv else xmaxv), 
      if(grp%%2==1) y else -y
    )
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ],
                     newdata[1, ])
    newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1,
                                                                      'x']) 
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
      stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
      quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                           draw_quantiles)
      aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
                                                          c("x", "y")),
                         drop = FALSE]
      aesthetics$alpha <- rep(1, nrow(quantiles))
      both <- cbind(quantiles, aesthetics)
      quantile_grob <- GeomPath$draw_panel(both, ...)
      ggplot2:::ggname("geom_split_violin", 
                       grid::grobTree(GeomPolygon$draw_panel(newdata, ...),
                                      quantile_grob))
    } else {
      ggplot2:::ggname("geom_split_violin",
                       GeomPolygon$draw_panel(newdata, ...))
    }
  }
)

geom_split_violin <- function (mapping = NULL, 
                               data = NULL, 
                               stat = "ydensity", 
                               position = "identity", ..., 
                               draw_quantiles = NULL, 
                               trim = TRUE, 
                               scale = "area", 
                               na.rm = FALSE, 
                               show.legend = NA, 
                               inherit.aes = TRUE) {
  layer(data = data, 
        mapping = mapping, 
        stat = stat, 
        geom = GeomSplitViolin, 
        position = position, 
        show.legend = show.legend, 
        inherit.aes = inherit.aes, 
        params = list(trim = trim, 
                      scale = scale, 
                      draw_quantiles = draw_quantiles, 
                      na.rm = na.rm, ...)
        )
}
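
The key step in GeomSplitViolin is the choice of which edge each group is drawn on. The stripped-down function below, split_half_x() (a hypothetical helper that mirrors the transform() and `grp %% 2` step above, not part of DeBruine's code), shows it with toy numbers. Odd-numbered groups take the left edge, so their half-violin is drawn to the left of the x position, and even-numbered groups take the right edge.

```r
# Mirrors the x-coordinate logic in GeomSplitViolin's draw_group()
split_half_x <- function(x, xmin, xmax, violinwidth, group) {
  xminv <- x - violinwidth * (x - xmin)  # left edge of the full violin
  xmaxv <- x + violinwidth * (xmax - x)  # right edge of the full violin
  if (group %% 2 == 1) xminv else xmaxv
}

# One x position (x = 1) spanning 0.55 to 1.45, with scaled densities
# of 0.2 and 1 at two y values (toy numbers)
split_half_x(1, 0.55, 1.45, violinwidth = c(0.2, 1), group = 1)  # 0.91 0.55
split_half_x(1, 0.55, 1.45, violinwidth = c(0.2, 1), group = 2)  # 1.09 1.45
```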

Next, we’ll define the values to be used for the dots and lines in the plot.

# Calculate CI min:

CTMin = data.frame("CTMin" = CTDescs$Total - CTDescs$ci)

# Calculate CI max:

CTMax = data.frame("CTMax" = CTDescs$Total + CTDescs$ci)

# Create data frame of values to be used:

CTData = data.frame("Cat" = CTDescs$Cat, "Time2" = CTDescs$Time2,
                    "CTMean" = CTDescs$Total, "CTMin" = CTMin,
                    "CTMax" = CTMax)

Finally, we’ll plot the data.

CatData %>%
  ggplot(aes(Time2, Total, fill = Cat)) +
  geom_split_violin(color="black", trim=FALSE, alpha = 0.7) +

  # Add in dots and lines:
  
  geom_pointrange(data = CTData,
    aes(Time2, CTMean, ymin = CTMin, ymax = CTMax),
    color = "black", 
    shape = 20,
    position = position_dodge(width = 0.25)) +
  
  # Add labels:
  
  labs(x = "Time of Day", y = "Proportion Correct",
       fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Time of Day") +
  
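  # Set the visible y-axis range. coord_cartesian() zooms in without
  # discarding data, whereas ylim() would drop any observations outside
  # 0.3 to 1 before the densities are estimated:
  
  coord_cartesian(ylim = c(0.3, 1)) + 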
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light()

Day of Week by Category Set

Just for fun, we’ll create a notched box plot to display the last combination of variables.

The following code will produce a notched box plot with “Day” on the x-axis, “Proportion Correct” on the y-axis, and separate boxes for each of the conditions. The notches show an approximate 95% CI around each median (median ± 1.58 × IQR / √n). (The horn-like features on the last two boxes indicate that the notch extends beyond the box hinges, that is, beyond the interquartile range, which happens when n is small.)

Plot the data.

ggplot(CatData, aes(x = DayofWeek, y = Total, fill = Cat)) +
  geom_boxplot(outlier.color = "black",
               outlier.shape = 16, outlier.size = 2,
               notch = TRUE, position = position_dodge(1), alpha = .7) +

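  # Add labels:
  
  labs(x = "Day", y = "Proportion Correct", fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Day") +
  
  # Set the visible y-axis range. coord_cartesian() zooms in without
  # discarding data, whereas ylim() would drop any scores below 0.4
  # before the box plot statistics are computed:
  
  coord_cartesian(ylim = c(0.4, 1)) + 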
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light()
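
The notch limits can be reproduced by hand. The sketch below uses a hypothetical helper, notch_limits(), that follows the formula ggplot2's box plot statistics use (median ± 1.58 × IQR / √n, with quartiles from quantile()). It applies the formula to made-up scores for a group of six, about the size of the Saturday groups (n = 6 and 7 in Table 7). With so few scores the notch reaches past both hinges, which produces the “horns.”

```r
# Notch limits: median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  q <- quantile(x, c(.25, .5, .75), names = FALSE)
  half <- 1.58 * (q[3] - q[1]) / sqrt(length(x))
  c(lower_hinge = q[1], lower_notch = q[2] - half,
    upper_notch = q[2] + half, upper_hinge = q[3])
}

# Made-up scores for a small group (n = 6)
toy_scores <- c(0.50, 0.70, 0.80, 0.85, 0.90, 0.95)
round(notch_limits(toy_scores), 4)
# lower_hinge 0.7250, lower_notch 0.7202, upper_notch 0.9298, upper_hinge 0.8875
```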

Problem 2

The second set of analyses will involve running ANOVAs to assess the potential effects of the Cat variable, crossed with both Month and Time2, on overall category learning performance.

Analysis Prep

Load Libraries

The following libraries will be used for this analysis:

# For running Levene's test:

# install.packages("car")
library(car)

# For performing ANOVAs:

# install.packages("ez")
library(ez)

# For conducting Games-Howell post-hocs:

# install.packages("userfriendlyscience")
library(userfriendlyscience)

Adjust Display Options

The ezANOVA output contains very small values, which R prints in scientific notation by default. To make our ANOVA outputs easier to read, we’ll turn scientific notation off.

options(scipen = 999)

p Value Rounding Function

We’ll also create a function to format p values for reporting in the text. If p >= .005, the function will return “=” followed by the value rounded to two decimal places. If .0005 <= p < .005, it will return “=” followed by the value rounded to three decimal places. If p < .0005, it will return “< .001”.

p_round <- function(x) {
  if (x > .005) {
    x1 <- paste("= ", round(x, digits = 2), sep = "")
  } else if (x == .005) {
    # Handle the boundary explicitly rather than relying on round()
    x1 <- "= .01"
  } else if (x > .0005 & x < .005) {
    x1 <- paste("= ", round(x, digits = 3), sep = "")
  } else if (x == .0005) {
    x1 <- "= .001"
  } else {
    x1 <- "< .001"
  }
  x1
}

Partial Eta Squared

In some cases, we will have to use adjusted dfs and/or perform White-adjusted ANOVAs, for which effect sizes are not provided in the ezANOVA output, so we will have to calculate them ourselves. Partial eta squared can be calculated from F and its degrees of freedom using the following formula, which we will create a function for: \[\eta^2_{partial} = \frac{df_n F}{df_n F + df_d}\]

peta <- function(dfn, dfd, f) {
  return(dfn * f / ((dfn * f) + dfd))
}

Post-Hoc Rounding Function

We’ll also create a function to help round some of our post-hoc results. (Neither the kable rounding function nor the standard round() function works for some of our post-hoc results tables; round(), for example, stops with an error if any column of a data frame is non-numeric.) The code below was taken from Akhmed (2015).

round_df <- function(df, digits) {
  nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))

  df[,nums] <- round(df[,nums], digits = digits)

  (df)
}
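
The problem round_df() works around can be seen on a small made-up table: round() on a whole data frame stops with an error as soon as one column is non-numeric. An equivalent base R idiom, shown here as an alternative rather than the notebook’s approach, rounds only the numeric columns in place.

```r
# A made-up table with one character column
toy_post <- data.frame(contrast = c("A-B", "C-B"),
                       diff = c(-0.123456, 0.045678),
                       p = c(0.0000123, 0.0501234))

# round() on the whole data frame fails because of the character column
inherits(try(round(toy_post, 4), silent = TRUE), "try-error")  # TRUE

# Round only the numeric columns, leaving the others untouched
rounded <- toy_post
rounded[] <- lapply(toy_post, function(col) {
  if (is.numeric(col)) round(col, 4) else col
})
rounded
```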

Category Set and Month

To assess the effects of category set and month on overall learning performance, we’ll conduct a 2 x 7 ANOVA with both Cat and Month as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares. Note that a White adjustment has been used; this will be discussed further in the Homogeneity of Variance section below.
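
Type III sums of squares test each effect after adjusting for all the others, including the interaction, and they are only meaningful when the factors use sum-to-zero (effect) coding. As a hedged cross-check that needs only base R, the sketch below fits a deliberately unbalanced 2 x 3 design to simulated data and uses drop1() to remove each term in turn from the full model. It is not the notebook’s analysis and does not apply the White adjustment.

```r
# Simulated, deliberately unbalanced 2 x 3 between-subjects data
set.seed(42)
toy <- data.frame(
  Cat  = factor(rep(c("RD", "II"), times = c(40, 25))),
  Time = factor(rep(c("Morning", "Early", "Late"), length.out = 65))
)
toy$Total <- ifelse(toy$Cat == "RD", 0.75, 0.66) + rnorm(65, sd = 0.09)

# Type III tests need sum-to-zero contrasts for every factor
fit <- lm(Total ~ Cat * Time, data = toy,
          contrasts = list(Cat = "contr.sum", Time = "contr.sum"))

# Drop each term in turn from the full model, keeping all the others
# (including the interaction): this gives Type III F tests
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3
```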

Assumptions

The standard ANOVA makes three primary assumptions:

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

This assumption can be tested using a Shapiro-Wilk test.

CM_Shap = shapiro.test(CatData$Total)
CM_Shap

## 
##  Shapiro-Wilk normality test
## 
## data:  CatData$Total
## W = 0.97308, p-value = 0.0000001922

Based on an alpha level of .05, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).
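
Strictly speaking, the ANOVA normality assumption concerns the residuals (scores minus their cell mean) rather than the pooled DV. A pooled test mixes groups whose means differ, so it answers a slightly different question. A base R sketch of the residual-based check on simulated data (means and SDs loosely follow Table 1 but are otherwise invented) follows. For the real model, shapiro.test(residuals(CM_ANOVA$aov)) should work because return_aov = TRUE is set, though that call is not run here.

```r
# Simulated scores for two groups with different means (toy data)
set.seed(1)
toy <- data.frame(
  Cat   = factor(rep(c("RD", "II"), times = c(249, 207))),
  Total = c(rnorm(249, mean = 0.76, sd = 0.10),
            rnorm(207, mean = 0.66, sd = 0.08))
)

# Test on the pooled DV, as above
sw_pooled <- shapiro.test(toy$Total)

# Test on the residuals, i.e., scores minus their group mean
fit <- aov(Total ~ Cat, data = toy)
sw_resid <- shapiro.test(residuals(fit))

c(pooled = sw_pooled$p.value, residuals = sw_resid$p.value)
```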

3. Homogeneity of Variance

This assumption can be tested using a Levene’s test (provided as part of the ANOVA output).

# Run the ANOVA:

CM_ANOVA = ezANOVA(data = CatData, dv = .(Total),
                   wid = .(UniqueSubjNum), between = .(Cat, Month),
                   detailed = TRUE, type = "III",
                   white.adjust = TRUE, return_aov = TRUE)

# Extract the Levene's Test from the ANOVA output:

CM_Lev = CM_ANOVA$`Levene's Test for Homogeneity of Variance`

# Create a table to display the results:

kable(CM_Lev, digits = 4,
      caption = "Table 8. Month by category set Levene's test.",
      col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 8. Month by category set Levene’s test.
DFn	DFd	SSn	SSd	F	p	sig
13	442	0.1415	1.7921	2.684	0.0012

Based on an alpha level of .05, the assumption of homogeneity of variances is not met; F (13, 442) = 2.68, p = 0.001. Because the sample sizes are also unequal, which makes the standard F test more sensitive to this violation, a White adjustment (a heteroscedasticity correction) is used.
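
Levene’s test is itself a small ANOVA. In the classic, mean-centred form sketched below (simulated data; levene_mean() is a hypothetical helper), each score is replaced by its absolute distance from its own cell mean, and those distances are compared across cells. The cells here are the Cat × Month combinations, which is why Table 8 has DFn = 14 - 1 = 13 and DFd = 456 - 14 = 442. Implementations differ in the centring; car::leveneTest(), for example, centres on the median by default.

```r
# Classic (mean-centred) Levene's test: a one-way ANOVA on each score's
# absolute deviation from its own cell mean
levene_mean <- function(y, group) {
  group <- droplevels(as.factor(group))
  abs_dev <- abs(y - ave(y, group))  # ave() returns each score's cell mean
  tab <- anova(lm(abs_dev ~ group))
  c(DFn = tab$Df[1], DFd = tab$Df[2],
    F = tab$`F value`[1], p = tab$`Pr(>F)`[1])
}

# Toy 2 x 2 between-subjects design with unequal n and unequal spreads
set.seed(7)
cat_set <- rep(c("RD", "II"), times = c(30, 26))
month   <- rep(c("Sept", "Oct"), length.out = 56)
score   <- rnorm(56, mean = 0.7, sd = ifelse(cat_set == "RD", 0.12, 0.06))

levene_mean(score, interaction(cat_set, month))
```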

ANOVA

kable(CM_ANOVA$ANOVA, digits = 4,
      caption = "Table 9. Month by category set ANOVA.",
      col.names = c("Effect", "DFn", "DFd", "F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 9. Month by category set ANOVA.
Effect	DFn	DFd	F	p
(Intercept)	1	442	19570.0933	0.0000
Cat	1	442	80.1292	0.0000
Month	6	442	5.8585	0.0000
Cat:Month	6	442	0.4544	0.8418

Because we’ve used a White adjustment, effect sizes are not provided in the output. Instead, we will use our peta function to calculate partial eta squared effect sizes.

# Calculate values:

CMC_peta = peta(dfn = CM_ANOVA$ANOVA[2,]$DFn, dfd = CM_ANOVA$ANOVA[2,]$DFd, f = CM_ANOVA$ANOVA[2,]$F)
  
CMM_peta = peta(dfn = CM_ANOVA$ANOVA[3,]$DFn, dfd = CM_ANOVA$ANOVA[3,]$DFd, f = CM_ANOVA$ANOVA[3,]$F)

CMI_peta = peta(dfn = CM_ANOVA$ANOVA[4,]$DFn, dfd = CM_ANOVA$ANOVA[4,]$DFd, f = CM_ANOVA$ANOVA[4,]$F)

# Create a data frame of the results:

petas = data.frame("Cat" = CMC_peta, "Month" = CMM_peta,
                         "Cat*Month" = CMI_peta)

# Create a table to display the results:

kable(petas, digits = 4,
      caption = "Table 10. Month by category set effect sizes.",
      col.names = c("Category Set", "Month", "Interaction"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 10. Month by category set effect sizes.
Category Set	Month	Interaction
0.1535	0.0737	0.0061
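
As a check on the peta() formula, plugging the rounded F values and dfs from Table 9 into it reproduces Table 10. The function is restated so the snippet runs on its own.

```r
# Partial eta squared from F and its degrees of freedom
peta <- function(dfn, dfd, f) {
  dfn * f / (dfn * f + dfd)
}

# F values and dfs copied from Table 9
round(c(Cat       = peta(1, 442, 80.1292),
        Month     = peta(6, 442, 5.8585),
        Cat_Month = peta(6, 442, 0.4544)), 4)
# Cat 0.1535, Month 0.0737, Cat_Month 0.0061
```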

Interpretation of Main Effects

Interaction

The category set x month interaction was not found to be statistically significant; F (6, 442) = 0.45, p = 0.84, \(\eta^2_{p}\) = 0.01.

Main Effect of Category Set

The main effect of category set was found to be statistically significant; F (1, 442) = 80.13, p < .001, \(\eta^2_{p}\) = 0.15.

Main Effect of Month

The main effect of month was found to be statistically significant; F (6, 442) = 5.86, p < .001, \(\eta^2_{p}\) = 0.07.

Post-hoc Tests

The significant main effects of category set and month will be further assessed via post-hoc tests. Because we have unequal sample sizes and our data displayed a violation of the homogeneity of variances assumption, we will use the Games-Howell adjustment.
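
For reference, a single Games-Howell comparison can be sketched in base R. It uses a Welch t statistic and Welch-Satterthwaite df for the pair, with the p value taken from the studentized range distribution for k groups. This is a hand-rolled sketch on made-up scores (games_howell_pair() is a hypothetical helper), not userfriendlyscience’s code. With k = 2, the Games-Howell p value coincides with an ordinary Welch t test, which is consistent with the fractional df (452.94) in Table 11.

```r
# Games-Howell comparison for one pair of groups, out of k groups in total
games_howell_pair <- function(a, b, k) {
  va <- var(a) / length(a)
  vb <- var(b) / length(b)
  t  <- (mean(a) - mean(b)) / sqrt(va + vb)
  # Welch-Satterthwaite degrees of freedom
  df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
  # p value from the studentized range distribution with k means
  p  <- ptukey(abs(t) * sqrt(2), nmeans = k, df = df, lower.tail = FALSE)
  c(diff = mean(a) - mean(b), t = t, df = df, p = p)
}

# Made-up scores for two groups
rd <- c(0.70, 0.75, 0.80, 0.72, 0.78, 0.74)
ii <- c(0.68, 0.73, 0.71, 0.77, 0.66, 0.79, 0.70)

gh <- games_howell_pair(rd, ii, k = 2)
welch <- t.test(rd, ii)  # Welch test is the default (var.equal = FALSE)
c(games_howell = unname(gh["p"]), welch = welch$p.value)
```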

Main Effect of Category Set

# Calculate post-hoc:

C_post = posthocTGH(y = CatData$Total, x = CatData$Cat,
                    method = c("games-howell"), conf.level = .95,
                    digits = 4, formatPvalue = TRUE)

# Round results:

C_post_r = round_df(C_post$output$games.howell, digits = 4)

# Format the results table so that significant p-values will be bolded:

C_post_r$p = cell_spec(C_post_r$p, bold = C_post_r$p < .05)

# Create a table to display the results:

kable(C_post_r, digits = 4,
      caption = "Table 11. Category set post-hoc.",
      col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 11. Category set post-hoc.
	Difference	CI Min	CI Max	t	df	p
II-RD	-0.0973	-0.114	-0.0805	11.4169	452.941	0

These results indicate that participants in the RD condition (M_RD = 0.76, SD_RD = 0.10) performed significantly better on the category learning task than participants in the II condition (M_II = 0.66, SD_II = 0.08); t (452.94) = 11.42, p < .001.

Main Effect of Month

# Calculate post-hoc: 

M_post = posthocTGH(y = CatData$Total, x = CatData$Month,
                    method = c("games-howell"), conf.level = .95,
                    digits = 4, formatPvalue = TRUE)

# Round results:

M_post_r = round_df(M_post$output$games.howell, digits = 4)

# Format the results table so that significant p-values will be bolded:

M_post_r$p = cell_spec(M_post_r$p, bold = M_post_r$p < .05)

# Create a table to display the results:

kable(M_post_r, digits = 4,
      caption = "Table 12. Month post-hoc.",
      col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"), 
                full_width = F, position = "center")

Table 12. Month post-hoc.
	Difference	CI Min	CI Max	t	df	p
Feb-Jan	-0.0442	-0.1105	0.0220	2.0190	80.7115	0.411
Mar-Jan	-0.0467	-0.1071	0.0138	2.3118	124.6992	0.2467
Apr-Jan	-0.0037	-0.0603	0.0529	0.1976	133.0743	1
Sept-Jan	0.0486	0.0000	0.0972	2.9994	115.2892	0.0502
Oct-Jan	-0.0043	-0.0517	0.0431	0.2730	108.0980	1
Nov-Jan	-0.0175	-0.0922	0.0571	0.7467	27.6581	0.9881
Mar-Feb	-0.0024	-0.0714	0.0665	0.1059	87.1016	1
Apr-Feb	0.0405	-0.0252	0.1062	1.8650	80.7967	0.5091
Sept-Feb	0.0928	0.0336	0.1521	4.7817	59.4204	0.0002
Oct-Feb	0.0399	-0.0183	0.0982	2.0966	55.6527	0.3689
Nov-Feb	0.0267	-0.0541	0.1074	1.0338	34.6614	0.9424
Apr-Mar	0.0429	-0.0169	0.1027	2.1491	129.5460	0.3307
Sept-Mar	0.0953	0.0428	0.1477	5.4615	106.7338	0
Oct-Mar	0.0424	-0.0089	0.0936	2.4870	100.1513	0.1752
Nov-Mar	0.0291	-0.0476	0.1058	1.1946	31.2206	0.8909
Sept-Apr	0.0523	0.0046	0.1001	3.2828	130.2505	0.0219
Oct-Apr	-0.0006	-0.0470	0.0459	0.0367	122.8858	1
Nov-Apr	-0.0138	-0.0880	0.0603	0.5923	27.0723	0.9965
Oct-Sept	-0.0529	-0.0888	-0.0170	4.3884	195.7911	0.0004
Nov-Sept	-0.0662	-0.1357	0.0034	3.1216	19.1030	0.0688
Nov-Oct	-0.0132	-0.0822	0.0557	0.6351	17.9562	0.9946

These results indicate that participants displayed significantly better performance during September (M_Sept = 0.76, SD_Sept = 0.09) than during February (M_Feb = 0.67, SD_Feb = 0.11), March (M_Mar = 0.67, SD_Mar = 0.12), April (M_Apr = 0.71, SD_Apr = 0.11), and October (M_Oct = 0.71, SD_Oct = 0.08); t (59.42) = 4.78, p < .001; t (106.73) = 5.46, p < .001; t (130.25) = 3.28, p = 0.02; and t (195.79) = 4.39, p < .001, respectively. Participants in September also performed marginally better than participants in January (M_Jan = 0.71, SD_Jan = 0.11) and November (M_Nov = 0.70, SD_Nov = 0.07); t (115.29) = 3.00, p = 0.05 and t (19.10) = 3.12, p = 0.07, respectively.

Category Set and Time of Day

To assess the effects of category set and time of day on overall learning performance, we’ll conduct a 2 x 3 ANOVA with both Cat and Time2 as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares.

Assumptions

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

As shown above in the Category Set by Month analysis, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Homogeneity of Variance

Again, this assumption can be tested using a Levene’s test.

# Run the ANOVA:

CT_ANOVA = ezANOVA(data = CatData, dv = .(Total),
                   wid = .(UniqueSubjNum), between = .(Cat, Time2),
                   detailed = TRUE, type = "III", return_aov = TRUE)

# Extract the Levene's Test from the ANOVA output:

CT_Lev = CT_ANOVA$`Levene's Test for Homogeneity of Variance`

# Create a table to display the results:

kable(CT_Lev, digits = 4,
      caption = "Table 13. Time of day by category set Levene's test.",
      col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 13. Time of day by category set Levene’s test.
DFn	DFd	SSn	SSd	F	p	sig
5	450	0.0337	1.8655	1.6245	0.152

Based on an alpha level of .05, the assumption of homogeneity of variances is met; F (5, 450) = 1.62, p = 0.15. A White adjustment is therefore unnecessary for this analysis.

ANOVA

kable(CT_ANOVA$ANOVA, digits = 4,
      caption = "Table 14. Time of day by category set ANOVA.",
      col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F",
                    "p", "sig", "Effect Size"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 14. Time of day by category set ANOVA.
Effect	DFn	DFd	SSn	SSd	F	p	Effect Size
(Intercept)	1	450	219.6694	3.8762	25502.3848	0.0000	0.9827
Cat	1	450	1.0344	3.8762	120.0936	0.0000	0.2107
Time2	2	450	0.0053	3.8762	0.3091	0.7342	0.0014
Cat:Time2	2	450	0.0058	3.8762	0.3377	0.7136	0.0015

Interpretation of Main Effects

Interaction

The category set x time of day interaction was not found to be statistically significant; F (2, 450) = 0.34, p = 0.71, \(\eta^2_{p}\) < .01.

Main Effect of Category Set

As before, the main effect of category set was found to be statistically significant; F (1, 450) = 120.09, p < .001, \(\eta^2_{p}\) = 0.21.

Main Effect of Time of Day

The main effect of time of day was not found to be statistically significant; F (2, 450) = 0.31, p = 0.73, \(\eta^2_{p}\) < .01.

Post-hoc Tests

The significant main effect of category set will be further assessed via a post-hoc test, although this test will essentially be identical to the one conducted in the Category Set by Month analysis above. Because category set has only two levels, there is just one pairwise comparison, so the Bonferroni adjustment leaves its p-value unchanged.

Main Effect of Category Set

C1_post = pairwise.t.test(CatData$Total, CatData$Cat,
                          p.adjust.method = "bonf")
C1_post

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  CatData$Total and CatData$Cat 
## 
##    RD                 
## II <0.0000000000000002
## 
## P value adjustment method: bonferroni

These results confirm the previous finding that participants in the RD condition performed significantly better on the category learning task than participants in the II condition (p < .001).

Problem 3

The third set of analyses will involve creating a new dataset with Cat, Month, Time2, and performance by block in long format. We will then create line plots to depict performance across block by both Month and Cat, followed by a mixed ANOVA to assess the relationships between the variables.

Analysis Prep

Data Gathering

First, we will create a subset of the variables that we are interested in.

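# Specify the variables to keep (in the order used for renaming below):

VarsKeep = c("UniqueSubjNum", "Cat", "Time2", "Month", "X1_Block", "X2_Block", "X3_Block", "X4_Block")

# Define the new subset and label the columns. Selecting by name (rather than
# with a logical %in% mask) guarantees the columns come out in the order listed
# above, so the positional renaming below is safe:

Data3 = CatData[, VarsKeep]

colnames(Data3) = c("Subject", "Cat", "Time", "Month", "Block1", "Block2", "Block3", "Block4")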

Now we will gather the data into long format.

CatLong = gather(Data3, Block, Performance, Block1:Block4, factor_key = TRUE)
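gather() still works, but the tidyr documentation describes it as superseded by pivot_longer(). For readers without tidyr, a base-R equivalent using stats::reshape() is sketched below on a toy wide data frame; the three subjects and their scores are made up. Converting Block to a factor with levels in column order mirrors factor_key = TRUE.

```r
# Toy wide-format data with the same block column names as Data3 (values are made up).
wide <- data.frame(Subject = 1:3,
                   Cat = c("RD", "II", "RD"),
                   Block1 = c(0.60, 0.55, 0.65),
                   Block2 = c(0.70, 0.58, 0.72),
                   Block3 = c(0.75, 0.62, 0.78),
                   Block4 = c(0.80, 0.64, 0.79))

block_cols <- c("Block1", "Block2", "Block3", "Block4")

# One row per subject x block; the block name goes in "Block".
long <- reshape(wide, direction = "long",
                varying = block_cols, v.names = "Performance",
                timevar = "Block", times = block_cols,
                idvar = "Subject")

# Keep blocks in their original column order (like factor_key = TRUE).
long$Block <- factor(long$Block, levels = block_cols)
rownames(long) <- NULL

# tidyr equivalent (not run here):
# tidyr::pivot_longer(wide, Block1:Block4, names_to = "Block", values_to = "Performance")
print(long)
```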

Plotting Performance

Now we’ll plot performance across block by both category set and month.

We’ll begin by calculating the descriptive statistics to be used in the plot.

CMBDescs = summarySE(data = CatLong, measurevar = "Performance", groupvars = c("Cat", "Month", "Block"), conf.interval = .95)
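summarySE() (from the Rmisc package, presumably loaded earlier in the notebook) returns, for each cell, N, the mean, the SD, the standard error SE = SD/√N, and a confidence-interval half-width SE × t(N − 1). A minimal base-R stand-in is sketched below; summarise_cells() is a hypothetical helper, not part of any package, and the toy data are made up. The Table 23 values follow the same arithmetic, e.g. 0.1195/√249 ≈ 0.0076 for RD Block 1.

```r
# Hypothetical base-R stand-in for Rmisc::summarySE().
summarise_cells <- function(data, measurevar, groupvars, conf.interval = 0.95) {
  # One vector of scores per observed combination of the grouping variables
  pieces <- split(data[[measurevar]], data[groupvars], drop = TRUE)
  rows <- lapply(names(pieces), function(cell) {
    x <- pieces[[cell]][!is.na(pieces[[cell]])]
    n <- length(x)
    se <- sd(x) / sqrt(n)
    data.frame(cell = cell, N = n, mean = mean(x), sd = sd(x), se = se,
               # Half-width of the CI, using the t distribution with n - 1 df
               ci = se * qt(conf.interval / 2 + 0.5, df = n - 1))
  })
  do.call(rbind, rows)
}

toy <- data.frame(Cat = rep(c("RD", "II"), each = 4),
                  Block = rep(c("Block1", "Block2"), times = 4),
                  Performance = c(0.60, 0.70, 0.80, 0.90, 0.50, 0.55, 0.60, 0.65))
res <- summarise_cells(toy, "Performance", c("Cat", "Block"))
print(res)
```

The error bars in the figure use se; ci would give 95% confidence intervals instead.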

Next, we’ll plot performance. The following code will produce a series of line graphs with “Block” on the x-axis, “Proportion Correct” on the y-axis, separate lines for the two category conditions, and separate plots for the seven months. Error bars represent standard error of the mean (SEM).

CMBFig = ggplot(CMBDescs, aes(x = Block, y = Performance)) +
  
  # Specify labels: 
  
  labs(x = "Block", y = "Proportion Correct") + 
  ggtitle("Category Learning Performance by Block, Month, and Category Set") +
  
  # Define line and point aesthetics:
  
  geom_line(aes(colour = Cat, group = Cat)) + geom_point(size = 2, aes(colour = Cat)) + 
  
   # Adjust axes:
  
  scale_x_discrete(limits = c("Block1", "Block2", "Block3", "Block4"),
                   labels = c("Block1" = "1", "Block2" = "2", "Block3" = "3", "Block4" = "4")) + 
  scale_y_continuous(limits = c(0.4, 1.0), breaks = seq(0.4, 1.0, .1)) +  
  
  # Add legend and adjust colours:
  
  scale_colour_manual(name = "Category Set", values = c("orchid3", "lightseagreen")) + 
  
  # Add error bars:
  
  geom_errorbar(data = CMBDescs, mapping = aes(x = Block, ymin = Performance - se, ymax = Performance + se), width = .1) + 
  
  # Specify theme:
  
  theme_bw() + theme(plot.title = element_text(hjust = .5)) 

# Use facet wrap to divide the plot by month:

CMBFig + facet_wrap( ~ Month, ncol = 2)

Mixed ANOVA

Now we’ll conduct a Type III, 2 x 7 x 4 mixed ANOVA with category condition and month as between-group factors and block as a within-group factor.

Assumptions

The mixed ANOVA makes four primary assumptions:

Independent Random Sampling

This assumption was met during testing.

Normality

This assumption can be tested by applying a Shapiro-Wilk test to the outcome measure (i.e. performance).

CMB_Shap = shapiro.test(CatLong$Performance)
CMB_Shap

## 
##  Shapiro-Wilk normality test
## 
## data:  CatLong$Performance
## W = 0.97786, p-value = 0.0000000000000003455

Based on an alpha level of .05, the assumption of normality is not met; W = 0.98, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

Homogeneity of Variance for Between-Group Factors

This assumption can be tested by implementing a Levene’s test on performance collapsed across the within-group factor (i.e. performance averaged across Block).

CMB_Lev = leveneTest(data = CatData, Total ~ Cat*Month, center = median)
CMB_Lev

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value   Pr(>F)   
## group  13   2.684 0.001205 **
##       442                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on an alpha level of .05, the assumption of homogeneity of between-group variances is not met; F(13, 442) = 2.68, p = 0.001. Because sample sizes are unequal, a White-corrected F-test should be conducted if one is interested in assessing the main effect of condition.
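leveneTest(..., center = median), the Brown-Forsythe variant, is a one-way ANOVA on each score's absolute deviation from its cell median, where the cells are formed by crossing the grouping factors; this is why the test above has 13 numerator df for the 14 Cat x Month cells. The base-R sketch below reproduces that logic on simulated data; levene_median() is a hypothetical helper, and the group labels, sizes, and SDs are made up.

```r
# Brown-Forsythe (median-centred) Levene test as a one-way ANOVA on
# absolute deviations from each cell's median.
levene_median <- function(y, ...) {
  group <- interaction(..., drop = TRUE)          # one level per observed cell
  abs_dev <- abs(y - ave(y, group, FUN = median))
  anova(lm(abs_dev ~ group))
}

set.seed(7)
g1 <- factor(rep(c("RD", "II"), each = 60))
g2 <- factor(rep(c("Jan", "Feb", "Mar"), times = 40))
# RD scores are simulated to be more variable than II scores
y <- rnorm(120, mean = 0.7, sd = ifelse(g1 == "RD", 0.12, 0.08))

lev <- levene_median(y, g1, g2)
print(lev)
```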

Sphericity

This assumption can be tested using Mauchly's test of sphericity (provided as part of the ANOVA output).
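Mauchly's W compares the determinant of the covariance matrix of the block contrasts with its trace: \(W = \det(S_c) / (\mathrm{tr}(S_c)/(k-1))^{k-1}\), where \(S_c\) is the covariance matrix of k − 1 orthonormal contrasts among the k blocks, and W = 1 under perfect sphericity. The sketch below uses simulated single-group data (the sample size and covariance are made up) to compute W by hand and check it against base R's mauchly.test() for a multivariate lm() fit. In this mixed design, the test reported by ezANOVA is instead based on the pooled within-cell (error) covariance, which this single-group sketch does not reproduce.

```r
set.seed(3)
k <- 4    # number of blocks
n <- 50   # hypothetical sample size

# Non-spherical covariance: later blocks are more variable (made-up values)
Sigma <- diag(c(0.01, 0.02, 0.03, 0.04)) + 0.005
scores <- matrix(rnorm(n * k), n, k) %*% chol(Sigma)

# Orthonormal contrasts among the k blocks (each column sums to zero)
H <- contr.helmert(k)
H <- sweep(H, 2, sqrt(colSums(H^2)), "/")

S_c <- t(H) %*% cov(scores) %*% H
W_manual <- det(S_c) / (sum(diag(S_c)) / (k - 1))^(k - 1)

# Base R's version, for a multivariate lm with one column per block
fit <- lm(scores ~ 1)
mt <- mauchly.test(fit, X = ~1)
print(c(manual = W_manual, mauchly.test = unname(mt$statistic)))
print(mt$p.value)
```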

# Run the ANOVA:

CMB_ANOVA = ezANOVA(data = CatLong, dv = .(Performance), wid = .(Subject), within = .(Block), between = .(Cat, Month), detailed = TRUE, type = "III", return_aov = TRUE, white.adjust = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_Mau = CMB_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_Mau, digits = 4,
      caption = "Table 15. Test of sphericity on performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 15. Test of sphericity on performance by block.
	Effect	W
5	Block	0.7442
6	Cat:Block	0.7442
7	Month:Block	0.7442
8	Cat:Month:Block	0.7442

Based on an alpha level of .05, the assumption of sphericity is not met for any of the effects involving the within-group factor (W = 0.74, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:

CMB_Eps = CMB_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_Eps, digits = 4,
      caption = "Table 16. Epsilon corrections for the test of performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 16. Epsilon corrections for the test of performance by block.
	Effect	GG Epsilon	GG p	HF Epsilon	HF p
5	Block	0.8305	0.0000	0.8356	0.0000
6	Cat:Block	0.8305	0.0000	0.8356	0.0000
7	Month:Block	0.8305	0.0883	0.8356	0.0877
8	Cat:Month:Block	0.8305	0.5984	0.8356	0.5990

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.84) to these tests (as suggested by Girden, 1992).
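The Greenhouse-Geisser estimate that drives this choice is \(\hat\varepsilon_{GG} = (\sum\lambda)^2 / ((k-1)\sum\lambda^2)\), where the \(\lambda\) are the eigenvalues of the covariance matrix of k − 1 orthonormal block contrasts; it equals 1 under sphericity and cannot fall below 1/(k − 1). A small sketch of that formula and of the Girden (1992) decision rule used here follows; gg_epsilon() and choose_correction() are hypothetical helpers, and the example covariance matrices are made up.

```r
gg_epsilon <- function(S) {
  k <- nrow(S)
  H <- contr.helmert(k)
  H <- sweep(H, 2, sqrt(colSums(H^2)), "/")   # orthonormal contrasts
  lambda <- eigen(t(H) %*% S %*% H, symmetric = TRUE, only.values = TRUE)$values
  sum(lambda)^2 / ((k - 1) * sum(lambda^2))
}

# Girden (1992): Huynh-Feldt when the GG epsilon exceeds .75, else Greenhouse-Geisser
choose_correction <- function(eps_gg) {
  if (eps_gg > 0.75) "Huynh-Feldt" else "Greenhouse-Geisser"
}

spherical <- 0.010 * diag(4) + 0.004                       # compound symmetry
non_spherical <- diag(c(0.005, 0.010, 0.020, 0.060)) + 0.002

print(gg_epsilon(spherical))
print(gg_epsilon(non_spherical))
print(choose_correction(0.8305))   # GG epsilon reported in Table 16
```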

Effect Sizes

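Although white.adjust = TRUE was passed to ezANOVA above, the ez documentation states that this argument only affects designs whose predictors are all between-group factors, so the between-group F-tests reported below are not White-corrected. In addition, ezANOVA reports generalized eta squared (ges) rather than partial eta squared, so we compute partial eta squared for each effect from its F-ratio and degrees of freedom.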

# Cat:

CMB_C_peta = peta(dfn = CMB_ANOVA$ANOVA[2,]$DFn, dfd = CMB_ANOVA$ANOVA[2,]$DFd, f = CMB_ANOVA$ANOVA[2,]$F)

# Month:

CMB_M_peta = peta(dfn = CMB_ANOVA$ANOVA[3,]$DFn, dfd = CMB_ANOVA$ANOVA[3,]$DFd, f = CMB_ANOVA$ANOVA[3,]$F)

# Block:

CMB_B_peta = peta(dfn = CMB_ANOVA$ANOVA[4,]$DFn, dfd = CMB_ANOVA$ANOVA[4,]$DFd, f = CMB_ANOVA$ANOVA[4,]$F)

# Cat*Month:

CMB_CM_peta = peta(dfn = CMB_ANOVA$ANOVA[5,]$DFn, dfd = CMB_ANOVA$ANOVA[5,]$DFd, f = CMB_ANOVA$ANOVA[5,]$F)

# Cat*Block:

CMB_CB_peta = peta(dfn = CMB_ANOVA$ANOVA[6,]$DFn, dfd = CMB_ANOVA$ANOVA[6,]$DFd, f = CMB_ANOVA$ANOVA[6,]$F)

# Month*Block:

CMB_MB_peta = peta(dfn = CMB_ANOVA$ANOVA[7,]$DFn, dfd = CMB_ANOVA$ANOVA[7,]$DFd, f = CMB_ANOVA$ANOVA[7,]$F)

# Cat*Month*Block:

CMB_CMB_peta = peta(dfn = CMB_ANOVA$ANOVA[8,]$DFn, dfd = CMB_ANOVA$ANOVA[8,]$DFd, f = CMB_ANOVA$ANOVA[8,]$F)
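The peta() helper is defined earlier in the notebook and is not shown here. Assuming it implements the usual conversion \(\eta^2_{p} = F \cdot df_n / (F \cdot df_n + df_d)\), which algebraically equals SSn / (SSn + SSd), a stand-in is sketched below. Because the Huynh-Feldt correction multiplies both dfs by the same \(\varepsilon\), corrected and uncorrected dfs give the same \(\eta^2_{p}\). The check uses the Block row of Table 17 (F = 118.0047, unadjusted dfs of 3 and 1326, i.e. 3 x 442).

```r
# Hypothetical stand-in for the notebook's peta() helper
# (the real one is defined earlier and may differ in its details).
peta_sketch <- function(dfn, dfd, f) (f * dfn) / (f * dfn + dfd)

# Block row of Table 17
eps_hf <- 1107.943 / 1326          # Huynh-Feldt epsilon implied by Table 17
block_raw <- peta_sketch(3, 1326, 118.0047)
block_adj <- peta_sketch(3 * eps_hf, 1326 * eps_hf, 118.0047)

# The same quantity from the sums of squares
block_ss <- 1.5776 / (1.5776 + 5.9090)

print(round(c(raw = block_raw, adjusted = block_adj, from_ss = block_ss), 4))
```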

Interpretation of Main Effects

# Create data frames for the epsilon corrected tests:

# Cat:

CMB_C_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[2], "DFn" = CMB_ANOVA$ANOVA$DFn[2], "DFd" = CMB_ANOVA$ANOVA$DFd[2], "SSn" = CMB_ANOVA$ANOVA$SSn[2], "SSd" = CMB_ANOVA$ANOVA$SSd[2], "F" = CMB_ANOVA$ANOVA$F[2], "p" = CMB_ANOVA$ANOVA$p[2], "sig" = CMB_ANOVA$ANOVA$`p<.05`[2], "peta" = CMB_C_peta)

# Month:

CMB_M_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[3], "DFn" = CMB_ANOVA$ANOVA$DFn[3], "DFd" = CMB_ANOVA$ANOVA$DFd[3], "SSn" = CMB_ANOVA$ANOVA$SSn[3], "SSd" = CMB_ANOVA$ANOVA$SSd[3], "F" = CMB_ANOVA$ANOVA$F[3], "p" = CMB_ANOVA$ANOVA$p[3], "sig" = CMB_ANOVA$ANOVA$`p<.05`[3], "peta" = CMB_M_peta)

# Block:

CMB_B_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[4], "DFn" = CMB_ANOVA$ANOVA$DFn[4] * CMB_Eps$HFe[1], "DFd" = CMB_ANOVA$ANOVA$DFd[4] * CMB_Eps$HFe[1], "SSn" = CMB_ANOVA$ANOVA$SSn[4], "SSd" = CMB_ANOVA$ANOVA$SSd[4], "F" = CMB_ANOVA$ANOVA$F[4], "p" = CMB_Eps$`p[HF]`[1], "sig" = CMB_Eps$`p[HF]<.05`[1], "peta" = CMB_B_peta)

# Cat*Month:

CMB_CM_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[5], "DFn" = CMB_ANOVA$ANOVA$DFn[5], "DFd" = CMB_ANOVA$ANOVA$DFd[5], "SSn" = CMB_ANOVA$ANOVA$SSn[5], "SSd" = CMB_ANOVA$ANOVA$SSd[5], "F" = CMB_ANOVA$ANOVA$F[5], "p" = CMB_ANOVA$ANOVA$p[5], "sig" = CMB_ANOVA$ANOVA$`p<.05`[5], "peta" = CMB_CM_peta)

# Cat*Block:

CMB_CB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[6], "DFn" = CMB_ANOVA$ANOVA$DFn[6] * CMB_Eps$HFe[2], "DFd" = CMB_ANOVA$ANOVA$DFd[6] * CMB_Eps$HFe[2], "SSn" = CMB_ANOVA$ANOVA$SSn[6], "SSd" = CMB_ANOVA$ANOVA$SSd[6], "F" = CMB_ANOVA$ANOVA$F[6], "p" = CMB_Eps$`p[HF]`[2], "sig" = CMB_Eps$`p[HF]<.05`[2], "peta" = CMB_CB_peta)

# Month*Block:

CMB_MB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[7], "DFn" = CMB_ANOVA$ANOVA$DFn[7] * CMB_Eps$HFe[3], "DFd" = CMB_ANOVA$ANOVA$DFd[7] * CMB_Eps$HFe[3], "SSn" = CMB_ANOVA$ANOVA$SSn[7], "SSd" = CMB_ANOVA$ANOVA$SSd[7], "F" = CMB_ANOVA$ANOVA$F[7], "p" = CMB_Eps$`p[HF]`[3], "sig" = CMB_Eps$`p[HF]<.05`[3], "peta" = CMB_MB_peta)

# Cat*Month*Block:

CMB_CMB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[8], "DFn" = CMB_ANOVA$ANOVA$DFn[8] * CMB_Eps$HFe[4], "DFd" = CMB_ANOVA$ANOVA$DFd[8] * CMB_Eps$HFe[4], "SSn" = CMB_ANOVA$ANOVA$SSn[8], "SSd" = CMB_ANOVA$ANOVA$SSd[8], "F" = CMB_ANOVA$ANOVA$F[8], "p" = CMB_Eps$`p[HF]`[4], "sig" = CMB_Eps$`p[HF]<.05`[4], "peta" = CMB_CMB_peta)

# Combine into a single data frame:

CMB_ANOVA_Corr = rbind(CMB_C_ANOVA, CMB_M_ANOVA, CMB_B_ANOVA, CMB_CM_ANOVA, CMB_CB_ANOVA, CMB_MB_ANOVA, CMB_CMB_ANOVA)

# Create a table to display the results:

kable(CMB_ANOVA_Corr, digits = 4,
      caption = "Table 17. Mixed ANOVA on performance across block by month and category set.",
      col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F", "p", "sig", "peta"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 17. Mixed ANOVA on performance across block by month and category set.
Effect	DFn	DFd	SSn	SSd	F	p	peta
Cat	1.0000	442.000	1.5916	16.0741	43.7664	0.0000	0.0901
Month	6.0000	442.000	1.1411	16.0741	5.2295	0.0000	0.0663
Block	2.5067	1107.943	1.5776	5.9090	118.0047	0.0000	0.2107
Cat:Month	6.0000	442.000	0.2320	16.0741	1.0632	0.3839	0.0142
Cat:Block	2.5067	1107.943	0.1618	5.9090	12.1035	0.0000	0.0267
Month:Block	15.0400	1107.943	0.1226	5.9090	1.5283	0.0877	0.0203
Cat:Month:Block	15.0400	1107.943	0.0698	5.9090	0.8697	0.5990	0.0117
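The seven near-identical data.frame() calls above differ only in the row index and in whether the Huynh-Feldt correction applies. A hypothetical helper that looks rows up by effect name, so a change in row order cannot silently mismatch effects, is sketched below. It assumes the column names the code above already uses from ezANOVA's output (Effect, DFn, DFd, SSn, SSd, F, p, p<.05, and HFe, p[HF], p[HF]<.05 in the sphericity-corrections table). The mock tables hold a few Table 16 and 17 values; they are not real ezANOVA objects.

```r
# Hypothetical helper: one corrected row per effect, looked up by name.
hf_row <- function(anova_tab, eps_tab, effect) {
  a <- anova_tab[anova_tab$Effect == effect, ]
  e <- eps_tab[eps_tab$Effect == effect, ]
  within <- nrow(e) == 1              # only within-group effects have HF rows
  eps <- if (within) e$HFe else 1
  data.frame(Effect = effect,
             DFn = a$DFn * eps, DFd = a$DFd * eps,
             SSn = a$SSn, SSd = a$SSd, F = a$F,
             p = if (within) e[["p[HF]"]] else a$p,
             sig = if (within) e[["p[HF]<.05"]] else a[["p<.05"]],
             peta = a$SSn / (a$SSn + a$SSd))
}

# Mock tables shaped like ezANOVA's output (values from Tables 16 and 17)
anova_mock <- data.frame(Effect = c("Cat", "Block"), DFn = c(1, 3),
                         DFd = c(442, 1326), SSn = c(1.5916, 1.5776),
                         SSd = c(16.0741, 5.9090), F = c(43.7664, 118.0047),
                         p = c(0, 0), "p<.05" = c("*", "*"), check.names = FALSE)
eps_mock <- data.frame(Effect = "Block", HFe = 0.8356, "p[HF]" = 0,
                       "p[HF]<.05" = "*", check.names = FALSE)

corrected <- do.call(rbind, lapply(c("Cat", "Block"), hf_row,
                                   anova_tab = anova_mock, eps_tab = eps_mock))
print(corrected)
```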

Cat*Month*Block:

Based on an alpha level of .05, the three-way interaction between category set, month, and block was not found to be statistically significant; F (15, 1108) = 0.87, p = 0.6, \(\eta^2_{p}\) = 0.01.

Month*Block:

Based on an alpha level of .05, the two-way interaction between month and block was not found to be statistically significant; F (15, 1108) = 1.53, p = 0.09, \(\eta^2_{p}\) = 0.02.

Cat*Block:

Based on an alpha level of .05, the two-way interaction between category set and block was found to be statistically significant; F (3, 1108) = 12.1, p < .001, \(\eta^2_{p}\) = 0.03.

Cat*Month:

Based on an alpha level of .05, the two-way interaction between category set and month was not found to be statistically significant; F (6, 442) = 1.06, p = 0.38, \(\eta^2_{p}\) = 0.01.

Block:

Because block interacts significantly with category set, the main effect of block is qualified by that interaction and will not be interpreted on its own; the interaction is followed up below.

Month:

Based on an alpha level of .05, the main effect of month was found to be statistically significant; F (6, 442) = 5.23, p < .001, \(\eta^2_{p}\) = 0.07.

Cat:

Because category set interacts significantly with block, the main effect of category set is likewise qualified by that interaction and will not be interpreted on its own.

Cat*Block Post-Hoc:

The significant category set x block interaction will be further assessed via a test of simple main effects of block at each level of category set. To do so, we will conduct a separate repeated-measures ANOVA on the effect of block for each category set condition. To control the family-wise error rate across these two ANOVAs, a Holm-Bonferroni adjustment will be used when assessing the significance of their main effects. With the comparisons ranked from smallest to largest p-value, the Holm-Bonferroni adjustment defines a corrected alpha level for each comparison according to the following formula: \[\alpha_{corrected} = \frac{\alpha}{\text{number of comparisons} - \text{rank of comparison} + 1}\]
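The Holm procedure is step-down: comparisons are ranked from smallest to largest p, each is tested against its corrected alpha, and testing stops at the first non-significant comparison. The sketch below applies the formula directly to three hypothetical p-values; holm_reject() is an illustrative helper, and its decisions agree with base R's p.adjust(method = "holm"), the function used later in this section.

```r
# Hypothetical p-values for m = 3 comparisons (illustration only)
p <- c(0.010, 0.040, 0.030)
alpha <- 0.05

holm_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)                                   # rank 1 = smallest p
  thresholds <- alpha / (m - seq_len(m) + 1)      # alpha / (m - rank + 1)
  passed <- p[o] <= thresholds
  # Step-down: stop at the first ranked comparison that misses its threshold
  n_reject <- if (all(passed)) m else which(!passed)[1] - 1
  reject <- logical(m)
  reject[o[seq_len(n_reject)]] <- TRUE
  reject
}

holm_reject(p, alpha)
```

Here only the first comparison is significant: p = .030 (rank 2) misses its threshold of .025, so p = .040 is not tested, even though it falls below its own threshold of .05.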

Before we begin, we’ll define RD and II data subsets:

RD_Perf = subset(CatLong, Cat == "RD")

II_Perf = subset(CatLong, Cat == "II")

Rule-Defined ANOVA

Assumptions

The repeated-measures ANOVA makes three primary assumptions:

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

CMB_RD_Shap = shapiro.test(RD_Perf$Performance)
CMB_RD_Shap

## 
##  Shapiro-Wilk normality test
## 
## data:  RD_Perf$Performance
## W = 0.91747, p-value < 0.00000000000000022

Based on an alpha level of .05, the assumption of normality is not met; W = 0.92, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Sphericity

# Conduct the ANOVA:

CMB_RD_ANOVA = ezANOVA(data = RD_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_RD_Mau = CMB_RD_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_RD_Mau, digits = 4,
      caption = "Table 18. Test of sphericity on rule-defined performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 18. Test of sphericity on rule-defined performance by block.
	Effect	W	p	sig
2	Block	0.6067	0

Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.61, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:
CMB_RD_Eps = CMB_RD_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_RD_Eps, digits = 4,
      caption = "Table 19. Epsilon corrections for the test of rule-defined performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 19. Epsilon corrections for the test of rule-defined performance by block.
	Effect	GG Epsilon	GG p	GG sig	HF Epsilon	HF p	HF sig
2	Block	0.7544	0		0.7617	0

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.76; as suggested by Girden, 1992).

Information Integration ANOVA

Assumptions

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

CMB_II_Shap = shapiro.test(II_Perf$Performance)
CMB_II_Shap

## 
##  Shapiro-Wilk normality test
## 
## data:  II_Perf$Performance
## W = 0.99504, p-value = 0.008679

Based on an alpha level of .05, the assumption of normality is not met; W = 0.995, p = 0.009. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Sphericity

# Conduct the ANOVA:

CMB_II_ANOVA = ezANOVA(data = II_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_II_Mau = CMB_II_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_II_Mau, digits = 4,
      caption = "Table 20. Test of sphericity on information integration performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 20. Test of sphericity on information integration performance by block.
	Effect	W	p	sig
2	Block	0.8705	0

Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.87, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:
CMB_II_Eps = CMB_II_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_II_Eps, digits = 4,
      caption = "Table 21. Epsilon corrections for the test of information integration performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 21. Epsilon corrections for the test of information integration performance by block.
	Effect	GG Epsilon	GG p	GG sig	HF Epsilon	HF p	HF sig
2	Block	0.909	0		0.9224	0

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.92; as suggested by Girden, 1992).

Interpretation of Main Effects

We will now employ the Holm-Bonferroni adjustment to assess the main effects of these tests.

# Create a data frame of the p-values being assessed:

CMB_Corr_ps1 = data.frame("Condition" = c("RD", "II"), "p" = c(CMB_RD_Eps$`p[HF]`, CMB_II_Eps$`p[HF]`))

# Perform p-adjustment:

CMB_Corr_ps2 = p.adjust(CMB_Corr_ps1[,2], method = c("holm"), n = 2)

# Create a data frame for the RD Condition. ezANOVA's ges column is
# generalized eta squared, not partial eta squared, so partial eta squared
# is computed directly as SSn / (SSn + SSd):

CMB_RD_ANOVA_Corr = data.frame("Condition" = c("RD"), "Effect" = CMB_RD_ANOVA$ANOVA$Effect[2], "DFn" = CMB_RD_ANOVA$ANOVA$DFn[2] * CMB_RD_Eps$HFe, "DFd" = CMB_RD_ANOVA$ANOVA$DFd[2] * CMB_RD_Eps$HFe, "SSn" = CMB_RD_ANOVA$ANOVA$SSn[2], "SSd" = CMB_RD_ANOVA$ANOVA$SSd[2], "F" = CMB_RD_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[1], 4), "peta" = CMB_RD_ANOVA$ANOVA$SSn[2] / (CMB_RD_ANOVA$ANOVA$SSn[2] + CMB_RD_ANOVA$ANOVA$SSd[2]))

# Create a data frame for the II Condition:

CMB_II_ANOVA_Corr = data.frame("Condition" = c("II"),  "Effect" = CMB_II_ANOVA$ANOVA$Effect[2], "DFn" = CMB_II_ANOVA$ANOVA$DFn[2] * CMB_II_Eps$HFe, "DFd" = CMB_II_ANOVA$ANOVA$DFd[2] * CMB_II_Eps$HFe, "SSn" = CMB_II_ANOVA$ANOVA$SSn[2], "SSd" = CMB_II_ANOVA$ANOVA$SSd[2], "F" = CMB_II_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[2], 4), "peta" = CMB_II_ANOVA$ANOVA$SSn[2] / (CMB_II_ANOVA$ANOVA$SSn[2] + CMB_II_ANOVA$ANOVA$SSd[2]))

# Combine both data frames into one:

CMB_RDII_ANOVA_Corr = rbind(CMB_RD_ANOVA_Corr, CMB_II_ANOVA_Corr)

# Format the results table so that significant p-values will be bolded:

CMB_RDII_ANOVA_Corr = CMB_RDII_ANOVA_Corr %>%
  mutate(
    p = text_spec(p, bold = p < .05)
  )

# Create a table to display the results:

kable(CMB_RDII_ANOVA_Corr, digits = 4,
      caption = "Table 22. Tests of simple main effects on performance by block for each category set condition.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 22. Tests of simple main effects on performance by block for each category set condition.
Condition	Effect	DFn	DFd	SSn	SSd	F	p	peta
RD	Block	2.2850	566.6777	2.6390	3.7067	176.5669	0	0.4159
II	Block	2.7671	570.0177	0.5793	2.3931	49.8707	0	0.1949

The main effect of block was found to be significant for both rule-defined and information integration performance; F (2, 567) = 176.57, p < .001, \(\eta^2_{p}\) = 0.42 and F (3, 570) = 49.87, p < .001, \(\eta^2_{p}\) = 0.19, respectively.

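The significant effect of block will be further assessed via post-hoc tests. Because the sphericity assumption was violated, we will compare blocks with paired t-tests, each of which uses its own error term rather than a pooled one, and control the family-wise error rate with the Bonferroni adjustment.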
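With paired = TRUE, pairwise.t.test() runs an ordinary paired t-test for each pair of blocks (no pooled SD) and multiplies each p-value by the number of comparisons, capping the result at 1. It pairs observations by their position within each block, so participants must appear in the same order in every block, which gather() preserves. The sketch below checks one cell of its output by hand on simulated data; the sample size, block means, and noise levels are made up.

```r
set.seed(11)
n <- 25   # hypothetical number of participants

perf <- data.frame(Subject = rep(seq_len(n), times = 4),
                   Block = factor(rep(paste0("Block", 1:4), each = n)))
subject_effect <- rep(rnorm(n, sd = 0.05), times = 4)
perf$Performance <- 0.65 + c(0, 0.08, 0.10, 0.11)[perf$Block] +
  subject_effect + rnorm(4 * n, sd = 0.05)

res <- pairwise.t.test(perf$Performance, perf$Block,
                       p.adjust.method = "bonferroni", paired = TRUE)

# The same comparison by hand: Block 4 vs Block 1, Bonferroni over 6 pairs
b1 <- perf$Performance[perf$Block == "Block1"]
b4 <- perf$Performance[perf$Block == "Block4"]
p_manual <- min(1, 6 * t.test(b4, b1, paired = TRUE)$p.value)

print(res$p.value)
print(p_manual)
```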

The means that we will be comparing are presented below:

CBDescs = summarySE(data = CatLong, measurevar = "Performance",
                    groupvars = c("Cat", "Block"), conf.interval = .95)

# Create a table to display the results:

kable(CBDescs, digits = 4,
      caption = "Table 23. Descriptives by category set and block.",
      col.names = c("Category Set", "Block", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 23. Descriptives by category set and block.
Category Set	Block	n	M	SD	SE	CI
RD	Block1	249	0.6675	0.1195	0.0076	0.0149
RD	Block2	249	0.7748	0.1291	0.0082	0.0161
RD	Block3	249	0.7876	0.1251	0.0079	0.0156
RD	Block4	249	0.7935	0.1309	0.0083	0.0163
II	Block1	207	0.6155	0.0740	0.0051	0.0101
II	Block2	207	0.6586	0.0922	0.0064	0.0126
II	Block3	207	0.6783	0.1026	0.0071	0.0141
II	Block4	207	0.6820	0.1138	0.0079	0.0156

RD Post-hoc Test

CMB_RD_Bon = pairwise.t.test(RD_Perf$Performance, RD_Perf$Block, p.adjust.method = "bonferroni", paired = TRUE)

# Round the results to 4 decimal places:

CMB_RD_Bon = data.frame(round(CMB_RD_Bon$p.value, 4))

# Format the results table so that significant p-values will be bolded
# (bold is FALSE for the empty NA cells of the lower-triangular matrix):

CMB_RD_BonB = CMB_RD_Bon %>%
  mutate(
    Block1 = text_spec(Block1, bold = !is.na(Block1) & Block1 < .05),
    Block2 = text_spec(Block2, bold = !is.na(Block2) & Block2 < .05),
    Block3 = text_spec(Block3, bold = !is.na(Block3) & Block3 < .05)
  )

# Add row labels:

CMB_RD_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))

CMB_RD_BonB = cbind(CMB_RD_BonL, CMB_RD_BonB)

# Create a table to display the results:

kable(CMB_RD_BonB, digits = 4,
      caption = "Table 24. Post-hoc test of rule-defined performance across block.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 24. Post-hoc test of rule-defined performance across block.
Comparison	Block2	Block3
Block2	NA	NA
Block3	0.0531	NA
Block4	0.0077	1

These results indicate that, with respect to the rule-defined task, block 1 performance (M_RD1 = 0.67, SD_RD1 = 0.12) was significantly lower than blocks 2 (M_RD2 = 0.77, SD_RD2 = 0.13), 3 (M_RD3 = 0.79, SD_RD3 = 0.13), and 4 (M_RD4 = 0.79, SD_RD4 = 0.13; all ps < .001). Block 4 performance was also significantly higher than performance on block 2 (p = 0.01). No other between-block performance differences were found to be statistically significant (ps > .05).

II Post-hoc Test

CMB_II_Bon = pairwise.t.test(II_Perf$Performance, II_Perf$Block, p.adjust.method = "bonferroni", paired = TRUE)

# Round the results to 4 decimal places:

CMB_II_Bon = data.frame(round(CMB_II_Bon$p.value, 4))

# Format the results table so that significant p-values will be bolded
# (bold is FALSE for the empty NA cells of the lower-triangular matrix):

CMB_II_BonB = CMB_II_Bon %>%
  mutate(
    Block1 = text_spec(Block1, bold = !is.na(Block1) & Block1 < .05),
    Block2 = text_spec(Block2, bold = !is.na(Block2) & Block2 < .05),
    Block3 = text_spec(Block3, bold = !is.na(Block3) & Block3 < .05)
  )

# Add row labels:

CMB_II_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))

CMB_II_BonB = cbind(CMB_II_BonL, CMB_II_BonB)

# Create a table to display the results:

kable(CMB_II_BonB, digits = 4,
      caption = "Table 25. Post-hoc test of information integration performance across block.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 25. Post-hoc test of information integration performance across block.
Comparison	Block2	Block3
Block2	NA	NA
Block3	0.0036	NA
Block4	0.0011	1

These results indicate that, with respect to the information integration task, block 1 performance (M_II1 = 0.62, SD_II1 = 0.07) was significantly lower than blocks 2 (M_II2 = 0.66, SD_II2 = 0.09), 3 (M_II3 = 0.68, SD_II3 = 0.1), and 4 (M_II4 = 0.68, SD_II4 = 0.11; all ps < .001). Block 2 performance was also significantly lower than both block 3 (p = 0.004) and 4 (p = 0.001). No other between-block performance differences were found to be statistically significant (ps > .05).

Month Post-Hoc:

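The significant main effect of month will be further assessed via post-hoc tests. These mirror the month comparisons conducted in the Problem 2 section above, but here they are run on the long-format data, in which each participant contributes four block-level observations that are treated as independent; the resulting p-values are therefore likely to be smaller than those of a participant-level test. Because we are completing all possible pairwise comparisons, we will use the Tukey HSD adjustment (as suggested by Maxwell & Delaney, 2003).

Main Effect of Month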

CMB_M_Tuk = TukeyHSD(aov(Performance ~ Month, data = CatLong))

# Extract the results table for Month from the output:

CMB_M_Tuk = data.frame(CMB_M_Tuk$Month)

# Round the results to 4 decimal places:

CMB_M_Tuk = round(CMB_M_Tuk, 4)

# Format the results table so that significant p-values will be bolded:

CMB_M_TukB = CMB_M_Tuk %>%
  mutate(
    p.adj = cell_spec(p.adj, bold = (ifelse(p.adj < .05, "TRUE", "FALSE")))
  )

# Add row labels:

CMB_M_TukL = data.frame("Comparison" = c("Feb-Jan", "Mar-Jan", "Apr-Jan", "Sept-Jan", "Oct-Jan", "Nov-Jan", "Mar-Feb", "Apr-Feb", "Sept-Feb", "Oct-Feb", "Nov-Feb", "Apr-Mar", "Sept-Mar", "Oct-Mar", "Nov-Mar", "Sept-Apr", "Oct-Apr", "Nov-Apr", "Oct-Sept", "Nov-Sept", "Nov-Oct"))

CMB_M_TukB = cbind(CMB_M_TukL, CMB_M_TukB)

# Create a table to display the results:

kable(CMB_M_TukB, digits = 4,
      caption = "Table 26. Post-hoc tests of performance by month.",
      col.names = c("Comparison", "Difference", "Lower", "Upper", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")

Table 26. Post-hoc tests of performance by month.
Comparison	Difference	Lower	Upper	p
Feb-Jan	-0.0442	-0.0822	-0.0063	0.0107
Mar-Jan	-0.0467	-0.0797	-0.0136	0.0006
Apr-Jan	-0.0037	-0.0358	0.0284	0.9999
Sept-Jan	0.0486	0.0184	0.0789	0
Oct-Jan	-0.0007	-0.0302	0.0287	1
Nov-Jan	-0.0453	-0.1005	0.0098	0.1885
Mar-Feb	-0.0024	-0.0404	0.0356	1
Apr-Feb	0.0405	0.0033	0.0777	0.0223
Sept-Feb	0.0929	0.0573	0.1284	0
Oct-Feb	0.0435	0.0086	0.0784	0.0045
Nov-Feb	-0.0011	-0.0593	0.0572	1
Apr-Mar	0.0429	0.0108	0.0750	0.0016
Sept-Mar	0.0953	0.0651	0.1255	0
Oct-Mar	0.0459	0.0164	0.0754	0.0001
Nov-Mar	0.0013	-0.0538	0.0565	1
Sept-Apr	0.0524	0.0232	0.0816	0
Oct-Apr	0.0030	-0.0254	0.0314	0.9999
Nov-Apr	-0.0416	-0.0962	0.0130	0.2705
Oct-Sept	-0.0494	-0.0757	-0.0231	0
Nov-Sept	-0.0939	-0.1475	-0.0404	0
Nov-Oct	-0.0446	-0.0977	0.0085	0.168

These results indicate that participants displayed significantly better performance during September (M_Sept = 0.76, SD_Sept = 0.09) than during every other month (M_Jan = 0.71, SD_Jan = 0.11, p < .001; M_Feb = 0.67, SD_Feb = 0.11, p < .001; M_Mar = 0.67, SD_Mar = 0.12, p < .001; M_Apr = 0.71, SD_Apr = 0.11, p < .001; M_Oct = 0.71, SD_Oct = 0.08, p < .001; M_Nov = 0.7, SD_Nov = 0.07, p < .001). Participants in February and March also performed significantly worse than participants in January (p = 0.01 and p < .001, respectively), April (p = 0.02 and p = 0.002, respectively), and October (p = 0.004 and p < .001, respectively).
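Because CatLong holds four block scores per participant, the Tukey tests above treat repeated observations from the same person as independent. One way to keep each participant to a single observation is to average their block scores before calling TukeyHSD(); the sketch below does this with base R's aggregate() on simulated data. The participant counts, month labels, and means are hypothetical, and this is an alternative to, not a reproduction of, the analysis above.

```r
set.seed(5)
n_subj <- 60   # hypothetical number of participants

subj <- data.frame(Subject = seq_len(n_subj),
                   Month = factor(rep(c("Jan", "Feb", "Sept"), each = 20),
                                  levels = c("Jan", "Feb", "Sept")))

# Cross join: one row per participant x block
long_sim <- merge(subj, data.frame(Block = paste0("Block", 1:4)))
long_sim$Performance <- 0.7 + ifelse(long_sim$Month == "Sept", 0.05, 0) +
  rnorm(nrow(long_sim), sd = 0.1)

# One row per participant: mean performance across the four blocks
per_subject <- aggregate(Performance ~ Subject + Month, data = long_sim, FUN = mean)

tuk <- TukeyHSD(aov(Performance ~ Month, data = per_subject))
print(tuk$Month)
```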