Lab Analysis Challenge

E. G. Nielsen

Oct. 4, 2018

Introduction

This notebook outlines analyses conducted as part of a coding challenge in The Categorization and Cognitive Science Lab. The data considered below is from a series of studies in which participants were asked to complete four blocks of either a rule-defined (RD) or information integration (II) category learning task. Participants were also asked to provide responses to a number of situational and demographic-based items.

The following file contains all of the data that was collected.

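# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0); the levels() calls below rely on this.
Data = read.csv("ModifiedFullData.csv", header = TRUE, stringsAsFactors = TRUE)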

Some notes regarding the names of variables:

  • Delete - an indication of whether the participant was to be included in the analysis (coded as “0”) or not (coded as “1”).

  • UniqueSubjNum - subject numbers that uniquely identify participants in the dataset.

  • SubjNum - subject numbers that identify participants within each independent experiment.

  • ExpName - the names of the seven experiments included in the data set.

  • Cat - an indication of the category set (rule-defined [RD] or information integration [II]) that participants were randomly assigned to learn.

  • Condition - the names of the conditions in each of the individual experiments.

  • SDtoday and FRtoday - dummy coded versions of the Cat variable.

  • Time - the time of day that testing occurred.

  • Time2 - an ordinal version of the Time variable.

  • Date - the date of testing.

  • DayofWeek - an ordinal version of the Date variable.

  • Month - the month in which testing occurred.

  • NumSubs - the number of subjects tested during each testing session.

  • SameGeNder and SameGenWhat - an indication of whether participants who were tested in a group were of the same gender (“1”) or not (“0”) and, if so, what gender they were.

  • CellPhone and Internet - an indication of whether participants were (“1”) or were not (“0”) observed accessing either their cellphone or the internet during the testing session.

  • PaidPool - an indication of whether participants were recruited from the psychology subject pool or were, instead, paid participants.

  • Late - a measure of how many minutes late a participant was for their scheduled testing session.

  • SignUp -

  • Age - participants’ self-reported age in years.

  • Gender - participants’ self-reported gender.

  • NativeLang, SecondLang, and AdditionalLang - participants’ self-reported native, second, and/or additional language(s).

  • SecondProficiency - participants’ self-reported level of second language proficiency, scored from 0 (low proficiency) to 4 (high proficiency).

  • Bilingual - an indication of whether a participant identified as bilingual (“1”) or not (“0”).

  • AcademicYear - participants’ academic year of study.

  • ExamLastWeek - an indication of whether a participant had written an exam in the week previous to testing (“1”) or not (“0”).

  • ExamnextWeek - an indication of whether a participant would (“1”) or would not (“0”) be writing an exam in the week after testing.

  • BusyDay - an indication of whether the testing day was (“1”) or was not (“0”) a busy day for each participant.

  • ClassBefore and ClassAfter - an indication of whether a participant did (“1”) or did not (“0”) have a class prior to or after the testing session.

  • FirstExp and OtherExps - an indication of whether the study was (“1”) or was not (“0”) the first study a subject had participated in and, if not, how many other studies they had previously participated in.

  • LastMealWhen and LastMealWhat - a measure of how many hours it had been since the participant had previously eaten and what they had eaten at the time they last ate.

  • Breakfast and BreakfastWhat - an indication of whether a participant did (“1”) or did not (“0”) eat breakfast on the day of testing and, if so, what their breakfast consisted of.

  • Alcohol and DrinkPerWeek - an indication of whether a participant does (“1”) or does not (“0”) drink alcohol and, if so, how many drinks per week they typically consume.

  • CoffeeTea - an indication of whether a participant is (“1”) or is not (“0”) a regular coffee or tea drinker.

  • Exercise and ExerciseFreq - an indication of whether a participant does (“1”) or does not (“0”) exercise on a regular basis and, if so, how many times they typically exercise per week.

  • SleepAvg and SleepLastNight - a measure of the number of hours a participant typically sleeps for each night and the number of hours they slept for the night prior to testing.

  • Tired - a measure of self-reported tiredness, scored from 1 (not tired) to 7 (very tired).

  • ExpDifficulty - a measure of self-reported task difficulty, scored from 1 (easy) to 7 (difficult).

  • GiveUp - an indication of whether a participant did (“1”) or did not (“0”) give up during the study.

  • MostlyGuess - an indication of whether a participant did (“1”) or did not (“0”) report that they “mostly guessed” during the study.

  • X1_Block to X4_Block - proportion of items responded to correctly for blocks 1 to 4 of the category learning task.

  • Total - proportion of items responded to correctly across all four blocks of the category learning task.


Problem 1

The first set of analyses will involve the calculation of descriptive statistics and the production of some basic figures. For these purposes, we will focus primarily on the following variables:

  1. Total - a continuous dependent variable (DV).

  2. Cat - a nominal independent variable (IV).

  3. Month, Time2, and DayofWeek - ordinal IVs. Note that the data collected in May was collected as part of an unrelated pilot study. Therefore, we will remove the May data from our data set.

# Remove the May rows, then drop the now-empty "05_May" factor level:
NoMayData = subset(Data, Month != "05_May")
CatData = droplevels(NoMayData)

All subsequent analyses will be conducted on this CatData data set.
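
As a small illustration of why droplevels() follows subset() here, the sketch below uses a made-up data frame (toy and its values are hypothetical and are not taken from ModifiedFullData.csv). subset() removes the rows but leaves the unused factor level behind, which would otherwise show up as an empty group in later tables and plots.

```r
# Hypothetical toy data; the real data file is not used here.
toy <- data.frame(
  Month = factor(c("01_Jan", "05_May", "09_Sept", "05_May")),
  Total = c(0.71, 0.55, 0.76, 0.60)
)

# subset() removes the May rows but keeps "05_May" as an empty factor level.
no_may <- subset(toy, Month != "05_May")
levels_before <- levels(no_may$Month)

# droplevels() discards the unused level.
no_may <- droplevels(no_may)
levels_after <- levels(no_may$Month)

print(levels_before)
print(levels_after)
```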

Analysis Prep

Load Libraries

The following libraries will be used for this analysis:

# For creating themed html files:

# install.packages("prettydoc")
library(prettydoc)

# For calculating descriptive statistics:

# install.packages("Rmisc")
library(Rmisc)

# For formatting tables:

# install.packages("knitr")
library(knitr)

# install.packages ("kableExtra")
library(kableExtra)

# For using pipes and plotting performance with ggplot2:

# install.packages("tidyverse")
library(tidyverse)

Rename Factor Levels

Before we begin, we’ll rename and reorder the levels of the variables we’ll be using.

# Reorder the Cat variable:

CatData$Cat = factor(CatData$Cat, levels = c("RD", "II"))

# Rename the Month variable:

levels(CatData$Month) = c("Jan", "Feb", "Mar", "Apr", "Sept", "Oct", "Nov")

# Rename the Time2 variable:

levels(CatData$Time2) = c("Morning", "Early Afternoon", "Late Afternoon")

# Rename the DayofWeek variable:

levels(CatData$DayofWeek) = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
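
Assigning to levels() renames levels by position, so it is worth printing the existing level order before overwriting it. The sketch below shows the idea on a hypothetical factor; the level names are made up and are not the ones stored in the data file.

```r
# Hypothetical toy factor; these level names are assumptions for illustration.
time2 <- factor(c("2_EarlyAft", "1_Morning", "3_LateAft", "1_Morning"))

# Levels are sorted alphabetically by default, so check the order first.
old_levels <- levels(time2)
print(old_levels)

# The first new label replaces the first existing level, and so on.
levels(time2) <- c("Morning", "Early Afternoon", "Late Afternoon")

# The observations keep their original order; only the labels change.
print(as.character(time2))
```

If the printed order does not match the order of the new labels, the observations would be silently mislabelled, so the check is cheap insurance.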

Basic Descriptive Statistics

We’ll start by calculating some basic descriptive statistics (ns, Ms, SDs, SEs, and 95% CIs) for the DV across levels of each of the IVs.

Category Set

# Calculate summary statistics:

CatDescs = summarySE(data = CatData, measurevar = "Total",
                     groupvars = "Cat", conf.interval = .95)

# Create a table to display the results:
                  
kable(CatDescs, digits = 4,
      caption = "Table 1. Descriptives by category set.",
      col.names = c("Category Set", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 1. Descriptives by category set.

Category Set  n    M       SD      SE      CI
RD            249  0.7559  0.1015  0.0064  0.0127
II            207  0.6586  0.0804  0.0056  0.0110
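
The SE and CI columns can be checked by hand. The sketch below assumes that the SE is the SD divided by the square root of n and that the CI column is the half-width of a t-based interval with n - 1 degrees of freedom; it uses the rounded n and SD reported for the RD row, so agreement is only expected to about four decimal places.

```r
# Values copied from the RD row of Table 1 (already rounded).
n <- 249
s <- 0.1015

# Standard error of the mean.
se <- s / sqrt(n)

# Half-width of the 95% confidence interval (t distribution, n - 1 df).
ci <- qt(0.975, df = n - 1) * se

print(round(c(se = se, ci = ci), 4))
```

Under these assumptions the 95% interval for the RD mean is M ± CI.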

Month

# Calculate summary statistics:

MonthDescs = summarySE(data = CatData, measurevar = "Total",
                       groupvars = "Month", conf.interval = .95)

# Create a table to display the results:

kable(MonthDescs, digits = 4,
      caption = "Table 2. Descriptives by month.",
      col.names = c("Month", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 2. Descriptives by month.

Month  n    M       SD      SE      CI
Jan    64   0.7141  0.1082  0.0135  0.0270
Feb    39   0.6698  0.1077  0.0172  0.0349
Mar    64   0.6674  0.1199  0.0150  0.0299
Apr    72   0.7103  0.1121  0.0132  0.0263
Sept   95   0.7627  0.0871  0.0089  0.0177
Oct    108  0.7098  0.0841  0.0081  0.0161
Nov    14   0.6965  0.0719  0.0192  0.0415

Time of Day

# Calculate summary statistics:

TimeDescs = summarySE(data = CatData, measurevar = "Total",
                      groupvars = "Time2", conf.interval = .95)

# Create a table to display the results:

kable(TimeDescs, digits = 4,
      caption = "Table 3. Descriptives by time of day.",
      col.names = c("Time", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 3. Descriptives by time of day.

Time             n    M       SD      SE      CI
Morning          151  0.7108  0.1022  0.0083  0.0164
Early Afternoon  170  0.7040  0.1057  0.0081  0.0160
Late Afternoon   135  0.7224  0.1051  0.0090  0.0179

Day of Week

# Calculate summary statistics:

DayDescs = summarySE(data = CatData, measurevar = "Total",
                     groupvars = "DayofWeek", conf.interval = .95)

# Create a table to display the results:

kable(DayDescs, digits = 4,
      caption = "Table 4. Descriptives by day.",
      col.names = c("Day", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 4. Descriptives by day.

Day    n    M       SD      SE      CI
Mon    66   0.7419  0.0896  0.0110  0.0220
Tues   115  0.7243  0.1028  0.0096  0.0190
Wed    82   0.7156  0.0995  0.0110  0.0219
Thurs  127  0.6856  0.1026  0.0091  0.0180
Fri    53   0.7010  0.1191  0.0164  0.0328
Sat    13   0.7217  0.1263  0.0350  0.0763

Complex Descriptive Statistics

Next we’ll calculate descriptive statistics for the DV across levels of the Month, Time2, and DayofWeek variables crossed with the Cat variable.

Month by Category Set

# Calculate summary statistics:

CMDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "Month"), conf.interval = .95)

# Create a table to display the results:

kable(CMDescs, digits = 4,
      caption = "Table 5. Descriptives by category set and month.",
      col.names = c("Category Set", "Month", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 5. Descriptives by category set and month.

Category Set  Month  n   M       SD      SE      CI
RD            Jan    25  0.7771  0.1099  0.0220  0.0454
RD            Feb    15  0.7253  0.1396  0.0360  0.0773
RD            Mar    33  0.7216  0.1272  0.0221  0.0451
RD            Apr    41  0.7534  0.1139  0.0178  0.0360
RD            Sept   68  0.7875  0.0850  0.0103  0.0206
RD            Oct    63  0.7411  0.0721  0.0091  0.0182
RD            Nov    4   0.7417  0.0319  0.0159  0.0507
II            Jan    39  0.6736  0.0865  0.0138  0.0280
II            Feb    24  0.6352  0.0635  0.0130  0.0268
II            Mar    31  0.6098  0.0791  0.0142  0.0290
II            Apr    31  0.6534  0.0808  0.0145  0.0296
II            Sept   27  0.7001  0.0557  0.0107  0.0220
II            Oct    45  0.6659  0.0807  0.0120  0.0242
II            Nov    10  0.6784  0.0765  0.0242  0.0548

Time of Day by Category Set

# Calculate summary statistics:

CTDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "Time2"), conf.interval = .95)

# Create a table to display the results:

kable(CTDescs, digits = 4,
      caption = "Table 6. Descriptives by category set and time of day.",
      col.names = c("Category Set", "Time", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 6. Descriptives by category set and time of day.

Category Set  Time             n   M       SD      SE      CI
RD            Morning          76  0.7639  0.0916  0.0105  0.0209
RD            Early Afternoon  87  0.7482  0.1058  0.0113  0.0225
RD            Late Afternoon   86  0.7565  0.1060  0.0114  0.0227
II            Morning          75  0.6570  0.0825  0.0095  0.0190
II            Early Afternoon  83  0.6577  0.0839  0.0092  0.0183
II            Late Afternoon   49  0.6626  0.0720  0.0103  0.0207

Day of Week by Category Set

# Calculate summary statistics:

CDDescs = summarySE(data = CatData, measurevar = "Total",
                    groupvars = c("Cat", "DayofWeek"), 
                    conf.interval = .95)

# Create a table to display the results:

kable(CDDescs, digits = 4,
      caption = "Table 7. Descriptives by category set and day.",
      col.names = c("Category Set", "Day", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 7. Descriptives by category set and day.

Category Set  Day    n   M       SD      SE      CI
RD            Mon    43  0.7761  0.0859  0.0131  0.0264
RD            Tues   65  0.7750  0.0891  0.0110  0.0221
RD            Wed    46  0.7550  0.1015  0.0150  0.0302
RD            Thurs  63  0.7276  0.0996  0.0126  0.0251
RD            Fri    26  0.7369  0.1307  0.0256  0.0528
RD            Sat    6   0.7901  0.1580  0.0645  0.1658
II            Mon    23  0.6781  0.0558  0.0116  0.0241
II            Tues   50  0.6583  0.0799  0.0113  0.0227
II            Wed    36  0.6652  0.0707  0.0118  0.0239
II            Thurs  64  0.6443  0.0882  0.0110  0.0220
II            Fri    27  0.6664  0.0969  0.0187  0.0383
II            Sat    7   0.6630  0.0489  0.0185  0.0453

Plotting Performance

Now we’ll create some plots so that we can visualize the data.

Month by Category Set

We’ll begin with a basic bar plot. (Bar plots may be less useful for visualizing data ranges and distributions than violin plots; however, when a variable has as many levels as the Month variable does, the violin plot becomes crowded and horizontally compressed.)

The following code will produce a bar plot with “Month” on the x-axis, “Proportion Correct” on the y-axis, and separate bars for each of the conditions. Error bars represent SEs.

  1. First we’ll define the values to be used for the plot.
# Calculate SE min:

CMMin = data.frame("CMMin" = CMDescs$Total - CMDescs$se)

# Calculate SE max:

CMMax = data.frame("CMMax" = CMDescs$Total + CMDescs$se)

# Create data frame of values to be used:

CMData = data.frame("Cat" = CMDescs$Cat,
                    "Month" = CMDescs$Month,
                    "CMMean" = CMDescs$Total,
                    "CMMin" = CMMin,
                    "CMMax" = CMMax)
  2. Next, we’ll plot the data.
ggplot(CMData, aes(Month, CMMean, fill = Cat )) +
  geom_col(color = "black", position = "dodge", alpha = .7) +
  
  
  # Add in error bars:
  
  geom_errorbar(aes(ymin = CMMin, ymax = CMMax),
    color = "black", 
    position = position_dodge(width = 0.9),
    width = .1) +
  
  # Add labels:
  
  labs(x = "Month", y = "Proportion Correct",
       fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Month") +
  
  # Define the vertical size of the plot:
  
  ylim(0, 1) + 
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light() 

Time of Day by Category Set

The Time2 variable has only 3 levels. It is an ideal candidate, therefore, for a violin plot.

The following code will produce a split violin plot with “Time of Day” on the x-axis, “Proportion Correct” on the y-axis, and separate data clouds for each of the conditions. Dots and lines represent means and 95% CIs, respectively.

  1. We’ll begin by defining a function that will create split violin plots. The code below was taken from DeBruine (2018).
GeomSplitViolin <- ggproto(
  "GeomSplitViolin", 
  GeomViolin, 
  draw_group = function(self, data, ..., draw_quantiles = NULL) {
    data <- transform(data, 
                      xminv = x - violinwidth * (x - xmin), 
                      xmaxv = x + violinwidth * (xmax - x))
    grp <- data[1,'group']
    newdata <- plyr::arrange(
      transform(data, x = if(grp%%2==1) xminv else xmaxv), 
      if(grp%%2==1) y else -y
    )
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ],
                     newdata[1, ])
    newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1,
                                                                      'x']) 
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
      stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
      quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                           draw_quantiles)
      aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
                                                          c("x", "y")),
                         drop = FALSE]
      aesthetics$alpha <- rep(1, nrow(quantiles))
      both <- cbind(quantiles, aesthetics)
      quantile_grob <- GeomPath$draw_panel(both, ...)
      ggplot2:::ggname("geom_split_violin", 
                       grid::grobTree(GeomPolygon$draw_panel(newdata, ...),
                                      quantile_grob))
    } else {
      ggplot2:::ggname("geom_split_violin",
                       GeomPolygon$draw_panel(newdata, ...))
    }
  }
)

geom_split_violin <- function (mapping = NULL, 
                               data = NULL, 
                               stat = "ydensity", 
                               position = "identity", ..., 
                               draw_quantiles = NULL, 
                               trim = TRUE, 
                               scale = "area", 
                               na.rm = FALSE, 
                               show.legend = NA, 
                               inherit.aes = TRUE) {
  layer(data = data, 
        mapping = mapping, 
        stat = stat, 
        geom = GeomSplitViolin, 
        position = position, 
        show.legend = show.legend, 
        inherit.aes = inherit.aes, 
        params = list(trim = trim, 
                      scale = scale, 
                      draw_quantiles = draw_quantiles, 
                      na.rm = na.rm, ...)
        )
}
  2. Next, we’ll define the values to be used for the dots and lines in the plot.
# Calculate CI min:

CTMin = data.frame("CTMin" = CTDescs$Total - CTDescs$ci)

# Calculate CI max:

CTMax = data.frame("CTMax" = CTDescs$Total + CTDescs$ci)

# Create data frame of values to be used:

CTData = data.frame("Cat" = CTDescs$Cat, "Time2" = CTDescs$Time2,
                    "CTMean" = CTDescs$Total, "CTMin" = CTMin,
                    "CTMax" = CTMax)
  3. Finally, we’ll plot the data.
 CatData %>%
  ggplot(aes(Time2, Total, fill = Cat)) +
  geom_split_violin(color="black", trim=FALSE, alpha = 0.7) +

  # Add in dots and lines:
  
  geom_pointrange(data = CTData,
    aes(Time2, CTMean, ymin = CTMin, ymax = CTMax),
    color = "black", 
    shape = 20,
    position = position_dodge(width = 0.25)) +
  
  # Add labels:
  
  labs(x = "Time of Day", y = "Proportion Correct",
       fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Time of Day") +
  
  # Define the vertical size of the plot:
  
  ylim(0.3, 1) + 
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light() 

Day of Week by Category Set

Just for fun, we’ll create a notched box plot to display the last combination of variables.

The following code will produce a notched box plot with “Day” on the x-axis, “Proportion Correct” on the y-axis, and separate boxes for each of the conditions. Notches represent a CI around the median. (Note that the horn-like features on the last two boxes indicate that the CI around the median extends beyond the box, that is, beyond the interquartile range. This tends to happen when a cell contains few observations.)

  1. Plot the data.
ggplot(CatData, aes(x = DayofWeek, y = Total, fill = Cat)) +
  geom_boxplot(outlier.color = "black",
               outlier.shape = 16, outlier.size = 2,
               notch = TRUE, position = position_dodge(1), alpha = .7) +

# Add labels:
  
  labs(x = "Day", y = "Proportion Correct", fill = "Category Condition") +
  ggtitle("Category Learning Performance by Condition and Day") +
  
  # Define the vertical size of the plot:
  
  ylim(0.4, 1) + 
  
  # Define variable colours and theme:
  
  scale_fill_manual(values = c("orchid3", "lightseagreen")) +
  scale_color_manual(values = c("orchid3", "lightseagreen")) +
  theme_light() 
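
The “horns” on a notched box plot follow from how the notch is computed. According to the ggplot2 documentation, notches extend 1.58 * IQR / sqrt(n) on either side of the median, so a small n gives a wide notch. The sketch below works through this for a hypothetical cell of six scores (the values in x are made up, not taken from the data set) and checks whether the notch runs past the quartiles.

```r
# Hypothetical accuracy scores for one small cell (n = 6).
x <- c(0.55, 0.70, 0.80, 0.85, 0.95, 0.97)

# Quartiles and interquartile range.
q <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Notch limits: median +/- 1.58 * IQR / sqrt(n).
notch <- q[2] + c(-1.58, 1.58) * iqr / sqrt(length(x))

# TRUE when the notch extends past the box, which is drawn as "horns".
horns <- notch[1] < q[1] || notch[2] > q[3]

print(round(notch, 3))
print(horns)
```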


Problem 2

The second set of analyses will involve running ANOVAs to assess the potential effects of the Cat variable, crossed with both Month and Time2, on overall category learning performance.

Analysis Prep

Load Libraries

The following libraries will be used for this analysis:

# For running Levene's test:

# install.packages("car")
library(car)

# For performing ANOVAs:

# install.packages("ez")
library(ez)

# For conducting Games-Howell post-hocs:

# install.packages("userfriendlyscience")
library(userfriendlyscience)

Adjust Display Options

ezANOVA prints output using scientific notation. In order to make it easier to read our ANOVA outputs, we’ll turn the scientific notation option off.

options(scipen = 999)

p Value Rounding Function

We’ll also create a function to assess and print p values in the comments of our script. If p >= .005, the function will display “p =” and the value rounded to two decimal places. If .0005 <= p < .005, the function will display “p =” and the value rounded to three decimal places. If p < .0005, the function will display “p < .001.”

p_round <- function(x) {
  # Takes a single p value and returns a string such as "= 0.84" or "< .001".
  if (x > .005) {
    x1 = paste("= ", round(x, digits = 2), sep = '')
  } else if (x == .005) {
    x1 = "= .01"
  } else if (x > .0005 && x < .005) {
    x1 = paste("= ", round(x, digits = 3), sep = '')
  } else if (x == .0005) {
    x1 = "= .001"
  } else {
    x1 = "< .001"
  }
  x1
}
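
To show the three reporting bands in action, the sketch below applies a condensed restatement of the thresholds described above to three p values that appear in this notebook’s output. p_round_sketch is a hypothetical helper written for this illustration; it is not a drop-in replacement for p_round, and it does not special-case the exact boundary values.

```r
# Condensed restatement of the thresholds (hypothetical helper).
p_round_sketch <- function(p) {
  if (p >= .005) {
    paste0("= ", round(p, digits = 2))   # two decimal places
  } else if (p >= .0005) {
    paste0("= ", round(p, digits = 3))   # three decimal places
  } else {
    "< .001"                             # too small to report exactly
  }
}

# One p value from each band.
examples <- c(0.8418, 0.0012, 0.0000001922)
formatted <- vapply(examples, p_round_sketch, character(1))
print(formatted)
```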

Partial Eta Squared

In some cases, we will have to use adjusted dfs and/or perform White-adjusted ANOVAs. In these cases, we will have to calculate adjusted effect sizes. Partial eta squared can be calculated using the following formula, for which we will create a function: \[\eta^2_{partial} = {\frac{df_n F}{df_n F + df_d}}\]

peta <- function(dfn, dfd, f) {
  return(dfn * f / ((dfn * f) + dfd))
}
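
The formula follows from the definition of F. Because F = (SSn / dfn) / (SSd / dfd), multiplying through gives dfn * F / (dfn * F + dfd) = SSn / (SSn + SSd), which is the usual sum-of-squares form of partial eta squared. The sketch below checks this numerically using F, df, and SS values reported in the ANOVA tables later in this notebook; the function is restated so that the block runs on its own.

```r
# Restated from the definition above so the example is self-contained.
peta <- function(dfn, dfd, f) {
  dfn * f / ((dfn * f) + dfd)
}

# Month by category set ANOVA: effect sizes from F and df alone.
cm <- c(Cat       = peta(dfn = 1, dfd = 442, f = 80.1292),
        Month     = peta(dfn = 6, dfd = 442, f = 5.8585),
        CatXMonth = peta(dfn = 6, dfd = 442, f = 0.4544))
print(round(cm, 4))

# Time of day by category set ANOVA: compare with SSn / (SSn + SSd) for Cat.
from_f  <- peta(dfn = 1, dfd = 450, f = 120.0936)
from_ss <- 1.0344 / (1.0344 + 3.8762)
print(round(c(from_f = from_f, from_ss = from_ss), 4))
```

The two routes agree up to the rounding of the tabled values.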

Post-Hoc Rounding Function

We’ll also create a function to help round some of our post-hoc results. (Neither the kable rounding function nor the standard “round” function will work for some of our post-hoc results tables.) The code below was taken from Akhmed (2015).

round_df <- function(df, digits) {
  nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))

  df[,nums] <- round(df[,nums], digits = digits)

  (df)
}
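
A short usage example may help: the function rounds only the numeric columns and leaves text columns untouched. The data frame below is hypothetical (toy and its values are made up), and round_df is restated so that the block runs on its own.

```r
# Restated from the definition above so the example is self-contained.
round_df <- function(df, digits) {
  nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))
  df[, nums] <- round(df[, nums], digits = digits)
  df
}

# Hypothetical post-hoc style table with one text and two numeric columns.
toy <- data.frame(pair = c("B-A", "C-A"),
                  diff = c(-0.044213, 0.048591),
                  p    = c(0.41102, 0.050187),
                  stringsAsFactors = FALSE)

rounded <- round_df(toy, digits = 2)
print(rounded)
```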

Category Set and Month

To assess the effects of category set and month on overall learning performance, we’ll conduct a 2 x 7 ANOVA with both Cat and Month as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares. Note that a White adjustment has been used; this will be discussed further in the Homogeneity of Variance section below.

Assumptions

The standard ANOVA makes three primary assumptions:

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

This assumption can be tested using a Shapiro-Wilk test.

CM_Shap = shapiro.test(CatData$Total)
CM_Shap
## 
##  Shapiro-Wilk normality test
## 
## data:  CatData$Total
## W = 0.97308, p-value = 0.0000001922

Based on an alpha level of .05, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Homogeneity of Variance

This assumption can be tested using a Levene’s test (provided as part of the ANOVA output).

# Run the ANOVA:

CM_ANOVA = ezANOVA(data = CatData, dv = .(Total),
                   wid = .(UniqueSubjNum), between = .(Cat, Month),
                   detailed = TRUE, type = "III",
                   white.adjust = TRUE, return_aov = TRUE)

# Extract the Levene's Test from the ANOVA output:

CM_Lev = CM_ANOVA$`Levene's Test for Homogeneity of Variance`

# Create a table to display the results:

kable(CM_Lev, digits = 4,
      caption = "Table 8. Month by category set Levene's test.",
      col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 8. Month by category set Levene’s test.

DFn  DFd  SSn     SSd     F      p       sig
13   442  0.1415  1.7921  2.684  0.0012  *

Based on an alpha level of .05, the assumption of homogeneity of variances is not met; F (13, 442) = 2.68, p = 0.001. Because sample sizes are unequal, a White-adjustment should be used to correct for this violation.
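
For readers unfamiliar with the test, a common form of Levene’s test is simply a one-way ANOVA on each observation’s absolute deviation from its group centre. The sketch below uses the median-centred variant on simulated data; that this is the variant reported in the ezANOVA output is an assumption, and the groups, sample sizes, and scores are all made up.

```r
set.seed(1)

# Hypothetical scores for three groups with increasingly large spread.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(20, 30, 25))),
  score = c(rnorm(20, 0.7, 0.05), rnorm(30, 0.7, 0.10), rnorm(25, 0.7, 0.15))
)

# Absolute deviation of each score from its own group's median.
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations; the F test is the Levene statistic.
lev <- anova(lm(abs_dev ~ group, data = toy))
print(lev)
```

With k groups and N observations the test has k - 1 and N - k degrees of freedom, which matches the 13 and 442 reported above for 14 cells and 456 participants.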

ANOVA

kable(CM_ANOVA$ANOVA, digits = 4,
      caption = "Table 9. Month by category set ANOVA.",
      col.names = c("Effect", "DFn", "DFd", "F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 9. Month by category set ANOVA.

Effect       DFn  DFd  F           p       sig
(Intercept)  1    442  19570.0933  0.0000  *
Cat          1    442  80.1292     0.0000  *
Month        6    442  5.8585      0.0000  *
Cat:Month    6    442  0.4544      0.8418

Because we’ve used a White adjustment, effect sizes are not provided in the output. Instead, we will use our peta function to calculate partial eta squared effect sizes.

# Calculate values:

CMC_peta = peta(dfn = CM_ANOVA$ANOVA[2,]$DFn, dfd = CM_ANOVA$ANOVA[2,]$DFd, f = CM_ANOVA$ANOVA[2,]$F)
  
CMM_peta = peta(dfn = CM_ANOVA$ANOVA[3,]$DFn, dfd = CM_ANOVA$ANOVA[3,]$DFd, f = CM_ANOVA$ANOVA[3,]$F)

CMI_peta = peta(dfn = CM_ANOVA$ANOVA[4,]$DFn, dfd = CM_ANOVA$ANOVA[4,]$DFd, f = CM_ANOVA$ANOVA[4,]$F)

# Create a data frame of the results:

petas = data.frame("Cat" = CMC_peta, "Month" = CMM_peta,
                         "Cat*Month" = CMI_peta)

# Create a table to display the results:

kable(petas, digits = 4,
      caption = "Table 10. Month by category set effect sizes.",
      col.names = c("Category Set", "Month", "Interaction"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 10. Month by category set effect sizes.

Category Set  Month   Interaction
0.1535        0.0737  0.0061

Interpretation of Effects

Interaction

The category set x month interaction was not found to be statistically significant; F (6, 442) = 0.45, p = 0.84, \(\eta^2_{p}\) = 0.01.

Main Effect of Category Set

The main effect of category set was found to be statistically significant; F (1, 442) = 80.13, p < .001, \(\eta^2_{p}\) = 0.15.

Main Effect of Month

The main effect of month was found to be statistically significant; F (6, 442) = 5.86, p < .001, \(\eta^2_{p}\) = 0.07.

Post-hoc Tests

The significant main effects of category set and month will be further assessed via post-hoc tests. Because we have unequal sample sizes and our data displayed a violation of the homogeneity of variances assumption, we will use the Games-Howell adjustment.

Main Effect of Category Set

# Calculate post-hoc:

C_post = posthocTGH(y = CatData$Total, x = CatData$Cat,
                    method = c("games-howell"), conf.level = .95,
                    digits = 4, formatPvalue = TRUE)

# Round results:

C_post_r = round_df(C_post$output$games.howell, digits = 4)

# Format the results table so that significant p-values will be bolded:

C_post_r$p = cell_spec(C_post_r$p,
                       bold = C_post_r$p < .05)

# Create a table to display the results:

kable(C_post_r, digits = 4,
      caption = "Table 11. Category set post-hoc.",
      col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 11. Category set post-hoc.

Comparison  Difference  CI Min  CI Max   t        df       p
II-RD       -0.0973     -0.114  -0.0805  11.4169  452.941  0

These results indicate that participants in the RD condition (\(M_{RD}\) = 0.76, \(SD_{RD}\) = 0.10) performed significantly better on the category learning task than participants in the II condition (\(M_{II}\) = 0.66, \(SD_{II}\) = 0.08); t (452.94) = 11.42, p < .001.
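
The t and df values in this comparison can be approximated from the descriptive statistics alone. The sketch below assumes the comparison rests on a Welch-type statistic (unpooled variances with Welch-Satterthwaite degrees of freedom), which is the building block of Games-Howell comparisons. welch_from_summary is a hypothetical helper, and the inputs are the rounded means, SDs, and ns from Table 1, so small discrepancies from the table are expected.

```r
# Hypothetical helper: Welch t and df from group summaries.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared SE of group 1 mean
  v2 <- s2^2 / n2                      # squared SE of group 2 mean
  t_stat <- abs(m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t_stat, df = df)
}

# RD versus II, using the rounded values from Table 1.
res <- welch_from_summary(m1 = 0.7559, s1 = 0.1015, n1 = 249,
                          m2 = 0.6586, s2 = 0.0804, n2 = 207)
print(round(res, 2))
```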

Main Effect of Month

# Calculate post-hoc: 

M_post = posthocTGH(y = CatData$Total, x = CatData$Month,
                    method = c("games-howell"), conf.level = .95,
                    digits = 4, formatPvalue = TRUE)

# Round results:

M_post_r = round_df(M_post$output$games.howell, digits = 4)

# Format the results table so that significant p-values will be bolded:

M_post_r$p = cell_spec(M_post_r$p,
                       bold = M_post_r$p < .05)

# Create a table to display the results:

kable(M_post_r, digits = 4,
      caption = "Table 12. Month post-hoc.",
      col.names = c("Difference", "CI Min", "CI Max", "t", "df", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"), 
                full_width = F, position = "center")
Table 12. Month post-hoc.

Comparison  Difference  CI Min   CI Max   t       df        p
Feb-Jan     -0.0442     -0.1105  0.0220   2.0190  80.7115   0.411
Mar-Jan     -0.0467     -0.1071  0.0138   2.3118  124.6992  0.2467
Apr-Jan     -0.0037     -0.0603  0.0529   0.1976  133.0743  1
Sept-Jan    0.0486      0.0000   0.0972   2.9994  115.2892  0.0502
Oct-Jan     -0.0043     -0.0517  0.0431   0.2730  108.0980  1
Nov-Jan     -0.0175     -0.0922  0.0571   0.7467  27.6581   0.9881
Mar-Feb     -0.0024     -0.0714  0.0665   0.1059  87.1016   1
Apr-Feb     0.0405      -0.0252  0.1062   1.8650  80.7967   0.5091
Sept-Feb    0.0928      0.0336   0.1521   4.7817  59.4204   0.0002
Oct-Feb     0.0399      -0.0183  0.0982   2.0966  55.6527   0.3689
Nov-Feb     0.0267      -0.0541  0.1074   1.0338  34.6614   0.9424
Apr-Mar     0.0429      -0.0169  0.1027   2.1491  129.5460  0.3307
Sept-Mar    0.0953      0.0428   0.1477   5.4615  106.7338  0
Oct-Mar     0.0424      -0.0089  0.0936   2.4870  100.1513  0.1752
Nov-Mar     0.0291      -0.0476  0.1058   1.1946  31.2206   0.8909
Sept-Apr    0.0523      0.0046   0.1001   3.2828  130.2505  0.0219
Oct-Apr     -0.0006     -0.0470  0.0459   0.0367  122.8858  1
Nov-Apr     -0.0138     -0.0880  0.0603   0.5923  27.0723   0.9965
Oct-Sept    -0.0529     -0.0888  -0.0170  4.3884  195.7911  0.0004
Nov-Sept    -0.0662     -0.1357  0.0034   3.1216  19.1030   0.0688
Nov-Oct     -0.0132     -0.0822  0.0557   0.6351  17.9562   0.9946

These results indicate that participants displayed significantly better performance during September (\(M_{Sept}\) = 0.76, \(SD_{Sept}\) = 0.09) than during February (\(M_{Feb}\) = 0.67, \(SD_{Feb}\) = 0.11), March (\(M_{Mar}\) = 0.67, \(SD_{Mar}\) = 0.12), April (\(M_{Apr}\) = 0.71, \(SD_{Apr}\) = 0.11), and October (\(M_{Oct}\) = 0.71, \(SD_{Oct}\) = 0.08); t (59.42) = 4.78, p < .001; t (106.73) = 5.46, p < .001; t (130.25) = 3.28, p = 0.02; and t (195.79) = 4.39, p < .001, respectively. Participants in September also performed marginally better than participants in January (\(M_{Jan}\) = 0.71, \(SD_{Jan}\) = 0.11) and November (\(M_{Nov}\) = 0.70, \(SD_{Nov}\) = 0.07); t (115.29) = 3.00, p = 0.05 and t (19.10) = 3.12, p = 0.07, respectively.

Category Set and Time of Day

To assess the effects of category set and time of day on overall learning performance, we’ll conduct a 2 x 3 ANOVA with both Cat and Time2 as between-group factors. Because we have unequal sample sizes between groups, we’ll use Type III sum of squares.

Assumptions

1. Independent Random Sampling

This assumption was met during testing.

2. Normality

As specified above in the Category Set by Month analysis, the assumption of normality is not met; W = 0.97, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Homogeneity of Variance

Again, this assumption can be tested using a Levene’s test.

# Run the ANOVA:

CT_ANOVA = ezANOVA(data = CatData, dv = .(Total),
                   wid = .(UniqueSubjNum), between = .(Cat, Time2),
                   detailed = TRUE, type = "III", return_aov = TRUE)

# Extract the Levene's Test from the ANOVA output:

CT_Lev = CT_ANOVA$`Levene's Test for Homogeneity of Variance`

# Create a table to display the results:

kable(CT_Lev, digits = 4,
      caption = "Table 13. Time of day by category set Levene's test.",
      col.names = c("DFn", "DFd", "SSn", "SSd","F", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 13. Time of day by category set Levene’s test.
DFn DFd SSn SSd F p sig
5 450 0.0337 1.8655 1.6245 0.152

Based on an alpha level of .05, the assumption of homogeneity of variances is met; F (5, 450) = 1.62, p = 0.15. A White-adjustment, therefore, does not need to be used for this analysis.

ANOVA

kable(CT_ANOVA$ANOVA, digits = 4,
      caption = "Table 14. Time of day by category set ANOVA.",
      col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F",
                    "p", "sig", "Effect Size"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 14. Time of day by category set ANOVA.
Effect DFn DFd SSn SSd F p sig Effect Size
(Intercept) 1 450 219.6694 3.8762 25502.3848 0.0000 * 0.9827
Cat 1 450 1.0344 3.8762 120.0936 0.0000 * 0.2107
Time2 2 450 0.0053 3.8762 0.3091 0.7342 0.0014
Cat:Time2 2 450 0.0058 3.8762 0.3377 0.7136 0.0015

Interpretation of Main Effects

Interaction

The category set x time of day interaction was not found to be statistically significant; F (2, 450) = 0.34, p = 0.71, \(\eta^2_{p}\) < .01.

Main Effect of Category Set

As before, the main effect of category set was found to be statistically significant; F (1, 450) = 120.09, p < .001, \(\eta^2_{p}\) = 0.21.

Main Effect of Time of Day

The main effect of time of day was not found to be statistically significant; F (2, 450) = 0.31, p = 0.73, \(\eta^2_{p}\) < .01.

Post-hoc Tests

The significant main effect of category set will be further assessed via a post-hoc test (though the test will essentially be identical to the one conducted in the Category Set by Month analysis above). Because we have only one pairwise comparison to complete, we will use the Bonferroni adjustment.

Main Effect of Category Set

C1_post = pairwise.t.test(CatData$Total, CatData$Cat,
                          p.adjust.method = "bonf")
C1_post
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  CatData$Total and CatData$Cat 
## 
##    RD                 
## II <0.0000000000000002
## 
## P value adjustment method: bonferroni

These results confirm the previous finding that participants in the RD condition performed significantly better on the category learning task than participants in the II condition (p < .001).


Problem 3

The third set of analyses will involve creating a new dataset with Cat, Month, Time2, and performance by block in long format. We will then create line plots to depict performance across block by both Month and Cat, followed by a mixed ANOVA to assess the relationships between the variables.

Analysis Prep

Data Gathering

First, we will create a subset of the variables that we are interested in.

# Specify the variables to keep:

VarsKeep = names(CatData) %in% c("UniqueSubjNum", "Cat", "Time2",  "Month", "X1_Block", "X2_Block", "X3_Block", "X4_Block")

# Define the new subset and label the columns:

Data3 = (CatData[VarsKeep])

colnames(Data3) = c("Subject", "Cat", "Time", "Month", "Block1", "Block2", "Block3", "Block4")

Now we will gather the data into long format.

CatLong = gather(Data3, Block, Performance, Block1:Block4, factor_key = TRUE)
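
As a minimal illustration of the long-format layout that the call above is expected to produce, the base-R sketch below stacks two block columns by hand. The data frame `wide` and its values are hypothetical, and only two blocks are shown for brevity.

```r
# Hypothetical wide data: one row per subject, one column per block.
wide <- data.frame(
  Subject = c(101, 102, 103),
  Cat     = c("RD", "II", "RD"),
  Block1  = c(0.60, 0.55, 0.70),
  Block2  = c(0.75, 0.60, 0.80)
)

block_cols <- c("Block1", "Block2")

# Stack the block columns: subjects repeat once per block, in the same order.
long <- data.frame(
  Subject     = rep(wide$Subject, times = length(block_cols)),
  Cat         = rep(wide$Cat, times = length(block_cols)),
  Block       = factor(rep(block_cols, each = nrow(wide)), levels = block_cols),
  Performance = unlist(wide[block_cols], use.names = FALSE)
)
```

Because the rows are stacked block by block with subjects in a fixed order, scores from the same subject occupy matching positions within each block, which is what the paired comparisons later in this section rely on.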

Plotting Performance

Now we’ll plot performance across block by both category set and month.

  1. We’ll begin by calculating the descriptive statistics to be used in the plot.
CMBDescs = summarySE(data = CatLong, measurevar = "Performance", groupvars = c("Cat", "Month", "Block"), conf.interval = .95)
  2. Next, we’ll plot performance. The following code will produce a series of line graphs with “Block” on the x-axis, “Proportion Correct” on the y-axis, separate lines for the two category conditions, and separate plots for the seven months. Error bars represent the standard error of the mean (SEM).
CMBFig = ggplot(CMBDescs, aes(x = Block, y = Performance)) +
  
  # Specify labels: 
  
  labs(x = "Block", y = "Proportion Correct") + 
  ggtitle("Category Learning Performance by Block, Month, and Category Set") +
  
  # Define line and point aesthetics:
  
  geom_line(aes(colour = Cat, group = Cat)) + geom_point(size = 2, aes(colour = Cat)) + 
  
   # Adjust axes:
  
  scale_x_discrete(limits = c("Block1", "Block2", "Block3", "Block4"),
                   labels = c("Block1" = "1", "Block2" = "2", "Block3" = "3", "Block4" = "4")) + 
  scale_y_continuous(limits = c(0.4, 1.0), breaks = seq(0.4, 1.0, .1)) +  
  
  # Add legend and adjust colours:
  
  scale_colour_manual(name = "Category Set", values = c("orchid3", "lightseagreen")) + 
  
  # Add error bars:
  
  geom_errorbar(data = CMBDescs, mapping = aes(x = Block, ymin = Performance - se, ymax = Performance + se), width = .1) + 
  
  # Specify theme:
  
  theme_bw() + theme(plot.title = element_text(hjust = .5)) 

# Use facet wrap to divide the plot by month:

CMBFig + facet_wrap( ~ Month, ncol = 2)

Mixed ANOVA

Now we’ll conduct a Type III, 2 x 7 x 4 mixed ANOVA with category condition and month as between-group factors and block as a within-group factor.

Assumptions

The mixed ANOVA makes four primary assumptions:

  1. Independent Random Sampling

This assumption was met during testing.

  2. Normality

This assumption can be tested by applying a Shapiro-Wilk test to the outcome measure (i.e. performance).

CMB_Shap = shapiro.test(CatLong$Performance)
CMB_Shap
## 
##  Shapiro-Wilk normality test
## 
## data:  CatLong$Performance
## W = 0.97786, p-value = 0.0000000000000003455

Based on an alpha level of .05, the assumption of normality is not met; W = 0.98, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

  3. Homogeneity of Variance for Between-Group Factors

This assumption can be tested by implementing a Levene’s test on performance collapsed across the within-group factor (i.e. performance averaged across Block).

CMB_Lev = leveneTest(data = CatData, Total ~ Cat*Month, center = median)
CMB_Lev
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value   Pr(>F)   
## group  13   2.684 0.001205 **
##       442                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on an alpha level of .05, the assumption of homogeneity of between-group variances is not met; F(13, 442) = 2.68, p = 0.001. Because sample sizes are unequal, a White-corrected F-test should be conducted if one is interested in assessing the main effect of condition.
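
For readers unfamiliar with what a median-centred Levene's test computes, the sketch below writes it out in base R as a one-way ANOVA on absolute deviations from each group's median. The helper `levene_by_hand` and the example scores are hypothetical and are not part of the analysis above.

```r
# Median-centred Levene's test written out by hand (base R only).
levene_by_hand <- function(y, group) {
  group <- factor(group)
  # Absolute deviation of each score from its own group's median
  abs_dev <- abs(y - ave(y, group, FUN = median))
  # One-way ANOVA on those deviations
  anova(lm(abs_dev ~ group))
}

# Hypothetical scores for three groups with visibly different spreads
scores <- c(0.70, 0.71, 0.69, 0.70,
            0.60, 0.80, 0.50, 0.90,
            0.65, 0.75, 0.70, 0.72)
groups <- rep(c("a", "b", "c"), each = 4)

lev <- levene_by_hand(scores, groups)
lev
```

A large F value in this table indicates that the typical deviation from the group median differs across groups, i.e. that the variances are unlikely to be homogeneous.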

  4. Sphericity

This assumption can be tested using Mauchly’s sphericity test (provided as part of the ANOVA output).

# Run the ANOVA:

CMB_ANOVA = ezANOVA(data = CatLong, dv = .(Performance), wid = .(Subject), within = .(Block), between = .(Cat, Month), detailed = TRUE, type = "III", return_aov = TRUE, white.adjust = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_Mau = CMB_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_Mau, digits = 4,
      caption = "Table 15. Test of sphericity on performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 15. Test of sphericity on performance by block.
Effect W p sig
5 Block 0.7442 0
6 Cat:Block 0.7442 0
7 Month:Block 0.7442 0
8 Cat:Month:Block 0.7442 0

Based on an alpha level of .05, the assumption of sphericity is not met for any of the effects involving the within-group factor (W = 0.74, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:

CMB_Eps = CMB_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_Eps, digits = 4,
      caption = "Table 16. Epsilon corrections for the test of performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 16. Epsilon corrections for the test of performance by block.
Effect GG Epsilon GG p GG sig HF Epsilon HF p HF sig
5 Block 0.8305 0.0000 * 0.8356 0.0000 *
6 Cat:Block 0.8305 0.0000 * 0.8356 0.0000 *
7 Month:Block 0.8305 0.0883 0.8356 0.0877
8 Cat:Month:Block 0.8305 0.5984 0.8356 0.5990

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.84) to these tests (as suggested by Girden, 1992).
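
The decision rule above, and the way an epsilon correction rescales the degrees of freedom, can be sketched as follows. The helper `correct_df` is hypothetical (it is not part of the ez package), and the uncorrected degrees of freedom of 3 and 1326 are assumed from the design (4 blocks minus 1, and 442 error df times 3).

```r
# Choose a sphericity correction from the Greenhouse-Geisser epsilon and
# scale both degrees of freedom by the chosen epsilon.
# `correct_df` is a hypothetical helper, not part of the ez package.
correct_df <- function(dfn, dfd, gg_eps, hf_eps, cutoff = 0.75) {
  use_hf <- gg_eps > cutoff
  eps <- if (use_hf) hf_eps else gg_eps
  list(
    correction = if (use_hf) "Huynh-Feldt" else "Greenhouse-Geisser",
    epsilon    = eps,
    DFn        = dfn * eps,
    DFd        = dfd * eps
  )
}

# Epsilons copied from Table 16; uncorrected df of 3 and 1326 are assumed.
block_df <- correct_df(dfn = 3, dfd = 1326, gg_eps = 0.8305, hf_eps = 0.8356)
block_df
```

Because the epsilons in Table 16 are rounded to four decimals, the corrected values from this sketch agree with the output of the full analysis only approximately.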

Effect Sizes

Because we’ve used a White adjustment and adjusted df’s, we’ll have to calculate adjusted effect sizes.

# Cat:

CMB_C_peta = peta(dfn = CMB_ANOVA$ANOVA[2,]$DFn, dfd = CMB_ANOVA$ANOVA[2,]$DFd, f = CMB_ANOVA$ANOVA[2,]$F)

# Month:

CMB_M_peta = peta(dfn = CMB_ANOVA$ANOVA[3,]$DFn, dfd = CMB_ANOVA$ANOVA[3,]$DFd, f = CMB_ANOVA$ANOVA[3,]$F)

# Block:

CMB_B_peta = peta(dfn = CMB_ANOVA$ANOVA[4,]$DFn, dfd = CMB_ANOVA$ANOVA[4,]$DFd, f = CMB_ANOVA$ANOVA[4,]$F)

# Cat*Month:

CMB_CM_peta = peta(dfn = CMB_ANOVA$ANOVA[5,]$DFn, dfd = CMB_ANOVA$ANOVA[5,]$DFd, f = CMB_ANOVA$ANOVA[5,]$F)

# Cat*Block:

CMB_CB_peta = peta(dfn = CMB_ANOVA$ANOVA[6,]$DFn, dfd = CMB_ANOVA$ANOVA[6,]$DFd, f = CMB_ANOVA$ANOVA[6,]$F)

# Month*Block:

CMB_MB_peta = peta(dfn = CMB_ANOVA$ANOVA[7,]$DFn, dfd = CMB_ANOVA$ANOVA[7,]$DFd, f = CMB_ANOVA$ANOVA[7,]$F)

# Cat*Month*Block:

CMB_CMB_peta = peta(dfn = CMB_ANOVA$ANOVA[8,]$DFn, dfd = CMB_ANOVA$ANOVA[8,]$DFd, f = CMB_ANOVA$ANOVA[8,]$F)
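
For reference, partial eta squared can be recovered from an F ratio and its degrees of freedom as \(\eta^2_{p} = \frac{F \cdot df_n}{F \cdot df_n + df_d}\). The sketch below assumes that this is the calculation being performed; `peta_from_f` is a hypothetical stand-in, and the notebook's own `peta()` function may be implemented differently.

```r
# Partial eta squared recovered from an F ratio and its degrees of freedom.
# `peta_from_f` is a hypothetical stand-in for the notebook's `peta()`.
peta_from_f <- function(f, dfn, dfd) {
  (f * dfn) / (f * dfn + dfd)
}

# Category set effect, using the F and df values reported in Table 17
peta_cat <- peta_from_f(f = 43.7664, dfn = 1, dfd = 442)
round(peta_cat, 4)
```

With these inputs the result rounds to 0.0901, which agrees with the effect size listed for Cat in Table 17.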

Interpretation of Main Effects

# Create data frames for the epsilon corrected tests:

# Cat:

CMB_C_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[2], "DFn" = CMB_ANOVA$ANOVA$DFn[2], "DFd" = CMB_ANOVA$ANOVA$DFd[2], "SSn" = CMB_ANOVA$ANOVA$SSn[2], "SSd" = CMB_ANOVA$ANOVA$SSd[2], "F" = CMB_ANOVA$ANOVA$F[2], "p" = CMB_ANOVA$ANOVA$p[2], "sig" = CMB_ANOVA$ANOVA$`p<.05`[2], "peta" = CMB_C_peta)

# Month:

CMB_M_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[3], "DFn" = CMB_ANOVA$ANOVA$DFn[3], "DFd" = CMB_ANOVA$ANOVA$DFd[3], "SSn" = CMB_ANOVA$ANOVA$SSn[3], "SSd" = CMB_ANOVA$ANOVA$SSd[3], "F" = CMB_ANOVA$ANOVA$F[3], "p" = CMB_ANOVA$ANOVA$p[3], "sig" = CMB_ANOVA$ANOVA$`p<.05`[3], "peta" = CMB_M_peta)

# Block:

CMB_B_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[4], "DFn" = CMB_ANOVA$ANOVA$DFn[4] * CMB_Eps$HFe[1], "DFd" = CMB_ANOVA$ANOVA$DFd[4] * CMB_Eps$HFe[1], "SSn" = CMB_ANOVA$ANOVA$SSn[4], "SSd" = CMB_ANOVA$ANOVA$SSd[4], "F" = CMB_ANOVA$ANOVA$F[4], "p" = CMB_Eps$`p[HF]`[1], "sig" = CMB_Eps$`p[HF]<.05`[1], "peta" = CMB_B_peta)

# Cat*Month:

CMB_CM_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[5], "DFn" = CMB_ANOVA$ANOVA$DFn[5], "DFd" = CMB_ANOVA$ANOVA$DFd[5], "SSn" = CMB_ANOVA$ANOVA$SSn[5], "SSd" = CMB_ANOVA$ANOVA$SSd[5], "F" = CMB_ANOVA$ANOVA$F[5], "p" = CMB_ANOVA$ANOVA$p[5], "sig" = CMB_ANOVA$ANOVA$`p<.05`[5], "peta" = CMB_CM_peta)

# Cat*Block:

CMB_CB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[6], "DFn" = CMB_ANOVA$ANOVA$DFn[6] * CMB_Eps$HFe[2], "DFd" = CMB_ANOVA$ANOVA$DFd[6] * CMB_Eps$HFe[2], "SSn" = CMB_ANOVA$ANOVA$SSn[6], "SSd" = CMB_ANOVA$ANOVA$SSd[6], "F" = CMB_ANOVA$ANOVA$F[6], "p" = CMB_Eps$`p[HF]`[2], "sig" = CMB_Eps$`p[HF]<.05`[2], "peta" = CMB_CB_peta)

# Month*Block:

CMB_MB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[7], "DFn" = CMB_ANOVA$ANOVA$DFn[7] * CMB_Eps$HFe[3], "DFd" = CMB_ANOVA$ANOVA$DFd[7] * CMB_Eps$HFe[3], "SSn" = CMB_ANOVA$ANOVA$SSn[7], "SSd" = CMB_ANOVA$ANOVA$SSd[7], "F" = CMB_ANOVA$ANOVA$F[7], "p" = CMB_Eps$`p[HF]`[3], "sig" = CMB_Eps$`p[HF]<.05`[3], "peta" = CMB_MB_peta)

# Cat*Month*Block:

CMB_CMB_ANOVA = data.frame("Effect" = CMB_ANOVA$ANOVA$Effect[8], "DFn" = CMB_ANOVA$ANOVA$DFn[8] * CMB_Eps$HFe[4], "DFd" = CMB_ANOVA$ANOVA$DFd[8] * CMB_Eps$HFe[4], "SSn" = CMB_ANOVA$ANOVA$SSn[8], "SSd" = CMB_ANOVA$ANOVA$SSd[8], "F" = CMB_ANOVA$ANOVA$F[8], "p" = CMB_Eps$`p[HF]`[4], "sig" = CMB_Eps$`p[HF]<.05`[4], "peta" = CMB_CMB_peta)

# Combine into a single data frame:

CMB_ANOVA_Corr = rbind(CMB_C_ANOVA, CMB_M_ANOVA, CMB_B_ANOVA, CMB_CM_ANOVA, CMB_CB_ANOVA, CMB_MB_ANOVA, CMB_CMB_ANOVA)

# Create a table to display the results:

kable(CMB_ANOVA_Corr, digits = 4,
      caption = "Table 17. Mixed ANOVA on performance across block by month and category set.",
      col.names = c("Effect", "DFn", "DFd", "SSn", "SSd", "F", "p", "sig", "peta"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 17. Mixed ANOVA on performance across block by month and category set.
Effect DFn DFd SSn SSd F p sig peta
Cat 1.0000 442.000 1.5916 16.0741 43.7664 0.0000 * 0.0901
Month 6.0000 442.000 1.1411 16.0741 5.2295 0.0000 * 0.0663
Block 2.5067 1107.943 1.5776 5.9090 118.0047 0.0000 * 0.2107
Cat:Month 6.0000 442.000 0.2320 16.0741 1.0632 0.3839 0.0142
Cat:Block 2.5067 1107.943 0.1618 5.9090 12.1035 0.0000 * 0.0267
Month:Block 15.0400 1107.943 0.1226 5.9090 1.5283 0.0877 0.0203
Cat:Month:Block 15.0400 1107.943 0.0698 5.9090 0.8697 0.5990 0.0117

Cat*Month*Block:

Based on an alpha level of .05, the three-way interaction between category set, month, and block was not found to be statistically significant; F (15.04, 1107.94) = 0.87, p = 0.60, \(\eta^2_{p}\) = 0.01.

Month*Block:

Based on an alpha level of .05, the two-way interaction between month and block was not found to be statistically significant; F (15.04, 1107.94) = 1.53, p = 0.09, \(\eta^2_{p}\) = 0.02.

Cat*Block:

Based on an alpha level of .05, the two-way interaction between category set and block was found to be statistically significant; F (2.51, 1107.94) = 12.10, p < .001, \(\eta^2_{p}\) = 0.03.

Cat*Month:

Based on an alpha level of .05, the two-way interaction between category set and month was not found to be statistically significant; F (6, 442) = 1.06, p = 0.38, \(\eta^2_{p}\) = 0.01.

Block:

Because the category set x block interaction was found to be significant, we don’t need to consider the main effect of block.

Month:

Based on an alpha level of .05, the main effect of month was found to be statistically significant; F (6, 442) = 5.23, p < .001, \(\eta^2_{p}\) = 0.07.

Cat:

Because the category set x block interaction was found to be significant, we don’t need to consider the main effect of category set.

Cat*Block Post-Hoc:

The significant category set x block interaction will be further assessed via a test of simple main effects of block across levels of category set. To do so, we will conduct ANOVA analyses to assess the effect of block on performance for each condition separately. Note that, in order to correct the family-wise error rate, a Holm-Bonferroni adjustment will be used when assessing the significance of main effects associated with these ANOVAs. The Holm-Bonferroni adjustment defines a corrected alpha level according to the following formula: \[\alpha_{corrected} = \frac{\alpha}{number\;of\;comparisons - rank\;of\;comparison + 1}\]
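
The corrected alpha levels defined by this formula can be illustrated with a small worked example. The p-values below are hypothetical and are chosen only to show the step-down behaviour of the procedure; the final line checks the by-hand decisions against base R's p.adjust().

```r
# Holm-Bonferroni by hand for a set of hypothetical p-values.
p_raw <- c(0.010, 0.040, 0.030)
alpha <- 0.05
m <- length(p_raw)

# Rank the p-values from smallest (rank 1) to largest (rank m) and
# compute the corrected alpha for each rank.
ord <- order(p_raw)
alpha_corrected <- alpha / (m - seq_len(m) + 1)

# Step-down rule: stop rejecting at the first p-value above its corrected alpha.
passes <- p_raw[ord] <= alpha_corrected
reject_sorted <- cumsum(!passes) == 0

# Map the decisions back to the original order of the p-values.
reject <- logical(m)
reject[ord] <- reject_sorted

# Equivalent decisions via adjusted p-values
reject_padj <- p.adjust(p_raw, method = "holm") <= alpha
```

In this example only the smallest p-value is rejected: the second-smallest (0.030) exceeds its corrected alpha of 0.025, so testing stops there even though the largest p-value (0.040) is below its own corrected alpha of 0.05.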

Before we begin, we’ll define RD and II data subsets:

RD_Perf = subset(CatLong, Cat == "RD")

II_Perf = subset(CatLong, Cat == "II")

Rule-Defined ANOVA

Assumptions

The repeated-measures ANOVA makes three primary assumptions:

1. Independent Random Sampling

This assumption was met during testing.

2. Normality
CMB_RD_Shap = shapiro.test(RD_Perf$Performance)
CMB_RD_Shap
## 
##  Shapiro-Wilk normality test
## 
## data:  RD_Perf$Performance
## W = 0.91747, p-value < 0.00000000000000022

Based on an alpha level of .05, the assumption of normality is not met; W = 0.92, p < .001. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Sphericity
# Conduct the ANOVA:

CMB_RD_ANOVA = ezANOVA(data = RD_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_RD_Mau = CMB_RD_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_RD_Mau, digits = 4,
      caption = "Table 18. Test of sphericity on rule-defined performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 18. Test of sphericity on rule-defined performance by block.
Effect W p sig
2 Block 0.6067 0

Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.61, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:
CMB_RD_Eps = CMB_RD_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_RD_Eps, digits = 4,
      caption = "Table 19. Epsilon corrections for the test of rule-defined performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 19. Epsilon corrections for the test of rule-defined performance by block.
Effect GG Epsilon GG p GG sig HF Epsilon HF p HF sig
2 Block 0.7544 0 * 0.7617 0 *

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.76; as suggested by Girden, 1992).

Information Integration ANOVA

Assumptions
1. Independent Random Sampling

This assumption was met during testing.

2. Normality
CMB_II_Shap = shapiro.test(II_Perf$Performance)
CMB_II_Shap
## 
##  Shapiro-Wilk normality test
## 
## data:  II_Perf$Performance
## W = 0.99504, p-value = 0.008679

Based on an alpha level of .05, the assumption of normality is not met; W = 0.995, p = 0.009. However, ANOVA tests are typically robust to violations of normality (Gardner & Tremblay, 2007).

3. Sphericity
# Conduct the ANOVA:

CMB_II_ANOVA = ezANOVA(data = II_Perf, dv = .(Performance), wid = .(Subject), within = .(Block), type = "III", detailed = TRUE, return_aov = TRUE)

# Extract the sphericity test from the ANOVA output:

CMB_II_Mau = CMB_II_ANOVA$`Mauchly's Test for Sphericity`

# Create a table to display the results:

kable(CMB_II_Mau, digits = 4,
      caption = "Table 20. Test of sphericity on information integration performance by block.",
      col.names = c("Effect", "W", "p", "sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 20. Test of sphericity on information integration performance by block.
Effect W p sig
2 Block 0.8705 0

Based on an alpha level of .05, the assumption of sphericity is not met (W = 0.87, p < .001). We will now look at the potential epsilon corrections that we can use.

# Extract the corrections table from the ANOVA output:
CMB_II_Eps = CMB_II_ANOVA$`Sphericity Corrections`

# Create a table to display the results:

kable(CMB_II_Eps, digits = 4,
      caption = "Table 21. Epsilon corrections for the test of information integration performance by block.",
      col.names = c("Effect", "GG Epsilon", "GG p", "GG sig", "HF Epsilon", "HF p", "HF sig"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 21. Epsilon corrections for the test of information integration performance by block.
Effect GG Epsilon GG p GG sig HF Epsilon HF p HF sig
2 Block 0.909 0 * 0.9224 0 *

Because \(\varepsilon_{GG}\) > .75, we will apply the Huynh-Feldt correction (\(\varepsilon\) = 0.92; as suggested by Girden, 1992).

Interpretation of Main Effects

We will now employ the Holm-Bonferroni adjustment to assess the main effects of these tests.

# Create a data frame of the p-values being assessed:

CMB_Corr_ps1 = data.frame("Condition" = c("RD", "II"), "p" = c(CMB_RD_Eps$`p[HF]`, CMB_II_Eps$`p[HF]`))

# Perform p-adjustment:

CMB_Corr_ps2 = p.adjust(CMB_Corr_ps1[,2], method = c("holm"), n = 2)

# Create a data frame for the RD Condition:

CMB_RD_ANOVA_Corr = data.frame("Condition" = c("RD"), "Effect" = CMB_RD_ANOVA$ANOVA$Effect[2], "DFn" = CMB_RD_ANOVA$ANOVA$DFn[2] * CMB_RD_Eps$HFe, "DFd" = CMB_RD_ANOVA$ANOVA$DFd[2] * CMB_RD_Eps$HFe, "SSn" = CMB_RD_ANOVA$ANOVA$SSn[2], "SSd" = CMB_RD_ANOVA$ANOVA$SSd[2], "F" = CMB_RD_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[1], 4), "peta" = CMB_RD_ANOVA$ANOVA$ges[2])

# Create a data frame for the II Condition:

CMB_II_ANOVA_Corr = data.frame("Condition" = c("II"),  "Effect" = CMB_II_ANOVA$ANOVA$Effect[2], "DFn" = CMB_II_ANOVA$ANOVA$DFn[2] * CMB_II_Eps$HFe, "DFd" = CMB_II_ANOVA$ANOVA$DFd[2] * CMB_II_Eps$HFe, "SSn" = CMB_II_ANOVA$ANOVA$SSn[2], "SSd" = CMB_II_ANOVA$ANOVA$SSd[2], "F" = CMB_II_ANOVA$ANOVA$F[2], "p" = round(CMB_Corr_ps2[2], 4), "peta" = CMB_II_ANOVA$ANOVA$ges[2])

# Combine both data frames into one:

CMB_RDII_ANOVA_Corr = rbind(CMB_RD_ANOVA_Corr, CMB_II_ANOVA_Corr)

# Format the results table so that significant p-values will be bolded:

CMB_RDII_ANOVA_Corr = CMB_RDII_ANOVA_Corr %>%
  mutate(
    p = text_spec(p, bold = p < .05)
  )

# Create a table to display the results:

kable(CMB_RDII_ANOVA_Corr, digits = 4,
      caption = "Table 22. Tests of simple main effects on performance by block for each category set condition.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 22. Tests of simple main effects on performance by block for each category set condition.
Condition Effect DFn DFd SSn SSd F p peta
RD Block 2.2850 566.6777 2.6390 3.7067 176.5669 0 0.1431
II Block 2.7671 570.0177 0.5793 2.3931 49.8707 0 0.0699

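The main effect of block was found to be significant for both rule-defined and information integration performance; F (2.29, 566.68) = 176.57, p < .001, \(\eta^2_{G}\) = 0.14 and F (2.77, 570.02) = 49.87, p < .001, \(\eta^2_{G}\) = 0.07, respectively. Note that the effect sizes reported here (the peta column of Table 22) are taken from the ges column of the ezANOVA output and are therefore generalized, rather than partial, eta-squared values.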
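
The effect sizes in Table 22 come from the ges column of the ezANOVA output, which holds generalized eta squared. If partial eta squared is wanted instead, it can be computed from the sums of squares in the same table, as in the sketch below; the object `ss` is hypothetical and simply re-enters the values from Table 22.

```r
# Sums of squares for the block effect, copied from Table 22
ss <- data.frame(
  Condition = c("RD", "II"),
  SSn = c(2.6390, 0.5793),
  SSd = c(3.7067, 2.3931)
)

# Partial eta squared: effect SS relative to effect SS plus its own error SS
ss$partial_eta_sq <- ss$SSn / (ss$SSn + ss$SSd)
round(ss$partial_eta_sq, 3)
```

These values come out at roughly 0.42 (RD) and 0.19 (II), noticeably larger than the generalized values in Table 22, because generalized eta squared also counts between-subject variability in its denominator.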

The significant effect of block will be further assessed via post-hoc tests. Because our data displayed a violation of the sphericity assumption, we will use the Bonferroni adjustment.

The means that we will be comparing are presented below:

CBDescs = summarySE(data = CatLong, measurevar = "Performance",
                    groupvars = c("Cat", "Block"), conf.interval = .95)

# Create a table to display the results:

kable(CBDescs, digits = 4,
      caption = "Table 23. Descriptives by category set and block.",
      col.names = c("Category Set", "Block", "n", "M","SD", "SE", "CI"),
      align = 'c') %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 23. Descriptives by category set and block.
Category Set Block n M SD SE CI
RD Block1 249 0.6675 0.1195 0.0076 0.0149
RD Block2 249 0.7748 0.1291 0.0082 0.0161
RD Block3 249 0.7876 0.1251 0.0079 0.0156
RD Block4 249 0.7935 0.1309 0.0083 0.0163
II Block1 207 0.6155 0.0740 0.0051 0.0101
II Block2 207 0.6586 0.0922 0.0064 0.0126
II Block3 207 0.6783 0.1026 0.0071 0.0141
II Block4 207 0.6820 0.1138 0.0079 0.0156
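
The SE and CI columns of Table 23 can be reproduced from n and SD. The sketch below assumes that summarySE() uses the usual definitions (SE as SD divided by the square root of n, and CI as the half-width of a t-based 95% interval); the variable names are hypothetical.

```r
# Standard error and 95% CI half-width from n and SD
# (values for the RD condition, block 1, copied from Table 23)
n_rd1  <- 249
sd_rd1 <- 0.1195

se_rd1 <- sd_rd1 / sqrt(n_rd1)                  # standard error of the mean
ci_rd1 <- se_rd1 * qt(0.975, df = n_rd1 - 1)    # half-width of the 95% CI

round(c(se = se_rd1, ci = ci_rd1), 4)
```

Rounded to four decimals, these agree with the SE (0.0076) and CI (0.0149) listed for RD Block1 in Table 23.
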
RD Post-hoc Test
CMB_RD_Bon = pairwise.t.test(RD_Perf$Performance, RD_Perf$Block, p.adjust.method = "bonferroni", paired = T)

# Round the results to 4 decimal places:

CMB_RD_Bon = data.frame(round(CMB_RD_Bon$p.value, 4))

# Format the results table so that significant p-values will be bolded:

CMB_RD_BonB = CMB_RD_Bon %>%
  mutate(
    Block1 = text_spec(Block1, bold = Block1 < .05),
    Block2 = text_spec(Block2, bold = Block2 < .05),
    Block3 = text_spec(Block3, bold = Block3 < .05)
  )

# Add row labels:

CMB_RD_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))

CMB_RD_BonB = cbind(CMB_RD_BonL, CMB_RD_BonB)

# Create a table to display the results:

kable(CMB_RD_BonB, digits = 4,
      caption = "Table 24. Post-hoc test of rule-defined performance across block.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 24. Post-hoc test of rule-defined performance across block.
Comparison Block1 Block2 Block3
Block2 0 NA NA
Block3 0 0.0531 NA
Block4 0 0.0077 1

These results indicate that, with respect to the rule-defined task, block 1 performance (MRD1 = 0.67, SDRD1 = 0.12) was significantly lower than blocks 2 (MRD2 = 0.77, SDRD2 = 0.13), 3 (MRD3 = 0.79, SDRD3 = 0.13), and 4 (MRD4 = 0.79, SDRD4 = 0.13; all ps < .001). Block 4 performance was also significantly higher than performance on block 2 (p = 0.01). No other between-block performance differences were found to be statistically significant (ps > .05).
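
As a sketch of what the Bonferroni-adjusted paired comparisons involve, the following example uses simulated data (not the study's data) to show that each adjusted p-value is the raw paired t-test p-value multiplied by the number of comparisons (six for four blocks) and capped at 1. All object names are hypothetical.

```r
# Simulated scores for 12 hypothetical subjects across four blocks,
# stored block by block so that subjects line up across blocks.
set.seed(42)
n_subj <- 12
toy_long <- data.frame(
  Subject = rep(seq_len(n_subj), times = 4),
  Block   = factor(rep(paste0("Block", 1:4), each = n_subj))
)
toy_long$Performance <- 0.6 + 0.03 * as.integer(toy_long$Block) +
  rnorm(4 * n_subj, sd = 0.05)

# All six pairwise paired t-tests with a Bonferroni adjustment
pw <- pairwise.t.test(toy_long$Performance, toy_long$Block,
                      p.adjust.method = "bonferroni", paired = TRUE)

# The same adjusted p-value for Block2 vs Block1, computed by hand
b1 <- toy_long$Performance[toy_long$Block == "Block1"]
b2 <- toy_long$Performance[toy_long$Block == "Block2"]
p_by_hand <- min(1, 6 * t.test(b2, b1, paired = TRUE)$p.value)
```

The value of `p_by_hand` should match `pw$p.value["Block2", "Block1"]`. Note that the pairing relies on subjects appearing in the same order within every block.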

II Post-hoc Test
CMB_II_Bon = pairwise.t.test(II_Perf$Performance, II_Perf$Block, p.adjust.method = "bonferroni", paired = T)

# Round the results to 4 decimal places:

CMB_II_Bon = data.frame(round(CMB_II_Bon$p.value, 4))

# Format the results table so that significant p-values will be bolded:

CMB_II_BonB = CMB_II_Bon %>%
  mutate(
    Block1 = text_spec(Block1, bold = Block1 < .05),
    Block2 = text_spec(Block2, bold = Block2 < .05),
    Block3 = text_spec(Block3, bold = Block3 < .05)
  )

# Add row labels:

CMB_II_BonL = data.frame("Comparison" = c("Block2", "Block3", "Block4"))

CMB_II_BonB = cbind(CMB_II_BonL, CMB_II_BonB)

# Create a table to display the results:

kable(CMB_II_BonB, digits = 4,
      caption = "Table 25. Post-hoc test of information integration performance across block.",

      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 25. Post-hoc test of information integration performance across block.
Comparison Block1 Block2 Block3
Block2 0 NA NA
Block3 0 0.0036 NA
Block4 0 0.0011 1

These results indicate that, with respect to the information integration task, block 1 performance (MII1 = 0.62, SDII1 = 0.07) was significantly lower than blocks 2 (MII2 = 0.66, SDII2 = 0.09), 3 (MII3 = 0.68, SDII3 = 0.1), and 4 (MII4 = 0.68, SDII4 = 0.11; all ps < .001). Block 2 performance was also significantly lower than both block 3 (p = 0.004) and 4 (p = 0.001). No other between-block performance differences were found to be statistically significant (ps > .05).

Month Post-Hoc:

The significant main effect of month will be further assessed via a post-hoc test. This test parallels the one conducted in the Problem 2 section above, but it is run on the long-format data (four observations per participant) rather than on overall performance, so the results are not expected to match exactly. Because we are planning on completing all possible pairwise comparisons, we will use the Tukey HSD adjustment (as suggested by Maxwell & Delaney, 2003).

Main Effect of Month
CMB_M_Tuk = TukeyHSD(aov(CatLong$Performance ~ CatLong$Month))

# Extract the results table from the output:

CMB_M_Tuk = data.frame(CMB_M_Tuk$`CatLong$Month`) 

# Round the results to 4 decimal places:

CMB_M_Tuk = round(CMB_M_Tuk, 4)

# Format the results table so that significant p-values will be bolded:

CMB_M_TukB = CMB_M_Tuk %>%
  mutate(
    p.adj = cell_spec(p.adj, bold = p.adj < .05)
  )

# Add row labels:

CMB_M_TukL = data.frame("Comparison" = c("Feb-Jan", "Mar-Jan", "Apr-Jan", "Sept-Jan", "Oct-Jan", "Nov-Jan", "Mar-Feb", "Apr-Feb", "Sept-Feb", "Oct-Feb", "Nov-Feb", "Apr-Mar", "Sept-Mar", "Oct-Mar", "Nov-Mar", "Sept-Apr", "Oct-Apr", "Nov-Apr", "Oct-Sept", "Nov-Sept", "Nov-Oct"))

CMB_M_TukB = cbind(CMB_M_TukL, CMB_M_TukB)

# Create a table to display the results:

kable(CMB_M_TukB, digits = 4,
      caption = "Table 26. Post-hoc tests of performance by month.",
      col.names = c("Comparison", "Difference", "Lower", "Upper", "p"),
      align = 'c', escape = FALSE) %>%
  kable_styling(bootstrap_options =
                  c("hover", "responsive", "striped"),
                full_width = F, position = "center")
Table 26. Post-hoc tests of performance by month.
Comparison Difference Lower Upper p
Feb-Jan -0.0442 -0.0822 -0.0063 0.0107
Mar-Jan -0.0467 -0.0797 -0.0136 0.0006
Apr-Jan -0.0037 -0.0358 0.0284 0.9999
Sept-Jan 0.0486 0.0184 0.0789 0
Oct-Jan -0.0007 -0.0302 0.0287 1
Nov-Jan -0.0453 -0.1005 0.0098 0.1885
Mar-Feb -0.0024 -0.0404 0.0356 1
Apr-Feb 0.0405 0.0033 0.0777 0.0223
Sept-Feb 0.0929 0.0573 0.1284 0
Oct-Feb 0.0435 0.0086 0.0784 0.0045
Nov-Feb -0.0011 -0.0593 0.0572 1
Apr-Mar 0.0429 0.0108 0.0750 0.0016
Sept-Mar 0.0953 0.0651 0.1255 0
Oct-Mar 0.0459 0.0164 0.0754 0.0001
Nov-Mar 0.0013 -0.0538 0.0565 1
Sept-Apr 0.0524 0.0232 0.0816 0
Oct-Apr 0.0030 -0.0254 0.0314 0.9999
Nov-Apr -0.0416 -0.0962 0.0130 0.2705
Oct-Sept -0.0494 -0.0757 -0.0231 0
Nov-Sept -0.0939 -0.1475 -0.0404 0
Nov-Oct -0.0446 -0.0977 0.0085 0.168

These results indicate that participants displayed significantly better performance during September (MSept = 0.76, SDSept = 0.09) than during every other month (MJan = 0.71, SDJan = 0.11, p < .001; MFeb = 0.67, SDFeb = 0.11, p < .001; MMar = 0.67, SDMar = 0.12, p < .001; MApr = 0.71, SDApr = 0.11, p < .001; MOct = 0.71, SDOct = 0.08, p < .001; MNov = 0.7, SDNov = 0.07, p < .001). Participants in February and March also performed significantly worse than participants in January (p = 0.01 and p = 0.001, respectively), April (p = 0.02 and p = 0.002, respectively), and October (p = 0.004 and p < .001, respectively).
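
The Tukey test above is fitted to the long-format data, in which every participant contributes four rows. One possible variation, not part of the analysis above, is to average across blocks first so that each participant contributes a single score. The sketch below uses simulated data (not the study's data), and all object names are hypothetical.

```r
# Hypothetical long-format data: 30 subjects, 3 months, 4 blocks each
set.seed(7)
month_levels <- c("Jan", "Feb", "Sept")
toy_long <- expand.grid(Subject = 1:30, Block = paste0("Block", 1:4))
toy_long$Month <- factor(month_levels[(toy_long$Subject - 1) %/% 10 + 1],
                         levels = month_levels)
toy_long$Performance <- 0.7 + rnorm(nrow(toy_long), sd = 0.1)

# Tukey HSD on the long data treats all 120 rows as independent observations
fit_long <- aov(Performance ~ Month, data = toy_long)
tuk_long <- TukeyHSD(fit_long)

# Averaging across blocks first leaves one observation per subject
toy_means <- aggregate(Performance ~ Subject + Month, data = toy_long, FUN = mean)
fit_means <- aov(Performance ~ Month, data = toy_means)
tuk_means <- TukeyHSD(fit_means)

# Residual degrees of freedom under each approach
c(long = df.residual(fit_long), means = df.residual(fit_means))
```

The residual degrees of freedom drop from 117 to 27 once scores are averaged within subjects, which is one reason the two approaches can lead to different pairwise conclusions.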