final_project

Final project work

Quarto doc, due May 16th

Hypotheses

  1. If kids have to check a shopping list that tells them what items to select from a virtual store, and there is an annoying 4-second delay every time they want to check their list, children will be incentivized to make fewer trips. This can be accomplished by selecting more correct items per trip. To accomplish this, children will have to attempt to encode more items in memory, by spending more time encoding list items. We predict, then, that the time children spend studying the list will be longer in the Long-delay conditions (when there is a 4-second delay to access the list) and shorter in the No-delay conditions (when there is no delay to access the list).

  2. Relatedly, we predict that children will succeed at encoding more items when they attempt to do so, leading to more correct responses as costs increase. We predict, then, that Memory Usage (MU, calculated as a difference score, with corrections for guessing) will be larger in the Long-delay conditions and smaller in the No-delay conditions.

  3. When allowed to choose how many attempts to make, we expect that children will tend to be conservative and therefore underperform. Given this, if children are ‘forced’ to respond despite their risk aversion, memory usage will increase. We predict then that MU in total-response conditions will be higher than MU in free-response conditions.

Sampling design

The Shopping Game is a tablet-based, naturalistic, child-friendly memory paradigm. In this game, 74 5.5-7.0 year-old children were asked to pick items from a store based on a shopping list of 10 items. The store contained 20 items (10 of them from the list), arranged in two-alternative forced choice (2AFC) pairs on shelves. Importantly, the store and the list were not visible on the screen simultaneously; instead, participants could toggle between them by tapping an icon. The key manipulation was the accessibility of the list.

The experiment had four conditions: Two Delay Cost conditions (Long-delay (4 s) versus No-delay (0 s)) and two Response Type conditions (Free-response versus Total-response). No-delay and Long-delay condition trials are identical except for the imposition of a 4 s waiting period between a child touching the icon to make a trip to their shopping list and the list actually appearing. In the Free-response condition trials, children are free to attempt as many of the 2AFC shelves as they like on each visit to the store and may make as many trips back to the list as they like. Total-response condition trials are always the last (4th) trial in their respective block and will come as a surprise to the children. When a Total-response trial begins, it doesn’t appear any different from the previous three Free-response trials: after the child makes their first trip to the list and returns to the store, they will make their selections as usual. However, when the child taps the icon to revisit the list, a “closing” sign appears on the left of the screen, and the experimenter will explain to the child that the store is closing now, and they must continue to choose items from each of the 2AFC pairs (based on whatever memory they have from their first (and only) trip). The Total-response conditions will therefore allow us to assess whether children are using the entire contents of their memory when they make a trip between the list and the shelf. This is a way to estimate how children respond based on memories that they may be less confident in.

There will be two blocks of trials, blocked by condition. A session will begin with either a No-delay block or a Long-delay block (counterbalanced between children. Each block will consist of one practice trial, three Free-response trials, and a fourth, final Total-response trial where children are given only one opportunity to sample the list and, critically, must make a selection for all of the 10 2AFC shelves. Since there are only two Total-response trials and they occur after three Free-response trials in both No-delay and Long-delay conditions, we do not expect that children will anticipate their appearance.

All children will complete both the blocked No-delay and Long-delay conditions; order will be counterbalanced between children. The Total-response conditions will always be the last trial in each of the two blocks.

We will recruit typically developing children between the ages of 5.5 and 7.0 years. We will recruit and test children from local children’s museums, daycares, and libraries. Children will receive a small toy and some stickers as a gift for participating.

Based on power analyses calculated using G*Power 3.1 for multiple linear regression with two predictors, we determined the minimum sample size to be 68 children based on a medium effect size of f2 = 0.15, alpha = 0.05, and 80% power.

We will stop data collection when we have collected data from 78 children, to allow for unexpected exclusions after data collection has been completed. We will include all children who met the inclusion criteria. The final sample size after exclusions was 74 children.

Outliers and Exclusions:
We will conduct an outlier analysis on the log of Study Time (the amount of time the child chooses to spend on the shopping list before making a visit to the store). Log-transformed Study times more than 3.0 MAD from the median (Leys et al., 2013) will be excluded from the analysis. If such an exclusion results in no trips for a given trial, the trial will be excluded.

Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median.

Analyses

Hypotheses will be tested using Generalized Linear Mixed-Effect Models (Lo & Andrews, 2015) with Condition as a fixed effect and child (participant) as a random intercept, and with appropriate linking functions. The model for Study Time will have an inverse Gaussian linking function because of its similar distribution to reaction times (Lo & Andrews, 2015). The model for Memory Usage will have an identity linking function because based on the results of our previous studies, we expect it to follow a normal distribution. If there are substantive violations of the assumptions of these analyses, we will corroborate results with analogous non-parametric tests.

We will be using an alpha of 0.05 to determine the significance of factors in our generalized linear mixed-effects models.

Lo, S., & Andrews, S. (2015). To transform or not to transform: using generalized linear mixed models to analyze reaction time data. Frontiers in Psychology, 6, 1171.

Preprocessing

Technically there is a preprocessing step that comes before this, but its kind of long and I don’t think its necessary to include in the project (its just about condensing individual participant csv files into one master file, and in the meantime it also calculates correct responses + means across trials for the variables of interest).

Here we read in the data after its initial preprocessing step, but we have to combine it with the csv that contains children’s ages and properly name one of the conditions, and then we get an average study time for each condition for each child, since we preregistered modeling it that way. Although a glm could also model multiple rows per condition per trial, I am not aware that doing it that way would be substantially “better”, so I will stick to the plan.

Visualize the data

We should check the histograms at least, to make sure its distribution does indeed look like an inverse-gaussian (response time distribution) for study time, and normal distributions for Memory usage.

hist(freeResponseOnly$mean_StudyTime)

hist(freeResponseOnly$mean_MU)

hist(freeVtotal$mean_MU)

The memory usage histograms are not as normal as expected. We may have some issues in our linear models.

Modeling Study Time

Hypotheses will be tested using Generalized Linear Mixed-Effect Models (Lo & Andrews, 2015) with Condition as a fixed effect and child (participant) as a random intercept, and with appropriate linking functions. The model for Study Time will have an inverse Gaussian linking function because of its similar distribution to reaction times (Lo & Andrews, 2015).

Lo, S., & Andrews, S. (2015). To transform or not to transform: using generalized linear mixed models to analyze reaction time data. Frontiers in Psychology, 6, 1171.

studyTimeGlmm <- glmer(mean_StudyTime ~ condsFile + (1|participant), 
                         data = freeResponseOnly, 
                         family = inverse.gaussian(link = "identity"))

Diagnose issues

Before we look at the summary, lets do some model checks to make sure this worked OK.

Kernel density estimate plot

I like using this plot! It shows the actual values as a smoothed histogram and then the model predicted values in red. We want the lines to somewhat match up, but we won’t get a perfect match - as long as it’s in the same ballpark.

plot(density(freeResponseOnly$mean_StudyTime), 
     xlim=c(0, 160), ylim=c(0, 1), main='M_1 y_hat')
lines(density(predict(studyTimeGlmm, type='response')), col='red')

That looks great!

Residuals vs Fitted plot

We are looking for random, even scatter here.

plot(studyTimeGlmm)

I wouldn’t say this scatter is super random. The errors are getting larger as the fitted values increase.

QQ Plot

In the QQ plot, we want the dots to generally follow the diagonal line.

residuals <- residuals(studyTimeGlmm)
qqnorm(residuals)
qqline(residuals, col = "red")

That looks great.

Check_model from Performance

Now let’s run check_model from the Performance package.

library(performance)
check_model(studyTimeGlmm)
Loading required namespace: statmod
Simulations from your fitted model produce infinite values. Consider if this is sensible
Cannot simulate residuals for models of class `glmerMod`. Please try
  `check_model(..., residual_type = "normal")` instead.

There is a bit of wiggliness here, but overall things look OK.

Null VS Full model

Now we will compare our model with one that has no fixed effects. We want our full model to fit better than the null.

mixConTime_GLMM_null <- glmer(mean_StudyTime ~ 1 + (1|participant), 
                              data = freeResponseOnly, 
                              family = inverse.gaussian(link = "identity"))
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0721721 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
anova(mixConTime_GLMM_null,studyTimeGlmm)
Data: freeResponseOnly
Models:
mixConTime_GLMM_null: mean_StudyTime ~ 1 + (1 | participant)
studyTimeGlmm: mean_StudyTime ~ condsFile + (1 | participant)
                     npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
mixConTime_GLMM_null    3 625.04 634.05 -309.52   619.04                     
studyTimeGlmm           4 618.61 630.62 -305.30   610.61 8.4319  1   0.003687
                       
mixConTime_GLMM_null   
studyTimeGlmm        **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The full model does fit better! The AIC in the full model is lower, and its significantly a better fit according to anova().

Conclusion of diagnostics

There was a little bit of patterning in the residuals in the residuals vs. fitted plot, but the qqplot, KDE plot, and check_model results looks overall AOK. We also know that our fixed effect of condition is doing a good job as removing it significantly reduces the fit of the model. I don’t think doing a transformation is warranted.

Interpret Study Time results

Results output

summary(studyTimeGlmm)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: inverse.gaussian  ( identity )
Formula: mean_StudyTime ~ condsFile + (1 | participant)
   Data: freeResponseOnly

     AIC      BIC   logLik deviance df.resid 
   618.6    630.6   -305.3    610.6      145 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.74842 -0.54638 -0.08409  0.47357  2.28652 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 3.08496  1.7564  
 Residual                0.02398  0.1548  
Number of obs: 149, groups:  participant, 75

Fixed effects:
                 Estimate Std. Error t value             Pr(>|z|)    
(Intercept)        6.5354     0.3921  16.667 < 0.0000000000000002 ***
condsFilenoDelay  -0.6350     0.2174  -2.921              0.00348 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
condsFlnDly -0.276
# get overall p for condition
mixTimeConAov <- cbind(car::Anova(studyTimeGlmm),
                       r.squaredGLMM(studyTimeGlmm,mixConTime_GLMM_null)) 
Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
mixTimeConAov
             Chisq Df Pr(>Chisq)        R2m       R2c
condsFile 8.535085  1 0.00348365 0.03037668 0.9536573

Effect size

From Ben Bolker’s glm FAQ (quoting Henrik Singmann):

“The fact that calculating a global measure of model fit (such as R2) is already riddled with complications and that no simple single number can be found, should be a hint that doing so for a subset of the model parameters (i.e., main-effects or interactions) is even more difficult. Given this, I would not recommend to try finding a measure of standardized effect sizes for mixed models.”

Results visualization

Results interpretation

[1] "Standard deviation for study time in No-Delay conditon:"
[1] 2.481185
[1] "Standard deviation for study time in Long-Delay conditon:"
[1] 3.627854

Study time showed a significant effect of condition (X2 = 8.54, p = 0.003), with children studying the list longer as a function of increasing access costs across conditions. Study time was longest in the Long-delay condition (M = 6.43, SD = 3.63), and it was significantly longer than study time in the No-delay condition (M = 5.31, SD = 2.48).

Modeling Memory Usage

MU_GLMM <- lmer(mean_MU ~ condsFile + (1|participant),
                 data = freeResponseOnly)

Diagnose issues

Kernel density estimate plot

plot(density(freeResponseOnly$mean_MU), xlim=c(-50, 50), ylim=c(0, 0.6), 
     main='M_1 y_hat')
lines(density(predict(MU_GLMM, type='response') ), col='red')

This isn’t so bad - the predicted line is in the ballpark, but it does shoot up higher than the actual data.

Residuals vs Fitted plot

plot(MU_GLMM)

This is pretty clear heteroskedasticity.

QQ Plot

residuals <- residuals(MU_GLMM)
qqnorm(residuals)
qqline(residuals, col = "red")

The tail at the end wiggles up.

Check_model from Performance

library(performance)
check_model(MU_GLMM)

Transform or not to transform?

Our model is not fitting perfectly. The data has negative values though, so we can’t just use a gamma or negative binom distribution in a glm without doing some transformation to deal with the negatives. Let’s try adding a constant to MU and then using a gamma distribution.

freeResponseOnly$MU_add_1 <- freeResponseOnly$mean_MU + 1
hist(freeResponseOnly$MU_add_1)

MU_add1_GLM <- glmer(MU_add_1 ~ condsFile + (1|participant), 
                     data = freeResponseOnly, family = Gamma(link = "log"))

# KDE
plot(density(freeResponseOnly$MU_add_1), xlim=c(-50, 50), ylim=c(0, 0.6), 
     main='M_1 y_hat')
lines(density(predict(MU_add1_GLM, type='response') ), col='red')

residuals <- residuals(MU_add1_GLM)
qqnorm(residuals)
qqline(residuals, col = "red")

plot(MU_add1_GLM)

# Eh- not much better

check_model(MU_add1_GLM)

I feel like that is not much better.

Null vs Full model

MUfree_null <- lmer(mean_MU ~ 1 + (1|participant), 
                     data = freeResponseOnly)

MU_add1_null <- glmer(MU_add_1 ~ 1 + (1|participant), 
                     data = freeResponseOnly, family = Gamma(link = "log"))

print("Below is for non-transformed data, Gaussian")
[1] "Below is for non-transformed data, Gaussian"
anova(MUfree_null,MU_GLMM)
refitting model(s) with ML (instead of REML)
Data: freeResponseOnly
Models:
MUfree_null: mean_MU ~ 1 + (1 | participant)
MU_GLMM: mean_MU ~ condsFile + (1 | participant)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
MUfree_null    3 441.31 450.32 -217.65   435.31                       
MU_GLMM        4 440.48 452.50 -216.24   432.48 2.8228  1    0.09293 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print("Below is for transformed, Gamma")
[1] "Below is for transformed, Gamma"
anova(MU_add1_null,MU_add1_GLM)
Data: freeResponseOnly
Models:
MU_add1_null: MU_add_1 ~ 1 + (1 | participant)
MU_add1_GLM: MU_add_1 ~ condsFile + (1 | participant)
             npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
MU_add1_null    3 366.65 375.66 -180.32   360.65                       
MU_add1_GLM     4 364.78 376.80 -178.39   356.78 3.8656  1    0.04928 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion of diagnostics

Even though the transformed data did not seem much better, it fit better than the null model while for the non-transformed data, the null model fit better than the full. I wonder if this means that the transformed model indeed is providing a better fit to the data.

Interpret Memory Usage results

Results output

summary(MU_add1_GLM)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Gamma  ( log )
Formula: MU_add_1 ~ condsFile + (1 | participant)
   Data: freeResponseOnly

     AIC      BIC   logLik deviance df.resid 
   364.8    376.8   -178.4    356.8      145 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4888 -0.5178 -0.0931  0.4913  3.3170 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 0.05277  0.2297  
 Residual                0.08150  0.2855  
Number of obs: 149, groups:  participant, 75

Fixed effects:
                  Estimate Std. Error t value            Pr(>|z|)    
(Intercept)        0.98810    0.04967  19.892 <0.0000000000000002 ***
condsFileNo-Delay -0.08231    0.04156  -1.981              0.0476 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
cndsFlN-Dly -0.424
MUfreeAov <- cbind(car::Anova(MU_add1_GLM),
                   r.squaredGLMM(MU_add1_GLM,MU_add1_null))
Warning in data.frame(..., check.names = FALSE): row names were found from a
short variable and have been discarded
MUfreeAov
     Chisq Df Pr(>Chisq)        R2m       R2c
1 3.922957  1 0.04763103 0.01254072 0.4006480
2 3.922957  1 0.04763103 0.01283819 0.4101518
3 3.922957  1 0.04763103 0.01223383 0.3908438

Results visualization

# Calculate means and standard errors
means <- tapply(freeResponseOnly$mean_MU, freeResponseOnly$condsFile, mean)

# Calculate standard deviations and sample sizes for each condition
sds <- tapply(freeResponseOnly$mean_MU, freeResponseOnly$condsFile, sd)
n_per_condition <- tapply(freeResponseOnly$mean_MU, freeResponseOnly$condsFile, length)

# Calculate standard errors
se <- sds / sqrt(n_per_condition)

# Define the x-axis levels (to match the ggplot2 ordering)
conds_levels <- c("No-Delay", "Long-Delay")
x_positions <- 1:length(conds_levels)

# Create an empty plot
plot(
  x_positions, means[conds_levels], 
  type = "n", xaxt = "n", ylim = c(-1, 7.0), xlim = c(0.5, 2.5),
  xlab = "", ylab = "Mean Memory Usage (items)", 
  main = "",
    cex.lab = 1.5,
  cex.axis = 1.5,
  cex.main = 1.5
)

participants <- unique(freeResponseOnly$participant)

for (p in participants) {
  this_participant <- freeResponseOnly[freeResponseOnly$participant == p, ]
  if (nrow(this_participant) == 2) {
    # Get the x positions for this participant’s conditions
    x_vals <- match(this_participant$condsFile, conds_levels)
    y_vals <- this_participant$mean_MU
    lines(x_vals, y_vals, col = adjustcolor("#009E73", alpha.f = 0.3), lwd = 1)
  }
}


# Add x-axis labels normally
axis(1, at = x_positions, labels = conds_levels, cex.axis = 1.5)

# Add points for each participant
for (i in 1:length(conds_levels)) {
  condition_data <- freeResponseOnly[freeResponseOnly$condsFile == conds_levels[i], ]
  points(rep(x_positions[i], nrow(condition_data)), condition_data$mean_MU, 
         pch = 1, col = "#0072B2")
}

# Add points for the means
points(x_positions, means[conds_levels], pch = 10, col = "black")

# Add a solid line connecting the means
lines(x_positions, means[conds_levels], col = "black", lwd = 2)

# Add horizontal segments at the means (optional, similar to ggplot2's stat_summary)
segments(
  x_positions - 0.1, means[conds_levels], 
  x_positions + 0.1, means[conds_levels],
  col = "black", lwd = 3.0
)

# Add mean labels for each condition
text(x_positions, means[conds_levels] + 2, 
     labels = round(means[conds_levels], 2), cex = 2.5, col = "black")

# Add error bars using arrows() with specified lower and upper bounds
# Function to compute standard error
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Get standard errors for each condition
ses <- tapply(freeResponseOnly$mean_MU, freeResponseOnly$condsFile, se)

# Get means again, just to be sure
means <- tapply(freeResponseOnly$mean_MU, freeResponseOnly$condsFile, mean)

# Define bounds for error bars
lower_bounds <- means[conds_levels] - 2*ses[conds_levels]
upper_bounds <- means[conds_levels] + 2*ses[conds_levels]


# Add vertical error bars using arrows()
for (i in 1:length(x_positions)) {
  arrows(
    x0 = x_positions[i], y0 = lower_bounds[i], 
    x1 = x_positions[i], y1 = upper_bounds[i], 
    length = 0.1, angle = 90, code = 3, col = "#D55E00", lwd = 3.5
  )
}

Results interpretation

print("Standard deviation for MU in No-Delay conditon:")
[1] "Standard deviation for MU in No-Delay conditon:"
sd(freeResponseOnly$mean_MU[freeResponseOnly$condsFile == "No-Delay"], na.rm = T)
[1] 0.9352832
print("Standard deviation for MU in Long-Delay conditon:")
[1] "Standard deviation for MU in Long-Delay conditon:"
sd(freeResponseOnly$mean_MU[freeResponseOnly$condsFile == "Long-Delay"], na.rm = T)
[1] 1.174552

Memory Usage showed a marginally significant effect of condition (X2 = 3.92, p = 0.048), with children using more memory as a function of increasing access costs across conditions. Memory Usage was larger in the Long-delay condition (M = 1.85, SD = 1.74) than Memory Usage in the No-delay condition (M = 1.60, SD = 0.94).

Modeling Free VS Total Response

freeVtotal_glmm <- lmer(mean_MU ~ condType + (1|participant),
                 data = freeVtotal)

Diagnose issues

Kernel density estimate plot

plot(density(freeVtotal$mean_MU), xlim=c(-50, 50), ylim=c(0, 0.6), 
     main='M_1 y_hat')
lines(density(predict(freeVtotal_glmm, type='response') ), col='red')

We might have the same problem here as with the other model for MU in the other conditions.

Residuals vs Fitted plot

plot(freeVtotal_glmm)

QQ Plot

residuals <- residuals(freeVtotal_glmm)
qqnorm(residuals)
qqline(residuals, col = "red")

Check_model from Performance

check_model(freeVtotal_glmm)

The problem here is that yes the residuals have an insane pattern, but the data doesn’t obviously look like a gamma. It DOES look normal, but it just has a tall peak in the middle and tiny tails. Also there is negative values.

Try glmmTMB

library(glmmTMB)

MU_TMB <- glmmTMB(mean_MU ~ condType + (1|participant),
                   data = freeVtotal, ziformula=~1,
                   family = gaussian())

plot(density(freeVtotal$mean_MU), xlim=c(-50, 50), ylim=c(0, 0.6), 
     main='M_1 y_hat')
lines(density(predict(MU_TMB, type='response') ), col='red')

residuals <- residuals(MU_TMB)
qqnorm(residuals)
qqline(residuals, col = "red")

# Have to do this by scratch...
resid_vals <- residuals(MU_TMB, type = "pearson")
fitted_vals <- fitted(MU_TMB)

plot(fitted_vals, resid_vals,
     xlab = "Fitted values", ylab = "Pearson residuals",
     main = "Residuals vs Fitted (glmmTMB)")
abline(h = 0, lty = 2, col = "red")

I do not think that worked better.

Null vs Full model

MUfree_null <- lmer(mean_MU ~ 1 + (1|participant), 
                     data = freeVtotal)

MUTMB_null <- glmmTMB(mean_MU ~ 1 + (1|participant), 
                     data = freeVtotal,  ziformula=~1,
                   family = gaussian())

anova(MUfree_null,freeVtotal_glmm)
refitting model(s) with ML (instead of REML)
Data: freeVtotal
Models:
MUfree_null: mean_MU ~ 1 + (1 | participant)
freeVtotal_glmm: mean_MU ~ condType + (1 | participant)
                npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
MUfree_null        3 638.84 647.88 -316.42   632.84                     
freeVtotal_glmm    4 639.97 652.01 -315.98   631.97 0.8786  1     0.3486
anova(MUTMB_null,MU_TMB)
Data: freeVtotal
Models:
MUTMB_null: mean_MU ~ 1 + (1 | participant), zi=~1, disp=~1
MU_TMB: mean_MU ~ condType + (1 | participant), zi=~1, disp=~1
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
MUTMB_null  4 640.84 652.89 -316.42   632.84                         
MU_TMB      5 641.97 657.02 -315.98   631.97 0.8786      1     0.3486

Conclusion of diagnostics

Using a regular lmer and using glmTMB with the zero-inflation add-on did not make vastly different results, so I will go with the regular lmer because its simpler, but we should keep in mind when interpreting the results that the model is not an ideal fit.

Interpret Free V Total results

Results output

summary(freeVtotal_glmm)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: mean_MU ~ condType + (1 | participant)
   Data: freeVtotal

REML criterion at convergence: 634.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.12767 -0.40792  0.03309  0.43621  2.86641 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 0.9624   0.981   
 Residual                3.1610   1.778   
Number of obs: 150, groups:  participant, 75

Fixed effects:
              Estimate Std. Error       df t value        Pr(>|t|)    
(Intercept)     1.7156     0.2345 140.3543   7.317 0.0000000000179 ***
condTypeTotal   0.2711     0.2903  74.0000   0.934           0.353    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
condTypeTtl -0.619
MUfreeAov <- cbind(car::Anova(freeVtotal_glmm),
                   r.squaredGLMM(freeVtotal_glmm,MUfree_null))
MUfreeAov
             Chisq Df Pr(>Chisq)         R2m       R2c
condType 0.8719833  1  0.3504068 0.004466303 0.2368211

Results visualization

# Calculate means and standard errors
means <- tapply(freeVtotal$mean_MU, freeVtotal$condType, mean)

# Calculate standard deviations and sample sizes for each condition
sds <- tapply(freeVtotal$mean_MU, freeVtotal$condType, sd)
n_per_condition <- tapply(freeVtotal$mean_MU, freeVtotal$condType, length)

# Calculate standard errors
se <- sds / sqrt(n_per_condition)

# Define the x-axis levels (to match the ggplot2 ordering)
conds_levels <- c("Free", "Total")
x_positions <- 1:length(conds_levels)

# Create an empty plot
plot(
  x_positions, means[conds_levels], 
  type = "n", xaxt = "n", ylim = c(-5, 10.0), xlim = c(0.5, 2.5),
  xlab = "", ylab = "Mean Memory Usage (items)", 
  main = "",
  cex.lab = 1.5,
  cex.axis = 1.5,
  cex.main = 1.5
)


participants <- unique(freeVtotal$participant)

for (p in participants) {
  this_participant <- freeVtotal[freeVtotal$participant == p, ]
  if (nrow(this_participant) == 2) {
    # Get the x positions for this participant’s conditions
    x_vals <- match(this_participant$condType, conds_levels)
    y_vals <- this_participant$mean_MU
    lines(x_vals, y_vals, col = adjustcolor("#009E73", alpha.f = 0.3), lwd = 1)
  }
}

# Add x-axis labels normally
axis(1, at = x_positions, labels = conds_levels, cex.axis = 1.6)

# Add points for each participant
for (i in 1:length(conds_levels)) {
  condition_data <- freeVtotal[freeVtotal$condType == conds_levels[i], ]
  points(rep(x_positions[i], nrow(condition_data)), condition_data$mean_MU, 
         pch = 1, col = "#0072B2")
}

# Add points for the means
points(x_positions, means[conds_levels], pch = 10, col = "black")

# Add a solid line connecting the means
lines(x_positions, means[conds_levels], col = "black", lwd = 2)

# Add horizontal segments at the means (optional, similar to ggplot2's stat_summary)
segments(
  x_positions - 0.1, means[conds_levels], 
  x_positions + 0.1, means[conds_levels],
  col = "black", lwd = 3.0
)

# Add mean labels for each condition
text(x_positions, means[conds_levels] + 2, 
     labels = round(means[conds_levels], 2), cex = 2.5, col = "black")

# Add error bars using arrows() with specified lower and upper bounds
# Function to compute standard error
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Get standard errors for each condition
ses <- tapply(freeVtotal$mean_MU, freeVtotal$condType, se)

# Get means again, just to be sure
means <- tapply(freeVtotal$mean_MU, freeVtotal$condType, mean)

# Define bounds for error bars
lower_bounds <- means[conds_levels] - 2*ses[conds_levels]
upper_bounds <- means[conds_levels] + 2*ses[conds_levels]


# Add vertical error bars using arrows()
for (i in 1:length(x_positions)) {
  arrows(
    x0 = x_positions[i], y0 = lower_bounds[i], 
    x1 = x_positions[i], y1 = upper_bounds[i], 
    length = 0.1, angle = 90, code = 3, col = "#D55E00", lwd = 3.5
  )
}

Results interpretation

print("Standard deviation for MU in Free-Response conditon:")
[1] "Standard deviation for MU in Free-Response conditon:"
sd(freeVtotal$mean_MU[freeVtotal$condType == "Free"], na.rm = T)
[1] 0.8484102
print("Standard deviation for MU in Total-Response conditon:")
[1] "Standard deviation for MU in Total-Response conditon:"
sd(freeVtotal$mean_MU[freeVtotal$condType == "Total"], na.rm = T)
[1] 2.74351

Children used the entire contents of their memory when they made a trip between the list and the shelf, as Memory Usage in the Total-response condition (M = 1.99, SD = 2.74) was larger than Memory Usage in the Free-response condition (M = 1.72, SD = 0.85), but this was non-significant (X2 = 0.87, p = 0.35).