cluster_vs_errors


Switch value and error rate

Intrusion errors

library(lme4)

## Loading required package: Matrix

df_recall_merged <- read.csv("data/df_recall_forager_merged.csv")

df_model <- df_recall_merged[df_recall_merged$Switch_Value %in% c(0, 1), ]
df_model$error        <- as.integer(df_model$error)
df_model$Switch_Value <- as.numeric(df_model$Switch_Value)

m_switch_recall <- glmer(
  error ~ Switch_Value + (1 | participant_id),
  data    = df_model,
  family  = binomial
)
summary(m_switch_recall)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: error ~ Switch_Value + (1 | participant_id)
##    Data: df_model
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     454.3     468.0    -224.2     448.3       697 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0669 -0.3876 -0.1904 -0.1238  5.2522 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 4.455    2.111   
## Number of obs: 700, groups:  participant_id, 39
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.83366    0.46817  -6.053 1.42e-09 ***
## Switch_Value  0.07645    0.27512   0.278    0.781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Switch_Valu -0.304

The above results indicate slightly higher odds of an intrusion error at cluster switch boundaries (β = 0.076, OR ≈ 1.08). The effect is far from statistically significant (p = 0.78).
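The model is on the logit scale, so the Switch_Value estimate is easier to read as an odds ratio. Below is a minimal base-R sketch that assumes the usual Wald interval (estimate ± z × SE, then exponentiated). The helper name `wald_or` is hypothetical. The inputs are the estimate and standard error transcribed from the summary above.

```r
# Hypothetical helper: convert a logit-scale estimate and its standard error
# into an odds ratio with a Wald confidence interval.
wald_or <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(or    = exp(estimate),
    lower = exp(estimate - z * se),
    upper = exp(estimate + z * se))
}

# Switch_Value estimate and SE transcribed from summary(m_switch_recall)
or_switch <- wald_or(0.07645, 0.27512)
round(or_switch, 2)

# With the fitted model in memory, the same inputs could instead come from
# lme4::fixef(m_switch_recall) and sqrt(diag(vcov(m_switch_recall))).
```

The interval runs from well below 1 to well above 1, which matches the non-significant z test.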

Forgetting errors

df_generation_merged <- read.csv("data/df_generation_forager_merged.csv")

df_model <- df_generation_merged[df_generation_merged$Switch_Value %in% c(0, 1), ]
df_model$error        <- as.integer(df_model$error)
df_model$Switch_Value <- as.numeric(df_model$Switch_Value)

m_switch_generation <- glmer(
  error ~ Switch_Value + (1 | participant_id),
  data    = df_model,
  family  = binomial
)
summary(m_switch_generation)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: error ~ Switch_Value + (1 | participant_id)
##    Data: df_model
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     959.7     974.0    -476.9     953.7       853 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9278 -0.5881 -0.4484  0.5221  3.1757 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 1.251    1.118   
## Number of obs: 856, groups:  participant_id, 39
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.02955    0.22272  -4.623 3.79e-06 ***
## Switch_Value -0.06407    0.16658  -0.385    0.701    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Switch_Valu -0.406

The forgetting model shows the opposite direction. Items produced at a cluster switch during generation are slightly less likely to be forgotten at recall (β = -0.064, OR ≈ 0.94). Again, the effect is not significant (p = 0.70).
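The fixed effects describe a participant whose random intercept is exactly zero. The sketch below is minimal base R and uses the estimates transcribed from the forgetting model above. It converts them to probabilities in two ways:

- for that typical participant (conditional probability);
- averaged over the fitted between-participant spread (marginal probability).

The marginal version integrates numerically over a normal random intercept, so it assumes the model's normality assumption holds. The helper `marginal_p` is hypothetical.

```r
# Fixed effects and random-intercept SD transcribed from summary(m_switch_generation)
b0   <- -1.02955   # intercept: non-switch items
b_sw <- -0.06407   # Switch_Value
sd_u <-  1.118     # participant random-intercept SD

# Conditional probability: a participant whose random intercept is 0
p_cond <- plogis(c(non_switch = b0, switch = b0 + b_sw))

# Marginal (population-averaged) probability: average the inverse logit
# over a normal distribution of participant intercepts with SD tau
marginal_p <- function(eta, tau) {
  integrate(function(u) plogis(eta + u) * dnorm(u, mean = 0, sd = tau),
            lower = -Inf, upper = Inf)$value
}
p_marg <- c(non_switch = marginal_p(b0, sd_u),
            switch     = marginal_p(b0 + b_sw, sd_u))

round(rbind(conditional = p_cond, marginal = p_marg), 3)
```

The intercept is negative, so averaging over participants pulls the probabilities toward 0.5. The population-averaged rates therefore come out higher than the conditional ones. The switch difference is small on either scale.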

Load all data

d_all <- read.csv("data/cluster_level_all_methods_for_r.csv")
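Later models use two forms of the binomial response, `cbind(n_errors, cluster_size - n_errors)` and `cbind(n_errors, n_correct)`, and the descriptive table averages `error_rate`. The sketch below checks that these columns agree. It assumes that `n_errors + n_correct` should equal `cluster_size` and that `error_rate` should equal `n_errors / cluster_size`; the file's actual column definitions are not shown here. The function name and the toy rows are hypothetical.

```r
# Hypothetical check: return the indices of rows whose count columns disagree.
check_cluster_counts <- function(d, tol = 1e-8) {
  bad_total <- d$n_errors + d$n_correct != d$cluster_size
  bad_rate  <- abs(d$error_rate - d$n_errors / d$cluster_size) > tol
  which(bad_total | bad_rate)
}

# Toy rows for illustration; row 3 is deliberately inconsistent
toy <- data.frame(
  cluster_size = c(1, 4, 3),
  n_errors     = c(0, 1, 2),
  n_correct    = c(1, 3, 0),
  error_rate   = c(0, 0.25, 2 / 3)
)
check_cluster_counts(toy)

# On the real data, check_cluster_counts(d_all) should return integer(0).
```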

Descriptive stats

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

d_all %>%
  mutate(is_singleton = cluster_size == 1) %>%
  group_by(method, task, is_singleton) %>%
  summarize(n = n(), error_rate = mean(error_rate))

## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by method, task, and is_singleton.
## ℹ Output is grouped by method and task.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(method, task, is_singleton))` for per-operation
##   grouping (`?dplyr::dplyr_by`) instead.

## # A tibble: 8 × 5
## # Groups:   method, task [4]
##   method task       is_singleton     n error_rate
##   <chr>  <chr>      <lgl>        <int>      <dbl>
## 1 delta  generation FALSE          197      0.291
## 2 delta  generation TRUE           312      0.295
## 3 delta  recall     FALSE          170      0.129
## 4 delta  recall     TRUE           219      0.169
## 5 norms  generation FALSE          202      0.282
## 6 norms  generation TRUE           303      0.310
## 7 norms  recall     FALSE          172      0.120
## 8 norms  recall     TRUE           272      0.173
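`mean(error_rate)` in the table above averages per-cluster rates, so a singleton counts as much as a ten-item cluster. The binomial GLMMs later in this document instead weight each cluster by its size. The sketch below computes the item-weighted counterpart, `sum(n_errors) / sum(cluster_size)`, in base R on hypothetical toy rows. With `d_all`, the same aggregation would group by `method`, `task`, and `is_singleton`.

```r
# Toy clusters (hypothetical): group "a" mixes a singleton with a large cluster
toy <- data.frame(
  group        = c("a", "a", "b", "b"),
  cluster_size = c(1, 9, 2, 2),
  n_errors     = c(1, 0, 1, 1)
)
toy$error_rate <- toy$n_errors / toy$cluster_size

# Cluster-averaged rate: every cluster weighted equally
unweighted <- tapply(toy$error_rate, toy$group, mean)

# Item-weighted rate: total errors over total items
weighted <- tapply(toy$n_errors, toy$group, sum) /
            tapply(toy$cluster_size, toy$group, sum)

rbind(unweighted, weighted)
```

For the `is_singleton == TRUE` rows of the table the two summaries coincide, because every cluster has exactly one item. They can differ only in the non-singleton rows.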

Cluster density ~ cluster size

library(lme4)
library(lmerTest)

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

# Delta — generation
m_coupling_delta_gen <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_all[d_all$method == "delta" & d_all$task == "generation", ]
)
summary(m_coupling_delta_gen)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_all[d_all$method == "delta" & d_all$task == "generation", ]
## 
## REML criterion at convergence: -419.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.46849 -0.77186 -0.02669  0.76599  2.65612 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 0.000342 0.01849 
##  Residual                   0.006133 0.07832 
## Number of obs: 197, groups:  participant_id, 39
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.619277   0.014255 180.091228  43.443   <2e-16 ***
## cluster_size  -0.039797   0.004303 187.631188  -9.249   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.893

# Delta — recall
m_coupling_delta_recall <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_all[d_all$method == "delta" & d_all$task == "recall", ]
)
summary(m_coupling_delta_recall)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_all[d_all$method == "delta" & d_all$task == "recall", ]
## 
## REML criterion at convergence: -354.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.55232 -0.63427 -0.03596  0.58985  2.81543 
## 
## Random effects:
##  Groups         Name        Variance  Std.Dev.
##  participant_id (Intercept) 0.0009244 0.03040 
##  Residual                   0.0059517 0.07715 
## Number of obs: 170, groups:  participant_id, 39
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.582416   0.015165 147.581543  38.406  < 2e-16 ***
## cluster_size  -0.030575   0.004227 163.114222  -7.233 1.74e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.855

# Norms — generation
m_coupling_norms_gen <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_all[d_all$method == "norms" & d_all$task == "generation", ]
)
summary(m_coupling_norms_gen)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_all[d_all$method == "norms" & d_all$task == "generation", ]
## 
## REML criterion at convergence: -349.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3303 -0.5118 -0.0364  0.6125  3.3173 
## 
## Random effects:
##  Groups         Name        Variance  Std.Dev.
##  participant_id (Intercept) 0.0002285 0.01512 
##  Residual                   0.0094291 0.09710 
## Number of obs: 202, groups:  participant_id, 39
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.527240   0.016788 183.642480  31.406  < 2e-16 ***
## cluster_size  -0.017152   0.005155 197.853036  -3.327  0.00104 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.900

# Norms — recall
m_coupling_norms_recall <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_all[d_all$method == "norms" & d_all$task == "recall", ]
)
summary(m_coupling_norms_recall)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_all[d_all$method == "norms" & d_all$task == "recall", ]
## 
## REML criterion at convergence: -281.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2939 -0.5616 -0.0310  0.6991  3.1809 
## 
## Random effects:
##  Groups         Name        Variance  Std.Dev.
##  participant_id (Intercept) 0.0001541 0.01241 
##  Residual                   0.0103648 0.10181 
## Number of obs: 172, groups:  participant_id, 38
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.565246   0.020320 157.352125  27.817  < 2e-16 ***
## cluster_size  -0.028065   0.006865 169.996081  -4.088 6.69e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.917
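All four subsets show density falling with size. As a result, `size_c` and `density_c` are correlated when they enter the error-rate models further down together. The sketch below gives a rough base-R measure of how much that correlation inflates coefficient variances. It assumes the simple two-predictor variance inflation factor, 1 / (1 - r²). The helper name is hypothetical, and the toy vectors are made up for illustration rather than taken from the data.

```r
# Hypothetical helper: variance inflation factor for exactly two predictors
vif_two <- function(x, y) {
  r <- cor(x, y, use = "complete.obs")   # singletons have NA density
  1 / (1 - r^2)
}

# Toy clusters (made up): density tends to fall as size grows
size    <- c(2, 3, 4, 5, 6, 8)
density <- c(0.60, 0.55, 0.58, 0.47, 0.45, 0.40)
vif_two(size, density)

# On the real data, for example:
# with(subset(d_all, method == "delta" & task == "generation"),
#      vif_two(size_c, density_c))
```

A VIF near 1 means the joint model's standard errors are barely affected. The positive size_c/density_c entries in the "Correlation of Fixed Effects" blocks below are consistent with this overlap.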

Visuals

library(ggplot2)

# DECISION: faceted scatter with two layers of fits: thin gray per-participant
# OLS fits (the within-participant coupling) and a thick colored pooled OLS
# fit with a 95% CI band. The colored line ignores participant clustering, so
# it only approximates the LMM fixed effect reported above. Likewise, each gray
# line gets its own intercept and slope, whereas the random-intercept model
# shares a single slope across participants and lets only intercepts vary.
ggplot(d_all[!is.na(d_all$cluster_density), ],
       aes(x = cluster_size, y = cluster_density)) +
  geom_point(aes(color = method), alpha = 0.4, size = 1.5) +
  geom_smooth(aes(group = participant_id),
              method = "lm", se = FALSE,
              color = "gray60", linewidth = 0.3, alpha = 0.5) +
  geom_smooth(aes(color = method),
              method = "lm", se = TRUE,
              linewidth = 1.2) +
  facet_grid(method ~ task) +
  scale_color_manual(values = c(delta = "#1f77b4", norms = "#d62728")) +
  labs(x = "Cluster size (items)",
       y = "Cluster density (mean within-cluster cosine similarity)",
       title = "Density–size coupling, by method × task",
       caption = "Gray lines = per-participant fits. Colored line = pooled fit (95% CI band).") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none")

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
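The colored lines in that figure are pooled OLS fits, not the LMM fixed effects reported above. The sketch below overlays the fixed-effect lines instead. It assumes the four intercepts and slopes transcribed from the `summary()` outputs above; with the models in memory, `lme4::fixef()` would return them. The `geom_abline()` layer is drawn only if ggplot2 is installed and `d_all` is loaded, mirroring the `requireNamespace()` pattern used for patchwork below.

```r
# Fixed effects transcribed from the four coupling models above
fe <- data.frame(
  method    = c("delta", "delta", "norms", "norms"),
  task      = c("generation", "recall", "generation", "recall"),
  intercept = c(0.619277, 0.582416, 0.527240, 0.565246),
  slope     = c(-0.039797, -0.030575, -0.017152, -0.028065)
)

# Predicted density for a participant with a zero random intercept
sizes <- c(2, 5, 10)
pred  <- sapply(sizes, function(s) fe$intercept + fe$slope * s)
dimnames(pred) <- list(paste(fe$method, fe$task), paste0("size_", sizes))
round(pred, 3)

if (requireNamespace("ggplot2", quietly = TRUE) && exists("d_all")) {
  library(ggplot2)
  p <- ggplot(d_all[!is.na(d_all$cluster_density), ],
              aes(x = cluster_size, y = cluster_density)) +
    geom_point(aes(color = method), alpha = 0.4, size = 1.5) +
    # `fe` carries the facet columns, so each row lands in its own panel
    geom_abline(data = fe, aes(intercept = intercept, slope = slope),
                linewidth = 1.2) +
    facet_grid(method ~ task) +
    theme_minimal(base_size = 11) +
    theme(legend.position = "none")
  print(p)
}
```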

Cluster size and density as predictors of error rate

Norms-based clusters

Forgetting errors:

m_norms_gen <- glmer(
  cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c + (1 | participant_id),
  data   = d_all[d_all$method == "norms" & d_all$task == "generation", ],
  family = binomial
)
summary(m_norms_gen)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c +  
##     (1 | participant_id)
##    Data: d_all[d_all$method == "norms" & d_all$task == "generation", ]
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     475.1     488.4    -233.6     467.1       198 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5509 -0.8403 -0.4364  0.7278  4.2947 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 1.16     1.077   
## Number of obs: 202, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.11382    0.23944  -4.652 3.29e-06 ***
## size_c      -0.03791    0.06992  -0.542    0.588    
## density_c   -1.42253    1.23749  -1.150    0.250    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.470       
## density_c -0.043  0.261

Visuals

d_norms_gen <- d_all[d_all$method == "norms" & d_all$task == "generation",]
d_norms_gen_dens <- d_norms_gen[!is.na(d_norms_gen$cluster_density),]

# DECISION: bubble plot with point size = cluster_size. The binomial GLMM
# weights clusters by size, so a 10-item cluster contributes more than a
# 2-item cluster; making point size encode cluster_size makes the visual
# weighting match the analytical weighting. The fitted curve is a pooled,
# size-weighted binomial GLM of error_rate on cluster_density. It ignores the
# participant random intercepts and does not adjust for size, so it only
# approximates the GLMM's density_c effect.
p_density <- ggplot(d_norms_gen_dens, aes(x = cluster_density, y = error_rate)) +
  geom_point(aes(size = cluster_size), alpha = 0.4, color = "#d62728") +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  scale_size_continuous(range = c(1, 6), name = "Cluster size") +
  labs(x = "Cluster density",
       y = "Cluster error rate (proportion forgotten)",
       title = "Forgetting ~ density (norms)",
       caption = "Point size = cluster size. Curve = size-weighted binomial fit.") +
  theme_minimal(base_size = 11)

p_size <- ggplot(d_norms_gen, aes(x = cluster_size, y = error_rate)) +
  geom_jitter(alpha = 0.4, color = "#d62728", width = 0.15, height = 0) +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  labs(x = "Cluster size", y = "Cluster error rate (proportion forgotten)",
       title = "Forgetting ~ size (norms)") +
  theme_minimal(base_size = 11)

if (requireNamespace("patchwork", quietly = TRUE)) {
  library(patchwork)
  p_density + p_size
} else {
  print(p_density); print(p_size)
}
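The bubble-plot comment relies on the fact that a binomial model for counts weights each cluster by its number of items. This document uses two forms of that model: `cbind(errors, non-errors)` in the GLMMs and a size-weighted proportion in `geom_smooth()`. The base-R sketch below uses made-up toy clusters to show that both forms give the same logistic-regression coefficients. The comparison ignores the random intercept, which `geom_smooth()` cannot fit.

```r
# Toy clusters (hypothetical): x = a cluster-level predictor
x <- c(0.2, 0.4, 0.5, 0.6, 0.8, 0.9)
n <- c(2, 5, 3, 8, 4, 6)   # items per cluster
k <- c(1, 2, 1, 2, 1, 1)   # errors per cluster

# (1) Counts form, as in glmer(cbind(n_errors, cluster_size - n_errors) ~ ...)
fit_counts <- glm(cbind(k, n - k) ~ x, family = binomial)

# (2) Proportions weighted by cluster size, as in geom_smooth(aes(weight = cluster_size))
fit_props <- glm(k / n ~ x, family = binomial, weights = n)

rbind(counts = coef(fit_counts), weighted_props = coef(fit_props))
```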

Intrusion errors:

m_norms_recall <- glmer(
  cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c + (1 | participant_id),
  data   = d_all[d_all$method == "norms" & d_all$task == "recall", ],
  family = binomial
)
summary(m_norms_recall)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c +  
##     (1 | participant_id)
##    Data: d_all[d_all$method == "norms" & d_all$task == "recall", ]
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     228.0     240.6    -110.0     220.0       168 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.9050 -0.4922 -0.2535 -0.1903  3.0071 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 4.503    2.122   
## Number of obs: 172, groups:  participant_id, 38
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.6075     0.5313  -4.908  9.2e-07 ***
## size_c       -0.2853     0.1741  -1.639    0.101    
## density_c    -1.1891     1.7895  -0.665    0.506    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.313       
## density_c -0.054  0.318
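The participant random-intercept variance is much larger for intrusions (4.503) than for forgetting in the matching norms model (1.16). One common way to put these on a comparable footing is the latent-scale intraclass correlation for a logit random-intercept model, σ²ᵤ / (σ²ᵤ + π²/3). The minimal base-R sketch below applies it to the variances transcribed from the two summaries. The helper name is hypothetical.

```r
# Hypothetical helper: latent-scale ICC for a logit random-intercept model
icc_latent <- function(var_u) var_u / (var_u + pi^2 / 3)

# Random-intercept variances transcribed from summary(m_norms_gen)
# and summary(m_norms_recall)
icc <- c(generation = icc_latent(1.16), recall = icc_latent(4.503))
round(icc, 3)
```

Under this approximation, more than half of the latent variation in intrusion propensity lies between participants, versus about a quarter for forgetting.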

Visuals

d_norms_recall <- d_all[d_all$method == "norms" & d_all$task == "recall", ]
d_norms_recall_dens <- d_norms_recall[!is.na(d_norms_recall$cluster_density),]

p_density <- ggplot(d_norms_recall_dens, aes(x = cluster_density, y = error_rate)) +
  geom_point(aes(size = cluster_size), alpha = 0.4, color = "#9467bd") +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  scale_size_continuous(range = c(1, 6), name = "Cluster size") +
  labs(x = "Cluster density",
       y = "Cluster error rate (proportion intrusions)",
       title = "Intrusions ~ density (norms)",
       caption = "Point size = cluster size. Curve = size-weighted binomial fit.") +
  theme_minimal(base_size = 11)

p_size <- ggplot(d_norms_recall, aes(x = cluster_size, y = error_rate)) +
  geom_jitter(alpha = 0.4, color = "#9467bd", width = 0.15, height = 0) +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  labs(x = "Cluster size", y = "Cluster error rate (proportion intrusions)",
       title = "Intrusions ~ size (norms)") +
  theme_minimal(base_size = 11)

if (requireNamespace("patchwork", quietly = TRUE)) {
  library(patchwork)
  p_density + p_size
} else {
  print(p_density); print(p_size)
}

Similarity-drop clusters

Forgetting errors:

m_sim_gen <- glmer(
  cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c + (1 | participant_id),
  data = subset(d_all, method == "delta" & task == "generation" & !is.na(density_c)),
  family = binomial
)
summary(m_sim_gen)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c +  
##     (1 | participant_id)
##    Data: 
## subset(d_all, method == "delta" & task == "generation" & !is.na(density_c))
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     467.6     480.7    -229.8     459.6       193 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7478 -0.8227 -0.5133  0.7321  2.7860 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 1.185    1.089   
## Number of obs: 197, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.95743    0.24329  -3.935 8.31e-05 ***
## size_c      -0.07546    0.08526  -0.885    0.376    
## density_c   -0.01595    1.41192  -0.011    0.991    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.501       
## density_c -0.237  0.618

Visuals

# DECISION: build two separate data subsets — density plot drops singletons
# (no defined density), size plot keeps them.
d_delta_gen      <- d_all[d_all$method == "delta" & d_all$task == "generation", ]
d_delta_gen_dens <- d_delta_gen[!is.na(d_delta_gen$cluster_density), ]

p_density <- ggplot(d_delta_gen_dens, aes(x = cluster_density, y = error_rate)) +
  geom_point(aes(size = cluster_size), alpha = 0.4, color = "#1f77b4") +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  scale_size_continuous(range = c(1, 6), name = "Cluster size") +
  labs(x = "Cluster density",
       y = "Cluster error rate (proportion forgotten)",
       title = "Forgetting ~ density (delta)",
       caption = "Singletons excluded (no defined density). Point size = cluster size.") +
  theme_minimal(base_size = 11)

p_size <- ggplot(d_delta_gen, aes(x = cluster_size, y = error_rate)) +
  geom_jitter(alpha = 0.4, color = "#1f77b4", width = 0.15, height = 0) +
  geom_smooth(aes(weight = cluster_size),
              method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, color = "black", linewidth = 1) +
  geom_vline(xintercept = 1.5, linetype = "dashed",
             color = "gray50", linewidth = 0.3) +
  labs(x = "Cluster size", y = "Cluster error rate (proportion forgotten)",
       title = "Forgetting ~ size (delta)",
       caption = "All clusters including singletons; dashed line = singleton boundary.") +
  theme_minimal(base_size = 11)

if (requireNamespace("patchwork", quietly = TRUE)) {
  library(patchwork)
  p_density + p_size
} else {
  print(p_density); print(p_size)
}

Intrusion errors:

m_sim_recall <- glmer(
  cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c + (1 | participant_id),
  data   =  subset(d_all, method == "delta" & task == "recall" & !is.na(density_c)),
  family = binomial
)
summary(m_sim_recall)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, cluster_size - n_errors) ~ size_c + density_c +  
##     (1 | participant_id)
##    Data: 
## subset(d_all, method == "delta" & task == "recall" & !is.na(density_c))
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     239.8     252.4    -115.9     231.8       166 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2238 -0.3922 -0.2181 -0.1547  3.7287 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 5.745    2.397   
## Number of obs: 170, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.9407     0.6076  -4.840  1.3e-06 ***
## size_c       -0.2101     0.1367  -1.537    0.124    
## density_c    -2.4029     2.4308  -0.989    0.323    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.261       
## density_c -0.073  0.501

Cluster size only (singletons included, hence more data)

Similarity-drop clusters

Forgetting errors

m_sim_gen_size <- glmer(
  cbind(n_errors, n_correct) ~ size_c + (1 | participant_id),
  data = subset(d_all, method == "delta" & task == "generation"),
  family = binomial
)
summary(m_sim_gen_size)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, n_correct) ~ size_c + (1 | participant_id)
##    Data: subset(d_all, method == "delta" & task == "generation")
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     810.3     823.0    -402.2     804.3       506 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0481 -0.6772 -0.4745  0.8234  3.0562 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 1.135    1.066   
## Number of obs: 509, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.03812    0.19793  -5.245 1.56e-07 ***
## size_c      -0.05010    0.04696  -1.067    0.286    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## size_c -0.189

Intrusion errors

m_sim_recall_size <- glmer(
  cbind(n_errors, n_correct) ~ size_c + (1 | participant_id),
  data = subset(d_all, method == "delta" & task == "recall"),
  family = binomial
)
summary(m_sim_recall_size)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, n_correct) ~ size_c + (1 | participant_id)
##    Data: subset(d_all, method == "delta" & task == "recall")
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     414.2     426.1    -204.1     408.2       386 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7861 -0.4574 -0.2571 -0.1665  4.5910 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 3.553    1.885   
## Number of obs: 389, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.50316    0.38996  -6.419 1.37e-10 ***
## size_c      -0.18396    0.08369  -2.198   0.0279 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## size_c -0.109

Norms-based clusters

Forgetting errors

m_norms_gen_size <- glmer(
  cbind(n_errors, n_correct) ~ size_c + (1 | participant_id),
  data = subset(d_all, method == "norms" & task == "generation"),
  family = binomial
)
summary(m_norms_gen_size)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, n_correct) ~ size_c + (1 | participant_id)
##    Data: subset(d_all, method == "norms" & task == "generation")
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     817.7     830.4    -405.9     811.7       502 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0019 -0.7144 -0.4654  0.7297  3.2569 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 1.125    1.06    
## Number of obs: 505, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.05702    0.19790  -5.341 9.24e-08 ***
## size_c      -0.02623    0.04876  -0.538    0.591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## size_c -0.207

Intrusion errors

m_norms_recall_size <- glmer(
  cbind(n_errors, n_correct) ~ size_c + (1 | participant_id),
  data = subset(d_all, method == "norms" & task == "recall"),
  family = binomial
)
summary(m_norms_recall_size)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(n_errors, n_correct) ~ size_c + (1 | participant_id)
##    Data: subset(d_all, method == "norms" & task == "recall")
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     428.4     440.7    -211.2     422.4       441 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0927 -0.4215 -0.2481 -0.1633  3.9280 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 3.684    1.919   
## Number of obs: 444, groups:  participant_id, 39
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.5275     0.3944  -6.408 1.47e-10 ***
## size_c       -0.2947     0.1087  -2.711   0.0067 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## size_c -0.047
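Once singletons are included, only the intrusion (recall) models show a reliable size effect. The base-R sketch below puts all four size-only slopes on the odds-ratio scale with Wald 95% intervals. It uses estimates and standard errors transcribed from the summaries above. The recomputed Wald p-values should match the printed Pr(>|z|) column, which gives a quick check on the transcription.

```r
# size_c estimates and SEs transcribed from the four size-only models above
size_only <- data.frame(
  clusters = c("delta", "delta", "norms", "norms"),
  task     = c("generation", "recall", "generation", "recall"),
  estimate = c(-0.05010, -0.18396, -0.02623, -0.2947),
  se       = c(0.04696, 0.08369, 0.04876, 0.1087)
)

z <- qnorm(0.975)
size_only$or_per_unit <- exp(size_only$estimate)
size_only$lower       <- exp(size_only$estimate - z * size_only$se)
size_only$upper       <- exp(size_only$estimate + z * size_only$se)
size_only$p_wald      <- 2 * pnorm(-abs(size_only$estimate / size_only$se))
size_only[, -(3:4)]   # drop the raw estimate/se columns for display
```

This reading assumes `size_c` is `cluster_size` centered but not rescaled, so one unit is one item. Under that assumption, each additional item lowers the odds of an intrusion by roughly 17% for similarity-drop clusters and 26% for norms-based clusters. The forgetting odds ratios stay close to 1.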

Now, treating the per-cluster forgetting error rate as a continuous outcome, first in an ordinary least-squares model and then in a linear mixed model:

d_sub <- d_all[d_all$method == "norms" & d_all$task == "generation", ]

lm_norms_gen <- lm(
  error_rate ~ size_c + density_c,
  data = d_sub
)

summary(lm_norms_gen)

## 
## Call:
## lm(formula = error_rate ~ size_c + density_c, data = d_sub)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3734 -0.2804 -0.2116  0.1790  0.7837 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.30486    0.03346   9.110   <2e-16 ***
## size_c      -0.01977    0.01919  -1.030    0.304    
## density_c   -0.36683    0.25577  -1.434    0.153    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3554 on 199 degrees of freedom
##   (303 observations deleted due to missingness)
## Multiple R-squared:  0.01283,    Adjusted R-squared:  0.002912 
## F-statistic: 1.293 on 2 and 199 DF,  p-value: 0.2766
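The OLS model above treats every cluster's error rate as one equally weighted observation. It also drops the 303 singletons, whose density is undefined. The base-R sketch below uses made-up toy clusters to show that adding `weights = cluster_size` makes the linear fit equivalent, in its coefficients, to an item-level linear probability model. This is an illustrative alternative, not the analysis reported above.

```r
# Toy clusters (hypothetical)
x <- c(-0.2, -0.1, 0.0, 0.1, 0.3)   # centered cluster-level predictor
n <- c(2, 6, 3, 5, 4)               # items per cluster
k <- c(1, 3, 1, 1, 0)               # errors per cluster
p <- k / n                          # per-cluster error rate

fit_unweighted <- lm(p ~ x)
fit_weighted   <- lm(p ~ x, weights = n)

# Item-level expansion: one 0/1 row per item
x_item <- rep(x, n)
y_item <- unlist(Map(function(ki, ni) rep(c(1, 0), c(ki, ni - ki)), k, n))
fit_items <- lm(y_item ~ x_item)

rbind(unweighted = coef(fit_unweighted),
      weighted   = coef(fit_weighted),
      item_level = unname(coef(fit_items)))
```

The weighted and item-level rows agree because both minimise the same sum of squares, with each cluster counted once per item. Their standard errors differ, however.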

library(lmerTest)   

d_sub <- d_all[d_all$method == "norms" & d_all$task == "generation", ]

lmm <- lmer(
  error_rate ~ size_c + density_c + (1 | participant_id),
  data = d_sub
)

summary(lmm)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: error_rate ~ size_c + density_c + (1 | participant_id)
##    Data: d_sub
## 
## REML criterion at convergence: 157
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3434 -0.7183 -0.3172  0.5332  2.6527 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 0.02391  0.1546  
##  Residual                   0.10545  0.3247  
## Number of obs: 202, groups:  participant_id, 39
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.29275    0.04062  53.43395   7.208    2e-09 ***
## size_c       -0.01138    0.01838 184.36976  -0.619    0.537    
## density_c    -0.29151    0.24671 186.49674  -1.182    0.239    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.522       
## density_c -0.117  0.251

d_sub <- d_all[d_all$method == "delta" & d_all$task == "recall", ]

lmm <- lmer(
  error_rate ~ size_c + density_c + (1 | participant_id),
  data = d_sub
)

summary(lmm)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: error_rate ~ size_c + density_c + (1 | participant_id)
##    Data: d_sub
## 
## REML criterion at convergence: 2.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8888 -0.4152 -0.1462  0.0174  4.3329 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 0.05205  0.2282  
##  Residual                   0.03608  0.1899  
## Number of obs: 170, groups:  participant_id, 39
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.16294    0.04236  43.00905   3.846 0.000392 ***
## size_c       -0.01936    0.01285 139.49481  -1.507 0.134109    
## density_c    -0.24892    0.20743 141.50673  -1.200 0.232148    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) size_c
## size_c    -0.339       
## density_c -0.150  0.514

d_sub <- d_all[d_all$method == "norms" & d_all$task == "generation", ]
lmm_coupling_gen <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_sub
)
summary(lmm_coupling_gen)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_sub
## 
## REML criterion at convergence: -349.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3303 -0.5118 -0.0364  0.6125  3.3173 
## 
## Random effects:
##  Groups         Name        Variance  Std.Dev.
##  participant_id (Intercept) 0.0002285 0.01512 
##  Residual                   0.0094291 0.09710 
## Number of obs: 202, groups:  participant_id, 39
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.527240   0.016788 183.642480  31.406  < 2e-16 ***
## cluster_size  -0.017152   0.005155 197.853036  -3.327  0.00104 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.900

d_sub <- d_all[d_all$method == "delta" & d_all$task == "generation", ]
lmm_coupling_gen <- lmer(
  cluster_density ~ cluster_size + (1 | participant_id),
  data = d_sub
)
summary(lmm_coupling_gen)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cluster_density ~ cluster_size + (1 | participant_id)
##    Data: d_sub
## 
## REML criterion at convergence: -419.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.46849 -0.77186 -0.02669  0.76599  2.65612 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept) 0.000342 0.01849 
##  Residual                   0.006133 0.07832 
## Number of obs: 197, groups:  participant_id, 39
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)    0.619277   0.014255 180.091228  43.443   <2e-16 ***
## cluster_size  -0.039797   0.004303 187.631188  -9.249   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## cluster_siz -0.893

cluster_vs_errors_stats

alina

2026-04-24
