Test Smith & Yu (2008) / Yu & Smith (2011)

After a couple of minutes of "hand-tuning" I found parameter values that closely match the overall 2AFC performance reported in Smith & Yu (2008), and a slightly different set that produces the higher performance reported in Yu & Smith (2011).

# Smith & Yu 2008 / Yu & Smith, 2011
sy = read.table("SmithYu2008-2x2-6w.txt", header=F)
source("models/model.R")
mp = model(c(.02, 2, .98), sy)
sum(mp$perf) # 3.48 words learned, but this is 6AFC
## [1] 3.482948
mp = model(c(.03, 2, .98), sy)
sum(mp$perf) # 3.95 words learned (6AFC)
## [1] 3.953954
# Expected m-alternative forced-choice (mAFC) accuracy per word, assuming mat is a
# word x referent association matrix with the correct pairings on the diagonal.
mAFC_test <- function(mat, m) {
  perf = rep(0, nrow(mat))
  for(i in 1:nrow(mat)) {
    # sample m-1 incorrect referents to serve as foils
    alts = sample(setdiff(1:ncol(mat), i), m-1)
    # choice probability: correct association strength relative to the m presented alternatives
    perf[i] = mat[i,i] / sum(mat[i,c(alts, i)])
  }
  return(perf)
}

mp = model(c(.0015, 2, .97), sy)
sum(mAFC_test(mp$matrix, 2)) # 3.57 words learned
## [1] 3.575718
mp = model(c(.003, 1, .97), sy)
sum(mAFC_test(mp$matrix, 2)) # 4.00 words learned
## [1] 3.998423

Let’s try the parameter sets they chose, which were optimized in prior work. We don’t find quite the same RMSE that they did for set B (roughly 2.9 here vs. their 2.05), and set C in fact achieves a better RMSE of 0.75. (The targets of 3.5 and 4 words in the code below correspond to the 2AFC performance levels matched above for Smith & Yu (2008) and Yu & Smith (2011), respectively.)

# set B (optimized for the 6-late condition in prior work): they report RMSE=2.05
mp = model(c(.2, .88, .96), sy)
sqrt(sum((sum(mAFC_test(mp$matrix, 2)) - c(3.5, 4))^2)) # 2.90 (with noise +.01)
## [1] 3.002922
# set C
mp = model(c(.001, 15, 1), sy)
sum(mAFC_test(mp$matrix, 2)) # 3.28 words
## [1] 3.278335
sqrt(sum((sum(mAFC_test(mp$matrix, 2)) - c(3.5, 4))^2))
## [1] 0.7542496
# set A
mp = model(c(0.347, 18.31, .995), sy)
sum(mAFC_test(mp$matrix, 2)) # 5.79 words
## [1] 5.941403

Explore Parameters’ Influence on Learning

We systematically evaluate the model’s learning performance over a small region of parameter space to get a better sense of each parameter’s influence than hand-tuning provides.
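
The chunk that builds the sy_perf grid isn’t shown here; a minimal sketch, assuming the model takes its parameters in the order c(chi, lambda, alpha) as in the calls above, and using purely illustrative grid bounds, would look something like this:

# sketch: sweep chi, lambda, and alpha over an illustrative grid (not the bounds actually used)
grid = expand.grid(chi = seq(.0001, .005, by = .0005),
                   lambda = c(.5, 2),
                   alpha = seq(.80, .998, by = .006))
sy_perf = grid
sy_perf$perf = NA
for(r in 1:nrow(grid)) {
  # assumes parameter order c(chi, lambda, alpha), matching the hand-tuned calls above
  mp = model(c(grid$chi[r], grid$lambda[r], grid$alpha[r]), sy)
  sy_perf$perf[r] = sum(mAFC_test(mp$matrix, 2)) # expected words learned at 2AFC test
}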

The heatmap below shows the number of words learned by the model for different parameter values (facets: \(\lambda = 0.5\) and \(\lambda = 2\)).

require(ggplot2)
## Loading required package: ggplot2
ggplot(sy_perf, aes(x=chi, y=alpha, fill=perf)) + geom_tile() +
  facet_wrap(vars(lambda)) + theme_bw()

It is clear that the model’s performance greatly exceeds children’s over much of the parameter space, but there are many combinations of small learning rate (\(\chi\)) values and high memory decay parameter (\(\alpha\)) values for which the model learns 3-4 words, as children typically do.

good_parms = sy_perf[with(sy_perf, which(perf<=4.0 & perf>=3.5)),]
summary(good_parms) # chi=.00006 - .0045, alpha=.803 - .998
##       chi               lambda          alpha             perf      
##  Min.   :0.000060   Min.   :0.500   Min.   :0.8030   Min.   :3.500  
##  1st Qu.:0.001060   1st Qu.:0.500   1st Qu.:0.9320   1st Qu.:3.646  
##  Median :0.001810   Median :2.000   Median :0.9650   Median :3.774  
##  Mean   :0.001855   Mean   :1.251   Mean   :0.9533   Mean   :3.767  
##  3rd Qu.:0.002560   3rd Qu.:2.000   3rd Qu.:0.9830   3rd Qu.:3.893  
##  Max.   :0.004510   Max.   :2.000   Max.   :0.9980   Max.   :3.999

Vlach & DeBrock 2017 / 2019

Let’s try the Vlach & DeBrock 2017 / 2019 massed vs. interleaved (spaced) repetition data.

vd = read.table("VlachDeBrock2017_2019.txt", header=F)
massed = c(1, 8, 9, 10, 11, 12)
spaced = setdiff(1:12, massed)
# 2AFC test, overall M=6.81 words learned (better with age: only the 47-58 mos. and 50-70 mos. groups were above chance)
# Exp1: massed M=3.17, interleaved M=3.64 
# Exp2: massed M=3.00, interleaved M=3.48

mp = model(c(.003, 2, .97), vd)
mperf = mAFC_test(mp$matrix, 2)
massed_perf = sum(mperf[massed]) 
spaced_perf = sum(mperf[spaced]) 

Using the same hand-tuned parameters that I found for Yu & Smith (2011), the model learns 3.5 of the massed words and 3.37 of the interleaved words. Although the model does not show the advantage for interleaved words that children showed, these means are quite close to children’s mean performance.
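
For reference, the per-condition totals computed in the chunk above can be printed directly; because mAFC_test samples foils at random, the exact values shift a little from run to run.

round(c(massed = massed_perf, spaced = spaced_perf), 2) # ~3.5 and ~3.37 in the run described above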