After a couple of minutes of “hand-tuning” I managed to find parameter values that closely match the overall 2AFC performance reported in Smith & Yu (2008), and a slightly different set that produces the higher performance reported in Yu & Smith (2011).
# Smith & Yu 2008 / Yu & Smith, 2011
sy = read.table("SmithYu2008-2x2-6w.txt", header=F)
source("models/model.R")
mp = model(c(.02, 2, .98), sy)
sum(mp$perf) # 3.48 words learned, but this is 6AFC
## [1] 3.482948
mp = model(c(.03, 2, .98), sy)
sum(mp$perf) # 3.95 words learned (6AFC)
## [1] 3.953954
mAFC_test <- function(mat, m) {
  perf = rep(0, nrow(mat))
  for(i in 1:nrow(mat)) {
    # sample m-1 incorrect referents as distractors
    alts = sample(setdiff(1:ncol(mat), i), m-1)
    # probability of choosing the correct referent among the m alternatives
    perf[i] = mat[i,i] / sum(mat[i, c(alts, i)])
  }
  return(perf)
}
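Because mAFC_test samples the distractors at random, each call is stochastic (which may explain small discrepancies between some of the inline comments and the printed values below). A minimal sketch, my addition, that averages over repeated draws if a more stable estimate is wanted:
# average mAFC_test over several random distractor draws to reduce sampling noise
mAFC_test_avg <- function(mat, m, n_reps = 100) {
  rowMeans(replicate(n_reps, mAFC_test(mat, m)))
}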
mp = model(c(.0015, 2, .97), sy)
sum(mAFC_test(mp$matrix, 2)) # 3.57 words learned
## [1] 3.575718
mp = model(c(.003, 1, .97), sy)
sum(mAFC_test(mp$matrix, 2)) # 4.00 words learned
## [1] 3.998423
Let’s try the parameter sets they report, which were optimized in prior work. I don’t find quite the same RMSE they did for set B (I got 2.90, they got 2.05), and in fact set C achieves a better RMSE of 0.75.
# set B: they report RMSE=2.05 with parameter set B (optimized for 6-late condition)
mp = model(c(.2, .88, .96), sy)
sqrt(sum((sum(mAFC_test(mp$matrix, 2)) - c(3.5, 4))^2)) # 2.90 (with noise +.01)
## [1] 3.002922
# set C
mp = model(c(.001, 15, 1), sy)
sum(mAFC_test(mp$matrix, 2)) # 3.28 words
## [1] 3.278335
sqrt(sum((sum(mAFC_test(mp$matrix, 2)) - c(3.5, 4))^2))
## [1] 0.7542496
# set A
mp = model(c(0.347, 18.31, .995), sy)
sum(mAFC_test(mp$matrix, 2)) # 5.79 words
## [1] 5.941403
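As an aside, sqrt(sum(...)) above is the root sum of squared errors over the two condition means; if the reported RMSEs instead average over conditions before taking the root (an assumption on my part), a small helper would be:
# conventional RMSE: root of the mean squared error across conditions
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
rmse(sum(mAFC_test(mp$matrix, 2)), c(3.5, 4))  # e.g., for the set A fit above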
Next, I systematically evaluate the model’s learning performance over a small region of parameter space to get a better sense of each parameter’s influence than hand-tuning provides.
The heatmap below shows the number of words learned by the model for different parameter values (facets: \(\lambda = 0.5\) and \(\lambda = 2\)).
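A minimal sketch of how the sy_perf grid used below might be constructed, assuming the model() interface and the mAFC_test() helper defined above (the parameter ranges are illustrative assumptions, not the values actually used):
# illustrative sweep; ranges are assumptions chosen to bracket the hand-tuned values above
sy_perf = expand.grid(chi    = seq(.00005, .005, length.out = 25),
                      lambda = c(0.5, 2),
                      alpha  = seq(.80, 1, length.out = 25))
sy_perf$perf = apply(sy_perf, 1, function(p)
  sum(mAFC_test(model(as.numeric(p), sy)$matrix, 2)))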
require(ggplot2)
## Loading required package: ggplot2
ggplot(sy_perf, aes(x=chi, y=alpha, fill=perf)) + geom_tile() +
  facet_wrap(vars(lambda)) + theme_bw()
It is clear that the model’s performance greatly exceeds children’s for much of the parameter space, but there are many combinations of small learning rate (\(\chi\)) values and high values of memory decay (\(\alpha\)) that learn 3-4 words, as children typically do.
good_parms = sy_perf[with(sy_perf, which(perf<=4.0 & perf>=3.5)),]
summary(good_parms) # chi=.00006 - .0045, alpha=.803 - .998
## chi lambda alpha perf
## Min. :0.000060 Min. :0.500 Min. :0.8030 Min. :3.500
## 1st Qu.:0.001060 1st Qu.:0.500 1st Qu.:0.9320 1st Qu.:3.646
## Median :0.001810 Median :2.000 Median :0.9650 Median :3.774
## Mean :0.001855 Mean :1.251 Mean :0.9533 Mean :3.767
## 3rd Qu.:0.002560 3rd Qu.:2.000 3rd Qu.:0.9830 3rd Qu.:3.893
## Max. :0.004510 Max. :2.000 Max. :0.9980 Max. :3.999
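One could also pull out the single grid point closest to the empirical range, e.g. by distance from the midpoint of the two reported means (3.5 and 4 words); a small sketch, assuming sy_perf as above:
# grid point whose 2AFC total is closest to the midpoint of the empirical means
sy_perf[which.min(abs(sy_perf$perf - mean(c(3.5, 4)))), ]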
Let’s try the Vlach & DeBrock 2017 / 2019 massed vs. interleaved (spaced) repetition data.
vd = read.table("VlachDeBrock2017_2019.txt", header=F)
massed = c(1, 8, 9, 10, 11, 12)
spaced = setdiff(1:12, massed)
# 2AFC test, overall M=6.81 words learned (better with age: only the 47-58 mo. / 50-70 mo. groups were above chance)
# Exp1: massed M=3.17, interleaved M=3.64
# Exp2: massed M=3.00, interleaved M=3.48
mp = model(c(.003, 2, .97), vd)
mperf = mAFC_test(mp$matrix, 2)
massed_perf = sum(mperf[massed])
spaced_perf = sum(mperf[spaced])
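To display the two totals referenced below:
round(c(massed = massed_perf, spaced = spaced_perf), 2)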
Using hand-tuned parameters close to those I found for Yu & Smith (2011), the model learns 3.5 of the massed words and 3.37 of the interleaved words. Although the model does not show the advantage for interleaved words that children showed, these means are quite close to children’s mean performance.