The Exemplar-Based Random Walk (EBRW) is a dynamic extension of the Generalized Context Model (GCM). According to the GCM, each time you experience an item in a particular context, it leaves a “trace” in memory. This trace is called an “exemplar” or an “instance” and it consists of a record of the values the item had for particular features.
For example, colors can be described in terms of three features: hue, chroma (or “saturation”), and luminance (or “brightness”). Imagine that you saw many reddish-hued colors that varied in chroma and luminance. According to the GCM, each of the colors you saw would leave a trace in memory. Each trace would consist of the values of the color you saw along those three dimensions. In this example, all the colors have the same value for “hue”, which is 0. The value of “0” for “hue” corresponds to a reddish color. The colors differ in their values for chroma and luminance, as illustrated in the graph below:
colDF <- expand_grid(h = hue, c = seq(30, 70, length.out = 5), l = seq(30, 70, length.out = 5)) %>%
mutate(col = hcl(h, c, l))
colDF %>%
ggplot(aes(x = c, y = l, fill = col)) +
geom_tile(width = 2.5, height = 2.5) +
scale_fill_identity() +
coord_equal() +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", caption = bquote(plain(hue) == .(round(hue))))
Your memory traces can also be written as a matrix, where each row corresponds to an item (color) and each column corresponds to a feature (h for “hue”, c for “chroma”, and l for “luminance”):
knitr::kable(head(colDF %>% select(h, c, l), 10), row.names = TRUE)
h | c | l | |
---|---|---|---|
1 | 0 | 30 | 30 |
2 | 0 | 30 | 40 |
3 | 0 | 30 | 50 |
4 | 0 | 30 | 60 |
5 | 0 | 30 | 70 |
6 | 0 | 40 | 30 |
7 | 0 | 40 | 40 |
8 | 0 | 40 | 50 |
9 | 0 | 40 | 60 |
10 | 0 | 40 | 70 |
The perceived distance between any two of these colors is the Euclidean distance between their feature values, weighted by the degree of attention given to each feature: \[ d_{ij} = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{ik} - x_{jk} \right)^2} \] where \(w_k\) is the weight given to feature \(k\), \(x_{ik}\) is the value of item \(i\) on feature \(k\), \(x_{jk}\) is the value of item \(j\) on feature \(k\), \(d_{ij}\) is the perceived distance between items \(i\) and \(j\), and \(N_F\) is the number of features which in this example is 3 (hue, chroma, and luminance). Note also that while Euclidean distance is appropriate for colors, other types of distance metrics may be more appropriate for other kinds of stimuli.
Perceived similarity between two items, \(s_{ij}\), is an exponential function of the psychological distance between those items: \[ s_{ij} = \exp(-c d_{ij}) \] where \(c\) is a sensitivity parameter that controls how quickly perceived similarity decreases with distance, as illustrated in the graph below:
expand_grid(d = seq(0, 10, length.out = 151), c = seq(0.25, 3, by = 0.25)) %>%
mutate(s = exp(-c * d)) %>%
ggplot(aes(x = d, y = s, color = c, group = c)) +
geom_line() +
scale_color_continuous_sequential() +
labs(x = expression(d[ij]), y = expression(s[ij])) +
theme(legend.position = c(1, 1), legend.justification = c(1, 1))
Imagine that you have just been shown two colors from among the set shown earlier. We then present you with a third color—a probe item—and ask whether it was one of the two you just saw. You answer this question on the basis of the summed similarity between the probe item and the two items in memory. The probe item has its own vector of feature values, \(\mathbf{x_q}\). The perceived distance between the probe item and each of the memory items is, as above, the Euclidean distance between the probe’s feature values and those of the memory item: \[ \begin{align} d_{qj} & = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{qk} - x_{jk} \right)^2} \\ s_{qj} & = \exp \left( -c d_{qj} \right) \\ S & = \sum_{j = 1}^{N_M} s_{qj} \end{align} \] and, again, the perceived similarity \(s_{qj}\) between the probe \(q\) and memory item \(j\) is an exponential function of perceived distance \(d_{qj}\). Finally, summed similarity \(S\) is the sum of the perceived similarities across all \(N_M\) items in memory.
The graphs below illustrate how this works. The left graph shows contours of equal similarity from each of two study items. The right graph shows summed similarity as a function of the chroma and luminance of a probe item (assuming the same hue).
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
sens <- 0.05
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-sens * d))
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 1, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
It is reasonable to believe that some memory traces are stronger than others, likely due to things like primacy and recency. In GCM/EBRW, “strength” is operationalized as a scaling factor \(m_j\) applied to perceived similarity: \[ s_{qj} = m_j \exp \left( -c d_{qj} \right) \]
Stronger traces have their similarity multiplied by a large value (\(m_j\) is large if trace \(j\) is strong) while weaker traces have their similarity multiplied by a small value (\(m_j\) is small if trace \(j\) is weak). This is illustrated in the pair of graphs below. A probe item does not need to be as similar to a strong item in order to evoke the same level of perceived similarity.
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
sens <- 0.05
strength <- c(1, 2)
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l)) %>%
mutate(label = c("1", "2"))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-sens * d) * strength[item])
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
geom_text(aes(label = label), color = "black") +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
The specificity parameter \(c\) represents the fact that items are assumed to be encoded with some degree of error/uncertainty. Just as items may be encoded with more or less strength, it is reasonable to assume that items can be encoded in memory with more or less specificity. Thus, we add a subscript to the \(c\) parameter corresponding to each study item: \[ s_{qj} = m_j \exp \left( -c_j d_{qj} \right) \]
If an item is encoded with high specificity, then it will only be perceived as similar to the probe if the probe is very close in psychological space. This is illustrated in the pair of graphs below, where item 2 is not only stronger (\(m_2 = 2\) vs. \(m_1 = 1\)) but also more precise (\(c_2 = 0.1\) vs. \(c_1 = 0.05\)).
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
specificity <- c(0.05, 0.1)
strength <- c(1, 2)
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l)) %>%
mutate(label = c("1", "2"))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-specificity[item] * d) * strength[item])
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
geom_text(aes(label = label), color = "black") +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
According to the EBRW, each trace \(j\) in memory races to be retrieved at a rate proportional to its perceived similarity to the probe item, \(s_{qj}\). Traces race not just against one another, but against a criterion. If a memory trace wins the race, this is taken as evidence that the probe item matches something in memory, thus favoring a “yes” recognition response. If the criterion wins the race instead, this is taken as evidence that the probe item does not match anything in memory, thus favoring a “no” recognition response.
The idea is that, if the probe item matches something in memory, then the corresponding memory trace (or one sufficiently similar to probably match) should be able to win against the criterion. If nothing wins against the criterion, then this suggests there are no traces in memory that are a good match to the probe.
The outcome of each race is added to a running tally which starts at zero. The value of the tally at any given time \(t\), which we can denote \(x(t)\), constitutes the current state of evidence from memory for making a recognition decision. Assume that each race takes \(\Delta\) seconds to complete. Each time a memory trace wins the race, the tally gets incremented by one; each time the criterion wins the race, the tally gets decremented by one. Put formally, we can write this process as \[ x\left( t + \Delta \right) = \begin{cases} x\left( t \right) + 1 & \text{if trace wins} \\ x\left( t \right) - 1 & \text{if criterion wins} \end{cases} \] where \(x(0) = 0\).
You commit to a recognition decision when your accumulated memory evidence is either sufficiently high, leading to a “yes” response, or sufficiently low, leading to a “no” response. We can write the upper and lower decision boundaries as \(B_Y\) and \(B_N\), respectively. For example, imagine setting \(B_Y = 7\) and \(B_N = -8\). In that case, you would say “yes” when \(x(t) = B_Y = 7\), i.e., when a memory trace had won seven more races than the criterion. You would say “no” when \(x(t) = B_N = -8\), i.e., when the criterion had won eight more races than any memory trace. In this example, you would have a slight bias toward saying “yes”, in the sense that you require less evidence to say “yes” than to say “no”. We might therefore expect that, all else being equal, you would tend to say “yes” faster than you would say “no”, since in general it will take less time to accumulate an advantage of \(B_Y = 7\) as opposed to an advantage of \(|B_N| = 8\).
Although we have specified that whether memory evidence goes up or down depends on whether a trace or the criterion wins the race, we have not yet specified how the outcome of that race is determined. The winner of each race is random but depends on the similarity \(s_{qj}\) between each trace \(j\) and the probe \(q\). Specifically, trace \(j\) wins the race with probability: \[ \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} \] where \(\kappa\) is a nonnegative number that represents how stringent the criterion is. In other words, the equation above says that the probability that trace \(j\) wins the race is the similarity between trace \(j\) and the probe \(q\) divided by the summed similarity across all traces plus the criterion \(\kappa\). The probability that the criterion wins is \[ \frac{\kappa}{\sum_{k = 1}^N s_{qk} + \kappa} \] In a sense, we can think of the criterion is like a “virtual memory trace” that races alongside the \(N\) actual memory traces.
Remember that we increment memory strength whenever any trace wins the race, regardless of which one it is. Because only one trace can win each race, the probability that any trace wins is just the sum of the probabilities of winning across all \(N\) traces, i.e.: \[ \sum_{j = 1}^N \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} = \frac{1}{\sum_{k = 1}^N s_{qk} + \kappa} \left( \sum_{j = 1}^Ns_{qj} \right) = \frac{S}{S + \kappa} \] where \(S = \sum_{j = 1}^N s_{qj}\) is the summed similarity across all \(N\) traces.
It is reasonable to think that the criterion \(\kappa\) could be adjusted in response to a number of factors. One such factor might be list length; it would make sense to set a more stringent criterion for a longer list, since there will be a lot of incidental similarity between probes and traces.
However, in the present application, we only have one list length (\(N = 2\)). Another factor that can be important in adjusting criterion is within-list similarity. If the items within a list are themselves highly similar, then it makes sense to set a more stringent criterion since otherwise you may falsely recognize a probe that happens to be similar to both items. In other words, recognition should depend, at least in part, on judging whether the probe and trace(s) are more similar to one another than the traces are to themselves.
Conceptually, within-list similarity is the average similarity between all pairs of list items. Practically, this is a bit challenging to calculate because similarity depends on the strength and specificity with which the items are encoded.
To calculate within-list similarity, let’s first consider a pair of traces \(j\) and \(k\). If we treat trace \(k\) as if it were a probe (like \(q\)), then the similarity between \(k\) and \(j\) would be \[ s_{kj} = m_j \exp \left(-c_j d_{kj} \right) \] where the strength \(m_j\) and specificity \(c_j\) come from trace \(j\). We can also go the other direction and treat trace \(j\) as if it were a probe, resulting in \[ s_{jk} = m_k \exp \left(-c_k d_{jk} \right) \] We assume that the perceived similarity between traces \(j\) and \(k\) is just the average of \(s_{kj}\) and \(s_{jk}\), i.e., \[ \frac{s_{kj} + s_{jk}}{2} = \frac{1}{2} \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \]
In our present application, \(N = 2\), so the formula above is all we need to find the within-list similarity; just set \(j = 1\) and \(k = 2\). In general, however, within-list similarity can be written as \[ \begin{align} W & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \end{align} \] where we divide by \(N \left(N - 1 \right)\) because that is the number of pairwise similarities entering into the average. Notice also that the indexes \(j\) and \(k\) are cleverly chosen so as to cover every unique \(jk\) pair.
Finally, we can write the probability of taking a step up (i.e., of a trace winning the race) as \[ \frac{S}{S + \gamma W + \kappa} \] where \(\gamma W + \kappa\) reflects the criterion adjusted for within-list similarity, and \(\gamma\) is a free parameter (assumed to be nonnegative) that reflects the strength of the adjustment.
Finally, we are in a position to fully specify the Exemplar-Based Random Walk (EBRW) model. Let \(i\) be the index of the current experimental trial and \(s_i\) indicate the participant who produced trial \(i\). That participant is associated with a matrix of perceived distances \(\mathbf{d_{s_i}}\), where each entry in the matrix \(d_{{s_i}, jk}\) is the perceived distance between items \(j\) and \(k\) for participant \(s_i\). The probe item on trial \(i\) is item \(q_i\). The item studied in position \(j\) on trial \(i\) is item \(j_i\) (same goes for \(k\)). The summed similarity \(S_i\), within-list similarity \(W_i\), and step probability \(p_i\) for trial \(i\) can then be written \[ \begin{align} S_i & = \sum_{j = 1}^N m_{s_i, j} \exp \left( -c_{s_i, j} d_{s_i, q_i, j_i} \right) \\ W_i & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_{s_i, j} \exp \left(-c_{s_i, j} d_{s_i, k_i, j_i} \right) + m_{s_i, k} \exp \left(-c_{s_i, k} d_{s_i, j_i, k_i} \right) \right] \\ p_i & = \frac{S_i}{S_i + \gamma_{s_i} W_i + \kappa_{s_i}} \end{align} \] where \(N\) is the number of items in the list, \(m_{s_i, j}\) are free parameters for the strengths of items in different study positions, \(c_{s_i, j}\) are the specificity parameters for items in different study positions, and \(\gamma_{s_i}\) and \(\kappa_{s_i}\) are free parameters for the participant’s within-list similarity and criterion.
The EBRW models the speed of decision making in terms of how many races must be run until the accumulated win advantage in favor of either a “yes” or “no” response reaches a decision boundary. To convert this to “real time”, we must say how long each race takes and allow for a residual time that accounts for the additional time needed to orient attention to the trial and execute the selected motor response. Above, we used \(\Delta\) to stand for the amount of time (in seconds) each race takes to run. We add a subscript \(s_i\) to make \(\Delta_{s_i}\) to make clear that \(\Delta\) is a parameter that can vary between people. In addition, it will be convenient later to think instead of \(\nu_{s_i} = \frac{1}{\Delta_{s_i}}\), where \(\nu_{s_i}\) is the number of races per second. Finally, parameter \(R_{s_i}\) stands for the additional residual time.
The figure below shows an example of how memory evidence evolves during a single trial in which \(p_i = 0.55\), \(\nu_{s_i} = 40\), \(B_{s_i, Y} = 7\), \(B_{s_i, N} = -8\), and \(R_{s_i} = 0.2\). In addition, the graphs above and below the evidence trajectory illustrate the relative frequency with which, across many identical trials, each type of response would be made at each unit of time. Note that these distributions are discrete because the random walk operates in discrete time intervals, each of duration \(\Delta_{s_i}\) (which in this example is \(\Delta_{s_i} = \frac{1}{\nu_{s_i}} = 0.025\) seconds).
set.seed(1)
nu <- 40
p <- 0.55
B <- c(-8, 7)
resid <- 0.2
### RT distributions
Y_rw <- seq(B[1], B[2])
P_rw <- matrix(0, nrow = length(Y_rw), ncol = length(Y_rw))
P_rw[cbind(2:(nrow(P_rw) - 1), 1:(ncol(P_rw) - 2))] <- 1 - p
P_rw[cbind(2:(nrow(P_rw) - 1), 3:ncol(P_rw))] <- p
P_rw[1, 1] <- P_rw[nrow(P_rw), ncol(P_rw)] <- 1
### Simulation
while (TRUE) {
winner <- 0
x_rw <- 0
while (TRUE) {
s <- 2 * (runif(n = 1) < p) - 1
x_rw <- c(x_rw, x_rw[length(x_rw)] + s)
if (x_rw[length(x_rw)] <= B[1]) {
winner <- 1
break
} else if (x_rw[length(x_rw)] >= B[2]) {
winner <- 2
break
}
}
if (winner == 2 & (length(x_rw) / nu) > 1.5) {
break
}
}
RT_rw <- matrix(0, nrow = 2, ncol = length(x_rw))
Z_rw <- 1 * c(Y_rw == 0)
for (i in 1:length(x_rw)) {
Z_rw <- Z_rw %*% P_rw
RT_rw[1, i] <- Z_rw[1]
RT_rw[2, i] <- Z_rw[length(Z_rw)]
}
dRT_rw <- apply(RT_rw, MARGIN = 1, FUN = diff) * nu / 2
rtPlot1 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])) %>%
ggplot(aes(x = t, y = p)) +
geom_col(fill = "#eeb211", color = NA, width = 1 / nu) +
coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_rw)))) +
labs(x = NULL, y = "Pr(Respond at time t)") +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rtPlot0 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])) %>%
ggplot(aes(x = t, y = p)) +
geom_col(fill = "#46166b", color = NA, width = 1 / nu) +
labs(x = NULL, y = "Pr(Respond at time t)") +
scale_x_continuous(limits = c(0, NA)) +
scale_y_reverse(limits = c(max(c(dRT_rw)), 0)) +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rwPlot <- tibble(t = resid + 1:length(x_rw) / nu, x = x_rw) %>%
ggplot(aes(x = t, y = x)) +
geom_step(size = 1.5) +
geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", size = 2) +
geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", size = 2) +
geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", size = 2) +
geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
coord_cartesian(xlim = c(0, NA)) +
labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu))))
rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))
A random walk takes discrete-valued steps either up or down in discrete units of time. A Wiener diffusion process takes continuous-valued steps sampled from a normal distribution in infinitely small units of time, thus effectively operating in continuous time. We are going to approximate the discrete EBRW with a continuous EBD, or Exemplar-Based Diffusion.
In going from a random walk to a diffusion model, we are making an important psychological claim: We are saying that, instead of memory evidence arriving in discrete units at regular intervals, memory evidence is a continuous value that continually evolves as new information arrives. We can think of this as saying that, instead of only knowing the outcome of each race, you can see who is ahead and who is behind at any given time; this is the move from discrete time to continuous time. Moreover, instead of only scoring each race as a win or loss for the memory traces, the races are assigned a continuous value depending on how clear the winner is; this is the move from discrete evidence to continuous evidence.
We can write the update equation for the random walk like we did above: \[ \begin{align} x \left( t + \Delta_{s_i} \right) & = \begin{cases} x(t) + 1 & \text{with probability } p_i \\ x(t) - 1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & = \begin{cases} 1 & \text{with probability } p_i \\ -1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & \sim 2 \times \text{Bernoulli} \left( p_i \right) - 1 \end{align} \] where we have rearranged terms and used the shorthand in the final line to emphasize the idea that each step of the random walk can be thought of as a sample from a Bernoulli distribution with parameter \(p_i\) that is then transformed from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\).
The update equation above is for a single race, but what if we want to know the distribution of the random walk after a particular span of time? Specifically, what is the distribution of the total change in memory evidence over one second of time? It depends on how many races can be run within one second, which is \(\frac{1}{\Delta_{s_i}} = \nu_{s_i}\). Because the total change over one second is the sum of \(\nu\) Bernoulli random variables each with parameter \(p_i\), it is distributed as a Binomial distribution. As above, this distribution gets transformed to reflect the change in scale in the outcome of each race from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\): \[ x \left( t + 1 \right) - x(t) \sim 2 \times \text{Binomial} \left( \nu_{s_i}, p_i \right) - 1 \]
One last step to turn this into a continuous diffusion: We need to approximate the binomial distribution with a normal distribution that has the same mean and variance. The mean and variance of this normal distribution are the mean and variance of the diffusion process that approximates the random walk. The mean of a binomial distribution is \(\nu_{s_i} p_i\) and its variance is \(\nu_{s_i} p_i \left(1 - p_i \right)\). This leads to the normal approximation \[ \begin{align} x \left( t + 1 \right) - x(t) & \sim 2 \times \text{Normal} \left[ \nu_{s_i} p_i, \nu_{s_i} p_i \left(1 - p_i \right) \right] - 1 \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left[ 2\nu_{s_i} p_i - 1, 4\nu_{s_i} p_i \left(1 - p_i \right) \right] \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left( \mu_i, \sigma^2_i \right) \\ \end{align} \] where, to summarize, the mean of the diffusion on trial \(i\) is \(\mu_i\) and the variance of the diffusion on trial \(i\) is \(\sigma_i^2\), both of which depend on step probability \(p_i\) and rate \(\nu_{s_i}\): \[ \begin{align} \mu_i & = \nu_{s_i} \left( 2 p_i - 1 \right) \\ \sigma^2_i & = 4 \nu_{s_i} p_i \left(1 - p_i \right) \\ \sigma_i & = 2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)} \end{align} \]
The mean \(\mu_i\) is the drift rate of the diffusion process. The variance \(\sigma_i^2\) is the noise in the diffusion process. For each infinitesimal unit of time \(dt\), the current state of memory evidence is updated according to \[ dx(t) = \mu_i dt + \epsilon_i(t) \] where \[ \epsilon_i(t) \sim \text{Normal} \left( 0, \sigma_i^2 dt \right) \] is a random variable sampled from a Normal distribution with mean zero and variance \(\sigma_i^2 dt\).
Notice an important property of the noise \(\sigma^2_i\): It is largest when \(p_i = 0.5\) and approaches zero as \(p_i\) approaches either 0 or 1. In other words, the more uncertain the outcome of each race, the more noise there is in the diffusion. This is illustrated below:
expand_grid(p = seq(0, 1, length.out = 101), nu = seq(10, 40, by = 10)) %>%
mutate(sigma2 = 4 * nu * p * (1 - p)) %>%
ggplot(aes(x = p, y = sigma2, color = nu, group = nu)) +
geom_line() +
labs(x = expression(p[i]), y = expression(sigma[i]^2), color = expression(nu[s[i]])) +
theme(legend.position = c(1, 1), legend.justification = c(1, 1))
To summarize, the difference between the random walk and the diffusion is that we have swapped out a discrete binomial distribution of evidence increments per unit time with a continuous normal distribution of evidence increments per unit time. Everything else is the same: You still respond “yes” if and when the accumulated evidence \(x(t)\) reaches either the upper boundary \(B_{s_i, Y}\) or the lower boundary \(B_{s_i, N}\) and
To illustrate how well the diffusion process approximates the random walk, the graphs below show the diffusion approximation to the same random walk example used above. The smooth lines in the upper and lower graphs are the probability of responding per unit time (i.e., the probability density function) according to the Wiener diffusion model. The open bars are the same probabilities from the random walk. The diffusion model’s predictions hew very closely to those of the random walk!
mu <- nu * (2 * p - 1)
sigma2 <- 4 * nu * p * (1 - p)
boundsep <- B[2] - B[1]
bias <- (0 - B[1]) / (B[2] - B[1])
delta_t <- 0.001
while (TRUE) {
winner_diff <- NA
x_diff <- 0
while (TRUE) {
x_diff <- c(x_diff, x_diff[length(x_diff)] + rnorm(n = 1, mean = mu * delta_t, sd = sqrt(sigma2 * delta_t)))
if (x_diff[length(x_diff)] <= B[1]) {
winner_diff <- 1
break
} else if (x_diff[length(x_diff)] >= B[2]) {
winner_diff <- 2
break
}
}
if (winner == winner_diff & abs((length(x_diff) * delta_t) - (length(x_rw) / nu)) < (1 / nu)) {
break
}
}
x_diff <- pmax(pmin(x_diff, B[2]), B[1])
t <- seq(1, length(x_diff)) * delta_t
dRT_diff <- cbind(
WienerPDF(t = t, response = "lower", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value,
WienerPDF(t = t, response = "upper", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value
)
rtPlot1 <- tibble(t = resid + t, p = dRT_diff[,2]) %>%
ggplot(aes(x = t, y = p)) +
geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])), color = "#eeb211aa", fill = NA, width = 1 / nu) +
geom_area(fill = "#eeb21177", color = "#eeb211") +
coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_diff)))) +
labs(x = NULL, y = "Pr(Respond at time t)") +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rtPlot0 <- tibble(t = resid + t, p = dRT_diff[,1]) %>%
ggplot(aes(x = t, y = p)) +
geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])), color = "#46166baa", fill = NA, width = 1 / nu) +
geom_area(fill = "#46166b77", color = "#46166b") +
labs(x = NULL, y = "Pr(Respond at time t)") +
scale_x_continuous(limits = c(0, NA)) +
scale_y_reverse(limits = c(max(c(dRT_diff)), 0)) +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rwPlot <- tibble(t = resid + t, x = x_diff) %>%
ggplot(aes(x = t, y = x)) +
geom_line() +
geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", size = 2) +
geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", size = 2) +
geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", size = 2) +
geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
coord_cartesian(xlim = c(0, NA)) +
labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu), mu[i] == .(signif(mu, 3)), sigma[i] == .(signif(sqrt(sigma2), 3)))))
rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))
To find the EBD parameters that best fit the recognition data from each participant, I implemented the EBD in Stan. The Stan implementation of the EBD as applied to our data involves some additional complications and considerations.
We model the data at the level of individual trials. For each trial
i
, we know
subject[i]
: The numerical index of the participant who
produced the trial.study_item[i, 1]
and study_item[i, 2]
:
These are the numerical indexes of the first and second items presented
on the study list just prior to this trial. For example, if the study
list for trial i
was items 6
and
3
in that order, then study_item[i, 1]
would
be 6
and study_item[i, 2]
would be
3
.probe_item[i]
: The numerical index of the probe item
presented on trial i
.resp[i]
: The response the participant gave on trial
i
, either 0
(for “no”) or 1
(for
“yes”).rt[i]
: The response time of the participant on trial
i
.dist[1:nSubj, 1:nItem, 1:nItem]
: A list of matrices of
the perceived distances between all nItem
items, inferred
from Multidimensional Scaling for each participant.
dist[s, j, k]
gives the distance perceived by participant
s
between item j
and item k
.For each participant s
, there are a total of nine
parameters to be estimated. These are:
retrieval_rate[s]
: This is the number of races run per
second, labeled \(\nu\) above.boundsep[s]
: This is the difference \(B_Y - B_N\), called boundary
separation. Since the decision boundaries of a diffusion
process do not need to be whole numbers, it is convenient to use this
and the following parameter instead of \(B_Y\) and \(B_N\).bias[s]
: This is a number between 0 and 1 representing
the degree of bias in how participant s
sets their
decision boundaries. \(B_Y\) is given
by boundsep[s] * (1 - bias[s])
and \(B_N\) is given by
-boundsep[s] * bias[s]
. That way, if bias[s]
is 0.5, the diffusion process is unbiased because \(B_Y = -B_N\) and you need exactly the same
amount of evidence before you are willing to make either a “yes” or “no”
response. If bias[s]
is greater than 0.5, then
s
is biased in favor of “yes” responses while if
bias[s]
is less than 0.5, then s
is biased in
favor of “no” responses.resid[s]
: The residual time, other than that needed for
the diffusion process, needed to orient to the probe item on the trial
and execute the motor response. By definition, this must be a number
between 0 and the shortest RT produced by participant s
.
This was called \(R\) above.strength1[s]
: The strength in memory of the trace of
the first list item, \(m_{s_i, 1}\)
above. Since strengths are on an arbitrary scale, we assume that \(m_{s_i, 2} = 1\) for all participants. That
means strength1[s]
is the strength of the first studied
item relative to the strength of the second for participant
s
.specificity[s]
: This is the mean specificity for
participant s
, called parameter \(c\) above, which together with the next
parameter is used to define the specificity for each serial
position.specificity_diff[s]
: This is a number between 0 and 1
that, combined with the parameter above, specifies the specificity with
which subject s
encodes items in each serial position.
Specifically,
specificity_pos[1, s] = 2 * specificity[s] * (1 - specificity_diff[s])
(the result corresponds to \(c_{s_i,
1}\) above) and
specificity_pos[1, s] = 2 * specificity[s] * specificity_diff[s]
(the result corresponds to \(c_{s_i,
2}\) above). If specificity_diff[s]
is 0.5, then
subject s
encodes items in each position with the same
specificity, specificity[s]
. If
specificity_diff[s]
> 0.5, then the second (more recent)
item is encoded with greater precision than the first; if
specificity_diff[s]
< 0.5, then the first item is
encoded with greater precision than the second.within_sim[s]
: This is how much the criterion for
participant s
is raised by within-list similarity; this
corresponds to parameter \(\gamma\)
above.criterion[s]
: This is the baseline criterion for
participant s
, corresponding to parameter \(\kappa\) above.All the parameters to be estimated with the exception of
bias
and specificity_diff
are restricted to be
non-negative real numbers. For these parameters, we assume that they are
themselves sampled from group-level Gamma distributions characterized by
their mode and standard deviation. The group-level modes and standard
deviations are themselves free parameters. For bias
and
specificity_diff
, since these parameters are restricted to
be between 0 and 1, we assume that they are sampled from group-level
Beta distributions characterized by their mode and concentration. The
group-level modes and concentrations are also free parameters.
Thus, in addition to the nine model parameters for each participant,
we estimate parameters corresponding to the group-level central tendency
(modes) and variability (either standard deviation or concentration) of
each of the nine parameters. This structure, along with moderately
informative priors on the group-level parameters, is specified in the
model
block of the Stan program:
retrieval_rate_mode ~ exponential(0.033); // Mean = 30, SD = 30
retrieval_rate_sd ~ gamma(2.618, 0.054); // Mode = 30, SD = 30
retrieval_rate ~ gamma(retrieval_rate_shape, retrieval_rate_rate);
boundsep_mode ~ exponential(0.033); // Mean = 30, SD = 30
boundsep_sd ~ gamma(2.618, 0.054); // Mode = 30, SD = 30
boundsep ~ gamma(boundsep_shape, boundsep_rate);
bias_mode ~ beta(2.0, 2.0); // Mean of 0.5, goes to zero at 0 or 1
bias_concentration ~ exponential(0.05); // Mean = 20, SD = 20
bias ~ beta(bias_shape1, bias_shape2);
resid_mode ~ exponential(2.0); // Mean = 0.5, SD = 0.5
resid_sd ~ gamma(2.618, 3.236); // Mode = 0.5, SD = 0.5
resid ~ gamma(resid_shape, resid_rate);
strength1_mode ~ exponential(1.0); // Mean = 1, SD = 1
strength1_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
strength1 ~ gamma(strength1_shape, strength1_rate);
specificity_mode ~ exponential(0.5); // Mean = 2, SD = 2
specificity_sd ~ gamma(2.618, 0.809); // Mode = 2, SD = 2
specificity ~ gamma(specificity_shape, specificity_rate);
specificity_diff_mode ~ beta(2.0, 2.0); // Mean of 0.5, goes to zero at 0 or 1
specificity_diff_concentration ~ exponential(0.05); // Mean = 20, SD = 20
specificity_diff ~ beta(specificity_diff_shape1, specificity_diff_shape2);
within_sim_mode ~ exponential(1.0); // Mean = 1, SD = 1
within_sim_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
within_sim ~ gamma(within_sim_shape, within_sim_rate);
criterion_mode ~ exponential(1.0); // Mean = 1, SD = 1
criterion_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
criterion ~ gamma(criterion_shape, criterion_rate);
For the parameters that are nonnegative, there is a prior on the
group-level mode as well as the group-level standard deviation. The
priors on the group-level modes are exponential()
distributions chosen so that the mean and standard deviation of the
exponential
distribution are the same as the overall scale
of the corresponding parameter (by definition, the mean and SD of an
exponential distribution are identical). In some cases, the choice of
overall scales is somewhat arbitrary (e.g., assuming that boundary
separation \(A_{s_i}\) and retrieval
rate \(\nu_{s_i}\) are on the order of
30) but in other cases it is dictated by meaningful considerations
(e.g., strength and criterion-related parameters should be on a unit
scale because they are all relative to the strength of the second item,
which is 1). Instead of exponential
distribution where the
mean and SD are equal, the priors on the group-level
standard deviations are gamma
distributions where the
mode and SD are equal. with a single parameter that
specifies the overall scale of the parameter while the group-level
standard deviations have a gamma()
prior with two
parameters (a shape and rate), also tied to the overall scale of the
parameter. By “scale”, I’m referring to the mean/mode/SD of the priors
on the group-level parameters. The figure below gives some examples of
exponential and gamma distributions with different scales.
expand_grid(x = seq(0, 10, length.out = 101), scale = c(0.2, 0.5, 1, 2, 3, 4)) %>%
mutate(exponential = dexp(x, rate = 1 / scale), gamma = dgamma(x, shape = (3 + sqrt(5)) / 2, rate = (1 + sqrt(5)) / (2 * scale))) %>%
pivot_longer(cols = c(exponential, gamma), names_to = "Distribution", values_to = "d") %>%
ggplot(aes(x = x, y = d, color = Distribution)) +
geom_line() +
facet_wrap("scale", labeller = label_both)
The reason for using a gamma()
prior on standard
deviations (as opposed to another exponential
prior) is to
avoid implosive shrinkage of the participant-level parameters
onto the group mode. In preliminary explorations, I noticed that varying
some parameters (like within_sim
) does not have a strong
influence on model predictions. As a result, you would need a
lot of data (hundreds of trials per person) to uniquely
identify the action of some parameters. Another way of saying that is
that, if you have relatively few trials per person, that sparse data
would probably be consistent with many possible values of some model
parameters. When the data provide only a weak constraint on model
parameters, the prior exerts itself. If the prior allows for zero
between-participant variability, then implosive shrinkage can occur.
With implosive shrinkage, the group-level SD is estimated to be zero and
everyone gets assigned the same value for a parameter. As shown in the
figure above, the gamma
priors used on the group-level SD’s
assign zero prior probability to values of zero. Thus, the
gamma priors prevent implosive shrinkage while still allowing for a wide
range of plausible values on a reasonable scale.
Finally, I note that the _mode
and _sd
group-level parameters are transformed into _shape
and
_rate
parameters in the transformed parameters
block of the Stan program, following the conversion formulas given by
Kruschke: \[
\begin{align}
r & = \frac{\omega + \sqrt{\omega^2 + 4 \sigma^2}}{2 \sigma^2} \\
s & = 1 + \omega r
\end{align}
\] where \(s\) and \(r\) are the shape and rate of the gamma
distribution, respectively, and \(\omega\) and \(\sigma\) are the mode and standard
deviation of the gamma distribution, respectively. Similarly, for the
two parameters that fall between 0 and 1, their _mode
and
_concentration
parameters are converted to the two shape
parameters (_shape1
and _shape2
) of a Beta
distribution: \[
\begin{align}
s_1 & = \omega \kappa + 1 \\
s_2 & = \left(1 - \omega \right) \kappa + 1
\end{align}
\] where \(s_1\) and \(s_2\) are the two shape parameters of the
Beta distribution and \(\omega\) and
\(\kappa\) are the mode and
concentration of the Beta distribution, respectively.
The final line of the model
block in the Stan program
is
rt ~ wiener(trial_boundsep, resid[subject], trial_bias, trial_drift);
This line is a shorter and more efficient way of writing the following:
for (i in 1:nData) {
rt[i] ~ wiener(trial_boundsep[i], resid[subject[i]], trial_bias[i], trial_drift[i]);
}
What these lines say is that the likelihood of the response time on
each trial is the likelihood that a Wiener diffusion process with
boundary separation trial_boundsep
, residual time
resid[subject]
, bias trial_bias
, drift rate
trial_drift
, and drift variance 1 reaches its
upper boundary at time rt
. Notice that this poses
two challenges:
rt
. This means that, in
order to get the likelihood of hitting the lower boundary at
time rt
, we will need to “flip” the process. As described
below, we can account for this by inverting the drift rate and bias on
these trials.The noise in the diffusion process, \(\sigma_i\), sets the scale for the diffusion process. If we multiply the response boundaries, mean drift rate \(\mu_i\), and standard deviation \(\sigma_i\) by the same amount, the model’s predictions will be unchanged. This means that, to make the process have unit variance, all we need to do is divide the mean drift on trial \(i\), \(\mu_i\), as well as the boundary separation for the subject on trial \(i\) (written below as \(A_{s_i}\)) by \(\sigma_i\) to find the re-scaled mean drift \(d_i\) and boundary separation \(b_i\) for trial \(i\). Mathematically, this can be written \[ \begin{align} d_i & = \frac{\mu_i}{\sigma_i} = \left(p_i - \frac{1}{2} \right) \sqrt{\frac{\nu_{s_i}}{p_i \left(1 - p_i \right)}} \\ b_i & = \frac{A_{s_i}}{\sigma_i} = \frac{A_{s_i}}{2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)}} \end{align} \]
Recall that we record the response made on trial i
as
resp[i]
, which is either 0 (for a “no” response) or 1 (for
a “yes” response). That means that, for purposes of using Stan to
calculate the likelihood, we need to “flip” both the drift rate and the
bias on any trial i
for which resp[i]
is zero.
The idea is that hitting the lower boundary with drift \(d\) and bias \(w\) is equivalent to hitting the upper
boundary with drift \(-d\) and bias
\(1 - w\).
Mathematically, if we let \(\mathbb{R}_i\) be an indicator variable
that stands for resp[i]
and \(\beta_{s_i}\) stand for the participant on
trial \(i\)’s bias parameter
(bias[subject[i]]
in Stan) , we can write this additional
transformation as \[
\begin{align}
\tilde{d}_i & = \left(2 \mathbb{R}_i - 1 \right) d_i = \left(2
\mathbb{R}_i - 1 \right)\left(p_i - \frac{1}{2} \right)
\sqrt{\frac{\nu_{s_i}}{p_i \left(1 - p_i \right)}} \\
b_i & = \frac{A_{s_i}}{\sigma_i} = \frac{A_{s_i}}{2 \sqrt{\nu_{s_i}
p_i \left(1 - p_i \right)}} \\
w_i & = \mathbb{R}_i \beta_{s_i} + \left(1 - \mathbb{R}_i \right)
\left(1 - \beta_{s_i} \right)
\end{align}
\] where \(\beta_{s_i}\) is the
bias parameter for participant \(s_i\)
(bias[subject[i]]
) and \(\tilde{d}_i\), \(b_i\), and \(w_i\) correspond to
trial_drift
, trial_boundsep
, and
trial_bias
in the Stan code below. Note that we use \(d_i\) to stand for the drift rate on trial
\(i\) and \(\tilde{d}_i\) to stand for the “flipped”
drift rate used for the purpose of calculating the likelihood in
Stan.
The Stan code for defining the trial-level parameters from the EBD is
part of the transformed parameters
block and is a direct
translation of the equations above from math into Stan:
vector[nSubj] sqrt_retrieval_rate;
vector[nSubj] scaled_boundsep;
vector[nData] trial_sum_sim;
vector[nData] trial_within_sim;
vector[nData] p;
sqrt_retrieval_rate = sqrt(retrieval_rate);
scaled_boundsep = 0.5 * boundsep ./ sqrt_retrieval_rate;
for (i in 1:nData) {
trial_sum_sim[i] = strength1[subject[i]] * exp(-specificity_pos[1, subject[i]] * dist[subject[i], study_item[i, 1], probe_item[i]]) + exp(-specificity_pos[2, subject[i]] * dist[subject[i], study_item[i, 2], probe_item[i]]);
trial_within_sim[i] = 0.5 * (strength1[subject[i]] * exp(-specificity_pos[1, subject[i]] * dist[subject[i], study_item[i, 1], study_item[i, 2]]) + exp(-specificity_pos[2, subject[i]] * dist[subject[i], study_item[i, 1], study_item[i, 2]]));
}
p = trial_sum_sim ./ (trial_sum_sim + criterion[subject] + within_sim[subject] .* trial_within_sim);
trial_drift = sqrt_retrieval_rate[subject] .* (2.0 * resp - 1.0) .* (p - 0.5) .* inv_sqrt(p .* (1.0 - p));
trial_boundsep = scaled_boundsep[subject] .* inv_sqrt(p .* (1.0 - p));
trial_bias = bias[subject] .* resp + (1.0 - bias[subject]) .* (1.0 - resp);
When we fit the model, it is important to check whether its predictions are close to what was observed. Therefore, it will be important to be able to calculate, for each trial \(i\), three things:
Since we know whether the probe item on trial \(i\) was a target (from the list) or not, we can use \(\widehat{RT}_{Y, i}\) and \(\widehat{RT}_{N, i}\) to find the predicted mean correct and error RT’s. If trial \(i\) is a target, then the predicted mean correct RT is \(\widehat{RT}_{Y, i}\) and the predicted mean error RT is \(\widehat{RT}_{N, i}\). These are reversed if the probe on trial \(i\) is a foil.
You may ask why it is useful to find the mean RT’s and response probabilities for each trial when we only see one outcome per trial. You can consider the predicted mean RT’s and response probabilities as what we would get if we simulated many many repetitions of trial \(i\). In that case, the mean RT’s and response probabilities represent the best possible guesses we could make about how trial \(i\) actually turned out.
As it turns out, there are closed-form (if ugly) expressions for the response probability and mean RT at each boundary for a Wiener process. These are given below and are computed as part of the Stan program. Note that these formulas refer to \(d_i\), the non-flipped scaled drift rate on trial \(i\), and \(\beta_{s_i}\), the non-flipped bias parameter for participant \(s_i\). \[ \begin{align} \hat{p}_i & = \frac{1 - \exp \left( -2 b_i d_i \beta_{s_i} \right)}{1 - \exp \left( -2 b_i d_i \right)} \\ \widehat{RT}_{Y, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\beta_{s_i}}{\tanh \left(b_i d_i \beta_{s_i} \right)} \right\rbrace \\ \widehat{RT}_{N, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\left(1 - \beta_{s_i} \right)}{\tanh \left[b_i d_i \left(1 - \beta_{s_i} \right) \right]} \right\rbrace \\ \end{align} \]
The trial-level predictions and log-likelihoods are computed in the
generated quantities
block of the Stan program:
generated quantities {
vector[nData] p_resp;
vector[nData] mean_rt_corr;
vector[nData] mean_rt_err;
vector[nData] ll;
for (i in 1:nData) {
real this_drift = trial_drift[i] / (2.0 * resp[i] - 1.0);
p_resp[i] = (1.0 - exp(-2.0 * trial_boundsep[i] * this_drift * bias[subject[i]])) / (1.0 - exp(-2.0 * trial_boundsep[i] * this_drift));
if (isTarget[i] == 1) {
mean_rt_corr[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - bias[subject[i]] / tanh(this_drift * trial_boundsep[i] * bias[subject[i]])) / this_drift;
mean_rt_err[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - (1.0 - bias[subject[i]]) / tanh(this_drift * trial_boundsep[i] * (1.0 - bias[subject[i]]))) / this_drift;
}
else {
mean_rt_err[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - bias[subject[i]] / tanh(this_drift * trial_boundsep[i] * bias[subject[i]])) / this_drift;
mean_rt_corr[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - (1.0 - bias[subject[i]]) / tanh(this_drift * trial_boundsep[i] * (1.0 - bias[subject[i]]))) / this_drift;
}
ll[i] = wiener_lpdf(rt[i] | trial_boundsep[i], resid[subject[i]], trial_bias[i], trial_drift[i]);
}
}