The Exemplar-Based Random Walk (EBRW) is a dynamic extension of the Generalized Context Model (GCM). According to these models, each time you experience an item in a particular context, it leaves a “trace” in memory. This trace is called an “exemplar” or an “instance” and it consists of a record of the values the item had for particular features. When you are presented with a probe item, it activates each of these exemplars in memory in proportion to how similar the features of the probe are to the features of the exemplars in memory. The summed activation across all exemplars in memory is termed the familiarity of the probe. The extent to which familiarity is higher than a criterion determines the extent to which the probe will be recognized as having been seen before. Specifically, familiarity determines the rate at which evidence accumulates toward either a “yes” or “no” decision regarding whether the probe item was seen before. If the accumulated evidence reaches an upper threshold, a “yes” decision is made; otherwise, if the accumulated evidence reaches a lower threshold, a “no” decision is made.
The text below first describes how familiarity is determined by the summed similarity between a probe and the exemplars in memory, using the concrete example of items consisting of patches of color. Then, the dynamics of evidence accumulation are described. These dynamics are first described as a random walk between the thresholds for “yes” and “no” decisions. Then, we illustrate how the same dynamics can be described in continuous time as a diffusion process instead of a random walk. Finally, we see how the continuous diffusion process makes it easier to fit the parameters of the EBRW model to recognition data.
For concreteness, we imagine an experiment in which several color patches were studied. Colors can be described in terms of three features: hue, chroma (or “saturation”), and luminance (or “brightness”). Imagine that you saw many reddish-hued colors that varied in chroma and luminance. According to the GCM, each of the colors you saw would leave a trace in memory. Each trace would consist of the values of the color you saw along those three dimensions. In this example, all the colors have the same value for “hue”, which is 0. The value of “0” for “hue” corresponds to a reddish color. The colors differ in their values for chroma and luminance, as illustrated in the graph below:
hue <- 0  # reddish hue shared by all of the example colors, as described above
colDF <- expand_grid(h = hue, c = seq(30, 70, length.out = 5), l = seq(30, 70, length.out = 5)) %>%
mutate(col = hcl(h, c, l))
colDF %>%
ggplot(aes(x = c, y = l, fill = col)) +
geom_tile(width = 2.5, height = 2.5) +
scale_fill_identity() +
coord_equal() +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", caption = bquote(plain(hue) == .(round(hue))))
Your memory traces can also be written as a matrix, where each row corresponds to an item (color) and each column corresponds to a feature (h for “hue”, c for “chroma”, and l for “luminance”):
knitr::kable(head(colDF %>% select(h, c, l), 10), row.names = TRUE)
|    | h | c  | l  |
|----|---|----|----|
| 1  | 0 | 30 | 30 |
| 2  | 0 | 30 | 40 |
| 3  | 0 | 30 | 50 |
| 4  | 0 | 30 | 60 |
| 5  | 0 | 30 | 70 |
| 6  | 0 | 40 | 30 |
| 7  | 0 | 40 | 40 |
| 8  | 0 | 40 | 50 |
| 9  | 0 | 40 | 60 |
| 10 | 0 | 40 | 70 |
The perceived distance between any two of these colors is the Euclidean distance between their feature values, weighted by the degree of attention given to each feature: \[ d_{ij} = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{ik} - x_{jk} \right)^2} \] where \(w_k\) is the weight given to feature \(k\), \(x_{ik}\) is the value of item \(i\) on feature \(k\), \(x_{jk}\) is the value of item \(j\) on feature \(k\), \(d_{ij}\) is the perceived distance between items \(i\) and \(j\), and \(N_F\) is the number of features which in this example is 3 (hue, chroma, and luminance). Note also that while Euclidean distance is appropriate for colors, other types of distance metrics may be more appropriate for other kinds of stimuli.
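To make the formula concrete, here is a minimal sketch in R that computes the weighted Euclidean distance between two of the colors in the table above (items 1 and 7); the equal attention weights used here are illustrative values, not fitted quantities:

# Weighted Euclidean distance between two feature vectors
weighted_dist <- function(x_i, x_j, w) {
  sqrt(sum(w * (x_i - x_j)^2))
}

x_1 <- c(h = 0, c = 30, l = 30)  # item 1 from the table above
x_7 <- c(h = 0, c = 40, l = 40)  # item 7 from the table above
w   <- c(1, 1, 1)                # hypothetical equal attention to hue, chroma, and luminance

weighted_dist(x_1, x_7, w)       # sqrt(0 + 100 + 100), roughly 14.14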
Perceived similarity between two items, \(s_{ij}\), is an exponential function of the psychological distance between those items: \[ s_{ij} = \exp(-c d_{ij}) \] where \(c\) is a sensitivity parameter that controls how quickly perceived similarity decreases with distance, as illustrated in the graph below:
expand_grid(d = seq(0, 10, length.out = 151), c = seq(0.25, 3, by = 0.25)) %>%
mutate(s = exp(-c * d)) %>%
ggplot(aes(x = d, y = s, color = c, group = c)) +
geom_line() +
scale_color_continuous_sequential() +
labs(x = expression(d[ij]), y = expression(s[ij])) +
theme(legend.position = c(1, 1), legend.justification = c(1, 1))
Imagine that you have just been shown two colors from among the set shown earlier. We then present you with a third color—a probe item—and ask whether it was one of the two you just saw. You answer this question on the basis of the summed similarity between the probe item and the two items in memory. The probe item has its own vector of feature values, \(\mathbf{x_q}\). The perceived distance between the probe item and each of the memory items is, as above, the Euclidean distance between the probe’s feature values and those of the memory item: \[ \begin{align} d_{qj} & = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{qk} - x_{jk} \right)^2} \\ s_{qj} & = \exp \left( -c d_{qj} \right) \\ S & = \sum_{j = 1}^{N_M} s_{qj} \end{align} \] and, again, the perceived similarity \(s_{qj}\) between the probe \(q\) and memory item \(j\) is an exponential function of perceived distance \(d_{qj}\). Finally, summed similarity \(S\) is the sum of the perceived similarities across all \(N_M\) items in memory.
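As a quick worked example, here is a minimal sketch that computes summed similarity for the same two study items that appear in the graphs below, using a hypothetical probe and the same sensitivity value as those graphs:

X_mem <- rbind(c(40, 60),   # chroma and luminance of memory trace 1
               c(60, 50))   # chroma and luminance of memory trace 2
probe <- c(50, 50)          # hypothetical probe (same hue as the traces)
sens  <- 0.05               # sensitivity parameter c

d <- sqrt(rowSums(sweep(X_mem, 2, probe)^2))  # Euclidean distance from the probe to each trace
s <- exp(-sens * d)                           # perceived similarity to each trace
S <- sum(s)                                   # summed similarity (familiarity), roughly 1.10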
The graphs below illustrate how this works. The left graph shows contours of equal similarity from each of two study items. The right graph shows summed similarity as a function of the chroma and luminance of a probe item (assuming the same hue).
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
sens <- 0.05
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-sens * d))
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 1, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
It is reasonable to believe that some memory traces are stronger than others, likely due to things like primacy and recency. In GCM/EBRW, “strength” is operationalized as a scaling factor \(m_j\) applied to perceived similarity: \[ s_{qj} = m_j \exp \left( -c d_{qj} \right) \]
Stronger traces have their similarity multiplied by a large value (\(m_j\) is large if trace \(j\) is strong) while weaker traces have their similarity multiplied by a small value (\(m_j\) is small if trace \(j\) is weak). This is illustrated in the pair of graphs below. A probe does not need to be as close to a strong item as it would to a weak item in order to evoke the same level of perceived similarity.
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
sens <- 0.05
strength <- c(1, 2)
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l)) %>%
mutate(label = c("1", "2"))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-sens * d) * strength[item])
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
geom_text(aes(label = label), color = "black") +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
The specificity parameter \(c\) represents the fact that items are assumed to be encoded with some degree of error/uncertainty. Just as items may be encoded with more or less strength, it is reasonable to assume that items can be encoded in memory with more or less specificity. Thus, we add a subscript to the \(c\) parameter corresponding to each study item: \[ s_{qj} = m_j \exp \left( -c_j d_{qj} \right) \]
If an item is encoded with high specificity, then it will only be perceived as similar to the probe if the probe is very close in psychological space. This is illustrated in the pair of graphs below, where item 2 is not only stronger (\(m_2 = 2\) vs. \(m_1 = 1\)) but also more precise (\(c_2 = 0.1\) vs. \(c_1 = 0.05\)).
X <- matrix(
c(40, 60,
60, 50),
nrow = 2,
ncol = 2,
byrow = TRUE
)
specificity <- c(0.05, 0.1)
strength <- c(1, 2)
colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
mutate(col = hcl(h, c, l)) %>%
mutate(label = c("1", "2"))
item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))
simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
mutate(s = exp(-specificity[item] * d) * strength[item])
conPlot <- colDF %>%
ggplot(aes(x = c, y = l)) +
geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
geom_text(aes(label = label), color = "black") +
scale_fill_identity(aesthetics = "fill") +
scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))
sumSimPlot <- simDF %>%
group_by(c, l) %>%
summarize(summed_sim = sum(s), .groups = "keep") %>%
ggplot(aes(x = c, y = l, fill = summed_sim)) +
geom_raster() +
scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))
conPlot + sumSimPlot + plot_layout(nrow = 1)
According to the EBRW, each trace \(j\) in memory races to be retrieved at a rate proportional to its perceived similarity to the probe item, \(s_{qj}\). Traces race not just against one another, but against a criterion. If a memory trace wins the race, this is taken as evidence that the probe item matches something in memory, thus favoring a “yes” recognition response. If the criterion wins the race instead, this is taken as evidence that the probe item does not match anything in memory, thus favoring a “no” recognition response.
The idea is that, if the probe item matches something in memory, then the corresponding memory trace (or one sufficiently similar to probably match) should be able to win against the criterion. If nothing wins against the criterion, then this suggests there are no traces in memory that are a good match to the probe.
The outcome of each race is added to a running tally which starts at zero. The value of the tally at any given time \(t\), which we can denote \(x(t)\), constitutes the current state of evidence from memory for making a recognition decision. Assume that each race takes \(\Delta\) seconds to complete. Each time a memory trace wins the race, the tally gets incremented by one; each time the criterion wins the race, the tally gets decremented by one. Put formally, we can write this process as \[ x\left( t + \Delta \right) = \begin{cases} x\left( t \right) + 1 & \text{if trace wins} \\ x\left( t \right) - 1 & \text{if criterion wins} \end{cases} \] where \(x(0) = 0\).
You commit to a recognition decision when your accumulated memory evidence is either sufficiently high, leading to a “yes” response, or sufficiently low, leading to a “no” response. We can write the upper and lower decision boundaries as \(B_Y\) and \(B_N\), respectively. For example, imagine setting \(B_Y = 7\) and \(B_N = -8\). In that case, you would say “yes” when \(x(t) = B_Y = 7\), i.e., when the memory traces had collectively won seven more races than the criterion. You would say “no” when \(x(t) = B_N = -8\), i.e., when the criterion had won eight more races than the memory traces. In this example, you would have a slight bias toward saying “yes”, in the sense that you require less evidence to say “yes” than to say “no”. We might therefore expect that, all else being equal, you would tend to say “yes” faster than you would say “no”, since in general it will take less time to accumulate an advantage of \(B_Y = 7\) than an advantage of \(|B_N| = 8\).
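Later on it will be convenient to re-express these boundaries in terms of their separation and a bias term; the sketch below just previews that reparameterization using the numbers above:

B_Y <- 7
B_N <- -8
boundsep <- B_Y - B_N             # boundary separation: 15
bias     <- (0 - B_N) / boundsep  # relative start point: 8/15, about 0.533 (> 0.5, a slight "yes" bias)

# The original boundaries can be recovered from these two quantities:
boundsep * (1 - bias)             # B_Y = 7
-boundsep * bias                  # B_N = -8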
Although we have specified that whether memory evidence goes up or down depends on whether a trace or the criterion wins the race, we have not yet specified how the outcome of that race is determined. The winner of each race is random but depends on the similarity \(s_{qj}\) between each trace \(j\) and the probe \(q\). Specifically, trace \(j\) wins the race with probability: \[ \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} \] where \(\kappa\) is a nonnegative number that represents how stringent the criterion is. In other words, the equation above says that the probability that trace \(j\) wins the race is the similarity between trace \(j\) and the probe \(q\) divided by the summed similarity across all traces plus the criterion \(\kappa\). The probability that the criterion wins is \[ \frac{\kappa}{\sum_{k = 1}^N s_{qk} + \kappa} \] In a sense, we can think of the criterion as a “virtual memory trace” that races alongside the \(N\) actual memory traces.
Remember that we increment memory evidence whenever any trace wins the race, regardless of which one it is. Because only one trace can win each race, the probability that any trace wins is just the sum of the probabilities of winning across all \(N\) traces, i.e.: \[ \sum_{j = 1}^N \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} = \frac{1}{\sum_{k = 1}^N s_{qk} + \kappa} \left( \sum_{j = 1}^Ns_{qj} \right) = \frac{S}{S + \kappa} \] where \(S = \sum_{j = 1}^N s_{qj}\) is the summed similarity across all \(N\) traces.
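As a sanity check, here is a minimal sketch (with made-up similarity and criterion values) that simulates many races and confirms that the proportion of races won by any memory trace matches \(S / (S + \kappa)\):

s_q   <- c(0.6, 0.25)  # hypothetical similarities of the probe to each of two traces
kappa <- 0.5           # hypothetical criterion "racer"

# Each race has N + 1 racers: the N traces plus the criterion
winners <- sample(
  x = c("trace 1", "trace 2", "criterion"),
  size = 10000,
  replace = TRUE,
  prob = c(s_q, kappa)  # sample() normalizes these weights into probabilities
)

mean(winners != "criterion")     # observed proportion of up steps
sum(s_q) / (sum(s_q) + kappa)    # theoretical S / (S + kappa), about 0.63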
It is reasonable to think that the criterion \(\kappa\) could be adjusted in response to a number of factors. One such factor might be list length; it would make sense to set a more stringent criterion for a longer list, since there will be a lot of incidental similarity between probes and traces.
However, in the present application, we only have one list length (\(N = 2\)). Another factor that can be important in adjusting criterion is within-list similarity. If the items within a list are themselves highly similar, then it makes sense to set a more stringent criterion since otherwise you may falsely recognize a probe that happens to be similar to both items. In other words, recognition should depend, at least in part, on judging whether the probe and trace(s) are more similar to one another than the traces are to themselves.
Conceptually, within-list similarity is the average similarity between all pairs of list items. Practically, this is a bit challenging to calculate because similarity depends on the strength and specificity with which the items are encoded.
To calculate within-list similarity, let’s first consider a pair of traces \(j\) and \(k\). If we treat trace \(k\) as if it were a probe (like \(q\)), then the similarity between \(k\) and \(j\) would be \[ s_{kj} = m_j \exp \left(-c_j d_{kj} \right) \] where the strength \(m_j\) and specificity \(c_j\) come from trace \(j\). We can also go the other direction and treat trace \(j\) as if it were a probe, resulting in \[ s_{jk} = m_k \exp \left(-c_k d_{jk} \right) \] We assume that the perceived similarity between traces \(j\) and \(k\) is just the average of \(s_{kj}\) and \(s_{jk}\), i.e., \[ \frac{s_{kj} + s_{jk}}{2} = \frac{1}{2} \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \]
In our present application, \(N = 2\), so the formula above is all we need to find the within-list similarity; just set \(j = 1\) and \(k = 2\). In general, however, within-list similarity can be written as \[ \begin{align} W & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \end{align} \] where we divide by \(N \left(N - 1 \right)\) because that is the total number of directed similarities entering into the average (each of the \(N(N-1)/2\) unordered pairs contributes two, one in each direction). Notice also that the limits on the indexes \(j\) and \(k\) are chosen so that every unique \(jk\) pair is counted exactly once.
Finally, we can write the probability of taking a step up (i.e., of a trace winning the race) as \[ \frac{S}{S + \gamma W + \kappa} \] where \(\gamma W + \kappa\) reflects the criterion adjusted for within-list similarity, and \(\gamma\) is a free parameter (assumed to be nonnegative) that reflects the strength of the adjustment.
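The sketch below (using hypothetical strengths, specificities, distances, and summed similarity) shows how the within-list similarity \(W\) and the resulting step probability would be computed for the two-item case:

m     <- c(1, 2)       # hypothetical strengths m_1, m_2
cs    <- c(0.05, 0.1)  # hypothetical specificities c_1, c_2
d_12  <- 20            # hypothetical distance between the two study items
S     <- 1.1           # hypothetical summed similarity between the probe and the traces
gamma <- 1             # strength of the within-list similarity adjustment
kappa <- 0.5           # baseline criterion

# Average of the two "directed" similarities between traces 1 and 2 (N = 2, so divide by 2)
W <- (m[1] * exp(-cs[1] * d_12) + m[2] * exp(-cs[2] * d_12)) / 2

# Probability of taking a step up (i.e., of any trace winning the race)
p_up <- S / (S + gamma * W + kappa)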
We are now in a position to fully specify the Exemplar-Based Random Walk (EBRW) model. Let \(i\) be the index of the current experimental trial and \(s_i\) indicate the participant who produced trial \(i\). That participant is associated with a matrix of perceived distances \(\mathbf{d_{s_i}}\), where each entry in the matrix \(d_{{s_i}, jk}\) is the perceived distance between items \(j\) and \(k\) for participant \(s_i\). The probe item on trial \(i\) is item \(q_i\). The item studied in position \(j\) on trial \(i\) is item \(j_i\) (same goes for \(k\)). The summed similarity \(S_i\), within-list similarity \(W_i\), and step probability \(p_i\) for trial \(i\) can then be written \[ \begin{align} S_i & = \sum_{j = 1}^N m_{s_i, j} \exp \left( -c_{s_i, j} d_{s_i, q_i, j_i} \right) \\ W_i & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_{s_i, j} \exp \left(-c_{s_i, j} d_{s_i, k_i, j_i} \right) + m_{s_i, k} \exp \left(-c_{s_i, k} d_{s_i, j_i, k_i} \right) \right] \\ p_i & = \frac{S_i}{S_i + \gamma_{s_i} W_i + \kappa_{s_i}} \end{align} \] where \(N\) is the number of items in the list, \(m_{s_i, j}\) are free parameters for the strengths of items in different study positions, \(c_{s_i, j}\) are the specificity parameters for items in different study positions, and \(\gamma_{s_i}\) and \(\kappa_{s_i}\) are free parameters for the participant’s within-list similarity and criterion.
The EBRW models the speed of decision making in terms of how many races must be run until the accumulated win advantage in favor of either a “yes” or “no” response reaches a decision boundary. To convert this to “real time”, we must say how long each race takes and allow for a residual time that accounts for the additional time needed to orient attention to the trial and execute the selected motor response. Above, we used \(\Delta\) to stand for the amount of time (in seconds) each race takes to run. We add a subscript, writing \(\Delta_{s_i}\), to make clear that \(\Delta\) is a parameter that can vary between people. In addition, it will be convenient later to think instead of \(\nu_{s_i} = \frac{1}{\Delta_{s_i}}\), where \(\nu_{s_i}\) is the number of races per second. Finally, parameter \(R_{s_i}\) stands for the additional residual time.
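For example, with \(\nu_{s_i} = 40\) races per second and \(R_{s_i} = 0.2\) seconds (the values used in the figure below), a trial on which the walk happens to need 52 races to reach a boundary would yield a predicted response time of \(52 / 40 + 0.2 = 1.5\) seconds.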
The figure below shows an example of how memory evidence evolves during a single trial in which \(p_i = 0.55\), \(\nu_{s_i} = 40\), \(B_{s_i, Y} = 7\), \(B_{s_i, N} = -8\), and \(R_{s_i} = 0.2\). In addition, the graphs above and below the evidence trajectory illustrate the relative frequency with which, across many identical trials, each type of response would be made at each unit of time. Note that these distributions are discrete because the random walk operates in discrete time intervals, each of duration \(\Delta_{s_i}\) (which in this example is \(\Delta_{s_i} = \frac{1}{\nu_{s_i}} = 0.025\) seconds).
set.seed(1)
nu <- 40
p <- 0.55
B <- c(-8, 7)
resid <- 0.2
### RT distributions
Y_rw <- seq(B[1], B[2])
P_rw <- matrix(0, nrow = length(Y_rw), ncol = length(Y_rw))
P_rw[cbind(2:(nrow(P_rw) - 1), 1:(ncol(P_rw) - 2))] <- 1 - p
P_rw[cbind(2:(nrow(P_rw) - 1), 3:ncol(P_rw))] <- p
P_rw[1, 1] <- P_rw[nrow(P_rw), ncol(P_rw)] <- 1
### Simulation
while (TRUE) {
winner <- 0
x_rw <- 0
while (TRUE) {
s <- 2 * (runif(n = 1) < p) - 1
x_rw <- c(x_rw, x_rw[length(x_rw)] + s)
if (x_rw[length(x_rw)] <= B[1]) {
winner <- 1
break
} else if (x_rw[length(x_rw)] >= B[2]) {
winner <- 2
break
}
}
if (winner == 2 & (length(x_rw) / nu) > 1.5) {
break
}
}
RT_rw <- matrix(0, nrow = 2, ncol = length(x_rw))
Z_rw <- 1 * c(Y_rw == 0)
for (i in 1:length(x_rw)) {
Z_rw <- Z_rw %*% P_rw
RT_rw[1, i] <- Z_rw[1]
RT_rw[2, i] <- Z_rw[length(Z_rw)]
}
dRT_rw <- apply(RT_rw, MARGIN = 1, FUN = diff) * nu / 2
rtPlot1 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])) %>%
ggplot(aes(x = t, y = p)) +
geom_col(fill = "#eeb211", color = NA, width = 1 / nu) +
coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_rw)))) +
labs(x = NULL, y = "Pr(Respond at time t)") +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rtPlot0 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])) %>%
ggplot(aes(x = t, y = p)) +
geom_col(fill = "#46166b", color = NA, width = 1 / nu) +
labs(x = NULL, y = "Pr(Respond at time t)") +
scale_x_continuous(limits = c(0, NA)) +
scale_y_reverse(limits = c(max(c(dRT_rw)), 0)) +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rwPlot <- tibble(t = resid + 1:length(x_rw) / nu, x = x_rw) %>%
ggplot(aes(x = t, y = x)) +
geom_step(linewidth = 1.5) +
geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", linewidth = 2) +
geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", linewidth = 2) +
geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", linewidth = 2) +
geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
coord_cartesian(xlim = c(0, NA)) +
labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu))))
rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))
A random walk takes discrete-valued steps either up or down in discrete units of time. A Wiener diffusion process takes continuous-valued steps sampled from a normal distribution in infinitely small units of time, thus effectively operating in continuous time. We are going to approximate the discrete EBRW with a continuous diffusion process (so technically we should call this model the EBD for Exemplar-Based Diffusion, but we will keep calling it the EBRW for posterity).
In going from a random walk to a diffusion model, we are making an important psychological claim: We are saying that, instead of memory evidence arriving in discrete units at regular intervals, memory evidence is a continuous value that continually evolves as new information arrives. We can think of this as saying that, instead of only knowing the outcome of each race, you can see who is ahead and who is behind at any given time; this is the move from discrete time to continuous time. Moreover, instead of only scoring each race as a win or loss for the memory traces, the races are assigned a continuous value depending on how clear the winner is; this is the move from discrete evidence to continuous evidence.
We can write the update equation for the random walk like we did above: \[ \begin{align} x \left( t + \Delta_{s_i} \right) & = \begin{cases} x(t) + 1 & \text{with probability } p_i \\ x(t) - 1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & = \begin{cases} 1 & \text{with probability } p_i \\ -1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & \sim 2 \times \text{Bernoulli} \left( p_i \right) - 1 \end{align} \] where we have rearranged terms and used the shorthand in the final line to emphasize the idea that each step of the random walk can be thought of as a sample from a Bernoulli distribution with parameter \(p_i\) that is then transformed from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\).
The update equation above is for a single race, but what if we want to know the distribution of the random walk after a particular span of time? Specifically, what is the distribution of the total change in memory evidence over one second of time? It depends on how many races can be run within one second, which is \(\frac{1}{\Delta_{s_i}} = \nu_{s_i}\). The number of up steps over one second is the sum of \(\nu_{s_i}\) Bernoulli random variables, each with parameter \(p_i\), so it follows a Binomial distribution. The total change in evidence is the number of up steps minus the number of down steps, i.e., twice the number of up steps minus the total number of races: \[ x \left( t + 1 \right) - x(t) \sim 2 \times \text{Binomial} \left( \nu_{s_i}, p_i \right) - \nu_{s_i} \]
One last step to turn this into a continuous diffusion: We need to approximate the binomial distribution with a normal distribution that has the same mean and variance. The mean and variance of this normal distribution are the mean and variance of the diffusion process that approximates the random walk. The mean of a binomial distribution is \(\nu_{s_i} p_i\) and its variance is \(\nu_{s_i} p_i \left(1 - p_i \right)\). This leads to the normal approximation \[ \begin{align} x \left( t + 1 \right) - x(t) & \sim 2 \times \text{Normal} \left[ \nu_{s_i} p_i, \nu_{s_i} p_i \left(1 - p_i \right) \right] - \nu_{s_i} \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left[ \nu_{s_i} \left( 2 p_i - 1 \right), 4\nu_{s_i} p_i \left(1 - p_i \right) \right] \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left( \mu_i, \sigma^2_i \right) \\ \end{align} \] where, to summarize, the mean of the diffusion on trial \(i\) is \(\mu_i\) and the variance of the diffusion on trial \(i\) is \(\sigma_i^2\), both of which depend on step probability \(p_i\) and rate \(\nu_{s_i}\): \[ \begin{align} \mu_i & = \nu_{s_i} \left( 2 p_i - 1 \right) \\ \sigma^2_i & = 4 \nu_{s_i} p_i \left(1 - p_i \right) \\ \sigma_i & = 2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)} \end{align} \]
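As a quick check of this approximation, here is a minimal sketch (using the same illustrative values as the random walk example above) that simulates one second’s worth of races and compares the simulated mean and SD of the change in evidence to \(\mu_i\) and \(\sigma_i\):

nu <- 40    # races per second (illustrative value from the example above)
p  <- 0.55  # probability that a trace wins each race (illustrative)

# Total change in evidence over one second: 2 * (number of up steps) - nu
change <- 2 * rbinom(n = 10000, size = nu, prob = p) - nu

mean(change); sd(change)        # simulated moments
nu * (2 * p - 1)                # mu_i = 4
2 * sqrt(nu * p * (1 - p))      # sigma_i, about 6.29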
The mean \(\mu_i\) is the drift rate of the diffusion process. The variance \(\sigma_i^2\) is the noise in the diffusion process. For each infinitesimal unit of time \(dt\), the current state of memory evidence is updated according to \[ dx(t) = \mu_i dt + \epsilon_i(t) \] where \[ \epsilon_i(t) \sim \text{Normal} \left( 0, \sigma_i^2 dt \right) \] is a random variable sampled from a Normal distribution with mean zero and variance \(\sigma_i^2 dt\).
Notice an important property of the noise \(\sigma^2_i\): It is largest when \(p_i = 0.5\) and approaches zero as \(p_i\) approaches either 0 or 1. In other words, the more uncertain the outcome of each race, the more noise there is in the diffusion. This is illustrated below:
expand_grid(p = seq(0, 1, length.out = 101), nu = seq(10, 40, by = 10)) %>%
mutate(sigma2 = 4 * nu * p * (1 - p)) %>%
ggplot(aes(x = p, y = sigma2, color = nu, group = nu)) +
geom_line() +
labs(x = expression(p[i]), y = expression(sigma[i]^2), color = expression(nu[s[i]])) +
theme(legend.position = c(1, 1), legend.justification = c(1, 1))
To summarize, the difference between the random walk and the diffusion is that we have swapped out a discrete binomial distribution of evidence increments per unit time for a continuous normal distribution of evidence increments per unit time. Everything else is the same: You still respond “yes” if and when the accumulated evidence \(x(t)\) reaches the upper boundary \(B_{s_i, Y}\), and “no” if and when it reaches the lower boundary \(B_{s_i, N}\).
To illustrate how well the diffusion process approximates the random walk, the graphs below show the diffusion approximation to the same random walk example used above. The smooth lines in the upper and lower graphs are the probability of responding per unit time (i.e., the probability density function) according to the Wiener diffusion model. The open bars are the same probabilities from the random walk. The diffusion model’s predictions hew very closely to those of the random walk!
mu <- nu * (2 * p - 1)
sigma2 <- 4 * nu * p * (1 - p)
boundsep <- B[2] - B[1]
bias <- (0 - B[1]) / (B[2] - B[1])
delta_t <- 0.001
while (TRUE) {
winner_diff <- NA
x_diff <- 0
while (TRUE) {
x_diff <- c(x_diff, x_diff[length(x_diff)] + rnorm(n = 1, mean = mu * delta_t, sd = sqrt(sigma2 * delta_t)))
if (x_diff[length(x_diff)] <= B[1]) {
winner_diff <- 1
break
} else if (x_diff[length(x_diff)] >= B[2]) {
winner_diff <- 2
break
}
}
if (winner == winner_diff & abs((length(x_diff) * delta_t) - (length(x_rw) / nu)) < (1 / nu)) {
break
}
}
x_diff <- pmax(pmin(x_diff, B[2]), B[1])
t <- seq(1, length(x_diff)) * delta_t
dRT_diff <- cbind(
WienerPDF(t = t, response = "lower", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value,
WienerPDF(t = t, response = "upper", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value
)
rtPlot1 <- tibble(t = resid + t, p = dRT_diff[,2]) %>%
ggplot(aes(x = t, y = p)) +
geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])), color = "#eeb211aa", fill = NA, width = 1 / nu) +
geom_area(fill = "#eeb21177", color = "#eeb211") +
coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_diff)))) +
labs(x = NULL, y = "Pr(Respond at time t)") +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rtPlot0 <- tibble(t = resid + t, p = dRT_diff[,1]) %>%
ggplot(aes(x = t, y = p)) +
geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])), color = "#46166baa", fill = NA, width = 1 / nu) +
geom_area(fill = "#46166b77", color = "#46166b") +
labs(x = NULL, y = "Pr(Respond at time t)") +
scale_x_continuous(limits = c(0, NA)) +
scale_y_reverse(limits = c(max(c(dRT_diff)), 0)) +
theme(axis.text = element_blank(), axis.ticks.y = element_blank())
rwPlot <- tibble(t = resid + t, x = x_diff) %>%
ggplot(aes(x = t, y = x)) +
geom_line() +
geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", linewidth = 2) +
geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", linewidth = 2) +
geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", linewidth = 2) +
geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
coord_cartesian(xlim = c(0, NA)) +
labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu), mu[i] == .(signif(mu, 3)), sigma[i] == .(signif(sqrt(sigma2), 3)))))
rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))
To find the EBRW parameters that best fit the recognition data from each participant, I implemented the EBRW in R using the WienR package.
We model the data at the level of individual trials. For each trial `i`, we know:

- `subject[i]`: The numerical index of the participant who produced the trial.
- `study_item[i, 1]` and `study_item[i, 2]`: The numerical indexes of the first and second items presented on the study list just prior to this trial. For example, if the study list for trial `i` was items 6 and 3 in that order, then `study_item[i, 1]` would be 6 and `study_item[i, 2]` would be 3.
- `probe_item[i]`: The numerical index of the probe item presented on trial `i`.
- `resp[i]`: The response the participant gave on trial `i`, either 0 (for “no”) or 1 (for “yes”).
- `rt[i]`: The response time of the participant on trial `i`.
- `dist[1:nSubj, 1:nItem, 1:nItem]`: An array which is basically a list of matrices of the perceived distances between all `nItem` items, inferred from Multidimensional Scaling for each participant. `dist[s, j, k]` gives the distance perceived by participant `s` between item `j` and item `k`.

For each participant `s`, there are a total of nine parameters to be estimated. These are:
- `retrieval_rate[s]`: This is the number of races run per second, labeled \(\nu\) above.
- `boundsep[s]`: This is the difference \(B_Y - B_N\), called boundary separation. Since the decision boundaries of a diffusion process do not need to be whole numbers, it is convenient to use this and the following parameter instead of \(B_Y\) and \(B_N\).
- `bias[s]`: This is a number between 0 and 1 representing the degree of bias in how participant `s` sets their decision boundaries. \(B_Y\) is given by `boundsep[s] * (1 - bias[s])` and \(B_N\) is given by `-boundsep[s] * bias[s]`. That way, if `bias[s]` is 0.5, the diffusion process is unbiased because \(B_Y = -B_N\) and you need exactly the same amount of evidence before you are willing to make either a “yes” or “no” response. If `bias[s]` is greater than 0.5, then `s` is biased in favor of “yes” responses, while if `bias[s]` is less than 0.5, then `s` is biased in favor of “no” responses.
- `resid[s]`: The residual time, other than that needed for the diffusion process, needed to orient to the probe item on the trial and execute the motor response. By definition, this must be a number between 0 and the minimum RT produced by participant `s`. This was called \(R\) above.
- `strength1[s]`: The strength in memory of the trace of the first list item, \(m_{s_i, 1}\) above. Since strengths are on an arbitrary scale, we assume that \(m_{s_i, 2} = 1\) for all participants. That means `strength1[s]` is the strength of the first studied item relative to the strength of the second for participant `s`.
- `specificity[s]`: This is the mean specificity for participant `s`, called parameter \(c\) above, which together with the next parameter is used to define the specificity for each serial position.
- `specificity_diff[s]`: This is a number between 0 and 1 that, combined with the parameter above, specifies the specificity with which subject `s` encodes items in each serial position. Specifically, `specificity_pos[1, s] = 2 * specificity[s] * (1 - specificity_diff[s])` (the result corresponds to \(c_{s_i, 1}\) above) and `specificity_pos[2, s] = 2 * specificity[s] * specificity_diff[s]` (the result corresponds to \(c_{s_i, 2}\) above). If `specificity_diff[s]` is 0.5, then subject `s` encodes items in each position with the same specificity, `specificity[s]`. If `specificity_diff[s]` > 0.5, then the second (more recent) item is encoded with greater precision than the first; if `specificity_diff[s]` < 0.5, then the first item is encoded with greater precision than the second.
- `within_sim[s]`: This is how much the criterion for participant `s` is raised by within-list similarity; this corresponds to parameter \(\gamma\) above.
- `criterion[s]`: This is the baseline criterion for participant `s`, corresponding to parameter \(\kappa\) above.

The whole reason why we took the original EBRW and turned it into a Wiener diffusion process was so that we could compute the likelihood of making a “yes” or “no” response at any time. What we need to do now is find the set of parameter values, for each participant, that, out of all possible sets of parameter values, assigns the highest likelihood to the observed set of responses/RTs across all trials produced by subject `s`. This means we need to tell R how to calculate that likelihood. We do this by specifying a function that takes a set of parameters as well as a set of data and returns this likelihood value. We will break down the design of this function below.
For one thing, we actually don’t work directly with the likelihood function. The reason is twofold: First, because the likelihood involves multiplying many small numbers, it is numerically unstable; multiply too many small numbers together and even a computer can’t distinguish that result from zero. Second, most optimization routines are minimizers by default, so we need to “reverse-code” the likelihood.
The first issue above we deal with by taking the natural logarithm of the likelihood. Let’s define the likelihood explicitly here:
\[ \text{likelihood} = \prod_{i = 1}^N f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \] where \(f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right)\) is shorthand for the likelihood of the response and RT observed on trial \(i\), given a set of EBRW parameters. Because we assume that each trial is conditionally independent (conditional on the parameters characterizing the participant that produced that trial), the likelihood of all \(N\) trials is the product of their individual likelihoods. These are all the tiny numbers that need to get multiplied together. By taking the natural logarithm, we turn the product into a sum of log-likelihoods:
\[ \text{log-likelihood} = \sum_{i = 1}^N \log f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \]
Finally, to “reverse-code” the likelihood for the purposes of using typical optimization routines, we just take the negative log-likelihood, often abbreviated NLL.
\[ NLL = \text{negative log-likelihood} = -\sum_{i = 1}^N \log f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \]
Our job is now subtly changed. We are looking for the set of parameters that result in the lowest possible value of the negative log-likelihood (NLL).
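A tiny numerical sketch (with made-up per-trial likelihoods) shows why working on the log scale matters:

lik <- rep(1e-12, 500)   # hypothetical per-trial likelihoods
prod(lik)                # underflows to exactly 0 in double precision
-sum(log(lik))           # the NLL is perfectly well behaved (about 13816)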
Having set our goals, we must now define a function to compute the NLL for a set of observed responses/RTs, given a set of parameters. The function itself, in outline form, looks like the one below. I have written comments for the things that the function needs to accomplish to get from what is given to the function (in the parentheses following `function`) to what the function needs to return at the end.
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
# 1. Transform the given parameters into their appropriate ranges
# 2. Compute the mean and SD of the drift for each trial
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 4. Return final result
return(-sum(trial_log_likelihood))
}
Let’s now fill in each of those sections in turn.
The vector `par` that is the first argument to the `ebrw_nll` function should be a named vector that looks like the following:
par <- c(
"retrieval_rate" = 3,
"boundsep" = 2.3,
"bias" = 0,
"resid" = 0,
"strength1" = -0.22,
"specificity" = 0.41,
"specificity_diff" = 0.41,
"within_sim" = -1.2,
"criterion" = -0.92
)
The vector `par` is a bit odd, however, because it specifies the values of each of those named parameters on a scale that goes from \(-\infty\) to \(\infty\). This is because many optimization routines do not allow for constraints on parameter values. But this doesn’t pose a problem for us, because we can transform the “raw” values in the `par` vector into the actual parameter values on their corresponding ranges, so that we can use those parameters as they were intended.
We use basically two functions to perform these transformations: `exp()` and `plogis()`. The exponential function `exp()` takes numbers that can fall anywhere on the real line and returns a value that is strictly positive. So we will use the `exp()` function for any parameters that are constrained to be positive but that do not have an upper bound.

The `plogis()` function is also called the “logistic” function. It takes values that can fall anywhere on the real line and returns a value between 0 and 1. We will use the `plogis()` function for any parameters that have both an upper and a lower bound. If the upper bound is different from 1, then we can just multiply the transformed parameter by its proper maximum (this is what we do for residual time).

The graph below illustrates the `exp()` and `plogis()` functions.
expand_grid(x = seq(-4, 4, length.out = 201), trans = c("exp", "plogis")) %>%
mutate(y = if_else(trans == "exp", exp(x), plogis(x))) %>%
ggplot(aes(x = x, y = y, color = trans, linetype = trans)) +
geom_hline(yintercept = c(0, 1), linetype = "dotted", color = "black") +
geom_line() +
coord_cartesian(ylim = c(0, 3)) +
labs(x = "Raw value", y = "Transformed value", color = NULL, linetype = NULL) +
theme(legend.position = c(0.01, 0.99), legend.justification = c(0, 1))
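As a side note (not part of the fitting function itself), the inverse transformations `log()` and `qlogis()` are handy for choosing raw-scale values when you have a natural-scale value in mind; for example:

log(20)      # raw value that exp() maps back to 20 (e.g., a retrieval rate of 20 races per second)
qlogis(0.5)  # raw value (0) that plogis() maps back to 0.5 (e.g., an unbiased bias parameter)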
We can now update our definition of the `ebrw_nll` function to include the necessary parameter transformations:
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
# 1. Transform the given parameters into their appropriate ranges
# 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
min_rt <- min(rt)
# 1b. Transform the raw parameters as needed
retrieval_rate <- exp(par["retrieval_rate"])
boundsep <- exp(par["boundsep"])
bias <- plogis(par["bias"])
resid <- plogis(par["resid"]) * min_rt
strength1 <- exp(par["strength1"])
specificity <- exp(par["specificity"])
specificity_diff <- plogis(par["specificity_diff"])
within_sim <- exp(par["within_sim"])
criterion <- exp(par["criterion"])
# 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
specificity_pos2 <- 2 * specificity * specificity_diff
# 2. Compute the mean and SD of the drift for each trial
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 4. Return final result
return(-sum(trial_log_likelihood))
}
Armed with our properly transformed parameter values, we can now use them, in conjunction with the `dist`, `study_items`, and `probe_item` arguments, to compute what the mean and SD of the drift will be for each trial.

Assume that the index variable `i` indicates which trial we are currently looking at. Then the study items for that trial can be found by looking up `study_items[i, ]`. Specifically, `study_items[i, 1]` and `study_items[i, 2]` tell us the identities of the first and second items studied on trial `i`. Meanwhile, `probe_item[i]` tells us the identity of the probe item on trial `i`. As a result, `dist[study_items[i, 1], probe_item[i]]` will give us the perceived distance between the first study item and the probe item on trial `i`. The similarity between the probe item and the first studied item on trial `i` is then given by `strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], probe_item[i]])`. Finally, the summed similarity between the probe item and both studied items on trial `i` can be computed and stored as `trial_sum_sim`:
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], probe_item[i]]) + exp(-specificity_pos2 * dist[study_items[i, 2], probe_item[i]])
We also need to compute the average similarity between items within the list. As described in mathematical form above, the similarity between the first and second items is `strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], study_items[i, 2]])`, while the similarity between the second and first items is `exp(-specificity_pos2 * dist[study_items[i, 1], study_items[i, 2]])` (there is no strength factor in the second term because the strength of the second item is fixed at 1). Thus, we can define the average within-list similarity on trial `i` (called `trial_list_sim`) as:
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], study_items[i, 2]]) + exp(-specificity_pos2 * dist[study_items[i, 1], study_items[i, 2]])) / 2
At this point, we can directly define the probability of taking an upward step in the random walk and then the mean and standard deviation of the corresponding diffusion process by translating the formulas above into R code.
trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
The code above was written under the assumption that we were dealing with just a single trial with index `i`. However, R makes it easy to do the same set of calculations for all trials at once via the miracle of vectorization. “Vectorization” means that an operation that works on a single number can also be applied at once to an entire vector (or matrix or even array) of numbers. We can use this trick because `probe_item` is a vector, and so are the columns of `study_items` (e.g., `study_items[, 1]` is a vector of the first studied item in every trial).
To index the appropriate entries in the `dist` matrix, we use the `cbind` function (short for “column bind”) to stitch together the indices we need. For example, `cbind(study_items[, 1], probe_item)` gives a matrix where each row is a trial, column 1 is the first studied item on each trial, and column 2 is the probe item on each trial. Then `dist[cbind(study_items[, 1], probe_item)]` gives a vector of the distances between the first studied item and the probe item on every trial.
Armed with this vectorization trick, we can define `trial_sum_sim` not just for trial `i`, but as a vector where each entry is the summed similarity on a different trial:
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
For within-list similarity, we don’t even need the `cbind` function, since `study_items` is already a matrix where each row is a trial and the two columns indicate the first and second studied items:
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
We have now completed the second step of writing the `ebrw_nll` function, as summarized below.
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
# 1. Transform the given parameters into their appropriate ranges
# 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
min_rt <- min(rt)
# 1b. Transform the raw parameters as needed
retrieval_rate <- exp(par["retrieval_rate"])
boundsep <- exp(par["boundsep"])
bias <- plogis(par["bias"])
resid <- plogis(par["resid"]) * min_rt
strength1 <- exp(par["strength1"])
specificity <- exp(par["specificity"])
specificity_diff <- plogis(par["specificity_diff"])
within_sim <- exp(par["within_sim"])
criterion <- exp(par["criterion"])
# 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
specificity_pos2 <- 2 * specificity * specificity_diff
# 2. Compute the mean and SD of the drift for each trial
# 2a. Compute the summed similarity between the probe and study items on each trial
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
# 2b. Compute the within-list similarity between study items on each trial
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
# 2c. Compute the probability of taking a step up in the random walk
trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
# 2d. Compute the mean and standard deviation of the corresponding diffusion
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 4. Return final result
return(-sum(trial_log_likelihood))
}
This step is almost too easy. We are using the WienR package, which includes a function called `WienerPDF`. The `WienerPDF` function computes for us both the likelihood and the log-likelihood of the response/RT for each trial. We just need to tell it the drift rate, boundary separation, bias, and residual time parameters.
There is only one thing we need to do before we can use the `WienerPDF` function: it assumes that the standard deviation of the diffusion is always equal to one. As such, we need to standardize the drift rate and boundary separation before we send them to the `WienerPDF` function by dividing each by `trial_drift_sd`:
trial_standard_drift <- trial_drift_mean / trial_drift_sd
trial_standard_boundsep <- boundsep / trial_drift_sd
Then, we can send these standardized values to the `WienerPDF` function:
trial_wiener <- WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid)
Notice that `WienerPDF` expects `response` to be either 1 or 2 for “no” and “yes”, respectively, so we have to add 1 to `resp`.
Our code now looks like the following:
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
# 1. Transform the given parameters into their appropriate ranges
# 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
min_rt <- min(rt)
# 1b. Transform the raw parameters as needed
retrieval_rate <- exp(par["retrieval_rate"])
boundsep <- exp(par["boundsep"])
bias <- plogis(par["bias"])
resid <- plogis(par["resid"]) * min_rt
strength1 <- exp(par["strength1"])
specificity <- exp(par["specificity"])
specificity_diff <- plogis(par["specificity_diff"])
within_sim <- exp(par["within_sim"])
criterion <- exp(par["criterion"])
# 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
specificity_pos2 <- 2 * specificity * specificity_diff
# 2. Compute the mean and SD of the drift for each trial
# 2a. Compute the summed similarity between the probe and study items on each trial
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
# 2b. Compute the within-list similarity between study items on each trial
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
# 2c. Compute the probability of taking a step up in the random walk
trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
# 2d. Compute the mean and standard deviation of the corresponding diffusion
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 3a. Standardize the drift rates and boundary separation for each trial
trial_standard_drift <- trial_drift_mean / trial_drift_sd
trial_standard_boundsep <- boundsep / trial_drift_sd
# 3b. Find the likelihoods of the given responses/RT
trial_wiener <- WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid)
# 4. Return final result
return(-sum(trial_log_likelihood))
}
When we are fitting the model to participants from whom we have only a few trials (e.g., a few dozen), then it is possible for the maximum likelihood parameters to be very bizarre. This is because the parameters are just fitting the idiosyncratic noise in the data. One way to counteract this would be to place hard constraints on how large each parameter can get (all the parameters are such that zero is as small as they can get). The problem with this approach is that the exact choice of constraint may matter and it is difficult to justify. Moreover, it may well be that a participant really is best characterized by a “bizarre” parameter value and we do not want to rule those out completely.
An alternative approach is to place soft constraints on the parameters. This is called regularization, because parameter estimates are regularized toward “regular” values when the data are ambiguous. Regularization is, in fact, the same idea as placing a prior distribution in a Bayesian analysis. The prior distribution acts to regularize the resulting parameter estimates so that they tend not to over-fit noise in the data.
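To make the idea of regularization concrete, here is a toy sketch that is separate from the EBRW: with only a few observations, the maximum likelihood estimate of a simple exponential rate parameter can be extreme, but adding a log-prior penalty (here, an assumed Exponential(1) prior on the rate, chosen only for illustration) pulls the estimate back toward more moderate values.
# Toy illustration of regularization (not part of the EBRW)
set.seed(1)
x <- rexp(3, rate = 10) # only a few observations
# Unregularized: minimize the negative log-likelihood alone
nll <- function(log_rate) -sum(dexp(x, rate = exp(log_rate), log = TRUE))
# Regularized: subtract the log-prior (an assumed Exponential(1) prior on the rate)
penalized_nll <- function(log_rate) nll(log_rate) - dexp(exp(log_rate), rate = 1, log = TRUE)
exp(optimize(nll, interval = c(-10, 10))$minimum) # maximum likelihood estimate of the rate
exp(optimize(penalized_nll, interval = c(-10, 10))$minimum) # regularized estimate, pulled toward smaller values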
For the purpose of estimating the EBRW, I chose the following prior distributions for each free parameter. These priors encourage the parameters not to get too bizarre but do not entirely prevent that from happening if the data require it.
log_prior <-
# Discourages retrieval_rate from getting too large
dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
# Discourages boundsep from getting too large
dexp(boundsep, rate = 1 / 50, log = TRUE) +
# Discourages bias from getting too close to 0 or 1
dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages resid / min_rt from getting too close to 0 or 1 (notice the division below)
dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages strength1 from getting too large or small (Gamma distribution has mode = 1, SD = 1)
dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
# Discourages specificity from getting too large or small (Gamma distribution has mode = 1, SD = 10)
dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
# Discourages specificity_diff from getting too close to 0 or 1
dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages within_sim from getting too large
dexp(within_sim, rate = 1 / 20, log = TRUE) +
# Discourages criterion from getting too large
dexp(criterion, rate = 1 / 20, log = TRUE)
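As a quick check on the comments above: for a Gamma distribution with a given shape and rate, the mode is (shape - 1) / rate and the standard deviation is sqrt(shape) / rate, so we can verify the claimed mode and SD of the two Gamma priors directly:
# Gamma prior on strength1: claimed mode = 1, SD = 1
(2.618 - 1) / 1.618 # mode
sqrt(2.618) / 1.618 # SD (approximately 1)
# Gamma prior on specificity: claimed mode = 1, SD = 10
(1.1 - 1) / 0.1 # mode
sqrt(1.1) / 0.1 # SD (approximately 10)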
Notice that the quantity log_prior is the sum of the log-priors for each EBRW parameter. We will incorporate log_prior into the negative summed log-likelihood that the function returns. Doing so forces the optimization to find parameters that achieve the best possible balance between the likelihood and the regularizing prior.
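Concretely, once log_prior has been computed inside the function, the final return statement will become:
return(-(sum(trial_log_likelihood) + log_prior))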
As part of filling in the blanks for the final step, in which we return the final result, it is important for us to do some error checking. Sometimes, a particular combination of parameters will make it impossible for the WienerPDF function to calculate the log-likelihood. When that happens, it gives an error. In essence, such a result tells us that the model cannot work with that combination of parameters. Thus, rather than an “error”, that result is really telling us that we should assign zero likelihood to that set of parameters, which is equivalent to a log-likelihood of \(-\infty\).
We can do that kind of check in R by putting the WienerPDF function call within try(). If the WienerPDF function gives an error, then the result that gets stored in trial_wiener is also an error. Otherwise, it just gives us the log-likelihoods that we want.
Let’s set up an “if…else” structure to do this check:
trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))
if (inherits(trial_wiener, "try-error")) {
trial_log_likelihood <- -Inf
} else {
trial_log_likelihood <- trial_wiener$logvalue
}
You will notice, too, that the log-likelihoods that we want are stored under the name logvalue in the list that is returned by the WienerPDF function.
We now have a complete definition of the (penalized) negative log-likelihood function, shown below.
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
# 1. Transform the given parameters into their appropriate ranges
# 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
min_rt <- min(rt)
# 1b. Transform the raw parameters as needed
retrieval_rate <- exp(par["retrieval_rate"])
boundsep <- exp(par["boundsep"])
bias <- plogis(par["bias"])
resid <- plogis(par["resid"]) * min_rt
strength1 <- exp(par["strength1"])
specificity <- exp(par["specificity"])
specificity_diff <- plogis(par["specificity_diff"])
within_sim <- exp(par["within_sim"])
criterion <- exp(par["criterion"])
# 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
specificity_pos2 <- 2 * specificity * specificity_diff
# 2. Compute the mean and SD of the drift for each trial
# 2a. Compute the summed similarity between the probe and study items on each trial
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
# 2b. Compute the within-list similarity between study items on each trial
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
# 2c. Compute the probability of taking a step up in the random walk
trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
# 2d. Compute the mean and standard deviation of the corresponding diffusion
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 3a. Standardize the drift rates and boundary separation for each trial
trial_standard_drift <- trial_drift_mean / trial_drift_sd
trial_standard_boundsep <- boundsep / trial_drift_sd
# 3b. Find the likelihoods of the given responses/RT
trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))
# 4. Return final result
# 4a. Check if the Wiener PDF returned an error or not
if (inherits(trial_wiener, "try-error")) {
trial_log_likelihood <- -Inf
} else {
trial_log_likelihood <- trial_wiener$logvalue
}
# 4b. Compute regularizing prior terms to encourage (but not constrain) parameters to stay within reasonable ranges
log_prior <-
# Discourages retrieval_rate from getting too large
dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
# Discourages boundsep from getting too large
dexp(boundsep, rate = 1 / 50, log = TRUE) +
# Discourages bias from getting too close to 0 or 1
dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages resid / min_rt from getting too close to 0 or 1
dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages strength1 from getting too large or small (Gamma distribution has mode = 1, SD = 1)
dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
# Discourages specificity from getting too large or small (Gamma distribution has mode = 1, SD = 10)
dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
# Discourages specificity_diff from getting too close to 0 or 1
dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages within_sim from getting too large
dexp(within_sim, rate = 1 / 20, log = TRUE) +
# Discourages criterion from getting too large
dexp(criterion, rate = 1 / 20, log = TRUE)
# 4c. The value to be returned is the negative of the summed log-likelihood plus the log-prior
return(-(sum(trial_log_likelihood) + log_prior))
}
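The function can now be handed to an optimizer. The sketch below shows one way this might look using optim(); the objects dat_dist, dat_study, dat_probe, dat_resp, and dat_rt are hypothetical stand-ins for your own distance matrix and trial data, and the all-zero starting values (which map to 1 for the exp-transformed parameters, 0.5 for bias and specificity_diff, and half the minimum RT for resid) are just one reasonable default.
# A sketch of fitting the model by minimizing the penalized negative log-likelihood
start <- c(retrieval_rate = 0, boundsep = 0, bias = 0, resid = 0, strength1 = 0, specificity = 0, specificity_diff = 0, within_sim = 0, criterion = 0)
fit <- optim(
  par = start,
  # Re-attach the parameter names inside the objective in case the optimizer drops them
  fn = function(p, ...) ebrw_nll(setNames(p, names(start)), ...),
  dist = dat_dist, study_items = dat_study, probe_item = dat_probe, resp = dat_resp, rt = dat_rt,
  method = "Nelder-Mead", control = list(maxit = 5000)
)
fit$par # best-fitting raw (untransformed) parameter values
fit$value # minimized penalized negative log-likelihood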
When we fit the model, it is important to check whether its predictions are close to what was observed. Therefore, it will be important to be able to calculate, for each trial \(i\), three things: the predicted probability of a “yes” response, \(\hat{p}_i\); the predicted mean RT for a “yes” response, \(\widehat{RT}_{Y, i}\); and the predicted mean RT for a “no” response, \(\widehat{RT}_{N, i}\).
Since we know whether the probe item on trial \(i\) was a target (from the list) or not, we can use \(\widehat{RT}_{Y, i}\) and \(\widehat{RT}_{N, i}\) to find the predicted mean correct and error RT’s. If trial \(i\) is a target, then the predicted mean correct RT is \(\widehat{RT}_{Y, i}\) and the predicted mean error RT is \(\widehat{RT}_{N, i}\). These are reversed if the probe on trial \(i\) is a foil.
You may ask why it is useful to find the mean RT’s and response probabilities for each trial when we only see one outcome per trial. You can consider the predicted mean RT’s and response probabilities as what we would get if we simulated many many repetitions of trial \(i\). In that case, the mean RT’s and response probabilities represent the best possible guesses we could make about how trial \(i\) actually turned out.
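For intuition, here is a small sketch with made-up values for the step probability and the distance to each boundary: it simulates the random walk for a single trial many times, and the proportion of “yes” responses and the average number of steps to reach each boundary are exactly the kinds of quantities that the per-trial predictions summarize.
# Simulate many hypothetical repetitions of a single trial's random walk (made-up values)
set.seed(2)
p <- 0.55 # assumed probability of stepping toward "yes" on this trial
steps_to_bound <- 10 # assumed number of steps from the start point to either boundary
sim_one_trial <- function() {
  evidence <- 0
  n_steps <- 0
  while (abs(evidence) < steps_to_bound) {
    evidence <- evidence + sample(c(1, -1), size = 1, prob = c(p, 1 - p))
    n_steps <- n_steps + 1
  }
  c(yes = as.numeric(evidence > 0), n_steps = n_steps)
}
sims <- replicate(10000, sim_one_trial())
mean(sims["yes", ]) # simulated probability of a "yes" response on this trial
tapply(sims["n_steps", ], sims["yes", ], mean) # simulated mean number of steps for "no" (0) and "yes" (1)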
As it turns out, there are closed-form (if ugly) expressions for the response probability and mean RT at each boundary of a Wiener process. These are given below and are computed in the block of code that follows. In these formulas, \(d_i\) is the standardized drift rate on trial \(i\), \(b_i\) is the standardized boundary separation on trial \(i\), \(\beta\) is the bias parameter, and \(r\) is the residual time. \[ \begin{align} \hat{p}_i & = \frac{1 - \exp \left( -2 b_i d_i \beta \right)}{1 - \exp \left( -2 b_i d_i \right)} \\ \widehat{RT}_{Y, i} & = r + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\beta}{\tanh \left(b_i d_i \beta \right)} \right\rbrace \\ \widehat{RT}_{N, i} & = r + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{1 - \beta}{\tanh \left[b_i d_i \left(1 - \beta \right) \right]} \right\rbrace \end{align} \]
Rather than define an entirely new function that will compute the predictions of the EBRW, it will be easier to add some functionality to our existing ebrw_nll function. Specifically, we will add a new function argument called return.pred which we can use to tell the function whether we want it to return the model predictions (return.pred = TRUE) or just the negative summed log-likelihood (return.pred = FALSE).
Let’s focus our attention on the block of code we will add to our ebrw_nll function. It will be surrounded by an if statement that checks whether return.pred is TRUE. If it is, then we will compute the predictions for each trial.
This block of code is shown below, along with comments describing the function of each line:
if (return.pred) {
# Define a logical vector that is TRUE for trials in which the probe item was a target and FALSE otherwise. We define this vector by seeing whether at least one study item is equal to the probe item on that trial. If the number of study items that are equal to the probe item is greater than 0, that means the probe matches at least one studied item and is, therefore, a target.
is_target <- (rowSums(study_items == matrix(probe_item, nrow = nrow(study_items), ncol = ncol(study_items))) > 0)
# This is a vector giving the predicted probability of making a "yes" response for each trial.
p_resp <- (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift * bias)) / (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift))
# The following two vectors give the predicted mean "yes" and "no" response times for each trial.
mean_yes_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - bias / tanh(trial_standard_drift * trial_standard_boundsep * bias)) / trial_standard_drift
mean_no_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - (1 - bias) / tanh(trial_standard_drift * trial_standard_boundsep * (1 - bias))) / trial_standard_drift
# The following two lines use our knowledge about which response is correct to convert "yes" and "no" into "correct" and "error" response times, depending on whether the probe is a target or not.
mean_corr_rt <- mean_yes_rt * is_target + mean_no_rt * (1 - is_target)
mean_err_rt <- mean_yes_rt * (1 - is_target) + mean_no_rt * is_target
# Finally, since we are returning the predictions and not the summed negative log-likelihood, let's put all the predictions into a data frame where each row is a trial.
return(data.frame(p_resp = p_resp, mean_corr_rt = mean_corr_rt, mean_err_rt = mean_err_rt))
}
We will insert the block of code above into our function just before the point where we compute the log-likelihoods.
# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responded "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
# return.pred: A logical indicator telling the function whether we want the model predictions for each trial (TRUE) or just the negative summed log-likelihood (FALSE); default is "FALSE"
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt, return.pred = FALSE) {
# 1. Transform the given parameters into their appropriate ranges
# 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
min_rt <- min(rt)
# 1b. Transform the raw parameters as needed
retrieval_rate <- exp(par["retrieval_rate"])
boundsep <- exp(par["boundsep"])
bias <- plogis(par["bias"])
resid <- plogis(par["resid"]) * min_rt
strength1 <- exp(par["strength1"])
specificity <- exp(par["specificity"])
specificity_diff <- plogis(par["specificity_diff"])
within_sim <- exp(par["within_sim"])
criterion <- exp(par["criterion"])
# 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
specificity_pos2 <- 2 * specificity * specificity_diff
# 2. Compute the mean and SD of the drift for each trial
# 2a. Compute the summed similarity between the probe and study items on each trial
trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
# 2b. Compute the within-list similarity between study items on each trial
trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
# 2c. Compute the probability of taking a step up in the random walk
trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
# 2d. Compute the mean and standard deviation of the corresponding diffusion
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
# 3. Calculate the log-likelihood of each observed response/RT on each trial
# 3a. Standardize the drift rates and boundary separation for each trial
trial_standard_drift <- trial_drift_mean / trial_drift_sd
trial_standard_boundsep <- boundsep / trial_drift_sd
# --- If we only need to return the predictions, then compute and return them now ---
if (return.pred) {
# Define a logical vector that is TRUE for trials in which the probe item was a target and FALSE otherwise. We define this vector by seeing whether at least one study item is equal to the probe item on that trial. If the number of study items that are equal to the probe item is greater than 0, that means the probe matches at least one studied item and is, therefore, a target.
is_target <- (rowSums(study_items == matrix(probe_item, nrow = nrow(study_items), ncol = ncol(study_items))) > 0)
# This is a vector giving the predicted probability of making a "yes" response for each trial.
p_resp <- (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift * bias)) / (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift))
# The following two vectors give the predicted mean "yes" and "no" response times for each trial.
mean_yes_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - bias / tanh(trial_standard_drift * trial_standard_boundsep * bias)) / trial_standard_drift
mean_no_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - (1 - bias) / tanh(trial_standard_drift * trial_standard_boundsep * (1 - bias))) / trial_standard_drift
# The following two lines use our knowledge about which response is correct to convert "yes" and "no" into "correct" and "error" response times, depending on whether the probe is a target or not.
mean_corr_rt <- mean_yes_rt * is_target + mean_no_rt * (1 - is_target)
mean_err_rt <- mean_yes_rt * (1 - is_target) + mean_no_rt * is_target
# Finally, since we are returning the predictions and not the summed negative log-likelihood, let's put all the predictions into a data frame where each row is a trial.
return(data.frame(p_resp = p_resp, mean_corr_rt = mean_corr_rt, mean_err_rt = mean_err_rt))
}
# --- If we did not need to compute and return the predictions, the program will continue here ---
# 3b. Find the likelihoods of the given responses/RT
trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))
# 4. Return final result
# 4a. Check if the Wiener PDF returned an error or not
if (inherits(trial_wiener, "try-error")) {
trial_log_likelihood <- -Inf
} else {
trial_log_likelihood <- trial_wiener$logvalue
}
# 4b. Compute regularizing prior terms to encourage (but not constrain) parameters to stay within reasonable ranges
log_prior <-
# Discourages retrieval_rate from getting too large
dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
# Discourages boundsep from getting too large
dexp(boundsep, rate = 1 / 50, log = TRUE) +
# Discourages bias from getting too close to 0 or 1
dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages resid / min_rt from getting too close to 0 or 1
dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages strength1 from getting too large or small (Gamma distribution has mode = 1, SD = 1)
dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
# Discourages specificity from getting too large or small (Gamma distribution has mode = 1, SD = 10)
dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
# Discourages specificity_diff from getting too close to 0 or 1
dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
# Discourages within_sim from getting too large
dexp(within_sim, rate = 1 / 20, log = TRUE) +
# Discourages criterion from getting too large
dexp(criterion, rate = 1 / 20, log = TRUE)
# 4c. The value to be returned is the negative of the summed log-likelihood plus the log-prior
return(-(sum(trial_log_likelihood) + log_prior))
}
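Finally, once we have best-fitting parameters (for example, fit$par from an optim() call like the earlier sketch), we can ask the same function for its per-trial predictions and compare them with the observed responses and RTs; the dat_* objects below are again hypothetical stand-ins for your own data.
# A sketch of generating per-trial predictions at the best-fitting parameters
pred <- ebrw_nll(
  par = setNames(fit$par, names(start)), # re-attach the parameter names defensively, as before
  dist = dat_dist, study_items = dat_study, probe_item = dat_probe,
  resp = dat_resp, rt = dat_rt,
  return.pred = TRUE
)
head(pred) # one row per trial: p_resp, mean_corr_rt, mean_err_rt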