We consider a task in which a participant must select from a set of “choice items” the one that most closely matches a “target item”. One example is visual search, in which the choice items are objects in a display and the target has been shown to the participant. Another example is cued recall, in which the “choice items” are not present during the trial but were previously studied and are represented in memory; the “target item” is then a retrieval cue. In any case, the evidence for choosing one of the available items is based on the degree of match between each choice item and the target item.
Assume that the similarity between choice item \(i\) and the target item can be expressed as \(s_i\), the cosine of the angle between a vector representing choice item \(i\) and a vector representing the target item. These vectors have dimensionality \(p\). As described by Cox (in revision), a cosine similarity can be transformed into a log-likelihood ratio. This log-likelihood ratio, denoted \(\lambda_i\), represents the relative likelihood that two vectors encode the same item versus unrelated items. It is based on the von Mises-Fisher distribution of directions on a hypersphere. Vectors that encode the same item tend to point in the same direction and thus have higher cosine similarities, whereas vectors that encode unrelated items point in random directions, such that their relative direction is uniformly distributed over the \(p\)-dimensional hypersphere.
The degree to which two vectors encoding the same item tend to point in the same direction is governed by the parameter \(\kappa\), which represents the precision with which the vectors are encoded. The log-likelihood ratio \(\lambda_i\) corresponding to cosine similarity \(s_i\) can thus be expressed as a function of the dimensionality \(p\) and the precision \(\kappa\) with which the vectors are encoded: \[ \lambda_i = \kappa s_i + \left( \frac{p}{2} - 1 \right) \log \left( \frac{\kappa}{2} \right) - \log I_{\frac{p}{2} - 1} \left( \kappa \right) - \log \Gamma \left( \frac{p}{2} \right) \] where \(I_{\frac{p}{2} - 1} \left( \kappa \right)\) is the modified Bessel function of the first kind of order \(\frac{p}{2} - 1\) evaluated at \(\kappa\).
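This transformation can also be wrapped in a small helper function for experimentation. The sketch below simply restates the formula above; the function name and interface are my own, and the code in the rest of this document computes the same quantity inline.
# Convert a cosine similarity into the log-likelihood ratio defined above
# (illustrative helper; lambda -> 0 as kappa -> 0, so kappa = 0 returns 0)
cosine_to_llr <- function(s, kappa, p = 3) {
if (kappa <= 0) return(rep(0, length(s)))
kappa * s + (p / 2 - 1) * log(kappa / 2) - log(besselI(kappa, p / 2 - 1)) - lgamma(p / 2)
}
cosine_to_llr(s = c(0, 0.5, 1), kappa = 5)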
# Log-likelihood ratio as a function of precision, for several degrees of similarity
expand_grid(s = seq(0, 1, length.out = 5), precision = seq(0, 10, length.out = 101), n_dim = 3) %>%
# lambda -> 0 as kappa -> 0, so treat precision == 0 as a log-likelihood ratio of 0
mutate(ll = if_else(precision > 0, precision * s + (n_dim / 2 - 1) * log(precision / 2) - log(besselI(precision, n_dim / 2 - 1)) - lgamma(n_dim / 2), 0)) %>%
ggplot(aes(x = precision, y = ll, color = s, group = s)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = 0.5, low = "#a50026", mid = "#fee090", high = "#313695") +
labs(x = expression("Precision" ~ kappa), y = expression("Log-likelihood ratio" ~ lambda[i]), color = expression("Similarity" ~ s[i])) +
plot_theme
The graph above shows how the log-likelihood ratio changes as precision increases for different degrees of similarity. Notably, the function is monotonically increasing for perfect matches (\(s_i = 1\)), monotonically decreasing for dissimilar items (\(s_i \leq 0\)), but non-monotonic for partially similar items (\(0 < s_i < 1\)). For imprecisely encoded items, partial similarity is viewed as evidence in favor of a match.
Because \(\lambda_i\) is a log-likelihood ratio, we can readily transform it into the posterior probability \(\pi_i\) that choice item \(i\) is a match to the target item: \[ \pi_i = \frac{\exp \lambda_i}{\exp \lambda_i + \omega_0} \] where \(\omega_0\) is the prior odds against any given item being a match. For example, if the task is set up such that there are \(N\) choice items and exactly one of them is a match to the target, then \(\omega_0\) could be set to \(\frac{N - 1}{1}\). This transformation is illustrated below, assuming a prior odds of \(\omega_0 = 3\).
# Posterior probability of a match as a function of precision, for several degrees of similarity
expand_grid(s = seq(0, 1, length.out = 5), precision = seq(0, 10, length.out = 101), n_dim = 3, prior_odds = 3) %>%
mutate(ll = if_else(precision > 0, precision * s + (n_dim / 2 - 1) * log(precision / 2) - log(besselI(precision, n_dim / 2 - 1)) - lgamma(n_dim / 2), 0)) %>%
mutate(post = exp(ll) / (exp(ll) + prior_odds)) %>%
ggplot(aes(x = precision, y = post, color = s, group = s)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = 0.5, low = "#a50026", mid = "#fee090", high = "#313695") +
coord_cartesian(ylim = c(0, 1)) +
labs(x = expression("Precision" ~ kappa), y = expression("Posterior probability of match" ~ pi[i]), color = expression("Similarity" ~ s[i])) +
plot_theme
Note that this posterior probability is computed for each item separately; it is not normalized across the set of choice items! Representing the evidence as a posterior probability rather than a log-likelihood ratio is what matters for calculating the expected value of choosing any particular item: \[ \text{ExpectedValue} \left( \text{Choose item } i \right) = \pi_i \text{Value}_{\text{Match}} + \left(1 - \pi_i \right) \text{Value}_{\text{Nonmatch}} \] For example, if the participant receives $1 for selecting a matching choice item (\(\text{Value}_{\text{Match}} = 1\)) and nothing for selecting a non-matching choice item (\(\text{Value}_{\text{Nonmatch}} = 0\)), then the expected value of choosing an item is exactly equal to its posterior probability of being a match. Thus, a participant attempting to maximize the expected value of their choice would need to rely on the posterior probability that any given item is a match.
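As a quick check of that last point, here is the calculation for some arbitrary, illustrative log-likelihood ratios with the $1 / $0 payoffs described above:
# Illustrative only: arbitrary log-likelihood ratios for three hypothetical choice items
lambda <- c(-1, 0, 2)
prior_odds <- 3
post <- exp(lambda) / (exp(lambda) + prior_odds)
value_match <- 1
value_nonmatch <- 0
ev <- post * value_match + (1 - post) * value_nonmatch
all.equal(ev, post) # TRUE: with these payoffs, expected value equals the posterior probability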
To return to the choice task, we can specify that the goal of the participant is to choose an item with the greatest expected value. The expected value of an item is a function of the posterior probability that the item is a “match” to a target item. This posterior probability is, in turn, a function of how precisely the items are encoded. We will assume that encoding precision can be modulated dynamically over the course of a trial, such that precision \(\kappa\) is a function of time.
However, in any realistic situation, even if it were physically possible, it would be inadvisable to let precision increase over time indefinitely. As encoding precision increases, any deviation from perfect similarity between the choice and target items will come to be seen as evidence against a match. In any real system, there would be noise in how the choice items and/or the target item are represented. For example, if the target item were held in memory, this memory representation would undoubtedly fail to perfectly match a choice item that was otherwise identical to the target. Even if both the target and choice items were presented simultaneously, perceptual noise would play a role. Thus, increasing encoding precision would, at some point, reveal even minor imperfections in how the items were represented, resulting in all items being treated as non-matches and losing expected value.
Therefore, I suggest that the goal of encoding in this kind of task is not to achieve perfect verisimilitude, but instead to maximize the variability in expected value between choice items. A participant should encode items precisely enough to distinguish between high-value and low-value items, but not so precisely that all items begin to look like low-value items.
One way to quantify the variability in expected value among choice items is to compute the variance of the expected values across items. The example below assumes that there are three “non-matching” choice options with different degrees of partial match, \(s_1 = 0\), \(s_2 = 0.1\), and \(s_3 = 0.2\), while the “matching” choice option has \(s_4 = 0.9\), which is high but not perfect. For simplicity, I assume \(\text{Value}_{\text{Match}} = 1\) and \(\text{Value}_{\text{Nonmatch}} = 0\) so that the expected value of each item is directly equal to its posterior probability of being a match to the target. As above, I assume the prior odds against a match are \(\omega_0 = 3\) and the dimensionality is \(p = 3\).
# Expected values of the four choice items (three partial-match foils, one imperfect match) across precision
plotDF <- expand_grid(s = c(0, 0.1, 0.2, 0.9), precision = seq(0, 30, length.out = 101), n_dim = 3, prior_odds = 3) %>%
mutate(ll = if_else(precision > 0, precision * s + (n_dim / 2 - 1) * log(precision / 2) - log(besselI(precision, n_dim / 2 - 1)) - lgamma(n_dim / 2), 0)) %>%
mutate(post = exp(ll) / (exp(ll) + prior_odds))
ev_plot <- plotDF %>%
ggplot(aes(x = precision, y = post, color = s, group = s)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = 0.5, low = "#a50026", mid = "#fee090", high = "#313695") +
coord_cartesian(ylim = c(0, 1)) +
labs(x = expression("Precision" ~ kappa), y = "Expected value", color = expression("Similarity" ~ s[i])) +
plot_theme
var_plot <- plotDF %>%
group_by(precision) %>%
summarize(ev_var = var(post)) %>%
ggplot(aes(x = precision, y = ev_var)) +
geom_line() +
labs(x = expression("Precision" ~ kappa), y = "Variance of expected values") +
plot_theme
ev_plot + var_plot + plot_layout(nrow = 1, guides = "collect")
As the graph shows, because the “best” match is not perfect, its expected value starts to decrease when precision gets high enough. This causes the variance of the expected values to decrease as well. If the goal is to distinguish between high- and low-value items, then one should try to achieve the level of precision that maximizes the variance of the expected values across items and then stop increasing precision.
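Given the similarities assumed above, this variance-maximizing level of precision can be read off the grid computed for the plot. This is a post-hoc calculation for illustration, not something the model itself is assumed to perform:
# Find the precision that maximizes the variance of expected values in the example above
plotDF %>%
group_by(precision) %>%
summarize(ev_var = var(post)) %>%
slice_max(ev_var, n = 1)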
Unfortunately, it would not in general be possible for a system to know ahead of time what the optimal level of precision would be. After all, the whole reason that a participant forms a representation of the items is because they do not know the expected value of selecting those items! Therefore, I suggest the following dynamical principle for governing the level of encoding precision over time: Continue to increase the precision of encoding only to the extent that doing so increases the variance of the expected values of the choice items. In other words, the value of increasing precision is directly proportional to one’s expectation that doing so will allow one to better distinguish between the values of the choice items.
Mathematically, we can write this as a set of differential equations: \[ \begin{aligned} \frac{d \mathbb{V}\left( EV_i (t) \right)}{dt} & = \lim_{\Delta t \rightarrow 0} \frac{\mathbb{V}\left( EV_i (t) \right) - \mathbb{V} \left( EV_i (t - \Delta t) \right)}{\Delta t} \\ \frac{d \kappa}{dt} & = \alpha + \beta \frac{d \mathbb{V}\left( EV_i (t) \right)}{dt} \end{aligned} \] where \(\mathbb{V}\left( EV_i ( \cdot ) \right)\) is the variance of the expected values of the choice items at a given time, \(\alpha\) is the base rate at which precision increases over time, and \(\beta\) is the extent to which that rate depends on the change in the variance of expected values, \(\frac{d \mathbb{V}\left( EV_i (t) \right)}{dt}\). The equations above specify a system that will, in general, tend to encode items with greater precision over time, modulated by the effect this has on the variance of the expected values of the choice items. Early on, as increased precision helps to distinguish between low-value and high-value items, encoding precision will increase over time (and even accelerate). Later, as increased precision causes the high-value items to look worse and the variance of the expected values to decrease, encoding will decelerate, eventually settling back to its base rate \(\alpha\) once the variance stops changing (and coming to a stop only if \(\alpha = 0\)).
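To make the update rule concrete, here is a minimal forward-Euler sketch of these equations, using one of the similarity sets and parameter values from the simulations described below; the actual simulations instead integrate the system with deSolve’s ode().
# Forward-Euler sketch of the precision dynamics: delta kappa = (alpha + beta * dV/dt) * dt
euler_precision <- function(s, alpha = 1, beta = 500, prior_odds = 3, p = 3, dt = 0.001, t_max = 2.5) {
n_steps <- floor(t_max / dt)
kappa <- numeric(n_steps + 1)
v_prev <- 0
for (i in seq_len(n_steps)) {
k <- kappa[i]
ll <- if (k > 0) k * s + (p / 2 - 1) * log(k / 2) - log(besselI(k, p / 2 - 1)) - lgamma(p / 2) else rep(0, length(s))
v <- var(exp(ll) / (exp(ll) + prior_odds)) # variance of expected values at the current precision
kappa[i + 1] <- kappa[i] + alpha * dt + beta * (v - v_prev)
v_prev <- v
}
kappa
}
tail(euler_precision(s = c(0, 0.05, 0.1, 0.9)), 1) # precision reached by the end of the trial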
This behavior is illustrated in the graphs below, which simulate situations in which there is one “matching” choice item and three “non-matching” or “foil” choice items. I vary the similarity \(s_4\) of the “matching” item between 0.55 and 0.95. In addition, I consider two sets of “non-matching” items: in one set, the maximum foil similarity is 0.1 and the foils have similarities \(s_1 = 0, s_2 = 0.05, s_3 = 0.1\); in the other set, the maximum foil similarity is 0.5 and the foils have similarities \(s_1 = 0, s_2 = 0.25, s_3 = 0.5\). The number of dimensions in the vector representations is \(p = 3\) and the prior odds are \(\omega_0 = 3\). I assume a value of \(1\) for choosing the “matching” item and a value of \(0\) for choosing a “non-match”, such that the expected value of an item is equal to its posterior probability of being a match to the target item. Finally, I set \(\alpha = 1\) and \(\beta = 500\).
encoding_dynamics <- function(t, y, pars) {
# State y: precision (kappa) and ev_variance (most recently computed variance of expected values)
# Parameters pars: n_dim, prior_odds, precision_base (alpha), precision_scale (beta);
# the vector of similarities s is taken from the enclosing environment
with(as.list(c(y, pars)), {
if (precision > 0) {
ll <- precision * s + (n_dim / 2 - 1) * log(precision / 2) - log(besselI(precision, n_dim / 2 - 1)) - lgamma(n_dim / 2)
} else {
ll <- rep(0, length(s))
}
ev <- exp(ll) / (exp(ll) + prior_odds)
ev_variance_new <- var(ev)
# Difference between the current variance of expected values and the stored state,
# used as a proxy for the change in that variance over time
d_ev_variance <- ev_variance_new - ev_variance
d_precision <- precision_base + precision_scale * d_ev_variance
return(list(c(d_precision, d_ev_variance)))
})
}
pars <- c(
n_dim = 3,
prior_odds = 3 / 1,
precision_base = 1,
precision_scale = 500
)
y_init <- c(
precision = 0,
ev_variance = 0
)
t <- seq(0, 2.5, by = 0.001)
max_target_s_vals <- c(0.55, 0.75, 0.9, 0.95)
max_foil_s_vals <- c(0.1, 0.5)
n_foil <- 3
encodingDynDF <- c()
for (max_foil_s in max_foil_s_vals) {
for (max_s in max_target_s_vals) {
s <- c(seq(0, max_foil_s, length.out = n_foil), max_s)
prec_dyn <- ode(y_init, t, encoding_dynamics, pars, method = 'bdf')
encodingDynDF <- rbind(
encodingDynDF,
expand_grid(nesting(t = prec_dyn[,'time'], precision = prec_dyn[,'precision'], ev_variance = prec_dyn[,'ev_variance']), nesting(item = 1:length(s), s = s)) %>%
mutate(ll = if_else(precision > 0, precision * s + (pars["n_dim"] / 2 - 1) * log(precision / 2) - log(besselI(precision, pars["n_dim"] / 2 - 1)) - lgamma(pars["n_dim"] / 2), 0)) %>%
mutate(ev = exp(ll) / (exp(ll) + pars["prior_odds"])) %>%
mutate(max_target_s_factor = paste0("Target similarity = ", max_s), max_foil_s_factor = paste0("Max. foil similarity = ", max_foil_s))
)
}
}
post_plot <- encodingDynDF %>%
ggplot(aes(x = t, y = ev, color = s, group = interaction(item, max_foil_s_factor), linewidth = (item == max(item)), linetype = max_foil_s_factor)) +
geom_line() +
scale_linewidth_manual(values = c("TRUE" = 1.5, "FALSE" = 0.75), guide = "none") +
scale_linetype_discrete(guide = "none") +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695") +
coord_cartesian(ylim = c(0, 1)) +
facet_wrap("max_target_s_factor", nrow = 1) +
labs(x = "Time", y = "Posterior probability choice item is match", color = "Similarity between choice\nitem and test item", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank(), legend.box = "horizontal")
prec_plot <- encodingDynDF %>%
group_by(t, max_target_s_factor, max_foil_s_factor) %>%
summarize(precision = first(precision), ev_variance = first(ev_variance), max_s = max(s), .groups = "keep") %>%
ggplot(aes(x = t, y = precision, color = max_s, group = interaction(max_s, max_foil_s_factor), linetype = max_foil_s_factor)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695") +
labs(x = "Time", y = "Encoding precision", color = "Target similarity", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank(), legend.box = "horizontal")
var_plot <- encodingDynDF %>%
group_by(t, max_target_s_factor, max_foil_s_factor) %>%
summarize(precision = first(precision), ev_variance = first(ev_variance), max_s = max(s), .groups = "keep") %>%
ggplot(aes(x = t, y = ev_variance, color = max_s, group = interaction(max_s, max_foil_s_factor), linetype = max_foil_s_factor)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695", guide = "none") +
scale_linetype_discrete(guide = "none") +
labs(x = "Time", y = "Variance of posteriors across items", color = "Target similarity", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank())
post_plot + prec_plot + var_plot + plot_layout(design = "11\n23", guides = "keep")
There are some noteworthy features of the resulting dynamics, which become clearer when set against a version of the model in which precision is not modulated by the variance of the expected values.
For comparison, the graphs below show the encoding dynamics when \(\beta = 0\), so that precision increases linearly with time and is not modulated by how it affects the variance of the expected values. I also set \(\alpha = 15\); all other parameters are the same as in the graphs above.
encoding_dynamics <- function(t, y, pars) {
# State y: precision (kappa) and ev_variance (most recently computed variance of expected values)
# Parameters pars: n_dim, prior_odds, precision_base (alpha), precision_scale (beta);
# the vector of similarities s is taken from the enclosing environment
with(as.list(c(y, pars)), {
if (precision > 0) {
ll <- precision * s + (n_dim / 2 - 1) * log(precision / 2) - log(besselI(precision, n_dim / 2 - 1)) - lgamma(n_dim / 2)
} else {
ll <- rep(0, length(s))
}
ev <- exp(ll) / (exp(ll) + prior_odds)
ev_variance_new <- var(ev)
# Difference between the current variance of expected values and the stored state,
# used as a proxy for the change in that variance over time
d_ev_variance <- ev_variance_new - ev_variance
d_precision <- precision_base + precision_scale * d_ev_variance
return(list(c(d_precision, d_ev_variance)))
})
}
pars <- c(
n_dim = 3,
prior_odds = 3 / 1,
precision_base = 15,
precision_scale = 0
)
y_init <- c(
precision = 0,
ev_variance = 0
)
t <- seq(0, 2.5, by = 0.001)
max_target_s_vals <- c(0.55, 0.75, 0.9, 0.95)
max_foil_s_vals <- c(0.1, 0.5)
n_foil <- 3
encodingDynDF <- c()
for (max_foil_s in max_foil_s_vals) {
for (max_s in max_target_s_vals) {
s <- c(seq(0, max_foil_s, length.out = n_foil), max_s)
prec_dyn <- ode(y_init, t, encoding_dynamics, pars, method = 'bdf')
encodingDynDF <- rbind(
encodingDynDF,
expand_grid(nesting(t = prec_dyn[,'time'], precision = prec_dyn[,'precision'], ev_variance = prec_dyn[,'ev_variance']), nesting(item = 1:length(s), s = s)) %>%
mutate(ll = if_else(precision > 0, precision * s + (pars["n_dim"] / 2 - 1) * log(precision / 2) - log(besselI(precision, pars["n_dim"] / 2 - 1)) - lgamma(pars["n_dim"] / 2), 0)) %>%
mutate(ev = exp(ll) / (exp(ll) + pars["prior_odds"])) %>%
mutate(max_target_s_factor = paste0("Target similarity = ", max_s), max_foil_s_factor = paste0("Max. foil similarity = ", max_foil_s))
)
}
}
post_plot <- encodingDynDF %>%
ggplot(aes(x = t, y = ev, color = s, group = interaction(item, max_foil_s_factor), linewidth = (item == max(item)), linetype = max_foil_s_factor)) +
geom_line() +
scale_linewidth_manual(values = c("TRUE" = 1.5, "FALSE" = 0.75), guide = "none") +
scale_linetype_discrete(guide = "none") +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695") +
coord_cartesian(ylim = c(0, 1)) +
facet_wrap("max_target_s_factor", nrow = 1) +
labs(x = "Time", y = "Posterior probability choice item is match", color = "Similarity between choice\nitem and test item", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank(), legend.box = "horizontal")
prec_plot <- encodingDynDF %>%
group_by(t, max_target_s_factor, max_foil_s_factor) %>%
summarize(precision = first(precision), ev_variance = first(ev_variance), max_s = max(s), .groups = "keep") %>%
ggplot(aes(x = t, y = precision, color = max_s, group = interaction(max_s, max_foil_s_factor), linetype = max_foil_s_factor)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695") +
labs(x = "Time", y = "Encoding precision", color = "Target similarity", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank(), legend.box = "horizontal")
var_plot <- encodingDynDF %>%
group_by(t, max_target_s_factor, max_foil_s_factor) %>%
summarize(precision = first(precision), ev_variance = first(ev_variance), max_s = max(s), .groups = "keep") %>%
ggplot(aes(x = t, y = ev_variance, color = max_s, group = interaction(max_s, max_foil_s_factor), linetype = max_foil_s_factor)) +
geom_line() +
scale_color_gradient2(limits = c(0, 1), midpoint = max_foil_s, low = "#a50026", mid = "#fee090", high = "#313695", guide = "none") +
scale_linetype_discrete(guide = "none") +
labs(x = "Time", y = "Variance of posteriors across items", color = "Target similarity", linetype = NULL) +
plot_theme +
theme(legend.position = "inside", legend.position.inside = c(0, 1), legend.justification = c(0, 1), legend.background = element_blank())
post_plot + prec_plot + var_plot + plot_layout(design = "11\n23", guides = "keep")
The graphs above confirm that, when encoding is not modulated by the variance in expected values, all items eventually lose value as precision continues to increase, since none of them is a perfect match to the target.
I used the Cox (2024) “cosine-to-likelihood” approach for quantifying the degree of match, but I think many different approaches would work just fine. For example, instead of treating precision as a continuous quantity, one might treat it as the number of features of an item that have been encoded, as in REM or DREAM. However encoding precision is operationalized, the present approach should apply so long as the evidence for a match, and hence the expected value of each choice item, depends on how precisely the items are encoded.
I’m also not convinced that the variance of the expected values is necessarily the best quantity to optimize. I picked it because it was easy to compute, but perhaps there are other considerations that could justify a different choice. On the other hand, maybe anything that is monotonically related to the variance would work just as well?
I also think the prior odds \(\omega_0\) could be modulated dynamically. For example, in a visual search task, the prior odds would probably relate to the number of objects in the display, but this information would not be available instantly; it would have to be perceived over time (even if this time course were very brief).