Visual Vocab Choice Model

Model of word recognition

Let’s assert a model of word recognition for some target and distractor \(t, d \in O\). We want to compute \(p(t | w)\), the probability of the particular target given the word. By Bayes’ rule,

\[ \begin{align} p(t | w) &= \frac{p(w | t)\, p(t)}{\sum_{o \in \{t,d\}}{p(w | o)\, p(o)}}\\ &= \frac{p(w | t)\, p(t)}{p(w | t)\, p(t) + p(w | d)\, p(d)}. \end{align} \tag{1}\]

The terms \(p(t)\) and \(p(d)\) are our priors over particular objects being the target. For now, let’s say that \(p(o) \propto 1\) and ignore them.1
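To make Equation 1 concrete, here’s a minimal Python sketch for the two-alternative case. The function and argument names are hypothetical choices of mine, and the uniform priors follow the assumption above:

```python
def p_target_given_word(p_w_t, p_w_d, prior_t=0.5, prior_d=0.5):
    """Equation 1: posterior probability that the target is the referent,
    given likelihoods p(w | t), p(w | d) and priors over the two objects."""
    numerator = p_w_t * prior_t
    return numerator / (numerator + p_w_d * prior_d)
```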

So how do we define the probability of a particular word being produced for a particular object? Let’s say again by Bayes’ rule that

\[ p(w | o) = \frac{p(o | w)\, p(w)}{\sum_{w' \in W}{p(o | w')\, p(w')}}. \]

That is, it’s the probability of that object given the word, now normalized over the set of other possible words. For the sake of argument, let’s again say that \(p(w) \propto 1\), so all words are equally likely a priori.2 And let’s assume that \(W\) is just two words: \(w_t\) and \(w_d\), the names of the target and distractor.
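In code, this renormalization is a one-liner. A sketch, again with hypothetical names, taking a dictionary that maps each word \(w'\) to \(p(o | w')\) for one fixed object:

```python
def p_word_given_object(p_o_given_word, word):
    """p(w | o) under uniform p(w): renormalize the match probabilities
    p(o | w') over the word set W for a single object o."""
    return p_o_given_word[word] / sum(p_o_given_word.values())
```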

Now substituting, we have:

\[ p(t | w_t) = \frac{\frac{p(t | w_t)}{p(t | w_t) + p(t | w_d)}}{\frac{p(t | w_t)}{p(t | w_t) + p(t | w_d)} + \frac{p(d | w_t)}{p(d | w_t) + p(d | w_d)}}. \]

(Note the notational overload here: the \(p(o | w)\) terms on the right-hand side are the raw word–object match probabilities being renormalized, not the choice probability on the left.)
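Putting the pieces together, here’s a sketch of the full model (hypothetical names again; the four arguments are the raw match probabilities \(p(o | w)\)):

```python
def p_choose_target(p_t_wt, p_t_wd, p_d_wt, p_d_wd):
    """Full model: probability of choosing the target given the prompt w_t.
    Arguments are raw match probabilities, e.g. p_t_wd = p(t | w_d)."""
    p_wt_given_t = p_t_wt / (p_t_wt + p_t_wd)  # p(w_t | t), normalized over words
    p_wt_given_d = p_d_wt / (p_d_wt + p_d_wd)  # p(w_t | d), normalized over words
    return p_wt_given_t / (p_wt_given_t + p_wt_given_d)
```

This is just `p_target_given_word` applied to the outputs of `p_word_given_object` for each object.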

Worked example

Let’s walk through this equation in the context of a simple word recognition trial. Say we have pictures of a DOG and a BABY, and the prompt is “baby.”

First, let’s think about the simpler model in Equation 1, where we just choose based on the probability of saying “baby” about a BABY vs. “baby” about a DOG. That model works well for this scenario, but it doesn’t work for a novel word scenario. For example, imagine a trial with a NOVEL object and a BABY, and the prompt is “blicket” (a novel word). If we just assume a simple match score, the model predicts chance performance, because the participant doesn’t know that “blicket” names either object.

The second model solves this example, however: the participant does know that BABY has a name. Let’s fill in the example, assuming some low probability \(\epsilon\) for every unknown word–object pairing (“blicket” with either object, and “baby” with BLICKET) and \(p(\text{BABY} | \text{"baby"}) = 1\):

\[ \begin{align} p(\text{BLICKET} | \text{"blicket"}) &= \frac{\frac{p(\text{BLICKET} | \text{"blicket"})}{p(\text{BLICKET} | \text{"blicket"}) + p(\text{BLICKET} | \text{"baby"})}}{ \frac{p(\text{BLICKET} | \text{"blicket"})}{p(\text{BLICKET} | \text{"blicket"}) + p(\text{BLICKET} | \text{"baby"})} + \frac{p(\text{BABY} | \text{"blicket"})}{p(\text{BABY} | \text{"blicket"}) + p(\text{BABY} | \text{"baby"})} }\\ &= \frac{\frac{\epsilon}{\epsilon + \epsilon}}{ \frac{\epsilon}{\epsilon + \epsilon} + \frac{\epsilon}{\epsilon + 1} }\\ &= \frac{.5}{ .5 + \frac{\epsilon}{1 + \epsilon} }\\ &\to 1 \text{ as } \epsilon \to 0. \end{align} \]

That’s a lot of annoying math, but the basic idea is that we get a mutual exclusivity effect out of this model: the known name for BABY pushes the novel word onto the novel object. This is critical for modeling a vocabulary task, because participants often won’t know a word, yet we still need to model their choice, and this kind of mutual exclusivity effect lets us do that.
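As a quick numerical sanity check, using the hypothetical functions sketched above with an arbitrary small \(\epsilon\), the simple model is at chance on the novel-word trial while the full model is near ceiling:

```python
eps = 1e-4  # arbitrary small match probability for unknown pairings

# Simple model (Equation 1 with raw match scores): chance performance
simple = p_target_given_word(p_w_t=eps, p_w_d=eps)  # 0.5

# Full model, with p(BABY | "baby") = 1 and eps everywhere else
full = p_choose_target(p_t_wt=eps, p_t_wd=eps, p_d_wt=eps, p_d_wd=1.0)
print(simple, full)  # 0.5  ~0.9998
```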

Footnotes

  1. They actually could be great for modeling salience effects later.↩︎

  2. Again, we could use this feature to model frequency effects.↩︎