Rational speech act model (RSA) of pragmatic inference

rrrsa is an R package for running RSA, a Bayesian model of pragmatic inference. rrrsa was created by Ben Peloquin in collaboration with Michael C. Frank.

What is RSA?

Rational speech act (RSA) models frame language understanding as a special case of social cognition in which speakers and listeners reason about one another recursively. A pragmatic listener \(P_{L_n}(m|u)\), reasons about intended meaning \(m\) of an utterance \(u\) by a rational speaker \(P_{s_n}(u|m)\) who chooses an utterance according to the expected utility of an utterance \(U(m;u)\). \(\alpha\) is a decision noise parameter.

\[P_{L_n}(m|u) \propto P_{S_n}(u|m)P(m)\] \[P_{S_n} \propto e^{U(m;u)}\] \[U(m;u) = -\alpha(-\log(P_{L_{n-1}}(m|u)) - C(u))\]

rrrsa includes access to all model components

rrrsa provides users with access to all model components. The following sections demonstrate how this functionality can be used.

library(rrrsa)
library(ggplot2)
library(dplyr)
library(tidyr)

Calculating the informativity an utterance

rsa.informativity() takes three arguments, literal semantics \(P_{L_0}\), alpha level (default 1), and cost (default 0). This function returns the surprisal of an utterance minus cost, multiplied by alpha.

rsa.informativity(0.4)
## [1] 0.4
rsa.informativity(rsa.informativity(0.4), alpha = 2, cost = 0.5)
## [1] 0.4349251

Calculating the utility of an utterance

rsa.utility takes an input vector of literal listener semantics and outputs a normalized vector of speaker likelihoods. If costs are not specified the default 0’s vector is used. If alpha is not specified a default value of \(1\) is used.

literalSemantics <- c(0.0, 0.0, 0.3, 0.3, 0.4)
costs <- c(0.0, 0.0, 0.2, 0.3, 0.4)
rsa.utility(items = literalSemantics, costs = costs, alpha = 3)
## [1] 0.0000000 0.0000000 0.1499485 0.2024093 0.6476421

Computing one full recursion

In the RSA framework one full recursion consists of a pragmatic listner \(P_{L_1}\) who reasons about a rational speaker \(P_{s_1}\) who reason about a literal listener \(P_{L_0}\). Expected input is an \(m\) matrix of \(P_{L_0}\) literal listener values in which columns corresond to items (words) and rows correspond to semantic quantity (stars in Peloquin & Frank, under review). Optional arguments include a costs vector which whould be the same length as ncol and an optional priors vector which should the same length as nrows. rsa.fullRecursion provides safety checking for these cases. Output corresponds with pragmatic listener posterior predictions.

m <- matrix(data = c(1.0, 0.0, 0.0, 0.0, 0.0,
                     0.0, 0.25, 0.25, 0.25, 0.25,
                     0.0, 0.0, 0.0, 0.0, 1.0), nrow = 5)
colnames(m) <- c("none", "some", "all")
rownames(m) <- 1:5
# costs <- c("none" = 0, "some" = 0, "all" = 0)
# priors <- rnorm(n = nrow(m), mean = 0.5, sd = 0.1)
res <- rsa.fullRecursion(m = m)
res <- as.data.frame(res) %>%
  mutate(quantity = rownames(.))

pragmaticsTidied <- res %>%
  gather(word, pragmatics, -quantity)
semanticsTidied <- as.data.frame(m) %>%
  mutate(quantity = rownames(.)) %>%
  gather(word, semantics, c(none, some, all))
fullData <- merge(pragmaticsTidied, semanticsTidied) %>%
  gather(type, value, c(pragmatics, semantics)) %>%
  mutate(quantity = as.numeric(quantity),
         lineSize = ifelse(word == "some", 2, 1))

ggplot(fullData, aes(x = quantity, y = value, col = word)) +
  geom_line(aes(size=lineSize)) +
  facet_wrap(~type)

Running multiple recursions

rsa.reason is really a wrapper function for rsa.fullRecursion which provides an additional depth parameter which specifies the recursive depth during reasoning. If depth is not provided, default value is \(1\).

all(rsa.reason(m = m, depth = 2) == rsa.fullRecursion(rsa.fullRecursion(m = m)))
## [1] TRUE
rsa.reason(m = m, depth = 2)
##   none       some all
## 1    1 0.00000000   0
## 2    0 0.32692308   0
## 3    0 0.32692308   0
## 4    0 0.32692308   0
## 5    0 0.01923077   1

Running data frames

Run RSA on a tidied data frame and avoid running individual model components individually with rsa.runDf. An RSA-ready, tidied data frame must contian columns for semantic quantity, item and semantics, where each row corresponds with unique item/quantity combination. A user should specify their naming convention for these items in the quantityVarName, itemVarName and semanticsVarName arguments. The costVarName and priorsVarName args correspond with costs and/or priors data. Users can specify values for alpha and depth hyperparamenters. runDf will return a data frame with a new model predictions preds column appended.

## Hypothetical literal listener data we might want to compute RSA on.
df <- data.frame(scales = rep("some_all", 15),
                  stars = as.factor(rep(1:5, 3)),
                  starsChar = as.factor(rep(c("one", "two", "three", "four", "five"), 3)),
                  words = c(rep("all", 5), rep("some", 5), rep("none", 5)),
                  listenerSemantics = c(rep(0.0, 4), 1.0,
                                0.0, rep(0.25, 4),
                                1.0, rep(0.0, 4)))
rsa.runDf(df, quantityVarName = "stars", semanticsVarName = "listenerSemantics", itemVarName = "words")
##      scales stars starsChar words listenerSemantics  preds
## 1  some_all     1       one   all              0.00 0.0000
## 2  some_all     2       two   all              0.00 0.0000
## 3  some_all     3     three   all              0.00 0.0000
## 4  some_all     4      four   all              0.00 0.0000
## 5  some_all     5      five   all              1.00 1.0000
## 6  some_all     1       one  some              0.00 0.0000
## 7  some_all     2       two  some              0.25 0.3125
## 8  some_all     3     three  some              0.25 0.3125
## 9  some_all     4      four  some              0.25 0.3125
## 10 some_all     5      five  some              0.25 0.0625
## 11 some_all     1       one  none              1.00 1.0000
## 12 some_all     2       two  none              0.00 0.0000
## 13 some_all     3     three  none              0.00 0.0000
## 14 some_all     4      four  none              0.00 0.0000
## 15 some_all     5      five  none              0.00 0.0000

Importantly, runDf maintains all column naming and can handle multiple data types. For example, we can run runDf with a character vector for quantity (contrast with the factor vector used above):

all(rsa.runDf(df, quantityVarName = "starsChar",
              semanticsVarName = "listenerSemantics",
            itemVarName = "words") ==
       rsa.runDf(df, quantityVarName = "stars",
            semanticsVarName = "listenerSemantics",
            itemVarName = "words"))
## [1] TRUE

A frequent use case for RSA will require running RSA over multiple groups of data. Rather than subsetting data frames and running RSA iteratively, we recommend using ddply from the plyr package. Users can supply the grouping variable in ddply’s .vars argument

df <- data.frame(scales = c(rep("some_all", 10), rep("good_excellent", 10)),
                  stars = as.factor(rep(1:5, 4)),
                  words = c(rep("all", 5), rep("some", 5), c(rep("excellent", 5), rep("good", 5))),
                  listenerSemantics = c(rep(0.0, 4), 1.0,
                                0.0, rep(0.25, 4),
                                rep(0.0, 4), 1.0,
                                0.0, rep(0.25, 4))) %>% mutate(priors = 0.20)
df$costs <- c(rep(3, 5), rep(4, 5), rep(9, 5), rep(4, 5))

tail(plyr::ddply(.data = df, .variables = c("scales"), rsa.runDf, quantityVarName = "stars",
      semanticsVarName = "listenerSemantics", itemVarName = "words", costsVarName = "costs", depth = 2), n = 3)
##      scales stars words listenerSemantics priors costs      preds
## 18 some_all     3  some              0.25    0.2     4 0.30824544
## 19 some_all     4  some              0.25    0.2     4 0.30824544
## 20 some_all     5  some              0.25    0.2     4 0.07526367

We can see this more clearly using data from Peloquin & Frank (under review). Here the grouping variable represents five different scalar familes (5 in total) named scale.

d <- peloquinFrank_2Alts
head(plyr::ddply(.data = d, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
              semanticsVarName = "speaker.p", itemVarName = "words"), n = 3)
##   exp          scale stars  speaker.p     words e11 e6     preds
## 1  e8 good_excellent     1 0.00000000 excellent   0  0 0.0000000
## 2  e8 good_excellent     2 0.03448276 excellent   0  0 0.2998664
## 3  e8 good_excellent     3 0.06896552 excellent   0  0 0.1203004
tail(plyr::ddply(.data = d, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
              semanticsVarName = "speaker.p", itemVarName = "words"), n = 3)
##    exp    scale stars speaker.p words        e11        e6     preds
## 48  e8 some_all     3 0.2325581  some 0.83720930 0.8536585 0.3610177
## 49  e8 some_all     4 0.3023256  some 0.04651163 0.0000000 0.1611409
## 50  e8 some_all     5 0.2790698  some 0.00000000 0.0000000 0.1168237

Tuning hyperparamaters

If a user has pragmatic judgment data we can use the tuneDepthAlpha function to tune hyperparameters. Calling this function requires the same argument list as a simple runDf with the addition of alphas and depths vectors to iterate over as well as the column of pragmatic judgments specified through the compareDataVarName argument, which we’ll be using as the basis of comparison.

d <- peloquinFrank_5Alts
alphas <- seq(1, 5, by = 0.1)
depths <- 1:3

Users may also want to check only a subset of the items in their data frame for tuning. In the case of Peloquin & Frank (under review) we only wanted to tune against entailment pairs. The compareIndices argument allows the user to pass a vector of items for subsetting.

checkWords <- c("some", "all", "good", "excellent", "liked", "loved", "memorable", "unforgettable",
                "palatable", "delicious")
compareIndices <- which(peloquinFrank_5Alts$words %in% checkWords)
results <- rsa.tuneDepthAlpha(data = d, groupName = "scale", 
                     quantityVarName = "stars", itemVarName = "words",
                     semanticsVarName = "speaker.p", compareDataName = "e11",
                     compareIndices = compareIndices, alphas = alphas, depths = depths)
head(results, n = 5)
##           cor depth alpha
## 107 0.9133654     1   4.5
## 104 0.9133501     1   4.4
## 110 0.9133053     1   4.6
## 101 0.9132532     1   4.3
## 113 0.9131754     1   4.7

rsa.tuneDepthAlpha returns a data frame correlation, depth, alpha columns which an be used for plotting or users can simply sort to find the hyperparamter combination leading to best model fit. Let’s take a look at the tuning.

ggplot(results, aes(x = alpha, y = cor, col = as.factor(depth))) +
  geom_point(size=3, alpha=0.5) +
  geom_vline(aes(xintercept = results$alpha[1]))

Let’s take a look at how RSA performs with best model hyper-parameters correlation = 0.913, alpha = 4.5, depth = 1.

bestModel <- plyr::ddply(.data = peloquinFrank_5Alts, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
              semanticsVarName = "speaker.p", itemVarName = "words", alpha = 4.5) %>%
  filter(words %in% checkWords)

ggplot(bestModel, aes(x = stars, y = e11, col=words)) +
  geom_point(size=5, alpha=0.5) +
  facet_wrap(~scale) +
  geom_line(aes(y = preds), size=1, lty=4)

Data from Peloquin & Frank (under review)

rrrsa includes empirical literal listener \(P_{L_0}\) which can be used as input to rrrsa as well as \(P_{L_1}\) pragmatic judgments for model tuning and comparison. Four data sets are included:

peloquinFrank_2Alts: data set with entailment alternatives

peloquinFrank_3Alts: data set with entailment alternatives + universal none

peloquinFrank_4Alts: data set with entailment alternatives + top two empirically derived alts

peloquinFrank_5Alts: data set with entailment alternatives + top two empirically derived alts + neutral valence alternative

str(peloquinFrank_2Alts)
## 'data.frame':    50 obs. of  7 variables:
##  $ exp      : Factor w/ 3 levels "e10","e12","e8": 3 3 3 3 3 3 3 3 3 3 ...
##  $ scale    : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ stars    : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ speaker.p: num  0 0.0345 0.069 0.4138 0.4828 ...
##  $ words    : chr  "excellent" "excellent" "excellent" "excellent" ...
##  $ e11      : num  0 0 0 0.302 0.698 ...
##  $ e6       : num  0 0 0 0.0732 0.9268 ...
str(peloquinFrank_3Alts)
## 'data.frame':    75 obs. of  7 variables:
##  $ exp      : Factor w/ 3 levels "e10","e12","e8": 3 3 3 3 3 3 3 3 3 3 ...
##  $ scale    : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ stars    : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ speaker.p: num  0 0.0345 0.069 0.4138 0.4828 ...
##  $ words    : chr  "excellent" "excellent" "excellent" "excellent" ...
##  $ e11      : num  0 0 0 0.302 0.698 ...
##  $ e6       : num  0 0 0 0.0732 0.9268 ...
str(peloquinFrank_4Alts)
## Classes 'tbl_df' and 'data.frame':   100 obs. of  7 variables:
##  $ exp      : Factor w/ 3 levels "e10","e12","e8": 1 1 1 1 1 1 1 1 1 1 ...
##  $ scale    : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ stars    : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ speaker.p: num  0.0222 0 0.0667 0.4 0.5111 ...
##  $ words    : chr  "excellent" "excellent" "excellent" "excellent" ...
##  $ e11      : num  0 0 0 0.302 0.698 ...
##  $ e6       : num  0 0 0 0.0732 0.9268 ...
str(peloquinFrank_5Alts)
## Classes 'tbl_df' and 'data.frame':   125 obs. of  7 variables:
##  $ exp      : Factor w/ 3 levels "e10","e12","e8": 2 2 2 2 2 2 2 2 2 2 ...
##  $ scale    : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ stars    : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ speaker.p: num  0.0426 0 0.0213 0.3617 0.5745 ...
##  $ words    : chr  "excellent" "excellent" "excellent" "excellent" ...
##  $ e11      : num  0 0 0 0.302 0.698 ...
##  $ e6       : num  0 0 0 0.0732 0.9268 ...