rrrsa is an R package for running RSA, a Bayesian model of pragmatic inference. rrrsa was created by Ben Peloquin in collaboration with Michael C. Frank.
Rational speech act (RSA) models frame language understanding as a special case of social cognition in which speakers and listeners reason about one another recursively. A pragmatic listener \(P_{L_n}(m|u)\) reasons about the intended meaning \(m\) of an utterance \(u\) produced by a rational speaker \(P_{S_n}(u|m)\), who chooses utterances according to their expected utility \(U(m;u)\); \(\alpha\) is a decision noise parameter.
\[P_{L_n}(m|u) \propto P_{S_n}(u|m)P(m)\] \[P_{S_n}(u|m) \propto e^{U(m;u)}\] \[U(m;u) = -\alpha(-\log P_{L_{n-1}}(m|u) - C(u))\]
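Substituting the utility into the speaker term makes the roles of the two parameters explicit: \(\alpha\) acts as an exponent on the previous listener's probability, while cost enters as a multiplicative factor.
\[P_{S_n}(u|m) \propto e^{-\alpha(-\log P_{L_{n-1}}(m|u) - C(u))} = P_{L_{n-1}}(m|u)^{\alpha}\, e^{\alpha C(u)}\]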
rrrsa provides users with access to all model components. The following sections demonstrate how this functionality can be used.
library(rrrsa)
library(ggplot2)
library(dplyr)
library(tidyr)
rsa.informativity() takes three arguments: a literal semantic value \(P_{L_0}\), an alpha level (default 1), and a cost (default 0). It returns the exponentiated utility \(e^{U(m;u)}\): the surprisal of the utterance minus its cost, multiplied by \(-\alpha\) and then exponentiated.
rsa.informativity(0.4)
## [1] 0.4
rsa.informativity(rsa.informativity(0.4), alpha = 2, cost = 0.5)
## [1] 0.4349251
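These values can be reproduced directly from the utility formula above, which makes a useful sanity check (a minimal sketch in base R):
exp(-1 * (-log(0.4) - 0))      # 0.4, matches rsa.informativity(0.4)
exp(-2 * (-log(0.4) - 0.5))    # ~0.4349, matches the second call above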
rsa.utility takes an input vector of literal listener semantics and outputs a normalized vector of speaker likelihoods. If costs are not specified, a vector of zeros is used by default. If alpha is not specified, a default value of \(1\) is used.
literalSemantics <- c(0.0, 0.0, 0.3, 0.3, 0.4)
costs <- c(0.0, 0.0, 0.2, 0.3, 0.4)
rsa.utility(items = literalSemantics, costs = costs, alpha = 3)
## [1] 0.0000000 0.0000000 0.1499485 0.2024093 0.6476421
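The same vector can be reproduced by applying rsa.informativity item-wise and renormalizing, which clarifies the relationship between the two functions (a sketch using only the argument names shown above; it should reproduce the output printed here):
## exponentiated utilities per item, then normalized over the scale
utilities <- mapply(function(p, cost_i) rsa.informativity(p, alpha = 3, cost = cost_i),
                    literalSemantics, costs)
utilities / sum(utilities)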
In the RSA framework one full recursion consists of a pragmatic listener \(P_{L_1}\) who reasons about a rational speaker \(P_{S_1}\), who in turn reasons about a literal listener \(P_{L_0}\). Expected input is a matrix m of \(P_{L_0}\) literal listener values in which columns correspond to items (words) and rows correspond to semantic quantities (stars in Peloquin & Frank, under review). Optional arguments include a costs vector, which should be the same length as ncol(m), and a priors vector, which should be the same length as nrow(m); rsa.fullRecursion provides safety checking for these cases. Output corresponds to the pragmatic listener's posterior predictions.
m <- matrix(data = c(1.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.25, 0.25, 0.25, 0.25,
0.0, 0.0, 0.0, 0.0, 1.0), nrow = 5)
colnames(m) <- c("none", "some", "all")
rownames(m) <- 1:5
# costs <- c("none" = 0, "some" = 0, "all" = 0)
# priors <- rnorm(n = nrow(m), mean = 0.5, sd = 0.1)
res <- rsa.fullRecursion(m = m)
res <- as.data.frame(res) %>%
mutate(quantity = rownames(.))
pragmaticsTidied <- res %>%
gather(word, pragmatics, -quantity)
semanticsTidied <- as.data.frame(m) %>%
mutate(quantity = rownames(.)) %>%
gather(word, semantics, c(none, some, all))
fullData <- merge(pragmaticsTidied, semanticsTidied) %>%
gather(type, value, c(pragmatics, semantics)) %>%
mutate(quantity = as.numeric(quantity),
lineSize = ifelse(word == "some", 2, 1))
ggplot(fullData, aes(x = quantity, y = value, col = word)) +
geom_line(aes(size=lineSize)) +
facet_wrap(~type)
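To see what one full recursion does numerically, consider the "some" column: at five stars the speaker renormalizes the literal row (0, 0.25, 1) to (0, 0.2, 0.8), and the pragmatic listener then renormalizes the resulting "some" column over the five quantities (uniform prior, \(\alpha = 1\), zero costs). A quick check by hand:
speakerSome <- c(0, 1, 1, 1, 0.2)  # P_S1("some" | 1..5 stars) after row-wise normalization
speakerSome / sum(speakerSome)     # 0.0000 0.3125 0.3125 0.3125 0.0625, should match res$some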
rsa.reason is a wrapper around rsa.fullRecursion that adds a depth parameter specifying the recursion depth used during reasoning. If depth is not provided, the default value is \(1\).
all(rsa.reason(m = m, depth = 2) == rsa.fullRecursion(rsa.fullRecursion(m = m)))
## [1] TRUE
rsa.reason(m = m, depth = 2)
## none some all
## 1 1 0.00000000 0
## 2 0 0.32692308 0
## 3 0 0.32692308 0
## 4 0 0.32692308 0
## 5 0 0.01923077 1
Run RSA on a tidied data frame, and avoid calling individual model components one at a time, with rsa.runDf. An RSA-ready, tidied data frame must contain columns for semantic quantity, item, and semantics, where each row corresponds to a unique item/quantity combination. Users specify their naming conventions for these columns through the quantityVarName, itemVarName, and semanticsVarName arguments. The costsVarName and priorsVarName arguments point to optional costs and/or priors data, and users can specify values for the alpha and depth hyperparameters. rsa.runDf returns the data frame with a new column of model predictions, preds, appended.
## Hypothetical literal listener data we might want to compute RSA on.
df <- data.frame(scales = rep("some_all", 15),
stars = as.factor(rep(1:5, 3)),
starsChar = as.factor(rep(c("one", "two", "three", "four", "five"), 3)),
words = c(rep("all", 5), rep("some", 5), rep("none", 5)),
listenerSemantics = c(rep(0.0, 4), 1.0,
0.0, rep(0.25, 4),
1.0, rep(0.0, 4)))
rsa.runDf(df, quantityVarName = "stars", semanticsVarName = "listenerSemantics", itemVarName = "words")
## scales stars starsChar words listenerSemantics preds
## 1 some_all 1 one all 0.00 0.0000
## 2 some_all 2 two all 0.00 0.0000
## 3 some_all 3 three all 0.00 0.0000
## 4 some_all 4 four all 0.00 0.0000
## 5 some_all 5 five all 1.00 1.0000
## 6 some_all 1 one some 0.00 0.0000
## 7 some_all 2 two some 0.25 0.3125
## 8 some_all 3 three some 0.25 0.3125
## 9 some_all 4 four some 0.25 0.3125
## 10 some_all 5 five some 0.25 0.0625
## 11 some_all 1 one none 1.00 1.0000
## 12 some_all 2 two none 0.00 0.0000
## 13 some_all 3 three none 0.00 0.0000
## 14 some_all 4 four none 0.00 0.0000
## 15 some_all 5 five none 0.00 0.0000
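The alpha and depth hyperparameters can be set in the same call; for instance, a purely illustrative run with a stronger speaker and two levels of recursion (these particular values are not from the paper):
rsa.runDf(df, quantityVarName = "stars", semanticsVarName = "listenerSemantics",
          itemVarName = "words", alpha = 2, depth = 2)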
Importantly, runDf maintains all column naming and can handle multiple data types. For example, we can run runDf using the starsChar column for quantity (word labels, in contrast with the numeric labels of the stars column used above):
all(rsa.runDf(df, quantityVarName = "starsChar",
semanticsVarName = "listenerSemantics",
itemVarName = "words") ==
rsa.runDf(df, quantityVarName = "stars",
semanticsVarName = "listenerSemantics",
itemVarName = "words"))
## [1] TRUE
A frequent use case requires running RSA over multiple groups of data. Rather than subsetting the data frame and running RSA on each piece, we recommend using ddply from the plyr package. Users can supply the grouping variable in ddply's .variables argument.
df <- data.frame(scales = c(rep("some_all", 10), rep("good_excellent", 10)),
stars = as.factor(rep(1:5, 4)),
words = c(rep("all", 5), rep("some", 5), c(rep("excellent", 5), rep("good", 5))),
listenerSemantics = c(rep(0.0, 4), 1.0,
0.0, rep(0.25, 4),
rep(0.0, 4), 1.0,
0.0, rep(0.25, 4))) %>% mutate(priors = 0.20)
df$costs <- c(rep(3, 5), rep(4, 5), rep(9, 5), rep(4, 5))
tail(plyr::ddply(.data = df, .variables = c("scales"), rsa.runDf, quantityVarName = "stars",
semanticsVarName = "listenerSemantics", itemVarName = "words", costsVarName = "costs", depth = 2), n = 3)
## scales stars words listenerSemantics priors costs preds
## 18 some_all 3 some 0.25 0.2 4 0.30824544
## 19 some_all 4 some 0.25 0.2 4 0.30824544
## 20 some_all 5 some 0.25 0.2 4 0.07526367
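The priors column built into df above is not used in the call. Assuming the argument is spelled priorsVarName, as described earlier, it could be supplied the same way (a hedged sketch, not run here; with a uniform 0.20 prior the predictions should be unchanged):
## hypothetical call supplying the priors column via priorsVarName
plyr::ddply(.data = df, .variables = c("scales"), rsa.runDf, quantityVarName = "stars",
            semanticsVarName = "listenerSemantics", itemVarName = "words",
            costsVarName = "costs", priorsVarName = "priors", depth = 2)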
We can see this more clearly using data from Peloquin & Frank (under review). Here the grouping variable, scale, identifies five different scalar families.
d <- peloquinFrank_2Alts
head(plyr::ddply(.data = d, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
semanticsVarName = "speaker.p", itemVarName = "words"), n = 3)
## exp scale stars speaker.p words e11 e6 preds
## 1 e8 good_excellent 1 0.00000000 excellent 0 0 0.0000000
## 2 e8 good_excellent 2 0.03448276 excellent 0 0 0.2998664
## 3 e8 good_excellent 3 0.06896552 excellent 0 0 0.1203004
tail(plyr::ddply(.data = d, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
semanticsVarName = "speaker.p", itemVarName = "words"), n = 3)
## exp scale stars speaker.p words e11 e6 preds
## 48 e8 some_all 3 0.2325581 some 0.83720930 0.8536585 0.3610177
## 49 e8 some_all 4 0.3023256 some 0.04651163 0.0000000 0.1611409
## 50 e8 some_all 5 0.2790698 some 0.00000000 0.0000000 0.1168237
If a user has pragmatic judgment data, the rsa.tuneDepthAlpha function can be used to tune hyperparameters. Calling this function requires much the same argument list as a simple rsa.runDf (plus a groupName for the grouping variable), with the addition of alphas and depths vectors to iterate over and a column of pragmatic judgments, specified through the compareDataName argument, that serves as the basis of comparison.
d <- peloquinFrank_5Alts
alphas <- seq(1, 5, by = 0.1)
depths <- 1:3
Users may also want to tune against only a subset of the items in their data frame. In the case of Peloquin & Frank (under review) we only wanted to tune against entailment pairs. The compareIndices argument allows the user to pass a vector of row indices for subsetting.
checkWords <- c("some", "all", "good", "excellent", "liked", "loved", "memorable", "unforgettable",
"palatable", "delicious")
compareIndices <- which(peloquinFrank_5Alts$words %in% checkWords)
results <- rsa.tuneDepthAlpha(data = d, groupName = "scale",
quantityVarName = "stars", itemVarName = "words",
semanticsVarName = "speaker.p", compareDataName = "e11",
compareIndices = compareIndices, alphas = alphas, depths = depths)
head(results, n = 5)
## cor depth alpha
## 107 0.9133654 1 4.5
## 104 0.9133501 1 4.4
## 110 0.9133053 1 4.6
## 101 0.9132532 1 4.3
## 113 0.9131754 1 4.7
rsa.tuneDepthAlpha returns a data frame with cor, depth, and alpha columns, which can be used for plotting; alternatively, users can simply sort it to find the hyperparameter combination leading to the best model fit. Let’s take a look at the tuning.
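The single best-fitting combination can also be pulled out directly before plotting (here depth = 1 with alpha = 4.5):
results[which.max(results$cor), ]  # row with the highest correlation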
ggplot(results, aes(x = alpha, y = cor, col = as.factor(depth))) +
geom_point(size=3, alpha=0.5) +
geom_vline(xintercept = results$alpha[1])
Let’s take a look at how RSA performs with the best model hyperparameters: correlation = 0.913, alpha = 4.5, depth = 1.
bestModel <- plyr::ddply(.data = peloquinFrank_5Alts, .variables = c("scale"), rsa.runDf, quantityVarName = "stars",
semanticsVarName = "speaker.p", itemVarName = "words", alpha = 4.5) %>%
filter(words %in% checkWords)
ggplot(bestModel, aes(x = stars, y = e11, col=words)) +
geom_point(size=5, alpha=0.5) +
facet_wrap(~scale) +
geom_line(aes(y = preds), size=1, lty=4)
rrrsa includes empirical literal listener \(P_{L_0}\) data, which can be used as input to rrrsa, as well as pragmatic listener \(P_{L_1}\) judgments for model tuning and comparison. Four data sets are included:
peloquinFrank_2Alts: data set with entailment alternatives
peloquinFrank_3Alts: data set with entailment alternatives + universal none
peloquinFrank_4Alts: data set with entailment alternatives + top two empirically derived alternatives
peloquinFrank_5Alts: data set with entailment alternatives + top two empirically derived alternatives + neutral valence alternative
str(peloquinFrank_2Alts)
## 'data.frame': 50 obs. of 7 variables:
## $ exp : Factor w/ 3 levels "e10","e12","e8": 3 3 3 3 3 3 3 3 3 3 ...
## $ scale : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stars : int 1 2 3 4 5 1 2 3 4 5 ...
## $ speaker.p: num 0 0.0345 0.069 0.4138 0.4828 ...
## $ words : chr "excellent" "excellent" "excellent" "excellent" ...
## $ e11 : num 0 0 0 0.302 0.698 ...
## $ e6 : num 0 0 0 0.0732 0.9268 ...
str(peloquinFrank_3Alts)
## 'data.frame': 75 obs. of 7 variables:
## $ exp : Factor w/ 3 levels "e10","e12","e8": 3 3 3 3 3 3 3 3 3 3 ...
## $ scale : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stars : int 1 2 3 4 5 1 2 3 4 5 ...
## $ speaker.p: num 0 0.0345 0.069 0.4138 0.4828 ...
## $ words : chr "excellent" "excellent" "excellent" "excellent" ...
## $ e11 : num 0 0 0 0.302 0.698 ...
## $ e6 : num 0 0 0 0.0732 0.9268 ...
str(peloquinFrank_4Alts)
## Classes 'tbl_df' and 'data.frame': 100 obs. of 7 variables:
## $ exp : Factor w/ 3 levels "e10","e12","e8": 1 1 1 1 1 1 1 1 1 1 ...
## $ scale : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stars : int 1 2 3 4 5 1 2 3 4 5 ...
## $ speaker.p: num 0.0222 0 0.0667 0.4 0.5111 ...
## $ words : chr "excellent" "excellent" "excellent" "excellent" ...
## $ e11 : num 0 0 0 0.302 0.698 ...
## $ e6 : num 0 0 0 0.0732 0.9268 ...
str(peloquinFrank_5Alts)
## Classes 'tbl_df' and 'data.frame': 125 obs. of 7 variables:
## $ exp : Factor w/ 3 levels "e10","e12","e8": 2 2 2 2 2 2 2 2 2 2 ...
## $ scale : Factor w/ 5 levels "good_excellent",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stars : int 1 2 3 4 5 1 2 3 4 5 ...
## $ speaker.p: num 0.0426 0 0.0213 0.3617 0.5745 ...
## $ words : chr "excellent" "excellent" "excellent" "excellent" ...
## $ e11 : num 0 0 0 0.302 0.698 ...
## $ e6 : num 0 0 0 0.0732 0.9268 ...