The following will demonstrate that we can use an on-line (incremental) parameter approximation technique with equivalent accuracy as an off-line parameter approximation using Hamiltonian Monte Carlo MCMC as implemented in Stan (Carpenter et al., 2016) to uncover a parameter value. To demonstrate this we simulate data based on an known parameter value and used brms
(Bürkner, 2017) and then our on-line grid approximation to estimate the parameter value.
The goal of is to implement real-time Bayesian updating to estimate the parameter value and the probability that a participant is currently struggling with their process process for a range of different behavioural measures obtained from the writing process (pausing, editing, lookbacks in text).
Bayesian methods are ideal in this context because they are conditional on prior information that can be used to evaluate the probability of an observation \(y\) under the posterior distribution of the parameter value \(\lambda\). Because incremental updating, for our purposes, does not require estimating large numbers of parameters, we can use an analytical solution – i.e. grid approximation – which does not require computationally demanding methods and can therefore be used on-line. See e.g. McElreath (2020) Chapter 2.4 and Johnson et al. (2022) Chapter 6.1 for an overview.
Simulate data: lookbacks
We can model whether a participant is frequently re-reading already-produced text. Frequent re-reading is disrupting the writing process. In this example we will simulate the number of lookback into the text for \(N\) increments with each comprising a certain number of key transitions (let’s assume 50 just as example). Less lookbacks is not problematic during (prior to final revision); however more lookbacks has potential knock-on effects on higher level processes.
We will simulate n_increments
observations that come from a distribution \(\text{Poisson}(\lambda)\) where the true parameter value for \(\lambda\)= 4.
# Simulate lookback data
<- 50
keys_per_increment # avg. number of lookbacks per 50 keys
<- 4 # this is the parameter to recover
lambda # number of increments
<- 50
n_increments # Set seed for replicatability
set.seed(365)
# simulate data for n increments
<- tibble(increment = 1:n_increments,
sim_data sum_lookbacks = rpois(n = n_increments,
lambda = lambda))
Brief overview of data. Observations are increments and the total of lookbacks per increment.
Rows: 50
Columns: 2
$ increment <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ sum_lookbacks <int> 6, 2, 2, 1, 4, 4, 4, 7, 5, 5, 2, 2, 6, 4, 7, 6, 6, 9, 5,…
Off-line parameter-value estimation
First we used brms
to obtain a draw posterior samples from a Poisson model of the simulated data.
# Sampling parameters
<- 3
n_cores <- 3
n_chains <- 20000
iterations
# Specify model
<- bf(sum_lookbacks ~ 1, family = poisson())
model
# Run model
<- brm(model,
fit data = sim_data,
chains = n_chains,
cores = n_cores,
iter = iterations,
warmup = iterations/2,
seed = 365)
The off-line model estimated the true parameter value for \(\lambda\) to be \(\hat\lambda=\) 4.19 with a 95% probability interval of [3.66 – 4.79]. In other words, as was to be expected, the off-line model successfully uncovered the parameter value for \(\lambda\).
On-line parameter-value estimation
For the real-time parameter estimation we use grid-approximation to simulate the posterior of a Gamma-Poisson model. Let \(Y\) be the number of lookbacks in each increment where lookbacks occure with an average rate of \(\lambda\) per increment. There are \(1 ...N\) observations. We will start with a relatively extreme prior on \(\lambda\):
\[ Y_i|\lambda \sim \text{Poisson}(\lambda)\\ \lambda \sim \text{Gamma}(10,1) \]
To simulate the posterior using grid approximation, we first need to specify a grid of reasonable values for parameter values of \(\lambda\). We will restrict the parameter space to \(\lambda \in [0.1, 20)\) because 20 lookbacks in 50 keystrokes are very uncommon.
Applying the prior to the grid of \(\lambda\) values renders the prior probability distribution \(P(\lambda)\).
The Poisson likelihood function of \(\lambda\) must be calculated by the product of the individual Poisson probability density functions of each increment. For each incoming observations \(y_i \in Y_1 ... Y_N\) the likelihood is being updated so that
\[ P(y_i \mid \lambda)=\prod_{i=1}^N\text{Poisson}(y_i \mid\lambda) \]
For each grid value, we obtain the probability of observing the given data data \(P(y \mid \lambda)\).
The posterior probability of \(\lambda\) can then be derived using Bayes’ Theorem
\[ P(\lambda \mid y_i) \propto P(\lambda) \cdot P(y_i \mid \lambda) \]
i.e. the product of the the prior and the likelihood. We can then calculate the probability of any given observation under the posterior distribution. The posterior of the current increment \(i\) was then used as the prior for the subsequent increment \(i+1\).
During each increment we can also calculate the probability of the current observation \(y_i\) given the current posterior, namely \(P(y_i \mid \lambda_\text{posterior})\). This would allow us to detect a sudden influx in lookbacks to flag up any number of lookbacks that is extremer than expected, for example when \(P(y_i \mid \lambda_\text{posterior}) < .05\).
# Grid for lambda space
<- seq(.1, 20, length = 100)
lambdas
# Pre-allocation for likelihood
<- tibble(.rows = length(lambdas))
likelihood
# Prior for lambda
<- 10 # <- this one is obviously a very extreme prior.
lambda_prior
# Store posterior and prior
<- list()
all_posteriors <- list()
all_priors
# Time per increment
<- c()
time
# Incrementally update the posterior for every observation.
for(i in 1:n_increments){
# Get new observation
<- sim_data$sum_lookbacks[i]
new_obs
# Start time counter
<- Sys.time()
start_time
# Prior probability distribution over lambda
<- dgamma(lambdas, lambda_prior, 1, log = TRUE)
prior
# Likelihood of current and previous increments
# Sum of the log likelihoods is equivalent to the product of the likelihood
<- rowSums(
likelihood bind_cols(likelihood,
dpois(new_obs, lambdas, log = TRUE)))
# We add prior to likelihood cause both are on the log scale
<- likelihood + prior
posterior
# Normalising (so it sums to unity) requires exponential
<- exp(posterior) / sum(exp(posterior))
posterior
# Find lambda with highest posterior probability
<- lambdas[which.max(posterior)]
lambda_posterior
# Get conditional probability of the observation given the current
# parameter estimate.
<- 1 - pgamma(new_obs, lambda_posterior)
p_obs
# Save to data
$lambda_posterior[i] <- lambda_posterior
sim_data$lambda_prior[i] <- lambda_prior
sim_data$p_obs[i] <- p_obs
sim_data
# New prior for next increment is the current posterior.
<- lambda_posterior
lambda_prior
# Save the posterior
<- posterior
all_posteriors[[i]]
# Save the priors
<- exp(prior) / sum(exp(prior))
all_priors[[i]] <- Sys.time()
end_time
<- c(time, end_time - start_time)
time
}
# Save the latest posterior.
<- all_posteriors[[max(n_increments)]] last_posterior
The average time to process one increment was 52.7 msecs with a SD of 31.4 msecs.
Use the posterior value of the last increment to estimate the posterior distribution.
<- tibble(lambdas = lambdas,
grid_data posterior = last_posterior)
<- sample_n(grid_data,
ps_online size = 10000,
weight = posterior,
replace = TRUE) %>%
tabyl(lambdas)
The posterior parameter estimate for \(\lambda\) obtained by the on-line method was \(\hat\lambda=\) 4.12 in a 95% probability interval of [3.52 – 4.72]. As a reminder, the true parameter value was \(\lambda=\) 4 and the off-line model estimated \(\hat\lambda=\) 4.19 [3.66 – 4.79]. In other words, both the on-line and the off-line approach successfully uncovered the parameter value for \(\lambda\).
The posterior of the on-line parameter estimation is shown in the following figure with the red line indicating the true parameter value.
This figure shows the simulated data compared to the prior (\(\lambda_\text{prior}\)) and the posterior parameter estimate (\(\lambda_\text{posterior}\)). Note that even though we chose a relatively extreme initial prior, the chain quickly stabalised – after \(\sim\) 20 increments – on a parameter value for \(\lambda\) that is close to the true value represented as horizontal red line. The highlighted red points are observations that were unexpected under the posterior distribution of the current increment, namely data for which \(P(\text{y}_i\mid\hat\lambda_\text{posterior})<.05\) was true.
Real data: Comparison for extreme value classification
Extract and bin data from participant with the highest number of lookback events.
# Data of participant with most lookbacks
<- read_csv("data/all_data_incremental.csv") %>%
data filter(task == "A", ppt == "M19-0036") %>%
select(is_lookback)
<- 50
keys_in_increment <- floor(nrow(data)/keys_in_increment) * keys_in_increment
n_bins <- data[1:n_bins,]
data
<- data %>%
data mutate(increment = rep(1:(n_bins/keys_in_increment), each = keys_in_increment)) %>%
group_by(increment) %>%
summarise(sum_lookbacks = sum(is_lookback))
Brief overview of data.
Rows: 94
Columns: 2
$ increment <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ sum_lookbacks <int> 1, 3, 3, 2, 3, 2, 3, 1, 1, 4, 8, 4, 6, 1, 6, 8, 8, 7, 4,…
Parameter value estimates were obtained as above using both an off-line model and an on-line model. THe off-line Poisson model estimated the true parameter value for \(\lambda\) to be \(\hat\lambda=\) 3.1 with a 95% probability interval of [2.76 – 3.47]. The on-line approximation estimated a parameter value of \(\hat\lambda=\) 3.12 with a 95% probability interval of [2.51 – 3.52]. As a reminder, the true parameter value was \(\lambda=\) 4 and the off-line model estimated \(\hat\lambda=\) 3.1 [2.76 – 3.47]. Again, the on-line and the off-line revealed relatively similar parameter estimates.
The following figure shows the observed data compared to the prior (\(\lambda_\text{prior}\)) and the posterior parameter estimate (\(\lambda_\text{posterior}\)). The highlighted red points are observations that were unexpected under the posterior distribution of the current increment, namely data for which \(P(\text{y}_i\mid\hat\lambda_\text{posterior})<.05\) was true.
We are now going to compare how the on-line model compares to the off-line model’s ability to detect extreme values. For the off-line model we use the probability of an observation given the posterior predicted data \(P(y_i\mid \tilde{y}_i)\) to detect extreme values similar to the on-line method. Again, observations with a probability smaller than .05 given the posterior predicted values will be considered extreme.
<- data %>%
data mutate(predicted = predict(fit, newdata = data)[,1],
p_obs_offline = 1 - pgamma(sum_lookbacks, predicted)) %>%
rename(p_obs_online = p_obs) %>%
mutate(across(starts_with("p_"), ~. < .05))
Shown are the observations identified as extreme for the off-line and the on-line model. As can be seen, the off-line model identified two more observations as extreme.
Comparing these classifications based on the on-line and the offline model renders the following confusion matrix.
offline FALSE TRUE Total
FALSE 86 0 86
TRUE 2 6 8
Total 88 6 94
The confusion matrix renders that the on-line model (assuming the off-line model is the gold-standard for extreme value classification) has a sensitivity (precision) of .75, a recall of 1.00, a specificity of 1.00 and a misclassification error of .02. The F1 scores is .86.