Real-time updating for writing-process data – Simulation

Jens Roeser

Compiled Oct 04 2022

The following will demonstrate that we can use an on-line (incremental) parameter approximation technique with equivalent accuracy as an off-line parameter approximation using Hamiltonian Monte Carlo MCMC as implemented in Stan (Carpenter et al., 2016) to uncover a parameter value. To demonstrate this we simulate data based on an known parameter value and used brms (Bürkner, 2017) and then our on-line grid approximation to estimate the parameter value.

The goal of is to implement real-time Bayesian updating to estimate the parameter value and the probability that a participant is currently struggling with their process process for a range of different behavioural measures obtained from the writing process (pausing, editing, lookbacks in text).

Bayesian methods are ideal in this context because they are conditional on prior information that can be used to evaluate the probability of an observation \(y\) under the posterior distribution of the parameter value \(\lambda\). Because incremental updating, for our purposes, does not require estimating large numbers of parameters, we can use an analytical solution – i.e. grid approximation – which does not require computationally demanding methods and can therefore be used on-line. See e.g. McElreath (2020) Chapter 2.4 and Johnson et al. (2022) Chapter 6.1 for an overview.

Simulate data: lookbacks

We can model whether a participant is frequently re-reading already-produced text. Frequent re-reading is disrupting the writing process. In this example we will simulate the number of lookback into the text for \(N\) increments with each comprising a certain number of key transitions (let’s assume 50 just as example). Less lookbacks is not problematic during (prior to final revision); however more lookbacks has potential knock-on effects on higher level processes.

We will simulate n_increments observations that come from a distribution \(\text{Poisson}(\lambda)\) where the true parameter value for \(\lambda\)= 4.

# Simulate lookback data
keys_per_increment <- 50
# avg. number of lookbacks per 50 keys
lambda <- 4 # this is the parameter to recover
# number of increments 
n_increments <- 50
# Set seed for replicatability
set.seed(365)
# simulate data for n increments
sim_data <- tibble(increment = 1:n_increments,
               sum_lookbacks = rpois(n = n_increments, 
                                     lambda = lambda))

Brief overview of data. Observations are increments and the total of lookbacks per increment.

Rows: 50
Columns: 2
$ increment     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ sum_lookbacks <int> 6, 2, 2, 1, 4, 4, 4, 7, 5, 5, 2, 2, 6, 4, 7, 6, 6, 9, 5,…

Off-line parameter-value estimation

First we used brms to obtain a draw posterior samples from a Poisson model of the simulated data.

# Sampling parameters
n_cores <- 3
n_chains <- 3
iterations <- 20000

# Specify model 
model <- bf(sum_lookbacks ~ 1, family = poisson())

# Run model
fit <- brm(model, 
           data = sim_data,
           chains = n_chains, 
           cores = n_cores,
           iter = iterations, 
           warmup = iterations/2,
           seed = 365)

The off-line model estimated the true parameter value for \(\lambda\) to be \(\hat\lambda=\) 4.19 with a 95% probability interval of [3.66 – 4.79]. In other words, as was to be expected, the off-line model successfully uncovered the parameter value for \(\lambda\).

On-line parameter-value estimation

For the real-time parameter estimation we use grid-approximation to simulate the posterior of a Gamma-Poisson model. Let \(Y\) be the number of lookbacks in each increment where lookbacks occure with an average rate of \(\lambda\) per increment. There are \(1 ...N\) observations. We will start with a relatively extreme prior on \(\lambda\):

\[ Y_i|\lambda \sim \text{Poisson}(\lambda)\\ \lambda \sim \text{Gamma}(10,1) \]

To simulate the posterior using grid approximation, we first need to specify a grid of reasonable values for parameter values of \(\lambda\). We will restrict the parameter space to \(\lambda \in [0.1, 20)\) because 20 lookbacks in 50 keystrokes are very uncommon.

Applying the prior to the grid of \(\lambda\) values renders the prior probability distribution \(P(\lambda)\).

The Poisson likelihood function of \(\lambda\) must be calculated by the product of the individual Poisson probability density functions of each increment. For each incoming observations \(y_i \in Y_1 ... Y_N\) the likelihood is being updated so that

\[ P(y_i \mid \lambda)=\prod_{i=1}^N\text{Poisson}(y_i \mid\lambda) \]

For each grid value, we obtain the probability of observing the given data data \(P(y \mid \lambda)\).

The posterior probability of \(\lambda\) can then be derived using Bayes’ Theorem

\[ P(\lambda \mid y_i) \propto P(\lambda) \cdot P(y_i \mid \lambda) \]

i.e. the product of the the prior and the likelihood. We can then calculate the probability of any given observation under the posterior distribution. The posterior of the current increment \(i\) was then used as the prior for the subsequent increment \(i+1\).

During each increment we can also calculate the probability of the current observation \(y_i\) given the current posterior, namely \(P(y_i \mid \lambda_\text{posterior})\). This would allow us to detect a sudden influx in lookbacks to flag up any number of lookbacks that is extremer than expected, for example when \(P(y_i \mid \lambda_\text{posterior}) < .05\).

# Grid for lambda space
lambdas <- seq(.1, 20, length = 100)

# Pre-allocation for likelihood
likelihood <- tibble(.rows = length(lambdas))

# Prior for lambda
lambda_prior <- 10 # <- this one is obviously a very extreme prior.

# Store posterior and prior
all_posteriors <- list()
all_priors <- list()

# Time per increment
time <- c()

# Incrementally update the posterior for every observation.
for(i in 1:n_increments){

  # Get new observation
  new_obs <- sim_data$sum_lookbacks[i]

  # Start time counter
  start_time <- Sys.time()

  # Prior probability distribution over lambda
  prior <- dgamma(lambdas, lambda_prior, 1, log = TRUE)

  # Likelihood of current and previous increments
  # Sum of the log likelihoods is equivalent to the product of the likelihood
  likelihood <- rowSums(
    bind_cols(likelihood, 
              dpois(new_obs, lambdas, log = TRUE)))

  # We add prior to likelihood cause both are on the log scale
  posterior <- likelihood + prior
  
  # Normalising (so it sums to unity) requires exponential
  posterior <- exp(posterior) / sum(exp(posterior))

  # Find lambda with highest posterior probability
  lambda_posterior <- lambdas[which.max(posterior)]

  # Get conditional probability of the observation given the current
  # parameter estimate.
  p_obs <- 1 - pgamma(new_obs, lambda_posterior)

  # Save to data  
  sim_data$lambda_posterior[i] <- lambda_posterior
  sim_data$lambda_prior[i] <- lambda_prior  
  sim_data$p_obs[i] <- p_obs

  # New prior for next increment is the current posterior.
  lambda_prior <- lambda_posterior
  
  # Save the posterior
  all_posteriors[[i]] <- posterior
  
  # Save the priors
  all_priors[[i]] <- exp(prior) / sum(exp(prior))
  end_time <- Sys.time()
  
  time <- c(time, end_time - start_time)

}

# Save the latest posterior.
last_posterior <- all_posteriors[[max(n_increments)]]

The average time to process one increment was 52.7 msecs with a SD of 31.4 msecs.

Use the posterior value of the last increment to estimate the posterior distribution.

grid_data <- tibble(lambdas = lambdas,
                    posterior = last_posterior)

ps_online <- sample_n(grid_data, 
         size = 10000, 
         weight = posterior,
         replace = TRUE) %>% 
  tabyl(lambdas)

The posterior parameter estimate for \(\lambda\) obtained by the on-line method was \(\hat\lambda=\) 4.12 in a 95% probability interval of [3.52 – 4.72]. As a reminder, the true parameter value was \(\lambda=\) 4 and the off-line model estimated \(\hat\lambda=\) 4.19 [3.66 – 4.79]. In other words, both the on-line and the off-line approach successfully uncovered the parameter value for \(\lambda\).

The posterior of the on-line parameter estimation is shown in the following figure with the red line indicating the true parameter value.

This figure shows the simulated data compared to the prior (\(\lambda_\text{prior}\)) and the posterior parameter estimate (\(\lambda_\text{posterior}\)). Note that even though we chose a relatively extreme initial prior, the chain quickly stabalised – after \(\sim\) 20 increments – on a parameter value for \(\lambda\) that is close to the true value represented as horizontal red line. The highlighted red points are observations that were unexpected under the posterior distribution of the current increment, namely data for which \(P(\text{y}_i\mid\hat\lambda_\text{posterior})<.05\) was true.

Real data: Comparison for extreme value classification

Extract and bin data from participant with the highest number of lookback events.

# Data of participant with most lookbacks
data <- read_csv("data/all_data_incremental.csv") %>% 
  filter(task == "A", ppt == "M19-0036") %>% 
  select(is_lookback) 
  
keys_in_increment <- 50
n_bins <- floor(nrow(data)/keys_in_increment) * keys_in_increment
data <- data[1:n_bins,]

data <- data %>% 
  mutate(increment = rep(1:(n_bins/keys_in_increment), each = keys_in_increment)) %>% 
  group_by(increment) %>% 
  summarise(sum_lookbacks = sum(is_lookback))

Brief overview of data.

Rows: 94
Columns: 2
$ increment     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ sum_lookbacks <int> 1, 3, 3, 2, 3, 2, 3, 1, 1, 4, 8, 4, 6, 1, 6, 8, 8, 7, 4,…

Parameter value estimates were obtained as above using both an off-line model and an on-line model. THe off-line Poisson model estimated the true parameter value for \(\lambda\) to be \(\hat\lambda=\) 3.1 with a 95% probability interval of [2.76 – 3.47]. The on-line approximation estimated a parameter value of \(\hat\lambda=\) 3.12 with a 95% probability interval of [2.51 – 3.52]. As a reminder, the true parameter value was \(\lambda=\) 4 and the off-line model estimated \(\hat\lambda=\) 3.1 [2.76 – 3.47]. Again, the on-line and the off-line revealed relatively similar parameter estimates.

The following figure shows the observed data compared to the prior (\(\lambda_\text{prior}\)) and the posterior parameter estimate (\(\lambda_\text{posterior}\)). The highlighted red points are observations that were unexpected under the posterior distribution of the current increment, namely data for which \(P(\text{y}_i\mid\hat\lambda_\text{posterior})<.05\) was true.

We are now going to compare how the on-line model compares to the off-line model’s ability to detect extreme values. For the off-line model we use the probability of an observation given the posterior predicted data \(P(y_i\mid \tilde{y}_i)\) to detect extreme values similar to the on-line method. Again, observations with a probability smaller than .05 given the posterior predicted values will be considered extreme.

data <- data %>% 
  mutate(predicted = predict(fit, newdata = data)[,1],
         p_obs_offline = 1 - pgamma(sum_lookbacks, predicted)) %>% 
  rename(p_obs_online = p_obs) %>% 
  mutate(across(starts_with("p_"), ~. < .05))

Shown are the observations identified as extreme for the off-line and the on-line model. As can be seen, the off-line model identified two more observations as extreme.

Comparing these classifications based on the on-line and the offline model renders the following confusion matrix.

 offline FALSE TRUE Total
   FALSE    86    0    86
    TRUE     2    6     8
   Total    88    6    94

The confusion matrix renders that the on-line model (assuming the off-line model is the gold-standard for extreme value classification) has a sensitivity (precision) of .75, a recall of 1.00, a specificity of 1.00 and a misclassification error of .02. The F1 scores is .86.

References

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 20.

Johnson, A. A., Ott, M. Q., & Dogucu, M. (2022). Bayes rules!: An introduction to applied Bayesian modeling. CRC Press.

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (Vol. 2). CRC Press.