To demonstrate an inference from non IID binary observations, we develop an over simplified model for my mood. Suppose I know that my mood oscillates between good and bad during the day. It is unreasonable to assume that my current mood is independent of my future mood, at least in the short term. Also suppose that when I am in a good mood, I have a certain probability of transitioning to the bad mood and when I am in the bad mood I have a different probability of transitioning to the good mood. The two probabilities are independent and there are no time of day or other effects. Graphically, the model looks like this:

Simple mood model

Simple mood model

The observations are not independent as each successive observation depends on the previous one. This particular kind of dependence is called a Markov Chain. Let’s simulate some data.

set.seed(123)
N <- 1e3
mc <- numeric(N) 
mc[1] <- 0     # wake up in a bad mood
theta0 <- 0.3  # if we are in a bad mood, we go to good mood with 0.3
theta1 <- 0.6  # if we are in a good mood, we go bad mood with 0.4

for (i in 2:N) {
  if (mc[i - 1] == 0)
    mc[i] <- rbinom(1, 1, prob = theta0)
  else
    mc[i] <- rbinom(1, 1, prob = theta1)
}
cat("First 30 observations:", mc[1:30])
## First 30 observations: 0 0 1 1 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 0 0 1 0 1 1 1 1
cat("I was in a good mood", sum(mc)/N * 100, "% of the time.")
## I was in a good mood 42.6 % of the time.

In this data set, we are spending 42.6% in a good mood. We can also compute a long term equilibrium probabilities (the stationary distribution) from the transition matrix.

trm <- matrix(c(.7, .3, .4, .6), byrow = TRUE, nrow = 2)
trm
##      [,1] [,2]
## [1,]  0.7  0.3
## [2,]  0.4  0.6
eigen_v <- eigen(t(trm))$vectors[, 1]
round(eigen_v/sum(eigen_v), 2)
## [1] 0.57 0.43

The result agrees with what we see in the data, which is not surprising for 1,000 observations.

Now for the inverse problem: can we infer the transition probabilities from data. In other words, given data only, can we estimate \(\theta_0\) and \(\theta_1\)?

The following Stan program will be used to compute the inferences.

data {
  int<lower=1> N;  
  int<lower=0, upper=1> y[N];
}
parameters {
  real<lower=0, upper=1> theta_zero;
  real<lower=0, upper=1> theta_one;
}
model {
  theta_zero ~ beta(2, 2);
  theta_one ~ beta(2, 2);
  
  // likelihood  
  for (n in 2:N) 
    if (y[n - 1] == 0)
      target += bernoulli_lpmf(y[n] | theta_zero);
    else
      target += bernoulli_lpmf(y[n] | theta_one);
}

The likelihood follows our data generating process.

Next we fit the model and attempt to recover the parameters of the simulation.

data <- list(N = N, y = mc)
# mood_m <- stan_model("mood.stan"); saveRDS(mood_m, "mood.rds")
mood_m <- readRDS("mood.rds")
mood_f <- sampling(mood_m, iter = 500, data = data)
## 
## SAMPLING FOR MODEL 'mood' NOW (CHAIN 1).
## 
## Gradient evaluation took 8.3e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.83 seconds.
## Adjust your expectations accordingly!
## 
## 
## Iteration:   1 / 500 [  0%]  (Warmup)
## Iteration:  50 / 500 [ 10%]  (Warmup)
## Iteration: 100 / 500 [ 20%]  (Warmup)
## Iteration: 150 / 500 [ 30%]  (Warmup)
## Iteration: 200 / 500 [ 40%]  (Warmup)
## Iteration: 250 / 500 [ 50%]  (Warmup)
## Iteration: 251 / 500 [ 50%]  (Sampling)
## Iteration: 300 / 500 [ 60%]  (Sampling)
## Iteration: 350 / 500 [ 70%]  (Sampling)
## Iteration: 400 / 500 [ 80%]  (Sampling)
## Iteration: 450 / 500 [ 90%]  (Sampling)
## Iteration: 500 / 500 [100%]  (Sampling)
## 
##  Elapsed Time: 0.085675 seconds (Warm-up)
##                0.075986 seconds (Sampling)
##                0.161661 seconds (Total)
## 
## 
## SAMPLING FOR MODEL 'mood' NOW (CHAIN 2).
## 
## Gradient evaluation took 6.6e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Adjust your expectations accordingly!
## 
## 
## Iteration:   1 / 500 [  0%]  (Warmup)
## Iteration:  50 / 500 [ 10%]  (Warmup)
## Iteration: 100 / 500 [ 20%]  (Warmup)
## Iteration: 150 / 500 [ 30%]  (Warmup)
## Iteration: 200 / 500 [ 40%]  (Warmup)
## Iteration: 250 / 500 [ 50%]  (Warmup)
## Iteration: 251 / 500 [ 50%]  (Sampling)
## Iteration: 300 / 500 [ 60%]  (Sampling)
## Iteration: 350 / 500 [ 70%]  (Sampling)
## Iteration: 400 / 500 [ 80%]  (Sampling)
## Iteration: 450 / 500 [ 90%]  (Sampling)
## Iteration: 500 / 500 [100%]  (Sampling)
## 
##  Elapsed Time: 0.092067 seconds (Warm-up)
##                0.07436 seconds (Sampling)
##                0.166427 seconds (Total)
## 
## 
## SAMPLING FOR MODEL 'mood' NOW (CHAIN 3).
## 
## Gradient evaluation took 6.6e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Adjust your expectations accordingly!
## 
## 
## Iteration:   1 / 500 [  0%]  (Warmup)
## Iteration:  50 / 500 [ 10%]  (Warmup)
## Iteration: 100 / 500 [ 20%]  (Warmup)
## Iteration: 150 / 500 [ 30%]  (Warmup)
## Iteration: 200 / 500 [ 40%]  (Warmup)
## Iteration: 250 / 500 [ 50%]  (Warmup)
## Iteration: 251 / 500 [ 50%]  (Sampling)
## Iteration: 300 / 500 [ 60%]  (Sampling)
## Iteration: 350 / 500 [ 70%]  (Sampling)
## Iteration: 400 / 500 [ 80%]  (Sampling)
## Iteration: 450 / 500 [ 90%]  (Sampling)
## Iteration: 500 / 500 [100%]  (Sampling)
## 
##  Elapsed Time: 0.091734 seconds (Warm-up)
##                0.070619 seconds (Sampling)
##                0.162353 seconds (Total)
## 
## 
## SAMPLING FOR MODEL 'mood' NOW (CHAIN 4).
## 
## Gradient evaluation took 6.7e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.67 seconds.
## Adjust your expectations accordingly!
## 
## 
## Iteration:   1 / 500 [  0%]  (Warmup)
## Iteration:  50 / 500 [ 10%]  (Warmup)
## Iteration: 100 / 500 [ 20%]  (Warmup)
## Iteration: 150 / 500 [ 30%]  (Warmup)
## Iteration: 200 / 500 [ 40%]  (Warmup)
## Iteration: 250 / 500 [ 50%]  (Warmup)
## Iteration: 251 / 500 [ 50%]  (Sampling)
## Iteration: 300 / 500 [ 60%]  (Sampling)
## Iteration: 350 / 500 [ 70%]  (Sampling)
## Iteration: 400 / 500 [ 80%]  (Sampling)
## Iteration: 450 / 500 [ 90%]  (Sampling)
## Iteration: 500 / 500 [100%]  (Sampling)
## 
##  Elapsed Time: 0.083262 seconds (Warm-up)
##                0.061755 seconds (Sampling)
##                0.145017 seconds (Total)
theta_one <- rstan::extract(mood_f, pars = "theta_one")$theta_one
theta_zero <- rstan::extract(mood_f, pars = "theta_zero")$theta_zero
mood <- tibble(theta_zero, theta_one) %>% gather(.)

ggplot(mood, aes(key, value)) + 
  geom_violin(aes(color = key, fill = key), alpha = 1/3, adjust = 2) +
  geom_hline(yintercept = theta0, size = 0.2, color = "green") +
  geom_hline(yintercept = theta1, size = 0.2, color = "red") +
  xlab("") + ylab("Probability") + theme(legend.position="none")

Both parameters are estimated well with \(\theta_0\) estimated more precisely for a simple reason that we have more zeros (bad mood) in the dataset.