1 Setting

1.1 SCSS Setup
1.2 Setup

2 受講生によるテスト：The MCMC algorithm for zero-inflated mixtures

2.1 説明

2.1.1 Review criteria

2.2 自分の提出物

2.2.1 Assignment
2.2.2 Marking

2.3 ピアレビュー

2.3.1 1st Peer

2.3.1.1 Asignment
2.3.1.2 Marking

2.3.2 2nd Peer

2.3.2.1 Asignment
2.3.2.2 Marking

2.3.3 3rd Peer

2.3.3.1 Asignment
2.3.3.2 Marking

2.3.4 4th Peer

2.3.4.1 Asignment
2.3.4.2 Marking

2.3.5 5th Peer

2.3.5.1 Asignment
2.3.5.2 Marking

2.3.6 6th Peer

2.3.6.1 Asignment
2.3.6.2 Marking

2.4 ディスカッション

3 Appendix

3.1 Blooper
3.2 Documenting File Creation
3.3 Reference

Theme Song

1 Setting

1.1 SCSS Setup

#if(!suppressPackageStartupMessages(require('BBmisc'))) {
#  install.packages('BBmisc', dependencies = TRUE, INSTALL_opts = '--no-lock')
#}
suppressPackageStartupMessages(library('BBmisc'))
#remotes::install_github("rstudio/sass")
#lib('sass')
suppressPackageStartupMessages(library('sass'))

/* https://stackoverflow.com/a/66029010/3806250 */
h1 { color: #002C54; }
h2 { color: #2F496E; }
h3 { color: #375E97; }
h4 { color: #375E97; }
h5 { color: #375E97; }

/* ----------------------------------------------------------------- */
/* https://gist.github.com/himynameisdave/c7a7ed14500d29e58149#file-broken-gradient-animation-less */
.hover01 {
  /* color: #FFD64D; */
  background: linear-gradient(155deg, #EDAE01 0%, #FFEB94 100%);
  transition: all 0.45s;
  &:hover{
    background: linear-gradient(155deg, #EDAE01 20%, #FFEB94 80%);
    }
  }

.hover02 {
  color: #FFD64D;
  background: linear-gradient(155deg, #002C54 0%, #4CB5F5 100%);
  transition: all 0.45s;
  &:hover{
    background: linear-gradient(155deg, #002C54 20%, #4CB5F5 80%);
    }
  }

.hover03 {
  color: #FFD64D;
  background: linear-gradient(155deg, #A10115 0%, #FF3C5C 100%);
  transition: all 0.45s;
  &:hover{
    background: linear-gradient(155deg, #A10115 20%, #FF3C5C 80%);
    }
  }

## https://stackoverflow.com/a/36846793/3806250
options(width = 999)
knitr::opts_chunk$set(class.source = 'hover01', class.output = 'hover02', class.error = 'hover03')

1.2 Setup

pkgs <- c('devtools', 'knitr', 'kableExtra', 'tidyr', 
          'readr', 'lubridate', 'data.table', 'reprex', 
          'timetk', 'plyr', 'dplyr', 'stringr', 'magrittr', 
          'tdplyr', 'tidyverse', 'formattable', 
          'echarts4r', 'paletteer')

suppressAll(lib(pkgs))
# load_pkg(pkgs)

## Set the timezone but not change the datetime
Sys.setenv(TZ = 'Asia/Tokyo')
## options(knitr.table.format = 'html') will set all kableExtra tables to be 'html', otherwise need to set the parameter on every single table.
options(warn = -1, knitr.table.format = 'html')#, digits.secs = 6)

## https://stackoverflow.com/questions/39417003/long-vectors-not-supported-yet-abnor-in-rmd-but-not-in-r-script
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
#, 
                      #cache = TRUE, cache.lazy = FALSE)

rm(pkgs)

2 受講生によるテスト：The MCMC algorithm for zero-inflated mixtures

5月17日 15:59 JST までに提出

課題を受ける準備はできていますか？

以下に提出のための指示が表示されます。

2.1 説明

This assignment is similar to one you already completed in Lesson 3. However, in this case you are asked to use an MCMC algorithm to perform Bayesian inference on the model parameters, instead of an EM algorithm to find maximum likelihood estimators.

A biologist is interest in characterizing the number of eggs laid by a particular bird species. To do this, they sample $n = 300$ nests on a site in Southern California. The observations are contained in the attached file nestsize.csv:

nestsize.csv

The following graph compares the empirical distribution of the data against a Poison distribution whose parameter has been set to its maximum likelihood estimator:

As you can see, the Poisson distribution underestimates the number of empty nests in the data, and overestimates the number of nests with either 1 or 2 eggs. To address this, you are asked to modify the implementation of the MCMC algorithm contained in the Reading “Sample code for MCMC example 1” so that you can fit a mixture between a point mass at zero and a Poisson distribution (we call this a “zero-inflated Poisson” distribution):

$f (x) = w δ_{0} (x) + (1 - w) \frac{e^{- λ} λ^{x}}{x!} x \in {0, 1, 2, \dots}$

where $δ_{0} (x)$ represents the degenerate distribution placing all of its mass at zero. You then should run your algorithm for $5, 000$ iterations (after a burn-in period of $1, 000$ iterations) with the data contained in nestsize.csv and report your estimates of the posterior means, rounded to two decimal places.

In carrying out this assignment assume the following priors for your unknown parameters: $ω \sim U n i [0, 1]$ and $λ \sim E x p (1)$ .

2.1.1 Review criteria

The code you generate should follow the same structure as “Sample code for MCMC example 1”. In particular, focus on a Gibss sampler that alternates between the full conditionals for $ω$ , $λ$ and the latent component indicators $c_{1}, \dots, c_{n}$ . Peer reviewers will be asked to check whether the different pieces of code have been adequately modified to reflect the fact that (1) parameters have been initialized in a reasonable way, (2) each of the two full conditional distributions associated with the sampler are correct, and (2) the numerical values that you obtain are correct. To simplify the peer-review process, assume that component 1 corresponds to the point mass at zero, while component 2 corresponds to the Poisson distribution.

2.2 自分の提出物

2.2.1 Assignment

KK        = 2
w         = 1/2
mu        = mean(x)

## Error in mean(x): object 'x' not found

n         = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

aa        = rep(1,KK)  # Uniform prior on w
maxv      = 1 - 1e-6
# Number of iterations of the sampler
rrr       = 6000
burn      = 1000
w_out     = rep(0, rrr)
mu_out    = rep(0, rrr)

for(s in 1:rrr){
  # Sample the indicators
  cc      = rep(0,n)
  for(i in 1:n){
    v     = rep(0,KK)
    v[1]  = log(w)
    #v[1]  = log(w) + dunif(x[i], max=maxv, log=T) # treat first component as uniform (0, .9999)
    v[2]  = log(1-w) + dpois(x[i], mu, log=T)
    v     = exp(v - max(v))/sum(exp(v - max(v)))
    cc[i] = sample(1:KK, 1, replace=T, prob=v)
  }
  
  # Sample the weights
  w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))

  # Sample the mean
  for(k in 2:KK){
    nk    = sum(cc==k)
    xsumk = sum(x[cc==k])
    mu    = rgamma(1, xsumk + 1, nk + 1)
  }

  # Store samples
  w_out[s]    = w
  mu_out[s]   = mu
}

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

mean(mu_out[burn:rrr])# = 3.051

## [1] 0

mean(w_out[burn:rrr])# = 0.398

## [1] 0

2.2.2 Marking

2.3 ピアレビュー

2.3.1 1st Peer

2.3.1.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=TRUE, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2 # Full conditional for cc

for(i in 1:n){
   v = rep(0,2)
   
   if(x[i]==0){
     v[1] = log(w)
     v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
     v    = exp(v - max(v))/sum(exp(v - max(v)))
     
    }else{
      v[1] = 0
      v[2] = 1
    }
   cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

## [1] 0.2

lambda

## Error in eval(expr, envir, enclos): object 'lambda' not found

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

mean(mu_out[burn:rrr])# = 3.051

## [1] 0

mean(w_out[burn:rrr])# = 0.398

## [1] 0

2.3.1.2 Marking

Are the parameters initialized in a reasonable way?

Correctly initializing the indicators $c_{1}, \dots, c_{n}$ is key. In particular, observations such that $x_{i} \neq 0$ must have $c_{i} = 2$ . As for observations for which $x_{i} = 0$ , these can be randomly initialized to either component.

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

where, as before, $m (c)$ is the number of observations that $c$ assigns to component 1 (so that $n - m (c)$ is the number of observations assigned to component 2), and $\sum_{c_{i} = 2}^{i} x_{i}$ is the sum of the observations assigned to component 2.

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

Recall that the posterior means can be approximated by simple averages of the posterior samples obtained after burn in. While there might a bit of Monte Carlo error in the answers reported, the values should be $E (λ ∣ x) \approx 3.05$ and $E (ω ∣ x) \approx 0.40$ . Note that these values are very close to those generated by the EM algorithm.

0点 No
1点 Yes

2.3.2 2nd Peer

2.3.2.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

KK        = 2
w         = 1/2
mu        = mean(x)

## Error in mean(x): object 'x' not found

n         = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

aa        = rep(1,KK)  # Uniform prior on w
maxv      = 1 - 1e-6
# Number of iterations of the sampler
rrr       = 6000
burn      = 1000
w_out     = rep(0, rrr)
mu_out    = rep(0, rrr)

for(s in 1:rrr){
  # Sample the indicators
  cc      = rep(0,n)
  for(i in 1:n){
    v     = rep(0,KK)
    v[1]  = log(w)
    #v[1]  = log(w) + dunif(x[i], max=maxv, log=T) # treat first component as uniform (0, .9999)
    v[2]  = log(1-w) + dpois(x[i], mu, log=T)
    v     = exp(v - max(v))/sum(exp(v - max(v)))
    cc[i] = sample(1:KK, 1, replace=T, prob=v)
  }
  
  # Sample the weights
  w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))

  # Sample the mean
  for(k in 2:KK){
    nk    = sum(cc==k)
    xsumk = sum(x[cc==k])
    mu    = rgamma(1, xsumk + 1, nk + 1)
  }

  # Store samples
  w_out[s]    = w
  mu_out[s]   = mu
}

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

mean(mu_out[burn:rrr])# = 3.051

## [1] 0

mean(w_out[burn:rrr])# = 0.398

## [1] 0

2.3.2.2 Marking

Are the parameters initialized in a reasonable way?

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

0点 No
1点 Yes

2.3.3 3rd Peer

2.3.3.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

## Initialize the parameters
KK     = 2                               # number of components
n      = length(x)                         # number of samples

## Error in eval(expr, envir, enclos): object 'x' not found

v      = array(0, dim=c(n,KK))

## Error in array(0, dim = c(n, KK)): 'list' object cannot be coerced to type 'integer'

v[,1]  = 0.5*(x==0)                    #Assign half weight for nests with 0 eggs to first component

## Error in eval(expr, envir, enclos): object 'x' not found

v[,2]  = 1-v[,1]                              #Assign all of the remaining nests to the second component

## Error in eval(expr, envir, enclos): object 'v' not found

w      = mean(v[,1])                          #weight of the first component

## Error in mean(v[, 1]): object 'v' not found

lambda = sum(x*v[,2])/sum(v[,2])     #parameter (mean) of the second component

## Error in eval(expr, envir, enclos): object 'x' not found

## The actual MCMC algorithm starts here
# Priors
aa     = rep(1,KK)  # Uniform prior on w
alpha  = 2 #For Gamma distribution prior number of obs = alpha-1
beta   = 1 #Sum of observations = beta

# Number of iterations of the sampler
rrr    = 6000
burn   = 1000

# Storing the samples
cc.out      = array(0, dim=c(rrr, n))

## Error in array(0, dim = c(rrr, n)): 'list' object cannot be coerced to type 'integer'

w.out       = rep(0, rrr)
lambda.out  = array(0, dim=c(rrr, 1))
logpost     = rep(0, rrr)

# MCMC iterations
# MCMC iterations
for(s in 1:rrr){
  # Sample the indicators
  cc = rep(0,n)
  for(i in 1:n){
    v       = rep(0,KK)
    if (x[i]==0) {
      v[1]  = (w/(w+(1-w)*exp(-lambda)))
    } else {
      v[1]  = 0.0
    }
    v[2]    = 1.0 - v[1]
    cc[i]   = sample(1:KK, 1, replace=TRUE, prob=v)
  }
  
  # Sample the weights
  w         = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))
  
  # Sample lambda
  nk        = sum(cc==2)
  xsumk     = sum(x[cc==2])
  alpha.hat = alpha + xsumk
  beta.hat  = beta + nk
  lambda    = rgamma(1, shape = alpha.hat, rate = beta.hat)
  
  # Store samples
  cc.out[s,]     = cc
  w.out[s]       = w
  lambda.out[s,] = lambda
  for(i in 1:n){
    if(cc[i]==1){
      logpost[s] = logpost[s] + log(w)
    }else{
      logpost[s] = logpost[s] + log(1-w) + dpois(x[i], lambda, log=TRUE)
    }
  }
  logpost[s] = logpost[s] + dbeta(w, aa[1], aa[2],log = T)
  logpost[s] = logpost[s] + dgamma(lambda, shape = alpha, rate = beta)
  if(s/500==floor(s/500)){
    print(paste("s =",s, w, lambda, logpost[s]))
  }
}

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

## Plot the logposterior distribution for various samples
par(mfrow=c(1,1))
par(mar=c(4,4,1,1)+0.1)
plot(logpost, type="l", xlab="Iterations", ylab="Log posterior")

w_avg = sum(w.out[(burn+1):rrr])/(rrr-burn)
lambda_avg = sum(lambda.out[(burn+1):rrr])/(rrr-burn)
print(paste(w_avg, lambda_avg))

## [1] "0 0"

w_sum = 0
lambda_sum = 0
counter = 0
for(s in 1:(rrr-burn)){
  w_sum = w_sum + w.out[s+burn]
  lambda_sum = lambda_sum + lambda.out[s+burn]
  counter = counter + 1
}
print(paste(counter, w_sum/counter, lambda_sum/counter))

## [1] "5000 0 0"

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

#E(lambda|x) = 3.056 and E(w|x) = 0.398
mean(lambda_sum)

## [1] 0

mean(w_sum)

## [1] 0

2.3.3.2 Marking

Are the parameters initialized in a reasonable way?

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

0点 No
1点 Yes

2.3.4 4th Peer

2.3.4.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

2.3.4.2 Marking

Are the parameters initialized in a reasonable way?

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

0点 No
1点 Yes

2.3.5 5th Peer

2.3.5.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

2.3.5.2 Marking

Are the parameters initialized in a reasonable way?

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

0点 No
1点 Yes

2.3.6 6th Peer

2.3.6.1 Asignment

Provide a Markov chain Monte Carlo algorithm to fit a zero-inflated Poisson distribution.

Provide you estimates of the posterior means $E (λ ∣ x)$ and $E (w ∣ x)$ .

2.3.6.2 Marking

Are the parameters initialized in a reasonable way?

For the other parameters, it is reasonable to initialize $λ$ to the mean of the observations and the weight $ω$ to a relatively small value (as we did in the EM algorithm).

Hence, the code for the initialization of the algorithm might look something like:

n        = length(x)

## Error in eval(expr, envir, enclos): object 'x' not found

cc       = rep(0, n)

## Error in eval(expr, envir, enclos): cannot coerce type 'closure' to vector of type 'double'

cc[x==0] = sample(1:2, sum(x==0), replace=T, prob=c(1/2, 1/2))

## Error in sample.int(length(x), size, replace, prob): object 'x' not found

cc[x!=0] = 2

## Error in cc[x != 0] = 2: object 'cc' not found

lambda   = mean(x)

## Error in mean(x): object 'x' not found

w        = 0.2

However, be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the indicators $c_{1}, \dots, c_{n}$ correct?

The indicators $c_{1}, \dots, c_{n}$ are conditionally independent from each other, and we have

$Pr (c_{i} = 1 ∣ \dots) \propto {\begin{cases} w & x_{i} = 0 \\ 0 & otherwise \end{cases}$

while

$Pr (c_{i} = 2 ∣ \dots) \propto {\begin{cases} (1 - w) \frac{λ^{x_{i}} e^{- λ}}{x_{i}!} & x_{i} = 0 \\ 1 & otherwise \end{cases}$

Hence, the code to sample the indicators might look something like this:

# Full conditional for cc
for(i in 1:n){
  v = rep(0,2)
  if(x[i]==0){
    v[1] = log(w)
    v[2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
    v    = exp(v - max(v))/sum(exp(v - max(v)))
  }else{
    v[1] = 0
    v[2] = 1
  }
  cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
}

## Error in 1:n: NA/NaN argument

Please be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the weight ww correct?

This is a simple one, as its structure is common to all mixture models. Recalling that the prior on $ω$ is a uniform distribution on [0,1], we have

$ω ∣ \dots \sim Beta (m (c) + 1, n - m (c) + 1)$

where $m (c)$ is the number of observations that $c$ assigns to component 1. The associated code might look something like this

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+n-sum(cc==1))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + n - sum(cc == 1)): object 'cc' not found

or, alternatively,

# Full conditional for w
w = rbeta(1, 1+sum(cc==1), 1+sum(cc==2))

## Error in rbeta(1, 1 + sum(cc == 1), 1 + sum(cc == 2)): object 'cc' not found

Please remember to be open-minded when reviewing this code, as there are many ways to accomplish the same thing in R.

0点 No
1点 Yes

Is the full conditional for the rate $λ$ correct?

Because we use a exponential prior on $λ$ , the full conditional posterior is a Gamma distribution,

$λ ∣ \dots \sim Gamma (1 + \sum_{i : c_{i} = 2} x_{i}, 1 + n - m (c))$

Hence, the code for this prompt might look something like:

lambda = rgamma(1, sum(x[cc==2]) + 1, sum(cc==2) + 1)

## Error in rgamma(1, sum(x[cc == 2]) + 1, sum(cc == 2) + 1): object 'x' not found

0点 No
1点 Yes

Are the posterior means generated by the algorithm correct?

0点 No
1点 Yes

2.4 ディスカッション

3 Appendix

3.1 Blooper

3.2 Documenting File Creation

It’s useful to record some information about how your file was created.

File creation date: 2021-05-24
File latest updated date: 2021-05-24
R version 4.1.0 (2021-05-18)
rmarkdown package version: 2.8
File version: 1.0.0
Author Profile: ®γσ, Eng Lian Hu
GitHub: Source Code
Additional session information:

suppressMessages(require('dplyr', quietly = TRUE))
suppressMessages(require('magrittr', quietly = TRUE))
suppressMessages(require('formattable', quietly = TRUE))
suppressMessages(require('knitr', quietly = TRUE))
suppressMessages(require('kableExtra', quietly = TRUE))

sys1 <- devtools::session_info()$platform %>%
  unlist %>%
  data.frame(Category = names(.), session_info = .)
rownames(sys1) <- NULL

sys2 <- data.frame(Sys.info()) %>%
  dplyr::mutate(Category = rownames(.)) %>%
  .[2:1]
names(sys2)[2] <- c('Sys.info')
rownames(sys2) <- NULL

if (nrow(sys1) == 9 & nrow(sys2) == 8) {
  sys2 %<>% rbind(., data.frame(
  Category = 'Current time', 
  Sys.info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST🗾')))
} else {
  sys1 %<>% rbind(., data.frame(
  Category = 'Current time', 
  session_info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST🗾')))
}

sys <- cbind(sys1, sys2) %>%
  kbl(caption = 'Additional session information:') %>%
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
  row_spec(0, background = 'DimGrey', color = 'yellow') %>%
  column_spec(1, background = 'CornflowerBlue', color = 'red') %>%
  column_spec(2, background = 'grey', color = 'black') %>%
  column_spec(3, background = 'CornflowerBlue', color = 'blue') %>%
  column_spec(4, background = 'grey', color = 'white') %>%
  row_spec(9, bold = T, color = 'yellow', background = '#D7261E')

rm(sys1, sys2)
sys

Additional session information:
Category	session_info	Category	Sys.info
version	R version 4.1.0 (2021-05-18)	sysname	Linux
os	Ubuntu 20.04.2 LTS	release	5.8.0-54-generic
system	x86_64, linux-gnu	version	#61~20.04.1-Ubuntu SMP Thu May 13 00:05:49 UTC 2021
ui	X11	nodename	Scibrokes-Trading
language	en	machine	x86_64
collate	en_US.UTF-8	login	englianhu
ctype	en_US.UTF-8	user	englianhu
tz	Asia/Tokyo	effective_user	englianhu
date	2021-05-24	Current time	2021-05-24 04:58:46 JST🗾

Bayesian Statistics: Mixture Models (Assessment Week3 A Codes)

®γσ, Lian Hu ®

2021-05-24

1 Setting

1.1 SCSS Setup

1.2 Setup

2 受講生によるテスト：The MCMC algorithm for zero-inflated mixtures

2.1 説明

2.1.1 Review criteria

2.2 自分の提出物

2.2.1 Assignment

2.2.2 Marking

2.3 ピアレビュー

2.3.1 1st Peer

2.3.1.1 Asignment

2.3.1.2 Marking

2.3.2 2nd Peer

2.3.2.1 Asignment

2.3.2.2 Marking

2.3.3 3rd Peer

2.3.3.1 Asignment

2.3.3.2 Marking

2.3.4 4th Peer

2.3.4.1 Asignment

2.3.4.2 Marking

2.3.5 5th Peer

2.3.5.1 Asignment

2.3.5.2 Marking

2.3.6 6th Peer

2.3.6.1 Asignment

2.3.6.2 Marking

2.4 ディスカッション

3 Appendix

3.1 Blooper

3.2 Documenting File Creation

3.3 Reference