Chapter 9 - Markov Chain Monte Carlo

This chapter has been an informal introduction to Markov chain Monte Carlo (MCMC) estimation. The goal has been to introduce the purpose of, and the approach behind, MCMC algorithms. The major algorithms introduced were the Metropolis, Gibbs sampling, and Hamiltonian Monte Carlo algorithms. Each has its advantages and disadvantages. The ulam function in the rethinking package was introduced. It uses the Stan (mc-stan.org) Hamiltonian Monte Carlo engine to fit models as they are defined in this book. General advice about diagnosing poor MCMC fits was illustrated with a couple of pathological examples.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

9E1. Which of the following is a requirement of the simple Metropolis algorithm?

  1. The parameters must be discrete.
  2. The likelihood function must be Gaussian.
  3. The proposal distribution must be symmetric.
# (3) The proposal distribution must be symmetric. The simple Metropolis algorithm requires that the probability of proposing a jump from A to B equals the probability of proposing the reverse jump from B to A. Parameters may be continuous, and the likelihood need not be Gaussian.
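# A minimal sketch of the simple Metropolis algorithm, following the chapter's
# island-hopping (King Markov) example. The proposal is symmetric: a move one island
# left or right is proposed with equal probability, which is exactly requirement (3).
num_weeks <- 1e4
positions <- rep( 0 , num_weeks )
current <- 10
for ( i in 1:num_weeks ) {
    positions[i] <- current
    proposal <- current + sample( c(-1, 1) , size=1 )   # symmetric proposal
    if ( proposal < 1 ) proposal <- 10                  # wrap around the archipelago
    if ( proposal > 10 ) proposal <- 1
    prob_move <- proposal / current                     # target density proportional to island number
    current <- ifelse( runif(1) < prob_move , proposal , current )
}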

9E2. Gibbs sampling is more efficient than the Metropolis algorithm. How does it achieve this extra efficiency? Are there any limitations to the Gibbs sampling strategy?

# Gibbs sampling gains its efficiency from adaptive proposals: it draws each parameter from its full conditional distribution, which can be computed analytically when conjugate priors are used. As a result it can produce a good estimate of the posterior with many fewer samples than a comparable Metropolis run. Limitations of the Gibbs sampling strategy:
# 1. It relies on conjugate priors, which we may not want to use; some conjugate priors are actually pathological in shape once you start building multilevel models and need priors for entire covariance matrices.
# 2. As models become more complex and contain hundreds, thousands, or tens of thousands of parameters, both Metropolis and Gibbs sampling become shockingly inefficient, because their proposals get stuck in narrow, highly correlated regions of the posterior.
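# A hedged sketch of what Gibbs sampling does: draw each parameter from its full
# conditional, which has a closed form when the priors are conjugate. The data y_sim
# is simulated, and the Normal / Inverse-Gamma priors and object names are illustrative.
set.seed(1)
y_sim <- rnorm( 50 , mean=2 , sd=1.5 )
n <- length(y_sim)
n_iter <- 2000
mu <- numeric(n_iter) ; sig2 <- numeric(n_iter)
mu[1] <- 0 ; sig2[1] <- 1
for ( i in 2:n_iter ) {
    # mu | sig2, y  ~ Normal (conjugate Normal(0, 10^2) prior on mu)
    prec <- n / sig2[i-1] + 1/100
    mu[i] <- rnorm( 1 , mean=( sum(y_sim)/sig2[i-1] ) / prec , sd=sqrt(1/prec) )
    # sig2 | mu, y ~ Inverse-Gamma (conjugate Inverse-Gamma(1, 1) prior on sig2)
    sig2[i] <- 1 / rgamma( 1 , shape=1 + n/2 , rate=1 + sum( (y_sim - mu[i])^2 )/2 )
}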

9E3. Which sort of parameters can Hamiltonian Monte Carlo not handle? Can you explain why?

# Hamiltonian Monte Carlo cannot handle discrete parameters. HMC works by simulating a particle gliding over the continuous log-posterior surface, which requires gradients; a discrete parameter has no gradient, so there is no surface for the particle to glide through.

9E4. Explain the difference between the effective number of samples, n_eff as calculated by Stan, and the actual number of samples.

# n_eff estimates the number of 'ideal' samples, where ideal samples are entirely uncorrelated. Because of the way MCMC works, each sample is correlated with the previous one to some extent, so the actual number of samples overstates how much information the chain contains. n_eff estimates how many independent samples would carry the same information as the current, autocorrelated sequence of posterior samples; it is usually smaller than the actual number of samples.
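# A rough illustration of the idea (not Stan's exact estimator): the more autocorrelated
# a chain is, the fewer 'ideal' samples it is worth. For an AR(1) chain with lag-1
# correlation rho, n_eff is approximately N * (1 - rho) / (1 + rho).
N <- 1000
rho <- 0.9
ar1_chain <- as.numeric( arima.sim( model=list(ar=rho) , n=N ) )   # a strongly autocorrelated chain
N * (1 - rho) / (1 + rho)   # about 53: 1000 correlated draws carry roughly 53 samples' worth of information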

9E5. Which value should Rhat approach, when a chain is sampling the posterior distribution correctly?

# Rhat is an estimate of the convergence of the Markov chains to the target distribution: it compares the variance between chains to the variance within chains. When a chain is sampling the posterior distribution correctly, Rhat should approach 1.00 from above.
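# A rough sketch of the idea behind Rhat (the classic Gelman-Rubin statistic, not Stan's
# exact rank-normalized, split-chain version): compare between-chain to within-chain
# variance. Four well-mixed chains should give a value very close to 1.
chains <- matrix( rnorm( 4 * 1000 ) , ncol=4 )   # four well-mixed chains of length 1000
n <- nrow(chains)
W <- mean( apply( chains , 2 , var ) )           # average within-chain variance
B <- n * var( colMeans(chains) )                 # between-chain variance
var_hat <- (n - 1)/n * W + B/n                   # pooled variance estimate
sqrt( var_hat / W )                              # Rhat; approaches 1 when the chains agree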

9E6. Sketch a good trace plot for a Markov chain, one that is effectively sampling from the posterior distribution. What is good about its shape? Then sketch a trace plot for a malfunctioning Markov chain. What about its shape indicates malfunction?

library(rethinking)

data("rugged")
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
df <- d[ complete.cases(d$rgdppc_2000) , ]

df$log_gdp_std <- df$log_gdp / mean(df$log_gdp)
df$rugged_std <- df$rugged / max(df$rugged)
df$cid <- ifelse( df$cont_africa==1,1,2 )


dat_slim <- list(
    log_gdp_std = df$log_gdp_std,
    rugged_std = df$rugged_std,
    cid = as.integer( df$cid )
)
str(dat_slim)
## List of 3
##  $ log_gdp_std: num [1:170] 0.88 0.965 1.166 1.104 0.915 ...
##  $ rugged_std : num [1:170] 0.138 0.553 0.124 0.125 0.433 ...
##  $ cid        : int [1:170] 1 2 2 2 2 2 2 2 2 1 ...
c9e6 <- ulam(
    alist(
        log_gdp_std ~ dnorm( mu , sigma ) ,
        mu <- a[cid] + b[cid]*( rugged_std - 0.215 ),
        a[cid] ~ dnorm( 1 , 0.1 ),
        b[cid] ~ dnorm( 0 , 0.3 ),
        sigma ~ dexp( 1 )
    ) , data=dat_slim , chains=4 , cores=4 )


show(c9e6)
## Hamiltonian Monte Carlo approximation
## 2000 samples from 4 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.07   0.04  0.11
## chain:2   0.06   0.03  0.08
## chain:3   0.06   0.03  0.09
## chain:4   0.05   0.03  0.08
## 
## Formula:
## log_gdp_std ~ dnorm(mu, sigma)
## mu <- a[cid] + b[cid] * (rugged_std - 0.215)
## a[cid] ~ dnorm(1, 0.1)
## b[cid] ~ dnorm(0, 0.3)
## sigma ~ dexp(1)
traceplot( c9e6 )
## [1] 1000
## [1] 1
## [1] 1000

# The trace shows stationarity (each chain stays within the same high-probability region of the posterior), good mixing (rapid zig-zagging with little autocorrelation), and convergence (all four chains overlap around the same values). This is a healthy chain.


y <- c(-1,1)

c9e6.1 <- ulam(
    alist(
        y ~ dnorm( mu , sigma ),
        mu <- alpha ,
        alpha ~ dnorm( 0 , 1000 ),
        sigma ~ dexp( 0.0001 )
    ) , data=list(y=y) , chains=3 )
## 
## SAMPLING FOR MODEL 'd26c527083e7eda89b17a8c2eccd6019' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.057 seconds (Warm-up)
## Chain 1:                0.019 seconds (Sampling)
## Chain 1:                0.076 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'd26c527083e7eda89b17a8c2eccd6019' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.088 seconds (Warm-up)
## Chain 2:                0.071 seconds (Sampling)
## Chain 2:                0.159 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'd26c527083e7eda89b17a8c2eccd6019' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.031 seconds (Warm-up)
## Chain 3:                0.026 seconds (Sampling)
## Chain 3:                0.057 seconds (Total)
## Chain 3:
## Warning: There were 48 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
show(c9e6.1)
## Hamiltonian Monte Carlo approximation
## 1500 samples from 3 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.06   0.02  0.08
## chain:2   0.09   0.07  0.16
## chain:3   0.03   0.03  0.06
## 
## Formula:
## y ~ dnorm(mu, sigma)
## mu <- alpha
## alpha ~ dnorm(0, 1000)
## sigma ~ dexp(1e-04)
traceplot(c9e6.1)
## [1] 1000
## [1] 1
## [1] 1000
# The path of the chain does not stay within the same high-probability region of the posterior: with such flat priors and only two observations, the chains wander off to wildly different values and disagree with one another. This is not a healthy chain.
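# A hedged sketch of the kind of remedy the chapter uses for this pathology: even weakly
# informative priors tame the chain. The model name c9e6.2 is just an illustrative label.
c9e6.2 <- ulam(
    alist(
        y ~ dnorm( mu , sigma ),
        mu <- alpha ,
        alpha ~ dnorm( 1 , 10 ),
        sigma ~ dexp( 1 )
    ) , data=list(y=y) , chains=3 )
traceplot( c9e6.2 )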

9E7. Repeat the problem above, but now for a trace rank plot.

trankplot( c9e6 )
# The rank histograms for all parameters overlap and stay close to uniform, so this is a healthy Markov chain: stationary and well mixed.
trankplot( c9e6.1 )
# Here the rank histograms are far from uniform and individual chains spend long stretches holding the highest or lowest ranks, which is the signature of a malfunctioning chain.

9M1. Re-estimate the terrain ruggedness model from the chapter, but now using a uniform prior for the standard deviation, sigma. The uniform prior should be dunif(0,1). Visualize the priors. Use ulam to estimate the posterior. Visualize the posteriors for both models. Does the different prior have any detectible influence on the posterior distribution of sigma? Why or why not?

library(rethinking)

data(rugged)
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
dd <- d[ complete.cases(d$rgdppc_2000) , ]

dd$log_gdp_std <- dd$log_gdp/ mean(dd$log_gdp)
dd$rugged_std<- dd$rugged/max(dd$rugged)

dd$cid<-ifelse(dd$cont_africa==1,1,2)

c9m1 <- quap(
  alist(
    log_gdp_std ~ dnorm( mu , sigma ) ,
    mu <- a[cid] + b[cid]* (rugged_std-0.215) ,
    a[cid] ~ dnorm(1,0.1),
    b[cid] ~ dnorm(0,0.3),
    sigma ~ dexp(1)
  ) , 
  data=dd)

precis(c9m1 , depth=2)
##             mean          sd        5.5%      94.5%
## a[1]   0.8865869 0.015674958  0.86153530  0.9116385
## a[2]   1.0505799 0.009936098  1.03470014  1.0664598
## b[1]   0.1328217 0.074200624  0.01423478  0.2514086
## b[2]  -0.1426052 0.054746676 -0.23010092 -0.0551094
## sigma  0.1094885 0.005934539  0.10000400  0.1189731
pairs(c9m1)

c9m1_unif <- quap(
  alist(
    log_gdp_std ~ dnorm( mu , sigma ) ,
    mu <- a[cid] + b[cid]* (rugged_std-0.215) ,
    a[cid] ~ dnorm(1,0.1),
    b[cid] ~ dnorm(0,0.3),
    sigma ~ dunif(0,1)
  ) , 
  data=dd)



precis(c9m1_unif , depth=2)
##             mean          sd        5.5%       94.5%
## a[1]   0.8865654 0.015680510  0.86150495  0.91162592
## a[2]   1.0505685 0.009939708  1.03468297  1.06645412
## b[1]   0.1325001 0.074226391  0.01387196  0.25112818
## b[2]  -0.1425697 0.054766092 -0.23009645 -0.05504287
## sigma  0.1095286 0.005939979  0.10003540  0.11902186
pairs(c9m1_unif)
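# The question also asks to visualize the priors. A minimal sketch using base R curve()
# to compare the two sigma priors used above, dexp(1) and dunif(0,1):
curve( dexp( x , rate=1 ) , from=0 , to=3 , ylab="prior density" , xlab="sigma" )
curve( dunif( x , 0 , 1 ) , add=TRUE , lty=2 )
legend( "topright" , legend=c( "dexp(1)" , "dunif(0,1)" ) , lty=c(1, 2) )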

# The uniform prior has no detectable influence on the posterior distribution of sigma. With 170 observations the likelihood dominates the prior, and both dexp(1) and dunif(0,1) are essentially flat over the narrow region where the posterior for sigma concentrates (around 0.11), so the two posteriors are practically identical.

9M2. Modify the terrain ruggedness model again. This time, change the prior for b[cid] to dexp(0.3). What does this do to the posterior distribution? Can you explain it?

c9m2_exp <- ulam(
  alist(
    log_gdp_std ~ dnorm( mu , sigma ) ,
    mu <- a[cid] + b[cid]* (rugged_std-0.215) ,
    a[cid] ~ dnorm(1,0.1),
    b[cid] ~ dexp(0.3),
    sigma ~ dexp(1)
  ) , data=dat_slim, chains=4, cores=4)


precis(c9m2_exp , depth=2)
##             mean          sd        5.5%       94.5%    n_eff     Rhat4
## a[1]   0.8864045 0.015960929  0.86155696  0.91090473 2778.886 0.9993907
## a[2]   1.0508261 0.009650741  1.03575514  1.06670114 2881.797 1.0003206
## b[1]   0.1327605 0.074369128  0.01727493  0.24729694 2879.805 0.9983616
## b[2]  -0.1422516 0.057288918 -0.23521405 -0.05010655 2361.053 0.9985277
## sigma  0.1115330 0.005946157  0.10234674  0.12131767 2445.057 0.9982825
pairs(c9m2_exp)

show(c9m2_exp)
## Hamiltonian Monte Carlo approximation
## 2000 samples from 4 chains
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.06   0.04  0.10
## chain:2   0.06   0.05  0.11
## chain:3   0.08   0.04  0.11
## chain:4   0.04   0.03  0.08
## 
## Formula:
## log_gdp_std ~ dnorm(mu, sigma)
## mu <- a[cid] + b[cid] * (rugged_std - 0.215)
## a[cid] ~ dnorm(1, 0.1)
## b[cid] ~ dnorm(0, 0.3)
## sigma ~ dexp(1)
# With b[cid] ~ dexp(0.3) the slope parameters are constrained to be strictly positive. The Africa slope b[1] changes little, because the data already favor a positive slope there, but the non-Africa slope b[2], which the data pull toward negative values, gets squeezed up against zero; its posterior piles up near that boundary and its effective number of samples drops as the sampler keeps bumping into the constraint.
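# A hedged sketch comparing the posterior for the non-Africa slope b[2] under the
# original dnorm(0, 0.3) prior (model c9e6 fitted above) and the dexp(0.3) prior
# (c9m2_exp); under the exponential prior the posterior mass is forced up against zero.
post_norm <- extract.samples( c9e6 )
post_exp  <- extract.samples( c9m2_exp )
plot( density( post_norm$b[ , 2] ) , main="" , xlab="b[2] (non-Africa slope)" )
lines( density( post_exp$b[ , 2] ) , lty=2 )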

9M3. Re-estimate one of the Stan models from the chapter, but at different numbers of warmup iterations. Be sure to use the same number of sampling iterations in each case. Compare the n_eff values. How much warmup is enough?

data("rugged")

d <- rugged

d$log_gdp <- log(d$rgdppc_2000)

dd <- d[complete.cases(d$rgdppc_2000), ]

dd$log_gdp_std <- dd$log_gdp / mean(dd$log_gdp)
dd$rugged_std <- dd$rugged/max(dd$rugged)

dd$cid <- ifelse(dd$cont_africa==1,1,2)

dat_slim <- list(log_gdp_std = dd$log_gdp_std,
                 rugged_std = dd$rugged_std,
                 cid = as.integer(dd$cid))

c9m3 <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ) , data = dat_slim, chains = 1, warmup = 600)
## 
## SAMPLING FOR MODEL '4e638bc82c4247257e76c1202e0a46c7' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Warmup)
## Chain 1: Iteration: 601 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.05 seconds (Warm-up)
## Chain 1:                0.022 seconds (Sampling)
## Chain 1:                0.072 seconds (Total)
## Chain 1:
show(c9m3)
## Hamiltonian Monte Carlo approximation
## 400 samples from 1 chain
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.05   0.02  0.07
## 
## Formula:
## log_gdp_std ~ dnorm(mu, sigma)
## mu <- a[cid] + b[cid] * (rugged_std - 0.215)
## a[cid] ~ dnorm(1, 0.1)
## b[cid] ~ dnorm(0, 0.3)
## sigma ~ dexp(1)
precis(c9m3, depth = 2)
##             mean          sd        5.5%       94.5%    n_eff     Rhat4
## a[1]   0.8874003 0.014761890  0.86614632  0.91176942 621.8482 0.9992238
## a[2]   1.0503348 0.009156334  1.03575124  1.06476839 440.2099 0.9975548
## b[1]   0.1290877 0.073252133  0.02434694  0.24321866 577.8040 0.9989589
## b[2]  -0.1440997 0.058759258 -0.22735020 -0.05680436 482.6061 0.9982579
## sigma  0.1107652 0.006161731  0.10167188  0.12073659 447.8812 0.9974971
c9m3.1 <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ) , data = dat_slim, chains = 1, warmup = 400)
## 
## SAMPLING FOR MODEL '4e638bc82c4247257e76c1202e0a46c7' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 401 / 1000 [ 40%]  (Sampling)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.032 seconds (Warm-up)
## Chain 1:                0.031 seconds (Sampling)
## Chain 1:                0.063 seconds (Total)
## Chain 1:
show(c9m3.1)
## Hamiltonian Monte Carlo approximation
## 600 samples from 1 chain
## 
## Sampling durations (seconds):
##         warmup sample total
## chain:1   0.03   0.03  0.06
## 
## Formula:
## log_gdp_std ~ dnorm(mu, sigma)
## mu <- a[cid] + b[cid] * (rugged_std - 0.215)
## a[cid] ~ dnorm(1, 0.1)
## b[cid] ~ dnorm(0, 0.3)
## sigma ~ dexp(1)
precis(c9m3.1, depth = 2)
##             mean          sd        5.5%       94.5%     n_eff     Rhat4
## a[1]   0.8876273 0.015792125  0.86272725  0.91218271 1164.4104 0.9992711
## a[2]   1.0504472 0.010111935  1.03409650  1.06715859 1361.1905 0.9985107
## b[1]   0.1358582 0.069990504  0.01856559  0.24667380  883.1196 0.9990920
## b[2]  -0.1392163 0.055920704 -0.22925385 -0.05109806  960.4216 0.9994416
## sigma  0.1112215 0.005883601  0.10213461  0.12033566  958.9318 0.9983375
# With iter left at the default 1000, the warmup=600 run keeps 400 post-warmup samples (n_eff roughly 440-620), while the warmup=400 run keeps 600 samples (n_eff roughly 880-1360). Per post-warmup sample, efficiency is about the same in both runs (n_eff per sample is around or above 1), so the extra warmup buys nothing here: for this simple, well-behaved model a few hundred warmup iterations are already enough, and longer warmup mainly eats into the sampling budget. A fairer comparison would hold the number of sampling iterations fixed, as in the sketch below.
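# A hedged sketch for a fuller comparison: loop over several warmup lengths while holding
# the number of post-warmup samples fixed at 500 (iter = warmup + 500), and record the
# smallest n_eff from each fit (precis() returns an n_eff column for ulam fits, as in the
# output above). The object names warmups and n_eff_min are illustrative.
warmups <- c( 100 , 200 , 400 , 600 , 1000 )
n_eff_min <- sapply( warmups , function(w) {
    fit <- ulam(
        alist(
            log_gdp_std ~ dnorm( mu , sigma ),
            mu <- a[cid] + b[cid] * ( rugged_std - 0.215 ),
            a[cid] ~ dnorm( 1 , 0.1 ),
            b[cid] ~ dnorm( 0 , 0.3 ),
            sigma ~ dexp( 1 )
        ) , data=dat_slim , chains=1 , warmup=w , iter=w + 500 )
    min( precis( fit , depth=2 )$n_eff )
} )
plot( warmups , n_eff_min , type="b" , xlab="warmup iterations" , ylab="minimum n_eff" )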

9H1. Run the model below and then inspect the posterior distribution and explain what it is accomplishing.

mp <- ulam(
 alist(
   a ~ dnorm(0,1),
   b ~ dcauchy(0,1)
 ), data=list(y=1) , chains=1 )
## 
## SAMPLING FOR MODEL 'bcf56ee89f6cf2a4224a4139ff01c7d4' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.015 seconds (Warm-up)
## Chain 1:                0.02 seconds (Sampling)
## Chain 1:                0.035 seconds (Total)
## Chain 1:
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

Compare the samples for the parameters a and b. Can you explain the different trace plots? If you are unfamiliar with the Cauchy distribution, you should look it up. The key feature to attend to is that it has no expected value. Can you connect this fact to the trace plot?

traceplot(mp)
## [1] 1000
## [1] 1
## [1] 1000
# Since y never appears in the model, the chain is simply sampling from the two priors. The trace for a looks like stationary Gaussian noise centered on zero, staying within roughly plus or minus 3, because a ~ Normal(0, 1). The trace for b mostly hovers near zero but occasionally spikes to extreme values (on the order of plus or minus 100), because b ~ Cauchy(0, 1) has such heavy tails that it has no expected value; there is no central value for the chain to settle around, so rare, huge excursions keep appearing no matter how long it runs.
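# A hedged illustration of why the trace for b misbehaves: the running mean of Cauchy
# draws never settles down, because the distribution has no expected value, while the
# running mean of Normal draws converges quickly.
set.seed(11)
norm_draws <- rnorm( 1e4 )
cauchy_draws <- rcauchy( 1e4 )
plot( cumsum(norm_draws) / seq_along(norm_draws) , type="l" , ylim=c(-5, 5) ,
      xlab="sample" , ylab="running mean" )
lines( cumsum(cauchy_draws) / seq_along(cauchy_draws) , lty=2 )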