# Installing some useful packages
library(tidyverse)
library(kableExtra)
library(magrittr)
library(tscount)     # Analysis of Count Time Series
library(gridExtra)   # Combind the ggplot
library(invgamma)    # Inverse-Gamma
library(coda)        # Diagnostics for MCMC
library(data.table)  # Fast read data
library(parallel)    # Parallel computing

Problem 1

Poisson Regression in Time Series: \[ \begin{align} Y_{t} &\sim Poi(\lambda_{t}),\ t = 1,2...n \\ log(\lambda_{t}) &= \alpha + \beta x_{t} + \eta_{t} \\ \eta_{t} &= \phi\eta_{t-1} + a_{t},\quad a_{t}\sim N(0, \sigma^{2}_{a}) \\ \theta &= (\alpha, \beta, \phi, \sigma^{2}_{a}) \end{align} \] Let \({\bf y} = (y_{1}, y_{2},..., y_{n})^{T}\) and \({\bf \eta} = (\eta_{1}, \eta_{2},..., \eta_{n})^{T}\). Given \(\eta_{0}=0\), it is known that \[ \begin{align} p({\bf y},{\bf \eta}|\theta) &= \left\{ \prod_{t=1}^{n} p(y_{t}|\eta_{t},\theta) \right\} p({\bf \eta}|\theta) \\ &= \left\{ \prod_{t=1}^{n} \frac{e^{y_{t}(\alpha + \beta x_{t} + \eta_{t})} e^{-e^{ \alpha + \beta x_{t} + \eta_{t}}}}{y_{t}!}\right\} \left\{ \prod_{t=1}^{n} p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) \right\} \end{align} \] where \(p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) \sim N(\phi \eta_{t-1},\sigma^{2}_{a})\) satisfying \[ \begin{align} p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) = \frac{1}{\sqrt{2 \pi \sigma^{2}_{a}}} exp\left\{\frac{-1}{2\sigma^{2}_{a}}(\eta_{t}-\phi \eta_{t-1})^{2}\right\},\ t = 1, 2, 3,... \end{align} \] The ecoli data (weekly counts from year 2001 week 1 to 2013 week 20; n = 646) are used for this homework. Some descriptions about the data are provided in the next page. From time series plot and boxplots, a clear yearly trend can be seen. For simplicity, we use 4 covariates (sin and cos functions) to model the (periodic) yearly pattern. Also, according to the sample ACF plot, there exists a strong temporal dependence in the series which is modeled by AR(1).
Take the MCMC methods to implement the posterior inference for \((\theta, {\bf \eta})\) given \({\bf y}\) and \(\eta_{0}=0\).

ecoli
year week cases
2001 1 5
2001 2 7
2001 3 17
2001 4 18
2001 5 10
2001 6 8
Note:
dimension = 646 * 3

Data Description:
Weekly number of reported disease cases caused by Escherichia coli(大腸桿菌) in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013.

# Set covariates (to model yearly patterns)
c52 <- cos(2 * pi * ecoli$week / 52)
s52 <- sin(2 * pi * ecoli$week / 52)
c26 <- cos(2 * pi * ecoli$week / 26)
s26 <- sin(2 * pi * ecoli$week / 26)
# Raw estimate for beta 
fit2 <- glm(ecoli$cases ~ c52 + s52 + c26 + s26, 
            family = poisson())
# Raw estimate for AR(1)
TS.fit <- ar.ols(fit2$residuals , order = 1, intercept = FALSE)
# Visualization
c(fit2$coefficients, TS.fit$ar, TS.fit$var.pred) %>%
  t() %>% round(., 4) %>% 
  kable(., "html", caption = "Raw estimate", 
        col.names = c("$\\alpha$", "$\\beta_{1}$", 
                      "$\\beta_{2}$", "$\\beta_{3}$", 
                      "$\\beta_{4}$", "$\\phi$",
                      "$\\sigma^{2}_{a}$")) %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F) %>%
  footnote(general = "Estimation by GLM")
Raw estimate
\(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma^{2}_{a}\)
3.0045 -0.0946 -0.1884 -0.0237 0.0667 0.5751 0.1282
Note:
Estimation by GLM
  1. Set your prior for \(\theta\). You may specify the same setting used in the class. \[ \begin{align} Set\ \left\{ \begin{aligned} \alpha &\sim N(0, \tau^{2}) \\ \beta &\sim N(0, \tau^{2}I_{4}) \\ \phi &\sim U(-1,1) \\ \sigma^{2}_{a} &\sim Inv-gamma(a,b) \end{aligned} \right.\ and\quad \pi(\theta) = \pi(\alpha) \pi(\beta) \pi(\phi) \pi(\sigma^{2}_{a}),\quad Where,\ \beta = (\beta_{1},\beta_{2}, \beta_{3}, \beta_{4}) \end{align} \] where, setting \(\tau = 5, a = 3, b = 2\) (其中,a, b 的設定,是希望Inverse gamma 不會太informative,故給值給小一點)
  2. Derive the full conditional for each of \(\theta\) and \({\bf \eta}\), which could be in un-normalized form.
    < posterior inference for \((\theta, {\bf \eta})\) > \[ \begin{align} p(\theta, {\bf \eta}|{\bf y }) &\propto p({\bf y}|\theta, {\bf \eta}) * p({\bf \eta}|\theta) * \pi(\theta) \\ &\propto p({\bf y}|\theta, {\bf \eta}) * \prod_{t=1}^{n}p(\eta_{t}|\eta_{t-1},\theta) * \pi(\theta) \\ &\propto \prod_{t=1}^{n} e^{(\alpha+\beta x_{t}+\eta_{t})y_{t}} e^{-e^{(\alpha+\beta x_{t}+\eta_{t})}} * \prod_{t=1}^{n} (\sigma^{2}_{a})^{\frac{-1}{2}} exp\left\{ \frac{-(\eta_{t}-\phi \eta_{t-1})^{2}}{2\sigma^{2}_{a}} \right\} * (\tau^{2})^{-\frac{2}{2}} exp\left\{-\frac{\alpha^{2}+\sum_{i=1}^{4}\beta^{2}_{i}}{2\tau^{2}}\right\} * (\sigma^{2}_{a})^{-(a+1)} exp\left\{ \frac{-b}{\sigma^{2}_{a}} \right\} \end{align} \] < Full conditional density > \[ \begin{align} \left\{ \begin{aligned} p(\alpha|\beta,\phi,\sigma^{2}_{a},{\bf y},{\bf \eta}) &\propto e^{\alpha \sum_{t=1}^{n} y_{t}} e^{-e^{\alpha} \sum (e^{\beta x_{t} + \eta_{t}}) } * exp\left\{ -\frac{\alpha^{2}}{2 \tau^{2}} \right\} \\ p(\beta_{j}|\alpha,\phi,\sigma^{2}_{a},{\bf y},{\bf \eta}) &\propto e^{\beta_{j} \sum_{t=1}^{n} x_{jt} y_{t}} e^{\sum_{t=1}^{n}(-e^{\alpha+\beta_{j} x_{jt} + \eta_{t}})} * exp\left\{ -\frac{\beta_{j}^{2}}{2 \tau^{2}} \right\},\quad j = 1,2,3,4 \\ p(\phi|\alpha, \beta,\sigma^{2}_{a}, {\bf y},{\bf \eta}) &\propto exp\left\{ \frac{-1}{2\sigma^{2}_{a}} \sum_{t=1}^{n} (\eta_{t}-\phi \eta_{t-1})^{2} \right\} \\ &\propto exp\left\{ \frac{-1}{2\sigma^{2}_{a}} \left( \sum_{t=1}^{n} \eta^{2}_{t-1} \right) \left( \phi - \frac{\sum_{t=1}^{n} \eta_{t} \eta_{t-1}}{\sum_{t=1}^{n} \eta^{2}_{t-1}}\right)^{2} \right\} \\ &\sim N\left( \frac{\sum_{t=1}^{n} \eta_{t} \eta_{t-1}}{\sum_{t=1}^{n} \eta^{2}_{t-1}},\ \frac{\sigma^{2}_{a}}{\sum_{t=1}^{n} \eta^{2}_{t-1}} \right) \\ p(\sigma^{2}_{a}|\alpha,\beta,\phi,{\bf y}, {\bf \eta}) &\propto (\sigma^{2}_{a})^{\frac{-n}{2}} exp\left\{ \frac{-\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2\sigma^{2}_{a}}\right\} * (\sigma^{2}_{a})^{-(a+1)} exp\left\{ \frac{-b}{\sigma^{2}_{t}} \right\} \\ &\propto (\sigma^{2}_{a})^{-(a+\frac{n}{2}+1)} exp\left\{ \frac{-1}{\sigma^{2}_{a}} \left[ b + \frac{\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2} \right] \right\} \\ &\sim Inv-gamma \left( a^{*} = a+\frac{n}{2},\ b^{*} = b + \frac{\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2} \right) \\ p(\eta_{t}|\alpha,\beta,\phi,\sigma^{2}_{a},{\bf y}) &\propto e^{\eta_{t}y_{t}} e^{-e^{\alpha+\beta x_{t} + \eta_{t}}} * p(\eta_{t}|\eta_{t-1},\theta) * p(\eta_{t+1}|\eta_{t},\theta) \\ &\propto e^{ \eta_{t}y_{t}} e^{-e^{\alpha+\beta x_{t} + \eta_{t}}} * exp\left\{ \frac{-(\eta_{t}-\phi \eta_{t-1})^{2}}{2 \sigma^{2}_{a}} \right\} * exp\left\{ \frac{-(\eta_{t+1}-\phi \eta_{t})^{2}}{2 \sigma^{2}_{a}} \right\} \quad \forall\ t=1,2...646 \end{aligned} \right. \end{align} \]

  3. Construct a Markov chain using M-H algorithm to draw random samples from the joint posterior \(p(\theta, {\bf \eta})\) and summarize the inference results. To achieve this goal, you need to provide the detailed MCMC procedures (proposals for each parameter), convergence checking of Markov chains, etc. Finally, summarize the posterior distribution for each \(\theta\) and selected \(\eta_{t}\) in histogram and summary statistics (e.g., posterior mean and sd).
    < M-H Gibbs Sampler algorithm>
    < Step 1 >
    Give initial value \(\theta^{(0)}=(\alpha^{(0)}, \beta^{(0)}_{1}, \beta^{(0)}_{2}, \beta^{(0)}_{3}, \beta^{(0)}_{4}, \phi^{(0)}, (\sigma^{2}_{a})^{(0)}, {\bf \eta}^{(0)})\)
    < Step 2 >
    For t = 1,2,… \[ \left\{ \begin{aligned} &\alpha^{(t)} \sim p(\alpha|\beta^{(t-1)}_{1}, \beta^{(t-1)}_{2}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{1} \sim p(\beta_{1} | \alpha^{(t)}, \beta^{(t-1)}_{2}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{2} \sim p(\beta_{2} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{3} \sim p(\beta_{3} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t)}_{2}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{4} \sim p(\beta_{4} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t)}_{2}, \beta^{(t)}_{3}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \phi^{(t)} \sim N\left( \frac{\sum_{j=1}^{n} \eta_{j}^{(t-1)} \eta_{j-1}^{(t-1)} }{\sum_{j=1}^{n} (\eta^{(t-1)}_{j-1})^{2}},\ \frac{ (\sigma^{2}_{a})^{(t-1)}}{\sum_{j=1}^{n} (\eta^{(t-1)}_{j-1})^{2}} \right) \\ & (\sigma^{2}_{a})^{(t)} \sim Inv-gamma \left( a^{*} = a+\frac{n}{2},\ b^{*} = b + \frac{\sum_{j=1}^{n} \left( \eta_{j}^{(t-1)} - (\phi)^{(t)} \eta_{j-1}^{(t-1)}\right)^{2} }{2} \right) \\ &(\eta_{j})^{(t)} \sim p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y})\quad \forall\ j=1,2...646 \end{aligned} \right. \\ \] Where, use random walker sampler to draw sample from p(.) and when t iteration \[ \begin{align} &p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y}) \\ \Rightarrow &log\left\{p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y}) \right\} \\ & \propto \left\{ \begin{aligned} &\eta_{j}y_{j} - exp\{ \alpha^{(t)} + \beta^{(t)}x_{jt} + \eta_{j} \} - \frac{( \eta_{j}-\phi^{(t)}*\eta_{j-1}^{(t)})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}} - \frac{( \eta_{j+1}^{(t-1)}-\phi^{(t)}*\eta_{j})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}},\quad j = 1,2...645 \\ &\eta_{j}y_{j} - exp\{ \alpha^{(t)} + \beta^{(t)}x_{jt} + \eta_{j} \} - \frac{( \eta_{j}-\phi^{(t)}*\eta_{j-1}^{(t)})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}}, \quad j = 646 \end{aligned} \right. \end{align} \] < Step 3 >
    Repeat step 1 and step 2 until the markov chain coverge
    According to the above Metropolis-Hasting algorithm, then we write the function.

    # Set a covariate 
    c52 <- cos(2*pi*ecoli$week/52)
    s52 <- sin(2*pi*ecoli$week/52)
    c26 <- cos(2*pi*ecoli$week/26)
    s26 <- sin(2*pi*ecoli$week/26)
    
    # Raw estimate 
    fit2 <- glm(ecoli$cases ~ c52 + s52 + c26 + s26, 
                family = poisson())
    TS.fit <- ar.ols(fit2$residuals , 
                     order = 1, intercept = FALSE)
    
    # Write the function: some full conditional density 
    log.alpha_fun <- function(alpha = 3, 
                              beta = c(1, 2, 3, 4), 
                              eta = fit2$residuals,
                              tau = 10){
      # Define some constant 
      y <- log(ecoli$cases)
      X <- cbind(1, c52, s52, c26, s26)
      BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
      # Calculate the value
      z1 <- (alpha * sum(y)) 
      z2 <- (as.vector(X %*% BETA) + eta) %>% exp %>% sum 
      z3 <- (alpha^2 / (2*tau^2))
      value <- (z1 - z2 - z3)
      return(value)
      }
    
    log.beta_fun <- function(alpha = 3, 
                             beta = c(1, 2, 3, 4), 
                             eta = fit2$residuals,
                             tau = 10, 
                             j = 1){
      # Define some constant 
      y <- log(ecoli$cases)
      X <- cbind(1, c52, s52, c26, s26)
      BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
      # Calculate the value
      z1 <- (beta * sum(y * X[, j + 1]) ) 
      z2 <- (as.vector(X %*% BETA) + eta) %>% exp %>% sum 
      z3 <- (beta^2 / (2*tau^2))
      value <- (z1 - z2 - z3)
      return(value)
      }
    
    log.eta_fun <- function(eta_t = c(1, 2, 3), 
                            alpha = 3, 
                            beta  = c(1,2,3,4),
                            phi = 0.2,
                            sigma2_a = 0.15,
                            t = 1){
      # Define some constant 
      y <- log(ecoli$cases)
      X <- cbind(1, c52, s52, c26, s26)
      BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
      # Calculate the value
      z1 <- eta_t[2]*y[t] 
      z2 <- ((X[t, ] %*% BETA ) + eta_t[2]) %>% exp
      z3 <- (eta_t[2] - phi*eta_t[1])^2 / (2*sigma2_a)
      z4 <- (eta_t[3] - phi*eta_t[2])^2 / (2*sigma2_a)
      if(t <= 645){
        value <- z1 - z2 - z3 - z4
        }else {
          value <- z1 - z2 - z3
        }
      return(value)
      }
    # M-H algorithm
    MH.Gibbs_Sampler <- function(coef_0 = fit2$coefficients %>%
                                   t %>% t,
                                 phi_0 = 0.575, 
                                 sigma2_0 = TS.fit$var.pred,
                                 eta_0 = fit2$residuals,
                                 a = 3, b = 2,  tau = 10, 
                                 iter = 10^3){
      # Define some constant 
      ## 1. alpha
      alpha <- succ_alpha <- as.numeric(iter)
      alpha[1] <- coef_0[1]
      ## 2. beta_j , j = 1, 2, 3, 4
      Beta <- matrix(NA, nrow = iter, ncol = 4) 
      Beta[1, ] <- coef_0[-1]
      ## 3. phi and sigma^{2}_{a}
      phi <- succ_phi <- as.numeric(iter)
      phi[1] <- phi_0
      sigma2_a <- succ_sigma_a <- as.numeric(iter)
      sigma2_a[1] <- sigma2_0
      ## 4. eta 
      Eta <- matrix(NA, nrow = iter, ncol = 646) 
      Eta[1, ] <- eta_0
    
      for (i in 2:iter) {
        # 1. Draw alpha - by Random Walker sampler
        ## Step 1 
        alpha[i] <- rnorm(1, mean = alpha[i-1], sd = tau) # sd = tau
        ##  Step 2 : Calculate the ratio
        r <- (log.alpha_fun(alpha = alpha[i], 
                            beta = Beta[i-1, ], 
                            eta = Eta[i-1, ], 
                            tau = tau) - 
                log.alpha_fun(alpha = alpha[i-1], 
                              beta = Beta[i-1, ], 
                              eta = Eta[i-1, ], 
                              tau = tau)) %>% exp
        succ_alpha[i] <- rbinom(n = 1, 
                                size = 1, prob = min(r, 1))
        if(succ_alpha[i] == 0) {
          alpha[i] <- alpha[i-1] 
          } 
    
        # 2. Draw beta - by Random Walker sampler 
        for (j in 1:4) {
          succ_beta <- as.numeric(iter)
          ## Step 1 
          Beta[i, j] <- rnorm(1, mean = Beta[i-1, j], sd = tau)
          # Determine statements
          if(j == 1) {
            Beta_start <- c(Beta[i, j], Beta[i-1, (j+1):4])
            Beta_minus <- c(Beta[i-1, j], Beta[i-1, (j+1):4])
            }else if(j <= 3) {
              Beta_start <- c(Beta[i, 1:j], Beta[i-1, (j+1):4])
              Beta_minus <- c(Beta[i, (1:j-1)], Beta[i-1, j], Beta[i-1, (j+1):4])
              }else {
                Beta_start <- Beta[i, 1:j]
                Beta_minus <- c(Beta[i, (1:j-1)], Beta[i-1, j])
                }
          ## Step 2 : Calculate the ratio
          r <- (log.beta_fun(alpha = alpha[i], 
                             beta = Beta_start, 
                             eta = Eta[i-1, ],
                             tau = tau,
                             j = j) - 
                  log.beta_fun(alpha = alpha[i], 
                               beta = Beta_minus,
                               eta = Eta[i-1, ], 
                               tau = tau,
                               j = j) ) %>% exp
          succ_beta[i] <- rbinom(n = 1, 
                                 size = 1, prob = min(r, 1))
          if(succ_beta[i] == 0) {
            Beta[i, j] <- Beta[i-1, j] 
          }
          }
    
        # 3. Draw phi and sigma^2_{a} - by Gibbs Sampler
        eta_t <- Eta[i-1, ]
        eta_t_1 <- c(0, eta_t)[-647]
    
        phi[i] <- rnorm(1, 
                        mean = sum(eta_t * eta_t_1)/sum(eta_t_1^2), 
                        sd = sqrt(sigma2_a[i-1] /sum(eta_t_1^2)) )
        sigma2_a[i] <- rinvgamma(n = 1, 
                                 shape =  a + (length(eta_t) / 2), 
                                 rate = b + (sum( (eta_t - phi[i]*eta_t_1)^2 ) /2))
    
        # 4. Draw eta - by Random Walker sampler
        for (j in 1: 646){
          succ_eta <- as.numeric(iter)
          ## Step 1 
          Eta[i, j] <- rnorm(1, mean = Eta[i-1, j], sd = tau) 
          # Determine statements
          if(j == 1){
            Eta_start <- c(0, Eta[i, j], Eta[i-1, j+1])
            Eta_minus <- c(0, Eta[i-1, j:2])
            }else if(j <= 645){
              Eta_start <- c(Eta[i, (j-1):j], Eta[i-1, j+1])
              Eta_minus <- c(Eta[i, (j-1):(j-1)], Eta[i-1, j:(j+1)])
              }else{
                Eta_start <- Eta[i, (j-1):j]
                Eta_minus <- c(Eta[i, (j-1):(j-1)], Eta[i-1, j])
                }
          ## Step 2 : Calculate the ratio
          r <- (log.eta_fun(eta_t = Eta_start, 
                            alpha = alpha[i], 
                            beta = Beta[i, ], 
                            phi = phi[i], 
                            sigma2_a = sigma2_a[i], 
                            t = j) - 
                  log.eta_fun(eta_t = Eta_minus, 
                              alpha = alpha[i], 
                              beta = Beta[i, ], 
                              phi = phi[i], 
                              sigma2_a = sigma2_a[i], 
                              t = j)) %>% exp
          succ_eta[i] <- rbinom(n = 1, 
                                size = 1, prob = min(r, 1))
          if(succ_eta[i] == 0) {
            Eta[i, j] <- Eta[i-1, j] 
          }
          }
        print(i)
        }
      MCMC_result <- cbind(alpha, Beta, phi, sigma2_a, Eta)
      return(MCMC_result)}
    # Setting Parallel Computing
    Beta_Test <- matrix(c(fit2$coefficients, 
                      TS.fit$ar, TS.fit$var.pred,
                      1, -1, .2, .3, .01, .5, .1,
                      2, -2, -.2, .3, .03, -.5, .2), 
                      byrow = T, ncol = 7)
    # detectCores()              # Calculate the core of CPU
    cl <- makeCluster(4)         # Construct Cluster
    # 讓所有節點載入指定套件
    clusterEvalQ(cl, 
                 c(library(tscount), library(invgamma), 
                   library(magrittr)))
    ## [[1]]
    ##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
    ##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
    ## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
    ## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [26] "methods"   "base"     
    ## 
    ## [[2]]
    ##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
    ##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
    ## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
    ## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [26] "methods"   "base"     
    ## 
    ## [[3]]
    ##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
    ##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
    ## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
    ## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [26] "methods"   "base"     
    ## 
    ## [[4]]
    ##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
    ##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
    ## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
    ## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
    ## [26] "methods"   "base"
    # 設定全域變數
    clusterExport(cl, 
                  c("c52", "c26", "s26", "s52",
                    "MH.Gibbs_Sampler", "Beta_Test", 
                    "fit2", "log.alpha_fun", "log.beta_fun", 
                    "log.eta_fun"))
    # parallel computing - main part 
    set.seed(2019)
    
    MCMC_List <- parLapply(cl, 1 : 3, function(x){
      mcmc_tmp <- MH.Gibbs_Sampler(coef_0 = Beta_Test[x, 1 : 5] %>% t %>% t, 
                                   phi_0 = Beta_Test[x, 6], 
                                   sigma2_0 = Beta_Test[x, 7],
                                   eta_0 = fit2$residuals,
                                   tau = 5, 
                                   iter = 1*10^1) 
      # Save MCMC 
      # write.table(mcmc_tmp, 
      #            file = paste0('MCMC_(3w)_', x, '.txt'), 
      #            row.names = FALSE)
      return(mcmc_tmp)
      })
    
    stopCluster(cl)   # Stop parallel computing 

    寫出Metropolis-Hasting algorithm寫成函數後,在給定三個不同initial value 下並迭代十五萬次得到三個Markon chain,並算Gelman Rubin Statistic判斷收斂與否,其中burn-in為2000

    因電腦運算太久,故先在其他電腦進行運算,並將結果儲存起來,再匯入此次作業

    # Different Markov chain Monte Carlo
    setwd("/Users/linweixiang/R/Bayesian Analysis/Dataset/MCMC")
    time.start <- Sys.time()
    mcmc1 <- fread("MCMC_15w_1.txt", header = TRUE) %>%
      as.data.frame()
    mcmc2 <- fread("MCMC_15w_2.txt", header = TRUE) %>% 
      as.data.frame()
    mcmc3 <- fread("MCMC_15w_3.txt", header = TRUE) %>% 
      as.data.frame()
    time.stop <- Sys.time()
    time.stop - time.start
    ## Time difference of 1.412277 mins

    上述時間,表讀入三個15萬筆的Markov Chain所需時間

    三個不同起始值,分別為

    rbind(mcmc1[1, c(1:7)], mcmc2[1, c(1:7)], 
          mcmc3[1, c(1:7)]) %>% round(., 4) %>% 
      `rownames<-`(., c("$\\theta_{1}$", "$\\theta_{2}$", "$\\theta_{3}$")) %>% 
      `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                        "$\\beta_{2}$", "$\\beta_{3}$", 
                        "$\\beta_{4}$", "$\\phi$", 
                        "$\\sigma_{a}^{2}$")) %>% 
      kable(., "html", caption = "Different initial value") %>%
      kable_styling(bootstrap_options = "striped", 
                    full_width = F) %>% 
      footnote(general = "$\\eta_{t}$ be the residual of GLM")
    Different initial value
    \(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma_{a}^{2}\)
    \(\theta_{1}\) 3.0045 -0.0946 -0.1884 -0.0237 0.0667 0.5751 0.1282
    \(\theta_{2}\) -3.0045 0.0946 0.1884 0.0237 -0.0667 0.1000 0.4000
    \(\theta_{3}\) 1.0000 -1.0000 0.2000 0.3000 0.0100 0.7000 0.1000
    Note:
    \(\eta_{t}\) be the residual of GLM

    Gelman Rubin Statistic is

    # Check convergence using multiple chains with different initials: by Gelman Rubin statistic
    GR_table <- matrix(NA, ncol = 1, nrow = 9) %>% 
      `colnames<-`(., c("GR_statistic"))
    l <- 1
    for(i in c(1:8, 653)){
      chain1 <- mcmc(mcmc1[-c(1:2000), i]) 
      chain2 <- mcmc(mcmc2[-c(1:2000), i]) 
      chain3 <- mcmc(mcmc3[-c(1:2000), i]) 
      combinedchains <- mcmc.list(chain1, chain2, chain3)
      GR_table[l, 1] <- gelman.diag(combinedchains)$psrf[1]
      l = l + 1
    }
    
    GR_table %>% round(., 4) %>% 
      `rownames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                        "$\\beta_{2}$", "$\\beta_{3}$", 
                        "$\\beta_{4}$", "$\\phi$", 
                        "$\\sigma_{a}^{2}$", "$\\eta_{1}$", 
                        "$\\eta_{646}$")) %>% t %>% 
      kable(., "html", caption = "Gelman Rubin statistic") %>%
      kable_styling(bootstrap_options = "striped", 
                    full_width = F)
    Gelman Rubin statistic
    \(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma_{a}^{2}\) \(\eta_{1}\) \(\eta_{646}\)
    GR_statistic 1.0007 1.0017 1.0037 1.0021 1.0146 1.0016 1.004 1.0001 1.0006

    由上述結果可知,因為參數的Gelman Rubin Statistic接靠近1,故認為posterior distribution converge.且由於posterior distribution converge 通常 latent variables 的 posterior distribution 亦會收斂(為了表格顯示簡易,只匯出\(\eta_{1}\), \(\eta_{646}\)可看出這兩個latent variable 亦收斂)

    # Compare Markov Chain
    rbind(mcmc1[-c(1:2000), 1:7] %>% colMeans(), 
          mcmc2[-c(1:2000), 1:7] %>% colMeans(),
          mcmc3[-c(1:2000), 1:7] %>% colMeans()) %>% 
      round(., 4) %>% 
      `rownames<-`(., c("Chain 1", "Chain 2", "Chain 3")) %>% 
      `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                        "$\\beta_{2}$", "$\\beta_{3}$", 
                        "$\\beta_{4}$", "$\\phi$", 
                        "$\\sigma_{a}^{2}$")) %>% 
      kable(., "html", caption = "posterior mean") %>%
      kable_styling(bootstrap_options = "striped", 
                    full_width = F)
    posterior mean
    \(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma_{a}^{2}\)
    Chain 1 1.0480 -0.0215 -0.0563 -0.0052 0.0052 0.0371 0.0490
    Chain 2 0.9687 -0.0127 -0.0591 -0.0027 0.0002 0.1295 0.0485
    Chain 3 1.0472 -0.0211 -0.0562 -0.0038 0.0037 0.0367 0.0489

    由上述結果可知,因為收斂,所以三個Markov chain 的posterior mean 差不多,故後續分析使用 Chain 1 來做分析

    < Posterior inference >
    由於是使用Random walker Sampler 生成,故得出的Markov Chain相關性較高,故選每間隔200次抽一個樣本,使Markov Chain相關性降低

    # Setting MCMC chain 
    mcmc1 <- mcmc1[-c(1:2000), ]
    # 每200個抽一次
    n_row <- seq(1, dim(mcmc1)[1], 200) 
    mcmc_1 <- matrix(NA, 
                     nrow = n_row %>% length(), ncol = 653)
    k <- 1 
    for (i in n_row) {
     mcmc_1[k, ]<- mcmc1[i, ] %>% as.numeric() 
     k = k + 1 
    }
    mcmc1 <- mcmc_1 
    # Posterior inference
    cbind(mcmc1[, c(1:8, 653)] %>% apply(., 2, mean), 
          mcmc1[, c(1:8, 653)] %>% apply(., 2, sd)) %>% 
      round(., 4) %>% t %>% 
      `rownames<-`(., c("mean", "sd")) %>% 
      `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                        "$\\beta_{2}$", "$\\beta_{3}$", 
                        "$\\beta_{4}$", "$\\phi$", 
                        "$\\sigma_{a}^{2}$", "$\\eta_{1}$", 
                        "$\\eta_{646}$")) %>% 
      kable(., "html", caption = "posterior inference") %>%
      kable_styling(bootstrap_options = "striped", 
                    full_width = F)
    posterior inference
    \(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma_{a}^{2}\) \(\eta_{1}\) \(\eta_{646}\)
    mean 1.0484 -0.0209 -0.0565 -0.0054 0.0054 0.0376 0.0491 -0.0483 -0.0247
    sd 0.0253 0.0320 0.0294 0.0296 0.0284 0.1155 0.0058 0.2071 0.2033

    Above the table is posterior inference of parameter.

    # Visualization for parameter
    xlabs <- c(expression(alpha), expression(beta[1]),
               expression(beta[2]), expression(beta[3]), 
               expression(beta[4]), expression(phi),
               expression(sigma[a]^2))
    for (i in 1:7) {
      name <- paste("g", i, sep = "_")
      g <-  data.frame(x = mcmc1[, i]) %>% 
        ggplot(aes(x = x)) +
        geom_histogram(aes(y = ..density..),
                       bins = 30, 
                       fill = 'light blue', 
                       color = 'black') + 
        geom_density() + 
        labs(x = xlabs[i], y = "") + 
        theme(axis.title.x = element_text(size = 12))
      assign(name, g)
    }
    # paste("g", 1:7, sep = "_", collapse = ",") %>% noquote() 
    grid.arrange(g_1, g_2, g_3, g_4, g_5, g_6, g_7, ncol = 2)

    Above the figure is histogram of parameter.

    g1 <- ggplot(data = NULL, aes(x = mcmc1[, 8])) + 
      geom_histogram(aes(y = ..density..),
                     bins = 40, 
                     fill = 'light blue', color = 'black') + 
      geom_density()  +
      labs(x = expression(eta[1])) + 
      theme(axis.title.x = element_text(size = 13))
    
    g2 <- ggplot(data = NULL, aes(x = mcmc1[, 653])) + 
      geom_histogram(aes(y = ..density..),
                     bins = 40, 
                     fill = 'light blue', color = 'black') + 
      geom_density()  +
      labs(x = expression(eta[646])) + 
      theme(axis.title.x = element_text(size = 13))
    # Visualization for latent variable eta_{t} 
    xlabs <- c(expression(eta[1]), expression(eta[646]))
    l <- 1
    for (i in c(8, 653)) {
      name <- paste("g", l, sep = "")
      g <-  data.frame(x = mcmc1[, i]) %>% 
        ggplot(aes(x = x)) +
        geom_histogram(aes(y = ..density..),
                       bins = 30, 
                       fill = 'light blue', 
                       color = 'black') + 
        geom_density() + 
        labs(x = xlabs[l], y = "") + 
        theme(axis.title.x = element_text(size = 13))
      assign(name, g)
      l = l + 1
    }
    
    eta.mean <- mcmc1[, c(8:653)] %>% 
      apply(., 2, mean) %>% as.vector()
    
    g3 <- ggplot(data = NULL, aes(x = seq_along(eta.mean), 
                                  y = eta.mean)) + 
      geom_line() + 
      labs(x = "t", y = expression(E(eta[t]))) + 
      theme(axis.title.y = element_text(size = 13))
    
    # Combine above the ggplot 
    grid.arrange( arrangeGrob(g1, g2, ncol = 2), g3)

    Above the figure is posterior inference of \(\eta_{t}\).
    Note:
    用Random walker Sampler 生成三個十五萬筆的Markov Chain約需要20小時

  4. Comment on your inference results.
    由上述posterior inference 可看出,用Bayesian Analysis所估計的參數\((\alpha, \beta_{1}, \beta_{2}, \beta_{3}, \beta_{4}, \phi, \sigma^{2}_{a})\)和直接用GLM所估計的參數不同,即

    cbind(c(fit2$coefficients, TS.fit$ar, TS.fit$var.pred) , 
          mcmc1[, c(1:7)] %>% apply(., 2, mean)) %>% 
      round(., 4) %>% t %>% 
      `rownames<-`(., c("GLM", "Bayesian")) %>% 
      `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                        "$\\beta_{2}$", "$\\beta_{3}$", 
                        "$\\beta_{4}$", "$\\phi$", 
                        "$\\sigma_{a}^{2}$")) %>% 
      kable(., "html") %>%
      kable_styling(bootstrap_options = "striped", 
                    full_width = F)
    \(\alpha\) \(\beta_{1}\) \(\beta_{2}\) \(\beta_{3}\) \(\beta_{4}\) \(\phi\) \(\sigma_{a}^{2}\)
    GLM 3.0045 -0.0946 -0.1884 -0.0237 0.0667 0.5751 0.1282
    Bayesian 1.0484 -0.0209 -0.0565 -0.0054 0.0054 0.0376 0.0491

    推測其原因是原本用glm去估計沒有考慮到殘差項的時間相依性,但用Bayesian去估計有考慮到,所以兩個方法估計出來不同頗合理。