Problem 1

Poisson Regression in Time Series: \[ \begin{align} Y_{t} &\sim Poi(\lambda_{t}),\ t = 1,2...n \\ log(\lambda_{t}) &= \alpha + \beta x_{t} + \eta_{t} \\ \eta_{t} &= \phi\eta_{t-1} + a_{t},\quad a_{t}\sim N(0, \sigma^{2}_{a}) \\ \theta &= (\alpha, \beta, \phi, \sigma^{2}_{a}) \end{align} \] Let \({\bf y} = (y_{1}, y_{2},..., y_{n})^{T}\) and \({\bf \eta} = (\eta_{1}, \eta_{2},..., \eta_{n})^{T}\). Given \(\eta_{0}=0\), it is known that \[ \begin{align} p({\bf y},{\bf \eta}|\theta) &= \left\{ \prod_{t=1}^{n} p(y_{t}|\eta_{t},\theta) \right\} p({\bf \eta}|\theta) \\ &= \left\{ \prod_{t=1}^{n} \frac{e^{y_{t}(\alpha + \beta x_{t} + \eta_{t})} e^{-e^{ \alpha + \beta x_{t} + \eta_{t}}}}{y_{t}!}\right\} \left\{ \prod_{t=1}^{n} p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) \right\} \end{align} \] where \(p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) \sim N(\phi \eta_{t-1},\sigma^{2}_{a})\) satisfying \[ \begin{align} p(\eta_{t}|\eta_{t-1},\phi,\sigma^{2}_{a}) = \frac{1}{\sqrt{2 \pi \sigma^{2}_{a}}} exp\left\{\frac{-1}{2\sigma^{2}_{a}}(\eta_{t}-\phi \eta_{t-1})^{2}\right\},\ t = 1, 2, 3,... \end{align} \] The ecoli data (weekly counts from year 2001 week 1 to 2013 week 20; n = 646) are used for this homework. Some descriptions about the data are provided in the next page. From time series plot and boxplots, a clear yearly trend can be seen. For simplicity, we use 4 covariates (sin and cos functions) to model the (periodic) yearly pattern. Also, according to the sample ACF plot, there exists a strong temporal dependence in the series which is modeled by AR(1).
Take the MCMC methods to implement the posterior inference for \((\theta, {\bf \eta})\) given \({\bf y}\) and \(\eta_{0}=0\).

ecoli
year	week	cases
2001	1	5
2001	2	7
2001	3	17
2001	4	18
2001	5	10
2001	6	8
Note:
dimension = 646 * 3

Data Description:
Weekly number of reported disease cases caused by Escherichia coli(大腸桿菌) in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013.

# Set covariates (to model yearly patterns)
c52 <- cos(2 * pi * ecoli$week / 52)
s52 <- sin(2 * pi * ecoli$week / 52)
c26 <- cos(2 * pi * ecoli$week / 26)
s26 <- sin(2 * pi * ecoli$week / 26)
# Raw estimate for beta 
fit2 <- glm(ecoli$cases ~ c52 + s52 + c26 + s26, 
            family = poisson())
# Raw estimate for AR(1)
TS.fit <- ar.ols(fit2$residuals , order = 1, intercept = FALSE)
# Visualization
c(fit2$coefficients, TS.fit$ar, TS.fit$var.pred) %>%
  t() %>% round(., 4) %>% 
  kable(., "html", caption = "Raw estimate", 
        col.names = c("$\\alpha$", "$\\beta_{1}$", 
                      "$\\beta_{2}$", "$\\beta_{3}$", 
                      "$\\beta_{4}$", "$\\phi$",
                      "$\\sigma^{2}_{a}$")) %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F) %>%
  footnote(general = "Estimation by GLM")

Raw estimate
\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma^{2}_{a}\)
3.0045	-0.0946	-0.1884	-0.0237	0.0667	0.5751	0.1282
Note:
Estimation by GLM

Set your prior for \(\theta\). You may specify the same setting used in the class. \[ \begin{align} Set\ \left\{ \begin{aligned} \alpha &\sim N(0, \tau^{2}) \\ \beta &\sim N(0, \tau^{2}I_{4}) \\ \phi &\sim U(-1,1) \\ \sigma^{2}_{a} &\sim Inv-gamma(a,b) \end{aligned} \right.\ and\quad \pi(\theta) = \pi(\alpha) \pi(\beta) \pi(\phi) \pi(\sigma^{2}_{a}),\quad Where,\ \beta = (\beta_{1},\beta_{2}, \beta_{3}, \beta_{4}) \end{align} \] where, setting \(\tau = 5, a = 3, b = 2\) (其中,a, b 的設定，是希望Inverse gamma 不會太informative，故給值給小一點)
Derive the full conditional for each of \(\theta\) and \({\bf \eta}\), which could be in un-normalized form.
< posterior inference for \((\theta, {\bf \eta})\) > \[ \begin{align} p(\theta, {\bf \eta}|{\bf y }) &\propto p({\bf y}|\theta, {\bf \eta}) * p({\bf \eta}|\theta) * \pi(\theta) \\ &\propto p({\bf y}|\theta, {\bf \eta}) * \prod_{t=1}^{n}p(\eta_{t}|\eta_{t-1},\theta) * \pi(\theta) \\ &\propto \prod_{t=1}^{n} e^{(\alpha+\beta x_{t}+\eta_{t})y_{t}} e^{-e^{(\alpha+\beta x_{t}+\eta_{t})}} * \prod_{t=1}^{n} (\sigma^{2}_{a})^{\frac{-1}{2}} exp\left\{ \frac{-(\eta_{t}-\phi \eta_{t-1})^{2}}{2\sigma^{2}_{a}} \right\} * (\tau^{2})^{-\frac{2}{2}} exp\left\{-\frac{\alpha^{2}+\sum_{i=1}^{4}\beta^{2}_{i}}{2\tau^{2}}\right\} * (\sigma^{2}_{a})^{-(a+1)} exp\left\{ \frac{-b}{\sigma^{2}_{a}} \right\} \end{align} \] < Full conditional density > \[ \begin{align} \left\{ \begin{aligned} p(\alpha|\beta,\phi,\sigma^{2}_{a},{\bf y},{\bf \eta}) &\propto e^{\alpha \sum_{t=1}^{n} y_{t}} e^{-e^{\alpha} \sum (e^{\beta x_{t} + \eta_{t}}) } * exp\left\{ -\frac{\alpha^{2}}{2 \tau^{2}} \right\} \\ p(\beta_{j}|\alpha,\phi,\sigma^{2}_{a},{\bf y},{\bf \eta}) &\propto e^{\beta_{j} \sum_{t=1}^{n} x_{jt} y_{t}} e^{\sum_{t=1}^{n}(-e^{\alpha+\beta_{j} x_{jt} + \eta_{t}})} * exp\left\{ -\frac{\beta_{j}^{2}}{2 \tau^{2}} \right\},\quad j = 1,2,3,4 \\ p(\phi|\alpha, \beta,\sigma^{2}_{a}, {\bf y},{\bf \eta}) &\propto exp\left\{ \frac{-1}{2\sigma^{2}_{a}} \sum_{t=1}^{n} (\eta_{t}-\phi \eta_{t-1})^{2} \right\} \\ &\propto exp\left\{ \frac{-1}{2\sigma^{2}_{a}} \left( \sum_{t=1}^{n} \eta^{2}_{t-1} \right) \left( \phi - \frac{\sum_{t=1}^{n} \eta_{t} \eta_{t-1}}{\sum_{t=1}^{n} \eta^{2}_{t-1}}\right)^{2} \right\} \\ &\sim N\left( \frac{\sum_{t=1}^{n} \eta_{t} \eta_{t-1}}{\sum_{t=1}^{n} \eta^{2}_{t-1}},\ \frac{\sigma^{2}_{a}}{\sum_{t=1}^{n} \eta^{2}_{t-1}} \right) \\ p(\sigma^{2}_{a}|\alpha,\beta,\phi,{\bf y}, {\bf \eta}) &\propto (\sigma^{2}_{a})^{\frac{-n}{2}} exp\left\{ \frac{-\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2\sigma^{2}_{a}}\right\} * (\sigma^{2}_{a})^{-(a+1)} exp\left\{ \frac{-b}{\sigma^{2}_{t}} \right\} \\ &\propto (\sigma^{2}_{a})^{-(a+\frac{n}{2}+1)} exp\left\{ \frac{-1}{\sigma^{2}_{a}} \left[ b + \frac{\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2} \right] \right\} \\ &\sim Inv-gamma \left( a^{*} = a+\frac{n}{2},\ b^{*} = b + \frac{\sum_{t=1}^{n} \left( \eta_{t} - \phi \eta_{t-1}\right)^{2} }{2} \right) \\ p(\eta_{t}|\alpha,\beta,\phi,\sigma^{2}_{a},{\bf y}) &\propto e^{\eta_{t}y_{t}} e^{-e^{\alpha+\beta x_{t} + \eta_{t}}} * p(\eta_{t}|\eta_{t-1},\theta) * p(\eta_{t+1}|\eta_{t},\theta) \\ &\propto e^{ \eta_{t}y_{t}} e^{-e^{\alpha+\beta x_{t} + \eta_{t}}} * exp\left\{ \frac{-(\eta_{t}-\phi \eta_{t-1})^{2}}{2 \sigma^{2}_{a}} \right\} * exp\left\{ \frac{-(\eta_{t+1}-\phi \eta_{t})^{2}}{2 \sigma^{2}_{a}} \right\} \quad \forall\ t=1,2...646 \end{aligned} \right. \end{align} \]

Construct a Markov chain using M-H algorithm to draw random samples from the joint posterior \(p(\theta, {\bf \eta})\) and summarize the inference results. To achieve this goal, you need to provide the detailed MCMC procedures (proposals for each parameter), convergence checking of Markov chains, etc. Finally, summarize the posterior distribution for each \(\theta\) and selected \(\eta_{t}\) in histogram and summary statistics (e.g., posterior mean and sd).
< M-H Gibbs Sampler algorithm>
< Step 1 >
Give initial value \(\theta^{(0)}=(\alpha^{(0)}, \beta^{(0)}_{1}, \beta^{(0)}_{2}, \beta^{(0)}_{3}, \beta^{(0)}_{4}, \phi^{(0)}, (\sigma^{2}_{a})^{(0)}, {\bf \eta}^{(0)})\)
< Step 2 >
For t = 1,2,… \[ \left\{ \begin{aligned} &\alpha^{(t)} \sim p(\alpha|\beta^{(t-1)}_{1}, \beta^{(t-1)}_{2}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{1} \sim p(\beta_{1} | \alpha^{(t)}, \beta^{(t-1)}_{2}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{2} \sim p(\beta_{2} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t-1)}_{3}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{3} \sim p(\beta_{3} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t)}_{2}, \beta^{(t-1)}_{4}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \beta^{(t)}_{4} \sim p(\beta_{4} | \alpha^{(t)}, \beta^{(t)}_{1}, \beta^{(t)}_{2}, \beta^{(t)}_{3}, \phi^{(t-1)}, (\sigma^{2}_{a})^{(t-1)}, {\bf y}, {\bf \eta}^{(t-1)}) \\ & \phi^{(t)} \sim N\left( \frac{\sum_{j=1}^{n} \eta_{j}^{(t-1)} \eta_{j-1}^{(t-1)} }{\sum_{j=1}^{n} (\eta^{(t-1)}_{j-1})^{2}},\ \frac{ (\sigma^{2}_{a})^{(t-1)}}{\sum_{j=1}^{n} (\eta^{(t-1)}_{j-1})^{2}} \right) \\ & (\sigma^{2}_{a})^{(t)} \sim Inv-gamma \left( a^{*} = a+\frac{n}{2},\ b^{*} = b + \frac{\sum_{j=1}^{n} \left( \eta_{j}^{(t-1)} - (\phi)^{(t)} \eta_{j-1}^{(t-1)}\right)^{2} }{2} \right) \\ &(\eta_{j})^{(t)} \sim p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y})\quad \forall\ j=1,2...646 \end{aligned} \right. \\ \] Where, use random walker sampler to draw sample from p(.) and when t iteration \[ \begin{align} &p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y}) \\ \Rightarrow &log\left\{p(\eta_{j}|\alpha^{(t)},\beta^{(t)},\phi^{(t)},(\sigma^{2}_{a})^{(t)}, (\eta_{j-1})^{(t)},(\eta_{j+1})^{(t-1)},{\bf y}) \right\} \\ & \propto \left\{ \begin{aligned} &\eta_{j}y_{j} - exp\{ \alpha^{(t)} + \beta^{(t)}x_{jt} + \eta_{j} \} - \frac{( \eta_{j}-\phi^{(t)}*\eta_{j-1}^{(t)})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}} - \frac{( \eta_{j+1}^{(t-1)}-\phi^{(t)}*\eta_{j})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}},\quad j = 1,2...645 \\ &\eta_{j}y_{j} - exp\{ \alpha^{(t)} + \beta^{(t)}x_{jt} + \eta_{j} \} - \frac{( \eta_{j}-\phi^{(t)}*\eta_{j-1}^{(t)})^{2}}{ 2*(\sigma^{2}_{a})^{(t)}}, \quad j = 646 \end{aligned} \right. \end{align} \] < Step 3 >
Repeat step 1 and step 2 until the markov chain coverge
According to the above Metropolis-Hasting algorithm, then we write the function.

# Set a covariate 
c52 <- cos(2*pi*ecoli$week/52)
s52 <- sin(2*pi*ecoli$week/52)
c26 <- cos(2*pi*ecoli$week/26)
s26 <- sin(2*pi*ecoli$week/26)

# Raw estimate 
fit2 <- glm(ecoli$cases ~ c52 + s52 + c26 + s26, 
            family = poisson())
TS.fit <- ar.ols(fit2$residuals , 
                 order = 1, intercept = FALSE)

# Write the function: some full conditional density 
log.alpha_fun <- function(alpha = 3, 
                          beta = c(1, 2, 3, 4), 
                          eta = fit2$residuals,
                          tau = 10){
  # Define some constant 
  y <- log(ecoli$cases)
  X <- cbind(1, c52, s52, c26, s26)
  BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
  # Calculate the value
  z1 <- (alpha * sum(y)) 
  z2 <- (as.vector(X %*% BETA) + eta) %>% exp %>% sum 
  z3 <- (alpha^2 / (2*tau^2))
  value <- (z1 - z2 - z3)
  return(value)
  }

log.beta_fun <- function(alpha = 3, 
                         beta = c(1, 2, 3, 4), 
                         eta = fit2$residuals,
                         tau = 10, 
                         j = 1){
  # Define some constant 
  y <- log(ecoli$cases)
  X <- cbind(1, c52, s52, c26, s26)
  BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
  # Calculate the value
  z1 <- (beta * sum(y * X[, j + 1]) ) 
  z2 <- (as.vector(X %*% BETA) + eta) %>% exp %>% sum 
  z3 <- (beta^2 / (2*tau^2))
  value <- (z1 - z2 - z3)
  return(value)
  }

log.eta_fun <- function(eta_t = c(1, 2, 3), 
                        alpha = 3, 
                        beta  = c(1,2,3,4),
                        phi = 0.2,
                        sigma2_a = 0.15,
                        t = 1){
  # Define some constant 
  y <- log(ecoli$cases)
  X <- cbind(1, c52, s52, c26, s26)
  BETA <- c(alpha, beta) %>% t %>% t  # Matrix 
  # Calculate the value
  z1 <- eta_t[2]*y[t] 
  z2 <- ((X[t, ] %*% BETA ) + eta_t[2]) %>% exp
  z3 <- (eta_t[2] - phi*eta_t[1])^2 / (2*sigma2_a)
  z4 <- (eta_t[3] - phi*eta_t[2])^2 / (2*sigma2_a)
  if(t <= 645){
    value <- z1 - z2 - z3 - z4
    }else {
      value <- z1 - z2 - z3
    }
  return(value)
  }

# M-H algorithm
MH.Gibbs_Sampler <- function(coef_0 = fit2$coefficients %>%
                               t %>% t,
                             phi_0 = 0.575, 
                             sigma2_0 = TS.fit$var.pred,
                             eta_0 = fit2$residuals,
                             a = 3, b = 2,  tau = 10, 
                             iter = 10^3){
  # Define some constant 
  ## 1. alpha
  alpha <- succ_alpha <- as.numeric(iter)
  alpha[1] <- coef_0[1]
  ## 2. beta_j , j = 1, 2, 3, 4
  Beta <- matrix(NA, nrow = iter, ncol = 4) 
  Beta[1, ] <- coef_0[-1]
  ## 3. phi and sigma^{2}_{a}
  phi <- succ_phi <- as.numeric(iter)
  phi[1] <- phi_0
  sigma2_a <- succ_sigma_a <- as.numeric(iter)
  sigma2_a[1] <- sigma2_0
  ## 4. eta 
  Eta <- matrix(NA, nrow = iter, ncol = 646) 
  Eta[1, ] <- eta_0

  for (i in 2:iter) {
    # 1. Draw alpha - by Random Walker sampler
    ## Step 1 
    alpha[i] <- rnorm(1, mean = alpha[i-1], sd = tau) # sd = tau
    ##  Step 2 : Calculate the ratio
    r <- (log.alpha_fun(alpha = alpha[i], 
                        beta = Beta[i-1, ], 
                        eta = Eta[i-1, ], 
                        tau = tau) - 
            log.alpha_fun(alpha = alpha[i-1], 
                          beta = Beta[i-1, ], 
                          eta = Eta[i-1, ], 
                          tau = tau)) %>% exp
    succ_alpha[i] <- rbinom(n = 1, 
                            size = 1, prob = min(r, 1))
    if(succ_alpha[i] == 0) {
      alpha[i] <- alpha[i-1] 
      } 

    # 2. Draw beta - by Random Walker sampler 
    for (j in 1:4) {
      succ_beta <- as.numeric(iter)
      ## Step 1 
      Beta[i, j] <- rnorm(1, mean = Beta[i-1, j], sd = tau)
      # Determine statements
      if(j == 1) {
        Beta_start <- c(Beta[i, j], Beta[i-1, (j+1):4])
        Beta_minus <- c(Beta[i-1, j], Beta[i-1, (j+1):4])
        }else if(j <= 3) {
          Beta_start <- c(Beta[i, 1:j], Beta[i-1, (j+1):4])
          Beta_minus <- c(Beta[i, (1:j-1)], Beta[i-1, j], Beta[i-1, (j+1):4])
          }else {
            Beta_start <- Beta[i, 1:j]
            Beta_minus <- c(Beta[i, (1:j-1)], Beta[i-1, j])
            }
      ## Step 2 : Calculate the ratio
      r <- (log.beta_fun(alpha = alpha[i], 
                         beta = Beta_start, 
                         eta = Eta[i-1, ],
                         tau = tau,
                         j = j) - 
              log.beta_fun(alpha = alpha[i], 
                           beta = Beta_minus,
                           eta = Eta[i-1, ], 
                           tau = tau,
                           j = j) ) %>% exp
      succ_beta[i] <- rbinom(n = 1, 
                             size = 1, prob = min(r, 1))
      if(succ_beta[i] == 0) {
        Beta[i, j] <- Beta[i-1, j] 
      }
      }

    # 3. Draw phi and sigma^2_{a} - by Gibbs Sampler
    eta_t <- Eta[i-1, ]
    eta_t_1 <- c(0, eta_t)[-647]

    phi[i] <- rnorm(1, 
                    mean = sum(eta_t * eta_t_1)/sum(eta_t_1^2), 
                    sd = sqrt(sigma2_a[i-1] /sum(eta_t_1^2)) )
    sigma2_a[i] <- rinvgamma(n = 1, 
                             shape =  a + (length(eta_t) / 2), 
                             rate = b + (sum( (eta_t - phi[i]*eta_t_1)^2 ) /2))

    # 4. Draw eta - by Random Walker sampler
    for (j in 1: 646){
      succ_eta <- as.numeric(iter)
      ## Step 1 
      Eta[i, j] <- rnorm(1, mean = Eta[i-1, j], sd = tau) 
      # Determine statements
      if(j == 1){
        Eta_start <- c(0, Eta[i, j], Eta[i-1, j+1])
        Eta_minus <- c(0, Eta[i-1, j:2])
        }else if(j <= 645){
          Eta_start <- c(Eta[i, (j-1):j], Eta[i-1, j+1])
          Eta_minus <- c(Eta[i, (j-1):(j-1)], Eta[i-1, j:(j+1)])
          }else{
            Eta_start <- Eta[i, (j-1):j]
            Eta_minus <- c(Eta[i, (j-1):(j-1)], Eta[i-1, j])
            }
      ## Step 2 : Calculate the ratio
      r <- (log.eta_fun(eta_t = Eta_start, 
                        alpha = alpha[i], 
                        beta = Beta[i, ], 
                        phi = phi[i], 
                        sigma2_a = sigma2_a[i], 
                        t = j) - 
              log.eta_fun(eta_t = Eta_minus, 
                          alpha = alpha[i], 
                          beta = Beta[i, ], 
                          phi = phi[i], 
                          sigma2_a = sigma2_a[i], 
                          t = j)) %>% exp
      succ_eta[i] <- rbinom(n = 1, 
                            size = 1, prob = min(r, 1))
      if(succ_eta[i] == 0) {
        Eta[i, j] <- Eta[i-1, j] 
      }
      }
    print(i)
    }
  MCMC_result <- cbind(alpha, Beta, phi, sigma2_a, Eta)
  return(MCMC_result)}

# Setting Parallel Computing
Beta_Test <- matrix(c(fit2$coefficients, 
                  TS.fit$ar, TS.fit$var.pred,
                  1, -1, .2, .3, .01, .5, .1,
                  2, -2, -.2, .3, .03, -.5, .2), 
                  byrow = T, ncol = 7)
# detectCores()              # Calculate the core of CPU
cl <- makeCluster(4)         # Construct Cluster
# 讓所有節點載入指定套件
clusterEvalQ(cl, 
             c(library(tscount), library(invgamma), 
               library(magrittr)))
## [[1]]
##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [26] "methods"   "base"     
## 
## [[2]]
##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [26] "methods"   "base"     
## 
## [[3]]
##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [26] "methods"   "base"     
## 
## [[4]]
##  [1] "tscount"   "stats"     "graphics"  "grDevices" "utils"    
##  [6] "datasets"  "methods"   "base"      "invgamma"  "tscount"  
## [11] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [16] "methods"   "base"      "magrittr"  "invgamma"  "tscount"  
## [21] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [26] "methods"   "base"
# 設定全域變數
clusterExport(cl, 
              c("c52", "c26", "s26", "s52",
                "MH.Gibbs_Sampler", "Beta_Test", 
                "fit2", "log.alpha_fun", "log.beta_fun", 
                "log.eta_fun"))

# parallel computing - main part 
set.seed(2019)

MCMC_List <- parLapply(cl, 1 : 3, function(x){
  mcmc_tmp <- MH.Gibbs_Sampler(coef_0 = Beta_Test[x, 1 : 5] %>% t %>% t, 
                               phi_0 = Beta_Test[x, 6], 
                               sigma2_0 = Beta_Test[x, 7],
                               eta_0 = fit2$residuals,
                               tau = 5, 
                               iter = 1*10^1) 
  # Save MCMC 
  # write.table(mcmc_tmp, 
  #            file = paste0('MCMC_(3w)_', x, '.txt'), 
  #            row.names = FALSE)
  return(mcmc_tmp)
  })

stopCluster(cl)   # Stop parallel computing

寫出Metropolis-Hasting algorithm寫成函數後，在給定三個不同initial value 下並迭代十五萬次得到三個Markon chain，並算Gelman Rubin Statistic判斷收斂與否，其中burn-in為2000

因電腦運算太久，故先在其他電腦進行運算，並將結果儲存起來，再匯入此次作業

# Different Markov chain Monte Carlo
setwd("/Users/linweixiang/R/Bayesian Analysis/Dataset/MCMC")
time.start <- Sys.time()
mcmc1 <- fread("MCMC_15w_1.txt", header = TRUE) %>%
  as.data.frame()
mcmc2 <- fread("MCMC_15w_2.txt", header = TRUE) %>% 
  as.data.frame()
mcmc3 <- fread("MCMC_15w_3.txt", header = TRUE) %>% 
  as.data.frame()
time.stop <- Sys.time()
time.stop - time.start

## Time difference of 1.412277 mins

上述時間，表讀入三個15萬筆的Markov Chain所需時間

三個不同起始值，分別為

rbind(mcmc1[1, c(1:7)], mcmc2[1, c(1:7)], 
      mcmc3[1, c(1:7)]) %>% round(., 4) %>% 
  `rownames<-`(., c("$\\theta_{1}$", "$\\theta_{2}$", "$\\theta_{3}$")) %>% 
  `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                    "$\\beta_{2}$", "$\\beta_{3}$", 
                    "$\\beta_{4}$", "$\\phi$", 
                    "$\\sigma_{a}^{2}$")) %>% 
  kable(., "html", caption = "Different initial value") %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F) %>% 
  footnote(general = "$\\eta_{t}$ be the residual of GLM")

Different initial value
	\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma_{a}^{2}\)
\(\theta_{1}\)	3.0045	-0.0946	-0.1884	-0.0237	0.0667	0.5751	0.1282
\(\theta_{2}\)	-3.0045	0.0946	0.1884	0.0237	-0.0667	0.1000	0.4000
\(\theta_{3}\)	1.0000	-1.0000	0.2000	0.3000	0.0100	0.7000	0.1000
Note:
\(\eta_{t}\) be the residual of GLM

Gelman Rubin Statistic is

# Check convergence using multiple chains with different initials: by Gelman Rubin statistic
GR_table <- matrix(NA, ncol = 1, nrow = 9) %>% 
  `colnames<-`(., c("GR_statistic"))
l <- 1
for(i in c(1:8, 653)){
  chain1 <- mcmc(mcmc1[-c(1:2000), i]) 
  chain2 <- mcmc(mcmc2[-c(1:2000), i]) 
  chain3 <- mcmc(mcmc3[-c(1:2000), i]) 
  combinedchains <- mcmc.list(chain1, chain2, chain3)
  GR_table[l, 1] <- gelman.diag(combinedchains)$psrf[1]
  l = l + 1
}

GR_table %>% round(., 4) %>% 
  `rownames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                    "$\\beta_{2}$", "$\\beta_{3}$", 
                    "$\\beta_{4}$", "$\\phi$", 
                    "$\\sigma_{a}^{2}$", "$\\eta_{1}$", 
                    "$\\eta_{646}$")) %>% t %>% 
  kable(., "html", caption = "Gelman Rubin statistic") %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F)

Gelman Rubin statistic
	\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma_{a}^{2}\)	\(\eta_{1}\)	\(\eta_{646}\)
GR_statistic	1.0007	1.0017	1.0037	1.0021	1.0146	1.0016	1.004	1.0001	1.0006

由上述結果可知，因為參數的Gelman Rubin Statistic接靠近1，故認為posterior distribution converge.且由於posterior distribution converge 通常 latent variables 的 posterior distribution 亦會收斂(為了表格顯示簡易，只匯出\(\eta_{1}\), \(\eta_{646}\)可看出這兩個latent variable 亦收斂)

# Compare Markov Chain
rbind(mcmc1[-c(1:2000), 1:7] %>% colMeans(), 
      mcmc2[-c(1:2000), 1:7] %>% colMeans(),
      mcmc3[-c(1:2000), 1:7] %>% colMeans()) %>% 
  round(., 4) %>% 
  `rownames<-`(., c("Chain 1", "Chain 2", "Chain 3")) %>% 
  `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                    "$\\beta_{2}$", "$\\beta_{3}$", 
                    "$\\beta_{4}$", "$\\phi$", 
                    "$\\sigma_{a}^{2}$")) %>% 
  kable(., "html", caption = "posterior mean") %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F)

posterior mean
	\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma_{a}^{2}\)
Chain 1	1.0480	-0.0215	-0.0563	-0.0052	0.0052	0.0371	0.0490
Chain 2	0.9687	-0.0127	-0.0591	-0.0027	0.0002	0.1295	0.0485
Chain 3	1.0472	-0.0211	-0.0562	-0.0038	0.0037	0.0367	0.0489

由上述結果可知，因為收斂，所以三個Markov chain 的posterior mean 差不多，故後續分析使用 Chain 1 來做分析

< Posterior inference >
由於是使用Random walker Sampler 生成，故得出的Markov Chain相關性較高，故選每間隔200次抽一個樣本，使Markov Chain相關性降低

# Setting MCMC chain 
mcmc1 <- mcmc1[-c(1:2000), ]
# 每200個抽一次
n_row <- seq(1, dim(mcmc1)[1], 200) 
mcmc_1 <- matrix(NA, 
                 nrow = n_row %>% length(), ncol = 653)
k <- 1 
for (i in n_row) {
 mcmc_1[k, ]<- mcmc1[i, ] %>% as.numeric() 
 k = k + 1 
}
mcmc1 <- mcmc_1

# Posterior inference
cbind(mcmc1[, c(1:8, 653)] %>% apply(., 2, mean), 
      mcmc1[, c(1:8, 653)] %>% apply(., 2, sd)) %>% 
  round(., 4) %>% t %>% 
  `rownames<-`(., c("mean", "sd")) %>% 
  `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                    "$\\beta_{2}$", "$\\beta_{3}$", 
                    "$\\beta_{4}$", "$\\phi$", 
                    "$\\sigma_{a}^{2}$", "$\\eta_{1}$", 
                    "$\\eta_{646}$")) %>% 
  kable(., "html", caption = "posterior inference") %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F)

posterior inference
	\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma_{a}^{2}\)	\(\eta_{1}\)	\(\eta_{646}\)
mean	1.0484	-0.0209	-0.0565	-0.0054	0.0054	0.0376	0.0491	-0.0483	-0.0247
sd	0.0253	0.0320	0.0294	0.0296	0.0284	0.1155	0.0058	0.2071	0.2033

Above the table is posterior inference of parameter.

# Visualization for parameter
xlabs <- c(expression(alpha), expression(beta[1]),
           expression(beta[2]), expression(beta[3]), 
           expression(beta[4]), expression(phi),
           expression(sigma[a]^2))
for (i in 1:7) {
  name <- paste("g", i, sep = "_")
  g <-  data.frame(x = mcmc1[, i]) %>% 
    ggplot(aes(x = x)) +
    geom_histogram(aes(y = ..density..),
                   bins = 30, 
                   fill = 'light blue', 
                   color = 'black') + 
    geom_density() + 
    labs(x = xlabs[i], y = "") + 
    theme(axis.title.x = element_text(size = 12))
  assign(name, g)
}
# paste("g", 1:7, sep = "_", collapse = ",") %>% noquote() 
grid.arrange(g_1, g_2, g_3, g_4, g_5, g_6, g_7, ncol = 2)

Above the figure is histogram of parameter.

g1 <- ggplot(data = NULL, aes(x = mcmc1[, 8])) + 
  geom_histogram(aes(y = ..density..),
                 bins = 40, 
                 fill = 'light blue', color = 'black') + 
  geom_density()  +
  labs(x = expression(eta[1])) + 
  theme(axis.title.x = element_text(size = 13))

g2 <- ggplot(data = NULL, aes(x = mcmc1[, 653])) + 
  geom_histogram(aes(y = ..density..),
                 bins = 40, 
                 fill = 'light blue', color = 'black') + 
  geom_density()  +
  labs(x = expression(eta[646])) + 
  theme(axis.title.x = element_text(size = 13))

# Visualization for latent variable eta_{t} 
xlabs <- c(expression(eta[1]), expression(eta[646]))
l <- 1
for (i in c(8, 653)) {
  name <- paste("g", l, sep = "")
  g <-  data.frame(x = mcmc1[, i]) %>% 
    ggplot(aes(x = x)) +
    geom_histogram(aes(y = ..density..),
                   bins = 30, 
                   fill = 'light blue', 
                   color = 'black') + 
    geom_density() + 
    labs(x = xlabs[l], y = "") + 
    theme(axis.title.x = element_text(size = 13))
  assign(name, g)
  l = l + 1
}

eta.mean <- mcmc1[, c(8:653)] %>% 
  apply(., 2, mean) %>% as.vector()

g3 <- ggplot(data = NULL, aes(x = seq_along(eta.mean), 
                              y = eta.mean)) + 
  geom_line() + 
  labs(x = "t", y = expression(E(eta[t]))) + 
  theme(axis.title.y = element_text(size = 13))

# Combine above the ggplot 
grid.arrange( arrangeGrob(g1, g2, ncol = 2), g3)

Above the figure is posterior inference of \(\eta_{t}\).
Note:
用Random walker Sampler 生成三個十五萬筆的Markov Chain約需要20小時

Comment on your inference results.
由上述posterior inference 可看出，用Bayesian Analysis所估計的參數\((\alpha, \beta_{1}, \beta_{2}, \beta_{3}, \beta_{4}, \phi, \sigma^{2}_{a})\)和直接用GLM所估計的參數不同，即

cbind(c(fit2$coefficients, TS.fit$ar, TS.fit$var.pred) , 
      mcmc1[, c(1:7)] %>% apply(., 2, mean)) %>% 
  round(., 4) %>% t %>% 
  `rownames<-`(., c("GLM", "Bayesian")) %>% 
  `colnames<-`(., c("$\\alpha$", "$\\beta_{1}$", 
                    "$\\beta_{2}$", "$\\beta_{3}$", 
                    "$\\beta_{4}$", "$\\phi$", 
                    "$\\sigma_{a}^{2}$")) %>% 
  kable(., "html") %>%
  kable_styling(bootstrap_options = "striped", 
                full_width = F)

	\(\alpha\)	\(\beta_{1}\)	\(\beta_{2}\)	\(\beta_{3}\)	\(\beta_{4}\)	\(\phi\)	\(\sigma_{a}^{2}\)
GLM	3.0045	-0.0946	-0.1884	-0.0237	0.0667	0.5751	0.1282
Bayesian	1.0484	-0.0209	-0.0565	-0.0054	0.0054	0.0376	0.0491

推測其原因是原本用glm去估計沒有考慮到殘差項的時間相依性，但用Bayesian去估計有考慮到，所以兩個方法估計出來不同頗合理。

Bayesian_Analysis_HW4

林暐翔

2019/4/23

Problem 1