# Loading some useful packages
library(tidyverse)
library(kableExtra)
library(magrittr)
library(gridExtra)   # Combine multiple ggplots

Problem 1

For the normal example, consider that \(y_{i}|(\mu,\sigma^{2})\) are iid \(N(\mu,\sigma^{2})\) and the prior \(\pi(\mu,\sigma^{2}) \propto \sigma^{-2}\, I_{\sigma>0}\).
Let \({\bf y} = (y_{1}, y_{2},... , y_{n})^{t}\) and \(\tilde{y}\) be a new observation also from \(N(\mu,\sigma^{2})\).
Derive the posterior predictive distribution \(p(\tilde{y}|{\bf y})\) analytically.
< Proof > \[ \begin{align} p(\tilde{y}|{\bf y}) &= \int\!\!\int p(\tilde{y}|\mu,\sigma^{2})\, p(\mu,\sigma^{2}|{\bf y})\, d\mu\, d\sigma^{2} \\ &\propto \int_{0}^{\infty} \int_{-\infty}^{\infty} (\sigma^{2})^{-\frac{n+3}{2}} \exp\left\{\frac{-1}{2\sigma^{2}} \left[ (n-1)s^{2}+n(\bar{y}-\mu)^{2}+(\tilde{y}-\mu)^{2} \right] \right\} d\mu\, d\sigma^{2} \\ &\propto \int_{0}^{\infty} (\sigma^{2})^{-\frac{n+3}{2}} \exp\left\{ \frac{-(n-1)s^{2}}{2\sigma^{2}} \right\} \int_{-\infty}^{\infty} \exp\left\{\frac{-1}{2\sigma^{2}} \left[ n(\bar{y}-\mu)^{2}+(\tilde{y}-\mu)^{2} \right] \right\} d\mu\, d\sigma^{2} \\ &\propto \int_{0}^{\infty} (\sigma^{2})^{-\frac{n+3}{2}} \exp\left\{ \frac{-(n-1)s^{2}}{2\sigma^{2}} \right\} \int_{-\infty}^{\infty} \exp\left\{\frac{-1}{2\sigma^{2}} \left[ (n+1)(\mu - \tilde{y}_{n})^{2} + \frac{n}{n+1}(\tilde{y}-\bar{y})^{2} \right] \right\} d\mu\, d\sigma^{2} \\ &\propto \int_{0}^{\infty} (\sigma^{2})^{-\frac{n+3}{2}} \exp\left\{ \frac{-\left[(n-1)s^{2}+ \frac{n}{n+1}(\tilde{y}-\bar{y})^{2}\right]}{2\sigma^{2}}\right\} \int_{-\infty}^{\infty} \exp\left\{\frac{-(\mu-\tilde{y}_{n})^{2}}{2\,\frac{\sigma^{2}}{n+1}} \right\} d\mu\, d\sigma^{2},\quad \text{normal kernel in } \mu \\ &\propto \int_{0}^{\infty} (\sigma^{2})^{-\left(\frac{n}{2}+1\right)} \exp\left\{ \frac{-\left[(n-1)s^{2}+ \frac{n}{n+1}(\tilde{y}-\bar{y})^{2}\right]}{2\sigma^{2}}\right\} d\sigma^{2},\quad \text{inverse-gamma kernel in } \sigma^{2} \\ &\propto \frac{\Gamma(\frac{n}{2})}{\left(\frac{(n-1)s^{2}+ \frac{n}{n+1}(\tilde{y}-\bar{y})^{2}}{2}\right)^{\frac{n}{2}}} \\ &\propto \left[(n-1)s^{2}+ \frac{n}{n+1}(\tilde{y}-\bar{y})^{2}\right]^{-\frac{n}{2}} \\ &\propto \left[1+ \frac{n(\tilde{y}-\bar{y})^{2}}{(n+1)(n-1)s^{2}}\right]^{-\frac{n}{2}} \\ &= \left[1+ \frac{1}{n-1}\, \frac{(\tilde{y}-\bar{y})^{2}}{\left(1 + \frac{1}{n}\right)s^{2}}\right]^{-\frac{(n-1)+1}{2}} \\ \Rightarrow \tilde{y}|{\bf y} &\sim t_{n-1}\!\left(\bar{y},\ \left(1+\frac{1}{n}\right)s^{2} \right), \end{align} \] i.e., a Student-\(t\) with \(n-1\) degrees of freedom, location \(\bar{y}\), and squared scale \(\left(1+\frac{1}{n}\right)s^{2}\), where the completion of the square in \(\mu\) uses \[ \begin{align} &n(\bar{y}-\mu)^{2}+(\tilde{y}-\mu)^{2} \\ =\ &(n+1)\mu^{2}-2(n\bar{y}+\tilde{y})\mu+n\bar{y}^{2}+\tilde{y}^{2} \\ =\ &(n+1) \left\{\mu^{2} - 2\left(\frac{n\bar{y}+\tilde{y}}{n+1}\right)\mu + \left(\frac{n\bar{y}+\tilde{y}}{n+1}\right)^{2}\right\} - \frac{(n\bar{y}+\tilde{y})^{2}}{n+1} + n\bar{y}^{2}+\tilde{y}^{2} \\ =\ &(n+1)(\mu-\tilde{y}_{n})^{2} + n\bar{y}^{2}+\tilde{y}^{2} - \frac{(n\bar{y}+\tilde{y})^{2}}{n+1},\quad \text{where } \tilde{y}_{n} = \frac{n\bar{y}+\tilde{y}}{n+1} \\ =\ &(n+1)(\mu-\tilde{y}_{n})^{2} + \frac{n}{n+1} (\tilde{y}-\bar{y})^{2}. \end{align} \]
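
As a quick sanity check of this result, here is a small Monte Carlo sketch (the toy data, seed, and simulation sizes are my own choices, not part of the assignment): draw \((\mu,\sigma^{2})\) from the posterior under \(\pi(\mu,\sigma^{2}) \propto \sigma^{-2}\), draw \(\tilde{y}|\mu,\sigma^{2} \sim N(\mu,\sigma^{2})\), and compare the standardized draws with a \(t_{n-1}\) distribution.

# Monte Carlo check of the analytic predictive (a sketch; toy data assumed)
set.seed(2019)
y <- rnorm(20, mean = 1, sd = 2)                    # any toy data set works
n <- length(y); ybar <- mean(y); s2 <- var(y)
M <- 1e5
sigma2 <- (n - 1) * s2 / rchisq(M, df = n - 1)      # sigma^2 | y ~ Inv-chi^2(n-1, s^2)
mu <- rnorm(M, mean = ybar, sd = sqrt(sigma2 / n))  # mu | sigma^2, y ~ N(ybar, sigma^2/n)
y.tilde <- rnorm(M, mean = mu, sd = sqrt(sigma2))   # y.tilde | mu, sigma^2 ~ N(mu, sigma^2)
z <- (y.tilde - ybar) / sqrt((1 + 1/n) * s2)        # should follow t_{n-1}
ks.test(z, pt, df = n - 1)                          # large p-value expected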

Problem 2

The Weibull distribution is a distribution that is often used for lifetimes of equipment or parts.
It is a 2-parameter distribution satisfying \[ \begin{align} p(x|\alpha, \eta) = \alpha \eta (x \eta)^{\alpha-1}\exp\{-(x \eta)^{\alpha}\},\quad x>0,\ \alpha>0,\ \eta>0. \end{align} \] For simplicity, the shape parameter is fixed at \(\alpha = 2\) and the re-parameterization \(\theta = \eta^{2}\) is adopted for the Bayesian analysis.
Suppose we observe data \(y_{1},y_{2},...,y_{n}\) as independent samples (given \(\theta\)) from Weibull with \(\alpha = 2\), and denote \({\bf y} = (y_{1}, y_{2},... , y_{n})^{t}\).

According to the statement
\[ \begin{align} p(y|\alpha=2, \eta) &= 2 \eta (y \eta)\exp\{-(y \eta)^{2}\} \\ &= 2y \eta^{2}\exp\{-y^{2}\eta^{2}\} \\ \Rightarrow p(y|\alpha=2, \theta) &= 2y\theta \exp\{-y^{2} \theta\},\quad \text{where } \theta=\eta^{2}, \end{align} \] which belongs to the exponential family.
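
As a side check (a sketch; the value of \(\theta\) is arbitrary), this re-parameterized density matches R's built-in Weibull parameterization with `shape = 2` and `scale` \(= 1/\eta = 1/\sqrt{\theta}\), which is the form used in the simulation code below.

# Parameterization check (sketch): 2*y*theta*exp(-y^2*theta) equals
# dweibull(y, shape = 2, scale = 1/sqrt(theta))
theta <- 0.32                  # arbitrary test value
y <- seq(0.1, 3, by = 0.1)
all.equal(2 * y * theta * exp(-y^2 * theta),
          dweibull(y, shape = 2, scale = 1/sqrt(theta)))   # TRUE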

  1. Derive the conjugate prior family for \(\theta\). Use the conjugate \(\pi(\theta)\) for the rest of questions.
    Because \(p(y|\alpha=2, \theta)\) belongs to the exponential family, the conjugate prior is \[ \begin{align} \pi(\theta) &\propto \theta^{\alpha-1} \exp\{-\beta\theta\} \\ \Rightarrow \theta &\sim \Gamma(\alpha,\beta), \end{align} \] i.e., the Gamma family (here \(\alpha\) and \(\beta\) denote prior hyper-parameters, not the Weibull shape).

  2. Derive the Jeffreys prior for \(\theta\). Can it lead to a proper posterior? Verify it and provide necessary conditions.
    To obtain the Jeffreys prior, we first derive the Fisher information. \[ \begin{align} L &= p({\bf y}|\theta) = \prod_{i=1}^{n} 2 y_{i}\theta\exp\{-y_{i}^{2}\theta\} \\ &\propto \theta^{n} \exp\left\{-\sum_{i=1}^{n}y_{i}^{2}\,\theta\right\} \\ \Rightarrow \log L &\propto n \log(\theta) -\sum_{i=1}^{n}y_{i}^{2}\,\theta \\ \Rightarrow \frac{d\log L}{d\theta} &= \frac{n}{\theta}-\sum_{i=1}^{n}y_{i}^{2} \\ \Rightarrow \frac{d^{2}\log L}{d\theta^{2}} &= \frac{-n}{\theta^{2}} \end{align} \] The Fisher information is \[ \begin{align} I(\theta) &= - E\left[ \frac{d^{2}\log L}{d\theta^{2}} \right] = \frac{n}{\theta^{2}}, \end{align} \] so the Jeffreys prior is \[ \begin{align} \pi(\theta) &\propto [I(\theta)]^{1/2} = \frac{\sqrt{n}}{\theta} \propto \frac{1}{\theta}. \end{align} \] To verify that this prior leads to a proper posterior: \[ \begin{align} p(\theta|{\bf y}) &\propto p({\bf y}|\theta)\,\pi(\theta) \\ &\propto \theta^{n} \exp\left\{-\sum_{i=1}^{n}y_{i}^{2}\,\theta\right\} \cdot \frac{1}{\theta} \\ &\propto \theta^{n-1} \exp\left\{-\sum_{i=1}^{n}y_{i}^{2}\,\theta\right\} \\ \Rightarrow \theta|{\bf y} &\sim \Gamma\left(n,\ \sum_{i=1}^{n}y_{i}^{2}\right) \end{align} \] Since the posterior is a standard Gamma distribution, it integrates to 1, so this Jeffreys prior yields a proper posterior; the necessary conditions are \(n \geq 1\) and \(\sum_{i=1}^{n}y_{i}^{2} > 0\), the latter holding automatically because \(y_{i} > 0\).
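
    A quick numerical verification of this (a sketch, reusing the data given in part 5): the unnormalized posterior \(\theta^{n-1}\exp\{-\theta\sum y_{i}^{2}\}\) should integrate to the finite Gamma normalizing constant \(\Gamma(n)/(\sum y_{i}^{2})^{n}\).

    # Numerical propriety check (sketch; data from part 5)
    y <- c(0.25, 0.50, 0.62, 0.92, 0.98, 1.00, 1.07, 1.10, 1.20, 1.48)
    n <- length(y); S <- sum(y^2)
    unnorm.post <- function(theta) theta^(n - 1) * exp(-S * theta)
    integrate(unnorm.post, 0, Inf)$value   # finite, and equal to ...
    gamma(n) / S^n                         # ... the Gamma(n, S) constant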
  3. Derive the marginal distribution \(p(y_{1}, y_{2}, ..., y_{n})\). There are two ways to obtain this marginal: (1) use \(p(y_{1},y_{2}, ..., y_{n})= \int p(y_{1}, y_{2},..., y_{n}|\theta)\pi(\theta)d\theta\); or (2) use \(p(y_{1}, y_{2},..., y_{n}) = \frac{p(y_{1}, y_{2}, ..., y_{n}|\theta)\pi(\theta)}{p(\theta|y_{1}, y_{2}, ..., y_{n})}\), which holds for any \(\theta\).
    < Solution > \[ \begin{align} p({\bf y}) &= \int_{0}^{\infty} p({\bf y}|\theta)\,\pi(\theta)\,d\theta,\quad \text{where } \theta \sim \Gamma(\alpha,\ \beta) \\ &= \int_{0}^{\infty} 2^{n} \theta^{n} \prod_{i=1}^{n}y_{i}\, \exp\left\{-\sum_{i=1}^{n}y_{i}^{2}\,\theta\right\} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta^{\alpha-1} \exp\{-\beta\theta\}\, d\theta \\ &= 2^{n}\prod_{i=1}^{n}y_{i} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} \theta^{n+\alpha-1} \exp\left\{- \left(\sum_{i=1}^{n}y_{i}^{2}+\beta\right)\theta\right\} d\theta,\quad \text{gamma integral} \\ &= \frac{2^{n}\prod_{i=1}^{n}y_{i}\,\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(n+\alpha)}{\left(\sum_{i=1}^{n}y_{i}^{2}+\beta\right)^{n+\alpha}} \end{align} \]
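
    As a cross-check (a sketch; the data are from part 5 and the hyper-parameters from part 4), the closed form above agrees with direct numerical integration:

    # Numerical check of the closed-form marginal (sketch)
    y <- c(0.25, 0.50, 0.62, 0.92, 0.98, 1.00, 1.07, 1.10, 1.20, 1.48)
    n <- length(y); alpha <- 9; beta <- 50
    lik <- function(theta) sapply(theta, 
                                  function(t) prod(2 * y * t * exp(-y^2 * t)))
    integrate(function(t) lik(t) * dgamma(t, shape = alpha, rate = beta), 
              0, Inf)$value
    2^n * prod(y) * beta^alpha / gamma(alpha) * 
      gamma(n + alpha) / (sum(y^2) + beta)^(n + alpha)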
  4. In one application, the lifetimes of electronic devices are measured (with the unit of 10000 hours; i.e., y = 1 means the test sample lasted 10000 hours before failure, and y = 2 means it lasted 20000 hours). According to previous experience, devices with a similar design and manufacturer usually have lifetimes ranging from 5000 to 50000 hours, with an average around 10000 hours.
    Use this information to suggest a reasonable conjugate prior (with specific hyper-parameter values).
    From the statement, y mostly lies between 0.5 and 5 with mean about 1, so the following algorithm can be used to find suitable hyper-parameters:
    Step 1: given hyper-parameters \((\alpha,\ \beta)\), generate 1000 values of \(\theta\), where \(\theta \sim \Gamma(\alpha,\ \beta)\).
    Step 2: using each of these 1000 \(\theta\) values as the Weibull parameter, generate 1000 Weibull random variables.
    Step 3: compute the 2.5% and 97.5% empirical quantiles of these 1000 draws; this gives a 95% prior predictive interval for y under the given hyper-parameters.
    Step 4: repeat steps 1-3 over a grid and choose adequate hyper-parameters.
    We implement this algorithm as follows:

    # Prior predictive check for a given (alpha, beta)
    Decision.prior_fun <- function(alpha = 1, beta = 2){
      set.seed(2019)
      theta <- rgamma(1000, shape = alpha, rate = beta)
      # Weibull with shape 2 and scale 1/eta = 1/sqrt(theta)
      weibull.rv <- rweibull(1000, shape = 2, 
                             scale = (1/sqrt(theta)))
      rv.mean <- weibull.rv %>% mean
      # 2.5% and 97.5% empirical quantiles of the 1000 draws
      LCI_95 <- weibull.rv %>% sort %>% .[25]
      UCI_95 <- weibull.rv %>% sort %>% .[975]
      return(list(mean = rv.mean, 
                  CI = c(LCI_95, UCI_95),
                  rv = weibull.rv))
      }
    # For loop
    alpha.seq <- seq(8, 10, by = 1)
    beta.seq <-seq(47, 53, by = 1)
    hyper_parameter_table <- matrix(NA, 
                                    nrow = alpha.seq %>% length(), 
                                    ncol =  beta.seq %>% length()) %>% 
      `rownames<-`(., paste("alpha=", alpha.seq, sep = "")) %>% 
      `colnames<-`(., paste("beta=", beta.seq, sep = ""))
    
    ii = 1
    for (i in alpha.seq) {
      jj = 1
      for (j in beta.seq) {
        char <- Decision.prior_fun(alpha = i, 
                                   beta = j)$CI %>% 
          round(., 3) %>% as.character()
        hyper_parameter_table[ii, jj] <- paste("(", char[1], ", ", char[2], ")", sep = "") 
        jj = jj +1
        }
      ii = ii +1 
      }
    
    hyper_parameter_table %>% 
      kable(., "html") %>% 
      kable_styling(bootstrap_options = "striped", full_width = F) 
    | | beta=47 | beta=48 | beta=49 | beta=50 | beta=51 | beta=52 | beta=53 |
    |---|---|---|---|---|---|---|---|
    | alpha=8 | (0.377, 5.095) | (0.381, 5.149) | (0.385, 5.203) | (0.389, 5.255) | (0.393, 5.308) | (0.397, 5.359) | (0.401, 5.411) |
    | alpha=9 | (0.322, 4.836) | (0.325, 4.887) | (0.328, 4.938) | (0.332, 4.988) | (0.335, 5.038) | (0.338, 5.087) | (0.341, 5.136) |
    | alpha=10 | (0.306, 4.558) | (0.309, 4.606) | (0.312, 4.654) | (0.315, 4.701) | (0.318, 4.748) | (0.322, 4.794) | (0.325, 4.84) |

    The table above lists, for each hyper-parameter pair, the corresponding 95% prior predictive interval of y.
    The intervals for \((\alpha = 9,\ \beta = 50)\) are closest to the target range 0.5 to 5, so we take \((\alpha = 9,\ \beta = 50)\) as the specific hyper-parameter values.

  5. Ten observations (n = 10) are observed: \({\bf y }=(0.25,0.50,0.62,0.92,0.98,1.00,1.07,1.10,1.20,1.48)\).
    Derive the posterior for \(\theta\) and graph it. Give the posterior mean, posterior variance, and a 95% posterior interval for \(\theta\).
    From part 1, we have \(\theta \sim \Gamma(\alpha,\beta)\), so the posterior is
    < Derive the posterior dist. > \[ \begin{align} p(\theta|{\bf y}) &\propto p({\bf y}|\theta)\,\pi(\theta) \\ &\propto \theta^{n} \exp\left\{-\sum_{i=1}^{n}y_{i}^{2}\,\theta\right\} \theta^{\alpha-1} \exp\{-\beta\theta\} \\ &\propto \theta^{n+\alpha-1} \exp\left\{-\left(\sum_{i=1}^{n}y_{i}^{2}+\beta \right)\theta\right\} \\ \Rightarrow \theta|{\bf y} &\sim \Gamma\left(n + \alpha,\ \sum_{i=1}^{n}y_{i}^{2}+\beta\right) = \Gamma(19,\ 59.489)\\ \Rightarrow &\left\{ \begin{aligned} &E(\theta|{\bf y}) = \frac{n+\alpha}{\sum_{i=1}^{n}y_{i}^{2}+\beta} \approx 0.3194 \\ &Var(\theta|{\bf y}) = \frac{n+\alpha}{\left(\sum_{i=1}^{n}y_{i}^{2}+\beta\right)^{2}} \approx 0.0054 \\ \end{aligned} \right. \end{align} \] We can now plot the posterior and report the posterior mean, posterior variance, and a 95% posterior interval for \(\theta\).

    # Data and hyper-parameters (from part 4)
    alpha = 9
    beta = 50
    y <- c(0.25, 0.50, 0.62, 0.92, 0.98, 
           1.00, 1.07, 1.10, 1.20, 1.48)
    n <- 10
    # Information of posterior dist.
    poster.mean <- (n + alpha) / ((y^2 %>% sum) + beta) 
    poster.var <- (n + alpha) / ((y^2 %>% sum) + beta)^2 
    # Equal-tailed 95% posterior interval (0.025 and 0.975 quantiles)
    LCI_95 <- qgamma(0.025, shape = n + alpha, 
                     rate = (y^2 %>% sum) + beta) %>% 
      round(., 4)
    UCI_95 <- qgamma(0.975, shape = n + alpha, 
                     rate = (y^2 %>% sum) + beta) %>% 
      round(., 4)
    poster.table <- matrix(c(poster.mean %>% round(., 4), 
                             poster.var %>% round(., 4),
                             paste("(", LCI_95, ", ", UCI_95, ")", 
                                   sep = ""))) %>% 
      `rownames<-`(., c("Post.mean", "Post.var", "95% CI")) %>% 
      `colnames<-`(., "Value")
    
    # Posterior density on a grid (dgamma is vectorized)
    x.seq <- seq(0, 2.5, by = 0.001)
    y.seq <- dgamma(x.seq, shape = n + alpha, 
                    rate = (y^2 %>% sum) + beta)
    
    # Visualization of posterior dist.
    ggplot(data = NULL, 
           aes(x = x.seq, y = y.seq)) + 
      geom_line() + 
      labs(title = "Posterior dist.", 
           x = expression(theta), y = "density", 
           caption = expression(Gamma(alpha ==  19, 
                                      beta == 59.489))) + 
      annotation_custom(tableGrob(poster.table), 
                        xmin = 1.3, xmax = 2.3, 
                        ymin = 2, ymax = 4) +
      geom_area(mapping = aes(x = ifelse(x.seq > LCI_95 & 
                                           x.seq < UCI_95 ,
                                         x.seq, 0)), 
                fill = "light blue") + 
      xlim(c(0, 2.5)) + ylim(c(0, 5.7)) + 
      annotate("text", x = 0.33, y = 0.4, label = "95% CI") 

    The figure above summarizes the posterior distribution; the 95% CI is the equal-tailed interval based on the 0.025 and 0.975 posterior quantiles.

  6. For a new part, what is your prediction for its lifetime?
    According to the statement, we need \(p(\tilde{y}|{\bf y})\): \[ \begin{align} p(\tilde{y}|{\bf y}) &= \int p(\tilde{y}|\theta)\,p(\theta|{\bf y })\,d\theta \\ &= \int_{0}^{\infty} 2 \tilde{y}\,\theta \exp\{-\tilde{y}^{2}\theta\} \cdot \frac{(\sum y_{i}^{2} + \beta)^{n+\alpha}}{\Gamma(n+\alpha)}\, \theta^{n+\alpha-1}\exp \left\{-\left[\sum y_{i}^{2} + \beta \right] \theta \right\} d\theta \\ &= 2 \tilde{y}\, \frac{\left(\sum y_{i}^{2} + \beta\right)^{n+\alpha}}{\Gamma(n+\alpha)} \int_{0}^{\infty} \theta^{n+\alpha} \exp\left\{-\left[\tilde{y}^{2}+ \sum y_{i}^{2} + \beta \right] \theta \right\} d\theta,\quad \text{gamma integral} \\ &= 2 \tilde{y}\, \frac{\left(\sum y_{i}^{2} + \beta\right)^{n+\alpha}}{\Gamma \left(n+\alpha\right)} \cdot \frac{\Gamma(n+\alpha+1)}{\left(\tilde{y}^{2} + \sum y_{i}^{2} + \beta\right)^{n+\alpha+1}} \\ &= 38\,\tilde{y}\, \frac{(59.489)^{19}}{(\tilde{y}^{2}+59.489)^{20}} \end{align} \] The above is the predictive distribution.

    # Predictive draws: theta from the posterior, then y.tilde | theta
    set.seed(2019)
    theta <- rgamma(1000, 
                    shape = n + alpha, 
                    rate = (y^2 %>% sum) + beta)
    predict.rv <- rweibull(1000, 
                           shape = 2, scale = (1/sqrt(theta)))
    # Information of Prediction
    LCI_95 <- predict.rv %>% sort() %>% .[25] %>% round(., 3)
    UCI_95 <- predict.rv %>% sort() %>% .[975] %>% round(., 3)
    CI_95 <- paste("(", LCI_95, ", ", UCI_95, ")", sep = "") 
    
    predict.info <- c(predict.rv %>% mean %>% round(., 3), 
                      predict.rv %>% var %>% round(., 3), 
                      CI_95) %>% 
      matrix(., nrow = 3) %>% 
      `rownames<-`(., c("Mean", "Var", "95% CI")) %>% 
      `colnames<-`(., c("Value"))
    # Visualization
    ggplot(data = NULL, aes(x = predict.rv)) + 
      geom_histogram(aes(y = ..density..), 
                     binwidth = 0.2, 
                     fill = 'light blue', color = 'black') +
      labs(title = "Distribution of prediction") +
      geom_density() + 
      annotation_custom(tableGrob(predict.info), 
                    xmin = 4, xmax = 5, 
                    ymin = 0.3, ymax = 0.4)
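
    As a final cross-check (a sketch), the analytic predictive density derived above can be overlaid on the simulated draws, and it should integrate to 1:

    # Analytic predictive density from the derivation above (sketch)
    S <- (y^2 %>% sum) + beta
    pred.dens <- function(yt){
      2 * (n + alpha) * yt * S^(n + alpha) / (yt^2 + S)^(n + alpha + 1)
      }
    integrate(pred.dens, 0, Inf)$value   # should be ~1
    ggplot(data = NULL, aes(x = predict.rv)) + 
      geom_histogram(aes(y = ..density..), binwidth = 0.2, 
                     fill = 'light blue', color = 'black') + 
      stat_function(fun = pred.dens, colour = 'red') + 
      labs(title = "Simulated draws vs. analytic predictive density")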

Problem 3

Poisson regression model:
Expand the model of Exercise 2.13(a) by assuming that the number of fatal accidents in year \(t\) follows a Poisson distribution with mean \(\alpha + \beta t\). You will estimate \(\alpha\) and \(\beta\), following the example of the analysis in Section 3.7.

# Data
x1 <- c(24, 25, 31, 31, 22, 21, 26, 20, 16, 22)
x2 <- c(734, 516, 754, 877, 814, 362, 764, 809, 223, 1066) 
x3 <- c(0.19, 0.12, 0.15, 0.16, 0.14, 
        0.06, 0.13, 0.13, 0.03, 0.15)
data.hw2 <- data.frame(Year = seq(1976, 1985),
                       Year_index = seq(1976, 1985) - 1975, 
                       Fatal_accidents = x1, 
                       Passenger_deaths = x2, 
                       Death_rate = x3)
data.hw2 %>% as.matrix() %>% kable(., "html") %>% 
  kable_styling(bootstrap_options = "striped", full_width = F)
| Year | Year_index | Fatal_accidents | Passenger_deaths | Death_rate |
|---|---|---|---|---|
| 1976 | 1 | 24 | 734 | 0.19 |
| 1977 | 2 | 25 | 516 | 0.12 |
| 1978 | 3 | 31 | 754 | 0.15 |
| 1979 | 4 | 31 | 877 | 0.16 |
| 1980 | 5 | 22 | 814 | 0.14 |
| 1981 | 6 | 21 | 362 | 0.06 |
| 1982 | 7 | 26 | 764 | 0.13 |
| 1983 | 8 | 20 | 809 | 0.13 |
| 1984 | 9 | 16 | 223 | 0.03 |
| 1985 | 10 | 22 | 1066 | 0.15 |

According to the above statement, the statistical model is \[ \begin{align} Y_{t} &\sim Poisson(\lambda_{t}),\quad t = 1,2,\ldots,10, \\ \text{where } &\ln(\lambda_{t}) = \alpha + \beta t \Longleftrightarrow \lambda_{t} = e^{\alpha + \beta t}. \end{align} \] (We adopt a log link rather than the identity link \(\lambda_{t} = \alpha + \beta t\) so that \(\lambda_{t}\) stays positive.)
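
As a quick sanity check (a sketch, not part of the assignment), the posterior mode under the flat prior chosen in (a) coincides with the maximum-likelihood fit of this model, which `glm` with a log link gives directly; it should land near the grid-based posterior mode found in (f).

# Poisson GLM MLE of the same model (sketch)
glm(Fatal_accidents ~ Year_index, family = poisson(link = "log"), 
    data = data.hw2) %>% coef() %>% round(., 4)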

  1. Discuss various choices for a "noninformative" prior for \((\alpha,\beta)\). Choose one.
    The noninformative priors covered so far include the Jeffreys prior, the diffuse (flat) prior, and others. Because the Jeffreys prior is built from the Fisher information, it still carries some information about \((\alpha,\beta)\), and this data set has only 10 observations, so we choose a diffuse prior that gives \((\alpha,\beta)\) no information and lets the data speak for themselves: \[ \begin{align} \pi(\alpha,\beta) \propto\ 1\cdot I_{\{\alpha \in \mathbb{R},\ \beta \in \mathbb{R}\}} \end{align} \]
  2. Discuss what would be a realistic informative prior distribution for \((\alpha,\beta)\). Sketch its contours and then put it aside. Do parts (c)-(h) of this problem using your noninformative prior distribution from (a).
    We are asked to choose a realistic informative prior for \((\alpha,\beta)\) and plot its contours. From part (e), fitting a linear regression to \(\ln(\lambda_{t})=\alpha+\beta t\) gives crude estimates of \((\alpha,\beta)\); following the regression assumptions, we treat \(\alpha\) and \(\beta\) as normally distributed (ignoring their covariance for simplicity), with normal parameters as follows.

    # Fit a linear regression 
    reg.table <- lm(log(Fatal_accidents) ~ Year_index, 
                    data = data.hw2) %>% 
      summary() %>% .$coefficients %>% round(., 4) %>% 
      .[1:2, 1:2] %>% as.matrix() %>% 
      `rownames<-`(., c("alpha", "beta"))
    reg.table %>% kable(., "html") %>% 
      kable_styling(bootstrap_options = "striped", full_width = F)
    | | Estimate | Std. Error |
    |---|---|---|
    | alpha | 3.3742 | 0.1153 |
    | beta | -0.0404 | 0.0186 |

    The table above gives the normal parameters for \((\alpha, \beta)\); the corresponding prior contour plot is drawn below.

    # Contours plot
    # Prior dist.
    info.prior <- function(alpha = 1, beta = 0.1){
      value <- dnorm(alpha, mean = reg.table[1, 1], 
                     sd = reg.table[1, 2]) *
        dnorm(beta, 
              mean = reg.table[2, 1], sd = reg.table[2, 2])
      return(value)
      }
    
    # Grid approximation
    m  <- 100
    alpha.seq <- seq(3.05, 3.7, length.out = m)
    beta.seq <- seq(-0.10, 0.03, length.out = m)
    grid.value_prior <- data.frame(alpha = rep(alpha.seq, 
                                               each = m), 
                                   beta = rep(beta.seq, 
                                              times = m))
    grid.value_prior$density <- grid.value_prior %>% 
      apply(., 1,  
            FUN = function(x){ 
              info.prior(alpha = x[1], beta = x[2])
              })
    
    # Visualization
    prior_contour <- ggplot(data = grid.value_prior, 
                            aes(x = alpha, y = beta)) + 
      geom_raster(aes(fill = density), interpolate = TRUE) + 
      geom_contour(aes(z = density), colour = 'black', 
                   size = 0.3) + 
      scale_fill_gradientn(colours = c("white", "yellow", "red")) + 
      labs(title = "Prior density evaluated in grid", 
           x = expression(alpha), y = expression(beta), 
           caption = "Contours plot") + 
      theme(legend.position = "none")
    prior_contour

    The figure above shows the contours of the informative prior for \((\alpha,\beta)\).
  3. Write the posterior density for \((\alpha,\beta)\). What are the sufficient statistics?
    \[ \begin{align} p(\alpha,\beta|{\bf y}) &\propto p({\bf y}|\alpha,\beta)\,\pi(\alpha,\beta) \\ &\propto \prod_{t=1}^{n} \lambda_{t}^{y_{t}}\exp\{-\lambda_{t}\},\quad \lambda_{t}=\exp\{\alpha+\beta t\} \\ &\propto \prod_{t=1}^{n}e^{-\lambda_{t}} \prod_{t=1}^{n} \lambda_{t}^{y_{t}} \\ &\propto \prod_{t=1}^{n}e^{-\lambda_{t}} \prod_{t=1}^{n} \exp\{\alpha y_{t}+\beta t y_{t}\} \\ &\propto \prod_{t=1}^{n}e^{-\lambda_{t}} \exp \left\{ \alpha \sum_{t=1}^{n} y_{t} + \beta \sum_{t=1}^{n} t\,y_{t} \right\} \end{align} \] By the Neyman-Fisher factorization theorem,
    \(\left(\sum_{t=1}^{n} y_{t},\ \sum_{t=1}^{n} t\,y_{t}\right)\) is a sufficient statistic for \((\alpha,\ \beta)\).
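
    Numerically, for this data set the sufficient statistic is \(\left(\sum_{t} y_{t},\ \sum_{t} t\,y_{t}\right) = (238,\ 1233)\), which the short check below reproduces:

    # Sufficient statistics computed from the data (sketch)
    c(sum_y = sum(data.hw2$Fatal_accidents), 
      sum_ty = sum(data.hw2$Year_index * data.hw2$Fatal_accidents))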
  4. Check that the posterior density is proper.
    To make the posterior proper, the diffuse prior's support is usually taken to be very large. Here it is taken within three standard errors of the regression estimates, so \(\int p({\bf y}|\alpha,\beta)\,\pi(\alpha,\beta)\,d\alpha\,d\beta\) is finite over this bounded region; and since \(p(\alpha,\beta|{\bf y})\) differs from \(p({\bf y}|\alpha,\beta)\pi(\alpha,\beta)\) only by a normalizing constant, the posterior density is proper.
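
    A numerical sketch of this argument (the grid bounds are the same \(\pm 3\) standard-error box used in part (f)): the unnormalized posterior mass over the bounded support is finite, so a normalizing constant exists.

    # Unnormalized posterior mass over the bounded prior support (sketch)
    unnorm.post <- function(a, b){
      prod(dpois(data.hw2$Fatal_accidents, 
                 lambda = exp(a + b * data.hw2$Year_index)))
      }
    a.seq <- seq(3.05, 3.7, length.out = 100)
    b.seq <- seq(-0.10, 0.03, length.out = 100)
    cell <- diff(a.seq)[1] * diff(b.seq)[1]
    sum(outer(a.seq, b.seq, Vectorize(unnorm.post))) * cell   # finite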

  5. Calculate crude estimates and uncertainties for \((\alpha,\beta)\) using linear regression.
    Because the scales of Year and Fatal_accidents differ greatly, we first set Year_index = Year - 1975 and use this variable as the independent variable in the regression, i.e., we consider the effect of each additional year on Fatal_accidents.
    The relevant information is shown below.

    # Visualization 
    ggplot(data = data.hw2, 
           aes(x = Year_index, y = Fatal_accidents %>% log())) + 
      geom_point() + 
      theme_grey() + 
      labs(y = expression(ln(lambda)), x = "t") + 
      scale_x_continuous(breaks = data.hw2$Year_index,
                         minor_breaks = NULL) +
      geom_smooth(method = lm) + 
      annotation_custom(tableGrob(reg.table), 
                        xmin = 2.3, xmax = 3.3, 
                        ymin = 2.8, ymax = 3)
  6. Plot the contours and take 1000 draws from the joint posterior density of \((\alpha,\beta)\).
    The joint posterior density is \[ \begin{align} p(\alpha,\beta|{\bf y }) &\propto p({\bf y }|\alpha,\beta)\,\pi(\alpha,\beta) \\ &\propto \prod_{t=1}^{n} \frac{\lambda_{t}^{y_{t}}e^{-\lambda_{t}}}{y_{t}!} \cdot 1 \\ \Rightarrow\ \log p(\alpha,\beta|{\bf y }) &= \sum_{t=1}^{n}\left\{y_{t}\log(\lambda_{t})-\lambda_{t}- \log(y_{t}!) \right\} + \text{const.} \end{align} \]

    # Log posterior dist.
    n <- 10
    log.poster <- function(alpha = 1, beta = 0.1){
      y <- data.hw2$Fatel_accidents
      x <- data.hw2$Year_index
      lambda <- exp(alpha + (beta * x))
      value <- ( (y * log(lambda)) 
                 - lambda - log(factorial(y))) %>% sum
      return(value)
    }
    
    # Grid approximation
    m <- 100
    alpha.seq <- seq(3.05, 3.7, length.out = m)
    beta.seq <- seq(-0.10, 0.03, length.out = m)
    grid.value_post <- data.frame(alpha = rep(alpha.seq, 
                                              each = m), 
                                  beta = rep(beta.seq, 
                                             times = m))
    grid.value_post$density <- grid.value_post %>% 
      apply(., 1,  
            FUN = function(x){ 
              log.poster(alpha = x[1], beta = x[2])
              }) %>% exp()
    
    # Visualization
    poster_contour <- ggplot(data = grid.value_post, 
                             aes(x = alpha, y = beta)) + 
      geom_raster(aes(fill = density), interpolate = TRUE) +
      geom_contour(aes(z = density), colour = "black", size = 0.3) + 
      scale_fill_gradientn(colours = c("white", "yellow", "red")) + 
      labs(title = "Posterior density evaluated in grid", 
           x = expression(alpha), y = expression(beta), 
           caption = "Contours plot") + 
      theme(legend.position = "none")
    poster_contour

    The figure above is the contour plot of the posterior density.

    # Sampling from posterior dist.
    set.seed(2019)
    nsample <- 1000  # number of posterior draws
    index <- sample(1:(m^2), 
                    size = nsample, replace = TRUE, 
                    prob = grid.value_post$density)
    
    # Visualization of joint dist. and contour plot 
    joint.dist <- ggplot(data = grid.value_post[index, ], 
                         aes(x = alpha, y = beta)) + 
      geom_point() + 
      labs(x = expression(alpha), y = expression(beta), 
           title = "Sampling form joint posterior dist.") 
    # Contour plot
    joint.contour <- ggplot(data = grid.value_post[index, ], 
                            aes(x = alpha, y = beta)) + 
      stat_density_2d(aes(fill = ..level..), 
                      geom = "polygon", 
                      colour = "black") + 
      labs(x = "alpha", y = "beta", title = "") +
      scale_fill_gradient(low = "#FFFF00", 
                          high = "#FF3366") +
      labs(x = expression(alpha), y = expression(beta), 
           title = "Contour plot") + 
      theme(legend.position = c(0.15, 0.25), 
            legend.background = element_rect(color = "black",
                                             fill = "grey90"),
            legend.title = element_blank()) 
    grid.arrange(joint.contour, joint.dist, nrow = 1)

    Both panels are based on 1000 draws from the posterior distribution: the left shows their contour plot, the right the scatter of the draws.

  7. Using your samples of \((\alpha,\beta)\), plot a histogram of the posterior density for the expected number of fatal accidents in 1986, \(\alpha + 1986*\beta\).
    According to the statement, define \[ \begin{align} W = \exp\{\alpha + 11\beta\}, \end{align} \] where \(t = 11\) is the Year_index corresponding to 1986 (the textbook's \(\alpha + 1986\beta\) uses the calendar year as the regressor, while we use Year_index with a log link).

    W <- (grid.value_post[index, 1] + 
            (11 * grid.value_post[index, 2])) %>% exp()
    
    W.table <- c(W %>% mean, W %>% var) %>% round(., 4) %>% 
      matrix(., ncol = 1) %>% 
      `colnames<-`(., "Value") %>% 
      `rownames<-`(., c("Mean", "Var"))
    
    # Visualization 
    diffuse.post_W <- ggplot(data = NULL, aes(x = W)) + 
      geom_histogram(aes(y = ..density..), 
                     binwidth = 0.4, 
                     fill = 'light blue', color = 'black') + 
      geom_density(aes(y = ..density..)) +# plot density curve
      labs(x = 'w', title = "Distribution of W",
           caption = "Expected number of fatal accidents in 1986") +
      annotation_custom(tableGrob(W.table), 
                        xmin = 24, xmax = 27, 
                        ymin = 0.12, ymax = 0.15) + 
      xlim(10, 30)
    diffuse.post_W

    The figure above shows the posterior distribution of the expected number of fatal accidents in 1986.
  8. Create simulation draws and obtain a 95% predictive interval for the number of fatal accidents in 1986.
    Part 7 gave draws of W; for a 95% predictive interval we draw \(\tilde{y} \sim Poisson(W)\) for each draw of W and take the equal-tailed interval, i.e., the 0.025 and 0.975 quantiles.

    # Predictive draws: one Poisson draw per posterior draw of W
    y.predict <- rpois(1000, lambda = W)
    
    # Information of predictive dist.
    y.info.table <- c(y.predict %>% sort() %>% .[25], 
                      y.predict %>% sort() %>% .[975]) %>% 
      matrix(., nrow = 1) %>% `rownames<-`(., "Value") %>% 
      `colnames<-`(., c("95%_LCI", "95%_UCI")) 
    
    # Visualization 
    ggplot(data = NULL, aes(x = y.predict)) +
      geom_histogram(aes(y = ..density..), 
                     binwidth = 1, 
                     fill = 'light blue', color = 'black') +
      geom_density() + 
      labs(title = bquote("Distribution of " ~ y[t==11] ),
           caption = "The number of fatal accidents in 1986", 
           x = expression(y[t==11])) + 
      annotation_custom(tableGrob(y.info.table), 
                        xmin = 28, xmax = 32, 
                        ymin = 0.075, ymax = 0.075)
  9. How does your hypothetical informative prior distribution in (b) differ from the posterior distribution in (f) and (g), obtained from the noninformative prior distribution and the data? If they disagree, discuss.

    # Combine the ggplot 
    prior_contour <- prior_contour + 
      ggtitle("(b), Informative Prior")
    poster_contour <- poster_contour + 
      ggtitle("(f), Posterior density")
    grid.arrange(prior_contour, poster_contour, nrow = 1)

    In the figure above, the left panel is the contour plot of the informative prior and the right panel is the contour plot of the posterior density under the diffuse prior.
    The two surfaces peak in almost the same place, but the spreads of \((\alpha,\beta)\) differ.
    Note: informative posterior distribution.
    If we instead adopt the informative prior, the resulting posterior is clearly more concentrated than the diffuse-prior posterior, as shown below.

    # Posterior under the informative prior:
    # (unnormalized) = prior density x likelihood, reusing the grid values
    grid.value_info.post <- grid.value_post
    grid.value_info.post$density <- grid.value_prior$density * grid.value_post$density
    
    infor.poster_contour <- ggplot(data = grid.value_info.post, 
                                   aes(x = alpha, y = beta)) + 
      geom_raster(aes(fill = density), interpolate = TRUE) + 
      geom_contour(aes(z = density), colour = "black", size = 0.3) + 
      scale_fill_gradientn(colours = c("white", "yellow", "red")) + 
      labs(title = "Posterior density", 
           x = expression(alpha), y = expression(beta), 
           caption = "Informative prior") + 
      theme(legend.position = "none")
    
    # Comparison different prior
    poster_contour <- poster_contour + 
      labs(title = "Posterior density", 
           caption = "Diffuse prior")
    grid.arrange(poster_contour, 
                 infor.poster_contour, nrow = 1)

    # Sampling from informative-posterior dist.
    set.seed(2019)
    nsample <- 1000  # number of posterior draws
    index <- sample(1:(m^2), 
                    size = nsample, replace = TRUE, 
                    prob = grid.value_info.post$density)
    W_info <- (grid.value_info.post[index, 1] + 
                 (11 * grid.value_info.post[index, 2])) %>%
      exp()
    
    W_info.table <- c(W_info %>% mean, W_info %>% var) %>% 
      round(., 4) %>% 
      matrix(., ncol = 1) %>% 
      `colnames<-`(., "Value") %>% 
      `rownames<-`(., c("Mean", "Var"))
    
    # Visualization 
    info.post_W <- ggplot(data = NULL, aes(x = W_info)) + 
      geom_histogram(aes(y = ..density..), 
                     binwidth = 0.4, 
                     fill = 'light blue', color = 'black') + 
      geom_density(aes(y = ..density..)) +# plot density curve
      labs(x = 'w', title = "Distribution of W_informative",
           caption = "Expected number of fatal accidents in 1986_informative") +
      annotation_custom(tableGrob(W_info.table), 
                        xmin = 24, xmax = 27, 
                        ymin = 0.12, ymax = 0.15) + 
      xlim(10, 30)
    grid.arrange(diffuse.post_W, info.post_W)

    The figures above show that with the informative prior the posterior mean of W is almost unchanged, while its variance decreases.