class: title-slide, left, bottom

# Averaging Bayesian Predictive Distributions
----
## **LOO (Leave-One-Out) cross-validation**
### Hyunsu Ju
### 9/10/2021

---
class: inverse, middle, center

# **Bayesian Statistics:** *Almost certainly.*

- Probability is a measure of subjective belief about how likely an event is, based on prior understanding and new information.

+ Prior `\(\rightarrow\)` Information `\(\rightarrow\)` Posterior

---

# **Bayes Rule**

The cornerstone of the Bayesian approach (and the source of its name) is the conditional likelihood theorem known as **Bayes' rule**.

In its simplest form, Bayes' Rule states that for two events A and B (with `\(P(B) \neq 0\)`):

$$
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
$$

Or, if A can take on multiple values, we have the **extended form**:

$$
p(A_i|B) = \frac{p(B | A_i) P(A_i)}{\sum_j P(B|A_j)P(A_j)}
$$

---

# **Inference using Bayes' Rule**

Adapting Bayes' Rule to the case of a parameter value `\(\theta\)` and observed data *y*, we have:

$$
p(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(y\mid\theta)p(\theta)}{ \int f(y \mid \theta) p(\theta) d\theta} \underbrace{\propto}_{\text{proportional to}} f(y|\theta)p(\theta)
$$

Adding a bit of terminology:

- `\(p(\theta)\)` is the **prior** distribution: our initial subjective belief about the probability distribution of `\(\theta\)`.
- `\(f(y|\theta)\)` you may recognize from Maximum Likelihood estimation as:
  + `\(f(y | \theta) = \prod_{i=1}^n f(y_i |\theta) = \mathcal{L} (\theta)\)`, the **likelihood function**.
- Finally, `\(p(\theta \mid y)\)` is our **posterior** (post-experiment) belief about the probability distribution of `\(\theta\)`.

---

### Continued

- Hence we have the basic statement:

$$
Posterior \propto Likelihood \times Prior
$$

- This is commonly summarized as saying that the posterior belief is a compromise between the data and the prior belief.
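---

### **A toy example: posterior from likelihood times prior**

To make the proportionality statement concrete, here is a minimal grid-approximation sketch. It is a toy illustration, not one of this deck's examples: the Beta(2, 2) prior and the 7-successes-in-10-trials data are made up.

```r
# Grid approximation of the posterior for a coin-flip probability theta
theta <- seq(0, 1, length.out = 1000)        # grid of parameter values
prior <- dbeta(theta, 2, 2)                  # a weakly informative Beta(2, 2) prior
lik   <- dbinom(7, size = 10, prob = theta)  # likelihood: 7 successes in 10 trials
post  <- prior * lik                         # posterior, up to a constant
post  <- post / sum(post)                    # normalize over the grid
theta[which.max(post)]                       # posterior mode, roughly 0.67
```

The integral in the denominator of Bayes' Rule is replaced by a simple sum over the grid, which is exactly why the unnormalized product `prior * lik` is all we need.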
---

### **Types of Priors**

Different types of priors include:

- **Uninformative (or "flat") priors**: Priors that have no impact on posterior values (i.e., assuming total ignorance about the possible parameter value).
  + A classic uninformative prior is the uniform prior, which treats all possible parameter values as equally likely: `\(p(\theta) = c \text{ } \forall \theta \in \Theta\)`
- **Informative priors**: Priors where we use prior knowledge to specify a prior with a best guess of the prior mean and distribution for a parameter value.
- **Weakly informative** or **regularizing priors**: Priors which are deliberately less informative than our actual knowledge, affecting estimates less than informative priors but still incorporating *very conservative* information into the production of posterior estimates.

---

### **Bayesian Computation**

One major feature of Bayesian inference that I haven't mentioned so far is the intractability of analytic solutions for estimating posterior distributions in most circumstances.

Recall:

$$
p(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(y\mid\theta)p(\theta)}{ \int f(y \mid \theta) p(\theta) d\theta}
$$

For models that are more complex or that involve high-dimensional data, closed-form solutions are not available for the integral in the denominator. Hence, Bayesian analysis instead typically relies on numerical methods, usually Markov chain Monte Carlo (MCMC) methods.

---

### **MCMC Methods**

This method relies on sequentially sampling values of `\(\theta\)` from an approximation of the posterior distribution and then correcting the approximation to create better subsequent samples.

- Because the approximated distribution used in one step relies on the sample from the previous step, the simulation forms a **Markov chain**.
- A critical property then is **convergence**: Have our simulations converged to the real target distribution?
  + Typically, instead of running one very long "chain", researchers use multiple short chains.
  + The aggregate can not only converge faster, but the variability between chains also provides a better diagnostic of convergence.

---

### **Short comparison of rstanarm and brms**

**rstanarm** is faster, has better posterior checking, and is a bit simpler to use.

**brms** is generally a bit more flexible, with support for some regression types missing in rstanarm, more flexible specification of priors, and support for more types of error correlation structures.

- [rstanarm](https://www.rdocumentation.org/packages/rstanarm) is the more popular choice.

---

### **Using rstanarm**

With rstanarm, you run most regressions using the function [stan_glm()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/stan_glm)

- Generalized linear models (GLMs) encompass linear regression, probit, logit, Poisson, binomial, exponential, and other models.

**Syntax:**

```r
mybayesreg <- stan_glm(y ~ x1 + x2 + x3 ..., family = myfamily, data = mydata, prior = myprior)
```

---

### **Options with stan_glm**

**Family** (with a possible **link** argument needed as well) defines the type of regression you want:

- Linear regression: `family = gaussian`
- Logit: `family = binomial(link = "logit")`
- Probit: `family = binomial(link = "probit")`
- Poisson: `family = poisson`
- More options can be read from the main [GLM page](https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/glm)

**Prior distributions:**

- **Flat priors** can be set by using `prior = NULL`
- *[Weakly]* informative priors can be specified by using `prior = ` with one of:
  + *normal, student_t, cauchy, laplace* and more found [here](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/priors), as sketched on the next slide
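---

### **Sketch: setting explicit priors**

A minimal sketch of the prior syntax. The data set `mydata` and the variables `y`, `x1`, `x2` are placeholders, not objects from these slides; `prior_intercept` controls the intercept separately from the coefficient priors.

```r
library(rstanarm)

# Weakly informative normal priors on the slopes and the intercept
mybayesreg <- stan_glm(y ~ x1 + x2,
                       family          = gaussian,
                       data            = mydata,
                       prior           = normal(0, 2.5),  # slope coefficients
                       prior_intercept = normal(0, 5))    # intercept
```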
---

### **A Titanic survival example with rstanarm**

```r
library(rstanarm)   # stan_glm()
library(tidyverse)  # str_extract() and the wrangling used below

url <- "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
Titanic <- read.csv(url, header = TRUE)

# Display titanic data
#glimpse(Titanic, width = 50)

# Reformat Class
Titanic$class <- str_extract(Titanic$Pclass, "[0-9]")

TitanicLinear <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                          data = Titanic,
                          family = gaussian)
```

```
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.13 seconds (Warm-up)
## Chain 1:                0.228 seconds (Sampling)
## Chain 1:                0.358 seconds (Total)
## Chain 1: 
... (chains 2-4 produced analogous progress output; truncated) ...
```

```r
summary(TitanicLinear)
```

```
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      Survived ~ Age + Sex + as.factor(class)
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 714
##  predictors:   5
## 
## Estimates:
##                     mean   sd   10%   50%   90%
## (Intercept)        1.1    0.1  1.1   1.1   1.2 
## Age                0.0    0.0  0.0   0.0   0.0 
## Sexmale           -0.5    0.0 -0.5  -0.5  -0.4 
## as.factor(class)2 -0.2    0.0 -0.3  -0.2  -0.2 
## as.factor(class)3 -0.4    0.0 -0.5  -0.4  -0.4 
## sigma              0.4    0.0  0.4   0.4   0.4 
## 
## Fit Diagnostics:
##           mean   sd   10%   50%   90%
## mean_PPD 0.4    0.0  0.4   0.4   0.4 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                   mcse Rhat n_eff
## (Intercept)       0.0  1.0  2735 
## Age               0.0  1.0  3288 
## Sexmale           0.0  1.0  4577 
## as.factor(class)2 0.0  1.0  2810 
## as.factor(class)3 0.0  1.0  2758 
## sigma             0.0  1.0  4534 
## mean_PPD          0.0  1.0  4223 
## log-posterior     0.0  1.0  1814 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
```

---

##### **Credible intervals**

You can also easily print the credible intervals with the function [posterior_interval()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/posterior_interval.stanreg)

```r
posterior_interval(TitanicLinear, prob = 0.95)
```

```
##                          2.5%        97.5%
## (Intercept)        1.030266767  1.222839447
## Age               -0.007479305 -0.003391709
## Sexmale           -0.539206252 -0.418148797
## as.factor(class)2 -0.291069298 -0.126452989
## as.factor(class)3 -0.483923498 -0.331491394
## sigma              0.366018620  0.406772331
```

---

#### **Graphical credible intervals**

```r
plot(TitanicLinear)
```

<!-- -->

---

#### **Plotting the posterior distribution**

You can also easily plot the posterior distribution of a parameter in R.

```r
Titanic_posterior <- TitanicLinear %>%
  as_tibble() %>%
  rename(sec.class = "as.factor(class)2",
         third.class = "as.factor(class)3")

ggplot(Titanic_posterior, aes(x = third.class)) +
  geom_histogram()
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<!-- -->

---

#### **Juxtaposing the prior and the posterior**

```r
posterior_vs_prior(TitanicLinear)
```

```
## 
## Drawing from prior...
```

<!-- -->
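---

#### **Alternative posterior plots with bayesplot**

As a complement to the histogram above, here is a sketch using the bayesplot package (loaded later in these slides), which shades the credible interval directly on the posterior density:

```r
library(bayesplot)

# as.matrix() extracts the 4000 posterior draws; pars picks the coefficients
mcmc_areas(as.matrix(TitanicLinear),
           pars = c("Sexmale", "as.factor(class)2", "as.factor(class)3"),
           prob = 0.95)   # shade the central 95% interval
```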
---

# Model Testing

## **Model testing basics**

There are a number of regression diagnostics available after performing Bayesian regression to help infer whether the model converged, determine its performance, and even compare between models.

The slides cover some of those included with rstanarm, as well as the very useful [shinystan package](https://www.rdocumentation.org/packages/shinystan).

---

### **Graphical posterior predictive analysis**

To check the predictive accuracy of the posterior distribution, you can use the function [pp_check()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/pp_check.stanreg), which plots simulated y values from the posterior distribution against the actual values of y.

```r
pp_check(TitanicLinear)
```

<!-- -->

---

### **Regularization and Predictive Accuracy**

A critical issue in both Bayesian and frequentist estimation is how to balance predictive accuracy with parsimony. Put another way, the researcher should be concerned with not overfitting the data while still creating a compelling model.

The basic approach in frequentist methods is to use the Akaike information criterion (AIC):

**Expected Log Predictive Density:**

`$$\hat{elpd}_{AIC} = \log p(y | \hat{\theta}_{MLE}) - k$$`

- Where `\(\hat{\theta}_{MLE}\)` is the maximum likelihood estimator of `\(\theta\)`,
- `\(\log p(y | \hat{\theta}_{MLE})\)` is the log likelihood given `\(\hat{\theta}_{MLE}\)`,
- and k is the number of parameters in the model.

---

### **Deviance information criterion**

The most basic Bayesian adaptation of the AIC is the Deviance information criterion (DIC):

`$$\hat{elpd}_{DIC} = \log p(y | \hat{\theta}_{Bayes}) - p_{DIC}$$`

- Where `\(\hat{\theta}_{Bayes}\)` is the posterior mean estimate and
- `\(p_{DIC}\)` is the effective number of parameters in the model, computed using a data-based bias correction

---

### **Watanabe-Akaike information criterion**

An improvement over the DIC is the Watanabe-Akaike information criterion:

`$$\hat{elpd}_{WAIC} = \sum_{i=1}^{n} \log \text{E}_{post}\Big[p(y_i \mid \theta)\Big] - \sum_{i=1}^{n} \text{V}_{post}\Big[\log p(y_i \mid \theta)\Big]$$`

The WAIC has the advantages of:

- Averaging the likelihood over the posterior distribution rather than plugging in the posterior mean
- Not assuming a multivariate Gaussian posterior distribution, as the DIC (and AIC) do

---

### **WAIC example**

```r
waic(TitanicLinear)
```

```
## 
## Computed from 4000 by 714 log-likelihood matrix
## 
##           Estimate   SE
## elpd_waic   -334.9 19.1
## p_waic         6.1  0.4
## waic         669.9 38.3
```

---

### **Leave One Out Cross-Validation**

Another method alongside WAIC for comparing out-of-sample predictive ability is to apply leave-one-out cross-validation (LOO).

- LOO assesses the predictive ability of posterior simulations by iteratively partitioning the data into training and prediction sets: each observation in turn is predicted from the model fit to the remaining observations.

**Expected Log Predictive Density:**

`$$\hat{elpd}_{LOO} = \sum_{i=1}^{n} \log p(y_i | y_{-i})$$`

---

### **LOO example**

```r
loo(TitanicLinear)
```

```
## 
## Computed from 4000 by 714 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -334.9 19.1
## p_loo         6.1  0.4
## looic       669.9 38.3
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
```
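---

### **What loo() computes under the hood**

`loo()` approximates exact LOO with Pareto-smoothed importance sampling (PSIS) applied to the pointwise log-likelihood matrix. A hedged sketch of the same computation done by hand, assuming the default 4 chains of 1,000 post-warmup draws used above:

```r
library(loo)

# 4000 x 714 matrix: one log-likelihood value per draw and observation
ll <- log_lik(TitanicLinear)

# Relative efficiency of the draws, needed for correct Monte Carlo SEs
r_eff <- relative_eff(exp(ll), chain_id = rep(1:4, each = 1000))

loo(ll, r_eff = r_eff)   # should match loo(TitanicLinear) above
```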
---

### **Comparing models between loo_logit and loo_probit**

```r
Titanic_probit <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                           data = Titanic,
                           family = binomial(link = probit))
```

```
## 
## SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
## Chain 1: Rejecting initial value:
## Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 1:   Stan can't start sampling from this initial value.
... (the same rejection message repeated many times before a valid initial value was found; truncated) ...
## Chain 1: 
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.079 seconds (Warm-up)
## Chain 1:                2.115 seconds (Sampling)
## Chain 1:                4.194 seconds (Total)
## Chain 1: 
... (chains 2-4: similar initial-value rejections and analogous progress output; truncated) ...
```

```r
Loo_probit <- loo(Titanic_probit)

Titanic_logit <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                          data = Titanic,
                          family = binomial(link = logit))
```

```
## 
## SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.652 seconds (Warm-up)
## Chain 1:                0.627 seconds (Sampling)
## Chain 1:                1.279 seconds (Total)
## Chain 1: 
... (chains 2-4 produced analogous progress output; truncated) ...
```

```r
Loo_logit <- loo(Titanic_logit)

# loo_compare lists the best-fitting model first; elpd_diff is measured
# relative to that model, so more-negative values mean less support
loo_compare(Loo_probit, Loo_logit)
```

```
##                elpd_diff se_diff
## Titanic_logit   0.0       0.0   
## Titanic_probit -1.2       0.7
```

---

### **Continued**

```r
# loo_compare lists the best-fitting model first; elpd_diff is measured
# relative to that model, so more-negative values mean less support
loo_compare(Loo_probit, Loo_logit)
```

```
##                elpd_diff se_diff
## Titanic_logit   0.0       0.0   
## Titanic_probit -1.2       0.7
```

---

### **Many more diagnostics with shinystan**

Probably the most popular diagnostic tool for Bayesian regression in R is the functionality from the [shinystan package](https://www.rdocumentation.org/packages/shinystan).

- shinystan launches a "Shiny" web application to show you model diagnostics, so it can't be run inside an R Markdown document (but works just fine when called from the console).

```r
# Do in console, not R Markdown
launch_shinystan(TitanicLinear)
```

---

### Dose response example data

```r
exp.dat = matrix(c(49.1,53,56.9,60.8,64.8,68.7,72.6,76.5,
                   59,60,62,56,63,59,62,60,
                   6,13,18,28,52,53,61,60,
                   .102,.217,.29,.5,.825,.898,.984,1), ncol = 4)
colnames(exp.dat) = c('dose','nexp','ndied','prop')
exp.dat = as.data.frame(exp.dat)

# compute how many lived
exp.dat$nalive = exp.dat$nexp - exp.dat$ndied
#View(exp.dat)

summary(glm(cbind(ndied,nalive) ~ dose, family=binomial, data=exp.dat) -> exp.glm)
```

```
## 
## Call:
## glm(formula = cbind(ndied, nalive) ~ dose, family = binomial, 
##     data = exp.dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2746  -0.4668   0.7688   0.9544   1.2990  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.82300    1.28959  -11.49   <2e-16 ***
## dose          0.24942    0.02139   11.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   7.3849  on 6  degrees of freedom
## AIC: 37.583
## 
## Number of Fisher Scoring iterations: 4
```

```r
# invert the fitted logistic curve: the dose at which the modeled death
# probability equals `prob` (note -prob/(prob-1) equals prob/(1-prob))
dose4prob = function(b0, b1, prob){
  d = (-b0 + log(-prob/(prob-1)))/b1
  return(d)
}

dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .5)
```

```
## [1] 59.43092
```
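Why this works: setting `\(p = \text{logit}^{-1}(\beta_0 + \beta_1 d)\)` and solving for the dose gives

`$$d_p = \frac{\operatorname{logit}(p) - \beta_0}{\beta_1} = \frac{\log\frac{p}{1-p} - \beta_0}{\beta_1},$$`

so for `\(p = 0.5\)` the logit term vanishes and `\(d_{0.5} = -\beta_0/\beta_1 = 14.823/0.24942 \approx 59.43\)`, matching the output above.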
```r
range(exp.dat$dose)
```

```
## [1] 49.1 76.5
```

```r
drange = seq(30, 90, length = 100)
exp.pred = predict(exp.glm, newdata = data.frame(dose = drange))
```

---

### Plot for LD25 and LD50

```r
knitr::include_graphics("C:\\Users\\HJU\\OneDrive - Environmental Protection Agency (EPA)\\Profile\\Documents\\curve.png")
```

<img src="C:\Users\HJU\OneDrive - Environmental Protection Agency (EPA)\Profile\Documents\curve.png" width="50%" />

```r
plot(drange, exp(exp.pred)/(1+exp(exp.pred)), type='l', xlab='dose', ylab='probability')

# add in the observed points
points(prop ~ dose, data = exp.dat, col='red', pch=19)

# compute our LD values
ld50 = dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .50)
ld25 = dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .25)

# let's add these to our graph for visualization purposes
abline(h=.5, lty='dashed', col='red')
abline(h=.25, lty='dotted', col='blue')
legend('topleft', c(expression(LD[50]), expression(LD[25])),
       col=c('red','blue'), lty=c('dashed','dotted'), inset=.01)
text(ld50, .52, sprintf("%.3f", ld50), pos=c(2), col='red')
text(ld25, .27, sprintf("%.3f", ld25), pos=c(2), col="blue")
```

<img src="Bayesian-Methods-Playbook_files/figure-html/unnamed-chunk-14-2.png" width="50%" />

---

### Continued

```r
DT::datatable(exp.dat, options=list(bPaginate=FALSE))
```
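---

### Confidence band for the fitted curve (sketch)

A hedged extension of the plot above, not part of the original analysis: redrawing the fitted curve with an approximate 95% confidence band, built from the standard errors on the link (log-odds) scale and back-transformed with the inverse logit. It reuses `exp.glm` and `drange` from the previous slides.

```r
# predictions on the link scale, with standard errors
pr <- predict(exp.glm, newdata = data.frame(dose = drange), se.fit = TRUE)

# plogis() is the inverse logit; transform the fit and the +/- 1.96 SE limits
plot(drange, plogis(pr$fit), type = 'l', xlab = 'dose', ylab = 'probability')
lines(drange, plogis(pr$fit + 1.96 * pr$se.fit), lty = 'dotdash')
lines(drange, plogis(pr$fit - 1.96 * pr$se.fit), lty = 'dotdash')
```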
---

### **LOO package in rstanarm**

```r
library(rstanarm)
options(mc.cores = parallel::detectCores())
library(loo)
library(tidyverse)
library(GGally)
library(bayesplot)
theme_set(bayesplot::theme_default())
library(projpred)

SEED = 87
set.seed(SEED)

fitg <- stan_glm(cbind(ndied,nalive) ~ dose, data = exp.dat,
                 na.action = na.fail, family = binomial(), seed = SEED)

summary(glm(cbind(ndied,nalive) ~ dose, family=binomial, data=exp.dat))
```

```
## 
## Call:
## glm(formula = cbind(ndied, nalive) ~ dose, family = binomial, 
##     data = exp.dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2746  -0.4668   0.7688   0.9544   1.2990  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.82300    1.28959  -11.49   <2e-16 ***
## dose          0.24942    0.02139   11.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   7.3849  on 6  degrees of freedom
## AIC: 37.583
## 
## Number of Fisher Scoring iterations: 4
```

```r
round(coef(fitg), 2)
```

```
## (Intercept)        dose 
##      -14.85        0.25
```

```r
round(posterior_interval(fitg, prob = 0.9), 2)
```

```
##                 5%    95%
## (Intercept) -17.01 -12.86
## dose          0.22   0.29
```

```r
fitg0 <- stan_glm(cbind(ndied,nalive) ~ 1, data = exp.dat,
                  na.action = na.fail, family = binomial(), seed = SEED)

(loog <- loo(fitg))
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -19.1 2.4
## p_loo         2.1 0.4
## looic        38.2 4.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     6     75.0%   1214      
##  (0.5, 0.7]   (ok)       2     25.0%   392       
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
(loog0 <- loo(fitg0))
```

```
## Warning: Found 5 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 5 times to compute the ELPDs for the problematic observations directly.
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -172.8 34.1
## p_loo        30.5  6.7
## looic       345.6 68.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     2     25.0%   129       
##  (0.5, 0.7]   (ok)       1     12.5%   57        
##    (0.7, 1]   (bad)      3     37.5%   16        
##    (1, Inf)   (very bad) 2     25.0%   5         
## See help('pareto-k-diagnostic') for details.
```

---

### **LOO comparison**

```r
loo_compare(loog0, loog)
```

```
##       elpd_diff se_diff
## fitg     0.0       0.0 
## fitg0 -153.7      35.5
```

- Based on cross-validation, dose contains significant information for improving predictions: dropping it costs about 154 units of elpd.

---

### **Compare loo_probit and loo_logit**

```r
# Compare the probit
fitg1 <- stan_glm(cbind(ndied,nalive) ~ dose, data = exp.dat,
                  na.action = na.fail, family = binomial(link=probit), seed = SEED)

loog1 <- loo(fitg1)
loo_compare(loog1, loog)
```

```
##       elpd_diff se_diff
## fitg1  0.0       0.0   
## fitg  -0.7       0.7
```

```r
round(posterior_interval(fitg1, prob = 0.9), 2)
```

```
##                5%   95%
## (Intercept) -9.74 -7.46
## dose         0.13  0.16
```
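---

### **Posterior uncertainty in LD50 (sketch)**

Because `fitg` is a full posterior, quantities derived from the coefficients come with uncertainty for free. A sketch reusing `dose4prob()` from the earlier slide (it applies to the logit fit `fitg`; the probit fit `fitg1` would need the probit inverse instead):

```r
# one LD50 value per posterior draw
draws <- as.data.frame(fitg)   # columns "(Intercept)" and "dose"
ld50_post <- dose4prob(draws[["(Intercept)"]], draws[["dose"]], 0.5)

# posterior median and 90% interval for LD50
quantile(ld50_post, c(0.05, 0.5, 0.95))
```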
---

## LOO_logit

```r
loog
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -19.1 2.4
## p_loo         2.1 0.4
## looic        38.2 4.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     6     75.0%   1214      
##  (0.5, 0.7]   (ok)       2     25.0%   392       
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
#loog1
#summary(loog)
```

---

## LOO_probit

```r
loog1
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -18.5 2.7
## p_loo         1.9 0.4
## looic        36.9 5.4
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     7     87.5%   922       
##  (0.5, 0.7]   (ok)       1     12.5%   1143      
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
#loog1
#summary(loog)
```

---

## Summary for loo_probit

```r
#summary(fitg1)
posterior <- as.array(fitg1)
np <- nuts_params(fitg1)
mcmc_pairs(posterior, pars = c("dose", "(Intercept)"), np = np)
```

<img src="Bayesian-Methods-Playbook_files/figure-html/unnamed-chunk-21-1.png" width="50%" />

```r
#launch_shinystan(fitg1)
```

---

# Question/Discussion

Thank You!