class: title-slide, left, bottom

# Averaging Bayesian Predictive Distributions
----
## **LOO (Leave-One-Out) cross-validation**
### Hyunsu Ju
### 9/10/2021

---
class: inverse, middle, center

# **Bayesian Statistics:** *Almost certainly.*

- Probability is a measure of subjective belief about how likely an event is, based on prior understanding and new information.

+ Prior `\(\rightarrow\)` Information `\(\rightarrow\)` Posterior

---

# **Bayes Rule**

The cornerstone of the Bayesian approach (and the source of its name) is the conditional likelihood theorem known as **Bayes' rule**.

In its simplest form, Bayes' Rule states that for two events A and B (with `\(P(B) \neq 0\)`):

$$
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
$$

Or, if A can take on multiple values, we have the **extended form**:

$$
p(A_i|B) = \frac{p(B | A_i) P(A_i)}{\sum_j P(B|A_j)P(A_j)}
$$

---

# **Inference using Bayes' Rule**

Adapting Bayes' Rule to the case of a parameter value `\(\theta\)` and observed data *y*, we have:

$$
p(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(y\mid\theta)p(\theta)}{ \int f(y \mid \theta) p(\theta) d\theta} \underbrace{\propto}_{\text{proportional to}} f(y|\theta)p(\theta)
$$

Adding a bit of terminology:

- `\(p(\theta)\)` is the **prior** distribution: our initial subjective belief about the probability distribution of `\(\theta\)`.
- `\(f(y|\theta)\)` you may recognize from Maximum Likelihood estimation as:
  + `\(f(y | \theta) = \prod_{i=1}^n f(y_i |\theta) = \mathcal{L} (\theta)\)`, the **likelihood function**.
- Finally, `\(p(\theta \mid y)\)` is our **posterior** (post-experiment) belief about the probability distribution of `\(\theta\)`.

---

### Continued

- Hence we have the basic statement:

$$
Posterior \propto Likelihood \times Prior
$$

- This is commonly summarized as saying that the posterior belief is a compromise between the data and the prior belief.
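---

### **A toy example: posterior from likelihood times prior**

To make the proportionality statement concrete, here is a minimal grid-approximation sketch. It is a toy illustration, not one of this deck's examples: the Beta(2, 2) prior and the 7-successes-in-10-trials data are made up.

```r
# Grid approximation of the posterior for a coin-flip probability theta
theta <- seq(0, 1, length.out = 1000)        # grid of parameter values
prior <- dbeta(theta, 2, 2)                  # a weakly informative Beta(2, 2) prior
lik   <- dbinom(7, size = 10, prob = theta)  # likelihood: 7 successes in 10 trials
post  <- prior * lik                         # posterior, up to a constant
post  <- post / sum(post)                    # normalize over the grid
theta[which.max(post)]                       # posterior mode, roughly 0.67
```

The integral in the denominator of Bayes' Rule is replaced by a simple sum over the grid, which is exactly why the unnormalized product `prior * lik` is all we need.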
---

### **Types of Priors**

Different types of priors include:

- **Uninformative (or "flat") priors**: Priors that have no impact on posterior values (i.e., assuming total ignorance about the possible parameter value).
  + A classic uninformative prior is the uniform prior, which treats all possible parameter values as equally likely: `\(p(\theta) = c \text{ } \forall \theta \in \Theta\)`
- **Informative priors**: Priors where we use prior knowledge to specify a prior with a best guess of the prior mean and distribution for a parameter value.
- **Weakly informative** or **regularizing priors**: Priors which are deliberately less informative than our actual knowledge, affecting estimates less than informative priors but still incorporating *very conservative* information into the production of posterior estimates.

---

### **Bayesian Computation**

One major feature of Bayesian inference that I haven't mentioned so far is the intractability of analytic solutions for estimating posterior distributions in most circumstances.

Recall:

$$
p(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(y\mid\theta)p(\theta)}{ \int f(y \mid \theta) p(\theta) d\theta}
$$

For models that are more complex or that involve high-dimensional data, closed-form solutions are not available for the integral in the denominator. Hence, Bayesian analysis instead typically relies on numerical methods, usually Markov chain Monte Carlo (MCMC) methods.

---

### **MCMC Methods**

This method relies on sequentially sampling values of `\(\theta\)` from an approximation of the posterior distribution and then correcting the approximation to create better subsequent samples.

- Because the approximated distribution used in one step relies on the sample from the previous step, the simulation forms a **Markov chain**.
- A critical property then is **convergence**: Have our simulations converged to the real target distribution?
  + Typically, instead of running one very long "chain", researchers use multiple short chains.
  + The aggregate can not only converge faster, but the variability between chains also provides a better diagnostic of convergence.

---

### **Short comparison of rstanarm and brms**

**rstanarm** is faster, has better posterior checking, and is a bit simpler to use.

**brms** is generally a bit more flexible, with support for some regression types missing in rstanarm, more flexible specification of priors, and support for more types of error correlation structures.

- [rstanarm](https://www.rdocumentation.org/packages/rstanarm) is the more popular choice.

---

### **Using rstanarm**

With rstanarm, you run most regressions using the function [stan_glm()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/stan_glm)

- Generalized linear models (GLMs) encompass linear regression, probit, logit, Poisson, binomial, exponential, and other models.

**Syntax:**

```r
mybayesreg <- stan_glm(y ~ x1 + x2 + x3 ..., family = myfamily, data = mydata, prior = myprior)
```

---

### **Options with stan_glm**

**Family** (with a possible **link** argument needed as well) defines the type of regression you want:

- Linear regression: `family = gaussian`
- Logit: `family = binomial(link = "logit")`
- Probit: `family = binomial(link = "probit")`
- Poisson: `family = poisson`
- More options can be read from the main [GLM page](https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/glm)

**Prior distributions:**

- **Flat priors** can be set by using `prior = NULL`
- *[Weakly]* informative priors can be specified by using `prior = ` with one of:
  + *normal, student_t, cauchy, laplace* and more found [here](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/priors), as sketched on the next slide
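---

### **Sketch: setting explicit priors**

A minimal sketch of the prior syntax. The data set `mydata` and the variables `y`, `x1`, `x2` are placeholders, not objects from these slides; `prior_intercept` controls the intercept separately from the coefficient priors.

```r
library(rstanarm)

# Weakly informative normal priors on the slopes and the intercept
mybayesreg <- stan_glm(y ~ x1 + x2,
                       family          = gaussian,
                       data            = mydata,
                       prior           = normal(0, 2.5),  # slope coefficients
                       prior_intercept = normal(0, 5))    # intercept
```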
---

### **A Titanic survival example with rstanarm**

```r
library(rstanarm)   # stan_glm()
library(tidyverse)  # str_extract() and the wrangling used below

url <- "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
Titanic <- read.csv(url, header = TRUE)

# Display titanic data
#glimpse(Titanic, width = 50)

# Reformat Class
Titanic$class <- str_extract(Titanic$Pclass, "[0-9]")

TitanicLinear <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                          data = Titanic,
                          family = gaussian)
```

```
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.13 seconds (Warm-up)
## Chain 1:                0.228 seconds (Sampling)
## Chain 1:                0.358 seconds (Total)
## Chain 1: 
... (chains 2-4 produced analogous progress output; truncated) ...
```

```r
summary(TitanicLinear)
```

```
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      Survived ~ Age + Sex + as.factor(class)
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 714
##  predictors:   5
## 
## Estimates:
##                     mean   sd   10%   50%   90%
## (Intercept)        1.1    0.1  1.1   1.1   1.2 
## Age                0.0    0.0  0.0   0.0   0.0 
## Sexmale           -0.5    0.0 -0.5  -0.5  -0.4 
## as.factor(class)2 -0.2    0.0 -0.3  -0.2  -0.2 
## as.factor(class)3 -0.4    0.0 -0.5  -0.4  -0.4 
## sigma              0.4    0.0  0.4   0.4   0.4 
## 
## Fit Diagnostics:
##           mean   sd   10%   50%   90%
## mean_PPD 0.4    0.0  0.4   0.4   0.4 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                   mcse Rhat n_eff
## (Intercept)       0.0  1.0  2735 
## Age               0.0  1.0  3288 
## Sexmale           0.0  1.0  4577 
## as.factor(class)2 0.0  1.0  2810 
## as.factor(class)3 0.0  1.0  2758 
## sigma             0.0  1.0  4534 
## mean_PPD          0.0  1.0  4223 
## log-posterior     0.0  1.0  1814 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
```

---

##### **Credible intervals**

You can also easily print the credible intervals with the function [posterior_interval()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/posterior_interval.stanreg)

```r
posterior_interval(TitanicLinear, prob = 0.95)
```

```
##                          2.5%        97.5%
## (Intercept)        1.030266767  1.222839447
## Age               -0.007479305 -0.003391709
## Sexmale           -0.539206252 -0.418148797
## as.factor(class)2 -0.291069298 -0.126452989
## as.factor(class)3 -0.483923498 -0.331491394
## sigma              0.366018620  0.406772331
```

---

#### **Graphical credible intervals**

```r
plot(TitanicLinear)
```

<!-- -->

---

#### **Plotting the posterior distribution**

You can also easily plot the posterior distribution of a parameter in R.

```r
Titanic_posterior <- TitanicLinear %>%
  as_tibble() %>%
  rename(sec.class = "as.factor(class)2",
         third.class = "as.factor(class)3")

ggplot(Titanic_posterior, aes(x = third.class)) +
  geom_histogram()
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<!-- -->

---

#### **Juxtaposing the prior and the posterior**

```r
posterior_vs_prior(TitanicLinear)
```

```
## 
## Drawing from prior...
```

<!-- -->
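---

#### **Alternative posterior plots with bayesplot**

As a complement to the histogram above, here is a sketch using the bayesplot package (loaded later in these slides), which shades the credible interval directly on the posterior density:

```r
library(bayesplot)

# as.matrix() extracts the 4000 posterior draws; pars picks the coefficients
mcmc_areas(as.matrix(TitanicLinear),
           pars = c("Sexmale", "as.factor(class)2", "as.factor(class)3"),
           prob = 0.95)   # shade the central 95% interval
```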
---

# Model Testing

## **Model testing basics**

There are a number of regression diagnostics available after performing Bayesian regression to help infer whether the model converged, determine its performance, and even compare between models.

The slides cover some of those included with rstanarm, as well as the very useful [shinystan package](https://www.rdocumentation.org/packages/shinystan).

---

### **Graphical posterior predictive analysis**

To check the predictive accuracy of the posterior distribution, you can use the function [pp_check()](https://www.rdocumentation.org/packages/rstanarm/versions/2.17.2/topics/pp_check.stanreg), which plots simulated y values from the posterior distribution against the actual values of y.

```r
pp_check(TitanicLinear)
```

<!-- -->

---

### **Regularization and Predictive Accuracy**

A critical issue in both Bayesian and frequentist estimation is how to balance predictive accuracy with parsimony. Put another way, the researcher should be concerned with not overfitting the data while still creating a compelling model.

The basic approach in frequentist methods is to use the Akaike information criterion (AIC):

**Expected Log Predictive Density:**

`$$\hat{elpd}_{AIC} = \log p(y | \hat{\theta}_{MLE}) - k$$`

- Where `\(\hat{\theta}_{MLE}\)` is the maximum likelihood estimator of `\(\theta\)`,
- `\(\log p(y | \hat{\theta}_{MLE})\)` is the log likelihood given `\(\hat{\theta}_{MLE}\)`,
- and k is the number of parameters in the model.

---

### **Deviance information criterion**

The most basic Bayesian adaptation of the AIC is the Deviance information criterion (DIC):

`$$\hat{elpd}_{DIC} = \log p(y | \hat{\theta}_{Bayes}) - p_{DIC}$$`

- Where `\(\hat{\theta}_{Bayes}\)` is the posterior mean estimate and
- `\(p_{DIC}\)` is the effective number of parameters in the model, computed using a data-based bias correction

---

### **Watanabe-Akaike information criterion**

An improvement over the DIC is the Watanabe-Akaike information criterion:

`$$\hat{elpd}_{WAIC} = \sum_{i=1}^{n} \log \text{E}_{post}\Big[p(y_i \mid \theta)\Big] - \sum_{i=1}^{n} \text{V}_{post}\Big[\log p(y_i \mid \theta)\Big]$$`

The WAIC has the advantages of:

- Averaging the likelihood over the posterior distribution rather than plugging in the posterior mean
- Not assuming a multivariate Gaussian posterior distribution, as the DIC (and AIC) do

---

### **WAIC example**

```r
waic(TitanicLinear)
```

```
## 
## Computed from 4000 by 714 log-likelihood matrix
## 
##           Estimate   SE
## elpd_waic   -334.9 19.1
## p_waic         6.1  0.4
## waic         669.9 38.3
```

---

### **Leave One Out Cross-Validation**

Another method alongside WAIC for comparing out-of-sample predictive ability is to apply leave-one-out cross-validation (LOO).

- LOO assesses the predictive ability of posterior simulations by iteratively partitioning the data into training and prediction sets: each observation in turn is predicted from the model fit to the remaining observations.

**Expected Log Predictive Density:**

`$$\hat{elpd}_{LOO} = \sum_{i=1}^{n} \log p(y_i | y_{-i})$$`

---

### **LOO example**

```r
loo(TitanicLinear)
```

```
## 
## Computed from 4000 by 714 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -334.9 19.1
## p_loo         6.1  0.4
## looic       669.9 38.3
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
```
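---

### **What loo() computes under the hood**

`loo()` approximates exact LOO with Pareto-smoothed importance sampling (PSIS) applied to the pointwise log-likelihood matrix. A hedged sketch of the same computation done by hand, assuming the default 4 chains of 1,000 post-warmup draws used above:

```r
library(loo)

# 4000 x 714 matrix: one log-likelihood value per draw and observation
ll <- log_lik(TitanicLinear)

# Relative efficiency of the draws, needed for correct Monte Carlo SEs
r_eff <- relative_eff(exp(ll), chain_id = rep(1:4, each = 1000))

loo(ll, r_eff = r_eff)   # should match loo(TitanicLinear) above
```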
---

### **Comparing models between loo_logit and loo_probit**

```r
Titanic_probit <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                           data = Titanic,
                           family = binomial(link = probit))
```

```
## 
## SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
## Chain 1: Rejecting initial value:
## Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 1:   Stan can't start sampling from this initial value.
... (the same rejection message repeated many times before a valid initial value was found; truncated) ...
## Chain 1: 
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.079 seconds (Warm-up)
## Chain 1:                2.115 seconds (Sampling)
## Chain 1:                4.194 seconds (Total)
## Chain 1: 
... (chains 2-4: similar initial-value rejections and analogous progress output; truncated) ...
```

```r
Loo_probit <- loo(Titanic_probit)

Titanic_logit <- stan_glm(Survived ~ Age + Sex + as.factor(class),
                          data = Titanic,
                          family = binomial(link = logit))
```

```
## 
## SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.652 seconds (Warm-up)
## Chain 1:                0.627 seconds (Sampling)
## Chain 1:                1.279 seconds (Total)
## Chain 1: 
... (chains 2-4 produced analogous progress output; truncated) ...
```

```r
Loo_logit <- loo(Titanic_logit)

# loo_compare lists the best-fitting model first; elpd_diff is measured
# relative to that model, so more-negative values mean less support
loo_compare(Loo_probit, Loo_logit)
```

```
##                elpd_diff se_diff
## Titanic_logit   0.0       0.0   
## Titanic_probit -1.2       0.7
```

---

### **Continued**

```r
# loo_compare lists the best-fitting model first; elpd_diff is measured
# relative to that model, so more-negative values mean less support
loo_compare(Loo_probit, Loo_logit)
```

```
##                elpd_diff se_diff
## Titanic_logit   0.0       0.0   
## Titanic_probit -1.2       0.7
```

---

### **Many more diagnostics with shinystan**

Probably the most popular diagnostic tool for Bayesian regression in R is the functionality from the [shinystan package](https://www.rdocumentation.org/packages/shinystan).

- shinystan launches a "Shiny" web application to show you model diagnostics, so it can't be run inside an R Markdown document (but works just fine when called from the console).

```r
# Do in console, not R Markdown
launch_shinystan(TitanicLinear)
```

---

### Dose response example data

```r
exp.dat = matrix(c(49.1,53,56.9,60.8,64.8,68.7,72.6,76.5,
                   59,60,62,56,63,59,62,60,
                   6,13,18,28,52,53,61,60,
                   .102,.217,.29,.5,.825,.898,.984,1), ncol = 4)
colnames(exp.dat) = c('dose','nexp','ndied','prop')
exp.dat = as.data.frame(exp.dat)

# compute how many lived
exp.dat$nalive = exp.dat$nexp - exp.dat$ndied
#View(exp.dat)

summary(glm(cbind(ndied,nalive) ~ dose, family=binomial, data=exp.dat) -> exp.glm)
```

```
## 
## Call:
## glm(formula = cbind(ndied, nalive) ~ dose, family = binomial, 
##     data = exp.dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2746  -0.4668   0.7688   0.9544   1.2990  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.82300    1.28959  -11.49   <2e-16 ***
## dose          0.24942    0.02139   11.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   7.3849  on 6  degrees of freedom
## AIC: 37.583
## 
## Number of Fisher Scoring iterations: 4
```

```r
# invert the fitted logistic curve: the dose at which the modeled death
# probability equals `prob` (note -prob/(prob-1) equals prob/(1-prob))
dose4prob = function(b0, b1, prob){
  d = (-b0 + log(-prob/(prob-1)))/b1
  return(d)
}

dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .5)
```

```
## [1] 59.43092
```
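Why this works: setting `\(p = \text{logit}^{-1}(\beta_0 + \beta_1 d)\)` and solving for the dose gives

`$$d_p = \frac{\operatorname{logit}(p) - \beta_0}{\beta_1} = \frac{\log\frac{p}{1-p} - \beta_0}{\beta_1},$$`

so for `\(p = 0.5\)` the logit term vanishes and `\(d_{0.5} = -\beta_0/\beta_1 = 14.823/0.24942 \approx 59.43\)`, matching the output above.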
```r
range(exp.dat$dose)
```

```
## [1] 49.1 76.5
```

```r
drange = seq(30, 90, length = 100)
exp.pred = predict(exp.glm, newdata = data.frame(dose = drange))
```

---

### Plot for LD25 and LD50

```r
knitr::include_graphics("C:\\Users\\HJU\\OneDrive - Environmental Protection Agency (EPA)\\Profile\\Documents\\curve.png")
```

<img src="C:\Users\HJU\OneDrive - Environmental Protection Agency (EPA)\Profile\Documents\curve.png" width="50%" />

```r
plot(drange, exp(exp.pred)/(1+exp(exp.pred)), type='l', xlab='dose', ylab='probability')

# add in the observed points
points(prop ~ dose, data = exp.dat, col='red', pch=19)

# compute our LD values
ld50 = dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .50)
ld25 = dose4prob(coef(exp.glm)[[1]], coef(exp.glm)[[2]], .25)

# let's add these to our graph for visualization purposes
abline(h=.5, lty='dashed', col='red')
abline(h=.25, lty='dotted', col='blue')
legend('topleft', c(expression(LD[50]), expression(LD[25])),
       col=c('red','blue'), lty=c('dashed','dotted'), inset=.01)
text(ld50, .52, sprintf("%.3f", ld50), pos=c(2), col='red')
text(ld25, .27, sprintf("%.3f", ld25), pos=c(2), col="blue")
```

<img src="Bayesian-Methods-Playbook_files/figure-html/unnamed-chunk-14-2.png" width="50%" />

---

### Continued

```r
DT::datatable(exp.dat, options=list(bPaginate=FALSE))
```
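---

### Confidence band for the fitted curve (sketch)

A hedged extension of the plot above, not part of the original analysis: redrawing the fitted curve with an approximate 95% confidence band, built from the standard errors on the link (log-odds) scale and back-transformed with the inverse logit. It reuses `exp.glm` and `drange` from the previous slides.

```r
# predictions on the link scale, with standard errors
pr <- predict(exp.glm, newdata = data.frame(dose = drange), se.fit = TRUE)

# plogis() is the inverse logit; transform the fit and the +/- 1.96 SE limits
plot(drange, plogis(pr$fit), type = 'l', xlab = 'dose', ylab = 'probability')
lines(drange, plogis(pr$fit + 1.96 * pr$se.fit), lty = 'dotdash')
lines(drange, plogis(pr$fit - 1.96 * pr$se.fit), lty = 'dotdash')
```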
---

### **LOO package in rstanarm**

```r
library(rstanarm)
options(mc.cores = parallel::detectCores())
library(loo)
library(tidyverse)
library(GGally)
library(bayesplot)
theme_set(bayesplot::theme_default())
library(projpred)

SEED = 87
set.seed(SEED)

fitg <- stan_glm(cbind(ndied,nalive) ~ dose, data = exp.dat,
                 na.action = na.fail, family = binomial(), seed = SEED)

summary(glm(cbind(ndied,nalive) ~ dose, family=binomial, data=exp.dat))
```

```
## 
## Call:
## glm(formula = cbind(ndied, nalive) ~ dose, family = binomial, 
##     data = exp.dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2746  -0.4668   0.7688   0.9544   1.2990  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.82300    1.28959  -11.49   <2e-16 ***
## dose          0.24942    0.02139   11.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   7.3849  on 6  degrees of freedom
## AIC: 37.583
## 
## Number of Fisher Scoring iterations: 4
```

```r
round(coef(fitg), 2)
```

```
## (Intercept)        dose 
##      -14.85        0.25
```

```r
round(posterior_interval(fitg, prob = 0.9), 2)
```

```
##                 5%    95%
## (Intercept) -17.01 -12.86
## dose          0.22   0.29
```

```r
fitg0 <- stan_glm(cbind(ndied,nalive) ~ 1, data = exp.dat,
                  na.action = na.fail, family = binomial(), seed = SEED)

(loog <- loo(fitg))
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -19.1 2.4
## p_loo         2.1 0.4
## looic        38.2 4.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     6     75.0%   1214      
##  (0.5, 0.7]   (ok)       2     25.0%   392       
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
(loog0 <- loo(fitg0))
```

```
## Warning: Found 5 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 5 times to compute the ELPDs for the problematic observations directly.
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -172.8 34.1
## p_loo        30.5  6.7
## looic       345.6 68.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     2     25.0%   129       
##  (0.5, 0.7]   (ok)       1     12.5%   57        
##    (0.7, 1]   (bad)      3     37.5%   16        
##    (1, Inf)   (very bad) 2     25.0%   5         
## See help('pareto-k-diagnostic') for details.
```

---

### **LOO comparison**

```r
loo_compare(loog0, loog)
```

```
##       elpd_diff se_diff
## fitg     0.0       0.0 
## fitg0 -153.7      35.5
```

- Based on cross-validation, dose contains significant information for improving predictions: dropping it costs about 154 units of elpd.

---

### **Compare loo_probit and loo_logit**

```r
# Compare the probit
fitg1 <- stan_glm(cbind(ndied,nalive) ~ dose, data = exp.dat,
                  na.action = na.fail, family = binomial(link=probit), seed = SEED)

loog1 <- loo(fitg1)
loo_compare(loog1, loog)
```

```
##       elpd_diff se_diff
## fitg1  0.0       0.0   
## fitg  -0.7       0.7
```

```r
round(posterior_interval(fitg1, prob = 0.9), 2)
```

```
##                5%   95%
## (Intercept) -9.74 -7.46
## dose         0.13  0.16
```
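---

### **Posterior uncertainty in LD50 (sketch)**

Because `fitg` is a full posterior, quantities derived from the coefficients come with uncertainty for free. A sketch reusing `dose4prob()` from the earlier slide (it applies to the logit fit `fitg`; the probit fit `fitg1` would need the probit inverse instead):

```r
# one LD50 value per posterior draw
draws <- as.data.frame(fitg)   # columns "(Intercept)" and "dose"
ld50_post <- dose4prob(draws[["(Intercept)"]], draws[["dose"]], 0.5)

# posterior median and 90% interval for LD50
quantile(ld50_post, c(0.05, 0.5, 0.95))
```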
---

## LOO_logit

```r
loog
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -19.1 2.4
## p_loo         2.1 0.4
## looic        38.2 4.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     6     75.0%   1214      
##  (0.5, 0.7]   (ok)       2     25.0%   392       
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
#loog1
#summary(loog)
```

---

## LOO_probit

```r
loog1
```

```
## 
## Computed from 4000 by 8 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -18.5 2.7
## p_loo         1.9 0.4
## looic        36.9 5.4
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     7     87.5%   922       
##  (0.5, 0.7]   (ok)       1     12.5%   1143      
##    (0.7, 1]   (bad)      0      0.0%   <NA>      
##    (1, Inf)   (very bad) 0      0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
```

```r
#loog1
#summary(loog)
```

---

## Summary for loo_probit

```r
#summary(fitg1)
posterior <- as.array(fitg1)
np <- nuts_params(fitg1)
mcmc_pairs(posterior, pars = c("dose", "(Intercept)"), np = np)
```

<img src="Bayesian-Methods-Playbook_files/figure-html/unnamed-chunk-21-1.png" width="50%" />

```r
#launch_shinystan(fitg1)
```

---

# Question/Discussion

Thank You!