This is a reproduction from https://www.r-bloggers.com/2020/04/bayesian-linear-regression/ for personal study purpose.

1. Introduction

Frequentist

For statistical inferences we have tow general approaches or frameworks:

  • Frequentist approach in which the data sampled from the population is considered as random and the population parameter values, known as null hypothesis, as fixed (but unknown). To estimate thus this null hypothesis we look for the sample parameters that maximize the likelihood of the data. However, the data at hand, even it is sampled randomly from the population, it is fixed now, so how can we consider this data as random. The answer is that we assume that the population distribution is known and we work out the maximum likelihood of the data using this distribution. Or we repeat the study many times with different samples then we average the results. So if we get very small value for the likelihood of the data which is known as p-value we tend to reject the null hypothesis. The main problem, however, is the misunderstanding and misusing of this p-value when we decide to reject the null hypothesis based on some threshold, from which we wrongly interpreting it as the probability of rejecting the null hypothesis. For more detail about p-value click here.

  • Bayesian approach, in contrast, provides true probabilities to quantify the uncertainty about a certain hypothesis, but requires the use of a first belief about how likely this hypothesis is true, known as prior, to be able to derive the probability of this hypothesis after seeing the data known as posterior probability. This approach called bayesian because it is based on the bayes’ theorem, for instance if a have population parameter to estimate 𝜃 , and we have some data sampled randomly from this population 𝐷, the posterior probability thus will be \[\overbrace{p(\theta/D)}^{Posterior}=\frac{\overbrace{p(D/\theta)}^{Likelihood}.\overbrace{p(\theta)}^{Prior}}{\underbrace{p(D)}_{Evidence}}\] The Evidence is the probability of the data at hand regardless the parameter 𝜃

2. Data preparation

For simplicity we use the BostonHousing data from mlbench package, For more detail about this data run this command ?BostonHousing after calling the package. But first Let’s call all the packages that we need throughout this article.

suppressPackageStartupMessages(library(mlbench))
suppressPackageStartupMessages(library(rstanarm))
suppressPackageStartupMessages(library(bayestestR))
suppressPackageStartupMessages(library(bayesplot))
suppressPackageStartupMessages(library(insight))
suppressPackageStartupMessages(library(broom))
data("BostonHousing")
str(BostonHousing)
'data.frame':   506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : num  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ b      : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

To well understand how the bayesian regression works we keep only three features, two numeric variables age, dis and one categorical chas, with the target variable medv the median value of owner-occupied homes.

bost <- BostonHousing[,c("medv","age","dis","chas")]
head(bost)
  medv  age    dis chas
1 24.0 65.2 4.0900    0
2 21.6 78.9 4.9671    0
3 34.7 61.1 4.9671    0
4 33.4 45.8 6.0622    0
5 36.2 54.2 6.0622    0
6 28.7 58.7 6.0622    0
str(bost)
'data.frame':   506 obs. of  4 variables:
 $ medv: num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
 $ age : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis : num  4.09 4.97 4.97 6.06 6.06 ...
 $ chas: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

3. Classical linear regression model

To highlight the difference between the Bayesian regression and the traditional linear regression (frequentist approach), Let’s first fit the latter to our data.

model_freq<-lm(medv~., data=bost)
tidy(model_freq)

Checking the p.value of each regressor tells all the regressors are significant except “dis” variable. Since the variable “chas” is categorical by two levels, the coefficient of “chas1” is the difference between the median price of houses on the bounds Charles River and that of the others, so the median price of the former is about 7.513.

4. Bayesian regression

To fit a Bayesian regression, we use the function stan_glm from the rstanarm package. This function as the above lm function requires providing the formula and the data that will be used, and leave all the following arguments with their default values:

model_bayes <- stan_glm(medv~., data=bost, seed=111)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3.7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.37 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.108939 seconds (Warm-up)
Chain 1:                0.152252 seconds (Sampling)
Chain 1:                0.261191 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 2.2e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.09926 seconds (Warm-up)
Chain 2:                0.136321 seconds (Sampling)
Chain 2:                0.235581 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 2e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.100463 seconds (Warm-up)
Chain 3:                0.150563 seconds (Sampling)
Chain 3:                0.251026 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 2.4e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.104112 seconds (Warm-up)
Chain 4:                0.142341 seconds (Sampling)
Chain 4:                0.246453 seconds (Total)
Chain 4: 
print(model_bayes, digits = 3)
stan_glm
 family:       gaussian [identity]
 formula:      medv ~ .
 observations: 506
 predictors:   4
------
            Median MAD_SD
(Intercept) 32.834  2.285
age         -0.143  0.020
dis         -0.258  0.257
chas1        7.543  1.432

Auxiliary parameter(s):
      Median MAD_SD
sigma 8.324  0.260 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

The Median estimate is the median computed from the MCMC simulation, and MAD_SD is the median absolute deviation computed from the same simulation. To well understand how getting these outputs, we plot the MCMC simulation of each predictor using bayesplot

mcmc_dens(model_bayes, pars = c("age"))+
  vline_at(-0.143, col="red")

As can be seen, the point estimate of “age” falls on the median of the distribution (red line). The same thing is true for “dis”" and “chas” predictors.

mcmc_dens(model_bayes, pars=c("dis"))+
  vline_at(-0.244, col="red")

mcmc_dens(model_bayes, pars=c("chas1"))+
  vline_at(7.496, col="red")

How do we evaluate the model parameters? By analyzing the posteriors using some specific statistics. We make use of the function describe_posterior provided by bayestestR package.

describe_posterior(model_bayes)
Possible multicollinearity between dis and age (r = 0.76). This might lead to inappropriate results. See 'Details' in '?rope'.
# Description of Posterior Distributions

Parameter   | Median |           89% CI |    pd |        89% ROPE | % in ROPE |  Rhat |      ESS
------------------------------------------------------------------------------------------------
(Intercept) | 32.834 | [29.218, 36.295] | 1.000 | [-0.920, 0.920] |         0 | 1.002 | 2029.279
age         | -0.143 | [-0.175, -0.112] | 1.000 | [-0.920, 0.920] |       100 | 1.001 | 2052.155
dis         | -0.258 | [-0.667,  0.179] | 0.819 | [-0.920, 0.920] |       100 | 1.002 | 2115.192
chas1       |  7.543 | [ 5.159,  9.813] | 1.000 | [-0.920, 0.920] |         0 | 1.000 | 3744.403

Aternatively, we can get the coefficeient estimates (which are the medians by default) separatly by using the package insight.

post <- get_parameters(model_bayes)
print(purrr::map_dbl(post,median),digits = 3)
(Intercept)         age         dis       chas1 
     32.834      -0.143      -0.258       7.543 

We can also compute the Maximum A posteriori (MAP), and the mean as follows:

print(purrr::map_dbl(post, map_estimate),digits = 3)
(Intercept)         age         dis       chas1 
     33.025      -0.145      -0.295       7.573 
print(purrr::map_dbl(post, mean),digits = 3)
(Intercept)         age         dis       chas1 
     32.761      -0.143      -0.248       7.523 

As we see, the values are closer to each other due to the like normality of the distribution of the posteriors where all the central statistics (mean, median, mode) are closer to each other. Using the following plot to visualize the age coefficient using different statistics as follows:

mcmc_dens(model_bayes, pars=c("age"))+
  vline_at(median(post$age), col="red")+
  vline_at(mean(post$age), col="yellow")+
  vline_at(map_estimate(post$age), col="green")

As expected they are approximately on top of each other.

5. Bayesian inferences

As we do with classical regression (frequentist), we can test the significance of the bayesian regression coefficients by checking whether the corresponding credible interval contains zero or not, if no then this coefficient is significant. Let’s go back to our model and check the significance of each coefficient (using credible based on the default hdi).

hdi(model_bayes)
# Highest Density Interval

Parameter   |        89% HDI
----------------------------
(Intercept) | [29.22, 36.29]
age         | [-0.18, -0.11]
dis         | [-0.67,  0.18]
chas1       | [ 5.16,  9.81]
eti(model_bayes)
# Equal-Tailed Interval

Parameter   |        89% ETI
----------------------------
(Intercept) | [29.20, 36.28]
age         | [-0.17, -0.11]
dis         | [-0.67,  0.18]
chas1       | [ 5.17,  9.83]

Using both methods, the only non-significant coefficient is dis variable, which is in line with the classical regression.

Note: this similar result between frequentist and bayesian regression may due to the normality assumption for the former that is well satisfied which gives satisfied results and due to the normal prior used in the latter. However, in real world it is less often to be sure about the normality assumption which may give contradict conclusions between the two approaches.

Another way to test the significance by checking the part of the credible interval that falls inside the ROPE interval. we can get this by calling the rope from bayestestR package.

rope(post$age)
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
0.00 %     

For age almost all the credible interval (HDI) is outside the ROPE range, which means that coefficient is highly significant.

rope(post$chas1)
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
0.00 %     
rope(post$`(Intercept)`)
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
0.00 %     

The same thing is true for the “chas” and “intercept” variable.

rope(post$dis)
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
20.02 %    

In contrast, almost the quarter of the credible interval of “dis” variable is inside the ROPE interval. In other words, the probability of this coefficient to be zero is 23.28%.

rope_range(model_bayes)
[1] -0.9197104  0.9197104

6. PD and P-value

Sometimes we are only interested in checking the direction of the coefficient (positive or negative). This is the role of pd statistic in the above table, high value means that the associated effect is concentrated on the same side as the median. For our model, since pd is equal to 1, almost all the posteriors of the two variables “age” and “chas1” and the “intercept” are on the same side (if median negative all other values are negatives). However, it should be noted that this statistic does not assess the significance of the effect. Something more important to mention is that there exists a strong relation between this probability and the p-value approximated as follows: 𝑝−𝑣𝑎𝑙𝑢𝑒=1−𝑝𝑑. let’s check this with our variables.

df1 <-dplyr::select(tidy(model_freq), c(term,p.value))
df1$p.value <- round(df1$p.value, digits = 3)
df2 <- 1- purrr::map_dbl(post, p_direction)
df <- cbind(df1,df2)
df
                   term p.value     df2
(Intercept) (Intercept)   0.000 0.00000
age                 age   0.000 0.00000
dis                 dis   0.354 0.18075
chas1             chas1   0.000 0.00000

Conclusion

Within the last decade more practitioners, specially in some fields such as medicine and psychology, are turning towards bayesian analysis since almost every thing can be interpreted straightforwardly with a probabilistic manner. However, the bayesian analysis has also some drawback, like the subjective way to define the priors (which play an important role to compute the posterior), or for problems that do not have conjugate prior, not always the mcmc alghoritm converges easily to the right values (specially with complex data).

---
title: "Bayesian linear regression"
output:
  html_notebook: default
  html_document: default
---
This is a reproduction from https://www.r-bloggers.com/2020/04/bayesian-linear-regression/
for personal study purpose.

* 1. Introduction
* 2. Data preparation
* 3. Classical linear regression model
* 4. Bayesian regression
* 5. Bayesian inferences
* 6. PD and P-value

# 1. Introduction

## Frequentist

For statistical inferences we have tow general approaches or frameworks:

- Frequentist approach in which the data sampled from the population is considered as random and the population parameter values, known as null hypothesis, as fixed (but unknown). To estimate thus this null hypothesis we look for the sample parameters that maximize the likelihood of the data. However, the data at hand, even it is sampled randomly from the population, it is fixed now, so how can we consider this data as random. The answer is that we assume that the population distribution is known and we work out the maximum likelihood of the data using this distribution. Or we repeat the study many times with different samples then we average the results. So if we get very small value for the likelihood of the data which is known as p-value we tend to reject the null hypothesis. The main problem, however, is the misunderstanding and misusing of this p-value when we decide to reject the null hypothesis based on some threshold, from which we wrongly interpreting it as the probability of rejecting the null hypothesis. For more detail about p-value click here.

- Bayesian approach, in contrast, provides true probabilities to quantify the uncertainty about a certain hypothesis, but requires the use of a first belief about how likely this hypothesis is true, known as prior, to be able to derive the probability of this hypothesis after seeing the data known as posterior probability. This approach called bayesian because it is based on the bayes’ theorem, for instance if a have population parameter to estimate 𝜃 , and we have some data sampled randomly from this population 𝐷, the posterior probability thus will be
$$\overbrace{p(\theta/D)}^{Posterior}=\frac{\overbrace{p(D/\theta)}^{Likelihood}.\overbrace{p(\theta)}^{Prior}}{\underbrace{p(D)}_{Evidence}}$$
The Evidence is the probability of the data at hand regardless the parameter 𝜃

# 2. Data preparation
For simplicity we use the BostonHousing data from mlbench package, For more detail about this data run this command ?BostonHousing after calling the package. But first Let’s call all the packages that we need throughout this article.
```{r}
suppressPackageStartupMessages(library(mlbench))
suppressPackageStartupMessages(library(rstanarm))
suppressPackageStartupMessages(library(bayestestR))
suppressPackageStartupMessages(library(bayesplot))
suppressPackageStartupMessages(library(insight))
suppressPackageStartupMessages(library(broom))
data("BostonHousing")
str(BostonHousing)
```
To well understand how the bayesian regression works we keep only three features, two numeric variables age, dis and one categorical chas, with the target variable medv the median value of owner-occupied homes.
```{r}
bost <- BostonHousing[,c("medv","age","dis","chas")]
head(bost)
str(bost)
```
# 3. Classical linear regression model
To highlight the difference between the Bayesian regression and the traditional linear regression (frequentist approach), Let’s first fit the latter to our data.
```{r}
model_freq<-lm(medv~., data=bost)
tidy(model_freq)
```
Checking the p.value of each regressor tells all the regressors are significant except "dis" variable. Since the variable "chas" is categorical by two levels, the coefficient of "chas1" is the difference between the median price of houses on the bounds Charles River and that of the others, so the median price of the former is about 7.513.

# 4. Bayesian regression
To fit a Bayesian regression, we use the function *stan_glm* from the *rstanarm* package. This function as the above *lm* function requires providing the formula and the data that will be used, and leave all the following arguments with their default values:

- family : by default this function uses the gaussian distribution as we do with the classical *glm* function to perform *lm* model.
- prior : The prior distribution for the regression coefficients, By default the normal prior is used. There are subset of functions used for the prior provided by rstanarm like , student t family, laplace family…ect. To get the full list with all the details run this command *?priors*. If we want a flat uniform prior we set this to NULL.
- prior_intercept: prior for the intercept, can be normal, student_t , or cauchy. If we want a flat uniform prior we set this to NULL.
- prior_aux: prior fo auxiliary parameters such as the error standard deviation for the gaussion family.
- algorithm: The estimating approach to use. The default is "sampling MCMC1.
- QR: FALSE by default, if true QR decomposition applied on the design matrix if we have large number of predictors.
- iter : is the number of iterations if the MCMC method is used, the default is 2000.
- chains : the number of Markov chains, the default is 4.
- warmup : also known as burnin, the number of iterations used for adaptation, and should not be used for inference. By default it is half of the iterations.

```{r}
model_bayes <- stan_glm(medv~., data=bost, seed=111)
print(model_bayes, digits = 3)
```
The Median estimate is the median computed from the MCMC simulation, and MAD_SD is the median absolute deviation computed from the same simulation. To well understand how getting these outputs, we plot the MCMC simulation of each predictor using *bayesplot*
```{r}
mcmc_dens(model_bayes, pars = c("age"))+
  vline_at(-0.143, col="red")
```
As can be seen, the point estimate of "age" falls on the median of the distribution (red line). The same thing is true for "dis"" and "chas" predictors.
```{r}
mcmc_dens(model_bayes, pars=c("dis"))+
  vline_at(-0.244, col="red")
mcmc_dens(model_bayes, pars=c("chas1"))+
  vline_at(7.496, col="red")
```
How do we evaluate the model parameters? By analyzing the posteriors using some specific statistics. We make use of the function *describe_posterior* provided by *bayestestR* package.
```{r}
describe_posterior(model_bayes)
```
- CI : Credible Interval, it is used to quantify the uncertainty about the regression coefficients. There are two methods to compute CI, the highest density interval HDI which is the default, and the Equal-tailed Interval ETI. with 89% probability (given the data) that a coefficient lies above the CI_low value and under CI_high value. This strightforward probabilistic interpretation is completely diffrent from the confidence interval used in classical linear regression where the coefficient fall inside this confidence interval (if we choose 95% of confidence) 95 times if we repeat the study 100 times.
- pd : Probability of Direction , which is the probability that the effect goes to the positive or to the negative direction, and it is considered as the best equivalent for the p-value.
- ROPE_CI: Region of Practical Equivalence, since bayes method deals with true probabilities, it does not make sense to compute the probability of getting the effect equals zero (the null hypothesis) as a point (probability of a point in continuous intervals equal zero ). Thus, we define instead a small range around zero which can be considered practically the same as no effect (zero), this range therefore is called ROPE. By default (according to Cohen, 1988) The Rope is [-0.1,0.1] from the standardized coefficients.
- Rhat: scale reduction factor 𝑅̂ , it is computed for each scalar quantity of interest, as the standard deviation of that quantity from all the chains included together, divided by the root mean square of the separate within-chain standard deviations. When this value is close to 1 we do not have any convergence problem with MCMC.
- ESS: effective sample size, it captures how many independent draws contain the same amount of information as the dependent sample obtained by the MCMC algorithm, the higher the ESS the better. The threshold used in practice is 400.

Aternatively, we can get the coefficeient estimates (which are the medians by default) separatly by using the package *insight*.
```{r}
post <- get_parameters(model_bayes)
print(purrr::map_dbl(post,median),digits = 3)
```
We can also compute the Maximum A posteriori (MAP), and the mean as follows:
```{r}
print(purrr::map_dbl(post, map_estimate),digits = 3)
print(purrr::map_dbl(post, mean),digits = 3)
```
As we see, the values are closer to each other due to the like normality of the distribution of the posteriors where all the central statistics (mean, median, mode) are closer to each other.
Using the following plot to visualize the age coefficient using different statistics as follows:
```{r}
mcmc_dens(model_bayes, pars=c("age"))+
  vline_at(median(post$age), col="red")+
  vline_at(mean(post$age), col="yellow")+
  vline_at(map_estimate(post$age), col="green")
```
As expected they are approximately on top of each other.

# 5. Bayesian inferences
As we do with classical regression (frequentist), we can test the significance of the bayesian regression coefficients by checking whether the corresponding credible interval contains zero or not, if no then this coefficient is significant. Let’s go back to our model and check the significance of each coefficient (using credible based on the default *hdi*).
```{r}
hdi(model_bayes)
eti(model_bayes)
```
Using both methods, the only non-significant coefficient is dis variable, which is in line with the classical regression.

__Note__: this similar result between frequentist and bayesian regression may due to the normality assumption for the former that is well satisfied which gives satisfied results and due to the normal prior used in the latter. However, in real world it is less often to be sure about the normality assumption which may give contradict conclusions between the two approaches.

Another way to test the significance by checking the part of the credible interval that falls inside the *ROPE* interval. we can get this by calling the rope from *bayestestR* package.
```{r}
rope(post$age)
```
For age almost all the credible interval (HDI) is outside the *ROPE* range, which means that coefficient is highly significant.
```{r}
rope(post$chas1)
rope(post$`(Intercept)`)
```
The same thing is true for the "chas" and "intercept" variable.
```{r}
rope(post$dis)
```
In contrast, almost the quarter of the credible interval of "dis" variable is inside the ROPE interval. In other words, the probability of this coefficient to be zero is 23.28%.
```{r}
rope_range(model_bayes)
```

# 6. PD and P-value
Sometimes we are only interested in checking the direction of the coefficient (positive or negative). This is the role of *pd* statistic in the above table, high value means that the associated effect is concentrated on the same side as the median. For our model, since pd is equal to 1, almost all the posteriors of the two variables "age" and "chas1" and the "intercept" are on the same side (if median negative all other values are negatives). However, it should be noted that this statistic does not assess the significance of the effect.
Something more important to mention is that there exists a strong relation between this probability and the *p*-value approximated as follows: 𝑝−𝑣𝑎𝑙𝑢𝑒=1−𝑝𝑑. let’s check this with our variables.
```{r}
df1 <-dplyr::select(tidy(model_freq), c(term,p.value))
df1$p.value <- round(df1$p.value, digits = 3)
df2 <- 1- purrr::map_dbl(post, p_direction)
df <- cbind(df1,df2)
df
```
## Conclusion

Within the last decade more practitioners, specially in some fields such as medicine and psychology, are turning towards bayesian analysis since almost every thing can be interpreted straightforwardly with a probabilistic manner. However, the bayesian analysis has also some drawback, like the subjective way to define the priors (which play an important role to compute the posterior), or for problems that do not have conjugate prior, not always the *mcmc* alghoritm converges easily to the right values (specially with complex data).

