knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE
)

Previous sections:

Clustering practice: https://rpubs.com/minhtri/923711
Table Descriptive Analysis: https://rpubs.com/minhtri/929114
Imputation - Dealing with missing data: https://rpubs.com/minhtri/968586
The R environment - Data Wrangling: https://rpubs.com/minhtri/968607
Correlation analysis: https://rpubs.com/minhtri/968611
Parametric or Nonparametric data: https://rpubs.com/minhtri/968616
Dealing with outliers or non-normal distribution: https://rpubs.com/minhtri/968622
Linear regression - Frequentist approach: https://rpubs.com/minhtri/968628

Note: This analysis is used for personal study purpose. In this section, I will summerize some basic information about linear regression - Bayesian approach based on several sources.

R Packages:

# More packages will be shown during the analysis process
# Load the required library
library(tidyverse)    # Data Wrangling
library(conflicted)   # Dealing with conflict package
library(readxl)       # Read csv file

Dealing with Conflicts
There is a lot of packages here, and sometimes individual functions are in conflict. R’s default conflict resolution system gives precedence to the most recently loaded package. This can make it hard to detect conflicts, particularly when introduced by an update to an existing package.

Using the code below helps the entire section run properly. You may or may not need to look into the conflicted package for your work.

conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("Predict", "rms")
conflict_prefer("impute_median", "simputation")
conflict_prefer("summarize", "dplyr")

1 Simulated fakestroke data

Data used in this notes
fakestroke.csv
(Source: https://github.com/THOMASELOVE/432-data/blob/master/data/fakestroke.csv)

The fakestroke.csv file contains the following 18 variables for 500 patients:

# To make a table in R markdown: 1st row: header, 2nd row: Alignment; the remaining row: for content
## |Variable |  Description |
## |:------- | :----------  |
## |studyid  | Study ID  # (z001 through z500) |

Variable	Description
studyid	Study ID # (z001 through z500)
trt	Treatment group (Intervention or Control)
age	Age in years
sex	Male or Female
nihss	NIH Stroke Scale Score (can range from 0-42; higher scores indicate more severe neurological deficits)
location	Stroke Location - Left or Right Hemisphere
hx.isch	History of Ischemic Stroke (Yes/No)
afib	Atrial Fibrillation (1 = Yes, 0 = No)
dm	Diabetes Mellitus (1 = Yes, 0 = No)
mrankin	Pre-stroke modified Rankin scale score (0, 1, 2 or > 2) indicating functional disability - complete range is 0 (no symptoms) to 6 (death)
sbp	Systolic blood pressure, in mm Hg
iv.altep	Treatment with IV alteplase (Yes/No)
time.iv	Time from stroke onset to start of IV alteplase (minutes) if iv.altep=Yes
aspects	Alberta Stroke Program Early Computed Tomography score, which measures extent of stroke from 0 - 10; higher scores indicate fewer early ischemic changes
ia.occlus	Intracranial arterial occlusion, based on vessel imaging - five categories3
extra.ica	Extracranial ICA occlusion (1 = Yes, 0 = No)
time.rand	Time from stroke onset to study randomization, in minutes
time.punc	Time from stroke onset to groin puncture, in minutes (only if Intervention)

2 Data preprocessing step

2.1 Step 1: Data wrangling

Please refer to this section: The R environment - Data Wrangling: https://rpubs.com/minhtri/933332

Load the data to R and examine some basic information about variables. When loading the data, I also define the type of variables (numeric or text):

df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx", 
                 col_types = c("text", "text", "numeric",
                               "text", "numeric", "text", "text", 
                               "numeric", "numeric", "text", "numeric", 
                               "text", "numeric", "numeric", "text", 
                               "numeric", "numeric", "numeric")) 
names(df)

##  [1] "studyid"   "trt"       "age"       "sex"       "nihss"     "location" 
##  [7] "hx.isch"   "afib"      "dm"        "mrankin"   "sbp"       "iv.altep" 
## [13] "time.iv"   "aspects"   "ia.occlus" "extra.ica" "time.rand" "time.punc"

str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : chr [1:500] "z001" "z002" "z003" "z004" ...
##  $ trt      : chr [1:500] "Control" "Intervention" "Control" "Control" ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : chr [1:500] "Male" "Male" "Female" "Male" ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : chr [1:500] "Right" "Left" "Right" "Left" ...
##  $ hx.isch  : chr [1:500] "No" "No" "No" "No" ...
##  $ afib     : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
##  $ dm       : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mrankin  : chr [1:500] "2" "0" "0" "0" ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : chr [1:500] "Yes" "Yes" "No" "Yes" ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: chr [1:500] "M1" "M1" "ICA with M1" "ICA with M1" ...
##  $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

Define factor for all character variables:

df <- df%>%mutate_if(is.character, as.factor)
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
##  $ dm       : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mrankin  : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
##  $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

afib, dm, extra.ica are variables but in the excel file they are noted as “1” and “0”. Therefore, I must convert them back to “Yes-No” for factor variables:

# Changing coding variable to categorical:
df$afib       <- factor(df$afib , levels=0:1, labels=c("No", "Yes"))
df$dm         <- factor(df$dm , levels=0:1, labels=c("No", "Yes"))
df$extra.ica  <- factor(df$extra.ica, levels=0:1, labels=c("No", "Yes"))
# Checking after converting
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
##  $ dm       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mrankin  : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
##  $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

Reorder the level of some character variables. After this, basically we have tidy data that are nearly ready for Missingness checking:

trt: Intervention (baseline) , Control
mrankin: 0, 1, 2, > 2.
ia.occlus: rearrange the ia.occlus variable to the order presented in Berkhemer et al. (2015).

# Relevel factors:
df$trt = factor(df$trt, levels=c("Control", "Intervention"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1", 
                                             "M1", "M2", "A1 or A2"))

# Check order of factor after relevel: 
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
##  $ dm       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mrankin  : Factor w/ 4 levels "0","1","2","> 2": 3 1 1 1 1 3 1 1 3 1 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 5 levels "Intracranial ICA",..: 3 3 2 2 3 5 3 3 3 3 ...
##  $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

2.2 Step 2: Missingness checking and imputation when necessary

Package: VIM

Command: aggr (x, plot = T, prop = T, numbers = F, combined = F, sortVars = T, cex.axis=.7, gap=3)

plot: whether the results should be plotted.
numbers: whether the proportion or frequencies of the different combinations should be represented by numbers.
prop: whether the proportion of missing/imputed values should be used rather than the total amount.
combined: a logical indicating whether the two plots should be combined.
sortVars: whether the variables should be sorted by the number of missing/imputed values.

For more information, please refer to the following link:
Dealing with missingness - Imputation: https://rpubs.com/minhtri/932007

Checking the percentage of missingness:

library(VIM)
aggr(df, plot = F, numbers = T, prop = T)

## 
##  Missings in variables:
##   Variable Count
##        sbp     1
##    time.iv    55
##    aspects     4
##  ia.occlus     1
##  extra.ica     1
##  time.rand     2
##  time.punc   267

aggr(df, plot = T, numbers = T, prop = T, combined= F, sortVars = T, cex.axis=.7, gap=3)

## 
##  Variables sorted by number of missings: 
##   Variable Count
##  time.punc 0.534
##    time.iv 0.110
##    aspects 0.008
##  time.rand 0.004
##        sbp 0.002
##  ia.occlus 0.002
##  extra.ica 0.002
##    studyid 0.000
##        trt 0.000
##        age 0.000
##        sex 0.000
##      nihss 0.000
##   location 0.000
##    hx.isch 0.000
##       afib 0.000
##         dm 0.000
##    mrankin 0.000
##   iv.altep 0.000

aggr(df, plot = T, numbers = T, prop = F, combined= F, sortVars = T, cex.axis=.7, gap=3)

## 
##  Variables sorted by number of missings: 
##   Variable Count
##  time.punc   267
##    time.iv    55
##    aspects     4
##  time.rand     2
##        sbp     1
##  ia.occlus     1
##  extra.ica     1
##    studyid     0
##        trt     0
##        age     0
##        sex     0
##      nihss     0
##   location     0
##    hx.isch     0
##       afib     0
##         dm     0
##    mrankin     0
##   iv.altep     0

Discuss

There are 7 variables with missingess.
However, the missingness in time.punc and time.iv is due to the experimental design (time.punc is only applied for Intervention group (trt), time.iv only applied for patients if iv.altep =Yes). Therefore, I will exclude them from the missingness analysis.
Only 5 variables left: the proportion of each < 5%.
=> Complete case analysis can be applied.
=> We can ignore the missingness and progress to next step - Multiple regression.

df1 = df %>% select (-time.punc, - time.iv, -studyid) %>% na.omit()

3 Classical linear regression - Frequentist approach

To highlight the difference between the Bayesian regression and the traditional linear regression (frequentist approach), let’s first fit the latter to our data.

Package: car
Command: newModel<-lm(outcome ~ predictor(s), data = dataFrame, na.action = an action))

# Building the model
library(car)
model_A <- lm(nihss ~ ., data = df1)
summary(model_A)

## 
## Call:
## lm(formula = nihss ~ ., data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3283 -3.6782 -0.5946  3.6788 11.2577 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          16.070796   3.205253   5.014 7.56e-07 ***
## trtIntervention      -0.209815   0.423275  -0.496  0.62034    
## age                  -0.006806   0.012374  -0.550  0.58257    
## sexMale               0.305910   0.424068   0.721  0.47104    
## locationRight         0.805324   0.420833   1.914  0.05627 .  
## hx.ischYes           -1.063988   0.669898  -1.588  0.11289    
## afibYes               0.054109   0.471736   0.115  0.90873    
## dmYes                -1.972019   0.635639  -3.102  0.00203 ** 
## mrankin1             -0.164159   0.700352  -0.234  0.81478    
## mrankin2             -0.224361   1.008078  -0.223  0.82397    
## mrankin> 2            1.405864   1.047780   1.342  0.18032    
## sbp                  -0.005802   0.008308  -0.698  0.48534    
## iv.altepYes           1.241297   0.673398   1.843  0.06591 .  
## aspects              -0.107675   0.136134  -0.791  0.42937    
## ia.occlusICA with M1  3.018393   2.380488   1.268  0.20543    
## ia.occlusM1           4.226387   2.357445   1.793  0.07365 .  
## ia.occlusM2           4.968808   2.478730   2.005  0.04558 *  
## ia.occlusA1 or A2     8.057030   3.563516   2.261  0.02422 *  
## extra.icaYes          0.440397   0.481065   0.915  0.36042    
## time.rand            -0.005755   0.003287  -1.751  0.08060 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.586 on 472 degrees of freedom
## Multiple R-squared:  0.07934,    Adjusted R-squared:  0.04228 
## F-statistic: 2.141 on 19 and 472 DF,  p-value: 0.003569

t value and its significance: In multiple regression, the smaller the value of Pr(> |t|) (and the larger the value of t), the greater the contribution of that predictor.
=> dm and ia.occlus contribute significantly to the nihss.

For more information about Classical linear regression - Frequentist approach: https://rpubs.com/minhtri/936322

4 Bayesian framework

4.1 Introduction

From https://easystats.github.io/bayestestR/articles/bayestestR.html

The key difference is that in the frequentist framework (the “classical” approach to statistics, with p and t values, as well as degrees of freedom), the effects are fixed (but unknown) and data are random. In other words, it assumes that the unknown parameter has a unique value that we are trying to estimate/guess using our sample data. On the other hand, in the Bayesian framework, instead of estimating the “true effect”, the probability of different effects given the observed data is computed, resulting in a distribution of possible values for the parameters, called the posterior distribution.

The frequentist approach - classical linear regression tries to estimate the real effect. For instance, the “real” value of the correlation between x and y. Hence, the frequentist models return a point-estimate (i.e., a single value and not a distribution) of the “real” correlation (e.g: r=0.42) estimated under a number of obscure assumptions (at a minimum, considering that the data is sampled at random from a “parent”, usually normal distribution).
The Bayesian framework assumes no such thing. The data are what they are. Based on the observed data (and a prior belief about the result), the Bayesian sampling algorithm (MCMC sampling is one example) returns a probability distribution (called the posterior) of the effect that is compatible with the observed data. For the correlation between x and y, it will return a distribution that says, for example, “the most probable effect is 0.42, but this data is also compatible with correlations of 0.12 and 0.74 with certain probabilities”.
To characterize statistical significance of our effects, we do not need p-values, or any other such indices. We simply describe the posterior distribution of the effect. For example, we can report the median, the 95% Credible Interval, Probability of Direction (pd), or % in Region of Practical Equivalence (ROPE).

To illustrate the difference of interpretation, the Bayesian framework allows to say “given the observed data, the effect has 95% probability of falling within this range”, while the frequentist (less intuitive) alternative would be “when repeatedly computing confidence intervals from data of this sort, there is a 95% probability that the effect falls within a given range”. In essence, the Bayesian sampling algorithms (such as MCMC sampling) return a probability distribution (the posterior) of an effect that is compatible with the observed data. Thus, an effect can be described by characterizing its posterior distribution in relation to its centrality (point-estimates), uncertainty, as well as its existence and significance.

4.2 Important Parameters

4.2.1 Credible Intervals (CI)

4.2.1.1 Definition

Credible intervals are an important concept in Bayesian statistics. Its core purpose is to describe and summarise the uncertainty related to the unknown parameters you are trying to estimate. In this regard, it could appear as quite similar to the frequentist Confidence Intervals. However, while their goal is similar, their statistical definition and meaning is very different. Indeed, while the latter is obtained through a complex algorithm full of rarely-tested assumptions and approximations, the credible intervals are fairly straightforward to compute.

As the Bayesian inference returns a distribution of possible effect values (the posterior), the credible interval is just the range containing a particular percentage of probable values. For instance, the 95% credible interval is simply the central portion of the posterior distribution that contains 95% of the values.

Note how this drastically improve the interpretability of the Bayesian interval compared to the frequentist one. Indeed, the Bayesian framework allows us to say “given the observed data, the effect has 95% probability of falling within this range”, compared to the less straightforward, frequentist alternative (the 95% Confidence Interval) would be “there is a 95% probability that when computing a confidence interval from data of this sort, the effect falls within this range”.

4.2.1.2 CI: 89% vs. 95%

Naturally, when it came about choosing the CI level to report by default, people started using 95%, the arbitrary convention used in the frequentist world. However, some authors suggested that 95% might not be the most appropriate for Bayesian posterior distributions, potentially lacking stability if not enough posterior samples are drawn (Kruschke, 2014).

The proposition was to use 90% instead of 95%. However, recently, McElreath (2014, 2018) suggested that if we were to use arbitrary thresholds in the first place, why not use 89%? Moreover, 89 is the highest prime number that does not exceed the already unstable 95% threshold. What does it have to do with anything? Nothing, but it reminds us of the total arbitrariness of these conventions (McElreath, 2018). Thus, CIs computed with 89% intervals (ci = 0.89), are deemed to be more stable than, for instance, 95% intervals (Kruschke, 2014)

However, 95% has some advantages too. For instance, it shares (in the case of a normal posterior distribution) an intuitive relationship with the standard deviation and it conveys a more accurate image of the (artificial) bounds of the distribution. Also, because it is wider, it makes analyses more conservative (i.e., the probability of covering 0 is larger for the 95% CI than for lower ranges such as 89%), which is a good thing in the context of the reproducibility crisis.

4.2.1.3 Different types of CIs

The reader might notice that bayestestR provides two methods to compute credible intervals, the Highest Density Interval (HDI) (hdi()) and the Equal-tailed Interval (ETI) (eti()). These methods can also be changed via the method argument of the ci() function. What is the difference? Let’s see: for symmetric distribution and other types of distribution

For symmetric distribution:

library(bayestestR)
library(ggplot2)

# Generate a normal distribution
post <- distribution_normal(1000)

# Compute HDI and ETI
ci_hdi <- ci(post, method = "HDI")
ci_eti <- ci(post, method = "ETI")

# Plot the distribution and add the limits of the two CIs
out <- estimate_density(post, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
  geom_area(fill = "orange") +
  theme_classic() +
  # HDI in blue
  geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", size = 3) +
  geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", size = 3) +
  # Quantile in red
  geom_vline(xintercept = ci_eti$CI_low, color = "red", size = 1) +
  geom_vline(xintercept = ci_eti$CI_high, color = "red", size = 1)

Other types of distribution

# Generate a beta distribution
posterior <- distribution_beta(1000, 6, 2)

# Compute HDI and Quantile CI
ci_hdi <- ci(posterior, method = "HDI")
ci_eti <- ci(posterior, method = "ETI")

# Plot the distribution and add the limits of the two CIs
out <- estimate_density(posterior, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
  geom_area(fill = "orange") +
  theme_classic() +
  # HDI in blue
  geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", size = 3) +
  geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", size = 3) +
  # ETI in red
  geom_vline(xintercept = ci_eti$CI_low, color = "red", size = 1) +
  geom_vline(xintercept = ci_eti$CI_high, color = "red", size = 1)

Contrary to the HDI, for which all points within the interval have a higher probability density than points outside the interval, the ETI is equal-tailed. This means that a 90% interval has 5% of the distribution on either side of its limits. It indicates the 5th percentile and the 95th percentile. In symmetric distributions, the two methods of computing credible intervals, the ETI and the HDI, return similar results.

This is not the case for skewed distributions. Indeed, it is possible that parameter values in the ETI have lower credibility (are less probable) than parameter values outside the ETI. This property seems undesirable as a summary of the credible values in a distribution.

On the other hand, the ETI range does not change when transformations are applied to the distribution (for instance, for log-odds to probabilities transformation): the lower and higher bounds of the transformed distribution will correspond to the transformed lower and higher bounds of the original distribution. On the contrary, applying transformations to the distribution will change the resulting HDI. Thus, for instance, if exponentiated credible intervals are required, calculating the ETI is recommended.

Unlike the HDI and the ETI, which look at the posterior distribution, the Support Interval (SI) provides information regarding the change in the credibility of values from the prior to the posterior - in other words, it indicates which values of a parameter have gained support by the observed data by some factor greater or equal to k (Wagenmakers, Gronau, Dablander, & Etz, 2018). For more info, please refer to the following link: https://easystats.github.io/bayestestR/articles/credible_interval.html

4.2.2 Probability of Direction (pd)

4.2.2.1 Definition

The Probability of Direction (pd) is an index of effect existence, ranging from 50% to 100%, representing the certainty with which an effect goes in a particular direction (i.e., is positive or negative).

Beyond its simplicity of interpretation, understanding and computation, this index also presents other interesting properties:

It is independent from the model: It is solely based on the posterior distributions and does not require any additional information from the data or the model.
It is robust to the scale of both the response variable and the predictors.
It is strongly correlated with the frequentist p-value, and can thus be used to draw parallels and give some reference to readers non-familiar with Bayesian statistics.

Indices of effect existence, such as the pd, are particularly useful in exploratory research or clinical studies, for which the focus is to make sure that the effect of interest is not in the opposite direction (for clinical studies, that a treatment is not harmful). However, once the effect’s direction is confirmed, the focus should shift toward its significance, including a precise estimation of its magnitude, relevance and importance.

4.2.2.2 Relationship with the p-value

In most cases, it seems that the pd has a direct correspondence with the frequentist one-sided p-value through the formula: p.one−sided = 1 − pd
Similarly, the two-sided p-value (the most commonly reported one) is equivalent through the formula: p.two−sided = 2 x (1−pd)

A two-sided p-value of respectively .1, .05, .01 and .001 correspond approximately to a pd of 95%, 97.5%, 99.5% and 99.95%:

** pd <= 95% ~ p > .1: uncertain
** pd > 95% ~ p < .1: possibly existing
** pd > 97%: likely existing
** pd > 99%: probably existing
** pd > 99.9%: certainly existing

4.2.2.3 Can the pd be 100%?

p = 0.000 is coined as one of the term to avoid when reporting results (Lilienfeld et al., 2015), even if often displayed by statistical software. The rationale is that for every probability distribution, there is no value with a probability of exactly 0. There is always some infinitesimal probability associated with each data point, and the p = 0.000 returned by software is due to approximations related, among other, to finite memory hardware.

One could apply this rationale for the pd: since all data points have a non-null probability density, then the pd (a particular portion of the probability density) can never be 100%. While this is an entirely valid point, people using the direct method might argue that their pd is based on the posterior draws, rather than on the theoretical, hidden, true posterior distribution (which is only approximated by the posterior draws). These posterior draws represent a finite sample for which pd = 100% is a valid statement.

For more information, please refer to the following link: https://easystats.github.io/bayestestR/articles/probability_of_direction.html

4.2.3 Region of Practical Equivalence (ROPE)

4.2.3.1 Definition

According to https://easystats.github.io/bayestestR/articles/region_of_practical_equivalence.html: Unlike a frequentist approach, Bayesian inference is not based on statistical significance, where effects are tested against “zero”. Indeed, the Bayesian framework offers a probabilistic view of the parameters, allowing assessment of the uncertainty related to them. Thus, rather than concluding that an effect is present when it simply differs from zero, we would conclude that the probability of being outside a specific range that can be considered as “practically no effect” (i.e., a negligible magnitude) is sufficient. This range is called the region of practical equivalence (ROPE).

4.2.3.2 ROPE percentage to accept or to reject?

Null Hypothesis: Parameter is non-significant (in the sense of not important enough to be cared about). In other words, it checks the percentage of Credible Interval (CI - HDI) that is the null region (the ROPE). If this percentage is sufficiently low, the null hypothesis is rejected. If this percentage is sufficiently high, the null hypothesis is accepted.

If the HDI is completely outside the ROPE, the “null hypothesis” for this parameter is “rejected”. If the ROPE completely covers the HDI, i.e., all most credible values of a parameter are inside the region of practical equivalence, the null hypothesis is accepted. Else, it’s unclear whether the null hypothesis should be accepted or rejected.

If the full ROPE is used (i.e., 100% of the HDI), then the null hypothesis is rejected or accepted if the percentage of the posterior within the ROPE is smaller than to 2.5% or greater than 97.5%. Desirable results are low proportions inside the ROPE (the closer to zero the better).

4.2.3.3 Define ROPE range

Kruschke (2018) suggests that the ROPE could be set, by default, to a range from -0.1 to 0.1 of a standardized parameter:

For linear models (lm): [−0.1 x SDy , 0.1 x SDy]
For logistic models: π/√3
For t-tests: the standard deviation of the response is used, similarly to linear models.
For correlations, -0.05, 0.05 is used.
For all other models, -0.1, 0.1 is used to determine the ROPE limits, but it is strongly advised to specify it manually.

4.2.4 Bayes factor (BF)

Although it can also be used to assess effect existence and significance, the Bayes factor (BF) is a versatile index that can be used to directly compare different models (or data generation processes). The Bayes factor is a ratio that informs us by how much more (or less) likely the observed data are under two compared models - usually a model with versus a model without the effect. Depending on the specifications of the null model (whether it is a point-estimate (e.g., 0) or an interval), the Bayes factor could be used both in the context of effect existence and significance.

For how to calculate Bayes factor, please refer to the following link: https://easystats.github.io/bayestestR/articles/bayes_factors.html

4.2.5 Comparison of Point-Estimates

One of the main difference between the Bayesian and the frequentist frameworks is that the former returns a probability distribution for each effect (i.e., a model parameter of interest, such as a regression slope) instead of a single value. However, there is still a need and demand - for reporting or use in further analysis - for a single value (point-estimate) that best characterises the underlying posterior distribution. There are three main indices used in the literature for effect estimation: the mean - the median - the MAP (Maximum A Posteriori) estimate (roughly corresponding to the mode - the “peak” - of the distribution).

For linear models, the Mean was the better predictor, closely followed by the Median, the MAP and the frequentist Coefficient.
For logistic models, the MAP was the better predictor, followed by the Median, the Mean and, behind, the frequentist Coefficient.

=> Overall, the median appears to be a safe choice, maintaining a high performance across different types of models.

4.3 Practice with R

4.3.1 Fit the model

To fit a Bayesian regression, we use the function stan_glm from the rstanarm package. This function as the above lm function requires providing the formula and the data that will be used, and leave all the following arguments with their default values:

Package: rstanarm
Command: stan_glm(formula, data, prior, prior_intercept, prior_aux, algorithm)

with:

family : by default this function uses the gaussian distribution as we do with the classical glm function to perform lm model.
prior : The prior distribution for the regression coefficients, By default the normal prior is used. There are subset of functions used for the prior provided by rstanarm like , student t family, laplace family…ect. To get the full list with all the details run this command ?priors. If we want a flat uniform prior we set this to NULL.
prior_intercept: prior for the intercept, can be normal, student_t , or cauchy. If we want a flat uniform prior we set this to NULL.
prior_aux: prior fo auxiliary parameters such as the error standard deviation for the gaussion family.
algorithm: The estimating approach to use. The default is “sampling MCMC1.
iter : is the number of iterations if the MCMC method is used, the default is 2000.
chains : the number of Markov chains, the default is 4.

library(rstanarm)
model_B <- stan_glm(nihss ~. , data=df1 , seed=1234)

## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.951 seconds (Warm-up)
## Chain 1:                0.962 seconds (Sampling)
## Chain 1:                1.913 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.806 seconds (Warm-up)
## Chain 2:                1.285 seconds (Sampling)
## Chain 2:                2.091 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.245 seconds (Warm-up)
## Chain 3:                0.928 seconds (Sampling)
## Chain 3:                2.173 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.936 seconds (Warm-up)
## Chain 4:                0.896 seconds (Sampling)
## Chain 4:                1.832 seconds (Total)
## Chain 4:

print(model_B, digits = 3)

## stan_glm
##  family:       gaussian [identity]
##  formula:      nihss ~ .
##  observations: 492
##  predictors:   20
## ------
##                      Median MAD_SD
## (Intercept)          16.305  3.134
## trtIntervention      -0.211  0.431
## age                  -0.007  0.013
## sexMale               0.310  0.438
## locationRight         0.809  0.416
## hx.ischYes           -1.078  0.665
## afibYes               0.050  0.461
## dmYes                -1.983  0.660
## mrankin1             -0.171  0.725
## mrankin2             -0.255  1.004
## mrankin> 2            1.406  1.066
## sbp                  -0.006  0.008
## iv.altepYes           1.243  0.697
## aspects              -0.107  0.133
## ia.occlusICA with M1  2.865  2.317
## ia.occlusM1           4.071  2.329
## ia.occlusM2           4.803  2.430
## ia.occlusA1 or A2     7.873  3.562
## extra.icaYes          0.437  0.481
## time.rand            -0.006  0.003
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 4.587  0.143 
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg

Discuss
The Median estimate is the median computed from the MCMC simulation, and MAD_SD is the median absolute deviation computed from the same simulation. To well understand how getting these outputs, we plot the MCMC simulation of predictors using bayesplot. As can be seen, the point estimate of “age” falls on the median of the distribution (red line). The same thing is true for “other” predictors.

library(bayesplot)
mcmc_dens(model_B, pars = c("age"))+ vline_at(-0.007, col="red")

4.3.2 Summary of Posterior Distribution

How do we evaluate the model parameters? By analyzing the posteriors using some specific statistics. We make use of the function describe_posterior provided by bayestestR package.

Package: bayestestR
Command: describe_posterior(model, ci = 0.95)

Summary of Posterior Distribution:

library(bayestestR)
posteriors = describe_posterior(model_B, ci = 0.95)
print_md(posteriors, digits = 3)

Summary of Posterior Distribution
Parameter	Median	95% CI	pd	ROPE	% in ROPE	Rhat	ESS
(Intercept)	16.305	[ 9.95, 22.35]	100%	[-0.47, 0.47]	0%	1.000	1775.000
trtIntervention	-0.211	[-1.02, 0.61]	68.77%	[-0.47, 0.47]	70.76%	1.000	4664.000
age	-0.007	[-0.03, 0.02]	70.65%	[-0.47, 0.47]	100%	1.000	4850.000
sexMale	0.310	[-0.49, 1.14]	76.80%	[-0.47, 0.47]	64.92%	1.000	4955.000
locationRight	0.809	[ 0.02, 1.60]	97.72%	[-0.47, 0.47]	19.92%	1.000	4718.000
hx.ischYes	-1.078	[-2.37, 0.24]	94.85%	[-0.47, 0.47]	16.05%	1.000	4456.000
afibYes	0.050	[-0.85, 1.00]	54.30%	[-0.47, 0.47]	72.37%	1.000	4832.000
dmYes	-1.983	[-3.19, -0.72]	99.95%	[-0.47, 0.47]	0%	1.000	4348.000
mrankin1	-0.171	[-1.55, 1.23]	58.65%	[-0.47, 0.47]	50.03%	0.999	4744.000
mrankin2	-0.255	[-2.25, 1.72]	60.68%	[-0.47, 0.47]	37.05%	0.999	5408.000
mrankin> 2	1.406	[-0.71, 3.48]	90.48%	[-0.47, 0.47]	15.76%	0.999	5548.000
sbp	-0.006	[-0.02, 0.01]	76.22%	[-0.47, 0.47]	100%	1.000	4613.000
iv.altepYes	1.243	[-0.09, 2.55]	96.83%	[-0.47, 0.47]	11.18%	1.000	5175.000
aspects	-0.107	[-0.38, 0.16]	79.25%	[-0.47, 0.47]	100%	0.999	4808.000
ia.occlusICA with M1	2.865	[-1.45, 7.48]	89.55%	[-0.47, 0.47]	7.61%	1.000	1086.000
ia.occlusM1	4.071	[-0.19, 8.68]	96.75%	[-0.47, 0.47]	2.87%	1.000	1060.000
ia.occlusM2	4.803	[ 0.27, 9.66]	98.15%	[-0.47, 0.47]	0.61%	1.000	1136.000
ia.occlusA1 or A2	7.873	[ 0.96, 15.05]	98.60%	[-0.47, 0.47]	0%	1.000	1774.000
extra.icaYes	0.437	[-0.51, 1.38]	82.33%	[-0.47, 0.47]	51.76%	1.000	4588.000
time.rand	-0.006	[-0.01, 0.00]	95.80%	[-0.47, 0.47]	100%	0.999	4367.000

Explain

CI : Credible Interval, it is used to quantify the uncertainty about the regression coefficients. There are two methods to compute CI, the highest density interval HDI which is the default, and the Equal-tailed Interval ETI. with 89% probability (given the data) that a coefficient lies above the CI_low value and under CI_high value. This straightforward probabilistic interpretation is completely different from the confidence interval used in classical linear regression where the coefficient fall inside this confidence interval (if we choose 95% of confidence) 95 times if we repeat the study 100 times.
pd : Probability of Direction , which is the probability that the effect goes to the positive or to the negative direction, and it is considered as the best equivalent for the p-value.
ROPE_CI: Region of Practical Equivalence, since bayes method deals with true probabilities, it does not make sense to compute the probability of getting the effect equals zero (the null hypothesis) as a point (probability of a point in continuous intervals equal zero ). Thus, we define instead a small range around zero which can be considered practically the same as no effect (zero), this range therefore is called ROPE. By default (according to Cohen, 1988) The Rope is [-0.1,0.1] from the standardized coefficients.
Rhat: scale reduction factor 𝑅, it is computed for each scalar quantity of interest, as the standard deviation of that quantity from all the chains included together, divided by the root mean square of the separate within-chain standard deviations. When this value is close to 1 we do not have any convergence problem with MCMC.
ESS: effective sample size, it captures how many independent draws contain the same amount of information as the dependent sample obtained by the MCMC algorithm, the higher the ESS the better. The threshold used in practice is 400.

=> You just fitted a Bayesian version of the model by simply using the stan_glm() function instead of lm() and described the posterior distributions of the parameters: describe_posterior()!

4.3.3 Extracting the posterior

Aternatively, we can get the coefficeient estimates (which are the medians by default) separatly by using the package insight.

Package: insight
Command: get_parameters(model)

library(insight)
post <- get_parameters(model_B)

Discuss
As we can see, the parameters take the form of a lengthy dataframe with two types of columns, corresponding to the intercept and the effect of each variable. These columns contain the posterior distributions of these parameters. In simple terms, the posterior distribution is a set of different plausible values for each parameter. Contrast this with the result we saw from the frequentist linear regression mode using lm, where the results had single values for each effect of the model, and not a distribution of values. This is one of the most important differences between these two frameworks.

dim(post)

## [1] 4000   20

Why is the size 4000 ?

First, these observations (the rows) are usually referred to as posterior draws. The underlying idea is that the Bayesian sampling algorithm (e.g., Monte Carlo Markov Chains - MCMC) will draw from the hidden true posterior distribution. Thus, it is through these posterior draws that we can estimate the underlying true posterior distribution. Therefore, the more draws you have, the better your estimation of the posterior distribution. However, increased draws also means longer computation time.
If we look at the documentation (?sampling) for the rstanarm’s “sampling” algorithm used by default in the model above, we can see several parameters that influence the number of posterior draws. By default, there are 4 chains (you can see it as distinct sampling runs), that each create 2000 iter (draws). However, only half of these iterations are kept, as half are used for warm-up (the convergence of the algorithm). Thus, the total for posterior draws equals 4 chains * (2000 iterations - 1000 warm-up) = 4000.
We can change that, for instance: model-C <- stan_glm(nihss ~ . , data = df1, chains = 2, iter = 1000, warmup = 250). As would be expected, we have 2 chains * (1000 iterations - 250 warm-up) = 1500 posterior draws.

For more info, please refer to https://easystats.github.io/bayestestR/articles/example1.html

4.3.4 Visualizing the posterior distribution

Now that we’ve understood where these values come from, let’s look at them. We will start by visualizing the posterior distribution of our parameter of interest:

Effect of numeric variable: age

ggplot(post, aes(x = age)) +
  geom_density(fill = "orange")

Discuss

This distribution represents the probability (the y axis) of different effects (the x axis). The central values are more probable than the extreme values. As you can see, this distribution ranges from about -0.055 to 0.04, with the bulk of it being at around -0.007 (look at the median and range of age).
And this is the heart of Bayesian analysis. We don’t need p-values, t-values, or degrees of freedom. Everything we need is contained within this posterior distribution.
Our description above is consistent with the values obtained from the frequentist regression (which resulted in a β of -0.007). Indeed, in most cases, Bayesian analysis does not drastically differ from the frequentist results or their interpretation. Rather, it makes the results more interpretable and intuitive, and easier to understand and describe.

Effect of categorical variable: sex-Male: Imagine for a moment you are interested in how the nihss score varies depending on two different groups: Intervention and Control.

ggplot(post, aes(x = sexMale)) +
  geom_density(fill = "red")

This represents the posterior distribution of the difference between Male and Female. It seems that the difference is positive (since the values are concentrated on the right side of 0) => nihss score increases if participant is Male.

But, by how much? Let compute the median and the CI (as below):

median(post$sexMale)

## [1] 0.3102701

hdi(post$sexMale)

## 95% HDI: [-0.48, 1.15]

Discuss: It makes nihss by around 0.31 scores (the median). However, the uncertainty is quite low: there is 95% chance that the difference between the two gender types is between -0.48 and 1.15.

4.3.5 Describing the Posterior

Unfortunately, it is often not practical to report the whole posterior distributions as graphs. We need to find a concise way to summarize it. We recommend to describe the posterior distribution with 3 elements:

A point-estimate which is a one-value summary (similar to the beta in frequentist regressions).
A credible interval representing the associated uncertainty.
Some indices of significance, giving information about the relative importance of this effect.

4.3.5.1 Point estimate

Centrality indices, such as the mean, the median, or the mode are usually used as point-estimates. But what’s the difference between them?

Mean:

print(purrr::map_dbl(post, mean),digits = 3)

##          (Intercept)      trtIntervention                  age 
##             16.19966             -0.21046             -0.00673 
##              sexMale        locationRight           hx.ischYes 
##              0.31231              0.80310             -1.07695 
##              afibYes                dmYes             mrankin1 
##              0.05700             -1.97313             -0.16776 
##             mrankin2           mrankin> 2                  sbp 
##             -0.24783              1.39985             -0.00597 
##          iv.altepYes              aspects ia.occlusICA with M1 
##              1.23625             -0.10724              2.90636 
##          ia.occlusM1          ia.occlusM2    ia.occlusA1 or A2 
##              4.12252              4.86562              7.93401 
##         extra.icaYes            time.rand 
##              0.44084             -0.00576

Median:

print(purrr::map_dbl(post,median),digits = 3)

##          (Intercept)      trtIntervention                  age 
##             16.30473             -0.21090             -0.00712 
##              sexMale        locationRight           hx.ischYes 
##              0.31027              0.80877             -1.07798 
##              afibYes                dmYes             mrankin1 
##              0.04952             -1.98303             -0.17059 
##             mrankin2           mrankin> 2                  sbp 
##             -0.25508              1.40627             -0.00594 
##          iv.altepYes              aspects ia.occlusICA with M1 
##              1.24282             -0.10692              2.86471 
##          ia.occlusM1          ia.occlusM2    ia.occlusA1 or A2 
##              4.07147              4.80325              7.87335 
##         extra.icaYes            time.rand 
##              0.43739             -0.00572

Maximum A posteriori (MAP):

print(purrr::map_dbl(post, map_estimate),digits = 3)

##          (Intercept)      trtIntervention                  age 
##             16.81814             -0.21015             -0.00855 
##              sexMale        locationRight           hx.ischYes 
##              0.32157              0.82763             -1.07448 
##              afibYes                dmYes             mrankin1 
##              0.09198             -2.04621             -0.20568 
##             mrankin2           mrankin> 2                  sbp 
##             -0.23416              1.50310             -0.00532 
##          iv.altepYes              aspects ia.occlusICA with M1 
##              1.32554             -0.10276              3.00038 
##          ia.occlusM1          ia.occlusM2    ia.occlusA1 or A2 
##              3.99828              4.62867              7.47592 
##         extra.icaYes            time.rand 
##              0.42618             -0.00530

Discuss

As we see, the values are closer to each other due to the like normality of the distribution of the posteriors where all the central statistics (mean, median, mode) are closer to each other. Using the following plot to visualize the age coefficient using different statistics as follows:

mcmc_dens(post, pars=c("age"))+
  geom_density(fill = "orange") +
  vline_at(median(post$age), col="red", size = 1)+
  vline_at(mean(post$age), col="blue", size = 1)+
  vline_at(map_estimate(post$age), col="green", size = 1)

Well, all these values give very similar results. Thus, we will choose the median, as this value has a direct meaning from a probabilistic perspective: there is 50% chance that the true effect is higher and 50% chance that the effect is lower (as it divides the distribution in two equal parts).

4.3.5.2 Uncertainty

As we do with classical regression (frequentist), we can test the significance of the bayesian regression coefficients by checking whether the corresponding credible interval contains zero or not, if no then this coefficient is significant. Let’s go back to our model and check the significance of each coefficient (using credible based on the default hdi).

Some people choose to use the range containing the 89% most probable effect values. As the 89% level gives more stable results (Kruschke, 2014) and reminds us about the arbitrariness of such conventions (McElreath, 2018). However, I will use 95% CI and eti as the default ci_method in this example:

hdi(model_B)

## Highest Density Interval
## 
## Parameter            |        95% HDI
## -------------------------------------
## (Intercept)          | [ 9.85, 22.22]
## trtIntervention      | [-1.01,  0.61]
## age                  | [-0.03,  0.02]
## sexMale              | [-0.48,  1.15]
## locationRight        | [ 0.01,  1.58]
## hx.ischYes           | [-2.39,  0.21]
## afibYes              | [-0.82,  1.03]
## dmYes                | [-3.18, -0.72]
## mrankin1             | [-1.51,  1.24]
## mrankin2             | [-2.26,  1.68]
## mrankin> 2           | [-0.75,  3.43]
## sbp                  | [-0.02,  0.01]
## iv.altepYes          | [-0.07,  2.56]
## aspects              | [-0.38,  0.16]
## ia.occlusICA with M1 | [-1.65,  7.27]
## ia.occlusM1          | [-0.20,  8.62]
## ia.occlusM2          | [ 0.35,  9.73]
## ia.occlusA1 or A2    | [ 1.53, 15.48]
## extra.icaYes         | [-0.54,  1.35]
## time.rand            | [-0.01,  0.00]

eti(model_B)

## Equal-Tailed Interval
## 
## Parameter            |        95% ETI | Effects |   Component
## -------------------------------------------------------------
## (Intercept)          | [ 9.95, 22.35] |   fixed | conditional
## trtIntervention      | [-1.02,  0.61] |   fixed | conditional
## age                  | [-0.03,  0.02] |   fixed | conditional
## sexMale              | [-0.49,  1.14] |   fixed | conditional
## locationRight        | [ 0.02,  1.60] |   fixed | conditional
## hx.ischYes           | [-2.37,  0.24] |   fixed | conditional
## afibYes              | [-0.85,  1.00] |   fixed | conditional
## dmYes                | [-3.19, -0.72] |   fixed | conditional
## mrankin1             | [-1.55,  1.23] |   fixed | conditional
## mrankin2             | [-2.25,  1.72] |   fixed | conditional
## mrankin> 2           | [-0.71,  3.48] |   fixed | conditional
## sbp                  | [-0.02,  0.01] |   fixed | conditional
## iv.altepYes          | [-0.09,  2.55] |   fixed | conditional
## aspects              | [-0.38,  0.16] |   fixed | conditional
## ia.occlusICA with M1 | [-1.45,  7.48] |   fixed | conditional
## ia.occlusM1          | [-0.19,  8.68] |   fixed | conditional
## ia.occlusM2          | [ 0.27,  9.66] |   fixed | conditional
## ia.occlusA1 or A2    | [ 0.96, 15.05] |   fixed | conditional
## extra.icaYes         | [-0.51,  1.38] |   fixed | conditional
## time.rand            | [-0.01,  0.00] |   fixed | conditional

Discuss: Using both methods, the only significant coefficient is dm and ia.occlus variables, which is in line with the classical regression.

Note: this similar result between frequentist and bayesian regression may due to the normality assumption for the former that is well satisfied which gives satisfied results and due to the normal prior used in the latter. However, in real world it is less often to be sure about the normality assumption which may give contradict conclusions between the two approaches.

4.3.5.3 ROPE Percentage: Significance

one way to assess significance could be to define an area around 0, which will consider as practically equivalent to zero (i.e., absence of, or a negligible, effect). This is called the Region of Practical Equivalence (ROPE), and is one way of testing the significance of parameters. By checking the part of the credible interval that falls inside the ROPE interval = non-significant coefficient.

Percentage of ROPE to Accept or Reject:

If the HDI is completely outside the ROPE, the “null hypothesis” for this parameter is “rejected”. If the ROPE completely covers the HDI, i.e., all most credible values of a parameter are inside the region of practical equivalence, the null hypothesis is accepted. Else, it’s unclear whether the null hypothesis should be accepted or rejected.
If the full ROPE is used (i.e., 100% of the HDI), then the null hypothesis is rejected or accepted if the percentage of the posterior within the ROPE is smaller than to 2.5% or greater than 97.5%. Desirable results are low proportions inside the ROPE (the closer to zero the better).

Define ROPE range:

For linear models (lm): [−0.1 x SDy , 0.1 x SDy]
For logistic models: π/√3
For t-tests: the standard deviation of the response is used, similarly to linear models.
For correlations, -0.05, 0.05 is used.
For all other models, -0.1, 0.1 is used to determine the ROPE limits, but it is strongly advised to specify it manually.

To calculate ROPE range:

rope_value <- 0.1 * sd(df1$nihss)
rope_range <- c(-rope_value, rope_value)
rope_range

## [1] -0.4686278  0.4686278

or we can just simply use:

rope_range(model_B)

## [1] -0.4686278  0.4686278

Let’s recompute the percentage in ROPE by calling the rope from bayestestR package.

library(bayestestR)
rope(post$age, range = rope_range)

## # Proportion of samples inside the ROPE [-0.47, 0.47]:
## 
## inside ROPE
## -----------
## 100.00 %

rope(post$dmYes, range = rope_range)

## # Proportion of samples inside the ROPE [-0.47, 0.47]:
## 
## inside ROPE
## -----------
## 0.00 %

rope(post$`ia.occlusA1 or A2`, range = rope_range)

## # Proportion of samples inside the ROPE [-0.47, 0.47]:
## 
## inside ROPE
## -----------
## 0.00 %

Discuss

For age almost all the credible interval (HDI) is inside the ROPE range, which means that coefficient is non-significant. Or The probability of age coefficient to be zero is 100%.
For dm and ia.occlusA1 or A2 we observe that the 95% of the posterior distribution of the effect does not overlap with the ROPE, which means that the effect is significant.

4.3.6 PD and P-value

Sometimes we are only interested in checking the direction of the coefficient (positive or negative). This is the role of pd statistic in the above table, high value means that the associated effect is concentrated on the same side as the median.

For our model, since pd is equal to 1, not many the posteriors of the variables and the intercept are on the same side (if median negative all other values are negatives). However, it should be noted that this statistic does not assess the significance of the effect. Something more important to mention is that there exists a strong relation between this probability and the p-value approximated as follows: 𝑝−𝑣𝑎𝑙𝑢𝑒 = 1−𝑝𝑑. Let’s check this with our variables.

library(broom)
Classical.p <-dplyr::select(tidy(model_A), c(term,p.value))
Classical.p <- round(Classical.p$p.value, digits = 3)
Bayes.1sided_p <- 1- purrr::map_dbl(post, p_direction)
Bayes.2sided_p <- Bayes.1sided_p * 2
a <- cbind(Classical.p , Bayes.1sided_p, Bayes.2sided_p)
a

##                      Classical.p Bayes.1sided_p Bayes.2sided_p
## (Intercept)                0.000        0.00000         0.0000
## trtIntervention            0.620        0.31225         0.6245
## age                        0.583        0.29350         0.5870
## sexMale                    0.471        0.23200         0.4640
## locationRight              0.056        0.02275         0.0455
## hx.ischYes                 0.113        0.05150         0.1030
## afibYes                    0.909        0.45700         0.9140
## dmYes                      0.002        0.00050         0.0010
## mrankin1                   0.815        0.41350         0.8270
## mrankin2                   0.824        0.39325         0.7865
## mrankin> 2                 0.180        0.09525         0.1905
## sbp                        0.485        0.23775         0.4755
## iv.altepYes                0.066        0.03175         0.0635
## aspects                    0.429        0.20750         0.4150
## ia.occlusICA with M1       0.205        0.10450         0.2090
## ia.occlusM1                0.074        0.03250         0.0650
## ia.occlusM2                0.046        0.01850         0.0370
## ia.occlusA1 or A2          0.024        0.01400         0.0280
## extra.icaYes               0.360        0.17675         0.3535
## time.rand                  0.081        0.04200         0.0840

Discuss: The p-value of Classical Linear Regression and Bayesian Regression are similar.

4.3.7 Summary commands

In summary, we just need to use a few commands to fit a Bayesian version of the model by simply using the stan_glm() function instead of lm() and describe the posterior distributions of the parameters describe_posterior(): median, CI, pd and % in ROPE.

library(rstanarm)
model_C <- stan_glm(nihss ~. , data=df1 , seed=1234)

## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.803 seconds (Warm-up)
## Chain 1:                0.849 seconds (Sampling)
## Chain 1:                1.652 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.001 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.706 seconds (Warm-up)
## Chain 2:                0.833 seconds (Sampling)
## Chain 2:                1.539 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.76 seconds (Warm-up)
## Chain 3:                0.79 seconds (Sampling)
## Chain 3:                1.55 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.811 seconds (Warm-up)
## Chain 4:                0.8 seconds (Sampling)
## Chain 4:                1.611 seconds (Total)
## Chain 4:

library(bayestestR)
posteriors = describe_posterior(model_C, ci = 0.95)
print_md(posteriors, digits = 3)

Summary of Posterior Distribution
Parameter	Median	95% CI	pd	ROPE	% in ROPE	Rhat	ESS
(Intercept)	16.305	[ 9.95, 22.35]	100%	[-0.47, 0.47]	0%	1.000	1775.000
trtIntervention	-0.211	[-1.02, 0.61]	68.77%	[-0.47, 0.47]	70.76%	1.000	4664.000
age	-0.007	[-0.03, 0.02]	70.65%	[-0.47, 0.47]	100%	1.000	4850.000
sexMale	0.310	[-0.49, 1.14]	76.80%	[-0.47, 0.47]	64.92%	1.000	4955.000
locationRight	0.809	[ 0.02, 1.60]	97.72%	[-0.47, 0.47]	19.92%	1.000	4718.000
hx.ischYes	-1.078	[-2.37, 0.24]	94.85%	[-0.47, 0.47]	16.05%	1.000	4456.000
afibYes	0.050	[-0.85, 1.00]	54.30%	[-0.47, 0.47]	72.37%	1.000	4832.000
dmYes	-1.983	[-3.19, -0.72]	99.95%	[-0.47, 0.47]	0%	1.000	4348.000
mrankin1	-0.171	[-1.55, 1.23]	58.65%	[-0.47, 0.47]	50.03%	0.999	4744.000
mrankin2	-0.255	[-2.25, 1.72]	60.68%	[-0.47, 0.47]	37.05%	0.999	5408.000
mrankin> 2	1.406	[-0.71, 3.48]	90.48%	[-0.47, 0.47]	15.76%	0.999	5548.000
sbp	-0.006	[-0.02, 0.01]	76.22%	[-0.47, 0.47]	100%	1.000	4613.000
iv.altepYes	1.243	[-0.09, 2.55]	96.83%	[-0.47, 0.47]	11.18%	1.000	5175.000
aspects	-0.107	[-0.38, 0.16]	79.25%	[-0.47, 0.47]	100%	0.999	4808.000
ia.occlusICA with M1	2.865	[-1.45, 7.48]	89.55%	[-0.47, 0.47]	7.61%	1.000	1086.000
ia.occlusM1	4.071	[-0.19, 8.68]	96.75%	[-0.47, 0.47]	2.87%	1.000	1060.000
ia.occlusM2	4.803	[ 0.27, 9.66]	98.15%	[-0.47, 0.47]	0.61%	1.000	1136.000
ia.occlusA1 or A2	7.873	[ 0.96, 15.05]	98.60%	[-0.47, 0.47]	0%	1.000	1774.000
extra.icaYes	0.437	[-0.51, 1.38]	82.33%	[-0.47, 0.47]	51.76%	1.000	4588.000
time.rand	-0.006	[-0.01, 0.00]	95.80%	[-0.47, 0.47]	100%	0.999	4367.000

4.4 Reporting Guidelines

According to https://easystats.github.io/bayestestR/articles/guidelines.html: A Bayesian analysis returns a posterior distribution for each parameter (or effect). To minimally describe these distributions, they recommend reporting a point-estimate of centrality as well as information characterizing the estimation uncertainty (the dispersion). Additionally, one can also report indices of effect existence and/or significance.

Centrality: reporting the median as an index of centrality, as it is more robust compared to the mean or the MAP estimate. However, in case of a severely skewed posterior distribution, the MAP estimate could be a good alternative.
Uncertainty: The 95% or 89% Credible Intervals (CI) are two reasonable ranges to characterize the uncertainty related to the estimation. They also recommend computing the CIs based on the HDI rather than quantiles, favouring probable over central values. Note that a CI based on the quantile (equal-tailed interval) might be more appropriate in case of transformations (for instance when transforming log-odds to probabilities). Otherwise, intervals that originally do not cover the null might cover it after transformation.
Existence: The most straightforward index to describe existence of an effect is the Probability of Direction (pd), representing the certainty associated with the most probable direction (positive or negative) of the effect. This index is easy to understand, simple to interpret, straightforward to compute, robust to model characteristics, and independent from the scale of the data. Moreover, it is strongly correlated with the frequentist p-value, and can thus be used to draw parallels and give some reference to readers non-familiar with Bayesian statistics. A two-sided p-value of respectively .1, .05, .01 and .001 correspond approximately to a pd of 95%, 97.5%, 99.5% and 99.95%:

** pd <= 95% ~ p > .1: uncertain
** pd > 95% ~ p < .1: possibly existing
** pd > 97%: likely existing
** pd > 99%: probably existing
** pd > 99.9%: certainly existing

Significance: The percentage in ROPE is a index of significance (in its primary meaning), informing us whether a parameter is related or not to a non-negligible change (in terms of magnitude) in the outcome. We suggest reporting the percentage of the full posterior distribution (the full ROPE) instead of a given proportion of CI in the ROPE, which appears to be more sensitive (especially to delineate highly significant effects). Rather than using it as a binary, all-or-nothing decision criterion, such as suggested by the original equivalence test, we recommend using the percentage as a continuous index of significance:

** > 99% in ROPE: negligible (we can accept the null hypothesis: no significant different)
** > 97.5% in ROPE: probably negligible
** <= 97.5% & >= 2.5% in ROPE: undecided significance
** < 2.5% in ROPE: probably significant
** < 1% in ROPE: significant (we can reject the null hypothesis)

Template Sentence: “the effect of X has a probability of pd of being negative (Median = median, 95% CI [ HDIlow , HDIhigh ] and can be considered as significant [ROPE% in ROPE]).”

5 References

https://statisticsbyjim.com/regression/model-specification-variable-selection/
https://rpubs.com/Qsheep/BayesianLinearRegression
https://rstudio-pubs-static.s3.amazonaws.com/677464_c8609bc9757543e2b8543b2cf5206175.html
https://github.com/easystats/bayestestR#documentation
Cohen, J. (1988). Statistical power analysis for the social sciences.
Kruschke, J. (2014). Doing bayesian data analysis: A tutorial with r, JAGS, and stan. Academic Press.
McElreath, R. (2018). Statistical rethinking: A bayesian course with examples in r and stan. Chapman; Hall/CRC.
Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework. Journal of Open Source Software, 4(40), 1541. https://doi.org/10.21105/joss.01541
Lilienfeld, S. O., Sauvigné, K. C., Lynn, S. J., Cautin, R. L., Latzman, R. D., & Waldman, I. D. (2015). Fifty psychological and psychiatric terms to avoid: A list of inaccurate, misleading, misused, ambiguous, and logically confused words and phrases. Frontiers in Psychology, 6, 1100. https://doi.org/10.3389/fpsyg.2015.01100

Linear regression - Bayesian framework

Minh Tri

2022-11-10

1 Simulated fakestroke data

2 Data preprocessing step

2.1 Step 1: Data wrangling

2.2 Step 2: Missingness checking and imputation when necessary

3 Classical linear regression - Frequentist approach

4 Bayesian framework

4.1 Introduction

4.2 Important Parameters

4.2.1 Credible Intervals (CI)

4.2.1.1 Definition

4.2.1.2 CI: 89% vs. 95%

4.2.1.3 Different types of CIs

4.2.2 Probability of Direction (pd)

4.2.2.1 Definition

4.2.2.2 Relationship with the p-value

4.2.2.3 Can the pd be 100%?

4.2.3 Region of Practical Equivalence (ROPE)

4.2.3.1 Definition

4.2.3.2 ROPE percentage to accept or to reject?

4.2.3.3 Define ROPE range

4.2.4 Bayes factor (BF)

4.2.5 Comparison of Point-Estimates

4.3 Practice with R

4.3.1 Fit the model

4.3.2 Summary of Posterior Distribution

4.3.3 Extracting the posterior

4.3.4 Visualizing the posterior distribution

4.3.5 Describing the Posterior

4.3.5.1 Point estimate

4.3.5.2 Uncertainty

4.3.5.3 ROPE Percentage: Significance

4.3.6 PD and P-value

4.3.7 Summary commands

4.4 Reporting Guidelines

5 References