knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
Previous sections:
Note: This analysis is used for personal study purpose. In this section, I will summerize some basic information about linear regression - Bayesian approach based on several sources.
R Packages:
# More packages will be shown during the analysis process
# Load the required library
library(tidyverse) # Data Wrangling
library(conflicted) # Dealing with conflict package
library(readxl) # Read csv file
Dealing with Conflicts
There is a lot of packages here, and sometimes individual functions are
in conflict. R’s default conflict resolution system gives precedence to
the most recently loaded package. This can make it hard to detect
conflicts, particularly when introduced by an update to an existing
package.
conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("Predict", "rms")
conflict_prefer("impute_median", "simputation")
conflict_prefer("summarize", "dplyr")
Data used in this notes
fakestroke.csv
(Source: https://github.com/THOMASELOVE/432-data/blob/master/data/fakestroke.csv)
The fakestroke.csv file contains the following 18 variables for 500 patients:
# To make a table in R markdown: 1st row: header, 2nd row: Alignment; the remaining row: for content
## |Variable | Description |
## |:------- | :---------- |
## |studyid | Study ID # (z001 through z500) |
| Variable | Description |
|---|---|
| studyid | Study ID # (z001 through z500) |
| trt | Treatment group (Intervention or Control) |
| age | Age in years |
| sex | Male or Female |
| nihss | NIH Stroke Scale Score (can range from 0-42; higher scores indicate more severe neurological deficits) |
| location | Stroke Location - Left or Right Hemisphere |
| hx.isch | History of Ischemic Stroke (Yes/No) |
| afib | Atrial Fibrillation (1 = Yes, 0 = No) |
| dm | Diabetes Mellitus (1 = Yes, 0 = No) |
| mrankin | Pre-stroke modified Rankin scale score (0, 1, 2 or > 2) indicating functional disability - complete range is 0 (no symptoms) to 6 (death) |
| sbp | Systolic blood pressure, in mm Hg |
| iv.altep | Treatment with IV alteplase (Yes/No) |
| time.iv | Time from stroke onset to start of IV alteplase (minutes) if iv.altep=Yes |
| aspects | Alberta Stroke Program Early Computed Tomography score, which measures extent of stroke from 0 - 10; higher scores indicate fewer early ischemic changes |
| ia.occlus | Intracranial arterial occlusion, based on vessel imaging - five categories3 |
| extra.ica | Extracranial ICA occlusion (1 = Yes, 0 = No) |
| time.rand | Time from stroke onset to study randomization, in minutes |
| time.punc | Time from stroke onset to groin puncture, in minutes (only if Intervention) |
Please refer to this section: The R environment - Data Wrangling: https://rpubs.com/minhtri/933332
df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx",
col_types = c("text", "text", "numeric",
"text", "numeric", "text", "text",
"numeric", "numeric", "text", "numeric",
"text", "numeric", "numeric", "text",
"numeric", "numeric", "numeric"))
names(df)
## [1] "studyid" "trt" "age" "sex" "nihss" "location"
## [7] "hx.isch" "afib" "dm" "mrankin" "sbp" "iv.altep"
## [13] "time.iv" "aspects" "ia.occlus" "extra.ica" "time.rand" "time.punc"
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : chr [1:500] "z001" "z002" "z003" "z004" ...
## $ trt : chr [1:500] "Control" "Intervention" "Control" "Control" ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : chr [1:500] "Male" "Male" "Female" "Male" ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : chr [1:500] "Right" "Left" "Right" "Left" ...
## $ hx.isch : chr [1:500] "No" "No" "No" "No" ...
## $ afib : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
## $ dm : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
## $ mrankin : chr [1:500] "2" "0" "0" "0" ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : chr [1:500] "Yes" "Yes" "No" "Yes" ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: chr [1:500] "M1" "M1" "ICA with M1" "ICA with M1" ...
## $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
df <- df%>%mutate_if(is.character, as.factor)
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
## $ dm : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
## $ mrankin : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
## $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
# Changing coding variable to categorical:
df$afib <- factor(df$afib , levels=0:1, labels=c("No", "Yes"))
df$dm <- factor(df$dm , levels=0:1, labels=c("No", "Yes"))
df$extra.ica <- factor(df$extra.ica, levels=0:1, labels=c("No", "Yes"))
# Checking after converting
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
## $ dm : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ mrankin : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
## $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
# Relevel factors:
df$trt = factor(df$trt, levels=c("Control", "Intervention"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1",
"M1", "M2", "A1 or A2"))
# Check order of factor after relevel:
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
## $ dm : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ mrankin : Factor w/ 4 levels "0","1","2","> 2": 3 1 1 1 1 3 1 1 3 1 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 5 levels "Intracranial ICA",..: 3 3 2 2 3 5 3 3 3 3 ...
## $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
Package: VIM
Command: aggr (x, plot = T, prop = T, numbers = F, combined = F, sortVars = T, cex.axis=.7, gap=3)
For more information, please refer to the following link:
Dealing with missingness - Imputation: https://rpubs.com/minhtri/932007
Checking the percentage of missingness:
library(VIM)
aggr(df, plot = F, numbers = T, prop = T)
##
## Missings in variables:
## Variable Count
## sbp 1
## time.iv 55
## aspects 4
## ia.occlus 1
## extra.ica 1
## time.rand 2
## time.punc 267
aggr(df, plot = T, numbers = T, prop = T, combined= F, sortVars = T, cex.axis=.7, gap=3)
##
## Variables sorted by number of missings:
## Variable Count
## time.punc 0.534
## time.iv 0.110
## aspects 0.008
## time.rand 0.004
## sbp 0.002
## ia.occlus 0.002
## extra.ica 0.002
## studyid 0.000
## trt 0.000
## age 0.000
## sex 0.000
## nihss 0.000
## location 0.000
## hx.isch 0.000
## afib 0.000
## dm 0.000
## mrankin 0.000
## iv.altep 0.000
aggr(df, plot = T, numbers = T, prop = F, combined= F, sortVars = T, cex.axis=.7, gap=3)
##
## Variables sorted by number of missings:
## Variable Count
## time.punc 267
## time.iv 55
## aspects 4
## time.rand 2
## sbp 1
## ia.occlus 1
## extra.ica 1
## studyid 0
## trt 0
## age 0
## sex 0
## nihss 0
## location 0
## hx.isch 0
## afib 0
## dm 0
## mrankin 0
## iv.altep 0
Discuss
df1 = df %>% select (-time.punc, - time.iv, -studyid) %>% na.omit()
To highlight the difference between the Bayesian regression and the traditional linear regression (frequentist approach), let’s first fit the latter to our data.
Package: car
Command: newModel<-lm(outcome ~ predictor(s), data =
dataFrame, na.action = an action))
# Building the model
library(car)
model_A <- lm(nihss ~ ., data = df1)
summary(model_A)
##
## Call:
## lm(formula = nihss ~ ., data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3283 -3.6782 -0.5946 3.6788 11.2577
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.070796 3.205253 5.014 7.56e-07 ***
## trtIntervention -0.209815 0.423275 -0.496 0.62034
## age -0.006806 0.012374 -0.550 0.58257
## sexMale 0.305910 0.424068 0.721 0.47104
## locationRight 0.805324 0.420833 1.914 0.05627 .
## hx.ischYes -1.063988 0.669898 -1.588 0.11289
## afibYes 0.054109 0.471736 0.115 0.90873
## dmYes -1.972019 0.635639 -3.102 0.00203 **
## mrankin1 -0.164159 0.700352 -0.234 0.81478
## mrankin2 -0.224361 1.008078 -0.223 0.82397
## mrankin> 2 1.405864 1.047780 1.342 0.18032
## sbp -0.005802 0.008308 -0.698 0.48534
## iv.altepYes 1.241297 0.673398 1.843 0.06591 .
## aspects -0.107675 0.136134 -0.791 0.42937
## ia.occlusICA with M1 3.018393 2.380488 1.268 0.20543
## ia.occlusM1 4.226387 2.357445 1.793 0.07365 .
## ia.occlusM2 4.968808 2.478730 2.005 0.04558 *
## ia.occlusA1 or A2 8.057030 3.563516 2.261 0.02422 *
## extra.icaYes 0.440397 0.481065 0.915 0.36042
## time.rand -0.005755 0.003287 -1.751 0.08060 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.586 on 472 degrees of freedom
## Multiple R-squared: 0.07934, Adjusted R-squared: 0.04228
## F-statistic: 2.141 on 19 and 472 DF, p-value: 0.003569
t value and its significance: In multiple
regression, the smaller the value of Pr(> |t|) (and the larger the
value of t), the greater the contribution of that predictor.
=> dm and ia.occlus contribute significantly to the
nihss.
For more information about Classical linear regression - Frequentist approach: https://rpubs.com/minhtri/936322
From https://easystats.github.io/bayestestR/articles/bayestestR.html
The key difference is that in the frequentist framework (the “classical” approach to statistics, with p and t values, as well as degrees of freedom), the effects are fixed (but unknown) and data are random. In other words, it assumes that the unknown parameter has a unique value that we are trying to estimate/guess using our sample data. On the other hand, in the Bayesian framework, instead of estimating the “true effect”, the probability of different effects given the observed data is computed, resulting in a distribution of possible values for the parameters, called the posterior distribution.
To illustrate the difference of interpretation, the Bayesian framework allows to say “given the observed data, the effect has 95% probability of falling within this range”, while the frequentist (less intuitive) alternative would be “when repeatedly computing confidence intervals from data of this sort, there is a 95% probability that the effect falls within a given range”. In essence, the Bayesian sampling algorithms (such as MCMC sampling) return a probability distribution (the posterior) of an effect that is compatible with the observed data. Thus, an effect can be described by characterizing its posterior distribution in relation to its centrality (point-estimates), uncertainty, as well as its existence and significance.
Credible intervals are an important concept in Bayesian statistics. Its core purpose is to describe and summarise the uncertainty related to the unknown parameters you are trying to estimate. In this regard, it could appear as quite similar to the frequentist Confidence Intervals. However, while their goal is similar, their statistical definition and meaning is very different. Indeed, while the latter is obtained through a complex algorithm full of rarely-tested assumptions and approximations, the credible intervals are fairly straightforward to compute.
As the Bayesian inference returns a distribution of possible effect values (the posterior), the credible interval is just the range containing a particular percentage of probable values. For instance, the 95% credible interval is simply the central portion of the posterior distribution that contains 95% of the values.
Note how this drastically improve the interpretability of the Bayesian interval compared to the frequentist one. Indeed, the Bayesian framework allows us to say “given the observed data, the effect has 95% probability of falling within this range”, compared to the less straightforward, frequentist alternative (the 95% Confidence Interval) would be “there is a 95% probability that when computing a confidence interval from data of this sort, the effect falls within this range”.
Naturally, when it came about choosing the CI level to report by default, people started using 95%, the arbitrary convention used in the frequentist world. However, some authors suggested that 95% might not be the most appropriate for Bayesian posterior distributions, potentially lacking stability if not enough posterior samples are drawn (Kruschke, 2014).
The proposition was to use 90% instead of 95%. However, recently, McElreath (2014, 2018) suggested that if we were to use arbitrary thresholds in the first place, why not use 89%? Moreover, 89 is the highest prime number that does not exceed the already unstable 95% threshold. What does it have to do with anything? Nothing, but it reminds us of the total arbitrariness of these conventions (McElreath, 2018). Thus, CIs computed with 89% intervals (ci = 0.89), are deemed to be more stable than, for instance, 95% intervals (Kruschke, 2014)
However, 95% has some advantages too. For instance, it shares (in the case of a normal posterior distribution) an intuitive relationship with the standard deviation and it conveys a more accurate image of the (artificial) bounds of the distribution. Also, because it is wider, it makes analyses more conservative (i.e., the probability of covering 0 is larger for the 95% CI than for lower ranges such as 89%), which is a good thing in the context of the reproducibility crisis.
The reader might notice that bayestestR provides two methods to compute credible intervals, the Highest Density Interval (HDI) (hdi()) and the Equal-tailed Interval (ETI) (eti()). These methods can also be changed via the method argument of the ci() function. What is the difference? Let’s see: for symmetric distribution and other types of distribution
For symmetric distribution:
library(bayestestR)
library(ggplot2)
# Generate a normal distribution
post <- distribution_normal(1000)
# Compute HDI and ETI
ci_hdi <- ci(post, method = "HDI")
ci_eti <- ci(post, method = "ETI")
# Plot the distribution and add the limits of the two CIs
out <- estimate_density(post, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
geom_area(fill = "orange") +
theme_classic() +
# HDI in blue
geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", size = 3) +
geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", size = 3) +
# Quantile in red
geom_vline(xintercept = ci_eti$CI_low, color = "red", size = 1) +
geom_vline(xintercept = ci_eti$CI_high, color = "red", size = 1)
Other types of distribution
# Generate a beta distribution
posterior <- distribution_beta(1000, 6, 2)
# Compute HDI and Quantile CI
ci_hdi <- ci(posterior, method = "HDI")
ci_eti <- ci(posterior, method = "ETI")
# Plot the distribution and add the limits of the two CIs
out <- estimate_density(posterior, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
geom_area(fill = "orange") +
theme_classic() +
# HDI in blue
geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", size = 3) +
geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", size = 3) +
# ETI in red
geom_vline(xintercept = ci_eti$CI_low, color = "red", size = 1) +
geom_vline(xintercept = ci_eti$CI_high, color = "red", size = 1)
Contrary to the HDI, for which all points within the interval have a higher probability density than points outside the interval, the ETI is equal-tailed. This means that a 90% interval has 5% of the distribution on either side of its limits. It indicates the 5th percentile and the 95th percentile. In symmetric distributions, the two methods of computing credible intervals, the ETI and the HDI, return similar results.
This is not the case for skewed distributions. Indeed, it is possible that parameter values in the ETI have lower credibility (are less probable) than parameter values outside the ETI. This property seems undesirable as a summary of the credible values in a distribution.
On the other hand, the ETI range does not change when transformations are applied to the distribution (for instance, for log-odds to probabilities transformation): the lower and higher bounds of the transformed distribution will correspond to the transformed lower and higher bounds of the original distribution. On the contrary, applying transformations to the distribution will change the resulting HDI. Thus, for instance, if exponentiated credible intervals are required, calculating the ETI is recommended.
Unlike the HDI and the ETI, which look at the posterior distribution, the Support Interval (SI) provides information regarding the change in the credibility of values from the prior to the posterior - in other words, it indicates which values of a parameter have gained support by the observed data by some factor greater or equal to k (Wagenmakers, Gronau, Dablander, & Etz, 2018). For more info, please refer to the following link: https://easystats.github.io/bayestestR/articles/credible_interval.html
The Probability of Direction (pd) is an index of effect existence, ranging from 50% to 100%, representing the certainty with which an effect goes in a particular direction (i.e., is positive or negative).
Beyond its simplicity of interpretation, understanding and computation, this index also presents other interesting properties:
It is independent from the model: It is solely
based on the posterior distributions and does not require any additional
information from the data or the model.
It is robust to the scale of both the response
variable and the predictors.
It is strongly correlated with the frequentist p-value, and can thus be used to draw parallels and give some reference to readers non-familiar with Bayesian statistics.
Indices of effect existence, such as the pd, are particularly useful in exploratory research or clinical studies, for which the focus is to make sure that the effect of interest is not in the opposite direction (for clinical studies, that a treatment is not harmful). However, once the effect’s direction is confirmed, the focus should shift toward its significance, including a precise estimation of its magnitude, relevance and importance.
In most cases, it seems that the pd has a direct correspondence with
the frequentist one-sided p-value through the formula:
p.one−sided = 1 − pd
Similarly, the two-sided p-value (the most commonly reported one) is
equivalent through the formula: p.two−sided = 2 x
(1−pd)
A two-sided p-value of respectively .1, .05, .01 and .001 correspond approximately to a pd of 95%, 97.5%, 99.5% and 99.95%:
** pd <= 95% ~ p > .1: uncertain
** pd > 95% ~ p < .1: possibly existing
** pd > 97%: likely existing
** pd > 99%: probably existing
** pd > 99.9%: certainly existing
p = 0.000 is coined as one of the term to avoid when reporting results (Lilienfeld et al., 2015), even if often displayed by statistical software. The rationale is that for every probability distribution, there is no value with a probability of exactly 0. There is always some infinitesimal probability associated with each data point, and the p = 0.000 returned by software is due to approximations related, among other, to finite memory hardware.
One could apply this rationale for the pd: since all data points have a non-null probability density, then the pd (a particular portion of the probability density) can never be 100%. While this is an entirely valid point, people using the direct method might argue that their pd is based on the posterior draws, rather than on the theoretical, hidden, true posterior distribution (which is only approximated by the posterior draws). These posterior draws represent a finite sample for which pd = 100% is a valid statement.
For more information, please refer to the following link: https://easystats.github.io/bayestestR/articles/probability_of_direction.html
According to https://easystats.github.io/bayestestR/articles/region_of_practical_equivalence.html: Unlike a frequentist approach, Bayesian inference is not based on statistical significance, where effects are tested against “zero”. Indeed, the Bayesian framework offers a probabilistic view of the parameters, allowing assessment of the uncertainty related to them. Thus, rather than concluding that an effect is present when it simply differs from zero, we would conclude that the probability of being outside a specific range that can be considered as “practically no effect” (i.e., a negligible magnitude) is sufficient. This range is called the region of practical equivalence (ROPE).
Null Hypothesis: Parameter is non-significant (in the sense of not important enough to be cared about). In other words, it checks the percentage of Credible Interval (CI - HDI) that is the null region (the ROPE). If this percentage is sufficiently low, the null hypothesis is rejected. If this percentage is sufficiently high, the null hypothesis is accepted.
If the HDI is completely outside the ROPE, the “null hypothesis” for this parameter is “rejected”. If the ROPE completely covers the HDI, i.e., all most credible values of a parameter are inside the region of practical equivalence, the null hypothesis is accepted. Else, it’s unclear whether the null hypothesis should be accepted or rejected.
If the full ROPE is used (i.e., 100% of the HDI), then the null hypothesis is rejected or accepted if the percentage of the posterior within the ROPE is smaller than to 2.5% or greater than 97.5%. Desirable results are low proportions inside the ROPE (the closer to zero the better).
Kruschke (2018) suggests that the ROPE could be set, by default, to a range from -0.1 to 0.1 of a standardized parameter:
For linear models (lm): [−0.1 x SDy , 0.1 x SDy]
For logistic models: π/√3
For t-tests: the standard deviation of the response is used,
similarly to linear models.
For correlations, -0.05, 0.05 is used.
For all other models, -0.1, 0.1 is used to determine the ROPE limits, but it is strongly advised to specify it manually.
Although it can also be used to assess effect existence and significance, the Bayes factor (BF) is a versatile index that can be used to directly compare different models (or data generation processes). The Bayes factor is a ratio that informs us by how much more (or less) likely the observed data are under two compared models - usually a model with versus a model without the effect. Depending on the specifications of the null model (whether it is a point-estimate (e.g., 0) or an interval), the Bayes factor could be used both in the context of effect existence and significance.
For how to calculate Bayes factor, please refer to the following link: https://easystats.github.io/bayestestR/articles/bayes_factors.html
One of the main difference between the Bayesian and the frequentist frameworks is that the former returns a probability distribution for each effect (i.e., a model parameter of interest, such as a regression slope) instead of a single value. However, there is still a need and demand - for reporting or use in further analysis - for a single value (point-estimate) that best characterises the underlying posterior distribution. There are three main indices used in the literature for effect estimation: the mean - the median - the MAP (Maximum A Posteriori) estimate (roughly corresponding to the mode - the “peak” - of the distribution).
=> Overall, the median appears to be a safe choice, maintaining a high performance across different types of models.
To fit a Bayesian regression, we use the function stan_glm from the rstanarm package. This function as the above lm function requires providing the formula and the data that will be used, and leave all the following arguments with their default values:
Package: rstanarm
Command: stan_glm(formula, data, prior, prior_intercept,
prior_aux, algorithm)
with:
library(rstanarm)
model_B <- stan_glm(nihss ~. , data=df1 , seed=1234)
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.951 seconds (Warm-up)
## Chain 1: 0.962 seconds (Sampling)
## Chain 1: 1.913 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.806 seconds (Warm-up)
## Chain 2: 1.285 seconds (Sampling)
## Chain 2: 2.091 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 1.245 seconds (Warm-up)
## Chain 3: 0.928 seconds (Sampling)
## Chain 3: 2.173 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.936 seconds (Warm-up)
## Chain 4: 0.896 seconds (Sampling)
## Chain 4: 1.832 seconds (Total)
## Chain 4:
print(model_B, digits = 3)
## stan_glm
## family: gaussian [identity]
## formula: nihss ~ .
## observations: 492
## predictors: 20
## ------
## Median MAD_SD
## (Intercept) 16.305 3.134
## trtIntervention -0.211 0.431
## age -0.007 0.013
## sexMale 0.310 0.438
## locationRight 0.809 0.416
## hx.ischYes -1.078 0.665
## afibYes 0.050 0.461
## dmYes -1.983 0.660
## mrankin1 -0.171 0.725
## mrankin2 -0.255 1.004
## mrankin> 2 1.406 1.066
## sbp -0.006 0.008
## iv.altepYes 1.243 0.697
## aspects -0.107 0.133
## ia.occlusICA with M1 2.865 2.317
## ia.occlusM1 4.071 2.329
## ia.occlusM2 4.803 2.430
## ia.occlusA1 or A2 7.873 3.562
## extra.icaYes 0.437 0.481
## time.rand -0.006 0.003
##
## Auxiliary parameter(s):
## Median MAD_SD
## sigma 4.587 0.143
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
Discuss
The Median estimate is the median computed from the
MCMC simulation, and MAD_SD is the median absolute
deviation computed from the same simulation. To well understand how
getting these outputs, we plot the MCMC simulation of predictors using
bayesplot. As can be seen, the point estimate of “age” falls on the
median of the distribution (red line). The same thing is true for
“other” predictors.
library(bayesplot)
mcmc_dens(model_B, pars = c("age"))+ vline_at(-0.007, col="red")
How do we evaluate the model parameters? By analyzing the posteriors using some specific statistics. We make use of the function describe_posterior provided by bayestestR package.
Package: bayestestR
Command: describe_posterior(model, ci = 0.95)
Summary of Posterior Distribution:
library(bayestestR)
posteriors = describe_posterior(model_B, ci = 0.95)
print_md(posteriors, digits = 3)
| Parameter | Median | 95% CI | pd | ROPE | % in ROPE | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| (Intercept) | 16.305 | [ 9.95, 22.35] | 100% | [-0.47, 0.47] | 0% | 1.000 | 1775.000 |
| trtIntervention | -0.211 | [-1.02, 0.61] | 68.77% | [-0.47, 0.47] | 70.76% | 1.000 | 4664.000 |
| age | -0.007 | [-0.03, 0.02] | 70.65% | [-0.47, 0.47] | 100% | 1.000 | 4850.000 |
| sexMale | 0.310 | [-0.49, 1.14] | 76.80% | [-0.47, 0.47] | 64.92% | 1.000 | 4955.000 |
| locationRight | 0.809 | [ 0.02, 1.60] | 97.72% | [-0.47, 0.47] | 19.92% | 1.000 | 4718.000 |
| hx.ischYes | -1.078 | [-2.37, 0.24] | 94.85% | [-0.47, 0.47] | 16.05% | 1.000 | 4456.000 |
| afibYes | 0.050 | [-0.85, 1.00] | 54.30% | [-0.47, 0.47] | 72.37% | 1.000 | 4832.000 |
| dmYes | -1.983 | [-3.19, -0.72] | 99.95% | [-0.47, 0.47] | 0% | 1.000 | 4348.000 |
| mrankin1 | -0.171 | [-1.55, 1.23] | 58.65% | [-0.47, 0.47] | 50.03% | 0.999 | 4744.000 |
| mrankin2 | -0.255 | [-2.25, 1.72] | 60.68% | [-0.47, 0.47] | 37.05% | 0.999 | 5408.000 |
| mrankin> 2 | 1.406 | [-0.71, 3.48] | 90.48% | [-0.47, 0.47] | 15.76% | 0.999 | 5548.000 |
| sbp | -0.006 | [-0.02, 0.01] | 76.22% | [-0.47, 0.47] | 100% | 1.000 | 4613.000 |
| iv.altepYes | 1.243 | [-0.09, 2.55] | 96.83% | [-0.47, 0.47] | 11.18% | 1.000 | 5175.000 |
| aspects | -0.107 | [-0.38, 0.16] | 79.25% | [-0.47, 0.47] | 100% | 0.999 | 4808.000 |
| ia.occlusICA with M1 | 2.865 | [-1.45, 7.48] | 89.55% | [-0.47, 0.47] | 7.61% | 1.000 | 1086.000 |
| ia.occlusM1 | 4.071 | [-0.19, 8.68] | 96.75% | [-0.47, 0.47] | 2.87% | 1.000 | 1060.000 |
| ia.occlusM2 | 4.803 | [ 0.27, 9.66] | 98.15% | [-0.47, 0.47] | 0.61% | 1.000 | 1136.000 |
| ia.occlusA1 or A2 | 7.873 | [ 0.96, 15.05] | 98.60% | [-0.47, 0.47] | 0% | 1.000 | 1774.000 |
| extra.icaYes | 0.437 | [-0.51, 1.38] | 82.33% | [-0.47, 0.47] | 51.76% | 1.000 | 4588.000 |
| time.rand | -0.006 | [-0.01, 0.00] | 95.80% | [-0.47, 0.47] | 100% | 0.999 | 4367.000 |
Explain
CI : Credible Interval, it is used to quantify
the uncertainty about the regression coefficients. There are two methods
to compute CI, the highest density interval HDI which is the default,
and the Equal-tailed Interval ETI. with 89% probability (given the data)
that a coefficient lies above the CI_low value and under CI_high value.
This straightforward probabilistic interpretation is completely
different from the confidence interval used in classical linear
regression where the coefficient fall inside this confidence interval
(if we choose 95% of confidence) 95 times if we repeat the study 100
times.
pd : Probability of Direction , which is the
probability that the effect goes to the positive or to the negative
direction, and it is considered as the best equivalent for the
p-value.
ROPE_CI: Region of Practical Equivalence, since
bayes method deals with true probabilities, it does not make sense to
compute the probability of getting the effect equals zero (the null
hypothesis) as a point (probability of a point in continuous intervals
equal zero ). Thus, we define instead a small range around zero which
can be considered practically the same as no effect (zero), this range
therefore is called ROPE. By default (according to Cohen, 1988) The Rope
is [-0.1,0.1] from the standardized coefficients.
Rhat: scale reduction factor 𝑅, it is computed
for each scalar quantity of interest, as the standard deviation of that
quantity from all the chains included together, divided by the root mean
square of the separate within-chain standard deviations. When this value
is close to 1 we do not have any convergence problem with MCMC.
ESS: effective sample size, it captures how many independent draws contain the same amount of information as the dependent sample obtained by the MCMC algorithm, the higher the ESS the better. The threshold used in practice is 400.
=> You just fitted a Bayesian version of the model by simply using the stan_glm() function instead of lm() and described the posterior distributions of the parameters: describe_posterior()!
Aternatively, we can get the coefficeient estimates (which are the medians by default) separatly by using the package insight.
Package: insight
Command: get_parameters(model)
library(insight)
post <- get_parameters(model_B)
Discuss
As we can see, the parameters take the form of a lengthy dataframe with
two types of columns, corresponding to the intercept and the
effect of each variable. These columns contain the posterior
distributions of these parameters. In simple terms, the posterior
distribution is a set of different plausible values for each parameter.
Contrast this with the result we saw from the frequentist linear
regression mode using lm, where the results had single values
for each effect of the model, and not a distribution of values.
This is one of the most important differences between these two
frameworks.
dim(post)
## [1] 4000 20
Why is the size 4000 ?
For more info, please refer to https://easystats.github.io/bayestestR/articles/example1.html
Now that we’ve understood where these values come from, let’s look at them. We will start by visualizing the posterior distribution of our parameter of interest:
Effect of numeric variable: age
ggplot(post, aes(x = age)) +
geom_density(fill = "orange")
Discuss
This distribution represents the probability (the y axis) of different effects (the x axis). The central values are more probable than the extreme values. As you can see, this distribution ranges from about -0.055 to 0.04, with the bulk of it being at around -0.007 (look at the median and range of age).
And this is the heart of Bayesian analysis. We don’t need p-values, t-values, or degrees of freedom. Everything we need is contained within this posterior distribution.
Our description above is consistent with the values obtained from the frequentist regression (which resulted in a β of -0.007). Indeed, in most cases, Bayesian analysis does not drastically differ from the frequentist results or their interpretation. Rather, it makes the results more interpretable and intuitive, and easier to understand and describe.
Effect of categorical variable: sex-Male: Imagine for a moment you are interested in how the nihss score varies depending on two different groups: Intervention and Control.
ggplot(post, aes(x = sexMale)) +
geom_density(fill = "red")
This represents the posterior distribution of the difference between Male and Female. It seems that the difference is positive (since the values are concentrated on the right side of 0) => nihss score increases if participant is Male.
But, by how much? Let compute the median and the CI (as below):
median(post$sexMale)
## [1] 0.3102701
hdi(post$sexMale)
## 95% HDI: [-0.48, 1.15]
Discuss: It makes nihss by around 0.31 scores (the median). However, the uncertainty is quite low: there is 95% chance that the difference between the two gender types is between -0.48 and 1.15.
Unfortunately, it is often not practical to report the whole posterior distributions as graphs. We need to find a concise way to summarize it. We recommend to describe the posterior distribution with 3 elements:
A point-estimate which is a one-value summary (similar to the
beta in frequentist regressions).
A credible interval representing the associated
uncertainty.
Some indices of significance, giving information about the relative importance of this effect.
Centrality indices, such as the mean, the median, or the mode are usually used as point-estimates. But what’s the difference between them?
Mean:
print(purrr::map_dbl(post, mean),digits = 3)
## (Intercept) trtIntervention age
## 16.19966 -0.21046 -0.00673
## sexMale locationRight hx.ischYes
## 0.31231 0.80310 -1.07695
## afibYes dmYes mrankin1
## 0.05700 -1.97313 -0.16776
## mrankin2 mrankin> 2 sbp
## -0.24783 1.39985 -0.00597
## iv.altepYes aspects ia.occlusICA with M1
## 1.23625 -0.10724 2.90636
## ia.occlusM1 ia.occlusM2 ia.occlusA1 or A2
## 4.12252 4.86562 7.93401
## extra.icaYes time.rand
## 0.44084 -0.00576
Median:
print(purrr::map_dbl(post,median),digits = 3)
## (Intercept) trtIntervention age
## 16.30473 -0.21090 -0.00712
## sexMale locationRight hx.ischYes
## 0.31027 0.80877 -1.07798
## afibYes dmYes mrankin1
## 0.04952 -1.98303 -0.17059
## mrankin2 mrankin> 2 sbp
## -0.25508 1.40627 -0.00594
## iv.altepYes aspects ia.occlusICA with M1
## 1.24282 -0.10692 2.86471
## ia.occlusM1 ia.occlusM2 ia.occlusA1 or A2
## 4.07147 4.80325 7.87335
## extra.icaYes time.rand
## 0.43739 -0.00572
Maximum A posteriori (MAP):
print(purrr::map_dbl(post, map_estimate),digits = 3)
## (Intercept) trtIntervention age
## 16.81814 -0.21015 -0.00855
## sexMale locationRight hx.ischYes
## 0.32157 0.82763 -1.07448
## afibYes dmYes mrankin1
## 0.09198 -2.04621 -0.20568
## mrankin2 mrankin> 2 sbp
## -0.23416 1.50310 -0.00532
## iv.altepYes aspects ia.occlusICA with M1
## 1.32554 -0.10276 3.00038
## ia.occlusM1 ia.occlusM2 ia.occlusA1 or A2
## 3.99828 4.62867 7.47592
## extra.icaYes time.rand
## 0.42618 -0.00530
Discuss
As we see, the values are closer to each other due to the like normality of the distribution of the posteriors where all the central statistics (mean, median, mode) are closer to each other. Using the following plot to visualize the age coefficient using different statistics as follows:
mcmc_dens(post, pars=c("age"))+
geom_density(fill = "orange") +
vline_at(median(post$age), col="red", size = 1)+
vline_at(mean(post$age), col="blue", size = 1)+
vline_at(map_estimate(post$age), col="green", size = 1)
Well, all these values give very similar results. Thus, we will choose the median, as this value has a direct meaning from a probabilistic perspective: there is 50% chance that the true effect is higher and 50% chance that the effect is lower (as it divides the distribution in two equal parts).
As we do with classical regression (frequentist), we can test the significance of the bayesian regression coefficients by checking whether the corresponding credible interval contains zero or not, if no then this coefficient is significant. Let’s go back to our model and check the significance of each coefficient (using credible based on the default hdi).
Some people choose to use the range containing the 89% most probable effect values. As the 89% level gives more stable results (Kruschke, 2014) and reminds us about the arbitrariness of such conventions (McElreath, 2018). However, I will use 95% CI and eti as the default ci_method in this example:
hdi(model_B)
## Highest Density Interval
##
## Parameter | 95% HDI
## -------------------------------------
## (Intercept) | [ 9.85, 22.22]
## trtIntervention | [-1.01, 0.61]
## age | [-0.03, 0.02]
## sexMale | [-0.48, 1.15]
## locationRight | [ 0.01, 1.58]
## hx.ischYes | [-2.39, 0.21]
## afibYes | [-0.82, 1.03]
## dmYes | [-3.18, -0.72]
## mrankin1 | [-1.51, 1.24]
## mrankin2 | [-2.26, 1.68]
## mrankin> 2 | [-0.75, 3.43]
## sbp | [-0.02, 0.01]
## iv.altepYes | [-0.07, 2.56]
## aspects | [-0.38, 0.16]
## ia.occlusICA with M1 | [-1.65, 7.27]
## ia.occlusM1 | [-0.20, 8.62]
## ia.occlusM2 | [ 0.35, 9.73]
## ia.occlusA1 or A2 | [ 1.53, 15.48]
## extra.icaYes | [-0.54, 1.35]
## time.rand | [-0.01, 0.00]
eti(model_B)
## Equal-Tailed Interval
##
## Parameter | 95% ETI | Effects | Component
## -------------------------------------------------------------
## (Intercept) | [ 9.95, 22.35] | fixed | conditional
## trtIntervention | [-1.02, 0.61] | fixed | conditional
## age | [-0.03, 0.02] | fixed | conditional
## sexMale | [-0.49, 1.14] | fixed | conditional
## locationRight | [ 0.02, 1.60] | fixed | conditional
## hx.ischYes | [-2.37, 0.24] | fixed | conditional
## afibYes | [-0.85, 1.00] | fixed | conditional
## dmYes | [-3.19, -0.72] | fixed | conditional
## mrankin1 | [-1.55, 1.23] | fixed | conditional
## mrankin2 | [-2.25, 1.72] | fixed | conditional
## mrankin> 2 | [-0.71, 3.48] | fixed | conditional
## sbp | [-0.02, 0.01] | fixed | conditional
## iv.altepYes | [-0.09, 2.55] | fixed | conditional
## aspects | [-0.38, 0.16] | fixed | conditional
## ia.occlusICA with M1 | [-1.45, 7.48] | fixed | conditional
## ia.occlusM1 | [-0.19, 8.68] | fixed | conditional
## ia.occlusM2 | [ 0.27, 9.66] | fixed | conditional
## ia.occlusA1 or A2 | [ 0.96, 15.05] | fixed | conditional
## extra.icaYes | [-0.51, 1.38] | fixed | conditional
## time.rand | [-0.01, 0.00] | fixed | conditional
Discuss: Using both methods, the only significant coefficient is dm and ia.occlus variables, which is in line with the classical regression.
Note: this similar result between frequentist and bayesian regression may due to the normality assumption for the former that is well satisfied which gives satisfied results and due to the normal prior used in the latter. However, in real world it is less often to be sure about the normality assumption which may give contradict conclusions between the two approaches.
one way to assess significance could be to define an area around 0, which will consider as practically equivalent to zero (i.e., absence of, or a negligible, effect). This is called the Region of Practical Equivalence (ROPE), and is one way of testing the significance of parameters. By checking the part of the credible interval that falls inside the ROPE interval = non-significant coefficient.
Percentage of ROPE to Accept or Reject:
Define ROPE range:
For linear models (lm): [−0.1 x SDy , 0.1 x SDy]
For logistic models: π/√3
For t-tests: the standard deviation of the response is used,
similarly to linear models.
For correlations, -0.05, 0.05 is used.
For all other models, -0.1, 0.1 is used to determine the ROPE limits, but it is strongly advised to specify it manually.
To calculate ROPE range:
rope_value <- 0.1 * sd(df1$nihss)
rope_range <- c(-rope_value, rope_value)
rope_range
## [1] -0.4686278 0.4686278
or we can just simply use:
rope_range(model_B)
## [1] -0.4686278 0.4686278
Let’s recompute the percentage in ROPE by calling the rope from bayestestR package.
library(bayestestR)
rope(post$age, range = rope_range)
## # Proportion of samples inside the ROPE [-0.47, 0.47]:
##
## inside ROPE
## -----------
## 100.00 %
rope(post$dmYes, range = rope_range)
## # Proportion of samples inside the ROPE [-0.47, 0.47]:
##
## inside ROPE
## -----------
## 0.00 %
rope(post$`ia.occlusA1 or A2`, range = rope_range)
## # Proportion of samples inside the ROPE [-0.47, 0.47]:
##
## inside ROPE
## -----------
## 0.00 %
Discuss
For age almost all the credible interval (HDI)
is inside the ROPE range, which means that coefficient is
non-significant. Or The probability of age coefficient
to be zero is 100%.
For dm and ia.occlusA1 or A2 we observe that the 95% of the posterior distribution of the effect does not overlap with the ROPE, which means that the effect is significant.
Sometimes we are only interested in checking the direction of the coefficient (positive or negative). This is the role of pd statistic in the above table, high value means that the associated effect is concentrated on the same side as the median.
For our model, since pd is equal to 1, not many the posteriors of the variables and the intercept are on the same side (if median negative all other values are negatives). However, it should be noted that this statistic does not assess the significance of the effect. Something more important to mention is that there exists a strong relation between this probability and the p-value approximated as follows: 𝑝−𝑣𝑎𝑙𝑢𝑒 = 1−𝑝𝑑. Let’s check this with our variables.
library(broom)
Classical.p <-dplyr::select(tidy(model_A), c(term,p.value))
Classical.p <- round(Classical.p$p.value, digits = 3)
Bayes.1sided_p <- 1- purrr::map_dbl(post, p_direction)
Bayes.2sided_p <- Bayes.1sided_p * 2
a <- cbind(Classical.p , Bayes.1sided_p, Bayes.2sided_p)
a
## Classical.p Bayes.1sided_p Bayes.2sided_p
## (Intercept) 0.000 0.00000 0.0000
## trtIntervention 0.620 0.31225 0.6245
## age 0.583 0.29350 0.5870
## sexMale 0.471 0.23200 0.4640
## locationRight 0.056 0.02275 0.0455
## hx.ischYes 0.113 0.05150 0.1030
## afibYes 0.909 0.45700 0.9140
## dmYes 0.002 0.00050 0.0010
## mrankin1 0.815 0.41350 0.8270
## mrankin2 0.824 0.39325 0.7865
## mrankin> 2 0.180 0.09525 0.1905
## sbp 0.485 0.23775 0.4755
## iv.altepYes 0.066 0.03175 0.0635
## aspects 0.429 0.20750 0.4150
## ia.occlusICA with M1 0.205 0.10450 0.2090
## ia.occlusM1 0.074 0.03250 0.0650
## ia.occlusM2 0.046 0.01850 0.0370
## ia.occlusA1 or A2 0.024 0.01400 0.0280
## extra.icaYes 0.360 0.17675 0.3535
## time.rand 0.081 0.04200 0.0840
Discuss: The p-value of Classical Linear Regression and Bayesian Regression are similar.
In summary, we just need to use a few commands to fit a Bayesian version of the model by simply using the stan_glm() function instead of lm() and describe the posterior distributions of the parameters describe_posterior(): median, CI, pd and % in ROPE.
library(rstanarm)
model_C <- stan_glm(nihss ~. , data=df1 , seed=1234)
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.803 seconds (Warm-up)
## Chain 1: 0.849 seconds (Sampling)
## Chain 1: 1.652 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.001 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.706 seconds (Warm-up)
## Chain 2: 0.833 seconds (Sampling)
## Chain 2: 1.539 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.76 seconds (Warm-up)
## Chain 3: 0.79 seconds (Sampling)
## Chain 3: 1.55 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.811 seconds (Warm-up)
## Chain 4: 0.8 seconds (Sampling)
## Chain 4: 1.611 seconds (Total)
## Chain 4:
library(bayestestR)
posteriors = describe_posterior(model_C, ci = 0.95)
print_md(posteriors, digits = 3)
| Parameter | Median | 95% CI | pd | ROPE | % in ROPE | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| (Intercept) | 16.305 | [ 9.95, 22.35] | 100% | [-0.47, 0.47] | 0% | 1.000 | 1775.000 |
| trtIntervention | -0.211 | [-1.02, 0.61] | 68.77% | [-0.47, 0.47] | 70.76% | 1.000 | 4664.000 |
| age | -0.007 | [-0.03, 0.02] | 70.65% | [-0.47, 0.47] | 100% | 1.000 | 4850.000 |
| sexMale | 0.310 | [-0.49, 1.14] | 76.80% | [-0.47, 0.47] | 64.92% | 1.000 | 4955.000 |
| locationRight | 0.809 | [ 0.02, 1.60] | 97.72% | [-0.47, 0.47] | 19.92% | 1.000 | 4718.000 |
| hx.ischYes | -1.078 | [-2.37, 0.24] | 94.85% | [-0.47, 0.47] | 16.05% | 1.000 | 4456.000 |
| afibYes | 0.050 | [-0.85, 1.00] | 54.30% | [-0.47, 0.47] | 72.37% | 1.000 | 4832.000 |
| dmYes | -1.983 | [-3.19, -0.72] | 99.95% | [-0.47, 0.47] | 0% | 1.000 | 4348.000 |
| mrankin1 | -0.171 | [-1.55, 1.23] | 58.65% | [-0.47, 0.47] | 50.03% | 0.999 | 4744.000 |
| mrankin2 | -0.255 | [-2.25, 1.72] | 60.68% | [-0.47, 0.47] | 37.05% | 0.999 | 5408.000 |
| mrankin> 2 | 1.406 | [-0.71, 3.48] | 90.48% | [-0.47, 0.47] | 15.76% | 0.999 | 5548.000 |
| sbp | -0.006 | [-0.02, 0.01] | 76.22% | [-0.47, 0.47] | 100% | 1.000 | 4613.000 |
| iv.altepYes | 1.243 | [-0.09, 2.55] | 96.83% | [-0.47, 0.47] | 11.18% | 1.000 | 5175.000 |
| aspects | -0.107 | [-0.38, 0.16] | 79.25% | [-0.47, 0.47] | 100% | 0.999 | 4808.000 |
| ia.occlusICA with M1 | 2.865 | [-1.45, 7.48] | 89.55% | [-0.47, 0.47] | 7.61% | 1.000 | 1086.000 |
| ia.occlusM1 | 4.071 | [-0.19, 8.68] | 96.75% | [-0.47, 0.47] | 2.87% | 1.000 | 1060.000 |
| ia.occlusM2 | 4.803 | [ 0.27, 9.66] | 98.15% | [-0.47, 0.47] | 0.61% | 1.000 | 1136.000 |
| ia.occlusA1 or A2 | 7.873 | [ 0.96, 15.05] | 98.60% | [-0.47, 0.47] | 0% | 1.000 | 1774.000 |
| extra.icaYes | 0.437 | [-0.51, 1.38] | 82.33% | [-0.47, 0.47] | 51.76% | 1.000 | 4588.000 |
| time.rand | -0.006 | [-0.01, 0.00] | 95.80% | [-0.47, 0.47] | 100% | 0.999 | 4367.000 |
According to https://easystats.github.io/bayestestR/articles/guidelines.html: A Bayesian analysis returns a posterior distribution for each parameter (or effect). To minimally describe these distributions, they recommend reporting a point-estimate of centrality as well as information characterizing the estimation uncertainty (the dispersion). Additionally, one can also report indices of effect existence and/or significance.
Centrality: reporting the median as an index of centrality, as it is more robust compared to the mean or the MAP estimate. However, in case of a severely skewed posterior distribution, the MAP estimate could be a good alternative.
Uncertainty: The 95% or 89% Credible Intervals (CI) are two reasonable ranges to characterize the uncertainty related to the estimation. They also recommend computing the CIs based on the HDI rather than quantiles, favouring probable over central values. Note that a CI based on the quantile (equal-tailed interval) might be more appropriate in case of transformations (for instance when transforming log-odds to probabilities). Otherwise, intervals that originally do not cover the null might cover it after transformation.
Existence: The most straightforward index to describe existence of an effect is the Probability of Direction (pd), representing the certainty associated with the most probable direction (positive or negative) of the effect. This index is easy to understand, simple to interpret, straightforward to compute, robust to model characteristics, and independent from the scale of the data. Moreover, it is strongly correlated with the frequentist p-value, and can thus be used to draw parallels and give some reference to readers non-familiar with Bayesian statistics. A two-sided p-value of respectively .1, .05, .01 and .001 correspond approximately to a pd of 95%, 97.5%, 99.5% and 99.95%:
** pd <= 95% ~ p > .1: uncertain
** pd > 95% ~ p < .1: possibly existing
** pd > 97%: likely existing
** pd > 99%: probably existing
** pd > 99.9%: certainly existing
** > 99% in ROPE: negligible (we can accept the null hypothesis:
no significant different)
** > 97.5% in ROPE: probably negligible
** <= 97.5% & >= 2.5% in ROPE: undecided significance
** < 2.5% in ROPE: probably significant
** < 1% in ROPE: significant (we can reject the null hypothesis)
Template Sentence: “the effect of X has a probability of pd of being negative (Median = median, 95% CI [ HDIlow , HDIhigh ] and can be considered as significant [ROPE% in ROPE]).”
https://statisticsbyjim.com/regression/model-specification-variable-selection/
https://rpubs.com/Qsheep/BayesianLinearRegression
https://rstudio-pubs-static.s3.amazonaws.com/677464_c8609bc9757543e2b8543b2cf5206175.html
https://github.com/easystats/bayestestR#documentation
Cohen, J. (1988). Statistical power analysis for the social
sciences.
Kruschke, J. (2014). Doing bayesian data analysis: A tutorial with r,
JAGS, and stan. Academic Press.
McElreath, R. (2018). Statistical rethinking: A bayesian course with
examples in r and stan. Chapman; Hall/CRC.
Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR:
Describing Effects and their Uncertainty, Existence and Significance
within the Bayesian Framework. Journal of Open Source Software, 4(40),
1541. https://doi.org/10.21105/joss.01541
Lilienfeld, S. O., Sauvigné, K. C., Lynn, S. J., Cautin, R. L., Latzman,
R. D., & Waldman, I. D. (2015). Fifty psychological and psychiatric
terms to avoid: A list of inaccurate, misleading, misused, ambiguous,
and logically confused words and phrases. Frontiers in Psychology, 6,
1100. https://doi.org/10.3389/fpsyg.2015.01100