class: center, middle, inverse, title-slide

# Beyond significant

## A practical introduction to Bayesian data analysis

### Andrew Ellis

### 2019-02-04

---

# Overview

- A practical guide to current Bayesian data analysis methods
    + setting up a Bayesian model
    + parameter estimation
    + model comparison: Bayes factors and posterior predictive checking

The purpose of this talk is to familiarize the audience with Bayesian thinking and to demonstrate that, with current software, Bayesian analysis is no longer restricted to a small number of expert statisticians.

- Slides are available online:

```
rpubs.com/awellis/beyond-significant
or
bit.ly/2RCrKd0
```

- On Dropbox: `https://bit.ly/2RyVPdr`

---

# To set the mood

.pull-left[

]

.pull-right[

]

---

# IQ example

.pull-left[
Summary: Two groups of people took an IQ test. Group 1, `\(N_1 = 47\)`, consumes a "smart drug", and Group 2, `\(N_2 = 42\)`, is a control group that consumes a placebo <a name=cite-kruschkeBayesianEstimationSupersedes2013></a>([Kruschke, 2013](https://doi.org/10.1037/a0029146)).
]

.pull-right[
<!-- -->
]

---

# T-Test

```r
# Reconstructed call (assumption): t.test(IQ ~ Group, data = TwoGroupIQ, var.equal = TRUE)

	Two Sample t-test

data:  IQ by Group
* t = -1.5587, df = 87, p-value = 0.1227
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.544155  0.428653
sample estimates:
  mean in group Placebo mean in group SmartDrug
               100.3571                101.9149
```

```r
# Reconstructed call (assumption): t.test(IQ ~ Group, data = TwoGroupIQ)

	Welch Two Sample t-test

data:  IQ by Group
* t = -1.6222, df = 63.039, p-value = 0.1098
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.4766863  0.3611848
sample estimates:
  mean in group Placebo mean in group SmartDrug
               100.3571                101.9149
```

???
Show example in JASP

---

# Problems with NHST

- The American Statistical Association (ASA) released a statement about p-values <a name=cite-lazarASAStatementPValues2016></a>([Wasserstein and Lazar, 2016](https://doi.org/10.1080/00031305.2016.1154108)). Among its principles:
    + P-values can indicate how incompatible the data are with a specified statistical model.
    + P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

- <a name=cite-greenlandStatisticalTestsValues2016></a>[Greenland, Senn, Rothman, Carlin, Poole, Goodman, and Altman (2016)](https://doi.org/10.1007/s10654-016-0149-3) provide a good discussion of common misinterpretations of p-values and confidence intervals.
    + A confidence interval does not have a 95% chance of containing the true parameter.

???
"The 95% refers only to how often 95% confidence intervals computed from very many studies would contain the true size if all the assumptions used to compute the intervals were correct."

"These further assumptions are summarized in what is called a prior distribution, and the resulting intervals are usually called Bayesian posterior (or credible) intervals to distinguish them from confidence intervals." (Greenland et al., 2016)

---

# The Bayesian New Statistics

- <a name=cite-cummingNewStatisticsWhy2014></a>[Cumming (2014)](https://doi.org/10.1177/0956797613504966) claims that "we need to shift from reliance on NHST to estimation and other preferred techniques".
- <a name=cite-kruschkeBayesianNewStatistics2018></a>[Kruschke and Liddell (2018)](https://doi.org/10.3758/s13423-016-1221-4) argue that Bayesian methods are better suited to achieve this, for both hypothesis testing and parameter estimation.
- According to <a name=cite-gigerenzerStatisticalRitualsReplication2018></a>[Gigerenzer (2018)](https://doi.org/10.1177/2515245918771329), we need to stop relying on NHST and instead learn to use a statistical toolkit.
- Many reviewers now demand Bayes factors (because a BF can provide evidence for/against hypotheses).
- Bayesian data analysis is not limited to calculating Bayes factors.

???
- Claim: Bayesian approaches are more direct, more intuitive, and more informative than frequentist approaches.
- Bayes factors are controversial.

---

# Bayesian methods

.pull-left[
🤗

- more intuitive (quantification of uncertainty)
- able to provide evidence for/against hypotheses
- more flexible
    + cognitive process models <a name=cite-leeBayesianCognitiveModeling2013></a>([Lee and Wagenmakers, 2013](#bib-leeBayesianCognitiveModeling2013))
    + robust models
- can include prior knowledge
- better for multilevel models <a name=cite-gelmanDataAnalysisUsing2006></a>([Gelman and Hill, 2006](https://doi.org/10.1017/CBO9780511790942))
- based on probability theory (Bayes' theorem)
]

.pull-right[
😧

- requires computing power
- setting priors requires familiarity with probability distributions
- (ongoing discussion about parameter estimation vs. hypothesis testing; see e.g. [here](https://statmodeling.stat.columbia.edu/2017/05/04/hypothesis-testing-hint-not-think/) and [here](https://statmodeling.stat.columbia.edu/2011/04/02/so-called_bayes/))
]

---

# Some theory

We will have a brief look at the theoretical background, then dive straight into a practical example.

.pull-left[
## Key ideas:

- Parameters are random variables<sup>1</sup>. These are drawn from probability distributions, which reflect our uncertainty about the parameters.
- The prior distribution is updated with the likelihood (data) to obtain a posterior distribution.
]

.pull-right[
<!-- -->
]

.footnote[
[1] In contrast to frequentist analysis, in which they are fixed.
]

---

# Bayesian workflow

???
- prior predictive distribution: the marginal distribution of the data over the prior

---

# Bayesian workflow

???
A posterior predictive p-value is the tail-area posterior probability of a statistic generated from the model, compared to the statistic observed in the data.

---

# Bayesian model comparison

---

# A hands-on example

To make this more concrete, we return to the IQ example.

- Our hypothesis: the smart drug group has higher IQ scores than the placebo group.
- We obtained a one-sided p-value of 0.055 (Welch test; half the two-sided p-value of 0.11 shown earlier).
- Even after removing outliers, p is > 0.05.

**Solution: Bayes factor to the rescue?**

--

- In this case, NO. The data provide little evidence either for or against our hypothesis.
- The maximally attainable Bayes factor in favour of our hypothesis is `\(\sim 1.8\)`.
- A further benefit of going Bayesian is the ability to flexibly specify our generative model. We can make the model [robust against outliers](https://solomonkurz.netlify.com/post/robust-linear-regression-with-the-robust-student-s-t-distribution/).

???
Show IQ example in JASP

---

# T-Test as linear model

A t-test is really just a general linear model<sup>1</sup>:

$$ Y = \alpha + \beta X + \epsilon $$
$$ \epsilon \sim N(0, \sigma^2) $$

where `\(X\)` is an indicator variable.

.footnote[
[1] This assumes equal variances.
]

```r
fit_ols <- lm(IQ ~ Group, data = TwoGroupIQ)
```

which can be read as:

$$ IQ = \alpha_{Placebo} + \beta \cdot SmartDrug + \epsilon $$
$$ \epsilon \sim N(0, \sigma^2) $$

The `\(\beta\)` parameter therefore represents the difference between the group means.
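As a quick sanity check (a minimal sketch; the coefficient name `GroupSmartDrug` follows R's default dummy coding of the `Group` factor), the fitted `\(\beta\)` simply reproduces the raw difference in group means:

```r
# The lm() slope equals the difference in group means
# (from the t-test output above: 101.9149 - 100.3571, i.e. about 1.56).
coef(fit_ols)["GroupSmartDrug"]
with(TwoGroupIQ, mean(IQ[Group == "SmartDrug"]) - mean(IQ[Group == "Placebo"]))
```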
---

# Generative model

.pull-left[
- Linear regression model as a probabilistic model.
- The graph on the right shows the data-generating process (dependencies among the random variables).
- `\(\alpha\)` is our expectation for the placebo group, `\(\beta\)` is our expectation for the difference in means, and `\(\sigma\)` is the standard deviation of the outcome.
]

.pull-right[

]

---

# Software

.pull-left[
Probabilistic programming languages

- WinBUGS
- JAGS
- [Stan](https://mc-stan.org): probabilistic programming language
- [PyMC3](https://docs.pymc.io): probabilistic programming in Python
- [Turing](http://turing.ml): probabilistic programming language with an intuitive modelling interface
]

.pull-right[
R interface / GUI

- [brms](https://github.com/paul-buerkner/brms): R package for Bayesian generalized multivariate non-linear multilevel models using Stan
- [rstanarm](http://mc-stan.org/rstanarm/): Bayesian applied regression modeling via Stan
- [JASP](https://jasp-stats.org): GUI for simple models (frequentist and Bayesian)
]

---

# Inference

## Stan

.pull-left[
```
data {
  int<lower=0> N;      // number of observations
  vector[N] x;         // group indicator
  vector[N] y;         // IQ scores
}
parameters {
  real alpha;          // mean of the placebo group
  real beta;           // difference between groups
  real<lower=0> sigma; // residual standard deviation
}
model {
  // priors
  sigma ~ cauchy(0, 2.5);
  alpha ~ normal(100, 15);
  beta ~ normal(0, 10);
  // likelihood
  y ~ normal(alpha + beta * x, sigma);
}
```
]

.pull-right[

]

---

# Inference

## brms

```r
fit_eqvar <- brm(IQ ~ Group,
                 data = TwoGroupIQ,
                 file = here::here("models/fitiq-eqvar"))
```

```r
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: IQ ~ Group
   Data: TwoGroupIQ (Number of observations: 89)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects:
                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept         100.36      0.72    98.91   101.75       4083 1.00
* GroupSmartDrug    1.56      1.00    -0.37     3.56       4039 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     4.76      0.36     4.12     5.53       3504 1.00
```

---

# Estimated parameter

.pull-left[
We obtain the marginal posterior distribution of the regression parameter `\(\beta\)`<sup>1</sup>.
]

.pull-right[
<!-- -->
]

.footnote[
[1] This is a distribution, not a confidence interval.
]

---

# Criticize model (posterior predictive check)

.pull-left[
We can draw samples from the posterior predictive distribution (e.g. with brms's `pp_check()`). The model predicts equal variances and no outliers.
]

.pull-right[
<!-- -->
]

---

# Model revision

The Gaussian distribution is sensitive to outliers. We can instead model our data as being generated by a t distribution, which has heavier tails; the sketch below illustrates why this helps.
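A minimal base-R sketch (the choice of `\(\nu = 2\)` is purely illustrative): the t distribution places far more probability mass in its tails than the normal, so extreme observations are less surprising under it and pull the estimated mean around less.

```r
# Compare the tails of the standard normal and a Student-t with
# nu = 2 degrees of freedom (heavier tails).
curve(dnorm(x), from = -6, to = 6, ylab = "Density")
curve(dt(x, df = 2), from = -6, to = 6, add = TRUE, lty = 2)
legend("topright", legend = c("Normal", "Student-t (nu = 2)"), lty = 1:2)
```

In brms, this amounts to switching the family to `student`: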
```r
fit_robust <- brm(bf(IQ ~ 0 + Group, sigma ~ Group),  # one mean and one SD per group
                  family = student,                   # t likelihood: robust to outliers
                  data = TwoGroupIQ,
                  prior = c(set_prior("normal(100, 10)", class = "b"),
                            set_prior("cauchy(0, 1)", class = "b", dpar = "sigma"),
                            set_prior("exponential(1.0/29)", class = "nu")),
                  cores = parallel::detectCores(),
                  file = here::here("models/fitiq-robust"))
```

---

# Robust regression

```r
 Family: student
  Links: mu = identity; sigma = log; nu = identity
Formula: IQ ~ 0 + Group
         sigma ~ Group
   Data: TwoGroupIQ (Number of observations: 89)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects:
                     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma_Intercept          0.01      0.19    -0.37     0.38       4343 1.00
GroupPlacebo           100.52      0.21   100.12   100.93       4312 1.00
GroupSmartDrug         101.55      0.36   100.85   102.24       4262 1.00
sigma_GroupSmartDrug     0.61      0.25     0.10     1.11       3800 1.00

Family Specific Parameters:
   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
nu     1.74      0.45     1.10     2.81       3554 1.00
```

---

.pull-left[
Predicted marginal means

<!-- -->
]

.pull-right[
<!-- -->
]

```
## Hypothesis Tests for class b:
##                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (GroupSmartDrug)-... > 0     1.03      0.41     0.34      Inf     101.56
##   Post.Prob Star
## 1      0.99    *
## ---
## '*': The expected value under the hypothesis lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
```

---

## Posterior predictive distribution

.pull-left[
Equal variance model

<!-- -->
]

.pull-right[
Robust model

<!-- -->
]

---

# Bayes factor

We can compute Bayes factors using two different methods:

- Savage-Dickey density ratio
- bridge sampling (used to obtain the marginal likelihood)

Notes:

- We are comparing the model that includes the grouping variable with a restricted model (a sketch of how these models might be fitted follows after the summary slide).
- For a more detailed description of Bayes factors, see [here](http://rpubs.com/awellis/bayes-factor).
- Bayes factors are often difficult to compute and are only feasible for certain restricted models.

.pull-left[
Savage-Dickey density ratio (the reciprocal of the `Evid.Ratio` reported by `hypothesis()` gives the BF in favour of a difference):

```r
hypothesis(fit_robust_bf, hypothesis = 'GroupSmartDrug = 0')
```

```
## [1] 2.935822
```
]

.pull-right[
Bridge sampling:

```r
bayes_factor(fit_robust_bridge, fit_robust_bridge_null)
```

```
## [1] 3.458627
```
]

---

# Bayes factors: summary

.pull-left[
- We noted that the normal model does not describe the data well. We modelled the outcome `\(y\)` as being drawn from a t distribution. This accounts for *outliers*.
- Posterior predictive check: the data are well described by this model.
- Parameter estimation: the parameter representing the group difference is positive, with a 95% credible interval of [0.23, 1.85].
- The Bayes factor in favour of our hypothesis lies between 2.94 and 3.46, depending on the method used.
]

.pull-right[
Is this "B-hacking" (a Bayesian analogue of p-hacking)?

- This is exploratory.
- We did not remove *outliers*.
- We did not test more subjects.
- Computing a BF requires carefully specifying prior distributions, and these need to be reported.
]
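---

# Sketch: fitting the Bayes factor models

The Bayes factor slide referenced `fit_robust_bf`, `fit_robust_bridge`, and `fit_robust_bridge_null` without showing their code. A minimal sketch of how they might be fitted (an assumption — the original calls are not shown, and the `normal(0, 10)` prior on `\(\beta\)` is illustrative):

```r
library(brms)

# Savage-Dickey: also draw from the prior, so that hypothesis() can
# compare prior and posterior density at beta = 0.
fit_robust_bf <- brm(bf(IQ ~ Group, sigma ~ Group),
                     family = student, data = TwoGroupIQ,
                     prior = c(set_prior("normal(0, 10)", class = "b"),
                               set_prior("cauchy(0, 1)", class = "b", dpar = "sigma"),
                               set_prior("exponential(1.0/29)", class = "nu")),
                     sample_prior = TRUE)

# Bridge sampling: save all parameters (newer brms versions use
# save_pars = save_pars(all = TRUE)), fit the full and the restricted
# (no group effect on the mean) model, then compare marginal likelihoods.
fit_robust_bridge <- brm(bf(IQ ~ Group, sigma ~ Group),
                         family = student, data = TwoGroupIQ,
                         prior = c(set_prior("normal(0, 10)", class = "b"),
                                   set_prior("cauchy(0, 1)", class = "b", dpar = "sigma"),
                                   set_prior("exponential(1.0/29)", class = "nu")),
                         save_all_pars = TRUE)

fit_robust_bridge_null <- brm(bf(IQ ~ 1, sigma ~ Group),
                              family = student, data = TwoGroupIQ,
                              prior = c(set_prior("cauchy(0, 1)", class = "b", dpar = "sigma"),
                                        set_prior("exponential(1.0/29)", class = "nu")),
                              save_all_pars = TRUE)

bayes_factor(fit_robust_bridge, fit_robust_bridge_null)
```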
---

# Further reading

- **Statistical Rethinking** (an introduction to applied Bayesian data analysis with lots of example code): [book website](https://xcelab.net/rm/statistical-rethinking/) and [lectures on YouTube](https://www.youtube.com/playlist?list=PLDcUM9US4XdNM4Edgs7weiyIguLSToZRI)
- [brms](https://github.com/paul-buerkner/brms): R package for Bayesian generalized multivariate non-linear multilevel models using Stan
- [Blog post by Jonas Lindeloev](https://rpubs.com/lindeloev/bayes_factors) on how to compute Bayes factors using various methods
- [Blog post by Matti Vuorre](https://vuorre.netlify.com/post/2017/01/02/how-to-compare-two-groups-with-robust-bayesian-estimation-using-r-stan-and-brms/#describing-the-models-underlying-the-t-tests) on how to perform a Bayesian t-test using brms
- [Blog post by A. Solomon Kurz](https://solomonkurz.netlify.com/post/robust-linear-regression-with-the-robust-student-s-t-distribution/) on how to perform robust regression
- [Blog post by Andrew Heiss](https://www.andrewheiss.com/blog/2019/01/29/diff-means-half-dozen-ways/): half a dozen frequentist and Bayesian ways to measure the difference in means between two groups

---

# Message to PhD students

Read Statistical Rethinking and `install.packages("brms")`.

---

# References

<a name=bib-cummingNewStatisticsWhy2014></a>[Cumming, G.](#cite-cummingNewStatisticsWhy2014) (2014). "The New Statistics: Why and How". In: _Psychological Science_ 25.1, pp. 7-29. ISSN: 0956-7976. DOI: [10.1177/0956797613504966](https://doi.org/10.1177%2F0956797613504966).

<a name=bib-gelmanDataAnalysisUsing2006></a>[Gelman, A. and J. Hill](#cite-gelmanDataAnalysisUsing2006) (2006). _Data Analysis Using Regression and Multilevel/Hierarchical Models_. Cambridge University Press. ISBN: 978-0-511-79094-2. DOI: [10.1017/CBO9780511790942](https://doi.org/10.1017%2FCBO9780511790942).

<a name=bib-gigerenzerStatisticalRitualsReplication2018></a>[Gigerenzer, G.](#cite-gigerenzerStatisticalRitualsReplication2018) (2018). "Statistical Rituals: The Replication Delusion and How We Got There". In: _Advances in Methods and Practices in Psychological Science_ 1.2, pp. 198-218. ISSN: 2515-2459. DOI: [10.1177/2515245918771329](https://doi.org/10.1177%2F2515245918771329).

<a name=bib-greenlandStatisticalTestsValues2016></a>[Greenland, S., S. J. Senn, K. J. Rothman, et al.](#cite-greenlandStatisticalTestsValues2016) (2016). "Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations". In: _European Journal of Epidemiology_ 31.4, pp. 337-350. ISSN: 1573-7284. DOI: [10.1007/s10654-016-0149-3](https://doi.org/10.1007%2Fs10654-016-0149-3).

<a name=bib-kruschkeBayesianEstimationSupersedes2013></a>[Kruschke, J. K.](#cite-kruschkeBayesianEstimationSupersedes2013) (2013). "Bayesian Estimation Supersedes the t Test". In: _Journal of Experimental Psychology: General_ 142.2, pp. 573-603. ISSN: 1939-2222, 0096-3445. DOI: [10.1037/a0029146](https://doi.org/10.1037%2Fa0029146).

---

<a name=bib-kruschkeBayesianNewStatistics2018></a>[Kruschke, J. K. and T. M. Liddell](#cite-kruschkeBayesianNewStatistics2018) (2018). "The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective". In: _Psychonomic Bulletin & Review_ 25.1, pp. 178-206. ISSN: 1531-5320. DOI: [10.3758/s13423-016-1221-4](https://doi.org/10.3758%2Fs13423-016-1221-4).

<a name=bib-lazarASAStatementPValues2016></a>[Wasserstein, R. L. and N. A. Lazar](#cite-lazarASAStatementPValues2016) (2016).
"The ASA's Statement on p-Values: Context, Process, and Purpose AU - Wasserstein, Ronald L." In: _The American Statistician_ 70.2, pp. 129-133. ISSN: 0003-1305. DOI: [10.1080/00031305.2016.1154108](https://doi.org/10.1080%2F00031305.2016.1154108). <a name=bib-leeBayesianCognitiveModeling2013></a>[Lee, M. D. and E. Wagenmakers](#cite-leeBayesianCognitiveModeling2013) (2013). _Bayesian Cognitive Modeling: A Practical Course_. OCLC: ocn861318341. Cambridge ; New York: Cambridge University Press. ISBN: 978-1-107-01845-7 978-1-107-60357-8.