R statistical software (R Core Team 2016) is designed for statistical computation and graphics and is popular among statisticians and scientific researchers. The software is free and open source, meaning anyone can download the software for personal use, and anyone can contribute packages to be used with the software. To make the model averaged posteriors plot accessible to as many users of model averaging as possible, we wrote the MApp package in the R programming language. MApp was developed to support work presented in (Banner and Higgs 2016), which discusses model averaging of regression coefficients for linear regression models. We provide a brief summary of model averaging in .
Model averaging is advertised as a tool for “accounting for model uncertainty”" in analyses. To conduct model averaging, a set of models is considered (\(\mathcal{M} = \{M_1, M_2, ..., M_J\}\), termed the model set). When considering model averaging in the linear regression framework, all first order combinations of the explanatory variables, or all-subsets, is a common model set to consider. For example, consider \(p\) potential explanatory variables, the all-subsets model set has size \(J = 2^p\) models, defined by models where each of the \(p\) variables is either in or out of a particular model.
Once the model set is defined, parameters of interest and posterior model probabilities, or weights based on information criteria such as AIC or BIC are calculated for all models in \(\mathcal{M}\). The probabilities, or weights, are used as weights in a weighted average to combine information about quantities of interest across all models considered. Banner and Higgs (2016) and Hoeting et al. (1999) provide more information about Bayesian model averaging, and Burnham and Anderson (2002) describe model averaging based on information criteria.
Model averaging was developed in the context of prediction and its use has recently been extended to partial regression coefficients. Inference drawn from the latter application of typically results in explanations of “overall effects” for each of the coefficients of interest, presenting a unique question; what does “overall effect” mean in the context of the research question? In the presence of multicollinearity, the interpretations of partial regression coefficients are not necessarily constant across models, making “overall effects” difficult to understand. Similarly, the posterior distributions for partial regression coefficients also take careful thought. Variables associated with partial regression coefficients can be in or out of a linear regression model. Excluding a variable from a linear regression model is equivalent to setting its associated partial regression coefficient to exactly zero. Therefore, the total probability of the posterior distribution of a model averaged partial regression coefficient is made up of two components, one coming from the models where the coefficient is set to zero (i.e., variable excluded from the model), the other coming from the models where the coefficient is estimated (i.e., variable included in the model).
The MApp package provides the user with graphical tools for understanding what goes into model averaged partial regression coefficients so their appropriateness may be assessed on a case-by-case basis. Banner and Higgs (2016) contains a detailed discussion of potential issues with, and considerations for assessing common inferences drawn from model averaging partial regression coefficients.
There are many approaches for conducting model averaging including: self programmed reversible jump Markov chain Monte Carlo (MCMC) samplers; implementations in WinBUGS (Lunn et al. 2000), described in Link and Barker (2006); and methods based on information criteria, described in Burnham and Anderson (2002). To accommodate as many users of model averaging as possible, we designed our plotting functions to work with posterior draws from existing software in R (R Core Team 2016) (such as BMS), self-coded reversible jump MCMC samplers, and AIC & BIC based model averaging.
MApp must be installed from GitHub or loaded from a local disk. The package, devtools (Wickham and Chang 2016) is necessary for both installations. To install devtools, run install.packages("devtools") in the console.
GitHub (https://github.com) is an online service for hosting Git repositories. Git (https://git-scm.com) is a version control software used primarily by developers for maintaining code. You do not need to be a Git user or have a registered GitHub account to use MApp, follow these two steps:
MApp from GitHub.# install the package
library(devtools)
install_github("kbanner14/MApp-Rpackage", subdir = "MApp")
# load the package
library(MApp)MApp unload the package with unload(inst("MApp")) and repeat step 1. Note that both functions, inst and unload, are in the devtools package.MApp uses functions from the following packages: LearnBayes (Albert 2014), beanplot (Kampstra 2008), magrittr (Bache and Wickham 2014), dplyr (Wickham and Francois 2015), and BMS (Feldkircher and Zeugner 2009). These packages will be loaded automatically with a call to install_github.
Another option for installing MApp is to load it from a local directory. Follow these four steps for this type of installation.
Download all files in the MApp-Rpackage/ repository from the Download Zip button on the right hand side of the screen here: https://github.com/kbanner14/MApp-Rpackage. Git users may also clone the repository.
MApp repository on your computer.Run the following lines of code to load the package and its documentation.
# load the pckage
devtools::load_all(".")
# load the R documentation files
devtools::document(".")LearnBayes (Albert 2014), beanplot (Kampstra 2008), magrittr (Bache and Wickham 2014), dplyr (Wickham and Francois 2015), and BMS (Feldkircher and Zeugner 2009). Use install.packages("packagename")MApp plotting functionsThe MApp package provides plotting functions to help researchers dissect and understand common inferences drawn from model averaging partial regression coefficients. There are five major plotting functions:
MApp_bms works with bma objects obtained from the bms function in the BMS package. Details about the bms function can be found in the BMS tutorial here: http://bms.zeugner.eu/tutorials/bms.pdf. This function returns the MAP plot and a table of posterior standard deviations.MApp_IC works with a data frame and conducts AIC, AICc, or BIC based model averaging for all-subsets regression. This function returns the MAP plot and Model averaged results.MApp_IC_large works with results from model averaging based on Information criteria. Requires the user to have the results first, but provides more flexibility than MApp_IC. This function returns the MAP plot.MApp_MCMC works with default output from the implementation of model averaging using the OpenBUGS or self programmed RJMCMC samplers.MApp_list works with draws from self programmed RJMCMC samplers that store results from each model in a list. This function returns the MAP plot and a table of posterior standard deviations.Documentation exists for all functions in MApp. Use the ? or help to view the documentation.
? MApp_bms MApp package databfat A dataset containing percent body fat measurements, and 13 other body size measurements for 251 men. These data were originally collected by Johnson (1996), and were used by Hoeting et al. (1999) in their hallmark paper, Bayesian Model avearging, A Tutoiral.brainData A dataset containing the log brain weight log(g), log body weight log(kg), log gestation length log(days), and litter size (# offspring) for 96 species of mammals. These data were modified from case0902 in the R package Sleuth2 (Ramsey et al. 2012).MApp_bms functionMApp_bms is designed to be used with the bms function in the BMS package (Feldkircher and Zeugner 2009). The bms function executes a popular implementation of model averaging by placing Zellner’s \(g\)-prior (Bernardo et al. 1980) on the regression coefficients, a user specified prior on the model set, and a user specified value for the hyperparameter, \(g\). This implementation results in analytical posterior distributions for the regression coefficients in the form of \(t\) distributions, allowing for quick computations. Detailed information about the \(g\)-prior implementation and its resulting posteriors is provided in Banner and Higgs (2016) (Supplement S1).
The function bms takes on a data frame and prior information and returns model averaged results (PMPs, and means and standard deviations of the model averaged distributions for each partial regression coefficient). The function also saves useful information about the models considered, namely the central moments for the posterior distributions of the partial regression coefficients. We use this information to obtain the exact \(t\) distributions for all partial regression coefficients in each model considered. Specifically, we use analytical results (Banner and Higgs (2016), Supplement S1). to find the appropriate degrees of freedom, and the central moments from bms to compute means and variances for each \(t\) distribution. MApp_bms uses this information to obtain a specified number of Monte Carlo draws from posterior distributions of the partial regression coefficients for each individual model. Then, using the posterior model probabilities, MApp_bms samples proportionally from all individual posteriors to create a model averaged posterior distribution for each partial regression coefficient of interest. Therefore, MApp_bms obtains draws from all distributions going into the model averaged result and displays this information using density and rug plots, or beanplots, from the beanplot package (Kampstra 2008) using a Gaussian smoothing kernel with default bandwidth nrd0. The bandwidth is set according to Silverman’s ‘rule of thumb’ (see the text Silverman B. W. (1986). Density Estimation, page 48, eqn (3.31)).
In this example we use brainData, which is comprised of average values of log gestation length log(days), litter size (number of offspring), log body size log(kg) and log brain weight log(g) for 96 species of mammals. The number of animals going into the average values for each species was not provided.
data(brainData)
head(brainData)We suppose the researchers are interested in the relationship between log brain weight and log gestation length for all 96 mammal species. This relationship can be addressed directly with a simple linear regression model regressing log brain weight on log gestation length. Now we assume the researchers are asked to use model averaging to “account for model uncertainty,” and we show how the MAP plot can be used to understand the results of model averaging the partial regression coefficient associated with log gestation length, and potentially defend the researchers’ choice to use simple linear regression to address their question of interest.
| lbrain | lbody | lgest | litter |
|---|---|---|---|
| 2.8622009 | 1.2527630 | 3.258097 | 1.0 |
| 1.2527630 | -0.0725707 | 3.526361 | 4.6 |
| 1.1474025 | -1.8971200 | 3.828641 | 3.0 |
| 0.1310283 | -3.0159350 | 3.931826 | 1.5 |
| 0.3148107 | -2.7488722 | 3.828641 | 1.5 |
We can use the bms function from the BMS package to obtain a bma object.
library(BMS)
brain <- bms(brainData, g = "UIP",
mprior = "uniform",
user.int = F)
class(brain)## [1] "bma"
We use bms to place Zellner’s \(g\)-prior on the coefficients, and we specify a discrete uniform prior on the \(2^3=\) 8 models in the all-subsets model set. MApp_bms takes on two arguments that do not have default values: x and plot_wind. For this example, we specify x = bma-object, and plot_wind = c(1,3), which tells the function to split the plotting window up into one row with three columns. We can add specific model names in place of the default \(M_1, M_2,..., M_8\) with the mod_names argument.
map_brain <- MApp_bms(x = brain, plot_wind = c(1,3),
mod_names = c("BGL", "BG", "BL", "B", "G", "GL", "L", "Null"))MAP plot for the partial regression coefficients associated with each explanatory variable in the brainData dataset
The model averaged posteriors plot displays both components of the model averaged posterior distribution. The continuous component is represented by the black density and rug plot (beanplot), and the zero component is represented by text providing the probability of the point mass at zero. The dashed line extending up from the beanplot shows the mean of the model averaged posterior distribution, which incorporates the zero component; the larger the point mass at zero, the more the line will be pulled towards zero.
The MAP plot has three panels, one for each of the explanatory variables considered (log gestation length (\(lgest\)), log body size (\(lbody\)), and litter size (\(litter\))). The posterior distributions of all partial regression coefficients for log body size, along with the posterior distribution of the model averaged partial regression coefficient associated with log body size (black beanplot and text) are shown in the first (leftmost) panel of Figure . In this case, all models excluding log body size had such small posterior model probabilities that their sum is arbitrarily close to zero. That is, the point mass at zero is so small that it is essentially negligible and the posterior distribution for the model averaged partial regression coefficient from \(\mathcal{M}\) (all models) is essentially equivalent to what would result from conditioning on the subset of models including log body size. So, for log body size, the continuous component of the posterior distribution of the model averaged partial regression coefficient gives essentially the same results as the model averaged posterior distribution considering both components.
To contrast this, we highlight the panel summarizing the partial regression coefficient for litter (far right). One of the models excluding litter (Model BG) has a relatively large posterior model probability of 0.343, and the other two models excluding litter (Models G and B) have negligible posterior model probabilities. Therefore, the zero-component of the posterior distribution of the model averaged partial regression coefficient associated with litter is the sum over the posterior model probabilities for models G, B and GB, which is approximately 0.343. The mean line is pulled away from the center of the continuous component (black beanplot) toward zero, showing the effect of the point mass at zero. Conditioning on models including litter, the model averaged posterior distribution is essentially a weighted average of the model with all three explanatory variables (with weight 0.569) and the model with log body and litter (with weight 0.088). Therefore, in this case we have one model contributing to the zero component, and two models contributing (very unevenly) to the continuous component of the model averaged distribution. This type of information is important for assessing the appropriateness of model averaged regression coefficient, and it is all easily discerned from the MAP plot.
A Row across panels displays the posterior distributions of partial regression coefficients associated with explanatory variables considered in a particular model; the model name is shown in the margin and its posterior model probability is shown on the \(y\)-axis. In this example, the first row is associated with the model containing only litter (Model L, second row from the top). The posterior model probability for this model is so small that rounded to three decimal places it is 0.000. The posterior distribution for the coefficient associated with \(litter\) in Model L is shown in the panel for \(litter\), and the panels for \(lgest\) and \(lbody\) do not display posterior distributions because those explanatory variables were set explicitly to zero in Model L. All other rows are read in the same fashion. Comparing rows facilitates comparisons of posteriors for coefficients and model probabilities across models. The rows are ordered by increasing posterior model probability to facilitate easy comparisons between posteriors for partial regression coefficients from models contributing most to the model averaged results and their corresponding model averaged posterior distribution (displayed in the bottom row).
The MAP plot also returns a table of posterior standard deviations for the model averaged result and the individual models displayed below.
map_brain| MA | MA_cts | BGL | BG | BL | B | G | GL | L | Null | |
|---|---|---|---|---|---|---|---|---|---|---|
| lbody | 0.0447 | 0.0447 | 0.0363 | 0.0362 | 0.0226 | 0.0215 | — | — | — | — |
| lgest | 0.2128 | 0.1726 | 0.1484 | 0.1228 | — | — | 0.1166 | 0.1755 | — | — |
| litter | 0.0703 | 0.0517 | 0.0475 | — | 0.038 | — | — | 0.0869 | 0.1077 | — |
| PMP | — | — | 0.569 | 0.3433 | 0.0877 | 0 | 0 | 0 | 0 | 0 |
A common motivation for model averaging is to account for model uncertainty, which is often equated to increased variances for quantities of interest. However, it is unclear whether this translation of model uncertainty into increased variances is realized for partial regression coefficients. We also include a table of posterior standard deviations with MAP to facilitate comparisons for particular examples.
We return to the original question of interest stated at the beginning of the example. The researchers were specifically interested in the relationship between mean brain weight and gestation length over all mammals (ignoring body size and litter size). The regression coefficient representing this relationship comes from the model with log gestation length as the only explanatory variable (Model G, third row in MAP plot above). We observe two things in the MAP plot above: (1) the model of interest has such a small posterior model probability that it is essentially ignored in the model averaging, and (2) the posterior distribution for the partial regression coefficient associated with log gestation length in the model of interest is clearly shifted to the right compared to the posterior distributions of the partial regression coefficients with large enough posterior model probabilities to contribute largely to the model averaging. That is, the posteriors contributing to the model averaged result for log gestation length and the posterior for the partial regression coefficient addressing the question of interest tell very different stories.
We easily see from the MAP plot, that the total posterior probability for models that exclude log gestation length is about 0.088. The component of the model averaged posterior distribution conditional on log gestation length being in the model (continuous component) is essentially a weighted average of the posterior distributions from models GBL and GB: \(p(\beta_{3.MA}|y, \mathcal{M}^*_L) = 0.569 p({\beta}_{3.gbl}|y) + 0.343 Pr(\beta_{3.bl}|y)\). Although we can see in the MAP plot above that the two posterior distributions contributing to the continuous component of the model averaged distribution are similar, the result is a mixture of two coefficients with different meanings: one conditional on log body size alone, and one conditional on both log body size and litter size. As we mentioned above, neither of these directly address the relationship of interest. We have seen researchers interpret model averaged coefficients as “overall effects” of their associated explanatory variables. However, this is technically incorrect and potentially very misleading. Not only is the interpretation difficult, but the model averaged distribution for the coefficient associated with log gestation length is essentially a weighted average of posterior distributions from only \(two\) of the \(eight\) models. While information like this is extremely important, it is not typically reported or discussed and the MAP plot tells a clear visual story to convey this information.
This example, though simple, illustrates the complexities of model averaging regression coefficients, and demonstrates model averaging leads to tough decisions of its own, just like defending the choice of one model. Such decisions should be accompanied by a thorough description of the main components going into the model averaging. Whether the researcher chooses to condition inference on one model or use an approach like model averaging, the decisions and justifications along the way should be driven by the research question and the intended use of the results. When a specific question is well defined, an appropriate way to address that question can be to build a regression model driven by the question itself. Building such a model requires careful thought effort to justify, making it tempting to elicit some sort of automatic approach to find the “best” model or combine results over multiple models. Something we often lose sight of when building models is that the “best” model, whether it be based on AIC, \(p\)-values or some other model selection criteria may not be consistent with addressing the original question. So, while model averaging may hold the allure of a decision-free model building technique (just use all of them), we must consider whether the result of the procedure leaves us with the ability to address the relationships we set out to investigate in the first place. We urge researchers to think hard about what research questions they are investigating and whether “accounting for model uncertainty” is worth the potential loss in interpretation and ability to address the question(s) of interest. We present the MAP plot as a way to help with this goal.
MApp_bms with Large Model SetsWe will use the bfat data which contain 13 body fat measurements from 251 men to demonstrate the flexibility of our plotting function for problems with a larger set of potential predictors.
First, we will take a look at these data and create a bma object, bfat_bms, using the bms function (again, we chose a uniform prior on the model space and Zellner’s \(g\)-prior Zellner (1984) with \(g = n\)).
data(bfat)
head(bfat)| bf.brozek | age | weight | height | neck | chest | abdomen | hip | thigh | knee | ankle | ex.bicep | forearm | wrist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12.6 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59.0 | 37.3 | 21.9 | 32.0 | 27.4 | 17.1 |
| 6.9 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83.0 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 |
| 24.6 | 22 | 154.00 | 66.25 | 34.0 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24.0 | 28.8 | 25.2 | 16.6 |
| 10.9 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 |
| 27.8 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100.0 | 101.9 | 63.2 | 42.2 | 24.0 | 32.2 | 27.7 | 17.7 |
# create bma object using bfat data
bfat_bms <- bms(bfat, mprior = "uniform", g = "UIP", user.int = F)If we consider all potential first-order combinations of the predictors, there are \(2^{13} =\) 8192 potential models in the model set. By default a bma object generated by bms will store weights and posterior summary measures for the top 500 models considered. Using the closed form posterior results for model averaging with a \(g\)-prior on the regression coefficients, MApp_bms will obtain a specified number of draws from the analytical posterior distributions for all models stored in the bma object and will create a model averaged distribution by sampling from these individual posteriors with normalized probabilities to retain proportionality from model averaging including all models. The proportion of posterior model mass accounted for by the top 500 models should be checked to ensure the approximate model averaged distribution created by MApp_bms is reasonable. In this case, the top 500 models accounted for approximately 97% of the posterior mass and we can be confident that the approximate model averaged distributions will be very similar to the model averaged distributions when all models are considered.
# see how much of the posterior model mass is accounted for by the top 500
# models
sum(pmp.bma(bfat_bms)[,1])## [1] 0.9749476
Because plotting 500 models simultaneously would make MAP plot impossible to read, we can tell MApp_bms to display a subset of those models using the max_display argument. We can also specify which coefficient to include in the model averaged posteriors plot with the include_coef argument. If max_display and include_coef are left unspecified, the function will display the 20 models with the largest PMP.
map_bfat <- MApp_bms(bfat_bms, plot_wind = c(1,3),
include_coef = c(13, 2, 4),
max_display = 10,
num_sims = 1000)map_bfat| MA | MA_cts | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wrist | 0.6694 | 0.4594 | 0.4128 | 0.4172 | 0.397 | — | 0.4367 | 0.4729 | 0.4219 | — | 0.4244 | 0.4195 |
| weight | 0.0407 | 0.0332 | 0.0234 | 0.0219 | 0.025 | 0.0194 | 0.0236 | 0.0283 | 0.0255 | 0.0223 | 0.0293 | 0.0246 |
| neck | 0.1887 | 0.2194 | — | — | — | — | 0.2085 | — | — | 0.1899 | — | — |
| PMP | — | — | 0.1636 | 0.0788 | 0.0463 | 0.0376 | 0.034 | 0.0246 | 0.0235 | 0.0206 | 0.0195 | 0.0164 |
It is also common to obtain approximate posterior model probabilities (model weights) from certain information criterion (most commonly: BIC, AICc or AIC). A version of the MAP plot can be created for comparing confidence intervals for partial regression coefficients across the models in the model set.
MApp_ICMApp_IC generates results from approximations to posterior model probabilities (PMPs) based on two user specifications: (1) the type of information criterion, and (2) the method for model averaging (conditional on models allowing the coefficient to be estimated vs. using all models in the model set). The default for MApp_IC conducts BIC based model averaging (type = "BIC") and considers all models in the model set (w_plus = FALSE). Here, we make the MAP plot for the brainData with the default specifications.
## Subset of models used in MA: All models in the model set
## Type of information criteria used in MAP plot: BIC
| Model | AICc_pmp | BIC_pmp | AICc | BIC |
|---|---|---|---|---|
| MA_AICc | NA | NA | NA | NA |
| MA_BIC | NA | NA | NA | NA |
| M1 | 0.895 | 0.368 | 136.310 | 148.465 |
| M2 | 0.088 | 0.318 | 140.947 | 150.765 |
| M3 | 0.017 | 0.259 | 144.271 | 154.089 |
| M4 | 0.000 | 0.055 | 171.447 | 178.880 |
| M5 | 0.000 | 0.000 | 274.682 | 282.114 |
| M6 | 0.000 | 0.000 | 275.600 | 285.418 |
| M7 | 0.000 | 0.000 | 381.491 | 388.924 |
| M8 | 0.000 | 0.000 | 424.408 | 429.407 |
| Model | Est_Int | Est_lbody | Est_lgest | Est_litter |
|---|---|---|---|---|
| MA_AICc | 0.746 | 0.574 | 0.452 | -0.102 |
| MA_BIC | 1.039 | 0.597 | 0.375 | -0.092 |
| M1 | 0.823 | 0.575 | 0.440 | -0.110 |
| M2 | -0.457 | 0.551 | 0.668 | 0.000 |
| M3 | 2.917 | 0.658 | 0.000 | -0.197 |
| M4 | 2.332 | 0.719 | 0.000 | 0.000 |
| M5 | -6.665 | 0.000 | 2.234 | 0.000 |
| M6 | -7.528 | 0.000 | 2.371 | 0.094 |
| M7 | 5.622 | 0.000 | 0.000 | -0.761 |
| M8 | 3.865 | 0.000 | 0.000 | 0.000 |
| Model | SE_Int | SE_lbody | SE_lgest | SE_litter |
|---|---|---|---|---|
| MA_AICc | 0.746 | 0.034 | 0.152 | 0.049 |
| MA_BIC | 1.314 | 0.056 | 0.273 | 0.080 |
| M1 | 0.662 | 0.033 | 0.137 | 0.042 |
| M2 | 0.458 | 0.032 | 0.109 | 0.000 |
| M3 | 0.119 | 0.020 | 0.000 | 0.034 |
| M4 | 0.073 | 0.020 | 0.000 | 0.000 |
| M5 | 0.562 | 0.000 | 0.117 | 0.000 |
| M6 | 0.960 | 0.000 | 0.170 | 0.084 |
| M7 | 0.293 | 0.000 | 0.000 | 0.101 |
| M8 | 0.222 | 0.000 | 0.000 | 0.000 |
The function returns messages providing the type of model averaging used and other useful output, including AICc based MA results.
Sometimes researchers choose to condition their results on the subset of models including the variables of interest (i.e., models where associated coefficients are not set to 0), described in Burnham and Anderson (2002). Setting the argument w_plus = TRUE tells MAPP_IC to conduct this type of model averaging.
MApp_IC(brainData, plot_wind = c(1,3),
type = "BIC", w_plus = TRUE)## Subset of models used in MA: Models where coefficient is not set to 0
## Type of information criteria used in MAP plot: BIC
The message now indicates the alternative type of model averaging was conducted.
This function works best with modest numbers of explanatory variables because it uses bms to define an all-subsets model set. When 9 explanatory variables are considered, the all-subsets model set contains 512 models, and by default bms will only store results from 500 models. Therefore, if this function is used with data containing 9 or more explanatory variables, it is important for the user to make sure the sum of the PMPs for the top 500 models (estimated using bms) is large enough so conditioning on those 500 models is appropriate for IC-based model averaging. A more flexible, but less automatic function included in this package is MApp_IC_gen (see Section ).
bfat example to show the flexibility of the function and max_display option?Results from IC based model averaging can be used as input arguments to MApp_IC_gen to create the MAP plot. This function requires the user to input four arguments: (1) a matrix of maximum likelihood estimates of the coefficients of interest for each model considered, (2) a matrix of standard errors for those estimates, (3) a vector (length = \(J\)) of model weights, and (4) a matrix of zeroes and ones corresponding to the variables included in each model (all matrices will be \(J\) rows and \(p\) columns). The user can choose not to include all \(J\) models by using a subset of the models in the four inputs.
We use the braindata to demonstrate this function.
post_means <- t(brain$topmod$betas())
post_means[post_means != 0] <- 1
inmat <- post_means
IC_approx <- approx_pmp(inmat, Xmat = brainData[,-1], Yvec = brainData[,1])## Subset of models used in MA: All models in the model set
knitr::kable(IC_approx[,c(1:5)], digits = 3)| Model | AICc_pmp | BIC_pmp | AICc | BIC |
|---|---|---|---|---|
| MA_AICc | NA | NA | NA | NA |
| MA_BIC | NA | NA | NA | NA |
| M1 | 0.895 | 0.368 | 136.310 | 148.465 |
| M2 | 0.088 | 0.318 | 140.947 | 150.765 |
| M3 | 0.017 | 0.259 | 144.271 | 154.089 |
| M4 | 0.000 | 0.055 | 171.447 | 178.880 |
| M5 | 0.000 | 0.000 | 274.682 | 282.114 |
| M6 | 0.000 | 0.000 | 275.600 | 285.418 |
| M7 | 0.000 | 0.000 | 381.491 | 388.924 |
| M8 | 0.000 | 0.000 | 424.408 | 429.407 |
knitr::kable(IC_approx[,c(1,6:9)], digits = 3)| Model | Est_Int | Est_lbody | Est_lgest | Est_litter |
|---|---|---|---|---|
| MA_AICc | 0.746 | 0.574 | 0.452 | -0.102 |
| MA_BIC | 1.039 | 0.597 | 0.375 | -0.092 |
| M1 | 0.823 | 0.575 | 0.440 | -0.110 |
| M2 | -0.457 | 0.551 | 0.668 | 0.000 |
| M3 | 2.917 | 0.658 | 0.000 | -0.197 |
| M4 | 2.332 | 0.719 | 0.000 | 0.000 |
| M5 | -6.665 | 0.000 | 2.234 | 0.000 |
| M6 | -7.528 | 0.000 | 2.371 | 0.094 |
| M7 | 5.622 | 0.000 | 0.000 | -0.761 |
| M8 | 3.865 | 0.000 | 0.000 | 0.000 |
knitr::kable(IC_approx[,c(1,10:13)], digits = 3)| Model | SE_Int | SE_lbody | SE_lgest | SE_litter |
|---|---|---|---|---|
| MA_AICc | 0.746 | 0.034 | 0.152 | 0.049 |
| MA_BIC | 1.314 | 0.056 | 0.273 | 0.080 |
| M1 | 0.662 | 0.033 | 0.137 | 0.042 |
| M2 | 0.458 | 0.032 | 0.109 | 0.000 |
| M3 | 0.119 | 0.020 | 0.000 | 0.034 |
| M4 | 0.073 | 0.020 | 0.000 | 0.000 |
| M5 | 0.562 | 0.000 | 0.117 | 0.000 |
| M6 | 0.960 | 0.000 | 0.170 | 0.084 |
| M7 | 0.293 | 0.000 | 0.000 | 0.101 |
| M8 | 0.222 | 0.000 | 0.000 | 0.000 |
Use MApp_IC_gen
# BIC weights
x_coef <- IC_approx[c(2:10), c(7:9)]
x_se <- IC_approx[c(2:10), c(11:13)]
pmp <- as.numeric(as.character(IC_approx[c(3:10),2]))
MApp_IC_gen(x_coef = x_coef, x_se = x_se, pmp = pmp, inmat = inmat)Another way to conduct Bayesian model averaging is to use BUGS, OpenBUGS, or a self-programmed reversible jump Gibbs sampler. In this case, the user will have posterior draws for partial regression coefficients estimated by each of the models \(\mathcal{M}\). Chains of posterior draws are usually stored in a list (length usually equal to the cardinality of the model set, unless certain models are never visited), or in a matrix with posterior draws for quantities of interest (including model) as variable in the matrix. If draws come in the form of a matrix, the MAP plot can be created directly with MApp_MCMC.
MApp_MCMCHere we show how the MAP plot can be made using results similar to the posterior draws that would be obtained from implementing model averaging with WinBUGS (Lunn et al. 2000) or R2OpenBugs (Sturtz, Ligges, and Gelman 2005). Typically draws are extracted using the coda (Plummer et al. 2006) and they will have a column for model and columns for the estimates of partial regression coefficients. Here, we use fake data considering 3 variables and 7 models.
The data are of the form:
head(mcmc_mat)| model | X1 | X2 | X3 | |
|---|---|---|---|---|
| 282 | M3 | -0.5044108 | 0.000000 | 6.518147 |
| 200 | M2 | -1.8688975 | 3.270894 | 0.000000 |
| 453 | M5 | -1.4741605 | 0.000000 | 0.000000 |
| 608 | M7 | 0.0000000 | 0.000000 | 5.967435 |
| 285 | M3 | 0.1372812 | 0.000000 | 6.038164 |
| 611 | M7 | 0.0000000 | 0.000000 | 6.065165 |
Create the MAP plot using MApp_MCMC
MApp_MCMC(mcmc_mat, plot_wind = c(1,3))## Table of posterior standard deviations:
## compare individual models to MA posterior.
##
##
## (press enter to display table)
## MA MA_cts M3 M2 M5 M7 M1
## X1 "0.8793" "0.9518" "1.0079" "0.9991" "0.9418" "---" "0.8651"
## X2 "1.6736" "1.0084" "---" "1.0032" "---" "---" "1.0294"
## X3 "3.1055" "0.9944" "1.0734" "---" "---" "0.924" "1.116"
## PMP "---" "---" "0.1429" "0.1429" "0.1429" "0.1429" "0.1429"
## M6 M4
## X1 "---" "---"
## X2 "1.0503" "0.9565"
## X3 "---" "0.8541"
## PMP "0.1429" "0.1429"
Another option in max_display (common to all Bayesian model averaging MAP plot functions) is "common3", which displays the top PMP model, the model including all explanatory variables, and the model averaged result.
MApp_MCMC(mcmc_mat, plot_wind = c(1,3), max_display = "common3")## Table of posterior standard deviations:
## compare individual models to MA posterior.
##
##
## (press enter to display table)
## MA MA_cts M3 M1
## X1 "0.8793" "0.9518" "1.0079" "0.8651"
## X2 "1.6736" "1.0084" "---" "1.0294"
## X3 "3.1055" "0.9944" "1.0734" "1.116"
## PMP "---" "---" "0.1429" "0.1429"
In this example, the posterior model probabilities are all equal, resulting in equivalent posterior exclusion probabilities (point masses at zero) for the partial regression coefficients associated with each explanatory variable. The overall mean for the model averaged posterior distribution gets pulled toward zero in all three panels, but we see the most “shrinkage” in the panel for \(X_3\). This is a result of the individual model posteriors for the coefficients associated with \(X_3\) being distributed similarly; specifically being centered away from zero with relatively small variance.
The table of posterior standard deviations shows how a relatively large posterior exclusion probability can affect the posterior variance of the model averaged distribution. We observe that the posterior variance of the continuous component of the model averaged distribution is similar to the posterior variances from the corresponding individual models. When considering the zero component, the model averaged posterior distribution has larger variance for the partial regression coefficients associated with \(X_2\) and \(X_3\), and smaller variance for the partial regression coefficients associated with \(X_1\). This is a result of the location of the continuous component.
MApp_listAnother common format for posterior draws is a list, in which case, MApp_list can be used. For this example, the posterior draws from the model with the highest PMP are in the first element of the list, and posterior draws from the model with the smallest PMP are in the last element of the list.
MApp_list(mcmc_list, plot_wind = c(1,5), max_display = 10)## Table of posterior standard deviations:
## compare individual models to the MA posterior.
##
##
## (press enter to display table)
## MA MA_cts M1 M2 M3 M4 M5
## X1 "0.0397" "0.1014" "---" "---" "0.101" "---" "---"
## X2 "0.037" "0.1025" "---" "---" "---" "0.094" "---"
## X3 "0.1674" "0.1176" "0.1085" "---" "0.1128" "0.1118" "0.1685"
## X4 "0.1169" "0.1099" "0.1081" "0.1093" "0.1135" "0.1079" "0.115"
## X5 "0.1112" "0.1932" "---" "0.1133" "---" "---" "0.1814"
## PMP "---" "---" "0.5636" "0.0875" "0.0835" "0.0746" "0.0726"
## M6 M7 M8 M9 M10
## X1 "---" "0.1142" "---" "0.1023" "0.0867"
## X2 "---" "---" "0.0889" "0.0991" "---"
## X3 "---" "---" "---" "0.0998" "0.1789"
## X4 "0.1016" "0.1149" "0.138" "0.1281" "0.1249"
## X5 "---" "0.1031" "0.0769" "---" "0.1857"
## PMP "0.0378" "0.0129" "0.0119" "0.0109" "0.0109"
Unlike the MApp plot, the number of draws from each individual model is proportional to the posterior model probability, and some beanplots may be drawn with very few observations.
MAppThis chapter highlighted the plotting functions in MApp, however, many of the other functions in this package are useful for studying properties of model averaged posteriors for partial regression coefficients under the \(g\)-prior implementation of Bayesian model averaging. Use help(package = "MApp") to see a list of all functions in the package and their descriptions (Figure ).
Albert, Jim. 2014. LearnBayes: Functions for Learning Bayesian Inference. http://CRAN.R-project.org/package=LearnBayes.
Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.
Bernardo, J. M., M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, eds. 1980. Posterior Odds Ratios for Selected Regression Hypotheses. Bayesian Statistics Proceedings of the First Valencia International Meeting Held in Valencia (Spain); Valencia University Press.
Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference a Practical Information-Theoretic Approach. Springer.
Feldkircher, Martin, and Stefan Zeugner. 2009. “Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging.” International Monetary Fund (IMF) Working Paper, 1–39.
Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14 (4): 382–401.
Johnson, Roger W. 1996. “Fitting Percentage of Body Fat to Simple Body Measurements.” Journal of Statistics Education 4 (1). http://www.amstat.org/publications/jse/v4n1/datasets.johnson.html.
Kampstra, Peter. 2008. “Beanplot: A Boxplot Alternative for Visual Comparison of Distributions.” Journal of Statistical Software, Code Snippets 28 (1): 1–9. http://www.jstatsoft.org/v28/c01/.
Link, William A., and Richard J. Barker. 2006. “Model weights and the foundations of multimodel inference.” Ecology 87 (10): 2626–35. http://www.ncbi.nlm.nih.gov/pubmed/17089670.
Lunn, D J, A Thomas, N Best, and D Spiegelhalter. 2000. “WinBUGS – a Bayesian Modelling Framework: Concepts, Structure, and Extensibility.” Statistics and Computing 10: 325–37.
Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. http://CRAN.R-project.org/doc/Rnews/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ramsey, F.L., D.W. Schafer, Daniel W. Schafer, Jeannie Sifneos, and Berwin A. Turlach. 2012. Sleuth2: Data Sets from Ramsey and Shafer’s “Statistical Sleuth (2nd Ed)". R package version 1.0-7.
Sturtz, Sibylle, Uwe Ligges, and Andrew Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12 (3): 1–16. http://www.jstatsoft.org.
Wickham, Hadley, and Winston Chang. 2016. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wickham, Hadley, and Romain Francois. 2015. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Zellner, A. 1984. Posterior Odds Ratios for Regression Hypotheses: General Considerations and Some Specific Results. Basic Issues in Econometrics. Univ. Chicago Press.