MApp: Model Averaged posteriors plot

Katharine Banner

2016-08-19

R statistical software (R Core Team 2016) is designed for statistical computation and graphics and is popular among statisticians and scientific researchers. The software is free and open source, meaning anyone can download the software for personal use, and anyone can contribute packages to be used with the software. To make the model averaged posteriors plot accessible to as many users of model averaging as possible, we wrote the MApp package in the R programming language. MApp was developed to support work presented in (Banner and Higgs 2016), which discusses model averaging of regression coefficients for linear regression models. We provide a brief summary of model averaging in .

Motivation for the Package

Model averaging is advertised as a tool for “accounting for model uncertainty”" in analyses. To conduct model averaging, a set of models is considered (\(\mathcal{M} = \{M_1, M_2, ..., M_J\}\), termed the model set). When considering model averaging in the linear regression framework, all first order combinations of the explanatory variables, or all-subsets, is a common model set to consider. For example, consider \(p\) potential explanatory variables, the all-subsets model set has size \(J = 2^p\) models, defined by models where each of the \(p\) variables is either in or out of a particular model.

Once the model set is defined, parameters of interest and posterior model probabilities, or weights based on information criteria such as AIC or BIC are calculated for all models in \(\mathcal{M}\). The probabilities, or weights, are used as weights in a weighted average to combine information about quantities of interest across all models considered. Banner and Higgs (2016) and Hoeting et al. (1999) provide more information about Bayesian model averaging, and Burnham and Anderson (2002) describe model averaging based on information criteria.

Model Averaging of Partial Regression Coefficients

Model averaging was developed in the context of prediction and its use has recently been extended to partial regression coefficients. Inference drawn from the latter application of typically results in explanations of “overall effects” for each of the coefficients of interest, presenting a unique question; what does “overall effect” mean in the context of the research question? In the presence of multicollinearity, the interpretations of partial regression coefficients are not necessarily constant across models, making “overall effects” difficult to understand. Similarly, the posterior distributions for partial regression coefficients also take careful thought. Variables associated with partial regression coefficients can be in or out of a linear regression model. Excluding a variable from a linear regression model is equivalent to setting its associated partial regression coefficient to exactly zero. Therefore, the total probability of the posterior distribution of a model averaged partial regression coefficient is made up of two components, one coming from the models where the coefficient is set to zero (i.e., variable excluded from the model), the other coming from the models where the coefficient is estimated (i.e., variable included in the model).

The MApp package provides the user with graphical tools for understanding what goes into model averaged partial regression coefficients so their appropriateness may be assessed on a case-by-case basis. Banner and Higgs (2016) contains a detailed discussion of potential issues with, and considerations for assessing common inferences drawn from model averaging partial regression coefficients.

There are many approaches for conducting model averaging including: self programmed reversible jump Markov chain Monte Carlo (MCMC) samplers; implementations in WinBUGS (Lunn et al. 2000), described in Link and Barker (2006); and methods based on information criteria, described in Burnham and Anderson (2002). To accommodate as many users of model averaging as possible, we designed our plotting functions to work with posterior draws from existing software in R (R Core Team 2016) (such as BMS), self-coded reversible jump MCMC samplers, and AIC & BIC based model averaging.

Installing MApp

MApp must be installed from GitHub or loaded from a local disk. The package, devtools (Wickham and Chang 2016) is necessary for both installations. To install devtools, run install.packages("devtools") in the console.

GitHub Install

GitHub (https://github.com) is an online service for hosting Git repositories. Git (https://git-scm.com) is a version control software used primarily by developers for maintaining code. You do not need to be a Git user or have a registered GitHub account to use MApp, follow these two steps:

  1. Install Version 0.1 of MApp from GitHub.
# install the package
library(devtools)
install_github("kbanner14/MApp-Rpackage", subdir = "MApp")
# load the package
library(MApp)
  1. To update MApp unload the package with unload(inst("MApp")) and repeat step 1. Note that both functions, inst and unload, are in the devtools package.

MApp uses functions from the following packages: LearnBayes (Albert 2014), beanplot (Kampstra 2008), magrittr (Bache and Wickham 2014), dplyr (Wickham and Francois 2015), and BMS (Feldkircher and Zeugner 2009). These packages will be loaded automatically with a call to install_github.

Local Disk Install

Another option for installing MApp is to load it from a local directory. Follow these four steps for this type of installation.

  1. Download all files in the MApp-Rpackage/ repository from the Download Zip button on the right hand side of the screen here: https://github.com/kbanner14/MApp-Rpackage. Git users may also clone the repository.

  2. Set your working directory to the location of the MApp repository on your computer.
  3. Run the following lines of code to load the package and its documentation.

# load the pckage
devtools::load_all(".")

# load the R documentation files
devtools::document(".")
  1. Install package dependencies: LearnBayes (Albert 2014), beanplot (Kampstra 2008), magrittr (Bache and Wickham 2014), dplyr (Wickham and Francois 2015), and BMS (Feldkircher and Zeugner 2009). Use install.packages("packagename")

MApp plotting functions

The MApp package provides plotting functions to help researchers dissect and understand common inferences drawn from model averaging partial regression coefficients. There are five major plotting functions:

Documentation exists for all functions in MApp. Use the ? or help to view the documentation.

? MApp_bms 

Example of the R documentation for MApp_bms.

MApp package data

The MApp_bms function

MApp_bms is designed to be used with the bms function in the BMS package (Feldkircher and Zeugner 2009). The bms function executes a popular implementation of model averaging by placing Zellner’s \(g\)-prior (Bernardo et al. 1980) on the regression coefficients, a user specified prior on the model set, and a user specified value for the hyperparameter, \(g\). This implementation results in analytical posterior distributions for the regression coefficients in the form of \(t\) distributions, allowing for quick computations. Detailed information about the \(g\)-prior implementation and its resulting posteriors is provided in Banner and Higgs (2016) (Supplement S1).

The function bms takes on a data frame and prior information and returns model averaged results (PMPs, and means and standard deviations of the model averaged distributions for each partial regression coefficient). The function also saves useful information about the models considered, namely the central moments for the posterior distributions of the partial regression coefficients. We use this information to obtain the exact \(t\) distributions for all partial regression coefficients in each model considered. Specifically, we use analytical results (Banner and Higgs (2016), Supplement S1). to find the appropriate degrees of freedom, and the central moments from bms to compute means and variances for each \(t\) distribution. MApp_bms uses this information to obtain a specified number of Monte Carlo draws from posterior distributions of the partial regression coefficients for each individual model. Then, using the posterior model probabilities, MApp_bms samples proportionally from all individual posteriors to create a model averaged posterior distribution for each partial regression coefficient of interest. Therefore, MApp_bms obtains draws from all distributions going into the model averaged result and displays this information using density and rug plots, or beanplots, from the beanplot package (Kampstra 2008) using a Gaussian smoothing kernel with default bandwidth nrd0. The bandwidth is set according to Silverman’s ‘rule of thumb’ (see the text Silverman B. W. (1986). Density Estimation, page 48, eqn (3.31)).

Example 1: A small model set

In this example we use brainData, which is comprised of average values of log gestation length log(days), litter size (number of offspring), log body size log(kg) and log brain weight log(g) for 96 species of mammals. The number of animals going into the average values for each species was not provided.

data(brainData)
head(brainData)

We suppose the researchers are interested in the relationship between log brain weight and log gestation length for all 96 mammal species. This relationship can be addressed directly with a simple linear regression model regressing log brain weight on log gestation length. Now we assume the researchers are asked to use model averaging to “account for model uncertainty,” and we show how the MAP plot can be used to understand the results of model averaging the partial regression coefficient associated with log gestation length, and potentially defend the researchers’ choice to use simple linear regression to address their question of interest.

lbrain lbody lgest litter
2.8622009 1.2527630 3.258097 1.0
1.2527630 -0.0725707 3.526361 4.6
1.1474025 -1.8971200 3.828641 3.0
0.1310283 -3.0159350 3.931826 1.5
0.3148107 -2.7488722 3.828641 1.5

We can use the bms function from the BMS package to obtain a bma object.

library(BMS)
brain <- bms(brainData, g = "UIP", 
                    mprior = "uniform", 
                    user.int = F)
class(brain)
## [1] "bma"

We use bms to place Zellner’s \(g\)-prior on the coefficients, and we specify a discrete uniform prior on the \(2^3=\) 8 models in the all-subsets model set. MApp_bms takes on two arguments that do not have default values: x and plot_wind. For this example, we specify x = bma-object, and plot_wind = c(1,3), which tells the function to split the plotting window up into one row with three columns. We can add specific model names in place of the default \(M_1, M_2,..., M_8\) with the mod_names argument.

map_brain <- MApp_bms(x = brain, plot_wind = c(1,3),
                      mod_names = c("BGL", "BG", "BL", "B", "G", "GL", "L", "Null"))
MAP plot for the partial regression coefficients associated with each explanatory variable in the brainData dataset

MAP plot for the partial regression coefficients associated with each explanatory variable in the brainData dataset

The model averaged posteriors plot displays both components of the model averaged posterior distribution. The continuous component is represented by the black density and rug plot (beanplot), and the zero component is represented by text providing the probability of the point mass at zero. The dashed line extending up from the beanplot shows the mean of the model averaged posterior distribution, which incorporates the zero component; the larger the point mass at zero, the more the line will be pulled towards zero.

Reading the MAP plot

The MAP plot has three panels, one for each of the explanatory variables considered (log gestation length (\(lgest\)), log body size (\(lbody\)), and litter size (\(litter\))). The posterior distributions of all partial regression coefficients for log body size, along with the posterior distribution of the model averaged partial regression coefficient associated with log body size (black beanplot and text) are shown in the first (leftmost) panel of Figure . In this case, all models excluding log body size had such small posterior model probabilities that their sum is arbitrarily close to zero. That is, the point mass at zero is so small that it is essentially negligible and the posterior distribution for the model averaged partial regression coefficient from \(\mathcal{M}\) (all models) is essentially equivalent to what would result from conditioning on the subset of models including log body size. So, for log body size, the continuous component of the posterior distribution of the model averaged partial regression coefficient gives essentially the same results as the model averaged posterior distribution considering both components.

To contrast this, we highlight the panel summarizing the partial regression coefficient for litter (far right). One of the models excluding litter (Model BG) has a relatively large posterior model probability of 0.343, and the other two models excluding litter (Models G and B) have negligible posterior model probabilities. Therefore, the zero-component of the posterior distribution of the model averaged partial regression coefficient associated with litter is the sum over the posterior model probabilities for models G, B and GB, which is approximately 0.343. The mean line is pulled away from the center of the continuous component (black beanplot) toward zero, showing the effect of the point mass at zero. Conditioning on models including litter, the model averaged posterior distribution is essentially a weighted average of the model with all three explanatory variables (with weight 0.569) and the model with log body and litter (with weight 0.088). Therefore, in this case we have one model contributing to the zero component, and two models contributing (very unevenly) to the continuous component of the model averaged distribution. This type of information is important for assessing the appropriateness of model averaged regression coefficient, and it is all easily discerned from the MAP plot.

A Row across panels displays the posterior distributions of partial regression coefficients associated with explanatory variables considered in a particular model; the model name is shown in the margin and its posterior model probability is shown on the \(y\)-axis. In this example, the first row is associated with the model containing only litter (Model L, second row from the top). The posterior model probability for this model is so small that rounded to three decimal places it is 0.000. The posterior distribution for the coefficient associated with \(litter\) in Model L is shown in the panel for \(litter\), and the panels for \(lgest\) and \(lbody\) do not display posterior distributions because those explanatory variables were set explicitly to zero in Model L. All other rows are read in the same fashion. Comparing rows facilitates comparisons of posteriors for coefficients and model probabilities across models. The rows are ordered by increasing posterior model probability to facilitate easy comparisons between posteriors for partial regression coefficients from models contributing most to the model averaged results and their corresponding model averaged posterior distribution (displayed in the bottom row).

The MAP plot also returns a table of posterior standard deviations for the model averaged result and the individual models displayed below.

map_brain
MA MA_cts BGL BG BL B G GL L Null
lbody 0.0447 0.0447 0.0363 0.0362 0.0226 0.0215
lgest 0.2128 0.1726 0.1484 0.1228 0.1166 0.1755
litter 0.0703 0.0517 0.0475 0.038 0.0869 0.1077
PMP 0.569 0.3433 0.0877 0 0 0 0 0

A common motivation for model averaging is to account for model uncertainty, which is often equated to increased variances for quantities of interest. However, it is unclear whether this translation of model uncertainty into increased variances is realized for partial regression coefficients. We also include a table of posterior standard deviations with MAP to facilitate comparisons for particular examples.

Using the MAP Plot for Justifying Inference

We return to the original question of interest stated at the beginning of the example. The researchers were specifically interested in the relationship between mean brain weight and gestation length over all mammals (ignoring body size and litter size). The regression coefficient representing this relationship comes from the model with log gestation length as the only explanatory variable (Model G, third row in MAP plot above). We observe two things in the MAP plot above: (1) the model of interest has such a small posterior model probability that it is essentially ignored in the model averaging, and (2) the posterior distribution for the partial regression coefficient associated with log gestation length in the model of interest is clearly shifted to the right compared to the posterior distributions of the partial regression coefficients with large enough posterior model probabilities to contribute largely to the model averaging. That is, the posteriors contributing to the model averaged result for log gestation length and the posterior for the partial regression coefficient addressing the question of interest tell very different stories.

We easily see from the MAP plot, that the total posterior probability for models that exclude log gestation length is about 0.088. The component of the model averaged posterior distribution conditional on log gestation length being in the model (continuous component) is essentially a weighted average of the posterior distributions from models GBL and GB: \(p(\beta_{3.MA}|y, \mathcal{M}^*_L) = 0.569 p({\beta}_{3.gbl}|y) + 0.343 Pr(\beta_{3.bl}|y)\). Although we can see in the MAP plot above that the two posterior distributions contributing to the continuous component of the model averaged distribution are similar, the result is a mixture of two coefficients with different meanings: one conditional on log body size alone, and one conditional on both log body size and litter size. As we mentioned above, neither of these directly address the relationship of interest. We have seen researchers interpret model averaged coefficients as “overall effects” of their associated explanatory variables. However, this is technically incorrect and potentially very misleading. Not only is the interpretation difficult, but the model averaged distribution for the coefficient associated with log gestation length is essentially a weighted average of posterior distributions from only \(two\) of the \(eight\) models. While information like this is extremely important, it is not typically reported or discussed and the MAP plot tells a clear visual story to convey this information.

This example, though simple, illustrates the complexities of model averaging regression coefficients, and demonstrates model averaging leads to tough decisions of its own, just like defending the choice of one model. Such decisions should be accompanied by a thorough description of the main components going into the model averaging. Whether the researcher chooses to condition inference on one model or use an approach like model averaging, the decisions and justifications along the way should be driven by the research question and the intended use of the results. When a specific question is well defined, an appropriate way to address that question can be to build a regression model driven by the question itself. Building such a model requires careful thought effort to justify, making it tempting to elicit some sort of automatic approach to find the “best” model or combine results over multiple models. Something we often lose sight of when building models is that the “best” model, whether it be based on AIC, \(p\)-values or some other model selection criteria may not be consistent with addressing the original question. So, while model averaging may hold the allure of a decision-free model building technique (just use all of them), we must consider whether the result of the procedure leaves us with the ability to address the relationships we set out to investigate in the first place. We urge researchers to think hard about what research questions they are investigating and whether “accounting for model uncertainty” is worth the potential loss in interpretation and ability to address the question(s) of interest. We present the MAP plot as a way to help with this goal.

Example 2: MApp_bms with Large Model Sets

We will use the bfat data which contain 13 body fat measurements from 251 men to demonstrate the flexibility of our plotting function for problems with a larger set of potential predictors.

First, we will take a look at these data and create a bma object, bfat_bms, using the bms function (again, we chose a uniform prior on the model space and Zellner’s \(g\)-prior Zellner (1984) with \(g = n\)).

data(bfat)
head(bfat)
bf.brozek age weight height neck chest abdomen hip thigh knee ankle ex.bicep forearm wrist
12.6 23 154.25 67.75 36.2 93.1 85.2 94.5 59.0 37.3 21.9 32.0 27.4 17.1
6.9 22 173.25 72.25 38.5 93.6 83.0 98.7 58.7 37.3 23.4 30.5 28.9 18.2
24.6 22 154.00 66.25 34.0 95.8 87.9 99.2 59.6 38.9 24.0 28.8 25.2 16.6
10.9 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2
27.8 24 184.25 71.25 34.4 97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7
# create bma object using bfat data
bfat_bms <- bms(bfat, mprior = "uniform", g = "UIP", user.int = F)

If we consider all potential first-order combinations of the predictors, there are \(2^{13} =\) 8192 potential models in the model set. By default a bma object generated by bms will store weights and posterior summary measures for the top 500 models considered. Using the closed form posterior results for model averaging with a \(g\)-prior on the regression coefficients, MApp_bms will obtain a specified number of draws from the analytical posterior distributions for all models stored in the bma object and will create a model averaged distribution by sampling from these individual posteriors with normalized probabilities to retain proportionality from model averaging including all models. The proportion of posterior model mass accounted for by the top 500 models should be checked to ensure the approximate model averaged distribution created by MApp_bms is reasonable. In this case, the top 500 models accounted for approximately 97% of the posterior mass and we can be confident that the approximate model averaged distributions will be very similar to the model averaged distributions when all models are considered.

# see how much of the posterior model mass is accounted for by the top 500
# models
sum(pmp.bma(bfat_bms)[,1])
## [1] 0.9749476

Because plotting 500 models simultaneously would make MAP plot impossible to read, we can tell MApp_bms to display a subset of those models using the max_display argument. We can also specify which coefficient to include in the model averaged posteriors plot with the include_coef argument. If max_display and include_coef are left unspecified, the function will display the 20 models with the largest PMP.

map_bfat <- MApp_bms(bfat_bms, plot_wind = c(1,3), 
                        include_coef = c(13, 2, 4), 
                        max_display = 10, 
                        num_sims = 1000)

map_bfat
MA MA_cts M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
wrist 0.6694 0.4594 0.4128 0.4172 0.397 0.4367 0.4729 0.4219 0.4244 0.4195
weight 0.0407 0.0332 0.0234 0.0219 0.025 0.0194 0.0236 0.0283 0.0255 0.0223 0.0293 0.0246
neck 0.1887 0.2194 0.2085 0.1899
PMP 0.1636 0.0788 0.0463 0.0376 0.034 0.0246 0.0235 0.0206 0.0195 0.0164

It is also common to obtain approximate posterior model probabilities (model weights) from certain information criterion (most commonly: BIC, AICc or AIC). A version of the MAP plot can be created for comparing confidence intervals for partial regression coefficients across the models in the model set.

Using MApp_IC

MApp_IC generates results from approximations to posterior model probabilities (PMPs) based on two user specifications: (1) the type of information criterion, and (2) the method for model averaging (conditional on models allowing the coefficient to be estimated vs. using all models in the model set). The default for MApp_IC conducts BIC based model averaging (type = "BIC") and considers all models in the model set (w_plus = FALSE). Here, we make the MAP plot for the brainData with the default specifications.

## Subset of models used in MA: All models in the model set
## Type of information criteria used in MAP plot: BIC

Model AICc_pmp BIC_pmp AICc BIC
MA_AICc NA NA NA NA
MA_BIC NA NA NA NA
M1 0.895 0.368 136.310 148.465
M2 0.088 0.318 140.947 150.765
M3 0.017 0.259 144.271 154.089
M4 0.000 0.055 171.447 178.880
M5 0.000 0.000 274.682 282.114
M6 0.000 0.000 275.600 285.418
M7 0.000 0.000 381.491 388.924
M8 0.000 0.000 424.408 429.407
Model Est_Int Est_lbody Est_lgest Est_litter
MA_AICc 0.746 0.574 0.452 -0.102
MA_BIC 1.039 0.597 0.375 -0.092
M1 0.823 0.575 0.440 -0.110
M2 -0.457 0.551 0.668 0.000
M3 2.917 0.658 0.000 -0.197
M4 2.332 0.719 0.000 0.000
M5 -6.665 0.000 2.234 0.000
M6 -7.528 0.000 2.371 0.094
M7 5.622 0.000 0.000 -0.761
M8 3.865 0.000 0.000 0.000
Model SE_Int SE_lbody SE_lgest SE_litter
MA_AICc 0.746 0.034 0.152 0.049
MA_BIC 1.314 0.056 0.273 0.080
M1 0.662 0.033 0.137 0.042
M2 0.458 0.032 0.109 0.000
M3 0.119 0.020 0.000 0.034
M4 0.073 0.020 0.000 0.000
M5 0.562 0.000 0.117 0.000
M6 0.960 0.000 0.170 0.084
M7 0.293 0.000 0.000 0.101
M8 0.222 0.000 0.000 0.000

The function returns messages providing the type of model averaging used and other useful output, including AICc based MA results.

Sometimes researchers choose to condition their results on the subset of models including the variables of interest (i.e., models where associated coefficients are not set to 0), described in Burnham and Anderson (2002). Setting the argument w_plus = TRUE tells MAPP_IC to conduct this type of model averaging.

MApp_IC(brainData, plot_wind = c(1,3), 
        type = "BIC", w_plus = TRUE)
## Subset of models used in MA: Models where coefficient is not set to 0
## Type of information criteria used in MAP plot: BIC

The message now indicates the alternative type of model averaging was conducted.

This function works best with modest numbers of explanatory variables because it uses bms to define an all-subsets model set. When 9 explanatory variables are considered, the all-subsets model set contains 512 models, and by default bms will only store results from 500 models. Therefore, if this function is used with data containing 9 or more explanatory variables, it is important for the user to make sure the sum of the PMPs for the top 500 models (estimated using bms) is large enough so conditioning on those 500 models is appropriate for IC-based model averaging. A more flexible, but less automatic function included in this package is MApp_IC_gen (see Section ).

Results from IC based model averaging can be used as input arguments to MApp_IC_gen to create the MAP plot. This function requires the user to input four arguments: (1) a matrix of maximum likelihood estimates of the coefficients of interest for each model considered, (2) a matrix of standard errors for those estimates, (3) a vector (length = \(J\)) of model weights, and (4) a matrix of zeroes and ones corresponding to the variables included in each model (all matrices will be \(J\) rows and \(p\) columns). The user can choose not to include all \(J\) models by using a subset of the models in the four inputs.

We use the braindata to demonstrate this function.

post_means <- t(brain$topmod$betas())
post_means[post_means != 0] <- 1
inmat <- post_means

IC_approx <- approx_pmp(inmat, Xmat = brainData[,-1], Yvec = brainData[,1])
## Subset of models used in MA: All models in the model set
knitr::kable(IC_approx[,c(1:5)], digits = 3)
Model AICc_pmp BIC_pmp AICc BIC
MA_AICc NA NA NA NA
MA_BIC NA NA NA NA
M1 0.895 0.368 136.310 148.465
M2 0.088 0.318 140.947 150.765
M3 0.017 0.259 144.271 154.089
M4 0.000 0.055 171.447 178.880
M5 0.000 0.000 274.682 282.114
M6 0.000 0.000 275.600 285.418
M7 0.000 0.000 381.491 388.924
M8 0.000 0.000 424.408 429.407
knitr::kable(IC_approx[,c(1,6:9)], digits = 3)
Model Est_Int Est_lbody Est_lgest Est_litter
MA_AICc 0.746 0.574 0.452 -0.102
MA_BIC 1.039 0.597 0.375 -0.092
M1 0.823 0.575 0.440 -0.110
M2 -0.457 0.551 0.668 0.000
M3 2.917 0.658 0.000 -0.197
M4 2.332 0.719 0.000 0.000
M5 -6.665 0.000 2.234 0.000
M6 -7.528 0.000 2.371 0.094
M7 5.622 0.000 0.000 -0.761
M8 3.865 0.000 0.000 0.000
knitr::kable(IC_approx[,c(1,10:13)], digits = 3)
Model SE_Int SE_lbody SE_lgest SE_litter
MA_AICc 0.746 0.034 0.152 0.049
MA_BIC 1.314 0.056 0.273 0.080
M1 0.662 0.033 0.137 0.042
M2 0.458 0.032 0.109 0.000
M3 0.119 0.020 0.000 0.034
M4 0.073 0.020 0.000 0.000
M5 0.562 0.000 0.117 0.000
M6 0.960 0.000 0.170 0.084
M7 0.293 0.000 0.000 0.101
M8 0.222 0.000 0.000 0.000

Use MApp_IC_gen

# BIC weights
x_coef <- IC_approx[c(2:10), c(7:9)]
x_se <- IC_approx[c(2:10), c(11:13)]
pmp <- as.numeric(as.character(IC_approx[c(3:10),2]))
MApp_IC_gen(x_coef = x_coef, x_se = x_se, pmp = pmp, inmat = inmat)

MApp for Self-Programmed Samplers

Another way to conduct Bayesian model averaging is to use BUGS, OpenBUGS, or a self-programmed reversible jump Gibbs sampler. In this case, the user will have posterior draws for partial regression coefficients estimated by each of the models \(\mathcal{M}\). Chains of posterior draws are usually stored in a list (length usually equal to the cardinality of the model set, unless certain models are never visited), or in a matrix with posterior draws for quantities of interest (including model) as variable in the matrix. If draws come in the form of a matrix, the MAP plot can be created directly with MApp_MCMC.

MApp_MCMC

Here we show how the MAP plot can be made using results similar to the posterior draws that would be obtained from implementing model averaging with WinBUGS (Lunn et al. 2000) or R2OpenBugs (Sturtz, Ligges, and Gelman 2005). Typically draws are extracted using the coda (Plummer et al. 2006) and they will have a column for model and columns for the estimates of partial regression coefficients. Here, we use fake data considering 3 variables and 7 models.

The data are of the form:

head(mcmc_mat)
model X1 X2 X3
282 M3 -0.5044108 0.000000 6.518147
200 M2 -1.8688975 3.270894 0.000000
453 M5 -1.4741605 0.000000 0.000000
608 M7 0.0000000 0.000000 5.967435
285 M3 0.1372812 0.000000 6.038164
611 M7 0.0000000 0.000000 6.065165

Create the MAP plot using MApp_MCMC

MApp_MCMC(mcmc_mat, plot_wind = c(1,3))

## Table of posterior standard deviations: 
##            compare individual models to MA posterior.
## 
##  
##            (press enter to display table)
##     MA       MA_cts   M3       M2       M5       M7       M1      
## X1  "0.8793" "0.9518" "1.0079" "0.9991" "0.9418" "---"    "0.8651"
## X2  "1.6736" "1.0084" "---"    "1.0032" "---"    "---"    "1.0294"
## X3  "3.1055" "0.9944" "1.0734" "---"    "---"    "0.924"  "1.116" 
## PMP "---"    "---"    "0.1429" "0.1429" "0.1429" "0.1429" "0.1429"
##     M6       M4      
## X1  "---"    "---"   
## X2  "1.0503" "0.9565"
## X3  "---"    "0.8541"
## PMP "0.1429" "0.1429"

Another option in max_display (common to all Bayesian model averaging MAP plot functions) is "common3", which displays the top PMP model, the model including all explanatory variables, and the model averaged result.

MApp_MCMC(mcmc_mat, plot_wind = c(1,3), max_display = "common3")

## Table of posterior standard deviations: 
##            compare individual models to MA posterior.
## 
##  
##            (press enter to display table)
##     MA       MA_cts   M3       M1      
## X1  "0.8793" "0.9518" "1.0079" "0.8651"
## X2  "1.6736" "1.0084" "---"    "1.0294"
## X3  "3.1055" "0.9944" "1.0734" "1.116" 
## PMP "---"    "---"    "0.1429" "0.1429"

In this example, the posterior model probabilities are all equal, resulting in equivalent posterior exclusion probabilities (point masses at zero) for the partial regression coefficients associated with each explanatory variable. The overall mean for the model averaged posterior distribution gets pulled toward zero in all three panels, but we see the most “shrinkage” in the panel for \(X_3\). This is a result of the individual model posteriors for the coefficients associated with \(X_3\) being distributed similarly; specifically being centered away from zero with relatively small variance.

The table of posterior standard deviations shows how a relatively large posterior exclusion probability can affect the posterior variance of the model averaged distribution. We observe that the posterior variance of the continuous component of the model averaged distribution is similar to the posterior variances from the corresponding individual models. When considering the zero component, the model averaged posterior distribution has larger variance for the partial regression coefficients associated with \(X_2\) and \(X_3\), and smaller variance for the partial regression coefficients associated with \(X_1\). This is a result of the location of the continuous component.

MApp_list

Another common format for posterior draws is a list, in which case, MApp_list can be used. For this example, the posterior draws from the model with the highest PMP are in the first element of the list, and posterior draws from the model with the smallest PMP are in the last element of the list.

MApp_list(mcmc_list, plot_wind = c(1,5), max_display = 10)

## Table of posterior standard deviations: 
##              compare individual models to the MA posterior.
## 
##  
##              (press enter to display table)
##     MA       MA_cts   M1       M2       M3       M4       M5      
## X1  "0.0397" "0.1014" "---"    "---"    "0.101"  "---"    "---"   
## X2  "0.037"  "0.1025" "---"    "---"    "---"    "0.094"  "---"   
## X3  "0.1674" "0.1176" "0.1085" "---"    "0.1128" "0.1118" "0.1685"
## X4  "0.1169" "0.1099" "0.1081" "0.1093" "0.1135" "0.1079" "0.115" 
## X5  "0.1112" "0.1932" "---"    "0.1133" "---"    "---"    "0.1814"
## PMP "---"    "---"    "0.5636" "0.0875" "0.0835" "0.0746" "0.0726"
##     M6       M7       M8       M9       M10     
## X1  "---"    "0.1142" "---"    "0.1023" "0.0867"
## X2  "---"    "---"    "0.0889" "0.0991" "---"   
## X3  "---"    "---"    "---"    "0.0998" "0.1789"
## X4  "0.1016" "0.1149" "0.138"  "0.1281" "0.1249"
## X5  "---"    "0.1031" "0.0769" "---"    "0.1857"
## PMP "0.0378" "0.0129" "0.0119" "0.0109" "0.0109"

Unlike the MApp plot, the number of draws from each individual model is proportional to the posterior model probability, and some beanplots may be drawn with very few observations.

Other Functions in MApp

This chapter highlighted the plotting functions in MApp, however, many of the other functions in this package are useful for studying properties of model averaged posteriors for partial regression coefficients under the \(g\)-prior implementation of Bayesian model averaging. Use help(package = "MApp") to see a list of all functions in the package and their descriptions (Figure ).

Help for the `MApp` package provides a list of functions and their descriptions.

References

Albert, Jim. 2014. LearnBayes: Functions for Learning Bayesian Inference. http://CRAN.R-project.org/package=LearnBayes.

Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

Banner, Katharine M., and Megan D. Higgs. 2016. “Considerations for Assessing Model Averaging of Regression Coefficients.” Ecological Applications. doi:10.1002/eap.1419.

Bernardo, J. M., M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, eds. 1980. Posterior Odds Ratios for Selected Regression Hypotheses. Bayesian Statistics Proceedings of the First Valencia International Meeting Held in Valencia (Spain); Valencia University Press.

Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference a Practical Information-Theoretic Approach. Springer.

Feldkircher, Martin, and Stefan Zeugner. 2009. “Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging.” International Monetary Fund (IMF) Working Paper, 1–39.

Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14 (4): 382–401.

Johnson, Roger W. 1996. “Fitting Percentage of Body Fat to Simple Body Measurements.” Journal of Statistics Education 4 (1). http://www.amstat.org/publications/jse/v4n1/datasets.johnson.html.

Kampstra, Peter. 2008. “Beanplot: A Boxplot Alternative for Visual Comparison of Distributions.” Journal of Statistical Software, Code Snippets 28 (1): 1–9. http://www.jstatsoft.org/v28/c01/.

Link, William A., and Richard J. Barker. 2006. “Model weights and the foundations of multimodel inference.” Ecology 87 (10): 2626–35. http://www.ncbi.nlm.nih.gov/pubmed/17089670.

Lunn, D J, A Thomas, N Best, and D Spiegelhalter. 2000. “WinBUGS – a Bayesian Modelling Framework: Concepts, Structure, and Extensibility.” Statistics and Computing 10: 325–37.

Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. http://CRAN.R-project.org/doc/Rnews/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Ramsey, F.L., D.W. Schafer, Daniel W. Schafer, Jeannie Sifneos, and Berwin A. Turlach. 2012. Sleuth2: Data Sets from Ramsey and Shafer’s “Statistical Sleuth (2nd Ed)". R package version 1.0-7.

Sturtz, Sibylle, Uwe Ligges, and Andrew Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12 (3): 1–16. http://www.jstatsoft.org.

Wickham, Hadley, and Winston Chang. 2016. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.

Wickham, Hadley, and Romain Francois. 2015. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Zellner, A. 1984. Posterior Odds Ratios for Regression Hypotheses: General Considerations and Some Specific Results. Basic Issues in Econometrics. Univ. Chicago Press.