Exploratory Data Analysis

The data set contains the weekly wage in US dollars along with 15 candidate predictors: hours worked, IQ, a knowledge-of-world-of-work score (kww), years of education, experience and tenure, age, marital status, urban or rural residence, whether the person is black, whether they live in the South, number of siblings, birth order, and mother's and father's education in years. Some of the variables are shown in the plot below, and interesting relationships can be observed. Some possible, if simplistic, explanations for the variation in wages are that smarter people make more money, that income tends to increase with age, and that income is markedly lower for black people unless they are married.
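
A pairs plot of a few of these variables makes such patterns easy to spot. A minimal sketch, assuming the data frame is called wage_data (a hypothetical name) and the GGally package:

library(GGally)

# Pairwise scatter plots, densities and correlations for selected variables
ggpairs(wage_data[, c("wage", "iq", "age", "black", "married")])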

Density Plots

Exploring the variable of interest (weekly wage) together with IQ, one of the covariates most strongly correlated with wage, you can see that wage is concentrated below USD 2’000, with a few values above 3’000. Smarter people do seem to make more money: as IQ increases, more observations appear in the upper right corner of the plot, indicating higher salaries.
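
A sketch of how these two plots could be drawn with ggplot2, using the data frame name assumed above:

library(ggplot2)

# Marginal density of weekly wage: mass concentrated below USD 2'000
ggplot(wage_data, aes(x = wage)) +
  geom_density()

# Weekly wage against IQ: higher-IQ observations cluster towards higher wages
ggplot(wage_data, aes(x = iq, y = wage)) +
  geom_point(alpha = 0.4)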

Bayesian Model Averaging (BMA) Plot

After fitting a linear regression model (not shown here) on the log-transformed wage variable, you can check whether the usual assumptions hold (errors normally distributed with constant variance, expected log weekly wage linear in IQ, no influential outliers). Often several models are almost equally plausible, and choosing only one ignores the uncertainty involved in selecting which variables to include. A way around this problem is Bayesian Model Averaging (BMA), in which multiple models are averaged to obtain posteriors of coefficients and predictions from new data. Excluding observations with missing values in the data set, you fit a Bayesian linear regression model with a uniform prior over the model space. This yields a marginal posterior inclusion probability for each variable. A visual representation of the model ranking is shown here, with the best models on the left.
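
A minimal sketch of this fit with the BAS package, assuming the data frame name from above and that the BIC reference prior was used on the coefficients (the summary below reports the best BIC model as the Bayes-factor baseline):

library(BAS)

# Drop observations with missing values, then model log(wage) on all
# remaining predictors under a uniform prior over the 2^15 models
wage_complete <- na.omit(wage_data)
bma_wage <- bas.lm(lwage ~ . - wage, data = wage_complete,
                   prior = "BIC", modelprior = uniform())

# Model-rank image: best models on the left, included variables shaded
image(bma_wage, rotate = FALSE)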

BMA Summary

You get the posterior inclusion probability for each variable and the most probable models. For example, the posterior probability that hours is included in the model is 0.855. Further, the most likely model, with a posterior probability of 0.0455, includes an intercept, hours worked, IQ, education, tenure, age, marital status, urban living status, and mother’s education (three variables fewer than the stepwise method in OLS regression retained). While a posterior probability of 0.0455 sounds small, it is much larger than the uniform prior probability assigned to each model, since there are 2^15 (32’768) possible models. The most probable models are listed below with their Bayes factor (BF), posterior probability, R-squared, model dimension, and log marginal likelihood; the base model for the Bayes factors is the best BIC model.
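
The table below is the output of summary() on the fitted object (name as assumed above):

summary(bma_wage)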

          P(B != 0 | Y)   model 1   model 2   model 3   model 4   model 5
Intercept         1.000     1.000     1.000     1.000     1.000     1.000
hours             0.855     1.000     1.000     1.000     1.000     1.000
iq                0.897     1.000     1.000     1.000     1.000     1.000
kww               0.348     0.000     0.000     0.000     1.000     0.000
educ              0.999     1.000     1.000     1.000     1.000     1.000
exper             0.710     0.000     1.000     1.000     1.000     0.000
tenure            0.704     1.000     1.000     1.000     1.000     1.000
age               0.525     1.000     1.000     0.000     0.000     1.000
married1          0.999     1.000     1.000     1.000     1.000     1.000
black1            0.346     0.000     0.000     0.000     0.000     1.000
south1            0.320     0.000     0.000     0.000     0.000     0.000
urban1            1.000     1.000     1.000     1.000     1.000     1.000
sibs              0.042     0.000     0.000     0.000     0.000     0.000
brthord           0.122     0.000     0.000     0.000     0.000     0.000
meduc             0.573     1.000     1.000     1.000     1.000     1.000
feduc             0.233     0.000     0.000     0.000     0.000     0.000
BF                   NA     1.000     0.522     0.518     0.441     0.413
PostProbs            NA     0.046     0.024     0.024     0.020     0.019
R2                   NA     0.271     0.277     0.270     0.276     0.276
dim                  NA     9.000    10.000     9.000    10.000    10.000
logmarg              NA -1490.053 -1490.703 -1490.710 -1490.871 -1490.938

Posterior Model Probabilities with MCMC

You can then refit the model using the Zellner-Siow Cauchy prior distribution on the regression coefficients. This prior is a mixture of Zellner's g priors in which g itself is updated by the data, providing data-dependent shrinkage of the coefficients towards zero while still yielding Bayes factors expressed in terms of summary statistics from ordinary least squares. The model space is explored with MCMC sampling. In the convergence diagnostic, the posterior model probabilities estimated from the MCMC visit frequencies are plotted against those computed from the renormalized marginal likelihoods: if the Markov chain has converged, the two quantities should agree and fall on a 1-1 line; if not, a longer run may be required. Here the chain appears to have converged.
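
A sketch of the refit and the convergence check, assuming the object names from above and an MCMC run length of 10^6 iterations (an assumption):

# Zellner-Siow Cauchy prior on the coefficients, models sampled by MCMC
bma_zs <- bas.lm(lwage ~ . - wage, data = wage_complete,
                 prior = "ZS-null", modelprior = uniform(),
                 method = "MCMC", MCMC.iterations = 10^6)

# Plots MCMC visit frequencies against renormalized posterior model
# probabilities; points on the 1-1 line indicate convergence
diagnostics(bma_zs, type = "model")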

Predictions I

Once the diagnostics indicate that the Markov chain has converged, you are ready to make predictions. Depending on the goal of your analysis, there are several ways to choose a model. Bayesian Model Averaging (BMA) is a hierarchical model composed of all the submodels and represents the full posterior uncertainty after seeing the data. For prediction, the BMA predictive mean is the average of the predictions from the individual models, weighted by their posterior probabilities, and it is optimal under squared error loss. If you really have to select a single model and your objective is prediction, the Best Predictive Model (BPM) is the one whose predictions are closest to those of BMA.
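
The variable list below comes from variable.names() applied to the prediction object; a sketch, assuming the MCMC fit from above:

# BPM: the single model whose predictions are closest to BMA
pred_bpm <- predict(bma_zs, estimator = "BPM", se.fit = TRUE)
variable.names(pred_bpm)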

 [1] "Intercept" "hours"     "iq"        "kww"       "educ"      "exper"    
 [7] "tenure"    "age"       "married1"  "urban1"    "meduc"    

If your objective is instead to identify the model most likely to have generated the data, i.e. under a 0/1 loss, another way is to take the variables of the Highest Probability Model (HPM).

[1] "Intercept" "hours"     "iq"        "educ"      "tenure"    "age"      
[7] "married1"  "urban1"    "meduc"    

Yet another way is the Median Probability Model (MPM), which includes every variable whose posterior inclusion probability exceeds 0.5.

 [1] "Intercept" "hours"     "iq"        "educ"      "exper"     "tenure"   
 [7] "age"       "married1"  "urban1"    "meduc"    

Predictions II

You can then answer questions such as: which characteristics lead to the highest wages under the BPM? The highest predicted weekly wage using the Best Predictive Model turns out to be USD 1’570, with a 95% credible interval of 782 to 3’154 US dollars. It belongs to someone aged 37, married, living in an urban area, with an IQ of 127 and 16 years each of education and work experience, of which 12 with the current employer. This person is not black and does not live in the Southern USA.
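
A sketch of how the numbers below can be extracted, assuming the objects from above; the model is on the log(wage) scale, so the interval is exponentiated:

# Observation with the highest BPM prediction
best <- which.max(pred_bpm$fit)

# 95% credible interval and point prediction, back-transformed to dollars
exp(confint(pred_bpm, parm = "pred")[best, ])

# Characteristics of that person
dplyr::glimpse(wage_complete[best, ])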

     2.5%     97.5%      pred 
 782.0093 3154.0842 1570.5169 
Rows: 1
Columns: 17
$ wage    <int> 1586
$ hours   <int> 40
$ iq      <int> 127
$ kww     <int> 48
$ educ    <int> 16
$ exper   <int> 16
$ tenure  <int> 12
$ age     <int> 37
$ married <fct> 1
$ black   <fct> 0
$ south   <fct> 0
$ urban   <fct> 1
$ sibs    <int> 4
$ brthord <int> 4
$ meduc   <int> 16
$ feduc   <int> 16
$ lwage   <dbl> 7.36897