Prior Knowledge

Impact of Priors

First we will sample 100 Normal random numbers with \(\mu=1\) and \(\sigma=1\).

set.seed(666)
randon_samples=rnorm(1e2, mean=1, sd=1)
Data1_100=list(data=randon_samples, mon.names='', parm.names='mean', N=1e2)
Data2_100=list(data=randon_samples, mon.names='', parm.names=c('mean','sd'), N=1e2)
maghist(randon_samples)

## [1] "Summary of used sample:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.9432  0.2263  0.9479  0.9333  1.6844  3.1500 
## [1] "sd / 1-sig / 2-sig range:"
## [1] 1.0293765 0.9892135 1.9056859
## [1] "Plotting 100 out of 100 (100%) data points (0 < xlo & 0 > xhi)"

Now we make 4 different models to test, with different priors and number of parameters.

The single parameter model is constructed such that we only infer \(\mu\), \(\sigma\) is already set to the correct value (1).

Model1a=function(parm, Data){
val.prior=dnorm(parm, 1, 1, log=TRUE) #prior sd=1
LL=sum(dnorm(Data$data ,mean=parm, sd=1, log=TRUE))
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

Model1b=function(parm, Data){
val.prior=dnorm(parm, 1, 10, log=TRUE) #prior sd=10
LL=sum(dnorm(Data$data, mean=parm, sd=1, log=TRUE))
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

The 2 parameter model allows us to infer \(\mu\) and \(\sigma\).

Model2a=function(parm, Data){
val.prior=dnorm(parm[1], 1, 1, log=TRUE) + dnorm(parm[2], 0, 1, log=TRUE)  #prior sd=1
LL=sum(dnorm(Data$data, mean=parm[1], sd=10^parm[2], log=TRUE))
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

Model2b=function(parm, Data){
val.prior=dnorm(parm[1], 1, 10, log=TRUE) + dnorm(parm[2], 0, 10, log=TRUE)  #prior sd=10
LL=sum(dnorm(Data$data, mean=parm[1], sd=10^parm[2], log=TRUE))
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

We can sample the 4 different model options using LaplacesDemon.

output1a_100=LaplacesDemon(Model1a, Data1_100, Initial.Values=1, Algorithm='CHARM')
output1b_100=LaplacesDemon(Model1b, Data1_100, Initial.Values=1, Algorithm='CHARM')
output2a_100=LaplacesDemon(Model2a, Data2_100, Initial.Values=c(1,0), Algorithm='CHARM')
output2b_100=LaplacesDemon(Model2b, Data2_100, Initial.Values=c(1,0), Algorithm='CHARM')

We can then create a grid of Bayes Factors to select a preferred model.

BayesFactor(list(output1a_100, output1b_100, output2a_100, output2b_100))

## $B
##            [,1]       [,2]       [,3]      [,4]
## [1,] 1.00000000 0.60650199 11.7423627 11.844220
## [2,] 1.64879920 1.00000000 19.3607982 19.528740
## [3,] 0.08516174 0.05165076  1.0000000  1.008674
## [4,] 0.08442937 0.05120658  0.9914003  1.000000
## 
## $Hypothesis
## [1] "row > column"
## 
## $Strength.of.Evidence
## [1] "-Inf  <  B <= 0.1   Strong against"                 
## [2] "0.1   <  B <= (1/3) Substantial against"            
## [3] "(1/3) <  B < 1      Barely worth mentioning against"
## [4] "1     <= B < 3      Barely worth mentioning for"    
## [5] "3     <= B < 10     Substantial for"                
## [6] "10    <= B < Inf    Strong for"                     
## 
## $Posterior.Probability
## [1] 0.35481246 0.58501450 0.03021645 0.02995659
## 
## attr(,"class")
## [1] "bayesfactor"

As should be expected, we prefer out single paramter models (large Bayes Factors in the top-right of the grid) that have the intrinsic \(\sigma\) in them already. Of the two single parameter models, there is barely any support for Model1a with the tighter prior distribution around \(\mu\) and \(\sigma\) (1 versus 10). I.e. our choice in priors, although very different, actually has very little impact on which model we choose. This is good, because we want model selection to be largely driven by the data not the priors.

But what if we make the assumed distribution \(\sigma\) different to the true value: 1.2 rather that 1. Will we prefer the two parameter model then?

Model1a_bad=function(parm, Data){
val.prior=dnorm(parm, 1, 1, log=TRUE) #prior sd=1
LL=sum(dnorm(Data$data ,mean=parm, sd=1.2, log=TRUE)) #sd=1.2
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

Model1b_bad=function(parm, Data){
val.prior=dnorm(parm, 1, 10, log=TRUE) #prior sd=10
LL=sum(dnorm(Data$data, mean=parm, sd=1.2, log=TRUE)) #sd=1.2
LP=LL+val.prior
Modelout=list(LP=LP, Dev=-2*LL, Monitor=1, yhat=1, parm=parm)}

Make some more samples with LaplacesDemon.

output1a_100_bad=LaplacesDemon(Model1a_bad, Data1_100, Initial.Values=1, Algorithm='CHARM')
output1b_100_bad=LaplacesDemon(Model1b_bad, Data1_100, Initial.Values=1, Algorithm='CHARM')

Create a new grid of Bayes Factors to select a preferred model.

BayesFactor(list(output1a_100_bad, output1b_100_bad, output2a_100, output2b_100))

## $B
##           [,1]      [,2]      [,3]     [,4]
## [1,] 1.0000000 1.0018361 1.5185294 1.531702
## [2,] 0.9981673 1.0000000 1.5157464 1.528894
## [3,] 0.6585319 0.6597410 1.0000000 1.008674
## [4,] 0.6528687 0.6540674 0.9914003 1.000000
## 
## $Hypothesis
## [1] "row > column"
## 
## $Strength.of.Evidence
## [1] "-Inf  <  B <= 0.1   Strong against"                 
## [2] "0.1   <  B <= (1/3) Substantial against"            
## [3] "(1/3) <  B < 1      Barely worth mentioning against"
## [4] "1     <= B < 3      Barely worth mentioning for"    
## [5] "3     <= B < 10     Substantial for"                
## [6] "10    <= B < Inf    Strong for"                     
## 
## $Posterior.Probability
## [1] 0.3021543 0.3016005 0.1989782 0.1972670
## 
## attr(,"class")
## [1] "bayesfactor"

We still marginally prefer the simpler (though a bit incorrect) model. Not by as much though, so it has lost some support.

But what if we have more data? We will now create data with 1000 observations.

set.seed(666)
randon_samples=rnorm(1000, mean=1, sd=1)
Data1_1000=list(data=randon_samples, mon.names='', parm.names='mean', N=1e3)
Data2_1000=list(data=randon_samples, mon.names='', parm.names=c('mean','sd'), N=1e3)
maghist(randon_samples)

## [1] "Summary of used sample:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.2184  0.3137  1.0076  0.9805  1.6741  3.7449 
## [1] "sd / 1-sig / 2-sig range:"
## [1] 0.9842185 0.9715474 1.9714712
## [1] "Plotting 1000 out of 1000 (100%) data points (0 < xlo & 0 > xhi)"

Make some more samples with LaplacesDemon.

output1a_1000_bad=LaplacesDemon(Model1a_bad, Data1_1000, Initial.Values=1, Algorithm='CHARM')
output1b_1000_bad=LaplacesDemon(Model1b_bad, Data1_1000, Initial.Values=1, Algorithm='CHARM')
output2a_1000=LaplacesDemon(Model2a, Data2_1000, Initial.Values=c(1,0), Algorithm='CHARM')
output2b_1000=LaplacesDemon(Model2b, Data2_1000, Initial.Values=c(1,0), Algorithm='CHARM')

As as aside, it is easy to check the triangle (or corner) plots with magtri:

magtri(output2a_1000$Posterior2)

## Warning: weights overwritten by binning

## Loading required package: rgl

## 
## Attaching package: 'rgl'

## The following object is masked from 'package:plotrix':
## 
##     mtext3d

## Loading required package: rpanel

## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'rpanel'

Create a new grid of Bayes Factors to select a preferred model.

BayesFactor(list(output1a_1000_bad, output1b_1000_bad, output2a_1000, output2b_1000))

## $B
##              [,1]         [,2]         [,3]         [,4]
## [1,] 1.000000e+00 5.190648e-01 3.223632e-14 3.848678e-14
## [2,] 1.926542e+00 1.000000e+00 6.210461e-14 7.414638e-14
## [3,] 3.102091e+13 1.610186e+13 1.000000e+00 1.193895e+00
## [4,] 2.598295e+13 1.348684e+13 8.375947e-01 1.000000e+00
## 
## $Hypothesis
## [1] "row > column"
## 
## $Strength.of.Evidence
## [1] "-Inf  <  B <= 0.1   Strong against"                 
## [2] "0.1   <  B <= (1/3) Substantial against"            
## [3] "(1/3) <  B < 1      Barely worth mentioning against"
## [4] "1     <= B < 3      Barely worth mentioning for"    
## [5] "3     <= B < 10     Substantial for"                
## [6] "10    <= B < Inf    Strong for"                     
## 
## $Posterior.Probability
## [1] NaN NaN NaN NaN
## 
## attr(,"class")
## [1] "bayesfactor"

With 10 times more data we can strongly reject both of the deliberately bad, but simpler, models. The two more complex models with different priors are indistinguishable in terms of selection preference.

The conclusion to this is that our choice of priors has very little impact on our model selection. This means you should not worry too much about the precise prior ranges, as long as they are either correctly informative when restrictive, or properly normalised when uninformative.

Prior Knowledge

Aaron Robotham

03/05/2018

Impact of Priors