The purpose of this assignment is to gain practical experience using JAGS on common statistical methods we have come across: fitting linear regression models with and without interactions, Bayes factors, and paired t-tests.
This dataset aims to estimate the biomass of paddocks using satellite measurements of the reflected wavelengths of green, blue, red, and near-infrared (henceforth NIR) light.
| DryMatter | drymatter | NDVI | Date | Paddock | blue | green | red | nir |
|---|---|---|---|---|---|---|---|---|
| 1875 | 1536.238 | 0.6785089 | 2017-10-13 | A12 | 325.0920 | 532.5517 | 520.3084 | 2716.536 |
| 2250 | 1748.723 | 0.7144538 | 2017-10-18 | A12 | 335.7911 | 551.1741 | 528.0619 | 3170.547 |
| 2550 | 2012.143 | 0.7427501 | 2018-02-15 | A12 | 435.8410 | 592.2337 | 509.5287 | 3451.824 |
| 1950 | 1815.039 | 0.6794601 | 2018-05-03 | A12 | 554.5660 | 672.9560 | 601.7782 | 3153.000 |
| 1950 | 1879.764 | 0.7208708 | 2018-05-03 | A12 | 388.9216 | 649.5143 | 614.5010 | 3788.486 |
| 2700 | 2393.333 | 0.8187152 | 2018-05-28 | A12 | 218.9326 | 498.8568 | 442.2653 | 4436.966 |
Other variables include DryMatter (ground biomass measurement), drymatter (the current algorithm's estimate of biomass), Date (date of the estimate), Paddock (paddock on the farm), and blue, green, red, nir (the respective reflected wavelengths).
As shown in the correlation matrix below, there is a strong positive correlation between DryMatter and the biomass estimate from the current algorithm, as expected. NIR by itself also has a strong positive correlation with biomass, while the RGB bands all have weaker negative correlations. There also appears to be multicollinearity between the predictors, especially among the RGB reflected wavelengths, which is possibly an indication that not all variables will be needed in the model.
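A minimal sketch of how such a correlation matrix can be produced, assuming the cleaned data are in a data frame called All_Clean with the columns shown in the table above:

# Correlation matrix of the biomass measures and the reflectance bands
round(cor(All_Clean[, c("DryMatter", "drymatter", "nir", "red", "green", "blue")]), 2)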
The graph below shows that NIR reflects the most light but also has the most variance relative to the RGB bands. Its strong linear relationship shows that as NIR increases, biomass increases. The RGB bands, on the other hand, have smaller variances and weak negative relationships, so as they increase, biomass decreases (very slightly).
Here we separate out the variables from the data that we will use later. We also mean-centre the data at this point, as it helps with thinning later on.
Y <- All_Clean$DryMatter - mean(All_Clean$DryMatter)
b1_nir <- All_Clean$nir - mean(All_Clean$nir)
b2_red <- All_Clean$red - mean(All_Clean$red)
b3_green <- All_Clean$green - mean(All_Clean$green)
b4_blue <- All_Clean$blue - mean(All_Clean$blue)
n <- length(Y)
all_clean_centered = data.frame(cbind(Y,b1_nir,b2_red,b3_green,b4_blue))
colnames(all_clean_centered) = c("DryMatter","nir","red","green","blue")

In setting up our model we must first know the form it should follow. Being a linear regression with four predictor variables, the model takes the form:
\[ y = \beta_{0} + x_{1}\beta_{1} + x_{2}\beta_{2} + x_{3}\beta_{3} + x_{4}\beta_{4} + \epsilon, \;\;\;\;\; \epsilon \sim N(0, \sigma^2) \]

Now we should specify the priors that will make up the model. For the \(\beta\) coefficients we have no knowledge of what values they should take, so I set the prior to be a normal distribution centred at zero. The normal distribution allows all possible values, and its spread can be set depending on how large a distance I expect the MCMC will need to move. Initially I set the precision to 0.001 but found my values didn't converge anywhere close to the lm() fit, so I later changed it to 0.00001 to allow more movement.
Note: I made this value smaller because in JAGS the second parameter of dnorm is the precision (inverse variance); outside JAGS this corresponds to making the standard deviation larger.
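As a quick sanity check of what these precisions imply (a small illustrative calculation, not part of the original analysis):

# In JAGS, dnorm(mean, precision) uses precision = 1 / sd^2, so sd = 1 / sqrt(precision)
1 / sqrt(0.001)    # ~ 31.6  (the first, tighter prior)
1 / sqrt(0.00001)  # ~ 316.2 (the looser prior actually used)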
The prior for the inverse variance (and hence for sigma, via \(\sigma = 1/\sqrt{\text{inv.var}}\)) has been taken from a \(Gamma(0.001, 0.001)\) distribution, as it allows all possible positive values but tends to favour smaller ones. Its density is shown in the image below:
Gamma(0.001, 0.001)
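A minimal sketch of how such a density curve could be reproduced in base R (the plotting range is an arbitrary choice for illustration):

# Density of the Gamma(0.001, 0.001) prior on the inverse variance;
# most of the mass piles up near zero
curve(dgamma(x, shape = 0.001, rate = 0.001), from = 0.001, to = 2,
      xlab = "inv.var", ylab = "density", main = "Gamma(0.001, 0.001)")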
model_string <- "model{
# Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu[i],inv.var)
mu[i] <- beta_intercept + beta_nir*b1_nir[i] + beta_red*b2_red[i] + beta_green*b3_green[i] + beta_blue*b4_blue[i]
}
# Prior for beta
beta_intercept ~ dnorm(0,0.00001)
beta_nir ~ dnorm(0,0.00001)
beta_red ~ dnorm(0,0.00001)
beta_green ~ dnorm(0,0.00001)
beta_blue ~ dnorm(0,0.00001)
# Prior for the inverse variance
inv.var ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(inv.var)
}"model <- jags.model(textConnection(model_string),
data = list(Y=Y,n=n,b1_nir=b1_nir,b2_red=b2_red,b3_green=b3_green,b4_blue=b4_blue))## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 897
## Unobserved stochastic nodes: 6
## Total graph size: 8983
##
## Initializing model
We initialise with a small burn-in, no thinning, and 100,000 steps. Before mean-centring, I found the required thinning was extremely large, around 4,000, so to get roughly 5,000 samples I would have needed about 20 million steps. After mean-centring the data the required thinning became a lot more reasonable. As my laptop cannot handle 20 million steps, I took 100,000 steps, and the results were still very accurate.
burn_in = 1
thin = 1
variable_names = c('beta_intercept','beta_nir','beta_red','beta_green','beta_blue', 'sigma')
steps = 100000
update(model, burn_in)
draw = jags.samples(model, steps, thin=thin, variable.names = variable_names)
# Convert to a list
make_list <- function(draw)
{
results = list()
for(name in names(draw))
{
# Extract "chain 1"
results[[name]] = as.array(draw[[name]][,,1])
# Transpose 2D arrays
if(length(dim(results[[name]])) == 2)
results[[name]] = t(results[[name]])
}
return(results)
}
results = make_list(draw)

| beta_blue | beta_green | beta_intercept | beta_nir | beta_red | sigma | |
|---|---|---|---|---|---|---|
| Min. :0.7434 | Min. :-7.67126 | Min. :-39.13509 | Min. :0.3983 | Min. :-2.0949 | Min. :241.2 | |
| 1st Qu.:3.0356 | 1st Qu.:-5.79094 | 1st Qu.: -6.04993 | 1st Qu.:0.4973 | 1st Qu.: 0.2805 | 1st Qu.:265.0 | |
| Median :3.2192 | Median :-5.39336 | Median : 0.00535 | Median :0.5099 | Median : 0.5033 | Median :269.3 | |
| Mean :3.2193 | Mean :-5.38676 | Mean : -0.00290 | Mean :0.5099 | Mean : 0.5001 | Mean :269.4 | |
| 3rd Qu.:3.4042 | 3rd Qu.:-4.98457 | 3rd Qu.: 6.05209 | 3rd Qu.:0.5226 | 3rd Qu.: 0.7221 | 3rd Qu.:273.6 | |
| Max. :4.3583 | Max. : 0.03094 | Max. : 36.22369 | Max. :0.5913 | Max. : 1.7239 | Max. :300.2 |
After inspecting the chains I adjusted the burn-in to 1,000 (a little large, but it does no harm), and the thinning should be around 150. However, to get a sample of around 5,000 from my 100,000 steps I can only thin by 20, which leaves more autocorrelation in the final samples. In practice I would run more steps so the retained samples have as little autocorrelation as possible.
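A minimal sketch of how a suitable thinning interval can be judged from the autocorrelation of a single chain, assuming the results list produced above (the choice of beta_green is arbitrary):

# The lag at which the autocorrelation dies away to roughly zero
# suggests a reasonable thinning interval
acf(results$beta_green, lag.max = 500, main = "ACF of beta_green chain")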
| beta_blue | beta_green | beta_intercept | beta_nir | beta_red | sigma | |
|---|---|---|---|---|---|---|
| Min. :2.174 | Min. :-7.606 | Min. :-32.25904 | Min. :0.4432 | Min. :-0.5665 | Min. :241.2 | |
| 1st Qu.:3.040 | 1st Qu.:-5.801 | 1st Qu.: -5.95349 | 1st Qu.:0.4977 | 1st Qu.: 0.2828 | 1st Qu.:265.0 | |
| Median :3.219 | Median :-5.390 | Median : 0.21901 | Median :0.5099 | Median : 0.4983 | Median :269.3 | |
| Mean :3.222 | Mean :-5.392 | Mean : 0.07197 | Mean :0.5102 | Mean : 0.5018 | Mean :269.4 | |
| 3rd Qu.:3.409 | 3rd Qu.:-4.982 | 3rd Qu.: 6.08858 | 3rd Qu.:0.5228 | 3rd Qu.: 0.7278 | 3rd Qu.:273.4 | |
| Max. :4.229 | Max. :-3.369 | Max. : 32.39724 | Max. :0.5776 | Max. : 1.6195 | Max. :296.8 |
The posterior summaries come out almost exactly the same as the lm() fit below; the main exception is the intercept, where the lm() estimate is essentially zero while the posterior mean sits slightly away from it. All in all, a very good match.
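A minimal sketch of how the two sets of estimates can be placed side by side, assuming the centred data frame all_clean_centered and the results list from above:

# Classical least-squares fit on the centred data for comparison
lm_fit <- lm(DryMatter ~ nir + red + green + blue, data = all_clean_centered)

# Posterior means of the corresponding JAGS coefficients
jags_means <- c(intercept = mean(results$beta_intercept),
                nir       = mean(results$beta_nir),
                red       = mean(results$beta_red),
                green     = mean(results$beta_green),
                blue      = mean(results$beta_blue))

round(cbind(lm = coef(lm_fit), jags = jags_means), 4)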
##
## Call:
## lm(formula = DryMatter ~ nir + red + green + blue, data = all_clean_centered)
##
## Residuals:
## Min 1Q Median 3Q Max
## -981.8 -193.0 -9.7 166.2 1041.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.102e-13 8.986e+00 0.000 1.000
## nir 5.096e-01 1.868e-02 27.281 <2e-16 ***
## red 4.915e-01 3.264e-01 1.506 0.133
## green -5.370e+00 5.936e-01 -9.046 <2e-16 ***
## blue 3.212e+00 2.716e-01 11.827 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.1 on 892 degrees of freedom
## Multiple R-squared: 0.6034, Adjusted R-squared: 0.6017
## F-statistic: 339.3 on 4 and 892 DF, p-value: < 2.2e-16
This model is set up in much the same way as the one above, with the same priors and the same reasoning. As you can see, I have now added two new beta coefficients for the interactions between NIR and red and between blue and green, with priors similar to the rest of the \(\beta\) coefficients.
model_string1 <- "model{
# Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu[i],inv.var)
mu[i] <- beta_intercept + beta_nir*b1_nir[i] + beta_red*b2_red[i] + beta_green*b3_green[i] + beta_blue*b4_blue[i] + beta_nir_red*b1_nir[i]*b2_red[i] + beta_blue_green*b4_blue[i]*b3_green[i]
}
# Prior for beta
beta_intercept ~ dnorm(0,0.00001)
beta_nir ~ dnorm(0,0.00001)
beta_red ~ dnorm(0,0.00001)
beta_green ~ dnorm(0,0.00001)
beta_blue ~ dnorm(0,0.00001)
beta_nir_red ~ dnorm(0,0.00001)
beta_blue_green ~ dnorm(0,0.00001)
# Prior for the inverse variance
inv.var ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(inv.var)
}"model <- jags.model(textConnection(model_string1),
data = list(Y=Y,n=n,b1_nir=b1_nir,b2_red=b2_red,b3_green=b3_green,b4_blue=b4_blue))## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 897
## Unobserved stochastic nodes: 8
## Total graph size: 10779
##
## Initializing model
The sampling set-up is similar again:
burn_in = 1
thin = 1
variable_names = c('beta_intercept','beta_nir','beta_red','beta_green','beta_blue','beta_nir_red','beta_blue_green', 'sigma')
steps = 100000
update(model, burn_in)
draw = jags.samples(model, steps, thin=thin, variable.names = variable_names)
# Convert to a list
make_list <- function(draw)
{
results2 = list()
for(name in names(draw))
{
# Extract "chain 1"
results2[[name]] = as.array(draw[[name]][,,1])
# Transpose 2D arrays
if(length(dim(results2[[name]])) == 2)
results2[[name]] = t(results2[[name]])
}
return(results2)
}
results2 = make_list(draw)

| beta_blue | beta_blue_green | beta_green | beta_intercept | beta_nir | beta_nir_red | beta_red | sigma | |
|---|---|---|---|---|---|---|---|---|
| Min. :0.4784 | Min. :-0.0035710 | Min. :-7.1653 | Min. :-38.644 | Min. :0.3902 | Min. :-4.367e-05 | Min. :-2.39708 | Min. :242.6 | |
| 1st Qu.:2.7987 | 1st Qu.:-0.0011480 | 1st Qu.:-5.3436 | 1st Qu.: 3.747 | 1st Qu.:0.4868 | 1st Qu.: 4.349e-04 | 1st Qu.: 0.05318 | 1st Qu.:262.8 | |
| Median :2.9852 | Median :-0.0006934 | Median :-4.9451 | Median : 11.658 | Median :0.4995 | Median : 5.209e-04 | Median : 0.28233 | Median :267.1 | |
| Mean :2.9838 | Mean :-0.0006914 | Mean :-4.9388 | Mean : 11.617 | Mean :0.4995 | Mean : 5.209e-04 | Mean : 0.27983 | Mean :267.2 | |
| 3rd Qu.:3.1693 | 3rd Qu.:-0.0002370 | 3rd Qu.:-4.5312 | 3rd Qu.: 19.559 | 3rd Qu.:0.5123 | 3rd Qu.: 6.070e-04 | 3rd Qu.: 0.50768 | 3rd Qu.:271.4 | |
| Max. :4.2369 | Max. : 0.0023117 | Max. : 0.2695 | Max. : 61.439 | Max. :0.5841 | Max. : 1.167e-03 | Max. : 1.56821 | Max. :295.3 |
Very similar to the simple linear model: the burn-in will remain the same as before, and the thinning should again be around 150. To get a sample of around 5,000 from my 100,000 steps I again thin by 20, which leaves more autocorrelation in the final samples. In practice I would run more steps so the retained samples have as little autocorrelation as possible.
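As an alternative to reading autocorrelation plots, the effective sample size of each raw chain can be estimated with the coda package (an illustrative check, assuming the results2 list from above):

library(coda)

# Effective number of independent draws in each raw chain of 100,000 steps;
# a small value relative to the chain length indicates heavy thinning is needed
sapply(results2[c("beta_nir", "beta_red", "beta_green", "beta_blue")],
       function(chain) effectiveSize(mcmc(chain)))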
| beta_blue | beta_blue_green | beta_green | beta_intercept | beta_nir | beta_nir_red | beta_red | sigma | |
|---|---|---|---|---|---|---|---|---|
| Min. :2.077 | Min. :-0.0027952 | Min. :-6.889 | Min. :-32.569 | Min. :0.4301 | Min. :0.0000370 | Min. :-0.86943 | Min. :246.7 | |
| 1st Qu.:2.803 | 1st Qu.:-0.0011440 | 1st Qu.:-5.340 | 1st Qu.: 3.291 | 1st Qu.:0.4863 | 1st Qu.:0.0004331 | 1st Qu.: 0.05632 | 1st Qu.:262.7 | |
| Median :2.983 | Median :-0.0006795 | Median :-4.948 | Median : 11.227 | Median :0.4993 | Median :0.0005216 | Median : 0.28182 | Median :267.0 | |
| Mean :2.987 | Mean :-0.0006807 | Mean :-4.945 | Mean : 11.299 | Mean :0.4994 | Mean :0.0005223 | Mean : 0.27996 | Mean :267.1 | |
| 3rd Qu.:3.165 | 3rd Qu.:-0.0002384 | 3rd Qu.:-4.537 | 3rd Qu.: 19.326 | 3rd Qu.:0.5120 | 3rd Qu.:0.0006088 | 3rd Qu.: 0.50476 | 3rd Qu.:271.5 | |
| Max. :3.909 | Max. : 0.0017977 | Max. :-2.680 | Max. : 55.482 | Max. :0.5758 | Max. :0.0009891 | Max. : 1.30904 | Max. :288.6 |
Again all the estimates are very close, with only slight deviations between the two models.
##
## Call:
## lm(formula = DryMatter ~ nir * red + green * blue, data = all_clean_centered)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1038.95 -192.58 -6.58 175.33 1000.53
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.6508802 11.7829044 0.989 0.323
## nir 0.5003528 0.0188874 26.491 < 2e-16 ***
## red 0.3020214 0.3330745 0.907 0.365
## green -4.9792709 0.5977240 -8.330 3.03e-16 ***
## blue 2.9995726 0.2745099 10.927 < 2e-16 ***
## nir:red 0.0005193 0.0001276 4.070 5.12e-05 ***
## green:blue -0.0006955 0.0006761 -1.029 0.304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 267 on 890 degrees of freedom
## Multiple R-squared: 0.6107, Adjusted R-squared: 0.6081
## F-statistic: 232.7 on 6 and 890 DF, p-value: < 2.2e-16
We obtain our Bayes factor from the ratio of the integrated likelihoods, where the integrated likelihood of model \(m_{i}\) is:
\[P(y \mid m_{i}) = \int f(y \mid m_{i},\theta_{i})\,\pi(\theta_{i} \mid m_{i})\,d\theta_{i}\]

Here \(m_{i}\) is one of our two models: the Bayesian regression model and the Bayesian regression model with interactions. The Bayes factor then simplifies to:
\[B_{12} = \frac{P(m_{1} \mid y)\,/\,P(m_{2} \mid y)}{P(m_{1})\,/\,P(m_{2})} = \frac{P(y \mid m_{1})}{P(y \mid m_{2})}\]
We can estimate the integrated likelihood by Monte Carlo: draw samples from the prior and average the likelihood over them,

\[\theta_{i}^{(j)} \sim \pi(\theta_{i} \mid m_{i}), \quad j = 1,\dots,N\]

\[P(y \mid m_{i}) \approx \frac{1}{N}\sum^{N}_{j=1} f(y \mid m_{i},\theta_{i}^{(j)})\]

We can now compare our result to the Bayes factor table given by Jeffreys (1961). Our value, falling between -1.1 and 1.1, is "not worth more than a bare mention", meaning the better model is the simpler one, since there is no evidence that the two are meaningfully different. This is evidence in favour of the plain linear model over the model with interactions.
# Linear Model
y_given_m1 = c()
number_rows_m1 = length(mcmc_samples[,1])
for (i in 1:length(mcmc_samples[1,])){
y_given_m1[i] = sum(mcmc_samples[,i][-number_rows_m1])
sigma_given_m1 = mcmc_samples[,i][number_rows_m1]
}
# Interaction model
y_given_m2 = c()
number_rows_m2 = length(mcmc_samples1[,1])
for (i in 1:length(mcmc_samples1[1,])){
y_given_m2[i] = sum(mcmc_samples1[,i][-number_rows_m2])
sigma_given_m2 = mcmc_samples1[,i][number_rows_m2]
}
# log likelihood
loglik_m1 = dnorm(All_Clean$DryMatter, y_given_m1,sigma_given_m1,log=TRUE)
loglik_m2 = dnorm(All_Clean$DryMatter, y_given_m2,sigma_given_m2,log=TRUE)
#
m1 = sum(loglik_m1)/length(All_Clean$DryMatter)
m2 = sum(loglik_m2)/length(All_Clean$DryMatter)
bf = m1/m2
bf

## [1] 0.9820489
Jeffreys (1961)
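For reference, a minimal sketch of the prior-sampling Monte Carlo estimator described by the equations above, written for the no-interaction model and assuming the centred vectors Y, b1_nir, b2_red, b3_green and b4_blue from earlier. This is illustrative only and is not the calculation that produced the value reported above; with priors this vague the raw estimator is extremely noisy, so in practice far more draws (or a different estimator) would be needed.

set.seed(1)
N <- 10000
log_f <- numeric(N)
for (j in 1:N) {
  # Draw one parameter vector from the priors used in the JAGS model
  beta <- rnorm(5, 0, 1 / sqrt(0.00001))  # intercept, nir, red, green, blue
  sig  <- 1 / sqrt(rgamma(1, 0.001, 0.001))
  mu   <- beta[1] + beta[2] * b1_nir + beta[3] * b2_red +
          beta[4] * b3_green + beta[5] * b4_blue
  log_f[j] <- sum(dnorm(Y, mu, sig, log = TRUE))
}
# Log of the Monte Carlo average, computed stably on the log scale
log_py_m1 <- max(log_f) + log(mean(exp(log_f - max(log_f))))
log_py_m1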
In class we went over how we could look at the distribution of the difference and check whether zero falls within the 95% credible interval. As we can see, this confirms the result we found using the Bayes factor: there is evidence to believe the models are not significantly different.
First we plot our data to get a visual understanding of what we are performing the paired t-test on; visually we can see a slight difference between the two sets of estimates. Before we test whether this difference is significant we must make sure our assumptions are met. The data are paired, as the two dry-matter values were derived from the same observations, and our \(n\) is sufficiently large (n > 30). An additional Shapiro-Wilk test gives a p-value < 0.05, which formally rejects exact normality of the differences, but with such a large \(n\) the paired t-test remains reasonable.
The p-value of the paired t-test is less than 2.2e-16, which is below the significance level alpha = 0.05. We therefore reject the null hypothesis and conclude that the ground measurements are significantly different from the satellite estimates.
d = All_Clean$DryMatter - All_Clean$drymatter
shapiro.test(d)

##
## Shapiro-Wilk normality test
##
## data: d
## W = 0.99499, p-value = 0.004746
t.test(All_Clean$DryMatter, All_Clean$drymatter, paired = TRUE, alternative = "two.sided")

##
## Paired t-test
##
## data: All_Clean$DryMatter and All_Clean$drymatter
## t = 9.8319, df = 896, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 72.85716 109.19874
## sample estimates:
## mean of the differences
## 91.02795
In this model we specify two normal distributions (as we assume normality), with \(\mu_{1}\) for the ground DryMatter and \(\mu_{2}\) for the satellite drymatter. The prior for sigma is set on the log scale, \(\log\sigma \sim dunif(-5,5)\), so that sigma is always positive once exponentiated. After running the model I found this prior didn't allow it to converge, as the true sigma evidently lay outside the implied range, so I widened it to \(dunif(-8,8)\). Since the null hypothesis is our default position and is the hypothesis of no difference, we give \(\mu_{1}\) and \(\mu_{2}\) the same prior, with common mean \(m\) and standard deviation \(s\), where \(\log s\) gets a similar but slightly narrower uniform prior than \(\log\sigma\); keeping \(s\) modest lets the two means stay close to one another, which represents the null hypothesis better.
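A small illustrative calculation (not part of the original analysis) of the range of sigma each log-uniform prior allows, which makes the convergence problem plain given that the posterior for sigma ends up near 400:

# Range of sigma implied by log(sigma) ~ Uniform(a, b) is (exp(a), exp(b))
exp(c(-5, 5))  # ~ (0.0067, 148.4)  -- too narrow: excludes sigma near 400
exp(c(-8, 8))  # ~ (0.0003, 2981.0) -- wide enough to contain it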
model = "model
{
for (i in 1:length(x1)) {
x1[i]~dnorm(mu1,(1/(sigma^2)))
x2[i]~dnorm(mu2,(1/(sigma^2)))
}
m ~ dnorm(0, (1/(100^2))) ## N(m,s^2) prior for mu1, mu2
log_s ~ dunif(-3,3) ## m and s^2 are hyper-parameters
s <- exp(log_s)
mu1 ~ dnorm(m, (1/(s^2))) #prior for mu1 and mu2
mu2 ~ dnorm(m, (1/(s^2)))
log_sig ~ dunif(-8,8) #prior for common sigma
sigma<-exp(log_sig)
}
"
data = list(x1=All_Clean$DryMatter, x2=All_Clean$drymatter)
# Variables to monitor
variable_names = c('mu1','mu2','sigma')

Starting off with a small burn-in, then checking it in the following step.
# How many burn-in steps?
burn_in = 1
# How many proper steps?
steps = 100000
# Thinning?
thin = 1
# Write model out to file
fileConn=file("model.temp")
writeLines(model, fileConn)
close(fileConn)
if(all(is.na(data)))
{
m = jags.model(file="model.temp")
} else
{
m = jags.model(file="model.temp", data=data)
}

## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1794
## Unobserved stochastic nodes: 5
## Total graph size: 1815
##
## Initializing model
update(m, burn_in)
draw = jags.samples(m, steps, thin=thin, variable.names = variable_names)
# Convert to a list
make_list <- function(draw)
{
results = list()
for(name in names(draw))
{
# Extract "chain 1"
results[[name]] = as.array(draw[[name]][,,1])
# Transpose 2D arrays
if(length(dim(results[[name]])) == 2)
results[[name]] = t(results[[name]])
}
return(results)
}
results3 = make_list(draw)

We can see our burn-in at 100 is satisfactory and a thinning of 20 looks appropriate.
| mu1 | mu2 | sigma | |
|---|---|---|---|
| Min. :2092 | Min. :2030 | Min. :370.7 | |
| 1st Qu.:2125 | 1st Qu.:2065 | 1st Qu.:388.7 | |
| Median :2133 | Median :2073 | Median :393.1 | |
| Mean :2133 | Mean :2073 | Mean :393.1 | |
| 3rd Qu.:2141 | 3rd Qu.:2081 | 3rd Qu.:397.5 | |
| Max. :2181 | Max. :2122 | Max. :419.7 |
We can see that zero does not fall within our 95% credible interval, indicating that there is evidence to reject the null hypothesis that the means are the same. This confirms what we established using the paired t-test earlier.
| mu_diff | |
|---|---|
| Min. : 5.567 | |
| 1st Qu.: 8.037 | |
| Median : 8.604 | |
| Mean : 8.614 | |
| 3rd Qu.: 9.198 | |
| Max. :11.755 |
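One way such a check can be carried out with the monitored chains is sketched below, assuming the results3 list from above (the mu_diff summarised in the table may have been constructed slightly differently):

# Central 95% credible interval for the difference in means;
# the null value of zero is compared against this interval
quantile(results3$mu1 - results3$mu2, probs = c(0.025, 0.975))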