BioStat Inference

This R Markdown document is for the purpose of introducing statistical inference (modeling).
Some References:
Marin and Robert, 2014. Bayesian Essentials with R.
Rossi et al. 2005. Bayesian Statistics and Marketing
UCx STA02.1ucX Introduction to Bayesian Statistics Using R
https://m-clark.github.io/mixed-models-with-R/
brms: An R Package for Bayesian Multilevel Models using Stan

Preface
This document starts by an overview on probability basics needed for understanding of the Bayes rule for linked (i) events and (ii) random variables.
Next, we use the Beyes rule to infer about parameters of commonly used distribution, normal mixtures, and linear models.

Preparation - Import relevant packages

# if you don't have these packages installed yet, please use the install.packages("package_name") command.
library(tidyverse)
library(knitr)
library(kableExtra)
library(r2symbols)
library(ggpubr)
library(rstan) 
library(brms)
library(MCMCglmm)

For linked events

Lets plot imaginary results from a sample space \(\Omega\), i.e., the set of all possible outcomes of trial(s).
\(B_1\) and \(B_0\) are cause events or sets of outcomes (e.g., ‘heatwave versus no heatwave’ in August) that are mutually exhaustive \([B_1 \cup B_0 = \Omega]\) and mutually exclusive \([B_1 \cap B_0 = \emptyset]\).
\(y_1\) and \(y_0\) are effect events (e.g., ‘seastar morality versus no mortality’).

B1 y1	B1 y1	B0 y1	B0 y1
B1 y1	B1 y1	B0 y1	B0 y0
B1 y1	B1 y1	B0 y0	B0 y0
B1 y1	B1 y0	B0 y0	B0 y0

Contingency table shows probability of all possible events

	y0	y1
B0	P(B0,y0)	P(B0,y1)	P(B0)
B1	P(B1,y0)	P(B1,y1)	P(B1)
	P(y0)	P(y1)

Marginal probability \[P(B_1)=\frac{8}{16} \ \ \ \ \ P(y_1)=\frac{10}{16} \]

Probability of sum \[P(B_1 \cup y_1) = P(B_1) + P(y_1) - P(B_1 \cap y_1) = \frac{11}{16} \] Probability of intersection (Joint probability) \[P(B_1 \cap y_1)=P(B_1,y_1)=P(B_1 \ and \ y_1)= \frac{7}{16}\] \[P(B_1 \cap y_1)=P(y_1) \cdot P(B_1 \mid y_1) \] Conditional probability (prob. of \(B_1\) given \(y_1\)) \[P(B_1 \mid y_1)= \frac{P(B_1 \cap y_1)}{P(y_1)} = \frac{7/16}{11/16}=\frac{7}{11}\] Probabilities of intersections are equal \[P(B_1 \cap y_1)=P(B_1 \mid y_1) \cdot P(y_1)=\frac{7}{16}\] \[P(y_1 \cap B_1)=P(y_1 \mid B_1) \cdot P(B_1)=\frac{7}{16}\] \[P(B_1 \mid y_1) \cdot P(y_1) = P(y_1 \mid B_1) \cdot P(B_1) \]

Bayes rule

\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{P(y_1)}\] Remember that \(B_0\) and \(B_1\) are mutually exhaustive
\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{\sum_i P(y_1 \cap B_i)}\]
\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{\sum_i P(y_1|B_i)P(B_i)}\]

Usually, \(P(y_1 \cap B_1)\), \(P(B_i)\) and \(P(y_i|B_i)\) are known or can be assumed for i= 0, 1. e.g., lets thinks about the ‘heatwave occurs’ and ‘seastar mortality’… and that there is usually a cause and effect relation between the two sets of events.
Bayes formula allows us to update the conditional probability when having new data.

For linked random variables

Random variables have probability density or mass function (pdf or pmf).
A probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes (events) for an experiment (https://en.wikipedia.org/wiki/Probability_distribution).
What was a random variable in the previous example? …

a binary random variable with two possible event outcomes (e.g., \(B_1\) and \(B_0\) in heatwave occurrence) has a Bernoulli distribution. [y-axis: prob. of event x-axis: trial name]
a binomial random variable counting how often a particular event (\(B=B_1\)) occurs in a fixed number of trials has a Binomial distribution. [y-axis: prob. of event x-axis: # of success in n trial]
(NOTE: Bernoulli is Binomial with n=1)

Generally speaking, assuming \(Y\) as a continuous (or discrete) random variable with probability density (or mass) function \(f(y)\) (or \(p_Y(y)\) or \(P(Y=y)\)):

\[f(y) \geq 0 \qquad \forall y\] \[\int_{-\infty}^\infty f(y)dy =1\] Cumulative density function (cdf) \[F(y) = F(Y \leq y) = \int_{-\infty}^y f(y) dy\] The joint distribution \[f(y,\beta) = f(y|\beta)f(\beta) = f(\beta|y)f(y)\] Marginal pdf \[f(y) = \int_{-\infty}^{\infty}f(y,\beta)d\beta\] NOTE: Joint pdf of multiple random variables \(y_1,y_2,y_3,...,y_K\) \[f(y_1,y_2,y_3,...,y_K)=f(y_1|y_2,y_3,...,y_K)f(y_2|y_3,...,y_K)...f(y_K)\]

Bayes formula (or theorem)

For \(\beta\) and \(Y\) as linked random variables, \[f(\beta|y) = \frac{f(y|\beta)f(\beta)}{\int_{\beta}f(y|\beta)f(\beta)d\beta}\] \[f(\beta|y) = \frac{f(y|\beta)f(\beta)}{\int_{B}f(y,\beta)d\beta }\]
Only difference is in denominators which can be explained through the equation for marginal distribution above. \[pdf_{posterior} = \frac{JointLikelihood * pdf_{prior}}{pdf_{marginal}}\]

By now, we had an overview on the definition of the likelihood, that is the conditional probability of effect given cause. Here, the probability is a function of effect on cause. In the frequentist approach, the likelihood is considered as the data generating mechanism.
Bayes rule goes beyond this, and explicitly incorporate prior knowledge about the cause to define the conditional probability (or prob. distribution) of cause given effect. This is also called a posterior probability (or posterior probability distribution) and is usually unknown. In term of two linked events, we can think of it as the probability of fire (cause) given smoke observation (effect).
The Bayesian approach modifies the likelihood function into a posterior distribution, which is a valid probability distribution on cause.
When acquiring new data, we can use the last posterior as the new prior, and get an updated version of the posterior probability.
Importantly, Bayesian approach also provide a predictive distribution for effect given previously observed effect.

Posterior parameter and predictive distributions for count (Poisson) data

Posterior pdf of expected value \(\mu\) (given observed counts \(y\))

We assume \(y\) is poisson distributed. Thus, we have two linked random variables, \(y\) and \(\mu\). \[y|\mu \sim Pois(\mu) \\ E(Pois(\mu))=\mu \\ Var(Pois(\mu))=\mu \] \(\mu\) itself is a random variable, assumed to be Gamma distributed \[\mu|\alpha,\beta \sim Gamma(\alpha,\beta) \\ E(Gamma(\alpha,\beta)) = \frac{\alpha}{\beta} \\ Var(Gamma(\alpha,\beta))=\frac{\alpha}{\beta^2}\]
NOTE: As you noticed we are having assumptions one after the other. Welcome to the world of modeling.

Joint likelihood: Remember that in likelihood, the probability is a function of effect (\(y_i\)) on cause (\(\mu\)). \[f(y|\mu) = f(y_1,...,y_n|\mu) = \prod_if(y_i|\mu)=\prod_i\frac{\mu^{y_i}e^{-\mu}}{y_i!}=\frac{\mu^{\sum_iy_i}e^{-n\mu}}{\prod_iy_i!}\] Prior pdf \[f(\mu|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\mu^{\alpha-1}e^{-\beta\mu} \quad \textrm{for } \mu>0\]
Posterior pdf \[\begin{eqnarray} f(\mu|y,\alpha,\beta) &=& \frac{f(y|\mu)f(\mu|\alpha,\beta)}{\int_0^{\infty} f(y|\mu)f(\mu|\alpha,\beta)d\mu}= \frac{\mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}}{\int_0^{\infty}\mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}d\mu} \end{eqnarray}\]
As the denominator is a constant with respect to \(\mu\),
\[f(\mu|y,a,b) \propto \mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}\] \[\mu|y,\alpha,\beta \sim Gamma(\alpha+\sum_iy_i,\beta+n)\] Thus, gamma distribution is a conjugate prior for the Poisson likelihood.
Importantly, we can then add new data \(\mathbf{y^{\ast}}\) to \(\alpha\) of the posterior Gamma distribution and add the number of new observations \(\mathbf{n^{\ast}}\) to \(\beta\). \[\mu|\mathbf{y^{\ast}},\alpha,\beta \sim Gamma(\alpha+\mathbf{y^{\ast}}, \beta+\mathbf{n^{\ast}})\]
\[E(\mu|\textbf{y},\alpha,\beta) = \frac{\alpha+\sum_iy_i}{\beta+n} = \frac{\alpha+n\bar{y}}{\beta+n}\] As \(n\) increases, bayesian posterior mean \(E(\mu|\textbf{y},\alpha,\beta)\) will approach frequentist mean \(\bar{y}\): \[\lim_{n \rightarrow \infty} E(\mu|\textbf{y},\alpha,\beta) =\bar{y}\]

Practice

Lets practice using mussel recruit abundance data collected in summer 2018. As you see, we have 24 counts (Recruits_n) at Thermal_baseline (0, 1,.., and 6). For now we omit all other columns.

Import mussel abundance dataframe

library(readxl)
df <- read_excel("~/Library/Mobile Documents/com~apple~CloudDocs/KOB_2018_MytilusRecruitment_onBaskets.xlsx")
df=as.data.frame(df)
knitr::kable(head(df)) %>% kable_styling(full_width = F)

Benthocosm	Thermal_baseline	Upwelling	Basket	Recruits_n
E2	0	No	Basket_19	208
E2	0	No	Basket_20	272
A1	1	No	Basket_1	83
A1	1	No	Basket_2	32
C2	2	No	Basket_11	126
C2	2	No	Basket_12	103

assume a prior parameter distribution: as we don’t have any prior information, we use a “vague’ prior: \(\Gamma(\alpha=0.001,\beta=0.001)\)
It is called vague, as it is very weakly held and will be totally change by exposure to new information.
define posterior parameter distribution via Monte Carlo simulation

prior_alpha=0.001
prior_beta=0.001
sim_num <- 10^3 # number of simulations
mu <- rgamma(sim_num, prior_alpha+sum(df$Recruits_n), prior_beta+24)
hist(mu, freq=F, xlab='mu', ylab='Post. pred. density', main='')

mean(mu)

## [1] 120.9296

var(mu)

## [1] 5.221075

qgamma(.025,prior_alpha+sum(df$Recruits_n),prior_beta+24)

## [1] 116.6053

qgamma(.975,prior_alpha+sum(df$Recruits_n),prior_beta+24)

## [1] 125.4053

#for highest density region 
pp <- seq(0,.05,length.out=1000) 
lo <- sapply(pp,qgamma,shape=prior_alpha+sum(df$Recruits_n),rate=prior_beta+24) 
hi <- sapply(pp+.95,qgamma,shape=prior_alpha+sum(df$Recruits_n),rate=prior_beta+24)
pp[which.min(hi-lo)]

## [1] 0.02427427

Posterior predictive pdf (conditional pdf of predicted value \(\tilde{y}\) given observed count data \(y\))

\[f(\tilde{y}|y)\] considering Sum rule: \(f(y)=\sum f(y,\mu)\) \[f(\tilde{y}|y) = \int f(\tilde{y},\mu|y) d\mu\] considering Product rule \(f(y,\mu)=f(\mu|y)f(\mu)\) and conditional independence of \(\mu\) and \(y\)
\[f(\tilde{y}|y) = \int f(\tilde{y}|\mu)f(\mu|y) d\mu\]

Derivation to posterior parameter (or predictive) distribution is not always easy and instead posterior can be estimated via numerical methods.

Practice

Simulate a vector of (\(\mu\)) values from posterior mu distribution.
Considering the data-generating mechanism, Pois(y|mu), simulate or predict a vector of recruitment numbers ‘y_tilde’

set.seed(12345)
# A vector of y-tilde is simulated using values of mu which was simulated above
y_tilde <- rpois(sim_num,mu) 
hist(y_tilde,freq=F, xlab='# of recruits',
     ylab='Post. pred. density', main='')

mean(y_tilde)

## [1] 121.492

quantile(y_tilde,c(.025,.975))

##  2.5% 97.5% 
##    99   145

mean(y_tilde>=140)

## [1] 0.072

mean(y_tilde<=50)

## [1] 0

Pairwise comparison of \(\mu\) between Thermal history levels

prior_alpha=0.001
prior_beta=0.001
sim_num <- 10^3 # number of simulations
df_mu = NULL
# simulate posterior mu distribution given data of each treatment level
for (TB in unique(df$Thermal_baseline)){
  mu <- rgamma(sim_num, prior_alpha+sum(df$Recruits_n[df$Thermal_baseline==TB]),
               prior_beta+length(df$Recruits_n[df$Thermal_baseline==TB]))
  df_mu <- cbind(df_mu, mu)
  colnames(df_mu)[ncol(df_mu)]  <- paste('TB',TB, sep="")
}
df_mu = as.data.frame(df_mu)
knitr::kable(head(df_mu)) %>% kable_styling(full_width = F)

TB0	TB1	TB2	TB3	TB4	TB5
443.3389	104.9427	112.6016	42.34512	2.804348	0.0064171
461.2639	112.5685	108.2508	40.74966	4.293258	0.0002523
447.3130	120.2431	118.6380	45.53929	3.817926	0.0096588
443.0322	112.2095	113.7506	46.72680	3.315271	0.0006233
436.1325	119.9078	118.6365	47.79399	4.957076	0.0275269
450.6076	109.8995	106.0149	47.54148	3.017464	0.0054606

paste("bayesian P-value = ", mean(df_mu$TB1>df_mu$TB2), sep = "")

## [1] "bayesian P-value = 0.576"

df_mu_long = df_mu %>%
  tidyr::pivot_longer( everything(), names_to = 'Thermal baseline',
                       values_to = 'mu' ) 
df_mu_long <- as.data.frame(df_mu_long)
knitr::kable(head(df_mu_long)) %>% kable_styling(full_width = F)

Thermal baseline	mu
TB0	443.3389200
TB1	104.9427125
TB2	112.6015823
TB3	42.3451228
TB4	2.8043480
TB5	0.0064171

gghistogram(df_mu_long, x = "mu", add = "mean", bins = 200, color = "Thermal baseline",
            fill = "Thermal baseline")

Statistical evidence is not enough to support the hypothesis that recruitment was higher in TB2 than TB2.

Posterior parameter distributions for NORMAL data

Posterior pdf of expectation and precision \(\mu\) and \(\tau\) (given observed NORMAL data \(y\))

Here we have three random variables:

\[y|\mu,\tau \sim N(\mu,\tau)\] \[\mu \sim N(\mu_0,\tau_0)\] \[\tau \sim Gamma(\alpha,\beta)\]
Joint likelihood \[\begin{eqnarray} f(y|\mu,\tau)=f(y_1,...,y_n|\mu,\tau) &=& \prod_i\left( \frac{\tau^{1/2}}{\sqrt{2\pi}}\exp\left(-\frac{\tau}{2}\left(y_i-\mu\right)^2\right)\right)\nonumber\\ &=& \frac{\tau^{n/2}}{(2\pi)^{n/2}}\exp\left(-\frac{\tau}{2}\sum_i(y_i-\mu)^2\right)\nonumber\\ &\propto& \tau^{n/2}\exp\left(-\frac{\tau}{2}\sum_i(y_i-\mu)^2\right)\nonumber \end{eqnarray}\] Prior pdf \[f(\mu) = \frac{\tau_0^{1/2}}{\sqrt{2\pi}}\exp\left(-\frac{\tau_0}{2}\left(\mu-\mu_0\right)^2\right) \propto \exp\left(-\frac{\tau_0}{2}\left(\mu-\mu_0\right)^2\right)\]

\[f(\tau) = \frac{\beta^\alpha}{\Gamma(\alpha)}\tau^{\alpha-1}\exp(-\beta\tau)\propto \tau^{a-1}\exp(-\beta\tau)\]

Posterior pdf \[\mu|\tau,y \sim N\left( \frac{n\bar{y}\tau+\mu_0\tau_0}{n\tau+\tau_0}, n\tau+\tau_0 \right)\]

\[\tau|\mu,y \sim Gamma\left(\alpha+n/2,\beta+\frac{1}{2}\sum_i(y_i-\mu)^2\right)\]
Again at this step, think about what happens on the expectation of these two posterior pdf when the sample size goes to infinity…

Here the simulation is a bit different. First, we need to set an initial value of tau to sample a new mu given the initial tau, then sample a new tau given the obtained mu and so on. Then we can repeat such a chain by setting an initial value for mu. This method of sampling from conditional posterior distributions, one by one, is called the Gibbs sampler, a kind of Markov Chain Monte Carlo (MCMC) sampling method.

Practice

Lets practice using mussel growth rate data collected in summer 2018. We focus on data of shell length growth (SL_Growth_mm_d) at various Thermal_baselines (0, 1,.., and 6). For now we omit all other columns.

Import mussel abundance dataframe

path = paste("~/Library/Mobile Documents/com~apple~CloudDocs/", sep="")
df <- read_excel(paste(path,"KOB_2018_mussel_growth_99outliers_removed.xlsx",sep=""))

## New names:
## * `` -> ...1

df = as.data.frame(df)
df = select(df, Thermal_baseline, SL_Growth_mm_d)
knitr::kable(head(df)) %>% kable_styling(full_width = F)

Thermal_baseline	SL_Growth_mm_d
0	0.1917188
0	0.2417188
0	0.2229688
0	0.2385938
0	0.2432812
0	0.2401562

Remember that, prior parameter values shall not be selected based on data while initial parameter values can be selected based on the data. Later we will discuss how prior parameter values can be chosen not to be totally off from data.

# define y
y <- df$SL_Growth_mm_d
n = length(y)

# assume prior parameters
prior_mu=0
prior_tau=0.0001
prior_alpha=0.001
prior_beta=0.001

## MCMC 
# setting the seed
set.seed(123)
# number of iterations:
sim_num <- 10^3
# preparing empty vectors to later store the sampled values
mu.sample <- tau.sample <- numeric(sim_num)
# assigning starting values
mu <- mean(y); tau <- 1/var(y)
for(i in 1:sim_num){
  mu <- mu.sample[i] <-rnorm(1, (sum(y)*tau+prior_mu*prior_tau)/(n*tau+prior_tau),
                             1/sqrt(n*tau+prior_tau))
  tau <- tau.sample[i] <- rgamma(1, prior_alpha+n/2,
                                prior_beta+0.5*sum((y-mu.sample[i])^2))
}
par(mfrow=c(2,2))
plot(mu.sample,ty='l',xlab='iteration',ylab=expression(mu))
plot(density(mu.sample),ty='l',ylab='density',xlab=expression(mu),lwd=2,main='')
plot(tau.sample,ty='l',xlab='iteration',ylab=expression(tau))
plot(density(tau.sample),ty='l',ylab='density',xlab=expression(tau),lwd=2,main='')

As our simulation get more complicated, we need to check if the script work fine. One efficient way is to first simulate data from known arbitrary parameters, and then test how MCMC estimated parameters relate to real values.

Bayesian cluster analysis: Posterior distributions for Gaussian mixture data (uni-variate, with a known # of components)

Assume \(y\) (e.g., the organism size) has a Gaussian mixture distribution (e.g., because it includes size values of organisms of various cohorts).
Let \(y_i\) denote an observation and \(z_i\) is the type (component) of that observation. Note: \(z_i\) is unobserved or latent variable, but we know that there are \(K\) components in total.
Using the law of total probability, we can define \(\pi_k\) that is the mixture proportions or weights. \[f(y_i) = \sum_{k=1}^K f(y_i|z_i=k)\underbrace{f(z_i=k)}_{\pi_k} = \sum_{k=1}^K f(y_i|z_i=k)\pi_k\] A uni-variate Gaussian mixture has four random variables: \[f(y_i) = \sum_{k=1}^K f(y_i|z_i=k, \mu_k,\tau_k) \pi_k\] Robert & Marin 2014: It is actually relevant to distinguish the weights \(\pi_k\) from the other parameters in that they are solely associated with the missing- data structure of the model, while the others are related to the observations. … the inferential (parametric modeling) goal associated with a sample from a mixture of distributions may be either to (i) reconstitute the groups by estimating the missing component, an operation usually called classification (or clustering), (ii) to provide estimators for the parameters of the different groups, or even (iii) to estimate the number k of groups … In addition, from the semi-parametric modeling perspective, If k is large enough, there is theoretical support for the argument that the latter equation provides a good approximation (in some functional sense) to most distributions.

MCMC Solutions Mixture Gibbs Sampler Initialization: https://www.youtube.com/watch?v=zVwFPDsxsH0 1. Initialization: Choose \(\pi_k^{(0)}\) and \(\mu_k^{(0)},\tau_k^{(0)}\) arbitrarily.
2. for iterations t (t ≥ 1):
3. calculate \(\pi_{ik}=f(z_i=k|y_i,\mu_k,\tau_k)∝f(y_i|\mu_k^{(t−1)},\tau_k^{(t-1)})\pi_k^{(t−1)}\)
4. sample \(z_i^{(t)} \sim Mul(\pi_{i1},..., \pi_{iK})\)
5. calculate \(n_k, \mu_k^{(t)}, \tau_k^{(t)}\)
6. sample \(\tau_k \sim f(\tau_k | y, z)\)
7. sample \(\mu_k \sim f(\mu_k | \tau_k, y, z)\)
8. sample \(\pi_k \sim f(\pi_k | z)\)
(1) You generate random Gaussian distributions for the components. (3) Then, based on those distributions, each of data points is assigned with probabilities of being from the components. (4) From this multinomial distribution, we can sample component labels for all data points. (5) Based on labeled data, we calculate sample statistics for each component. (6-8) Form and sample from parameter posteriors. And repeat from 3.

# eval=FALSE, include=FALSE}
################# An example using 'bmixture' library
library(bmixture)

# first for simulated data, from mixture of Normal with 3 components
n_ = 500
weight_ = c( 0.3, 0.5, 0.2 )
mean_ = c( 0 , 10 , 3 )
sd_ = c( 1 , 1 , 1 )
# rmixnorm is random generation function for a finite mixture of uni-variate Normal distribution.
data_ = rmixnorm( n = n_, weight = weight_, mean = mean_, sd = sd_ ) 

# plot for simulation data and the exact model
hist( data_, prob = TRUE, nclass = 30, col = "gray" )
x = seq( -20, 20, 0.05 )
# dmixnorm is density function for a finite mixture of uni-variate Normal distribution
densmixnorm = dmixnorm( x, weight_, mean_, sd_ ) 
lines( x, densmixnorm, lwd = 2 )

# Runing bdmcmc algorithm for simulated data
bmixnorm.obj = bmixnorm( data_, k=3, iter = 5000, trace = F ) # bmixnorm function consists of several sampling algorithms for Bayesian estimation for finite mixture of Normal distributions.
summary( bmixnorm.obj )

## 
## Number of mixture components =  3
## Estimated 'pi'  =  0.3 0.2 0.5
## Estimated 'mu'  =  -0.05 2.79 10.02
## Estimated 'sig' =  0.66 1.16 0.97

pi_sample_chain = lapply(bmixnorm.obj['mu_sample'], function(x) x[,1])
plot(seq(1, length(unlist(pi_sample_chain))),unlist(pi_sample_chain, use.names = F))

# Runing bdmcmc algorithm for our shell length data
path = paste("~/Library/Mobile Documents/com~apple~CloudDocs/", sep="")
df1 <- read_excel(paste(
  path,"KOB_2018_mussel_growth_99outliers_removed.xlsx",sep=""))

## New names:
## * `` -> ...1

df1 = as.data.frame(df1)
data2 = as.numeric(df1$SL_mm)

library(bmixture)
data2 = as.numeric(df1$SL_mm)
bmixnorm.obj = bmixnorm( data2, k = 2, iter = 5000, trace = F )#, pi.start=c(3,4) )
#summary( bmixnorm.obj )

# the plot seems merged (the two dist. are not clear)
#plot(bmixnorm.obj)

# plotting the two identified mixed Gaussian
set.seed(12) #better make this reproducible
observations <- tibble(value = c(
  rnorm(n = 1000, mean = 23.4, sd = 5.18), #the first normal distribution
  rnorm(n = 1000, mean = 26.02, sd = 8.63) #this second distribution is double the size of the first
  ))
ggplot(observations, aes(x = value)) +
  geom_histogram(binwidth = 0.5) +
  mapply(
    function(mean, sd, lambda, n, binwidth) {
      stat_function(fun = function(x) {(dnorm(x, mean = mean, sd = sd)) * n * binwidth * lambda})
    },
    mean = c(23.4, 26.02), #mean
    sd = c(5.18, 8.63), #standard deviation
    lambda = c(0.62, 0.38), #amplitude
    n = length(observations$value), #sample size
    binwidth = 0.5 #binwidth used for histogram
  )

Bayesian statistical inference is the gold standard for uncertainty estimation

Zero-inflated binomial model

The first object of inference is the posterior pdf of the parameters of a zero-inflated binomial pdf
A binomial distributed random variable \(y\) represent ‘# of success in n Bernoulli trials’.
Lets assume that \(y_i\) is # of success in 7 barnacle sampling trials in Site i, and the aim is to find the posterior pdf of the parameter p (i.e., prob. of success in a random sampling trial):

set.seed(12345)
# define a y vector (for 100 sites)
y <- rep(0:6,c(22,6,18,23,18,10,3))
hist(y)

As you can see, there are usually many sites with zero success in barnacle sampling, simply because there are sites not populated by barnacle at all due to whatever reason (not receiving enough recruits and etc.).
Such zero inflation are not considered in a binomial distribution.
Thus, we have two objects of inference:
(i) the posterior pdf of \(p\) (i.e., the prob. of success in a random sampling trial) for the sites that such possibility is not zero.
(ii) the probability of a random site having non-zero prob. of success.
Thus, a latent variable \(z\) need to be introduced that labels whether barnacle sampling can or cannot be successful on each of sites. In the posterior parameter pdf, we include \(z=1\) as a condition.
Lets look immediately into the Gibbs sampler: [NOTE: derivations not included]
(1) We know by now that the posterior pdf of \(p\) is a Beta-binomial conjugate model (a binomial joint likelihood multiplied by a beta prior). By assigning initial values to \(z_i\), we can start sampling from this posterior, in the loop.
\[p|y,z=1 \sim Beta\left(a+\sum_iy_i, b+7\sum_iz_i-\sum_iy_i\right)\]
(2) We also sample from the posterior pdf of \(\omega\) [\(\omega\) is the parameter of the latent (labeling) variable \(z\) Bernoulli distribution; \(\omega\) represent the prob. of a random site having non-zero prob. of barnacle sampling success]. The posterior \(\omega\) pdf is also a Beta-binomial (or Beta-Bernoulli) conjugate model. As we assigned initial values to \(z_i\), we can also start sampling from this posterior in the loop.
\[\omega|z \sim Beta\left(a_{\omega}+\sum_iz_i, b_{\omega}+n-\sum_iz_i\right)\]
(3) Finally, we need a probability distribution for \(z_i\) being equal to 1, given \(y_i=0\). As we already sampled \(p\) and \(\omega\), the next step in the loop will be sampling a new set of labels from this Bernoulli distribution:
\[Pr(z_i=1|y_i=0,p,\omega) = \frac{(1-p)^7\omega}{(1-\omega)+(1-p)^7\omega}\]

set.seed(12345)
n = length(y)
# number of iterations
ITER <- 10^3
# monitor vectors that will be filled by samples from posterior distributions
mon.p <- numeric(ITER) 
mon.omega <- numeric(ITER) 
# a matrix that will be filled by each set of labels sampled from z, we need :
mon.z <- array(dim=c(n,ITER))
# assigning initial values to z
z <- (y>=1)*1

# starting Gibbs sampler:
for(iter in 1:ITER){
  # (i) sampling p from a beta distribution. REMEMBER, p is the prob. of success on a random barnacle sampling trial
  p <- rbeta(1,1+sum(y),1+7*sum(z)-sum(y))
  # (ii) sampling omega from a beta distribution. REMEMBER, omega is the prob. of a random site having non-zero prob. of barnacle sampling success
  omega <- rbeta(1,1+sum(z),1+n-sum(z))
  # (iii) defining the Bernoulli distribution for the probability that z=1, given y=0
  prob.z.equal1 <- 1*(y>=1)+(1-p)^7*omega/(1-omega+(1-p)^7*omega)*(y==0)
  # sampling labels from the Bernoulli distribution
  z <- rbinom(n,1,prob.z.equal1)
  
  # update the monitors with the new values
  mon.p[iter] <- p
  mon.omega[iter] <- omega
  mon.z[,iter] <- z
}

# checking the posterior samples:
par(mfrow=c(2,2))
plot(mon.p,ty='l')
plot(density(mon.p),main='')
plot(mon.omega,ty='l')
plot(density(mon.omega),main='')

# producing posterior summaries
mean(mon.omega); quantile(mon.omega,c(.025,.975))

## [1] 0.7877691

##      2.5%     97.5% 
## 0.7031661 0.8704294

mean(mon.p); quantile(mon.p,c(.025,.975))

## [1] 0.4517731

##      2.5%     97.5% 
## 0.4107704 0.4910918

The sampling can succeed in ca. 78% of sites ( CI: 70-86%). In those sites, the prob. of success in random barnacle sampling is 0.45 (CI: 0.40-0.49).

The second object of inference is the posterior predictive pdf for our zero-inflated binomial variable \(y\)

# preparing posterior predictive sample of 0-frequency of success
post.pred.z0 <- numeric(ITER)
# Gibbs sampler for predictive pdf
for(iter in 1:ITER){
  z.post.pred <- rbinom(100,1,omega)
  y.post.pred <- rbinom(100,7,p)
  y.post.pred[z.post.pred==0] <- 0
  post.pred.z0[iter] <- sum(y.post.pred==0)
}
hist(y,breaks=seq(-.25,7.25,.5), col='lightblue', ylim=c(0,40),
     main='', xlab='# of success in n sampling trials')
# new zero-inflated binomial model prediction
arrows(0-.1,quantile(post.pred.z0,.025),
       0-.1,quantile(post.pred.z0,.975),
       angle=90,lwd=2,col='black',code=3,length=.1)
points(0-.1,mean(post.pred.z0),pch=16,col='black',cex=2)

Generalised Linear (Additive) Mixed Model (or “Hierarchical Models”, “Multilevel Models”, “Random Effects Models”)

A D-distributed response variable \(y\) predicted by an inverse link function \(f\) of a linear predictor parameter \(\eta\) and another parameter \(\delta\).
\[ y_i = D(f(\eta_i, \delta)) \ \ \ \ \ \ \ \ \ \eta = X \theta + Z \xi \]
\(\delta\) is usually assumed not to vary across data points (e.g., the standard deviation or precision in Gaussian models or the shape parameter in Gamma or negative binomial models).
An inverse link function (e.g., \(log()\)) make linear predictors map to predicted values that are on a different scale. \(\theta\) and \(\xi\) are the population-level and group-level regression parameters (fixed and random effects from the frequentist perspective).
\(X\) and \(Z\) are the corresponding design matrices, which with \(y\) make up the data.

Gaussian Linear Model

A Gaussian distributed response variable \(y\) predicted by identity link function of a linear predictor \(\mu\) and precision \(\tau\):
\[ y = N(I(\mu, \tau)) \ \ \ \ \ \ \ \ \ \mu = X \theta \]
Other formulation of Normal Linear Model
Matrix form: \[ y = X \theta + \epsilon \]
Vector form: \[ \hat{y_i} = \theta_0 + \theta_1^T x_i \]
Scalar form: \[ y_i = \theta_0 + \sum_{j=1}^p \theta_j x_{i,j} + \epsilon_i \]

With single predictor

likelihood (NOT joint likelihood) \[y_i|\mu_i,\tau \sim N(\mu_i,\tau)\] \[\mu_i = \theta_0 + \theta_1 x_i\]
Prior \[\theta_0 \sim N(\mu_{\theta_0},\tau_{\theta_0})\] \[\theta_1 \sim N(\mu_{\theta_1},\tau_{\theta_1})\] \[\tau \sim Gamma(\alpha,\beta)\]

As always, the objects of inference are the posterior pdf of parameters and predicted response
Here, the derivation of posterior parameter and predictive distributions are more complicated. As long as such derivations are possible, we can use Gibbs sampler to obtain samples from posterior distribution of \(\theta_0\), \(\theta_1\), and \(\tau\) respectively and create posterior mean estimates.

Practice

Lets practice again using mussel growth rate data collected in summer 2018. We focus on data of shell length growth (SL_Growth_mm_d) at various Thermal_baselines (0, 1,.., and 6). We use the function MCMCglmm.

# vague prior parameters, i.e., means and variances for slope and intercept; the Gamma prior of tau is set by default.
prior <- list(B=list(mu=c(0,0), V=c(100^2,100^2)*diag(2)))
mod_1 <- MCMCglmm(SL_Growth_mm_d ~ Thermal_baseline, data=df, nitt=11000,
                  burnin=1000, thin = 10, prior=prior, verbose=F)

print(summary(mod_1))

## 
##  Iterations = 1001:10991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: -2046.068 
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units  0.001679 0.001505 0.001873     1000
## 
##  Location effects: SL_Growth_mm_d ~ Thermal_baseline 
## 
##                  post.mean  l-95% CI  u-95% CI eff.samp  pMCMC    
## (Intercept)       0.162624  0.157044  0.168163     1000 <0.001 ***
## Thermal_baseline -0.004694 -0.006863 -0.002822     1000 <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# str(mod_1$Sol)
plot(mod_1$Sol)

mean(mod_1$Sol[,2] < 0) # test in significance of the slope

## [1] 1

pred_df <- data.frame(Thermal_baseline=seq(0,5,0.1), SL_Growth_mm_d=0)
# need a value for SL_Growth_mm_d for the package to work.

# estimating mean population SL_Growth_mm_d:
SLG_mean <- predict(mod_1,newdata=pred_df,type='response',interval='confidence')

The second object of inference is the posterior predictive pdf of the response variable

# posterior predictive pdf predicting individual SL_Growth_mm_d ()
SLG_ind <- predict(mod_1,newdata=pred_df,type='response',interval='prediction')
print(head(pred_df))

##   Thermal_baseline SL_Growth_mm_d
## 1              0.0              0
## 2              0.1              0
## 3              0.2              0
## 4              0.3              0
## 5              0.4              0
## 6              0.5              0

pred_df$SLG_mean_mn <- SLG_mean[,1]
pred_df$SLG_mean_lo <- SLG_mean[,2]
pred_df$SLG_mean_hi <- SLG_mean[,3]
pred_df$SLG_ind_mn <- SLG_ind[,1]
pred_df$SLG_ind_lo <- SLG_ind[,2]
pred_df$SLG_ind_hi <- SLG_ind[,3]

apatheme=theme_bw(base_size = 11, base_family = "sans")+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line())
ggplot()+
  # data
  geom_point(data=df, aes(x=Thermal_baseline, y=SL_Growth_mm_d)) + 
  # the actual regression line:
  geom_line(data=pred_df, aes(x=Thermal_baseline,y=SLG_mean_mn),
            col='red',size=1.5) +
  # posterior 95% credible interval for the mean population SLG
  geom_ribbon(data=pred_df,
              aes(x=Thermal_baseline, ymin=SLG_mean_lo, ymax=SLG_mean_hi),
              fill='orange',alpha=0.6) +
  # posterior predictive 95% credible envelope for the individual SLG
  geom_ribbon(data=pred_df,
              aes(x=Thermal_baseline,ymin=SLG_ind_lo,ymax=SLG_ind_hi),fill='blue',
              alpha=0.2) +
  xlab('Thermal baseline') + 
  ylab('SL Growth mm/d') + apatheme

Linear regression model with single predictor and a change point

The main object of inference is the posterior pdf of the change point parameter
In Bayesian linear regression model, the change point is just a parameter. We can assign a prior distribution to it, combine it with the likelihood, and get its posterior distribution.
However, as the posterior change-point distribution cannot be derived analytically, the Gibbs sampler cannot be used to find the mean and credible interval (numerically). Remember that the main objects of Bayesian inference are (i) the posterior distribution of parameters and (ii) the posterior predictive distribution of the response variable. These are usually hard to found analytically. Instead, it is usually easier to do this numerically. One of the most widely used sampling methods is Metropolis-Hastings Algorithm.
To understand and use the Metropolis-Hastings Algorithm, we should understand the acceptance ratio. The acceptance ratio become 1, when we know the posterior distribution:
\[ R = \frac{f(y|\theta^{\text{new}})f(\theta^{\textrm{new}})}{f(y|\theta)f(\theta)}\frac{q(\theta|\theta^{\textrm{new}})}{q(\theta^{\textrm{new}}|\theta)} \] \[ = \frac{f(y|\theta^{\text{new}})f(\theta^{\textrm{new}})}{f(y|\theta)f(\theta)}\frac{\frac{f(y|\theta)f(\theta)}{f(y)}}{\frac{f(y|\theta^{\text{new}})f(\theta^{\text{new}})}{f(y)}} = 1 \]
Metropolis Step: If the proposal distribution is symmetric, i.e., \(q(\theta|\theta^{\textrm{new}})=q(\theta^{\textrm{new}}|\theta)\), then the proposal ratio become \(\frac{q(\theta|\theta^{\textrm{new}})}{q(\theta^{\textrm{new}}|\theta)} = 1\).
First, lets see a simple example of using the Metropolis-Hastings Algorithm: Estimating the Normal Mean

set.seed(123)
# generate data
n <- 100
y <- rnorm(n,2,sd=2)
# vague priors
mu0 <- 0; tau0 <- 10^{-4}
alpha <- 0.01; beta <- 0.01
# number of iterations
ITER <- 10^3
# setting up the monitors
mon.mu <- mon.tau <- numeric(ITER)
# initializing the algorithm: initial values for mu and for tau
mu <- mean(y)
tau <- 1/var(y)
# deciding on the width of the proposal distribution for mu
delta <- .5

## the algorithm with Metropolis step for mu, Gibbs step for tau:
for(iter in 1:ITER){
  #### STEP for MU
  # propose new value
  mu.new <- rnorm(1, mu, sd=delta)
  # evaluate the log of the acceptance ratio
  logR <- sum(dnorm(y,mu.new,1/sqrt(tau), log=T))- #new likelihood
          sum(dnorm(y,mu    ,1/sqrt(tau), log=T))+ #old likelihood
          dnorm(mu.new,mu0,1/sqrt(tau0),log=T)- # new prior
          dnorm(mu    ,mu0,1/sqrt(tau0),log=T) # old prior
  # accept with probability min(R,1)
  # draw U from uniform(0,1) and accept if logU < logR
  logU <- log(runif(1,0,1))
  if(logU < logR){mu <- mu.new}
  #### STEP for TAU
  tau <- rgamma(1,alpha+n/2,beta+.5*sum((y-mu)^2))
  #### UPDATING THE MONITORS:
  mon.mu[iter] <- mu
  mon.tau[iter] <- tau
}

par(mfrow=c(2,1))
plot(mon.mu,ty='l',ylab=expression(mu))
plot(1/sqrt(mon.tau),ty='l',ylab='sd')

cor(mon.mu[-1],mon.mu[-ITER])

## [1] 0.6739411

## NOTE 1: To select theta with probability p, we sampled 'u' from a uniform distribution and checked whether u<p. Generally, a binary random variable (x presenting theta accept or not) with the success probability p can be simulated by checking whether a uniform random variable u is smaller than p [i.e., x=(u<p)*1].
# we used runif instead of rbinom to generate random binary values
?runif()
set.seed(12345)
u <- runif(10)
x <- (u<0.75)*1
rbinom(10, 1, 0.75)

##  [1] 1 1 1 1 1 1 1 1 1 0

## NOTE 2: Very small and very large numbers can be manipulated with higher accuracy on log-scale.
x <- .01234^1000
print(x^{1/1000})

## [1] 0

logx <- 1000*log(.01234)
print(exp(logx/1000))

## [1] 0.01234

Now, lets use the Metropolis-Hastings Algorithm to solve change point problem

################## changing point data simulation and numerical algorithm
## generate some data
Alpha0 <- 0; Alpha1 <- 1; Tau <- 1/.1^2;
CP <- 25.5
set.seed(12345)
d <- data.frame(t=1:100)
d$y <- rnorm(dim(d)[1],Alpha0+Alpha1*(d$t>CP),1/sqrt(Tau))
plot(d,ty='b',pch=16,col='blue')

## Set up the algorithm
# Number of iterations
ITER <- 1500
# monitor vectors
mon.alpha0 <- mon.alpha1 <- mon.tau <- mon.cp <- numeric(ITER)
# initial values (can be based on the data)
alpha0 <- mean(d$y); alpha1 <- mean(d$y); tau <- 1/var(d$y); cp <- median(d$t)
# Gibbs steps
for(iter in 1:ITER){
  alpha0 <- rnorm(1,sum(d$y[d$t <  cp])*tau/(sum(d$t <  cp)*tau+10^(-8)),
                  1/sqrt(sum(d$t <  cp)*tau+10^(-8)))
  alpha1 <- rnorm(1,sum(d$y[d$t >= cp])*tau/(sum(d$t >= cp)*tau+10^(-8)),
                  1/sqrt(sum(d$t >= cp)*tau+10^(-8)))
  tau <-rgamma(1, 0.01+dim(d)[1]/2,
                  0.01+0.5*sum(((d$y-alpha0)^2)[d$t < cp]) +0.5*sum(((d$y-alpha1)^2)[d$t >= cp]))
  
# metropolis step 
# draw a new value from the proposal distribution with some sd
cp.new <- rnorm(1,cp,sd=0.2)
logR <- 
  # DIFFERENCE IN LOG-LIKELIHOODS:
  # given new value of cp
  (sum(dnorm(d$y[d$t < cp.new], alpha0, 1/sqrt(tau),log=T)) +
  sum(dnorm(d$y[d$t >= cp.new], alpha1, 1/sqrt(tau),log=T))) -
  # given old value of cp
  (sum(dnorm(d$y[d$t < cp], alpha0, 1/sqrt(tau),log=T)) +
  sum(dnorm(d$y[d$t >= cp], alpha1, 1/sqrt(tau),log=T)))+
  # DIFFERENCE IN LOG-PRIORS FOR CP:
  0-
  # LOG of PROPOSAL RATIO
  0
# We want to accept the new cp value based on the R probability. For that we draw a random number U between 0 and 1 using runif; and if it's log is less than log(R), we keep cp.
logU <- log(runif(1,0,1))
if(logU < logR){cp <- cp.new}

# updating the monitors
mon.alpha0[iter] <- alpha0
mon.alpha1[iter] <- alpha1
mon.tau[iter] <- tau
mon.cp[iter] <- cp
}

par(mfrow=c(2,2))
plot(mon.alpha0,ty='l')
plot(mon.alpha1,ty='l')
plot(mon.tau,ty='l')
plot(mon.cp,ty='l')

round(mean(mon.cp[501:1500]),4)

## [1] 32.3363

round(quantile(mon.cp[501:1500],c(0.025,.975)),4)

##    2.5%   97.5% 
## 25.0295 45.6382

There are also models, which use a rather complicated reversible jump Markov Chain Monte Carlo algorithm, which make the number of change points into a parameter. You can then evaluate the posterior probability that it was exactly one change point or that there have been at least two.
If the precision parameter of the proposal distribution is selected too large, this algorithm may converge quicker, but it tend to stay at the same value for a rather long time. That’s because the newly proposed values tend to be too far from the current one to be accepted. In contrast when it is too small, it may converge finally but it will take lots of iterations. Just meandering up and down far away from the correct value.
Just a NOTE: There exist algorithms which adapt the proposal distribution automatically.

Optional but important techniques

Monte Carlo integration

Unlike grid-based approximations, such as trapezoidal rule, Monte Carlo integration does not suffer from the curse of dimensionality.

Inverse sampling, rejection sampling, and importance sampling

Sampling from a pdf is essential to bayesian inference. Together, the methods allow to sample and thus obtain expected values, variances etc. for any distribution with a known pdf.
Inverse sampling Algorithm
Cumulative probability (of a random variable) is always uniform distributed. Therefore, we can sample from uniform distribution \(U(0,1)\) and turn the sampled values to the random variable, if the inverse of the cumulative distribution function (cdf) is known (shortcoming: not many cdfs are invertible).
Consider a random variable \(y\) with the pdf \(f(y)\) and the cdf \(F(y)\) such that its inverse \(F^-1(y)\) can be analytically derived.
The inverse sampling algorithm then consists of the following two steps:
(i) Sample \(u\) from the uniform distribution \(U(0,1)\).
(ii) Evaluate \(y = F^{-1}(u)\).
(iii) The resulting will follow the distribution with pdf \(f(x)\).

rejection sampling and importance sampling
Allow us to use the existing sampling algorithms to sample from distributions for which such algorithms do not exist. They are both less efficient than inverse sampling. The closer the substitute distribution is to the objective, the more efficient are the algorithms.

Gaussian Linear Mixed Model

A Gaussian distributed response variable \(y\) predicted by an identity link function of linear predictor paramter \(\mu\) and precision parameter \(\tau\):
\[ y = N(I(\mu, \tau)) \ \ \ \ \ \ \ \ \ \mu = X \theta + Z \xi \]
Other formulation of the Model (only with random-intercept)
Matrix form: \[ y = X \theta + Z\xi + \epsilon \]
Vector form: \[ \hat{y_i} = \theta_0 + \theta_1^T x_i + z_i \xi_{ID} \]
Scalar form: \[ y_i = \theta_0 + \sum_{j=1}^p \theta_j x_{i,j} + \xi_{\text{ID}_i} + \epsilon_i \]

With one predictor

likelihood (NOT joint likelihood) \[y_i|\mu_i,\tau \sim N(\mu_i,\tau)\] Random effect is considered as ID specific deviation of the response from the population intercept (\(\xi_{\text{ID}_i}\)).
Including this partly resolve the measurement dependency issue caused by hierarchical, longitudinal, or repeated measurement.
\[\mu_i = \theta_0 + \theta_1 x_i + \xi_{\text{ID}_i}\]
Prior \[\theta_0 \sim N(\mu_{\theta_0},\tau_{\theta_0})\] \[\theta_1 \sim N(\mu_{\theta_1},\tau_{\theta_1})\] \[\tau \sim Gamma(\alpha,\beta)\]
Other priors shall be considered for the random effect (\(j=1,...,J\)): \[\xi_j \sim N(0,\tau_{\xi})\] \[\tau_{\xi} \sim Gamma(\alpha_{\xi},\beta_{\xi})\]

Lets simulate data via arbitrary definition of all components of a simple linear mixed model:

### We simulate 10 measurements collected over time (covariate x) for each of 6 IDs (total # of measurements is 60)
set.seed(1234)
nID = 4                           # number of groups or IDs
ID = factor(rep(1:nID, each=5))       # ID of measurements
n = length(ID)                           # total # of measurements

# parameters
intercept = 3                              # fixed effects
slope = .75

sigma = 1                                  # residual sd
tau = .5                                   # re sd
u = rnorm(nID, mean=0, sd=tau)             # random effects
e = rnorm(n, mean=0, sd=sigma)             # residual error

# data
x = rnorm(n)                               # covariate
y = intercept + slope*x + u[ID] + e    # see model 1
d = data.frame(x, y, ID=ID)

# matrix form
X = cbind(1, x)
knitr::kable(X ,  caption = "matrix X")

matrix X
	x
1	-0.6937202
1	-1.4482049
1	0.5747557
1	-1.0236557
1	-0.0151383
1	-0.9359486
1	1.1022975
1	-0.4755931
1	-0.7094400
1	-0.5012581
1	-1.6290935
1	-1.1676193
1	-2.1800396
1	-1.3409932
1	-0.2942939
1	-0.4658975
1	1.4494963
1	-1.0686427
1	-0.8553646
1	-0.2806230

teta = c(intercept, slope)
knitr::kable(teta ,  caption = "vector teta")

vector teta
x
3.00
0.75

Z = model.matrix(~-1+ID)            
knitr::kable(Z ,  caption = "matrix Z")

matrix Z
ID1	ID2	ID3	ID4
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1

y = X%*%teta + Z%*%u + e
knitr::kable(y ,  caption = "vector y2")

vector y2
2.3053016
1.8163693
2.2527940
1.0820935
1.8206614
1.5467153
3.4882451
1.7836334
1.8303807
2.8272299
3.2798945
2.5562206
1.3961813
1.6252803
2.4843285
3.8935632
3.0483616
0.5349832
0.7450798
2.0762733

library(lme4)

## 
## Attaching package: 'lme4'

## The following object is masked from 'package:brms':
## 
##     ngrps

lmeMod = lmer(y ~ x + (1|ID), data=d)

## boundary (singular) fit: see help('isSingular')

summary(lmeMod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (1 | ID)
##    Data: d
## 
## REML criterion at convergence: 49.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6768 -0.4886 -0.1065  0.3941  2.0933 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. 
##  ID       (Intercept) 6.399e-17 7.999e-09
##  Residual             6.709e-01 8.191e-01
## Number of obs: 20, groups:  ID, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   2.3880     0.2231  10.706
## x             0.4488     0.2129   2.108
## 
## Correlation of Fixed Effects:
##   (Intr)
## x 0.571 
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Lets model some real data

BioStat Inference

Jahangir Vajedsamiei

4/4/2022

Preparation - Import relevant packages

For linked events

Bayes rule

For linked random variables

Bayes formula (or theorem)

Posterior parameter and predictive distributions for count (Poisson) data

Posterior pdf of expected value \(\mu\) (given observed counts \(y\))

Practice

Posterior predictive pdf (conditional pdf of predicted value \(\tilde{y}\) given observed count data \(y\))

Practice

Posterior parameter distributions for NORMAL data

Posterior pdf of expectation and precision \(\mu\) and \(\tau\) (given observed NORMAL data \(y\))

Practice

Bayesian cluster analysis: Posterior distributions for Gaussian mixture data (uni-variate, with a known # of components)

Zero-inflated binomial model

Generalised Linear (Additive) Mixed Model (or “Hierarchical Models”, “Multilevel Models”, “Random Effects Models”)

Gaussian Linear Model

With single predictor

Practice

Linear regression model with single predictor and a change point

Optional but important techniques

Monte Carlo integration

Inverse sampling, rejection sampling, and importance sampling

Gaussian Linear Mixed Model

With one predictor

ID1	ID2	ID3	ID4
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1

ID1	ID2	ID3	ID4
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1

ID1	ID2	ID3	ID4
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
1	0	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	1	0	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	1	0
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1
0	0	0	1