This R Markdown document is for the purpose of introducing statistical inference (modeling).
Some References:
Marin and Robert, 2014. Bayesian Essentials with R.
Rossi et al. 2005. Bayesian Statistics and Marketing
UCx STA02.1ucX Introduction to Bayesian Statistics Using R
https://m-clark.github.io/mixed-models-with-R/
brms: An R Package for Bayesian Multilevel Models using Stan
Preface
This document starts by an overview on probability basics needed for understanding of the Bayes rule for linked (i) events and (ii) random variables.
Next, we use the Beyes rule to infer about parameters of commonly used distribution, normal mixtures, and linear models.
# if you don't have these packages installed yet, please use the install.packages("package_name") command.
library(tidyverse)
library(knitr)
library(kableExtra)
library(r2symbols)
library(ggpubr)
library(rstan)
library(brms)
library(MCMCglmm)
Lets plot imaginary results from a sample space \(\Omega\), i.e., the set of all possible outcomes of trial(s).
\(B_1\) and \(B_0\) are cause events or sets of outcomes (e.g., ‘heatwave versus no heatwave’ in August) that are mutually exhaustive \([B_1 \cup B_0 = \Omega]\) and mutually exclusive \([B_1 \cap B_0 = \emptyset]\).
\(y_1\) and \(y_0\) are effect events (e.g., ‘seastar morality versus no mortality’).
| B1 y1 | B1 y1 | B0 y1 | B0 y1 |
| B1 y1 | B1 y1 | B0 y1 | B0 y0 |
| B1 y1 | B1 y1 | B0 y0 | B0 y0 |
| B1 y1 | B1 y0 | B0 y0 | B0 y0 |
Contingency table shows probability of all possible events
| y0 | y1 | ||
| B0 | P(B0,y0) | P(B0,y1) | P(B0) |
| B1 | P(B1,y0) | P(B1,y1) | P(B1) |
| P(y0) | P(y1) |
Marginal probability \[P(B_1)=\frac{8}{16} \ \ \ \ \ P(y_1)=\frac{10}{16} \]
Probability of sum \[P(B_1 \cup y_1) = P(B_1) + P(y_1) - P(B_1 \cap y_1) = \frac{11}{16} \] Probability of intersection (Joint probability) \[P(B_1 \cap y_1)=P(B_1,y_1)=P(B_1 \ and \ y_1)= \frac{7}{16}\] \[P(B_1 \cap y_1)=P(y_1) \cdot P(B_1 \mid y_1) \] Conditional probability (prob. of \(B_1\) given \(y_1\)) \[P(B_1 \mid y_1)= \frac{P(B_1 \cap y_1)}{P(y_1)} = \frac{7/16}{11/16}=\frac{7}{11}\] Probabilities of intersections are equal \[P(B_1 \cap y_1)=P(B_1 \mid y_1) \cdot P(y_1)=\frac{7}{16}\] \[P(y_1 \cap B_1)=P(y_1 \mid B_1) \cdot P(B_1)=\frac{7}{16}\] \[P(B_1 \mid y_1) \cdot P(y_1) = P(y_1 \mid B_1) \cdot P(B_1) \]
\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{P(y_1)}\] Remember that \(B_0\) and \(B_1\) are mutually exhaustive
\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{\sum_i P(y_1 \cap B_i)}\]
\[P(B_1 \mid y_1) =\frac {P(y_1 \mid B_1) \cdot P(B_1) }{\sum_i P(y_1|B_i)P(B_i)}\]
Usually, \(P(y_1 \cap B_1)\), \(P(B_i)\) and \(P(y_i|B_i)\) are known or can be assumed for i= 0, 1. e.g., lets thinks about the ‘heatwave occurs’ and ‘seastar mortality’… and that there is usually a cause and effect relation between the two sets of events.
Bayes formula allows us to update the conditional probability when having new data.
Random variables have probability density or mass function (pdf or pmf).
A probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes (events) for an experiment (https://en.wikipedia.org/wiki/Probability_distribution).
What was a random variable in the previous example? …
Generally speaking, assuming \(Y\) as a continuous (or discrete) random variable with probability density (or mass) function \(f(y)\) (or \(p_Y(y)\) or \(P(Y=y)\)):
\[f(y) \geq 0 \qquad \forall y\] \[\int_{-\infty}^\infty f(y)dy =1\] Cumulative density function (cdf) \[F(y) = F(Y \leq y) = \int_{-\infty}^y f(y) dy\] The joint distribution \[f(y,\beta) = f(y|\beta)f(\beta) = f(\beta|y)f(y)\] Marginal pdf \[f(y) = \int_{-\infty}^{\infty}f(y,\beta)d\beta\] NOTE: Joint pdf of multiple random variables \(y_1,y_2,y_3,...,y_K\) \[f(y_1,y_2,y_3,...,y_K)=f(y_1|y_2,y_3,...,y_K)f(y_2|y_3,...,y_K)...f(y_K)\]
For \(\beta\) and \(Y\) as linked random variables, \[f(\beta|y) = \frac{f(y|\beta)f(\beta)}{\int_{\beta}f(y|\beta)f(\beta)d\beta}\] \[f(\beta|y) = \frac{f(y|\beta)f(\beta)}{\int_{B}f(y,\beta)d\beta }\]
Only difference is in denominators which can be explained through the equation for marginal distribution above. \[pdf_{posterior} = \frac{JointLikelihood * pdf_{prior}}{pdf_{marginal}}\]
By now, we had an overview on the definition of the likelihood, that is the conditional probability of effect given cause. Here, the probability is a function of effect on cause. In the frequentist approach, the likelihood is considered as the data generating mechanism.
Bayes rule goes beyond this, and explicitly incorporate prior knowledge about the cause to define the conditional probability (or prob. distribution) of cause given effect. This is also called a posterior probability (or posterior probability distribution) and is usually unknown. In term of two linked events, we can think of it as the probability of fire (cause) given smoke observation (effect).
The Bayesian approach modifies the likelihood function into a posterior distribution, which is a valid probability distribution on cause.
When acquiring new data, we can use the last posterior as the new prior, and get an updated version of the posterior probability.
Importantly, Bayesian approach also provide a predictive distribution for effect given previously observed effect.
We assume \(y\) is poisson distributed. Thus, we have two linked random variables, \(y\) and \(\mu\). \[y|\mu \sim Pois(\mu) \\ E(Pois(\mu))=\mu \\ Var(Pois(\mu))=\mu \] \(\mu\) itself is a random variable, assumed to be Gamma distributed \[\mu|\alpha,\beta \sim Gamma(\alpha,\beta) \\ E(Gamma(\alpha,\beta)) = \frac{\alpha}{\beta} \\ Var(Gamma(\alpha,\beta))=\frac{\alpha}{\beta^2}\]
NOTE: As you noticed we are having assumptions one after the other. Welcome to the world of modeling.
Joint likelihood: Remember that in likelihood, the probability is a function of effect (\(y_i\)) on cause (\(\mu\)). \[f(y|\mu) = f(y_1,...,y_n|\mu) = \prod_if(y_i|\mu)=\prod_i\frac{\mu^{y_i}e^{-\mu}}{y_i!}=\frac{\mu^{\sum_iy_i}e^{-n\mu}}{\prod_iy_i!}\] Prior pdf \[f(\mu|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\mu^{\alpha-1}e^{-\beta\mu} \quad \textrm{for } \mu>0\]
Posterior pdf \[\begin{eqnarray} f(\mu|y,\alpha,\beta) &=& \frac{f(y|\mu)f(\mu|\alpha,\beta)}{\int_0^{\infty} f(y|\mu)f(\mu|\alpha,\beta)d\mu}= \frac{\mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}}{\int_0^{\infty}\mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}d\mu} \end{eqnarray}\]
As the denominator is a constant with respect to \(\mu\),
\[f(\mu|y,a,b) \propto \mu^{\alpha+\sum_iy_i-1}e^{-(\beta+n)\mu}\] \[\mu|y,\alpha,\beta \sim Gamma(\alpha+\sum_iy_i,\beta+n)\] Thus, gamma distribution is a conjugate prior for the Poisson likelihood.
Importantly, we can then add new data \(\mathbf{y^{\ast}}\) to \(\alpha\) of the posterior Gamma distribution and add the number of new observations \(\mathbf{n^{\ast}}\) to \(\beta\). \[\mu|\mathbf{y^{\ast}},\alpha,\beta \sim Gamma(\alpha+\mathbf{y^{\ast}}, \beta+\mathbf{n^{\ast}})\]
\[E(\mu|\textbf{y},\alpha,\beta) = \frac{\alpha+\sum_iy_i}{\beta+n} = \frac{\alpha+n\bar{y}}{\beta+n}\] As \(n\) increases, bayesian posterior mean \(E(\mu|\textbf{y},\alpha,\beta)\) will approach frequentist mean \(\bar{y}\): \[\lim_{n \rightarrow \infty} E(\mu|\textbf{y},\alpha,\beta) =\bar{y}\]
Lets practice using mussel recruit abundance data collected in summer 2018. As you see, we have 24 counts (Recruits_n) at Thermal_baseline (0, 1,.., and 6). For now we omit all other columns.
Import mussel abundance dataframe
library(readxl)
df <- read_excel("~/Library/Mobile Documents/com~apple~CloudDocs/KOB_2018_MytilusRecruitment_onBaskets.xlsx")
df=as.data.frame(df)
knitr::kable(head(df)) %>% kable_styling(full_width = F)
| Benthocosm | Thermal_baseline | Upwelling | Basket | Recruits_n |
|---|---|---|---|---|
| E2 | 0 | No | Basket_19 | 208 |
| E2 | 0 | No | Basket_20 | 272 |
| A1 | 1 | No | Basket_1 | 83 |
| A1 | 1 | No | Basket_2 | 32 |
| C2 | 2 | No | Basket_11 | 126 |
| C2 | 2 | No | Basket_12 | 103 |
prior_alpha=0.001
prior_beta=0.001
sim_num <- 10^3 # number of simulations
mu <- rgamma(sim_num, prior_alpha+sum(df$Recruits_n), prior_beta+24)
hist(mu, freq=F, xlab='mu', ylab='Post. pred. density', main='')
mean(mu)
## [1] 120.9296
var(mu)
## [1] 5.221075
qgamma(.025,prior_alpha+sum(df$Recruits_n),prior_beta+24)
## [1] 116.6053
qgamma(.975,prior_alpha+sum(df$Recruits_n),prior_beta+24)
## [1] 125.4053
#for highest density region
pp <- seq(0,.05,length.out=1000)
lo <- sapply(pp,qgamma,shape=prior_alpha+sum(df$Recruits_n),rate=prior_beta+24)
hi <- sapply(pp+.95,qgamma,shape=prior_alpha+sum(df$Recruits_n),rate=prior_beta+24)
pp[which.min(hi-lo)]
## [1] 0.02427427
\[f(\tilde{y}|y)\] considering Sum rule: \(f(y)=\sum f(y,\mu)\) \[f(\tilde{y}|y) = \int f(\tilde{y},\mu|y) d\mu\] considering Product rule \(f(y,\mu)=f(\mu|y)f(\mu)\) and conditional independence of \(\mu\) and \(y\)
\[f(\tilde{y}|y) = \int f(\tilde{y}|\mu)f(\mu|y) d\mu\]
Derivation to posterior parameter (or predictive) distribution is not always easy and instead posterior can be estimated via numerical methods.
set.seed(12345)
# A vector of y-tilde is simulated using values of mu which was simulated above
y_tilde <- rpois(sim_num,mu)
hist(y_tilde,freq=F, xlab='# of recruits',
ylab='Post. pred. density', main='')
mean(y_tilde)
## [1] 121.492
quantile(y_tilde,c(.025,.975))
## 2.5% 97.5%
## 99 145
mean(y_tilde>=140)
## [1] 0.072
mean(y_tilde<=50)
## [1] 0
Pairwise comparison of \(\mu\) between Thermal history levels
prior_alpha=0.001
prior_beta=0.001
sim_num <- 10^3 # number of simulations
df_mu = NULL
# simulate posterior mu distribution given data of each treatment level
for (TB in unique(df$Thermal_baseline)){
mu <- rgamma(sim_num, prior_alpha+sum(df$Recruits_n[df$Thermal_baseline==TB]),
prior_beta+length(df$Recruits_n[df$Thermal_baseline==TB]))
df_mu <- cbind(df_mu, mu)
colnames(df_mu)[ncol(df_mu)] <- paste('TB',TB, sep="")
}
df_mu = as.data.frame(df_mu)
knitr::kable(head(df_mu)) %>% kable_styling(full_width = F)
| TB0 | TB1 | TB2 | TB3 | TB4 | TB5 |
|---|---|---|---|---|---|
| 443.3389 | 104.9427 | 112.6016 | 42.34512 | 2.804348 | 0.0064171 |
| 461.2639 | 112.5685 | 108.2508 | 40.74966 | 4.293258 | 0.0002523 |
| 447.3130 | 120.2431 | 118.6380 | 45.53929 | 3.817926 | 0.0096588 |
| 443.0322 | 112.2095 | 113.7506 | 46.72680 | 3.315271 | 0.0006233 |
| 436.1325 | 119.9078 | 118.6365 | 47.79399 | 4.957076 | 0.0275269 |
| 450.6076 | 109.8995 | 106.0149 | 47.54148 | 3.017464 | 0.0054606 |
paste("bayesian P-value = ", mean(df_mu$TB1>df_mu$TB2), sep = "")
## [1] "bayesian P-value = 0.576"
df_mu_long = df_mu %>%
tidyr::pivot_longer( everything(), names_to = 'Thermal baseline',
values_to = 'mu' )
df_mu_long <- as.data.frame(df_mu_long)
knitr::kable(head(df_mu_long)) %>% kable_styling(full_width = F)
| Thermal baseline | mu |
|---|---|
| TB0 | 443.3389200 |
| TB1 | 104.9427125 |
| TB2 | 112.6015823 |
| TB3 | 42.3451228 |
| TB4 | 2.8043480 |
| TB5 | 0.0064171 |
gghistogram(df_mu_long, x = "mu", add = "mean", bins = 200, color = "Thermal baseline",
fill = "Thermal baseline")
Statistical evidence is not enough to support the hypothesis that recruitment was higher in TB2 than TB2.
Here we have three random variables:
\[y|\mu,\tau \sim N(\mu,\tau)\] \[\mu \sim N(\mu_0,\tau_0)\] \[\tau \sim Gamma(\alpha,\beta)\]
Joint likelihood \[\begin{eqnarray} f(y|\mu,\tau)=f(y_1,...,y_n|\mu,\tau) &=& \prod_i\left( \frac{\tau^{1/2}}{\sqrt{2\pi}}\exp\left(-\frac{\tau}{2}\left(y_i-\mu\right)^2\right)\right)\nonumber\\ &=& \frac{\tau^{n/2}}{(2\pi)^{n/2}}\exp\left(-\frac{\tau}{2}\sum_i(y_i-\mu)^2\right)\nonumber\\ &\propto& \tau^{n/2}\exp\left(-\frac{\tau}{2}\sum_i(y_i-\mu)^2\right)\nonumber \end{eqnarray}\] Prior pdf \[f(\mu) = \frac{\tau_0^{1/2}}{\sqrt{2\pi}}\exp\left(-\frac{\tau_0}{2}\left(\mu-\mu_0\right)^2\right) \propto \exp\left(-\frac{\tau_0}{2}\left(\mu-\mu_0\right)^2\right)\]
\[f(\tau) = \frac{\beta^\alpha}{\Gamma(\alpha)}\tau^{\alpha-1}\exp(-\beta\tau)\propto \tau^{a-1}\exp(-\beta\tau)\]
Posterior pdf \[\mu|\tau,y \sim N\left( \frac{n\bar{y}\tau+\mu_0\tau_0}{n\tau+\tau_0}, n\tau+\tau_0 \right)\]
\[\tau|\mu,y \sim Gamma\left(\alpha+n/2,\beta+\frac{1}{2}\sum_i(y_i-\mu)^2\right)\]
Again at this step, think about what happens on the expectation of these two posterior pdf when the sample size goes to infinity…
Here the simulation is a bit different. First, we need to set an initial value of tau to sample a new mu given the initial tau, then sample a new tau given the obtained mu and so on. Then we can repeat such a chain by setting an initial value for mu. This method of sampling from conditional posterior distributions, one by one, is called the Gibbs sampler, a kind of Markov Chain Monte Carlo (MCMC) sampling method.
Lets practice using mussel growth rate data collected in summer 2018. We focus on data of shell length growth (SL_Growth_mm_d) at various Thermal_baselines (0, 1,.., and 6). For now we omit all other columns.
Import mussel abundance dataframe
path = paste("~/Library/Mobile Documents/com~apple~CloudDocs/", sep="")
df <- read_excel(paste(path,"KOB_2018_mussel_growth_99outliers_removed.xlsx",sep=""))
## New names:
## * `` -> ...1
df = as.data.frame(df)
df = select(df, Thermal_baseline, SL_Growth_mm_d)
knitr::kable(head(df)) %>% kable_styling(full_width = F)
| Thermal_baseline | SL_Growth_mm_d |
|---|---|
| 0 | 0.1917188 |
| 0 | 0.2417188 |
| 0 | 0.2229688 |
| 0 | 0.2385938 |
| 0 | 0.2432812 |
| 0 | 0.2401562 |
Remember that, prior parameter values shall not be selected based on data while initial parameter values can be selected based on the data. Later we will discuss how prior parameter values can be chosen not to be totally off from data.
# define y
y <- df$SL_Growth_mm_d
n = length(y)
# assume prior parameters
prior_mu=0
prior_tau=0.0001
prior_alpha=0.001
prior_beta=0.001
## MCMC
# setting the seed
set.seed(123)
# number of iterations:
sim_num <- 10^3
# preparing empty vectors to later store the sampled values
mu.sample <- tau.sample <- numeric(sim_num)
# assigning starting values
mu <- mean(y); tau <- 1/var(y)
for(i in 1:sim_num){
mu <- mu.sample[i] <-rnorm(1, (sum(y)*tau+prior_mu*prior_tau)/(n*tau+prior_tau),
1/sqrt(n*tau+prior_tau))
tau <- tau.sample[i] <- rgamma(1, prior_alpha+n/2,
prior_beta+0.5*sum((y-mu.sample[i])^2))
}
par(mfrow=c(2,2))
plot(mu.sample,ty='l',xlab='iteration',ylab=expression(mu))
plot(density(mu.sample),ty='l',ylab='density',xlab=expression(mu),lwd=2,main='')
plot(tau.sample,ty='l',xlab='iteration',ylab=expression(tau))
plot(density(tau.sample),ty='l',ylab='density',xlab=expression(tau),lwd=2,main='')
As our simulation get more complicated, we need to check if the script work fine. One efficient way is to first simulate data from known arbitrary parameters, and then test how MCMC estimated parameters relate to real values.
Assume \(y\) (e.g., the organism size) has a Gaussian mixture distribution (e.g., because it includes size values of organisms of various cohorts).
Let \(y_i\) denote an observation and \(z_i\) is the type (component) of that observation. Note: \(z_i\) is unobserved or latent variable, but we know that there are \(K\) components in total.
Using the law of total probability, we can define \(\pi_k\) that is the mixture proportions or weights. \[f(y_i) = \sum_{k=1}^K f(y_i|z_i=k)\underbrace{f(z_i=k)}_{\pi_k} = \sum_{k=1}^K f(y_i|z_i=k)\pi_k\] A uni-variate Gaussian mixture has four random variables: \[f(y_i) = \sum_{k=1}^K f(y_i|z_i=k, \mu_k,\tau_k) \pi_k\] Robert & Marin 2014: It is actually relevant to distinguish the weights \(\pi_k\) from the other parameters in that they are solely associated with the missing- data structure of the model, while the others are related to the observations. … the inferential (parametric modeling) goal associated with a sample from a mixture of distributions may be either to (i) reconstitute the groups by estimating the missing component, an operation usually called classification (or clustering), (ii) to provide estimators for the parameters of the different groups, or even (iii) to estimate the number k of groups … In addition, from the semi-parametric modeling perspective, If k is large enough, there is theoretical support for the argument that the latter equation provides a good approximation (in some functional sense) to most distributions.
MCMC Solutions Mixture Gibbs Sampler Initialization: https://www.youtube.com/watch?v=zVwFPDsxsH0 1. Initialization: Choose \(\pi_k^{(0)}\) and \(\mu_k^{(0)},\tau_k^{(0)}\) arbitrarily.
2. for iterations t (t ≥ 1):
3. calculate \(\pi_{ik}=f(z_i=k|y_i,\mu_k,\tau_k)∝f(y_i|\mu_k^{(t−1)},\tau_k^{(t-1)})\pi_k^{(t−1)}\)
4. sample \(z_i^{(t)} \sim Mul(\pi_{i1},..., \pi_{iK})\)
5. calculate \(n_k, \mu_k^{(t)}, \tau_k^{(t)}\)
6. sample \(\tau_k \sim f(\tau_k | y, z)\)
7. sample \(\mu_k \sim f(\mu_k | \tau_k, y, z)\)
8. sample \(\pi_k \sim f(\pi_k | z)\)
(1) You generate random Gaussian distributions for the components. (3) Then, based on those distributions, each of data points is assigned with probabilities of being from the components. (4) From this multinomial distribution, we can sample component labels for all data points. (5) Based on labeled data, we calculate sample statistics for each component. (6-8) Form and sample from parameter posteriors. And repeat from 3.
# eval=FALSE, include=FALSE}
################# An example using 'bmixture' library
library(bmixture)
# first for simulated data, from mixture of Normal with 3 components
n_ = 500
weight_ = c( 0.3, 0.5, 0.2 )
mean_ = c( 0 , 10 , 3 )
sd_ = c( 1 , 1 , 1 )
# rmixnorm is random generation function for a finite mixture of uni-variate Normal distribution.
data_ = rmixnorm( n = n_, weight = weight_, mean = mean_, sd = sd_ )
# plot for simulation data and the exact model
hist( data_, prob = TRUE, nclass = 30, col = "gray" )
x = seq( -20, 20, 0.05 )
# dmixnorm is density function for a finite mixture of uni-variate Normal distribution
densmixnorm = dmixnorm( x, weight_, mean_, sd_ )
lines( x, densmixnorm, lwd = 2 )
# Runing bdmcmc algorithm for simulated data
bmixnorm.obj = bmixnorm( data_, k=3, iter = 5000, trace = F ) # bmixnorm function consists of several sampling algorithms for Bayesian estimation for finite mixture of Normal distributions.
summary( bmixnorm.obj )
##
## Number of mixture components = 3
## Estimated 'pi' = 0.3 0.2 0.5
## Estimated 'mu' = -0.05 2.79 10.02
## Estimated 'sig' = 0.66 1.16 0.97
pi_sample_chain = lapply(bmixnorm.obj['mu_sample'], function(x) x[,1])
plot(seq(1, length(unlist(pi_sample_chain))),unlist(pi_sample_chain, use.names = F))
# Runing bdmcmc algorithm for our shell length data
path = paste("~/Library/Mobile Documents/com~apple~CloudDocs/", sep="")
df1 <- read_excel(paste(
path,"KOB_2018_mussel_growth_99outliers_removed.xlsx",sep=""))
## New names:
## * `` -> ...1
df1 = as.data.frame(df1)
data2 = as.numeric(df1$SL_mm)
library(bmixture)
data2 = as.numeric(df1$SL_mm)
bmixnorm.obj = bmixnorm( data2, k = 2, iter = 5000, trace = F )#, pi.start=c(3,4) )
#summary( bmixnorm.obj )
# the plot seems merged (the two dist. are not clear)
#plot(bmixnorm.obj)
# plotting the two identified mixed Gaussian
set.seed(12) #better make this reproducible
observations <- tibble(value = c(
rnorm(n = 1000, mean = 23.4, sd = 5.18), #the first normal distribution
rnorm(n = 1000, mean = 26.02, sd = 8.63) #this second distribution is double the size of the first
))
ggplot(observations, aes(x = value)) +
geom_histogram(binwidth = 0.5) +
mapply(
function(mean, sd, lambda, n, binwidth) {
stat_function(fun = function(x) {(dnorm(x, mean = mean, sd = sd)) * n * binwidth * lambda})
},
mean = c(23.4, 26.02), #mean
sd = c(5.18, 8.63), #standard deviation
lambda = c(0.62, 0.38), #amplitude
n = length(observations$value), #sample size
binwidth = 0.5 #binwidth used for histogram
)
Bayesian statistical inference is the gold standard for uncertainty estimation
The first object of inference is the posterior pdf of the parameters of a zero-inflated binomial pdf
A binomial distributed random variable \(y\) represent ‘# of success in n Bernoulli trials’.
Lets assume that \(y_i\) is # of success in 7 barnacle sampling trials in Site i, and the aim is to find the posterior pdf of the parameter p (i.e., prob. of success in a random sampling trial):
set.seed(12345)
# define a y vector (for 100 sites)
y <- rep(0:6,c(22,6,18,23,18,10,3))
hist(y)
As you can see, there are usually many sites with zero success in barnacle sampling, simply because there are sites not populated by barnacle at all due to whatever reason (not receiving enough recruits and etc.).
Such zero inflation are not considered in a binomial distribution.
Thus, we have two objects of inference:
(i) the posterior pdf of \(p\) (i.e., the prob. of success in a random sampling trial) for the sites that such possibility is not zero.
(ii) the probability of a random site having non-zero prob. of success.
Thus, a latent variable \(z\) need to be introduced that labels whether barnacle sampling can or cannot be successful on each of sites. In the posterior parameter pdf, we include \(z=1\) as a condition.
Lets look immediately into the Gibbs sampler: [NOTE: derivations not included]
(1) We know by now that the posterior pdf of \(p\) is a Beta-binomial conjugate model (a binomial joint likelihood multiplied by a beta prior). By assigning initial values to \(z_i\), we can start sampling from this posterior, in the loop.
\[p|y,z=1 \sim Beta\left(a+\sum_iy_i, b+7\sum_iz_i-\sum_iy_i\right)\]
(2) We also sample from the posterior pdf of \(\omega\) [\(\omega\) is the parameter of the latent (labeling) variable \(z\) Bernoulli distribution; \(\omega\) represent the prob. of a random site having non-zero prob. of barnacle sampling success]. The posterior \(\omega\) pdf is also a Beta-binomial (or Beta-Bernoulli) conjugate model. As we assigned initial values to \(z_i\), we can also start sampling from this posterior in the loop.
\[\omega|z \sim Beta\left(a_{\omega}+\sum_iz_i, b_{\omega}+n-\sum_iz_i\right)\]
(3) Finally, we need a probability distribution for \(z_i\) being equal to 1, given \(y_i=0\). As we already sampled \(p\) and \(\omega\), the next step in the loop will be sampling a new set of labels from this Bernoulli distribution:
\[Pr(z_i=1|y_i=0,p,\omega) = \frac{(1-p)^7\omega}{(1-\omega)+(1-p)^7\omega}\]
set.seed(12345)
n = length(y)
# number of iterations
ITER <- 10^3
# monitor vectors that will be filled by samples from posterior distributions
mon.p <- numeric(ITER)
mon.omega <- numeric(ITER)
# a matrix that will be filled by each set of labels sampled from z, we need :
mon.z <- array(dim=c(n,ITER))
# assigning initial values to z
z <- (y>=1)*1
# starting Gibbs sampler:
for(iter in 1:ITER){
# (i) sampling p from a beta distribution. REMEMBER, p is the prob. of success on a random barnacle sampling trial
p <- rbeta(1,1+sum(y),1+7*sum(z)-sum(y))
# (ii) sampling omega from a beta distribution. REMEMBER, omega is the prob. of a random site having non-zero prob. of barnacle sampling success
omega <- rbeta(1,1+sum(z),1+n-sum(z))
# (iii) defining the Bernoulli distribution for the probability that z=1, given y=0
prob.z.equal1 <- 1*(y>=1)+(1-p)^7*omega/(1-omega+(1-p)^7*omega)*(y==0)
# sampling labels from the Bernoulli distribution
z <- rbinom(n,1,prob.z.equal1)
# update the monitors with the new values
mon.p[iter] <- p
mon.omega[iter] <- omega
mon.z[,iter] <- z
}
# checking the posterior samples:
par(mfrow=c(2,2))
plot(mon.p,ty='l')
plot(density(mon.p),main='')
plot(mon.omega,ty='l')
plot(density(mon.omega),main='')
# producing posterior summaries
mean(mon.omega); quantile(mon.omega,c(.025,.975))
## [1] 0.7877691
## 2.5% 97.5%
## 0.7031661 0.8704294
mean(mon.p); quantile(mon.p,c(.025,.975))
## [1] 0.4517731
## 2.5% 97.5%
## 0.4107704 0.4910918
The sampling can succeed in ca. 78% of sites ( CI: 70-86%). In those sites, the prob. of success in random barnacle sampling is 0.45 (CI: 0.40-0.49).
The second object of inference is the posterior predictive pdf for our zero-inflated binomial variable \(y\)
# preparing posterior predictive sample of 0-frequency of success
post.pred.z0 <- numeric(ITER)
# Gibbs sampler for predictive pdf
for(iter in 1:ITER){
z.post.pred <- rbinom(100,1,omega)
y.post.pred <- rbinom(100,7,p)
y.post.pred[z.post.pred==0] <- 0
post.pred.z0[iter] <- sum(y.post.pred==0)
}
hist(y,breaks=seq(-.25,7.25,.5), col='lightblue', ylim=c(0,40),
main='', xlab='# of success in n sampling trials')
# new zero-inflated binomial model prediction
arrows(0-.1,quantile(post.pred.z0,.025),
0-.1,quantile(post.pred.z0,.975),
angle=90,lwd=2,col='black',code=3,length=.1)
points(0-.1,mean(post.pred.z0),pch=16,col='black',cex=2)
A D-distributed response variable \(y\) predicted by an inverse link function \(f\) of a linear predictor parameter \(\eta\) and another parameter \(\delta\).
\[ y_i = D(f(\eta_i, \delta)) \ \ \ \ \ \ \ \ \ \eta = X \theta + Z \xi \]
\(\delta\) is usually assumed not to vary across data points (e.g., the standard deviation or precision in Gaussian models or the shape parameter in Gamma or negative binomial models).
An inverse link function (e.g., \(log()\)) make linear predictors map to predicted values that are on a different scale. \(\theta\) and \(\xi\) are the population-level and group-level regression parameters (fixed and random effects from the frequentist perspective).
\(X\) and \(Z\) are the corresponding design matrices, which with \(y\) make up the data.
A Gaussian distributed response variable \(y\) predicted by identity link function of a linear predictor \(\mu\) and precision \(\tau\):
\[ y = N(I(\mu, \tau)) \ \ \ \ \ \ \ \ \ \mu = X \theta \]
Other formulation of Normal Linear Model
Matrix form: \[ y = X \theta + \epsilon \]
Vector form: \[ \hat{y_i} = \theta_0 + \theta_1^T x_i \]
Scalar form: \[ y_i = \theta_0 + \sum_{j=1}^p \theta_j x_{i,j} + \epsilon_i \]
likelihood (NOT joint likelihood) \[y_i|\mu_i,\tau \sim N(\mu_i,\tau)\] \[\mu_i = \theta_0 + \theta_1 x_i\]
Prior \[\theta_0 \sim N(\mu_{\theta_0},\tau_{\theta_0})\] \[\theta_1 \sim N(\mu_{\theta_1},\tau_{\theta_1})\] \[\tau \sim Gamma(\alpha,\beta)\]
As always, the objects of inference are the posterior pdf of parameters and predicted response
Here, the derivation of posterior parameter and predictive distributions are more complicated. As long as such derivations are possible, we can use Gibbs sampler to obtain samples from posterior distribution of \(\theta_0\), \(\theta_1\), and \(\tau\) respectively and create posterior mean estimates.
Lets practice again using mussel growth rate data collected in summer 2018. We focus on data of shell length growth (SL_Growth_mm_d) at various Thermal_baselines (0, 1,.., and 6). We use the function MCMCglmm.
# vague prior parameters, i.e., means and variances for slope and intercept; the Gamma prior of tau is set by default.
prior <- list(B=list(mu=c(0,0), V=c(100^2,100^2)*diag(2)))
mod_1 <- MCMCglmm(SL_Growth_mm_d ~ Thermal_baseline, data=df, nitt=11000,
burnin=1000, thin = 10, prior=prior, verbose=F)
print(summary(mod_1))
##
## Iterations = 1001:10991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: -2046.068
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 0.001679 0.001505 0.001873 1000
##
## Location effects: SL_Growth_mm_d ~ Thermal_baseline
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 0.162624 0.157044 0.168163 1000 <0.001 ***
## Thermal_baseline -0.004694 -0.006863 -0.002822 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# str(mod_1$Sol)
plot(mod_1$Sol)
mean(mod_1$Sol[,2] < 0) # test in significance of the slope
## [1] 1
pred_df <- data.frame(Thermal_baseline=seq(0,5,0.1), SL_Growth_mm_d=0)
# need a value for SL_Growth_mm_d for the package to work.
# estimating mean population SL_Growth_mm_d:
SLG_mean <- predict(mod_1,newdata=pred_df,type='response',interval='confidence')
The second object of inference is the posterior predictive pdf of the response variable
# posterior predictive pdf predicting individual SL_Growth_mm_d ()
SLG_ind <- predict(mod_1,newdata=pred_df,type='response',interval='prediction')
print(head(pred_df))
## Thermal_baseline SL_Growth_mm_d
## 1 0.0 0
## 2 0.1 0
## 3 0.2 0
## 4 0.3 0
## 5 0.4 0
## 6 0.5 0
pred_df$SLG_mean_mn <- SLG_mean[,1]
pred_df$SLG_mean_lo <- SLG_mean[,2]
pred_df$SLG_mean_hi <- SLG_mean[,3]
pred_df$SLG_ind_mn <- SLG_ind[,1]
pred_df$SLG_ind_lo <- SLG_ind[,2]
pred_df$SLG_ind_hi <- SLG_ind[,3]
apatheme=theme_bw(base_size = 11, base_family = "sans")+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.border=element_blank(),
axis.line=element_line())
ggplot()+
# data
geom_point(data=df, aes(x=Thermal_baseline, y=SL_Growth_mm_d)) +
# the actual regression line:
geom_line(data=pred_df, aes(x=Thermal_baseline,y=SLG_mean_mn),
col='red',size=1.5) +
# posterior 95% credible interval for the mean population SLG
geom_ribbon(data=pred_df,
aes(x=Thermal_baseline, ymin=SLG_mean_lo, ymax=SLG_mean_hi),
fill='orange',alpha=0.6) +
# posterior predictive 95% credible envelope for the individual SLG
geom_ribbon(data=pred_df,
aes(x=Thermal_baseline,ymin=SLG_ind_lo,ymax=SLG_ind_hi),fill='blue',
alpha=0.2) +
xlab('Thermal baseline') +
ylab('SL Growth mm/d') + apatheme
The main object of inference is the posterior pdf of the change point parameter
In Bayesian linear regression model, the change point is just a parameter. We can assign a prior distribution to it, combine it with the likelihood, and get its posterior distribution.
However, as the posterior change-point distribution cannot be derived analytically, the Gibbs sampler cannot be used to find the mean and credible interval (numerically). Remember that the main objects of Bayesian inference are (i) the posterior distribution of parameters and (ii) the posterior predictive distribution of the response variable. These are usually hard to found analytically. Instead, it is usually easier to do this numerically. One of the most widely used sampling methods is Metropolis-Hastings Algorithm.
To understand and use the Metropolis-Hastings Algorithm, we should understand the acceptance ratio. The acceptance ratio become 1, when we know the posterior distribution:
\[ R = \frac{f(y|\theta^{\text{new}})f(\theta^{\textrm{new}})}{f(y|\theta)f(\theta)}\frac{q(\theta|\theta^{\textrm{new}})}{q(\theta^{\textrm{new}}|\theta)} \] \[ = \frac{f(y|\theta^{\text{new}})f(\theta^{\textrm{new}})}{f(y|\theta)f(\theta)}\frac{\frac{f(y|\theta)f(\theta)}{f(y)}}{\frac{f(y|\theta^{\text{new}})f(\theta^{\text{new}})}{f(y)}} = 1 \]
Metropolis Step: If the proposal distribution is symmetric, i.e., \(q(\theta|\theta^{\textrm{new}})=q(\theta^{\textrm{new}}|\theta)\), then the proposal ratio become \(\frac{q(\theta|\theta^{\textrm{new}})}{q(\theta^{\textrm{new}}|\theta)} = 1\).
First, lets see a simple example of using the Metropolis-Hastings Algorithm: Estimating the Normal Mean
set.seed(123)
# generate data
n <- 100
y <- rnorm(n,2,sd=2)
# vague priors
mu0 <- 0; tau0 <- 10^{-4}
alpha <- 0.01; beta <- 0.01
# number of iterations
ITER <- 10^3
# setting up the monitors
mon.mu <- mon.tau <- numeric(ITER)
# initializing the algorithm: initial values for mu and for tau
mu <- mean(y)
tau <- 1/var(y)
# deciding on the width of the proposal distribution for mu
delta <- .5
## the algorithm with Metropolis step for mu, Gibbs step for tau:
for(iter in 1:ITER){
#### STEP for MU
# propose new value
mu.new <- rnorm(1, mu, sd=delta)
# evaluate the log of the acceptance ratio
logR <- sum(dnorm(y,mu.new,1/sqrt(tau), log=T))- #new likelihood
sum(dnorm(y,mu ,1/sqrt(tau), log=T))+ #old likelihood
dnorm(mu.new,mu0,1/sqrt(tau0),log=T)- # new prior
dnorm(mu ,mu0,1/sqrt(tau0),log=T) # old prior
# accept with probability min(R,1)
# draw U from uniform(0,1) and accept if logU < logR
logU <- log(runif(1,0,1))
if(logU < logR){mu <- mu.new}
#### STEP for TAU
tau <- rgamma(1,alpha+n/2,beta+.5*sum((y-mu)^2))
#### UPDATING THE MONITORS:
mon.mu[iter] <- mu
mon.tau[iter] <- tau
}
par(mfrow=c(2,1))
plot(mon.mu,ty='l',ylab=expression(mu))
plot(1/sqrt(mon.tau),ty='l',ylab='sd')
cor(mon.mu[-1],mon.mu[-ITER])
## [1] 0.6739411
## NOTE 1: To select theta with probability p, we sampled 'u' from a uniform distribution and checked whether u<p. Generally, a binary random variable (x presenting theta accept or not) with the success probability p can be simulated by checking whether a uniform random variable u is smaller than p [i.e., x=(u<p)*1].
# we used runif instead of rbinom to generate random binary values
?runif()
set.seed(12345)
u <- runif(10)
x <- (u<0.75)*1
rbinom(10, 1, 0.75)
## [1] 1 1 1 1 1 1 1 1 1 0
## NOTE 2: Very small and very large numbers can be manipulated with higher accuracy on log-scale.
x <- .01234^1000
print(x^{1/1000})
## [1] 0
logx <- 1000*log(.01234)
print(exp(logx/1000))
## [1] 0.01234
Now, lets use the Metropolis-Hastings Algorithm to solve change point problem
################## changing point data simulation and numerical algorithm
## generate some data
Alpha0 <- 0; Alpha1 <- 1; Tau <- 1/.1^2;
CP <- 25.5
set.seed(12345)
d <- data.frame(t=1:100)
d$y <- rnorm(dim(d)[1],Alpha0+Alpha1*(d$t>CP),1/sqrt(Tau))
plot(d,ty='b',pch=16,col='blue')
## Set up the algorithm
# Number of iterations
ITER <- 1500
# monitor vectors
mon.alpha0 <- mon.alpha1 <- mon.tau <- mon.cp <- numeric(ITER)
# initial values (can be based on the data)
alpha0 <- mean(d$y); alpha1 <- mean(d$y); tau <- 1/var(d$y); cp <- median(d$t)
# Gibbs steps
for(iter in 1:ITER){
alpha0 <- rnorm(1,sum(d$y[d$t < cp])*tau/(sum(d$t < cp)*tau+10^(-8)),
1/sqrt(sum(d$t < cp)*tau+10^(-8)))
alpha1 <- rnorm(1,sum(d$y[d$t >= cp])*tau/(sum(d$t >= cp)*tau+10^(-8)),
1/sqrt(sum(d$t >= cp)*tau+10^(-8)))
tau <-rgamma(1, 0.01+dim(d)[1]/2,
0.01+0.5*sum(((d$y-alpha0)^2)[d$t < cp]) +0.5*sum(((d$y-alpha1)^2)[d$t >= cp]))
# metropolis step
# draw a new value from the proposal distribution with some sd
cp.new <- rnorm(1,cp,sd=0.2)
logR <-
# DIFFERENCE IN LOG-LIKELIHOODS:
# given new value of cp
(sum(dnorm(d$y[d$t < cp.new], alpha0, 1/sqrt(tau),log=T)) +
sum(dnorm(d$y[d$t >= cp.new], alpha1, 1/sqrt(tau),log=T))) -
# given old value of cp
(sum(dnorm(d$y[d$t < cp], alpha0, 1/sqrt(tau),log=T)) +
sum(dnorm(d$y[d$t >= cp], alpha1, 1/sqrt(tau),log=T)))+
# DIFFERENCE IN LOG-PRIORS FOR CP:
0-
# LOG of PROPOSAL RATIO
0
# We want to accept the new cp value based on the R probability. For that we draw a random number U between 0 and 1 using runif; and if it's log is less than log(R), we keep cp.
logU <- log(runif(1,0,1))
if(logU < logR){cp <- cp.new}
# updating the monitors
mon.alpha0[iter] <- alpha0
mon.alpha1[iter] <- alpha1
mon.tau[iter] <- tau
mon.cp[iter] <- cp
}
par(mfrow=c(2,2))
plot(mon.alpha0,ty='l')
plot(mon.alpha1,ty='l')
plot(mon.tau,ty='l')
plot(mon.cp,ty='l')
round(mean(mon.cp[501:1500]),4)
## [1] 32.3363
round(quantile(mon.cp[501:1500],c(0.025,.975)),4)
## 2.5% 97.5%
## 25.0295 45.6382
There are also models, which use a rather complicated reversible jump Markov Chain Monte Carlo algorithm, which make the number of change points into a parameter. You can then evaluate the posterior probability that it was exactly one change point or that there have been at least two.
If the precision parameter of the proposal distribution is selected too large, this algorithm may converge quicker, but it tend to stay at the same value for a rather long time. That’s because the newly proposed values tend to be too far from the current one to be accepted. In contrast when it is too small, it may converge finally but it will take lots of iterations. Just meandering up and down far away from the correct value.
Just a NOTE: There exist algorithms which adapt the proposal distribution automatically.
Unlike grid-based approximations, such as trapezoidal rule, Monte Carlo integration does not suffer from the curse of dimensionality.
Sampling from a pdf is essential to bayesian inference. Together, the methods allow to sample and thus obtain expected values, variances etc. for any distribution with a known pdf.
Inverse sampling Algorithm
Cumulative probability (of a random variable) is always uniform distributed. Therefore, we can sample from uniform distribution \(U(0,1)\) and turn the sampled values to the random variable, if the inverse of the cumulative distribution function (cdf) is known (shortcoming: not many cdfs are invertible).
Consider a random variable \(y\) with the pdf \(f(y)\) and the cdf \(F(y)\) such that its inverse \(F^-1(y)\) can be analytically derived.
The inverse sampling algorithm then consists of the following two steps:
(i) Sample \(u\) from the uniform distribution \(U(0,1)\).
(ii) Evaluate \(y = F^{-1}(u)\).
(iii) The resulting will follow the distribution with pdf \(f(x)\).
rejection sampling and importance sampling
Allow us to use the existing sampling algorithms to sample from distributions for which such algorithms do not exist. They are both less efficient than inverse sampling. The closer the substitute distribution is to the objective, the more efficient are the algorithms.
A Gaussian distributed response variable \(y\) predicted by an identity link function of linear predictor paramter \(\mu\) and precision parameter \(\tau\):
\[ y = N(I(\mu, \tau)) \ \ \ \ \ \ \ \ \ \mu = X \theta + Z \xi \]
Other formulation of the Model (only with random-intercept)
Matrix form: \[ y = X \theta + Z\xi + \epsilon \]
Vector form: \[ \hat{y_i} = \theta_0 + \theta_1^T x_i + z_i \xi_{ID} \]
Scalar form: \[ y_i = \theta_0 + \sum_{j=1}^p \theta_j x_{i,j} + \xi_{\text{ID}_i} + \epsilon_i \]
likelihood (NOT joint likelihood) \[y_i|\mu_i,\tau \sim N(\mu_i,\tau)\] Random effect is considered as ID specific deviation of the response from the population intercept (\(\xi_{\text{ID}_i}\)).
Including this partly resolve the measurement dependency issue caused by hierarchical, longitudinal, or repeated measurement.
\[\mu_i = \theta_0 + \theta_1 x_i + \xi_{\text{ID}_i}\]
Prior \[\theta_0 \sim N(\mu_{\theta_0},\tau_{\theta_0})\] \[\theta_1 \sim N(\mu_{\theta_1},\tau_{\theta_1})\] \[\tau \sim Gamma(\alpha,\beta)\]
Other priors shall be considered for the random effect (\(j=1,...,J\)): \[\xi_j \sim N(0,\tau_{\xi})\] \[\tau_{\xi} \sim Gamma(\alpha_{\xi},\beta_{\xi})\]
Lets simulate data via arbitrary definition of all components of a simple linear mixed model:
### We simulate 10 measurements collected over time (covariate x) for each of 6 IDs (total # of measurements is 60)
set.seed(1234)
nID = 4 # number of groups or IDs
ID = factor(rep(1:nID, each=5)) # ID of measurements
n = length(ID) # total # of measurements
# parameters
intercept = 3 # fixed effects
slope = .75
sigma = 1 # residual sd
tau = .5 # re sd
u = rnorm(nID, mean=0, sd=tau) # random effects
e = rnorm(n, mean=0, sd=sigma) # residual error
# data
x = rnorm(n) # covariate
y = intercept + slope*x + u[ID] + e # see model 1
d = data.frame(x, y, ID=ID)
# matrix form
X = cbind(1, x)
knitr::kable(X , caption = "matrix X")
| x | |
|---|---|
| 1 | -0.6937202 |
| 1 | -1.4482049 |
| 1 | 0.5747557 |
| 1 | -1.0236557 |
| 1 | -0.0151383 |
| 1 | -0.9359486 |
| 1 | 1.1022975 |
| 1 | -0.4755931 |
| 1 | -0.7094400 |
| 1 | -0.5012581 |
| 1 | -1.6290935 |
| 1 | -1.1676193 |
| 1 | -2.1800396 |
| 1 | -1.3409932 |
| 1 | -0.2942939 |
| 1 | -0.4658975 |
| 1 | 1.4494963 |
| 1 | -1.0686427 |
| 1 | -0.8553646 |
| 1 | -0.2806230 |
teta = c(intercept, slope)
knitr::kable(teta , caption = "vector teta")
| x |
|---|
| 3.00 |
| 0.75 |
Z = model.matrix(~-1+ID)
knitr::kable(Z , caption = "matrix Z")
| ID1 | ID2 | ID3 | ID4 |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 |
y = X%*%teta + Z%*%u + e
knitr::kable(y , caption = "vector y2")
| 2.3053016 |
| 1.8163693 |
| 2.2527940 |
| 1.0820935 |
| 1.8206614 |
| 1.5467153 |
| 3.4882451 |
| 1.7836334 |
| 1.8303807 |
| 2.8272299 |
| 3.2798945 |
| 2.5562206 |
| 1.3961813 |
| 1.6252803 |
| 2.4843285 |
| 3.8935632 |
| 3.0483616 |
| 0.5349832 |
| 0.7450798 |
| 2.0762733 |
library(lme4)
##
## Attaching package: 'lme4'
## The following object is masked from 'package:brms':
##
## ngrps
lmeMod = lmer(y ~ x + (1|ID), data=d)
## boundary (singular) fit: see help('isSingular')
summary(lmeMod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (1 | ID)
## Data: d
##
## REML criterion at convergence: 49.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.6768 -0.4886 -0.1065 0.3941 2.0933
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 6.399e-17 7.999e-09
## Residual 6.709e-01 8.191e-01
## Number of obs: 20, groups: ID, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.3880 0.2231 10.706
## x 0.4488 0.2129 2.108
##
## Correlation of Fixed Effects:
## (Intr)
## x 0.571
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Lets model some real data