Let Y be a random variable representing the data. We have a (parametric) model for Y with Y∼f(y∣θ), θ∈Θ, where Θ is the parameter space. Here f(y∣θ) is the probability density function of the data and, viewed as a function of θ, gives the likelihood function. We also have a prior distribution, represented by π(θ). Given an observation of Y, the posterior distribution of θ is
\[ f(\theta \mid y) =\frac{f(y\mid\theta) \pi(\theta)}{\int_\Theta f(y\mid\theta) \pi(\theta)\; d\theta} \]
The posterior is proportional to the likelihood times the prior, since the denominator \(f(y)=\int_\Theta f(y\mid\theta)\pi(\theta)\,d\theta\) is a constant with respect to \(\theta\).
Our goal is to find the value of μ with the highest posterior probability given some observations.
We assign μ a prior distribution (it could be arbitrary, or estimated by maximum likelihood from a sample). To keep things simple and intuitive, we will assess our distributions (i.e. prior, likelihood, posterior) at equally spaced points, and for each point approximate the probability that μ lies within a narrow interval (width 0.01) around it, effectively treating each distribution as a categorical distribution without gaps.
The normal density integrates to 1, so the discretized interval probabilities will also sum to (approximately) 1:
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\theta-\mu)^2}{2\sigma^2}}\; d\theta = 1 \]
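As a quick numerical check (illustrative, not part of the original notes), R's integrate() confirms this for the prior density we are about to use:
integrate(function(theta) dnorm(theta, mean = -1, sd = 1.5), lower = -Inf, upper = Inf) # Integrates to 1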
res <- 1 / 100 # Set resolution for the sequence of potential mu-values, 0.01 interval
mu.h <- seq(-10, 10, res) # Define values at which to assess the probability density
density <- dnorm(mu.h, mean = -1, sd = 1.5) # Get the density for each value in mu.h
prior <- density * res # Multiply with resolution to get probability per interval
length(mu.h)
## [1] 2001
head(mu.h)
## [1] -10.00 -9.99 -9.98 -9.97 -9.96 -9.95
sum(prior)
## [1] 1
length(prior)
## [1] 2001
set.seed(4) # Set seed for reproducibility
n <- 100 # Set number of observations
y <- rnorm(n, mean = 1, sd = 2) # Generate data
y # Take a look at the data
## [1] 1.43350973 -0.08498514 2.78228929 2.19196115 4.27123600 2.37855088
## [7] -1.56249326 0.57371096 4.79307974 4.55372643 2.13320900 1.03143891
## [13] 1.76611468 0.90972577 1.06870381 1.33805355 3.33005368 0.91159201
## [19] 0.79926311 0.43311086 4.08162996 1.33033804 3.61524472 3.57651376
## [25] 2.18579388 0.43411263 3.51176805 2.81967830 -0.85605621 3.48036168
## [31] 1.30692836 3.10386516 -0.50842243 -1.96437824 2.72226374 0.19096034
## [37] 0.54518917 2.86819234 0.06820824 -0.27508700 3.68741725 1.36307077
## [43] 3.58502467 -2.37609715 -0.64198716 -0.72429229 1.19768738 0.24868971
## [49] 2.44780831 -2.59476404 -0.32748628 -0.24745298 0.84073514 1.87124953
## [55] 4.94180194 -0.19351735 -0.10501442 2.39193327 0.68867207 3.69779639
## [61] -1.13704614 3.12890149 -1.62544353 5.12738940 1.26276602 0.53662310
## [67] 0.20528895 2.77886416 2.05233808 0.65745351 1.31735379 0.02866987
## [73] -0.91781215 1.36103458 2.44346857 0.26091904 1.47507663 -0.33184422
## [79] -0.59361502 0.89660614 3.57385667 0.57170067 -0.14949093 -1.94145409
## [85] -1.06547687 -1.61304971 -0.67650481 -1.26130736 1.73749635 0.59639396
## [91] -1.55531981 -0.59602496 1.31816485 2.22959527 2.37589592 0.90589798
## [97] 5.66064336 -0.15513198 2.93695827 0.44492875
Suppose we know the standard deviation is 2. For a single observation, \(y_i\), the likelihood is then: \(p\left(y_i \mid \mu\right)=N\left(y_i \mid \mu, 2^2\right)\). The likelihood for the whole dataset is the product of all the individual likelihoods:
\[ p(y \mid \mu)=\prod_{i=1}^n N\left(y_i \mid \mu, 2^2\right) \]
We can calculate the likelihood across the set of potential values for \(\mu\):
likelihood <- vector() # Create an empty vector
for (s in mu.h) { # Loop over each of the 2001 candidate values in mu.h
temp.lik <- 1 # Create a temporary likelihood scalar
for (i in 1:n) { # Loop over each of the n = 100 data points
temp.lik <- temp.lik * dnorm(y[i], mean = s, sd = 2) # Multiply individual likelihoods
}
likelihood <- c(likelihood, temp.lik) # Store the likelihood for each value in mu.h
}
# Output the first likelihood values (they underflow to zero; see below)
head(likelihood)
## [1] 0 0 0 0 0 0
length(likelihood)
## [1] 2001
# normalized.likelihood <- likelihood / sum(likelihood)
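The zeros above are numerical underflow: far from the data, the product of 100 small densities drops below the smallest representable double. A common remedy (sketched here as an alternative, not part of the original workflow) is to work on the log scale and subtract the maximum before exponentiating; the rescaling cancels once the posterior is normalized:
log.likelihood <- sapply(mu.h, function(s) sum(dnorm(y, mean = s, sd = 2, log = TRUE))) # Log-likelihood per grid point
likelihood.stable <- exp(log.likelihood - max(log.likelihood)) # Rescale before exponentiating to avoid underflow
mu.h[which.max(likelihood.stable)] # Grid location of the maximum-likelihood estimate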
As we know from Bayes' theorem, we can obtain a distribution that is proportional ("\(\propto\)") to the posterior by multiplying the likelihood and the prior:
\[ p(\mu \mid y) \propto p(y \mid \mu) p(\mu) \]
unnormalized.posterior <- likelihood * prior
normalizing.constant <- sum(unnormalized.posterior) # p(y): the discrete approximation to the integral
normalizing.constant
## [1] 5.001066e-90
posterior <- unnormalized.posterior / normalizing.constant
which.max(posterior)
## [1] 1116
mu.h[which.max(posterior)]
## [1] 1.15
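Since the grid posterior is a proper discrete distribution, we can also read off interval summaries; for example (an illustrative sketch, not part of the original code), an equal-tailed 95% credible interval:
cdf <- cumsum(posterior) # Cumulative posterior probability along the grid
c(lower = mu.h[min(which(cdf >= 0.025))], # First grid point past the 2.5% quantile
upper = mu.h[min(which(cdf >= 0.975))]) # First grid point past the 97.5% quantile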
Prior and posterior probabilities
plot(mu.h, posterior / res, type = "l", ylab = "")
# lines(mu.h, normalized.likelihood / res, col = "red")
lines(mu.h, prior / res, col = "green")
# The prior (green) and posterior (black) are both (approximately) normal, with the posterior pulled toward the data
Let us now try to estimate both the mean, μ, and the standard deviation, σ. Since the standard deviation must be positive, we give σ a flat prior over the range from 0 to 10, using a uniform distribution: σ∼U(0,10).
\[ p(\mu, \sigma)=p(\mu) p(\sigma) \]
μ prior
res <- 1 / 100 # Set resolution for the sequence of potential mu-values, 0.01 interval
mu.h <- seq(-10, 10, res) # Define values at which to assess the probability density
density <- dnorm(mu.h, mean = -1, sd = 1.5) # Get the density for each value in mu.h
prior <- density * res # Multiply with resolution to get probability per interval
length(mu.h)
## [1] 2001
head(mu.h)
## [1] -10.00 -9.99 -9.98 -9.97 -9.96 -9.95
sum(prior)
## [1] 1
length(prior)
## [1] 2001
mu.prior <- prior
σ prior
sigma.h <- seq(0, 10, res) # Define values at which to assess the probabilities for sigma
sigma.density <- dunif(sigma.h, min(sigma.h), max(sigma.h)) # Density-values for a flat prior
sigma.prior <- sigma.density * res # Multiply with res to get probability per interval
length(sigma.h)
## [1] 1001
head(sigma.h)
## [1] 0.00 0.01 0.02 0.03 0.04 0.05
sum(sigma.prior)
## [1] 1.001
length(sigma.prior)
## [1] 1001
Now that we have specified priors for μ and σ, we can also specify their joint prior distribution, p(μ,σ). (The discretized σ prior sums to 1.001 rather than 1 because the grid includes both endpoints; the posterior normalization below corrects for this.)
joint.prior <- sapply(mu.prior, function(mu.s) { # Using sapply instead of explicit loops
sapply(sigma.prior, function(sigma.s) { mu.s * sigma.s })}) # Rows index sigma, columns index mu
This gives us a two-dimensional grid of probabilities, representing all possible combinations of μ and σ values that we are considering.
joint.prior[1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.050589e-14 4.215803e-14 4.38756e-14 4.566113e-14 4.75172e-14
## [2,] 4.050589e-14 4.215803e-14 4.38756e-14 4.566113e-14 4.75172e-14
## [3,] 4.050589e-14 4.215803e-14 4.38756e-14 4.566113e-14 4.75172e-14
## [4,] 4.050589e-14 4.215803e-14 4.38756e-14 4.566113e-14 4.75172e-14
## [5,] 4.050589e-14 4.215803e-14 4.38756e-14 4.566113e-14 4.75172e-14
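Since the joint prior is just the product of the two marginal prior vectors, the same grid can be built in one line with outer() (an equivalent alternative to the sapply version above, not the original code):
joint.prior2 <- outer(sigma.prior, mu.prior) # All pairwise products; rows index sigma, columns index mu
all.equal(joint.prior, joint.prior2) # Should be TRUE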
set.seed(4) # Set seed for reproducibility
n <- 200 # Set number of observations
y <- rnorm(n, mean = 1, sd = 2) # Generate data (not printed this time)
We calculate the likelihood across all possible combinations of values for μ and σ:
\[ p(y \mid \mu,\sigma)=\prod_{i=1}^n N\left(y_i \mid \mu, \sigma^2\right) \]
joint.likelihood <- sapply(mu.h, function(mu.s) { # sapply instead of explicit loops
sapply(sigma.h, function(sigma.s) { prod(dnorm(y, mu.s, sigma.s)) })})
# normalized.joint.likelihood <- joint.likelihood / sum(joint.likelihood) # Normalize
The posterior distribution is now a joint distribution for our two parameters.
\[ p(\mu, \sigma \mid y) \propto p(y \mid \mu, \sigma) p(\mu, \sigma) \]
unnormalized.joint.posterior <- joint.likelihood * joint.prior
normalizing.constant <- 1 / sum(joint.likelihood * joint.prior)
joint.posterior <- unnormalized.joint.posterior * normalizing.constant # Normalizing does not change the location of the maximum
# Arrange the posterior as a matrix: rows index sigma, columns index mu
df <- matrix(joint.posterior, nrow = 1001, byrow = FALSE)
dim(df)
## [1] 1001 2001
position <- which(df == max(df), arr.ind = TRUE) # Row/column index of the posterior mode
position
## row col
## [1,] 194 1101
# Map the matrix position back to the corresponding mu and sigma values
mu.h[position[2]]
## [1] 1
sigma.h[position[1]]
## [1] 1.93
Plot
contour(mu.h,sigma.h,t(joint.posterior))
We often want to draw conclusions about a given parameter without reference to the other parameter(s) in the model. The marginal posterior distribution is obtained by averaging the joint distribution over the other parameter(s).
To get the marginal posterior \(p(\mu \mid y)\), we integrate the joint posterior over σ; on our grid this amounts to summing the posterior probabilities over every potential value of σ:
\[ p(\mu \mid y)=\int p(\mu, \sigma \mid y)\, d\sigma \]
mu.marginal.posterior <- colSums(df) # Sum over sigma: the discrete approximation to the integral
which.max(mu.marginal.posterior)
## [1] 1101
mu.h[which.max(mu.marginal.posterior)]
## [1] 1
plot(mu.h,mu.marginal.posterior/res,type = "l"); lines(mu.h,mu.prior/res,col=2)
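Beyond the mode, other summaries fall straight out of the discretized marginal; for example (an illustrative sketch, not in the original notes), the posterior mean and standard deviation of μ:
mu.mean <- sum(mu.h * mu.marginal.posterior) # Posterior mean: probability-weighted average
mu.sd <- sqrt(sum((mu.h - mu.mean)^2 * mu.marginal.posterior)) # Posterior standard deviation
c(mu.mean, mu.sd)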
sigma.marginal.posterior <- rowSums(df) # Sum over mu: the discrete approximation to the integral
which.max(sigma.marginal.posterior )
## [1] 194
sigma.h[which.max(sigma.marginal.posterior )]
## [1] 1.93
plot(sigma.h,sigma.marginal.posterior/res,type = "l"); lines(sigma.h,sigma.prior/res,col=2)
Grid approximation does not scale well: the number of elements in the grid grows exponentially with the number of parameters. For such problems we would normally either solve the model analytically (e.g. with conjugate priors) or sample from the posterior distribution using a form of Markov chain Monte Carlo (MCMC).
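As a minimal illustration (a sketch under assumed settings, not part of the original grid workflow), a random-walk Metropolis sampler targeting the same two-parameter posterior might look like this; the proposal standard deviation of 0.15, the starting values, and the iteration counts are arbitrary choices:
log.posterior <- function(mu, sigma) { # Log posterior, up to an additive constant
if (sigma <= 0 || sigma >= 10) return(-Inf) # Flat prior on (0, 10) for sigma
sum(dnorm(y, mu, sigma, log = TRUE)) + # Log-likelihood of the data
dnorm(mu, mean = -1, sd = 1.5, log = TRUE) # Log-prior for mu
}
set.seed(1)
n.iter <- 10000
draws <- matrix(NA, n.iter, 2, dimnames = list(NULL, c("mu", "sigma")))
current <- c(1, 2) # Arbitrary starting values
current.lp <- log.posterior(current[1], current[2])
for (i in 1:n.iter) {
proposal <- current + rnorm(2, sd = 0.15) # Symmetric random-walk proposal
proposal.lp <- log.posterior(proposal[1], proposal[2])
if (log(runif(1)) < proposal.lp - current.lp) { # Metropolis acceptance step
current <- proposal
current.lp <- proposal.lp
}
draws[i, ] <- current
}
colMeans(draws[-(1:1000), ]) # Posterior means after discarding burn-in; should agree with the grid marginals above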
Suppose the prevalence of a disease is 2.28%, and a diagnostic test has a sensitivity of 94% and a specificity of 98%. What is the probability that a person with a positive test result actually has the disease?
# d is disease and t is test
#Prior p(theta1)
p.d1 <- .0228
p.d0 <- 1-p.d1
#Sensitivity: p(tau1|theta1)
p.t1.d1 <- .94
#Specificity: p(tau0|theta0)
p.t0.d0 <- .98
p.t1.d0 <- 1-p.t0.d0
#Bayes prob
(p.d1.t1 <- p.t1.d1* p.d1 /( p.t1.d1 *p.d1+ p.t1.d0 *p.d0))
## [1] 0.5230379
# PPV = (sensitivity x prevalence) / [(sensitivity x prevalence) + ((1 - specificity) x (1 - prevalence))]
(p.t1.d1 * p.d1) / ( (p.t1.d1 * p.d1) + (1 - p.t0.d0) * (1 - p.d1) )
## [1] 0.5230379
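The same Bayes-rule computation can be wrapped in a small helper function (an illustration, not from the original notes):
ppv <- function(prevalence, sensitivity, specificity) { # Positive predictive value
sensitivity * prevalence / (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
}
ppv(0.0228, 0.94, 0.98) # Reproduces the result above, 0.5230379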
In this last example, we estimate the success probability p of a binomial model with probability mass function
\[ P(x ; p, n)=\binom{n}{x}\, p^x (1-p)^{n-x} \quad \text{for } x=0,1,2,\dots,n \]
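As a quick check (illustrative, with arbitrary example values), the formula matches R's built-in dbinom():
k <- 3; size <- 10; prob <- 0.26 # Arbitrary example values
choose(size, k) * prob^k * (1 - prob)^(size - k) # PMF from the formula
dbinom(k, size, prob) # Same value from R's built-in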
We use a normal distribution as the prior for p. Since p is restricted to [0, 1], we evaluate the prior density only on that interval, so the discretized prior will not sum to 1; the posterior's normalizing constant corrects for this.
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\theta-\mu)^2}{2\sigma^2}}\; d\theta = 1 \]
res <- 1 / 100 # Set resolution for the sequence of potential mu-values, 0.01 interval
mu.h <- seq(0, 1, res) # Define values at which to assess the probability density
density <- dnorm(mu.h, mean = -1, sd = 1.5) # Get the density for each value in mu.h
prior <- density * res # Multiply with resolution to get probability per interval
length(mu.h)
## [1] 101
head(mu.h)
## [1] 0.00 0.01 0.02 0.03 0.04 0.05
sum(prior)
## [1] 0.1628928
length(prior)
## [1] 101
set.seed(4) # Set seed for reproducibility
n <- 500 # Set number of observations
y <- rbinom(n, size = 1, prob = 0.26) # Generate Bernoulli data; take a look below
y
## [1] 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0
## [38] 0 0 0 1 0 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 0 0 0 0 1 1 0 0 0 0
## [75] 1 0 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0
## [112] 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1
## [149] 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [186] 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0
## [223] 1 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0
## [260] 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
## [297] 0 0 1 0 1 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0
## [334] 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [371] 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1
## [408] 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0
## [445] 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
## [482] 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
We use the binomial distribution as the likelihood function:
\[ p(y \mid p)=\prod_{i=1}^n \operatorname{Bin}\left(y_i \mid 1, p\right) \]
where each \(y_i\) is a binomial with size 1, i.e. a Bernoulli trial.
likelihood <- sapply(mu.h, function(s) prod(dbinom(y, 1, s))) # Likelihood at each candidate value of p
# normalized.likelihood <- likelihood / sum(likelihood) # Normalize
\[ p(p \mid y) \propto p(y \mid p) p(p) \]
unnormalized.posterior <- likelihood * prior
normalizing.constant <- 1 / sum(likelihood * prior) # Reciprocal of p(y)
posterior <- unnormalized.posterior * normalizing.constant # Same maximizer as the unnormalized posterior
# Find the candidate value of p with the highest posterior probability
mu.h[which.max(posterior)]
## [1] 0.27
Plot
plot(mu.h, posterior / res, type = "l", ylab = "")
# lines(mu.h, normalized.likelihood / res, col = "red")
lines(mu.h, prior / res, col = "green")
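As a final sanity check (illustrative, not in the original), the posterior mode can be compared with the maximum-likelihood estimate, which for Bernoulli data is simply the sample proportion:
mean(y) # Sample proportion; should be close to the posterior mode of 0.27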