Jim Savage
Under the assumption that the true parameter of the model is 0 (and that the model is correctly specified and the observations are unbiased), the p-value tells us the probability of estimating an effect at least as large as the one we observed.
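For intuition, a hypothetical simulation in R (not from the original notes): re-estimate the effect many times under a true effect of zero, and count how often the estimate is at least as large as the one observed.
set.seed(42)
n <- 100
observed_effect <- 0.2 # a hypothetical observed estimate
# Re-estimate the slope 1000 times when y and x are unrelated
null_effects <- replicate(1000, coef(lm(rnorm(n) ~ rnorm(n)))[2])
# Approximate p-value: the share of null estimates at least as large
mean(null_effects >= observed_effect)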
Bayesian statistics is difficult in the sense that thinking is difficult. - Donald Berry, Duke
In R, the lm function estimates the model \[ y_{i} = \beta_{0} + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \epsilon_{i} \] with \( \epsilon_{i} \sim\mathcal{N}(0,\sigma) \)
is the same as writing
\[ y_{i} \sim \mathcal{N}(\beta_{0} + \beta_{1}x_{1,i} + \beta_{2}x_{2,i}, \sigma) \]
\[ y_{i} \sim \mathcal{N}(\mbox{Conditional mean}, \mbox{Potentially conditional volatility}) \]
Note that the above model describes the distribution of the data only if we treat the parameters as fixed and known. In reality, we are uncertain about the values of the \( \beta \)s.
So our generative model typically takes the form \( p(y_{i}, \beta) \): the joint distribution of the data and the parameters.
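We can simulate from this joint distribution in R by first drawing the parameters from a prior and then drawing data given those parameters (the priors below are assumptions for illustration):
set.seed(1)
N <- 500
x1 <- rnorm(N); x2 <- rnorm(N)
# Draw the parameters from (assumed) priors
beta <- rnorm(3, 0, 1)
sigma <- 1
# Draw the data given the parameters
y_sim <- rnorm(N, beta[1] + beta[2]*x1 + beta[3]*x2, sigma)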
In R:
# Random number generation
rnorm(n, mean, sd)
# Height of the density at point x
dnorm(x, mean, sd)
# Cumulative probability function up to point q
pnorm(q, mean, sd)
In R:
# Random number generation (standardised Student-t with df degrees of freedom)
rt(n, df)
# Height of the density at point x
dt(x, df)
# Cumulative probability function up to point q
pt(q, df)
# To shift and scale the draws: mean + scale * rt(n, df)
The inverse gamma distribution is strictly positive and right-skewed.
In R:
install.packages("MCMCpack")
library(MCMCpack)
# Random number generation
x <- rinvgamma(n, shape, scale)
# Height of the density at point x
dinvgamma(x, shape, scale)
# Plot the empirical density of the draws
plot(density(x))
where x are your random numbers.
Correlation is a measure between -1 and 1, where 0 implies no linear relationship between two variables and ±1 implies a perfect linear relationship.
A correlation matrix contains the pairwise correlations between all variables in the system.
           V1         V2        V3
V1  1.0000000 -0.4827110 0.1231727
V2 -0.4827110  1.0000000 0.4823770
V3  0.1231727  0.4823770 1.0000000
A univariate distribution's scale is determined by the \( \sigma \) value. Each variable in a multivariate distribution also has its own scale parameter. We collect them together in a vector called \( \tau \):
\[ \tau = [\sigma_{1}, \sigma_{2}, \dots, \sigma_{P}]' \]
Covariance matrices combine the information from a scale vector with the correlation matrix. We typically use \( \Sigma \) to denote a covariance matrix and \( \Omega \) for a correlation matrix.
\[ \Sigma = \mbox{diag}(\tau)\Omega\mbox{diag}(\tau) \]
where the diag operator places the values of \( \tau \) along the diagonal elements of a matrix (with zeroes everywhere else).
In R:
library(MASS)
# Define your correlation matrix
cormat <- matrix(c(1, -0.5, 0.1,
                   -0.5, 1, 0.5,
                   0.1, 0.5, 1), 3, 3)
# Define your scale vector
scalevec <- c(4, 2, 1)
# Create your covariance matrix
Sigma <- diag(scalevec) %*% cormat %*% diag(scalevec)
# Create the mean vector
means <- c(3, 7, 2)
# Draw from the multivariate normal (here 1000 draws)
xs <- as.data.frame(mvrnorm(n = 1000, means, Sigma))
# Check the sample correlations and pairwise scatterplots
cor(xs); plot(xs)
The LKJ distribution is a prior over correlation matrices: when its shape parameter is large, it collapses on an identity matrix (no correlation between variables). It is not yet available in R, but it is available in Stan.
\[ y_{i} = \beta_{0} + \beta_{1}x_{1,i} + \dots + \beta_{P}x_{P,i} + \epsilon_{i} \]
\[ y = X\beta + \epsilon \]
with \( \epsilon \sim \mathcal{N}(0, \sigma) \)
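As a quick sketch (with hypothetical data, not from the original notes), we can simulate from the matrix form in R; the X and y below are reused in the likelihood example further down.
set.seed(49)
N <- 500; P <- 2
# X has a column of 1s so that beta_0 acts as the intercept
X <- cbind(1, matrix(rnorm(N * P), N, P))
beta_true <- c(3, -1, 0.5) # assumed true coefficients
sigma_true <- 2
y <- as.vector(X %*% beta_true + rnorm(N, 0, sigma_true))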
In summary, the density functions (dnorm, dt, dinvgamma, etc.) give us the height of a density at a point, which is the building block of a likelihood.
Example in R: a two-parameter likelihood.
Let's take our linear model from before:
\[ y = X\beta + \epsilon \]
with \( \epsilon \sim \mathcal{N}(0, \sigma) \)
As before, we can express this in probabilistic notation:
\[ y \sim\mathcal{N}(X\beta, \sigma) \]
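A minimal sketch of evaluating this likelihood in R, using the simulated X and y from above and candidate values for the two parameter blocks \( \beta \) and \( \sigma \):
# Log likelihood of the linear model: a sum of log normal densities
log_likelihood <- function(beta, sigma, y, X) {
  sum(dnorm(y, mean = X %*% beta, sd = sigma, log = TRUE))
}
# Evaluate at candidate parameter values
log_likelihood(beta = c(3, -1, 0.5), sigma = 2, y = y, X = X)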
Bayes' rule tells us \[ p(\theta | y) = \frac{p(y | \theta)p(\theta)}{p(y)} \]
Because the denominator \( p(y) \) does not depend on the parameters \( \theta \), we typically write this out in proportional notation:
\[ p(\theta | y) \propto p(y | \theta)p(\theta) \]
That is, if we want to make inference about the posterior \( p(\theta | y) \), we need to have the likelihood \( p(y | \theta) \), and a prior distribution for the parameters of the model \( p(\theta) \).
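A sketch of the unnormalized log posterior in R (the priors here are assumptions for illustration, not from the original notes):
# Unnormalized log posterior: log likelihood + log prior
log_posterior <- function(beta, sigma, y, X) {
  if (sigma <= 0) return(-Inf) # the scale must be positive
  log_lik <- sum(dnorm(y, mean = X %*% beta, sd = sigma, log = TRUE))
  # Assumed priors: beta ~ N(0, 5), sigma ~ Exponential(1)
  log_prior <- sum(dnorm(beta, 0, 5, log = TRUE)) + dexp(sigma, 1, log = TRUE)
  log_lik + log_prior
}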
Again: we want to make inferences (mean, sd, quantiles, etc.) about the posterior. Unfortunately, for all but a few simple models the right-hand side \( p(y | \theta)p(\theta) \) is not analytically solvable.
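Instead we can draw samples from the posterior numerically. As an illustration only, here is a hypothetical random-walk Metropolis sketch (not necessarily the sampler these notes use) that reuses the log_posterior function above; theta holds one row of parameter draws per iteration.
set.seed(2)
iters <- 5000
theta <- matrix(NA, iters, 4) # columns: beta_0, beta_1, beta_2, sigma
current <- c(0, 0, 0, 1)
lp_current <- log_posterior(current[1:3], current[4], y, X)
for (i in 1:iters) {
  proposal <- current + rnorm(4, 0, 0.1) # random-walk proposal
  lp_proposal <- log_posterior(proposal[1:3], proposal[4], y, X)
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < lp_proposal - lp_current) {
    current <- proposal
    lp_current <- lp_proposal
  }
  theta[i, ] <- current
}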
# Posterior mean of the first parameter
mean(theta[, 1])