Simple linear model
Deterministic: \(E[y_i] = \beta_0 + \beta_1x_i\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)
If \(x\) is continuous:
If \(x\) is discrete:
Where \(x_i\) = 1 or 0
Example fitted coefficients: (Intercept) = 7.087216, x = 3.382339
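With a 0/1 predictor, least squares reduces to group means: \(\hat\beta_0\) is the mean of the \(x_i = 0\) group and \(\hat\beta_1\) is the difference between the two group means. A minimal sketch in Python (standard library only) on made-up data, not the data behind the output above; the helper name `fit_simple_ols` is hypothetical:

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Hypothetical helper: closed-form least squares for E[y_i] = b0 + b1 * x_i."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data: x is a 0/1 dummy for two groups
x = [0, 0, 0, 1, 1, 1]
y = [6.0, 7.0, 8.0, 10.0, 10.5, 11.0]

b0, b1 = fit_simple_ols(x, y)
group0 = mean(yi for xi, yi in zip(x, y) if xi == 0)
group1 = mean(yi for xi, yi in zip(x, y) if xi == 1)
print(b0, group0)            # intercept equals the mean of group 0
print(b1, group1 - group0)   # slope equals the difference in group means
```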
Multiple groups
Deterministic: \(E[y_i] = \beta_0 + \beta_1x_{1,i} + \beta_2x_{2,i}\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)
Where \(x_1\) = 1 or 0
And \(x_2\) = 1 or 0
Group | Value of x1 | Value of x2 |
---|---|---|
1 | 0 | 0 |
2 | 1 | 0 |
3 | 0 | 1 |
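Plugging each row of the table into the deterministic part shows what each coefficient means: \(\beta_0\) is the mean of group 1 (the reference group), while \(\beta_1\) and \(\beta_2\) are the differences of groups 2 and 3 from group 1:

\[ \begin{aligned} \text{Group 1: } E[y_i] &= \beta_0 \\ \text{Group 2: } E[y_i] &= \beta_0 + \beta_1 \\ \text{Group 3: } E[y_i] &= \beta_0 + \beta_2 \end{aligned} \]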
We can also have multiple regression, mixing continuous and categorical predictors:
Deterministic: \(E[y_i] = \beta_0 + \beta_1x_{1,i} + \beta_2x_{2,i}\)
Stochastic: \(y_i \sim Normal(E[y_i], \sigma)\)
Where \(x_1\) is continuous
And \(x_2\) = 1 or 0
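With \(x_2\) as a 0/1 dummy, this model is two parallel lines with common slope \(\beta_1\); \(\beta_2\) shifts the intercept for the \(x_2 = 1\) group. A minimal Python sketch, assuming simulated data with made-up true values, that fits the model by solving the normal equations \((X^TX)\hat\beta = X^Ty\). The helpers `solve` and `fit_ols` are hypothetical:

```python
import random

def solve(a, b):
    """Hypothetical helper: solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Hypothetical helper: least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up true values: beta0 = 2, beta1 = 0.5 (slope on x1), beta2 = 3 (shift for x2 = 1)
random.seed(1)
X, y = [], []
for _ in range(2000):
    x1 = random.uniform(0, 10)
    x2 = random.randint(0, 1)
    X.append([1.0, x1, x2])  # leading 1.0 is the intercept column
    y.append(2 + 0.5 * x1 + 3 * x2 + random.gauss(0, 1))

b0, b1, b2 = fit_ols(X, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```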
Assumptions of the linear model:
Linearity
Normality
Independence
Equal variance
What do GLMs do?
If the response is normal:
\[ \underbrace{E[y_i]}_{\text{expected value}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]
\[ y_i \sim \underbrace{N(\text{mean}=E[y_i], \text{var}=\sigma^2)}_{\text{stochastic}} \]
Poisson GLM:
\[ \underbrace{\log(\lambda_i)}_{\text{link function}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]
\[ y_i \sim \underbrace{Poisson(\lambda_i)}_{\text{stochastic}} \]
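The log link keeps the mean positive: \(\lambda_i = e^{\beta_0 + \beta_1 x_i}\), so a one-unit increase in \(x\) multiplies the expected count by \(e^{\beta_1}\). A minimal Python sketch, assuming made-up coefficients and a single predictor; the helpers `rpois` (Knuth's multiplication method, fine for small \(\lambda\)) and `poisson_glm_mean` are hypothetical:

```python
import math
import random

def rpois(lam, rng):
    """Hypothetical helper: one Poisson(lam) draw via Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_glm_mean(beta0, beta1, x):
    """Hypothetical helper: inverse of the log link, lambda = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

rng = random.Random(42)
beta0, beta1 = 0.5, 0.3  # made-up coefficients
lam = poisson_glm_mean(beta0, beta1, 2.0)
counts = [rpois(lam, rng) for _ in range(50_000)]
print(lam, sum(counts) / len(counts))  # simulated mean should be close to lambda

# Multiplicative effect of a one-unit increase in x
ratio = poisson_glm_mean(beta0, beta1, 3.0) / poisson_glm_mean(beta0, beta1, 2.0)
print(ratio, math.exp(beta1))
```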
Negative binomial GLM:
\[ \underbrace{\log(\mu_i)}_{\text{link function}} = \underbrace{\beta_0 + \beta_1x_{1,i} + \dots + \beta_mx_{m,i}}_{\text{deterministic}} \]
\[ y_i \sim \underbrace{NB(\mu_i,\theta)}_{\text{stochastic}} \]
\(Var(y_i) = \mu_i + \frac{\mu_i^2}{\theta}\)
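One way to see where the extra variance comes from: a negative binomial count is a Poisson count whose mean is itself gamma distributed, with mean \(\mu\) and shape \(\theta\). A minimal Python sketch, assuming made-up values \(\mu = 5\) and \(\theta = 2\), that checks \(Var(y) \approx \mu + \mu^2/\theta\) by simulation. The helpers `rpois` and `rnegbin` are hypothetical:

```python
import math
import random
from statistics import fmean, pvariance

def rpois(lam, rng):
    """Hypothetical helper: one Poisson(lam) draw via Knuth's method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rnegbin(mu, theta, rng):
    """Hypothetical helper: NB(mu, theta) draw as a gamma-Poisson mixture."""
    lam = rng.gammavariate(theta, mu / theta)  # gamma with mean mu and variance mu^2 / theta
    return rpois(lam, rng)

rng = random.Random(7)
mu, theta = 5.0, 2.0  # made-up values
draws = [rnegbin(mu, theta, rng) for _ in range(100_000)]
print(fmean(draws), pvariance(draws), mu + mu ** 2 / theta)
```

Unlike the Poisson, the variance here is well above the mean, which is what makes the negative binomial useful for overdispersed counts.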
For data with many zeroes, create two datasets. First, one with zeroes and ones, \(Z_i\) (absence or presence):
\(Z_i \sim Bernoulli(p_i)\)
\(\text{logit}(p_i) = \beta_0 + \dots\)
And then a Poisson or negative binomial model for the counts
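Building the two datasets is just bookkeeping. A minimal Python sketch with made-up counts: \(Z_i\) records whether \(y_i > 0\) (the part modelled with a logit-link Bernoulli GLM), and the non-zero counts go to the count model. Keeping only the positive counts for the second part is an assumption here (it matches a hurdle-style model, where that part would use a zero-truncated Poisson or negative binomial); the helper name `split_for_two_part_model` is hypothetical:

```python
def split_for_two_part_model(counts):
    """Hypothetical helper: split counts into a 0/1 presence dataset and the positive counts."""
    z = [1 if y > 0 else 0 for y in counts]   # Z_i ~ Bernoulli(p_i)
    positive = [y for y in counts if y > 0]   # data for the Poisson / NB part
    return z, positive

counts = [0, 0, 3, 1, 0, 5, 2, 0]  # made-up counts
z, positive = split_for_two_part_model(counts)
p_hat = sum(z) / len(z)  # intercept-only estimate of p
print(z, positive, p_hat)
```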
Set nets at each of four sites
Measure mercury concentration in 50, 43, 67, and 90 fish, respectively
What assumptions are we breaking?
\(E(Hg_i) = \beta_0 + \beta_1size_{i}\)
Pseudoreplication: fish caught at the same site are not independent
\(E(Hg_i) = \beta_0 + \beta_1size_{i} + \beta_2 site2_i + \beta_3 site3_i + \beta_4 site4_i\)
But we don't care about these specific sites; we want population-wide inference!
Mercury concentration population-wide
Variance introduced by the placement of the nets
\[ Hg_{ij} = \underbrace{(\beta_0 + \underbrace{\gamma_j}_{\text{random intercept}})}_{\text{intercept}} + \underbrace{(\beta_1 + \underbrace{\psi_j}_{\text{random slope}})size_{ij}}_{\text{slope}} + \underbrace{\epsilon_{ij}}_{\text{individual variation}} \]
\(\gamma_j \sim Normal(0,\sigma_\gamma)\)
\(\psi_j \sim Normal(0,\sigma_\psi)\)
\(\epsilon_{ij} \sim Normal(0,\sigma)\)
If we set all random effects to zero, we get the population mean, which is also the expected value for a randomly chosen individual
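A minimal Python sketch of this data-generating process, assuming made-up values for \(\beta_0\), \(\beta_1\), the standard deviations, and the fish-size range, with the four site sample sizes from the example. Because the model is linear (identity link), averaging the site-specific lines over the random effects gives back the fixed-effect line \(\beta_0 + \beta_1 size\). The helper name `simulate_hg` is hypothetical:

```python
import random

# Made-up parameter values; site sample sizes from the example
BETA0, BETA1 = 0.2, 0.01
SD_GAMMA, SD_PSI, SD_EPS = 0.1, 0.002, 0.05
SITE_N = [50, 43, 67, 90]

def simulate_hg(rng):
    """Hypothetical helper: (site, size, Hg) rows with a random intercept and slope per site."""
    rows = []
    for j, n in enumerate(SITE_N):
        gamma_j = rng.gauss(0, SD_GAMMA)  # random intercept for site j
        psi_j = rng.gauss(0, SD_PSI)      # random slope for site j
        for _ in range(n):
            size = rng.uniform(20, 60)    # made-up fish size range
            hg = (BETA0 + gamma_j) + (BETA1 + psi_j) * size + rng.gauss(0, SD_EPS)
            rows.append((j, size, hg))
    return rows

rng = random.Random(3)
rows = simulate_hg(rng)

# Average the site-level line over many draws of the random effects at size = 40
size = 40.0
draws = 100_000
avg = sum((BETA0 + rng.gauss(0, SD_GAMMA)) + (BETA1 + rng.gauss(0, SD_PSI)) * size
          for _ in range(draws)) / draws
print(len(rows), avg, BETA0 + BETA1 * size)
```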
Set nets at the same four sites
Measure 50, 43, 67, and 90 fish, this time for the number of parasites OR the presence of parasites
What assumptions are we breaking?
Linear mixed effects models have a deterministic and a normal stochastic component, which does not fit:
Count data
Binary data (1, 0)
Multinomial response variable
We talked last week about overdispersion
Poisson, negative binomial, Bernoulli, and binomial distributions are not parameterized in terms of separate mean and variance parameters: the variance depends on the mean
The normal distribution is \(Normal(\mu, \sigma)\), with separate mean and variance parameters
The mean and variance of the Poisson are both \(\lambda\)
The mean and variance of the Bernoulli are both functions of p:
\(Mean = p\) and \(variance=p(1-p)\)
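The contrast can be made concrete by writing the variance each family implies at a given mean. A minimal Python sketch; the function name `implied_variance` is hypothetical, the negative binomial uses the \(\mu + \mu^2/\theta\) parameterization, and the \(\theta\) and \(\sigma\) values are made up:

```python
def implied_variance(family, mean, theta=None, sigma=None):
    """Hypothetical helper: variance implied by a distribution family at a given mean."""
    if family == "poisson":
        return mean                      # variance = lambda = mean
    if family == "bernoulli":
        return mean * (1 - mean)         # mean = p, variance = p(1 - p)
    if family == "negbin":
        return mean + mean ** 2 / theta  # grows faster than the mean
    if family == "normal":
        return sigma ** 2                # separate parameter, unrelated to the mean
    raise ValueError(f"unknown family: {family}")

for m in (0.2, 0.5):
    print(m,
          implied_variance("poisson", m),
          implied_variance("bernoulli", m),
          implied_variance("negbin", m, theta=2.0),
          implied_variance("normal", m, sigma=1.0))
```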
How can we run mixed effects models?
Easy way: add random effects to the linear predictor, leading to generalized linear mixed effects models (GLMMs)
Essentially, you have two sources of variation
One is normally distributed (the random effects); the other follows a different distribution (the response, e.g., Poisson)
A Poisson GLM:
\(\log(\lambda_i) = \beta_0 + \beta_1x_i\)
\(y_i \sim Poisson(\lambda_i)\)
A GLMM: Poisson-normal
\(\log(\lambda_{ij}) = (\beta_0+\gamma_j) + (\beta_1+\psi_j)x_{ij}\)
\(\gamma_j \sim N(0,\sigma_\gamma)\), \(\psi_j \sim N(0,\sigma_\psi)\)
\(y_{ij} \sim Poisson(\lambda_{ij})\)
On the response scale, the site effect \(e^{\gamma_j}\) is log-normal (a Poisson-lognormal model)
Not easy to interpret!
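A minimal Python sketch of the Poisson-normal GLMM's data-generating process, assuming made-up coefficients, random-effect standard deviations, and predictor range, with the four site sample sizes from the parasite example. The random effects are drawn once per site and act on the log scale. The helpers `rpois` and `simulate_parasites` are hypothetical:

```python
import math
import random

# Made-up fixed effects (log scale) and random-effect SDs; site sizes from the example
BETA0, BETA1 = 1.0, 0.02
SD_GAMMA, SD_PSI = 0.5, 0.01
SITE_N = [50, 43, 67, 90]

def rpois(lam, rng):
    """Hypothetical helper: one Poisson(lam) draw via Knuth's method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_parasites(rng):
    """Hypothetical helper: (site, x, count) rows from a Poisson-normal GLMM."""
    rows = []
    for j, n in enumerate(SITE_N):
        gamma_j = rng.gauss(0, SD_GAMMA)  # random intercept for site j (log scale)
        psi_j = rng.gauss(0, SD_PSI)      # random slope for site j (log scale)
        for _ in range(n):
            x = rng.uniform(20, 60)       # made-up predictor, e.g. fish size
            log_lam = (BETA0 + gamma_j) + (BETA1 + psi_j) * x
            rows.append((j, x, rpois(math.exp(log_lam), rng)))
    return rows

rows = simulate_parasites(random.Random(5))
for j in range(len(SITE_N)):
    site_counts = [c for s, _, c in rows if s == j]
    print(j, len(site_counts), sum(site_counts) / len(site_counts))
```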
In a GLMM, if we set all random effects to 0, we estimate the mean for a “typical” individual
(“typical” means subject, site, individual, etc.)
The two sources of variance live in different distributions (normal random effects vs. a Poisson or binomial response)
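A typical individual does not equal the “population average response”
The two curves do not line up
This is due to the nonlinear link function (e.g., log or logit): the random effects are normal on the link scale, and averaging does not commute with back-transforming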
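A minimal Python sketch of why the typical-individual and population-average curves differ, assuming made-up values \(\beta_0 = 1\) and \(\sigma_\gamma = 1\) in a random-intercept model. Under the log link, averaging \(e^{\beta_0 + \gamma}\) over \(\gamma \sim N(0, \sigma_\gamma)\) gives \(e^{\beta_0 + \sigma_\gamma^2/2}\), not \(e^{\beta_0}\); under the logit link the population average is pulled toward 0.5 relative to the typical-individual value. The helper name `inv_logit` is hypothetical:

```python
import math
import random

def inv_logit(eta):
    """Hypothetical helper: inverse of the logit link."""
    return 1 / (1 + math.exp(-eta))

rng = random.Random(11)
beta0, sd_gamma, draws = 1.0, 1.0, 200_000  # made-up values
gammas = [rng.gauss(0, sd_gamma) for _ in range(draws)]

# Log link (e.g. Poisson GLMM): typical individual vs. population average
typical_log = math.exp(beta0)
marginal_log = sum(math.exp(beta0 + g) for g in gammas) / draws
closed_form = math.exp(beta0 + sd_gamma ** 2 / 2)

# Logit link (e.g. Bernoulli GLMM)
typical_logit = inv_logit(beta0)
marginal_logit = sum(inv_logit(beta0 + g) for g in gammas) / draws

print(typical_log, marginal_log, closed_form)
print(typical_logit, marginal_logit)
```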
If you fit GLMMs, be very careful about interpretation
The R package GLMMadaptive
estimates marginal (population-averaged) means
How can we run mixed effects models?
Easy way: add random effects to the linear predictor, leading to generalized linear mixed effects models (GLMMs)
Hard way: Generalized Estimating Equations (GEE)