Levi Waldron
The model: \[ Logit(P(x)) = log \left( \frac{P(x)}{1-P(x)} \right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} \]
Random component: \(y_i\) follows a Binomial distribution (outcome is a binary variable)
Systematic component: linear predictor \[ \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} \]
Link function: logit (log of the odds that the event occurs)
\[ g(P(x)) = logit(P(x)) = log\left( \frac{P(x)}{1-P(x)} \right) \]
\[ P(x) = g^{-1}\left( \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} \right) \]
From http://data.princeton.edu/wws509/datasets/#cuse
## age education wantsMore notUsing using
## <25 :4 high:8 no :8 Min. : 8.00 Min. : 4.00
## 25-29:4 low :8 yes:8 1st Qu.: 31.00 1st Qu.: 9.50
## 30-39:4 Median : 56.50 Median :29.00
## 40-49:4 Mean : 68.75 Mean :31.69
## 3rd Qu.: 85.75 3rd Qu.:49.00
## Max. :212.00 Max. :80.00
Univariate regression to “wants more children” only:
fit <- glm(cbind(using, notUsing) ~ wantsMore,
data=cuse, family=binomial("logit"))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -0.1864 | 0.0797 | -2.34 | 0.0194 |
wantsMoreyes | -1.0486 | 0.1107 | -9.48 | 0.0000 |
Diagram of the estimated coefficients in the GLM. The green arrow indicates the Intercept term, which goes from zero to the mean of the reference group (here the ‘wantsMore = no’ samples). The orange arrow indicates the difference in log-odds of the yes group minus the no group, which is negative in this example. The circles show the individual samples, jittered horizontally to avoid overplotting.
There are four age groups:
fit <- glm(cbind(using, notUsing) ~ age,
data=cuse, family=binomial("logit"))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -1.5072 | 0.1303 | -11.57 | 0.0000 |
age25-29 | 0.4607 | 0.1727 | 2.67 | 0.0077 |
age30-39 | 1.0483 | 0.1544 | 6.79 | 0.0000 |
age40-49 | 1.4246 | 0.1940 | 7.35 | 0.0000 |
age25-29
, age30-39
, age40-49
symbol | example | meaning |
---|---|---|
+ | + x | include this variable |
- | - x | delete this variable |
: | x : z | include the interaction |
* | x * z | include these variables and their interactions |
^ | (u + v + w)^3 | include these variables and all interactions up to three way |
1 | -1 | intercept: delete the intercept |
fit <- glm(cbind(using, notUsing) ~ age + wantsMore,
data=cuse, family=binomial("logit"))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -0.8698 | 0.1571 | -5.54 | 0.0000 |
age25-29 | 0.3678 | 0.1754 | 2.10 | 0.0360 |
age30-39 | 0.8078 | 0.1598 | 5.06 | 0.0000 |
age40-49 | 1.0226 | 0.2039 | 5.01 | 0.0000 |
wantsMoreyes | -0.8241 | 0.1171 | -7.04 | 0.0000 |
Interaction is modeled as the product of two covariates: \[ E[y|x] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]
fit <- glm(cbind(using, notUsing) ~ age * wantsMore,
data=cuse, family=binomial("logit"))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -1.4553 | 0.2968 | -4.90 | 0.0000 |
age25-29 | 0.6354 | 0.3564 | 1.78 | 0.0746 |
age30-39 | 1.5411 | 0.3183 | 4.84 | 0.0000 |
age40-49 | 1.7643 | 0.3435 | 5.14 | 0.0000 |
wantsMoreyes | -0.0640 | 0.3303 | -0.19 | 0.8464 |
age25-29:wantsMoreyes | -0.2672 | 0.4091 | -0.65 | 0.5137 |
age30-39:wantsMoreyes | -1.0905 | 0.3733 | -2.92 | 0.0035 |
age40-49:wantsMoreyes | -1.3671 | 0.4834 | -2.83 | 0.0047 |
Matrix notation for the multiple linear regression model:
\[ \, \begin{pmatrix} Y_1\\ Y_2\\ \vdots\\ Y_N \end{pmatrix} = \begin{pmatrix} 1&x_1\\ 1&x_2\\ \vdots\\ 1&x_N \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N \end{pmatrix} \]
or simply:
\[ \mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} \]
group <- factor( c(1, 1, 2, 2) )
model.matrix(~ group)
## (Intercept) group2
## 1 1 0
## 2 1 0
## 3 1 1
## 4 1 1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
What if we forgot to code group as a factor?
group <- c(1, 1, 2, 2)
model.matrix(~ group)
## (Intercept) group
## 1 1 1
## 2 1 1
## 3 1 2
## 4 1 2
## attr(,"assign")
## [1] 0 1
group <- factor(c(1,1,2,2,3,3))
model.matrix(~ group)
## (Intercept) group2 group3
## 1 1 0 0
## 2 1 0 0
## 3 1 1 0
## 4 1 1 0
## 5 1 0 1
## 6 1 0 1
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
group <- factor(c(1,1,2,2,3,3))
group <- relevel(x=group, ref=3)
model.matrix(~ group)
## (Intercept) group1 group2
## 1 1 1 0
## 2 1 1 0
## 3 1 0 1
## 4 1 0 1
## 5 1 0 0
## 6 1 0 0
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
agegroup <- factor(c(1,1,1,1,2,2,2,2))
wantsMore <- factor(c("y","y","n","n","y","y","n","n"))
model.matrix(~ agegroup + wantsMore)
## (Intercept) agegroup2 wantsMorey
## 1 1 0 1
## 2 1 0 1
## 3 1 0 0
## 4 1 0 0
## 5 1 1 1
## 6 1 1 1
## 7 1 1 0
## 8 1 1 0
## attr(,"assign")
## [1] 0 1 2
## attr(,"contrasts")
## attr(,"contrasts")$agegroup
## [1] "contr.treatment"
##
## attr(,"contrasts")$wantsMore
## [1] "contr.treatment"
model.matrix(~ agegroup + wantsMore + agegroup:wantsMore)
## (Intercept) agegroup2 wantsMorey agegroup2:wantsMorey
## 1 1 0 1 0
## 2 1 0 1 0
## 3 1 0 0 0
## 4 1 0 0 0
## 5 1 1 1 1
## 6 1 1 1 1
## 7 1 1 0 0
## 8 1 1 0 0
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$agegroup
## [1] "contr.treatment"
##
## attr(,"contrasts")$wantsMore
## [1] "contr.treatment"
age40-49:wantsMoreyes
in a model with interaction terms:fitX <- glm(cbind(using, notUsing) ~ age * wantsMore,
data=cuse, family=binomial("logit"))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -1.4553 | 0.2968 | -4.90 | 0.0000 |
age25-29 | 0.6354 | 0.3564 | 1.78 | 0.0746 |
age30-39 | 1.5411 | 0.3183 | 4.84 | 0.0000 |
age40-49 | 1.7643 | 0.3435 | 5.14 | 0.0000 |
wantsMoreyes | -0.0640 | 0.3303 | -0.19 | 0.8464 |
age25-29:wantsMoreyes | -0.2672 | 0.4091 | -0.65 | 0.5137 |
age30-39:wantsMoreyes | -1.0905 | 0.3733 | -2.92 | 0.0035 |
age40-49:wantsMoreyes | -1.3671 | 0.4834 | -2.83 | 0.0047 |
The desired contrast is:
age40-49:wantsMoreyes - age30-39:wantsMoreyes
There are many ways to construct this design, one is with library(multcomp)
:
names(coef(fitX))
## [1] "(Intercept)" "age25-29" "age30-39"
## [4] "age40-49" "wantsMoreyes" "age25-29:wantsMoreyes"
## [7] "age30-39:wantsMoreyes" "age40-49:wantsMoreyes"
contmat <- matrix(c(0,0,0,0,0,0,-1,1), 1)
new.interaction <- multcomp::glht(fitX, linfct=contmat)
summary(new.interaction)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: glm(formula = cbind(using, notUsing) ~ age * wantsMore, family = binomial("logit"),
## data = cuse)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## 1 == 0 -0.2767 0.3935 -0.703 0.482
## (Adjusted p values reported -- single-step method)
mutate
, group_by
, and summarize
fit1
. Based on fit1
, write on paper the model for expected probability of using birth control. Don’t forget the logit function.fit1
, what is the expected probability of an individual 25-29 years old, with high education, who wants more children, using birth control? Calculate it manually, and using predict(fit1)
fit1
: Relative to women under 25 who want to have children, what is the predicted increase in odds that a woman 40-49 years old who does not want to have children will be taking birth control?fit1
(no interactions)?