Notes 7
Modeling Proportions with Binomial Distribution | Part 2: Interpretations of Beta Parameters for Logit Link Function
Setup
Loading packages
Section 9.5 - Odds, Odds Ratios, and the Logit Link
Using the logit link allows for useful interpretation of regression coefficients. Recall the systematic component of the model is
\[logit(\mu) = \log(\frac{\mu}{1-\mu})=\beta_0 + \beta_1 x,\]
where \(\mu\) is the probability of “success”.
Odds
The odds is defined as the ratio of the probability of success to the probability of failure: \(\mu/(1 - \mu)\).
Example: if the probability that a turbine develops fissures is 0.6, the odds that a turbine develops fissures is
## [1] 1.5
Interpretation of odds: The probability of observing fissures is 1.5 times the probability of not observing fissures.
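The odds calculation above can be reproduced with a one-line helper. This is a minimal sketch; the function names `odds` and `odds_to_prob` are assumptions introduced for illustration and do not come from the original notes.

```r
# Hypothetical helpers (names are illustrative, not from the notes).
# Convert a probability of success to odds, and back again.
odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

turbine_odds <- odds(0.6)   # probability of fissures = 0.6
turbine_odds

# Going back from odds to probability recovers the original 0.6.
odds_to_prob(turbine_odds)
```

The second helper is handy later, when fitted log-odds need to be turned back into probabilities.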
So the logit model says that
\(\log(\text{odds}) = \beta_0 + \beta_1 x\),
or, equivalently, \(\text{odds} = \exp(\beta_0){\exp(\beta_1)}^x\).
Interpretation of beta coefficients
As \(x\) increases by one unit:
- the log-odds change additively by \(\beta_1\) (an increase if \(\beta_1 > 0\), a decrease if \(\beta_1 < 0\));
- the odds change multiplicatively by a factor of \(\exp(\beta_1)\).
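The multiplicative interpretation follows directly from the odds form of the model. As a short worked step, compare the odds at \(x + 1\) with the odds at \(x\):

\[\frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{\exp(\beta_0)\exp(\beta_1)^{x+1}}{\exp(\beta_0)\exp(\beta_1)^{x}} = \exp(\beta_1).\]

The same argument for a change of \(k\) units gives a factor of \(\exp(\beta_1)^k = \exp(k\beta_1)\), which does not depend on the starting value of \(x\).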
Example 9.3 - turbines
## Hours Turbines Fissures prop
## 1 400 39 0 0.00000000
## 2 1000 53 4 0.07547170
## 3 1400 33 2 0.06060606
## 4 1800 73 7 0.09589041
## 5 2200 30 5 0.16666667
## 6 2600 39 9 0.23076923
## 7 3000 42 9 0.21428571
## 8 3400 13 6 0.46153846
## 9 3800 34 22 0.64705882
## 10 4200 40 21 0.52500000
## 11 4600 36 21 0.58333333
Fit the binomial logit model:
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -3.92 0.378 -10.4 3.03e-25
## 2 Hours 0.000999 0.000114 8.75 2.07e-18
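The code that produced this table is not shown in the notes. The following is a minimal sketch of one way to fit the model with base R's `glm()`, assuming the data are typed in by hand from the table above. The object names `turbines` and `tr_fit` are assumptions.

```r
# Data entered by hand from the printed table (assumption: the notes
# load the same values from a package or file that is not shown).
turbines <- data.frame(
  Hours    = c(400, 1000, 1400, 1800, 2200, 2600, 3000, 3400, 3800, 4200, 4600),
  Turbines = c(39, 53, 33, 73, 30, 39, 42, 13, 34, 40, 36),
  Fissures = c(0, 4, 2, 7, 5, 9, 9, 6, 22, 21, 21)
)

# Binomial GLM with logit link: the response is the proportion with
# fissures, and the number of turbines in each group is the prior weight.
tr_fit <- glm(
  Fissures / Turbines ~ Hours,
  family  = binomial(link = "logit"),
  weights = Turbines,
  data    = turbines
)

# Coefficient table (estimate, standard error, z statistic, p-value).
coef(summary(tr_fit))
```

An equivalent specification uses `cbind(Fissures, Turbines - Fissures)` as the response and drops the `weights` argument.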
Interpretations:
- Increasing Hours by one hour multiplies the odds of a turbine developing fissures by a factor of
## [1] 1.001
As a percent increase in the odds:
## [1] 0.09997366
- The interpretation is more useful on the scale of the data if we consider increasing Hours by 1000, which multiplies the odds of a turbine developing fissures by a factor of about 2.7:
## [1] 2.716209
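What about decreasing Hours by 1000? The odds of a turbine developing fissures are multiplied by \(\exp(-1000\hat\beta_1)\), where \(\hat\beta_1\) is the estimated Hours coefficient. This is the reciprocal of the factor for a 1000-hour increase: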
## [1] 0.3681602
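The factors above all come from exponentiating a multiple of the Hours coefficient. Here is a minimal sketch using the rounded estimate from the coefficient table; because the estimate is rounded, the results match the printed output only to about three significant figures. The variable names are assumptions.

```r
# Rounded Hours estimate copied from the coefficient table (assumption:
# the notes use the unrounded value stored in the fitted model object).
b1 <- 0.000999

or_one_hour <- exp(b1)               # odds factor for +1 hour
pct_one_hour <- 100 * (exp(b1) - 1)  # the same change as a percent
or_up_1000 <- exp(1000 * b1)         # odds factor for +1000 hours
or_down_1000 <- exp(-1000 * b1)      # odds factor for -1000 hours

c(or_one_hour, pct_one_hour, or_up_1000, or_down_1000)

# Increasing and decreasing by the same amount give reciprocal factors.
or_up_1000 * or_down_1000
```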
Odds ratio for categorical explanatory variables
Example 9.4 - germ
See Section 9.5, p. 342
## Germ Total Extract Seeds
## 1 10 39 Bean OA75
## 2 23 62 Bean OA75
## 3 23 81 Bean OA75
## 4 26 51 Bean OA75
## 5 17 39 Bean OA75
## 6 5 6 Cucumber OA75
## 7 53 74 Cucumber OA75
## 8 55 72 Cucumber OA75
## 9 32 51 Cucumber OA75
## 10 46 79 Cucumber OA75
## 11 10 13 Cucumber OA75
## 12 8 16 Bean OA73
## 13 10 30 Bean OA73
## 14 8 28 Bean OA73
## 15 23 45 Bean OA73
## 16 0 4 Bean OA73
## 17 3 12 Cucumber OA73
## 18 22 41 Cucumber OA73
## 19 15 30 Cucumber OA73
## 20 32 51 Cucumber OA73
## 21 3 7 Cucumber OA73
- Response: proportion of Total seeds that germinated (Germ)
- Explanatory variables: seed type (Seeds), root stock (Extract)
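# Observed proportion of seeds that germinated in each group
germ <- germ |>
  mutate(prop = Germ / Total)

# Proportions by extract, one panel per seed type; point size shows group size
ggplot(
  data = germ,
  mapping = aes(x = Extract, y = prop)
) +
  geom_boxplot() +
  geom_jitter(mapping = aes(size = Total), width = 0.2, alpha = 0.5) +
  facet_wrap(~Seeds)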
Write the binomial model for proportions with a logit link (assume no interaction between seed type and root stock):
\(y\sim Binomial(\mu, m)\)
\(logit(\mu) = \beta_0 + \beta_1 I(Extract = Cucumber)+ \beta_2 I(Seeds = OA75)\)
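where \(\mu\) is the probability that a seed germinates, \(m\) is the number of seeds in the group (Total), and \(I(\cdot)\) equals 1 when its condition holds and 0 otherwise.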
Fit the model:
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.700 0.151 -4.65 3.36e- 6
## 2 ExtractCucumber 1.06 0.144 7.38 1.55e-13
## 3 SeedsOA75 0.270 0.155 1.75 8.04e- 2
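The fitting code is not shown in the notes. The following is a minimal sketch of one way to reproduce the table with base R, assuming the data are typed in from the listing above. The object name `germ_fit` is an assumption.

```r
# Data entered by hand from the printed listing (rows 1-21, in order).
germ <- data.frame(
  Germ    = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10,
              8, 10, 8, 23, 0, 3, 22, 15, 32, 3),
  Total   = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13,
              16, 30, 28, 45, 4, 12, 41, 30, 51, 7),
  Extract = rep(c("Bean", "Cucumber", "Bean", "Cucumber"), times = c(5, 6, 5, 5)),
  Seeds   = rep(c("OA75", "OA73"), times = c(11, 10))
)

# Additive (no-interaction) binomial GLM with logit link.
# Character columns become factors with alphabetical levels, so Bean and
# OA73 are the reference categories, matching the term names in the table.
germ_fit <- glm(
  Germ / Total ~ Extract + Seeds,
  family  = binomial(link = "logit"),
  weights = Total,
  data    = germ
)

coef(summary(germ_fit))
```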
Fitted equation for \(logit(\mu)\) for different categories:
\(logit(\mu) = -0.700 + 1.065 I(Extract = Cucumber)+ 0.270 I(Seeds = OA75)\)
Extract=Bean, Seeds=OA73
## [1] -0.7
Extract=Bean, Seeds=OA75
## [1] -0.43
Extract=Cucumber, Seeds=OA73
## [1] 0.365
Extract=Cucumber, Seeds=OA75
## [1] 0.635
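The four log-odds values above are the fitted equation evaluated with each indicator switched on or off. Here is a minimal sketch using the rounded coefficients from the fitted equation; the names `b` and `combos` are assumptions.

```r
# Rounded coefficients copied from the fitted equation.
b <- c(intercept = -0.700, cucumber = 1.065, oa75 = 0.270)

# All four extract/seed combinations (Extract varies fastest).
combos <- expand.grid(
  Extract = c("Bean", "Cucumber"),
  Seeds   = c("OA73", "OA75")
)

# Logical comparisons act as 0/1 indicator variables in arithmetic.
combos$log_odds <- b[["intercept"]] +
  b[["cucumber"]] * (combos$Extract == "Cucumber") +
  b[["oa75"]] * (combos$Seeds == "OA75")

combos
```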
Interpret the ExtractCucumber slope as an odds ratio: The odds of seed germination using cucumber extract are about 2.90 times the odds using bean extract, assuming the type of seed is held constant.
## [1] 2.900839
Interpret the SeedsOA75 slope as an odds ratio: The odds of seed germination using OA75 seeds are about 1.31 times the odds using OA73 seeds, assuming the extract is held constant.
## [1] 1.309964
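Both odds ratios are the exponentiated coefficients. The sketch below uses the rounded estimates and also checks what "held constant" means here: the cucumber-to-bean odds ratio is the same within each seed type. Variable names are assumptions.

```r
# Odds ratios from the rounded coefficients in the fitted equation.
or_extract <- exp(1.065)   # cucumber vs. bean extract
or_seeds <- exp(0.270)     # OA75 vs. OA73 seeds
c(or_extract, or_seeds)

# Fitted odds for each combination, from the log-odds listed earlier.
odds_bean_73 <- exp(-0.700)
odds_cuc_73 <- exp(0.365)
odds_bean_75 <- exp(-0.430)
odds_cuc_75 <- exp(0.635)

# Without an interaction term, the extract odds ratio does not depend
# on the seed type: both ratios equal exp(1.065).
c(odds_cuc_73 / odds_bean_73, odds_cuc_75 / odds_bean_75)
```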
What are the fitted probabilities of germination for the four extract/seed combinations?
## # A tibble: 4 × 3
## Extract Seeds .fitted
## <fct> <fct> <dbl>
## 1 Bean OA75 0.394
## 2 Cucumber OA75 0.654
## 3 Bean OA73 0.332
## 4 Cucumber OA73 0.590
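These probabilities are the inverse logit of the fitted log-odds. As a minimal sketch, base R's `plogis()` applies \(\mu = \exp(\eta)/(1 + \exp(\eta))\) to the four log-odds values computed earlier; the vector name `log_odds` is an assumption.

```r
# Fitted log-odds for each combination, in the order of the table above.
log_odds <- c(
  "Bean, OA75"     = -0.430,
  "Cucumber, OA75" =  0.635,
  "Bean, OA73"     = -0.700,
  "Cucumber, OA73" =  0.365
)

# plogis() is the logistic distribution function, i.e. the inverse logit.
fitted_prob <- plogis(log_odds)
round(fitted_prob, 3)
```

With a fitted model object available, `predict(fit, newdata = ..., type = "response")` should give the same probabilities without rounding the coefficients first.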