Notes 8

Modeling Proportions with Binomial Distribution | Part 3: Probit Link, Median Effective Dose

Setup

Loading packages

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(GLMsData)
library(broom)

## Warning: package 'broom' was built under R version 4.3.3

Logit link, probit link (Section 9.3)

Recall the binomial, logit link model for the turbines data:

\(y \sim Binomial(\mu, m)\), where
- \(y\) is the proportion of turbines with fissures
- \(\mu\) is the proportion parameter
- \(m\) is the number of turbines
\(\log(\frac{\mu}{1-\mu})=logit(\mu)=\beta_0 + \beta_1 Hours\)

In the systematic component #2, another common link function to use for the proportion parameter \(\mu\) is \(probit(\mu\)):

\(probit(\mu) = \beta_0 + \beta_1 Hours\)

The probit function is the same as the qnorm() for the quantile function for the standard normal distribution. We provide the proportion \(\mu\), and the qnorm() (or probit) function outputs the standard normal value that has a probability of \(\mu\) to the left.

Examples:

qnorm(.2)

## [1] -0.8416212

qnorm(.6)

## [1] 0.2533471

qnorm(.9)

## [1] 1.281552

Pictures:

Why use link functions?

The idea behind link functions for the binomial model is that

they take the proportion \(\mu\), which must take values between 0 and 1
and transform it to the range of values for the linear predictor \(\beta_0 + \beta_1 x\), which can take values between negative infinity to positive infinity.

This is true for both the link functions we have discussed for the binomial model: logit and probit. Fig 9.2 (Section 9.3) illustrates both link functions:

tibble(
  mu = seq(from = .001, to = .999, length=500), 
  logit = log(mu / (1 - mu)), 
  probit = qnorm(mu)
) |> 
  pivot_longer(cols = c(logit, probit), names_to = "Link function", values_to = "Link(mu)") |> 
  ggplot(
    mapping = aes(x = mu, y = `Link(mu)`, color = `Link function`)
  ) + 
  geom_line() + 
  labs(x = "mu, proportion parameter")

Fitting turbines data with probit link

data(turbines)
turbines <- turbines |> 
  mutate(prop = Fissures/Turbines)
turbines

##    Hours Turbines Fissures       prop
## 1    400       39        0 0.00000000
## 2   1000       53        4 0.07547170
## 3   1400       33        2 0.06060606
## 4   1800       73        7 0.09589041
## 5   2200       30        5 0.16666667
## 6   2600       39        9 0.23076923
## 7   3000       42        9 0.21428571
## 8   3400       13        6 0.46153846
## 9   3800       34       22 0.64705882
## 10  4200       40       21 0.52500000
## 11  4600       36       21 0.58333333

fit_probit <- glm(prop  ~ Hours, family=binomial(link="probit"), weights=Turbines, data=turbines)

tidy(fit_probit)

## # A tibble: 2 × 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) -2.28     0.197        -11.5  9.55e-31
## 2 Hours        0.000578 0.0000626      9.24 2.49e-20

Predicted probabilities

augment(fit_probit, type.predict = "response")

## # A tibble: 11 × 9
##      prop Hours `(weights)` .fitted .resid   .hat .sigma .cooksd .std.resid
##     <dbl> <int>       <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
##  1 0        400          39  0.0205 -1.27  0.144   0.996 0.0802      -1.37 
##  2 0.0755  1000          53  0.0448  0.987 0.220   1.03  0.211        1.12 
##  3 0.0606  1400          33  0.0713 -0.245 0.134   1.10  0.00512     -0.263
##  4 0.0959  1800          73  0.108  -0.351 0.272   1.10  0.0305      -0.411
##  5 0.167   2200          30  0.158   0.132 0.0982  1.11  0.00107      0.139
##  6 0.231   2600          39  0.220   0.161 0.116   1.11  0.00196      0.172
##  7 0.214   3000          42  0.294  -1.17  0.130   1.01  0.112       -1.26 
##  8 0.462   3400          13  0.378   0.610 0.0503  1.09  0.0106       0.626
##  9 0.647   3800          34  0.469   2.09  0.181   0.748 0.587        2.31 
## 10 0.525   4200          40  0.561  -0.456 0.297   1.09  0.0627      -0.544
## 11 0.583   4600          36  0.650  -0.824 0.356   1.05  0.298       -1.03

Compare fits under logit and probit links

First, fitting the logit model and collecting the predicted probabilities under both the logit and probit models:

fit_logit <- glm(prop  ~ Hours, family=binomial(link="logit"), weights=Turbines, data=turbines)

fitted_logit <- augment(fit_logit, type.predict = "response") |> 
  rename(logit = .fitted) |> 
  select(prop, Hours, logit)

fitted_probit <- augment(fit_probit, type.predict = "response") |> 
  rename(probit = .fitted) |> 
  select(prop, Hours, probit)

Then merging these probabilities into a single dataset and plotting (along with the observed proportions):

inner_join(fitted_logit, fitted_probit, by = join_by(Hours, prop)) |> 
  pivot_longer(cols = c("logit", "probit"), names_to = "link_fcn", values_to = "fitted") |> 
  ggplot(
    mapping = aes(x = Hours, y = prop, color = link_fcn)
  ) + 
  geom_point(color = "black") + 
  geom_line(aes(y = fitted))

We see that the predicted probabilities under the logit and probit link functions are very similar.

Section 9.6 - Median Effective Dose

(Basically, given a probability - solving for x.)

From text: “Binomial GLMs are commonly used to examine the relationship between the dose \(d\) of a drug or poison and the proportion \(y\) of insects (or plants, or animals) that survive. These models are called dose–response models. Associated with these experiments is the concept of the median effective dose, ED50: the dose of poison affecting 50% of the insects.

We use the systematic component of the model to find this dose \(d\):

\[g(\mu) = \beta_0 + \beta_1 d,\]

where \(g()\) is the link function (either logit or probit). After fitting the model, we know the maximum likelihood estimates of the beta parameters, \(\hat{\beta}_0\) and \(\hat{\beta}_1\). We can then use algebra to solve for \(d\), the dose that affects a proportion of \(\mu\) of the insects:

\[ d = \frac{g(\mu) - \hat{\beta}_0}{\hat{\beta}_1} \]

`turbines` example

To calculate the run time for which 50% of turbines would be expected to experience fissures:

Under the logit model: \(g(\mu) = \log\frac{\mu}{1-\mu}\)

mu <- .5
num <- log(mu/(1-mu)) - tidy(fit_logit)$estimate[1] # num: numerator of fraction
num / tidy(fit_logit)$estimate[2]

## [1] 3926.592

Under the probit model: \(g(\mu) = qnorm(\mu)\)

mu <- .5
num <- qnorm(mu) - tidy(fit_probit)$estimate[1] # num: numerator of fraction
num / tidy(fit_probit)$estimate[2]

## [1] 3935.197

These numbers should seem reasonable based on our graph above.

We could of course calculate the run time for any percent of failure rate, say 20% (as long as it’s reasonable based on the data):

Under the logit model:

# calculate
mu <- .2
num <- log(mu/(1-mu)) - tidy(fit_logit)$estimate[1] # num: numerator of fraction
num / tidy(fit_logit)$estimate[2]

## [1] 2539.239

Under the probit model:

# calculate
mu <- .2
num <- qnorm(mu) - tidy(fit_probit)$estimate[1] # num: numerator of fraction
num / tidy(fit_probit)$estimate[2]

## [1] 2479.913

Notes 8

Modeling Proportions with Binomial Distribution | Part 3: Probit Link, Median Effective Dose

Notes 8

Setup

Logit link, probit link (Section 9.3)

Why use link functions?

Fitting turbines data with probit link

Compare fits under logit and probit links

Section 9.6 - Median Effective Dose

turbines example

`turbines` example