Exercise J-2.4 (adapted from Jennrich 1995)

In this problem we assume the number of tumor incidences \(y_i\) follows a binomial distribution whose incidence probability satisfies an Armitage-Doll model of the form \[ \pi(\beta) = 1 - \exp(-\beta_0 - \beta_1x - \beta_2x^2) \]

We fit the expectations by iteratively reweighted least squares to obtain the maximum likelihood estimates. In other words, we fit the model \[ y_i = \mu_i(\beta) + \varepsilon_i \] with weights \[ w_i = \frac{1}{\sigma^2_i(\beta)} \]

Since \(y_i\) follows a \(Binomial(n_i, \pi_i(\beta))\) distribution, it follows that

\[ \mu_i(\beta) = n_i(1 - \exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2))\\ \sigma^2_i(\beta) = n_i(1 - \exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2))(\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2)) \] The Gauss-Newton direction for the weighted least squares problem is \[ d = (J^{T}WJ)^{-1}J^{T}W(y-f) \] where \(J\) is the \(n \times p\) Jacobian matrix of \(\mu(\beta)\), \(W\) is the \(n \times n\) diagonal matrix of weights \(w_i\), and \(f = \mu(\beta)\) is the vector of fitted means evaluated at the current \(\beta\).
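For concreteness, a single iteration of this scheme can be sketched in a few lines of R (illustrative only; J, W, y, and f stand for the quantities defined above and are computed by a helper function later in this writeup):

gn.step = function(beta, J, W, y, f) {
    # solve the weighted normal equations (J'WJ) d = J'W(y - f) for the direction d
    d = solve(t(J) %*% W %*% J, t(J) %*% W %*% (y - f))
    beta + d
}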

The first column of the Jacobian will have the form (for rows \(i=1,2,\dots,n\)) \[ \frac{\partial \mu_i}{\partial \beta_0} = n_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \] The second column of the Jacobian will have the form (for rows \(i=1,2,\dots,n\)) \[ \frac{\partial \mu_i}{\partial \beta_1} = x_in_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \] The third column of the Jacobian will have the form (for rows \(i=1,2,\dots,n\)) \[ \frac{\partial \mu_i}{\partial \beta_2} = x_i^2n_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \]

Before implementing our algorithm we show how to find initial values. The incidence probability equation \(\pi(\beta)\) can be rearranged and log-transformed to give \[ \beta_0 + \beta_1x + \beta_2x^2 = -\log(1-\pi) \] A system of equations can be formed from the data given in the problem. In particular, letting \(x = 0\) and \(\pi = 4/111\) gives \(\beta_0 = 0.0367\). We can then form a \(2 \times 2\) system in \(\beta_1\) and \(\beta_2\) by choosing the pair of points \((x,\pi)\) as \((2,\frac{4}{105})\) and \((250, \frac{60}{90})\). Solving this system gives \(\beta_1 = 0.039\) and \(\beta_2 = -0.0001339\); these only need to be rough starting values. One can check on a graphing calculator such as Desmos, or directly in R as sketched below, that these parameter values satisfy \(0 < \pi(\beta) < 1\) for \(x \in [0,250]\).
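A minimal sketch of that check in R, using the starting values above:

beta.check = c(0.0367, 0.039, -0.0001339)
x.check = seq(0, 250, length.out = 1001)      # grid of doses over [0, 250]
pi.check = 1 - exp(-(beta.check[1] + beta.check[2] * x.check + beta.check[3] * x.check^2))
range(pi.check)                               # both endpoints should lie strictly in (0, 1)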

Therefore, an initial value for our algorithm will be \(\beta_{initial} = (0.0367, 0.039, -0.0001339)^{T}\).

Exercise J-2.4 Part A

Dose = c(0, 2, 10, 50, 250)
AnimalsTest = c(111, 105, 124, 104, 90)
TumorInc = c(4, 4, 11, 13, 60)
data = data.frame(Dose, AnimalsTest, TumorInc)
data = as.matrix(data)
n = nrow(data)

n.vec = data[, 2]                                          # animals tested per dose group
y = as.matrix(data[, 3])                                   # observed tumor counts
X = cbind(matrix(1, n, ncol = 1), data[, 1], data[, 1]^2)  # columns: 1, x, x^2

# calculates function approximation, Jacobian, and Weight matrix
fJW <- function(beta, X, n.vec) {
    xb = X %*% beta
    mu = n.vec * (1 - exp(-xb))                  # binomial mean n_i * pi_i
    sigma = n.vec * (1 - exp(-xb)) * exp(-xb)    # binomial variance n_i * pi_i * (1 - pi_i)
    
    # Jacobian
    col1 = n.vec * exp(-xb)
    col2 = X[, 2] * n.vec * exp(-xb)
    col3 = X[, 3] * n.vec * exp(-xb)
    J = cbind(col1, col2, col3)
    
    # Weight matrix
    W = diag(1/as.vector(sigma))
    list(f = mu, J = J, W = W)
}
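Before plugging fJW into the Gauss-Newton loop, it can be sanity-checked at the starting value (a quick illustrative check; the returned pieces should have dimensions 5 x 1, 5 x 3, and 5 x 5):

chk = fJW(matrix(c(0.0367, 0.039, -0.0001339), ncol = 1), X, n.vec)
str(chk)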


# Gauss-Newton algorithm
GN = function(y, X, beta0, maxit, tolerr, n.vec) {
    
    for (it in 1:maxit) {
        a = fJW(beta0, X, n.vec)
        f = a$f
        J = a$J
        Wt = a$W
        
        JW = t(J) %*% Wt
        print(sprintf("it = %3.0f   b0 = %6.6f b1 = %6.6f b2 = %6.6f  grad=%2.2e", 
            it, beta0[1, ], beta0[2, ], beta0[3, ], norm(JW %*% (y - f))))
        
        dir = solve(JW %*% J) %*% JW %*% (y - f)
        beta.new = beta0 + dir
        
        c1 = pmax(1, abs(beta.new))
        c2 = abs(beta.new - beta0)
        mre = max(c2/c1)
        if (mre < tolerr) {
            print("We reached convergence with modified relative error < tolerr")
            break
        }
        # update variable for next iteration
        beta0 = beta.new
    }
    return(beta.new)
}
beta.init = matrix(c(0.0367, 0.039, -0.0001339), ncol = 1)
n.vec = data[, 2]
beta.mle = GN(y, X, beta0 = beta.init, maxit = 100, tolerr = 10^-6, n.vec = n.vec)
## [1] "it =   1   b0 = 0.036700 b1 = 0.039000 b2 = -0.000134  grad=2.15e+05"
## [1] "it =   2   b0 = 0.048169 b1 = -0.016497 b2 = 0.000081  grad=4.74e+05"
## [1] "it =   3   b0 = 0.034194 b1 = -0.002593 b2 = 0.000028  grad=1.50e+06"
## [1] "it =   4   b0 = 0.065465 b1 = 0.000181 b2 = 0.000014  grad=1.97e+05"
## [1] "it =   5   b0 = 0.048401 b1 = 0.001417 b2 = 0.000011  grad=1.04e+04"
## [1] "it =   6   b0 = 0.046065 b1 = 0.001650 b2 = 0.000010  grad=1.47e+03"
## [1] "it =   7   b0 = 0.045631 b1 = 0.001689 b2 = 0.000010  grad=2.39e+02"
## [1] "it =   8   b0 = 0.045556 b1 = 0.001696 b2 = 0.000010  grad=4.17e+01"
## [1] "it =   9   b0 = 0.045543 b1 = 0.001697 b2 = 0.000010  grad=7.20e+00"
## [1] "it =  10   b0 = 0.045541 b1 = 0.001697 b2 = 0.000010  grad=1.25e+00"
## [1] "We reached convergence with modified relative error < tolerr"
beta.mle
##              [,1]
## [1,] 4.554026e-02
## [2,] 1.697068e-03
## [3,] 9.681713e-06

Therefore, the MLE estimates are: \[ \hat{\beta}_0 \approx 0.045541\\ \hat{\beta}_1 \approx 0.001697\\ \hat{\beta}_2 \approx 0.00000968 \]
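If approximate standard errors are also wanted, a standard byproduct of IRLS is the inverse of \(J^{T}WJ\) evaluated at the MLE, which approximates the covariance matrix of \(\hat{\beta}\). A small sketch, reusing fJW and beta.mle from above:

at.mle = fJW(beta.mle, X, n.vec)
cov.approx = solve(t(at.mle$J) %*% at.mle$W %*% at.mle$J)   # approximate covariance of the MLE
sqrt(diag(cov.approx))                                      # approximate standard errors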

Exercise J-2.4 Part B

x.plot = seq(0, 250, length.out = 251)      # grid of doses for plotting
pi.prob = rep(0, length(x.plot))            # fitted incidence probabilities

for (i in 1:length(x.plot)) {
    current.x = x.plot[i]
    val = beta.mle[1, ] + beta.mle[2, ] * current.x + beta.mle[3, ] * current.x^2
    pi.prob[i] = 1 - exp(-val)
}
plot(x.plot, pi.prob, xlab = "Dose (ppm)", ylab = "Prob. of Tumor Incidence")
observed.probs = TumorInc/AnimalsTest
points(Dose, observed.probs, col = "red")

Exercise J-2.5

As in the previous problem, we fit the expectations by iteratively reweighted least squares to obtain the maximum likelihood estimates. Since the \(y_i\) are \(Bernoulli(\pi_i(\beta))\), we have the following: \[ y_i = \mu_i(\beta) + \varepsilon_i\\ \mu_i(\beta) = \pi_i(\beta) = \frac{\exp(x_i^{T}\beta)}{1+\exp(x_i^{T}\beta)} \] and weights \[ w_i = \frac{1}{\sigma^2_i(\beta)}\\ \sigma^2_i(\beta) = \frac{\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \]

We need the partial derivatives of \(\mu(\beta)\), which means we need the partial derivatives of \(\pi(\beta)\). Using the method of differentials (as on Exam 1), we arrive at the following:

\[ d\pi(d\beta) = \frac{(x_i^{T}\,d\beta)\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \] where \(x_i^{T} = (1, \log(D_i), S_i)\). Then we have the following three columns of the Jacobian (for rows \(i=1,2,\dots, n=659\)) \[ \frac{\partial \pi}{\partial \beta_1} = \frac{\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2}\\ \frac{\partial \pi}{\partial \beta_2} = \frac{\log(D_i)\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2}\\ \frac{\partial \pi}{\partial \beta_3} = \frac{S_i\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \]
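These derivatives can be spot-checked numerically with a finite difference at an arbitrary test point (the values below are illustrative only, not taken from the data):

pi.fun = function(beta, x) exp(sum(x * beta))/(1 + exp(sum(x * beta)))
beta.test = c(-9, 3, 5)
x.test = c(1, log(30), 0.4)        # (1, log(D), S) at an arbitrary test point
eps = 1e-06
exact.deriv = x.test[2] * exp(sum(x.test * beta.test))/(1 + exp(sum(x.test * beta.test)))^2
fd.deriv = (pi.fun(beta.test + c(0, eps, 0), x.test) - pi.fun(beta.test, x.test))/eps
c(exact.deriv, fd.deriv)           # the two values should agree closely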

Exercise J-2.5 Part A

In Exam 1 we were given the optimal solution \(\beta^{*} = (-9.562079, 3.197561, 4.5085)^{T}\). From this we know two things: (i) an easy initial value, and (ii) what our algorithm should converge to.

Therefore, to make our lives easy, we will start with \(\beta_{initial} = (-9, 3, 5)^{T}\).

data = read.table("blowBF.txt", header = TRUE)
n = nrow(data)
X = cbind(rep(1, n), log(data[, 1]), data[, 2])   # columns: 1, log(D), S
y = matrix(data[, 3], ncol = 1)                   # blowdown indicator (0/1)

fJW <- function(beta, X) {
    xb = X %*% beta
    mu = exp(xb)/(1 + exp(xb))          # Bernoulli mean pi_i
    sigma = exp(xb)/(1 + exp(xb))^2     # Bernoulli variance pi_i * (1 - pi_i)
    
    # Jacobian
    col1 = sigma
    col2 = X[, 2] * sigma
    col3 = X[, 3] * sigma
    J = cbind(col1, col2, col3)
    
    # Weight matrix
    W = diag(1/as.vector(sigma))
    list(f = mu, J = J, W = W)
}


GN = function(y, X, beta0, maxit, tolerr) {
    
    for (it in 1:maxit) {
        a = fJW(beta0, X)
        f = a$f
        J = a$J
        Wt = a$W
        
        JW = t(J) %*% Wt
        print(sprintf("it = %3.0f   b0 = %6.6f b1 = %6.6f b2 = %6.6f  grad=%2.2e", 
            it, beta0[1, ], beta0[2, ], beta0[3, ], norm(JW %*% (y - f))))
        
        dir = solve(JW %*% J) %*% JW %*% (y - f)
        beta.new = beta0 + dir
        
        c1 = pmax(1, abs(beta.new))
        c2 = abs(beta.new - beta0)
        mre = max(c2/c1)
        if (mre < tolerr) {
            print("We reached convergence using mre < tolerr")
            break
        }
        # update variable for next iteration
        beta0 = beta.new
    }
    return(beta.new)
}
beta.init = matrix(c(-9, 3, 5), ncol = 1)
a = GN(y, X, beta0 = beta.init, maxit = 100, tolerr = 10^-6)
## [1] "it =   1   b0 = -9.000000 b1 = 3.000000 b2 = 5.000000  grad=1.07e+02"
## [1] "it =   2   b0 = -9.361238 b1 = 3.126904 b2 = 4.415580  grad=5.05e-01"
## [1] "it =   3   b0 = -9.558501 b1 = 3.196318 b2 = 4.506854  grad=1.02e-02"
## [1] "it =   4   b0 = -9.562084 b1 = 3.197563 b2 = 4.508593  grad=3.56e-06"
## [1] "We reached convergence using mre < tolerr"
a
##           [,1]
## [1,] -9.562085
## [2,]  3.197564
## [3,]  4.508593

As we can see from the R console printout, we obtain the MLE estimates \[ \hat{\beta}_1 \approx -9.562085\\ \hat{\beta}_2 \approx 3.197564\\ \hat{\beta}_3 \approx 4.508593 \]
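As a cross-check, the same maximum likelihood estimates can be obtained from R's built-in logistic regression routine (a sketch, assuming the design matrix X and response y constructed above):

fit.glm = glm(as.vector(y) ~ X[, 2] + X[, 3], family = binomial)
coef(fit.glm)   # should closely match the estimates above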

Exercise J-2.5 Part B

# Part b
library(plot3D)
b1 = a[1, ]
b2 = a[2, ]
b3 = a[3, ]

x1 = seq(0, 1, length.out = 50)  #S
x2 = seq(min(X[, 2]), max(X[, 2]), length.out = 50)  #logD

# x1:= S x2:= logD
pi.beta <- function(x1, x2) {
    num = exp(b1 + b2 * x2 + b3 * x1)
    denom = 1 + num
    z = num/denom
    z
}

z.out = outer(x1, x2, pi.beta)
persp3D(x1, x2, z.out, xlab = "S", ylab = "logD", theta = 45, phi = 30, alpha = 0.5)
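A flat view of the same surface can be easier to read; a base-R alternative over the same grid (no extra packages required) is:

contour(x1, x2, z.out, xlab = "S", ylab = "logD")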

Exercise J-2.5 Part C

# Part c
predict.prob = pi.beta(0.3, log(10))
predict.prob
## [1] 0.3000951

Therefore, our model predicts that a tree with a diameter of 10 located in an area with wind severity 0.3 has probability 0.3000951 (about 0.30) of blowing down.
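The same function can be reused for other inputs; for example, predicted probabilities at a few illustrative diameters with the same wind severity (values chosen only for illustration):

diam = c(5, 10, 20, 40)                     # illustrative diameters
round(pi.beta(0.3, log(diam)), 4)           # predicted blowdown probabilities at S = 0.3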