In this problem we assume the number of tumor incidences \(y_i\) follows a binomial distribution with incidence probability satisfying an Armitage-Doll model of the form \[ \pi(\beta) = 1 - \exp(-\beta_0 - \beta_1x - \beta_2x^2) \]
We will fit the expectations using iteratively reweighted least squares (IRLS) to obtain the maximum likelihood estimates. In other words, we fit the model \[ y_i = \mu_i(\beta) + \varepsilon_i \] with weights \[ w_i = \frac{1}{\sigma^2_i(\beta)} \]
Since \(y_i\) follows a \(\text{Binomial}(n_i, \pi_i(\beta))\) distribution, it follows that
\[ \mu_i(\beta) = n_i(1 - \exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2))\\ \sigma^2_i(\beta) = n_i(1 - \exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2))(\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2)) \] The Gauss-Newton direction for the weighted least squares problem is \[ d = (J^{T}WJ)^{-1}J^{T}W(y-f) \] where \(J\) is the \(n \times p\) Jacobian matrix of \(\mu(\beta)\), \(W\) is the \(n \times n\) diagonal matrix of weights \(w_i\), and \(f = \mu(\beta)\) is the vector of fitted means approximating the response \(y\).
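To make the update concrete, here is a minimal sketch of a single weighted Gauss-Newton step (the helper gn.step is ours for illustration; the GN function below is what we actually run):
# One weighted Gauss-Newton step: d = (J'WJ)^{-1} J'W (y - f)
gn.step <- function(J, W, y, f) {
  JW = t(J) %*% W
  solve(JW %*% J, JW %*% (y - f))
}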
The first column of the Jacobian has the form (for rows \(i=1,2,\dots,n\)) \[ \frac{\partial \mu_i}{\partial \beta_0} = n_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \] The second column has the form \[ \frac{\partial \mu_i}{\partial \beta_1} = x_in_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \] and the third column \[ \frac{\partial \mu_i}{\partial \beta_2} = x_i^2n_i\exp(-\beta_0 - \beta_1x_i - \beta_2x_i^2) \] Before implementing the algorithm we show how to find initial values. Rearranging the incidence probability \(\pi(\beta)\) and taking the natural logarithm of both sides gives \[ \beta_0 + \beta_1x + \beta_2x^2 = -\log(1-\pi) \] A system of equations can then be formed from the data given in the problem. In particular, letting \(x = 0\) and \(\pi = 4/111\) gives \(\beta_0 = 0.0367\). We can then form a \(2 \times 2\) system in \(\beta_1\) and \(\beta_2\) by choosing the pairs \((x,\pi)\) as \((2,\frac{4}{105})\) and \((250, \frac{60}{90})\), which yields \(\beta_1 = 0.039\) and \(\beta_2 = -0.0001339\). One can check on a graphing calculator, such as Desmos.com, that these values keep \(0 < \pi(\beta) < 1\) for \(x \in [0,250]\).
Therefore, the initial value for our algorithm will be \(\beta_{initial} = (0.0367, 0.039, -0.0001339)^{T}\).
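The same check can also be done directly in R; a quick sketch (the names b and x.check are ours for illustration):
# Sanity check: the starting values keep pi(beta) strictly inside (0, 1)
# over the dose range [0, 250]
b = c(0.0367, 0.039, -0.0001339)
x.check = seq(0, 250, by = 1)
range(1 - exp(-(b[1] + b[2] * x.check + b[3] * x.check^2)))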
# Data: dose (ppm), animals tested, and tumor incidences per dose group
Dose = c(0, 2, 10, 50, 250)
AnimalsTest = c(111, 105, 124, 104, 90)
TumorInc = c(4, 4, 11, 13, 60)
data = as.matrix(data.frame(Dose, AnimalsTest, TumorInc))
n = nrow(data)
n.vec = data[, 2]         # group sizes n_i
y = as.matrix(data[, 3])  # tumor counts y_i
X = cbind(rep(1, n), data[, 1], data[, 1]^2)  # design columns: 1, x, x^2
# calculates the fitted means, Jacobian, and weight matrix
fJW <- function(beta, X, n.vec) {
  xb = X %*% beta
  mu = n.vec * (1 - exp(-xb))                # mu_i = n_i pi_i
  sigma = n.vec * (1 - exp(-xb)) * exp(-xb)  # sigma^2_i = n_i pi_i (1 - pi_i)
  # Jacobian: columns are the partials of mu with respect to beta_0, beta_1, beta_2
  col1 = n.vec * exp(-xb)
  col2 = X[, 2] * n.vec * exp(-xb)
  col3 = X[, 3] * n.vec * exp(-xb)
  J = cbind(col1, col2, col3)
  # Weight matrix: w_i = 1 / sigma^2_i
  W = diag(1/as.vector(sigma))
  list(f = mu, J = J, W = W)
}
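As a quick sanity check on the Jacobian formulas above, one can compare a column of \(J\) against central finite differences of \(\mu(\beta)\). A sketch (the helper check.jacobian is ours for illustration; e.g. check.jacobian(beta.init, X, n.vec, 2) once beta.init is defined below):
# Compare column j of the analytic Jacobian to a central finite difference
# of mu(beta); the result should be near zero
check.jacobian <- function(beta, X, n.vec, j, h = 1e-6) {
  e = rep(0, length(beta))
  e[j] = h
  mu <- function(b) n.vec * (1 - exp(-(X %*% b)))
  num = (mu(beta + e) - mu(beta - e))/(2 * h)
  max(abs(num - fJW(beta, X, n.vec)$J[, j]))
}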
# Gauss-Newton algorithm with IRLS weights
GN = function(y, X, beta0, maxit, tolerr, n.vec) {
  for (it in 1:maxit) {
    a = fJW(beta0, X, n.vec)
    f = a$f
    J = a$J
    Wt = a$W
    JW = t(J) %*% Wt
    print(sprintf("it = %3.0f b0 = %6.6f b1 = %6.6f b2 = %6.6f grad=%2.2e",
      it, beta0[1, ], beta0[2, ], beta0[3, ], norm(JW %*% (y - f))))
    dir = solve(JW %*% J) %*% JW %*% (y - f)  # Gauss-Newton direction
    beta.new = beta0 + dir
    # modified relative error: max_i |change_i| / max(1, |beta.new_i|)
    c1 = pmax(1, abs(beta.new))
    c2 = abs(beta.new - beta0)
    mre = max(c2/c1)
    if (mre < tolerr) {
      print("We reached convergence with modified relative error < tolerr")
      break
    }
    # update variable for next iteration
    beta0 = beta.new
  }
  return(beta.new)
}
beta.init = matrix(c(0.0367, 0.039, -0.0001339), ncol = 1)
beta.mle = GN(y, X, beta0 = beta.init, maxit = 100, tolerr = 10^-6, n.vec = n.vec)
## [1] "it = 1 b0 = 0.036700 b1 = 0.039000 b2 = -0.000134 grad=2.15e+05"
## [1] "it = 2 b0 = 0.048169 b1 = -0.016497 b2 = 0.000081 grad=4.74e+05"
## [1] "it = 3 b0 = 0.034194 b1 = -0.002593 b2 = 0.000028 grad=1.50e+06"
## [1] "it = 4 b0 = 0.065465 b1 = 0.000181 b2 = 0.000014 grad=1.97e+05"
## [1] "it = 5 b0 = 0.048401 b1 = 0.001417 b2 = 0.000011 grad=1.04e+04"
## [1] "it = 6 b0 = 0.046065 b1 = 0.001650 b2 = 0.000010 grad=1.47e+03"
## [1] "it = 7 b0 = 0.045631 b1 = 0.001689 b2 = 0.000010 grad=2.39e+02"
## [1] "it = 8 b0 = 0.045556 b1 = 0.001696 b2 = 0.000010 grad=4.17e+01"
## [1] "it = 9 b0 = 0.045543 b1 = 0.001697 b2 = 0.000010 grad=7.20e+00"
## [1] "it = 10 b0 = 0.045541 b1 = 0.001697 b2 = 0.000010 grad=1.25e+00"
## [1] "We reached convergence with modified relative error < tolerr"
beta.mle
## [,1]
## [1,] 4.554026e-02
## [2,] 1.697068e-03
## [3,] 9.681713e-06
Therefore, the MLE estimates are: \[ \hat{\beta_0} \approx 0.045541\\ \hat{\beta_1} \approx 0.001697\\ \hat{\beta_2} \approx 0.00000968 \]
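As a cross-check (a sketch, not part of the assignment's method): the same model can be fit with glm() using a binomial log link on the probability of no tumor, since \(P(\text{no tumor}) = \exp(-\beta_0 - \beta_1x - \beta_2x^2)\), so the glm coefficients should estimate \(-\beta\). The log link can fail without good starting values, so we start it at our Gauss-Newton solution (fit.check is our name):
# Fit P(no tumor) = exp(eta) with a log link; coefficients estimate -beta
fit.check = glm(cbind(AnimalsTest - TumorInc, TumorInc) ~ Dose + I(Dose^2),
  family = binomial(link = "log"), start = -as.vector(beta.mle))
-coef(fit.check)  # should be close to beta.mle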
# Plot fitted incidence probabilities over the dose range,
# with the observed proportions overlaid in red
x.plot = seq(0, 250, length.out = 251)
eta = beta.mle[1, ] + beta.mle[2, ] * x.plot + beta.mle[3, ] * x.plot^2
pi.prob = 1 - exp(-eta)
plot(x.plot, pi.prob, xlab = "Dose (ppm)", ylab = "Prob. of Tumor Incidence")
observed.probs = TumorInc/AnimalsTest
points(Dose, observed.probs, col = "red")
As in the previous problem, we fit the expectations using iteratively reweighted least squares to obtain the maximum likelihood estimates. Since the \(y_i\) are \(\text{Bernoulli}(\pi_i(\beta))\) we have \[ y_i = \mu_i(\beta) + \varepsilon_i\\ \mu_i(\beta) = \pi_i(\beta) = \frac{\exp(x_i^{T}\beta)}{1+\exp(x_i^{T}\beta)} \] and weights \(w_i = 1/\sigma^2_i(\beta)\), where the Bernoulli variance is \[ \sigma^2_i(\beta) = \pi_i(\beta)(1-\pi_i(\beta)) = \frac{\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \]
We need the partial derivatives of \(\mu(\beta)\), which means we need the partial derivatives of \(\pi(\beta)\). Using the method of differentials (like on Exam 1) we arrive at the following:
\[ d\pi(d\beta) = \frac{(x_i^{T}d\beta)\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \] Then we have the following three columns of the Jacobian (for rows \(i=1,2,\dots,n=659\)): \[ \frac{\partial \pi}{\partial \beta_1} = \frac{\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2}\\ \frac{\partial \pi}{\partial \beta_2} = \frac{\log(D_i)\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2}\\ \frac{\partial \pi}{\partial \beta_3} = \frac{S_i\exp(x_i^{T}\beta)}{(1+\exp(x_i^{T}\beta))^2} \]
In Exam 1 we were given the optimal solution \(\beta^{*} = (-9.562079, 3.197561, 4.5085)^{T}\). From this we know two things: (i) a good initial value, and (ii) what our algorithm should converge to.
Therefore, to make our lives easy, we will start with \(\beta_{initial} = (-9, 3, 5)^{T}\).
# Blowdown data: column 1 = diameter D, column 2 = wind severity S,
# column 3 = blowdown indicator y
data = read.table("blowBF.txt", header = TRUE)
n = nrow(data)
X = cbind(rep(1, n), log(data[, 1]), data[, 2])  # design columns: 1, log(D), S
y = matrix(data[, 3], ncol = 1)
# calculates the fitted means, Jacobian, and weight matrix
fJW <- function(beta, X) {
  xb = X %*% beta
  mu = exp(xb)/(1 + exp(xb))       # mu_i = pi_i
  sigma = exp(xb)/(1 + exp(xb))^2  # sigma^2_i = pi_i (1 - pi_i)
  # Jacobian: columns are the partials of pi with respect to beta_1, beta_2, beta_3
  col1 = sigma
  col2 = X[, 2] * sigma
  col3 = X[, 3] * sigma
  J = cbind(col1, col2, col3)
  # Weight matrix: w_i = 1 / sigma^2_i
  W = diag(1/as.vector(sigma))
  list(f = mu, J = J, W = W)
}
# Gauss-Newton algorithm with IRLS weights
GN = function(y, X, beta0, maxit, tolerr) {
  for (it in 1:maxit) {
    a = fJW(beta0, X)
    f = a$f
    J = a$J
    Wt = a$W
    JW = t(J) %*% Wt
    print(sprintf("it = %3.0f b0 = %6.6f b1 = %6.6f b2 = %6.6f grad=%2.2e",
      it, beta0[1, ], beta0[2, ], beta0[3, ], norm(JW %*% (y - f))))
    dir = solve(JW %*% J) %*% JW %*% (y - f)  # Gauss-Newton direction
    beta.new = beta0 + dir
    # modified relative error: max_i |change_i| / max(1, |beta.new_i|)
    c1 = pmax(1, abs(beta.new))
    c2 = abs(beta.new - beta0)
    mre = max(c2/c1)
    if (mre < tolerr) {
      print("We reached convergence using mre < tolerr")
      break
    }
    # update variable for next iteration
    beta0 = beta.new
  }
  return(beta.new)
}
beta.init = matrix(c(-9, 3, 5), ncol = 1)
a = GN(y, X, beta0 = beta.init, maxit = 100, tolerr = 10^-6)
## [1] "it = 1 b0 = -9.000000 b1 = 3.000000 b2 = 5.000000 grad=1.07e+02"
## [1] "it = 2 b0 = -9.361238 b1 = 3.126904 b2 = 4.415580 grad=5.05e-01"
## [1] "it = 3 b0 = -9.558501 b1 = 3.196318 b2 = 4.506854 grad=1.02e-02"
## [1] "it = 4 b0 = -9.562084 b1 = 3.197563 b2 = 4.508593 grad=3.56e-06"
## [1] "We reached convergence using mre < tolerr"
a
## [,1]
## [1,] -9.562085
## [2,] 3.197564
## [3,] 4.508593
As we can see from the R console printout, we obtain the MLE estimates \[ \hat{\beta_1} \approx -9.562085\\ \hat{\beta_2} \approx 3.197564\\ \hat{\beta_3} \approx 4.508593 \]
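As a cross-check (a sketch, not part of the assignment's method): glm() fits the same Bernoulli logistic model by IRLS internally, so it should reproduce these estimates (the names df and fit.check are ours for illustration):
# Refit with glm(); the coefficients should match the Gauss-Newton output
df = data.frame(logD = log(data[, 1]), S = data[, 2], y = data[, 3])
fit.check = glm(y ~ logD + S, family = binomial, data = df)
coef(fit.check)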
# Part b
library(plot3D)
b1 = a[1, ]
b2 = a[2, ]
b3 = a[3, ]
x1 = seq(0, 1, length.out = 50) #S
x2 = seq(min(X[, 2]), max(X[, 2]), length.out = 50) #logD
# x1:= S x2:= logD
pi.beta <- function(x1, x2) {
  # logistic probability: x1 = S, x2 = logD
  num = exp(b1 + b2 * x2 + b3 * x1)
  num/(1 + num)
}
z.out = outer(x1, x2, pi.beta)
persp3D(x1, x2, z.out, xlab = "S", ylab = "logD", theta = 45, phi = 30, alpha = 0.5)
# Part c
predict.prob = pi.beta(0.3, log(10))  # S = 0.3, logD = log(10)
predict.prob
## [1] 0.3000951
Therefore, our model predicts that a tree with a diameter of 10 in an area with wind severity 0.3 blows down with probability 0.3000951.
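The glm cross-check fit from above gives the same prediction (a sketch, assuming fit.check from the earlier snippet):
# Predicted blowdown probability at D = 10, S = 0.3; should match predict.prob
predict(fit.check, newdata = data.frame(logD = log(10), S = 0.3), type = "response")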