Chapter 4

4.1

The data below show the number of cases of AIDS in Australia by quarter, from 1984 to 1988:

y <- c(1, 6, 16, 23, 27, 39, 31, 30, 43, 51, 63, 70, 88, 97, 91, 104, 110, 113, 149, 159)
x <- ts(y, start = c(1984, 1), frequency = 4)
x
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1984    1    6   16   23
## 1985   27   39   31   30
## 1986   43   51   63   70
## 1987   88   97   91  104
## 1988  110  113  149  159
plot(x)

(a)

plot(1:20, y)

(b)

plot(log(1:20), log(y))

The plot of \(\log(y_i)\) against \(\log(i)\) is roughly linear, which suggests a model of the form \(E(Y_i) = \exp\{\beta_0 + \beta_1 \log i\}\); this is the model fitted in (c).

(c)

The IRWLS method is based on the method of scoring, which in turn is based on the Newton-Raphson formula.

Basically, the method has the following structure:

  1. Obtain an initial estimate of \(\bf{\beta}\),
  2. replace \(f(y_i,\beta)\) with a Taylor series approximation,
  3. evaluate all expressions that involve \(\beta\) at the current estimate,
  4. solve the resulting system of equations,
  5. update \(\beta\) to the new estimate and repeat (2)-(5) until \(\{ \beta^{(k)} \}\) converges.

Thus, the general formulation in the GLM context has the form,

\[\begin{equation} \bf{\beta}^{(m)} = \bf{\beta}^{(m-1)} + [\mathcal{J}^{(m-1)}]^{-1}U^{(m-1)} \label{iterGLM} \end{equation}\]

or

\[\begin{equation} \mathcal{J}^{(m-1)}\bf{\beta}^{(m)} = \mathcal{J}^{(m-1)}\bf{\beta}^{(m-1)} + U^{(m-1)} \label{iterGLM2} \end{equation}\]

where \(\bf{\beta}^{(m)}\) is the vector of parameter estimates at the \(m\)th iteration, \([\mathcal{J}^{(m-1)}]^{-1}\) is the inverse of the information matrix and \(U^{(m-1)}\) is the score function, both evaluated at \(\bf{\beta}^{(m-1)}\).

Thus, we can re-write the information matrix as

\[ \mathcal{J} = X^TWX \]

where \(W\) is the diagonal matrix with elements

\[ w_{ii} = \frac{1}{Var(Y_i)}\left( \frac{\partial E(Y_i)}{\partial \eta_i} \right)^2 \]

and

\[ \mathcal{J}^{(m-1)}\bf{\beta}^{(m-1)} + U^{(m-1)} = X^TW\bf{z}\]

where \(\bf{z}\) has elements

\[ z_i = \displaystyle\sum_{k=1}^{p} x_{ik} \beta_k^{(m-1)} + (y_i - E(Y_i)) \left( \frac{\partial \eta_i}{\partial E(Y_i)} \right) \]

where \(E(Y_i)\) and \(\frac{\partial \eta_i}{\partial E(Y_i)}\) are evaluated at \(\bf{\beta}^{(m-1)}\).

Hence the iterative equation can be written as

\[X^TWX\bf{\beta}^{(m)}=X^TW\bf{z}\]

This has the same form as the normal equations for a linear model obtained by weighted least squares, except that it has to be solved iteratively because, in general, the components of the equation depend on \(\beta\).

In this case

\[E(Y_i) = \exp\{\beta_0 + \beta_1x_i\} = e^{\eta_i} = \lambda_i\] then

\[ w_{ii} = \frac{1}{Var(Y_i)}\left( \frac{\partial E(Y_i)}{\partial \eta_i} \right)^2 = \frac{1}{\lambda_i}\lambda_i^2 = \lambda_i\] and

\[ z_i = \langle \bf{x_i},\beta \rangle + (y_i -\lambda_i ) \frac{1}{\lambda_i}\]

# according to book page 66, we have
x <- log(1:20)
X <- cbind(1, x)

# With this explanation we can write a function as below
Newton.Raphson <- function(bstar, X, y){
  distance <- 1
  while(distance > 0.001){
    eta <- c(X %*% bstar)   # linear predictor at the current estimate
    lambda <- exp(eta)      # fitted Poisson means (log link)
    # from the current estimate we build the matrix W and the vector z of equation (4.25)
    W <- diag(lambda)
    z <- eta + (y - lambda) / lambda
    # solve the weighted least-squares normal equations for the next estimate
    next.b <- c(solve(t(X) %*% W %*% X) %*% (t(X) %*% W %*% z))
    distance <- sum(abs(bstar - next.b))
    bstar <- next.b
  }
  bstar
}
# we start with initial values for betahat1 and betahat2
bstar <- c(3,3)
Newton.Raphson(bstar = bstar, X, y)
## [1] 0.995998 1.326610
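Each iteration above solves the normal equations explicitly with solve(). An equivalent, numerically more stable way to carry out the same weighted least-squares step is R's QR-based lm.wfit; a minimal sketch (the helper name irwls_step is ours, not from the book):

# one IRWLS step done via QR-based weighted least squares
irwls_step <- function(bstar, X, y){
  eta <- c(X %*% bstar)
  lambda <- exp(eta)
  z <- eta + (y - lambda) / lambda
  lm.wfit(x = X, y = z, w = lambda)$coefficients
}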

(d)

res.p <- glm(y ~ x, family = poisson(link = "log")) 
summary(res.p)
## 
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0568  -0.8302  -0.3072   0.9279   1.7310  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.99600    0.16971   5.869 4.39e-09 ***
## x            1.32661    0.06463  20.525  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 677.264  on 19  degrees of freedom
## Residual deviance:  21.755  on 18  degrees of freedom
## AIC: 138.05
## 
## Number of Fisher Scoring iterations: 4

These results are the same as those from our function. That is nice: the glm estimation function is no longer a black box for us, because we now know exactly how it works.

4.2

(a)

y <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65)
x <- c(3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4, 4.23, 3.73, 3.85, 3.97, 4.51, 4.54, 5, 5, 4.72, 5)
ggplot2::qplot(x, y)

mean(y)
## [1] 62.47059

It looks roughly linear, doesn’t it?

(b)

A suitable link function is the logarithm, \(g(\cdot) = \log(\cdot)\), which guarantees positive fitted means.

(c)

\[f(y,\theta) = \theta e^{-\theta y} = \exp\{\ln(\theta)-\theta y \} \] which is part of the exponential family with \(a(y)=y\), \(b(\theta)=-\theta\) and \(c(\theta)=\ln(\theta)\).

Thus, according to equations (4.14) and (4.15) of the book, we have

\[ E(Y) = \frac{1}{\theta}\]

and

\[ V(Y) = \frac{1}{\theta^2} \]
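As a quick numerical check of these two moments (a sketch; R's rexp is parameterized by the rate \(\theta\)):

# simulate from an exponential with rate theta = 2; the sample mean and
# variance should be close to 1/theta = 0.5 and 1/theta^2 = 0.25
set.seed(1)
s <- rexp(1e5, rate = 2)
mean(s)
var(s)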

(d)

fit <- glm(y ~ x, family = Gamma(link = "log"))
summary(fit)
## 
## Call:
## glm(formula = y ~ x, family = Gamma(link = "log"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9922  -1.2102  -0.2242   0.2102   1.5646  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.4775     1.6034   5.287 9.13e-05 ***
## x            -1.1093     0.3872  -2.865   0.0118 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.9388638)
## 
##     Null deviance: 26.282  on 16  degrees of freedom
## Residual deviance: 19.457  on 15  degrees of freedom
## AIC: 173.97
## 
## Number of Fisher Scoring iterations: 8

(e)

y_hat <- exp(fit$coefficients[1] + fit$coefficients[2] * x)  # equivalently, fitted(fit)
residuals <- (y - y_hat) / y_hat  # Pearson-type residuals
plot(y, y_hat)
abline(0, 1)

summary(residuals)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.9467 -0.7795 -0.2077  0.0000  0.2251  2.4673
hist(residuals)
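These hand-computed residuals coincide with glm's Pearson residuals, since for the Gamma family \(\sqrt{V(\mu)} = \mu\); a quick check:

# built-in Pearson residuals are (y - mu)/mu for the Gamma family
all.equal(unname(stats::residuals(fit, type = "pearson")), residuals)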

4.3

The MLE is \[\hat{\beta} = e^{\bar{y}}. \] After that we can verify equations (4.18) and (4.25) of the book.
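A quick simulation sketch, assuming the model of the exercise is \(Y_i \sim N(\log\beta, \sigma^2)\) (an assumption on our part, consistent with the stated MLE):

# check that exp(mean(y)) recovers beta when Y_i ~ N(log(beta), sigma^2)
set.seed(42)
beta_true <- 3
y_sim <- rnorm(1e4, mean = log(beta_true), sd = 1)
exp(mean(y_sim)) # should be close to beta_true = 3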

Chapter 5

5.1

(a)

\[W = (\hat{\pi}-\pi)^2 J, \] where \(J=\frac{n}{\pi(1-\pi)}\) and \(\hat{\pi} = \frac{y}{n}\); then,

\[ W = \left( \frac{y-\pi n}{n} \right)^2 \frac{n}{\pi(1-\pi)} \]

(b)

\[ U^T J^{-1} U = \left( \frac{y-\pi n}{\pi(1-\pi)} \right)^2 \frac{\pi(1-\pi)}{n} = W \]

(c)

\[ D = 2\left[ y log(\frac{y}{n\pi}) + (n-y)log(\frac{n-y}{n(1-\pi)})\right] \]

(d)

W <- function(y, n, pi){
  # Wald statistic from part (a)
  W = ((y/n - pi)^2) * (n / (pi * (1 - pi)))
  return(W)
}
D <- function(y, n, pi){
  # deviance from part (c)
  D = 2 * (y * log(y / (n * pi)) + (n - y) * log((n - y) / (n * (1 - pi))))
  return(D)
}
qchisq(.95, df=1)
## [1] 3.841459

\(3.84\) corresponds to the 95th percentile of the \(\chi^2_1\) distribution. If we calculate the statistics for \(\pi = 0.1\):

W(n=10,y=3,pi=0.1)
## [1] 4.444444
D(n=10,y=3,pi=0.1)
## [1] 3.073272

then the Wald statistic would suggest rejecting \(H_0: \pi = 0.1\), while the deviance would not suggest rejection. We do the same for the other two hypotheses:

W(n=10,y=3,pi=0.3)
## [1] 0
D(n=10,y=3,pi=0.3)
## [1] 0
W(n=10,y=3,pi=0.5)
## [1] 1.6
D(n=10,y=3,pi=0.5)
## [1] 1.645658

so in both cases neither statistic would reject.
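Part (b) showed that the score statistic \(U^TJ^{-1}U\) equals \(W\) algebraically; a quick numerical check (the helper score_stat is ours):

score_stat <- function(y, n, pi){
  U <- (y - n * pi) / (pi * (1 - pi)) # score function
  J <- n / (pi * (1 - pi))            # information
  U^2 / J
}
score_stat(y = 3, n = 10, pi = 0.1) # equals W(n = 10, y = 3, pi = 0.1) = 4.444444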

5.2

\[ l(\lambda_{max};y) = \sum \left[ log\left(\frac{1}{y_i}\right) - 1 \right] \] \[ l(\lambda;y) = \sum \left[ log\left(\frac{1}{\hat{y}_i}\right) - \frac{y_i}{\hat{y}_i} \right] \]

Then,

\[ D = 2\sum \left[ log \left( \frac{\hat{y}_i}{y_i} \right) + \frac{y_i}{\hat{y}_i} - 1 \right] \]
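Since the Gamma family with dispersion \(\phi = 1\) reduces to the exponential distribution, this \(D\) is exactly the residual deviance glm reported for the fit in 4.2(d); a quick check, reusing fit and y from that exercise:

mu_hat <- fitted(fit)
2 * sum(log(mu_hat / y) + y / mu_hat - 1) # should match fit$deviance = 19.457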

5.3

(a)

The Pareto distribution has density \[ f(y ; \theta)= \frac{\theta}{y^{\theta+1}}, \qquad y \geq 1 \]

Then the log-likelihood function corresponds to

\[ l(\theta ; y) = N log(\theta) -(\theta+1)\sum log(y_i) \]

Thus, calculating the score function and solving the equation \(\frac{\partial l}{\partial \theta} = 0\) we have that

\[ \hat{\theta} = \frac{N}{\sum log(y_i)} \]

(b)

For this case \(J = \frac{N}{\theta^{2}}\), then

\[ W = \frac{(\hat{\theta}-\theta)^2N}{\theta^2} \]

(c)

Using the Wald statistic and equation (5.8) we have that the confidence limits are

\[\left[ \frac{\hat{\theta}}{1 + \frac{1.96}{\sqrt{N}}} , \frac{\hat{\theta}}{1 - \frac{1.96}{\sqrt{N}}} \right]\]

(d)

rpar <- function(theta){
  # inverse-CDF sampling: F(y) = 1 - y^(-theta) for y >= 1, so
  # F^(-1)(u) = (1 - u)^(-1/theta); since u and 1 - u are both
  # uniform on (0, 1), (1/u)^(1/theta) has the Pareto distribution
  u = runif(1, 0, 1)
  p = (1/u)^(1/theta)
  return(p)
}

get_rpar_numbers <- function(n_it = 100, theta = 2){
  # draw n_it independent Pareto(theta) values
  numbers = NULL
  for(i in 1:n_it){
    numbers[i] = rpar(theta)
  }
  return(numbers)
}

numbers = get_rpar_numbers()


hist(numbers,breaks = 10)

theta_est <- function(data){
  # MLE from part (a): theta_hat = N / sum(log(y_i))
  theta = length(data)/sum(log(data))
  return(theta)
}

est_theta = theta_est(numbers)
est_theta
## [1] 1.884202
confidence_int <- function(est_theta, N){
  # Wald interval from part (c)
  conf = c(est_theta/(1 + 1.96/(N^0.5)), est_theta/(1 - 1.96/(N^0.5)))
  return(conf)
}

confidence_int(est_theta,N=length(numbers))
## [1] 1.575420 2.343535
beta = NULL
confidence = matrix(nrow = 20, ncol = 2)
# repeat the experiment 20 times, storing each estimate of theta and its interval
for(j in 1:20){
  data = get_rpar_numbers()
  beta[j] = theta_est(data)
  confidence[j,] = confidence_int(est_theta = beta[j], N = length(data))
}

beta
##  [1] 2.435074 2.006309 1.783846 1.900453 1.954688 2.201298 2.152836
##  [8] 2.006313 2.044510 1.948280 1.952087 2.042101 1.756828 2.037029
## [15] 2.186680 1.967729 2.196448 2.485540 2.037834 1.898149
confidence
##           [,1]     [,2]
##  [1,] 2.036015 3.028699
##  [2,] 1.677516 2.495409
##  [3,] 1.491510 2.218713
##  [4,] 1.589008 2.363748
##  [5,] 1.634354 2.431204
##  [6,] 1.840550 2.737932
##  [7,] 1.800030 2.677657
##  [8,] 1.677519 2.495414
##  [9,] 1.709456 2.542922
## [10,] 1.628997 2.423234
## [11,] 1.632180 2.427969
## [12,] 1.707442 2.539926
## [13,] 1.468920 2.185109
## [14,] 1.703202 2.533619
## [15,] 1.828327 2.719751
## [16,] 1.645259 2.447425
## [17,] 1.836495 2.731900
## [18,] 2.078210 3.091467
## [19,] 1.703875 2.534620
## [20,] 1.587081 2.360882
conf_low = confidence[,1]
conf_up = confidence[,2]
sum(beta>conf_low & beta < conf_up)
## [1] 20
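Note that this last check only confirms that each estimate lies inside its own interval, which holds by construction. To assess the coverage of the Wald intervals we should instead count how often the true value \(\theta = 2\) falls inside them:

sum(confidence[,1] < 2 & 2 < confidence[,2]) # coverage count out of 20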

5.4

# Data according to table 4.6 page 70
x <- c(3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4, 4.23, 3.73, 3.85, 3.97, 4.51, 4.54, 5, 5, 4.72, 5)
y <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65)

N <- length(x) # number of observations
X <- cbind(rep(1, N), x) # design matrix X

# according to exercise 4.2 we have
m <- glm(y ~ x, family = Gamma(link = "log"))
betahat <- round(m$coefficients, 3)
betahat # the estimates of our model parameters
## (Intercept)           x 
##       8.477      -1.109
# part (a)
# under the exponential model (dispersion phi = 1) with log link,
# calculation yields w_ii = 1

W <- diag(N) # diagonal weight matrix
J <- t(X) %*% W %*% X 
J # the information matrix
##                x
##   17.00  69.6300
## x 69.63 291.4571
J.inv <- solve(J) # inverse of information matrix

sigma <- sqrt( diag(J.inv) ) # standard errors of the estimates
z <- qnorm(0.975) # for a 95 percent confidence interval
U <- betahat[1] + sigma[1] * z # upper bound of the confidence interval for beta1
L <- betahat[1] - sigma[1] * z # lower bound of the confidence interval for beta1
cat("confidence interval for beta1 is:\n (",L, ",",U,")\n" )
## confidence interval for beta1 is:
##  ( 5.233636 , 11.72036 )
# part (b)

# H0 : beta2 = 0
# H1 : beta2 is not equal to zero
# first we obtain deltaD, which is the deviance difference
deltaD <- m$null.deviance - m$deviance
deltaD
## [1] 6.825567
# p-value
pchisq(deltaD, df = 1, lower.tail = FALSE)
## [1] 0.008986204
# which provides strong evidence that the initial white blood cell count 
# is a statistically significant predictor of survival time
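Equivalently (a sketch), R's built-in sequential deviance test performs this comparison; note that for the Gamma family anova.glm scales the deviance difference by an estimated dispersion, so its p-value will differ somewhat from the \(\phi = 1\) calculation above.

anova(m, test = "Chisq")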