The data below show the number of cases of AIDS in Australia by date, recorded quarterly from 1984 to 1988.
y <- c(1, 6, 16, 23, 27, 39, 31, 30, 43, 51, 63, 70, 88, 97, 91, 104, 110, 113, 149, 159)
x = ts(y, start = c(1984, 1), frequency = 4)
x
## Qtr1 Qtr2 Qtr3 Qtr4
## 1984 1 6 16 23
## 1985 27 39 31 30
## 1986 43 51 63 70
## 1987 88 97 91 104
## 1988 110 113 149 159
plot(x)
plot(1:20,y)
plot(log(1:20),log(y))
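The plot on the log-log scale looks roughly linear, which is what motivates the log-linear Poisson model fitted below. As a quick visual check (a sketch, not part of the original solution), we can add an ordinary least-squares line to that plot:
# rough visual check: least-squares line on the log-log scale
plot(log(1:20), log(y))
abline(lm(log(y) ~ log(1:20)))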
The IRWLS method is based on the method of scoring, which in turn is based on the Newton-Raphson formula. In the GLM context, the iterative scheme has the general form
\[\begin{equation} \bf{\beta}^{(m)} = \bf{\beta}^{(m-1)} + [\mathcal{J}^{(m-1)}]^{-1}U^{(m-1)} \label{iterGLM} \end{equation}\]or
\[\begin{equation} \mathcal{J}^{(m-1)}\bf{\beta}^{(m)} = \mathcal{J}^{(m-1)}\bf{\beta}^{(m-1)} + U^{(m-1)} \label{iterGLM2} \end{equation}\]where \(\bf{\beta}^{(m)}\) is the vector of parameter estimates at the \(m\)th iteration, \([\mathcal{J}^{(m-1)}]^{-1}\) is the inverse of the information matrix, and \(U^{(m-1)}\) is the score function, both evaluated at \(\bf{\beta}^{(m-1)}\).
Thus, we can re-write the information matrix as
\[ \mathcal{J} = X^TWX \]
where \(W\) is the diagonal matrix with elements
\[ w_{ii} = \frac{1}{Var(Y_i)}\left( \frac{\partial E(Y_i)}{\partial \eta_i} \right)^2 \]
and
\[ \mathcal{J}^{(m-1)}\bf{\beta}^{(m-1)} + U^{(m-1)} = X^TW\bf{z}\]
where \(\bf{z}\) has elements
\[ z_i = \displaystyle\sum_{k=1}^{p} x_{ik} \beta_k^{(m-1)} + (y_i - E(Y_i)) \left( \frac{\partial \eta_i}{\partial E(Y_i)} \right) \]
with \(E(Y_i)\) and \(\frac{\partial \eta_i}{\partial E(Y_i)}\) evaluated at \(\bf{\beta}^{(m-1)}\).
Hence the iterative equation can be written as
\[X^TWX\bf{\beta}^{(m)}=X^TW\bf{z}\]
This has the same form as the normal equations for a linear model obtained by weighted least squares, except that it has to be solved iteratively because, in general, the components of the equation depend on \(\beta\).
In this case
\[E[y_i] = exp\{\beta_0 + \beta_1x_i\} = e^{\eta_i} = \lambda_i\] then
\[ w_{ii} = \frac{1}{Var(Y_i)}\left( \frac{\partial E(Y_i)}{\partial \eta_i} \right)^2 = e^{\eta_i} = \lambda_i\] and
\[ z_i = \langle \bf{x_i},\beta \rangle + (y_i -\lambda_i ) \frac{1}{\lambda_i}\]
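Because each scoring update is exactly a weighted least-squares solve of \(X^TWX\bf{\beta}^{(m)}=X^TW\bf{z}\), a single iteration can also be written with R's built-in weighted least-squares routine. A minimal sketch for the Poisson model with log link (the function irwls_step and its internals are illustrative, not from the book):
# one IRWLS/scoring step for a Poisson model with log link
irwls_step <- function(beta, X, y){
  eta <- drop(X %*% beta)            # linear predictor at the current estimate
  lambda <- exp(eta)                 # fitted means E(Y_i)
  w <- lambda                        # working weights w_ii
  z <- eta + (y - lambda) / lambda   # working response
  coef(lm.wfit(X, z, w))             # solve X'WX beta = X'Wz
}
Iterating irwls_step from a starting value until the estimates stop changing reproduces the Newton.Raphson function implemented below.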
# according to book page 66, we have
x <- log(1:20)
X <- cbind(1, x)
# With this explanation we can write a function as below
Newton.Raphson <- function(bstar, X, y){
  distance <- 1
  while(distance > 0.001){
    lambda <- exp(bstar %*% t(X))   # fitted means at the current estimate
    # with the current estimates we build the matrix W and the working response z of equation 4.25
    W <- diag(c(lambda))
    z <- bstar %*% t(X) + (y - lambda) / lambda
    z <- matrix(z, length(y), 1)
    # solve the weighted least-squares equations for the next estimate
    next.b <- solve(t(X) %*% W %*% X) %*% (t(X) %*% W %*% z)
    next.b <- c(next.b)
    distance <- sum(abs(bstar - next.b))
    bstar <- next.b
  }
  return(bstar)
}
# we start with initial values for betahat1 and betahat2
bstar <- c(3,3)
Newton.Raphson(bstar = bstar, X, y)
## [1] 0.995998 1.326610
res.p <- glm(y ~ x, family = poisson(link = "log"))
summary(res.p)
##
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0568 -0.8302 -0.3072 0.9279 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.99600 0.16971 5.869 4.39e-09 ***
## x 1.32661 0.06463 20.525 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 677.264 on 19 degrees of freedom
## Residual deviance: 21.755 on 18 degrees of freedom
## AIC: 138.05
##
## Number of Fisher Scoring iterations: 4
These results are the same as those from our function. That is nice: the glm estimation function is no longer a black box for us, because we now know exactly how it works.
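To see how well the model tracks the data, we can overlay the fitted Poisson means from res.p on the original counts (a quick sketch, not part of the original solution):
# fitted Poisson means on the original scale
plot(1:20, y, xlab = "quarter", ylab = "cases")
lines(1:20, fitted(res.p))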
y = c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65)
x= c(3.36,2.88,3.63,3.41,3.78,4.02,4,4.23,3.73,3.85,3.97,4.51,4.54,5,5,4.72,5)
ggplot2::qplot(x,y)
mean(y)
## [1] 62.47059
It looks somewhat linear, doesn't it? We use the log link, \(g(\cdot) = \log(\cdot)\).
\[f(y,\theta) = \theta e^{-\theta y} = exp\{ln(\theta)-\theta y \} \] which is part of the exponential family with \(a(y)=y\), \(b(\theta)=-\theta\) and \(c(\theta)=ln(\theta)\).
Thus, according to equations (4.14) and (4.15) of the book we have
\[ E(Y) = \frac{1}{\theta}\]
and
\[ V(Y) = \frac{1}{\theta^2} \]
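As a quick numerical sanity check of these moments (a sketch, not part of the original solution), we can simulate from an exponential distribution and compare the sample mean and variance with \(1/\theta\) and \(1/\theta^2\):
# simulate Y ~ Exp(theta) and compare sample moments with 1/theta and 1/theta^2
set.seed(1)
theta <- 0.5
sim <- rexp(1e5, rate = theta)
c(mean = mean(sim), var = var(sim))  # should be close to 1/theta = 2 and 1/theta^2 = 4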
fit <- glm(y ~ x, family = Gamma(link = "log"))
summary(fit)
##
## Call:
## glm(formula = y ~ x, family = Gamma(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9922 -1.2102 -0.2242 0.2102 1.5646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.4775 1.6034 5.287 9.13e-05 ***
## x -1.1093 0.3872 -2.865 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.9388638)
##
## Null deviance: 26.282 on 16 degrees of freedom
## Residual deviance: 19.457 on 15 degrees of freedom
## AIC: 173.97
##
## Number of Fisher Scoring iterations: 8
y_hat = exp(fit$coefficients[1]+fit$coefficients[2]*x)
residuals = (y-y_hat)/y_hat
plot(y,y_hat)
abline(0,1)
summary(residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.9467 -0.7795 -0.2077 0.0000 0.2251 2.4673
hist(residuals)
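The residuals computed by hand above are Pearson-type residuals (response residuals scaled by the fitted mean); glm can also provide Pearson and deviance residuals directly, which should match and can be inspected in the same way (a sketch):
# built-in residuals from the glm fit for comparison
summary(residuals(fit, type = "pearson"))   # should match the hand-computed residuals above
hist(residuals(fit, type = "deviance"))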
The MLE is \[\hat{\beta} = e^{\bar{y}}. \] After that we can verify equations (4.18) and (4.25).
The Wald statistic is \[W = (\hat{\pi}-\pi)^2 J \] where \(J=\frac{n}{\pi(1-\pi)}\) and \(\hat{\pi} = \frac{y}{n}\), so
\[ W = \left( \frac{y-\pi n}{n} \right)^2 \frac{n}{\pi(1-\pi)} \]
The score statistic takes the same value,
\[ U^T J^{-1} U = \left( \frac{y-\pi n}{\pi(1-\pi)} \right)^2 \frac{\pi(1-\pi)}{n} = W \]
and the deviance is
\[ D = 2\left[ y log\left(\frac{y}{n\pi}\right) + (n-y)log\left(\frac{n-y}{n(1-\pi)}\right)\right] \]
W <- function(y, n, pi){
  W <- ((y / n - pi)^2) * (n / (pi * (1 - pi)))
  return(W)
}
D <- function(y, n, pi){
  D <- 2 * (y * log(y / (n * pi)) + (n - y) * log((n - y) / (n * (1 - pi))))
  return(D)
}
qchisq(.95, df=1)
## [1] 3.841459
\(3.84\) corresponds to the 95th percentile of the chi-squared distribution with 1 degree of freedom. If we calculate the statistics for \(\pi = 0.1\):
W(n=10,y=3,pi=0.1)
## [1] 4.444444
D(n=10,y=3,pi=0.1)
## [1] 3.073272
then the Wald statistic would suggest rejecting \(H_0: \pi = 0.1\), while the deviance statistic would not. We do the same for the other two hypotheses:
W(n=10,y=3,pi=0.3)
## [1] 0
D(n=10,y=3,pi=0.3)
## [1] 0
W(n=10,y=3,pi=0.5)
## [1] 1.6
D(n=10,y=3,pi=0.5)
## [1] 1.645658
In both cases neither statistic would suggest rejecting the null hypothesis.
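As shown above, the Wald and score statistics coincide for this problem, and the score statistic is the usual chi-squared statistic for a single proportion, so the values can be cross-checked with prop.test (continuity correction switched off); a quick sketch:
# cross-check: chi-squared (score) statistic from prop.test, no continuity correction
prop.test(3, 10, p = 0.1, correct = FALSE)$statistic
prop.test(3, 10, p = 0.5, correct = FALSE)$statistic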
For the exponential model the log-likelihoods of the saturated and fitted models are \[ l(\lambda_{max};y) = \sum \left[ log\left(\frac{1}{y_i}\right) - 1 \right] \] \[ l(\lambda;y) = \sum \left[ log\left(\frac{1}{\hat{y}_i}\right) - \frac{y_i}{\hat{y}_i} \right] \]
Then,
\[ D = 2\sum \left[ log \left( \frac{\hat{y}_i}{y_i} \right) + \frac{y_i}{\hat{y}_i} - 1 \right] \]
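This is exactly the deviance that glm computes for the Gamma/exponential fit above, so we can verify the formula numerically (a sketch, assuming the fit object and the fitted values y_hat from Exercise 4.2 are still in the workspace):
# deviance from the formula above vs. the residual deviance reported by glm
2 * sum(log(y_hat / y) + y / y_hat - 1)  # should match fit$deviance (19.457)
fit$deviance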
The Pareto distribution has density \[ f(y ; \theta)= \frac{\theta}{y^{\theta+1}}, \qquad y \ge 1 \]
Then the log-likelihood function corresponds to
\[ l(\theta ; y) = N log(\theta) -(\theta+1)\sum log(y_i) \]
Thus, calculating the score function and solving the equation \(\frac{\partial l}{\partial \theta} = 0\) we have that
\[ \hat{\theta} = \frac{N}{\sum log(y_i)} \]
For this case \(J = \frac{N}{\theta^{2}}\), then
\[ W = \frac{(\hat{\theta}-\theta)^2 N}{\theta^2} \]
Using the Wald statistic and equation (5.8) we have that the confidence limits are
\[\left[ \frac{\hat{\theta}}{1 + \frac{1.96}{\sqrt{N}}} , \frac{\hat{\theta}}{1 - \frac{1.96}{\sqrt{N}}} \right]\]
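To simulate from this Pareto distribution we can use the inverse-CDF method, which is what the function below implements: the CDF is \[ F(y) = 1 - y^{-\theta}, \qquad y \ge 1, \] so \(F^{-1}(u) = (1-u)^{-1/\theta}\); and since \(1-U\) is also uniform on \((0,1)\) when \(U \sim U(0,1)\), we can simply take \(Y = (1/U)^{1/\theta}\).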
rpar <- function(theta){
  u <- runif(1, 0, 1)
  p <- (1 / u)^(1 / theta)
  return(p)
}
get_rpar_numbers <- function(n_it = 100, theta = 2){
  numbers <- NULL
  for(i in 1:n_it){
    numbers[i] <- rpar(theta)
  }
  return(numbers)
}
numbers = get_rpar_numbers()
hist(numbers,breaks = 10)
theta_est <- function(data){
  theta <- length(data) / sum(log(data))
  return(theta)
}
est_theta = theta_est(numbers)
est_theta
## [1] 1.884202
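As a check on the closed-form MLE, we can also maximize the log-likelihood \( l(\theta ; y) = N log(\theta) -(\theta+1)\sum log(y_i) \) numerically (a sketch using optimize; the helper loglik is illustrative):
# numerical maximization of the Pareto log-likelihood as a cross-check
loglik <- function(theta, data){
  length(data) * log(theta) - (theta + 1) * sum(log(data))
}
optimize(loglik, interval = c(0.01, 10), data = numbers, maximum = TRUE)$maximum
# should agree with est_theta up to the optimizer's tolerance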
confidence_int <- function(est_theta, N){
  conf <- c(est_theta / (1 + 1.96 / sqrt(N)), est_theta / (1 - 1.96 / sqrt(N)))
  return(conf)
}
confidence_int(est_theta,N=length(numbers))
## [1] 1.575420 2.343535
beta <- NULL
confidence <- matrix(nrow = 20, ncol = 2)
for(j in 1:20){
  data <- get_rpar_numbers()
  beta[j] <- theta_est(data)
  confidence[j, ] <- confidence_int(est_theta = beta[j], N = length(data))
}
beta
## [1] 2.435074 2.006309 1.783846 1.900453 1.954688 2.201298 2.152836
## [8] 2.006313 2.044510 1.948280 1.952087 2.042101 1.756828 2.037029
## [15] 2.186680 1.967729 2.196448 2.485540 2.037834 1.898149
confidence
## [,1] [,2]
## [1,] 2.036015 3.028699
## [2,] 1.677516 2.495409
## [3,] 1.491510 2.218713
## [4,] 1.589008 2.363748
## [5,] 1.634354 2.431204
## [6,] 1.840550 2.737932
## [7,] 1.800030 2.677657
## [8,] 1.677519 2.495414
## [9,] 1.709456 2.542922
## [10,] 1.628997 2.423234
## [11,] 1.632180 2.427969
## [12,] 1.707442 2.539926
## [13,] 1.468920 2.185109
## [14,] 1.703202 2.533619
## [15,] 1.828327 2.719751
## [16,] 1.645259 2.447425
## [17,] 1.836495 2.731900
## [18,] 2.078210 3.091467
## [19,] 1.703875 2.534620
## [20,] 1.587081 2.360882
conf_low = confidence[,1]
conf_up = confidence[,2]
# count how many of the 20 intervals contain the true value theta = 2
sum(conf_low < 2 & 2 < conf_up)
## [1] 18
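With only 20 replications the coverage estimate is noisy; repeating the experiment many more times should give a proportion close to the nominal 95% (a sketch):
# estimate the coverage of the Wald interval for the true value theta = 2
coverage <- mean(replicate(1000, {
  d <- get_rpar_numbers(n_it = 100, theta = 2)
  ci <- confidence_int(theta_est(d), N = length(d))
  ci[1] < 2 && 2 < ci[2]
}))
coverage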
# Data according to table 4.6 page 70
x <- c(3.36,2.88,3.63,3.41,3.78,4.02,4,4.23,3.73,3.85,3.97,4.51,4.54,5,5,4.72,5)
y <- c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65)
N <- length(x) # number of observations
X <- cbind(rep(1, N), x) # creating the design matrix X
# according to exercise 4.2 we have
m <- glm(y ~ x, family = Gamma(link = "log"))
betahat <- round(m$coefficients, 3)
betahat # the estimated parameters of our model
## (Intercept) x
## 8.477 -1.109
# part (a)
# for the exponential model with log link the calculation yields w_ii = 1, so W is the identity matrix
W <- diag(N) # diagonal matrix
J <- t(X) %*% W %*% X
J # J is information matrix
## x
## 17.00 69.6300
## x 69.63 291.4571
J.inv <- solve(J) # inverse of information matrix
sigma <- sqrt( diag(J.inv) )
z <- qnorm(0.975) # for 95 percent confidence interval
U <- betahat[1] + sigma[1] * z # Upper bound for the confidence interval for beta1
L <- betahat[1] - sigma[1] * z # Lower bound for the confidence interval for beta1
cat("confidence interval for beta1 is:\n (", round(L, 3), ",", round(U, 3), ")\n")
## confidence interval for beta1 is:
## ( 5.234 , 11.72 )
# part (b)
# H0 : beta2 = 0
# H1 : beta2 is not equal with zero
# first we obtain deltaD, which is the deviance difference
deltaD <- m$null.deviance - m$deviance
deltaD
## [1] 6.825567
# p-value
pchisq(deltaD, df = 1, lower = FALSE)
## [1] 0.008986204
# which provides strong evidence that the initial white blood cell count
# is a statistically significant predictor of survival time
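R can carry out the same comparison directly with an analysis-of-deviance table; for a family with an estimated dispersion parameter an F test is usually preferred (a sketch):
# built-in analysis of deviance for the fitted Gamma model
anova(m, test = "F")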