For Fisher scoring we use the following iterative equation to calculate the \(\beta\)s
\[\beta^{(t+1)}=\beta^{(t)}+\left [-El''(\beta^{(t)})\right ]^{-1}l'(\beta^{(t)}\] where \(l\) is the log-likelihood function for the entire sample \(y_1,...,y_N\) and \(-E(l''(\beta^{(t)})=-E(\frac{\partial^2 l}{\partial \beta^2})=E\left[(\frac{\partial l}{\partial\beta})^2\right]\)
For the Newton-Raphson method the iteration is almost the same but without expectation
\[\beta^{(t+1)}=\beta^{(t)}+\left [-l''(\beta^{(t)})\right ]^{-1}l'(\beta^{(t)})\] Where \(l''(\beta^{(t)})\) is the Hessian matrix, i.e the second derivative of likelihood function.
With the canonical link in GLM, the actual second derivatives is equal to the observed second derivatives, and Fisher scoring is the same as Newton-Raphson method.
For un-canonical link function, Fisher scoring usually easier to calculate.
Example, a log-binomial regression, for binomial distribution, the canonical link function is logit, instead we use a log function here. We use the following data to study the relationship between the treatment and disease.
mydata<-readRDS('C:\\Users\\enliu\\OneDrive - Australian Catholic University\\Modelling infectious diseases\\treat_disease.rds')
mydata
## treatment disease
## 1 1 1
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 1 1
## 7 1 1
## 8 1 1
## 9 1 1
## 10 1 1
## 11 1 1
## 12 1 1
## 13 1 1
## 14 1 1
## 15 1 1
## 16 1 1
## 17 1 1
## 18 1 1
## 19 1 1
## 20 1 1
## 21 1 1
## 22 1 1
## 23 1 1
## 24 1 1
## 25 1 1
## 26 1 1
## 27 1 1
## 28 1 1
## 29 1 1
## 30 1 1
## 31 1 1
## 32 1 1
## 33 1 1
## 34 1 1
## 35 1 1
## 36 1 1
## 37 1 1
## 38 1 1
## 39 1 1
## 40 1 1
## 41 1 1
## 42 1 1
## 43 1 1
## 44 1 1
## 45 1 1
## 46 1 1
## 47 1 1
## 48 1 1
## 49 1 1
## 50 1 1
## 51 1 1
## 52 1 1
## 53 1 1
## 54 1 1
## 55 1 1
## 56 1 0
## 57 1 0
## 58 1 0
## 59 1 0
## 60 1 0
## 61 1 0
## 62 1 0
## 63 1 0
## 64 1 0
## 65 1 0
## 66 1 0
## 67 1 0
## 68 1 0
## 69 1 0
## 70 1 0
## 71 1 0
## 72 1 0
## 73 1 0
## 74 1 0
## 75 1 0
## 76 1 0
## 77 1 0
## 78 1 0
## 79 1 0
## 80 1 0
## 81 1 0
## 82 1 0
## 83 1 0
## 84 1 0
## 85 1 0
## 86 1 0
## 87 1 0
## 88 1 0
## 89 1 0
## 90 1 0
## 91 1 0
## 92 1 0
## 93 1 0
## 94 1 0
## 95 1 0
## 96 1 0
## 97 1 0
## 98 1 0
## 99 1 0
## 100 1 0
## 101 1 0
## 102 1 0
## 103 1 0
## 104 1 0
## 105 1 0
## 106 1 0
## 107 1 0
## 108 1 0
## 109 1 0
## 110 1 0
## 111 1 0
## 112 1 0
## 113 1 0
## 114 1 0
## 115 1 0
## 116 1 0
## 117 1 0
## 118 1 0
## 119 1 0
## 120 1 0
## 121 1 0
## 122 1 0
## 123 0 1
## 124 0 1
## 125 0 1
## 126 0 1
## 127 0 1
## 128 0 1
## 129 0 1
## 130 0 1
## 131 0 1
## 132 0 1
## 133 0 1
## 134 0 1
## 135 0 1
## 136 0 1
## 137 0 1
## 138 0 1
## 139 0 1
## 140 0 1
## 141 0 1
## 142 0 1
## 143 0 1
## 144 0 1
## 145 0 1
## 146 0 1
## 147 0 1
## 148 0 1
## 149 0 1
## 150 0 1
## 151 0 1
## 152 0 1
## 153 0 1
## 154 0 1
## 155 0 1
## 156 0 1
## 157 0 1
## 158 0 1
## 159 0 1
## 160 0 1
## 161 0 1
## 162 0 1
## 163 0 1
## 164 0 1
## 165 0 0
## 166 0 0
## 167 0 0
## 168 0 0
## 169 0 0
## 170 0 0
## 171 0 0
## 172 0 0
## 173 0 0
## 174 0 0
## 175 0 0
## 176 0 0
## 177 0 0
## 178 0 0
## 179 0 0
## 180 0 0
## 181 0 0
## 182 0 0
## 183 0 0
## 184 0 0
## 185 0 0
## 186 0 0
## 187 0 0
## 188 0 0
## 189 0 0
## 190 0 0
## 191 0 0
## 192 0 0
## 193 0 0
## 194 0 0
## 195 0 0
## 196 0 0
## 197 0 0
## 198 0 0
The following R code runs the log-binomial regression model using glm function in R
model_unweighted <- glm(disease ~ treatment, data = mydata, family = binomial("log"))
summary(model_unweighted)
##
## Call:
## glm(formula = disease ~ treatment, family = binomial("log"),
## data = mydata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.268 -1.095 -1.095 1.262 1.262
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5931 0.1032 -5.746 9.12e-09 ***
## treatment -0.2036 0.1437 -1.417 0.156
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 274.41 on 197 degrees of freedom
## Residual deviance: 272.46 on 196 degrees of freedom
## AIC: 276.46
##
## Number of Fisher Scoring iterations: 5
Next,we will calculate the coefficients using the Fisher scoring manually.
For \(\log\) link of a Bernoulli distribution we have
\[\log(p_i)=x_i^{T}\mathbf{\beta} \tag{1}\Rightarrow p_i=e^{x_i^T\beta}\] The likelihood function is
\[\begin{align*} L&=p_1^{y_1}(1-p_1)^{(1-y_1)}\times p_2^{y_2}(1-p_2)^{(1-y_2)}\times...\times p_n^{y_n}(1-p_n)^{(1-y_n)} \\&= \prod_{i=1}^n p_i^{y_i}(1-p_i)^{(1-y_i)} \end{align*}\]
Then log likelihood function is
\[\ell=\sum_{i=1}^n y_i\log p_i+\sum_{i=1}^n (1-y_i)\log(1-p_i)\tag{2}\] We put \((1)\) to \((2)\) and get
\[\begin{align*} \ell&=\sum_{i=1}^n \left\{y_i\log e^{\bf{x_i^T\beta}}+(1-y_i)\log(1-e^{\bf{x_i^T\beta}})\right\} \\ &=\left\{y_i \bf{x_i^T\beta}+(1-y_i)\log(1-e^{\bf{x_i^T\beta}})\right\} \tag{3} \end{align*}\]
Taking derivatives in term of \(\beta_j\) where where \(j=1,2,...,m\), we get:
\[\begin{align*} \frac{\partial\ell}{\partial\beta_j}&=\sum_{i=1}^n\left\{y_i x_{ij}-\frac{1-y_i}{1-e^{\bf{x_i^T\beta}}}e^{\bf{x_i^T\beta}}x_{ij}\right\}\\ &=\sum_{i=1}^nx_{ij}(y_i-\frac{e^{\bf{x_i^T\beta}}(1-y_i)}{1-e^{\bf{x_i^T\beta}}}) \\&=\sum_{i=1}^nx_{ij}(\frac{1}{1-e^{x_i^T\beta}}-\frac{e^{x_i^T\beta}}{1-e^{\bf{x_i^T\beta}}})\\ &=\sum_{i=1}^nx_{ij}\frac{y_i-e^{x_i^T\beta}}{1-e^{x_i^T\beta}} \tag{4} \end{align*}\]
Next,to calculate the second derivative of the log-likelihood function.
\[\begin{align*} \frac{\partial\ell^2}{\partial\beta_j \partial\beta_k}&=\sum_{i=1}^nx_{ij}\left[\frac{y_ie^{x_i^T\beta}x_{ik}}{(1-e^{x_i^T\beta})^2}-\frac{(e^{x_i^T\beta})'(1-e^{\bf{x_i^T\beta}})-(e^{x_i^T\beta})(1-e^{\bf{x_i^T\beta}})'}{(1-e^{\bf{x_i^T\beta}})^2}\right]\\ &=\sum_{i=1}^nx_{ij}\left[\frac{(y_i-1)e^{x_i^T\beta}}{(1-e^{x_i^T\beta})^2}\right]x_{ik}\tag{5} \end{align*}\]
For Fisher scoring method, we need to calculate the Fisher Information:
\[\begin{align*} I&=-E(\frac{\partial\ell^2}{\partial\beta_j \partial\beta_k})\\ &=-E(\sum_{i=1}^nx_{ij}\left[\frac{(y_i-1)e^{x_i^T\beta}}{(1-e^{x_i^T\beta})^2}\right]x_{ik})\\ &=\sum_{i=1}^nx_{ij}\left[\frac{e^{x_i^T\beta}}{(1-e^{x_i^T\beta})}\right]x_{ik} \tag{6} \end{align*}\]
This is because when \(y_i\) has a Bernoulli distribution \(E(y_i)=p_i=e^{x_i^T\beta}\)
Now we can write \((6)\) as matrix format
\[I(\beta)=X^TWX \tag{7}\] Where \(W=diag(\frac{e^{x_i^T\beta}}{1-e^{x_i^T\beta}})\)
Next, we write the equation \((4)\) as matrix form, from the equation \((4)\) we have
\[\begin{align*} \frac{\partial\ell}{\partial\beta_j}&=\sum_{i=1}^nx_{ij}\frac{y_i-e^{x_i^T\beta}}{1-e^{x_i^T\beta}}\\ \end{align*}\]
Therefore, we can write the first derivative of the log-likelihood function as the matrix form
\[l'(\beta)=X^T\frac{Y-e^{X^T\beta}}{e^{X^T\beta}}\tag{8}\] Combine equation \((7)\) and \((8)\) we write the Fisher scoring algorithm as
\[\beta^{(t+1)}=\beta^{t}+I(\beta^{(t)})^{-1}l'(\beta^{(t)}) \tag{9}\] Next, we write R code to run the Fisher scoring algorithm.
fisher_scoring<-function(x,y,b.init,tol=1e-8){
change <- Inf
b.old <- b.init
while(change > tol) {
eta <- x %*% b.old # linear predictor
w<-diag(as.vector(exp(eta)/(1-exp(eta))),nrow=nrow(x),ncol=nrow(x)) #diagonal weigth matrix
b.new<-b.old+solve(t(x)%*%w%*%x)%*%t(x)%*%((y-exp(eta))/(1-exp(eta))) #equation 9
change <- sqrt(sum((b.new - b.old)^2))
b.old <- b.new
}
b.new
}
x<-matrix(c(rep(1,198),mydata$treatment), nrow=198,ncol=2) # design matrix, i.e the predictor, here we just add intercept.
y<-mydata$disease
fisher_scoring(x,y,rep(1,2))
## [,1]
## [1,] -0.5930637
## [2,] -0.2036241
We get the exact same results as the GLM function, however, if we use Newton-Raphson algorithm the algorithm is very difficult to converge.
Note, the for log-binomial regression we have to restrict \(x_i^T\beta\le0\) this is because \(0\le p_i=e^{x_i^T\beta}\le1\)
The log-binomial model is quite similar as the so called catalytic model in R, the catalytic model is often used to model the force of infection, next I will show the differences between the log-binomial model and the catalytic model.
1.Switamy, A. P., Sutarman, A., & Open, D. S. (2017). Maximum likelihood based on Newton Raphson, fisher scoring and expectation maximization algorithm application on accident data. International Journal of Advanced Research, 6, 965-69.
2.Williamson, Tyler, Misha Eliasziw, and Gordon Hilton Fick. “Log-binomial models: exploring failed convergence.” Emerging themes in epidemiology 10.1 (2013): 1-10.
3.Chen, Wansu, et al. “Comparing performance between log-binomial and robust Poisson regression models for estimating risk ratios under model misspecification.” BMC medical research methodology 18.1 (2018): 1-12.