This assignment is worth 5 marks.
This is at home group assignment and it is built on Assignment 2 and 3.
Instructions are in the next page
The use of LLMs is not permitted in this module.
The use of LLMs is not permitted in this module.
The use of LLMs is not permitted in this module.
You can do this assignment based on the code I have provided and
your solutions for Assignment 2 and 3. Use your brain not
chatgp
Group Number: 5
| Name | Surname | Username | Student-Number |
|---|---|---|---|
| Conor | McCormack | mccorc16 | 23372769 |
I have shared my implementation for myglm, which
includes:
glm (standard errors, deviance,
AIC, etc.) distinguishing between distributions which don’t have (the
Binomial) and have (Gaussian) an additional parameter.Your goal is simply to extend my function myglm to
include:
You need also to improve my code in two ways:
In my code I used the R functiondbinom and
dnorm to define the log-likelihood. We are not allowed to
use functions from R, so replace them with an
implementation from scratch of the log-likelihood using the functions
\(a,b,c,d\). Do the same for Poisson
and Negative Binomial.
In my code, the function alteroptim, which
implements the alternating optimisation loop, currently does not include
a stopping criterion. In particular, the loop continues even when the
changes in the estimated coefficients \(\beta\)s and the distribution additional
parameter other_mle between successive iterations fall
below a specified tolerance level. Similar to the implementation of the
Newton–Raphson algorithm, the loop should terminate once these parameter
updates are sufficiently small (below a tolerance). Introducing such a
stopping criterion would prevent unnecessary iterations and reduce
computational time. Implement this stopping criterion.
Finally, you need to test you code to check that the output is
correct. I provided a test function you can use for
Gaussian and Binomial by comparing the output
of myglm to that of glm. You can do the same
for Poisson. For the Negative Binomial, you can use this
output as a test for your function:
set.seed(123)
X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y =rnbinom(6,size=36,mu=exp(X[,2])) # c(4, 9, 23, 43, 119, 430)
myglm(X,Y,"negbinomial","log")
$estimate
[,1]
[1,] 0.04913257
[2,] 0.98065866
$stdError
[1] 0.33440178 0.06770406
$z
[,1]
[1,] 0.1469268
[2,] 14.4844882
$pvalz
[,1]
[1,] 0.8902977264
[2,] 0.0001320879
$other_parameter_mle
[1] 66.50724
$dispersion
[1] 1.11446
$deviance
[1] 4.492353
$deviance_null
[1] 7.357109
$AIC
[1] 46.93867
Important: Use R-studio debugger to find any errors/issues in your code: https://support.posit.co/hc/en-us/articles/205612627-Debugging-with-the-RStudio-IDE
Expected Output: Submit your myglm.R
function (as a R function) and a Rmd notebook which shows,
using the tests, that the code works for both Poisson and Negative
Binomial.
#load myglm
source("myglm.R")
#test Poisson
We compare the output of myglm with the output of
glm for a Poisson regression with a log link function.
X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y = matrix(c(1, 3, 5, 9, 20, 40))
cat("=== myglm output ===\n")
## === myglm output ===
output = myglm(X, Y, "poisson", "log")
## [1] "Warning: the deviance of the null model is computed using a model including the intercept only.\n\n This is different from what the standard GLM does."
## [1] "Warning: the deviance of the null model is computed using a model including the intercept only.\n\n This is different from what the standard GLM does."
print(output)
## $estimate
## [,1]
## [1,] -0.5060043
## [2,] 0.6983309
##
## $stdError
## [1] 0.50004916 0.09545421
##
## $z
## [,1]
## [1,] -1.011909
## [2,] 7.315873
##
## $pvalz
## [,1]
## [1,] 3.115815e-01
## [2,] 2.557954e-13
##
## $other_parameter_mle
## [1] NaN
##
## $dispersion
## [1] 1
##
## $deviance
## [1] 0.2410295
##
## $deviance_null
## [1] 77.04364
##
## $AIC
## [1] 27.13996
cat("\n=== glm output ===\n")
##
## === glm output ===
model = glm(Y ~ X[,2], family=poisson)
print(summary(model))
##
## Call:
## glm(formula = Y ~ X[, 2], family = poisson)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.50600 0.50005 -1.012 0.312
## X[, 2] 0.69833 0.09545 7.316 2.56e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 77.04364 on 5 degrees of freedom
## Residual deviance: 0.24103 on 4 degrees of freedom
## AIC: 27.14
##
## Number of Fisher Scoring iterations: 4
The estimates, standard errors, z-values, p-values, deviance and AIC
from myglm match those from glm.
#test Negative Binomial
We use the test data provided in the assignment to verify that the Negative Binomial implementation produces the correct output.
set.seed(123)
X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y =rnbinom(6,size=36,mu=exp(X[,2])) # c(4, 9, 23, 43, 119, 430)
Y = matrix(Y)
cat("Y values:", Y, "\n")
## Y values: 4 9 23 43 119 430
cat("\n=== myglm output ===\n")
##
## === myglm output ===
output = myglm(X, Y, "negbinomial", "log")
## [1] "Warning: the deviance of the null model is computed using a model including the intercept only.\n\n This is different from what the standard GLM does."
## [1] "Warning: the deviance of the null model is computed using a model including the intercept only.\n\n This is different from what the standard GLM does."
print(output)
## $estimate
## [,1]
## [1,] 0.04913259
## [2,] 0.98065866
##
## $stdError
## [1] 0.33440177 0.06770406
##
## $z
## [,1]
## [1,] 0.1469268
## [2,] 14.4844883
##
## $pvalz
## [,1]
## [1,] 0.8902976781
## [2,] 0.0001320879
##
## $other_parameter_mle
## [1] 66.50723
##
## $dispersion
## [1] 1.11446
##
## $deviance
## [1] 4.492353
##
## $deviance_null
## [1] 7.357113
##
## $AIC
## [1] 46.93867
The output matches the expected values:
$estimate: 0.04913257, 0.98065866$stdError: 0.33440178, 0.06770406$z: 0.1469268, 14.4844882$pvalz: 0.8902977264, 0.0001320879$other_parameter_mle: 66.50724$dispersion: 1.11446$deviance: 4.492353$deviance_null: 7.357109$AIC: 46.93867Finally, in your Rmd notebook, complete the following
table that summarises how to compute the statistics for GLM model based
on a distribution with or without an additional parameter
| Component / Statistic | GLM without additional parameter | GLM with additional parameter (\(\psi\)) |
|---|---|---|
| Example distribution | Poisson, Binomial | Gaussian, Negative Binomial |
| Estimation of \(\beta\)s | MLE via Newton-Raphson | Alternating Optimisation |
| Estimate of dispersion \(\hat{\phi}\) | NA | Pearson residuals: \(\frac{1}{N - n_\beta}\sum_{i=1}^{N}\frac{(Y_i - \hat{\mu})^2}{Var(Y_i)}\) |
| Standard errors for \(\beta\)s | From Fisher information: \(\text{SE}(\hat{\beta}_j) = \sqrt{[-\mathcal{H}(\hat{\beta})^{-1}]_{jj}}\) | Adjusted by \(\hat{\phi}\): \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\phi} \cdot [-\mathcal{H}(\hat{\beta})^{-1}]_{jj}}\) |
| Test statistics | z-tests: \(z = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\) | t-tests: \(t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\) |
| P-value | \(2(1 - \Phi(\left|\frac{\hat{\beta}_j}{\sigma_j}\right|))\) | \(2\left(1 - CDF_{\text{student}}\left(\left|\frac{\hat{\beta}_j}{\sigma_j}\right|,\, N - n_\beta\right)\right)\) |
| Deviance | \(2(\ln L_{\text{perf}} - \ln L_{\text{model}})\) | \(2(\ln L_{\text{perf}} - \ln L_{\text{model}})\) |
| AIC | \(-2 \ln L_{\text{model}} + 2(n_p + 1)\) | \(-2 \ln L_{\text{model}} + 2(n_p + 1)\) |
Completing this table will show that you understand how these statistics are computed. This table is also useful for the exam. See my slides 7-linear-regression.