This assignment is worth 5 marks.

This is at home group assignment and it is built on Assignment 2 and 3.

Instructions are in the next page

The use of LLMs is not permitted in this module.

The use of LLMs is not permitted in this module.

The use of LLMs is not permitted in this module.

   You can do this assignment based on the code I have provided and your solutions for Assignment 2 and 3. Use your brain not chatgp  

Group Number: 5

 
Name Surname Username Student-Number
Conor McCormack mccorc16 23372769

I have shared my implementation for myglm, which includes:

Your goal is simply to extend my function myglm to include:

You need also to improve my code in two ways:

  1. In my code I used the R functiondbinom and dnorm to define the log-likelihood. We are not allowed to use functions from R, so replace them with an implementation from scratch of the log-likelihood using the functions \(a,b,c,d\). Do the same for Poisson and Negative Binomial.

  2. In my code, the function alteroptim, which implements the alternating optimisation loop, currently does not include a stopping criterion. In particular, the loop continues even when the changes in the estimated coefficients \(\beta\)s and the distribution additional parameter other_mle between successive iterations fall below a specified tolerance level. Similar to the implementation of the Newton–Raphson algorithm, the loop should terminate once these parameter updates are sufficiently small (below a tolerance). Introducing such a stopping criterion would prevent unnecessary iterations and reduce computational time. Implement this stopping criterion.

Finally, you need to test you code to check that the output is correct. I provided a test function you can use for Gaussian and Binomial by comparing the output of myglm to that of glm. You can do the same for Poisson. For the Negative Binomial, you can use this output as a test for your function:

set.seed(123)
X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y =rnbinom(6,size=36,mu=exp(X[,2])) # c(4,   9,  23,  43, 119, 430)
myglm(X,Y,"negbinomial","log")

$estimate
           [,1]
[1,] 0.04913257
[2,] 0.98065866

$stdError
[1] 0.33440178 0.06770406

$z
           [,1]
[1,]  0.1469268
[2,] 14.4844882

$pvalz
             [,1]
[1,] 0.8902977264
[2,] 0.0001320879

$other_parameter_mle
[1] 66.50724

$dispersion
[1] 1.11446

$deviance
[1] 4.492353

$deviance_null
[1] 7.357109

$AIC
[1] 46.93867

Important: Use R-studio debugger to find any errors/issues in your code: https://support.posit.co/hc/en-us/articles/205612627-Debugging-with-the-RStudio-IDE

Expected Output: Submit your myglm.R function (as a R function) and a Rmd notebook which shows, using the tests, that the code works for both Poisson and Negative Binomial.

#load myglm

source("myglm.R")

#test Poisson

We compare the output of myglm with the output of glm for a Poisson regression with a log link function.

X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y = matrix(c(1, 3, 5, 9, 20, 40))

cat("=== myglm output ===\n")
## === myglm output ===
output = myglm(X, Y, "poisson", "log")
## [1] "Warning: the deviance of the null model is computed using a model including  the intercept only.\n\n        This is different from what the standard GLM does."
## [1] "Warning: the deviance of the null model is computed using a model including  the intercept only.\n\n        This is different from what the standard GLM does."
print(output)
## $estimate
##            [,1]
## [1,] -0.5060043
## [2,]  0.6983309
## 
## $stdError
## [1] 0.50004916 0.09545421
## 
## $z
##           [,1]
## [1,] -1.011909
## [2,]  7.315873
## 
## $pvalz
##              [,1]
## [1,] 3.115815e-01
## [2,] 2.557954e-13
## 
## $other_parameter_mle
## [1] NaN
## 
## $dispersion
## [1] 1
## 
## $deviance
## [1] 0.2410295
## 
## $deviance_null
## [1] 77.04364
## 
## $AIC
## [1] 27.13996
cat("\n=== glm output ===\n")
## 
## === glm output ===
model = glm(Y ~ X[,2], family=poisson)
print(summary(model))
## 
## Call:
## glm(formula = Y ~ X[, 2], family = poisson)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.50600    0.50005  -1.012    0.312    
## X[, 2]       0.69833    0.09545   7.316 2.56e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 77.04364  on 5  degrees of freedom
## Residual deviance:  0.24103  on 4  degrees of freedom
## AIC: 27.14
## 
## Number of Fisher Scoring iterations: 4

The estimates, standard errors, z-values, p-values, deviance and AIC from myglm match those from glm.

#test Negative Binomial

We use the test data provided in the assignment to verify that the Negative Binomial implementation produces the correct output.

set.seed(123)
X = cbind(c(1,1,1,1,1,1),c(1,2,3,4,5,6))
Y =rnbinom(6,size=36,mu=exp(X[,2])) # c(4, 9, 23, 43, 119, 430)
Y = matrix(Y)
cat("Y values:", Y, "\n")
## Y values: 4 9 23 43 119 430
cat("\n=== myglm output ===\n")
## 
## === myglm output ===
output = myglm(X, Y, "negbinomial", "log")
## [1] "Warning: the deviance of the null model is computed using a model including  the intercept only.\n\n        This is different from what the standard GLM does."
## [1] "Warning: the deviance of the null model is computed using a model including  the intercept only.\n\n        This is different from what the standard GLM does."
print(output)
## $estimate
##            [,1]
## [1,] 0.04913259
## [2,] 0.98065866
## 
## $stdError
## [1] 0.33440177 0.06770406
## 
## $z
##            [,1]
## [1,]  0.1469268
## [2,] 14.4844883
## 
## $pvalz
##              [,1]
## [1,] 0.8902976781
## [2,] 0.0001320879
## 
## $other_parameter_mle
## [1] 66.50723
## 
## $dispersion
## [1] 1.11446
## 
## $deviance
## [1] 4.492353
## 
## $deviance_null
## [1] 7.357113
## 
## $AIC
## [1] 46.93867

The output matches the expected values:

Finally, in your Rmd notebook, complete the following table that summarises how to compute the statistics for GLM model based on a distribution with or without an additional parameter

Component / Statistic GLM without additional parameter GLM with additional parameter (\(\psi\))
Example distribution Poisson, Binomial Gaussian, Negative Binomial
Estimation of \(\beta\)s MLE via Newton-Raphson Alternating Optimisation
Estimate of dispersion \(\hat{\phi}\) NA Pearson residuals: \(\frac{1}{N - n_\beta}\sum_{i=1}^{N}\frac{(Y_i - \hat{\mu})^2}{Var(Y_i)}\)
Standard errors for \(\beta\)s From Fisher information: \(\text{SE}(\hat{\beta}_j) = \sqrt{[-\mathcal{H}(\hat{\beta})^{-1}]_{jj}}\) Adjusted by \(\hat{\phi}\): \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\phi} \cdot [-\mathcal{H}(\hat{\beta})^{-1}]_{jj}}\)
Test statistics z-tests: \(z = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\) t-tests: \(t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\)
P-value \(2(1 - \Phi(\left|\frac{\hat{\beta}_j}{\sigma_j}\right|))\) \(2\left(1 - CDF_{\text{student}}\left(\left|\frac{\hat{\beta}_j}{\sigma_j}\right|,\, N - n_\beta\right)\right)\)
Deviance \(2(\ln L_{\text{perf}} - \ln L_{\text{model}})\) \(2(\ln L_{\text{perf}} - \ln L_{\text{model}})\)
AIC \(-2 \ln L_{\text{model}} + 2(n_p + 1)\) \(-2 \ln L_{\text{model}} + 2(n_p + 1)\)

Completing this table will show that you understand how these statistics are computed. This table is also useful for the exam. See my slides 7-linear-regression.