IRLS

Author

Takafumi Kubota

Published

June 7, 2024

Abstract

This document compares GLM and VGLM using simulated data with multiple predictors and Poisson-distributed responses, evaluating residuals and coefficients.

Keywords

GLM, VGLM, Poisson regression, residual analysis, predictive modeling, simulated data, R programming, statistical modeling

1 Comparative Analysis of GLM and VGLM Using Simulated Data

This document presents a comparative analysis of Generalized Linear Models (GLM) and Vector Generalized Linear Models (VGLM) using simulated data. I generate sample data with multiple predictor variables and response variables following a Poisson distribution. The analysis includes fitting GLM models separately for each response variable and fitting a VGLM model for all response variables simultaneously. Residuals and coefficient estimates are calculated and visualized for both models to evaluate their performance and differences.

Code
# Install and load VGAM package
# install.packages("VGAM")
library(VGAM)
Warning: package 'VGAM' was built under R version 4.3.1
Loading required package: stats4
Loading required package: splines
Code
# Generate sample data
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
x5 <- rnorm(n)

# Generate response variables
y1 <- rpois(n, lambda = exp(0.5 + 0.3 * x1 + 0.2 * x2 + 0.1 * x3))
y2 <- rpois(n, lambda = exp(0.2 + 0.5 * x1 + 0.3 * x4 + 0.4 * x5))
y3 <- rpois(n, lambda = exp(0.1 + 0.7 * x2 + 0.2 * x3 + 0.5 * x4))

# Create data frame
data <- data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5, y1 = y1, y2 = y2, y3 = y3)
head(data)
           x1          x2         x3         x4          x5 y1 y2 y3
1 -0.56047565 -0.71040656  2.1988103 -0.7152422 -0.07355602  1  1  2
2 -0.23017749  0.25688371  1.3124130 -0.7526890 -1.16865142  2  1  3
3  1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826  1  1  0
4  0.07050839 -0.34754260  0.5431941 -1.0525133 -0.02884155  3  1  1
5  0.12928774 -0.95161857 -0.4143399 -0.4371595  0.67069597  3  3  0
6  1.71506499 -0.04502772 -0.4762469  0.3311792 -1.65054654  2  2  4
Code
# GLM Poisson regression
glm_model_y1 <- glm(y1 ~ x1 + x2 + x3 + x4 + x5, family = poisson, data = data)
glm_model_y2 <- glm(y2 ~ x1 + x2 + x3 + x4 + x5, family = poisson, data = data)
glm_model_y3 <- glm(y3 ~ x1 + x2 + x3 + x4 + x5, family = poisson, data = data)

# Display summaries of each model
summary(glm_model_y1)

Call:
glm(formula = y1 ~ x1 + x2 + x3 + x4 + x5, family = poisson, 
    data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.38517    0.08854   4.350 1.36e-05 ***
x1           0.43631    0.08834   4.939 7.86e-07 ***
x2           0.33736    0.07935   4.251 2.12e-05 ***
x3           0.08839    0.08044   1.099    0.272    
x4          -0.01082    0.07788  -0.139    0.889    
x5          -0.02852    0.07367  -0.387    0.699    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 136.120  on 99  degrees of freedom
Residual deviance:  93.069  on 94  degrees of freedom
AIC: 306.15

Number of Fisher Scoring iterations: 5
Code
summary(glm_model_y2)

Call:
glm(formula = y2 ~ x1 + x2 + x3 + x4 + x5, family = poisson, 
    data = data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.070123   0.109203   0.642    0.521    
x1           0.663334   0.094500   7.019 2.23e-12 ***
x2           0.004192   0.095817   0.044    0.965    
x3          -0.080928   0.092684  -0.873    0.383    
x4           0.111971   0.082933   1.350    0.177    
x5           0.517226   0.088189   5.865 4.49e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 163.58  on 99  degrees of freedom
Residual deviance:  88.37  on 94  degrees of freedom
AIC: 283.95

Number of Fisher Scoring iterations: 5
Code
summary(glm_model_y3)

Call:
glm(formula = y3 ~ x1 + x2 + x3 + x4 + x5, family = poisson, 
    data = data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.176741   0.100730   1.755  0.07933 .  
x1           0.008668   0.097943   0.089  0.92948    
x2           0.641678   0.068253   9.401  < 2e-16 ***
x3           0.215410   0.081309   2.649  0.00807 ** 
x4           0.449104   0.085294   5.265  1.4e-07 ***
x5          -0.100219   0.076188  -1.315  0.18837    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 289.16  on 99  degrees of freedom
Residual deviance: 114.90  on 94  degrees of freedom
AIC: 290.69

Number of Fisher Scoring iterations: 5
Code
# Calculate residuals for each model
glm_residuals_y1 <- residuals(glm_model_y1, type = "deviance")
glm_residuals_y2 <- residuals(glm_model_y2, type = "deviance")
glm_residuals_y3 <- residuals(glm_model_y3, type = "deviance")

# VGLM Poisson regression
vglm_model <- vglm(cbind(y1, y2, y3) ~ x1 + x2 + x3 + x4 + x5, family = poissonff, data = data)

# Display summary of the VGLM model
summary(vglm_model)

Call:
vglm(formula = cbind(y1, y2, y3) ~ x1 + x2 + x3 + x4 + x5, family = poissonff, 
    data = data)

Coefficients: 
               Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  0.385168   0.088545   4.350 1.36e-05 ***
(Intercept):2  0.070123   0.109203   0.642  0.52079    
(Intercept):3  0.176741   0.100730   1.755  0.07933 .  
x1:1           0.436308   0.088344   4.939 7.86e-07 ***
x1:2           0.663334   0.094500   7.019 2.23e-12 ***
x1:3           0.008668   0.097943   0.089  0.92948    
x2:1           0.337355   0.079352   4.251 2.12e-05 ***
x2:2           0.004192   0.095817   0.044  0.96510    
x2:3           0.641678   0.068253   9.401  < 2e-16 ***
x3:1           0.088388   0.080442   1.099  0.27187    
x3:2          -0.080928   0.092684  -0.873  0.38258    
x3:3           0.215410   0.081309   2.649  0.00807 ** 
x4:1          -0.010823   0.077882  -0.139  0.88948    
x4:2           0.111971   0.082933   1.350  0.17697    
x4:3           0.449104   0.085294   5.265 1.40e-07 ***
x5:1          -0.028516   0.073672  -0.387  0.69871    
x5:2           0.517226   0.088189   5.865 4.49e-09 ***
x5:3          -0.100219   0.076188  -1.315  0.18837    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: loglink(E[y1]), loglink(E[y2]), loglink(E[y3])

Residual deviance: 296.34 on 282 degrees of freedom

Log-likelihood: -422.3963 on 282 degrees of freedom

Number of Fisher scoring iterations: 5 

No Hauck-Donner effect found in any of the estimates
Code
# Calculate residuals for the VGLM model
vglm_fitted <- fitted(vglm_model)
vglm_residuals_y1 <- data$y1 - vglm_fitted[, 1]
vglm_residuals_y2 <- data$y2 - vglm_fitted[, 2]
vglm_residuals_y3 <- data$y3 - vglm_fitted[, 3]

# Display residuals
print("GLM residuals for y1:")
[1] "GLM residuals for y1:"
Code
print(summary(glm_residuals_y1))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.08598 -0.90722 -0.05605 -0.14586  0.55509  2.83644 
Code
print("GLM residuals for y2:")
[1] "GLM residuals for y2:"
Code
print(summary(glm_residuals_y2))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.0575 -0.9070 -0.1530 -0.1499  0.5561  2.3643 
Code
print("GLM residuals for y3:")
[1] "GLM residuals for y3:"
Code
print(summary(glm_residuals_y3))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.9728 -0.9890 -0.1555 -0.2180  0.3973  3.1318 
Code
print("VGLM residuals for y1:")
[1] "VGLM residuals for y1:"
Code
print(summary(vglm_residuals_y1))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.40988 -0.80119 -0.07479  0.00000  0.72825  4.46899 
Code
print("VGLM residuals for y2:")
[1] "VGLM residuals for y2:"
Code
print(summary(vglm_residuals_y2))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.9385 -0.6323 -0.1615  0.0000  0.6231  2.9187 
Code
print("VGLM residuals for y3:")
[1] "VGLM residuals for y3:"
Code
print(summary(vglm_residuals_y3))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-6.4235 -0.6190 -0.1933  0.0000  0.5927  6.4515 
Code
# Display coefficient estimates
print("GLM coefficient estimates for y1:")
[1] "GLM coefficient estimates for y1:"
Code
print(coef(glm_model_y1))
(Intercept)          x1          x2          x3          x4          x5 
 0.38516799  0.43630805  0.33735500  0.08838753 -0.01082276 -0.02851557 
Code
print("GLM coefficient estimates for y2:")
[1] "GLM coefficient estimates for y2:"
Code
print(coef(glm_model_y2))
 (Intercept)           x1           x2           x3           x4           x5 
 0.070123026  0.663333996  0.004191967 -0.080928155  0.111970667  0.517226123 
Code
print("GLM coefficient estimates for y3:")
[1] "GLM coefficient estimates for y3:"
Code
print(coef(glm_model_y3))
 (Intercept)           x1           x2           x3           x4           x5 
 0.176740816  0.008668477  0.641677616  0.215409800  0.449103651 -0.100219355 
Code
print("VGLM coefficient estimates:")
[1] "VGLM coefficient estimates:"
Code
print(coef(vglm_model))
(Intercept):1 (Intercept):2 (Intercept):3          x1:1          x1:2 
  0.385167993   0.070123026   0.176740816   0.436308045   0.663333996 
         x1:3          x2:1          x2:2          x2:3          x3:1 
  0.008668477   0.337355003   0.004191967   0.641677616   0.088387532 
         x3:2          x3:3          x4:1          x4:2          x4:3 
 -0.080928155   0.215409800  -0.010822759   0.111970667   0.449103651 
         x5:1          x5:2          x5:3 
 -0.028515573   0.517226123  -0.100219355 
Code
# Plot residuals for visual comparison
par(mfrow = c(3, 2))

plot(glm_residuals_y1, main = "GLM Residuals for y1", xlab = "Index", ylab = "Residuals", ylim = c(-3, 5))
plot(vglm_residuals_y1, main = "VGLM Residuals for y1", xlab = "Index", ylab = "Residuals", ylim = c(-3, 5))

plot(glm_residuals_y2, main = "GLM Residuals for y2", xlab = "Index", ylab = "Residuals", ylim = c(-4, 4))
plot(vglm_residuals_y2, main = "VGLM Residuals for y2", xlab = "Index", ylab = "Residuals", ylim = c(-4, 4))

plot(glm_residuals_y3, main = "GLM Residuals for y3", xlab = "Index", ylab = "Residuals", ylim = c(-7, 7))
plot(vglm_residuals_y3, main = "VGLM Residuals for y3", xlab = "Index", ylab = "Residuals", ylim = c(-7, 7))

1.1 Explanation

  1. Load VGAM Package:

    • Load the VGAM package to use the VGLM function for fitting vector generalized linear models.
  2. Generate Sample Data:

    • Set the seed for reproducibility.

    • Generate sample data for five predictor variables (x1, x2, x3, x4, x5) and three response variables (y1, y2, y3) using the Poisson distribution.

  3. Create Data Frame:

    • Combine the predictor and response variables into a data frame.
  4. GLM Poisson Regression:

    • Fit separate GLM Poisson regression models for each of the three response variables (y1, y2, y3).

    • Display summaries for each model.

  5. Calculate GLM Residuals:

    • Calculate deviance residuals for each GLM model.
  6. VGLM Poisson Regression:

    • Fit a VGLM Poisson regression model for all three response variables simultaneously.

    • Display the summary of the VGLM model.

  7. Calculate VGLM Residuals:

    • Calculate residuals for the VGLM model.
  8. Display Residuals:

    • Display summaries of the residuals for each response variable from both GLM and VGLM models.
  9. Display Coefficient Estimates:

    • Print the coefficient estimates from the GLM and VGLM models.
  10. Plot Residuals for Visual Comparison:

    • Use par to set up a 3x2 plotting layout.

    • Plot the residuals for each response variable from both the GLM and VGLM models, ensuring the y-axes are aligned.