Generalized Linear Models

Introduction

Generalized linear models (GLMs) extend linear regression to data sets whose response variable has an error distribution other than the normal distribution. Rather than modeling the response directly, a GLM relates the expected value of the response to a linear predictor through a link function, while the chosen distribution determines how the variance of each measurement depends on its predicted value. A common use case is a data set with a Boolean response, that is, a Bernoulli variable whose probability of success must lie between 0 and 1; counts of such outcomes follow a binomial distribution. Log-linear models and log-odds models are also good candidates for a GLM. The log-odds model uses the logit function as its link, which gives logistic regression.

According to Wikipedia, the GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
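
In symbols, the two ingredients named above (a link function and a variance that depends on the predicted value) can be written as follows. This is the standard textbook formulation, not something specific to this post, and the symbols $x_i$, $\beta$, $\eta_i$, $\mu_i$, $g$, $\phi$ and $V$ are notation introduced here for illustration.

```latex
\begin{align*}
\eta_i &= x_i^{\top}\beta
  && \text{(linear predictor)} \\
g(\mu_i) &= \eta_i, \qquad \mu_i = \mathbb{E}[Y_i \mid x_i]
  && \text{(link function)} \\
\operatorname{Var}(Y_i \mid x_i) &= \phi \, V(\mu_i)
  && \text{(variance as a function of the mean)} \\[6pt]
g(\mu) &= \log\frac{\mu}{1-\mu}, \qquad V(\mu) = \mu(1-\mu), \qquad \phi = 1
  && \text{(Bernoulli response, logit link)}
\end{align*}
```

Ordinary linear regression is the special case with the identity link $g(\mu) = \mu$ and constant variance $V(\mu) = 1$.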

Implementation

The data come from https://www.kaggle.com/mhdzahier/travel-insurance. The dependent variable, Claim, records whether a travel insurance claim was filed (yes or no). I work as a software engineer for a large auto insurance company based in San Antonio, Texas, so I found this dataset intriguing as an exercise in predicting insurance claims. The independent variables used in the implementation below are Product, Duration and Age.

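library(rcompanion)  # provides compareGLM(), used below
# stringsAsFactors = TRUE (the read.csv() default before R 4.0) matches the summary() output
travel_data <- read.csv("travelinsurance.csv", stringsAsFactors = TRUE)

# Recode the response from Yes/No to 1/0
travel_data$Claim <- ifelse(travel_data$Claim == 'Yes', 1, 0)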

summary(travel_data)
##      Agency             Agency_Type    Distribution_Channel
##  EPX    :35119   Airlines     :17457   Offline: 1107       
##  CWT    : 8580   Travel Agency:45869   Online :62219       
##  C2B    : 8267                                             
##  JZI    : 6329                                             
##  SSI    : 1056                                             
##  JWT    :  749                                             
##  (Other): 3226                                             
##                             Product          Claim            Duration      
##  Cancellation Plan              :18630   Min.   :0.00000   Min.   :  -2.00  
##  2 way Comprehensive Plan       :13158   1st Qu.:0.00000   1st Qu.:   9.00  
##  Rental Vehicle Excess Insurance: 8580   Median :0.00000   Median :  22.00  
##  Basic Plan                     : 5469   Mean   :0.01464   Mean   :  49.32  
##  Bronze Plan                    : 4049   3rd Qu.:0.00000   3rd Qu.:  53.00  
##  1 way Comprehensive Plan       : 3331   Max.   :1.00000   Max.   :4881.00  
##  (Other)                        :10109                                      
##     Destination      Net_Sales         Commision      Gender   
##  SINGAPORE:13255   Min.   :-389.00   Min.   :  0.00    :45107  
##  MALAYSIA : 5930   1st Qu.:  18.00   1st Qu.:  0.00   F: 8872  
##  THAILAND : 5894   Median :  26.53   Median :  0.00   M: 9347  
##  CHINA    : 4796   Mean   :  40.70   Mean   :  9.81            
##  AUSTRALIA: 3694   3rd Qu.:  48.00   3rd Qu.: 11.55            
##  INDONESIA: 3452   Max.   : 810.00   Max.   :283.50            
##  (Other)  :26305                                               
##       Age        
##  Min.   :  0.00  
##  1st Qu.: 35.00  
##  Median : 36.00  
##  Mean   : 39.97  
##  3rd Qu.: 43.00  
##  Max.   :118.00  
## 

For the sake of comparison, build three GLMs that add predictor variables progressively: Product alone, then Product and Duration, then Product, Duration and Age.

glmod1 <- glm(Claim ~ Product, data=travel_data, family=binomial())

summary(glmod1)
## 
## Call:
## glm(formula = Claim ~ Product, family = binomial(), data = travel_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4835  -0.1473  -0.1185  -0.0688   3.4780  
## 
## Coefficients:
##                                              Estimate Std. Error z value
## (Intercept)                                   -5.9111     0.3338 -17.709
## Product2 way Comprehensive Plan                1.3930     0.3443   4.046
## Product24 Protect                            -10.6550   152.6797  -0.070
## ProductAnnual Gold Plan                        3.8023     0.4060   9.366
## ProductAnnual Silver Plan                      3.8237     0.3443  11.104
## ProductAnnual Travel Protect Gold              3.7139     0.4717   7.873
## ProductAnnual Travel Protect Platinum          3.0977     0.6817   4.544
## ProductAnnual Travel Protect Silver            2.8907     0.6112   4.729
## ProductBasic Plan                              0.4440     0.3938   1.127
## ProductBronze Plan                             3.0052     0.3412   8.807
## ProductCancellation Plan                      -0.1349     0.3663  -0.368
## ProductChild Comprehensive Plan              -10.6550   799.8483  -0.013
## ProductComprehensive Plan                      1.6372     0.5605   2.921
## ProductGold Plan                               2.9903     0.4123   7.253
## ProductIndividual Comprehensive Plan           2.7470     0.6774   4.055
## ProductPremier Plan                            1.7574     0.6708   2.620
## ProductRental Vehicle Excess Insurance         1.3183     0.3509   3.757
## ProductSilver Plan                             3.2064     0.3449   9.296
## ProductSingle Trip Travel Protect Gold         2.9458     0.4654   6.330
## ProductSingle Trip Travel Protect Platinum     3.3010     0.5711   5.780
## ProductSingle Trip Travel Protect Silver       2.1675     0.6061   3.576
## ProductSpouse or Parents Comprehensive Plan    3.2720     1.0876   3.009
## ProductTicket Protector                        0.9014     0.5052   1.784
## ProductTravel Cruise Protect                   0.3408     0.7831   0.435
## ProductTravel Cruise Protect Family          -10.6550  2399.5447  -0.004
## ProductValue Plan                              0.9560     0.4055   2.358
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## Product2 way Comprehensive Plan             5.21e-05 ***
## Product24 Protect                           0.944364    
## ProductAnnual Gold Plan                      < 2e-16 ***
## ProductAnnual Silver Plan                    < 2e-16 ***
## ProductAnnual Travel Protect Gold           3.46e-15 ***
## ProductAnnual Travel Protect Platinum       5.52e-06 ***
## ProductAnnual Travel Protect Silver         2.25e-06 ***
## ProductBasic Plan                           0.259583    
## ProductBronze Plan                           < 2e-16 ***
## ProductCancellation Plan                    0.712732    
## ProductChild Comprehensive Plan             0.989372    
## ProductComprehensive Plan                   0.003491 ** 
## ProductGold Plan                            4.06e-13 ***
## ProductIndividual Comprehensive Plan        5.00e-05 ***
## ProductPremier Plan                         0.008796 ** 
## ProductRental Vehicle Excess Insurance      0.000172 ***
## ProductSilver Plan                           < 2e-16 ***
## ProductSingle Trip Travel Protect Gold      2.45e-10 ***
## ProductSingle Trip Travel Protect Platinum  7.45e-09 ***
## ProductSingle Trip Travel Protect Silver    0.000349 ***
## ProductSpouse or Parents Comprehensive Plan 0.002625 ** 
## ProductTicket Protector                     0.074377 .  
## ProductTravel Cruise Protect                0.663397    
## ProductTravel Cruise Protect Family         0.996457    
## ProductValue Plan                           0.018387 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9671.8  on 63325  degrees of freedom
## Residual deviance: 8270.5  on 63300  degrees of freedom
## AIC: 8322.5
## 
## Number of Fisher Scoring iterations: 15
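
The estimates in this table are on the log-odds scale, which makes them hard to read directly. As a rough sketch, the snippet below copies three coefficients from the summary above and converts them to probabilities and an odds ratio. The variable names are made up for illustration, and the reference level is assumed to be 1 way Comprehensive Plan, since it is the only product without its own row.

```r
# Coefficients copied from the glmod1 summary above (log-odds scale).
intercept     <- -5.9111  # assumed reference level: 1 way Comprehensive Plan
two_way       <-  1.3930  # Product2 way Comprehensive Plan
annual_silver <-  3.8237  # ProductAnnual Silver Plan

# plogis() is the inverse logit: it maps log-odds back to a probability.
p_reference <- plogis(intercept)
p_two_way   <- plogis(intercept + two_way)
p_annual    <- plogis(intercept + annual_silver)

# exp() of a coefficient is the odds ratio relative to the reference level.
or_two_way <- exp(two_way)

print(round(c(reference = p_reference,
              two_way = p_two_way,
              annual_silver = p_annual), 4))
print(round(or_two_way, 2))
```

Under these assumptions the predicted claim probability is roughly 0.3% for the reference product, about 1.1% for the 2 way Comprehensive Plan and about 11% for the Annual Silver Plan, and the odds of a claim on the 2 way plan are about four times those of the reference. The three products with estimates near -10.66 and standard errors in the hundreds or thousands are a separate matter: values like these typically arise when a level has no claims at all, so the fit pushes its probability toward zero.
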
glmod2 <- glm(Claim ~ Product + Duration, data=travel_data, family=binomial())

summary(glmod2)
## 
## Call:
## glm(formula = Claim ~ Product + Duration, family = binomial(), 
##     data = travel_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5196  -0.1468  -0.1183  -0.0690   3.4825  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -5.921e+00  3.338e-01 -17.735
## Product2 way Comprehensive Plan              1.384e+00  3.443e-01   4.020
## Product24 Protect                           -1.066e+01  1.527e+02  -0.070
## ProductAnnual Gold Plan                      3.655e+00  4.175e-01   8.755
## ProductAnnual Silver Plan                    3.675e+00  3.580e-01  10.265
## ProductAnnual Travel Protect Gold            3.563e+00  4.821e-01   7.391
## ProductAnnual Travel Protect Platinum        2.948e+00  6.888e-01   4.280
## ProductAnnual Travel Protect Silver          2.735e+00  6.199e-01   4.412
## ProductBasic Plan                            4.395e-01  3.938e-01   1.116
## ProductBronze Plan                           3.004e+00  3.412e-01   8.803
## ProductCancellation Plan                    -1.421e-01  3.664e-01  -0.388
## ProductChild Comprehensive Plan             -1.080e+01  7.998e+02  -0.014
## ProductComprehensive Plan                    1.615e+00  5.607e-01   2.881
## ProductGold Plan                             2.987e+00  4.123e-01   7.244
## ProductIndividual Comprehensive Plan         2.601e+00  6.842e-01   3.802
## ProductPremier Plan                          1.746e+00  6.708e-01   2.603
## ProductRental Vehicle Excess Insurance       1.310e+00  3.510e-01   3.733
## ProductSilver Plan                           3.204e+00  3.449e-01   9.289
## ProductSingle Trip Travel Protect Gold       2.945e+00  4.654e-01   6.329
## ProductSingle Trip Travel Protect Platinum   3.299e+00  5.711e-01   5.777
## ProductSingle Trip Travel Protect Silver     2.166e+00  6.061e-01   3.573
## ProductSpouse or Parents Comprehensive Plan  3.127e+00  1.092e+00   2.864
## ProductTicket Protector                      8.087e-01  5.194e-01   1.557
## ProductTravel Cruise Protect                 3.333e-01  7.832e-01   0.426
## ProductTravel Cruise Protect Family         -1.066e+01  2.400e+03  -0.004
## ProductValue Plan                            9.516e-01  4.055e-01   2.347
## Duration                                     4.207e-04  2.778e-04   1.514
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## Product2 way Comprehensive Plan             5.81e-05 ***
## Product24 Protect                           0.944345    
## ProductAnnual Gold Plan                      < 2e-16 ***
## ProductAnnual Silver Plan                    < 2e-16 ***
## ProductAnnual Travel Protect Gold           1.46e-13 ***
## ProductAnnual Travel Protect Platinum       1.87e-05 ***
## ProductAnnual Travel Protect Silver         1.03e-05 ***
## ProductBasic Plan                           0.264385    
## ProductBronze Plan                           < 2e-16 ***
## ProductCancellation Plan                    0.698196    
## ProductChild Comprehensive Plan             0.989227    
## ProductComprehensive Plan                   0.003964 ** 
## ProductGold Plan                            4.34e-13 ***
## ProductIndividual Comprehensive Plan        0.000144 ***
## ProductPremier Plan                         0.009243 ** 
## ProductRental Vehicle Excess Insurance      0.000189 ***
## ProductSilver Plan                           < 2e-16 ***
## ProductSingle Trip Travel Protect Gold      2.47e-10 ***
## ProductSingle Trip Travel Protect Platinum  7.61e-09 ***
## ProductSingle Trip Travel Protect Silver    0.000352 ***
## ProductSpouse or Parents Comprehensive Plan 0.004184 ** 
## ProductTicket Protector                     0.119463    
## ProductTravel Cruise Protect                0.670407    
## ProductTravel Cruise Protect Family         0.996456    
## ProductValue Plan                           0.018933 *  
## Duration                                    0.129932    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9671.8  on 63325  degrees of freedom
## Residual deviance: 8269.0  on 63299  degrees of freedom
## AIC: 8323
## 
## Number of Fisher Scoring iterations: 15
glmod3 <- glm(Claim ~ Product + Duration + Age, data=travel_data, family=binomial())

summary(glmod3)
## 
## Call:
## glm(formula = Claim ~ Product + Duration + Age, family = binomial(), 
##     data = travel_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5239  -0.1472  -0.1226  -0.0694   3.5430  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -5.759e+00  3.479e-01 -16.551
## Product2 way Comprehensive Plan              1.388e+00  3.443e-01   4.031
## Product24 Protect                           -1.061e+01  1.527e+02  -0.069
## ProductAnnual Gold Plan                      3.688e+00  4.180e-01   8.823
## ProductAnnual Silver Plan                    3.692e+00  3.582e-01  10.308
## ProductAnnual Travel Protect Gold            3.574e+00  4.822e-01   7.412
## ProductAnnual Travel Protect Platinum        2.978e+00  6.891e-01   4.321
## ProductAnnual Travel Protect Silver          2.751e+00  6.200e-01   4.437
## ProductBasic Plan                            4.728e-01  3.943e-01   1.199
## ProductBronze Plan                           3.002e+00  3.412e-01   8.797
## ProductCancellation Plan                    -1.419e-01  3.664e-01  -0.387
## ProductChild Comprehensive Plan             -1.090e+01  7.998e+02  -0.014
## ProductComprehensive Plan                    1.724e+00  5.642e-01   3.056
## ProductGold Plan                             3.001e+00  4.124e-01   7.278
## ProductIndividual Comprehensive Plan         2.630e+00  6.845e-01   3.842
## ProductPremier Plan                          1.775e+00  6.711e-01   2.646
## ProductRental Vehicle Excess Insurance       1.326e+00  3.511e-01   3.777
## ProductSilver Plan                           3.208e+00  3.449e-01   9.299
## ProductSingle Trip Travel Protect Gold       2.959e+00  4.654e-01   6.356
## ProductSingle Trip Travel Protect Platinum   3.319e+00  5.712e-01   5.810
## ProductSingle Trip Travel Protect Silver     2.176e+00  6.061e-01   3.589
## ProductSpouse or Parents Comprehensive Plan  3.164e+00  1.092e+00   2.897
## ProductTicket Protector                      8.647e-01  5.204e-01   1.662
## ProductTravel Cruise Protect                 3.921e-01  7.839e-01   0.500
## ProductTravel Cruise Protect Family         -1.068e+01  2.400e+03  -0.004
## ProductValue Plan                            1.071e+00  4.109e-01   2.608
## Duration                                     4.186e-04  2.783e-04   1.504
## Age                                         -4.433e-03  2.695e-03  -1.645
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## Product2 way Comprehensive Plan             5.56e-05 ***
## Product24 Protect                           0.944620    
## ProductAnnual Gold Plan                      < 2e-16 ***
## ProductAnnual Silver Plan                    < 2e-16 ***
## ProductAnnual Travel Protect Gold           1.24e-13 ***
## ProductAnnual Travel Protect Platinum       1.55e-05 ***
## ProductAnnual Travel Protect Silver         9.12e-06 ***
## ProductBasic Plan                           0.230485    
## ProductBronze Plan                           < 2e-16 ***
## ProductCancellation Plan                    0.698613    
## ProductChild Comprehensive Plan             0.989123    
## ProductComprehensive Plan                   0.002245 ** 
## ProductGold Plan                            3.39e-13 ***
## ProductIndividual Comprehensive Plan        0.000122 ***
## ProductPremier Plan                         0.008157 ** 
## ProductRental Vehicle Excess Insurance      0.000159 ***
## ProductSilver Plan                           < 2e-16 ***
## ProductSingle Trip Travel Protect Gold      2.07e-10 ***
## ProductSingle Trip Travel Protect Platinum  6.26e-09 ***
## ProductSingle Trip Travel Protect Silver    0.000332 ***
## ProductSpouse or Parents Comprehensive Plan 0.003769 ** 
## ProductTicket Protector                     0.096603 .  
## ProductTravel Cruise Protect                0.616910    
## ProductTravel Cruise Protect Family         0.996449    
## ProductValue Plan                           0.009117 ** 
## Duration                                    0.132521    
## Age                                         0.100071    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9671.8  on 63325  degrees of freedom
## Residual deviance: 8266.2  on 63298  degrees of freedom
## AIC: 8322.2
## 
## Number of Fisher Scoring iterations: 15
compareGLM(glmod1, glmod2, glmod3) 
## $Models
##   Formula                           
## 1 "Claim ~ Product"                 
## 2 "Claim ~ Product + Duration"      
## 3 "Claim ~ Product + Duration + Age"
## 
## $Fit.criteria
##   Rank Df.res  AIC AICc  BIC McFadden Cox.and.Snell Nagelkerke    p.value
## 1   26  63300 8325 8325 8569   0.1449       0.02189     0.1545 9.844e-281
## 2   27  63300 8325 8325 8579   0.1450       0.02191     0.1547 3.547e-280
## 3   28  63300 8324 8324 8587   0.1453       0.02195     0.1550 6.763e-280

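Comparing the three GLMs shows an almost identical AIC for all of them (8325 or 8324 in the compareGLM table), and the pseudo-R-squared measures barely move (McFadden rises from 0.1449 to 0.1453). The third model has the lowest AIC, but a difference of about one unit is too small to count as a meaningful improvement, and BIC, which penalizes extra parameters more heavily, rises from 8569 to 8587 and so favors the first model. This is consistent with the coefficient tables above, where neither Duration (p = 0.13) nor Age (p = 0.10) is significant at the 0.05 level.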
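
Because the three models are nested, the comparison can also be framed as likelihood-ratio tests on the drop in residual deviance. The sketch below works only from numbers printed in the summaries above, so it runs without the data; the variable names are illustrative. With the fitted objects at hand, anova(glmod1, glmod2, glmod3, test = "Chisq") would be the usual way to get the same tests.

```r
# Residual deviances and coefficient counts copied from the three summaries above.
deviance <- c(glmod1 = 8270.5, glmod2 = 8269.0, glmod3 = 8266.2)
n_coef   <- c(glmod1 = 26,     glmod2 = 27,     glmod3 = 28)

# For an ungrouped 0/1 response the saturated log-likelihood is 0,
# so the AIC is simply the residual deviance plus 2 * (number of coefficients).
aic <- deviance + 2 * n_coef

# Likelihood-ratio statistic for each added predictor (one extra parameter each),
# compared against a chi-squared distribution with 1 degree of freedom.
lr_stat <- -diff(deviance)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(aic)
print(round(p_value, 3))
```

The reconstructed AIC values agree with those printed by summary() (8322.5, 8323 and 8322.2), and both likelihood-ratio p-values are above 0.05, so neither Duration nor Age adds clear explanatory power beyond Product.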

library(ModelMetrics)
## 
## Attaching package: 'ModelMetrics'
## The following object is masked from 'package:base':
## 
##     kappa
lmod <- lm(Claim ~ Product + Duration + Age, data=travel_data)

summary(lmod)
## 
## Call:
## lm(formula = Claim ~ Product + Duration + Age, data = travel_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.11303 -0.01082 -0.00811 -0.00260  1.00258 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  4.915e-03  2.476e-03   1.985
## Product2 way Comprehensive Plan              8.008e-03  2.292e-03   3.494
## Product24 Protect                           -1.982e-03  7.795e-03  -0.254
## ProductAnnual Gold Plan                      1.037e-01  8.946e-03  11.593
## ProductAnnual Silver Plan                    1.055e-01  4.247e-03  24.852
## ProductAnnual Travel Protect Gold            9.508e-02  1.215e-02   7.826
## ProductAnnual Travel Protect Platinum        5.197e-02  1.646e-02   3.156
## ProductAnnual Travel Protect Silver          4.159e-02  1.306e-02   3.185
## ProductBasic Plan                            1.939e-03  2.611e-03   0.743
## ProductBronze Plan                           4.912e-02  2.760e-03  17.796
## ProductCancellation Plan                    -4.499e-04  2.222e-03  -0.202
## ProductChild Comprehensive Plan             -6.501e-03  3.944e-02  -0.165
## ProductComprehensive Plan                    1.238e-02  6.596e-03   1.877
## ProductGold Plan                             4.861e-02  6.615e-03   7.349
## ProductIndividual Comprehensive Plan         3.596e-02  1.401e-02   2.566
## ProductPremier Plan                          1.304e-02  8.720e-03   1.495
## ProductRental Vehicle Excess Insurance       7.434e-03  2.415e-03   3.078
## ProductSilver Plan                           6.002e-02  3.221e-03  18.636
## ProductSingle Trip Travel Protect Gold       4.652e-02  8.511e-03   5.465
## ProductSingle Trip Travel Protect Platinum   6.606e-02  1.396e-02   4.732
## ProductSingle Trip Travel Protect Silver     2.055e-02  9.202e-03   2.234
## ProductSpouse or Parents Comprehensive Plan  6.221e-02  3.060e-02   2.033
## ProductTicket Protector                      4.075e-03  4.234e-03   0.963
## ProductTravel Cruise Protect                 1.870e-03  5.558e-03   0.336
## ProductTravel Cruise Protect Family         -3.049e-03  1.180e-01  -0.026
## ProductValue Plan                            6.139e-03  3.252e-03   1.888
## Duration                                     6.624e-06  5.690e-06   1.164
## Age                                         -6.453e-05  3.790e-05  -1.702
##                                             Pr(>|t|)    
## (Intercept)                                 0.047125 *  
## Product2 way Comprehensive Plan             0.000476 ***
## Product24 Protect                           0.799298    
## ProductAnnual Gold Plan                      < 2e-16 ***
## ProductAnnual Silver Plan                    < 2e-16 ***
## ProductAnnual Travel Protect Gold           5.10e-15 ***
## ProductAnnual Travel Protect Platinum       0.001598 ** 
## ProductAnnual Travel Protect Silver         0.001450 ** 
## ProductBasic Plan                           0.457684    
## ProductBronze Plan                           < 2e-16 ***
## ProductCancellation Plan                    0.839543    
## ProductChild Comprehensive Plan             0.869088    
## ProductComprehensive Plan                   0.060492 .  
## ProductGold Plan                            2.03e-13 ***
## ProductIndividual Comprehensive Plan        0.010278 *  
## ProductPremier Plan                         0.134842    
## ProductRental Vehicle Excess Insurance      0.002086 ** 
## ProductSilver Plan                           < 2e-16 ***
## ProductSingle Trip Travel Protect Gold      4.65e-08 ***
## ProductSingle Trip Travel Protect Platinum  2.23e-06 ***
## ProductSingle Trip Travel Protect Silver    0.025517 *  
## ProductSpouse or Parents Comprehensive Plan 0.042044 *  
## ProductTicket Protector                     0.335788    
## ProductTravel Cruise Protect                0.736503    
## ProductTravel Cruise Protect Family         0.979391    
## ProductValue Plan                           0.059043 .  
## Duration                                    0.244401    
## Age                                         0.088676 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.118 on 63298 degrees of freedom
## Multiple R-squared:  0.03517,    Adjusted R-squared:  0.03476 
## F-statistic: 85.46 on 27 and 63298 DF,  p-value: < 2.2e-16
AIC(lmod)
## [1] -90927.14
#mae(lmod)
DMwR::regr.eval(travel_data$Claim, lmod$fitted.values)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
##        mae        mse       rmse       mape 
## 0.02783661 0.01391691 0.11796994        Inf

The naive linear regression model above uses the same independent variables as the third GLM so that the two can be compared.

AIC(glmod3)
## [1] 8322.245
mae(glmod3)
## [1] 0.02782946

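Interestingly, the AIC of the linear regression model is negative, while the AIC of the GLM is positive. The sign itself is not a problem. AIC is -2 times the maximized log-likelihood plus twice the number of parameters, and a Gaussian log-likelihood is built from density values, which exceed 1 when the residual standard error is small (0.118 here), so the log-likelihood is positive and the AIC negative. The binomial log-likelihood is built from probabilities, which never exceed 1, so its AIC is always positive. Because the two models assume different distributions for Claim (one continuous, one discrete), their AIC values are not directly comparable and should not be ranked against each other.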
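
The linear model's AIC can be reproduced from numbers already printed above, which helps show where the negative sign comes from. This is a back-of-the-envelope sketch with illustrative variable names, assuming the standard Gaussian log-likelihood with the maximum-likelihood variance estimate (residual sum of squares divided by n).

```r
# Values copied from the lm() and regr.eval() output above.
n   <- 63326        # observations: 63298 residual df + 28 coefficients
mse <- 0.01391691   # mean squared residual reported by regr.eval()
k   <- 28 + 1       # 28 regression coefficients plus the residual variance

# Maximized Gaussian log-likelihood using the ML variance estimate (RSS / n).
loglik_gaussian <- -n / 2 * (log(2 * pi * mse) + 1)
aic_gaussian    <- -2 * loglik_gaussian + 2 * k

# Height of a normal density at its mean when the standard deviation is sqrt(mse).
peak_density <- dnorm(0, mean = 0, sd = sqrt(mse))

print(loglik_gaussian)
print(aic_gaussian)
print(peak_density)
```

The result lands very close to the reported -90927.14. The peak density is well above 1, so most observations contribute a positive term to the log-likelihood, which is what drives the AIC below zero.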

As for the mean absolute error (MAE), the GLM slightly outperforms the linear model (0.02782946 versus 0.02783661), although a difference in the sixth decimal place is negligible in practice.
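
To put the two MAE values in context, it may help to compare them with two trivial baselines that can be worked out from the overall claim rate alone. The sketch below is an illustration with made-up variable names, not part of the original analysis, and it uses the rounded claim rate from summary(travel_data).

```r
# Overall claim rate, copied from the mean of Claim in summary(travel_data).
claim_rate <- 0.01464

# MAE of always predicting 0 ("no claim"): off by 1 on every claim, 0 otherwise.
mae_all_zero <- claim_rate

# MAE of always predicting the claim rate itself (an intercept-only model):
# off by (1 - rate) on claims and by rate on non-claims.
mae_constant <- claim_rate * (1 - claim_rate) + (1 - claim_rate) * claim_rate

# MAE values reported above for the fitted models.
mae_glm <- 0.02782946
mae_lm  <- 0.02783661

print(c(all_zero = mae_all_zero, constant = mae_constant,
        glm = mae_glm, lm = mae_lm))
```

Both fitted models improve only slightly on the constant prediction, and the all-zero prediction scores a lower MAE than either of them. With claims this rare, MAE is dominated by the base rate, so a measure such as AUC or log loss (ModelMetrics provides auc() and logLoss()) may separate the models more informatively.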

Conclusion

Overall, generalized linear models allow linear models to be applied to a wider array of datasets. As my blog entries have attempted to focus on real-world, observational datasets, GLMs can be a powerful tool for evaluating such data. The binary response in the implementation above is a good use case for a GLM with the binomial family, because a 0/1 outcome cannot follow a normal distribution and its predicted probability must stay between 0 and 1. The GLM offers a viable approach to building models that meaningfully evaluate non-normal, non-continuous responses.