Project 4

Insurance Data: Predicting Insurance Claims by Binomial and Multiple Regressions

Our task was to create a model to predict the probability that a policy would experience an auto accident and, if an accident occurred, how much it would cost. We have two target variables: one indicating the existence of a claim and one giving the amount of the claim. To make predictions, we created three sets of models, exploring whether to model losses separately from the likelihood of a claim or to model both in a single model. The variables provided include:

Many of the variables were not clean to start with, and some needed to be converted into dummy variables. Old claims is now an ordinal categorical variable. Claims and income had their $ formatting stripped so they can be treated as numbers. Urbanicity was split into work and home components; because the two were identical, only work is kept. NAs for income are imputed with the mean, 61898.09. For YOJ, NAs are set to 0 so that they do not distort the baseline for the career dummy variables. To create and evaluate models, we set aside 25% of the cases as an evaluation set.
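The cleaning steps described above can be sketched in R roughly as follows. This is a minimal illustration on a made-up eight-row data frame, not the code used for this report: the helper name `strip_currency`, the toy values and the seed are all assumptions.

```r
# Toy data standing in for the raw file (values are invented for illustration).
raw <- data.frame(
  INCOME = c("$67,349", NA, "$16,039", "$114,986", "$125,301", NA, "$18,755", "$43,112"),
  YOJ    = c(11, NA, 10, 14, 0, 12, NA, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop "$" and "," and convert the result to numeric.
strip_currency <- function(x) as.numeric(gsub("[$,]", "", x))

clean <- raw
clean$INCOME <- strip_currency(clean$INCOME)

# Impute missing income with the mean of the observed values.
income_mean <- mean(clean$INCOME, na.rm = TRUE)
clean$INCOME[is.na(clean$INCOME)] <- income_mean

# Missing years on the job become 0.
clean$YOJ[is.na(clean$YOJ)] <- 0

# Hold out 25% of the rows as an evaluation set.
set.seed(1)  # arbitrary seed, only so the split is reproducible
eval_rows      <- sample(nrow(clean), size = round(0.25 * nrow(clean)))
evaluation_set <- clean[eval_rows, ]
training_set   <- clean[-eval_rows, ]
```

On the real data the same pattern would apply to every dollar-formatted column, not only INCOME.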


Looking at the correlation plot, we find few surprises. Associated dummy variables are negatively correlated with one another. The three driving-history variables (old claims, old claim frequency and driver record points) all correlate with one another. Later, we’ll check whether this poses any problem for our models. Nothing appears to have a high correlation with our target variable, indicating that a meaningful model will be hard to fit.


To get a sense of how our variables might help us build a model, we look at the significance of single variable models, using the claim presence variable and a binomial logit model. We start with the variables indicating our policyholders’ careers. In each model below, x is the 0/1 dummy for the profession and y is the log-odds of a claim. We find that doctors, lawyers, managers and professionals are less likely to have claims, while students and blue collar workers are more likely. From these models, it is unclear whether we are seeing an effect based on wealth or on age.

Single Variable Models by Professional Category
Profession     Model                     P_value
Clerical       y = 0.1669x - 1.0531      0.0135
Doctor         y = -1.0096x - 1.003      4.09e-07
Homemaker      y = 0.0933x - 1.0337      0.309
Lawyer         y = -0.5151x - 0.9795     3.28e-08
Manager        y = -0.8872x - 0.9392     1.9e-20
Professional   y = -0.2674x - 0.9917     0.000507
Student        y = 0.5641x - 1.081       5.72e-12
BlueCollar     y = 0.5236x - 1.1541      6.33e-20
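A table of single variable logit models like the one above could be produced with a loop along these lines. This is a sketch on simulated data: the data frame `sim`, the helper `fit_single`, the simulated effect sizes and the seed are assumptions made for illustration, not the report's actual code or data.

```r
set.seed(42)  # arbitrary seed for the simulated data
n <- 500
sim <- data.frame(
  Manager = rbinom(n, 1, 0.15),
  Student = rbinom(n, 1, 0.10)
)
# Simulated claim flag: managers less likely, students more likely to claim.
sim$TARGET_FLAG <- rbinom(n, 1, plogis(-1 - 0.9 * sim$Manager + 0.6 * sim$Student))

# Hypothetical helper: fit one logit model for a single predictor and
# return its slope, intercept and the p-value of the slope.
fit_single <- function(var, data) {
  fit <- glm(reformulate(var, response = "TARGET_FLAG"),
             data = data, family = binomial(link = "logit"))
  est <- coef(summary(fit))
  data.frame(Variable  = var,
             Slope     = est[2, "Estimate"],
             Intercept = est[1, "Estimate"],
             P_value   = est[2, "Pr(>|z|)"])
}

single_models <- do.call(rbind, lapply(c("Manager", "Student"), fit_single, data = sim))
print(single_models)
```

Because the fitted values are log-odds, `plogis(Intercept)` gives the estimated claim probability when the dummy is 0, and `plogis(Intercept + Slope)` gives it when the dummy is 1.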

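The beta parameters for variables connected to wealth are significant at the .001 level, but several are very small in magnitude. This is partly a matter of units: INCOME, HOME_VAL and BLUEBOOK are measured in dollars, so their coefficients give the change in log-odds per additional dollar. Per dollar, the effect of wealth on car accident likelihood is extremely small. We can see the effect in regression models and in boxplots of three of our wealth variables.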

Single Variable Models by Wealth Measures
Variable       Model                          P_value
INCOME         y = -7.7e-06x - 0.5796         5.34e-35
HOME_VAL       y = -3.5e-06x - 0.53           6.81e-57
EDUCATION      y = 0.0672548x - 1.2363        0.000109
BLUEBOOK       y = -2.95e-05x - 0.5769        1.61e-20
CAR_AGE        y = -0.0411201x - 0.7018       1.88e-18
REVOKED        y = 0.9303102x - 1.1593        6.15e-41

We’ll now investigate the impact of variables on the size of claims. We’ll investigate models of the overall likely claim size before turning to a conditional model for claim size given that an accident has occurred. Here are single variable models for claim size, given that an accident was claimed.

Single Variable Models for Claim Size, Given a Claim
Variable       Model                              P_value
HOME_VAL       y = 0.0019148x + 5450.4687         0.192
BLUEBOOK       y = 0.1101667x + 4131.6544         3.9e-08
RED_CAR        y = 468.1950991x + 5568.2235       0.205
OLDCLAIM       y = -38.4348091x + 5796.8835       0.729
CLM_FREQ       y = 12.1620883x + 5687.3798        0.928
REVOKED        y = -707.8851112x + 5847.834       0.0864
MVR_PTS        y = 119.5203636x + 5405.5718       0.0648
CAR_AGE        y = -18.5106011x + 5861.9967       0.56

The first number that stands out is the coefficient for REVOKED. Those whose licenses have been revoked tend to have smaller claims (although with p = 0.0864 this is not significant at the .05 level), while the earlier table shows that they are more likely to have a claim. This could be a fruitful subject of further investigation. We also see a different effect from wealth measures: wealthier people appear slightly less likely to file a claim but, with more expensive cars, tend to file larger claims. Also, looking at plots of some of our variables against claim amount, we find that people with more past claims tend to file smaller claims.


Turning to creating models, we first look at the likelihood of filing a claim. We create three models: one from forward stepwise regression; one from our most significant predictors, augmented with a k-means cluster variable; and one with the same predictors but without the k-means augmentation. For now, we evaluate them based on AIC. Later, we’ll look at ROC curves.

## 
## Call:
## glm(formula = TARGET_FLAG ~ REVOKED + work + HOME_VAL + MVR_PTS + 
##     CAR_USE + BLUEBOOK + PARENT1 + Manager + TRAVTIME + KIDSDRIV + 
##     TIF + INCOME + CLM_FREQ + Sports_Car + SUV + MSTATUS + Clerical + 
##     Pickup + Van + Panel_Truck + CAR_AGE + BlueCollar + EDUCATION + 
##     Doctor + YOJ + HOMEKIDS, family = binomial(link = "logit"), 
##     data = training_set)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5429  -0.7150  -0.3957   0.6089   2.8533  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -2.763e+00  2.405e-01 -11.490  < 2e-16 ***
## REVOKEDYes      6.950e-01  9.318e-02   7.459 8.73e-14 ***
## workUrban       2.326e+00  1.289e-01  18.041  < 2e-16 ***
## HOME_VAL       -1.406e-06  3.849e-07  -3.653 0.000259 ***
## MVR_PTS         1.095e-01  1.574e-02   6.956 3.49e-12 ***
## CAR_USEPrivate -7.719e-01  9.725e-02  -7.938 2.06e-15 ***
## BLUEBOOK       -2.481e-05  5.425e-06  -4.573 4.80e-06 ***
## PARENT1Yes      3.299e-01  1.266e-01   2.605 0.009181 ** 
## Manager        -8.000e-01  1.288e-01  -6.213 5.20e-10 ***
## TRAVTIME        1.567e-02  2.189e-03   7.160 8.08e-13 ***
## KIDSDRIV        3.932e-01  6.864e-02   5.729 1.01e-08 ***
## TIF            -5.859e-02  8.540e-03  -6.861 6.82e-12 ***
## INCOME         -4.879e-06  1.114e-06  -4.379 1.19e-05 ***
## CLM_FREQ        1.534e-01  2.952e-02   5.196 2.03e-07 ***
## Sports_Car      9.933e-01  1.236e-01   8.038 9.12e-16 ***
## SUV             6.668e-01  9.969e-02   6.689 2.24e-11 ***
## MSTATUSYes     -5.298e-01  9.580e-02  -5.530 3.20e-08 ***
## Clerical        2.588e-01  1.054e-01   2.455 0.014099 *  
## Pickup          5.127e-01  1.154e-01   4.444 8.83e-06 ***
## Van             7.091e-01  1.400e-01   5.063 4.12e-07 ***
## Panel_Truck     5.791e-01  1.692e-01   3.423 0.000620 ***
## CAR_AGE        -1.956e-02  7.252e-03  -2.697 0.007005 ** 
## BlueCollar      2.216e-01  9.940e-02   2.230 0.025758 *  
## EDUCATION       6.712e-02  2.338e-02   2.871 0.004097 ** 
## Doctor         -4.509e-01  2.585e-01  -1.744 0.081153 .  
## YOJ            -8.312e-03  7.785e-03  -1.068 0.285678    
## HOMEKIDS        4.486e-02  3.903e-02   1.149 0.250370    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7039.4  on 6120  degrees of freedom
## Residual deviance: 5444.8  on 6094  degrees of freedom
## AIC: 5498.8
## 
## Number of Fisher Scoring iterations: 5
##     REVOKED        work    HOME_VAL     MVR_PTS     CAR_USE    BLUEBOOK 
##    1.004836    1.127765    1.747249    1.142618    2.060208    1.779887 
##     PARENT1     Manager    TRAVTIME    KIDSDRIV         TIF      INCOME 
##    1.921724    1.118494    1.038259    1.288659    1.012448    1.907773 
##    CLM_FREQ  Sports_Car         SUV     MSTATUS    Clerical      Pickup 
##    1.157604    1.498749    1.821772    2.014094    1.321975    1.807172 
##         Van Panel_Truck     CAR_AGE  BlueCollar   EDUCATION      Doctor 
##    1.564285    2.126481    1.359082    1.689536    1.075873    1.098444 
##         YOJ    HOMEKIDS 
##    1.190885    1.821511

Our first model is the result of a forward selection starting with REVOKED as the base model. No VIF is higher than 2.5 (the largest is 2.13, for Panel_Truck), so we are not worried about multicollinearity. The residuals are reasonably random and show no discernible pattern. The Cook’s Distance graph shows no sign of influential points skewing our results. The AIC of 5498.8 gives us a benchmark against which to compare our other two logit models.
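A forward selection of this kind, together with a basic multicollinearity check, might look like the sketch below. It runs on simulated data, and everything named here (`sim`, `x1` to `x4`, `vif_simple`, the seed) is an assumption for illustration. The VIF is computed the textbook way, by regressing each predictor on the others, which may differ from the package routine that produced the VIFs above.

```r
set.seed(7)  # arbitrary seed for the simulated data
n <- 400
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$x4 <- 0.6 * sim$x1 + rnorm(n, sd = 0.8)   # mildly correlated with x1
sim$y  <- rbinom(n, 1, plogis(-1 + 0.8 * sim$x1 - 0.5 * sim$x2))

# Start from a one-predictor base model and let step() add terms by AIC.
base_model <- glm(y ~ x1, data = sim, family = binomial(link = "logit"))
full_scope <- y ~ x1 + x2 + x3 + x4
step_model <- step(base_model,
                   scope = list(lower = formula(base_model), upper = full_scope),
                   direction = "forward", trace = 0)
selected <- attr(terms(step_model), "term.labels")

# Hypothetical helper: VIF of each predictor as 1 / (1 - R^2), where R^2
# comes from regressing that predictor on all the other predictors.
vif_simple <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_simple(sim, c("x1", "x2", "x3", "x4"))

print(selected)
print(round(vifs, 3))
```

Forward selection only adds a term when doing so lowers the AIC, so the selected model can never have a higher AIC than the base model it started from.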

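# k-means is randomly initialised; the seed value is arbitrary and only makes the clusters reproducible
set.seed(123)
# cluster the selected columns into 2 groups and keep the cluster labels (1 or 2)
means_group <- kmeans(training_set[, c(6, 7, 12, 21, 28, 33, 24, 25)], centers = 2)
training_set$means_group <- means_group$cluster
# logit model: the small predictor set plus the cluster label
kmeans_model <- glm(TARGET_FLAG ~ REVOKED + MSTATUS + MVR_PTS + work + CAR_USE + TRAVTIME + TIF + means_group,
                    data = training_set, family = binomial(link = "logit"))
summary(kmeans_model)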
## 
## Call:
## glm(formula = TARGET_FLAG ~ REVOKED + MSTATUS + MVR_PTS + work + 
##     CAR_USE + TRAVTIME + TIF + means_group, family = binomial(link = "logit"), 
##     data = training_set)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1638  -0.7599  -0.4806   0.7690   3.0759  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -4.405039   0.218873 -20.126  < 2e-16 ***
## REVOKEDYes      0.762527   0.088690   8.598  < 2e-16 ***
## MSTATUSYes     -0.707165   0.065106 -10.862  < 2e-16 ***
## MVR_PTS         0.163621   0.014217  11.509  < 2e-16 ***
## workUrban       1.997261   0.121299  16.466  < 2e-16 ***
## CAR_USEPrivate -0.778455   0.065465 -11.891  < 2e-16 ***
## TRAVTIME        0.014797   0.002086   7.093 1.31e-12 ***
## TIF            -0.053969   0.008175  -6.602 4.06e-11 ***
## means_group     1.048728   0.081553  12.860  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7039.4  on 6120  degrees of freedom
## Residual deviance: 5876.5  on 6112  degrees of freedom
## AIC: 5894.5
## 
## Number of Fisher Scoring iterations: 5
small_model <- glm(TARGET_FLAG ~ REVOKED + MSTATUS + MVR_PTS + work + CAR_USE + TRAVTIME + TIF,
                   data = training_set, family = binomial(link = "logit"))
summary(small_model)
## 
## Call:
## glm(formula = TARGET_FLAG ~ REVOKED + MSTATUS + MVR_PTS + work + 
##     CAR_USE + TRAVTIME + TIF, family = binomial(link = "logit"), 
##     data = training_set)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0848  -0.7782  -0.5301   0.8390   2.7365  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -2.465712   0.153999 -16.011  < 2e-16 ***
## REVOKEDYes      0.778057   0.087001   8.943  < 2e-16 ***
## MSTATUSYes     -0.645667   0.063643 -10.145  < 2e-16 ***
## MVR_PTS         0.176422   0.013975  12.624  < 2e-16 ***
## workUrban       1.803680   0.119589  15.082  < 2e-16 ***
## CAR_USEPrivate -0.724499   0.064012 -11.318  < 2e-16 ***
## TRAVTIME        0.014320   0.002054   6.973 3.10e-12 ***
## TIF            -0.052313   0.008021  -6.522 6.92e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7039.4  on 6120  degrees of freedom
## Residual deviance: 6061.0  on 6113  degrees of freedom
## AIC: 6077
## 
## Number of Fisher Scoring iterations: 5

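Our next two models have higher AICs (5894.5 and 6077) than the forward stepwise model. All of the individual coefficient tests (z-tests) are significant at the .001 level. Adding the two-group k-means variable improved on the small model, lowering the AIC from 6077 to 5894.5. Neither, however, performs as well as the larger forward stepwise regression, which has an AIC of 5498.8.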
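The AIC values reported for the three logit models can be checked by hand. For a binomial model fitted to individual 0/1 outcomes, the residual deviance equals minus twice the log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients including the intercept. The short R check below uses only the figures printed in the summaries above; the data frame name `models` is just for illustration.

```r
# Residual deviance and number of estimated coefficients (intercept included),
# read off the three model summaries.
models <- data.frame(
  model    = c("forward step", "k-means", "small"),
  deviance = c(5444.8, 5876.5, 6061.0),
  k        = c(27, 9, 8)
)

# AIC = deviance + 2k for a logit model on ungrouped binary data.
models$AIC <- models$deviance + 2 * models$k

# Lowest AIC first.
print(models[order(models$AIC), ])
```

The penalty term is why the forward stepwise model still wins despite having 27 coefficients: its deviance is lower by far more than the extra 2k it pays.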


Moving to a model for loss amounts, we first look at models that attempt to capture both the likelihood of a claim and its likely amount in a single model. Our first model includes all of the available variables. We use multiple linear regression.

max_model <- lm(TARGET_AMT ~ KIDSDRIV + AGE + HOMEKIDS + YOJ + INCOME + PARENT1 + HOME_VAL + MSTATUS + SEX +
                  EDUCATION + TRAVTIME + CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + CLM_FREQ + REVOKED +
                  MVR_PTS + CAR_AGE + work + Panel_Truck + Pickup + Sports_Car + Van + SUV + Clerical +
                  Doctor + Homemaker + Lawyer + Manager + Professional + Student + BlueCollar,
                data = training_set)
summary(max_model)
## 
## Call:
## lm(formula = TARGET_AMT ~ KIDSDRIV + AGE + HOMEKIDS + YOJ + INCOME + 
##     PARENT1 + HOME_VAL + MSTATUS + SEX + EDUCATION + TRAVTIME + 
##     CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + CLM_FREQ + 
##     REVOKED + MVR_PTS + CAR_AGE + work + Panel_Truck + Pickup + 
##     Sports_Car + Van + SUV + Clerical + Doctor + Homemaker + 
##     Lawyer + Manager + Professional + Student + BlueCollar, data = training_set)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5475  -1642   -721    385 103240 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     9.773e+01  6.372e+02   0.153 0.878107    
## KIDSDRIV        2.488e+02  1.270e+02   1.960 0.050083 .  
## AGE             7.940e+00  7.985e+00   0.994 0.320079    
## HOMEKIDS        4.210e+01  7.290e+01   0.577 0.563628    
## YOJ            -2.176e+01  1.383e+01  -1.573 0.115679    
## INCOME         -3.783e-03  1.979e-03  -1.912 0.055970 .  
## PARENT1Yes      7.107e+02  2.302e+02   3.087 0.002033 ** 
## HOME_VAL       -4.884e-04  6.647e-04  -0.735 0.462547    
## MSTATUSYes     -5.357e+02  1.633e+02  -3.280 0.001043 ** 
## SEXM            2.563e+02  2.082e+02   1.231 0.218480    
## EDUCATION       2.691e+01  4.127e+01   0.652 0.514342    
## TRAVTIME        1.239e+01  3.677e+00   3.371 0.000755 ***
## CAR_USEPrivate -9.373e+02  1.826e+02  -5.133 2.93e-07 ***
## BLUEBOOK        1.589e-02  9.728e-03   1.634 0.102400    
## TIF            -4.758e+01  1.384e+01  -3.438 0.000590 ***
## RED_CARyes      1.402e+01  1.692e+02   0.083 0.933996    
## OLDCLAIM       -4.397e+01  6.522e+01  -0.674 0.500259    
## CLM_FREQ        2.076e+02  7.726e+01   2.687 0.007225 ** 
## REVOKEDYes      5.589e+02  1.850e+02   3.020 0.002534 ** 
## MVR_PTS         1.473e+02  2.982e+01   4.940 8.03e-07 ***
## CAR_AGE        -3.749e+01  1.286e+01  -2.915 0.003572 ** 
## workUrban       1.540e+03  1.587e+02   9.706  < 2e-16 ***
## Panel_Truck     8.892e+01  3.097e+02   0.287 0.774057    
## Pickup          2.123e+02  1.921e+02   1.105 0.269312    
## Sports_Car      1.086e+03  2.488e+02   4.364 1.30e-05 ***
## Van             6.178e+02  2.407e+02   2.567 0.010293 *  
## SUV             6.120e+02  2.047e+02   2.989 0.002807 ** 
## Clerical        2.005e+02  3.357e+02   0.597 0.550380    
## Doctor         -4.251e+02  4.267e+02  -0.996 0.319182    
## Homemaker      -1.718e+02  3.852e+02  -0.446 0.655723    
## Lawyer          1.441e+02  3.248e+02   0.444 0.657401    
## Manager        -7.454e+02  3.024e+02  -2.465 0.013728 *  
## Professional    3.575e+01  3.031e+02   0.118 0.906116    
## Student         1.206e+02  3.780e+02   0.319 0.749824    
## BlueCollar      1.025e+02  3.044e+02   0.337 0.736369    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4467 on 6086 degrees of freedom
## Multiple R-squared:  0.07484,    Adjusted R-squared:  0.06968 
## F-statistic: 14.48 on 34 and 6086 DF,  p-value: < 2.2e-16

Our maximal model has an adjusted R2 of only 0.06968, so it explains about 7% of the variance in claim amount. We are unlikely to find a model that explains a large portion of that variance. We will have to explore the limits of our models to understand why finding a model for this data is so difficult.

# Forward search from a REVOKED-only base model (left commented out)
# revoked_model <- lm(TARGET_AMT ~ REVOKED, data = training_set)
# step(revoked_model, scope = list(lower = revoked_model, upper = max_model), direction = "forward")
step_claimSize_model <- lm(TARGET_AMT ~ REVOKED + MVR_PTS + CAR_USE + work + PARENT1 + INCOME + Manager + MSTATUS +
                             CLM_FREQ + TIF + TRAVTIME + CAR_AGE + Sports_Car + Van + KIDSDRIV + SUV,
                           data = training_set)
summary(step_claimSize_model)
## 
## Call:
## lm(formula = TARGET_AMT ~ REVOKED + MVR_PTS + CAR_USE + work + 
##     PARENT1 + INCOME + Manager + MSTATUS + CLM_FREQ + TIF + TRAVTIME + 
##     CAR_AGE + Sports_Car + Van + KIDSDRIV + SUV, data = training_set)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5393  -1639   -722    361 103469 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.013e+03  2.610e+02   3.882 0.000105 ***
## REVOKEDYes      5.341e+02  1.763e+02   3.030 0.002460 ** 
## MVR_PTS         1.449e+02  2.937e+01   4.933 8.32e-07 ***
## CAR_USEPrivate -1.070e+03  1.269e+02  -8.428  < 2e-16 ***
## workUrban       1.514e+03  1.556e+02   9.726  < 2e-16 ***
## PARENT1Yes      6.941e+02  2.004e+02   3.463 0.000538 ***
## INCOME         -4.460e-03  1.404e-03  -3.176 0.001502 ** 
## Manager        -7.701e+02  1.820e+02  -4.231 2.36e-05 ***
## MSTATUSYes     -5.936e+02  1.353e+02  -4.387 1.17e-05 ***
## CLM_FREQ        1.764e+02  5.540e+01   3.185 0.001457 ** 
## TIF            -4.743e+01  1.380e+01  -3.437 0.000591 ***
## TRAVTIME        1.229e+01  3.668e+00   3.350 0.000812 ***
## CAR_AGE        -3.863e+01  1.146e+01  -3.371 0.000753 ***
## Sports_Car      7.812e+02  1.949e+02   4.007 6.22e-05 ***
## Van             5.858e+02  2.050e+02   2.858 0.004283 ** 
## KIDSDRIV        2.805e+02  1.150e+02   2.440 0.014699 *  
## SUV             3.055e+02  1.405e+02   2.175 0.029691 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4465 on 6104 degrees of freedom
## Multiple R-squared:  0.07281,    Adjusted R-squared:  0.07038 
## F-statistic: 29.96 on 16 and 6104 DF,  p-value: < 2.2e-16
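# named small_amt_model so it does not overwrite the logit small_model fitted earlier
small_amt_model <- lm(TARGET_AMT ~ Sports_Car + MSTATUS + Manager + work + CAR_USE, data = training_set)
summary(small_amt_model)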
## 
## Call:
## lm(formula = TARGET_AMT ~ Sports_Car + MSTATUS + Manager + work + 
##     CAR_USE, data = training_set)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3868  -1914  -1068    475 105393 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1496.0      164.8   9.079  < 2e-16 ***
## Sports_Car        828.7      186.8   4.436 9.32e-06 ***
## MSTATUSYes       -846.1      118.3  -7.150 9.66e-13 ***
## Manager         -1121.0      181.3  -6.183 6.68e-10 ***
## workUrban        1543.3      146.8  10.515  < 2e-16 ***
## CAR_USEPrivate  -1124.9      121.3  -9.272  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4529 on 6115 degrees of freedom
## Multiple R-squared:  0.04444,    Adjusted R-squared:  0.04366 
## F-statistic: 56.88 on 5 and 6115 DF,  p-value: < 2.2e-16

Our forward stepwise regression has an adjusted R2 of 0.07038, a small improvement over the maximal model’s 0.06968. Our last model, a small model with the five most significant predictors, has an even smaller adjusted R2 of 0.04366. The forward stepwise regression is the best of some fairly weak models.

Our graphs above and below tell part of the story of why claim sizes are hard to model. The actual data is strongly right-skewed, as insurance loss values often are. This is why a heavy-tailed distribution, such as a Pareto or Weibull, is often used in modelling insurance losses. Our zoomed-in graph shows that the model also produces negative values for losses. Modelling the log of the loss may help eliminate this feature.

Looking at the difference between our predictions and the true values, we see a number of values far in the tail that are hard to model and much larger than our predictions. It is difficult to determine who will have a large accident, as opposed to who is more likely to have one. For this reason, it would be far more practical to model the full pool of losses, or smaller groups, than individual losses. This is why our R2 is so small.

Finally, we attempt to create a conditional model for losses, fitted only to policies where a loss has occurred. We also log the amount variable to see whether we can build a better model that accounts for the skewed nature of our losses. This approach yields a very small adjusted R2 of 0.01237, with BLUEBOOK the only predictor significant at the .05 level. It does not appear to pay to create a separate conditional model.

## 
## Call:
## lm(formula = LOGGED_TARGET ~ KIDSDRIV + AGE + HOMEKIDS + YOJ + 
##     INCOME + PARENT1 + HOME_VAL + MSTATUS + SEX + EDUCATION + 
##     TRAVTIME + CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + 
##     CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE + work + Panel_Truck + 
##     Pickup + Sports_Car + Van + SUV + Clerical + Doctor + Homemaker + 
##     Lawyer + Manager + Professional + Student + BlueCollar, data = logged_set)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7297 -0.3891  0.0348  0.4050  3.1218 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     8.076e+00  2.338e-01  34.547  < 2e-16 ***
## KIDSDRIV       -6.407e-02  3.807e-02  -1.683 0.092589 .  
## AGE             2.979e-03  2.597e-03   1.147 0.251562    
## HOMEKIDS        1.793e-02  2.507e-02   0.715 0.474688    
## YOJ            -7.960e-03  5.004e-03  -1.591 0.111864    
## INCOME         -9.588e-07  7.780e-07  -1.233 0.217940    
## PARENT1Yes      8.437e-02  7.172e-02   1.176 0.239608    
## HOME_VAL        1.502e-07  2.422e-07   0.620 0.535072    
## MSTATUSYes     -8.247e-02  5.913e-02  -1.395 0.163266    
## SEXM            1.019e-01  8.008e-02   1.272 0.203557    
## EDUCATION       1.318e-02  1.367e-02   0.964 0.335170    
## TRAVTIME       -1.066e-03  1.346e-03  -0.792 0.428367    
## CAR_USEPrivate -5.121e-02  6.285e-02  -0.815 0.415313    
## BLUEBOOK        1.252e-05  3.655e-06   3.427 0.000627 ***
## TIF            -5.818e-03  5.162e-03  -1.127 0.259870    
## RED_CARyes      5.864e-03  6.030e-02   0.097 0.922547    
## OLDCLAIM        1.475e-02  1.969e-02   0.749 0.453836    
## CLM_FREQ       -2.121e-02  2.291e-02  -0.926 0.354563    
## REVOKEDYes     -4.567e-03  5.485e-02  -0.083 0.933656    
## MVR_PTS         8.882e-03  8.382e-03   1.060 0.289470    
## CAR_AGE        -6.836e-04  4.698e-03  -0.146 0.884327    
## workUrban       2.383e-02  9.097e-02   0.262 0.793416    
## Panel_Truck     1.206e-02  1.138e-01   0.106 0.915623    
## Pickup          5.440e-02  7.151e-02   0.761 0.446936    
## Sports_Car      9.737e-02  9.073e-02   1.073 0.283343    
## Van             5.165e-02  9.206e-02   0.561 0.574848    
## SUV             1.337e-01  8.199e-02   1.631 0.103102    
## Clerical       -1.169e-01  1.184e-01  -0.987 0.323657    
## Doctor         -1.149e-01  2.000e-01  -0.575 0.565600    
## Homemaker      -2.595e-01  1.385e-01  -1.874 0.061070 .  
## Lawyer         -6.360e-02  1.221e-01  -0.521 0.602491    
## Manager        -4.840e-02  1.211e-01  -0.400 0.689463    
## Professional   -5.367e-02  1.087e-01  -0.494 0.621538    
## Student        -1.567e-01  1.304e-01  -1.202 0.229404    
## BlueCollar     -1.234e-01  1.055e-01  -1.170 0.242105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8062 on 1568 degrees of freedom
## Multiple R-squared:  0.03333,    Adjusted R-squared:  0.01237 
## F-statistic:  1.59 on 34 and 1568 DF,  p-value: 0.01715
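The conditional, logged approach summarised above can be sketched as follows: keep only the rows with a positive claim, take the log of the amount, fit a linear model, and exponentiate the predictions to get back to dollars. The data here is simulated, and the coefficients, claim rate and seed are assumptions for illustration; only the names TARGET_AMT, BLUEBOOK, LOGGED_TARGET and logged_set are taken from the report.

```r
set.seed(3)  # arbitrary seed for the simulated data
n <- 1000
sim <- data.frame(BLUEBOOK = runif(n, 1500, 60000))

# Simulate claims: most policies have no claim, the rest have a
# log-normal amount that rises slightly with BLUEBOOK.
had_claim <- rbinom(n, 1, 0.26) == 1
sim$TARGET_AMT <- ifelse(had_claim,
                         exp(8 + 1.2e-05 * sim$BLUEBOOK + rnorm(n, sd = 0.8)),
                         0)

# Condition on a claim having occurred, then model the logged amount.
logged_set <- subset(sim, TARGET_AMT > 0)
logged_set$LOGGED_TARGET <- log(logged_set$TARGET_AMT)
log_model <- lm(LOGGED_TARGET ~ BLUEBOOK, data = logged_set)

# Back-transform predictions to dollars; these are always positive.
pred_amount <- exp(predict(log_model, newdata = logged_set))

print(coef(log_model))
print(summary(pred_amount))
```

One design point to keep in mind: exponentiating a prediction made on the log scale estimates the median claim rather than the mean when the errors are roughly normal, so these predictions will tend to sit below the average claim.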

Returning to our logit models to predict whether a claim would occur, we look at the ROC curves created with our evaluation data. Our curves confirm what we found above. None of the models separates claims from non-claims perfectly, but the forward stepwise model does best (an AUC of 0.8022, against 0.7596 for the k-means model and 0.7002 for the small model) and is the most likely to give a true positive while limiting false positives.

## [1] "k-means model"
## Area under the curve: 0.7596

## [1] "forward step model"
## Area under the curve: 0.8022

## [1] "small model"
## Area under the curve: 0.7002
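The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen policy with a claim receives a higher predicted score than a randomly chosen policy without one. A minimal base R sketch of that rank-based calculation is shown below. The function name `auc_rank` and the four-observation example are assumptions for illustration; the AUCs above were presumably produced by an ROC package rather than by this code.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic scaled to [0, 1]).
# labels: 0/1 outcomes; scores: predicted probabilities or any ranking score.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)            # average ranks, so ties count as half
  n1 <- sum(labels == 1)        # number of positives (claims)
  n0 <- sum(labels == 0)        # number of negatives (no claim)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
print(auc_rank(labels, scores))  # 0.75: three of the four claim/no-claim pairs are ordered correctly
```

With a fitted logit model, `scores` would be `predict(model, newdata = evaluation_set, type = "response")` and `labels` the observed claim flag in the evaluation set.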

Our selected model is the forward stepwise model of likely costs across all policies. Its residuals look similar to those of the models investigated above: no values appear overly influential, and the residuals are random but skewed. Overall, we have found that we cannot build a great model from this data, but this model is the best of those we evaluated. Its root mean square error is 4789.636, which is quite high.

## [1] 4789.636
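For reference, the root mean square error is the square root of the average squared difference between actual and predicted amounts. A minimal R sketch on made-up numbers follows; the function name `rmse` and the four toy values are assumptions for illustration and are unrelated to the evaluation data.

```r
# Root mean square error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Toy example: three policies with no claim and one with a 4000 claim.
actual    <- c(0, 0, 4000, 0)
predicted <- c(1000, 2000, 3000, 1000)
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a handful of very large claims in the tail can dominate the RMSE even when most predictions are close.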

Our predictions for our test set are as follows:

##         1         2         3         4         5         6         7 
## 3165.7134 3789.3290 3280.0757 3500.6060 3649.5038 3651.5984 4608.9787 
##         8         9        10        11        12        13        14 
## 2000.0000 3163.4054 3921.2870 2248.7003 5119.5131 2000.0000 2491.0060 
##        15        16        17        18        19        20        21 
## 1827.8208 4850.2182 2000.0000 3307.3088 4833.6405 3852.4088 3833.4706 
##        22        23        24        25        26        27        28 
## 3541.5363 1973.3695 3748.5625 3558.5053 4362.6633 4102.7392 4901.3323 
##        29        30        31        32        33        34        35 
## 2010.2328 2000.0000 1998.1489 5042.4838 2469.1644 3939.5076 3309.1275 
##        36        37        38        39        40        41        42 
## 2243.6806 2000.0000 3966.6746 1860.7798 4578.9695 3390.1089 4321.9904 
##        43        44        45        46        47        48        49 
## 1846.1821 5270.5420  642.4453 3284.8901 1862.8694 4495.0873 1378.0836 
##        50        51        52        53        54        55        56 
## 4417.8522 2406.6593 4610.7947 5774.9671 2036.1063 3807.7711 4591.2997 
##        57        58        59        60        61        62        63 
## 3805.1335 4146.6238 2358.1203 4035.0913  973.1650 1727.7817 4430.7549 
##        64        65        66        67        68        69        70 
## 3124.3605 2277.1071 3637.1646 5916.9625 5743.4514 3132.2543 3330.8479 
##        71        72        73        74        75        76        77 
## 1395.5044 3726.5177 5230.8935 3554.2540 5062.4981 3215.6971 3451.7809 
##        78        79        80        81        82        83        84 
## 3673.2863 2935.1929 1902.3050 4542.5386 4456.0362 4259.6899 2358.4948 
##        85        86        87        88        89        90        91 
## 4095.8295 5418.0199 3550.4280 3718.1944 2172.3583 6057.8544 3132.5985 
##        92        93        94        95        96        97        98 
## 2992.6560 1683.0985 2623.5066 2685.2312 3223.0712 1056.9746 4166.8885 
##        99       100       101       102       103       104       105 
## 4760.8431 2850.1161 3236.2395 4878.8403 5121.4840 5806.4603 3080.1361 
##       106       107       108       109       110       111       112 
## 3077.9892 3167.0454 2571.6008 4583.1980 3748.8535 4831.2872 1730.3549 
##       113       114       115       116       117       118       119 
## 3301.1000 2448.7294 4836.0738 2153.1751 1446.5208 5150.0010 4392.7484 
##       120       121       122       123       124       125       126 
## 2730.8821 4042.3693 4978.9275 4439.8954 4113.1989 3917.9270 4000.8417 
##       127       128       129       130       131       132       133 
## 3998.2749 3782.2700 2971.8399 3274.7899 2000.0000 2824.4399 2000.0000 
##       134       135       136       137       138       139       140 
## 2563.4851 3305.2738 3777.5750 5637.6860 5743.4175 2602.2532 2000.0000 
##       141       142       143       144       145       146       147 
## 1629.7389 5750.4227 2694.0433 2135.1291 2469.9512 5227.2559 2678.1124 
##       148       149       150       151       152       153       154 
## 2607.3898 4202.0688 2382.4501 5124.7068 4064.8133 6201.2154 3963.4980 
##       155       156       157       158       159       160       161 
## 4159.6531 4316.8041 2433.0096 2474.9861 4368.7017 4409.4924 3463.9087 
##       162       163       164       165       166       167       168 
## 2405.3887 3670.8900 2024.0777 6692.4475 3366.4922 2763.6731 3571.1017 
##       169       170       171       172       173       174       175 
## 3746.0965 2000.0000 2519.1814 4577.3682 3366.9724 6010.2618 3753.2219 
##       176       177       178       179       180       181       182 
## 3612.9528 4886.8318 4983.2654 6522.1031 5038.3210 5233.3738 4349.9759 
##       183       184       185       186       187       188       189 
## 2393.2917 3836.5394 3677.7241 3917.9855 2000.0000 2037.1249 3424.1460 
##       190       191       192       193       194       195       196 
## 4838.6656 4236.1099 4854.2599 4940.7706 3436.6680 2872.7400 4465.3128 
##       197       198       199       200       201       202       203 
## 4517.7175 4048.8852 3896.1541 3010.1240 2000.0000 4227.9367 2924.5355 
##       204       205       206       207       208       209       210 
## 2948.3362 2363.7089 1319.5407 5926.6264 2792.5609 2782.3138 2027.2573 
##       211       212       213       214       215       216       217 
##  987.1803 3627.6382 4625.2750 4946.6214 2154.2237 4053.6166 4422.9763 
##       218       219       220       221       222       223       224 
## 2403.2657 3334.1797 1375.0320 1972.4324 2185.2819 3976.5688 4159.4661 
##       225       226       227       228       229       230       231 
## 2817.8358 3705.6394 5528.0717 4370.1398 4397.7943 3123.4466 1652.9868 
##       232       233       234       235       236       237       238 
## 3314.8499 6321.4432 2723.1886 1631.6024 3274.4396 2889.1974 2304.1700 
##       239       240       241       242       243       244       245 
## 1482.3202 6097.4847 1855.3653 2717.7697 5630.6407 4308.9377 3044.2470 
##       246       247       248       249       250       251       252 
## 3980.9585 3298.2627 3697.6746 1932.5096 4464.7618 4111.1007 4350.9067 
##       253       254       255       256       257       258       259 
## 2654.2357 3854.0512 3504.9701 4759.0075 2000.4936 3663.2129 4650.7918 
##       260       261       262       263       264       265       266 
## 2734.7524 2508.5815 2570.0693 2918.9354 3068.2299 3227.5619 2339.4265 
##       267       268       269       270       271       272       273 
## 2497.0318 2647.0317 6724.2285 4121.6746 4422.6960 2551.7816 3677.4461 
##       274       275       276       277       278       279       280 
## 6068.6741 3160.9911 4301.3919 5190.8239 3214.2282 3896.7289 2953.1342 
##       281       282       283       284       285       286       287 
## 4096.3049 3948.4253 3577.7552 3540.1545 4152.0764 4233.8902 2360.6667 
##       288       289       290       291       292       293       294 
## 3914.6761 3439.8018 5827.5959 4145.8960 2529.7459 2269.3844 2000.0000 
##       295       296       297       298       299       300       301 
## 2980.4813 3627.8532 3868.6555 5279.3193 1967.3827 3260.5177 3750.7018 
##       302       303       304       305       306       307       308 
## 2511.6447 3568.9245 5085.5859 4634.2668 4257.6736 3052.8976 4233.3116 
##       309       310       311       312       313       314       315 
## 1697.7629 3747.0900 5558.0418 3176.3690 2000.0000 6469.1991 3925.3056 
##       316       317       318       319       320       321       322 
## 2776.2703 3167.5999 2579.5554 4626.3105 1978.3707 3113.8108 4562.3706 
##       323       324       325       326       327       328       329 
## 2401.2325 3381.6916 5705.4339 4195.8846 5072.1586 2807.0494 4149.9090 
##       330       331       332       333       334       335       336 
## 3915.5591 2434.3177 1834.7254 5551.8376 2300.8830 4204.1619 4314.9356 
##       337       338       339       340       341       342       343 
## 3409.7008 4090.2503 3358.7676 2394.0773 4423.1797 5641.9650 4086.7504 
##       344       345       346       347       348       349       350 
## 5133.4079 1821.7397 2051.9631 2792.7409 2688.2789 2298.8358 3097.2476 
##       351       352       353       354       355       356       357 
## 1734.7859 4959.3784 6290.4614 6087.7603 2603.4816 4793.0482 4384.8992 
##       358       359       360       361       362       363       364 
## 3245.0830 1992.5132 2241.6517 5289.1904 3293.5868 2180.3633 4093.3940 
##       365       366       367       368       369       370       371 
## 2896.4556 4964.7941 2456.0345 3561.6423 2798.1148 2921.4160 3419.0499 
##       372       373       374       375       376       377       378 
## 3666.4040 5742.7648 2000.0000 2552.5681 4575.2696 1353.3040 3562.7825 
##       379       380       381       382       383       384       385 
## 1965.3004 3045.1112 2000.0000 3547.3692 4227.6833 3183.4665 1843.5911 
##       386       387       388       389       390       391       392 
## 4840.6343 4678.6893 2848.1535 2216.1514 4747.8403 3310.4329 2472.8436 
##       393       394       395       396       397       398       399 
## 2770.8081 3529.5515 2480.0623 4305.0348 2573.9978 3535.2141 4293.1882 
##       400       401       402       403       404       405       406 
## 4681.6792 2000.0000 2000.0000 2258.1014 2657.4592 2681.8871 3324.1050 
##       407       408       409       410       411       412       413 
## 1342.2685 5055.0693 1328.5372 2581.1015 2291.3062 5219.3007 4523.0338 
##       414       415       416       417       418       419       420 
## 1025.9878 4371.4689 3490.6248 1781.6586 4120.5775  980.4230 2850.9001 
##       421       422       423       424       425       426       427 
## 4784.0741 5883.3825 6256.3304 1872.4688 3228.2344 4031.8935 3373.9432 
##       428       429       430       431       432       433       434 
## 3479.3191 4494.1888 4771.6102 3988.7726 1546.9730 2067.6811 1389.7658 
##       435       436       437       438       439       440       441 
## 3440.6089 6289.6566 1861.1823 2425.9738 3240.9644 2990.3466  968.8001 
##       442       443       444       445       446       447       448 
## 3746.9585 2102.7647 3458.7101 1938.4093 1372.2371 3586.8255 2273.4479 
##       449       450       451       452       453       454       455 
## 4874.5612 5332.2740 2537.4398 3436.8476 4179.2754 3290.0063 4550.5535 
##       456       457       458       459       460       461       462 
## 4658.8695 4786.3130 6605.5491 1636.0420 3617.7524 2016.8578 2354.7939 
##       463       464       465       466       467       468       469 
## 1357.5312 3563.0373 2000.0000 2613.5016 5645.7864 4086.3996 2492.0835 
##       470       471       472       473       474       475       476 
## 4054.1301 2294.3649 5356.1219 2000.0000 2014.2563 3731.3426 2022.4256 
##       477       478       479       480       481       482       483 
## 4894.7094 6212.6432 2021.4659 3395.1032 1934.9898 3197.2660 2000.0000 
##       484       485       486       487       488       489       490 
## 3743.7454 5628.5881 5174.6673 3338.7824 4301.3090 3340.1234 5928.5609 
##       491       492       493       494       495       496       497 
## 4531.1771 2711.7764 3096.6050 3474.4207 3373.8249 4109.1956 3656.4348 
##       498       499       500       501       502       503       504 
## 2653.8796 4362.6167 2000.0000 3873.3382 3443.4886 6111.2858 3708.9543 
##       505       506       507       508       509       510       511 
## 5361.7521 2126.3082 5029.5590 3196.1751 2000.0000 3461.3221 2000.0000 
##       512       513       514       515       516       517       518 
## 2917.4019 3946.9884 1448.0143 2000.0000 2424.7252 5032.5451 5125.9894 
##       519       520       521       522       523       524       525 
## 2000.0000 5256.4458 3080.7220 2773.5703 4049.5989 3991.0551 2824.5492 
##       526       527       528       529       530       531       532 
## 1893.7884 2031.4574 2473.4712  721.3670 3592.7594 1396.2331 2582.4729 
##       533       534       535       536       537       538       539 
## 4176.3838 4343.2223 1979.2867 4014.5000 3375.3483 1366.0456 3189.8561 
##       540       541       542       543       544       545       546 
## 1161.2068 1869.9740 2465.2835 3677.5830 4480.7255 1584.2975 3064.6811 
##       547       548       549       550       551       552       553 
## 2399.6935 6160.8760 5373.6045 2137.4289 2536.5392 2846.9988 2208.8553 
##       554       555       556       557       558       559       560 
## 3759.5578 1863.6321 2611.2350 3889.0760 2472.3829 4627.7184 3247.2599 
##       561       562       563       564       565       566       567 
## 1861.1246 3931.0912 3586.9478 2000.0000 4464.2981 3471.9442 7453.1164 
##       568       569       570       571       572       573       574 
## 3270.0369 2000.0000 5635.6963 2581.2131 3508.3973 3343.6335 2380.2044 
##       575       576       577       578       579       580       581 
## 2464.7476  659.8902 3587.9335 3207.5862 2000.0000 2606.1483 2068.9992 
##       582       583       584       585       586       587       588 
## 5195.3187 3060.4432 5184.3240 1854.2999 2402.9845 2643.6263 3796.0621 
##       589       590       591       592       593       594       595 
## 6026.6984 4703.8325 3771.2015 2522.4436  802.7681 3978.0343 4770.6990 
##       596       597       598       599       600       601       602 
## 4097.7136 2000.0000 3660.8115 3412.6754 3851.9403 4322.8133 3617.4510 
##       603       604       605       606       607       608       609 
## 3537.6751 3065.6844 4341.7121 4453.8111 5807.3948 2545.9046 3053.6363 
##       610       611       612       613       614       615       616 
## 3133.7937 3095.5787 4558.8087 1248.6866 2542.2062 4105.8298 1777.7853 
##       617       618       619       620       621       622       623 
## 1750.5918 4010.8715 3578.6725 5486.1230 3294.0974 3372.7582 2399.1971 
##       624       625       626       627       628       629       630 
## 3751.7739 3388.2673 5912.4765 4809.6676 2351.7132 1681.0211 5257.2220 
##       631       632       633       634       635       636       637 
## 2426.9810 3971.5118 2993.2749 3335.4208 2828.3992 3405.5634 2612.7548 
##       638       639       640       641       642       643       644 
## 5098.2633 3319.3062 4523.5780 3328.6846 2000.0000 1809.3216 1557.9559 
##       645       646       647       648       649       650       651 
## 4158.5706 5155.3326 4166.1323 4383.0572 2000.0000 2000.0000 1963.2211 
##       652       653       654       655       656       657       658 
## 2068.7189 5005.0012  751.8128 2859.8072 3433.9346 3815.1683 1531.7240 
##       659       660       661       662       663       664       665 
## 2863.4025 2000.0000 4425.6697 3816.4021 3703.2805 3845.5344 3119.9043 
##       666       667       668       669       670       671       672 
## 3696.4220 4271.3414 2000.0000 3816.7516 2614.4431 3755.2281 4637.4716 
##       673       674       675       676       677       678       679 
## 5129.1150 2065.5408 3667.2570 3748.1448 3639.1263 4682.9051 1617.9560 
##       680       681       682       683       684       685       686 
## 2764.8353 5666.4050 3611.7286 4136.0842 1988.9391 3345.2733 2517.0846 
##       687       688       689       690       691       692       693 
## 1525.2692 3662.1926 3880.1854 2000.0000 3156.2732 2607.8734 1893.7204 
##       694       695       696       697       698       699       700 
## 3670.4877 2051.0348 3789.6294 1947.6120 4485.2655 3403.8357 3836.0803 
##       701       702       703       704       705       706       707 
## 4024.2873 1446.2240 4085.8489 1564.5751 2731.4875 3662.6871 4026.4109 
##       708       709       710       711       712       713       714 
## 5823.5532 3733.8114 2572.2931 3465.4814 2264.3329 4540.4759 2000.0000 
##       715       716       717       718       719       720       721 
## 2014.7512 1980.0693 2460.0053 2098.4757 2217.0281 2210.2801 4395.7235 
##       722       723       724       725       726       727       728 
## 1762.9059 2440.5830 2000.0000 1745.0726 2311.4700 3017.6323 2000.0000 
##       729       730       731       732       733       734       735 
## 2314.3913 2828.2221 4355.5057 4648.0187 4202.0901 1850.4918 4259.1434 
##       736       737       738       739       740       741       742 
## 2000.0000 2682.5724 2168.3668 3566.3392 3691.8431 5385.8401 3088.2401 
##       743       744       745       746       747       748       749 
## 4669.0459 3089.4658 4380.9199 4413.7615 5308.6999 4041.0334 2000.0000 
##       750       751       752       753       754       755       756 
## 3405.2866 2729.3744 3647.0320 4509.7891 2000.0000 2254.5593 3736.6326 
##       757       758       759       760       761       762       763 
## 3701.4252 3743.2099 3166.2989 3261.6503 3396.6114 5990.8008 4790.6103 
##       764       765       766       767       768       769       770 
## 5040.5302 4021.4044 4667.6567 3487.3433 3774.0468 4159.3397 1640.0853 
##       771       772       773       774       775       776       777 
## 4335.3192 4556.9921 4189.3534 5147.8299 1834.1695 4294.4506 2339.0695 
##       778       779       780       781       782       783       784 
## 2000.0000 3414.9999 1227.9877 3753.6668 6211.8570 2208.3233 4251.2390 
##       785       786       787       788       789       790       791 
## 2758.4411 4958.4187 2637.2589 3655.2428 3527.2230 2000.0000 2273.3296 
##       792       793       794       795       796       797       798 
## 3166.4971 4438.6095 2000.0000 2790.8712 3588.9668 3115.5549 4607.5744 
##       799       800       801       802       803       804       805 
## 5389.7995 4399.8403 3690.5187 1463.3549 1609.1311 3845.0838 1648.2361 
##       806       807       808       809       810       811       812 
## 2912.5595 2756.4115 3174.0003 1725.8891 4454.2380 4851.4102 2018.8889 
##       813       814       815       816       817       818       819 
## 2434.9622 1386.5693 4107.7133 2936.9626 2087.8648 5505.9136 4871.0305 
##       820       821       822       823       824       825       826 
## 4155.2731 4713.7139 2992.0426 4650.6613 4131.4457 4794.4007 3504.6232 
##       827       828       829       830       831       832       833 
## 3009.0891 2482.5494 1756.1913 2824.7062 2134.7041 2009.9854 4527.1828 
##       834       835       836       837       838       839       840 
## 1838.8658 2447.8514 3405.2221 3257.0972 3291.3215 2392.6543 3207.4356 
##       841       842       843       844       845       846       847 
## 3107.4916 3180.1661 3724.5037 2342.8716 1954.3688 1703.2706 3959.3334 
##       848       849       850       851       852       853       854 
## 2000.0000 6077.2044 5192.9478 5010.7070 2000.0000 2825.1378 3342.8682 
##       855       856       857       858       859       860       861 
## 4481.1586  684.5560 2828.8419 2000.0000 6125.5947 2680.9243 3548.8139 
##       862       863       864       865       866       867       868 
## 4965.9986 3368.4374 2152.5261 3770.7142 2101.2546 5692.4990 2509.8495 
##       869       870       871       872       873       874       875 
## 1697.3634 4877.6783 2423.4118 5560.6665 2000.0000 6268.6442 3743.2985 
##       876       877       878       879       880       881       882 
## 3587.9373 1926.1657 3322.1229 1029.9067 4395.0980 4894.1729 3943.8193 
##       883       884       885       886       887       888       889 
## 2303.9316 3386.7790 4971.6738 3810.5603 5696.0691 1905.6700 2180.9991 
##       890       891       892       893       894       895       896 
## 3329.5617 3934.8720 2671.9134 2690.3883 1566.2326 2653.8899 3896.0439 
##       897       898       899       900       901       902       903 
## 2000.0000 2715.6417 3875.8001 4066.5545 2447.5922 3253.4630 5934.0456 
##       904       905       906       907       908       909       910 
## 1274.9391 1606.8546 3597.2992 2000.0000 2476.5968 4117.7982 2000.0000 
##       911       912       913       914       915       916       917 
## 5638.7343 3274.6632 1269.9419 2940.0866 1892.1253 3757.6464 6660.2413 
##       918       919       920       921       922       923       924 
## 4297.8421 3337.6806 3628.0603 3234.5746 2547.5235 2887.7105 2405.7623 
##       925       926       927       928       929       930       931 
## 1918.2664 1784.2764 3537.7494 2000.0000 5224.3052 1220.4532 3921.0493 
##       932       933       934       935       936       937       938 
## 6284.6922 2535.0047 5119.8503 3486.0289 4506.1086 2431.7348 3847.2881 
##       939       940       941       942       943       944       945 
## 5053.5005 3610.9904 6017.5061 2468.3077 4181.0674 2788.6660 5389.9837 
##       946       947       948       949       950       951       952 
## 2952.4111 3722.6708 3121.5937 1740.2257 3893.6360 1804.4679 3749.7618 
##       953       954       955       956       957       958       959 
## 4113.1418 4246.3048 1467.7886 4609.0037 4193.2543 3095.7815 4032.6246 
##       960       961       962       963       964       965       966 
## 2403.6728 4171.8461 4264.8711 4051.1379 2651.8912 2000.0000 5607.1415 
##       967       968       969       970       971       972       973 
## 4661.0486 4050.8799 2816.2490 3403.3790 4719.7899 3053.3262 2629.8313 
##       974       975       976       977       978       979       980 
## 4363.7834  961.5790 3728.0105 4523.5830 2664.9811 2894.2174 4366.6149 
##       981       982       983       984       985       986       987 
## 3133.7728 2309.3879 5578.2431 4509.8256 5032.7826 4586.6842 4131.0257 
##       988       989       990       991       992       993       994 
## 2299.3996 3296.2055 4942.7357 3152.8543 2000.0000 5470.4331 3162.6487 
##       995       996       997       998       999      1000      1001 
## 3800.7807 2802.4518 4149.4910 1389.9292 3601.0357 3709.2249 3262.6434 
##      1002      1003      1004      1005      1006      1007      1008 
## 5799.4990 4544.9863 2993.2635 4336.5390 1919.1412 2408.4800 4250.5255 
##      1009      1010      1011      1012      1013      1014      1015 
## 3914.3785 2615.0939 4092.1296 3322.9876 2545.3782 1392.5899 2531.0839 
##      1016      1017      1018      1019      1020      1021      1022 
## 3723.8603 2873.6500 4646.8845 3621.6162 1479.3372 3246.9035 2000.0000 
##      1023      1024      1025      1026      1027      1028      1029 
## 5236.7328 4633.9291 5372.0443 5578.8063 4145.0064 2000.0000 3129.4036 
##      1030      1031      1032      1033      1034      1035      1036 
## 1648.9005 2947.9055 2286.6422 3775.9124 2904.5040 2967.8002 2000.0000 
##      1037      1038      1039      1040      1041      1042      1043 
## 2149.3449 2906.7862 3596.6641 3754.6263 2989.7219 2000.0000 3578.9240 
##      1044      1045      1046      1047      1048      1049      1050 
## 1425.7081 6377.2362 2526.1414 4428.1060 4731.7049 4256.0483 4417.0029 
##      1051      1052      1053      1054      1055      1056      1057 
## 4475.7974 5158.9201 5837.8510  890.0905 5121.8161 3017.9503 3337.9085 
##      1058      1059      1060      1061      1062      1063      1064 
## 1468.7529 4708.6658 5259.6665 2882.9223 5371.2462 5048.3411 3134.9563 
##      1065      1066      1067      1068      1069      1070      1071 
## 2168.7655 3371.8661 3014.9032 3873.5865 1538.8078 4168.0075 3517.4706 
##      1072      1073      1074      1075      1076      1077      1078 
## 2556.3417 2000.0000 6121.2401 3645.2742 2000.0000 1850.5528 1976.5221 
##      1079      1080      1081      1082      1083      1084      1085 
## 4433.1180 1534.5815 5351.8874 5346.3866 1721.9465 3919.1313 4228.5993 
##      1086      1087      1088      1089      1090      1091      1092 
## 6578.4180 1494.3962 3223.1745 3447.7981 1124.3392 3053.1558 2143.5149 
##      1093      1094      1095      1096      1097      1098      1099 
## 2727.7673 3754.3049 3378.2384 3539.8584 3547.9865 2994.8297 3895.6991 
##      1100      1101      1102      1103      1104      1105      1106 
## 5910.2353 1308.7881 2365.0668 4172.6115 4100.7849 4495.3700 3361.3443 
##      1107      1108      1109      1110      1111      1112      1113 
## 2746.0468 4095.3000 2000.0000 3476.7201 4648.3938 2000.0000 3326.3365 
##      1114      1115      1116      1117      1118      1119      1120 
## 2960.7894 3530.4070  574.3626 4093.5768 6447.9255 5269.1498 2977.6538 
##      1121      1122      1123      1124      1125      1126      1127 
## 3344.2481 3928.8230 1118.7815 3357.0446 2002.3822 3440.2021 6421.6108 
##      1128      1129      1130      1131      1132      1133      1134 
## 1393.7947 1424.2239 3939.6724 1854.1537 2000.0000 5490.7785 2228.8960 
##      1135      1136      1137      1138      1139      1140      1141 
## 5398.5155 2490.5929 3464.3535 3329.4839 2132.5166 3580.8877 2004.1147 
##      1142      1143      1144      1145      1146      1147      1148 
## 3765.9684 2000.0000 4282.7292 4895.7356 2068.8006 4149.7673 6148.1756 
##      1149      1150      1151      1152      1153      1154      1155 
## 3129.1637 5213.9280 4395.6413 6270.0900 2000.0000 4174.9532 5656.3352 
##      1156      1157      1158      1159      1160      1161      1162 
## 4485.8155 1267.8182 4017.7023 3287.2779 1570.7894 2265.0998 2809.1630 
##      1163      1164      1165      1166      1167      1168      1169 
## 3160.7620 3679.9034 1911.9461 4113.6238 3167.0363 4389.8096 4189.6834 
##      1170      1171      1172      1173      1174      1175      1176 
## 1939.8442 3808.0891 5221.2923 4792.0136 5671.5329 1836.6587 2950.3820 
##      1177      1178      1179      1180      1181      1182      1183 
## 4116.3350  997.6755 3483.0364 5448.7485 1984.3765 5986.5711 2899.9184 
##      1184      1185      1186      1187      1188      1189      1190 
## 4342.3235 6058.6544 1795.3976 2714.0621 2280.2432 4645.2189 3198.2330 
##      1191      1192      1193      1194      1195      1196      1197 
## 4567.5171 3011.1530 1287.5031 5105.0632 3147.0383 1587.3175 3750.2047 
##      1198      1199      1200      1201      1202      1203      1204 
## 1866.9430 5203.2128 4361.5005 3656.7199 3036.6842 3857.2174 3418.1724 
##      1205      1206      1207      1208      1209      1210      1211 
## 3434.2173 1220.3905 4605.8821 4055.2714 4202.4572 2000.0000 1577.1807 
##      1212      1213      1214      1215      1216      1217      1218 
## 3529.4928 5611.9871 2496.4079 4139.2545 1939.4425 4822.4694 3633.8120 
##      1219      1220      1221      1222      1223      1224      1225 
## 3145.5930 2000.0000 1273.0826 3173.6946 5676.0075 3710.8652 4389.0965 
##      1226      1227      1228      1229      1230      1231      1232 
## 2084.6189 2119.6620 2698.2972 4127.2205 6214.4100 3212.7939 2953.4769 
##      1233      1234      1235      1236      1237      1238      1239 
## 3916.2452 5296.1496 2119.1362 1186.6845 1710.0797 5531.0194 3169.2833 
##      1240      1241      1242      1243      1244      1245      1246 
## 2022.8587 4263.0293 1069.7967 1162.3425 3248.0334 2532.0211 6053.8790 
##      1247      1248      1249      1250      1251      1252      1253 
## 4476.7478 1158.1602 1729.6886 2110.0236 3155.3626 5321.4385 1672.3267 
##      1254      1255      1256      1257      1258      1259      1260 
## 4172.5067 2000.0000 2944.6015 2000.0000 1456.5452 4584.5123 3039.9357 
##      1261      1262      1263      1264      1265      1266      1267 
## 3941.4977 3932.4335 1460.0036 2000.0000 3291.8936 3576.8024 4642.2399 
##      1268      1269      1270      1271      1272      1273      1274 
## 1515.6047 4606.5698 3042.0982 3350.9423 4604.1394 2992.2354 3934.9912 
##      1275      1276      1277      1278      1279      1280      1281 
## 2876.2891 2149.3396 4737.1467 4063.2115 3536.6990 2931.1333 4867.6810 
##      1282      1283      1284      1285      1286      1287      1288 
## 3789.1353 4101.4857 4200.1284 4793.1147 3549.0705 4801.8016 2000.0000 
##      1289      1290      1291      1292      1293      1294      1295 
## 2753.6943 3277.1780 5414.0908 3002.4307 1467.8274 4078.2429 4839.2915 
##      1296      1297      1298      1299      1300      1301      1302 
## 3236.0218 3147.1455 3992.7928 5182.9539 4513.9772 2503.1269 2660.6465 
##      1303      1304      1305      1306      1307      1308      1309 
## 2303.5157 2959.3601 1782.3062 2450.8630 2000.0000 5125.0522 2000.0000 
##      1310      1311      1312      1313      1314      1315      1316 
## 5777.2578 4289.8366 3754.9117 6461.2097 3907.0512 2309.5197 1563.4290 
##      1317      1318      1319      1320      1321      1322      1323 
## 2444.0281 2323.7774 2320.8301 2000.0000 2694.0126 4979.8881 5111.1520 
##      1324      1325      1326      1327      1328      1329      1330 
## 1481.9401 2566.8750 2000.0000 3668.9706 3330.9635 1681.7886 2737.1884 
##      1331      1332      1333      1334      1335      1336      1337 
## 2525.0615 2345.8383 2393.2697 2401.9595 3833.9255 3555.4681 2773.5942 
##      1338      1339      1340      1341      1342      1343      1344 
##  940.5686 4216.0003 3776.6796 2860.3115 4223.7897 1819.7105 2576.8445 
##      1345      1346      1347      1348      1349      1350      1351 
## 2000.0000  929.4027 3603.8422 4307.0279 1059.4566 3339.8982 3531.6624 
##      1352      1353      1354      1355      1356      1357      1358 
## 5020.9234 3111.4254 3674.2478 2017.6938  709.6880 2918.1825 4585.2871 
##      1359      1360      1361      1362      1363      1364      1365 
## 2292.2119 2000.0000 1701.4844 1438.4875 5753.4083 2371.9369 2574.7930 
##      1366      1367      1368      1369      1370      1371      1372 
## 3687.4002 5265.6215 5006.4076 4165.0431 4867.3322 4150.9964 2673.0470 
##      1373      1374      1375      1376      1377      1378      1379 
## 2000.0000 3425.4049 3914.9526 4249.1858 4731.9220 4171.0264 2438.3164 
##      1380      1381      1382      1383      1384      1385      1386 
## 3428.8220 5712.3806 5645.5186 4492.3080 2000.0000 2000.0000 1524.5283 
##      1387      1388      1389      1390      1391      1392      1393 
## 2053.5551 2099.6082 1104.7906 3500.7476 4585.4037 5151.4630 2088.7671 
##      1394      1395      1396      1397      1398      1399      1400 
## 4314.4928 3060.2450 2959.4030 3570.6949 4602.9965 5104.8236 3749.2916 
##      1401      1402      1403      1404      1405      1406      1407 
## 3035.3950 3204.4590 5853.0316 4076.7725 3589.2126 4155.3401 3453.4001 
##      1408      1409      1410      1411      1412      1413      1414 
## 3145.0009 1649.2498 4955.3254 2987.9387 3328.0412 2151.7248 4318.8943 
##      1415      1416      1417      1418      1419      1420      1421 
## 1473.5126 2000.0000 2575.1208 2140.1150 3950.0900 2320.3248 1530.3794 
##      1422      1423      1424      1425      1426      1427      1428 
## 6439.1809 4485.1898 2000.0000 5106.1030 6514.4641 1631.6231 3412.2078 
##      1429      1430      1431      1432      1433      1434      1435 
## 4271.1522 3040.5847 3620.1655 2207.5884 2200.0671 2840.0988 3123.0375 
##      1436      1437      1438      1439      1440      1441      1442 
## 3669.4356 3504.8599 2000.0000 2861.5203 1113.0727 2000.0000 5410.3690 
##      1443      1444      1445      1446      1447      1448      1449 
## 4578.8209 2131.7639 3628.2253 4599.3097 2000.0000 2962.6489 2000.0000 
##      1450      1451      1452      1453      1454      1455      1456 
## 1931.6613 2811.8461 3625.3056 2000.0000 2279.1102 1933.4723 3127.8360 
##      1457      1458      1459      1460      1461      1462      1463 
## 5318.7897 1918.5202 2615.3363 2417.5204  758.0804 3162.9059 2000.0000 
##      1464      1465      1466      1467      1468      1469      1470 
## 3734.1937 3633.0183 4974.6463 2612.5019 2633.5520 3943.1018 3930.1665 
##      1471      1472      1473      1474      1475      1476      1477 
## 3742.7597 2636.3156 1974.6631 3402.6929 3632.5925  839.0055 3850.0536 
##      1478      1479      1480      1481      1482      1483      1484 
## 3585.1458 3134.2401 1693.8258 3962.5133  927.9094 4050.1062 3429.2527 
##      1485      1486      1487      1488      1489      1490      1491 
## 4286.4223 3881.4118 3358.3791 3734.5677 4908.3723 3424.9182 3309.4160 
##      1492      1493      1494      1495      1496      1497      1498 
## 5403.3569 2268.4660 1613.8864 5604.7435 4169.6482 3300.9521 1790.4768 
##      1499      1500      1501      1502      1503      1504      1505 
## 2284.7805 1912.1460 1515.7145 4202.4843 4073.1048 5942.3588 2888.9616 
##      1506      1507      1508      1509      1510      1511      1512 
## 3562.6139 4647.4340 3683.5458 4113.0908 4292.5198 1589.6789 3642.1667 
##      1513      1514      1515      1516      1517      1518      1519 
## 2933.9941 3852.0415 4272.3243 4622.0353 1700.9265 3401.6620 3687.5316 
##      1520      1521      1522      1523      1524      1525      1526 
## 2420.4055 3660.1078 2111.5418 3083.7902 3546.6998 1080.9974 4132.0662 
##      1527      1528      1529      1530      1531      1532      1533 
## 2946.5912 4577.1906 5155.1645 5389.3552 4437.3364 5183.4716 2000.0000 
##      1534      1535      1536      1537      1538      1539      1540 
## 1482.6369 3930.0962 3360.8262 2922.9278 4510.7488 5786.7670 6194.0746 
##      1541      1542      1543      1544      1545      1546      1547 
## 2068.7203 2703.9345 2162.7739 4529.0202 4751.4791 3626.9843 2233.4852 
##      1548      1549      1550      1551      1552      1553      1554 
## 3745.7275 1966.9733 3098.8850 4296.9647 3432.0130 2862.7872 4710.4430 
##      1555      1556      1557      1558      1559      1560      1561 
## 3354.3902 2526.1344 4233.2686 1921.9308 4373.0938 3799.9004 3139.3414 
##      1562      1563      1564      1565      1566      1567      1568 
## 5144.2532 4314.6844 4231.5129 6345.9836 3426.5864 2310.5590 3340.9249 
##      1569      1570      1571      1572      1573      1574      1575 
## 3707.8847 3768.0711 4463.3211 3694.9722 2691.1719 4743.9018 1942.8352 
##      1576      1577      1578      1579      1580      1581      1582 
## 4497.8862 3996.7766 2556.0326 2769.7450 3730.2779 3861.2122 1812.1074 
##      1583      1584      1585      1586      1587      1588      1589 
## 2000.0000 2485.0203 2435.2594 2875.8842 1575.2034 3490.9795 1367.7754 
##      1590      1591      1592      1593      1594      1595      1596 
## 4852.1413 3871.2480 5745.6542 2213.5460 2000.0000 2735.9126 3036.7288 
##      1597      1598      1599      1600      1601      1602      1603 
## 3666.6638 3822.5202 3539.3824 3526.4903 4316.6611 4414.2025 3454.6280 
##      1604      1605      1606      1607      1608      1609      1610 
## 2000.0000 3519.4993 5856.0771 1253.6929 2000.0000 3461.5064 5942.9003 
##      1611      1612      1613      1614      1615      1616      1617 
## 3691.8393 2000.0000 3451.1722 4381.4749 4156.3960 4584.8590 2815.6177 
##      1618      1619      1620      1621      1622      1623      1624 
## 4671.2429 3589.9591 3799.3850 4579.7008 2141.9938 5329.6947 2000.0000 
##      1625      1626      1627      1628      1629      1630      1631 
## 2848.1033 3238.9940 3730.4143 2957.4784 2844.9391 5893.6492 3223.4677 
##      1632      1633      1634      1635      1636      1637      1638 
## 4994.9260 2692.7158 4122.7903 2675.3300 3430.0008 4900.4182 3241.7170 
##      1639      1640      1641      1642      1643      1644      1645 
## 3154.7055 3393.5756 2000.0000 2000.0000 3043.9669 3546.4320 3607.6089 
##      1646      1647      1648      1649      1650      1651      1652 
## 1027.1243 4154.6847 3044.3760 1301.6588 4679.0071 3178.2471 1064.9144 
##      1653      1654      1655      1656      1657      1658      1659 
## 3866.9743 1578.8153 4994.2468 4418.8721  473.9855 2222.1654 3004.5917 
##      1660      1661      1662      1663      1664      1665      1666 
## 2510.9863 4329.6999 5996.4850 4945.4441 5538.4402 6603.3229 1751.0016 
##      1667      1668      1669      1670      1671      1672      1673 
## 3995.6488 3733.8442 6421.8057 2418.9236 5145.3619 2000.0000 5183.7022 
##      1674      1675      1676      1677      1678      1679      1680 
## 3284.5290 1342.1954 2966.4908 2651.9803 3045.5640 4357.9492 2000.0000 
##      1681      1682      1683      1684      1685      1686      1687 
## 2385.4868 5860.3794 4950.7721 3829.3469 4121.8760 2778.4126 1144.5693 
##      1688      1689      1690      1691      1692      1693      1694 
## 4645.4759  868.9109 2064.9674 1888.5405 2000.0000 2834.5626 4499.7543 
##      1695      1696      1697      1698      1699      1700      1701 
## 3353.6516 5063.1897 2000.0000 5071.0228 6383.9937 2478.0614 4702.6415 
##      1702      1703      1704      1705      1706      1707      1708 
## 1196.1815 3265.8169 3524.8014 3844.2949 3103.4184 2000.0000 3151.8250 
##      1709      1710      1711      1712      1713      1714      1715 
## 6226.9004 2259.9599 3904.9859 1675.2918 3763.4710 3574.2207 4605.8931 
##      1716      1717      1718      1719      1720      1721      1722 
## 1925.0893 2796.1525 2000.0000 3474.2740 1619.9041 2958.8068 4217.5413 
##      1723      1724      1725      1726      1727      1728      1729 
## 4081.5829 4991.3372 4232.8232  257.5929 1908.4152 4133.5781 5471.4220 
##      1730      1731      1732      1733      1734      1735      1736 
## 4075.7584 3609.1964 2200.9555 1350.0129 3479.3983 2000.0000 2401.5940 
##      1737      1738      1739      1740      1741      1742      1743 
## 3505.3313 2528.3423 2000.0000 2705.3440 6779.6997 3057.4868 1414.6873 
##      1744      1745      1746      1747      1748      1749      1750 
## 2228.2291 3634.6959 3993.3999 4217.9211 4012.4825 4537.8548 3446.2949 
##      1751      1752      1753      1754      1755      1756      1757 
## 3353.2942 5212.7783 2966.3350 4741.5831 2182.3376 3593.3177 2936.1236 
##      1758      1759      1760      1761      1762      1763      1764 
## 4152.7778 2555.4282 3023.3276 5871.5606 1305.0027 2504.8579 2485.9208 
##      1765      1766      1767      1768      1769      1770      1771 
## 4787.9123 5615.3031 2349.2389 4017.5785 2265.7488 1870.4974 1679.4240 
##      1772      1773      1774      1775      1776      1777      1778 
## 3262.6825 3214.1388 5778.7782 3601.3633 2005.6202 4650.0099 3883.4083 
##      1779      1780      1781      1782      1783      1784      1785 
## 4307.7400 4166.9914 3378.5911 2319.9369 4669.5353 3121.1434 3124.8286 
##      1786      1787      1788      1789      1790      1791      1792 
## 2587.1625 3340.5088 1896.0189 5227.1544 4299.1088 2000.0000 2376.4156 
##      1793      1794      1795      1796      1797      1798      1799 
## 3882.3887 4036.5142 3553.2028 4472.1176 1723.3196 3533.3928 1926.1881 
##      1800      1801      1802      1803      1804      1805      1806 
## 4488.7878 4441.6676 2000.0000 2728.4679 4264.0063 2623.5530 4277.6630 
##      1807      1808      1809      1810      1811      1812      1813 
## 4832.5645 5108.6623 2938.5937 3362.2183 2636.5791 2442.5120 4816.0181 
##      1814      1815      1816      1817      1818      1819      1820 
## 3720.3885 2000.0000 2441.4661 1437.1166 3129.4608 3399.9861 3455.6573 
##      1821      1822      1823      1824      1825      1826      1827 
## 1430.8233 2000.0000 1970.5367 3611.7749 4046.8717 5966.3118 3859.5436 
##      1828      1829      1830      1831      1832      1833      1834 
## 4013.0822 1056.0820 3924.3658 3241.6813 1544.3972 3348.6362 1942.8725 
##      1835      1836      1837      1838      1839      1840      1841 
## 4311.3402 3514.2683 2746.3758 2000.0000 1610.6692 4979.2511 2202.4496 
##      1842      1843      1844      1845      1846      1847      1848 
## 2991.2337 3384.2577 3684.3141 1969.3035 3102.3483 3118.6422 2832.3167 
##      1849      1850      1851      1852      1853      1854      1855 
## 3794.8762 3716.7625 3631.7870 3120.2466 3765.1789  732.4628  939.4335 
##      1856      1857      1858      1859      1860      1861      1862 
## 3416.6051 1889.2075 3590.8436 2000.0000 1977.7907 2000.0000 1176.4707 
##      1863      1864      1865      1866      1867      1868      1869 
## 3263.8690 2769.3284 4203.0660 2755.5847 2130.9725 2312.7288 4837.5474 
##      1870      1871      1872      1873      1874      1875      1876 
## 3791.5895 5209.5092 3800.1937 1433.1828 2436.8744 3827.5529 3651.8262 
##      1877      1878      1879      1880      1881      1882      1883 
## 1326.4718 5631.0517 4726.9224 4221.6546 2447.4303 3084.3745 1860.3378 
##      1884      1885      1886      1887      1888      1889      1890 
## 2000.0000 3759.5904 3156.1098 2398.5777 2942.1133 4336.8642 4713.4048 
##      1891      1892      1893      1894      1895      1896      1897 
## 2156.3950 3431.3357 1467.5322 1491.7845 4732.3184 3224.7778 2000.0000 
##      1898      1899      1900      1901      1902      1903      1904 
## 3741.7002 3083.8013 4448.6313 3511.2217 4631.0157 1834.6171 3675.2062 
##      1905      1906      1907      1908      1909      1910      1911 
## 4020.0915 1441.0089 1603.9728 4017.7788 4317.4266 5443.9310 3166.7050 
##      1912      1913      1914      1915      1916      1917      1918 
## 3973.8591 1810.3200 4251.1832 4116.7982 3740.3830 3017.4052 1486.2741 
##      1919      1920      1921      1922      1923      1924      1925 
## 1631.6594 3151.1671 1730.9150 3934.6221 2365.5231 3267.5674 2000.0000 
##      1926      1927      1928      1929      1930      1931      1932 
##  767.4163 2928.3569 3382.0716 1149.6548 2951.5166  857.6289 3891.8662 
##      1933      1934      1935      1936      1937      1938      1939 
## 5047.4313 5139.6335 3112.1137 4604.1086 3194.5751 1263.8083 3591.2063 
##      1940      1941      1942      1943      1944      1945      1946 
## 3386.6801 3168.1406 2556.7887 2582.7207 2145.6172 1371.2136 4220.5689 
##      1947      1948      1949      1950      1951      1952      1953 
## 5363.4098 2183.2947 4879.3458 1676.8388 4096.0459 3281.5568 4274.0956 
##      1954      1955      1956      1957      1958      1959      1960 
## 5144.0862 3736.7820 3868.1782 2000.0000 2700.4981 3796.5960 2000.0000 
##      1961      1962      1963      1964      1965      1966      1967 
## 5803.7582 4205.4803 4220.4667 3063.3256 3497.1128 2091.0219 2338.0400 
##      1968      1969      1970      1971      1972      1973      1974 
## 5007.3146  998.2706 1520.2432 2053.2349 1878.4981 4183.0168 5072.8072 
##      1975      1976      1977      1978      1979      1980      1981 
## 2162.7383 3792.5547 1075.0562 5229.4000 4089.2175 3790.1487 3204.3636 
##      1982      1983      1984      1985      1986      1987      1988 
## 4614.4910 3543.7286 3071.6483 3044.6924 4389.6522 3714.0012 2832.7665 
##      1989      1990      1991      1992      1993      1994      1995 
## 2835.1443 4617.4343 1893.0619 5627.2838 5297.9426 5296.5493 3727.2762 
##      1996      1997      1998      1999      2000      2001      2002 
## 3810.6870 3925.6772 5828.6118 3765.2304 2304.1457 4499.0903 4426.0328 
##      2003      2004      2005      2006      2007      2008      2009 
## 5647.3033 1278.7555 4417.8428 2838.6388 5387.1464 1483.3620 3961.6995 
##      2010      2011      2012      2013      2014      2015      2016 
## 4519.6442 3733.2200 2671.2793 7134.5247 1984.2133 3948.3066 4841.8525 
##      2017      2018      2019      2020      2021      2022      2023 
## 2189.4327 5076.3957 4596.8888 2354.9074 1529.1891 3489.2215 3391.2545 
##      2024      2025      2026      2027      2028      2029      2030 
## 3790.1246 3170.3592 1767.6268 2879.7157 3610.3980 2782.6804 4905.6463 
##      2031      2032      2033      2034      2035      2036      2037 
## 2963.7299 3048.9933 1183.6720 2678.5389 4889.6169 4028.2851 3128.8711 
##      2038      2039      2040      2041      2042      2043      2044 
## 3108.2827 4577.4475 2943.2317 2029.3064 2000.0000 4066.9784 4655.0231 
##      2045      2046      2047      2048      2049      2050      2051 
## 1265.0511 1711.1618 1427.9448 3701.4409 3769.3889 3943.1107 3229.6834 
##      2052      2053      2054      2055      2056      2057      2058 
## 3690.3654 4705.2283 2838.9226 5245.0365 2000.0000 4538.8396 3458.3365 
##      2059      2060      2061      2062      2063      2064      2065 
## 1647.2239 2000.0000 3038.7192 3995.2275 2000.0000 2000.0000 3403.8575 
##      2066      2067      2068      2069      2070      2071      2072 
## 2562.8209 4350.9962 5973.1737 2000.0000 1065.8521 3453.8159 3172.4698 
##      2073      2074      2075      2076      2077      2078      2079 
## 4933.3639 4325.0912 1719.7898 2420.4847 5372.1487 3269.6193 2000.0000 
##      2080      2081      2082      2083      2084      2085      2086 
## 6929.3753  922.1474 4507.9040 4031.8804 1129.1868 2000.0000 1950.5719 
##      2087      2088      2089      2090      2091      2092      2093 
## 3358.3593 4459.3335 2799.7969 4787.1241 2337.4047 1995.0724 3974.6677 
##      2094      2095      2096      2097      2098      2099      2100 
## 3452.8973 3723.3520 4899.1314 4497.9722 3702.5734 5399.6342 4432.7682 
##      2101      2102      2103      2104      2105      2106      2107 
## 2000.0000 2000.0000 6819.0245 3804.8563 2116.5166 2946.2444 4680.2187 
##      2108      2109      2110      2111      2112      2113      2114 
##  926.6597 3134.8945 2582.1384 6395.4160 2024.6560 4192.2381 1511.7049 
##      2115      2116      2117      2118      2119      2120      2121 
## 2443.5511 1015.2999 4909.7739 4011.3178 7217.2536 3068.8725 2464.3455 
##      2122      2123      2124      2125      2126      2127      2128 
## 4233.3490 5662.8025 4566.7010 3527.6764 2000.0000 4513.7353 3600.3518 
##      2129      2130      2131      2132      2133      2134      2135 
## 2776.2541 1580.9736 2607.1232 3758.6481 3897.4451 2000.0000 4831.6181 
##      2136      2137      2138      2139      2140      2141 
##  918.9765 4098.6414 1955.1520  646.8850 3938.2103 2721.3404

Appendix

suppressWarnings(suppressMessages(library(e1071)))
suppressWarnings(suppressMessages(library(MASS)))
suppressWarnings(suppressMessages(library(car)))
suppressWarnings(suppressMessages(library(corrplot)))
suppressWarnings(suppressMessages(library(pROC)))
suppressWarnings(suppressMessages(library(caret)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(kableExtra)))
suppressWarnings(suppressMessages(library(gridExtra)))

# Load the training data
insurance_data<-as_data_frame(read.csv('https://raw.githubusercontent.com/WigodskyD/data-sets/master/insurance_training_data%20(1).csv'),stringsAsFactors=FALSE)
head(insurance_data)
# Strip '$' and ',' from the currency columns
insurance_data$OLDCLAIM<-as.numeric(gsub('\\$|,','', insurance_data$OLDCLAIM))
insurance_data$INCOME<-as.numeric(gsub('\\$|,', '', insurance_data$INCOME))
conditional_oldclaim<-insurance_data$OLDCLAIM[which(insurance_data$OLDCLAIM!=0)]
hist(insurance_data$OLDCLAIM)
hist(conditional_oldclaim,breaks=32)
# Bin old claims into five ordinal levels (level 1 = no old claim)
insurance_data$OLDCLAIM<-cut(insurance_data$OLDCLAIM,breaks=c(-.1,.1,3660,6050,9866,max(insurance_data$OLDCLAIM)),labels=c(1:5))
hist(as.numeric(insurance_data$OLDCLAIM))
# Split URBANICITY into its home and work parts
insurance_data %>% separate(URBANICITY, sep='/ ',into=c('home','work'))->insurance_data
insurance_data$home<- as.factor(gsub('z_','',insurance_data$home))
# Impute missing income with 61898, then bin income into five levels
insurance_data$INCOME[is.na(insurance_data$INCOME)]<-61898
insurance_data$INCOME<-cut(insurance_data$INCOME,breaks=c(-.1,22345,43660,65260,95555,max(insurance_data$INCOME)),labels=c(1:5))
# One column per car type, then recode each column to a 0/1 indicator
insurance_data %>% spread(key=CAR_TYPE,value=CAR_TYPE)->insurance_data
insurance_data<-insurance_data[,-c(25,27)]
colnames(insurance_data)[26]<-'Panel_Truck'
colnames(insurance_data)[30]<-'SUV'
colnames(insurance_data)[28]<-'Sports_Car'
insurance_data$Panel_Truck<-as.character(insurance_data$Panel_Truck)
insurance_data$Panel_Truck[insurance_data$Panel_Truck=='Panel Truck']<-1
insurance_data$Panel_Truck[is.na(insurance_data$Panel_Truck)]<-0
insurance_data$Sports_Car<-as.character(insurance_data$Sports_Car)
insurance_data$Sports_Car[insurance_data$Sports_Car=='Sports Car']<-1
insurance_data$Sports_Car[is.na(insurance_data$Sports_Car)]<-0
insurance_data$Pickup<-as.character(insurance_data$Pickup)
insurance_data$Pickup[insurance_data$Pickup=='Pickup']<-1
insurance_data$Pickup[is.na(insurance_data$Pickup)]<-0
insurance_data$Van<-as.character(insurance_data$Van)
insurance_data$Van[insurance_data$Van=='Van']<-1
insurance_data$Van[is.na(insurance_data$Van)]<-0
insurance_data$SUV<-as.character(insurance_data$SUV)
insurance_data$SUV[insurance_data$SUV=='z_SUV']<-1
insurance_data$SUV[is.na(insurance_data$SUV)]<-0
cols<-c('work','Panel_Truck','Pickup','Sports_Car','Van','SUV')
insurance_data[cols] <- lapply(insurance_data[cols], factor)
insurance_data$BLUEBOOK<-as.numeric(gsub('\\$|,','', insurance_data$BLUEBOOK))
head(insurance_data[,c(10:16)])
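The currency cleanup above repeats the same gsub() pattern for OLDCLAIM, INCOME, BLUEBOOK and HOME_VAL. As a minimal sketch, the pattern could be wrapped in a small helper. The name clean_currency and the sample values are hypothetical and do not appear in the original code.

```r
# Hypothetical helper (not part of the original appendix):
# strip "$" and "," and convert the remainder to numeric.
clean_currency <- function(x) {
  as.numeric(gsub("\\$|,", "", as.character(x)))
}

# Made-up sample values in the same format as the raw columns
raw_values <- c("$67,349", "$0", "$1,250")
cleaned <- clean_currency(raw_values)
print(cleaned)
```

Blank strings and NA inputs come back as NA, so any imputation step would still be applied afterwards.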

# One column per job category; a blank JOB becomes 'None'
insurance_data$JOB<-as.character(insurance_data$JOB)
insurance_data$JOB[insurance_data$JOB=='']<-'None'
insurance_data %>% spread(key=JOB,value=JOB,fill='0')->insurance_data
colnames(insurance_data)[32]<-'Homemaker'
colnames(insurance_data)[38]<-'BlueCollar'
insurance_data$Clerical %>% recode('Clerical' = '1')->insurance_data$Clerical
insurance_data$Doctor %>% recode('Doctor' = '1')->insurance_data$Doctor
insurance_data$Homemaker %>% recode('Home Maker' = '1')->insurance_data$Homemaker
insurance_data$Lawyer %>% recode('Lawyer' = '1')->insurance_data$Lawyer
insurance_data$Manager %>% recode('Manager' = '1')->insurance_data$Manager
insurance_data$Professional %>% recode('Professional' = '1')->insurance_data$Professional
insurance_data$Student %>% recode('Student' = '1')->insurance_data$Student
insurance_data$BlueCollar %>% recode('z_Blue Collar' = '1')->insurance_data$BlueCollar
# Missing years on job are set to 0
insurance_data$YOJ[is.na(insurance_data$YOJ)]<-0
insurance_data<-insurance_data[,-c(1,35)]
insurance_data$KIDSDRIV<-as.factor(insurance_data$KIDSDRIV)
insurance_data$HOME_VAL<-as.numeric(gsub('\\$|,','', insurance_data$HOME_VAL))
# '<High School'='1','Bachelors'='3','Masters'='4','PhD'='5','z_High School'='2'
insurance_data$EDUCATION %>% recode('% mutate(ave_claim_size=CLM_FREQ/OLDCLAIM)->insurance_data_b
# optional dataframe with conglomerate column limiting multicollinearity of old claims

Profession_Set<-rep('a',8)
Profession_Set<-cbind(Profession_Set,Profession_Set,Profession_Set)
colnames(Profession_Set)<-c('Profession','Model','P_value')
for(i in 29:36){
  column_to_test<-noquote(colnames(insurance_data[i]))
  regression<-paste0('TARGET_FLAG','~',column_to_test)
  one_var_model<-glm(data=insurance_data, as.formula(regression),family=binomial(link='logit'))
  Profession_Set[i-28,1]<-noquote(as.character((one_var_model)$terms[[3]]))
  Profession_Set[i-28,2]<-noquote(paste0('y = ',round(summary(one_var_model)$coeff[2],4),'x','+',round(summary(one_var_model)$coeff[1],4)))
  Profession_Set[i-28,3]<-noquote(signif(summary(one_var_model)$coeff[8],3))
}
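The loop above reads the slope's p-value as coeff[8], which relies on the coefficient table being a 2 x 4 matrix stored column by column. A minimal sketch of the same lookup by row and column name is shown below. It uses the built-in mtcars data purely for illustration, since the insurance data is not reproduced here.

```r
# Illustration only: mtcars ships with R; 'am' is a 0/1 outcome, 'wt' a numeric predictor.
one_var_model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Coefficient table: rows are terms, columns are Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(one_var_model))

p_by_position <- coef_table[8]                  # 8th element in column-major order
p_by_name     <- coef_table["wt", "Pr(>|z|)"]   # same cell, addressed by name

print(c(p_by_position, p_by_name))
```

Position 8 only points at the slope's p-value when the model has exactly two coefficients, so a factor predictor with more than two levels would need the name-based lookup.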

kable_input<-kable(Profession_Set, "html") %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T, color = "cornsilk", background = "DarkCyan") %>%
  column_spec(2, bold = T, color = "DarkCyan", background = "cornsilk") %>%
  column_spec(3, bold = T, color = "DarkCyan", background = "cornsilk")
# The blank header cell is named ' ' because R rejects a zero-length name
add_header_above(kable_input, header = c("Single Variable Models by Professional Category"=2,' '=1), bold = TRUE, italic = TRUE)%>%
  kable_styling(bootstrap_options = "striped", font_size = 18)

wealth_set<-rep('a',6)
wealth_set<-cbind(wealth_set,wealth_set,wealth_set)
colnames(wealth_set)<-c('Variable','Model','P_value')

j<-1
for(i in c(7,9,12,15,22,20)){
  column_to_test<-noquote(colnames(insurance_data[i]))
  regression<-paste0('TARGET_FLAG','~',column_to_test)
  one_var_model<-glm(data=insurance_data, as.formula(regression),family=binomial(link='logit'))
  wealth_set[j,1]<-noquote(as.character((one_var_model)$terms[[3]]))
  wealth_set[j,2]<-noquote(paste0('y = ',round(summary(one_var_model)$coeff[2],7),'x','+',round(summary(one_var_model)$coeff[1],4)))
  wealth_set[j,3]<-noquote(signif(summary(one_var_model)$coeff[8],3))
  j<-j+1
}

kable_input<-kable(wealth_set, "html") %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T, color = "AliceBlue", background = "lightslategray") %>%
  column_spec(2, bold = T, color = "lightslategray", background = "AliceBlue") %>%
  column_spec(3, bold = T, color = "lightslategray", background = "AliceBlue")
# The blank header cell is named ' ' because R rejects a zero-length name
add_header_above(kable_input, header = c("Single Variable Models by Wealth Measures"=2,' '=1), bold = TRUE, italic = TRUE)%>%
  kable_styling(bootstrap_options = "striped", font_size = 18)
plota<-ggplot()+geom_boxplot(data=insurance_data, y=insurance_data$INCOME,aes(y=insurance_data$INCOME,x=insurance_data$TARGET_FLAG,group=insurance_data$TARGET_FLAG))+
  labs(x='target',y='income')+ theme(panel.background = element_rect(fill = 'Wheat'))
plotb<-ggplot()+geom_boxplot(data=insurance_data, y=insurance_data$HOME_VAL,aes(y=insurance_data$HOME_VAL,x=insurance_data$TARGET_FLAG,group=insurance_data$TARGET_FLAG))+
  labs(x='target',y='home value')+ theme(panel.background = element_rect(fill = 'Wheat'))
plotc<-ggplot()+geom_boxplot(data=insurance_data, y=insurance_data$BLUEBOOK,aes(y=insurance_data$BLUEBOOK,x=insurance_data$TARGET_FLAG,group=insurance_data$TARGET_FLAG))+
  labs(x='target',y='Bluebook Value')+ theme(panel.background = element_rect(fill = 'Wheat'))
grid.arrange(plota,plotb,plotc,nrow = 1)

claim_size_data<-insurance_data[which(insurance_data$TARGET_FLAG==1),]
claimSize_set<-rep('a',8)
claimSize_set<-cbind(claimSize_set,claimSize_set,claimSize_set)
colnames(claimSize_set)<-c('Variable','Model','P_value')
j<-1
for(i in c(9,15,17,18,19,20,21,22)){
  column_to_test<-noquote(colnames(claim_size_data[i]))
  regression<-paste0('TARGET_AMT','~',column_to_test)
  one_var_model<-lm(data=claim_size_data, as.formula(regression) )
  claimSize_set[j,1]<-noquote(as.character((one_var_model)$terms[[3]]))
  claimSize_set[j,2]<-noquote(paste0('y = ',round(summary(one_var_model)$coeff[2],7),'x','+',round(summary(one_var_model)$coeff[1],4)))
  claimSize_set[j,3]<-noquote(signif(summary(one_var_model)$coeff[8],3))
  j<-j+1
}

kable_input<-kable(claimSize_set, "html") %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T, color = "cornsilk", background = "DarkCyan") %>%
  column_spec(2, bold = T, color = "DarkCyan", background = "cornsilk") %>%
  column_spec(3, bold = T, color = "DarkCyan", background = "cornsilk")
# The blank header cell is named ' ' because R rejects a zero-length name
add_header_above(kable_input, header = c("Single Variable Models by Wealth Measures"=2,' '=1), bold = TRUE, italic = TRUE)%>%
  kable_styling(bootstrap_options = "striped", font_size = 18)

plota<-ggplot()+geom_boxplot(data=claim_size_data, y=claim_size_data$TARGET_AMT,aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$RED_CAR,group=claim_size_data$RED_CAR))+
  labs(x='Red Car',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))
plotb<-ggplot()+geom_point(data=claim_size_data, aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$HOME_VAL))+
  labs(x='Home Value',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))+xlim(10,800000)
plotc<-ggplot()+geom_point(data=claim_size_data, aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$BLUEBOOK))+
  labs(x='Bluebook',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))+xlim(10,75000)
plotd<-ggplot()+geom_boxplot(data=claim_size_data, y=claim_size_data$TARGET_AMT,aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$OLDCLAIM,group=claim_size_data$OLDCLAIM))+
  labs(x='Old Claim Amt',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))

plote<-ggplot()+geom_boxplot(data=claim_size_data, y=claim_size_data$TARGET_AMT,aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$CLM_FREQ,group=claim_size_data$CLM_FREQ))+
  labs(x='Claim Freq',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))+ylim(0,30000)

plotg<-ggplot()+geom_point(data=claim_size_data, aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$MVR_PTS))+
  labs(x='Points on License',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))+xlim(0,15)
ploth<-ggplot()+geom_point(data=claim_size_data, aes(y=claim_size_data$TARGET_AMT,x=claim_size_data$CAR_AGE))+
  labs(x='Car Age',y='Claim Amount')+ theme(panel.background = element_rect(fill = '#c7e2d1'))+xlim(0,30)
grid.arrange(plota,plotb,plotc,plotd,plote,plotg,ploth,nrow = 2)

set.seed(102)
# Mean-impute the remaining missing values
insurance_data$CAR_AGE[is.na(insurance_data$CAR_AGE)]<-mean(insurance_data$CAR_AGE,na.rm=TRUE)
insurance_data$AGE[is.na(insurance_data$AGE)]<-mean(insurance_data$AGE,na.rm=TRUE)
insurance_data$HOME_VAL[is.na(insurance_data$HOME_VAL)]<-mean(insurance_data$HOME_VAL,na.rm=TRUE)
# Hold out 25% of the cases as a testing set
testing_indices<-sample.int(length(insurance_data$AGE),size=.25*length(insurance_data$AGE))
testing_set<-insurance_data[testing_indices,]
training_set<-insurance_data[-testing_indices,]
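The 25% holdout above can be checked on a small example. This is a minimal sketch on a made-up data frame (toy, with a hypothetical id column), not the insurance data; floor() is added only to make the rounding of the sample size explicit.

```r
set.seed(102)

# Made-up stand-in for the cleaned data: 200 rows with a 0/1 target
n <- 200
toy <- data.frame(id = seq_len(n), target = rbinom(n, 1, 0.25))

# Draw 25% of the row indices without replacement for the testing set
testing_indices <- sample.int(n, size = floor(0.25 * n))
testing_toy  <- toy[testing_indices, ]
training_toy <- toy[-testing_indices, ]

# The two sets should partition the rows with no overlap
overlap <- length(intersect(testing_toy$id, training_toy$id))
print(c(test = nrow(testing_toy), train = nrow(training_toy), overlap = overlap))
```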

# Full logit model (EDUCATION is commented out of the formula)
first_logit<-glm(data=training_set, TARGET_FLAG~KIDSDRIV +AGE+HOMEKIDS +YOJ + INCOME + PARENT1+ HOME_VAL + MSTATUS + SEX + #EDUCATION +
                 TRAVTIME+ CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE + work + Panel_Truck + Pickup + Sports_Car + Van + SUV + Clerical +
                 Doctor + Homemaker + Lawyer + Manager + Professional + Student + BlueCollar ,family=binomial(link='logit'))
revoked_model<-glm(data=training_set, TARGET_FLAG~REVOKED,family=binomial(link='logit'))
# The stepwise regression model produced the model below.
step(revoked_model,scope=list(lower=revoked_model,upper=first_logit) ,direction="forward")
forward_step_model<-glm(data=training_set, TARGET_FLAG~REVOKED+work+HOME_VAL+MVR_PTS+CAR_USE+BLUEBOOK+PARENT1+Manager+TRAVTIME+KIDSDRIV+TIF+INCOME+CLM_FREQ+Sports_Car+SUV+MSTATUS+Clerical+Pickup+Van+Panel_Truck+CAR_AGE+BlueCollar+EDUCATION+Doctor+YOJ+HOMEKIDS,family=binomial(link='logit'))
summary(forward_step_model)
vif(forward_step_model)
plota<-ggplot()+geom_point(aes(x=seq_along(resid(forward_step_model)),y=resid(forward_step_model)),color='blue',shape=20,size=2)+
  theme(panel.background = element_rect(fill = '#d3dded'))+labs(x='Forward Step Model',y='Residuals')+ylim(-4,4)
plotb<-ggplot()+geom_point(aes(x=seq_along(cooks.distance(forward_step_model)),y=cooks.distance(forward_step_model)),color='blue',shape=20,size=2)+
  theme(panel.background = element_rect(fill = '#d3dded'))+labs(x='Forward Step Model',y="Cook's Distance")+ylim(0,.004)
grid.arrange(plota,plotb,nrow = 1)

means_group<-matrix(kmeans(training_set[,c(6,7,12,21,28,33,24,25)],2))
training_set<-cbind(training_set,means_group[1])
colnames(training_set)[37]<-'means_group'
kmeans_model<-glm(data=training_set, TARGET_FLAG~REVOKED +MSTATUS +MVR_PTS + work +CAR_USE +TRAVTIME +TIF+means_group,family=binomial(link='logit'))
summary(kmeans_model)
small_model<-glm(data=training_set, TARGET_FLAG~REVOKED +MSTATUS +MVR_PTS + work +CAR_USE +TRAVTIME +TIF,family=binomial(link='logit'))
summary(small_model)

max_model<-lm(data=training_set, TARGET_AMT~KIDSDRIV +AGE+HOMEKIDS +YOJ + INCOME + PARENT1+ HOME_VAL + MSTATUS + SEX + EDUCATION + TRAVTIME+ CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE + work + Panel_Truck + Pickup + Sports_Car + Van + SUV + Clerical +
              Doctor + Homemaker + Lawyer + Manager + Professional + Student + BlueCollar)
summary(max_model)

revoked_model<-lm(data=training_set, TARGET_AMT~REVOKED)
step(revoked_model,scope=list(lower=revoked_model,upper=max_model) ,direction="forward")
step_claimSize_model<-lm(data=training_set, TARGET_AMT~ REVOKED + MVR_PTS + CAR_USE + work + PARENT1 + INCOME + Manager + MSTATUS + CLM_FREQ + TIF + TRAVTIME + CAR_AGE + Sports_Car + Van + KIDSDRIV + SUV)
summary(step_claimSize_model)
small_model<-lm(data=training_set, TARGET_AMT~Sports_Car +MSTATUS + Manager + work + CAR_USE)
summary(small_model)

plota<-ggplot(data=insurance_data,aes(x=insurance_data$TARGET_FLAG,y=insurance_data$TARGET_AMT))+geom_violin(draw_quantiles = c(0.25, 0.5, 0.75),bw=.7)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill = '#b5c6fc'))+coord_flip()+ylab('Claim Sizes')+xlab('Actual Amount')+ylim(1,25000)
plotb<-ggplot(data=insurance_data,aes(x=insurance_data$TARGET_FLAG,y=insurance_data$TARGET_AMT))+geom_violin()+ylim(10000,50000)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill = '#b5c6fc'))+coord_flip()+ylab('Claim Sizes')+xlab('')
plotc<-ggplot(data=insurance_data,aes(x=insurance_data$TARGET_FLAG,y=insurance_data$TARGET_AMT))+geom_violin()+ylim(50000,75000)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill ='#b5c6fc'))+coord_flip()+ylab('Claim Sizes')+xlab('')
plotd<-ggplot(data=insurance_data,aes(x=insurance_data$TARGET_FLAG,y=insurance_data$TARGET_AMT))+geom_violin()+ylim(75000,100000)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill ='#b5c6fc'))+coord_flip()+ylab('Claim Sizes')+xlab('')
grid.arrange(plota,nrow = 1)

predictions<-as.data.frame(predict(max_model,testing_set))
plota<-ggplot(data=predictions,aes(x=1,y=predictions))+geom_violin(draw_quantiles = c(0.25, 0.5, 0.75),bw=.7)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill = 'palegoldenrod'))+coord_flip()+ylab('Claim Sizes')+xlab('Predicted')+ylim(1,25000)
plotb<-ggplot(data=predictions,aes(x=1,y=predictions))+geom_violin(draw_quantiles = c(0.25, 0.5, 0.75),bw=.7)+
  scale_color_manual(values=c("#022082", "#a3204e"))+ theme(panel.background = element_rect(fill = 'goldenrod'))+coord_flip()+ylab('Claim Sizes')+xlab('Predicted-Zoomed')+ylim(-2000,5000)
grid.arrange(plota,plotb,nrow = 2)

predictions<-cbind(predictions,testing_set$TARGET_AMT)

colnames(predictions)<-c('predictions','true_value')
predictions %>% mutate(diff_pred=predictions - true_value)->predictions
diff_pred_df<-as.data.frame(predictions$diff_pred)
plota<-ggplot()+geom_point(aes(x=seq(1,2040),y=diff_pred_df),color='cornsilk')+
  theme(panel.background = element_rect(fill = 'Darkcyan'))+ylab('Claim Size Difference')+xlab('')
grid.arrange(plota,nrow = 1)

# Model log(claim amount) on policies that actually had a claim
training_set %>% filter(!(TARGET_AMT==0)) %>% mutate(LOGGED_TARGET=log(TARGET_AMT))->logged_set
logged_model<-lm(data=logged_set, LOGGED_TARGET~ KIDSDRIV +AGE+HOMEKIDS +YOJ + INCOME + PARENT1+ HOME_VAL + MSTATUS + SEX + EDUCATION + TRAVTIME+ CAR_USE + BLUEBOOK + TIF + RED_CAR + OLDCLAIM + CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE + work + Panel_Truck + Pickup + Sports_Car + Van + SUV + Clerical + Doctor + Homemaker + Lawyer + Manager + Professional + Student + BlueCollar)
summary(logged_model)
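A model fitted to log(TARGET_AMT) predicts on the log scale, so its predictions need exp() before they can be compared with dollar amounts. The sketch below shows the round trip on simulated data; the variable names and the simulated relationship are assumptions made for illustration, not part of the original analysis.

```r
set.seed(1)

# Simulated positive claim amounts that depend on one made-up predictor x
x <- runif(300, 1, 10)
amount <- exp(7 + 0.1 * x + rnorm(300, sd = 0.5))
toy_claims <- data.frame(x = x, amount = amount)

# Fit on the log scale, as logged_model does
log_fit <- lm(log(amount) ~ x, data = toy_claims)

# Predictions come back on the log scale; exp() returns them to dollars
pred_log     <- predict(log_fit, newdata = toy_claims)
pred_dollars <- exp(pred_log)

summary(pred_dollars)
```

The plain exp() back-transform tends to land nearer the median than the mean of a right-skewed amount, so an average-cost estimate would likely need a further correction.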

means_group<-matrix(kmeans(testing_set[,c(6,7,12,21,28,33,24,25)],2))
testing_set<-cbind(testing_set,means_group[1])
colnames(testing_set)[37]<-'means_group'
kmeans_predictions<-predict(kmeans_model,testing_set)
ROC_set<-cbind(testing_set$TARGET_FLAG,kmeans_predictions)
roc_function_object<-roc(ROC_set[,1],ROC_set[,2])
plot(roc_function_object)
print('k-means model')
auc(roc_function_object)
# --------------------------------------------
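The code above runs kmeans() a second time on the testing set, so cluster 1 there need not mean the same thing as cluster 1 in the training set. One possible alternative, sketched here on simulated data, is to assign each test row to the nearest centre found on the training rows. The helper assign_cluster and all data below are hypothetical.

```r
set.seed(102)

# Simulated stand-ins for the numeric columns used for clustering
train_x <- matrix(rnorm(200), ncol = 2)
test_x  <- matrix(rnorm(60),  ncol = 2)

# Fit the clusters once, on the training rows only
km <- kmeans(train_x, centers = 2)

# Hypothetical helper: index of the nearest centre for every row of x
assign_cluster <- function(x, centers) {
  # squared Euclidean distance from each row of x to each centre (one column per centre)
  d <- sapply(seq_len(nrow(centers)),
              function(k) rowSums(sweep(x, 2, centers[k, ])^2))
  max.col(-d, ties.method = "first")   # column with the smallest distance
}

test_cluster <- assign_cluster(test_x, km$centers)
table(test_cluster)
```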

forward_step_predictions<-predict(forward_step_model,testing_set)
ROC_set<-cbind(testing_set$TARGET_FLAG,forward_step_predictions)
roc_function_object<-roc(ROC_set[,1],ROC_set[,2])
plot(roc_function_object)
print('forward step model')
auc(roc_function_object)
# --------------------------------------------

small_predictions<-predict(small_model,testing_set)
ROC_set<-cbind(testing_set$TARGET_FLAG,small_predictions)
roc_function_object<-roc(ROC_set[,1],ROC_set[,2])
plot(roc_function_object)
print('small model')
auc(roc_function_object)
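The predict() calls above do not set type, so for a glm they return values on the link (log-odds) scale rather than probabilities. AUC depends only on how the scores rank the cases, so the ROC results should not change. The sketch below illustrates this with a rank-based AUC written in base R; the function auc_by_rank and the four scores are made up for the example.

```r
# Hypothetical base-R AUC: the share of (claim, no-claim) pairs in which
# the claim case receives the higher score (Mann-Whitney form).
auc_by_rank <- function(labels, scores) {
  r <- rank(scores)                       # ties receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels      <- c(0, 0, 1, 1)
link_scores <- c(-2.0, 0.3, -0.1, 1.5)    # made-up log-odds
prob_scores <- plogis(link_scores)        # the same scores as probabilities

auc_link <- auc_by_rank(labels, link_scores)
auc_prob <- auc_by_rank(labels, prob_scores)
print(c(auc_link, auc_prob))
```

Probabilities would still be needed for anything that depends on the score's magnitude, such as a classification cutoff; those come from predict(..., type = 'response').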

plota<-ggplot()+geom_point(aes(x=seq_along(resid(step_claimSize_model)),y=resid(step_claimSize_model)),color='blue',shape=20,size=2)+
  theme(panel.background = element_rect(fill = '#d3dded'))+labs(x='Forward Step Model',y='Residuals')
plotb<-ggplot()+geom_point(aes(x=seq_along(cooks.distance(step_claimSize_model)),y=cooks.distance(step_claimSize_model)),color='blue',shape=20,size=2)+
  theme(panel.background = element_rect(fill = '#d3dded'))+labs(x='Forward Step Model',y="Cook's Distance")
grid.arrange(plota,plotb,nrow = 1)

results_from_step_model<-predict(step_claimSize_model,testing_set)
Metrics::rmse(results_from_step_model,testing_set[,2])
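The RMSE reported above is the square root of the mean squared difference between predicted and actual claim amounts. A minimal base-R sketch of the same calculation follows, with made-up numbers; rmse_base is a hypothetical name used to avoid confusion with Metrics::rmse.

```r
# Hypothetical base-R equivalent of the RMSE calculation
rmse_base <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up example: two policies with no claim and one 3000 claim,
# each predicted at a flat 1000
actual    <- c(0, 0, 3000)
predicted <- c(1000, 1000, 1000)
toy_rmse  <- rmse_base(actual, predicted)
print(toy_rmse)
```

Because the errors are squared, the single large miss contributes more to the result than the two smaller ones combined.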

# Apply the same cleaning steps to the evaluation data
evaluation_data<-as_data_frame(read.csv('https://raw.githubusercontent.com/WigodskyD/data-sets/master/insurance-evaluation-data.csv'),stringsAsFactors=FALSE)
evaluation_data$OLDCLAIM<-as.numeric(gsub('\\$|,','', evaluation_data$OLDCLAIM))
evaluation_data$INCOME<-as.numeric(gsub('\\$|,', '', evaluation_data$INCOME))
conditional_oldclaim<-evaluation_data$OLDCLAIM[which(evaluation_data$OLDCLAIM!=0)]
evaluation_data$OLDCLAIM<-cut(evaluation_data$OLDCLAIM,breaks=c(-.1,.1,3660,6050,9866,max(evaluation_data$OLDCLAIM)),labels=c(1:5))
evaluation_data %>% separate(URBANICITY, sep='/ ',into=c('home','work'))->evaluation_data
evaluation_data$home<- as.factor(gsub('z_','',evaluation_data$home))
evaluation_data$INCOME[is.na(evaluation_data$INCOME)]<-61898
evaluation_data$INCOME<-cut(evaluation_data$INCOME,breaks=c(-.1,22345,43660,65260,95555,max(evaluation_data$INCOME)),labels=c(1:5))
evaluation_data %>% spread(key=CAR_TYPE,value=CAR_TYPE)->evaluation_data
evaluation_data<-evaluation_data[,-c(25,27)]
colnames(evaluation_data)[26]<-'Panel_Truck'
colnames(evaluation_data)[30]<-'SUV'
colnames(evaluation_data)[28]<-'Sports_Car'
evaluation_data$Panel_Truck<-as.character(evaluation_data$Panel_Truck)
evaluation_data$Panel_Truck[evaluation_data$Panel_Truck=='Panel Truck']<-1
evaluation_data$Panel_Truck[is.na(evaluation_data$Panel_Truck)]<-0
evaluation_data$Sports_Car<-as.character(evaluation_data$Sports_Car)
evaluation_data$Sports_Car[evaluation_data$Sports_Car=='Sports Car']<-1
evaluation_data$Sports_Car[is.na(evaluation_data$Sports_Car)]<-0
evaluation_data$Pickup<-as.character(evaluation_data$Pickup)
evaluation_data$Pickup[evaluation_data$Pickup=='Pickup']<-1
evaluation_data$Pickup[is.na(evaluation_data$Pickup)]<-0
evaluation_data$Van<-as.character(evaluation_data$Van)
evaluation_data$Van[evaluation_data$Van=='Van']<-1
evaluation_data$Van[is.na(evaluation_data$Van)]<-0
evaluation_data$SUV<-as.character(evaluation_data$SUV)
evaluation_data$SUV[evaluation_data$SUV=='z_SUV']<-1
evaluation_data$SUV[is.na(evaluation_data$SUV)]<-0
cols<-c('work','Panel_Truck','Pickup','Sports_Car','Van','SUV')
evaluation_data[cols] <- lapply(evaluation_data[cols], factor)
evaluation_data$BLUEBOOK<-as.numeric(gsub('\\$|,','', evaluation_data$BLUEBOOK))
evaluation_data$JOB<-as.character(evaluation_data$JOB)
evaluation_data$JOB[evaluation_data$JOB=='']<-'None'
evaluation_data %>% spread(key=JOB,value=JOB,fill='0')->evaluation_data
colnames(evaluation_data)[32]<-'Homemaker'
colnames(evaluation_data)[38]<-'BlueCollar'
evaluation_data$Clerical %>% recode('Clerical' = '1')->evaluation_data$Clerical
evaluation_data$Doctor %>% recode('Doctor' = '1')->evaluation_data$Doctor
evaluation_data$Homemaker %>% recode('Home Maker' = '1')->evaluation_data$Homemaker
evaluation_data$Lawyer %>% recode('Lawyer' = '1')->evaluation_data$Lawyer
evaluation_data$Manager %>% recode('Manager' = '1')->evaluation_data$Manager
evaluation_data$Professional %>% recode('Professional' = '1')->evaluation_data$Professional
evaluation_data$Student %>% recode('Student' = '1')->evaluation_data$Student
evaluation_data$BlueCollar %>% recode('z_Blue Collar' = '1')->evaluation_data$BlueCollar
evaluation_data$YOJ[is.na(evaluation_data$YOJ)]<-0
evaluation_data<-evaluation_data[,-c(1,35)]
evaluation_data$KIDSDRIV<-as.factor(evaluation_data$KIDSDRIV)
evaluation_data$HOME_VAL<-as.numeric(gsub('\\$|,','', evaluation_data$HOME_VAL))
# '<High School'='1','Bachelors'='3','Masters'='4','PhD'='5','z_High School'='2'
evaluation_data$EDUCATION %>% recode('<High School'='1','Bachelors'='3','Masters'='4','PhD'='5','z_High School'='2')
evaluation_data$SEX<-as.character(evaluation_data$SEX)
evaluation_data$MSTATUS<-as.character(evaluation_data$MSTATUS)
evaluation_data$SEX[evaluation_data$SEX=='z_F']<-'F'
evaluation_data$MSTATUS[evaluation_data$MSTATUS=='z_No']<-'No'
evaluation_data$EDUCATION<-as.numeric(evaluation_data$EDUCATION)
evaluation_data$Manager<-as.numeric(evaluation_data$Manager)
evaluation_data$Sports_Car<-as.numeric(evaluation_data$Sports_Car)
evaluation_data$Van<-as.numeric(evaluation_data$Van)
evaluation_data$KIDSDRIV<-as.numeric(evaluation_data$KIDSDRIV)
evaluation_data$SUV<-as.numeric(evaluation_data$SUV)
# Predict claim amounts; predictions that come back NA are set to 2000
results_to_report<-predict(step_claimSize_model,evaluation_data)
results_to_report[is.na(results_to_report)]<-2000
results_to_report
write.csv(results_to_report,'C:/Users/dawig/Desktop/Data621/Homework_4/WigodskyDanpredictions4.csv')