Data 621 Homework #4:

Overview

In this homework assignment, you will explore, analyze and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the person was not in a car crash. The second response variable is TARGET_AMT. This value is zero if the person did not crash their car. But if they did crash their car, this number will be a value greater than zero.

Your objective is to build multiple linear regression and binary logistic regression models on the training data to predict the probability that a person will crash their car and also the amount of money it will cost if the person does crash their car. You can only use the variables given to you (or variables that you derive from the variables provided).

Data Exploration and Data Preparation

First, we imported the data from GitHub, removed the index column and inspected the structure of the data.

## 'data.frame':    8161 obs. of  25 variables:
##  $ TARGET_FLAG: int  0 0 0 0 0 1 0 1 1 0 ...
##  $ TARGET_AMT : num  0 0 0 0 0 ...
##  $ KIDSDRIV   : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ AGE        : int  60 43 35 51 50 34 54 37 34 50 ...
##  $ HOMEKIDS   : int  0 0 1 0 0 1 0 2 0 0 ...
##  $ YOJ        : int  11 11 10 14 NA 12 NA NA 10 7 ...
##  $ INCOME     : Factor w/ 6613 levels "","$0","$1,007",..: 5033 6292 1250 1 509 746 1488 315 4765 282 ...
##  $ PARENT1    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
##  $ HOME_VAL   : Factor w/ 5107 levels "","$0","$100,093",..: 2 3259 348 3917 3034 2 1 4167 2 2 ...
##  $ MSTATUS    : Factor w/ 2 levels "Yes","z_No": 2 2 1 1 1 2 1 1 2 2 ...
##  $ SEX        : Factor w/ 2 levels "M","z_F": 1 1 2 1 2 2 2 1 2 1 ...
##  $ EDUCATION  : Factor w/ 5 levels "<High School",..: 4 5 5 1 4 2 1 2 2 2 ...
##  $ JOB        : Factor w/ 9 levels "","Clerical",..: 7 9 2 9 3 9 9 9 2 7 ...
##  $ TRAVTIME   : int  14 22 5 32 36 46 33 44 34 48 ...
##  $ CAR_USE    : Factor w/ 2 levels "Commercial","Private": 2 1 2 2 2 1 2 1 2 1 ...
##  $ BLUEBOOK   : Factor w/ 2789 levels "$1,500","$1,520",..: 434 503 2212 553 802 746 2672 701 135 852 ...
##  $ TIF        : int  11 1 4 7 1 1 1 1 1 7 ...
##  $ CAR_TYPE   : Factor w/ 6 levels "Minivan","Panel Truck",..: 1 1 6 1 6 4 6 5 6 5 ...
##  $ RED_CAR    : Factor w/ 2 levels "no","yes": 2 2 1 2 1 1 1 2 1 1 ...
##  $ OLDCLAIM   : Factor w/ 2857 levels "$0","$1,000",..: 1449 1 1311 1 432 1 1 510 1 1 ...
##  $ CLM_FREQ   : int  2 0 2 0 2 0 0 1 0 0 ...
##  $ REVOKED    : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 2 1 1 ...
##  $ MVR_PTS    : int  3 0 3 0 3 0 0 10 0 1 ...
##  $ CAR_AGE    : int  18 1 10 6 17 7 1 7 1 17 ...
##  $ URBANICITY : Factor w/ 2 levels "Highly Urban/ Urban",..: 1 1 1 1 1 1 1 1 1 2 ...

We removed special characters (such as the "$" and "," in INCOME, HOME_VAL, BLUEBOOK and OLDCLAIM) and converted those variables to numbers in both the training and evaluation data.
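
The cleaning step described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the helper name `money_to_numeric` is hypothetical, and the choice to treat blank strings as missing is an assumption based on the "" factor level visible in the structure output.

```r
# Hypothetical helper: strip "$" and "," and convert to numeric.
# Blank strings (the "" factor level seen in str()) are treated as NA (assumption).
money_to_numeric <- function(x) {
  x <- as.character(x)        # factors must go through character first
  x <- gsub("[$,]", "", x)    # remove dollar signs and thousands separators
  x[x == ""] <- NA            # blanks become missing values
  as.numeric(x)
}

# Toy values in the same format as INCOME / HOME_VAL / BLUEBOOK / OLDCLAIM
raw <- factor(c("$1,007", "$0", "", "$100,093"))
money_to_numeric(raw)
```

Converting through `as.character()` matters here: calling `as.numeric()` directly on a factor returns the internal level codes rather than the dollar amounts.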

We then split the training data into a train set and a test set. The summary below describes the train set (6,529 of the 8,161 records).
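
A random split of this kind can be sketched in base R. The 80% share is inferred from the row counts in the output (6,529 of 8,161); the seed, the stand-in data frame `dat` and the use of simple random sampling are assumptions, since the report does not show how the split was made.

```r
set.seed(621)                         # hypothetical seed, for reproducibility only
dat <- data.frame(id = 1:8161)        # stand-in for the cleaned training data

n_train   <- round(0.8 * nrow(dat))   # assumed 80% share
train_idx <- sample(seq_len(nrow(dat)), size = n_train)

train <- dat[train_idx, , drop = FALSE]    # rows used to fit the models
test  <- dat[-train_idx, , drop = FALSE]   # held-out rows for model selection
c(train = nrow(train), test = nrow(test))
```

Because `TARGET_FLAG` is imbalanced (about 73.5% zeros in the train set), a stratified split would be a reasonable alternative to plain random sampling.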

##   TARGET_FLAG      TARGET_AMT       KIDSDRIV           AGE       
##  Min.   :0.000   Min.   :    0   Min.   :0.0000   Min.   :16.00  
##  1st Qu.:0.000   1st Qu.:    0   1st Qu.:0.0000   1st Qu.:39.00  
##  Median :0.000   Median :    0   Median :0.0000   Median :45.00  
##  Mean   :0.265   Mean   : 1491   Mean   :0.1731   Mean   :44.85  
##  3rd Qu.:1.000   3rd Qu.: 1102   3rd Qu.:0.0000   3rd Qu.:51.00  
##  Max.   :1.000   Max.   :85524   Max.   :4.0000   Max.   :76.00  
##                                                   NA's   :6      
##     HOMEKIDS           YOJ            INCOME       PARENT1       HOME_VAL     
##  Min.   :0.0000   Min.   : 0.00   Min.   :     0   No :5663   Min.   :     0  
##  1st Qu.:0.0000   1st Qu.: 9.00   1st Qu.: 27646   Yes: 866   1st Qu.:     0  
##  Median :0.0000   Median :11.00   Median : 54005              Median :160945  
##  Mean   :0.7265   Mean   :10.49   Mean   : 61552              Mean   :154188  
##  3rd Qu.:1.0000   3rd Qu.:13.00   3rd Qu.: 85697              3rd Qu.:238750  
##  Max.   :5.0000   Max.   :19.00   Max.   :367030              Max.   :885282  
##                   NA's   :370     NA's   :350                 NA's   :358     
##  MSTATUS      SEX               EDUCATION               JOB      
##  Yes :3936   M  :3033   <High School : 971   z_Blue Collar:1476  
##  z_No:2593   z_F:3496   Bachelors    :1798   Clerical     : 997  
##                         Masters      :1324   Professional : 901  
##                         PhD          : 577   Manager      : 783  
##                         z_High School:1859   Lawyer       : 665  
##                                              Student      : 573  
##                                              (Other)      :1134  
##     TRAVTIME            CAR_USE        BLUEBOOK          TIF        
##  Min.   :  5.00   Commercial:2440   Min.   : 1500   Min.   : 1.000  
##  1st Qu.: 23.00   Private   :4089   1st Qu.: 9260   1st Qu.: 1.000  
##  Median : 33.00                     Median :14440   Median : 4.000  
##  Mean   : 33.58                     Mean   :15684   Mean   : 5.357  
##  3rd Qu.: 44.00                     3rd Qu.:20800   3rd Qu.: 7.000  
##  Max.   :142.00                     Max.   :65970   Max.   :25.000  
##                                                                     
##         CAR_TYPE    RED_CAR       OLDCLAIM        CLM_FREQ      REVOKED   
##  Minivan    :1706   no :4623   Min.   :    0   Min.   :0.0000   No :5742  
##  Panel Truck: 550   yes:1906   1st Qu.:    0   1st Qu.:0.0000   Yes: 787  
##  Pickup     :1083              Median :    0   Median :0.0000             
##  Sports Car : 732              Mean   : 3982   Mean   :0.7961             
##  Van        : 612              3rd Qu.: 4633   3rd Qu.:2.0000             
##  z_SUV      :1846              Max.   :57037   Max.   :5.0000             
##                                                                           
##     MVR_PTS          CAR_AGE                       URBANICITY  
##  Min.   : 0.000   Min.   : 0.000   Highly Urban/ Urban  :5169  
##  1st Qu.: 0.000   1st Qu.: 1.000   z_Highly Rural/ Rural:1360  
##  Median : 1.000   Median : 8.000                               
##  Mean   : 1.695   Mean   : 8.255                               
##  3rd Qu.: 3.000   3rd Qu.:12.000                               
##  Max.   :13.000   Max.   :28.000                               
##                   NA's   :415
##       variable q_zeros p_zeros q_na p_na q_inf p_inf    type unique
## 1  TARGET_FLAG    4799   73.50    0 0.00     0     0 integer      2
## 2   TARGET_AMT    4799   73.50    0 0.00     0     0 numeric   1595
## 3     KIDSDRIV    5735   87.84    0 0.00     0     0 integer      5
## 4          AGE       0    0.00    6 0.09     0     0 integer     57
## 5     HOMEKIDS    4219   64.62    0 0.00     0     0 integer      6
## 6          YOJ     512    7.84  370 5.67     0     0 integer     20
## 7       INCOME     507    7.77  350 5.36     0     0 numeric   5347
## 8      PARENT1       0    0.00    0 0.00     0     0  factor      2
## 9     HOME_VAL    1852   28.37  358 5.48     0     0 numeric   4121
## 10     MSTATUS       0    0.00    0 0.00     0     0  factor      2
## 11         SEX       0    0.00    0 0.00     0     0  factor      2
## 12   EDUCATION       0    0.00    0 0.00     0     0  factor      5
## 13         JOB       0    0.00    0 0.00     0     0  factor      9
## 14    TRAVTIME       0    0.00    0 0.00     0     0 integer     95
## 15     CAR_USE       0    0.00    0 0.00     0     0  factor      2
## 16    BLUEBOOK       0    0.00    0 0.00     0     0 numeric   2572
## 17         TIF       0    0.00    0 0.00     0     0 integer     23
## 18    CAR_TYPE       0    0.00    0 0.00     0     0  factor      6
## 19     RED_CAR       0    0.00    0 0.00     0     0  factor      2
## 20    OLDCLAIM    4006   61.36    0 0.00     0     0 numeric   2336
## 21    CLM_FREQ    4006   61.36    0 0.00     0     0 integer      6
## 22     REVOKED       0    0.00    0 0.00     0     0  factor      2
## 23     MVR_PTS    2967   45.44    0 0.00     0     0 integer     13
## 24     CAR_AGE       2    0.03  415 6.36     0     0 integer     28
## 25  URBANICITY       0    0.00    0 0.00     0     0  factor      2
## [1] "TARGET_FLAG" "TARGET_AMT"  "KIDSDRIV"    "HOMEKIDS"    "OLDCLAIM"   
## [6] "CLM_FREQ"

##   PARENT1 frequency percentage cumulative_perc
## 1      No      5663      86.74           86.74
## 2     Yes       866      13.26          100.00

##   MSTATUS frequency percentage cumulative_perc
## 1     Yes      3936      60.28           60.28
## 2    z_No      2593      39.72          100.00

##   SEX frequency percentage cumulative_perc
## 1 z_F      3496      53.55           53.55
## 2   M      3033      46.45          100.00

##       EDUCATION frequency percentage cumulative_perc
## 1 z_High School      1859      28.47           28.47
## 2     Bachelors      1798      27.54           56.01
## 3       Masters      1324      20.28           76.29
## 4  <High School       971      14.87           91.16
## 5           PhD       577       8.84          100.00

##             JOB frequency percentage cumulative_perc
## 1 z_Blue Collar      1476      22.61           22.61
## 2      Clerical       997      15.27           37.88
## 3  Professional       901      13.80           51.68
## 4       Manager       783      11.99           63.67
## 5        Lawyer       665      10.19           73.86
## 6       Student       573       8.78           82.64
## 7    Home Maker       517       7.92           90.56
## 8                     419       6.42           96.98
## 9        Doctor       198       3.03          100.00

##      CAR_USE frequency percentage cumulative_perc
## 1    Private      4089      62.63           62.63
## 2 Commercial      2440      37.37          100.00

##      CAR_TYPE frequency percentage cumulative_perc
## 1       z_SUV      1846      28.27           28.27
## 2     Minivan      1706      26.13           54.40
## 3      Pickup      1083      16.59           70.99
## 4  Sports Car       732      11.21           82.20
## 5         Van       612       9.37           91.57
## 6 Panel Truck       550       8.42          100.00

##   RED_CAR frequency percentage cumulative_perc
## 1      no      4623      70.81           70.81
## 2     yes      1906      29.19          100.00

##   REVOKED frequency percentage cumulative_perc
## 1      No      5742      87.95           87.95
## 2     Yes       787      12.05          100.00

##              URBANICITY frequency percentage cumulative_perc
## 1   Highly Urban/ Urban      5169      79.17           79.17
## 2 z_Highly Rural/ Rural      1360      20.83          100.00
## [1] "Variables processed: PARENT1, MSTATUS, SEX, EDUCATION, JOB, CAR_USE, CAR_TYPE, RED_CAR, REVOKED, URBANICITY"

We can determine the skewness and kurtosis of the data.
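
As a rough sketch of how these two statistics can be computed, the functions below use the plain moment-based definitions in base R. The function names and the toy vector are illustrative assumptions; the report does not say which package or formula was used.

```r
# Moment-based skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based kurtosis: fourth central moment over variance^2
# (a normal distribution has kurtosis 3 under this definition)
kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2
}

# Toy right-skewed vector with many zeros, loosely mimicking TARGET_AMT
x <- c(0, 0, 0, 0, 0, 0, 1102, 4000, 85524)
c(skewness = skewness(x), kurtosis = kurtosis(x))
```

A large positive skewness, as expected for variables such as TARGET_AMT and OLDCLAIM where most values are zero, suggests that a log or similar transformation may be worth considering.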

Build Models

Predicting car crash

For the first logistic regression model, we selected the following variables.

## 
## Call:
## glm(formula = TARGET_FLAG ~ YOJ + INCOME + PARENT1 + HOME_VAL + 
##     MSTATUS + SEX + EDUCATION + JOB + TRAVTIME + CAR_USE + TIF + 
##     CAR_TYPE + RED_CAR + REVOKED + URBANICITY + PTSAGE, family = "binomial", 
##     data = train2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1603  -0.7234  -0.4181   0.6649   3.0602  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -1.154e+00  3.097e-01  -3.727 0.000194 ***
## YOJ                             -6.191e-03  9.490e-03  -0.652 0.514169    
## INCOME                          -2.730e-06  1.238e-06  -2.204 0.027514 *  
## PARENT1Yes                       5.639e-01  1.047e-01   5.383 7.32e-08 ***
## HOME_VAL                        -1.351e-06  3.913e-07  -3.454 0.000553 ***
## MSTATUSz_No                      3.830e-01  9.266e-02   4.133 3.58e-05 ***
## SEXz_F                          -2.449e-01  1.175e-01  -2.085 0.037062 *  
## EDUCATIONBachelors              -3.601e-01  1.244e-01  -2.896 0.003784 ** 
## EDUCATIONMasters                -3.924e-01  1.868e-01  -2.101 0.035649 *  
## EDUCATIONPhD                    -1.700e-01  2.270e-01  -0.749 0.453831    
## EDUCATIONz_High School           7.008e-02  1.083e-01   0.647 0.517416    
## JOBClerical                      4.164e-01  2.240e-01   1.859 0.063050 .  
## JOBDoctor                       -6.475e-01  3.043e-01  -2.128 0.033362 *  
## JOBHome Maker                    2.450e-01  2.379e-01   1.030 0.303225    
## JOBLawyer                        9.244e-02  1.911e-01   0.484 0.628575    
## JOBManager                      -6.692e-01  1.978e-01  -3.383 0.000717 ***
## JOBProfessional                  8.490e-02  2.034e-01   0.417 0.676417    
## JOBStudent                       3.574e-01  2.444e-01   1.462 0.143642    
## JOBz_Blue Collar                 2.867e-01  2.122e-01   1.351 0.176615    
## TRAVTIME                         1.593e-02  2.122e-03   7.509 5.94e-14 ***
## CAR_USEPrivate                  -6.998e-01  1.050e-01  -6.665 2.64e-11 ***
## TIF                             -5.058e-02  8.294e-03  -6.099 1.07e-09 ***
## CAR_TYPEPanel Truck              3.056e-01  1.613e-01   1.895 0.058144 .  
## CAR_TYPEPickup                   5.584e-01  1.151e-01   4.853 1.22e-06 ***
## CAR_TYPESports Car               1.199e+00  1.374e-01   8.724  < 2e-16 ***
## CAR_TYPEVan                      4.925e-01  1.393e-01   3.536 0.000407 ***
## CAR_TYPEz_SUV                    9.610e-01  1.162e-01   8.272  < 2e-16 ***
## RED_CARyes                      -5.146e-02  9.856e-02  -0.522 0.601606    
## REVOKEDYes                       7.648e-01  9.198e-02   8.315  < 2e-16 ***
## URBANICITYz_Highly Rural/ Rural -2.436e+00  1.255e-01 -19.415  < 2e-16 ***
## PTSAGE                           5.356e+00  5.792e-01   9.247  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7129.6  on 6170  degrees of freedom
## Residual deviance: 5609.8  on 6140  degrees of freedom
##   (358 observations deleted due to missingness)
## AIC: 5671.8
## 
## Number of Fisher Scoring iterations: 5

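However, we removed the variables that we deemed insufficient (YOJ, SEX, JOB and RED_CAR) and refit the model. The reduced model is simpler, at the cost of a slightly higher AIC (5717.2 versus 5671.8).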

## 
## Call:
## glm(formula = TARGET_FLAG ~ INCOME + PARENT1 + HOME_VAL + MSTATUS + 
##     EDUCATION + TRAVTIME + CAR_USE + TIF + CAR_TYPE + REVOKED + 
##     URBANICITY + PTSAGE, family = "binomial", data = train2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1696  -0.7337  -0.4349   0.6606   3.0671  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -8.296e-01  1.684e-01  -4.928 8.31e-07 ***
## INCOME                          -4.457e-06  1.120e-06  -3.981 6.87e-05 ***
## PARENT1Yes                       5.555e-01  1.031e-01   5.385 7.22e-08 ***
## HOME_VAL                        -1.425e-06  3.774e-07  -3.775 0.000160 ***
## MSTATUSz_No                      3.718e-01  9.037e-02   4.115 3.88e-05 ***
## EDUCATIONBachelors              -5.966e-01  1.115e-01  -5.352 8.68e-08 ***
## EDUCATIONMasters                -6.731e-01  1.251e-01  -5.380 7.44e-08 ***
## EDUCATIONPhD                    -6.456e-01  1.665e-01  -3.877 0.000106 ***
## EDUCATIONz_High School          -4.559e-02  1.044e-01  -0.437 0.662453    
## TRAVTIME                         1.646e-02  2.102e-03   7.827 4.99e-15 ***
## CAR_USEPrivate                  -8.303e-01  8.391e-02  -9.895  < 2e-16 ***
## TIF                             -4.973e-02  8.240e-03  -6.035 1.59e-09 ***
## CAR_TYPEPanel Truck              2.685e-01  1.481e-01   1.813 0.069811 .  
## CAR_TYPEPickup                   5.028e-01  1.118e-01   4.496 6.93e-06 ***
## CAR_TYPESports Car               1.044e+00  1.186e-01   8.808  < 2e-16 ***
## CAR_TYPEVan                      4.819e-01  1.342e-01   3.590 0.000330 ***
## CAR_TYPEz_SUV                    8.294e-01  9.490e-02   8.739  < 2e-16 ***
## REVOKEDYes                       7.795e-01  9.108e-02   8.559  < 2e-16 ***
## URBANICITYz_Highly Rural/ Rural -2.360e+00  1.250e-01 -18.875  < 2e-16 ***
## PTSAGE                           5.541e+00  5.745e-01   9.645  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7129.6  on 6170  degrees of freedom
## Residual deviance: 5677.2  on 6151  degrees of freedom
##   (358 observations deleted due to missingness)
## AIC: 5717.2
## 
## Number of Fisher Scoring iterations: 5

After removing the unnecessary variables, most terms in the model have significant p-values. The exceptions are one level of education (EDUCATIONz_High School) and one level of car type (CAR_TYPEPanel Truck). All coefficients fall in line with their theoretical effects.
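
One way to check coefficients against their theoretical effects is to convert them to odds ratios by exponentiating. The sketch below copies a few estimates from the reduced model output; the label `URBANICITY_Rural` is shortened here for readability and is not the name R prints.

```r
# Estimates copied from the reduced logistic model output above
coefs <- c(PARENT1Yes       =  0.5555,
           REVOKEDYes       =  0.7795,
           CAR_USEPrivate   = -0.8303,
           URBANICITY_Rural = -2.360,
           TRAVTIME         =  0.01646)

# exp(coefficient) is the multiplicative change in the odds of a crash
odds_ratios <- exp(coefs)
round(odds_ratios, 3)

# Effect of 10 extra minutes of travel time on the odds
travtime_10 <- exp(10 * coefs[["TRAVTIME"]])
travtime_10
```

Read this way, being a single parent multiplies the odds of a crash by roughly 1.74, holding the other variables fixed, while living in a rural area cuts the odds to under a tenth of the urban value. Both directions match what one would expect.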

Amount Predicted

## 
## Call:
## lm(formula = TARGET_AMT ~ . - TARGET_FLAG, data = train2_claims)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8473  -3015  -1393    568  76295 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      3.085e+03  1.773e+03   1.741 0.081949 .  
## YOJ                              4.300e+01  5.164e+01   0.833 0.405148    
## INCOME                          -4.142e-03  7.301e-03  -0.567 0.570612    
## PARENT1Yes                      -3.944e+02  5.176e+02  -0.762 0.446170    
## HOME_VAL                         1.232e-03  2.192e-03   0.562 0.574090    
## MSTATUSz_No                      1.161e+03  5.091e+02   2.281 0.022660 *  
## SEXz_F                          -1.011e+03  7.043e+02  -1.436 0.151154    
## EDUCATIONBachelors               8.035e+01  6.935e+02   0.116 0.907772    
## EDUCATIONMasters                 1.442e+03  1.182e+03   1.220 0.222527    
## EDUCATIONPhD                     1.492e+03  1.393e+03   1.071 0.284439    
## EDUCATIONz_High School          -7.167e+02  5.571e+02  -1.287 0.198413    
## JOBClerical                      6.019e+02  1.300e+03   0.463 0.643432    
## JOBDoctor                       -1.132e+03  1.927e+03  -0.587 0.557010    
## JOBHome Maker                    1.299e+03  1.359e+03   0.956 0.339060    
## JOBLawyer                        9.975e+02  1.103e+03   0.904 0.366077    
## JOBManager                      -1.581e+02  1.193e+03  -0.133 0.894599    
## JOBProfessional                  2.152e+03  1.219e+03   1.766 0.077621 .  
## JOBStudent                       1.523e+03  1.385e+03   1.099 0.271811    
## JOBz_Blue Collar                 1.619e+03  1.241e+03   1.304 0.192348    
## TRAVTIME                        -2.845e+00  1.181e+01  -0.241 0.809624    
## CAR_USEPrivate                  -1.720e+02  5.619e+02  -0.306 0.759581    
## BLUEBOOK                         1.186e-01  3.280e-02   3.617 0.000308 ***
## TIF                              3.672e+00  4.486e+01   0.082 0.934772    
## CAR_TYPEPanel Truck             -5.591e+02  1.028e+03  -0.544 0.586808    
## CAR_TYPEPickup                   1.181e+01  6.455e+02   0.018 0.985405    
## CAR_TYPESports Car               1.345e+03  7.953e+02   1.691 0.091001 .  
## CAR_TYPEVan                     -4.801e+02  8.319e+02  -0.577 0.563937    
## CAR_TYPEz_SUV                    8.016e+02  7.101e+02   1.129 0.259130    
## RED_CARyes                      -1.670e+01  5.347e+02  -0.031 0.975087    
## REVOKEDYes                      -9.291e+02  4.458e+02  -2.084 0.037277 *  
## CAR_AGE                         -1.147e+02  4.753e+01  -2.414 0.015877 *  
## URBANICITYz_Highly Rural/ Rural -5.489e+02  8.108e+02  -0.677 0.498498    
## PTSAGE                           2.351e+03  2.599e+03   0.904 0.365915    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7210 on 1599 degrees of freedom
##   (98 observations deleted due to missingness)
## Multiple R-squared:  0.03284,    Adjusted R-squared:  0.01349 
## F-statistic: 1.697 on 32 and 1599 DF,  p-value: 0.009073

Most of the variables are not significant, so the next model keeps only MSTATUS, BLUEBOOK and CAR_AGE.

## 
## Call:
## lm(formula = TARGET_AMT ~ MSTATUS + BLUEBOOK + CAR_AGE, data = train2_claims)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7721  -3027  -1490    351  78332 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4339.86307  423.06857  10.258  < 2e-16 ***
## MSTATUSz_No  754.61699  347.16539   2.174   0.0299 *  
## BLUEBOOK       0.09451    0.02106   4.487 7.68e-06 ***
## CAR_AGE      -60.72690   33.03295  -1.838   0.0662 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7200 on 1726 degrees of freedom
## Multiple R-squared:  0.01471,    Adjusted R-squared:  0.013 
## F-statistic: 8.591 on 3 and 1726 DF,  p-value: 1.163e-05

The coefficients are in line with theoretical effects in this model, although it explains little of the variation in TARGET_AMT (adjusted R-squared of 0.013).
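
Both amount models are fitted on `train2_claims`, which has 1,730 rows (1,726 residual degrees of freedom plus 4 coefficients). That matches the number of crash records in the train set (6,529 minus 4,799 zeros), so the amount is modelled on claims only. A minimal sketch of that setup follows, using synthetic data: every value, the data frame `toy` and the coefficients used to simulate it are invented for illustration.

```r
set.seed(1)                                   # hypothetical seed
n <- 500

# Synthetic stand-in for the train set (not the real data)
toy <- data.frame(
  TARGET_FLAG = rbinom(n, 1, 0.27),
  BLUEBOOK    = round(runif(n, 1500, 40000)),
  CAR_AGE     = sample(0:20, n, replace = TRUE)
)
simulated_cost <- 4000 + 0.1 * toy$BLUEBOOK - 60 * toy$CAR_AGE + rnorm(n, sd = 3000)
toy$TARGET_AMT <- ifelse(toy$TARGET_FLAG == 1, pmax(100, simulated_cost), 0)

# Keep only records with a crash, then model the claim amount
claims <- subset(toy, TARGET_FLAG == 1)
fit    <- lm(TARGET_AMT ~ BLUEBOOK + CAR_AGE, data = claims)

coef(fit)
summary(fit)$adj.r.squared
```

Fitting on claims only keeps the large mass of zero amounts from dominating the regression; the logistic model then handles the crash / no-crash question separately.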

Select Model

Linear Models

##   TARGET_FLAG   TARGET_AMT           YOJ            INCOME       PARENT1  
##  Min.   :1    Min.   :  159.2   Min.   : 0.00   Min.   :     0   No :275  
##  1st Qu.:1    1st Qu.: 2632.2   1st Qu.: 9.00   1st Qu.: 17853   Yes: 81  
##  Median :1    Median : 4159.5   Median :11.00   Median : 41299            
##  Mean   :1    Mean   : 5616.0   Mean   :10.08   Mean   : 49377            
##  3rd Qu.:1    3rd Qu.: 5727.7   3rd Qu.:13.00   3rd Qu.: 70128            
##  Max.   :1    Max.   :60838.1   Max.   :19.00   Max.   :320127            
##                                 NA's   :24      NA's   :16                
##     HOME_VAL      MSTATUS     SEX              EDUCATION              JOB    
##  Min.   :     0   Yes :172   M  :162   <High School : 58   z_Blue Collar:97  
##  1st Qu.:     0   z_No:184   z_F:194   Bachelors    : 84   Clerical     :66  
##  Median :101563                        Masters      : 55   Student      :46  
##  Mean   :108545                        PhD          : 18   Home Maker   :37  
##  3rd Qu.:190761                        z_High School:141   Professional :36  
##  Max.   :750455                                                         :25  
##  NA's   :18                                                (Other)      :49  
##     TRAVTIME           CAR_USE       BLUEBOOK          TIF        
##  Min.   : 5.00   Commercial:178   Min.   : 1500   Min.   : 1.000  
##  1st Qu.:24.00   Private   :178   1st Qu.: 7338   1st Qu.: 1.000  
##  Median :35.00                    Median :12245   Median : 4.000  
##  Mean   :35.17                    Mean   :14643   Mean   : 4.747  
##  3rd Qu.:46.00                    3rd Qu.:20215   3rd Qu.: 7.000  
##  Max.   :81.00                    Max.   :62240   Max.   :18.000  
##                                                                   
##         CAR_TYPE   RED_CAR   REVOKED      CAR_AGE      
##  Minivan    : 65   no :254   No :297   Min.   : 1.000  
##  Panel Truck: 35   yes:102   Yes: 59   1st Qu.: 1.000  
##  Pickup     : 76                       Median : 7.000  
##  Sports Car : 49                       Mean   : 7.061  
##  Van        : 29                       3rd Qu.:10.750  
##  z_SUV      :102                       Max.   :22.000  
##                                        NA's   :30      
##                  URBANICITY      PTSAGE       
##  Highly Urban/ Urban  :343   Min.   :0.00000  
##  z_Highly Rural/ Rural: 13   1st Qu.:0.00000  
##                              Median :0.04651  
##                              Mean   :0.06580  
##                              3rd Qu.:0.10217  
##                              Max.   :0.42308  
## 
##          [,1]
## [1,] 45889850
## [2,] 47757957
## [3,] 47757957

Logit Models

## fitting null model for pseudo-r2
##           llh       llhNull            G2      McFadden          r2ML 
## -2804.9002913 -3564.8219813  1519.8433799     0.2131724     0.2183030 
##          r2CU 
##     0.3186664
## fitting null model for pseudo-r2
##           llh       llhNull            G2      McFadden          r2ML 
## -2838.6188224 -3564.8219813  1452.4063178     0.2037137     0.2097137 
##          r2CU 
##     0.3061283
## [1] "Accuracy 0.784"
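
The pseudo R-squared values printed above for the first logistic model can be reproduced from its two log-likelihoods. The sketch below uses the standard formulas for McFadden, maximum-likelihood (Cox and Snell) and Cragg-Uhler (Nagelkerke) R-squared. The sample size n = 6171 is an assumption inferred from the null deviance degrees of freedom (6170) plus one.

```r
# Log-likelihoods copied from the first pseudo-R2 output above
llh      <- -2804.9002913   # fitted model
llh_null <- -3564.8219813   # intercept-only model
n        <- 6171            # assumed: 6170 null deviance df + 1

g2       <- -2 * (llh_null - llh)                 # likelihood-ratio statistic
mcfadden <- 1 - llh / llh_null                    # McFadden
r2_ml    <- 1 - exp(-g2 / n)                      # maximum likelihood (Cox and Snell)
r2_cu    <- r2_ml / (1 - exp(2 * llh_null / n))   # Cragg-Uhler (Nagelkerke)

round(c(G2 = g2, McFadden = mcfadden, r2ML = r2_ml, r2CU = r2_cu), 4)
```

On McFadden's measure the full model (0.213) is only slightly ahead of the reduced model (0.204), which is consistent with the small AIC difference between the two fits.
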
## 
##  1 values imputed to 45 
## 
## 
##  94 values imputed to 11 
## 
## 
##  125 values imputed to 51778 
## 
## 
##  129 values imputed to 8 
## 
## 
##  1 values imputed to 0.02222222
##  TARGET_FLAG    TARGET_AMT        KIDSDRIV           AGE       
##  Mode:logical   Mode:logical   Min.   :0.0000   Min.   :17.00  
##  NA's:2141      NA's:2141      1st Qu.:0.0000   1st Qu.:39.00  
##                                Median :0.0000   Median :45.00  
##                                Mean   :0.1625   Mean   :45.02  
##                                3rd Qu.:0.0000   3rd Qu.:51.00  
##                                Max.   :3.0000   Max.   :73.00  
##                                                                
##     HOMEKIDS           YOJ            INCOME       PARENT1       HOME_VAL     
##  Min.   :0.0000   Min.   : 0.00   Min.   :     0   No :1875   Min.   :     0  
##  1st Qu.:0.0000   1st Qu.: 9.00   1st Qu.: 27450   Yes: 266   1st Qu.:     0  
##  Median :0.0000   Median :11.00   Median : 51778              Median :158840  
##  Mean   :0.7174   Mean   :10.41   Mean   : 59825              Mean   :153218  
##  3rd Qu.:1.0000   3rd Qu.:13.00   3rd Qu.: 82918              3rd Qu.:236652  
##  Max.   :5.0000   Max.   :19.00   Max.   :291182              Max.   :669271  
##                                                               NA's   :111     
##  MSTATUS      SEX               EDUCATION              JOB     
##  Yes :1294   M  : 971   <High School :312   z_Blue Collar:463  
##  z_No: 847   z_F:1170   Bachelors    :581   Clerical     :319  
##                         Masters      :420   Professional :291  
##                         PhD          :206   Manager      :269  
##                         z_High School:622   Home Maker   :202  
##                                             Lawyer       :196  
##                                             (Other)      :401  
##     TRAVTIME            CAR_USE        BLUEBOOK          TIF        
##  Min.   :  5.00   Commercial: 760   Min.   : 1500   Min.   : 1.000  
##  1st Qu.: 22.00   Private   :1381   1st Qu.: 8870   1st Qu.: 1.000  
##  Median : 33.00                     Median :14170   Median : 4.000  
##  Mean   : 33.15                     Mean   :15469   Mean   : 5.245  
##  3rd Qu.: 43.00                     3rd Qu.:21050   3rd Qu.: 7.000  
##  Max.   :105.00                     Max.   :49940   Max.   :25.000  
##                                                                     
##         CAR_TYPE   RED_CAR       OLDCLAIM        CLM_FREQ     REVOKED   
##  Minivan    :549   no :1543   Min.   :    0   Min.   :0.000   No :1880  
##  Panel Truck:177   yes: 598   1st Qu.:    0   1st Qu.:0.000   Yes: 261  
##  Pickup     :383              Median :    0   Median :0.000             
##  Sports Car :272              Mean   : 4022   Mean   :0.809             
##  Van        :171              3rd Qu.: 4718   3rd Qu.:2.000             
##  z_SUV      :589              Max.   :54399   Max.   :5.000             
##                                                                         
##     MVR_PTS          CAR_AGE                       URBANICITY  
##  Min.   : 0.000   Min.   : 0.000   Highly Urban/ Urban  :1738  
##  1st Qu.: 0.000   1st Qu.: 1.000   z_Highly Rural/ Rural: 403  
##  Median : 1.000   Median : 8.000                               
##  Mean   : 1.766   Mean   : 8.172                               
##  3rd Qu.: 3.000   3rd Qu.:12.000                               
##  Max.   :12.000   Max.   :26.000                               
##                                                                
##      PTSAGE       
##  Min.   :0.00000  
##  1st Qu.:0.00000  
##  Median :0.02128  
##  Mean   :0.04227  
##  3rd Qu.:0.06522  
##  Max.   :0.50000  
## 

Critical Thinking Group 4: Rajwant Mishra, Priya Shaji, Debabrata Kabiraj, Isabel Ramesar, Sin Ying Wong and Fan Xu

04/03/2029