Overview
In this homework assignment, you will explore, analyze, and model a dataset of approximately 8,000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is 1 or 0: a 1 means that the person was in a car crash, and a 0 means that the person was not. The second response variable is TARGET_AMT. This value is zero if the person did not crash their car; if they did crash, it is greater than zero.
Your objective is to build multiple linear regression and binary logistic regression models on the training data to predict the probability that a person will crash their car and the amount of money the crash will cost if it occurs. You may use only the variables given to you (or variables that you derive from them).
1. DATA EXPLORATION
## [1] "Training dataset dimensions: Number of rows: 8161, Number of cols: 26"
## INDEX TARGET_FLAG TARGET_AMT KIDSDRIV AGE HOMEKIDS YOJ INCOME PARENT1
## 1 1 0 0 0 60 0 11 $67,349 No
## 2 2 0 0 0 43 0 11 $91,449 No
## 3 4 0 0 0 35 1 10 $16,039 No
## 4 5 0 0 0 51 0 14 No
## 5 6 0 0 0 50 0 NA $114,986 No
## 6 7 1 2946 0 34 1 12 $125,301 Yes
## HOME_VAL MSTATUS SEX EDUCATION JOB TRAVTIME CAR_USE BLUEBOOK
## 1 $0 z_No M PhD Professional 14 Private $14,230
## 2 $257,252 z_No M z_High School z_Blue Collar 22 Commercial $14,940
## 3 $124,191 Yes z_F z_High School Clerical 5 Private $4,010
## 4 $306,251 Yes M <High School z_Blue Collar 32 Private $15,440
## 5 $243,925 Yes z_F PhD Doctor 36 Private $18,000
## 6 $0 z_No z_F Bachelors z_Blue Collar 46 Commercial $17,430
## TIF CAR_TYPE RED_CAR OLDCLAIM CLM_FREQ REVOKED MVR_PTS CAR_AGE
## 1 11 Minivan yes $4,461 2 No 3 18
## 2 1 Minivan yes $0 0 No 0 1
## 3 4 z_SUV no $38,690 2 No 3 10
## 4 7 Minivan yes $0 0 No 0 6
## 5 1 z_SUV no $19,217 2 Yes 3 17
## 6 1 Sports Car no $0 0 No 0 7
## URBANICITY
## 1 Highly Urban/ Urban
## 2 Highly Urban/ Urban
## 3 Highly Urban/ Urban
## 4 Highly Urban/ Urban
## 5 Highly Urban/ Urban
## 6 Highly Urban/ Urban
## 'data.frame': 8161 obs. of 26 variables:
## $ INDEX : int 1 2 4 5 6 7 8 11 12 13 ...
## $ TARGET_FLAG: int 0 0 0 0 0 1 0 1 1 0 ...
## $ TARGET_AMT : num 0 0 0 0 0 ...
## $ KIDSDRIV : int 0 0 0 0 0 0 0 1 0 0 ...
## $ AGE : int 60 43 35 51 50 34 54 37 34 50 ...
## $ HOMEKIDS : int 0 0 1 0 0 1 0 2 0 0 ...
## $ YOJ : int 11 11 10 14 NA 12 NA NA 10 7 ...
## $ INCOME : chr "$67,349" "$91,449" "$16,039" "" ...
## $ PARENT1 : chr "No" "No" "No" "No" ...
## $ HOME_VAL : chr "$0" "$257,252" "$124,191" "$306,251" ...
## $ MSTATUS : chr "z_No" "z_No" "Yes" "Yes" ...
## $ SEX : chr "M" "M" "z_F" "M" ...
## $ EDUCATION : chr "PhD" "z_High School" "z_High School" "<High School" ...
## $ JOB : chr "Professional" "z_Blue Collar" "Clerical" "z_Blue Collar" ...
## $ TRAVTIME : int 14 22 5 32 36 46 33 44 34 48 ...
## $ CAR_USE : chr "Private" "Commercial" "Private" "Private" ...
## $ BLUEBOOK : chr "$14,230" "$14,940" "$4,010" "$15,440" ...
## $ TIF : int 11 1 4 7 1 1 1 1 1 7 ...
## $ CAR_TYPE : chr "Minivan" "Minivan" "z_SUV" "Minivan" ...
## $ RED_CAR : chr "yes" "yes" "no" "yes" ...
## $ OLDCLAIM : chr "$4,461" "$0" "$38,690" "$0" ...
## $ CLM_FREQ : int 2 0 2 0 2 0 0 1 0 0 ...
## $ REVOKED : chr "No" "No" "No" "No" ...
## $ MVR_PTS : int 3 0 3 0 3 0 0 10 0 1 ...
## $ CAR_AGE : int 18 1 10 6 17 7 1 7 1 17 ...
## $ URBANICITY : chr "Highly Urban/ Urban" "Highly Urban/ Urban" "Highly Urban/ Urban" "Highly Urban/ Urban" ...
The training dataset consists of 26 variables and 8161 observations.
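The `str()` output shows INCOME, HOME_VAL, BLUEBOOK, and OLDCLAIM stored as character strings with dollar signs and commas, and several categorical levels carrying a `z_` prefix. A minimal sketch of one way to clean such columns before modeling is given here. The helper names `parse_currency` and `strip_z_prefix` and the toy data frame are illustrative assumptions, not the code that produced this report.

```r
# Hypothetical helper: "$67,349" -> 67349; empty strings become NA.
parse_currency <- function(x) {
  cleaned <- gsub("[$,]", "", x)
  cleaned[which(cleaned == "")] <- NA_character_
  as.numeric(cleaned)
}

# Hypothetical helper: "z_SUV" -> "SUV".
strip_z_prefix <- function(x) sub("^z_", "", x)

# Toy rows copied from the head() output above.
toy <- data.frame(
  INCOME   = c("$67,349", "$91,449", "$16,039", ""),
  HOME_VAL = c("$0", "$257,252", "$124,191", "$306,251"),
  SEX      = c("M", "M", "z_F", "M"),
  stringsAsFactors = FALSE
)

toy$INCOME   <- parse_currency(toy$INCOME)
toy$HOME_VAL <- parse_currency(toy$HOME_VAL)
toy$SEX      <- factor(strip_z_prefix(toy$SEX))

str(toy)
```

Converting the blank INCOME entry to `NA` keeps it visible as a missing value, so a later imputation step can fill it.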


Correlation

| Variable | TARGET_FLAG | TARGET_AMT |
|---|---|---|
| TARGET_FLAG | 1.0000000 | 1.0000000 |
| TARGET_AMT | 0.8334240 | 0.8334240 |
| MVR_PTS | 0.2191323 | 0.1970216 |
| CLM_FREQ | 0.2161961 | 0.1741927 |
| OLDCLAIM | 0.1947302 | 0.1611626 |
| PARENT1 | 0.1576222 | 0.1359305 |
| REVOKED | 0.1519391 | 0.1263285 |
| MSTATUS | 0.1351248 | 0.1214701 |
| HOMEKIDS | 0.1156210 | 0.1008356 |
| KIDSDRIV | 0.1036683 | 0.0877148 |
| CAR_TYPE | 0.1023650 | 0.0797487 |
| JOB | 0.0612262 | 0.0488313 |
| TRAVTIME | 0.0492559 | 0.0401971 |
| EDUCATION | 0.0428730 | 0.0397864 |
| SEX | 0.0210786 | 0.0088270 |
| RED_CAR | -0.0069473 | 0.0005877 |
| TIF | -0.0823431 | -0.0683183 |
| BLUEBOOK | -0.1092768 | -0.0709830 |
| CAR_USE | -0.1426737 | -0.1287263 |
| URBANICITY | -0.2242509 | -0.1904945 |
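The correlation table ranks variables by their correlation with the two response variables. Categorical predictors such as PARENT1 or URBANICITY must have been given a numeric encoding first, and the report does not show which one. The sketch below assumes integer codes taken from factor levels and Pearson correlations on pairwise complete observations, and it runs on simulated data rather than the real records.

```r
set.seed(1)
n <- 500

# Simulated stand-in for the training data (not the real records).
toy <- data.frame(
  MVR_PTS = rpois(n, 2),
  TIF     = rpois(n, 5) + 1,
  REVOKED = sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.9, 0.1)),
  stringsAsFactors = FALSE
)
p <- plogis(-1.5 + 0.3 * toy$MVR_PTS - 0.1 * toy$TIF +
              0.8 * (toy$REVOKED == "Yes"))
toy$TARGET_FLAG <- rbinom(n, 1, p)
toy$TARGET_AMT  <- toy$TARGET_FLAG * round(rlnorm(n, meanlog = 8, sdlog = 0.8))

# Assumption: character columns are converted to integer codes before cor().
numeric_toy <- as.data.frame(lapply(toy, function(col) {
  if (is.character(col)) as.integer(factor(col)) else col
}))

cors <- cor(numeric_toy, use = "pairwise.complete.obs")

# Keep the two response columns and sort rows by correlation with TARGET_FLAG.
cor_table <- cors[, c("TARGET_FLAG", "TARGET_AMT")]
cor_table <- cor_table[order(-cor_table[, "TARGET_FLAG"]), ]
print(round(cor_table, 4))
```

Integer codes impose an arbitrary ordering on multi-level factors such as JOB or CAR_TYPE, so correlations for those variables are only a rough screening signal.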

2. DATA PREPARATION
##
## iter imp variable
## 1 1 AGE YOJ INCOME HOME_VAL CAR_AGE
## 1 2 AGE YOJ INCOME HOME_VAL CAR_AGE
## 1 3 AGE YOJ INCOME HOME_VAL CAR_AGE
## 1 4 AGE YOJ INCOME HOME_VAL CAR_AGE
## 1 5 AGE YOJ INCOME HOME_VAL CAR_AGE
## 2 1 AGE YOJ INCOME HOME_VAL CAR_AGE
## 2 2 AGE YOJ INCOME HOME_VAL CAR_AGE
## 2 3 AGE YOJ INCOME HOME_VAL CAR_AGE
## 2 4 AGE YOJ INCOME HOME_VAL CAR_AGE
## 2 5 AGE YOJ INCOME HOME_VAL CAR_AGE
## 3 1 AGE YOJ INCOME HOME_VAL CAR_AGE
## 3 2 AGE YOJ INCOME HOME_VAL CAR_AGE
## 3 3 AGE YOJ INCOME HOME_VAL CAR_AGE
## 3 4 AGE YOJ INCOME HOME_VAL CAR_AGE
## 3 5 AGE YOJ INCOME HOME_VAL CAR_AGE
## 4 1 AGE YOJ INCOME HOME_VAL CAR_AGE
## 4 2 AGE YOJ INCOME HOME_VAL CAR_AGE
## 4 3 AGE YOJ INCOME HOME_VAL CAR_AGE
## 4 4 AGE YOJ INCOME HOME_VAL CAR_AGE
## 4 5 AGE YOJ INCOME HOME_VAL CAR_AGE
## 5 1 AGE YOJ INCOME HOME_VAL CAR_AGE
## 5 2 AGE YOJ INCOME HOME_VAL CAR_AGE
## 5 3 AGE YOJ INCOME HOME_VAL CAR_AGE
## 5 4 AGE YOJ INCOME HOME_VAL CAR_AGE
## 5 5 AGE YOJ INCOME HOME_VAL CAR_AGE
## [1] "Missing value after imputation: 0"
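The trace above has the shape of the progress log printed by the `mice` package: five iterations (`iter`) for each of five imputed datasets (`imp`), cycling over AGE, YOJ, INCOME, HOME_VAL, and CAR_AGE. The report does not show the call itself, so the following is only a sketch on simulated data. The seed, the default imputation method, and the choice of the first completed dataset are all assumptions, and the block runs only if `mice` is installed.

```r
set.seed(42)
n <- 300

# Simulated stand-in with missing values in three numeric columns.
toy <- data.frame(
  AGE     = round(rnorm(n, 45, 9)),
  YOJ     = rpois(n, 10),
  CAR_AGE = rpois(n, 8)
)
toy$INCOME <- 20000 + 1500 * toy$YOJ + rnorm(n, 0, 10000)
toy$YOJ[sample(n, 20)]     <- NA
toy$CAR_AGE[sample(n, 25)] <- NA
toy$INCOME[sample(n, 30)]  <- NA

if (requireNamespace("mice", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mice))
  # m = 5 imputations and maxit = 5 iterations, matching the trace above.
  imp <- mice(toy, m = 5, maxit = 5, seed = 42, printFlag = FALSE)
  completed <- complete(imp, 1)  # assumption: use the first imputed dataset
  print(paste("Missing value after imputation:", sum(is.na(completed))))
}
```

Setting `printFlag = FALSE` suppresses the iteration trace, which keeps a knitted report shorter.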
| Variable | VIF Score |
|---|---|
| TARGET_AMT | 1.183605 |
| KIDSDRIV | 1.322411 |
| AGE | 1.412150 |
| HOMEKIDS | 2.068708 |
| YOJ | 1.225576 |
| INCOME | 2.738123 |
| PARENT1 | 1.844249 |
| HOME_VAL | 2.454186 |
| MSTATUS | 1.924588 |
| SEX | 2.265022 |
| EDUCATION | 1.043666 |
| JOB | 1.153863 |
| TRAVTIME | 1.038907 |
| CAR_USE | 1.353494 |
| BLUEBOOK | 1.378285 |
| TIF | 1.009140 |
| CAR_TYPE | 1.410025 |
| RED_CAR | 1.809130 |
| OLDCLAIM | 2.201369 |
| CLM_FREQ | 2.131246 |
| REVOKED | 1.148729 |
| MVR_PTS | 1.249246 |
| CAR_AGE | 1.325166 |
| URBANICITY | 1.241628 |
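Every VIF score listed is below 3, well under the commonly cited rule-of-thumb cutoffs of 5 or 10, which suggests that multicollinearity is not a serious concern here. The report does not show how the scores were computed. As an illustration, the definition VIF_j = 1 / (1 - R_j^2) can be applied directly in base R; the helper name `vif_scores` and the simulated predictors are assumptions.

```r
set.seed(7)
n <- 400

# Simulated predictors; INCOME and HOME_VAL are made to correlate.
toy <- data.frame(INCOME = rnorm(n, 60, 20))
toy$HOME_VAL <- 2 * toy$INCOME + rnorm(n, 0, 30)
toy$TRAVTIME <- rnorm(n, 33, 15)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all remaining predictors.
vif_scores <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

print(round(vif_scores(toy), 3))
```

In this toy data the two correlated predictors receive the larger scores, while the independent TRAVTIME stays close to 1.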
3. BUILD MODELS
##
## Call:
## lm(formula = TARGET_AMT ~ ., data = correlated_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -898.90 -286.25 -134.43 62.85 1927.07
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.018e+02 7.820e+01 5.138 2.86e-07 ***
## KIDSDRIV 4.205e+01 1.318e+01 3.189 0.00143 **
## AGE -2.046e-01 8.070e-01 -0.254 0.79986
## HOMEKIDS 1.359e+01 7.532e+00 1.804 0.07121 .
## YOJ -3.491e-01 1.590e+00 -0.220 0.82625
## INCOME -2.270e-02 4.803e-03 -4.727 2.33e-06 ***
## PARENT1 7.414e+01 2.336e+01 3.173 0.00151 **
## HOME_VAL -1.082e-02 5.505e-03 -1.965 0.04946 *
## MSTATUS 7.301e+01 1.685e+01 4.332 1.50e-05 ***
## SEX -9.673e+00 1.756e+01 -0.551 0.58178
## EDUCATION 6.679e+00 4.132e+00 1.616 0.10606
## JOB -6.667e-01 2.347e+00 -0.284 0.77637
## TRAVTIME 2.185e+00 3.772e-01 5.793 7.25e-09 ***
## CAR_USE -1.472e+02 1.392e+01 -10.571 < 2e-16 ***
## BLUEBOOK -2.513e-02 9.504e-03 -2.644 0.00821 **
## TIF -7.286e+00 1.416e+00 -5.147 2.73e-07 ***
## CAR_TYPE 1.877e+01 3.517e+00 5.335 9.87e-08 ***
## RED_CAR -1.768e+01 1.732e+01 -1.021 0.30740
## OLDCLAIM -4.355e-03 1.039e-02 -0.419 0.67518
## CLM_FREQ 2.315e+01 7.358e+00 3.147 0.00166 **
## REVOKED 1.281e+02 1.915e+01 6.693 2.37e-11 ***
## MVR_PTS 2.597e+01 3.034e+00 8.560 < 2e-16 ***
## CAR_AGE -3.795e+00 1.182e+00 -3.210 0.00133 **
## URBANICITY -2.685e+02 1.590e+01 -16.891 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 470.1 on 6424 degrees of freedom
## (1713 observations deleted due to missingness)
## Multiple R-squared: 0.1564, Adjusted R-squared: 0.1534
## F-statistic: 51.79 on 23 and 6424 DF, p-value: < 2.2e-16
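This first summary reports 1713 observations deleted due to missingness. That leaves 8161 - 1713 = 6448 rows, and with 24 estimated coefficients (23 predictors plus the intercept) the residual degrees of freedom are 6448 - 24 = 6424, as shown. The behavior comes from `lm()` dropping incomplete rows under R's default `na.action`. A small sketch on simulated data, with illustrative variable names:

```r
set.seed(3)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(n)
toy$x2[1:15] <- NA  # 15 incomplete rows

fit <- lm(y ~ ., data = toy)  # default na.action drops rows with any NA

n_used    <- nobs(fit)
n_dropped <- nrow(toy) - n_used
cat("rows used:", n_used, " rows dropped:", n_dropped,
    " residual df:", df.residual(fit), "\n")
```

The later summaries fitted to imputed data show 8137 residual degrees of freedom, which matches all 8161 rows being used (8161 - 24 = 8137).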
##
## Call:
## lm(formula = TARGET_AMT ~ ., data = vif_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -911.15 -287.66 -135.27 65.26 1922.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.515e+02 6.999e+01 5.022 5.21e-07 ***
## KIDSDRIV 5.526e+01 1.173e+01 4.713 2.49e-06 ***
## AGE 5.953e-02 7.197e-01 0.083 0.93409
## HOMEKIDS 1.391e+01 6.728e+00 2.067 0.03876 *
## YOJ -9.904e-01 1.413e+00 -0.701 0.48339
## INCOME -2.305e-02 4.244e-03 -5.432 5.73e-08 ***
## PARENT1 6.756e+01 2.094e+01 3.226 0.00126 **
## HOME_VAL -1.116e-02 4.812e-03 -2.319 0.02044 *
## MSTATUS 8.276e+01 1.476e+01 5.607 2.12e-08 ***
## SEX -3.660e+00 1.576e+01 -0.232 0.81636
## EDUCATION 7.924e+00 3.692e+00 2.146 0.03187 *
## JOB -4.376e-02 2.092e+00 -0.021 0.98331
## TRAVTIME 2.078e+00 3.357e-01 6.189 6.34e-10 ***
## CAR_USE -1.435e+02 1.248e+01 -11.504 < 2e-16 ***
## BLUEBOOK -2.488e-02 8.508e-03 -2.924 0.00347 **
## TIF -7.542e+00 1.263e+00 -5.970 2.47e-09 ***
## CAR_TYPE 1.669e+01 3.150e+00 5.297 1.21e-07 ***
## RED_CAR -3.939e+00 1.546e+01 -0.255 0.79887
## OLDCLAIM -7.244e-03 9.189e-03 -0.788 0.43052
## CLM_FREQ 2.147e+01 6.578e+00 3.264 0.00110 **
## REVOKED 1.307e+02 1.701e+01 7.684 1.72e-14 ***
## MVR_PTS 2.597e+01 2.705e+00 9.602 < 2e-16 ***
## CAR_AGE -3.134e+00 1.055e+00 -2.971 0.00297 **
## URBANICITY -2.714e+02 1.411e+01 -19.231 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 471.8 on 8137 degrees of freedom
## Multiple R-squared: 0.1551, Adjusted R-squared: 0.1527
## F-statistic: 64.96 on 23 and 8137 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = TARGET_AMT ~ KIDSDRIV + HOMEKIDS + INCOME + PARENT1 +
## HOME_VAL + MSTATUS + EDUCATION + TRAVTIME + CAR_USE + BLUEBOOK +
## TIF + CAR_TYPE + CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE +
## URBANICITY, data = vif_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -916.6 -287.0 -135.0 65.4 1927.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.394e+02 4.979e+01 6.817 9.95e-12 ***
## KIDSDRIV 5.568e+01 1.155e+01 4.819 1.47e-06 ***
## HOMEKIDS 1.308e+01 6.167e+00 2.122 0.03388 *
## INCOME -2.375e-02 4.098e-03 -5.795 7.07e-09 ***
## PARENT1 6.771e+01 2.083e+01 3.251 0.00115 **
## HOME_VAL -1.110e-02 4.788e-03 -2.317 0.02050 *
## MSTATUS 8.381e+01 1.468e+01 5.708 1.18e-08 ***
## EDUCATION 7.980e+00 3.656e+00 2.183 0.02908 *
## TRAVTIME 2.080e+00 3.355e-01 6.202 5.86e-10 ***
## CAR_USE -1.439e+02 1.136e+01 -12.667 < 2e-16 ***
## BLUEBOOK -2.499e-02 8.280e-03 -3.018 0.00255 **
## TIF -7.551e+00 1.262e+00 -5.982 2.30e-09 ***
## CAR_TYPE 1.661e+01 2.741e+00 6.059 1.43e-09 ***
## CLM_FREQ 1.817e+01 5.059e+00 3.592 0.00033 ***
## REVOKED 1.263e+02 1.606e+01 7.859 4.35e-15 ***
## MVR_PTS 2.568e+01 2.674e+00 9.605 < 2e-16 ***
## CAR_AGE -3.065e+00 1.031e+00 -2.972 0.00296 **
## URBANICITY -2.707e+02 1.408e+01 -19.222 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 471.7 on 8143 degrees of freedom
## Multiple R-squared: 0.155, Adjusted R-squared: 0.1532
## F-statistic: 87.86 on 17 and 8143 DF, p-value: < 2.2e-16
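Compared with the full model on `vif_data`, this reduced model omits AGE, YOJ, SEX, JOB, RED_CAR, and OLDCLAIM, all of which had p-values above 0.05 in the full fit, and adjusted R-squared moves from 0.1527 to 0.1532. One way to check whether such a group of terms can be dropped together is a partial F-test on the nested models. The report does not show that test, so the sketch below is an assumption about a possible workflow and uses simulated data.

```r
set.seed(11)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  noise1 = rnorm(n), noise2 = rnorm(n))
toy$y <- 3 + 1.5 * toy$x1 - 2 * toy$x2 + rnorm(n)

full    <- lm(y ~ ., data = toy)
reduced <- lm(y ~ x1 + x2, data = toy)

# Partial F-test: a large p-value means the dropped terms add little.
comparison <- anova(reduced, full)
print(comparison)

cat("adjusted R-squared, full:   ", summary(full)$adj.r.squared, "\n")
cat("adjusted R-squared, reduced:", summary(reduced)$adj.r.squared, "\n")
```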
##
## Call:
## lm(formula = TARGET_AMT ~ ., data = boxcoxed_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7391 -0.5452 -0.2238 0.5891 2.3647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1119671 0.1100133 10.108 < 2e-16 ***
## KIDSDRIV 0.4309404 0.0744645 5.787 7.42e-09 ***
## AGE -0.0008369 0.0011832 -0.707 0.479380
## HOMEKIDS 0.1060352 0.0603551 1.757 0.078980 .
## YOJ 0.0004341 0.0006386 0.680 0.496655
## INCOME -0.0019144 0.0002550 -7.507 6.67e-14 ***
## PARENT1 0.1193992 0.0354643 3.367 0.000764 ***
## HOME_VAL -0.0049140 0.0013070 -3.760 0.000171 ***
## MSTATUS 0.1354766 0.0248347 5.455 5.04e-08 ***
## SEX 0.0036217 0.0249212 0.145 0.884457
## EDUCATION 0.0171283 0.0089235 1.919 0.054961 .
## JOB 0.0009017 0.0033090 0.272 0.785258
## TRAVTIME 0.0112483 0.0013997 8.036 1.06e-15 ***
## CAR_USE -0.2703675 0.0197399 -13.696 < 2e-16 ***
## BLUEBOOK -0.0013081 0.0002076 -6.301 3.11e-10 ***
## TIF -0.0527999 0.0068477 -7.711 1.40e-14 ***
## CAR_TYPE 0.0523189 0.0077014 6.793 1.17e-11 ***
## RED_CAR -0.0101470 0.0245368 -0.414 0.679220
## OLDCLAIM -0.0063592 0.0104988 -0.606 0.544729
## CLM_FREQ 0.3706891 0.1402772 2.643 0.008244 **
## REVOKED 0.2553268 0.0260135 9.815 < 2e-16 ***
## MVR_PTS 0.1327628 0.0184169 7.209 6.15e-13 ***
## CAR_AGE -0.0241521 0.0049178 -4.911 9.23e-07 ***
## URBANICITY -0.5171850 0.0225073 -22.979 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7488 on 8137 degrees of freedom
## Multiple R-squared: 0.2153, Adjusted R-squared: 0.213
## F-statistic: 97.05 on 23 and 8137 DF, p-value: < 2.2e-16
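The model fitted to `boxcoxed_data` reaches an adjusted R-squared of 0.213, compared with roughly 0.153 for the untransformed fits. The report does not show which variables were transformed or which lambda values were used. The following sketch illustrates the general technique with `MASS::boxcox()` on a simulated positive response; the data, the lambda grid, and the variable names are assumptions, and the block runs only if `MASS` is available.

```r
set.seed(5)
n <- 300
toy <- data.frame(x = runif(n, 1, 10))
toy$y <- exp(0.3 * toy$x + rnorm(n, 0, 0.4))  # positive, right-skewed response

if (requireNamespace("MASS", quietly = TRUE)) {
  # Profile the Box-Cox log-likelihood over a grid of lambda values.
  bc <- MASS::boxcox(y ~ x, data = toy,
                     lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
  lambda <- bc$x[which.max(bc$y)]

  # Apply the transform: log(y) when lambda is 0, else (y^lambda - 1) / lambda.
  toy$y_bc <- if (abs(lambda) < 1e-8) log(toy$y) else (toy$y^lambda - 1) / lambda

  fit_bc <- lm(y_bc ~ x, data = toy)
  cat("estimated lambda:", lambda, "\n")
  cat("R-squared on transformed response:", summary(fit_bc)$r.squared, "\n")
}
```

Box-Cox requires a strictly positive variable. TARGET_AMT is zero for customers who did not crash, so a shift such as adding 1 would be needed before transforming it; whether this report applied one is not shown.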
##
## Call:
## glm(formula = TARGET_FLAG ~ ., family = "binomial", data = binomial_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5020 -0.7290 -0.4165 0.6571 3.1081
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.488e-01 3.802e-01 1.180 0.237905
## KIDSDRIV 3.743e-01 6.035e-02 6.202 5.57e-10 ***
## AGE -2.848e-03 3.907e-03 -0.729 0.466048
## HOMEKIDS 5.714e-02 3.653e-02 1.564 0.117764
## YOJ -7.003e-03 7.610e-03 -0.920 0.357406
## INCOME -1.358e-04 2.262e-05 -6.003 1.93e-09 ***
## PARENT1 3.659e-01 1.082e-01 3.380 0.000724 ***
## HOME_VAL -9.205e-05 2.595e-05 -3.547 0.000389 ***
## MSTATUS 4.992e-01 8.097e-02 6.166 7.02e-10 ***
## SEX 1.491e-02 8.798e-02 0.169 0.865455
## EDUCATION 3.428e-02 1.985e-02 1.727 0.084199 .
## JOB -7.651e-03 1.130e-02 -0.677 0.498387
## TRAVTIME 1.529e-02 1.874e-03 8.161 3.32e-16 ***
## CAR_USE -9.293e-01 6.833e-02 -13.600 < 2e-16 ***
## BLUEBOOK -2.806e-04 4.703e-05 -5.967 2.42e-09 ***
## TIF -5.438e-02 7.276e-03 -7.474 7.79e-14 ***
## CAR_TYPE 1.177e-01 1.788e-02 6.583 4.60e-11 ***
## RED_CAR -2.848e-02 8.544e-02 -0.333 0.738853
## OLDCLAIM -4.625e-05 4.495e-05 -1.029 0.303521
## CLM_FREQ 1.721e-01 3.203e-02 5.372 7.78e-08 ***
## REVOKED 7.673e-01 8.448e-02 9.083 < 2e-16 ***
## MVR_PTS 1.160e-01 1.358e-02 8.539 < 2e-16 ***
## CAR_AGE -2.195e-02 5.850e-03 -3.753 0.000175 ***
## URBANICITY -2.310e+00 1.126e-01 -20.515 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9418.0 on 8160 degrees of freedom
## Residual deviance: 7424.9 on 8137 degrees of freedom
## AIC: 7472.9
##
## Number of Fisher Scoring iterations: 5
##
## Call:
## glm(formula = TARGET_FLAG ~ KIDSDRIV + HOMEKIDS + INCOME + PARENT1 +
## HOME_VAL + MSTATUS + EDUCATION + TRAVTIME + CAR_USE + BLUEBOOK +
## TIF + CAR_TYPE + CLM_FREQ + REVOKED + MVR_PTS + CAR_AGE +
## URBANICITY, family = "binomial", data = binomial_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5092 -0.7278 -0.4182 0.6541 3.0800
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.003e-01 2.672e-01 0.750 0.453393
## KIDSDRIV 3.681e-01 5.933e-02 6.205 5.48e-10 ***
## HOMEKIDS 6.343e-02 3.351e-02 1.893 0.058424 .
## INCOME -1.422e-04 2.175e-05 -6.534 6.39e-11 ***
## PARENT1 3.770e-01 1.075e-01 3.507 0.000454 ***
## HOME_VAL -9.408e-05 2.585e-05 -3.640 0.000273 ***
## MSTATUS 5.073e-01 8.059e-02 6.295 3.07e-10 ***
## EDUCATION 3.626e-02 1.968e-02 1.843 0.065391 .
## TRAVTIME 1.526e-02 1.872e-03 8.154 3.53e-16 ***
## CAR_USE -9.124e-01 6.203e-02 -14.709 < 2e-16 ***
## BLUEBOOK -2.782e-04 4.594e-05 -6.055 1.40e-09 ***
## TIF -5.427e-02 7.268e-03 -7.467 8.20e-14 ***
## CAR_TYPE 1.217e-01 1.545e-02 7.878 3.32e-15 ***
## CLM_FREQ 1.513e-01 2.517e-02 6.012 1.84e-09 ***
## REVOKED 7.367e-01 7.929e-02 9.291 < 2e-16 ***
## MVR_PTS 1.146e-01 1.341e-02 8.547 < 2e-16 ***
## CAR_AGE -2.101e-02 5.695e-03 -3.689 0.000225 ***
## URBANICITY -2.300e+00 1.123e-01 -20.489 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9418.0 on 8160 degrees of freedom
## Residual deviance: 7428.3 on 8143 degrees of freedom
## AIC: 7464.3
##
## Number of Fisher Scoring iterations: 5
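For a 0/1 response, the AIC reported by `glm()` equals the residual deviance plus twice the number of estimated coefficients. The two logistic summaries so far agree with this: 7424.9 + 2 × 24 = 7472.9 for the full model and 7428.3 + 2 × 18 = 7464.3 for the reduced one, so the reduced model has the lower AIC despite a slightly higher deviance. The identity can be checked on simulated data (all names below are illustrative):

```r
set.seed(9)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
toy$flag <- rbinom(n, 1, plogis(-1 + 0.8 * toy$x1 - 0.5 * toy$x2))

full    <- glm(flag ~ ., family = "binomial", data = toy)
reduced <- glm(flag ~ x1 + x2, family = "binomial", data = toy)

# For a 0/1 response, AIC = residual deviance + 2 * (number of coefficients).
for (fit in list(full, reduced)) {
  k <- length(coef(fit))
  cat("deviance:", round(deviance(fit), 2),
      " k:", k,
      " deviance + 2k:", round(deviance(fit) + 2 * k, 2),
      " AIC:", round(AIC(fit), 2), "\n")
}
```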
##
## Call:
## glm(formula = TARGET_FLAG ~ ., family = "binomial", data = in_bc_transformed1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3386 -0.7288 -0.4181 0.6763 3.1405
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.5280207 0.3790752 4.031 5.56e-05 ***
## KIDSDRIV 1.4378762 0.2440809 5.891 3.84e-09 ***
## AGE -0.0022458 0.0040536 -0.554 0.579565
## HOMEKIDS 0.4265052 0.2133621 1.999 0.045611 *
## YOJ 0.0018798 0.0022036 0.853 0.393629
## INCOME -0.0065045 0.0008632 -7.535 4.87e-14 ***
## PARENT1 0.2612104 0.1181074 2.212 0.026992 *
## HOME_VAL -0.0158167 0.0043145 -3.666 0.000246 ***
## MSTATUS 0.5257550 0.0869478 6.047 1.48e-09 ***
## SEX -0.0083849 0.0876730 -0.096 0.923808
## EDUCATION 0.0454300 0.0303019 1.499 0.133810
## JOB -0.0047592 0.0112436 -0.423 0.672088
## TRAVTIME 0.0419500 0.0049865 8.413 < 2e-16 ***
## CAR_USE -0.9163469 0.0681251 -13.451 < 2e-16 ***
## BLUEBOOK -0.0047117 0.0007119 -6.619 3.62e-11 ***
## TIF -0.1815658 0.0238069 -7.627 2.41e-14 ***
## CAR_TYPE 0.1988017 0.0278696 7.133 9.80e-13 ***
## RED_CAR -0.0317191 0.0854514 -0.371 0.710492
## OLDCLAIM -0.0213959 0.0316719 -0.676 0.499327
## CLM_FREQ 1.1610295 0.4227084 2.747 0.006021 **
## REVOKED 0.7479378 0.0811108 9.221 < 2e-16 ***
## MVR_PTS 0.4120924 0.0621516 6.630 3.35e-11 ***
## CAR_AGE -0.0821287 0.0169290 -4.851 1.23e-06 ***
## URBANICITY -2.2941991 0.1132080 -20.265 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9418.0 on 8160 degrees of freedom
## Residual deviance: 7413.1 on 8137 degrees of freedom
## AIC: 7461.1
##
## Number of Fisher Scoring iterations: 5
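The logistic coefficients above are on the log-odds scale. In the first logistic model, for example, the MVR_PTS estimate of 0.1160 corresponds to an odds ratio of about exp(0.1160) ≈ 1.12 per additional point, holding the other predictors fixed. The sketch below shows how odds ratios and predicted crash probabilities can be obtained from a fitted `glm` object. It uses simulated data, and the 0.5 classification cutoff is an assumption, not a threshold taken from this report.

```r
set.seed(21)
n <- 600
toy <- data.frame(MVR_PTS = rpois(n, 2), TIF = rpois(n, 5) + 1)
toy$TARGET_FLAG <- rbinom(n, 1, plogis(-1 + 0.3 * toy$MVR_PTS - 0.1 * toy$TIF))

fit <- glm(TARGET_FLAG ~ ., family = "binomial", data = toy)

# Coefficients are log-odds; exponentiate to read them as odds ratios.
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 3))

# Predicted crash probabilities, then a 0/1 label at an assumed 0.5 cutoff.
prob  <- predict(fit, newdata = toy, type = "response")
label <- as.integer(prob > 0.5)
print(table(predicted = label, actual = toy$TARGET_FLAG))
```

Passing `type = "response"` returns probabilities between 0 and 1; the default `type = "link"` would return log-odds instead.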