University of Cincinnati


Team Project - Predicting Customer’s Response to an Offer


Team Members

Rohit Jayakumar Nair [M13251730]

Siddharth Narayana Menon [M13489503]

Anamika Mishra [M13440005]

Dominic Franco [M06338256]

Roshan Khandare [M13262333]








Executive Summary

The Marketing Customer Value dataset provided by IBM Watson Analytics contains information about customers. This rich dataset can be used to understand customer behavior and buying habits, and to predict that behavior in order to retain customers. Analyzing all relevant customer data lets us understand customer demographics and develop focused customer retention programs.

We can also identify the most profitable customers, study how they interact, and take targeted actions to increase profitable customer response, retention, and growth.

The purpose of this project is to predict, based on the customer profile, whether a customer will accept or reject the offer we make. The customer profile is built from details such as employment status, marital status, and income group.

We built two models on this dataset, using logistic regression and a decision tree. Each model is fitted on the training dataset, and its accuracy is determined on the testing dataset by comparing the model output with the response we already have. Based on this accuracy, we designate the champion model and the challenger model.

The report progresses through the stages of the regression model-building process: model specification, parameter estimation, model adequacy checking, and model validation.

The report concludes by selecting the champion model based on accuracy; this model predicts how likely a customer is to accept or reject the offer.

Chapter 1: Data Introduction

We picked the dataset from the IBM Watson Analytics Gallery. It records whether each customer accepted or rejected the offer extended to them, along with a customer profile containing personal information, policy information, and vehicle information. From the available attributes, we initially consider the covariates that make up this customer profile.

We have 9134 unique observations in total. We created a training dataset and a testing dataset by splitting the original dataset in a 70-30 ratio: 70% of the data is used for training and the remaining 30% for testing the model.
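The 70-30 split described above can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the data frame `toy`, its `id` column, and the seed are assumptions introduced here so the example runs on its own.

```r
set.seed(2020)  # assumed seed, only so the sketch is reproducible

# Hypothetical stand-in for insurance.data (100 rows instead of 9134)
toy <- data.frame(
  id = seq_len(100),
  Response = sample(c("No", "Yes"), 100, replace = TRUE, prob = c(0.85, 0.15))
)

# Draw 70% of the row indices at random for training
n_train <- floor(0.7 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

train <- toy[train_idx, ]   # rows used to fit the model
test  <- toy[-train_idx, ]  # remaining rows, held out for testing

c(train = nrow(train), test = nrow(test))
```

Because the rows are sampled without replacement and the test set is taken as the complement, no observation appears in both sets.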

Import all the libraries:

library('ggcorrplot')
## Warning: package 'ggcorrplot' was built under R version 3.6.3
## Loading required package: ggplot2
library('ggplot2')
library('ROCR')
## Warning: package 'ROCR' was built under R version 3.6.3
## Loading required package: gplots
## Warning: package 'gplots' was built under R version 3.6.2
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
library('car')
## Loading required package: carData
library("dplyr")
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library('rpart')
library('tidyverse')
## -- Attaching packages ------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble  2.1.3     v purrr   0.3.2
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x dplyr::recode() masks car::recode()
## x purrr::some()   masks car::some()
library('corrgram')
## Warning: package 'corrgram' was built under R version 3.6.2
## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus
library('glmnet')
## Warning: package 'glmnet' was built under R version 3.6.2
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
## Loaded glmnet 3.0-2
library('boot')
## 
## Attaching package: 'boot'
## The following object is masked from 'package:car':
## 
##     logit

Loading the dataset.

insurance.data<-read.csv("~/Assignment/WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv")
head(insurance.data)
##   Customer      State Customer.Lifetime.Value Response Coverage Education
## 1  BU79786 Washington                2763.519       No    Basic  Bachelor
## 2  QZ44356    Arizona                6979.536       No Extended  Bachelor
## 3  AI49188     Nevada               12887.432       No  Premium  Bachelor
## 4  WW63253 California                7645.862       No    Basic  Bachelor
## 5  HB64268 Washington                2813.693       No    Basic  Bachelor
## 6  OC83172     Oregon                8256.298      Yes    Basic  Bachelor
##   Effective.To.Date EmploymentStatus Gender Income Location.Code
## 1           2/24/11         Employed      F  56274      Suburban
## 2           1/31/11       Unemployed      F      0      Suburban
## 3           2/19/11         Employed      F  48767      Suburban
## 4           1/20/11       Unemployed      M      0      Suburban
## 5            2/3/11         Employed      M  43836         Rural
## 6           1/25/11         Employed      F  62902         Rural
##   Marital.Status Monthly.Premium.Auto Months.Since.Last.Claim
## 1        Married                   69                      32
## 2         Single                   94                      13
## 3        Married                  108                      18
## 4        Married                  106                      18
## 5         Single                   73                      12
## 6        Married                   69                      14
##   Months.Since.Policy.Inception Number.of.Open.Complaints
## 1                             5                         0
## 2                            42                         0
## 3                            38                         0
## 4                            65                         0
## 5                            44                         0
## 6                            94                         0
##   Number.of.Policies    Policy.Type       Policy Renew.Offer.Type
## 1                  1 Corporate Auto Corporate L3           Offer1
## 2                  8  Personal Auto  Personal L3           Offer3
## 3                  2  Personal Auto  Personal L3           Offer1
## 4                  7 Corporate Auto Corporate L2           Offer1
## 5                  1  Personal Auto  Personal L1           Offer1
## 6                  2  Personal Auto  Personal L3           Offer2
##   Sales.Channel Total.Claim.Amount Vehicle.Class Vehicle.Size
## 1         Agent           384.8111  Two-Door Car      Medsize
## 2         Agent          1131.4649 Four-Door Car      Medsize
## 3         Agent           566.4722  Two-Door Car      Medsize
## 4   Call Center           529.8813           SUV      Medsize
## 5         Agent           138.1309 Four-Door Car      Medsize
## 6           Web           159.3830  Two-Door Car      Medsize


Data Exploration


sapply(insurance.data, class)
##                      Customer                         State 
##                      "factor"                      "factor" 
##       Customer.Lifetime.Value                      Response 
##                     "numeric"                      "factor" 
##                      Coverage                     Education 
##                      "factor"                      "factor" 
##             Effective.To.Date              EmploymentStatus 
##                      "factor"                      "factor" 
##                        Gender                        Income 
##                      "factor"                     "integer" 
##                 Location.Code                Marital.Status 
##                      "factor"                      "factor" 
##          Monthly.Premium.Auto       Months.Since.Last.Claim 
##                     "integer"                     "integer" 
## Months.Since.Policy.Inception     Number.of.Open.Complaints 
##                     "integer"                     "integer" 
##            Number.of.Policies                   Policy.Type 
##                     "integer"                      "factor" 
##                        Policy              Renew.Offer.Type 
##                      "factor"                      "factor" 
##                 Sales.Channel            Total.Claim.Amount 
##                      "factor"                     "numeric" 
##                 Vehicle.Class                  Vehicle.Size 
##                      "factor"                      "factor"
summary(insurance.data)
##     Customer           State      Customer.Lifetime.Value Response  
##  AA10041:   1   Arizona   :1703   Min.   : 1898           No :7826  
##  AA11235:   1   California:3150   1st Qu.: 3994           Yes:1308  
##  AA16582:   1   Nevada    : 882   Median : 5780                     
##  AA30683:   1   Oregon    :2601   Mean   : 8005                     
##  AA34092:   1   Washington: 798   3rd Qu.: 8962                     
##  AA35519:   1                     Max.   :83325                     
##  (Other):9128                                                       
##      Coverage                   Education    Effective.To.Date
##  Basic   :5568   Bachelor            :2748   1/10/11: 195     
##  Extended:2742   College             :2681   1/27/11: 194     
##  Premium : 824   Doctor              : 342   2/14/11: 186     
##                  High School or Below:2622   1/26/11: 181     
##                  Master              : 741   1/17/11: 180     
##                                              1/19/11: 179     
##                                              (Other):8019     
##       EmploymentStatus Gender       Income       Location.Code 
##  Disabled     : 405    F:4658   Min.   :    0   Rural   :1773  
##  Employed     :5698    M:4476   1st Qu.:    0   Suburban:5779  
##  Medical Leave: 432             Median :33890   Urban   :1582  
##  Retired      : 282             Mean   :37657                  
##  Unemployed   :2317             3rd Qu.:62320                  
##                                 Max.   :99981                  
##                                                                
##   Marital.Status Monthly.Premium.Auto Months.Since.Last.Claim
##  Divorced:1369   Min.   : 61.00       Min.   : 0.0           
##  Married :5298   1st Qu.: 68.00       1st Qu.: 6.0           
##  Single  :2467   Median : 83.00       Median :14.0           
##                  Mean   : 93.22       Mean   :15.1           
##                  3rd Qu.:109.00       3rd Qu.:23.0           
##                  Max.   :298.00       Max.   :35.0           
##                                                              
##  Months.Since.Policy.Inception Number.of.Open.Complaints
##  Min.   : 0.00                 Min.   :0.0000           
##  1st Qu.:24.00                 1st Qu.:0.0000           
##  Median :48.00                 Median :0.0000           
##  Mean   :48.06                 Mean   :0.3844           
##  3rd Qu.:71.00                 3rd Qu.:0.0000           
##  Max.   :99.00                 Max.   :5.0000           
##                                                         
##  Number.of.Policies         Policy.Type            Policy    
##  Min.   :1.000      Corporate Auto:1968   Personal L3 :3426  
##  1st Qu.:1.000      Personal Auto :6788   Personal L2 :2122  
##  Median :2.000      Special Auto  : 378   Personal L1 :1240  
##  Mean   :2.966                            Corporate L3:1014  
##  3rd Qu.:4.000                            Corporate L2: 595  
##  Max.   :9.000                            Corporate L1: 359  
##                                           (Other)     : 378  
##  Renew.Offer.Type     Sales.Channel  Total.Claim.Amount
##  Offer1:3752      Agent      :3477   Min.   :   0.099  
##  Offer2:2926      Branch     :2567   1st Qu.: 272.258  
##  Offer3:1432      Call Center:1765   Median : 383.945  
##  Offer4:1024      Web        :1325   Mean   : 434.089  
##                                      3rd Qu.: 547.515  
##                                      Max.   :2893.240  
##                                                        
##        Vehicle.Class   Vehicle.Size 
##  Four-Door Car:4621   Large  : 946  
##  Luxury Car   : 163   Medsize:6424  
##  Luxury SUV   : 184   Small  :1764  
##  Sports Car   : 484                 
##  SUV          :1796                 
##  Two-Door Car :1886                 
## 
str(insurance.data)
## 'data.frame':    9134 obs. of  24 variables:
##  $ Customer                     : Factor w/ 9134 levels "AA10041","AA11235",..: 601 5947 97 8017 2489 4948 8434 756 1352 548 ...
##  $ State                        : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
##  $ Customer.Lifetime.Value      : num  2764 6980 12887 7646 2814 ...
##  $ Response                     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 2 1 2 1 ...
##  $ Coverage                     : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
##  $ Education                    : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
##  $ Effective.To.Date            : Factor w/ 59 levels "1/1/11","1/10/11",..: 48 25 42 13 53 18 48 10 19 40 ...
##  $ EmploymentStatus             : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
##  $ Gender                       : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
##  $ Income                       : int  56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
##  $ Location.Code                : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
##  $ Marital.Status               : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
##  $ Monthly.Premium.Auto         : int  69 94 108 106 73 69 67 101 71 93 ...
##  $ Months.Since.Last.Claim      : int  32 13 18 18 12 14 0 0 13 17 ...
##  $ Months.Since.Policy.Inception: int  5 42 38 65 44 94 13 68 3 7 ...
##  $ Number.of.Open.Complaints    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number.of.Policies           : int  1 8 2 7 1 2 9 4 2 8 ...
##  $ Policy.Type                  : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
##  $ Policy                       : Factor w/ 9 levels "Corporate L1",..: 3 6 6 2 4 6 3 3 3 8 ...
##  $ Renew.Offer.Type             : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
##  $ Sales.Channel                : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
##  $ Total.Claim.Amount           : num  385 1131 566 530 138 ...
##  $ Vehicle.Class                : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
##  $ Vehicle.Size                 : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...
glimpse(insurance.data)
## Observations: 9,134
## Variables: 24
## $ Customer                      <fct> BU79786, QZ44356, AI49188, WW632...
## $ State                         <fct> Washington, Arizona, Nevada, Cal...
## $ Customer.Lifetime.Value       <dbl> 2763.519, 6979.536, 12887.432, 7...
## $ Response                      <fct> No, No, No, No, No, Yes, Yes, No...
## $ Coverage                      <fct> Basic, Extended, Premium, Basic,...
## $ Education                     <fct> Bachelor, Bachelor, Bachelor, Ba...
## $ Effective.To.Date             <fct> 2/24/11, 1/31/11, 2/19/11, 1/20/...
## $ EmploymentStatus              <fct> Employed, Unemployed, Employed, ...
## $ Gender                        <fct> F, F, F, M, M, F, F, M, M, F, M,...
## $ Income                        <int> 56274, 0, 48767, 0, 43836, 62902...
## $ Location.Code                 <fct> Suburban, Suburban, Suburban, Su...
## $ Marital.Status                <fct> Married, Single, Married, Marrie...
## $ Monthly.Premium.Auto          <int> 69, 94, 108, 106, 73, 69, 67, 10...
## $ Months.Since.Last.Claim       <int> 32, 13, 18, 18, 12, 14, 0, 0, 13...
## $ Months.Since.Policy.Inception <int> 5, 42, 38, 65, 44, 94, 13, 68, 3...
## $ Number.of.Open.Complaints     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Number.of.Policies            <int> 1, 8, 2, 7, 1, 2, 9, 4, 2, 8, 3,...
## $ Policy.Type                   <fct> Corporate Auto, Personal Auto, P...
## $ Policy                        <fct> Corporate L3, Personal L3, Perso...
## $ Renew.Offer.Type              <fct> Offer1, Offer3, Offer1, Offer1, ...
## $ Sales.Channel                 <fct> Agent, Agent, Agent, Call Center...
## $ Total.Claim.Amount            <dbl> 384.81115, 1131.46493, 566.47225...
## $ Vehicle.Class                 <fct> Two-Door Car, Four-Door Car, Two...
## $ Vehicle.Size                  <fct> Medsize, Medsize, Medsize, Medsi...

Use the sapply() function to count the number of missing values (NA) in each feature.

sapply(insurance.data, function(x) sum(is.na(x)))
##                      Customer                         State 
##                             0                             0 
##       Customer.Lifetime.Value                      Response 
##                             0                             0 
##                      Coverage                     Education 
##                             0                             0 
##             Effective.To.Date              EmploymentStatus 
##                             0                             0 
##                        Gender                        Income 
##                             0                             0 
##                 Location.Code                Marital.Status 
##                             0                             0 
##          Monthly.Premium.Auto       Months.Since.Last.Claim 
##                             0                             0 
## Months.Since.Policy.Inception     Number.of.Open.Complaints 
##                             0                             0 
##            Number.of.Policies                   Policy.Type 
##                             0                             0 
##                        Policy              Renew.Offer.Type 
##                             0                             0 
##                 Sales.Channel            Total.Claim.Amount 
##                             0                             0 
##                 Vehicle.Class                  Vehicle.Size 
##                             0                             0

Similarly, the number of unique values per column is shown below.

sapply(insurance.data, function(x) length(unique(x)))
##                      Customer                         State 
##                          9134                             5 
##       Customer.Lifetime.Value                      Response 
##                          8041                             2 
##                      Coverage                     Education 
##                             3                             5 
##             Effective.To.Date              EmploymentStatus 
##                            59                             5 
##                        Gender                        Income 
##                             2                          5694 
##                 Location.Code                Marital.Status 
##                             3                             3 
##          Monthly.Premium.Auto       Months.Since.Last.Claim 
##                           202                            36 
## Months.Since.Policy.Inception     Number.of.Open.Complaints 
##                           100                             6 
##            Number.of.Policies                   Policy.Type 
##                             9                             3 
##                        Policy              Renew.Offer.Type 
##                             9                             4 
##                 Sales.Channel            Total.Claim.Amount 
##                             4                          5106 
##                 Vehicle.Class                  Vehicle.Size 
##                             6                             3

Using the missmap() function from the Amelia package, we visualize the missing and observed values per feature. Consistent with the NA counts above, no feature in this dataset has missing values.

library(Amelia)
## Warning: package 'Amelia' was built under R version 3.6.3
## Loading required package: Rcpp
## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.7.6, built: 2019-11-24)
## ## Copyright (C) 2005-2020 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
missmap(insurance.data, main = "Missing Values vs. Observed")

Our data contains 9134 customers with information about their income, education, gender, residence, and so on.

Each customer owns a car, and you, as the entrepreneur, offer 4 different car insurances to them. The target of this dataset is Response. The response can be “Yes” (the customer accepted the offer) or “No” (the customer did not accept the offer).

Using Graphs to Understand Our Data

Relation between numerical variables

nums <- unlist(lapply(insurance.data, is.numeric)) 
insurance_numeric<-insurance.data[,nums]
corr<-cor(insurance_numeric)

library(ggcorrplot)
ggcorrplot(corr, hc.order = TRUE, type = "lower",lab = TRUE)


Exploratory Data Analysis

Relation between categorical variables and the response variable

Gender -> Response

library(ggcorrplot)
tbl_gen <- with(insurance.data, table(Gender, Response))
ggplot(as.data.frame(tbl_gen), aes(factor(Response),Freq, fill=Gender) )+ geom_col(position = 'dodge')

State -> Response

library(ggcorrplot)
tbl_State <- with(insurance.data, table(State, Response))
ggplot(as.data.frame(tbl_State), aes(factor(State),Freq, fill=Response) )+ geom_col(position = 'dodge')

Coverage -> Response

library(ggcorrplot)
tbl_Coverage <- with(insurance.data, table(Coverage, Response))
ggplot(as.data.frame(tbl_Coverage), aes(factor(Coverage),Freq, fill=Response) )+ geom_col(position = 'dodge')

Education -> Response

library(ggcorrplot)
tbl_Education <- with(insurance.data, table(Education, Response))
ggplot(as.data.frame(tbl_Education), aes(factor(Education),Freq, fill=Response) )+ geom_col(position = 'dodge')

EmploymentStatus -> Response

library(ggcorrplot)
tbl_EmploymentStatus <- with(insurance.data, table(EmploymentStatus, Response))
ggplot(as.data.frame(tbl_EmploymentStatus), aes(factor(EmploymentStatus),Freq, fill=Response) )+ geom_col(position = 'dodge')

Location.Code -> Response

library(ggcorrplot)
tbl_LocationCode <- with(insurance.data, table(Location.Code, Response))
ggplot(as.data.frame(tbl_LocationCode), aes(factor(Location.Code),Freq, fill=Response) )+ geom_col(position = 'dodge')

Marital.Status -> Response

library(ggcorrplot)
tbl_MaritalStatus <- with(insurance.data, table(Marital.Status, Response))
ggplot(as.data.frame(tbl_MaritalStatus), aes(factor(Marital.Status),Freq, fill=Response) )+ geom_col(position = 'dodge')

Monthly.Premium.Auto -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Monthly.Premium.Auto,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Months.Since.Last.Claim -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Months.Since.Last.Claim ,fill=Response)) + geom_histogram(position = 'dodge') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Months.Since.Policy.Inception -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Months.Since.Policy.Inception ,fill=Response)) + geom_histogram(position = 'dodge') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Number.of.Open.Complaints -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Number.of.Open.Complaints  ,fill=Response)) + geom_histogram(position = 'dodge') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Number.of.Policies -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Number.of.Policies  ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Policy.Type -> Response

library(ggcorrplot)
tbl_PolicyType <- with(insurance.data, table(Policy.Type, Response))
ggplot(as.data.frame(tbl_PolicyType), aes(factor(Policy.Type),Freq, fill=Response) )+ geom_col(position = 'dodge')

Renew.Offer.Type -> Response

library(ggcorrplot)
tbl_RenewOfferType <- with(insurance.data, table(Renew.Offer.Type, Response))
ggplot(as.data.frame(tbl_RenewOfferType), aes(factor(Renew.Offer.Type),Freq, fill=Response) )+ geom_col(position = 'dodge')

Sales.Channel -> Response

library(ggcorrplot)
tbl_SalesChannel <- with(insurance.data, table(Sales.Channel, Response))
ggplot(as.data.frame(tbl_SalesChannel), aes(factor(Sales.Channel),Freq, fill=Response) )+ geom_col(position = 'dodge')

Total.Claim.Amount -> Response

library(ggcorrplot)
ggplot(insurance.data, aes(x = Total.Claim.Amount  ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Vehicle.Class -> Response

library(ggcorrplot)
tbl_VehicleClass <- with(insurance.data, table(Vehicle.Class, Response))
ggplot(as.data.frame(tbl_VehicleClass), aes(factor(Vehicle.Class),Freq, fill=Response) )+ geom_col(position = 'dodge')

Vehicle.Size -> Response

library(ggcorrplot)
tbl_VehicleSize <- with(insurance.data, table(Vehicle.Size, Response))
ggplot(as.data.frame(tbl_VehicleSize), aes(factor(Vehicle.Size),Freq, fill=Response) )+ geom_col(position = 'dodge')
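
The bar charts above show raw counts, which makes levels of very different size hard to compare. One possible complement is the share of “Yes” responses within each level. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the insurance data) and base R's table() and prop.table().

```r
# Hypothetical toy data; the real analysis would use insurance.data
toy <- data.frame(
  Sales.Channel = c("Agent", "Agent", "Agent", "Web", "Web",
                    "Branch", "Branch", "Branch", "Branch", "Agent"),
  Response = c("Yes", "No", "No", "No", "Yes",
               "No", "No", "No", "Yes", "No")
)

# Cross-tabulate the category against the response, as in the plots above
tbl <- with(toy, table(Sales.Channel, Response))

# margin = 1 normalizes each row, so every level's shares sum to 1
rates <- prop.table(tbl, margin = 1)[, "Yes"]
rates
```

The same two lines applied to any of the tables built above (for example tbl_SalesChannel) would give a per-level response rate that can be compared directly across levels.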


Data Wrangling - Cleaning

All categorical features are well distributed, so we keep them and encode them as numerical data. Some columns carry no useful information or are not important: Customer is just a unique identifier, Policy is a finer-grained version of Policy.Type (each policy type split into levels such as L1 to L3), and Effective.To.Date is also not important, so we drop them.

The data is imbalanced with respect to the outcome “Response”: 7826 customers answered “No” and only 1308 answered “Yes”.

insurance.data = subset(insurance.data , select = -c(Customer,Policy,Effective.To.Date) )
str(insurance.data)
## 'data.frame':    9134 obs. of  21 variables:
##  $ State                        : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
##  $ Customer.Lifetime.Value      : num  2764 6980 12887 7646 2814 ...
##  $ Response                     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 2 1 2 1 ...
##  $ Coverage                     : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
##  $ Education                    : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
##  $ EmploymentStatus             : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
##  $ Gender                       : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
##  $ Income                       : int  56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
##  $ Location.Code                : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
##  $ Marital.Status               : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
##  $ Monthly.Premium.Auto         : int  69 94 108 106 73 69 67 101 71 93 ...
##  $ Months.Since.Last.Claim      : int  32 13 18 18 12 14 0 0 13 17 ...
##  $ Months.Since.Policy.Inception: int  5 42 38 65 44 94 13 68 3 7 ...
##  $ Number.of.Open.Complaints    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number.of.Policies           : int  1 8 2 7 1 2 9 4 2 8 ...
##  $ Policy.Type                  : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
##  $ Renew.Offer.Type             : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
##  $ Sales.Channel                : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
##  $ Total.Claim.Amount           : num  385 1131 566 530 138 ...
##  $ Vehicle.Class                : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
##  $ Vehicle.Size                 : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...
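
Because the outcome is imbalanced, it helps to know what accuracy a trivial model would reach before comparing models on accuracy. The short calculation below uses only the Response counts from the summary() output above (7826 No, 1308 Yes); the variable names are introduced here for illustration.

```r
# Response counts taken from the summary() output above
counts <- c(No = 7826, Yes = 1308)

# Accuracy of a model that always predicts the majority class
baseline_accuracy <- max(counts) / sum(counts)

# Share of customers who accepted the offer
yes_share <- counts[["Yes"]] / sum(counts)

round(c(baseline = baseline_accuracy, yes_share = yes_share), 3)
```

A model that always predicts “No” is already right for roughly 86% of customers, so any model accuracy should be read against that baseline.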

Encode the categorical data as numeric values

encode_ordinal <- function(x, order = unique(x)) {
  x <- as.numeric(factor(x, levels = order, exclude = NULL))
  x
}

table(insurance.data[["Response"]], encode_ordinal(insurance.data[["Response"]]), useNA = "ifany")
##      
##          1    2
##   No  7826    0
##   Yes    0 1308
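
As the table shows, the helper maps “No” to 1 and “Yes” to 2. With the default `order = unique(x)`, the codes depend on which value happens to appear first in the data, so passing the order explicitly is a safer choice. If a 0/1 indicator is needed later (for example, glm() with a binomial family expects a numeric response between 0 and 1), one option is to subtract 1. The sketch below repeats the helper so that it runs on its own; the vector `resp` is made up for illustration.

```r
# Same helper as above, repeated so the sketch is self-contained
encode_ordinal <- function(x, order = unique(x)) {
  as.numeric(factor(x, levels = order, exclude = NULL))
}

resp <- c("No", "No", "Yes", "No", "Yes")  # hypothetical toy values

# Fix the level order explicitly instead of relying on first appearance
codes <- encode_ordinal(resp, order = c("No", "Yes"))

# Shift from 1/2 coding to a 0/1 indicator (1 = accepted the offer)
resp01 <- codes - 1

data.frame(resp, codes, resp01)
```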

Updated Dataset

insurance.data.new <- insurance.data
insurance.data.new[["Response"]] <- encode_ordinal(insurance.data[["Response"]])
head(insurance.data.new)
##        State Customer.Lifetime.Value Response Coverage Education
## 1 Washington                2763.519        1    Basic  Bachelor
## 2    Arizona                6979.536        1 Extended  Bachelor
## 3     Nevada               12887.432        1  Premium  Bachelor
## 4 California                7645.862        1    Basic  Bachelor
## 5 Washington                2813.693        1    Basic  Bachelor
## 6     Oregon                8256.298        2    Basic  Bachelor
##   EmploymentStatus Gender Income Location.Code Marital.Status
## 1         Employed      F  56274      Suburban        Married
## 2       Unemployed      F      0      Suburban         Single
## 3         Employed      F  48767      Suburban        Married
## 4       Unemployed      M      0      Suburban        Married
## 5         Employed      M  43836         Rural         Single
## 6         Employed      F  62902         Rural        Married
##   Monthly.Premium.Auto Months.Since.Last.Claim
## 1                   69                      32
## 2                   94                      13
## 3                  108                      18
## 4                  106                      18
## 5                   73                      12
## 6                   69                      14
##   Months.Since.Policy.Inception Number.of.Open.Complaints
## 1                             5                         0
## 2                            42                         0
## 3                            38                         0
## 4                            65                         0
## 5                            44                         0
## 6                            94                         0
##   Number.of.Policies    Policy.Type Renew.Offer.Type Sales.Channel
## 1                  1 Corporate Auto           Offer1         Agent
## 2                  8  Personal Auto           Offer3         Agent
## 3                  2  Personal Auto           Offer1         Agent
## 4                  7 Corporate Auto           Offer1   Call Center
## 5                  1  Personal Auto           Offer1         Agent
## 6                  2  Personal Auto           Offer2           Web
##   Total.Claim.Amount Vehicle.Class Vehicle.Size
## 1           384.8111  Two-Door Car      Medsize
## 2          1131.4649 Four-Door Car      Medsize
## 3           566.4722  Two-Door Car      Medsize
## 4           529.8813           SUV      Medsize
## 5           138.1309 Four-Door Car      Medsize
## 6           159.3830  Two-Door Car      Medsize
str(insurance.data.new)
## 'data.frame':    9134 obs. of  21 variables:
##  $ State                        : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
##  $ Customer.Lifetime.Value      : num  2764 6980 12887 7646 2814 ...
##  $ Response                     : num  1 1 1 1 1 2 2 1 2 1 ...
##  $ Coverage                     : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
##  $ Education                    : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
##  $ EmploymentStatus             : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
##  $ Gender                       : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
##  $ Income                       : int  56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
##  $ Location.Code                : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
##  $ Marital.Status               : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
##  $ Monthly.Premium.Auto         : int  69 94 108 106 73 69 67 101 71 93 ...
##  $ Months.Since.Last.Claim      : int  32 13 18 18 12 14 0 0 13 17 ...
##  $ Months.Since.Policy.Inception: int  5 42 38 65 44 94 13 68 3 7 ...
##  $ Number.of.Open.Complaints    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number.of.Policies           : int  1 8 2 7 1 2 9 4 2 8 ...
##  $ Policy.Type                  : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
##  $ Renew.Offer.Type             : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
##  $ Sales.Channel                : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
##  $ Total.Claim.Amount           : num  385 1131 566 530 138 ...
##  $ Vehicle.Class                : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
##  $ Vehicle.Size                 : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...


##Correlation Graph
Analyzing the relationship between feature variables and the target variable

nums_new <- unlist(lapply(insurance.data.new, is.numeric)) 
insurance_numeric_new<-insurance.data.new[,nums_new]
corrnew<-cor(insurance_numeric_new)

library(ggcorrplot)
ggcorrplot(corrnew, hc.order = TRUE, type = "lower",lab = TRUE)

library("ggplot2")
library(reshape2)
melted_cormat <- melt(corrnew)
head(melted_cormat)
##                            Var1                    Var2        value
## 1       Customer.Lifetime.Value Customer.Lifetime.Value  1.000000000
## 2                      Response Customer.Lifetime.Value -0.008929582
## 3                        Income Customer.Lifetime.Value  0.024365661
## 4          Monthly.Premium.Auto Customer.Lifetime.Value  0.396261738
## 5       Months.Since.Last.Claim Customer.Lifetime.Value  0.011516682
## 6 Months.Since.Policy.Inception Customer.Lifetime.Value  0.009418381
library(ggplot2)
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) + 
  geom_tile()+theme(axis.text.x=element_text(angle = 90)) 
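
The reshaping step above can also be reproduced without reshape2. The following is a minimal sketch on a made-up data frame (the object `toy` and its values are illustrative assumptions, not the insurance data), using only base R: `as.data.frame(as.table(...))` gives the same long layout of variable pairs and correlation values that `melt()` returns.

```r
# Toy data: illustrative values only, not the insurance dataset
toy <- data.frame(
  premium = c(69, 94, 108, 106, 73, 69),
  claims  = c(385, 1131, 566, 530, 138, 159),
  label   = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Keep numeric columns only, because cor() rejects character or factor input
is_num  <- vapply(toy, is.numeric, logical(1))
toy_cor <- cor(toy[, is_num, drop = FALSE])

# Long format: one row per (Var1, Var2) pair with its correlation
toy_long <- as.data.frame(as.table(toy_cor))
names(toy_long)[3] <- "value"
print(toy_long)
```

The long table can be passed to `ggplot()` with `geom_tile()` exactly as `melted_cormat` is above.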


##Model building
##Logistic regression

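# Recode Response from 1/2 to 0/1 so it can serve as a binary outcome.
# The order matters: map 1 to 0 first, then 2 to 1.
insurance.data.new$Response[insurance.data.new$Response==1] <- 0
insurance.data.new$Response[insurance.data.new$Response==2] <- 1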

insurance.datas <- insurance.data[ , -which(names(insurance.data) %in% c("Customer","Policy","Effective.To.Date"))]

Split the data

set.seed(13255870)
index <- sample(nrow(insurance.data.new),nrow(insurance.data.new)*0.70)
insurance.train = insurance.data.new[index,]
insurance.test = insurance.data.new[-index,]
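
Only about 14% of customers respond positively, so it is worth checking that a simple random split keeps a similar share of positives in both partitions. The sketch below uses simulated data; the vector `y`, the seed, and the sample size are assumptions made for illustration.

```r
set.seed(1)                                # arbitrary seed for this illustration
y <- rbinom(1000, size = 1, prob = 0.14)   # simulated 0/1 response, about 14% positives

# 70/30 split using the same sample() pattern as above
idx   <- sample(length(y), size = floor(length(y) * 0.70))
train <- y[idx]
test  <- y[-idx]

# Share of positives overall and in each partition
round(c(all = mean(y), train = mean(train), test = mean(test)), 3)
```

If the two shares differ noticeably, a stratified split (sampling within each response class) is a common alternative.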

str(insurance.train)
## 'data.frame':    6393 obs. of  21 variables:
##  $ State                        : Factor w/ 5 levels "Arizona","California",..: 1 4 4 2 4 1 2 1 4 1 ...
##  $ Customer.Lifetime.Value      : num  4014 5511 8305 2787 8677 ...
##  $ Response                     : num  1 0 0 0 0 0 0 0 0 0 ...
##  $ Coverage                     : Factor w/ 3 levels "Basic","Extended",..: 2 1 2 1 1 2 3 1 1 2 ...
##  $ Education                    : Factor w/ 5 levels "Bachelor","College",..: 3 4 2 1 4 2 2 5 2 4 ...
##  $ EmploymentStatus             : Factor w/ 5 levels "Disabled","Employed",..: 2 5 5 2 2 2 2 2 5 2 ...
##  $ Gender                       : Factor w/ 2 levels "F","M": 1 1 2 2 2 2 1 1 1 2 ...
##  $ Income                       : int  37384 0 0 38667 76214 25899 92850 51199 0 53603 ...
##  $ Location.Code                : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 1 3 2 2 2 2 2 ...
##  $ Marital.Status               : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 3 2 1 3 1 3 2 ...
##  $ Monthly.Premium.Auto         : int  99 73 122 72 72 79 104 74 72 132 ...
##  $ Months.Since.Last.Claim      : int  9 24 22 8 7 10 3 19 15 30 ...
##  $ Months.Since.Policy.Inception: int  17 57 14 67 48 11 28 52 70 1 ...
##  $ Number.of.Open.Complaints    : int  0 0 2 0 0 0 1 0 0 1 ...
##  $ Number.of.Policies           : int  1 4 9 1 2 8 2 1 2 1 ...
##  $ Policy.Type                  : Factor w/ 3 levels "Corporate Auto",..: 2 2 1 2 3 2 2 1 2 2 ...
##  $ Renew.Offer.Type             : Factor w/ 4 levels "Offer1","Offer2",..: 2 1 2 1 2 4 1 1 2 1 ...
##  $ Sales.Channel                : Factor w/ 4 levels "Agent","Branch",..: 1 2 1 1 4 2 1 2 2 2 ...
##  $ Total.Claim.Amount           : num  475 526 681 159 203 ...
##  $ Vehicle.Class                : Factor w/ 6 levels "Four-Door Car",..: 1 1 5 1 1 6 1 1 1 4 ...
##  $ Vehicle.Size                 : Factor w/ 3 levels "Large","Medsize",..: 2 1 3 2 2 2 3 3 2 2 ...

Fit the full logistic regression model

glm0<-glm(Response~.,family = binomial(link = 'logit'),data = insurance.train)
summary(glm0)
## 
## Call:
## glm(formula = Response ~ ., family = binomial(link = "logit"), 
##     data = insurance.train)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.32576  -0.56835  -0.37340  -0.00021   3.11850  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.934e+00  5.122e-01  -3.775 0.000160 ***
## StateCalifornia                4.836e-02  1.159e-01   0.417 0.676380    
## StateNevada                   -9.692e-03  1.605e-01  -0.060 0.951845    
## StateOregon                   -2.025e-02  1.207e-01  -0.168 0.866821    
## StateWashington               -6.128e-02  1.660e-01  -0.369 0.711951    
## Customer.Lifetime.Value       -6.050e-06  6.443e-06  -0.939 0.347675    
## CoverageExtended              -6.962e-02  1.537e-01  -0.453 0.650504    
## CoveragePremium               -7.652e-02  3.238e-01  -0.236 0.813178    
## EducationCollege               1.106e-01  1.062e-01   1.042 0.297565    
## EducationDoctor                4.536e-01  2.051e-01   2.211 0.027021 *  
## EducationHigh School or Below  1.951e-02  1.079e-01   0.181 0.856485    
## EducationMaster                3.813e-01  1.555e-01   2.452 0.014205 *  
## EmploymentStatusEmployed      -2.355e-01  1.921e-01  -1.226 0.220202    
## EmploymentStatusMedical Leave  1.111e-01  2.288e-01   0.486 0.627068    
## EmploymentStatusRetired        2.508e+00  2.529e-01   9.916  < 2e-16 ***
## EmploymentStatusUnemployed    -6.263e-01  1.992e-01  -3.144 0.001669 ** 
## GenderM                        5.920e-02  8.173e-02   0.724 0.468852    
## Income                         4.149e-06  2.328e-06   1.782 0.074716 .  
## Location.CodeSuburban          1.430e+00  1.789e-01   7.993 1.32e-15 ***
## Location.CodeUrban             9.295e-02  1.759e-01   0.528 0.597219    
## Marital.StatusMarried         -4.718e-01  1.090e-01  -4.327 1.51e-05 ***
## Marital.StatusSingle          -4.882e-01  1.293e-01  -3.775 0.000160 ***
## Monthly.Premium.Auto           8.108e-03  6.248e-03   1.298 0.194373    
## Months.Since.Last.Claim       -4.898e-03  4.103e-03  -1.194 0.232553    
## Months.Since.Policy.Inception  2.724e-04  1.450e-03   0.188 0.851032    
## Number.of.Open.Complaints     -5.366e-02  4.638e-02  -1.157 0.247243    
## Number.of.Policies            -2.426e-02  1.710e-02  -1.419 0.155943    
## Policy.TypePersonal Auto       2.368e-02  1.006e-01   0.235 0.813998    
## Policy.TypeSpecial Auto        3.530e-01  2.044e-01   1.727 0.084129 .  
## Renew.Offer.TypeOffer2         6.859e-01  8.850e-02   7.751 9.14e-15 ***
## Renew.Offer.TypeOffer3        -2.389e+00  2.675e-01  -8.931  < 2e-16 ***
## Renew.Offer.TypeOffer4        -1.679e+01  2.318e+02  -0.072 0.942264    
## Sales.ChannelBranch           -5.364e-01  1.008e-01  -5.324 1.01e-07 ***
## Sales.ChannelCall Center      -4.070e-01  1.143e-01  -3.562 0.000368 ***
## Sales.ChannelWeb              -6.886e-01  1.388e-01  -4.960 7.06e-07 ***
## Total.Claim.Amount            -1.479e-03  3.367e-04  -4.392 1.12e-05 ***
## Vehicle.ClassLuxury Car       -4.072e-01  8.933e-01  -0.456 0.648550    
## Vehicle.ClassLuxury SUV       -3.128e-02  8.557e-01  -0.037 0.970843    
## Vehicle.ClassSports Car        3.149e-01  3.162e-01   0.996 0.319193    
## Vehicle.ClassSUV               2.841e-01  2.799e-01   1.015 0.310060    
## Vehicle.ClassTwo-Door Car      6.233e-02  1.078e-01   0.578 0.563177    
## Vehicle.SizeMedsize           -2.747e-01  1.266e-01  -2.170 0.030018 *  
## Vehicle.SizeSmall             -6.271e-01  1.530e-01  -4.098 4.16e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5203.1  on 6392  degrees of freedom
## Residual deviance: 4031.5  on 6350  degrees of freedom
## AIC: 4117.5
## 
## Number of Fisher Scoring iterations: 17
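
The estimates above are on the log-odds scale, which makes them hard to read directly. Exponentiating a coefficient gives an odds ratio relative to the reference level of that factor (Disabled, Rural, and Offer1 here), holding the other covariates fixed. The sketch below copies four estimates from the summary by hand; `log_odds` is an illustrative object and not part of the original analysis.

```r
# Log-odds estimates copied from the summary above
log_odds <- c(
  EmploymentStatusRetired = 2.508,
  Location.CodeSuburban   = 1.430,
  Renew.Offer.TypeOffer2  = 0.6859,
  Renew.Offer.TypeOffer3  = -2.389
)

# exp() converts a log-odds coefficient into an odds ratio against the baseline level
odds_ratio <- exp(log_odds)
round(odds_ratio, 2)
```

With the fitted object available, `exp(coef(glm0))` gives the same quantities for every term. The Offer4 estimate (-16.79 with a standard error of about 232) and the 17 Fisher scoring iterations are the usual signature of quasi-complete separation, most likely because almost no Offer4 customer in the training data accepted. Its large p-value should therefore not be read as evidence that Offer4 is unimportant.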
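library(ROCR)  # provides prediction() and performance()
# In-sample predicted probabilities for the ROC curve
insurance_model0_insample <- predict(glm0, type="response")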
pred <- prediction(insurance_model0_insample,insurance.train$Response)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)

Get Area Under Curve (AUC)

cat('AUC for full model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for full model is  0.8170277

Applying the model to the test data

insurance_model0_outsample <- predict(glm0, newdata =insurance.test ,type="response")
pred <- prediction(insurance_model0_outsample,insurance.test$Response)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)

Get Area Under Curve (AUC)

cat('AUC for full model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for full model is  0.8090875
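
The AUC can be read as the probability that a randomly chosen responder gets a higher predicted score than a randomly chosen non-responder. The sketch below computes it from ranks on made-up scores. The helper `auc_rank` and the toy vectors are hypothetical and are not part of ROCR.

```r
# AUC as the probability that a random positive outranks a random negative
auc_rank <- function(score, label) {
  r     <- rank(score)                 # average ranks handle ties
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores and labels (illustrative only)
score <- c(0.10, 0.35, 0.40, 0.80, 0.65, 0.20)
label <- c(0,    1,    0,    1,    1,    0)
auc_rank(score, label)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the function returns 8/9. Applied to the predicted probabilities and `insurance.test$Response`, it should agree with the ROCR value reported above.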

Confusion Matrix

predict.insurance.glm <- predict(glm0, newdata = insurance.test, type = "response")
conf.mat.glm <- table(insurance.test$Response, predict.insurance.glm>.5)

fourfoldplot(conf.mat.glm, color = c("#CC6666", "#99CC99"),
             conf.level = 0, margin = 1, main = "Confusion Matrix GLM")

# Counts from conf.mat.glm: TN = 2312, FP = 23, FN = 343, TP = 63
glm_accuracy <- (2312+63)/(2312+23+343+63)
glm_recall <- 63/(63+343)      # TP / (TP + FN)
glm_precision <- 63/(63+23)    # TP / (TP + FP)

glm_accuracy 
## [1] 0.8664721
glm_recall 
## [1] 0.1551724
glm_precision
## [1] 0.7325581
resp_count <- insurance.data.new %>% group_by(Response) %>% summarise(count = n())
resp_count
## # A tibble: 2 x 2
##   Response count
##      <dbl> <int>
## 1        0  7826
## 2        1  1308
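
Because only 1308 of 9134 customers responded, accuracy alone can be misleading. The sketch below rebuilds the test-set confusion matrix from the four counts used in the accuracy calculation above and derives each metric from its standard definition. The test set holds 406 responders (1308 in total minus 902 in the training set), which fixes 343 and 63 as the actual-positive row. The object names are illustrative.

```r
# Test-set confusion matrix for the full GLM at a 0.5 cutoff
# (rows = actual, columns = predicted); matrix() fills column by column
conf <- matrix(c(2312, 343, 23, 63), nrow = 2,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

tn <- conf["0", "0"]; fp <- conf["0", "1"]
fn <- conf["1", "0"]; tp <- conf["1", "1"]

accuracy  <- (tp + tn) / sum(conf)
recall    <- tp / (tp + fn)         # share of actual responders that were found
precision <- tp / (tp + fp)         # share of predicted responders that were right
baseline  <- (tn + fp) / sum(conf)  # accuracy of always predicting "no response"

round(c(accuracy = accuracy, recall = recall,
        precision = precision, baseline = baseline), 4)
```

The baseline shows that always predicting "no response" already reaches about 85% accuracy on the test set, so the 86.6% of the model is a modest gain. The low recall at the 0.5 cutoff is the more informative number for a campaign that aims to find responders.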


##Logistic Regression Model number 2 Using Lasso

dummy<- model.matrix(~ ., data = insurance.data.new)
insurance_data_lasso <- data.frame(dummy[,-1])
insurance.train.X <- as.matrix(select(insurance_data_lasso, -Response)[index,])
insurance.test.X <- as.matrix(select(insurance_data_lasso, -Response)[-index,])
insurance.train.Y <- insurance_data_lasso[index, "Response"]
insurance.test.Y <- insurance_data_lasso[-index, "Response"]


library(glmnet)  # provides glmnet() and cv.glmnet()
insurance_lasso <- glmnet(x=insurance.train.X, y=insurance.train.Y, family = "binomial")
insurance_lasso_cv <- cv.glmnet(x=insurance.train.X, y=insurance.train.Y, family = "binomial", type.measure = "class")
plot(insurance_lasso_cv)

par(mfrow=c(1,1))

coef(insurance_lasso, s=insurance_lasso_cv$lambda.min)
## 43 x 1 sparse Matrix of class "dgCMatrix"
##                                           1
## (Intercept)                   -1.811631e+00
## StateCalifornia                .           
## StateNevada                    .           
## StateOregon                    .           
## StateWashington                .           
## Customer.Lifetime.Value        .           
## CoverageExtended               .           
## CoveragePremium                .           
## EducationCollege               .           
## EducationDoctor                1.935424e-01
## EducationHigh.School.or.Below  .           
## EducationMaster                1.854072e-01
## EmploymentStatusEmployed       .           
## EmploymentStatusMedical.Leave  4.950095e-02
## EmploymentStatusRetired        2.456598e+00
## EmploymentStatusUnemployed    -5.806465e-01
## GenderM                        .           
## Income                         5.080228e-07
## Location.CodeSuburban          8.330149e-01
## Location.CodeUrban            -1.098303e-01
## Marital.StatusMarried         -2.341657e-01
## Marital.StatusSingle          -2.912739e-01
## Monthly.Premium.Auto           5.495355e-05
## Months.Since.Last.Claim       -1.644167e-03
## Months.Since.Policy.Inception  .           
## Number.of.Open.Complaints     -5.522565e-03
## Number.of.Policies            -1.047594e-02
## Policy.TypePersonal.Auto       .           
## Policy.TypeSpecial.Auto        1.183639e-01
## Renew.Offer.TypeOffer2         6.036038e-01
## Renew.Offer.TypeOffer3        -1.875267e+00
## Renew.Offer.TypeOffer4        -2.712024e+00
## Sales.ChannelBranch           -3.516541e-01
## Sales.ChannelCall.Center      -2.149748e-01
## Sales.ChannelWeb              -4.546039e-01
## Total.Claim.Amount            -2.705172e-04
## Vehicle.ClassLuxury.Car        .           
## Vehicle.ClassLuxury.SUV        .           
## Vehicle.ClassSports.Car        2.161568e-01
## Vehicle.ClassSUV               2.308456e-01
## Vehicle.ClassTwo.Door.Car      .           
## Vehicle.SizeMedsize            .           
## Vehicle.SizeSmall             -2.841743e-01
coef(insurance_lasso, s=insurance_lasso_cv$lambda.1se)
## 43 x 1 sparse Matrix of class "dgCMatrix"
##                                          1
## (Intercept)                   -1.977400124
## StateCalifornia                .          
## StateNevada                    .          
## StateOregon                    .          
## StateWashington                .          
## Customer.Lifetime.Value        .          
## CoverageExtended               .          
## CoveragePremium                .          
## EducationCollege               .          
## EducationDoctor                .          
## EducationHigh.School.or.Below  .          
## EducationMaster                .          
## EmploymentStatusEmployed       .          
## EmploymentStatusMedical.Leave  .          
## EmploymentStatusRetired        2.032512167
## EmploymentStatusUnemployed     .          
## GenderM                        .          
## Income                         .          
## Location.CodeSuburban          0.008787571
## Location.CodeUrban             .          
## Marital.StatusMarried          .          
## Marital.StatusSingle           .          
## Monthly.Premium.Auto           .          
## Months.Since.Last.Claim        .          
## Months.Since.Policy.Inception  .          
## Number.of.Open.Complaints      .          
## Number.of.Policies             .          
## Policy.TypePersonal.Auto       .          
## Policy.TypeSpecial.Auto        .          
## Renew.Offer.TypeOffer2         0.384398291
## Renew.Offer.TypeOffer3        -0.344749739
## Renew.Offer.TypeOffer4        -0.329119235
## Sales.ChannelBranch            .          
## Sales.ChannelCall.Center       .          
## Sales.ChannelWeb               .          
## Total.Claim.Amount             .          
## Vehicle.ClassLuxury.Car        .          
## Vehicle.ClassLuxury.SUV        .          
## Vehicle.ClassSports.Car        .          
## Vehicle.ClassSUV               .          
## Vehicle.ClassTwo.Door.Car      .          
## Vehicle.SizeMedsize            .          
## Vehicle.SizeSmall              .
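
The dots in the two listings mark coefficients that the lasso penalty has set to exactly zero. The larger `lambda.1se` leaves only five predictors in the model. The sketch below illustrates the soft-thresholding operation behind this behavior in the simplified case of a single standardized predictor. glmnet's coordinate-descent updates for the binomial family are more involved, and `soft_threshold` and the input values are hypothetical.

```r
# Soft-thresholding: shrink toward zero by lambda, and clip to exactly zero
# once the unpenalized estimate is smaller than lambda in absolute value
soft_threshold <- function(beta, lambda) {
  sign(beta) * pmax(abs(beta) - lambda, 0)
}

beta_unpenalized <- c(2.5, 0.7, -0.3, 0.05)    # made-up estimates

soft_threshold(beta_unpenalized, lambda = 0.1)  # small penalty: most coefficients survive
soft_threshold(beta_unpenalized, lambda = 0.5)  # larger penalty: more exact zeros
```

This is why weak effects such as the State and Gender terms vanish first, while the strong Retired and Renew.Offer.Type effects remain even under the heavier penalty.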
pred.lasso.train<- predict(insurance_lasso, newx=insurance.train.X, s=insurance_lasso_cv$lambda.min, type = "response")

pred <- prediction(pred.lasso.train,insurance.train.Y)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)

Get Area Under Curve (AUC)

cat('AUC for lasso model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for lasso model is  0.8094227

Out-of-sample prediction

pred.lasso.test<- predict(insurance_lasso, newx=insurance.test.X, s=insurance_lasso_cv$lambda.min, type = "response")

pred <- prediction(pred.lasso.test,insurance.test.Y)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)

Get Area Under Curve (AUC)

cat('AUC for lasso model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for lasso model is  0.8025042

Confusion Matrix

conf.mat.lasso <- table(insurance.test$Response, pred.lasso.test>.5)

fourfoldplot(conf.mat.lasso, color = c("#CC6666", "#99CC99"),
             conf.level = 0, margin = 1, main = "Confusion Matrix Lasso")

lasso_accuracy <- (2280 + 13)/(2280+55+393+13)
lasso_accuracy
## [1] 0.836556


##Decision Tree

library(rpart)
library(rpart.plot)

The data contains both categorical and numeric variables. Start by fitting a classification tree on the training dataset using all variables.

tree0 <- rpart(formula = Response ~ ., data = insurance.train, method = "class")
tree0
## n= 6393 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 6393 902 0 (0.85890818 0.14109182)  
##   2) EmploymentStatus=Disabled,Employed,Medical Leave,Unemployed 6197 760 0 (0.87736001 0.12263999) *
##   3) EmploymentStatus=Retired 196  54 1 (0.27551020 0.72448980)  
##     6) Renew.Offer.Type=Offer3,Offer4 23   2 0 (0.91304348 0.08695652) *
##     7) Renew.Offer.Type=Offer1,Offer2 173  33 1 (0.19075145 0.80924855) *
prp(tree0, extra = 1)

This tree has 3 leaf nodes, with EmploymentStatus and Renew.Offer.Type as the splitting variables.
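
For classification, rpart chooses splits by Gini impurity by default. As a rough check of why EmploymentStatus is chosen at the root, the sketch below computes the impurity decrease of that split from the node counts in the tree0 printout. The helper `gini` and the object names are illustrative.

```r
# Gini impurity of a two-class node with n observations, n_pos of them positive
gini <- function(n_pos, n) {
  p <- n_pos / n
  2 * p * (1 - p)
}

# Counts read from the tree0 printout above
root  <- c(n = 6393, pos = 902)
left  <- c(n = 6197, pos = 760)   # Disabled, Employed, Medical Leave, Unemployed
right <- c(n = 196,  pos = 142)   # Retired: 196 minus the 54 non-responders

# Weighted impurity after the split, and the decrease it achieves
gini_root  <- gini(root[["pos"]], root[["n"]])
gini_split <- (left[["n"]]  * gini(left[["pos"]],  left[["n"]]) +
               right[["n"]] * gini(right[["pos"]], right[["n"]])) / root[["n"]]
gini_root - gini_split
```

The decrease is positive because the small Retired group is much purer in responders than the root node.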

Checking the prediction rate of the tree

pred0<- predict(tree0, type="class")
table(insurance.train$Response, pred0, dnn = c("True", "Pred"))
##     Pred
## True    0    1
##    0 5458   33
##    1  762  140

tree0 has an in-sample prediction rate of 87.6% (5598 out of 6393 predicted correctly).

Since only two variables are used, we lower the complexity parameter to allow a deeper tree: cp = 0.001 (the default is 0.01).
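
The complexity parameter sets how much a split must reduce the relative error (misclassifications divided by the misclassifications at the root) to be worth keeping. As a rough illustration, the sketch below computes that reduction for the two splits of tree0 from the node losses in its printout. It reads `cp` split by split, which is a simplification: rpart applies the threshold through cost-complexity pruning, so a split with little immediate gain can survive when its descendants pay off.

```r
root_loss <- 902                      # misclassified at the root (all predicted 0)

# Loss after each successive split in tree0, read from its printout
loss_after_split1 <- 760 + 54         # nodes 2 and 3
loss_after_split2 <- 760 + 2 + 33     # nodes 2, 6 and 7

# Relative-error reduction contributed by each split
gain1 <- (root_loss - loss_after_split1) / root_loss
gain2 <- (loss_after_split1 - loss_after_split2) / root_loss

round(c(split1 = gain1, split2 = gain2), 4)
c(default_cp = 0.01, new_cp = 0.001)  # thresholds compared in the text
```

Both splits clear the default threshold of 0.01. At cp = 0.001 the bar drops to roughly one misclassification out of 902, so a much deeper tree is expected. Such a tree is also more prone to overfitting and should be judged on the test set.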

tree1 <- rpart(formula = Response ~ ., data = insurance.train, cp=0.001, method = "class")
tree1
## n= 6393 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##      1) root 6393 902 0 (0.858908181 0.141091819)  
##        2) EmploymentStatus=Disabled,Employed,Medical Leave,Unemployed 6197 760 0 (0.877360013 0.122639987)  
##          4) Renew.Offer.Type=Offer3,Offer4 1705  14 0 (0.991788856 0.008211144) *
##          5) Renew.Offer.Type=Offer1,Offer2 4492 746 0 (0.833926981 0.166073019)  
##           10) Renew.Offer.Type=Offer1 2501 318 0 (0.872850860 0.127149140)  
##             20) Location.Code=Rural,Urban 844  50 0 (0.940758294 0.059241706)  
##               40) Education=Bachelor,College 547  17 0 (0.968921389 0.031078611) *
##               41) Education=Doctor,High School or Below,Master 297  33 0 (0.888888889 0.111111111)  
##                 82) Customer.Lifetime.Value>=4491.294 253  19 0 (0.924901186 0.075098814)  
##                  164) Number.of.Policies< 5.5 180   4 0 (0.977777778 0.022222222) *
##                  165) Number.of.Policies>=5.5 73  15 0 (0.794520548 0.205479452)  
##                    330) Marital.Status=Divorced,Married 58   4 0 (0.931034483 0.068965517)  
##                      660) Income< 88917.5 51   0 0 (1.000000000 0.000000000) *
##                      661) Income>=88917.5 7   3 1 (0.428571429 0.571428571) *
##                    331) Marital.Status=Single 15   4 1 (0.266666667 0.733333333) *
##                 83) Customer.Lifetime.Value< 4491.294 44  14 0 (0.681818182 0.318181818)  
##                  166) Total.Claim.Amount>=70.40171 32   5 0 (0.843750000 0.156250000)  
##                    332) Customer.Lifetime.Value< 4260.683 25   0 0 (1.000000000 0.000000000) *
##                    333) Customer.Lifetime.Value>=4260.683 7   2 1 (0.285714286 0.714285714) *
##                  167) Total.Claim.Amount< 70.40171 12   3 1 (0.250000000 0.750000000) *
##             21) Location.Code=Suburban 1657 268 0 (0.838261919 0.161738081)  
##               42) Marital.Status=Married,Single 1381 188 0 (0.863866763 0.136133237)  
##                 84) Total.Claim.Amount>=667.6793 365  14 0 (0.961643836 0.038356164) *
##                 85) Total.Claim.Amount< 667.6793 1016 174 0 (0.828740157 0.171259843)  
##                  170) Monthly.Premium.Auto< 101.5 737  88 0 (0.880597015 0.119402985)  
##                    340) Total.Claim.Amount>=460.8031 224   3 0 (0.986607143 0.013392857) *
##                    341) Total.Claim.Amount< 460.8031 513  85 0 (0.834307992 0.165692008)  
##                      682) Months.Since.Policy.Inception>=13.5 441  58 0 (0.868480726 0.131519274)  
##                       1364) Customer.Lifetime.Value>=6076.984 136   3 0 (0.977941176 0.022058824) *
##                       1365) Customer.Lifetime.Value< 6076.984 305  55 0 (0.819672131 0.180327869)  
##                         2730) Customer.Lifetime.Value< 5469.964 239  24 0 (0.899581590 0.100418410)  
##                           5460) Monthly.Premium.Auto< 91.5 225  15 0 (0.933333333 0.066666667)  
##                            10920) Total.Claim.Amount< 455.1514 214  10 0 (0.953271028 0.046728972)  
##                              21840) Sales.Channel=Agent,Branch,Call Center 194   4 0 (0.979381443 0.020618557) *
##                              21841) Sales.Channel=Web 20   6 0 (0.700000000 0.300000000)  
##                                43682) Vehicle.Class=Four-Door Car 12   0 0 (1.000000000 0.000000000) *
##                                43683) Vehicle.Class=Two-Door Car 8   2 1 (0.250000000 0.750000000) *
##                            10921) Total.Claim.Amount>=455.1514 11   5 0 (0.545454545 0.454545455) *
##                           5461) Monthly.Premium.Auto>=91.5 14   5 1 (0.357142857 0.642857143) *
##                         2731) Customer.Lifetime.Value>=5469.964 66  31 0 (0.530303030 0.469696970)  
##                           5462) Customer.Lifetime.Value>=5528.439 46  13 0 (0.717391304 0.282608696)  
##                            10924) Coverage=Basic 32   0 0 (1.000000000 0.000000000) *
##                            10925) Coverage=Extended 14   1 1 (0.071428571 0.928571429) *
##                           5463) Customer.Lifetime.Value< 5528.439 20   2 1 (0.100000000 0.900000000) *
##                      683) Months.Since.Policy.Inception< 13.5 72  27 0 (0.625000000 0.375000000)  
##                       1366) Education=Bachelor 18   0 0 (1.000000000 0.000000000) *
##                       1367) Education=College,High School or Below 54  27 0 (0.500000000 0.500000000)  
##                         2734) Total.Claim.Amount>=295.2 45  18 0 (0.600000000 0.400000000)  
##                           5468) Monthly.Premium.Auto< 66.5 12   0 0 (1.000000000 0.000000000) *
##                           5469) Monthly.Premium.Auto>=66.5 33  15 1 (0.454545455 0.545454545)  
##                            10938) Monthly.Premium.Auto>=70.5 18   5 0 (0.722222222 0.277777778) *
##                            10939) Monthly.Premium.Auto< 70.5 15   2 1 (0.133333333 0.866666667) *
##                         2735) Total.Claim.Amount< 295.2 9   0 1 (0.000000000 1.000000000) *
##                  171) Monthly.Premium.Auto>=101.5 279  86 0 (0.691756272 0.308243728)  
##                    342) Customer.Lifetime.Value< 9619.43 172  32 0 (0.813953488 0.186046512)  
##                      684) Months.Since.Policy.Inception< 46.5 67   0 0 (1.000000000 0.000000000) *
##                      685) Months.Since.Policy.Inception>=46.5 105  32 0 (0.695238095 0.304761905)  
##                       1370) Months.Since.Policy.Inception>=51 87  18 0 (0.793103448 0.206896552)  
##                         2740) Total.Claim.Amount>=516 64   6 0 (0.906250000 0.093750000)  
##                           5480) Months.Since.Last.Claim< 30.5 56   0 0 (1.000000000 0.000000000) *
##                           5481) Months.Since.Last.Claim>=30.5 8   2 1 (0.250000000 0.750000000) *
##                         2741) Total.Claim.Amount< 516 23  11 1 (0.478260870 0.521739130)  
##                           5482) Months.Since.Last.Claim>=3.5 14   4 0 (0.714285714 0.285714286) *
##                           5483) Months.Since.Last.Claim< 3.5 9   1 1 (0.111111111 0.888888889) *
##                       1371) Months.Since.Policy.Inception< 51 18   4 1 (0.222222222 0.777777778) *
##                    343) Customer.Lifetime.Value>=9619.43 107  53 1 (0.495327103 0.504672897)  
##                      686) Number.of.Open.Complaints>=0.5 21   0 0 (1.000000000 0.000000000) *
##                      687) Number.of.Open.Complaints< 0.5 86  32 1 (0.372093023 0.627906977)  
##                       1374) Total.Claim.Amount>=544.8 42  17 0 (0.595238095 0.404761905)  
##                         2748) Months.Since.Last.Claim< 11.5 14   0 0 (1.000000000 0.000000000) *
##                         2749) Months.Since.Last.Claim>=11.5 28  11 1 (0.392857143 0.607142857)  
##                           5498) Customer.Lifetime.Value>=10215.37 17   8 0 (0.529411765 0.470588235) *
##                           5499) Customer.Lifetime.Value< 10215.37 11   2 1 (0.181818182 0.818181818) *
##                       1375) Total.Claim.Amount< 544.8 44   7 1 (0.159090909 0.840909091) *
##               43) Marital.Status=Divorced 276  80 0 (0.710144928 0.289855072)  
##                 86) Sales.Channel=Branch 63   4 0 (0.936507937 0.063492063) *
##                 87) Sales.Channel=Agent,Call Center,Web 213  76 0 (0.643192488 0.356807512)  
##                  174) Total.Claim.Amount< 991.2 197  63 0 (0.680203046 0.319796954)  
##                    348) Income>=26093 99  16 0 (0.838383838 0.161616162)  
##                      696) Customer.Lifetime.Value< 32347.55 91   9 0 (0.901098901 0.098901099) *
##                      697) Customer.Lifetime.Value>=32347.55 8   1 1 (0.125000000 0.875000000) *
##                    349) Income< 26093 98  47 0 (0.520408163 0.479591837)  
##                      698) Income< 25796 85  34 0 (0.600000000 0.400000000)  
##                       1396) Months.Since.Policy.Inception< 92.5 70  21 0 (0.700000000 0.300000000)  
##                         2792) Education=College,Doctor,High School or Below 34   1 0 (0.970588235 0.029411765) *
##                         2793) Education=Bachelor,Master 36  16 1 (0.444444444 0.555555556)  
##                           5586) Gender=M 14   4 0 (0.714285714 0.285714286) *
##                           5587) Gender=F 22   6 1 (0.272727273 0.727272727) *
##                       1397) Months.Since.Policy.Inception>=92.5 15   2 1 (0.133333333 0.866666667) *
##                      699) Income>=25796 13   0 1 (0.000000000 1.000000000) *
##                  175) Total.Claim.Amount>=991.2 16   3 1 (0.187500000 0.812500000) *
##           11) Renew.Offer.Type=Offer2 1991 428 0 (0.785032647 0.214967353)  
##             22) Sales.Channel=Branch,Web 882 139 0 (0.842403628 0.157596372)  
##               44) Monthly.Premium.Auto< 106.5 650  76 0 (0.883076923 0.116923077)  
##                 88) Marital.Status=Married,Single 581  52 0 (0.910499139 0.089500861)  
##                  176) Total.Claim.Amount< 1145.186 574  46 0 (0.919860627 0.080139373)  
##                    352) Total.Claim.Amount>=63.61346 529  33 0 (0.937618147 0.062381853)  
##                      704) Income< 22503.5 130   0 0 (1.000000000 0.000000000) *
##                      705) Income>=22503.5 399  33 0 (0.917293233 0.082706767)  
##                       1410) Marital.Status=Married 333  18 0 (0.945945946 0.054054054) *
##                       1411) Marital.Status=Single 66  15 0 (0.772727273 0.227272727)  
##                         2822) Income>=49629 34   0 0 (1.000000000 0.000000000) *
##                         2823) Income< 49629 32  15 0 (0.531250000 0.468750000)  
##                           5646) Education=Bachelor,Doctor,Master 10   0 0 (1.000000000 0.000000000) *
##                           5647) Education=College,High School or Below 22   7 1 (0.318181818 0.681818182)  
##                            11294) Months.Since.Last.Claim< 27.5 13   6 0 (0.538461538 0.461538462) *
##                            11295) Months.Since.Last.Claim>=27.5 9   0 1 (0.000000000 1.000000000) *
##                    353) Total.Claim.Amount< 63.61346 45  13 0 (0.711111111 0.288888889)  
##                      706) Customer.Lifetime.Value>=2792.734 26   0 0 (1.000000000 0.000000000) *
##                      707) Customer.Lifetime.Value< 2792.734 19   6 1 (0.315789474 0.684210526) *
##                  177) Total.Claim.Amount>=1145.186 7   1 1 (0.142857143 0.857142857) *
##                 89) Marital.Status=Divorced 69  24 0 (0.652173913 0.347826087)  
##                  178) Education=Bachelor,College,High School or Below,Master 58  13 0 (0.775862069 0.224137931)  
##                    356) Months.Since.Policy.Inception< 58.5 36   1 0 (0.972222222 0.027777778) *
##                    357) Months.Since.Policy.Inception>=58.5 22  10 1 (0.454545455 0.545454545)  
##                      714) Income< 74839 12   3 0 (0.750000000 0.250000000) *
##                      715) Income>=74839 10   1 1 (0.100000000 0.900000000) *
##                  179) Education=Doctor 11   0 1 (0.000000000 1.000000000) *
##               45) Monthly.Premium.Auto>=106.5 232  63 0 (0.728448276 0.271551724)  
##                 90) Customer.Lifetime.Value>=9072.068 82   6 0 (0.926829268 0.073170732) *
##                 91) Customer.Lifetime.Value< 9072.068 150  57 0 (0.620000000 0.380000000)  
##                  182) Months.Since.Policy.Inception>=27 95  23 0 (0.757894737 0.242105263)  
##                    364) Monthly.Premium.Auto>=114.5 38   0 0 (1.000000000 0.000000000) *
##                    365) Monthly.Premium.Auto< 114.5 57  23 0 (0.596491228 0.403508772)  
##                      730) Location.Code=Rural 13   0 0 (1.000000000 0.000000000) *
##                      731) Location.Code=Suburban,Urban 44  21 1 (0.477272727 0.522727273)  
##                       1462) Number.of.Open.Complaints>=0.5 11   0 0 (1.000000000 0.000000000) *
##                       1463) Number.of.Open.Complaints< 0.5 33  10 1 (0.303030303 0.696969697)  
##                         2926) Months.Since.Policy.Inception< 65.5 19   9 0 (0.526315789 0.473684211) *
##                         2927) Months.Since.Policy.Inception>=65.5 14   0 1 (0.000000000 1.000000000) *
##                  183) Months.Since.Policy.Inception< 27 55  21 1 (0.381818182 0.618181818)  
##                    366) Months.Since.Last.Claim< 8.5 8   0 0 (1.000000000 0.000000000) *
##                    367) Months.Since.Last.Claim>=8.5 47  13 1 (0.276595745 0.723404255)  
##                      734) Months.Since.Last.Claim>=10.5 33  13 1 (0.393939394 0.606060606)  
##                       1468) Months.Since.Last.Claim< 22 7   0 0 (1.000000000 0.000000000) *
##                       1469) Months.Since.Last.Claim>=22 26   6 1 (0.230769231 0.769230769) *
##                      735) Months.Since.Last.Claim< 10.5 14   0 1 (0.000000000 1.000000000) *
##             23) Sales.Channel=Agent,Call Center 1109 289 0 (0.739404869 0.260595131)  
##               46) Location.Code=Rural,Urban 413  80 0 (0.806295400 0.193704600)  
##                 92) Months.Since.Policy.Inception< 28.5 115   5 0 (0.956521739 0.043478261) *
##                 93) Months.Since.Policy.Inception>=28.5 298  75 0 (0.748322148 0.251677852)  
##                  186) Education=Doctor,Master 45   0 0 (1.000000000 0.000000000) *
##                  187) Education=Bachelor,College,High School or Below 253  75 0 (0.703557312 0.296442688)  
##                    374) Income< 35124.5 33   0 0 (1.000000000 0.000000000) *
##                    375) Income>=35124.5 220  75 0 (0.659090909 0.340909091)  
##                      750) Vehicle.Class=Luxury SUV,SUV 24   0 0 (1.000000000 0.000000000) *
##                      751) Vehicle.Class=Four-Door Car,Sports Car,Two-Door Car 196  75 0 (0.617346939 0.382653061)  
##                       1502) Total.Claim.Amount>=289.2332 34   4 0 (0.882352941 0.117647059)  
##                         3004) Monthly.Premium.Auto< 108 27   0 0 (1.000000000 0.000000000) *
##                         3005) Monthly.Premium.Auto>=108 7   3 1 (0.428571429 0.571428571) *
##                       1503) Total.Claim.Amount< 289.2332 162  71 0 (0.561728395 0.438271605)  
##                         3006) Total.Claim.Amount< 279.439 148  58 0 (0.608108108 0.391891892)  
##                           6012) Monthly.Premium.Auto< 88.5 110  31 0 (0.718181818 0.281818182)  
##                            12024) Months.Since.Last.Claim>=3.5 80  14 0 (0.825000000 0.175000000)  
##                              24048) Customer.Lifetime.Value>=2517.447 66   3 0 (0.954545455 0.045454545) *
##                              24049) Customer.Lifetime.Value< 2517.447 14   3 1 (0.214285714 0.785714286) *
##                            12025) Months.Since.Last.Claim< 3.5 30  13 1 (0.433333333 0.566666667)  
##                              24050) Vehicle.Class=Two-Door Car 10   3 0 (0.700000000 0.300000000) *
##                              24051) Vehicle.Class=Four-Door Car 20   6 1 (0.300000000 0.700000000)  
##                                48102) Gender=F 8   3 0 (0.625000000 0.375000000) *
##                                48103) Gender=M 12   1 1 (0.083333333 0.916666667) *
##                           6013) Monthly.Premium.Auto>=88.5 38  11 1 (0.289473684 0.710526316)  
##                            12026) Months.Since.Last.Claim< 13 9   0 0 (1.000000000 0.000000000) *
##                            12027) Months.Since.Last.Claim>=13 29   2 1 (0.068965517 0.931034483) *
##                         3007) Total.Claim.Amount>=279.439 14   1 1 (0.071428571 0.928571429) *
##               47) Location.Code=Suburban 696 209 0 (0.699712644 0.300287356)  
##                 94) Income< 48520 479 116 0 (0.757828810 0.242171190)  
##                  188) Marital.Status=Single 148   4 0 (0.972972973 0.027027027) *
##                  189) Marital.Status=Divorced,Married 331 112 0 (0.661631420 0.338368580)  
##                    378) Customer.Lifetime.Value< 13172.35 282  81 0 (0.712765957 0.287234043)  
##                      756) Education=High School or Below,Master 89  12 0 (0.865168539 0.134831461)  
##                       1512) Income< 44323.5 77   5 0 (0.935064935 0.064935065) *
##                       1513) Income>=44323.5 12   5 1 (0.416666667 0.583333333) *
##                      757) Education=Bachelor,College,Doctor 193  69 0 (0.642487047 0.357512953)  
##                       1514) Income>=27233.5 56   7 0 (0.875000000 0.125000000) *
##                       1515) Income< 27233.5 137  62 0 (0.547445255 0.452554745)  
##                         3030) Months.Since.Policy.Inception>=77.5 15   0 0 (1.000000000 0.000000000) *
##                         3031) Months.Since.Policy.Inception< 77.5 122  60 1 (0.491803279 0.508196721)  
##                           6062) Total.Claim.Amount< 312 11   0 0 (1.000000000 0.000000000) *
##                           6063) Total.Claim.Amount>=312 111  49 1 (0.441441441 0.558558559)  
##                            12126) Coverage=Premium 9   0 0 (1.000000000 0.000000000) *
##                            12127) Coverage=Basic,Extended 102  40 1 (0.392156863 0.607843137)  
##                              24254) Months.Since.Policy.Inception>=12.5 84  39 1 (0.464285714 0.535714286)  
##                                48508) Months.Since.Policy.Inception< 37.5 15   0 0 (1.000000000 0.000000000) *
##                                48509) Months.Since.Policy.Inception>=37.5 69  24 1 (0.347826087 0.652173913)  
##                                  97018) Months.Since.Policy.Inception>=49.5 41  19 0 (0.536585366 0.463414634)  
##                                   194036) Monthly.Premium.Auto>=72 13   0 0 (1.000000000 0.000000000) *
##                                   194037) Monthly.Premium.Auto< 72 28   9 1 (0.321428571 0.678571429)  
##                                     388074) State=Arizona,Washington 12   5 0 (0.583333333 0.416666667) *
##                                     388075) State=California,Nevada,Oregon 16   2 1 (0.125000000 0.875000000) *
##                                  97019) Months.Since.Policy.Inception< 49.5 28   2 1 (0.071428571 0.928571429) *
##                              24255) Months.Since.Policy.Inception< 12.5 18   1 1 (0.055555556 0.944444444) *
##                    379) Customer.Lifetime.Value>=13172.35 49  18 1 (0.367346939 0.632653061)  
##                      758) Months.Since.Policy.Inception>=62 9   0 0 (1.000000000 0.000000000) *
##                      759) Months.Since.Policy.Inception< 62 40   9 1 (0.225000000 0.775000000)  
##                       1518) Total.Claim.Amount>=571.918 13   6 0 (0.538461538 0.461538462) *
##                       1519) Total.Claim.Amount< 571.918 27   2 1 (0.074074074 0.925925926) *
##                 95) Income>=48520 217  93 0 (0.571428571 0.428571429)  
##                  190) Monthly.Premium.Auto< 68.5 42   7 0 (0.833333333 0.166666667) *
##                  191) Monthly.Premium.Auto>=68.5 175  86 0 (0.508571429 0.491428571)  
##                    382) Monthly.Premium.Auto>=76.5 116  46 0 (0.603448276 0.396551724)  
##                      764) Coverage=Basic 22   0 0 (1.000000000 0.000000000) *
##                      765) Coverage=Extended,Premium 94  46 0 (0.510638298 0.489361702)  
##                       1530) Gender=M 28   6 0 (0.785714286 0.214285714)  
##                         3060) Income>=59440 19   0 0 (1.000000000 0.000000000) *
##                         3061) Income< 59440 9   3 1 (0.333333333 0.666666667) *
##                       1531) Gender=F 66  26 1 (0.393939394 0.606060606)  
##                         3062) Customer.Lifetime.Value>=6645.183 20   7 0 (0.650000000 0.350000000)  
##                           6124) Total.Claim.Amount< 606.2715 10   0 0 (1.000000000 0.000000000) *
##                           6125) Total.Claim.Amount>=606.2715 10   3 1 (0.300000000 0.700000000) *
##                         3063) Customer.Lifetime.Value< 6645.183 46  13 1 (0.282608696 0.717391304)  
##                           6126) Sales.Channel=Call Center 10   4 0 (0.600000000 0.400000000) *
##                           6127) Sales.Channel=Agent 36   7 1 (0.194444444 0.805555556) *
##                    383) Monthly.Premium.Auto< 76.5 59  19 1 (0.322033898 0.677966102)  
##                      766) Months.Since.Last.Claim>=25.5 8   0 0 (1.000000000 0.000000000) *
##                      767) Months.Since.Last.Claim< 25.5 51  11 1 (0.215686275 0.784313725)  
##                       1534) Number.of.Open.Complaints>=0.5 12   4 0 (0.666666667 0.333333333) *
##                       1535) Number.of.Open.Complaints< 0.5 39   3 1 (0.076923077 0.923076923) *
##        3) EmploymentStatus=Retired 196  54 1 (0.275510204 0.724489796)  
##          6) Renew.Offer.Type=Offer3,Offer4 23   2 0 (0.913043478 0.086956522) *
##          7) Renew.Offer.Type=Offer1,Offer2 173  33 1 (0.190751445 0.809248555)  
##           14) Vehicle.Size=Small 18   6 0 (0.666666667 0.333333333) *
##           15) Vehicle.Size=Large,Medsize 155  21 1 (0.135483871 0.864516129)  
##             30) Months.Since.Last.Claim< 10.5 57  15 1 (0.263157895 0.736842105)  
##               60) Customer.Lifetime.Value>=5022.643 21  10 0 (0.523809524 0.476190476)  
##                120) Customer.Lifetime.Value< 10395.74 9   0 0 (1.000000000 0.000000000) *
##                121) Customer.Lifetime.Value>=10395.74 12   2 1 (0.166666667 0.833333333) *
##               61) Customer.Lifetime.Value< 5022.643 36   4 1 (0.111111111 0.888888889) *
##             31) Months.Since.Last.Claim>=10.5 98   6 1 (0.061224490 0.938775510) *
prp(tree1, extra = 1)
## Warning: labs do not fit even at cex 0.15, there may be some overplotting

The full tree is too deep to read: the long variable names and factor levels do not fit in the plot. We therefore prune the tree by raising the complexity parameter (CP), which reduces its depth.

plotcp(tree1)

printcp(tree1)
## 
## Classification tree:
## rpart(formula = Response ~ ., data = insurance.train, method = "class", 
##     cp = 0.001)
## 
## Variables actually used in tree construction:
##  [1] Coverage                      Customer.Lifetime.Value      
##  [3] Education                     EmploymentStatus             
##  [5] Gender                        Income                       
##  [7] Location.Code                 Marital.Status               
##  [9] Monthly.Premium.Auto          Months.Since.Last.Claim      
## [11] Months.Since.Policy.Inception Number.of.Open.Complaints    
## [13] Number.of.Policies            Renew.Offer.Type             
## [15] Sales.Channel                 State                        
## [17] Total.Claim.Amount            Vehicle.Class                
## [19] Vehicle.Size                 
## 
## Root node error: 902/6393 = 0.14109
## 
## n= 6393 
## 
##           CP nsplit rel error  xerror     xstd
## 1  0.0975610      0   1.00000 1.00000 0.030858
## 2  0.0210643      1   0.90244 0.90244 0.029548
## 3  0.0066519      2   0.88137 0.88137 0.029251
## 4  0.0041574      3   0.87472 0.87805 0.029204
## 5  0.0040650     60   0.55765 0.77827 0.027714
## 6  0.0039595     63   0.54545 0.77605 0.027680
## 7  0.0038803     74   0.48780 0.77273 0.027628
## 8  0.0036031     78   0.47228 0.76386 0.027488
## 9  0.0033259     87   0.41685 0.70843 0.026587
## 10 0.0028825     91   0.40355 0.69956 0.026439
## 11 0.0022173     97   0.38581 0.68404 0.026176
## 12 0.0014782    105   0.36807 0.65188 0.025617
## 13 0.0011086    108   0.36364 0.64745 0.025539
## 14 0.0010000    114   0.35698 0.64080 0.025420

We set the CP value to 0.0034. This lies between rows 8 and 9 of the CP table above, so pruning keeps the subtree with 87 splits.

tree_final<-prune(tree1, cp = 0.0034)
prp(tree_final, extra = 1)

rpart.plot(tree_final, extra=1)
## Warning: labs do not fit even at cex 0.15, there may be some overplotting
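
To check which subtree a cut-off of 0.0034 keeps, the CP table can be re-entered by hand and searched. The sketch below is illustrative only: `cp_table`, `cp_cut`, `root_errors` and the other names are our own, the figures are copied from the `printcp(tree1)` output, and the selection rule reflects our understanding of how `prune()` treats `cp`, not a quotation from the rpart documentation.

```r
# CP table copied by hand from the printcp(tree1) output in this report.
cp_table <- data.frame(
  CP = c(0.0975610, 0.0210643, 0.0066519, 0.0041574, 0.0040650,
         0.0039595, 0.0038803, 0.0036031, 0.0033259, 0.0028825,
         0.0022173, 0.0014782, 0.0011086, 0.0010000),
  nsplit = c(0, 1, 2, 3, 60, 63, 74, 78, 87, 91, 97, 105, 108, 114),
  rel_error = c(1.00000, 0.90244, 0.88137, 0.87472, 0.55765, 0.54545,
                0.48780, 0.47228, 0.41685, 0.40355, 0.38581, 0.36807,
                0.36364, 0.35698),
  xerror = c(1.00000, 0.90244, 0.88137, 0.87805, 0.77827, 0.77605,
             0.77273, 0.76386, 0.70843, 0.69956, 0.68404, 0.65188,
             0.64745, 0.64080)
)

cp_cut <- 0.0034      # value passed to prune()
root_errors <- 902    # numerator of the root node error 902/6393

# Assumption: prune() collapses every split whose complexity is at or
# below cp_cut, so the retained subtree is the first row with CP <= cp_cut.
kept_row <- which(cp_table$CP <= cp_cut)[1]
kept <- cp_table[kept_row, ]

# rel error is expressed relative to the root node error, so multiplying
# by the root error count gives the misclassified training rows.
train_errors <- round(kept$rel_error * root_errors)

print(kept)
print(train_errors)

# Row with the lowest cross-validated error, for comparison.
best_row <- which.min(cp_table$xerror)
print(cp_table[best_row, c("CP", "nsplit", "xerror")])
```

Under these assumptions the cut-off keeps row 9 (87 splits), whose relative error of 0.41685 corresponds to about 376 of the 902 root errors. That agrees with the 144 + 232 misclassifications in the in-sample table for the pruned tree. The lowest cross-validated error is in row 14, so a CP of 0.0034 gives up some cross-validated accuracy in exchange for a smaller tree.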

Checking prediction rate of the final tree - In Sample

pred_final<- predict(tree_final, type="class")
table(insurance.train$Response, pred_final, dnn = c("True", "Pred"))
##     Pred
## True    0    1
##    0 5347  144
##    1  232  670

Checking prediction rate of the final tree - Out of Sample

pred_final_test<- predict(tree_final, newdata=insurance.test, type="class")
table(insurance.test$Response, pred_final_test, dnn = c("True", "Pred"))
##     Pred
## True    0    1
##    0 2212  123
##    1  164  242
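
The two tables above can be summarised as rates. The following is a minimal sketch in base R, with the counts copied by hand from the tables; `confusion_rates` is a hypothetical helper of our own, and class "1" is assumed to be the positive (offer accepted) class.

```r
# Counts copied from the two confusion tables (rows = true, cols = predicted).
cm_train <- matrix(c(5347, 144, 232, 670), nrow = 2, byrow = TRUE,
                   dimnames = list(True = c("0", "1"), Pred = c("0", "1")))
cm_test <- matrix(c(2212, 123, 164, 242), nrow = 2, byrow = TRUE,
                  dimnames = list(True = c("0", "1"), Pred = c("0", "1")))

# Hypothetical helper: summary rates from a 2x2 table, treating "1" as positive.
confusion_rates <- function(cm) {
  c(accuracy    = sum(diag(cm)) / sum(cm),        # share classified correctly
    sensitivity = cm["1", "1"] / sum(cm["1", ]),  # responders that were found
    specificity = cm["0", "0"] / sum(cm["0", ]),  # non-responders that were found
    precision   = cm["1", "1"] / sum(cm[, "1"]))  # predicted responders that are real
}

rates <- rbind(train = confusion_rates(cm_train),
               test  = confusion_rates(cm_test))
print(round(rates, 4))
```

On these counts the accuracy is about 0.941 in sample and 0.895 out of sample, while sensitivity falls from about 0.74 to about 0.60. The pruned tree therefore finds a smaller share of responders on unseen data than on the training data.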

Checking AUC value for the training dataset - In Sample

pred.traintree = prediction(as.double(pred_final), insurance.train$Response)
perf = performance(pred.traintree, "tpr", "fpr")
plot(perf, colorize=TRUE)

unlist(slot(performance(pred.traintree,"auc"),"y.values"))
## [1] 0.8582845
str(pred_final)
##  Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

Checking AUC value for the testing dataset - Out of Sample

pred.traintree.test = prediction(as.double(pred_final_test), insurance.test$Response)
perf = performance(pred.traintree.test, "tpr", "fpr")
plot(perf, colorize=TRUE)

unlist(slot(performance(pred.traintree.test,"auc"),"y.values"))
## [1] 0.7716912
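
Because `pred_final` and `pred_final_test` hold class labels and not probabilities, the ROC curve passed to `performance()` has a single interior point. In that case the area under it reduces to the average of the true positive and true negative rates. The sketch below, with a hypothetical helper `auc_from_labels` and counts copied from the confusion tables, is one way to check this against the two AUC values reported above.

```r
# With hard 0/1 predictions the ROC curve runs (0,0) -> (FPR, TPR) -> (1,1),
# so the area under it is (TPR + TNR) / 2.
auc_from_labels <- function(tn, fp, fn, tp) {
  tpr <- tp / (tp + fn)   # true positive rate
  tnr <- tn / (tn + fp)   # true negative rate
  (tpr + tnr) / 2
}

# Counts taken from the in-sample and out-of-sample confusion tables.
auc_train <- auc_from_labels(tn = 5347, fp = 144, fn = 232, tp = 670)
auc_test  <- auc_from_labels(tn = 2212, fp = 123, fn = 164, tp = 242)
print(c(train = auc_train, test = auc_test))
```

Both results match the reported 0.8582845 and 0.7716912, so these figures measure balanced accuracy at one cut-off. Passing class probabilities to `prediction()`, for example `predict(tree_final, newdata = insurance.test, type = "prob")[, "1"]`, would trace a full ROC curve. We have not run that variant here, so no value is reported for it.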

Misclassification rate for training data

MR.treetrain<- mean(insurance.train$Response!= pred_final)
MR.treetrain
## [1] 0.05881433

Misclassification rate for testing data

MR.treetest<- mean(insurance.test$Response!= pred_final_test)
MR.treetest
## [1] 0.1047063


Conclusion

We have built two final models for our dataset, a Logistic Regression model and a Decision Tree model, to predict a customer's response (Yes/No) to the offer from the customer's profile. Comparing the two final models on accuracy, the Decision Tree model is our Champion: it classifies about 89.5% of the test data correctly (misclassification rate 0.1047), with an out-of-sample AUC of 0.77. The Logistic model is our Challenger, with an accuracy of around 86% and an AUC of about 0.81.