The Marketing Customer Value dataset provided by IBM Watson Analytics contains information about customers. This rich dataset can be used to predict customer behavior and to support retention: analyzing it helps us understand customers' buying habits and demographics and develop focused customer retention programs.
We can also identify the most profitable customers, study how they interact with the company, and take targeted actions to increase their response, retention, and growth.
The purpose of this project is to predict, from the customer profile, whether a customer will accept or reject the offer we make. The profile is built from details such as employment status, marital status, and income group.
We’ve created two models on top of this dataset, using Logistic Regression and a Decision Tree. Each model is fitted on the training dataset, and its accuracy is determined by comparing its predictions on the testing dataset with the responses we already have. Based on accuracy, we then designate the champion model and the challenger model.
The report walks through the stages of the regression model-building process: model specification, parameter estimation, model adequacy checking, and model validation.
It concludes by selecting the champion model on the basis of accuracy; that model predicts whether a customer will accept or reject the offer.
We picked the dataset from the IBM Watson Analytics Gallery. It records whether each customer accepted or rejected the offer extended to them, together with a customer profile containing personal, policy, and vehicle information. The customer profile is built from the personal information displayed in tabular format below.
From the available attributes, we are initially considering the following covariates:
We have 9134 unique observations in total. We created the training and testing datasets by splitting the original dataset in a 70-30 ratio: 70% of the data is used for training and the remaining 30% for testing the model.
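A 70/30 split of this kind can be sketched in base R as follows. The toy data frame `toy` and the seed value are assumptions made purely for illustration; they are not the report's data or seed.

```r
# Minimal sketch of a 70/30 train/test split on a hypothetical toy data frame.
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
toy <- data.frame(id = 1:10, y = rep(c(0, 1), 5))

n_train   <- floor(0.70 * nrow(toy))      # 70% of the rows, rounded down
train_idx <- sample(nrow(toy), n_train)   # row numbers drawn without replacement
toy_train <- toy[train_idx, ]             # training portion
toy_test  <- toy[-train_idx, ]            # remaining 30% for testing
```

Because the row indices are drawn without replacement and the test set uses negative indexing, no observation can appear in both parts.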
Import all the libraries:
library('ggcorrplot')
## Warning: package 'ggcorrplot' was built under R version 3.6.3
## Loading required package: ggplot2
library('ggplot2')
library('ROCR')
## Warning: package 'ROCR' was built under R version 3.6.3
## Loading required package: gplots
## Warning: package 'gplots' was built under R version 3.6.2
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library('car')
## Loading required package: carData
library("dplyr")
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library('rpart')
library('tidyverse')
## -- Attaching packages ------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble 2.1.3 v purrr 0.3.2
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x dplyr::recode() masks car::recode()
## x purrr::some() masks car::some()
library('corrgram')
## Warning: package 'corrgram' was built under R version 3.6.2
## Registered S3 method overwritten by 'seriation':
## method from
## reorder.hclust gclus
library('glmnet')
## Warning: package 'glmnet' was built under R version 3.6.2
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
##
## expand
## Loaded glmnet 3.0-2
library('boot')
##
## Attaching package: 'boot'
## The following object is masked from 'package:car':
##
## logit
Loading the dataset.
insurance.data<-read.csv("~/Assignment/WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv")
head(insurance.data)
## Customer State Customer.Lifetime.Value Response Coverage Education
## 1 BU79786 Washington 2763.519 No Basic Bachelor
## 2 QZ44356 Arizona 6979.536 No Extended Bachelor
## 3 AI49188 Nevada 12887.432 No Premium Bachelor
## 4 WW63253 California 7645.862 No Basic Bachelor
## 5 HB64268 Washington 2813.693 No Basic Bachelor
## 6 OC83172 Oregon 8256.298 Yes Basic Bachelor
## Effective.To.Date EmploymentStatus Gender Income Location.Code
## 1 2/24/11 Employed F 56274 Suburban
## 2 1/31/11 Unemployed F 0 Suburban
## 3 2/19/11 Employed F 48767 Suburban
## 4 1/20/11 Unemployed M 0 Suburban
## 5 2/3/11 Employed M 43836 Rural
## 6 1/25/11 Employed F 62902 Rural
## Marital.Status Monthly.Premium.Auto Months.Since.Last.Claim
## 1 Married 69 32
## 2 Single 94 13
## 3 Married 108 18
## 4 Married 106 18
## 5 Single 73 12
## 6 Married 69 14
## Months.Since.Policy.Inception Number.of.Open.Complaints
## 1 5 0
## 2 42 0
## 3 38 0
## 4 65 0
## 5 44 0
## 6 94 0
## Number.of.Policies Policy.Type Policy Renew.Offer.Type
## 1 1 Corporate Auto Corporate L3 Offer1
## 2 8 Personal Auto Personal L3 Offer3
## 3 2 Personal Auto Personal L3 Offer1
## 4 7 Corporate Auto Corporate L2 Offer1
## 5 1 Personal Auto Personal L1 Offer1
## 6 2 Personal Auto Personal L3 Offer2
## Sales.Channel Total.Claim.Amount Vehicle.Class Vehicle.Size
## 1 Agent 384.8111 Two-Door Car Medsize
## 2 Agent 1131.4649 Four-Door Car Medsize
## 3 Agent 566.4722 Two-Door Car Medsize
## 4 Call Center 529.8813 SUV Medsize
## 5 Agent 138.1309 Four-Door Car Medsize
## 6 Web 159.3830 Two-Door Car Medsize
## Data Exploration
sapply(insurance.data, class)
## Customer State
## "factor" "factor"
## Customer.Lifetime.Value Response
## "numeric" "factor"
## Coverage Education
## "factor" "factor"
## Effective.To.Date EmploymentStatus
## "factor" "factor"
## Gender Income
## "factor" "integer"
## Location.Code Marital.Status
## "factor" "factor"
## Monthly.Premium.Auto Months.Since.Last.Claim
## "integer" "integer"
## Months.Since.Policy.Inception Number.of.Open.Complaints
## "integer" "integer"
## Number.of.Policies Policy.Type
## "integer" "factor"
## Policy Renew.Offer.Type
## "factor" "factor"
## Sales.Channel Total.Claim.Amount
## "factor" "numeric"
## Vehicle.Class Vehicle.Size
## "factor" "factor"
summary(insurance.data)
## Customer State Customer.Lifetime.Value Response
## AA10041: 1 Arizona :1703 Min. : 1898 No :7826
## AA11235: 1 California:3150 1st Qu.: 3994 Yes:1308
## AA16582: 1 Nevada : 882 Median : 5780
## AA30683: 1 Oregon :2601 Mean : 8005
## AA34092: 1 Washington: 798 3rd Qu.: 8962
## AA35519: 1 Max. :83325
## (Other):9128
## Coverage Education Effective.To.Date
## Basic :5568 Bachelor :2748 1/10/11: 195
## Extended:2742 College :2681 1/27/11: 194
## Premium : 824 Doctor : 342 2/14/11: 186
## High School or Below:2622 1/26/11: 181
## Master : 741 1/17/11: 180
## 1/19/11: 179
## (Other):8019
## EmploymentStatus Gender Income Location.Code
## Disabled : 405 F:4658 Min. : 0 Rural :1773
## Employed :5698 M:4476 1st Qu.: 0 Suburban:5779
## Medical Leave: 432 Median :33890 Urban :1582
## Retired : 282 Mean :37657
## Unemployed :2317 3rd Qu.:62320
## Max. :99981
##
## Marital.Status Monthly.Premium.Auto Months.Since.Last.Claim
## Divorced:1369 Min. : 61.00 Min. : 0.0
## Married :5298 1st Qu.: 68.00 1st Qu.: 6.0
## Single :2467 Median : 83.00 Median :14.0
## Mean : 93.22 Mean :15.1
## 3rd Qu.:109.00 3rd Qu.:23.0
## Max. :298.00 Max. :35.0
##
## Months.Since.Policy.Inception Number.of.Open.Complaints
## Min. : 0.00 Min. :0.0000
## 1st Qu.:24.00 1st Qu.:0.0000
## Median :48.00 Median :0.0000
## Mean :48.06 Mean :0.3844
## 3rd Qu.:71.00 3rd Qu.:0.0000
## Max. :99.00 Max. :5.0000
##
## Number.of.Policies Policy.Type Policy
## Min. :1.000 Corporate Auto:1968 Personal L3 :3426
## 1st Qu.:1.000 Personal Auto :6788 Personal L2 :2122
## Median :2.000 Special Auto : 378 Personal L1 :1240
## Mean :2.966 Corporate L3:1014
## 3rd Qu.:4.000 Corporate L2: 595
## Max. :9.000 Corporate L1: 359
## (Other) : 378
## Renew.Offer.Type Sales.Channel Total.Claim.Amount
## Offer1:3752 Agent :3477 Min. : 0.099
## Offer2:2926 Branch :2567 1st Qu.: 272.258
## Offer3:1432 Call Center:1765 Median : 383.945
## Offer4:1024 Web :1325 Mean : 434.089
## 3rd Qu.: 547.515
## Max. :2893.240
##
## Vehicle.Class Vehicle.Size
## Four-Door Car:4621 Large : 946
## Luxury Car : 163 Medsize:6424
## Luxury SUV : 184 Small :1764
## Sports Car : 484
## SUV :1796
## Two-Door Car :1886
##
str(insurance.data)
## 'data.frame': 9134 obs. of 24 variables:
## $ Customer : Factor w/ 9134 levels "AA10041","AA11235",..: 601 5947 97 8017 2489 4948 8434 756 1352 548 ...
## $ State : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
## $ Customer.Lifetime.Value : num 2764 6980 12887 7646 2814 ...
## $ Response : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 2 1 2 1 ...
## $ Coverage : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
## $ Education : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
## $ Effective.To.Date : Factor w/ 59 levels "1/1/11","1/10/11",..: 48 25 42 13 53 18 48 10 19 40 ...
## $ EmploymentStatus : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
## $ Income : int 56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
## $ Location.Code : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
## $ Marital.Status : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
## $ Monthly.Premium.Auto : int 69 94 108 106 73 69 67 101 71 93 ...
## $ Months.Since.Last.Claim : int 32 13 18 18 12 14 0 0 13 17 ...
## $ Months.Since.Policy.Inception: int 5 42 38 65 44 94 13 68 3 7 ...
## $ Number.of.Open.Complaints : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Number.of.Policies : int 1 8 2 7 1 2 9 4 2 8 ...
## $ Policy.Type : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
## $ Policy : Factor w/ 9 levels "Corporate L1",..: 3 6 6 2 4 6 3 3 3 8 ...
## $ Renew.Offer.Type : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
## $ Sales.Channel : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
## $ Total.Claim.Amount : num 385 1131 566 530 138 ...
## $ Vehicle.Class : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
## $ Vehicle.Size : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...
glimpse(insurance.data)
## Observations: 9,134
## Variables: 24
## $ Customer <fct> BU79786, QZ44356, AI49188, WW632...
## $ State <fct> Washington, Arizona, Nevada, Cal...
## $ Customer.Lifetime.Value <dbl> 2763.519, 6979.536, 12887.432, 7...
## $ Response <fct> No, No, No, No, No, Yes, Yes, No...
## $ Coverage <fct> Basic, Extended, Premium, Basic,...
## $ Education <fct> Bachelor, Bachelor, Bachelor, Ba...
## $ Effective.To.Date <fct> 2/24/11, 1/31/11, 2/19/11, 1/20/...
## $ EmploymentStatus <fct> Employed, Unemployed, Employed, ...
## $ Gender <fct> F, F, F, M, M, F, F, M, M, F, M,...
## $ Income <int> 56274, 0, 48767, 0, 43836, 62902...
## $ Location.Code <fct> Suburban, Suburban, Suburban, Su...
## $ Marital.Status <fct> Married, Single, Married, Marrie...
## $ Monthly.Premium.Auto <int> 69, 94, 108, 106, 73, 69, 67, 10...
## $ Months.Since.Last.Claim <int> 32, 13, 18, 18, 12, 14, 0, 0, 13...
## $ Months.Since.Policy.Inception <int> 5, 42, 38, 65, 44, 94, 13, 68, 3...
## $ Number.of.Open.Complaints <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Number.of.Policies <int> 1, 8, 2, 7, 1, 2, 9, 4, 2, 8, 3,...
## $ Policy.Type <fct> Corporate Auto, Personal Auto, P...
## $ Policy <fct> Corporate L3, Personal L3, Perso...
## $ Renew.Offer.Type <fct> Offer1, Offer3, Offer1, Offer1, ...
## $ Sales.Channel <fct> Agent, Agent, Agent, Call Center...
## $ Total.Claim.Amount <dbl> 384.81115, 1131.46493, 566.47225...
## $ Vehicle.Class <fct> Two-Door Car, Four-Door Car, Two...
## $ Vehicle.Size <fct> Medsize, Medsize, Medsize, Medsi...
Use the sapply() function to count the number of missing values (NA) in each feature.
sapply(insurance.data, function(x) sum(is.na(x)))
## Customer State
## 0 0
## Customer.Lifetime.Value Response
## 0 0
## Coverage Education
## 0 0
## Effective.To.Date EmploymentStatus
## 0 0
## Gender Income
## 0 0
## Location.Code Marital.Status
## 0 0
## Monthly.Premium.Auto Months.Since.Last.Claim
## 0 0
## Months.Since.Policy.Inception Number.of.Open.Complaints
## 0 0
## Number.of.Policies Policy.Type
## 0 0
## Policy Renew.Offer.Type
## 0 0
## Sales.Channel Total.Claim.Amount
## 0 0
## Vehicle.Class Vehicle.Size
## 0 0
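The same per-column count of missing values can also be obtained with `colSums(is.na(...))`. The sketch below uses a small hypothetical data frame (`toy`) with a few NA values planted on purpose, since the report's dataset has none.

```r
# Hypothetical toy data frame with deliberately missing values.
toy <- data.frame(
  income = c(100, NA, 300, NA),
  state  = c("A", "B", NA, "D"),
  claims = c(1, 2, 3, 4)
)

# is.na() returns a logical matrix; colSums() counts the TRUE entries per column.
na_per_column <- colSums(is.na(toy))

# The same counts via the sapply() idiom used in this report.
na_sapply <- sapply(toy, function(x) sum(is.na(x)))

print(na_per_column)
```

Both approaches give one count per column; `colSums()` is simply the more compact spelling.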
Similarly, the number of unique values per column is shown below.
sapply(insurance.data, function(x) length(unique(x)))
## Customer State
## 9134 5
## Customer.Lifetime.Value Response
## 8041 2
## Coverage Education
## 3 5
## Effective.To.Date EmploymentStatus
## 59 5
## Gender Income
## 2 5694
## Location.Code Marital.Status
## 3 3
## Monthly.Premium.Auto Months.Since.Last.Claim
## 202 36
## Months.Since.Policy.Inception Number.of.Open.Complaints
## 100 6
## Number.of.Policies Policy.Type
## 9 3
## Policy Renew.Offer.Type
## 9 4
## Sales.Channel Total.Claim.Amount
## 4 5106
## Vehicle.Class Vehicle.Size
## 6 3
Using the missmap() function from the Amelia package, we can visualize the missing and observed values per feature. Consistent with the counts above, no feature in this dataset has missing values.
library(Amelia)
## Warning: package 'Amelia' was built under R version 3.6.3
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.7.6, built: 2019-11-24)
## ## Copyright (C) 2005-2020 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
missmap(insurance.data, main = "Missing Values vs. Observed")
Our data contains 9134 customers with information about their income, education, gender, residence, and so on.
Each customer owns a car, and you, as the entrepreneur, make them one of 4 different car insurance offers. The target of this dataset is Response, which can be “Yes” (the customer accepted the offer) or “No” (the customer did not accept the offer).
#Using Graphs to understand our Data
# Relation between numerical variables
nums <- unlist(lapply(insurance.data, is.numeric))
insurance_numeric<-insurance.data[,nums]
corr<-cor(insurance_numeric)
library(ggcorrplot)
ggcorrplot(corr, hc.order = TRUE, type = "lower",lab = TRUE)
## Exploratory Data Analysis
Relation between categorical variables and the response variable
Gender -> Response
library(ggcorrplot)
tbl_gen <- with(insurance.data, table(Gender, Response))
ggplot(as.data.frame(tbl_gen), aes(factor(Response),Freq, fill=Gender) )+ geom_col(position = 'dodge')
State - > Response
library(ggcorrplot)
tbl_State <- with(insurance.data, table(State, Response))
ggplot(as.data.frame(tbl_State), aes(factor(State),Freq, fill=Response) )+ geom_col(position = 'dodge')
Coverage -> Response
library(ggcorrplot)
tbl_Coverage <- with(insurance.data, table(Coverage, Response))
ggplot(as.data.frame(tbl_Coverage), aes(factor(Coverage),Freq, fill=Response) )+ geom_col(position = 'dodge')
Education -> Response
library(ggcorrplot)
tbl_Education <- with(insurance.data, table(Education, Response))
ggplot(as.data.frame(tbl_Education), aes(factor(Education),Freq, fill=Response) )+ geom_col(position = 'dodge')
EmploymentStatus -> Response
library(ggcorrplot)
tbl_EmploymentStatus <- with(insurance.data, table(EmploymentStatus, Response))
ggplot(as.data.frame(tbl_EmploymentStatus), aes(factor(EmploymentStatus),Freq, fill=Response) )+ geom_col(position = 'dodge')
Location Code - > Response
library(ggcorrplot)
tbl_LocationCode <- with(insurance.data, table(Location.Code, Response))
ggplot(as.data.frame(tbl_LocationCode), aes(factor(Location.Code),Freq, fill=Response) )+ geom_col(position = 'dodge')
Marital.Status -> Response
library(ggcorrplot)
tbl_MaritalStatus <- with(insurance.data, table(Marital.Status, Response))
ggplot(as.data.frame(tbl_MaritalStatus), aes(factor(Marital.Status),Freq, fill=Response) )+ geom_col(position = 'dodge')
Monthly.Premium.Auto -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Monthly.Premium.Auto,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Months.Since.Last.Claim -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Months.Since.Last.Claim ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Months.Since.Policy.Inception -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Months.Since.Policy.Inception ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Number.of.Open.Complaints -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Number.of.Open.Complaints ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Number.of.Policies -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Number.of.Policies ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Policy.Type -> Response
library(ggcorrplot)
tbl_PolicyType <- with(insurance.data, table(Policy.Type, Response))
ggplot(as.data.frame(tbl_PolicyType), aes(factor(Policy.Type),Freq, fill=Response) )+ geom_col(position = 'dodge')
Renew.Offer.Type -> Response
library(ggcorrplot)
tbl_RenewOfferType <- with(insurance.data, table(Renew.Offer.Type, Response))
ggplot(as.data.frame(tbl_RenewOfferType), aes(factor(Renew.Offer.Type),Freq, fill=Response) )+ geom_col(position = 'dodge')
Sales.Channel -> Response
library(ggcorrplot)
tbl_SalesChannel <- with(insurance.data, table(Sales.Channel, Response))
ggplot(as.data.frame(tbl_SalesChannel), aes(factor(Sales.Channel),Freq, fill=Response) )+ geom_col(position = 'dodge')
Total.Claim.Amount -> Response
library(ggcorrplot)
ggplot(insurance.data, aes(x = Total.Claim.Amount ,fill=Response)) + geom_histogram(position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Vehicle.Class -> Response
library(ggcorrplot)
tbl_VehicleClass <- with(insurance.data, table(Vehicle.Class, Response))
ggplot(as.data.frame(tbl_VehicleClass), aes(factor(Vehicle.Class),Freq, fill=Response) )+ geom_col(position = 'dodge')
Vehicle.Size -> Response
library(ggcorrplot)
tbl_VehicleSize <- with(insurance.data, table(Vehicle.Size, Response))
ggplot(as.data.frame(tbl_VehicleSize), aes(factor(Vehicle.Size),Freq, fill=Response) )+ geom_col(position = 'dodge')
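The bar charts above show raw counts. Because "No" heavily outnumbers "Yes", acceptance rates per category can be easier to compare than counts. A minimal sketch with `prop.table()`, using a hypothetical toy data frame rather than the report's data:

```r
# Hypothetical toy data: two sales channels with different acceptance rates.
toy <- data.frame(
  Channel  = c("Agent", "Agent", "Agent", "Agent", "Web", "Web", "Web", "Web"),
  Response = c("Yes", "No", "No", "No", "Yes", "Yes", "No", "No")
)

counts <- table(toy$Channel, toy$Response)   # rows = category, columns = response
rates  <- prop.table(counts, margin = 1)     # margin = 1 makes each row sum to 1
yes_rate <- rates[, "Yes"]                   # share of "Yes" within each category

print(round(yes_rate, 2))
```

Applied to the real data, the same pattern would use, for example, `table(insurance.data$Sales.Channel, insurance.data$Response)`.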
## Data Wrangling - cleaning
All categorical features are well distributed, so I will keep them and encode them as numerical data. Some columns are uninformative or redundant: Customer is just a unique identifier, Policy is a finer-grained version of Policy Type, and Effective To Date is not important here, so I will drop these three. The data is imbalanced with respect to the outcome “Response”.
insurance.data = subset(insurance.data , select = -c(Customer,Policy,Effective.To.Date) )
str(insurance.data)
## 'data.frame': 9134 obs. of 21 variables:
## $ State : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
## $ Customer.Lifetime.Value : num 2764 6980 12887 7646 2814 ...
## $ Response : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 2 1 2 1 ...
## $ Coverage : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
## $ Education : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
## $ EmploymentStatus : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
## $ Income : int 56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
## $ Location.Code : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
## $ Marital.Status : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
## $ Monthly.Premium.Auto : int 69 94 108 106 73 69 67 101 71 93 ...
## $ Months.Since.Last.Claim : int 32 13 18 18 12 14 0 0 13 17 ...
## $ Months.Since.Policy.Inception: int 5 42 38 65 44 94 13 68 3 7 ...
## $ Number.of.Open.Complaints : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Number.of.Policies : int 1 8 2 7 1 2 9 4 2 8 ...
## $ Policy.Type : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
## $ Renew.Offer.Type : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
## $ Sales.Channel : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
## $ Total.Claim.Amount : num 385 1131 566 530 138 ...
## $ Vehicle.Class : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
## $ Vehicle.Size : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...
Encode the categorical data as numerical
encode_ordinal <- function(x, order = unique(x)) {
x <- as.numeric(factor(x, levels = order, exclude = NULL))
x
}
table(insurance.data[["Response"]], encode_ordinal(insurance.data[["Response"]]), useNA = "ifany")
##
## 1 2
## No 7826 0
## Yes 0 1308
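As the table shows, encode_ordinal() maps "No" to 1 and "Yes" to 2, following the order of first appearance. A binomial model needs a 0/1 response, so a more direct alternative is to compare against the positive label. The sketch below uses a short hypothetical vector, not the report's column.

```r
# Hypothetical response values standing in for insurance.data$Response.
response <- factor(c("No", "No", "Yes", "No", "Yes"))

# TRUE/FALSE from the comparison becomes 1/0, so "Yes" is the positive class.
response01 <- as.integer(response == "Yes")

print(table(response, response01))
```

This makes the meaning of 1 explicit and does not depend on which label happens to occur first in the data.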
Updated Dataset
insurance.data.new <- insurance.data
insurance.data.new[["Response"]] <- encode_ordinal(insurance.data[["Response"]])
head(insurance.data.new)
## State Customer.Lifetime.Value Response Coverage Education
## 1 Washington 2763.519 1 Basic Bachelor
## 2 Arizona 6979.536 1 Extended Bachelor
## 3 Nevada 12887.432 1 Premium Bachelor
## 4 California 7645.862 1 Basic Bachelor
## 5 Washington 2813.693 1 Basic Bachelor
## 6 Oregon 8256.298 2 Basic Bachelor
## EmploymentStatus Gender Income Location.Code Marital.Status
## 1 Employed F 56274 Suburban Married
## 2 Unemployed F 0 Suburban Single
## 3 Employed F 48767 Suburban Married
## 4 Unemployed M 0 Suburban Married
## 5 Employed M 43836 Rural Single
## 6 Employed F 62902 Rural Married
## Monthly.Premium.Auto Months.Since.Last.Claim
## 1 69 32
## 2 94 13
## 3 108 18
## 4 106 18
## 5 73 12
## 6 69 14
## Months.Since.Policy.Inception Number.of.Open.Complaints
## 1 5 0
## 2 42 0
## 3 38 0
## 4 65 0
## 5 44 0
## 6 94 0
## Number.of.Policies Policy.Type Renew.Offer.Type Sales.Channel
## 1 1 Corporate Auto Offer1 Agent
## 2 8 Personal Auto Offer3 Agent
## 3 2 Personal Auto Offer1 Agent
## 4 7 Corporate Auto Offer1 Call Center
## 5 1 Personal Auto Offer1 Agent
## 6 2 Personal Auto Offer2 Web
## Total.Claim.Amount Vehicle.Class Vehicle.Size
## 1 384.8111 Two-Door Car Medsize
## 2 1131.4649 Four-Door Car Medsize
## 3 566.4722 Two-Door Car Medsize
## 4 529.8813 SUV Medsize
## 5 138.1309 Four-Door Car Medsize
## 6 159.3830 Two-Door Car Medsize
str(insurance.data.new)
## 'data.frame': 9134 obs. of 21 variables:
## $ State : Factor w/ 5 levels "Arizona","California",..: 5 1 3 2 5 4 4 1 4 4 ...
## $ Customer.Lifetime.Value : num 2764 6980 12887 7646 2814 ...
## $ Response : num 1 1 1 1 1 2 2 1 2 1 ...
## $ Coverage : Factor w/ 3 levels "Basic","Extended",..: 1 2 3 1 1 1 1 3 1 2 ...
## $ Education : Factor w/ 5 levels "Bachelor","College",..: 1 1 1 1 1 1 2 5 1 2 ...
## $ EmploymentStatus : Factor w/ 5 levels "Disabled","Employed",..: 2 5 2 5 2 2 2 5 3 2 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 1 1 2 2 1 1 2 2 1 ...
## $ Income : int 56274 0 48767 0 43836 62902 55350 0 14072 28812 ...
## $ Location.Code : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 2 1 1 2 3 2 3 ...
## $ Marital.Status : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 2 3 2 2 3 1 2 ...
## $ Monthly.Premium.Auto : int 69 94 108 106 73 69 67 101 71 93 ...
## $ Months.Since.Last.Claim : int 32 13 18 18 12 14 0 0 13 17 ...
## $ Months.Since.Policy.Inception: int 5 42 38 65 44 94 13 68 3 7 ...
## $ Number.of.Open.Complaints : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Number.of.Policies : int 1 8 2 7 1 2 9 4 2 8 ...
## $ Policy.Type : Factor w/ 3 levels "Corporate Auto",..: 1 2 2 1 2 2 1 1 1 3 ...
## $ Renew.Offer.Type : Factor w/ 4 levels "Offer1","Offer2",..: 1 3 1 1 1 2 1 1 1 2 ...
## $ Sales.Channel : Factor w/ 4 levels "Agent","Branch",..: 1 1 1 3 1 4 1 1 1 2 ...
## $ Total.Claim.Amount : num 385 1131 566 530 138 ...
## $ Vehicle.Class : Factor w/ 6 levels "Four-Door Car",..: 6 1 6 5 1 6 1 1 1 1 ...
## $ Vehicle.Size : Factor w/ 3 levels "Large","Medsize",..: 2 2 2 2 2 2 2 2 2 2 ...
## Correlation Graph
Analyzing the relationship between feature variables and the target variable
nums_new <- unlist(lapply(insurance.data.new, is.numeric))
insurance_numeric_new<-insurance.data.new[,nums_new]
corrnew<-cor(insurance_numeric_new)
library(ggcorrplot)
ggcorrplot(corrnew, hc.order = TRUE, type = "lower",lab = TRUE)
library("ggplot2")
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
melted_cormat <- melt(corrnew)
head(melted_cormat)
## Var1 Var2 value
## 1 Customer.Lifetime.Value Customer.Lifetime.Value 1.000000000
## 2 Response Customer.Lifetime.Value -0.008929582
## 3 Income Customer.Lifetime.Value 0.024365661
## 4 Monthly.Premium.Auto Customer.Lifetime.Value 0.396261738
## 5 Months.Since.Last.Claim Customer.Lifetime.Value 0.011516682
## 6 Months.Since.Policy.Inception Customer.Lifetime.Value 0.009418381
library(ggplot2)
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()+theme(axis.text.x=element_text(angle = 90))
## Model building
## Logistic regression
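# Recode Response from {1, 2} to {0, 1} so that 1 means "Yes" (offer accepted)
insurance.data.new$Response[insurance.data.new$Response==1] <- 0
insurance.data.new$Response[insurance.data.new$Response==2] <- 1
# These columns were already dropped above; setdiff() is then a no-op, whereas -which() on an empty match would drop every column
insurance.datas <- insurance.data[ , setdiff(names(insurance.data), c("Customer","Policy","Effective.To.Date"))]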
Split the data
set.seed(13255870)
index <- sample(nrow(insurance.data.new),nrow(insurance.data.new)*0.70)
insurance.train = insurance.data.new[index,]
insurance.test = insurance.data.new[-index,]
str(insurance.train)
## 'data.frame': 6393 obs. of 21 variables:
## $ State : Factor w/ 5 levels "Arizona","California",..: 1 4 4 2 4 1 2 1 4 1 ...
## $ Customer.Lifetime.Value : num 4014 5511 8305 2787 8677 ...
## $ Response : num 1 0 0 0 0 0 0 0 0 0 ...
## $ Coverage : Factor w/ 3 levels "Basic","Extended",..: 2 1 2 1 1 2 3 1 1 2 ...
## $ Education : Factor w/ 5 levels "Bachelor","College",..: 3 4 2 1 4 2 2 5 2 4 ...
## $ EmploymentStatus : Factor w/ 5 levels "Disabled","Employed",..: 2 5 5 2 2 2 2 2 5 2 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 1 2 2 2 2 1 1 1 2 ...
## $ Income : int 37384 0 0 38667 76214 25899 92850 51199 0 53603 ...
## $ Location.Code : Factor w/ 3 levels "Rural","Suburban",..: 2 2 2 1 3 2 2 2 2 2 ...
## $ Marital.Status : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 3 2 1 3 1 3 2 ...
## $ Monthly.Premium.Auto : int 99 73 122 72 72 79 104 74 72 132 ...
## $ Months.Since.Last.Claim : int 9 24 22 8 7 10 3 19 15 30 ...
## $ Months.Since.Policy.Inception: int 17 57 14 67 48 11 28 52 70 1 ...
## $ Number.of.Open.Complaints : int 0 0 2 0 0 0 1 0 0 1 ...
## $ Number.of.Policies : int 1 4 9 1 2 8 2 1 2 1 ...
## $ Policy.Type : Factor w/ 3 levels "Corporate Auto",..: 2 2 1 2 3 2 2 1 2 2 ...
## $ Renew.Offer.Type : Factor w/ 4 levels "Offer1","Offer2",..: 2 1 2 1 2 4 1 1 2 1 ...
## $ Sales.Channel : Factor w/ 4 levels "Agent","Branch",..: 1 2 1 1 4 2 1 2 2 2 ...
## $ Total.Claim.Amount : num 475 526 681 159 203 ...
## $ Vehicle.Class : Factor w/ 6 levels "Four-Door Car",..: 1 1 5 1 1 6 1 1 1 4 ...
## $ Vehicle.Size : Factor w/ 3 levels "Large","Medsize",..: 2 1 3 2 2 2 3 3 2 2 ...
Fit the full logistic regression model
glm0<-glm(Response~.,family = binomial(link = 'logit'),data = insurance.train)
summary(glm0)
##
## Call:
## glm(formula = Response ~ ., family = binomial(link = "logit"),
## data = insurance.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.32576 -0.56835 -0.37340 -0.00021 3.11850
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.934e+00 5.122e-01 -3.775 0.000160 ***
## StateCalifornia 4.836e-02 1.159e-01 0.417 0.676380
## StateNevada -9.692e-03 1.605e-01 -0.060 0.951845
## StateOregon -2.025e-02 1.207e-01 -0.168 0.866821
## StateWashington -6.128e-02 1.660e-01 -0.369 0.711951
## Customer.Lifetime.Value -6.050e-06 6.443e-06 -0.939 0.347675
## CoverageExtended -6.962e-02 1.537e-01 -0.453 0.650504
## CoveragePremium -7.652e-02 3.238e-01 -0.236 0.813178
## EducationCollege 1.106e-01 1.062e-01 1.042 0.297565
## EducationDoctor 4.536e-01 2.051e-01 2.211 0.027021 *
## EducationHigh School or Below 1.951e-02 1.079e-01 0.181 0.856485
## EducationMaster 3.813e-01 1.555e-01 2.452 0.014205 *
## EmploymentStatusEmployed -2.355e-01 1.921e-01 -1.226 0.220202
## EmploymentStatusMedical Leave 1.111e-01 2.288e-01 0.486 0.627068
## EmploymentStatusRetired 2.508e+00 2.529e-01 9.916 < 2e-16 ***
## EmploymentStatusUnemployed -6.263e-01 1.992e-01 -3.144 0.001669 **
## GenderM 5.920e-02 8.173e-02 0.724 0.468852
## Income 4.149e-06 2.328e-06 1.782 0.074716 .
## Location.CodeSuburban 1.430e+00 1.789e-01 7.993 1.32e-15 ***
## Location.CodeUrban 9.295e-02 1.759e-01 0.528 0.597219
## Marital.StatusMarried -4.718e-01 1.090e-01 -4.327 1.51e-05 ***
## Marital.StatusSingle -4.882e-01 1.293e-01 -3.775 0.000160 ***
## Monthly.Premium.Auto 8.108e-03 6.248e-03 1.298 0.194373
## Months.Since.Last.Claim -4.898e-03 4.103e-03 -1.194 0.232553
## Months.Since.Policy.Inception 2.724e-04 1.450e-03 0.188 0.851032
## Number.of.Open.Complaints -5.366e-02 4.638e-02 -1.157 0.247243
## Number.of.Policies -2.426e-02 1.710e-02 -1.419 0.155943
## Policy.TypePersonal Auto 2.368e-02 1.006e-01 0.235 0.813998
## Policy.TypeSpecial Auto 3.530e-01 2.044e-01 1.727 0.084129 .
## Renew.Offer.TypeOffer2 6.859e-01 8.850e-02 7.751 9.14e-15 ***
## Renew.Offer.TypeOffer3 -2.389e+00 2.675e-01 -8.931 < 2e-16 ***
## Renew.Offer.TypeOffer4 -1.679e+01 2.318e+02 -0.072 0.942264
## Sales.ChannelBranch -5.364e-01 1.008e-01 -5.324 1.01e-07 ***
## Sales.ChannelCall Center -4.070e-01 1.143e-01 -3.562 0.000368 ***
## Sales.ChannelWeb -6.886e-01 1.388e-01 -4.960 7.06e-07 ***
## Total.Claim.Amount -1.479e-03 3.367e-04 -4.392 1.12e-05 ***
## Vehicle.ClassLuxury Car -4.072e-01 8.933e-01 -0.456 0.648550
## Vehicle.ClassLuxury SUV -3.128e-02 8.557e-01 -0.037 0.970843
## Vehicle.ClassSports Car 3.149e-01 3.162e-01 0.996 0.319193
## Vehicle.ClassSUV 2.841e-01 2.799e-01 1.015 0.310060
## Vehicle.ClassTwo-Door Car 6.233e-02 1.078e-01 0.578 0.563177
## Vehicle.SizeMedsize -2.747e-01 1.266e-01 -2.170 0.030018 *
## Vehicle.SizeSmall -6.271e-01 1.530e-01 -4.098 4.16e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5203.1 on 6392 degrees of freedom
## Residual deviance: 4031.5 on 6350 degrees of freedom
## AIC: 4117.5
##
## Number of Fisher Scoring iterations: 17
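The estimates in this summary are on the log-odds scale, so exponentiating them gives odds ratios relative to each factor's reference level. The sketch below copies three estimates from the summary by hand (an illustrative selection); with the fitted object available, `exp(coef(glm0))` would do the same for every term.

```r
# Three estimates copied from the summary output (log-odds scale).
log_odds <- c(
  EmploymentStatusRetired = 2.508,
  Renew.Offer.TypeOffer2  = 0.6859,
  Sales.ChannelWeb        = -0.6886
)

# exp() turns a log-odds coefficient into an odds ratio:
# > 1 raises the odds of "Yes", < 1 lowers them, other covariates held fixed.
odds_ratio <- exp(log_odds)

print(round(odds_ratio, 2))
```

For instance, the Retired estimate corresponds to roughly twelve times the odds of acceptance compared with the reference level Disabled, holding the other covariates fixed.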
insurance_model0_insample <- predict(glm0, type="response")
pred <- prediction(insurance_model0_insample,insurance.train$Response)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
Get Area Under Curve (AUC)
cat('AUC for full model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for full model is 0.8170277
Using the model on our testing data
insurance_model0_outsample <- predict(glm0, newdata =insurance.test ,type="response")
pred <- prediction(insurance_model0_outsample,insurance.test$Response)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
Get Area Under Curve (AUC)
cat('AUC for full model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for full model is 0.8090875
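AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it with the rank-based (Mann-Whitney) formula in base R; the helper `auc_rank()` and the toy scores are hypothetical and are not part of ROCR or of this report's data.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula.
auc_rank <- function(scores, labels) {
  r  <- rank(scores)          # average ranks, so tied scores count as half
  n1 <- sum(labels == 1)      # number of positives
  n0 <- sum(labels == 0)      # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: 4 observations, 2 positives and 2 negatives.
toy_scores <- c(0.1, 0.4, 0.35, 0.8)
toy_labels <- c(0, 0, 1, 1)
auc_toy <- auc_rank(toy_scores, toy_labels)

print(auc_toy)
```

In the toy example, three of the four positive-negative pairs are ordered correctly, which is why the value is 0.75.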
Confusion Matrix
predict.insurance.glm <- predict(glm0, newdata = insurance.test, type = "response")
conf.mat.glm <- table(insurance.test$Response, predict.insurance.glm>.5)
fourfoldplot(conf.mat.glm, color = c("#CC6666", "#99CC99"),
conf.level = 0, margin = 1, main = "Confusion Matrix GLM")
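# Counts from the confusion matrix: TN = 2312, FP = 23, FN = 343, TP = 63
glm_accuracy <- (2312+63)/(2312+23+343+63)
# Recall = TP / (TP + FN)
glm_recall <- 63/(63+343)
# Precision = TP / (TP + FP)
glm_precision <- 63/(63+23)
glm_accuracy
## [1] 0.8664721
glm_recall
## [1] 0.1551724
glm_precision
## [1] 0.7325581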
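The metrics can also be derived from the confusion table itself rather than from hand-typed counts. This is a minimal sketch, assuming rows hold the actual Response and columns the predicted class, as in the `table()` call above; `classification_metrics` is an illustrative helper, not a function from any package used in the report.

```r
# Counts copied from the GLM confusion matrix on the test set (threshold 0.5).
# Rows are the actual Response, columns are the predicted class.
conf_mat <- matrix(c(2312, 343, 23, 63), nrow = 2,
                   dimnames = list(True = c("0", "1"), Pred = c("FALSE", "TRUE")))

classification_metrics <- function(cm) {
  tn <- cm[1, 1]; fp <- cm[1, 2]
  fn <- cm[2, 1]; tp <- cm[2, 2]
  c(accuracy  = (tp + tn) / sum(cm),
    recall    = tp / (tp + fn),   # share of actual acceptors that were caught
    precision = tp / (tp + fp))   # share of predicted acceptors that were right
}

metrics <- classification_metrics(conf_mat)
round(metrics, 4)
```

In the report's session the same helper could be applied to `conf.mat.glm` or `conf.mat.lasso`, provided both predicted classes appear as columns.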
resp_count <- insurance.data.new %>% group_by(Response) %>% summarise(count = n())
resp_count
## # A tibble: 2 x 2
## Response count
## <dbl> <int>
## 1 0 7826
## 2 1 1308
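These counts show a class imbalance of roughly 86 to 14, which matters when reading the accuracy figures. As a rough worked check, using only the counts printed above (the variable names are illustrative), the accuracy of a trivial rule that always predicts rejection can be computed as follows. Note that this is on the full dataset, not the test split.

```r
# Class counts copied from resp_count above.
resp <- c(`0` = 7826, `1` = 1308)

acceptance_rate   <- resp[["1"]] / sum(resp)   # share of customers who accepted
baseline_accuracy <- max(resp) / sum(resp)     # accuracy of always predicting "0"

round(c(acceptance_rate = acceptance_rate, baseline_accuracy = baseline_accuracy), 4)
```

Any model's accuracy is best judged against this baseline rather than against zero.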
## Logistic Regression Model 2 Using Lasso
dummy<- model.matrix(~ ., data = insurance.data.new)
insurance_data_lasso <- data.frame(dummy[,-1])
insurance.train.X <- as.matrix(select(insurance_data_lasso, -Response)[index,])
insurance.test.X <- as.matrix(select(insurance_data_lasso, -Response)[-index,])
insurance.train.Y <- insurance_data_lasso[index, "Response"]
insurance.test.Y <- insurance_data_lasso[-index, "Response"]
insurance_lasso <- glmnet(x=insurance.train.X, y=insurance.train.Y, family = "binomial")
insurance_lasso_cv <- cv.glmnet(x=insurance.train.X, y=insurance.train.Y, family = "binomial", type.measure = "class")
plot(insurance_lasso_cv)
par(mfrow=c(1,1))
coef(insurance_lasso, s=insurance_lasso_cv$lambda.min)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.811631e+00
## StateCalifornia .
## StateNevada .
## StateOregon .
## StateWashington .
## Customer.Lifetime.Value .
## CoverageExtended .
## CoveragePremium .
## EducationCollege .
## EducationDoctor 1.935424e-01
## EducationHigh.School.or.Below .
## EducationMaster 1.854072e-01
## EmploymentStatusEmployed .
## EmploymentStatusMedical.Leave 4.950095e-02
## EmploymentStatusRetired 2.456598e+00
## EmploymentStatusUnemployed -5.806465e-01
## GenderM .
## Income 5.080228e-07
## Location.CodeSuburban 8.330149e-01
## Location.CodeUrban -1.098303e-01
## Marital.StatusMarried -2.341657e-01
## Marital.StatusSingle -2.912739e-01
## Monthly.Premium.Auto 5.495355e-05
## Months.Since.Last.Claim -1.644167e-03
## Months.Since.Policy.Inception .
## Number.of.Open.Complaints -5.522565e-03
## Number.of.Policies -1.047594e-02
## Policy.TypePersonal.Auto .
## Policy.TypeSpecial.Auto 1.183639e-01
## Renew.Offer.TypeOffer2 6.036038e-01
## Renew.Offer.TypeOffer3 -1.875267e+00
## Renew.Offer.TypeOffer4 -2.712024e+00
## Sales.ChannelBranch -3.516541e-01
## Sales.ChannelCall.Center -2.149748e-01
## Sales.ChannelWeb -4.546039e-01
## Total.Claim.Amount -2.705172e-04
## Vehicle.ClassLuxury.Car .
## Vehicle.ClassLuxury.SUV .
## Vehicle.ClassSports.Car 2.161568e-01
## Vehicle.ClassSUV 2.308456e-01
## Vehicle.ClassTwo.Door.Car .
## Vehicle.SizeMedsize .
## Vehicle.SizeSmall -2.841743e-01
coef(insurance_lasso, s=insurance_lasso_cv$lambda.1se)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.977400124
## StateCalifornia .
## StateNevada .
## StateOregon .
## StateWashington .
## Customer.Lifetime.Value .
## CoverageExtended .
## CoveragePremium .
## EducationCollege .
## EducationDoctor .
## EducationHigh.School.or.Below .
## EducationMaster .
## EmploymentStatusEmployed .
## EmploymentStatusMedical.Leave .
## EmploymentStatusRetired 2.032512167
## EmploymentStatusUnemployed .
## GenderM .
## Income .
## Location.CodeSuburban 0.008787571
## Location.CodeUrban .
## Marital.StatusMarried .
## Marital.StatusSingle .
## Monthly.Premium.Auto .
## Months.Since.Last.Claim .
## Months.Since.Policy.Inception .
## Number.of.Open.Complaints .
## Number.of.Policies .
## Policy.TypePersonal.Auto .
## Policy.TypeSpecial.Auto .
## Renew.Offer.TypeOffer2 0.384398291
## Renew.Offer.TypeOffer3 -0.344749739
## Renew.Offer.TypeOffer4 -0.329119235
## Sales.ChannelBranch .
## Sales.ChannelCall.Center .
## Sales.ChannelWeb .
## Total.Claim.Amount .
## Vehicle.ClassLuxury.Car .
## Vehicle.ClassLuxury.SUV .
## Vehicle.ClassSports.Car .
## Vehicle.ClassSUV .
## Vehicle.ClassTwo.Door.Car .
## Vehicle.SizeMedsize .
## Vehicle.SizeSmall .
pred.lasso.train<- predict(insurance_lasso, newx=insurance.train.X, s=insurance_lasso_cv$lambda.min, type = "response")
pred <- prediction(pred.lasso.train,insurance.train.Y)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
Get Area Under Curve (AUC)
cat('AUC for lasso model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for lasso model is 0.8094227
Out-of-sample prediction
pred.lasso.test<- predict(insurance_lasso, newx=insurance.test.X, s=insurance_lasso_cv$lambda.min, type = "response")
pred <- prediction(pred.lasso.test,insurance.test.Y)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
Get Area Under Curve (AUC)
cat('AUC for lasso model is ',unlist(slot(performance(pred, "auc"), "y.values")))
## AUC for lasso model is 0.8025042
Confusion Matrix
conf.mat.lasso <- table(insurance.test$Response, pred.lasso.test>.5)
fourfoldplot(conf.mat.lasso, color = c("#CC6666", "#99CC99"),
conf.level = 0, margin = 1, main = "Confusion Matrix Lasso")
lasso_accuracy <- (2280 + 13)/(2280+55+393+13)
lasso_accuracy
## [1] 0.836556
## Decision Tree
library(rpart)
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.6.3
The data is ready, with both categorical and numeric variables. We start with a classification tree fitted on all variables in the training dataset.
tree0 <- rpart(formula = Response ~ ., data = insurance.train, method = "class")
tree0
## n= 6393
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 6393 902 0 (0.85890818 0.14109182)
## 2) EmploymentStatus=Disabled,Employed,Medical Leave,Unemployed 6197 760 0 (0.87736001 0.12263999) *
## 3) EmploymentStatus=Retired 196 54 1 (0.27551020 0.72448980)
## 6) Renew.Offer.Type=Offer3,Offer4 23 2 0 (0.91304348 0.08695652) *
## 7) Renew.Offer.Type=Offer1,Offer2 173 33 1 (0.19075145 0.80924855) *
prp(tree0, extra = 1)
This tree has 3 leaf nodes, with EmploymentStatus and Renew.Offer.Type as the splitting variables.
Checking the prediction rate of the tree
pred0<- predict(tree0, type="class")
table(insurance.train$Response, pred0, dnn = c("True", "Pred"))
## Pred
## True 0 1
## 0 5458 33
## 1 762 140
tree0 has an in-sample prediction rate of 87.6% (5598 out of 6393 predicted correctly).
Since only two variables are used, we lower the complexity parameter to cp = 0.001 (the default is 0.01) to grow a deeper tree.
tree1 <- rpart(formula = Response ~ ., data = insurance.train, cp=0.001, method = "class")
tree1
## n= 6393
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 6393 902 0 (0.858908181 0.141091819)
## 2) EmploymentStatus=Disabled,Employed,Medical Leave,Unemployed 6197 760 0 (0.877360013 0.122639987)
## 4) Renew.Offer.Type=Offer3,Offer4 1705 14 0 (0.991788856 0.008211144) *
## 5) Renew.Offer.Type=Offer1,Offer2 4492 746 0 (0.833926981 0.166073019)
## 10) Renew.Offer.Type=Offer1 2501 318 0 (0.872850860 0.127149140)
## 20) Location.Code=Rural,Urban 844 50 0 (0.940758294 0.059241706)
## 40) Education=Bachelor,College 547 17 0 (0.968921389 0.031078611) *
## 41) Education=Doctor,High School or Below,Master 297 33 0 (0.888888889 0.111111111)
## 82) Customer.Lifetime.Value>=4491.294 253 19 0 (0.924901186 0.075098814)
## 164) Number.of.Policies< 5.5 180 4 0 (0.977777778 0.022222222) *
## 165) Number.of.Policies>=5.5 73 15 0 (0.794520548 0.205479452)
## 330) Marital.Status=Divorced,Married 58 4 0 (0.931034483 0.068965517)
## 660) Income< 88917.5 51 0 0 (1.000000000 0.000000000) *
## 661) Income>=88917.5 7 3 1 (0.428571429 0.571428571) *
## 331) Marital.Status=Single 15 4 1 (0.266666667 0.733333333) *
## 83) Customer.Lifetime.Value< 4491.294 44 14 0 (0.681818182 0.318181818)
## 166) Total.Claim.Amount>=70.40171 32 5 0 (0.843750000 0.156250000)
## 332) Customer.Lifetime.Value< 4260.683 25 0 0 (1.000000000 0.000000000) *
## 333) Customer.Lifetime.Value>=4260.683 7 2 1 (0.285714286 0.714285714) *
## 167) Total.Claim.Amount< 70.40171 12 3 1 (0.250000000 0.750000000) *
## 21) Location.Code=Suburban 1657 268 0 (0.838261919 0.161738081)
## 42) Marital.Status=Married,Single 1381 188 0 (0.863866763 0.136133237)
## 84) Total.Claim.Amount>=667.6793 365 14 0 (0.961643836 0.038356164) *
## 85) Total.Claim.Amount< 667.6793 1016 174 0 (0.828740157 0.171259843)
## 170) Monthly.Premium.Auto< 101.5 737 88 0 (0.880597015 0.119402985)
## 340) Total.Claim.Amount>=460.8031 224 3 0 (0.986607143 0.013392857) *
## 341) Total.Claim.Amount< 460.8031 513 85 0 (0.834307992 0.165692008)
## 682) Months.Since.Policy.Inception>=13.5 441 58 0 (0.868480726 0.131519274)
## 1364) Customer.Lifetime.Value>=6076.984 136 3 0 (0.977941176 0.022058824) *
## 1365) Customer.Lifetime.Value< 6076.984 305 55 0 (0.819672131 0.180327869)
## 2730) Customer.Lifetime.Value< 5469.964 239 24 0 (0.899581590 0.100418410)
## 5460) Monthly.Premium.Auto< 91.5 225 15 0 (0.933333333 0.066666667)
## 10920) Total.Claim.Amount< 455.1514 214 10 0 (0.953271028 0.046728972)
## 21840) Sales.Channel=Agent,Branch,Call Center 194 4 0 (0.979381443 0.020618557) *
## 21841) Sales.Channel=Web 20 6 0 (0.700000000 0.300000000)
## 43682) Vehicle.Class=Four-Door Car 12 0 0 (1.000000000 0.000000000) *
## 43683) Vehicle.Class=Two-Door Car 8 2 1 (0.250000000 0.750000000) *
## 10921) Total.Claim.Amount>=455.1514 11 5 0 (0.545454545 0.454545455) *
## 5461) Monthly.Premium.Auto>=91.5 14 5 1 (0.357142857 0.642857143) *
## 2731) Customer.Lifetime.Value>=5469.964 66 31 0 (0.530303030 0.469696970)
## 5462) Customer.Lifetime.Value>=5528.439 46 13 0 (0.717391304 0.282608696)
## 10924) Coverage=Basic 32 0 0 (1.000000000 0.000000000) *
## 10925) Coverage=Extended 14 1 1 (0.071428571 0.928571429) *
## 5463) Customer.Lifetime.Value< 5528.439 20 2 1 (0.100000000 0.900000000) *
## 683) Months.Since.Policy.Inception< 13.5 72 27 0 (0.625000000 0.375000000)
## 1366) Education=Bachelor 18 0 0 (1.000000000 0.000000000) *
## 1367) Education=College,High School or Below 54 27 0 (0.500000000 0.500000000)
## 2734) Total.Claim.Amount>=295.2 45 18 0 (0.600000000 0.400000000)
## 5468) Monthly.Premium.Auto< 66.5 12 0 0 (1.000000000 0.000000000) *
## 5469) Monthly.Premium.Auto>=66.5 33 15 1 (0.454545455 0.545454545)
## 10938) Monthly.Premium.Auto>=70.5 18 5 0 (0.722222222 0.277777778) *
## 10939) Monthly.Premium.Auto< 70.5 15 2 1 (0.133333333 0.866666667) *
## 2735) Total.Claim.Amount< 295.2 9 0 1 (0.000000000 1.000000000) *
## 171) Monthly.Premium.Auto>=101.5 279 86 0 (0.691756272 0.308243728)
## 342) Customer.Lifetime.Value< 9619.43 172 32 0 (0.813953488 0.186046512)
## 684) Months.Since.Policy.Inception< 46.5 67 0 0 (1.000000000 0.000000000) *
## 685) Months.Since.Policy.Inception>=46.5 105 32 0 (0.695238095 0.304761905)
## 1370) Months.Since.Policy.Inception>=51 87 18 0 (0.793103448 0.206896552)
## 2740) Total.Claim.Amount>=516 64 6 0 (0.906250000 0.093750000)
## 5480) Months.Since.Last.Claim< 30.5 56 0 0 (1.000000000 0.000000000) *
## 5481) Months.Since.Last.Claim>=30.5 8 2 1 (0.250000000 0.750000000) *
## 2741) Total.Claim.Amount< 516 23 11 1 (0.478260870 0.521739130)
## 5482) Months.Since.Last.Claim>=3.5 14 4 0 (0.714285714 0.285714286) *
## 5483) Months.Since.Last.Claim< 3.5 9 1 1 (0.111111111 0.888888889) *
## 1371) Months.Since.Policy.Inception< 51 18 4 1 (0.222222222 0.777777778) *
## 343) Customer.Lifetime.Value>=9619.43 107 53 1 (0.495327103 0.504672897)
## 686) Number.of.Open.Complaints>=0.5 21 0 0 (1.000000000 0.000000000) *
## 687) Number.of.Open.Complaints< 0.5 86 32 1 (0.372093023 0.627906977)
## 1374) Total.Claim.Amount>=544.8 42 17 0 (0.595238095 0.404761905)
## 2748) Months.Since.Last.Claim< 11.5 14 0 0 (1.000000000 0.000000000) *
## 2749) Months.Since.Last.Claim>=11.5 28 11 1 (0.392857143 0.607142857)
## 5498) Customer.Lifetime.Value>=10215.37 17 8 0 (0.529411765 0.470588235) *
## 5499) Customer.Lifetime.Value< 10215.37 11 2 1 (0.181818182 0.818181818) *
## 1375) Total.Claim.Amount< 544.8 44 7 1 (0.159090909 0.840909091) *
## 43) Marital.Status=Divorced 276 80 0 (0.710144928 0.289855072)
## 86) Sales.Channel=Branch 63 4 0 (0.936507937 0.063492063) *
## 87) Sales.Channel=Agent,Call Center,Web 213 76 0 (0.643192488 0.356807512)
## 174) Total.Claim.Amount< 991.2 197 63 0 (0.680203046 0.319796954)
## 348) Income>=26093 99 16 0 (0.838383838 0.161616162)
## 696) Customer.Lifetime.Value< 32347.55 91 9 0 (0.901098901 0.098901099) *
## 697) Customer.Lifetime.Value>=32347.55 8 1 1 (0.125000000 0.875000000) *
## 349) Income< 26093 98 47 0 (0.520408163 0.479591837)
## 698) Income< 25796 85 34 0 (0.600000000 0.400000000)
## 1396) Months.Since.Policy.Inception< 92.5 70 21 0 (0.700000000 0.300000000)
## 2792) Education=College,Doctor,High School or Below 34 1 0 (0.970588235 0.029411765) *
## 2793) Education=Bachelor,Master 36 16 1 (0.444444444 0.555555556)
## 5586) Gender=M 14 4 0 (0.714285714 0.285714286) *
## 5587) Gender=F 22 6 1 (0.272727273 0.727272727) *
## 1397) Months.Since.Policy.Inception>=92.5 15 2 1 (0.133333333 0.866666667) *
## 699) Income>=25796 13 0 1 (0.000000000 1.000000000) *
## 175) Total.Claim.Amount>=991.2 16 3 1 (0.187500000 0.812500000) *
## 11) Renew.Offer.Type=Offer2 1991 428 0 (0.785032647 0.214967353)
## 22) Sales.Channel=Branch,Web 882 139 0 (0.842403628 0.157596372)
## 44) Monthly.Premium.Auto< 106.5 650 76 0 (0.883076923 0.116923077)
## 88) Marital.Status=Married,Single 581 52 0 (0.910499139 0.089500861)
## 176) Total.Claim.Amount< 1145.186 574 46 0 (0.919860627 0.080139373)
## 352) Total.Claim.Amount>=63.61346 529 33 0 (0.937618147 0.062381853)
## 704) Income< 22503.5 130 0 0 (1.000000000 0.000000000) *
## 705) Income>=22503.5 399 33 0 (0.917293233 0.082706767)
## 1410) Marital.Status=Married 333 18 0 (0.945945946 0.054054054) *
## 1411) Marital.Status=Single 66 15 0 (0.772727273 0.227272727)
## 2822) Income>=49629 34 0 0 (1.000000000 0.000000000) *
## 2823) Income< 49629 32 15 0 (0.531250000 0.468750000)
## 5646) Education=Bachelor,Doctor,Master 10 0 0 (1.000000000 0.000000000) *
## 5647) Education=College,High School or Below 22 7 1 (0.318181818 0.681818182)
## 11294) Months.Since.Last.Claim< 27.5 13 6 0 (0.538461538 0.461538462) *
## 11295) Months.Since.Last.Claim>=27.5 9 0 1 (0.000000000 1.000000000) *
## 353) Total.Claim.Amount< 63.61346 45 13 0 (0.711111111 0.288888889)
## 706) Customer.Lifetime.Value>=2792.734 26 0 0 (1.000000000 0.000000000) *
## 707) Customer.Lifetime.Value< 2792.734 19 6 1 (0.315789474 0.684210526) *
## 177) Total.Claim.Amount>=1145.186 7 1 1 (0.142857143 0.857142857) *
## 89) Marital.Status=Divorced 69 24 0 (0.652173913 0.347826087)
## 178) Education=Bachelor,College,High School or Below,Master 58 13 0 (0.775862069 0.224137931)
## 356) Months.Since.Policy.Inception< 58.5 36 1 0 (0.972222222 0.027777778) *
## 357) Months.Since.Policy.Inception>=58.5 22 10 1 (0.454545455 0.545454545)
## 714) Income< 74839 12 3 0 (0.750000000 0.250000000) *
## 715) Income>=74839 10 1 1 (0.100000000 0.900000000) *
## 179) Education=Doctor 11 0 1 (0.000000000 1.000000000) *
## 45) Monthly.Premium.Auto>=106.5 232 63 0 (0.728448276 0.271551724)
## 90) Customer.Lifetime.Value>=9072.068 82 6 0 (0.926829268 0.073170732) *
## 91) Customer.Lifetime.Value< 9072.068 150 57 0 (0.620000000 0.380000000)
## 182) Months.Since.Policy.Inception>=27 95 23 0 (0.757894737 0.242105263)
## 364) Monthly.Premium.Auto>=114.5 38 0 0 (1.000000000 0.000000000) *
## 365) Monthly.Premium.Auto< 114.5 57 23 0 (0.596491228 0.403508772)
## 730) Location.Code=Rural 13 0 0 (1.000000000 0.000000000) *
## 731) Location.Code=Suburban,Urban 44 21 1 (0.477272727 0.522727273)
## 1462) Number.of.Open.Complaints>=0.5 11 0 0 (1.000000000 0.000000000) *
## 1463) Number.of.Open.Complaints< 0.5 33 10 1 (0.303030303 0.696969697)
## 2926) Months.Since.Policy.Inception< 65.5 19 9 0 (0.526315789 0.473684211) *
## 2927) Months.Since.Policy.Inception>=65.5 14 0 1 (0.000000000 1.000000000) *
## 183) Months.Since.Policy.Inception< 27 55 21 1 (0.381818182 0.618181818)
## 366) Months.Since.Last.Claim< 8.5 8 0 0 (1.000000000 0.000000000) *
## 367) Months.Since.Last.Claim>=8.5 47 13 1 (0.276595745 0.723404255)
## 734) Months.Since.Last.Claim>=10.5 33 13 1 (0.393939394 0.606060606)
## 1468) Months.Since.Last.Claim< 22 7 0 0 (1.000000000 0.000000000) *
## 1469) Months.Since.Last.Claim>=22 26 6 1 (0.230769231 0.769230769) *
## 735) Months.Since.Last.Claim< 10.5 14 0 1 (0.000000000 1.000000000) *
## 23) Sales.Channel=Agent,Call Center 1109 289 0 (0.739404869 0.260595131)
## 46) Location.Code=Rural,Urban 413 80 0 (0.806295400 0.193704600)
## 92) Months.Since.Policy.Inception< 28.5 115 5 0 (0.956521739 0.043478261) *
## 93) Months.Since.Policy.Inception>=28.5 298 75 0 (0.748322148 0.251677852)
## 186) Education=Doctor,Master 45 0 0 (1.000000000 0.000000000) *
## 187) Education=Bachelor,College,High School or Below 253 75 0 (0.703557312 0.296442688)
## 374) Income< 35124.5 33 0 0 (1.000000000 0.000000000) *
## 375) Income>=35124.5 220 75 0 (0.659090909 0.340909091)
## 750) Vehicle.Class=Luxury SUV,SUV 24 0 0 (1.000000000 0.000000000) *
## 751) Vehicle.Class=Four-Door Car,Sports Car,Two-Door Car 196 75 0 (0.617346939 0.382653061)
## 1502) Total.Claim.Amount>=289.2332 34 4 0 (0.882352941 0.117647059)
## 3004) Monthly.Premium.Auto< 108 27 0 0 (1.000000000 0.000000000) *
## 3005) Monthly.Premium.Auto>=108 7 3 1 (0.428571429 0.571428571) *
## 1503) Total.Claim.Amount< 289.2332 162 71 0 (0.561728395 0.438271605)
## 3006) Total.Claim.Amount< 279.439 148 58 0 (0.608108108 0.391891892)
## 6012) Monthly.Premium.Auto< 88.5 110 31 0 (0.718181818 0.281818182)
## 12024) Months.Since.Last.Claim>=3.5 80 14 0 (0.825000000 0.175000000)
## 24048) Customer.Lifetime.Value>=2517.447 66 3 0 (0.954545455 0.045454545) *
## 24049) Customer.Lifetime.Value< 2517.447 14 3 1 (0.214285714 0.785714286) *
## 12025) Months.Since.Last.Claim< 3.5 30 13 1 (0.433333333 0.566666667)
## 24050) Vehicle.Class=Two-Door Car 10 3 0 (0.700000000 0.300000000) *
## 24051) Vehicle.Class=Four-Door Car 20 6 1 (0.300000000 0.700000000)
## 48102) Gender=F 8 3 0 (0.625000000 0.375000000) *
## 48103) Gender=M 12 1 1 (0.083333333 0.916666667) *
## 6013) Monthly.Premium.Auto>=88.5 38 11 1 (0.289473684 0.710526316)
## 12026) Months.Since.Last.Claim< 13 9 0 0 (1.000000000 0.000000000) *
## 12027) Months.Since.Last.Claim>=13 29 2 1 (0.068965517 0.931034483) *
## 3007) Total.Claim.Amount>=279.439 14 1 1 (0.071428571 0.928571429) *
## 47) Location.Code=Suburban 696 209 0 (0.699712644 0.300287356)
## 94) Income< 48520 479 116 0 (0.757828810 0.242171190)
## 188) Marital.Status=Single 148 4 0 (0.972972973 0.027027027) *
## 189) Marital.Status=Divorced,Married 331 112 0 (0.661631420 0.338368580)
## 378) Customer.Lifetime.Value< 13172.35 282 81 0 (0.712765957 0.287234043)
## 756) Education=High School or Below,Master 89 12 0 (0.865168539 0.134831461)
## 1512) Income< 44323.5 77 5 0 (0.935064935 0.064935065) *
## 1513) Income>=44323.5 12 5 1 (0.416666667 0.583333333) *
## 757) Education=Bachelor,College,Doctor 193 69 0 (0.642487047 0.357512953)
## 1514) Income>=27233.5 56 7 0 (0.875000000 0.125000000) *
## 1515) Income< 27233.5 137 62 0 (0.547445255 0.452554745)
## 3030) Months.Since.Policy.Inception>=77.5 15 0 0 (1.000000000 0.000000000) *
## 3031) Months.Since.Policy.Inception< 77.5 122 60 1 (0.491803279 0.508196721)
## 6062) Total.Claim.Amount< 312 11 0 0 (1.000000000 0.000000000) *
## 6063) Total.Claim.Amount>=312 111 49 1 (0.441441441 0.558558559)
## 12126) Coverage=Premium 9 0 0 (1.000000000 0.000000000) *
## 12127) Coverage=Basic,Extended 102 40 1 (0.392156863 0.607843137)
## 24254) Months.Since.Policy.Inception>=12.5 84 39 1 (0.464285714 0.535714286)
## 48508) Months.Since.Policy.Inception< 37.5 15 0 0 (1.000000000 0.000000000) *
## 48509) Months.Since.Policy.Inception>=37.5 69 24 1 (0.347826087 0.652173913)
## 97018) Months.Since.Policy.Inception>=49.5 41 19 0 (0.536585366 0.463414634)
## 194036) Monthly.Premium.Auto>=72 13 0 0 (1.000000000 0.000000000) *
## 194037) Monthly.Premium.Auto< 72 28 9 1 (0.321428571 0.678571429)
## 388074) State=Arizona,Washington 12 5 0 (0.583333333 0.416666667) *
## 388075) State=California,Nevada,Oregon 16 2 1 (0.125000000 0.875000000) *
## 97019) Months.Since.Policy.Inception< 49.5 28 2 1 (0.071428571 0.928571429) *
## 24255) Months.Since.Policy.Inception< 12.5 18 1 1 (0.055555556 0.944444444) *
## 379) Customer.Lifetime.Value>=13172.35 49 18 1 (0.367346939 0.632653061)
## 758) Months.Since.Policy.Inception>=62 9 0 0 (1.000000000 0.000000000) *
## 759) Months.Since.Policy.Inception< 62 40 9 1 (0.225000000 0.775000000)
## 1518) Total.Claim.Amount>=571.918 13 6 0 (0.538461538 0.461538462) *
## 1519) Total.Claim.Amount< 571.918 27 2 1 (0.074074074 0.925925926) *
## 95) Income>=48520 217 93 0 (0.571428571 0.428571429)
## 190) Monthly.Premium.Auto< 68.5 42 7 0 (0.833333333 0.166666667) *
## 191) Monthly.Premium.Auto>=68.5 175 86 0 (0.508571429 0.491428571)
## 382) Monthly.Premium.Auto>=76.5 116 46 0 (0.603448276 0.396551724)
## 764) Coverage=Basic 22 0 0 (1.000000000 0.000000000) *
## 765) Coverage=Extended,Premium 94 46 0 (0.510638298 0.489361702)
## 1530) Gender=M 28 6 0 (0.785714286 0.214285714)
## 3060) Income>=59440 19 0 0 (1.000000000 0.000000000) *
## 3061) Income< 59440 9 3 1 (0.333333333 0.666666667) *
## 1531) Gender=F 66 26 1 (0.393939394 0.606060606)
## 3062) Customer.Lifetime.Value>=6645.183 20 7 0 (0.650000000 0.350000000)
## 6124) Total.Claim.Amount< 606.2715 10 0 0 (1.000000000 0.000000000) *
## 6125) Total.Claim.Amount>=606.2715 10 3 1 (0.300000000 0.700000000) *
## 3063) Customer.Lifetime.Value< 6645.183 46 13 1 (0.282608696 0.717391304)
## 6126) Sales.Channel=Call Center 10 4 0 (0.600000000 0.400000000) *
## 6127) Sales.Channel=Agent 36 7 1 (0.194444444 0.805555556) *
## 383) Monthly.Premium.Auto< 76.5 59 19 1 (0.322033898 0.677966102)
## 766) Months.Since.Last.Claim>=25.5 8 0 0 (1.000000000 0.000000000) *
## 767) Months.Since.Last.Claim< 25.5 51 11 1 (0.215686275 0.784313725)
## 1534) Number.of.Open.Complaints>=0.5 12 4 0 (0.666666667 0.333333333) *
## 1535) Number.of.Open.Complaints< 0.5 39 3 1 (0.076923077 0.923076923) *
## 3) EmploymentStatus=Retired 196 54 1 (0.275510204 0.724489796)
## 6) Renew.Offer.Type=Offer3,Offer4 23 2 0 (0.913043478 0.086956522) *
## 7) Renew.Offer.Type=Offer1,Offer2 173 33 1 (0.190751445 0.809248555)
## 14) Vehicle.Size=Small 18 6 0 (0.666666667 0.333333333) *
## 15) Vehicle.Size=Large,Medsize 155 21 1 (0.135483871 0.864516129)
## 30) Months.Since.Last.Claim< 10.5 57 15 1 (0.263157895 0.736842105)
## 60) Customer.Lifetime.Value>=5022.643 21 10 0 (0.523809524 0.476190476)
## 120) Customer.Lifetime.Value< 10395.74 9 0 0 (1.000000000 0.000000000) *
## 121) Customer.Lifetime.Value>=10395.74 12 2 1 (0.166666667 0.833333333) *
## 61) Customer.Lifetime.Value< 5022.643 36 4 1 (0.111111111 0.888888889) *
## 31) Months.Since.Last.Claim>=10.5 98 6 1 (0.061224490 0.938775510) *
prp(tree1, extra = 1)
## Warning: labs do not fit even at cex 0.15, there may be some overplotting
The tree is too deep to read, partly because of the long variable and factor-level names. We prune it by raising the complexity parameter to reduce its depth.
plotcp(tree1)
printcp(tree1)
##
## Classification tree:
## rpart(formula = Response ~ ., data = insurance.train, method = "class",
## cp = 0.001)
##
## Variables actually used in tree construction:
## [1] Coverage Customer.Lifetime.Value
## [3] Education EmploymentStatus
## [5] Gender Income
## [7] Location.Code Marital.Status
## [9] Monthly.Premium.Auto Months.Since.Last.Claim
## [11] Months.Since.Policy.Inception Number.of.Open.Complaints
## [13] Number.of.Policies Renew.Offer.Type
## [15] Sales.Channel State
## [17] Total.Claim.Amount Vehicle.Class
## [19] Vehicle.Size
##
## Root node error: 902/6393 = 0.14109
##
## n= 6393
##
## CP nsplit rel error xerror xstd
## 1 0.0975610 0 1.00000 1.00000 0.030858
## 2 0.0210643 1 0.90244 0.90244 0.029548
## 3 0.0066519 2 0.88137 0.88137 0.029251
## 4 0.0041574 3 0.87472 0.87805 0.029204
## 5 0.0040650 60 0.55765 0.77827 0.027714
## 6 0.0039595 63 0.54545 0.77605 0.027680
## 7 0.0038803 74 0.48780 0.77273 0.027628
## 8 0.0036031 78 0.47228 0.76386 0.027488
## 9 0.0033259 87 0.41685 0.70843 0.026587
## 10 0.0028825 91 0.40355 0.69956 0.026439
## 11 0.0022173 97 0.38581 0.68404 0.026176
## 12 0.0014782 105 0.36807 0.65188 0.025617
## 13 0.0011086 108 0.36364 0.64745 0.025539
## 14 0.0010000 114 0.35698 0.64080 0.025420
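One common convention for reading this table is the 1-SE rule: take the smallest tree whose cross-validated error (xerror) lies within one standard error of the minimum. The sketch below applies it to values copied from the output above. It is an illustrative alternative, not the rule used in this report, which sets cp by hand in the next step.

```r
# CP table copied from printcp(tree1) above.
cp_table <- data.frame(
  CP     = c(0.0975610, 0.0210643, 0.0066519, 0.0041574, 0.0040650, 0.0039595,
             0.0038803, 0.0036031, 0.0033259, 0.0028825, 0.0022173, 0.0014782,
             0.0011086, 0.0010000),
  nsplit = c(0, 1, 2, 3, 60, 63, 74, 78, 87, 91, 97, 105, 108, 114),
  xerror = c(1.00000, 0.90244, 0.88137, 0.87805, 0.77827, 0.77605, 0.77273,
             0.76386, 0.70843, 0.69956, 0.68404, 0.65188, 0.64745, 0.64080),
  xstd   = c(0.030858, 0.029548, 0.029251, 0.029204, 0.027714, 0.027680,
             0.027628, 0.027488, 0.026587, 0.026439, 0.026176, 0.025617,
             0.025539, 0.025420)
)

# Row with the lowest cross-validated error.
best <- which.min(cp_table$xerror)

# 1-SE rule: smallest tree whose xerror is within one standard error of the minimum.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se    <- min(which(cp_table$xerror <= threshold))

cp_table[c(best, one_se), ]
```

With the fitted tree in the session, the same table is available as `tree1$cptable`, so the values would not need to be typed in.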
Setting the CP value to 0.0034, which falls between rows 8 and 9 of the CP table and so keeps the tree with 87 splits.
tree_final<-prune(tree1, cp = 0.0034)
prp(tree_final, extra = 1)
rpart.plot(tree_final, extra=1)
## Warning: labs do not fit even at cex 0.15, there may be some overplotting
Checking the prediction rate of the final tree (in-sample)
pred_final<- predict(tree_final, type="class")
table(insurance.train$Response, pred_final, dnn = c("True", "Pred"))
## Pred
## True 0 1
## 0 5347 144
## 1 232 670
Checking the prediction rate of the final tree (out-of-sample)
pred_final_test<- predict(tree_final, newdata=insurance.test, type="class")
table(insurance.test$Response, pred_final_test, dnn = c("True", "Pred"))
## Pred
## True 0 1
## 0 2212 123
## 1 164 242
Checking the AUC value for the training dataset (in-sample)
pred.traintree = prediction(as.double(pred_final), insurance.train$Response)
perf = performance(pred.traintree, "tpr", "fpr")
plot(perf, colorize=TRUE)
unlist(slot(performance(pred.traintree,"auc"),"y.values"))
## [1] 0.8582845
str(pred_final)
## Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
Checking the AUC value for the testing dataset (out-of-sample)
pred.traintree.test = prediction(as.double(pred_final_test), insurance.test$Response)
perf = performance(pred.traintree.test, "tpr", "fpr")
plot(perf, colorize=TRUE)
unlist(slot(performance(pred.traintree.test,"auc"),"y.values"))
## [1] 0.7716912
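Note that `pred_final_test` holds hard class labels, so the ROC curve above has a single interior point and the AUC reduces to the average of sensitivity and specificity. The small base-R check below reconstructs labels and predictions from the test confusion matrix. The helper `auc_rank` is an illustrative function written for this sketch, not part of ROCR.

```r
# Rank-based (Mann-Whitney) AUC in base R; tied scores receive average ranks.
auc_rank <- function(scores, labels) {
  r     <- rank(scores)
  n_pos <- as.numeric(sum(labels == 1))
  n_neg <- as.numeric(sum(labels == 0))
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Rebuild test-set labels and hard predictions from the confusion matrix above:
# 2212 true negatives, 123 false positives, 164 false negatives, 242 true positives.
labels <- c(rep(0, 2212), rep(0, 123), rep(1, 164), rep(1, 242))
preds  <- c(rep(0, 2212), rep(1, 123), rep(0, 164), rep(1, 242))

auc_hard <- auc_rank(preds, labels)
tpr <- 242 / (242 + 164)    # sensitivity
tnr <- 2212 / (2212 + 123)  # specificity

c(auc_hard = auc_hard, mean_tpr_tnr = (tpr + tnr) / 2)
```

Passing class probabilities instead, for instance `predict(tree_final, newdata = insurance.test, type = "prob")[, "1"]`, would give a graded score and an AUC that is more directly comparable with the logistic models. That result is not shown here.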
Misclassification rate for training data
MR.treetrain<- mean(insurance.train$Response!= pred_final)
MR.treetrain
## [1] 0.05881433
Misclassification rate for testing data
MR.treetest <- mean(insurance.test$Response != pred_final_test)
MR.treetest
## [1] 0.1047063
Conclusion
We built two final models for this dataset, a Logistic Regression model and a Decision Tree model, to predict whether a customer accepts the offer (Response Yes/No) from their profile. Judged on test-set accuracy, the Decision Tree is our Champion model, at about 89.5% (94.1% in-sample), and the Logistic Regression is our Challenger model, at about 86.6% with a test AUC of about 0.81. The Decision Tree's test AUC of 0.77 was computed from predicted classes rather than probabilities, so it is not directly comparable with the logistic AUC.