PROJECT WORK - INSURANCE COMPANY BENCHMARK CoIL 2000

Introduction

This data set comes from a direct marketing case in the insurance sector: predicting who would be interested in buying a caravan insurance policy. It was used in the second edition of the Computational Intelligence and Learning (CoIL) Challenge in the year 2000, organized by the CoIL cluster, a cooperation between four EU-funded Networks of Excellence representing the areas of neural networks (NeuroNet), fuzzy systems (ERUDIT), evolutionary computing (EvoNet) and machine learning (MLNet). The data is based on a real-world business problem and is owned and donated by Peter van der Putten of the Dutch data mining company Sentient Machine Research (Baarsjesweg 224, 1058 AA Amsterdam, The Netherlands, +31 20 6186927, putten@liacs.nl). It was donated on March 7, 2000, and is documented on the TIC (The Insurance Company) Benchmark Homepage (http://www.liacs.nl/~putten/library/cc2000).

Relevant Papers

P. van der Putten and M. van Someren (eds). CoIL Challenge 2000: The Insurance Company Case. Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000-09. June 22, 2000.

SUMMARY ABOUT DATASET

NO OF OBSERVATIONS: 5822 real customer records

NO OF VARIABLES: 86 Nos.

Each real customer record consists of 86 variables, containing sociodemographic data (variables 1-43) and product ownership data (variables 44-86). The sociodemographic data is derived from zip codes, so all customers living in areas with the same zip code have the same sociodemographic attributes. Variable 86 (Purchase), “CARAVAN: Number of mobile home policies”, is the target variable, which indicates whether or not the customer purchased a caravan insurance policy.

TASK

Predict which customers are potentially interested in a caravan insurance policy (Prediction or Classification).

PREDICTION TASK

The goal is to predict whether a customer is interested in a caravan insurance policy from other data about the customer. Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including the information of whether or not they have a caravan insurance policy. A test set contains 4000 customers. In the prediction task, the underlying problem is to find the subset of customers whose probability of having a caravan insurance policy lies above some boundary probability. The known policyholders can then be removed and the rest receive a mailing. The boundary depends on costs and benefits, such as the cost of mailing and the benefit of selling insurance policies. To approximate this problem, we want to find the set of 800 customers in the test set of 4000 customers that contains the most caravan policy owners. For each solution submitted, the number of actual policyholders is counted, and this count is the score of the solution.
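The scoring rule described above (pick 800 of 4000 customers, count the actual policyholders among them) can be sketched in a few lines of R. This is only an illustration under stated assumptions: `score_top_k` is a hypothetical helper, and the data is simulated rather than taken from the competition files.

```r
# Hypothetical helper: count actual policyholders among the k customers
# with the highest predicted probability.
score_top_k <- function(prob, actual, k = 800) {
  stopifnot(length(prob) == length(actual), k <= length(prob))
  top <- order(prob, decreasing = TRUE)[seq_len(k)]
  sum(actual[top] == 1)
}

# Toy illustration with simulated data (not the competition data).
set.seed(1)
n <- 4000
actual <- rbinom(n, 1, 0.06)                    # roughly 6% policyholders
prob <- plogis(-3 + 1.5 * actual + rnorm(n))    # scores loosely related to the outcome

score_top_k(prob, actual, k = 800)   # hits among the 800 highest scores
sum(actual) * 800 / n                # expected hits for a random selection of 800
```

Comparing the two numbers shows how much a ranking gains over mailing 800 customers at random; any model that outputs a probability or score per customer can be evaluated this way.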

library(ISLR)

## PIE CHART OF YES/NO FOR PURCHASE OF CARAVAN POLICY BY CUSTOMERS

a<-table(Caravan$Purchase)
a

## 
##   No  Yes 
## 5474  348

colors=c("red","green")
col=colors
pie(a,main = "CUSTOMERS OF CARAVAN POLICY",col=colors)
box()

OBSERVATION FOR PIE CHART

The pie chart above shows that 348 customers purchased the Caravan policy (Yes) and 5474 did not (No), so only about 6% of the 5822 customers are buyers.

# BAR AND PIE CHARTS SHOWING CORRELATION OF CUSTOMERS WHO PURCHASED CARAVAN POLICY AND VARIOUS VARIABLES

## BAR CHARTS AND PIE CHARTS SHOWING PURCHASE OF CARAVAN POLICY BY CUSTOMERS AGAINST PRODUCT USAGE(POLICY OWNERSHIP) DATA VARIABLES

### 1.VARIABLE - NUMBER OF BOAT POLICIES

a<-table(Caravan$APLEZIER[Caravan$Purchase=="Yes"])
a

## 
##   0   1   2 
## 335  12   1

barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs NUMBER OF BOAT POLICIES",xlab = "Number of boat policies",ylab = "Number of customers")

OBSERVATION OF CUSTOMER TYPE

In the above barplot, we see that most customers who purchased the Caravan policy (335 of 348) hold no boat policy (0).

### 2. VARIABLE - NUMBER OF SOCIAL SECURITY INSURANCE POLICIES

a<-table(Caravan$ABYSTAND[Caravan$Purchase=="Yes"])
a

## 
##   0   1 
## 332  16

barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs NO. OF SS INSURANCE POLICIES",xlab = "Number of social security insurance policies",ylab = "Number of customers")

OBSERVATION OF CUSTOMER TYPE

In the above barplot, we see that most customers who purchased the Caravan policy (332 of 348) hold no social security insurance policy (0).

### 3. VARIABLE - CONTRIBUTION CAR POLICIES

a<-table(Caravan$PPERSAUT[Caravan$Purchase=="Yes"])
a

## 
##   0   5   6 
##  72  14 262

colors=c("blue","red","green")
col=colors
pie(a,main ="PURCHASE OF CARAVAN POLICY vs CONTRIBUTION CAR POLICIES",col=colors)
box()

OBSERVATION OF CUSTOMER TYPE

In the above piechart, we see that most customers who purchased the Caravan policy (262 of 348) are in car policy contribution level 6, i.e. an average car policy premium of $1000 to $4999.

### 4. VARIABLE - Number of fire policies

a<-table(Caravan$ABRAND[Caravan$Purchase=="Yes"])
a

## 
##   0   1   2 
## 109 232   7

colors=c("orange","violet","yellow")
col=colors
pie(a,main ="PURCHASE OF CARAVAN POLICY vs NUMBER OF FIRE POLICIES",col=colors)
box()

OBSERVATION OF CUSTOMER TYPE

In the above piechart, we see that most customers who purchased the Caravan policy (232 of 348) hold exactly one fire policy.
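The charts above count values among buyers only, so they partly reflect how common each value is in the whole portfolio. A complementary view is the purchase rate within each level of a variable. The sketch below shows the idea on a toy data frame; the column names `policies` and `purchase` are hypothetical and the numbers are made up for illustration.

```r
# Toy data: hypothetical columns, not the Caravan data.
toy <- data.frame(
  policies = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  purchase = c("No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "Yes")
)

counts <- table(toy$policies, toy$purchase)

# Row-wise proportions: share of "Yes" within each level of the predictor.
rates <- prop.table(counts, margin = 1)
rates[, "Yes"]
```

On the real data the same pattern would read, for example, `prop.table(table(Caravan$APLEZIER, Caravan$Purchase), 1)`, which gives the share of buyers among customers with 0, 1 or 2 boat policies.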

## CHARTS SHOWING PURCHASE OF CARAVAN POLICY BY CUSTOMERS AGAINST SOCIODEMOGRAPHIC DATA VARIABLES

### 1. VARIABLE - CUSTOMER SUBTYPE

a<-table(Caravan$MOSTYPE[Caravan$Purchase=="Yes"])
a

## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 20 22 23 24 25 26 27 29 30 31 32 33 
## 13  6 25  2  2 12  3 51 12  9  9 16 13  2  4  4  5  2  1  1  2  4  6  8 46 
## 34 35 36 37 38 39 41 
##  9  8 16 10 23 19  5

barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs CUSTOMER SUBTYPE",xlab="Customer subtype",ylab="Number of customers")

OBSERVATION OF CUSTOMER TYPE

In the above barplot, the customers who purchased the Caravan policy come from 32 of the 41 customer subtypes. Subtypes 8 (Middle class families, 51 customers) and 33 (Lower class large families, 46 customers) account for the most purchases.

### 2. VARIABLE - AVG AGE (Age group)

a<-table(Caravan$MGEMLEEF[Caravan$Purchase=="Yes"])
a

## 
##   1   2   3   4   5   6 
##   1  87 183  64  12   1

names(a)=c("20 to 30","30 to 40","40 to 50","50 to 60","60 to 70","70 to 80")
barplot(a,col=rainbow(6),main = "PURCHASE OF CARAVAN POLICY vs AVE AGE",xlab="Avg age or Age group",ylab="Number of customers")

OBSERVATION FOR AVG AGE

In the above barplot, buyers are grouped by average age. Most customers who purchased the Caravan policy (183 of 348) belong to the 40 to 50 age group.

### 3. VARIABLE - PURCHASING POWER CLASS

a<-table(Caravan$MKOOPKLA[Caravan$Purchase=="Yes"])
a

## 
##  1  2  3  4  5  6  7  8 
## 18 15 71 46 30 66 67 35

barplot(a,col=rainbow(8),main = "PURCHASE OF CARAVAN POLICY vs PURCHASING POWER CLASS",xlab = "Purchasing power class",ylab = "Number of customers")

OBSERVATION OF CUSTOMER TYPE

In the above barplot, the **3rd**, **7th** and **6th** purchasing power classes contain the most customers who purchased the Caravan policy (71, 67 and 66 customers respectively).

### 4. VARIABLE - AVERAGE INCOME

a<-table(Caravan$MINKGEM[Caravan$Purchase=="Yes"])
a

## 
##   1   2   3   4   5   6   7   8 
##   1  20  69 139  70  24  17   8

pie(a,col=rainbow(8),main ="PURCHASE OF CARAVAN POLICY vs AVERAGE INCOME")
box()

OBSERVATION OF CUSTOMER TYPE

In the above piechart, customers belonging to the 3rd label (income between $100 and $199), the 4th label (income between $200 and $499) and the 5th label (income between $500 and $999) account for most Caravan policy purchases (69, 139 and 70 of the 348 buyers).

### 5. VARIABLE - CUSTOMER MAIN TYPE

b<-table(Caravan$MOSHOOFD[Caravan$Purchase=="Yes"])
b

## 
##  1  2  3  5  6  7  8  9 10 
## 48 66 59 15  4 20 89 42  5

colors=c("violet","yellow","blue","red","brown","orange","green")
color=colors

pie(b,col=colors,main ="PURCHASE OF CARAVAN POLICY vs CUSTOMER MAIN TYPE")
box()

OBSERVATION OF CUSTOMER TYPE

In the above pie chart, the buyers come from 9 of the 10 customer main types. Main type 8 (Family with grown ups, 89 customers) and main type 2 (Driven Growers, 66 customers) account for the most Caravan policy purchases.

PREDICTION MODELS USING ALGORITHMS

RPART

GLM

C50 - Rules and Trees

ZERO R

MODEL No 1

#PREDICTION USING ALGORITHM( RPART)

library(rpart)
library(rattle)

## Loading required package: RGtk2
## Rattle: A free graphical interface for data mining with R.
## Version 3.5.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.

library(rpart.plot)
library(RColorBrewer)
library(crossval)
library(gplots)

## 
## Attaching package: 'gplots'
## 
## The following object is masked from 'package:stats':
## 
##     lowess

library(vcd)

## Loading required package: grid
## 
## Attaching package: 'vcd'
## 
## The following object is masked from 'package:ISLR':
## 
##     Hitters

library(Metrics)

  
d1<- read.csv("C:/Users/vananga/Downloads/Caravan2.csv")

d1.ori<-d1


set.seed(99)

tr <- d1.ori[sample(row.names(d1.ori), size = round(nrow(d1.ori)*0.5)),]
te <- d1.ori[!(row.names(d1.ori) %in% row.names(tr)), ]
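With only about 6% buyers, a purely random split can leave the training and test sets with noticeably different numbers of positives. One possible alternative, shown here as a sketch and not as the method used in this report, is to sample within each class. `stratified_split` is a hypothetical helper and the labels are simulated.

```r
# Hypothetical helper: pick training row indices so that each class keeps
# roughly the same share in the training set.
stratified_split <- function(y, frac = 0.5) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(length(idx) * frac))]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(99)
y <- rep(c(0, 1), times = c(940, 60))   # toy labels with 6% positives
train_idx <- stratified_split(y, frac = 0.5)
test_idx <- setdiff(seq_along(y), train_idx)

table(y[train_idx])   # class counts in the training half
table(y[test_idx])    # class counts in the test half
```

Applied to the data frame above, the indices would be used as `d1.ori[train_idx, ]` and `d1.ori[test_idx, ]`.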

Reset the original training and test data - just to be sure

tr1 <- tr
te1  <- te
te2 <-te

ZeroR strategy: assume no one will purchase

te2$Purchase <- rep(0,nrow(te2))

R PART

tr1$Purchase = as.factor(tr1$Purchase)
fit1 <- rpart(formula=Purchase ~ .,data=tr1,control=rpart.control(minsplit=20, minbucket=1, cp=0.008))

fit1

## n= 2911 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 2911 181 0 (0.93782205 0.06217795)  
##    2) PPERSAUT< 5.5 1755  53 0 (0.96980057 0.03019943) *
##    3) PPERSAUT>=5.5 1156 128 0 (0.88927336 0.11072664)  
##      6) MOSTYPE>=12.5 785  56 0 (0.92866242 0.07133758) *
##      7) MOSTYPE< 12.5 371  72 0 (0.80592992 0.19407008)  
##       14) PBRAND< 3.5 213  24 0 (0.88732394 0.11267606)  
##         28) MBERHOOG< 5.5 181  15 0 (0.91712707 0.08287293) *
##         29) MBERHOOG>=5.5 32   9 0 (0.71875000 0.28125000)  
##           58) MBERMIDD< 1.5 23   3 0 (0.86956522 0.13043478) *
##           59) MBERMIDD>=1.5 9   3 1 (0.33333333 0.66666667) *
##       15) PBRAND>=3.5 158  48 0 (0.69620253 0.30379747)  
##         30) MBERMIDD< 6.5 142  37 0 (0.73943662 0.26056338) *
##         31) MBERMIDD>=6.5 16   5 1 (0.31250000 0.68750000) *

gc()

##           used (Mb) gc trigger (Mb) max used (Mb)
## Ncells  616528 33.0    1168576 62.5   794069 42.5
## Vcells 1853233 14.2    3944484 30.1  3211055 24.5

fancyRpartPlot(fit1)

NAMES	INFORMATION	VALUES 1	labels
PPERSAUT	`car policy`	1-8 values	-
MOSTYPE	`Customer subtype`	1-41	FYE,12(affluent young)
PBRAND	`fire policy`	(0-7) values	-
MBERHOOG	`High status`	(0-9)values	-
MBERMIDD	`Middle management`	(0-9)values	-

printcp(fit1)

## 
## Classification tree:
## rpart(formula = Purchase ~ ., data = tr1, control = rpart.control(minsplit = 20, 
##     minbucket = 1, cp = 0.008))
## 
## Variables actually used in tree construction:
## [1] MBERHOOG MBERMIDD MOSTYPE  PBRAND   PPERSAUT
## 
## Root node error: 181/2911 = 0.062178
## 
## n= 2911 
## 
##          CP nsplit rel error xerror     xstd
## 1 0.0082873      0   1.00000  1.000 0.071982
## 2 0.0080000      6   0.95028  1.105 0.075402


plot(fit1)
text(fit1)

fit1$cptable[which.min(fit1$cptable[,"xerror"]),"CP"]

## [1] 0.008287293

Prediction<-predict(fit1,te1,type="class")

Compare with base model

Update the prediction

te2$Purchase <- Prediction

Pred = factor(as.factor(te2$Purchase), c(0, 1), labels = c("Not purchased", "Purchased"))
Actual = factor(as.factor(te1$Purchase), c(0, 1), labels = c("Not purchased", "Purchased"))
                      
cmr1 = confusionMatrix(Actual,Pred, negative = "Not purchased")
cmr1

##   FP   TP   TN   FN 
##   23    6 2721  161 
## attr(,"negative")
## [1] "Not purchased"

Corresponding accuracy, sensitivity etc.

diagnosticErrors(cmr1)

##        acc       sens       spec        ppv        npv        lor 
## 0.93679148 0.03592814 0.99161808 0.20689655 0.94413602 1.48361563 
## attr(,"negative")
## [1] "Not purchased"

Compute the classification error

ce(Actual,Pred)

## [1] 0.06320852
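The diagnostic values above follow directly from the four confusion matrix counts. The worked calculation below recomputes them by hand; the formulas are the standard definitions, and the use of the natural logarithm for `lor` is an assumption that is consistent with the printed value.

```r
# Counts copied from the confusion matrix above.
FP <- 23; TP <- 6; TN <- 2721; FN <- 161

acc  <- (TP + TN) / (TP + TN + FP + FN)  # overall accuracy
sens <- TP / (TP + FN)                   # share of actual buyers that were found
spec <- TN / (TN + FP)                   # share of non-buyers correctly rejected
ppv  <- TP / (TP + FP)                   # share of "Purchased" predictions that are right
npv  <- TN / (TN + FN)                   # share of "Not purchased" predictions that are right
lor  <- log((TP * TN) / (FP * FN))       # log odds ratio (natural log assumed)

round(c(acc = acc, sens = sens, spec = spec, ppv = ppv, npv = npv, lor = lor), 4)
```

The high accuracy comes almost entirely from the 2721 true negatives; the sensitivity shows that only 6 of the 167 actual buyers in the test set were identified.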

Model No 2

#PREDICTION USING ALGORITHM(GLM)

Variables Used

NAMES	INFORMATION	VALUES	labels
MOSHOOFD	`Customer main type`	1-10 values	-
MSKB1	`Social Class B1`	0-41	-
PWAPART	`Pvt 3rd Party Ins`	(0-9) values	-

```r
library(ggplot2)
library(MASS)
library(splines)
library(mgcv)
```

## Loading required package: nlme
## This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.

```r
library(crossval)

Caravan2 <- read.csv("C:/Users/vananga/Downloads/Caravan2.csv")
Caravan.ori <- Caravan2

set.seed(11)
train <- Caravan.ori[sample(row.names(Caravan.ori), size = round(nrow(Caravan.ori)*0.7)), ]
test <- Caravan.ori[!(row.names(Caravan.ori) %in% row.names(train)), ]

train.ori <- train
test.ori <- test

train2 <- train
test2 <- test
```

```r
# No one purchased
test2$Purchase <- rep(0, nrow(test2))

glm.logistic <- glm(Purchase ~ MOSHOOFD + MSKB1+PWAPART, family = "binomial", data = train)
Prediction.prob <- predict(glm.logistic, newdata = test, type="response")
head(Prediction.prob)
```

##          2          5         10         11         12         16 
## 0.07442026 0.02692169 0.10603728 0.04551256 0.10980618 0.07176800

```r
Prediction <- round(Prediction.prob,0)
```

```r
test2$Purchase <- Prediction

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

cm5 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm5
```

##   FP   TP   TN   FN 
##    0    0 1638  109 
## attr(,"negative")
## [1] "Not Purchased"

```r
diagnosticErrors(cm5)
```

##       acc      sens      spec       ppv       npv       lor 
## 0.9376073 0.0000000 1.0000000       NaN 0.9376073       NaN 
## attr(,"negative")
## [1] "Not Purchased"

```r
ce(Actual.Outcome,Our.Prediction)
```

## [1] 0.06239267

```r
glm.fit <- glm(Purchase ~ MOSHOOFD + PWAPART + MSKB1, family = "binomial", data = train)

inv.logit <- function(x) exp(x) / (1 + exp(x))

glm.pred <- predict(glm.fit, newdata = test, se.fit = TRUE)

pred <- data.frame(mean = inv.logit(glm.pred$fit),
                   lo = inv.logit(glm.pred$fit - 2 * glm.pred$se.fit),
                   hi = inv.logit(glm.pred$fit + 2 * glm.pred$se.fit),
                   Purchase = test$Purchase)
pred <- pred[order(pred$mean), ]
pred$id <- seq_along(pred$mean)
row.names(pred) <- NULL

p <- ggplot(pred, aes(x = id))
p <- p + geom_line(aes(x = id, y = mean))
p <- p + geom_ribbon(aes(y = mean, ymin = lo, ymax = hi), alpha = 0.25)
p <- p + geom_vline(xintercept = which(pred$Purchase == 1), colour = "red", alpha = .95)
p <- p + scale_x_discrete(breaks = NULL)
p <- p + labs(x = NULL, y = "prediction")
p
```

# We use the expand.grid function to create a data frame with all possible values
# of the variables we are interested in, and then visualize the model from there

```r
sim.data <- expand.grid(MSKB1 = 2, MOSHOOFD = 8, PWAPART = 0)

pred <- predict(glm.fit, newdata = sim.data, se.fit = TRUE)
sim.data$mean <- inv.logit(pred$fit)
sim.data$lo <- inv.logit(pred$fit - 2 * pred$se.fit)
sim.data$hi <- inv.logit(pred$fit + 2 * pred$se.fit)

p2 <- ggplot(Caravan2, aes(x = MSKB1, y = Purchase))
p2 <- p2 + geom_rug()
p2 <- p2 + facet_grid(MSKB1 ~ MOSHOOFD)
p2 <- p2 + geom_line(data = sim.data, aes(y = mean), color = "blue")

Prediction <- round(Prediction.prob,0)
```

# Compare with base model
# Update the prediction with our model output

```r
test2$Purchase <- Prediction

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
```

# Confusion matrix # cm(actual,predicted)

```r
cm6 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm6
```

##   FP   TP   TN   FN 
##    0    0 1638  109 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

```r
diagnosticErrors(cm6)
```

##       acc      sens      spec       ppv       npv       lor 
## 0.9376073 0.0000000 1.0000000       NaN 0.9376073       NaN 
## attr(,"negative")
## [1] "Not Purchased"
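The logistic model predicts no buyers at all because rounding the probabilities amounts to a cut-off of 0.5, while the probabilities printed earlier are all well below that. For a ranking task such as "pick the best 800 of 4000", one option is to flag a fixed share of customers with the highest probabilities instead. The sketch below illustrates this on simulated data; the variable names and the 20% share are assumptions chosen for the example.

```r
set.seed(11)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-3 + 0.8 * x))   # rare positive class, simulated
sim <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = "binomial", data = sim)
p <- predict(fit, type = "response")

# Cut-off at 0.5, which is what rounding the probabilities amounts to
pred_round <- as.integer(p >= 0.5)

# Alternative: flag the 20% of customers with the highest predicted probability
cutoff <- quantile(p, 0.8)
pred_top <- as.integer(p >= cutoff)

table(actual = sim$y, rounded = pred_round)
table(actual = sim$y, top20 = pred_top)
```

The second table trades many false positives for a non-zero sensitivity, which is usually the more useful behaviour when the goal is to choose a mailing list.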

Model No 3

#PREDICTION USING ALGORITHM(C50)

Variables Used

NAMES	INFORMATION	VALUES	labels
PPLEZIER	`Cont to Boat Policy`	0-6 values	-
PBYSTAND	`Cont to Social Sec`	0-5	-
APLEZIER	`No of boat policies`	0-3	-
ABYSTAND	`No of Social Sec`	(0-2) values	-

# Strategy 7 - C50 trees (rules) basic

# Read the Caravan data from Caravan2.csv

Caravan.ori <-Caravan2

set.seed(11)
train <- Caravan.ori[sample(row.names(Caravan.ori), size = round(nrow(Caravan.ori)*0.7)), ]
test <- Caravan.ori[!(row.names(Caravan.ori) %in% row.names(train)), ]

# Creating backup of test and train data for later use. Not modifying .ori files as a rule

train.ori <-train
test.ori<-test
train2<-train
test2<-test

library(crossval)
library(gplots)
library(vcd)
library(Metrics)
library(C50)

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

combinedData1 <- Caravan.ori[,-7]
combinedData2 <- combinedData1[,-6]
combinedData <- combinedData2[,-5]

combinedData$Purchase <- factor(combinedData$Purchase)


set.seed(11)
train <- combinedData[sample(row.names(combinedData), size = round(nrow(combinedData)*0.7)), ]
test <- combinedData[!(row.names(combinedData) %in% row.names(train)), ]

C50.Rules <- C5.0(Purchase~PPLEZIER+PBYSTAND+APLEZIER+ABYSTAND, data=train, rules = FALSE)

Prediction <- predict(C50.Rules,test)

# Comparing with base model
# Updating the prediction with our model output

test2$Purchase <- Prediction

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm7 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm7

##   FP   TP   TN   FN 
##    0    0 1638  109 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm7)

##       acc      sens      spec       ppv       npv       lor 
## 0.9376073 0.0000000 1.0000000       NaN 0.9376073       NaN 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06239267

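# Strategy 8 - Tree model of C50

Caution: `combinedData` has three columns fewer than `Caravan.ori`, so `train` has fewer than 86 columns. R silently ignores a negative column index that is out of range, which means `train[,-86]` removes nothing and `Purchase` itself stays among the predictors. The perfect scores reported for this strategy are therefore most likely the result of target leakage and not of real predictive power.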

C50.Tree <- C5.0(train[,-86],train$Purchase)

Prediction <- predict(C50.Tree,test)

# Comparing with base model
# Updating the prediction with our model output

test2$Purchase <- Prediction

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm8 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm8

##   FP   TP   TN   FN 
##    0  109 1638    0 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm8)

##  acc sens spec  ppv  npv  lor 
##    1    1    1    1    1  Inf 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0
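A classification error of exactly 0 on held-out data is a signal to check whether the target column is still part of the predictor matrix. Selecting predictors by position is fragile once columns have been dropped; selecting by name is one way to avoid that. The following sketch shows the difference on a toy data frame (the data and column names other than `Purchase` are made up).

```r
# Toy data frame with 4 columns; `Purchase` is the target (hypothetical data).
toy <- data.frame(a = 1:6, b = 6:1, c = c(0, 1, 0, 1, 0, 1),
                  Purchase = factor(c(0, 0, 0, 1, 1, 1)))

# A negative index beyond the last column removes nothing,
# so the target stays among the predictors.
x_leaky <- toy[, -86]
"Purchase" %in% names(x_leaky)

# Dropping the target by name does not depend on the column count.
x_clean <- toy[, setdiff(names(toy), "Purchase"), drop = FALSE]
"Purchase" %in% names(x_clean)
```

With the objects from this section, the name-based form would be `train[, setdiff(names(train), "Purchase")]`; the results would then need to be recomputed.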

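# Strategy 9 - Tree model of C50 with a minimum of 2 cases in tree branches

Caution: `train[,-3]` drops only the third column, so `Purchase` remains among the predictors; the perfect scores for this strategy are most likely due to this target leakage.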

```r
C50.Tree.small <- C5.0(train[,-3],train$Purchase, control = C5.0Control(minCases = 2))

Prediction <- predict(C50.Tree.small,test)
```

# Comparing with base model
# Updating the prediction with our model output

```r
test2$Purchase <- Prediction

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
```

# Confusion matrix # cm(actual,predicted)

```r
cm9 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm9
```

##   FP   TP   TN   FN 
##    0  109 1638    0 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

```r
diagnosticErrors(cm9)
```

##  acc sens spec  ppv  npv  lor 
##    1    1    1    1    1  Inf 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

```r
ce(Actual.Outcome,Our.Prediction)
```

## [1] 0

# Strategy 10 - C50 Rule & Tree Model

library(ISLR)
library(C50)
treeModel <- C5.0(x = Caravan[, -86], y = Caravan$Purchase)
treeModel

## 
## Call:
## C5.0.default(x = Caravan[, -86], y = Caravan$Purchase)
## 
## Classification Tree
## Number of samples: 5822 
## Number of predictors: 85 
## 
## Tree size: 1 
## 
## Non-standard options: attempt to group attributes

summary(treeModel)

## 
## Call:
## C5.0.default(x = Caravan[, -86], y = Caravan$Purchase)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Nov 22 14:27:36 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5822 cases (86 attributes) from undefined.data
## 
## Decision tree:
##  No (5822/348)
## 
## 
## Evaluation on training data (5822 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       1  348( 6.0%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    5474          (a): class No
##     348          (b): class Yes
## 
## 
## Time: 0.7 secs

ruleModel <- C5.0(Purchase ~ ., data = Caravan, rules = FALSE)
ruleModel

## 
## Call:
## C5.0.formula(formula = Purchase ~ ., data = Caravan, rules = FALSE)
## 
## Classification Tree
## Number of samples: 5822 
## Number of predictors: 85 
## 
## Tree size: 1 
## 
## Non-standard options: attempt to group attributes

summary(ruleModel)

## 
## Call:
## C5.0.formula(formula = Purchase ~ ., data = Caravan, rules = FALSE)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Nov 22 14:27:40 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5822 cases (86 attributes) from undefined.data
## 
## Decision tree:
##  No (5822/348)
## 
## 
## Evaluation on training data (5822 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       1  348( 6.0%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    5474          (a): class No
##     348          (b): class Yes
## 
## 
## Time: 0.7 secs

treeModel <- C5.0(x = Caravan[, -86], y = Caravan$Purchase,
                  control = C5.0Control(winnow = FALSE))
summary(treeModel)

## 
## Call:
## C5.0.default(x = Caravan[, -86], y = Caravan$Purchase, control
##  = C5.0Control(winnow = FALSE))
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Nov 22 14:27:44 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5822 cases (86 attributes) from undefined.data
## 
## Decision tree:
##  No (5822/348)
## 
## 
## Evaluation on training data (5822 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       1  348( 6.0%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    5474          (a): class No
##     348          (b): class Yes
## 
## 
## Time: 0.7 secs

treeModel <- C5.0(x = Caravan[, -86], y = Caravan$Purchase,
                  control = C5.0Control(winnow = FALSE, minCases = 5))
summary(treeModel)

## 
## Call:
## C5.0.default(x = Caravan[, -86], y = Caravan$Purchase, control
##  = C5.0Control(winnow = FALSE, minCases = 5))
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Nov 22 14:27:48 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5822 cases (86 attributes) from undefined.data
## 
## Decision tree:
##  No (5822/348)
## 
## 
## Evaluation on training data (5822 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       1  348( 6.0%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    5474          (a): class No
##     348          (b): class Yes
## 
## 
## Time: 0.7 secs

## Variable importance

treeModel <- C5.0(x = Caravan[, -86], y = Caravan$Purchase)

# When metric = "splits", the percentage of splits associated with each predictor is calculated.

C5imp(treeModel, metric = "splits")

##          Overall
## MOSTYPE      NaN
## MAANTHUI     NaN
## MGEMOMV      NaN
## MGEMLEEF     NaN
## MOSHOOFD     NaN
## MGODRK       NaN
## MGODPR       NaN
## MGODOV       NaN
## MGODGE       NaN
## MRELGE       NaN
## MRELSA       NaN
## MRELOV       NaN
## MFALLEEN     NaN
## MFGEKIND     NaN
## MFWEKIND     NaN
## MOPLHOOG     NaN
## MOPLMIDD     NaN
## MOPLLAAG     NaN
## MBERHOOG     NaN
## MBERZELF     NaN
## MBERBOER     NaN
## MBERMIDD     NaN
## MBERARBG     NaN
## MBERARBO     NaN
## MSKA         NaN
## MSKB1        NaN
## MSKB2        NaN
## MSKC         NaN
## MSKD         NaN
## MHHUUR       NaN
## MHKOOP       NaN
## MAUT1        NaN
## MAUT2        NaN
## MAUT0        NaN
## MZFONDS      NaN
## MZPART       NaN
## MINKM30      NaN
## MINK3045     NaN
## MINK4575     NaN
## MINK7512     NaN
## MINK123M     NaN
## MINKGEM      NaN
## MKOOPKLA     NaN
## PWAPART      NaN
## PWABEDR      NaN
## PWALAND      NaN
## PPERSAUT     NaN
## PBESAUT      NaN
## PMOTSCO      NaN
## PVRAAUT      NaN
## PAANHANG     NaN
## PTRACTOR     NaN
## PWERKT       NaN
## PBROM        NaN
## PLEVEN       NaN
## PPERSONG     NaN
## PGEZONG      NaN
## PWAOREG      NaN
## PBRAND       NaN
## PZEILPL      NaN
## PPLEZIER     NaN
## PFIETS       NaN
## PINBOED      NaN
## PBYSTAND     NaN
## AWAPART      NaN
## AWABEDR      NaN
## AWALAND      NaN
## APERSAUT     NaN
## ABESAUT      NaN
## AMOTSCO      NaN
## AVRAAUT      NaN
## AAANHANG     NaN
## ATRACTOR     NaN
## AWERKT       NaN
## ABROM        NaN
## ALEVEN       NaN
## APERSONG     NaN
## AGEZONG      NaN
## AWAOREG      NaN
## ABRAND       NaN
## AZEILPL      NaN
## APLEZIER     NaN
## AFIETS       NaN
## AINBOED      NaN
## ABYSTAND     NaN

#######

treeModel <- C5.0(x = Caravan[, -86], y = Caravan$Purchase)
predict(treeModel, head(Caravan[, -86]))

## [1] No No No No No No
## Levels: No Yes

predict(treeModel, head(Caravan[, -86]), type = "prob")

##          No        Yes
## 1 0.9402267 0.05977327
## 2 0.9402267 0.05977327
## 3 0.9402267 0.05977327
## 4 0.9402267 0.05977327
## 5 0.9402267 0.05977327
## 6 0.9402267 0.05977327

Model No 4

#PREDICTION USING ALGORITHM(ZERO R)

Variables Used

NAMES	INFORMATION	VALUES	labels
All Variable	`All Variables`		-
MGEMLEEF	`Avg age of customer`	1-6 values	3 - 40 to 50 Yrs
MOSTYPE	`Customer Sub Type`	0-41	-33 with more records
PBRAND	`Cont Fire Policy`	0-8	-
PPERSAUT	`Cont Car Policy`	(0, 4-8)	-

library(crossval)
library(gplots)
library(vcd)
library(Metrics)

Caravan2 <- read.csv("C:/Users/vananga/Downloads/Caravan2.csv")

# Read the Caravan data from Caravan2.csv

Caravan.ori <- Caravan2

set.seed(11)
train <- Caravan.ori[sample(row.names(Caravan.ori), size = round(nrow(Caravan.ori)*0.5)), ]
test <- Caravan.ori[!(row.names(Caravan.ori) %in% row.names(train)), ]

#Create backup of test and train data for later use. Do not modify .ori files as a rule

train.ori <-train
test.ori<-test

train2<-train
test2<-test

# Looking at NO. of people who Purchased or not the Caravan policy

table(train$Purchase)

## 
##    0    1 
## 2745  166

table(test$Purchase)

## 
##    0    1 
## 2729  182

prop.table(table(train$Purchase))

## 
##          0          1 
## 0.94297492 0.05702508

# Strategy 11 - ZeroR model
# Using ZeroR algorithm and solving it.
# Creating new column in test set with our prediction that everyone has purchased

test2$Purchase <- rep(1, nrow(test2))

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm11 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm11

##   FP   TP   TN   FN 
##    0    0  182 2729 
## attr(,"negative")
## [1] "Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm11)

##        acc       sens       spec        ppv        npv        lor 
## 0.06252147 0.00000000 1.00000000        NaN 0.06252147        NaN 
## attr(,"negative")
## [1] "Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.9374785

#Strategy 12 - ZeroR model

# Creating new column in test set with our prediction no one purchased

test2$Purchase <- rep(0, nrow(test2))

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm12 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm12

##   FP   TP   TN   FN 
##    0    0 2729  182 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm12)

##       acc      sens      spec       ppv       npv       lor 
## 0.9374785 0.0000000 1.0000000       NaN 0.9374785       NaN 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06252147
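ZeroR simply predicts the most frequent class seen in training, which is why "no one purchased" is the sensible baseline here and "everyone purchased" is not. The sketch below recomputes the baseline accuracy from the class counts printed above; `zero_r` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: ZeroR predicts the most frequent training class for every case.
zero_r <- function(train_y, n_test) {
  majority <- names(which.max(table(train_y)))
  rep(majority, n_test)
}

# Class counts copied from the train/test tables above.
train_y <- rep(c(0, 1), times = c(2745, 166))
test_y  <- rep(c(0, 1), times = c(2729, 182))

pred <- zero_r(train_y, length(test_y))
mean(pred == test_y)   # accuracy of the majority-class baseline
```

Any model in this report has to beat this accuracy, or better, has to find some of the actual buyers, before it adds value over the baseline.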

# Strategy 13 - Customer Sub Type

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

summary(train$MOSTYPE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   10.00   30.00   24.23   35.00   41.00

prop.table(table(train$MOSTYPE, train$Purchase))

##     
##                 0            1
##   1  0.0178632772 0.0013740982
##   2  0.0151150807 0.0013740982
##   3  0.0415664720 0.0027481965
##   4  0.0106492614 0.0003435246
##   5  0.0082445895 0.0003435246
##   6  0.0195809000 0.0017176228
##   7  0.0089316386 0.0010305737
##   8  0.0467193404 0.0068704912
##   9  0.0446581931 0.0024046719
##   10 0.0254208176 0.0017176228
##   11 0.0237031948 0.0013740982
##   12 0.0168327035 0.0027481965
##   13 0.0271384404 0.0020611474
##   15 0.0006870491 0.0000000000
##   16 0.0034352456 0.0000000000
##   17 0.0013740982 0.0000000000
##   18 0.0034352456 0.0000000000
##   19 0.0006870491 0.0000000000
##   20 0.0037787702 0.0000000000
##   21 0.0020611474 0.0000000000
##   22 0.0154586053 0.0003435246
##   23 0.0425970457 0.0003435246
##   24 0.0291995878 0.0006870491
##   25 0.0147715562 0.0003435246
##   26 0.0089316386 0.0003435246
##   27 0.0075575404 0.0000000000
##   28 0.0048093439 0.0000000000
##   29 0.0147715562 0.0003435246
##   30 0.0202679492 0.0006870491
##   31 0.0357265544 0.0013740982
##   32 0.0223290965 0.0006870491
##   33 0.1308828581 0.0089316386
##   34 0.0305736860 0.0013740982
##   35 0.0350395053 0.0013740982
##   36 0.0336654071 0.0034352456
##   37 0.0216420474 0.0013740982
##   38 0.0546204054 0.0051528684
##   39 0.0570250773 0.0027481965
##   40 0.0109927860 0.0000000000
##   41 0.0302301615 0.0013740982

prop.table(table(train$MOSTYPE, train$Purchase), 1)

##     
##               0          1
##   1  0.92857143 0.07142857
##   2  0.91666667 0.08333333
##   3  0.93798450 0.06201550
##   4  0.96875000 0.03125000
##   5  0.96000000 0.04000000
##   6  0.91935484 0.08064516
##   7  0.89655172 0.10344828
##   8  0.87179487 0.12820513
##   9  0.94890511 0.05109489
##   10 0.93670886 0.06329114
##   11 0.94520548 0.05479452
##   12 0.85964912 0.14035088
##   13 0.92941176 0.07058824
##   15 1.00000000 0.00000000
##   16 1.00000000 0.00000000
##   17 1.00000000 0.00000000
##   18 1.00000000 0.00000000
##   19 1.00000000 0.00000000
##   20 1.00000000 0.00000000
##   21 1.00000000 0.00000000
##   22 0.97826087 0.02173913
##   23 0.99200000 0.00800000
##   24 0.97701149 0.02298851
##   25 0.97727273 0.02272727
##   26 0.96296296 0.03703704
##   27 1.00000000 0.00000000
##   28 1.00000000 0.00000000
##   29 0.97727273 0.02272727
##   30 0.96721311 0.03278689
##   31 0.96296296 0.03703704
##   32 0.97014925 0.02985075
##   33 0.93611794 0.06388206
##   34 0.95698925 0.04301075
##   35 0.96226415 0.03773585
##   36 0.90740741 0.09259259
##   37 0.94029851 0.05970149
##   38 0.91379310 0.08620690
##   39 0.95402299 0.04597701
##   40 1.00000000 0.00000000
##   41 0.95652174 0.04347826
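
The two tables answer different questions: the first gives each cell's share of all training records (so large subtypes such as 33 dominate), while the row-wise table gives the purchase rate within each subtype (where subtypes 12 and 8 come out highest). The following sketch shows how the row-wise rates could be ranked. The `toy` data frame and its values are invented for illustration and only stand in for `train$MOSTYPE` and `train$Purchase`.

```r
# Toy data standing in for train$MOSTYPE and train$Purchase (values invented)
toy <- data.frame(
  MOSTYPE  = c(8, 8, 8, 8, 33, 33, 33, 33, 33, 33, 23, 23),
  Purchase = c(1, 0, 0, 0,  1,  0,  0,  0,  0,  0,  0,  0)
)

# Row-wise proportions: each subtype's row sums to 1
rates <- prop.table(table(toy$MOSTYPE, toy$Purchase), 1)

# Purchase rate per subtype, highest first
purchase_rate <- sort(rates[, "1"], decreasing = TRUE)
print(purchase_rate)
```

In the toy data, subtype 33 has the most records but subtype 8 has the higher purchase rate, which is the same distinction the two tables above make.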

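# Comparing with base model

# Updating the prediction to say that the selected customers will Purchase
# NOTE: although this strategy is titled "Customer Sub Type" (MOSTYPE), the rule
# below filters on MGEMLEEF (average age band) equal to 3; the results that
# follow come from this MGEMLEEF rule.

test2$Purchase[test2$MGEMLEEF == 3] <- 1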
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm13 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm13

##   FP   TP   TN   FN 
##   90 1294   92 1435 
## attr(,"negative")
## [1] "Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm13)

##         acc        sens        spec         ppv         npv         lor 
##  0.47612504  0.47416636  0.50549451  0.93497110  0.06024885 -0.08144775 
## attr(,"negative")
## [1] "Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.523875

# Strategy 14 - Customer Sub Type

# Strategy 15 - Customer Sub Type 33

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

summary(train$MOSTYPE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   10.00   30.00   24.23   35.00   41.00

# Comparing with base model

# Updating the prediction to say that Subtype 33 will Purchase

test2$Purchase[test2$MOSTYPE==33] <- 1

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm15 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm15

##   FP   TP   TN   FN 
##  162 2346   20  383 
## attr(,"negative")
## [1] "Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm15)

##         acc        sens        spec         ppv         npv         lor 
##  0.81277911  0.85965555  0.10989011  0.93540670  0.04962779 -0.27943202 
## attr(,"negative")
## [1] "Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.1872209
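
Because "Purchased" is the negative class in the Strategy 15 output, the npv figure is the share of flagged customers who actually purchased. The sketch below, which uses only the counts reported above, compares that share with the overall purchase rate in the test set. The variable names `flagged`, `hit_rate`, `base_rate` and `lift` are introduced here for illustration.

```r
# Counts reported above for Strategy 15 (negative = "Purchased")
TN <- 20; FN <- 383; FP <- 162; TP <- 2346

flagged   <- TN + FN                          # customers predicted "Purchased"
hit_rate  <- TN / flagged                     # share of flagged who purchased
base_rate <- (TN + FP) / (TP + FP + TN + FN)  # share of purchasers overall
lift      <- hit_rate / base_rate             # below 1: worse than base rate

print(round(c(hit_rate = hit_rate, base_rate = base_rate, lift = lift), 4))
```

A lift below 1 means the customers flagged by the rule purchase less often than the test set as a whole, which accuracy alone does not reveal.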

# Strategy 16 - Customer Sub Type 33

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

summary(train$MOSTYPE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   10.00   30.00   24.23   35.00   41.00

# Comparing with base model

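# Updating the prediction to say that Subtype 33 will not Purchase
# NOTE: test2$Purchase is already all 0 after the reset, so this assignment
# changes nothing and the result matches the ZeroR baseline.

test2$Purchase[test2$MOSTYPE==33] <- 0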

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm16 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm16

##   FP   TP   TN   FN 
##    0    0 2729  182 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm16)

##       acc      sens      spec       ppv       npv       lor 
## 0.9374785 0.0000000 1.0000000       NaN 0.9374785       NaN 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06252147

# Strategy 17 - Contribution to Fire Policy

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

# Comparing with base model

# Updating the prediction to say that Customer will Purchase
# NOTE: test2$PBRAND is used here as a row index rather than a logical
# condition, so only the rows at positions equal to the PBRAND values are set
# to 1 (8 rows, per the confusion matrix below).

test2$Purchase[test2$PBRAND] <- 1

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm17 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm17

##   FP   TP   TN   FN 
##  182 2721    0    8 
## attr(,"negative")
## [1] "Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm17)

##       acc      sens      spec       ppv       npv       lor 
## 0.9347303 0.9970685 0.0000000 0.9373062 0.0000000      -Inf 
## attr(,"negative")
## [1] "Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06526967
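
The assignment `test2$Purchase[test2$PBRAND] <- 1` treats the PBRAND values as row positions. The sketch below contrasts that pattern with selection by a logical condition. The `toy` data frame is invented for illustration, and the condition `PBRAND > 0` is only an assumed alternative, not a rule taken from the original analysis.

```r
# Toy stand-in for test2 (values invented for illustration)
toy <- data.frame(PBRAND = c(0, 3, 0, 4, 4, 0), Purchase = 0)

# Pattern used above: PBRAND values act as row positions (0 is ignored)
by_position <- toy
by_position$Purchase[by_position$PBRAND] <- 1

# Assumed alternative: a logical condition selects rows by their PBRAND value
by_condition <- toy
by_condition$Purchase[by_condition$PBRAND > 0] <- 1

print(by_position$Purchase)
print(by_condition$Purchase)
```

In the toy data the positional form changes rows 3 and 4 regardless of what those rows contain, while the logical form changes exactly the rows with a nonzero PBRAND.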

# Strategy 18 - Contribution to Fire Policy

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

# Comparing with base model

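# Updating the prediction to say that Customer will not Purchase
# NOTE: test2$Purchase is already all 0 after the reset, so this assignment
# changes nothing and the result matches the ZeroR baseline.

test2$Purchase[test2$PBRAND] <- 0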


Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm18 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm18

##   FP   TP   TN   FN 
##    0    0 2729  182 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm18)

##       acc      sens      spec       ppv       npv       lor 
## 0.9374785 0.0000000 1.0000000       NaN 0.9374785       NaN 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06252147

# Strategy 19 - No of Car Policy

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

# Comparing with base model

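# Updating the prediction to say that Customer will Purchase
# NOTE: test2$APERSAUT is used here as a row index rather than a logical
# condition, so only the rows at positions equal to the APERSAUT values are set
# to 1 (5 rows, per the confusion matrix below).

test2$Purchase[test2$APERSAUT] <- 1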


Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm19 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm19

##   FP   TP   TN   FN 
##  182 2724    0    5 
## attr(,"negative")
## [1] "Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm19)

##       acc      sens      spec       ppv       npv       lor 
## 0.9357609 0.9981678 0.0000000 0.9373710 0.0000000      -Inf 
## attr(,"negative")
## [1] "Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06423909

# Strategy 20 - No of Car Policy

# Resetting the original training and test data - just to be sure

train <- train.ori
test  <- test.ori
test2 <-test

# Also resetting the test2 data with no one purchased ZeroR strategy

test2$Purchase <- rep(0, nrow(test2))

# Comparing with base model

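# Updating the prediction to say that Customer will not Purchase
# NOTE: test2$Purchase is already all 0 after the reset, so this assignment
# changes nothing and the result matches the ZeroR baseline.

test2$Purchase[test2$APERSAUT] <- 0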

Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))

# Confusion matrix # cm(actual,predicted)

cm20 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm20

##   FP   TP   TN   FN 
##    0    0 2729  182 
## attr(,"negative")
## [1] "Not Purchased"

# corresponding accuracy, sensitivity etc.

diagnosticErrors(cm20)

##       acc      sens      spec       ppv       npv       lor 
## 0.9374785 0.0000000 1.0000000       NaN 0.9374785       NaN 
## attr(,"negative")
## [1] "Not Purchased"

# Computing the classification error

ce(Actual.Outcome,Our.Prediction)

## [1] 0.06252147

Model Cost Summary

FINAL OBSERVATION OF THE PROJECT AND ITS MODEL

1. More than 90% of the observations do not hold a caravan policy.
2. We tried predicting that no one will purchase the policy, since the majority of observations do not hold the insurance.
3. We used 4 models with more than 12 variables.
4. Because the data predominantly belongs to one class of the target variable, there is not much variation between the models we used for prediction.
5. We also tried predicting the other way, i.e. that the customer will purchase; the accuracy showed a notable improvement, which is given for reference in the last model.


PROJECT WORK - INSURANCE COMPANY BENCHMARK CoIL 2000

ABINAYA, BADRINATH KE, RAJKUMAR S & SOMU KAKRECHA

17 October 2015

Reset the original training and test data - just to be sure

ZeroR strategy - no one will purchase

R PART

Compare with base model

Update the prediction

Model No 3