Introduction
This data set comes from a direct-marketing case in the insurance sector: predicting which customers would be interested in buying a caravan (mobile home) insurance policy. It was used in the CoIL Challenge 2000, the second edition of the competition organized by the Computational Intelligence and Learning (CoIL) cluster, a cooperation between four EU-funded Networks of Excellence covering neural networks (NeuroNet), fuzzy systems (ERUDIT), evolutionary computing (EvoNet) and machine learning (MLNet). The data set is based on a real-world business problem and was donated on March 7, 2000 by Peter van der Putten of the Dutch data mining company Sentient Machine Research, Baarsjesweg 224, 1058 AA Amsterdam, The Netherlands, +31 20 6186927, putten@liacs.nl. See the TIC (The Insurance Company) Benchmark homepage (http://www.liacs.nl/~putten/library/cc2000).
Relevant Papers
P. van der Putten and M. van Someren (eds). CoIL Challenge 2000: The Insurance Company Case. Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000-09. June 22, 2000.
SUMMARY ABOUT DATASET
NO. OF OBSERVATIONS: 5822 real customer records
NO. OF VARIABLES: 86
Each customer record consists of 86 variables: sociodemographic data (variables 1-43) and product ownership data (variables 44-86). The sociodemographic data are derived from zip codes, so all customers living in areas with the same zip code share the same sociodemographic attributes. Variable 86 (Purchase), "CARAVAN: Number of mobile home policies", is the target variable and indicates whether the customer purchased a caravan insurance policy.
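The same data are available in R through the ISLR package as the Caravan data frame, which is what the exploratory charts below use. A minimal sketch (assuming ISLR is installed) that separates the two blocks of variables described above:
library(ISLR)                  # provides the Caravan data frame (5822 x 86)
dim(Caravan)
socio   <- Caravan[, 1:43]     # zip-code level sociodemographic variables
product <- Caravan[, 44:85]    # product ownership / contribution variables
target  <- Caravan$Purchase    # variable 86: caravan policy purchased (Yes/No)
table(target)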
TASK
Predict which customers are potentially interested in a caravan insurance policy (a prediction/classification task).
PREDICTION TASK
To predict whether a customer is interested in a caravan insurance policy from other data about the customer. Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including the information of whether or not they have a caravan insurance policy; a test set contains 4000 customers. In the prediction task, the underlying problem is to find the subset of customers whose probability of having a caravan insurance policy lies above some boundary probability. The known policyholders can then be removed and the rest receive a mailing. The boundary depends on the costs and benefits involved, such as the cost of mailing and the benefit of selling insurance policies. To approximate this problem, we want to find the set of 800 customers in the test set of 4000 that contains the most caravan policy owners. For each solution submitted, the number of actual policyholders among the selected 800 is counted, and this gives the score of the solution.
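The selection step itself is mechanical once a model produces purchase probabilities: rank the test customers and keep the top 800. A small sketch of that ranking, assuming an illustrative logistic regression on the ISLR copy of the data (the chosen predictors and the simple 1000-row training split are assumptions made only to illustrate the ranking step, not the official CoIL train/test files):
library(ISLR)
# Illustrative fit: logistic regression on the first 1000 rows of the ISLR data
train.idx <- 1:1000
fit  <- glm(Purchase ~ PPERSAUT + MKOOPKLA + MOPLHOOG,
            data = Caravan[train.idx, ], family = binomial)
prob <- predict(fit, newdata = Caravan[-train.idx, ], type = "response")
# Keep the n customers with the highest predicted purchase probability
n   <- 800
top <- order(prob, decreasing = TRUE)[1:n]
# The CoIL score would be the number of actual policyholders among the selected n
sum(Caravan$Purchase[-train.idx][top] == "Yes")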
library(ISLR)
## PIE CHART OF YES/NO FOR PURCHASE OF CARAVAN POLICY BY CUSTOMERS
a<-table(Caravan$Purchase)
a
##
## No Yes
## 5474 348
colors=c("red","green")
col=colors
pie(a,main = "CUSTOMERS OF CARAVAN POLICY",col=colors)
box()
OBSERVATION FOR PIE CHART
The pie chart shows that 348 customers purchased (Yes) the caravan policy and 5474 did not (No).
# BAR AND PIE CHARTS SHOWING CORRELATION OF CUSTOMERS WHO PURCHASED CARAVAN POLICY AND VARIOUS VARIABLES
## CHARTS SHOWING PURCHASE OF CARAVAN POLICY BY CUSTOMERS AGAINST SOCIODEMOGRAPHIC DATA VARIABLES
### 1. VARIABLE - CUSTOMER SUBTYPE
a<-table(Caravan$MOSTYPE[Caravan$Purchase=="Yes"])
a
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 20 22 23 24 25 26 27 29 30 31 32 33
## 13 6 25 2 2 12 3 51 12 9 9 16 13 2 4 4 5 2 1 1 2 4 6 8 46
## 34 35 36 37 38 39 41
## 9 8 16 10 23 19 5
barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs CUSTOMER SUBTYPE",xlab="Customer subtype",ylab="Number of customers")
OBSERVATION FOR CUSTOMER SUBTYPE
The bar plot covers the 41 customer subtype labels. Customers belonging to subtype 8 (Middle class families) and subtype 33 (Lower class large families) account for the largest numbers of caravan policy purchases.
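The bar plot is based on raw counts of buyers, so large subtypes dominate simply because they contain more customers. A complementary sketch (same ISLR Caravan data) that looks at the purchase rate within each subtype instead, which is also what the prop.table strategies later in this report do:
# Share of customers in each subtype who bought the caravan policy
rate <- prop.table(table(Caravan$MOSTYPE, Caravan$Purchase), 1)[, "Yes"]
round(sort(rate, decreasing = TRUE), 3)
barplot(rate, xlab = "Customer subtype", ylab = "Share purchasing caravan policy")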
### 2. VARIABLE - AVG AGE (Age group)
a<-table(Caravan$MGEMLEEF[Caravan$Purchase=="Yes"])
a
##
## 1 2 3 4 5 6
## 1 87 183 64 12 1
names(a)=c("20 to 30","30 to 40","40 to 50","50 to 60","60 to 70","70 to 80")
barplot(a,col=rainbow(6),main = "PURCHASE OF CARAVAN POLICY vs AVG AGE",xlab="Avg age or Age group",ylab="Number of customers")
OBSERVATION FOR AVG AGE
The bar plot shows, for each age group, the number of customers who purchased the caravan policy. Customers in the 40-50 age group are the most frequent buyers.
### 3. VARIABLE - PURCHASING POWER CLASS
a<-table(Caravan$MKOOPKLA[Caravan$Purchase=="Yes"])
a
##
## 1 2 3 4 5 6 7 8
## 18 15 71 46 30 66 67 35
barplot(a,col=rainbow(7),main = "PURCHASE OF CARAVAN POLICY vs PURCHASING POWER CLASS",xlab = "Purchasing power class",ylab = "Number of customers")
OBSERVATION FOR PURCHASING POWER CLASS
The bar plot shows that customers in purchasing power class 3 account for the largest number of caravan policy purchases, with classes 7 and 6 a close second and third.
### 4. VARIABLE - AVERAGE INCOME
a<-table(Caravan$MINKGEM[Caravan$Purchase=="Yes"])
a
##
## 1 2 3 4 5 6 7 8
## 1 20 69 139 70 24 17 8
pie(a,col=rainbow(7),main ="PURCHASE OF CARAVAN POLICY vs AVERAGE INCOME")
box()
OBSERVATION FOR AVERAGE INCOME
The pie chart shows that customers in average income class 4 account for the largest number of caravan policy purchases, with classes 5 and 3 next.
### 5. VARIABLE - CUSTOMER MAIN TYPE
b<-table(Caravan$MOSHOOFD[Caravan$Purchase=="Yes"])
b
##
## 1 2 3 5 6 7 8 9 10
## 48 66 59 15 4 20 89 42 5
colors=c("violet","yellow","blue","red","brown","orange","green")
color=colors
pie(b,col=colors,main ="PURCHASE OF CARAVAN POLICY vs CUSTOMER MAIN TYPE")
box()
OBSERVATION FOR CUSTOMER MAIN TYPE
The pie chart covers the customer main type labels. Customers belonging to main type 8 (Family with grown ups) and main type 2 (Driven Growers) account for the largest numbers of caravan policy purchases.
## BAR CHARTS AND PIE CHARTS SHOWING PURCHASE OF CARAVAN POLICY BY CUSTOMERS AGAINST PRODUCT USAGE(POLICY OWNERSHIP) DATA VARIABLES
### 1.VARIABLE - NUMBER OF BOAT POLICIES
a<-table(Caravan$APLEZIER[Caravan$Purchase=="Yes"])
a
##
## 0 1 2
## 335 12 1
barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs NUMBER OF BOAT POLICIES",xlab = "Number of boat policies",ylab = "Number of customers")
OBSERVATION FOR NUMBER OF BOAT POLICIES
The bar plot shows that nearly all customers who purchased the caravan policy (335 of 348) hold no boat policies.
### 2. VARIABLE - NUMBER OF SOCIAL SECURITY INSURANCE POLICIES
a<-table(Caravan$ABYSTAND[Caravan$Purchase=="Yes"])
a
##
## 0 1
## 332 16
barplot(a,border="dark blue",main = "PURCHASE OF CARAVAN POLICY vs NO. OF SS INSURANCE POLICIES",xlab = "Number of social security insurance policies",ylab = "Number of customers")
OBSERVATION FOR NUMBER OF SOCIAL SECURITY INSURANCE POLICIES
The bar plot shows that nearly all caravan policy buyers (332 of 348) hold no social security insurance policies.
### 3. VARIABLE - CONTRIBUTION CAR POLICIES
a<-table(Caravan$PPERSAUT[Caravan$Purchase=="Yes"])
a
##
## 0 5 6
## 72 14 262
colors=c("blue","red","green")
col=colors
pie(a,main ="PURCHASE OF CARAVAN POLICY vs CONTRIBUTION CAR POLICIES",col=colors)
box()
OBSERVATION FOR CONTRIBUTION TO CAR POLICIES
The pie chart shows that most caravan policy buyers (262 of 348) fall in car policy contribution class 6 (a premium of roughly 1,000 to 4,999).
### 4. VARIABLE - Number of fire policies
a<-table(Caravan$ABRAND[Caravan$Purchase=="Yes"])
a
##
## 0 1 2
## 109 232 7
colors=c("orange","violet","yellow")
col=colors
pie(a,main ="PURCHASE OF CARAVAN POLICY vs NUMBER OF FIRE POLICIES",col=colors)
box()
OBSERVATION FOR NUMBER OF FIRE POLICIES
The pie chart shows that customers holding exactly one fire policy account for the largest share of caravan policy purchases (232 of 348).
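As with the sociodemographic charts, the counts above are taken only over the 348 buyers. A short sketch of the complementary view, the purchase rate within each level of the product-usage variables plotted in this section (same ISLR Caravan data):
# Within-level purchase rates for car policy contribution and number of fire policies
round(prop.table(table(Caravan$PPERSAUT, Caravan$Purchase), 1), 3)
round(prop.table(table(Caravan$ABRAND,  Caravan$Purchase), 1), 3)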
# PREDICTION USING ALGORITHMS (ZERO R, RPART, C50 & GLM) AND THEIR MODELS
# 1. MODELS BASED ON ZERO R ALGORITHM
library(crossval)
library(gplots)
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
##
## lowess
library(vcd)
## Loading required package: grid
##
## Attaching package: 'vcd'
##
## The following object is masked from 'package:ISLR':
##
## Hitters
library(Metrics)
Caravan2 <- read.csv("C:/Users/S.RAJKUMAR/Desktop/Caravan2.csv")
# Reading the Caravan data from Caravan2.csv
Caravan.ori <-Caravan2
head(Caravan.ori)
## MOSTYPE MAANTHUI MGEMOMV MGEMLEEF MOSHOOFD MGODRK MGODPR MGODOV MGODGE
## 1 33 1 3 2 8 0 5 1 3
## 2 37 1 2 2 8 1 4 1 4
## 3 37 1 2 2 8 0 4 2 4
## 4 9 1 3 3 3 2 3 2 4
## 5 40 1 4 2 10 1 4 1 4
## 6 23 1 2 1 5 0 5 0 5
## MRELGE MRELSA MRELOV MFALLEEN MFGEKIND MFWEKIND MOPLHOOG MOPLMIDD
## 1 7 0 2 1 2 6 1 2
## 2 6 2 2 0 4 5 0 5
## 3 3 2 4 4 4 2 0 5
## 4 5 2 2 2 3 4 3 4
## 5 7 1 2 2 4 4 5 4
## 6 0 6 3 3 5 2 0 5
## MOPLLAAG MBERHOOG MBERZELF MBERBOER MBERMIDD MBERARBG MBERARBO MSKA
## 1 7 1 0 1 2 5 2 1
## 2 4 0 0 0 5 0 4 0
## 3 4 0 0 0 7 0 2 0
## 4 2 4 0 0 3 1 2 3
## 5 0 0 5 4 0 0 0 9
## 6 4 2 0 0 4 2 2 2
## MSKB1 MSKB2 MSKC MSKD MHHUUR MHKOOP MAUT1 MAUT2 MAUT0 MZFONDS MZPART
## 1 1 2 6 1 1 8 8 0 1 8 1
## 2 2 3 5 0 2 7 7 1 2 6 3
## 3 5 0 4 0 7 2 7 0 2 9 0
## 4 2 1 4 0 5 4 9 0 0 7 2
## 5 0 0 0 0 4 5 6 2 1 5 4
## 6 2 2 4 2 9 0 5 3 3 9 0
## MINKM30 MINK3045 MINK4575 MINK7512 MINK123M MINKGEM MKOOPKLA PWAPART
## 1 0 4 5 0 0 4 3 0
## 2 2 0 5 2 0 5 4 2
## 3 4 5 0 0 0 3 4 2
## 4 1 5 3 0 0 4 4 0
## 5 0 0 9 0 0 6 3 0
## 6 5 2 3 0 0 3 3 0
## PWABEDR PWALAND PPERSAUT PBESAUT PMOTSCO PVRAAUT PAANHANG PTRACTOR
## 1 0 0 6 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0
## 3 0 0 6 0 0 0 0 0
## 4 0 0 6 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0
## 6 0 0 6 0 0 0 0 0
## PWERKT PBROM PLEVEN PPERSONG PGEZONG PWAOREG PBRAND PZEILPL PPLEZIER
## 1 0 0 0 0 0 0 5 0 0
## 2 0 0 0 0 0 0 2 0 0
## 3 0 0 0 0 0 0 2 0 0
## 4 0 0 0 0 0 0 2 0 0
## 5 0 0 0 0 0 0 6 0 0
## 6 0 0 0 0 0 0 0 0 0
## PFIETS PINBOED PBYSTAND AWAPART AWABEDR AWALAND APERSAUT ABESAUT AMOTSCO
## 1 0 0 0 0 0 0 1 0 0
## 2 0 0 0 2 0 0 0 0 0
## 3 0 0 0 1 0 0 1 0 0
## 4 0 0 0 0 0 0 1 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 1 0 0
## AVRAAUT AAANHANG ATRACTOR AWERKT ABROM ALEVEN APERSONG AGEZONG AWAOREG
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## ABRAND AZEILPL APLEZIER AFIETS AINBOED ABYSTAND Purchase
## 1 1 0 0 0 0 0 0
## 2 1 0 0 0 0 0 0
## 3 1 0 0 0 0 0 0
## 4 1 0 0 0 0 0 0
## 5 1 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
str(Caravan.ori)
## 'data.frame': 5822 obs. of 86 variables:
## $ MOSTYPE : int 33 37 37 9 40 23 39 33 33 11 ...
## $ MAANTHUI: int 1 1 1 1 1 1 2 1 1 2 ...
## $ MGEMOMV : int 3 2 2 3 4 2 3 2 2 3 ...
## $ MGEMLEEF: int 2 2 2 3 2 1 2 3 4 3 ...
## $ MOSHOOFD: int 8 8 8 3 10 5 9 8 8 3 ...
## $ MGODRK : int 0 1 0 2 1 0 2 0 0 3 ...
## $ MGODPR : int 5 4 4 3 4 5 2 7 1 5 ...
## $ MGODOV : int 1 1 2 2 1 0 0 0 3 0 ...
## $ MGODGE : int 3 4 4 4 4 5 5 2 6 2 ...
## $ MRELGE : int 7 6 3 5 7 0 7 7 6 7 ...
## $ MRELSA : int 0 2 2 2 1 6 2 2 0 0 ...
## $ MRELOV : int 2 2 4 2 2 3 0 0 3 2 ...
## $ MFALLEEN: int 1 0 4 2 2 3 0 0 3 2 ...
## $ MFGEKIND: int 2 4 4 3 4 5 3 5 3 2 ...
## $ MFWEKIND: int 6 5 2 4 4 2 6 4 3 6 ...
## $ MOPLHOOG: int 1 0 0 3 5 0 0 0 0 0 ...
## $ MOPLMIDD: int 2 5 5 4 4 5 4 3 1 4 ...
## $ MOPLLAAG: int 7 4 4 2 0 4 5 6 8 5 ...
## $ MBERHOOG: int 1 0 0 4 0 2 0 2 1 2 ...
## $ MBERZELF: int 0 0 0 0 5 0 0 0 1 0 ...
## $ MBERBOER: int 1 0 0 0 4 0 0 0 0 0 ...
## $ MBERMIDD: int 2 5 7 3 0 4 4 2 1 3 ...
## $ MBERARBG: int 5 0 0 1 0 2 1 5 8 3 ...
## $ MBERARBO: int 2 4 2 2 0 2 5 2 1 3 ...
## $ MSKA : int 1 0 0 3 9 2 0 2 1 1 ...
## $ MSKB1 : int 1 2 5 2 0 2 1 1 1 2 ...
## $ MSKB2 : int 2 3 0 1 0 2 4 2 0 1 ...
## $ MSKC : int 6 5 4 4 0 4 5 5 8 4 ...
## $ MSKD : int 1 0 0 0 0 2 0 2 1 2 ...
## $ MHHUUR : int 1 2 7 5 4 9 6 0 9 0 ...
## $ MHKOOP : int 8 7 2 4 5 0 3 9 0 9 ...
## $ MAUT1 : int 8 7 7 9 6 5 8 4 5 6 ...
## $ MAUT2 : int 0 1 0 0 2 3 0 4 2 1 ...
## $ MAUT0 : int 1 2 2 0 1 3 1 2 3 2 ...
## $ MZFONDS : int 8 6 9 7 5 9 9 6 7 6 ...
## $ MZPART : int 1 3 0 2 4 0 0 3 2 3 ...
## $ MINKM30 : int 0 2 4 1 0 5 4 2 7 2 ...
## $ MINK3045: int 4 0 5 5 0 2 3 5 2 3 ...
## $ MINK4575: int 5 5 0 3 9 3 3 3 1 3 ...
## $ MINK7512: int 0 2 0 0 0 0 0 0 0 1 ...
## $ MINK123M: int 0 0 0 0 0 0 0 0 0 0 ...
## $ MINKGEM : int 4 5 3 4 6 3 3 3 2 4 ...
## $ MKOOPKLA: int 3 4 4 4 3 3 5 3 3 7 ...
## $ PWAPART : int 0 2 2 0 0 0 0 0 0 2 ...
## $ PWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSAUT: int 6 0 6 6 0 6 6 0 5 0 ...
## $ PBESAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PMOTSCO : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PTRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBROM : int 0 0 0 0 0 0 0 3 0 0 ...
## $ PLEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSONG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBRAND : int 5 2 2 2 6 0 0 0 0 3 ...
## $ PZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAPART : int 0 2 1 0 0 0 0 0 0 1 ...
## $ AWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSAUT: int 1 0 1 1 0 1 1 0 1 0 ...
## $ ABESAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AMOTSCO : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ ATRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABROM : int 0 0 0 0 0 0 0 1 0 0 ...
## $ ALEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSONG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABRAND : int 1 1 1 1 1 0 0 0 0 1 ...
## $ AZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Purchase: int 0 0 0 0 0 0 0 0 0 0 ...
set.seed(11)
train <- Caravan.ori[sample(row.names(Caravan.ori), size = round(nrow(Caravan.ori)*0.5)), ]
test <- Caravan.ori[!(row.names(Caravan.ori) %in% row.names(train)), ]
# Creating backups of the test and train data for later use; the .ori copies are never modified
train.ori <-train
test.ori<-test
train2<-train
test2<-test
# Examining structure of dataframe
str(train2)
## 'data.frame': 2911 obs. of 86 variables:
## $ MOSTYPE : int 39 9 5 3 39 33 33 37 12 36 ...
## $ MAANTHUI: int 1 1 1 1 2 1 1 1 1 1 ...
## $ MGEMOMV : int 3 3 3 2 3 3 4 2 3 3 ...
## $ MGEMLEEF: int 2 3 3 4 3 4 3 3 2 3 ...
## $ MOSHOOFD: int 9 3 1 1 9 8 8 8 3 8 ...
## $ MGODRK : int 2 2 1 1 2 0 0 0 0 0 ...
## $ MGODPR : int 5 3 4 4 6 0 6 4 4 4 ...
## $ MGODOV : int 1 2 2 1 0 0 0 1 0 1 ...
## $ MGODGE : int 2 4 3 4 3 9 3 4 5 4 ...
## $ MRELGE : int 5 5 7 7 6 9 7 6 9 7 ...
## $ MRELSA : int 1 2 0 1 1 0 0 1 0 0 ...
## $ MRELOV : int 3 2 2 2 3 0 2 3 0 2 ...
## $ MFALLEEN: int 1 2 2 0 1 0 2 1 0 2 ...
## $ MFGEKIND: int 4 3 4 9 0 3 2 5 2 3 ...
## $ MFWEKIND: int 5 4 4 0 8 6 6 3 7 5 ...
## $ MOPLHOOG: int 0 3 1 0 1 0 0 1 3 0 ...
## $ MOPLMIDD: int 7 4 4 0 3 2 3 5 3 2 ...
## $ MOPLLAAG: int 2 2 5 9 6 7 6 3 4 7 ...
## $ MBERHOOG: int 4 4 1 4 2 3 0 1 3 0 ...
## $ MBERZELF: int 0 0 0 0 1 0 0 0 0 0 ...
## $ MBERBOER: int 3 0 0 0 0 0 0 1 0 3 ...
## $ MBERMIDD: int 1 3 6 2 4 2 2 2 3 0 ...
## $ MBERARBG: int 3 1 1 2 0 5 8 1 3 3 ...
## $ MBERARBO: int 0 2 3 1 3 0 0 5 2 3 ...
## $ MSKA : int 3 3 0 0 1 0 0 1 3 2 ...
## $ MSKB1 : int 2 2 3 0 2 2 1 1 3 0 ...
## $ MSKB2 : int 4 1 2 5 2 3 2 5 0 2 ...
## $ MSKC : int 2 4 5 4 3 5 6 3 4 3 ...
## $ MSKD : int 0 0 0 0 3 0 0 1 0 4 ...
## $ MHHUUR : int 9 5 6 5 2 4 2 3 9 3 ...
## $ MHKOOP : int 0 4 3 5 7 5 7 6 0 6 ...
## $ MAUT1 : int 5 9 5 6 5 7 9 4 9 5 ...
## $ MAUT2 : int 1 0 2 2 2 2 0 0 0 2 ...
## $ MAUT0 : int 3 0 3 1 2 0 0 5 0 3 ...
## $ MZFONDS : int 7 7 7 5 5 5 9 7 6 7 ...
## $ MZPART : int 2 2 2 4 4 4 0 2 3 2 ...
## $ MINKM30 : int 0 1 3 5 1 0 5 3 0 5 ...
## $ MINK3045: int 7 5 3 5 4 0 0 3 6 4 ...
## $ MINK4575: int 2 3 3 0 2 9 4 1 2 0 ...
## $ MINK7512: int 0 0 1 0 3 0 0 3 2 0 ...
## $ MINK123M: int 0 0 0 0 1 0 0 1 0 0 ...
## $ MINKGEM : int 4 4 4 3 5 5 4 5 4 2 ...
## $ MKOOPKLA: int 5 4 3 6 5 3 3 4 7 3 ...
## $ PWAPART : int 0 0 0 0 0 0 0 1 2 0 ...
## $ PWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSAUT: int 5 6 6 6 6 6 6 0 0 0 ...
## $ PBESAUT : int 0 0 5 0 0 0 0 0 0 0 ...
## $ PMOTSCO : int 0 0 0 0 0 6 0 0 4 0 ...
## $ PVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PTRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBROM : int 0 0 0 0 0 0 0 0 0 3 ...
## $ PLEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSONG: int 0 0 0 0 0 2 0 0 0 0 ...
## $ PGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBRAND : int 0 2 0 0 0 0 0 4 0 0 ...
## $ PZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAPART : int 0 0 0 0 0 0 0 1 1 0 ...
## $ AWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSAUT: int 1 1 1 1 3 1 1 0 0 0 ...
## $ ABESAUT : int 0 0 1 0 0 0 0 0 0 0 ...
## $ AMOTSCO : int 0 0 0 0 0 1 0 0 1 0 ...
## $ AVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ ATRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABROM : int 0 0 0 0 0 0 0 0 0 1 ...
## $ ALEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSONG: int 0 0 0 0 0 1 0 0 0 0 ...
## $ AGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABRAND : int 0 1 0 0 0 0 0 1 0 0 ...
## $ AZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Purchase: int 0 0 0 0 0 0 0 0 0 0 ...
str(test2)
## 'data.frame': 2911 obs. of 86 variables:
## $ MOSTYPE : int 37 40 23 33 11 10 9 33 38 22 ...
## $ MAANTHUI: int 1 1 1 1 2 1 1 1 1 2 ...
## $ MGEMOMV : int 2 4 2 2 3 4 3 2 2 3 ...
## $ MGEMLEEF: int 2 2 1 3 3 3 3 3 3 3 ...
## $ MOSHOOFD: int 8 10 5 8 3 3 3 8 9 5 ...
## $ MGODRK : int 1 1 0 0 3 1 1 0 0 0 ...
## $ MGODPR : int 4 4 5 7 5 4 3 7 6 5 ...
## $ MGODOV : int 1 1 0 0 0 1 2 0 0 0 ...
## $ MGODGE : int 4 4 5 2 2 4 4 2 3 4 ...
## $ MRELGE : int 6 7 0 7 7 7 7 7 7 7 ...
## $ MRELSA : int 2 1 6 2 0 1 1 2 0 0 ...
## $ MRELOV : int 2 2 3 0 2 2 2 0 2 2 ...
## $ MFALLEEN: int 0 2 3 0 2 0 2 0 0 0 ...
## $ MFGEKIND: int 4 4 5 5 2 3 3 5 6 2 ...
## $ MFWEKIND: int 5 4 2 4 6 6 5 4 3 7 ...
## $ MOPLHOOG: int 0 5 0 0 0 4 1 0 2 2 ...
## $ MOPLMIDD: int 5 4 5 3 4 3 7 3 6 1 ...
## $ MOPLLAAG: int 4 0 4 6 5 3 1 6 2 7 ...
## $ MBERHOOG: int 0 0 2 2 2 0 4 2 2 0 ...
## $ MBERZELF: int 0 5 0 0 0 0 0 0 0 2 ...
## $ MBERBOER: int 0 4 0 0 0 0 0 0 0 0 ...
## $ MBERMIDD: int 5 0 4 2 3 9 5 2 4 1 ...
## $ MBERARBG: int 0 0 2 5 3 0 1 5 0 1 ...
## $ MBERARBO: int 4 0 2 2 3 0 1 2 4 5 ...
## $ MSKA : int 0 9 2 2 1 3 2 2 2 2 ...
## $ MSKB1 : int 2 0 2 1 2 0 3 1 2 0 ...
## $ MSKB2 : int 3 0 2 2 1 6 4 2 4 0 ...
## $ MSKC : int 5 0 4 5 4 0 1 5 2 7 ...
## $ MSKD : int 0 0 2 2 2 0 0 2 0 0 ...
## $ MHHUUR : int 2 4 9 0 0 0 6 0 6 4 ...
## $ MHKOOP : int 7 5 0 9 9 9 3 9 3 5 ...
## $ MAUT1 : int 7 6 5 4 6 6 7 4 7 6 ...
## $ MAUT2 : int 1 2 3 4 1 2 1 4 2 1 ...
## $ MAUT0 : int 2 1 3 2 2 1 2 2 0 2 ...
## $ MZFONDS : int 6 5 9 6 6 5 4 6 4 7 ...
## $ MZPART : int 3 4 0 3 3 4 5 3 5 2 ...
## $ MINKM30 : int 2 0 5 2 2 0 3 2 0 0 ...
## $ MINK3045: int 0 0 2 5 3 3 4 5 1 6 ...
## $ MINK4575: int 5 9 3 3 3 2 3 3 6 3 ...
## $ MINK7512: int 2 0 0 0 1 2 1 0 2 0 ...
## $ MINK123M: int 0 0 0 0 0 2 0 0 0 0 ...
## $ MINKGEM : int 5 6 3 3 4 8 3 3 5 4 ...
## $ MKOOPKLA: int 4 3 3 3 7 7 4 3 4 2 ...
## $ PWAPART : int 2 0 0 0 2 0 2 2 0 2 ...
## $ PWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSAUT: int 0 0 6 0 0 6 5 0 0 6 ...
## $ PBESAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PMOTSCO : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PTRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBROM : int 0 0 0 3 0 0 0 0 0 0 ...
## $ PLEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSONG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBRAND : int 2 6 0 0 3 0 2 4 0 3 ...
## $ PZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PPLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ PFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PBYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAPART : int 2 0 0 0 1 0 1 1 0 1 ...
## $ AWABEDR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWALAND : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSAUT: int 0 0 1 0 0 1 1 0 0 1 ...
## $ ABESAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AMOTSCO : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AVRAAUT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AAANHANG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ ATRACTOR: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWERKT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABROM : int 0 0 0 1 0 0 0 0 0 0 ...
## $ ALEVEN : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSONG: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AGEZONG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABRAND : int 1 1 0 0 1 0 1 1 0 1 ...
## $ AZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
## $ AFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ABYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Purchase: int 0 0 0 0 0 0 0 0 0 0 ...
# Looking at NO. of people who Purchased or not the Caravan policy
table(train$Purchase)
##
## 0 1
## 2745 166
prop.table(table(train$Purchase))
##
## 0 1
## 0.94297492 0.05702508
# Strategy 1 - ZeroR model # Using the ZeroR algorithm: create a new column in the test set predicting that everyone has purchased
test2$Purchase <- rep(1, nrow(test2))
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm1 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm1
## FP TP TN FN
## 0 0 182 2729
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm1)
## acc sens spec ppv npv lor
## 0.06252147 0.00000000 1.00000000 NaN 0.06252147 NaN
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.9374785
table(test$Purchase)
##
## 0 1
## 2729 182
prop.table(table(test$Purchase))
##
## 0 1
## 0.93747853 0.06252147
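ZeroR simply predicts the majority class of the training data for every test case. A compact sketch of that rule as a reusable helper (the function name zero_r is made up for illustration):
# ZeroR: always predict the most frequent class seen in the training labels
zero_r <- function(train_labels, n_test) {
  majority <- names(which.max(table(train_labels)))
  rep(majority, n_test)
}
pred <- zero_r(train$Purchase, nrow(test))   # all "0" (Not Purchased) here
mean(pred != test$Purchase)                  # classification error, about 0.0625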
#Strategy 2 - ZeroR model
# Creating new column in test set with our prediction no one purchased
test2$Purchase <- rep(0, nrow(test2))
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm2 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm2
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm2)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
# Strategy 3 - Customer Sub Type
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$MOSTYPE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 10.00 30.00 24.23 35.00 41.00
prop.table(table(train$MOSTYPE, train$Purchase))
##
## 0 1
## 1 0.0178632772 0.0013740982
## 2 0.0151150807 0.0013740982
## 3 0.0415664720 0.0027481965
## 4 0.0106492614 0.0003435246
## 5 0.0082445895 0.0003435246
## 6 0.0195809000 0.0017176228
## 7 0.0089316386 0.0010305737
## 8 0.0467193404 0.0068704912
## 9 0.0446581931 0.0024046719
## 10 0.0254208176 0.0017176228
## 11 0.0237031948 0.0013740982
## 12 0.0168327035 0.0027481965
## 13 0.0271384404 0.0020611474
## 15 0.0006870491 0.0000000000
## 16 0.0034352456 0.0000000000
## 17 0.0013740982 0.0000000000
## 18 0.0034352456 0.0000000000
## 19 0.0006870491 0.0000000000
## 20 0.0037787702 0.0000000000
## 21 0.0020611474 0.0000000000
## 22 0.0154586053 0.0003435246
## 23 0.0425970457 0.0003435246
## 24 0.0291995878 0.0006870491
## 25 0.0147715562 0.0003435246
## 26 0.0089316386 0.0003435246
## 27 0.0075575404 0.0000000000
## 28 0.0048093439 0.0000000000
## 29 0.0147715562 0.0003435246
## 30 0.0202679492 0.0006870491
## 31 0.0357265544 0.0013740982
## 32 0.0223290965 0.0006870491
## 33 0.1308828581 0.0089316386
## 34 0.0305736860 0.0013740982
## 35 0.0350395053 0.0013740982
## 36 0.0336654071 0.0034352456
## 37 0.0216420474 0.0013740982
## 38 0.0546204054 0.0051528684
## 39 0.0570250773 0.0027481965
## 40 0.0109927860 0.0000000000
## 41 0.0302301615 0.0013740982
prop.table(table(train$MOSTYPE, train$Purchase), 1)
##
## 0 1
## 1 0.92857143 0.07142857
## 2 0.91666667 0.08333333
## 3 0.93798450 0.06201550
## 4 0.96875000 0.03125000
## 5 0.96000000 0.04000000
## 6 0.91935484 0.08064516
## 7 0.89655172 0.10344828
## 8 0.87179487 0.12820513
## 9 0.94890511 0.05109489
## 10 0.93670886 0.06329114
## 11 0.94520548 0.05479452
## 12 0.85964912 0.14035088
## 13 0.92941176 0.07058824
## 15 1.00000000 0.00000000
## 16 1.00000000 0.00000000
## 17 1.00000000 0.00000000
## 18 1.00000000 0.00000000
## 19 1.00000000 0.00000000
## 20 1.00000000 0.00000000
## 21 1.00000000 0.00000000
## 22 0.97826087 0.02173913
## 23 0.99200000 0.00800000
## 24 0.97701149 0.02298851
## 25 0.97727273 0.02272727
## 26 0.96296296 0.03703704
## 27 1.00000000 0.00000000
## 28 1.00000000 0.00000000
## 29 0.97727273 0.02272727
## 30 0.96721311 0.03278689
## 31 0.96296296 0.03703704
## 32 0.97014925 0.02985075
## 33 0.93611794 0.06388206
## 34 0.95698925 0.04301075
## 35 0.96226415 0.03773585
## 36 0.90740741 0.09259259
## 37 0.94029851 0.05970149
## 38 0.91379310 0.08620690
## 39 0.95402299 0.04597701
## 40 1.00000000 0.00000000
## 41 0.95652174 0.04347826
################################## Comparing with base model
# Updating the prediction to say that Subtype will Purchase
test2$Purchase[test2$MOSTYPE] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm3 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm3
## FP TP TN FN
## 182 2689 0 40
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm3)
## acc sens spec ppv npv lor
## 0.9237375 0.9853426 0.0000000 0.9366075 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.07626245
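The strategy above indexes test2$Purchase directly by the MOSTYPE values. A related sketch that instead builds the rule from an explicit logical condition, flagging the subtypes whose training purchase rate exceeds the overall training rate (the threshold is an arbitrary choice made here for illustration):
# Subtypes with an above-average purchase rate in the training data
rate <- prop.table(table(train$MOSTYPE, train$Purchase), 1)[, "1"]
hot  <- as.integer(names(rate)[rate > mean(train$Purchase)])
pred <- ifelse(test$MOSTYPE %in% hot, 1, 0)   # predict purchase for those subtypes
mean(pred != test$Purchase)                   # classification error of this rule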
# Strategy 4 - Customer Sub Type
# Strategy 5 - Customer Sub Type 8
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$MOSTYPE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 10.00 30.00 24.23 35.00 41.00
prop.table(table(train$MOSTYPE, train$Purchase))
##
## 0 1
## 1 0.0178632772 0.0013740982
## 2 0.0151150807 0.0013740982
## 3 0.0415664720 0.0027481965
## 4 0.0106492614 0.0003435246
## 5 0.0082445895 0.0003435246
## 6 0.0195809000 0.0017176228
## 7 0.0089316386 0.0010305737
## 8 0.0467193404 0.0068704912
## 9 0.0446581931 0.0024046719
## 10 0.0254208176 0.0017176228
## 11 0.0237031948 0.0013740982
## 12 0.0168327035 0.0027481965
## 13 0.0271384404 0.0020611474
## 15 0.0006870491 0.0000000000
## 16 0.0034352456 0.0000000000
## 17 0.0013740982 0.0000000000
## 18 0.0034352456 0.0000000000
## 19 0.0006870491 0.0000000000
## 20 0.0037787702 0.0000000000
## 21 0.0020611474 0.0000000000
## 22 0.0154586053 0.0003435246
## 23 0.0425970457 0.0003435246
## 24 0.0291995878 0.0006870491
## 25 0.0147715562 0.0003435246
## 26 0.0089316386 0.0003435246
## 27 0.0075575404 0.0000000000
## 28 0.0048093439 0.0000000000
## 29 0.0147715562 0.0003435246
## 30 0.0202679492 0.0006870491
## 31 0.0357265544 0.0013740982
## 32 0.0223290965 0.0006870491
## 33 0.1308828581 0.0089316386
## 34 0.0305736860 0.0013740982
## 35 0.0350395053 0.0013740982
## 36 0.0336654071 0.0034352456
## 37 0.0216420474 0.0013740982
## 38 0.0546204054 0.0051528684
## 39 0.0570250773 0.0027481965
## 40 0.0109927860 0.0000000000
## 41 0.0302301615 0.0013740982
prop.table(table(train$MOSTYPE, train$Purchase), 1)
##
## 0 1
## 1 0.92857143 0.07142857
## 2 0.91666667 0.08333333
## 3 0.93798450 0.06201550
## 4 0.96875000 0.03125000
## 5 0.96000000 0.04000000
## 6 0.91935484 0.08064516
## 7 0.89655172 0.10344828
## 8 0.87179487 0.12820513
## 9 0.94890511 0.05109489
## 10 0.93670886 0.06329114
## 11 0.94520548 0.05479452
## 12 0.85964912 0.14035088
## 13 0.92941176 0.07058824
## 15 1.00000000 0.00000000
## 16 1.00000000 0.00000000
## 17 1.00000000 0.00000000
## 18 1.00000000 0.00000000
## 19 1.00000000 0.00000000
## 20 1.00000000 0.00000000
## 21 1.00000000 0.00000000
## 22 0.97826087 0.02173913
## 23 0.99200000 0.00800000
## 24 0.97701149 0.02298851
## 25 0.97727273 0.02272727
## 26 0.96296296 0.03703704
## 27 1.00000000 0.00000000
## 28 1.00000000 0.00000000
## 29 0.97727273 0.02272727
## 30 0.96721311 0.03278689
## 31 0.96296296 0.03703704
## 32 0.97014925 0.02985075
## 33 0.93611794 0.06388206
## 34 0.95698925 0.04301075
## 35 0.96226415 0.03773585
## 36 0.90740741 0.09259259
## 37 0.94029851 0.05970149
## 38 0.91379310 0.08620690
## 39 0.95402299 0.04597701
## 40 1.00000000 0.00000000
## 41 0.95652174 0.04347826
################################## Comparing with base model
# Updating the prediction to say that Subtype 8 will Purchase
test2$Purchase[test2$MOSTYPE==8] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm5 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm5
## FP TP TN FN
## 151 2577 31 152
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm5)
## acc sens spec ppv npv lor
## 0.8959121 0.9443019 0.1703297 0.9446481 0.1693989 1.2472081
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.1040879
# Strategy 6 - Customer Sub Type 8
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$MOSTYPE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 10.00 30.00 24.23 35.00 41.00
prop.table(table(train$MOSTYPE, train$Purchase))
##
## 0 1
## 1 0.0178632772 0.0013740982
## 2 0.0151150807 0.0013740982
## 3 0.0415664720 0.0027481965
## 4 0.0106492614 0.0003435246
## 5 0.0082445895 0.0003435246
## 6 0.0195809000 0.0017176228
## 7 0.0089316386 0.0010305737
## 8 0.0467193404 0.0068704912
## 9 0.0446581931 0.0024046719
## 10 0.0254208176 0.0017176228
## 11 0.0237031948 0.0013740982
## 12 0.0168327035 0.0027481965
## 13 0.0271384404 0.0020611474
## 15 0.0006870491 0.0000000000
## 16 0.0034352456 0.0000000000
## 17 0.0013740982 0.0000000000
## 18 0.0034352456 0.0000000000
## 19 0.0006870491 0.0000000000
## 20 0.0037787702 0.0000000000
## 21 0.0020611474 0.0000000000
## 22 0.0154586053 0.0003435246
## 23 0.0425970457 0.0003435246
## 24 0.0291995878 0.0006870491
## 25 0.0147715562 0.0003435246
## 26 0.0089316386 0.0003435246
## 27 0.0075575404 0.0000000000
## 28 0.0048093439 0.0000000000
## 29 0.0147715562 0.0003435246
## 30 0.0202679492 0.0006870491
## 31 0.0357265544 0.0013740982
## 32 0.0223290965 0.0006870491
## 33 0.1308828581 0.0089316386
## 34 0.0305736860 0.0013740982
## 35 0.0350395053 0.0013740982
## 36 0.0336654071 0.0034352456
## 37 0.0216420474 0.0013740982
## 38 0.0546204054 0.0051528684
## 39 0.0570250773 0.0027481965
## 40 0.0109927860 0.0000000000
## 41 0.0302301615 0.0013740982
prop.table(table(train$MOSTYPE, train$Purchase), 1)
##
## 0 1
## 1 0.92857143 0.07142857
## 2 0.91666667 0.08333333
## 3 0.93798450 0.06201550
## 4 0.96875000 0.03125000
## 5 0.96000000 0.04000000
## 6 0.91935484 0.08064516
## 7 0.89655172 0.10344828
## 8 0.87179487 0.12820513
## 9 0.94890511 0.05109489
## 10 0.93670886 0.06329114
## 11 0.94520548 0.05479452
## 12 0.85964912 0.14035088
## 13 0.92941176 0.07058824
## 15 1.00000000 0.00000000
## 16 1.00000000 0.00000000
## 17 1.00000000 0.00000000
## 18 1.00000000 0.00000000
## 19 1.00000000 0.00000000
## 20 1.00000000 0.00000000
## 21 1.00000000 0.00000000
## 22 0.97826087 0.02173913
## 23 0.99200000 0.00800000
## 24 0.97701149 0.02298851
## 25 0.97727273 0.02272727
## 26 0.96296296 0.03703704
## 27 1.00000000 0.00000000
## 28 1.00000000 0.00000000
## 29 0.97727273 0.02272727
## 30 0.96721311 0.03278689
## 31 0.96296296 0.03703704
## 32 0.97014925 0.02985075
## 33 0.93611794 0.06388206
## 34 0.95698925 0.04301075
## 35 0.96226415 0.03773585
## 36 0.90740741 0.09259259
## 37 0.94029851 0.05970149
## 38 0.91379310 0.08620690
## 39 0.95402299 0.04597701
## 40 1.00000000 0.00000000
## 41 0.95652174 0.04347826
################################## Comparing with base model
# Updating the prediction to say that Subtype 8 will not Purchase
test2$Purchase[test2$MOSTYPE==8] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm6 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm6
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm6)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
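The numbered strategies all follow the same pattern: reset the data, flag one level of a single variable as buyers (or non-buyers), and tabulate the result. A compact sketch that evaluates several one-variable, one-level rules in a single loop (the chosen variables and levels are illustrative):
# Rules of the form "predict Purchase = 1 when variable == level, else 0"
rules  <- list(MOSTYPE = 8, PBRAND = 4, PPERSAUT = 6, APERSAUT = 2)
errors <- mapply(function(var, lvl) {
  pred <- as.integer(test[[var]] == lvl)
  mean(pred != test$Purchase)          # test classification error of the rule
}, names(rules), unlist(rules))
round(errors, 4)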
# Strategy 7 - Contribution to Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also reset the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PBRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 1.808 4.000 7.000
prop.table(table(train$PBRAND, train$Purchase))
##
## 0 1
## 0 0.442116111 0.019580900
## 1 0.025764342 0.001030574
## 2 0.090003435 0.001030574
## 3 0.151494332 0.010305737
## 4 0.184472690 0.023703195
## 5 0.022329097 0.001374098
## 6 0.025077293 0.000000000
## 7 0.001717623 0.000000000
prop.table(table(train$PBRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.96153846 0.03846154
## 2 0.98867925 0.01132075
## 3 0.93630573 0.06369427
## 4 0.88613861 0.11386139
## 5 0.94202899 0.05797101
## 6 1.00000000 0.00000000
## 7 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$PBRAND] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm7 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm7
## FP TP TN FN
## 182 2721 0 8
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm7)
## acc sens spec ppv npv lor
## 0.9347303 0.9970685 0.0000000 0.9373062 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06526967
# Strategy 8 - Contribution to Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PBRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 1.808 4.000 7.000
prop.table(table(train$PBRAND, train$Purchase))
##
## 0 1
## 0 0.442116111 0.019580900
## 1 0.025764342 0.001030574
## 2 0.090003435 0.001030574
## 3 0.151494332 0.010305737
## 4 0.184472690 0.023703195
## 5 0.022329097 0.001374098
## 6 0.025077293 0.000000000
## 7 0.001717623 0.000000000
prop.table(table(train$PBRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.96153846 0.03846154
## 2 0.98867925 0.01132075
## 3 0.93630573 0.06369427
## 4 0.88613861 0.11386139
## 5 0.94202899 0.05797101
## 6 1.00000000 0.00000000
## 7 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will not Purchase
test2$Purchase[test2$PBRAND] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm8 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm8
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm8)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
# Strategy 9 - No of Car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$APERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5538 1.0000 6.0000
prop.table(table(train$APERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 1 0.4256269323 0.0388182755
## 2 0.0319477843 0.0065269667
## 3 0.0020611474 0.0000000000
## 4 0.0010305737 0.0000000000
## 6 0.0003435246 0.0000000000
prop.table(table(train$APERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 1 0.91642012 0.08357988
## 2 0.83035714 0.16964286
## 3 1.00000000 0.00000000
## 4 1.00000000 0.00000000
## 6 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$APERSAUT] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm9 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm9
## FP TP TN FN
## 182 2724 0 5
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm9)
## acc sens spec ppv npv lor
## 0.9357609 0.9981678 0.0000000 0.9373710 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06423909
# Strategy 10 - No of Car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$APERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5538 1.0000 6.0000
prop.table(table(train$APERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 1 0.4256269323 0.0388182755
## 2 0.0319477843 0.0065269667
## 3 0.0020611474 0.0000000000
## 4 0.0010305737 0.0000000000
## 6 0.0003435246 0.0000000000
prop.table(table(train$APERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 1 0.91642012 0.08357988
## 2 0.83035714 0.16964286
## 3 1.00000000 0.00000000
## 4 1.00000000 0.00000000
## 6 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will not Purchase
test2$Purchase[test2$APERSAUT] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm10 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm10
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm10)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
#############################################
# Strategy 12 - Contribution of car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PPERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 5.000 2.944 6.000 8.000
prop.table(table(train$PPERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 4 0.0003435246 0.0000000000
## 5 0.0979045002 0.0027481965
## 6 0.3558914462 0.0425970457
## 7 0.0065269667 0.0000000000
## 8 0.0003435246 0.0000000000
prop.table(table(train$PPERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 4 1.00000000 0.00000000
## 5 0.97269625 0.02730375
## 6 0.89310345 0.10689655
## 7 1.00000000 0.00000000
## 8 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will not Purchase
test2$Purchase[test2$PPERSAUT] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm12 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm12
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm12)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
# Strategy 13a - Contribution to car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PPERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 5.000 2.944 6.000 8.000
prop.table(table(train$PPERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 4 0.0003435246 0.0000000000
## 5 0.0979045002 0.0027481965
## 6 0.3558914462 0.0425970457
## 7 0.0065269667 0.0000000000
## 8 0.0003435246 0.0000000000
prop.table(table(train$PPERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 4 1.00000000 0.00000000
## 5 0.97269625 0.02730375
## 6 0.89310345 0.10689655
## 7 1.00000000 0.00000000
## 8 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase if they are 0,5,6 class
test2$Purchase[test2$PPERSAUT==0&5&6] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm13a = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm13a
## FP TP TN FN
## 144 1359 38 1370
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm13a)
## acc sens spec ppv npv lor
## 0.47990381 0.49798461 0.20879121 0.90419162 0.02698864 -1.34028874
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.5200962
# Strategy 13b - Contribution to car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PPERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 5.000 2.944 6.000 8.000
prop.table(table(train$PPERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 4 0.0003435246 0.0000000000
## 5 0.0979045002 0.0027481965
## 6 0.3558914462 0.0425970457
## 7 0.0065269667 0.0000000000
## 8 0.0003435246 0.0000000000
prop.table(table(train$PPERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 4 1.00000000 0.00000000
## 5 0.97269625 0.02730375
## 6 0.89310345 0.10689655
## 7 1.00000000 0.00000000
## 8 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customers in contribution class 0, 5 or 6 will not Purchase
test2$Purchase[test2$PPERSAUT==0&5&6] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm13b = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm13b
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm13b)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
#############
# Rpart basic tree # R part for Contribution to car policy, NO. of car policy,contribution to fire policy ## NO. of fire policy
## STUDY OF each variables ##purchase
table(train$Purchase)
##
## 0 1
## 2745 166
##No. of car policy
table(train$APERSAUT)
##
## 0 1 2 3 4 6
## 1437 1352 112 6 3 1
### RpartBasic tree # R part for Number of car policy vs Purchase
fit <- rpart(Purchase~ APERSAUT, data=train)
fit$variable.importance
## APERSAUT
## 3.159186
fancyRpartPlot(fit)
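The fitted tree can also be turned into test-set predictions rather than just plotted. A minimal sketch (the 0.5 cut-off is an arbitrary choice; since Purchase is numeric 0/1 here, rpart fits a regression tree and predict returns estimated purchase probabilities):
prob <- predict(fit, newdata = test)   # estimated purchase probability per customer
pred <- as.integer(prob > 0.5)         # arbitrary 0.5 cut-off for a class prediction
table(Predicted = pred, Actual = test$Purchase)
mean(pred != test$Purchase)            # classification error of the tree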
# Strategy 14 - No. of Car Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$APERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5538 1.0000 6.0000
prop.table(table(train$APERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 1 0.4256269323 0.0388182755
## 2 0.0319477843 0.0065269667
## 3 0.0020611474 0.0000000000
## 4 0.0010305737 0.0000000000
## 6 0.0003435246 0.0000000000
prop.table(table(train$APERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 1 0.91642012 0.08357988
## 2 0.83035714 0.16964286
## 3 1.00000000 0.00000000
## 4 1.00000000 0.00000000
## 6 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that all the number of car policy will Purchase
test2$Purchase[test2$APERSAUT] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm14 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm14
## FP TP TN FN
## 182 2724 0 5
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm14)
## acc sens spec ppv npv lor
## 0.9357609 0.9981678 0.0000000 0.9373710 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06423909
# Strategy 15 - No. of Car Policy
#### Strategy 16
# Reset the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$APERSAUT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5538 1.0000 6.0000
prop.table(table(train$APERSAUT, train$Purchase))
##
## 0 1
## 0 0.4819649605 0.0116798351
## 1 0.4256269323 0.0388182755
## 2 0.0319477843 0.0065269667
## 3 0.0020611474 0.0000000000
## 4 0.0010305737 0.0000000000
## 6 0.0003435246 0.0000000000
prop.table(table(train$APERSAUT, train$Purchase), 1)
##
## 0 1
## 0 0.97633960 0.02366040
## 1 0.91642012 0.08357988
## 2 0.83035714 0.16964286
## 3 1.00000000 0.00000000
## 4 1.00000000 0.00000000
## 6 1.00000000 0.00000000
################################## Compare with base model
# Updating the prediction to say that Customers with 2 car policies will Purchase
test2$Purchase[test2$APERSAUT==2] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm16 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm16
## FP TP TN FN
## 115 19 2614 163
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm16)
## acc sens spec ppv npv lor
## 0.9045002 0.1043956 0.9578600 0.1417910 0.9413036 0.9743935
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.09549983
library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
## Rpart basic tree # R part for contribution to fire policy
## STUDY OF each variables ##purchase
table(train$Purchase)
##
## 0 1
## 2745 166
##contribution to fire policy
table(train$PBRAND)
##
## 0 1 2 3 4 5 6 7
## 1344 78 265 471 606 69 73 5
######################- Rpart Basic tree # R part for contribution of fire policy vs Purchase
fit <- rpart(Purchase~ PBRAND, data=train)
fit$variable.importance
## PBRAND
## 1.61874
fancyRpartPlot(fit)
# Strategy 17 - Contribution to Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PBRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 1.808 4.000 7.000
prop.table(table(train$PBRAND, train$Purchase))
##
## 0 1
## 0 0.442116111 0.019580900
## 1 0.025764342 0.001030574
## 2 0.090003435 0.001030574
## 3 0.151494332 0.010305737
## 4 0.184472690 0.023703195
## 5 0.022329097 0.001374098
## 6 0.025077293 0.000000000
## 7 0.001717623 0.000000000
prop.table(table(train$PBRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.96153846 0.03846154
## 2 0.98867925 0.01132075
## 3 0.93630573 0.06369427
## 4 0.88613861 0.11386139
## 5 0.94202899 0.05797101
## 6 1.00000000 0.00000000
## 7 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$PBRAND] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm17 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm17
## FP TP TN FN
## 182 2721 0 8
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm17)
## acc sens spec ppv npv lor
## 0.9347303 0.9970685 0.0000000 0.9373062 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06526967
# Strategy 18 - Contribution to Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PBRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 1.808 4.000 7.000
prop.table(table(train$PBRAND, train$Purchase))
##
## 0 1
## 0 0.442116111 0.019580900
## 1 0.025764342 0.001030574
## 2 0.090003435 0.001030574
## 3 0.151494332 0.010305737
## 4 0.184472690 0.023703195
## 5 0.022329097 0.001374098
## 6 0.025077293 0.000000000
## 7 0.001717623 0.000000000
prop.table(table(train$PBRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.96153846 0.03846154
## 2 0.98867925 0.01132075
## 3 0.93630573 0.06369427
## 4 0.88613861 0.11386139
## 5 0.94202899 0.05797101
## 6 1.00000000 0.00000000
## 7 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will not Purchase
test2$Purchase[test2$PBRAND] <- 0
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm18 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm18
## FP TP TN FN
## 0 0 2729 182
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm18)
## acc sens spec ppv npv lor
## 0.9374785 0.0000000 1.0000000 NaN 0.9374785 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06252147
library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
table(test$Purchase)
##
## 0 1
## 2729 182
table(train$Purchase)
##
## 0 1
## 2745 166
test2$Purchase <- rep(0, nrow(test2))
table(test2$Purchase)
##
## 0
## 2911
# Rpart basic tree - rpart for Contribution to Car Policy, Number of Car Policies, Contribution to Fire Policy and Number of Fire Policies
## Study of each variable
## Purchase
table(train$Purchase)
##
## 0 1
## 2745 166
length(train$Purchase)
## [1] 2911
##Contribution to car policy
table(train$PPERSAUT)
##
## 0 4 5 6 7 8
## 1437 1 293 1160 19 1
##No. of car policy
table(train$APERSAUT)
##
## 0 1 2 3 4 6
## 1437 1352 112 6 3 1
##contribution to fire policy
table(train$PBRAND)
##
## 0 1 2 3 4 5 6 7
## 1344 78 265 471 606 69 73 5
##No of fire policy
table(train$ABRAND)
##
## 0 1 2 4
## 1344 1502 64 1
fit <- rpart(Purchase~ ABRAND, data=train)
fit$variable.importance
## NULL
fancyRpartPlot(fit)
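The NULL variable importance above suggests that the ABRAND-only tree never split at the default complexity parameter. A minimal check (an illustrative sketch, not in the original report):
# Sketch: a root-only rpart tree has a single row in fit$frame and no splits
nrow(fit$frame)   # 1 would indicate no splits, which is consistent with the NULL importance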
# Strategy 19 - Number of Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$ABRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5613 1.0000 4.0000
prop.table(table(train$ABRAND, train$Purchase))
##
## 0 1
## 0 0.4421161113 0.0195809000
## 1 0.4802473377 0.0357265544
## 2 0.0202679492 0.0017176228
## 4 0.0003435246 0.0000000000
prop.table(table(train$ABRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.93075899 0.06924101
## 2 0.92187500 0.07812500
## 4 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$ABRAND] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm19 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm19
## FP TP TN FN
## 182 2723 0 6
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm19)
## acc sens spec ppv npv lor
## 0.9354174 0.9978014 0.0000000 0.9373494 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06458262
# Strategy 20 - Number of Fire Policy
# Strategy 21 - Number of Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$ABRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5613 1.0000 4.0000
prop.table(table(train$ABRAND, train$Purchase))
##
## 0 1
## 0 0.4421161113 0.0195809000
## 1 0.4802473377 0.0357265544
## 2 0.0202679492 0.0017176228
## 4 0.0003435246 0.0000000000
prop.table(table(train$ABRAND, train$Purchase), 1)
##
## 0 1
## 0 0.95758929 0.04241071
## 1 0.93075899 0.06924101
## 2 0.92187500 0.07812500
## 4 1.00000000 0.00000000
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$ABRAND==2] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm21 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm21
## FP TP TN FN
## 180 2669 2 60
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm21)
## acc sens spec ppv npv lor
## 0.91755411 0.97801392 0.01098901 0.93681994 0.03225806 -0.70469508
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.08244589
library(crossval)
library(gplots)
library(vcd)
library(Metrics)
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
# Rpart basic tree - rpart for Contribution to Car Policy, Number of Car Policies, Contribution to Fire Policy and Number of Fire Policies
## Study of each variable
## Purchase
table(train$Purchase)
##
## 0 1
## 2745 166
length(train$Purchase)
## [1] 2911
##Contribution to car policy
table(train$PPERSAUT)
##
## 0 4 5 6 7 8
## 1437 1 293 1160 19 1
##NO. of car policy
table(train$APERSAUT)
##
## 0 1 2 3 4 6
## 1437 1352 112 6 3 1
##contribution to fire policy
table(train$PBRAND)
##
## 0 1 2 3 4 5 6 7
## 1344 78 265 471 606 69 73 5
##NO. of fire policy
table(train$ABRAND)
##
## 0 1 2 4
## 1344 1502 64 1
### Fancy rpart plot
fit <- rpart(Purchase~ PPERSAUT+APERSAUT+PBRAND+ABRAND, data=train)
fit$variable.importance
## PPERSAUT APERSAUT PBRAND ABRAND
## 4.589962 3.618357 2.100354 1.615168
fancyRpartPlot(fit)
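Unlike the hand-built threshold rules, the fitted rpart model above is not scored on the test set in this report. A minimal sketch of how that could be done, assuming an arbitrary cutoff of 0.10 on the predicted purchase rate (the cutoff is an assumption, not from the report):
# Sketch: score the test set with the fitted anova tree and flag likely purchasers
pred.rate  <- predict(fit, newdata = test)   # leaf-wise mean of Purchase
rpart.flag <- as.integer(pred.rate > 0.10)   # hypothetical 10% cutoff
table(rpart.flag, test$Purchase)             # compare flags against actual outcomes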
# Strategy 22 - Contribution to Car Policy + Number of Car Policy, Contribution to Fire Policy + Number of Fire Policy
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
summary(train$PPERSAUT + train$APERSAUT + train$PBRAND + train$ABRAND)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.000 6.000 5.867 10.000 22.000
prop.table(table(train$PPERSAUT + train$APERSAUT + train$PBRAND + train$ABRAND, train$Purchase))
##
## 0 1
## 0 0.2229474407 0.0072140158
## 2 0.0175197527 0.0000000000
## 3 0.0522157334 0.0000000000
## 4 0.0841635177 0.0017176228
## 5 0.0690484370 0.0027481965
## 6 0.0786671247 0.0006870491
## 7 0.1638612161 0.0109927860
## 8 0.0133974579 0.0010305737
## 9 0.0178632772 0.0006870491
## 10 0.0360700790 0.0017176228
## 11 0.0632085194 0.0072140158
## 12 0.0845070423 0.0171762281
## 13 0.0216420474 0.0048093439
## 14 0.0130539334 0.0006870491
## 15 0.0030917211 0.0003435246
## 16 0.0010305737 0.0000000000
## 17 0.0003435246 0.0000000000
## 22 0.0003435246 0.0000000000
prop.table(table(train$PPERSAUT + train$APERSAUT + train$PBRAND + train$ABRAND, train$Purchase), 1)
##
## 0 1
## 0 0.968656716 0.031343284
## 2 1.000000000 0.000000000
## 3 1.000000000 0.000000000
## 4 0.980000000 0.020000000
## 5 0.961722488 0.038277512
## 6 0.991341991 0.008658009
## 7 0.937131631 0.062868369
## 8 0.928571429 0.071428571
## 9 0.962962963 0.037037037
## 10 0.954545455 0.045454545
## 11 0.897560976 0.102439024
## 12 0.831081081 0.168918919
## 13 0.818181818 0.181818182
## 14 0.950000000 0.050000000
## 15 0.900000000 0.100000000
## 16 1.000000000 0.000000000
## 17 1.000000000 0.000000000
## 22 1.000000000 0.000000000
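The row-wise table shows that the combined score is most predictive around values 11-13. As an illustrative sketch (not part of the original analysis), the score levels can be ranked by their observed purchase rate:
# Sketch: rank combined-score levels by observed purchase rate
score <- train$PPERSAUT + train$APERSAUT + train$PBRAND + train$ABRAND
rate  <- prop.table(table(score, train$Purchase), 1)[, "1"]
head(sort(rate, decreasing = TRUE))   # scores 13 and 12 show the highest rates, followed by 11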
################################## Comparing with base model
# Updating the prediction to say that Customer will Purchase
test2$Purchase[test2$PPERSAUT + test2$APERSAUT + test2$PBRAND + test2$ABRAND] <- 1
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm22 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Purchased")
cm22
## FP TP TN FN
## 182 2711 0 18
## attr(,"negative")
## [1] "Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm22)
## acc sens spec ppv npv lor
## 0.9312951 0.9934042 0.0000000 0.9370895 0.0000000 -Inf
## attr(,"negative")
## [1] "Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06870491
# Strategy 23 - Contribution to Car Policy + Number of Car Policy, Contribution to Fire Policy + Number of Fire Policy
# Resetting the original training and test data - just to be sure
# Strategy 24 - C50 trees (rules) basic
# Now we look at the C50 algorithm by taking the data as is.
# Read the Caravan data from Caravan2.csv
Caravan.ori <-Caravan2
set.seed(11)
train <- Caravan.ori[sample(row.names(Caravan.ori), size = round(nrow(Caravan.ori)*0.7)), ]
test <- Caravan.ori[!(row.names(Caravan.ori) %in% row.names(train)), ]
# Creating backup of test and train data for later use. Not modifying .ori files as a rule
train.ori <-train
test.ori<-test
train2<-train
test2<-test
library(crossval)
library(gplots)
library(vcd)
library(Metrics)
library(C50)
# Resetting the original training and test data - just to be sure
train <- train.ori
test <- test.ori
test2 <-test
# Also resetting the test2 data with no one purchased ZeroR strategy
test2$Purchase <- rep(0, nrow(test2))
combinedData1 <- Caravan.ori[,-7]
combinedData2 <- combinedData1[,-6]
combinedData <- combinedData2[,-5]
combinedData$Purchase <- factor(combinedData$Purchase)
set.seed(11)
train <- combinedData[sample(row.names(combinedData), size = round(nrow(combinedData)*0.7)), ]
test <- combinedData[!(row.names(combinedData) %in% row.names(train)), ]
C50.Rules <- C5.0(Purchase~PPLEZIER+PBYSTAND+APLEZIER+ABYSTAND, data=train, rules = FALSE)
Prediction <- predict(C50.Rules,test)
################################## Comparing with base model
# Updating the prediction with our model output
test2$Purchase <- Prediction
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm24 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm24
## FP TP TN FN
## 0 0 1638 109
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm24)
## acc sens spec ppv npv lor
## 0.9376073 0.0000000 1.0000000 NaN 0.9376073 NaN
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0.06239267
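The C5.0 rules model above predicts "Not Purchased" for every test customer (TP = FP = 0). To see why, the fitted model can be inspected; a minimal sketch:
# Sketch: inspect the fitted C5.0 model; a single-leaf tree would explain
# why every customer is predicted as a non-purchaser
summary(C50.Rules)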
# Strategy 25 - Tree model of C50
C50.Tree <- C5.0(train[,-86],train$Purchase)
Prediction <- predict(C50.Tree,test)
################################## Comparing with base model
# Updating the prediction with our model output
test2$Purchase <- Prediction
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm25 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm25
## FP TP TN FN
## 0 109 1638 0
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm25)
## acc sens spec ppv npv lor
## 1 1 1 1 1 Inf
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0
# Strategy 26 - Tree model of C50 with min of 2 items in tree edges
C50.Tree.small <- C5.0(train[,-3],train$Purchase,
control = C5.0Control(minCases = 2))
Prediction <- predict(C50.Tree.small,test)
################################## Comparing with base model
# Updating the prediction with our model output
test2$Purchase <- Prediction
Our.Prediction=factor(as.factor(test2$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
Actual.Outcome=factor(as.factor(test$Purchase), c(0, 1), labels = c("Not Purchased", "Purchased"))
# Confusion matrix # cm(actual,predicted)
cm26 = confusionMatrix(Actual.Outcome,Our.Prediction, negative = "Not Purchased")
cm26
## FP TP TN FN
## 0 109 1638 0
## attr(,"negative")
## [1] "Not Purchased"
# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm26)
## acc sens spec ppv npv lor
## 1 1 1 1 1 Inf
## attr(,"negative")
## [1] "Not Purchased"
# Computing the classification error
ce(Actual.Outcome,Our.Prediction)
## [1] 0
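The perfect scores in Strategies 25 and 26 should be read with caution: after three columns are dropped to build combinedData, the data frame has 83 columns, so neither train[,-86] nor train[,-3] removes the Purchase column, and the target most likely remains among the predictors (target leakage). A minimal check, under that assumption:
# Sketch: verify whether the target column survives the column drops
ncol(train)                          # 83 after removing columns 5-7, so there is no column 86
which(names(train) == "Purchase")    # if this is neither 86 nor 3, Purchase stays in the predictor set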
library(effects)
data(Caravan)
Caravan$Purchase <- as.factor(Caravan$Purchase)
Caravan.mod <- glm(Purchase ~ MOSHOOFD + MSKB1 ,
family=binomial, data=Caravan)
summary(Caravan.mod)
##
## Call:
## glm(formula = Purchase ~ MOSHOOFD + MSKB1, family = binomial,
## data = Caravan)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5088 -0.3954 -0.3211 -0.2993 2.5560
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.31925 0.14247 -16.279 < 2e-16 ***
## MOSHOOFD -0.09567 0.01944 -4.922 8.58e-07 ***
## MSKB1 0.04840 0.04057 1.193 0.233
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2635.5 on 5821 degrees of freedom
## Residual deviance: 2606.7 on 5819 degrees of freedom
## AIC: 2612.7
##
## Number of Fisher Scoring iterations: 5
GLM.1 <- glm(Purchase ~ MOSHOOFD + MSKB1,
family=binomial(logit), data=Caravan)
# All effect plots
plot(allEffects(GLM.1))
# Logistics Regression Model
logr_vm <- glm(Purchase ~ MOSHOOFD, data=Caravan, family=binomial)
logr_vm <- glm(Purchase ~ MOSHOOFD, data=Caravan, family=binomial(link="logit"))
# Print information about the model
logr_vm
##
## Call: glm(formula = Purchase ~ MOSHOOFD, family = binomial(link = "logit"),
## data = Caravan)
##
## Coefficients:
## (Intercept) MOSHOOFD
## -2.21490 -0.09996
##
## Degrees of Freedom: 5821 Total (i.e. Null); 5820 Residual
## Null Deviance: 2636
## Residual Deviance: 2608 AIC: 2612
summary(logr_vm)
##
## Call:
## glm(formula = Purchase ~ MOSHOOFD, family = binomial(link = "logit"),
## data = Caravan)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4340 -0.3944 -0.3250 -0.3095 2.5510
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.21490 0.11101 -19.953 < 2e-16 ***
## MOSHOOFD -0.09996 0.01908 -5.239 1.61e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2635.5 on 5821 degrees of freedom
## Residual deviance: 2608.1 on 5820 degrees of freedom
## AIC: 2612.1
##
## Number of Fisher Scoring iterations: 5
logr_va <- glm(Purchase ~ MSKB1, data=Caravan, family=binomial)
logr_va
##
## Call: glm(formula = Purchase ~ MSKB1, family = binomial, data = Caravan)
##
## Coefficients:
## (Intercept) MSKB1
## -2.89979 0.08605
##
## Degrees of Freedom: 5821 Total (i.e. Null); 5820 Residual
## Null Deviance: 2636
## Residual Deviance: 2631 AIC: 2635
summary(logr_va)
##
## Call:
## glm(formula = Purchase ~ MSKB1, family = binomial, data = Caravan)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4749 -0.3559 -0.3413 -0.3273 2.4304
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.89979 0.08805 -32.932 <2e-16 ***
## MSKB1 0.08605 0.03921 2.195 0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2635.5 on 5821 degrees of freedom
## Residual deviance: 2630.9 on 5820 degrees of freedom
## AIC: 2634.9
##
## Number of Fisher Scoring iterations: 5
# Multiple predictors with interactions
logr_vmai <- glm(Purchase ~ MOSHOOFD*MSKB1 , data=Caravan, family=binomial)
logr_vmai <- glm(Purchase ~ MOSHOOFD + MSKB1 + MOSHOOFD:MSKB1, data=Caravan, family=binomial)
logr_vmai
##
## Call: glm(formula = Purchase ~ MOSHOOFD + MSKB1 + MOSHOOFD:MSKB1, family = binomial,
## data = Caravan)
##
## Coefficients:
## (Intercept) MOSHOOFD MSKB1 MOSHOOFD:MSKB1
## -2.42610 -0.07504 0.10703 -0.01250
##
## Degrees of Freedom: 5821 Total (i.e. Null); 5818 Residual
## Null Deviance: 2636
## Residual Deviance: 2606 AIC: 2614
summary(logr_vmai)
##
## Call:
## glm(formula = Purchase ~ MOSHOOFD + MSKB1 + MOSHOOFD:MSKB1, family = binomial,
## data = Caravan)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5927 -0.3949 -0.3223 -0.3077 2.5637
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.42610 0.18982 -12.781 <2e-16 ***
## MOSHOOFD -0.07504 0.03072 -2.442 0.0146 *
## MSKB1 0.10703 0.07828 1.367 0.1715
## MOSHOOFD:MSKB1 -0.01250 0.01442 -0.867 0.3861
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2635.5 on 5821 degrees of freedom
## Residual deviance: 2605.9 on 5818 degrees of freedom
## AIC: 2613.9
##
## Number of Fisher Scoring iterations: 5
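Since the logistic-regression coefficients above are on the log-odds scale, a small follow-up sketch (not part of the original report) converts them to odds ratios, which are often easier to interpret:
# Sketch: coefficients on the odds-ratio scale
exp(coef(logr_vm))     # each one-unit increase in MOSHOOFD multiplies the odds of purchase by about 0.90
exp(coef(logr_vmai))   # same transformation for the interaction model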