ABSTRACT
This dataset is owned and supplied by the Dutch data mining company Sentient Machine Research, and is based on real-world business data. The data consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. This dataset was used in the CoIL Challenge 2000 data mining competition.
SOURCES
Original Owner and Donor
Peter van der Putten
Sentient Machine Research
Baarsjesweg 224
1058 AA Amsterdam
The Netherlands
+31 20 6186927
pvdputten@hotmail.com, putten@liacs.nl
TIC Benchmark Homepage
Date Donated: March 7, 2000
OBJECTIVE
Can you predict who would be interested in buying a caravan insurance policy, and explain why?
Prediction
For the prediction task, the underlying problem is to find the subset of customers whose probability of having a caravan insurance policy is above some boundary probability. The known policyholders can then be removed, and the rest receive a mailing. The boundary depends on costs and benefits, such as the cost of mailing and the benefit of selling insurance policies. To approximate this problem, we want you to find the set of 800 customers in the test set that contains the most caravan policy owners.
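The selection step described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `top_n_customers` is a hypothetical helper, and the probabilities and labels are made-up toy values, not output from any model fitted to this dataset.

```r
# Hypothetical helper: return the row indices of the n customers
# with the highest predicted purchase probability.
top_n_customers <- function(prob, n = 800) {
  n <- min(n, length(prob))
  order(prob, decreasing = TRUE)[seq_len(n)]
}

# Toy predicted probabilities for six customers (illustration only).
prob <- c(0.02, 0.40, 0.10, 0.75, 0.05, 0.33)
selected <- top_n_customers(prob, n = 3)
print(selected)

# Toy true labels: count how many selected customers own a policy.
owner <- c("No", "Yes", "No", "Yes", "No", "No")
hits <- sum(owner[selected] == "Yes")
print(hits)
```

In the challenge setting, `prob` would come from whatever model is fitted on the training data, `n` would stay at 800, and `hits` would be the score being maximised.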
SUMMARY OF DATASET
NO. OF OBSERVATIONS: 5822
NO. OF VARIABLES: 86
Of the 86 variables, variables 1-43 contain socio-demographic data and variables 44-86 describe product ownership. The socio-demographic data is derived from zip codes: all customers living in areas with the same zip code have the same socio-demographic attributes.
library(ISLR)
## PIE CHART FOR YES/ NO
## Purchase of caravan policy
a <- Caravan$Purchase
carv.data <- table(a)
carv.data
## a
##   No  Yes
## 5474  348
# One colour per class: black for "No", red for "Yes"
colors <- c("black", "red")
pie(carv.data, main = "Customers of caravan policy",
    col = colors)
box()
OBSERVATION FOR PIE CHART
The insurance company holds 5822 customer records. Of these, 348 customers (about 6%) bought a caravan policy and 5474 did not.
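The class imbalance behind this observation can be checked from the counts reported above. The sketch below re-enters the two counts by hand so that it runs without the ISLR package; with the package loaded, `table(Caravan$Purchase)` could be used in place of `counts`.

```r
# Purchase counts as reported in the table output above.
counts <- c(No = 5474, Yes = 348)

# Total number of customer records.
total <- sum(counts)

# Percentage of each class, rounded to two decimals.
share <- round(100 * prop.table(counts), 2)
print(total)
print(share)
```

Because buyers are such a small minority, a model that always predicts "No" would already be right for most customers, so plain accuracy is not a very informative measure for this task.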
## BAR CHART FOR CUSTOMER SUBTYPE 1
CUSTOMER SUBTYPE 1 (L1)
# Count buyers (Purchase == "Yes") per customer subtype (MOSTYPE)
a <- table(Caravan$MOSTYPE[Caravan$Purchase == "Yes"])
barplot(a,
        border = "darkblue",
        main = "BAR PLOT FOR CUSTOMER SUBTYPE 1",
        xlab = "customer subtype 1",
        ylab = "No. of customers")
box()
OBSERVATION FOR CUSTOMER SUBTYPE 1 (L1)
The bar plot above shows customer subtype L1 (MOSTYPE), which has 41 labels, for the customers who bought a caravan policy. The largest numbers of buyers belong to subtype 8 (Middle class families) and subtype 33 (Lower class large families).
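Raw buyer counts favour subtypes that simply contain many customers, so it may also help to look at the purchase rate within each subtype. The sketch below is one way to do that; `purchase_rate` is a hypothetical helper, and the toy vectors are made up for illustration rather than taken from the dataset.

```r
# Hypothetical helper: share of "Yes" purchases within each group.
purchase_rate <- function(group, purchase) {
  tab <- table(group, purchase)
  tab[, "Yes"] / rowSums(tab)
}

# Toy data (illustration only): subtype 8 has more buyers,
# but subtype 33 has the higher purchase rate.
group <- c(rep(8, 6), rep(33, 2), rep(1, 2))
purchase <- factor(
  c("Yes", "Yes", "No", "No", "No", "No",  # subtype 8
    "Yes", "No",                           # subtype 33
    "No", "No"),                           # subtype 1
  levels = c("No", "Yes")
)
rates <- purchase_rate(group, purchase)
print(round(rates, 2))

# With ISLR loaded, the same helper could be applied as:
# purchase_rate(Caravan$MOSTYPE, Caravan$Purchase)
```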
PIE CHART FOR CUSTOMER SUBTYPE 2
CUSTOMER SUBTYPE 2 (L2)
b <- table(Caravan$MOSHOOFD[Caravan$Purchase == "Yes"])
b
##
##  1  2  3  5  6  7  8  9 10
## 48 66 59 15  4 20 89 42  5
pie(b,
    col = rainbow(12),
    main = "Pie chart for customer subtype 2")
box()
OBSERVATION FOR CUSTOMER SUBTYPE 2
The pie chart above shows customer subtype L2 (MOSHOOFD), which has 10 labels, for the customers who bought a caravan policy. Subtype 8 (Family with grown ups, 89 buyers) and subtype 2 (Driven Growers, 66 buyers) account for the most caravan policy purchases. No buyer falls in subtype 4.
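To make the ranking explicit, the buyer counts printed above can be sorted and converted to percentages. The sketch re-enters the counts from the table output so that it runs on its own; the variable names `buyers`, `ranked` and `share` are chosen here for illustration.

```r
# Buyer counts per L2 subtype, copied from the table output above.
buyers <- c("1" = 48, "2" = 66, "3" = 59, "5" = 15, "6" = 4,
            "7" = 20, "8" = 89, "9" = 42, "10" = 5)

# Sort subtypes from most to fewest buyers.
ranked <- sort(buyers, decreasing = TRUE)

# Percentage of all buyers in each subtype, rounded to one decimal.
share <- round(100 * ranked / sum(ranked), 1)
print(head(share, 3))
```

These are shares of buyers only; they do not say how likely a customer in a given subtype is to buy, which would require the subtype sizes across all 5822 records.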
AGE GROUP
# Count buyers (Purchase == "Yes") per age group (MGEMLEEF)
a <- table(Caravan$MGEMLEEF[Caravan$Purchase == "Yes"])
barplot(a,
        col = rainbow(6),
        main = "THE AGE GROUP",
        xlab = "age group",
        ylab = "No. of customers")
box()
OBSERVATION FOR AGE GROUP
The bar plot above shows the six age groups (MGEMLEEF), covering ages 20 to 80, for the customers who bought a caravan policy. Most buyers belong to age group 3 (40-50 years).
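The numeric age codes can be turned into readable labels before plotting. The sketch below assumes each of the six codes spans ten years, which is consistent with code 3 meaning 40-50 and the overall range of 20 to 80 stated above; `age_labels` and `label_age` are hypothetical names, and the toy codes are made up.

```r
# Assumed mapping: codes 1..6 are ten-year bands from 20 to 80.
age_labels <- c("20-30", "30-40", "40-50", "50-60", "60-70", "70-80")

# Hypothetical helper: convert numeric age codes to a labelled factor.
label_age <- function(code) {
  factor(code, levels = 1:6, labels = age_labels)
}

# Toy age codes (illustration only).
codes <- c(3, 2, 3, 4, 1, 3)
tab <- table(label_age(codes))
print(tab)

# With ISLR loaded, the bar plot could use labelled groups:
# barplot(table(label_age(Caravan$MGEMLEEF[Caravan$Purchase == "Yes"])))
```

Using a factor with fixed levels keeps empty age bands in the table, so the bars always appear in the same order.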
Ref links:
```
http://kdd.ics.uci.edu/databases/tic/tic.task.html
http://www.liacs.nl/~putten/library/cc2000/Preface.html
http://www.liacs.nl/~putten/library/cc2000/ELKANP~1.pdf
http://www.liacs.nl/~putten/library/cc2000/data.html
http://www.liacs.nl/~putten/library/cc2000/report2.html
http://mlr.cs.umass.edu/ml/datasets/Insurance+Company+Benchmark+%28COIL+2