Introduction

This data set comes from a direct marketing case in the insurance sector. The task is to predict who would be interested in buying a caravan insurance policy and to explain why. The data set was used in the second edition of the Computational Intelligence and Learning (CoIL) Challenge in 2000, organized by the CoIL cluster, a cooperation between four EU-funded Networks of Excellence representing neural networks (NeuroNet), fuzzy systems (ERUDIT), evolutionary computing (EvoNet) and machine learning (MLNet). It is based on a real-world business problem and is owned and donated by Peter van der Putten of the Dutch data mining company Sentient Machine Research (Baarsjesweg 224, 1058 AA Amsterdam, The Netherlands; +31 20 6186927; putten@liacs.nl). It was donated on March 7, 2000; see the TIC (The Insurance Company) Benchmark homepage (http://www.liacs.nl/~putten/library/cc2000).

Relevant Papers

P. van der Putten and M. van Someren (eds). CoIL Challenge 2000: The Insurance Company Case. Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000-09. June 22, 2000.

SUMMARY ABOUT DATASET

NO OF OBSERVATIONS: 5822 real customer records

NO OF VARIABLES: 86

Each real customer record consists of 86 variables, containing sociodemographic data (variables 1-43) and product ownership data (variables 44-86). The sociodemographic data is derived from zip codes. All customers living in areas with the same zip code have the same sociodemographic attributes. Variable 86 (Purchase), “CARAVAN: Number of mobile home policies”, is the target variable which indicates whether the customer purchased a caravan insurance policy or not.
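
The column layout described above can be sketched in R as follows. This is only an illustration: the data frame `customers` is a hypothetical stand-in filled with placeholder zeros, not the real records, and the names `socio`, `products` and `target` are assumptions introduced here.

```r
# Hypothetical stand-in with the same layout as the data set: 86 columns,
# filled with placeholder zeros (NOT real customer records).
customers <- as.data.frame(matrix(0L, nrow = 5, ncol = 86))

# Variables 1-43: sociodemographic attributes derived from the zip code.
socio <- customers[, 1:43]
# Variables 44-85: product ownership, excluding the target.
products <- customers[, 44:85]
# Variable 86: the target (caravan policy ownership).
target <- customers[[86]]

ncol(socio)
ncol(products)
length(target)
```

Keeping the target out of the predictor blocks avoids accidentally using variable 86 as an input when fitting a model.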

TASK

Predict which customers are potentially interested in a caravan insurance policy (a classification task with a Yes/No target).

PREDICTION TASK

The goal is to predict whether a customer is interested in a caravan insurance policy from other data about the customer. Information about customers consists of 86 variables and includes product usage data and sociodemographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including whether or not they have a caravan insurance policy. The test set contains 4000 customers. In the prediction task, the underlying problem is to find the subset of customers whose probability of having a caravan insurance policy lies above some boundary probability. The known policyholders can then be removed and the rest receive a mailing. The boundary depends on costs and benefits, such as the cost of mailing and the benefit of selling insurance policies. To approximate this problem, we want to find the set of 800 customers in the test set of 4000 customers that contains the most caravan policy owners. For each solution submitted, the number of actual policyholders is counted, and this count is the score of the solution.
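
The scoring rule described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the competition's own evaluation code: `pred_prob` and `owns_policy` are simulated stand-ins for a model's predicted probabilities and the true ownership flags, and every variable name is introduced here for the example.

```r
# Hypothetical stand-in for the 4000-customer test set: simulated scores and
# outcomes, NOT the real CoIL data.
set.seed(1)
n_test <- 4000
# Assumed predicted probability of owning a caravan policy, from any model.
pred_prob <- runif(n_test)
# Simulated true ownership, made more likely for higher predicted scores.
owns_policy <- rbinom(n_test, size = 1, prob = 0.12 * pred_prob)

# Select the 800 customers with the highest predicted probability.
n_select <- 800
selected <- order(pred_prob, decreasing = TRUE)[seq_len(n_select)]

# The score is the number of actual policyholders among the selected customers.
score <- sum(owns_policy[selected])

# Baseline: expected number of policyholders in a random selection of 800.
baseline <- n_select * mean(owns_policy)

score
baseline
```

Comparing `score` with `baseline` shows how much the ranking improves on mailing 800 customers chosen at random.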

library(ISLR)

## PIE CHART FOR YES/ NO

## Purchase of caravan policy

a=(Caravan$Purchase)
carv.data=table(a)
carv.data
## a
##   No  Yes 
## 5474  348
colors=c("black","red")  # slice colors for No / Yes
pie(carv.data,main = "customers of caravan policy",
    col=colors)
box()

OBSERVATION FOR PIE CHART

The pie chart shows the split between customers who did and did not purchase a caravan policy. Of the 5822 customers, 348 (about 6%) purchased the policy and 5474 did not.
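
The percentages behind the pie chart can be checked with a short sketch. The counts are copied from the table output above; the names `carv.counts` and `carv.pct` are introduced here for illustration only.

```r
# Counts taken from the table output above.
carv.counts <- c(No = 5474, Yes = 348)

# Convert the counts to percentages of all 5822 customers.
carv.pct <- round(100 * prop.table(carv.counts), 1)
carv.pct
```

The strong imbalance between the two classes is worth keeping in mind when judging any model on this data.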

## BARPLOT

customer subtype 2

a<-table(Caravan$MOSTYPE[Caravan$Purchase=="Yes"])
barplot(a,
        main = "caravan policy customers ",
        xlab=" customer subtype",
        ylab="purchase of caravan policy")
box()

OBSERVATION OF CUSTOMER TYPE

The barplot above shows the number of caravan policy buyers for each customer subtype (MOSTYPE, which has 41 labels). Subtype 8 (Middle class families) and subtype 33 (lower class with large families) account for the most purchases.
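
A raw count of buyers also reflects how large each subtype is, so it can help to look at the share of buyers within each subtype as well. The sketch below illustrates the difference on hypothetical toy data; the vectors `subtype` and `purchase` are made up for the example and are not taken from the Caravan records.

```r
# Hypothetical toy data (NOT the real Caravan records): a subtype label and a
# purchase flag for 12 customers.
subtype  <- c(8, 8, 8, 8, 8, 8, 8, 8, 33, 33, 1, 1)
purchase <- c("Yes", "Yes", "No", "No", "No", "No", "No", "No",
              "Yes", "No", "Yes", "Yes")

# Raw number of buyers per subtype, which is what the barplot displays.
buyers <- table(subtype[purchase == "Yes"])

# Share of buyers within each subtype (each row of the table sums to 1).
rate <- prop.table(table(subtype, purchase), margin = 1)[, "Yes"]

buyers
rate
```

In this toy example subtypes 1 and 8 have the same number of buyers, but the purchase rate of subtype 1 is much higher because it has far fewer customers.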

AGE GROUP

a<-table(Caravan$MGEMLEEF[Caravan$Purchase=="Yes"])
barplot(a, legend.text = "Purchase of Caravan Policy",
        main = "THE AGE GROUP",
        xlab="age",
        ylab="purchase")
box()

OBSERVATION FOR AGE GROUP

The barplot above shows the number of customers in each age group who bought the caravan policy. Customers in age group 3 (40-50 years) are the most interested in buying the policy.

library(ISLR)

## BAR CHARTS (TYPES OF CUSTOMERS WHO PURCHASED CARAVAN POLICY)

## Number of boat policies

a<-table(Caravan$APLEZIER[Caravan$Purchase=="Yes"])
barplot(a, xlab="number of boat policies", ylab="purchase of caravan policy")

OBSERVATION OF CUSTOMER TYPE

The barplot above shows that the customers who bought the caravan policy mostly hold no boat policy.

## Number of social security insurance policies

a<-table(Caravan$ABYSTAND[Caravan$Purchase=="Yes"])
barplot(a, xlab="number of social security insurance policies", ylab="purchase of caravan policy")

OBSERVATION OF CUSTOMER TYPE

The barplot above shows that the customers who bought the caravan policy mostly hold no social security insurance policy.

## PIE CHARTS (TYPES OF CUSTOMERS WHO PURCHASED CARAVAN POLICY)

## Contribution car policies

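a<-table(Caravan$PPERSAUT[Caravan$Purchase=="Yes"])
# NOTE: a is already a frequency table, so table(a) tabulates the counts
# themselves. The names printed below (14, 72, 262) are the numbers of buyers
# at three contribution levels, each count occurring once.
carv.data=table(a)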
carv.data
## a
##  14  72 262 
##   1   1   1
colors=c("green","red","blue")
col=colors