This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

CampaignData.csv inludes the data of a Direct Marketing campaign of a European Banking institution. The marketing campaigns were conducted through phone calls. Often, more than one contact to the same client was required, in order to access if the product subscribed (bank term deposit) would be (‘yes’) or (‘no’)

A brief analysis has been performed on the data procured and documented as an R Markdown document.

Loading and processing Dataset

myy <- read.table("New.txt", header = TRUE, sep = ";")

m1 <- data.frame(myy)

m1$ID<-seq.int(nrow(m1)) ## Adding a new column ID

m2 <- m1[c(22,1:21)] ## Re-positioning the column ID to the beginning

summary(m2)
##        ID             age                 job            marital     
##  Min.   :    1   Min.   :17.00   admin.     :10422   divorced: 4612  
##  1st Qu.:10298   1st Qu.:32.00   blue-collar: 9254   married :24928  
##  Median :20595   Median :38.00   technician : 6743   single  :11568  
##  Mean   :20595   Mean   :40.02   services   : 3969   unknown :   80  
##  3rd Qu.:30891   3rd Qu.:47.00   management : 2924                   
##  Max.   :41188   Max.   :98.00   retired    : 1720                   
##                                  (Other)    : 6156                   
##                education        default         housing     
##  university.degree  :12168   no     :32588   no     :18622  
##  high.school        : 9515   unknown: 8597   unknown:  990  
##  basic.9y           : 6045   yes    :    3   yes    :21576  
##  professional.course: 5243                                  
##  basic.4y           : 4176                                  
##  basic.6y           : 2292                                  
##  (Other)            : 1749                                  
##       loan            contact          month       day_of_week
##  no     :33950   cellular :26144   may    :13769   fri:7827   
##  unknown:  990   telephone:15044   jul    : 7174   mon:8514   
##  yes    : 6248                     aug    : 6178   thu:8623   
##                                    jun    : 5318   tue:8090   
##                                    nov    : 4101   wed:8134   
##                                    apr    : 2632              
##                                    (Other): 2016              
##     duration         campaign          pdays          previous    
##  Min.   :   0.0   Min.   : 1.000   Min.   :  0.0   Min.   :0.000  
##  1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0   1st Qu.:0.000  
##  Median : 180.0   Median : 2.000   Median :999.0   Median :0.000  
##  Mean   : 258.3   Mean   : 2.568   Mean   :962.5   Mean   :0.173  
##  3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0   3rd Qu.:0.000  
##  Max.   :4918.0   Max.   :56.000   Max.   :999.0   Max.   :7.000  
##                                                                   
##         poutcome      emp.var.rate      cons.price.idx  cons.conf.idx  
##  failure    : 4252   Min.   :-3.40000   Min.   :92.20   Min.   :-50.8  
##  nonexistent:35563   1st Qu.:-1.80000   1st Qu.:93.08   1st Qu.:-42.7  
##  success    : 1373   Median : 1.10000   Median :93.75   Median :-41.8  
##                      Mean   : 0.08189   Mean   :93.58   Mean   :-40.5  
##                      3rd Qu.: 1.40000   3rd Qu.:93.99   3rd Qu.:-36.4  
##                      Max.   : 1.40000   Max.   :94.77   Max.   :-26.9  
##                                                                        
##    euribor3m      nr.employed   response   
##  Min.   :0.634   Min.   :4964   no :36548  
##  1st Qu.:1.344   1st Qu.:5099   yes: 4640  
##  Median :4.857   Median :5191              
##  Mean   :3.621   Mean   :5167              
##  3rd Qu.:4.961   3rd Qu.:5228              
##  Max.   :5.045   Max.   :5228              
## 

Distribution of Age w.r.t the positive response

The column age.group includes the age of all the clients contacted for the term deposit. Now, let’s map the distribution of age of the clients over the positive response.

library(data.table); library(ggplot2); library(plyr);

m2$age.group <- findInterval(m1$age, c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100))

new.m<- table(m2$age.group, m2$response)

new.m1 = as.data.frame.matrix(new.m)

new.m1$ID<-seq.int(nrow(new.m1))

new.m1 <- new.m1[c(3,1:2)]

mm <- melt(new.m1[,c('ID','no','yes')],id.vars = 1)

new.m1$age.group <- c("10-19","20-29","30-39","40-49", "50-59", "60-69", "70-79", "80-89", "90-99")

new.m1 <- new.m1[c(1,4,2,3)]

print(new.m1[, 1:2])
##   ID age.group
## 1  1     10-19
## 2  2     20-29
## 3  3     30-39
## 4  4     40-49
## 5  5     50-59
## 6  6     60-69
## 7  7     70-79
## 8  8     80-89
## 9  9     90-99
ggplot(mm,aes(x = ID,y = value, fill = variable)) + 
           geom_bar(stat="identity", position = "dodge") + scale_x_continuous(breaks= mm$ID)

The above plot shows the distribution of Age w.r.t the responses attained. Let’s plot a pie chart for more clearer view at the age of the subscribers.

piepercent<- round(100*new.m1$yes/sum(new.m1$yes), 1)

piepercent <- paste(piepercent, "%")

pie(new.m1$yes, labels = piepercent, main = "Distribution of age wrt positive response", col = rainbow(length(new.m1$yes)))
legend("bottomleft", c("10-19","20-29","30-39","40-49", "50-59", "60-69", "70-79", "80-89", "90-99"), cex = 1.0,
      fill = rainbow(length(new.m1$yes)))

From the above plot we could clearly see which Age-Group has been subscribed for the term deposit which is 30-39 age group which is nearly 37 % of the whole subscriptions.

New clients who subscribed to the term deposit

The dataset includes the responses of more than 40000 clients. The Columns default, housing and loan includes the responses whether the clients is already a part of the credit plan or not.

Let’s analyse how many new clients have subscribed to the term deposit.

for(i in m2[,1])
{
    if(m2$default[i] == "no")
    {
      if(m2$housing[i] == "no")
      {
        if(m2$loan[i] == "no")
         {
          m2$new.user[i] = "yes"
        }
      }
      else
        m2$new.user[i] = "no"
    }
  else
    m2$new.user[i] = "no"
}

table(m2$new.user)
## 
##    no   yes 
## 26574 14614

The results shows that out of the total subscribers, only 14614 clients are the fresh subscribers who doesn’t have housing loan, credit default or anyother person loan.

Previous outcome w.r.t the positive response

The clients have been previously contacted or not, If yes what is outcome has been recorded as the poutcome column of the dataset. Here we are going to plot the result of the previous outcome of the clients w.r.t the outcome of the response of the same user for the term deposit scheme.

p2 <- table(m2$poutcome, m2$response)
p2 = as.data.frame.matrix(p2)

setDT(p2, keep.rownames = TRUE)[]
##             rn    no  yes
## 1:     failure  3647  605
## 2: nonexistent 32422 3141
## 3:     success   479  894
colnames(p2)[1] <- "outcome"

piepercent11<- round(100*p2$yes/sum(p2$yes), 1)

piepercent11 <- paste(piepercent11, "%")

pie(p2$yes, labels = piepercent11, main = "Previous outcome wrt positive response", col = rainbow(length(p2$yes)))
legend("topright", c(p2$outcome), cex = 1.0, fill = rainbow(length(p2$yes)))

The plot shows the Success rate of the Term deposit campaign w.r.t. the outcome of the previous campaign.

Profession of the Subscibers

Variation of positive response w.r.t the count of the profession of the term subscribers.

j2 <- table(m1$job, m1$response)

j2 = as.data.frame.matrix(j2)

setDT(j2, keep.rownames = TRUE)[]
##                rn   no  yes
##  1:        admin. 9070 1352
##  2:   blue-collar 8616  638
##  3:  entrepreneur 1332  124
##  4:     housemaid  954  106
##  5:    management 2596  328
##  6:       retired 1286  434
##  7: self-employed 1272  149
##  8:      services 3646  323
##  9:       student  600  275
## 10:    technician 6013  730
## 11:    unemployed  870  144
## 12:       unknown  293   37
colnames(j2)[1] <- "job"

barplot(j2$yes, names.arg=j2$job, horiz=TRUE, las=1, cex.names=0.6,  border=TRUE, xlab = "Count", main = "Profession wrt positive response", axes = "TRUE")

Subcription rate Month

n2 <- table(m1$month, m1$response)

n2 = as.data.frame.matrix(n2)

setDT(n2, keep.rownames = TRUE)[]
##      rn    no yes
##  1: apr  2093 539
##  2: aug  5523 655
##  3: dec    93  89
##  4: jul  6525 649
##  5: jun  4759 559
##  6: mar   270 276
##  7: may 12883 886
##  8: nov  3685 416
##  9: oct   403 315
## 10: sep   314 256
colnames(n2)[1] <- "month"

nn2 <- melt(n2[,c('month','no','yes')],id.vars = 1)

piepercent1<- round(100*n2$yes/sum(n2$yes), 1)

piepercent1 <- paste(piepercent1, "%")

pie(n2$yes, labels = piepercent1, main = "Distribution of months wrt positive response", col = rainbow(length(n2$yes)))
legend("topright", c(n2$month), cex = 1.0, fill = rainbow(length(n2$yes)))

Variation of monthly subscriptions with positive response on the campaign.

Other Relevant Obsevations

table(m2$housing, m2$response)  ## Shows how many clients have housing loan and has subscribed for the term deposit
##          
##              no   yes
##   no      16596  2026
##   unknown   883   107
##   yes     19069  2507
table(m2$loan, m2$response)   ## Shows how many clients have loan and has subscribed for the term deposit
##          
##              no   yes
##   no      30100  3850
##   unknown   883   107
##   yes      5565   683
table(m2$default, m2$response) ## Shows how many clients have credit in default and has subscribed for the term deposit
##          
##              no   yes
##   no      28391  4197
##   unknown  8154   443
##   yes         3     0

Complete Analysis has been performed using R studio.