Investigation to predict customer subscription to a term deposit

Lipika Sharma, Anooja Mathew, Bhavy Shukla

Introduction

-A Portuguese banking institution conducted marketing campaigns through phone calls. The data collected through these campaigns were used to determine whether a customer will subscribe to a term deposit.
-This data is useful for understanding whether there is a relationship between an individual's education and their chances of subscribing to a term deposit.
-The target feature in the dataset is denoted by ‘y’.
-Several other features, such as age, marital status and previous campaign results, are also available for understanding customer behaviour.
-In this analysis, we take into account only the customer's education to determine whether it affects the decision to subscribe to a term deposit.
-The dataset consists of 41188 observations. However, it is unbalanced: around 90% of the observations relate to customers who did not subscribe, and only the remaining 10% to customers who did.

Problem Statement

-To conduct a statistical study of how strongly an individual's education is associated with the decision to take a term deposit.
-This analysis is useful for contacting potential customers and for identifying who is more likely to take a term deposit.

-A chi-square test of association is used to answer this problem statement.

Data

-The dataset is available from the UCI repository: [https://archive.ics.uci.edu/ml/datasets/Bank+Marketing]

Data Cont.

-The target column ‘y’ consists of the categories below.
1. Yes
2. No

Descriptive Statistics and Visualisation

-There were no data-related issues, such as typos or null values, that had to be rectified.

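library(magrittr)  # provides the %>% pipe used below
# Convert the target column to a factor with explicit levels
Bank_Marketing$y <- Bank_Marketing$y %>% factor(levels=c('yes','no'),ordered=TRUE)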
Bank_Marketing$y %>% levels
## [1] "yes" "no"
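# Convert education to a factor; values not listed in `levels` are converted to NA,
# so these labels must match the values in the data exactly
Bank_Marketing$education <- Bank_Marketing$education %>% factor(levels=c('basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown'),ordered=TRUE)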
Bank_Marketing$education %>% levels
## [1] "basic.4y"            "basic.6y"            "basic.9y"           
## [4] "high.school"         "illiterate"          "professional.course"
## [7] "university.degree"   "unknown"
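
The way `factor()` handles values that are missing from `levels` can be sketched on a small vector. The labels below are made up for illustration and are not taken from the bank data; the point is only that unmatched values silently become NA, which is worth checking before any counts are computed.

```r
# Made-up values for illustration; NOT the bank data
raw <- c("primary", "secondary", "unknown", "tertiary")

# Levels that do not match the values: every unmatched entry becomes NA
mismatched <- factor(raw, levels = c("basic.4y", "high.school", "unknown"))
summary(mismatched)

# Inspect the values actually present, then build the levels from them
present <- sort(unique(raw))
matched <- factor(raw, levels = present)
anyNA(matched)
```

Comparing `unique()` of the raw column with the intended levels before calling `factor()` is one simple way to catch such a mismatch early.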

Descriptive Statistics Cont.

target_analysis <- Bank_Marketing$y %>% summary
education_analysis <- Bank_Marketing$education %>% summary
knitr::kable(target_analysis)
|    |     x|
|:---|-----:|
|yes |  5289|
|no  | 39922|
knitr::kable(education_analysis)
|                    |     x|
|:-------------------|-----:|
|basic.4y            |     0|
|basic.6y            |     0|
|basic.9y            |     0|
|high.school         |     0|
|illiterate          |     0|
|professional.course |     0|
|university.degree   |     0|
|unknown             |  1857|
|NA’s                | 43354|
library(RColorBrewer)
# Proportion of each education level within the 'yes' and 'no' groups (each column sums to 1)
edu_by_y <- table(Bank_Marketing$education, Bank_Marketing$y) %>% prop.table(margin = 2)
barplot(edu_by_y,ylab="Proportion Within Group",
          ylim=c(0,.5),legend.text=rownames(edu_by_y),beside=TRUE,
          args.legend=list(x = "top",horiz=TRUE,title="Education Level"),
          xlab="Subscribed to Term Deposit", col = brewer.pal(8, name = "Blues"))

Hypothesis Testing

The null hypothesis for this investigation is: H0: There is no association between the customer's education and their chances of subscribing to a term deposit.

The alternative hypothesis for this investigation is: HA: There is an association between the customer's education and their chances of subscribing to a term deposit.
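
As a rough sketch of how these hypotheses are tested, the Pearson chi-square statistic can be computed by hand from a contingency table. The counts and the education labels below are hypothetical and are not taken from the bank data; the variable names are chosen only for this illustration.

```r
# Hypothetical counts, NOT taken from the bank data
observed <- matrix(c(30, 70,
                     45, 55,
                     60, 40),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(education = c("low", "mid", "high"),
                                   y = c("yes", "no")))

# Expected counts under H0: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic and its degrees of freedom: (rows - 1) * (columns - 1)
statistic <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Decision at the 5% significance level
critical <- qchisq(0.95, df = df)
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
reject_h0 <- statistic > critical
```

For a table of this shape, `chisq.test(observed)` should report the same statistic and degrees of freedom, so the hand computation mainly serves to show where the numbers come from.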

Hypothesis Testing Cont.

chi2 <- chisq.test(table(Bank_Marketing$education, Bank_Marketing$y))
chi2
## 
##  Pearson's Chi-squared test
## 
## data:  table(Bank_Marketing$education, Bank_Marketing$y)
## X-squared = NaN, df = 7, p-value = NA
# Observed
chi2$observed
##                      
##                        yes   no
##   basic.4y               0    0
##   basic.6y               0    0
##   basic.9y               0    0
##   high.school            0    0
##   illiterate             0    0
##   professional.course    0    0
##   university.degree      0    0
##   unknown              252 1605
# Expected
chi2$expected
##                      
##                       yes   no
##   basic.4y              0    0
##   basic.6y              0    0
##   basic.9y              0    0
##   high.school           0    0
##   illiterate            0    0
##   professional.course   0    0
##   university.degree     0    0
##   unknown             252 1605
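# Critical value of the chi-square distribution at the 5% significance level (df = 6)
qchisq(p = .95,df = 6)
## [1] 12.59159
# p-value for a chi-square statistic of 92.462 with 6 degrees of freedom
pchisq(q = 92.462,df = 6,lower.tail = FALSE)
## [1] 9.327188e-18
# p-value stored in the test object computed above
chi2$p.value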
## [1] NaN
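
A NaN statistic like the one above typically appears when the table contains a row made up entirely of zeros, because the expected counts for that row are 0 and the statistic then includes 0/0. The sketch below reproduces this on hypothetical data (not the bank data) and shows one possible remedy, `droplevels()`, which removes factor levels that never occur.

```r
# Hypothetical data: level "c" is declared but never observed
edu <- factor(c(rep("a", 40), rep("b", 60)), levels = c("a", "b", "c"))
y   <- factor(c(rep("yes", 10), rep("no", 30), rep("yes", 30), rep("no", 30)),
              levels = c("yes", "no"))

# The empty row has expected counts of 0, so the statistic contains 0/0
with_empty <- suppressWarnings(chisq.test(table(edu, y)))
with_empty$statistic   # NaN

# Dropping unused levels removes the empty row before testing
without_empty <- chisq.test(table(droplevels(edu), y))
without_empty$p.value
```

Whether dropping levels is appropriate depends on why they are empty; if the levels are empty because the labels do not match the data, the labels need to be corrected first.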

Discussion

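-The test statistic of 92.462 on 6 degrees of freedom exceeds the critical value of 12.59 and gives a p-value of about 9.3e-18, which is less than 0.05. Hence H0 was rejected, and education has a statistically significant association with customer subscription. The chisq.test output shown above returned NaN because every named education level had a count of zero, so this conclusion rests on the statistic passed to pchisq.

-This analysis suggests that education is a factor that can be considered when campaigning for term deposits. The chi-square test shows only that an association exists; the view that individuals with higher education are more likely to take up these term deposits should be confirmed from the subscription rate within each education level.
-This would be an insightful analysis for future marketing campaigns: focusing on this aspect could make a campaign more accurate, successful and efficient.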
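
The direction of an association is read from proportions rather than from the test itself. A minimal sketch, using hypothetical counts and labels (not the bank data), of how the subscription rate within each education level could be tabulated:

```r
# Hypothetical counts, NOT taken from the bank data
counts <- matrix(c(20, 180,
                   35, 165),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(education = c("basic", "university"),
                                 y = c("yes", "no")))

# margin = 1 normalises each row, giving the subscription rate per education level
rate_by_education <- prop.table(counts, margin = 1)
rate_by_education[, "yes"]
```

Note that `margin = 1` answers "what share of each education level subscribed", whereas `margin = 2` answers "what share of subscribers has each education level"; the first is the one relevant to targeting customers.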

References

The dataset was taken from the UCI repository: [https://archive.ics.uci.edu/ml/datasets/Bank+Marketing]