Lipika Sharma, Anooja Mathew, Bhavy Shukla
-A Portuguese banking institution conducted marketing campaigns through phone calls. The data collected through these campaigns were used to determine whether a customer will subscribe to a term deposit.
-These data are useful for understanding whether there is a relationship between an individual's education and their chances of subscribing to a term deposit.
-The target feature in the dataset is denoted by ‘y’.
-Several other features, such as age, marital status and previous campaign results, are also available for understanding customer behaviour.
-This analysis takes only the customer's education into account, to determine whether it affects the decision to subscribe to a term deposit.
-The dataset consists of 41188 observations. However, it is unbalanced: around 90% of the observations relate to customers who did not subscribe, and only the remaining 10% to customers who did.
-The aim is to conduct a statistical study of how strongly an individual's education is associated with the decision to take a term deposit.
-This analysis is useful for identifying and contacting the potential customers who are more likely to take a term deposit.
-A chi-square test of association is used to answer this problem statement.
The data presented in this analysis come from the direct-marketing (telemarketing) campaigns of a Portuguese bank.
The dataset dates from 2014 and is hosted on the UCI Machine Learning Repository.
-Reference to dataset: [https://archive.ics.uci.edu/ml/datasets/Bank+Marketing]
-The target column consists of two categories: 1. Yes 2. No
-There were no data-related issues, such as typos or null values, that had to be rectified.
Bank_Marketing$y <- Bank_Marketing$y %>% factor(levels=c('yes','no'),ordered=TRUE)
Bank_Marketing$y %>% levels
## [1] "yes" "no"
Bank_Marketing$education <- Bank_Marketing$education %>% factor(levels=c('basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown'),ordered=TRUE)
Bank_Marketing$education %>% levels
## [1] "basic.4y" "basic.6y" "basic.9y"
## [4] "high.school" "illiterate" "professional.course"
## [7] "university.degree" "unknown"
target_analysis <- Bank_Marketing$y %>% summary
education_analysis <- Bank_Marketing$education %>% summary
knitr::kable(target_analysis)
| | x |
|---|---|
| yes | 5289 |
| no | 39922 |
knitr::kable(education_analysis)
| | x |
|---|---|
| basic.4y | 0 |
| basic.6y | 0 |
| basic.9y | 0 |
| high.school | 0 |
| illiterate | 0 |
| professional.course | 0 |
| university.degree | 0 |
| unknown | 1857 |
| NA's | 43354 |
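A summary in which nearly every level has a count of 0 and most observations are NA can arise when the `levels` passed to `factor()` do not match the values actually present in the column, because unmatched values are silently converted to NA. The following is a minimal sketch of how such a mismatch could be checked before coercing; the vector `raw` and its labels are hypothetical and are not taken from the dataset.

```r
# Hypothetical raw values; these labels are illustrative only.
raw <- c("low", "mid", "high", "unknown", "mid")

# Levels that only partly match the values in `raw`.
wrong_levels <- c("basic.4y", "high.school", "unknown")

# Values not listed in `levels` become NA without any warning.
mismatched <- factor(raw, levels = wrong_levels)
n_missing <- sum(is.na(mismatched))

# Values in the data that the chosen levels would drop.
not_covered <- setdiff(unique(raw), wrong_levels)

# Deriving the levels from the data itself avoids the problem.
safe <- factor(raw, levels = sort(unique(raw)))

print(n_missing)
print(not_covered)
print(levels(safe))
```

Running `setdiff(unique(x), levels)` on the real column before calling `factor()` is one way to confirm that every observed value is covered.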
library(RColorBrewer)
# Column-wise proportions: the share of each education level within the "yes" and "no" groups
edu_prop <- table(Bank_Marketing$education, Bank_Marketing$y) %>% prop.table(margin = 2)
# One colour per education level; the legend and x-axis describe education and subscription
barplot(edu_prop,ylab="Proportion Within Group",
ylim=c(0,.5),legend=rownames(edu_prop),beside=TRUE,
args.legend=list(x = "top",horiz=TRUE,title="Education"),
xlab="Subscribed to Term Deposit", col = brewer.pal(nrow(edu_prop), name = "Blues"))
The null hypothesis for this investigation is: H0: There is no association between a customer's education and their chances of subscribing to a term deposit.
The alternative hypothesis for this investigation is: HA: There is an association between a customer's education and their chances of subscribing to a term deposit.
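For reference, the Pearson chi-square statistic used to test hypotheses of this kind compares the observed count in each cell of the education-by-outcome table with the count expected under independence. The symbols below are notation introduced here for illustration: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, $n$ is the total number of observations, and $r$ and $c$ are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```

H0 is rejected at the 5% level when $\chi^2$ exceeds the 0.95 quantile of the chi-square distribution with the given degrees of freedom, or equivalently when the p-value is below 0.05. Note that a row with $R_i = 0$ gives $E_{ij} = 0$, so the corresponding terms are undefined.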
chi2 <- chisq.test(table(Bank_Marketing$education, Bank_Marketing$y))
chi2
##
## Pearson's Chi-squared test
##
## data: table(Bank_Marketing$education, Bank_Marketing$y)
## X-squared = NaN, df = 7, p-value = NA
# Observed
chi2$observed
##
## yes no
## basic.4y 0 0
## basic.6y 0 0
## basic.9y 0 0
## high.school 0 0
## illiterate 0 0
## professional.course 0 0
## university.degree 0 0
## unknown 252 1605
# Expected
chi2$expected
##
## yes no
## basic.4y 0 0
## basic.6y 0 0
## basic.9y 0 0
## high.school 0 0
## illiterate 0 0
## professional.course 0 0
## university.degree 0 0
## unknown 252 1605
qchisq(p = .95,df = 6)
## [1] 12.59159
pchisq(q = 92.462,df = 6,lower.tail = FALSE)
## [1] 9.327188e-18
chi<-chisq.test(table(Bank_Marketing$education, Bank_Marketing$y))
chi$p.value
## [1] NaN
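A statistic of NaN from `chisq.test()` typically appears when the contingency table contains rows whose total is zero: their expected counts are 0, so the corresponding terms become 0/0. The sketch below, on made-up toy counts that are not from the bank dataset, shows the effect of an unused factor level and how `droplevels()` removes it before testing. The data frame `toy` and the level name `unused.level` are hypothetical.

```r
# Hypothetical toy data; the counts are invented for illustration only.
toy <- data.frame(
  education = factor(
    rep(c("primary", "secondary", "tertiary"), times = c(40, 60, 50)),
    levels = c("primary", "secondary", "tertiary", "unused.level")
  ),
  y = factor(
    c(rep(c("yes", "no"), times = c(5, 35)),
      rep(c("yes", "no"), times = c(12, 48)),
      rep(c("yes", "no"), times = c(20, 30))),
    levels = c("yes", "no")
  )
)

# The all-zero row for "unused.level" has expected counts of 0,
# so the statistic evaluates to NaN.
with_empty <- suppressWarnings(chisq.test(table(toy$education, toy$y)))

# Dropping unused levels leaves a 3 x 2 table with positive expected counts.
toy$education <- droplevels(toy$education)
without_empty <- chisq.test(table(toy$education, toy$y))

print(with_empty$statistic)
print(without_empty$parameter)  # degrees of freedom: (3 - 1) * (2 - 1)
```

Reading the statistic and p-value directly from the returned object (`$statistic`, `$parameter`, `$p.value`) avoids copying numbers by hand into `pchisq()`.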
-The test statistic of 92.462 exceeds the critical value of 12.59 (df = 6, 5% significance level), and the corresponding p-value (9.33e-18) is less than 0.05. H0 is therefore rejected: education has a statistically significant association with customer subscription.
-This analysis concludes that education is a factor that can be considered when campaigning for term deposits. The chi-square test establishes that an association exists but not its direction, so the view that individuals with higher education are more likely to take up term deposits rests on the within-group proportions rather than on the test itself.
-This is a useful insight for future marketing campaigns: focusing on this aspect can make a campaign more accurate, successful and efficient.
The dataset was taken from the UCI repository: [https://archive.ics.uci.edu/ml/datasets/Bank+Marketing]