Abstract: what we learned from the case study, and which model worked best for predicting whether a client subscribes to a term deposit.
This data set comes from direct marketing campaigns conducted by a Portuguese banking institution. The plots and models below examine how the variables tracked in the campaign relate to its outcome. In this case study, we aim to improve the efficiency of the marketing campaign by identifying the main factors that affect its success.
Introduction/Background to the case: why this case study is important. What was the hypothesis of the case? Was there a positive correlation, a negative correlation, etc.? (1/2 page)
What variables are we dealing with in the case, and what types are they (continuous, numerical, categorical)? What sampling techniques did we use? Was this the full data set or just a sample of the data? Discuss the assumptions and limitations of the model we chose to use.
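For reference, the UCI repository describes bank-additional.csv (the file read below, 4,119 rows) as a random 10% sample of bank-additional-full.csv (41,188 rows). The draft does not yet say how that sample was split for modelling; with an imbalanced yes/no target, one common option is a stratified train/test split that keeps the class ratio similar in both parts. The sketch below assumes a 70/30 split and uses a hypothetical helper, `stratified_split()`; it is not the procedure used in this study. caret's `createDataPartition()` performs a comparable stratified split.

```r
# Hypothetical helper: stratified train/test split on a factor outcome.
# Rows are sampled separately within each level of y, so the class
# proportions in the training set roughly match those in the full data.
stratified_split = function(y, p = 0.7, seed = 1) {
  set.seed(seed)
  idx = seq_along(y)
  train = unlist(lapply(split(idx, y), function(rows) {
    # index into rows rather than calling sample(rows), which misbehaves
    # when rows has length 1
    rows[sample.int(length(rows), size = round(p * length(rows)))]
  }))
  sort(unname(train))
}
```

For example, `idx = stratified_split(b2$y)` gives training rows `b2[idx, ]` and test rows `b2[-idx, ]`.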
Variables related to bank client data:
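age: Client’s age (numeric).
job: Client’s type of job (categorical).
marital: Client’s marital status (categorical); ‘divorced’ means divorced or widowed.
education: Client’s education level (categorical).
default: Whether the client has credit in default (categorical: ‘no’, ‘yes’, ‘unknown’).
housing: Whether the client has a housing loan (categorical: ‘no’, ‘yes’, ‘unknown’).
loan: Whether the client has a personal loan (categorical: ‘no’, ‘yes’, ‘unknown’).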
Variables related to last contact of the current marketing campaign:
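contact: Contact communication type (categorical: ‘cellular’, ‘telephone’).
month: Last contact month of the year (categorical).
day_of_week: Last contact day of the week (categorical).
duration: Last contact duration in seconds (numeric). If duration is 0, no conversation took place and y is always ‘no’. Because duration is only known once the call has ended, it is strongly tied to the outcome but cannot be used to decide whom to call; it is better suited to benchmarking than to a realistic predictive model.
campaign: Number of contacts performed during this campaign for this client, including the last contact (numeric).
pdays: Number of days since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted).
previous: Number of contacts performed before this campaign for this client (numeric).
poutcome: Outcome of the previous marketing campaign (categorical: ‘failure’, ‘nonexistent’, ‘success’).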
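Treating `pdays` as an ordinary number would place every never-contacted client at 999 days, far beyond everyone else. A minimal sketch, assuming the data frame `b2` from the code below, separates the sentinel into a logical flag and a numeric column with `NA`; the helper `recode_pdays()` and the new column names `prev_contacted` and `pdays_clean` are hypothetical.

```r
# Split the pdays sentinel (999 = not contacted in a previous campaign)
# into a logical flag and a numeric column that holds NA for the sentinel.
recode_pdays = function(df, sentinel = 999) {
  df$prev_contacted = df$pdays != sentinel                   # hypothetical column
  df$pdays_clean = ifelse(df$prev_contacted, df$pdays, NA)   # hypothetical column
  df
}
```

`b2 = recode_pdays(b2)` adds both columns and leaves `pdays` itself unchanged.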
Social and economic context attributes:
emp.var.rate: Employment variation rate, quarterly indicator (numeric).
cons.price.idx: Consumer price index, monthly indicator (numeric).
cons.conf.idx: Consumer confidence index, monthly indicator (numeric).
euribor3m: Euribor 3-month rate, daily indicator (numeric).
nr.employed: Number of employees, quarterly indicator (numeric).
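All five indicators describe the same national economy over time, so some of them may move together closely, and strongly correlated predictors make regression coefficients hard to interpret. A minimal sketch for checking their pairwise correlations, assuming the column names used in the data file; the helper `econ_correlations()` is hypothetical and no correlation values are claimed here. Since `car` is already loaded, `car::vif()` on a fitted model is another way to check for collinearity.

```r
# Social and economic context columns, named as in bank-additional.csv.
econ_vars = c("emp.var.rate", "cons.price.idx", "cons.conf.idx",
              "euribor3m", "nr.employed")

# Pairwise Pearson correlations among those columns, rounded to 2 decimals.
econ_correlations = function(df, vars = econ_vars) {
  round(cor(df[, vars]), 2)
}
```

With the cleaned data loaded, `econ_correlations(b2)` returns a 5 x 5 matrix.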
Output variable (desired target):
y: Has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)
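Because `y` is binary, logistic regression is a natural baseline model. The sketch below is an assumption about how such a model could be set up, not the model reported in this study: it recodes `y` as 0/1, leaves out `duration` because call length is only known once the call has ended, and fits every other column as a predictor with `glm(family = binomial)`. The helper `fit_logit()` and the column `y01` are hypothetical.

```r
# Hypothetical baseline: logistic regression of subscription on every
# column except the outcome itself and duration.
fit_logit = function(df) {
  df$y01 = as.integer(df$y == "yes")    # 1 = subscribed, 0 = not
  predictors = setdiff(names(df), c("y", "y01", "duration"))
  form = reformulate(predictors, response = "y01")
  glm(form, data = df, family = binomial)
}
```

With the data frame `b2` loaded below, `m = fit_logit(b2)` fits the model, `summary(m)` lists the coefficients, and `fitted(m)` gives each client's predicted probability of subscribing.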
1-2 pages
library(caret)
library(lattice)
library(ggplot2)
library(gam)
library(car)
library(ROCR)
library(ggmosaic)
b1 = read.csv("/Users/farre/Documents/data_set/bank-additional.csv", sep = ";")
head(b1)
## age job marital education default housing loan contact
## 1 30 blue-collar married basic.9y no yes no cellular
## 2 39 services single high.school no no no telephone
## 3 25 services married high.school no yes no telephone
## 4 38 services married basic.9y no unknown unknown telephone
## 5 47 admin. married university.degree no yes no cellular
## 6 32 services single university.degree no no no cellular
## month day_of_week duration campaign pdays previous poutcome emp.var.rate
## 1 may fri 487 2 999 0 nonexistent -1.8
## 2 may fri 346 4 999 0 nonexistent 1.1
## 3 jun wed 227 1 999 0 nonexistent 1.4
## 4 jun fri 17 3 999 0 nonexistent 1.4
## 5 nov mon 58 1 999 0 nonexistent -0.1
## 6 sep thu 128 3 999 2 failure -1.1
## cons.price.idx cons.conf.idx euribor3m nr.employed y
## 1 92.893 -46.2 1.313 5099.1 no
## 2 93.994 -36.4 4.855 5191.0 no
## 3 94.465 -41.8 4.962 5228.1 no
## 4 94.465 -41.8 4.959 5228.1 no
## 5 93.200 -42.0 4.191 5195.8 no
## 6 94.199 -37.5 0.884 4963.6 no
# Convert the categorical columns (and the outcome y) to factors
cat_vars = c("job", "marital", "education", "default", "housing", "loan",
             "contact", "month", "day_of_week", "poutcome", "y")
b1[cat_vars] = lapply(b1[cat_vars], as.factor)
str(b1)
## 'data.frame': 4119 obs. of 21 variables:
## $ age : int 30 39 25 38 47 32 32 41 31 35 ...
## $ job : Factor w/ 12 levels "admin.","blue-collar",..: 2 8 8 8 1 8 1 3 8 2 ...
## $ marital : Factor w/ 4 levels "divorced","married",..: 2 3 2 2 2 3 3 2 1 2 ...
## $ education : Factor w/ 8 levels "basic.4y","basic.6y",..: 3 4 4 3 7 7 7 7 6 3 ...
## $ default : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 1 1 1 2 1 2 ...
## $ housing : Factor w/ 3 levels "no","unknown",..: 3 1 3 2 3 1 3 3 1 1 ...
## $ loan : Factor w/ 3 levels "no","unknown",..: 1 1 1 2 1 1 1 1 1 1 ...
## $ contact : Factor w/ 2 levels "cellular","telephone": 1 2 2 2 1 1 1 1 1 2 ...
## $ month : Factor w/ 10 levels "apr","aug","dec",..: 7 7 5 5 8 10 10 8 8 7 ...
## $ day_of_week : Factor w/ 5 levels "fri","mon","thu",..: 1 1 5 1 2 3 2 2 4 3 ...
## $ duration : int 487 346 227 17 58 128 290 44 68 170 ...
## $ campaign : int 2 4 1 3 1 3 4 2 1 1 ...
## $ pdays : int 999 999 999 999 999 999 999 999 999 999 ...
## $ previous : int 0 0 0 0 0 2 0 0 1 0 ...
## $ poutcome : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 1 2 2 1 2 ...
## $ emp.var.rate : num -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
## $ cons.price.idx: num 92.9 94 94.5 94.5 93.2 ...
## $ cons.conf.idx : num -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
## $ euribor3m : num 1.31 4.86 4.96 4.96 4.19 ...
## $ nr.employed : num 5099 5191 5228 5228 5196 ...
## $ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
summary(b1$y)
## no yes
## 3668 451
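The counts above show a strong class imbalance. The arithmetic below uses only the two counts printed by `summary(b1$y)` to give the share of each class and the accuracy of a trivial model that always predicts ‘no’; any model worth reporting should beat that baseline.

```r
# Class counts copied from the summary(b1$y) output above.
counts = c(no = 3668, yes = 451)

shares = counts / sum(counts)      # proportion of each class
baseline_accuracy = max(shares)    # accuracy of always predicting "no"

round(shares, 3)                   # no: 0.891, yes: 0.109
```

So roughly 11% of contacted clients subscribed, and always answering ‘no’ would already be about 89% accurate, which is why accuracy alone is a weak yardstick here.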
# Keep only rows with no NA in any of the 21 columns
# (equivalent to filtering each column with !is.na() in turn)
b2 = b1[complete.cases(b1), ]
colSums(is.na(b2))
## age job marital education default
## 0 0 0 0 0
## housing loan contact month day_of_week
## 0 0 0 0 0
## duration campaign pdays previous poutcome
## 0 0 0 0 0
## emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
## 0 0 0 0 0
## y
## 0
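The zero counts above do not mean the data are complete: in this data set, missing categorical values are recorded as the level ‘unknown’ (row 4 of the `head()` output already shows it in `housing` and `loan`), and `pdays` uses 999, so `is.na()` never sees them. A minimal sketch that counts ‘unknown’ entries per categorical column; the helper `count_unknown()` is hypothetical.

```r
# Count entries equal to a placeholder label (default "unknown") in each
# factor or character column; numeric columns are skipped.
count_unknown = function(df, label = "unknown") {
  is_cat = vapply(df, function(col) is.factor(col) || is.character(col),
                  logical(1))
  vapply(df[is_cat],
         function(col) sum(as.character(col) == label, na.rm = TRUE),
         integer(1))
}
```

Running `count_unknown(b2)` shows which columns carry these hidden missing values; whether to keep ‘unknown’ as its own level or treat it as `NA` is a modelling choice worth stating under assumptions and limitations.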
ggplot(data = b2, mapping = aes(x = age, fill = y)) +
  geom_histogram(binwidth = 10) +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 1)
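With about nine in ten clients answering ‘no’, the counts in the two histogram panels mostly reflect the overall age distribution. Comparing the subscription rate within each age band puts both outcomes on one scale. A minimal sketch using 10-year bands; the band edges are an arbitrary choice and the helper `rate_by_age_band()` is hypothetical.

```r
# Share of "yes" outcomes within each age band of the given width.
rate_by_age_band = function(age, y, width = 10) {
  breaks = seq(floor(min(age) / width) * width,
               ceiling((max(age) + 1) / width) * width, by = width)
  band = cut(age, breaks = breaks, right = FALSE)   # bands are [lower, upper)
  round(tapply(y == "yes", band, mean), 3)
}
```

For this data set the call would be `rate_by_age_band(b2$age, b2$y)`.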
ggplot(data = b2, mapping = aes(x = education, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = job, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
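The bar charts show how counts differ across levels such as job type, but not whether those differences could be due to chance. A chi-squared test of independence between a categorical predictor and `y` is one way to check; this is an addition, not a test the study reports. The helper `test_independence()` is hypothetical, and it also returns the smallest expected cell count, because the chi-squared approximation becomes unreliable when expected counts are small (`chisq.test()` warns in that case).

```r
# Chi-squared test of independence between a categorical predictor x and y.
test_independence = function(x, y) {
  tab = table(droplevels(as.factor(x)), y)   # drop levels with no rows
  res = chisq.test(tab)                      # Yates correction applies to 2 x 2 tables
  c(statistic = unname(res$statistic),
    df = unname(res$parameter),
    p_value = res$p.value,
    min_expected = min(res$expected))
}
```

For example, `test_independence(b2$job, b2$y)`, or `sapply(b2[cat_vars], test_independence, y = b2$y)` to cover several predictors at once (with `cat_vars` holding the predictor names, excluding `y`).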
ggplot(data = b2, mapping = aes(x = marital, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = default, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = housing, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = loan, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = contact, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = month, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = day_of_week, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
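`as.factor()` sorts levels alphabetically, so the month plot runs apr, aug, dec, ... and the weekday plot runs fri, mon, thu, ... rather than in calendar order (see the level lists in the `str()` output). Re-levelling the two factors before plotting puts the bars in time order. A minimal sketch; the helper `calendar_order()` is hypothetical, and it keeps only the months that actually occur, since the `str()` output shows 10 month levels rather than 12.

```r
# Re-level month and day_of_week into calendar order.
# Only levels present in the data are kept, so absent months are dropped.
calendar_order = function(df) {
  month_order = tolower(month.abb)                   # "jan", "feb", ..., "dec"
  day_order = c("mon", "tue", "wed", "thu", "fri")
  df$month = factor(df$month,
                    levels = intersect(month_order, unique(as.character(df$month))))
  df$day_of_week = factor(df$day_of_week,
                          levels = intersect(day_order, unique(as.character(df$day_of_week))))
  df
}
```

Running `b2 = calendar_order(b2)` before the two `ggplot()` calls above leaves the values unchanged and only reorders the x axis.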
ggplot(data = b2, mapping = aes(x = duration, fill = y)) +
  geom_histogram(bins = 30) +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = pdays, fill = y)) +
  geom_histogram(bins = 30) +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = previous, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = b2, mapping = aes(x = poutcome, fill = y)) +
  geom_bar() +
  scale_fill_discrete(name = "Subscribed (y)") +
  facet_wrap(~ y, nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
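Because ‘yes’ is the minority class, the ‘yes’ panels in these faceted bar charts are much shorter than the ‘no’ panels, which makes level-by-level comparisons hard. Looking at the share of ‘yes’ within each level of a predictor puts every level on the same 0 to 1 scale. A minimal sketch in base R with a hypothetical helper, `yes_rate_by_level()`; in ggplot2 a similar view can be drawn with `geom_bar(position = "fill")` and no `facet_wrap()`.

```r
# Share of "yes" within each level of a categorical predictor, sorted from
# highest to lowest, together with the number of clients in that level.
yes_rate_by_level = function(x, y) {
  tab = table(x, y)
  out = data.frame(level = rownames(tab),
                   n = as.vector(rowSums(tab)),
                   yes_rate = as.vector(tab[, "yes"] / rowSums(tab)),
                   stringsAsFactors = FALSE)
  out = out[order(-out$yes_rate), ]
  rownames(out) = NULL
  out
}
```

For example, `yes_rate_by_level(b2$poutcome, b2$y)`; the `n` column shows how many clients each rate is based on, which matters for small levels.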