Executive Summary

Our abstract: what we learned from the case study and which model worked best for predicting the outcome.

This data set relates directly to marketing campaigns conducted by a Portuguese banking institution. The models below show the correlation between the different variables tracked in the campaign. In this case study we look to improve the efficiency of the marketing campaign by identifying the main factors that may affect its success.

The Problem

Introduction/background to the case. Why is this case study important? What was the hypothesis of the case: was there a positive correlation, a negative correlation, etc.? (1/2 a page)

Methodology

What variables are we dealing with in the case? What types of variables are they: continuous, numerical, categorical? What sampling techniques did we use? Was this the full data set or just a sample of the data? Discuss the assumptions and limitations of the model that we chose to use.

Variables related to bank client data:

Age: Client’s age.
Job: Client’s type of job.
Marital: Client’s marital status; ‘divorced’ means divorced or widowed.
Education: Client’s education.
Default: Client has previously defaulted.
Housing: Client has a housing loan.
Loan: Client has a personal loan.

Variables related to last contact of the current marketing campaign:

Contact: Contact communication type (telephone or cellular).
Month: Last contact month of year.
day_of_week: Last contact day of week.
duration: Last contact duration in seconds. If duration is 0s, then we never contacted a client to sign up for a term deposit account.
Pdays: Number of days that passed after the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted).
Previous: Number of contacts performed before this campaign for this client (numeric).
Poutcome: Outcome of the previous marketing campaign (categorical: ‘failure’, ‘nonexistent’, ‘success’).

Social and economic context attributes:

Emp.var.rate: Employment variation rate, quarterly indicator (numeric).
Cons.price.idx: Consumer price index, monthly indicator (numeric).
Cons.conf.idx: Consumer confidence index, monthly indicator (numeric).
Euribor3m: Euribor 3-month rate, daily indicator (numeric).
Nr.employed: Number of employees, quarterly indicator (numeric).

Output variable (desired target):

y: Has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)

1-2 pages

Loading in libraries

library(titanic)
library(caret)
## Warning in register(): Can't find generic `scale_type` in package ggplot2 to
## register S3 method.
library(lattice)
library(ggplot2)
library(gam)
library(car)
library(ROCR)
library(ggmosaic)
b1 = read.csv("/Users/farre/Documents/data_set/bank-additional.csv", sep = ";")

Reading in data

head(b1)
##   age         job marital         education default housing    loan   contact
## 1  30 blue-collar married          basic.9y      no     yes      no  cellular
## 2  39    services  single       high.school      no      no      no telephone
## 3  25    services married       high.school      no     yes      no telephone
## 4  38    services married          basic.9y      no unknown unknown telephone
## 5  47      admin. married university.degree      no     yes      no  cellular
## 6  32    services  single university.degree      no      no      no  cellular
##   month day_of_week duration campaign pdays previous    poutcome emp.var.rate
## 1   may         fri      487        2   999        0 nonexistent         -1.8
## 2   may         fri      346        4   999        0 nonexistent          1.1
## 3   jun         wed      227        1   999        0 nonexistent          1.4
## 4   jun         fri       17        3   999        0 nonexistent          1.4
## 5   nov         mon       58        1   999        0 nonexistent         -0.1
## 6   sep         thu      128        3   999        2     failure         -1.1
##   cons.price.idx cons.conf.idx euribor3m nr.employed  y
## 1         92.893         -46.2     1.313      5099.1 no
## 2         93.994         -36.4     4.855      5191.0 no
## 3         94.465         -41.8     4.962      5228.1 no
## 4         94.465         -41.8     4.959      5228.1 no
## 5         93.200         -42.0     4.191      5195.8 no
## 6         94.199         -37.5     0.884      4963.6 no

Changing character variables to factors

b1$job = as.factor(b1$job)
b1$marital = as.factor(b1$marital)
b1$education = as.factor(b1$education)
b1$default = as.factor(b1$default)
b1$housing = as.factor(b1$housing)
b1$loan = as.factor(b1$loan)
b1$contact = as.factor(b1$contact)
b1$month = as.factor(b1$month)
b1$day_of_week = as.factor(b1$day_of_week)
b1$poutcome = as.factor(b1$poutcome)
b1$y = as.factor(b1$y)

str(b1)
## 'data.frame':    4119 obs. of  21 variables:
##  $ age           : int  30 39 25 38 47 32 32 41 31 35 ...
##  $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 2 8 8 8 1 8 1 3 8 2 ...
##  $ marital       : Factor w/ 4 levels "divorced","married",..: 2 3 2 2 2 3 3 2 1 2 ...
##  $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 3 4 4 3 7 7 7 7 6 3 ...
##  $ default       : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 1 1 1 2 1 2 ...
##  $ housing       : Factor w/ 3 levels "no","unknown",..: 3 1 3 2 3 1 3 3 1 1 ...
##  $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 2 1 1 1 1 1 1 ...
##  $ contact       : Factor w/ 2 levels "cellular","telephone": 1 2 2 2 1 1 1 1 1 2 ...
##  $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 5 5 8 10 10 8 8 7 ...
##  $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 1 1 5 1 2 3 2 2 4 3 ...
##  $ duration      : int  487 346 227 17 58 128 290 44 68 170 ...
##  $ campaign      : int  2 4 1 3 1 3 4 2 1 1 ...
##  $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
##  $ previous      : int  0 0 0 0 0 2 0 0 1 0 ...
##  $ poutcome      : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 1 2 2 1 2 ...
##  $ emp.var.rate  : num  -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
##  $ cons.price.idx: num  92.9 94 94.5 94.5 93.2 ...
##  $ cons.conf.idx : num  -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
##  $ euribor3m     : num  1.31 4.86 4.96 4.96 4.19 ...
##  $ nr.employed   : num  5099 5191 5228 5228 5196 ...
##  $ y             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
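The column-by-column conversion above could also be written as a single step that converts every character column. The following is a minimal sketch on a toy data frame (the `toy` object and its values are illustrative stand-ins for `b1`, not the real data):

```r
# Toy data frame standing in for b1 (hypothetical values, same column types).
toy <- data.frame(
  age = c(30L, 39L, 25L),
  job = c("blue-collar", "services", "services"),
  y   = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert all of them to factors at once.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

str(toy)
```

Numeric columns such as `age` are left untouched. Passing `stringsAsFactors = TRUE` to `read.csv()` would be another way to get factors at load time.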

Checking summary of target variable

summary(b1$y)
##   no  yes 
## 3668  451
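The counts above suggest a strongly imbalanced target. A quick way to express this as proportions, using only the two counts printed by `summary(b1$y)`, might look like this:

```r
# Counts copied from the summary(b1$y) output above.
counts <- c(no = 3668, yes = 451)

# Share of each class in the 4119 observations.
props <- prop.table(counts)
round(props, 3)
```

Roughly 11% of clients subscribed, so a model that always predicts "no" would already be correct about 89% of the time. Accuracy alone is therefore likely to be a weak yardstick for the models compared later.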

Removing rows with missing values

# Keep only the rows that have no NA in any column
# (same result as filtering on !is.na() one column at a time).
b2 = na.omit(b1)

colSums(is.na(b2))
##            age            job        marital      education        default 
##              0              0              0              0              0 
##        housing           loan        contact          month    day_of_week 
##              0              0              0              0              0 
##       duration       campaign          pdays       previous       poutcome 
##              0              0              0              0              0 
##   emp.var.rate cons.price.idx  cons.conf.idx      euribor3m    nr.employed 
##              0              0              0              0              0 
##              y 
##              0
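The all-zero NA counts above do not necessarily mean the data are complete: the `head()` and `str()` output show an "unknown" level in columns such as default, housing, and loan, which `is.na()` does not detect. A minimal sketch of counting those entries, on a toy data frame with made-up rows, could be:

```r
# Toy rows (hypothetical) where missing information is coded as "unknown".
toy <- data.frame(
  housing = c("yes", "no", "unknown", "yes"),
  loan    = c("no", "no", "unknown", "no"),
  age     = c(30, 39, 38, 47)
)

# is.na() finds nothing, because "unknown" is an ordinary string.
na_counts <- colSums(is.na(toy))

# Count the "unknown" entries per column instead.
unknown_counts <- colSums(toy == "unknown")

# Optional: recode "unknown" as NA so the usual NA tools apply.
toy_na <- toy
toy_na[toy_na == "unknown"] <- NA
colSums(is.na(toy_na))
```

Whether "unknown" should be treated as missing or kept as its own category is a modelling choice that this report does not settle; the recoding step is shown only as one option.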

Histogram of yes or no counts for age

ggplot(data = b2, mapping = aes(x = age, fill = y)) +
  geom_histogram(binwidth = 10)+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 1)

Bar chart of yes and no deposits for education

ggplot(data = b2, mapping = aes(x = education, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
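Because "no" responses far outnumber "yes" responses, raw counts like those plotted above can hide differences between categories. One possible complement is the share of "yes" answers within each level. The sketch below uses a small made-up data frame (`toy` is hypothetical) rather than `b2`:

```r
# Hypothetical toy data: education level and campaign outcome.
toy <- data.frame(
  education = c("basic.9y", "basic.9y",
                "high.school", "high.school",
                "university.degree", "university.degree",
                "university.degree", "university.degree"),
  y = c("no", "no", "no", "yes", "no", "yes", "yes", "no")
)

# Proportion of "yes" within each education level.
rate <- tapply(toy$y == "yes", toy$education, mean)
rate
```

The same call with `b2$y` and `b2$education` would give the rates for the real data. In ggplot2, `geom_bar(position = "fill")` offers a comparable proportional view.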

Job field yes and no count

ggplot(data = b2, mapping = aes(x = job, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Marital status yes and no count

ggplot(data = b2, mapping = aes(x = marital, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Default Variable yes and no count

ggplot(data = b2, mapping = aes(x = default, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Housing yes and no counts

ggplot(data = b2, mapping = aes(x = housing, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Loan variable yes and no counts

ggplot(data = b2, mapping = aes(x = loan, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Contact variable yes and no counts

ggplot(data = b2, mapping = aes(x = contact, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Month variable yes and no counts

ggplot(data = b2, mapping = aes(x = month, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Day of the week yes and no counts

ggplot(data = b2, mapping = aes(x = day_of_week, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Duration variable yes and no counts

ggplot(data = b2, mapping = aes(x = duration, fill = y)) +
  geom_histogram()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
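The `stat_bin()` message above asks for an explicit bin width. One common heuristic, which is not something the original analysis used, is the Freedman-Diaconis rule: width = 2 * IQR / n^(1/3). A small sketch using the first ten `duration` values shown in the `str(b1)` output:

```r
# First ten duration values, copied from the str(b1) output above.
duration <- c(487, 346, 227, 17, 58, 128, 290, 44, 68, 170)

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3).
fd_binwidth <- 2 * IQR(duration) / length(duration)^(1 / 3)
fd_binwidth
```

On the full data the same expression with `b2$duration` could be passed as `geom_histogram(binwidth = fd_binwidth)`, which would also silence the message.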

Pdays variable yes and no count

ggplot(data = b2, mapping = aes(x = pdays, fill = y)) +
  geom_histogram()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
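Since 999 in `pdays` is a code for "not previously contacted" rather than a real number of days, it dominates the histogram above. One possible treatment, shown here on made-up values, is to split the column into a contact flag and a cleaned day count (the names `prev_contacted` and `pdays_clean` are hypothetical):

```r
# Hypothetical toy values: 999 marks clients with no previous contact.
toy <- data.frame(pdays = c(999L, 999L, 3L, 6L, 999L))

# Flag whether the client was contacted in a previous campaign.
toy$prev_contacted <- toy$pdays != 999

# Replace the 999 sentinel with NA so it no longer distorts summaries.
toy$pdays_clean <- ifelse(toy$pdays == 999, NA_integer_, toy$pdays)

table(toy$prev_contacted)
summary(toy$pdays_clean)
```

This is only a sketch of one option; the analysis in this report keeps `pdays` in its original coding.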

Previous variable yes and no count

ggplot(data = b2, mapping = aes(x = previous, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Poutcome variable yes or no count

ggplot(data = b2, mapping = aes(x = poutcome, fill = y)) +
  geom_bar()+
  scale_fill_discrete(name = "Yes or no counts")+
  facet_wrap(~ y , nrow = 2)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
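The plots above repeat the same ggplot2 recipe with a different x variable each time. A small helper could reduce that repetition. The function name `plot_by_outcome` and the toy data below are hypothetical, and the sketch assumes the outcome column is called `y`, as in `b2`:

```r
library(ggplot2)

# Hypothetical helper: faceted yes/no count plot for one predictor.
# `layer` defaults to a bar chart; pass geom_histogram() for numeric columns.
plot_by_outcome <- function(data, var, layer = geom_bar()) {
  ggplot(data, aes(x = .data[[var]], fill = y)) +
    layer +
    scale_fill_discrete(name = "Yes or no counts") +
    facet_wrap(~ y, nrow = 2) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Toy data standing in for b2 (made-up rows).
toy <- data.frame(
  marital  = c("married", "single", "married", "divorced"),
  duration = c(487, 346, 227, 17),
  y        = c("no", "no", "yes", "no")
)

p_bar  <- plot_by_outcome(toy, "marital")
p_hist <- plot_by_outcome(toy, "duration", geom_histogram(binwidth = 100))
```

With the real data, a call such as `plot_by_outcome(b2, "poutcome")` should produce the same kind of chart as the block above.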