1. Introduction
In the business world, especially consumer data, it is essential to carry out an analysis of consumer behavior to know insights into consumer behavior so that it can accelerate the growth in the number of consumers. The data used is from a bank in Portugal, where the data is obtained to determine whether the consumer will make a loan. Data comes from https://archive.ics.uci.edu/ml/datasets/bank+marketing.
2. Data Processing
2.1 Import data
bank <-read.csv2("bank.csv")
2.2 View the first 10 rows
head(bank, 10)
## age job marital education default balance housing loan contact
## 1 30 unemployed married primary no 1787 no no cellular
## 2 33 services married secondary no 4789 yes yes cellular
## 3 35 management single tertiary no 1350 yes no cellular
## 4 30 management married tertiary no 1476 yes yes unknown
## 5 59 blue-collar married secondary no 0 yes no unknown
## 6 35 management single tertiary no 747 no no cellular
## 7 36 self-employed married tertiary no 307 yes no cellular
## 8 39 technician married secondary no 147 yes no cellular
## 9 41 entrepreneur married tertiary no 221 yes no unknown
## 10 43 services married primary no -88 yes yes cellular
## day month duration campaign pdays previous poutcome y
## 1 19 oct 79 1 -1 0 unknown no
## 2 11 may 220 1 339 4 failure no
## 3 16 apr 185 1 330 1 failure no
## 4 3 jun 199 4 -1 0 unknown no
## 5 5 may 226 1 -1 0 unknown no
## 6 23 feb 141 2 176 3 failure no
## 7 14 may 341 1 330 2 other no
## 8 6 may 151 2 -1 0 unknown no
## 9 14 may 57 2 -1 0 unknown no
## 10 17 apr 313 1 147 2 failure no
2.3 View the last 10 rows
tail(bank, 10)
## age job marital education default balance housing loan contact
## 4512 46 blue-collar married secondary no 668 yes no unknown
## 4513 40 blue-collar married secondary no 1100 yes no unknown
## 4514 49 blue-collar married secondary no 322 no no cellular
## 4515 38 blue-collar married secondary no 1205 yes no cellular
## 4516 32 services single secondary no 473 yes no cellular
## 4517 33 services married secondary no -333 yes no cellular
## 4518 57 self-employed married tertiary yes -3313 yes yes unknown
## 4519 57 technician married secondary no 295 no no cellular
## 4520 28 blue-collar married secondary no 1137 no no cellular
## 4521 44 entrepreneur single tertiary no 1136 yes yes cellular
## day month duration campaign pdays previous poutcome y
## 4512 15 may 1263 2 -1 0 unknown yes
## 4513 29 may 660 2 -1 0 unknown no
## 4514 14 aug 356 2 -1 0 unknown no
## 4515 20 apr 45 4 153 1 failure no
## 4516 7 jul 624 5 -1 0 unknown no
## 4517 30 jul 329 5 -1 0 unknown no
## 4518 9 may 153 1 -1 0 unknown no
## 4519 19 aug 151 11 -1 0 unknown no
## 4520 6 feb 129 4 211 3 other no
## 4521 3 apr 345 2 249 7 other no
Explanation of each column in the bank data:
- age (numeric)
- job : type of job (categorical: “admin.”,“unknown”,“unemployed”,“management”,“housemaid”,“entrepreneur”,“student”,“blue-collar”,“self-employed”,“retired”,“technician”,“services”)
- marital : marital status (categorical: “married”,“divorced”,“single”; note: “divorced” means divorced or widowed)
- education (categorical: “unknown”,“secondary”,“primary”,“tertiary”)
- default: has credit in default? (binary: “yes”,“no”)
- balance: average yearly balance, in euros (numeric)
- housing: has housing loan? (binary: “yes”,“no”)
- loan: has personal loan? (binary: “yes”,“no”), related with the last contact of the current campaign:
- contact: contact communication type (categorical: “unknown”,“telephone”,“cellular”)
- day: last contact day of the month (numeric)
- month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
- duration: last contact duration, in seconds (numeric)
- campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
- pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
- previous: number of contacts performed before this campaign and for this client (numeric)
- poutcome: outcome of the previous marketing campaign (categorical: “unknown”,“other”,“failure”,“success”)
- y - has the client subscribed a term deposit? (binary: “yes”,“no”)
2.4 Change the appropriate data type
names <- c('job' ,'marital', "education", "default", "housing", "loan", "contact", "month", "poutcome", "y")
bank[,names] <- lapply(bank[,names] , factor)
str(bank)
## 'data.frame': 4521 obs. of 17 variables:
## $ age : int 30 33 35 30 59 35 36 39 41 43 ...
## $ job : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
## $ marital : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
## $ education: Factor w/ 4 levels "primary","secondary",..: 1 2 3 3 2 3 3 2 3 1 ...
## $ default : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ balance : int 1787 4789 1350 1476 0 747 307 147 221 -88 ...
## $ housing : Factor w/ 2 levels "no","yes": 1 2 2 2 2 1 2 2 2 2 ...
## $ loan : Factor w/ 2 levels "no","yes": 1 2 1 2 1 1 1 1 1 2 ...
## $ contact : Factor w/ 3 levels "cellular","telephone",..: 1 1 1 3 3 1 1 1 3 1 ...
## $ day : int 19 11 16 3 5 23 14 6 14 17 ...
## $ month : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 9 9 9 1 ...
## $ duration : int 79 220 185 199 226 141 341 151 57 313 ...
## $ campaign : int 1 1 1 4 1 2 1 2 2 1 ...
## $ pdays : int -1 339 330 -1 -1 176 330 -1 -1 147 ...
## $ previous : int 0 4 1 0 0 3 2 0 0 2 ...
## $ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
## $ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
2.5 Checks whether there is missing value in the data frame
is.null(bank)
## [1] FALSE
2.6 View data dimensions
dim(bank)
## [1] 4521 17
From the steps taken, it can be seen that there are 4521 rows with 17 columns without any null data in the dataset.
3. Exploratory Data Analysis
Summary Data
summary(bank)
## age job marital education default
## Min. :19.00 management :969 divorced: 528 primary : 678 no :4445
## 1st Qu.:33.00 blue-collar:946 married :2797 secondary:2306 yes: 76
## Median :39.00 technician :768 single :1196 tertiary :1350
## Mean :41.17 admin. :478 unknown : 187
## 3rd Qu.:49.00 services :417
## Max. :87.00 retired :230
## (Other) :713
## balance housing loan contact day
## Min. :-3313 no :1962 no :3830 cellular :2896 Min. : 1.00
## 1st Qu.: 69 yes:2559 yes: 691 telephone: 301 1st Qu.: 9.00
## Median : 444 unknown :1324 Median :16.00
## Mean : 1423 Mean :15.92
## 3rd Qu.: 1480 3rd Qu.:21.00
## Max. :71188 Max. :31.00
##
## month duration campaign pdays
## may :1398 Min. : 4 Min. : 1.000 Min. : -1.00
## jul : 706 1st Qu.: 104 1st Qu.: 1.000 1st Qu.: -1.00
## aug : 633 Median : 185 Median : 2.000 Median : -1.00
## jun : 531 Mean : 264 Mean : 2.794 Mean : 39.77
## nov : 389 3rd Qu.: 329 3rd Qu.: 3.000 3rd Qu.: -1.00
## apr : 293 Max. :3025 Max. :50.000 Max. :871.00
## (Other): 571
## previous poutcome y
## Min. : 0.0000 failure: 490 no :4000
## 1st Qu.: 0.0000 other : 197 yes: 521
## Median : 0.0000 success: 129
## Mean : 0.5426 unknown:3705
## 3rd Qu.: 0.0000
## Max. :25.0000
##
Insight:
- The minimum consumer age is 19 years and the maximum age is 87 years, with an average age of 41 years
- Most consumers come from management, blue-collar and technician workers
- The marital status of most consumers is from those who are married, single, then divorced
- Most consumer education comes from secondary or junior high school
- The average consumer cash per year is 1423 Euro
- Of all the consumers, the largest number of homeowners was 2559 people and those who did not have a home either lived in their parent’s house or rented out was 1962
- from the load column, it is concluded that most consumers do not/do not have loans
- The most used contact by consumers is mobile
- There is a minus balance
Amount of balance based on job & education
xtabs(balance~job+education,bank)
## education
## job primary secondary tertiary unknown
## admin. 6640 498984 53718 27038
## blue-collar 395647 559942 28626 42348
## entrepreneur 9982 74018 188771 3610
## housemaid 103005 56333 52636 21412
## management 104731 145012 1397982 64429
## retired 219568 219355 76779 17712
## self-employed 22076 88506 142205 2024
## services 27683 362595 30318 39754
## student 3575 75690 22338 28078
## technician 38895 599876 344274 39160
## unemployed 22703 69711 39193 7839
## unknown 2522 9832 19982 24729
How long it takes customers to subscribed a term deposit ?
bank_y <- (bank[bank$y == "yes",])
aggregate(x = duration ~ y + contact, data = bank_y, FUN = "mean")
## y contact duration
## 1 yes cellular 513.4495
## 2 yes telephone 498.2273
## 3 yes unknown 860.0328
What is the percentage of marketing campaign success?
prop.table(table(bank$poutcome))
##
## failure other success unknown
## 0.10838310 0.04357443 0.02853351 0.81950896
Insight: The percentage of success is only 0.028% and the percentage of failure is 0.108%
What is the percentage of marketing campaign results based on education?
prop.table(table(bank$education, bank$poutcome), margin = 2)
##
## failure other success unknown
## primary 0.12448980 0.11167513 0.11627907 0.15654521
## secondary 0.51632653 0.51776650 0.46511628 0.51039136
## tertiary 0.32040816 0.32994924 0.34108527 0.29257760
## unknown 0.03877551 0.04060914 0.07751938 0.04048583
Insight: The most successful marketing campaigns were carried out for consumers with secondary education but secondary was also the number of marketing that failed the most.
What is the percentage of marketing campaign results based on job?
prop.table(xtabs(formula = balance ~ job + poutcome, data = bank), margin = 2)
## poutcome
## job failure other success unknown
## admin. 0.080488710 0.087284986 0.156071801 0.089867731
## blue-collar 0.150851805 0.100847763 0.103088192 0.167018790
## entrepreneur 0.015238057 0.014973933 0.015524468 0.050255562
## housemaid 0.059484264 0.006578267 0.019075531 0.035102233
## management 0.333544697 0.341232481 0.140646747 0.257610333
## retired 0.089162490 0.035699395 0.145183995 0.081476851
## self-employed 0.014916668 0.091379476 0.010836110 0.042094174
## services 0.080619003 0.085374936 0.042978598 0.070793998
## student 0.024379651 0.102793447 0.026161750 0.014646838
## technician 0.141986929 0.091960331 0.321007341 0.157297093
## unemployed 0.008197281 0.034013848 0.015377335 0.023445463
## unknown 0.001130445 0.007861137 0.004048132 0.010390933
Insight: The most successful marketing campaigns were conducted for consumers with technician jobs
Conslusion
From bank data in Portugal, there are 4521 customers and 17 variables related to customer behavior. The percentage of successful marketing campaigns that have been carried out is 0.028% with the most successful marketing campaigns being carried out for consumers with secondary education and technician jobs. Based on calculations, one of the factors that cause success is the duration of the call so by increasing engagement and call duration, customers will be more interested in subscribing to a term deposit.