About

Column

About this dashboard

The dataset contains information about marketing campaigns that were conducted via phone calls from a Portuguese banking institution to their clients. Purpose of these campaigns is to prompt their clients to subscribe for a specific financial product of the bank (term deposit). After each call was conducted, the client had to inform the institution about their intention of either subscribing to the product (indicating a successful campaign) or not (unsucessful campaign). The final output of this survey will be a binary result indicating if the client subscribed (‘yes’) to the product or not (‘no’).

The dataset has 41188 rows (instances of calls to clients) and 21 columns (variables) which are describing certain aspects of the call. Please note that there are cases where the same client was contacted multiple times - something that practically doesn’t affect the analysis as each call will be considered independent from another even if the client is the same.

In the StoryBoard pages we will see how successful the campaign was - plus, some other insights into the data so we can get a better understanding of what lies within.

Column

Dataset Content

1. Variables that describing attributes related directly to the client:

1. age
1. job: type of job (e.g. ‘admin’, ‘technician’, ‘unemployed’, etc)
1. marital: marital status (‘married’, ‘single’, ‘divorced’, ‘unknown’)
1. education: level of education (‘basic.4y’, ‘high.school’, ‘basic.6y’, ‘basic.9y’,‘professional.course’, ‘unknown’,‘university.degree’,‘illiterate’)
1. default: if the client has credit in default (‘no’, ‘unknown’, ‘yes’)
1. housing: if the client has housing a loan (‘no’, ‘unknown’, ‘yes’)
1. loan: if the client has a personal loan ? (‘no’, ‘unknown’, ‘yes’)

2. Variables related to the last contact of the current campaign:

1. contact: type of communication (‘telephone’, ‘cellular’)
1. month: month of last contact
1. day_of_week: day of last contact
1. duration: call duration (in seconds)

3. Other variables related to the campaign(s):

1. campaign: number of contacts performed during this campaign and for this client
1. pdays: number of days passed by after the client was last contacted from a previous campaign
1. previous: number of contacts performed before this campaign and for this client
1. poutcome: outcome of previous marketing campaign (‘nonexistent’, ‘failure’, ‘success’)

4. Socioeconomic variables:

1. emp.var.rate: employement variation rate - quarterly indicator
1. cons.price.idx: consumer price index - monthly indicator
1. cons.conf.idx: consumer confidence index - monthly indicator
1. euribor3m: euribor 3 month rate - daily indicator
1. nr.employed: number of employees - quarterly indicator

Storyboard

Subscribe success by month & marital status

Note that May, June, July and November are successful periods to get people to subscribe. We can also deduce that the “married” marital status is more likely to get people to subscribe followed by single then divorced. Another clear indicator is that people who do not subscribe irrespective of marital status keep the call duration to a minimum.

Chart Helper

The red line is count of non subscribed and the blue line is count of subscribed (by month, and marital status).
The Boxplot is a statistical summary of call duration in seconds by month, maritaL status and subscribed (yes or no)

Histogram - Age (bin width = 10)

Here we can clearly see that the 20 to 30 years old single are the most likely to subscribe - followed by 20 to 40 years old married

Percent of subscribed success

Decision tree

Heat map for Jobs v Marital status that have subscribed

Subscribed scatter plot - areas denoteing marital status and age

Subscribed over time by marital status and by call duration

Polar chart

Column

Call duration by jobs and subscribed

Column

Call duration by education and subscribed

Subscribed / Job - Education - percentages by all, column and row

## Row

Job percentages by marital status - all %

           job      divorced        married         single     unknown
        admin. 132 (2.8448%) 652 (14.0517%) 566 (12.1983%) 2 (0.0431%)
   blue-collar  53 (1.1422%) 421  (9.0733%) 161  (3.4698%) 3 (0.0647%)
  entrepreneur  14 (0.3017%)  88  (1.8966%)  21  (0.4526%) 1 (0.0216%)
     housemaid  16 (0.3448%)  74  (1.5948%)  16  (0.3448%) 0 (0.0000%)
    management  39 (0.8405%) 226  (4.8707%)  63  (1.3578%) 0 (0.0000%)
       retired  92 (1.9828%) 329  (7.0905%)  12  (0.2586%) 1 (0.0216%)
 self-employed  16 (0.3448%)  82  (1.7672%)  51  (1.0991%) 0 (0.0000%)
      services  33 (0.7112%) 166  (3.5776%) 124  (2.6724%) 0 (0.0000%)
       student   3 (0.0647%)   8  (0.1724%) 264  (5.6897%) 0 (0.0000%)
    technician  65 (1.4009%) 384  (8.2759%) 279  (6.0129%) 2 (0.0431%)
    unemployed  10 (0.2155%)  86  (1.8534%)  48  (1.0345%) 0 (0.0000%)
       unknown   3 (0.0647%)  16  (0.3448%)  15  (0.3233%) 3 (0.0647%)

Job percentages by marital status - column %

           job       divorced        married         single      unknown
        admin. 132 (27.7311%) 652 (25.7504%) 566 (34.9383%) 2 (16.6667%)
   blue-collar  53 (11.1345%) 421 (16.6272%) 161  (9.9383%) 3 (25.0000%)
  entrepreneur  14  (2.9412%)  88  (3.4755%)  21  (1.2963%) 1  (8.3333%)
     housemaid  16  (3.3613%)  74  (2.9226%)  16  (0.9877%) 0  (0.0000%)
    management  39  (8.1933%) 226  (8.9258%)  63  (3.8889%) 0  (0.0000%)
       retired  92 (19.3277%) 329 (12.9937%)  12  (0.7407%) 1  (8.3333%)
 self-employed  16  (3.3613%)  82  (3.2385%)  51  (3.1481%) 0  (0.0000%)
      services  33  (6.9328%) 166  (6.5561%) 124  (7.6543%) 0  (0.0000%)
       student   3  (0.6303%)   8  (0.3160%) 264 (16.2963%) 0  (0.0000%)
    technician  65 (13.6555%) 384 (15.1659%) 279 (17.2222%) 2 (16.6667%)
    unemployed  10  (2.1008%)  86  (3.3965%)  48  (2.9630%) 0  (0.0000%)
       unknown   3  (0.6303%)  16  (0.6319%)  15  (0.9259%) 3 (25.0000%)

Job percentages by marital status - row %

           job       divorced        married         single     unknown
        admin. 132  (9.7633%) 652 (48.2249%) 566 (41.8639%) 2 (0.1479%)
   blue-collar  53  (8.3072%) 421 (65.9875%) 161 (25.2351%) 3 (0.4702%)
  entrepreneur  14 (11.2903%)  88 (70.9677%)  21 (16.9355%) 1 (0.8065%)
     housemaid  16 (15.0943%)  74 (69.8113%)  16 (15.0943%) 0 (0.0000%)
    management  39 (11.8902%) 226 (68.9024%)  63 (19.2073%) 0 (0.0000%)
       retired  92 (21.1982%) 329 (75.8065%)  12  (2.7650%) 1 (0.2304%)
 self-employed  16 (10.7383%)  82 (55.0336%)  51 (34.2282%) 0 (0.0000%)
      services  33 (10.2167%) 166 (51.3932%) 124 (38.3901%) 0 (0.0000%)
       student   3  (1.0909%)   8  (2.9091%) 264 (96.0000%) 0 (0.0000%)
    technician  65  (8.9041%) 384 (52.6027%) 279 (38.2192%) 2 (0.2740%)
    unemployed  10  (6.9444%)  86 (59.7222%)  48 (33.3333%) 0 (0.0000%)
       unknown   3  (8.1081%)  16 (43.2432%)  15 (40.5405%) 3 (8.1081%)

---
title: "Banking Campaign by Phil Sivyer"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source_code: embed
    theme: readable
    css: fd.css
    keep_tex: yes
    

---

```{r setup, include=T}
library(flexdashboard)
library(tidyverse)
library(plotly)
library(dplyr)
library(lubridate)
library(plotly)
library(ggplot2)
library(modelr)
library(corrplot)
library(corrgram)
library(janitor)
library(DT)
library(knitr)
library(formattable)
library(rmarkdown)
library(sqldf)
library(party)
library(partykit)
library(caret)
library(packrat)
library(rsconnect)
library(skimr)
library(sjmisc)
library(kableExtra)
library(highcharter)
library(psych)
library(ggcorrplot)
library(visreg)
library(GGally)
library(scales)
library(themis)
library(tidymodels)
library(kknn)
library(tidymetrics)
library(ggforce)
library(tidyquant)
library(concaveman)
library(ggthemes)

bank.df <- read.table("C:/Reference Material/Machine_Learning/Bank_Campaign.csv", sep=";", header=T)
bank.df <- bank.df %>% 
  mutate(across(where(is.character),as.factor))

bank.df$month <- factor(bank.df$month, levels=c('jan','feb','mar','apr','may','jun',
                                                 'jul','aug','sep','oct','nov','dec'))

```


```{r setup-data, cache=F,include=F}
####################
####  Load Data ####
####################


```


About {data-navmenu="Explore"}
===================================== 

Column {data-width=250}
-------------------------------------

### About this dashboard

<font style="font-size: 16px">The dataset contains information about marketing campaigns that were conducted via phone calls from a Portuguese banking institution to their clients. Purpose of these campaigns is to prompt their clients to subscribe for a specific financial product of the bank (term deposit). After each call was conducted, the client had to inform the institution about their intention of either subscribing to the product (indicating a successful campaign) or not (unsucessful campaign).
The final output of this survey will be a binary result indicating if the client subscribed ('yes') to the product or not ('no').

The dataset has 41188 rows (instances of calls to clients) and 21 columns (variables) which are describing certain aspects of the call. Please note that there are cases where the same client was contacted multiple times - something that practically doesn't affect the analysis as each call will be considered independent from another even if the client is the same.

**In the StoryBoard pages we will see how successful the campaign was - plus, some other insights into the data so we can get a better understanding of what lies within.**</font>


Column {.tabset}
-------------------------------------

### Dataset Content

**1. Variables that describing attributes related directly to the client:**

- a. age
- b. job: type of job (e.g. 'admin', 'technician', 'unemployed', etc)
- c. marital: marital status ('married', 'single', 'divorced', 'unknown')
- d. education: level of education ('basic.4y', 'high.school', 'basic.6y', 'basic.9y','professional.course', 'unknown','university.degree','illiterate')
- e. default: if the client has credit in default ('no', 'unknown', 'yes')
- f. housing: if the client has housing a loan ('no', 'unknown', 'yes')
- g. loan: if the client has a personal loan ? ('no', 'unknown', 'yes')

**2. Variables related to the last contact of the current campaign:**

- a. contact: type of communication ('telephone', 'cellular')
- b. month: month of last contact
- c. day_of_week: day of last contact
- d. duration: call duration (in seconds)

**3. Other variables related to the campaign(s):**

- a. campaign: number of contacts performed during this campaign and for this client
- b. pdays: number of days passed by after the client was last contacted from a previous campaign
- c. previous: number of contacts performed before this campaign and for this client
- d. poutcome: outcome of previous marketing campaign ('nonexistent', 'failure', 'success')

**4. Socioeconomic variables:**
 
 
 
- a. emp.var.rate: employement variation rate - quarterly indicator
- b. cons.price.idx: consumer price index - monthly indicator
- c. cons.conf.idx: consumer confidence index - monthly indicator
- d. euribor3m: euribor 3 month rate - daily indicator
- e. nr.employed: number of employees - quarterly indicator
```{r }

```

Storyboard {.storyboard data-navmenu="Explore"}
=========================================

### Subscribe success by month & marital status {data-commentary-width=210}

```{r}
subscribed <- bank.df %>% 
  filter(subscribed == 'yes',marital != 'unknown') %>% 
  group_by(month,marital) %>% 
    mutate(yes= sum(n())) %>% 
  ungroup() %>% 
  distinct(month,marital,yes)

subscribed$month <- factor(subscribed$month, levels=c('jan','feb','mar','apr','may','jun',
                                                'jul','aug','sep','oct','nov','dec'))
#########################################################################################

not_subscribed <- bank.df %>% 
  filter(subscribed == 'no',marital != 'unknown') %>% 
  group_by(month,marital) %>% 
  mutate(no= sum(n())) %>% 
  ungroup() %>% 
  distinct(month,marital,no)

not_subscribed$month <- factor(not_subscribed$month, levels=c('jan','feb','mar','apr','may','jun',
                                                      'jul','aug','sep','oct','nov','dec'))

p1 <-ggplot() + 
  geom_boxplot(data = bank.df %>% 
                 filter(marital != 'unknown'),
               aes(x = month, y = duration, 
                   group = interaction(month, subscribed), 
                   fill = subscribed), 
               width = 10) +
  coord_cartesian(ylim = c(0, 4000))+
  facet_grid(.~marital)+
  labs(x="month", y = "call duration in seconds & count of yes/no subscribed")+
  theme(axis.text.x=element_text(angle=-90, vjust=0.4,hjust=1))+
  geom_line(data = subscribed,
            aes(x = month, y = yes, group = 1), size = 0.7, color = '#499FCA')+
  geom_line(data = not_subscribed,
            aes(x = month, y = no, group = 1), size = 0.7, color = '#E6565B')

ggplotly(p1)%>%
  config(displayModeBar = FALSE)
  
```
*** 
Note that **May, June, July and November** are successful periods to get people to subscribe. We can also deduce that the "married" marital status is more likely to get people to subscribe followed by single then divorced. Another clear indicator is that people who do not subscribe irrespective of marital status keep the call duration to a minimum.

**Chart Helper**

- *The red line is count of non subscribed and the blue line is count of subscribed (by month, and marital status).*

- *The Boxplot is a statistical summary of call duration in seconds by month, maritaL status and subscribed (yes or no)*

### Histogram - Age (bin width = 10) {data-commentary-width=210}

```{r sm-1-jan22-2017,fig.width=10}
p <- ggplot(bank.df %>% filter(marital != 'unknown'), aes(x=age,fill = subscribed)) + geom_histogram(binwidth=10,colour="white")

p <- p + facet_grid(subscribed ~ marital, scales="free", space="free")

ggplotly(p)%>%
  config(displayModeBar = FALSE)

```

*** 
Here we can clearly see that the **20 to 30 years old single** are the most likely to subscribe - followed by **20 to 40 years old married**


### Percent of subscribed success

```{r sm-2-jan22-2017,fig.width=10}
figaa <- plot_ly()
figaa <- figaa %>% add_pie(data = count(bank.df %>% 
                                          filter(subscribed == 'yes'), job), labels = ~job, values = ~n,
                         name = "Job", domain = list(x = c(0, 0.4), y = c(0.4, 1)))
figaa <- figaa %>% add_pie(data = count(bank.df%>% 
                                          filter(subscribed == 'yes'),marital), labels = ~marital, values = ~n,
                         name = "Marital", domain = list(x = c(0.6, 1), y = c(0.4, 1)))
figaa <- figaa %>% add_pie(data = count(bank.df%>% 
                                          filter(subscribed == 'yes'),education ), labels = ~education, values = ~n,
                         name = "Education", domain = list(x = c(0.25, 0.75), y = c(0, 0.6)))
figaa <- figaa %>% layout(title = "Left to right - Job - Education - Marital Status", showlegend = F,
                        xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                        yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

figaa%>%
  config(displayModeBar = FALSE)
```

### Decision tree

```{r sm-3-jan22-2017,fig.width=10}

d_tree <- bank.df %>%
  mutate(education = case_when(education %in% c('university.degree','professional.course') ~ 'uni',  TRUE ~ 'non_uni')) %>% 
  mutate(job = case_when(job %in% c('management','technician','entrepeneur','self-employed') ~ 'prof',  TRUE ~ 'non_prof')) %>% 
  mutate(age = case_when(age > 60 ~ '>60'
                                ,age > 40 ~ '41_60'
                                ,age >= 20 ~ '20_40'
                                ,age <= 20 ~ '<20'
                                ,TRUE ~ 'NA'))%>% 
  select(subscribed,education,job,marital,age)

d_tree <- d_tree %>% 
dplyr::mutate(across(where(is.character),as.factor))


pd <- sample(2,nrow(d_tree),replace = TRUE, prob = c(0.8,0.2))
train <- d_tree[pd == 1,]
validate <- d_tree[pd == 2,]

tree <- ctree(subscribed ~., data = d_tree
               , maxdepth = 4, alpha = 0.5
              # , controls = ctree_control(mincriterion = 0.9, minsplit = 50)
              
)
plot(tree, gp = gpar(fontsize = 7),inner_panel=node_inner(tree,pval = FALSE, id = FALSE, fill = "orange"), terminal_panel = node_barplot(tree,id = FALSE),
     ip_args = list(id = FALSE, fill = "red"),
     ep_args = list(fill = "yellow"))


```

### Heat map for Jobs v Marital status that have subscribed

```{r sm-4-jan22-2017,fig.width=10}
tile <- bank.df %>% 
  filter(subscribed == 'yes', marital != "unknown") %>% 
  group_by(job,marital) %>% 
  count(marital,name = 'n')


pl1<-tile %>% 
  ggplot(aes(job,marital,fill = n))+
  geom_tile()+
  geom_label(aes(label = n),fill = "white")+
  scale_fill_viridis_c()+
  theme_linedraw()+
  labs(title = "Subscribed data")
  

ggplotly(pl1)

```


### Subscribed scatter plot - areas denoteing marital status and age

```{r sm-5-jan22-2017,fig.width=10}
library(ggforce)
library(tidyquant)
library(concaveman)

g1 <- bank.df %>% 
  filter(marital != 'unknown', subscribed == 'yes') %>% 
  ggplot(aes(duration,age))+
  geom_point(aes(color = marital))+
  geom_mark_hull(aes(fill= marital, label =marital),
                 concavity = 2.0)+
  geom_smooth(se = TRUE, span = 0.3)+
  expand_limits(y = 100) +
  theme_fivethirtyeight()+
  theme(axis.title = element_text(),text = element_text(family = "Rubik"))+
  scale_fill_tq() +
  labs(title = "Scatter Plot - Duration of call & Age of caller", subtitle = "",
       y = "Duration of call",
       x = "Age of caller",
       caption = "Bank of Portugal")

g1
```

### Subscribed over time by marital status and by call duration

```{r sm-6-jan22-2017,fig.width=10}

b1 <- bank.df %>% 
  filter(subscribed != 'yes', marital != "unknown") %>% 
  group_by(month, marital) %>% 
  summarise(total_days = sum(duration)/86400)

hc <- b1 %>% 
  hchart('line', hcaes(x = 'month', y = 'total_days', group = "marital"),
         style = list(fontFamily = "Rubik"))%>%
  hc_plotOptions(line = list(
    lineWidth = 4 # set the line width to 3 pixels
    #, dashStyle = "LongDashDot"
  ))%>%
  hc_colors(c("red", "green","black")) %>% 
  hc_chart(
    backgroundColor = "lightgrey" # set the background color to light grey
  ) %>%
  hc_xAxis(
    gridLineWidth = 0 # remove the x-axis gridlines
  ) %>%
  hc_yAxis(
    gridLineWidth = 0 # remove the y-axis gridlines
  )
hc

```

Polar chart {data-navmenu="Explore"}
===================================== 

Column {data-width=600}
-------------------------------------

### Call duration by jobs and subscribed

```{r, eval=T}
b3 <- bank.df %>% 
  filter(subscribed == 'yes') %>% 
  group_by(job) %>% 
  summarise(duration = as.numeric(sprintf("%.2f",sum(duration)/86400)))

highchart() %>% 
  hc_chart(polar = TRUE) %>% 
  hc_title(text = "Which profession has held the longest phone call duration (days)",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>% 
  hc_subtitle(text = "‘Data is subscribed only & taken from 1 years data source - red denotes higher than the overall mean’",
              style = list(fontWeight = "bold"),
              align = "center") %>% 
  hc_xAxis(categories = b3$job,
           style = list(fontWeight = "bold")) %>% 
  hc_credits(enabled = TRUE,
             text = "Data Source taken from 'KAGGLE'") %>% 
  hc_add_theme(hc_theme_ffx()) %>%
  hc_legend(enabled = FALSE) %>% 
  hc_series(
    list(
      name = "Bars",
      data = b3$duration,
      colorByPoint = TRUE,
      type = "column",
      colors = ifelse(b3$duration < mean(b3$duration),"green","red")
      ),
      list(
        name = "line",
        data = b3$duration,
        pointPlacement = "on",
        type = "line"))

```
Column {data-width=600}
-------------------------------------

### Call duration by education and subscribed

```{r,eval=T}

b4 <- bank.df %>% 
  filter(subscribed == 'yes') %>% 
  group_by(education) %>% 
  summarise(duration = as.numeric(sprintf("%.2f",sum(duration)/86400)))

highchart() %>% 
  hc_chart(polar = TRUE) %>% 
  hc_title(text = "Which education has held the longest phone call duration (days)",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>% 
  hc_subtitle(text = "‘Data is subscribed only & taken from 1 years data source - red denotes higher than the overall mean’",
              style = list(fontWeight = "bold"),
              align = "center") %>% 
  hc_xAxis(categories = b4$education,
           style = list(fontWeight = "bold")) %>% 
  hc_credits(enabled = TRUE,
             text = "Data Source taken from 'KAGGLE'") %>% 
  hc_add_theme(hc_theme_ffx()) %>%
  hc_legend(enabled = FALSE) %>% 
  hc_series(
    list(
      name = "Bars",
      data = b4$duration,
      colorByPoint = TRUE,
      type = "column",
      colors = ifelse(b4$duration < mean(b4$duration),"green","red")
      ),
      list(
        name = "line",
        data = b4$duration,
        pointPlacement = "on",
        type = "line"))


```


# Subscribed / Job - Education - percentages by all, column and row 

## Row {.tabset .tabset-fade} 
-----------------------------------------------------------------------

### **Job percentages by marital status - all %** 


```{r,eval=T}
perc_all <- bank.df %>% 
  filter(subscribed == 'yes') %>% 
  select(marital,job)

##
## GENDER RACE %
gr1 <-perc_all %>%
  tabyl(job,marital) %>%
  adorn_percentages("all") %>% ## CAN BE col, row, all
  adorn_pct_formatting(digits = 4) %>%
  adorn_ns("front")
print(gr1)
```


### **Job percentages by marital status - column %**


```{r,eval=T}

gr2 <-perc_all %>%
  tabyl(job,marital) %>%
  adorn_percentages("col") %>% ## CAN BE col, row, all
  adorn_pct_formatting(digits = 4) %>%
  adorn_ns("front")
print(gr2)
```

### **Job percentages by marital status - row %**


```{r,eval=T}

gr3 <-perc_all %>%
  tabyl(job,marital) %>%
  adorn_percentages("row") %>% ## CAN BE col, row, all
  adorn_pct_formatting(digits = 4) %>%
  adorn_ns("front")
print(gr3)
```