Loading the tidyverse library

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Description of the columns

1 - age

2 - job

3 - marital: marital status

4 - education

5 - default: has credit in default?

6 - balance: average yearly balance, in euros

7 - housing: has housing loan?

8 - loan: has personal loan?

9 - contact: contact communication type

10 - day: last contact day of the month

11 - month: last contact month of year

12 - duration: last contact duration, in seconds

13 - campaign: number of contacts performed during this campaign and for this client

14 - pdays: number of days that passed after the client was last contacted in a previous campaign (-1 means the client was not previously contacted)

15 - previous: number of contacts performed before this campaign and for this client

16 - poutcome: outcome of the previous marketing campaign

17 - y: has the client subscribed to a term deposit?

Loading the data into the data frame

Note: in the CSV file being used, the separator is ‘;’, not a comma.
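A minimal sketch of reading a semicolon-separated file with base R's `read.csv()`. The report does not show the file path, so this example reads three records typed inline; they are the first three rows listed in the `str()` output further down. With readr (loaded as part of the tidyverse), `read_delim(path, delim = ";")` would be the equivalent call.

```r
# The first three records of the data set (values taken from the str()
# output in this report), laid out with ';' as the field separator.
csv_text <- paste(
  '"age";"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"',
  '58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"',
  '44;"technician";"single";"secondary";"no";29;"yes";"no";"unknown";5;"may";151;1;-1;0;"unknown";"no"',
  '33;"entrepreneur";"married";"secondary";"no";2;"yes";"yes";"unknown";5;"may";76;1;-1;0;"unknown";"no"',
  sep = "\n"
)

# sep = ";" tells read.csv() how to split each line into fields;
# with the default sep = "," these lines would not be split correctly.
bank <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

dim(bank)  # 3 rows, 17 columns
```

For the real file, only the `text = csv_text` argument changes to the file path.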

Summary of the data frame

##       age            job              marital           education        
##  Min.   :18.00   Length:45211       Length:45211       Length:45211      
##  1st Qu.:33.00   Class :character   Class :character   Class :character  
##  Median :39.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :40.94                                                           
##  3rd Qu.:48.00                                                           
##  Max.   :95.00                                                           
##    default             balance         housing              loan          
##  Length:45211       Min.   : -8019   Length:45211       Length:45211      
##  Class :character   1st Qu.:    72   Class :character   Class :character  
##  Mode  :character   Median :   448   Mode  :character   Mode  :character  
##                     Mean   :  1362                                        
##                     3rd Qu.:  1428                                        
##                     Max.   :102127                                        
##    contact               day           month              duration     
##  Length:45211       Min.   : 1.00   Length:45211       Min.   :   0.0  
##  Class :character   1st Qu.: 8.00   Class :character   1st Qu.: 103.0  
##  Mode  :character   Median :16.00   Mode  :character   Median : 180.0  
##                     Mean   :15.81                      Mean   : 258.2  
##                     3rd Qu.:21.00                      3rd Qu.: 319.0  
##                     Max.   :31.00                      Max.   :4918.0  
##     campaign          pdays          previous          poutcome        
##  Min.   : 1.000   Min.   : -1.0   Min.   :  0.0000   Length:45211      
##  1st Qu.: 1.000   1st Qu.: -1.0   1st Qu.:  0.0000   Class :character  
##  Median : 2.000   Median : -1.0   Median :  0.0000   Mode  :character  
##  Mean   : 2.764   Mean   : 40.2   Mean   :  0.5803                     
##  3rd Qu.: 3.000   3rd Qu.: -1.0   3rd Qu.:  0.0000                     
##  Max.   :63.000   Max.   :871.0   Max.   :275.0000                     
##       y            
##  Length:45211      
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Getting the number of rows and columns of the data frame

## [1] 45211    17

Columns in the data frame

##  [1] "age"       "job"       "marital"   "education" "default"   "balance"  
##  [7] "housing"   "loan"      "contact"   "day"       "month"     "duration" 
## [13] "campaign"  "pdays"     "previous"  "poutcome"  "y"

We can see that there are 17 columns in the data frame.

Data types of the columns in the data frame

## 'data.frame':    45211 obs. of  17 variables:
##  $ age      : int  58 44 33 47 33 35 28 42 58 43 ...
##  $ job      : chr  "management" "technician" "entrepreneur" "blue-collar" ...
##  $ marital  : chr  "married" "single" "married" "married" ...
##  $ education: chr  "tertiary" "secondary" "secondary" "unknown" ...
##  $ default  : chr  "no" "no" "no" "no" ...
##  $ balance  : int  2143 29 2 1506 1 231 447 2 121 593 ...
##  $ housing  : chr  "yes" "yes" "yes" "yes" ...
##  $ loan     : chr  "no" "no" "yes" "no" ...
##  $ contact  : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ day      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ month    : chr  "may" "may" "may" "may" ...
##  $ duration : int  261 151 76 92 198 139 217 380 50 55 ...
##  $ campaign : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ y        : chr  "no" "no" "no" "no" ...
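The split between integer and character columns decides which summaries below make sense: quantiles, IQR and standard deviation for the numeric columns, unique values and counts for the character ones. A small sketch, using a one-row data frame built from the first record in the `str()` output as a stand-in for the full data frame (whose variable name the report does not show):

```r
# One-row stand-in for the full data frame; values are the first record
# listed by str() above.
first_row <- data.frame(
  age = 58L, job = "management", marital = "married", education = "tertiary",
  default = "no", balance = 2143L, housing = "yes", loan = "no",
  contact = "unknown", day = 5L, month = "may", duration = 261L,
  campaign = 1L, pdays = -1L, previous = 0L, poutcome = "unknown", y = "no",
  stringsAsFactors = FALSE
)

# class() of every column, then a count of each class.
col_classes <- vapply(first_row, class, character(1))
table(col_classes)

# Names of the numeric columns, i.e. the ones that get numeric summaries.
numeric_cols <- names(first_row)[vapply(first_row, is.numeric, logical(1))]
numeric_cols
```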

Summary of Columns

1. Summary of column ‘age’

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   33.00   39.00   40.94   48.00   95.00
##   0%  25%  50%  75% 100% 
##   18   33   39   48   95

IQR (interquartile range) of the column

## [1] 15

Standard deviation of the age column

## [1] 10.61876
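The statistics reported for each numeric column can be gathered by one helper. `numeric_summary()` below is a hypothetical name, not part of the original analysis; it is applied here to the ten ages shown by `str()` rather than the full column, so its values differ from the ones printed above. The last lines confirm that `sd()` is the sample standard deviation, dividing by n - 1.

```r
# Hypothetical helper: min, quartiles, mean, max, IQR and standard
# deviation of a numeric vector, in one named vector.
numeric_summary <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(min = q[1], q1 = q[2], median = q[3], mean = mean(x),
    q3 = q[4], max = q[5], iqr = IQR(x), sd = sd(x))
}

# The ten ages shown by str() above (a small sample, not the full column).
ages <- c(58, 44, 33, 47, 33, 35, 28, 42, 58, 43)
numeric_summary(ages)

# sd() divides by n - 1 (sample standard deviation).
n <- length(ages)
manual_sd <- sqrt(sum((ages - mean(ages))^2) / (n - 1))
all.equal(manual_sd, sd(ages))  # TRUE
```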

2. Summary of column ‘balance’

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -8019      72     448    1362    1428  102127
##     0%    25%    50%    75%   100% 
##  -8019     72    448   1428 102127

IQR of the column

## [1] 1356

Standard deviation of the balance column

## [1] 3044.766

3. Summary of column ‘duration’

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   103.0   180.0   258.2   319.0  4918.0
##   0%  25%  50%  75% 100% 
##    0  103  180  319 4918

IQR of the column

## [1] 216

Standard deviation of the duration column

## [1] 257.5278

4. Summary of column ‘day’

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    8.00   16.00   15.81   21.00   31.00
##   0%  25%  50%  75% 100% 
##    1    8   16   21   31

Standard deviation of the day column

## [1] 8.322476
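Each IQR is the distance between the third and first quartiles. R's `IQR()` uses the same default quantile type as `quantile()`, so the values can be checked against the quartiles printed above. The day column's IQR is not printed in this report; the last line derives it from the day quartiles.

$$
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\mathrm{IQR}_{\text{age}} &= 48 - 33 = 15 \\
\mathrm{IQR}_{\text{balance}} &= 1428 - 72 = 1356 \\
\mathrm{IQR}_{\text{duration}} &= 319 - 103 = 216 \\
\mathrm{IQR}_{\text{day}} &= 21 - 8 = 13
\end{aligned}
$$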

5. Summary of column ‘job’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values of the ‘job’ column and the number of rows for each

##  [1] "management"    "technician"    "entrepreneur"  "blue-collar"  
##  [5] "unknown"       "retired"       "admin."        "services"     
##  [9] "self-employed" "unemployed"    "housemaid"     "student"
## 
##        admin.   blue-collar  entrepreneur     housemaid    management 
##          5171          9732          1487          1240          9458 
##       retired self-employed      services       student    technician 
##          2264          1579          4154           938          7597 
##    unemployed       unknown 
##          1303           288
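The unique values and the per-value counts come from `unique()` and `table()`. A sketch on a short, made-up job vector (not the real column), also showing how to sort the counts and turn them into shares:

```r
# Made-up example vector in the same format as the job column.
jobs <- c("management", "technician", "blue-collar", "management",
          "unknown", "blue-collar", "management")

unique(jobs)                     # distinct values, in order of first appearance
length(unique(jobs))             # number of distinct values
counts <- table(jobs)            # count per value, labels sorted alphabetically
sort(counts, decreasing = TRUE)  # most frequent value first
round(prop.table(counts), 3)     # share of rows per value
```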

6. Summary of column ‘education’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values for the ‘education’ column

## [1] "tertiary"  "secondary" "unknown"   "primary"

Finding the number of rows for each unique value

## 
##   primary secondary  tertiary   unknown 
##      6851     23202     13301      1857

7. Summary of column ‘marital’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values for the ‘marital’ column

## [1] "married"  "single"   "divorced"

Finding the number of rows for each unique value

## 
## divorced  married   single 
##     5207    27214    12790

8. Summary of column ‘housing’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values for the ‘housing’ column

## [1] "yes" "no"

Finding the number of rows for each unique value

## 
##    no   yes 
## 20081 25130
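Counts are easier to compare as shares of all 45,211 rows. A short sketch using the housing counts reported above; `prop.table()` divides each count by the total, and the same call works on any of the count tables in this section.

```r
# Counts copied from the housing table above.
housing_counts <- c(no = 20081, yes = 25130)

sum(housing_counts)                   # 45211, the number of rows
round(prop.table(housing_counts), 3)  # share without / with a housing loan
```

So roughly 56% of the clients have a housing loan.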

9. Summary of column ‘month’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values for the ‘month’ column

##  [1] "apr" "aug" "dec" "feb" "jan" "jul" "jun" "mar" "may" "nov" "oct" "sep"

Finding the number of rows for each unique value

## 
##   apr   aug   dec   feb   jan   jul   jun   mar   may   nov   oct   sep 
##  2932  6247   214  2649  1403  6895  5341   477 13766  3970   738   579
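`table()` lists the months alphabetically (apr, aug, dec, ...), which hides any seasonal pattern. A sketch of putting them in calendar order with a factor, assuming the labels are the lower-case three-letter abbreviations shown above; the example vector is made up.

```r
# Made-up month labels in the same lower-case, three-letter format as the data.
months <- c("may", "aug", "jan", "may", "dec", "may", "aug")

# table() on a character vector sorts the labels alphabetically.
table(months)

# A factor with calendar-ordered levels fixes the order.
# month.abb is base R's built-in c("Jan", "Feb", ..., "Dec").
month_levels <- tolower(month.abb)
months_f <- factor(months, levels = month_levels)
table(months_f)  # jan ... dec, with 0 for months that do not occur
```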

10. Summary of column ‘contact’

##    Length     Class      Mode 
##     45211 character character

Finding the unique values for the ‘contact’ column

## [1] "cellular"  "telephone" "unknown"

Finding the number of rows for each unique value

## 
##  cellular telephone   unknown 
##     29285      2906     13020

Visual Representation

Visual relationship between the age and balance

Visual relationship between the age and marital status

Visual relationship between the job and education

Visual relationship between the job and marital status

Visual relationship between the job and housing loan

Questions Answered

What is the aim of the data set?

Using the data set, we can find out how job type, age, existing loans and the other attributes affect whether a client subscribes to a term deposit (column y).

What can you infer from the visual relationship between job and housing loan?

We can see that the majority of people in management jobs do not have a housing loan, while the majority of people in blue-collar jobs do.
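The statement about housing loans can also be checked numerically with a contingency table and row-wise proportions. A sketch with made-up rows (not the real data); on the full data frame, the job and housing columns would be passed instead.

```r
# Made-up rows for illustration only.
job     <- c("management", "management", "management", "blue-collar",
             "blue-collar", "blue-collar", "technician", "technician")
housing <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no")

# Rows = jobs, columns = housing loan: the counts a stacked bar chart shows.
counts <- table(job, housing)
counts

# margin = 1 converts each row to shares, so each job's row sums to 1.
shares <- prop.table(counts, margin = 1)
round(shares, 2)

# Jobs where more than half of the people have a housing loan.
rownames(shares)[shares[, "yes"] > 0.5]
```

Using `margin = 2` instead would give column-wise shares, e.g. the job mix within each education level for the job and education question.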

What can you infer from the visual relationship between job and education?

We can see that the majority of people with tertiary education have a management job, while most technicians and admins have secondary education.

How many unique jobs are considered?

There are 12 unique values in the job column; one of them is ‘unknown’, so 11 actual job categories are represented.
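Counting the job categories with and without the ‘unknown’ placeholder, using the 12 distinct values printed in the job summary:

```r
# The 12 distinct job values listed in the summary of the 'job' column.
job_values <- c("management", "technician", "entrepreneur", "blue-collar",
                "unknown", "retired", "admin.", "services",
                "self-employed", "unemployed", "housemaid", "student")

length(job_values)                      # 12 distinct values
length(setdiff(job_values, "unknown"))  # 11 actual job categories
```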

What is the average duration of the last contact with a client?

258.2 seconds (about 4.3 minutes).