Loading the tidyverse library
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
1 - age
2 - job
3 - marital: marital status
4 - education
5 - default: has credit in default?
6 - balance: average yearly balance, in euros
7 - housing: has housing loan?
8 - loan: has personal loan?
9 - contact: contact communication type
10 - day: last contact day of the month
11 - month: last contact month of year
12 - duration: last contact duration, in seconds
13 - campaign: number of contacts performed during this campaign and for this client
14 - pdays: number of days that passed after the client was last contacted in a previous campaign (-1 means the client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client
16 - poutcome: outcome of the previous marketing campaign
17 - y: has the client subscribed to a term deposit?
Note: in the CSV file being used, the separator is ‘;’, not a comma.
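Because of the ‘;’ separator, the file has to be read with the separator given explicitly. The sketch below is a minimal base R illustration, not the original chunk: it writes a tiny stand-in file (a few values copied from the str() output further down) to a temporary path, and the name `data_table` is taken from the text. With readr, `read_delim(file, delim = ";")` or `read_csv2()` would be the likely equivalents.

```r
# Tiny stand-in for the real CSV: three rows, ';' as the field separator.
csv_lines <- c(
  "age;job;balance;y",
  "58;management;2143;no",
  "44;technician;29;no",
  "33;entrepreneur;2;no"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read.csv() assumes ',' by default, so every row collapses into one column.
wrong <- read.csv(tmp)
ncol(wrong)

# Passing sep = ";" splits the fields correctly.
data_table <- read.csv(tmp, sep = ";")
dim(data_table)

# read.csv2() is a shortcut that defaults to sep = ";" (and dec = ",").
data_table2 <- read.csv2(tmp)
names(data_table2)
```

If the number of columns comes back as 1, the separator is the first thing to check.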
## age job marital education
## Min. :18.00 Length:45211 Length:45211 Length:45211
## 1st Qu.:33.00 Class :character Class :character Class :character
## Median :39.00 Mode :character Mode :character Mode :character
## Mean :40.94
## 3rd Qu.:48.00
## Max. :95.00
## default balance housing loan
## Length:45211 Min. : -8019 Length:45211 Length:45211
## Class :character 1st Qu.: 72 Class :character Class :character
## Mode :character Median : 448 Mode :character Mode :character
## Mean : 1362
## 3rd Qu.: 1428
## Max. :102127
## contact day month duration
## Length:45211 Min. : 1.00 Length:45211 Min. : 0.0
## Class :character 1st Qu.: 8.00 Class :character 1st Qu.: 103.0
## Mode :character Median :16.00 Mode :character Median : 180.0
## Mean :15.81 Mean : 258.2
## 3rd Qu.:21.00 3rd Qu.: 319.0
## Max. :31.00 Max. :4918.0
## campaign pdays previous poutcome
## Min. : 1.000 Min. : -1.0 Min. : 0.0000 Length:45211
## 1st Qu.: 1.000 1st Qu.: -1.0 1st Qu.: 0.0000 Class :character
## Median : 2.000 Median : -1.0 Median : 0.0000 Mode :character
## Mean : 2.764 Mean : 40.2 Mean : 0.5803
## 3rd Qu.: 3.000 3rd Qu.: -1.0 3rd Qu.: 0.0000
## Max. :63.000 Max. :871.0 Max. :275.0000
## y
## Length:45211
## Class :character
## Mode :character
##
##
##
## [1] 45211 17
## [1] "age" "job" "marital" "education" "default" "balance"
## [7] "housing" "loan" "contact" "day" "month" "duration"
## [13] "campaign" "pdays" "previous" "poutcome" "y"
We can see that the data_table has 45211 rows and 17 columns.
## 'data.frame': 45211 obs. of 17 variables:
## $ age : int 58 44 33 47 33 35 28 42 58 43 ...
## $ job : chr "management" "technician" "entrepreneur" "blue-collar" ...
## $ marital : chr "married" "single" "married" "married" ...
## $ education: chr "tertiary" "secondary" "secondary" "unknown" ...
## $ default : chr "no" "no" "no" "no" ...
## $ balance : int 2143 29 2 1506 1 231 447 2 121 593 ...
## $ housing : chr "yes" "yes" "yes" "yes" ...
## $ loan : chr "no" "no" "yes" "no" ...
## $ contact : chr "unknown" "unknown" "unknown" "unknown" ...
## $ day : int 5 5 5 5 5 5 5 5 5 5 ...
## $ month : chr "may" "may" "may" "may" ...
## $ duration : int 261 151 76 92 198 139 217 380 50 55 ...
## $ campaign : int 1 1 1 1 1 1 1 1 1 1 ...
## $ pdays : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ previous : int 0 0 0 0 0 0 0 0 0 0 ...
## $ poutcome : chr "unknown" "unknown" "unknown" "unknown" ...
## $ y : chr "no" "no" "no" "no" ...
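The overview above (dimensions, column names, structure) can be reproduced with a handful of base R calls. This is a sketch on a three-row stand-in built from the first values shown by str(); the real `data_table` is assumed to be a plain data.frame, as the str() header suggests.

```r
# Three-row stand-in for data_table, using values visible in the str() output.
data_table <- data.frame(
  age     = c(58L, 44L, 33L),
  job     = c("management", "technician", "entrepreneur"),
  balance = c(2143L, 29L, 2L),
  stringsAsFactors = FALSE
)

dim(data_table)      # number of rows, then number of columns
names(data_table)    # column names
str(data_table)      # type and first values of every column
summary(data_table)  # five-number summary for numeric columns, length/class for character ones
```

Character columns only report Length, Class, and Mode in summary(), which is why the categorical columns are explored with unique values and counts instead.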
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 33.00 39.00 40.94 48.00 95.00
## 0% 25% 50% 75% 100%
## 18 33 39 48 95
IQR of the age column
## [1] 15
Standard deviation of the age column
## [1] 10.61876
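The same four statistics are reported for each numeric column. As a sketch of how they relate, the block below runs them on the first ten age values shown by str() (so the numbers will differ from the full-column results above). The IQR is the distance between the 75% and 25% quantiles, and sd() uses the sample formula with n - 1 in the denominator.

```r
# First ten ages from the str() output; a small sample, not the full column.
age <- c(58, 44, 33, 47, 33, 35, 28, 42, 58, 43)

summary(age)        # min, quartiles, mean, max
q <- quantile(age)  # 0%, 25%, 50%, 75%, 100%
q

# IQR() is the difference between the third and first quartile.
IQR(age)
unname(q["75%"] - q["25%"])

# sd() is the sample standard deviation (divides by n - 1).
sd(age)
sqrt(sum((age - mean(age))^2) / (length(age) - 1))
```

On the full column this matches the output above: 48 - 33 gives the IQR of 15.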
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8019 72 448 1362 1428 102127
## 0% 25% 50% 75% 100%
## -8019 72 448 1428 102127
IQR of the balance column
## [1] 1356
Standard deviation of the balance column
## [1] 3044.766
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 103.0 180.0 258.2 319.0 4918.0
## 0% 25% 50% 75% 100%
## 0 103 180 319 4918
IQR of the duration column
## [1] 216
Standard deviation of the duration column
## [1] 257.5278
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 8.00 16.00 15.81 21.00 31.00
## 0% 25% 50% 75% 100%
## 1 8 16 21 31
Standard deviation of the day column
## [1] 8.322476
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘job’ column
## [1] "management" "technician" "entrepreneur" "blue-collar"
## [5] "unknown" "retired" "admin." "services"
## [9] "self-employed" "unemployed" "housemaid" "student"
Counting the rows for each unique value
##
## admin. blue-collar entrepreneur housemaid management
## 5171 9732 1487 1240 9458
## retired self-employed services student technician
## 2264 1579 4154 938 7597
## unemployed unknown
## 1303 288
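For the categorical columns, the unique values and the counts per value come from two base R functions. A minimal sketch, using a short made-up vector of job labels rather than the real column:

```r
# Short illustrative vector; the labels are real job categories, the mix is made up.
job <- c("management", "technician", "entrepreneur", "blue-collar",
         "unknown", "management", "technician", "management")

unique(job)          # distinct values, in order of first appearance
length(unique(job))  # how many distinct values there are

counts <- table(job) # rows per value, names sorted alphabetically
counts

# Sorting the table puts the most frequent category first.
sort(counts, decreasing = TRUE)
```

Note that unique() keeps the order of first appearance, while table() sorts its names alphabetically, which is why the two outputs list the jobs in a different order.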
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘education’ column
## [1] "tertiary" "secondary" "unknown" "primary"
Counting the rows for each unique value
##
## primary secondary tertiary unknown
## 6851 23202 13301 1857
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘marital’ column
## [1] "married" "single" "divorced"
Counting the rows for each unique value
##
## divorced married single
## 5207 27214 12790
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘housing’ column
## [1] "yes" "no"
Counting the rows for each unique value
##
## no yes
## 20081 25130
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘month’ column
## [1] "apr" "aug" "dec" "feb" "jan" "jul" "jun" "mar" "may" "nov" "oct" "sep"
Counting the rows for each unique value
##
## apr aug dec feb jan jul jun mar may nov oct sep
## 2932 6247 214 2649 1403 6895 5341 477 13766 3970 738 579
## Length Class Mode
## 45211 character character
Finding the unique values for the ‘contact’ column
## [1] "cellular" "telephone" "unknown"
Counting the rows for each unique value
##
## cellular telephone unknown
## 29285 2906 13020
Using this data set, we can examine how attributes such as job type, age, and existing loans relate to whether a client subscribes to a term deposit.
We can see that the majority of people in management jobs do not have a housing loan, while the majority of people in blue-collar jobs do.
We can see that the majority of people with tertiary education have a management job, while most technicians and admins have secondary education.
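Statements like these compare two categorical columns, which can be checked with a cross-tabulation. The sketch below is one possible way to do it in base R; the six rows in `toy` are invented purely for illustration and are not taken from the data set, and `toy` itself is a hypothetical name.

```r
# Invented rows for illustration only; not values from the real data.
toy <- data.frame(
  job     = c("management", "management", "management",
              "blue-collar", "blue-collar", "blue-collar"),
  housing = c("no", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Rows are jobs, columns are housing-loan status.
counts <- table(toy$job, toy$housing)
counts

# margin = 1 turns each row into shares that sum to 1,
# so "majority" means a share above 0.5 within a job.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Row-wise shares are used here because job categories differ a lot in size, so raw counts alone can be misleading when comparing them.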
There are 12 unique values in the job column, including "unknown".
What is the average duration of the last contact with a client?
258.2 seconds (about 4.3 minutes), the mean of the duration column.
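The average is simply the mean of the duration column. As a small sketch, here it is on the first ten durations shown by str(), so the result differs from the full-column value of 258.2:

```r
# First ten call durations (seconds) from the str() output; a sample only.
duration <- c(261, 151, 76, 92, 198, 139, 217, 380, 50, 55)

avg_seconds <- mean(duration)
avg_seconds

# The column is in seconds; dividing by 60 gives minutes.
avg_minutes <- avg_seconds / 60
round(avg_minutes, 1)
```

The mean is well above the median for this column (258.2 versus 180.0 on the full data), which points to a few very long calls pulling the average up.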