Workshop Bank Customer Data

1. Identifying Data Structures

The purpose of this demo is to show how to identify the data structures in a dataset before analysing it for useful insights. You can download the sample dataset (Bank Customer Data.csv) from this link: https://www.kaggle.com/datasets/kidoen/bank-customers-data
First, read the .csv file into the R environment with the following command. If your data is in .xls format, run install.packages("readxl") and then attach the library with library(readxl).

BankCustomer<-read.csv("/home/user/Desktop/Analytcis with R/DataSets/BankCustomerData.csv")  # Read the .csv file
setwd("/home/user/Desktop/Analytcis with R/Workshops/")  # Change the working directory
# You can check your working directory with getwd()
head(BankCustomer)   # Preview the first rows of the imported dataset

##   age          job marital education default balance housing loan contact day
## 1  58   management married  tertiary      no    2143     yes   no unknown   5
## 2  44   technician  single secondary      no      29     yes   no unknown   5
## 3  33 entrepreneur married secondary      no       2     yes  yes unknown   5
## 4  47  blue-collar married   unknown      no    1506     yes   no unknown   5
## 5  33      unknown  single   unknown      no       1      no   no unknown   5
## 6  35   management married  tertiary      no     231     yes   no unknown   5
##   month duration campaign pdays previous poutcome term_deposit
## 1   may      261        1    -1        0  unknown           no
## 2   may      151        1    -1        0  unknown           no
## 3   may       76        1    -1        0  unknown           no
## 4   may       92        1    -1        0  unknown           no
## 5   may      198        1    -1        0  unknown           no
## 6   may      139        1    -1        0  unknown           no

str(BankCustomer)    # See help(str) in RStudio

## 'data.frame':    42639 obs. of  17 variables:
##  $ age         : int  58 44 33 47 33 35 28 42 58 43 ...
##  $ job         : chr  "management" "technician" "entrepreneur" "blue-collar" ...
##  $ marital     : chr  "married" "single" "married" "married" ...
##  $ education   : chr  "tertiary" "secondary" "secondary" "unknown" ...
##  $ default     : chr  "no" "no" "no" "no" ...
##  $ balance     : int  2143 29 2 1506 1 231 447 2 121 593 ...
##  $ housing     : chr  "yes" "yes" "yes" "yes" ...
##  $ loan        : chr  "no" "no" "yes" "no" ...
##  $ contact     : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ day         : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ month       : chr  "may" "may" "may" "may" ...
##  $ duration    : int  261 151 76 92 198 139 217 380 50 55 ...
##  $ campaign    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays       : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome    : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ term_deposit: chr  "no" "no" "no" "no" ...
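
The types that str() reports can also be collected programmatically. The following is a minimal sketch, not part of the original workshop: `toy` is a hypothetical three-column stand-in for BankCustomer, and `col_classes` is a name assumed here for illustration.

```r
# Hypothetical stand-in with the same column types as the first rows shown above
toy <- data.frame(
  age     = c(58L, 44L, 33L),
  job     = c("management", "technician", "entrepreneur"),
  balance = c(2143L, 29L, 2L),
  stringsAsFactors = FALSE
)

# One class per column: the same information str() prints, as a named vector
col_classes <- sapply(toy, class)
print(col_classes)

# Character columns with few distinct values are often converted to factors
toy$job <- as.factor(toy$job)
print(levels(toy$job))
```

Converting a categorical column such as job to a factor is optional; it mainly helps functions like table() and summary() treat the column as a set of categories.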

2. Assigning Values to Data Structures

Now that we have identified the data structures, the next step is to assign values to them. This is done by importing data from files and exporting data to files.

data("mtcars")  # Loading mtcars data
write.table(mtcars, file = "mtcars.txt", sep = "\t", row.names = TRUE, col.names = NA)
write.csv(mtcars, file = "mtcars.csv")
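
The export above can be checked by importing the file again. This is a small sketch under assumptions: it writes to a temporary file rather than to the working directory, and `csv_path` and `mtcars_back` are names introduced here for illustration.

```r
data("mtcars")

# Write to a temporary file so the working directory is left untouched
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, file = csv_path)

# row.names = 1 tells read.csv() that the first column holds the row names
mtcars_back <- read.csv(csv_path, row.names = 1)

# Compare the re-imported data with the original (numeric tolerance applies)
print(all.equal(mtcars, mtcars_back))
print(dim(mtcars_back))
```

Without row.names = 1 the car names would come back as an ordinary first column instead of as row names.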

Fun: we can also export plots from RStudio to .pdf, .jpeg and .png formats.
The following sample saves a .jpeg file; .pdf and .png files can be exported the same way.

# Step 1: Call the jpeg command to start the plot
jpeg(file = "/home/user/Desktop/Analytcis with R/My Plot.jpeg",   # The path of the file you want to save
    width = 400, # The width of the plot in pixels
    height = 400) # The height of the plot in pixels

# Step 2: Create the plot with R code
plot(x = 1:10, 
     y = 1:10)
abline(v = 0) # Additional low-level plotting commands
text(x = 0, y = 1, labels = "Random text")

dev.off()    # Step 3: Run dev.off() to create the file!

## png 
##   2
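
The same three steps apply to other graphics devices. Below is a hedged sketch for the pdf() device; the temporary file path and the name `pdf_path` are assumptions made to keep the example self-contained. Note that pdf() takes its width and height in inches, whereas jpeg() and png() use pixels by default.

```r
# Step 1: open a PDF device (tempfile() keeps the sketch self-contained)
pdf_path <- tempfile(fileext = ".pdf")
pdf(file = pdf_path,
    width = 7,   # width in inches
    height = 7)  # height in inches

# Step 2: create the plot
plot(x = 1:10, y = 1:10, main = "Sample plot")

# Step 3: close the device so the file is written to disk
invisible(dev.off())

print(file.exists(pdf_path))
```

For a .png file, replacing pdf() with png() and giving the size in pixels follows the same pattern.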

3. Data Manipulation

-Data manipulation is needed to clean the data and make it accurate.
-The R base package has a family of “apply” functions that help to manipulate data.
-The apply() functions are used to perform a specific operation on each column or row of an object.
-The apply family includes apply(), lapply(), sapply(), tapply(), mapply() and so on…
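
To make the differences between the family members concrete, here is a brief sketch on invented toy data. All object names (`scores`, `balance`, `group` and so on) are hypothetical and chosen only for illustration.

```r
scores <- list(a = 1:3, b = 4:6)

# lapply(): apply a function to each list element, always returns a list
sums_list <- lapply(scores, sum)

# sapply(): like lapply(), but simplifies the result to a vector when possible
sums_vec <- sapply(scores, sum)

# tapply(): apply a function to groups of a vector defined by a grouping variable
balance <- c(10, 20, 30, 40)
group   <- c("x", "y", "x", "y")
group_means <- tapply(balance, group, mean)

# mapply(): apply a function to several vectors in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)

print(sums_vec)
print(group_means)
print(pair_sums)
```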

apply() function

apply() takes a data frame or matrix as input and returns a vector, list or array. It is primarily used to avoid explicit loop constructs. It is the most basic function of the family and can be used over a matrix.

This function takes 3 arguments:

apply(X, MARGIN, FUN)

-X: an array or matrix
-MARGIN: takes the value 1, 2 or c(1,2) to define where to apply the function:
-MARGIN=1: the manipulation is performed on rows
-MARGIN=2: the manipulation is performed on columns
-MARGIN=c(1,2): the manipulation is performed on rows and columns
-FUN: tells which function to apply. Built-in functions like mean, median, sum, min, max and even user-defined functions can be applied

Example:

m1 <- matrix(1:10, nrow=5, ncol=6)   # 1:10 is recycled to fill all 30 cells
m1

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    6    1    6    1    6
## [2,]    2    7    2    7    2    7
## [3,]    3    8    3    8    3    8
## [4,]    4    9    4    9    4    9
## [5,]    5   10    5   10    5   10

a_m1 <- apply(m1, 2, sum)
a_m1

## [1] 15 40 15 40 15 40
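
The example above uses MARGIN=2. As a complementary sketch, the snippet below shows MARGIN=1, a user-defined function, and MARGIN=c(1,2) on a small matrix; the matrix `m` and the result names are assumptions introduced for illustration.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

# MARGIN = 1: one result per row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 with a user-defined function: max minus min of each column
col_spread <- apply(m, 2, function(col) max(col) - min(col))

# MARGIN = c(1, 2): the function is applied to every single cell
scaled <- apply(m, c(1, 2), function(cell) cell * 10)

print(row_sums)
print(col_spread)
print(scaled)
```

With MARGIN=c(1,2) the result keeps the shape of the input matrix, because the function is called once per cell.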

Now let's do some manipulation with our Bank Customer Data

library(plyr)
BankCustomer<-rename(BankCustomer,c("age"="Age"))
str(BankCustomer)

## 'data.frame':    42639 obs. of  17 variables:
##  $ Age         : int  58 44 33 47 33 35 28 42 58 43 ...
##  $ job         : chr  "management" "technician" "entrepreneur" "blue-collar" ...
##  $ marital     : chr  "married" "single" "married" "married" ...
##  $ education   : chr  "tertiary" "secondary" "secondary" "unknown" ...
##  $ default     : chr  "no" "no" "no" "no" ...
##  $ balance     : int  2143 29 2 1506 1 231 447 2 121 593 ...
##  $ housing     : chr  "yes" "yes" "yes" "yes" ...
##  $ loan        : chr  "no" "no" "yes" "no" ...
##  $ contact     : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ day         : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ month       : chr  "may" "may" "may" "may" ...
##  $ duration    : int  261 151 76 92 198 139 217 380 50 55 ...
##  $ campaign    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays       : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome    : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ term_deposit: chr  "no" "no" "no" "no" ...

max(BankCustomer$Age)

## [1] 95

# Now let's add a column that categorizes customers by age group
BankCustomerGen<-transform(BankCustomer, Generation=ifelse(Age<22,"Z",
                                                            ifelse(Age<41,"Y", 
                                                                   ifelse(Age<53,"X", "AB"))))
head(BankCustomerGen)

##   Age          job marital education default balance housing loan contact day
## 1  58   management married  tertiary      no    2143     yes   no unknown   5
## 2  44   technician  single secondary      no      29     yes   no unknown   5
## 3  33 entrepreneur married secondary      no       2     yes  yes unknown   5
## 4  47  blue-collar married   unknown      no    1506     yes   no unknown   5
## 5  33      unknown  single   unknown      no       1      no   no unknown   5
## 6  35   management married  tertiary      no     231     yes   no unknown   5
##   month duration campaign pdays previous poutcome term_deposit Generation
## 1   may      261        1    -1        0  unknown           no         AB
## 2   may      151        1    -1        0  unknown           no          X
## 3   may       76        1    -1        0  unknown           no          Y
## 4   may       92        1    -1        0  unknown           no          X
## 5   may      198        1    -1        0  unknown           no          Y
## 6   may      139        1    -1        0  unknown           no          Y
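
Nested ifelse() calls become hard to read as the number of groups grows. One possible alternative, shown here as a sketch rather than as the workshop's method, is cut(). The vector `ages` is hypothetical test data, and the break points assume that ages are whole numbers, so that "Age<22" is the same as "Age<=21".

```r
# Hypothetical ages, chosen to include the boundary values 21/22, 40/41 and 52/53
ages <- c(58, 44, 33, 47, 21, 22, 40, 41, 52, 53)

# Nested ifelse(), following the same thresholds as the workshop code
gen_ifelse <- ifelse(ages < 22, "Z",
              ifelse(ages < 41, "Y",
              ifelse(ages < 53, "X", "AB")))

# cut() with right-closed intervals: (-Inf,21], (21,40], (40,52], (52,Inf]
gen_cut <- cut(ages,
               breaks = c(-Inf, 21, 40, 52, Inf),
               labels = c("Z", "Y", "X", "AB"))

print(data.frame(ages, gen_ifelse, gen_cut))
```

cut() returns a factor whose levels follow the order of the breaks, which can be convenient for tables and plots.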

# Two-way frequency table
table(BankCustomerGen$Generation,BankCustomerGen$poutcome)

##     
##      failure other success unknown
##   AB     577   178     137    5789
##   X     1199   404     183   10862
##   Y     2486   929     440   19325
##   Z        9     6       6     109
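
Raw counts are hard to compare across groups of very different size. A minimal sketch of one way to address this is prop.table(), which turns counts into shares; the vectors `gen` and `poutcome` below are a hypothetical miniature of the two columns, not the real data.

```r
# Hypothetical miniature of the Generation and poutcome columns
gen      <- c("Y", "Y", "X", "X", "AB", "Y")
poutcome <- c("failure", "unknown", "unknown", "success", "unknown", "unknown")

counts <- table(gen, poutcome)

# margin = 1 divides each count by its row total: shares within a generation
row_shares <- prop.table(counts, margin = 1)

print(round(row_shares, 2))
```

Using margin = 2 instead would give the shares within each outcome category.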

Tural Naghiyev

2022-10-02
