Module 1: Naive Bayes Classifier

Question 1: Report the number of missing values in the dataset.

Answer 1: The dataset contains 1000 observations of 21 variables. No missing values are found in the dataset.
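A minimal sketch of how such a check can be broken down per variable, assuming a small made-up data frame (`toy` and its values are illustrative and are not taken from creditData.csv):

```r
# Toy data frame with two deliberately missing entries (illustrative only)
toy <- data.frame(
  balance  = c(1, 2, NA, 4),
  duration = c(12, NA, 24, 36),
  purpose  = c(0, 3, 2, 1)
)

total_missing   <- sum(is.na(toy))           # missing cells in the whole table
missing_per_col <- colSums(is.na(toy))       # missing cells per variable
complete_rows   <- sum(complete.cases(toy))  # rows without any NA

print(total_missing)
print(missing_per_col)
print(complete_rows)
```

The per-column count is useful when the total is not zero, because it shows which variables would need imputation or removal.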

Question 2: Compute the percentage of both classes, similar to Lab 1, and see whether the class distribution is preserved in both the training and the test data.

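Answer 2: Class 0 makes up 31.47% of the training set and 25.60% of the test set, while class 1 makes up 68.53% and 74.40% respectively, so the simple random split preserves the class distribution only approximately. In this report, “positive” means class 0, creditable (not defaulted), and “negative” means class 1, not creditable (defaulted); the data dictionary seems to describe “Creditability” wrongly. On the training set, there are 85 false negatives (Type II errors) and 117 false positives (Type I errors), with 151 true positives and 397 true negatives; hence, the accuracy is 73.07%.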

On the test set, there are 22 false negatives (Type II errors) and 35 false positives (Type I errors), with 42 true positives and 151 true negatives; hence, the accuracy is 77.20%.
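The accuracy figures follow directly from the confusion-matrix counts. A small sketch of the arithmetic, where the helper `accuracy()` is a hypothetical name introduced only for this illustration:

```r
# Counts copied from the confusion matrices in this report.
# "Positive" is class 0 (creditable), following the report's convention.
accuracy <- function(tp, tn, fp, fn) (tp + tn) / (tp + tn + fp + fn)

train_acc <- accuracy(tp = 151, tn = 397, fp = 117, fn = 85)  # 548 / 750
test_acc  <- accuracy(tp = 42,  tn = 151, fp = 35,  fn = 22)  # 193 / 250

print(round(100 * c(train = train_acc, test = test_acc), 2))
```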

Step 1: Exploring and preparing the data

credit <- read.csv("creditData.csv")
sum(is.na(credit))
## [1] 0
credit$Creditability <- as.factor(credit$Creditability) #coding assumed in this report: 0 = creditable, 1 = not creditable (default)

set.seed(12345)
creditR <- credit[order(runif(1000)),]
creditTraining <- creditR[1:750,]
creditTest <- creditR[751:1000,]

prop.table(table(creditTraining$Creditability))
## 
##         0         1 
## 0.3146667 0.6853333
prop.table(table(creditTest$Creditability))
## 
##     0     1 
## 0.256 0.744
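The split above is a simple random split, so the class proportions can drift between the two sets. One common alternative, shown here only as a sketch on made-up data and not as the method used in this report, is a stratified split that samples 75% of the rows within each class. The data frame `toy` and its column `x` are assumptions for illustration.

```r
# Stratified 75/25 split: sample 75% of the row indices within each class.
# `toy` is made up; only the column name Creditability mirrors the report.
set.seed(12345)
toy <- data.frame(
  Creditability = factor(rep(c(0, 1), times = c(300, 700))),
  x = rnorm(1000)
)

rows_by_class <- split(seq_len(nrow(toy)), toy$Creditability)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  sample(idx, size = round(0.75 * length(idx)))
}), use.names = FALSE)

toyTraining <- toy[train_idx, ]
toyTest     <- toy[-train_idx, ]

print(prop.table(table(toyTraining$Creditability)))
print(prop.table(table(toyTest$Creditability)))
```

Because the sampling is done per class, both subsets keep the 30/70 proportions of the toy data by construction.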

Step 2: Training a model on the data

library(naivebayes)
## Warning: package 'naivebayes' was built under R version 3.6.2
## naivebayes 0.9.6 loaded
creditModelNB <- naive_bayes(Creditability~., data=creditTraining)
creditModelNB
## 
## ================================ Naive Bayes ================================= 
##  
##  Call: 
## naive_bayes.formula(formula = Creditability ~ ., data = creditTraining)
## 
## ------------------------------------------------------------------------------ 
##  
## Laplace smoothing: 0
## 
## ------------------------------------------------------------------------------ 
##  
##  A priori probabilities: 
## 
##         0         1 
## 0.3146667 0.6853333 
## 
## ------------------------------------------------------------------------------ 
##  
##  Tables: 
## 
## ------------------------------------------------------------------------------ 
##  ::: Account.Balance (Gaussian) 
## ------------------------------------------------------------------------------ 
##                
## Account.Balance        0        1
##            mean 1.923729 2.793774
##            sd   1.036826 1.252008
## 
## ------------------------------------------------------------------------------ 
##  ::: Duration.of.Credit..month. (Gaussian) 
## ------------------------------------------------------------------------------ 
##                           
## Duration.of.Credit..month.        0        1
##                       mean 24.46610 19.20039
##                       sd   13.82208 11.13433
## 
## ------------------------------------------------------------------------------ 
##  ::: Payment.Status.of.Previous.Credit (Gaussian) 
## ------------------------------------------------------------------------------ 
##                                  
## Payment.Status.of.Previous.Credit        0        1
##                              mean 2.161017 2.665370
##                              sd   1.071649 1.045219
## 
## ------------------------------------------------------------------------------ 
##  ::: Purpose (Gaussian) 
## ------------------------------------------------------------------------------ 
##        
## Purpose        0        1
##    mean 2.927966 2.803502
##    sd   2.944722 2.633253
## 
## ------------------------------------------------------------------------------ 
##  ::: Credit.Amount (Gaussian) 
## ------------------------------------------------------------------------------ 
##              
## Credit.Amount        0        1
##          mean 3964.195 2984.177
##          sd   3597.093 2379.685
## 
## ------------------------------------------------------------------------------
## 
## # ... and 15 more tables
## 
## ------------------------------------------------------------------------------
#accuracy on the training set
creditTrPredNB <- predict(creditModelNB, creditTraining, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
library(gmodels)
## Warning: package 'gmodels' was built under R version 3.6.2
#confusion matrix for binary classification; "negative" means class 1 (not creditable / defaulted / declined)
CrossTable(creditTraining$Creditability, creditTrPredNB,
           prop.chisq=FALSE, prop.c=FALSE, prop.r=FALSE,
           dnn=c("Actual Creditability", "Predicted Creditability"))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  750 
## 
##  
##                      | Predicted Creditability 
## Actual Creditability |         0 |         1 | Row Total | 
## ---------------------|-----------|-----------|-----------|
##                    0 |       151 |        85 |       236 | 
##                      |     0.201 |     0.113 |           | 
## ---------------------|-----------|-----------|-----------|
##                    1 |       117 |       397 |       514 | 
##                      |     0.156 |     0.529 |           | 
## ---------------------|-----------|-----------|-----------|
##         Column Total |       268 |       482 |       750 | 
## ---------------------|-----------|-----------|-----------|
## 
## 
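The "(Gaussian)" tables printed by the model mean that each numeric predictor is given a class-conditional normal density, which is multiplied by the a priori probability of the class. The following sketch applies Bayes' rule by hand to a single predictor, Account.Balance, using the numbers printed above. The value `x = 1` is a hypothetical input chosen only for illustration, and the real model multiplies the densities of all predictors, so this does not reproduce its predictions.

```r
# Priors and Gaussian parameters copied from the model printout
prior <- c("0" = 0.3146667, "1" = 0.6853333)
mu    <- c("0" = 1.923729,  "1" = 2.793774)   # Account.Balance means
sigma <- c("0" = 1.036826,  "1" = 1.252008)   # Account.Balance sds

x <- 1  # hypothetical Account.Balance value, for illustration only

likelihood   <- dnorm(x, mean = mu, sd = sigma)  # class-conditional densities
unnormalised <- prior * likelihood
posterior    <- unnormalised / sum(unnormalised) # Bayes' rule, one feature only

print(round(posterior, 3))
```

A low account balance code lies closer to the class 0 mean, so the posterior probability of class 0 rises above its prior.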

Step 3: Evaluating model performance

#accuracy on the test set
creditPredNB <- predict(creditModelNB, creditTest, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
#confusion matrix for binary classification; "negative" means class 1 (not creditable / defaulted / declined)
CrossTable(creditTest$Creditability, creditPredNB,
           prop.chisq=FALSE, prop.c=FALSE, prop.r=FALSE,
           dnn=c("Actual Creditability", "Predicted Creditability"))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  250 
## 
##  
##                      | Predicted Creditability 
## Actual Creditability |         0 |         1 | Row Total | 
## ---------------------|-----------|-----------|-----------|
##                    0 |        42 |        22 |        64 | 
##                      |     0.168 |     0.088 |           | 
## ---------------------|-----------|-----------|-----------|
##                    1 |        35 |       151 |       186 | 
##                      |     0.140 |     0.604 |           | 
## ---------------------|-----------|-----------|-----------|
##         Column Total |        77 |       173 |       250 | 
## ---------------------|-----------|-----------|-----------|
## 
## 
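Accuracy alone hides how the errors are split between the two classes. As a sketch, the test-set counts above can be turned into sensitivity, specificity and precision, again treating class 0 (creditable) as "positive" as this report does:

```r
# Test-set counts from the CrossTable output; rows = actual, columns = predicted
cm <- matrix(c(42, 35, 22, 151), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

tp <- cm["0", "0"]; fn <- cm["0", "1"]
fp <- cm["1", "0"]; tn <- cm["1", "1"]

sensitivity <- tp / (tp + fn)   # share of actual positives that are found
specificity <- tn / (tn + fp)   # share of actual negatives that are found
precision   <- tp / (tp + fp)   # share of predicted positives that are correct

print(round(c(sensitivity = sensitivity, specificity = specificity,
              precision = precision), 3))
```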

Module 2: Pre-processing

Question 3: What is the accuracy this time?

Answer 3: The target (dependent / response) variable is “Creditability”. Before pre-processing, the predictor (independent / explanatory) variables are ①“Account.Balance”, ②“Duration.of.Credit..month.”, ③“Payment.Status.of.Previous.Credit”, ④“Purpose”, ⑤“Credit.Amount”, ⑥“Value.Savings.Stocks”, ⑦“Length.of.current.employment”, ⑧“Instalment.per.cent”, ⑨“Sex...Marital.Status”, ⑩“Guarantors”, ⑪“Duration.in.Current.address”, ⑫“Most.valuable.available.asset”, ⑬“Age..years.”, ⑭“Concurrent.Credits”, ⑮“Type.of.apartment”, ⑯“No.of.Credits.at.this.Bank”, ⑰“Occupation”, ⑱“No.of.dependents”, ⑲“Telephone”, and ⑳“Foreign.Worker”.

Variables ①, ③, ⑥, ⑦, ⑧, ⑪, ⑯ and ⑱ are ordinal, and variables ④, ⑨, ⑩, ⑫, ⑭, ⑮, ⑰, ⑲ and ⑳ are nominal. Variables ②, ⑤ and ⑬ were originally metric but are binned into ordinal categories.
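Binning a metric variable into ordinal bands can also be written with base R's `cut()`. The sketch below is an illustration on made-up values, not the code used in this report; the breaks are assumed to mirror the 6-month steps applied to Duration.of.Credit..month., with code 10 for the shortest and code 1 for the longest durations.

```r
# Made-up durations in months (illustrative only)
duration <- c(4, 6, 7, 12, 18, 24, 36, 48, 60, 72)

# Intervals (-Inf,6], (6,12], ..., (48,54], (54,Inf) labelled 10 down to 1
breaks <- c(-Inf, seq(6, 54, by = 6), Inf)
band <- cut(duration, breaks = breaks, labels = 10:1, right = TRUE)

# Make the ordering explicit so the variable is treated as ordinal
band <- factor(band, levels = 1:10, ordered = TRUE)

print(data.frame(duration, band))
```

Keeping the result as an ordered factor, rather than a plain number, lets later steps distinguish categorical codes from genuine measurements.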

As with Lab 1’s Tab 3, it is incorrect to scale / standardize categorical data and compute their correlations; hence, the instruction in Lab 2’s Part 2 seems not to apply to this dataset. All twenty predictors are categorical; therefore, the Chi-squared Test of Independence is run on every pair of predictors instead. The null hypothesis (H0) is that there is no association between the two variables; the alternative hypothesis (Hα) is that there is an association. A pair is treated as associated when the p-value is below 0.05. The results table for these tests is produced in the chunk below.
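To make the test concrete, here is a minimal sketch on a made-up contingency table (the counts and the labels `A` and `B` are illustrative only). It also shows one possible way to handle sparse tables, for which `chisq.test()` warns that the approximation may be incorrect: a Monte Carlo p-value via `simulate.p.value = TRUE`. That option is an addition for illustration and is not used in this report.

```r
# Made-up 2x3 contingency table (counts are illustrative, not from the data)
tab <- matrix(c(30, 20, 25, 25, 10, 40), nrow = 2,
              dimnames = list(A = c("a1", "a2"), B = c("b1", "b2", "b3")))

res <- chisq.test(tab)
print(res$expected)        # expected counts under H0; small ones trigger the warning
print(res$p.value < 0.05)  # TRUE means H0 (no association) is rejected at 5%

# Monte Carlo p-value that does not rely on the chi-squared approximation;
# B is the number of simulated tables
set.seed(1)
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(res_sim$p.value)
```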

Five variables, “Purpose”, “Length.of.current.employment”, “Most.valuable.available.asset”, “Type.of.apartment”, and “Occupation”, are each significantly associated with at least 15 of the other 19 predictors. As a result, these five variables are removed from the set of independent variables. The new model reaches an accuracy of 71.47% on the training set and 78.40% on the test set. Compared with Module 1, the pre-processing slightly improves test accuracy (from 77.20%), while training accuracy decreases (from 73.07%).
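The selection procedure described here (test every pair, count the significant associations per variable, drop the most connected variables) can be sketched without nested loops. The example below runs on made-up data in which `a` and `b` are constructed to be associated and `z` is independent noise; all names and the threshold of 2 are assumptions for illustration.

```r
# Toy data: b copies a about 80% of the time, z is unrelated
set.seed(42)
n <- 400
a <- sample(1:3, n, replace = TRUE)
b <- ifelse(runif(n) < 0.8, a, sample(1:3, n, replace = TRUE))
z <- sample(1:2, n, replace = TRUE)
toy <- data.frame(a = a, b = b, z = z)

vars <- names(toy)
pair_idx <- combn(length(vars), 2)   # every unordered pair exactly once

# p-value of the chi-squared test of independence for each pair
p_values <- apply(pair_idx, 2, function(p) {
  suppressWarnings(chisq.test(table(toy[[p[1]]], toy[[p[2]]]))$p.value)
})
significant <- pair_idx[, p_values < 0.05, drop = FALSE]

# Number of significant associations each variable takes part in
assoc_count <- table(factor(vars[as.vector(significant)], levels = vars))
print(assoc_count)

# Drop variables at or above a chosen threshold (the threshold is a judgment call)
to_drop <- names(assoc_count)[assoc_count >= 2]
reduced <- toy[, setdiff(vars, to_drop), drop = FALSE]
```

Testing each unordered pair once also halves the number of tests compared with a full 20 by 20 grid.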

Step 1: Exploring and preparing the data

#change metric variables into categorical
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
credit$Duration.of.Credit..month. <- case_when(credit$Duration.of.Credit..month.>54 ~1,
                                               credit$Duration.of.Credit..month.>48 ~2,
                                               credit$Duration.of.Credit..month.>42 ~3,
                                               credit$Duration.of.Credit..month.>36 ~4,
                                               credit$Duration.of.Credit..month.>30 ~5,
                                               credit$Duration.of.Credit..month.>24 ~6,
                                               credit$Duration.of.Credit..month.>18 ~7,
                                               credit$Duration.of.Credit..month.>12 ~8,
                                               credit$Duration.of.Credit..month.>6 ~9,
                                               TRUE ~10)
credit$Credit.Amount <- case_when(credit$Credit.Amount>20000 ~1,
                                  credit$Credit.Amount>15000 ~2,
                                  credit$Credit.Amount>10000 ~3,
                                  credit$Credit.Amount>7500 ~4,
                                  credit$Credit.Amount>5000 ~5,
                                  credit$Credit.Amount>2500 ~6,
                                  credit$Credit.Amount>1500 ~7,
                                  credit$Credit.Amount>1000 ~8,
                                  credit$Credit.Amount>500 ~9,
                                  TRUE ~10)
#codes rise with age so that the binned variable stays ordinal
credit$Age..years. <- case_when(credit$Age..years.>=65 ~5,
                                credit$Age..years.>=60 ~4,
                                credit$Age..years.>=40 ~3,
                                credit$Age..years.>=26 ~2,
                                TRUE ~1)

#REMINDER: explicit "for" loops are often slower than vectorised code in R
pairs <- c()
for (i in 1:(ncol(credit)-2)) {
  for (j in (i+1):(ncol(credit)-1)) {
    if (chisq.test(table(credit[,i+1],credit[,j+1]))$p.value<0.05) {
      pairs <- c(pairs, c(i,j))
    }
  }
} #O(n²)
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
#pairs of variables for which the Chi-squared Test of Independence rejects H0
#the two variables in such a pair have a significant association
table(pairs)
## pairs
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
## 11 12 11 18  9  4 15  6 13  6  9 15 12  7 16 10 15 11 11  9
#REMINDER: pre-allocating the result with its final dimensions avoids repeated copying in R
chiTable <- data.frame(matrix(nrow=20,ncol=20))
for (i in 1:20) {
  for (j in 1:20) {
    if (chisq.test(table(credit[,i+1],credit[,j+1]))$p.value<0.05) {
      chiTable[i,j] <- "⬤"
    } else {
      chiTable[i,j] <- "◯"
    }
  }
}
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect

## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
colnames(chiTable) <- c(1:20)
chiTable$Variable <- c(1:20)
chiTable <- chiTable[,c(21,1:20)]
# Full results of the chi-squared test of independence: ⬤ reject the null hypothesis; ◯ fail to reject the null hypothesis
library(dplyr)
library(knitr)
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.6.2
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
chiTable %>% kable() %>% kable_styling(full_width=F)
Variable 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 ⬤ ◯ ⬤ ⬤ ◯ ⬤ ⬤ ◯ ◯ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ◯ ◯
2 ◯ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ◯ ◯ ◯ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤
3 ⬤ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ◯ ◯ ◯ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ◯ ◯
4 ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤
5 ◯ ⬤ ⬤ ⬤ ⬤ ◯ ◯ ⬤ ⬤ ◯ ◯ ⬤ ◯ ◯ ⬤ ◯ ⬤ ◯ ⬤ ◯
6 ⬤ ◯ ◯ ⬤ ◯ ⬤ ⬤ ◯ ◯ ⬤ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯
7 ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ◯
8 ◯ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ⬤
9 ◯ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ◯
10 ⬤ ◯ ◯ ⬤ ◯ ⬤ ◯ ◯ ◯ ⬤ ◯ ⬤ ◯ ◯ ◯ ◯ ⬤ ◯ ◯ ⬤
11 ⬤ ◯ ◯ ⬤ ◯ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ◯ ◯ ⬤ ◯
12 ⬤ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤
13 ⬤ ◯ ⬤ ⬤ ◯ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ◯
14 ◯ ⬤ ⬤ ⬤ ◯ ◯ ◯ ◯ ◯ ◯ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ◯ ◯ ◯
15 ⬤ ⬤ ⬤ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤
16 ⬤ ◯ ⬤ ◯ ◯ ◯ ⬤ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ◯ ◯
17 ⬤ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤ ⬤
18 ◯ ⬤ ⬤ ⬤ ◯ ◯ ⬤ ◯ ⬤ ◯ ◯ ⬤ ⬤ ◯ ⬤ ⬤ ⬤ ⬤ ◯ ⬤
19 ◯ ⬤ ◯ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ◯ ⬤ ◯ ⬤ ◯ ⬤ ⬤
20 ◯ ⬤ ◯ ⬤ ◯ ◯ ◯ ⬤ ◯ ⬤ ◯ ⬤ ◯ ◯ ⬤ ◯ ⬤ ⬤ ⬤ ⬤
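The tally behind the variable selection could be reproduced along the following lines. This is only a sketch on hypothetical toy data (the data frame `toy`, the names `reject` and `rejectCount`, and the 0.05 significance level are assumptions, not taken from the report). It runs a pairwise chi-squared test of independence, records which pairs reject the null hypothesis, and counts the rejections per variable. `suppressWarnings()` only hides the repeated "Chi-squared approximation may be incorrect" message; it does not address the small expected cell counts that trigger it.

```r
# Hypothetical toy data: four categorical predictors (not the credit data).
set.seed(1)
n <- 500
a <- sample(1:3, n, replace = TRUE)
toy <- data.frame(
  a = a,
  b = a,                                # exact copy of a, so strongly dependent
  c = sample(1:3, n, replace = TRUE),   # drawn independently of a
  d = sample(1:2, n, replace = TRUE)    # drawn independently of a
)

p <- ncol(toy)
reject <- matrix(FALSE, p, p, dimnames = list(names(toy), names(toy)))

for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    if (i == j) next  # skip a variable paired with itself
    # Hide the approximation warning so the output is not flooded
    test <- suppressWarnings(chisq.test(table(toy[, i], toy[, j])))
    reject[i, j] <- test$p.value < 0.05  # assumed significance level
  }
}

# Number of other variables each variable appears dependent on
rejectCount <- rowSums(reject)
rejectCount
```

Variables with a high count are candidates for removal, since naive Bayes assumes the predictors are conditionally independent given the class. If the warnings themselves are a concern, `chisq.test()` also accepts `simulate.p.value = TRUE`, which computes a Monte Carlo p-value instead of relying on the asymptotic approximation.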
set.seed(12345)
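# Remove the five variables that reject independence in at least 15 pairings in the table above (the maximum possible count is 19).
# Column 1 of credit is Creditability, so variable k in the table is column k + 1.
creditR2 <- credit[order(runif(1000)), -c(5, 8, 13, 16, 18)]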
creditTraining2 <- creditR2[1:750,]
creditTest2 <- creditR2[751:1000,]

Step 2: Training a model on the data

library(naivebayes)
creditModelNB2 <- naive_bayes(Creditability~., data=creditTraining2)
creditModelNB2
## 
## ================================ Naive Bayes ================================= 
##  
##  Call: 
## naive_bayes.formula(formula = Creditability ~ ., data = creditTraining2)
## 
## ------------------------------------------------------------------------------ 
##  
## Laplace smoothing: 0
## 
## ------------------------------------------------------------------------------ 
##  
##  A priori probabilities: 
## 
##         0         1 
## 0.3146667 0.6853333 
## 
## ------------------------------------------------------------------------------ 
##  
##  Tables: 
## 
## ------------------------------------------------------------------------------ 
##  ::: Account.Balance (Gaussian) 
## ------------------------------------------------------------------------------ 
##                
## Account.Balance        0        1
##            mean 1.923729 2.793774
##            sd   1.036826 1.252008
## 
## ------------------------------------------------------------------------------ 
##  ::: Duration.of.Credit..month. (Gaussian) 
## ------------------------------------------------------------------------------ 
##                           
## Duration.of.Credit..month.        0        1
##                       mean 6.838983 7.671206
##                       sd   2.246398 1.819021
## 
## ------------------------------------------------------------------------------ 
##  ::: Payment.Status.of.Previous.Credit (Gaussian) 
## ------------------------------------------------------------------------------ 
##                                  
## Payment.Status.of.Previous.Credit        0        1
##                              mean 2.161017 2.665370
##                              sd   1.071649 1.045219
## 
## ------------------------------------------------------------------------------ 
##  ::: Credit.Amount (Gaussian) 
## ------------------------------------------------------------------------------ 
##              
## Credit.Amount        0        1
##          mean 6.436441 6.756809
##          sd   1.788910 1.435149
## 
## ------------------------------------------------------------------------------ 
##  ::: Value.Savings.Stocks (Gaussian) 
## ------------------------------------------------------------------------------ 
##                     
## Value.Savings.Stocks        0        1
##                 mean 1.711864 2.334630
##                 sd   1.340700 1.674510
## 
## ------------------------------------------------------------------------------
## 
## # ... and 10 more tables
## 
## ------------------------------------------------------------------------------
#accuracy on the training set
library(gmodels)
creditTrPredNB2 <- predict(creditModelNB2, creditTraining2, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
CrossTable(creditTraining2$Creditability, creditTrPredNB2,
           prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
           dnn = c("Actual Creditability", "Predicted Creditability"))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  750 
## 
##  
##                      | Predicted Creditability 
## Actual Creditability |         0 |         1 | Row Total | 
## ---------------------|-----------|-----------|-----------|
##                    0 |       153 |        83 |       236 | 
##                      |     0.204 |     0.111 |           | 
## ---------------------|-----------|-----------|-----------|
##                    1 |       131 |       383 |       514 | 
##                      |     0.175 |     0.511 |           | 
## ---------------------|-----------|-----------|-----------|
##         Column Total |       284 |       466 |       750 | 
## ---------------------|-----------|-----------|-----------|
## 
## 

Step 3: Evaluating model performance

#accuracy on the test set
creditPredNB2 <- predict(creditModelNB2, creditTest2, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
CrossTable(creditTest2$Creditability, creditPredNB2,
           prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
           dnn = c("Actual Creditability", "Predicted Creditability"))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  250 
## 
##  
##                      | Predicted Creditability 
## Actual Creditability |         0 |         1 | Row Total | 
## ---------------------|-----------|-----------|-----------|
##                    0 |        45 |        19 |        64 | 
##                      |     0.180 |     0.076 |           | 
## ---------------------|-----------|-----------|-----------|
##                    1 |        35 |       151 |       186 | 
##                      |     0.140 |     0.604 |           | 
## ---------------------|-----------|-----------|-----------|
##         Column Total |        80 |       170 |       250 | 
## ---------------------|-----------|-----------|-----------|
## 
## 
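
The accuracy of the reduced model can be read directly off the two CrossTable outputs above. A minimal sketch, re-entering the counts by hand (the names `train_cm`, `test_cm`, and `accuracy` are illustrative and not part of the original script):

```r
# Confusion matrices copied from the CrossTable output above
# (rows = actual class, columns = predicted class).
train_cm <- matrix(c(153,  83,
                     131, 383),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
test_cm <- matrix(c(45,  19,
                    35, 151),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Accuracy = correctly classified cases (the diagonal) / all cases, in percent.
accuracy <- function(cm) sum(diag(cm)) / sum(cm) * 100

train_acc <- accuracy(train_cm)  # (153 + 383) / 750
test_acc  <- accuracy(test_cm)   # (45 + 151) / 250
round(c(training = train_acc, test = test_acc), 2)
```

With these counts the reduced model scores about 71.47% on the training set and 78.40% on the test set, compared with the 73.07% and 77.20% reported for the full model in Answer 2.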

Module 3: Support Vector Machine

letters <- read.csv("letterdata.csv") 
str(letters)
## 'data.frame':    20000 obs. of  17 variables:
##  $ letter: Factor w/ 26 levels "A","B","C","D",..: 20 9 4 14 7 19 2 1 10 13 ...
##  $ xbox  : int  2 5 4 7 2 4 4 1 2 11 ...
##  $ ybox  : int  8 12 11 11 1 11 2 1 2 15 ...
##  $ width : int  3 3 6 6 3 5 5 3 4 13 ...
##  $ height: int  5 7 8 6 1 8 4 2 4 9 ...
##  $ onpix : int  1 2 6 3 1 3 4 1 2 7 ...
##  $ xbar  : int  8 10 10 5 8 8 8 8 10 13 ...
##  $ ybar  : int  13 5 6 9 6 8 7 2 6 2 ...
##  $ x2bar : int  0 5 2 4 6 6 6 2 2 6 ...
##  $ y2bar : int  6 4 6 6 6 9 6 2 6 2 ...
##  $ xybar : int  6 13 10 4 6 5 7 8 12 12 ...
##  $ x2ybar: int  10 3 3 4 5 6 6 2 4 1 ...
##  $ xy2bar: int  8 9 7 10 9 6 6 8 8 9 ...
##  $ xedge : int  0 2 3 6 1 0 2 1 1 8 ...
##  $ xedgey: int  8 8 7 10 7 8 8 6 6 1 ...
##  $ yedge : int  0 4 3 2 5 9 7 2 1 1 ...
##  $ yedgex: int  8 10 9 8 10 7 10 7 7 8 ...

There are 20,000 observations, so we have considerable flexibility in deciding how many to put in the training set and how many to keep for testing. We use the first 18,000 for training and the remaining 2,000 for testing.

letters_train <- letters[1:18000, ] 
letters_test <- letters[18001:20000, ]
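
The split above takes rows in file order, which gives an unbiased test set only if the file is already in random order. If that is not known, a shuffled split is a safer default. A minimal sketch on a hypothetical toy data frame `df` (not the letters data), keeping the same 90/10 proportion:

```r
set.seed(12345)                      # fix the seed BEFORE drawing random numbers
df <- data.frame(id = 1:20, y = rep(c("a", "b"), 10))  # hypothetical toy data

n_train <- 18                        # 90% of 20 rows, mirroring 18,000 of 20,000
idx <- sample(nrow(df))              # random permutation of the row indices
df_train <- df[idx[1:n_train], ]
df_test  <- df[idx[(n_train + 1):nrow(df)], ]

# The two parts are disjoint and together cover every row exactly once.
stopifnot(nrow(df_train) == 18, nrow(df_test) == 2,
          length(intersect(df_train$id, df_test$id)) == 0)
```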

Training the model

library(kernlab)
letter_classifier <- ksvm(letter ~ ., data = letters_train)
summary(letter_classifier)
## Length  Class   Mode 
##      1   ksvm     S4

Evaluating the model

letter_predictions <- predict(letter_classifier, letters_test) 
(p <- table(letter_predictions,letters_test$letter))
##                   
## letter_predictions  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R
##                  A 75  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0
##                  B  0 67  0  2  0  1  0  0  0  0  0  1  0  1  0  2  1  1
##                  C  0  0 72  0  3  0  0  0  0  0  0  1  0  0  0  0  0  0
##                  D  1  1  0 71  0  0  1  2  2  2  1  0  0  0  0  2  1  1
##                  E  0  0  0  0 70  2  0  0  0  1  0  2  0  0  0  0  0  0
##                  F  0  0  0  0  0 76  0  0  3  0  0  0  0  0  0  6  0  0
##                  G  0  0  1  0  3  0 76  1  0  0  0  0  0  0  0  0  0  0
##                  H  0  0  0  1  0  0  1 58  0  1  0  1  1  0  0  0  1  1
##                  I  0  0  0  0  0  0  0  0 69  1  0  0  0  0  0  0  0  0
##                  J  0  0  0  0  0  0  0  0  2 66  0  0  0  0  0  0  0  0
##                  K  0  0  0  0  0  0  0  3  0  0 62  0  0  1  0  0  0  2
##                  L  0  0  0  0  0  0  1  0  0  0  0 69  0  0  0  0  0  0
##                  M  0  0  0  0  0  0  1  0  0  0  0  0 71  1  0  0  0  0
##                  N  0  0  0  0  0  1  0  0  0  0  0  0  0 78  0  0  0  0
##                  O  0  0  1  0  0  0  0  0  0  1  0  0  0  2 67  1  2  0
##                  P  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 72  0  0
##                  Q  0  0  0  0  0  0  0  1  0  0  0  0  0  0  3  1 65  0
##                  R  0  1  0  0  0  0  1  1  0  0  4  0  0  2  1  0  0 74
##                  S  0  1  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0
##                  T  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##                  U  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
##                  V  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##                  W  0  0  1  0  0  0  0  0  0  0  0  0  1  0  2  0  0  0
##                  X  0  1  0  0  0  0  0  0  0  0  2  4  0  0  0  0  0  0
##                  Y  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##                  Z  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0
##                   
## letter_predictions  S  T  U  V  W  X  Y  Z
##                  A  0  1  0  0  0  0  0  0
##                  B  1  0  0  1  0  0  0  0
##                  C  0  0  0  0  0  0  0  0
##                  D  0  0  1  0  0  0  0  0
##                  E  0  0  0  0  0  0  0  0
##                  F  1  0  0  1  0  0  0  0
##                  G  0  0  0  0  0  0  0  0
##                  H  0  3  0  1  0  0  0  0
##                  I  0  0  0  0  0  2  0  0
##                  J  0  0  0  0  0  0  0  1
##                  K  0  0  0  0  0  0  0  0
##                  L  0  0  0  0  0  0  0  0
##                  M  0  0  0  0  2  0  0  0
##                  N  0  0  0  0  1  0  0  0
##                  O  0  0  0  0  0  0  0  0
##                  P  0  0  0  0  0  0  0  0
##                  Q  0  0  0  0  0  0  0  0
##                  R  0  1  0  0  0  0  0  0
##                  S 68  0  0  0  0  0  0  0
##                  T  0 88  0  0  0  0  1  0
##                  U  0  0 89  0  0  0  0  0
##                  V  0  0  0 68  0  0  1  0
##                  W  0  0  1  0 66  0  0  0
##                  X  0  0  0  0  0 84  1  0
##                  Y  0  1  0  0  0  0 65  0
##                  Z  1  0  0  0  0  0  0 81
(Accuracy <- sum(diag(p))/sum(p)*100)
## [1] 93.35
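
The overall accuracy hides which letters are confused most often. Per-class accuracy can be read off the same kind of table by dividing the diagonal by the column totals. A minimal sketch on a made-up 3-class table (the counts are illustrative and not taken from the letters data):

```r
# Toy 3-class confusion table: rows = predicted, columns = actual,
# the same orientation as table(letter_predictions, letters_test$letter).
cm <- matrix(c(8, 1, 0,
               2, 7, 1,
               0, 2, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("A", "B", "C"),
                             actual    = c("A", "B", "C")))

overall   <- sum(diag(cm)) / sum(cm) * 100   # same formula as Accuracy above
per_class <- diag(cm) / colSums(cm) * 100    # share of each actual class predicted correctly

round(overall, 2)
round(sort(per_class), 2)                    # hardest classes first
```

Applied to the table `p` above, `diag(p) / colSums(p)` would give the same breakdown for all 26 letters.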

Module 4: News Popularity

Data preparation

news <- read.csv("OnlineNewsPopularity_for_R.csv")
# Keep the 16 predictors used in this module plus the target column, shares
newsCols <- c("n_tokens_title", "n_tokens_content", "n_unique_tokens", "n_non_stop_words", "num_hrefs", "num_imgs", "num_videos", "average_token_length", "num_keywords", "kw_max_max", "global_sentiment_polarity", "avg_positive_polarity", "title_subjectivity", "title_sentiment_polarity", "abs_title_subjectivity", "abs_title_sentiment_polarity", "shares")
newsShort <- news[, newsCols]
# Label an article "yes" (popular) when it has at least 1400 shares, otherwise "no"
newsShort$popular <- ifelse(newsShort$shares >= 1400, "yes", "no")
newsShort$shares <- as.factor(newsShort$popular)
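# order(runif(10000)) is a random permutation of 1:10000, so this keeps only the first 10,000 rows, shuffled
news_rand <- newsShort[order(runif(10000)), ]
set.seed(12345)  # set after the shuffle, so it does not make the shuffle above reproducible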

#Split the data into training and test datasets
news_train <- news_rand[1:9000, ]
news_test <- news_rand[9001:10000, ]

Model Design

nb_model <- naive_bayes(shares ~ ., data=news_train)
## Warning: naive_bayes(): Feature popular - zero probabilities are present.
## Consider Laplace smoothing.
nb_model
## 
## ================================ Naive Bayes ================================= 
##  
##  Call: 
## naive_bayes.formula(formula = shares ~ ., data = news_train)
## 
## ------------------------------------------------------------------------------ 
##  
## Laplace smoothing: 0
## 
## ------------------------------------------------------------------------------ 
##  
##  A priori probabilities: 
## 
##        no       yes 
## 0.4287778 0.5712222 
## 
## ------------------------------------------------------------------------------ 
##  
##  Tables: 
## 
## ------------------------------------------------------------------------------ 
##  ::: n_tokens_title (Gaussian) 
## ------------------------------------------------------------------------------ 
##               
## n_tokens_title       no      yes
##           mean 9.840891 9.697919
##           sd   1.940023 1.988384
## 
## ------------------------------------------------------------------------------ 
##  ::: n_tokens_content (Gaussian) 
## ------------------------------------------------------------------------------ 
##                 
## n_tokens_content       no      yes
##             mean 453.4509 513.4272
##             sd   351.7631 450.8013
## 
## ------------------------------------------------------------------------------ 
##  ::: n_unique_tokens (Gaussian) 
## ------------------------------------------------------------------------------ 
##                
## n_unique_tokens        no       yes
##            mean 0.5707594 0.5538531
##            sd   0.1125435 0.1235955
## 
## ------------------------------------------------------------------------------ 
##  ::: n_non_stop_words (Gaussian) 
## ------------------------------------------------------------------------------ 
##                 
## n_non_stop_words         no        yes
##             mean 0.99429904 0.99066329
##             sd   0.07529892 0.09618384
## 
## ------------------------------------------------------------------------------ 
##  ::: num_hrefs (Gaussian) 
## ------------------------------------------------------------------------------ 
##          
## num_hrefs        no       yes
##      mean  9.144079 10.620307
##      sd    8.613435 11.641156
## 
## ------------------------------------------------------------------------------
## 
## # ... and 12 more tables
## 
## ------------------------------------------------------------------------------

Evaluate the Model

news_Pred <- predict(nb_model, newdata = news_test)
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
(conf_nat <- table(news_Pred, news_test$shares))
##          
## news_Pred  no yes
##       no  424   0
##       yes   9 567
(Accuracy <- sum(diag(conf_nat))/sum(conf_nat)*100)
## [1] 99.1
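
The 99.1% accuracy should be read with caution. It is likely inflated: `popular` is a copy of the class label and is still a column of `news_train`, and the warning from `naive_bayes()` shows that it was used as a feature. A minimal sketch of building the label without leaving a duplicate behind, on a hypothetical toy data frame `toy` (not the news data):

```r
# Hypothetical toy data standing in for newsShort
toy <- data.frame(num_imgs = c(1, 4, 0, 2),
                  shares   = c(500, 1400, 3000, 1399))

# Build the class label in one vectorized step and overwrite shares directly,
# so no second copy of the label is left among the predictors.
toy$shares <- factor(ifelse(toy$shares >= 1400, "yes", "no"),
                     levels = c("no", "yes"))

# Every remaining column except shares is a genuine predictor.
predictors <- setdiff(names(toy), "shares")
predictors
toy$shares
```

In the script above, the equivalent step would be to drop the `popular` column before calling `naive_bayes()`. The resulting accuracy is not known without rerunning the model.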

Question 5: Do you see any improvement compared to last three techniques? Please completely explain your results and analysis.

Answer 5: The accuracy rates from the confusion matrices on the training and test sets for the seven models are shown below. By test-set accuracy, the best model is the random forest. In general, the decision tree, random forest, regression tree, and support vector machine models reach similar accuracy, around 60%.