Question 1: Report the number of missing values in the dataset.
Answer 1: The dataset contains 1000 observations of 21 variables. No missing values are found in the dataset.
Question 2: Compute the percentage of both classes, as in Lab 1, and check whether the class distribution is preserved in both the training and test data.
Answer 2: The training set is 31.47% class 0 and 68.53% class 1, while the test set is 25.60% class 0 and 74.40% class 1, so the random split preserves the class distribution only approximately. Here “positive” means class 0, creditable (not defaulted), and “negative” means class 1, not creditable (defaulted); the data dictionary seems to describe “Creditability” wrongly. On the training set, there are 85 false negatives (Type II errors) and 117 false positives (Type I errors), with 151 true positives and 397 true negatives; hence, the accuracy is (151 + 397) / 750 = 73.07%.
On the test set, there are 22 false negatives (Type II errors) and 35 false positives (Type I errors), with 42 true positives and 151 true negatives; hence, the accuracy is (42 + 151) / 250 = 77.20%.
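The accuracy figures above can be checked directly from the four cell counts of each confusion matrix. The following is a minimal sketch that re-enters the counts from the CrossTable output by hand; the names `train_cm`, `test_cm`, and `accuracy` are illustrative and do not appear in the original analysis.

```r
# Counts copied by hand from the CrossTable output
# (rows = actual class, columns = predicted class).
train_cm <- matrix(c(151,  85,
                     117, 397),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"),
                                   predicted = c("0", "1")))
test_cm <- matrix(c(42,  22,
                    35, 151),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("0", "1"),
                                  predicted = c("0", "1")))

# Accuracy = correctly classified cases (the diagonal) / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

round(100 * accuracy(train_cm), 2)  # 73.07
round(100 * accuracy(test_cm), 2)   # 77.2
```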
credit <- read.csv("creditData.csv")
sum(is.na(credit))
## [1] 0
credit$Creditability <- as.factor(credit$Creditability) # creditable: yes (0), no (1); not creditable = defaulted
set.seed(12345)
creditR <- credit[order(runif(1000)),]
creditTraining <- creditR[1:750,]
creditTest <- creditR[751:1000,]
prop.table(table(creditTraining$Creditability))
##
## 0 1
## 0.3146667 0.6853333
prop.table(table(creditTest$Creditability))
##
## 0 1
## 0.256 0.744
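The simple random split above leaves the two sets with somewhat different class proportions (31.5% versus 25.6% for class 0). If a closer match is wanted, one option is a stratified split that samples 75% of the rows within each class. The sketch below is only an illustration and not part of the original lab: it uses a hypothetical stand-in label vector with the same 300/700 class counts instead of the real data, and the names `labels`, `train_idx`, `train_prop`, and `test_prop` are assumptions.

```r
set.seed(12345)

# Hypothetical stand-in for credit$Creditability: 300 zeros and 700 ones.
labels <- factor(rep(c(0, 1), times = c(300, 700)))

# Within each class, draw 75% of that class's row indices for training.
train_idx <- unlist(
  lapply(split(seq_along(labels), labels),
         function(idx) sample(idx, size = round(0.75 * length(idx)))),
  use.names = FALSE
)

# Class proportions in each part of the split.
train_prop <- prop.table(table(labels[train_idx]))
test_prop  <- prop.table(table(labels[-train_idx]))
train_prop
test_prop
```

Because the sampling is done per class, both parts keep the 30% / 70% ratio of the full label vector.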
library(naivebayes)
## Warning: package 'naivebayes' was built under R version 3.6.2
## naivebayes 0.9.6 loaded
creditModelNB <- naive_bayes(Creditability~., data=creditTraining)
creditModelNB
##
## ================================ Naive Bayes =================================
##
## Call:
## naive_bayes.formula(formula = Creditability ~ ., data = creditTraining)
##
## ------------------------------------------------------------------------------
##
## Laplace smoothing: 0
##
## ------------------------------------------------------------------------------
##
## A priori probabilities:
##
## 0 1
## 0.3146667 0.6853333
##
## ------------------------------------------------------------------------------
##
## Tables:
##
## ------------------------------------------------------------------------------
## ::: Account.Balance (Gaussian)
## ------------------------------------------------------------------------------
##
## Account.Balance 0 1
## mean 1.923729 2.793774
## sd 1.036826 1.252008
##
## ------------------------------------------------------------------------------
## ::: Duration.of.Credit..month. (Gaussian)
## ------------------------------------------------------------------------------
##
## Duration.of.Credit..month. 0 1
## mean 24.46610 19.20039
## sd 13.82208 11.13433
##
## ------------------------------------------------------------------------------
## ::: Payment.Status.of.Previous.Credit (Gaussian)
## ------------------------------------------------------------------------------
##
## Payment.Status.of.Previous.Credit 0 1
## mean 2.161017 2.665370
## sd 1.071649 1.045219
##
## ------------------------------------------------------------------------------
## ::: Purpose (Gaussian)
## ------------------------------------------------------------------------------
##
## Purpose 0 1
## mean 2.927966 2.803502
## sd 2.944722 2.633253
##
## ------------------------------------------------------------------------------
## ::: Credit.Amount (Gaussian)
## ------------------------------------------------------------------------------
##
## Credit.Amount 0 1
## mean 3964.195 2984.177
## sd 3597.093 2379.685
##
## ------------------------------------------------------------------------------
##
## # ... and 15 more tables
##
## ------------------------------------------------------------------------------
#accuracy on the training set
creditTrPredNB <- predict(creditModelNB, creditTraining, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
library(gmodels)
## Warning: package 'gmodels' was built under R version 3.6.2
CrossTable(creditTraining$Creditability, creditTrPredNB, prop.chisq=F, prop.c=F, prop.r=F, dnn=c("Actual Creditability", "Predicted Creditability")) #a confusion matrix of binary classification, "negative" means being (1) not creditable / defaulted / declined
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 750
##
##
## | Predicted Creditability
## Actual Creditability | 0 | 1 | Row Total |
## ---------------------|-----------|-----------|-----------|
## 0 | 151 | 85 | 236 |
## | 0.201 | 0.113 | |
## ---------------------|-----------|-----------|-----------|
## 1 | 117 | 397 | 514 |
## | 0.156 | 0.529 | |
## ---------------------|-----------|-----------|-----------|
## Column Total | 268 | 482 | 750 |
## ---------------------|-----------|-----------|-----------|
##
##
#accuracy on the test set
creditPredNB <- predict(creditModelNB, creditTest, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
CrossTable(creditTest$Creditability, creditPredNB, prop.chisq=F, prop.c=F, prop.r=F, dnn=c("Actual Creditability", "Predicted Creditability")) #a confusion matrix of binary classification, "negative" means being (1) not creditable / defaulted / declined
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 250
##
##
## | Predicted Creditability
## Actual Creditability | 0 | 1 | Row Total |
## ---------------------|-----------|-----------|-----------|
## 0 | 42 | 22 | 64 |
## | 0.168 | 0.088 | |
## ---------------------|-----------|-----------|-----------|
## 1 | 35 | 151 | 186 |
## | 0.140 | 0.604 | |
## ---------------------|-----------|-----------|-----------|
## Column Total | 77 | 173 | 250 |
## ---------------------|-----------|-----------|-----------|
##
##
Question 3: What is the accuracy this time?
Answer 3: The target / dependent / response variable is selected as “Creditability”. The predictor / independent / explanatory variables, before pre-processing, are selected as ①“Account.Balance”, ②“Duration.of.Credit..month.”, ③“Payment.Status.of.Previous.Credit”, ④“Purpose”, ⑤“Credit.Amount”, ⑥“Value.Savings.Stocks”, ⑦“Length.of.current.employment”, ⑧“Instalment.per.cent”, ⑨“Sex…Marital.Status”, ⑩“Guarantors”, ⑪“Duration.in.Current.address”, ⑫“Most.valuable.available.asset”, ⑬“Age..years.”, ⑭“Concurrent.Credits”, ⑮“Type.of.apartment”, ⑯“No.of.Credits.at.this.Bank”, ⑰“Occupation”, ⑱“No.of.dependents”, ⑲“Telephone”, and ⑳“Foreign.Worker”.
Variables ①, ③, ⑥, ⑦, ⑧, ⑪, ⑯, and ⑱ are ordinal, and variables ④, ⑨, ⑩, ⑫, ⑭, ⑮, ⑰, ⑲, and ⑳ are nominal. Variables ②, ⑤, and ⑬ were originally metric but are binned into ordinal categories.
As with Tab 3 of Lab 1, it is not meaningful to scale or standardize categorical data and compute their correlations, so the Part 2 instruction of Lab 2 does not seem applicable here. All twenty predictors are categorical; therefore, the Chi-squared Test of Independence is run on every pair of predictors instead. The null hypothesis (H0) is that there is no association between the two variables; the alternative hypothesis (Ha) is that there is an association. The results of these tests are shown in the chunk below.
Five variables, “Purpose”, “Length.of.current.employment”, “Most.valuable.available.asset”, “Type.of.apartment”, and “Occupation”, are each significantly associated (p < 0.05) with 15 to 18 of the other 19 predictors. As a result, these five variables are removed from the set of independent variables. The new model reaches an accuracy of 71.47% on the training set and 78.40% on the test set. Compared with the first model, the pre-processing slightly improves test-set accuracy (from 77.20% to 78.40%), while training-set accuracy drops from 73.07% to 71.47%.
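To make the test concrete, the following sketch runs the Chi-squared Test of Independence on a single, hypothetical contingency table (the counts and the names `tab`, `sparse`, `res`, and `sim` are invented for illustration and do not come from the credit data). It also shows one possible way to handle sparse tables: `chisq.test()` warns that the approximation may be incorrect when expected counts are small, and its `simulate.p.value` argument offers a Monte Carlo p-value as an alternative.

```r
# Hypothetical 2 x 3 table of counts for two categorical variables.
tab <- matrix(c(30, 10, 20,
                10, 30, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(x = c("a", "b"),
                              y = c("low", "mid", "high")))

res <- chisq.test(tab)
res$expected        # every expected count is 20
res$statistic       # X-squared = 20 on 2 degrees of freedom
res$p.value < 0.05  # TRUE: reject H0, the variables are associated

# Hypothetical sparse table: one expected count is 4.5 (below 5),
# so the default test would emit the approximation warning.
sparse <- matrix(c(8, 1,
                   2, 9), nrow = 2)

set.seed(1)
sim <- chisq.test(sparse, simulate.p.value = TRUE, B = 2000)
sim$p.value         # Monte Carlo p-value, no approximation warning
```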
#change metric variables into categorical
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# case_when() returns the value of the first condition that matches,
# so each ladder runs from the highest threshold down
credit$Duration.of.Credit..month. <- case_when(
  credit$Duration.of.Credit..month. > 54 ~ 1,
  credit$Duration.of.Credit..month. > 48 ~ 2,
  credit$Duration.of.Credit..month. > 42 ~ 3,
  credit$Duration.of.Credit..month. > 36 ~ 4,
  credit$Duration.of.Credit..month. > 30 ~ 5,
  credit$Duration.of.Credit..month. > 24 ~ 6,
  credit$Duration.of.Credit..month. > 18 ~ 7,
  credit$Duration.of.Credit..month. > 12 ~ 8,
  credit$Duration.of.Credit..month. > 6 ~ 9,
  TRUE ~ 10
)
credit$Credit.Amount <- case_when(
  credit$Credit.Amount > 20000 ~ 1,
  credit$Credit.Amount > 15000 ~ 2,
  credit$Credit.Amount > 10000 ~ 3,
  credit$Credit.Amount > 7500 ~ 4,
  credit$Credit.Amount > 5000 ~ 5,
  credit$Credit.Amount > 2500 ~ 6,
  credit$Credit.Amount > 1500 ~ 7,
  credit$Credit.Amount > 1000 ~ 8,
  credit$Credit.Amount > 500 ~ 9,
  TRUE ~ 10
)
# age codes are not monotonic: 65 and over maps to 4, 60 to 64 maps to 5
credit$Age..years. <- case_when(
  credit$Age..years. >= 65 ~ 4,
  credit$Age..years. >= 60 ~ 5,
  credit$Age..years. >= 40 ~ 3,
  credit$Age..years. >= 26 ~ 2,
  TRUE ~ 1
)
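The threshold ladders above can also be expressed with base R's `cut()`, which may be easier to audit because the break points sit in one vector. The sketch below is one possible equivalent for the credit-duration binning only, run on a small hypothetical vector of months; `bin_duration` is an illustrative name that does not appear in the original analysis.

```r
# Bin credit duration (in months) into codes 10 (shortest) to 1 (longest).
# Intervals are right-closed: (-Inf, 6] -> 10, (6, 12] -> 9, ..., (54, Inf) -> 1,
# which matches the "> threshold" conditions of the case_when() ladder.
bin_duration <- function(months) {
  binned <- cut(months,
                breaks = c(-Inf, seq(6, 54, by = 6), Inf),
                labels = 10:1,
                right = TRUE)
  # cut() returns a factor; recover the numeric codes from its labels.
  as.integer(as.character(binned))
}

# Hypothetical durations, chosen to sit on and around the break points.
months <- c(4, 6, 7, 12, 24, 25, 54, 55, 72)
bin_duration(months)
# 10 10 9 9 7 6 2 1 1
```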
#REMINDER: "for" loop is not recommended in R
pairs <- c()
for (i in 1:(ncol(credit)-2)) {
  for (j in (i+1):(ncol(credit)-1)) {
    if (chisq.test(table(credit[,i+1],credit[,j+1]))$p.value<0.05) {
      pairs <- c(pairs, c(i,j))
    }
  }
} #O(n²)
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
#pairs of variables that are rejected by Chi-squared Test of Independence
#such variables in a pair have a significant association
table(pairs)
## pairs
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 11 12 11 18 9 4 15 6 13 6 9 15 12 7 16 10 15 11 11 9
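Since the reminder above notes that explicit `for` loops are discouraged in R, the same pairwise screening could be written with `combn()` and `apply()`. This is a sketch under stated assumptions, not the code used for the results above: it runs on a small hypothetical data frame `toy` rather than on `credit`, and the names `col_pairs`, `p_values`, and `significant` are illustrative. Warnings are silenced with `suppressWarnings()` purely to keep the output short.

```r
set.seed(1)

# Hypothetical categorical data; column d copies a, so a and d
# are perfectly associated by construction.
toy <- data.frame(
  a = sample(1:3, 200, replace = TRUE),
  b = sample(1:2, 200, replace = TRUE),
  c = sample(1:4, 200, replace = TRUE)
)
toy$d <- toy$a

# Every unordered pair of column names, one pair per matrix column.
col_pairs <- combn(names(toy), 2)

# Chi-squared p-value for each pair.
p_values <- apply(col_pairs, 2, function(p) {
  suppressWarnings(chisq.test(table(toy[[p[1]]], toy[[p[2]]]))$p.value)
})

# Keep the pairs whose independence hypothesis is rejected at 5%,
# then count how often each variable appears among them.
significant <- col_pairs[, p_values < 0.05, drop = FALSE]
table(significant)
```

As with `table(pairs)` above, the final table counts how many significant partners each variable has.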
#REMINDER: declaration with dimensions will speed up the execution in R
chiTable <- data.frame(matrix(nrow=20,ncol=20))
for (i in 1:20) {
  for (j in 1:20) {
    if (chisq.test(table(credit[,i+1],credit[,j+1]))$p.value<0.05) {
      chiTable[i,j] <- "⬤"
    } else {
      chiTable[i,j] <- "◯"
    }
  }
}
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
## Warning in chisq.test(table(credit[, i + 1], credit[, j + 1])): Chi-squared
## approximation may be incorrect
colnames(chiTable) <- c(1:20)
chiTable$Variable <- c(1:20)
chiTable <- chiTable[,c(21,1:20)]
#full results of the chi-squared tests of independence: ⬤ reject the null hypothesis (the pair is dependent); ◯ fail to reject the null hypothesis
library(dplyr)
library(knitr)
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.6.2
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
chiTable %>% kable() %>% kable_styling(full_width=F)
| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ◯ |
| 2 | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ |
| 3 | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ◯ |
| 4 | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ |
| 5 | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ |
| 6 | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 7 | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ |
| 8 | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ⬤ |
| 9 | ◯ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ |
| 10 | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ◯ | ◯ | ⬤ | ◯ | ◯ | ⬤ |
| 11 | ⬤ | ◯ | ◯ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ◯ |
| 12 | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ |
| 13 | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ |
| 14 | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ◯ |
| 15 | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ |
| 16 | ⬤ | ◯ | ⬤ | ◯ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ◯ |
| 17 | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ | ⬤ |
| 18 | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ |
| 19 | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ⬤ |
| 20 | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ⬤ | ◯ | ◯ | ⬤ | ◯ | ⬤ | ⬤ | ⬤ | ⬤ |
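The table above is used in the next step to pick variables that are dependent on many others. As a minimal sketch of that idea, the chunk below runs pairwise chi-squared tests on a small toy data frame, stores the results in a logical matrix, and counts the dependent partners of each variable. The toy data, the names `toy`, `dependent` and `partner_count`, and the 0.05 significance level are assumptions for illustration only; they are not taken from the analysis above.

```r
# Toy stand-in for the predictors (hypothetical data, not creditData.csv)
set.seed(1)
n <- 200
a <- sample(1:3, n, replace = TRUE)
toy <- data.frame(
  a = a,
  b = a,                               # exact copy of a, so clearly dependent on it
  c = sample(1:2, n, replace = TRUE)   # drawn independently of a and b
)

p <- ncol(toy)
dependent <- matrix(FALSE, p, p, dimnames = list(names(toy), names(toy)))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    # suppressWarnings() hides "Chi-squared approximation may be incorrect",
    # which chisq.test() raises when expected cell counts are small
    test <- suppressWarnings(chisq.test(table(toy[, i], toy[, j])))
    dependent[i, j] <- test$p.value < 0.05   # assumed significance level
  }
}
diag(dependent) <- FALSE                 # ignore each variable paired with itself
partner_count <- rowSums(dependent)      # number of dependent partners per variable
partner_count
```

With 20 predictors the same count can be at most 19 per variable, and a cut-off such as 15 then selects the variables to drop.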
set.seed(12345)
#drop the five predictors (variables 4, 7, 12, 15 and 17, i.e. columns 5, 8, 13, 16 and 18) that are dependent on at least 15 of the other 19 predictors in the table above
creditR2 <- credit[order(runif(1000)),-c(5,8,13,16,18)]
creditTraining2 <- creditR2[1:750,]
creditTest2 <- creditR2[751:1000,]
library(naivebayes)
creditModelNB2 <- naive_bayes(Creditability~., data=creditTraining2)
creditModelNB2
##
## ================================ Naive Bayes =================================
##
## Call:
## naive_bayes.formula(formula = Creditability ~ ., data = creditTraining2)
##
## ------------------------------------------------------------------------------
##
## Laplace smoothing: 0
##
## ------------------------------------------------------------------------------
##
## A priori probabilities:
##
## 0 1
## 0.3146667 0.6853333
##
## ------------------------------------------------------------------------------
##
## Tables:
##
## ------------------------------------------------------------------------------
## ::: Account.Balance (Gaussian)
## ------------------------------------------------------------------------------
##
## Account.Balance 0 1
## mean 1.923729 2.793774
## sd 1.036826 1.252008
##
## ------------------------------------------------------------------------------
## ::: Duration.of.Credit..month. (Gaussian)
## ------------------------------------------------------------------------------
##
## Duration.of.Credit..month. 0 1
## mean 6.838983 7.671206
## sd 2.246398 1.819021
##
## ------------------------------------------------------------------------------
## ::: Payment.Status.of.Previous.Credit (Gaussian)
## ------------------------------------------------------------------------------
##
## Payment.Status.of.Previous.Credit 0 1
## mean 2.161017 2.665370
## sd 1.071649 1.045219
##
## ------------------------------------------------------------------------------
## ::: Credit.Amount (Gaussian)
## ------------------------------------------------------------------------------
##
## Credit.Amount 0 1
## mean 6.436441 6.756809
## sd 1.788910 1.435149
##
## ------------------------------------------------------------------------------
## ::: Value.Savings.Stocks (Gaussian)
## ------------------------------------------------------------------------------
##
## Value.Savings.Stocks 0 1
## mean 1.711864 2.334630
## sd 1.340700 1.674510
##
## ------------------------------------------------------------------------------
##
## # ... and 10 more tables
##
## ------------------------------------------------------------------------------
#accuracy on the training set
library(gmodels)
creditTrPredNB2 <- predict(creditModelNB2, creditTraining2, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
CrossTable(creditTraining2$Creditability, creditTrPredNB2, prop.chisq=F, prop.c=F, prop.r=F, dnn=c("Actual Creditability", "Predicted Creditability"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 750
##
##
## | Predicted Creditability
## Actual Creditability | 0 | 1 | Row Total |
## ---------------------|-----------|-----------|-----------|
## 0 | 153 | 83 | 236 |
## | 0.204 | 0.111 | |
## ---------------------|-----------|-----------|-----------|
## 1 | 131 | 383 | 514 |
## | 0.175 | 0.511 | |
## ---------------------|-----------|-----------|-----------|
## Column Total | 284 | 466 | 750 |
## ---------------------|-----------|-----------|-----------|
##
##
#accuracy on the test set
creditPredNB2 <- predict(creditModelNB2, creditTest2, type="class")
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
CrossTable(creditTest2$Creditability, creditPredNB2, prop.chisq=F, prop.c=F, prop.r=F, dnn=c("Actual Creditability", "Predicted Creditability"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 250
##
##
## | Predicted Creditability
## Actual Creditability | 0 | 1 | Row Total |
## ---------------------|-----------|-----------|-----------|
## 0 | 45 | 19 | 64 |
## | 0.180 | 0.076 | |
## ---------------------|-----------|-----------|-----------|
## 1 | 35 | 151 | 186 |
## | 0.140 | 0.604 | |
## ---------------------|-----------|-----------|-----------|
## Column Total | 80 | 170 | 250 |
## ---------------------|-----------|-----------|-----------|
##
##
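The two tables above give counts only, so the accuracy of creditModelNB2 still has to be read off the diagonals. A minimal sketch of that arithmetic follows; the counts are copied from the CrossTable output, and the names train_cm, test_cm, and accuracy_pct are illustrative, not part of the original script.

```r
# Counts copied from the two CrossTable outputs above
# (rows = actual Creditability, columns = predicted Creditability).
train_cm <- matrix(c(153,  83,
                     131, 383),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
test_cm <- matrix(c(45,  19,
                    35, 151),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Accuracy = correctly classified cases (the diagonal) / all cases, in percent
accuracy_pct <- function(cm) sum(diag(cm)) / sum(cm) * 100

train_acc <- accuracy_pct(train_cm)  # 536 / 750
test_acc  <- accuracy_pct(test_cm)   # 196 / 250
round(c(training = train_acc, test = test_acc), 2)
## training     test
##    71.47    78.40
```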
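# stringsAsFactors = TRUE keeps letter as a factor on R >= 4.0, matching the str() output below
letters <- read.csv("letterdata.csv", stringsAsFactors = TRUE)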
str(letters)
## 'data.frame': 20000 obs. of 17 variables:
## $ letter: Factor w/ 26 levels "A","B","C","D",..: 20 9 4 14 7 19 2 1 10 13 ...
## $ xbox : int 2 5 4 7 2 4 4 1 2 11 ...
## $ ybox : int 8 12 11 11 1 11 2 1 2 15 ...
## $ width : int 3 3 6 6 3 5 5 3 4 13 ...
## $ height: int 5 7 8 6 1 8 4 2 4 9 ...
## $ onpix : int 1 2 6 3 1 3 4 1 2 7 ...
## $ xbar : int 8 10 10 5 8 8 8 8 10 13 ...
## $ ybar : int 13 5 6 9 6 8 7 2 6 2 ...
## $ x2bar : int 0 5 2 4 6 6 6 2 2 6 ...
## $ y2bar : int 6 4 6 6 6 9 6 2 6 2 ...
## $ xybar : int 6 13 10 4 6 5 7 8 12 12 ...
## $ x2ybar: int 10 3 3 4 5 6 6 2 4 1 ...
## $ xy2bar: int 8 9 7 10 9 6 6 8 8 9 ...
## $ xedge : int 0 2 3 6 1 0 2 1 1 8 ...
## $ xedgey: int 8 8 7 10 7 8 8 6 6 1 ...
## $ yedge : int 0 4 3 2 5 9 7 2 1 1 ...
## $ yedgex: int 8 10 9 8 10 7 10 7 7 8 ...
There are 20,000 observations, so we have considerable flexibility in deciding how many to put in the training set and how many to keep for testing. We decided to use the first 18,000 for training, which leaves the remaining 2,000 for testing.
letters_train <- letters[1:18000, ]
letters_test <- letters[18001:20000, ]
Training the model
library(kernlab)
letter_classifier <- ksvm(letter ~ ., data = letters_train)
summary(letter_classifier)
## Length Class Mode
## 1 ksvm S4
Evaluating the model
letter_predictions <- predict(letter_classifier, letters_test)
(p <- table(letter_predictions,letters_test$letter))
##
## letter_predictions A B C D E F G H I J K L M N O P Q R
## A 75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
## B 0 67 0 2 0 1 0 0 0 0 0 1 0 1 0 2 1 1
## C 0 0 72 0 3 0 0 0 0 0 0 1 0 0 0 0 0 0
## D 1 1 0 71 0 0 1 2 2 2 1 0 0 0 0 2 1 1
## E 0 0 0 0 70 2 0 0 0 1 0 2 0 0 0 0 0 0
## F 0 0 0 0 0 76 0 0 3 0 0 0 0 0 0 6 0 0
## G 0 0 1 0 3 0 76 1 0 0 0 0 0 0 0 0 0 0
## H 0 0 0 1 0 0 1 58 0 1 0 1 1 0 0 0 1 1
## I 0 0 0 0 0 0 0 0 69 1 0 0 0 0 0 0 0 0
## J 0 0 0 0 0 0 0 0 2 66 0 0 0 0 0 0 0 0
## K 0 0 0 0 0 0 0 3 0 0 62 0 0 1 0 0 0 2
## L 0 0 0 0 0 0 1 0 0 0 0 69 0 0 0 0 0 0
## M 0 0 0 0 0 0 1 0 0 0 0 0 71 1 0 0 0 0
## N 0 0 0 0 0 1 0 0 0 0 0 0 0 78 0 0 0 0
## O 0 0 1 0 0 0 0 0 0 1 0 0 0 2 67 1 2 0
## P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 72 0 0
## Q 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3 1 65 0
## R 0 1 0 0 0 0 1 1 0 0 4 0 0 2 1 0 0 74
## S 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
## T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## U 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## W 0 0 1 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0
## X 0 1 0 0 0 0 0 0 0 0 2 4 0 0 0 0 0 0
## Y 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Z 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## letter_predictions S T U V W X Y Z
## A 0 1 0 0 0 0 0 0
## B 1 0 0 1 0 0 0 0
## C 0 0 0 0 0 0 0 0
## D 0 0 1 0 0 0 0 0
## E 0 0 0 0 0 0 0 0
## F 1 0 0 1 0 0 0 0
## G 0 0 0 0 0 0 0 0
## H 0 3 0 1 0 0 0 0
## I 0 0 0 0 0 2 0 0
## J 0 0 0 0 0 0 0 1
## K 0 0 0 0 0 0 0 0
## L 0 0 0 0 0 0 0 0
## M 0 0 0 0 2 0 0 0
## N 0 0 0 0 1 0 0 0
## O 0 0 0 0 0 0 0 0
## P 0 0 0 0 0 0 0 0
## Q 0 0 0 0 0 0 0 0
## R 0 1 0 0 0 0 0 0
## S 68 0 0 0 0 0 0 0
## T 0 88 0 0 0 0 1 0
## U 0 0 89 0 0 0 0 0
## V 0 0 0 68 0 0 1 0
## W 0 0 1 0 66 0 0 0
## X 0 0 0 0 0 84 1 0
## Y 0 1 0 0 0 0 65 0
## Z 1 0 0 0 0 0 0 81
(Accuracy <- sum(diag(p))/sum(p)*100)
## [1] 93.35
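Overall accuracy hides which letters the classifier struggles with. As a rough sketch, the share of each actual letter that was predicted correctly can be computed from a table laid out like p (rows = predictions, columns = actual letters). The 3 x 3 matrix p_toy below uses made-up counts purely for illustration; it is not taken from the results above.

```r
# Toy confusion matrix with hypothetical counts, same layout as p:
# rows = predicted letter, columns = actual letter.
p_toy <- matrix(c(8, 1, 0,
                  2, 9, 1,
                  0, 0, 9),
                nrow = 3, byrow = TRUE,
                dimnames = list(predicted = c("A", "B", "C"),
                                actual    = c("A", "B", "C")))

# Share of each actual letter that was classified correctly
per_letter <- diag(p_toy) / colSums(p_toy)
per_letter

# Letter with the lowest share of correct predictions
hardest <- names(which.min(per_letter))
hardest

# Overall accuracy in percent, computed the same way as in the report
overall <- sum(diag(p_toy)) / sum(p_toy) * 100
```

Applying diag(p) / colSums(p) to the real table would give the same breakdown for all 26 letters.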
Data preparation
news <- read.csv("OnlineNewsPopularity_for_R.csv")
newsShort <- data.frame(news$n_tokens_title, news$n_tokens_content, news$n_unique_tokens, news$n_non_stop_words, news$num_hrefs, news$num_imgs, news$num_videos, news$average_token_length, news$num_keywords, news$kw_max_max, news$global_sentiment_polarity, news$avg_positive_polarity, news$title_subjectivity, news$title_sentiment_polarity, news$abs_title_subjectivity, news$abs_title_sentiment_polarity, news$shares)
colnames(newsShort) <- c("n_tokens_title", "n_tokens_content", "n_unique_tokens", "n_non_stop_words", "num_hrefs", "num_imgs", "num_videos", "average_token_length", "num_keywords", "kw_max_max", "global_sentiment_polarity", "avg_positive_polarity", "title_subjectivity", "title_sentiment_polarity", "abs_title_subjectivity", "abs_title_sentiment_polarity", "shares")
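# Label articles with at least 1400 shares as popular ("yes"), all others as "no"
newsShort$popular <- ifelse(newsShort$shares >= 1400, "yes", "no")
# Overwrite shares with the label and convert it to a factor (the class variable).
# Note: popular stays in the data frame and duplicates shares, so shares ~ . uses it as a predictor.
newsShort$shares <- as.factor(newsShort$popular)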
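# Shuffle the first 10,000 rows (order(runif(10000)) only yields indices 1 to 10000).
# Note: set.seed() is called after runif(), so it does not make this shuffle reproducible.
news_rand <- newsShort[order(runif(10000)), ]
set.seed(12345)
# Split the data into training and test datasets
news_train <- news_rand[1:9000, ]
news_test <- news_rand[9001:10000, ]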
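The preparation above leaves the helper column popular in the data next to shares, and it sets the seed only after the random shuffle. A possible alternative, sketched on a small made-up data frame (toy_news and its values are hypothetical stand-ins for newsShort), recodes the target in place and seeds before shuffling. Unlike the original, this sketch shuffles all rows rather than only the first 10,000.

```r
set.seed(12345)  # seed first, so the shuffle below is reproducible

# Hypothetical stand-in for newsShort: two predictors plus the share count
toy_news <- data.frame(
  n_tokens_title = sample(5:15, 20, replace = TRUE),
  num_hrefs      = sample(0:30, 20, replace = TRUE),
  shares         = sample(100:5000, 20, replace = TRUE)
)

# Recode the target in place; no helper column is left behind as a predictor
toy_news$shares <- factor(ifelse(toy_news$shares >= 1400, "yes", "no"),
                          levels = c("no", "yes"))

# Shuffle all rows, then split 90/10 into training and test sets
toy_rand  <- toy_news[sample(nrow(toy_news)), ]
toy_train <- toy_rand[1:18, ]
toy_test  <- toy_rand[19:20, ]

names(toy_train)
## [1] "n_tokens_title" "num_hrefs"      "shares"
```

With this layout, a formula such as shares ~ . would only see genuine predictors.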
Model Design
nb_model <- naive_bayes(shares ~ ., data=news_train)
## Warning: naive_bayes(): Feature popular - zero probabilities are present.
## Consider Laplace smoothing.
nb_model
##
## ================================ Naive Bayes =================================
##
## Call:
## naive_bayes.formula(formula = shares ~ ., data = news_train)
##
## ------------------------------------------------------------------------------
##
## Laplace smoothing: 0
##
## ------------------------------------------------------------------------------
##
## A priori probabilities:
##
## no yes
## 0.4287778 0.5712222
##
## ------------------------------------------------------------------------------
##
## Tables:
##
## ------------------------------------------------------------------------------
## ::: n_tokens_title (Gaussian)
## ------------------------------------------------------------------------------
##
## n_tokens_title no yes
## mean 9.840891 9.697919
## sd 1.940023 1.988384
##
## ------------------------------------------------------------------------------
## ::: n_tokens_content (Gaussian)
## ------------------------------------------------------------------------------
##
## n_tokens_content no yes
## mean 453.4509 513.4272
## sd 351.7631 450.8013
##
## ------------------------------------------------------------------------------
## ::: n_unique_tokens (Gaussian)
## ------------------------------------------------------------------------------
##
## n_unique_tokens no yes
## mean 0.5707594 0.5538531
## sd 0.1125435 0.1235955
##
## ------------------------------------------------------------------------------
## ::: n_non_stop_words (Gaussian)
## ------------------------------------------------------------------------------
##
## n_non_stop_words no yes
## mean 0.99429904 0.99066329
## sd 0.07529892 0.09618384
##
## ------------------------------------------------------------------------------
## ::: num_hrefs (Gaussian)
## ------------------------------------------------------------------------------
##
## num_hrefs no yes
## mean 9.144079 10.620307
## sd 8.613435 11.641156
##
## ------------------------------------------------------------------------------
##
## # ... and 12 more tables
##
## ------------------------------------------------------------------------------
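The warning printed when the model was fitted suggests Laplace smoothing. As a minimal illustration of what that adjustment does, the sketch below adds a pseudo-count to made-up level counts of one categorical feature within one class; counts and alpha are hypothetical names and values. In the naivebayes package the corresponding setting is the laplace argument of naive_bayes(), which the model output reports as 0 here.

```r
# Hypothetical counts of a two-level categorical feature within one class
counts <- c(no = 0, yes = 12)
alpha  <- 1  # smoothing pseudo-count added to every level

# Without smoothing, the unseen level gets probability exactly 0
raw <- counts / sum(counts)

# With smoothing, every level gets a small non-zero probability
smoothed <- (counts + alpha) / (sum(counts) + alpha * length(counts))

round(rbind(raw, smoothed), 4)
##              no    yes
## raw      0.0000 1.0000
## smoothed 0.0714 0.9286
```

Smoothing only removes the zero probabilities. It would not change the fact that the flagged feature, popular, carries the same information as the class label.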
Evaluate the Model
news_Pred <- predict(nb_model, newdata = news_test)
## Warning: predict.naive_bayes(): More features in the newdata are provided
## as there are probability tables in the object. Calculation is performed
## based on features to be found in the tables.
(conf_nat <- table(news_Pred, news_test$shares))
##
## news_Pred no yes
## no 424 0
## yes 9 567
(Accuracy <- sum(diag(conf_nat))/sum(conf_nat)*100)
## [1] 99.1
Question 5: Do you see any improvement compared to the last three techniques? Please explain your results and analysis completely.
Answer 5: The accuracy rates on the training and test sets, taken from the confusion matrices of the seven models, are shown below. Judged by test-set accuracy, the best model is random forest. In general, the decision tree, random forest, regression tree, and support vector machine models achieve similar accuracy, around 60%.