Data reshuffling to randomize row order and avoid ordering bias
Data Preparation: Dividing the dataset into training (70%) and validation (30%) sets
## [1] 16103
## [1] 6902
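The two counts above are the row counts of the training and validation sets (16103 + 6902 = 23005 rows in total). The report's own code is not shown, so the following is only a minimal Python sketch of a shuffle-then-split that reproduces those sizes. The seed, the helper name `shuffle_split`, and the integer cut rule are assumptions made for illustration; only the 70/30 ratio and the two counts come from the output.

```python
import random

def shuffle_split(n_rows, train_tenths=7, seed=42):
    """Shuffle row indices, then cut them into training and validation parts.

    `seed` and `train_tenths` are illustrative assumptions, not the report's values.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # reshuffle so row order carries no bias
    cut = n_rows * train_tenths // 10      # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]

train_idx, valid_idx = shuffle_split(16103 + 6902)
print(len(train_idx), len(valid_idx))
```

Shuffling before cutting matters when the source file is sorted by class; otherwise one class could dominate either partition.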
Model Fitting C50
Model Prediction C50
## [1] No No No No Yes No
## Levels: No Yes
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682 0.00 1.78 0.00 0 0.00 0 0.00 0 0 1.78 0
## 20626 0.48 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## 16248 0.48 0.00 0.00 0 0.48 0 0.00 0 0 0.00 0
## 2883 0.00 0.00 1.47 0 0.00 0 0.00 0 0 0.00 0
## 18514 0.00 0.00 0.00 0 0.00 0 1.96 0 0 1.96 0
## 7359 0.00 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682 3.57 0.00 0 0 0 0 0.00 8.92 0 1.78
## 20626 0.96 0.00 0 0 0 0 0.48 0.96 0 0.00
## 16248 0.00 0.48 0 0 0 0 0.00 4.39 0 0.00
## 2883 2.94 0.00 0 0 0 0 0.00 0.00 0 1.47
## 18514 1.96 0.00 0 0 0 0 0.00 3.92 0 1.96
## 7359 0.32 0.00 0 0 0 0 0.32 1.28 0 0.32
## word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 20626 0 0 0 2.88 0.96 0.96 0.96 0.48 0.96 0.96
## 16248 0 0 0 0.48 0.00 0.48 0.00 2.92 0.00 0.00
## 2883 0 0 0 0.00 1.47 0.00 0.00 0.00 0.00 0.00
## 18514 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 7359 0 0 0 4.48 3.52 0.96 0.96 0.64 0.32 0.32
## word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682 0.00 0 0.00 0.00 0.00 0.00 0 1.78 0.00 0
## 20626 0.48 0 0.48 0.96 0.96 0.00 0 0.00 0.48 0
## 16248 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 2883 0.00 0 0.00 0.00 0.00 0.00 0 1.47 0.00 0
## 18514 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 7359 0.32 0 0.32 0.64 0.32 0.32 0 0.00 0.32 0
## word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682 0.00 0.00 0 1.78 0 0 0 0.000 0.000 0.000
## 20626 0.00 0.00 0 0.48 0 0 0 0.000 0.276 0.000
## 16248 0.97 0.00 0 0.00 0 0 0 0.000 0.085 0.000
## 2883 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 18514 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 7359 0.00 0.32 0 0.96 0 0 0 0.264 0.211 0.105
## word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682 0.000 0 0.000 2.388 21 43 No No
## 20626 0.138 0 0.000 1.986 11 147 No No
## 16248 0.000 0 0.000 1.275 3 37 No No
## 2883 0.000 0 0.000 2.928 16 41 No No
## 18514 0.000 0 0.000 6.166 60 74 Yes Yes
## 7359 0.052 0 0.105 2.258 15 192 No No
Model Evaluation C50
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 4194 20
## Yes 10 2678
##
## Accuracy : 0.9957
## 95% CI : (0.9938, 0.9971)
## No Information Rate : 0.6091
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9909
##
## Mcnemar's Test P-Value : 0.1003
##
## Sensitivity : 0.9976
## Specificity : 0.9926
## Pos Pred Value : 0.9953
## Neg Pred Value : 0.9963
## Prevalence : 0.6091
## Detection Rate : 0.6076
## Detection Prevalence : 0.6105
## Balanced Accuracy : 0.9951
##
## 'Positive' Class : No
##
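The statistics in this report can be re-derived from the four cells of the confusion matrix. The sketch below is an illustrative Python check, not the code that produced the report; the function name `binary_metrics` is hypothetical. It treats "No" as the positive class, as the output states, so a true positive is a message predicted "No" whose reference label is "No".

```python
def binary_metrics(tp, fn, fp, tn):
    """Derive the headline statistics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # share of actual positives that were found
    specificity = tn / (tn + fp)   # share of actual negatives that were found
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# C50 matrix with positive class "No":
#   tp = predicted No,  reference No  (4194)
#   fp = predicted No,  reference Yes (20)
#   fn = predicted Yes, reference No  (10)
#   tn = predicted Yes, reference Yes (2678)
c50 = binary_metrics(tp=4194, fn=10, fp=20, tn=2678)
for name, value in c50.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the values agree with the Accuracy, Sensitivity, Specificity, Pos Pred Value, Neg Pred Value, and Balanced Accuracy lines above.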
Model Fitting RPART
Model Prediction RPART
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682 0.00 1.78 0.00 0 0.00 0 0.00 0 0 1.78 0
## 20626 0.48 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## 16248 0.48 0.00 0.00 0 0.48 0 0.00 0 0 0.00 0
## 2883 0.00 0.00 1.47 0 0.00 0 0.00 0 0 0.00 0
## 18514 0.00 0.00 0.00 0 0.00 0 1.96 0 0 1.96 0
## 7359 0.00 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682 3.57 0.00 0 0 0 0 0.00 8.92 0 1.78
## 20626 0.96 0.00 0 0 0 0 0.48 0.96 0 0.00
## 16248 0.00 0.48 0 0 0 0 0.00 4.39 0 0.00
## 2883 2.94 0.00 0 0 0 0 0.00 0.00 0 1.47
## 18514 1.96 0.00 0 0 0 0 0.00 3.92 0 1.96
## 7359 0.32 0.00 0 0 0 0 0.32 1.28 0 0.32
## word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 20626 0 0 0 2.88 0.96 0.96 0.96 0.48 0.96 0.96
## 16248 0 0 0 0.48 0.00 0.48 0.00 2.92 0.00 0.00
## 2883 0 0 0 0.00 1.47 0.00 0.00 0.00 0.00 0.00
## 18514 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 7359 0 0 0 4.48 3.52 0.96 0.96 0.64 0.32 0.32
## word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682 0.00 0 0.00 0.00 0.00 0.00 0 1.78 0.00 0
## 20626 0.48 0 0.48 0.96 0.96 0.00 0 0.00 0.48 0
## 16248 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 2883 0.00 0 0.00 0.00 0.00 0.00 0 1.47 0.00 0
## 18514 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 7359 0.32 0 0.32 0.64 0.32 0.32 0 0.00 0.32 0
## word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682 0.00 0.00 0 1.78 0 0 0 0.000 0.000 0.000
## 20626 0.00 0.00 0 0.48 0 0 0 0.000 0.276 0.000
## 16248 0.97 0.00 0 0.00 0 0 0 0.000 0.085 0.000
## 2883 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 18514 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 7359 0.00 0.32 0 0.96 0 0 0 0.264 0.211 0.105
## word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682 0.000 0 0.000 2.388 21 43 No No
## 20626 0.138 0 0.000 1.986 11 147 No No
## 16248 0.000 0 0.000 1.275 3 37 No No
## 2883 0.000 0 0.000 2.928 16 41 No No
## 18514 0.000 0 0.000 6.166 60 74 Yes No
## 7359 0.052 0 0.105 2.258 15 192 No No
Model Evaluation RPART
## Warning in confusionMatrix.default(Validdataset$predictSpam, Validdataset$Spam):
## Levels are not in the same order for reference and data. Refactoring data to
## match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 4204 2698
## Yes 0 0
##
## Accuracy : 0.6091
## 95% CI : (0.5975, 0.6206)
## No Information Rate : 0.6091
## P-Value [Acc > NIR] : 0.5053
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.6091
## Neg Pred Value : NaN
## Prevalence : 0.6091
## Detection Rate : 0.6091
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : No
##
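The matrix above shows that the RPART model predicted "No" for every validation row, which is why accuracy equals the No Information Rate, Kappa is 0, and Neg Pred Value is NaN (it divides 0 by 0, since no row was predicted "Yes"). The Python sketch below illustrates the Kappa calculation under the usual Cohen's kappa definition; the function name `cohen_kappa` is hypothetical and this is not the report's own code.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square matrix where matrix[i][j] counts
    rows with prediction i and reference j."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals alone
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

rpart_matrix = [[4204, 2698], [0, 0]]    # every row predicted "No"
c50_matrix = [[4194, 20], [10, 2678]]
ctree_matrix = [[3979, 173], [225, 2525]]

for name, m in [("RPART", rpart_matrix), ("C50", c50_matrix), ("CTREE", ctree_matrix)]:
    print(name, round(cohen_kappa(m), 4))
```

When one prediction row is empty, observed agreement and chance agreement coincide, so Kappa collapses to zero regardless of how large the majority class is.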
Model Fitting CTREE
Model Prediction CTREE
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682 0.00 1.78 0.00 0 0.00 0 0.00 0 0 1.78 0
## 20626 0.48 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## 16248 0.48 0.00 0.00 0 0.48 0 0.00 0 0 0.00 0
## 2883 0.00 0.00 1.47 0 0.00 0 0.00 0 0 0.00 0
## 18514 0.00 0.00 0.00 0 0.00 0 1.96 0 0 1.96 0
## 7359 0.00 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682 3.57 0.00 0 0 0 0 0.00 8.92 0 1.78
## 20626 0.96 0.00 0 0 0 0 0.48 0.96 0 0.00
## 16248 0.00 0.48 0 0 0 0 0.00 4.39 0 0.00
## 2883 2.94 0.00 0 0 0 0 0.00 0.00 0 1.47
## 18514 1.96 0.00 0 0 0 0 0.00 3.92 0 1.96
## 7359 0.32 0.00 0 0 0 0 0.32 1.28 0 0.32
## word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 20626 0 0 0 2.88 0.96 0.96 0.96 0.48 0.96 0.96
## 16248 0 0 0 0.48 0.00 0.48 0.00 2.92 0.00 0.00
## 2883 0 0 0 0.00 1.47 0.00 0.00 0.00 0.00 0.00
## 18514 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 7359 0 0 0 4.48 3.52 0.96 0.96 0.64 0.32 0.32
## word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682 0.00 0 0.00 0.00 0.00 0.00 0 1.78 0.00 0
## 20626 0.48 0 0.48 0.96 0.96 0.00 0 0.00 0.48 0
## 16248 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 2883 0.00 0 0.00 0.00 0.00 0.00 0 1.47 0.00 0
## 18514 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 7359 0.32 0 0.32 0.64 0.32 0.32 0 0.00 0.32 0
## word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682 0.00 0.00 0 1.78 0 0 0 0.000 0.000 0.000
## 20626 0.00 0.00 0 0.48 0 0 0 0.000 0.276 0.000
## 16248 0.97 0.00 0 0.00 0 0 0 0.000 0.085 0.000
## 2883 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 18514 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 7359 0.00 0.32 0 0.96 0 0 0 0.264 0.211 0.105
## word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682 0.000 0 0.000 2.388 21 43 No No
## 20626 0.138 0 0.000 1.986 11 147 No No
## 16248 0.000 0 0.000 1.275 3 37 No No
## 2883 0.000 0 0.000 2.928 16 41 No No
## 18514 0.000 0 0.000 6.166 60 74 Yes Yes
## 7359 0.052 0 0.105 2.258 15 192 No No
Model Evaluation CTREE
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 3979 173
## Yes 225 2525
##
## Accuracy : 0.9423
## 95% CI : (0.9366, 0.9477)
## No Information Rate : 0.6091
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.8793
##
## Mcnemar's Test P-Value : 0.01058
##
## Sensitivity : 0.9465
## Specificity : 0.9359
## Pos Pred Value : 0.9583
## Neg Pred Value : 0.9182
## Prevalence : 0.6091
## Detection Rate : 0.5765
## Detection Prevalence : 0.6016
## Balanced Accuracy : 0.9412
##
## 'Positive' Class : No
##
Data Preparation for Binary Logistic Regression
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 21679 0 0.00 0.00 0 0 0 0 0 0 0.00 0
## 7750 0 0.00 0.00 0 0 0 0 0 0 0.00 0
## 8523 0 0.00 0.00 0 0 0 0 0 0 1.07 0
## 4919 0 0.00 1.31 0 0 0 0 0 0 0.00 0
## 22682 0 1.78 0.00 0 0 0 0 0 0 1.78 0
## 17062 0 0.00 0.00 0 0 0 0 0 0 0.00 0
## word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 21679 0.00 0 0.0 0 0 0.0 0 0.00 0 0.00
## 7750 0.00 0 2.5 0 0 0.0 0 7.50 0 0.00
## 8523 0.00 0 0.0 0 0 0.0 0 1.07 0 0.00
## 4919 1.31 0 0.0 0 0 0.0 0 1.31 0 5.26
## 22682 3.57 0 0.0 0 0 0.0 0 8.92 0 1.78
## 17062 1.40 0 0.0 0 0 0.7 0 1.40 0 1.40
## word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 21679 0 0 0.00 1.63 4.91 0.00 0.00 0 0 0.0
## 7750 0 0 0.00 0.00 0.00 0.00 0.00 0 0 0.0
## 8523 0 0 0.00 1.07 1.07 2.15 2.15 0 0 0.0
## 4919 0 0 1.31 0.00 0.00 0.00 0.00 0 0 0.0
## 22682 0 0 0.00 0.00 0.00 0.00 0.00 0 0 0.0
## 17062 0 0 0.00 0.00 0.00 0.70 0.00 0 0 0.7
## word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 21679 0 0 0 0 0.00 0.00 0 0.00 0.00 0
## 7750 0 0 0 0 0.00 0.00 0 0.00 0.00 0
## 8523 0 0 0 0 1.07 1.07 0 1.07 0.00 0
## 4919 0 0 0 0 0.00 0.00 0 0.00 0.00 0
## 22682 0 0 0 0 0.00 0.00 0 1.78 0.00 0
## 17062 0 0 0 0 0.00 0.00 0 0.00 2.11 0
## word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 21679 0 0.00 0 0.00 0 0 0 0 0.000 0.000
## 7750 0 0.00 0 2.50 0 0 0 0 0.000 0.000
## 8523 0 1.07 0 2.15 0 0 0 0 0.326 0.000
## 4919 0 0.00 0 0.00 0 0 0 0 0.000 0.000
## 22682 0 0.00 0 1.78 0 0 0 0 0.000 0.000
## 17062 0 0.00 0 0.00 0 0 0 0 0.267 0.066
## word52 word53 word54 word55 word56 word57 Spam
## 21679 0 0.000 0 1.480 6 37 0
## 7750 0 0.000 0 2.142 5 15 0
## 8523 0 0.000 0 2.700 12 108 0
## 4919 0 0.199 0 4.818 25 53 1
## 22682 0 0.000 0 2.388 21 43 0
## 17062 0 0.000 0 17.952 200 377 0
Model Fitting Binary Logistic Regression
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = Spam ~ ., family = "binomial", data = trainingdataset)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.1940 -0.2052 0.0000 0.1157 5.2029
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.600e+00 7.656e-02 -20.901 < 2e-16 ***
## word1 -3.156e-01 1.232e-01 -2.561 0.010432 *
## word2 -1.513e-01 3.912e-02 -3.868 0.000110 ***
## word3 9.892e-02 5.883e-02 1.681 0.092675 .
## word4 2.687e+00 9.145e-01 2.939 0.003298 **
## word5 5.650e-01 5.490e-02 10.291 < 2e-16 ***
## word6 7.937e-01 1.285e-01 6.178 6.50e-10 ***
## word7 2.238e+00 1.740e-01 12.863 < 2e-16 ***
## word8 5.759e-01 9.528e-02 6.044 1.51e-09 ***
## word9 7.026e-01 1.533e-01 4.583 4.57e-06 ***
## word10 1.418e-01 3.786e-02 3.744 0.000181 ***
## word11 -1.748e-01 1.602e-01 -1.091 0.275061
## word12 -1.285e-01 3.955e-02 -3.248 0.001161 **
## word13 -7.524e-02 1.214e-01 -0.620 0.535533
## word14 2.372e-01 8.605e-02 2.756 0.005850 **
## word15 1.298e+00 4.097e-01 3.169 0.001531 **
## word16 9.989e-01 7.750e-02 12.889 < 2e-16 ***
## word17 8.799e-01 1.139e-01 7.725 1.11e-14 ***
## word18 1.155e-01 6.258e-02 1.845 0.065019 .
## word19 6.644e-02 1.871e-02 3.550 0.000385 ***
## word20 1.078e+00 3.056e-01 3.526 0.000421 ***
## word21 2.541e-01 2.768e-02 9.181 < 2e-16 ***
## word22 2.310e-01 8.811e-02 2.621 0.008758 **
## word23 2.366e+00 2.587e-01 9.145 < 2e-16 ***
## word24 3.758e-01 7.469e-02 5.032 4.86e-07 ***
## word25 -1.808e+00 1.650e-01 -10.958 < 2e-16 ***
## word26 -1.115e+00 2.371e-01 -4.701 2.58e-06 ***
## word27 -1.115e+01 1.086e+00 -10.267 < 2e-16 ***
## word28 4.488e-01 1.088e-01 4.125 3.71e-05 ***
## word29 -2.471e+00 7.902e-01 -3.127 0.001764 **
## word30 -1.548e-01 1.548e-01 -1.000 0.317483
## word31 -7.939e-01 1.061e+00 -0.748 0.454264
## word32 3.474e+00 1.630e+00 2.131 0.033119 *
## word33 -6.648e-01 1.591e-01 -4.177 2.95e-05 ***
## word34 5.347e-01 7.804e-01 0.685 0.493282
## word35 -2.508e+00 4.670e-01 -5.370 7.88e-08 ***
## word36 9.408e-01 1.703e-01 5.525 3.29e-08 ***
## word37 -1.184e-01 1.004e-01 -1.179 0.238267
## word38 -3.733e-01 2.719e-01 -1.373 0.169778
## word39 -7.849e-01 2.022e-01 -3.882 0.000104 ***
## word40 -2.222e-01 1.990e-01 -1.117 0.264179
## word41 -4.634e+01 1.506e+01 -3.078 0.002086 **
## word42 -2.791e+00 4.616e-01 -6.047 1.48e-09 ***
## word43 -1.063e+00 4.026e-01 -2.641 0.008256 **
## word44 -1.656e+00 2.989e-01 -5.538 3.06e-08 ***
## word45 -7.077e-01 7.971e-02 -8.878 < 2e-16 ***
## word46 -1.372e+00 1.401e-01 -9.794 < 2e-16 ***
## word47 -1.674e+00 6.130e-01 -2.731 0.006316 **
## word48 -3.802e+00 8.736e-01 -4.352 1.35e-05 ***
## word49 -1.299e+00 2.360e-01 -5.505 3.70e-08 ***
## word50 -2.532e-01 1.303e-01 -1.943 0.052056 .
## word51 -5.192e-01 4.108e-01 -1.264 0.206319
## word52 5.946e-01 5.978e-02 9.948 < 2e-16 ***
## word53 5.542e+00 3.856e-01 14.370 < 2e-16 ***
## word54 2.011e+00 5.659e-01 3.553 0.000382 ***
## word55 1.081e-02 9.645e-03 1.121 0.262385
## word56 8.371e-03 1.312e-03 6.379 1.78e-10 ***
## word57 8.242e-04 1.207e-04 6.826 8.73e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21613.4 on 16102 degrees of freedom
## Residual deviance: 6363.3 on 16045 degrees of freedom
## AIC: 6479.3
##
## Number of Fisher Scoring iterations: 13
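The deviance, degrees-of-freedom, and AIC lines in the summary above are tied together by simple arithmetic. The Python sketch below is an illustrative consistency check using only numbers printed in the summary; it assumes a 0/1 response, for which the residual deviance equals minus twice the log-likelihood, so AIC is the residual deviance plus twice the number of estimated parameters.

```python
n_training_rows = 16103      # training set size reported earlier
n_predictors = 57            # word1 ... word57
n_parameters = n_predictors + 1   # plus the intercept

residual_deviance = 6363.3   # from the model summary

null_df = n_training_rows - 1                 # intercept-only model
residual_df = n_training_rows - n_parameters  # full model
aic = residual_deviance + 2 * n_parameters    # valid for a 0/1 binomial response

print(null_df, residual_df, round(aic, 1))
```

The three results agree with the 16102 and 16045 degrees of freedom and the AIC of 6479.3 shown in the summary.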
Model Prediction: Apply the model to the validation dataset with a cutoff probability of 0.5
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682 0.00 1.78 0.00 0 0.00 0 0.00 0 0 1.78 0
## 20626 0.48 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## 16248 0.48 0.00 0.00 0 0.48 0 0.00 0 0 0.00 0
## 2883 0.00 0.00 1.47 0 0.00 0 0.00 0 0 0.00 0
## 18514 0.00 0.00 0.00 0 0.00 0 1.96 0 0 1.96 0
## 7359 0.00 0.00 0.00 0 0.00 0 0.00 0 0 0.00 0
## word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682 3.57 0.00 0 0 0 0 0.00 8.92 0 1.78
## 20626 0.96 0.00 0 0 0 0 0.48 0.96 0 0.00
## 16248 0.00 0.48 0 0 0 0 0.00 4.39 0 0.00
## 2883 2.94 0.00 0 0 0 0 0.00 0.00 0 1.47
## 18514 1.96 0.00 0 0 0 0 0.00 3.92 0 1.96
## 7359 0.32 0.00 0 0 0 0 0.32 1.28 0 0.32
## word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 20626 0 0 0 2.88 0.96 0.96 0.96 0.48 0.96 0.96
## 16248 0 0 0 0.48 0.00 0.48 0.00 2.92 0.00 0.00
## 2883 0 0 0 0.00 1.47 0.00 0.00 0.00 0.00 0.00
## 18514 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 7359 0 0 0 4.48 3.52 0.96 0.96 0.64 0.32 0.32
## word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682 0.00 0 0.00 0.00 0.00 0.00 0 1.78 0.00 0
## 20626 0.48 0 0.48 0.96 0.96 0.00 0 0.00 0.48 0
## 16248 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 2883 0.00 0 0.00 0.00 0.00 0.00 0 1.47 0.00 0
## 18514 0.00 0 0.00 0.00 0.00 0.00 0 0.00 0.00 0
## 7359 0.32 0 0.32 0.64 0.32 0.32 0 0.00 0.32 0
## word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682 0.00 0.00 0 1.78 0 0 0 0.000 0.000 0.000
## 20626 0.00 0.00 0 0.48 0 0 0 0.000 0.276 0.000
## 16248 0.97 0.00 0 0.00 0 0 0 0.000 0.085 0.000
## 2883 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 18514 0.00 0.00 0 0.00 0 0 0 0.000 0.000 0.000
## 7359 0.00 0.32 0 0.96 0 0 0 0.264 0.211 0.105
## word52 word53 word54 word55 word56 word57 Spam predictedSpam
## 22682 0.000 0 0.000 2.388 21 43 0 0
## 20626 0.138 0 0.000 1.986 11 147 0 0
## 16248 0.000 0 0.000 1.275 3 37 0 0
## 2883 0.000 0 0.000 2.928 16 41 0 0
## 18514 0.000 0 0.000 6.166 60 74 1 1
## 7359 0.052 0 0.105 2.258 15 192 0 0
Model Evaluation Binary Logistic Regression
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 4025 276
## 1 179 2422
##
## Accuracy : 0.9341
## 95% CI : (0.928, 0.9398)
## No Information Rate : 0.6091
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8607
##
## Mcnemar's Test P-Value : 6.778e-06
##
## Sensitivity : 0.8977
## Specificity : 0.9574
## Pos Pred Value : 0.9312
## Neg Pred Value : 0.9358
## Prevalence : 0.3909
## Detection Rate : 0.3509
## Detection Prevalence : 0.3768
## Balanced Accuracy : 0.9276
##
## 'Positive' Class : 1
##
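Note that this report treats "1" (spam) as the positive class, whereas the tree-model reports treat "No" (not spam) as positive, so the Sensitivity and Specificity lines are not directly comparable across models. The Python sketch below is an illustrative check of how the two values swap when the positive class changes; the helper name `sensitivity_specificity` is hypothetical.

```python
# Logistic regression confusion matrix; keys are (prediction, reference).
counts = {(0, 0): 4025, (0, 1): 276, (1, 0): 179, (1, 1): 2422}

def sensitivity_specificity(counts, positive):
    """Return (sensitivity, specificity) for the chosen positive label (0 or 1)."""
    negative = 1 - positive
    tp = counts[(positive, positive)]
    fn = counts[(negative, positive)]
    tn = counts[(negative, negative)]
    fp = counts[(positive, negative)]
    return tp / (tp + fn), tn / (tn + fp)

spam_positive = sensitivity_specificity(counts, positive=1)
ham_positive = sensitivity_specificity(counts, positive=0)
print([round(v, 4) for v in spam_positive])
print([round(v, 4) for v in ham_positive])
```

With spam as the positive class the pair matches the report above; with non-spam as positive, as in the tree-model reports, the same matrix gives the two values in the opposite order.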
New Data Prediction
## [1] "New dataset created and displayed below"
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11 word12
## 1 0.23 0.23 0 0 0 0.23 0 0.23 0 0.23 0 0
## word13 word14 word15 word16 word17 word18 word19 word20 word21 word22 word23
## 1 0.23 0.23 0 0.23 0 0 0 0.23 0 0.23 0
## word24 word25 word26 word27 word28 word29 word30 word31 word32 word33 word34
## 1 0 0.23 0 0 0 0.23 0 0.23 0 0.23 0
## word35 word36 word37 word38 word39 word40 word41 word42 word43 word44 word45
## 1 0 0.23 0 0 0 0 0 0 0 0 0
## word46 word47 word48 word49 word50 word51 word52 word53 word54 word55 word56
## 1 0 0 0 0 0 3 0.23 0.93 0 4.23 298
## word57
## 1 2222
## 1
## 0.9979932
## [1] "Prediction done on new dataset"
## word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11 word12
## 1 0.23 0.23 0 0 0 0.23 0 0.23 0 0.23 0 0
## word13 word14 word15 word16 word17 word18 word19 word20 word21 word22 word23
## 1 0.23 0.23 0 0.23 0 0 0 0.23 0 0.23 0
## word24 word25 word26 word27 word28 word29 word30 word31 word32 word33 word34
## 1 0 0.23 0 0 0 0.23 0 0.23 0 0.23 0
## word35 word36 word37 word38 word39 word40 word41 word42 word43 word44 word45
## 1 0 0.23 0 0 0 0 0 0 0 0 0
## word46 word47 word48 word49 word50 word51 word52 word53 word54 word55 word56
## 1 0 0 0 0 0 3 0.23 0.93 0 4.23 298
## word57 PredictSpam
## 1 2222 1
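The final PredictSpam value follows from applying the 0.5 cutoff to the predicted probability of 0.9979932 printed above. The Python sketch below illustrates that step. The function names are hypothetical, and whether the original code uses a strict or non-strict comparison at exactly 0.5 is not shown in the output, so the strict `>` here is an assumption.

```python
import math

def logistic(linear_predictor):
    """Map a linear predictor (log-odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def classify(probability, cutoff=0.5):
    """Label a message 1 (spam) when its probability exceeds the cutoff."""
    return 1 if probability > cutoff else 0   # strict '>' is an assumption

predicted_probability = 0.9979932   # value printed for the new observation
print(classify(predicted_probability))
```

A probability of 0.5 corresponds to log-odds of zero, so the cutoff amounts to asking whether the fitted linear predictor is positive.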