Data reshuffling: randomly reorder the rows so that any ordering in the original data does not bias the training/validation split
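A minimal sketch of the reshuffling step, written in Python rather than the R used to produce this report. Shuffling the row indices with a fixed seed makes the reordering reproducible. The helper name `shuffled_indices` and the seed 42 are hypothetical; 23005 is the sum of the training and validation sizes reported for the split (16103 + 6902).

```python
import random

def shuffled_indices(n_rows, seed=42):
    """Hypothetical helper: return a reproducible random permutation of 0..n_rows-1."""
    rng = random.Random(seed)   # private generator, so the shuffle can be repeated exactly
    idx = list(range(n_rows))
    rng.shuffle(idx)            # in-place Fisher-Yates shuffle
    return idx

order = shuffled_indices(23005)
print(order[:5])
```

Re-running with the same seed returns the same order, which keeps the downstream split and model results reproducible.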

Data Preparation: Dividing the dataset into a training set (70%) and a validation set (30%)

## [1] 16103
## [1] 6902
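The 70/30 split can be sketched the same way in Python (the helper `train_valid_split` and its seed are hypothetical): shuffle, then cut the list after the first 70% of rows. Truncating 0.7 × 23005 = 16103.5 to an integer gives 16103 training rows and leaves 6902 for validation, matching the counts above; that the original R code truncated in this way is an assumption.

```python
import random

def train_valid_split(rows, train_frac=0.7, seed=42):
    """Hypothetical helper: shuffle a copy of `rows`, then split it into
    (training, validation) with `train_frac` of the rows in training."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(train_frac * len(rows))  # int() truncates toward zero
    return rows[:n_train], rows[n_train:]

train, valid = train_valid_split(range(23005))
print(len(train), len(valid))
```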

Model Fitting C50

Model Prediction C50

## [1] No  No  No  No  Yes No 
## Levels: No Yes
##       word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682  0.00  1.78  0.00     0  0.00     0  0.00     0     0   1.78      0
## 20626  0.48  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
## 16248  0.48  0.00  0.00     0  0.48     0  0.00     0     0   0.00      0
## 2883   0.00  0.00  1.47     0  0.00     0  0.00     0     0   0.00      0
## 18514  0.00  0.00  0.00     0  0.00     0  1.96     0     0   1.96      0
## 7359   0.00  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
##       word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682   3.57   0.00      0      0      0      0   0.00   8.92      0   1.78
## 20626   0.96   0.00      0      0      0      0   0.48   0.96      0   0.00
## 16248   0.00   0.48      0      0      0      0   0.00   4.39      0   0.00
## 2883    2.94   0.00      0      0      0      0   0.00   0.00      0   1.47
## 18514   1.96   0.00      0      0      0      0   0.00   3.92      0   1.96
## 7359    0.32   0.00      0      0      0      0   0.32   1.28      0   0.32
##       word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 20626      0      0      0   2.88   0.96   0.96   0.96   0.48   0.96   0.96
## 16248      0      0      0   0.48   0.00   0.48   0.00   2.92   0.00   0.00
## 2883       0      0      0   0.00   1.47   0.00   0.00   0.00   0.00   0.00
## 18514      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 7359       0      0      0   4.48   3.52   0.96   0.96   0.64   0.32   0.32
##       word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682   0.00      0   0.00   0.00   0.00   0.00      0   1.78   0.00      0
## 20626   0.48      0   0.48   0.96   0.96   0.00      0   0.00   0.48      0
## 16248   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 2883    0.00      0   0.00   0.00   0.00   0.00      0   1.47   0.00      0
## 18514   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 7359    0.32      0   0.32   0.64   0.32   0.32      0   0.00   0.32      0
##       word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682   0.00   0.00      0   1.78      0      0      0  0.000  0.000  0.000
## 20626   0.00   0.00      0   0.48      0      0      0  0.000  0.276  0.000
## 16248   0.97   0.00      0   0.00      0      0      0  0.000  0.085  0.000
## 2883    0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 18514   0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 7359    0.00   0.32      0   0.96      0      0      0  0.264  0.211  0.105
##       word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682  0.000      0  0.000  2.388     21     43   No          No
## 20626  0.138      0  0.000  1.986     11    147   No          No
## 16248  0.000      0  0.000  1.275      3     37   No          No
## 2883   0.000      0  0.000  2.928     16     41   No          No
## 18514  0.000      0  0.000  6.166     60     74  Yes         Yes
## 7359   0.052      0  0.105  2.258     15    192   No          No

Model Evaluation C50

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  4194   20
##        Yes   10 2678
##                                           
##                Accuracy : 0.9957          
##                  95% CI : (0.9938, 0.9971)
##     No Information Rate : 0.6091          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9909          
##                                           
##  Mcnemar's Test P-Value : 0.1003          
##                                           
##             Sensitivity : 0.9976          
##             Specificity : 0.9926          
##          Pos Pred Value : 0.9953          
##          Neg Pred Value : 0.9963          
##              Prevalence : 0.6091          
##          Detection Rate : 0.6076          
##    Detection Prevalence : 0.6105          
##       Balanced Accuracy : 0.9951          
##                                           
##        'Positive' Class : No              
## 
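Each statistic above follows from the four cells of the matrix. The Python sketch below (`binary_metrics` is a hypothetical helper) treats "No" as the positive class, as the output states, and reproduces the printed C5.0 values when rounded to four decimals. With "No" as positive, Sensitivity is the share of legitimate mail classified correctly and Specificity is the share of spam caught.

```python
def binary_metrics(tp, fn, fp, tn):
    """Hypothetical helper: confusion-matrix statistics for a chosen positive class."""
    n = tp + fn + fp + tn
    sens = tp / (tp + fn)   # recall of the positive class
    spec = tn / (tn + fp)   # recall of the negative class
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sens,
        "specificity": spec,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sens + spec) / 2,
    }

# C5.0 matrix with "No" as positive:
# predicted No/actual No = 4194, predicted Yes/actual No = 10,
# predicted No/actual Yes = 20, predicted Yes/actual Yes = 2678
c50 = binary_metrics(tp=4194, fn=10, fp=20, tn=2678)
for name, value in c50.items():
    print(f"{name:>22}: {value:.4f}")
```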

Model Fitting RPART

Model Prediction RPART

##       word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682  0.00  1.78  0.00     0  0.00     0  0.00     0     0   1.78      0
## 20626  0.48  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
## 16248  0.48  0.00  0.00     0  0.48     0  0.00     0     0   0.00      0
## 2883   0.00  0.00  1.47     0  0.00     0  0.00     0     0   0.00      0
## 18514  0.00  0.00  0.00     0  0.00     0  1.96     0     0   1.96      0
## 7359   0.00  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
##       word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682   3.57   0.00      0      0      0      0   0.00   8.92      0   1.78
## 20626   0.96   0.00      0      0      0      0   0.48   0.96      0   0.00
## 16248   0.00   0.48      0      0      0      0   0.00   4.39      0   0.00
## 2883    2.94   0.00      0      0      0      0   0.00   0.00      0   1.47
## 18514   1.96   0.00      0      0      0      0   0.00   3.92      0   1.96
## 7359    0.32   0.00      0      0      0      0   0.32   1.28      0   0.32
##       word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 20626      0      0      0   2.88   0.96   0.96   0.96   0.48   0.96   0.96
## 16248      0      0      0   0.48   0.00   0.48   0.00   2.92   0.00   0.00
## 2883       0      0      0   0.00   1.47   0.00   0.00   0.00   0.00   0.00
## 18514      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 7359       0      0      0   4.48   3.52   0.96   0.96   0.64   0.32   0.32
##       word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682   0.00      0   0.00   0.00   0.00   0.00      0   1.78   0.00      0
## 20626   0.48      0   0.48   0.96   0.96   0.00      0   0.00   0.48      0
## 16248   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 2883    0.00      0   0.00   0.00   0.00   0.00      0   1.47   0.00      0
## 18514   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 7359    0.32      0   0.32   0.64   0.32   0.32      0   0.00   0.32      0
##       word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682   0.00   0.00      0   1.78      0      0      0  0.000  0.000  0.000
## 20626   0.00   0.00      0   0.48      0      0      0  0.000  0.276  0.000
## 16248   0.97   0.00      0   0.00      0      0      0  0.000  0.085  0.000
## 2883    0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 18514   0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 7359    0.00   0.32      0   0.96      0      0      0  0.264  0.211  0.105
##       word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682  0.000      0  0.000  2.388     21     43   No          No
## 20626  0.138      0  0.000  1.986     11    147   No          No
## 16248  0.000      0  0.000  1.275      3     37   No          No
## 2883   0.000      0  0.000  2.928     16     41   No          No
## 18514  0.000      0  0.000  6.166     60     74  Yes          No
## 7359   0.052      0  0.105  2.258     15    192   No          No

Model Evaluation RPART

## Warning in confusionMatrix.default(Validdataset$predictSpam, Validdataset$Spam):
## Levels are not in the same order for reference and data. Refactoring data to
## match.
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  4204 2698
##        Yes    0    0
##                                           
##                Accuracy : 0.6091          
##                  95% CI : (0.5975, 0.6206)
##     No Information Rate : 0.6091          
##     P-Value [Acc > NIR] : 0.5053          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.0000          
##          Pos Pred Value : 0.6091          
##          Neg Pred Value :    NaN          
##              Prevalence : 0.6091          
##          Detection Rate : 0.6091          
##    Detection Prevalence : 1.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : No              
## 
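RPART predicted "No" for every validation row (the "Yes" row of its matrix is all zeros), so its accuracy equals the No Information Rate of 0.6091 and Kappa is 0. Cohen's kappa compares the observed agreement with the agreement expected by chance from the row and column totals. A minimal Python sketch (`cohens_kappa` is a hypothetical helper) reproduces both the RPART and the C5.0 values:

```python
def cohens_kappa(matrix):
    """Hypothetical helper: Cohen's kappa for a square confusion matrix
    laid out as rows = predicted class, columns = reference class."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)

rpart = [[4204, 2698], [0, 0]]   # every row predicted "No"
c50 = [[4194, 20], [10, 2678]]
print(cohens_kappa(rpart))
print(round(cohens_kappa(c50), 4))
```

A model that always predicts the majority class agrees with the reference exactly as often as chance predicts, so its kappa is zero even though its accuracy is above 60%.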

Model Fitting CTREE

Model Prediction CTREE

##       word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682  0.00  1.78  0.00     0  0.00     0  0.00     0     0   1.78      0
## 20626  0.48  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
## 16248  0.48  0.00  0.00     0  0.48     0  0.00     0     0   0.00      0
## 2883   0.00  0.00  1.47     0  0.00     0  0.00     0     0   0.00      0
## 18514  0.00  0.00  0.00     0  0.00     0  1.96     0     0   1.96      0
## 7359   0.00  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
##       word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682   3.57   0.00      0      0      0      0   0.00   8.92      0   1.78
## 20626   0.96   0.00      0      0      0      0   0.48   0.96      0   0.00
## 16248   0.00   0.48      0      0      0      0   0.00   4.39      0   0.00
## 2883    2.94   0.00      0      0      0      0   0.00   0.00      0   1.47
## 18514   1.96   0.00      0      0      0      0   0.00   3.92      0   1.96
## 7359    0.32   0.00      0      0      0      0   0.32   1.28      0   0.32
##       word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 20626      0      0      0   2.88   0.96   0.96   0.96   0.48   0.96   0.96
## 16248      0      0      0   0.48   0.00   0.48   0.00   2.92   0.00   0.00
## 2883       0      0      0   0.00   1.47   0.00   0.00   0.00   0.00   0.00
## 18514      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 7359       0      0      0   4.48   3.52   0.96   0.96   0.64   0.32   0.32
##       word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682   0.00      0   0.00   0.00   0.00   0.00      0   1.78   0.00      0
## 20626   0.48      0   0.48   0.96   0.96   0.00      0   0.00   0.48      0
## 16248   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 2883    0.00      0   0.00   0.00   0.00   0.00      0   1.47   0.00      0
## 18514   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 7359    0.32      0   0.32   0.64   0.32   0.32      0   0.00   0.32      0
##       word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682   0.00   0.00      0   1.78      0      0      0  0.000  0.000  0.000
## 20626   0.00   0.00      0   0.48      0      0      0  0.000  0.276  0.000
## 16248   0.97   0.00      0   0.00      0      0      0  0.000  0.085  0.000
## 2883    0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 18514   0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 7359    0.00   0.32      0   0.96      0      0      0  0.264  0.211  0.105
##       word52 word53 word54 word55 word56 word57 Spam predictSpam
## 22682  0.000      0  0.000  2.388     21     43   No          No
## 20626  0.138      0  0.000  1.986     11    147   No          No
## 16248  0.000      0  0.000  1.275      3     37   No          No
## 2883   0.000      0  0.000  2.928     16     41   No          No
## 18514  0.000      0  0.000  6.166     60     74  Yes         Yes
## 7359   0.052      0  0.105  2.258     15    192   No          No

Model Evaluation CTREE

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  3979  173
##        Yes  225 2525
##                                           
##                Accuracy : 0.9423          
##                  95% CI : (0.9366, 0.9477)
##     No Information Rate : 0.6091          
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.8793          
##                                           
##  Mcnemar's Test P-Value : 0.01058         
##                                           
##             Sensitivity : 0.9465          
##             Specificity : 0.9359          
##          Pos Pred Value : 0.9583          
##          Neg Pred Value : 0.9182          
##              Prevalence : 0.6091          
##          Detection Rate : 0.5765          
##    Detection Prevalence : 0.6016          
##       Balanced Accuracy : 0.9412          
##                                           
##        'Positive' Class : No              
## 
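McNemar's test checks whether the two kinds of error (the off-diagonal cells) are equally frequent. A minimal Python sketch, assuming the continuity-corrected chi-squared form that R's `mcnemar.test()` applies by default, reproduces the CTREE p-value from its 225 legitimate messages flagged as spam and 173 spam messages missed. The same hypothetical helper gives about 0.1003 for C5.0 (10 and 20) and about 6.8e-06 for the logistic model (179 and 276).

```python
import math

def mcnemar_p(b, c, correct=True):
    """Hypothetical helper: McNemar p-value from the two off-diagonal counts
    of a 2x2 confusion matrix, using the chi-squared approximation (1 df)."""
    diff = abs(b - c) - (1 if correct else 0)   # continuity correction
    stat = diff ** 2 / (b + c)
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(stat / 2))

print(f"CTREE: {mcnemar_p(225, 173):.4g}")
```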

Data Preparation for Binary Logistic Regression

##       word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 21679     0  0.00  0.00     0     0     0     0     0     0   0.00      0
## 7750      0  0.00  0.00     0     0     0     0     0     0   0.00      0
## 8523      0  0.00  0.00     0     0     0     0     0     0   1.07      0
## 4919      0  0.00  1.31     0     0     0     0     0     0   0.00      0
## 22682     0  1.78  0.00     0     0     0     0     0     0   1.78      0
## 17062     0  0.00  0.00     0     0     0     0     0     0   0.00      0
##       word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 21679   0.00      0    0.0      0      0    0.0      0   0.00      0   0.00
## 7750    0.00      0    2.5      0      0    0.0      0   7.50      0   0.00
## 8523    0.00      0    0.0      0      0    0.0      0   1.07      0   0.00
## 4919    1.31      0    0.0      0      0    0.0      0   1.31      0   5.26
## 22682   3.57      0    0.0      0      0    0.0      0   8.92      0   1.78
## 17062   1.40      0    0.0      0      0    0.7      0   1.40      0   1.40
##       word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 21679      0      0   0.00   1.63   4.91   0.00   0.00      0      0    0.0
## 7750       0      0   0.00   0.00   0.00   0.00   0.00      0      0    0.0
## 8523       0      0   0.00   1.07   1.07   2.15   2.15      0      0    0.0
## 4919       0      0   1.31   0.00   0.00   0.00   0.00      0      0    0.0
## 22682      0      0   0.00   0.00   0.00   0.00   0.00      0      0    0.0
## 17062      0      0   0.00   0.00   0.00   0.70   0.00      0      0    0.7
##       word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 21679      0      0      0      0   0.00   0.00      0   0.00   0.00      0
## 7750       0      0      0      0   0.00   0.00      0   0.00   0.00      0
## 8523       0      0      0      0   1.07   1.07      0   1.07   0.00      0
## 4919       0      0      0      0   0.00   0.00      0   0.00   0.00      0
## 22682      0      0      0      0   0.00   0.00      0   1.78   0.00      0
## 17062      0      0      0      0   0.00   0.00      0   0.00   2.11      0
##       word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 21679      0   0.00      0   0.00      0      0      0      0  0.000  0.000
## 7750       0   0.00      0   2.50      0      0      0      0  0.000  0.000
## 8523       0   1.07      0   2.15      0      0      0      0  0.326  0.000
## 4919       0   0.00      0   0.00      0      0      0      0  0.000  0.000
## 22682      0   0.00      0   1.78      0      0      0      0  0.000  0.000
## 17062      0   0.00      0   0.00      0      0      0      0  0.267  0.066
##       word52 word53 word54 word55 word56 word57 Spam
## 21679      0  0.000      0  1.480      6     37    0
## 7750       0  0.000      0  2.142      5     15    0
## 8523       0  0.000      0  2.700     12    108    0
## 4919       0  0.199      0  4.818     25     53    1
## 22682      0  0.000      0  2.388     21     43    0
## 17062      0  0.000      0 17.952    200    377    0
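The Spam column is now numeric: 0 where it was previously "No" and 1 where it was "Yes" (row 22682, labeled No in the tree outputs, appears here as 0). A binomial logistic model then estimates the probability that Spam = 1. A minimal Python sketch of the recoding, with `encode_spam` as a hypothetical helper:

```python
def encode_spam(labels, positive="Yes"):
    """Hypothetical helper: map the 'No'/'Yes' Spam labels to 0/1 so the
    logistic model estimates P(Spam = 1), the probability of spam."""
    return [1 if label == positive else 0 for label in labels]

print(encode_spam(["No", "No", "No", "Yes", "No", "No"]))
```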

Model Fitting Binary Logistic Regression

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = Spam ~ ., family = "binomial", data = trainingdataset)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.1940  -0.2052   0.0000   0.1157   5.2029  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.600e+00  7.656e-02 -20.901  < 2e-16 ***
## word1       -3.156e-01  1.232e-01  -2.561 0.010432 *  
## word2       -1.513e-01  3.912e-02  -3.868 0.000110 ***
## word3        9.892e-02  5.883e-02   1.681 0.092675 .  
## word4        2.687e+00  9.145e-01   2.939 0.003298 ** 
## word5        5.650e-01  5.490e-02  10.291  < 2e-16 ***
## word6        7.937e-01  1.285e-01   6.178 6.50e-10 ***
## word7        2.238e+00  1.740e-01  12.863  < 2e-16 ***
## word8        5.759e-01  9.528e-02   6.044 1.51e-09 ***
## word9        7.026e-01  1.533e-01   4.583 4.57e-06 ***
## word10       1.418e-01  3.786e-02   3.744 0.000181 ***
## word11      -1.748e-01  1.602e-01  -1.091 0.275061    
## word12      -1.285e-01  3.955e-02  -3.248 0.001161 ** 
## word13      -7.524e-02  1.214e-01  -0.620 0.535533    
## word14       2.372e-01  8.605e-02   2.756 0.005850 ** 
## word15       1.298e+00  4.097e-01   3.169 0.001531 ** 
## word16       9.989e-01  7.750e-02  12.889  < 2e-16 ***
## word17       8.799e-01  1.139e-01   7.725 1.11e-14 ***
## word18       1.155e-01  6.258e-02   1.845 0.065019 .  
## word19       6.644e-02  1.871e-02   3.550 0.000385 ***
## word20       1.078e+00  3.056e-01   3.526 0.000421 ***
## word21       2.541e-01  2.768e-02   9.181  < 2e-16 ***
## word22       2.310e-01  8.811e-02   2.621 0.008758 ** 
## word23       2.366e+00  2.587e-01   9.145  < 2e-16 ***
## word24       3.758e-01  7.469e-02   5.032 4.86e-07 ***
## word25      -1.808e+00  1.650e-01 -10.958  < 2e-16 ***
## word26      -1.115e+00  2.371e-01  -4.701 2.58e-06 ***
## word27      -1.115e+01  1.086e+00 -10.267  < 2e-16 ***
## word28       4.488e-01  1.088e-01   4.125 3.71e-05 ***
## word29      -2.471e+00  7.902e-01  -3.127 0.001764 ** 
## word30      -1.548e-01  1.548e-01  -1.000 0.317483    
## word31      -7.939e-01  1.061e+00  -0.748 0.454264    
## word32       3.474e+00  1.630e+00   2.131 0.033119 *  
## word33      -6.648e-01  1.591e-01  -4.177 2.95e-05 ***
## word34       5.347e-01  7.804e-01   0.685 0.493282    
## word35      -2.508e+00  4.670e-01  -5.370 7.88e-08 ***
## word36       9.408e-01  1.703e-01   5.525 3.29e-08 ***
## word37      -1.184e-01  1.004e-01  -1.179 0.238267    
## word38      -3.733e-01  2.719e-01  -1.373 0.169778    
## word39      -7.849e-01  2.022e-01  -3.882 0.000104 ***
## word40      -2.222e-01  1.990e-01  -1.117 0.264179    
## word41      -4.634e+01  1.506e+01  -3.078 0.002086 ** 
## word42      -2.791e+00  4.616e-01  -6.047 1.48e-09 ***
## word43      -1.063e+00  4.026e-01  -2.641 0.008256 ** 
## word44      -1.656e+00  2.989e-01  -5.538 3.06e-08 ***
## word45      -7.077e-01  7.971e-02  -8.878  < 2e-16 ***
## word46      -1.372e+00  1.401e-01  -9.794  < 2e-16 ***
## word47      -1.674e+00  6.130e-01  -2.731 0.006316 ** 
## word48      -3.802e+00  8.736e-01  -4.352 1.35e-05 ***
## word49      -1.299e+00  2.360e-01  -5.505 3.70e-08 ***
## word50      -2.532e-01  1.303e-01  -1.943 0.052056 .  
## word51      -5.192e-01  4.108e-01  -1.264 0.206319    
## word52       5.946e-01  5.978e-02   9.948  < 2e-16 ***
## word53       5.542e+00  3.856e-01  14.370  < 2e-16 ***
## word54       2.011e+00  5.659e-01   3.553 0.000382 ***
## word55       1.081e-02  9.645e-03   1.121 0.262385    
## word56       8.371e-03  1.312e-03   6.379 1.78e-10 ***
## word57       8.242e-04  1.207e-04   6.826 8.73e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21613.4  on 16102  degrees of freedom
## Residual deviance:  6363.3  on 16045  degrees of freedom
## AIC: 6479.3
## 
## Number of Fisher Scoring iterations: 13
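Several numbers in this summary can be cross-checked by hand. With 57 word features plus an intercept the model has 58 coefficients, so the residual degrees of freedom are 16103 - 58 = 16045 and the intercept-only (null) model has 16103 - 1 = 16102. For a 0/1 response the residual deviance equals -2 × log-likelihood, so AIC = 6363.3 + 2 × 58 = 6479.3. Exponentiating an estimate gives an odds ratio: exp(5.542) ≈ 255 for word53, so each one-unit increase in word53 multiplies the estimated odds of spam by about 255 with the other features held fixed. The warning about fitted probabilities of 0 or 1 means some training rows are fitted almost perfectly, which often points to (quasi-)separation in the features. A Python sketch of the arithmetic:

```python
import math

n_train = 16103               # training rows reported at the split step
n_coefficients = 57 + 1       # word1..word57 plus the intercept

null_df = n_train - 1                    # intercept-only model
residual_df = n_train - n_coefficients   # full model
residual_deviance = 6363.3

# For a 0/1 response, deviance = -2 * log-likelihood, so AIC = deviance + 2k
aic = residual_deviance + 2 * n_coefficients

# Odds ratio for word53: multiplicative change in the odds of spam per unit increase
odds_ratio_word53 = math.exp(5.542)

print(null_df, residual_df, round(aic, 1), round(odds_ratio_word53, 1))
```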

Model Prediction: Apply the model to the validation dataset with a cutoff probability of 0.5

##       word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11
## 22682  0.00  1.78  0.00     0  0.00     0  0.00     0     0   1.78      0
## 20626  0.48  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
## 16248  0.48  0.00  0.00     0  0.48     0  0.00     0     0   0.00      0
## 2883   0.00  0.00  1.47     0  0.00     0  0.00     0     0   0.00      0
## 18514  0.00  0.00  0.00     0  0.00     0  1.96     0     0   1.96      0
## 7359   0.00  0.00  0.00     0  0.00     0  0.00     0     0   0.00      0
##       word12 word13 word14 word15 word16 word17 word18 word19 word20 word21
## 22682   3.57   0.00      0      0      0      0   0.00   8.92      0   1.78
## 20626   0.96   0.00      0      0      0      0   0.48   0.96      0   0.00
## 16248   0.00   0.48      0      0      0      0   0.00   4.39      0   0.00
## 2883    2.94   0.00      0      0      0      0   0.00   0.00      0   1.47
## 18514   1.96   0.00      0      0      0      0   0.00   3.92      0   1.96
## 7359    0.32   0.00      0      0      0      0   0.32   1.28      0   0.32
##       word22 word23 word24 word25 word26 word27 word28 word29 word30 word31
## 22682      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 20626      0      0      0   2.88   0.96   0.96   0.96   0.48   0.96   0.96
## 16248      0      0      0   0.48   0.00   0.48   0.00   2.92   0.00   0.00
## 2883       0      0      0   0.00   1.47   0.00   0.00   0.00   0.00   0.00
## 18514      0      0      0   0.00   0.00   0.00   0.00   0.00   0.00   0.00
## 7359       0      0      0   4.48   3.52   0.96   0.96   0.64   0.32   0.32
##       word32 word33 word34 word35 word36 word37 word38 word39 word40 word41
## 22682   0.00      0   0.00   0.00   0.00   0.00      0   1.78   0.00      0
## 20626   0.48      0   0.48   0.96   0.96   0.00      0   0.00   0.48      0
## 16248   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 2883    0.00      0   0.00   0.00   0.00   0.00      0   1.47   0.00      0
## 18514   0.00      0   0.00   0.00   0.00   0.00      0   0.00   0.00      0
## 7359    0.32      0   0.32   0.64   0.32   0.32      0   0.00   0.32      0
##       word42 word43 word44 word45 word46 word47 word48 word49 word50 word51
## 22682   0.00   0.00      0   1.78      0      0      0  0.000  0.000  0.000
## 20626   0.00   0.00      0   0.48      0      0      0  0.000  0.276  0.000
## 16248   0.97   0.00      0   0.00      0      0      0  0.000  0.085  0.000
## 2883    0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 18514   0.00   0.00      0   0.00      0      0      0  0.000  0.000  0.000
## 7359    0.00   0.32      0   0.96      0      0      0  0.264  0.211  0.105
##       word52 word53 word54 word55 word56 word57 Spam predictedSpam
## 22682  0.000      0  0.000  2.388     21     43    0             0
## 20626  0.138      0  0.000  1.986     11    147    0             0
## 16248  0.000      0  0.000  1.275      3     37    0             0
## 2883   0.000      0  0.000  2.928     16     41    0             0
## 18514  0.000      0  0.000  6.166     60     74    1             1
## 7359   0.052      0  0.105  2.258     15    192    0             0
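The predictedSpam column comes from turning each predicted probability into a class with the 0.5 cutoff. A minimal Python sketch, assuming a row is labeled 1 when its probability is strictly greater than the cutoff (the helper names and the linear-predictor values are hypothetical). Because the logistic function equals exactly 0.5 at a linear predictor of 0, a 0.5 cutoff is the same as asking whether the linear predictor is positive.

```python
import math

def sigmoid(eta):
    """Logistic (inverse-logit) function: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def classify(probabilities, cutoff=0.5):
    """Hypothetical helper: label 1 (spam) when the probability exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probabilities]

etas = [-3.0, -0.1, 0.0, 0.1, 4.0]   # hypothetical linear-predictor values
probs = [sigmoid(e) for e in etas]
print(classify(probs))
```

Lowering the cutoff flags more messages as spam, which can only raise recall on spam at the cost of more legitimate mail being flagged; raising it does the opposite.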

Model Evaluation Binary Logistic Regression

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 4025  276
##          1  179 2422
##                                          
##                Accuracy : 0.9341         
##                  95% CI : (0.928, 0.9398)
##     No Information Rate : 0.6091         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.8607         
##                                          
##  Mcnemar's Test P-Value : 6.778e-06      
##                                          
##             Sensitivity : 0.8977         
##             Specificity : 0.9574         
##          Pos Pred Value : 0.9312         
##          Neg Pred Value : 0.9358         
##              Prevalence : 0.3909         
##          Detection Rate : 0.3509         
##    Detection Prevalence : 0.3768         
##       Balanced Accuracy : 0.9276         
##                                          
##        'Positive' Class : 1              
## 
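The tree models report 'Positive' Class : No, while the logistic model reports '1' (spam), so their Sensitivity and Specificity lines measure opposite classes. Recomputing the recall of each class from the printed matrices puts the four models on the same footing; a Python sketch with the counts copied from the outputs above:

```python
# Counts per model, in the order:
# (predicted ham & actual ham, predicted ham & actual spam,
#  predicted spam & actual ham, predicted spam & actual spam)
results = {
    "C5.0":     (4194,   20,  10, 2678),
    "RPART":    (4204, 2698,   0,    0),
    "CTREE":    (3979,  173, 225, 2525),
    "Logistic": (4025,  276, 179, 2422),
}

summary = {}
for model, (ham_ham, ham_spam, spam_ham, spam_spam) in results.items():
    spam_recall = spam_spam / (spam_spam + ham_spam)  # share of actual spam caught
    ham_recall = ham_ham / (ham_ham + spam_ham)       # share of legitimate mail kept
    summary[model] = (spam_recall, ham_recall)
    print(f"{model:<9} spam recall {spam_recall:.4f}  ham recall {ham_recall:.4f}")
```

For the trees, spam recall corresponds to the printed Specificity; for the logistic model it corresponds to the printed Sensitivity.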

New Data Prediction

## [1] "New dataset created and displayed below"
##   word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11 word12
## 1  0.23  0.23     0     0     0  0.23     0  0.23     0   0.23      0      0
##   word13 word14 word15 word16 word17 word18 word19 word20 word21 word22 word23
## 1   0.23   0.23      0   0.23      0      0      0   0.23      0   0.23      0
##   word24 word25 word26 word27 word28 word29 word30 word31 word32 word33 word34
## 1      0   0.23      0      0      0   0.23      0   0.23      0   0.23      0
##   word35 word36 word37 word38 word39 word40 word41 word42 word43 word44 word45
## 1      0   0.23      0      0      0      0      0      0      0      0      0
##   word46 word47 word48 word49 word50 word51 word52 word53 word54 word55 word56
## 1      0      0      0      0      0      3   0.23   0.93      0   4.23    298
##   word57
## 1   2222
##         1 
## 0.9979932
## [1] "Prediction done on new dataset"
##   word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11 word12
## 1  0.23  0.23     0     0     0  0.23     0  0.23     0   0.23      0      0
##   word13 word14 word15 word16 word17 word18 word19 word20 word21 word22 word23
## 1   0.23   0.23      0   0.23      0      0      0   0.23      0   0.23      0
##   word24 word25 word26 word27 word28 word29 word30 word31 word32 word33 word34
## 1      0   0.23      0      0      0   0.23      0   0.23      0   0.23      0
##   word35 word36 word37 word38 word39 word40 word41 word42 word43 word44 word45
## 1      0   0.23      0      0      0      0      0      0      0      0      0
##   word46 word47 word48 word49 word50 word51 word52 word53 word54 word55 word56
## 1      0      0      0      0      0      3   0.23   0.93      0   4.23    298
##   word57 PredictSpam
## 1   2222           1
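The probability printed for the new row (0.9979932) can be approximately reproduced by hand from the coefficient table: add the intercept to the sum of each coefficient times its feature value, then apply the logistic function. The Python sketch below uses the coefficients as rounded in the glm summary, so it agrees with the printed value only to about four decimals; only the features that are non-zero in the new row are listed.

```python
import math

# Rounded estimates from the glm summary (features that are non-zero in the new row)
intercept = -1.600
coef = {
    "word1": -0.3156, "word2": -0.1513, "word6": 0.7937, "word8": 0.5759,
    "word10": 0.1418, "word13": -0.07524, "word14": 0.2372, "word16": 0.9989,
    "word20": 1.078, "word22": 0.2310, "word25": -1.808, "word29": -2.471,
    "word31": -0.7939, "word33": -0.6648, "word36": 0.9408, "word51": -0.5192,
    "word52": 0.5946, "word53": 5.542, "word55": 0.01081, "word56": 0.008371,
    "word57": 0.0008242,
}
# Non-zero feature values of the new row; every other word is 0
new_row = {
    "word1": 0.23, "word2": 0.23, "word6": 0.23, "word8": 0.23, "word10": 0.23,
    "word13": 0.23, "word14": 0.23, "word16": 0.23, "word20": 0.23,
    "word22": 0.23, "word25": 0.23, "word29": 0.23, "word31": 0.23,
    "word33": 0.23, "word36": 0.23, "word51": 3, "word52": 0.23,
    "word53": 0.93, "word55": 4.23, "word56": 298, "word57": 2222,
}

eta = intercept + sum(coef[name] * value for name, value in new_row.items())
prob = 1.0 / (1.0 + math.exp(-eta))   # inverse logit
label = 1 if prob > 0.5 else 0        # 0.5 cutoff, as used on the validation set
print(round(eta, 3), round(prob, 4), label)
```

Most of the linear predictor comes from word53, word56 and word57, which together contribute about 9.5 and outweigh the negative terms such as word51.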