Section 1: Ham Files

Downloading the Dataset for Ham
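
A minimal sketch of how the ham archive could be downloaded and unpacked. The SpamAssassin public-corpus URL and the local folder name `easy_ham` are assumptions, not the report's actual paths.

```r
# Assumed source: the SpamAssassin public corpus "easy_ham" archive.
ham_url  <- "https://spamassassin.apache.org/old/publiccorpus/20021010_easy_ham.tar.bz2"
ham_file <- "easy_ham.tar.bz2"

if (!dir.exists("easy_ham")) {
  download.file(ham_url, destfile = ham_file)
  untar(ham_file)                      # extracts into ./easy_ham/
}

ham_paths <- list.files("easy_ham", full.names = TRUE)
length(ham_paths)                      # number of ham documents
```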

Visualizing the Length of Different Senders’ Emails

Example of a Ham File
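
A quick way to inspect one raw ham file (a sketch, reusing the `ham_paths` vector assumed above):

```r
# Print the first few lines (headers plus the start of the body) of one file.
cat(readLines(ham_paths[1], n = 20), sep = "\n")
```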

Turning These Ham Emails into a Data Frame

## # A tibble: 2 x 3
##    mail emails                         len
##   <int> <chr>                        <int>
## 1     1 Steve_Burt@cursor-system.com    28
## 2     2 Steve_Burt@cursor-system.com    28
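
A sketch of how such a tibble could be built: pull the sender address out of each file’s `From:` header and record its length. The helper name and the regular expression are assumptions.

```r
library(tidyverse)

# Hypothetical helper: extract the sender address from one raw ham file.
get_sender <- function(path) {
  lines <- readLines(path)
  from  <- lines[str_detect(lines, "^From:")][1]
  str_extract(from, "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")
}

ham_senders <- tibble(
  mail   = seq_along(ham_paths),
  emails = map_chr(ham_paths, get_sender)
) %>%
  mutate(len = nchar(emails))

head(ham_senders, 2)
```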

Visualizing the Length of All Emails

Body of the Email
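
A sketch of isolating the body: in these raw files the body is everything after the first blank line that closes the header block. The helper name `get_body` and the `ham_bodies` tibble are assumptions used in the sketches that follow.

```r
# Everything after the first blank line is treated as the body.
get_body <- function(path) {
  lines <- readLines(path)
  first_blank <- which(lines == "")[1]
  paste(lines[(first_blank + 1):length(lines)], collapse = " ")
}

ham_bodies <- tibble(
  files = seq_along(ham_paths),
  text  = map_chr(ham_paths, get_body)
)
```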

Extracting the Words from the Bodies of All Emails

Creating a Data Frame Containing the Words

Adding the Frequency of Words to the Data Frame

## # A tibble: 2 x 3
##   files word                                 n
##   <int> <chr>                            <int>
## 1     1 0001                                 2
## 2     1 ea7e79d3153e7469e7a9c3e0af6a357e     2
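
A sketch of the tokenisation and counting steps, using `tidytext::unnest_tokens()` on the `ham_bodies` tibble sketched above:

```r
library(tidytext)

ham_words <- ham_bodies %>%
  unnest_tokens(word, text) %>%    # one row per word per file
  count(files, word, sort = TRUE)  # adds the frequency column `n`

head(ham_words, 2)
```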

Organizing the Data Frame and Adding the Term Frequency (tf), the Inverse Document Frequency (idf), and Their Product (tf_idf)

## Warning in bind_tf_idf.data.frame(., word, files, n): A value for tf_idf is negative:
## Input should have exactly one row per document-term combination.
## # A tibble: 2 x 6
##   files word                                 n    tf   idf tf_idf
##   <int> <chr>                            <int> <dbl> <dbl>  <dbl>
## 1     1 ea7e79d3153e7469e7a9c3e0af6a357e     2   0.5  7.84   3.92
## 2     1 0001                                 2   0.5  4.71   2.35
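
A sketch of the tf-idf step with `tidytext::bind_tf_idf()`. The warning shown above appears when some document-term pair occurs in more than one row, so the counts are collapsed to one row per pair first.

```r
ham_tf_idf <- ham_words %>%
  count(files, word, wt = n, name = "n") %>%  # one row per document-term pair
  bind_tf_idf(word, files, n) %>%             # adds tf, idf and tf_idf
  arrange(desc(tf_idf))
```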

Cleaning the Data Frame

We keep only words with an IDF greater than 0 and remove words that contain numbers.

## # A tibble: 2 x 6
##   files word         n      tf   idf tf_idf
##   <int> <chr>    <int>   <dbl> <dbl>  <dbl>
## 1  1795 laptop's    60 0.0167   6.46 0.108 
## 2  1792 neale's    108 0.00926  7.15 0.0662
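
A sketch of the cleanup just described:

```r
library(stringr)

ham_clean <- ham_tf_idf %>%
  filter(idf > 0,                        # keep words with a positive idf
         !str_detect(word, "[0-9]")) %>% # drop tokens containing digits
  arrange(desc(tf_idf))
```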

Example of the Sparsity of a Word

## # A tibble: 4 x 6
##   files word         n      tf   idf  tf_idf
##   <int> <chr>    <int>   <dbl> <dbl>   <dbl>
## 1  1795 laptop's    60 0.0167   6.46 0.108  
## 2  1300 laptop's   620 0.00161  6.46 0.0104 
## 3  1336 laptop's   645 0.00155  6.46 0.0100 
## 4  1301 laptop's   826 0.00121  6.46 0.00782
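
Inspecting one token across documents, as above, is just a filter on the cleaned data frame (a sketch):

```r
ham_clean %>%
  filter(word == "laptop's") %>%
  arrange(desc(tf_idf))
```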

Section 2: Spam Files

Example of a Spam Document

Selecting the Most Frequent Words with TF_IDF

## Warning in bind_tf_idf.data.frame(., word, block, n): A value for tf_idf is negative:
## Input should have exactly one row per document-term combination.
## # A tibble: 6 x 6
##   block word                                       n      tf   idf tf_idf
##   <int> <chr>                                  <int>   <dbl> <dbl>  <dbl>
## 1     1 00001.317e78fa8ee2f54cd4890fdc09ba8176     1 1        6.14 6.14  
## 2   805 4.21.157.32                              109 0.00917  7.24 0.0664
## 3   805 g6l6w9415993                             109 0.00917  7.24 0.0664
## 4   805 1027225826.1122                          109 0.00917  7.24 0.0664
## 5   805 winnereritmugu                           109 0.00917  7.24 0.0664
## 6   805 winnergkrsvyyyyl                         109 0.00917  7.24 0.0664

Cleaning the Spam List of Words

## # A tibble: 6 x 6
##   block word                    n      tf   idf tf_idf
##   <int> <chr>               <int>   <dbl> <dbl>  <dbl>
## 1   743 luke's                127 0.00787  7.24 0.0570
## 2    58 mailto:angie_pepi     192 0.00521  7.24 0.0377
## 3   382 car's                 195 0.00513  7.24 0.0371
## 4   996 mailto:remove_me123   196 0.00510  7.24 0.0369
## 5   536 ident:nobody          125 0.008    4.53 0.0363
## 6   362 mailto:bluejo         202 0.00495  7.24 0.0359

Extracting the Spam Senders’ Email Addresses

##    Length     Class      Mode 
##      1396 character character
## [1] "lmrn@mailexcite.com"               "amknight@mailexcite.com"          
## [3] "jordan23@mailexcite.com"           "merchantsworld2001@juno.com"      
## [5] "cypherpunks-forward@ds.pro-ns.net" "sales@outsrc-em.com"

Creating a Spam Senders’ Email Data Frame

## # A tibble: 6 x 2
##   email                               len
##   <chr>                             <int>
## 1 lmrn@mailexcite.com                  19
## 2 amknight@mailexcite.com              23
## 3 jordan23@mailexcite.com              23
## 4 merchantsworld2001@juno.com          27
## 5 cypherpunks-forward@ds.pro-ns.net    33
## 6 sales@outsrc-em.com                  19
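
A sketch of building this data frame, assuming `spam.emails` is the character vector of sender addresses summarised in the previous output:

```r
spam_senders <- tibble(email = spam.emails) %>%
  mutate(len = nchar(email))   # length of each address

head(spam_senders)
```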

Visualizing the Length of Different Spam Senders’ Emails

## # A tibble: 6 x 3
##    mail spam.emails                                          len
##   <int> <chr>                                              <int>
## 1     1 lmrn@mailexcite.com                                   19
## 2     2 merchantsworld2001@juno.com                           27
## 3     3 jm@jmason.org                                         13
## 4     4 jm@netnoteinc.com                                     17
## 5     5 B0000178595@203.129.205.5.205.129.203.in-addr.arpa    50
## 6     6 B0000178595@203.129.205.5.205.129.203.in-addr.arpa    50

## Selecting by n
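
One way the address lengths could be plotted (a sketch; `spam_lengths` stands in for the mail/spam.emails/len tibble shown above):

```r
library(ggplot2)

spam_lengths %>%
  ggplot(aes(x = mail, y = len)) +
  geom_col() +
  labs(x = "Sender", y = "Length of email address",
       title = "Length of different spam senders' email addresses")
```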

Spam/Ham Classification Using a Naive Bayes Classifier

We create a helper function that can loop through any list of documents and build a corpus for each one. This way we avoid duplicating the corpus-building code for every set of documents we need to process.
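
A minimal sketch of such a helper, built with the tm package; the exact cleaning steps (lower-casing, stemming, stop-word removal) are assumptions inferred from the stemmed output further down.

```r
library(tm)

# Given a character vector of documents, build and clean a corpus.
build_corpus <- function(docs) {
  corpus <- VCorpus(VectorSource(docs))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, stemDocument)
  tm_map(corpus, stripWhitespace)
}

# `ham_texts` and `spam_texts` are assumed character vectors of raw emails.
ham_corpus  <- build_corpus(ham_texts)
spam_corpus <- build_corpus(spam_texts)
```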

Create a corpus for each of the two email classes using the helper function above

## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 5
## 
## [[1]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2818
## 
## [[2]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 1925
## 
## [[3]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2214
## 
## [[4]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 1904
## 
## [[5]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2708
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 6
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 5
## 
## [[1]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2334
## 
## [[2]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2926
## 
## [[3]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 3602
## 
## [[4]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 3675
## 
## [[5]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2183
## steveburtcursorsystemcom thu aug
## returnpath steveburtcursorsystemcom
## deliveredto zzzzlocalhostnetnoteinccom
## receiv localhost localhost
## phoboslabsnetnoteinccom postfix esmtp id beec
## zzzzlocalhost thu aug edt
## receiv phobo
## localhost imap fetchmail
## zzzzlocalhost singledrop thu aug ist
## receiv ngrpscdyahoocom ngrpscdyahoocom
## dogmaslashnullorg smtp id
## gmbktz zzzzexamplecom thu aug
## xegroupsreturn senttozzzzexamplecomreturnsgroupsyahoocom
## receiv ngrpscdyahoocom nnfmp
## aug
## xsender steveburtcursorsystemcom
## xapparentlyto zzzzteanayahoogroupscom
## receiv egp mail aug
## receiv qmail invok network aug
## receiv unknown mgrpscdyahoocom qmqp
## aug
## receiv unknown helo mailgatewaycursorsystemcom
## mtagrpscdyahoocom smtp aug
## receiv exchangecpsloc unverifi
## mailgatewaycursorsystemcom content technolog smtprs
## esmtp id tcdefacddmailgatewaycursorsystemcom
## forteanayahoogroupscom thu aug
## receiv exchangecpsloc internet mail servic
## id pxxat thu aug
## messageid ecadddfbbdaddefbfexchangecpsloc
## zzzzteanayahoogroupscom zzzzteanayahoogroupscom
## xmailer internet mail servic
## xegroupsfrom steve burt steveburtcursorsystemcom
## steve burt steveburtcursorsystemcom
## xyahooprofil pyrus
## mimevers
## mailinglist list zzzzteanayahoogroupscom contact
## forteanaowneryahoogroupscom
## deliveredto mail list zzzzteanayahoogroupscom
## preced bulk
## listunsubscrib mailtozzzzteanaunsubscribeyahoogroupscom
## date thu aug
## subject zzzzteana re alexand
## replyto zzzzteanayahoogroupscom
## contenttyp textplain charsetusascii
## contenttransferencod bit
## 
## martin post
## tasso papadopoulo greek sculptor behind plan judg
## limeston mount kerdylio mile east salonika far
## mount atho monast communiti ideal patriot sculptur
## 
## well alexand granit featur ft high ft wide
## museum restor amphitheatr car park admir crowd
## plan
## 
## mountain limeston granit
## limeston itll weather pretti fast
## 
## yahoo group sponsor
## dvds free sp join now
## httpusclickyahoocomptybbnxieaamghaagsolbtm
## 
## 
## unsubscrib group send email
## forteanaunsubscribeegroupscom
## 
## 
## 
## use yahoo group subject httpdocsyahoocominfoterm
## martinsrvemsedacuk thu aug
## returnpath martinsrvemsedacuk
## deliveredto zzzzlocalhostnetnoteinccom
## receiv localhost localhost
## phoboslabsnetnoteinccom postfix esmtp id edbc
## zzzzlocalhost thu aug edt
## receiv phobo
## localhost imap fetchmail
## zzzzlocalhost singledrop thu aug ist
## receiv ngrpscdyahoocom ngrpscdyahoocom
## dogmaslashnullorg smtp id
## gmdtz zzzzexamplecom thu aug
## xegroupsreturn senttozzzzexamplecomreturnsgroupsyahoocom
## receiv ngrpscdyahoocom nnfmp
## aug
## xsender martinsrvemsedacuk
## xapparentlyto zzzzteanayahoogroupscom
## receiv egp mail aug
## receiv qmail invok network aug
## receiv unknown mgrpscdyahoocom qmqp
## aug
## receiv unknown helo haymarketedacuk
## mtagrpscdyahoocom smtp aug
## receiv srvemsedacuk srvemsedacuk
## haymarketedacuk esmtp id gmdsv
## forteanayahoogroupscom thu aug bst
## receiv emssrvspooldir srvemsedacuk mercuri
## aug
## receiv spooldir emssrv mercuri aug
## organ manag school
## zzzzteanayahoogroupscom
## messageid dfbdeclocalhost
## prioriti normal
## xmailer pegasus mail window v
## contentdescript mail messag bodi
## martin adamson martinsrvemsedacuk
## mimevers
## mailinglist list zzzzteanayahoogroupscom contact
## forteanaowneryahoogroupscom
## deliveredto mail list zzzzteanayahoogroupscom
## preced bulk
## listunsubscrib mailtozzzzteanaunsubscribeyahoogroupscom
## date thu aug
## subject zzzzteana playboy want go bang
## replyto zzzzteanayahoogroupscom
## contenttyp textplain charsetiso
## contenttransferencod bit
## xmimeautoconvert quotedprint bit dogmaslashnullorg
## id gmdtz
## 
## scotsman august
## 
## playboy want go bang
## 
## 
## age berlin playboy come unusu offer lure women
## bed promis last woman sleep inherit
## £
## 
## rolf eden berlin disco owner famous countless sex partner
## said imagin better way die arm attract
## young woman prefer
## 
## put last testament last woman sleep
## get money mr eden told bild newspap
## 
## want pass away beauti moment life first lot
## fun beauti woman wild sex final orgasm
## end heart attack ’m gone
## 
## mr eden sell nightclub year said applic
## sent quick age end soon said
## 
## 
## yahoo group sponsor
## dvds free sp join now
## httpusclickyahoocomptybbnxieaamghaagsolbtm
## 
## 
## unsubscrib group send email
## forteanaunsubscribeegroupscom
## 
## 
## 
## use yahoo group subject httpdocsyahoocominfoterm

Create word clouds for the ham and spam corpora before cleanup, using the Bing lexicon

##                        word  freq
## "",                     "", 33612
## character(0), character(0), 15306
## "receiv             "receiv 14285
## esmtp                 esmtp  6534
## mon                     mon  5775
## sep",                 sep",  5337
## 17,                     17,  5102
## ist",                 ist",  4382
## sep                     sep  4218
## "jmlocalhost   "jmlocalhost  4136
## Classes 'tbl_df', 'tbl' and 'data.frame':    48553 obs. of  3 variables:
##  $ term    : chr  "\"\")," "\"\"," "\"\\023c\\024" "\"aa" ...
##  $ document: chr  "1" "1" "1" "1" ...
##  $ count   : num  2486 33612 1 10 1 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    891 obs. of  4 variables:
##  $ term     : chr  "abolish" "abort" "abound" "absurd" ...
##  $ document : chr  "1" "1" "1" "1" ...
##  $ count    : num  1 6 3 16 19 11 2 2 3 10 ...
##  $ sentiment: chr  "negative" "negative" "positive" "negative" ...
##                        word  freq
## "",                     "", 28837
## character(0), character(0),  8382
## "tr",                 "tr",  6841
## "receiv             "receiv  6116
## "td                     "td  5496
## mon                     mon  3230
## size                   size  3049
## "br",                 "br",  2833
## 17,                     17,  2794
## esmtp                 esmtp  2605
## Classes 'tbl_df', 'tbl' and 'data.frame':    48553 obs. of  3 variables:
##  $ term    : chr  "\"\")," "\"\"," "\"\\023c\\024" "\"aa" ...
##  $ document: chr  "1" "1" "1" "1" ...
##  $ count   : num  2486 33612 1 10 1 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    891 obs. of  4 variables:
##  $ term     : chr  "abolish" "abort" "abound" "absurd" ...
##  $ document : chr  "1" "1" "1" "1" ...
##  $ count    : num  1 6 3 16 19 11 2 2 3 10 ...
##  $ sentiment: chr  "negative" "negative" "positive" "negative" ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    77303 obs. of  3 variables:
##  $ term    : chr  "\"\")," "\"\"," "\"aa" "\"aa\"," ...
##  $ document: chr  "1" "1" "1" "1" ...
##  $ count   : num  1186 28837 5 4 2 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    541 obs. of  4 variables:
##  $ term     : chr  "abort" "abscond" "acclaim" "accomplish" ...
##  $ document : chr  "1" "1" "1" "1" ...
##  $ count    : num  1 1 3 3 1 2 7 27 4 1 ...
##  $ sentiment: chr  "negative" "negative" "positive" "positive" ...
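
A sketch of the word-cloud and Bing-lexicon steps, assuming `ham_tdm` is a `TermDocumentMatrix` built from the ham corpus (the same pattern applies to the spam corpus):

```r
library(wordcloud)
library(tidytext)
library(dplyr)

# Word frequencies straight from the term-document matrix.
ham_freq    <- sort(rowSums(as.matrix(ham_tdm)), decreasing = TRUE)
ham_freq_df <- data.frame(word = names(ham_freq), freq = ham_freq)

wordcloud(words = ham_freq_df$word, freq = ham_freq_df$freq,
          max.words = 100, random.order = FALSE)

# Tidy the matrix and keep only terms found in the Bing sentiment lexicon.
ham_sentiment <- tidy(ham_tdm) %>%
  inner_join(get_sentiments("bing"), by = c("term" = "word"))
```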

Combine the two cleaned-up corpora into a single data frame

##                                            text type
## 1             exmhworkersadminredhatcom thu aug  ham
## 2         returnpath exmhworkersadminexamplecom  ham
## 3        deliveredto zzzzlocalhostnetnoteinccom  ham
## 4                    receiv localhost localhost  ham
## 5  phoboslabsnetnoteinccom postfix esmtp id dec  ham
## 6                     zzzzlocalhost thu aug edt  ham
## 7                                  receiv phobo  ham
## 8                      localhost imap fetchmail  ham
## 9          zzzzlocalhost singledrop thu aug ist  ham
## 10   receiv listmanexamplecom listmanexamplecom  ham
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 5
## 
## [[1]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2818
## 
## [[2]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 1925
## 
## [[3]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2214
## 
## [[4]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 1904
## 
## [[5]]
## <<PlainTextDocument>>
## Metadata:  8
## Content:  chars: 2708
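
A sketch of the combination step: turn each cleaned corpus into a labelled data frame of text lines and stack the two.

```r
ham_df <- data.frame(
  text = unlist(lapply(ham_corpus, content)),   # one row per cleaned line
  type = "ham",
  stringsAsFactors = FALSE
)
spam_df <- data.frame(
  text = unlist(lapply(spam_corpus, content)),
  type = "spam",
  stringsAsFactors = FALSE
)

email_df <- rbind(ham_df, spam_df)
head(email_df, 10)
```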

Partition the data into a training set and a test set in a 70:30 ratio

##                                    text type
## 1     exmhworkersadminredhatcom thu aug  ham
## 2 returnpath exmhworkersadminexamplecom  ham
## 4            receiv localhost localhost  ham
## 6             zzzzlocalhost thu aug edt  ham
## 7                          receiv phobo  ham
## 9  zzzzlocalhost singledrop thu aug ist  ham
##                                              text type
## 3          deliveredto zzzzlocalhostnetnoteinccom  ham
## 5    phoboslabsnetnoteinccom postfix esmtp id dec  ham
## 8                        localhost imap fetchmail  ham
## 17 receiv intmxcorpexamplecom intmxcorpexamplecom  ham
## 20                                            edt  ham
## 22   id gmbyg exmhworkerslistmanredhatcom thu aug  ham
## 25               intmxcorpredhatcom smtp id gmbyy  ham
## 29                                        thu aug  ham
## 33     receiv munnariozau localhost deltacsmuozau  ham
## 35                                            ict  ham
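
A sketch of a 70:30 random split over the combined data frame (the seed is an assumption):

```r
set.seed(123)
train_idx <- sort(sample(seq_len(nrow(email_df)),
                         size = floor(0.7 * nrow(email_df))))

corpus_train <- email_df[train_idx, ]
corpus_test  <- email_df[-train_idx, ]

head(corpus_train)
head(corpus_test)
```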

Create a Document Term Matrix

## Warning in tm_map.SimpleCorpus(trainCorpus, removeNumbers): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(testCorpus, removeNumbers): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(train_clean_corpus, removePunctuation):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(test_clean_corpus, removePunctuation):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(train_clean_corpus, removeWords,
## stopwords()): transformation drops documents
## Warning in tm_map.SimpleCorpus(test_clean_corpus, removeWords,
## stopwords()): transformation drops documents
## Warning in tm_map.SimpleCorpus(train_clean_corpus, stripWhitespace):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(test_clean_corpus, stripWhitespace):
## transformation drops documents
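
A sketch that matches the warnings above: rebuild a corpus from each split, apply the cleaning transformations, then build the document-term matrices. The warnings come from `tm_map.SimpleCorpus()`, i.e. a corpus built with `Corpus(VectorSource(...))`.

```r
trainCorpus <- Corpus(VectorSource(corpus_train$text))
testCorpus  <- Corpus(VectorSource(corpus_test$text))

clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords())
  tm_map(corpus, stripWhitespace)
}

train_clean_corpus <- clean_corpus(trainCorpus)
test_clean_corpus  <- clean_corpus(testCorpus)

train_dtm <- DocumentTermMatrix(train_clean_corpus)
test_dtm  <- DocumentTermMatrix(test_clean_corpus)
```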

Create a term-document matrix and plot the word cloud and sentiment

##            word freq
## "nan",   "nan",  196
## "",         "",   93
## "receiv "receiv   93
## "brbr", "brbr",   63
## "br",     "br",   59
## aug",     aug",   57
## thu         thu   51
## esmtp     esmtp   27
## br",       br",   24
## mail       mail   23
## Classes 'tbl_df', 'tbl' and 'data.frame':    1603 obs. of  3 variables:
##  $ term    : chr  "\"\"," "\"abandon" "\"absorb" "\"act" ...
##  $ document: chr  "1" "1" "1" "1" ...
##  $ count   : num  93 1 1 2 1 3 1 1 1 3 ...
## # A tibble: 100 x 3
##    term             document count
##    <chr>            <chr>    <dbl>
##  1 "\"\","          1           93
##  2 "\"abandon"      1            1
##  3 "\"absorb"       1            1
##  4 "\"act"          1            2
##  5 "\"ad"           1            1
##  6 "\"addressbr\"," 1            3
##  7 "\"age"          1            1
##  8 "\"agenc"        1            1
##  9 "\"agre"         1            1
## 10 "\"aid"          1            3
## # ... with 90 more rows
## # A tibble: 68 x 4
##    term    document count sentiment
##    <chr>   <chr>    <dbl> <chr>    
##  1 attack  1            3 negative 
##  2 bad     1            4 negative 
##  3 betray  1            1 negative 
##  4 better  1            1 positive 
##  5 blow    1            1 negative 
##  6 bonus   1            4 positive 
##  7 boost   1            2 positive 
##  8 burn    1            1 negative 
##  9 cold    1            1 negative 
## 10 corrupt 1            1 negative 
## # ... with 58 more rows
## Classes 'tbl_df', 'tbl' and 'data.frame':    68 obs. of  4 variables:
##  $ term     : chr  "attack" "bad" "betray" "better" ...
##  $ document : chr  "1" "1" "1" "1" ...
##  $ count    : num  3 4 1 1 1 4 2 1 1 1 ...
##  $ sentiment: chr  "negative" "negative" "negative" "positive" ...
## Selecting by n

##            word freq
## "nan",   "nan",   92
## "receiv "receiv   37
## "",         "",   33
## "br",     "br",   29
## aug",     aug",   27
## thu         thu   27
## "brbr", "brbr",   24
## esmtp     esmtp   13
## mail       mail   11
## "p",       "p",   10
## Classes 'tbl_df', 'tbl' and 'data.frame':    896 obs. of  3 variables:
##  $ term    : chr  "\"\"," "\"aabaabhaceadbdc\"," "\"abl" "\"absorb" ...
##  $ document: chr  "1" "1" "1" "1" ...
##  $ count   : num  33 1 1 1 1 1 1 1 1 1 ...
## # A tibble: 100 x 3
##    term                   document count
##    <chr>                  <chr>    <dbl>
##  1 "\"\","                1           33
##  2 "\"aabaabhaceadbdc\"," 1            1
##  3 "\"abl"                1            1
##  4 "\"absorb"             1            1
##  5 "\"add"                1            1
##  6 "\"addressbr\","       1            1
##  7 "\"agre"               1            1
##  8 "\"aid"                1            1
##  9 "\"altern"             1            1
## 10 "\"anyon"              1            1
## # ... with 90 more rows
## # A tibble: 26 x 4
##    term   document count sentiment
##    <chr>  <chr>    <dbl> <chr>    
##  1 boost  1            2 positive 
##  2 crime  1            2 negative 
##  3 death  1            1 negative 
##  4 debt   1            1 negative 
##  5 easier 1            1 positive 
##  6 enjoy  1            1 positive 
##  7 fat    1            6 negative 
##  8 free   1            3 positive 
##  9 good   1            4 positive 
## 10 ideal  1            1 positive 
## # ... with 16 more rows
## Classes 'tbl_df', 'tbl' and 'data.frame':    26 obs. of  4 variables:
##  $ term     : chr  "boost" "crime" "death" "debt" ...
##  $ document : chr  "1" "1" "1" "1" ...
##  $ count    : num  2 2 1 1 1 1 6 3 4 1 ...
##  $ sentiment: chr  "positive" "negative" "negative" "negative" ...

Train the model and predict the outcome

##  chr [1:1400, 1:1270] "1" "0" "0" "1" "0" "1" "0" "0" "1" "0" "1" "0" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Docs : chr [1:1400] "1" "2" "3" "4" ...
##   ..$ Terms: chr [1:1270] "aug" "exmhworkersadminredhatcom" "thu" "exmhworkersadminexamplecom" ...
##  chr [1:600, 1:732] "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Docs : chr [1:600] "1" "2" "3" "4" ...
##   ..$ Terms: chr [1:732] "deliveredto" "zzzzlocalhostnetnoteinccom" "dec" "esmtp" ...
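
The structure above suggests each cell of the document-term matrix was recoded as "1" (term present) or "0" (term absent). A sketch of that conversion:

```r
# Recode raw counts as presence/absence indicators.
convert_count <- function(x) ifelse(x > 0, "1", "0")

train <- apply(train_dtm, 2, convert_count)
test  <- apply(test_dtm,  2, convert_count)

str(train)
str(test)
```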

Use the Naive Bayes model to train and then predict on the test data set

## factor(corpus_train$type)
##  ham spam 
##  700  700
## $aug
##                          aug
## factor(corpus_train$type)          0          1
##                      ham  0.88142857 0.11857143
##                      spam 0.98857143 0.01142857
## 
## $exmhworkersadminredhatcom
##                          exmhworkersadminredhatcom
## factor(corpus_train$type)           0           1
##                      ham  0.998571429 0.001428571
##                      spam 1.000000000 0.000000000
## 
## $thu
##                          thu
## factor(corpus_train$type)           0           1
##                      ham  0.927142857 0.072857143
##                      spam 0.997142857 0.002857143
## 
## $exmhworkersadminexamplecom
##                          exmhworkersadminexamplecom
## factor(corpus_train$type)           0           1
##                      ham  0.997142857 0.002857143
##                      spam 1.000000000 0.000000000
## 
## $returnpath
##                          returnpath
## factor(corpus_train$type)           0           1
##                      ham  0.990000000 0.010000000
##                      spam 0.994285714 0.005714286
## 
## $localhost
##                          localhost
## factor(corpus_train$type)           0           1
##                      ham  0.977142857 0.022857143
##                      spam 0.994285714 0.005714286
## 
## $receiv
##                          receiv
## factor(corpus_train$type)          0          1
##                      ham  0.89714286 0.10285714
##                      spam 0.96571429 0.03428571
## 
## $edt
##                          edt
## factor(corpus_train$type)           0           1
##                      ham  0.987142857 0.012857143
##                      spam 0.998571429 0.001428571
## 
## $zzzzlocalhost
##                          zzzzlocalhost
## factor(corpus_train$type)          0          1
##                      ham  0.98285714 0.01714286
##                      spam 1.00000000 0.00000000
## 
## $phobo
##                          phobo
## factor(corpus_train$type)          0          1
##                      ham  0.98571429 0.01428571
##                      spam 1.00000000 0.00000000
## 
## $ist
##                          ist
## factor(corpus_train$type)           0           1
##                      ham  0.991428571 0.008571429
##                      spam 0.997142857 0.002857143
## 
## $singledrop
##                          singledrop
## factor(corpus_train$type)           0           1
##                      ham  0.991428571 0.008571429
##                      spam 0.997142857 0.002857143
## 
## $listmanexamplecom
##                          listmanexamplecom
## factor(corpus_train$type)           0           1
##                      ham  0.995714286 0.004285714
##                      spam 1.000000000 0.000000000
## 
## $dogmaslashnullorg
##                          dogmaslashnullorg
## factor(corpus_train$type)           0           1
##                      ham  0.988571429 0.011428571
##                      spam 0.997142857 0.002857143
## 
## $esmtp
##                          esmtp
## factor(corpus_train$type)          0          1
##                      ham  0.97142857 0.02857143
##                      spam 0.98714286 0.01285714
## [1] "ham"  "spam"
## naiveBayes.default(x = train, y = factor(corpus_train$type))
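
A sketch of the model fit and prediction, matching the `naiveBayes.default` call shown above (the `e1071` implementation):

```r
library(e1071)

model <- naiveBayes(x = train, y = factor(corpus_train$type))
pred  <- predict(model, newdata = test)
```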

Output in the form of a confusion matrix

##       
## pred   ham spam
##   ham  196   39
##   spam 104  261
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction ham spam
##       ham  196   39
##       spam 104  261
##                                           
##                Accuracy : 0.7617          
##                  95% CI : (0.7255, 0.7952)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5233          
##                                           
##  Mcnemar's Test P-Value : 8.701e-08       
##                                           
##             Sensitivity : 0.6533          
##             Specificity : 0.8700          
##          Pos Pred Value : 0.8340          
##          Neg Pred Value : 0.7151          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3267          
##    Detection Prevalence : 0.3917          
##       Balanced Accuracy : 0.7617          
##                                           
##        'Positive' Class : ham             
##
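
A sketch of how the matrix and statistics above could be produced with base `table()` and `caret::confusionMatrix()`:

```r
library(caret)

table(pred, factor(corpus_test$type))
confusionMatrix(pred, factor(corpus_test$type), positive = "ham")
```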