In this project, I build a spam vs ham (non-spam) email classifier using the SpamAssassin public corpus.
Because Posit Cloud has limited memory, I use a smaller sample of emails and a reduced vocabulary (only frequent words) to keep the document–term matrix compact and avoid crashing.
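Main steps: load the raw emails, draw a small balanced sample, clean the text, build a reduced document-term matrix, split the data into training and test sets, and train a Naive Bayes classifier.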
From the SpamAssassin public corpus I downloaded:
20030228_easy_ham.tar.bz2
20030228_spam.tar.bz2
In this Posit project I extracted them into a Data/ folder so the structure is:
Data/easy_ham/
Data/spam/
Each folder contains many raw email files.
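# Packages used below (assumed to be installed): tm for the corpus and
# document-term matrix, e1071 for naiveBayes(), dplyr and tibble for data handling
library(tm)
library(e1071)
library(dplyr)
library(tibble)
# Folders that contain the unzipped emails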
ham_dir <- "Data/easy_ham"
spam_dir <- "Data/spam"
# Quick check: how many files do we have in each folder?
length(list.files(ham_dir))
## [1] 2501
length(list.files(spam_dir))
## [1] 501
# Helper function to read all files from a folder into a data frame
read_email_folder <- function(path, label) {
  files <- list.files(path, full.names = TRUE)
  texts <- lapply(files, function(f) {
    # Some files may fail to read; use tryCatch to avoid stopping
    tryCatch(
      paste(readLines(f, warn = FALSE, encoding = "latin1"), collapse = " "),
      error = function(e) NA_character_
    )
  })
  tibble(
    text = unlist(texts),
    label = label
  ) %>%
    dplyr::filter(!is.na(text))
}
ham_df <- read_email_folder(ham_dir, "ham")
spam_df <- read_email_folder(spam_dir, "spam")
emails_raw <- dplyr::bind_rows(ham_df, spam_df)
dim(emails_raw)
## [1] 3002 2
head(emails_raw)
## # A tibble: 6 × 2
## text label
## <chr> <chr>
## 1 "From exmh-workers-admin@redhat.com Thu Aug 22 12:36:23 2002 Return-Pa… ham
## 2 "From Steve_Burt@cursor-system.com Thu Aug 22 12:46:39 2002 Return-Pat… ham
## 3 "From timc@2ubh.com Thu Aug 22 13:52:59 2002 Return-Path: <timc@2ubh.c… ham
## 4 "From irregulars-admin@tb.tf Thu Aug 22 14:23:39 2002 Return-Path: <ir… ham
## 5 "From Stewart.Smith@ee.ed.ac.uk Thu Aug 22 14:44:26 2002 Return-Path: … ham
## 6 "From martin@srv0.ems.ed.ac.uk Thu Aug 22 14:54:39 2002 Return-Path: <… ham
table(emails_raw$label)
##
## ham spam
## 2501 501
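The read-and-skip behaviour of the helper can be tried on a throwaway folder. The sketch below is a base R variant written only for illustration: `read_files_base`, the temporary folder, and the file contents are all hypothetical, and it takes file paths directly so that a deliberately missing path can stand in for an unreadable email.

```r
# Hypothetical base R variant of the reader (no tibble/dplyr needed).
read_files_base <- function(files, label) {
  texts <- vapply(files, function(f) {
    # A file that cannot be read becomes NA instead of stopping the loop
    tryCatch(
      paste(suppressWarnings(readLines(f, warn = FALSE)), collapse = " "),
      error = function(e) NA_character_
    )
  }, character(1), USE.NAMES = FALSE)
  out <- data.frame(text = texts, label = label, stringsAsFactors = FALSE)
  out[!is.na(out$text), , drop = FALSE]
}

# Two made-up "emails" in a temporary folder
demo_dir <- tempfile("emails_")
dir.create(demo_dir)
writeLines(c("Subject: hello", "", "See you at noon."), file.path(demo_dir, "msg1"))
writeLines("Subject: win cash now", file.path(demo_dir, "msg2"))

# Add a path that does not exist to trigger the error branch
paths <- c(list.files(demo_dir, full.names = TRUE),
           file.path(demo_dir, "does_not_exist"))
demo_df <- read_files_base(paths, "ham")
nrow(demo_df)    # the unreadable path is dropped
demo_df$text[1]  # lines of one email joined with single spaces
```

Each email ends up as one long string, which is why the header lines and the body run together in the preview above.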
Using all 3,002 emails makes the term matrix very large. For this assignment, a sample of a few hundred emails per class is enough to demonstrate the classifier.
Here I sample up to 300 ham and 300 spam emails with plain base R indexing, which avoids dplyr's non-standard evaluation and keeps memory use low.
set.seed(123)
max_n <- 300
# Sample ham
if (nrow(ham_df) > max_n) {
  ham_small <- ham_df[sample(nrow(ham_df), max_n), ]
} else {
  ham_small <- ham_df
}
# Sample spam
if (nrow(spam_df) > max_n) {
  spam_small <- spam_df[sample(nrow(spam_df), max_n), ]
} else {
  spam_small <- spam_df
}
emails_small <- dplyr::bind_rows(ham_small, spam_small)
dim(emails_small)
## [1] 600 2
table(emails_small$label)
##
## ham spam
## 300 300
All subsequent analysis uses emails_small, which is much
lighter in memory than the full corpus.
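The two if/else blocks used for sampling follow the same pattern, so they could be folded into one small helper. This is only a sketch: `sample_up_to` and the toy data frames are hypothetical names introduced here, not part of the analysis above.

```r
# Hypothetical helper: return at most n randomly chosen rows of df.
sample_up_to <- function(df, n) {
  if (nrow(df) > n) {
    df[sample(nrow(df), n), , drop = FALSE]
  } else {
    df  # fewer than n rows available: keep everything
  }
}

set.seed(123)
big   <- data.frame(id = 1:10, label = "ham")
small <- data.frame(id = 1:3,  label = "spam")

nrow(sample_up_to(big, 5))    # capped at 5 rows
nrow(sample_up_to(small, 5))  # only 3 rows exist, so all are kept
```

Because `sample()` draws without replacement by default, no email can appear twice in the sample.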
Next, I convert the sampled emails into a tm corpus and apply basic pre-processing: convert to lowercase, remove numbers, remove punctuation, and collapse repeated whitespace.
I keep stopwords (common words like “the” and “and”) to ensure short messages still have some tokens.
if (nrow(emails_small) == 0) {
  stop("No emails are available after sampling.")
}
corpus <- VCorpus(VectorSource(emails_small$text))
clean_corpus <- corpus %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
length(clean_corpus)
## [1] 600
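To see what these cleaning steps do to a single message, here is a minimal base R sketch. It approximates the four tm transformations with `tolower()` and `gsub()`; it is not the tm code itself, and `clean_text` and the sample string are made up for illustration.

```r
# Base R stand-ins for the four cleaning steps (an approximation only).
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:digit:]]+", "", x)   # similar to removeNumbers
  x <- gsub("[[:punct:]]+", "", x)   # similar to removePunctuation
  x <- gsub("[[:space:]]+", " ", x)  # similar to stripWhitespace
  x
}

# Made-up fragment of an HTML email in quoted-printable encoding
raw <- 'ALIGN=3D"center"><FONT   size=2>Click here!'
clean_text(raw)
```

Since punctuation is deleted rather than replaced by a space, HTML tags and attributes fuse into single tokens. That would explain vocabulary entries such as "aligndcenterfont" in the model output further down.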
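First, I build a full DocumentTermMatrix just to compute term frequencies. Then I keep only frequent terms: findFreqTerms() with lowfreq = 15 keeps terms whose total count across all sampled emails is at least 15 (a total count, not the number of emails containing the term). This cuts the vocabulary from 32,820 to 1,621 terms and saves memory.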
# Full DTM (used only to find frequent terms)
dtm_full <- DocumentTermMatrix(clean_corpus)
dtm_full
## <<DocumentTermMatrix (documents: 600, terms: 32820)>>
## Non-/sparse entries: 117817/19574183
## Sparsity : 99%
## Maximal term length: 245
## Weighting : term frequency (tf)
# Keep only terms whose total count across the sample is at least 15
freq_terms <- findFreqTerms(dtm_full, lowfreq = 15)
length(freq_terms) # number of terms we keep
## [1] 1621
# If, for some reason, there are no frequent terms (very tiny sample),
# fall back to using all terms.
if (length(freq_terms) == 0) {
  freq_terms <- Terms(dtm_full)
}
head(freq_terms)
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
## [2] "ability"
## [3] "able"
## [4] "about"
## [5] "above"
## [6] "absolutely"
# Build a smaller DTM using only the frequent terms
dtm <- DocumentTermMatrix(
  clean_corpus,
  control = list(dictionary = freq_terms)
)
dtm
## <<DocumentTermMatrix (documents: 600, terms: 1621)>>
## Non-/sparse entries: 67992/904608
## Sparsity : 93%
## Maximal term length: 76
## Weighting : term frequency (tf)
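One detail of the frequency filter is worth spelling out: findFreqTerms() applies lowfreq to a term's total count over the whole corpus, which is not the same as the number of emails that contain the term. The toy matrix below (made-up counts, base R only) shows how the two cutoffs can disagree.

```r
# Toy document-term counts (rows = emails, columns = terms); values are made up.
m <- matrix(
  c(16, 2, 0,
     0, 3, 1,
     0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("free", "the", "rare"))
)

total_count <- colSums(m)      # what a total-frequency cutoff looks at
doc_freq    <- colSums(m > 0)  # number of emails containing each term

names(total_count)[total_count >= 5]  # cutoff on total count
names(doc_freq)[doc_freq >= 2]        # cutoff on document frequency
```

Here "free" passes the total-count cutoff although it occurs in a single email. If filtering by the number of emails is what is wanted, the tm documentation describes a `bounds = list(global = c(lower, upper))` control option for DocumentTermMatrix() that should serve that purpose; I have not used it in this analysis.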
Now I convert the reduced DTM to a matrix and then a data frame, and
add the label column.
dtm_mat <- as.matrix(dtm)
dtm_df <- as.data.frame(dtm_mat)
# Attach labels (one per email)
dtm_df$label <- emails_small$label
# Check class balance in the reduced data
table(dtm_df$label)
##
## ham spam
## 300 300
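The reason for shrinking the vocabulary before calling as.matrix() is that the dense matrix stores every cell, zeros included. A rough estimate of the memory involved, assuming 8 bytes per cell (a double) and ignoring object overhead, can be sketched as follows; `dense_mb` is a hypothetical helper.

```r
# Approximate size in MiB of a dense n_docs x n_terms numeric matrix.
# Assumption: 8 bytes per cell; R object overhead is ignored.
dense_mb <- function(n_docs, n_terms, bytes_per_cell = 8) {
  n_docs * n_terms * bytes_per_cell / 1024^2
}

full_mb    <- dense_mb(600, 32820)  # dimensions of the full DTM
reduced_mb <- dense_mb(600, 1621)   # dimensions of the reduced DTM

round(c(full = full_mb, reduced = reduced_mb), 1)
round(full_mb / reduced_mb, 1)      # how many times smaller the reduced matrix is
```

The data frame built from the matrix needs roughly the same amount again, so the saving matters twice on a memory-limited Posit Cloud instance.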
I randomly split the data into 70% training and 30% test sets.
set.seed(123)
n <- nrow(dtm_df)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train_df <- dtm_df[train_idx, ]
test_df <- dtm_df[-train_idx, ]
# Separate predictors and labels
train_x <- train_df %>% dplyr::select(-label)
train_y <- train_df$label
test_x <- test_df %>% dplyr::select(-label)
test_y <- test_df$label
dim(train_x)
## [1] 420 1621
dim(test_x)
## [1] 180 1621
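A simple random split like this one does not guarantee that the classes stay balanced in the training set; the a-priori probabilities of 0.519 and 0.481 reported by the model are consistent with 218 ham and 202 spam among the 420 training rows. If exact balance matters, a stratified split is one alternative. The sketch below is illustrative only: `stratified_idx` and the toy label vector are hypothetical, and it uses round() rather than floor() to sidestep floating-point edge cases.

```r
# Hypothetical helper: sample the same proportion of indices within each class.
stratified_idx <- function(y, prop) {
  per_class <- lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), round(prop * length(idx)))]
  })
  unlist(per_class, use.names = FALSE)
}

set.seed(123)
y <- rep(c("ham", "spam"), times = c(300, 300))  # toy labels, same sizes as the sample
train_idx <- stratified_idx(y, 0.7)
test_idx  <- setdiff(seq_along(y), train_idx)

table(y[train_idx])  # equal counts per class in the training set
length(test_idx)     # remaining rows form the test set
```

The training and test indices are disjoint by construction, because the test set is defined as everything not drawn for training.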
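I train a Naive Bayes classifier using the naiveBayes() function from the e1071 package. Because the predictors are numeric term counts, naiveBayes() fits a Gaussian to each term within each class: in the printed tables, column [,1] is the class mean and column [,2] the class standard deviation. The laplace argument applies only to categorical predictors, so it has no effect here.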
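To make concrete what the model does with a numeric predictor, here is a minimal base R sketch of Gaussian naive Bayes for a single term. The counts are made up and the variable names are hypothetical; it mirrors the idea of a per-class mean and standard deviation, not e1071's actual implementation.

```r
# Toy counts of one term per email, with labels; values are made up.
toy_x <- c(0, 0, 1, 0, 3, 2, 4, 3)
toy_y <- c("ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam")

# Per-class mean and standard deviation of the term count
mu    <- sapply(split(toy_x, toy_y), mean)
sdv   <- sapply(split(toy_x, toy_y), sd)
prior <- as.numeric(table(toy_y)) / length(toy_y)

# Log-posterior (up to a constant) for a new email where the term occurs 3 times
new_count <- 3
log_post <- log(prior) + dnorm(new_count, mean = mu, sd = sdv, log = TRUE)
names(log_post) <- names(mu)

names(which.max(log_post))  # predicted class
```

With many terms, the log-densities of all terms are added up per class before comparing. A class whose standard deviation for a term is exactly zero needs special handling, since the Gaussian density is degenerate there; the e1071 documentation for predict() on naiveBayes objects describes `threshold` and `eps` arguments for that case.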
nb_model <- naiveBayes(x = train_x, y = train_y, laplace = 1)
nb_model
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = train_x, y = train_y, laplace = 1)
##
## A-priori probabilities:
## train_y
## ham spam
## 0.5190476 0.4809524
##
## Conditional probabilities:
## aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## ability
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.03465347 0.1833549
##
## able
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.04950495 0.2951090
##
## about
## train_y [,1] [,2]
## ham 0.4862385 0.975409
## spam 0.4356436 1.001649
##
## above
## train_y [,1] [,2]
## ham 0.0412844 0.2213120
## spam 0.1188119 0.3675149
##
## absolutely
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.113861386 0.59173910
##
## abuse
## train_y [,1] [,2]
## ham 0.03211009 0.2426356
## spam 0.01980198 0.1396654
##
## accept
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.054455446 0.22747795
##
## access
## train_y [,1] [,2]
## ham 0.07798165 0.5416918
## spam 0.16831683 0.6397964
##
## according
## train_y [,1] [,2]
## ham 0.04587156 0.3150692
## spam 0.02475248 0.1557559
##
## account
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.23762376 1.1254497
##
## accounts
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.04950495 0.2392458
##
## across
## train_y [,1] [,2]
## ham 0.04587156 0.2498044
## spam 0.01980198 0.1396654
##
## act
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.118811881 0.44132610
##
## actiondhttpresponseresponseasp
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## actually
## train_y [,1] [,2]
## ham 0.13302752 0.5220581
## spam 0.04950495 0.2174588
##
## adam
## train_y [,1] [,2]
## ham 0.06880734 0.4893704
## spam 0.00000000 0.0000000
##
## add
## train_y [,1] [,2]
## ham 0.09174312 0.3602683
## spam 0.07920792 0.3052825
##
## added
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.03960396 0.2793308
##
## additional
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.049504950 0.2392458
##
## address
## train_y [,1] [,2]
## ham 0.09174312 0.3728402
## spam 0.54950495 1.7816156
##
## addresses
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.38118812 3.0351576
##
## adult
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.4880978
##
## advantage
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04950495 0.2777392
##
## advertisement
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.03960396 0.219488
##
## advertising
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.173267327 0.8549989
##
## africa
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.064356436 0.52860168
##
## african
## train_y [,1] [,2]
## ham 0.01834862 0.2709142
## spam 0.05940594 0.3945344
##
## after
## train_y [,1] [,2]
## ham 0.1330275 0.4458834
## spam 0.2178218 0.8476360
##
## again
## train_y [,1] [,2]
## ham 0.1100917 0.4036572
## spam 0.1485149 0.7776241
##
## against
## train_y [,1] [,2]
## ham 0.07339450 0.3243262
## spam 0.07920792 0.2885258
##
## age
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.10891089 0.8684748
##
## agent
## train_y [,1] [,2]
## ham 0.01834862 0.2138802
## spam 0.01980198 0.1985118
##
## agents
## train_y [,1] [,2]
## ham 0.01376147 0.2031856
## spam 0.03465347 0.2519321
##
## ago
## train_y [,1] [,2]
## ham 0.055045872 0.24793571
## spam 0.004950495 0.07035975
##
## air
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.05445545 0.2274780
##
## aligncenter
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.4059406 1.28658
##
## aligncenterfont
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3217822 1.995124
##
## aligncenterspan
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## aligndcenter
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3514851 1.407239
##
## aligndcenterbfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.5405574
##
## aligndcenterfont
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1485149 0.5622651
##
## aligndcenterimg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.3711162
##
## aligndright
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.6083827
##
## aligndrightbfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.6967205
##
## aligndrightfont
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.6230234
##
## aligndrightnbsptd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1972672
##
## alignleft
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2920792 2.136822
##
## alignleftfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.2777392
##
## alignmiddle
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1485149 1.635344
##
## alignright
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1237624 0.7258755
##
## all
## train_y [,1] [,2]
## ham 0.5229358 1.021396
## spam 1.1930693 2.280646
##
## allow
## train_y [,1] [,2]
## ham 0.0412844 0.1994051
## spam 0.1336634 0.3691532
##
## almost
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.03960396 0.2194880
##
## alone
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.01980198 0.1396654
##
## along
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.04455446 0.2503630
##
## already
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.08910891 0.3025270
##
## alsa
## train_y [,1] [,2]
## ham 0.06422018 0.6959425
## spam 0.00000000 0.0000000
##
## also
## train_y [,1] [,2]
## ham 0.2431193 0.6992909
## spam 0.3316832 0.9639709
##
## alt
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2673267 2.640636
##
## altd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02475248 0.1557559
##
## although
## train_y [,1] [,2]
## ham 0.04128440 0.2213120
## spam 0.06930693 0.3929079
##
## alttd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1188119 1.688634
##
## always
## train_y [,1] [,2]
## ham 0.09174312 0.4408054
## spam 0.07920792 0.3211660
##
## amavisdmilter
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.049504950 0.23924581
##
## america
## train_y [,1] [,2]
## ham 0.07339450 0.4839626
## spam 0.08415842 0.9238932
##
## american
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.05445545 0.3889071
##
## amount
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.099009901 0.42332375
##
## amp
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1089109 0.4438858
##
## and
## train_y [,1] [,2]
## ham 3.981651 9.053331
## spam 6.727723 13.593609
##
## annuity
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.4925183
##
## another
## train_y [,1] [,2]
## ham 0.09174312 0.3195990
## spam 0.08910891 0.3888754
##
## answer
## train_y [,1] [,2]
## ham 0.04587156 0.2498044
## spam 0.04950495 0.2777392
##
## any
## train_y [,1] [,2]
## ham 0.2981651 0.7728673
## spam 0.7178218 1.7319299
##
## anyone
## train_y [,1] [,2]
## ham 0.1376147 0.4597713
## spam 0.1435644 0.5029101
##
## anything
## train_y [,1] [,2]
## ham 0.06880734 0.2877522
## spam 0.05940594 0.2757814
##
## anyway
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.00000000 0.0000000
##
## anywhere
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.054455446 0.24838774
##
## appears
## train_y [,1] [,2]
## ham 0.03669725 0.2322642
## spam 0.03465347 0.2709613
##
## application
## train_y [,1] [,2]
## ham 0.07339450 0.6103021
## spam 0.05940594 0.4307065
##
## applications
## train_y [,1] [,2]
## ham 0.082568807 0.73275492
## spam 0.004950495 0.07035975
##
## approved
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.084158416 0.29565096
##
## aptget
## train_y [,1] [,2]
## ham 0.1100917 0.8296293
## spam 0.0000000 0.0000000
##
## aqueous
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07425743 1.055396
##
## archive
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.01980198 0.1396654
##
## are
## train_y [,1] [,2]
## ham 0.9036697 1.718666
## spam 1.8910891 3.330549
##
## area
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.08910891 0.6779984
##
## arial
## train_y [,1] [,2]
## ham 0.00000 0.000000
## spam 1.09901 4.019863
##
## around
## train_y [,1] [,2]
## ham 0.07798165 0.3160405
## spam 0.07425743 0.2628408
##
## ask
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.06930693 0.2910754
##
## asmtp
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.21745876
##
## assist
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.31672584
##
## assistance
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1435644 0.4827195
##
## association
## train_y [,1] [,2]
## ham 0.03211009 0.3641493
## spam 0.03465347 0.1833549
##
## assume
## train_y [,1] [,2]
## ham 0.03211009 0.2228350
## spam 0.03465347 0.2887391
##
## attained
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.4053725
##
## aug
## train_y [,1] [,2]
## ham 1.903670 3.984403
## spam 2.509901 3.172080
##
## august
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.02475248 0.1557559
##
## authnlegwnnet
## train_y [,1] [,2]
## ham 0.2201835 0.8298331
## spam 0.0000000 0.0000000
##
## auto
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.3588687
##
## aux
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.04455446 0.566911
##
## available
## train_y [,1] [,2]
## ham 0.06880734 0.3590046
## spam 0.19306931 0.5530121
##
## average
## train_y [,1] [,2]
## ham 0.03211009 0.2228350
## spam 0.05940594 0.5148884
##
## away
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.10891089 0.4967755
##
## back
## train_y [,1] [,2]
## ham 0.1055046 0.362870
## spam 0.2475248 1.333875
##
## background
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.02475248 0.2897609
##
## backgroundattachment
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## backgroundcolor
## train_y [,1] [,2]
## ham 0.00000000 0.00000000
## spam 0.00990099 0.09925589
##
## backgroundposition
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## backgroundrepeat
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## backup
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.28357496
##
## bad
## train_y [,1] [,2]
## ham 0.07339450 0.2945409
## spam 0.02475248 0.2326169
##
## bank
## train_y [,1] [,2]
## ham 0.04587156 0.4581132
## spam 0.17821782 0.6895963
##
## base
## train_y [,1] [,2]
## ham 0.04587156 0.4268699
## spam 0.11386139 0.3336903
##
## based
## train_y [,1] [,2]
## ham 0.07798165 0.4166705
## spam 0.10891089 0.5259626
##
## beat
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.03960396 0.1955114
##
## because
## train_y [,1] [,2]
## ham 0.3165138 0.8510242
## spam 0.3217822 0.8525180
##
## become
## train_y [,1] [,2]
## ham 0.08256881 0.3749885
## spam 0.09405941 0.5335177
##
## been
## train_y [,1] [,2]
## ham 0.3577982 0.8853434
## spam 0.3910891 1.0368915
##
## before
## train_y [,1] [,2]
## ham 0.1238532 0.4274884
## spam 0.1831683 0.7061308
##
## begin
## train_y [,1] [,2]
## ham 0.10550459 0.3753548
## spam 0.01485149 0.1212589
##
## behind
## train_y [,1] [,2]
## ham 0.05045872 0.2915472
## spam 0.01980198 0.1716295
##
## being
## train_y [,1] [,2]
## ham 0.1376147 0.4065793
## spam 0.1633663 0.7905188
##
## believe
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.12871287 0.5020033
##
## below
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.376237624 0.85053668
##
## ben
## train_y [,1] [,2]
## ham 0.036697248 0.42776033
## spam 0.004950495 0.07035975
##
## benefit
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.06930693 0.3234585
##
## bengreenmindupmerchantscom
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.01980198 0.281439
##
## best
## train_y [,1] [,2]
## ham 0.1422018 0.6671756
## spam 0.3910891 0.9977686
##
## better
## train_y [,1] [,2]
## ham 0.1146789 0.4191995
## spam 0.1188119 0.4843240
##
## between
## train_y [,1] [,2]
## ham 0.09174312 0.4612403
## spam 0.04455446 0.3637766
##
## bfont
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2277228 1.068604
##
## bgcolor
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1089109 0.465763
##
## bgcolord
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1584158 0.6423321
##
## bgcolordcccccc
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06435644 0.3743211
##
## bgcolordcfab
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 1.125756
##
## bgcolordddnbsptd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.7940471
##
## bgcolordffffff
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1584158 0.6186597
##
## bgcolorffffff
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2920792 1.592531
##
## big
## train_y [,1] [,2]
## ham 0.05963303 0.3853586
## spam 0.03465347 0.2087326
##
## biggest
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.02970297 0.1701884
##
## bill
## train_y [,1] [,2]
## ham 0.05504587 0.3800282
## spam 0.06435644 0.2835750
##
## billion
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.04455446 0.2695030
##
## bills
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.049504950 0.35621665
##
## bit
## train_y [,1] [,2]
## ham 0.4633028 0.6302893
## spam 0.5247525 0.6401812
##
## black
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.262376238 2.61748242
##
## blockquotefont
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3910891 5.558421
##
## blue
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.02970297 0.2425177
##
## bmn
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## body
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.66831683 0.9994457
##
## boingboing
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.00000000 0.0000000
##
## bonus
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.5496841
##
## book
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.03465347 0.3213577
##
## border
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.9455446 2.979533
##
## bordera
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02475248 0.1849598
##
## borderatd
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1485149 0.8796894
##
## borderbottom
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## bordercolor
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1039604 0.4279817
##
## bordercolord
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1237624 0.5367395
##
## borderd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.6287129 1.925417
##
## borderdtd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.2110793
##
## borderleft
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## bordertd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2376238 2.609392
##
## bordertop
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## botanical
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1287129 1.829354
##
## both
## train_y [,1] [,2]
## ham 0.07798165 0.3439692
## spam 0.09405941 0.3248641
##
## bottle
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.079207921 0.74223232
##
## bottom
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.074257426 0.49818646
##
## box
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.09900990 0.5554784
##
## brbr
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1237624 0.9513173
##
## bug
## train_y [,1] [,2]
## ham 0.09633028 0.9623729
## spam 0.00000000 0.0000000
##
## build
## train_y [,1] [,2]
## ham 0.10550459 0.3365137
## spam 0.09405941 0.3541711
##
## built
## train_y [,1] [,2]
## ham 0.055045872 0.31358966
## spam 0.004950495 0.07035975
##
## bulk
## train_y [,1] [,2]
## ham 0.7064220 0.4951889
## spam 0.2326733 0.6311340
##
## bush
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.000000000 0.00000000
##
## business
## train_y [,1] [,2]
## ham 0.1009174 0.394547
## spam 0.7029703 3.369588
##
## but
## train_y [,1] [,2]
## ham 0.8165138 1.395616
## spam 0.3861386 1.392734
##
## buy
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.24257426 0.7826281
##
## bythinkgeek
## train_y [,1] [,2]
## ham 0.050458716 0.23947864
## spam 0.004950495 0.07035975
##
## cable
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.4925183
##
## california
## train_y [,1] [,2]
## ham 0.03669725 0.2322642
## spam 0.04455446 0.2068360
##
## call
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.22772277 0.6523008
##
## called
## train_y [,1] [,2]
## ham 0.06880734 0.3033446
## spam 0.02970297 0.2425177
##
## came
## train_y [,1] [,2]
## ham 0.04128440 0.2412376
## spam 0.07425743 0.4454644
##
## can
## train_y [,1] [,2]
## ham 0.5550459 1.123358
## spam 0.9950495 2.126585
##
## canada
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.47923740
##
## cannot
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.08910891 0.3185480
##
## cant
## train_y [,1] [,2]
## ham 0.14220183 0.4325029
## spam 0.05445545 0.2676690
##
## capital
## train_y [,1] [,2]
## ham 0.11009174 1.4261511
## spam 0.03960396 0.3842882
##
## car
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.108910891 0.62931678
##
## card
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.18316832 0.9930674
##
## care
## train_y [,1] [,2]
## ham 0.03669725 0.2322642
## spam 0.04950495 0.2951090
##
## career
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.04455446 0.2296332
##
## case
## train_y [,1] [,2]
## ham 0.05504587 0.2479357
## spam 0.11881188 0.6807536
##
## cash
## train_y [,1] [,2]
## ham 0.03669725 0.4168481
## spam 0.25247525 1.1199764
##
## cause
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.044554455 0.32012909
##
## cbyi
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1237624 0.7787793
##
## cdo
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.2174588
##
## cdrom
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.029702970 0.1972672
##
## cds
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.054455446 0.49071473
##
## cdt
## train_y [,1] [,2]
## ham 0.09633028 0.4127968
## spam 0.11386139 0.5480912
##
## cell
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.01980198 0.1396654
##
## cellpadding
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.5841584 1.872914
##
## cellpaddingd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3069307 1.019634
##
## cellspacing
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.5940594 1.937728
##
## cellspacingd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.4009901 1.331815
##
## center
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.39603960 1.3462992
##
## central
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.04455446 0.2503630
##
## cest
## train_y [,1] [,2]
## ham 0.04587156 0.2676169
## spam 0.01485149 0.2110793
##
## change
## train_y [,1] [,2]
## ham 0.1467890 0.5485748
## spam 0.1287129 0.6094344
##
## changed
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.01485149 0.1570158
##
## charge
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.074257426 0.3145410
##
## charsetdiso
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.2174588
##
## charsetiso
## train_y [,1] [,2]
## ham 0.09174312 0.2893273
## spam 0.56435644 0.5356141
##
## charsetusascii
## train_y [,1] [,2]
## ham 0.5183486 0.5008132
## spam 0.1584158 0.3660376
##
## charsetwindows
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.14356436 0.3515186
##
## check
## train_y [,1] [,2]
## ham 0.08715596 0.3677308
## spam 0.13861386 0.7983777
##
## chicago
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.40827845
##
## children
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.05445545 0.3758969
##
## china
## train_y [,1] [,2]
## ham 0.022935780 0.22397047
## spam 0.004950495 0.07035975
##
## choice
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.04950495 0.5062050
##
## chris
## train_y [,1] [,2]
## ham 0.1284404 0.6665257
## spam 0.0000000 0.0000000
##
## cipher
## train_y [,1] [,2]
## ham 0.045871560 0.20968799
## spam 0.004950495 0.07035975
##
## city
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.08910891 0.3888754
##
## claim
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.10891089 0.4657630
##
## claimed
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.06435644 0.2459965
##
## class
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.014851485 0.15701579
##
## classarial
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## classified
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.7189203
##
## classmsobodytext
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.5795218
##
## classmsonormal
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08415842 1.006371
##
## clean
## train_y [,1] [,2]
## ham 0.211009174 0.61581903
## spam 0.004950495 0.07035975
##
## click
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.50000000 0.9370749
##
## client
## train_y [,1] [,2]
## ham 0.03211009 0.4116687
## spam 0.03465347 0.3213577
##
## clients
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.04455446 0.2503630
##
## cna
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## coach’invest
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.7035975
##
## coast
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.01980198 0.1716295
##
## code
## train_y [,1] [,2]
## ham 0.1376147 0.5074176
## spam 0.1237624 0.5367395
##
## collaboration
## train_y [,1] [,2]
## ham 0.06880734 0.9503041
## spam 0.00000000 0.0000000
##
## collapse
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.084158416 0.52579864
##
## college
## train_y [,1] [,2]
## ham 0.05504587 0.49578616
## spam 0.00990099 0.09925589
##
## color
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.658415842 2.16189374
##
## colord
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3861386 1.641928
##
## colordcc
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01980198 0.1716295
##
## colordff
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2128713 1.069353
##
## colordffff
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.5711527
##
## colordffffff
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1930693 0.8507249
##
## colorff
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1386139 0.8925307
##
## colorffffff
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.3484221
##
## colorfont
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08910891 1.197834
##
## colspan
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3415842 1.268506
##
## colspana
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.2887391
##
## colspand
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2376238 1.177302
##
## colspandinput
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.3353101
##
## colspanimg
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1287129 1.320812
##
## com
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.05445545 0.3758969
##
## come
## train_y [,1] [,2]
## ham 0.07798165 0.3160405
## spam 0.10891089 0.5884627
##
## comes
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.04455446 0.3041914
##
## coming
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.05445545 0.3484221
##
## command
## train_y [,1] [,2]
## ham 0.0412844 0.3632194
## spam 0.0000000 0.0000000
##
## comment
## train_y [,1] [,2]
## ham 0.05504587 0.2285947
## spam 0.00000000 0.0000000
##
## commercial
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.24838774
##
## commission
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.044554455 0.22963317
##
## commissions
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.4971719
##
## communication
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.074257426 0.35886872
##
## communications
## train_y [,1] [,2]
## ham 0.146788991 0.92408319
## spam 0.004950495 0.07035975
##
## community
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.06930693 0.5860302
##
## companies
## train_y [,1] [,2]
## ham 0.07798165 0.7670469
## spam 0.18811881 0.8190866
##
## company
## train_y [,1] [,2]
## ham 0.07798165 0.5331167
## spam 0.37623762 0.9500090
##
## companys
## train_y [,1] [,2]
## ham 0.032110092 0.30941620
## spam 0.004950495 0.07035975
##
## competitive
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.009900990 0.09925589
##
## complete
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.11881188 1.0816160
##
## completely
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04950495 0.2592080
##
## computer
## train_y [,1] [,2]
## ham 0.1330275 1.166234
## spam 0.2821782 0.984749
##
## conference
## train_y [,1] [,2]
## ham 0.03669725 0.3162746
## spam 0.00000000 0.0000000
##
## confidential
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3097674
##
## congo
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.5871219
##
## contact
## train_y [,1] [,2]
## ham 0.1009174 0.3168088
## spam 0.2574257 0.8598677
##
## contains
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.227722772 2.61408808
##
## content
## train_y [,1] [,2]
## ham 0.06422018 0.4658915
## spam 0.05445545 0.2483877
##
## contentclass
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.21745876
##
## contentdisposition
## train_y [,1] [,2]
## ham 0.12385321 0.4686286
## spam 0.01980198 0.1396654
##
## contentdtexthtml
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08415842 0.278315
##
## contenttexthtml
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.2734495
##
## contenttransferencoding
## train_y [,1] [,2]
## ham 0.3807339 0.4960632
## spam 0.7277228 0.5555005
##
## contenttype
## train_y [,1] [,2]
## ham 1.036697 0.5984097
## spam 1.123762 0.6383512
##
## continue
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.07425743 0.4341523
##
## contract
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.059405941 0.38171600
##
## contracts
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.074257426 0.35886872
##
## control
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.10891089 0.3705849
##
## copy
## train_y [,1] [,2]
## ham 0.02752294 0.2335349
## spam 0.07425743 0.4105944
##
## copyright
## train_y [,1] [,2]
## ham 0.05504587 0.2285947
## spam 0.01980198 0.1716295
##
## corporate
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.02970297 0.262231
##
## corporation
## train_y [,1] [,2]
## ham 0.06880734 0.4182908
## spam 0.03960396 0.2194880
##
## cost
## train_y [,1] [,2]
## ham 0.0412844 0.2213120
## spam 0.1683168 0.5913019
##
## could
## train_y [,1] [,2]
## ham 0.2293578 0.5537144
## spam 0.1435644 0.5766460
##
## couldnt
## train_y [,1] [,2]
## ham 0.06422018 0.29668613
## spam 0.00990099 0.09925589
##
## count
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07920792 0.7877567
##
## country
## train_y [,1] [,2]
## ham 0.03669725 0.2322642
## spam 0.21287129 0.8099092
##
## courier
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.004950495 0.07035975
##
## course
## train_y [,1] [,2]
## ham 0.08256881 0.3221026
## spam 0.05940594 0.2932671
##
## cpuosdncom
## train_y [,1] [,2]
## ham 0.05504587 0.3279559
## spam 0.00000000 0.0000000
##
## craig
## train_y [,1] [,2]
## ham 0.01834862 0.2138802
## spam 0.02475248 0.3517988
##
## crankslacknet
## train_y [,1] [,2]
## ham 0.06880734 0.45013
## spam 0.00000000 0.00000
##
## create
## train_y [,1] [,2]
## ham 0.04587156 0.3150692
## spam 0.03465347 0.1833549
##
## created
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.02475248 0.2326169
##
## credit
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.163366337 0.80300713
##
## current
## train_y [,1] [,2]
## ham 0.08256881 0.3074629
## spam 0.08415842 0.2956510
##
## currently
## train_y [,1] [,2]
## ham 0.0412844 0.2213120
## spam 0.1287129 0.4270311
##
## custom
## train_y [,1] [,2]
## ham 0.07798165 0.8249405
## spam 0.02475248 0.1849598
##
## customer
## train_y [,1] [,2]
## ham 0.02293578 0.2023516
## spam 0.07425743 0.3299792
##
## customers
## train_y [,1] [,2]
## ham 0.04587156 0.3562560
## spam 0.07920792 0.4274922
##
## cut
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.05445545 0.2274780
##
## cvs
## train_y [,1] [,2]
## ham 0.04587156 0.2843157
## spam 0.00000000 0.0000000
##
## cwgexmhdeepeddycom
## train_y [,1] [,2]
## ham 0.06422018 0.47568
## spam 0.00000000 0.00000
##
## daily
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.04455446 0.3041914
##
## data
## train_y [,1] [,2]
## ham 0.19724771 1.172886
## spam 0.07425743 0.314541
##
## database
## train_y [,1] [,2]
## ham 0.05045872 0.3069469
## spam 0.06930693 0.3234585
##
## datapower
## train_y [,1] [,2]
## ham 0.1513761 2.235042
## spam 0.0000000 0.000000
##
## date
## train_y [,1] [,2]
## ham 1.371560 0.5634966
## spam 1.044554 0.2503630
##
## david
## train_y [,1] [,2]
## ham 0.059633028 0.33411807
## spam 0.004950495 0.07035975
##
## day
## train_y [,1] [,2]
## ham 0.09633028 0.3654242
## spam 0.30198020 1.2549984
##
## days
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.23267327 0.7055725
##
## deal
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.11386139 0.4013732
##
## dear
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.133663366 0.36915319
##
## death
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.03465347 0.1833549
##
## debian
## train_y [,1] [,2]
## ham 0.36238532 0.8324401
## spam 0.04455446 0.3201291
##
## dec
## train_y [,1] [,2]
## ham 0.1605505 1.3011529
## spam 0.1287129 0.9374822
##
## decide
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.04455446 0.2296332
##
## decided
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.05445545 0.3025677
##
## deliveredto
## train_y [,1] [,2]
## ham 1.435780 0.5663405
## spam 0.970297 0.2805625
##
## deliverydate
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.05445545 0.2274780
##
## department
## train_y [,1] [,2]
## ham 0.01376147 0.2031856
## spam 0.07425743 0.4672676
##
## des
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1633663 2.321872
##
## deserve
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.02475248 0.1849598
##
## designed
## train_y [,1] [,2]
## ham 0.05504587 0.47683408
## spam 0.00990099 0.09925589
##
## details
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.04950495 0.2777392
##
## developers
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## development
## train_y [,1] [,2]
## ham 0.06422018 0.5218759
## spam 0.03465347 0.2087326
##
## device
## train_y [,1] [,2]
## ham 0.06880734 0.7118739
## spam 0.00000000 0.0000000
##
## did
## train_y [,1] [,2]
## ham 0.11009174 0.4036572
## spam 0.09405941 0.5047677
##
## didnt
## train_y [,1] [,2]
## ham 0.10550459 0.3628700
## spam 0.05445545 0.3025677
##
## different
## train_y [,1] [,2]
## ham 0.05963303 0.3200286
## spam 0.09405941 0.4528125
##
## digital
## train_y [,1] [,2]
## ham 0.08715596 0.5316873
## spam 0.01485149 0.1212589
##
## dimeboxbmccom
## train_y [,1] [,2]
## ham 0.1100917 0.6344844
## spam 0.0000000 0.0000000
##
## dinosaurs
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## direct
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1435644 0.5852101
##
## directly
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.04455446 0.2068360
##
## director
## train_y [,1] [,2]
## ham 0.04128440 0.3083212
## spam 0.02970297 0.1701884
##
## directories
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.44700978
##
## directory
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.02970297 0.2210534
##
## discover
## train_y [,1] [,2]
## ham 0.01376147 0.2031856
## spam 0.04950495 0.3959054
##
## discuss
## train_y [,1] [,2]
## ham 0.10550459 0.36286999
## spam 0.00990099 0.09925589
##
## discussion
## train_y [,1] [,2]
## ham 0.20183486 0.4022933
## spam 0.02970297 0.1701884
##
## display
## train_y [,1] [,2]
## ham 0.02293578 0.17812802
## spam 0.00990099 0.09925589
##
## div
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.5643564 2.778438
##
## doctype
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.2369702
##
## documents
## train_y [,1] [,2]
## ham 0.03211009 0.3094162
## spam 0.03465347 0.2709613
##
## does
## train_y [,1] [,2]
## ham 0.2431193 0.5844162
## spam 0.2277228 0.7581227
##
## doesnt
## train_y [,1] [,2]
## ham 0.11009174 0.3418424
## spam 0.01485149 0.1212589
##
## dogmaslashnullorg
## train_y [,1] [,2]
## ham 1.2660550 0.4921916
## spam 0.7772277 0.5772436
##
## doing
## train_y [,1] [,2]
## ham 0.08715596 0.3417496
## spam 0.07920792 0.3052825
##
## dollars
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.2079208 0.9805758
##
## domain
## train_y [,1] [,2]
## ham 0.0733945 0.4839626
## spam 0.2029703 1.3689489
##
## done
## train_y [,1] [,2]
## ham 0.11926606 0.6542984
## spam 0.05445545 0.2483877
##
## dont
## train_y [,1] [,2]
## ham 0.3348624 0.8105157
## spam 0.3118812 0.8503774
##
## down
## train_y [,1] [,2]
## ham 0.1146789 0.4610798
## spam 0.1980198 0.9412578
##
## drive
## train_y [,1] [,2]
## ham 0.09174312 0.6001733
## spam 0.02970297 0.1972672
##
## drives
## train_y [,1] [,2]
## ham 0.08715596 1.158706
## spam 0.00000000 0.000000
##
## drops
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.069306931 0.98503656
##
## due
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.09405941 0.3940659
##
## during
## train_y [,1] [,2]
## ham 0.05504587 0.2826753
## spam 0.10891089 0.8854938
##
## dvd
## train_y [,1] [,2]
## ham 0.08715596 1.0502228
## spam 0.01485149 0.1570158
##
## dvds
## train_y [,1] [,2]
## ham 0.06880734 0.2877522
## spam 0.00990099 0.1407195
##
## each
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.30693069 1.6795082
##
## earn
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.074257426 0.31454096
##
## easily
## train_y [,1] [,2]
## ham 0.04587156 0.3293709
## spam 0.12376238 0.3726064
##
## easy
## train_y [,1] [,2]
## ham 0.07798165 0.3011063
## spam 0.16831683 0.5476191
##
## ebay
## train_y [,1] [,2]
## ham 0.009174312 0.13545709
## spam 0.004950495 0.07035975
##
## echostar
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## economy
## train_y [,1] [,2]
## ham 0.02293578 0.17812802
## spam 0.00990099 0.09925589
##
## edt
## train_y [,1] [,2]
## ham 0.6376147 1.0698455
## spam 0.5297030 0.6553332
##
## effort
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.04455446 0.2695030
##
## eggs
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.000000000 0.00000000
##
## egp
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## egwn
## train_y [,1] [,2]
## ham 0.2018349 0.6038252
## spam 0.0000000 0.0000000
##
## egwnnet
## train_y [,1] [,2]
## ham 0.2385321 0.8247226
## spam 0.0000000 0.0000000
##
## either
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.05445545 0.2483877
##
## else
## train_y [,1] [,2]
## ham 0.08256881 0.2758628
## spam 0.02475248 0.1557559
##
## email
## train_y [,1] [,2]
## ham 0.3807339 0.829744
## spam 1.7326733 4.526079
##
## emails
## train_y [,1] [,2]
## ham 0.03211009 0.3512665
## spam 0.39108911 1.9343761
##
## encodingutf
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## end
## train_y [,1] [,2]
## ham 0.19266055 0.6852907
## spam 0.07920792 0.3363003
##
## engineering
## train_y [,1] [,2]
## ham 0.077981651 0.87902940
## spam 0.004950495 0.07035975
##
## engines
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.094059406 0.63564466
##
## enough
## train_y [,1] [,2]
## ham 0.05963303 0.2734383
## spam 0.02970297 0.1701884
##
## enter
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.02475248 0.1849598
##
## enterprise
## train_y [,1] [,2]
## ham 0.077981651 0.69783837
## spam 0.004950495 0.07035975
##
## entire
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.084158416 0.31202522
##
## enus
## train_y [,1] [,2]
## ham 0.1330275 0.4948686
## spam 0.0000000 0.0000000
##
## envelopefrom
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07425743 0.3145410
##
## error
## train_y [,1] [,2]
## ham 0.09174312 0.4612403
## spam 0.11881188 0.3243709
##
## errorsto
## train_y [,1] [,2]
## ham 0.6192661 0.4960632
## spam 0.1386139 0.3464016
##
## esmtp
## train_y [,1] [,2]
## ham 3.573394 1.570649
## spam 2.247525 0.966153
##
## est
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.7850789
##
## etc
## train_y [,1] [,2]
## ham 0.04128440 0.2213120
## spam 0.05940594 0.4640677
##
## european
## train_y [,1] [,2]
## ham 0.06422018 0.7531835
## spam 0.00000000 0.0000000
##
## even
## train_y [,1] [,2]
## ham 0.1743119 0.5654253
## spam 0.1980198 0.5904683
##
## events
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04950495 0.3115117
##
## ever
## train_y [,1] [,2]
## ham 0.06422018 0.2807242
## spam 0.10891089 0.4208731
##
## every
## train_y [,1] [,2]
## ham 0.1055046 0.5702833
## spam 0.2772277 1.0754503
##
## everything
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.06930693 0.3076933
##
## exactly
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.05445545 0.2274780
##
## example
## train_y [,1] [,2]
## ham 0.04128440 0.221312
## spam 0.04455446 0.269503
##
## excellent
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.22747795
##
## except
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.01485149 0.1570158
##
## exchange
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.16336634 0.3965580
##
## exim
## train_y [,1] [,2]
## ham 0.42660550 0.7955107
## spam 0.07425743 0.3855999
##
## exist
## train_y [,1] [,2]
## ham 0.05963303 0.3606495
## spam 0.01485149 0.1570158
##
## exmh
## train_y [,1] [,2]
## ham 0.266055 0.9420265
## spam 0.000000 0.0000000
##
## exmhp
## train_y [,1] [,2]
## ham 0.05504587 0.4035525
## spam 0.00000000 0.0000000
##
## exmhusers
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## exmhusersadminredhatcom
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## exmhusersadminspamassassintaintorg
## train_y [,1] [,2]
## ham 0.1376147 0.629064
## spam 0.0000000 0.000000
##
## exmhuserslistmanredhatcom
## train_y [,1] [,2]
## ham 0.09174312 0.419376
## spam 0.00000000 0.000000
##
## exmhuserslistmanspamassassintaintorg
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## exmhusersredhatcom
## train_y [,1] [,2]
## ham 0.1926606 0.910533
## spam 0.0000000 0.000000
##
## exmhusersspamassassintaintorg
## train_y [,1] [,2]
## ham 0.233945 1.071346
## spam 0.000000 0.000000
##
## exmhworkersadminspamassassintaintorg
## train_y [,1] [,2]
## ham 0.1238532 0.5982153
## spam 0.0000000 0.0000000
##
## exmhworkerslistmanredhatcom
## train_y [,1] [,2]
## ham 0.08256881 0.3988102
## spam 0.00000000 0.0000000
##
## exmhworkersredhatcom
## train_y [,1] [,2]
## ham 0.1743119 0.8676897
## spam 0.0000000 0.0000000
##
## exmhworkersspamassassintaintorg
## train_y [,1] [,2]
## ham 0.1605505 0.778101
## spam 0.0000000 0.000000
##
## experience
## train_y [,1] [,2]
## ham 0.06880734 0.3459302
## spam 0.03960396 0.1955114
##
## express
## train_y [,1] [,2]
## ham 0.06422018 0.2457090
## spam 0.18316832 0.4126288
##
## extended
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.103960396 0.70116049
##
## face
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.06435644 0.5191045
##
## face±¼¸²span
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1336634 1.899713
##
## facearial
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.4950495 2.111912
##
## facearialfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.9893278
##
## facecourier
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## faced
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.3201291
##
## facedarial
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.3861386 1.70145
##
## facedtahoma
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2376238 1.719936
##
## facedtimes
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.03960396 0.219488
##
## facedverdana
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.4257426 1.827629
##
## facetahoma
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.3782483
##
## facetimes
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.5932148
##
## faceverdana
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.8069307 4.054032
##
## faceverdanaarial
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## faceverdanafont
## train_y [,1] [,2]
## ham 0.000000 0.000000
## spam 0.519802 7.387774
##
## fact
## train_y [,1] [,2]
## ham 0.07339450 0.4544997
## spam 0.07920792 0.4389759
##
## factors
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07425743 0.7850789
##
## failed
## train_y [,1] [,2]
## ham 0.02293578 0.2023516
## spam 0.03465347 0.2519321
##
## fall
## train_y [,1] [,2]
## ham 0.06880734 0.3323419
## spam 0.04455446 0.2068360
##
## family
## train_y [,1] [,2]
## ham 0.07798165 0.6356328
## spam 0.08415842 0.3566658
##
## familysansserif
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1089109 1.296002
##
## far
## train_y [,1] [,2]
## ham 0.06880734 0.2537088
## spam 0.05445545 0.3338379
##
## fast
## train_y [,1] [,2]
## ham 0.03211009 0.2426356
## spam 0.05940594 0.2571092
##
## fastest
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.05940594 0.2571092
##
## father
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.064356436 0.46874112
##
## fax
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.27227723 1.356805
##
## featurepacked
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.4014652
##
## feb
## train_y [,1] [,2]
## ham 0.1880734 1.197711
## spam 0.0000000 0.000000
##
## federal
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.108910891 0.52596256
##
## fee
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.02970297 0.1701884
##
## feel
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.10891089 0.4866576
##
## fees
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.03960396 0.2194880
##
## fetchmail
## train_y [,1] [,2]
## ham 0.9908257 0.1917858
## spam 0.9306931 0.2546063
##
## few
## train_y [,1] [,2]
## ham 0.1284404 0.3617907
## spam 0.1831683 0.8110640
##
## ffffc
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## fields
## train_y [,1] [,2]
## ham 0.073394495 1.01787184
## spam 0.004950495 0.07035975
##
## figure
## train_y [,1] [,2]
## ham 0.04128440 0.2213120
## spam 0.02475248 0.1557559
##
## file
## train_y [,1] [,2]
## ham 0.13302752 0.4948686
## spam 0.03960396 0.2966074
##
## files
## train_y [,1] [,2]
## ham 0.13761468 0.498253
## spam 0.04455446 0.206836
##
## fill
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1633663 0.4439136
##
## finally
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.08415842 0.3424328
##
## financial
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.18811881 0.6723075
##
## find
## train_y [,1] [,2]
## ham 0.1238532 0.4053556
## spam 0.2079208 0.6037500
##
## finding
## train_y [,1] [,2]
## ham 0.03211009 0.17669820
## spam 0.00990099 0.09925589
##
## fine
## train_y [,1] [,2]
## ham 0.04128440 0.22131197
## spam 0.00990099 0.09925589
##
## first
## train_y [,1] [,2]
## ham 0.2293578 0.8328844
## spam 0.2079208 0.8383451
##
## five
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.084158416 0.32758203
##
## floppy
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02475248 0.1557559
##
## folder
## train_y [,1] [,2]
## ham 0.07798165 0.4382323
## spam 0.00000000 0.0000000
##
## follow
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.123762376 0.66879985
##
## following
## train_y [,1] [,2]
## ham 0.05504587 0.2479357
## spam 0.23762376 1.0710902
##
## font
## train_y [,1] [,2]
## ham 0.000000 0.000000
## spam 1.485149 3.871349
##
## fontbfont
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.04950495 0.327093
##
## fontfamily
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2524752 1.759302
##
## fontfont
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2871287 1.362881
##
## fontifont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.2793308
##
## fontp
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.0990099 0.359794
##
## fontsize
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1237624 0.7326974
##
## fonttd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06435644 0.3006077
##
## for
## train_y [,1] [,2]
## ham 6.215596 3.702575
## spam 7.391089 6.877067
##
## force
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.02970297 0.1701884
##
## foreign
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.089108911 0.42552888
##
## forged
## train_y [,1] [,2]
## ham 0.0412844 0.1994051
## spam 0.1237624 0.3726064
##
## fork
## train_y [,1] [,2]
## ham 0.1100917 0.3920747
## spam 0.0000000 0.0000000
##
## forkadminxentcom
## train_y [,1] [,2]
## ham 1.027523 1.777788
## spam 0.000000 0.000000
##
## forkspamassassintaintorg
## train_y [,1] [,2]
## ham 0.7431193 1.315653
## spam 0.0000000 0.000000
##
## forkxentcom
## train_y [,1] [,2]
## ham 0.6192661 1.114213
## spam 0.0000000 0.000000
##
## form
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.49504950 1.116353
##
## format
## train_y [,1] [,2]
## ham 0.06422018 0.4852712
## spam 0.07920792 0.3363003
##
## formatflowed
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.00000000 0.0000000
##
## forteanaowneryahoogroupscom
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## forteanaunsubscribeegroupscom
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## fortune
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.07425743 0.3299792
##
## forward
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.06435644 0.2459965
##
## found
## train_y [,1] [,2]
## ham 0.28440367 0.6524218
## spam 0.05445545 0.2856518
##
## four
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.05445545 0.3624199
##
## france
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.03960396 0.4329878
##
## free
## train_y [,1] [,2]
## ham 0.1651376 0.4802795
## spam 0.9702970 2.2984115
##
## freshrpms
## train_y [,1] [,2]
## ham 0.1055046 0.3225289
## spam 0.0000000 0.0000000
##
## fri
## train_y [,1] [,2]
## ham 1.045872 2.482697
## spam 1.034653 1.995963
##
## friend
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.079207921 0.33630028
##
## friends
## train_y [,1] [,2]
## ham 0.27981651 0.5076883
## spam 0.06930693 0.3076933
##
## from
## train_y [,1] [,2]
## ham 8.889908 3.288969
## spam 8.237624 2.667842
##
## frontpage
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.1833549
##
## full
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.09405941 0.3398337
##
## fully
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.07425743 0.3855999
##
## fun
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.03465347 0.2519321
##
## fund
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.153465347 0.67743507
##
## funds
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.059405941 0.35469228
##
## further
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.19306931 0.6208255
##
## future
## train_y [,1] [,2]
## ham 0.05963303 0.2734383
## spam 0.19306931 0.4431361
##
## game
## train_y [,1] [,2]
## ham 0.02293578 0.2436788
## spam 0.01485149 0.2110793
##
## garrigues
## train_y [,1] [,2]
## ham 0.07798165 0.4687188
## spam 0.00000000 0.0000000
##
## gary
## train_y [,1] [,2]
## ham 0.09174312 0.4511385
## spam 0.00000000 0.0000000
##
## garymcanadacom
## train_y [,1] [,2]
## ham 0.06422018 0.3118321
## spam 0.00000000 0.0000000
##
## gecko
## train_y [,1] [,2]
## ham 0.06880734 0.2537088
## spam 0.00000000 0.0000000
##
## geege
## train_y [,1] [,2]
## ham 0.06422018 0.3401065
## spam 0.00000000 0.0000000
##
## geek
## train_y [,1] [,2]
## ham 0.055045872 0.24793571
## spam 0.004950495 0.07035975
##
## general
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.11881188 0.4181725
##
## generation
## train_y [,1] [,2]
## ham 0.05963303 0.6863541
## spam 0.05940594 0.7768002
##
## get
## train_y [,1] [,2]
## ham 0.4311927 0.9094179
## spam 0.7475248 2.2082205
##
## gets
## train_y [,1] [,2]
## ham 0.04128440 0.1994051
## spam 0.01485149 0.1212589
##
## getting
## train_y [,1] [,2]
## ham 0.08256881 0.3074629
## spam 0.01485149 0.1212589
##
## give
## train_y [,1] [,2]
## ham 0.09633028 0.4014781
## spam 0.12871287 0.4606585
##
## given
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.06930693 0.3528823
##
## gmt
## train_y [,1] [,2]
## ham 0.1513761 0.6368289
## spam 0.1237624 0.6460979
##
## gnulinux
## train_y [,1] [,2]
## ham 0.06880734 0.2712651
## spam 0.00000000 0.0000000
##
## gnupg
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.00000000 0.0000000
##
## goes
## train_y [,1] [,2]
## ham 0.04128440 0.2213120
## spam 0.03960396 0.3129316
##
## going
## train_y [,1] [,2]
## ham 0.10091743 0.4172536
## spam 0.08910891 0.5296955
##
## good
## train_y [,1] [,2]
## ham 0.1743119 0.4770114
## spam 0.1584158 0.5227686
##
## got
## train_y [,1] [,2]
## ham 0.12385321 0.3438463
## spam 0.08415842 0.3424328
##
## government
## train_y [,1] [,2]
## ham 0.03211009 0.260938
## spam 0.25247525 2.137237
##
## grant
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1039604 1.477555
##
## grants
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.1980198 2.81439
##
## great
## train_y [,1] [,2]
## ham 0.06880734 0.2537088
## spam 0.14851485 0.3961541
##
## group
## train_y [,1] [,2]
## ham 0.2201835 0.5973490
## spam 0.2376238 0.6932297
##
## groups
## train_y [,1] [,2]
## ham 0.15596330 0.5200906
## spam 0.01485149 0.1570158
##
## growing
## train_y [,1] [,2]
## ham 0.05045872 0.2752875
## spam 0.09405941 0.3812318
##
## guarantee
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.084158416 0.37035217
##
## guaranteed
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.133663366 0.44269128
##
## guardian
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.00000000 0.0000000
##
## guide
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1386139 1.230052
##
## habeas
## train_y [,1] [,2]
## ham 0.07798165 0.5331167
## spam 0.00000000 0.0000000
##
## had
## train_y [,1] [,2]
## ham 0.2385321 0.8840531
## spam 0.2178218 1.4633429
##
## hal
## train_y [,1] [,2]
## ham 0.06880734 0.3590046
## spam 0.00000000 0.0000000
##
## half
## train_y [,1] [,2]
## ham 0.05963303 0.3341181
## spam 0.05445545 0.3758969
##
## hand
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.01980198 0.1396654
##
## happen
## train_y [,1] [,2]
## ham 0.04128440 0.221312
## spam 0.04455446 0.206836
##
## happy
## train_y [,1] [,2]
## ham 0.04128440 0.2412376
## spam 0.06435644 0.4357379
##
## hard
## train_y [,1] [,2]
## ham 0.11926606 0.5796037
## spam 0.02970297 0.1972672
##
## hardware
## train_y [,1] [,2]
## ham 0.064220183 0.41348757
## spam 0.004950495 0.07035975
##
## has
## train_y [,1] [,2]
## ham 0.5504587 1.468478
## spam 0.5346535 1.590171
##
## have
## train_y [,1] [,2]
## ham 0.8990826 1.380876
## spam 1.5792079 3.523364
##
## having
## train_y [,1] [,2]
## ham 0.09633028 0.3253999
## spam 0.04950495 0.2777392
##
## head
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.46039604 0.8291432
##
## health
## train_y [,1] [,2]
## ham 0.03211009 0.4116687
## spam 0.08415842 0.4436916
##
## heard
## train_y [,1] [,2]
## ham 0.04128440 0.1994051
## spam 0.02475248 0.1557559
##
## heaven
## train_y [,1] [,2]
## ham 0.06422018 0.2807242
## spam 0.01485149 0.1570158
##
## height
## train_y [,1] [,2]
## ham 0.00000 0.000000
## spam 1.70297 6.214322
##
## heightbfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.4962794
##
## heightd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.7722772 2.631161
##
## heightdbr
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.2210534
##
## heightdtd
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1089109 0.7248399
##
## heightfont
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1188119 1.347835
##
## heightimg
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1039604 0.4826174
##
## heighttd
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1287129 1.085072
##
## held
## train_y [,1] [,2]
## ham 0.04128440 0.3632194
## spam 0.04455446 0.2068360
##
## helo
## train_y [,1] [,2]
## ham 0.1422018 0.3629865
## spam 0.1138614 0.3482807
##
## helouswsflistsourceforgenet
## train_y [,1] [,2]
## ham 0.13302752 0.3536655
## spam 0.01980198 0.1396654
##
## help
## train_y [,1] [,2]
## ham 0.08256881 0.3221026
## spam 0.33663366 0.9902734
##
## helvetica
## train_y [,1] [,2]
## ham 0.000000 0.000000
## spam 1.277228 4.202863
##
## her
## train_y [,1] [,2]
## ham 0.03669725 0.2690350
## spam 0.10891089 0.6213609
##
## herba
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.09405941 1.336835
##
## here
## train_y [,1] [,2]
## ham 0.2018349 0.4947618
## spam 0.5792079 0.9955940
##
## herea
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1336634 0.5790329
##
## hettinga
## train_y [,1] [,2]
## ham 0.08256881 0.4320872
## spam 0.00000000 0.0000000
##
## hextab
## train_y [,1] [,2]
## ham 0.09174312 1.354571
## spam 0.00000000 0.000000
##
## hidden
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.2777392
##
## high
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.07920792 0.3507822
##
## higher
## train_y [,1] [,2]
## ham 0.050458716 0.30694687
## spam 0.004950495 0.07035975
##
## his
## train_y [,1] [,2]
## ham 0.1284404 0.5531796
## spam 0.2524752 1.1199764
##
## hit
## train_y [,1] [,2]
## ham 0.09174312 1.0949510
## spam 0.02970297 0.1701884
##
## home
## train_y [,1] [,2]
## ham 0.0412844 0.221312
## spam 0.3663366 1.450495
##
## hope
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.044554455 0.22963317
##
## host
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.07920792 0.3211660
##
## hostinsuranceiqcom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## hotmailcom
## train_y [,1] [,2]
## ham 0.02752294 0.2335349
## spam 0.03465347 0.2087326
##
## hour
## train_y [,1] [,2]
## ham 0.02752294 0.2524978
## spam 0.05445545 0.2856518
##
## hours
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.15346535 0.4899110
##
## how
## train_y [,1] [,2]
## ham 0.2431193 0.5999796
## spam 0.5346535 1.6333853
##
## however
## train_y [,1] [,2]
## ham 0.08715596 0.3919938
## spam 0.06930693 0.3800347
##
## hrefhttpadfarmmediaplexcomadckfont
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.09405941 1.336835
##
## hrefhttpwwwjeweldivecomslefreakindexhtmlimg
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## hspace
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.4483302
##
## html
## train_y [,1] [,2]
## ham 0.03669725 0.3573225
## spam 0.69801980 1.0664005
##
## httpamavisorg
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.049504950 0.23924581
##
## httpaptfreshrpmsnet
## train_y [,1] [,2]
## ham 0.09174312 1.354571
## spam 0.00000000 0.000000
##
## httpboingboingnet
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.00000000 0.0000000
##
## httpdocsyahoocominfoterms
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## httpequivcontenttype
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.2734495
##
## httpequivdcontenttype
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1188119 0.3243709
##
## httplistsfreshrpmsnetmailmanlistinforpmlist
## train_y [,1] [,2]
## ham 0.1100917 0.3418424
## spam 0.0000000 0.0000000
##
## httplistsfreshrpmsnetmailmanlistinforpmzzzlist
## train_y [,1] [,2]
## ham 0.2018349 0.6038252
## spam 0.0000000 0.0000000
##
## httplistsfreshrpmsnetpipermailrpmzzzlist
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## httpsexamplesourceforgenetlistslistinforazorusers
## train_y [,1] [,2]
## ham 0.08256881 0.3988102
## spam 0.00000000 0.0000000
##
## httpsexamplesourceforgenetlistslistinfospamassassintalk
## train_y [,1] [,2]
## ham 0.1651376 0.584181
## spam 0.0000000 0.000000
##
## httpslistmanredhatcommailmanlistinfoexmhusers
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## httpslistmanspamassassintaintorgmailmanlistinfoexmhusers
## train_y [,1] [,2]
## ham 0.09174312 0.419376
## spam 0.00000000 0.000000
##
## httpslistmanspamassassintaintorgmailmanlistinfoexmhworkers
## train_y [,1] [,2]
## ham 0.08256881 0.3988102
## spam 0.00000000 0.0000000
##
## httpslistmanspamassassintaintorgmailmanprivateexmhusers
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## httpslistssourceforgenetlistslistinfospamassassintalk
## train_y [,1] [,2]
## ham 0.0733945 0.2613831
## spam 0.0000000 0.0000000
##
## httpswwwinphoniccomrasprsourceforgerefcodevs
## train_y [,1] [,2]
## ham 0.05963303 0.25603113
## spam 0.00990099 0.09925589
##
## httpthinkgeekcomsf
## train_y [,1] [,2]
## ham 0.050458716 0.23947864
## spam 0.004950495 0.07035975
##
## httpwwwadclickwspcfmospk
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1188119 0.7023362
##
## httpwwwgeocrawlercomredirsfphplistspamassassintalk
## train_y [,1] [,2]
## ham 0.06880734 0.2712651
## spam 0.00000000 0.0000000
##
## httpwwwinsuranceiqcomoptout
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.3185867
##
## httpwwwlinuxiemailmanlistinfoilug
## train_y [,1] [,2]
## ham 0.06422018 0.3785792
## spam 0.12376238 0.5081717
##
## httpwwwlinuxiemailmanlistinfosocial
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06435644 0.4241665
##
## httpwwwnewsisfreecomclick
## train_y [,1] [,2]
## ham 0.09174312 0.2893273
## spam 0.00000000 0.0000000
##
## httpxentcommailmanlistinfofork
## train_y [,1] [,2]
## ham 0.5321101 0.9314654
## spam 0.0000000 0.0000000
##
## httpxentcompipermailfork
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## hubfreebsdorg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.7275531
##
## huge
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.07920792 0.3052825
##
## hughes
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.000000000 0.00000000
##
## hundred
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.059405941 0.29326714
##
## hundreds
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.14356436 0.5127073
##
## idea
## train_y [,1] [,2]
## ham 0.06422018 0.2807242
## spam 0.05940594 0.3403768
##
## ideas
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.02970297 0.1701884
##
## ill
## train_y [,1] [,2]
## ham 0.10550459 0.3753548
## spam 0.03465347 0.2313429
##
## ilug
## train_y [,1] [,2]
## ham 0.04587156 0.2498044
## spam 0.07425743 0.2628408
##
## ilugadminlinuxie
## train_y [,1] [,2]
## ham 0.1467890 0.7538007
## spam 0.2970297 1.0513631
##
## iluglinuxie
## train_y [,1] [,2]
## ham 0.2201835 1.146888
## spam 0.4009901 1.425636
##
## image
## train_y [,1] [,2]
## ham 0.04128440 0.6095569
## spam 0.01980198 0.1716295
##
## imap
## train_y [,1] [,2]
## ham 0.9816514 0.1345175
## spam 0.7029703 0.4580850
##
## img
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.37128713 1.5081099
##
## immediate
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.09900990 0.3733658
##
## immediately
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.103960396 0.37863877
##
## importance
## train_y [,1] [,2]
## ham 0.0412844 0.2213120
## spam 0.1930693 0.3956876
##
## important
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.09900990 0.5279256
##
## inc
## train_y [,1] [,2]
## ham 0.08256881 0.5700423
## spam 0.03465347 0.2313429
##
## include
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.11386139 0.3622840
##
## included
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04455446 0.3201291
##
## includes
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.183168317 0.81717500
##
## including
## train_y [,1] [,2]
## ham 0.05963303 0.3853586
## spam 0.08415842 0.3703522
##
## income
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.178217822 0.79671015
##
## increase
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.12376238 0.4982853
##
## independent
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07920792 0.3211660
##
## individuals
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.069306931 0.40537252
##
## industry
## train_y [,1] [,2]
## ham 0.03669725 0.3162746
## spam 0.01980198 0.1716295
##
## info
## train_y [,1] [,2]
## ham 0.06880734 0.3838196
## spam 0.14851485 0.6048912
##
## information
## train_y [,1] [,2]
## ham 0.1972477 0.8383612
## spam 0.6584158 1.0543223
##
## inline
## train_y [,1] [,2]
## ham 0.12844037 0.4722906
## spam 0.01485149 0.1212589
##
## input
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.45049505 2.3393720
##
## inquiry
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.034653465 0.20873256
##
## inreplyto
## train_y [,1] [,2]
## ham 0.4220183 0.5395608
## spam 0.0000000 0.0000000
##
## inset
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.000000000 0.00000000
##
## install
## train_y [,1] [,2]
## ham 0.1192661 0.5796037
## spam 0.0000000 0.0000000
##
## installed
## train_y [,1] [,2]
## ham 0.050458716 0.29154723
## spam 0.004950495 0.07035975
##
## instant
## train_y [,1] [,2]
## ham 0.05045872 0.4422656
## spam 0.03960396 0.2793308
##
## instead
## train_y [,1] [,2]
## ham 0.05504587 0.2658735
## spam 0.08415842 0.3703522
##
## instructions
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.138613861 0.66211936
##
## insurance
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1336634 0.5526557
##
## intended
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.7966638
##
## interactive
## train_y [,1] [,2]
## ham 0.04128440 0.3756925
## spam 0.02475248 0.2326169
##
## interest
## train_y [,1] [,2]
## ham 0.02752294 0.2335349
## spam 0.10396040 0.3513785
##
## interested
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.202970297 2.0572384
##
## internal
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.01485149 0.1570158
##
## international
## train_y [,1] [,2]
## ham 0.03669725 0.3162746
## spam 0.08415842 0.4436916
##
## internet
## train_y [,1] [,2]
## ham 0.1467890 0.6342845
## spam 0.5148515 1.3279533
##
## intmxcorpredhatcom
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.00000000 0.0000000
##
## intmxcorpspamassassintaintorg
## train_y [,1] [,2]
## ham 0.2614679 0.8481379
## spam 0.0000000 0.0000000
##
## into
## train_y [,1] [,2]
## ham 0.2889908 0.6678723
## spam 0.2722772 0.7465501
##
## intro
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 1.125756
##
## invest
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.069306931 0.33849023
##
## investment
## train_y [,1] [,2]
## ham 0.01834862 0.2138802
## spam 0.23762376 0.8598104
##
## investors
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.069306931 0.3528823
##
## invoked
## train_y [,1] [,2]
## ham 0.22935780 0.4630699
## spam 0.05940594 0.2369702
##
## involved
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.05940594 0.2932671
##
## irish
## train_y [,1] [,2]
## ham 0.07798165 0.3820531
## spam 0.21287129 0.6061317
##
## isnt
## train_y [,1] [,2]
## ham 0.08256881 0.2758628
## spam 0.01980198 0.1396654
##
## issue
## train_y [,1] [,2]
## ham 0.05504587 0.2285947
## spam 0.02475248 0.2897609
##
## issued
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.3140316
##
## ist
## train_y [,1] [,2]
## ham 1.766055 0.6112193
## spam 1.450495 0.7726833
##
## its
## train_y [,1] [,2]
## ham 0.5275229 1.1325410
## spam 0.2772277 0.9631913
##
## ive
## train_y [,1] [,2]
## ham 0.12385321 0.4381358
## spam 0.04455446 0.3041914
##
## jaa
## train_y [,1] [,2]
## ham 0.02293578 0.2436788
## spam 0.04950495 0.3270930
##
## jalapeno
## train_y [,1] [,2]
## ham 1.5504587 0.8367838
## spam 0.9851485 0.9998892
##
## java
## train_y [,1] [,2]
## ham 0.050458716 0.44226561
## spam 0.004950495 0.07035975
##
## jmasonorg
## train_y [,1] [,2]
## ham 0.71559633 0.4521682
## spam 0.01980198 0.1396654
##
## jmexmhjmasonorg
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.00000000 0.0000000
##
## jmjmasonorg
## train_y [,1] [,2]
## ham 0.55504587 0.55912806
## spam 0.00990099 0.09925589
##
## jmlocalhost
## train_y [,1] [,2]
## ham 1.71559633 0.7001217
## spam 0.03960396 0.2793308
##
## jmnetnoteinccom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3254322
##
## jmrpmjmasonorg
## train_y [,1] [,2]
## ham 0.09174312 0.2893273
## spam 0.00000000 0.0000000
##
## jmsajmasonorg
## train_y [,1] [,2]
## ham 0.0733945 0.278456
## spam 0.0000000 0.000000
##
## jmuseperljmasonorg
## train_y [,1] [,2]
## ham 0.05504587 0.3279559
## spam 0.00000000 0.0000000
##
## job
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.03465347 0.2087326
##
## john
## train_y [,1] [,2]
## ham 0.05504587 0.2658735
## spam 0.03465347 0.3054841
##
## join
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.06435644 0.3006077
##
## jul
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.054455446 0.49071473
##
## just
## train_y [,1] [,2]
## ham 0.4174312 0.7829219
## spam 0.5000000 1.6847280
##
## justin
## train_y [,1] [,2]
## ham 0.05045872 0.2580051
## spam 0.00000000 0.0000000
##
## kabila
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08415842 0.745031
##
## kathmandu
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08910891 1.266476
##
## keep
## train_y [,1] [,2]
## ham 0.09633028 0.3654242
## spam 0.15841584 0.6105649
##
## kelly
## train_y [,1] [,2]
## ham 0.0733945 1.017872
## spam 0.0000000 0.000000
##
## kernel
## train_y [,1] [,2]
## ham 0.0733945 0.6748466
## spam 0.0000000 0.0000000
##
## key
## train_y [,1] [,2]
## ham 0.12385321 0.8136394
## spam 0.02475248 0.1557559
##
## khare
## train_y [,1] [,2]
## ham 0.266055 0.4730955
## spam 0.000000 0.0000000
##
## kind
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.04950495 0.3270930
##
## know
## train_y [,1] [,2]
## ham 0.1926606 0.4498246
## spam 0.2475248 0.7385065
##
## knowledge
## train_y [,1] [,2]
## ham 0.03669725 0.3013520
## spam 0.05445545 0.3338379
##
## known
## train_y [,1] [,2]
## ham 0.04128440 0.2412376
## spam 0.02475248 0.1849598
##
## lairxentcom
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## laptop
## train_y [,1] [,2]
## ham 0.027522936 0.23353487
## spam 0.004950495 0.07035975
##
## large
## train_y [,1] [,2]
## ham 0.05504587 0.2826753
## spam 0.08910891 0.3338010
##
## last
## train_y [,1] [,2]
## ham 0.08715596 0.3677308
## spam 0.13861386 0.4354834
##
## late
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.06435644 0.3467214
##
## later
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.08415842 0.5162499
##
## laurent
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.059405941 0.4951864
##
## lawrence
## train_y [,1] [,2]
## ham 0.1055046 0.4731179
## spam 0.0000000 0.0000000
##
## laws
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.064356436 0.2654515
##
## lbs
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.4672676
##
## learn
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.14851485 0.9020280
##
## least
## train_y [,1] [,2]
## ham 0.06422018 0.3262757
## spam 0.09900990 0.5817276
##
## left
## train_y [,1] [,2]
## ham 0.05045872 0.2752875
## spam 0.03465347 0.2087326
##
## legal
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.17326733 0.6799753
##
## leramilerctrorg
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05445545 0.267669
##
## lerleramilerctrorg
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05445545 0.227478
##
## lerlerctrorg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.4436916
##
## les
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1386139 1.970073
##
## less
## train_y [,1] [,2]
## ham 0.09174312 0.3728402
## spam 0.11881188 0.3808116
##
## let
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.06435644 0.2835750
##
## lets
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.05940594 0.3546923
##
## letter
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.14356436 0.7689129
##
## level
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.08415842 0.7315536
##
## life
## train_y [,1] [,2]
## ham 0.0733945 0.3097917
## spam 0.1980198 0.6985387
##
## lifetime
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.1570158
##
## lifont
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1485149 0.7776241
##
## light
## train_y [,1] [,2]
## ham 0.05045872 0.3069469
## spam 0.01485149 0.1212589
##
## like
## train_y [,1] [,2]
## ham 0.3302752 0.6930204
## spam 0.4108911 0.8946943
##
## likely
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.01980198 0.1396654
##
## limited
## train_y [,1] [,2]
## ham 0.02752294 0.2524978
## spam 0.13366337 0.4075841
##
## line
## train_y [,1] [,2]
## ham 0.1146789 0.4901473
## spam 0.1287129 0.3641487
##
## link
## train_y [,1] [,2]
## ham 0.08715596 0.3136234
## spam 0.26237624 0.6030357
##
## links
## train_y [,1] [,2]
## ham 0.04128440 0.2596385
## spam 0.02475248 0.1849598
##
## linux
## train_y [,1] [,2]
## ham 0.2064220 0.7174937
## spam 0.2029703 0.6013179
##
## list
## train_y [,1] [,2]
## ham 1.0137615 1.100814
## spam 0.8316832 1.262307
##
## listarchive
## train_y [,1] [,2]
## ham 0.59633028 0.5010453
## spam 0.09405941 0.2926366
##
## listed
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 0.321166
##
## listhelp
## train_y [,1] [,2]
## ham 0.61926606 0.4960632
## spam 0.09405941 0.2926366
##
## listid
## train_y [,1] [,2]
## ham 0.6376147 0.4912673
## spam 0.1485149 0.3564931
##
## listmanredhatcom
## train_y [,1] [,2]
## ham 0.1743119 0.5654253
## spam 0.0000000 0.0000000
##
## listmanspamassassintaintorg
## train_y [,1] [,2]
## ham 0.2614679 0.8481379
## spam 0.0000000 0.0000000
##
## listmasterlinuxie
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.09900990 0.2994174
##
## listpost
## train_y [,1] [,2]
## ham 0.61926606 0.4960632
## spam 0.07920792 0.2707340
##
## lists
## train_y [,1] [,2]
## ham 0.09633028 0.4750797
## spam 0.04455446 0.2296332
##
## listsubscribe
## train_y [,1] [,2]
## ham 0.61926606 0.4960632
## spam 0.09405941 0.2926366
##
## listunsubscribe
## train_y [,1] [,2]
## ham 0.67889908 0.4777199
## spam 0.09405941 0.2926366
##
## little
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.12376238 0.5367395
##
## live
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.08910891 0.3758641
##
## living
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07425743 0.4225376
##
## loan
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.3889071
##
## loans
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.3667099
##
## local
## train_y [,1] [,2]
## ham 0.05504587 0.2658735
## spam 0.13366337 0.3823929
##
## localhost
## train_y [,1] [,2]
## ham 3.082569 0.8812270
## spam 2.391089 0.7401222
##
## localhostlocaldomain
## train_y [,1] [,2]
## ham 0.17431193 0.6123768
## spam 0.02970297 0.2425177
##
## log
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.00000000 0.0000000
##
## long
## train_y [,1] [,2]
## ham 0.0733945 0.3243262
## spam 0.1039604 0.3915579
##
## longer
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.05940594 0.2757814
##
## look
## train_y [,1] [,2]
## ham 0.10091743 0.3577955
## spam 0.08415842 0.4323332
##
## looking
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.15346535 0.6550325
##
## lose
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.138613861 0.79212169
##
## loss
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.069306931 0.29107540
##
## lost
## train_y [,1] [,2]
## ham 0.04128440 0.2768190
## spam 0.03465347 0.2087326
##
## lot
## train_y [,1] [,2]
## ham 0.06880734 0.3590046
## spam 0.06930693 0.3234585
##
## love
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.06435644 0.3167258
##
## low
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.14851485 0.4546310
##
## lowest
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.3363003
##
## lugh
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.07425743 0.3299792
##
## lughtuathaorg
## train_y [,1] [,2]
## ham 0.1422018 0.7330001
## spam 0.4108911 1.2476765
##
## made
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.27227723 1.2891139
##
## mail
## train_y [,1] [,2]
## ham 0.2889908 0.7336343
## spam 0.4009901 0.8119744
##
## mailing
## train_y [,1] [,2]
## ham 0.5000000 0.6240200
## spam 0.2079208 0.5147928
##
## mailinglist
## train_y [,1] [,2]
## ham 0.07798165 0.2687598
## spam 0.00000000 0.0000000
##
## mailings
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1782178 0.5258689
##
## mailinsuranceiqcom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3403768
##
## maillocalhost
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.00000000 0.0000000
##
## mailtoexmhusersrequestredhatcomsubjectsubscribe
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## mailtoexmhusersrequestredhatcomsubjectunsubscribe
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## mailtoexmhusersrequestspamassassintaintorgsubjecthelp
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## mailtoexmhusersspamassassintaintorg
## train_y [,1] [,2]
## ham 0.04587156 0.209688
## spam 0.00000000 0.000000
##
## mailtoforkrequestxentcomsubjecthelp
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## mailtoforkrequestxentcomsubjectsubscribe
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## mailtoforkrequestxentcomsubjectunsubscribe
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## mailtoforkspamassassintaintorg
## train_y [,1] [,2]
## ham 0.2522936 0.4353284
## spam 0.0000000 0.0000000
##
## mailtorpmlistrequestfreshrpmsnetsubjectsubscribe
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## mailtorpmlistrequestfreshrpmsnetsubjectunsubscribe
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## mailtorpmzzzlistfreshrpmsnet
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## mailtorpmzzzlistrequestfreshrpmsnetsubjecthelp
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## mailtospamassassintalkexamplesourceforgenet
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## mailtospamassassintalkrequestexamplesourceforgenetsubjecthelp
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## mailtospamassassintalkrequestlistssourceforgenetsubjectsubscribe
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## mailtospamassassintalkrequestlistssourceforgenetsubjectunsubscribe
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## mailtozzzzteanaunsubscribeyahoogroupscom
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## mailwebnotenet
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.48514851 0.5010211
##
## main
## train_y [,1] [,2]
## ham 0.04587156 0.24980439
## spam 0.00990099 0.09925589
##
## maintainer
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.09900990 0.2994174
##
## major
## train_y [,1] [,2]
## ham 0.08715596 0.5402851
## spam 0.07425743 0.3855999
##
## make
## train_y [,1] [,2]
## ham 0.1880734 0.4955516
## spam 0.5940594 2.4051281
##
## makes
## train_y [,1] [,2]
## ham 0.04128440 0.1994051
## spam 0.02475248 0.1849598
##
## making
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.06930693 0.3234585
##
## male
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.044554455 0.36377656
##
## man
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.02970297 0.3584223
##
## management
## train_y [,1] [,2]
## ham 0.06880734 0.5078548
## spam 0.06435644 0.2459965
##
## manager
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.04950495 0.2592080
##
## many
## train_y [,1] [,2]
## ham 0.1880734 0.565070
## spam 0.2722772 1.334623
##
## map
## train_y [,1] [,2]
## ham 0.02293578 0.3386427
## spam 0.04950495 0.7035975
##
## marginright
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.8332915
##
## mark
## train_y [,1] [,2]
## ham 0.087155963 0.35497801
## spam 0.004950495 0.07035975
##
## market
## train_y [,1] [,2]
## ham 0.06422018 0.4558929
## spam 0.20792079 0.9856365
##
## marketing
## train_y [,1] [,2]
## ham 0.06880734 0.6366961
## spam 0.31683168 1.0455372
##
## marriott
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## mason
## train_y [,1] [,2]
## ham 0.05045872 0.2580051
## spam 0.00000000 0.0000000
##
## matter
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.08415842 0.3566658
##
## matthias
## train_y [,1] [,2]
## ham 0.08256881 0.4426239
## spam 0.00000000 0.0000000
##
## maxline
## train_y [,1] [,2]
## ham 0.08715596 1.286842
## spam 0.00000000 0.000000
##
## maxtor
## train_y [,1] [,2]
## ham 0.09174312 1.354571
## spam 0.00000000 0.000000
##
## may
## train_y [,1] [,2]
## ham 0.2018349 0.5130521
## spam 0.4900990 1.0846743
##
## maybe
## train_y [,1] [,2]
## ham 0.059633028 0.27343834
## spam 0.004950495 0.07035975
##
## mean
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.02475248 0.1557559
##
## means
## train_y [,1] [,2]
## ham 0.05504587 0.2479357
## spam 0.03465347 0.2313429
##
## media
## train_y [,1] [,2]
## ham 0.03211009 0.260938
## spam 0.04950495 0.295109
##
## meetings
## train_y [,1] [,2]
## ham 0.06880734 0.8201604
## spam 0.00000000 0.0000000
##
## member
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.10396040 0.3513785
##
## members
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.06435644 0.2835750
##
## membership
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.074257426 0.68344365
##
## men
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.06435644 0.6074306
##
## message
## train_y [,1] [,2]
## ham 0.3944954 0.8534921
## spam 0.3663366 0.7014239
##
## messageid
## train_y [,1] [,2]
## ham 1.0137615 0.11676744
## spam 0.9950495 0.07035975
##
## messages
## train_y [,1] [,2]
## ham 0.13761468 0.52526737
## spam 0.00990099 0.09925589
##
## meta
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.311881188 0.86775138
##
## method
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.10891089 0.6449342
##
## methoddpost
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.1955114
##
## mgrpscdyahoocom
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## microsoft
## train_y [,1] [,2]
## ham 0.2018349 0.5883636
## spam 0.5891089 0.8721679
##
## might
## train_y [,1] [,2]
## ham 0.10550459 0.3874375
## spam 0.02475248 0.1557559
##
## million
## train_y [,1] [,2]
## ham 0.04587156 0.3293709
## spam 0.37623762 1.3409100
##
## millions
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.4114034
##
## mime
## train_y [,1] [,2]
## ham 0.03211009 0.2426356
## spam 0.05940594 0.2369702
##
## mimeole
## train_y [,1] [,2]
## ham 0.07339450 0.2613831
## spam 0.09405941 0.2926366
##
## mimeversion
## train_y [,1] [,2]
## ham 0.6651376 0.4826723
## spam 0.8465347 0.3613310
##
## mind
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.09405941 0.5145295
##
## minutes
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.3667099
##
## mladih
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.9925589
##
## moment
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.05445545 0.2856518
##
## mon
## train_y [,1] [,2]
## ham 1.412844 2.806218
## spam 1.306931 1.950852
##
## monday
## train_y [,1] [,2]
## ham 0.0412844 0.276819
## spam 0.0000000 0.000000
##
## money
## train_y [,1] [,2]
## ham 0.06880734 0.4701597
## spam 0.87128713 3.1184362
##
## mono
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## month
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.19306931 0.6671775
##
## months
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.14356436 0.8190415
##
## more
## train_y [,1] [,2]
## ham 0.4633028 1.086589
## spam 0.8118812 1.932765
##
## mortgage
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1287129 0.6175439
##
## most
## train_y [,1] [,2]
## ham 0.1422018 0.5370672
## spam 0.2376238 0.9158474
##
## move
## train_y [,1] [,2]
## ham 0.03669725 0.2513230
## spam 0.11386139 0.6164462
##
## moved
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.03960396 0.2609128
##
## mozilla
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## msimagelist
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 1.125756
##
## mtagrpscdyahoocom
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## much
## train_y [,1] [,2]
## ham 0.1467890 0.3798056
## spam 0.2722772 0.9142998
##
## multipart
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.074257426 0.41059439
##
## multipartalternative
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.2369702
##
## multipartmixed
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.084158416 0.27831500
##
## murphy
## train_y [,1] [,2]
## ham 0.09174312 0.4511385
## spam 0.00000000 0.0000000
##
## music
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.02475248 0.2101437
##
## must
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.19306931 0.5253300
##
## mutti
## train_y [,1] [,2]
## ham 0.06880734 0.2537088
## spam 0.00000000 0.0000000
##
## mxfreebsdorg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.7275531
##
## mxredhatcom
## train_y [,1] [,2]
## ham 0.08256881 0.2758628
## spam 0.00000000 0.0000000
##
## mxspamassassintaintorg
## train_y [,1] [,2]
## ham 0.1788991 0.5840544
## spam 0.0000000 0.0000000
##
## myself
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.01980198 0.1396654
##
## name
## train_y [,1] [,2]
## ham 0.06422018 0.3401065
## spam 0.54950495 2.1877144
##
## namedcity
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedcontactname
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedemail
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedhdnrecipienttxt
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedhdnsubjecttxt
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedphone
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedsentto
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## namedstate
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## names
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.158415842 0.77565794
##
## national
## train_y [,1] [,2]
## ham 0.05045872 0.3216100
## spam 0.08415842 0.4086704
##
## natural
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.03960396 0.2793308
##
## navy
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## nbsp
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1039604 0.4826174
##
## need
## train_y [,1] [,2]
## ham 0.1467890 0.4668887
## spam 0.3316832 0.8428134
##
## needed
## train_y [,1] [,2]
## ham 0.05045872 0.3745655
## spam 0.07920792 0.3507822
##
## needs
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.01980198 0.1396654
##
## net
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.12376238 0.6383512
##
## network
## train_y [,1] [,2]
## ham 0.27522936 1.1669138
## spam 0.03960396 0.1955114
##
## networks
## train_y [,1] [,2]
## ham 0.1009174 1.175361
## spam 0.0000000 0.000000
##
## never
## train_y [,1] [,2]
## ham 0.07798165 0.3439692
## spam 0.13366337 0.4646246
##
## new
## train_y [,1] [,2]
## ham 0.3440367 0.7716081
## spam 0.7524752 1.8115491
##
## news
## train_y [,1] [,2]
## ham 0.1100917 0.7098965
## spam 0.1534653 0.6625842
##
## newsletter
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.03960396 0.3574590
##
## next
## train_y [,1] [,2]
## ham 0.07798165 0.3939304
## spam 0.18811881 0.7881318
##
## ngrpscdyahoocom
## train_y [,1] [,2]
## ham 0.1788991 0.7120521
## spam 0.0000000 0.0000000
##
## nice
## train_y [,1] [,2]
## ham 0.04587156 0.2676169
## spam 0.01485149 0.1212589
##
## nigeria
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.4274922
##
## nmh
## train_y [,1] [,2]
## ham 0.0733945 0.2945409
## spam 0.0000000 0.0000000
##
## nnfmp
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.08910891 0.3483868
##
## none
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.06435644 0.4241665
##
## normal
## train_y [,1] [,2]
## ham 0.2018349 0.6336176
## spam 0.6485149 1.2258504
##
## north
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.03960396 0.2793308
##
## norton
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.4255578
##
## noshade
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06435644 0.2654515
##
## not
## train_y [,1] [,2]
## ham 0.9311927 1.449564
## spam 1.5990099 3.777730
##
## note
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.15346535 0.6077549
##
## nothing
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.08415842 0.3566658
##
## notice
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.08415842 0.3703522
##
## nov
## train_y [,1] [,2]
## ham 0.09633028 1.0045451
## spam 0.01485149 0.1570158
##
## now
## train_y [,1] [,2]
## ham 0.3027523 0.7498643
## spam 0.5099010 0.9632041
##
## nsegwnnet
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.00000000 0.0000000
##
## number
## train_y [,1] [,2]
## ham 0.1009174 0.4387866
## spam 0.2128713 0.6535267
##
## numbers
## train_y [,1] [,2]
## ham 0.02752294 0.2335349
## spam 0.11881188 0.5949561
##
## obligation
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.3299792
##
## oct
## train_y [,1] [,2]
## ham 2.2798165 3.8604286
## spam 0.1386139 0.9776579
##
## october
## train_y [,1] [,2]
## ham 0.036697248 0.23226417
## spam 0.004950495 0.07035975
##
## off
## train_y [,1] [,2]
## ham 0.09174312 0.3967909
## spam 0.10396040 0.3369222
##
## offer
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.45049505 1.3308527
##
## offering
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.014851485 0.12125885
##
## offers
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.25742574 0.8304343
##
## office
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.09900990 0.4784916
##
## often
## train_y [,1] [,2]
## ham 0.03669725 0.2690350
## spam 0.03465347 0.2519321
##
## oil
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.04950495 0.2777392
##
## old
## train_y [,1] [,2]
## ham 0.1376147 0.3832408
## spam 0.1138614 0.4803667
##
## once
## train_y [,1] [,2]
## ham 0.1100917 0.3920747
## spam 0.1188119 0.4181725
##
## one
## train_y [,1] [,2]
## ham 0.4633028 1.073790
## spam 0.8465347 2.762797
##
## online
## train_y [,1] [,2]
## ham 0.05963303 0.3200286
## spam 0.21782178 0.5390975
##
## only
## train_y [,1] [,2]
## ham 0.3211009 0.7902719
## spam 0.7623762 1.8986694
##
## open
## train_y [,1] [,2]
## ham 0.05963303 0.347637
## spam 0.04455446 0.206836
##
## opportunities
## train_y [,1] [,2]
## ham 0.03211009 0.2609380
## spam 0.06930693 0.4053725
##
## opportunity
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1584158 0.4832039
##
## optin
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.4086704
##
## option
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 1.509900990 18.0169186
##
## order
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.50990099 1.8958979
##
## ordering
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.09405941 0.6035256
##
## orders
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.26237624 1.572611
##
## organization
## train_y [,1] [,2]
## ham 0.123853211 0.61342869
## spam 0.004950495 0.07035975
##
## original
## train_y [,1] [,2]
## ham 0.09174312 0.3602683
## spam 0.08415842 0.3703522
##
## osdn
## train_y [,1] [,2]
## ham 0.05963303 0.25603113
## spam 0.00990099 0.09925589
##
## other
## train_y [,1] [,2]
## ham 0.3027523 0.7374709
## spam 0.2970297 0.7667798
##
## others
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.05445545 0.3338379
##
## our
## train_y [,1] [,2]
## ham 0.1743119 0.8352161
## spam 1.4900990 2.4802505
##
## out
## train_y [,1] [,2]
## ham 0.3669725 0.7456238
## spam 0.8663366 2.5268617
##
## outlook
## train_y [,1] [,2]
## ham 0.1192661 0.3387364
## spam 0.2227723 0.4171405
##
## over
## train_y [,1] [,2]
## ham 0.1926606 0.749526
## spam 0.4207921 1.025138
##
## own
## train_y [,1] [,2]
## ham 0.1651376 0.4802795
## spam 0.2970297 0.8527924
##
## owners
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.22747795
##
## package
## train_y [,1] [,2]
## ham 0.10550459 0.6388837
## spam 0.07920792 0.3780855
##
## packages
## train_y [,1] [,2]
## ham 0.09633028 0.4750797
## spam 0.00000000 0.0000000
##
## paddingbottom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.1570158
##
## paddingleft
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.1570158
##
## paddingright
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## paddingtop
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.1570158
##
## page
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.08415842 0.4436916
##
## paid
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.12376238 0.4107143
##
## par
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.6332378
##
## part
## train_y [,1] [,2]
## ham 0.04128440 0.2213120
## spam 0.06930693 0.4174651
##
## partner
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.089108911 0.41367206
##
## partners
## train_y [,1] [,2]
## ham 0.04128440 0.5457352
## spam 0.05445545 0.2856518
##
## party
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.10891089 0.3569074
##
## pass
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.019801980 0.13966542
##
## passed
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.30060773
##
## past
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.08415842 0.2956510
##
## paste
## train_y [,1] [,2]
## ham 0.03669725 0.4277603
## spam 0.05940594 0.2932671
##
## patch
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.01485149 0.2110793
##
## paul
## train_y [,1] [,2]
## ham 0.073394495 0.45449972
## spam 0.004950495 0.07035975
##
## pay
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.13366337 0.4196130
##
## paying
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.03465347 0.1833549
##
## pdt
## train_y [,1] [,2]
## ham 0.59174312 0.9806072
## spam 0.08910891 0.6245239
##
## people
## train_y [,1] [,2]
## ham 0.4724771 1.793916
## spam 0.6930693 3.619303
##
## peoples
## train_y [,1] [,2]
## ham 0.03211009 0.2228350
## spam 0.01485149 0.1212589
##
## per
## train_y [,1] [,2]
## ham 0.07798165 0.5747142
## spam 0.25742574 0.9267014
##
## perfect
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.05445545 0.2856518
##
## performance
## train_y [,1] [,2]
## ham 0.05045872 0.4916114
## spam 0.04455446 0.2068360
##
## perl
## train_y [,1] [,2]
## ham 0.2293578 1.324237
## spam 0.0000000 0.000000
##
## person
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.12376238 0.6063755
##
## personal
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.14356436 0.8370662
##
## pfont
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.4356436 1.50565
##
## pgp
## train_y [,1] [,2]
## ham 0.1788991 0.6856761
## spam 0.0000000 0.0000000
##
## phobos
## train_y [,1] [,2]
## ham 0.2110092 0.4200811
## spam 0.2079208 0.4068281
##
## phoboslabsnetnoteinccom
## train_y [,1] [,2]
## ham 0.2247706 0.516931
## spam 0.0000000 0.000000
##
## phoboslabsspamassassintaintorg
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.4356436 0.4970729
##
## phone
## train_y [,1] [,2]
## ham 0.1009174 0.3945470
## spam 0.1633663 0.5353611
##
## pickup
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.02970297 0.1701884
##
## place
## train_y [,1] [,2]
## ham 0.07798165 0.3011063
## spam 0.08415842 0.3424328
##
## plan
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.09405941 0.4416887
##
## plans
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.069306931 0.40537252
##
## platform
## train_y [,1] [,2]
## ham 0.05504587 0.3800282
## spam 0.00000000 0.0000000
##
## play
## train_y [,1] [,2]
## ham 0.03211009 0.2426356
## spam 0.01980198 0.1716295
##
## please
## train_y [,1] [,2]
## ham 0.1009174 0.3446752
## spam 0.8960396 1.2069798
##
## plus
## train_y [,1] [,2]
## ham 0.03669725 0.3573225
## spam 0.21287129 1.3491227
##
## pnbspp
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1089109 0.4866576
##
## point
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.04455446 0.2068360
##
## poker
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.4221585
##
## policy
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.039603960 0.32844552
##
## political
## train_y [,1] [,2]
## ham 0.05045872 0.4098159
## spam 0.02475248 0.1849598
##
## pop
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.23267327 0.4235855
##
## popular
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.05445545 0.3025677
##
## port
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.004950495 0.07035975
##
## position
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.05445545 0.227478
##
## possible
## train_y [,1] [,2]
## ham 0.08256881 0.3495472
## spam 0.06435644 0.2459965
##
## post
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.04950495 0.4082785
##
## postfix
## train_y [,1] [,2]
## ham 1.954128 1.276410
## spam 1.163366 0.703965
##
## potential
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.05940594 0.3945344
##
## pour
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1287129 1.829354
##
## power
## train_y [,1] [,2]
## ham 0.05963303 0.4197035
## spam 0.08415842 0.4761439
##
## precedence
## train_y [,1] [,2]
## ham 0.7660550 0.6688211
## spam 0.1534653 0.3613310
##
## predsednika
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.08910891 0.893303
##
## preferences
## train_y [,1] [,2]
## ham 0.05963303 0.3341181
## spam 0.00000000 0.0000000
##
## premium
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.019801980 0.17162948
##
## premiums
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## present
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.04950495 0.2392458
##
## president
## train_y [,1] [,2]
## ham 0.05963303 0.5688730
## spam 0.08910891 0.5918015
##
## pretty
## train_y [,1] [,2]
## ham 0.07339450 0.2945409
## spam 0.01485149 0.1212589
##
## price
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.39108911 1.449144
##
## prices
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1138614 0.3887804
##
## pricing
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.06435644 0.5379312
##
## private
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.17821782 0.4762214
##
## probably
## train_y [,1] [,2]
## ham 0.10550459 0.3991546
## spam 0.02970297 0.1972672
##
## problem
## train_y [,1] [,2]
## ham 0.13761468 0.4065793
## spam 0.03465347 0.2313429
##
## problems
## train_y [,1] [,2]
## ham 0.10091743 0.4387866
## spam 0.05940594 0.3097674
##
## process
## train_y [,1] [,2]
## ham 0.11467890 0.4509745
## spam 0.06930693 0.3800347
##
## processing
## train_y [,1] [,2]
## ham 0.08256881 1.0216860
## spam 0.02475248 0.1849598
##
## produced
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0990099 0.2994174
##
## product
## train_y [,1] [,2]
## ham 0.05963303 0.3853586
## spam 0.18316832 1.3717527
##
## products
## train_y [,1] [,2]
## ham 0.05504587 0.5316674
## spam 0.08910891 0.3185480
##
## professional
## train_y [,1] [,2]
## ham 0.02752294 0.2335349
## spam 0.13861386 0.4467618
##
## professionals
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.08415842 0.4655778
##
## profiled
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.6230234
##
## profitable
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.069306931 0.29107540
##
## program
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.32178218 1.6633063
##
## programs
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.14356436 1.2868192
##
## project
## train_y [,1] [,2]
## ham 0.04587156 0.2843157
## spam 0.01485149 0.1212589
##
## promotion
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.049504950 0.25920804
##
## proposal
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.2932671
##
## protect
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.138613861 0.48928220
##
## proven
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.03465347 0.1833549
##
## provide
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.14851485 0.4546310
##
## ptsize
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1089109 1.296002
##
## public
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.13366337 0.4196130
##
## publishing
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.05445545 0.4370923
##
## pudgeperlorg
## train_y [,1] [,2]
## ham 0.08256881 0.4919338
## spam 0.00000000 0.0000000
##
## purchase
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1237624 0.572617
##
## put
## train_y [,1] [,2]
## ham 0.09174312 0.3602683
## spam 0.13366337 0.4752119
##
## python
## train_y [,1] [,2]
## ham 0.04587156 0.4873576
## spam 0.00000000 0.0000000
##
## qmail
## train_y [,1] [,2]
## ham 0.22477064 0.4603457
## spam 0.05940594 0.2369702
##
## qmqp
## train_y [,1] [,2]
## ham 0.07798165 0.2687598
## spam 0.06435644 0.3006077
##
## qualify
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.2609128
##
## quality
## train_y [,1] [,2]
## ham 0.03669725 0.3162746
## spam 0.06930693 0.2734495
##
## question
## train_y [,1] [,2]
## ham 0.05045872 0.2580051
## spam 0.02970297 0.1701884
##
## questions
## train_y [,1] [,2]
## ham 0.02752294 0.2128895
## spam 0.08415842 0.2956510
##
## quick
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.11386139 0.5104928
##
## quickly
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.02970297 0.2977677
##
## quite
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.01485149 0.1212589
##
## quote
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.059405941 0.34037679
##
## quotedprintable
## train_y [,1] [,2]
## ham 0.05045872 0.2193933
## spam 0.25247525 0.4893074
##
## rate
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.074257426 0.31454096
##
## rates
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.118811881 0.50445042
##
## rather
## train_y [,1] [,2]
## ham 0.07798165 0.31604054
## spam 0.00990099 0.09925589
##
## razor
## train_y [,1] [,2]
## ham 0.06880734 0.3838196
## spam 0.00000000 0.0000000
##
## razorusers
## train_y [,1] [,2]
## ham 0.1009174 0.5070008
## spam 0.0000000 0.0000000
##
## razorusersadminexamplesourceforgenet
## train_y [,1] [,2]
## ham 0.1238532 0.5982153
## spam 0.0000000 0.0000000
##
## razorusersexamplesourceforgenet
## train_y [,1] [,2]
## ham 0.1284404 0.6236642
## spam 0.0000000 0.0000000
##
## razoruserslistssourceforgenet
## train_y [,1] [,2]
## ham 0.1192661 0.5952929
## spam 0.0000000 0.0000000
##
## rdf
## train_y [,1] [,2]
## ham 0.09633028 0.8060173
## spam 0.00000000 0.0000000
##
## reach
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.22747795
##
## read
## train_y [,1] [,2]
## ham 0.06880734 0.2877522
## spam 0.15346535 0.7472756
##
## reading
## train_y [,1] [,2]
## ham 0.05045872 0.3866729
## spam 0.04950495 0.2951090
##
## ready
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.049504950 0.2174588
##
## real
## train_y [,1] [,2]
## ham 0.07798165 0.2853917
## spam 0.07920792 0.3780855
##
## really
## train_y [,1] [,2]
## ham 0.15596330 0.4833507
## spam 0.07920792 0.4717527
##
## realtime
## train_y [,1] [,2]
## ham 0.04128440 0.25963853
## spam 0.00990099 0.09925589
##
## reason
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.03960396 0.2194880
##
## receive
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.49504950 1.125231
##
## received
## train_y [,1] [,2]
## ham 6.059633 2.532939
## spam 5.019802 1.897787
##
## receiving
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.12376238 0.4226541
##
## recently
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.05940594 0.2571092
##
## recommendation
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.079207921 0.50239562
##
## red
## train_y [,1] [,2]
## ham 0.07339450 0.3515973
## spam 0.03465347 0.3213577
##
## references
## train_y [,1] [,2]
## ham 0.34862385 0.4776314
## spam 0.01485149 0.1212589
##
## regards
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.06930693 0.2546063
##
## register
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.5272954
##
## registered
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.128712871 0.40305713
##
## regular
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.03465347 0.2087326
##
## relaydubtnwcgroupcom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3945344
##
## release
## train_y [,1] [,2]
## ham 0.06422018 0.5643026
## spam 0.03960396 0.1955114
##
## relevant
## train_y [,1] [,2]
## ham 0.02293578 0.2023516
## spam 0.02970297 0.1972672
##
## remember
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.09405941 0.6511103
##
## removal
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1435644 0.4616466
##
## remove
## train_y [,1] [,2]
## ham 0.02752294 0.1900141
## spam 0.26732673 0.5443713
##
## removed
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.326732673 0.61666590
##
## repeat
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.000000000 0.00000000
##
## reply
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.18811881 0.4723788
##
## replyto
## train_y [,1] [,2]
## ham 0.2981651 0.4684481
## spam 0.4900990 0.5011440
##
## report
## train_y [,1] [,2]
## ham 0.09633028 0.4014781
## spam 0.67326733 4.7276237
##
## reported
## train_y [,1] [,2]
## ham 0.02752294 0.28668503
## spam 0.00990099 0.09925589
##
## reports
## train_y [,1] [,2]
## ham 0.05963303 0.347637
## spam 0.17821782 1.272722
##
## republic
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.45929315
##
## request
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07920792 0.3052825
##
## requests
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.21745876
##
## required
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.12871287 0.4152172
##
## research
## train_y [,1] [,2]
## ham 0.03669725 0.2322642
## spam 0.07920792 0.3507822
##
## resources
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.059405941 0.3945344
##
## respect
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.03465347 0.1833549
##
## response
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.148514851 0.83321757
##
## rest
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.034653465 0.18335494
##
## result
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.03465347 0.1833549
##
## results
## train_y [,1] [,2]
## ham 0.05504587 0.3417187
## spam 0.06435644 0.3873842
##
## retail
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.074257426 0.26284076
##
## return
## train_y [,1] [,2]
## ham 0.04128440 0.2596385
## spam 0.05940594 0.3546923
##
## returnpath
## train_y [,1] [,2]
## ham 1.000000 0.0000000
## spam 1.039604 0.2609128
##
## revenues
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.064356436 0.37432105
##
## revision
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.044554455 0.2068360
##
## richardwcom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.4189962
##
## right
## train_y [,1] [,2]
## ham 0.1743119 0.6123768
## spam 0.1732673 0.5595647
##
## rights
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.04950495 0.2777392
##
## risk
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.173267327 0.62666949
##
## road
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.02475248 0.1557559
##
## robert
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.01485149 0.1212589
##
## rohit
## train_y [,1] [,2]
## ham 0.293578 0.5728537
## spam 0.000000 0.0000000
##
## roman
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.15841584 0.7561709
##
## root
## train_y [,1] [,2]
## ham 0.10091743 0.8793059
## spam 0.03465347 0.4276363
##
## rootlocalhost
## train_y [,1] [,2]
## ham 0.04587156 0.2498044
## spam 0.08415842 0.3424328
##
## rootlughtuathaorg
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.07425743 0.2628408
##
## rowspand
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02970297 0.1701884
##
## rpm
## train_y [,1] [,2]
## ham 0.233945 0.8288263
## spam 0.000000 0.0000000
##
## rpmlist
## train_y [,1] [,2]
## ham 0.1100917 0.3418424
## spam 0.0000000 0.0000000
##
## rpmlistadminfreshrpmsnet
## train_y [,1] [,2]
## ham 0.1009174 0.3019126
## spam 0.0000000 0.0000000
##
## rpmlistfreshrpmsnet
## train_y [,1] [,2]
## ham 0.3211009 0.9965908
## spam 0.0000000 0.0000000
##
## rpmzzzlistadminfreshrpmsnet
## train_y [,1] [,2]
## ham 0.3027523 0.9057379
## spam 0.0000000 0.0000000
##
## rpmzzzlistfreshrpmsnet
## train_y [,1] [,2]
## ham 0.4036697 1.20765
## spam 0.0000000 0.00000
##
## rss
## train_y [,1] [,2]
## ham 0.1422018 1.704686
## spam 0.0000000 0.000000
##
## rssfeedsjmasonorg
## train_y [,1] [,2]
## ham 0.2431193 0.4299538
## spam 0.0000000 0.0000000
##
## rssfeedsspamassassintaintorg
## train_y [,1] [,2]
## ham 0.5137615 0.8914823
## spam 0.0000000 0.0000000
##
## rules
## train_y [,1] [,2]
## ham 0.050458716 0.46263598
## spam 0.004950495 0.07035975
##
## run
## train_y [,1] [,2]
## ham 0.09633028 0.3392664
## spam 0.07920792 0.3052825
##
## running
## train_y [,1] [,2]
## ham 0.077981651 0.26875984
## spam 0.004950495 0.07035975
##
## safetyolnewnamednscom
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1188119 0.6807536
##
## said
## train_y [,1] [,2]
## ham 0.32568807 1.6318774
## spam 0.07425743 0.3724742
##
## sales
## train_y [,1] [,2]
## ham 0.04587156 0.3000867
## spam 0.17326733 0.5770729
##
## same
## train_y [,1] [,2]
## ham 0.1788991 0.4504117
## spam 0.1435644 0.4616466
##
## san
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.01485149 0.1570158
##
## sans
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02475248 0.2531025
##
## sansserif
## train_y [,1] [,2]
## ham 0.000000 0.000000
## spam 1.277228 4.152848
##
## sat
## train_y [,1] [,2]
## ham 0.5504587 1.746550
## spam 0.5346535 1.479978
##
## satalk
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.00000000 0.0000000
##
## save
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.222772277 0.93833569
##
## savings
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.47923740
##
## savoir
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.7739573
##
## say
## train_y [,1] [,2]
## ham 0.1376147 0.4496366
## spam 0.1039604 0.5316679
##
## saying
## train_y [,1] [,2]
## ham 0.04128440 0.27681900
## spam 0.00990099 0.09925589
##
## says
## train_y [,1] [,2]
## ham 0.05963303 0.3476370
## spam 0.01980198 0.1716295
##
## school
## train_y [,1] [,2]
## ham 0.06422018 0.3533964
## spam 0.05445545 0.3889071
##
## science
## train_y [,1] [,2]
## ham 0.07798165 0.9688092
## spam 0.01485149 0.1212589
##
## script
## train_y [,1] [,2]
## ham 0.06880734 0.3716193
## spam 0.01980198 0.2814390
##
## scroll
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## search
## train_y [,1] [,2]
## ham 0.0412844 0.2929938
## spam 0.1534653 0.7270282
##
## second
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.04455446 0.2068360
##
## secret
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.3363003
##
## secure
## train_y [,1] [,2]
## ham 0.07798165 0.6428419
## spam 0.06930693 0.2734495
##
## securities
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.4672676
##
## security
## train_y [,1] [,2]
## ham 0.1009174 0.5830945
## spam 0.1336634 0.5959694
##
## see
## train_y [,1] [,2]
## ham 0.1743119 0.5971366
## spam 0.2524752 0.8349450
##
## seed
## train_y [,1] [,2]
## ham 0.04587156 0.6772855
## spam 0.02475248 0.3517988
##
## seem
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.03465347 0.1833549
##
## seems
## train_y [,1] [,2]
## ham 0.09174312 0.3195990
## spam 0.02970297 0.2977677
##
## seen
## train_y [,1] [,2]
## ham 0.06880734 0.2877522
## spam 0.09405941 0.4528125
##
## select
## train_y [,1] [,2]
## ham 0.03211009 0.2780381
## spam 0.01980198 0.1716295
##
## self
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.044554455 0.22963317
##
## sell
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.10396040 0.4279817
##
## selling
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.03465347 0.2519321
##
## send
## train_y [,1] [,2]
## ham 0.1192661 0.3649322
## spam 0.5495050 2.0418567
##
## sender
## train_y [,1] [,2]
## ham 0.7110092 0.6609363
## spam 0.2871287 0.4749786
##
## sending
## train_y [,1] [,2]
## ham 0.03211009 0.2426356
## spam 0.16831683 0.9417810
##
## senior
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.02475248 0.1849598
##
## sent
## train_y [,1] [,2]
## ham 0.09633028 0.3392664
## spam 0.23762376 0.5841780
##
## senttozzzzspamassassintaintorgreturnsgroupsyahoocom
## train_y [,1] [,2]
## ham 0.09174312 0.4711254
## spam 0.00000000 0.0000000
##
## sep
## train_y [,1] [,2]
## ham 4.018349 4.653885
## spam 4.079208 3.584608
##
## september
## train_y [,1] [,2]
## ham 0.09174312 0.4511385
## spam 0.01980198 0.1716295
##
## sequence
## train_y [,1] [,2]
## ham 0.05045872 0.3490933
## spam 0.01485149 0.1212589
##
## sequences
## train_y [,1] [,2]
## ham 0.09633028 0.6032824
## spam 0.00000000 0.0000000
##
## server
## train_y [,1] [,2]
## ham 0.16513761 0.8033772
## spam 0.03960396 0.2194880
##
## service
## train_y [,1] [,2]
## ham 0.09633028 0.3898308
## spam 0.37623762 0.6594357
##
## services
## train_y [,1] [,2]
## ham 0.1055046 0.9756798
## spam 0.2227723 0.6873963
##
## set
## train_y [,1] [,2]
## ham 0.12844037 0.4624303
## spam 0.04950495 0.2392458
##
## settlement
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.00000000 0.0000000
##
## several
## train_y [,1] [,2]
## ham 0.06422018 0.3662043
## spam 0.10891089 0.4208731
##
## sex
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.01980198 0.1716295
##
## sfnet
## train_y [,1] [,2]
## ham 0.26605505 0.7007856
## spam 0.03465347 0.2519321
##
## shangrila
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1435644 2.040433
##
## share
## train_y [,1] [,2]
## ham 0.04587156 0.2843157
## spam 0.11386139 0.3887804
##
## she
## train_y [,1] [,2]
## ham 0.04128440 0.3632194
## spam 0.08415842 0.6524706
##
## shipping
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1881188 1.112685
##
## short
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.05445545 0.3025677
##
## should
## train_y [,1] [,2]
## ham 0.1834862 0.5018884
## spam 0.2673267 0.9075810
##
## show
## train_y [,1] [,2]
## ham 0.09633028 0.4652786
## spam 0.05940594 0.3097674
##
## shows
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.01485149 0.1212589
##
## sign
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.03465347 0.2087326
##
## signature
## train_y [,1] [,2]
## ham 0.14678899 0.5315084
## spam 0.03465347 0.2087326
##
## signed
## train_y [,1] [,2]
## ham 0.04128440 0.1994051
## spam 0.01980198 0.1716295
##
## similar
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.05445545 0.2274780
##
## simple
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.17326733 0.7561872
##
## simply
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.15346535 0.5098169
##
## since
## train_y [,1] [,2]
## ham 0.10550459 0.4532189
## spam 0.07920792 0.3780855
##
## sincerely
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.2369702
##
## single
## train_y [,1] [,2]
## ham 0.036697248 0.23226417
## spam 0.004950495 0.07035975
##
## singledrop
## train_y [,1] [,2]
## ham 0.9816514 0.1345175
## spam 0.9306931 0.2546063
##
## sir
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.3076933
##
## site
## train_y [,1] [,2]
## ham 0.1146789 0.5175849
## spam 0.1831683 0.7924481
##
## sites
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.19801980 0.7793326
##
## six
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.089108911 0.36238595
##
## size
## train_y [,1] [,2]
## ham 0.0412844 0.2596385
## spam 1.3217822 3.8711101
##
## sizea
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.6978508
##
## sizebfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.5481586
##
## sized
## train_y [,1] [,2]
## ham 0.000000 0.000000
## spam 1.024752 4.005516
##
## sizedb
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.3353101
##
## sizedfontbitd
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## sizedtd
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1584158 0.8607838
##
## sizefont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.3528823
##
## small
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.05940594 0.4532203
##
## smoking
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.9893278
##
## smtp
## train_y [,1] [,2]
## ham 0.6651376 0.9943505
## spam 0.9653465 0.8602829
##
## smtpeasydnscom
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1336634 0.6205081
##
## smtpsvc
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.12376238 0.3301284
##
## social
## train_y [,1] [,2]
## ham 0.05504587 0.3919669
## spam 0.05940594 0.3403768
##
## socialadminlinuxie
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.6230234
##
## sociallinuxie
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1336634 0.8446233
##
## software
## train_y [,1] [,2]
## ham 0.1513761 0.7498221
## spam 0.2821782 0.9167210
##
## sold
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.06435644 0.3167258
##
## solution
## train_y [,1] [,2]
## ham 0.05504587 0.3549482
## spam 0.01980198 0.1396654
##
## solutions
## train_y [,1] [,2]
## ham 0.05963303 0.4514430
## spam 0.01980198 0.2221647
##
## some
## train_y [,1] [,2]
## ham 0.3256881 0.7115769
## spam 0.1980198 0.6985387
##
## someone
## train_y [,1] [,2]
## ham 0.11009174 0.4571894
## spam 0.05445545 0.2483877
##
## something
## train_y [,1] [,2]
## ham 0.16055046 0.4573049
## spam 0.06435644 0.3607853
##
## son
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3403768
##
## soon
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.05445545 0.2483877
##
## sound
## train_y [,1] [,2]
## ham 0.018348624 0.16526205
## spam 0.004950495 0.07035975
##
## source
## train_y [,1] [,2]
## ham 0.08715596 0.5229482
## spam 0.02475248 0.2101437
##
## south
## train_y [,1] [,2]
## ham 0.02752294 0.2128895
## spam 0.10396040 0.7623505
##
## space
## train_y [,1] [,2]
## ham 0.04128440 0.24123760
## spam 0.00990099 0.09925589
##
## spam
## train_y [,1] [,2]
## ham 0.19724771 0.7989577
## spam 0.05940594 0.2757814
##
## spamassassin
## train_y [,1] [,2]
## ham 0.1284404 0.4420505
## spam 0.0000000 0.0000000
##
## spamassassintaintorg
## train_y [,1] [,2]
## ham 0.077981651 0.35711538
## spam 0.004950495 0.07035975
##
## spamassassintalk
## train_y [,1] [,2]
## ham 0.08715596 0.3136234
## spam 0.00000000 0.0000000
##
## spamassassintalkadminexamplesourceforgenet
## train_y [,1] [,2]
## ham 0.233945 0.8062795
## spam 0.000000 0.0000000
##
## spamassassintalkadminlistssourceforgenet
## train_y [,1] [,2]
## ham 0.07798165 0.2687598
## spam 0.00000000 0.0000000
##
## spamassassintalkexamplesourceforgenet
## train_y [,1] [,2]
## ham 0.2431193 0.8261695
## spam 0.0000000 0.0000000
##
## spamassassintalklistssourceforgenet
## train_y [,1] [,2]
## ham 0.2155963 0.758567
## spam 0.0000000 0.000000
##
## spambayes
## train_y [,1] [,2]
## ham 0.03669725 0.269035
## spam 0.00000000 0.000000
##
## special
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.33663366 0.9648265
##
## sponsored
## train_y [,1] [,2]
## ham 0.13302752 0.3664640
## spam 0.01485149 0.1212589
##
## srcdhttpiiqusimagesamfingif
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## srcdhttpiiqusimagesvbilsagif
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## srchttpaeakamainetfdimagescolumbiahousecomchimagesdemoemailcleargif
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.09405941 1.336835
##
## srchttpefriendfindercombannersaffadimagesgif
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1435644 2.040433
##
## srchttpwwwprizeinthebagnetimagesrvmovieoceanscjpg
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## srchttpwwwsalealertscomdotgif
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.6983271
##
## standard
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.07425743 0.3982932
##
## start
## train_y [,1] [,2]
## ham 0.05045872 0.2394786
## spam 0.19306931 0.8738042
##
## started
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.07425743 0.5080748
##
## state
## train_y [,1] [,2]
## ham 0.06880734 0.4071249
## spam 0.16336634 0.4763507
##
## statements
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.09900990 0.6230234
##
## states
## train_y [,1] [,2]
## ham 0.05045872 0.2915472
## spam 0.16336634 0.5353611
##
## step
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.074257426 0.98719699
##
## still
## train_y [,1] [,2]
## ham 0.15596330 0.4222894
## spam 0.08415842 0.3424328
##
## stock
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.079207921 0.54055745
##
## stop
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.06930693 0.3800347
##
## storage
## train_y [,1] [,2]
## ham 0.08715596 1.023557
## spam 0.00000000 0.000000
##
## store
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04455446 0.3201291
##
## stories
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.02475248 0.2897609
##
## story
## train_y [,1] [,2]
## ham 0.08256881 0.4320872
## spam 0.01980198 0.1396654
##
## street
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.02970297 0.2210534
##
## strong
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.03960396 0.1955114
##
## stuff
## train_y [,1] [,2]
## ham 0.07339450 0.3515973
## spam 0.02475248 0.2326169
##
## style
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.099009901 0.43491751
##
## stylebackgroundcolor
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.0990099 1.205336
##
## stylebordercollapse
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.5257986
##
## styleborderright
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## stylecolor
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## styledcolor
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.3353101
##
## stylefontfamily
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.6947912
##
## stylefontsize
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3168317 2.740631
##
## styleheight
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## stylemargin
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## stylemarginleft
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.8222229
##
## styletextalign
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01980198 0.2221647
##
## subject
## train_y [,1] [,2]
## ham 1.155963 0.4112320
## spam 1.163366 0.4209023
##
## submit
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.04950495 0.3562166
##
## subscribed
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.06435644 0.2459965
##
## subscription
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.00990099 0.1407195
##
## success
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.089108911 0.50072604
##
## successful
## train_y [,1] [,2]
## ham 0.009174312 0.1354571
## spam 0.054455446 0.3185867
##
## such
## train_y [,1] [,2]
## ham 0.1513761 0.6720372
## spam 0.1386139 0.4685046
##
## suite
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.05445545 0.2483877
##
## sum
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.08415842 0.3963095
##
## sun
## train_y [,1] [,2]
## ham 0.3853211 1.458541
## spam 0.5049505 1.463478
##
## super
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.059405941 0.46406772
##
## supplied
## train_y [,1] [,2]
## ham 0.082568807 0.27586280
## spam 0.004950495 0.07035975
##
## support
## train_y [,1] [,2]
## ham 0.07798165 0.3697945
## spam 0.11386139 0.4013732
##
## sur
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 1.125756
##
## sure
## train_y [,1] [,2]
## ham 0.1238532 0.3938231
## spam 0.0990099 0.5184161
##
## sweet
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.103960396 1.47755484
##
## system
## train_y [,1] [,2]
## ham 0.2018349 0.6263024
## spam 0.1039604 0.4506316
##
## systems
## train_y [,1] [,2]
## ham 0.07339450 0.4119510
## spam 0.02475248 0.2326169
##
## systemworks
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1138614 0.5480912
##
## table
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 1.87623762 4.3901212
##
## tahoma
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2029703 1.661202
##
## take
## train_y [,1] [,2]
## ham 0.1055046 0.3628700
## spam 0.2178218 0.5572491
##
## taken
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.06435644 0.4000209
##
## takes
## train_y [,1] [,2]
## ham 0.04128440 0.1994051
## spam 0.01980198 0.1396654
##
## talk
## train_y [,1] [,2]
## ham 0.10550459 0.3365137
## spam 0.01980198 0.2814390
##
## targetblankimg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.02475248 0.3517988
##
## tax
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.064356436 0.49956880
##
## tbody
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.2673267 1.158066
##
## tda
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.0990099 0.5554784
##
## tdbfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.5871219
##
## tdfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.2296332
##
## tdibfont
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## tdimg
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1386139 1.167808
##
## tdinput
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07920792 0.7149181
##
## tdtr
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05940594 0.368452
##
## tdtrtable
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.04455446 0.250363
##
## teach
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.04950495 0.3562166
##
## team
## train_y [,1] [,2]
## ham 0.04128440 0.4326983
## spam 0.03960396 0.1955114
##
## technologies
## train_y [,1] [,2]
## ham 0.064220183 0.62623488
## spam 0.004950495 0.07035975
##
## technology
## train_y [,1] [,2]
## ham 0.31192661 2.8225466
## spam 0.08415842 0.4547664
##
## tél
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.6332378
##
## tell
## train_y [,1] [,2]
## ham 0.07798165 0.3160405
## spam 0.06435644 0.3320625
##
## term
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.113861386 0.79919483
##
## terms
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.07425743 0.3447267
##
## test
## train_y [,1] [,2]
## ham 0.06422018 0.3118321
## spam 0.01485149 0.1212589
##
## text
## train_y [,1] [,2]
## ham 0.05504587 0.4148147
## spam 0.09405941 0.5047677
##
## textd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.2628408
##
## textdecoration
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.3831329
##
## texthtml
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.4851485 0.5484281
##
## textplain
## train_y [,1] [,2]
## ham 0.9220183 0.2853917
## spam 0.4950495 0.5012177
##
## than
## train_y [,1] [,2]
## ham 0.2568807 0.7042160
## spam 0.2970297 0.9929558
##
## thank
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.089108911 0.30252700
##
## thanks
## train_y [,1] [,2]
## ham 0.10550459 0.3365137
## spam 0.05445545 0.3185867
##
## that
## train_y [,1] [,2]
## ham 2.114679 3.685901
## spam 2.207921 6.366606
##
## thats
## train_y [,1] [,2]
## ham 0.1284404 0.4095837
## spam 0.1237624 0.4881987
##
## the
## train_y [,1] [,2]
## ham 8.284404 13.20598
## spam 9.925743 21.30142
##
## their
## train_y [,1] [,2]
## ham 0.4220183 1.092864
## spam 0.3168317 1.101159
##
## them
## train_y [,1] [,2]
## ham 0.2844037 0.798521
## spam 0.2524752 1.150651
##
## then
## train_y [,1] [,2]
## ham 0.2155963 0.5296159
## spam 0.2326733 0.6311340
##
## there
## train_y [,1] [,2]
## ham 0.4495413 0.8797385
## spam 0.3663366 1.2984704
##
## theres
## train_y [,1] [,2]
## ham 0.08256881 0.2920905
## spam 0.02475248 0.1557559
##
## these
## train_y [,1] [,2]
## ham 0.1972477 0.745241
## spam 0.3564356 1.051738
##
## they
## train_y [,1] [,2]
## ham 0.6238532 1.489063
## spam 0.3613861 1.565516
##
## thing
## train_y [,1] [,2]
## ham 0.12385321 0.5064377
## spam 0.07425743 0.4105944
##
## things
## train_y [,1] [,2]
## ham 0.15137615 0.4503178
## spam 0.05445545 0.3338379
##
## think
## train_y [,1] [,2]
## ham 0.2798165 0.7309207
## spam 0.1039604 0.4161947
##
## thinking
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.06930693 0.5419226
##
## third
## train_y [,1] [,2]
## ham 0.04128440 0.3503023
## spam 0.05940594 0.2571092
##
## this
## train_y [,1] [,2]
## ham 1.449541 2.271792
## spam 3.618812 6.414244
##
## those
## train_y [,1] [,2]
## ham 0.1743119 0.6272467
## spam 0.2326733 1.0974062
##
## though
## train_y [,1] [,2]
## ham 0.06880734 0.3033446
## spam 0.05445545 0.2676690
##
## thought
## train_y [,1] [,2]
## ham 0.07798165 0.3011063
## spam 0.02970297 0.2210534
##
## thousands
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.168316832 0.65516405
##
## threadindex
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.21745876
##
## three
## train_y [,1] [,2]
## ham 0.05963303 0.3732086
## spam 0.02970297 0.1701884
##
## through
## train_y [,1] [,2]
## ham 0.1651376 0.4991008
## spam 0.2722772 0.6911838
##
## thu
## train_y [,1] [,2]
## ham 1.610092 2.745636
## spam 1.094059 2.157035
##
## thus
## train_y [,1] [,2]
## ham 0.02752294 0.2866850
## spam 0.01980198 0.1396654
##
## tim
## train_y [,1] [,2]
## ham 0.064220183 0.45589289
## spam 0.004950495 0.07035975
##
## time
## train_y [,1] [,2]
## ham 0.2614679 0.7861014
## spam 0.6386139 1.3578935
##
## times
## train_y [,1] [,2]
## ham 0.1100917 0.5403438
## spam 0.1881188 1.0193440
##
## tipsmtpadmanmailcom
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.4880978
##
## tired
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.02475248 0.1849598
##
## title
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.05445545 0.2676690
##
## tlsvdescbcsha
## train_y [,1] [,2]
## ham 0.045871560 0.20968799
## spam 0.004950495 0.07035975
##
## tobacco
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06435644 0.4122705
##
## today
## train_y [,1] [,2]
## ham 0.08256881 0.3624911
## spam 0.25247525 0.6469360
##
## told
## train_y [,1] [,2]
## ham 0.03669725 0.2114948
## spam 0.03465347 0.2313429
##
## tollfree
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.24599646
##
## tom
## train_y [,1] [,2]
## ham 0.0733945 0.3382367
## spam 0.0000000 0.0000000
##
## tony
## train_y [,1] [,2]
## ham 0.06422018 0.6689317
## spam 0.00000000 0.0000000
##
## too
## train_y [,1] [,2]
## ham 0.11926606 0.3649322
## spam 0.09405941 0.3940659
##
## took
## train_y [,1] [,2]
## ham 0.02293578 0.1500433
## spam 0.05940594 0.2369702
##
## tools
## train_y [,1] [,2]
## ham 0.05963303 0.5179932
## spam 0.03465347 0.1833549
##
## top
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.09900990 0.4679786
##
## total
## train_y [,1] [,2]
## ham 0.02752294 0.2128895
## spam 0.24257426 1.3769854
##
## totally
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.03465347 0.2519321
##
## track
## train_y [,1] [,2]
## ham 0.04587156 0.2676169
## spam 0.05940594 0.2369702
##
## trade
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.04455446 0.3772050
##
## trading
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.178217822 1.47884613
##
## traffic
## train_y [,1] [,2]
## ham 0.03669725 0.3573225
## spam 0.02475248 0.2101437
##
## training
## train_y [,1] [,2]
## ham 0.02752294 0.2524978
## spam 0.04455446 0.2695030
##
## transaction
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.21287129 0.9246393
##
## transfer
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.12871287 0.5583089
##
## transitionalen
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03960396 0.1955114
##
## tried
## train_y [,1] [,2]
## ham 0.05963303 0.2898019
## spam 0.03960396 0.2609128
##
## trtd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.6049929
##
## true
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.05940594 0.2757814
##
## trust
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.08910891 0.4370642
##
## try
## train_y [,1] [,2]
## ham 0.08715596 0.3136234
## spam 0.09405941 0.5241096
##
## trying
## train_y [,1] [,2]
## ham 0.09633028 0.3392664
## spam 0.02970297 0.1972672
##
## tue
## train_y [,1] [,2]
## ham 1.151376 2.345704
## spam 1.000000 2.183156
##
## tuesday
## train_y [,1] [,2]
## ham 0.04128440 0.22131197
## spam 0.00990099 0.09925589
##
## turn
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04455446 0.2503630
##
## two
## train_y [,1] [,2]
## ham 0.1697248 0.5112566
## spam 0.1336634 0.6743011
##
## txtdogmaslashnullorg
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.4108911 4.27587
##
## type
## train_y [,1] [,2]
## ham 0.07339450 0.3769003
## spam 0.09405941 0.3812318
##
## typecheckbox
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.9850366
##
## typedhidden
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08910891 0.5105652
##
## typedsubmit
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.1833549
##
## typedtext
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.1633663 0.95576
##
## typehidden
## train_y [,1] [,2]
## ham 0.0000000 0.00000
## spam 0.1188119 1.00533
##
## uaa
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.039603960 0.21948796
##
## uid
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.02970297 0.1701884
##
## under
## train_y [,1] [,2]
## ham 0.05963303 0.2734383
## spam 0.11386139 0.4013732
##
## understand
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.07920792 0.3363003
##
## une
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.6332378
##
## united
## train_y [,1] [,2]
## ham 0.03669725 0.2513230
## spam 0.10891089 0.3705849
##
## universal
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.000000000 0.00000000
##
## university
## train_y [,1] [,2]
## ham 0.09633028 0.68215254
## spam 0.00990099 0.09925589
##
## unknown
## train_y [,1] [,2]
## ham 0.2339450 0.5641715
## spam 0.2524752 0.5906559
##
## unseen
## train_y [,1] [,2]
## ham 0.07798165 0.49734
## spam 0.00000000 0.00000
##
## unsolicited
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.08910891 0.3483868
##
## unspun
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## unsubscribe
## train_y [,1] [,2]
## ham 0.06880734 0.2712651
## spam 0.16336634 0.4439136
##
## unsubscribed
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.2296332
##
## unsubscription
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.09900990 0.2994174
##
## until
## train_y [,1] [,2]
## ham 0.03669725 0.1884502
## spam 0.08910891 0.5660197
##
## update
## train_y [,1] [,2]
## ham 0.04587156 0.356256
## spam 0.00000000 0.000000
##
## upfront
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.01980198 0.281439
##
## upon
## train_y [,1] [,2]
## ham 0.03211009 0.1766982
## spam 0.08910891 0.4014652
##
## urgent
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.054455446 0.34842210
##
## url
## train_y [,1] [,2]
## ham 0.26605505 0.4429104
## spam 0.05445545 0.3338379
##
## urncontentclassesmessage
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.049504950 0.21745876
##
## usa
## train_y [,1] [,2]
## ham 0.05045872 0.2580051
## spam 0.04950495 0.3831329
##
## use
## train_y [,1] [,2]
## ham 0.5229358 1.1079631
## spam 0.2821782 0.8432224
##
## used
## train_y [,1] [,2]
## ham 0.13761468 0.4889166
## spam 0.06435644 0.3167258
##
## useful
## train_y [,1] [,2]
## ham 0.022935780 0.15004333
## spam 0.004950495 0.07035975
##
## user
## train_y [,1] [,2]
## ham 0.10091743 0.3704513
## spam 0.01980198 0.1716295
##
## useragent
## train_y [,1] [,2]
## ham 0.1605505 0.3679607
## spam 0.0000000 0.0000000
##
## userid
## train_y [,1] [,2]
## ham 0.11009174 0.3137245
## spam 0.01980198 0.1396654
##
## users
## train_y [,1] [,2]
## ham 0.2064220 0.5667882
## spam 0.2425743 0.6658472
##
## using
## train_y [,1] [,2]
## ham 0.2385321 0.6353501
## spam 0.1039604 0.3513785
##
## uswsffwsourceforgenet
## train_y [,1] [,2]
## ham 0.12844037 0.3353495
## spam 0.01980198 0.1396654
##
## uswsflistbsourceforgenet
## train_y [,1] [,2]
## ham 0.13302752 0.3536655
## spam 0.01980198 0.1396654
##
## uswsflistsourceforgenet
## train_y [,1] [,2]
## ham 0.39908257 1.0609964
## spam 0.05940594 0.4189962
##
## utc
## train_y [,1] [,2]
## ham 0.03211009 0.2228350
## spam 0.12376238 0.3301284
##
## utilities
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.3724742
##
## valid
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.02970297 0.1701884
##
## valigncenter
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## valigndmiddle
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05445545 0.227478
##
## valigndtop
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.3384902
##
## valigntop
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.3910891 1.666324
##
## valuable
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1386139 0.5377251
##
## value
## train_y [,1] [,2]
## ham 0.0412844 0.2213120
## spam 0.1534653 0.7201526
##
## valued
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.04455446 0.206836
##
## valuedsubmit
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.03465347 0.1833549
##
## valueoption
## train_y [,1] [,2]
## ham 0.000000000 0.00000000
## spam 0.004950495 0.07035975
##
## vamm
## train_y [,1] [,2]
## ham 0.27064220 0.7217238
## spam 0.04455446 0.3201291
##
## various
## train_y [,1] [,2]
## ham 0.04587156 0.2498044
## spam 0.01485149 0.1570158
##
## venture
## train_y [,1] [,2]
## ham 0.06880734 1.0159282
## spam 0.02475248 0.1849598
##
## verdana
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1039604 0.7221335
##
## version
## train_y [,1] [,2]
## ham 0.2889908 0.6609363
## spam 0.1584158 0.5033751
##
## very
## train_y [,1] [,2]
## ham 0.1651376 0.6370115
## spam 0.1534653 0.5382974
##
## via
## train_y [,1] [,2]
## ham 0.06422018 0.2637982
## spam 0.12871287 0.4817746
##
## video
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.034653465 0.32135771
##
## view
## train_y [,1] [,2]
## ham 0.03211009 0.2228350
## spam 0.03465347 0.2313429
##
## viruses
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.069306931 0.33849023
##
## visit
## train_y [,1] [,2]
## ham 0.02752294 0.2128895
## spam 0.18811881 0.5127794
##
## vous
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.07920792 1.125756
##
## vspace
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05445545 0.4483302
##
## wait
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.03960396 0.2194880
##
## waiting
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.039603960 0.21948796
##
## want
## train_y [,1] [,2]
## ham 0.1467890 0.5651261
## spam 0.4108911 0.9745417
##
## wanted
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.03960396 0.2194880
##
## war
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.01980198 0.1396654
##
## warranty
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.079207921 0.55866166
##
## was
## train_y [,1] [,2]
## ham 0.8027523 2.007496
## spam 0.5990099 2.277425
##
## washington
## train_y [,1] [,2]
## ham 0.01376147 0.1511662
## spam 0.03960396 0.2194880
##
## way
## train_y [,1] [,2]
## ham 0.2339450 0.6261167
## spam 0.1980198 0.7126407
##
## wcdtd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.2369702
##
## web
## train_y [,1] [,2]
## ham 0.2385321 1.5769017
## spam 0.2326733 0.9089504
##
## webex
## train_y [,1] [,2]
## ham 0.06880734 1.015928
## spam 0.00000000 0.000000
##
## webmasterefiie
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.2475248 0.5885464
##
## webnotenet
## train_y [,1] [,2]
## ham 0.05504587 0.3279559
## spam 0.74752475 0.8467784
##
## website
## train_y [,1] [,2]
## ham 0.05504587 0.4957862
## spam 0.16336634 0.7778300
##
## wed
## train_y [,1] [,2]
## ham 1.7889908 3.067815
## spam 0.8316832 1.795901
##
## week
## train_y [,1] [,2]
## ham 0.02293578 0.2239705
## spam 0.11386139 0.4698957
##
## weeks
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.18811881 1.1820622
##
## weight
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.064356436 0.44700978
##
## welcome
## train_y [,1] [,2]
## ham 0.05963303 0.2560311
## spam 0.01485149 0.1212589
##
## well
## train_y [,1] [,2]
## ham 0.1238532 0.4165691
## spam 0.2326733 0.7919507
##
## went
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.01980198 0.1716295
##
## were
## train_y [,1] [,2]
## ham 0.2614679 0.8034956
## spam 0.1435644 0.4507409
##
## west
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.02475248 0.1849598
##
## weve
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.029702970 0.26223098
##
## what
## train_y [,1] [,2]
## ham 0.3532110 0.7793497
## spam 0.3465347 1.2808241
##
## when
## train_y [,1] [,2]
## ham 0.2385321 0.5327796
## spam 0.2772277 1.0036631
##
## where
## train_y [,1] [,2]
## ham 0.2247706 0.6922422
## spam 0.1534653 0.5474617
##
## whether
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.05445545 0.3185867
##
## which
## train_y [,1] [,2]
## ham 0.4678899 0.9983286
## spam 0.3019802 0.7349796
##
## while
## train_y [,1] [,2]
## ham 0.08715596 0.3417496
## spam 0.14356436 0.6796130
##
## white
## train_y [,1] [,2]
## ham 0.02293578 0.1781280
## spam 0.09405941 0.6882557
##
## who
## train_y [,1] [,2]
## ham 0.3899083 1.042546
## spam 0.4950495 1.690617
##
## whole
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.05445545 0.3025677
##
## why
## train_y [,1] [,2]
## ham 0.16972477 0.5627458
## spam 0.09405941 0.3679503
##
## width
## train_y [,1] [,2]
## ham 0.000000 0.00000
## spam 3.217822 10.65899
##
## widtha
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.07425743 0.5080748
##
## widthd
## train_y [,1] [,2]
## ham 0.000000 0.00000
## spam 1.727723 5.77179
##
## widthdbfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04950495 0.5159397
##
## widthdfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.08415842 0.7315536
##
## widthdimg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.04455446 0.3201291
##
## widthdnbsptd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.01485149 0.1570158
##
## widthfont
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.5232395
##
## widthnbsptd
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.5774782
##
## will
## train_y [,1] [,2]
## ham 0.4082569 1.299885
## spam 1.5792079 3.513466
##
## williams
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.04950495 0.4082785
##
## window
## train_y [,1] [,2]
## ham 0.133027523 0.90350623
## spam 0.004950495 0.07035975
##
## windows
## train_y [,1] [,2]
## ham 0.1146789 0.4080584
## spam 0.1534653 0.4246308
##
## wish
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.32178218 0.631212
##
## with
## train_y [,1] [,2]
## ham 6.766055 3.492631
## spam 6.445545 3.604793
##
## within
## train_y [,1] [,2]
## ham 0.02293578 0.178128
## spam 0.29702970 1.070124
##
## without
## train_y [,1] [,2]
## ham 0.1238532 0.4586896
## spam 0.2128713 0.6977096
##
## women
## train_y [,1] [,2]
## ham 0.06422018 0.5039060
## spam 0.07425743 0.7258076
##
## wonderful
## train_y [,1] [,2]
## ham 0.009174312 0.09556168
## spam 0.039603960 0.21948796
##
## wont
## train_y [,1] [,2]
## ham 0.03211009 0.2010941
## spam 0.01980198 0.1396654
##
## word
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.04950495 0.2392458
##
## work
## train_y [,1] [,2]
## ham 0.1330275 0.4247103
## spam 0.2722772 1.0222031
##
## worked
## train_y [,1] [,2]
## ham 0.05045872 0.3069469
## spam 0.03465347 0.2709613
##
## working
## train_y [,1] [,2]
## ham 0.07798165 0.2853917
## spam 0.09405941 0.3679503
##
## works
## train_y [,1] [,2]
## ham 0.06422018 0.3785792
## spam 0.08910891 0.4370642
##
## world
## train_y [,1] [,2]
## ham 0.22477064 0.7558870
## spam 0.08415842 0.5534573
##
## worlds
## train_y [,1] [,2]
## ham 0.07798165 0.3303002
## spam 0.03960396 0.2793308
##
## worldwide
## train_y [,1] [,2]
## ham 0.01834862 0.1652621
## spam 0.03465347 0.2087326
##
## worst
## train_y [,1] [,2]
## ham 0.02752294 0.286685
## spam 0.04455446 0.206836
##
## worth
## train_y [,1] [,2]
## ham 0.03669725 0.2690350
## spam 0.03465347 0.1833549
##
## would
## train_y [,1] [,2]
## ham 0.5000000 1.184578
## spam 0.3910891 1.046444
##
## wowie
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1237624 1.758994
##
## write
## train_y [,1] [,2]
## ham 0.07339450 0.3769003
## spam 0.04455446 0.2503630
##
## writes
## train_y [,1] [,2]
## ham 0.1146789 0.3600629
## spam 0.0000000 0.0000000
##
## writing
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.03960396 0.2609128
##
## wrong
## train_y [,1] [,2]
## ham 0.07798165 0.3697945
## spam 0.01980198 0.1396654
##
## wrote
## train_y [,1] [,2]
## ham 0.3990826 0.652114
## spam 0.0000000 0.000000
##
## xacceptlanguage
## train_y [,1] [,2]
## ham 0.08715596 0.2827126
## spam 0.00000000 0.0000000
##
## xantiabuse
## train_y [,1] [,2]
## ham 0.06880734 0.5838372
## spam 0.00000000 0.0000000
##
## xapparentlyto
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## xauthenticationwarning
## train_y [,1] [,2]
## ham 0.06880734 0.2537088
## spam 0.05940594 0.2369702
##
## xbeenthere
## train_y [,1] [,2]
## ham 0.6192661 0.4960632
## spam 0.1336634 0.3411357
##
## xegroupsreturn
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.00000000 0.0000000
##
## xentcom
## train_y [,1] [,2]
## ham 0.7706422 1.334636
## spam 0.0000000 0.000000
##
## xhabeasswe
## train_y [,1] [,2]
## ham 0.1238532 1.050907
## spam 0.0000000 0.000000
##
## xinfo
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.05940594 0.3254322
##
## xkeywords
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05445545 0.227478
##
## xloop
## train_y [,1] [,2]
## ham 0.09174312 0.2893273
## spam 0.01485149 0.1212589
##
## xmailer
## train_y [,1] [,2]
## ham 0.3073394 0.472313
## spam 0.4702970 0.500357
##
## xmailmanversion
## train_y [,1] [,2]
## ham 0.6192661 0.4960632
## spam 0.1336634 0.3411357
##
## xmailscanner
## train_y [,1] [,2]
## ham 0.07798165 0.2687598
## spam 0.00000000 0.0000000
##
## xmimeautoconverted
## train_y [,1] [,2]
## ham 0.02752294 0.1639779
## spam 0.05445545 0.2274780
##
## xmimeole
## train_y [,1] [,2]
## ham 0.08256881 0.2758628
## spam 0.09405941 0.2926366
##
## xml
## train_y [,1] [,2]
## ham 0.23394495 2.992363
## spam 0.01980198 0.281439
##
## xmsmailpriority
## train_y [,1] [,2]
## ham 0.0733945 0.2613831
## spam 0.2178218 0.4137911
##
## xoriginalarrivaltime
## train_y [,1] [,2]
## ham 0.01376147 0.1167674
## spam 0.12376238 0.3301284
##
## xoriginaldate
## train_y [,1] [,2]
## ham 0.24311927 0.4299538
## spam 0.06435644 0.2459965
##
## xpriority
## train_y [,1] [,2]
## ham 0.09633028 0.2957227
## spam 0.31683168 0.4663971
##
## xsender
## train_y [,1] [,2]
## ham 0.110091743 0.32808478
## spam 0.004950495 0.07035975
##
## xsmall
## train_y [,1] [,2]
## ham 0 0
## spam 0 0
##
## xstatus
## train_y [,1] [,2]
## ham 0.00000000 0.000000
## spam 0.05445545 0.227478
##
## xvirusscanned
## train_y [,1] [,2]
## ham 0.004587156 0.06772855
## spam 0.049504950 0.23924581
##
## yahoo
## train_y [,1] [,2]
## ham 0.15137615 0.5345423
## spam 0.01485149 0.1570158
##
## year
## train_y [,1] [,2]
## ham 0.08256881 0.501214
## spam 0.22772277 1.011193
##
## years
## train_y [,1] [,2]
## ham 0.2064220 0.6913255
## spam 0.2425743 1.0393115
##
## yes
## train_y [,1] [,2]
## ham 0.04587156 0.2096880
## spam 0.12376238 0.4778993
##
## yet
## train_y [,1] [,2]
## ham 0.05963303 0.2734383
## spam 0.04455446 0.2503630
##
## york
## train_y [,1] [,2]
## ham 0.02752294 0.2128895
## spam 0.03465347 0.2087326
##
## you
## train_y [,1] [,2]
## ham 1.201835 2.147895
## spam 5.742574 11.225995
##
## youll
## train_y [,1] [,2]
## ham 0.04587156 0.2306201
## spam 0.11881188 0.4061010
##
## young
## train_y [,1] [,2]
## ham 0.02752294 0.3450430
## spam 0.03465347 0.4276363
##
## your
## train_y [,1] [,2]
## ham 0.4587156 0.8372889
## spam 3.8168317 5.7424936
##
## youre
## train_y [,1] [,2]
## ham 0.08256881 0.3361051
## spam 0.10891089 0.5259626
##
## yours
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.1089109 0.3278451
##
## yourself
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.12376238 0.3857276
##
## youve
## train_y [,1] [,2]
## ham 0.01834862 0.1345175
## spam 0.05445545 0.3484221
##
## yyyylocalhostnetnoteinccom
## train_y [,1] [,2]
## ham 0.1422018 0.3500608
## spam 0.0000000 0.0000000
##
## yyyylocalhostspamassassintaintorg
## train_y [,1] [,2]
## ham 0.71559633 0.4521682
## spam 0.01980198 0.1396654
##
## yyyyspamassassintaintorg
## train_y [,1] [,2]
## ham 0.26146789 0.4404456
## spam 0.01485149 0.1212589
##
## zowie
## train_y [,1] [,2]
## ham 0.0000000 0.000000
## spam 0.1237624 1.758994
##
## zzzzasonorg
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.4851485 0.5299744
##
## zzzzilugjmasonorg
## train_y [,1] [,2]
## ham 0.00000000 0.0000000
## spam 0.06930693 0.2546063
##
## zzzzjmasonorg
## train_y [,1] [,2]
## ham 0.0000000 0.0000000
## spam 0.3316832 0.5498857
##
## zzzzlocalhost
## train_y [,1] [,2]
## ham 0.2477064 0.6603443
## spam 1.8217822 0.5712174
##
## zzzzlocalhostnetnoteinccom
## train_y [,1] [,2]
## ham 0.06422018 0.245709
## spam 0.00000000 0.000000
##
## zzzzlocalhostspamassassintaintorg
## train_y [,1] [,2]
## ham 0.05963303 0.2373507
## spam 0.91089109 0.2856087
##
## zzzzspamassassintaintorg
## train_y [,1] [,2]
## ham 0.09633028 0.3525880
## spam 0.63861386 0.9426828
##
## zzzzteana
## train_y [,1] [,2]
## ham 0.07798165 0.3303002
## spam 0.00000000 0.0000000
##
## zzzzteanayahoogroupscom
## train_y [,1] [,2]
## ham 0.293578 1.170314
## spam 0.000000 0.000000
To evaluate the classifier, I predict the labels for the test set and compute a confusion matrix and overall accuracy.
test_pred <- predict(nb_model, newdata = test_x)
conf_mat <- table(
  Actual = test_y,
  Predicted = test_pred
)
conf_mat
## Predicted
## Actual ham spam
## ham 80 2
## spam 51 47
accuracy <- sum(test_pred == test_y) / length(test_y)
accuracy
## [1] 0.7055556
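As a cross-check, the same accuracy can be recovered directly from the four counts in the confusion matrix, and compared with a naive baseline that always predicts the most common class in the test set. This is a minimal sketch: the names `conf_mat_example`, `accuracy_example`, and `baseline_example` are hypothetical, and the counts are copied by hand from the output above.

```r
# Rebuild the confusion matrix from the counts printed above
conf_mat_example <- matrix(
  c(80,  2,
    51, 47),
  nrow = 2, byrow = TRUE,
  dimnames = list(Actual = c("ham", "spam"), Predicted = c("ham", "spam"))
)

# Accuracy is the share of messages on the diagonal (correct predictions).
# This relies on rows and columns listing the classes in the same order.
accuracy_example <- sum(diag(conf_mat_example)) / sum(conf_mat_example)

# Baseline: always predict the most common actual class in the test set
baseline_example <- max(rowSums(conf_mat_example)) / sum(conf_mat_example)

round(c(accuracy = accuracy_example, baseline = baseline_example), 4)
```

With these counts the accuracy is 127/180 and the baseline is 98/180, so the model beats the majority-class guess, but by a smaller margin than the raw accuracy alone might suggest.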
I also compute the misclassification rate:
error_rate <- 1 - accuracy
error_rate
## [1] 0.2944444
I then compute precision and recall for the spam class, guarding against the case where a class is missing from the confusion matrix.
# Safely extract counts even if some cells are missing
get_cell <- function(cm, r, c) {
  if (r %in% rownames(cm) && c %in% colnames(cm)) {
    cm[r, c]
  } else {
    0
  }
}
tp <- get_cell(conf_mat, "spam", "spam")
fp <- get_cell(conf_mat, "ham", "spam")
fn <- get_cell(conf_mat, "spam", "ham")
precision_spam <- ifelse(tp + fp > 0, tp / (tp + fp), NA_real_)
recall_spam <- ifelse(tp + fn > 0, tp / (tp + fn), NA_real_)
precision_spam
## [1] 0.9591837
recall_spam
## [1] 0.4795918
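Precision and recall pull in opposite directions here, so a single combined figure can help when comparing model variants. The sketch below computes the F1 score (the harmonic mean of precision and recall) from the counts reported above. The `*_example` names are hypothetical, and F1 is an additional metric that is not part of the original analysis.

```r
# Counts for the spam class, copied from the confusion matrix above
tp_example <- 47  # spam correctly flagged as spam
fp_example <- 2   # ham wrongly flagged as spam
fn_example <- 51  # spam that slipped through as ham

precision_example <- tp_example / (tp_example + fp_example)
recall_example    <- tp_example / (tp_example + fn_example)

# F1 is the harmonic mean of precision and recall
f1_example <- 2 * precision_example * recall_example /
  (precision_example + recall_example)

round(c(precision = precision_example,
        recall    = recall_example,
        f1        = f1_example), 4)
```

In words: almost every message the model flags as spam really is spam (high precision), but it catches fewer than half of the spam messages (low recall), and F1 lands between the two, at roughly 0.64.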
Finally, I classify a few new example messages to see how the model behaves on text that was not part of the training data.
new_texts <- c(
  "Congratulations! You have won a free prize. Click here to claim now!!!",
  "Hi, just checking if we are still on for our meeting tomorrow.",
  "Limited time offer on cheap medications, no prescription needed."
)
new_corpus <- VCorpus(VectorSource(new_texts)) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
# Use the same frequent terms as in the training DTM
new_dtm <- DocumentTermMatrix(
  new_corpus,
  control = list(dictionary = Terms(dtm))
)
new_mat <- as.data.frame(as.matrix(new_dtm))
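# Ensure columns align with training predictors:
# add any missing terms as all-zero columns, then match the training column order
missing_cols <- setdiff(colnames(train_x), colnames(new_mat))
for (mc in missing_cols) {
  new_mat[[mc]] <- 0
}
# drop = FALSE keeps a data frame even if only one predictor remains
new_mat <- new_mat[, colnames(train_x), drop = FALSE]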
new_pred <- predict(nb_model, newdata = new_mat)
tibble(
  text = new_texts,
  predicted_label = new_pred
)
## # A tibble: 3 × 2
## text predicted_label
## <chr> <fct>
## 1 Congratulations! You have won a free prize. Click here to cla… ham
## 2 Hi, just checking if we are still on for our meeting tomorrow. ham
## 3 Limited time offer on cheap medications, no prescription need… ham
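All three messages come back as ham, including the two that read like spam. One possible contributor, offered as a hypothesis rather than a confirmed diagnosis, is how a Gaussian naive Bayes treats terms that never occur in one class. The sketch below assumes `nb_model` was fitted with something like `e1071::naiveBayes` on raw counts (the fitting call is not shown in this section) and that a zero standard deviation is replaced by that package's default `threshold` of 0.001 at prediction time. The mean and standard deviation for the term `width` are copied from the tables above; every variable name is hypothetical.

```r
# Hypothetical illustration using the printed per-class statistics for "width":
#   ham:  mean 0,        sd 0  (the term never appears in training ham)
#   spam: mean 3.217822, sd 10.65899
floor_sd    <- 0.001  # assumed floor applied in place of a zero sd
width_count <- 0      # a short plain-text message contains no "width" token

# Gaussian likelihood of observing a count of 0 under each class
ham_density  <- dnorm(width_count, mean = 0,        sd = floor_sd)
spam_density <- dnorm(width_count, mean = 3.217822, sd = 10.65899)

# Log-likelihood contribution of this single feature in favour of ham
log_ratio_ham <- log(ham_density) - log(spam_density)
log_ratio_ham
```

Under these assumptions, the mere absence of one HTML token shifts the log-odds towards ham by several units, and the tables list many such spam-only HTML tokens. Short plain-text messages with no email headers would therefore tend to look like ham regardless of their wording. If `nb_model` does come from e1071, calling `predict()` with `type = "raw"` would show the posterior probabilities and help confirm or rule out this explanation.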
By sampling a limited number of emails per class and restricting the vocabulary to frequent terms only, I was able to:
For future work, I could: