1 Introduction

It can be useful to be able to classify new “test” documents using already classified “training” documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam.

For this project, we are tasked with starting from a spam/ham dataset and then predicting the class of new documents (either withheld from the training dataset or from another source such as your own spam folder). We are provided with the corpus (https://spamassassin.apache.org/old/publiccorpus/) and instructions on how to download the ham and spam files.

2 Load Required Libraries

library(tm)
## Loading required package: NLP
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::filter()     masks stats::filter()
## x dplyr::lag()        masks stats::lag()
library(wordcloud)
## Loading required package: RColorBrewer
library(naivebayes)
## naivebayes 0.9.6 loaded
library(e1071)

3 Data Collection

3.1 Loading Files and Folders

Following the unzipping process explained in the video, we downloaded and extracted the “easy_ham” and “spam” folders. We now load their file listings into R.

# directories holding the extracted spam and ham e-mails
spam_directory <- "C:/Users/Anil Akyildirim/Desktop/Data Science/MSDS/Data Acquisition and Management/Week 11/Project 4/spam"
easy_ham_directory <- "C:/Users/Anil Akyildirim/Desktop/Data Science/MSDS/Data Acquisition and Management/Week 11/Project 4/easy_ham"
spam_files <- list.files(spam_directory)
easy_ham_files <- list.files(easy_ham_directory)

Any cmds file in these folders is not an e-mail, so we remove it from the file lists.

spam_files <- spam_files[spam_files != "cmds"]
easy_ham_files <- easy_ham_files[easy_ham_files != "cmds"]
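A minimal sketch of how filtered file names could be turned into full paths, using a temporary folder as a stand-in for the real directories. The folder contents and the names demo_dir, demo_files and demo_paths are hypothetical. It may be worth noting that the corpus code in the next section calls list.files() on each directory again, so a filter like this only takes effect if the paths passed to readLines are built from the filtered vector.

```r
# Hypothetical stand-in for the spam folder: two e-mails plus a cmds file
demo_dir <- file.path(tempdir(), "demo_spam")
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("0001.abc", "0002.def", "cmds")) {
  writeLines("placeholder", file.path(demo_dir, f))
}

# List the files, drop the non-e-mail entry, then build full paths
demo_files <- list.files(demo_dir)
demo_files <- demo_files[demo_files != "cmds"]
demo_paths <- file.path(demo_dir, demo_files)

# Each path can now be read with readLines()
demo_lines <- lapply(demo_paths, readLines)
length(demo_lines)
```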

3.2 Processing Textual Data - Corpus Creation

# easy_ham folder files 
easy_ham_corpus <- easy_ham_directory %>%
  paste(., list.files(.), sep = "/") %>%
  lapply(readLines) %>%
  VectorSource() %>%
  VCorpus()

easy_ham_corpus
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 2551
# spam folder files
spam_corpus <- spam_directory %>%
  paste(., list.files(.), sep = "/") %>%
  lapply(readLines) %>%
  VectorSource() %>%
  VCorpus()

spam_corpus
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 500

4 Data Cleaning and Preparation

4.1 Corpus Cleaning

To clean the corpus for each folder, we use the tm package and follow these steps:

1- Remove numbers and punctuation.

2- Remove stopwords such as to, from, and, the, etc.

3- Remove extra whitespace.

4- Reduce the terms to their stems.

# easy ham emails
easy_ham_corpus <- easy_ham_corpus %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords()) %>%
  tm_map(stripWhitespace) %>%
  tm_map(stemDocument)


easy_ham_corpus
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 2551
#spam emails
spam_corpus <- spam_corpus %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords()) %>%
  tm_map(stripWhitespace) %>%
  tm_map(stemDocument)

spam_corpus
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 500
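To make the effect of the cleaning steps concrete, here is a rough base R sketch that applies the first three steps to a single toy string. It is an approximation of what the tm transformations do, not their actual implementation; the message and the four-word stopword list are hypothetical, and stemming is left out because it needs an external stemmer.

```r
# Toy message; the stopword list is a tiny hypothetical subset
msg <- "Win 1000 dollars, from the lottery!!"
toy_stopwords <- c("to", "from", "and", "the")

clean <- gsub("[[:digit:]]+", "", msg)                    # drop numbers
clean <- gsub("[[:punct:]]+", "", clean)                  # drop punctuation
stop_pattern <- paste0("\\b(", paste(toy_stopwords, collapse = "|"), ")\\b")
clean <- gsub(stop_pattern, "", clean, perl = TRUE)       # drop stopwords
clean <- gsub("[[:space:]]+", " ", clean)                 # collapse runs of whitespace
clean <- trimws(clean)
clean
```

Like removeWords, this toy version is case-sensitive, so a capitalized stopword would survive unless the text is lower-cased first.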

A look at the easy_ham and spam corpora reveals that we have 2551 ham documents and 500 spam documents. We combine these two corpora.

ham_or_spam_corpus <- c(easy_ham_corpus, spam_corpus)

4.2 Building a Document-Term Matrix

tdm <- DocumentTermMatrix(ham_or_spam_corpus)
tdm
## <<DocumentTermMatrix (documents: 3051, terms: 59852)>>
## Non-/sparse entries: 490433/182118019
## Sparsity           : 100%
## Maximal term length: 298
## Weighting          : term frequency (tf)
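In the summary above, 182118019 of the 3051 × 59852 = 182608452 cells are zero, which is about 99.7% and is displayed as 100% after rounding. The following base R sketch, using two hypothetical toy documents, illustrates what a document-term matrix holds and how its sparsity can be computed. It does not use tm, so the object is an ordinary dense matrix rather than a DocumentTermMatrix.

```r
# Two toy documents (hypothetical), already cleaned and lower-cased
docs <- c(d1 = "free money free", d2 = "meeting money")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term in each document: rows = documents, columns = terms
toy_dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
rownames(toy_dtm) <- names(docs)
colnames(toy_dtm) <- vocab
toy_dtm

# Share of cells that are zero, i.e. the sparsity of the matrix
mean(toy_dtm == 0)
```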

4.3 Creating a Word Cloud with Header Text

wordcloud(ham_or_spam_corpus, max.words = 100, random.order = FALSE, rot.per=0.15, min.freq=5, colors = brewer.pal(8, "Dark2"))

5 Model Development

We can use a classification method such as a Naive Bayes classifier, which uses the presence of certain features (words) in each class to predict whether an e-mail is spam or ham.

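5.1 Data Preparation for Model Development

Before we create our training and test data sets, we need to build a combined data frame and label each row as ham or spam, since Naive Bayes is a supervised technique. Note that unlist() flattens every message into its individual lines, so each row of the data frame is one line of an e-mail rather than a whole message.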

df_ham <- as.data.frame(unlist(easy_ham_corpus), stringsAsFactors = FALSE)
df_ham$type <- "ham"
colnames(df_ham)=c("text", "email")

df_spam <- as.data.frame(unlist(spam_corpus), stringsAsFactors = FALSE)
df_spam$type <- "spam"
colnames(df_spam)=c("text", "email")

df_ham_or_spam <- rbind(df_ham, df_spam)

head(df_ham_or_spam)
##                                           text email
## 1       From exmhworkersadminredhatcom Thu Aug   ham
## 2        ReturnPath exmhworkersadminexamplecom   ham
## 3       DeliveredTo zzzzlocalhostnetnoteinccom   ham
## 4                   Receiv localhost localhost   ham
## 5 phoboslabsnetnoteinccom Postfix ESMTP id DEC   ham
## 6                    zzzzlocalhost Thu Aug EDT   ham
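The rows printed above are individual lines of e-mails, which is why the later matrices report hundreds of thousands of "documents" for 3051 messages. If one row per e-mail is wanted instead, one possible approach is to collapse each message's lines into a single string before building the data frame. The sketch below uses a hypothetical list of character vectors as a stand-in for the corpus content; emails, labels and df_by_email are illustrative names, not objects from the report.

```r
# Hypothetical stand-in for the corpus content: one character vector of lines per e-mail
emails <- list(
  c("Subject: lunch", "See you at noon"),
  c("Subject: WIN", "Click now", "Limited offer")
)
labels <- c("ham", "spam")

# Collapse the lines of each e-mail into a single string
text <- vapply(emails, paste, character(1), collapse = " ")

df_by_email <- data.frame(text = text, email = labels, stringsAsFactors = FALSE)
nrow(df_by_email)          # one row per e-mail

# Flattening with unlist() instead gives one element per line
length(unlist(emails))
```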

5.2 Prepare Test and Train Data

5.2.1 Splitting Test and Train Data

We use 75% of the rows as training data and the remaining 25% as test data.

sample_size <- floor(0.75 * nrow(df_ham_or_spam)) # selecting sample size of 75% of the data for training. 

set.seed(123)
train <- sample(seq_len(nrow(df_ham_or_spam)), size = sample_size)

train_ham_or_spam <- df_ham_or_spam[train, ]
test_ham_or_spam <- df_ham_or_spam[-train, ]

head(train_ham_or_spam)
##                                                        text email
## 188942 ListArchiv httpwwwgeocrawlercomredirsfphplistrazorus   ham
## 134058                            To rpmzzzlistfreshrpmsnet   ham
## 124022           manifest exmh I figur ask might help track   ham
## 160997                    Refer DDEFphoboslabsnetnoteinccom   ham
## 226318                   Receiv dogmaslashnullorg localhost   ham
## 124507                                            XPrioriti   ham
head(test_ham_or_spam)
##                                         text email
## 6                  zzzzlocalhost Thu Aug EDT   ham
## 14 listmanredhatcom Postfix ESMTP id Thu Aug   ham
## 15                                       EDT   ham
## 25          intmxcorpredhatcom SMTP id gMBYi   ham
## 37   To Chris Garrigu cwgdatedfadDeepEddyCom   ham
## 40           InReplyTo TMDAdeepeddyvirciocom   ham
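As a sanity check on a split like this, it can help to confirm that the two sets do not overlap and to compare their class proportions. The sketch below repeats the same sampling pattern on a small hypothetical data frame (toy, with 8 ham and 4 spam rows); the names toy_train, toy_test and train_idx are illustrative only.

```r
# Hypothetical labeled data: 8 ham rows and 4 spam rows
toy <- data.frame(
  text = paste("message", 1:12),
  email = rep(c("ham", "spam"), times = c(8, 4)),
  stringsAsFactors = FALSE
)

set.seed(123)
n_train <- floor(0.75 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# The two sets should be disjoint and together cover every row
length(intersect(rownames(toy_train), rownames(toy_test)))
nrow(toy_train) + nrow(toy_test) == nrow(toy)

# Class proportions in each set, to spot an unbalanced split
prop.table(table(toy_train$email))
prop.table(table(toy_test$email))
```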

5.2.2 Create and Clean Corpus and Create Document-Term Matrix for Training and Test Data

# corpus creation
train_corpus <- Corpus(VectorSource(train_ham_or_spam$text)) # corpus training data
test_corpus <- Corpus(VectorSource(test_ham_or_spam$text)) # corpus test data

# corpus cleaning
train_corpus <- train_corpus %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords()) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords()): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
test_corpus <- test_corpus %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords()) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., removeWords, stopwords()): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
train_tdm <- DocumentTermMatrix(train_corpus)
test_tdm <- DocumentTermMatrix(test_corpus)

train_tdm
## <<DocumentTermMatrix (documents: 236142, terms: 50639)>>
## Non-/sparse entries: 573964/11957420774
## Sparsity           : 100%
## Maximal term length: 245
## Weighting          : term frequency (tf)
test_tdm
## <<DocumentTermMatrix (documents: 78714, terms: 25950)>>
## Non-/sparse entries: 191569/2042436731
## Sparsity           : 100%
## Maximal term length: 298
## Weighting          : term frequency (tf)
train_corpus
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 236142
test_corpus
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 78714

We also separate the training data into spam and ham.

spam <- subset(train_ham_or_spam, email == "spam")
ham <- subset(train_ham_or_spam, email == "ham")

If we use every term in the data, R does not have enough memory to run the model at the moment. So, I am going to narrow down the features by selecting only words that are used at least 50 times in the training documents.
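The memory pressure can be estimated with a quick back-of-the-envelope calculation. The sketch below takes the dimensions reported for train_tdm and the 1616 terms that survive the frequency filter, and assumes (this is an assumption, not something the report states) that converting to a dense matrix stores 8 bytes per cell.

```r
# Dimensions reported for train_tdm and the reduced vocabulary size
n_docs <- 236142
n_terms_full <- 50639
n_terms_reduced <- 1616
bytes_per_cell <- 8   # assumption: dense matrix of doubles

# Size of a dense rows x cols matrix in gigabytes (10^9 bytes)
dense_gb <- function(rows, cols) rows * cols * bytes_per_cell / 1e9

dense_gb(n_docs, n_terms_full)      # roughly 95.7 GB
dense_gb(n_docs, n_terms_reduced)   # roughly 3.05 GB
```

Under that assumption the full matrix would not fit in the memory of a typical workstation, while the reduced one is large but feasible.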

fifty_times_words<- findFreqTerms(train_tdm, 50)
length(fifty_times_words)
## [1] 1616
train_tdm_2<- DocumentTermMatrix(train_corpus, control=list(dictionary = fifty_times_words))

test_tdm_2<- DocumentTermMatrix(test_corpus, control=list(dictionary = fifty_times_words))
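The idea behind the dictionary step can be sketched in base R on small dense matrices: pick the terms whose total count in the training data reaches a threshold, then restrict both training and test data to that same vocabulary. The matrices, the threshold min_count and the names train_small and test_small below are hypothetical and only illustrate the principle.

```r
# Toy count matrices (hypothetical): rows = documents, columns = terms
train_counts <- matrix(
  c(3, 0, 1,
    2, 1, 0,
    4, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("free", "meeting", "rare"))
)
test_counts <- matrix(
  c(1, 2, 5),
  nrow = 1,
  dimnames = list("doc4", c("free", "meeting", "rare"))
)

min_count <- 2   # plays the role of the threshold 50 in the report

# Keep terms whose total count in the TRAINING data reaches the threshold
keep_terms <- colnames(train_counts)[colSums(train_counts) >= min_count]

# Apply the same vocabulary to both sets so their columns line up
train_small <- train_counts[, keep_terms, drop = FALSE]
test_small <- test_counts[, keep_terms, drop = FALSE]
keep_terms
```

Choosing the vocabulary from the training data alone keeps information from the test set out of the feature selection.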

5.3 Model Development

We need to train a classifier on the labeled training data.

# convert the DocumentTermMatrix to a data frame so it can be passed to naiveBayes
class(train_tdm_2)
## [1] "DocumentTermMatrix"    "simple_triplet_matrix"
train_tdm_3 <- as.matrix(train_tdm_2)
train_tdm_3 <- as.data.frame(train_tdm_3)
class(train_tdm_3)
## [1] "data.frame"
classifier <- naiveBayes(train_tdm_3, factor(train_ham_or_spam$email))
class(classifier)
## [1] "naiveBayes"
class(test_tdm_2)
## [1] "DocumentTermMatrix"    "simple_triplet_matrix"
test_tdm_3 <- as.matrix(test_tdm_2)
test_tdm_3 <- as.data.frame(test_tdm_3)
class(test_tdm_3)
## [1] "data.frame"
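One point that may be worth considering: naiveBayes from e1071 models numeric columns with a Gaussian distribution, which is not an obvious fit for word counts. A commonly used alternative is to recode each count as a Yes/No factor indicating whether the term appears at all. The sketch below shows that recoding on a hypothetical toy data frame; convert_counts is an illustrative helper, not a function from any package, and this step is not part of the report's pipeline.

```r
# Hypothetical helper: recode a vector of counts as a two-level factor
convert_counts <- function(x) {
  factor(ifelse(x > 0, "Yes", "No"), levels = c("No", "Yes"))
}

# Toy data frame of term counts (hypothetical), one column per term
toy_counts <- data.frame(free = c(2, 0, 1), meeting = c(0, 3, 0))

# Apply the helper column by column, keeping the data frame structure
toy_factors <- as.data.frame(lapply(toy_counts, convert_counts))
str(toy_factors)
```

The same helper would have to be applied to both the training and the test data so that the classifier sees the same kind of feature in each.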

6 Prediction

We can use the predict() function to apply the model to the test data. The call is shown here without output:

test_pred <- predict(classifier, newdata = test_tdm_3)
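Once predictions are available, a confusion matrix is a simple way to evaluate them. The sketch below uses hypothetical predicted and actual labels for six rows; in the report, these would correspond to test_pred and test_ham_or_spam$email. The numbers are made up for illustration and say nothing about the real model's performance.

```r
# Hypothetical predicted and true labels for six test rows
levels_used <- c("ham", "spam")
actual <- factor(c("ham", "ham", "spam", "ham", "spam", "ham"), levels = levels_used)
predicted <- factor(c("ham", "spam", "spam", "ham", "ham", "ham"), levels = levels_used)

# Confusion matrix: rows = predicted class, columns = actual class
confusion <- table(predicted = predicted, actual = actual)
confusion

# Accuracy: share of rows on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)

# Recall for spam: share of actual spam that was flagged as spam
spam_recall <- confusion["spam", "spam"] / sum(confusion[, "spam"])
```

Because ham is much more common than spam in this corpus, accuracy alone can look good even for a weak model, so a per-class measure such as recall is a useful complement.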

7 Conclusion

We built a Naive Bayes classifier (a supervised technique) that can generate predictions of whether an e-mail is ham or spam. A next step is to test it against the held-out data and evaluate the model’s performance.

Note: Unfortunately, I ran into many code efficiency issues on this project. Most of the time I was not able to write efficient code, and when I reviewed the error messages I found that my code was using a lot of memory. For example, I had to change the class type to make the classifier work.
