Consider the dataset ‘Amazon reviews’, which contains reviews of the ‘Philips Avent 3 Pack 9oz Bottles’ (baby bottles) along with the ratings given by users. Perform the following steps to classify the reviews as good or bad (1 indicates a positive review and 0 indicates a negative review).

a) Read the given Amazon_reviews.csv file into a variable named ‘product_review’.

library(data.table)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
setwd('/Users/faiz/Downloads/_POP/KLU/bigdataanalytics/Session10')
list.files()
##  [1] "Amazon Reviews.Rmd"  "amazon_reviews.csv"  "amazon_reviews.xlsx"
##  [4] "Amazon-Reviews.pdf"  "Amazon-Reviews.Rmd"  "mydata.csv"         
##  [7] "spam"                "SpamDetection.Rmd"   "test_data.csv"      
## [10] "tr_data.csv"         "train_data.csv"
product_review <- fread('amazon_reviews.csv', showProgress = FALSE)
product_review <- product_review %>% 
  rename(review = Description)
head(product_review)
##                Name Rating       Date
##              <char>  <num>     <char>
## 1:           ashley      5 28/02/2020
## 2:           B.Fish      5 23/12/2019
## 3:  Amazon Customer      1 02/06/2018
## 4: Shelbie Anderson      5 10/12/2020
## 5:        alexandra      5 16/12/2019
## 6:      Dave Gerber      5 20/06/2020
##                                                                                                                                                                                                                                                                                                                                                                                                                                                             review
##                                                                                                                                                                                                                                                                                                                                                                                                                                                             <char>
## 1:                                                                                                                                                                                   One of the nipples caved in a lot so that’s why it’s 4 stars but when I bought new ones the bottles worked great. Easy to clean and maintain. However, if you do use 9 oz... it is very hard to get the formula and water together without spilling out of the top\nRead more
## 2:                                                                                                I bought these after using the 4oz size. Everything is interchangeable which is great! These are the only style bottle my baby likes and as he got bigger I needed a larger size. I use both these and the 4oz with no complaints. They are a but big for my 4 month old to hold but that's no big deal, he hasn't really gotten a hang of it anyway.\nRead more
## 3: My daughter had been using the Kiinde bottles since birth, but we quickly grew tired of paying for their one time use bottle bags. These Avent Natural Nipple Bottles were the perfect replacement!! No milk leaks out of the sides of her mouth unlike with other ""hourglass"" shaped nipples. Have had for a few months now and so far so good! easy to take apart, clean and put back together again and the measurments haven't washed off yet!\nRead more
## 4:                                                                                                                                                                                                                                                                                                                                                                                                      We’ve used these for a long time. Good product.\nRead more
## 5:                                                                                                                                                                                                                                                                            These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
## 6:                                                                                                                                                                                                                                                             My baby boy loves breastfeeding. My husband use these  bottles to feed him at night when he helps me. My baby boys loves these nipples.  We tried other nipples and he didn’t like them.\nRead more
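The read-and-rename step can be tried without the original file. The following is a minimal base R sketch, assuming a CSV with the same four column names as above; the two rows and the object name `toy_review` are made up for illustration and are not taken from the dataset.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical rows).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Rating,Date,Description",
             "reviewer_a,5,01/01/2020,Works well",
             "reviewer_b,1,02/01/2020,Leaked"),
           csv_path)

# Read the file, keeping text columns as character vectors.
toy_review <- read.csv(csv_path, stringsAsFactors = FALSE)

# Rename the Description column to review, as done above with dplyr::rename().
names(toy_review)[names(toy_review) == "Description"] <- "review"

str(toy_review)
```

Reading from an explicit path like this also avoids depending on `setwd()`, which ties a script to one machine's directory layout.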

b) Display the table of ratings, i.e., the count of each rating value.

library(dplyr)

table_of_ratings <- product_review %>% group_by(Rating) %>% summarize(count=n())

table_of_ratings
## # A tibble: 5 × 2
##   Rating count
##    <dbl> <int>
## 1      1    62
## 2      2    95
## 3      3    21
## 4      4   163
## 5      5   979
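The same counts can also be produced with base R's `table()`. The sketch below rebuilds only the Rating column from the counts printed above; `toy_reviews` is a stand-in, not the real `product_review` object.

```r
# Stand-in data: one row per review, with ratings repeated according to
# the counts shown in the tibble above (62, 95, 21, 163, 979).
toy_reviews <- data.frame(Rating = rep(1:5, times = c(62, 95, 21, 163, 979)))

# table() counts how often each distinct rating occurs.
rating_counts <- table(toy_reviews$Rating)
print(rating_counts)

# Total number of reviews across all ratings.
total_reviews <- sum(rating_counts)
```

With dplyr, `group_by(Rating) %>% summarize(count = n())` can usually be shortened to `count(Rating)`, which names its count column `n`.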

c) We can observe that the ratings are concentrated toward the ends of the scale, with very few 3s. Generally, people write reviews when they are either very happy with a product or dislike it. Add a new column named ‘rating_new’ containing ‘1’ for good reviews, i.e., the records whose rating is in {4, 5}, and ‘0’ for bad reviews, i.e., the records whose rating is in {1, 2}. Discard the records where the rating is equal to 3.

Filter out the rows with a rating of 3, since neutral reviews give no clear good/bad signal.

product_review_filtered <- product_review %>% filter(Rating != 3)
product_review_filtered
##                   Name Rating       Date
##                 <char>  <num>     <char>
##    1:           ashley      5 28/02/2020
##    2:           B.Fish      5 23/12/2019
##    3:  Amazon Customer      1 02/06/2018
##    4: Shelbie Anderson      5 10/12/2020
##    5:        alexandra      5 16/12/2019
##   ---                                   
## 1295: Shelbie Anderson      5 10/12/2020
## 1296:        alexandra      5 16/12/2019
## 1297:      Dave Gerber      2 20/06/2020
## 1298:            Jinju      4 20/05/2019
## 1299:           Bianca      5 22/12/2019
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       review
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       <char>
##    1:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          One of the nipples caved in a lot so that’s why it’s 4 stars but when I bought new ones the bottles worked great. Easy to clean and maintain. However, if you do use 9 oz... it is very hard to get the formula and water together without spilling out of the top\nRead more
##    2:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       I bought these after using the 4oz size. Everything is interchangeable which is great! These are the only style bottle my baby likes and as he got bigger I needed a larger size. I use both these and the 4oz with no complaints. They are a but big for my 4 month old to hold but that's no big deal, he hasn't really gotten a hang of it anyway.\nRead more
##    3:                                                                                                                                                                                                                                                                                                                                                                                                                                                        My daughter had been using the Kiinde bottles since birth, but we quickly grew tired of paying for their one time use bottle bags. These Avent Natural Nipple Bottles were the perfect replacement!! No milk leaks out of the sides of her mouth unlike with other ""hourglass"" shaped nipples. Have had for a few months now and so far so good! easy to take apart, clean and put back together again and the measurments haven't washed off yet!\nRead more
##    4:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             We’ve used these for a long time. Good product.\nRead more
##    5:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
##   ---                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## 1295:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             We’ve used these for a long time. Good product.\nRead more
## 1296:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
## 1297:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    My baby boy loves breastfeeding. My husband use these  bottles to feed him at night when he helps me. My baby boys loves these nipples.  We tried other nipples and he didn’t like them.\nRead more
## 1298: My third child loves this bottle. Takes this with a grain of salt, though, because all of our kids have preferred different bottles across different brands. I don't think him preferring this bottle makes it superior to all other bottles--it just makes it superior for him.For our purposes, this is a great bottle. The measurements on the outside are accurate and the writing hasn't washed off or smudged at all. We have microwaved and dishwashed these bottles regularly and they've held up well.My only complaint about these bottles is that they come with the size 2 nipples. By the time we had bought these bottles, our kid had already moved on to size 3. So we had lots of size 2 nipples that we couldn't use and needed to purchase additional nipples for all of our bottles. I wish there were some way to select the size nipples that one wants to go with their new bottles.\nRead more
## 1299:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Love these bottles ! Make sure that the little indent lines up with the notch on the inside of the nipple. That’s what controls the airflow for less air in the bottle\nRead more
table_of_ratings_filtered <- product_review_filtered %>% group_by(Rating) %>% summarize(count=n())

table_of_ratings_filtered
## # A tibble: 4 × 2
##   Rating count
##    <dbl> <int>
## 1      1    62
## 2      2    95
## 3      4   163
## 4      5   979

Add a new column named ‘rating_new’, with ‘1’ for good reviews (ratings in {4, 5}) and ‘0’ for bad reviews (ratings in {1, 2}).

library(dplyr)
# Rating is numeric, so compare against numbers rather than strings
product_review_modified <- product_review_filtered %>%
  mutate(rating_new = case_when(Rating %in% c(1, 2) ~ 0,  # bad review
                                Rating %in% c(4, 5) ~ 1   # good review
                                ))

product_review_modified
##                   Name Rating       Date
##                 <char>  <num>     <char>
##    1:           ashley      5 28/02/2020
##    2:           B.Fish      5 23/12/2019
##    3:  Amazon Customer      1 02/06/2018
##    4: Shelbie Anderson      5 10/12/2020
##    5:        alexandra      5 16/12/2019
##   ---                                   
## 1295: Shelbie Anderson      5 10/12/2020
## 1296:        alexandra      5 16/12/2019
## 1297:      Dave Gerber      2 20/06/2020
## 1298:            Jinju      4 20/05/2019
## 1299:           Bianca      5 22/12/2019
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       review
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       <char>
##    1:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          One of the nipples caved in a lot so that’s why it’s 4 stars but when I bought new ones the bottles worked great. Easy to clean and maintain. However, if you do use 9 oz... it is very hard to get the formula and water together without spilling out of the top\nRead more
##    2:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       I bought these after using the 4oz size. Everything is interchangeable which is great! These are the only style bottle my baby likes and as he got bigger I needed a larger size. I use both these and the 4oz with no complaints. They are a but big for my 4 month old to hold but that's no big deal, he hasn't really gotten a hang of it anyway.\nRead more
##    3:                                                                                                                                                                                                                                                                                                                                                                                                                                                        My daughter had been using the Kiinde bottles since birth, but we quickly grew tired of paying for their one time use bottle bags. These Avent Natural Nipple Bottles were the perfect replacement!! No milk leaks out of the sides of her mouth unlike with other ""hourglass"" shaped nipples. Have had for a few months now and so far so good! easy to take apart, clean and put back together again and the measurments haven't washed off yet!\nRead more
##    4:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             We’ve used these for a long time. Good product.\nRead more
##    5:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
##   ---                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
## 1295:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             We’ve used these for a long time. Good product.\nRead more
## 1296:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
## 1297:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    My baby boy loves breastfeeding. My husband use these  bottles to feed him at night when he helps me. My baby boys loves these nipples.  We tried other nipples and he didn’t like them.\nRead more
## 1298: My third child loves this bottle. Takes this with a grain of salt, though, because all of our kids have preferred different bottles across different brands. I don't think him preferring this bottle makes it superior to all other bottles--it just makes it superior for him.For our purposes, this is a great bottle. The measurements on the outside are accurate and the writing hasn't washed off or smudged at all. We have microwaved and dishwashed these bottles regularly and they've held up well.My only complaint about these bottles is that they come with the size 2 nipples. By the time we had bought these bottles, our kid had already moved on to size 3. So we had lots of size 2 nipples that we couldn't use and needed to purchase additional nipples for all of our bottles. I wish there were some way to select the size nipples that one wants to go with their new bottles.\nRead more
## 1299:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Love these bottles ! Make sure that the little indent lines up with the notch on the inside of the nipple. That’s what controls the airflow for less air in the bottle\nRead more
##       rating_new
##            <num>
##    1:          1
##    2:          1
##    3:          0
##    4:          1
##    5:          1
##   ---           
## 1295:          1
## 1296:          1
## 1297:          0
## 1298:          1
## 1299:          1

Updated rating counts:

table_of_ratings_modified <- product_review_modified %>% group_by(rating_new) %>% summarize(count=n())

table_of_ratings_modified
## # A tibble: 2 × 2
##   rating_new count
##        <dbl> <int>
## 1          0   157
## 2          1  1142
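As a sanity check on the labelling, the filter and the new column can be reproduced in base R and cross-tabulated against the original rating. This is a sketch on stand-in data built from the rating counts in part b); the object names are illustrative.

```r
# Stand-in data with the rating counts from part b).
toy_reviews <- data.frame(Rating = rep(1:5, times = c(62, 95, 21, 163, 979)))

# Drop the neutral reviews (rating 3).
toy_filtered <- toy_reviews[toy_reviews$Rating != 3, , drop = FALSE]

# Label ratings 4 and 5 as 1 (good) and ratings 1 and 2 as 0 (bad).
toy_filtered$rating_new <- as.numeric(toy_filtered$Rating >= 4)

# Cross-tabulate: each rating should fall under exactly one label.
label_check <- table(Rating = toy_filtered$Rating,
                     rating_new = toy_filtered$rating_new)
print(label_check)
```

The column totals agree with the counts above: 62 + 95 = 157 bad reviews and 163 + 979 = 1142 good reviews, so the two classes are quite imbalanced.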

d) Divide the cleaned dataset into two parts, a training set and a test set, where the training set contains the initial 80% of the records and the test set contains the rest.
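A positional split of this kind can be sketched in base R with `head()` and `tail()`. The ten-row data frame below is hypothetical. A random split (for example with `caTools::sample.split`) would not keep the initial records together, so this sketch relies on row order only.

```r
# Hypothetical cleaned data: 10 reviews with a binary label.
toy_cleaned <- data.frame(
  review = paste("review", 1:10),
  rating_new = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Number of rows in the training set: the first 80%, rounded down.
n_train <- floor(0.8 * nrow(toy_cleaned))

# The first n_train rows form the training set, the remaining rows the test set.
train_set <- head(toy_cleaned, n_train)
test_set <- tail(toy_cleaned, nrow(toy_cleaned) - n_train)

nrow(train_set)
nrow(test_set)
```

Applied to the 1299 filtered records, the same rule would give floor(0.8 × 1299) = 1039 training rows and 260 test rows.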

library(caTools)
review_cleaned <- subset(product_review_modified, select = -c(Rating) )
head(review_cleaned)
##                Name       Date
##              <char>     <char>
## 1:           ashley 28/02/2020
## 2:           B.Fish 23/12/2019
## 3:  Amazon Customer 02/06/2018
## 4: Shelbie Anderson 10/12/2020
## 5:        alexandra 16/12/2019
## 6:      Dave Gerber 20/06/2020
##                                                                                                                                                                                                                                                                                                                                                                                                                                                             review
##                                                                                                                                                                                                                                                                                                                                                                                                                                                             <char>
## 1:                                                                                                                                                                                   One of the nipples caved in a lot so that’s why it’s 4 stars but when I bought new ones the bottles worked great. Easy to clean and maintain. However, if you do use 9 oz... it is very hard to get the formula and water together without spilling out of the top\nRead more
## 2:                                                                                                I bought these after using the 4oz size. Everything is interchangeable which is great! These are the only style bottle my baby likes and as he got bigger I needed a larger size. I use both these and the 4oz with no complaints. They are a but big for my 4 month old to hold but that's no big deal, he hasn't really gotten a hang of it anyway.\nRead more
## 3: My daughter had been using the Kiinde bottles since birth, but we quickly grew tired of paying for their one time use bottle bags. These Avent Natural Nipple Bottles were the perfect replacement!! No milk leaks out of the sides of her mouth unlike with other ""hourglass"" shaped nipples. Have had for a few months now and so far so good! easy to take apart, clean and put back together again and the measurments haven't washed off yet!\nRead more
## 4:                                                                                                                                                                                                                                                                                                                                                                                                      We’ve used these for a long time. Good product.\nRead more
## 5:                                                                                                                                                                                                                                                                            These were mine and my daughters favorite when she was younger, so we repurchased them for little sister! No leaks, easy to clean, and never caused any nipple confusion.\nRead more
## 6:                                                                                                                                                                                                                                                             My baby boy loves breastfeeding. My husband use these  bottles to feed him at night when he helps me. My baby boys loves these nipples.  We tried other nipples and he didn’t like them.\nRead more
##    rating_new
##         <num>
## 1:          1
## 2:          1
## 3:          0
## 4:          1
## 5:          1
## 6:          1
set.seed(123) 
split = sample.split(review_cleaned$rating_new, SplitRatio = 0.75) 
  
training_data = subset(review_cleaned, split == TRUE) 
testing_data = subset(review_cleaned, split == FALSE) 

dim(training_data)
## [1] 974   4
dim(testing_data)
## [1] 325   4
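
The task statement asks for the first 80% of records as the training set, whereas sample.split draws a random split stratified on the label. A minimal sketch of the sequential variant in base R, shown on a toy plain data frame (‘toy_reviews’ and ‘split_first_fraction’ are hypothetical names, not part of the original analysis):

```r
# Split a data frame by position: first `fraction` of rows vs. the rest
split_first_fraction <- function(data, fraction = 0.8) {
  n_train <- floor(fraction * nrow(data))
  list(
    train = data[seq_len(n_train), , drop = FALSE],
    test  = data[seq_len(nrow(data) - n_train) + n_train, , drop = FALSE]
  )
}

# Toy stand-in for the cleaned reviews
toy_reviews <- data.frame(
  review = paste("review", 1:10),
  rating_new = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1)
)

parts <- split_first_fraction(toy_reviews)
dim(parts$train)
dim(parts$test)
```

Applied to the 1299 reviews here, floor(0.8 * 1299) would give 1039 training rows and 260 test rows. Unlike sample.split, this position-based split makes no attempt to preserve the 0/1 ratio in each part.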

e) Import the tm library. Then take the ‘review’ column, create a corpus, and tokenize it by constructing a document-term matrix.

library(tm)
## Loading required package: NLP
library(NLP)
#library(tidytext)
#library(tidyr)
reviews_train_vs <- VectorSource(training_data$review)
reviews_test_vs <- VectorSource(testing_data$review)

review_train_corpus <- Corpus(reviews_train_vs)
review_test_corpus <- Corpus(reviews_test_vs)

# Use class function to know the data type of the variable
class(review_train_corpus)
## [1] "SimpleCorpus" "Corpus"
# Print the structure of the Corpus
print(review_train_corpus)
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 974
print(review_test_corpus)
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 325

Creating DTMs. There are three necessary steps: (1) tokenize, (2) create the vocabulary, and (3) match and count.
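
The three steps can be sketched by hand in base R. This is only an illustration on three made-up documents (‘docs’, ‘tokens’, ‘vocab’ and ‘toy_dtm’ are illustrative names); tm's DocumentTermMatrix performs the equivalent work internally and stores the result sparsely.

```r
docs <- c("easy to clean", "clean bottles leak", "easy easy bottles")

# (1) Tokenize: split each document on whitespace
tokens <- strsplit(docs, "\\s+")

# (2) Create vocabulary: sorted unique tokens across all documents
vocab <- sort(unique(unlist(tokens)))

# (3) Match and count: one row per document, one column per term
toy_dtm <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(toy_dtm) <- vocab
toy_dtm
```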

library(lattice)
training_data$textLength <- nchar(training_data$review)
hist(training_data$textLength)

histogram(~textLength, data = training_data)

* To view the contents of the review Corpus

inspect(review_train_corpus[1])
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 1
## 
## [1] One of the nipples caved in a lot so that’s why it’s 4 stars but when I bought new ones the bottles worked great. Easy to clean and maintain. However, if you do use 9 oz... it is very hard to get the formula and water together without spilling out of the top\nRead more
#inspect(review_test_corpus[1])

Function to clean the text:

Remove the punctuation and numbers, store the result in matrix format, and name it ‘training_set_toy’.

stopwords()
##   [1] "i"          "me"         "my"         "myself"     "we"        
##   [6] "our"        "ours"       "ourselves"  "you"        "your"      
##  [11] "yours"      "yourself"   "yourselves" "he"         "him"       
##  [16] "his"        "himself"    "she"        "her"        "hers"      
##  [21] "herself"    "it"         "its"        "itself"     "they"      
##  [26] "them"       "their"      "theirs"     "themselves" "what"      
##  [31] "which"      "who"        "whom"       "this"       "that"      
##  [36] "these"      "those"      "am"         "is"         "are"       
##  [41] "was"        "were"       "be"         "been"       "being"     
##  [46] "have"       "has"        "had"        "having"     "do"        
##  [51] "does"       "did"        "doing"      "would"      "should"    
##  [56] "could"      "ought"      "i'm"        "you're"     "he's"      
##  [61] "she's"      "it's"       "we're"      "they're"    "i've"      
##  [66] "you've"     "we've"      "they've"    "i'd"        "you'd"     
##  [71] "he'd"       "she'd"      "we'd"       "they'd"     "i'll"      
##  [76] "you'll"     "he'll"      "she'll"     "we'll"      "they'll"   
##  [81] "isn't"      "aren't"     "wasn't"     "weren't"    "hasn't"    
##  [86] "haven't"    "hadn't"     "doesn't"    "don't"      "didn't"    
##  [91] "won't"      "wouldn't"   "shan't"     "shouldn't"  "can't"     
##  [96] "cannot"     "couldn't"   "mustn't"    "let's"      "that's"    
## [101] "who's"      "what's"     "here's"     "there's"    "when's"    
## [106] "where's"    "why's"      "how's"      "a"          "an"        
## [111] "the"        "and"        "but"        "if"         "or"        
## [116] "because"    "as"         "until"      "while"      "of"        
## [121] "at"         "by"         "for"        "with"       "about"     
## [126] "against"    "between"    "into"       "through"    "during"    
## [131] "before"     "after"      "above"      "below"      "to"        
## [136] "from"       "up"         "down"       "in"         "out"       
## [141] "on"         "off"        "over"       "under"      "again"     
## [146] "further"    "then"       "once"       "here"       "there"     
## [151] "when"       "where"      "why"        "how"        "all"       
## [156] "any"        "both"       "each"       "few"        "more"      
## [161] "most"       "other"      "some"       "such"       "no"        
## [166] "nor"        "not"        "only"       "own"        "same"      
## [171] "so"         "than"       "too"        "very"
review_corpus_clean <- function(corpus) {
  # convert text to lower case
  corpus_clean <- tm_map(corpus, tolower)

  # remove numbers from text
  corpus_clean <- tm_map(corpus_clean, removeNumbers)

  # remove punctuation marks such as ? and .
  corpus_clean <- tm_map(corpus_clean, removePunctuation)

  # remove stop words
  corpus_clean <- tm_map(corpus_clean, removeWords, stopwords())

  # collapse extra whitespace characters
  corpus_clean <- tm_map(corpus_clean, stripWhitespace)

  return(corpus_clean)
}
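
The cleaned review inspected further down still contains ‘didn’t’, and every scraped review ends in ‘Read more’, so the typographic apostrophe and that trailing marker both appear to survive the cleaning steps. A minimal base-R sketch of a pre-cleaning pass that could be applied to the raw review text before the corpus is built (‘strip_review_residue’ is a hypothetical helper, not part of the original script):

```r
strip_review_residue <- function(text) {
  # Replace typographic apostrophes (U+2018, U+2019) with the ASCII one
  text <- gsub("\u2018", "'", text, fixed = TRUE)
  text <- gsub("\u2019", "'", text, fixed = TRUE)
  # Drop the trailing "Read more" marker left over from scraping
  text <- sub("\\s*Read more\\s*$", "", text)
  text
}

strip_review_residue("We\u2019ve used these for a long time. Good product.\nRead more")
```

With ASCII apostrophes in place, contractions such as "didn't" match the entries in stopwords(), and the term ‘read’ no longer appears in every document.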

Cleaning Train and Test Review Corpus

review_train_corpus_clean <-review_corpus_clean(review_train_corpus)
## Warning in tm_map.SimpleCorpus(corpus, tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removeNumbers): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removePunctuation): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removeWords, stopwords()):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, stripWhitespace): transformation
## drops documents
#inspect(review_train_corpus_clean[1])
review_test_corpus_clean <-review_corpus_clean(review_test_corpus)
## Warning in tm_map.SimpleCorpus(corpus, tolower): transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removeNumbers): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removePunctuation): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, removeWords, stopwords()):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus_clean, stripWhitespace): transformation
## drops documents
inspect(review_train_corpus_clean[15])
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 1
## 
## [1]  baby boy loves breastfeeding husband use bottles feed night helps baby boys loves nipples tried nipples didn’t like read

Document Term Matrix:

training_set_toy <- DocumentTermMatrix(review_train_corpus_clean)

dim(training_set_toy)
## [1] 974 164
testing_set_toy <- DocumentTermMatrix(review_test_corpus_clean)

inspect(testing_set_toy)
## <<DocumentTermMatrix (documents: 325, terms: 164)>>
## Non-/sparse entries: 8648/44652
## Sparsity           : 84%
## Maximal term length: 15
## Weighting          : term frequency (tf)
## Sample             :
##     Terms
## Docs baby bottle bottles loves nipples one read size time use
##   15    0      3       6     1       4   1    1    4    1   1
##   26    0      3       6     1       4   1    1    4    1   1
##   30    0      3       6     1       4   1    1    4    1   1
##   34    0      3       6     1       4   1    1    4    1   1
##   42    0      3       6     1       4   1    1    4    1   1
##   54    0      3       6     1       4   1    1    4    1   1
##   59    0      3       6     1       4   1    1    4    1   1
##   72    0      3       6     1       4   1    1    4    1   1
##   82    0      3       6     1       4   1    1    4    1   1
##   83    0      3       6     1       4   1    1    4    1   1
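
Because the training and test DTMs are built independently, their term columns are not guaranteed to match, and a model fitted on one can only score the other if they do. A minimal base-R sketch of aligning a test count matrix to the training vocabulary, on toy matrices (‘align_to_vocab’ is a hypothetical helper; tm also accepts a ‘dictionary’ entry in the control list of DocumentTermMatrix for the same purpose):

```r
# Reorder/pad the columns of `counts` so they match `vocab` exactly.
# Terms missing from `counts` get zero counts; terms not in `vocab` are dropped.
align_to_vocab <- function(counts, vocab) {
  aligned <- matrix(0L, nrow = nrow(counts), ncol = length(vocab),
                    dimnames = list(rownames(counts), vocab))
  shared <- intersect(colnames(counts), vocab)
  aligned[, shared] <- counts[, shared, drop = FALSE]
  aligned
}

train_vocab <- c("bottles", "clean", "easy")
toy_test <- matrix(c(1L, 0L, 2L, 1L), nrow = 2,
                   dimnames = list(NULL, c("easy", "leak")))

aligned_test <- align_to_vocab(toy_test, train_vocab)
aligned_test
```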

Converting the rating_new column to a data frame.

rating_df <- function(data) {
  # keep only the label column and name it 'y'
  data <- data %>%
    rename(y = rating_new)
  data <- as.data.frame(data$y)
  colnames(data) <- c('y')
  return(data)
}
rating_train_df <- rating_df(training_data)

Binding the labels with the Document Term Matrix and reading the result back as the training data frame

x <- cbind(rating_train_df$y, training_set_toy)
dtmMatrix <- as.matrix(x)
write.csv(dtmMatrix, "train_data.csv")
list.files()
##  [1] "Amazon Reviews.Rmd"   "amazon_reviews.csv"   "amazon_reviews.xlsx" 
##  [4] "Amazon-Reviews_files" "Amazon-Reviews.pdf"   "Amazon-Reviews.Rmd"  
##  [7] "mydata.csv"           "spam"                 "SpamDetection.Rmd"   
## [10] "test_data.csv"        "tr_data.csv"          "train_data.csv"
library(data.table)
df_train <- fread('train_data.csv',showProgress = FALSE)
head(df_train)
##       V1    V2 bottles bought caved clean  easy formula   get great  hard
##    <int> <int>   <int>  <int> <int> <int> <int>   <int> <int> <int> <int>
## 1:     1     1       1      1     1     1     1       1     1     1     1
## 2:     2     0       2      0     0     1     1       0     0     0     0
## 3:     3     1       0      0     0     0     0       0     0     0     0
## 4:     4     1       6      1     0     0     0       0     0     1     0
## 5:     5     1       1      0     0     0     0       0     0     0     0
## 6:     6     1       0      1     0     0     0       0     0     1     0
##    however   lot maintain   new nipples   one  ones  read spilling stars
##      <int> <int>    <int> <int>   <int> <int> <int> <int>    <int> <int>
## 1:       1     1        1     1       1     1     1     1        1     1
## 2:       0     0        0     0       1     1     0     1        0     0
## 3:       0     0        0     0       0     0     0     1        0     0
## 4:       0     0        0     1       4     1     0     1        0     0
## 5:       0     0        0     0       0     0     0     1        0     0
## 6:       0     0        0     0       0     0     0     1        0     0
##    together   top   use water without worked    ’s apart avent  back  bags
##       <int> <int> <int> <int>   <int>  <int> <int> <int> <int> <int> <int>
## 1:        1     1     1     1       1      1     2     0     0     0     0
## 2:        1     0     1     0       0      0     0     1     1     1     1
## 3:        0     0     0     0       0      0     0     0     0     0     0
## 4:        0     0     1     0       0      0     0     0     0     0     0
## 5:        0     0     0     0       0      0     1     0     0     0     0
## 6:        0     0     1     0       0      0     0     0     0     0     0
##    birth bottle daughter   far  good  grew havent hourglass kiinde leaks
##    <int>  <int>    <int> <int> <int> <int>  <int>     <int>  <int> <int>
## 1:     0      0        0     0     0     0      0         0      0     0
## 2:     1      1        1     1     1     1      1         1      1     1
## 3:     0      0        0     0     1     0      0         0      0     0
## 4:     0      3        0     0     0     0      0         0      0     0
## 5:     0      1        0     0     0     0      0         0      0     0
## 6:     0      1        0     0     0     0      0         0      0     0
##    measurments  milk months mouth natural nipple   now paying perfect   put
##          <int> <int>  <int> <int>   <int>  <int> <int>  <int>   <int> <int>
## 1:           0     0      0     0       0      0     0      0       0     0
## 2:           1     1      1     1       1      1     1      1       1     1
## 3:           0     0      0     0       0      0     0      0       0     0
## 4:           0     0      0     0       0      0     0      0       0     0
## 5:           0     0      0     0       0      1     0      0       0     0
## 6:           0     0      0     0       0      0     0      0       0     0
##    quickly replacement shaped sides since  take  time tired unlike using washed
##      <int>       <int>  <int> <int> <int> <int> <int> <int>  <int> <int>  <int>
## 1:       0           0      0     0     0     0     0     0      0     0      0
## 2:       1           1      1     1     1     1     1     1      1     1      1
## 3:       0           0      0     0     0     0     1     0      0     0      0
## 4:       0           0      0     0     0     0     1     0      0     0      1
## 5:       0           0      0     0     0     0     0     0      0     0      0
## 6:       0           0      0     0     0     0     0     0      0     1      0
##      yet  long product  used   ’ve accurate across additional already bottlesit
##    <int> <int>   <int> <int> <int>    <int>  <int>      <int>   <int>     <int>
## 1:     0     0       0     0     0        0      0          0       0         0
## 2:     1     0       0     0     0        0      0          0       0         0
## 3:     0     1       1     1     1        0      0          0       0         0
## 4:     0     0       0     0     0        1      1          1       1         1
## 5:     0     0       0     0     0        0      0          0       0         0
## 6:     0     0       0     0     0        0      0          0       0         0
##    brands child  come complaint couldnt different dishwashed  dont grain hasnt
##     <int> <int> <int>     <int>   <int>     <int>      <int> <int> <int> <int>
## 1:      0     0     0         0       0         0          0     0     0     0
## 2:      0     0     0         0       0         0          0     0     0     0
## 3:      0     0     0         0       0         0          0     0     0     0
## 4:      1     1     1         1       1         2          1     1     1     1
## 5:      0     0     0         0       0         0          0     0     0     0
## 6:      0     0     0         0       0         0          0     0     0     1
##     held himfor  just   kid  kids  lots loves makes measurements microwaved
##    <int>  <int> <int> <int> <int> <int> <int> <int>        <int>      <int>
## 1:     0      0     0     0     0     0     0     0            0          0
## 2:     0      0     0     0     0     0     0     0            0          0
## 3:     0      0     0     0     0     0     0     0            0          0
## 4:     1      1     1     1     1     1     1     2            1          1
## 5:     0      0     0     0     0     0     0     0            0          0
## 6:     0      0     0     0     0     0     0     0            0          0
##    moved needed outside preferred preferring purchase purposes regularly  salt
##    <int>  <int>   <int>     <int>      <int>    <int>    <int>     <int> <int>
## 1:     0      0       0         0          0        0        0         0     0
## 2:     0      0       0         0          0        0        0         0     0
## 3:     0      0       0         0          0        0        0         0     0
## 4:     1      1       1         1          1        1        1         1     1
## 5:     0      0       0         0          0        0        0         0     0
## 6:     0      1       0         0          0        0        0         0     0
##    select  size smudged superior takes theyve think third though wants   way
##     <int> <int>   <int>    <int> <int>  <int> <int> <int>  <int> <int> <int>
## 1:      0     0       0        0     0      0     0     0      0     0     0
## 2:      0     0       0        0     0      0     0     0      0     0     0
## 3:      0     0       0        0     0      0     0     0      0     0     0
## 4:      1     4       1        2     1      1     1     1      1     1     1
## 5:      0     0       0        0     0      0     0     0      0     0     0
## 6:      0     2       0        0     0      0     0     0      0     0     0
##    wellmy  wish writing   air airflow controls indent inside  less lines little
##     <int> <int>   <int> <int>   <int>    <int>  <int>  <int> <int> <int>  <int>
## 1:      0     0       0     0       0        0      0      0     0     0      0
## 2:      0     0       0     0       0        0      0      0     0     0      0
## 3:      0     0       0     0       0        0      0      0     0     0      0
## 4:      1     1       1     0       0        0      0      0     0     0      0
## 5:      0     0       0     1       1        1      1      1     1     1      1
## 6:      0     0       0     0       0        0      0      0     0     0      0
##     love  make notch  sure anyway  baby   big bigger complaints  deal
##    <int> <int> <int> <int>  <int> <int> <int>  <int>      <int> <int>
## 1:     0     0     0     0      0     0     0      0          0     0
## 2:     0     0     0     0      0     0     0      0          0     0
## 3:     0     0     0     0      0     0     0      0          0     0
## 4:     0     0     0     0      0     0     0      0          0     0
## 5:     1     1     1     1      0     0     0      0          0     0
## 6:     0     0     0     0      1     1     2      1          1     1
##    everything   got gotten  hang  hold interchangeable larger likes month   old
##         <int> <int>  <int> <int> <int>           <int>  <int> <int> <int> <int>
## 1:          0     0      0     0     0               0      0     0     0     0
## 2:          0     0      0     0     0               0      0     0     0     0
## 3:          0     0      0     0     0               0      0     0     0     0
## 4:          0     0      0     0     0               0      0     0     0     0
## 5:          0     0      0     0     0               0      0     0     0     0
## 6:          1     1      1     1     1               1      1     1     1     1
##    really style thats   boy  boys breastfeeding didn’t  feed helps husband
##     <int> <int> <int> <int> <int>         <int>  <int> <int> <int>   <int>
## 1:      0     0     0     0     0             0      0     0     0       0
## 2:      0     0     0     0     0             0      0     0     0       0
## 3:      0     0     0     0     0             0      0     0     0       0
## 4:      0     0     0     0     0             0      0     0     0       0
## 5:      0     0     0     0     0             0      0     0     0       0
## 6:      1     1     1     0     0             0      0     0     0       0
##     like night tried caused confusion daughters favorite  mine never
##    <int> <int> <int>  <int>     <int>     <int>    <int> <int> <int>
## 1:     0     0     0      0         0         0        0     0     0
## 2:     0     0     0      0         0         0        0     0     0
## 3:     0     0     0      0         0         0        0     0     0
## 4:     0     0     0      0         0         0        0     0     0
## 5:     0     0     0      0         0         0        0     0     0
## 6:     0     0     0      0         0         0        0     0     0
##    repurchased sister younger
##          <int>  <int>   <int>
## 1:           0      0       0
## 2:           0      0       0
## 3:           0      0       0
## 4:           0      0       0
## 5:           0      0       0
## 6:           0      0       0
df_train <- df_train%>%
  rename(y = V2)
#str(training_set_toy)
testdtmMatrix <- as.matrix(testing_set_toy)
write.csv(testdtmMatrix, "test_data.csv")
df_test <- fread('test_data.csv',showProgress = FALSE)
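
The ‘V1’ column in df_train appears to be the row index that write.csv writes by default, and ‘V2’ the unnamed label column. A minimal sketch, on a toy matrix and a temporary file, of naming the label before writing and suppressing row names so that the round trip needs no later rename (all object names here are illustrative):

```r
toy_counts <- matrix(c(1L, 0L, 2L, 1L), nrow = 2,
                     dimnames = list(NULL, c("bottles", "clean")))
toy_labels <- c(1L, 0L)

# Name the label column explicitly before binding it to the term counts
toy_train <- data.frame(y = toy_labels, toy_counts)

# row.names = FALSE keeps the row index out of the file
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_train, csv_path, row.names = FALSE)

round_trip <- read.csv(csv_path)
names(round_trip)
```

Writing to a temporary file is only for the illustration; the same two arguments apply when writing train_data.csv and test_data.csv.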

Converting to a factor variable

glimpse(df_train)
## Rows: 974
## Columns: 166
## $ V1              <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ y               <int> 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, …
## $ bottles         <int> 1, 2, 0, 6, 1, 0, 0, 1, 6, 1, 1, 0, 2, 0, 1, 6, 1, 2, …
## $ bought          <int> 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, …
## $ caved           <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ clean           <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, …
## $ easy            <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, …
## $ formula         <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ get             <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ great           <int> 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, …
## $ hard            <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ however         <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ lot             <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ maintain        <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ new             <int> 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, …
## $ nipples         <int> 1, 1, 0, 4, 0, 0, 0, 2, 4, 0, 1, 0, 1, 0, 2, 4, 1, 1, …
## $ one             <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, …
## $ ones            <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ read            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ spilling        <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ stars           <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ together        <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, …
## $ top             <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ use             <int> 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, …
## $ water           <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ without         <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ worked          <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ `’s`            <int> 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 2, 0, …
## $ apart           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ avent           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ back            <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ bags            <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ birth           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ bottle          <int> 0, 1, 0, 3, 1, 1, 0, 0, 3, 1, 0, 1, 1, 0, 0, 3, 0, 1, …
## $ daughter        <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ far             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ good            <int> 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ grew            <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ havent          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ hourglass       <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ kiinde          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ leaks           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, …
## $ measurments     <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ milk            <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ months          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ mouth           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ natural         <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ nipple          <int> 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, …
## $ now             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ paying          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ perfect         <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ put             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ quickly         <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ replacement     <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ shaped          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ sides           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ since           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ take            <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ time            <int> 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, …
## $ tired           <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ unlike          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ using           <int> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, …
## $ washed          <int> 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, …
## $ yet             <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ long            <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ product         <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ used            <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ `’ve`           <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ accurate        <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ across          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ additional      <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ already         <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ bottlesit       <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ brands          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ child           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ come            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ complaint       <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ couldnt         <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ different       <int> 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, …
## $ dishwashed      <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ dont            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ grain           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ hasnt           <int> 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, …
## $ held            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ himfor          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ just            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ kid             <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ kids            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ lots            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ loves           <int> 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 2, 1, 0, 0, …
## $ makes           <int> 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, …
## $ measurements    <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ microwaved      <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ moved           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ needed          <int> 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, …
## $ outside         <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ preferred       <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ preferring      <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ purchase        <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ purposes        <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ regularly       <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ salt            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ select          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ size            <int> 0, 0, 0, 4, 0, 2, 0, 0, 4, 0, 0, 2, 0, 0, 0, 4, 0, 0, …
## $ smudged         <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ superior        <int> 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, …
## $ takes           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ theyve          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ think           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ third           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ though          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ wants           <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ way             <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ wellmy          <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ wish            <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ writing         <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ air             <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ airflow         <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ controls        <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ indent          <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ inside          <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ less            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ lines           <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ little          <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ love            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ make            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ notch           <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ sure            <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ anyway          <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ baby            <int> 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, …
## $ big             <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, …
## $ bigger          <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ complaints      <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ deal            <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ everything      <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ got             <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ gotten          <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ hang            <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ hold            <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ interchangeable <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ larger          <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ likes           <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ month           <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ old             <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ really          <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ style           <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ thats           <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ boy             <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ boys            <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ breastfeeding   <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ `didn’t`        <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ feed            <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ helps           <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ husband         <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ like            <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ night           <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ tried           <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ caused          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ confusion       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ daughters       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ favorite        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ mine            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ never           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ repurchased     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ sister          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ younger         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
df_train$y <- as.factor(df_train$y)
#df$y
training_set_toy <- df_train
dim(training_set_toy)
## [1] 974 166
# Encode the target as a factor with explicit levels 0 (negative) and 1 (positive)
training_set_toy$y <- factor(training_set_toy$y, levels = c(0, 1))
# Optional readable labels: labels = c("Don't Buy", "Buy")
head(training_set_toy)
##       V1      y bottles bought caved clean  easy formula   get great  hard
##    <int> <fctr>   <int>  <int> <int> <int> <int>   <int> <int> <int> <int>
## 1:     1      1       1      1     1     1     1       1     1     1     1
## 2:     2      0       2      0     0     1     1       0     0     0     0
## 3:     3      1       0      0     0     0     0       0     0     0     0
## 4:     4      1       6      1     0     0     0       0     0     1     0
## 5:     5      1       1      0     0     0     0       0     0     0     0
## 6:     6      1       0      1     0     0     0       0     0     1     0
##    however   lot maintain   new nipples   one  ones  read spilling stars
##      <int> <int>    <int> <int>   <int> <int> <int> <int>    <int> <int>
## 1:       1     1        1     1       1     1     1     1        1     1
## 2:       0     0        0     0       1     1     0     1        0     0
## 3:       0     0        0     0       0     0     0     1        0     0
## 4:       0     0        0     1       4     1     0     1        0     0
## 5:       0     0        0     0       0     0     0     1        0     0
## 6:       0     0        0     0       0     0     0     1        0     0
##    together   top   use water without worked    ’s apart avent  back  bags
##       <int> <int> <int> <int>   <int>  <int> <int> <int> <int> <int> <int>
## 1:        1     1     1     1       1      1     2     0     0     0     0
## 2:        1     0     1     0       0      0     0     1     1     1     1
## 3:        0     0     0     0       0      0     0     0     0     0     0
## 4:        0     0     1     0       0      0     0     0     0     0     0
## 5:        0     0     0     0       0      0     1     0     0     0     0
## 6:        0     0     1     0       0      0     0     0     0     0     0
##    birth bottle daughter   far  good  grew havent hourglass kiinde leaks
##    <int>  <int>    <int> <int> <int> <int>  <int>     <int>  <int> <int>
## 1:     0      0        0     0     0     0      0         0      0     0
## 2:     1      1        1     1     1     1      1         1      1     1
## 3:     0      0        0     0     1     0      0         0      0     0
## 4:     0      3        0     0     0     0      0         0      0     0
## 5:     0      1        0     0     0     0      0         0      0     0
## 6:     0      1        0     0     0     0      0         0      0     0
##    measurments  milk months mouth natural nipple   now paying perfect   put
##          <int> <int>  <int> <int>   <int>  <int> <int>  <int>   <int> <int>
## 1:           0     0      0     0       0      0     0      0       0     0
## 2:           1     1      1     1       1      1     1      1       1     1
## 3:           0     0      0     0       0      0     0      0       0     0
## 4:           0     0      0     0       0      0     0      0       0     0
## 5:           0     0      0     0       0      1     0      0       0     0
## 6:           0     0      0     0       0      0     0      0       0     0
##    quickly replacement shaped sides since  take  time tired unlike using washed
##      <int>       <int>  <int> <int> <int> <int> <int> <int>  <int> <int>  <int>
## 1:       0           0      0     0     0     0     0     0      0     0      0
## 2:       1           1      1     1     1     1     1     1      1     1      1
## 3:       0           0      0     0     0     0     1     0      0     0      0
## 4:       0           0      0     0     0     0     1     0      0     0      1
## 5:       0           0      0     0     0     0     0     0      0     0      0
## 6:       0           0      0     0     0     0     0     0      0     1      0
##      yet  long product  used   ’ve accurate across additional already bottlesit
##    <int> <int>   <int> <int> <int>    <int>  <int>      <int>   <int>     <int>
## 1:     0     0       0     0     0        0      0          0       0         0
## 2:     1     0       0     0     0        0      0          0       0         0
## 3:     0     1       1     1     1        0      0          0       0         0
## 4:     0     0       0     0     0        1      1          1       1         1
## 5:     0     0       0     0     0        0      0          0       0         0
## 6:     0     0       0     0     0        0      0          0       0         0
##    brands child  come complaint couldnt different dishwashed  dont grain hasnt
##     <int> <int> <int>     <int>   <int>     <int>      <int> <int> <int> <int>
## 1:      0     0     0         0       0         0          0     0     0     0
## 2:      0     0     0         0       0         0          0     0     0     0
## 3:      0     0     0         0       0         0          0     0     0     0
## 4:      1     1     1         1       1         2          1     1     1     1
## 5:      0     0     0         0       0         0          0     0     0     0
## 6:      0     0     0         0       0         0          0     0     0     1
##     held himfor  just   kid  kids  lots loves makes measurements microwaved
##    <int>  <int> <int> <int> <int> <int> <int> <int>        <int>      <int>
## 1:     0      0     0     0     0     0     0     0            0          0
## 2:     0      0     0     0     0     0     0     0            0          0
## 3:     0      0     0     0     0     0     0     0            0          0
## 4:     1      1     1     1     1     1     1     2            1          1
## 5:     0      0     0     0     0     0     0     0            0          0
## 6:     0      0     0     0     0     0     0     0            0          0
##    moved needed outside preferred preferring purchase purposes regularly  salt
##    <int>  <int>   <int>     <int>      <int>    <int>    <int>     <int> <int>
## 1:     0      0       0         0          0        0        0         0     0
## 2:     0      0       0         0          0        0        0         0     0
## 3:     0      0       0         0          0        0        0         0     0
## 4:     1      1       1         1          1        1        1         1     1
## 5:     0      0       0         0          0        0        0         0     0
## 6:     0      1       0         0          0        0        0         0     0
##    select  size smudged superior takes theyve think third though wants   way
##     <int> <int>   <int>    <int> <int>  <int> <int> <int>  <int> <int> <int>
## 1:      0     0       0        0     0      0     0     0      0     0     0
## 2:      0     0       0        0     0      0     0     0      0     0     0
## 3:      0     0       0        0     0      0     0     0      0     0     0
## 4:      1     4       1        2     1      1     1     1      1     1     1
## 5:      0     0       0        0     0      0     0     0      0     0     0
## 6:      0     2       0        0     0      0     0     0      0     0     0
##    wellmy  wish writing   air airflow controls indent inside  less lines little
##     <int> <int>   <int> <int>   <int>    <int>  <int>  <int> <int> <int>  <int>
## 1:      0     0       0     0       0        0      0      0     0     0      0
## 2:      0     0       0     0       0        0      0      0     0     0      0
## 3:      0     0       0     0       0        0      0      0     0     0      0
## 4:      1     1       1     0       0        0      0      0     0     0      0
## 5:      0     0       0     1       1        1      1      1     1     1      1
## 6:      0     0       0     0       0        0      0      0     0     0      0
##     love  make notch  sure anyway  baby   big bigger complaints  deal
##    <int> <int> <int> <int>  <int> <int> <int>  <int>      <int> <int>
## 1:     0     0     0     0      0     0     0      0          0     0
## 2:     0     0     0     0      0     0     0      0          0     0
## 3:     0     0     0     0      0     0     0      0          0     0
## 4:     0     0     0     0      0     0     0      0          0     0
## 5:     1     1     1     1      0     0     0      0          0     0
## 6:     0     0     0     0      1     1     2      1          1     1
##    everything   got gotten  hang  hold interchangeable larger likes month   old
##         <int> <int>  <int> <int> <int>           <int>  <int> <int> <int> <int>
## 1:          0     0      0     0     0               0      0     0     0     0
## 2:          0     0      0     0     0               0      0     0     0     0
## 3:          0     0      0     0     0               0      0     0     0     0
## 4:          0     0      0     0     0               0      0     0     0     0
## 5:          0     0      0     0     0               0      0     0     0     0
## 6:          1     1      1     1     1               1      1     1     1     1
##    really style thats   boy  boys breastfeeding didn’t  feed helps husband
##     <int> <int> <int> <int> <int>         <int>  <int> <int> <int>   <int>
## 1:      0     0     0     0     0             0      0     0     0       0
## 2:      0     0     0     0     0             0      0     0     0       0
## 3:      0     0     0     0     0             0      0     0     0       0
## 4:      0     0     0     0     0             0      0     0     0       0
## 5:      0     0     0     0     0             0      0     0     0       0
## 6:      1     1     1     0     0             0      0     0     0       0
##     like night tried caused confusion daughters favorite  mine never
##    <int> <int> <int>  <int>     <int>     <int>    <int> <int> <int>
## 1:     0     0     0      0         0         0        0     0     0
## 2:     0     0     0      0         0         0        0     0     0
## 3:     0     0     0      0         0         0        0     0     0
## 4:     0     0     0      0         0         0        0     0     0
## 5:     0     0     0      0         0         0        0     0     0
## 6:     0     0     0      0         0         0        0     0     0
##    repurchased sister younger
##          <int>  <int>   <int>
## 1:           0      0       0
## 2:           0      0       0
## 3:           0      0       0
## 4:           0      0       0
## 5:           0      0       0
## 6:           0      0       0
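The preview above includes a `V1` column that appears to be a row index rather than a review feature, and the formula `y ~ .` would treat it as a predictor. The sketch below shows one way to drop such an index column together with any predictor that takes a single value in the training data. `drop_uninformative` and the toy data frame are hypothetical names introduced for illustration; they are not part of the original analysis.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# drop index columns and predictors that never vary in the training data.
drop_uninformative <- function(df, id_cols = "V1", target = "y") {
  df <- as.data.frame(df)
  keep <- setdiff(names(df), id_cols)
  predictors <- setdiff(keep, target)
  # A predictor with at most one distinct value cannot help separate classes.
  is_constant <- vapply(df[predictors],
                        function(col) length(unique(col)) <= 1,
                        logical(1))
  df[, setdiff(keep, predictors[is_constant]), drop = FALSE]
}

# Toy data standing in for training_set_toy.
toy <- data.frame(V1 = 1:4,
                  y = factor(c(1, 0, 1, 1)),
                  great = c(1L, 0L, 2L, 0L),
                  empty = c(0L, 0L, 0L, 0L))
cleaned <- drop_uninformative(toy)
names(cleaned)
```

A term that varies over the full training set can still be constant inside a single cross-validation fold, so this kind of filtering may reduce scaling warnings without removing them entirely.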

g) Import the caret library and create an SVM classification model trained on the ‘training_set_toy’ data frame, with ‘y’ as the response variable.

library(caret)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
## 
##     annotate
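# Inspect the structure of the dataset if needed
# str(training_set_toy)
# 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)
# Fix the seed so the fold assignment is reproducible
set.seed(59)
# Linear-kernel SVM with y as the response and all other columns as predictors
model <- train(y ~ ., data = training_set_toy, method = "svmLinear", trControl = train_control)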
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
model
## Support Vector Machines with Linear Kernel 
## 
## 974 samples
## 165 predictors
##   2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 779, 780, 779, 778, 780 
## Resampling results:
## 
##   Accuracy   Kappa
##   0.8788568  0    
## 
## Tuning parameter 'C' was held constant at a value of 1
# Keep the row with the highest cross-validated accuracy
res <- as_tibble(model$results[which.max(model$results$Accuracy), ])
res
## # A tibble: 1 × 5
##       C Accuracy Kappa AccuracySD KappaSD
##   <dbl>    <dbl> <dbl>      <dbl>   <dbl>
## 1     1    0.879     0    0.00238       0
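A Kappa of 0 next to a high accuracy usually indicates that the classifier agrees with the labels no more often than chance, which is what happens when every observation is assigned to the majority class. The sketch below works through the arithmetic on made-up labels (88 positive, 12 negative); these counts are an assumption for illustration and are not taken from the review data.

```r
# Hypothetical labels: 88 positive and 12 negative reviews.
truth <- factor(c(rep(1, 88), rep(0, 12)), levels = c(0, 1))
# A degenerate classifier that predicts "positive" for every review.
pred <- factor(rep(1, 100), levels = c(0, 1))

cm <- table(Predicted = pred, Actual = truth)
n <- sum(cm)

# Observed agreement: the share of predictions on the diagonal (accuracy).
observed <- sum(diag(cm)) / n
# Agreement expected by chance, from the row and column totals.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
# Cohen's kappa rescales accuracy by what chance alone would achieve.
cohen_kappa <- (observed - expected) / (1 - expected)

observed     # 0.88
cohen_kappa  # 0
```

Here the accuracy equals the share of the majority class, so the chance-corrected agreement is zero even though 88% of the predictions are correct.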
head(df_test)
##       V1 anyway  baby   big bigger bottle bought complaints  deal everything
##    <int>  <int> <int> <int>  <int>  <int>  <int>      <int> <int>      <int>
## 1:     1      1     1     2      1      1      1          1     1          1
## 2:     2      0     0     0      0      0      0          0     0          0
## 3:     3      0     2     0      0      0      0          0     0          0
## 4:     4      0     0     0      0      0      1          0     0          0
## 5:     5      0     0     0      0      1      0          0     0          0
## 6:     6      0     0     0      0      0      0          0     0          0
##      got gotten great  hang hasnt  hold interchangeable larger likes month
##    <int>  <int> <int> <int> <int> <int>           <int>  <int> <int> <int>
## 1:     1      1     1     1     1     1               1      1     1     1
## 2:     0      0     0     0     0     0               0      0     0     0
## 3:     0      0     0     0     0     0               0      0     0     0
## 4:     0      0     1     0     0     0               0      0     0     0
## 5:     0      0     0     0     0     0               0      0     0     0
## 6:     0      0     0     0     0     0               0      0     0     0
##    needed   old  read really  size style thats   use using caused clean
##     <int> <int> <int>  <int> <int> <int> <int> <int> <int>  <int> <int>
## 1:      1     1     1      1     2     1     1     1     1      0     0
## 2:      0     0     1      0     0     0     0     0     0      1     1
## 3:      0     0     1      0     0     0     0     1     0      0     0
## 4:      0     0     1      0     0     0     0     1     0      0     1
## 5:      0     0     1      0     0     0     0     1     1      0     1
## 6:      0     0     1      0     0     0     0     0     0      1     1
##    confusion daughters  easy favorite leaks little  mine never nipple
##        <int>     <int> <int>    <int> <int>  <int> <int> <int>  <int>
## 1:         0         0     0        0     0      0     0     0      0
## 2:         1         1     1        1     1      1     1     1      1
## 3:         0         0     0        0     0      0     0     0      0
## 4:         0         0     1        0     0      0     0     0      0
## 5:         0         0     1        0     1      0     0     0      1
## 6:         1         1     1        1     1      1     1     1      1
##    repurchased sister younger bottles   boy  boys breastfeeding didn’t  feed
##          <int>  <int>   <int>   <int> <int> <int>         <int>  <int> <int>
## 1:           0      0       0       0     0     0             0      0     0
## 2:           1      1       1       0     0     0             0      0     0
## 3:           0      0       0       1     1     1             1      1     1
## 4:           0      0       0       1     0     0             0      0     0
## 5:           0      0       0       2     0     0             0      0     0
## 6:           1      1       1       0     0     0             0      0     0
##    helps husband  like loves night nipples tried caved formula   get  hard
##    <int>   <int> <int> <int> <int>   <int> <int> <int>   <int> <int> <int>
## 1:     0       0     0     0     0       0     0     0       0     0     0
## 2:     0       0     0     0     0       0     0     0       0     0     0
## 3:     1       1     1     2     1       2     1     0       0     0     0
## 4:     0       0     0     0     0       1     0     1       1     1     1
## 5:     0       0     0     0     0       1     0     0       0     0     0
## 6:     0       0     0     0     0       0     0     0       0     0     0
##    however   lot maintain   new   one  ones spilling stars together   top water
##      <int> <int>    <int> <int> <int> <int>    <int> <int>    <int> <int> <int>
## 1:       0     0        0     0     0     0        0     0        0     0     0
## 2:       0     0        0     0     0     0        0     0        0     0     0
## 3:       0     0        0     0     0     0        0     0        0     0     0
## 4:       1     1        1     1     1     1        1     1        1     1     1
## 5:       0     0        0     0     1     0        0     0        1     0     0
## 6:       0     0        0     0     0     0        0     0        0     0     0
##    without worked    ’s apart avent  back  bags birth daughter   far  good
##      <int>  <int> <int> <int> <int> <int> <int> <int>    <int> <int> <int>
## 1:       0      0     0     0     0     0     0     0        0     0     0
## 2:       0      0     0     0     0     0     0     0        0     0     0
## 3:       0      0     0     0     0     0     0     0        0     0     0
## 4:       1      1     2     0     0     0     0     0        0     0     0
## 5:       0      0     0     1     1     1     1     1        1     1     1
## 6:       0      0     0     0     0     0     0     0        0     0     0
##     grew havent hourglass kiinde measurments  milk months mouth natural   now
##    <int>  <int>     <int>  <int>       <int> <int>  <int> <int>   <int> <int>
## 1:     0      0         0      0           0     0      0     0       0     0
## 2:     0      0         0      0           0     0      0     0       0     0
## 3:     0      0         0      0           0     0      0     0       0     0
## 4:     0      0         0      0           0     0      0     0       0     0
## 5:     1      1         1      1           1     1      1     1       1     1
## 6:     0      0         0      0           0     0      0     0       0     0
##    paying perfect   put quickly replacement shaped sides since  take  time
##     <int>   <int> <int>   <int>       <int>  <int> <int> <int> <int> <int>
## 1:      0       0     0       0           0      0     0     0     0     0
## 2:      0       0     0       0           0      0     0     0     0     0
## 3:      0       0     0       0           0      0     0     0     0     0
## 4:      0       0     0       0           0      0     0     0     0     0
## 5:      1       1     1       1           1      1     1     1     1     1
## 6:      0       0     0       0           0      0     0     0     0     0
##    tired unlike washed   yet  long product  used   ’ve   air airflow controls
##    <int>  <int>  <int> <int> <int>   <int> <int> <int> <int>   <int>    <int>
## 1:     0      0      0     0     0       0     0     0     0       0        0
## 2:     0      0      0     0     0       0     0     0     0       0        0
## 3:     0      0      0     0     0       0     0     0     0       0        0
## 4:     0      0      0     0     0       0     0     0     0       0        0
## 5:     1      1      1     1     0       0     0     0     0       0        0
## 6:     0      0      0     0     0       0     0     0     0       0        0
##    indent inside  less lines  love  make notch  sure accurate across additional
##     <int>  <int> <int> <int> <int> <int> <int> <int>    <int>  <int>      <int>
## 1:      0      0     0     0     0     0     0     0        0      0          0
## 2:      0      0     0     0     0     0     0     0        0      0          0
## 3:      0      0     0     0     0     0     0     0        0      0          0
## 4:      0      0     0     0     0     0     0     0        0      0          0
## 5:      0      0     0     0     0     0     0     0        0      0          0
## 6:      0      0     0     0     0     0     0     0        0      0          0
##    already bottlesit brands child  come complaint couldnt different dishwashed
##      <int>     <int>  <int> <int> <int>     <int>   <int>     <int>      <int>
## 1:       0         0      0     0     0         0       0         0          0
## 2:       0         0      0     0     0         0       0         0          0
## 3:       0         0      0     0     0         0       0         0          0
## 4:       0         0      0     0     0         0       0         0          0
## 5:       0         0      0     0     0         0       0         0          0
## 6:       0         0      0     0     0         0       0         0          0
##     dont grain  held himfor  just   kid  kids  lots makes measurements
##    <int> <int> <int>  <int> <int> <int> <int> <int> <int>        <int>
## 1:     0     0     0      0     0     0     0     0     0            0
## 2:     0     0     0      0     0     0     0     0     0            0
## 3:     0     0     0      0     0     0     0     0     0            0
## 4:     0     0     0      0     0     0     0     0     0            0
## 5:     0     0     0      0     0     0     0     0     0            0
## 6:     0     0     0      0     0     0     0     0     0            0
##    microwaved moved outside preferred preferring purchase purposes regularly
##         <int> <int>   <int>     <int>      <int>    <int>    <int>     <int>
## 1:          0     0       0         0          0        0        0         0
## 2:          0     0       0         0          0        0        0         0
## 3:          0     0       0         0          0        0        0         0
## 4:          0     0       0         0          0        0        0         0
## 5:          0     0       0         0          0        0        0         0
## 6:          0     0       0         0          0        0        0         0
##     salt select smudged superior takes theyve think third though wants   way
##    <int>  <int>   <int>    <int> <int>  <int> <int> <int>  <int> <int> <int>
## 1:     0      0       0        0     0      0     0     0      0     0     0
## 2:     0      0       0        0     0      0     0     0      0     0     0
## 3:     0      0       0        0     0      0     0     0      0     0     0
## 4:     0      0       0        0     0      0     0     0      0     0     0
## 5:     0      0       0        0     0      0     0     0      0     0     0
## 6:     0      0       0        0     0      0     0     0      0     0     0
##    wellmy  wish writing
##     <int> <int>   <int>
## 1:      0     0       0
## 2:      0     0       0
## 3:      0     0       0
## 4:      0     0       0
## 5:      0     0       0
## 6:      0     0       0
dim(df_train)
## [1] 974 166
dim(df_test)
## [1] 325 165
y_pred<- predict(model,df_test)
y_pred
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## Levels: 0 1
# dim() returns NULL for vectors and factors; length() gives the element count
dim(y_pred)
## NULL
dim(testing_data$y)
## NULL

h) Now take the review column of the test dataset, create a corpus from it, and build a DTM restricted to the terms that appear in the DTM of the training set.

#glimpse(df_test)
df_test$y <- as.factor(df_test$y)
#df$y
testing_set_toy <- df_test
testing_set_toy$y = factor(testing_set_toy$y)
#str(testing_set_toy)
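Step h) asks for a test DTM that uses the training vocabulary. A minimal base-R sketch of that idea follows: terms seen only in the test reviews are dropped, and training terms missing from the test reviews are filled with zeros. `align_to_training_terms` and the small matrices are hypothetical and stand in for the real DTMs.

```r
# Hypothetical helper: reshape a test count matrix onto the training vocabulary.
align_to_training_terms <- function(test_counts, train_terms) {
  # Start from an all-zero matrix with one column per training term.
  aligned <- matrix(0,
                    nrow = nrow(test_counts),
                    ncol = length(train_terms),
                    dimnames = list(rownames(test_counts), train_terms))
  # Copy counts only for terms that occur in both vocabularies.
  shared <- intersect(colnames(test_counts), train_terms)
  aligned[, shared] <- test_counts[, shared, drop = FALSE]
  aligned
}

# Toy vocabulary and test counts (made up for illustration).
train_terms <- c("bottle", "great", "leaks")
test_counts <- matrix(c(1, 0, 2, 1), nrow = 2,
                      dimnames = list(NULL, c("great", "spout")))
aligned <- align_to_training_terms(test_counts, train_terms)
aligned
```

If the DTMs are built with the tm package, passing the training terms through the `dictionary` entry of the `control` list in `DocumentTermMatrix()` should give an equivalent result.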

i) Predict the ratings of the test dataset using the generated classification model and store the results in ‘model_toy_result’. Subtract 1 from the predicted values, since the classes produced by the classification model are coded as 1 and 2 when converted to numbers.

model_toy_result <- predict(model, newdata = testing_set_toy) 
#confusionMatrix(model_toy_result, testing_set_toy$y)

model_toy_result
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## Levels: 0 1
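The instruction to subtract 1 follows from how R stores factors: the printed levels are 0 and 1, but `as.numeric()` on a factor returns the internal level codes 1 and 2. The short sketch below, using a made-up prediction vector, shows both the subtraction and an alternative that converts through the level labels.

```r
# Made-up predictions with the same levels as the model output.
pred <- factor(c("1", "0", "1"), levels = c("0", "1"))

# as.numeric() on a factor yields the level codes, not the labels.
codes <- as.numeric(pred)                         # 2 1 2

# Subtracting 1 maps the codes back onto the 0/1 labels.
labels_from_codes <- codes - 1                    # 1 0 1

# Converting via character recovers the labels without relying on level order.
labels_direct <- as.numeric(as.character(pred))   # 1 0 1
```

The subtraction only works when the levels are exactly 0 and 1 in that order, so the conversion through `as.character()` may be the safer choice in general.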