Preface

Classification is a machine learning task in which an algorithm learns to assign a class label to examples from the problem domain. In other words, classification predictive modeling means assigning a class label to input examples.

In this section we’re going to build classification predictive models using a naive Bayes classifier and a decision tree.

Objective

We’re going to use the “Women’s Clothing E-Commerce Reviews” data set, which you can obtain from the link below. Our main objective is:

To predict whether a customer will give a recommendation to a product or not, based on their rating and reviews

Data Source: Womens Clothing E-commerce Reviews

Library Preparation

# Library Preparation
library(tidyverse)    # data wrangling
library(tm)           # text mining: corpus & document-term matrix
library(RColorBrewer) # color palettes
library(wordcloud)    # word cloud visualization
library(textstem)     # lemmatization
library(rsample)      # train/test splitting
library(caret)        # confusion matrix & modeling utilities
library(e1071)        # naive Bayes
library(ROCR)         # ROC curve & AUC
library(partykit)     # decision tree (ctree)


Read Data and Data Information

review <- read.csv("Kaggle DataSet/Womens Clothing E-Commerce Reviews/Womens Clothing E-Commerce Reviews.csv")
review

Data Information:

  1. X : index number
  2. Clothing.ID : clothing ID
  3. Age : customer age
  4. Title : review title
  5. Review.Text : review content
  6. Rating : product rating (1-5)
  7. Recommended.IND : whether the customer recommended the product (1) or not (0)
  8. Positive.Feedback.Count : number of other customers who gave positive feedback on the review
  9. Division.Name : product division
  10. Department.Name : product department
  11. Class.Name : product class


Data Wrangling

Data Inspection

Let’s see our data summary

summary(review)
#>        X          Clothing.ID          Age          Title          
#>  Min.   :    0   Min.   :   0.0   Min.   :18.0   Length:23486      
#>  1st Qu.: 5871   1st Qu.: 861.0   1st Qu.:34.0   Class :character  
#>  Median :11742   Median : 936.0   Median :41.0   Mode  :character  
#>  Mean   :11742   Mean   : 918.1   Mean   :43.2                     
#>  3rd Qu.:17614   3rd Qu.:1078.0   3rd Qu.:52.0                     
#>  Max.   :23485   Max.   :1205.0   Max.   :99.0                     
#>  Review.Text            Rating      Recommended.IND  Positive.Feedback.Count
#>  Length:23486       Min.   :1.000   Min.   :0.0000   Min.   :  0.000        
#>  Class :character   1st Qu.:4.000   1st Qu.:1.0000   1st Qu.:  0.000        
#>  Mode  :character   Median :5.000   Median :1.0000   Median :  1.000        
#>                     Mean   :4.196   Mean   :0.8224   Mean   :  2.536        
#>                     3rd Qu.:5.000   3rd Qu.:1.0000   3rd Qu.:  3.000        
#>                     Max.   :5.000   Max.   :1.0000   Max.   :122.000        
#>  Division.Name      Department.Name     Class.Name       
#>  Length:23486       Length:23486       Length:23486      
#>  Class :character   Class :character   Class :character  
#>  Mode  :character   Mode  :character   Mode  :character  
#>                                                          
#>                                                          
#> 

Highlight: no NA values

str(review)
#> 'data.frame':    23486 obs. of  11 variables:
#>  $ X                      : int  0 1 2 3 4 5 6 7 8 9 ...
#>  $ Clothing.ID            : int  767 1080 1077 1049 847 1080 858 858 1077 1077 ...
#>  $ Age                    : int  33 34 60 50 47 49 39 39 24 34 ...
#>  $ Title                  : chr  "" "" "Some major design flaws" "My favorite buy!" ...
#>  $ Review.Text            : chr  "Absolutely wonderful - silky and sexy and comfortable" "Love this dress!  it's sooo pretty.  i happened to find it in a store, and i'm glad i did bc i never would have"| __truncated__ "I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small "| __truncated__ "I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!" ...
#>  $ Rating                 : int  4 5 3 5 5 2 5 4 5 5 ...
#>  $ Recommended.IND        : int  1 1 0 1 1 0 1 1 1 1 ...
#>  $ Positive.Feedback.Count: int  0 4 0 0 6 4 1 4 0 0 ...
#>  $ Division.Name          : chr  "Initmates" "General" "General" "General Petite" ...
#>  $ Department.Name        : chr  "Intimate" "Dresses" "Dresses" "Bottoms" ...
#>  $ Class.Name             : chr  "Intimates" "Dresses" "Dresses" "Pants" ...

Highlight: the data types need adjustment

review

Data cleansing points:

  1. Drop columns: X, Clothing.ID, Title
  • Drop column X because it contains only the row number (irrelevant to the prediction process)
  • Drop column Clothing.ID because it contains only the product ID (irrelevant to the prediction process: it is too specific, and we’re analyzing the reviews in general rather than per product)
  • Drop column Title because Review.Text describes the review in more detail, so the Title column isn’t needed unless we want to analyze the reviews from their titles alone
  2. Change the Age column into group ranges:

Age Group Range:

  • < 36
  • 36 - 55
  • 56 - 70
  • > 70

  3. Change data types:
  • Age to factor
  • Recommended.IND to factor
  • Division.Name to factor
  • Department.Name to factor
  • Class.Name to factor

Data Cleansing

review.filter <- review %>%
  select(-c(X, Clothing.ID, Title)) %>% # Drop column
  mutate(Age = case_when(Age < 36 ~ "< 36", # Grouping Age column into group range
                         Age %in% seq(36, 55) ~ "36 - 55",
                         Age %in% seq(56, 70) ~ "56 - 70",
                         Age > 70 ~ "> 70", 
                         T ~ as.character(Age)),
         Age = as.factor(Age),  # Change data type
         Recommended.IND = as.factor(Recommended.IND),
         Division.Name = as.factor(Division.Name),
         Department.Name = as.factor(Department.Name),
         Class.Name = as.factor(Class.Name))


review.filter

Recheck that the data is not a case of perfect separation:

table(review$Rating, review$Recommended.IND)
#>    
#>         0     1
#>   1   826    16
#>   2  1471    94
#>   3  1682  1189
#>   4   168  4909
#>   5    25 13106

The frequency table shows that the recommendation is not perfectly separated by rating alone.

In this section we’re going to inspect whether a customer gives a recommendation to a product based on their review text and their rating of the product. In other words, we’re going to split the work into two parts:

  1. Inspect the text reviews (text mining) with a naive Bayes classifier
  2. Inspect the product ratings with a decision tree model

Text Mining

Naive Bayes is a learning algorithm commonly applied to text classification. The main idea is to estimate the tendency of a certain word (counted by frequency) toward a class: for this particular case, whether a certain word tends to appear in group 1 (the customer gives a recommendation to the product) or group 0 (the customer does not give a recommendation to the product).
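To make the idea concrete, here’s a tiny worked example with hypothetical counts (not taken from our data set): suppose the word “love” appears in 800 of 8,000 recommending reviews and in 40 of 2,000 non-recommending reviews. Bayes’ rule then gives the probability that a review containing “love” belongs to class 1:

# Hypothetical counts, for illustration only
p.word.given.1 <- 800 / 8000  # P("love" | class 1)
p.word.given.0 <- 40 / 2000   # P("love" | class 0)
prior.1 <- 8000 / 10000       # P(class 1)
prior.0 <- 2000 / 10000       # P(class 0)

# Posterior P(class 1 | "love") via Bayes' rule
(p.word.given.1 * prior.1) / (p.word.given.1 * prior.1 + p.word.given.0 * prior.0)
#> [1] 0.952381

The naive Bayes model does exactly this, multiplying such per-word terms over all the words in a review (under the “naive” assumption that words are independent given the class).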

Now let’s prepare our data.

Feature Engineering

We’re going to select only the review content (the Review.Text column) and whether the customer gives a recommendation to the product (the Recommended.IND column).

# Put the selected columns into a variable called "review.text"
review.text <- review.filter %>% 
  select(Recommended.IND, Review.Text)
review.text

Text Cleansing

We have our data ready, but as you can see below, the reviews are still raw:

review.text$Review.Text[1:3]
#> [1] "Absolutely wonderful - silky and sexy and comfortable"                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#> [2] "Love this dress!  it's sooo pretty.  i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite.  i bought a petite and am 5'8\".  i love the length on me- hits just a little below the knee.  would definitely be a true midi on someone who is truly petite."                                                                                                                                                                                                    
#> [3] "I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c"
  1. They contain a mix of uppercase and lowercase (R is case-sensitive, meaning the words “A” and “a” would be treated as different words; we need to normalize the text into consistent words)
  2. They contain numbers, special characters, and stop words (irrelevant to prediction)

Let’s clean our data in corpus form, which supports the text cleansing process.

Create Corpus

To change an object into corpus form we use the VCorpus() function:

text.corpus <- VCorpus(VectorSource(review.text$Review.Text))

Change to Lower Case; Remove Numbers, Stop Words, and Characters

# 1. Transform all text into lower case
text.corpus <- tm_map(x = text.corpus, content_transformer(tolower))

# 2. Remove all numbers in text
text.corpus <- tm_map(x = text.corpus, removeNumbers)

# 3. remove stop-words
text.corpus <- tm_map(x = text.corpus, removeWords, stopwords("english"))

Create a function that removes specific characters (replacing them with an empty string)

removeChar <- content_transformer(FUN = function(x, pattern){
  gsub(x = x, 
       pattern = pattern, 
       replacement = "") 
})
# 4. remove characters
text.corpus <- tm_map(text.corpus, removeChar, "/")
text.corpus <- tm_map(text.corpus, removeChar, "@")
text.corpus <- tm_map(text.corpus, removeChar, "-")
text.corpus <- tm_map(text.corpus, removeChar, "\\.")

# 5. remove all punctuation
text.corpus <- tm_map(text.corpus, removePunctuation)

# 6. Remove white space
text.corpus <- tm_map(text.corpus, stripWhitespace)

Let’s see a sample of our cleaned text:

lapply(text.corpus[1:5]$content, as.character)
#> [[1]]
#> [1] "absolutely wonderful silky sexy comfortable"
#> 
#> [[2]]
#> [1] "love dress sooo pretty happened find store glad bc never ordered online bc petite bought petite love length hits just little knee definitely true midi someone truly petite"
#> 
#> [[3]]
#> [1] " high hopes dress really wanted work initially ordered petite small usual size found outrageously small small fact zip reordered petite medium just ok overall top half comfortable fit nicely bottom half tight layer several somewhat cheap net layers imo major design flaw net layer sewn directly zipper c"
#> 
#> [[4]]
#> [1] " love love love jumpsuit fun flirty fabulous every time wear get nothing great compliments"
#> 
#> [[5]]
#> [1] " shirt flattering due adjustable front tie perfect length wear leggings sleeveless pairs well cardigan love shirt"

Lemmatized Words

# Lemmatize each document; lemmatize_strings() returns a plain character vector
text.corpus <- lemmatize_strings(unlist(lapply(text.corpus$content, as.character)), dictionary = lexicon::hash_lemmas)

# Wrap the lemmatized text back into corpus form so the next step has an explicit corpus
text.corpus <- VCorpus(VectorSource(text.corpus))

Document-Term Matrix (DTM)

To quantify the relationship between each word and our target variable, let’s convert the text data into a mathematical matrix using the DocumentTermMatrix() function. In a Document-Term Matrix, each row represents a document (here, a review) to be analyzed and each column represents a word; each cell of the matrix holds the number of occurrences of that word in that document.
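As a minimal illustration (a hypothetical two-review corpus, not our data), here’s what a DTM looks like:

# Toy example: a DTM built from two short reviews
toy.corpus <- VCorpus(VectorSource(c("love this dress", "dress runs small")))
as.matrix(DocumentTermMatrix(toy.corpus))
#>     Terms
#> Docs dress love runs small this
#>    1     1    1    0     0    1
#>    2     1    0    1     1    0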

Let’s create the DTM and see the result:

#Create dtm 
text.dtm <- DocumentTermMatrix(x = text.corpus)

#inspect dtm result
inspect(text.dtm)
#> <<DocumentTermMatrix (documents: 23486, terms: 12812)>>
#> Non-/sparse entries: 561501/300341131
#> Sparsity           : 100%
#> Maximal term length: 32
#> Weighting          : term frequency (tf)
#> Sample             :
#>        Terms
#> Docs    color dress fit good like look love size top wear
#>   11072     0     0   2    0    0    1    0    2   1    1
#>   12348     0     0   0    0    1    1    0    1   0    2
#>   1238      0     1   1    1    1    1    0    1   2    2
#>   15453     0     0   0    2    0    0    0    1   0    0
#>   15501     0     2   0    0    0    0    1    0   0    5
#>   21091     2     0   0    0    1    1    0    2   0    1
#>   21176     0     0   0    0    0    1    1    1   0    2
#>   3474      0     0   3    1    0    0    0    0   0    1
#>   3883      1     0   1    2    0    0    2    0   2    0
#>   5448      1     0   1    0    1    3    0    0   1    0

Cross Validation

To prepare our model, let’s split our data into data train (used for model building) and data testing (used to test our predictions).

Splitting to Data Train and Data Testing

RNGkind(sample.kind = "Rounding") 
set.seed(77)


# create sample index for splitting data
num.row <- sample(nrow(text.dtm), nrow(text.dtm)*0.75)


# Split data into data train and testing
text.train <- text.dtm[num.row,]
text.test <- text.dtm[-num.row,]

Subsetting Actual Value for Data Test & Data Train

We need to save the actual values of the Recommended.IND column according to the text.train and text.test indexes for the model evaluation process later:

#Actual value (target variable) for data train
train.actual <- review.text[num.row, "Recommended.IND"]

#Actual value (target variable) for data test
test.actual <- review.text[-num.row, "Recommended.IND"]

Count Word Frequency

As we already have the text matrix (counting the occurrences of each word), we’re now going to limit the word list to only the most frequent words. Considering the size of our data, I’m going to set the lowest frequency to 100.

# Limit the word list to terms appearing at least 100 times in our data train
text.frequency <- findFreqTerms(text.train, lowfreq = 100)

# Subset our data train to those frequent terms
text.train <- text.train[, text.frequency]

# Inspect data train
inspect(text.train)
#> <<DocumentTermMatrix (documents: 17614, terms: 640)>>
#> Non-/sparse entries: 355719/10917241
#> Sparsity           : 97%
#> Maximal term length: 13
#> Weighting          : term frequency (tf)
#> Sample             :
#>        Terms
#> Docs    color dress fit good like look love size top wear
#>   10140     3     0   2    1    1    0    0    1   1    0
#>   12348     0     0   0    0    1    1    0    1   0    2
#>   1238      0     1   1    1    1    1    0    1   2    2
#>   12812     0     0   0    0    0    1    1    0   0    1
#>   12994     0     0   2    0    0    1    2    0   4    0
#>   14963     0     0   0    1    3    0    1    0   5    1
#>   21091     2     0   0    0    1    1    0    2   0    1
#>   22039     0     1   2    1    2    0    0    1   4    1
#>   5448      1     0   1    0    1    3    0    0   1    0
#>   6317      0     0   0    1    2    5    0    0   3    2

Create a function to convert our word counts into True and False

Condition: if a certain word appears in a row (count > 0), set it to True/1; if it does not appear (count == 0), set it to False/0.

bernoulli_conv <- function(x){
  x <- as.factor(ifelse(x > 0, 1, 0)) 
  return(x)
}
# Apply function into our data train and data test
text.train.bn <- apply(X = text.train, MARGIN = 2, FUN = bernoulli_conv)
text.test.bn <- apply(X = text.test, MARGIN = 2, FUN = bernoulli_conv)
#Quick check to result
text.train.bn[1:5, 1:6]
#>        Terms
#> Docs    absolutely comfortable sexy wonderful buy definitely
#>   6842  "0"        "1"         "0"  "0"       "0" "0"       
#>   16850 "0"        "0"         "0"  "0"       "0" "0"       
#>   20253 "0"        "0"         "0"  "0"       "0" "0"       
#>   22311 "0"        "0"         "0"  "0"       "0" "0"       
#>   17350 "0"        "0"         "0"  "0"       "0" "0"

Model Building & Prediction

As our preparation is done, we can now build our model with naive Bayes:

Model Building

# Build model with data train, y is actual value from data set
model.nb <- naiveBayes(x=text.train.bn , y=train.actual)

Prediction

We have our naive Bayes model ready; let’s try our model’s predictions on the test data:

text.prediction <- predict(model.nb, newdata = text.test.bn)

#Quick check our model prediction
text.prediction[1:5]
#> [1] 0 0 1 1 0
#> Levels: 0 1

Model Evaluation

confusionMatrix(data = text.prediction, reference = test.actual, positive="1")
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction    0    1
#>          0  672  411
#>          1  356 4433
#>                                              
#>                Accuracy : 0.8694             
#>                  95% CI : (0.8605, 0.8779)   
#>     No Information Rate : 0.8249             
#>     P-Value [Acc > NIR] : <0.0000000000000002
#>                                              
#>                   Kappa : 0.5571             
#>                                              
#>  Mcnemar's Test P-Value : 0.0512             
#>                                              
#>             Sensitivity : 0.9152             
#>             Specificity : 0.6537             
#>          Pos Pred Value : 0.9257             
#>          Neg Pred Value : 0.6205             
#>              Prevalence : 0.8249             
#>          Detection Rate : 0.7549             
#>    Detection Prevalence : 0.8156             
#>       Balanced Accuracy : 0.7844             
#>                                              
#>        'Positive' Class : 1                  
#> 

Confusion Matrix Interpretation:

  1. Accuracy: 86%
  • 86% of all our model’s predictions are correct, covering both true positives (the customer indeed gives a recommendation) and true negatives (the customer indeed does not give a recommendation)
  2. Highlight point: Pos Pred Value 92% (Precision)
  • Of all the reviews our model predicted as positive (customer gives a recommendation), 92% are true positives, i.e. the customer indeed gives a recommendation. Since we don’t want to wrongly predict that a customer will give a recommendation when in reality they don’t, precision is the metric we point out; see the quick check below.
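As a quick sanity check, we can recompute both numbers by hand from the confusion matrix cells above:

# Cell values taken from the confusion matrix output above
TP <- 4433; FP <- 356; FN <- 411; TN <- 672

(TP + TN) / (TP + TN + FP + FN)  # accuracy: ~0.8694
TP / (TP + FP)                   # precision (pos pred value): ~0.9257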

Text Visualization

To get the words that best describe our customer reviews, let’s look at the list of words with the highest frequencies.

  1. Create a data frame from our text.dtm
text.words <- as.matrix(text.dtm)
words.list <- sort(colSums(text.words), decreasing = T)

words.df <- data.frame(word = names(words.list), freq=words.list)
words.df
  2. Visualization (Bar plot)
#prepare color for plot
colors.viz <- brewer.pal(10, "Set3")

#Visualization
barplot(height = words.df$freq[1:10], names.arg = words.df$word[1:10], main = "Most Frequent Words", col = colors.viz)

From the bar plot we can see that the most frequent words are: dress, fit, size, love, and so on.

Insights:

  1. Our customer reviews frequently mention dress, fit (which can be interpreted as fitting), and size

  2. We can assume that our most frequently reviewed products are dresses and tops

  3. We can assume that many customers are concerned about fitting, size, and color

Visualization (Word Cloud)

As the bar plot above gives us a glimpse of the most frequently used words, notice that a bar plot can’t help us much more in finding additional words that describe our customer reviews (we can’t plot 100 words on a bar plot, right?).

Therefore, we’re going to create a word cloud to capture more of the words that frequently appear in our review text:

#Prepare color palette
colors.wc <- brewer.pal(10, "Spectral")

#take 200 most frequent words in our text review
wordcloud(words = words.df$word, freq = words.df$freq, 
          max.words = 200, random.order = F,
          colors = colors.wc)

As more words appear in the word cloud visualization, we can get a better idea of our customer reviews:

Insight

  1. There are words related to fit and size (small, long, short, big, large, length, …); these words again emphasize that our customers are concerned about product fit
  2. There are words that might be related to product material quality (quality, material, comfortable, fabric, soft)
  3. There are positive words related to the reviews in general: love, like, great, perfect, cute, beautiful, nice, pretty, and so on.

Text Mining Conclusion

Based on our model evaluation (using the confusion matrix), we can conclude:

Our text mining prediction performs well on accuracy, and especially on precision, our main concern for this model. We can also outline the most frequent words appearing in our customer reviews: dress, fit, and size.



Naive Bayes

What if our concern is to get better predictions of the customers who won’t give a recommendation, so that we can evaluate the products with low recommendation rates? (Focus on the True Negatives)

If that is our focus, the metric we need to pay more attention to is the specificity rate. Since our specificity rate is quite low (65%), one adjustment we can make to improve model performance is balancing the data, so that both classes of the target variable have a balanced proportion.

prop.table(table(review.text$Recommended.IND))
#> 
#>         0         1 
#> 0.1776377 0.8223623

A balanced target variable might help our model learn fairly for both classes (recommend and not recommend). However, considering our data has more information that hasn’t been analyzed yet, let’s try to use those variables instead.
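For reference, here’s a minimal sketch of the balancing adjustment mentioned above, using caret’s downSample() to downsample the majority class (a sketch only; we won’t apply it in this analysis):

# Sketch only (not applied): downsample class 1 so both classes have equal counts
set.seed(77)
review.balanced <- downSample(x = review.text["Review.Text"],
                              y = review.text$Recommended.IND,
                              yname = "Recommended.IND")
prop.table(table(review.balanced$Recommended.IND)) # both classes now at 0.5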

Let’s see our data frame again to recall how it looks:

review.filter

Now we won’t use the text reviews; instead, we’re going to look at the other variables that may affect Recommended.IND. Let’s drop the Review.Text column:

review.non.text <- review.filter %>% select(-Review.Text)
review.non.text

Cross Validation

Let’s prepare our data training and data testing:

RNGkind(sample.kind = "Rounding")
set.seed(77)


index.num <- initial_split(review.non.text, prop = 0.75, strata = Recommended.IND)
review.train <- training(index.num)
review.test <- testing(index.num)
nrow(review.train)
#> [1] 17614
nrow(review.test)
#> [1] 5872

Build Model (Naive Bayes)

Fit the model to our data train:

model.nb.nonText <- naiveBayes(Recommended.IND ~ ., data = review.train)

Prediction

Let’s create the model prediction, passing “class” to the type parameter so we get the predicted class of each observation:

recommend.prediction <- predict(model.nb.nonText, newdata = review.test , type="class")

Model Evaluation

confusionMatrix(data = recommend.prediction, reference = review.test$Recommended.IND , positive = "1")
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction    0    1
#>          0  995  324
#>          1   48 4505
#>                                                
#>                Accuracy : 0.9366               
#>                  95% CI : (0.9301, 0.9427)     
#>     No Information Rate : 0.8224               
#>     P-Value [Acc > NIR] : < 0.00000000000000022
#>                                                
#>                   Kappa : 0.8035               
#>                                                
#>  Mcnemar's Test P-Value : < 0.00000000000000022
#>                                                
#>             Sensitivity : 0.9329               
#>             Specificity : 0.9540               
#>          Pos Pred Value : 0.9895               
#>          Neg Pred Value : 0.7544               
#>              Prevalence : 0.8224               
#>          Detection Rate : 0.7672               
#>    Detection Prevalence : 0.7754               
#>       Balanced Accuracy : 0.9434               
#>                                                
#>        'Positive' Class : 1                    
#> 

Confusion Matrix Interpretation

  1. Accuracy: 93%
  • Predicting the target class with the non-text variables, our model correctly classifies 93% of both the positive and the negative class
  2. Sensitivity: 93%
  • Our model correctly predicts 93% of the actual positive class (7% of actual positives are predicted as negative)
  3. Specificity: 95%
  • Our model correctly predicts 95% of the actual negative class (5% of actual negatives are predicted as positive)
  4. Pos Pred Value: 98%
  • Of all our positive class predictions, 98% are truly positive (2% are false positives)

ROC AUC

ROC

Our naive Bayes classifier predicts our target variable (from the non-text variables) at a very good rate, as we can see from the confusion matrix above. Another way to see how well our model classifies the target variable is to look at the ROC and AUC.

A ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. Let’s create the ROC curve to evaluate our model.

  1. Save our prediction, passing “raw” to the type parameter. Rather than assigning each observation directly to a class, we’ll get the probability of each class instead.
review.test$prediction <- predict(model.nb.nonText, newdata = review.test , type="raw")
head(review.test$prediction, 3)
#>                0         1
#> [1,] 0.001733000 0.9982670
#> [2,] 0.001077287 0.9989227
#> [3,] 0.029367568 0.9706324

As we can see above, the raw prediction returns the probability of each class. We’ll use the probability of class “1” to build the ROC curve:
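To make “all classification thresholds” concrete, here’s a small sketch computing the true positive rate and false positive rate at a single cutoff of 0.5; the ROC curve simply traces these two rates over every possible cutoff:

# TPR and FPR at a single threshold (0.5)
pred.50 <- ifelse(review.test$prediction[, "1"] > 0.5, "1", "0")
TPR <- sum(pred.50 == "1" & review.test$Recommended.IND == "1") /
  sum(review.test$Recommended.IND == "1")
FPR <- sum(pred.50 == "1" & review.test$Recommended.IND == "0") /
  sum(review.test$Recommended.IND == "0")
c(TPR = TPR, FPR = FPR)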

# object prediction
ROC.prediction <- prediction(predictions = review.test$prediction[,"1"], labels = review.test$Recommended.IND)
# ROC curve
par(mfrow=c(1,2))
plot(performance(prediction.obj = ROC.prediction, measure = "tpr", x.measure = "fpr"), col = "green", main = "True Positive ROC")
abline(0,1, lty= 2)
plot(performance(prediction.obj = ROC.prediction, measure = "tnr", x.measure = "fnr"), col = "red", main = "True Negative ROC")
abline(0,1, lty= 2)

ROC Interpretation

Plot 1 (True Positive ROC)

  1. Y axis (True Positive Rate), X axis (False Positive Rate)

As we can see in the plot above, the curve (green line) rises close to 1 on the Y axis. This shows that our model performs well at classifying the positive class: the curve hugs the True Positive Rate (Y axis) rather than the False Positive Rate (X axis).

Note: if our model performed poorly at classifying, say a True Positive Rate of 50% against a False Positive Rate of 50%, the green line would lie close to the diagonal line.

Plot 2 (True Negative ROC)

  1. Y axis (True Negative Rate), X axis (False Negative Rate)

The interpretation is exactly the same as for plot 1, but in this case the curve shows the true negative rate (red line) against the false negative rate. Looking closely at both curves, there’s a slight difference between the plots: the curve in plot 2 is a bit closer to 1 on the Y axis, meaning our model is slightly better at classifying the True Negative class than the True Positive class.

AUC

AUC stands for “Area Under the ROC Curve”: it measures the entire two-dimensional area underneath the ROC curve, and it ranges from 0 to 1.

The closer the AUC is to 1, the more capable our model is of separating the negative and positive classes. Note that while the ROC shows the comparison of true positives against false positives (or true negatives against false negatives), the AUC summarizes in a single rate a binary classification model’s ability to separate the positive class from the negative class.

As both ROC curves suggest, we should get an AUC close to 1. Let’s check:

#create AUC object
AUC.review <- performance(prediction.obj = ROC.prediction, measure = "auc")

str(AUC.review)
#> Formal class 'performance' [package "ROCR"] with 6 slots
#>   ..@ x.name      : chr "None"
#>   ..@ y.name      : chr "Area under the ROC curve"
#>   ..@ alpha.name  : chr "none"
#>   ..@ x.values    : list()
#>   ..@ y.values    :List of 1
#>   .. ..$ : num 0.972
#>   ..@ alpha.values: list()
#Inspect AUC value
AUC.review@y.values
#> [[1]]
#> [1] 0.9715397
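As a cross-check, the same number can be computed directly from the rank (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen positive review receives a higher predicted score than a randomly chosen negative one:

# Rank-based (Mann-Whitney) computation of AUC
scores <- review.test$prediction[, "1"]
labels <- review.test$Recommended.IND
r <- rank(scores)  # midranks handle ties
n.pos <- sum(labels == "1")
n.neg <- sum(labels == "0")
(sum(r[labels == "1"]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
#> should reproduce ~0.9715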

As expected, our AUC is high and really close to 1. Once again, this verifies our model’s performance.

Decision Tree

Decision Tree is a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. Let’s recall our data frame:

review.non.text

From our data frame, we may suspect that the recommendation value is correlated with the rating value. Let’s check:

table(review.train$Recommended.IND, review.train$Rating)
#>    
#>        1    2    3    4    5
#>   0  637 1086 1259  130   17
#>   1   14   74  907 3653 9837

There’s a chance that a customer gives a product a rating of 5 yet doesn’t recommend it, and there’s also a chance that a customer gives a rating of 1 and still recommends the product.
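For a more formal check than eyeballing the table, here’s a quick sketch of a chi-squared test of independence between Rating and Recommended.IND (a very small p-value indicates the two variables are strongly associated):

# Chi-squared test of independence between rating and recommendation
chisq.test(table(review.train$Rating, review.train$Recommended.IND))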

Decision Tree Model

For now, let’s create the decision tree and interpret the result to get a clearer idea of how a decision tree works.

#create decision tree
model.dtree <- ctree(formula = Recommended.IND ~., #formula: target variable ~ . (use all other variables as predictors)
                     data = review.train, # use our data train to create the model
                     control = ctree_control(mincriterion=.95, #split a node only when its p-value < 0.05 (1 - 0.95)
                                             minsplit=0, #minimum number of observations in each internal node
                                             minbucket=0)) #minimum number of observations in each terminal node
plot(model.dtree, type = "simple")

Decision Tree Interpretation

Note: as seen above, the variable that splits our tree at most nodes is Rating. This indicates that our data has few variables affecting the target variable (in other words, rating is the major driver of the recommendation value), so a decision tree might not be especially useful for our case. But let’s inspect the result anyway; we can still get some insights:

Condition:

  • recommendation == 0 (the customer does not give a recommendation to the product)
  • recommendation == 1 (the customer gives a recommendation to the product)

Root Node (Rating <= 3 vs. Rating > 3)

When Rating is <= 3:

  • 2 < Rating <= 3 : recommendation == 0, with a probability of 58.1%
  • 1 < Rating <= 2 : recommendation == 0, with a probability of 93.6%
  • Rating <= 1 : recommendation == 0, with a probability of 97.8%

When Rating is > 3:

  • Rating > 4 : recommendation == 1, with a probability of 99.2%
  • 3 < Rating <= 4 and Department.Name in (Bottoms, Intimate, Tops) : recommendation == 1, with a probability of 97.3%
  • 3 < Rating <= 4 and Department.Name in (Dresses, Jackets, Trend) : recommendation == 1, with a probability of 94.9%
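If you’d rather read these probabilities programmatically than off the plot, partykit’s predict() can also return the per-class probabilities of the terminal node each observation falls into; a quick sketch:

# Class probabilities assigned by the terminal node each test row falls into
head(predict(model.dtree, newdata = review.test, type = "prob"))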

Prediction and Model Evaluation

Now that we’ve seen our decision tree model, let’s generate its predictions and evaluate them:

predict.dtree <- predict(model.dtree, newdata =review.test)
confusionMatrix(data = predict.dtree, reference = review.test$Recommended.IND, positive = "1")
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction    0    1
#>          0  997  304
#>          1   46 4525
#>                                                
#>                Accuracy : 0.9404               
#>                  95% CI : (0.934, 0.9463)      
#>     No Information Rate : 0.8224               
#>     P-Value [Acc > NIR] : < 0.00000000000000022
#>                                                
#>                   Kappa : 0.814                
#>                                                
#>  Mcnemar's Test P-Value : < 0.00000000000000022
#>                                                
#>             Sensitivity : 0.9370               
#>             Specificity : 0.9559               
#>          Pos Pred Value : 0.9899               
#>          Neg Pred Value : 0.7663               
#>              Prevalence : 0.8224               
#>          Detection Rate : 0.7706               
#>    Detection Prevalence : 0.7784               
#>       Balanced Accuracy : 0.9465               
#>                                                
#>        'Positive' Class : 1                    
#> 

Confusion Matrix Interpretation

Our decision tree model also scores high: accuracy 94%, sensitivity 93%, specificity 95%, and pos pred value 98%. Overall, our decision tree performs very well at predicting our target variable.

Summary

As we have evaluated all our models, we can conclude that all of them perform well at predicting the target variable.

  1. With only the text reviews, our model performs quite well at predicting the target variable. Since the main concern for this model is the precision rate, its performance is considered great.

  2. Without the text reviews, both the naive Bayes and the decision tree models perform very well at predicting the target variable. For this specific case, my own preference is the naive Bayes model, as it performs just as well and has a low computational load.