Classification is a machine learning task in which an algorithm learns how to assign a class label to examples from the problem domain. Classification predictive modeling therefore means predicting a class label for each input example.
In this section we build classification predictive models using a Naive Bayes classifier and a decision tree.
We use the "Women's Clothing E-Commerce Reviews" data set, which you can obtain from the data source below. Our main objective is:
To predict whether a customer recommends a product or not, based on their rating and review.
Data Source: Womens Clothing E-commerce Reviews
# Library Preparation
library(tidyverse)
library(tm)
library(RColorBrewer)
library(wordcloud)
library(textstem)
library(rsample)
library(caret)
library(e1071)
library(ROCR)
library(partykit)

review <- read.csv("Kaggle DataSet/Womens Clothing E-Commerce Reviews/Womens Clothing E-Commerce Reviews.csv")
review

Data Information:
X: index number
Clothing.ID: clothing ID
Age: customer age
Title: review title
Review.Text: review content
Rating: product rating
Recommended.IND: whether the customer recommends the product (1) or not (0)
Positive.Feedback.Count: number of other customers who found the review positive
Division.Name: product division
Department.Name: product department
Class.Name: product class

Let's look at the data summary:
summary(review)
#> X Clothing.ID Age Title
#> Min. : 0 Min. : 0.0 Min. :18.0 Length:23486
#> 1st Qu.: 5871 1st Qu.: 861.0 1st Qu.:34.0 Class :character
#> Median :11742 Median : 936.0 Median :41.0 Mode :character
#> Mean :11742 Mean : 918.1 Mean :43.2
#> 3rd Qu.:17614 3rd Qu.:1078.0 3rd Qu.:52.0
#> Max. :23485 Max. :1205.0 Max. :99.0
#> Review.Text Rating Recommended.IND Positive.Feedback.Count
#> Length:23486 Min. :1.000 Min. :0.0000 Min. : 0.000
#> Class :character 1st Qu.:4.000 1st Qu.:1.0000 1st Qu.: 0.000
#> Mode :character Median :5.000 Median :1.0000 Median : 1.000
#> Mean :4.196 Mean :0.8224 Mean : 2.536
#> 3rd Qu.:5.000 3rd Qu.:1.0000 3rd Qu.: 3.000
#> Max. :5.000 Max. :1.0000 Max. :122.000
#> Division.Name Department.Name Class.Name
#> Length:23486 Length:23486 Length:23486
#> Class :character Class :character Class :character
#> Mode :character Mode :character Mode :character
#>
#>
#>
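Highlight: there are no NA values (note that empty strings in character columns, such as blank titles, are not counted as NA).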
str(review)
#> 'data.frame': 23486 obs. of 11 variables:
#> $ X : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ Clothing.ID : int 767 1080 1077 1049 847 1080 858 858 1077 1077 ...
#> $ Age : int 33 34 60 50 47 49 39 39 24 34 ...
#> $ Title : chr "" "" "Some major design flaws" "My favorite buy!" ...
#> $ Review.Text : chr "Absolutely wonderful - silky and sexy and comfortable" "Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have"| __truncated__ "I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small "| __truncated__ "I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!" ...
#> $ Rating : int 4 5 3 5 5 2 5 4 5 5 ...
#> $ Recommended.IND : int 1 1 0 1 1 0 1 1 1 1 ...
#> $ Positive.Feedback.Count: int 0 4 0 0 6 4 1 4 0 0 ...
#> $ Division.Name : chr "Initmates" "General" "General" "General Petite" ...
#> $ Department.Name : chr "Intimate" "Dresses" "Dresses" "Bottoms" ...
#> $ Class.Name : chr "Intimates" "Dresses" "Dresses" "Pants" ...
Highlight: some data types need adjusting (the categorical columns are stored as character or integer).
review

Data cleansing points:
Drop the columns X, Clothing.ID, and Title:
X, because the X column only contains the row number (irrelevant to the prediction process).
Clothing.ID, because the Clothing.ID column contains the product ID (irrelevant to the prediction process: too specific, since we are analyzing products in general rather than each product individually).
Title, because Review.Text carries a fuller description of the review, so the Title column isn't needed unless we want to analyze the titles on their own.
Group the Age column into ranges: < 36, 36 - 55, 56 - 70, and > 70.
Convert Age, Recommended.IND, Division.Name, Department.Name, and Class.Name to factors.

review.filter <- review %>%
select(-c(X, Clothing.ID, Title)) %>% # Drop column
mutate(Age = case_when(Age < 36 ~ "< 36", # Grouping Age column into group range
Age %in% seq(36, 55) ~ "36 - 55",
Age %in% seq(56, 70) ~ "56 - 70",
Age > 70 ~ "> 70",
T ~ as.character(Age)),
Age = as.factor(Age), # Change data type
Recommended.IND = as.factor(Recommended.IND),
Division.Name = as.factor(Division.Name),
Department.Name = as.factor(Department.Name),
Class.Name = as.factor(Class.Name))
review.filter

Next, check that the data is not a case of perfect separation, i.e., that Rating alone does not completely determine Recommended.IND:
table(review$Rating, review$Recommended.IND)
#>
#> 0 1
#> 1 826 16
#> 2 1471 94
#> 3 1682 1189
#> 4 168 4909
#> 5 25 13106
The frequency table shows that Rating alone does not separate the recommendation classes perfectly: every rating level contains both customers who recommend the product and customers who don't.
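The same check can be done programmatically. Below is a minimal base-R sketch that re-enters the counts from the frequency table above by hand (the object names are illustrative) and flags any rating level whose observations all fall into one class; if every level were flagged, Rating alone would separate the classes perfectly.

```r
# Counts from table(review$Rating, review$Recommended.IND), typed in by hand
counts <- matrix(c(826,    16,
                   1471,   94,
                   1682, 1189,
                   168,  4909,
                   25,  13106),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(Rating = as.character(1:5),
                                 Recommended.IND = c("0", "1")))

# A rating level is "pure" if one of its two cells is zero
pure_levels <- rownames(counts)[apply(counts == 0, 1, any)]
pure_levels   # character(0): no rating level is pure

# Share of customers who recommend the product at each rating level
share_recommend <- round(counts[, "1"] / rowSums(counts), 3)
share_recommend
```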
In this section we investigate whether a customer recommends a product, based on their review (text) and their rating. In other words, we split the analysis into two parts: prediction from the review text, and prediction from the non-text variables such as rating.
Naive Bayes is a learning algorithm commonly applied to text classification. The main idea is to classify a review by how strongly each of its words (counted by frequency) tends to appear in each class. In this case, we look at whether a word tends to appear in group 1 (the customer recommends the product) or group 0 (the customer does not recommend the product).
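To make the idea concrete, here is a minimal base-R sketch of a Bernoulli Naive Bayes classifier on a hypothetical toy presence/absence matrix. The helpers `train_bernoulli_nb()` and `predict_bernoulli_nb()` are illustrative, not part of e1071. The sketch uses Laplace smoothing (`laplace = 1`), whereas `naiveBayes()` in e1071 defaults to `laplace = 0`.

```r
# Fit a Bernoulli Naive Bayes model (illustrative helper, not the e1071 API)
# x: numeric 0/1 matrix (documents x terms); y: factor of class labels
train_bernoulli_nb <- function(x, y, laplace = 1) {
  classes <- levels(y)
  # Prior P(class)
  prior <- as.numeric(table(y)) / length(y)
  names(prior) <- classes
  # P(term present | class), smoothed so no probability is exactly 0 or 1
  cond <- t(vapply(classes, function(k) {
    xk <- x[y == k, , drop = FALSE]
    (colSums(xk) + laplace) / (nrow(xk) + 2 * laplace)
  }, numeric(ncol(x))))
  list(prior = prior, cond = cond)
}

# Posterior P(class | document) for each row of a 0/1 matrix
predict_bernoulli_nb <- function(model, x) {
  classes <- names(model$prior)
  log_post <- vapply(classes, function(k) {
    p <- model$cond[k, ]
    # log prior + log-likelihood of present terms + log-likelihood of absent terms
    as.vector(log(model$prior[[k]]) + x %*% log(p) + (1 - x) %*% log(1 - p))
  }, numeric(nrow(x)))
  log_post <- matrix(log_post, nrow = nrow(x), dimnames = list(NULL, classes))
  # Normalise with the log-sum-exp trick to avoid numerical underflow
  probs <- exp(log_post - apply(log_post, 1, max))
  probs / rowSums(probs)
}

# Toy data: 4 reviews, 3 terms; class "1" = recommends, "0" = does not
x <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 0, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("love", "great", "return")))
y <- factor(c("1", "1", "0", "0"), levels = c("0", "1"))

model <- train_bernoulli_nb(x, y)
new_review <- matrix(c(1, 0, 0), nrow = 1, dimnames = list(NULL, colnames(x)))
predict_bernoulli_nb(model, new_review)   # P(class "1") = 0.9
```

A review that contains "love" but neither "great" nor "return" leans strongly toward class 1, because "love" appears mostly in recommending reviews and "return" mostly in non-recommending ones.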
Now let’s prepare our data.
We select only the review content (column Review.Text) and whether the customer recommends the product (column Recommended.IND):
# Put the selected columns into a variable called "review.text"
review.text <- review.filter %>%
select(Recommended.IND, Review.Text)
review.text

Our data is ready, but as you can see below, the reviews are still raw:
review.text$Review.Text[1:3]
#> [1] "Absolutely wonderful - silky and sexy and comfortable"
#> [2] "Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite. i bought a petite and am 5'8\". i love the length on me- hits just a little below the knee. would definitely be a true midi on someone who is truly petite."
#> [3] "I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c"
Let's clean the text. To make the cleansing easier, we first convert it into a corpus (a collection of text documents) with the tm function VCorpus():
text.corpus <- VCorpus(VectorSource(review.text$Review.Text))

# 1. Transform all text into lower case
text.corpus <- tm_map(x = text.corpus, content_transformer(tolower))
# 2. Remove all numbers in text
text.corpus <- tm_map(x = text.corpus, removeNumbers)
# 3. remove stop-words
text.corpus <- tm_map(x = text.corpus, removeWords, stopwords("english"))

Create a function that removes a given character (it replaces each match of the pattern with an empty string):
removeChar <- content_transformer(FUN = function(x, pattern){
gsub(x = x,
pattern = pattern,
replacement = "")
})

# 4. remove characters
text.corpus <- tm_map(text.corpus, removeChar, "/")
text.corpus <- tm_map(text.corpus, removeChar, "@")
text.corpus <- tm_map(text.corpus, removeChar, "-")
text.corpus <- tm_map(text.corpus, removeChar, "\\.")
# 5. remove all punctuation
text.corpus <- tm_map(text.corpus, removePunctuation)
# 6. Remove white space
text.corpus <- tm_map(text.corpus, stripWhitespace)

Let's look at a sample of the cleaned text:
lapply(text.corpus[1:5]$content, as.character)
#> [[1]]
#> [1] "absolutely wonderful silky sexy comfortable"
#>
#> [[2]]
#> [1] "love dress sooo pretty happened find store glad bc never ordered online bc petite bought petite love length hits just little knee definitely true midi someone truly petite"
#>
#> [[3]]
#> [1] " high hopes dress really wanted work initially ordered petite small usual size found outrageously small small fact zip reordered petite medium just ok overall top half comfortable fit nicely bottom half tight layer several somewhat cheap net layers imo major design flaw net layer sewn directly zipper c"
#>
#> [[4]]
#> [1] " love love love jumpsuit fun flirty fabulous every time wear get nothing great compliments"
#>
#> [[5]]
#> [1] " shirt flattering due adjustable front tie perfect length wear leggings sleeveless pairs well cardigan love shirt"
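For intuition, the same cleaning steps can be mimicked with base-R regular expressions on a single string. This is a minimal sketch, not how tm implements them: `clean_review()` is a hypothetical helper, the stop-word list is a tiny stand-in for `stopwords("english")`, and the separate character removals are folded into one punctuation step.

```r
# Hypothetical helper mirroring the tm_map() steps above with base R
clean_review <- function(x, stop_words) {
  x <- tolower(x)                                  # 1. lower case
  x <- gsub("[[:digit:]]+", "", x)                 # 2. remove numbers
  stop_pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(stop_pattern, "", x, perl = TRUE)      # 3. remove stop words (whole words only)
  x <- gsub("[[:punct:]]+", "", x)                 # 4-5. remove characters and punctuation
  gsub("[[:space:]]+", " ", x)                     # 6. collapse runs of white space
}

clean_review("Absolutely wonderful - silky and sexy and comfortable",
             stop_words = c("and"))
# "absolutely wonderful silky sexy comfortable"

clean_review("I had such high hopes", stop_words = c("i", "had", "such"))
# " high hopes"
```

The second call also explains the leading space in samples 3 to 5 above: when a review starts with stop words, removing them leaves white space at the start, and collapsing white space reduces it to a single space rather than trimming it.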
# 7. Lemmatize: reduce each word to its dictionary form (lemma) using the lexicon::hash_lemmas dictionary
text.corpus <- lemmatize_strings(lapply(text.corpus$content, as.character), dictionary = lexicon::hash_lemmas)

To relate words to our target variable, we convert the text data into a numeric matrix using the DocumentTermMatrix() function. In a document-term matrix, each row represents one review (document), each column represents one term (word), and each cell holds the number of times that term occurs in that review.
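To illustrate that structure, here is a minimal base-R sketch that builds a small document-term matrix by hand from three hypothetical, already-cleaned reviews. tm's `DocumentTermMatrix()` stores the same kind of counts, but in a sparse format.

```r
# Three hypothetical reviews, already cleaned
docs <- c("love dress love", "dress fit", "fit size love")

# Split each review into terms and collect the vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every review
counts_per_doc <- vapply(tokens,
                         function(tk) as.integer(table(factor(tk, levels = vocab))),
                         integer(length(vocab)))

# vapply() returns terms x documents; transpose so rows = documents, columns = terms
dtm <- t(counts_per_doc)
dimnames(dtm) <- list(Docs = as.character(seq_along(docs)), Terms = vocab)
dtm
```

Review 1 gets a 2 in the "love" column because the word occurs twice, and every term a review does not contain is a 0, which is why the real matrix, with its 12,812 terms, is almost entirely zeros.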
Let’s create the DTM and see the result:
#Create dtm
text.dtm <- DocumentTermMatrix(x = text.corpus)
#inspect dtm result
inspect(text.dtm)
#> <<DocumentTermMatrix (documents: 23486, terms: 12812)>>
#> Non-/sparse entries: 561501/300341131
#> Sparsity : 100%
#> Maximal term length: 32
#> Weighting : term frequency (tf)
#> Sample :
#> Terms
#> Docs color dress fit good like look love size top wear
#> 11072 0 0 2 0 0 1 0 2 1 1
#> 12348 0 0 0 0 1 1 0 1 0 2
#> 1238 0 1 1 1 1 1 0 1 2 2
#> 15453 0 0 0 2 0 0 0 1 0 0
#> 15501 0 2 0 0 0 0 1 0 0 5
#> 21091 2 0 0 0 1 1 0 2 0 1
#> 21176 0 0 0 0 0 1 1 1 0 2
#> 3474 0 0 3 1 0 0 0 0 0 1
#> 3883 1 0 1 2 0 0 2 0 2 0
#> 5448 1 0 1 0 1 3 0 0 1 0
To prepare our model, let's split the data into a training set (used to build the model) and a test set (used to evaluate its predictions).
RNGkind(sample.kind = "Rounding")
set.seed(77)
# create sample index for splitting data
num.row <- sample(nrow(text.dtm), nrow(text.dtm)*0.75)
# Split data into data train and testing
text.train <- text.dtm[num.row,]
text.test <- text.dtm[-num.row,]

We also need to save the actual values of Recommended.IND that correspond to the text.train and text.test rows, so that we can evaluate the predictions later:
#Actual value (target variable) for data train
train.actual <- review.text[num.row,"Recommended.IND"]
#Actual value (target variable) for data test
test.actual <- review.text[-num.row, "Recommended.IND"]

Now that we have the term-count matrix, we limit the vocabulary to the most frequent words. Given the size of the data, I set the minimum frequency to 100.
# Keep only the words that appear at least 100 times in our data train
text.frequency <- findFreqTerms(text.train, lowfreq = 100)
# Subset our data train to these words
text.train <- text.train[, text.frequency]
# Use the same vocabulary for our data test
text.test <- text.test[, text.frequency]
# Inspect data train
inspect(text.train)
#> <<DocumentTermMatrix (documents: 17614, terms: 640)>>
#> Non-/sparse entries: 355719/10917241
#> Sparsity : 97%
#> Maximal term length: 13
#> Weighting : term frequency (tf)
#> Sample :
#> Terms
#> Docs color dress fit good like look love size top wear
#> 10140 3 0 2 1 1 0 0 1 1 0
#> 12348 0 0 0 0 1 1 0 1 0 2
#> 1238 0 1 1 1 1 1 0 1 2 2
#> 12812 0 0 0 0 0 1 1 0 0 1
#> 12994 0 0 2 0 0 1 2 0 4 0
#> 14963 0 0 0 1 3 0 1 0 5 1
#> 21091 2 0 0 0 1 1 0 2 0 1
#> 22039 0 1 2 1 2 0 0 1 4 1
#> 5448 1 0 1 0 1 3 0 0 1 0
#> 6317 0 0 0 1 2 5 0 0 3 2
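The frequency filter can be sketched in base R on a hypothetical toy count matrix. Sum each term's counts over all documents and keep the terms whose total reaches the threshold. `findFreqTerms()` keeps totals that are greater than or equal to `lowfreq`.

```r
# Hypothetical toy counts: 3 documents x 3 terms
counts <- matrix(c(3, 0, 1,
                   0, 2, 0,
                   4, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("dress", "fit", "zipper")))

lowfreq <- 3
term_totals <- colSums(counts)                  # dress = 7, fit = 3, zipper = 1
frequent <- names(term_totals)[term_totals >= lowfreq]
counts[, frequent, drop = FALSE]                # keeps "dress" and "fit"
```

A term exactly at the threshold ("fit", with a total of 3) is kept, which is why the comment in the code above reads "at least 100".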
Create a function that converts the word counts into presence/absence values.
Condition: if a word appears in a review (count > 0), set it to 1 (True); if it does not appear (count == 0), set it to 0 (False).
bernoulli_conv <- function(x){
x <- as.factor(ifelse(x > 0, 1, 0))
return(x)
}

# Apply the function to our data train and data test
text.train.bn <- apply(X = text.train, MARGIN = 2, FUN = bernoulli_conv)
text.test.bn <- apply(X = text.test, MARGIN = 2, FUN = bernoulli_conv)

# Quick check of the result
text.train.bn[1:5, 1:6]
#> Terms
#> Docs absolutely comfortable sexy wonderful buy definitely
#> 6842 "0" "1" "0" "0" "0" "0"
#> 16850 "0" "0" "0" "0" "0" "0"
#> 20253 "0" "0" "0" "0" "0" "0"
#> 22311 "0" "0" "0" "0" "0" "0"
#> 17350 "0" "0" "0" "0" "0" "0"
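Note the quotes around "0" and "1" in the output. `apply()` assembles its results into a matrix, and the factors returned by `bernoulli_conv()` end up as character values, so `text.train.bn` is a character matrix rather than a set of factors. `naiveBayes()` treats non-numeric columns as categorical, so the model still works, but the intermediate object is not what the function's name suggests. A minimal sketch on a hypothetical toy count matrix shows this behavior, along with one way to keep real factors:

```r
bernoulli_conv <- function(x) as.factor(ifelse(x > 0, 1, 0))

# Hypothetical toy counts: 3 documents x 2 terms
counts <- matrix(c(2, 0,
                   0, 1,
                   1, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("dress", "fit")))

presence <- apply(counts, MARGIN = 2, FUN = bernoulli_conv)
typeof(presence)   # "character": the factor class is lost

# Keeping real factors requires building a data frame column by column
presence_df <- as.data.frame(lapply(as.data.frame(counts), bernoulli_conv))
str(presence_df)
```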
With the preparation done, we can now build our Naive Bayes model:
# Build the model with data train; y is the actual value from the data set
model.nb <- naiveBayes(x=text.train.bn , y=train.actual)

Our Naive Bayes model is ready. Let's use it to predict our data test:
text.prediction <- predict(model.nb, newdata = text.test.bn)
#Quick check our model prediction
text.prediction[1:5]
#> [1] 0 0 1 1 0
#> Levels: 0 1
confusionMatrix(data = text.prediction, reference = test.actual, positive="1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 672 411
#> 1 356 4433
#>
#> Accuracy : 0.8694
#> 95% CI : (0.8605, 0.8779)
#> No Information Rate : 0.8249
#> P-Value [Acc > NIR] : <0.0000000000000002
#>
#> Kappa : 0.5571
#>
#> Mcnemar's Test P-Value : 0.0512
#>
#> Sensitivity : 0.9152
#> Specificity : 0.6537
#> Pos Pred Value : 0.9257
#> Neg Pred Value : 0.6205
#> Prevalence : 0.8249
#> Detection Rate : 0.7549
#> Detection Prevalence : 0.8156
#> Balanced Accuracy : 0.7844
#>
#> 'Positive' Class : 1
#>
Confusion Matrix Interpretation:
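For reference, the main statistics reported by `confusionMatrix()` follow directly from the four cell counts. The base-R sketch below recomputes them with positive class "1". The counts are copied from the output above, and the rounded values match it.

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference)
tn <- 672    # predicted 0, actual 0
fn <- 411    # predicted 0, actual 1
fp <- 356    # predicted 1, actual 0
tp <- 4433   # predicted 1, actual 1

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
sensitivity <- tp / (tp + fn)   # recall of the positive class "1"
specificity <- tn / (tn + fp)   # recall of the negative class "0"
precision   <- tp / (tp + fp)   # reported by caret as Pos Pred Value
neg_pred    <- tn / (tn + fn)   # reported by caret as Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        precision = precision, neg_pred = neg_pred, balanced = balanced), 4)
```

The gap between sensitivity (0.9152) and specificity (0.6537) shows that the text model is much better at recognizing recommending reviews than non-recommending ones.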
To find the words that best describe our customer reviews, let's list the most frequent words in the data.
text.dtm
# Total count of each term over all reviews
# (slam::col_sums works on the sparse matrix directly; as.matrix() would build a dense 23486 x 12812 matrix)
words.list <- sort(slam::col_sums(text.dtm), decreasing = T)
words.df <- data.frame(word = names(words.list), freq=words.list)
words.df
#prepare color for plot
colors.viz <- brewer.pal(10, "Set3")
#Visualization
barplot(height = words.df$freq[1:10], names = words.df$word[1:10], main = "Most Frequent Words", col = colors.viz)
From the bar plot we can see that the most frequently used words include dress, fit, size, and love.
Insights:
Our customer reviews frequently mention dress, fit (which can be read as how well the item fits), and size.
We can assume that our most frequently reviewed products are dresses and tops.
We can assume that many customers are concerned about fit, size, and color.
Visualization (Word Cloud)
The bar plot above gives a glimpse of the most frequently used words, but a bar plot cannot show many words at once, so it doesn't tell us much more about our customers' reviews (we can't plot 100 words in a bar plot, right?).
Therefore, we create a word cloud to capture more of the words that frequently appear in the review text:
#Prepare color palette
colors.wc <- brewer.pal(10, "Spectral")
#take 200 most frequent words in our text review
wordcloud(words = words.df$word, freq = words.df$freq,
max.words = 200, random.order = F,
colors = colors.wc)

With more words shown in the word cloud, we get a fuller picture of our customer reviews:
Insight
Based on the model evaluation (using the confusion matrix), we can conclude:
Our text mining prediction performs well in accuracy (86.9%), and especially in precision (Pos Pred Value 92.6%), which is our main concern for this model. We can also see that the most frequent words in our customer reviews are dress, fit, and size.
What if our concern is to better predict the customers who will not recommend a product, so that we can evaluate the products with low recommendation? (Focus on True Negatives.)
If that is our focus, the metric to pay more attention to is specificity. Our specificity is quite low (65%), so we could try to improve the model by balancing the data, so that both values of the target variable are represented in similar proportions:
prop.table(table(review.text$Recommended.IND))
#>
#> 0 1
#> 0.1776377 0.8223623
A balanced proportion of the target variable might help our model learn both classes (recommend and not recommend) fairly. However, since our data contains more information that we haven't analyzed yet, let's try using those variables instead.
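Balancing is not applied in this article. As an illustration only, here is a minimal base-R sketch of random downsampling: every class is reduced to the size of the smallest class by random sampling. `downsample_classes()` is a hypothetical helper (caret also provides `downSample()` for this). Balancing should only be applied to the training set, so that the test set keeps the real class proportions.

```r
# Hypothetical helper: randomly downsample every class to the size of the smallest one
downsample_classes <- function(df, target, seed = 77) {
  set.seed(seed)
  rows_by_class <- split(seq_len(nrow(df)), df[[target]])
  n_min <- min(lengths(rows_by_class))
  keep <- unlist(lapply(rows_by_class,
                        function(idx) idx[sample.int(length(idx), n_min)]))
  df[sort(keep), , drop = FALSE]
}

# Toy data with roughly the same imbalance as Recommended.IND (about 18% / 82%)
toy <- data.frame(Recommended.IND = factor(c(rep("0", 18), rep("1", 82))),
                  x = seq_len(100))
balanced <- downsample_classes(toy, "Recommended.IND")
table(balanced$Recommended.IND)   # 18 of each class
```

The trade-off is that downsampling discards majority-class rows. With about 4,000 non-recommending reviews in this data set, that would still leave a sizeable training set.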
Let's look at our data frame again to recall what it looks like:
review.filter

This time we won't use the review text. Instead, we look at the other variables that may affect Recommended.IND, so let's drop the Review.Text column:
review.non.text <- review.filter %>% select(-Review.Text)
review.non.text

Let's prepare our data train and data test:
RNGkind(sample.kind = "Rounding")
set.seed(77)
index.num <- initial_split(review.non.text, prop = 0.75, strata = Recommended.IND)
review.train <- training(index.num)
review.test <- testing(index.num)

nrow(review.train)
#> [1] 17614
nrow(review.test)
#> [1] 5872
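Passing `strata = Recommended.IND` asks `initial_split()` to sample within each class, so the training and test sets keep roughly the original 18% / 82% split. Below is a minimal base-R sketch of the same idea on hypothetical toy labels. `stratified_index()` is an illustrative helper, not the rsample implementation.

```r
# Hypothetical helper: sample `prop` of the rows within each class
stratified_index <- function(y, prop = 0.75, seed = 77) {
  set.seed(seed)
  rows_by_class <- split(seq_along(y), y)
  train_rows <- lapply(rows_by_class, function(idx)
    idx[sample.int(length(idx), round(length(idx) * prop))])
  sort(unlist(train_rows, use.names = FALSE))
}

y <- factor(c(rep("0", 20), rep("1", 80)))
train_idx <- stratified_index(y)

prop.table(table(y[train_idx]))    # same 20% / 80% split as the full data
prop.table(table(y[-train_idx]))   # and in the held-out rows
```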
Build the model on our data train:
model.nb.nonText <- naiveBayes(Recommended.IND ~ ., data = review.train)

Let's predict with the model, setting the type parameter to "class" so that we get the predicted class for each observation:
recommend.prediction <- predict(model.nb.nonText, newdata = review.test , type="class")
confusionMatrix(data = recommend.prediction, reference = review.test$Recommended.IND , positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 995 324
#> 1 48 4505
#>
#> Accuracy : 0.9366
#> 95% CI : (0.9301, 0.9427)
#> No Information Rate : 0.8224
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.8035
#>
#> Mcnemar's Test P-Value : < 0.00000000000000022
#>
#> Sensitivity : 0.9329
#> Specificity : 0.9540
#> Pos Pred Value : 0.9895
#> Neg Pred Value : 0.7544
#> Prevalence : 0.8224
#> Detection Rate : 0.7672
#> Detection Prevalence : 0.7754
#> Balanced Accuracy : 0.9434
#>
#> 'Positive' Class : 1
#>
Confusion Matrix Interpretation
Our Naive Bayes classifier predicts the target variable very well from the non-text variables, as the confusion matrix shows (accuracy 93.7%, specificity 95.4%). Another way to see how well the model separates the classes of the target variable is to look at the ROC curve and the AUC.
A ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. Let's create the ROC curve to evaluate our model:
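To see what "all classification thresholds" means, here is a minimal base-R sketch that computes ROC points by hand from hypothetical scores and labels. Each distinct score is used as a threshold, and at each one we record the false positive rate and the true positive rate. ROCR's `prediction()` and `performance()` do this, and more, for the real model.

```r
# Hypothetical predicted probabilities of class "1" and true labels (1 = positive)
scores <- c(0.95, 0.80, 0.70, 0.40, 0.30)
labels <- c(1, 1, 0, 1, 0)

roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  # Predict "1" whenever score >= threshold, then compare with the labels
  tpr <- vapply(thresholds, function(t) mean(scores[labels == 1] >= t), numeric(1))
  fpr <- vapply(thresholds, function(t) mean(scores[labels == 0] >= t), numeric(1))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(scores, labels)
roc
```

Plotting `tpr` against `fpr`, starting from the point (0, 0), draws the toy ROC curve. Lowering the threshold can only move the curve up (more true positives) or right (more false positives).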
review.test$prediction <- predict(model.nb.nonText, newdata = review.test , type="raw")
head(review.test$prediction, 3)
#> 0 1
#> [1,] 0.001733000 0.9982670
#> [2,] 0.001077287 0.9989227
#> [3,] 0.029367568 0.9706324
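As we can see above, with type = "raw" the model returns the predicted probability of each class for every observation. The "1" column is the probability that the customer recommends the product, and we use it as the score for the ROC curve: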
# object prediction
ROC.prediction <- prediction(predictions = review.test$prediction[,"1"], labels = review.test$Recommended.IND)

# ROC curve
par(mfrow=c(1,2))
plot(performance(prediction.obj = ROC.prediction, measure = "tpr", x.measure = "fpr"), col = "green", main = "True Positive ROC")
abline(0,1, lty= 2)
plot(performance(prediction.obj = ROC.prediction, measure = "tnr", x.measure = "fnr"), col = "red", main = "True Negative ROC")
abline(0,1, lty= 2)
ROC Interpretation
Plot 1 (True Positive ROC)
As we see in the plot above, the curve (green line) rises close to a True Positive Rate of 1 (Y axis) while the False Positive Rate (X axis) is still low. This shows that our model performs very well at classifying the positive class.
Note: if our model performed poorly, for example with a True Positive Rate of 50% at a False Positive Rate of 50%, the green line would lie close to the diagonal line.
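Plot 2 (True Negative ROC)
The interpretation is the same as for plot 1, but here the curve (red line) shows the True Negative Rate against the False Negative Rate. Looking closely, the two curves differ slightly: the curve in plot 2 gets closer to 1 on the Y axis. Keep in mind, though, that since TNR = 1 - FPR and FNR = 1 - TPR, plot 2 is the ROC curve from plot 1 reflected across the line from (0, 1) to (1, 0), so both curves enclose the same area. The difference in shape only shows at which thresholds each class is separated well. At the default threshold, the confusion matrix points the same way: specificity (0.9540) is slightly higher than sensitivity (0.9329).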
AUC stands for "Area Under the ROC Curve": it measures the entire two-dimensional area underneath the ROC curve, and ranges from 0 to 1.
The closer the AUC is to 1, the better the model separates the positive and negative classes. The ROC curve shows the trade-off between the true positive rate and the false positive rate (or the true negative rate and the false negative rate) across thresholds. The AUC summarizes, in a single number, a binary classification model's ability to separate the positive class from the negative class.
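The AUC also has a useful probabilistic reading. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. Below is a minimal base-R sketch that computes it from ranks (the Mann-Whitney formulation) on hypothetical scores. `auc_rank()` is an illustrative helper.

```r
# Hypothetical predicted probabilities of class "1" and true labels (1 = positive)
scores <- c(0.95, 0.80, 0.70, 0.40, 0.30)
labels <- c(1, 1, 0, 1, 0)

auc_rank <- function(scores, labels) {
  r <- rank(scores)                 # ties receive their average rank
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # Sum of the positive ranks minus its smallest possible value, scaled to [0, 1]
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(scores, labels)   # 5 of the 6 positive/negative pairs are ordered correctly: 5/6
```

An AUC of about 0.97, as in our model, therefore means that a recommending customer is scored above a non-recommending one in roughly 97% of such pairs.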
Given the shape of both ROC curves, we can expect an AUC close to 1. Let's check:
#create AUC object
AUC.review <- performance(prediction.obj = ROC.prediction, measure = "auc")
str(AUC.review)
#> Formal class 'performance' [package "ROCR"] with 6 slots
#> ..@ x.name : chr "None"
#> ..@ y.name : chr "Area under the ROC curve"
#> ..@ alpha.name : chr "none"
#> ..@ x.values : list()
#> ..@ y.values :List of 1
#> .. ..$ : num 0.972
#> ..@ alpha.values: list()
#Inspect AUC value
AUC.review@y.values
#> [[1]]
#> [1] 0.9715397
As expected, the AUC (0.972) is high and very close to 1, which confirms our model's performance.
A decision tree is a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. Let's recall our data frame:
review.non.text

From our data frame, we may suspect that the recommendation value is correlated with the rating value. Let's check:
table(review.train$Recommended.IND, review.train$Rating)
#>
#> 1 2 3 4 5
#> 0 637 1086 1259 130 17
#> 1 14 74 907 3653 9837
Some customers give a product a rating of 5 but do not recommend it (17 in the training data), and some give a rating of 1 yet still recommend it (14).
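For now, let's create the decision tree and interpret the result to get a clearer idea of how a decision tree works. We use ctree() from partykit, which builds a conditional inference tree: it chooses each split using statistical significance tests.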
#create decision tree
model.dtree <- ctree(formula = Recommended.IND ~., #create formula (target variable ~ . (using all variable as predictor))
data = review.train, # use our data train to create model
control = ctree_control(mincriterion=.95, #mincriterion = 1 - alpha: split a node only where the p-value < 0.05
minsplit=0, #minimum sum of weights (observations) in a node for it to be considered for splitting
minbucket=0)) #minimum sum of weights (observations) in each terminal node
plot(model.dtree, type = "simple")
Decision Tree Interpretation
Note: as seen above, most of the splits in our tree are on Rating. This indicates that only a few variables affect the target variable (in other words, Rating has the largest effect on the recommendation value), so a decision tree might not be very useful for our case. For now, though, let's inspect the result; we can still get some insights:
Condition:
recommendation == 0 (the customer does not recommend the product)
recommendation == 1 (the customer recommends the product)
Root node (Rating <= 3 vs. Rating > 3)
When Rating is <= 3:
recommendation == 0 with a probability of 58.1%
recommendation == 0 with a probability of 93.6%
– Rating <= 1: recommendation == 0 with a probability of 97.8%
When Rating is > 3:
recommendation == 1 with a probability of 99.2%
recommendation == 1 with a probability of 97.3%
with a probability of 94.9%
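The percentages in the Rating <= 3 branch can be cross-checked against the training-set frequency table shown earlier. The base-R sketch below types those counts in by hand and computes the share of non-recommending customers at each rating level.

```r
# Counts from table(review.train$Recommended.IND, review.train$Rating)
train_counts <- rbind(`0` = c(637, 1086, 1259,  130,   17),
                      `1` = c(14,    74,  907, 3653, 9837))
colnames(train_counts) <- as.character(1:5)

# Percentage of customers at each rating level who do NOT recommend the product
share_not_recommend <- round(100 * train_counts["0", ] / colSums(train_counts), 1)
share_not_recommend
```

Ratings 1, 2, and 3 give 97.8%, 93.6%, and 58.1%. These match the three Rating <= 3 percentages above, which suggests those leaves correspond to single rating levels. The Rating > 3 percentages (99.2%, 97.3%, 94.9%) do not match the recommendation share for rating 4 (96.6%) or rating 5 (99.8%), so that branch presumably involves further splits that the list does not show.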
Now that we've seen the decision tree model, let's predict our data test with it and evaluate the predictions:
predict.dtree <- predict(model.dtree, newdata =review.test)
confusionMatrix(data = predict.dtree, reference = review.test$Recommended.IND, positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 997 304
#> 1 46 4525
#>
#> Accuracy : 0.9404
#> 95% CI : (0.934, 0.9463)
#> No Information Rate : 0.8224
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.814
#>
#> Mcnemar's Test P-Value : < 0.00000000000000022
#>
#> Sensitivity : 0.9370
#> Specificity : 0.9559
#> Pos Pred Value : 0.9899
#> Neg Pred Value : 0.7663
#> Prevalence : 0.8224
#> Detection Rate : 0.7706
#> Detection Prevalence : 0.7784
#> Balanced Accuracy : 0.9465
#>
#> 'Positive' Class : 1
#>
Confusion Matrix Interpretation
Our decision tree model also performs well: accuracy 94.0%, sensitivity 93.7%, specificity 95.6%, and positive predictive value 99.0%. Overall, our decision tree predicts the target variable very well.
Having evaluated all our models, we can conclude that all of them predict the target variable well.
2. Without the review text, both Naive Bayes and the decision tree perform very well at predicting our target variable. For this specific case, my own preference is the Naive Bayes model, as it performs well and has a low computational cost.