Social media sources deliver a massive amount of valuable textual data on a daily basis. A numerical value can be extracted from these data so that whole sentences, or more broadly, posts, can be summarized. This value is often called ‘sentiment’, as it encapsulates the emotional background of the corresponding expression. Sentiment analysis makes it possible to gather overall social feedback on particular goods, events or even people, and it helps in areas such as consumer analysis and marketing.
The paper by Go, Bhayani and Huang (2009) is the most notable article to have laid the groundwork for the field of sentiment analysis. In this short report, I will try to reproduce their approach, but on a different data set. In their paper, Go et al. present an analysis of sentiment in Twitter posts - tweets. Tweets are short messages posted on a public website, in which people mostly express their opinions on a wide range of topics. Go et al. train several classifiers to check whether they can effectively determine whether a particular tweet carries rather positive or negative sentiment. To train a classifier that can capture the sentiment of textual data, a labeled data set is necessary. Unfortunately, Twitter does not let users mark their tweets with a sentiment measure, so Go et al. make use of emoticons: if a tweet contains emoticons such as ‘:)’ or ‘:D’, they treat it as a positive one, whereas if it contains emoticons such as ‘:(’ or ‘:-(’, they treat it as a negative one. The classification methods they utilize include SVM, Naive Bayes and logistic regression (maximum entropy), as well as a keyword-based baseline that assigns a sentiment to each word. Their results indicate that the accuracy of the machine learning methods is above 80%, which is relatively good.
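As a rough illustration of this labelling idea (my own sketch with abbreviated emoticon lists, not Go et al.'s actual pipeline):
# Label a tweet by the emoticons it contains, in the spirit of Go et al.
labelByEmoticon <- function(tweet) {
  if (grepl(":\\)|:D", tweet)) "positive"
  else if (grepl(":\\(|:-\\(", tweet)) "negative"
  else NA_character_                               # unlabeled, not used for training
}
labelByEmoticon("finally finished the report :)")  # "positive"
# Go et al. also strip the emoticons from the training text, so the classifier
# must rely on the remaining words.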
In this report, I would like to use another data set, from a different area. Game reviews are a way for developers to learn how well they did their job: either a game receives good, positive reviews, or it is completely dragged through the mud. When a game is reviewed very well, developers usually give it another run by releasing a sequel with a new story, graphics or audio. The attitude towards sequels is mixed, though: sometimes it is better not to touch a masterpiece, yet several times in the history of gaming developers have managed to deliver another episode of excellent-quality entertainment. Given that, I would like to extend this report in another direction, by measuring the difference in sentiment between original games and their sequels.
The data consists of reviews in the form of JSON documents fetched through Steam’s API for Mass Effect 1 and 2 and Mafia 1 and 2. The data is not provided here, but feel free to download the reviews using the API: https://partner.steamgames.com/doc/store/getreviews. The API call returns several thousand of the most recent reviews for a particular title. A single JSON contains the author, a timestamp, the review text, popularity (the number of times the review has been liked) and other features that are not needed here.
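For reference, here is a minimal sketch of one such call; the endpoint and the json flag follow the linked documentation, the remaining parameter values are illustrative, and pagination through the returned cursor is omitted:
# One page of reviews for Mass Effect (appid 17460), per the getreviews documentation
resp <- fromJSON("https://store.steampowered.com/appreviews/17460?json=1&num_per_page=100")
# resp$reviews then holds the individual review objects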
# Packages used in this report
library(jsonlite)                     # fromJSON
library(dplyr)                        # filter, %>%
library(tm)                           # corpus cleaning, DTM/TDM
library(ggplot2)                      # frequency plots
library(knitr); library(kableExtra)   # tables
library(wordcloud2); library(htmlwidgets); library(webshot)  # wordclouds
library(SentimentAnalysis)            # dictionary-based sentiment scores
library(caret)                        # model training and evaluation
# Mass Effect reviews
me <- fromJSON("me.json", flatten = TRUE)
me1.orig <- me %>% filter(product_id == "17460")   # Mass Effect 1
me2.orig <- me %>% filter(product_id == "24980")   # Mass Effect 2
me$product_id <- ifelse(me$product_id == "17460", "Mass Effect 1", "Mass Effect 2")
me$gameplay <- cut(me$hours, breaks = 5, include.lowest = TRUE, dig.lab = 4)
# Mafia reviews
mafia <- fromJSON("mafia.json", flatten = TRUE)
mafia1.orig <- mafia %>% filter(product_id == "40990")   # Mafia 1
mafia2.orig <- mafia %>% filter(product_id == "50130")   # Mafia 2
mafia$product_id <- ifelse(mafia$product_id == "40990", "Mafia 1", "Mafia 2")
mafia$gameplay <- cut(mafia$hours, breaks = 5, include.lowest = TRUE, dig.lab = 4)
# Build a cleaned corpus from the raw review text and return both the
# document-term and the term-document matrix.
createTdmDtm <- function(text) {
  toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
  corp <- Corpus(VectorSource(text))
  corp <- tm_map(corp, toSpace, "/")
  corp <- tm_map(corp, toSpace, "@")
  corp <- tm_map(corp, toSpace, "\\|")
  corp <- tm_map(corp, toSpace, "[^\x01-\x7F]")   # drop non-ASCII characters
  corp <- tm_map(corp, toSpace, "¦")
  corp <- tm_map(corp, stripWhitespace)
  corp <- tm_map(corp, removePunctuation)
  corp <- tm_map(corp, removeNumbers)
  corp <- tm_map(corp, content_transformer(tolower))
  corp <- tm_map(corp, removeWords, stopwords("english"))
  corp <- tm_map(corp, stemDocument)               # reduce words to root forms
  dtm <- DocumentTermMatrix(corp)
  tdm <- TermDocumentMatrix(corp)
  list(dtm, tdm)
}
# Plot the most frequent stems of an original game (DTM1) and its sequel (DTM2) side by side.
mostCommonWordsPlt <- function(DTM1, DTM2, sparsity = 0.9, gameName1, gameName2) {
  freq1 <- colSums(as.matrix(removeSparseTerms(DTM1, sparsity)))   # first game passed in
  freq2 <- colSums(as.matrix(removeSparseTerms(DTM2, sparsity)))   # second game passed in
  freq <- full_join(
    data.frame(word = names(freq1), freq1 = freq1),
    data.frame(word = names(freq2), freq2 = freq2),
    by = "word")
  freq %>%
    subset(., freq1 > 30 | freq2 > 30) %>%
    head(20) %>%
    ggplot(aes(x = reorder(word, -freq1))) +
    geom_bar(aes(y = freq1, fill = "Original"), stat = "identity", alpha = 0.7) +
    geom_bar(aes(y = freq2, fill = "Sequel"), stat = "identity", alpha = 0.7) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(x = "Words", y = "Frequencies",
         title = paste0("Words frequencies for ", gameName1, " and ", gameName2)) +
    scale_fill_manual(name = NULL, values = c(Original = "lightblue", Sequel = "red")) +
    theme(legend.key.size = unit(0.25, "cm"), legend.position = "bottom",
          plot.title = element_text(size = 8))
}
The first step is to read the review text into R, tokenize the words, remove unnecessary words, punctuation and special characters (which are plentiful in data coming from the Internet and carry little information), and finally stem the words to their root forms.
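The document-term and term-document matrices used throughout the rest of the report are produced by the helper defined above; the intermediate names below (me1.mats and so on) are just mine:
# Build a DTM/TDM pair for each game; createTdmDtm() returns list(dtm, tdm)
me1.mats <- createTdmDtm(me1.orig$text);       me1.DTM <- me1.mats[[1]];       me1.TDM <- me1.mats[[2]]
me2.mats <- createTdmDtm(me2.orig$text);       me2.DTM <- me2.mats[[1]];       me2.TDM <- me2.mats[[2]]
mafia1.mats <- createTdmDtm(mafia1.orig$text); mafia1.DTM <- mafia1.mats[[1]]; mafia1.TDM <- mafia1.mats[[2]]
mafia2.mats <- createTdmDtm(mafia2.orig$text); mafia2.DTM <- mafia2.mats[[1]]; mafia2.TDM <- mafia2.mats[[2]]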
Each of the games has the following number of reviews:
rbind(
c("", "Mass Effect", "Mafia"),
cbind(
rbind("Original",
"Sequel"),
rbind(nrow(me1.orig),
nrow(me2.orig)),
rbind(nrow(mafia1.orig),
nrow(mafia2.orig))
)
) %>%
kable() %>%
kable_styling() %>%
row_spec(1, bold = T)
| | Mass Effect | Mafia |
| Original | 6747 | 503 |
| Sequel | 7151 | 10937 |
Let us take a quick detour from the main research topic and have a look at the most common root forms in each of the games. Every title has the word game itself among its most popular terms, and the name of the game shows up in most reviews. Story and play are also common. Mass Effect is more of an RPG/shooter, so story is an important review word for this single-player genre; Mafia is a shooter with a strong story. Many reviews also contain words such as great or good, which suggests that the ratings are mostly positive.
df <- rbind(c(
"Mass Effect 1", "", "Mass Effect 2", "", "Mafia 1", "", "Mafia 2", ""
),
cbind(
head(names(sort(rowSums(as.matrix(me1.TDM)), T)), 10),
head(sort(rowSums(as.matrix(me1.TDM)), T), 10),
head(names(sort(rowSums(as.matrix(me2.TDM)), T)), 10),
head(sort(rowSums(as.matrix(me2.TDM)), T), 10),
head(names(sort(rowSums(as.matrix(mafia1.TDM)), T)), 10),
head(sort(rowSums(as.matrix(mafia1.TDM)), T), 10),
head(names(sort(rowSums(as.matrix(mafia2.TDM)), T)), 10),
head(sort(rowSums(as.matrix(mafia2.TDM)), T), 10)
)
)
rownames(df) <- NULL
kable(df, caption = "Number of most common terms occurences") %>%
kable_styling() %>%
row_spec(1, bold = T)
| Mass Effect 1 | | Mass Effect 2 | | Mafia 1 | | Mafia 2 | |
| game | 11185 | game | 10480 | game | 1041 | game | 17736 |
| play | 3931 | effect | 3669 | mafia | 298 | stori | 6300 |
| effect | 3256 | mass | 3525 | play | 265 | mafia | 5170 |
| stori | 3179 | play | 3471 | stori | 222 | play | 4098 |
| mass | 3111 | one | 2695 | time | 192 | good | 3922 |
| one | 2410 | stori | 2631 | get | 175 | great | 3841 |
| like | 2067 | best | 2197 | one | 171 | like | 3750 |
| can | 2030 | first | 2001 | mission | 169 | one | 2681 |
| great | 1978 | charact | 1976 | like | 157 | time | 2655 |
| charact | 1967 | great | 1818 | just | 147 | just | 2490 |
If we now compare the original and sequel games, we do not see many differences: both franchises share most of their common words across editions. However, we can already note that the number of reviews for Mafia 1 is much lower than for its sequel.
mostCommonWordsPlt(me1.DTM, me2.DTM, 0.85, gameName1 = "Mass Effect 1", gameName2 = "Mass Effect 2")
mostCommonWordsPlt(mafia1.DTM, mafia2.DTM, 0.85, gameName1 = "Mafia 1", gameName2 = "Mafia 2")
Another way to represent the most common words is a wordcloud. The wordclouds below should convey similar information to the tables above, although a higher sparsity threshold (0.9 instead of 0.85) was used, so more terms are retained.
# Render each wordcloud as an HTML widget and capture it as a static PNG via webshot
me1.DTM.nonsparse <- removeSparseTerms(me1.DTM, 0.9)
me1.freq <- colSums(as.matrix(me1.DTM.nonsparse))
w1 <- wordcloud2(data.frame(names(me1.freq),
me1.freq),
color = "random-dark")
saveWidget(w1, '1.html', selfcontained = F)
webshot('1.html', '1.png', vwidth=500,vheight=350, delay = 5)
me2.DTM.nonsparse <- removeSparseTerms(me2.DTM, 0.9)
me2.freq <- colSums(as.matrix(me2.DTM.nonsparse))
w2 <- wordcloud2(data.frame(names(me2.freq),
me2.freq),
color = "random-dark")
saveWidget(w2, '2.html', selfcontained = F)
webshot('2.html', '2.png', vwidth=500,vheight=350, delay = 5)
mafia1.DTM.nonsparse <- removeSparseTerms(mafia1.DTM, 0.9)
mafia1.freq <- colSums(as.matrix(mafia1.DTM.nonsparse))
w3 <- wordcloud2(data.frame(names(mafia1.freq),
mafia1.freq),
color = "random-dark")
saveWidget(w3, '3.html', selfcontained = F)
webshot('3.html', '3.png', vwidth=500,vheight=350, delay = 5)
mafia2.DTM.nonsparse <- removeSparseTerms(mafia2.DTM, 0.9)
mafia2.freq <- colSums(as.matrix(mafia2.DTM.nonsparse))
w4 <- wordcloud2(data.frame(names(mafia2.freq),
mafia2.freq),
color = "random-dark")
saveWidget(w4, '4.html', selfcontained = F)
webshot('4.html', '4.png', vwidth=500,vheight=350, delay = 5)
The main goal of the analysis is to find the average sentiment of the reviews and to check whether it is higher for sequels or not. In the simplest scenario, Go et al. assess a tweet’s (here, a review’s) sentiment from the sum of the sentiments of its keywords. A similar keyword dictionary is applied here, and the overall sentiment (a sentiment score for each review) is presented in the graphs below.
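To make the dictionary idea concrete, here is a toy scoring function of my own, using a tiny hand-made lexicon rather than the dictionaries applied by the package below:
# Toy dictionary score: (positive hits - negative hits) / number of words
toy.pos <- c("great", "good", "love")
toy.neg <- c("bad", "boring", "broken")
toyScore <- function(txt) {
  words <- tolower(unlist(strsplit(txt, "\\W+")))
  (sum(words %in% toy.pos) - sum(words %in% toy.neg)) / length(words)
}
toyScore("Great story, good combat, but the port is broken")   # (2 - 1) / 9 = 0.111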
In the original-versus-sequel “battle”, Mass Effect shows a better mean sentiment score for the sequel, and for both games over 82% of the reviews score as positive. It is also worth noting that the share of reviewers who did not recommend the games (about 6%) is close to the share of negative-sentiment reviews, so the recommendation rate roughly matches the positive and neutral reviews taken together. The Mafia games behave similarly to the Mass Effect ones - the sequel scores better, and overall the sentiment values of Mafia 2 and Mass Effect 2 are very close. For the original Mafia the share of positive sentiment is a little lower than the share of recommendations, while the sequel’s 92% recommendation rate roughly matches the positive and neutral reviews summed up.
# Dictionary-based sentiment scores computed with the SentimentAnalysis package
me1.sent <- analyzeSentiment(me1.orig$text)
plotSentiment(me1.sent$SentimentHE, xlab = "Mass Effect 1")
me2.sent <- analyzeSentiment(me2.orig$text)
plotSentiment(me2.sent$SentimentHE, xlab = "Mass Effect 2")
mafia1.sent <- analyzeSentiment(mafia1.orig$text)
plotSentiment(mafia1.sent$SentimentHE, xlab = "Mafia 1")
# Some Mafia 2 reviews are badly encoded, so non-ASCII characters are stripped first
mafia2.sent <- analyzeSentiment(iconv(mafia2.orig$text, "latin1", "ASCII", sub = ""))
plotSentiment(mafia2.sent$SentimentHE, xlab = "Mafia 2")
rbind(
c("Mass Effect 1", "Mass Effect 2"),
cbind(round(mean(me1.sent$SentimentQDAP, na.rm = T), 3),
round(mean(me2.sent$SentimentQDAP, na.rm = T), 3))
) %>%
kable(caption = "Mean sentiment for Mass Effect games", digits = 2) %>%
kable_styling() %>%
row_spec(1, bold = T)
| Mass Effect 1 | Mass Effect 2 |
| 0.194 | 0.215 |
rbind(c("Mass Effect 1", "Mass Effect 2"),
cbind(
round(prop.table(table(convertToDirection(me1.sent$SentimentQDAP))), 3),
round(prop.table(table(convertToDirection(me2.sent$SentimentQDAP))), 3))
) %>%
kable(caption = "Sentiment scores for Mass Effect games") %>%
kable_styling() %>%
row_spec(1, bold = T)
| | Mass Effect 1 | Mass Effect 2 |
| negative | 0.062 | 0.055 |
| neutral | 0.113 | 0.123 |
| positive | 0.825 | 0.822 |
rbind(c("Mass Effect 1", "Mass Effect 2"),
cbind(
round(prop.table(table(me1.orig$recommended)), 3),
round(prop.table(table(me2.orig$recommended)), 3))
) %>%
kable(caption = "Percentage of reviewers that recommend Mass Effect games") %>%
kable_styling() %>%
row_spec(1, bold = T)
| | Mass Effect 1 | Mass Effect 2 |
| FALSE | 0.061 | 0.053 |
| TRUE | 0.939 | 0.947 |
rbind(
c("Mafia 1", "Mafia 2"),
cbind(round(mean(mafia1.sent$SentimentQDAP, na.rm = T), 3),
round(mean(mafia2.sent$SentimentQDAP, na.rm = T), 3))
) %>%
kable(caption = "Mean sentiment for Mafia games", digits = 2) %>%
kable_styling() %>%
row_spec(1, bold = T)
| Mafia 1 | Mafia 2 |
| 0.191 | 0.214 |
rbind(c("Mafia 1", "Mafia 2"),
cbind(
round(prop.table(table(convertToDirection(mafia1.sent$SentimentQDAP))), 3),
round(prop.table(table(convertToDirection(mafia2.sent$SentimentQDAP))), 3))
) %>%
kable(caption = "Sentiment scores for Mafia games") %>%
kable_styling() %>%
row_spec(1, bold = T)
| | Mafia 1 | Mafia 2 |
| negative | 0.072 | 0.067 |
| neutral | 0.123 | 0.129 |
| positive | 0.805 | 0.804 |
rbind(c("Mafia 1", "Mafia 2"),
cbind(
round(prop.table(table(mafia1.orig$recommended)), 3),
round(prop.table(table(mafia2.orig$recommended)), 3))
) %>%
kable(caption = "Percentage of reviewers that recommend Mafia games") %>%
kable_styling() %>%
row_spec(1, bold = T)
| | Mafia 1 | Mafia 2 |
| FALSE | 0.113 | 0.076 |
| TRUE | 0.887 | 0.924 |
Returning to the main thread of the analyzed paper, Go et al. employ three machine learning techniques trained to classify whether a particular tweet (here, a particular review) has positive or negative sentiment. The label comes from the recommendation flag that reviewers set. The models are built over features counting how often each term occurs in a review, taken from the document-term matrix.
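As a small illustration of this feature construction (a toy example of my own, not part of the pipeline), a single short review becomes a row of term counts:
# Toy example: one review turned into a row of term counts (no stemming applied here)
toy.dtm <- DocumentTermMatrix(Corpus(VectorSource("great game great story")))
inspect(toy.dtm)   # terms: game, great, story -> counts 1, 2, 1 for this single document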
me1.df <- cbind(as.data.frame(data.matrix(me1.DTM), stringsAsFactors = FALSE), as.factor(me1.orig$recommended))
me2.df <- cbind(as.data.frame(data.matrix(me2.DTM), stringsAsFactors = FALSE), as.factor(me2.orig$recommended))
colnames(me1.df)[ncol(me1.df)] <- "Recommended"
colnames(me2.df)[ncol(me2.df)] <- "Recommended"
set.seed(2137)
# Limit each sample to 2500 reviews to keep training times manageable
me1.df.cut <- me1.df[sample(nrow(me1.df), 2500), ]
me2.df.cut <- me2.df[sample(nrow(me2.df), 2500), ]
me1.part <- createDataPartition(me1.df.cut$Recommended, p = 0.8, list = F)
me2.part <- createDataPartition(me2.df.cut$Recommended, p = 0.8, list = F)
# Keep only terms that occur often enough in the sample (plus the label column)
me1.train <- me1.df.cut[me1.part, c(colSums(me1.df.cut[, -ncol(me1.df.cut)]) > 0.8*0.025*nrow(me1.df.cut), T)]
me1.test <- me1.df.cut[-me1.part, c(colSums(me1.df.cut[, -ncol(me1.df.cut)]) > 0.8*0.025*nrow(me1.df.cut), T)]
me2.train <- me2.df.cut[me2.part, c(colSums(me2.df.cut[, -ncol(me2.df.cut)]) > 0.8*0.025*nrow(me2.df.cut), T)]
me2.test <- me2.df.cut[-me2.part, c(colSums(me2.df.cut[, -ncol(me2.df.cut)]) > 0.8*0.025*nrow(me2.df.cut), T)]
# Balance the training sets by upsampling the minority ("not recommended") class
me1.train <- upSample(me1.train, me1.train$Recommended)
me2.train <- upSample(me2.train, me2.train$Recommended)
me1.train$Class <- NULL
me2.train$Class <- NULL
# me 1
glm.model1 <- train(Recommended ~ . ,
data = me1.train,
method="glm",
family=binomial(),
preProcess = c("center", "scale"))
# me 2
glm.model2 <- train(Recommended ~ . ,
data = me2.train,
method="glm",
family=binomial(),
preProcess = c("center", "scale"))
me1.glm.pred <- predict(glm.model1, me1.test)
me2.glm.pred <- predict(glm.model2, me2.test)
#----------------------------------
# me 1
svm.model1 <- train(Recommended ~ . ,
data = me1.train,
method = "svmLinear",
preProcess = c("center", "scale"))
# me 2
svm.model2 <- train(Recommended ~ . ,
data = me2.train,
method = "svmLinear",
preProcess = c("center", "scale"))
me1.svm.pred <- predict(svm.model1, me1.test)
me2.svm.pred <- predict(svm.model2, me2.test)
#----------------------------------
# me 1
nb.model1 <- train(Recommended ~ . ,
data = me1.train,
method = "nb",
preProcess = c("center", "scale"))
# me 2
nb.model2 <- train(Recommended ~ . ,
data = me2.train,
method = "nb",
preProcess = c("center", "scale"))
me1.nb.pred <- predict(nb.model1, me1.test)
me2.nb.pred <- predict(nb.model2, me2.test)
mafia1.df <- cbind(as.data.frame(data.matrix(mafia1.DTM), stringsAsFactors = FALSE), as.factor(mafia1.orig$recommended))
mafia2.df <- cbind(as.data.frame(data.matrix(mafia2.DTM), stringsAsFactors = FALSE), as.factor(mafia2.orig$recommended))
colnames(mafia1.df)[ncol(mafia1.df)] <- "Recommended"
colnames(mafia2.df)[ncol(mafia2.df)] <- "Recommended"
set.seed(2137)
mafia1.df.cut <- mafia1.df
mafia2.df.cut <- mafia2.df[sample(nrow(mafia2.df), 2500), ]
mafia1.part <- createDataPartition(mafia1.df.cut$Recommended, p = 0.8, list = F)
mafia2.part <- createDataPartition(mafia2.df.cut$Recommended, p = 0.8, list = F)
mafia1.train <- mafia1.df.cut[mafia1.part, c(colSums(mafia1.df.cut[, -ncol(mafia1.df.cut)]) > 0.8*0.025*nrow(mafia1.df.cut), T)]
mafia1.test <- mafia1.df.cut[-mafia1.part, c(colSums(mafia1.df.cut[, -ncol(mafia1.df.cut)]) > 0.8*0.025*nrow(mafia1.df.cut), T)]
mafia2.train <- mafia2.df.cut[mafia2.part, c(colSums(mafia2.df.cut[, -ncol(mafia2.df.cut)]) > 0.8*0.025*nrow(mafia2.df.cut), T)]
mafia2.test <- mafia2.df.cut[-mafia2.part, c(colSums(mafia2.df.cut[, -ncol(mafia2.df.cut)]) > 0.8*0.025*nrow(mafia2.df.cut), T)]
mafia1.train <- upSample(mafia1.train, mafia1.train$Recommended)
mafia2.train <- upSample(mafia2.train, mafia2.train$Recommended)
mafia1.train$Class <- NULL
mafia2.train$Class <- NULL
# mafia 1
glm.model1 <- train(Recommended ~ . ,
data = mafia1.train,
method="glm",
family=binomial(),
preProcess = c("center", "scale"))
# mafia 2
glm.model2 <- train(Recommended ~ . ,
data = mafia2.train,
method="glm",
family=binomial(),
preProcess = c("center", "scale"))
mafia1.glm.pred <- predict(glm.model1, mafia1.test)
mafia2.glm.pred <- predict(glm.model2, mafia2.test)
#----------------------------------
# mafia 1
svm.model1 <- train(Recommended ~ . ,
data = mafia1.train,
method = "svmLinear",
preProcess = c("center", "scale"))
# mafia 2
svm.model2 <- train(Recommended ~ . ,
data = mafia2.train,
method = "svmLinear",
preProcess = c("center", "scale"))
mafia1.svm.pred <- predict(svm.model1, mafia1.test)
mafia2.svm.pred <- predict(svm.model2, mafia2.test)
#----------------------------------
# mafia 1
nb.model1 <- train(Recommended ~ . ,
data = mafia1.train,
method = "nb",
preProcess = c("center", "scale"))
# mafia 2
nb.model2 <- train(Recommended ~ . ,
data = mafia2.train,
method = "nb",
preProcess = c("center", "scale"))
mafia1.nb.pred <- predict(nb.model1, mafia1.test)
mafia2.nb.pred <- predict(nb.model2, mafia2.test)
rbind(c("", "Mass Effect 1", "Mass Effect 2", "Mafia 1", "Mafia 2"),
cbind(c("logistic regression", "SVM", "Naive Bayes"),
c(sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.glm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.svm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.nb.pred)$overall["Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.glm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.svm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.nb.pred)$overall["Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.glm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.svm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.nb.pred)$overall["Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.glm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.svm.pred)$overall["Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.nb.pred)$overall["Accuracy"]))
)
) %>%
kable(row.names = F, digits = 2, caption = "Accuracy measure for each of the studied models") %>%
kable_styling() %>%
column_spec(1, bold = T) %>%
row_spec(1, bold = T)
| | Mass Effect 1 | Mass Effect 2 | Mafia 1 | Mafia 2 |
| logistic regression | 0.9018 | 0.8657 | 0.5300 | 0.8920 |
| SVM | 0.8858 | 0.8818 | 0.8800 | 0.8760 |
| Naive Bayes | 0.1062 | 0.8958 | 0.2500 | 0.9200 |
In their work, Go et al. applied the models to both unigrams and bigrams. Because this data set is much richer in a textual sense, fitting a bigram model here would be impractical, so only unigram results are reported; a sketch of how a bigram document-term matrix could be built is shown below. For the same reason, I limited the training samples to 2500 observations. Feel free to remove this restriction if you have more computational power.
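As an illustration only (my own sketch, not run in this report), a bigram DTM could be produced by passing a custom tokenizer to DocumentTermMatrix; note that tm may require a VCorpus rather than a SimpleCorpus for custom tokenizers.
# Illustrative bigram tokenizer built with base R only
BigramTokenizer <- function(x) {
  words <- unlist(strsplit(as.character(x), "\\s+"))
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))       # adjacent word pairs
}
# dtm.bigrams <- DocumentTermMatrix(corp, control = list(tokenize = BigramTokenizer))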
The results are comparable to the ones obtained by Go et al., for whom Naive Bayes and SVM performed best. Here, however, the training sample may have been too small to provide enough variety of observations, so for Mass Effect 1 and Mafia 1 Naive Bayes has very poor predictive power and SVM ends up the best in terms of accuracy. It needs to be noted, though, that the accuracy of the models is close to, and mostly below, the share of positive reviews in the test sample, i.e. close to what a constant “recommended” prediction would achieve. This is a minor shortcoming of the Go et al. paper as well, because plain accuracy cannot be interpreted without knowing the class balance of the sample.
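For intuition (an illustration of this point, not part of the modelling pipeline): a classifier that always predicts “recommended” is right exactly as often as the positive share of the test set, while its balanced accuracy, the average of sensitivity and specificity, is only 0.5.
# Majority-class baseline for Mass Effect 1: accuracy equals the positive share (~0.94),
# but balanced accuracy is (1 + 0) / 2 = 0.5 because specificity is zero.
baseline.acc <- mean(me1.test$Recommended == "TRUE")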
rbind(c("Mass Effect 1", "Mass Effect 2", "Mafia 1", "Mafia 2"),
round(unname(c(
prop.table(table(me1.test$Recommended))['TRUE'],
prop.table(table(me2.test$Recommended))['TRUE'],
prop.table(table(mafia1.test$Recommended))['TRUE'],
prop.table(table(mafia2.test$Recommended))['TRUE']
)), 2)
) %>%
kable(row.names = F, digits = 2, caption = "Percentage of positive reviews in the testing sample") %>%
kable_styling() %>%
row_spec(1, bold = T)
| Mass Effect 1 | Mass Effect 2 | Mafia 1 | Mafia 2 |
| 0.94 | 0.94 | 0.89 | 0.92 |
Therefore I also present the balanced accuracy (the average of sensitivity and specificity) for each of the models. The low values indicate that the feature engineering is too crude for this scenario or that the training sample offers too little variety. SVM still proves valuable, with balanced accuracy of circa 60-65%, which is not that bad. In summary, following the approach of Go et al., one can build a predictive model that automatically detects sentiment in online reviews.
rbind(c("", "Mass Effect 1", "Mass Effect 2", "Mafia 1", "Mafia 2"),
cbind(c("logistic regression", "SVM", "Naive Bayes"),
c(sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.glm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.svm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(me1.test$Recommended, me1.nb.pred)$byClass["Balanced Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.glm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.svm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(me2.test$Recommended, me2.nb.pred)$byClass["Balanced Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.glm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.svm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia1.test$Recommended, mafia1.nb.pred)$byClass["Balanced Accuracy"])),
c(sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.glm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.svm.pred)$byClass["Balanced Accuracy"]),
sprintf("%2.4f",confusionMatrix(mafia2.test$Recommended, mafia2.nb.pred)$byClass["Balanced Accuracy"]))
)
) %>%
kable(row.names = F, digits = 2, caption = "Balanced accuracy measure for each of the studied models") %>%
kable_styling() %>%
column_spec(1, bold = T) %>%
row_spec(1, bold = T)
| | Mass Effect 1 | Mass Effect 2 | Mafia 1 | Mafia 2 |
| logistic regression | 0.5865 | 0.5842 | 0.4988 | 0.6556 |
| SVM | 0.5669 | 0.6211 | 0.6526 | 0.6385 |
| Naive Bayes | 0.5101 | 0.6124 | 0.5640 | NA |