Sentiment Analysis

Which Game Has Better Reviews: BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT or Call of Duty: Black Ops 7?

I want to find out which game reviewers consider better: BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT, or the latest Call of Duty. I chose these two games because I want to find out whether a small developer can make a better product than a multi-million-dollar company.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidytext)  # Tidy text mining
library(textdata)  # Lexicons of sentiment data
library(widyr)     # Easily calculating pairwise counts
library(igraph)


reviews <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/mallown_xavier_edu/IQCp3iVPgxopT4rynL8P7M8yAXkYB0hgc2drIpxUuPxjIDM?download=1")

New names:
• `` -> `...1`
Rows: 200 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): language, review
dbl (17): ...1, recommendationid, author.steamid, author.num_games_owned, au...
lgl  (5): voted_up, steam_purchase, received_for_free, written_during_early_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Here we imported the data. There are 100 reviews for each game (200 rows in total), so both games have the same number of reviews.

To make the tables and plots easier to read later, I am going to add a variable with the game names.

# Map each Steam app ID to a readable game name
reviews <- reviews %>%
  mutate(game_name = case_when(
    appid == 3191050 ~ "BRAZILIAN DRUG DEALER 3",
    appid == 3606480 ~ "Call of Duty"))

Next we split the reviews into word tokens.

First, though, each review needs its own ID, so we use the row number.

reviews <- 
  reviews %>% 
  mutate(review_id = row_number())

Now we unnest the tokens (one word per row) and remove stop words.

tidy_reviews <- 
  reviews %>% 
  unnest_tokens(word, review) %>% 
  anti_join(stop_words)

Joining with `by = join_by(word)`
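To make the two steps above concrete, here is a rough base-R imitation on two made-up reviews. `unnest_tokens()` lowercases the text and puts one word per row, and `anti_join(stop_words)` keeps only rows whose word is not in the stop-word list. The data and the tiny `toy_stop` list are hypothetical, and the regex split is only an approximation of tidytext's real tokenizer:

```r
# Two made-up reviews (hypothetical data for illustration)
toy <- data.frame(review_id = 1:2,
                  review = c("This game is GREAT fun", "The portal to hell is scary"),
                  stringsAsFactors = FALSE)

# Hypothetical, tiny stop-word list (tidytext's stop_words is much larger)
toy_stop <- c("this", "is", "the", "to")

# Roughly what unnest_tokens() does: lowercase, split on non-letters, one row per word
words <- strsplit(tolower(toy$review), "[^a-z']+")
tokens <- data.frame(review_id = rep(toy$review_id, lengths(words)),
                     word = unlist(words),
                     stringsAsFactors = FALSE)

# What anti_join(stop_words) does: keep only rows whose word is NOT a stop word
tidy_toy <- tokens[!tokens$word %in% toy_stop, ]
tidy_toy
```

The result keeps "game", "great", "fun" for the first review and "portal", "hell", "scary" for the second, each still tagged with its `review_id`.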

Analysis

Now that the reviews are split into words, we can look for differences in how people view these games.

We will first look at positivity and negativity using the Bing lexicon.

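# Bing lexicon: each word is labeled either positive or negative
bing <- 
  get_sentiments("bing")
# Count each word per game, then keep only the words that appear in Bing
review_counts <- 
  tidy_reviews %>% 
  group_by(game_name, word) %>% 
  summarize(n = n()) %>% 
  inner_join(bing)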

`summarise()` has grouped output by 'game_name'. You can override using the
`.groups` argument.
Joining with `by = join_by(word)`

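# Each row of review_counts is one distinct word, so n() counts distinct
# positive/negative words per game (sum(n) would count total occurrences)
review_counts %>%
  group_by(game_name, sentiment) %>% 
  summarize(n = n()) %>% 
  arrange(-n)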

`summarise()` has grouped output by 'game_name'. You can override using the
`.groups` argument.

# A tibble: 4 × 3
# Groups:   game_name [2]
  game_name               sentiment     n
  <chr>                   <chr>     <int>
1 Call of Duty            negative    128
2 Call of Duty            positive     97
3 BRAZILIAN DRUG DEALER 3 negative     53
4 BRAZILIAN DRUG DEALER 3 positive     48

Both games have more negative than positive words, and the gap is much larger for Call of Duty (128 negative vs. 97 positive distinct words) than for BRAZILIAN DRUG DEALER 3 (53 vs. 48). However, Call of Duty also has many more positive and negative words in the first place, so the raw counts exaggerate the difference.
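To compare the two games on an equal footing, each count can be divided by that game's total number of scorable words. A minimal base-R sketch using the four counts from the table above (which, because `n()` was applied to one row per word, are numbers of distinct Bing words rather than total occurrences):

```r
# Distinct Bing-matched words per game and sentiment, copied from the table above
counts <- data.frame(
  game_name = c("Call of Duty", "Call of Duty",
                "BRAZILIAN DRUG DEALER 3", "BRAZILIAN DRUG DEALER 3"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n = c(128, 97, 53, 48)
)

# Total scorable words per game, repeated on every row of that game
counts$total <- ave(counts$n, counts$game_name, FUN = sum)

# Share of each game's scorable words that fall in each sentiment
counts$share <- round(counts$n / counts$total, 3)
counts
```

With these numbers about 56.9% of Call of Duty's scorable words are negative versus about 52.5% for BRAZILIAN DRUG DEALER 3, so Call of Duty still leans more negative, by roughly 4 percentage points.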

Let's look at the individual words behind these counts.

review_counts %>% 
  group_by(game_name) %>% 
  filter(n > 5) %>% 
  # Make negative words negative so the bars point in opposite directions
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>% 
  mutate(word = reorder(word, n)) %>% 
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~game_name, ncol = 2) +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 8) +
  labs(title = "Positive and Negative Words for BBD3 and COD",
       subtitle = "Only words appearing more than 5 times are shown")

Call of Duty has more occurrences of Bing words than BRAZILIAN DRUG DEALER 3. However, "hell" appears very often in BRAZILIAN DRUG DEALER 3's reviews, and it should probably be excluded from this sentiment analysis: hell is a major part of the plot of BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT, so the word mostly describes the game rather than the reviewer's opinion of it.
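One way to handle plot-specific words like "hell" is to keep them in a small exclusion list and drop them once before scoring. The base-R sketch below uses made-up matches and a hypothetical `domain_stop` list, and shows how much a single recurring setting word can pull a net score (positive matches minus negative matches) down:

```r
# Hypothetical Bing-style matches for a handful of review words (made-up data)
scored <- data.frame(
  word      = c("hell", "funny", "hell", "great", "bad"),
  sentiment = c("negative", "negative", "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical list of words that describe the game's setting, not an opinion
domain_stop <- c("hell")

# Drop those words once, the same job as the "hell" filter used in the code below
scored_clean <- scored[!scored$word %in% domain_stop, ]

# Net sentiment = number of positive matches minus number of negative matches
net_score <- function(df) sum(df$sentiment == "positive") - sum(df$sentiment == "negative")

net_score(scored)        # -3
net_score(scored_clean)  # -1
```

Keeping the list in one place means the same exclusion is applied to every later step (Bing and NRC alike) instead of repeating the filter.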

review_counts %>%
  filter(game_name == "BRAZILIAN DRUG DEALER 3") %>% 
  # Drop "hell", which names the game's setting rather than an opinion
  filter(word != "hell") %>% 
  filter(n > 1) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "BBD3 Sentiment Scores by Word",
       subtitle = "Scorable words appearing at least 2 times",
       x = "Word",
       y = "Contribution to sentiment")

It is odd to see "funny" counted as a negative word, since this is a comedy game; Bing labels "funny" as negative regardless of context.

Bing's simple positive/negative split may be too coarse here, so I am going to look at the emotions expressed in the reviews using the NRC lexicon.

nrc <- 
  get_sentiments("nrc")
nrc %>% 
  filter(word == "hell")

# A tibble: 5 × 2
  word  sentiment
  <chr> <chr>    
1 hell  anger    
2 hell  disgust  
3 hell  fear     
4 hell  negative 
5 hell  sadness

NRC tags "hell" with several negative emotions, so we should remove it from the reviews of BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT again.

BBD3_sentiment <- 
  tidy_reviews %>% 
  filter(game_name == "BRAZILIAN DRUG DEALER 3") %>% 
  filter(word != "hell") %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  group_by(sentiment) %>%
  # nrow(.) is the total number of NRC matches, so each percent is a share of all matches
  summarize(`Count` = n(),
            `Percent of scoreable words` = `Count`/nrow(.)) %>% 
  arrange(-`Percent of scoreable words`)

CoD_sentiment <- 
  tidy_reviews %>% 
  filter(game_name != "BRAZILIAN DRUG DEALER 3") %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  group_by(sentiment) %>%
  summarize(`Count`=n(),
            `Percent of scoreable words` = `Count`/nrow(.)) %>% 
  arrange(-`Percent of scoreable words`)

Before comparing the emotions of the two games, let's check that NRC can score a similar share of each game's words.

tidy_reviews %>% 
  filter(game_name == "BRAZILIAN DRUG DEALER 3") %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  nrow() / # divided by
# Total number of words in BBD3's reviews:
reviews %>%
  filter(game_name == "BRAZILIAN DRUG DEALER 3") %>% 
  unnest_tokens(word,review) %>%
  nrow()

[1] 0.2312734

tidy_reviews %>% 
  filter(game_name != "BRAZILIAN DRUG DEALER 3") %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  nrow() / # divided by
# Total number of words in Call of Duty's reviews:
reviews %>%
  filter(game_name != "BRAZILIAN DRUG DEALER 3") %>% 
  unnest_tokens(word,review) %>%
  nrow()

[1] 0.2312517

Hip Hip Hooray! The NRC match rate is almost identical for the two games (about 0.231 in both cases), so we can compare them.
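The ratios above divide the number of rows returned by the NRC join by the number of tokens. Two details are worth keeping in mind: the numerator comes from `tidy_reviews`, where stop words were already removed, while the denominator counts every token; and because NRC can attach several sentiments to one word (as the "hell" lookup showed), the join can return more than one row per word. The figure is therefore closer to "NRC tags per token" than to a strict percent of words with a sentiment. A small base-R sketch on made-up data, with `merge()` standing in for `inner_join()` and a hypothetical mini lexicon `lex`, shows how the two measures can differ:

```r
# Made-up tokens and a hypothetical mini lexicon where one word has several sentiments
tokens <- data.frame(word = c("portal", "hell", "game", "fun", "scary"),
                     stringsAsFactors = FALSE)
lex <- data.frame(word      = c("hell", "hell", "hell", "fun", "scary", "scary"),
                  sentiment = c("anger", "fear", "negative", "joy", "fear", "negative"),
                  stringsAsFactors = FALSE)

# Inner join, one output row per (word, sentiment) pair, like a many-to-many inner_join()
joined <- merge(tokens, lex, by = "word")

tags_per_token <- nrow(joined) / nrow(tokens)     # 6 rows / 5 tokens = 1.2
share_scored   <- mean(tokens$word %in% lex$word) # 3 of 5 tokens have a sentiment = 0.6
c(tags_per_token = tags_per_token, share_scored = share_scored)
```

Because both games are measured the same way, the side-by-side comparison is still fair; the numbers just should not be read as a literal percent of words.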

## I learned the hard way: label each summary with its game name before combining them
BBD3_sentiment$name <- "BRAZILIAN DRUG DEALER 3"
CoD_sentiment$name <- "Call of Duty"

combined_sentiment <- 
  bind_rows(BBD3_sentiment,CoD_sentiment)
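Adding the name column by hand before `bind_rows()` works. As an alternative sketch, dplyr's `bind_rows()` can create that label column itself through its `.id` argument, taking the labels from the names of a list of data frames. The two small tables here are hypothetical stand-ins for `BBD3_sentiment` and `CoD_sentiment`:

```r
library(dplyr)

# Hypothetical per-game summaries with the same shape as the real ones
bbd3_toy <- data.frame(sentiment = c("fear", "joy"), Count = c(10L, 5L))
cod_toy  <- data.frame(sentiment = c("fear", "joy"), Count = c(8L, 9L))

# .id = "name" adds a first column holding each list element's name
combined <- bind_rows(
  list("BRAZILIAN DRUG DEALER 3" = bbd3_toy, "Call of Duty" = cod_toy),
  .id = "name"
)
combined
```

The resulting `name` column can be used for `fill = name` in the plot exactly like the hand-made one.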

combined_sentiment %>% 
  ggplot(aes(x = sentiment, y = `Percent of scoreable words`, fill = name)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "A comparison of the emotive sentiments found in BBD3 and CoD",
       subtitle = "Using the NRC Lexicon (Mohammad and Turney, 2013), shown as a percent of scorable words",
       x = "Sentiment",
       fill = "Game") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The graph shows very similar positive and negative sentiment for Call of Duty and BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT. The most interesting thing I see in this sentiment analysis is that there is more fear in BRAZILIAN DRUG DEALER 3's reviews, which I find really funny.

Conclusion

I do not think sentiment analysis alone could show how players view these two games, because both sets of reviews use a large amount of sarcasm. However, I believe that if I brought each review's rating into the analysis, I could account for the sarcasm more accurately. The other issue is that a notable number of the reviews for BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT are written in Portuguese, even though Steam lists them as English reviews. All in all, I could not learn much from this other than that BRAZILIAN DRUG DEALER 3: I OPENED A PORTAL TO HELL IN THE FAVELA TRYING TO REVIVE MIT AIA I NEED TO CLOSE IT inspires more fear than Call of Duty.
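The reviews file already contains a rating of sorts: the `voted_up` column from the column specification, which is Steam's recommend / don't-recommend flag. A minimal base-R sketch of the idea, assuming a per-review net Bing score has already been computed (for example by counting positive minus negative matches per `review_id`, which this analysis does not do yet), flags reviews whose text reads negative but whose author still recommends the game. All data here is made up:

```r
# Hypothetical per-review data: a net Bing score (positive minus negative words)
# and Steam's recommendation flag, as stored in the voted_up column
toy_reviews <- data.frame(
  review_id = 1:6,
  net_score = c(-3, 2, -1, -2, 1, 0),
  voted_up  = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Negative-sounding text paired with a thumbs-up is a candidate for sarcasm
toy_reviews$possible_sarcasm <- toy_reviews$net_score < 0 & toy_reviews$voted_up

# How often the text score (treating 0 as "not positive") agrees with the thumbs-up
agreement <- mean((toy_reviews$net_score > 0) == toy_reviews$voted_up)

toy_reviews[toy_reviews$possible_sarcasm, ]
agreement
```

In this toy data two reviews (IDs 1 and 4) are flagged, and the text agrees with the recommendation in 4 of 6 reviews; on the real data, a low agreement rate for one game would hint that its reviews are more sarcastic.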