Xbox and PlayStation have been rivals for most of my lifetime. With recent talk about next-generation consoles, I wanted to analyze the sentiment of each community using the subreddits r/PS5 and r/XboxSeriesX.
Data
Each row in the dataset is a single Reddit post with its title, body text, console label, and date created.
Using tidy text tools, I examine emotional tone (with the NRC lexicon), word-pair networks, and how overall sentiment changes over time for each console.
library(tidyverse)
library(tidytext)
library(lubridate)
library(igraph)
library(ggraph)
library(scales)
library(widyr)
Rows: 13901 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): word, sentiment
Rows: 600 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): subreddit, id, title, selftext, permalink, post_text, date_created...
dbl (4): line, created_utc, score, num_comments
dttm (1): datetimestamp
posts %>% count(console)
# A tibble: 2 × 2
console n
<chr> <int>
1 PS5 300
2 Xbox Series X 300
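The code that turns `posts` into the one-word-per-row `tidy_posts` table used below is not shown. As a rough, dependency-free sketch of what that tokenization step does (roughly what `tidytext::unnest_tokens()` followed by an `anti_join()` with `stop_words` accomplishes), the toy example below lowercases each post, splits it into words, and drops stop words. The data, the tiny stop-word list, and the helper name `tokenize_posts()` are all hypothetical; only the `id`, `console`, and `post_text` column names come from the real data.

```r
# Toy posts standing in for the real data (hypothetical values)
toy_posts <- data.frame(
  id = c("a1", "b2"),
  console = c("PS5", "Xbox Series X"),
  post_text = c("Loving the new game on my PS5!", "Is Game Pass worth it?"),
  stringsAsFactors = FALSE
)

# Hypothetical mini stop-word list; tidytext's stop_words is far larger
toy_stop_words <- c("the", "on", "my", "is", "it", "a")

# One row per (post, word): lowercase, split on non-word characters,
# then drop empty strings and stop words
tokenize_posts <- function(posts, stop_words) {
  rows <- lapply(seq_len(nrow(posts)), function(i) {
    words <- strsplit(tolower(posts$post_text[i]), "[^a-z0-9']+")[[1]]
    words <- words[words != "" & !(words %in% stop_words)]
    data.frame(
      id = rep(posts$id[i], length(words)),
      console = rep(posts$console[i], length(words)),
      word = words,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

toy_tidy <- tokenize_posts(toy_posts, toy_stop_words)
toy_tidy
```

The real tokenizer handles punctuation and contractions more carefully, so treat this only as an illustration of the output shape: each row keeps its post `id` and `console`, which is what later lets words be grouped by subreddit or by post.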
Question 1: How do the PS5 and Xbox Series X subreddits differ in tone?
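I wanted to see which subreddit uses more emotionally charged words, such as joy, anger, fear, or trust. To check this, I matched every word in each post against the NRC emotion lexicon, which tags words with one or more of eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two sentiments (positive, negative). Because a single word can carry several tags, the join is many-to-many. I then compared the proportions across both consoles to see whether one group tends to talk in a more emotional way than the other.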
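To make the join-then-normalize step concrete, here is a base-R sketch on a made-up mini lexicon (the words and tags are hypothetical, not real NRC entries). `merge()` stands in for `inner_join()`, and `ave()` computes each console's total so every count becomes a within-console share, mirroring `group_by(console) %>% mutate(prop = n / sum(n))`.

```r
# Toy tokens and a made-up mini lexicon (hypothetical; not real NRC entries)
toy_tokens <- data.frame(
  console = c("PS5", "PS5", "PS5", "PS5", "Xbox Series X", "Xbox Series X"),
  word = c("love", "love", "crash", "console", "love", "wait"),
  stringsAsFactors = FALSE
)
toy_lexicon <- data.frame(
  word = c("love", "love", "crash", "wait"),
  sentiment = c("joy", "trust", "fear", "anticipation"),
  stringsAsFactors = FALSE
)

# Inner join: untagged words ("console") drop out, and a word with two
# tags ("love") becomes two rows, the many-to-many case
tagged <- merge(toy_tokens, toy_lexicon, by = "word")

# Count tags per console and emotion
nrc_toy <- aggregate(word ~ console + sentiment, data = tagged, FUN = length)
names(nrc_toy)[names(nrc_toy) == "word"] <- "n"

# Share of each emotion among that console's tagged words (sums to 1 per console)
nrc_toy$prop <- nrc_toy$n / ave(nrc_toy$n, nrc_toy$console, FUN = sum)
nrc_toy
```

Here the PS5 rows come out as joy 0.4, trust 0.4, fear 0.2, while each Xbox tag gets one third. Since every console's shares add up to 1, the comparison is about the mix of emotions, not how much emotional language each subreddit uses in total.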
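# Tag each word with its NRC emotion(s); one word can carry several tags
nrc_counts <- tidy_posts %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  count(console, sentiment)

# Share of each emotion among all emotion-tagged words, within each console
nrc_props <- nrc_counts %>%
  group_by(console) %>%
  mutate(prop = n / sum(n))

head(nrc_props)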
ggplot(nrc_props, aes(x = sentiment, y = prop, fill = console)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "NRC Emotional Proportions in PS5 vs Xbox Series X Posts",
    x = "Emotion",
    y = "Percent of Emotion-Tagged Words",
    fill = "Console"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Across the categories, PS5 posts have a larger share than Xbox Series X posts in a few more categories. Because each console's proportions sum to 100%, the chart compares the mix of emotions rather than the total amount of emotional language.
Xbox posts still contain plenty of emotional language, though, and overall the two emotional profiles are very similar.
Question 2: What word pairs are most common for each console?
Next, I wanted to explore which words frequently appear together in each subreddit, to see which topics and phrases come up repeatedly for each console.
Looking at word pairs that appear together in the same post can help reveal themes like performance, features, issues, or specific games that people talk about.
# A tibble: 10 × 3
item1 item2 n
<chr> <chr> <dbl>
1 game games 44
2 game xbox 38
3 games xbox 33
4 ps5 games 30
5 xbox series 30
6 2 game 29
7 game time 24
8 game play 23
9 games play 22
10 2 games 21
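The pair table above comes from code that is not shown. As a dependency-free sketch of how such counts can be built, the hypothetical helper `count_pairs()` below takes one console at a time and counts how many posts each unordered word pair appears in together, similar in spirit to `widyr::pairwise_count()`. The toy tokens are made up.

```r
# Toy tokens: one row per (post, word); hypothetical data
toy_words <- data.frame(
  id = c("p1", "p1", "p1", "p2", "p2", "p3", "p3"),
  console = c("PS5", "PS5", "PS5", "PS5", "PS5", "Xbox Series X", "Xbox Series X"),
  word = c("game", "games", "ps5", "game", "games", "game", "xbox"),
  stringsAsFactors = FALSE
)

# For one console, count the posts in which each unordered word pair co-occurs
count_pairs <- function(tokens, which_console) {
  sub <- tokens[tokens$console == which_console, ]
  pairs <- lapply(split(sub$word, sub$id), function(words) {
    words <- sort(unique(words))  # each word counted once per post
    if (length(words) < 2) return(NULL)
    m <- combn(words, 2)          # all unordered pairs within this post
    data.frame(item1 = m[1, ], item2 = m[2, ], stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) {
    return(data.frame(item1 = character(), item2 = character(), n = integer()))
  }
  out <- aggregate(n ~ item1 + item2, data = transform(pairs, n = 1L), FUN = sum)
  out[order(-out$n, out$item1, out$item2), ]  # most frequent pairs first
}

ps5_pairs <- count_pairs(toy_words, "PS5")
ps5_pairs
```

With this toy data, "game" and "games" share two PS5 posts, so that pair comes first with `n = 2`, and "xbox" never shows up because it only occurs in the Xbox post. Counting inside each console's posts keeps each network's edge weights specific to that subreddit. Counting across all posts and filtering afterwards would pool both communities' co-occurrences.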
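set.seed(123)  # makes the force-directed ("fr") layout reproducible

# Keep pairs seen at least 4 times whose first word appears in PS5 posts.
# Note: this filter does not split the counts by console; if `pairwise` was
# counted over both subreddits, n still pools PS5 and Xbox posts.
ps5_net <- pairwise %>%
  filter(item1 != item2, n >= 4) %>%
  filter(item1 %in% tidy_posts$word[tidy_posts$console == "PS5"]) %>%
  graph_from_data_frame()

ggraph(ps5_net, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n), color = "#0077cc") +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void() +
  labs(title = "PS5 Word Pair Network")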
Warning: ggrepel: 347 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
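set.seed(123)  # makes the force-directed ("fr") layout reproducible

# Keep pairs seen at least 4 times whose first word appears in Xbox posts.
# Note: this filter does not split the counts by console; if `pairwise` was
# counted over both subreddits, n still pools PS5 and Xbox posts.
xbox_net <- pairwise %>%
  filter(item1 != item2, n >= 4) %>%
  filter(item1 %in% tidy_posts$word[tidy_posts$console == "Xbox Series X"]) %>%
  graph_from_data_frame()

ggraph(xbox_net, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n), color = "#228B22") +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void() +
  labs(title = "Xbox Series X Word Pair Network")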
Warning: ggrepel: 338 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Both diagrams show similar clusters around currently popular games and general gaming terms, which makes sense for two subreddits focused on consoles.
Question 3: How does sentiment change over time for PS5 and Xbox posts?
I wanted to see whether sentiment for each console is trending in a more positive or negative direction over time.
Using the bing lexicon, I scored words as positive or negative, then calculated a daily sentiment score for each console (positive count minus negative count).
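Here is a base-R sketch of that daily score, run on made-up words that are already tagged (the dates and tags are hypothetical). Each positive word counts +1 and each negative word counts -1, so summing per console and day gives positive minus negative. The result has the same `console`, `date`, and `score` columns that the plotting code maps.

```r
# Hypothetical words already matched to the bing lexicon, with post dates
bing_toy <- data.frame(
  console = c("PS5", "PS5", "PS5", "Xbox Series X", "Xbox Series X"),
  date = as.Date(c("2024-05-01", "2024-05-01", "2024-05-02",
                   "2024-05-01", "2024-05-01")),
  sentiment = c("positive", "negative", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# +1 for each positive word, -1 for each negative word
bing_toy$value <- ifelse(bing_toy$sentiment == "positive", 1L, -1L)

# Daily score per console = positive count minus negative count
toy_sent_time <- aggregate(value ~ console + date, data = bing_toy, FUN = sum)
names(toy_sent_time)[names(toy_sent_time) == "value"] <- "score"
toy_sent_time
```

A day with no matched words is simply missing from the result instead of scoring 0. A common tidytext version of the same idea counts `console`, `date`, and `sentiment`, pivots the counts into `positive` and `negative` columns, and subtracts one from the other.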
ggplot(sent_time, aes(x = date, y = score, color = console)) +
  geom_line(linewidth = 1) +  # `linewidth` replaces `size` for lines since ggplot2 3.4.0
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(
    title = "Daily Sentiment Score Over Time",
    subtitle = "Score = positive words − negative words",
    x = "Date",
    y = "Sentiment Score",
    color = "Console"
  ) +
  theme_minimal()
The PS5 and Xbox timelines cover different date ranges because the subreddits get different amounts of activity. Each console contributes 300 posts, and r/PS5 has enough traffic that the API returns mostly recent posts for it, while the Xbox posts stretch further back. As a result, the two lines only partly overlap, but each subreddit shows its own swings in sentiment. Both consoles have similarly jumpy patterns, which makes sense for online gaming communities where reactions can change quickly based on news or updates.
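One way to check how much the two timelines overlap is to compare each console's first and last post dates. Below is a minimal base-R sketch on made-up dates (`toy_dates` is hypothetical). It measures how many days each console's posts span, finds the shared window from the latest first post to the earliest last post, and keeps only the posts that fall inside that window.

```r
# Hypothetical post dates for the two subreddits
toy_dates <- data.frame(
  console = c(rep("PS5", 3), rep("Xbox Series X", 3)),
  date = as.Date(c("2024-05-20", "2024-05-25", "2024-05-30",
                   "2024-04-01", "2024-05-01", "2024-05-28")),
  stringsAsFactors = FALSE
)

by_console <- split(toy_dates$date, toy_dates$console)

# Days spanned by each console's posts, first to last post inclusive
days_covered <- vapply(by_console,
                       function(d) as.numeric(max(d) - min(d), units = "days") + 1,
                       numeric(1))

# Shared window: latest first post to earliest last post
overlap_start <- Reduce(max, lapply(by_console, min))
overlap_end <- Reduce(min, lapply(by_console, max))

# Posts from both consoles that fall inside the shared window
in_overlap <- toy_dates[toy_dates$date >= overlap_start &
                          toy_dates$date <= overlap_end, ]
days_covered
```

In this toy case the same number of posts covers 11 days for PS5 and 58 days for Xbox, and the two consoles only share the window from May 20 to May 28. Restricting both series to such a window would allow a more like-for-like comparison, at the cost of fewer posts.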
Conclusion
Overall, this analysis shows that despite the constant bickering and debate over which console or community is better, the two communities behave and feel much alike. The dataset could and should be scaled up substantially for a more reliable analysis. Still, given how closely these two companies have competed and what this analysis shows so far, I expect the initial conclusion would hold and the two subreddits would still look similar.