This week we use a dataset from TidyTuesday, a great weekly project that releases one interesting dataset per week for analysis and visualization. The dataset we will look at is the Steam “Video Games and Sliced” data, which contains measures such as the average and highest number of simultaneous players for each game. It would be interesting to get a general idea of which games dominated a given time period. A Shiny dashboard/app could be useful for comparing a set of video games of interest. In addition, we could try to create an animation based on the count data through time.
Let’s get started.
TidyTuesday data can be loaded easily with the tt_load function from the tidytuesdayR package, or by reading the CSV directly from its GitHub link:
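library(tidyverse)  # readr, dplyr, stringr, ggplot2
library(lubridate)  # ymd()
library(tidytext)   # reorder_within(), scale_y_reordered()

games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv') %>%
  mutate(gamename = str_to_title(gamename)) %>%
  mutate(month_num = as.integer(factor(month, levels = month.name))) %>%  # "February" -> 2
  mutate(time = ymd(paste0(year, "-", month_num, "-", "01")))  # first day of each month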
##
## -- Column specification --------------------------------------------------------
## cols(
## gamename = col_character(),
## year = col_double(),
## month = col_character(),
## avg = col_double(),
## gain = col_double(),
## peak = col_double(),
## avg_peak_perc = col_character()
## )
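For the tidytuesdayR route mentioned above, a minimal sketch might look like the following. It assumes the tidytuesdayR package is installed and that the week is addressed by its date, 2021-03-16 (taken from the URL above). `load_games()` is a hypothetical helper name used only for illustration, not part of the package.

```r
# Hypothetical helper: fetch the week's data with tidytuesdayR instead of a raw URL.
# Assumes the tidytuesdayR package is installed and network access is available.
load_games <- function(week = "2021-03-16") {
  tt <- tidytuesdayR::tt_load(week)  # downloads the files published for that week
  tt$games                           # games.csv is exposed as the `games` element
}

# games_raw <- load_games()  # uncomment to download
```

The result would be the raw table, so the same `mutate()` steps shown above would still need to be applied to it.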
games %>% count(gamename, sort = TRUE)
## # A tibble: 1,257 x 2
## gamename n
## <chr> <int>
## 1 Age Of Empires® Iii (2007) 104
## 2 Airmech Strike 104
## 3 Alan Wake 104
## 4 Alan Wake's American Nightmare 104
## 5 Alien Swarm 104
## 6 Aliens Vs. Predator 104
## 7 Alpha Protocol 104
## 8 Amnesia: The Dark Descent 104
## 9 Anno 2070 104
## 10 Anomaly Warzone Earth 104
## # ... with 1,247 more rows
In total, there are 1,257 games in the dataset.
games %>% count(year)
## # A tibble: 10 x 2
## year n
## * <dbl> <int>
## 1 2012 1419
## 2 2013 3957
## 3 2014 5668
## 4 2015 7688
## 5 2016 9838
## 6 2017 11676
## 7 2018 12765
## 8 2019 13622
## 9 2020 14486
## 10 2021 2512
Data are available from July 2012 to February 2021. It would be interesting to know which game was the “king” of each year.
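One way to answer the “king of each year” question is to keep, for each year, the single row with the highest peak. The sketch below uses a small made-up table with the same column names as `games`, so the numbers are illustrative only; `toy` and `kings` are hypothetical names.

```r
library(dplyr)

# Toy data (made-up numbers) with the same columns as `games`
toy <- tibble(
  gamename = c("Game A", "Game B", "Game A", "Game B"),
  year     = c(2019, 2019, 2020, 2020),
  peak     = c(100, 250, 400, 300)
)

kings <- toy %>%
  group_by(year) %>%
  slice_max(peak, n = 1, with_ties = FALSE) %>%  # row with the highest peak per year
  ungroup() %>%
  select(year, gamename, peak)

kings
```

Swapping `toy` for `games` should give one row per year, assuming `peak` has no missing values that need special handling.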
head(games)
## # A tibble: 6 x 9
## gamename year month avg gain peak avg_peak_perc month_num time
## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <int> <date>
## 1 Counter-~ 2021 Febr~ 7.41e5 -2196. 1.12e6 65.9567% 2 2021-02-01
## 2 Dota 2 2021 Febr~ 4.05e5 -27840. 6.52e5 62.1275% 2 2021-02-01
## 3 Playerun~ 2021 Febr~ 1.99e5 -2290. 4.47e5 44.4707% 2 2021-02-01
## 4 Apex Leg~ 2021 Febr~ 1.21e5 49216. 1.97e5 61.4752% 2 2021-02-01
## 5 Rust 2021 Febr~ 1.18e5 -24375. 2.24e5 52.4988% 2 2021-02-01
## 6 Team For~ 2021 Febr~ 1.01e5 18083. 1.34e5 75.7603% 2 2021-02-01
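One detail in the preview: avg_peak_perc is read as text (col_character() in the column specification) because of the trailing percent sign. If it is ever needed as a number, a minimal base R sketch, using values copied from the preview, could be:

```r
# Example values copied from the preview above
avg_peak_perc <- c("65.9567%", "62.1275%", "44.4707%")

# Strip the trailing "%" and convert to numeric (still on a 0-100 scale)
avg_peak_num <- as.numeric(sub("%$", "", avg_peak_perc))

avg_peak_num
```

The same expression could be placed inside a `mutate()` call on `games`. Any entries that are not valid percentages would come back as `NA` with a warning.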
The average or highest number of simultaneous players can be used as a measure of a game's popularity.
games %>%
  group_by(gamename, year) %>%
  summarise(peak_year = max(peak)) %>%
  group_by(year) %>%
  top_n(n = 10, wt = peak_year) %>%
  ungroup() %>%
  mutate(gamename = reorder_within(gamename, peak_year, year)) %>%
  ggplot(aes(y = gamename,
             x = peak_year)) +
  geom_col() +
  facet_wrap(~ year, scales = "free", ncol = 2) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered() +
  labs(x = "highest number of players at the same time during the year",
       y = "game")
## `summarise()` has grouped output by 'gamename'. You can override using the `.groups` argument.
Did you find the games you were playing on the list? Dota 2 and CS:GO have dominated the list for many years. PUBG joined the competition in 2017 and has remained in the top 3 since. We also have some shooting stars such as Left 4 Dead 2, Fallout 4, The Witcher 3, and one of my favorite time-killers, Monster Hunter: World. These games appeared in the top 10 for a few years and were then replaced by competitors. League of Legends players might be disappointed, as the data only covers Steam games. We might have a stand-alone KBWM issue for League of Legends soon.
games %>%
  filter(str_detect(gamename, "Dota 2|Global Offensive|Playerunknown")) %>%
  ggplot(aes(x = time, y = peak, color = gamename)) +
  geom_line(size = 1.5) +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "highest number of players at the same time",
       x = "time",
       color = "game")
It seems that Dota 2 was winning from 2014 to the latter half of 2019, while PUBG had a huge peak around the turn of 2018 and dropped significantly after 2019. CS:GO, on the other hand, has been increasing at a steady pace. However, it’s a shame that I have never played any of the three. FPS was never my genre, and I played LoL in place of Dota 2.
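The introduction floated the idea of animating the counts through time. A minimal sketch of how that might be done follows, assuming the gganimate package is installed (it is not used elsewhere in this post) and using made-up numbers in place of `games`. The `toy` data and the output file name are hypothetical.

```r
library(ggplot2)
library(gganimate)

# Toy stand-in for the filtered `games` data (made-up numbers)
toy <- data.frame(
  gamename = rep(c("Game A", "Game B"), each = 4),
  time     = rep(as.Date(c("2020-01-01", "2020-02-01",
                           "2020-03-01", "2020-04-01")), times = 2),
  peak     = c(100, 150, 130, 180, 90, 120, 200, 160)
)

anim <- ggplot(toy, aes(x = time, y = peak, color = gamename)) +
  geom_line() +
  transition_reveal(time)  # draw each line progressively along the time axis

# animate(anim)                     # render (needs a renderer such as gifski)
# anim_save("peaks.gif", anim)      # hypothetical output file name
```

Replacing `toy` with the three-game subset plotted above should reveal the same lines month by month.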
Use THIS LINK to get to the Shiny app.
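For readers curious about what such a comparison app involves, here is a minimal sketch. It is not the linked app: the input and output IDs, the `toy` data, and the layout are all assumptions made for illustration, and it only assumes that the shiny and ggplot2 packages are installed.

```r
library(shiny)
library(ggplot2)

# Toy stand-in for `games` (made-up numbers)
toy <- data.frame(
  gamename = rep(c("Game A", "Game B", "Game C"), each = 3),
  time     = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")), times = 3),
  peak     = c(100, 150, 130, 90, 120, 200, 60, 80, 70)
)

ui <- fluidPage(
  selectInput("picked", "Games to compare",
              choices  = unique(toy$gamename),
              selected = unique(toy$gamename),
              multiple = TRUE),
  plotOutput("peak_plot")
)

server <- function(input, output, session) {
  output$peak_plot <- renderPlot({
    req(input$picked)  # wait until at least one game is selected
    ggplot(subset(toy, gamename %in% input$picked),
           aes(x = time, y = peak, color = gamename)) +
      geom_line()
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # launch locally
```

With the real data, `toy` would be replaced by `games`, and the choices would likely need to be narrowed since the dataset holds more than a thousand titles.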
Thank you for reviewing. Feel free to leave any comments/suggestions.