This week we use a dataset from TidyTuesday, a great weekly project that releases one interesting dataset per week for analysis and visualization. The dataset we will look at is the Steam “Video Games and Sliced” data, which contains measures such as the average and peak number of simultaneous players for each game in each month. It would be interesting to get a general idea of which games dominated which time periods. A Shiny dashboard/app could be useful for comparing a set of video games of interest. In addition, we could try to create an animation based on the count data through time.

Let’s get started.


Loading the data

TidyTuesday data can be loaded easily with the tt_load function from the tidytuesdayR package. Alternatively, read the CSV directly from its raw GitHub URL, as done here.
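For the tidytuesdayR route, a minimal sketch could look like the following. The helper name `load_games_tt` is made up for illustration, and the `games` element name is an assumption based on the file being called games.csv; the call needs internet access, so it is left commented out.

```r
# install.packages("tidytuesdayR")  # if not yet installed

# Hypothetical helper (not part of the original post): download the
# TidyTuesday release for a given date and return its `games` table.
load_games_tt <- function(date = "2021-03-16") {
  tt <- tidytuesdayR::tt_load(date)  # downloads the files of that week's release
  tt$games                           # assumed element name, matching games.csv
}

# games_raw <- load_games_tt()  # needs internet access
```

The result would still need the same mutate() steps as the CSV route to get title-cased names and a proper date column.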

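library(tidyverse)  # readr, dplyr, stringr, ggplot2
library(lubridate)  # ymd()
library(tidytext)   # reorder_within(), scale_y_reordered() used further down

games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv') %>% 
 mutate(gamename = str_to_title(gamename)) %>%  # normalize capitalization of game names
 mutate(month_num = as.integer(factor(month, levels = month.name))) %>%  # month name -> 1..12
 mutate(time = ymd(paste0(year, "-", month_num, "-", "01")))  # first day of the month as a Date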
## 
## -- Column specification --------------------------------------------------------
## cols(
##   gamename = col_character(),
##   year = col_double(),
##   month = col_character(),
##   avg = col_double(),
##   gain = col_double(),
##   peak = col_double(),
##   avg_peak_perc = col_character()
## )
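To see what the date construction does, here is a small sketch on a toy tibble (the rows are made up for illustration, apart from the "65.9567%" string, which appears in the data). It also shows one possible way to turn avg_peak_perc, which is read as character because of the "%" sign, into a number with readr::parse_number(); the column name `avg_peak_pct` is my own choice, not something from the dataset.

```r
library(dplyr)
library(lubridate)
library(readr)

# Toy rows standing in for the real data
toy <- tibble(
  year          = c(2012, 2021),
  month         = c("July", "February"),
  avg_peak_perc = c("50.5%", "65.9567%")
) %>%
  mutate(
    # month.name is a built-in vector "January", ..., "December",
    # so the factor codes are the month numbers 1..12
    month_num    = as.integer(factor(month, levels = month.name)),
    # build "2012-7-01"-style strings and parse them as dates
    time         = ymd(paste0(year, "-", month_num, "-", "01")),
    # strip the "%" sign and parse the remainder as a number
    avg_peak_pct = parse_number(avg_peak_perc)
  )

toy
```

Any month string that does not exactly match an entry of month.name (for example because of trailing whitespace) would become NA in month_num, so it is worth checking for missing values after this step.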


Overall description of the data and the questions

games %>% count(gamename, sort = TRUE)
## # A tibble: 1,257 x 2
##    gamename                           n
##    <chr>                          <int>
##  1 Age Of Empires® Iii (2007)       104
##  2 Airmech Strike                   104
##  3 Alan Wake                        104
##  4 Alan Wake's American Nightmare   104
##  5 Alien Swarm                      104
##  6 Aliens Vs. Predator              104
##  7 Alpha Protocol                   104
##  8 Amnesia: The Dark Descent        104
##  9 Anno 2070                        104
## 10 Anomaly Warzone Earth            104
## # ... with 1,247 more rows

There are 1,257 games in total in the dataset.

games %>% count(year)
## # A tibble: 10 x 2
##     year     n
##  * <dbl> <int>
##  1  2012  1419
##  2  2013  3957
##  3  2014  5668
##  4  2015  7688
##  5  2016  9838
##  6  2017 11676
##  7  2018 12765
##  8  2019 13622
##  9  2020 14486
## 10  2021  2512

Data are available from July 2012 to February 2021. It would be interesting to know which game was the “king” of each year.

head(games)
## # A tibble: 6 x 9
##   gamename   year month    avg    gain   peak avg_peak_perc month_num time      
##   <chr>     <dbl> <chr>  <dbl>   <dbl>  <dbl> <chr>             <int> <date>    
## 1 Counter-~  2021 Febr~ 7.41e5  -2196. 1.12e6 65.9567%              2 2021-02-01
## 2 Dota 2     2021 Febr~ 4.05e5 -27840. 6.52e5 62.1275%              2 2021-02-01
## 3 Playerun~  2021 Febr~ 1.99e5  -2290. 4.47e5 44.4707%              2 2021-02-01
## 4 Apex Leg~  2021 Febr~ 1.21e5  49216. 1.97e5 61.4752%              2 2021-02-01
## 5 Rust       2021 Febr~ 1.18e5 -24375. 2.24e5 52.4988%              2 2021-02-01
## 6 Team For~  2021 Febr~ 1.01e5  18083. 1.34e5 75.7603%              2 2021-02-01

The average and peak numbers of simultaneous players (avg and peak) can both serve as measures of a game’s popularity.


Which games had the highest number of simultaneous players in each year

games %>% 
 group_by(gamename, year) %>% 
 summarise(peak_year = max(peak)) %>%  # yearly peak per game
 group_by(year) %>% 
 slice_max(peak_year, n = 10) %>%  # top 10 games per year (replaces the superseded top_n())
 ungroup() %>% 
 mutate(gamename = reorder_within(gamename, peak_year, year)) %>%  # order bars within each facet
 ggplot(aes(y = gamename, 
            x = peak_year)) + 
 geom_col() + 
 facet_wrap(~ year, scales = "free", ncol = 2) + 
 scale_x_continuous(labels = scales::comma) + 
 scale_y_reordered() + 
 labs(x = "highest number of players at the same time during the year",
      y = "game")
## `summarise()` has grouped output by 'gamename'. You can override using the `.groups` argument.

Did you find the games you were playing in those years on the list? Dota 2 and CS:GO have dominated the list for many years. PUBG joined the competition in 2017 and has remained in the top 3 since. We also have some shooting stars like Left 4 Dead 2, Fallout 4, Witcher 3, and one of my favorite time-killers, Monster Hunter World. These games appeared in the top 10 for a few years and were then replaced by competitors. LoL players might be disappointed, as the data only covers Steam games. We might have a stand-alone KBWM issue for League of Legends soon.
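To name a single “king” per year instead of a top 10, the same two-step aggregation can be cut down to one row per year. The sketch below runs on a made-up toy tibble (`games_demo`, with placeholder game names and numbers) so that it is self-contained; on the real data, `games` would take its place.

```r
library(dplyr)

# Toy stand-in for `games`; names and numbers are invented for illustration
games_demo <- tibble(
  gamename = c("Game A", "Game A", "Game B", "Game B", "Game A", "Game B"),
  year     = c(2019, 2019, 2019, 2019, 2020, 2020),
  peak     = c(500, 900, 700, 650, 800, 1200)
)

kings <- games_demo %>%
  group_by(gamename, year) %>%
  summarise(peak_year = max(peak), .groups = "drop") %>%  # yearly peak per game
  group_by(year) %>%
  slice_max(peak_year, n = 1, with_ties = FALSE) %>%      # best game of each year
  ungroup()

kings
```

Setting with_ties = FALSE guarantees exactly one row per year; whether ties should instead be shown is a judgment call.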


A closer look at the Dota 2, CS:GO, and PUBG competition

games %>% 
 filter(str_detect(gamename, "Dota 2|Global Offensive|Playerunknown")) %>% 
 ggplot(aes(x = time, y = peak, color = gamename)) + 
 geom_line(size = 1.5) + 
 scale_y_continuous(labels = scales::comma) + 
 labs(y = "highest number of players at the same time",
      x = "time",
      color = "game")

It seems that Dota 2 led from 2014 to the second half of 2019. PUBG, meanwhile, had a huge peak around the turn of 2018 and dropped significantly after 2019. CS:GO, on the other hand, has been growing at a steady pace. However, it’s a shame that I have never played any of the three. FPS has never been my genre, and I played LoL instead of Dota 2.
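Reading lead changes off a line chart is approximate. One way to pin down the month in which the lead changes hands is to put the games side by side and compare them row by row. This is a sketch on made-up numbers with placeholder names (`peaks_demo`, "Game A", "Game B"), not a result from the real data.

```r
library(dplyr)
library(tidyr)

# Toy monthly peaks for two games; values are invented for illustration
peaks_demo <- tibble(
  gamename = rep(c("Game A", "Game B"), each = 4),
  time     = rep(as.Date(c("2019-09-01", "2019-10-01",
                           "2019-11-01", "2019-12-01")), times = 2),
  peak     = c(900, 850, 800, 780, 700, 820, 860, 900)
)

lead_changes <- peaks_demo %>%
  pivot_wider(names_from = gamename, values_from = peak) %>%  # one column per game
  arrange(time) %>%
  mutate(
    leader  = if_else(`Game A` >= `Game B`, "Game A", "Game B"),
    # TRUE in the first month where the leader differs from the previous month
    changed = leader != lag(leader, default = first(leader))
  )

lead_changes %>% filter(changed)
```

With three games, the if_else() would need to be replaced by something more general, such as grouping by time and taking the row with the largest peak.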


A Shiny app to explore the popularity of any game you choose

Use this link to get to the Shiny app.
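For readers who want to build something similar, here is a minimal sketch of what such an app could look like. It is not the code behind the linked app; the input and output ids (`picked`, `peak_plot`) and the toy `games_demo` data are assumptions made so the example is self-contained.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Toy stand-in for the `games` tibble; the numbers are made up
games_demo <- tibble(
  gamename = rep(c("Game A", "Game B"), each = 3),
  time     = rep(as.Date(c("2020-12-01", "2021-01-01", "2021-02-01")), times = 2),
  peak     = c(1000, 1200, 900, 400, 650, 700)
)

ui <- fluidPage(
  titlePanel("Peak concurrent players"),
  selectInput("picked", "Games to compare",
              choices  = sort(unique(games_demo$gamename)),
              selected = "Game A", multiple = TRUE),
  plotOutput("peak_plot")
)

server <- function(input, output, session) {
  output$peak_plot <- renderPlot({
    req(input$picked)  # wait until at least one game is selected
    games_demo %>%
      filter(gamename %in% input$picked) %>%
      ggplot(aes(x = time, y = peak, color = gamename)) +
      geom_line() +
      scale_y_continuous(labels = scales::comma) +
      labs(x = "time", y = "peak players", color = "game")
  })
}

app <- shinyApp(ui, server)
# shiny::runApp(app)  # uncomment to launch the app locally
```

Swapping `games_demo` for the full `games` tibble would let the user pick from all games in the dataset.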


Thank you for reading. Feel free to leave any comments/suggestions.