We will learn to do data visualization with real-life data from the internet. Today, we are going to use the Steam Charts dataset: video game player statistics scraped from www.steamcharts.com and uploaded to Kaggle. You can get the dataset here: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-03-16/readme.md#gamescsv. The dataset is reported to be already clean, so we don't need a data-cleaning step.
Our goal for this chapter is to build a dashboard of interactive charts from the Steam Charts dataset using Shiny in R, and to deploy it to shinyapps.io. The dashboard is aimed at anyone passionate about video games, especially gamers looking for recommendations and game developers who want ideas for the games they plan to build.
Load the required packages.
library(plyr)
library(dplyr)
library(tidyverse)
library(lubridate)
library(zoo)
library(glue)
library(plotly)
library(scales)
library(RColorBrewer)
options(scipen = 999)  # print large numbers in full rather than in scientific notation
Load the dataset.
games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv')
## -- Column specification --------------------------------------------------------
## cols(
## gamename = col_character(),
## year = col_double(),
## month = col_character(),
## avg = col_double(),
## gain = col_double(),
## peak = col_double(),
## avg_peak_perc = col_character()
## )
head(games)
str(games)
## spec_tbl_df [83,631 x 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ gamename : chr [1:83631] "Counter-Strike: Global Offensive" "Dota 2" "PLAYERUNKNOWN'S BATTLEGROUNDS" "Apex Legends" ...
## $ year : num [1:83631] 2021 2021 2021 2021 2021 ...
## $ month : chr [1:83631] "February" "February" "February" "February" ...
## $ avg : num [1:83631] 741013 404832 198958 120983 117742 ...
## $ gain : num [1:83631] -2196 -27840 -2290 49216 -24375 ...
## $ peak : num [1:83631] 1123485 651615 447390 196799 224276 ...
## $ avg_peak_perc: chr [1:83631] "65.9567%" "62.1275%" "44.4707%" "61.4752%" ...
## - attr(*, "spec")=
## .. cols(
## .. gamename = col_character(),
## .. year = col_double(),
## .. month = col_character(),
## .. avg = col_double(),
## .. gain = col_double(),
## .. peak = col_double(),
## .. avg_peak_perc = col_character()
## .. )
Our data has 83,631 rows and 7 columns. Each row describes one game in one month between 2012 and 2021: the average (avg) and peak (peak) number of concurrent players, the change in the average compared with the previous month (gain), and the average as a percentage of the peak (avg_peak_perc). The gamename column contains game names that repeat across months. Moreover, the separate month and year columns are unsuitable for time-series work. Therefore, we will convert gamename into a factor and combine month and year into a date.
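As a quick check of the idea, the base-R sketch below turns a year and an English month name into the first day of that month. The three example rows are illustrative inputs, not rows from the dataset. Matching against R's built-in `month.name` is a choice made here so the result does not depend on the system locale; the pipeline that follows uses lubridate's `ymd()` instead.

```r
# Base-R sketch: build a first-of-month Date from separate year and
# month-name columns, as in the Steam Charts data.
# The three example rows are illustrative inputs, not taken from the dataset.
year  <- c(2021, 2021, 2020)
month <- c("February", "January", "December")

# month.name is R's built-in vector of English month names, so match()
# returns the month number regardless of the system locale.
month_num <- match(month, month.name)

first_of_month <- as.Date(sprintf("%d-%02d-01", year, month_num))
print(first_of_month)

# Dates sort chronologically, unlike the raw month-name strings.
print(sort(first_of_month))
```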
games <- games %>%
  mutate(
    # convert the repeated game names into a factor
    gamename = as.factor(gamename),
    # "tanggal" (Indonesian for "date"): first day of each month as a Date
    tanggal = ymd(paste(year, month, "01", sep = "-")),
    # zoo year-month index (printed like "Feb 2021") used for plotting
    date = as.yearmon(tanggal)
  ) %>%
  select(tanggal, year, month, date, gamename, avg, gain, peak, avg_peak_perc)
Aggregate the data to get the average number of players per game per month.
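The aggregation step can be sketched in base R with `aggregate()` on a small data frame. The game names come from the text, but the player numbers are invented for illustration. If the data holds a single row per game and month, as it appears to, `mean(avg)` simply returns that row's value.

```r
# Base-R sketch of the aggregation: keep the target games, then average
# `avg` per game and month. Player numbers are made up for illustration.
toy <- data.frame(
  gamename = c("Dota 2", "Dota 2",
               "Counter-Strike: Global Offensive",
               "Counter-Strike: Global Offensive",
               "Apex Legends"),
  date     = c("2021-01", "2021-02", "2021-01", "2021-02", "2021-02"),
  avg      = c(400, 380, 700, 740, 120),
  stringsAsFactors = FALSE
)

target <- c("Counter-Strike: Global Offensive", "Dota 2")
kept <- toy[toy$gamename %in% target, ]

# One output row per game and month.
agg <- aggregate(avg ~ gamename + date, data = kept, FUN = mean)
names(agg)[names(agg) == "avg"] <- "average"

# Sort by month, then by average in descending order.
agg <- agg[order(agg$date, -agg$average), ]
print(agg)
```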
target <- c("Counter-Strike: Global Offensive", "Dota 2")
agg1 <- games %>%
  filter(gamename %in% target) %>%
  group_by(gamename, date) %>%
  summarise(average = mean(avg), .groups = "drop") %>%
  arrange(date, desc(average))
agg1
Visualize the aggregated data.
plot1 <- ggplot(data = agg1, aes(x = date, y = average, color = gamename)) +
  # geom_area(position = "dodge") +
  geom_line() +
  # ggplot2 ignores the `text` aesthetic; ggplotly() uses it for the tooltip
  geom_point(size = 1, aes(text = glue("{date}: {average}"))) +
  labs(x = "",
       y = "",
       color = "Category") +
  theme_minimal()
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot1, tooltip = "text") %>%
  config(displayModeBar = FALSE)
In the dashboard, we will draw trend lines for three selected games, using the unique values of gamename as the choices for the input.
Compute the cumulative total of average players for each game by month.
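A running total has to restart for every game and has to follow chronological order. The minimal base-R sketch below, with invented values, sorts the rows and then uses `ave()` to apply `cumsum()` within each game.

```r
# Base-R sketch of a per-game running total. Values are illustrative only.
toy <- data.frame(
  gamename = c("Dota 2", "Dota 2", "Dota 2", "Apex Legends", "Apex Legends"),
  date     = as.Date(c("2021-01-01", "2020-12-01", "2021-02-01",
                       "2021-01-01", "2021-02-01")),
  avg      = c(10, 5, 20, 3, 4),
  stringsAsFactors = FALSE
)

# cumsum() is order-sensitive, so sort by game and then chronologically.
toy <- toy[order(toy$gamename, toy$date), ]

# ave() splits avg by game, applies cumsum() to each piece, and puts the
# results back in the original row positions.
toy$cumtotal <- ave(toy$avg, toy$gamename, FUN = cumsum)
print(toy)
```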
avgcumtotal <- games %>%
  select(date, gamename, avg) %>%
  group_by(gamename, date) %>%
  # keep the gamename grouping so cumsum() restarts for each game
  summarise(total_avg = sum(avg), .groups = "drop_last") %>%
  mutate(cumtotal = cumsum(total_avg)) %>%
  filter(gamename == "Dota 2") %>%
  head(25)
avgcumtotal
Visualize the data.
accum2 <- ggplot(data = avgcumtotal, aes(x = date, y = cumtotal)) +
  geom_line(aes(linetype = gamename, color = gamename), show.legend = FALSE) +
  geom_point(size = 0.1, aes(text = glue("{date}: {cumtotal}"))) +
  labs(title = "Player Accumulation Number Over Periods",
       x = "",
       y = "") +
  theme_minimal()
## Warning: Ignoring unknown aesthetics: text
ggplotly(accum2, tooltip = "text") %>%
  config(displayModeBar = FALSE)
Note: this cumulative chart is optional.
Add a column with the month-over-month gain percentage of a game.
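The gain percentage compares each month with the one before it: (avg in this month / avg in the previous month - 1) * 100. The base-R sketch below, with invented values, builds the previous-month value by shifting the vector one position. That shift only means "previous month" when rows are oldest-first; with newest-first rows the shift must go the other way, which is what dplyr's `lead()` does. The first month has no predecessor and therefore gets `NA`.

```r
# Base-R sketch of month-over-month gain percentage for one game.
# Values are illustrative only, ordered oldest month first.
avg  <- c(100, 125, 100, 150)

# Previous month's value: shift right by one (a "lag"); no value for month 1.
prev <- c(NA, head(avg, -1))

gain_percent <- round((avg / prev - 1) * 100, 1)
print(gain_percent)
```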
gainpercent <- games %>%
  select(date, gamename, avg) %>%
  filter(gamename == "Dota 2") %>%
  # sort oldest first so lag() refers to the previous month
  arrange(date) %>%
  mutate(gain_percent = round((avg / lag(avg) - 1) * 100, 1))
gainpercent
Visualize the data.
plot2 <- ggplot(gainpercent, aes(x = date, y = gain_percent)) +
  geom_col(aes(fill = gamename, text = glue("{date}: {gain_percent}%"))) +
  # geom_text(aes(label = gain_percent,
  #               vjust = ifelse(gain_percent >= 0, 0, 1))) +
  scale_y_continuous("Gain Percentage (%)") +
  labs(title = "Gain Percentage of Dota 2",
       x = "")
# theme(legend.position = "none")
## Warning: Ignoring unknown aesthetics: text
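ggplotly(plot2, tooltip = "text") %>%
  config(displayModeBar = FALSE)
## Warning: Removed 1 rows containing missing values (position_stack).
The removed row is the first month, which has no previous month to compare with and therefore has a missing gain percentage.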
In the dashboard, this bar chart is paired with the trend-line plot, so there will be one gain-percentage chart for each of the three selected games.
Compute each game's percentage change in average players relative to the previous month, then list the five biggest gainers for January 2018.
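To let the dashboard recompute the ranking for whichever month is selected, these steps can be wrapped in a function. The sketch below is a hypothetical base-R helper, `top_gainers()`, not part of the original app, run on invented data. It assumes each game has consecutive monthly rows, since the previous month is taken by position.

```r
# Hypothetical helper (not from the original app): percentage change of avg
# versus the previous month for every game, then the n largest gains in one month.
top_gainers <- function(data, target_month, n = 5) {
  # Oldest month first within each game.
  data <- data[order(data$gamename, data$date), ]
  # Previous month's avg within each game (NA for a game's first month);
  # assumes no missing months in between.
  prev <- ave(data$avg, data$gamename, FUN = function(x) c(NA, head(x, -1)))
  data$gain_percent <- round((data$avg / prev - 1) * 100)
  in_month <- data[data$date == target_month & !is.na(data$gain_percent), ]
  in_month <- in_month[order(-in_month$gain_percent), c("gamename", "gain_percent")]
  head(in_month, n)
}

# Illustrative data only: three games observed in Dec 2017 and Jan 2018.
toy <- data.frame(
  gamename = rep(c("Game A", "Game B", "Game C"), each = 2),
  date     = as.Date(rep(c("2017-12-01", "2018-01-01"), times = 3)),
  avg      = c(100, 150, 200, 180, 50, 100),
  stringsAsFactors = FALSE
)
res <- top_gainers(toy, as.Date("2018-01-01"), n = 2)
print(res)
```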
topgainer <- games %>%
  select(gamename, date, avg) %>%
  group_by(gamename) %>%
  # oldest month first within each game, so lag() is the previous month
  arrange(gamename, date) %>%
  mutate(gain_percent = (avg / lag(avg) - 1) * 100) %>%
  ungroup() %>%
  filter(date == as.yearmon("2018-01")) %>%
  arrange(desc(gain_percent)) %>%
  mutate(gain_percent = round(gain_percent, 0)) %>%
  select(gamename, gain_percent) %>%
  head(5)
topgainer
Visualize the data.
plot5 <- ggplot(topgainer, aes(x = gain_percent, y = reorder(gamename, gain_percent))) +
  # a fixed bar colour is set in the geom; passing color to ggplot() has no effect
  geom_col(fill = "green", aes(text = glue("{gain_percent}%"))) +
  labs(title = "",
       x = "Gain Percentage (%)",
       y = "") +
  theme_minimal()
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot5, tooltip = "text") %>%
  config(displayModeBar = FALSE)
In the dashboard, the table of the top 5 gainers and this chart are shown together, and changing the selected month updates both to show that month's top 5 gainers.
Before deploying the dashboard to shinyapps.io, we organize the code into the three files of a Shiny app: global.R (loads the packages and prepares the data), ui.R (defines the layout and inputs), and server.R (builds the plots and tables from those inputs).
The app is then deployed following the usual shinyapps.io steps (for example, with the rsconnect package), after which the dashboard is available online. You can see the result here: https://sayyidmquthb.shinyapps.io/steamcharts/