Source: https://www.kaggle.com/gregorut/videogamesales

Oddly enough, I have never been much interested in video games. I picked this dataset because I was curious about the popular games my friends play and talk about. The dataset has 16,598 observations and 11 variables. It was scraped from vgchartz.com and lists video games with sales greater than 100,000 copies. The variables are the sales rank, the game's name, the platform of the release (such as PC or PS4), the year of release, the genre, and the publisher, followed by sales in North America, Europe, Japan (the dreamland of games, of course), and the rest of the world, plus total worldwide sales.

I will use ggplot2 and dplyr to create a line chart and a scatter chart showing worldwide video game sales from 1984 to 2020. Plotly and Highcharter add interactivity.

Create a line chart to visualize Video Game Sales from 1984 to 2020 in different regions.

Loading necessary packages

library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ purrr   1.0.1     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(RColorBrewer)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggthemes)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Set working directory and load the dataset

setwd("/Users/Linh/Desktop/DATASETS ")
vgsales <- read_csv("vgsales.csv")
## Rows: 16598 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Platform, Year, Genre, Publisher
## dbl (6): Rank, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
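
The column specification above shows that Year was read as character rather than numeric, which usually means some rows hold a non-numeric placeholder. Below is a minimal sketch of one way to convert it, assuming the placeholder is the string "N/A" (an assumption, not checked against the file here). The data frame `toy` and its values are made up for illustration and stand in for vgsales.

```r
# Toy stand-in for vgsales; names and values are made up for illustration.
toy <- data.frame(
  Name = c("Game A", "Game B", "Game C"),
  Year = c("2006", "N/A", "1996"),
  stringsAsFactors = FALSE
)

# Assumption: missing years are stored as the string "N/A".
toy$Year[toy$Year == "N/A"] <- NA

# Convert the remaining strings to numbers; NA stays NA.
toy$Year <- as.numeric(toy$Year)

str(toy$Year)
```

Another option along the same lines is to tell read_csv() which strings count as missing through its `na` argument, so that the column is parsed as numeric at load time.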

Create an interactive Line Chart with ggplot and plotly

Because the dataset has over 16,000 different game names, the game's name is not a practical choice for the legend. Instead, I color the line chart by genre. The chart shows Global Sales (in millions) over time from 1984 to 2020. Thanks to plotly, each mouse-over tooltip shows the game's name, rank, year, genre, publisher, and sales in different regions.

# Year was read as character (see the column specification above), so it is
# converted to numeric for the continuous x scale; values that cannot be
# parsed become NA and are dropped when the plot is drawn.
videogames <- ggplot(vgsales,
                     aes(x = as.numeric(Year), y = Global_Sales, color = Genre, group = 1)) +
  theme_economist() +
  geom_line() +
  geom_point(
    # `text` is not a ggplot2 aesthetic; ggplotly() uses it for the tooltip
    aes(text =
          paste(paste("Name:", Name, "<br>"),
                paste("Rank:", Rank, "<br>"),
                paste("Year:", Year, "<br>"),
                paste("Genre:", Genre, "<br>"),
                paste("Publisher:", Publisher, "<br>"),
                paste("EU Sales:", EU_Sales, "<br>"),
                paste("NA Sales:", NA_Sales, "<br>"),
                paste("Japan Sales:", JP_Sales, "<br>"),
                paste("Global Sales:", Global_Sales, "<br>"))),
    size = 1) +
  scale_x_continuous(breaks = seq(1984, 2020, 6)) +
  scale_color_pander() +
  labs(x = "Year", y = "Global Sales in Millions") +
  ggtitle("Global Sales of Video Games over time")
## Warning in geom_point(aes(text = paste(paste("Name:", Name, "<br>"),
## paste("Rank:", : Ignoring unknown aesthetics: text

Adding the plot with tooltip

videogames2 <- ggplotly(videogames, tooltip =  "text")
videogames2
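
The chart above plots every game as its own point and uses `group = 1`, so the line runs through individual games rather than drawing one line per genre. One alternative, sketched here under stated assumptions, is to sum sales per year and genre first and plot those totals. The data frame `toy` and its numbers are made up, and base R's aggregate() is used only to keep the sketch free of extra dependencies; dplyr's group_by() and summarise() would do the same job.

```r
# Toy stand-in for vgsales; the numbers are made up for illustration.
toy <- data.frame(
  Year = c(2006, 2006, 2006, 2007, 2007),
  Genre = c("Sports", "Sports", "Racing", "Sports", "Racing"),
  Global_Sales = c(10, 5, 3, 4, 6)
)

# Sum Global_Sales within each Year and Genre combination.
yearly <- aggregate(Global_Sales ~ Year + Genre, data = toy, FUN = sum)

# Sort by genre, then year, for readability.
yearly <- yearly[order(yearly$Genre, yearly$Year), ]

print(yearly)
```

With one row per year and genre, a call such as `ggplot(yearly, aes(x = Year, y = Global_Sales, color = Genre)) + geom_line()` would draw a separate line for each genre, because ggplot2 groups by the discrete color variable.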

Create a different visualization with scatterplot and highcharter.

Also, I adjust the position of the legend and customize tooltips to provide more necessary information.

highchart() %>%
  hc_add_series(data = vgsales,
                type = "scatter",
                hcaes(x = Year,
                      y = Global_Sales,
                      group = Genre)) %>%
  hc_legend(align = "center", verticalAlign = "top", layout = "horizontal") %>%
  hc_xAxis(title = list(text="Year")) %>%
  hc_yAxis(title = list(text="Global Sales in Millions")) %>%
  hc_title(text = "Global Sales of Video Games over time", align = "center") %>%
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_add_theme(hc_theme_ffx()) %>%
  hc_tooltip(shared = TRUE,
             pointFormat = "Year: {point.Year}<br>Name: {point.Name}<br>Rank: {point.Rank}<br>Genre: {point.Genre}<br>Global Sales: {point.Global_Sales}")

What I found interesting in these two visualizations is that people were already playing video games in the early years, but 2002 to 2016 is really the period in which video games gained traction.

One outlier is Wii Sports, released in 2006, which has the highest individual global sales of any game in this dataset.

Nintendo seems to be the publisher with the most popular, record-breaking games.

The most popular genre is Sports, led by the large publisher Nintendo. Popular genres such as Sports, Racing, and Role-Playing are produced by larger publishers for the North American, European, and Japanese markets.

North America has been the dominant market contributing to global sales.

I have noticed that data for 2017 to 2020 may be missing or contain errors, and that 2016 likely does not contain a full year of sales.
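
One quick way to check a suspicion like this is to count how many games are listed per year and look for years with unusually few rows. The sketch below does that on a made-up data frame `toy`; the counts are illustrative and say nothing about the real dataset.

```r
# Toy stand-in for the Year column of vgsales; values are made up.
toy <- data.frame(Year = c(2015, 2015, 2015, 2016, 2016, 2017, 2020))

# Count rows per year; sparse years stand out as small counts.
counts <- as.data.frame(table(Year = toy$Year), responseName = "n_games")

print(counts)
```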

What I wish to improve: I wish I could do a better job of telling the story of how sales relate across regions, such as North America compared with Europe, or Japan, a single country, compared with an entire multi-country region.
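
A possible starting point for that comparison, sketched on made-up numbers, is to total the regional columns per year and then relate one region to another. The data frame `toy` and the derived column `NA_to_EU` are hypothetical names introduced for this sketch; only the regional column names come from the dataset.

```r
# Toy stand-in for vgsales; the numbers are made up for illustration.
toy <- data.frame(
  Year = c(2006, 2006, 2007, 2007),
  NA_Sales = c(4, 2, 1, 3),
  EU_Sales = c(3, 1, 1, 2),
  JP_Sales = c(1, 1, 0.5, 0.5)
)

# Total each region's sales per year.
by_year <- aggregate(cbind(NA_Sales, EU_Sales, JP_Sales) ~ Year,
                     data = toy, FUN = sum)

# Ratio of North American to European sales in each year.
by_year$NA_to_EU <- by_year$NA_Sales / by_year$EU_Sales

print(by_year)
```

Plotting the yearly totals or the ratio over time would then show whether the gap between regions widens or narrows.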