What to Play during Quarantines? EDA using vgchartz’ Data

Andari Reksi

2020-09-03

Background

Playing games on our consoles during the work-at-home period is one of the best ways to entertain ourselves in these trying times. However, there are too many games to choose from, and we have to stick to a budget when deciding which ones to buy. Let’s find out which popular past games are worth buying in the upcoming PS Store sale!

About the Data

This dataset is from Kaggle and consists of real data scraped from vgchartz.com as of April 12th, 2019. Note: this data has many N/A values, so any game recommendations from this analysis come only from the available data and may not represent actual game popularity.

Data Read

##   Rank                               Name        Genre ESRB_Rating Platform
## 1    1                         Wii Sports       Sports           E      Wii
## 2    2                  Super Mario Bros.     Platform                  NES
## 3    3                     Mario Kart Wii       Racing           E      Wii
## 4    4      PlayerUnknown's Battlegrounds      Shooter                   PC
## 5    5                  Wii Sports Resort       Sports           E      Wii
## 6    6 Pokemon Red / Green / Blue Version Role-Playing           E       GB
##          Publisher        Developer Critic_Score User_Score Total_Shipped
## 1         Nintendo     Nintendo EAD          7.7         NA         82.86
## 2         Nintendo     Nintendo EAD         10.0         NA         40.24
## 3         Nintendo     Nintendo EAD          8.2        9.1         37.14
## 4 PUBG Corporation PUBG Corporation           NA         NA         36.60
## 5         Nintendo     Nintendo EAD          8.0        8.8         33.09
## 6         Nintendo       Game Freak          9.4         NA         31.38
##   Global_Sales NA_Sales PAL_Sales JP_Sales Other_Sales Year
## 1           NA       NA        NA       NA          NA 2006
## 2           NA       NA        NA       NA          NA 1985
## 3           NA       NA        NA       NA          NA 2008
## 4           NA       NA        NA       NA          NA 2017
## 5           NA       NA        NA       NA          NA 2009
## 6           NA       NA        NA       NA          NA 1998

Rank - Ranking of overall sales
Name - Name of the game
Platform - Platform of the game (e.g. PC, PS4, XOne)
Genre - Genre of the game
ESRB_Rating - ESRB rating of the game
Publisher - Publisher of the game
Developer - Developer of the game
Critic_Score - Critic score of the game (out of 10)
User_Score - User score of the game (out of 10)
Total_Shipped - Total copies shipped (in millions)
Global_Sales - Total worldwide sales (in millions of units)
NA_Sales - Sales in North America (in millions of units)
PAL_Sales - Sales in Europe (in millions of units)
JP_Sales - Sales in Japan (in millions of units)
Other_Sales - Sales in the rest of the world (in millions of units)
Year - Year of release of the game

Data Preparation


To simplify the analysis, I want to drop regional sales variables and only take the “global” sales variable:

## 'data.frame':    55792 obs. of  12 variables:
##  $ Rank         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name         : Factor w/ 37102 levels "_summer ##","- Arcane preRaise -",..: 35588 30375 18586 23666 35590 23822 21482 31512 21486 19620 ...
##  $ Genre        : Factor w/ 20 levels "Action","Action-Adventure",..: 18 11 13 16 18 14 11 12 11 7 ...
##  $ ESRB_Rating  : Factor w/ 9 levels "","AO","E","E10",..: 3 1 3 1 3 3 3 3 3 1 ...
##  $ Platform     : Factor w/ 74 levels "2600","3DO","3DS",..: 65 42 65 48 65 25 21 25 65 48 ...
##  $ Publisher    : Factor w/ 3069 levels "][ Games","@unepic_fran",..: 1883 1883 1883 2151 1883 1883 1883 1883 1883 1753 ...
##  $ Developer    : Factor w/ 8065 levels "",".theprodukkt",..: 4984 4984 4984 5635 4984 2726 4984 1159 4984 4665 ...
##  $ Critic_Score : num  7.7 10 8.2 NA 8 9.4 9.1 NA 8.6 10 ...
##  $ User_Score   : num  NA NA 9.1 NA 8.8 NA 8.1 NA 9.2 NA ...
##  $ Total_Shipped: num  82.9 40.2 37.1 36.6 33.1 ...
##  $ Global_Sales : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Year         : num  2006 1985 2008 2017 2009 ...
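The post doesn't show the R code behind this step. As a rough sketch in plain Python (not the author's tidyverse code), dropping the regional columns amounts to keeping every column except the four regional ones. The `drop_columns` helper is hypothetical, the column names come from the variable list above, and the single sample row is abbreviated from the `head()` listing.

```python
# Column names as listed in the data description (16 in total)
ALL_COLUMNS = [
    "Rank", "Name", "Genre", "ESRB_Rating", "Platform", "Publisher",
    "Developer", "Critic_Score", "User_Score", "Total_Shipped",
    "Global_Sales", "NA_Sales", "PAL_Sales", "JP_Sales", "Other_Sales", "Year",
]
# Regional breakdowns we no longer need once Global_Sales is kept
REGIONAL_COLUMNS = {"NA_Sales", "PAL_Sales", "JP_Sales", "Other_Sales"}


def drop_columns(rows, unwanted):
    """Return new row dicts without the unwanted keys (hypothetical helper)."""
    return [{key: value for key, value in row.items() if key not in unwanted}
            for row in rows]


kept_columns = [c for c in ALL_COLUMNS if c not in REGIONAL_COLUMNS]
print(len(kept_columns))  # 12, matching the str() output above

# Abbreviated sample row; its regional sales were NA in the head() listing
sample = [{"Name": "Wii Sports", "Platform": "Wii", "Global_Sales": None,
           "NA_Sales": None, "PAL_Sales": None, "JP_Sales": None,
           "Other_Sales": None, "Year": 2006}]
print(drop_columns(sample, REGIONAL_COLUMNS))
```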

Because we’re concerned with “which PS games to play”, the logical choice is to filter Platform down to “PS4” only (as it is the latest platform in the PlayStation line):

## 'data.frame':    1755 obs. of  12 variables:
##  $ Rank         : int  21 35 46 51 69 77 85 98 101 103 ...
##  $ Name         : Factor w/ 37102 levels "_summer ##","- Arcane preRaise -",..: 12802 4895 25336 4917 10770 10769 14373 4897 10771 18727 ...
##  $ Genre        : Factor w/ 20 levels "Action","Action-Adventure",..: 1 16 2 16 18 18 1 16 18 2 ...
##  $ ESRB_Rating  : Factor w/ 9 levels "","AO","E","E10",..: 7 7 7 7 3 3 9 7 3 9 ...
##  $ Platform     : Factor w/ 74 levels "2600","3DO","3DS",..: 54 54 54 54 54 54 54 54 54 54 ...
##  $ Publisher    : Factor w/ 3069 levels "][ Games","@unepic_fran",..: 2282 75 2282 75 749 779 2491 75 779 2491 ...
##  $ Developer    : Factor w/ 8065 levels "",".theprodukkt",..: 5982 7335 5978 6436 2119 2098 3001 7335 2116 3477 ...
##  $ Critic_Score : num  9.7 0 9.8 8 8.3 8.9 9.1 0 0 9.1 ...
##  $ User_Score   : num  0 0 0 0 0 0 8 0 0 0 ...
##  $ Total_Shipped: num  0 0 0 0 0 0 10 0 0 9 ...
##  $ Global_Sales : num  19.4 15.1 13.9 13.4 11.8 ...
##  $ Year         : num  2014 2015 2018 2017 2017 ...

We have narrowed our observations down to 1,755 games.
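The platform filter is a single predicate on `Platform`. Below is a plain-Python sketch, not the post's R code. The `filter_platform` helper is hypothetical, and the three rows reuse titles and sales figures that appear elsewhere in this post.

```python
def filter_platform(rows, platform="PS4"):
    """Keep only the rows released on the given platform."""
    return [row for row in rows if row["Platform"] == platform]


games = [
    {"Name": "Wii Sports", "Platform": "Wii", "Global_Sales": None},
    {"Name": "Final Fantasy XV", "Platform": "PS4", "Global_Sales": 5.07},
    {"Name": "Call of Duty: Advanced Warfare", "Platform": "PS4",
     "Global_Sales": 7.53},
]

ps4_games = filter_platform(games)
print([game["Name"] for game in ps4_games])
# ['Final Fantasy XV', 'Call of Duty: Advanced Warfare']
```

The PS4 `str()` output shows 0 in Critic_Score, User_Score and Total_Shipped where the earlier listing showed NA. This suggests that missing values were replaced with 0 at around this step.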

Exploratory Data Analysis

Most gamers have their own favorite genre, although they may sometimes try games from other genres too. But how do the genres compete with each other in terms of sales?

## # A tibble: 6 x 3
##   Genre            total.sales     n
##   <fct>                  <dbl> <int>
## 1 Racing                  24.8    79
## 2 Action-Adventure        51.2   112
## 3 Role-Playing            54.4   216
## 4 Sports                 109.    113
## 5 Action                 117.    355
## 6 Shooter                145.    154
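The genre table above can be reproduced, in spirit, by grouping on `Genre`, summing `Global_Sales`, and counting rows. Here is a minimal plain-Python sketch that makes two assumptions. First, missing sales count as 0, which is how the PS4 listing stores them. Second, results are sorted by total sales in ascending order, as the tibble appears to be. The rows are hypothetical toy data, not values from the dataset. Note that `n` counts titles, not copies sold.

```python
from collections import defaultdict


def sales_by_genre(rows):
    """Return (genre, total_sales, n_titles) tuples, sorted by total_sales ascending."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sales = row["Global_Sales"]
        # Assumption: treat missing sales as 0, like the zeros in the PS4 data
        totals[row["Genre"]] += sales if sales is not None else 0.0
        counts[row["Genre"]] += 1  # one title per row, not one copy
    summary = [(genre, round(totals[genre], 2), counts[genre]) for genre in totals]
    return sorted(summary, key=lambda item: item[1])


# Hypothetical toy rows
toy_rows = [
    {"Name": "Game A", "Genre": "Shooter", "Global_Sales": 3.0},
    {"Name": "Game B", "Genre": "Shooter", "Global_Sales": 1.5},
    {"Name": "Game C", "Genre": "Action", "Global_Sales": 1.0},
    {"Name": "Game D", "Genre": "Action", "Global_Sales": 0.5},
    {"Name": "Game E", "Genre": "Action", "Global_Sales": None},
]

for genre, total, n in sales_by_genre(toy_rows):
    print(f"{genre:<10} total={total:>5} n={n} per_title={total / n:.2f}")
```

Dividing total sales by `n`, as in the last printed column, separates "many titles" from "many copies per title".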

Top-Selling Genre in PS4 Games

Who knew the Shooter genre would have the highest sales of all, at about 145 million units from ‘only’ 154 titles? Behind it, Action, Sports, Role-Playing, and Action-Adventure round out the top five by units sold, which makes them the most popular genres among PS4 gamers worldwide.
Keep in mind that n counts titles, not copies: Action has the most games (355) yet sells fewer units in total (about 117 million). That works out to roughly 0.9 million copies per Shooter title against roughly 0.3 million per Action title, although the dataset alone cannot tell us why shooters sell so much better per game.

Genre Popularity by Year

Does any particular year catch your attention? For me, 2016 was the golden year for the Role-Playing genre (the genre I usually play), because these games launched that year:

##                                        Name Global_Sales
## 1                          Final Fantasy XV         5.07
## 2 Star Ocean 5: Integrity and Faithlessness         0.45
## 3                              Rainbow Moon         0.00
## 4 Superdimension Neptune vs Sega Hard Girls         0.00
## 5                         The Banner Saga 2         0.00
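A query like this is a filter on two columns followed by a descending sort. The sketch below is plain Python with a hypothetical `top_in_genre_year` helper, not the post's R code. The rows reuse the five titles and sales figures from the table above. One extra title from another genre and year (Call of Duty: Advanced Warfare, 2014) is added so the filter has something to exclude.

```python
def top_in_genre_year(rows, genre, year):
    """Titles of one genre released in one year, best-selling first."""
    matches = [r for r in rows if r["Genre"] == genre and r["Year"] == year]
    # sorted() is stable even with reverse=True, so ties keep their input order
    return sorted(matches, key=lambda r: r["Global_Sales"], reverse=True)


rows = [
    {"Name": "Final Fantasy XV", "Genre": "Role-Playing", "Year": 2016, "Global_Sales": 5.07},
    {"Name": "Rainbow Moon", "Genre": "Role-Playing", "Year": 2016, "Global_Sales": 0.0},
    {"Name": "Star Ocean 5: Integrity and Faithlessness", "Genre": "Role-Playing",
     "Year": 2016, "Global_Sales": 0.45},
    {"Name": "Superdimension Neptune vs Sega Hard Girls", "Genre": "Role-Playing",
     "Year": 2016, "Global_Sales": 0.0},
    {"Name": "The Banner Saga 2", "Genre": "Role-Playing", "Year": 2016, "Global_Sales": 0.0},
    {"Name": "Call of Duty: Advanced Warfare", "Genre": "Shooter", "Year": 2014,
     "Global_Sales": 7.53},
]

for game in top_in_genre_year(rows, "Role-Playing", 2016):
    print(f'{game["Name"]:<42} {game["Global_Sales"]:.2f}')
```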

With global sales of about 5 million units from the PS4 version alone, “Final Fantasy XV” clearly has a strong fan base all over the world. (The 0.00 entries above most likely reflect missing sales data rather than zero sales.)

Final Fantasy XV


Greatest Game Developer Ever

What did Sledgehammer Games create that earned them 7.53 million units in sales? The answer:

##                             Name Global_Sales
## 1 Call of Duty: Advanced Warfare         7.53
Call of Duty: Advanced Warfare

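Ranking developers is another group-and-sum, and answering the question above is a filter on `Developer`. Here is a plain-Python sketch, not the post's R code. `sales_by_developer` and `titles_by` are hypothetical helpers. The Sledgehammer Games row uses the figure shown above, while "Studio X" and its two games are made-up toy data.

```python
from collections import Counter


def sales_by_developer(rows):
    """Total Global_Sales per developer, highest first."""
    totals = Counter()
    for row in rows:
        totals[row["Developer"]] += row["Global_Sales"]
    return totals.most_common()


def titles_by(rows, developer):
    """(Name, Global_Sales) pairs for one developer's games."""
    return [(r["Name"], r["Global_Sales"]) for r in rows if r["Developer"] == developer]


rows = [
    {"Name": "Call of Duty: Advanced Warfare", "Developer": "Sledgehammer Games",
     "Global_Sales": 7.53},
    {"Name": "Game A", "Developer": "Studio X", "Global_Sales": 2.0},  # hypothetical
    {"Name": "Game B", "Developer": "Studio X", "Global_Sales": 1.5},  # hypothetical
]

top_developer, top_sales = sales_by_developer(rows)[0]
print(top_developer, top_sales)
print(titles_by(rows, top_developer))
```

`Counter` is used here only because it already provides `most_common()`. A plain dict plus `sorted()` would work just as well.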

Knowing these developers may come in handy for gamers, as we can follow them on their social media to keep up with their next game projects! For example, if you like playing Call of Duty, then you might want to pay attention to Sledgehammer Games' announcements.

Highest-Rated Games

The ultimate, no-brainer way to choose a game is to pick the ones with the highest ratings from both fellow gamers and critics. Since this information is not available on the PlayStation Store, we might as well find out here which games entertained people the most.

Disclaimer: many of these scores (critic and user) are missing (N/A) or recorded as 0, so any recommendation made from the analysis below comes from an incomplete dataset.

If you’re a fellow gamer, I think we can both agree that the “user score” in this dataset DOES NOT represent gamers’ favorites. I’d suggest looking at the first table instead (ranked by critic score), as it lists more well-known games.
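Both rating tables come down to dropping games without a score and sorting by the score that remains. This is a minimal plain-Python sketch, not the post's R code. It assumes, as the disclaimer above suggests, that a score of 0 means "missing" rather than a genuine 0/10. `top_rated` and the toy rows are hypothetical.

```python
def top_rated(rows, score_column, limit=10):
    """Rank games by one score column, skipping missing scores (None or 0)."""
    # Assumption: 0 marks a missing score, so None and 0 are both dropped
    rated = [r for r in rows if r.get(score_column) not in (None, 0)]
    rated.sort(key=lambda r: r[score_column], reverse=True)
    return [(r["Name"], r[score_column]) for r in rated[:limit]]


toy_rows = [  # hypothetical games and scores
    {"Name": "Game A", "Critic_Score": 9.1, "User_Score": 0},
    {"Name": "Game B", "Critic_Score": 0, "User_Score": 8.0},
    {"Name": "Game C", "Critic_Score": 9.8, "User_Score": None},
    {"Name": "Game D", "Critic_Score": None, "User_Score": 7.5},
]

print(top_rated(toy_rows, "Critic_Score"))  # [('Game C', 9.8), ('Game A', 9.1)]
print(top_rated(toy_rows, "User_Score"))    # [('Game B', 8.0), ('Game D', 7.5)]
```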

Conclusion

When it comes to games, there is no one-size-fits-all recommendation. Even if you like a particular genre, there is no guarantee that you will like every other game in the same genre. However, to sum up the EDA above:

  1. If you’re looking to try new games, look at the “Shooter”, “Action”, “Sports”, “Role-Playing”, and “Action-Adventure” genres, as they sell the most copies on PS4.
  2. Another way to find new games is to keep an eye on upcoming releases from popular game developers. If you like Call of Duty, you may want to follow Sledgehammer Games’ social media accounts for their next releases.
  3. Want to play with kids? Sports games are the way to go!
  4. The ultimate way to find game recommendations is to compare the User Score and Critic Score. While my analysis above doesn’t give a reliable recommendation (because many scores are missing), you can always look up score information online.