1 Executive Summary

Video games are played across the world, now more than ever before.

The aim of this report is to analyse the consumption of video games around the world and consider the trends in the popularity of genres across time.

The main discoveries are that:

  1. Video game consumption of the Top 200 games is highest in North America, followed by Europe and Japan

  2. The Action, Shooter and Platform genres are most popular across all regions, though certain regions favour particular genres more than others

  3. 2009 and 2010 saw the release of the most Top 200 games (16 each) between 1982 and 2015

  4. There has been a decline in the number of recently released games entering the Top 200

  5. The Action, Shooter and Sports genres have increased in popularity over the years, while Puzzle-Based and Platform games have decreased in popularity

Considering the research questions outlined regarding consumption and genre prevalence, it is clear that globally, but particularly in North America, Europe and Japan, video games are heavily consumed, with action-oriented genres proving particularly popular today.


2 Full Report

2.1 Initial Data Analysis (IDA)

We chose to analyse data on the Top 200 video games, sourced from kaggle.com.

vgsales = read.csv("vgsales.csv")

A preliminary examination of our data follows:

2.1.1 Complexity of Data

Looking at the first and last six rows of the data, we can see the 12 distinct variables involved in our analysis.

The original dataset had roughly 16,600 entries, which we cut to the 200 top-ranked games to make the data easier to process.
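The trimming step itself is not shown in this report. A minimal sketch of one way it could be done, assuming the full file carries the same Rank column; the data frame `full` and its values are made up for illustration and are not real sales figures:

```r
# Toy stand-in for the full dataset (illustrative values, not real sales)
full = data.frame(
  Rank = 1:6,
  Name = c("A", "B", "C", "D", "E", "F"),
  Global_Sales = c(9.1, 7.4, 6.0, 5.2, 4.8, 3.3)
)

# Keep only the n top-ranked games (n = 200 in the report, 3 here)
n = 3
top = full[full$Rank <= n, ]

dim(top)
```

Filtering on Rank rather than taking the first n rows keeps the result correct even if the file is not sorted.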

Summary of our Data

summary(vgsales)
##       Rank                                    Name        Platform 
##  Min.   :  1.00   Grand Theft Auto V            :  4   PS3    :28  
##  1st Qu.: 50.75   Assassin's Creed II           :  2   X360   :28  
##  Median :100.50   Assassin's Creed III          :  2   Wii    :21  
##  Mean   :100.50   Battlefield 3                 :  2   DS     :19  
##  3rd Qu.:150.25   Call of Duty 4: Modern Warfare:  2   PS2    :18  
##  Max.   :200.00   Call of Duty: Advanced Warfare:  2   PS     :14  
##                   (Other)                       :186   (Other):72  
##       Year              Genre                          Publisher 
##  2009   : 16   Action      :36   Nintendo                   :80  
##  2010   : 16   Shooter     :35   Activision                 :21  
##  2011   : 13   Platform    :31   Electronic Arts            :17  
##  2012   : 13   Role-Playing:30   Sony Computer Entertainment:17  
##  2007   : 12   Racing      :16   Microsoft Game Studios     :15  
##  2008   : 12   Misc        :15   Take-Two Interactive       :13  
##  (Other):118   (Other)     :37   (Other)                    :37  
##     NA_Sales         EU_Sales         JP_Sales        Other_Sales     
##  Min.   : 0.070   Min.   : 0.000   Min.   : 0.0000   Min.   : 0.0000  
##  1st Qu.: 2.638   1st Qu.: 1.705   1st Qu.: 0.1075   1st Qu.: 0.3300  
##  Median : 3.565   Median : 2.300   Median : 0.8300   Median : 0.6850  
##  Mean   : 4.892   Mean   : 3.048   Mean   : 1.4116   Mean   : 0.9398  
##  3rd Qu.: 5.628   3rd Qu.: 3.522   3rd Qu.: 2.1475   3rd Qu.: 1.0750  
##  Max.   :41.490   Max.   :29.020   Max.   :10.2200   Max.   :10.5700  
##                                                                       
##   Global_Sales      Region.where.sold.most
##  Min.   : 5.080   Europe       : 34       
##  1st Qu.: 5.838   japan        : 17       
##  Median : 7.325   North America:148       
##  Mean   :10.291   other        :  1       
##  3rd Qu.:11.217                           
##  Max.   :82.740                           
## 

Names of our Variables

# Names of our variables
names(vgsales)
##  [1] "Rank"                   "Name"                  
##  [3] "Platform"               "Year"                  
##  [5] "Genre"                  "Publisher"             
##  [7] "NA_Sales"               "EU_Sales"              
##  [9] "JP_Sales"               "Other_Sales"           
## [11] "Global_Sales"           "Region.where.sold.most"

Dimensions of our Data

## Size of data
dim(vgsales)
## [1] 200  12

First 6 Rows of our Data

# Quick look at the first 6 rows of data (the default for head)
head(vgsales)
##   Rank                     Name Platform Year        Genre Publisher
## 1    1               Wii Sports      Wii 2006       Sports  Nintendo
## 2    2        Super Mario Bros.      NES 1985     Platform  Nintendo
## 3    3           Mario Kart Wii      Wii 2008       Racing  Nintendo
## 4    4        Wii Sports Resort      Wii 2009       Sports  Nintendo
## 5    5 Pokemon Red/Pokemon Blue       GB 1996 Role-Playing  Nintendo
## 6    6                   Tetris       GB 1989       Puzzle  Nintendo
##   NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
## 1    41.49    29.02     3.77        8.46        82.74
## 2    29.08     3.58     6.81        0.77        40.24
## 3    15.85    12.88     3.79        3.31        35.82
## 4    15.75    11.01     3.28        2.96        33.00
## 5    11.27     8.89    10.22        1.00        31.37
## 6    23.20     2.26     4.22        0.58        30.26
##   Region.where.sold.most
## 1          North America
## 2          North America
## 3          North America
## 4          North America
## 5          North America
## 6          North America

Last 6 Rows of our Data

# Quick look at the last 6 rows of data (the default for tail)
tail(vgsales)
##     Rank                                Name Platform Year      Genre
## 195  195          Microsoft Flight Simulator       PC 1996 Simulation
## 196  196                      Guitar Hero II      PS2 2006       Misc
## 197  197                     Resident Evil 5      PS3 2009     Action
## 198  198                  Grand Theft Auto V     XOne 2014     Action
## 199  199 Grand Theft Auto: Vice City Stories      PSP 2006     Action
## 200  200                      FIFA Soccer 11      PS3 2010     Sports
##                  Publisher NA_Sales EU_Sales JP_Sales Other_Sales
## 195 Microsoft Game Studios     3.22     1.69     0.00        0.20
## 196              RedOctane     3.81     0.63     0.00        0.68
## 197                 Capcom     1.96     1.43     1.08        0.65
## 198   Take-Two Interactive     2.66     2.01     0.00        0.41
## 199   Take-Two Interactive     1.70     2.02     0.16        1.21
## 200        Electronic Arts     0.60     3.29     0.06        1.13
##     Global_Sales Region.where.sold.most
## 195         5.12          North America
## 196         5.12          North America
## 197         5.11          North America
## 198         5.08          North America
## 199         5.08                 Europe
## 200         5.08                 Europe

2.1.2 Classification of Variables

We re-classified the Name, Platform, Year, Genre and Publisher variables, which were labelled ‘character’, as ‘factors’ so that R would treat them as qualitative variables.

The quantitative variables were correctly labelled ‘numeric’ and were left unchanged.

Re-classification of Variables

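# Convert the columns in place so the change is visible in str(vgsales)
vgsales$Name = factor(vgsales$Name)
vgsales$Platform = factor(vgsales$Platform)
vgsales$Year = factor(vgsales$Year)
vgsales$Genre = factor(vgsales$Genre)
vgsales$Publisher = factor(vgsales$Publisher)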

R’s Classification of Variables

## R's classification of variables
str(vgsales)
## 'data.frame':    200 obs. of  12 variables:
##  $ Rank                  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name                  : Factor w/ 175 levels "Animal Crossing: New Leaf",..: 172 139 91 173 123 156 106 171 109 35 ...
##  $ Platform              : Factor w/ 21 levels "2600","3DS","DS",..: 17 9 17 17 4 4 3 17 17 9 ...
##  $ Year                  : Factor w/ 32 levels "1982","1984",..: 22 3 24 25 12 6 22 22 25 2 ...
##  $ Genre                 : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
##  $ Publisher             : Factor w/ 23 levels "505 Games","Activision",..: 12 12 12 12 12 12 12 12 12 12 ...
##  $ NA_Sales              : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales              : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales              : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales           : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales          : num  82.7 40.2 35.8 33 31.4 ...
##  $ Region.where.sold.most: Factor w/ 4 levels "Europe","japan",..: 3 3 3 3 3 3 3 3 3 3 ...
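The effect of the re-classification described above can be seen on a small example. This is a sketch on a made-up character vector (`genre_chr` is hypothetical), not on the report's data:

```r
# Toy character column (illustrative values only)
genre_chr = c("Action", "Sports", "Action", "Puzzle")

# As character, R sees plain text with no notion of categories
class(genre_chr)

# As a factor, R treats the values as categories and can count each level
genre_fct = factor(genre_chr)
levels(genre_fct)
table(genre_fct)
```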

We then isolated our variables.

## Isolating variables
rank = vgsales$Rank
name = vgsales$Name
platform = vgsales$Platform
year = vgsales$Year
genre = vgsales$Genre
publisher = vgsales$Publisher
NAsales = vgsales$NA_Sales
EUsales = vgsales$EU_Sales
JPsales = vgsales$JP_Sales
otherSales = vgsales$Other_Sales
globalSales = vgsales$Global_Sales
Region = vgsales$Region.where.sold.most

2.1.3 Source of Data

The dataset was compiled by Data Engineer Gregory Smith, who sourced his data from a network known as vgchartz.

Vgchartz collects its data through calculated estimates, polls of video game retailers and gaming communities, studies of resale prices to gauge popularity, and direct consultation with developers and retail stores.

When assessing the accuracy of this data, we found multiple discussion threads that agreed on an inaccuracy of 10% to 15% in the weekly data (further confirmed on the vgchartz website).

This is primarily because some retailers are unwilling to share their sales data, so some figures had to be estimated. Although the data is not 100% accurate, it is still a strong indicator of the trends in video game sales.

Further, because the dataset was completed in late October 2016, video game sales for November and December of that year are not included. This presents a possible issue when looking at 2016 trends, as November and December often produce the highest sales of the year.

Additionally, development of games for older platforms such as the PS3 has stopped, which will inevitably reduce sales regardless of popularity.
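To give a sense of scale, the quoted error range can be applied to a single figure. This is only a sketch: it assumes the 10% to 15% weekly inaccuracy carries over to cumulative totals, which the source does not state.

```r
# Reported global sales (millions) of the top-ranked game in this dataset
reported = 82.74

# Assumed relative error levels, taken from the 10% to 15% range quoted above
err = c(0.10, 0.15)

# Lower and upper bounds on the true figure under each error level
bounds = data.frame(
  error = err,
  lower = reported * (1 - err),
  upper = reported * (1 + err)
)
round(bounds, 2)
```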

2.1.4 Stakeholders

This dataset would be valuable for video game developers and companies looking to develop a successful game.

By examining this dataset, developers and companies can extract useful information, such as the most popular genres and the regions with high levels of video game consumption, that would increase the likelihood of a successful video game launch.

2.1.5 Domain Knowledge

Data from vgchartz has been used in multiple research reports aiming to develop an understanding of the current and previous success of certain games. These reports strive to accurately predict which types of games may be successful in the future.

2.1.6 Summary

The data came from a video game sales tracking network known as vgchartz, sourced from across the industry and community.

The data is mostly valid because it has been extracted from largely reputable sources, though some error is acknowledged.

Possible issues include under-reporting of sales due to lack of access to every sales source, and incomplete data from 2016.

Each row represents information regarding one of the Top 200 video games.

Each column represents a variable which may be used to further understand and evaluate the success of the Top 200 video games (e.g. genre, sales figures)


2.2 Research Question 1

2.2.1 What does video game consumption look like around the world?

While video games are popular around the world, knowledge regarding the largest markets for video games is valuable for potential developers.

North America, Europe and Japan are the biggest gaming markets, in contrast to less developed regions, owing to their well-developed network infrastructure (Nichols 2014).

As seen in the barplot below, North America dominates, with 148 of the 200 games selling the most there. Europe comes second, leading for 34 games, while Japan leads for only 17. One game sold the most in other regions.
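How the Region.where.sold.most column was built is not shown here. One way such a label could be derived from the four regional sales columns is sketched below, using four rows copied from the head() and tail() output earlier. The label spellings and the tie rule are assumptions.

```r
# Four rows copied from the head() and tail() output shown earlier
sales = data.frame(
  Name = c("Wii Sports", "Super Mario Bros.",
           "Grand Theft Auto: Vice City Stories", "FIFA Soccer 11"),
  NA_Sales = c(41.49, 29.08, 1.70, 0.60),
  EU_Sales = c(29.02, 3.58, 2.02, 3.29),
  JP_Sales = c(3.77, 6.81, 0.16, 0.06),
  Other_Sales = c(8.46, 0.77, 1.21, 1.13)
)

# Region labels in the same order as the sales columns (spelling assumed)
regions = c("North America", "Europe", "Japan", "Other")
cols = c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# For each row, pick the region whose sales column is largest;
# ties go to the first column (an assumed rule)
top_region = regions[max.col(as.matrix(sales[, cols]), ties.method = "first")]

# Count how many games each region leads
table(top_region)
```

The labels for these four rows agree with the Region.where.sold.most values printed for the same games.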

library(ggplot2)
Region = vgsales$Region.where.sold.most

# Bar chart: number of Top 200 games for which each region had the highest sales
p6 = ggplot(vgsales, aes(x = Region))
p6 + geom_bar(fill = "#FFCC66") +
  ggtitle("Times Each Region had the Highest Sales for a Top 200 Game") +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(color = "black", size = 11),
        axis.text.y = element_text(color = "black", size = 11),
        axis.title.x = element_text(colour = "grey20", size = 15, face = "bold"),
        axis.title.y = element_text(colour = "grey20", size = 15, face = "bold"))

The highest sales for a game in North America are 41.49 million. Median sales are 3.57 million and mean sales are 4.89 million.

Countries in North America such as Canada and the United States could be responsible for such high sales, as they are two of the major consumers of video game products (Nichols 2014).

# Boxplot of North American sales; the x-axis label is removed
p3 = ggplot(data = vgsales, aes(x = "", y = NAsales)) +
  theme(axis.title.x = element_blank()) + ylab("Sales (millions)")
p3 + geom_boxplot(fill = "#FF9999") + ggtitle("North America Sales") +
  theme(plot.title = element_text(face = "bold"),
        axis.title.y = element_text(colour = "grey20", size = 11, face = "bold"))

summary(NAsales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.070   2.638   3.565   4.892   5.628  41.490

The highest sales for a game in Europe are 29.02 million. The median of European sales is 2.3 million, which is lower than the mean of 3.05 million.

# Boxplot of European sales; the x-axis label is removed
p4 = ggplot(data = vgsales, aes(x = "", y = EUsales)) +
  theme(axis.title.x = element_blank()) + ylab("Sales (millions)")
p4 + geom_boxplot(fill = "#FF9999") + ggtitle("Europe Sales") +
  theme(plot.title = element_text(face = "bold"),
        axis.title.y = element_text(colour = "grey20", size = 11, face = "bold"))

summary(EUsales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.705   2.300   3.048   3.522  29.020

The highest sales for a game in Japan are 10.22 million. The median of Japanese sales is 0.83 million, which is lower than the mean of 1.41 million.

# Boxplot of Japanese sales; the x-axis label is removed
p5 = ggplot(data = vgsales, aes(x = "", y = JPsales)) +
  theme(axis.title.x = element_blank()) + ylab("Sales (millions)")
p5 + geom_boxplot(fill = "#FF9999") + ggtitle("Japan Sales") +
  theme(plot.title = element_text(face = "bold"),
        axis.title.y = element_text(colour = "grey20", size = 11, face = "bold"))

summary(JPsales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1075  0.8300  1.4116  2.1475 10.2200
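The gap between median and mean in the three regions can be put side by side. The sketch below simply re-enters the figures from the summary() output above; the `ratio` column is an added illustration, not part of the original analysis.

```r
# Median and mean sales (millions) copied from the summary() output above
stats = data.frame(
  region = c("North America", "Europe", "Japan"),
  median = c(3.565, 2.300, 0.8300),
  mean   = c(4.892, 3.048, 1.4116)
)

# A mean well above the median suggests a right-skewed distribution,
# i.e. a few very large sellers pulling the average up
stats$ratio = round(stats$mean / stats$median, 2)
stats
```

In every region the mean exceeds the median, which is consistent with a handful of exceptionally high-selling games sitting far above the rest.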

2.2.2 Summary

North America has the highest sales, with mean sales per game of 4.89 million, and 148 of the 200 games sold the most there. However, these results cover only the Top 200 games and don’t necessarily reflect the whole gaming scene.
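Each region's weight in Top 200 sales can be estimated from the means already reported. This is a back-of-the-envelope sketch that multiplies each mean from the summary() output by the 200 games in the dataset; it is not a calculation taken from the original analysis.

```r
# Mean sales per game (millions), copied from the summary() output
mean_sales = c("North America" = 4.892, "Europe" = 3.048,
               "Japan" = 1.4116, "Other" = 0.9398)

# With 200 games, total regional sales are the mean times 200
total_sales = mean_sales * 200

# Each region's percentage of combined Top 200 sales
share = round(100 * total_sales / sum(total_sales), 1)
share
```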

2.3 Research Question 2

2.3.2 Observations

  1. Across all regions, the most popular genres are Action, Platform and Shooter games.

This holds for both Europe and North America. Japan is the exception, as Role-Playing games dominate its sales.

  2. Europe

In Europe, the Action genre tops sales at 99.62 million, representing 19.5% of total sales. Close behind are Shooter games at 98.12 million (19.2%) and Sports at 89.99 million (17.4%). The European market has the most even spread of sales across the various genres, excluding Fighting games.

  3. Japan

In Japan, Role-Playing games lead with 90.31 million sales, an enormous 42% of total sales in that market. Platform games come second with 61.99 million (26.5%), followed by a further drop to Racing games with 28.03 million (11.99%). Role-Playing and Platform games are the leading forces in the Japanese market, with the remaining genres representing 31.5% of total sales.

  4. North America

In North America, Shooter games have the largest sales at 200.48 million, representing 24% of total sales in the region. Platform games come second with 183.82 million (22.3%), followed by Action games at 134.38 million (16.3%).
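The code behind these genre figures is not included in this section. A minimal sketch of how sales per genre and their percentage shares could be computed, using only the six rows from the head() output earlier, so the numbers will not match the full Top 200 figures quoted above:

```r
# First six rows of the dataset, copied from the head() output earlier
top6 = data.frame(
  Genre = c("Sports", "Platform", "Racing", "Sports", "Role-Playing", "Puzzle"),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22, 4.22)
)

# Total sales per genre, then each genre's percentage of the regional total
by_genre = tapply(top6$JP_Sales, top6$Genre, sum)
share = round(100 * prop.table(by_genre), 1)

# Largest share first
share[order(share, decreasing = TRUE)]
```

Swapping JP_Sales for NA_Sales or EU_Sales would give the same breakdown for the other regions.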

2.4 Research Question 3

2.4.2 Observations

  1. The number of popular games grew rapidly over the last two decades, peaking in 2009-10

This rapid increase is probably because of the democratisation of technology, increasing individuals’ access to gaming consoles and hence the volume of games purchased.

  2. In recent years, fewer releases have entered the Top 200 video games list

The marked drop in Top 200 games released in recent years may reflect a broader shift away from traditional video games, as newer technologies such as mobile games and VR have begun to eat into that market. This trend is likely to continue as such technologies become more prominent and accessible.

It is also possible that, because these games were released more recently, they have had a smaller window in which to accumulate purchases, which decreases the likelihood of a game being one of the Top 200 most purchased games of all time.

  3. The prevalence of the Sports, Shooter and Action genres has grown noticeably over the past three decades

There is a notable trend towards more action-oriented genres such as Shooter, Racing, Action and Sports in the last 15 years.

Popular games released prior to 2000 are predominantly Platform or Puzzle-based, yet in recent years these genres appear to be some of the least popular.

This may be due to the advent of mass media and growing interest in racing, sports and action.
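The counts behind these observations could be produced by splitting releases into eras and tabulating genres. The sketch below uses only the twelve rows shown in the head() and tail() output earlier, and the cut-off at 2000 is an assumption based on the wording above.

```r
# Year and Genre for the twelve rows shown in the head() and tail() output
games = data.frame(
  Year = factor(c(2006, 1985, 2008, 2009, 1996, 1989,
                  1996, 2006, 2009, 2014, 2006, 2010)),
  Genre = c("Sports", "Platform", "Racing", "Sports", "Role-Playing", "Puzzle",
            "Simulation", "Misc", "Action", "Action", "Action", "Sports")
)

# Year is stored as a factor, so convert via character to recover the years
# (as.numeric on a factor alone would return the level codes instead)
yr = as.numeric(as.character(games$Year))

# Split releases into two eras and count games per genre in each
era = ifelse(yr < 2000, "before 2000", "2000 onwards")
table(era, games$Genre)
```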

2.4.3 Summary

In spite of rapid growth in the number of popular video games over the past few decades, sales of recent releases appear to be lower than those of older games.

Further, the popularity of the Sports, Shooter and Action genres has increased noticeably.


3 References

O’Brien, M. (2019). The Video Game Business by Randy Nichols (review). [online] Muse.jhu.edu. Available at: https://muse.jhu.edu/article/609061/pdf [Accessed 20 Mar. 2019].

Style: APA