Video games are played across the world, now more than ever before.
The aim of this report is to analyse the consumption of video games around the world and consider the trends in the popularity of genres across time.
The main discoveries are that:
Video game consumption of the Top 200 games is highest in North America, followed by Europe and Japan
The Action, Shooter and Platform genres are most popular across all regions, though certain regions further favour particular genres more so than others
2009 and 2010 saw the release of the most Top 200 games between 1982 and 2015
There has been a decline in the number of recently released games entering the Top 200
The Action, Shooter and Sports genres have increased in popularity over the years, while Puzzle-Based and Platform games have decreased in popularity
Considering the research questions outlined regarding consumption and genre prevalence, it is clear that globally, but particularly in North America, Europe and Japan, video games are heavily consumed, with action-oriented genres proving particularly popular today.
We chose to analyse data relating to the Top 200 video games, sourcing our data from kaggle.com
vgsales = read.csv("vgsales.csv")
A preliminary examination of our data follows:
Considering the top and bottom 6 rows of data, we can see the 12 distinct variables involved in our analysis.
Initially, there were roughly 16,600 entries, which we cut to the 200 top-ranked games to make the data easier to process.
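The cut from the full dataset to the Top 200 can be sketched as follows. This is only a minimal illustration, assuming the games were kept according to the Rank column; the small full_data data frame and the top_n value are hypothetical stand-ins for the full Kaggle file and the cut-off of 200.

```r
# Hypothetical stand-in for the full dataset (the real file has far more rows)
full_data <- data.frame(
  Rank = c(3, 1, 5, 2, 4),
  Name = c("Mario Kart Wii", "Wii Sports", "Pokemon Red/Pokemon Blue",
           "Super Mario Bros.", "Wii Sports Resort"),
  Global_Sales = c(35.82, 82.74, 31.37, 40.24, 33.00)
)

# Number of games to keep; 200 in the report, 3 here for illustration
top_n <- 3

# Sort by rank and keep the first top_n rows
top_games <- head(full_data[order(full_data$Rank), ], top_n)
top_games
```

With the real file, the same two lines (order() then head()) would be applied to the data frame returned by read.csv().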
Summary of our Data
summary(vgsales)
## Rank Name Platform
## Min. : 1.00 Grand Theft Auto V : 4 PS3 :28
## 1st Qu.: 50.75 Assassin's Creed II : 2 X360 :28
## Median :100.50 Assassin's Creed III : 2 Wii :21
## Mean :100.50 Battlefield 3 : 2 DS :19
## 3rd Qu.:150.25 Call of Duty 4: Modern Warfare: 2 PS2 :18
## Max. :200.00 Call of Duty: Advanced Warfare: 2 PS :14
## (Other) :186 (Other):72
## Year Genre Publisher
## 2009 : 16 Action :36 Nintendo :80
## 2010 : 16 Shooter :35 Activision :21
## 2011 : 13 Platform :31 Electronic Arts :17
## 2012 : 13 Role-Playing:30 Sony Computer Entertainment:17
## 2007 : 12 Racing :16 Microsoft Game Studios :15
## 2008 : 12 Misc :15 Take-Two Interactive :13
## (Other):118 (Other) :37 (Other) :37
## NA_Sales EU_Sales JP_Sales Other_Sales
## Min. : 0.070 Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 2.638 1st Qu.: 1.705 1st Qu.: 0.1075 1st Qu.: 0.3300
## Median : 3.565 Median : 2.300 Median : 0.8300 Median : 0.6850
## Mean : 4.892 Mean : 3.048 Mean : 1.4116 Mean : 0.9398
## 3rd Qu.: 5.628 3rd Qu.: 3.522 3rd Qu.: 2.1475 3rd Qu.: 1.0750
## Max. :41.490 Max. :29.020 Max. :10.2200 Max. :10.5700
##
## Global_Sales Region.where.sold.most
## Min. : 5.080 Europe : 34
## 1st Qu.: 5.838 japan : 17
## Median : 7.325 North America:148
## Mean :10.291 other : 1
## 3rd Qu.:11.217
## Max. :82.740
##
Names of our Variables
# Names of our variables
names(vgsales)
## [1] "Rank" "Name"
## [3] "Platform" "Year"
## [5] "Genre" "Publisher"
## [7] "NA_Sales" "EU_Sales"
## [9] "JP_Sales" "Other_Sales"
## [11] "Global_Sales" "Region.where.sold.most"
Dimensions of our Data
## Size of data
dim(vgsales)
## [1] 200 12
Top 6 Rows of our Data
# Quick look at top 6 rows of data (head() returns 6 rows by default)
head(vgsales)
## Rank Name Platform Year Genre Publisher
## 1 1 Wii Sports Wii 2006 Sports Nintendo
## 2 2 Super Mario Bros. NES 1985 Platform Nintendo
## 3 3 Mario Kart Wii Wii 2008 Racing Nintendo
## 4 4 Wii Sports Resort Wii 2009 Sports Nintendo
## 5 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo
## 6 6 Tetris GB 1989 Puzzle Nintendo
## NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
## 1 41.49 29.02 3.77 8.46 82.74
## 2 29.08 3.58 6.81 0.77 40.24
## 3 15.85 12.88 3.79 3.31 35.82
## 4 15.75 11.01 3.28 2.96 33.00
## 5 11.27 8.89 10.22 1.00 31.37
## 6 23.20 2.26 4.22 0.58 30.26
## Region.where.sold.most
## 1 North America
## 2 North America
## 3 North America
## 4 North America
## 5 North America
## 6 North America
Bottom 6 Rows of our Data
# Quick look at bottom 6 rows of data (tail() returns 6 rows by default)
tail(vgsales)
## Rank Name Platform Year Genre
## 195 195 Microsoft Flight Simulator PC 1996 Simulation
## 196 196 Guitar Hero II PS2 2006 Misc
## 197 197 Resident Evil 5 PS3 2009 Action
## 198 198 Grand Theft Auto V XOne 2014 Action
## 199 199 Grand Theft Auto: Vice City Stories PSP 2006 Action
## 200 200 FIFA Soccer 11 PS3 2010 Sports
## Publisher NA_Sales EU_Sales JP_Sales Other_Sales
## 195 Microsoft Game Studios 3.22 1.69 0.00 0.20
## 196 RedOctane 3.81 0.63 0.00 0.68
## 197 Capcom 1.96 1.43 1.08 0.65
## 198 Take-Two Interactive 2.66 2.01 0.00 0.41
## 199 Take-Two Interactive 1.70 2.02 0.16 1.21
## 200 Electronic Arts 0.60 3.29 0.06 1.13
## Global_Sales Region.where.sold.most
## 195 5.12 North America
## 196 5.12 North America
## 197 5.11 North America
## 198 5.08 North America
## 199 5.08 Europe
## 200 5.08 Europe
We re-classified the Name, Platform, Year, Genre and Publisher variables, which were labelled ‘characters’, as ‘factors’ to ensure that R would treat them as qualitative variables.
The quantitative variables were correctly labelled ‘numerical’ and were left unchanged.
Re-classification of Variables
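# Convert the qualitative columns to factors inside the data frame itself
vgsales$Name = factor(vgsales$Name)
vgsales$Platform = factor(vgsales$Platform)
vgsales$Year = factor(vgsales$Year)
vgsales$Genre = factor(vgsales$Genre)
vgsales$Publisher = factor(vgsales$Publisher)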
R’s Classification of Variables
## R's classification of variables
str(vgsales)
## 'data.frame': 200 obs. of 12 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : Factor w/ 175 levels "Animal Crossing: New Leaf",..: 172 139 91 173 123 156 106 171 109 35 ...
## $ Platform : Factor w/ 21 levels "2600","3DS","DS",..: 17 9 17 17 4 4 3 17 17 9 ...
## $ Year : Factor w/ 32 levels "1982","1984",..: 22 3 24 25 12 6 22 22 25 2 ...
## $ Genre : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
## $ Publisher : Factor w/ 23 levels "505 Games","Activision",..: 12 12 12 12 12 12 12 12 12 12 ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales : num 82.7 40.2 35.8 33 31.4 ...
## $ Region.where.sold.most: Factor w/ 4 levels "Europe","japan",..: 3 3 3 3 3 3 3 3 3 3 ...
We then isolated our variables.
## Isolating variables
rank = vgsales$Rank
name = vgsales$Name
platform = vgsales$Platform
year = vgsales$Year
genre = vgsales$Genre
publisher = vgsales$Publisher
NAsales = vgsales$NA_Sales
EUsales = vgsales$EU_Sales
JPsales = vgsales$JP_Sales
otherSales = vgsales$Other_Sales
globalSales = vgsales$Global_Sales
Region = vgsales$Region.where.sold.most
The dataset was compiled by Data Engineer Gregory Smith, who sourced his data from a network known as vgchartz.
Vgchartz collects its data through calculated estimates, polls of video game retailers and video game communities, studies of resale prices to determine popularity, and direct consultation with developers and retail stores.
When assessing the accuracy of this data, we found multiple discussions and threads that agreed on an inaccuracy of 10% to 15% in the weekly data (further confirmed on the vgchartz website).
This is primarily because retailers are unwilling to share their sales data, meaning some estimates had to be made. Although the data is not 100% accurate, it is still a strong indicator of the trends in video game sales.
Further, because the dataset was completed in late October 2016, video game sales for November and December 2016 are not included. This presents a possible issue when looking at 2016 trends, as November and December often produce the highest sales of the year.
Additionally, development of games for older platforms such as the PS3 has stopped, which will inevitably reduce sales on those platforms regardless of popularity.
This dataset would be valuable for video game developers and companies looking to develop a successful game.
By examining this dataset, developers and companies can extract useful information, such as the most popular genres and the regions with high levels of video game consumption, which could increase the likelihood of a successful video game launch.
Data from vgchartz has been used in multiple research reports aiming to develop an understanding of the current and previous success of certain games. These reports strive to accurately predict which types of games may be successful in the future.
The data came from a video game sales tracking network known as vgchartz, sourced from across the industry and community.
The data is mostly valid because it has been extracted from largely reputable sources, though some error is acknowledged.
Possible issues include under-reporting of sales due to lack of access to every sales source, and incomplete data from 2016.
Each row represents information regarding one of the Top 200 video games.
Each column represents a variable which may be used to further understand and evaluate the success of the Top 200 video games (e.g. genre, sales figures)
While video games are popular around the world, knowledge regarding the largest markets for video games is valuable for potential developers.
North America, Europe, and Japan are the biggest markets for gaming, ahead of less developed regions, owing to their well-developed network infrastructure (Nichols 2014).
As seen in the barplot below, North America dominates: 148 of the 200 games sold the most copies there. Europe is second, with 34 games, followed by Japan with 17. Only 1 game sold the most in the remaining ‘other’ regions.
library(ggplot2)
Region=vgsales$Region.where.sold.most
p6 = ggplot(vgsales, aes(x = Region))
p6 + geom_bar(fill = "#FFCC66") +
  ggtitle("Times Each Region had the Highest Sales for a Top 200 Game") +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(color = "black", size = 11),
        axis.text.y = element_text(color = "black", size = 11),
        axis.title.x = element_text(colour = "grey20", size = 15, face = "bold"),
        axis.title.y = element_text(colour = "grey20", size = 15, face = "bold"))
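The report does not state how the Region.where.sold.most column was built. One plausible way to derive it, sketched here as an assumption rather than the method actually used, is to pick the sales column with the largest value in each row. The toy data frame holds three rows taken from the head and tail output above; region_labels and sales_cols are names introduced only for this illustration.

```r
# Three rows copied from the head()/tail() output shown earlier
toy <- data.frame(
  Name = c("Wii Sports", "Pokemon Red/Pokemon Blue", "FIFA Soccer 11"),
  NA_Sales = c(41.49, 11.27, 0.60),
  EU_Sales = c(29.02, 8.89, 3.29),
  JP_Sales = c(3.77, 10.22, 0.06),
  Other_Sales = c(8.46, 1.00, 1.13)
)

# Labels in the same order as the sales columns (spelling follows the dataset)
region_labels <- c("North America", "Europe", "japan", "other")
sales_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# max.col() returns, for each row, the index of the column holding the maximum
top_region <- max.col(as.matrix(toy[, sales_cols]), ties.method = "first")
toy$Region.where.sold.most <- factor(region_labels[top_region],
                                     levels = region_labels)

# Count how many games sold the most in each region
region_counts <- table(toy$Region.where.sold.most)
region_counts
```

Applied to all 200 rows, a table like region_counts is what the bar plot visualises.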
The maximum sales for a game in North America are 41.49 million. The median is 3.57 million and the mean is 4.89 million.
Canada and the United States are likely responsible for such high sales, as they are two of the major consumers of video game products (Nichols 2014).
p3 = ggplot(data = vgsales, aes(x = "", y = NAsales))+ theme(axis.title.x = element_blank()) + # Remove x-axis label
ylab("Sales(millions)")
p3 + geom_boxplot(fill="#FF9999") + ggtitle("North America Sales")+theme(plot.title = element_text(face = "bold"))+theme(axis.title.y = element_text(colour="grey20",size=11,face="bold"))
summary(NAsales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.070 2.638 3.565 4.892 5.628 41.490
The highest sales for a game in Europe are 29.02 million. The median European sales figure is 2.3 million, which is lower than the mean of 3.05 million.
p4 = ggplot(data = vgsales, aes(x = "", y = EUsales))+ theme(axis.title.x = element_blank()) + # Remove x-axis label
ylab("Sales(millions)")
p4 + geom_boxplot(fill="#FF9999") + ggtitle("Europe Sales")+theme(plot.title = element_text(face = "bold"))+theme(axis.title.y = element_text(colour="grey20",size=11,face="bold"))
summary(EUsales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.705 2.300 3.048 3.522 29.020
The maximum sales for a game in Japan are 10.22 million. The median is 0.83 million, which is lower than the mean of 1.41 million.
p5 = ggplot(data = vgsales, aes(x = "", y = JPsales))+ theme(axis.title.x = element_blank()) + # Remove x-axis label
ylab("Sales(millions)")
p5 + geom_boxplot(fill="#FF9999") + ggtitle("Japan Sales")+theme(plot.title = element_text(face = "bold"))+theme(axis.title.y = element_text(colour="grey20",size=11,face="bold"))
summary(JPsales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1075 0.8300 1.4116 2.1475 10.2200
North America has the highest sales, with mean sales per game of 4.89 million, and 148 of the 200 games sold the most there. However, these results do not necessarily reflect the whole gaming scene.
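The three regional summaries can also be produced in a single step. The sketch below is illustrative only: it uses the six games from the head output rather than all 200, so the numbers differ from those reported above. A mean above the median, as seen in every region, points to a right-skewed distribution driven by a few very high-selling games.

```r
# Regional sales (millions) for the six games shown in the head() output
toy <- data.frame(
  NA_Sales = c(41.49, 29.08, 15.85, 15.75, 11.27, 23.20),
  EU_Sales = c(29.02, 3.58, 12.88, 11.01, 8.89, 2.26),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22, 4.22)
)

# One column per region, one row per statistic
region_stats <- sapply(toy, function(x) {
  c(median = median(x), mean = mean(x), max = max(x))
})
round(region_stats, 2)

# TRUE where the mean exceeds the median (a sign of right skew)
skew_flag <- region_stats["mean", ] > region_stats["median", ]
skew_flag
```

Replacing toy with vgsales[, c("NA_Sales", "EU_Sales", "JP_Sales")] would give the comparison for the full Top 200.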
When considering global video game sales, it is also valuable to consider the popularity of certain genres. Such information helps inform our stakeholders about which genres are likely to translate into higher sales around the world.
Calculating the total value of sales across the various genres and regions:
#subset(VideoGames,Genre == 'Sports')
#sport=subset(VideoGames,Genre == 'Sports')
library(ggplot2)
#sum(sport$NA_Sales) = '95.29'
#sum(sport$EU_Sales) = '89.99'
#sum(sport$JP_Sales) = '14.67'
#subset(VideoGames,Genre == 'Action')
#action=subset(VideoGames,Genre == 'Action')
#sum(action$NA_Sales) = '134.38'
#sum(action$EU_Sales) = '99.62'
#sum(action$JP_Sales) = '19.69'
#fight=subset(VideoGames,Genre =='Fighting')
#sum(fight$NA_Sales) = '25.35'
#sum(fight$EU_Sales) = '10.54'
#sum(fight$JP_Sales) = '14.07'
#platform=subset(VideoGames,Genre =='Platform')
#sum(platform$NA_Sales) = '183.82'
#sum(platform$EU_Sales) = '75.58'
#sum(platform$JP_Sales) = '61.99'
#race=subset(VideoGames,Genre =='Racing')
#sum(race$NA_Sales) = '79.05'
#sum(race$EU_Sales) = '58.06'
#sum(race$JP_Sales) = '28.03'
#roleplay=subset(VideoGames,Genre =='Role-Playing')
#sum(roleplay$NA_Sales) = '105.72'
#sum(roleplay$EU_Sales) = '79.41'
#sum(roleplay$JP_Sales) = '90.31'
#shoot=subset(VideoGames,Genre =='Shooter')
#sum(shoot$NA_Sales) = '200.48'
#sum(shoot$EU_Sales) = '98.12'
#sum(shoot$JP_Sales) = '5.87'
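The per-genre totals above were obtained one subset at a time. As a hedged alternative, base R's aggregate() can sum every region for every genre in one call. The sketch below uses the six games from the head output as a toy data frame, so its totals are not the Top 200 figures.

```r
# Genre and regional sales (millions) for the six games in the head() output
toy <- data.frame(
  Genre = c("Sports", "Platform", "Racing", "Sports", "Role-Playing", "Puzzle"),
  NA_Sales = c(41.49, 29.08, 15.85, 15.75, 11.27, 23.20),
  EU_Sales = c(29.02, 3.58, 12.88, 11.01, 8.89, 2.26),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22, 4.22)
)

# Sum the three sales columns within each genre
genre_totals <- aggregate(cbind(NA_Sales, EU_Sales, JP_Sales) ~ Genre,
                          data = toy, FUN = sum)
genre_totals
```

Passing data = vgsales instead of the toy data frame would produce one row of totals per genre for the whole dataset.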
Compiling this information into a table:
data <- read.table(text="
Genre,Region,Sales_in_million
Sports,North America,95.29
Sports,Europe,89.99
Sports,Japan,14.67
Action,North America,134.38
Action,Europe,99.62
Action,Japan,19.69
Fighting,North America,25.35
Fighting,Europe,10.54
Fighting,Japan,14.07
Platform,North America,183.82
Platform,Europe,75.58
Platform,Japan,61.99
Race,North America,79.05
Race,Europe,58.06
Race,Japan,28.03
Roleplay,North America,105.72
Roleplay,Europe,79.41
Roleplay,Japan,90.31
Shooter,North America,200.48
Shooter,Europe,98.12
Shooter,Japan,5.87", header=TRUE, sep=",")
library(ggplot2)
ggplot(data, aes(x=Region, y=Sales_in_million, fill=Genre)) +
geom_bar(stat="identity", position="dodge") +
theme(axis.text.x = element_text(angle = 90))
The Action, Shooter and Platform genres sell strongly in both Europe and North America. Japan is the exception, as Role-Playing games dominate its sales.
In Europe, the Action genre tops sales at 99.62 million, representing 19.5% of total sales across the seven genres tabulated. Close behind are Shooter games at 98.12 million (19.2%) and Sports at 89.99 million (17.6%). The European market has the most even spread of sales across the genres, excluding Fighting games.
In Japan, Role-Playing games lead with 90.31 million sales, representing an enormous 38.5% of total sales in its market. Platform games come in second with 61.99 million (26.4%), followed by a further drop to Racing games with 28.03 million (11.9%). Role-Playing and Platform games are the leading forces in the Japanese market, with the remaining five genres representing 35.1% of total sales.
In North America, Shooter games have the largest sales at 200.48 million, representing 24.3% of the total sales in this region. Platform games come in second with 183.82 million (22.3%), then Action games at 134.38 million (16.3%).
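The percentage shares can be recomputed directly from the table compiled earlier. The sketch below assumes each share is taken relative to the regional total of the seven tabulated genres only, not of all twelve genres in the dataset; the Share_pct column name is introduced here for illustration.

```r
# Same figures as the table compiled above (sales in millions)
sales <- read.table(text = "Genre,Region,Sales_in_million
Sports,North America,95.29
Sports,Europe,89.99
Sports,Japan,14.67
Action,North America,134.38
Action,Europe,99.62
Action,Japan,19.69
Fighting,North America,25.35
Fighting,Europe,10.54
Fighting,Japan,14.07
Platform,North America,183.82
Platform,Europe,75.58
Platform,Japan,61.99
Race,North America,79.05
Race,Europe,58.06
Race,Japan,28.03
Roleplay,North America,105.72
Roleplay,Europe,79.41
Roleplay,Japan,90.31
Shooter,North America,200.48
Shooter,Europe,98.12
Shooter,Japan,5.87", header = TRUE, sep = ",")

# ave() returns the regional total alongside every row of that region
region_total <- ave(sales$Sales_in_million, sales$Region, FUN = sum)

# Each genre's share of its region's tabulated sales, in percent
sales$Share_pct <- round(100 * sales$Sales_in_million / region_total, 1)

# Largest shares first within each region
sales[order(sales$Region, -sales$Share_pct), ]
```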
In considering video game consumption, it is also valuable for stakeholders to understand the trends in genre prevalence and playing patterns as a whole, which may assist in developing games which will be popular with their target audience at present.
Using a comparative barplot, we graphed the year of release and genre of the Top 200 games in our vgsales data set, enabling an investigation into gaming trends.
# Using ggplot to graph the genres of the most popular games in the past 30 years
library(ggplot2)
# Allowing p to equal a ggplot which will incorporate the number of games sold each year
p = ggplot(vgsales, aes(x = Year))
# Adding further details including: sub-dividing each 'bar' by the genre to demonstrate its prevalence, flipping the x and y axes, creating a sideways bar graph, moving the legend to the right, adopting a minimal theme, and labelling
# theme_minimal() is applied before theme() so that it does not override the legend setting
p + geom_bar(aes(fill = Genre), position = position_stack(reverse = TRUE)) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "right") +
  labs(y = "Number of Games", title = "Year of Release of the Top 200 Games",
       subtitle = "Further considering evolutions in genre prevalence")
The rapid increase in Top 200 releases leading up to the 2009 and 2010 peak is probably due to the democratisation of technology, which increased individuals’ access to gaming consoles and hence the volume of games purchased.
The marked drop in Top 200 games released in recent years may be due to a broader shift away from traditional video games, as newer formats such as mobile games and VR have begun to draw consumers away from them. This trend is likely to continue as such technologies become more prominent and accessible.
It may also be possible that because these games have been released more recently, there is a smaller window for gaming purchases, hence decreasing the likelihood of a game being one of the Top 200 most purchased games of all time.
There is a notable trend towards more action-oriented genres such as Shooter, Racing, Action and Sports in the last 15 years.
Popular games released prior to 2000 are predominantly Platform or Puzzle-based, yet in recent years these genres appear to be among the least popular.
This may be due to the advent of mass media and growing interest in racing, sports and action.
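The shift in genres can also be checked numerically by cross-tabulating genre against release era. The sketch below is a small illustration under stated assumptions: it uses only the twelve games from the head and tail output, which are not representative of the full Top 200, and the Era split at the year 2000 is chosen to match the comparison made in the text.

```r
# Year and genre of the twelve games shown in the head()/tail() output
toy <- data.frame(
  Year = factor(c(2006, 1985, 2008, 2009, 1996, 1989,
                  1996, 2006, 2009, 2014, 2006, 2010)),
  Genre = c("Sports", "Platform", "Racing", "Sports", "Role-Playing", "Puzzle",
            "Simulation", "Misc", "Action", "Action", "Action", "Sports")
)

# Year is stored as a factor, so convert via character to recover the numbers
year_num <- as.numeric(as.character(toy$Year))

# Label each game by release era
toy$Era <- ifelse(year_num < 2000, "Before 2000", "2000 onwards")

# Rows are eras, columns are genres, cells are game counts
era_by_genre <- table(toy$Era, toy$Genre)
era_by_genre
```

Converting through as.character() matters here: calling as.numeric() directly on a factor returns the internal level codes rather than the years.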
In spite of the rapid growth of video games over the past few decades, sales of recent releases appear to be lower than those of older games.
Further, the popularity of Sports, Shooter and Action genres has increased significantly.
O’Brien, M. (2019). The Video Game Business by Randy Nichols (review). [online] Muse.jhu.edu. Available at: https://muse.jhu.edu/article/609061/pdf [Accessed 20 Mar. 2019].