From the dawn of video games in the 1950s, through the golden age of arcade games in the 1970s and 1980s, and all the way up to the present day, playing video games has been a popular hobby for kids and adults alike. As the technology grew, so did the gaming industry and the complexity of its games. With major companies such as Nintendo, Sony, and Microsoft pioneering the industry and developing games and gaming devices, consumers around the world became hooked on video games. Only recently, however, has the professional gaming scene truly taken off, proving that playing video games can be a full-time job. Using data pulled from vgchartz.com through https://www.kaggle.com/gregorut/videogamesales?select=vgsales.csv, we are able to explore video game genres, platforms, publishers, and individual games. This analysis looks at the top genres, platforms, publishers, and games, and explores trends in the video game industry as well. Through this we can draw insights into how the industry has developed over the years and where it is headed.
This dataset contains a list of video games with sales greater than 100,000 copies.
Fields include:
• Rank - Ranking of overall sales
• Name - The game’s name
• Platform - Platform of the game’s release (e.g., PC, PS4)
• Year - Year of the game’s release
• Genre - Genre of the game
• Publisher - Publisher of the game
• NA_Sales - Sales in North America (in millions)
• EU_Sales - Sales in Europe (in millions)
• JP_Sales - Sales in Japan (in millions)
• Other_Sales - Sales in the rest of the world (in millions)
• Global_Sales - Total worldwide sales (in millions)
There are 16,598 records.
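Before charting, it can help to confirm that the loaded table actually has the fields listed above and to see how many rows lack a release year. The following is a minimal sketch that uses a tiny hand-made sample in place of vgsales.csv; the helper name `check_schema` and the sample rows are assumptions made purely for illustration.

```python
import pandas as pd

# Column names as listed in the field description above.
EXPECTED_COLUMNS = [
    'Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher',
    'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales',
]

def check_schema(frame):
    """Return the expected columns missing from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]

# Illustrative rows only; these values are made up, not taken from the dataset.
sample = pd.DataFrame({
    'Rank': [1, 2],
    'Name': ['Game A', 'Game B'],
    'Platform': ['PS4', 'PC'],
    'Year': [2014.0, None],
    'Genre': ['Action', 'Puzzle'],
    'Publisher': ['Pub X', 'Pub Y'],
    'NA_Sales': [1.0, 0.5],
    'EU_Sales': [0.8, 0.2],
    'JP_Sales': [0.1, 0.0],
    'Other_Sales': [0.1, 0.1],
    'Global_Sales': [2.0, 0.8],
})

missing_columns = check_schema(sample)
rows_without_year = int(sample['Year'].isna().sum())
print(missing_columns, rows_without_year)
```

With the real file, the same two checks could be run on the loaded DataFrame right after reading the CSV.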
With so many different types of video games available, it is interesting to look into which genres people prefer. Some people solely play sports games like FIFA or Madden, others only play shooter games like Call of Duty, and others play all types of games. Finding out which genres have done well historically could be useful for companies thinking of developing a game, as it offers a peek into what consumers tend to enjoy playing. The graph below shows the total sales by genre over the entire dataset, with bars colored to show the market in which the sales occurred. This can help us understand how each market contributes to game sales and which markets companies may want to target depending on the genre of game they are developing.
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = ('c:/ProgramData/Anaconda3/Library/plugins/platforms')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = 'U:/'
file = 'vgsales.csv'
df= pd.read_csv(path+file)
#create genre dataframe
genre_df = df.groupby(['Genre'])[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].sum().reset_index()
#sort by global sales
genre_df = genre_df.sort_values(by=['Global_Sales'], ascending=False)
#delete column
del genre_df['Global_Sales']
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(1,1,1)
genre_df.plot(kind='bar', stacked=True, ax=ax, x='Genre')
plt.ylabel('Total Sales', fontsize=18, labelpad=20)
plt.xlabel('Genre', fontsize=18, labelpad=20)
plt.title('Total Sales by Genre', fontsize=18)
plt.xticks(rotation = 0, horizontalalignment='center', fontsize=10)
plt.yticks(fontsize=14)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos:('%1.0fM')%(x*1)))
plt.show()
As can be seen above, action games are the best-selling genre, with nearly 1.8 billion copies sold in total. Following that, sports and shooter games also sell extremely well, each coming in at over 1 billion copies. Role-playing, platform, miscellaneous, and racing games are all big hits as well. It is intriguing to see how much Japan contributes to role-playing sales, while in most other genres it is one of the bottom two markets. Moreover, shooter games do not seem to be very popular in the Japanese market. It is also important to note how significant North American sales, which make up nearly half of all sales, are across these categories. Finally, the more niche genres tend to be fighting, simulation, puzzle, adventure, and strategy games, as their sales do not compare to the rest of the categories.
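The regional observations above can be checked numerically rather than read off the stacked bars. The sketch below computes each market's percentage of a genre's sales; the helper `regional_share` and the sample rows are hypothetical and stand in for the real `df`, so the printed numbers are illustrative only.

```python
import pandas as pd

REGIONS = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

def regional_share(frame, by='Genre'):
    """Each region's percentage of the summed regional sales per group (hypothetical helper)."""
    totals = frame.groupby(by)[REGIONS].sum()
    # Divide every row by its own total so each row sums to 100.
    return totals.div(totals.sum(axis=1), axis=0) * 100

# Made-up sample rows, used only to show the shape of the result.
sample = pd.DataFrame({
    'Genre': ['Action', 'Action', 'Role-Playing'],
    'NA_Sales': [4.0, 2.0, 1.0],
    'EU_Sales': [2.0, 2.0, 1.0],
    'JP_Sales': [1.0, 0.0, 6.0],
    'Other_Sales': [1.0, 0.0, 2.0],
})

shares = regional_share(sample)
print(shares.round(1))
```

Passing `by='Platform'` would give the same breakdown per platform under the same assumptions.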
We can further explore this dataset by finding out who the top video game publishers are. This donut chart shows the top 10 publishers based on total sales. This can be useful to know for a multitude of reasons. For one, companies can look at these high performing companies and learn from them on how to operate a successful video game publishing business. Second, this can help us link successful genres to successful companies who may specialize in those specific genres of video games. Third, this can show us the major players in the video game market.
#create publisher dataframe
pub_df = df.groupby(['Publisher'])[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].sum().reset_index()
# sort by global sales
pub_df = pub_df.sort_values(by=['Global_Sales'], ascending=False)
pub_df = pub_df.reset_index()
del pub_df['index']
pub_df.drop(pub_df.index[10:],inplace=True)
# colors for donut
number_outside_colors = len(pub_df.Publisher.unique())
outside_color_ref_number = np.arange(number_outside_colors)*2
fig = plt.figure(figsize=(14,14))
ax = fig.add_subplot(1,1,1)
colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)
pub_total_sales = pub_df.Global_Sales.sum()
pub_df.groupby(['Publisher'])['Global_Sales'].sum().plot(
    kind='pie', radius=1, colors=outer_colors, pctdistance=.80, labeldistance=1.1,
    wedgeprops=dict(edgecolor='w'), textprops={'fontsize':16},
    # Global_Sales is in millions of copies, so label wedges in units rather than dollars
    autopct=lambda p: '{:.2f}%\n({:.2f}M)'.format(p, (p/100)*pub_total_sales),
    startangle=90)
hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)
ax.yaxis.set_visible(False)
plt.title('Total Sales by Top 10 Publishers', fontsize=18)
ax.text(0,0, 'Total Sales\n' + str(round(pub_total_sales)) + 'M', size=18, ha='center', va='center')
ax.axis('equal')
plt.tight_layout()
plt.show()
From this chart, it is quite clear that both Nintendo and Electronic Arts (EA) are incredibly successful companies. Nintendo comes in at the top with 28.55% of the total sales of the top 10 publishers, which tells us that the Japanese company is popular worldwide. This is not surprising, as it is famous for Pokémon and the Wii, which we will explore in more detail in another chart. EA, for its part, is well known for its FIFA and Madden franchises, which helps explain why it is a top publisher. It is interesting to note that EA is an American company and Nintendo is a Japanese company. Thinking back to the top-selling genres and the markets that made up total sales, it seems likely that these companies are especially strong in their home markets. Activision and Sony are also clearly excellent publishers. Finally, Ubisoft is the fifth-highest-selling publisher, which we can assume is driven largely by its two most popular series, Assassin’s Creed and Tom Clancy’s.
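The percentages quoted from the donut chart are shares of the top 10 total, not of the whole market. A minimal sketch of that calculation follows; `top_n_share` and the sample publishers are hypothetical names introduced for illustration, and `nlargest` is used here as a shorter alternative to sorting and dropping rows.

```python
import pandas as pd

def top_n_share(frame, by, n=10, value='Global_Sales'):
    """Top `n` groups by summed `value`, plus each group's share of the top-n total."""
    totals = frame.groupby(by)[value].sum().nlargest(n)
    result = totals.to_frame(name=value)
    result['Share_%'] = result[value] / result[value].sum() * 100
    return result

# Made-up sample rows; publisher names and values are placeholders.
sample = pd.DataFrame({
    'Publisher': ['A', 'A', 'B', 'C'],
    'Global_Sales': [4.0, 2.0, 3.0, 1.0],
})

top = top_n_share(sample, 'Publisher', n=2)
print(top.round(2))
```

Because the share is computed only over the rows that survive `nlargest`, a publisher's percentage changes with `n`, which is worth keeping in mind when comparing charts.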
Now that we know the top publishers and genres, we can further explore this dataset by finding out the top games. The donut chart below shows the top 10 games by total sales. From this we can connect genres and publishers to games, to better understand the overall makeup of the gaming industry. In addition, we can explore why these specific games are popular to better understand what makes them such big hits.
game_df = df.groupby(['Name'])[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].sum().reset_index()
game_df = game_df.sort_values(by=['Global_Sales'], ascending=False)
game_df = game_df.reset_index()
del game_df['index']
game_df.drop(game_df.index[10:], inplace=True)
number_outside_colors = len(game_df.Name.unique())
outside_color_ref_number = np.arange(number_outside_colors)*2
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(1,1,1)
colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)
game_total_sales = game_df.Global_Sales.sum()
game_df.groupby(['Name'])['Global_Sales'].sum().plot(
    kind='pie', radius=1, colors=outer_colors, pctdistance=.80, labeldistance=1.1,
    wedgeprops=dict(edgecolor='w'), textprops={'fontsize':16},
    # Global_Sales is in millions of copies, so label wedges in units rather than dollars
    autopct=lambda p: '{:.2f}%\n({:.2f}M)'.format(p, (p/100)*game_total_sales),
    startangle=90)
hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)
ax.yaxis.set_visible(False)
plt.title('Total Sales by Top 10 Games', fontsize=18)
ax.text(0,0, 'Total Sales\n' + str(round(game_total_sales)) + 'M', size=18, ha='center', va='center')
ax.axis('equal')
plt.tight_layout()
#plt.legend(game_df.Name, loc="right")
plt.show()
As can be seen above, two games stand out from the rest of the top 10: Wii Sports and Grand Theft Auto V. It is interesting to note that Nintendo published 7 of the top 10 selling games: Wii Sports, Wii Sports Resort, Mario Kart Wii, New Super Mario Bros., Pokémon Red/Pokémon Blue, Super Mario Bros., and Tetris. This helps explain why Nintendo held the top spot in the top 10 publisher chart. Two of the remaining three games are part of Activision’s Call of Duty series, which also helps explain Activision’s prominence in the industry. Grand Theft Auto V, the second-highest-selling game, shows that a series once criticized for being too violent can still produce one of the top-selling games in the world by offering players an incredibly detailed story mode and continuous updates to its online platform. We must also note that Wii Sports came free with the purchase of the Wii console, so its sales largely reflect sales of the console itself. Finally, it is fascinating that violent shooter games like Call of Duty and family-friendly games like Mario Kart are both so popular, which goes to show that there is a market for all types of games.
All of these games published by all of these companies still have to be played on something, so to better understand the dataset we can explore the top platforms. There is constant debate over the best way to play a game. Whether it is Microsoft’s Xbox, Sony’s PlayStation, or a PC, the debate will never end. Though the choice of platform is truly a personal preference, we can at least find out which platforms are the most popular based on sales. The bar chart below shows the top 20 platforms by total sales, with bars colored to show which market contributed each portion of sales.
plat_df = df.groupby(['Platform'])[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].sum().reset_index()
plat_df = plat_df.sort_values(by=['Global_Sales'], ascending=False)
plat_df = plat_df.reset_index()
del plat_df['index']
plat_df.drop(plat_df.index[20:31], inplace=True)
del plat_df['Global_Sales']
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(11,8))
ax = fig.add_subplot(1,1,1)
plat_df.plot(kind='bar', stacked=True, ax=ax, x='Platform')
plt.ylabel('Total Sales', fontsize=18, labelpad=20)
plt.xlabel('Platform', fontsize=18, labelpad=20)
plt.title('Total Sales by Top 20 Platforms', fontsize=18)
plt.xticks(rotation = 0, horizontalalignment='center', fontsize=11)
plt.yticks(fontsize=14)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos:('%1.0fM')%(x*1)))
plt.show()
From the chart above, we can see that the platform with the highest game sales was the PlayStation 2, with over 1.2 billion copies sold. Next is the Xbox 360 with nearly 1 billion, followed closely by the PlayStation 3. After that come the Wii and the DS, and finally the original PlayStation. After these top 6 platforms there is a significant drop-off in sales. One thing to note is the explosion of console gaming from the 2000s up to the present day, which helps explain why these specific consoles are at the top. It is also interesting to see once again that North America makes up a significant portion of sales for games on these platforms. Another intriguing point is how strongly the Japanese market favors companies from its home country. This can be seen in its larger share of sales on Sony’s PlayStations and on all of Nintendo’s platforms, most notably the DS. Finally, if we were to look at these charts again in 10 years, I believe we would see major changes, as games have shifted to the newer versions of these platforms.
To better summarize the trends in video game sales we can look at the top selling game in each year. This chart below shows the top selling game in each year from 1980-2015 with bars colored by the genre of the game. This can help us understand what series have been truly stealing the show. Also, this can show us how the most popular genre may have shifted over the years as platforms and games have gotten more advanced.
game_year_df = df.groupby(['Name','Year'])['Global_Sales'].sum().reset_index()
game_year_df = game_year_df.sort_values(by=['Global_Sales'], ascending=False)
game_year_df = game_year_df.reset_index()
del game_year_df['index']
copy_df = df.copy()
game_year_df = pd.merge(game_year_df, copy_df, how='left', on='Name')
game_year_df.drop_duplicates(subset =['Name','Year_x'], inplace = True)
temp_df = game_year_df.copy()
game_year_df = game_year_df.groupby(['Year_x'])['Global_Sales_x'].max().reset_index()
game_year_df = pd.merge(game_year_df, temp_df, how='left', left_on=['Year_x','Global_Sales_x'], right_on = ['Year_x','Global_Sales_x'])
game_year_df = game_year_df.drop(['Rank','Platform','Year_y','Publisher','NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales_y'], axis=1)
game_year_df = game_year_df.drop([38,37,36], axis = 0)
import matplotlib.patches as mpatches
# map each genre to its bar color; genres not listed fall back to black
GENRE_COLORS = {
    'Action': 'lightcoral', 'Shooter': 'green', 'Racing': 'blue',
    'Sports': 'red', 'Simulation': 'yellow', 'Role-Playing': 'brown',
    'Platform': 'orange', 'Adventure': 'silver', 'Puzzle': 'grey',
}
def get_colors(df):
    return [GENRE_COLORS.get(genre, 'black') for genre in df['Genre']]
game_year_df['Year_x'] = game_year_df['Year_x'].astype(int)
fig = plt.figure(figsize=(15,12))
ax1 = fig.add_subplot(1,1,1)
Action = mpatches.Patch(color='lightcoral', label='Action')
Shooter = mpatches.Patch(color='green', label='Shooter')
Racing = mpatches.Patch(color='blue', label='Racing')
Sports = mpatches.Patch(color='red', label='Sports')
Simulation = mpatches.Patch(color='yellow', label='Simulation')
RolePlaying = mpatches.Patch(color='brown', label='Role-Playing')
Platform = mpatches.Patch(color='orange', label='Platform')
Adventure = mpatches.Patch(color='silver', label='Adventure')
Puzzle = mpatches.Patch(color='grey', label='Puzzle')
mycolors = get_colors(game_year_df)
game_year_df.plot(kind='barh',ax=ax1, x='Year_x', y='Global_Sales_x',color=mycolors)
for row_counter, value_in in enumerate(game_year_df.Name):
    ax1.text(game_year_df.Global_Sales_x[row_counter], row_counter, value_in, color='black', size=15)
plt.xlim(0, game_year_df.Global_Sales_x.max()*1.12)
plt.axvline(game_year_df.Global_Sales_x.mean(), color='black', linestyle='dashed')
ax1.text(game_year_df.Global_Sales_x.mean()+2, 0, 'Mean = ' + str(round(game_year_df.Global_Sales_x.mean(), 2)), rotation=0, fontsize=14)
ax1.legend(title='Genre',handles=[Action, Shooter, Racing, Sports, Simulation, RolePlaying, Platform, Adventure, Puzzle], fontsize=18)
ax1.set_title('Top Selling Game by Year', fontsize=18)
ax1.set_xlabel('Global Sales (millions)', fontsize=18)
ax1.set_ylabel('Year', fontsize=18)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.show()
As can be seen above, the top-selling game in every year after 2004 sold more than the average top seller from 1980-2015. This suggests that video games continue to become more popular. It is also clear that bundling a free game with a console, as Nintendo did with Wii Sports, can help skyrocket the sales and overall exposure of a game. Platform, adventure, and puzzle games were most popular in the early stages of the video game industry, but sports and shooter games now seem to be taking over. This chart shows that the Call of Duty series is a smash hit, as it was the top-selling game every year from 2010-2015 except 2013. It was not the top seller in 2013 because Grand Theft Auto V, the second-highest-selling game ever, came out that year. The chart also shows that the Super Mario Bros. franchise was incredibly popular in 1985 and remained so in later years. It teaches us, too, that when Pokémon comes out, Pokémon dominates, as can be seen from 1996-2000. Finally, it is awesome to learn that Nintendogs was the top-selling game in 2005 and the only simulation game ever to top a year, as I remember loving the game in early elementary school and remember everyone playing it on the DS.
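The idea behind this chart, picking the single best-selling title in each year, can also be expressed more directly. The following is a rough sketch under two assumptions: a title's sales are summed across platforms first, and each title has one genre per year. The helper `top_game_per_year` and the sample rows are hypothetical.

```python
import pandas as pd

def top_game_per_year(frame):
    """Top-selling title in each year, summing sales across platforms first."""
    per_title = (
        frame.groupby(['Year', 'Name', 'Genre'], as_index=False)['Global_Sales'].sum()
    )
    # idxmax returns the row label of the largest value in each year;
    # if two titles tie, the first one encountered is kept.
    best_rows = per_title.groupby('Year')['Global_Sales'].idxmax()
    return per_title.loc[best_rows.to_numpy()].sort_values('Year').reset_index(drop=True)

# Made-up sample rows; names and values are placeholders.
sample = pd.DataFrame({
    'Name': ['Shooter X', 'Shooter X', 'Racer Y', 'Action Z', 'Shooter W'],
    'Platform': ['PS3', 'X360', 'Wii', 'PS3', 'X360'],
    'Year': [2012, 2012, 2012, 2013, 2013],
    'Genre': ['Shooter', 'Shooter', 'Racing', 'Action', 'Shooter'],
    'Global_Sales': [5.0, 6.0, 8.0, 10.0, 9.0],
})

result = top_game_per_year(sample)
print(result)
```

In the sample, 'Shooter X' wins 2012 only because its two platform releases are added together, which mirrors how multi-platform series can come out on top in this kind of chart.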
Now that we have explored the past and the present gaming industry it is important to look at the trends that are present in the industry. By looking at the genre rank by year, we can better understand what genres are trending in the world today, as well as which ones are falling behind. This can help us predict future trends for game genres and overall understand what genres attract the most consumers. The chart below is a bump chart of genres ranked by year from 2009-2015 with lines colored by genre.
genre_line_df = df.groupby(['Year','Genre'])['Global_Sales'].sum().reset_index()
genre_line_df = genre_line_df.pivot(index='Year', columns='Genre', values='Global_Sales')
genre_line_df = genre_line_df.drop(genre_line_df.index[0:29])
genre_line_df = genre_line_df.drop(genre_line_df.index[7:])
# rank genres within each year (1 = highest global sales)
genre_line_ranked = genre_line_df.rank(axis=1, ascending=False, method='min')
genre_line_ranked2 = genre_line_ranked
fig = plt.figure(figsize = (29,18))
ax = fig.add_subplot(111)
genre_line_ranked2.plot.line(ax=ax, marker='o', markeredgewidth=2, linewidth=10,
markersize=12, markerfacecolor='white', colormap="tab20c")
ax.invert_yaxis()
#num_rows = genre_line_ranked2.shape[0]
num_cols = genre_line_ranked2.shape[1]
plt.ylabel('Yearly Ranking', fontsize=30)
plt.xlabel('Year', fontsize=30)
plt.title('Ranking of Genre Total Global Sales by Year', fontsize=31, pad=10)
plt.xticks(fontsize=25)
plt.yticks(np.arange(0,14,1), fontsize=25)
ax.legend(bbox_to_anchor=(1.01,1.01), fontsize=19)
plt.show()
As can be seen above, action has a secure place at the top of the best-selling genres. Thinking back to the chart of top-selling genres overall, this reinforces the point that action games are the most popular games. We can also see that shooter games have held a top spot and now sit second, above sports. This echoes the previous chart, where Call of Duty emerged as one of the most popular game franchises in the world, and it shows that sports games continue to be a popular option for players. Puzzle and strategy games, by contrast, rank near the bottom, which suggests they are no longer what most buyers want. One possible reason is that those games tend to be slower paced, so the market appears to be heading toward fast-paced games. Role-playing games are still quite popular, perhaps because people enjoy escaping their reality and stepping into another. Based on these ranks, we can predict that action, shooter, and sports games will continue to be the top genres for years to come.
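The ranking behind a bump chart like this one comes down to three steps: sum sales per year and genre, spread the genres into columns, and rank across each row. The sketch below illustrates those steps on made-up numbers; `genre_ranks` is a hypothetical helper, and selecting years by label with `.loc` is an assumed alternative to dropping rows by position.

```python
import pandas as pd

def genre_ranks(frame, first_year, last_year):
    """Rank genres by global sales within each year (1 = best seller)."""
    yearly = frame.groupby(['Year', 'Genre'])['Global_Sales'].sum().unstack('Genre')
    window = yearly.loc[first_year:last_year]  # label slice, both ends inclusive
    return window.rank(axis=1, ascending=False, method='min')

# Made-up sample rows; values are placeholders.
sample = pd.DataFrame({
    'Year': [2008, 2008, 2008, 2009, 2009, 2009, 2010, 2010, 2010],
    'Genre': ['Action', 'Shooter', 'Puzzle'] * 3,
    'Global_Sales': [5.0, 3.0, 4.0, 10.0, 8.0, 1.0, 7.0, 9.0, 2.0],
})

ranks = genre_ranks(sample, 2009, 2010)
print(ranks)
```

With `method='min'`, genres that tie in a year share the better rank, so two lines can meet at the same point on the chart.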
After understanding what trends have been around this past decade, we should also look at how these top genres have been selling during that time. Using the genre sales over the years, we can better understand how profitable certain genres have been, as well as which genres to stay away from if making a game. In addition, this can show us important trends in the overall landscape of gaming as some games today are shifting to different revenue sources other than one-time purchases. The chart below explores these topics through a multiple line plot of genre sales by year from 2009-2016 with lines colored by genre.
genre_line_df2 = df.groupby(['Year','Genre'])['Global_Sales'].sum().reset_index()
# keep only the years 2009-2016
genre_lineplot_df = genre_line_df2[(genre_line_df2.Year >= 2009) & (genre_line_df2.Year <= 2016)].copy()
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize = (12,8))
ax = fig.add_subplot(1,1,1)
my_colors = {'Action':'blue','Adventure':'red','Fighting':'green','Misc':'grey','Platform':'purple',
'Puzzle':'gold','Racing':'orange','Role-Playing':'yellow','Shooter':'salmon',
'Simulation':'brown','Sports':'black','Strategy':'olive'}
# group by a plain column name so each key is the genre string used in my_colors
for key, grp in genre_lineplot_df.groupby('Genre'):
    grp.plot(ax=ax, kind='line', x='Year', y='Global_Sales', color=my_colors[key], label=key, marker='8')
plt.title('Genre Sales by Year', fontsize=18)
ax.set_xlabel('Year', fontsize=18)
ax.set_ylabel('Global Sales (millions)', fontsize=18, labelpad=14)
ax.tick_params(axis='x', labelsize=14)
ax.tick_params(axis='y', labelsize=14)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '%1.1fM' % x))
plt.show()
This chart shows some very interesting trends in the gaming industry. To start, it backs up our previous findings that action, shooter, and sports games are the most popular. It also shows that many of the other genres have quite low sales. What is concerning is that every genre’s sales are trending downward. Part of the decline in the most recent years may simply reflect incomplete data for those years. Looked at from a different perspective, though, the trend is very interesting and may show us how gaming is changing. For one, games nowadays offer in-game purchases, which can become a significant revenue source that makes up for lower unit sales. In certain games, like Call of Duty and FIFA, these in-game purchases are incredibly popular, and games like FIFA and Madden all but require them if a user wants a team good enough to compete at the top. Thus, these declining sales may not be so negative after all, as these games have found other very profitable ways to make money. The drop in sales can also be read as evidence of a shift in how games are released. Games such as Fortnite and Call of Duty: Warzone are free to download and rely on in-game purchases, mostly cosmetic, for their profits. The idea of releasing a free-to-play game is extremely interesting and would have been seen as insane just 10 years ago. The format works so well now because offering a game for free gets more people to download and play it, and with more people playing, these games accumulate more in-game purchases, offsetting the revenue lost by not charging for the game. Although this dataset does not record in-game revenue, the chart is consistent with the changing tides of the gaming industry.
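The downward trend can be quantified as a year-over-year percentage change per genre instead of being judged by eye. The sketch below uses made-up numbers; `yearly_change` is a hypothetical helper, and computing the change as a difference divided by the previous year's value is one of several reasonable choices.

```python
import pandas as pd

def yearly_change(frame, first_year, last_year):
    """Year-over-year % change in global sales per genre within a year range."""
    in_range = frame[frame['Year'].between(first_year, last_year)]
    yearly = in_range.groupby(['Genre', 'Year'])['Global_Sales'].sum()
    by_genre = yearly.groupby(level='Genre')
    # (this year - last year) / last year, computed separately for each genre
    return by_genre.diff() / by_genre.shift() * 100

# Made-up sample rows; values are placeholders.
sample = pd.DataFrame({
    'Genre': ['Action', 'Action', 'Action', 'Shooter', 'Shooter'],
    'Year': [2009, 2010, 2011, 2009, 2010],
    'Global_Sales': [100.0, 80.0, 60.0, 50.0, 55.0],
})

change = yearly_change(sample, 2009, 2016)
print(change.round(1))
```

The first year of each genre has no previous value to compare against, so its change is left as NaN.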
From these visualizations we can learn a few things about video games, platforms, publishers, genres, and trends. By looking at total sales by genre paired with the top-selling game per year, we can see that the dominant forces in the industry are action, sports, shooter, and role-playing games. We also learned that North America plays a significant role in overall sales in the gaming industry, and that the Japanese market prefers homegrown companies and games. By looking at the top-selling games, we can conclude that top publishers like Nintendo have shaped the industry and the top genres by releasing world-famous games. We can also say with some confidence that there is a market for all types of games, as violent games are just as popular as family-friendly ones. Moreover, we learned that PlayStation, and Japanese platforms in general, have dominated the market. Furthermore, by looking at the top games by year and the ranking of genres from 2009-2015, we can see that the industry shifted from platform, adventure, and puzzle games to sports, shooter, and role-playing games. Finally, by looking at genre sales from 2009-2016, we saw signs of the change from one-time-purchase games to games reliant on in-game purchases, and an early hint of the free-to-play games that were to come. Most importantly, we appear to be on the cusp of a new generation of gaming as the format begins to change, the technology continues to advance, and the industry continues to grow.