Overview

My EDA focused on Steam game data sourced from Steam Spy. Steam is a popular online gaming platform, and Steam Spy is a Steam statistics service that is based on the Web API provided by Valve and gathers its data from user profiles. I looked at game variables such as Release Date, Price, Ratings, # of Owners, Total Playtime, Median Playtime, Developers, Genre, etc.

Importing Datasets

This is the data that I’m going to be working with for this EDA. All data was sourced from Steam Spy on 11/26/19. There are two .csv datasets: 1) a Steam dataset with game title and other variables (Release Date, Price, Ratings, # of Owners, Total Playtime, Median Playtime, Developers, etc.) and 2) a genre classification of each game.

I wrangled the data in a separate R file. I’m going to directly load in the clean, tidy data here.

Exploring Number of Games Released by Year

Taking an initial glance at the data, I first want to see the total number of new games released on the Steam platform each year and the percentage change from year to year.

release_year total_released growth_rate
2004 29 NA
2005 33 0.1379310
2006 115 2.4848485
2007 107 -0.0695652
2008 193 0.8037383
2009 335 0.7357513
2010 314 -0.0626866
2011 310 -0.0127389
2012 405 0.3064516
2013 549 0.3555556
2014 1648 2.0018215
2015 2728 0.6553398
2016 4346 0.5931085
2017 6498 0.4951680
2018 8483 0.3054786
2019 8410 -0.0086054
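
The growth_rate column can be reproduced from the yearly totals alone. The snippet below is a minimal Python sketch of that arithmetic, not the original R code; the name growth_rates is made up for illustration, and the counts are copied from the table above.

```python
# Yearly release counts copied from the table above.
total_released = {
    2004: 29, 2005: 33, 2006: 115, 2007: 107, 2008: 193, 2009: 335,
    2010: 314, 2011: 310, 2012: 405, 2013: 549, 2014: 1648, 2015: 2728,
    2016: 4346, 2017: 6498, 2018: 8483, 2019: 8410,
}


def growth_rates(counts):
    """Return {year: (count - previous) / previous}; the first year has no rate."""
    rates = {}
    previous = None
    for year in sorted(counts):
        current = counts[year]
        rates[year] = None if previous is None else (current - previous) / previous
        previous = current
    return rates


rates = growth_rates(total_released)
for year, rate in rates.items():
    label = "NA" if rate is None else f"{rate:.7f}"
    print(year, total_released[year], label)
```

A rate of 2.0 means the number of releases tripled relative to the previous year, which is why 2006 and 2014 stand out.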

Takeaways

The first plot shows the total number of games released on the Steam platform each year from 2004 to 2019. Releases grow roughly exponentially over the period, although a few years (2007, 2010, 2011, and 2019) show small declines. However, to truly unpack the insight, I looked at the year-over-year growth rate of new games released on Steam over the same period.

Looking at the second plot, you can see that there is a spike in the growth rate of new games released in 2006, 2009, and 2014. These spikes coincide with major updates to the Steam platform for both users and game developers. In 2006, Steam started approaching 3rd party publishers to release their games on the Steam platform. In 2009, Steam introduced Steam Cloud, which allowed users to store their game data on Steam-owned cloud servers that could then be accessed from any computer running the Steam client. In 2014, Steam’s parent company, Valve, announced plans to hugely widen the number of games allowed onto Steam by ending Steam Greenlight and introducing Steam Direct, under which developers are approved directly.

As background, during the days of Greenlight, developers pitched games to Steam users, who voted by answering the question “Would you buy this game if it were available in Steam?” If a game was popular enough, Valve eventually approved it. On Steam Direct, game developers register with Valve and, after verification, publish games to Steam, bypassing the “popularity contest” completely.


Exploring Number of Games Released by Month

Next, I wanted to take a closer look at new Steam game releases by month, to understand whether there is a trend in when game developers typically release their games on Steam.

release_month total_released
1 2143
2 2446
3 2744
4 2777
5 2922
6 2489
7 2920
8 3182
9 3265
10 3426
11 3140
12 3049

Takeaways

Looking at the graph, we can see that the largest number of new games was released on the Steam platform in October. We can also see a general trend of more games being released in the 4th quarter of the year.
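
To put a number on the 4th-quarter observation, the monthly counts can be rolled up into quarters. This is a small Python sketch using the counts from the table above; the helper name quarter_totals is an illustrative assumption, not part of the original analysis.

```python
# Monthly release counts copied from the table above (all years combined).
monthly_released = {
    1: 2143, 2: 2446, 3: 2744, 4: 2777, 5: 2922, 6: 2489,
    7: 2920, 8: 3182, 9: 3265, 10: 3426, 11: 3140, 12: 3049,
}


def quarter_totals(monthly):
    """Sum monthly counts into calendar quarters 1 through 4."""
    totals = {1: 0, 2: 0, 3: 0, 4: 0}
    for month, count in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[quarter] += count
    return totals


quarters = quarter_totals(monthly_released)
busiest_month = max(monthly_released, key=monthly_released.get)

for quarter, count in quarters.items():
    print(f"Q{quarter}: {count}")
print("Busiest month:", busiest_month)
```

On these numbers the 4th quarter is the largest, but only slightly ahead of the 3rd, so the pattern is better described as a steady build-up through the second half of the year.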

Now why might this be the case? Is there some reason why game developers release games in the latter half of the year? My hunch is that game developers want to release their games close to, but before, shopping holidays such as Black Friday and Christmas. This lets them take advantage of the fact that consumers are more likely to spend money on games for themselves or for others at that time of year.

Another interesting observation is that there are fewer new game releases in June than in the neighboring months of May and July. A possible explanation is that many of the large game developers reveal their games at big gaming conventions such as E3, which happen in June. Smaller game developers, who make up a good portion of the games on Steam, may release their games in other months to avoid competing with these bigger developers. For example, a small indie game developer wouldn’t want to release their game in the same week that the new Call of Duty is being announced.


Exploring Game Pricing

How have game prices changed over 2004 - 2019?

The first area I wanted to explore is how the average price of Steam games has trended over the 15-year period.

release_year avg_price
2004 6.659655
2005 8.674687
2006 9.053097
2007 8.889533
2008 9.863548
2009 9.718364
2010 9.016555
2011 9.379400
2012 10.928031
2013 10.626743
2014 9.650562
2015 8.351543
2016 8.147715
2017 8.035172
2018 7.422085
2019 7.894500

Takeaways

Initially, I expected the average price to stay relatively constant over time. However, looking at the plot, we can see that the average price of games generally increased until 2012 and then decreased. Note that the average price of a Steam game in 2019 is about 78 cents lower than the average price in 2005!
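
The peak year and the 2005 versus 2019 gap can be read directly off the table. Here is a minimal Python sketch of that check, using the avg_price values above; the variable names are illustrative only.

```python
# Average price per release year, copied from the table above.
avg_price = {
    2004: 6.659655, 2005: 8.674687, 2006: 9.053097, 2007: 8.889533,
    2008: 9.863548, 2009: 9.718364, 2010: 9.016555, 2011: 9.379400,
    2012: 10.928031, 2013: 10.626743, 2014: 9.650562, 2015: 8.351543,
    2016: 8.147715, 2017: 8.035172, 2018: 7.422085, 2019: 7.894500,
}

# Year with the highest average price.
peak_year = max(avg_price, key=avg_price.get)

# How far the average has fallen since the peak, and relative to 2005.
drop_since_peak = avg_price[peak_year] - avg_price[2019]
gap_2005_vs_2019 = avg_price[2005] - avg_price[2019]

print("Peak year:", peak_year)
print(f"Drop from peak to 2019: {drop_since_peak:.2f}")
print(f"2005 minus 2019: {gap_2005_vs_2019:.2f}")
```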


Why has there been a decrease in average game price after 2012?

Next, I wanted to understand what might have caused the overall decrease in average game price after 2012. My hunch is that Free-to-Play games are becoming more and more popular and are thus driving down the average price. To test my hypothesis, I calculated the proportion of games released each year that are Free-to-Play and visualized the results on a plot.

release_year n total prop
2004 1 29 0.0344828
2005 1 33 0.0303030
2006 4 115 0.0347826
2007 5 107 0.0467290
2008 9 193 0.0466321
2009 8 335 0.0238806
2010 9 314 0.0286624
2011 11 310 0.0354839
2012 28 405 0.0691358
2013 38 549 0.0692168
2014 96 1648 0.0582524
2015 206 2728 0.0755132
2016 436 4346 0.1003221
2017 659 6498 0.1014158
2018 910 8483 0.1072734
2019 857 8410 0.1019025
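
The prop column is simply the Free-to-Play count divided by the total releases for that year. A minimal Python sketch of the calculation follows, with the n and total values copied from the table above; the original was computed in R, and the names here are illustrative.

```python
# (free_to_play_count, total_released) per year, copied from the table above.
free_to_play = {
    2004: (1, 29), 2005: (1, 33), 2006: (4, 115), 2007: (5, 107),
    2008: (9, 193), 2009: (8, 335), 2010: (9, 314), 2011: (11, 310),
    2012: (28, 405), 2013: (38, 549), 2014: (96, 1648), 2015: (206, 2728),
    2016: (436, 4346), 2017: (659, 6498), 2018: (910, 8483), 2019: (857, 8410),
}

# Proportion of each year's releases that are Free-to-Play.
prop = {year: n / total for year, (n, total) in free_to_play.items()}

for year in sorted(prop):
    n, total = free_to_play[year]
    print(year, n, total, f"{prop[year]:.7f}")
```

The share stays under 5% through 2011, jumps to about 7% in 2012, and settles at around 10% from 2016 onward.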

Takeaways

Looking at the plot, my initial hypothesis looks to be correct. At least from personal experience, many of the games that I play nowadays are classified as “Freemium” games. This is where a game is free of charge, but the game developers charge money for additional features, services, or virtual or physical goods (think loot boxes, in-game currency, etc.).


Takeaways

Looking at the faceted plot, we can see that there is an overall increase in lower priced games (priced less than $10). We can also see that as the number of newly released games on the Steam platform increases, the diversity in prices also increases. This wider spread in prices may indicate that game developers are employing different pricing strategies to better attract consumers to buy their games. Yes, free and low-priced games are becoming more and more popular; however, this doesn’t stop some game developers from pricing their games comparatively very high.


Exploring Game Genres

Is there a playtime difference between genres?

First, I wanted to explore whether there was a large difference in average playtime between game genres. The average playtime here is the average playtime over the past two weeks. For this EDA, I am looking at the two-week timeframe from November 12th, 2019 to November 26th, 2019.

genre total_playtime
action 175110
adventure 175110
early_access 22927
indie 85056
mmo 37949
rpg 83578
simulation 67136
sports 17166
strategy 54136

Takeaways

Overall, it seems like games under the Action and Adventure genres have much higher average playtime in the last two weeks compared to other game genres. Interestingly, Action and Adventure have exactly the same playtime. Is this because every game that is classified as Action is also classified as Adventure? Or is it just pure coincidence?

To answer these questions, I counted, for each game, how many action and adventure genre classifications it has.

n count
2 14199
4 117
6 3
8 1

genre game publishers
action Zombie Apocalypse GameTop.com
action Zombie Apocalypse Kapitan
adventure Zombie Apocalypse GameTop.com
adventure Zombie Apocalypse Kapitan

Weird, there seem to be games with multiple adventure and action genre tags. It turns out that there are games by different publishers with the same name!

Overall, every value in the column n is even, which indicates that each action tag is paired with an adventure tag: all action genre games are also classified as adventure games. I suppose no adventure is without action in the eyes of game developers.
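
The counting logic above can be sketched as follows. This is a Python illustration on a tiny hand-made sample, not the real dataset: the Zombie Apocalypse rows come from the table above, while the Team Fortress 2 rows and their publisher value are assumptions added for contrast. The stricter check at the end keys on (game, publisher), which avoids being misled by two different games sharing a name.

```python
from collections import Counter, defaultdict

# (genre, game, publisher) rows; a small illustrative sample only.
rows = [
    ("action", "Zombie Apocalypse", "GameTop.com"),
    ("action", "Zombie Apocalypse", "Kapitan"),
    ("adventure", "Zombie Apocalypse", "GameTop.com"),
    ("adventure", "Zombie Apocalypse", "Kapitan"),
    ("action", "Team Fortress 2", "Valve"),      # assumed row for contrast
    ("adventure", "Team Fortress 2", "Valve"),   # assumed row for contrast
]

targets = {"action", "adventure"}

# Number of action/adventure tags per game title (the "n" column).
tags_per_title = Counter(game for genre, game, _ in rows if genre in targets)

# How many titles have each n (the "count" column).
count_of_counts = Counter(tags_per_title.values())

# Stricter check: every (game, publisher) listing carries both tags.
genres_per_listing = defaultdict(set)
for genre, game, publisher in rows:
    if genre in targets:
        genres_per_listing[(game, publisher)].add(genre)
all_paired = all(genres == targets for genres in genres_per_listing.values())

print(dict(tags_per_title))
print(dict(count_of_counts))
print("Every listing has both tags:", all_paired)
```

An even n per title is consistent with the pairing but does not prove it on its own, so the per-listing check is the more direct test.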


Is there a trend over time in game genre?

Next, I wanted to see whether the proportions of different game genres changed over the 15-year period.

Takeaways

Looking at the faceted plot, we can see that over time, there has been a dramatic increase in indie genre games. There has been a slight increase in early_access, simulation, and rpg games, while there has been a decline in strategy genre games. Action, Adventure, MMO, and Sports have stayed relatively the same.


Exploring Gaming Nostalgia

Is there a certain title or genre of game that has a particular nostalgia factor? What games have stood the test of time?

To determine which games have a nostalgia factor, I looked at the top trending games in the past two weeks.

release_year number_released
2004 1
2006 1
2007 1
2009 1
2010 1
2011 1
2012 5
2013 9
2014 7
2015 8
2016 18
2017 13
2018 17
2019 14

I then filtered for games released in 2010 or earlier and arranged them by average playtime, from most to least.

genre game release_year trending avg_playtime metascore
action Team Fortress 2 2007 Yes 1393 92
adventure Team Fortress 2 2007 Yes 1393 92
indie Garry’s Mod 2006 Yes 523 NA
simulation Garry’s Mod 2006 Yes 523 NA
strategy Sid Meier’s Civilization V 2010 Yes 230 90
action Left 4 Dead 2 2009 Yes 191 89
adventure Left 4 Dead 2 2009 Yes 191 89
action Counter-Strike: Source 2004 Yes 150 88
adventure Counter-Strike: Source 2004 Yes 150 88

*Note: some games are duplicated because they are classified under two genres.
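
If one row per game is preferred, the genre duplicates can be collapsed before sorting. The following is a minimal Python sketch using the rows from the table above; the field names are illustrative and this is not the original R code.

```python
# (genre, game, release_year, avg_playtime) rows copied from the table above.
rows = [
    ("action", "Team Fortress 2", 2007, 1393),
    ("adventure", "Team Fortress 2", 2007, 1393),
    ("indie", "Garry's Mod", 2006, 523),
    ("simulation", "Garry's Mod", 2006, 523),
    ("strategy", "Sid Meier's Civilization V", 2010, 230),
    ("action", "Left 4 Dead 2", 2009, 191),
    ("adventure", "Left 4 Dead 2", 2009, 191),
    ("action", "Counter-Strike: Source", 2004, 150),
    ("adventure", "Counter-Strike: Source", 2004, 150),
]

# Collapse to one entry per game, collecting its genres in a list.
games = {}
for genre, game, year, playtime in rows:
    entry = games.setdefault(
        game,
        {"game": game, "release_year": year, "avg_playtime": playtime, "genres": []},
    )
    entry["genres"].append(genre)

# Arrange by average playtime, from most to least.
ranked = sorted(games.values(), key=lambda g: g["avg_playtime"], reverse=True)

for g in ranked:
    print(g["game"], g["release_year"], g["avg_playtime"], "/".join(g["genres"]))
```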


Takeaways

Of all the games listed above, the most surprising was Garry’s Mod. Compared to the other games on the list, it is not a typical action/adventure first-person shooter or a classic strategy game. Instead, Garry’s Mod is an indie-simulation game. As a physics sandbox game, it doesn’t even have any predefined aims or goals. Yet, despite its unusual genre and the absence of a metascore, the game is still being played 13 years after its initial release.


Exploring Metascore

What genre are the top rated games?

To have a large enough sample size to ensure a relatively accurate and widely agreed upon metascore, I filtered for games with 10M+ owners. According to the Metacritic website, anything above 80 (Green) is a pretty decent game.

genre n
action 17
adventure 17
indie 4
mmo 4
rpg 5
simulation 1
sports 1
strategy 2


What are the Top 10 highest rated games?

Again, to have a large enough sample size to ensure a relatively accurate and widely agreed upon metascore, I filtered for games with 10M+ owners.

game metascore avg_playtime
Grand Theft Auto V 96 826
Half-Life 2 96 112
The Elder Scrolls V: Skyrim 94 155
Team Fortress 2 92 1393
Dota 2 90 1866
Sid Meier’s Civilization V 90 230
Borderlands 2 89 275
Left 4 Dead 2 89 191
Counter-Strike: Source 88 150
PLAYERUNKNOWN’S BATTLEGROUNDS 86 994

Takeaways

Looking at the first plot, we can see that the bulk of 80+ metascore games are categorized under action and adventure. Interestingly, out of all the games in the Top 10 Highest Rated Games table, Sid Meier’s Civilization V is the only pure strategy genre game on the list. This may imply two conclusions: 1) developing a highly rated pure strategy game is quite hard to do and 2) a highly rated game doesn’t have to be in the action/adventure genre.


Exploring Number of Owners

What is the distribution of owners across Steam games?

I want to identify whether there are any anomalies in the distribution of Steam game owners. My initial hypothesis is that the distribution will be very right-skewed, with most games having few owners and a long tail of very popular games. This is based on the fact that developing and publishing a highly popular game is very difficult nowadays.

owners n
100,000,000 .. 200,000,000 2
50,000,000 .. 100,000,000 1
20,000,000 .. 50,000,000 4
10,000,000 .. 20,000,000 30
5,000,000 .. 10,000,000 59
2,000,000 .. 5,000,000 217
1,000,000 .. 2,000,000 350
500,000 .. 1,000,000 600
200,000 .. 500,000 1414
100,000 .. 200,000 1480
50,000 .. 100,000 1940
20,000 .. 50,000 3649
0 .. 20,000 24058

Takeaways

As noted, my hypothesis stands: the distribution of number of owners across Steam games is extremely right-skewed. About 71% of all games on the Steam platform (24,058 of 33,804) have between 0 and 20,000 owners. I honestly did not expect such a steep drop-off between games that have fewer than 20,000 owners and games that have 20,000+ owners. This may indicate that for many game developers and their games, achieving 20,000 owners on the Steam platform is a very tough barrier to cross.
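
The share of games in each owner bucket follows directly from the counts in the table. This is a minimal Python sketch of the arithmetic, with bucket labels and counts copied from the table above; the variable names are illustrative.

```python
# (owner range, number of games) copied from the table above.
owner_buckets = [
    ("0 .. 20,000", 24058),
    ("20,000 .. 50,000", 3649),
    ("50,000 .. 100,000", 1940),
    ("100,000 .. 200,000", 1480),
    ("200,000 .. 500,000", 1414),
    ("500,000 .. 1,000,000", 600),
    ("1,000,000 .. 2,000,000", 350),
    ("2,000,000 .. 5,000,000", 217),
    ("5,000,000 .. 10,000,000", 59),
    ("10,000,000 .. 20,000,000", 30),
    ("20,000,000 .. 50,000,000", 4),
    ("50,000,000 .. 100,000,000", 1),
    ("100,000,000 .. 200,000,000", 2),
]

total_games = sum(n for _, n in owner_buckets)

# Fraction of all games that fall into each bucket.
shares = {label: n / total_games for label, n in owner_buckets}

for label, n in owner_buckets:
    print(f"{label:>28}  {n:>6}  {shares[label]:.1%}")
```

The lowest bucket alone holds more than six times as many games as the next one, which is the steep drop-off described above.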


How does metascore relate to number of owners?

I want to explore whether there are any differences in metascore distributions among different owner groups.

stats 20M+ 10M-20M 5M-10M 1M-5M <1M
Min. 69.00 62.00 63.00 39.00 20.00
1st Qu. 83.00 79.75 73.00 75.00 65.00
Median 84.50 83.50 81.50 81.00 72.00
Mean 82.83 82.65 79.81 79.56 70.56
3rd Qu. 86.00 89.25 85.25 86.00 78.00
Max. 90.00 96.00 96.00 95.00 98.00
NA’s 1 10 23 212 30404

Takeaways

When looking at the boxplots, we can see that as the number of owners increases, the spread of metascores generally narrows and the metascores themselves are generally higher. This intuitively makes sense because in order for a game to be very successful, it has to cater to a diverse set of players, as well as garner high reviews from both critics and players.

Interestingly enough, the highest rated game (a metascore of 98, seen in the <1M group) is not among the most widely owned. This might be because the game is relatively unknown and has yet to be discovered by the general masses.


Conclusion/Next Steps

There are many potential next steps we could take. I think that one interesting next step would be to integrate Twitch (a popular video game streaming platform) data with the Steam data. This would allow me to explore which Steam games are the most popular among livestreamers, how these games trend and perform on the Steam platform, and whether a game’s introduction on Twitch helped its popularity on Steam.