The gaming industry continues to grow, with new developments such as the metaverse and more games being released than ever before. The number of people playing games has also increased due to COVID-19-related lockdowns. With so many games and so many people playing, it is a great time to build a great game. In this analysis, I try to understand what makes a great game and what makes a not-so-great one.
The data for this analysis comes from the gaming website Steam (https://store.steampowered.com/). I also use a dataset of tweets mentioning the top five games by concurrent players, as listed on SteamCharts (https://steamcharts.com/top).
The goal of my analysis is to understand what makes games good and what makes them bad. To do this, I use a game's overall reviews to label it as Great, Solid, or Bad. Once each game has been labelled, I create visualizations to explore what separates the quality levels. The tabs below lead to a dictionary of my games data and some summary statistics that help describe it. Please note that I only included games released between the year Steam launched (2003) and the most recent full calendar year (2021).
| Variable | Description |
|---|---|
| url | The game’s store page URL |
| types | The type of download the game is |
| name | Name of the Game |
| desc_snippet | Snippet of the description of the game |
| recent_reviews | Game’s most recent reviews |
| all_reviews | Game’s reviews all-time |
| release_date | Release date |
| developer | Game Developer |
| publisher | Game Publisher |
| popular_tags | Popular tags that the game is given |
| game_details | Details of the game type |
| languages | Languages the game supports |
| achievements | Number of Achievements in the game |
| genre | Genre(s) of the game |
| game_description | Description of the game |
| mature_content | Types of content the game contains |
| minimum_requirements | Operational requirements to run the game |
| recommended_requirements | Recommended Operational requirements to run the game |
| original_price | Release price of the game |
| discount_price | Discounted price of the game |
| Action | Whether or not the game is in the Action genre |
| Adventure | Whether or not the game is in the Adventure genre |
| Massively Multiplayer | Whether or not the game is in the Massively Multiplayer genre |
| Strategy | Whether or not the game is in the Strategy genre |
| Free to Play | Whether or not the game is in the Free to Play genre |
| RPG | Whether or not the game is in the RPG genre |
| Indie | Whether or not the game is in the Indie genre |
| Early Access | Whether or not the game is in the Early Access genre |
| Simulation | Whether or not the game is in the Simulation genre |
| Racing | Whether or not the game is in the Racing genre |
| Casual | Whether or not the game is in the Casual genre |
| Sports | Whether or not the game is in the Sports genre |
| Violent | Whether or not the game is in the Violent genre |
| Gore | Whether or not the game is in the Gore genre |
| Valve | Whether or not the game is in the Valve genre |
| Nudity | Whether or not the game is in the Nudity genre |
| Animation & Modeling | Whether or not the game is in the Animation & Modeling genre |
| Design & Illustration | Whether or not the game is in the Design & Illustration genre |
| Utilities | Whether or not the game is in the Utilities genre |
| Sexual Content | Whether or not the game is in the Sexual Content genre |
| Game Development | Whether or not the game is in the Game Development genre |
| Education | Whether or not the game is in the Education genre |
| Software Training | Whether or not the game is in the Software Training genre |
| Web Publishing | Whether or not the game is in the Web Publishing genre |
| Video Production | Whether or not the game is in the Video Production genre |
| Audio Production | Whether or not the game is in the Audio Production genre |
| Movie | Whether or not the game is in the Movie genre |
| Photo Editing | Whether or not the game is in the Photo Editing genre |
| Accounting | Whether or not the game is in the Accounting genre |
| Documentary | Whether or not the game is in the Documentary genre |
| Short | Whether or not the game is in the Short genre |
| 360 Video | Whether or not the game is in the 360 Video genre |
| Tutorial | Whether or not the game is in the Tutorial genre |
| HTC | Whether or not the game is in the HTC genre |
| game_quality | Quality of the game (Great, Solid, or Bad) |
| language_count | Number of languages the game supports |
| game_type | Type of game (Single Player, Multi Player, or Mixed) |
| price | Price of the game (in numeric terms) |
| discounted_price | Discounted price of the game (in numeric terms) |
| percent | Percent of reviews that are positive |
| anova_game_type | Numerical value for game type |

| Number of Games | Oldest Game (release date) | Number of Possible Languages | Most Expensive Game (price) | Most Achievements |
|---|---|---|---|---|
| 35575 | 2004-01-20 | 35 | 650560 | 9821 |
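
The derived columns in the dictionary (`percent`, `game_quality`, `language_count`) can be illustrated with a short Python sketch. This is not the code used for this analysis. The layout of the `all_reviews` string, the comma-separated `languages` format, and especially the 80/60 cutoffs for Great, Solid, and Bad are assumptions made for illustration, since the actual thresholds are not stated here.

```python
import re
from typing import Optional

# ASSUMPTION: illustrative cutoffs only; the analysis does not state its thresholds.
GREAT_CUTOFF = 80
SOLID_CUTOFF = 60


def parse_percent(all_reviews: Optional[str]) -> Optional[int]:
    """Extract the percent of positive reviews from a review summary string.

    ASSUMPTION: the string contains text like '92% of the 42,550 user reviews'.
    Returns None when no percentage can be found.
    """
    if not all_reviews:
        return None
    match = re.search(r"(\d{1,3})% of", all_reviews)
    return int(match.group(1)) if match else None


def label_quality(percent: Optional[int]) -> Optional[str]:
    """Map a percent-positive value to Great, Solid, or Bad (hypothetical cutoffs)."""
    if percent is None:
        return None
    if percent >= GREAT_CUTOFF:
        return "Great"
    if percent >= SOLID_CUTOFF:
        return "Solid"
    return "Bad"


def count_languages(languages: Optional[str]) -> int:
    """Count distinct languages. ASSUMPTION: the field is comma-separated."""
    if not languages:
        return 0
    return len({lang.strip() for lang in languages.split(",") if lang.strip()})


# Hypothetical example string, written to match the assumed format.
example = "Very Positive,(42,550),- 92% of the 42,550 user reviews for this game are positive."
example_percent = parse_percent(example)
example_quality = label_quality(example_percent)
```

Games whose review text has no percentage get `None` instead of a label, which is one way a large share of rows can end up missing in later steps.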
This page contains the trends and analysis I ran to understand what affects the quality of a game. There are most likely many more factors that could be taken into account. In future analysis, a more complete dataset, including information about the types of developers who create games and other variables, would help. For this analysis, I looked into Price, Genre, Release Year, Number of Languages Supported, and Game Type.
Price is the first thing one would expect to affect whether or not a game is good. If you are paying more for a game, you would think that you are getting a finished product. However, that may not always be the case. The visualization below helps us see whether there is a relationship between the price and the quality of a game. Note that only prices under $120 are included, since this captures both games in the normal price range of up to $60 and things like premium editions that can go up to $120.
According to the visualization above, our initial idea about the effect of price on quality does seem to hold within the interquartile range of the plot: the more expensive a game is, the more likely it is to be a good game. Outside this range, however, the pattern is less consistent. There appear to be about as many expensive Great games as expensive Bad and Solid games. It is also worth noting that the most expensive game in this visualization is a Bad game.
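
The boxplot comparison can also be expressed as numbers. The sketch below, in Python with only the standard library, applies the $120 cap described above and computes the quartiles of `price` for each `game_quality` label. It is an illustration rather than the code behind the plot: the dictionary-per-game layout and the toy rows are made up, and the "inclusive" quartile method is an arbitrary choice.

```python
from collections import defaultdict
from statistics import quantiles

PRICE_CAP = 120.0  # from the text: only prices under $120 are plotted


def price_quartiles_by_quality(games, cap=PRICE_CAP):
    """Return {quality: (q1, median, q3)} for games priced under `cap`."""
    prices = defaultdict(list)
    for game in games:
        price = game.get("price")
        quality = game.get("game_quality")
        # Skip rows with missing values or prices at or above the cap.
        if price is None or quality is None or price >= cap:
            continue
        prices[quality].append(price)

    result = {}
    for quality, values in prices.items():
        if len(values) >= 2:  # quantiles() needs at least two data points
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
            result[quality] = (q1, median, q3)
    return result


# Made-up rows purely to exercise the function; NOT real Steam data.
toy_games = [
    {"price": 0.99, "game_quality": "Bad"},
    {"price": 4.99, "game_quality": "Bad"},
    {"price": 9.99, "game_quality": "Bad"},
    {"price": 9.99, "game_quality": "Great"},
    {"price": 19.99, "game_quality": "Great"},
    {"price": 59.99, "game_quality": "Great"},
    {"price": 199.99, "game_quality": "Great"},  # dropped by the cap
]
summary = price_quartiles_by_quality(toy_games)
```

Comparing the `(q1, median, q3)` tuples across labels is a direct check of the claim that the interquartile ranges shift upward with quality.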
The next visualization looks at how a game's genre affects its quality. Going in, I expected the Indie genre to contain many of the worse games. I believe this because it is easy to make an Indie game, and there are a lot of Indie games out there. With so many games and such ease of creation, there are definitely buggy Indie games on Steam.
Above we can see that our initial assumptions about this distribution were supported. Indie is the most common genre in this dataset, and the majority of Indie games are Bad games. Something else worth noting is that each quality level has a similar distribution of genres: in every quality level, Indie is the most common genre, with Action coming in second every time. This reflects the genres I chose for this analysis, as they are the most common genres on Steam.
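
A statement like "the majority of Indie games are Bad" is a within-genre share, which can be checked directly from the genre flag columns. The following Python sketch shows one way to compute it. It assumes each game is a dictionary whose genre columns (for example `Indie` and `Action`) hold truthy values when the game belongs to that genre. The toy rows are invented for illustration and are not real data.

```python
from collections import Counter, defaultdict


def quality_share_by_genre(games, genres):
    """Return {genre: {quality: share}}, where shares within a genre sum to 1."""
    counts = defaultdict(Counter)
    for game in games:
        quality = game.get("game_quality")
        if quality is None:
            continue
        for genre in genres:
            # ASSUMPTION: genre flags are stored as truthy values (1 / True).
            if game.get(genre):
                counts[genre][quality] += 1

    shares = {}
    for genre, counter in counts.items():
        total = sum(counter.values())
        shares[genre] = {quality: n / total for quality, n in counter.items()}
    return shares


# Made-up rows purely to exercise the function; NOT real Steam data.
toy_games = [
    {"Indie": 1, "Action": 0, "game_quality": "Bad"},
    {"Indie": 1, "Action": 1, "game_quality": "Bad"},
    {"Indie": 1, "Action": 0, "game_quality": "Bad"},
    {"Indie": 1, "Action": 0, "game_quality": "Great"},
    {"Indie": 0, "Action": 1, "game_quality": "Solid"},
]
shares = quality_share_by_genre(toy_games, ["Indie", "Action"])
```

Because a game can carry several genre flags, the same game may be counted under more than one genre, so the per-genre totals can add up to more than the number of games.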
Another question is whether the release date has an effect on the number of Great, Solid, or Bad games that came out in a given year. The visual below shows whether this is the case. It also helps us see whether there were some great years for gaming.
Over the last 18 years of Steam being online, the number of games released grew steadily for most of the period, as the graph above shows. The distribution across quality levels is also very similar throughout, with far more Bad games released every single year than Great or Solid ones. What stands out to me is the dip towards the end of the graph: the number of games peaks and then decreases, and at first I was confused as to why. My guess is that it reflects the amount of detail going into games these days, with developers taking more time to build cleaner games.
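
The yearly counts behind a chart like this can be tallied as follows. This Python sketch keeps only games released from 2003 to 2021, as stated earlier, and counts games per release year and quality label. It assumes `release_date` is an ISO-formatted string such as `2004-01-20`, and the toy rows are invented for illustration.

```python
from collections import Counter
from datetime import date

FIRST_YEAR, LAST_YEAR = 2003, 2021  # range stated in the text


def games_per_year(games):
    """Count games by (release year, quality), restricted to the study period."""
    counts = Counter()
    for game in games:
        raw = game.get("release_date")
        quality = game.get("game_quality")
        if not raw or quality is None:
            continue
        try:
            # ASSUMPTION: dates are ISO strings (YYYY-MM-DD).
            year = date.fromisoformat(raw).year
        except ValueError:
            continue  # skip unparseable dates
        if FIRST_YEAR <= year <= LAST_YEAR:
            counts[(year, quality)] += 1
    return counts


# Made-up rows purely to exercise the function; NOT real Steam data.
toy_games = [
    {"release_date": "2004-01-20", "game_quality": "Bad"},
    {"release_date": "2004-06-01", "game_quality": "Bad"},
    {"release_date": "2021-03-05", "game_quality": "Great"},
    {"release_date": "2022-01-01", "game_quality": "Great"},  # outside the range
    {"release_date": "not a date", "game_quality": "Solid"},  # unparseable
]
yearly = games_per_year(toy_games)
```

Rows that are skipped here (missing label, unparseable date) lower the counts, so a dip in recent years could partly reflect newer games not yet having enough reviews to be labelled. That is a possibility worth checking, not something this analysis establishes.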
Inclusivity is everywhere in today's world, whether that means being more inclusive of what people believe or of the things they enjoy. With that in mind, one of the best things the gaming industry can do to be more inclusive is to support more languages in its games. The boxplot below shows the number of languages each game supports and whether that has an effect on its quality.
This has a similar distribution to the price plot we looked at earlier in this analysis, with Great games supporting more languages on average than games of any other quality level. I believe this distribution is to be expected, as the greater games are more likely to come from larger studios that are more capable of translation. Smaller, lower-quality games are probably only going to support the one language their developers can manage.
The final visualization for this page looks at how the type of game can affect its quality. Since many larger developers tend to create mixed games (games with both single-player and multiplayer elements), I expect the single-player games here to be overwhelmingly Bad.
Based on this graph, I was right that single-player games are overwhelmingly Bad. However, I fully expected there to be more Great and Solid games than Bad games in the other two game types. I was mistaken, and surprised to find that Bad games hold the majority in each category. This is interesting to me because it raises the question of what is dragging down the reviews of so many multiplayer and mixed games.
To understand more about how people feel about the games they are currently playing, I used the Twitter API to pull tweets mentioning the top five games by concurrent player count. These top five games are taken from SteamCharts, the website mentioned earlier. Note that I skipped Dota 2 for this analysis because there were not enough tweets about it.
From the moving visualization above, we can see that there aren't many times of day when tweets about any of these games are positive. This is to be expected, since most people go to Twitter to rant about a game's issues and the things that are upsetting them. Something else worth noting is that, for most of the day, the balance of negative and positive sentiment is similar across the games. Most of the games have periods in which they receive both negative and positive sentiment, but there are some overwhelmingly negative times.
In this tab, I run a one-way ANOVA test to determine the effect that the three game types have on the percent of positive reviews a game receives. Below is a summary of the ANOVA test as well as its interpretation.

    ##                    Df  Sum Sq Mean Sq F value   Pr(>F)
    ## anova_game_type     2   17888    8944    19.1 5.18e-09 ***
    ## Residuals       16611 7778476     468
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 18961 observations deleted due to missingness

From the above analysis we see a p-value well below .05 (5.18e-09), meaning the result is statistically significant. This means that the mean percent of positive reviews differs between at least two of the three game types. It does not, by itself, show that game type directly causes a game to be Great, Solid, or Bad, and it does not say how large the difference is. Note also that 18,961 of the 35,575 games were dropped for missing values, so the test covers fewer than half of the games. The result is interesting because the earlier chart did not suggest a clear difference: given how many Bad games there were in every game type, I expected game type not to be a statistically significant predictor.
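
The numbers in the ANOVA summary can be re-derived by hand, which also gives a rough sense of effect size. The Python sketch below takes the sums of squares and degrees of freedom from the table, recomputes the mean squares and the F value, and obtains the p-value from the closed form of the F distribution's upper tail that applies when the numerator has exactly 2 degrees of freedom. The eta-squared calculation at the end is an addition for illustration and is not part of the original output.

```python
import math

# Values copied from the ANOVA summary table.
ss_between, df_between = 17888.0, 2      # anova_game_type row
ss_within, df_within = 7778476.0, 16611  # Residuals row

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F statistic is the ratio of the two mean squares.
f_value = ms_between / ms_within

# With exactly 2 numerator degrees of freedom, the upper tail has a closed form:
#   P(F > f) = (1 + 2 * f / df_within) ** (-df_within / 2)
# log1p keeps the computation numerically stable for small arguments.
p_value = math.exp(-(df_within / 2) * math.log1p(2 * f_value / df_within))

# Eta squared: share of total variance in `percent` associated with game type.
eta_squared = ss_between / (ss_between + ss_within)

print(f"F = {f_value:.2f}, p = {p_value:.3g}, eta^2 = {eta_squared:.4f}")
```

Under these numbers, eta squared comes out at roughly 0.002, which suggests that game type is associated with only about 0.2% of the variance in percent of positive reviews. With more than 16,000 observations, even a difference this small can be statistically significant, which fits the earlier observation that Bad games dominate every game type.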