Introduction to the data

The gaming industry continues to grow, driven by developments such as the metaverse and by more games being released than ever before. The number of people playing games has also increased because of COVID-19-related lockdowns. With so many games and so many people playing, it is a great time to build a great game. In my analysis, I will try to understand more about what makes a great game and what makes a not-so-great one.

The data for this analysis comes from the gaming website Steam (https://store.steampowered.com/). I will also be using a dataset of tweets that mention the top 5 games by concurrent players, as listed on SteamCharts (https://steamcharts.com/top).

The goal of my analysis is to understand what makes games good and what makes them bad. To do this, I will use a game’s overall reviews to label it as Great, Solid, or Bad. Once the games have been labelled, I will create visualizations to explore what separates the good games from the bad ones. The tabs below will take you to a dictionary of my games data and some summary statistics that help us understand the data. Please note that I only included games released from the year Steam launched (2003) through the most recent full calendar year (2021).

Data Dictionary (Games)

Variable                  Description
url                       The game’s store page URL
types                     The type of download the game is
name                      Name of the Game
desc_snippet              Snippet of the description of the game
recent_reviews            Game’s most recent reviews
all_reviews               Game’s reviews all-time
release_date              Release date
developer                 Game Developer
publisher                 Game Publisher
popular_tags              Popular tags that the game is given
game_details              Details of the game type
languages                 Languages the game supports
achievements              Number of Achievements in the game
genre                     Genre(s) of the game
game_description          Description of the game
mature_content            Types of content the game contains
minimum_requirements      Operational requirements to run the game
recommended_requirements  Recommended Operational requirements to run the game
original_price            Release price of the game
discount_price            Discounted price of the game
Action                    Whether or not the game is in the Action genre
Adventure                 Whether or not the game is in the Adventure genre
Massively Multiplayer     Whether or not the game is in the Massively Multiplayer genre
Strategy                  Whether or not the game is in the Strategy genre
Free to Play              Whether or not the game is in the Free to Play genre
RPG                       Whether or not the game is in the RPG genre
Indie                     Whether or not the game is in the Indie genre
Early Access              Whether or not the game is in the Early Access genre
Simulation                Whether or not the game is in the Simulation genre
Racing                    Whether or not the game is in the Racing genre
Casual                    Whether or not the game is in the Casual genre
Sports                    Whether or not the game is in the Sports genre
Violent                   Whether or not the game is in the Violent genre
Gore                      Whether or not the game is in the Gore genre
Valve                     Whether or not the game is in the Valve genre
Nudity                    Whether or not the game is in the Nudity genre
Animation & Modeling      Whether or not the game is in the Animation & Modeling genre
Design & Illustration     Whether or not the game is in the Design & Illustration genre
Utilities                 Whether or not the game is in the Utilities genre
Sexual Content            Whether or not the game is in the Sexual Content genre
Game Development          Whether or not the game is in the Game Development genre
Education                 Whether or not the game is in the Education genre
Software Training         Whether or not the game is in the Software Training genre
Web Publishing            Whether or not the game is in the Web Publishing genre
Video Production          Whether or not the game is in the Video Production genre
Audio Production          Whether or not the game is in the Audio Production genre
Movie                     Whether or not the game is in the Movie genre
Photo Editing             Whether or not the game is in the Photo Editing genre
Accounting                Whether or not the game is in the Accounting genre
Documentary               Whether or not the game is in the Documentary genre
Short                     Whether or not the game is in the Short genre
360 Video                 Whether or not the game is in the 360 Video genre
Tutorial                  Whether or not the game is in the Tutorial genre
HTC                       Whether or not the game is in the HTC genre
game_quality              Quality of the game (Great, Solid, or Bad)
language_count            Number of languages the game supports
game_type                 Type of game (Single Player, Multi Player, or Mixed)
price                     Price of the game (in numeric terms)
discounted_price          Discounted price of the game (in numeric terms)
percent                   Percent of reviews that are positive
anova_game_type           Numerical value for game type
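
The derived columns at the end of the dictionary (percent, game_quality, and language_count) are computed from the raw columns. The Python sketch below shows one way this could be done; it is not the code behind this analysis. It assumes that all_reviews contains a phrase such as "92% of the ... user reviews" and that languages is a comma-separated string. The cut-offs GREAT_MIN and SOLID_MIN are hypothetical, since the thresholds used for Great, Solid, and Bad are not stated here.

```python
import re

# Hypothetical cut-offs; the write-up does not state the thresholds it used.
GREAT_MIN = 80
SOLID_MIN = 50


def percent_positive(all_reviews):
    """Return the percent of positive reviews as an int, or None if absent.

    Assumes the review text contains a phrase like '92% of the ...'.
    """
    if not all_reviews:
        return None
    match = re.search(r"(\d{1,3})% of the", all_reviews)
    return int(match.group(1)) if match else None


def game_quality(percent):
    """Map a percent to Great / Solid / Bad using the assumed cut-offs."""
    if percent is None:
        return None
    if percent >= GREAT_MIN:
        return "Great"
    if percent >= SOLID_MIN:
        return "Solid"
    return "Bad"


def language_count(languages):
    """Count comma-separated language names; empty input counts as zero."""
    if not languages:
        return 0
    return len([name for name in languages.split(",") if name.strip()])


# Made-up example row in the assumed format.
example = "Very Positive,(42,550),- 92% of the 42,550 user reviews for this game are positive."
pct = percent_positive(example)
print(pct, game_quality(pct), language_count("English,French,German"))
```

Games with no parsable review percent come back as None rather than being forced into a category, which mirrors the way rows with missing values drop out of later tests.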

Summary Data (Games)

Statistic                       Value
Number of Games                 35575
Oldest Game                     2004-01-20
Number of possible languages    35
Most Expensive game             650560
Most Achievements               9821

Sentiment analysis

To understand more about how people feel about the games they are currently playing, I used the Twitter API to pull tweets mentioning the top 5 games by concurrent player count. These top 5 games are taken from SteamCharts, the website mentioned earlier. Note that I skipped Dota 2 in this analysis because there were not enough tweets about it.

From the moving visualization above, we can see that there aren’t many times of day when tweets about any of these games are positive. This is to be expected, since many people go to Twitter to rant about a game’s issues and the things that are upsetting them. It is also interesting that, most of the time, the levels of negative and positive sentiment in the tweets are similar. Most of the games have periods in which they receive both negative and positive sentiment, but there are some overwhelmingly negative times.
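
A time-of-day view like the one described above can be built by averaging sentiment scores per game and hour. The Python sketch below is only an illustration under assumptions: the tweets are made up, the game names are placeholders, and each tweet is assumed to already carry a sentiment score between -1 and 1. The scoring method used in this analysis is not described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical scored tweets: (game, ISO timestamp, sentiment score in [-1, 1]).
tweets = [
    ("Game A", "2022-03-01T09:15:00", -0.6),
    ("Game A", "2022-03-01T09:40:00", 0.2),
    ("Game A", "2022-03-01T21:05:00", -0.8),
    ("Game B", "2022-03-01T09:30:00", 0.5),
]


def mean_sentiment_by_hour(rows):
    """Average sentiment score per (game, hour of day)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for game, timestamp, score in rows:
        hour = datetime.fromisoformat(timestamp).hour
        bucket = totals[(game, hour)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


hourly = mean_sentiment_by_hour(tweets)
for (game, hour), mean_score in sorted(hourly.items()):
    label = "positive" if mean_score > 0 else "negative or neutral"
    print(f"{game} {hour:02d}:00 mean={mean_score:+.2f} ({label})")
```

Averaging per hour hides how many tweets fall into each bucket, so an hour with one angry tweet looks the same as an hour with hundreds; keeping the counts alongside the means is one way to guard against that.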

Analysis of Variance

In this tab, we run a one-way ANOVA test to determine the effect that the 3 game types (Single Player, Multi Player, or Mixed) have on the percent of positive reviews that a game receives. Below is a summary of the ANOVA test as well as its interpretation.
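
For readers unfamiliar with the test, a one-way ANOVA compares the variation between group means with the variation inside each group. The Python sketch below computes the F statistic by hand on made-up numbers. The values in toy are invented for illustration and are not taken from the games data, and the function one_way_anova is a hypothetical helper, not the routine that produced the summary in this tab.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up percent-positive values for the three game types.
toy = {
    "Single Player": [80, 85, 90],
    "Multi Player": [60, 65, 70],
    "Mixed": [70, 75, 80],
}
f_stat, df_b, df_w = one_way_anova(toy)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means are spread out relative to the noise within each group; the p-value then comes from comparing F with the F distribution for those two degrees of freedom.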

##                    Df  Sum Sq Mean Sq F value   Pr(>F)    
## anova_game_type     2   17888    8944    19.1 5.18e-09 ***
## Residuals       16611 7778476     468                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 18961 observations deleted due to missingness

From the above analysis, the p-value (5.18e-09) is well below .05, so the result is statistically significant: the mean percent of positive reviews differs across at least some of the 3 game types. Because game quality (Great, Solid, or Bad) is derived from a game’s reviews, this suggests that the type of game is associated with how well it is received, although the test on its own does not show a direct causal influence or indicate which game types differ. This is interesting, as earlier it seemed like there would not be a very solid comparison. Given how many bad games there were for each game type, I expected that game type would not be a statistically significant predictor.
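
As a check on the reading above, the figures in the ANOVA table can be recomputed from its sums of squares and degrees of freedom. The Python sketch below also derives eta squared, the share of total variation attributed to game type. Eta squared is not part of the original output; it is added here as an assumption about what a reader might want to see next to the p-value.

```python
# Values copied from the ANOVA table above.
ss_between, df_between = 17888, 2
ss_within, df_within = 7778476, 16611
n_deleted = 18961  # observations dropped due to missingness

ms_between = ss_between / df_between   # mean square for game type
ms_within = ss_within / df_within      # mean square for residuals
f_stat = ms_between / ms_within        # F value

# Share of the total sum of squares explained by game type.
eta_squared = ss_between / (ss_between + ss_within)

# Observations actually used by the test, and the implied dataset size.
n_used = df_between + df_within + 1
n_total = n_used + n_deleted

print(f"Mean Sq: {ms_between:.0f} and {ms_within:.0f}")
print(f"F value: {f_stat:.1f}")
print(f"Eta squared: {eta_squared:.4f}")
print(f"Observations used: {n_used} of {n_total}")
```

The recomputed values agree with the table, and the observation count adds up to the 35575 games in the summary data. Eta squared comes out well under 1 percent, so while the difference between game types is statistically significant, game type on its own explains very little of the variation in review scores.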