CSGO vs Dota2: Steam Review Analysis

I am extremely into computer games, so much so that I founded Xavier University’s Rocket League team/club. Sadly, I will be leaving that behind when I graduate this semester. If you are into PC gaming, then you know what Steam is. For those who don’t, Steam is an online game store where you can purchase games, add and play with friends, give gifts, and socialize in many ways. For this project I wanted to tackle a couple of very popular games, both owned by Valve (which also owns Steam), and take an in-depth look at their reviews.

I will go over the data I collected, provide an easy-to-follow visual analysis of that data, get into the sentiment side of things, and end with a bit of predictive analysis using logistic regression. I am hoping to get a better understanding of what factors play into how a person reviews a game, and to look into how two Valve-owned games compare with one another. For this project, I decided to look at Counter-Strike: Global Offensive (CSGO) and Dota 2, two very popular competitive games.

The Data

To start off, I will describe the columns of the current dataset. This first dataset covers the reviewer’s information and whether or not they recommend the game. All of this data was collected using Steam’s review API, so most of the descriptions come from this source: https://partner.steamgames.com/doc/store/getreviews

The raw data used in this project can be downloaded here: https://myxavier-my.sharepoint.com/:u:/g/personal/shomakerc_xavier_edu/EXxNVIEczQVMuoR7CnDdcAEBXT--I92yDXBeMYMx_kIiow?download=1

Data Dictionary and Preview

Variable.Name                     Variable.Description
voted_up                          True means it was a positive recommendation
author.steamid                    The user’s SteamID
author.num_games_owned            Number of games owned by the user
author.num_reviews                Number of reviews written by the user
author.playtime_forever           Lifetime playtime tracked in this app (in hours)
author.playtime_last_two_weeks    Playtime tracked in the past two weeks for this app (in hours)
author.playtime_at_review         Playtime when the review was written (in hours)
votes_up                          The number of users that found this review helpful
votes_funny                       The number of users that found this review funny
game                              The game the review was written for

Descriptive Dataset Analysis

In this next portion I will begin picking apart the data and looking for meaningful associations between some of the variables.

Reviewer Playtime

This visualization shows the distribution of reviewers’ playtime at the time they wrote their review, split by whether or not they recommended the game. The next two graphics are the most interesting to me. Dota 2’s player base plays their game much more than CSGO’s does, even among those who did not recommend the game. Within CSGO, the people who did not recommend the game still put many, many hours into it, about as many as those who did recommend it.

Recent Playtime

This graphic is similar to the previous one, but now we are looking at each reviewer’s playtime over the last two weeks, by game, and whether it differs with their recommendation. We see that CSGO players are still quite active, putting in up to dozens of hours each week. Even those who do not recommend the game still put in a few hours each week.

Dota 2 is a different story. There are still a handful of individuals who play the game, but they play nowhere near as much as the CSGO reviewers. If you are a Dota 2 player and did not leave a positive review, your recent playtime is most likely 0.

Does Experience = Helpful?

This graphic shows the relationship between a player’s hours spent in game and the number of helpful votes their review received. I was curious whether reviews from players who have devoted hundreds, or even thousands, of hours of their free time to a game would be perceived as more helpful.

We see a large difference between these games. It seems that the Dota 2 community may value its veteran players’ reviews more than the CSGO community does. That said, few reviews in the CSGO dataset I gathered received many helpful votes. This may be due to chance or to an error in how the data was collected.

Data Correlations

It makes sense that there are strong correlations between games owned and reviews written, total playtime and playtime at review, and helpful votes and funny votes. The variables in each of these pairings are tied together in some way, so none of these were big surprises.
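As a rough illustration of what these correlations measure, here is a minimal Python sketch of the Pearson correlation coefficient. It is not the code behind the figure: the function name pearson is hypothetical, and the sample values are made up rather than taken from the review dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values (hours): lifetime playtime vs. playtime at review.
playtime_forever = [120.0, 850.5, 32.0, 2400.0, 610.0]
playtime_at_review = [100.0, 700.0, 30.0, 2100.0, 580.0]

r = pearson(playtime_forever, playtime_at_review)
print(f"r = {r:.3f}")
```

A value near 1 means the two columns rise together almost linearly, which is what one would expect for playtime at review and lifetime playtime, since the first is part of the second.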

Unstructured Dataset Sentiment Analysis

For this portion of the project, I am bringing in the rest of the data, including the text of each review. I will be using the tidytext package to tokenize the reviews by word, so that each word is broken out and can be examined on its own. Here are the most commonly used words in the 100 reviews from each game.
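The tokenizing step can be sketched in a few lines. The analysis itself used R's tidytext package; the Python below is only an approximation of the idea (lowercase, split into words, count), with hypothetical function names, made-up reviews, and a tiny stop-word list that is an assumption of this sketch rather than something the analysis is known to have used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "and", "this", "i", "it", "to", "of"}

def tokenize(review):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", review.lower())

def top_words(reviews, n=3):
    """Count non-stop-word tokens across all reviews; return the n most common."""
    counts = Counter(
        token
        for review in reviews
        for token in tokenize(review)
        if token not in STOP_WORDS
    )
    return counts.most_common(n)

# Made-up reviews, not taken from the dataset.
sample_reviews = [
    "This game is fun and the community is great.",
    "Fun game, terrible matchmaking.",
    "I love this game.",
]

print(top_words(sample_reviews))
```

Each review becomes a list of single words, so the counts can then be grouped by game and plotted.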

Sentiment Scores

This graph shows the most impactful words using the bing sentiment scoring dictionary.
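For readers unfamiliar with dictionary-based scoring, here is a minimal Python sketch of the idea: each word is looked up in a lexicon that labels it positive or negative, and a review's score is the number of positive words minus the number of negative ones. The small LEXICON below is a hand-picked stand-in, not the actual bing lexicon, and the function name is hypothetical.

```python
import re

# Hand-picked stand-in for a word-level sentiment lexicon (assumption:
# the real bing lexicon is far larger; these entries are illustrative only).
LEXICON = {
    "fun": "positive",
    "great": "positive",
    "love": "positive",
    "terrible": "negative",
    "toxic": "negative",
    "broken": "negative",
}

def sentiment_score(review):
    """Return (# positive words) - (# negative words) for one review."""
    score = 0
    for token in re.findall(r"[a-z']+", review.lower()):
        label = LEXICON.get(token)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    return score

print(sentiment_score("Great game, love it, but the community is toxic."))
```

Words that do not appear in the lexicon contribute nothing, so the score reflects only the labeled vocabulary.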

Emotional Tie In

The following graph shows how much each game’s reviews express each of the emotions in the nrc dictionary.

Sentiment Scores with User Playtime

The bar labels show each bar’s sentiment score. As we can see, Dota 2 sits at both extremes: more negative in its negative reviews and more positive in its positive reviews. One interesting feature of this graph is that Dota 2 reviewers have much more lifetime playtime, but over the last two weeks CSGO is much more dominant.

Predictive Analysis

In this last section, I have created a logistic regression model to predict whether or not an individual will recommend the game.

Model 1

## 
## Logistic Regression Results
## ==========================================================
##                                    Dependent variable:    
##                                ---------------------------
##                                         voted_up          
## ----------------------------------------------------------
## author.num_games_owned              0.009*** (0.003)      
## author.num_reviews                   -0.002 (0.009)       
## author.playtime_last_two_weeks      -0.169** (0.078)      
## votes_up                             0.028** (0.012)      
## votes_funny                         -0.051** (0.025)      
## comment_count                         0.015 (0.126)       
## gamedota2                            -0.704 (0.452)       
## Constant                            -1.301*** (0.396)     
## ----------------------------------------------------------
## Observations                               200            
## Log Likelihood                           -81.733          
## Akaike Inf. Crit.                        179.465          
## ==========================================================
## Note:                          *p<0.1; **p<0.05; ***p<0.01

Improved Model

## 
## Logistic Regression Results
## ==========================================================
##                                    Dependent variable:    
##                                ---------------------------
##                                         voted_up          
## ----------------------------------------------------------
## author.num_games_owned              0.008*** (0.002)      
## author.playtime_last_two_weeks      -0.169** (0.078)      
## votes_up                             0.028** (0.012)      
## votes_funny                         -0.050** (0.023)      
## gamedota2                            -0.711 (0.450)       
## Constant                            -1.285*** (0.388)     
## ----------------------------------------------------------
## Observations                               200            
## Log Likelihood                           -81.758          
## Akaike Inf. Crit.                        175.516          
## ==========================================================
## Note:                          *p<0.1; **p<0.05; ***p<0.01
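The two tables can be compared through their AIC values. As a check, this small Python sketch recomputes AIC from the reported log likelihoods, assuming the standard definition AIC = 2k - 2 * logLik, where k is the number of estimated coefficients including the constant (8 in Model 1, 6 in the improved model). The function name aic is a choice made for this sketch.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * logLik (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Log likelihoods and coefficient counts (constant included) read off the tables.
aic_model_1 = aic(-81.733, 8)
aic_improved = aic(-81.758, 6)

print(f"Model 1:  {aic_model_1:.3f}")
print(f"Improved: {aic_improved:.3f}")
```

Dropping author.num_reviews and comment_count barely changes the log likelihood (-81.733 versus -81.758), so the model with two fewer parameters ends up with the lower AIC.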

Predictions

Person 1 reviewed CSGO, owns 7 games, has played 0.4 hours in the last two weeks, and has no helpful votes but 1 funny vote on their review. Person 2 reviewed Dota 2, owns 14 games, has played 5 hours in the last two weeks, and has 7 helpful votes and 2 funny votes.

##          1          2 
## 0.26078662 0.07211294

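For 100 people with the first set of characteristics, the model expects about 26 not to recommend the game they reviewed. For 100 people with the second set, it expects only about 7 not to recommend theirs. This reading assumes the outcome was coded so that 1 means the reviewer did not recommend the game; if voted_up was modeled as defined in the data dictionary (True means a positive recommendation), these numbers are instead the probabilities of recommending.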
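To show how a logistic regression turns reviewer characteristics into a predicted probability, here is a minimal Python sketch that applies the inverse logit to the coefficients in the Improved Model table. It is not the code that produced the output above. Because the table reports rounded coefficients, and the exact inputs passed to the model are not shown, the results should be treated as approximations and need not match the reported predictions.

```python
import math

# Rounded coefficients copied from the "Improved Model" table.
COEFS = {
    "intercept": -1.285,
    "author.num_games_owned": 0.008,
    "author.playtime_last_two_weeks": -0.169,
    "votes_up": 0.028,
    "votes_funny": -0.050,
    "gamedota2": -0.711,
}

def predict_probability(features):
    """Inverse logit of the linear predictor for one reviewer."""
    linear = COEFS["intercept"] + sum(
        COEFS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# The two reviewers described in the text; gamedota2 is 1 for Dota 2, 0 for CSGO.
person_1 = {
    "author.num_games_owned": 7,
    "author.playtime_last_two_weeks": 0.4,
    "votes_up": 0,
    "votes_funny": 1,
    "gamedota2": 0,
}
person_2 = {
    "author.num_games_owned": 14,
    "author.playtime_last_two_weeks": 5,
    "votes_up": 7,
    "votes_funny": 2,
    "gamedota2": 1,
}

p1 = predict_probability(person_1)
p2 = predict_probability(person_2)
print(f"person 1: {p1:.3f}, person 2: {p2:.3f}")
```

The sketch makes the mechanics visible: each coefficient is multiplied by the reviewer's value, the products are added to the constant, and the sum is squeezed into the range 0 to 1.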