For this analysis, I’m going to be looking at video game data. Specifically, this dataset has thousands of video games that were released in or before 2017 and covers their sales, critic scores, user scores, genre, publisher, year of release, platform, and rating. From this data I want to find out what makes a video game successful in terms of sales. For example, I may look at which genres tend to sell the most or whether there is a correlation between high critic or user scores and total sales. This data is from Kaggle user Rush Kirubi, who got the data from a website called Metacritic.
This data is mostly set up well, but I don’t like that the critic scores are out of 100 while the user scores are out of 10. So I’m going to replace the current user score column with a new one called Users_Score that multiplies each value by 10. I will also replace the zeros in each of the sales columns with NAs so that they are left out when I average those columns. Finally, I will create a new column called Console_Type that groups all platforms into Xbox, PlayStation, Nintendo, PC, or other categories. This will be useful because the current Platform column is based on specific consoles, not their parent companies, so there is currently no way to compare the big gaming brands like Xbox and PlayStation.
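The three cleaning steps above can be sketched in plain Python. This is only an illustration, not the code used for this report: the source column name `User_Score`, the platform codes, and the platform-to-company assignments in `CONSOLE_TYPES` are all assumptions, since the original grouping is not shown.

```python
# Hypothetical platform-to-company lookup; the exact grouping used in the
# original analysis is not shown, so these assignments are assumptions.
CONSOLE_TYPES = {
    "X360": "Xbox", "XB": "Xbox", "XOne": "Xbox",
    "PS": "PlayStation", "PS2": "PlayStation", "PS3": "PlayStation",
    "PS4": "PlayStation", "PSP": "PlayStation", "PSV": "PlayStation",
    "Wii": "Nintendo", "WiiU": "Nintendo", "DS": "Nintendo",
    "3DS": "Nintendo", "GBA": "Nintendo", "GC": "Nintendo", "N64": "Nintendo",
    "PC": "PC",
}
SALES_COLUMNS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Total_Sales")


def clean_row(row):
    """Return a cleaned copy of one game record (a dict keyed by column name)."""
    cleaned = dict(row)
    # Rescale the 0-10 user score to 0-100; missing scores stay missing.
    # "User_Score" is an assumed name for the original column.
    user_score = cleaned.pop("User_Score", None)
    cleaned["Users_Score"] = None if user_score is None else round(user_score * 10, 1)
    # Treat a recorded zero as missing so it is skipped when averaging.
    for column in SALES_COLUMNS:
        if cleaned.get(column) == 0:
            cleaned[column] = None
    # Anything not in the lookup falls into the catch-all "Other" group.
    cleaned["Console_Type"] = CONSOLE_TYPES.get(cleaned.get("Platform"), "Other")
    return cleaned
```

Using `None` for missing values plays the role that NA plays in the description above: later averages have to skip those entries explicitly.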
| Variables in Dataset | Variable Type | Explanation |
|---|---|---|
| Name | character | The name of the video game. |
| Platform | character | The console the game was released on. |
| Year_of_Release | numeric | The year the game was released. |
| Genre | character | Category of the game. |
| Publisher | character | Company who published the game. |
| NA_Sales | numeric | Game sales in North America (in millions of units). |
| EU_Sales | numeric | Game sales in Europe (in millions of units). |
| JP_Sales | numeric | Game sales in Japan (in millions of units). |
| Other_Sales | numeric | Game sales in the rest of the world (in millions of units). |
| Total_Sales | numeric | Total sales in the world (in millions of units). |
| Critic_Score | numeric | Aggregate score out of 100 compiled by Metacritic staff. |
| Critic_Count | numeric | The number of critics used in coming up with the Critic_Score. |
| Users_Score | numeric | Score out of 100 by Metacritic’s subscribers. |
| User_Count | numeric | Number of users who gave a score. |
| Rating | character | The ESRB ratings. |
| Console_Type | character | The company who owns the console that the game is played on. |
Now I’ll provide a usable data table for anyone who wants to take a closer look at the data. This table includes a random sample of 1,000 of the 17,416 games in the dataset.
Now I’ll show 5 summary statistics about the dataset that I find interesting. I will be looking at average global sales, average critic score, average user score, maximum global sales and minimum critic score.
## # A tibble: 1 × 5
## `Average Global Sales` `Average Critic Score` Average User S…¹ Maxim…² Minim…³
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 516469. 68.9 71.2 82.5 13
## # … with abbreviated variable names ¹`Average User Score`,
## # ²`Maximum Global Sales`, ³`Minimum Critic Score`
This shows us that the average global sales for a game is 516,469.3 units, the average critic score is 68.9, the average user score is a little higher at 71.2, the maximum global sales for a game is 82,540,000 units (this is Wii Sports; the table shows it as 82.5 because that value is in millions of units), and the minimum critic score is 13 (this belongs to a game called Ride to Hell).
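For readers who want to reproduce these five numbers, here is a minimal sketch in plain Python. It assumes each game is a dict using the column names from the variable table, with `None` standing in for NA, and it reports sales in whatever unit the `Total_Sales` column uses.

```python
from statistics import mean


def present(rows, column):
    """Collect the non-missing values of one column."""
    return [row[column] for row in rows if row.get(column) is not None]


def summarize(rows):
    """Compute the five summary statistics, skipping missing values.

    Raises statistics.StatisticsError (or ValueError for max/min) if a
    column has no non-missing values at all.
    """
    sales = present(rows, "Total_Sales")
    critic = present(rows, "Critic_Score")
    user = present(rows, "Users_Score")
    return {
        "Average Global Sales": mean(sales),
        "Average Critic Score": mean(critic),
        "Average User Score": mean(user),
        "Maximum Global Sales": max(sales),
        "Minimum Critic Score": min(critic),
    }
```

Each statistic uses only the games that have a value for that column, so the averages for sales, critic score, and user score can be based on different numbers of games.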
This visualization is actually going to be two in one. I want to compare the correlation between critic score and total sales as well as user score and total sales. To do this I will make two scatterplots. This will show us which score type has a stronger correlation with total sales.
## Warning: Removed 9081 rows containing missing values (geom_point).
## Warning: Removed 9619 rows containing missing values (geom_point).
This shows us that for both critic and user score, there is a positive correlation with global sales. This is unsurprising, as better games should sell more. Unfortunately, there doesn’t seem to be a huge difference between the two types of score in how strongly they correlate with global sales. With that said, critic score seems to have a slightly stronger correlation, as the critic plot has fewer games with a low score and high sales. This surprises me because I thought user score would be the stronger factor, since users are the ones who actually buy the games en masse.
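The comparison above is made by eye from the scatterplots. One way to put a number on it would be a Pearson correlation coefficient for each score type. The sketch below is an assumption about how that could be done, not part of the original analysis; it drops any game that is missing either value, much as the plots drop rows with missing values.

```python
from math import sqrt


def pearson_r(rows, x_column, y_column):
    """Pearson correlation between two columns, using complete pairs only."""
    pairs = [
        (row[x_column], row[y_column])
        for row in rows
        if row.get(x_column) is not None and row.get(y_column) is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / sqrt(var_x * var_y)
```

Calling `pearson_r(games, "Critic_Score", "Total_Sales")` and `pearson_r(games, "Users_Score", "Total_Sales")` on the cleaned rows (here `games` is a hypothetical list of dicts) would give two values between -1 and 1 that can be compared directly.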
Now that we’ve established that critic score has a positive correlation with global sales, I’ll look at the variation of critic scores for the different console companies. To do this I will use the Console_Type column I created and the Critic_Score column and make a boxplot.
This shows us that the other category has the highest average critic score. This category is mostly made up of Atari and Sega games as well as other more dated games. I’m not surprised at this because many of the ‘other’ games are considered classics and should be rated highly by critics. We also see that PC games are next in average critic score, followed by Xbox and PlayStation, and lastly Nintendo. It may seem surprising that Xbox and PlayStation are very similar, but it makes sense because these consoles have a lot of the same games and the ratings won’t change much depending on the console somebody plays on. I am a little surprised that Nintendo is at the bottom because they have a lot of very successful games in terms of total sales, but clearly they also have a lot of low-scoring games.
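The numbers behind a boxplot like this one are the minimum, quartiles, median, and maximum of each group. The sketch below computes them per console type in plain Python, assuming each game is a dict with `Console_Type` and `Critic_Score` keys. Note that `statistics.quantiles` uses its own quartile convention, which may differ slightly from the one a plotting library uses, so the values may not match a plot exactly.

```python
from collections import defaultdict
from statistics import median, quantiles


def critic_score_spread(rows):
    """Five-number summary of Critic_Score for each Console_Type."""
    groups = defaultdict(list)
    for row in rows:
        score = row.get("Critic_Score")
        if score is not None:  # games without a critic score are skipped
            groups[row["Console_Type"]].append(score)

    summary = {}
    for console, scores in groups.items():
        if len(scores) < 2:
            continue  # quartiles need at least two observations
        q1, _, q3 = quantiles(scores, n=4)
        summary[console] = {
            "min": min(scores),
            "q1": q1,
            "median": median(scores),
            "q3": q3,
            "max": max(scores),
        }
    return summary
```

The line in the middle of each box is the median rather than the mean, so comparing the `median` values is the closest match to what the plot shows.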
For this visualization, I will be looking at the average global sales
for each genre to see which ones sell the best. To do this I’ll make a
barplot.
Here we can see that platform games sell the most copies on average by a decent margin. Shooters are second, and racing, role-playing, and sports games are next. We also see that adventure and strategy games sell the fewest copies on average. I’m not surprised that platform and shooter games are the highest selling: platformers are accessible to everyone and have been popular for a long time with games like Mario and Sonic, and shooters are the most popular genre nowadays with games like Fortnite and Call of Duty. I’m also not surprised that adventure and strategy games are the lowest selling because they can be a lot more niche and harder to get into.
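The ranking behind this bar plot is a group-and-average step followed by a sort. Here is a small sketch of that step in plain Python, assuming each game is a dict with `Genre` and `Total_Sales` keys and `None` for missing values.

```python
from collections import defaultdict
from statistics import mean


def average_sales_by_genre(rows):
    """Return (genre, mean Total_Sales) pairs, best-selling genre first."""
    sales_by_genre = defaultdict(list)
    for row in rows:
        genre = row.get("Genre")
        sales = row.get("Total_Sales")
        if genre is not None and sales is not None:
            sales_by_genre[genre].append(sales)
    averages = {genre: mean(values) for genre, values in sales_by_genre.items()}
    # Sort by the average, largest first, to match a descending bar plot.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Because a mean is sensitive to a handful of very large sellers, a genre can rank highly here even if most of its games sell modestly.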
For this analysis I want to see how the different console types sell in different parts of the world. To do this I will make four bar graphs, one each for North American, European, Japanese, and other sales. Each graph will show the average sales in that part of the world for each console type, to show who performs best where. I also only want to see this for more modern games so I have a better idea of what sells now, so I will filter the year of release to be 2001 or later.
This provided very interesting results, as each part of the world has different console preferences. Starting with North America, we can see that Xbox is ahead by a large margin, followed by PlayStation and Nintendo, with PC being minimal and nothing in the other category. I was surprised that PlayStation was not closer to Xbox in North America, as they are always competing with each other whenever they release new consoles. I was also surprised that PC didn’t have more average sales, as PC gaming has been on the rise in the US. For Europe, PlayStation has a large lead over the competition with Nintendo coming in second, Xbox in third, then PC, and nothing for other. It’s interesting that Xbox was far ahead of PlayStation in North America but it’s the opposite in Europe. For Japan, Nintendo is unsurprisingly in the first spot as it is a Japanese company. Interestingly, the other category comes in second for Japan, with consoles like the WonderSwan and Dreamcast making an impact. This is the only area of the world where the other category is significant after 2000. PlayStation (a brand of the Japanese company Sony, although its gaming division is now headquartered in the US) is in third for Japan, followed by PC and then Xbox in last. Lastly, in the other areas of the world, PlayStation is by far the highest seller, with Xbox barely ahead of Nintendo for second and PC behind them, while other has nothing. Overall, it’s interesting that Xbox has a lead in North America (the biggest market for games) over PlayStation and Nintendo, but in other areas of the world it’s a different story. This suggests that PlayStation and Nintendo have broader global appeal than Xbox.
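The four graphs share one calculation: keep games released in 2001 or later, then average each regional sales column within each console type. A plain-Python sketch of that calculation follows, assuming each game is a dict with the column names from the variable table and `None` for missing sales.

```python
from collections import defaultdict

REGIONS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")


def regional_averages(rows, first_year=2001):
    """Mean sales per region and console type for games from first_year on."""
    # sums[region][console] holds [running total, number of games counted]
    sums = {region: defaultdict(lambda: [0.0, 0]) for region in REGIONS}
    for row in rows:
        year = row.get("Year_of_Release")
        if year is None or year < first_year:
            continue  # drop older games and games with an unknown year
        for region in REGIONS:
            value = row.get(region)
            if value is not None:  # missing sales are left out of the mean
                entry = sums[region][row["Console_Type"]]
                entry[0] += value
                entry[1] += 1
    return {
        region: {
            console: total / count
            for console, (total, count) in by_console.items()
        }
        for region, by_console in sums.items()
    }
```

In this sketch a console type with no non-missing sales in a region simply has no entry for that region, so it would have no bar to draw.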
For this visualization, I want to know how user scores differ based
on the console and rating of games. To find this I will make a bar
graph with console on the x axis, average user score on the y axis, and
rating as the fill. I’m excluding the other console category because most
of the games in that category don’t have a rating. Lastly, I’m only
including the four main ratings that are used for games (E, E10+, T, M)
because the other ratings, such as AO or K-A, don’t have enough instances
to give a reliable average.
This shows us that for all console types except PC, E10+ games are the lowest rated by users. This is likely because E10+ games fit a weird niche between kids and teenagers that may not appeal to many people. We also see that games rated T for teen are the highest rated across the board (for Xbox, T and M are virtually the same), followed by games rated M for mature. I expected T and M to be the highest rated games, as they are made for older players rather than kids, which means the gameplay and stories are likely more complex in these types of games. Interestingly, we see that Xbox games are rated the lowest by users while PlayStation and Nintendo are rated similarly high.
Since my original dataset only has games that go up to 2017, I wanted to know more about games that were released more recently. So for my secondary data I scraped Metacritic, the website where the original data was gathered, and got data on the top 300 games of 2021 based on critic score. Sadly, the website sorts games by critic score, so I had to grab the top 300 on that basis. I would rather have had a more random sample of games from the year, but luckily the user scores still have a lot of variability. The data I grabbed from Metacritic contains the name, platform, date of release, user score, and critic score of each game.
I am going to make some changes to this data so that it matches up better with the original data. I will again multiply the user scores by 10 so that they are out of 100 like the critic scores. I will also add a console type column and a year of release column, since there is currently a full date column but no column with just the year.
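A rough sketch of these three changes in plain Python is below. Several details are assumptions because the scraped table is not shown: the field names (`name`, `platform`, `date`, `user_score`), the date format (`"November 5, 2021"`), the platform labels in the hypothetical `SCRAPED_CONSOLE_TYPES` lookup, and the possibility that a user score arrives as non-numeric text.

```python
from datetime import datetime

# Hypothetical lookup from scraped platform labels to console types.
SCRAPED_CONSOLE_TYPES = {
    "Xbox One": "Xbox", "Xbox Series X": "Xbox",
    "PlayStation 4": "PlayStation", "PlayStation 5": "PlayStation",
    "Switch": "Nintendo",
    "PC": "PC",
}


def clean_scraped_row(row, date_format="%B %d, %Y"):
    """Reshape one scraped record to match the columns of the original data."""
    try:
        # Scores are assumed to arrive as text out of 10, e.g. "8.5".
        users_score = round(float(row["user_score"]) * 10, 1)
    except (TypeError, ValueError):
        users_score = None  # non-numeric or missing score
    # Keep only the year from the full release date (assumed format).
    year = datetime.strptime(row["date"], date_format).year
    return {
        "Name": row["name"],
        "Users_Score": users_score,
        "Console_Type": SCRAPED_CONSOLE_TYPES.get(row["platform"], "Other"),
        "Year_of_Release": year,
    }
```

The output keys deliberately reuse the names from the original dataset so the two sets of rows can be stacked later without renaming anything.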
Here is an interactive data table with the new scraped data.
For my last analysis, I will be looking at how user scores have
changed throughout recent years. I want to know if people have been
enjoying games more or less over time. I also want to be able to see
this for each individual console type, so I will facet based on that
variable. I will not include the other category as those games only go
up to around 2001 and I want to use more recent years. There will be
four bar charts that show the average user score for each year from
2008-2017 and our new 2021 data. To do this I will need to combine the
rows of the two datasets. The new dataset I make will have the name,
user score, console type and year of release of each game.
It’s unsurprising that 2021 ended up being the highest year for every console type. The 2021 games are the top 300 of the year by critic score, and while their user scores have more variability, most of them are likely still considered good, so their ratings should be higher than those of the other years, which include every game that came out. There are other interesting trends in this data, particularly how PC, Xbox, and PlayStation were all steadily declining in user score throughout the 2010s (PC and Xbox at a higher rate than PlayStation) while Nintendo never really did. I’m not sure what could have caused this, but Nintendo seems to be the most consistent throughout the years when it comes to user score and PC the most inconsistent.
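The row-combining and averaging behind this last chart can be sketched as follows. This is a plain-Python illustration rather than the code used here; it assumes both datasets are lists of dicts that already share the four column names, with `None` for missing scores.

```python
from collections import defaultdict
from statistics import mean

SHARED_COLUMNS = ("Name", "Users_Score", "Console_Type", "Year_of_Release")
# 2008 through 2017 from the original data, plus the scraped 2021 games.
YEARS = set(range(2008, 2018)) | {2021}


def combine(original_rows, scraped_rows):
    """Stack the two datasets, keeping only the columns they share."""
    stacked = list(original_rows) + list(scraped_rows)
    return [{column: row.get(column) for column in SHARED_COLUMNS} for row in stacked]


def yearly_user_scores(rows, years=YEARS):
    """Mean Users_Score for each (console type, year), excluding 'Other'."""
    scores = defaultdict(list)
    for row in rows:
        if row["Console_Type"] == "Other" or row["Users_Score"] is None:
            continue
        if row["Year_of_Release"] in years:
            key = (row["Console_Type"], row["Year_of_Release"])
            scores[key].append(row["Users_Score"])
    return {key: mean(values) for key, values in scores.items()}
```

Keeping the year in the key makes the selection issue easy to see: the 2021 averages come only from scraped rows, while every other year comes only from the original dataset.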