Steam is the most famous digital distribution service and store for video games of all time. Since its official release in 2003, more than 70,000 games have been released on Steam. Steam has arguably revolutionized the PC gaming industry by introducing useful features such as automated online updates for developers, and cloud storage and online payment for gamers. This article explores the large amount of data available on the Steam platform, which captures the trends and popularity of PC games over the decades.
In this article, we use a dataset from Kaggle that was scraped through the Steam Web API.
## Rows: 71,716
## Columns: 39
## $ AppID <int> 20200, 655370, 1732930, 1355720, 1139950, 1…
## $ Name <chr> "Galactic Bowling", "Train Bandit", "Jolt P…
## $ Release.date <chr> "Oct 21, 2008", "Oct 12, 2017", "Nov 17, 20…
## $ Estimated.owners <chr> "0 - 20000", "0 - 20000", "0 - 20000", "0 -…
## $ Peak.CCU <int> 0, 0, 0, 0, 0, 68, 3, 2, 1, 0, 5, 0, 0, 0, …
## $ Required.age <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Price <dbl> 19.99, 0.99, 4.99, 5.99, 0.00, 0.00, 10.99,…
## $ DLC.count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0…
## $ About.the.game <chr> "Galactic Bowling is an exaggerated and sty…
## $ Supported.languages <chr> "['English']", "['English', 'French', 'Ital…
## $ Full.audio.languages <chr> "[]", "[]", "[]", "[]", "[]", "[]", "[]", "…
## $ Reviews <chr> "", "", "", "", "", "", "", "", "", "", "“N…
## $ Header.image <chr> "https://cdn.akamai.steamstatic.com/steam/a…
## $ Website <chr> "http://www.galacticbowling.net", "http://t…
## $ Support.url <chr> "", "", "", "https://henosisgame.com/", "ht…
## $ Support.email <chr> "", "support@rustymoyher.com", "ramoncampia…
## $ Windows <chr> "True", "True", "True", "True", "True", "Tr…
## $ Mac <chr> "False", "True", "False", "True", "True", "…
## $ Linux <chr> "False", "False", "False", "True", "False",…
## $ Metacritic.score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, …
## $ Metacritic.url <chr> "", "", "", "", "", "", "", "", "", "", "ht…
## $ User.score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Positive <int> 6, 53, 0, 3, 50, 87, 21, 0, 76, 225, 589, 1…
## $ Negative <int> 11, 5, 0, 0, 8, 49, 7, 0, 6, 45, 212, 58, 0…
## $ Score.rank <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Achievements <int> 30, 12, 0, 0, 17, 0, 62, 0, 25, 32, 34, 0, …
## $ Recommendations <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 0,…
## $ Notes <chr> "", "", "", "", "This Game may contain cont…
## $ Average.playtime.forever <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 703, 67, 224, 0,…
## $ Average.playtime.two.weeks <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Median.playtime.forever <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 782, 93, 257, 0,…
## $ Median.playtime.two.weeks <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Developers <chr> "Perpetual FX Creative", "Rusty Moyher", "C…
## $ Publishers <chr> "Perpetual FX Creative", "Wild Rooster", "C…
## $ Categories <chr> "Single-player,Multi-player,Steam Achieveme…
## $ Genres <chr> "Casual,Indie,Sports", "Action,Indie", "Act…
## $ Tags <chr> "Indie,Casual,Sports,Bowling", "Indie,Actio…
## $ Screenshots <chr> "https://cdn.akamai.steamstatic.com/steam/a…
## $ Movies <chr> "http://cdn.akamai.steamstatic.com/steam/ap…
The raw dataset has 71,716 rows and 39 columns. The following preprocessing steps were applied:
Unused columns were dropped. A column counts as unused if it does not provide useful information, such as URLs and contact details (Header.image, Website, Support.url, Support.email, Metacritic.url, Screenshots, Movies); if it has many missing values, recorded as 0 or NA (Metacritic.score, User.score, Score.rank); or if it is unlikely to be relevant to this analysis (Notes, Reviews, Recommendations, Average.playtime.two.weeks, Median.playtime.two.weeks, Required.age, Windows, Mac, Linux).
No exact duplicate rows were found. However, some games appear under multiple Steam App IDs with the same name. For each such game, we keep the entry with the highest peak user count, since the duplicated entries have 0 peak users. For example, Shadow of the Tomb Raider: Definitive Edition appears under 20 App IDs, and only one of them (AppID 750920) has an estimated 1,000,000 - 2,000,000 owners:
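A minimal base-R sketch of this column-dropping step, assuming the raw CSV has already been read into a data frame (called steam here, a hypothetical name) with the column names shown in the glimpse above. The rename of Peak.CCU to Peak.Users is inferred from the final dataset shown later; how the author actually did it is not shown, and drop_unused is a hypothetical helper.

```r
# Columns dropped as unused (About.the.game is kept: it still appears in the final dataset)
unused_cols <- c(
  # URLs and contact details
  "Header.image", "Website", "Support.url", "Support.email",
  "Metacritic.url", "Screenshots", "Movies",
  # Mostly missing (0 or NA)
  "Metacritic.score", "User.score", "Score.rank",
  # Not needed for this analysis
  "Notes", "Reviews", "Recommendations",
  "Average.playtime.two.weeks", "Median.playtime.two.weeks",
  "Required.age", "Windows", "Mac", "Linux"
)

# Hypothetical helper: drop unused columns and rename Peak.CCU to Peak.Users
drop_unused <- function(steam) {
  # setdiff() keeps the original column order and skips names that are absent
  steam <- steam[, setdiff(names(steam), unused_cols), drop = FALSE]
  names(steam)[names(steam) == "Peak.CCU"] <- "Peak.Users"
  steam
}

# Toy example built from the first row of the glimpse above
raw <- data.frame(
  AppID = 20200L, Name = "Galactic Bowling", Peak.CCU = 0L,
  Website = "http://www.galacticbowling.net", Windows = "True"
)
clean <- drop_unused(raw)
names(clean)  # "AppID" "Name" "Peak.Users"
```

Selecting by setdiff() rather than listing the columns to keep means any column not named in the drop list survives by default.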
## AppID Name Estimated.owners
## 1 849200 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 2 890030 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 3 849178 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 4 849163 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 5 849166 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 6 849177 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 7 849186 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 8 750920 Shadow of the Tomb Raider: Definitive Edition 1000000 - 2000000
## 9 849161 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 10 849165 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 11 890031 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 12 849168 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 13 849162 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 14 890032 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 15 849164 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 16 849167 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 17 849179 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 18 849160 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 19 905320 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
## 20 849176 Shadow of the Tomb Raider: Definitive Edition 0 - 20000
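The de-duplication rule can be sketched in base R: sort by name and by peak users in descending order, then keep the first row for each name. The helper name is hypothetical, and the peak values in the toy data are made up for illustration (the text only states that the duplicate entries have 0 peak users).

```r
# Hypothetical helper: keep one row per game name, the one with the highest peak users
dedupe_by_name <- function(steam, peak_col = "Peak.CCU") {
  # Sort by name, then by peak users in descending order
  steam <- steam[order(steam$Name, -steam[[peak_col]]), , drop = FALSE]
  # duplicated() flags every repeat after the first occurrence of a name
  steam <- steam[!duplicated(steam$Name), , drop = FALSE]
  rownames(steam) <- NULL
  steam
}

dupes <- data.frame(
  AppID    = c(849200L, 750920L, 890030L),
  Name     = rep("Shadow of the Tomb Raider: Definitive Edition", 3),
  Peak.CCU = c(0L, 1234L, 0L)  # illustrative values, not from the dataset
)
kept <- dedupe_by_name(dupes)
kept$AppID  # 750920
```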
Converting Release.date to the Date type lets us extract each game's release year and decade.
Some columns, such as Supported.languages and Full.audio.languages, store their values as list-like strings (for example "['English']"), and some entries also contain HTML remnants (such as <), so they need to be cleaned. We also added columns that count how many supported languages, full-audio languages, categories, genres, tags, developers and publishers each game has. Finally, columns that hold multiple values in a single entry (languages, audio languages, categories, genres and tags) are split and melted into long format, with one row per game and value, and stored in separate data frames for plotting.
After applying all of these steps, this is the final preprocessed dataset:
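A base-R sketch of the date conversion, assuming release dates follow the "Oct 21, 2008" pattern seen in the raw glimpse. The helper name is hypothetical. Because %b (abbreviated month name) depends on the locale, the sketch temporarily switches LC_TIME to "C"; values that do not match the format become NA.

```r
# Hypothetical helper: parse "Oct 21, 2008"-style dates, then derive Year and Decade
add_year_decade <- function(steam) {
  # %b is locale-dependent, so parse English month names in the C locale
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  steam$Release.date <- as.Date(steam$Release.date, format = "%b %d, %Y")
  steam$Year <- as.numeric(format(steam$Release.date, "%Y"))
  # 2008 -> "2000s"; unparseable dates stay NA instead of becoming "NAs"
  steam$Decade <- ifelse(is.na(steam$Year), NA_character_,
                         paste0(floor(steam$Year / 10) * 10, "s"))
  steam
}

dates <- data.frame(Release.date = c("Oct 21, 2008", "Oct 12, 2017", "Nov 17, 2021"))
dated <- add_year_decade(dates)
dated$Decade  # "2000s" "2010s" "2020s"
```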
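The cleaning, counting and melting steps above can be sketched in base R. All three helpers are hypothetical, the toy rows are shortened versions of the ones in the glimpse, and the HTML pattern assumes the remnants look like tags. The sketch also assumes missing values are stored as empty strings (as in the raw data), not NA.

```r
# Hypothetical helper: "['English', 'French']" -> "English, French"
clean_list_string <- function(x) {
  x <- gsub("<[^>]*>", "", x)    # assumption: HTML remnants look like tags
  x <- gsub("\\[|\\]|'", "", x)  # drop list brackets and quotes
  trimws(x)
}

# Count comma-separated values; an empty string counts as 0
count_values <- function(x, sep = ",") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) sum(nzchar(trimws(p))), numeric(1))
}

# Melt a multi-valued column into long format: one row per game and value
melt_values <- function(steam, col, sep = ",") {
  parts <- lapply(strsplit(steam[[col]], sep, fixed = TRUE), trimws)
  parts <- lapply(parts, function(p) p[nzchar(p)])
  data.frame(
    AppID = rep(steam$AppID, lengths(parts)),
    Value = unlist(parts),
    stringsAsFactors = FALSE
  )
}

games <- data.frame(
  AppID = c(20200L, 655370L),
  Supported.languages = c("['English']", "['English', 'French', 'Italian']"),
  Genres = c("Casual,Indie,Sports", "Action,Indie"),
  stringsAsFactors = FALSE
)
games$Supported.languages <- clean_list_string(games$Supported.languages)
games$count_lang   <- count_values(games$Supported.languages)
games$count_genres <- count_values(games$Genres)
genres_long <- melt_values(games, "Genres")
```

Keeping the long tables separate from the one-row-per-game data frame avoids repeating every other column for each genre or language.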
## Rows: 63,940
## Columns: 29
## $ AppID <int> 20200, 655370, 1732930, 1355720, 1139950, 146…
## $ Name <chr> "Galactic Bowling", "Train Bandit", "Jolt Pro…
## $ Release.date <date> 2008-10-21, 2017-10-12, 2021-11-17, 2020-07-…
## $ Estimated.owners <chr> "0-20k", "0-20k", "0-20k", "0-20k", "0-20k", …
## $ Peak.Users <int> 0, 0, 0, 0, 0, 68, 3, 2, 1, 0, 5, 0, 0, 0, 3,…
## $ Price <dbl> 19.99, 0.99, 4.99, 5.99, 0.00, 0.00, 10.99, 9…
## $ DLC.count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ About.the.game <chr> "Galactic Bowling is an exaggerated and styli…
## $ Supported.languages <chr> "English", "English, French, Italian, German,…
## $ Full.audio.languages <chr> "", "", "", "", "", "", "", "English, German"…
## $ Positive <int> 6, 53, 0, 3, 50, 87, 21, 0, 76, 225, 589, 147…
## $ Negative <int> 11, 5, 0, 0, 8, 49, 7, 0, 6, 45, 212, 58, 0, …
## $ Achievements <int> 30, 12, 0, 0, 17, 0, 62, 0, 25, 32, 34, 0, 25…
## $ Average.playtime.forever <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 703, 67, 224, 0, 1…
## $ Median.playtime.forever <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 782, 93, 257, 0, 1…
## $ Developers <chr> "Perpetual FX Creative", "Rusty Moyher", "Cam…
## $ Publishers <chr> "Perpetual FX Creative", "Wild Rooster", "Cam…
## $ Categories <chr> "Single-player,Multi-player,Steam Achievement…
## $ Genres <chr> "Casual,Indie,Sports", "Action,Indie", "Actio…
## $ Tags <chr> "Indie,Casual,Sports,Bowling", "Indie,Action,…
## $ Year <dbl> 2008, 2017, 2021, 2020, 2020, 2021, 2022, 202…
## $ Decade <chr> "2000s", "2010s", "2020s", "2020s", "2020s", …
## $ count_lang <dbl> 1, 10, 2, 11, 2, 1, 3, 2, 10, 9, 5, 1, 1, 1, …
## $ count_audio <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 1, 0, 0, 0, …
## $ count_cat <dbl> 4, 7, 1, 2, 2, 8, 3, 2, 3, 5, 5, 6, 2, 3, 3, …
## $ count_genres <dbl> 3, 2, 4, 3, 2, 6, 2, 1, 4, 3, 2, 1, 2, 2, 6, …
## $ count_tags <dbl> 4, 20, 0, 19, 6, 20, 20, 0, 6, 6, 20, 6, 6, 2…
## $ count_devs <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ count_pubs <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, …
With the dataset preprocessed, we now look at the top 10 most popular games by peak concurrent users, as well as the top 10 publishers and developers behind them.
The top 10 is a mix of single-player and multiplayer games, and some of them, such as GTA V, Sons of the Forest and COD MW 2, offer both a single-player campaign and a multiplayer or co-op mode. Some quick descriptions of the games:
Hogwarts Legacy, Sons of the Forest and Resident Evil 4, all released in 2023, perform strongly as single-player-focused games compared with the multiplayer games in this top 10, which were released earlier.
The top 10 publishers and developers mostly appear in the same order as the top 10 most popular games. The companies behind the most popular games also tend to act as both publisher and developer.
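A base-R sketch of these rankings, assuming the preprocessed data frame has Name, Publishers and Peak.Users columns. The article does not say how the publisher and developer rankings were computed; summing Peak.Users per company is an assumption, and multi-company strings (count_pubs > 1) are treated as a single label here for simplicity. The toy rows use made-up names and numbers.

```r
# Top n games by peak concurrent users
top_games <- function(steam, n = 10) {
  ranked <- steam[order(-steam$Peak.Users), c("Name", "Peak.Users")]
  head(ranked, n)
}

# Hypothetical ranking: total Peak.Users per company in column `col`
top_companies <- function(steam, col = "Publishers", n = 10) {
  totals <- tapply(steam$Peak.Users, steam[[col]], sum)
  totals <- data.frame(Company = names(totals), Peak.Users = as.vector(totals))
  head(totals[order(-totals$Peak.Users), ], n)
}

toy <- data.frame(
  Name       = c("Game A", "Game B", "Game C"),
  Publishers = c("Studio X", "Studio Y", "Studio X"),
  Peak.Users = c(500L, 900L, 100L)
)
top_games(toy, n = 2)$Name  # "Game B" "Game A"
top_companies(toy, n = 2)
```

Calling top_companies(steam, col = "Developers") gives the same ranking for developers.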
This Steam dataset contains the release dates of games on Steam from 1990 up to the time of writing (October 2023), so we can analyze yearly trends.
The number of games released each year has been increasing, especially since 2013. The count may drop in 2023: so far, the number of games released in 2023 is only about 20% of the 2022 count, although the year is not over yet.
The average game price (the total price of all games released in a year divided by the number of games released that year) rises and falls over time. The decline in average price from 2013 onward might be attributed to the growing number of released games. Another important turning point is 2019: from then on, the average price increased even though the number of released games also kept increasing.
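A base-R sketch of the yearly count and of the year-over-year comparison, assuming a Year column as created during preprocessing. The helpers are hypothetical, and the toy data is constructed so that 2023 has 20% of the 2022 count; it is not the real data.

```r
# Hypothetical helper: number of games released per year
games_per_year <- function(steam) {
  counts <- table(steam$Year)
  data.frame(Year = as.numeric(names(counts)), n = as.integer(counts))
}

# Ratio of one year's count to another's, e.g. 2023 relative to 2022
year_ratio <- function(per_year, year, base_year) {
  per_year$n[per_year$Year == year] / per_year$n[per_year$Year == base_year]
}

toy <- data.frame(Year = c(2022, 2022, 2022, 2022, 2022, 2023))
per_year <- games_per_year(toy)
year_ratio(per_year, 2023, 2022)  # 0.2
```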
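The average-price definition above translates directly into a per-year aggregate. This is a sketch with a hypothetical helper name and toy prices; free games count with a price of 0, since the definition covers every game released that year. Note that the formula interface of aggregate() drops rows with NA by default.

```r
# Hypothetical helper: average price per year, as defined above:
# sum of the prices of all games released that year / number of games released
avg_price_per_year <- function(steam) {
  aggregate(Price ~ Year, data = steam, FUN = function(p) sum(p) / length(p))
}

toy <- data.frame(Year = c(2019, 2019, 2020), Price = c(0.00, 9.99, 19.99))
avg <- avg_price_per_year(toy)
avg$Price  # 4.995 and 19.99
```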
As it turns out, the rise in paid game releases and the drop in free game releases in 2020 contributed to the higher average price of released games.
Now we examine the trends in genres, game types (categories) and supported languages. We start with the top 5 genres of each year.
The plot shows that the Indie genre has been trending upward since 2013, and that Strategy drops out of the top 5 from 2015 onward, replaced by Simulation. To get more detail, we take a closer look with a line plot.
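A base-R sketch of the free-versus-paid breakdown per year, treating a game as free when Price is 0. The helper name and toy data are hypothetical. Fixing the factor levels guarantees both columns exist even if a subset has no free games.

```r
# Hypothetical helper: number of free (Price == 0) and paid games per year
free_vs_paid <- function(steam) {
  type <- factor(ifelse(steam$Price > 0, "Paid", "Free"), levels = c("Free", "Paid"))
  counts <- as.data.frame.matrix(table(steam$Year, type))
  # A rising paid share pushes the yearly average price up
  counts$Paid.share <- counts$Paid / (counts$Free + counts$Paid)
  counts
}

toy <- data.frame(
  Year  = c(2019, 2019, 2020, 2020, 2020),
  Price = c(0.00, 4.99, 0.00, 9.99, 14.99)
)
fp <- free_vs_paid(toy)
fp
```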
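A base-R sketch of the "top 5 genres per year" computation, assuming a long table with one row per game and genre and columns AppID, Year and Genre (the helper name and toy rows are hypothetical). Ties are broken alphabetically.

```r
# Hypothetical helper: top n genres per year by number of games
top_genres_per_year <- function(genres_long, n = 5) {
  counts <- aggregate(AppID ~ Year + Genre, data = genres_long, FUN = length)
  names(counts)[names(counts) == "AppID"] <- "Games"
  per_year <- split(counts, counts$Year)
  top <- do.call(rbind, lapply(per_year, function(d) {
    head(d[order(-d$Games, d$Genre), ], n)
  }))
  rownames(top) <- NULL
  top
}

genres_long <- data.frame(
  AppID = c(1, 1, 2, 3, 3, 4),
  Year  = c(2014, 2014, 2014, 2015, 2015, 2015),
  Genre = c("Indie", "Action", "Indie", "Indie", "Simulation", "Simulation")
)
top <- top_genres_per_year(genres_long, n = 1)
top
```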
Looking more closely from 2012 onward, the Casual genre has also grown since 2014 and may be converging with the Action and Adventure genres, two genres that complement each other in many games. Simulation replacing Strategy might suggest that the industry as a whole prefers to make simulation games, which give players a more realistic experience, over deep or complex strategy games. Next, we look at game types, or categories.
Single-player has always been the dominant game mode. Interestingly, the PvP category only appears from 2017, and the multi-player category has never grown significantly and has been declining since 2020. This might be related to the troubles of live-service games, another form of multiplayer game, many of which were shut down in 2023.
As expected, English dominates the supported languages: in 2022, 9,687 of the 10,205 games released (about 95%) supported English, while the other six languages reached only around 2,000 games. Another interesting point is that support for Chinese and Japanese started to grow in 2017 and 2019, which might indicate that Chinese and Japanese games are becoming more international, and that more games are adding these languages to reach the Chinese and Japanese markets.
Full audio support follows the same trend as supported languages, but only about 20-25% of the games that support a given language also offer full audio in it. In 2022, full audio support for these six languages declined even though more games were released that year. So although full audio support has been trending upward overall, the 2022 decline might indicate that developers are becoming less inclined to include full audio for foreign languages.
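A base-R sketch of the per-year language share, assuming Supported.languages has been cleaned into comma-separated names and a Year column exists. The helpers and toy rows are hypothetical; the last line reproduces the 2022 English share quoted above.

```r
# Hypothetical helper: does each comma-separated language list contain `language`?
supports_language <- function(lang_string, language) {
  vapply(strsplit(lang_string, ",", fixed = TRUE),
         function(p) language %in% trimws(p), logical(1))
}

# Per year: games released, games supporting `language`, and their share
language_share_by_year <- function(steam, language) {
  has <- supports_language(steam$Supported.languages, language)
  games <- table(steam$Year)
  with_lang <- tapply(has, steam$Year, sum)
  data.frame(
    Year  = as.numeric(names(games)),
    Games = as.integer(games),
    With  = as.integer(with_lang),
    Share = as.integer(with_lang) / as.integer(games)
  )
}

toy <- data.frame(
  Year = c(2017, 2022, 2022, 2022),
  Supported.languages = c("English", "English", "English, Japanese", "Simplified Chinese")
)
share <- language_share_by_year(toy, "English")
share$Share  # 1 and 2/3

# The 2022 figure quoted above: 9,687 of 10,205 games support English
round(9687 / 10205, 3)  # 0.949
```

Matching whole list items with %in% avoids false hits that a substring search such as grepl() could produce.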
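One way to express the full-audio ratio is sketched below: per year, among games that support a language as text, the share that also list it under Full.audio.languages. The helper name and toy rows are hypothetical, and the article does not say exactly how its 20-25% figure was computed.

```r
# Hypothetical helper: per year, share of games supporting `language` (as text)
# that also list it under Full.audio.languages. Years with no such games give NaN.
full_audio_ratio <- function(steam, language) {
  in_list <- function(x) {
    vapply(strsplit(x, ",", fixed = TRUE),
           function(p) language %in% trimws(p), logical(1))
  }
  text  <- in_list(steam$Supported.languages)
  audio <- in_list(steam$Full.audio.languages) & text
  tapply(audio, steam$Year, sum) / tapply(text, steam$Year, sum)
}

toy <- data.frame(
  Year = c(2021, 2021, 2021, 2021),
  Supported.languages  = c("English", "English, German", "English", "English"),
  Full.audio.languages = c("English", "", "", "")
)
r <- full_audio_ratio(toy, "English")
r  # 0.25 for 2021
```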
The main takeaways from the analyses that we have done after applying the preprocessing steps: