1. Introduction

Steam is the best-known digital distribution service and store for PC games. Since its official launch in 2003, more than 70,000 games have been released on Steam. Steam has arguably revolutionized the PC games industry by introducing many useful features, such as automated online updates for developers and cloud storage and online payment for gamers. This article explores the large amount of data on the Steam platform to trace the trends and popularity of PC games over the decades.

2. Data Preprocessing

In this article, we use a Kaggle dataset that was scraped through the Steam Web API. The raw data looks like this:

## Rows: 71,716
## Columns: 39
## $ AppID                      <int> 20200, 655370, 1732930, 1355720, 1139950, 1…
## $ Name                       <chr> "Galactic Bowling", "Train Bandit", "Jolt P…
## $ Release.date               <chr> "Oct 21, 2008", "Oct 12, 2017", "Nov 17, 20…
## $ Estimated.owners           <chr> "0 - 20000", "0 - 20000", "0 - 20000", "0 -…
## $ Peak.CCU                   <int> 0, 0, 0, 0, 0, 68, 3, 2, 1, 0, 5, 0, 0, 0, …
## $ Required.age               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Price                      <dbl> 19.99, 0.99, 4.99, 5.99, 0.00, 0.00, 10.99,…
## $ DLC.count                  <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0…
## $ About.the.game             <chr> "Galactic Bowling is an exaggerated and sty…
## $ Supported.languages        <chr> "['English']", "['English', 'French', 'Ital…
## $ Full.audio.languages       <chr> "[]", "[]", "[]", "[]", "[]", "[]", "[]", "…
## $ Reviews                    <chr> "", "", "", "", "", "", "", "", "", "", "“N…
## $ Header.image               <chr> "https://cdn.akamai.steamstatic.com/steam/a…
## $ Website                    <chr> "http://www.galacticbowling.net", "http://t…
## $ Support.url                <chr> "", "", "", "https://henosisgame.com/", "ht…
## $ Support.email              <chr> "", "support@rustymoyher.com", "ramoncampia…
## $ Windows                    <chr> "True", "True", "True", "True", "True", "Tr…
## $ Mac                        <chr> "False", "True", "False", "True", "True", "…
## $ Linux                      <chr> "False", "False", "False", "True", "False",…
## $ Metacritic.score           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, …
## $ Metacritic.url             <chr> "", "", "", "", "", "", "", "", "", "", "ht…
## $ User.score                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Positive                   <int> 6, 53, 0, 3, 50, 87, 21, 0, 76, 225, 589, 1…
## $ Negative                   <int> 11, 5, 0, 0, 8, 49, 7, 0, 6, 45, 212, 58, 0…
## $ Score.rank                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Achievements               <int> 30, 12, 0, 0, 17, 0, 62, 0, 25, 32, 34, 0, …
## $ Recommendations            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 0,…
## $ Notes                      <chr> "", "", "", "", "This Game may contain cont…
## $ Average.playtime.forever   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 703, 67, 224, 0,…
## $ Average.playtime.two.weeks <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Median.playtime.forever    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 782, 93, 257, 0,…
## $ Median.playtime.two.weeks  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Developers                 <chr> "Perpetual FX Creative", "Rusty Moyher", "C…
## $ Publishers                 <chr> "Perpetual FX Creative", "Wild Rooster", "C…
## $ Categories                 <chr> "Single-player,Multi-player,Steam Achieveme…
## $ Genres                     <chr> "Casual,Indie,Sports", "Action,Indie", "Act…
## $ Tags                       <chr> "Indie,Casual,Sports,Bowling", "Indie,Actio…
## $ Screenshots                <chr> "https://cdn.akamai.steamstatic.com/steam/a…
## $ Movies                     <chr> "http://cdn.akamai.steamstatic.com/steam/ap…

The raw dataset has 39 columns and 71,716 rows. The following preprocessing steps were applied:

2.1. Removing unused columns

Unused columns are those that do not provide useful information for this analysis: URLs and contact details (Header.image, Website, Support.url, Support.email, Metacritic.url, Screenshots, Movies), columns that are mostly missing or zero (Metacritic.score, User.score, Score.rank), and columns that are not related to the analysis we will do (Notes, Reviews, Recommendations, Average.playtime.two.weeks, Median.playtime.two.weeks, Required.age, Windows, Mac, Linux).
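A minimal Python sketch of this step follows; it is not the author's code (the output above appears to come from R). It assumes each record is a dict keyed by the column names from the glimpse output, and it keeps About.the.game, since that column still appears in the final dataset further down.

```python
# Columns treated as unused in section 2.1 (names follow the glimpse output above).
UNUSED_COLUMNS = {
    # URLs and contact details
    "Header.image", "Website", "Support.url", "Support.email",
    "Metacritic.url", "Screenshots", "Movies",
    # Mostly missing (0 or NA) values
    "Metacritic.score", "User.score", "Score.rank",
    # Not needed for this analysis
    "Notes", "Reviews", "Recommendations",
    "Average.playtime.two.weeks", "Median.playtime.two.weeks",
    "Required.age", "Windows", "Mac", "Linux",
}


def drop_unused(row: dict) -> dict:
    """Return a copy of one record without the unused columns."""
    return {key: value for key, value in row.items() if key not in UNUSED_COLUMNS}


# Partial record built from the first row of the raw glimpse output.
record = {
    "AppID": 20200,
    "Name": "Galactic Bowling",
    "Website": "http://www.galacticbowling.net",
    "Windows": "True",
}
print(drop_unused(record))  # {'AppID': 20200, 'Name': 'Galactic Bowling'}
```

Dropping these 19 columns leaves 20 of the original 39; adding Year, Decade and the seven count columns later gives the 29 columns of the final dataset.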

2.2. Removing duplicate and zero-owner game entries

A previous check found no fully duplicated rows; however, some games have multiple Steam App IDs under the same name. For these games we keep the entry with the highest peak concurrent user count (Peak.CCU), since the duplicate entries have 0 peak users. For example, Shadow of the Tomb Raider: Definitive Edition appears under 20 App IDs, of which only AppID 750920 has a non-zero owner estimate:

##     AppID                                          Name  Estimated.owners
## 1  849200 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 2  890030 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 3  849178 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 4  849163 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 5  849166 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 6  849177 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 7  849186 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 8  750920 Shadow of the Tomb Raider: Definitive Edition 1000000 - 2000000
## 9  849161 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 10 849165 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 11 890031 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 12 849168 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 13 849162 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 14 890032 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 15 849164 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 16 849167 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 17 849179 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 18 849160 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 19 905320 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
## 20 849176 Shadow of the Tomb Raider: Definitive Edition         0 - 20000
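The de-duplication rule can be sketched in Python as follows. The source does not say how zero-owner games are identified, so `has_owners` assumes (hypothetically) that they carry an Estimated.owners value of "0 - 0"; the peak value of 100 for AppID 750920 and the second game are invented for illustration.

```python
# ASSUMPTION: zero-owner games are recorded as "0 - 0" in Estimated.owners.
ZERO_OWNERS = "0 - 0"


def dedupe_by_name(rows: list[dict], score_key: str = "Peak.CCU") -> list[dict]:
    """Keep one row per game name: the one with the highest score_key value.

    Ties keep the row seen first, and names keep their order of first appearance.
    """
    best: dict[str, dict] = {}
    for row in rows:
        kept = best.get(row["Name"])
        if kept is None or row[score_key] > kept[score_key]:
            best[row["Name"]] = row
    return list(best.values())


def has_owners(row: dict) -> bool:
    """True unless the row carries the (assumed) zero-owner marker."""
    return row["Estimated.owners"] != ZERO_OWNERS


TOMB_RAIDER = "Shadow of the Tomb Raider: Definitive Edition"
rows = [
    {"AppID": 849200, "Name": TOMB_RAIDER, "Peak.CCU": 0, "Estimated.owners": "0 - 20000"},
    # Hypothetical peak value; the real Peak.CCU of AppID 750920 is not shown above.
    {"AppID": 750920, "Name": TOMB_RAIDER, "Peak.CCU": 100, "Estimated.owners": "1000000 - 2000000"},
    # Hypothetical game with no owners.
    {"AppID": 1, "Name": "Hypothetical Game", "Peak.CCU": 0, "Estimated.owners": "0 - 0"},
]
cleaned = [row for row in dedupe_by_name(rows) if has_owners(row)]
print([row["AppID"] for row in cleaned])  # [750920]
```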

2.3. Changing the data type of Release.date and creating Year and Decade columns

By converting the Release.date column from text (e.g. "Oct 21, 2008") into a date format, we can extract the year and the decade of each release.
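A small Python sketch of the date conversion, assuming every Release.date value follows the "Oct 21, 2008" pattern shown in the raw glimpse; values in any other format would raise a ValueError and need separate handling. The decade label format ("2000s") mirrors the Decade column of the final dataset.

```python
from datetime import date, datetime


def parse_release_date(text: str) -> date:
    """Parse Steam-style release dates such as "Oct 21, 2008" (English month abbreviations)."""
    return datetime.strptime(text, "%b %d, %Y").date()


def decade_label(year: int) -> str:
    """Map a year to a decade label, e.g. 2008 -> "2000s"."""
    return f"{year // 10 * 10}s"


for raw in ["Oct 21, 2008", "Oct 12, 2017", "Nov 17, 2021"]:
    released = parse_release_date(raw)
    print(released, released.year, decade_label(released.year))
# 2008-10-21 2008 2000s
# 2017-10-12 2017 2010s
# 2021-11-17 2021 2020s
```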

2.4. Rewording column values and adding count columns

Some columns, such as Supported.languages and Full.audio.languages, are stored as list-like strings (for example "['English', 'French']") and contain HTML remnants (such as the escaped entity &lt;), so their values need to be reworded into plain, comma-separated text. We also added columns that count how many languages, audio languages, categories, genres, tags, developers and publishers each game has. Finally, the multi-valued columns (languages, audio languages, categories, genres and tags) are melted so that each value gets its own row, and the results are stored in separate data frames for plotting.
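The rewording, counting and melting steps can be sketched in Python as below. The source does not specify the exact cleaning rules, so this assumes that list-like strings such as "['English']" parse as Python literals and that HTML remnants are decoded with `html.unescape` and then stripped of tags; the escaped-markup input in the last line is hypothetical. The sample record uses the Galactic Bowling values from the raw glimpse, and its counts match the first row of the final dataset.

```python
import ast
import html
import re
from typing import Callable

TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_text(value: str) -> str:
    """Decode HTML entities such as &lt; and remove the tags they produce."""
    return TAG_PATTERN.sub("", html.unescape(value)).strip()


def parse_list_string(raw: str) -> list[str]:
    """Parse a list-like string such as "['English', 'French']" into clean items."""
    if not raw.strip():
        return []
    # Unlike eval, literal_eval only evaluates Python literals.
    return [clean_text(item) for item in ast.literal_eval(raw)]


def split_comma_field(raw: str) -> list[str]:
    """Split a comma-separated field such as Genres or Tags; "" gives []."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def melt(rows: list[dict], column: str, splitter: Callable[[str], list[str]]) -> list[tuple[int, str]]:
    """Long format: one (AppID, value) pair per item of a multi-valued column."""
    return [(row["AppID"], item) for row in rows for item in splitter(row[column])]


game = {
    "AppID": 20200,
    "Name": "Galactic Bowling",
    "Supported.languages": "['English']",
    "Full.audio.languages": "[]",
    "Genres": "Casual,Indie,Sports",
    "Tags": "Indie,Casual,Sports,Bowling",
}
genres_long = melt([game], "Genres", split_comma_field)

languages = parse_list_string(game["Supported.languages"])
game["Supported.languages"] = ", ".join(languages)
game["count_lang"] = len(languages)
game["count_audio"] = len(parse_list_string(game["Full.audio.languages"]))
game["count_genres"] = len(split_comma_field(game["Genres"]))
game["count_tags"] = len(split_comma_field(game["Tags"]))

print(game["count_lang"], game["count_audio"], game["count_genres"], game["count_tags"])  # 1 0 3 4
print(genres_long)  # [(20200, 'Casual'), (20200, 'Indie'), (20200, 'Sports')]
print(clean_text("English&lt;strong&gt;*&lt;/strong&gt;"))  # hypothetical input -> English*
```

Keeping the melted rows as (AppID, value) pairs makes it easy to join them back to the main table by AppID when plotting.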

After all of these steps, the preprocessed dataset looks like this (note that Peak.CCU has been renamed Peak.Users and the owner ranges shortened, e.g. "0 - 20000" to "0-20k"):

## Rows: 63,940
## Columns: 29
## $ AppID                    <int> 20200, 655370, 1732930, 1355720, 1139950, 146…
## $ Name                     <chr> "Galactic Bowling", "Train Bandit", "Jolt Pro…
## $ Release.date             <date> 2008-10-21, 2017-10-12, 2021-11-17, 2020-07-…
## $ Estimated.owners         <chr> "0-20k", "0-20k", "0-20k", "0-20k", "0-20k", …
## $ Peak.Users               <int> 0, 0, 0, 0, 0, 68, 3, 2, 1, 0, 5, 0, 0, 0, 3,…
## $ Price                    <dbl> 19.99, 0.99, 4.99, 5.99, 0.00, 0.00, 10.99, 9…
## $ DLC.count                <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ About.the.game           <chr> "Galactic Bowling is an exaggerated and styli…
## $ Supported.languages      <chr> "English", "English, French, Italian, German,…
## $ Full.audio.languages     <chr> "", "", "", "", "", "", "", "English, German"…
## $ Positive                 <int> 6, 53, 0, 3, 50, 87, 21, 0, 76, 225, 589, 147…
## $ Negative                 <int> 11, 5, 0, 0, 8, 49, 7, 0, 6, 45, 212, 58, 0, …
## $ Achievements             <int> 30, 12, 0, 0, 17, 0, 62, 0, 25, 32, 34, 0, 25…
## $ Average.playtime.forever <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 703, 67, 224, 0, 1…
## $ Median.playtime.forever  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 782, 93, 257, 0, 1…
## $ Developers               <chr> "Perpetual FX Creative", "Rusty Moyher", "Cam…
## $ Publishers               <chr> "Perpetual FX Creative", "Wild Rooster", "Cam…
## $ Categories               <chr> "Single-player,Multi-player,Steam Achievement…
## $ Genres                   <chr> "Casual,Indie,Sports", "Action,Indie", "Actio…
## $ Tags                     <chr> "Indie,Casual,Sports,Bowling", "Indie,Action,…
## $ Year                     <dbl> 2008, 2017, 2021, 2020, 2020, 2021, 2022, 202…
## $ Decade                   <chr> "2000s", "2010s", "2020s", "2020s", "2020s", …
## $ count_lang               <dbl> 1, 10, 2, 11, 2, 1, 3, 2, 10, 9, 5, 1, 1, 1, …
## $ count_audio              <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 1, 0, 0, 0, …
## $ count_cat                <dbl> 4, 7, 1, 2, 2, 8, 3, 2, 3, 5, 5, 6, 2, 3, 3, …
## $ count_genres             <dbl> 3, 2, 4, 3, 2, 6, 2, 1, 4, 3, 2, 1, 2, 2, 6, …
## $ count_tags               <dbl> 4, 20, 0, 19, 6, 20, 20, 0, 6, 6, 20, 6, 6, 2…
## $ count_devs               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ count_pubs               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, …

5. Conclusion

The main takeaways from the analyses of the preprocessed data are:

  • Three games released in 2023 are already popular enough to make the global top 10 most popular games, outperforming many games released in earlier years.
  • Most popular games have the same developers and publishers.
  • The number of games released each year grew steadily up to 2022. By October 2023, only 2,233 games had been released, compared with 10,205 in the whole of 2022; even allowing for the incomplete year, this suggests that the number of new releases may be decreasing in 2023.
  • The average price of games decreased until 2021, after which it increased because fewer new free games and more new paid games were released.
  • Indie games have been on an increasing trend since 2012, and the Casual genre since 2014.
  • Since 2016, the Simulation genre has overtaken the Strategy genre in the top 5, which might indicate that the game industry has been shifting its new releases toward simpler, more casual games rather than complex strategy games.
  • Single-player games have always been the strongest category, followed by multi-player or PvP games. The latter were stable until 2021 and have declined since, which might indicate that future games will be more focused on single-player.
  • Support for languages other than English trended upward until 2022, when it started to decline slightly.
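Several of these takeaways (yearly release counts, average price per year, and the shift between free and paid games) rest on grouping games by Year. A minimal sketch of that aggregation in Python, assuming rows shaped like the final dataset with numeric Year and Price fields; the sample rows are hypothetical and this is not the author's plotting code.

```python
from collections import defaultdict


def yearly_summary(rows: list[dict]) -> dict[int, tuple[int, float, float]]:
    """Per release year: (number of games, average price, share of free games)."""
    counts: dict[int, int] = defaultdict(int)
    price_totals: dict[int, float] = defaultdict(float)
    free_counts: dict[int, int] = defaultdict(int)
    for row in rows:
        year = int(row["Year"])  # Year is stored as a number (e.g. 2021.0)
        counts[year] += 1
        price_totals[year] += row["Price"]
        if row["Price"] == 0:
            free_counts[year] += 1
    return {
        year: (counts[year], price_totals[year] / counts[year], free_counts[year] / counts[year])
        for year in sorted(counts)
    }


# Hypothetical rows shaped like the final dataset.
sample = [
    {"Year": 2021, "Price": 0.00},
    {"Year": 2021, "Price": 10.00},
    {"Year": 2022, "Price": 5.00},
]
for year, (n_games, avg_price, free_share) in yearly_summary(sample).items():
    print(year, n_games, round(avg_price, 2), free_share)
# 2021 2 5.0 0.5
# 2022 1 5.0 0.0
```

Tracking the free-game share alongside the average price helps separate a real change in prices from a shift in the mix of free and paid releases.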