Introduction

Do you know much about the history of video games?

You might have played Snake or a Space-wars game on your Nokia phone ten years ago. But have you ever wondered when video games first appeared?

Video games date back to 1952, when the British professor A.S. Douglas created OXO, a version of Tic-Tac-Toe, as part of his doctoral dissertation. In 1958, William Higinbotham created Tennis for Two at the Brookhaven National Laboratory in Upton, New York, US.

Video Game Sales in The World

I retrieved the data from Kaggle.
Input the Data

# Importing the Data
vgsales <- read.csv("data_input/vgsales.csv")
vgsales
# Inspecting the Data
str(vgsales)
## 'data.frame':    16598 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : chr  "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
##  $ Platform    : chr  "Wii" "NES" "Wii" "Wii" ...
##  $ Year        : chr  "2006" "1985" "2008" "2009" ...
##  $ Genre       : chr  "Sports" "Platform" "Racing" "Sports" ...
##  $ Publisher   : chr  "Nintendo" "Nintendo" "Nintendo" "Nintendo" ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...

The data consists of 11 columns: 5 are character columns (Name, Platform, Year, Genre, Publisher) and the other 6 are numeric (Rank is an integer, and the five sales columns are doubles).
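The column-type tally can be computed rather than read off the str() output. This is a minimal sketch on a small made-up data frame (toy and its values are hypothetical stand-ins for vgsales):

```r
# Toy stand-in for vgsales; the values are invented for illustration only.
toy <- data.frame(
  Rank = 1:3,
  Name = c("Game A", "Game B", "Game C"),
  Year = c("2006", "N/A", "2008"),
  Global_Sales = c(1.5, 0.2, 0.7),
  stringsAsFactors = FALSE
)

# Class of every column, then the number of columns per class.
col_classes <- sapply(toy, class)
type_counts <- table(col_classes)
print(type_counts)
```

Running the same two lines on vgsales instead of toy should give the tally for the real data.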

# Dimension of data
dim(vgsales)
## [1] 16598    11

The data has 16,598 rows.

Data Exploration

head(vgsales, n = 10)
tail(vgsales, n = 10)

Data Preprocessing

Converting Data Types

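# Converting data types: categorical columns become factors

vgsales$Platform <- as.factor(vgsales$Platform)
# Year is kept as character for now; it is converted to numeric after the "N/A" rows are removed
vgsales$Year <- as.character(vgsales$Year)
vgsales$Genre <- as.factor(vgsales$Genre)
vgsales$Publisher <- as.factor(vgsales$Publisher)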
str(vgsales)
## 'data.frame':    16598 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : chr  "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
##  $ Platform    : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
##  $ Year        : chr  "2006" "1985" "2008" "2009" ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
##  $ Publisher   : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...

Inspecting NA Values

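# Drop rows whose Year is the placeholder string "N/A"
vgsales_new <- vgsales[vgsales$Year != "N/A", ]
# Check: this should return zero rows
vgsales_new[vgsales_new$Year == "N/A",]
# Year can now be converted to a number
vgsales_new$Year <- as.numeric(vgsales_new$Year)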
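
The filter matters because the missing years are stored as the literal string "N/A", not as R's NA value. A small sketch on made-up values (year_raw is hypothetical, not taken from the dataset) shows what happens with and without the filter:

```r
# Made-up year strings, mimicking the "N/A" placeholder in the Year column.
year_raw <- c("2006", "N/A", "1985", "N/A", "2008")

# Without filtering, as.numeric() turns "N/A" into NA (and raises a warning).
year_unfiltered <- suppressWarnings(as.numeric(year_raw))
sum(is.na(year_unfiltered))

# Filtering first gives a clean numeric vector.
year_clean <- as.numeric(year_raw[year_raw != "N/A"])
year_clean

# Number of entries dropped by the filter.
length(year_raw) - length(year_clean)
```

On the real data the same subtraction, nrow(vgsales) - nrow(vgsales_new), should give 16598 - 16327 = 271 removed rows, going by the dim() and summary() outputs in this document.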

Summary of the Data

summary(vgsales_new)
##       Rank           Name              Platform         Year     
##  Min.   :    1   Length:16327       DS     :2133   Min.   :1980  
##  1st Qu.: 4136   Class :character   PS2    :2127   1st Qu.:2003  
##  Median : 8295   Mode  :character   PS3    :1304   Median :2007  
##  Mean   : 8293                      Wii    :1290   Mean   :2006  
##  3rd Qu.:12442                      X360   :1235   3rd Qu.:2010  
##  Max.   :16600                      PSP    :1197   Max.   :2020  
##                                     (Other):7041                 
##           Genre                             Publisher        NA_Sales      
##  Action      :3253   Electronic Arts             : 1339   Min.   : 0.0000  
##  Sports      :2304   Activision                  :  966   1st Qu.: 0.0000  
##  Misc        :1710   Namco Bandai Games          :  928   Median : 0.0800  
##  Role-Playing:1471   Ubisoft                     :  918   Mean   : 0.2654  
##  Shooter     :1282   Konami Digital Entertainment:  823   3rd Qu.: 0.2400  
##  Adventure   :1276   THQ                         :  712   Max.   :41.4900  
##  (Other)     :5031   (Other)                     :10641                    
##     EU_Sales          JP_Sales         Other_Sales        Global_Sales    
##  Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.00000   Min.   : 0.0100  
##  1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 0.00000   1st Qu.: 0.0600  
##  Median : 0.0200   Median : 0.00000   Median : 0.01000   Median : 0.1700  
##  Mean   : 0.1476   Mean   : 0.07866   Mean   : 0.04832   Mean   : 0.5402  
##  3rd Qu.: 0.1100   3rd Qu.: 0.04000   3rd Qu.: 0.04000   3rd Qu.: 0.4800  
##  Max.   :29.0200   Max.   :10.22000   Max.   :10.57000   Max.   :82.7400  
## 

Feature Engineering

In this section, I create four new columns, each holding one region's sales as a percentage of global sales, based on NA_Sales, EU_Sales, JP_Sales, and Other_Sales.

vgsales_new$Percent_NA_Sales <- round(((vgsales_new$NA_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_EU_Sales <- round(((vgsales_new$EU_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_JP_Sales <- round(((vgsales_new$JP_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_Other_Sales <- round(((vgsales_new$Other_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new
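
One sanity check for these columns is that the four percentages of each game should add up to roughly 100. The sketch below uses invented numbers. The toy data frame and the 5-point tolerance are assumptions, chosen because sales figures rounded to two decimals will not always sum exactly.

```r
# Invented sales figures in the same layout as the dataset.
toy <- data.frame(
  NA_Sales = c(4.00, 0.05),
  EU_Sales = c(3.00, 0.03),
  JP_Sales = c(2.00, 0.00),
  Other_Sales = c(1.00, 0.01),
  Global_Sales = c(10.00, 0.09)
)

regions <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# Share of each region in global sales, in percent, rounded to 2 digits.
pct <- round(toy[regions] / toy$Global_Sales * 100, digits = 2)
names(pct) <- paste0("Percent_", regions)

# Row totals should be close to 100; flag rows that are not.
pct$Total <- rowSums(pct)
pct$Looks_OK <- abs(pct$Total - 100) <= 5
pct
```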

Explanation
1. Platform: The most popular platform is the DS (Nintendo DS) with 2,133 games.
2. Year: The data covers game sales from 1980 to 2020.
3. Genre: Action is the most common game genre, followed by Sports.
4. Publisher: The most prolific game publisher is Electronic Arts with 1,339 games.
5. Sales: In North America, Europe, Japan, and other regions, the minimum, first quartile, and median of sales are all small, while the mean is larger than the median. I suspect that the distributions are right-skewed.
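
The right-skew suspicion can be checked by comparing the mean with the median, since a few very large values pull the mean above the median. This is a rough sketch with made-up sales figures (the sales vector is illustrative, not taken from the dataset):

```r
# Invented sales values: many small sellers and one blockbuster.
sales <- c(0.01, 0.02, 0.02, 0.05, 0.08, 0.10, 0.24, 0.50, 40.00)

mean_sales <- mean(sales)
median_sales <- median(sales)

# A mean well above the median is a simple hint of right skew.
c(mean = mean_sales, median = median_sales)
mean_sales > median_sales

# Moment-based sample skewness; positive values indicate a longer right tail.
skewness <- mean((sales - mean_sales)^3) / (mean((sales - mean_sales)^2))^1.5
skewness
```

A histogram, for example hist(vgsales_new$Global_Sales), would be a visual way to make the same check on the real data.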

Data Analysis

Which publisher has released the most games?

top_publisher <- aggregate(Year ~ Publisher, data = vgsales_new, FUN = length)
colnames(top_publisher)[2] <- "Count"
head(top_publisher[order(-top_publisher$Count),], n = 15)

The data shows that Electronic Arts has launched 1,339 games, which makes it the top publisher by number of titles. EA is followed by Activision. Both publishers are from the United States. The next positions are shared between Japanese publishers such as Namco Bandai Games and Konami Digital Entertainment and Western publishers such as Ubisoft (France) and THQ (United States).
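
The aggregate() call above counts rows per publisher by taking the length of an arbitrary column. An equivalent way to get the same kind of count, sketched here on a made-up publisher vector (the counts are invented), is table() followed by sort():

```r
# Toy publisher column; the number of titles per publisher is invented.
publisher <- factor(c("Electronic Arts", "Activision", "Electronic Arts",
                      "Ubisoft", "Electronic Arts", "Activision"))

# Count rows per publisher and sort from most to fewest titles.
counts <- sort(table(publisher), decreasing = TRUE)
head(counts, n = 2)

# The same result as a data frame with a Count column.
top_publisher_toy <- as.data.frame(counts, responseName = "Count")
top_publisher_toy
```

This avoids having to pick a column such as Year just to count rows, at the cost of an extra conversion step when a data frame is needed.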

Which genre has the most games released?

top_genre <- aggregate(Year ~ Genre, data = vgsales_new, FUN = length)
colnames(top_genre)[2] <- "Count"
plot1 <- head(top_genre[order(-top_genre$Count),], 5)
plot1
library(RColorBrewer)

color1 <- brewer.pal(5, "Dark2")

barplot(plot1$Count, # bar heights
        names.arg = plot1$Genre, # category labels
        main = "Top 5 Game Genre",
        xlab = "Genre",
        col = color1
        )

Which platform has the most games released?

top_platform <- aggregate(Year ~ Platform, data = vgsales_new, FUN =length)
colnames(top_platform)[2] <- "Count"
head(top_platform[order(-top_platform$Count),], 10)
plot2 <- head(top_platform[order(-top_platform$Count),], 5)
barplot(plot2$Count, # bar heights
        names.arg = plot2$Platform, # category labels
        main = "Top 5 Platform",
        xlab = "Platform",
        col = color1
        )

In which year were game sales in the global market the highest?

top_sales <- aggregate(Global_Sales ~ Year, data = vgsales_new, FUN = sum)
colnames(top_sales)[2] <- "Total"
plot3 <- top_sales[order(-top_sales$Total),]
plot3
plot(x = plot3$Year, y = plot3$Total, type = "p", xlab = "Year", ylab = "Sales in Million USD",
     main = "Video Game Sales History 1980 - 2020")

What are the average sales per game in each year?

avg_sales <- aggregate(Global_Sales ~ Year, data = vgsales_new, FUN = mean)
plot4 <- avg_sales[order(-avg_sales$Year),]
plot4
plot(x = plot4$Year, y = plot4$Global_Sales, xlab = "Year", ylab = "Average Sales in Million USD",
     main = "Average Video Game Sales History 1980 - 2020")

Insight:

Compare the two plots. In the first plot, total sales rise from 1980 to a peak around 2008 to 2009 and then fall toward 2020. Average sales per game show the reverse pattern: from 1980 to 1990 the average is high, reaching about 4 million USD, and from the early 1990s until 2020 it stays below 1 million.
The likely explanation is that far fewer games were released between 1980 and 1990 than between 1990 and 2020. Even though total sales peaked around 2008 to 2009, the average stayed below 1 million USD because the total was spread across many games. In 1980 to 1990 the opposite holds: total sales were lower, but average sales per game were quite high.
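
The argument rests on the identity total sales = number of games × average sales per game. A small sketch with made-up numbers (the toy data frame is hypothetical) computes all three per year so the trade-off is visible:

```r
# Invented data: few strong sellers in 1985, many weaker sellers in 2009.
toy <- data.frame(
  Year = c(1985, 1985, 2009, 2009, 2009, 2009, 2009, 2009),
  Global_Sales = c(4.0, 2.0, 3.0, 2.0, 1.5, 1.0, 0.5, 0.4)
)

# Number of games, total sales, and average sales per game, by year.
count <- tapply(toy$Global_Sales, toy$Year, length)
total <- tapply(toy$Global_Sales, toy$Year, sum)
average <- tapply(toy$Global_Sales, toy$Year, mean)

per_year <- data.frame(
  Year = as.numeric(names(count)),
  Count = as.vector(count),
  Total = as.vector(total),
  Average = as.vector(average)
)
per_year

# Total always equals Count * Average, so a higher total can coexist
# with a lower average when the number of games grows faster.
all.equal(per_year$Total, per_year$Count * per_year$Average)
```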

Which genre has the most games, and what are its total sales in the global market?

top_sales_genre <- aggregate(Global_Sales ~ Genre, data = vgsales_new, FUN = sum)
colnames(top_sales_genre)[2] <- "Total"
top_sales_genre[order(-top_sales_genre$Total),]

Which game publisher has the highest total global sales?

top_sales_publisher <- aggregate(Global_Sales ~ Publisher, data = vgsales_new, FUN = sum)
colnames(top_sales_publisher)[2] <- "Total"
top_sales_publisher[order(-top_sales_publisher$Total),]
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Total sales per region for each year
plot5 <- vgsales_new %>%
  group_by(Year) %>%
  summarise(NA_Sales = sum(NA_Sales),
            JP_Sales = sum(JP_Sales),
            EU_Sales = sum(EU_Sales),
            Other_Sales = sum(Other_Sales)) %>%
  ungroup()
plot5
plot(x = plot5$Year, y = plot5$NA_Sales, pch = 1, xlab = "Year", ylab = "NA Sales")
lines(plot5$Year, plot5$NA_Sales, pch = 2)

plot(x = plot5$Year, y = plot5$JP_Sales, pch = 1, xlab = "Year", ylab = "JP Sales")
lines(plot5$Year, plot5$JP_Sales, pch = 3)

plot(x = plot5$Year, y = plot5$EU_Sales, pch = 1, xlab = "Year", ylab = "EU Sales")
lines(plot5$Year, plot5$EU_Sales, pch = 3)

plot(x = plot5$Year, y = plot5$Other_Sales, pch = 1, xlab = "Year", ylab = "Other Sales")
lines(plot5$Year, plot5$Other_Sales, pch = 3)

Insight: The four charts above compare the sales history of video games in each market. From the plots we can conclude that:
1. Video game sales started and first grew in North America. About two years later sales in Japan started to grow, followed by Europe in 1985. Other markets came last, with sales starting in the late 1980s.
2. Sales generally increased in every market until a peak in 2008, although the volume differs from market to market. Sales then decreased from 2009 to 2020.

Note: The very low sales in 2020 may be caused by the data used in this analysis, which is likely incomplete for recent years.

2008 Game Sales Analysis

# Year is numeric at this point, so compare against a number
sales_2008 <- vgsales_new[vgsales_new$Year == 2008, ]
sales_2008

Total Publisher

Which publisher launched the most games in 2008?

# Count titles per publisher within the 2008 subset only
publi_2008 <- aggregate(Year ~ Publisher, data = sales_2008, FUN = length)
colnames(publi_2008)[2] <- "Count"
head(publi_2008[order(-publi_2008$Count),], n = 10)

Insight:
Across all years, the most productive game publisher is Electronic Arts, followed by Activision, Namco Bandai Games, Ubisoft, Konami, and THQ. The table above shows how that ranking compares with the publishers that released the most games in 2008 alone.

Total Platform

# Count titles per platform within the 2008 subset only
platform_2008 <- aggregate(Year ~ Platform, data = sales_2008, FUN = length)
colnames(platform_2008)[2] <- "Count"
head(platform_2008[order(-platform_2008$Count),], n = 10)

How much did each market contribute to sales in 2008?

prop_2008 <- vgsales_new %>% 
  filter(Year == "2008") %>% 
  summarise(Percent_NA_Sales = mean(Percent_NA_Sales),
            Percent_EU_Sales = mean(Percent_EU_Sales),
            Percent_JP_Sales = mean(Percent_JP_Sales),
            Percent_Other_Sales = mean(Percent_Other_Sales)
            ) %>% 
  ungroup()
prop_2008

Insight:

In 2008, the peak year for video game sales, the North American market accounted for the largest share of sales, followed by Japan, Europe, and other regions. North America was still the largest market for video games.
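
Averaging the per-game percentage columns gives every game equal weight, regardless of how much it sold. An alternative is to divide each region's summed sales by summed global sales, which weights games by their sales. The sketch below uses made-up data (the toy data frame is hypothetical and has only two regions). The two measures answer slightly different questions and can rank regions differently.

```r
# Invented data: one big seller dominated by North America,
# two small sellers sold only in Japan.
toy <- data.frame(
  NA_Sales = c(8.0, 0.0, 0.0),
  JP_Sales = c(2.0, 0.1, 0.1),
  Global_Sales = c(10.0, 0.1, 0.1)
)

# Approach used above: mean of per-game percentages (every game counts equally).
mean_pct_na <- mean(toy$NA_Sales / toy$Global_Sales * 100)
mean_pct_jp <- mean(toy$JP_Sales / toy$Global_Sales * 100)

# Alternative: share of total sales (games weighted by their sales).
share_na <- sum(toy$NA_Sales) / sum(toy$Global_Sales) * 100
share_jp <- sum(toy$JP_Sales) / sum(toy$Global_Sales) * 100

round(c(mean_pct_na = mean_pct_na, mean_pct_jp = mean_pct_jp,
        share_na = share_na, share_jp = share_jp), 2)
```

In this toy case Japan leads on the per-game average while North America leads on the share of total sales, so it is worth stating which of the two a market comparison refers to.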

Conclusion

The gaming industry today has advanced a long way, but games still have their history. The evolution of games is reflected in this data, which records game sales from 1980 until 2020. It contains 16,598 game titles from 577 game publishers, across various platforms and genres. I should note that this data does not capture every game sold from 1980 to 2020.

Comparing total sales per year with average sales per game gives an interesting insight: the highest average sales occurred from 1985 to 1990, but total sales peaked around 2008 to 2009. As for the regional markets, North America (USA, Canada) dominated game sales throughout the period covered by the data. The Japanese market started generating game sales in 1982 and then grew until the mid-1990s. The European market was stagnant from 1980 to 1990 and started to grow in the mid-1990s.