You might have played Snake or Space-wars on your Nokia phone ten years ago. But have you ever wondered when video games first appeared?
Video games have existed since 1952, when the British professor A.S. Douglas created OXO, a version of Tic-Tac-Toe, as part of his doctoral dissertation. In 1958, William Higinbotham created Tennis for Two at the Brookhaven National Laboratory in Upton, New York, US.
I retrieved the data from Kaggle.
Input the Data
# Importing the Data
vgsales <- read.csv("data_input/vgsales.csv")
vgsales
# Inspecting the Data
str(vgsales)
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : chr "Wii" "NES" "Wii" "Wii" ...
## $ Year : chr "2006" "1985" "2008" "2009" ...
## $ Genre : chr "Sports" "Platform" "Racing" "Sports" ...
## $ Publisher : chr "Nintendo" "Nintendo" "Nintendo" "Nintendo" ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
The data consists of 11 columns: five character columns (Name, Platform, Year, Genre, Publisher), one integer column (Rank), and five numeric sales columns.
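The column types can also be tallied programmatically instead of being read off the `str()` output. The following is a minimal sketch on a small made-up data frame; `toy`, `col_classes`, and `class_counts` are hypothetical names, and the values are illustrative only.

```r
# Toy data frame standing in for vgsales (values are illustrative only)
toy <- data.frame(
  Rank = 1:3,
  Name = c("A", "B", "C"),
  Year = c("2006", "1985", "N/A"),
  Global_Sales = c(82.7, 40.2, 35.8),
  stringsAsFactors = FALSE
)

# One class name per column, then a tally of how many columns share each class
col_classes <- vapply(toy, function(col) class(col)[1], character(1))
class_counts <- table(col_classes)
print(class_counts)
```

Running the same two lines on `vgsales` should give the count of character, integer, and numeric columns directly.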
# Dimension of data
dim(vgsales)
## [1] 16598 11
The data has 16598 rows and 11 columns.
head(vgsales, n = 10)
tail(vgsales, n = 10)
# Converting Data Type
vgsales$Platform <- as.factor(vgsales$Platform)
vgsales$Year <- as.character(vgsales$Year)
vgsales$Genre <- as.factor(vgsales$Genre)
vgsales$Publisher <- as.factor(vgsales$Publisher)
str(vgsales)
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
## $ Year : chr "2006" "1985" "2008" "2009" ...
## $ Genre : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
## $ Publisher : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
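# Drop the rows whose Year is recorded as "N/A"
vgsales_new <- vgsales[vgsales$Year != "N/A", ]
# Check that no "N/A" years remain (this should return zero rows)
vgsales_new[vgsales_new$Year == "N/A",]
# Year can now be converted to a number
vgsales_new$Year <- as.numeric(vgsales_new$Year)
summary(vgsales_new)
## Rank Name Platform Year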
## Min. : 1 Length:16327 DS :2133 Min. :1980
## 1st Qu.: 4136 Class :character PS2 :2127 1st Qu.:2003
## Median : 8295 Mode :character PS3 :1304 Median :2007
## Mean : 8293 Wii :1290 Mean :2006
## 3rd Qu.:12442 X360 :1235 3rd Qu.:2010
## Max. :16600 PSP :1197 Max. :2020
## (Other):7041
## Genre Publisher NA_Sales
## Action :3253 Electronic Arts : 1339 Min. : 0.0000
## Sports :2304 Activision : 966 1st Qu.: 0.0000
## Misc :1710 Namco Bandai Games : 928 Median : 0.0800
## Role-Playing:1471 Ubisoft : 918 Mean : 0.2654
## Shooter :1282 Konami Digital Entertainment: 823 3rd Qu.: 0.2400
## Adventure :1276 THQ : 712 Max. :41.4900
## (Other) :5031 (Other) :10641
## EU_Sales JP_Sales Other_Sales Global_Sales
## Min. : 0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.0100
## 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0600
## Median : 0.0200 Median : 0.00000 Median : 0.01000 Median : 0.1700
## Mean : 0.1476 Mean : 0.07866 Mean : 0.04832 Mean : 0.5402
## 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.4800
## Max. :29.0200 Max. :10.22000 Max. :10.57000 Max. :82.7400
##
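An alternative to filtering the "N/A" strings after import is to let `read.csv()` treat them as missing values from the start. This is only a sketch on made-up CSV text, not the approach used in this analysis; `csv_text`, `toy`, and `toy_clean` are hypothetical names.

```r
# Illustrative CSV text; the real file is data_input/vgsales.csv
csv_text <- "Name,Year,Global_Sales
Game A,2006,82.74
Game B,N/A,0.50
Game C,1985,40.24"

# Treat the literal string "N/A" as a missing value while reading
toy <- read.csv(text = csv_text, na.strings = "N/A")

# Year is now parsed as a number, with NA where the source said "N/A"
str(toy)

# Keep only the rows with a known year
toy_clean <- toy[!is.na(toy$Year), ]
```

Note that `na.strings` applies to every column, so an "N/A" in any other column, such as Publisher, would also become `NA` with this approach.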
In this section, I want to add four new columns, each holding one region's sales as a percentage of global sales, based on NA_Sales, EU_Sales, JP_Sales, and Other_Sales.
vgsales_new$Percent_NA_Sales <- round(((vgsales_new$NA_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_EU_Sales <- round(((vgsales_new$EU_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_JP_Sales <- round(((vgsales_new$JP_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new$Percent_Other_Sales <- round(((vgsales_new$Other_Sales / vgsales_new$Global_Sales) * 100), digits = 2)
vgsales_new
Explanation
1. Platform: the most popular platform is the DS (Nintendo DS) with 2133 games.
2. Time range: the data covers game sales from 1980 until 2020.
3. Genre: Action is the most common game genre, followed by Sports.
4. Publisher: the most prolific game publisher is Electronic Arts with 1339 games.
5. Sales
For North America, Europe, Japan, and the other regions alike, the minimum, first quartile, and median of sales are all very small, and the median is below the mean. I suspect that the distribution of the data is right-skewed.
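The suspicion of right skew can be checked numerically. The sketch below uses a made-up vector shaped like the summary (many small values and a few very large ones); `sales` and the derived names are hypothetical, and the skewness formula is the plain moment-based one rather than a function from a package.

```r
# Hypothetical sales figures (in millions): many small values, a few very large
sales <- c(0.01, 0.02, 0.05, 0.08, 0.10, 0.17, 0.24, 0.48, 3.50, 41.49)

sales_mean <- mean(sales)
sales_median <- median(sales)

# A mean well above the median is a common sign of right skew
is_right_skewed <- sales_mean > sales_median

# Moment-based sample skewness; positive values indicate a longer right tail
skewness <- mean((sales - sales_mean)^3) / (mean((sales - sales_mean)^2))^1.5

print(c(mean = sales_mean, median = sales_median, skewness = skewness))
```

Applying the same steps to `vgsales_new$Global_Sales`, or drawing a quick `hist()`, would confirm or reject the suspicion on the real data.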
top_publisher <- aggregate(Year ~ Publisher, data = vgsales_new, FUN = length)
colnames(top_publisher)[2] <- "Count"
head(top_publisher[order(-top_publisher$Count),], n = 15)
The data reveals that Electronic Arts launched 1339 games, which makes it the top game publisher by number of releases. EA is followed by Activision. Both publishers are from the United States. Several of the next positions are held by Japanese game makers such as Namco Bandai Games and Konami Digital Entertainment.
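The same kind of release count can be produced with `table()` instead of `aggregate()`. This is a minimal sketch on a made-up vector; the publisher names come from the summary, but the counts are invented, and `publisher`, `publisher_counts`, and `top_publisher_toy` are hypothetical names.

```r
# Illustrative publisher column; the counts here are made up
publisher <- c("Electronic Arts", "Activision", "Electronic Arts",
               "Ubisoft", "Electronic Arts", "Activision")

# table() counts occurrences; sort() orders them from most to least frequent
publisher_counts <- sort(table(publisher), decreasing = TRUE)

# Convert to a data frame with the same two columns as top_publisher
top_publisher_toy <- as.data.frame(publisher_counts, stringsAsFactors = FALSE)
colnames(top_publisher_toy) <- c("Publisher", "Count")
print(top_publisher_toy)
```

One difference worth keeping in mind: the formula interface of `aggregate()` drops rows with missing values in the variables it uses, while `table()` simply counts whatever values are present.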
top_genre <- aggregate(Year ~ Genre, data = vgsales_new, FUN = length)
colnames(top_genre)[2] <- "Count"
plot1 <- head(top_genre[order(-top_genre$Count),], 5)
plot1
library(RColorBrewer)
color1 <- brewer.pal(5, "Dark2")
barplot(plot1$Count, # values
        names.arg = plot1$Genre, # category labels
        main = "Top 5 Game Genre",
        xlab = "Genre",
        col = color1
)
(Which game platform has the most releases?)
top_platform <- aggregate(Year ~ Platform, data = vgsales_new, FUN =length)
colnames(top_platform)[2] <- "Count"
head(top_platform[order(-top_platform$Count),], 10)
plot2 <- head(top_platform[order(-top_platform$Count),], 5)
barplot(plot2$Count,
        names.arg = plot2$Platform, # category labels
        main = "Top 5 Platform",
        xlab = "Platform",
        col = color1
)
(In which year were game sales highest in the global market?)
top_sales <- aggregate(Global_Sales ~ Year, data = vgsales_new, FUN = sum)
colnames(top_sales)[2] <- "Total"
plot3 <- top_sales[order(-top_sales$Total),]
plot3
plot(x = plot3$Year, y = plot3$Total, type = "p", xlab = "Year", ylab = "Sales in Million USD",
     main = "Video Game Sales History 1980 - 2020"
)
(What are the average sales made each year?)
avg_sales <- aggregate(Global_Sales ~ Year, data = vgsales_new, FUN = mean)
plot4 <- avg_sales[order(-avg_sales$Year),]
plot4
plot(x = plot4$Year, y = plot4$Global_Sales, xlab = "Year", ylab = "Average Sales in Million USD",
     main = "Average Video Game Sales History 1980 - 2020"
)
We can compare the first and second plots. In the first plot, total sales increase from 1980 to 2008 and then decline through 2020. Average sales per game show the reverse. From 1980 to 1990, average sales are high, reaching 4 million USD. After that, the average stays below 1 million from the early 1990s until 2020.
I would argue this happened because far fewer games were released from 1980 to 1990 than from 1990 to 2020. Even though total sales peak in 2008, the average sales per game are below 1 million USD because there are so many games. We can see the reverse in the period 1980 - 1990, where total sales are lower but the average sales per game are quite high.
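The argument rests on a simple identity: total sales per year equal the number of games times the average sales per game. The sketch below illustrates it with made-up numbers (a year with few high-selling games and a year with many low-selling games); `toy` and `yearly` are hypothetical names.

```r
# Made-up numbers: few games with high sales each vs many games with low sales each
toy <- data.frame(
  Year = c(1985, 1985, 2008, 2008, 2008, 2008, 2008, 2008),
  Global_Sales = c(4.0, 3.0, 1.0, 0.5, 1.5, 2.0, 0.5, 2.5)
)

# Total, mean, and number of games per year
total <- aggregate(Global_Sales ~ Year, data = toy, FUN = sum)
average <- aggregate(Global_Sales ~ Year, data = toy, FUN = mean)
n_games <- aggregate(Global_Sales ~ Year, data = toy, FUN = length)

yearly <- data.frame(
  Year = total$Year,
  Games = n_games$Global_Sales,
  Total = total$Global_Sales,
  Average = average$Global_Sales
)
print(yearly)
```

In this toy data the later year has the higher total but the lower average, which is the same pattern the two plots show.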
(Which genre is most common on the market, and what are its total sales in the global market?)
top_sales_genre <- aggregate(Global_Sales ~ Genre, data = vgsales_new, FUN = sum)
colnames(top_sales_genre)[2] <- "Total"
top_sales_genre[order(-top_sales_genre$Total),]
(Which publisher has the highest total sales?)
top_sales_publisher <- aggregate(Global_Sales ~ Publisher, data = vgsales_new, FUN = sum)
colnames(top_sales_publisher)[2] <- "Total"
top_sales_publisher[order(-top_sales_publisher$Total),]
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
plot5 <- vgsales_new %>%
group_by(Year) %>%
summarise(NA_Sales = sum(NA_Sales),
JP_Sales = sum(JP_Sales),
EU_Sales = sum(EU_Sales),
Other_Sales = sum(Other_Sales)) %>%
select(Year, NA_Sales, JP_Sales, EU_Sales, Other_Sales) %>%
ungroup()
plot5
plot(x = plot5$Year, y = plot5$NA_Sales, pch = 1, xlab = "Year", ylab = "NA Sales")
lines(plot5$Year, plot5$NA_Sales, pch = 2)
plot(x = plot5$Year, y = plot5$JP_Sales, pch = 1, xlab = "Year", ylab = "JP Sales")
lines(plot5$Year, plot5$JP_Sales, pch = 3)
plot(x = plot5$Year, y = plot5$EU_Sales, pch = 1, xlab = "Year", ylab = "EU Sales")
lines(plot5$Year, plot5$EU_Sales, pch = 3)
plot(x = plot5$Year, y = plot5$Other_Sales, pch = 1, xlab = "Year", ylab = "Other Sales")
lines(plot5$Year, plot5$Other_Sales, pch = 3)
### Insight:
The four charts above compare the sales history of video games in the different markets. From the plots we can conclude that:
1. Video game sales start and grow first in North America. Two years later, sales in Japan start to grow, followed by sales in the EU in 1985. The other markets are the last to start generating sales, in the late 1980s.
2. Game sales generally trend upward and peak in 2008, although each market has a different sales volume. Sales then decrease from 2009 to 2020.
Note: The very low game sales in 2020 may be caused by incomplete data in this dataset.
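The peak year of each market can also be read off programmatically rather than from the charts. The following sketch uses made-up yearly totals in the same shape as `plot5`; `plot5_toy`, `region_cols`, and `peak_years` are hypothetical names, and the numbers are not taken from the dataset.

```r
# Made-up yearly totals per region, shaped like plot5
plot5_toy <- data.frame(
  Year = c(2006, 2007, 2008, 2009),
  NA_Sales = c(260, 310, 350, 340),
  JP_Sales = c(75, 60, 60, 62),
  EU_Sales = c(130, 160, 185, 190)
)

region_cols <- c("NA_Sales", "JP_Sales", "EU_Sales")

# For each region, which.max() gives the row of the largest total;
# indexing Year with it returns the peak year
peak_years <- sapply(region_cols, function(col) {
  plot5_toy$Year[which.max(plot5_toy[[col]])]
})
print(peak_years)
```

Run against the real `plot5`, this would show whether every market actually peaks in the same year.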
sales_2008 <- vgsales_new[vgsales_new$Year == 2008, ]
sales_2008
What publisher had launched the most games in 2008?
publi_2008 <- aggregate(Year ~ Publisher, data = sales_2008, FUN = length)
colnames(publi_2008)[2] <- "Count"
head(publi_2008[order(-publi_2008$Count),], n = 10)
Insight:
The table lists the publishers that released the most games in 2008. Across the whole dataset, the most productive game publisher is still Electronic Arts, followed by Activision, Namco Bandai Games, Ubisoft, Konami, and THQ.
platform_2008 <- aggregate(Year ~ Platform, data = sales_2008, FUN = length)
colnames(platform_2008)[2] <- "Count"
head(platform_2008[order(-platform_2008$Count),], n = 10)
prop_2008 <- vgsales_new %>%
filter(Year == "2008") %>%
summarise(Percent_NA_Sales = mean(Percent_NA_Sales),
Percent_EU_Sales = mean(Percent_EU_Sales),
Percent_JP_Sales = mean(Percent_JP_Sales),
Percent_Other_Sales = mean(Percent_Other_Sales)
) %>%
ungroup()
prop_2008
Insight:
In 2008, at the peak of video game sales, North America accounted for the largest average share of a game's sales, followed by Japan, Europe, and the other regions. We can say that North America was still the largest video game market.
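Averaging per-game percentages is not the same as taking each region's share of the summed sales, and the two can rank the regions differently. The sketch below shows the difference on two made-up games; `toy` and the derived names are hypothetical, and the second calculation is an alternative view, not the one used in this analysis.

```r
# Two made-up games: a big seller mostly in North America, a small Japan-only title
toy <- data.frame(
  NA_Sales = c(9.0, 0.0),
  JP_Sales = c(1.0, 1.0),
  Global_Sales = c(10.0, 1.0)
)

# Approach used for prop_2008: average the per-game percentages
mean_of_percent_na <- mean(toy$NA_Sales / toy$Global_Sales * 100)
mean_of_percent_jp <- mean(toy$JP_Sales / toy$Global_Sales * 100)

# Alternative: each region's share of the summed sales
share_of_total_na <- sum(toy$NA_Sales) / sum(toy$Global_Sales) * 100
share_of_total_jp <- sum(toy$JP_Sales) / sum(toy$Global_Sales) * 100

print(c(mean_na = mean_of_percent_na, mean_jp = mean_of_percent_jp,
        share_na = share_of_total_na, share_jp = share_of_total_jp))
```

In this toy case the small Japan-only title pulls the average percentage for Japan above North America, even though North America has by far the larger share of total sales. Region rankings based on mean percentages should be read with this in mind.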
The gaming industry today has become highly advanced. Still, games have a history. The evolution of games is recorded in this data, which covers game sales from 1980 until 2020. It contains 16,598 games from 577 game publishers across various game platforms and genres. I must note that this data cannot capture every game sold from 1980 to 2020.
By comparing total sales per year with average video game sales, we find an interesting insight. The highest average sales per game occurred from 1985 to 1990, but total sales peaked in 2008. As for the markets around the world, North America (USA, Canada) still dominates game sales. The Japanese market started generating game sales in 1982 and then grew until the mid-1990s. The European market, in turn, was stagnant from 1980 to 1990 and began to grow in the mid-1990s.