Photo by Andrzej Rembowski
The objective of this video game sales analysis is to explore and provide insights into various aspects of the video game industry from 1980 to 2020. This analysis aims to answer the following questions:
Which genres and platforms are the top 10 most popular in terms of sales in each region and worldwide?
Is there a correlation between the year of release and global sales? Are newer games generally more successful?
How do regional sales differ across North America, Europe, Japan, and other parts of the world? Are there any notable variations in genre or platform preferences among different regions?
Which publishers have the highest global sales? Are there any publishers that consistently perform well across different regions?
Can we identify any trends or patterns in the sales data over time?
R programming
Tableau
readxl
tidyverse
writexl
dplyr
tidyr
#load the readxl
library(readxl)
#import the dataset
vgsales <- read_excel("C:/stuff/vgsales.xlsx")
#print the data set
vgsales
## # A tibble: 16,598 × 11
## Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1 Wii Sports Wii 2006 Spor… Nintendo 41.5 29.0 3.77
## 2 2 Super Mario … NES 1985 Plat… Nintendo 29.1 3.58 6.81
## 3 3 Mario Kart W… Wii 2008 Raci… Nintendo 15.8 12.9 3.79
## 4 4 Wii Sports R… Wii 2009 Spor… Nintendo 15.8 11.0 3.28
## 5 5 Pokemon Red/… GB 1996 Role… Nintendo 11.3 8.89 10.2
## 6 6 Tetris GB 1989 Puzz… Nintendo 23.2 2.26 4.22
## 7 7 New Super Ma… DS 2006 Plat… Nintendo 11.4 9.23 6.5
## 8 8 Wii Play Wii 2006 Misc Nintendo 14.0 9.2 2.93
## 9 9 New Super Ma… Wii 2009 Plat… Nintendo 14.6 7.06 4.7
## 10 10 Duck Hunt NES 1984 Shoo… Nintendo 26.9 0.63 0.28
## # ℹ 16,588 more rows
## # ℹ 2 more variables: Other_Sales <dbl>, Global_Sales <dbl>
I pivot the data from wide format to long format, so that each row holds one sales figure for one game.
#load the tidyverse
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#pivot wide data format into long data format
long_gamedata <- pivot_longer(
  vgsales,
  cols = c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"),
  names_to = "Sales",
  values_to = "Sales_value"
)
#print the data set
long_gamedata
## # A tibble: 82,990 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 1 Wii Sports Wii 2006 Sports Nintendo NA_Sal… 41.5
## 2 1 Wii Sports Wii 2006 Sports Nintendo EU_Sal… 29.0
## 3 1 Wii Sports Wii 2006 Sports Nintendo JP_Sal… 3.77
## 4 1 Wii Sports Wii 2006 Sports Nintendo Other_… 8.46
## 5 1 Wii Sports Wii 2006 Sports Nintendo Global… 82.7
## 6 2 Super Mario Bros. NES 1985 Platform Nintendo NA_Sal… 29.1
## 7 2 Super Mario Bros. NES 1985 Platform Nintendo EU_Sal… 3.58
## 8 2 Super Mario Bros. NES 1985 Platform Nintendo JP_Sal… 6.81
## 9 2 Super Mario Bros. NES 1985 Platform Nintendo Other_… 0.77
## 10 2 Super Mario Bros. NES 1985 Platform Nintendo Global… 40.2
## # ℹ 82,980 more rows
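To make the reshaping step concrete, here is a minimal sketch on a made-up two-game table. The names `toy_wide` and `toy_long` and all values are illustrative assumptions, not part of the dataset. It uses `ends_with("_Sales")` as a shorthand for listing the five sales columns by name.

```r
library(tidyr)
library(tibble)

# Toy data standing in for vgsales (values are made up for illustration)
toy_wide <- tibble(
  Name         = c("Game A", "Game B"),
  NA_Sales     = c(1.0, 0.5),
  EU_Sales     = c(0.6, 0.2),
  JP_Sales     = c(0.1, 0.3),
  Other_Sales  = c(0.3, 0.0),
  Global_Sales = c(2.0, 1.0)
)

# ends_with("_Sales") selects the same five columns without naming each one
toy_long <- pivot_longer(
  toy_wide,
  cols      = ends_with("_Sales"),
  names_to  = "Sales",
  values_to = "Sales_value"
)

# Each game now occupies one row per sales column
nrow(toy_long)  # 2 games x 5 columns = 10 rows
```

By the same logic, 16,598 games with five sales columns give the 82,990 rows shown above. One thing to keep in mind with this layout: `Global_Sales` appears to be the sum of the regional figures (for Wii Sports, 41.49 + 29.02 + 3.77 + 8.46 = 82.74), so summing `Sales_value` over all rows would count each game's sales roughly twice.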
Next, I preview the first 10 rows of the dataset.
#Preview 10 rows
long_gamedata %>%
head(10)
## # A tibble: 10 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 1 Wii Sports Wii 2006 Sports Nintendo NA_Sal… 41.5
## 2 1 Wii Sports Wii 2006 Sports Nintendo EU_Sal… 29.0
## 3 1 Wii Sports Wii 2006 Sports Nintendo JP_Sal… 3.77
## 4 1 Wii Sports Wii 2006 Sports Nintendo Other_… 8.46
## 5 1 Wii Sports Wii 2006 Sports Nintendo Global… 82.7
## 6 2 Super Mario Bros. NES 1985 Platform Nintendo NA_Sal… 29.1
## 7 2 Super Mario Bros. NES 1985 Platform Nintendo EU_Sal… 3.58
## 8 2 Super Mario Bros. NES 1985 Platform Nintendo JP_Sal… 6.81
## 9 2 Super Mario Bros. NES 1985 Platform Nintendo Other_… 0.77
## 10 2 Super Mario Bros. NES 1985 Platform Nintendo Global… 40.2
Then I check the dimensions of the dataset.
dim(long_gamedata)
## [1] 82990 8
I have identified ‘N/A’ values in the Year column, and I clean the data by removing every row where Year is ‘N/A’. These rows account for only about 1.63% of the dataset: 1,355 rows out of 82,990, which leaves 81,635 rows. Removing them should not significantly affect the analysis.
#get the unique values in the Year column
unique(long_gamedata$Year)
## [1] "2006" "1985" "2008" "2009" "1996" "1989" "1984" "2005" "1999" "2007"
## [11] "2010" "2013" "2004" "1990" "1988" "2002" "2001" "2011" "1998" "2015"
## [21] "2012" "2014" "1992" "1997" "1993" "1994" "1982" "2003" "1986" "2000"
## [31] "N/A" "1995" "2016" "1991" "1981" "1987" "1980" "1983" "2020" "2017"
#show the rows with 'N/A' in the Year column
long_gamedata %>%
  filter(Year == "N/A")
## # A tibble: 1,355 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 180 Madden NFL 2004 PS2 N/A Sports Electronic Ar… NA_S… 4.26
## 2 180 Madden NFL 2004 PS2 N/A Sports Electronic Ar… EU_S… 0.26
## 3 180 Madden NFL 2004 PS2 N/A Sports Electronic Ar… JP_S… 0.01
## 4 180 Madden NFL 2004 PS2 N/A Sports Electronic Ar… Othe… 0.71
## 5 180 Madden NFL 2004 PS2 N/A Sports Electronic Ar… Glob… 5.23
## 6 378 FIFA Soccer 2004 PS2 N/A Sports Electronic Ar… NA_S… 0.59
## 7 378 FIFA Soccer 2004 PS2 N/A Sports Electronic Ar… EU_S… 2.36
## 8 378 FIFA Soccer 2004 PS2 N/A Sports Electronic Ar… JP_S… 0.04
## 9 378 FIFA Soccer 2004 PS2 N/A Sports Electronic Ar… Othe… 0.51
## 10 378 FIFA Soccer 2004 PS2 N/A Sports Electronic Ar… Glob… 3.49
## # ℹ 1,345 more rows
#delete all 'N/A' in the Year column
long_gamedata<-subset(long_gamedata,Year !="N/A")
#print the data set
long_gamedata
## # A tibble: 81,635 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 1 Wii Sports Wii 2006 Sports Nintendo NA_Sal… 41.5
## 2 1 Wii Sports Wii 2006 Sports Nintendo EU_Sal… 29.0
## 3 1 Wii Sports Wii 2006 Sports Nintendo JP_Sal… 3.77
## 4 1 Wii Sports Wii 2006 Sports Nintendo Other_… 8.46
## 5 1 Wii Sports Wii 2006 Sports Nintendo Global… 82.7
## 6 2 Super Mario Bros. NES 1985 Platform Nintendo NA_Sal… 29.1
## 7 2 Super Mario Bros. NES 1985 Platform Nintendo EU_Sal… 3.58
## 8 2 Super Mario Bros. NES 1985 Platform Nintendo JP_Sal… 6.81
## 9 2 Super Mario Bros. NES 1985 Platform Nintendo Other_… 0.77
## 10 2 Super Mario Bros. NES 1985 Platform Nintendo Global… 40.2
## # ℹ 81,625 more rows
#Preview the shape of the dataset after deleting the 'N/A' values
dim(long_gamedata)
## [1] 81635 8
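The share of rows lost in a cleaning step like this is the number of flagged rows divided by the row count before filtering. A minimal sketch of that arithmetic follows, using a made-up `year` vector as a stand-in for `long_gamedata$Year` (the vector and its values are assumptions for illustration only).

```r
# Toy Year column standing in for long_gamedata$Year (values are made up)
year <- c("2006", "N/A", "1985", "2008", "N/A", "2009", "1996", "1989")

n_missing <- sum(year == "N/A")   # rows that would be dropped
n_total   <- length(year)         # rows BEFORE dropping anything
share     <- n_missing / n_total  # fraction of rows lost

sprintf("%d of %d rows (%.2f%%)", n_missing, n_total, 100 * share)

# Dropping the flagged rows leaves n_total - n_missing rows
year_clean <- year[year != "N/A"]
length(year_clean)
```

Using the pre-filter count as the denominator keeps the percentage comparable across successive cleaning steps.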
I have also found an outlier in the Year column: the year 2020, which appears for a single game. I remove it because it accounts for only about 0.006% of the dataset: 5 rows out of 81,635, which leaves 81,630 rows. Removing it should not significantly affect the analysis.
#show the rows with "2020" in the Year column
long_gamedata %>%
  filter(Year == "2020")
## # A tibble: 5 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 5959 Imagine: Makeup Artist DS 2020 Simul… Ubisoft NA_S… 0.27
## 2 5959 Imagine: Makeup Artist DS 2020 Simul… Ubisoft EU_S… 0
## 3 5959 Imagine: Makeup Artist DS 2020 Simul… Ubisoft JP_S… 0
## 4 5959 Imagine: Makeup Artist DS 2020 Simul… Ubisoft Othe… 0.02
## 5 5959 Imagine: Makeup Artist DS 2020 Simul… Ubisoft Glob… 0.29
#delete all "2020" in the Year column
long_gamedata<-subset(long_gamedata,Year !='2020')
#print the data set
long_gamedata
## # A tibble: 81,630 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 1 Wii Sports Wii 2006 Sports Nintendo NA_Sal… 41.5
## 2 1 Wii Sports Wii 2006 Sports Nintendo EU_Sal… 29.0
## 3 1 Wii Sports Wii 2006 Sports Nintendo JP_Sal… 3.77
## 4 1 Wii Sports Wii 2006 Sports Nintendo Other_… 8.46
## 5 1 Wii Sports Wii 2006 Sports Nintendo Global… 82.7
## 6 2 Super Mario Bros. NES 1985 Platform Nintendo NA_Sal… 29.1
## 7 2 Super Mario Bros. NES 1985 Platform Nintendo EU_Sal… 3.58
## 8 2 Super Mario Bros. NES 1985 Platform Nintendo JP_Sal… 6.81
## 9 2 Super Mario Bros. NES 1985 Platform Nintendo Other_… 0.77
## 10 2 Super Mario Bros. NES 1985 Platform Nintendo Global… 40.2
## # ℹ 81,620 more rows
#Preview the shape of the dataset after deleting the outliers
dim(long_gamedata)
## [1] 81630 8
Next, I convert the Year column from the character (chr) data type to the double (dbl) data type.
#preview of data structure
glimpse(long_gamedata)
## Rows: 81,630
## Columns: 8
## $ Rank <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4…
## $ Name <chr> "Wii Sports", "Wii Sports", "Wii Sports", "Wii Sports", "W…
## $ Platform <chr> "Wii", "Wii", "Wii", "Wii", "Wii", "NES", "NES", "NES", "N…
## $ Year <chr> "2006", "2006", "2006", "2006", "2006", "1985", "1985", "1…
## $ Genre <chr> "Sports", "Sports", "Sports", "Sports", "Sports", "Platfor…
## $ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo"…
## $ Sales <chr> "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global…
## $ Sales_value <dbl> 41.49, 29.02, 3.77, 8.46, 82.74, 29.08, 3.58, 6.81, 0.77, …
#change the data type of the Year column from <chr> to <dbl>
long_gamedata$Year<-as.numeric(long_gamedata$Year)
#preview of data structure
glimpse(long_gamedata)
## Rows: 81,630
## Columns: 8
## $ Rank <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4…
## $ Name <chr> "Wii Sports", "Wii Sports", "Wii Sports", "Wii Sports", "W…
## $ Platform <chr> "Wii", "Wii", "Wii", "Wii", "Wii", "NES", "NES", "NES", "N…
## $ Year <dbl> 2006, 2006, 2006, 2006, 2006, 1985, 1985, 1985, 1985, 1985…
## $ Genre <chr> "Sports", "Sports", "Sports", "Sports", "Sports", "Platfor…
## $ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo"…
## $ Sales <chr> "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global…
## $ Sales_value <dbl> 41.49, 29.02, 3.77, 8.46, 82.74, 29.08, 3.58, 6.81, 0.77, …
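The order of the steps matters here: `as.numeric()` cannot parse a string such as "N/A" and silently turns it into `NA` (with a warning). The short sketch below illustrates this on a made-up character vector; `year_chr` and `year_num` are hypothetical names used only for this example.

```r
# Character years as they might arrive from the spreadsheet (toy values)
year_chr <- c("2006", "1985", "N/A")

# as.numeric() returns NA for anything it cannot parse and raises a warning,
# which is why the "N/A" rows are handled before converting
year_num <- suppressWarnings(as.numeric(year_chr))

year_num          # 2006 1985 NA
is.na(year_num)   # FALSE FALSE TRUE
typeof(year_num)  # "double"
```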
I use Tableau to build charts that may reveal interesting information in the dataset. To do so, I first export the dataset to an XLSX file.
#export the dataset into an XLSX file
library(writexl)
write_xlsx(long_gamedata,path = "long_gamedata.xlsx")
This dashboard presents game sales by publisher, platform, and genre across years, providing a visual overview of the gaming industry’s sales trends.
As evident from the bar chart above, the video game industry witnessed its highest growth in the 2000s. Intrigued by this surge, I decided to explore the factors contributing to the industry’s boom during that period. To investigate further, I examined the number of video game releases by considering the distinct count of game names released each year. In this analysis, if a game was released on multiple platforms but shared the same name, it was considered as one game.
long_gamedata %>%
group_by(Year) %>%
summarize(distinct_count = n_distinct(Name))
## # A tibble: 38 × 2
## Year distinct_count
## <dbl> <int>
## 1 1980 9
## 2 1981 46
## 3 1982 36
## 4 1983 17
## 5 1984 14
## 6 1985 14
## 7 1986 21
## 8 1987 16
## 9 1988 15
## 10 1989 17
## # ℹ 28 more rows
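The difference between counting rows and counting distinct names can be seen on a small made-up table. In this sketch, `toy` and its values are assumptions for illustration; "Game A" appears on two platforms but is counted once by `n_distinct()`.

```r
library(dplyr)

# Toy data: "Game A" was released on two platforms in the same year
toy <- tibble::tibble(
  Name     = c("Game A", "Game A", "Game B", "Game C"),
  Platform = c("PS2", "Wii", "PS2", "DS"),
  Year     = c(2008, 2008, 2008, 2009)
)

releases <- toy %>%
  group_by(Year) %>%
  summarise(
    rows           = n(),              # counts every platform entry
    distinct_count = n_distinct(Name)  # counts each title once
  )

releases
```

For 2008 the toy table has three rows but only two distinct titles, which is the behaviour the release count above relies on.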
The treemap and the yearly release counts suggest that one reason for the boom of the 2000s was the exceptionally high number of video game releases, particularly in 2008. While researching the video game market, I also found that during the 2000s major companies such as Microsoft, Sony, and Nintendo released new gaming consoles. This coincided with significant technological advancements, which made consoles more accessible and improved the overall quality of games in terms of graphics and gameplay. These factors likely contributed to the boom of the video game industry during that time.
Next, I used the treemap above to observe each publisher’s overall share of sales in each region. One interesting finding is that in North America, Europe, and the rest of the world, the top 5 publishers are all big companies. Japan shares some of these top publishers with the other regions, but its top 5 list also includes Japanese publishers of its own. In addition, Nintendo is the dominant publisher overall.
For the packed bubble chart above, we can observe that North America, Europe, and the rest of the world share their top 3 genres with each other. However, Japan stands out with its own unique genre, indicating that Japan’s video game market has distinct trends and does not conform as closely to other regions. Furthermore, the global genre preferences also demonstrate a similarity in the top 3 genres across North America, Europe, and the rest of the world. This may be attributed to the fact that three out of four regions have a strong preference for these genres, leading to significant sales.
I sort the dataset by sales value in descending order.
#arrange the dataset
long_gamedata<-arrange(long_gamedata,desc(Sales_value))
#print the data set
long_gamedata
## # A tibble: 81,630 × 8
## Rank Name Platform Year Genre Publisher Sales Sales_value
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 Wii Sports Wii 2006 Spor… Nintendo Glob… 82.7
## 2 1 Wii Sports Wii 2006 Spor… Nintendo NA_S… 41.5
## 3 2 Super Mario Bros. NES 1985 Plat… Nintendo Glob… 40.2
## 4 3 Mario Kart Wii Wii 2008 Raci… Nintendo Glob… 35.8
## 5 4 Wii Sports Resort Wii 2009 Spor… Nintendo Glob… 33
## 6 5 Pokemon Red/Pokemon B… GB 1996 Role… Nintendo Glob… 31.4
## 7 6 Tetris GB 1989 Puzz… Nintendo Glob… 30.3
## 8 7 New Super Mario Bros. DS 2006 Plat… Nintendo Glob… 30.0
## 9 2 Super Mario Bros. NES 1985 Plat… Nintendo NA_S… 29.1
## 10 1 Wii Sports Wii 2006 Spor… Nintendo EU_S… 29.0
## # ℹ 81,620 more rows
For the treemap above, I have visualized platform preferences in each region.
Next, I will analyze each region individually to identify the top 10 genres and platforms in terms of sales.
Sales in North America (in millions)
#Sales in North America
long_gamedata %>%
filter(grepl("NA_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)
## # A tibble: 16,326 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 41.5
## 2 Platform NES 29.1
## 3 Shooter NES 26.9
## 4 Puzzle GB 23.2
## 5 Racing Wii 15.8
## 6 Sports Wii 15.8
## 7 Misc X360 15.0
## 8 Platform Wii 14.6
## 9 Misc Wii 14.0
## 10 Platform SNES 12.8
## # ℹ 16,316 more rows
#Top 10 sales in North America
long_gamedata %>%
filter(grepl("NA_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value) %>%
head(10)
## # A tibble: 10 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 41.5
## 2 Platform NES 29.1
## 3 Shooter NES 26.9
## 4 Puzzle GB 23.2
## 5 Racing Wii 15.8
## 6 Sports Wii 15.8
## 7 Misc X360 15.0
## 8 Platform Wii 14.6
## 9 Misc Wii 14.0
## 10 Platform SNES 12.8
The table above shows the genre and platform of the 10 best-selling titles in North America.
Sales in Europe (in millions)
#Sales in Europe
long_gamedata %>%
filter(grepl("EU_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)
## # A tibble: 16,326 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 29.0
## 2 Racing Wii 12.9
## 3 Sports Wii 11.0
## 4 Simulation DS 11
## 5 Action PS3 9.27
## 6 Misc DS 9.26
## 7 Platform DS 9.23
## 8 Misc Wii 9.2
## 9 Role-Playing GB 8.89
## 10 Sports Wii 8.59
## # ℹ 16,316 more rows
#Top 10 sales in Europe
long_gamedata %>%
filter(grepl("EU_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value) %>%
head(10)
## # A tibble: 10 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 29.0
## 2 Racing Wii 12.9
## 3 Sports Wii 11.0
## 4 Simulation DS 11
## 5 Action PS3 9.27
## 6 Misc DS 9.26
## 7 Platform DS 9.23
## 8 Misc Wii 9.2
## 9 Role-Playing GB 8.89
## 10 Sports Wii 8.59
The table above shows the genre and platform of the 10 best-selling titles in Europe.
Sales in Japan (in millions)
#Sales in Japan
long_gamedata %>%
filter(grepl("JP_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)
## # A tibble: 16,326 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Role-Playing GB 10.2
## 2 Role-Playing GB 7.2
## 3 Platform NES 6.81
## 4 Platform DS 6.5
## 5 Role-Playing DS 6.04
## 6 Role-Playing DS 5.65
## 7 Role-Playing GBA 5.38
## 8 Simulation DS 5.33
## 9 Puzzle DS 5.32
## 10 Role-Playing PSP 4.87
## # ℹ 16,316 more rows
#Top 10 sales in Japan
long_gamedata %>%
filter(grepl("JP_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value) %>%
head(10)
## # A tibble: 10 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Role-Playing GB 10.2
## 2 Role-Playing GB 7.2
## 3 Platform NES 6.81
## 4 Platform DS 6.5
## 5 Role-Playing DS 6.04
## 6 Role-Playing DS 5.65
## 7 Role-Playing GBA 5.38
## 8 Simulation DS 5.33
## 9 Puzzle DS 5.32
## 10 Role-Playing PSP 4.87
The table above shows the genre and platform of the 10 best-selling titles in Japan.
Sales for the rest of the world (in millions)
#Sales in the rest of the world
long_gamedata %>%
filter(grepl("Other_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)
## # A tibble: 16,326 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Action PS2 10.6
## 2 Sports Wii 8.46
## 3 Racing PS2 7.53
## 4 Action PS3 4.14
## 5 Racing Wii 3.31
## 6 Sports Wii 2.96
## 7 Sports PS2 2.93
## 8 Platform DS 2.9
## 9 Misc Wii 2.85
## 10 Simulation DS 2.75
## # ℹ 16,316 more rows
#Top 10 sales for the rest of the world
long_gamedata %>%
filter(grepl("Other_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)%>%
head(10)
## # A tibble: 10 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Action PS2 10.6
## 2 Sports Wii 8.46
## 3 Racing PS2 7.53
## 4 Action PS3 4.14
## 5 Racing Wii 3.31
## 6 Sports Wii 2.96
## 7 Sports PS2 2.93
## 8 Platform DS 2.9
## 9 Misc Wii 2.85
## 10 Simulation DS 2.75
The table above shows the genre and platform of the 10 best-selling titles in the rest of the world.
Total worldwide sales (in millions)
#Worldwide sales
long_gamedata %>%
filter(grepl("Global_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value)
## # A tibble: 16,326 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 82.7
## 2 Platform NES 40.2
## 3 Racing Wii 35.8
## 4 Sports Wii 33
## 5 Role-Playing GB 31.4
## 6 Puzzle GB 30.3
## 7 Platform DS 30.0
## 8 Misc Wii 29.0
## 9 Platform Wii 28.6
## 10 Shooter NES 28.3
## # ℹ 16,316 more rows
#Top 10 worldwide sales
long_gamedata %>%
filter(grepl("Global_Sales",long_gamedata$Sales)) %>%
select(Genre,Platform,Sales_value) %>%
head(10)
## # A tibble: 10 × 3
## Genre Platform Sales_value
## <chr> <chr> <dbl>
## 1 Sports Wii 82.7
## 2 Platform NES 40.2
## 3 Racing Wii 35.8
## 4 Sports Wii 33
## 5 Role-Playing GB 31.4
## 6 Puzzle GB 30.3
## 7 Platform DS 30.0
## 8 Misc Wii 29.0
## 9 Platform Wii 28.6
## 10 Shooter NES 28.3
The table above shows the genre and platform of the 10 best-selling titles worldwide.
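The tables in this section rank individual titles, so the same genre and platform can appear more than once. To rank the genres themselves, one option is to total sales per genre within each region first and then keep the largest totals. The sketch below shows this on made-up data; `toy_long` reuses the column names of `long_gamedata`, but the values and the cut-off `n = 2` are assumptions for illustration (on the real data one would use `n = 10`, and `Platform` could be swapped in for `Genre`).

```r
library(dplyr)

# Toy long-format data (made-up values), same column names as long_gamedata
toy_long <- tibble::tribble(
  ~Genre,         ~Sales,     ~Sales_value,
  "Sports",       "NA_Sales", 41.5,
  "Action",       "NA_Sales", 20.0,
  "Action",       "NA_Sales", 25.0,
  "Role-Playing", "NA_Sales",  5.0,
  "Sports",       "JP_Sales",  3.8,
  "Action",       "JP_Sales",  1.0,
  "Role-Playing", "JP_Sales", 10.2
)

top_genres <- toy_long %>%
  group_by(Sales, Genre) %>%
  # total sales per genre within each region; result stays grouped by Sales
  summarise(Total_Sales = sum(Sales_value), .groups = "drop_last") %>%
  # keep the largest totals in each region (n = 10 on the real data)
  slice_max(Total_Sales, n = 2, with_ties = FALSE) %>%
  ungroup()

top_genres
```

In the toy data the single best-selling North American title is a Sports game, yet Action has the larger genre total, which shows why a per-title ranking and a per-genre ranking can disagree.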
First approach: I calculated the correlation between year of release and global sales using the cor() function.
global_sale_values<-long_gamedata %>%
filter(grepl("Global_Sales",long_gamedata$Sales))
#find the correlation
correlation_global_year<-cor(global_sale_values$Year,global_sale_values$Sales_value)
print(correlation_global_year)
## [1] -0.07472447
The result above (correlation = -0.07472447) indicates a very weak negative correlation between the year of release and global sales.
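For readers who want to see what `cor()` computes, the sketch below writes out the Pearson formula on two short made-up vectors and checks it against `cor()`. The vectors `year` and `sales` are illustrative assumptions, not values from the dataset. It also shows the rank-based Spearman variant, which is a different technique from the one used in this post and is included only as a possible robustness check when a few titles sell far more than the rest.

```r
# Toy vectors standing in for Year and global sales (made-up values)
year  <- c(2001, 2003, 2005, 2007, 2009)
sales <- c(3.0, 1.0, 4.0, 0.5, 1.5)

# Pearson correlation written out: co-variation scaled by both spreads
dx <- year - mean(year)
dy <- sales - mean(sales)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
all.equal(r_manual, cor(year, sales))  # cor() defaults to Pearson

# Rank-based alternative, less sensitive to a few very large values
cor(year, sales, method = "spearman")
```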
Second approach: I visualized the relationship with a scatter plot of global sales against the year of release.
#plot a scatter plot
plot(global_sale_values$Year,global_sale_values$Sales_value,type="p",main="Scatter Plot",xlab="Year",ylab="Global_Sales_value")
#add a trendline
trendline<-lm( global_sale_values$Sales_value ~ global_sale_values$Year)
abline(trendline,col="green")
The plot confirms that there is only a weak negative correlation between the year of release and global sales.
Both the statistical calculation and the visualization point to a weak negative correlation between the year of release and global sales: as the release year increases, global sales per game tend to decrease slightly. Newer games are therefore not necessarily more successful. However, the success of games also depends on the future state of the video game industry and on social trends in upcoming periods.
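The strength of the trend line can also be read from the fitted model. The sketch below fits `lm()` on a made-up data frame (`toy` and its values are assumptions; only the column names mirror this post) and extracts the slope and R-squared. For a model with a single predictor, R-squared equals the squared correlation, so the last line applies that identity to the correlation reported earlier.

```r
# Toy data frame (made-up values); column names mirror long_gamedata
toy <- data.frame(
  Year        = c(2001, 2003, 2005, 2007, 2009),
  Sales_value = c(3.0, 1.0, 4.0, 0.5, 1.5)
)

# Passing data = keeps the coefficient name short ("Year")
fit <- lm(Sales_value ~ Year, data = toy)

coef(fit)["Year"]        # change in sales per additional year
summary(fit)$r.squared   # share of variance explained by Year

# One predictor: R-squared is the squared correlation, so the
# correlation of -0.07472447 reported above corresponds to
(-0.07472447)^2          # about 0.0056
```

In other words, release year accounts for well under 1% of the variation in global sales per game, which supports reading the relationship as very weak.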
Because I use Tableau for the charts, I export the dataset to an XLSX file again for further visualization.
#export the dataset into an XLSX file
library(writexl)
write_xlsx(long_gamedata,path = "long_gamedata.xlsx")
The heatmap above shows how genre preferences vary across regions. In Europe, the genre with the highest total sales was Action, at 516.5 million units. In Japan, it was Role-Playing, at 350.3 million units. In North America, it was Action, at 861.8 million units, and in the rest of the world it was again Action, at 184.9 million units. In summary, Action had the highest total sales in North America, Europe, and the rest of the world, while Role-Playing led in Japan.
The heatmap above shows how platform preferences vary across regions. In Europe, the platform with the highest total sales was PS3, at 340.5 million units, followed by PS2 at 332.6 million. In Japan, DS led with 175.0 million units, followed by PS at 139.8 million. In North America, X360 led with 594.3 million units, followed by PS2 at 572.9 million. In the rest of the world, PS2 led with 190.5 million units, followed by PS3 at 140.8 million. In summary, PS2 and PS3 were the top-selling platforms in Europe and the rest of the world, DS and PS dominated in Japan, and X360 and PS2 led in North America.
| Region | Genre preference (top one) | Platform preferences (top two) |
| --- | --- | --- |
| Europe | Action | PS3 and PS2 |
| North America | Action | X360 and PS2 |
| Japan | Role-Playing | DS and PS |
| The rest of the world | Action | PS2 and PS3 |
From the observations above, it is evident that each region has its unique preferences for platforms and genres in the video game industry. However, upon closer examination, it becomes apparent that Japan stands out as the only region that maintains its distinctive preferences without conforming to the preferences of the rest of the world.
#find publishers that have the highest global sales
sum_sales_by_publisher_global <- global_sale_values %>%
group_by(Publisher) %>%
summarise(Sum_Sales = sum(Sales_value)) %>%
arrange(desc(Sum_Sales))
#print the dataset
sum_sales_by_publisher_global
## # A tibble: 577 × 2
## Publisher Sum_Sales
## <chr> <dbl>
## 1 Nintendo 1784.
## 2 Electronic Arts 1093.
## 3 Activision 721.
## 4 Sony Computer Entertainment 607.
## 5 Ubisoft 473.
## 6 Take-Two Interactive 399.
## 7 THQ 340.
## 8 Konami Digital Entertainment 279.
## 9 Sega 271.
## 10 Namco Bandai Games 254.
## # ℹ 567 more rows
According to the results, Nintendo is the publisher with the highest global sales, amounting to 1,784 million units.
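#find the publishers of the 10 best-selling titles in each region
#(head(10) relies on long_gamedata already being sorted by Sales_value in descending order)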
top_10_publisher_NA<-long_gamedata %>%
filter(grepl("NA_Sales",Sales)) %>%
select(Name,Publisher,Sales_value,Sales) %>%
head(10)
top_10_publisher_EU<-long_gamedata %>%
filter(grepl("EU_Sales",Sales)) %>%
select(Name,Publisher,Sales_value,Sales) %>%
head(10)
top_10_publisher_JP<-long_gamedata %>%
filter(grepl("JP_Sales",Sales)) %>%
select(Name,Publisher,Sales_value,Sales) %>%
head(10)
top_10_publisher_Other<-long_gamedata %>%
filter(grepl("Other_Sales",Sales)) %>%
select(Name,Publisher,Sales_value,Sales) %>%
head(10)
Next, I merged the top 10 lists from each region to determine which publishers are popular across regions.
#merging the data
merged_top_10_publishers <- bind_rows(
top_10_publisher_NA,
top_10_publisher_EU,
top_10_publisher_JP,
top_10_publisher_Other)
#print the data
merged_top_10_publishers
## # A tibble: 40 × 4
## Name Publisher Sales_value Sales
## <chr> <chr> <dbl> <chr>
## 1 Wii Sports Nintendo 41.5 NA_Sales
## 2 Super Mario Bros. Nintendo 29.1 NA_Sales
## 3 Duck Hunt Nintendo 26.9 NA_Sales
## 4 Tetris Nintendo 23.2 NA_Sales
## 5 Mario Kart Wii Nintendo 15.8 NA_Sales
## 6 Wii Sports Resort Nintendo 15.8 NA_Sales
## 7 Kinect Adventures! Microsoft Game Studios 15.0 NA_Sales
## 8 New Super Mario Bros. Wii Nintendo 14.6 NA_Sales
## 9 Wii Play Nintendo 14.0 NA_Sales
## 10 Super Mario World Nintendo 12.8 NA_Sales
## # ℹ 30 more rows
#get the unique publishers
unique(merged_top_10_publishers$Publisher)
## [1] "Nintendo" "Microsoft Game Studios"
## [3] "Take-Two Interactive" "Capcom"
## [5] "Sony Computer Entertainment" "Konami Digital Entertainment"
#count how often each publisher appears
table(merged_top_10_publishers$Publisher)
##
## Capcom Konami Digital Entertainment
## 1 1
## Microsoft Game Studios Nintendo
## 1 33
## Sony Computer Entertainment Take-Two Interactive
## 1 3
From the table above, we can observe that of the 40 entries obtained by merging the 10 best-selling titles in each region, 33 were published by Nintendo, accounting for 82.5% of the total. Therefore, Nintendo emerges as the most popular publisher across multiple regions.
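A complementary way to check consistency across regions is to total each publisher's sales per region, keep the top publishers in each region, and count in how many regions each one appears. The sketch below does this on made-up data; the publisher names, values, and the cut-off `n = 2` are all assumptions for illustration, and this is an alternative to the title-based count above rather than the method used in this post.

```r
library(dplyr)

# Toy long-format data (made-up publishers and values)
toy_long <- tibble::tribble(
  ~Publisher, ~Sales,     ~Sales_value,
  "Pub A",    "NA_Sales", 10,
  "Pub B",    "NA_Sales",  6,
  "Pub C",    "NA_Sales",  1,
  "Pub A",    "EU_Sales",  7,
  "Pub B",    "EU_Sales",  2,
  "Pub C",    "EU_Sales",  3,
  "Pub A",    "JP_Sales",  4,
  "Pub C",    "JP_Sales",  5,
  "Pub B",    "JP_Sales",  1
)

regions_in_top <- toy_long %>%
  filter(Sales != "Global_Sales") %>%  # on the real data, excludes the global rows
  group_by(Sales, Publisher) %>%
  summarise(Total_Sales = sum(Sales_value), .groups = "drop_last") %>%
  slice_max(Total_Sales, n = 2, with_ties = FALSE) %>%  # top publishers per region
  ungroup() %>%
  count(Publisher, name = "regions_in_top", sort = TRUE)

regions_in_top
```

A publisher that appears in the top list of every region (here the hypothetical "Pub A") would be the kind of consistent performer the original question asks about.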
#load the writexl
library(writexl)
#export the data for visualization in Tableau
write_xlsx(long_gamedata,path = "long_gamedata.xlsx")
The line chart above shows clear trends in the sales data over time, especially during the 2000s. One possible reason is technological progress, particularly in console gaming, which contributed to higher video game sales. A closer look at sales by region during the 2000s shows that North America generated the highest sales, followed by Europe. The trend line for global sales suggests that the total sales value of video games may be increasing as the years go by. The regional trend lines show a sharp increase in North America and Europe as the years progress; in Japan and the rest of the world, sales may also be increasing, but not as sharply.
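A line chart of this kind is built from yearly totals rather than per-title figures. The sketch below shows one way such totals could be computed in R before charting; `toy_long` and its values are made up for illustration, and the Tableau workbook itself may aggregate differently.

```r
library(dplyr)

# Toy long-format data (made-up values), same column names as long_gamedata
toy_long <- tibble::tribble(
  ~Year, ~Sales,     ~Sales_value,
  2006,  "NA_Sales", 2.0,
  2006,  "NA_Sales", 1.0,
  2006,  "EU_Sales", 1.5,
  2007,  "NA_Sales", 4.0,
  2007,  "EU_Sales", 2.5,
  2007,  "EU_Sales", 0.5
)

# One total per region and year: the series a line chart would plot
yearly_totals <- toy_long %>%
  group_by(Sales, Year) %>%
  summarise(Total_Sales = sum(Sales_value), .groups = "drop")

yearly_totals
```

Yearly totals can rise while sales per title fall slightly, for example when many more titles are released each year, so an upward trend in totals does not contradict a weak negative per-title correlation.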
The analysis of the dataset reveals several key findings about the video game industry. Firstly, the industry is shaped by regional differences and societal trends. Specifically, Japan’s video game market exhibits distinct preferences in genres, publishers, and platforms compared to other regions such as Europe and North America. Europe and North America tend to have similar preferences, and other regions around the world follow these preferences as well, but to a lesser degree. Secondly, the market is dominated by large companies, which produce most of the top-selling games. Thirdly, the video game industry may no longer be experiencing the same level of growth as it did in the early 2000s. This could be attributed to advancements in technology, which have led the gaming industry to shift towards more accessible platforms such as mobile gaming, rather than relying on console or computer gaming, in order to reach a wider audience. Fourthly, North America generates the highest sales in the industry, followed by Europe, and the Action genre dominates in terms of popularity. Lastly, in terms of global sales, Nintendo emerges as the leading publisher.