Introduction

For the capstone project, I selected a video game sales analysis. In this case study, I act as a junior data analyst on the marketing team of a small game company in Chicago. The company wants to create a new game, so I need to find out which game genre is the most popular and which region is the largest consumer group. I use RStudio for the analysis.

Ask

1. How many games were sold globally from 2013 to 2020?

2. Which region has the largest sales volume?

3. Which game genre is the most popular, and why?

4. Which market has the lowest sales globally?

5. Which platform is the most popular?

Prepare

First, load the required packages.

###Setting the language and loading packages
Sys.setenv(LANG="en")
setwd("~/Project 03")
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.7     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(skimr)
## Warning: package 'skimr' was built under R version 4.1.3
library(janitor)
## Warning: package 'janitor' was built under R version 4.1.3
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(scales)
## Warning: package 'scales' was built under R version 4.1.3
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(dplyr)
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.1.3
library(ggplot2)

Process

With the packages loaded, the next step is cleaning the data. The raw files may contain dirty data that would affect the analysis, so I use clean_names() to standardize the column names, and plot_intro(), plot_missing() and profile_missing() to check the structure of each dataset and find missing values. All cleaning and processing is done in RStudio.
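To make the effect of these two steps concrete, here is a minimal sketch on a small toy data frame. The column names mirror the PS4 file, but the values are hypothetical. It shows how clean_names() rewrites the names to snake_case and how colSums(is.na()) counts missing values per column, which is the same check used below.

```r
library(janitor)

# Toy data frame (hypothetical values) with the same column names as PS4_GamesSales.csv
toy <- data.frame(
  Game          = c("Game A", "Game B", "Game C"),
  North.America = c(1.20, 0.50, 0.30),
  Rest.of.World = c(0.40, 0.10, 0.05),
  Global        = c(2.00, NA, 0.60)
)

# clean_names() converts the column names to snake_case (e.g. North.America -> north_america)
toy <- clean_names(toy)
names(toy)

# Count the missing values in each column
colSums(is.na(toy))
```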

###Loading the data
salesps4<-read.csv("~/Project 03/PS4_GamesSales.csv")
salesxbox<-read.csv("~/Project 03/XboxOne_GameSales.csv")
videogamessales<-read.csv("~/Project 03/Video_Games_Sales_as_at_22_Dec_2016.csv")

###Checking the data structure
glimpse(salesps4)
## Rows: 1,034
## Columns: 9
## $ Game          <chr> "Grand Theft Auto V", "Call of Duty: Black Ops 3", "Red ~
## $ Year          <chr> "2014", "2015", "2018", "2017", "2017", "2016", "2016", ~
## $ Genre         <chr> "Action", "Shooter", "Action-Adventure", "Shooter", "Spo~
## $ Publisher     <chr> "Rockstar Games", "Activision", "Rockstar Games", "Activ~
## $ North.America <dbl> 6.06, 6.18, 5.26, 4.67, 1.27, 1.26, 4.49, 3.64, 3.11, 2.~
## $ Europe        <dbl> 9.71, 6.05, 6.21, 6.21, 8.64, 7.95, 3.93, 3.39, 3.83, 3.~
## $ Japan         <dbl> 0.60, 0.41, 0.21, 0.40, 0.15, 0.12, 0.21, 0.32, 0.19, 0.~
## $ Rest.of.World <dbl> 3.02, 2.44, 2.26, 2.12, 1.73, 1.61, 1.70, 1.41, 1.36, 1.~
## $ Global        <dbl> 19.39, 15.09, 13.94, 13.40, 11.80, 10.94, 10.33, 8.76, 8~
glimpse(salesxbox)
## Rows: 613
## Columns: 10
## $ Pos           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1~
## $ Game          <chr> "Grand Theft Auto V", "Call of Duty: Black Ops 3", "Call~
## $ Year          <chr> "2014", "2015", "2017", "2018", "2014", "2014", "2016", ~
## $ Genre         <chr> "Action", "Shooter", "Shooter", "Action-Adventure", "Mis~
## $ Publisher     <chr> "Rockstar Games", "Activision", "Activision", "Rockstar ~
## $ North.America <dbl> 4.70, 4.63, 3.75, 3.76, 3.23, 3.25, 3.37, 2.94, 2.94, 2.~
## $ Europe        <dbl> 3.25, 2.04, 1.91, 1.47, 1.71, 1.49, 1.26, 1.62, 1.49, 1.~
## $ Japan         <dbl> 0.01, 0.02, 0.00, 0.00, 0.00, 0.01, 0.02, 0.02, 0.03, 0.~
## $ Rest.of.World <dbl> 0.76, 0.68, 0.57, 0.54, 0.49, 0.48, 0.48, 0.45, 0.45, 0.~
## $ Global        <dbl> 8.72, 7.37, 6.23, 5.77, 5.43, 5.22, 5.13, 5.03, 4.92, 4.~
glimpse(videogamessales)
## Rows: 16,719
## Columns: 16
## $ Name            <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "~
## $ Platform        <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "~
## $ Year_of_Release <chr> "2006", "1985", "2008", "2009", "1996", "1989", "2006"~
## $ Genre           <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playin~
## $ Publisher       <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Ninte~
## $ NA_Sales        <dbl> 41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.28, 13.96~
## $ EU_Sales        <dbl> 28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9.18, 6.9~
## $ JP_Sales        <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70,~
## $ Other_Sales     <dbl> 8.45, 0.77, 3.29, 2.95, 1.00, 0.58, 2.88, 2.84, 2.24, ~
## $ Global_Sales    <dbl> 82.53, 40.24, 35.52, 32.77, 31.37, 30.26, 29.80, 28.92~
## $ Critic_Score    <int> 76, NA, 82, 80, NA, NA, 89, 58, 87, NA, NA, 91, NA, 80~
## $ Critic_Count    <int> 51, NA, 73, 73, NA, NA, 65, 41, 80, NA, NA, 64, NA, 63~
## $ User_Score      <dbl> 8.0, NA, 8.3, 8.0, NA, NA, 8.5, 6.6, 8.4, NA, NA, 8.6,~
## $ User_Count      <int> 322, NA, 709, 192, NA, NA, 431, 129, 594, NA, NA, 464,~
## $ Developer       <chr> "Nintendo", "", "Nintendo", "Nintendo", "", "", "Ninte~
## $ Rating          <chr> "E", "", "E", "E", "", "", "E", "E", "E", "", "", "E",~
###Dropping the Pos (rank) column so the Xbox One data has the same columns as the PS4 data
salesxbox<-salesxbox[,-1]
glimpse(salesxbox)
## Rows: 613
## Columns: 9
## $ Game          <chr> "Grand Theft Auto V", "Call of Duty: Black Ops 3", "Call~
## $ Year          <chr> "2014", "2015", "2017", "2018", "2014", "2014", "2016", ~
## $ Genre         <chr> "Action", "Shooter", "Shooter", "Action-Adventure", "Mis~
## $ Publisher     <chr> "Rockstar Games", "Activision", "Activision", "Rockstar ~
## $ North.America <dbl> 4.70, 4.63, 3.75, 3.76, 3.23, 3.25, 3.37, 2.94, 2.94, 2.~
## $ Europe        <dbl> 3.25, 2.04, 1.91, 1.47, 1.71, 1.49, 1.26, 1.62, 1.49, 1.~
## $ Japan         <dbl> 0.01, 0.02, 0.00, 0.00, 0.00, 0.01, 0.02, 0.02, 0.03, 0.~
## $ Rest.of.World <dbl> 0.76, 0.68, 0.57, 0.54, 0.49, 0.48, 0.48, 0.45, 0.45, 0.~
## $ Global        <dbl> 8.72, 7.37, 6.23, 5.77, 5.43, 5.22, 5.13, 5.03, 4.92, 4.~
###Checking the data visually
salesps4 %>% plot_intro()

salesps4 %>% plot_missing()

salesps4 %>% profile_missing()
##         feature num_missing pct_missing
## 1          Game           0   0.0000000
## 2          Year           0   0.0000000
## 3         Genre           0   0.0000000
## 4     Publisher           0   0.0000000
## 5 North.America           0   0.0000000
## 6        Europe           0   0.0000000
## 7         Japan           0   0.0000000
## 8 Rest.of.World           0   0.0000000
## 9        Global          11   0.0106383
salesxbox %>% plot_intro()

salesxbox %>% plot_missing()

salesxbox %>% profile_missing()
##         feature num_missing pct_missing
## 1          Game           0  0.00000000
## 2          Year           0  0.00000000
## 3         Genre           0  0.00000000
## 4     Publisher           0  0.00000000
## 5 North.America           0  0.00000000
## 6        Europe           0  0.00000000
## 7         Japan           0  0.00000000
## 8 Rest.of.World           0  0.00000000
## 9        Global           8  0.01305057
videogamessales %>% plot_intro()

videogamessales %>% plot_missing()

videogamessales %>% profile_missing()
##            feature num_missing  pct_missing
## 1             Name           0 0.0000000000
## 2         Platform           0 0.0000000000
## 3  Year_of_Release           0 0.0000000000
## 4            Genre           0 0.0000000000
## 5        Publisher           0 0.0000000000
## 6         NA_Sales           0 0.0000000000
## 7         EU_Sales           0 0.0000000000
## 8         JP_Sales           0 0.0000000000
## 9      Other_Sales           0 0.0000000000
## 10    Global_Sales           2 0.0001196244
## 11    Critic_Score        8582 0.5133082122
## 12    Critic_Count        8582 0.5133082122
## 13      User_Score        9129 0.5460254800
## 14      User_Count        9129 0.5460254800
## 15       Developer           0 0.0000000000
## 16          Rating           0 0.0000000000
###Cleaning the data
salesps4<-clean_names(salesps4)
salesxbox<-clean_names(salesxbox)
videogamessales<-clean_names(videogamessales)

###Counting the NA values in each column
colSums(is.na(salesps4))
##          game          year         genre     publisher north_america 
##             0             0             0             0             0 
##        europe         japan rest_of_world        global 
##             0             0             0            11
colSums(is.na(salesxbox))
##          game          year         genre     publisher north_america 
##             0             0             0             0             0 
##        europe         japan rest_of_world        global 
##             0             0             0             8
colSums(is.na(videogamessales))
##            name        platform year_of_release           genre       publisher 
##               0               0               0               0               0 
##        na_sales        eu_sales        jp_sales     other_sales    global_sales 
##               0               0               0               0               2 
##    critic_score    critic_count      user_score      user_count       developer 
##            8582            8582            9129            9129               0 
##          rating 
##               0
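###Filtering out rows with NA
salesps4<-filter(salesps4,!is.na(global))
salesxbox<-filter(salesxbox,!is.na(global))
###Note: requiring critic and user scores also drops every game without reviews (more than half of the rows in videogamessales)
videogamessales<-filter(videogamessales,!is.na(global_sales) & !is.na(critic_score) & !is.na(critic_count) & !is.na(user_score) & !is.na(user_count))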
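The same filtering can be written more compactly with tidyr::drop_na(), which removes rows that have NA in any of the listed columns. The sketch below uses a small hypothetical data frame to show how much the choice of columns matters. Requiring scores as well as sales removes far more rows than requiring sales alone, which is why so much of videogamessales is lost in this step.

```r
library(tidyr)

# Toy data frame (hypothetical values): one game lacks sales, two lack critic scores
toy <- data.frame(
  name         = c("A", "B", "C", "D"),
  global_sales = c(1.5, NA, 0.8, 0.3),
  critic_score = c(80, 75, NA, NA)
)

# Drop rows with missing sales only (as done for the PS4 and Xbox One data)
sales_only <- drop_na(toy, global_sales)                    # keeps A, C, D

# Drop rows with missing sales OR missing scores (as done for videogamessales)
sales_and_scores <- drop_na(toy, global_sales, critic_score) # keeps only A

sales_only
sales_and_scores
```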

###Checking the data structure again
salesps4 %>% plot_intro()

salesps4 %>% plot_missing()

salesps4 %>% profile_missing()
##         feature num_missing pct_missing
## 1          game           0           0
## 2          year           0           0
## 3         genre           0           0
## 4     publisher           0           0
## 5 north_america           0           0
## 6        europe           0           0
## 7         japan           0           0
## 8 rest_of_world           0           0
## 9        global           0           0
salesxbox %>% plot_intro()

salesxbox %>% plot_missing()

salesxbox %>% profile_missing()
##         feature num_missing pct_missing
## 1          game           0           0
## 2          year           0           0
## 3         genre           0           0
## 4     publisher           0           0
## 5 north_america           0           0
## 6        europe           0           0
## 7         japan           0           0
## 8 rest_of_world           0           0
## 9        global           0           0
videogamessales %>% plot_intro()

videogamessales %>% plot_missing()

videogamessales %>% profile_missing()
##            feature num_missing pct_missing
## 1             name           0           0
## 2         platform           0           0
## 3  year_of_release           0           0
## 4            genre           0           0
## 5        publisher           0           0
## 6         na_sales           0           0
## 7         eu_sales           0           0
## 8         jp_sales           0           0
## 9      other_sales           0           0
## 10    global_sales           0           0
## 11    critic_score           0           0
## 12    critic_count           0           0
## 13      user_score           0           0
## 14      user_count           0           0
## 15       developer           0           0
## 16          rating           0           0
###Check NA again
colSums(is.na(salesps4))
##          game          year         genre     publisher north_america 
##             0             0             0             0             0 
##        europe         japan rest_of_world        global 
##             0             0             0             0
colSums(is.na(salesxbox))
##          game          year         genre     publisher north_america 
##             0             0             0             0             0 
##        europe         japan rest_of_world        global 
##             0             0             0             0
colSums(is.na(videogamessales))
##            name        platform year_of_release           genre       publisher 
##               0               0               0               0               0 
##        na_sales        eu_sales        jp_sales     other_sales    global_sales 
##               0               0               0               0               0 
##    critic_score    critic_count      user_score      user_count       developer 
##               0               0               0               0               0 
##          rating 
##               0
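###Keeping only the 2013 to 2020 period
###year is stored as text, so these are string comparisons; they work because every year has four digits
###PS4 and Xbox One launched in 2013, so their data only needs an upper bound
salesps4<-subset(salesps4,year<="2020")
salesxbox<-subset(salesxbox,year<="2020")
videogamessales<-subset(videogamessales, year_of_release<="2020" & year_of_release>="2013")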
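Because glimpse() shows the year columns as `<chr>`, the subset() calls above compare strings, not numbers. A slightly safer variant converts the years to integers first, so that placeholder text such as "N/A" becomes NA and is excluded explicitly. This is a minimal sketch on a hypothetical vector of year strings, not the original code.

```r
# Hypothetical year values as they might appear in a character column
year <- c("2012", "2013", "2016", "2020", "2021", "N/A")

# Convert to integer; non-numeric text such as "N/A" becomes NA (warning suppressed)
year_num <- suppressWarnings(as.integer(year))

# Keep only real years inside the 2013 to 2020 window
keep <- !is.na(year_num) & year_num >= 2013 & year_num <= 2020

year[keep]
```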

Analysis

With the data cleaned and processed, I plot bar charts of PS4 and Xbox One game sales by region and year, from 2013 to 2020. The charts are shown below.
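The chart calls in this section differ only in the data frame, the column plotted, the labels and the position of the total. A small helper function can remove that repetition. This is a sketch, and the name plot_sales() and its arguments are hypothetical. It uses geom_col(), which is equivalent to geom_bar(stat = "identity"), and annotate() so the total is drawn once rather than once per row. It also uses fill instead of color, because color only sets the bar outline in ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not part of the original analysis): one bar chart of a
# sales column against a category column, with the column total printed once
plot_sales <- function(data, x, y, x_lab, y_lab, title, tag, label_y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_col(fill = "lightblue") +               # same as geom_bar(stat = "identity")
    labs(x = x_lab, y = y_lab, title = title, tag = tag) +
    annotate("text", x = 1, y = label_y,         # a single text label at the first bar
             label = round(sum(data[[y]]), 2))
}

# Usage with a toy data frame (hypothetical values)
toy <- data.frame(year = c("2014", "2014", "2015"), global = c(19.39, 8.72, 15.09))
p <- plot_sales(toy, "year", "global", "Year", "Global Sales",
                "Global Sales Volume from 2013 to 2020", "PS4", label_y = 30)
```

Calling print(p) draws the chart. The same helper could be called with salesps4 or salesxbox and any of the regional columns.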

###geom_col() is the same as geom_bar(stat='identity'); annotate() draws the total once instead of once per row
ggplot(salesps4, aes(x = year, y = north_america)) +
  geom_col(color = 'lightblue') +
  labs(x = 'Year', y = 'North America Sales', title = "North America Sales Volume from 2013 to 2020", tag = "PS4") +
  annotate("text", x = 1, y = 50, label = sum(salesps4$north_america))

ggplot(salesps4, aes(x = year, y = europe)) +
  geom_col(color = 'lightblue') +
  labs(x = 'Year', y = 'Europe Sales', title = "Europe Sales Volume from 2013 to 2020", tag = "PS4") +
  annotate("text", x = 1, y = 50, label = sum(salesps4$europe))

ggplot(salesps4, aes(x = year, y = japan)) +
  geom_col(color = 'lightblue') +
  labs(x = 'Year', y = 'Japan Sales', title = "Japan Sales Volume from 2013 to 2020", tag = "PS4") +
  annotate("text", x = 1, y = 20, label = sum(salesps4$japan))

ggplot(salesps4, aes(x = year, y = rest_of_world)) +
  geom_col(color = 'lightblue') +
  labs(x = 'Year', y = 'Rest of World Sales', title = "Rest of World Sales Volume from 2013 to 2020", tag = "PS4") +
  annotate("text", x = 1, y = 30, label = sum(salesps4$rest_of_world))

ggplot(salesps4, aes(x = year, y = global)) +
  geom_col(color = 'lightblue') +
  labs(x = 'Year', y = 'Global Sales', title = "Global Sales Volume from 2013 to 2020", tag = "PS4") +
  annotate("text", x = 1, y = 150, label = sum(salesps4$global))

ggplot(salesxbox, aes(x = year, y = north_america)) +
  geom_col(color = 'lightgreen') +
  labs(x = 'Year', y = 'North America Sales', title = "North America Sales Volume from 2013 to 2020", tag = "Xbox One") +
  annotate("text", x = 1, y = 50, label = sum(salesxbox$north_america))

ggplot(salesxbox, aes(x = year, y = europe)) +
  geom_col(color = 'lightgreen') +
  labs(x = 'Year', y = 'Europe Sales', title = "Europe Sales Volume from 2013 to 2020", tag = "Xbox One") +
  annotate("text", x = 1, y = 30, label = sum(salesxbox$europe))

ggplot(salesxbox, aes(x = year, y = japan)) +
  geom_col(color = 'lightgreen') +
  labs(x = 'Year', y = 'Japan Sales', title = "Japan Sales Volume from 2013 to 2020", tag = "Xbox One") +
  annotate("text", x = 1, y = 1.5, label = sum(salesxbox$japan))

ggplot(salesxbox, aes(x = year, y = rest_of_world)) +
  geom_col(color = 'lightgreen') +
  labs(x = 'Year', y = 'Rest of World Sales', title = "Rest of World Sales Volume from 2013 to 2020", tag = "Xbox One") +
  annotate("text", x = 1, y = 8, label = sum(salesxbox$rest_of_world))

ggplot(salesxbox, aes(x = year, y = global)) +
  geom_col(color = 'lightgreen') +
  labs(x = 'Year', y = 'Global Sales', title = "Global Sales Volume from 2013 to 2020", tag = "Xbox One") +
  annotate("text", x = 1, y = 70, label = sum(salesxbox$global))

On both PS4 and Xbox One, Japan is the market with the lowest sales.
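Comparing ten separate charts by eye is error-prone. The regional totals can instead be placed side by side in one sorted vector. This is a minimal sketch that assumes the cleaned column names. The helper name region_totals() and the toy values are hypothetical.

```r
# Hypothetical helper: total sales per region, largest first
region_totals <- function(df) {
  regions <- c("north_america", "europe", "japan", "rest_of_world")
  sort(colSums(df[regions]), decreasing = TRUE)
}

# Toy data frame (hypothetical values, in millions of units)
toy <- data.frame(
  north_america = c(5.0, 3.0),
  europe        = c(4.0, 2.0),
  japan         = c(0.5, 0.2),
  rest_of_world = c(1.0, 1.0)
)

totals <- region_totals(toy)
totals
names(totals)[length(totals)]   # the smallest market
```

With the real data this would be region_totals(salesps4) or region_totals(salesxbox).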

Next, I count the number of games in each genre to find the most popular genre.

salesps4genre<-salesps4 %>% group_by(genre) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
salesps4genre
## # A tibble: 17 x 3
##    genre            Count  Perc
##    <chr>            <int> <dbl>
##  1 Action             204 25   
##  2 Role-Playing       105 12.9 
##  3 Shooter             74  9.07
##  4 Adventure           70  8.58
##  5 Sports              69  8.46
##  6 Misc                53  6.5 
##  7 Racing              47  5.76
##  8 Action-Adventure    38  4.66
##  9 Platform            33  4.04
## 10 Fighting            32  3.92
## 11 Strategy            25  3.06
## 12 Simulation          21  2.57
## 13 Music               18  2.21
## 14 Puzzle               9  1.1 
## 15 MMO                  8  0.98
## 16 Visual Novel         8  0.98
## 17 Party                2  0.25
salesxboxgenre<-salesxbox %>% group_by(genre) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
salesxboxgenre
## # A tibble: 16 x 3
##    genre            Count  Perc
##    <chr>            <int> <dbl>
##  1 Action             116  23.2
##  2 Shooter             65  13  
##  3 Sports              58  11.6
##  4 Racing              45   9  
##  5 Adventure           36   7.2
##  6 Role-Playing        35   7  
##  7 Misc                29   5.8
##  8 Action-Adventure    28   5.6
##  9 Simulation          19   3.8
## 10 Platform            18   3.6
## 11 Fighting            16   3.2
## 12 Strategy            14   2.8
## 13 Music               13   2.6
## 14 Puzzle               4   0.8
## 15 MMO                  2   0.4
## 16 Visual Novel         2   0.4

The tables above count the number of titles in each genre. ‘Action’ is the most common genre on both platforms, making up 25% of PS4 titles and 23.2% of Xbox One titles.
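Because these tables count titles rather than copies sold, they show which genre is released most often. dplyr's count() produces the same kind of table more compactly. Its wt argument turns the count into a sales total, which answers "most sales" directly. This is a sketch on hypothetical data, where the two measures disagree on purpose.

```r
library(dplyr)

# Toy data frame (hypothetical titles and global sales, in millions)
toy <- data.frame(
  genre  = c("Action", "Action", "Shooter", "Sports"),
  global = c(1.0, 2.0, 6.0, 0.5)
)

# Share of titles per genre (same idea as group_by() + summarise(n()) above)
genre_titles <- toy %>%
  count(genre, sort = TRUE) %>%
  mutate(Perc = round(n / sum(n) * 100, 2))

# Share of sales per genre: each row is weighted by its global sales
genre_sales <- toy %>%
  count(genre, wt = global, sort = TRUE, name = "sales") %>%
  mutate(Perc = round(sales / sum(sales) * 100, 2))

genre_titles
genre_sales
```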

ggplot(salesps4, aes(x = genre, y = north_america)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "North America Sales", title = "North America Sales Volume by Genre", tag = "PS4") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 75, label = sum(salesps4$north_america))

ggplot(salesps4, aes(x = genre, y = europe)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Europe Sales", title = "Europe Sales Volume by Genre", tag = "PS4") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 75, label = sum(salesps4$europe))

ggplot(salesps4, aes(x = genre, y = japan)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Japan Sales", title = "Japan Sales Volume by Genre", tag = "PS4") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 13, label = sum(salesps4$japan))

ggplot(salesps4, aes(x = genre, y = rest_of_world)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Rest of World Sales", title = "Rest of World Sales Volume by Genre", tag = "PS4") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 30, label = sum(salesps4$rest_of_world))

ggplot(salesps4, aes(x = genre, y = global)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Global Sales", title = "Global Sales Volume by Genre", tag = "PS4") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 160, label = sum(salesps4$global))

ggplot(salesxbox, aes(x = genre, y = north_america)) +
  geom_col(color = 'green') +
  labs(x = "Genre", y = "North America Sales", title = "North America Sales Volume by Genre", tag = "Xbox One") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 60, label = sum(salesxbox$north_america))

ggplot(salesxbox, aes(x = genre, y = europe)) +
  geom_col(color = 'green') +
  labs(x = "Genre", y = "Europe Sales", title = "Europe Sales Volume by Genre", tag = "Xbox One") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 30, label = sum(salesxbox$europe))

ggplot(salesxbox, aes(x = genre, y = japan)) +
  geom_col(color = 'green') +
  labs(x = "Genre", y = "Japan Sales", title = "Japan Sales Volume by Genre", tag = "Xbox One") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 5, label = sum(salesxbox$japan))

ggplot(salesxbox, aes(x = genre, y = rest_of_world)) +
  geom_col(color = 'green') +
  labs(x = "Genre", y = "Rest of World Sales", title = "Rest of World Sales Volume by Genre", tag = "Xbox One") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 10, label = sum(salesxbox$rest_of_world))

ggplot(salesxbox, aes(x = genre, y = global)) +
  geom_col(color = 'green') +
  labs(x = "Genre", y = "Global Sales", title = "Global Sales Volume by Genre", tag = "Xbox One") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 100, label = sum(salesxbox$global))

salesps4 %>% group_by(publisher) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
## # A tibble: 147 x 3
##    publisher                              Count  Perc
##    <chr>                                  <int> <dbl>
##  1 Namco Bandai Games                        56  6.86
##  2 Sony Interactive Entertainment            47  5.76
##  3 Ubisoft                                   44  5.39
##  4 Square Enix                               40  4.9 
##  5 Tecmo Koei                                37  4.53
##  6 Activision                                30  3.68
##  7 Capcom                                    30  3.68
##  8 Warner Bros. Interactive Entertainment    27  3.31
##  9 Sony Computer Entertainment               25  3.06
## 10 Electronic Arts                           21  2.57
## # ... with 137 more rows
salesxbox %>% group_by(publisher) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
## # A tibble: 96 x 3
##    publisher                              Count  Perc
##    <chr>                                  <int> <dbl>
##  1 Ubisoft                                   44   8.8
##  2 Microsoft Studios                         31   6.2
##  3 Activision                                29   5.8
##  4 Warner Bros. Interactive Entertainment    26   5.2
##  5 Electronic Arts                           22   4.4
##  6 Capcom                                    19   3.8
##  7 EA Sports                                 19   3.8
##  8 Namco Bandai Games                        19   3.8
##  9 505 Games                                 16   3.2
## 10 THQ Nordic                                16   3.2
## # ... with 86 more rows

Comparing the two platforms, Action games have the highest sales on PS4, while ‘Shooter’ games have the highest sales on Xbox One.
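The best-selling genre in each region can also be computed directly instead of being read off the charts. This is a sketch that assumes the cleaned regional column names. It reshapes the data to one row per genre and region with tidyr::pivot_longer(), sums the sales, and keeps the top genre in each region with dplyr::slice_max(). The toy values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data frame (hypothetical values) with the cleaned regional columns
toy <- data.frame(
  genre         = c("Action", "Shooter", "Sports"),
  north_america = c(3.0, 4.0, 1.0),
  europe        = c(5.0, 2.0, 3.0),
  japan         = c(0.4, 0.1, 0.2)
)

top_genre <- toy %>%
  pivot_longer(north_america:japan, names_to = "region", values_to = "sales") %>%
  group_by(region, genre) %>%
  summarise(sales = sum(sales), .groups = "drop_last") %>%  # still grouped by region
  slice_max(sales, n = 1) %>%                                # best genre per region
  ungroup()

top_genre
```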

ggplot(videogamessales, aes(x = genre, y = na_sales)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "North America Sales", title = "North America Sales Volume by Genre") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 100, label = sum(videogamessales$na_sales))

ggplot(videogamessales, aes(x = genre, y = eu_sales)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Europe Sales", title = "Europe Sales Volume by Genre") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 100, label = sum(videogamessales$eu_sales))

ggplot(videogamessales, aes(x = genre, y = jp_sales)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Japan Sales", title = "Japan Sales Volume by Genre") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 25, label = sum(videogamessales$jp_sales))

ggplot(videogamessales, aes(x = genre, y = other_sales)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Rest of World Sales", title = "Rest of World Sales Volume by Genre") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 30, label = sum(videogamessales$other_sales))

ggplot(videogamessales, aes(x = genre, y = global_sales)) +
  geom_col(color = 'blue') +
  labs(x = "Genre", y = "Global Sales", title = "Global Sales Volume by Genre") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 250, label = sum(videogamessales$global_sales))

ggplot(videogamessales, aes(x = platform, y = na_sales)) +
  geom_col(color = 'yellow') +
  labs(x = "Platform", y = "North America Sales", title = "North America Sales Volume by Platform") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 80, label = sum(videogamessales$na_sales))

ggplot(videogamessales, aes(x = platform, y = eu_sales)) +
  geom_col(color = 'yellow') +
  labs(x = "Platform", y = "Europe Sales", title = "Europe Sales Volume by Platform") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 110, label = sum(videogamessales$eu_sales))

ggplot(videogamessales, aes(x = platform, y = jp_sales)) +
  geom_col(color = 'yellow') +
  labs(x = "Platform", y = "Japan Sales", title = "Japan Sales Volume by Platform") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 25, label = sum(videogamessales$jp_sales))

ggplot(videogamessales, aes(x = platform, y = other_sales)) +
  geom_col(color = 'yellow') +
  labs(x = "Platform", y = "Rest of World Sales", title = "Rest of World Sales Volume by Platform") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 50, label = sum(videogamessales$other_sales))

ggplot(videogamessales, aes(x = platform, y = global_sales)) +
  geom_col(color = 'yellow') +
  labs(x = "Platform", y = "Global Sales", title = "Global Sales Volume by Platform") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12),
        axis.text.y = element_text(vjust = 1, hjust = 1, size = 12)) +
  annotate("text", x = 1, y = 250, label = sum(videogamessales$global_sales))

videogamessales %>% group_by(platform) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
## # A tibble: 9 x 3
##   platform Count  Perc
##   <chr>    <int> <dbl>
## 1 PS4        249 25.5 
## 2 XOne       165 16.9 
## 3 PC         148 15.2 
## 4 PS3        120 12.3 
## 5 X360        81  8.3 
## 6 PSV         76  7.79
## 7 WiiU        69  7.07
## 8 3DS         67  6.86
## 9 PSP          1  0.1
videogamessales %>% group_by(genre) %>% summarise(Count = n(),Perc=round(n()/nrow(.)*100,2)) %>% arrange(desc(Count))
## # A tibble: 12 x 3
##    genre        Count  Perc
##    <chr>        <int> <dbl>
##  1 Action         309 31.7 
##  2 Shooter        132 13.5 
##  3 Role-Playing   127 13.0 
##  4 Sports         110 11.3 
##  5 Racing          59  6.05
##  6 Platform        50  5.12
##  7 Adventure       46  4.71
##  8 Fighting        42  4.3 
##  9 Misc            42  4.3 
## 10 Simulation      26  2.66
## 11 Strategy        26  2.66
## 12 Puzzle           7  0.72

Across all platforms, Action is the most popular genre and Shooter is the second most popular, accounting for 31.7% and 13.5% of titles respectively. PS4 has the best global sales and the largest share of titles of any platform (25.5%).
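The 25.5% in the platform table is PS4's share of titles in the filtered dataset, not its share of copies sold. Computing both shares side by side shows whether the two measures agree. This is a minimal sketch on hypothetical data that assumes the cleaned column names platform and global_sales.

```r
library(dplyr)

# Toy data frame (hypothetical values): one row per title
toy <- data.frame(
  platform     = c("PS4", "PS4", "XOne", "PC"),
  global_sales = c(5.0, 1.0, 3.0, 0.5)
)

platform_share <- toy %>%
  group_by(platform) %>%
  summarise(titles = n(), sales = sum(global_sales)) %>%  # result is ungrouped
  mutate(
    title_pct = round(titles / sum(titles) * 100, 1),    # share of titles
    sales_pct = round(sales / sum(sales) * 100, 1)       # share of global sales
  ) %>%
  arrange(desc(sales))

platform_share
```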

Conclusion

From 2013 to 2020, total game sales were 594.79 million units on PS4 and 268.73 million units on Xbox One. In the all-platform dataset, total sales were 714.07 million units. This is lower than the two console totals combined, most likely because that dataset is a snapshot as of December 2016 and was filtered to games that have critic and user scores. North America has the greatest sales and is the largest consumer group globally, while Japan is the smallest market.

Across all genres, Action and Shooter games are the most popular, accounting for 31.7% and 13.5% of titles respectively. One possible reason is that action games are easy to control, and many of them have engaging, heroic storylines. Shooter games usually offer both single-player and multiplayer modes, a design that adds to the fun. Across all platforms, PS4 has the best global sales and the largest share of titles (25.5%).
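As a quick arithmetic check on the console figures above, the PS4 and Xbox One totals can be combined. This assumes both datasets count sales in the same units (millions). On that basis PS4 accounts for roughly 69% of the combined console total.

```r
# Totals reported above, in millions of units
ps4_total  <- 594.79
xbox_total <- 268.73

combined  <- ps4_total + xbox_total                 # about 863.52
ps4_share <- round(ps4_total / combined * 100, 1)   # PS4 share of the two-console total, in %

combined
ps4_share
```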

Share

Video Games Sales

Presentation: Video Games Sales