Intro

The game industry is one of the fastest-growing industries in the world right now, and game manufacturers have produced many consoles over the years. We will look at the sales generated by these consoles and create some visualizations. You can download the data here

Library

Load the required packages.

library(ggplot2)
library(tidyverse)

Data preparation

Load the dataset

console <- read.csv("data/console.csv")

A brief look at the data

glimpse(console)
## Rows: 44
## Columns: 6
## $ ConsoleID    <chr> "PC", "PS2", "DS", "GB", "PS4", "PS", "Wii", "PS3", "X360~
## $ Console_Name <chr> "Personal Computer", "PlayStation 2", "Nintendo DS", "Gam~
## $ Manufacturer <chr> "Computer", "Sony", "Nintendo", "Nintendo", "Sony", "Sony~
## $ Release_Year <int> 1975, 2000, 2004, 1989, 2013, 1994, 2006, 2006, 2005, 200~
## $ Sales        <dbl> 1000.00, 155.00, 154.02, 118.69, 108.90, 102.49, 101.63, ~
## $ Type         <chr> "Home", "Home", "Handheld", "Handheld", "Home", "Home", "~

We can see that the data consists of 44 rows and 6 columns:

ConsoleID: console ID

Console_Name: full console name

Manufacturer: console manufacturer

Release_Year: year the console was released

Sales: number of units sold (in millions)

Type: type of console (Home or Handheld)

Data Manipulation

Manufacturer and Type are categorical variables stored as character, so we convert them to factors.

console <- console %>% 
 mutate(Manufacturer = as.factor(Manufacturer)) %>% 
 mutate(Type = as.factor(Type))
glimpse(console)
## Rows: 44
## Columns: 6
## $ ConsoleID    <chr> "PC", "PS2", "DS", "GB", "PS4", "PS", "Wii", "PS3", "X360~
## $ Console_Name <chr> "Personal Computer", "PlayStation 2", "Nintendo DS", "Gam~
## $ Manufacturer <fct> Computer, Sony, Nintendo, Nintendo, Sony, Sony, Nintendo,~
## $ Release_Year <int> 1975, 2000, 2004, 1989, 2013, 1994, 2006, 2006, 2005, 200~
## $ Sales        <dbl> 1000.00, 155.00, 154.02, 118.69, 108.90, 102.49, 101.63, ~
## $ Type         <fct> Home, Home, Handheld, Handheld, Home, Home, Home, Home, H~
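One practical effect of this conversion is on summary(): a character column only reports its length, class and mode, while a factor column reports a count per level. The minimal sketch below illustrates this on the first seven Manufacturer and Type values shown by glimpse() above; it is an illustration only, not the full dataset.

```r
# First seven Manufacturer and Type values from the glimpse() output (illustration only)
manufacturer_chr <- c("Computer", "Sony", "Nintendo", "Nintendo", "Sony", "Sony", "Nintendo")
type_chr <- c("Home", "Home", "Handheld", "Handheld", "Home", "Home", "Home")

# As a character vector, summary() only reports length, class and mode
summary(manufacturer_chr)

# As factors, summary() counts how often each level occurs
manufacturer_fct <- as.factor(manufacturer_chr)
type_fct <- as.factor(type_chr)
summary(manufacturer_fct)   # Computer 1, Nintendo 3, Sony 3
summary(type_fct)           # Handheld 2, Home 5

# Factor levels are sorted alphabetically by default
levels(type_fct)            # "Handheld" "Home"
```

This is why the summary() output later in this post shows counts such as "Nintendo :14" and "Home :34" instead of "Class :character".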

Check missing values

colSums(is.na(console))
##    ConsoleID Console_Name Manufacturer Release_Year        Sales         Type 
##            0            0            0            0            0            0

Great! No missing values.
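The check works because is.na() returns a logical matrix and colSums() counts TRUE as 1 and FALSE as 0. A minimal sketch on a hypothetical toy data frame (`toy` is made up for illustration, with two NAs planted on purpose):

```r
# Hypothetical toy data frame with one missing value in each column
toy <- data.frame(
  ConsoleID = c("PC", "PS2", NA),
  Sales     = c(1000, NA, 154.02)
)

# is.na() gives one logical value per cell: TRUE where the value is missing
na_matrix <- is.na(toy)

# colSums() adds up the TRUEs in each column, i.e. counts the NAs per column
na_counts <- colSums(na_matrix)
na_counts   # ConsoleID 1, Sales 1
```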

Check duplicates

sum(duplicated(console))
## [1] 0

Great! No duplicates.
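duplicated() flags every row that is identical to an earlier row, so summing its result counts the duplicates. A minimal sketch on a hypothetical toy data frame whose third row repeats the first:

```r
# Hypothetical toy data frame: row 3 is an exact copy of row 1
toy <- data.frame(
  ConsoleID = c("PS2", "DS", "PS2"),
  Sales     = c(155.00, 154.02, 155.00)
)

# TRUE marks a repeat; the first occurrence of a row stays FALSE
dup_flags <- duplicated(toy)
dup_flags            # FALSE FALSE TRUE

# Summing the logical vector counts the duplicated rows
sum(dup_flags)       # 1

# Keep only the first occurrence of each row
deduplicated <- toy[!dup_flags, ]
nrow(deduplicated)   # 2
```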

Practical Statistics

Let’s look at a summary of the data.

summary(console)
##   ConsoleID         Console_Name          Manufacturer  Release_Year 
##  Length:44          Length:44          Nintendo :14    Min.   :1972  
##  Class :character   Class :character   Sega     : 7    1st Qu.:1988  
##  Mode  :character   Mode  :character   Sony     : 6    Median :1994  
##                                        Atari    : 3    Mean   :1996  
##                                        Microsoft: 3    3rd Qu.:2004  
##                                        Coleco   : 2    Max.   :2017  
##                                        (Other)  : 9                  
##      Sales               Type   
##  Min.   :   0.40   Handheld:10  
##  1st Qu.:   3.00   Home    :34  
##  Median :  12.09                
##  Mean   :  59.36                
##  3rd Qu.:  76.78                
##  Max.   :1000.00                
## 

From the summary above, we can see that:

  1. Nintendo has released the most consoles of any manufacturer, with 14.

  2. The first console was released in 1972 and the latest in 2017.

  3. Console sales vary widely, from 0.40 to 1000 million units. The mean (59.36) is far above the median (12.09), so a few very large values pull the average up.

  4. The consoles are divided into 2 types: Handheld (10 consoles) and Home (34 consoles).
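To see why a few large sales figures push the mean (59.36) well above the median (12.09) in the summary, here is a minimal sketch using only the ten sales values from the top-10 table printed later in this post (not the full 44-row dataset):

```r
# Sales (million units) of the ten best-selling consoles, from the top-10 table
top10_sales <- c(1000.00, 155.00, 154.02, 118.69, 108.90,
                 102.49, 101.63, 87.40, 84.00, 81.51)

mean(top10_sales)     # 199.364
median(top10_sales)   # 105.695

# Drop the largest value (Personal Computer) and recompute
without_pc <- top10_sales[-1]
mean(without_pc)      # about 110.4
median(without_pc)    # 102.49
```

Removing one value almost halves the mean but moves the median only slightly, which is why the median is the more robust description of a "typical" console here.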

Check outliers

boxplot(console$Sales)

We can see that there is at least one outlier in the sales data: the Personal Computer's 1000 million units sits far above every other console.
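boxplot() draws as separate points the values lying more than 1.5 times the hinge spread beyond the hinges, and boxplot.stats() returns those points directly. A minimal sketch, again using only the ten values from the top-10 table rather than the full dataset:

```r
# Sales (million units) of the ten best-selling consoles (illustration only)
top10_sales <- c(1000.00, 155.00, 154.02, 118.69, 108.90,
                 102.49, 101.63, 87.40, 84.00, 81.51)

# Same rule boxplot() uses: coef = 1.5 times the hinge spread
bp <- boxplot.stats(top10_sales, coef = 1.5)

bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # 1000, the only value flagged as an outlier
```

On the full data, boxplot.stats(console$Sales)$out would list the flagged values in the same way.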

Data Visualization

We will create a visualization of the top 10 consoles by sales.

top_10_console <- console %>% 
  select(Console_Name,Sales,Manufacturer) %>% 
  arrange(desc(Sales)) %>% 
  head(10)
top_10_console
##         Console_Name   Sales Manufacturer
## 1  Personal Computer 1000.00     Computer
## 2      PlayStation 2  155.00         Sony
## 3        Nintendo DS  154.02     Nintendo
## 4           Game Boy  118.69     Nintendo
## 5      PlayStation 4  108.90         Sony
## 6        PlayStation  102.49         Sony
## 7                Wii  101.63     Nintendo
## 8      PlayStation 3   87.40         Sony
## 9           Xbox 360   84.00    Microsoft
## 10  Game Boy Advance   81.51     Nintendo
ggplot(top_10_console,aes(x=reorder(Console_Name,Sales), y = Sales,fill= Manufacturer))+
  geom_col()+
  coord_flip()+
  geom_text(aes(label = Sales),nudge_y = 60,col = "Blue")+
  labs(title = "Top 10 Consoles",x =NULL, y="Sales(in million units)")+
  theme_classic()

We can see that the personal computer has the highest sales, followed by PlayStation 2, Nintendo DS, Game Boy, PlayStation 4 and so on. Its 1000 million units are more than six times the 155 million of the runner-up, PlayStation 2.
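The bars are sorted by reorder(Console_Name, Sales), which turns the names into a factor whose levels are ordered by sales (ascending, using the mean per name by default). ggplot2 draws a discrete axis in level order, and after coord_flip() the first level sits at the bottom, so the best seller ends up on top. A minimal sketch with three consoles from the table above:

```r
# Three consoles and their sales (million units) from the top-10 table
consoles <- c("Wii", "Xbox 360", "Game Boy")
sales    <- c(101.63, 84.00, 118.69)

# Levels are sorted by the mean of `sales` within each name, smallest first
ordered_names <- reorder(consoles, sales)
levels(ordered_names)   # "Xbox 360" "Wii" "Game Boy"
```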

Next, we look at the top 5 manufacturers by total sales.

top_5_manufacturer <- console %>% 
  group_by(Manufacturer) %>% 
  summarise(Total_Sales = sum(Sales)) %>% 
  arrange(desc(Total_Sales)) %>% 
  head(5)

top_5_manufacturer
## # A tibble: 5 x 2
##   Manufacturer Total_Sales
##   <fct>              <dbl>
## 1 Computer          1000  
## 2 Nintendo           774. 
## 3 Sony               544. 
## 4 Microsoft          155. 
## 5 Sega                79.4
ggplot(top_5_manufacturer,aes(x=reorder(Manufacturer,Total_Sales), y = Total_Sales,fill=Total_Sales))+
  geom_col()+
  scale_fill_gradient(low = "#f8d979",high = "#363fe6" )+
  coord_flip()+
  geom_text(aes(label = round(Total_Sales, 1)),nudge_y = 60,col = "Blue")+
  labs(title = "Top 5 Manufacturers",x =NULL, y="Sales(in million units)")+
  theme_classic()

From the plot above, we can see that Computer leads the manufacturers in total sales, driven by the Personal Computer's 1000 million units, followed by Nintendo, Sony, Microsoft and Sega.
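group_by(Manufacturer) %>% summarise(Total_Sales = sum(Sales)) splits the sales by manufacturer and adds up each group. A minimal base-R sketch of the same idea, applied only to the ten rows of the top-10 table (so the totals are smaller than the full-data totals above):

```r
# The ten rows of the top-10 table (illustration only, not the full dataset)
top10 <- data.frame(
  Console_Name = c("Personal Computer", "PlayStation 2", "Nintendo DS", "Game Boy",
                   "PlayStation 4", "PlayStation", "Wii", "PlayStation 3",
                   "Xbox 360", "Game Boy Advance"),
  Sales = c(1000.00, 155.00, 154.02, 118.69, 108.90,
            102.49, 101.63, 87.40, 84.00, 81.51),
  Manufacturer = c("Computer", "Sony", "Nintendo", "Nintendo", "Sony",
                   "Sony", "Nintendo", "Sony", "Microsoft", "Nintendo")
)

# Split Sales into one vector per Manufacturer, then sum each vector
totals <- sapply(split(top10$Sales, top10$Manufacturer), sum)

# Sort from largest to smallest, like arrange(desc(Total_Sales))
sort(totals, decreasing = TRUE)   # Computer, Nintendo, Sony, Microsoft
```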

Now we compare sales by console type.

type_comparison <- console %>% 
  group_by(Type) %>% 
  summarise(Total_Sales = sum(Sales)) %>% 
  arrange(desc(Total_Sales))

type_comparison  
## # A tibble: 2 x 2
##   Type     Total_Sales
##   <fct>          <dbl>
## 1 Home           2074.
## 2 Handheld        538.
ggplot(type_comparison, aes(y=reorder(Type, Total_Sales), x= Total_Sales, fill=Type))+
  geom_col()+
  coord_flip()+
  geom_text(aes(label = round(Total_Sales, 1)),nudge_x = 60,col = "red")+
  labs(title = "Console Sales by Type",y =NULL, x="Sales(in million units)")+
  theme_classic()

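From the plot above, we can see that Home consoles generated far more sales than Handheld consoles (about 2074 vs 538 million units). Note that the Home total includes the Personal Computer's 1000 million units, since it is recorded as a Home console; even without it, Home consoles still outsell Handheld ones.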
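A quick way to quantify this is to compute the Home share of total sales with and without the Personal Computer. The sketch below uses the rounded totals printed in the type_comparison output (2074 and 538) and the Personal Computer's 1000 million units, which glimpse() shows with Type "Home"; because the inputs are rounded, the shares are approximate.

```r
# Rounded totals (million units) from the type_comparison output
home     <- 2074
handheld <- 538
pc       <- 1000   # Personal Computer, recorded with Type "Home"

# Home share of all console sales
share_home <- home / (home + handheld)
round(share_home, 3)         # 0.794

# Same share after removing the Personal Computer from the Home total
share_home_no_pc <- (home - pc) / (home - pc + handheld)
round(share_home_no_pc, 3)   # 0.666
```

Home consoles account for roughly four fifths of sales overall and still about two thirds once the Personal Computer is excluded.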

Conclusion

The personal computer generates more sales than any other console. This might be because a personal computer offers more than just gaming, for example work. Likewise, the Computer manufacturer generates more sales than any other manufacturer. Home consoles also generate more sales than handheld consoles, possibly because people prefer to play at home on a larger screen. Handheld consoles offer more mobility, but they generally have a small screen, which some people may find less appealing.