The game industry is one of the fastest-growing industries in the world right now, and game manufacturers have produced many consoles. We will look at the sales generated by these consoles and create some visualizations. You can download the data here.
Load the required packages.
library(ggplot2)
library(tidyverse)
console <- read.csv("data/console.csv")
glimpse(console)
## Rows: 44
## Columns: 6
## $ ConsoleID <chr> "PC", "PS2", "DS", "GB", "PS4", "PS", "Wii", "PS3", "X360~
## $ Console_Name <chr> "Personal Computer", "PlayStation 2", "Nintendo DS", "Gam~
## $ Manufacturer <chr> "Computer", "Sony", "Nintendo", "Nintendo", "Sony", "Sony~
## $ Release_Year <int> 1975, 2000, 2004, 1989, 2013, 1994, 2006, 2006, 2005, 200~
## $ Sales <dbl> 1000.00, 155.00, 154.02, 118.69, 108.90, 102.49, 101.63, ~
## $ Type <chr> "Home", "Home", "Handheld", "Handheld", "Home", "Home", "~
We can see that the data consists of 44 rows and 6 columns.
The columns are:
ConsoleID : Console ID
Console_Name : Full name of the console
Manufacturer : Console manufacturer
Release_Year : Year the console was released
Sales : Number of units sold (in millions)
Type : Type of console (Home or Handheld)
Manufacturer and Type are categorical variables, so we convert them from character to factor.
console <- console %>%
  mutate(Manufacturer = as.factor(Manufacturer)) %>%
  mutate(Type = as.factor(Type))
glimpse(console)
## Rows: 44
## Columns: 6
## $ ConsoleID <chr> "PC", "PS2", "DS", "GB", "PS4", "PS", "Wii", "PS3", "X360~
## $ Console_Name <chr> "Personal Computer", "PlayStation 2", "Nintendo DS", "Gam~
## $ Manufacturer <fct> Computer, Sony, Nintendo, Nintendo, Sony, Sony, Nintendo,~
## $ Release_Year <int> 1975, 2000, 2004, 1989, 2013, 1994, 2006, 2006, 2005, 200~
## $ Sales <dbl> 1000.00, 155.00, 154.02, 118.69, 108.90, 102.49, 101.63, ~
## $ Type <fct> Home, Home, Handheld, Handheld, Home, Home, Home, Home, H~
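The two `mutate()` calls can also be collapsed into one with `across()`. The sketch below is an equivalent alternative rather than the code used in this post. It assumes dplyr 1.0.0 or later, and `console_sample` is a hypothetical stand-in built by hand from the first three rows of the `glimpse()` output so that it runs without the CSV file.

```r
library(dplyr)

# Hypothetical stand-in for the real data: first three rows shown by glimpse()
console_sample <- data.frame(
  ConsoleID    = c("PC", "PS2", "DS"),
  Manufacturer = c("Computer", "Sony", "Nintendo"),
  Type         = c("Home", "Home", "Handheld"),
  stringsAsFactors = FALSE
)

# Convert both categorical columns to factors in a single step
console_sample <- console_sample %>%
  mutate(across(c(Manufacturer, Type), as.factor))

str(console_sample)
```

Listing the columns inside `across()` keeps the conversion in one place if more categorical columns are added later.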
colSums(is.na(console))
##    ConsoleID Console_Name Manufacturer Release_Year        Sales         Type
##            0            0            0            0            0            0
Great! No missing values.
sum(duplicated(console))
## [1] 0
Great! No duplicates.
Let’s look at a summary of the data.
summary(console)
##   ConsoleID         Console_Name          Manufacturer  Release_Year
##  Length:44          Length:44          Nintendo :14    Min.   :1972
##  Class :character   Class :character   Sega     : 7    1st Qu.:1988
##  Mode  :character   Mode  :character   Sony     : 6    Median :1994
##                                        Atari    : 3    Mean   :1996
##                                        Microsoft: 3    3rd Qu.:2004
##                                        Coleco   : 2    Max.   :2017
##                                        (Other)  : 9
##      Sales               Type
##  Min.   :   0.40   Handheld:10
##  1st Qu.:   3.00   Home    :34
##  Median :  12.09
##  Mean   :  59.36
##  3rd Qu.:  76.78
##  Max.   :1000.00
##
From the summary above, we can see that:
1. Nintendo has released the most consoles of any manufacturer, with 14.
2. The earliest console was released in 1972 and the latest in 2017.
3. Console sales span a wide range, from 0.40 to 1000 million units.
4. Consoles fall into two types: Handheld with 10 consoles and Home with 34 consoles.
boxplot(console$Sales)
We can see that there is an outlier in the sales data: the Personal Computer entry, at 1000 million units, lies far above every other console.
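The outlier can also be identified numerically instead of read off the plot. The following is a minimal sketch using `boxplot.stats()` from base R, which flags points lying more than 1.5 times the interquartile range beyond the hinges (the same rule `boxplot()` uses by default). It uses only the seven Sales values visible in the `glimpse()` output above, not the full 44-row data set, and `sales_sample` is a hypothetical name.

```r
# Seven Sales values (million units) visible in the glimpse() output;
# an illustrative subset, not the full data set
sales_sample <- c(1000.00, 155.00, 154.02, 118.69, 108.90, 102.49, 101.63)

# $out holds the points beyond 1.5 * IQR from the lower and upper hinges
outliers <- boxplot.stats(sales_sample)$out
outliers
```

On these seven values the only point flagged is 1000, the Personal Computer entry. On the real data, `boxplot.stats(console$Sales)$out` would apply the same rule to all 44 rows.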
Next, we will create a visualization of the top 10 consoles by sales.
top_10_console <- console %>%
  select(Console_Name, Sales, Manufacturer) %>%
  arrange(desc(Sales)) %>%
  head(10)
top_10_console
##         Console_Name   Sales Manufacturer
## 1  Personal Computer 1000.00     Computer
## 2      PlayStation 2  155.00         Sony
## 3        Nintendo DS  154.02     Nintendo
## 4           Game Boy  118.69     Nintendo
## 5      PlayStation 4  108.90         Sony
## 6        PlayStation  102.49         Sony
## 7                Wii  101.63     Nintendo
## 8      PlayStation 3   87.40         Sony
## 9           Xbox 360   84.00    Microsoft
## 10  Game Boy Advance   81.51     Nintendo
ggplot(top_10_console, aes(x = reorder(Console_Name, Sales), y = Sales, fill = Manufacturer)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = Sales), nudge_y = 60, col = "Blue") +
  labs(title = "Top 10 Consoles", x = NULL, y = "Sales (in million units)") +
  theme_classic()
We can see that the personal computer has the highest sales of all consoles, followed by PlayStation 2, Nintendo DS, Game Boy, and PlayStation 4. Its sales are also far above those of every other console.
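Because the Personal Computer bar compresses all the others, it can help to redraw the chart without it. The sketch below is one possible way to do that, not part of the original analysis. It rebuilds the top 10 table by hand as `top_10_table` (a hypothetical name) so that it runs without the CSV file, and `consoles_no_pc` is likewise a hypothetical name.

```r
library(ggplot2)

# Top 10 consoles as printed above, rebuilt by hand for a self-contained example
top_10_table <- data.frame(
  Console_Name = c("Personal Computer", "PlayStation 2", "Nintendo DS", "Game Boy",
                   "PlayStation 4", "PlayStation", "Wii", "PlayStation 3",
                   "Xbox 360", "Game Boy Advance"),
  Sales = c(1000.00, 155.00, 154.02, 118.69, 108.90,
            102.49, 101.63, 87.40, 84.00, 81.51),
  Manufacturer = c("Computer", "Sony", "Nintendo", "Nintendo", "Sony",
                   "Sony", "Nintendo", "Sony", "Microsoft", "Nintendo"),
  stringsAsFactors = FALSE
)

# Drop the outlier so differences among the remaining consoles are visible
consoles_no_pc <- subset(top_10_table, Console_Name != "Personal Computer")

p <- ggplot(consoles_no_pc,
            aes(x = reorder(Console_Name, Sales), y = Sales, fill = Manufacturer)) +
  geom_col() +
  coord_flip() +
  # Place each label just past the end of its bar
  geom_text(aes(label = Sales), hjust = -0.1) +
  # Leave room on the sales axis for the labels
  expand_limits(y = 175) +
  labs(title = "Top Consoles Excluding Personal Computer",
       x = NULL, y = "Sales (in million units)") +
  theme_classic()

p
```

With the outlier removed, the sales axis only needs to cover roughly 80 to 155 million units, so the gaps between consoles become easier to compare.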
Now we will look at the top 5 manufacturers.
top_5_manufacturer <- console %>%
  group_by(Manufacturer) %>%
  summarise(Total_Sales = sum(Sales)) %>%
  arrange(desc(Total_Sales)) %>%
  head(5)
top_5_manufacturer
## # A tibble: 5 x 2
##   Manufacturer Total_Sales
##   <fct>              <dbl>
## 1 Computer          1000
## 2 Nintendo           774.
## 3 Sony               544.
## 4 Microsoft          155.
## 5 Sega                79.4
ggplot(top_5_manufacturer, aes(x = reorder(Manufacturer, Total_Sales), y = Total_Sales, fill = Total_Sales)) +
  geom_col() +
  scale_fill_gradient(low = "#f8d979", high = "#363fe6") +
  coord_flip() +
  geom_text(aes(label = Total_Sales), nudge_y = 60, col = "Blue") +
  labs(title = "Top 5 Manufacturers", x = NULL, y = "Sales (in million units)") +
  theme_classic()
From the plot above, we can see that Computer leads the manufacturers in sales, followed by Nintendo, Sony, Microsoft, and Sega.
Now we will compare the two types of consoles.
type_comparison <- console %>%
  group_by(Type) %>%
  summarise(Total_Sales = sum(Sales)) %>%
  arrange(desc(Total_Sales))
type_comparison
## # A tibble: 2 x 2
##   Type     Total_Sales
##   <fct>          <dbl>
## 1 Home           2074.
## 2 Handheld        538.
ggplot(type_comparison, aes(y = reorder(Type, Total_Sales), x = Total_Sales, fill = Type)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = Total_Sales), nudge_x = 60, col = "red") +
  labs(title = "Console Sales by Type", y = NULL, x = "Sales (in million units)") +
  theme_classic()
From the plot above, we can see that Home consoles generated more sales than Handheld consoles.
The personal computer generated more sales than any other console. This might be because a personal computer offers functionality beyond playing games, such as work. As a result, Computer also leads all manufacturers in total sales. Home consoles generated more sales than Handheld consoles, which might be because people prefer to play at home on a larger screen. Handheld consoles offer more mobility, but their generally small screens may be unappealing to some people.
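Since the Personal Computer entry is classed as a Home console in the data, it is worth checking whether the Home lead depends on it. The sketch below is a quick arithmetic check using the rounded totals printed in the type comparison table, so the result is approximate. The variable names are hypothetical.

```r
# Rounded totals as printed in the type comparison table (million units)
type_totals <- c(Home = 2074, Handheld = 538)

# Sales of the Personal Computer entry, which has Type "Home"
pc_sales <- 1000

# Home total with the Personal Computer entry removed
home_without_pc <- type_totals[["Home"]] - pc_sales

home_without_pc
home_without_pc > type_totals[["Handheld"]]
```

With these rounded figures, Home consoles still total about 1074 million units against 538 million for Handheld, so the Home lead does not rest on the Personal Computer entry alone.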