Explanation

Hi! Welcome to my LBB. In this LBB I'm going to use the video game sales dataset. Enjoy!

Input Data

Make sure the data file is placed in the data_input folder inside our R project folder.

data <- read.csv("data_input/vgsales.csv")

The data is loaded, so let's get started.

Data Inspection

head(data)
##   Rank                     Name Platform Year        Genre Publisher NA_Sales
## 1    1               Wii Sports      Wii 2006       Sports  Nintendo    41.49
## 2    2        Super Mario Bros.      NES 1985     Platform  Nintendo    29.08
## 3    3           Mario Kart Wii      Wii 2008       Racing  Nintendo    15.85
## 4    4        Wii Sports Resort      Wii 2009       Sports  Nintendo    15.75
## 5    5 Pokemon Red/Pokemon Blue       GB 1996 Role-Playing  Nintendo    11.27
## 6    6                   Tetris       GB 1989       Puzzle  Nintendo    23.20
##   EU_Sales JP_Sales Other_Sales Global_Sales
## 1    29.02     3.77        8.46        82.74
## 2     3.58     6.81        0.77        40.24
## 3    12.88     3.79        3.31        35.82
## 4    11.01     3.28        2.96        33.00
## 5     8.89    10.22        1.00        31.37
## 6     2.26     4.22        0.58        30.26
dim(data)
## [1] 16598    11
names(data)
##  [1] "Rank"         "Name"         "Platform"     "Year"         "Genre"       
##  [6] "Publisher"    "NA_Sales"     "EU_Sales"     "JP_Sales"     "Other_Sales" 
## [11] "Global_Sales"

From our inspection we can conclude:
* The data contains 16598 rows and 11 columns.
* The column names are: “Rank”, “Name”, “Platform”, “Year”, “Genre”, “Publisher”, “NA_Sales”, “EU_Sales”, “JP_Sales”, “Other_Sales”, “Global_Sales”.

Data Cleansing & Coercion

Check the data type of each column

str(data)
## 'data.frame':    16598 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : chr  "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
##  $ Platform    : chr  "Wii" "NES" "Wii" "Wii" ...
##  $ Year        : chr  "2006" "1985" "2008" "2009" ...
##  $ Genre       : chr  "Sports" "Platform" "Racing" "Sports" ...
##  $ Publisher   : chr  "Nintendo" "Nintendo" "Nintendo" "Nintendo" ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...

From this result, we find that some columns are not stored with the correct data type, so we need to convert them (data coercion): Platform, Genre, and Publisher become factors, and Year becomes numeric.

data$Platform <- as.factor(data$Platform)
data$Genre <- as.factor(data$Genre)
data$Publisher <- as.factor(data$Publisher)
data$Year <- as.numeric(data$Year)
## Warning: NAs introduced by coercion
str(data)
## 'data.frame':    16598 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : chr  "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
##  $ Platform    : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
##  $ Year        : num  2006 1985 2008 2009 1996 ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
##  $ Publisher   : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...

Each column has now been converted to the desired data type.

Check for missing values
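The warning "NAs introduced by coercion" means that some Year entries were not numeric text. Here is a minimal sketch on toy values of how to find those entries, and how to declare the placeholder at read time instead. The "N/A" placeholder and the toy values are assumptions for illustration, not something confirmed by the output above.

```r
# Toy vector standing in for the Year column; "N/A" is an assumed placeholder.
year_raw <- c("2006", "1985", "N/A", "2009")

# as.numeric() turns non-numeric text into NA and raises the coercion warning;
# suppressWarnings() is used here only to keep the demo output quiet.
year_num <- suppressWarnings(as.numeric(year_raw))
year_num

# Locate the entries that failed to convert, so they can be inspected first.
bad_idx <- which(is.na(year_num) & !is.na(year_raw))
year_raw[bad_idx]

# Alternative: declare the placeholder when reading, so the column is parsed
# as a number directly. read.csv(text = ...) keeps this example self-contained.
csv_text <- "Name,Year\nA,2006\nB,N/A\nC,2009"
toy <- read.csv(text = csv_text, na.strings = "N/A")
str(toy)
```

Inspecting the failing entries before converting makes it easier to tell a missing-value placeholder apart from a genuine typo.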

colSums(is.na(data))
##         Rank         Name     Platform         Year        Genre    Publisher 
##            0            0            0          271            0            0 
##     NA_Sales     EU_Sales     JP_Sales  Other_Sales Global_Sales 
##            0            0            0            0            0

There are missing values in the Year column (271 rows), so we need to remove those rows first.

data_clean<- na.omit(data)
anyNA(data_clean)
## [1] FALSE

Great! There are no missing values anymore.

Now the dataset is ready to be processed and analyzed.
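Note that na.omit() drops every row containing at least one NA, not just the missing cells. Since only Year has missing values here, we expect 16598 - 271 = 16327 rows to remain, which matches the Length reported by summary(data_clean). A small sketch of that check on a hypothetical toy data frame (names and values are made up for illustration):

```r
# Hypothetical toy data frame; only Year has a missing value.
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Year = c(2006, NA, 2009, 2010),
  Global_Sales = c(1.2, 0.5, 0.3, 2.0)
)

toy_clean <- na.omit(toy)

# Rows dropped = rows that had at least one NA.
rows_dropped <- nrow(toy) - nrow(toy_clean)
rows_dropped

# complete.cases() expresses the same filter explicitly.
toy_clean2 <- toy[complete.cases(toy), ]
```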

Data Explanation

Brief explanation

summary(data_clean)
##       Rank           Name              Platform         Year     
##  Min.   :    1   Length:16327       DS     :2133   Min.   :1980  
##  1st Qu.: 4136   Class :character   PS2    :2127   1st Qu.:2003  
##  Median : 8295   Mode  :character   PS3    :1304   Median :2007  
##  Mean   : 8293                      Wii    :1290   Mean   :2006  
##  3rd Qu.:12442                      X360   :1235   3rd Qu.:2010  
##  Max.   :16600                      PSP    :1197   Max.   :2020  
##                                     (Other):7041                 
##           Genre                             Publisher        NA_Sales      
##  Action      :3253   Electronic Arts             : 1339   Min.   : 0.0000  
##  Sports      :2304   Activision                  :  966   1st Qu.: 0.0000  
##  Misc        :1710   Namco Bandai Games          :  928   Median : 0.0800  
##  Role-Playing:1471   Ubisoft                     :  918   Mean   : 0.2654  
##  Shooter     :1282   Konami Digital Entertainment:  823   3rd Qu.: 0.2400  
##  Adventure   :1276   THQ                         :  712   Max.   :41.4900  
##  (Other)     :5031   (Other)                     :10641                    
##     EU_Sales          JP_Sales         Other_Sales        Global_Sales    
##  Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.00000   Min.   : 0.0100  
##  1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 0.00000   1st Qu.: 0.0600  
##  Median : 0.0200   Median : 0.00000   Median : 0.01000   Median : 0.1700  
##  Mean   : 0.1476   Mean   : 0.07866   Mean   : 0.04832   Mean   : 0.5402  
##  3rd Qu.: 0.1100   3rd Qu.: 0.04000   3rd Qu.: 0.04000   3rd Qu.: 0.4800  
##  Max.   :29.0200   Max.   :10.22000   Max.   :10.57000   Max.   :82.7400  
## 

Data Manipulation & Transformation

  1. What is the highest total global sales of any genre?
most_global <- aggregate(Global_Sales~Genre,data_clean,sum)
sort(most_global$Global_Sales, decreasing = T)[1]
## [1] 1722.88

Answer: The highest total global sales for a single genre is 1722.88.

  2. Related to number 1, which genre is it?

most_global[most_global$Global_Sales==1722.88,]
##    Genre Global_Sales
## 1 Action      1722.88

Answer: Action is the genre with the highest total global sales.
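The lookup above finds the genre by typing the total 1722.88 back into an equality test, which depends on the printed value matching the stored one exactly. As a hedged alternative, which.max() can return the value and the genre in one step. This is a minimal sketch on hypothetical toy data, not the real dataset:

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- data.frame(
  Genre = c("Action", "Sports", "Action", "Puzzle", "Sports"),
  Global_Sales = c(0.1, 0.2, 0.3, 0.25, 0.05)
)

by_genre <- aggregate(Global_Sales ~ Genre, data = toy, FUN = sum)

# which.max() returns the row index of the largest total, so the value and
# the genre come out together without an exact floating-point comparison.
top_genre <- by_genre[which.max(by_genre$Global_Sales), ]
top_genre

# Full ranking, largest first.
by_genre[order(by_genre$Global_Sales, decreasing = TRUE), ]
```

The same pattern would apply to the later per-game and per-platform questions, since they use the same aggregate-then-look-up approach.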

  3. Which publisher has published the most games?
most_publish <- as.data.frame(sort(table(data_clean$Publisher),decreasing = T))[1:5,]
names(most_publish)[1]<-paste("Publisher")
most_publish
##                      Publisher Freq
## 1              Electronic Arts 1339
## 2                   Activision  966
## 3           Namco Bandai Games  928
## 4                      Ubisoft  918
## 5 Konami Digital Entertainment  823

Answer: Electronic Arts (EA) has published the most games, with 1339 entries in the dataset.
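One caveat: table() counts rows, and a title released on several platforms appears as several rows (FIFA 15, for example, is listed on eight platforms further down). If distinct titles are what we want, the Name/Publisher pairs can be de-duplicated first. A rough sketch on hypothetical toy data (publisher and game names are invented):

```r
# Hypothetical toy data: one row per title-platform combination.
toy <- data.frame(
  Name      = c("Game A", "Game A", "Game B", "Game C", "Game C", "Game C"),
  Platform  = c("PS4", "PC", "PS4", "PS4", "PC", "Wii"),
  Publisher = c("Pub X", "Pub X", "Pub X", "Pub Y", "Pub Y", "Pub Y")
)

# Row counts per publisher (what table() on the Publisher column measures).
row_counts <- sort(table(toy$Publisher), decreasing = TRUE)

# Distinct titles per publisher: drop duplicate Name/Publisher pairs first.
titles <- unique(toy[, c("Name", "Publisher")])
title_counts <- sort(table(titles$Publisher), decreasing = TRUE)

row_counts
title_counts
```

In the toy data both publishers have three rows, but Pub X has two distinct titles and Pub Y only one, so the two ways of counting can rank publishers differently.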

  4. In which region does Electronic Arts (EA) sell the most?
ea <- data_clean[data_clean$Publisher=="Electronic Arts",]
NA.sum <- sum(ea$NA_Sales)
EU.sum <- sum(ea$EU_Sales)
JP.sum <- sum(ea$JP_Sales)
Other.sum <- sum(ea$Other_Sales)

ea.sum <- cbind("Electronic Arts"=c("NA_Sales","EU_Sales","JP_Sales","Other_Sales"))
ea.sum <- cbind(ea.sum,as.data.frame(c(NA.sum,EU.sum,JP.sum,Other.sum)))

names(ea.sum)[2] <- paste("Sales")
ea.sum
##   Electronic Arts  Sales
## 1        NA_Sales 584.22
## 2        EU_Sales 367.38
## 3        JP_Sales  13.98
## 4     Other_Sales 127.63

Answer: Electronic Arts (EA) sells the most in North America (NA), with total sales of 584.22.
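The four separate sum() calls can also be written as a single colSums() over the regional columns. A minimal sketch, using a hypothetical toy stand-in for the EA subset (the numbers are made up):

```r
# Hypothetical toy rows standing in for the EA subset.
ea_toy <- data.frame(
  NA_Sales    = c(1.0, 0.5),
  EU_Sales    = c(0.8, 0.1),
  JP_Sales    = c(0.0, 0.1),
  Other_Sales = c(0.2, 0.1)
)

region_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# colSums() totals every regional column in one call.
region_totals <- colSums(ea_toy[, region_cols])
region_totals

# Name of the region with the largest total.
top_region <- names(which.max(region_totals))
top_region
```

Keeping the totals in one named vector avoids building the result table by hand with cbind().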

  5. Which Electronic Arts (EA) game has the highest sales?
most_salesea<- aggregate(Global_Sales~Name,ea,sum)
sort(most_salesea$Global_Sales, decreasing = T)[1]
## [1] 19.02
most_salesea[most_salesea$Global_Sales==19.02,]
##        Name Global_Sales
## 130 FIFA 15        19.02

Answer: FIFA 15 is EA's best-selling game, with total global sales of 19.02 across all platforms.

  6. On which platforms can FIFA 15 be played?
ea[ea$Name=="FIFA 15",3]
## [1] PS4  PS3  X360 XOne PSV  Wii  3DS  PC  
## 31 Levels: 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS ... XOne

Answer: You can play FIFA 15 on PS4, PS3, X360, XOne, PSV, Wii, 3DS, and PC.

  7. On which platform does FIFA 15 sell the most?
fifa <- ea[ea$Name=="FIFA 15",]
most_sales_fifa<- aggregate(Global_Sales~Platform,fifa,sum)
sort(most_sales_fifa$Global_Sales, decreasing = T)[1]
## [1] 6.59
most_sales_fifa[most_sales_fifa$Global_Sales==6.59,]
##   Platform Global_Sales
## 4      PS4         6.59

Answer: FIFA 15 sells the most on the PS4 platform, with global sales of 6.59.

Conclusion

As you can see, EA has published the most games in this dataset. EA sells the most in North America, with total sales of 584.22, and FIFA 15 is EA's best-selling game. You can play FIFA 15 on PS4, PS3, X360, XOne, PSV, Wii, 3DS, and PC. FIFA 15 sells the most on PS4, with global sales of 6.59.