Hi! Welcome to my LBB. In this LBB I'm going to use a game sales dataset. Enjoy!
Make sure the data file is placed in the same folder as the R project.
The data has been read in, so let's get started.
## Rank Name Platform Year Genre Publisher NA_Sales
## 1 1 Wii Sports Wii 2006 Sports Nintendo 41.49
## 2 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08
## 3 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85
## 4 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75
## 5 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27
## 6 6 Tetris GB 1989 Puzzle Nintendo 23.20
## EU_Sales JP_Sales Other_Sales Global_Sales
## 1 29.02 3.77 8.46 82.74
## 2 3.58 6.81 0.77 40.24
## 3 12.88 3.79 3.31 35.82
## 4 11.01 3.28 2.96 33.00
## 5 8.89 10.22 1.00 31.37
## 6 2.26 4.22 0.58 30.26
## [1] 16598 11
## [1] "Rank" "Name" "Platform" "Year" "Genre"
## [6] "Publisher" "NA_Sales" "EU_Sales" "JP_Sales" "Other_Sales"
## [11] "Global_Sales"
From our inspection we can conclude:
* The data contains 16598 rows and 11 columns.
* The column names are: "Rank", "Name", "Platform", "Year", "Genre", "Publisher", "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales".
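The inspection output above looks like what `head()`, `dim()` and `names()` print. The code for that step is not shown, so the following is only a sketch on a toy stand-in: three rows copied from the output above are read from an in-memory string instead of the real file (`csv_text` and `toy` are hypothetical names).

```r
# Toy stand-in for the real file: three rows copied from the output above.
csv_text <- "Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82"

# With the real data you would pass the path of your CSV file instead of `text =`.
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

head(toy)   # first rows of the data
dim(toy)    # number of rows, then number of columns
names(toy)  # column names
```

On the full dataset, `dim()` is what reports the 16598 rows and 11 columns.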
Check the data type of each column
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : chr "Wii" "NES" "Wii" "Wii" ...
## $ Year : chr "2006" "1985" "2008" "2009" ...
## $ Genre : chr "Sports" "Platform" "Racing" "Sports" ...
## $ Publisher : chr "Nintendo" "Nintendo" "Nintendo" "Nintendo" ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
From this result, we can see that some columns do not have the correct data type. We need to convert them to the correct type (data coercion).
data$Platform <- as.factor(data$Platform)
data$Genre <- as.factor(data$Genre)
data$Publisher <- as.factor(data$Publisher)
data$Year <- as.numeric(data$Year)
## Warning: NAs introduced by coercion
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
## $ Year : num 2006 1985 2008 2009 1996 ...
## $ Genre : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
## $ Publisher : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
Each column has now been converted to the desired data type.
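The "NAs introduced by coercion" warning appears because `as.numeric()` cannot parse text that is not a number. Here is a small sketch of that behaviour, assuming the `Year` column holds a placeholder string for unknown years (the `"N/A"` value and the vector names are assumptions, not taken from the data):

```r
# Hypothetical Year values; "N/A" stands in for any non-numeric text in the real column.
year_chr <- c("2006", "1985", "N/A", "2008")

# as.numeric() returns NA for text it cannot parse and raises the coercion warning.
# suppressWarnings() is used here only to keep the example output quiet.
year_num <- suppressWarnings(as.numeric(year_chr))

year_num              # 2006 1985 NA 2008
sum(is.na(year_num))  # 1
```

This is why a column that was free of `NA` as text can show missing values after the conversion.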
Check for missing values
## Rank Name Platform Year Genre Publisher
## 0 0 0 271 0 0
## NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
## 0 0 0 0 0
The Year column has 271 missing values, so we need to remove the rows with missing data first.
## [1] FALSE
Great! There are no missing values anymore.
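The code for the missing-value check and removal is not shown. One common way to do it, sketched here on a toy data frame, is `colSums(is.na())` to count, `na.omit()` to drop rows, and `anyNA()` to confirm. Treat the use of `na.omit()` as an assumption; it is consistent with the row count going from 16598 to 16327 (271 rows fewer), but other approaches give the same result. `toy` and `toy_clean` are hypothetical names.

```r
# Toy data with one missing Year, mimicking the situation above.
toy <- data.frame(
  Name = c("A", "B", "C"),
  Year = c(2006, NA, 2008),
  Global_Sales = c(1.2, 0.5, 0.3)
)

colSums(is.na(toy))        # number of NA values per column

toy_clean <- na.omit(toy)  # drop every row that has at least one NA

anyNA(toy_clean)           # FALSE
nrow(toy_clean)            # 2
```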
Now the dataset is ready to be processed and analyzed.
Brief summary of each column
## Rank Name Platform Year
## Min. : 1 Length:16327 DS :2133 Min. :1980
## 1st Qu.: 4136 Class :character PS2 :2127 1st Qu.:2003
## Median : 8295 Mode :character PS3 :1304 Median :2007
## Mean : 8293 Wii :1290 Mean :2006
## 3rd Qu.:12442 X360 :1235 3rd Qu.:2010
## Max. :16600 PSP :1197 Max. :2020
## (Other):7041
## Genre Publisher NA_Sales
## Action :3253 Electronic Arts : 1339 Min. : 0.0000
## Sports :2304 Activision : 966 1st Qu.: 0.0000
## Misc :1710 Namco Bandai Games : 928 Median : 0.0800
## Role-Playing:1471 Ubisoft : 918 Mean : 0.2654
## Shooter :1282 Konami Digital Entertainment: 823 3rd Qu.: 0.2400
## Adventure :1276 THQ : 712 Max. :41.4900
## (Other) :5031 (Other) :10641
## EU_Sales JP_Sales Other_Sales Global_Sales
## Min. : 0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.0100
## 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0600
## Median : 0.0200 Median : 0.00000 Median : 0.01000 Median : 0.1700
## Mean : 0.1476 Mean : 0.07866 Mean : 0.04832 Mean : 0.5402
## 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.4800
## Max. :29.0200 Max. :10.22000 Max. :10.57000 Max. :82.7400
##
# Total global sales per genre, then the largest total
most_global <- aggregate(Global_Sales ~ Genre, data_clean, sum)
sort(most_global$Global_Sales, decreasing = TRUE)[1]
## [1] 1722.88
Answer: The highest total global sales for a genre is 1722.88.
2. Related to number 1, which genre is it?
## Genre Global_Sales
## 1 Action 1722.88
Answer: Action is the genre with the highest total global sales.
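The lookup that returns the genre row is not shown. One way to get the value and the genre in a single step is to index the aggregated table with `which.max()`. This is a sketch on toy numbers, not necessarily the original approach; `toy`, `by_genre` and `top_genre` are hypothetical names.

```r
# Toy sales data; the numbers are made up for illustration.
toy <- data.frame(
  Genre = c("Action", "Sports", "Action", "Puzzle"),
  Global_Sales = c(2.0, 1.5, 1.0, 0.5)
)

# Sum sales per genre, as in the aggregate() call above.
by_genre <- aggregate(Global_Sales ~ Genre, toy, sum)

# which.max() gives the row index of the largest total,
# so the genre and its total come back together.
top_genre <- by_genre[which.max(by_genre$Global_Sales), ]
top_genre
```

Because the row is returned whole, there is no need to sort first and then search for the matching genre.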
# Count games per publisher, sort descending, and keep the top five
most_publish <- as.data.frame(sort(table(data_clean$Publisher), decreasing = TRUE))[1:5, ]
names(most_publish)[1] <- "Publisher"
most_publish
## Publisher Freq
## 1 Electronic Arts 1339
## 2 Activision 966
## 3 Namco Bandai Games 928
## 4 Ubisoft 918
## 5 Konami Digital Entertainment 823
Answer: Electronic Arts (EA) has published the most games.
ea <- data_clean[data_clean$Publisher=="Electronic Arts",]
NA.sum <- sum(ea$NA_Sales)
EU.sum <- sum(ea$EU_Sales)
JP.sum <- sum(ea$JP_Sales)
Other.sum <- sum(ea$Other_Sales)
ea.sum <- cbind("Electronic Arts"=c("NA_Sales","EU_Sales","JP_Sales","Other_Sales"))
ea.sum <- cbind(ea.sum,as.data.frame(c(NA.sum,EU.sum,JP.sum,Other.sum)))
names(ea.sum)[2] <- "Sales"
ea.sum
## Electronic Arts Sales
## 1 NA_Sales 584.22
## 2 EU_Sales 367.38
## 3 JP_Sales 13.98
## 4 Other_Sales 127.63
Answer: Electronic Arts (EA) sells the most in North America (NA).
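The regional totals above are built with four separate `sum()` calls and two `cbind()` calls. As an alternative sketch (not the original method), `colSums()` can total all four region columns at once. The data below is a toy stand-in and `ea_toy`, `region_cols`, `region_totals` and `ea_sum` are hypothetical names.

```r
# Toy stand-in for the EA subset; the numbers are made up.
ea_toy <- data.frame(
  NA_Sales    = c(1.0, 0.5),
  EU_Sales    = c(0.5, 0.25),
  JP_Sales    = c(0, 0.25),
  Other_Sales = c(0.25, 0)
)

region_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# colSums() returns one named total per region column.
region_totals <- colSums(ea_toy[, region_cols])

# Put the totals in a two-column table and pick the largest region.
ea_sum <- data.frame(Region = names(region_totals), Sales = unname(region_totals))
ea_sum[which.max(ea_sum$Sales), ]
```

Keeping the column names in one vector means a region can be added or removed in a single place.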
# Total global sales per EA title (summed over platforms), then the largest total
most_salesea <- aggregate(Global_Sales ~ Name, ea, sum)
sort(most_salesea$Global_Sales, decreasing = TRUE)[1]
## [1] 19.02
## Name Global_Sales
## 130 FIFA 15 19.02
Answer: FIFA 15 is the best-selling Electronic Arts (EA) game.
## [1] PS4 PS3 X360 XOne PSV Wii 3DS PC
## 31 Levels: 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS ... XOne
Answer: You can play FIFA 15 on PS4, PS3, X360, XOne, PSV, Wii, 3DS, and PC.
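The platform list above has the shape of `unique()` applied to a factor, which is why the output also prints all 31 levels. The exact call is not shown, so this is a sketch on toy rows (`ea_toy` and `fifa_platforms` are hypothetical names):

```r
# Toy stand-in for the EA subset.
ea_toy <- data.frame(
  Name = c("FIFA 15", "FIFA 15", "Other Game"),
  Platform = factor(c("PS4", "PS3", "PC")),
  Global_Sales = c(6.0, 4.0, 1.0)
)

# Platforms on which the chosen title appears, in order of first appearance.
fifa_platforms <- unique(ea_toy$Platform[ea_toy$Name == "FIFA 15"])

fifa_platforms                # factor: still carries every level, including "PC"
as.character(fifa_platforms)  # plain names only: "PS4" "PS3"
```

Converting with `as.character()` (or calling `droplevels()`) hides the unused levels if only the platform names are wanted.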
fifa <- ea[ea$Name=="FIFA 15",]
most_sales_fifa<- aggregate(Global_Sales~Platform,fifa,sum)
sort(most_sales_fifa$Global_Sales, decreasing = TRUE)[1]
## [1] 6.59
## Platform Global_Sales
## 4 PS4 6.59
Answer: FIFA 15 sells the most on the PS4 platform.
As you can see, EA has published the most games. EA sells the most in North America, with total sales of 584.22. FIFA 15 is EA's best-selling game, and you can play it on PS4, PS3, X360, XOne, PSV, Wii, 3DS, and PC. FIFA 15 sells the most on PS4, with total global sales of 6.59.