1 Exploring Data Structure and Question
setwd("~/Downloads")
library(readxl)
library(dplyr)
dat<-read_excel("data.xlsx",sheet=1)
question<-read_excel("data.xlsx",sheet=3)
question
## # A tibble: 9 x 2
## X__1 Questions
## <dbl> <chr>
## 1 1 How many unique values in the Name field?
## 2 2 What is the most popular game platform basing on a number of game…
## 3 3.1 Which Nintendo's game genre released in 2016 had the most highest…
## 4 3.2 According to 3.1, List top-3 highest critic score under that game…
## 5 4 Visualizing/Plotting a number of games and year of release betwee…
## 6 NA 4.1) Which year of release had the most drastic change in a numbe…
## 7 NA 4.2) Which year of release had the most drastic change in a numbe…
## 8 5 What is the most popular platform for games in Shooter category? …
## 9 6 List the names of game developers in Simulation category which ha…
glimpse(dat)
## Observations: 16,719
## Variables: 16
## $ Name <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart...
## $ Platform <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", ...
## $ Year_of_Release <chr> "2006", "1985", "2008", "2009", "1996", "1989"...
## $ Genre <chr> "Sports", "Platform", "Racing", "Sports", "Rol...
## $ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo"...
## $ NA_Sales <dbl> 41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.2...
## $ EU_Sales <dbl> 28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9...
## $ JP_Sales <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.9...
## $ Other_Sales <dbl> 8.45, 0.77, 3.29, 2.95, 1.00, 0.58, 2.88, 2.84...
## $ Global_Sales <dbl> 82.53, 40.24, 35.52, 32.77, 31.37, 30.26, 29.8...
## $ Critic_Score <dbl> 76, NA, 82, 80, NA, NA, 89, 58, 87, NA, NA, 91...
## $ Critic_Count <dbl> 51, NA, 73, 73, NA, NA, 65, 41, 80, NA, NA, 64...
## $ User_Score <chr> "8", NA, "8.3000000000000007", "8", NA, NA, "8...
## $ User_Count <dbl> 322, NA, 709, 192, NA, NA, 431, 129, 594, NA, ...
## $ Developer <chr> "Nintendo", NA, "Nintendo", "Nintendo", NA, NA...
## $ Rating <chr> "E", NA, "E", "E", NA, NA, "E", "E", "E", NA, ...
Question1: How many unique values in the Name field?
Answer is 11563 unique values.
count<-dat%>%summarise(n=n_distinct(Name))
count #unique values in the Name field
## # A tibble: 1 x 1
## n
## <int>
## 1 11563
3.1: Which Nintendo’s game genre released in 2016 had the most highest critic score in average?
Answer is Action (76.8 score)
dat%>%filter(Publisher=="Nintendo",Year_of_Release==2016)%>%group_by(Genre)%>%summarise(Average.Critic_score=mean(Critic_Score,na.rm=T))
## # A tibble: 6 x 2
## Genre Average.Critic_score
## <chr> <dbl>
## 1 Action 76.8
## 2 Misc 68
## 3 Role-Playing 75
## 4 Shooter 69
## 5 Simulation NaN
## 6 Strategy NaN
4 :Visualizing/Plotting a number of games and year of release between 2000 - 2016 of Nintendo, Electronic Arts, and Sony Computer Entertainment, and answer the question listed below.
library(ggplot2)
temp<-dat%>%filter(Year_of_Release>=2000 & Year_of_Release<=2016, Publisher%in%c("Nintendo","Electronic Arts","Sony Computer Entertainment"))%>%group_by(Year_of_Release,Publisher)%>%summarise(n=n())
temp
## # A tibble: 51 x 3
## # Groups: Year_of_Release [?]
## Year_of_Release Publisher n
## <chr> <chr> <int>
## 1 2000 Electronic Arts 31
## 2 2000 Nintendo 23
## 3 2000 Sony Computer Entertainment 32
## 4 2001 Electronic Arts 43
## 5 2001 Nintendo 22
## 6 2001 Sony Computer Entertainment 38
## 7 2002 Electronic Arts 88
## 8 2002 Nintendo 22
## 9 2002 Sony Computer Entertainment 30
## 10 2003 Electronic Arts 85
## # ... with 41 more rows
temp%>%ggplot(aes(x=Year_of_Release,y=n,fill=Publisher))+geom_bar(stat="identity")
4.1: Which year of release had the most drastic change in a number of games for Nintendo?
#ANSWER is 2004.
4.2: Which year of release had the most drastic change in a number of games for Electronic Arts?
ANSWER is 2002.
5: What is the most popular platform for games in Shooter category? And why?
ANSWER is X360.
dat%>%filter(Genre=="Shooter")%>%group_by(Platform)%>%summarise(n=n())%>%arrange(desc(n))
## # A tibble: 25 x 2
## Platform n
## <chr> <int>
## 1 X360 203
## 2 PS2 160
## 3 PS3 156
## 4 PC 150
## 5 XB 132
## 6 PS 96
## 7 Wii 65
## 8 GC 48
## 9 DS 42
## 10 PS4 41
## # ... with 15 more rows
6: List the names of game developers in Simulation category which have an average critic score higher than 60, and also have developped total of games more than 15 games.
ANSWER is the list below…
dat%>%filter(Genre=="Simulation")%>%group_by(Developer)%>%summarise(mean.Critic_Score=mean(Critic_Score,na.rm=T),n=n())%>%filter(n>15,mean.Critic_Score>60)%>%arrange(desc(mean.Critic_Score))
## # A tibble: 5 x 3
## Developer mean.Critic_Score n
## <chr> <dbl> <int>
## 1 Maxis 74.3 46
## 2 Ubisoft 70 33
## 3 Konami 68.9 19
## 4 From Software 65.7 16
## 5 Electronic Arts 65.3 17