1 Exploring Data Structure and Question

setwd("~/Downloads")
library(readxl)
library(dplyr)
dat<-read_excel("data.xlsx",sheet=1)
question<-read_excel("data.xlsx",sheet=3)
question
## # A tibble: 9 x 2
##    X__1 Questions                                                         
##   <dbl> <chr>                                                             
## 1   1   How many unique values in the Name field?                         
## 2   2   What is the most popular game platform basing on a number of game…
## 3   3.1 Which Nintendo's game genre released in 2016 had the most highest…
## 4   3.2 According to 3.1, List top-3 highest critic score under that game…
## 5   4   Visualizing/Plotting a number of games and year of release betwee…
## 6  NA   4.1) Which year of release had the most drastic change in a numbe…
## 7  NA   4.2) Which year of release had the most drastic change in a numbe…
## 8   5   What is the most popular platform for games in Shooter category? …
## 9   6   List the names of game developers in Simulation category which ha…
glimpse(dat)
## Observations: 16,719
## Variables: 16
## $ Name            <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart...
## $ Platform        <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", ...
## $ Year_of_Release <chr> "2006", "1985", "2008", "2009", "1996", "1989"...
## $ Genre           <chr> "Sports", "Platform", "Racing", "Sports", "Rol...
## $ Publisher       <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo"...
## $ NA_Sales        <dbl> 41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.2...
## $ EU_Sales        <dbl> 28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9...
## $ JP_Sales        <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.9...
## $ Other_Sales     <dbl> 8.45, 0.77, 3.29, 2.95, 1.00, 0.58, 2.88, 2.84...
## $ Global_Sales    <dbl> 82.53, 40.24, 35.52, 32.77, 31.37, 30.26, 29.8...
## $ Critic_Score    <dbl> 76, NA, 82, 80, NA, NA, 89, 58, 87, NA, NA, 91...
## $ Critic_Count    <dbl> 51, NA, 73, 73, NA, NA, 65, 41, 80, NA, NA, 64...
## $ User_Score      <chr> "8", NA, "8.3000000000000007", "8", NA, NA, "8...
## $ User_Count      <dbl> 322, NA, 709, 192, NA, NA, 431, 129, 594, NA, ...
## $ Developer       <chr> "Nintendo", NA, "Nintendo", "Nintendo", NA, NA...
## $ Rating          <chr> "E", NA, "E", "E", NA, NA, "E", "E", "E", NA, ...

Question1: How many unique values in the Name field?

Answer is 11563 unique values.

count<-dat%>%summarise(n=n_distinct(Name)) 
count #unique values in the Name field
## # A tibble: 1 x 1
##       n
##   <int>
## 1 11563

3.1: Which Nintendo’s game genre released in 2016 had the most highest critic score in average?

Answer is Action (76.8 score)

dat%>%filter(Publisher=="Nintendo",Year_of_Release==2016)%>%group_by(Genre)%>%summarise(Average.Critic_score=mean(Critic_Score,na.rm=T))
## # A tibble: 6 x 2
##   Genre        Average.Critic_score
##   <chr>                       <dbl>
## 1 Action                       76.8
## 2 Misc                         68  
## 3 Role-Playing                 75  
## 4 Shooter                      69  
## 5 Simulation                  NaN  
## 6 Strategy                    NaN

4 :Visualizing/Plotting a number of games and year of release between 2000 - 2016 of Nintendo, Electronic Arts, and Sony Computer Entertainment, and answer the question listed below.

library(ggplot2)
temp<-dat%>%filter(Year_of_Release>=2000 & Year_of_Release<=2016, Publisher%in%c("Nintendo","Electronic Arts","Sony Computer Entertainment"))%>%group_by(Year_of_Release,Publisher)%>%summarise(n=n())
temp
## # A tibble: 51 x 3
## # Groups:   Year_of_Release [?]
##    Year_of_Release Publisher                       n
##    <chr>           <chr>                       <int>
##  1 2000            Electronic Arts                31
##  2 2000            Nintendo                       23
##  3 2000            Sony Computer Entertainment    32
##  4 2001            Electronic Arts                43
##  5 2001            Nintendo                       22
##  6 2001            Sony Computer Entertainment    38
##  7 2002            Electronic Arts                88
##  8 2002            Nintendo                       22
##  9 2002            Sony Computer Entertainment    30
## 10 2003            Electronic Arts                85
## # ... with 41 more rows
temp%>%ggplot(aes(x=Year_of_Release,y=n,fill=Publisher))+geom_bar(stat="identity")

4.1: Which year of release had the most drastic change in a number of games for Nintendo?

#ANSWER is 2004.

4.2: Which year of release had the most drastic change in a number of games for Electronic Arts?

ANSWER is 2002.


ANSWER is X360.

dat%>%filter(Genre=="Shooter")%>%group_by(Platform)%>%summarise(n=n())%>%arrange(desc(n))
## # A tibble: 25 x 2
##    Platform     n
##    <chr>    <int>
##  1 X360       203
##  2 PS2        160
##  3 PS3        156
##  4 PC         150
##  5 XB         132
##  6 PS          96
##  7 Wii         65
##  8 GC          48
##  9 DS          42
## 10 PS4         41
## # ... with 15 more rows

6: List the names of game developers in Simulation category which have an average critic score higher than 60, and also have developped total of games more than 15 games.

ANSWER is the list below…

dat%>%filter(Genre=="Simulation")%>%group_by(Developer)%>%summarise(mean.Critic_Score=mean(Critic_Score,na.rm=T),n=n())%>%filter(n>15,mean.Critic_Score>60)%>%arrange(desc(mean.Critic_Score))
## # A tibble: 5 x 3
##   Developer       mean.Critic_Score     n
##   <chr>                       <dbl> <int>
## 1 Maxis                        74.3    46
## 2 Ubisoft                      70      33
## 3 Konami                       68.9    19
## 4 From Software                65.7    16
## 5 Electronic Arts              65.3    17