library(ggplot2)
library(dplyr)
library(gridExtra)
library(plotly)
This year we had a very interesting case with the game No Man's Sky. The game was released on PS4 and PC in August and is famous for the very large hype it built up over the preceding year (see Kotaku: http://kotaku.com/the-no-mans-sky-hype-dilemma-1785416931 and Eurogamer: http://www.eurogamer.net/articles/2016-12-20-no-mans-sky-changed-the-video-game-hype-train-forever).
The controversy arose because the game did not deliver all the features it had promised (although I found the game quite good and exactly what I was expecting). It suffered from very bad user reviews on Steam, whereas its professional reviews were fairly good.
So it would be interesting to compare the Metacritic score (professional reviews) with the User score (public reviews), and to relate both to the Sales the game generated, because most people had pre-ordered the game.
df<-read.csv('Video_Games_Sales_as_at_30_Nov_2016.csv',sep=',')
df$year = as.numeric(as.character(df$Year_of_Release))
df$User_Score_num = as.numeric(as.character(df$User_Score)) *10
#create new columns to regroup the Platform by manufacturers
sony<-c('PS','PS2','PS3','PS4' ,'PSP','PSV')
microsoft<-c('PC','X360','XB','XOne')
nintendo<-c('3DS','DS','GBA','GC','N64','Wii','WiiU')
sega<-c('DC')
newPlatform<-function(x){
if (x %in% sony == TRUE) {return('SONY')}
else if(x %in% microsoft == TRUE) {return('MICROSOFT')}
else if(x %in% nintendo == TRUE) {return('NINTENDO')}
else if(x %in% sega == TRUE) {return('SEGA')}
else{return('OTHER')}
}
df$newPlatform<-sapply(df$Platform, newPlatform)
df2 <- na.omit(df)
#there are still a few rows for which the Rating is an empty string
df2<-filter(df2,Rating!='')
filter(df2,Name=="No Man's Sky")
## Name Platform Year_of_Release Genre Publisher NA_Sales
## 1 No Man's Sky PS4 2016 Action Hello Games 0.62
## EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count
## 1 0.75 0.03 0.27 1.67 71 93
## User_Score User_Count Developer Rating year User_Score_num newPlatform
## 1 4.5 5046 Hello Games T 2016 45 SONY
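The platform-to-manufacturer grouping used above can also be written as a vectorised lookup instead of an `if`/`else` chain applied row by row. The sketch below is only one possible alternative, not the approach used in this analysis: `manufacturer` and `lookup_platform` are hypothetical names, and the grouping simply copies the vectors defined earlier (including `PC` under `MICROSOFT`).

```r
# Named vector: names are platform codes, values are manufacturer labels.
# The grouping mirrors the sony / microsoft / nintendo / sega vectors above.
manufacturer <- c(
  setNames(rep("SONY", 6), c("PS", "PS2", "PS3", "PS4", "PSP", "PSV")),
  setNames(rep("MICROSOFT", 4), c("PC", "X360", "XB", "XOne")),
  setNames(rep("NINTENDO", 7), c("3DS", "DS", "GBA", "GC", "N64", "Wii", "WiiU")),
  setNames("SEGA", "DC")
)

# Hypothetical helper (not part of the original script):
# maps a vector of platform codes to manufacturer labels in one step.
lookup_platform <- function(platform) {
  # as.character() makes the lookup safe when the column is a factor
  out <- unname(manufacturer[as.character(platform)])
  # codes that are absent from the lookup table come back as NA
  out[is.na(out)] <- "OTHER"
  out
}

example <- lookup_platform(factor(c("PS4", "PC", "WiiU", "DC", "2600")))
print(example)
```

With the real data this would be called as `df$newPlatform <- lookup_platform(df$Platform)`.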
We can select comparable games (same Platform and Genre) to check the correlation between the Metacritic and User scores:
df2<-filter(df2,Genre=='Action' & Platform=='PS4')
summary(df2)
## Name Platform Year_of_Release
## 7 Days to Die : 1 PS4 :57 2015 :22
## Aegis of Earth: Protonovus Assault: 1 3DS : 0 2014 :17
## Anima - Gate of Memories : 1 DC : 0 2016 :16
## Assassin's Creed Chronicles: China: 1 DS : 0 2013 : 2
## Assassin's Creed IV: Black Flag : 1 GBA : 0 1985 : 0
## Assassin's Creed: Unity : 1 GC : 0 1988 : 0
## (Other) :51 (Other): 0 (Other): 0
## Genre Publisher
## Action :57 Warner Bros. Interactive Entertainment:10
## Adventure: 0 Ubisoft : 7
## Fighting : 0 Activision : 5
## Misc : 0 Sony Computer Entertainment : 5
## Platform : 0 Namco Bandai Games : 3
## Puzzle : 0 Square Enix : 3
## (Other) : 0 (Other) :24
## NA_Sales EU_Sales JP_Sales Other_Sales
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0200 1st Qu.:0.0300 1st Qu.:0.00000 1st Qu.:0.0100
## Median :0.1300 Median :0.1800 Median :0.01000 Median :0.0500
## Mean :0.4054 Mean :0.5284 Mean :0.05105 Mean :0.1816
## 3rd Qu.:0.5500 3rd Qu.:0.7600 3rd Qu.:0.07000 3rd Qu.:0.2700
## Max. :3.8800 Max. :6.0400 Max. :0.48000 Max. :1.9100
##
## Global_Sales Critic_Score Critic_Count User_Score
## Min. : 0.010 Min. :44.00 Min. : 4.00 7.8 : 5
## 1st Qu.: 0.080 1st Qu.:69.00 1st Qu.: 29.00 7.9 : 4
## Median : 0.330 Median :73.00 Median : 42.00 5.7 : 3
## Mean : 1.167 Mean :72.98 Mean : 44.96 6.6 : 3
## 3rd Qu.: 1.670 3rd Qu.:80.00 3rd Qu.: 60.00 7.6 : 3
## Max. :12.190 Max. :97.00 Max. :100.00 8.1 : 3
## (Other):36
## User_Count Developer Rating year
## Min. : 6.0 TT Games : 5 M :25 Min. :2013
## 1st Qu.: 50.0 Omega Force : 3 T :19 1st Qu.:2014
## Median : 98.0 PlatinumGames : 2 E10+ :11 Median :2015
## Mean : 756.1 Sucker Punch : 2 E : 2 Mean :2015
## 3rd Qu.:1104.0 Techland : 2 : 0 3rd Qu.:2016
## Max. :6304.0 Ubisoft Montreal: 2 AO : 0 Max. :2016
## (Other) :41 (Other): 0
## User_Score_num newPlatform
## Min. :34.00 Length:57
## 1st Qu.:63.00 Class :character
## Median :72.00 Mode :character
## Mean :69.44
## 3rd Qu.:78.00
## Max. :86.00
##
ggplot(data=df2,aes(x= User_Score_num,y= Critic_Score,label=Name)) + geom_point(aes(color=Rating,size=Global_Sales)) + xlim(0,100) + ylim(0,100) + geom_abline(intercept = 0, slope = 1, color="red")
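The scatter plot shows the relation between the two scores visually; a correlation coefficient would put a number on it. The sketch below uses base R's `cor()` on a small made-up data frame, so the values are illustrative only and are not taken from the dataset. Only the column names follow the ones used in this analysis.

```r
# Illustrative values only, NOT taken from the dataset
scores <- data.frame(
  Critic_Score   = c(71, 80, 65, 90, 73, 58),
  User_Score_num = c(45, 78, 60, 85, 70, 62)
)

# Pearson measures linear association; Spearman works on ranks and is
# less sensitive to a single extreme game
pearson  <- cor(scores$Critic_Score, scores$User_Score_num, method = "pearson")
spearman <- cor(scores$Critic_Score, scores$User_Score_num, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

On the filtered data the equivalent call would be `cor(df2$Critic_Score, df2$User_Score_num)`.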
We can choose to plot the outlier games, meaning games for which the difference between the two scores is above a given threshold.
df2$DiffScore<-df2$Critic_Score - df2$User_Score_num
#define the mean and standard deviation for the Scores difference
meanDF<-mean(df2$DiffScore)
sdDF<-sd(df2$DiffScore)
sprintf("mean: %f sd :%f", meanDF,sdDF)
## [1] "mean: 3.543860 sd :12.087582"
ggplot(df2, aes(x = Critic_Score - User_Score_num)) + geom_histogram(aes(y = ..density..),bins=50) + stat_function(fun = dnorm,args = with(df2, c(mean = meanDF, sd = sdDF)),color='red')
#definition of the threshold
Threshold<- meanDF + sdDF
So it turns out that No Man's Sky, with a score difference of 26, lies above \(\mu + 1\sigma\) (about 15.6) but below \(\mu + 2\sigma\) (about 27.7). We can then label on the plot the games for which the absolute difference between the two scores is above this \(1\sigma\) threshold:
ggplot(data=df2,aes(x= User_Score_num,y= Critic_Score,label=Name)) + geom_point(aes(color=Rating,size=Global_Sales)) + xlim(0,100) + ylim(0,100) + geom_abline(intercept = 0, slope = 1, color="red") + geom_text(aes(label=ifelse(abs(Critic_Score-User_Score_num)>Threshold,as.character(Name),'')),hjust=0, vjust=0)
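As a numerical check of where No Man's Sky sits relative to the threshold, the score difference can be expressed in units of \(\sigma\). This is a minimal sketch that hard-codes the mean, standard deviation and scores printed in the outputs above; `nms_diff` and `z_nms` are names introduced here for illustration.

```r
# Values copied from the outputs printed earlier in this analysis
meanDF <- 3.543860
sdDF   <- 12.087582
nms_diff <- 71 - 45  # Critic_Score minus User_Score_num for No Man's Sky

# Distance from the mean in units of the standard deviation
z_nms <- (nms_diff - meanDF) / sdDF

# Compare with the 1-sigma threshold used here and with a stricter 2-sigma cut
above_1_sigma <- nms_diff > meanDF + sdDF
above_2_sigma <- nms_diff > meanDF + 2 * sdDF

print(c(z = z_nms))
print(c(above_1_sigma = above_1_sigma, above_2_sigma = above_2_sigma))
```

The game is therefore labelled with the \(1\sigma\) threshold but would not be with a \(2\sigma\) one.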
The interesting thing to note is that No Man's Sky is in third position (by Global Sales) among the games above the Threshold, the first two being Ubisoft games … ¯\_(ツ)_/¯
df2 %>% filter(abs(DiffScore)>Threshold) %>% select(DiffScore, Global_Sales, Name, year) %>% arrange(desc(Global_Sales))
## DiffScore Global_Sales Name year
## 1 17 4.01 Watch Dogs 2014
## 2 21 3.93 Assassin's Creed: Unity 2014
## 3 26 1.67 No Man's Sky 2016
## 4 17 1.56 Mafia III 2016
## 5 24 0.52 Rory McIlroy PGA Tour 2015
## 6 20 0.43 Skylanders: Trap Team 2014
## 7 47 0.30 Skylanders: SuperChargers 2015
## 8 -21 0.02 Aegis of Earth: Protonovus Assault 2016
## 9 -27 0.02 Anima - Gate of Memories 2016
## 10 16 0.02 Dead Rising 2 2016
The game may not be an outlier in the Score difference; however, it is a significant outlier in the difference between the number of professional and public reviews.
There goes the hype train …
df2$DiffRev<-df2$Critic_Count - df2$User_Count
#define the mean and standard deviation for the difference in review counts
meanRev<-mean(df2$DiffRev)
sdRev<-sd(df2$DiffRev)
sprintf("mean: %f sd :%f", meanRev,sdRev)
## [1] "mean: -711.122807 sd :1313.378890"
ggplot(df2, aes(x = DiffRev)) + geom_histogram(aes(y = ..density..),bins=50) + stat_function(fun = dnorm,args = with(df2, c(mean = meanRev, sd = sdRev)),color='red')
#definition of the threshold
ThresholdRev<- meanRev - 3*sdRev
ggplot(data=df2,aes(x= User_Score_num- Critic_Score,y = Critic_Count - User_Count,label=Name)) + geom_point(aes(color=Rating,size=Global_Sales)) + geom_text(aes(label=ifelse(abs(DiffRev)>abs(ThresholdRev),as.character(Name),'')),hjust=0, vjust=0)
Using a 3\(\sigma\) cut really singles out No Man's Sky as an outlier here: it is the game with the second-largest difference in the number of reviews, and at the same time it has a large difference between the two Scores.
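The 3\(\sigma\) cut on the review counts can be verified by hand for No Man's Sky. The sketch below reuses the mean and standard deviation printed above, together with the game's 93 critic reviews and 5046 user reviews. `is_beyond_cut` is a hypothetical helper written for this check; it is not part of the original script.

```r
# Values copied from the outputs printed earlier in this analysis
meanRev <- -711.122807
sdRev   <- 1313.378890
nms_diff_rev <- 93 - 5046  # Critic_Count minus User_Count for No Man's Sky

# Hypothetical helper: TRUE when a value falls below mean - k * sd
is_beyond_cut <- function(value, mu, sigma, k = 3) {
  value < mu - k * sigma
}

threshold_rev <- meanRev - 3 * sdRev
beyond_cut    <- is_beyond_cut(nms_diff_rev, meanRev, sdRev, k = 3)

print(c(threshold = threshold_rev, difference = nms_diff_rev))
print(beyond_cut)
```

Only the lower tail is tested because the mean difference is negative: user reviews far outnumber critic reviews for the most discussed games.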
dfAll <- na.omit(df)
dfAll<-filter(dfAll,Rating!='')
dfAll$DiffScore<-dfAll$Critic_Score - dfAll$User_Score_num
dfAll$DiffRev<-dfAll$Critic_Count - dfAll$User_Count
plot_ly(dfAll,x = dfAll$DiffScore, y = dfAll$DiffRev, text = paste("Name : ", dfAll$Name), mode="markers", color=factor(dfAll$newPlatform) ,size=dfAll$Global_Sales)