Content
In addition to the fields Name, Platform, Year_of_Release, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales and Global_Sales, the dataset contains:
Critic_Score - Aggregate score compiled by Metacritic staff
Critic_Count - The number of critics used in coming up with the Critic_Score
User_Score - Score by Metacritic's subscribers
User_Count - Number of users who gave the User_Score
Developer - Party responsible for creating the game
Rating - The ESRB rating
The data combine video game sales from VGChartz with the corresponding ratings from Metacritic. The main steps of the analysis are explained below.
->Reading dataset into R
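# stringsAsFactors=TRUE keeps text columns as factors, matching the str() output shown later
vgame1.df <- read.csv("Video Game Sales Data.csv.csv", stringsAsFactors=TRUE)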
View(vgame1.df)
-Dimensions of the dataset
dim(vgame1.df)
## [1] 16719 16
-Frequency count of games by year of release
table(vgame1.df$Year_of_Release)
##
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 9 46 36 17 14 14 21 16 15 17 16 41 43 62 121
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 219 263 289 379 338 350 482 829 775 762 939 1006 1197 1427 1426
## 2010 2011 2012 2013 2014 2015 2016 2017 2020 N/A
## 1255 1136 653 544 581 606 502 3 1 269
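The table above shows 269 rows whose year is the text "N/A", which is also why Year_of_Release is read as a factor rather than a number. A minimal sketch of one way to handle this at read time, assuming that "N/A" and empty fields are the only missing-value placeholders; the three rows in `csv_text` are made up for illustration and `toy.df` is a hypothetical name:

```r
# Toy stand-in for the real file; these rows are invented for illustration.
csv_text <- "Name,Year_of_Release,Rating,User_Score
Game A,2016,E,8.1
Game B,N/A,,
Game C,2015,T,7.4"

# Treat "N/A" and empty fields as missing values while reading.
toy.df <- read.csv(text = csv_text,
                   na.strings = c("", "N/A"),
                   stringsAsFactors = FALSE)

# Year and score now arrive as numbers, with NA where a placeholder was found.
str(toy.df)
table(toy.df$Year_of_Release, useNA = "ifany")
```

With the real file, the same `na.strings` argument could be passed to the `read.csv()` call used above; whether other placeholders occur in the data is not checked here.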
->Cleaning the data
- Since the dataset is large and spans several decades of releases, this project considers only the games released in 2016 and 2015
-1) Viewing the subset-1 containing data of only year 2016
vgame2016<- vgame1.df[which(vgame1.df$Year_of_Release=="2016"), ]
View(vgame2016)
-Dimensions of subset-1(year 2016)
dim(vgame2016)
## [1] 502 16
-Summarizing the subset-1 (year 2016)
library(psych)
describe(vgame2016)
## vars n mean sd median trimmed mad min
## Name* 1 502 5739.26 3456.23 5894.00 5739.69 4452.25 3.00
## Platform* 2 502 19.83 7.47 19.00 20.43 2.97 3.00
## Year_of_Release* 3 502 37.00 0.00 37.00 37.00 0.00 37.00
## Genre* 4 502 5.85 3.89 5.00 5.53 4.45 2.00
## Publisher* 5 502 309.33 178.72 354.00 315.46 231.29 7.00
## NA_Sales 6 502 0.09 0.26 0.01 0.03 0.01 0.00
## EU_Sales 7 502 0.10 0.36 0.01 0.03 0.01 0.00
## JP_Sales 8 502 0.04 0.13 0.00 0.02 0.00 0.00
## Other_Sales 9 502 0.03 0.09 0.00 0.01 0.00 0.00
## Global_Sales 10 502 0.26 0.70 0.06 0.11 0.07 0.01
## Critic_Score 11 232 73.16 11.74 74.50 74.28 11.12 31.00
## Critic_Count 12 232 30.25 23.89 22.00 26.98 19.27 4.00
## User_Score* 13 502 41.07 36.42 49.50 39.55 52.63 1.00
## User_Count 14 262 264.97 671.24 57.00 126.51 69.68 5.00
## Developer* 15 502 545.22 580.25 378.00 484.34 558.94 1.00
## Rating* 16 502 3.86 3.13 3.00 3.58 2.97 1.00
## max range skew kurtosis se
## Name* 11536.00 11533.00 -0.04 -1.25 154.26
## Platform* 31.00 28.00 -0.45 0.36 0.33
## Year_of_Release* 37.00 0.00 NaN NaN 0.00
## Genre* 13.00 11.00 0.41 -1.45 0.17
## Publisher* 576.00 569.00 -0.26 -1.32 7.98
## NA_Sales 2.98 2.98 5.69 42.87 0.01
## EU_Sales 5.75 5.75 9.76 128.76 0.02
## JP_Sales 2.26 2.26 11.51 169.13 0.01
## Other_Sales 1.11 1.11 6.80 57.74 0.00
## Global_Sales 7.59 7.58 6.52 52.98 0.03
## Critic_Score 93.00 62.00 -0.85 0.64 0.77
## Critic_Count 113.00 109.00 1.12 0.66 1.57
## User_Score* 97.00 96.00 0.01 -1.69 1.63
## User_Count 7064.00 7059.00 6.43 52.11 41.47
## Developer* 1671.00 1670.00 0.60 -1.15 25.90
## Rating* 9.00 8.00 0.57 -1.29 0.14
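In the describe() output, the rows marked with * are factor columns, so their means and medians are computed on internal level codes and carry no direct meaning. A small sketch of restricting a summary to the numeric columns; the data frame `toy.df` and its values are hypothetical:

```r
# Hypothetical mini data frame with one factor and two numeric columns.
toy.df <- data.frame(
  Genre        = factor(c("Action", "Sports", "Action", "Racing")),
  Global_Sales = c(7.59, 0.26, 0.06, 1.10),
  Critic_Score = c(85L, NA, 74L, 77L)
)

# Logical vector marking the numeric columns (factors are excluded).
num_cols <- sapply(toy.df, is.numeric)

# Summaries limited to columns where a mean is meaningful.
summary(toy.df[, num_cols])
numeric_means <- colMeans(toy.df[, num_cols], na.rm = TRUE)
numeric_means
```

The same column filter could be applied before calling describe() on vgame2016 or vgame2015.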
-One-way Contingency tables of subset-1( year 2016)
mytable1<- with(vgame2016,table(Genre))
mytable1
## Genre
## Action Adventure Fighting Misc
## 0 178 56 16 32
## Platform Puzzle Racing Role-Playing Shooter
## 15 1 24 54 47
## Simulation Sports Strategy
## 18 48 13
mytable2<- with(vgame2016,table(Platform))
mytable2
## Platform
## 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX
## 0 0 46 0 0 0 0 0 0 0 0 0 0 54 0
## PS PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
## 0 0 38 164 0 85 0 0 0 0 1 14 0 13 0
## XOne
## 87
mytable3<- with(vgame2016,table(Rating))
mytable3
## Rating
## AO E E10+ EC K-A M RP T
## 222 0 66 50 0 0 78 0 86
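Many counts above are zero because the subset still carries every factor level of the full dataset (for example platforms with no 2016 releases). A minimal sketch of removing unused levels with droplevels(), using a made-up data frame; `toy.df` and `toy2016` are hypothetical names:

```r
# Hypothetical data: Rating keeps levels that do not occur in the 2016 rows.
toy.df <- data.frame(
  Year   = c("2015", "2016", "2016", "2016"),
  Rating = factor(c("E", "M", "T", "M"), levels = c("", "AO", "E", "M", "T"))
)

toy2016 <- toy.df[toy.df$Year == "2016", ]
table(toy2016$Rating)            # still lists "", AO and E with zero counts

# Drop factor levels that no longer appear in the subset.
toy2016 <- droplevels(toy2016)
rating_counts <- table(toy2016$Rating)
rating_counts                    # only M and T remain
```

Note that in the tables above the empty level "" is not unused: it holds the games with no rating recorded, so droplevels() would keep it.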
-Two-way Contingency tables of subset-1( year 2016)
mytable<-xtabs(~ Genre+Rating,data=vgame2016)
mytable
## Rating
## Genre AO E E10+ EC K-A M RP T
## 0 0 0 0 0 0 0 0 0
## Action 88 0 9 20 0 0 34 0 27
## Adventure 38 0 0 2 0 0 10 0 6
## Fighting 5 0 0 1 0 0 0 0 10
## Misc 16 0 5 6 0 0 1 0 4
## Platform 2 0 2 8 0 0 0 0 3
## Puzzle 0 0 0 1 0 0 0 0 0
## Racing 7 0 17 0 0 0 0 0 0
## Role-Playing 31 0 1 2 0 0 8 0 12
## Shooter 5 0 0 7 0 0 25 0 10
## Simulation 10 0 6 0 0 0 0 0 2
## Sports 12 0 26 2 0 0 0 0 8
## Strategy 8 0 0 1 0 0 0 0 4
mytable11<-xtabs(~ Genre+Platform,data=vgame2016)
mytable11
## Platform
## Genre 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Action 0 0 22 0 0 0 0 0 0 0 0 0 0 7 0 0
## Adventure 0 0 5 0 0 0 0 0 0 0 0 0 0 5 0 0
## Fighting 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
## Misc 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## Platform 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0
## Puzzle 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Racing 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0
## Role-Playing 0 0 7 0 0 0 0 0 0 0 0 0 0 4 0 0
## Shooter 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0
## Simulation 0 0 3 0 0 0 0 0 0 0 0 0 0 8 0 0
## Sports 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0
## Strategy 0 0 1 0 0 0 0 0 0 0 0 0 0 8 0 0
## Platform
## Genre PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Action 0 13 59 0 35 0 0 0 0 0 6 0 3 0
## Adventure 0 2 14 0 25 0 0 0 0 0 1 0 1 0
## Fighting 0 2 7 0 2 0 0 0 0 0 1 0 0 0
## Misc 0 6 10 0 3 0 0 0 0 1 2 0 1 0
## Platform 0 1 5 0 0 0 0 0 0 0 2 0 1 0
## Puzzle 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## Racing 0 0 9 0 0 0 0 0 0 0 0 0 0 0
## Role-Playing 0 5 18 0 15 0 0 0 0 0 1 0 0 0
## Shooter 0 1 20 0 0 0 0 0 0 0 1 0 1 0
## Simulation 0 0 4 0 1 0 0 0 0 0 0 0 0 0
## Sports 0 8 16 0 2 0 0 0 0 0 0 0 6 0
## Strategy 0 0 2 0 1 0 0 0 0 0 0 0 0 0
## Platform
## Genre XOne
## 0
## Action 33
## Adventure 3
## Fighting 2
## Misc 4
## Platform 3
## Puzzle 0
## Racing 9
## Role-Playing 4
## Shooter 15
## Simulation 2
## Sports 11
## Strategy 1
mytable12<-xtabs(~ Rating+Platform,data=vgame2016)
mytable12
## Platform
## Rating 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS PS2 PS3
## 0 0 32 0 0 0 0 0 0 0 0 0 0 17 0 0 0 24
## AO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## E 0 0 5 0 0 0 0 0 0 0 0 0 0 12 0 0 0 4
## E10+ 0 0 4 0 0 0 0 0 0 0 0 0 0 2 0 0 0 4
## EC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## K-A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## M 0 0 1 0 0 0 0 0 0 0 0 0 0 14 0 0 0 1
## RP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## T 0 0 4 0 0 0 0 0 0 0 0 0 0 9 0 0 0 5
## Platform
## Rating PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB XOne
## 61 0 69 0 0 0 0 0 2 0 1 0 16
## AO 0 0 0 0 0 0 0 0 0 0 0 0 0
## E 20 0 0 0 0 0 0 0 2 0 5 0 18
## E10+ 16 0 4 0 0 0 0 1 7 0 4 0 8
## EC 0 0 0 0 0 0 0 0 0 0 0 0 0
## K-A 0 0 0 0 0 0 0 0 0 0 0 0 0
## M 32 0 2 0 0 0 0 0 0 0 1 0 27
## RP 0 0 0 0 0 0 0 0 0 0 0 0 0
## T 35 0 10 0 0 0 0 0 3 0 2 0 18
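Raw counts in a two-way table are hard to compare across rows of different size. As an illustrative sketch (not part of the original analysis), row proportions and marginal totals can be derived from an xtabs() result; the data below are invented:

```r
# Hypothetical genre/rating pairs.
toy.df <- data.frame(
  Genre  = c("Action", "Action", "Action", "Sports", "Sports"),
  Rating = c("M", "T", "M", "E", "E")
)

counts <- xtabs(~ Genre + Rating, data = toy.df)

# Add row and column totals to the count table.
addmargins(counts)

# Share of each rating within a genre (each row sums to 1).
row_share <- prop.table(counts, margin = 1)
round(row_share, 2)
```

The same two calls could be applied to mytable, mytable11 and mytable12.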
-2) Viewing the subset-2 containing data of year 2015
vgame2015<- vgame1.df[which(vgame1.df$Year_of_Release=="2015"), ]
View(vgame2015)
-Dimensions of subset-2( year 2015)
dim(vgame2015)
## [1] 606 16
-Summarizing the subset-2 (year 2015)
library(psych)
describe(vgame2015)
## vars n mean sd median trimmed mad min
## Name* 1 606 5816.44 3367.04 5832.00 5832.54 4175.00 4.00
## Platform* 2 606 19.14 8.21 19.00 19.67 2.97 3.00
## Year_of_Release* 3 606 36.00 0.00 36.00 36.00 0.00 36.00
## Genre* 4 606 5.50 3.88 3.00 5.10 1.48 2.00
## Publisher* 5 606 309.81 178.25 354.00 316.18 231.29 7.00
## NA_Sales 6 606 0.18 0.49 0.02 0.07 0.03 0.00
## EU_Sales 7 606 0.16 0.48 0.02 0.06 0.03 0.00
## JP_Sales 8 606 0.06 0.19 0.01 0.02 0.01 0.00
## Other_Sales 9 606 0.05 0.16 0.01 0.02 0.01 0.00
## Global_Sales 10 606 0.44 1.10 0.09 0.20 0.10 0.01
## Critic_Score 11 225 72.87 12.44 74.00 74.05 10.38 19.00
## Critic_Count 12 225 32.31 24.35 26.00 29.15 22.24 4.00
## User_Score* 13 606 38.71 36.44 42.00 36.61 60.79 1.00
## User_Count 14 297 393.37 1166.83 65.00 139.76 78.58 4.00
## Developer* 15 606 522.15 603.88 180.00 453.34 265.39 1.00
## Rating* 16 606 3.64 3.13 3.00 3.31 2.97 1.00
## max range skew kurtosis se
## Name* 11534.00 11530.00 -0.02 -1.20 136.78
## Platform* 31.00 28.00 -0.56 -0.13 0.33
## Year_of_Release* 36.00 0.00 NaN NaN 0.00
## Genre* 13.00 11.00 0.58 -1.28 0.16
## Publisher* 574.00 567.00 -0.28 -1.23 7.24
## NA_Sales 6.03 6.03 6.17 51.98 0.02
## EU_Sales 6.12 6.12 7.64 76.86 0.02
## JP_Sales 2.79 2.79 9.42 109.84 0.01
## Other_Sales 2.38 2.38 8.20 93.11 0.01
## Global_Sales 14.63 14.62 6.40 58.93 0.04
## Critic_Score 96.00 77.00 -1.36 3.33 0.83
## Critic_Count 103.00 99.00 1.00 0.21 1.62
## User_Score* 97.00 96.00 0.12 -1.68 1.48
## User_Count 10665.00 10661.00 6.02 42.34 67.71
## Developer* 1677.00 1676.00 0.67 -1.15 24.53
## Rating* 9.00 8.00 0.73 -1.09 0.13
-One-way Contingency tables of subset-2( year 2015)
mytable1<- with(vgame2015,table(Genre))
mytable1
## Genre
## Action Adventure Fighting Misc
## 0 253 54 21 39
## Platform Puzzle Racing Role-Playing Shooter
## 13 6 18 78 34
## Simulation Sports Strategy
## 15 59 16
mytable2<- with(vgame2015,table(Platform))
mytable2
## Platform
## 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX
## 0 0 86 0 0 0 0 0 0 0 0 0 0 50 0
## PS PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
## 0 0 73 137 3 110 0 0 0 0 4 28 0 35 0
## XOne
## 80
mytable3<- with(vgame2015,table(Rating))
mytable3
## Rating
## AO E E10+ EC K-A M RP T
## 291 0 87 51 0 0 71 0 106
-Two-way Contingency tables of subset-2(year 2015)
mytable<-xtabs(~ Genre+Rating,data=vgame2015)
mytable
## Rating
## Genre AO E E10+ EC K-A M RP T
## 0 0 0 0 0 0 0 0 0
## Action 132 0 19 29 0 0 35 0 38
## Adventure 40 0 0 1 0 0 7 0 6
## Fighting 7 0 0 0 0 0 3 0 11
## Misc 19 0 3 8 0 0 1 0 8
## Platform 2 0 10 1 0 0 0 0 0
## Puzzle 3 0 2 1 0 0 0 0 0
## Racing 7 0 10 1 0 0 0 0 0
## Role-Playing 45 0 1 2 0 0 9 0 21
## Shooter 12 0 0 1 0 0 16 0 5
## Simulation 6 0 4 1 0 0 0 0 4
## Sports 8 0 38 5 0 0 0 0 8
## Strategy 10 0 0 1 0 0 0 0 5
mytable11<-xtabs(~ Genre+Platform,data=vgame2015)
mytable11
## Platform
## Genre 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Action 0 0 39 0 0 0 0 0 0 0 0 0 0 16 0 0
## Adventure 0 0 4 0 0 0 0 0 0 0 0 0 0 2 0 0
## Fighting 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0
## Misc 0 0 10 0 0 0 0 0 0 0 0 0 0 2 0 0
## Platform 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## Puzzle 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## Racing 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0
## Role-Playing 0 0 15 0 0 0 0 0 0 0 0 0 0 3 0 0
## Shooter 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0
## Simulation 0 0 3 0 0 0 0 0 0 0 0 0 0 6 0 0
## Sports 0 0 1 0 0 0 0 0 0 0 0 0 0 4 0 0
## Strategy 0 0 4 0 0 0 0 0 0 0 0 0 0 8 0 0
## Platform
## Genre PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Action 0 35 52 3 54 0 0 0 0 3 11 0 13 0
## Adventure 0 7 10 0 21 0 0 0 0 0 0 0 4 0
## Fighting 0 5 9 0 0 0 0 0 0 0 0 0 1 0
## Misc 0 3 4 0 5 0 0 0 0 1 7 0 2 0
## Platform 0 0 2 0 2 0 0 0 0 0 4 0 1 0
## Puzzle 0 0 1 0 0 0 0 0 0 0 1 0 0 0
## Racing 0 2 5 0 1 0 0 0 0 0 0 0 1 0
## Role-Playing 0 5 25 0 20 0 0 0 0 0 3 0 0 0
## Shooter 0 3 11 0 0 0 0 0 0 0 1 0 3 0
## Simulation 0 1 2 0 1 0 0 0 0 0 0 0 1 0
## Sports 0 12 15 0 4 0 0 0 0 0 1 0 9 0
## Strategy 0 0 1 0 2 0 0 0 0 0 0 0 0 0
## Platform
## Genre XOne
## 0
## Action 27
## Adventure 6
## Fighting 3
## Misc 5
## Platform 0
## Puzzle 0
## Racing 6
## Role-Playing 7
## Shooter 11
## Simulation 1
## Sports 13
## Strategy 1
mytable12<-xtabs(~ Rating+Platform,data=vgame2015)
mytable12
## Platform
## Rating 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS PS2 PS3
## 0 0 57 0 0 0 0 0 0 0 0 0 0 14 0 0 0 39
## AO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## E 0 0 10 0 0 0 0 0 0 0 0 0 0 9 0 0 0 9
## E10+ 0 0 13 0 0 0 0 0 0 0 0 0 0 3 0 0 0 5
## EC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## K-A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## M 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 4
## RP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## T 0 0 6 0 0 0 0 0 0 0 0 0 0 9 0 0 0 16
## Platform
## Rating PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB XOne
## 52 3 82 0 0 0 0 1 7 0 10 0 26
## AO 0 0 0 0 0 0 0 0 0 0 0 0 0
## E 18 0 4 0 0 0 0 2 11 0 11 0 13
## E10+ 8 0 3 0 0 0 0 1 7 0 4 0 7
## EC 0 0 0 0 0 0 0 0 0 0 0 0 0
## K-A 0 0 0 0 0 0 0 0 0 0 0 0 0
## M 21 0 6 0 0 0 0 0 1 0 4 0 20
## RP 0 0 0 0 0 0 0 0 0 0 0 0 0
## T 38 0 15 0 0 0 0 0 2 0 6 0 14
-> Box plots of individual variables, comparing 2016 and 2015
par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$NA_Sales,
main="Boxplot of North America sales in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="NA sales" ))
with(vgame2015,boxplot(vgame2015$NA_Sales,
main="Boxplot of North America sales in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="NA sales" ))
par(mfrow=c(2,1))
with(vgame2016, boxplot(vgame2016$EU_Sales,
main="Boxplot of Europe sales in 2016",
col=c("yellow"),xlim=c(0,3),
horizontal=TRUE,
xlab="EU sales" ))
with(vgame2015, boxplot(vgame2015$EU_Sales,
main="Boxplot of Europe sales in 2015",
col=c("yellow"),xlim=c(0,3),
horizontal=TRUE,
xlab="EU sales" ))
par(mfrow=c(2,1))
with(vgame2016, boxplot(vgame2016$JP_Sales,
main="Boxplot of Japan sales in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="JP sales" ))
with(vgame2015, boxplot(vgame2015$JP_Sales,
main="Boxplot of Japan sales in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="JP sales" ))
par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$Global_Sales,
main="Boxplot of Global sales in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="Global sales" ))
with(vgame2015,boxplot(vgame2015$Global_Sales,
main="Boxplot of Global sales in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="Global sales" ))
par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$Other_Sales,
main="Boxplot of Other sales in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="Other sales" ))
with(vgame2015,boxplot(vgame2015$Other_Sales,
main="Boxplot of Other sales in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="Other sales" ))
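The two years can also be placed on a shared axis in a single plot, which makes the comparison more direct than stacked panels. A sketch under stated assumptions: the values are invented, `both` and its `Year` column are hypothetical, and the log scale only works for variables with strictly positive values (Global_Sales has a minimum of 0.01 above, whereas the regional sales columns contain zeros).

```r
# Invented sales figures standing in for the two yearly subsets.
toy2015 <- data.frame(Global_Sales = c(0.02, 0.09, 0.44, 1.10, 14.63))
toy2016 <- data.frame(Global_Sales = c(0.01, 0.06, 0.26, 0.70, 7.59))

# Stack the subsets and label each row with its year.
both <- rbind(cbind(toy2015, Year = "2015"),
              cbind(toy2016, Year = "2016"))

# One horizontal boxplot per year; log = "x" spreads out the skewed sales values.
bp <- boxplot(Global_Sales ~ Year, data = both,
              horizontal = TRUE, log = "x", col = "yellow",
              xlab = "Global sales (log scale)", ylab = "Year",
              main = "Global sales by year of release")
```

boxplot() returns the statistics it draws, so `bp$stats` holds the five-number summary per year.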
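-Changing the User_Score variable from a factor vector to an integer vector for year 2016. Note that as.integer() applied to a factor returns the internal level codes (1 to 97 here) rather than the scores themselves, so the User_Score values from this point on are level codes.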
str(vgame2016)
## 'data.frame': 502 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 3121 7423 10726 1238 727 10403 3121 1238 3053 727 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 19 3 19 19 19 19 31 31 19 31 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 37 37 37 37 37 37 37 37 37 37 ...
## $ Genre : Factor w/ 13 levels "","Action","Adventure",..: 12 9 10 10 10 10 12 10 2 10 ...
## $ Publisher : Factor w/ 582 levels "10TACLE Studios",..: 140 371 467 17 140 536 140 17 536 140 ...
## $ NA_Sales : num 0.66 2.98 1.85 1.61 1.1 1.35 0.43 1.46 0.6 1.28 ...
## $ EU_Sales : num 5.75 1.45 2.5 2 2.15 1.7 2.05 0.74 1.25 0.77 ...
## $ JP_Sales : num 0.08 2.26 0.19 0.15 0.21 0.15 0 0 0.06 0 ...
## $ Other_Sales : num 1.11 0.45 0.85 0.71 0.61 0.6 0.17 0.22 0.35 0.2 ...
## $ Global_Sales : num 7.59 7.14 5.38 4.46 4.08 3.8 2.65 2.42 2.26 2.25 ...
## $ Critic_Score : int 85 NA 93 77 88 80 84 78 76 87 ...
## $ Critic_Count : int 41 NA 113 82 31 64 50 17 91 37 ...
## $ User_Score : Factor w/ 97 levels "","0","0.2","0.3",..: 49 1 78 33 83 69 54 30 62 81 ...
## $ User_Count : int 398 NA 7064 1129 809 2219 201 290 635 440 ...
## $ Developer : Factor w/ 1697 levels "","10tacle Studios",..: 455 1 1002 733 440 910 455 733 1561 440 ...
## $ Rating : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 9 7 7 7 3 7 7 7 ...
vgame2016$User_Score<-as.integer(vgame2016$User_Score)
str(vgame2016$User_Score)
## int [1:502] 49 1 78 33 83 69 54 30 62 81 ...
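The integers printed above are positions in the factor's level list, not user scores. If the scores themselves are wanted, a common approach is to convert the factor to character first. A minimal sketch on a made-up factor (`score_factor` is hypothetical):

```r
# Hypothetical factor of scores, including one empty entry.
score_factor <- factor(c("8.1", "", "7.4", "8.1", "0.6"))

# as.integer() returns level codes, which depend on the sort order of the levels.
as.integer(score_factor)    # 4 1 3 4 2

# Converting through character recovers the values; the empty string becomes NA.
scores <- as.numeric(as.character(score_factor))
scores                      # 8.1 NA 7.4 8.1 0.6
```

Applied to the real column, this would be `as.numeric(as.character(vgame2016$User_Score))`; any non-numeric entries in the data would become NA with a warning.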
-Changing the User_Score variable from a factor vector to an integer vector for year 2015 (again, the result holds factor level codes, not the scores themselves)
str(vgame2015)
## 'data.frame': 606 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 1234 3120 9144 1234 2986 10729 3935 9043 2986 10206 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 19 19 19 31 19 19 31 27 31 19 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 36 36 36 36 36 36 36 36 36 36 ...
## $ Genre : Factor w/ 13 levels "","Action","Adventure",..: 10 12 10 10 9 2 10 10 9 9 ...
## $ Publisher : Factor w/ 582 levels "10TACLE Studios",..: 17 140 140 17 65 467 330 371 65 354 ...
## $ NA_Sales : num 6.03 1.12 2.99 4.59 2.53 2.07 2.78 1.54 2.51 1.02 ...
## $ EU_Sales : num 5.86 6.12 3.49 2.11 3.27 1.71 1.27 1.18 1.32 2.13 ...
## $ JP_Sales : num 0.36 0.06 0.22 0.01 0.24 0.08 0.03 1.46 0.01 0.23 ...
## $ Other_Sales : num 2.38 1.28 1.28 0.68 1.13 0.76 0.41 0.26 0.38 0.59 ...
## $ Global_Sales : num 14.63 8.57 7.98 7.39 7.16 ...
## $ Critic_Score : int NA 82 NA NA 87 86 84 81 88 92 ...
## $ Critic_Count : int NA 42 NA NA 58 78 101 88 39 79 ...
## $ User_Score : Factor w/ 97 levels "","0","0.2","0.3",..: 1 42 1 1 64 80 63 84 61 91 ...
## $ User_Count : int NA 896 NA NA 4228 1264 2438 1184 1749 10179 ...
## $ Developer : Factor w/ 1697 levels "","10tacle Studios",..: 1 452 1 1 176 233 20 1035 176 282 ...
## $ Rating : Factor w/ 9 levels "","AO","E","E10+",..: 1 3 1 1 7 9 9 4 7 7 ...
vgame2015$User_Score<-as.integer(vgame2015$User_Score)
str(vgame2015$User_Score)
## int [1:606] 1 42 1 1 64 80 63 84 61 91 ...
par(mfrow=c(2,2))
with(vgame2016, boxplot(vgame2016$Critic_Score,
main="Boxplot of Critic score in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="Critic score" ))
with(vgame2016, boxplot(vgame2016$User_Score,
main="Boxplot of User score in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="User score" ))
with(vgame2015, boxplot(vgame2015$Critic_Score,
main="Boxplot of Critic score in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="Critic score" ))
with(vgame2015, boxplot(vgame2015$User_Score,
main="Boxplot of User score in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="User score" ))
par(mfrow=c(2,2))
with(vgame2016, boxplot(vgame2016$Critic_Count,
main="Boxplot of Critic count in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="Critic count" ))
with(vgame2016, boxplot(vgame2016$User_Count,
main="Boxplot of User count in 2016",
col=c("yellow"),
horizontal=TRUE,
xlab="User count" ))
with(vgame2015, boxplot(vgame2015$Critic_Count,
main="Boxplot of Critic count in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="Critic count" ))
with(vgame2015, boxplot(vgame2015$User_Count,
main="Boxplot of User count in 2015",
col=c("yellow"),
horizontal=TRUE,
xlab="User count" ))
-> Boxplots of pairs of variables, compared between 2016 and 2015
par(mfrow=c(2,1))
# las=1 prints the genre names horizontally on the category axis
boxplot(Global_Sales ~ Genre, data=vgame2016,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Genre", xlab="Global sales", col="yellow",
main="Comparison of Global sales based on Genre of the video game in 2016")
boxplot(Global_Sales ~ Genre, data=vgame2015,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Genre", xlab="Global sales", col="yellow",
main="Comparison of Global sales based on Genre of the video game in 2015")
par(mfrow=c(2,1))
# las=1 prints the rating labels horizontally on the category axis
boxplot(Global_Sales ~ Rating, data=vgame2016,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="Global sales", col="yellow",
main="Comparison of Global sales based on Rating in 2016")
boxplot(Global_Sales ~ Rating, data=vgame2015,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="Global sales", col="yellow",
main="Comparison of Global sales based on Rating in 2015")
par(mfrow=c(2,1))
# las=1 prints the rating labels horizontally on the category axis
boxplot(Critic_Score ~ Rating, data=vgame2016,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="Critic score", col="yellow",
main="Comparison of Critic score based on Rating in 2016")
boxplot(Critic_Score ~ Rating, data=vgame2015,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="Critic score", col="yellow",
main="Comparison of Critic score based on Rating in 2015")
par(mfrow=c(2,1))
# las=1 prints the rating labels horizontally on the category axis
boxplot(User_Score ~ Rating, data=vgame2016,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="User score", col="yellow",
main="Comparison of User score based on Rating in 2016")
boxplot(User_Score ~ Rating, data=vgame2015,
horizontal=TRUE, las=1, cex.axis=0.7,
ylab="Rating", xlab="User score", col="yellow",
main="Comparison of User score based on Rating in 2015")
-> Histograms of different variables correlated pair-wise in year 2016
library(lattice)
histogram(~Genre | Rating, data=vgame2016)
histogram(~Genre | Platform, data=vgame2016)
histogram(~Platform | Rating, data=vgame2016)
-> Histograms of different variables correlated pair-wise in year 2015
library(lattice)
histogram(~Genre | Rating, data=vgame2015)
histogram(~Genre | Platform, data=vgame2015)
histogram(~Platform | Rating, data=vgame2015)
-> Scatterplots of variables showing comparison in sales in years 2016 & 2015
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(Global_Sales~ Critic_Score, data=vgame2016, spread=FALSE,
smoother.args=list(lty=2),pch=19,
main="Scatterplot of Global Sales vs. Critic score in 2016",
xlab="Critic score",
ylab="Global sales ",cex=0.6)
scatterplot(Global_Sales ~ User_Score, data=vgame2016, spread=FALSE,
smoother.args=list(lty=2),pch=19,xlim=c(1,100),
main="Scatterplot of Global sales vs. User score in 2016",
xlab="User Score",
ylab="Global sales ",cex=0.6)
scatterplot(Global_Sales~ Critic_Score, data=vgame2015, spread=FALSE,
smoother.args=list(lty=2),pch=19,
main="Scatterplot of Global sales vs. Critic score in 2015",
xlab="Critic score",
ylab="Global sales ",cex=0.6)
scatterplot(Global_Sales ~ User_Score, data=vgame2015, spread=FALSE,
smoother.args=list(lty=2),pch=19,xlim=c(1,100),
main="Scatterplot of Global sales vs. User score in 2015",
xlab="User score",
ylab="Global sales ",cex=0.6)
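-Changing the Rating variable from a factor vector to a numeric vector for year 2016 (as.numeric() on a factor yields the level codes 1 to 9, in the order of the levels listed by str())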
str(vgame2016)
## 'data.frame': 502 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 3121 7423 10726 1238 727 10403 3121 1238 3053 727 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 19 3 19 19 19 19 31 31 19 31 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 37 37 37 37 37 37 37 37 37 37 ...
## $ Genre : Factor w/ 13 levels "","Action","Adventure",..: 12 9 10 10 10 10 12 10 2 10 ...
## $ Publisher : Factor w/ 582 levels "10TACLE Studios",..: 140 371 467 17 140 536 140 17 536 140 ...
## $ NA_Sales : num 0.66 2.98 1.85 1.61 1.1 1.35 0.43 1.46 0.6 1.28 ...
## $ EU_Sales : num 5.75 1.45 2.5 2 2.15 1.7 2.05 0.74 1.25 0.77 ...
## $ JP_Sales : num 0.08 2.26 0.19 0.15 0.21 0.15 0 0 0.06 0 ...
## $ Other_Sales : num 1.11 0.45 0.85 0.71 0.61 0.6 0.17 0.22 0.35 0.2 ...
## $ Global_Sales : num 7.59 7.14 5.38 4.46 4.08 3.8 2.65 2.42 2.26 2.25 ...
## $ Critic_Score : int 85 NA 93 77 88 80 84 78 76 87 ...
## $ Critic_Count : int 41 NA 113 82 31 64 50 17 91 37 ...
## $ User_Score : int 49 1 78 33 83 69 54 30 62 81 ...
## $ User_Count : int 398 NA 7064 1129 809 2219 201 290 635 440 ...
## $ Developer : Factor w/ 1697 levels "","10tacle Studios",..: 455 1 1002 733 440 910 455 733 1561 440 ...
## $ Rating : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 9 7 7 7 3 7 7 7 ...
vgame2016$Rating<-as.numeric(vgame2016$Rating)
str(vgame2016$Rating)
## num [1:502] 3 1 9 7 7 7 3 7 7 7 ...
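-Changing the Rating variable from a factor vector to a numeric vector for year 2015 (as.numeric() on a factor yields the level codes 1 to 9, in the order of the levels listed by str())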
str(vgame2015)
## 'data.frame': 606 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 1234 3120 9144 1234 2986 10729 3935 9043 2986 10206 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 19 19 19 31 19 19 31 27 31 19 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 36 36 36 36 36 36 36 36 36 36 ...
## $ Genre : Factor w/ 13 levels "","Action","Adventure",..: 10 12 10 10 9 2 10 10 9 9 ...
## $ Publisher : Factor w/ 582 levels "10TACLE Studios",..: 17 140 140 17 65 467 330 371 65 354 ...
## $ NA_Sales : num 6.03 1.12 2.99 4.59 2.53 2.07 2.78 1.54 2.51 1.02 ...
## $ EU_Sales : num 5.86 6.12 3.49 2.11 3.27 1.71 1.27 1.18 1.32 2.13 ...
## $ JP_Sales : num 0.36 0.06 0.22 0.01 0.24 0.08 0.03 1.46 0.01 0.23 ...
## $ Other_Sales : num 2.38 1.28 1.28 0.68 1.13 0.76 0.41 0.26 0.38 0.59 ...
## $ Global_Sales : num 14.63 8.57 7.98 7.39 7.16 ...
## $ Critic_Score : int NA 82 NA NA 87 86 84 81 88 92 ...
## $ Critic_Count : int NA 42 NA NA 58 78 101 88 39 79 ...
## $ User_Score : int 1 42 1 1 64 80 63 84 61 91 ...
## $ User_Count : int NA 896 NA NA 4228 1264 2438 1184 1749 10179 ...
## $ Developer : Factor w/ 1697 levels "","10tacle Studios",..: 1 452 1 1 176 233 20 1035 176 282 ...
## $ Rating : Factor w/ 9 levels "","AO","E","E10+",..: 1 3 1 1 7 9 9 4 7 7 ...
vgame2015$Rating<-as.numeric(vgame2015$Rating)
str(vgame2015$Rating)
## num [1:606] 1 3 1 1 7 9 9 4 7 7 ...
->Correlation Matrix visualization
library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(vgame2016[ ,6:14 ], use="complete.obs"),
method ="ellipse", main="correlation matrix of variables in 2016")
library(corrplot)
corrplot(corr=cor(vgame2015[ ,6:14 ], use="complete.obs"),
method ="ellipse" , main="correlation matrix of variables in 2015")
->Corrgram
library(corrgram)
corrgram(vgame2016, order=FALSE,
lower.panel=panel.shade,
upper.panel=panel.pie,
diag.panel=panel.minmax,
text.panel=panel.txt,
main="Corrgram of all the variables in 2016")
library(corrgram)
corrgram(vgame2015, order=FALSE,
lower.panel=panel.shade,
upper.panel=panel.pie,
diag.panel=panel.minmax,
text.panel=panel.txt,
main="Corrgram of all the variables in 2015")
->Scatterplot matrix
library(car)
scatterplotMatrix(formula = ~ Critic_Score + Critic_Count +
User_Score+ User_Count +Global_Sales , cex=0.6,
spread=FALSE, smoother.args=list(lty=2),pch=19,
data=vgame2016, diagonal="histogram",
main="scatterplot matrix in 2016")
scatterplotMatrix(formula = ~ Critic_Score + Critic_Count +
User_Score+ User_Count +Global_Sales , cex=0.6,
spread=FALSE, smoother.args=list(lty=2),pch=19,
data=vgame2015, diagonal="histogram",
main="scatterplot matrix in 2015")
-> Pearson’s chi-squared test not applied due to lack of definite categorical variables
->Paired (dependent) t-tests are carried out below to assess statistical significance. A paired t-test compares the means of two variables; its own null hypothesis is that the true mean difference is zero.
->NULL HYPOTHESIS (as framed in this project): Global sales is independent of Critic score, Critic count, NA sales, EU sales, JP sales, Other sales and User count
attach(vgame2016)
t.test(Critic_Score,Global_Sales,paired=TRUE, data=vgame2016)
##
## Paired t-test
##
## data: Critic_Score and Global_Sales
## t = 96.234, df = 231, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 71.26568 74.24484
## sample estimates:
## mean of the differences
## 72.75526
t.test(Critic_Count,Global_Sales,paired=TRUE, data=vgame2016)
##
## Paired t-test
##
## data: Critic_Count and Global_Sales
## t = 19.309, df = 231, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 26.80807 32.90072
## sample estimates:
## mean of the differences
## 29.8544
t.test(Other_Sales,Global_Sales,paired=TRUE, data=vgame2016)
##
## Paired t-test
##
## data: Other_Sales and Global_Sales
## t = -8.4585, df = 501, p-value = 2.989e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2838165 -0.1768209
## sample estimates:
## mean of the differences
## -0.2303187
-Since the p-value is very low (<0.001) in each test, the null hypothesis of the t-test is rejected: the mean of each variable differs significantly from the mean of Global sales.
-Strictly, this establishes a difference in means rather than dependence, because the variables are on different scales (in 2016 Critic score reaches 93 while Global sales stays below 8).
-This project nevertheless reads the results as an indication that the Global sales of any year is related to the Critic score, Critic count, NA sales, EU sales, JP sales and Other sales in some way.
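A test that addresses association between two numeric variables directly is the correlation test. The following is a minimal sketch using cor.test() from base R on invented values; it is an alternative to the paired t-tests above, not the method used in this project, and `toy.df` is a hypothetical data frame:

```r
# Invented scores and sales for eight hypothetical games.
toy.df <- data.frame(
  Critic_Score = c(60, 65, 70, 75, 80, 85, 90, 95),
  Global_Sales = c(0.10, 0.05, 0.30, 0.25, 0.60, 0.90, 1.50, 2.40)
)

# Pearson correlation test: the null hypothesis is zero correlation.
ct <- cor.test(toy.df$Critic_Score, toy.df$Global_Sales, method = "pearson")

ct$estimate   # sample correlation coefficient
ct$p.value    # small values are evidence against zero correlation
```

On the real data the call would be `cor.test(vgame2016$Critic_Score, vgame2016$Global_Sales)`; rows with missing scores are dropped pairwise by cor.test().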
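->Regression Model
Model -1: Global sales regressed on Platform, Genre, Critic score, Critic count, User score, User count and Rating, fitted on the full dataset (vgame1.df). User_Score is still a factor in vgame1.df, so lm() estimates one coefficient per distinct score value rather than a single slope.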
fit1<-lm(Global_Sales~Platform+Genre+Critic_Score+Critic_Count+
User_Score+User_Count+Rating, data=vgame1.df)
summary(fit1)
##
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Critic_Score +
## Critic_Count + User_Score + User_Count + Rating, data = vgame1.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.540 -0.581 -0.165 0.288 79.639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.998e-01 1.273e+00 -0.628 0.52978
## PlatformDC -4.174e-01 5.075e-01 -0.822 0.41084
## PlatformDS 2.744e-01 1.634e-01 1.680 0.09308 .
## PlatformGBA 1.617e-02 1.857e-01 0.087 0.93061
## PlatformGC -2.011e-01 1.721e-01 -1.168 0.24282
## PlatformPC -9.402e-01 1.626e-01 -5.782 7.69e-09 ***
## PlatformPS 9.016e-01 2.055e-01 4.388 1.16e-05 ***
## PlatformPS2 2.107e-01 1.534e-01 1.373 0.16967
## PlatformPS3 1.318e-03 1.561e-01 0.008 0.99326
## PlatformPS4 -3.834e-01 1.823e-01 -2.103 0.03551 *
## PlatformPSP -8.528e-02 1.679e-01 -0.508 0.61154
## PlatformPSV -3.645e-01 2.158e-01 -1.689 0.09130 .
## PlatformWii 8.059e-01 1.626e-01 4.957 7.33e-07 ***
## PlatformWiiU -3.091e-01 2.340e-01 -1.321 0.18658
## PlatformX360 -1.544e-01 1.555e-01 -0.993 0.32068
## PlatformXB -3.696e-01 1.628e-01 -2.270 0.02323 *
## PlatformXOne -1.655e-01 2.004e-01 -0.826 0.40872
## GenreAdventure -2.378e-01 1.179e-01 -2.017 0.04370 *
## GenreFighting -3.015e-02 1.030e-01 -0.293 0.76981
## GenreMisc 2.648e-01 1.025e-01 2.583 0.00980 **
## GenrePlatform 1.309e-02 1.039e-01 0.126 0.89975
## GenrePuzzle -4.225e-01 1.737e-01 -2.433 0.01500 *
## GenreRacing 5.406e-02 9.150e-02 0.591 0.55469
## GenreRole-Playing -2.635e-01 8.085e-02 -3.260 0.00112 **
## GenreShooter -8.903e-03 7.592e-02 -0.117 0.90665
## GenreSimulation 1.445e-01 1.142e-01 1.265 0.20575
## GenreSports -1.868e-02 8.676e-02 -0.215 0.82954
## GenreStrategy -2.766e-01 1.193e-01 -2.318 0.02049 *
## Critic_Score 2.320e-02 2.207e-03 10.510 < 2e-16 ***
## Critic_Count 2.156e-02 1.466e-03 14.705 < 2e-16 ***
## User_Score0.6 -3.128e-01 2.155e+00 -0.145 0.88464
## User_Score0.7 -1.040e+00 2.151e+00 -0.483 0.62877
## User_Score0.9 2.648e-01 2.152e+00 0.123 0.90209
## User_Score1 -2.908e-01 1.756e+00 -0.166 0.86852
## User_Score1.2 -5.986e-01 1.757e+00 -0.341 0.73342
## User_Score1.3 1.970e-01 2.153e+00 0.092 0.92709
## User_Score1.4 -5.001e-01 1.605e+00 -0.312 0.75530
## User_Score1.5 -1.889e-01 1.756e+00 -0.108 0.91433
## User_Score1.7 -1.070e-01 1.388e+00 -0.077 0.93855
## User_Score1.8 -2.657e-01 1.521e+00 -0.175 0.86133
## User_Score1.9 7.800e-01 1.757e+00 0.444 0.65707
## User_Score2 1.572e-01 1.389e+00 0.113 0.90985
## User_Score2.1 -1.010e+00 1.374e+00 -0.735 0.46242
## User_Score2.2 -1.275e+00 1.436e+00 -0.888 0.37445
## User_Score2.3 2.121e-02 1.757e+00 0.012 0.99037
## User_Score2.4 -1.980e-01 1.361e+00 -0.146 0.88430
## User_Score2.5 -4.616e-01 1.374e+00 -0.336 0.73687
## User_Score2.6 3.652e+00 1.522e+00 2.399 0.01648 *
## User_Score2.7 -2.833e-01 1.409e+00 -0.201 0.84061
## User_Score2.8 -2.879e-01 1.310e+00 -0.220 0.82601
## User_Score2.9 -1.226e-01 1.409e+00 -0.087 0.93067
## User_Score3 -2.428e-01 1.329e+00 -0.183 0.85498
## User_Score3.1 -1.978e-02 1.306e+00 -0.015 0.98792
## User_Score3.2 8.009e-01 1.389e+00 0.577 0.56421
## User_Score3.3 -2.168e-03 1.329e+00 -0.002 0.99870
## User_Score3.4 1.876e-01 1.329e+00 0.141 0.88778
## User_Score3.5 -4.297e-01 1.297e+00 -0.331 0.74051
## User_Score3.6 -1.333e-02 1.303e+00 -0.010 0.99183
## User_Score3.7 -3.033e-01 1.310e+00 -0.232 0.81686
## User_Score3.8 -2.406e-01 1.297e+00 -0.186 0.85280
## User_Score3.9 -2.227e-01 1.341e+00 -0.166 0.86810
## User_Score4 -3.964e-01 1.293e+00 -0.307 0.75918
## User_Score4.1 -1.274e-01 1.283e+00 -0.099 0.92089
## User_Score4.2 -6.936e-01 1.299e+00 -0.534 0.59335
## User_Score4.3 -7.998e-02 1.288e+00 -0.062 0.95048
## User_Score4.4 -4.466e-01 1.282e+00 -0.348 0.72768
## User_Score4.5 -8.509e-01 1.285e+00 -0.662 0.50796
## User_Score4.6 -2.440e-01 1.282e+00 -0.190 0.84908
## User_Score4.7 -3.423e-01 1.295e+00 -0.264 0.79159
## User_Score4.8 -1.460e-01 1.270e+00 -0.115 0.90851
## User_Score4.9 -3.931e-01 1.275e+00 -0.308 0.75795
## User_Score5 -2.563e-01 1.265e+00 -0.203 0.83948
## User_Score5.1 -5.652e-01 1.277e+00 -0.443 0.65809
## User_Score5.2 -5.047e-01 1.268e+00 -0.398 0.69054
## User_Score5.3 -4.172e-01 1.263e+00 -0.330 0.74113
## User_Score5.4 -4.327e-01 1.263e+00 -0.343 0.73188
## User_Score5.5 -4.527e-01 1.263e+00 -0.358 0.71998
## User_Score5.6 -4.046e-01 1.261e+00 -0.321 0.74841
## User_Score5.7 -3.694e-01 1.261e+00 -0.293 0.76959
## User_Score5.8 -4.754e-01 1.257e+00 -0.378 0.70527
## User_Score5.9 -5.232e-01 1.260e+00 -0.415 0.67805
## User_Score6 -5.941e-01 1.254e+00 -0.474 0.63571
## User_Score6.1 -5.567e-01 1.259e+00 -0.442 0.65843
## User_Score6.2 -6.702e-01 1.255e+00 -0.534 0.59348
## User_Score6.3 -1.925e-01 1.253e+00 -0.154 0.87793
## User_Score6.4 -4.478e-01 1.256e+00 -0.356 0.72148
## User_Score6.5 -4.844e-01 1.254e+00 -0.386 0.69935
## User_Score6.6 -2.928e-01 1.253e+00 -0.234 0.81524
## User_Score6.7 -4.997e-01 1.255e+00 -0.398 0.69045
## User_Score6.8 -6.781e-01 1.251e+00 -0.542 0.58771
## User_Score6.9 -6.476e-01 1.253e+00 -0.517 0.60526
## User_Score7 -7.265e-01 1.250e+00 -0.581 0.56117
## User_Score7.1 -5.364e-01 1.252e+00 -0.429 0.66823
## User_Score7.2 -6.429e-01 1.252e+00 -0.513 0.60769
## User_Score7.3 -6.086e-01 1.250e+00 -0.487 0.62629
## User_Score7.4 -5.271e-01 1.250e+00 -0.422 0.67336
## User_Score7.5 -5.930e-01 1.249e+00 -0.475 0.63509
## User_Score7.6 -6.017e-01 1.251e+00 -0.481 0.63045
## User_Score7.7 -6.135e-01 1.250e+00 -0.491 0.62359
## User_Score7.8 -6.442e-01 1.249e+00 -0.516 0.60595
## User_Score7.9 -4.772e-01 1.250e+00 -0.382 0.70264
## User_Score8 -2.664e-01 1.249e+00 -0.213 0.83116
## User_Score8.1 -7.796e-01 1.250e+00 -0.624 0.53284
## User_Score8.2 -7.409e-01 1.250e+00 -0.593 0.55324
## User_Score8.3 -6.534e-01 1.250e+00 -0.523 0.60126
## User_Score8.4 -6.174e-01 1.251e+00 -0.493 0.62170
## User_Score8.5 -6.258e-01 1.250e+00 -0.500 0.61679
## User_Score8.6 -5.690e-01 1.251e+00 -0.455 0.64934
## User_Score8.7 -5.448e-01 1.252e+00 -0.435 0.66355
## User_Score8.8 -8.526e-01 1.252e+00 -0.681 0.49602
## User_Score8.9 -6.483e-01 1.254e+00 -0.517 0.60532
## User_Score9 -4.506e-01 1.257e+00 -0.359 0.71994
## User_Score9.1 -6.733e-01 1.260e+00 -0.534 0.59311
## User_Score9.2 -5.577e-01 1.276e+00 -0.437 0.66199
## User_Score9.3 -1.087e+00 1.286e+00 -0.845 0.39838
## User_Score9.4 -4.807e-01 1.361e+00 -0.353 0.72395
## User_Score9.5 -1.076e+00 1.476e+00 -0.729 0.46615
## User_Score9.6 -1.416e+00 1.763e+00 -0.803 0.42190
## User_Count 7.179e-04 4.352e-05 16.497 < 2e-16 ***
## RatingAO 1.958e-01 1.773e+00 0.110 0.91205
## RatingE 7.574e-02 2.216e-01 0.342 0.73254
## RatingE10+ -2.638e-01 2.236e-01 -1.180 0.23803
## RatingK-A -4.113e-01 1.779e+00 -0.231 0.81717
## RatingM -2.903e-01 2.217e-01 -1.309 0.19048
## RatingRP 8.913e-01 1.276e+00 0.698 0.48504
## RatingT -2.800e-01 2.185e-01 -1.281 0.20008
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.753 on 6891 degrees of freedom
## (9702 observations deleted due to missingness)
## Multiple R-squared: 0.1984, Adjusted R-squared: 0.1839
## F-statistic: 13.65 on 125 and 6891 DF, p-value: < 2.2e-16
-Converting factor variables into numeric vectors (as.numeric() on a factor returns its integer level codes)
str(vgame1.df)
## 'data.frame': 16719 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 11059 9406 5573 11061 7417 9771 6693 11057 6696 2620 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
## $ Genre : Factor w/ 13 levels "","Action","Adventure",..: 12 6 8 12 9 7 6 5 6 10 ...
## $ Publisher : Factor w/ 582 levels "10TACLE Studios",..: 371 371 371 371 371 371 371 371 371 371 ...
## $ NA_Sales : num 41.4 29.1 15.7 15.6 11.3 ...
## $ EU_Sales : num 28.96 3.58 12.76 10.93 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.45 0.77 3.29 2.95 1 0.58 2.88 2.84 2.24 0.47 ...
## $ Global_Sales : num 82.5 40.2 35.5 32.8 31.4 ...
## $ Critic_Score : int 76 NA 82 80 NA NA 89 58 87 NA ...
## $ Critic_Count : int 51 NA 73 73 NA NA 65 41 80 NA ...
## $ User_Score : Factor w/ 97 levels "","0","0.2","0.3",..: 79 1 82 79 1 1 84 65 83 1 ...
## $ User_Count : int 322 NA 709 192 NA NA 431 129 594 NA ...
## $ Developer : Factor w/ 1697 levels "","10tacle Studios",..: 1035 1 1035 1035 1 1 1035 1035 1035 1 ...
## $ Rating : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 3 3 1 1 3 3 3 1 ...
vgame1.df$Platform<-as.numeric(vgame1.df$Platform)
vgame1.df$Genre<-as.numeric(vgame1.df$Genre)
vgame1.df$User_Score<-as.numeric(vgame1.df$User_Score)
vgame1.df$Rating<-as.numeric(vgame1.df$Rating)
vgame1.df$Publisher<-as.numeric(vgame1.df$Publisher)
vgame1.df$Developer<-as.numeric(vgame1.df$Developer)
str(vgame1.df)
## 'data.frame': 16719 obs. of 16 variables:
## $ Name : Factor w/ 11563 levels "","'98 Koshien",..: 11059 9406 5573 11061 7417 9771 6693 11057 6696 2620 ...
## $ Platform : num 26 12 26 26 6 6 5 26 26 12 ...
## $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
## $ Genre : num 12 6 8 12 9 7 6 5 6 10 ...
## $ Publisher : num 371 371 371 371 371 371 371 371 371 371 ...
## $ NA_Sales : num 41.4 29.1 15.7 15.6 11.3 ...
## $ EU_Sales : num 28.96 3.58 12.76 10.93 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.45 0.77 3.29 2.95 1 0.58 2.88 2.84 2.24 0.47 ...
## $ Global_Sales : num 82.5 40.2 35.5 32.8 31.4 ...
## $ Critic_Score : int 76 NA 82 80 NA NA 89 58 87 NA ...
## $ Critic_Count : int 51 NA 73 73 NA NA 65 41 80 NA ...
## $ User_Score : num 79 1 82 79 1 1 84 65 83 1 ...
## $ User_Count : int 322 NA 709 192 NA NA 431 129 594 NA ...
## $ Developer : num 1035 1 1035 1035 1 ...
## $ Rating : num 3 1 3 3 1 1 3 3 3 1 ...
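-Note on the conversion: because as.numeric() on a factor returns level codes, User_Score now holds values such as 79 and 82 rather than scores on the original scale. A minimal sketch of a conversion that recovers the printed values instead, using a small made-up factor (the vector `scores` is hypothetical and not taken from the dataset):

```r
# Hypothetical factor mimicking User_Score; "" stands for a missing score
scores <- factor(c("8", "", "8.3", "", "6.6"))

# as.numeric() on a factor returns the internal level codes, not the scores
codes <- as.numeric(scores)

# Converting through as.character() recovers the printed values;
# the empty string becomes NA
values <- as.numeric(as.character(scores))

print(codes)
print(values)
```

With this approach the missing scores become NA instead of being coded as level 1, so lm() would drop those rows rather than treat them as a low score.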
Model-2 (Revised Model-1)
fit1<-lm(Global_Sales~Platform+Genre+Critic_Score+Critic_Count+
User_Score+User_Count+Rating, data=vgame1.df)
summary(fit1)
##
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Critic_Score +
## Critic_Count + User_Score + User_Count + Rating, data = vgame1.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.157 -0.568 -0.206 0.188 80.964
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.855e-01 1.486e-01 -3.266 0.0011 **
## Platform 2.596e-03 2.789e-03 0.931 0.3519
## Genre -1.274e-02 5.813e-03 -2.192 0.0284 *
## Critic_Score 1.829e-02 2.131e-03 8.581 <2e-16 ***
## Critic_Count 2.023e-02 1.337e-03 15.131 <2e-16 ***
## User_Score -2.961e-03 1.886e-03 -1.570 0.1164
## User_Count 5.531e-04 4.092e-05 13.519 <2e-16 ***
## Rating -7.631e-02 8.633e-03 -8.839 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.805 on 7009 degrees of freedom
## (9702 observations deleted due to missingness)
## Multiple R-squared: 0.1352, Adjusted R-squared: 0.1343
## F-statistic: 156.5 on 7 and 7009 DF, p-value: < 2.2e-16
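-Note on Model-2: Platform, Genre and Rating enter as numeric level codes, so each gets a single slope across levels that are ordered only alphabetically. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical) of how that differs from keeping the variable as a factor:

```r
# Hypothetical toy data: three genres whose mean sales do not follow alphabetical order
toy <- data.frame(
  genre = factor(rep(c("Action", "Puzzle", "Shooter"), each = 4)),
  sales = c(5, 6, 5, 6,  1, 2, 1, 2,  4, 5, 4, 5)
)

# Numeric coding: a single slope across the codes 1, 2, 3
fit_code <- lm(sales ~ as.numeric(genre), data = toy)

# Factor coding: one dummy coefficient per non-reference level
fit_factor <- lm(sales ~ genre, data = toy)

print(summary(fit_code)$r.squared)
print(summary(fit_factor)$r.squared)
```

The factor version can fit a separate mean for each level, which is one plausible reason a model with factor predictors reports a higher R-squared than the same model with level codes.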
Model-3
fit2<-lm(Global_Sales~Platform+Genre+Publisher+Developer+Critic_Score+Critic_Count+
User_Score+User_Count+Rating, data=vgame1.df)
summary(fit2)
##
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Publisher + Developer +
## Critic_Score + Critic_Count + User_Score + User_Count + Rating,
## data = vgame1.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.113 -0.565 -0.203 0.192 80.939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.692e-01 1.550e-01 -3.672 0.000242 ***
## Platform 2.389e-03 2.789e-03 0.857 0.391540
## Genre -1.160e-02 5.834e-03 -1.988 0.046843 *
## Publisher -1.371e-04 1.211e-04 -1.133 0.257389
## Developer 1.540e-04 4.580e-05 3.363 0.000776 ***
## Critic_Score 1.800e-02 2.135e-03 8.427 < 2e-16 ***
## Critic_Count 2.020e-02 1.338e-03 15.096 < 2e-16 ***
## User_Score -2.762e-03 1.894e-03 -1.458 0.144913
## User_Count 5.580e-04 4.092e-05 13.637 < 2e-16 ***
## Rating -7.698e-02 8.641e-03 -8.909 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.804 on 7007 degrees of freedom
## (9702 observations deleted due to missingness)
## Multiple R-squared: 0.1366, Adjusted R-squared: 0.1355
## F-statistic: 123.2 on 9 and 7007 DF, p-value: < 2.2e-16
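- From the above models, Model 1 is the best fit (multiple R-squared 0.1984, adjusted 0.1839), followed by Model 3 (0.1366, adjusted 0.1355) and then Model 2 (0.1352, adjusted 0.1343), judged by the higher multiple R-squared values.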
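- Multiple R-squared never decreases when predictors are added, so a comparison can also look at adjusted R-squared, AIC and, for nested models such as Model 2 inside Model 3, a partial F-test. A minimal sketch on simulated stand-in data (the `toy` data frame and the models `small` and `large` are hypothetical, not the models fitted above):

```r
# Simulated stand-in data (hypothetical; not the video game dataset)
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + 0.5 * toy$x2 + rnorm(n)

# A smaller model and a larger model that nests it
small <- lm(y ~ x1, data = toy)
large <- lm(y ~ x1 + x2 + x3, data = toy)

# Collect fit statistics side by side
comparison <- data.frame(
  model  = c("small", "large"),
  r2     = c(summary(small)$r.squared, summary(large)$r.squared),
  adj_r2 = c(summary(small)$adj.r.squared, summary(large)$adj.r.squared),
  aic    = c(AIC(small), AIC(large))
)
print(comparison)

# Partial F-test: do the extra predictors improve the fit beyond chance?
print(anova(small, large))
```

The F-test is only meaningful when both models are fitted to the same rows, which holds for Model 2 and Model 3 above since both report 9702 observations deleted due to missingness.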