Game Sales Prediction: ML in R
Objectives
A tremendous number of games are developed every year, bringing relaxation or excitement to people around the world. While game companies have made huge profits from successful games, the real money makers are like drops in the ocean. It is crucial for companies to keep creating innovative and successful games to maintain growth. This project explores which key factors determine the sales of a game and predicts what percentage of newly developed games could be successful.
The next section briefly describes the dataset and the data wrangling process. The analysis and machine learning sections follow, with results and excerpts of some highlighted code. Each section ends with a summary of its important insights.
About The Dataset
Descriptions of variables
The dataset records video game sales in 2016. There are 16720 rows and 16 columns in the dataset. Each column describes one aspect of a video game.
Name: Name of the game.
Platform: Name of the game machine. For example, PS2, Xbox, and so on.
Year_of_Release: The year in which the game was released.
Genre: Type of the game. For example, shooting, action, and so on.
Publisher: The company that distributes the video game.
NA_Sales: Video game sales in North America.
EU_Sales: Video game sales in Europe.
JP_Sales: Video game sales in Japan.
Other_Sales: Video game sales in the rest of the world.
Global_Sales: Video game sales across all markets.
Critic_Score: Number describing the critic score.
Critic_Count: Number describing how many critics contributed to the critic score.
User_Score: Number describing the users' satisfaction score.
User_Count: Number describing how many players gave a score.
Developer: The company that designed the game.
Rating: The age rating of the game, such as E, E10+, M, or T.
Data cleaning
1. Removing rows with missing values: Missing values include empty strings ('') and N/A entries.
2. Transforming non-numeric values: Although the User_Count and User_Score columns hold numbers, they are treated as strings when loaded into R and must be converted to numeric values:
# Treat empty strings and "N/A" as missing when loading the data
Videogame <- read.csv("VideoGames.csv", header = TRUE, na.strings = c("", "N/A"))
# Drop every row that has at least one missing value
Videogame <- na.omit(Videogame)
# Go through as.character() so factor levels are not turned into level indices
Videogame$User_Count <- as.numeric(as.character(Videogame$User_Count))
Videogame$User_Score <- as.numeric(as.character(Videogame$User_Score))
3. Refining categorical variables: The Publisher column contains many distinct values, reflecting the many small publishers involved in game development. These publishers have very little influence on the results while hurting the performance of the analysis. As a result, only the top 6 publishers with the highest sales are used in the visualization analysis:
# Flag rows whose publisher is one of the six top-selling publishers
top_publishers <- c("Nintendo", "Activision", "Sony Computer Entertainment", "Electronic Arts", "Take-Two Interactive", "Ubisoft")
Videogame$Publisher2 <- as.numeric(Videogame$Publisher %in% top_publishers)
# Keep only the games released by those publishers
Videogame_headpub <- Videogame[Videogame$Publisher2 == 1, ]
4. Transforming time variables: The impact of Year_of_Release is analyzed through a binary comparison, whether a game was released before 2010 (0) or in 2010 or later (1), to evaluate the relevance of the game:
# Compare years numerically rather than as strings
release_year <- as.numeric(as.character(Videogame_headpub$Year_of_Release))
# 0 = released before 2010, 1 = released in 2010 or later
Videogame_headpub$year2 <- ifelse(release_year >= 2010, 1, 0)
5. Removing variables based on business rules: Most games are assumed to be better known for their publishers than for their development studios, and studios are closely tied to the publishers that may invest in their growth. The Developer column is therefore removed.
6. Removing dependent variables: Global_Sales is the sum of NA_Sales, EU_Sales, JP_Sales and Other_Sales. These four columns are removed to eliminate this dependency from the analysis:
# Keep Platform, Genre, Publisher, Global_Sales, Critic_Score, User_Score, Rating and year2
Videogame_ms <- Videogame_headpub[, c(2, 4, 5, 10, 11, 13, 16, 18)]
As a result, the new dataset with 8 columns is used in the analysis.
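The cleaning steps above can be tried on a small made-up table before touching the real file. The sketch below is illustrative only: the inline CSV, its values, and the name `toy` are assumptions, not part of the original dataset. It shows how `na.strings` turns placeholders into NA and why a factor is converted through `as.character()` first.

```r
# Hypothetical miniature of the raw file; values are invented for illustration
csv_text <- "Name,User_Score,User_Count
GameA,8.1,120
GameB,N/A,15
GameC,,40
GameD,6.5,300"

# Empty strings and "N/A" both become NA on load
toy <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("", "N/A"), stringsAsFactors = TRUE)

# Drop rows with any missing value (GameB and GameC)
toy <- na.omit(toy)

# A factor stores integer level codes; as.numeric() alone would return those
score_factor <- factor(c("8.1", "6.5", "8.1"))
codes  <- as.numeric(score_factor)                 # level indices, not scores
scores <- as.numeric(as.character(score_factor))   # the actual numbers

print(toy)
print(codes)
print(scores)
```

Skipping `as.character()` fails silently: the result is still numeric, just wrong, so it is worth checking a few converted values against the raw file.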
Descriptive Analysis
Sales by genre
Top 6 publishers by sales
Top-selling games
Regions
Years
Scores
##
## Call:
## lm(formula = Global_Sales ~ User_Score, data = Videogame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.046 -0.663 -0.419 -0.008 81.654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.08805 0.12050 -0.731 0.465
## User_Score 0.12047 0.01644 7.326 2.64e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.956 on 6823 degrees of freedom
## Multiple R-squared: 0.007805, Adjusted R-squared: 0.00766
## F-statistic: 53.68 on 1 and 6823 DF, p-value: 2.637e-13
##
## Call:
## lm(formula = Global_Sales ~ Critic_Score, data = Videogame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.536 -0.680 -0.324 0.162 81.560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.585797 0.119252 -13.3 <2e-16 ***
## Critic_Score 0.033632 0.001665 20.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.907 on 6823 degrees of freedom
## Multiple R-squared: 0.05643, Adjusted R-squared: 0.05629
## F-statistic: 408.1 on 1 and 6823 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Global_Sales ~ User_Score + Critic_Score, data = Videogame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.865 -0.677 -0.313 0.175 81.608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.28537 0.13241 -9.708 < 2e-16 ***
## User_Score -0.10179 0.01965 -5.179 2.29e-07 ***
## Critic_Score 0.03977 0.00204 19.488 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.904 on 6822 degrees of freedom
## Multiple R-squared: 0.06013, Adjusted R-squared: 0.05985
## F-statistic: 218.2 on 2 and 6822 DF, p-value: < 2.2e-16
Ratings
##
## Call:
## lm(formula = Global_Sales ~ Rating, data = Videogame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.985 -0.651 -0.419 0.001 81.589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.950 1.955 0.998 0.319
## RatingE -1.009 1.955 -0.516 0.606
## RatingE10+ -1.369 1.956 -0.700 0.484
## RatingK-A -0.030 2.764 -0.011 0.991
## RatingM -0.955 1.955 -0.488 0.625
## RatingRP -1.920 2.764 -0.695 0.487
## RatingT -1.371 1.955 -0.701 0.483
##
## Residual standard error: 1.955 on 6818 degrees of freedom
## Multiple R-squared: 0.009725, Adjusted R-squared: 0.008854
## F-statistic: 11.16 on 6 and 6818 DF, p-value: 1.984e-12
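When a regression uses a categorical predictor such as Rating, R takes the first factor level as the baseline: the intercept is the mean of that baseline group, and every other coefficient is the difference between a group's mean and the baseline's. A small sketch with invented numbers (the data frame `toy` and its values are assumptions for illustration):

```r
# Invented sales figures for three rating groups
toy <- data.frame(
  Rating = factor(c("E", "E", "M", "M", "T", "T")),
  Global_Sales = c(1.0, 2.0, 3.0, 5.0, 0.5, 1.5)
)

fit <- lm(Global_Sales ~ Rating, data = toy)
group_means <- tapply(toy$Global_Sales, toy$Rating, mean)

# Intercept = mean of the baseline level ("E", first alphabetically)
baseline_mean <- unname(coef(fit)["(Intercept)"])
# RatingM = mean(M) - mean(E)
m_vs_e <- unname(coef(fit)["RatingM"])

print(coef(fit))
print(group_means)
```

In the output above, the intercept's standard error (1.955) equals the residual standard error, which is what one would expect if the baseline rating contained a single game. Every other rating is then compared against that one game, which may explain why no individual coefficient is significant even though the overall F-test is. Choosing a well-populated baseline with `relevel()` is one way to make the coefficients easier to read.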
Conclusion
North America was the biggest video game market in terms of sales in 2016. This suggests that video game developers should pay close attention to the preferences of players in North America.
The top 6 publishers are: Electronic Arts, Nintendo, Activision, Sony Computer Entertainment, Take-Two Interactive, and Ubisoft.
The top 10 video games in terms of global sales are: Wii Sports Resort, Wii Sports, Wii Play, Wii Fit, Wii Fit Plus, New Super Mario Bros. Wii, New Super Mario Bros, Mario Kart Wii, Mario Kart DS, and Kinect Adventures.
The most popular genre is Action.
Games released in 2006 have the highest sales as of 2016 compared with other release years, which makes 2006 an important year for the video game industry.
When both scores are included in the regression, global sales have a positive relationship with the critic score and a negative relationship with the user score; on its own, the user score has a small positive coefficient. Both scores together explain only a small share of the variance in sales (R-squared of about 0.06). This suggests that scores are a useful indicator in the beta-test stage, but that critics' opinions might be more valuable and precise.
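The user score coefficient is positive in the single-predictor model and negative once the critic score is added. This pattern is typical when two predictors are correlated. The following sketch reproduces it on simulated data; the coefficients and distributions are assumptions chosen for illustration, not estimates from the game dataset.

```r
set.seed(1)
n <- 5000

# Simulated scores: user scores track critic scores (assumed relationship)
critic <- rnorm(n, mean = 70, sd = 14)
user   <- 0.07 * critic + rnorm(n)

# Simulated sales: critic score helps, user score slightly hurts once critic is fixed
sales <- 0.04 * critic - 0.10 * user + rnorm(n)

slope_alone <- unname(coef(lm(sales ~ user))["user"])
slope_joint <- unname(coef(lm(sales ~ user + critic))["user"])

print(slope_alone)  # positive: user score acts as a proxy for critic score
print(slope_joint)  # negative: effect of user score with critic score held fixed
```

The single-predictor slope mixes the user score's own effect with the effect of the critic score it is correlated with, so neither sign should be read as causal on its own.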
Feature Selection
This section explores the first question posed in the objectives: which key factors determine the sales of a game? All columns are evaluated to find the one or several ultimate determinants of game sales. The conclusion gives publishers and developers actionable strategies to achieve higher sales when developing new games.
Best subset selection, backward selection & forward selection
These methods were abandoned because they ran slowly and produced unreasonable results.
Example code:
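library(leaps)
# Forward stepwise selection over all predictors
Videogame_fwd <- regsubsets(Global_Sales ~ ., data = Videogame_ms, method = "forward")
smyVdgfwd <- summary(Videogame_fwd)
# Model size with the lowest Mallows' Cp
best_size <- which.min(smyVdgfwd$cp)
# Coefficients of the model of that size
coef(Videogame_fwd, best_size)
Linear regression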
Regression on all variables:
##
## Call:
## lm(formula = Global_Sales ~ Genre + Publisher + Critic_Score +
## User_Score + Rating + year2 + Platform, data = Videogame_ms)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.963 -0.889 -0.313 0.398 77.916
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127804 2.668861 -0.797 0.42536
## GenreAdventure -0.684862 0.332604 -2.059 0.03958
## GenreFighting -0.026767 0.359389 -0.074 0.94064
## GenreMisc 0.070000 0.225216 0.311 0.75597
## GenrePlatform 0.085755 0.242247 0.354 0.72337
## GenrePuzzle -1.497947 0.466733 -3.209 0.00135
## GenreRacing 0.396048 0.213168 1.858 0.06329
## GenreRole-Playing -0.556110 0.234352 -2.373 0.01771
## GenreShooter -0.069156 0.173910 -0.398 0.69091
## GenreSimulation 0.014346 0.261068 0.055 0.95618
## GenreSports -0.256675 0.181140 -1.417 0.15660
## GenreStrategy -0.669534 0.330550 -2.026 0.04291
## PublisherElectronic Arts -0.391123 0.156854 -2.494 0.01270
## PublisherNintendo 2.179888 0.238616 9.136 < 2e-16
## PublisherSony Computer Entertainment -0.298602 0.209249 -1.427 0.15369
## PublisherTake-Two Interactive -0.265373 0.203895 -1.302 0.19319
## PublisherUbisoft -0.322839 0.169069 -1.910 0.05630
## Critic_Score 0.064081 0.004925 13.011 < 2e-16
## User_Score -0.179535 0.046344 -3.874 0.00011
## RatingE -0.423682 2.590396 -0.164 0.87009
## RatingE10+ -0.984208 2.590864 -0.380 0.70407
## RatingK-A -1.543205 3.671321 -0.420 0.67427
## RatingM 0.041850 2.587557 0.016 0.98710
## RatingT -0.545363 2.588214 -0.211 0.83313
## year2 -0.215716 0.418383 -0.516 0.60618
## PlatformDC -0.435304 2.639764 -0.165 0.86903
## PlatformDS 0.863373 0.384300 2.247 0.02474
## PlatformGBA -0.444206 0.421528 -1.054 0.29207
## PlatformGC -0.247785 0.392205 -0.632 0.52759
## PlatformPC -0.167553 0.396707 -0.422 0.67280
## PlatformPS 1.252679 0.561305 2.232 0.02571
## PlatformPS2 1.108216 0.384191 2.885 0.00395
## PlatformPS3 0.994446 0.384604 2.586 0.00977
## PlatformPS4 1.214403 0.444027 2.735 0.00628
## PlatformPSP 0.672712 0.414869 1.622 0.10502
## PlatformPSV 0.421279 0.569899 0.739 0.45984
## PlatformWii 2.023709 0.379847 5.328 1.07e-07
## PlatformWiiU -0.642974 0.471218 -1.364 0.17252
## PlatformX360 1.017128 0.383329 2.653 0.00801
## PlatformXB 0.143322 0.400765 0.358 0.72066
## PlatformXOne 0.508469 0.467181 1.088 0.27652
##
## (Intercept)
## GenreAdventure *
## GenreFighting
## GenreMisc
## GenrePlatform
## GenrePuzzle **
## GenreRacing .
## GenreRole-Playing *
## GenreShooter
## GenreSimulation
## GenreSports
## GenreStrategy *
## PublisherElectronic Arts *
## PublisherNintendo ***
## PublisherSony Computer Entertainment
## PublisherTake-Two Interactive
## PublisherUbisoft .
## Critic_Score ***
## User_Score ***
## RatingE
## RatingE10+
## RatingK-A
## RatingM
## RatingT
## year2
## PlatformDC
## PlatformDS *
## PlatformGBA
## PlatformGC
## PlatformPC
## PlatformPS *
## PlatformPS2 **
## PlatformPS3 **
## PlatformPS4 **
## PlatformPSP
## PlatformPSV
## PlatformWii ***
## PlatformWiiU
## PlatformX360 **
## PlatformXB
## PlatformXOne
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.573 on 2771 degrees of freedom
## Multiple R-squared: 0.1732, Adjusted R-squared: 0.1613
## F-statistic: 14.51 on 40 and 2771 DF, p-value: < 2.2e-16
Regression on selected significant variables:
##
## Call:
## lm(formula = Global_Sales ~ User_Score + Critic_Score, data = Videogame_ms)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.447 -0.978 -0.504 0.206 81.275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.284828 0.327126 -6.985 3.55e-12 ***
## User_Score -0.099100 0.043507 -2.278 0.0228 *
## Critic_Score 0.057013 0.004665 12.222 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.728 on 2809 degrees of freedom
## Multiple R-squared: 0.05768, Adjusted R-squared: 0.05701
## F-statistic: 85.97 on 2 and 2809 DF, p-value: < 2.2e-16
Regression on further selected significant variables based on the above results:
##
## Call:
## lm(formula = Global_Sales ~ User_Score, data = Videogame_ms)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.525 -0.977 -0.636 -0.027 81.207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.09040 0.28058 -0.322 0.747
## User_Score 0.17666 0.03817 4.628 3.85e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.8 on 2810 degrees of freedom
## Multiple R-squared: 0.007566, Adjusted R-squared: 0.007213
## F-statistic: 21.42 on 1 and 2810 DF, p-value: 3.853e-06
One more regression on further selected significant variables based on the above results:
##
## Call:
## lm(formula = Global_Sales ~ Critic_Score, data = Videogame_ms)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.209 -0.985 -0.505 0.192 81.212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.595806 0.297496 -8.726 <2e-16 ***
## Critic_Score 0.051503 0.003991 12.904 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.73 on 2810 degrees of freedom
## Multiple R-squared: 0.05594, Adjusted R-squared: 0.0556
## F-statistic: 166.5 on 1 and 2810 DF, p-value: < 2.2e-16
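Dropping predictors one model at a time, as above, can be complemented by a formal comparison of nested models. The sketch below uses simulated data (all names and values are assumptions) and the partial F-test from `anova()` to ask whether a fuller model explains significantly more than a reduced one.

```r
set.seed(42)
n <- 500

# Simulated predictors and response; both scores truly matter here
sim <- data.frame(
  critic = rnorm(n, 70, 14),
  user   = rnorm(n, 7, 1.5)
)
sim$sales <- 0.05 * sim$critic - 0.30 * sim$user + rnorm(n)

reduced <- lm(sales ~ critic, data = sim)
full    <- lm(sales ~ critic + user, data = sim)

# Partial F-test: does adding user improve on the critic-only model?
cmp <- anova(reduced, full)
p_value <- cmp[["Pr(>F)"]][2]

print(cmp)
print(summary(full)$adj.r.squared > summary(reduced)$adj.r.squared)
```

In the real outputs above, the critic-only model has an adjusted R-squared of 0.0556 against 0.05701 for the two-score model and 0.1613 for the model with all variables, so most of the explained variance comes from the categorical predictors rather than from the scores.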
LASSO
Find the best lambda value:
## [1] 0.01769035
Pre-test: compute the mean squared difference between predicted and actual sales:
## [1] 8.851365
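The figure above is a mean squared error on held-out data. As a rough sketch of that computation, with an ordinary linear model standing in for the LASSO fit and simulated data in place of the real set (both are assumptions):

```r
set.seed(7)
n <- 400

# Simulated stand-in for the game data
sim <- data.frame(critic = rnorm(n, 70, 14))
sim$sales <- 0.05 * sim$critic + rnorm(n)

# Hold out half of the rows for testing
train_idx <- sample(seq_len(n), size = 200)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

fit  <- lm(sales ~ critic, data = train)
pred <- predict(fit, newdata = test)

# Mean of the squared differences between predicted and actual sales
test_mse <- mean((pred - test$sales)^2)
print(test_mse)
```

A mean squared error of about 8.85 corresponds to a root mean squared error of roughly 2.98, expressed in the same units as Global_Sales.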
Calculate the LASSO coefficients:
## 632 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) -2.149687474
## (Intercept) .
## Platform3DO .
## Platform3DS -0.604853323
## PlatformDS 0.008024551
## PlatformGB .
## PlatformGBA -1.107696019
## PlatformGC -0.972496850
## PlatformGEN .
## PlatformGG .
## PlatformN64 .
## PlatformNES .
## PlatformNG .
## PlatformPC -0.897320560
## PlatformPCFX .
## PlatformPS 0.383630832
## PlatformPS2 0.230696274
## PlatformPS3 0.145633558
## PlatformPS4 0.335223286
## PlatformPSP -0.108948124
## PlatformPSV -0.259836303
## PlatformSAT .
## PlatformSCD .
## PlatformSNES .
## PlatformTG16 .
## PlatformWii 1.163873186
## PlatformWiiU -1.226842028
## PlatformWS .
## PlatformX360 0.157701865
## PlatformXB -0.622844411
## PlatformXOne -0.168403907
## GenreAdventure -0.531941432
## GenreFighting .
## GenreMisc 0.061187938
## GenrePlatform 0.061524263
## GenrePuzzle -1.201832979
## GenreRacing 0.372708741
## GenreRole-Playing -0.436093593
## GenreShooter .
## GenreSimulation .
## GenreSports -0.155643195
## GenreStrategy -0.549035712
## Publisher1C Company .
## Publisher20th Century Fox Video Games .
## Publisher2D Boy .
## Publisher3DO .
## Publisher49Games .
## Publisher505 Games .
## Publisher5pb .
## Publisher7G//AMES .
## Publisher989 Sports .
## Publisher989 Studios .
## PublisherAbylight .
## PublisherAcclaim Entertainment .
## PublisherAccolade .
## PublisherAckkstudios .
## PublisherAcquire .
## PublisherActivision 0.232229446
## PublisherActivision Blizzard .
## PublisherActivision Value .
## PublisherAdeline Software .
## PublisherAerosoft .
## PublisherAgatsuma Entertainment .
## PublisherAgetec .
## PublisherAksys Games .
## PublisherAlawar Entertainment .
## PublisherAlchemist .
## PublisherAlternative Software .
## PublisherAltron .
## PublisherAlvion .
## PublisherAmerican Softworks .
## PublisherAngel Studios .
## PublisherAnswer Software .
## PublisherAQ Interactive .
## PublisherAqua Plus .
## PublisherAques .
## PublisherArc System Works .
## PublisherArena Entertainment .
## PublisherAria .
## PublisherArika .
## PublisherArtDink .
## PublisherAruze Corp .
## PublisherASC Games .
## PublisherAscaron Entertainment .
## PublisherAscaron Entertainment GmbH .
## PublisherASCII Entertainment .
## PublisherASCII Media Works .
## PublisherAsgard .
## PublisherASK .
## PublisherAsmik Ace Entertainment .
## PublisherAsmik Corp .
## PublisherAspyr .
## PublisherAstragon .
## PublisherAsylum Entertainment .
## PublisherAtari .
## PublisherAthena .
## PublisherAtlus .
## PublisherAvalon Interactive .
## PublisherAvanquest .
## PublisherAvanquest Software .
## PublisherAxela .
## PublisherBAM! Entertainment .
## PublisherBanpresto .
## PublisherBenesse .
## PublisherBerkeley .
## PublisherBethesda Softworks .
## PublisherBig Ben Interactive .
## PublisherBig Fish Games .
## PublisherBigben Interactive .
## PublisherbitComposer Games .
## PublisherBlack Bean Games .
## PublisherBlack Label Games .
## PublisherBlast! Entertainment Ltd .
## PublisherBlue Byte .
## PublisherBMG Interactive Entertainment .
## PublisherBohemia Interactive .
## PublisherBomb .
## PublisherBoost On .
## PublisherBPS .
## PublisherBrash Entertainment .
## PublisherBroccoli .
## PublisherBushiRoad .
## PublisherCapcom .
## PublisherCave .
## PublisherCBS Electronics .
## PublisherCCP .
## PublisherCDV Software Entertainment .
## PublisherChunSoft .
## PublisherCity Interactive .
## PublisherCloud Imperium Games Corporation .
## PublisherCoconuts Japan .
## PublisherCodemasters .
## PublisherCodemasters Online .
## PublisherCokeM Interactive .
## PublisherColeco .
## PublisherComfort .
## PublisherCommseed .
## PublisherCompile .
## PublisherCompile Heart .
## PublisherConspiracy Entertainment .
## PublisherCore Design Ltd. .
## PublisherCPG Products .
## PublisherCrave Entertainment .
## PublisherCreative Core .
## PublisherCrimson Cow .
## PublisherCrystal Dynamics .
## PublisherCrytek .
## PublisherCTO SpA .
## PublisherCulture Brain .
## PublisherCulture Publishers .
## PublisherCyberFront .
## PublisherCygames .
## PublisherD3Publisher .
## PublisherDaedalic .
## PublisherDaedalic Entertainment .
## PublisherDaito .
## PublisherData Age .
## PublisherData Design Interactive .
## PublisherData East .
## PublisherDatam Polystar .
## PublisherDeep Silver .
## PublisherDestination Software, Inc .
## PublisherDestineer .
## PublisherDetn8 Games .
## PublisherDevolver Digital .
## PublisherDHM Interactive .
## PublisherDigiCube .
## PublisherDisney Interactive Studios .
## PublisherDorart .
## Publisherdramatic create .
## PublisherDreamCatcher Interactive .
## PublisherDreamWorks Interactive .
## PublisherDSI Games .
## PublisherDTP Entertainment .
## PublisherDusenberry Martin Racing .
## PublisherEA Games .
## PublisherEasy Interactive .
## PublisherEcole .
## PublisherEdia .
## PublisherEidos Interactive .
## PublisherElectronic Arts -0.087929924
## PublisherElectronic Arts Victor .
## PublisherElf .
## PublisherElite .
## PublisherEmpire Interactive .
## PublisherEncore .
## PublisherEnix Corporation .
## PublisherEnjoy Gaming ltd. .
## PublisherEnterbrain .
## PublisherEON Digital Entertainment .
## PublisherEpic Games .
## PublisherEpoch .
## PublisherErtain .
## PublisherESP .
## PublisherEssential Games .
## PublisherEvolution Games .
## PublisherEvolved Games .
## PublisherExcalibur Publishing .
## PublisherExperience Inc. .
## PublisherExtreme Entertainment Group .
## PublisherFalcom Corporation .
## PublisherFields .
## PublisherFlashpoint Games .
## PublisherFlight-Plan .
## PublisherFocus Home Interactive .
## PublisherFocus Multimedia .
## Publisherfonfun .
## PublisherForeign Media Games .
## PublisherFortyfive .
## PublisherFox Interactive .
## PublisherFrom Software .
## PublisherFuji .
## PublisherFunbox Media .
## PublisherFuncom .
## PublisherFunSoft .
## PublisherFunsta .
## PublisherFuRyu .
## PublisherFuRyu Corporation .
## PublisherG.Rev .
## PublisherGaga .
## PublisherGainax Network Systems .
## PublisherGakken .
## PublisherGame Arts .
## PublisherGame Factory .
## PublisherGame Life .
## PublisherGamebridge .
## PublisherGamecock .
## PublisherGameloft .
## PublisherGameMill Entertainment .
## PublisherGames Workshop .
## PublisherGameTek .
## PublisherGathering of Developers .
## PublisherGearbox Software .
## PublisherGeneral Entertainment .
## PublisherGenki .
## PublisherGenterprise .
## PublisherGhostlight .
## PublisherGiga .
## PublisherGiza10 .
## PublisherGlams .
## PublisherGlobal A Entertainment .
## PublisherGlobal Star .
## PublisherGN Software .
## PublisherGOA .
## PublisherGotham Games .
## PublisherGraffiti .
## PublisherGrand Prix Games .
## PublisherGraphsim Entertainment .
## PublisherGremlin Interactive Ltd .
## PublisherGriffin International .
## PublisherGroove Games .
## PublisherGSP .
## PublisherGT Interactive .
## PublisherGungHo .
## PublisherGust .
## PublisherHackberry .
## PublisherHAL Laboratory .
## PublisherHamster Corporation .
## PublisherHappinet .
## PublisherHarmonix Music Systems .
## PublisherHasbro Interactive .
## PublisherHavas Interactive .
## PublisherHeadup Games .
## PublisherHearty Robin .
## PublisherHect .
## PublisherHello Games .
## PublisherHer Interactive .
## PublisherHip Interactive .
## PublisherHMH Interactive .
## PublisherHome Entertainment Suppliers .
## PublisherHudson Entertainment .
## PublisherHudson Soft .
## PublisherHuman Entertainment .
## PublisherHuneX .
## PublisherIceberg Interactive .
## Publisherid Software .
## PublisherIdea Factory .
## PublisherIdea Factory International .
## PublisherIE Institute .
## PublisherIgnition Entertainment .
## PublisherIllusion Softworks .
## PublisherImadio .
## PublisherImage Epoch .
## Publisherimageepoch Inc. .
## PublisherImageworks .
## PublisherImagic .
## PublisherImagineer .
## PublisherImax .
## PublisherIndie Games .
## PublisherInfogrames .
## PublisherInsomniac Games .
## PublisherInterchannel .
## PublisherInterchannel-Holon .
## PublisherInterplay .
## PublisherInterplay Productions .
## PublisherInterworks Unlimited, Inc. .
## PublisherInti Creates .
## PublisherIntroversion Software .
## PublisherinXile Entertainment .
## PublisherIrem Software Engineering .
## PublisherITT Family Games .
## PublisherIvolgamus .
## PublisheriWin .
## PublisherJack of All Games .
## PublisherJaleco .
## PublisherJester Interactive .
## PublisherJorudan .
## PublisherJoWood Productions .
## PublisherJust Flight .
## PublisherJVC .
## PublisherKadokawa Games .
## PublisherKadokawa Shoten .
## PublisherKaga Create .
## PublisherKalypso Media .
## PublisherKamui .
## PublisherKando Games .
## PublisherKarin Entertainment .
## PublisherKemco .
## PublisherKID .
## PublisherKids Station .
## PublisherKing Records .
## PublisherKnowledge Adventure .
## PublisherKoch Media .
## PublisherKokopeli Digital Studios .
## PublisherKonami Digital Entertainment .
## PublisherKool Kizz .
## PublisherKSS .
## PublisherLaguna .
## PublisherLegacy Interactive .
## PublisherLEGO Media .
## PublisherLevel 5 .
## PublisherLexicon Entertainment .
## PublisherLicensed 4U .
## PublisherLighthouse Interactive .
## PublisherLiquid Games .
## PublisherLittle Orbit .
## PublisherLocus .
## PublisherLSP Games .
## PublisherLucasArts .
## PublisherMad Catz .
## PublisherMagical Company .
## PublisherMagix .
## PublisherMajesco Entertainment .
## PublisherMamba Games .
## PublisherMarvel Entertainment .
## PublisherMarvelous Entertainment .
## PublisherMarvelous Games .
## PublisherMarvelous Interactive .
## PublisherMasque Publishing .
## PublisherMastertronic .
## PublisherMastiff .
## PublisherMattel Interactive .
## PublisherMax Five .
## PublisherMaximum Family Games .
## PublisherMaxis .
## PublisherMC2 Entertainment .
## PublisherMedia Entertainment .
## PublisherMedia Factory .
## PublisherMedia Rings .
## PublisherMedia Works .
## PublisherMediaQuest .
## PublisherMen-A-Vision .
## PublisherMentor Interactive .
## PublisherMercury Games .
## PublisherMerscom LLC .
## PublisherMetro 3D .
## PublisherMichaelsoft .
## PublisherMicro Cabin .
## PublisherMicroids .
## PublisherMicroprose .
## PublisherMicrosoft Game Studios .
## PublisherMidas Interactive Entertainment .
## PublisherMidway Games .
## PublisherMilestone .
## PublisherMilestone S.r.l .
## PublisherMilestone S.r.l. .
## PublisherMinato Station .
## PublisherMindscape .
## PublisherMirai Shounen .
## PublisherMisawa .
## PublisherMitsui .
## Publishermixi, Inc .
## PublisherMLB.com .
## PublisherMojang .
## PublisherMonte Christo Multimedia .
## PublisherMoss .
## PublisherMTO .
## PublisherMTV Games .
## PublisherMud Duck Productions .
## PublisherMumbo Jumbo .
## PublisherMycom .
## PublisherMyelin Media .
## PublisherMystique .
## PublisherNamco Bandai Games .
## PublisherNatsume .
## PublisherNavarre Corp .
## PublisherNaxat Soft .
## PublisherNCS .
## PublisherNCSoft .
## PublisherNDA Productions .
## PublisherNEC .
## PublisherNEC Interchannel .
## PublisherNeko Entertainment .
## PublisherNetRevo .
## PublisherNew .
## PublisherNew World Computing .
## PublisherNewKidCo .
## PublisherNexon .
## PublisherNichibutsu .
## PublisherNihon Falcom Corporation .
## PublisherNintendo 2.301811270
## PublisherNippon Amuse .
## PublisherNippon Columbia .
## PublisherNippon Ichi Software .
## PublisherNippon Telenet .
## PublisherNitroplus .
## PublisherNobilis .
## PublisherNordcurrent .
## PublisherNordic Games .
## PublisherNovaLogic .
## PublisherNumber None .
## PublisherO-Games .
## PublisherO3 Entertainment .
## PublisherOcean .
## PublisherOffice Create .
## PublisherOn Demand .
## PublisherOngakukan .
## PublisherOrigin Systems .
## PublisherOtomate .
## PublisherOxygen Interactive .
## PublisherP2 Games .
## PublisherPacific Century Cyber Works .
## PublisherPack In Soft .
## PublisherPack-In-Video .
## PublisherPalcom .
## PublisherPanther Software .
## PublisherPaon .
## PublisherPaon Corporation .
## PublisherParadox Development .
## PublisherParadox Interactive .
## PublisherParker Bros. .
## PublisherPerformance Designed Products .
## PublisherPhantagram .
## PublisherPhantom EFX .
## PublisherPhenomedia .
## PublisherPhoenix Games .
## PublisherPiacci .
## PublisherPinnacle .
## PublisherPioneer LDC .
## PublisherPlay It .
## PublisherPlaylogic Game Factory .
## PublisherPlaymates .
## PublisherPlaymore .
## PublisherPlayV .
## PublisherPlenty .
## PublisherPM Studios .
## PublisherPony Canyon .
## PublisherPopCap Games .
## PublisherPopcorn Arcade .
## PublisherPopTop Software .
## PublisherPow .
## PublisherPQube .
## PublisherPrincess Soft .
## PublisherPrototype .
## PublisherPsygnosis .
## PublisherQuelle .
## PublisherQuest .
## PublisherQuinrose .
## PublisherQuintet .
## PublisherRage Software .
## PublisherRebellion .
## PublisherRebellion Developments .
## PublisherRED Entertainment .
## PublisherRed Flagship .
## PublisherRed Orb .
## PublisherRed Storm Entertainment .
## PublisherRedOctane .
## PublisherReef Entertainment .
## PublisherresponDESIGN .
## PublisherRevolution (Japan) .
## PublisherRevolution Software .
## PublisherRising Star Games .
## PublisherRiverhillsoft .
## PublisherRocket Company .
## PublisherRondomedia .
## PublisherRTL .
## PublisherRussel .
## PublisherSammy Corporation .
## PublisherSaurus .
## PublisherScholastic Inc. .
## PublisherSCi .
## PublisherScreenlife .
## PublisherSCS Software .
## PublisherSears .
## PublisherSega .
## PublisherSeta Corporation .
## PublisherSeventh Chord .
## PublisherShogakukan .
## PublisherSimon & Schuster Interactive .
## PublisherSlightly Mad Studios .
## PublisherSlitherine Software .
## PublisherSNK .
## PublisherSNK Playmore .
## PublisherSocieta .
## PublisherSold Out .
## PublisherSonnet .
## PublisherSony Computer Entertainment .
## PublisherSony Computer Entertainment America .
## PublisherSony Computer Entertainment Europe .
## PublisherSony Music Entertainment .
## PublisherSony Online Entertainment .
## PublisherSouthPeak Games .
## PublisherSpike .
## PublisherSPS .
## PublisherSquare .
## PublisherSquare EA .
## PublisherSquare Enix .
## PublisherSquare Enix .
## PublisherSquareSoft .
## PublisherSSI .
## PublisherStainless Games .
## PublisherStarfish .
## PublisherStarpath Corp. .
## PublisherSting .
## PublisherStorm City Games .
## PublisherStrategy First .
## PublisherSuccess .
## PublisherSummitsoft .
## PublisherSunflowers .
## PublisherSunrise Interactive .
## PublisherSunsoft .
## PublisherSweets .
## PublisherSwing! Entertainment .
## PublisherSyscom .
## PublisherSystem 3 .
## PublisherSystem 3 Arcade Software .
## PublisherSystem Soft .
## PublisherT&E Soft .
## PublisherTaito .
## PublisherTakara .
## PublisherTakara Tomy .
## PublisherTake-Two Interactive .
## PublisherTakuyo .
## PublisherTalonSoft .
## PublisherTDK Core .
## PublisherTDK Mediactive .
## PublisherTeam17 Software .
## PublisherTechnos Japan Corporation .
## PublisherTechnoSoft .
## PublisherTecmo Koei .
## PublisherTelegames .
## PublisherTelltale Games .
## PublisherTelstar .
## PublisherTetris Online .
## PublisherTGL .
## PublisherThe Adventure Company .
## PublisherThe Learning Company .
## PublisherTHQ .
## PublisherTigervision .
## PublisherTime Warner Interactive .
## PublisherTitus .
## PublisherTivola .
## PublisherTOHO .
## PublisherTommo .
## PublisherTomy Corporation .
## PublisherTopWare Interactive .
## PublisherTouchstone .
## PublisherTradewest .
## PublisherTrion Worlds .
## PublisherTripwire Interactive .
## PublisherTru Blu Entertainment .
## PublisherTryfirst .
## PublisherTYO .
## PublisherType-Moon .
## PublisherU.S. Gold .
## PublisherUbisoft -0.025976015
## PublisherUbisoft Annecy .
## PublisherUEP Systems .
## PublisherUFO Interactive .
## PublisherUIG Entertainment .
## PublisherUltravision .
## PublisherUniversal Gamex .
## PublisherUniversal Interactive .
## PublisherUnknown .
## PublisherValcon Games .
## PublisherValuSoft .
## PublisherValve .
## PublisherValve Software .
## PublisherVap .
## PublisherVatical Entertainment .
## PublisherVic Tokai .
## PublisherVictor Interactive .
## PublisherVideo System .
## PublisherViews .
## PublisherVir2L Studios .
## PublisherVirgin Interactive .
## PublisherVirtual Play Games .
## PublisherVisco .
## PublisherVivendi Games .
## PublisherWanadoo .
## PublisherWarashi .
## PublisherWargaming.net .
## PublisherWarner Bros. Interactive Entertainment .
## PublisherWarp .
## PublisherWayForward Technologies .
## PublisherWestwood Studios .
## PublisherWhite Park Bay Software .
## PublisherWizard Video Games .
## PublisherXicat Interactive .
## PublisherXing Entertainment .
## PublisherXplosiv .
## PublisherXS Games .
## PublisherXseed Games .
## PublisherYacht Club Games .
## PublisherYamasa Entertainment .
## PublisherYeti .
## PublisherYuke's .
## PublisherYumedia .
## PublisherZenrin .
## PublisherZoo Digital Publishing .
## PublisherZoo Games .
## PublisherZushi Games .
## Critic_Score 0.060310881
## User_Score -0.145462445
## RatingE .
## RatingE10+ -0.467661542
## RatingEC .
## RatingK-A .
## RatingM 0.453774079
## RatingRP .
## RatingT -0.065534613
## year2 -0.122773959
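The dots in the listing above are coefficients that LASSO has set exactly to zero. The report does not show the fitting code, so the following is not the author's implementation. It is a minimal base-R sketch of coordinate descent with soft-thresholding on simulated data, meant only to show where the exact zeros come from. The function names `soft_threshold` and `lasso_cd`, the data, and the value of `lambda` are all hypothetical.

```r
# Soft-thresholding operator: shrinks z toward zero and clips small values to 0
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
# on centred, scaled predictors (hypothetical helper, for illustration only)
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  X <- scale(X)
  y <- y - mean(y)
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      # Residual with predictor j's current contribution left out
      partial_resid <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(X[, j] * partial_resid) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

set.seed(123)
n <- 500
X <- matrix(rnorm(n * 3), ncol = 3)   # column 1 is informative, 2 and 3 are noise
y <- 2 * X[, 1] + rnorm(n)

beta_hat <- lasso_cd(X, y, lambda = 0.3)
print(beta_hat)
```

A predictor whose correlation with the residual stays below `lambda` is clipped to exactly zero, which is why most of the rarely occurring publishers drop out of the model while a few strong predictors remain.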
Conclusion
In the linear regression method, the significant predictors are: Genre (Adventure, Puzzle, Role-Playing, Strategy), Publisher (Electronic Arts, Nintendo), Critic Score, User Score, and Platform (DS, PS, PS2, PS3, PS4, Wii, X360). Through further selection, User Score and Critic Score are the ultimate determinants.
In the LASSO method, the significant predictors (absolute coefficient >= 1) are: Publisher (Nintendo), Platform (GBA, Wii, WiiU), and Genre (Puzzle).
Comparing the three approaches, best subset selection (BSS) consumes the most computing capacity and is not suitable for a dataset with many categorical values. The linear regression model yields many significant factors but risks overfitting. The LASSO regression model is more efficient than linear regression and points out reasonable significant variables.
To conclude, genre does not count that much, since every genre has many successful games, although the Puzzle genre tends to sell less. The critic score matters more than the user score. Platforms do count, but only the most famous and recent ones are important. M-rated games also tend to sell more. The most successful publisher in terms of influencing sales is Nintendo.
Classification
This section explores the second question: what percentage of newly developed games could be successful? Unlike the first question, this is more of an after-the-fact analysis, in which games have already been released and analysts try to predict the potential success rate. The conclusion points out an important principle that can be applied to the game industry.
Logistic regression
Fit an initial model with all predictors and calculate its accuracy:
##
## Call:
## glm(formula = aboveave ~ Genre + Publisher + Critic_Score + User_Score +
## Rating + year2 + Platform, data = Videogame_aboveave_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8692 -0.2641 -0.1073 0.2108 1.1572
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.143263 0.400562 0.358 0.7207
## GenreAdventure -0.127195 0.074462 -1.708 0.0878
## GenreFighting -0.066647 0.076443 -0.872 0.3834
## GenreMisc 0.035490 0.044634 0.795 0.4267
## GenrePlatform 0.012314 0.050415 0.244 0.8071
## GenrePuzzle 0.137997 0.110776 1.246 0.2131
## GenreRacing 0.042515 0.044579 0.954 0.3404
## GenreRole-Playing 0.002250 0.050319 0.045 0.9643
## GenreShooter 0.009588 0.036040 0.266 0.7902
## GenreSimulation 0.011152 0.052816 0.211 0.8328
## GenreSports -0.093297 0.036955 -2.525 0.0117
## GenreStrategy -0.151406 0.070026 -2.162 0.0308
## PublisherElectronic Arts -0.014159 0.032582 -0.435 0.6639
## PublisherNintendo 0.277494 0.049413 5.616 2.37e-08
## PublisherSony Computer Entertainment -0.063683 0.043106 -1.477 0.1398
## PublisherTake-Two Interactive -0.018069 0.042567 -0.424 0.6713
## PublisherUbisoft -0.070852 0.034673 -2.043 0.0412
## Critic_Score 0.012547 0.001026 12.228 < 2e-16
## User_Score -0.012411 0.009465 -1.311 0.1900
## RatingE -0.660588 0.378519 -1.745 0.0812
## RatingE10+ -0.760603 0.378491 -2.010 0.0447
## RatingM -0.602867 0.377665 -1.596 0.1107
## RatingT -0.663170 0.377746 -1.756 0.0794
## year2 -0.066637 0.089993 -0.740 0.4591
## PlatformDS 0.024159 0.079918 0.302 0.7625
## PlatformGBA -0.158507 0.085687 -1.850 0.0646
## PlatformGC -0.147752 0.079318 -1.863 0.0627
## PlatformPC -0.175034 0.080036 -2.187 0.0289
## PlatformPS 0.110989 0.116765 0.951 0.3420
## PlatformPS2 0.108597 0.077732 1.397 0.1626
## PlatformPS3 0.063097 0.077113 0.818 0.4134
## PlatformPS4 0.208977 0.090336 2.313 0.0209
## PlatformPSP -0.037464 0.083834 -0.447 0.6550
## PlatformPSV 0.023207 0.115023 0.202 0.8401
## PlatformWii 0.132103 0.077099 1.713 0.0869
## PlatformWiiU -0.128639 0.103322 -1.245 0.2133
## PlatformX360 0.079737 0.076859 1.037 0.2997
## PlatformXB -0.118720 0.080981 -1.466 0.1429
## PlatformXOne 0.082686 0.092429 0.895 0.3712
##
## (Intercept)
## GenreAdventure .
## GenreFighting
## GenreMisc
## GenrePlatform
## GenrePuzzle
## GenreRacing
## GenreRole-Playing
## GenreShooter
## GenreSimulation
## GenreSports *
## GenreStrategy *
## PublisherElectronic Arts
## PublisherNintendo ***
## PublisherSony Computer Entertainment
## PublisherTake-Two Interactive
## PublisherUbisoft *
## Critic_Score ***
## User_Score
## RatingE .
## RatingE10+ *
## RatingM
## RatingT .
## year2
## PlatformDS
## PlatformGBA .
## PlatformGC .
## PlatformPC *
## PlatformPS
## PlatformPS2
## PlatformPS3
## PlatformPS4 *
## PlatformPSP
## PlatformPSV
## PlatformWii .
## PlatformWiiU
## PlatformX360
## PlatformXB
## PlatformXOne
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1393984)
##
## Null deviance: 255.18 on 1405 degrees of freedom
## Residual deviance: 190.56 on 1367 degrees of freedom
## AIC: 1260.1
##
## Number of Fisher Scoring iterations: 2
## [1] 0.6763869
## glm.pred
## 0 1
## 0.8926031 0.1073969
Remove the insignificant variables (User_Score and year2) and calculate the accuracy of the reduced model:
##
## Call:
## glm(formula = aboveave ~ Genre + Publisher + Critic_Score + Rating +
## Platform, data = Videogame_aboveave_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8696 -0.2681 -0.1097 0.2100 1.1505
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.0488268 0.3893676 0.125
## GenreAdventure -0.1266492 0.0744174 -1.702
## GenreFighting -0.0615466 0.0762089 -0.808
## GenreMisc 0.0376508 0.0446126 0.844
## GenrePlatform 0.0142628 0.0503949 0.283
## GenrePuzzle 0.1360882 0.1107721 1.229
## GenreRacing 0.0468567 0.0444723 1.054
## GenreRole-Playing 0.0028222 0.0503101 0.056
## GenreShooter 0.0088230 0.0360388 0.245
## GenreSimulation 0.0150672 0.0527202 0.286
## GenreSports -0.0906058 0.0368800 -2.457
## GenreStrategy -0.1473952 0.0698586 -2.110
## PublisherElectronic Arts -0.0150700 0.0325699 -0.463
## PublisherNintendo 0.2693414 0.0490875 5.487
## PublisherSony Computer Entertainment -0.0680271 0.0430136 -1.582
## PublisherTake-Two Interactive -0.0217251 0.0424945 -0.511
## PublisherUbisoft -0.0757430 0.0345056 -2.195
## Critic_Score 0.0118393 0.0008621 13.733
## RatingE -0.6635888 0.3785501 -1.753
## RatingE10+ -0.7643686 0.3785206 -2.019
## RatingM -0.6050426 0.3777001 -1.602
## RatingT -0.6673439 0.3777647 -1.767
## PlatformDS 0.0155456 0.0796478 0.195
## PlatformGBA -0.1696815 0.0852617 -1.990
## PlatformGC -0.1587524 0.0788756 -2.013
## PlatformPC -0.1734553 0.0800375 -2.167
## PlatformPS 0.1447707 0.0972912 1.488
## PlatformPS2 0.0997233 0.0770975 1.293
## PlatformPS3 0.0634319 0.0771198 0.823
## PlatformPS4 0.2194714 0.0900217 2.438
## PlatformPSP -0.0469681 0.0835173 -0.562
## PlatformPSV 0.0206291 0.1150094 0.179
## PlatformWii 0.1266967 0.0769896 1.646
## PlatformWiiU -0.1293070 0.1033275 -1.251
## PlatformX360 0.0779728 0.0768503 1.015
## PlatformXB -0.1281546 0.0806599 -1.589
## PlatformXOne 0.0924544 0.0921576 1.003
## Pr(>|t|)
## (Intercept) 0.9002
## GenreAdventure 0.0890 .
## GenreFighting 0.4195
## GenreMisc 0.3988
## GenrePlatform 0.7772
## GenrePuzzle 0.2195
## GenreRacing 0.2922
## GenreRole-Playing 0.9553
## GenreShooter 0.8066
## GenreSimulation 0.7751
## GenreSports 0.0141 *
## GenreStrategy 0.0350 *
## PublisherElectronic Arts 0.6437
## PublisherNintendo 4.87e-08 ***
## PublisherSony Computer Entertainment 0.1140
## PublisherTake-Two Interactive 0.6093
## PublisherUbisoft 0.0283 *
## Critic_Score < 2e-16 ***
## RatingE 0.0798 .
## RatingE10+ 0.0436 *
## RatingM 0.1094
## RatingT 0.0775 .
## PlatformDS 0.8453
## PlatformGBA 0.0468 *
## PlatformGC 0.0443 *
## PlatformPC 0.0304 *
## PlatformPS 0.1370
## PlatformPS2 0.1961
## PlatformPS3 0.4109
## PlatformPS4 0.0149 *
## PlatformPSP 0.5740
## PlatformPSV 0.8577
## PlatformWii 0.1001
## PlatformWiiU 0.2110
## PlatformX360 0.3105
## PlatformXB 0.1123
## PlatformXOne 0.3159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1394275)
##
## Null deviance: 255.18 on 1405 degrees of freedom
## Residual deviance: 190.88 on 1369 degrees of freedom
## AIC: 1258.4
##
## Number of Fisher Scoring iterations: 2
## [1] 0.6799431
## glm.pred
## 0 1
## 0.8933144 0.1066856
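Both `Call:` lines above omit a `family` argument, so `glm` uses its default gaussian family (the dispersion line in the output confirms this). That amounts to a linear probability model rather than a logistic one. A minimal sketch of the logistic variant follows, on synthetic data. The data, the single predictor, the 300/100 split, and the 0.5 cutoff are all assumptions for illustration; only the name `aboveave` is borrowed from the output.

```r
set.seed(1)

# Synthetic stand-in for the game data: one numeric predictor and a 0/1 label.
n <- 400
critic_score <- runif(n, 20, 100)
aboveave <- rbinom(n, 1, plogis(-6 + 0.07 * critic_score))
games <- data.frame(aboveave, critic_score)

# Hold out 100 rows for testing (assumed split, not the original one).
train_idx <- sample(n, 300)
train <- games[train_idx, ]
test <- games[-train_idx, ]

# family = binomial requests a logistic model; without it glm() fits gaussian.
fit <- glm(aboveave ~ critic_score, data = train, family = binomial)

# Predicted probabilities on the test set, turned into classes at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Share of correct predictions, and share of games predicted 0 vs 1.
accuracy <- mean(pred == test$aboveave)
success_rate <- prop.table(table(factor(pred, levels = c(0, 1))))

print(accuracy)
print(success_rate)
```

With `type = "response"` the predictions are probabilities between 0 and 1, so the 0.5 cutoff has a direct interpretation. The gaussian fit above can return values outside that range.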
KNN
When K=1, the accuracy of the model and the predicted success rate are:
## [1] 0.886202
## knn.pred.avgsales
## 0 1
## 0.8008535 0.1991465
When K=5, the accuracy of the model and the predicted success rate are:
## [1] 0.8947368
## knn.pred.avgsales5
## 0 1
## 0.8421053 0.1578947
When K=10, the accuracy of the model and the predicted success rate are:
## [1] 0.8911807
## knn.pred.avgsales10
## 0 1
## 0.8556188 0.1443812
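The three KNN runs above can be reproduced in outline with `knn()` from the `class` package, which ships with standard R installations. This is a sketch on synthetic data, not the original code: the two predictors, the split, and the scaling step are assumptions, and `results` is an illustrative name.

```r
library(class)  # provides knn()

set.seed(1)

# Synthetic stand-in for the game data: two numeric predictors, a 0/1 label.
n <- 400
critic_score <- runif(n, 20, 100)
user_score <- runif(n, 1, 10)
aboveave <- factor(rbinom(n, 1, plogis(-6 + 0.07 * critic_score)),
                   levels = c(0, 1))

# KNN is distance based, so put the predictors on a common scale first
# (scaled on all rows here for brevity).
x <- scale(cbind(critic_score, user_score))

# Hold out 100 rows for testing (assumed split).
train_idx <- sample(n, 300)
train_x <- x[train_idx, ]
test_x <- x[-train_idx, ]
train_y <- aboveave[train_idx]
test_y <- aboveave[-train_idx]

# For each K, record accuracy and the share of games predicted successful.
ks <- c(1, 5, 10)
results <- t(sapply(ks, function(k) {
  pred <- knn(train_x, test_x, cl = train_y, k = k)
  c(k = k,
    accuracy = mean(pred == test_y),
    success_rate = mean(pred == "1"))
}))

print(results)
```

Each row of `results` corresponds to one value of K, mirroring the accuracy and predicted-share figures reported above.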
Conclusion
In the logistic regression model, the predicted percentage of successful video games is 10.67%, while the accuracy is only 67.99%. The result needs further validation.
In the KNN model, when K=1, the success rate is 19.91% and the accuracy is 88.62%. When K=5, the success rate is 15.79% and the accuracy is 89.47%. When K=10, the success rate is 14.44% and the accuracy is 89.12%.
Comparing the two models, both predict a success rate between 10% and 20%, while KNN has the higher accuracy. The percentage predicted by KNN echoes a well-known business principle: the Pareto principle, or 80/20 rule, which states that roughly 80% of effects come from 20% of causes or contributors. Although games might be thought to have long-tail markets, the result suggests that games may be another mass market, and investors could use this principle to support investment decisions in the game industry.
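The Pareto principle is a statement about shares of sales, whereas the figures above are shares of titles predicted to be successful. One way to check the 80/20 reading directly would be to compute the fraction of total sales contributed by the top 20% of titles. The sketch below does this on synthetic, right-skewed data. The log-normal sales figures are an assumption for illustration and are not drawn from the data set.

```r
set.seed(1)

# Synthetic, right-skewed sales figures; NOT the real Global_Sales column.
sales <- rlnorm(1000, meanlog = 0, sdlog = 1.2)

# Share of total sales contributed by the best-selling 20% of titles.
sorted <- sort(sales, decreasing = TRUE)
top_n <- ceiling(0.2 * length(sorted))
top_share <- sum(sorted[1:top_n]) / sum(sorted)

# Share of titles selling above the mean (an assumed definition of "success").
above_mean_rate <- mean(sales > mean(sales))

print(top_share)
print(above_mean_rate)
```

Applied to the real sales column, a `top_share` close to 0.8 would support the 80/20 interpretation. A much lower value would point toward a flatter, longer-tailed market.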