EMAIL : “poojagsagar9896@gmail.com”

COLLEGE : “NIT Warangal”

Introduction

A video game is an electronic game that can be played on a computing device, such as a personal computer, gaming console or mobile phone. Depending on the platform, video games can be subcategorized into computer games and console games. In recent years, however, the emergence of social networks, smartphones and tablets has introduced new categories such as mobile and social games. Video games have come a long way since the first games emerged in the 1970s. Today’s video games offer photorealistic graphics and simulate reality to a degree that is often astonishing.

Video games are a billion-dollar business and have been for many years. In 2016, the video game market in the United States was valued at 17.68 billion U.S. dollars. That same year, U.S. consumers were said to have spent roughly double that amount on gaming content, hardware and accessories. Importantly, the first generation of gamers has now grown up and has significant spending power; therefore, despite high penetration rates among kids, video games can no longer be considered solely child’s play. In fact, video gaming is gaining popularity among seniors in the United States. Fun and mental agility are among the main reasons older gamers cite for choosing this pastime.

Among the many prosperous representatives of the video game industry are three major players that have been in the game for decades and remained in leadership positions as of 2015: Sony, Microsoft, and Nintendo. Sony’s PlayStation 4 is the bestseller among current-generation consoles. By the end of 2015, Sony had sold 16.75 million units of the popular console. All three gaming brands are also the most recognized among gamers in the United States, with Nintendo being the frontrunner.

Overview

Video games often follow a predictable model of sales. Several factors must be taken into account before building any such model.

In the United States, NPD figures show that December accounts for nearly 25% of the calendar year’s revenue, while October, November, and December together account for about 50%. Additionally, history shows that from January to October of any given year a system’s installed base grows roughly linearly. This can change if there is a price cut or a system-seller game like Halo. In addition, November’s hardware numbers are often 2.5 times October’s because of holiday gift-giving.
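The shares quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the annual revenue and the October unit count are hypothetical placeholders rather than NPD figures, and "2.5 times" is read as a simple multiplier.

```r
# Hypothetical inputs, not taken from NPD data
annual_revenue <- 100      # total calendar-year revenue, arbitrary units
october_units  <- 200000   # hardware units sold in October

# Shares stated in the text
december_revenue <- 0.25 * annual_revenue   # December: about 25% of the year
q4_revenue       <- 0.50 * annual_revenue   # October to December: about 50%

# What remains for October + November, and for January to September
oct_nov_revenue <- q4_revenue - december_revenue
jan_sep_revenue <- annual_revenue - q4_revenue

# November hardware is often about 2.5 times October
november_units <- 2.5 * october_units

print(c(december = december_revenue,
        oct_nov  = oct_nov_revenue,
        jan_sep  = jan_sep_revenue))
print(november_units)
```

Under these assumptions, October and November together bring in as much revenue as December alone, and the first nine months account for the remaining half.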

The NPD top 10 can account for 25-45% of the market (in dollars or units). The drop-off from #10 to #20 in a non-holiday month is around 60K to 100K; see, for example, February 2008 and April 2008. The console-specific top 10 (e.g. the top 10 PS3 games) tails off at around 30,000-60,000 units. At least 33K was needed to reach the top 100 in November 2008, a holiday month. Despite not regularly reaching the NPD top 10, the Nintendo DS sells more than any other system.

The grey market can be used to gauge supply issues. In certain cases, there is a strong correlation between monthly unit sales and grey market prices on auction sites like eBay.

North American console sales are usually at least twice as big as they are in Europe or Japan. Compare Wii and PS2 sales in Japan and USA.

Strong Canadian console sales, such as in NPD April 2007, fall in the range of 30,000-40,000 units.

Context

Motivated by Gregory Smith’s web scrape of VGChartz video game sales, this data set simply extends the number of variables with another web scrape from Metacritic. Unfortunately, there are missing observations, as Metacritic only covers a subset of the platforms. Also, a game may not have all the observations of the additional variables discussed below. There are about 6,900 complete cases.
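A count of complete cases like the one quoted above can be obtained with `complete.cases()`. The following is a minimal sketch on a toy data frame with made-up values; only the column names follow the data set. It also assumes that missing text fields arrive as empty strings, which `complete.cases()` does not treat as missing until they are recoded to `NA`.

```r
# Toy data frame with hypothetical values; column names follow the data set
toy <- data.frame(
  Name         = c("Game A", "Game B", "Game C", "Game D"),
  Global_Sales = c(1.20, 0.40, 0.10, 0.75),
  Critic_Score = c(85, NA, 70, 90),
  Rating       = c("E", "", "T", ""),
  stringsAsFactors = FALSE
)

# Empty strings are not NA, so recode them before counting
toy$Rating[toy$Rating == ""] <- NA

# complete.cases() is TRUE for rows without any missing value
is_complete  <- complete.cases(toy)
n_complete   <- sum(is_complete)
toy_complete <- toy[is_complete, ]

print(n_complete)
print(toy_complete)
```

An alternative is to pass `na.strings = c("", "N/A")` to `read.csv()` so that such fields are read as `NA` from the start.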

Content

Alongside the fields Name, Platform, Year_of_Release, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales and Global_Sales, we have:

Critic_Score - Aggregate score compiled by Metacritic staff
Critic_Count - The number of critics used in coming up with the Critic_Score
User_Score - Score given by Metacritic’s subscribers
User_Count - Number of users who gave the User_Score
Developer - Party responsible for creating the game
Rating - The ESRB rating

Some important analyses of the data are explained below. The data are derived from video game sales on VGChartz and the corresponding ratings on Metacritic.

->Reading dataset into R

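# stringsAsFactors = TRUE reproduces the factor columns shown below (the default is FALSE since R 4.0)
vgame1.df <- read.csv("Video Game Sales Data.csv.csv", stringsAsFactors = TRUE)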
View(vgame1.df)

-Dimensions of the dataset

dim(vgame1.df)
## [1] 16719    16

-Visualizing the data: frequency count based on the year of release of a video game

table(vgame1.df$Year_of_Release)
## 
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 
##    9   46   36   17   14   14   21   16   15   17   16   41   43   62  121 
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 
##  219  263  289  379  338  350  482  829  775  762  939 1006 1197 1427 1426 
## 2010 2011 2012 2013 2014 2015 2016 2017 2020  N/A 
## 1255 1136  653  544  581  606  502    3    1  269

->Cleaning the data

-1) Viewing the subset-1 containing data of only year 2016

vgame2016<- vgame1.df[which(vgame1.df$Year_of_Release=="2016"), ]
View(vgame2016)

-Dimensions of subset-1(year 2016)

dim(vgame2016)
## [1] 502  16

-Summarizing the subset-1 (year 2016)

library(psych)
describe(vgame2016)
##                  vars   n    mean      sd  median trimmed     mad   min
## Name*               1 502 5739.26 3456.23 5894.00 5739.69 4452.25  3.00
## Platform*           2 502   19.83    7.47   19.00   20.43    2.97  3.00
## Year_of_Release*    3 502   37.00    0.00   37.00   37.00    0.00 37.00
## Genre*              4 502    5.85    3.89    5.00    5.53    4.45  2.00
## Publisher*          5 502  309.33  178.72  354.00  315.46  231.29  7.00
## NA_Sales            6 502    0.09    0.26    0.01    0.03    0.01  0.00
## EU_Sales            7 502    0.10    0.36    0.01    0.03    0.01  0.00
## JP_Sales            8 502    0.04    0.13    0.00    0.02    0.00  0.00
## Other_Sales         9 502    0.03    0.09    0.00    0.01    0.00  0.00
## Global_Sales       10 502    0.26    0.70    0.06    0.11    0.07  0.01
## Critic_Score       11 232   73.16   11.74   74.50   74.28   11.12 31.00
## Critic_Count       12 232   30.25   23.89   22.00   26.98   19.27  4.00
## User_Score*        13 502   41.07   36.42   49.50   39.55   52.63  1.00
## User_Count         14 262  264.97  671.24   57.00  126.51   69.68  5.00
## Developer*         15 502  545.22  580.25  378.00  484.34  558.94  1.00
## Rating*            16 502    3.86    3.13    3.00    3.58    2.97  1.00
##                       max    range  skew kurtosis     se
## Name*            11536.00 11533.00 -0.04    -1.25 154.26
## Platform*           31.00    28.00 -0.45     0.36   0.33
## Year_of_Release*    37.00     0.00   NaN      NaN   0.00
## Genre*              13.00    11.00  0.41    -1.45   0.17
## Publisher*         576.00   569.00 -0.26    -1.32   7.98
## NA_Sales             2.98     2.98  5.69    42.87   0.01
## EU_Sales             5.75     5.75  9.76   128.76   0.02
## JP_Sales             2.26     2.26 11.51   169.13   0.01
## Other_Sales          1.11     1.11  6.80    57.74   0.00
## Global_Sales         7.59     7.58  6.52    52.98   0.03
## Critic_Score        93.00    62.00 -0.85     0.64   0.77
## Critic_Count       113.00   109.00  1.12     0.66   1.57
## User_Score*         97.00    96.00  0.01    -1.69   1.63
## User_Count        7064.00  7059.00  6.43    52.11  41.47
## Developer*        1671.00  1670.00  0.60    -1.15  25.90
## Rating*              9.00     8.00  0.57    -1.29   0.14
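In `describe()` output, an asterisk marks a categorical variable that psych has converted to integer codes, so the mean, sd and similar statistics on those rows (Name*, Platform*, Genre* and so on) describe the codes rather than anything meaningful. One way to keep only the interpretable rows is to restrict the summary to numeric columns first. The sketch below uses base R and a toy data frame with made-up values.

```r
# Toy data frame with hypothetical values; column names follow the data set
toy <- data.frame(
  Genre        = factor(c("Action", "Sports", "Action")),
  Global_Sales = c(0.5, 1.5, 1.0),
  Critic_Score = c(80, NA, 70)
)

# Identify the numeric columns and summarise only those
is_num        <- vapply(toy, is.numeric, logical(1))
numeric_cols  <- names(toy)[is_num]
numeric_means <- colMeans(toy[, is_num, drop = FALSE], na.rm = TRUE)

print(numeric_cols)
print(numeric_means)
```

The same logical index could be used to pass only the numeric columns of a yearly subset to `describe()`.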

-One-way Contingency tables of subset-1( year 2016)

mytable1<- with(vgame2016,table(Genre))
mytable1
## Genre
##                    Action    Adventure     Fighting         Misc 
##            0          178           56           16           32 
##     Platform       Puzzle       Racing Role-Playing      Shooter 
##           15            1           24           54           47 
##   Simulation       Sports     Strategy 
##           18           48           13
mytable2<- with(vgame2016,table(Platform))
mytable2
## Platform
## 2600  3DO  3DS   DC   DS   GB  GBA   GC  GEN   GG  N64  NES   NG   PC PCFX 
##    0    0   46    0    0    0    0    0    0    0    0    0    0   54    0 
##   PS  PS2  PS3  PS4  PSP  PSV  SAT  SCD SNES TG16  Wii WiiU   WS X360   XB 
##    0    0   38  164    0   85    0    0    0    0    1   14    0   13    0 
## XOne 
##   87
mytable3<- with(vgame2016,table(Rating))
mytable3
## Rating
##        AO    E E10+   EC  K-A    M   RP    T 
##  222    0   66   50    0    0   78    0   86
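The many zero counts in these tables appear because subsetting a data frame keeps every factor level of the full data set, including platforms and ratings that never occur in the chosen year. A minimal sketch of removing unused levels with `droplevels()`, using a small made-up factor rather than the real data:

```r
# Hypothetical platforms and release years
platform_all <- factor(c("PS4", "PS4", "XOne", "PC", "PS2", "NES"))
year         <- c(2016, 2016, 2016, 2016, 2001, 1985)

# Subsetting keeps all five original levels, so NES and PS2 show up with 0
platform_2016 <- platform_all[year == 2016]
print(table(platform_2016))

# droplevels() removes the levels that no longer occur
platform_2016_clean <- droplevels(platform_2016)
counts <- table(platform_2016_clean)
print(counts)
```

`droplevels()` also accepts a whole data frame, in which case it is applied to every factor column.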

-Two-way Contingency tables of subset-1( year 2016)

mytable<-xtabs(~ Genre+Rating,data=vgame2016)
mytable
##               Rating
## Genre             AO  E E10+ EC K-A  M RP  T
##                 0  0  0    0  0   0  0  0  0
##   Action       88  0  9   20  0   0 34  0 27
##   Adventure    38  0  0    2  0   0 10  0  6
##   Fighting      5  0  0    1  0   0  0  0 10
##   Misc         16  0  5    6  0   0  1  0  4
##   Platform      2  0  2    8  0   0  0  0  3
##   Puzzle        0  0  0    1  0   0  0  0  0
##   Racing        7  0 17    0  0   0  0  0  0
##   Role-Playing 31  0  1    2  0   0  8  0 12
##   Shooter       5  0  0    7  0   0 25  0 10
##   Simulation   10  0  6    0  0   0  0  0  2
##   Sports       12  0 26    2  0   0  0  0  8
##   Strategy      8  0  0    1  0   0  0  0  4
mytable11<-xtabs(~ Genre+Platform,data=vgame2016)
mytable11
##               Platform
## Genre          2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS
##                   0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Action          0   0  22  0  0  0   0  0   0  0   0   0  0  7    0  0
##   Adventure       0   0   5  0  0  0   0  0   0  0   0   0  0  5    0  0
##   Fighting        0   0   1  0  0  0   0  0   0  0   0   0  0  1    0  0
##   Misc            0   0   5  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Platform        0   0   2  0  0  0   0  0   0  0   0   0  0  1    0  0
##   Puzzle          0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Racing          0   0   0  0  0  0   0  0   0  0   0   0  0  6    0  0
##   Role-Playing    0   0   7  0  0  0   0  0   0  0   0   0  0  4    0  0
##   Shooter         0   0   0  0  0  0   0  0   0  0   0   0  0  9    0  0
##   Simulation      0   0   3  0  0  0   0  0   0  0   0   0  0  8    0  0
##   Sports          0   0   0  0  0  0   0  0   0  0   0   0  0  5    0  0
##   Strategy        0   0   1  0  0  0   0  0   0  0   0   0  0  8    0  0
##               Platform
## Genre          PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
##                  0   0   0   0   0   0   0    0    0   0    0  0    0  0
##   Action         0  13  59   0  35   0   0    0    0   0    6  0    3  0
##   Adventure      0   2  14   0  25   0   0    0    0   0    1  0    1  0
##   Fighting       0   2   7   0   2   0   0    0    0   0    1  0    0  0
##   Misc           0   6  10   0   3   0   0    0    0   1    2  0    1  0
##   Platform       0   1   5   0   0   0   0    0    0   0    2  0    1  0
##   Puzzle         0   0   0   0   1   0   0    0    0   0    0  0    0  0
##   Racing         0   0   9   0   0   0   0    0    0   0    0  0    0  0
##   Role-Playing   0   5  18   0  15   0   0    0    0   0    1  0    0  0
##   Shooter        0   1  20   0   0   0   0    0    0   0    1  0    1  0
##   Simulation     0   0   4   0   1   0   0    0    0   0    0  0    0  0
##   Sports         0   8  16   0   2   0   0    0    0   0    0  0    6  0
##   Strategy       0   0   2   0   1   0   0    0    0   0    0  0    0  0
##               Platform
## Genre          XOne
##                   0
##   Action         33
##   Adventure       3
##   Fighting        2
##   Misc            4
##   Platform        3
##   Puzzle          0
##   Racing          9
##   Role-Playing    4
##   Shooter        15
##   Simulation      2
##   Sports         11
##   Strategy        1
mytable12<-xtabs(~ Rating+Platform,data=vgame2016)
mytable12
##       Platform
## Rating 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS PS2 PS3
##           0   0  32  0  0  0   0  0   0  0   0   0  0 17    0  0   0  24
##   AO      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   E       0   0   5  0  0  0   0  0   0  0   0   0  0 12    0  0   0   4
##   E10+    0   0   4  0  0  0   0  0   0  0   0   0  0  2    0  0   0   4
##   EC      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   K-A     0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   M       0   0   1  0  0  0   0  0   0  0   0   0  0 14    0  0   0   1
##   RP      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   T       0   0   4  0  0  0   0  0   0  0   0   0  0  9    0  0   0   5
##       Platform
## Rating PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB XOne
##         61   0  69   0   0    0    0   0    2  0    1  0   16
##   AO     0   0   0   0   0    0    0   0    0  0    0  0    0
##   E     20   0   0   0   0    0    0   0    2  0    5  0   18
##   E10+  16   0   4   0   0    0    0   1    7  0    4  0    8
##   EC     0   0   0   0   0    0    0   0    0  0    0  0    0
##   K-A    0   0   0   0   0    0    0   0    0  0    0  0    0
##   M     32   0   2   0   0    0    0   0    0  0    1  0   27
##   RP     0   0   0   0   0    0    0   0    0  0    0  0    0
##   T     35   0  10   0   0    0    0   0    3  0    2  0   18
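Two-way tables such as these are often easier to read with totals and row shares added. The sketch below shows `addmargins()` and `prop.table()` on a toy data frame; the rows are made up and only the column names follow the data set.

```r
# Toy data with hypothetical rows
toy <- data.frame(
  Genre  = c("Action", "Action", "Sports", "Sports", "Action"),
  Rating = c("M", "T", "E", "E", "M")
)

tab <- xtabs(~ Genre + Rating, data = toy)

# Add row and column totals
tab_margins <- addmargins(tab)

# Share of each rating within a genre (each row sums to 1)
row_share <- prop.table(tab, margin = 1)

print(tab_margins)
print(round(row_share, 2))
```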

-2) Viewing the subset-2 containing data of year 2015

vgame2015<- vgame1.df[which(vgame1.df$Year_of_Release=="2015"), ]
View(vgame2015)

-Dimensions of subset-2( year 2015)

dim(vgame2015)
## [1] 606  16

-Summarizing the subset-2 (year 2015)

library(psych)
describe(vgame2015)
##                  vars   n    mean      sd  median trimmed     mad   min
## Name*               1 606 5816.44 3367.04 5832.00 5832.54 4175.00  4.00
## Platform*           2 606   19.14    8.21   19.00   19.67    2.97  3.00
## Year_of_Release*    3 606   36.00    0.00   36.00   36.00    0.00 36.00
## Genre*              4 606    5.50    3.88    3.00    5.10    1.48  2.00
## Publisher*          5 606  309.81  178.25  354.00  316.18  231.29  7.00
## NA_Sales            6 606    0.18    0.49    0.02    0.07    0.03  0.00
## EU_Sales            7 606    0.16    0.48    0.02    0.06    0.03  0.00
## JP_Sales            8 606    0.06    0.19    0.01    0.02    0.01  0.00
## Other_Sales         9 606    0.05    0.16    0.01    0.02    0.01  0.00
## Global_Sales       10 606    0.44    1.10    0.09    0.20    0.10  0.01
## Critic_Score       11 225   72.87   12.44   74.00   74.05   10.38 19.00
## Critic_Count       12 225   32.31   24.35   26.00   29.15   22.24  4.00
## User_Score*        13 606   38.71   36.44   42.00   36.61   60.79  1.00
## User_Count         14 297  393.37 1166.83   65.00  139.76   78.58  4.00
## Developer*         15 606  522.15  603.88  180.00  453.34  265.39  1.00
## Rating*            16 606    3.64    3.13    3.00    3.31    2.97  1.00
##                       max    range  skew kurtosis     se
## Name*            11534.00 11530.00 -0.02    -1.20 136.78
## Platform*           31.00    28.00 -0.56    -0.13   0.33
## Year_of_Release*    36.00     0.00   NaN      NaN   0.00
## Genre*              13.00    11.00  0.58    -1.28   0.16
## Publisher*         574.00   567.00 -0.28    -1.23   7.24
## NA_Sales             6.03     6.03  6.17    51.98   0.02
## EU_Sales             6.12     6.12  7.64    76.86   0.02
## JP_Sales             2.79     2.79  9.42   109.84   0.01
## Other_Sales          2.38     2.38  8.20    93.11   0.01
## Global_Sales        14.63    14.62  6.40    58.93   0.04
## Critic_Score        96.00    77.00 -1.36     3.33   0.83
## Critic_Count       103.00    99.00  1.00     0.21   1.62
## User_Score*         97.00    96.00  0.12    -1.68   1.48
## User_Count       10665.00 10661.00  6.02    42.34  67.71
## Developer*        1677.00  1676.00  0.67    -1.15  24.53
## Rating*              9.00     8.00  0.73    -1.09   0.13

-One-way Contingency tables of subset-2( year 2015)

mytable1<- with(vgame2015,table(Genre))
mytable1
## Genre
##                    Action    Adventure     Fighting         Misc 
##            0          253           54           21           39 
##     Platform       Puzzle       Racing Role-Playing      Shooter 
##           13            6           18           78           34 
##   Simulation       Sports     Strategy 
##           15           59           16
mytable2<- with(vgame2015,table(Platform))
mytable2
## Platform
## 2600  3DO  3DS   DC   DS   GB  GBA   GC  GEN   GG  N64  NES   NG   PC PCFX 
##    0    0   86    0    0    0    0    0    0    0    0    0    0   50    0 
##   PS  PS2  PS3  PS4  PSP  PSV  SAT  SCD SNES TG16  Wii WiiU   WS X360   XB 
##    0    0   73  137    3  110    0    0    0    0    4   28    0   35    0 
## XOne 
##   80
mytable3<- with(vgame2015,table(Rating))
mytable3
## Rating
##        AO    E E10+   EC  K-A    M   RP    T 
##  291    0   87   51    0    0   71    0  106

-Two-way Contingency tables of subset-2(year 2015)

mytable<-xtabs(~ Genre+Rating,data=vgame2015)
mytable
##               Rating
## Genre               AO   E E10+  EC K-A   M  RP   T
##                  0   0   0    0   0   0   0   0   0
##   Action       132   0  19   29   0   0  35   0  38
##   Adventure     40   0   0    1   0   0   7   0   6
##   Fighting       7   0   0    0   0   0   3   0  11
##   Misc          19   0   3    8   0   0   1   0   8
##   Platform       2   0  10    1   0   0   0   0   0
##   Puzzle         3   0   2    1   0   0   0   0   0
##   Racing         7   0  10    1   0   0   0   0   0
##   Role-Playing  45   0   1    2   0   0   9   0  21
##   Shooter       12   0   0    1   0   0  16   0   5
##   Simulation     6   0   4    1   0   0   0   0   4
##   Sports         8   0  38    5   0   0   0   0   8
##   Strategy      10   0   0    1   0   0   0   0   5
mytable11<-xtabs(~ Genre+Platform,data=vgame2015)
mytable11
##               Platform
## Genre          2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS
##                   0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Action          0   0  39  0  0  0   0  0   0  0   0   0  0 16    0  0
##   Adventure       0   0   4  0  0  0   0  0   0  0   0   0  0  2    0  0
##   Fighting        0   0   2  0  0  0   0  0   0  0   0   0  0  1    0  0
##   Misc            0   0  10  0  0  0   0  0   0  0   0   0  0  2    0  0
##   Platform        0   0   4  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Puzzle          0   0   4  0  0  0   0  0   0  0   0   0  0  0    0  0
##   Racing          0   0   0  0  0  0   0  0   0  0   0   0  0  3    0  0
##   Role-Playing    0   0  15  0  0  0   0  0   0  0   0   0  0  3    0  0
##   Shooter         0   0   0  0  0  0   0  0   0  0   0   0  0  5    0  0
##   Simulation      0   0   3  0  0  0   0  0   0  0   0   0  0  6    0  0
##   Sports          0   0   1  0  0  0   0  0   0  0   0   0  0  4    0  0
##   Strategy        0   0   4  0  0  0   0  0   0  0   0   0  0  8    0  0
##               Platform
## Genre          PS2 PS3 PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB
##                  0   0   0   0   0   0   0    0    0   0    0  0    0  0
##   Action         0  35  52   3  54   0   0    0    0   3   11  0   13  0
##   Adventure      0   7  10   0  21   0   0    0    0   0    0  0    4  0
##   Fighting       0   5   9   0   0   0   0    0    0   0    0  0    1  0
##   Misc           0   3   4   0   5   0   0    0    0   1    7  0    2  0
##   Platform       0   0   2   0   2   0   0    0    0   0    4  0    1  0
##   Puzzle         0   0   1   0   0   0   0    0    0   0    1  0    0  0
##   Racing         0   2   5   0   1   0   0    0    0   0    0  0    1  0
##   Role-Playing   0   5  25   0  20   0   0    0    0   0    3  0    0  0
##   Shooter        0   3  11   0   0   0   0    0    0   0    1  0    3  0
##   Simulation     0   1   2   0   1   0   0    0    0   0    0  0    1  0
##   Sports         0  12  15   0   4   0   0    0    0   0    1  0    9  0
##   Strategy       0   0   1   0   2   0   0    0    0   0    0  0    0  0
##               Platform
## Genre          XOne
##                   0
##   Action         27
##   Adventure       6
##   Fighting        3
##   Misc            5
##   Platform        0
##   Puzzle          0
##   Racing          6
##   Role-Playing    7
##   Shooter        11
##   Simulation      1
##   Sports         13
##   Strategy        1
mytable12<-xtabs(~ Rating+Platform,data=vgame2015)
mytable12
##       Platform
## Rating 2600 3DO 3DS DC DS GB GBA GC GEN GG N64 NES NG PC PCFX PS PS2 PS3
##           0   0  57  0  0  0   0  0   0  0   0   0  0 14    0  0   0  39
##   AO      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   E       0   0  10  0  0  0   0  0   0  0   0   0  0  9    0  0   0   9
##   E10+    0   0  13  0  0  0   0  0   0  0   0   0  0  3    0  0   0   5
##   EC      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   K-A     0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   M       0   0   0  0  0  0   0  0   0  0   0   0  0 15    0  0   0   4
##   RP      0   0   0  0  0  0   0  0   0  0   0   0  0  0    0  0   0   0
##   T       0   0   6  0  0  0   0  0   0  0   0   0  0  9    0  0   0  16
##       Platform
## Rating PS4 PSP PSV SAT SCD SNES TG16 Wii WiiU WS X360 XB XOne
##         52   3  82   0   0    0    0   1    7  0   10  0   26
##   AO     0   0   0   0   0    0    0   0    0  0    0  0    0
##   E     18   0   4   0   0    0    0   2   11  0   11  0   13
##   E10+   8   0   3   0   0    0    0   1    7  0    4  0    7
##   EC     0   0   0   0   0    0    0   0    0  0    0  0    0
##   K-A    0   0   0   0   0    0    0   0    0  0    0  0    0
##   M     21   0   6   0   0    0    0   0    1  0    4  0   20
##   RP     0   0   0   0   0    0    0   0    0  0    0  0    0
##   T     38   0  15   0   0    0    0   0    2  0    6  0   14

-> Box plots of different variables independently, showing the comparison between 2016 & 2015

par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$NA_Sales, 
        main="Boxplot of  North America sales in 2016",
        col=c("yellow"),
        horizontal=TRUE,
        xlab="NA sales" ))
with(vgame2015,boxplot(vgame2015$NA_Sales, 
                       main="Boxplot of  North America sales in 2015",
                       col=c("yellow"),
                       horizontal=TRUE,
                       xlab="NA sales" ))

par(mfrow=c(2,1))
# ylim (not xlim) limits the sales axis of a boxplot, even with horizontal=TRUE;
# sales above 3 fall outside the plotted range
with(vgame2016, boxplot(vgame2016$EU_Sales, 
        main="Boxplot of Europe sales in 2016",
        col=c("yellow"),ylim=c(0,3),
        horizontal=TRUE,
        xlab="EU sales" ))
with(vgame2015, boxplot(vgame2015$EU_Sales, 
                        main="Boxplot of Europe sales in 2015",
                        col=c("yellow"),ylim=c(0,3),
                        horizontal=TRUE,
                        xlab="EU sales" ))

par(mfrow=c(2,1))
with(vgame2016, boxplot(vgame2016$JP_Sales, 
        main="Boxplot of  Japan sales in 2016",
        col=c("yellow"),
        horizontal=TRUE,
        xlab="JP sales" ))
with(vgame2015, boxplot(vgame2015$JP_Sales, 
                        main="Boxplot of  Japan sales in 2015",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="JP sales" ))

par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$Global_Sales, 
        main="Boxplot of  Global sales in 2016",
        col=c("yellow"),
        horizontal=TRUE,
        xlab="Global sales" ))
with(vgame2015,boxplot(vgame2015$Global_Sales, 
                       main="Boxplot of  Global sales in 2015",
                       col=c("yellow"),
                       horizontal=TRUE,
                       xlab="Global sales" ))

par(mfrow=c(2,1))
with(vgame2016,boxplot(vgame2016$Other_Sales, 
        main="Boxplot of Other sales in 2016",
        col=c("yellow"),
        horizontal=TRUE,
        xlab="Other sales" ))
with(vgame2015,boxplot(vgame2015$Other_Sales, 
                       main="Boxplot of  Other sales in 2015",
                       col=c("yellow"),
                       horizontal=TRUE,
                       xlab="Other sales" ))
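The numbers behind a pair of box plots can also be compared directly. The sketch below computes the five-number summary and the median per year with `tapply()`; the sales values are made up and only the column names follow the data set.

```r
# Toy data with hypothetical sales figures
toy <- data.frame(
  Year         = c(2015, 2015, 2015, 2016, 2016, 2016),
  Global_Sales = c(0.10, 0.40, 1.00, 0.05, 0.20, 0.60)
)

# Minimum, lower hinge, median, upper hinge, maximum for each year
five_num <- tapply(toy$Global_Sales, toy$Year, fivenum)

# Median sales per year
medians <- tapply(toy$Global_Sales, toy$Year, median)

print(five_num)
print(medians)
```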

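-Changing the User_Score variable from a factor vector to a numeric vector for year 2016. Note that as.integer() applied to a factor returns the internal level codes (1 to 97 here), not the scores themselves.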

str(vgame2016)
## 'data.frame':    502 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 3121 7423 10726 1238 727 10403 3121 1238 3053 727 ...
##  $ Platform       : Factor w/ 31 levels "2600","3DO","3DS",..: 19 3 19 19 19 19 31 31 19 31 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 37 37 37 37 37 37 37 37 37 37 ...
##  $ Genre          : Factor w/ 13 levels "","Action","Adventure",..: 12 9 10 10 10 10 12 10 2 10 ...
##  $ Publisher      : Factor w/ 582 levels "10TACLE Studios",..: 140 371 467 17 140 536 140 17 536 140 ...
##  $ NA_Sales       : num  0.66 2.98 1.85 1.61 1.1 1.35 0.43 1.46 0.6 1.28 ...
##  $ EU_Sales       : num  5.75 1.45 2.5 2 2.15 1.7 2.05 0.74 1.25 0.77 ...
##  $ JP_Sales       : num  0.08 2.26 0.19 0.15 0.21 0.15 0 0 0.06 0 ...
##  $ Other_Sales    : num  1.11 0.45 0.85 0.71 0.61 0.6 0.17 0.22 0.35 0.2 ...
##  $ Global_Sales   : num  7.59 7.14 5.38 4.46 4.08 3.8 2.65 2.42 2.26 2.25 ...
##  $ Critic_Score   : int  85 NA 93 77 88 80 84 78 76 87 ...
##  $ Critic_Count   : int  41 NA 113 82 31 64 50 17 91 37 ...
##  $ User_Score     : Factor w/ 97 levels "","0","0.2","0.3",..: 49 1 78 33 83 69 54 30 62 81 ...
##  $ User_Count     : int  398 NA 7064 1129 809 2219 201 290 635 440 ...
##  $ Developer      : Factor w/ 1697 levels "","10tacle Studios",..: 455 1 1002 733 440 910 455 733 1561 440 ...
##  $ Rating         : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 9 7 7 7 3 7 7 7 ...
vgame2016$User_Score<-as.integer(vgame2016$User_Score)
str(vgame2016$User_Score)
##  int [1:502] 49 1 78 33 83 69 54 30 62 81 ...
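Calling `as.integer()` on a factor yields the position of each value among the factor's levels, which is why the converted column ranges from 1 to 97 instead of looking like Metacritic user scores. To recover the scores, the factor has to be converted to character first. A minimal sketch on a small made-up factor:

```r
# Hypothetical user scores stored as a factor, with one empty entry
user_score <- factor(c("8.1", "", "7.5", "8.1", "3"))

# Level codes: positions in levels(user_score), not the scores
codes <- as.integer(user_score)

# Actual values: convert to character first; the empty string becomes NA
scores <- as.numeric(as.character(user_score))

print(levels(user_score))
print(codes)
print(scores)
```

Applied to the data set, this would be `as.numeric(as.character(vgame2016$User_Score))`; any non-numeric level would turn into `NA`.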

-Changing the User_Score variable from a factor vector to a numeric vector for year 2015. As before, as.integer() on a factor returns level codes rather than the scores.

str(vgame2015)
## 'data.frame':    606 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 1234 3120 9144 1234 2986 10729 3935 9043 2986 10206 ...
##  $ Platform       : Factor w/ 31 levels "2600","3DO","3DS",..: 19 19 19 31 19 19 31 27 31 19 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 36 36 36 36 36 36 36 36 36 36 ...
##  $ Genre          : Factor w/ 13 levels "","Action","Adventure",..: 10 12 10 10 9 2 10 10 9 9 ...
##  $ Publisher      : Factor w/ 582 levels "10TACLE Studios",..: 17 140 140 17 65 467 330 371 65 354 ...
##  $ NA_Sales       : num  6.03 1.12 2.99 4.59 2.53 2.07 2.78 1.54 2.51 1.02 ...
##  $ EU_Sales       : num  5.86 6.12 3.49 2.11 3.27 1.71 1.27 1.18 1.32 2.13 ...
##  $ JP_Sales       : num  0.36 0.06 0.22 0.01 0.24 0.08 0.03 1.46 0.01 0.23 ...
##  $ Other_Sales    : num  2.38 1.28 1.28 0.68 1.13 0.76 0.41 0.26 0.38 0.59 ...
##  $ Global_Sales   : num  14.63 8.57 7.98 7.39 7.16 ...
##  $ Critic_Score   : int  NA 82 NA NA 87 86 84 81 88 92 ...
##  $ Critic_Count   : int  NA 42 NA NA 58 78 101 88 39 79 ...
##  $ User_Score     : Factor w/ 97 levels "","0","0.2","0.3",..: 1 42 1 1 64 80 63 84 61 91 ...
##  $ User_Count     : int  NA 896 NA NA 4228 1264 2438 1184 1749 10179 ...
##  $ Developer      : Factor w/ 1697 levels "","10tacle Studios",..: 1 452 1 1 176 233 20 1035 176 282 ...
##  $ Rating         : Factor w/ 9 levels "","AO","E","E10+",..: 1 3 1 1 7 9 9 4 7 7 ...
vgame2015$User_Score<-as.integer(vgame2015$User_Score)
str(vgame2015$User_Score)
##  int [1:606] 1 42 1 1 64 80 63 84 61 91 ...
par(mfrow=c(2,2))
with(vgame2016, boxplot(vgame2016$Critic_Score, 
                        main="Boxplot of  Critic score in 2016",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="Critic score" ))
with(vgame2016, boxplot(vgame2016$User_Score, 
                        main="Boxplot of  User score in 2016",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="User score" ))
with(vgame2015, boxplot(vgame2015$Critic_Score, 
                        main="Boxplot of  Critic score in 2015",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="Critic score" ))
with(vgame2015, boxplot(vgame2015$User_Score, 
                        main="Boxplot of  User score in 2015",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="User score" ))

par(mfrow=c(2,2))
with(vgame2016, boxplot(vgame2016$Critic_Count, 
                        main="Boxplot of  Critic count in 2016",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="Critic count" ))
with(vgame2016, boxplot(vgame2016$User_Count, 
                        main="Boxplot of  User count in 2016",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="User count" ))
with(vgame2015, boxplot(vgame2015$Critic_Count, 
                        main="Boxplot of  Critic count in 2015",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="Critic count" ))
with(vgame2015, boxplot(vgame2015$User_Count, 
                        main="Boxplot of  User count in 2015",
                        col=c("yellow"),
                        horizontal=TRUE,
                        xlab="User count" ))

-> Boxplots of variables taken pair-wise, compared between years 2016 & 2015

par(mfrow=c(2,1))
# axis() is called after each boxplot(); passed as an extra argument to with() it was never evaluated
boxplot(Global_Sales ~ Genre, data=vgame2016,
        horizontal=TRUE, yaxt="n",
        ylab="Genre", xlab="Global sales", col="yellow",
        main="Comparison of Global sales based on Genre of the video game in 2016")
axis(side=2, at=seq_along(levels(vgame2016$Genre)),
     labels=levels(vgame2016$Genre), las=1, cex.axis=0.6)
boxplot(Global_Sales ~ Genre, data=vgame2015,
        horizontal=TRUE, yaxt="n",
        ylab="Genre", xlab="Global sales", col="yellow",
        main="Comparison of Global sales based on Genre of the video game in 2015")
axis(side=2, at=seq_along(levels(vgame2015$Genre)),
     labels=levels(vgame2015$Genre), las=1, cex.axis=0.6)

par(mfrow=c(2,1))
# axis() is called after each boxplot(); passed as an extra argument to with() it was never evaluated
boxplot(Global_Sales ~ Rating, data=vgame2016,
        horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="Global sales", col="yellow",
        main="Comparison of Global sales based on Rating in 2016")
axis(side=2, at=seq_along(levels(vgame2016$Rating)),
     labels=levels(vgame2016$Rating), las=1, cex.axis=0.6)
boxplot(Global_Sales ~ Rating, data=vgame2015,
        horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="Global sales", col="yellow",
        main="Comparison of Global sales based on Rating in 2015")
axis(side=2, at=seq_along(levels(vgame2015$Rating)),
     labels=levels(vgame2015$Rating), las=1, cex.axis=0.6)

par(mfrow=c(2,1))
# axis() is called after each boxplot(); passed as an extra argument to with() it was never evaluated
boxplot(Critic_Score ~ Rating, data=vgame2016, horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="Critic score", col="yellow",
        main="Comparison of Critic score based on Rating in 2016")
axis(side=2, at=seq_along(levels(vgame2016$Rating)),
     labels=levels(vgame2016$Rating), las=1, cex.axis=0.6)
boxplot(Critic_Score ~ Rating, data=vgame2015, horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="Critic score", col="yellow",
        main="Comparison of Critic score based on Rating in 2015")
axis(side=2, at=seq_along(levels(vgame2015$Rating)),
     labels=levels(vgame2015$Rating), las=1, cex.axis=0.6)

par(mfrow=c(2,1))
# axis() must be a separate call: with() evaluates only its expr argument
boxplot(User_Score ~ Rating, data=vgame2016, horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="User score", col="yellow",
        main="Comparison of User score based on Rating in 2016")
axis(side=2, at=1:12)
boxplot(User_Score ~ Rating, data=vgame2015, horizontal=TRUE, yaxt="n",
        ylab="Rating", xlab="User score", col="yellow",
        main="Comparison of User score based on Rating in 2015")
axis(side=2, at=1:12)
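
The boxplots above suppress the default axis with yaxt="n" and request ticks at numeric positions, which leaves the groups unnamed. One possible way to label each box with its factor level is sketched below. The data frame `toy` and its values are made up for illustration and are not taken from the dataset.

```r
# Toy data standing in for one year's subset (values are invented).
toy <- data.frame(
  Genre = factor(c("Action", "Action", "Racing", "Racing", "Sports", "Sports")),
  Global_Sales = c(1.2, 0.4, 2.5, 0.9, 0.3, 0.7)
)

# Draw horizontal boxplots without the default group axis.
# boxplot() invisibly returns a list whose $names holds the group labels.
bp <- boxplot(Global_Sales ~ Genre, data = toy, horizontal = TRUE, yaxt = "n",
              xlab = "Global sales", ylab = "", col = "yellow")

# Add the group axis by hand: one tick per group, labelled with the level name.
# las = 1 keeps the labels horizontal so they stay readable.
axis(side = 2, at = seq_along(bp$names), labels = bp$names, las = 1)
```

Taking the labels from `bp$names` keeps the ticks in step with the boxes actually drawn, whatever the number of levels.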

-> Histograms of different variables correlated pair-wise in year 2016

library(lattice)
histogram(~Genre | Rating, data=vgame2016)

histogram(~Genre | Platform, data=vgame2016)

histogram(~Platform | Rating, data=vgame2016)

-> Histograms of different variables correlated pair-wise in year 2015

library(lattice)
histogram(~Genre | Rating, data=vgame2015)

histogram(~Genre | Platform, data=vgame2015)

histogram(~Platform | Rating, data=vgame2015)

-> Scatterplots of variables showing comparison in sales in years 2016 & 2015

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(Global_Sales~ Critic_Score, data=vgame2016, spread=FALSE,
            smoother.args=list(lty=2),pch=19,
            main="Scatterplot of Global Sales vs. Critic score in 2016",
            xlab="Critic score",
            ylab="Global sales ",cex=0.6)

scatterplot(Global_Sales ~ User_Score, data=vgame2016, spread=FALSE,
            smoother.args=list(lty=2),pch=19,xlim=c(1,100),
            main="Scatterplot of Global sales vs. User score in 2016",
            xlab="User Score",
            ylab="Global sales ",cex=0.6)

scatterplot(Global_Sales~ Critic_Score, data=vgame2015, spread=FALSE,
            smoother.args=list(lty=2),pch=19,
            main="Scatterplot of Global sales vs. Critic score in 2015",
            xlab="Critic score",
            ylab="Global sales ",cex=0.6)

scatterplot(Global_Sales ~ User_Score, data=vgame2015, spread=FALSE,
            smoother.args=list(lty=2),pch=19,xlim=c(1,100),
            main="Scatterplot of Global sales vs. User score in 2015",
            xlab="User score",
            ylab="Global sales ",cex=0.6)

-> Converting the Rating variable of year 2016 from a factor to a numeric vector (the values are the factor's level codes)

str(vgame2016)
## 'data.frame':    502 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 3121 7423 10726 1238 727 10403 3121 1238 3053 727 ...
##  $ Platform       : Factor w/ 31 levels "2600","3DO","3DS",..: 19 3 19 19 19 19 31 31 19 31 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 37 37 37 37 37 37 37 37 37 37 ...
##  $ Genre          : Factor w/ 13 levels "","Action","Adventure",..: 12 9 10 10 10 10 12 10 2 10 ...
##  $ Publisher      : Factor w/ 582 levels "10TACLE Studios",..: 140 371 467 17 140 536 140 17 536 140 ...
##  $ NA_Sales       : num  0.66 2.98 1.85 1.61 1.1 1.35 0.43 1.46 0.6 1.28 ...
##  $ EU_Sales       : num  5.75 1.45 2.5 2 2.15 1.7 2.05 0.74 1.25 0.77 ...
##  $ JP_Sales       : num  0.08 2.26 0.19 0.15 0.21 0.15 0 0 0.06 0 ...
##  $ Other_Sales    : num  1.11 0.45 0.85 0.71 0.61 0.6 0.17 0.22 0.35 0.2 ...
##  $ Global_Sales   : num  7.59 7.14 5.38 4.46 4.08 3.8 2.65 2.42 2.26 2.25 ...
##  $ Critic_Score   : int  85 NA 93 77 88 80 84 78 76 87 ...
##  $ Critic_Count   : int  41 NA 113 82 31 64 50 17 91 37 ...
##  $ User_Score     : int  49 1 78 33 83 69 54 30 62 81 ...
##  $ User_Count     : int  398 NA 7064 1129 809 2219 201 290 635 440 ...
##  $ Developer      : Factor w/ 1697 levels "","10tacle Studios",..: 455 1 1002 733 440 910 455 733 1561 440 ...
##  $ Rating         : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 9 7 7 7 3 7 7 7 ...
vgame2016$Rating<-as.numeric(vgame2016$Rating)
str(vgame2016$Rating)
##  num [1:502] 3 1 9 7 7 7 3 7 7 7 ...

-> Converting the Rating variable of year 2015 from a factor to a numeric vector (the values are the factor's level codes)

str(vgame2015)
## 'data.frame':    606 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 1234 3120 9144 1234 2986 10729 3935 9043 2986 10206 ...
##  $ Platform       : Factor w/ 31 levels "2600","3DO","3DS",..: 19 19 19 31 19 19 31 27 31 19 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 36 36 36 36 36 36 36 36 36 36 ...
##  $ Genre          : Factor w/ 13 levels "","Action","Adventure",..: 10 12 10 10 9 2 10 10 9 9 ...
##  $ Publisher      : Factor w/ 582 levels "10TACLE Studios",..: 17 140 140 17 65 467 330 371 65 354 ...
##  $ NA_Sales       : num  6.03 1.12 2.99 4.59 2.53 2.07 2.78 1.54 2.51 1.02 ...
##  $ EU_Sales       : num  5.86 6.12 3.49 2.11 3.27 1.71 1.27 1.18 1.32 2.13 ...
##  $ JP_Sales       : num  0.36 0.06 0.22 0.01 0.24 0.08 0.03 1.46 0.01 0.23 ...
##  $ Other_Sales    : num  2.38 1.28 1.28 0.68 1.13 0.76 0.41 0.26 0.38 0.59 ...
##  $ Global_Sales   : num  14.63 8.57 7.98 7.39 7.16 ...
##  $ Critic_Score   : int  NA 82 NA NA 87 86 84 81 88 92 ...
##  $ Critic_Count   : int  NA 42 NA NA 58 78 101 88 39 79 ...
##  $ User_Score     : int  1 42 1 1 64 80 63 84 61 91 ...
##  $ User_Count     : int  NA 896 NA NA 4228 1264 2438 1184 1749 10179 ...
##  $ Developer      : Factor w/ 1697 levels "","10tacle Studios",..: 1 452 1 1 176 233 20 1035 176 282 ...
##  $ Rating         : Factor w/ 9 levels "","AO","E","E10+",..: 1 3 1 1 7 9 9 4 7 7 ...
vgame2015$Rating<-as.numeric(vgame2015$Rating)
str(vgame2015$Rating)
##  num [1:606] 1 3 1 1 7 9 9 4 7 7 ...
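
Because as.numeric() on a factor returns level codes, the numbers in Rating only make sense alongside the original level names. The sketch below shows one way to keep that mapping. The four levels used here are a small illustrative subset, and `rating`, `codes` and `lookup` are hypothetical names.

```r
# Toy factor with an empty-string level, as in the Rating column.
rating <- factor(c("E", "", "T", "M", "M", "E"))

# Levels are sorted, so "" comes first and gets code 1.
levels(rating)

# as.numeric() returns the position of each value in levels(rating).
codes <- as.numeric(rating)

# Keep a lookup table so the codes can be translated back later.
lookup <- data.frame(code = seq_along(levels(rating)),
                     level = levels(rating))

# Recover the original labels from the codes.
recovered <- lookup$level[match(codes, lookup$code)]
```

Since the codes follow the sort order of the labels rather than any ordering of the ratings themselves, a larger code does not imply a more restrictive rating.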

->Correlation Matrix visualization

library(corrplot)    
## corrplot 0.84 loaded
corrplot(corr=cor(vgame2016[ ,6:14 ], use="complete.obs"), 
         method ="ellipse", main="correlation matrix of variables in 2016")

library(corrplot)    
corrplot(corr=cor(vgame2015[ ,6:14 ], use="complete.obs"), 
         method ="ellipse" , main="correlation matrix of variables in 2015")
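
The use="complete.obs" argument matters here because Critic_Score, Critic_Count and User_Count contain NA values. A minimal sketch of its effect on a toy data frame follows; the values are illustrative and `toy` is a hypothetical name.

```r
# Toy numeric columns with missing values in one row.
toy <- data.frame(
  Global_Sales = c(7.6, 7.1, 5.4, 4.5, 4.1, 3.8),
  Critic_Score = c(85, NA, 93, 77, 88, 80),
  User_Count   = c(398, NA, 7064, 1129, 809, 2219)
)

# Default behaviour: a missing value makes the affected correlations NA.
cor_all <- cor(toy)

# complete.obs: rows containing any NA are dropped before computing.
cor_complete <- cor(toy, use = "complete.obs")

# Number of rows that actually contribute to cor_complete.
n_used <- sum(complete.cases(toy))
```

Every correlation in the matrix is then computed on the same reduced set of rows, so it is worth checking `n_used` before reading much into the plot.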

->Corrgram

library(corrgram)
corrgram(vgame2016, order=FALSE, 
         lower.panel=panel.shade,
         upper.panel=panel.pie, 
         diag.panel=panel.minmax,
         text.panel=panel.txt,
         main="Corrgram of all the variables in 2016")

library(corrgram)
corrgram(vgame2015, order=FALSE, 
         lower.panel=panel.shade,
         upper.panel=panel.pie, 
         diag.panel=panel.minmax,
         text.panel=panel.txt,
         main="Corrgram of all the variables in 2015")

->Scatterplot matrix

library(car)
scatterplotMatrix(formula = ~  Critic_Score + Critic_Count +
                    User_Score+ User_Count +Global_Sales , cex=0.6,
                  spread=FALSE, smoother.args=list(lty=2),pch=19,
                  data=vgame2016, diagonal="histogram", 
                  main="scatterplot matrix in 2016")

scatterplotMatrix(formula = ~  Critic_Score + Critic_Count +
                    User_Score+ User_Count +Global_Sales , cex=0.6,
                  spread=FALSE, smoother.args=list(lty=2),pch=19,
                  data=vgame2015, diagonal="histogram", 
                  main="scatterplot matrix in 2015")

-> Pearson’s chi-squared test not applied due to lack of definite categorical variables

->Paired (dependent) t-tests are carried out as follows to assess statistical significance

->NULL HYPOTHESIS: Global sales is independent of Critic score, Critic count, NA sales, EU sales, JP sales, Other sales and User count

attach(vgame2016)
t.test(Critic_Score,Global_Sales,paired=TRUE, data=vgame2016)
## 
##  Paired t-test
## 
## data:  Critic_Score and Global_Sales
## t = 96.234, df = 231, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  71.26568 74.24484
## sample estimates:
## mean of the differences 
##                72.75526
t.test(Critic_Count,Global_Sales,paired=TRUE, data=vgame2016)
## 
##  Paired t-test
## 
## data:  Critic_Count and Global_Sales
## t = 19.309, df = 231, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  26.80807 32.90072
## sample estimates:
## mean of the differences 
##                 29.8544
t.test(Other_Sales,Global_Sales,paired=TRUE, data=vgame2016)
## 
##  Paired t-test
## 
## data:  Other_Sales and Global_Sales
## t = -8.4585, df = 501, p-value = 2.989e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2838165 -0.1768209
## sample estimates:
## mean of the differences 
##              -0.2303187

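-Since the p-value is very low (<0.001) in each test, the null hypothesis of the paired t-test, that the mean difference between the two variables is zero, is rejected.

-Note that a paired t-test compares means rather than testing independence. Critic score and Critic count are measured on different scales from Global sales, so a non-zero mean difference does not by itself show that the variables are related. A test of association, such as a correlation test, is needed to support a claim of dependence.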
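
A paired t-test compares means, so on variables measured on different scales it can reject even when the two are unrelated. The sketch below illustrates this on synthetic data in which `x` and `y` are generated independently. The names, scales and seed are assumptions chosen for illustration, and `cor.test()` is shown as one possible test of association rather than as the method used in this report.

```r
set.seed(1)

# Two independently generated variables on very different scales.
x <- rnorm(50, mean = 70, sd = 10)   # e.g. a score on a 0-100 scale
y <- rnorm(50, mean = 1, sd = 0.5)   # e.g. sales in millions of units

# Paired t-test: null hypothesis is mean(x - y) == 0.
paired <- t.test(x, y, paired = TRUE)

# Correlation test: null hypothesis is that the true correlation is 0.
assoc <- cor.test(x, y)

paired$p.value   # tiny, driven purely by the difference in scale
assoc$p.value    # the quantity relevant to a claim of dependence
assoc$estimate   # sample Pearson correlation
```

Applied to the data, the analogous call would be `cor.test(vgame2016$Critic_Score, vgame2016$Global_Sales)`, which drops incomplete pairs before testing.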

->Regression Model

Model-1

fit1<-lm(Global_Sales~Platform+Genre+Critic_Score+Critic_Count+
           User_Score+User_Count+Rating, data=vgame1.df)
summary(fit1)
## 
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Critic_Score + 
##     Critic_Count + User_Score + User_Count + Rating, data = vgame1.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.540 -0.581 -0.165  0.288 79.639 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -7.998e-01  1.273e+00  -0.628  0.52978    
## PlatformDC        -4.174e-01  5.075e-01  -0.822  0.41084    
## PlatformDS         2.744e-01  1.634e-01   1.680  0.09308 .  
## PlatformGBA        1.617e-02  1.857e-01   0.087  0.93061    
## PlatformGC        -2.011e-01  1.721e-01  -1.168  0.24282    
## PlatformPC        -9.402e-01  1.626e-01  -5.782 7.69e-09 ***
## PlatformPS         9.016e-01  2.055e-01   4.388 1.16e-05 ***
## PlatformPS2        2.107e-01  1.534e-01   1.373  0.16967    
## PlatformPS3        1.318e-03  1.561e-01   0.008  0.99326    
## PlatformPS4       -3.834e-01  1.823e-01  -2.103  0.03551 *  
## PlatformPSP       -8.528e-02  1.679e-01  -0.508  0.61154    
## PlatformPSV       -3.645e-01  2.158e-01  -1.689  0.09130 .  
## PlatformWii        8.059e-01  1.626e-01   4.957 7.33e-07 ***
## PlatformWiiU      -3.091e-01  2.340e-01  -1.321  0.18658    
## PlatformX360      -1.544e-01  1.555e-01  -0.993  0.32068    
## PlatformXB        -3.696e-01  1.628e-01  -2.270  0.02323 *  
## PlatformXOne      -1.655e-01  2.004e-01  -0.826  0.40872    
## GenreAdventure    -2.378e-01  1.179e-01  -2.017  0.04370 *  
## GenreFighting     -3.015e-02  1.030e-01  -0.293  0.76981    
## GenreMisc          2.648e-01  1.025e-01   2.583  0.00980 ** 
## GenrePlatform      1.309e-02  1.039e-01   0.126  0.89975    
## GenrePuzzle       -4.225e-01  1.737e-01  -2.433  0.01500 *  
## GenreRacing        5.406e-02  9.150e-02   0.591  0.55469    
## GenreRole-Playing -2.635e-01  8.085e-02  -3.260  0.00112 ** 
## GenreShooter      -8.903e-03  7.592e-02  -0.117  0.90665    
## GenreSimulation    1.445e-01  1.142e-01   1.265  0.20575    
## GenreSports       -1.868e-02  8.676e-02  -0.215  0.82954    
## GenreStrategy     -2.766e-01  1.193e-01  -2.318  0.02049 *  
## Critic_Score       2.320e-02  2.207e-03  10.510  < 2e-16 ***
## Critic_Count       2.156e-02  1.466e-03  14.705  < 2e-16 ***
## User_Score0.6     -3.128e-01  2.155e+00  -0.145  0.88464    
## User_Score0.7     -1.040e+00  2.151e+00  -0.483  0.62877    
## User_Score0.9      2.648e-01  2.152e+00   0.123  0.90209    
## User_Score1       -2.908e-01  1.756e+00  -0.166  0.86852    
## User_Score1.2     -5.986e-01  1.757e+00  -0.341  0.73342    
## User_Score1.3      1.970e-01  2.153e+00   0.092  0.92709    
## User_Score1.4     -5.001e-01  1.605e+00  -0.312  0.75530    
## User_Score1.5     -1.889e-01  1.756e+00  -0.108  0.91433    
## User_Score1.7     -1.070e-01  1.388e+00  -0.077  0.93855    
## User_Score1.8     -2.657e-01  1.521e+00  -0.175  0.86133    
## User_Score1.9      7.800e-01  1.757e+00   0.444  0.65707    
## User_Score2        1.572e-01  1.389e+00   0.113  0.90985    
## User_Score2.1     -1.010e+00  1.374e+00  -0.735  0.46242    
## User_Score2.2     -1.275e+00  1.436e+00  -0.888  0.37445    
## User_Score2.3      2.121e-02  1.757e+00   0.012  0.99037    
## User_Score2.4     -1.980e-01  1.361e+00  -0.146  0.88430    
## User_Score2.5     -4.616e-01  1.374e+00  -0.336  0.73687    
## User_Score2.6      3.652e+00  1.522e+00   2.399  0.01648 *  
## User_Score2.7     -2.833e-01  1.409e+00  -0.201  0.84061    
## User_Score2.8     -2.879e-01  1.310e+00  -0.220  0.82601    
## User_Score2.9     -1.226e-01  1.409e+00  -0.087  0.93067    
## User_Score3       -2.428e-01  1.329e+00  -0.183  0.85498    
## User_Score3.1     -1.978e-02  1.306e+00  -0.015  0.98792    
## User_Score3.2      8.009e-01  1.389e+00   0.577  0.56421    
## User_Score3.3     -2.168e-03  1.329e+00  -0.002  0.99870    
## User_Score3.4      1.876e-01  1.329e+00   0.141  0.88778    
## User_Score3.5     -4.297e-01  1.297e+00  -0.331  0.74051    
## User_Score3.6     -1.333e-02  1.303e+00  -0.010  0.99183    
## User_Score3.7     -3.033e-01  1.310e+00  -0.232  0.81686    
## User_Score3.8     -2.406e-01  1.297e+00  -0.186  0.85280    
## User_Score3.9     -2.227e-01  1.341e+00  -0.166  0.86810    
## User_Score4       -3.964e-01  1.293e+00  -0.307  0.75918    
## User_Score4.1     -1.274e-01  1.283e+00  -0.099  0.92089    
## User_Score4.2     -6.936e-01  1.299e+00  -0.534  0.59335    
## User_Score4.3     -7.998e-02  1.288e+00  -0.062  0.95048    
## User_Score4.4     -4.466e-01  1.282e+00  -0.348  0.72768    
## User_Score4.5     -8.509e-01  1.285e+00  -0.662  0.50796    
## User_Score4.6     -2.440e-01  1.282e+00  -0.190  0.84908    
## User_Score4.7     -3.423e-01  1.295e+00  -0.264  0.79159    
## User_Score4.8     -1.460e-01  1.270e+00  -0.115  0.90851    
## User_Score4.9     -3.931e-01  1.275e+00  -0.308  0.75795    
## User_Score5       -2.563e-01  1.265e+00  -0.203  0.83948    
## User_Score5.1     -5.652e-01  1.277e+00  -0.443  0.65809    
## User_Score5.2     -5.047e-01  1.268e+00  -0.398  0.69054    
## User_Score5.3     -4.172e-01  1.263e+00  -0.330  0.74113    
## User_Score5.4     -4.327e-01  1.263e+00  -0.343  0.73188    
## User_Score5.5     -4.527e-01  1.263e+00  -0.358  0.71998    
## User_Score5.6     -4.046e-01  1.261e+00  -0.321  0.74841    
## User_Score5.7     -3.694e-01  1.261e+00  -0.293  0.76959    
## User_Score5.8     -4.754e-01  1.257e+00  -0.378  0.70527    
## User_Score5.9     -5.232e-01  1.260e+00  -0.415  0.67805    
## User_Score6       -5.941e-01  1.254e+00  -0.474  0.63571    
## User_Score6.1     -5.567e-01  1.259e+00  -0.442  0.65843    
## User_Score6.2     -6.702e-01  1.255e+00  -0.534  0.59348    
## User_Score6.3     -1.925e-01  1.253e+00  -0.154  0.87793    
## User_Score6.4     -4.478e-01  1.256e+00  -0.356  0.72148    
## User_Score6.5     -4.844e-01  1.254e+00  -0.386  0.69935    
## User_Score6.6     -2.928e-01  1.253e+00  -0.234  0.81524    
## User_Score6.7     -4.997e-01  1.255e+00  -0.398  0.69045    
## User_Score6.8     -6.781e-01  1.251e+00  -0.542  0.58771    
## User_Score6.9     -6.476e-01  1.253e+00  -0.517  0.60526    
## User_Score7       -7.265e-01  1.250e+00  -0.581  0.56117    
## User_Score7.1     -5.364e-01  1.252e+00  -0.429  0.66823    
## User_Score7.2     -6.429e-01  1.252e+00  -0.513  0.60769    
## User_Score7.3     -6.086e-01  1.250e+00  -0.487  0.62629    
## User_Score7.4     -5.271e-01  1.250e+00  -0.422  0.67336    
## User_Score7.5     -5.930e-01  1.249e+00  -0.475  0.63509    
## User_Score7.6     -6.017e-01  1.251e+00  -0.481  0.63045    
## User_Score7.7     -6.135e-01  1.250e+00  -0.491  0.62359    
## User_Score7.8     -6.442e-01  1.249e+00  -0.516  0.60595    
## User_Score7.9     -4.772e-01  1.250e+00  -0.382  0.70264    
## User_Score8       -2.664e-01  1.249e+00  -0.213  0.83116    
## User_Score8.1     -7.796e-01  1.250e+00  -0.624  0.53284    
## User_Score8.2     -7.409e-01  1.250e+00  -0.593  0.55324    
## User_Score8.3     -6.534e-01  1.250e+00  -0.523  0.60126    
## User_Score8.4     -6.174e-01  1.251e+00  -0.493  0.62170    
## User_Score8.5     -6.258e-01  1.250e+00  -0.500  0.61679    
## User_Score8.6     -5.690e-01  1.251e+00  -0.455  0.64934    
## User_Score8.7     -5.448e-01  1.252e+00  -0.435  0.66355    
## User_Score8.8     -8.526e-01  1.252e+00  -0.681  0.49602    
## User_Score8.9     -6.483e-01  1.254e+00  -0.517  0.60532    
## User_Score9       -4.506e-01  1.257e+00  -0.359  0.71994    
## User_Score9.1     -6.733e-01  1.260e+00  -0.534  0.59311    
## User_Score9.2     -5.577e-01  1.276e+00  -0.437  0.66199    
## User_Score9.3     -1.087e+00  1.286e+00  -0.845  0.39838    
## User_Score9.4     -4.807e-01  1.361e+00  -0.353  0.72395    
## User_Score9.5     -1.076e+00  1.476e+00  -0.729  0.46615    
## User_Score9.6     -1.416e+00  1.763e+00  -0.803  0.42190    
## User_Count         7.179e-04  4.352e-05  16.497  < 2e-16 ***
## RatingAO           1.958e-01  1.773e+00   0.110  0.91205    
## RatingE            7.574e-02  2.216e-01   0.342  0.73254    
## RatingE10+        -2.638e-01  2.236e-01  -1.180  0.23803    
## RatingK-A         -4.113e-01  1.779e+00  -0.231  0.81717    
## RatingM           -2.903e-01  2.217e-01  -1.309  0.19048    
## RatingRP           8.913e-01  1.276e+00   0.698  0.48504    
## RatingT           -2.800e-01  2.185e-01  -1.281  0.20008    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.753 on 6891 degrees of freedom
##   (9702 observations deleted due to missingness)
## Multiple R-squared:  0.1984, Adjusted R-squared:  0.1839 
## F-statistic: 13.65 on 125 and 6891 DF,  p-value: < 2.2e-16

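-Converting factor variables into numeric vectors (as.numeric() on a factor returns its integer level codes, so User_Score becomes codes from 1 to 97 rather than the original scores)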

str(vgame1.df)
## 'data.frame':    16719 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 11059 9406 5573 11061 7417 9771 6693 11057 6696 2620 ...
##  $ Platform       : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
##  $ Genre          : Factor w/ 13 levels "","Action","Adventure",..: 12 6 8 12 9 7 6 5 6 10 ...
##  $ Publisher      : Factor w/ 582 levels "10TACLE Studios",..: 371 371 371 371 371 371 371 371 371 371 ...
##  $ NA_Sales       : num  41.4 29.1 15.7 15.6 11.3 ...
##  $ EU_Sales       : num  28.96 3.58 12.76 10.93 8.89 ...
##  $ JP_Sales       : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales    : num  8.45 0.77 3.29 2.95 1 0.58 2.88 2.84 2.24 0.47 ...
##  $ Global_Sales   : num  82.5 40.2 35.5 32.8 31.4 ...
##  $ Critic_Score   : int  76 NA 82 80 NA NA 89 58 87 NA ...
##  $ Critic_Count   : int  51 NA 73 73 NA NA 65 41 80 NA ...
##  $ User_Score     : Factor w/ 97 levels "","0","0.2","0.3",..: 79 1 82 79 1 1 84 65 83 1 ...
##  $ User_Count     : int  322 NA 709 192 NA NA 431 129 594 NA ...
##  $ Developer      : Factor w/ 1697 levels "","10tacle Studios",..: 1035 1 1035 1035 1 1 1035 1035 1035 1 ...
##  $ Rating         : Factor w/ 9 levels "","AO","E","E10+",..: 3 1 3 3 1 1 3 3 3 1 ...
vgame1.df$Platform<-as.numeric(vgame1.df$Platform)
vgame1.df$Genre<-as.numeric(vgame1.df$Genre)
vgame1.df$User_Score<-as.numeric(vgame1.df$User_Score)
vgame1.df$Rating<-as.numeric(vgame1.df$Rating)
vgame1.df$Publisher<-as.numeric(vgame1.df$Publisher)
vgame1.df$Developer<-as.numeric(vgame1.df$Developer)
str(vgame1.df)
## 'data.frame':    16719 obs. of  16 variables:
##  $ Name           : Factor w/ 11563 levels "","'98 Koshien",..: 11059 9406 5573 11061 7417 9771 6693 11057 6696 2620 ...
##  $ Platform       : num  26 12 26 26 6 6 5 26 26 12 ...
##  $ Year_of_Release: Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
##  $ Genre          : num  12 6 8 12 9 7 6 5 6 10 ...
##  $ Publisher      : num  371 371 371 371 371 371 371 371 371 371 ...
##  $ NA_Sales       : num  41.4 29.1 15.7 15.6 11.3 ...
##  $ EU_Sales       : num  28.96 3.58 12.76 10.93 8.89 ...
##  $ JP_Sales       : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales    : num  8.45 0.77 3.29 2.95 1 0.58 2.88 2.84 2.24 0.47 ...
##  $ Global_Sales   : num  82.5 40.2 35.5 32.8 31.4 ...
##  $ Critic_Score   : int  76 NA 82 80 NA NA 89 58 87 NA ...
##  $ Critic_Count   : int  51 NA 73 73 NA NA 65 41 80 NA ...
##  $ User_Score     : num  79 1 82 79 1 1 84 65 83 1 ...
##  $ User_Count     : int  322 NA 709 192 NA NA 431 129 594 NA ...
##  $ Developer      : num  1035 1 1035 1035 1 ...
##  $ Rating         : num  3 1 3 3 1 1 3 3 3 1 ...
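
User_Score is a factor whose labels look like numbers, so as.numeric() returns level codes (79, 1, 82, ...) rather than the scores themselves. If the actual scores are wanted, a common idiom is to convert through character first. The sketch below uses a toy factor; `user_score`, `codes` and `scores` are hypothetical names.

```r
# Toy factor with numeric-looking labels and an empty string for "missing".
user_score <- factor(c("8", "", "8.3", "6.6"))

# Direct conversion yields positions in the sorted level set, not scores.
codes <- as.numeric(user_score)

# Converting via character parses the labels; "" becomes NA.
scores <- as.numeric(as.character(user_score))

codes    # 3 1 4 2
scores   # 8 NA 8.3 6.6
```

With this idiom the empty-string entries turn into NA and are then dropped by lm(), instead of entering the model as code 1.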

Model-2 (Revised Model-1)

fit1<-lm(Global_Sales~Platform+Genre+Critic_Score+Critic_Count+
           User_Score+User_Count+Rating, data=vgame1.df)
summary(fit1)
## 
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Critic_Score + 
##     Critic_Count + User_Score + User_Count + Rating, data = vgame1.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.157 -0.568 -0.206  0.188 80.964 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -4.855e-01  1.486e-01  -3.266   0.0011 ** 
## Platform      2.596e-03  2.789e-03   0.931   0.3519    
## Genre        -1.274e-02  5.813e-03  -2.192   0.0284 *  
## Critic_Score  1.829e-02  2.131e-03   8.581   <2e-16 ***
## Critic_Count  2.023e-02  1.337e-03  15.131   <2e-16 ***
## User_Score   -2.961e-03  1.886e-03  -1.570   0.1164    
## User_Count    5.531e-04  4.092e-05  13.519   <2e-16 ***
## Rating       -7.631e-02  8.633e-03  -8.839   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.805 on 7009 degrees of freedom
##   (9702 observations deleted due to missingness)
## Multiple R-squared:  0.1352, Adjusted R-squared:  0.1343 
## F-statistic: 156.5 on 7 and 7009 DF,  p-value: < 2.2e-16

Model-3

fit2<-lm(Global_Sales~Platform+Genre+Publisher+Developer+Critic_Score+Critic_Count+
           User_Score+User_Count+Rating, data=vgame1.df)
summary(fit2)
## 
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Publisher + Developer + 
##     Critic_Score + Critic_Count + User_Score + User_Count + Rating, 
##     data = vgame1.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.113 -0.565 -0.203  0.192 80.939 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.692e-01  1.550e-01  -3.672 0.000242 ***
## Platform      2.389e-03  2.789e-03   0.857 0.391540    
## Genre        -1.160e-02  5.834e-03  -1.988 0.046843 *  
## Publisher    -1.371e-04  1.211e-04  -1.133 0.257389    
## Developer     1.540e-04  4.580e-05   3.363 0.000776 ***
## Critic_Score  1.800e-02  2.135e-03   8.427  < 2e-16 ***
## Critic_Count  2.020e-02  1.338e-03  15.096  < 2e-16 ***
## User_Score   -2.762e-03  1.894e-03  -1.458 0.144913    
## User_Count    5.580e-04  4.092e-05  13.637  < 2e-16 ***
## Rating       -7.698e-02  8.641e-03  -8.909  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.804 on 7007 degrees of freedom
##   (9702 observations deleted due to missingness)
## Multiple R-squared:  0.1366, Adjusted R-squared:  0.1355 
## F-statistic: 123.2 on 9 and 7007 DF,  p-value: < 2.2e-16
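
Model-1 enters Platform, Genre, User_Score and Rating as factors, giving one coefficient per level, whereas Model-2 and Model-3 enter their integer codes, giving a single slope each. The sketch below contrasts the two encodings on synthetic data. All names and values are illustrative assumptions.

```r
set.seed(42)

# Synthetic data: three genres, 20 games each.
toy <- data.frame(
  Genre = factor(rep(c("Action", "Racing", "Sports"), each = 20)),
  Critic_Score = round(runif(60, min = 40, max = 95))
)
toy$Global_Sales <- 0.02 * toy$Critic_Score + rnorm(60, sd = 0.5)

# Factor encoding: lm() builds one dummy variable per non-reference level.
fit_factor <- lm(Global_Sales ~ Genre + Critic_Score, data = toy)

# Integer-code encoding: a single slope, which assumes the codes are
# equally spaced and meaningfully ordered.
toy$Genre_code <- as.numeric(toy$Genre)
fit_code <- lm(Global_Sales ~ Genre_code + Critic_Score, data = toy)

names(coef(fit_factor))  # intercept, GenreRacing, GenreSports, Critic_Score
names(coef(fit_code))    # intercept, Genre_code, Critic_Score
```

The single slope is cheaper in degrees of freedom, but for an unordered variable such as Genre its value depends on the alphabetical order of the labels, so it should be interpreted with care.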

Important insights from the Regression analysis

-In regression analysis, we want a model with significant predictors and a high R-squared value. A low p-value combined with a high R-squared indicates that changes in the predictors are related to changes in the response variable and that the model explains much of the response variability.

-The coefficients estimate the trends while R-squared represents the scatter around the regression line.

-The interpretations of the significant variables are the same for both high and low R-squared models.

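-Low R-squared values are problematic when you need precise predictions. Here the three models reach multiple R-squared values of only 0.1984, 0.1352 and 0.1366, so most of the variability in Global sales is left unexplained even though several predictors are highly significant.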
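
The quantities discussed above can be read directly from a fitted model. The sketch below does so on synthetic data with a genuine but noisy effect, the situation in which a significant coefficient can coexist with a low R-squared. The data, seed and variable names are assumptions for illustration.

```r
set.seed(7)

# Synthetic data: y depends on x, but with a lot of noise.
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 2)

fit <- lm(y ~ x)
s <- summary(fit)

r2      <- s$r.squared                 # share of variance explained
adj_r2  <- s$adj.r.squared             # penalised for number of predictors
slope_p <- coef(s)["x", "Pr(>|t|)"]    # p-value of the slope

# R-squared recomputed from its definition: 1 - RSS / TSS.
manual_r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
```

The same accessors applied to `summary(fit1)` and `summary(fit2)` return the values printed in the model summaries, which makes side-by-side comparison easier than reading the console output.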