1. Introduction

Video games are popular all over the world and are enjoyed by people of all ages. The video game industry is large, and so is annual spending on games. Sales of different types of games vary widely between countries because of local preferences. According to the market research firm SuperData, as of May 2015 the global games market was worth USD 74.2 billion. By region, North America accounted for USD 23.6 billion, Asia for USD 23.1 billion, Europe for USD 22.1 billion and South America for USD 4.5 billion. Video games span many genres, publishers and platforms. This project analyzes video game sales by region and examines which genres, platforms and publishers are the most popular and have the highest sales.

2. Overview of the project

The main goal of this project was to analyze video game sales in different regions: North America, Europe, Japan, other countries (combined) and global sales (the total across all regions). The main idea was to visualize how games are distributed across genres, publishers and platforms, which gives a basic picture of the most popular ones. The project also analyzes the effect of genre on sales in each region.

3. Data

The data for this project was collected from Kaggle (www.kaggle.com). It records video game sales in different regions of the world, broken down by genre, publisher and platform. Each record has the following fields:

Rank: Rank of the game by overall sales

Name: Name of the video game

Platform: Platform on which the game was released or is playable

Year: Year in which the game was released

Genre: Genre the game belongs to

Publisher: Name of the company that published the game

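NA_Sales: Sales in North America (in millions of units)

EU_Sales: Sales in Europe (in millions of units)

JP_Sales: Sales in Japan (in millions of units)

Other_Sales: Sales in other countries (in millions of units)

Global_Sales: Total worldwide sales (in millions of units)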

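# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), which summary() and plot() below rely on
vgsales <- read.csv("C:/Program Files/RStudio/files/vgsales.csv", stringsAsFactors = TRUE)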
summary(vgsales)
##       Rank                                Name          Platform   
##  Min.   :    1   Need for Speed: Most Wanted:   12   DS     :2163  
##  1st Qu.: 4151   FIFA 14                    :    9   PS2    :2161  
##  Median : 8300   LEGO Marvel Super Heroes   :    9   PS3    :1329  
##  Mean   : 8301   Madden NFL 07              :    9   Wii    :1325  
##  3rd Qu.:12450   Ratatouille                :    9   X360   :1265  
##  Max.   :16600   Angry Birds Star Wars      :    8   PSP    :1213  
##                  (Other)                    :16542   (Other):7142  
##       Year               Genre                             Publisher    
##  2009   :1431   Action      :3316   Electronic Arts             : 1351  
##  2008   :1428   Sports      :2346   Activision                  :  975  
##  2010   :1259   Misc        :1739   Namco Bandai Games          :  932  
##  2007   :1202   Role-Playing:1488   Ubisoft                     :  921  
##  2011   :1139   Shooter     :1310   Konami Digital Entertainment:  832  
##  2006   :1008   Adventure   :1286   THQ                         :  715  
##  (Other):9131   (Other)     :5113   (Other)                     :10872  
##     NA_Sales          EU_Sales          JP_Sales         Other_Sales      
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.00000  
##  1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 0.00000  
##  Median : 0.0800   Median : 0.0200   Median : 0.00000   Median : 0.01000  
##  Mean   : 0.2647   Mean   : 0.1467   Mean   : 0.07778   Mean   : 0.04806  
##  3rd Qu.: 0.2400   3rd Qu.: 0.1100   3rd Qu.: 0.04000   3rd Qu.: 0.04000  
##  Max.   :41.4900   Max.   :29.0200   Max.   :10.22000   Max.   :10.57000  
##                                                                           
##   Global_Sales    
##  Min.   : 0.0100  
##  1st Qu.: 0.0600  
##  Median : 0.1700  
##  Mean   : 0.5374  
##  3rd Qu.: 0.4700  
##  Max.   :82.7400  
## 
library(psych)
describe(vgsales)
##              vars     n    mean      sd  median trimmed     mad  min
## Rank            1 16598 8300.61 4791.85 8300.50 8300.56 6152.05 1.00
## Name*           2 16598 5795.86 3324.01 5864.50 5810.22 4270.63 1.00
## Platform*       3 16598   16.71    8.29   17.00   16.67   10.38 1.00
## Year*           4 16598   27.61    6.00   28.00   28.00    5.93 1.00
## Genre*          5 16598    5.93    3.76    6.00    5.86    5.93 1.00
## Publisher*      6 16598  299.40  181.98  329.00  303.97  272.80 1.00
## NA_Sales        7 16598    0.26    0.82    0.08    0.13    0.12 0.00
## EU_Sales        8 16598    0.15    0.51    0.02    0.06    0.03 0.00
## JP_Sales        9 16598    0.08    0.31    0.00    0.02    0.00 0.00
## Other_Sales    10 16598    0.05    0.19    0.01    0.02    0.01 0.00
## Global_Sales   11 16598    0.54    1.56    0.17    0.27    0.21 0.01
##                   max    range  skew kurtosis    se
## Rank         16600.00 16599.00  0.00    -1.20 37.19
## Name*        11493.00 11492.00 -0.03    -1.21 25.80
## Platform*       31.00    30.00 -0.05    -1.00  0.06
## Year*           40.00    39.00 -0.86     1.68  0.05
## Genre*          12.00    11.00  0.07    -1.43  0.03
## Publisher*     579.00   578.00 -0.15    -1.40  1.41
## NA_Sales        41.49    41.49 18.80   648.86  0.01
## EU_Sales        29.02    29.02 18.87   755.71  0.00
## JP_Sales        10.22    10.22 11.20   194.15  0.00
## Other_Sales     10.57    10.57 24.23  1024.92  0.00
## Global_Sales    82.74    82.73 17.40   603.68  0.01

==> Distribution of games with respect to consoles:

plot(vgsales$Platform, main = "Platform based distribution", xlab = "Platform", ylab = "No. of games", col="cyan")

==> Distribution of games with respect to genres:

plot(vgsales$Genre, main = "Genre based distribution", xlab = "Genre", ylab = "No. of games", col="cyan")

[ Here we see that Action is the genre with the most titles (3,316), followed by Sports (2,346) and Misc (1,739). ]
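The bar heights in these plots are just counts of rows per factor level, so the ranking can also be read off directly. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical stand-ins for `vgsales`), not the code used for the report:

```r
# Hypothetical toy data standing in for vgsales.
toy <- data.frame(
  Genre    = c("Action", "Sports", "Action", "Misc", "Action", "Sports"),
  Platform = c("DS", "PS2", "PS2", "DS", "DS", "Wii"),
  stringsAsFactors = TRUE
)

# table() counts rows per level; sort() puts the most frequent level first.
genre_counts    <- sort(table(toy$Genre), decreasing = TRUE)
platform_counts <- sort(table(toy$Platform), decreasing = TRUE)

print(genre_counts)
print(platform_counts)
```

On the real data, replacing `toy` with `vgsales` should give the same counts that summary(vgsales) reports for the largest levels.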

==> Distribution of games with respect to publishers:

plot(vgsales$Publisher, main = "Publisher based distribution", xlab = "Publisher", ylab = "No. of games")
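With 579 publisher levels, most axis labels cannot be drawn, so the plot is hard to read. One possible alternative, sketched here on made-up data, is to plot only the publishers with the most titles. The data frame `toy` and the cutoff `top_n` are hypothetical:

```r
# Hypothetical toy data standing in for vgsales.
toy <- data.frame(
  Publisher = c("Electronic Arts", "Activision", "Electronic Arts", "Ubisoft",
                "Electronic Arts", "Activision", "THQ"),
  stringsAsFactors = TRUE
)

top_n <- 2  # hypothetical cutoff; something like 10 may suit the full data

# Count titles per publisher, most frequent first, and keep the top_n.
publisher_counts <- sort(table(toy$Publisher), decreasing = TRUE)
top_publishers   <- head(publisher_counts, top_n)

# las = 2 turns the axis labels so long publisher names stay legible.
barplot(top_publishers, main = "Top publishers by number of games",
        ylab = "No. of games", las = 2, col = "cyan")
```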

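==> Hypothesis: Mean NA_Sales does not differ between genres.

[ In the output below, Action is the reference level, so the intercept is the mean NA_Sales of Action titles and every other coefficient is that genre's difference from Action. Genres with p-value<0.05 (Adventure, Platform, Shooter, Strategy) differ significantly from Action; the rest do not. The overall F-test (p-value < 2.2e-16) rejects the hypothesis, although genre explains only about 1.5% of the variance. ]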

fit <- lm( NA_Sales ~ Genre , data = vgsales)
summary(fit)
## 
## Call:
## lm(formula = NA_Sales ~ Genre, data = vgsales)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.505 -0.236 -0.153 -0.015 41.199 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.264726   0.014081  18.801  < 2e-16 ***
## GenreAdventure    -0.182455   0.026636  -6.850 7.65e-12 ***
## GenreFighting     -0.001058   0.031202  -0.034   0.9729    
## GenreMisc         -0.028820   0.024007  -1.200   0.2300    
## GenrePlatform      0.239846   0.030664   7.822 5.53e-15 ***
## GenrePuzzle       -0.052045   0.036440  -1.428   0.1532    
## GenreRacing        0.023041   0.026919   0.856   0.3921    
## GenreRole-Playing -0.044779   0.025300  -1.770   0.0768 .  
## GenreShooter       0.180007   0.026460   6.803 1.06e-11 ***
## GenreSimulation   -0.053295   0.030928  -1.723   0.0849 .  
## GenreSports        0.026557   0.021875   1.214   0.2247    
## GenreStrategy     -0.163845   0.034113  -4.803 1.58e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8108 on 16586 degrees of freedom
## Multiple R-squared:  0.01493,    Adjusted R-squared:  0.01428 
## F-statistic: 22.86 on 11 and 16586 DF,  p-value: < 2.2e-16
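Because Genre is a factor, lm() reports each coefficient as a difference from the first level (Action), not as a standalone effect. The sketch below checks this reading on a small made-up data set; `toy` and its values are hypothetical and only illustrate how the coefficients relate to the group means:

```r
# Hypothetical toy data: three genres, four titles each.
toy <- data.frame(
  Genre    = factor(rep(c("Action", "Platform", "Strategy"), each = 4)),
  NA_Sales = c(0.2, 0.3, 0.1, 0.4,   # Action
               0.6, 0.5, 0.7, 0.6,   # Platform
               0.1, 0.0, 0.1, 0.2)   # Strategy
)

fit         <- lm(NA_Sales ~ Genre, data = toy)
coefs       <- coef(fit)
group_means <- tapply(toy$NA_Sales, toy$Genre, mean)

# The intercept is the mean of the reference level (Action);
# intercept + GenrePlatform is the mean of Platform, and so on.
print(coefs)
print(group_means)

# anova() gives the single overall test of "no difference between genres".
print(anova(fit))
```

The per-genre p-values therefore answer "does this genre differ from Action?", while the F-statistic at the bottom of each summary answers the hypothesis as stated.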

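==> Hypothesis: Mean EU_Sales does not differ between genres.

[ In the output below, each coefficient is that genre's difference from Action, the reference level. Genres with p-value<0.05 (Adventure, Fighting, Misc, Platform, Puzzle, Role-Playing, Shooter, Strategy) differ significantly from Action; Racing, Simulation and Sports do not. The overall F-test (p-value < 2.2e-16) rejects the hypothesis, although genre explains only about 1% of the variance. ]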

fit <- lm( EU_Sales ~ Genre , data = vgsales)
summary(fit)
## 
## Call:
## lm(formula = EU_Sales ~ Genre, data = vgsales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2391 -0.1409 -0.1064 -0.0299 28.8594 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.158323   0.008736  18.123  < 2e-16 ***
## GenreAdventure    -0.108455   0.016526  -6.563 5.44e-11 ***
## GenreFighting     -0.038842   0.019358  -2.006 0.044821 *  
## GenreMisc         -0.034125   0.014894  -2.291 0.021966 *  
## GenrePlatform      0.069250   0.019025   3.640 0.000273 ***
## GenrePuzzle       -0.071072   0.022608  -3.144 0.001672 ** 
## GenreRacing        0.032541   0.016701   1.948 0.051380 .  
## GenreRole-Playing -0.031939   0.015697  -2.035 0.041893 *  
## GenreShooter       0.080814   0.016416   4.923 8.62e-07 ***
## GenreSimulation   -0.027551   0.019189  -1.436 0.151088    
## GenreSports        0.002312   0.013572   0.170 0.864742    
## GenreStrategy     -0.091745   0.021164  -4.335 1.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5031 on 16586 degrees of freedom
## Multiple R-squared:  0.00971,    Adjusted R-squared:  0.009054 
## F-statistic: 14.79 on 11 and 16586 DF,  p-value: < 2.2e-16

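==> Hypothesis: Mean JP_Sales does not differ between genres.

[ In the output below, each coefficient is that genre's difference from Action, the reference level. Genres with p-value<0.05 (Fighting, Platform, Puzzle, Role-Playing, Simulation) differ significantly from Action, with Role-Playing showing the largest difference; the rest do not. The overall F-test (p-value < 2.2e-16) rejects the hypothesis, although genre explains only about 3.4% of the variance. ]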

fit <- lm( JP_Sales ~ Genre , data = vgsales)
summary(fit)
## 
## Call:
## lm(formula = JP_Sales ~ Genre, data = vgsales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2368 -0.0620 -0.0482 -0.0282  9.9832 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.048236   0.005282   9.132  < 2e-16 ***
## GenreAdventure    -0.007746   0.009992  -0.775 0.438225    
## GenreFighting      0.054771   0.011705   4.679  2.9e-06 ***
## GenreMisc          0.013731   0.009006   1.525 0.127352    
## GenrePlatform      0.099360   0.011503   8.638  < 2e-16 ***
## GenrePuzzle        0.050235   0.013670   3.675 0.000239 ***
## GenreRacing       -0.002848   0.010098  -0.282 0.777958    
## GenreRole-Playing  0.188532   0.009491  19.865  < 2e-16 ***
## GenreShooter      -0.019014   0.009926  -1.916 0.055427 .  
## GenreSimulation    0.025236   0.011602   2.175 0.029635 *  
## GenreSports        0.009467   0.008206   1.154 0.248659    
## GenreStrategy      0.024393   0.012797   1.906 0.056643 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3042 on 16586 degrees of freedom
## Multiple R-squared:  0.03352,    Adjusted R-squared:  0.03288 
## F-statistic: 52.29 on 11 and 16586 DF,  p-value: < 2.2e-16

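==> Hypothesis: Mean Other_Sales does not differ between genres.

[ In the output below, each coefficient is that genre's difference from Action, the reference level. Genres with p-value<0.05 (Adventure, Misc, Puzzle, Role-Playing, Shooter, Simulation, Strategy) differ significantly from Action; Fighting, Platform, Racing and Sports do not. The overall F-test (p-value < 2.2e-16) rejects the hypothesis, although genre explains only about 0.8% of the variance. ]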

fit <- lm( Other_Sales ~ Genre , data = vgsales)
summary(fit)
## 
## Call:
## lm(formula = Other_Sales ~ Genre, data = vgsales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.0784 -0.0475 -0.0333 -0.0067 10.5135 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.056508   0.003262  17.321  < 2e-16 ***
## GenreAdventure    -0.043436   0.006172  -7.038 2.02e-12 ***
## GenreFighting     -0.013253   0.007229  -1.833 0.066783 .  
## GenreMisc         -0.013196   0.005562  -2.372 0.017686 *  
## GenrePlatform      0.001720   0.007105   0.242 0.808696    
## GenrePuzzle       -0.034944   0.008443  -4.139 3.51e-05 ***
## GenreRacing        0.005358   0.006237   0.859 0.390349    
## GenreRole-Playing -0.016447   0.005862  -2.806 0.005025 ** 
## GenreShooter       0.021881   0.006131   3.569 0.000359 ***
## GenreSimulation   -0.020153   0.007166  -2.812 0.004925 ** 
## GenreSports        0.001024   0.005068   0.202 0.839867    
## GenreStrategy     -0.039826   0.007904  -5.039 4.73e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1879 on 16586 degrees of freedom
## Multiple R-squared:  0.008315,   Adjusted R-squared:  0.007657 
## F-statistic: 12.64 on 11 and 16586 DF,  p-value: < 2.2e-16

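==> Hypothesis: Mean Global_Sales does not differ between genres.

[ In the output below, each coefficient is that genre's difference from Action, the reference level. Genres with p-value<0.05 (Adventure, Platform, Role-Playing, Shooter, Strategy) differ significantly from Action; the rest do not. The overall F-test (p-value < 2.2e-16) rejects the hypothesis, although genre explains only about 1.2% of the variance. ]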

fit <- lm( Global_Sales ~ Genre , data = vgsales)
summary(fit)
## 
## Call:
## lm(formula = Global_Sales ~ Genre, data = vgsales)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.928 -0.458 -0.307 -0.037 82.173 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.528100   0.026851  19.668  < 2e-16 ***
## GenreAdventure    -0.342221   0.050795  -6.737 1.67e-11 ***
## GenreFighting      0.001275   0.059501   0.021   0.9829    
## GenreMisc         -0.062338   0.045780  -1.362   0.1733    
## GenrePlatform      0.410241   0.058476   7.016 2.38e-12 ***
## GenrePuzzle       -0.107224   0.069491  -1.543   0.1229    
## GenreRacing        0.058001   0.051334   1.130   0.2585    
## GenreRole-Playing  0.095132   0.048247   1.972   0.0486 *  
## GenreShooter       0.263785   0.050458   5.228 1.74e-07 ***
## GenreSimulation   -0.075736   0.058980  -1.284   0.1991    
## GenreSports        0.039219   0.041715   0.940   0.3471    
## GenreStrategy     -0.270949   0.065052  -4.165 3.13e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.546 on 16586 degrees of freedom
## Multiple R-squared:  0.01194,    Adjusted R-squared:  0.01128 
## F-statistic: 18.22 on 11 and 16586 DF,  p-value: < 2.2e-16
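The five models above differ only in the response column, so the overall statistics can be collected in one table. The following is a rough sketch under stated assumptions: `toy` is simulated, hypothetical data with the same column names as vgsales, and the numbers it produces say nothing about the real data set.

```r
# Region columns as named in the data description.
regions <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales")

# Hypothetical simulated data standing in for vgsales (three genres, 20 titles each).
set.seed(42)
toy <- data.frame(Genre = factor(rep(c("Action", "Platform", "Strategy"), each = 20)))
genre_shift <- unname(c(Action = 0.3, Platform = 0.6, Strategy = 0.1)[as.character(toy$Genre)])
for (region in regions[1:4]) {
  toy[[region]] <- genre_shift + runif(nrow(toy), min = 0, max = 0.2)
}
toy$Global_Sales <- rowSums(toy[regions[1:4]])

# Fit <region> ~ Genre for each region and keep the whole-model statistics.
overall <- t(sapply(regions, function(region) {
  fit <- lm(reformulate("Genre", response = region), data = toy)
  s   <- summary(fit)
  f   <- s$fstatistic  # named vector: value, numdf, dendf
  c(r_squared = s$r.squared,
    f_value   = unname(f["value"]),
    p_value   = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}))

print(overall)
```

Each row of `overall` corresponds to one region; the p_value column is the overall F-test that summary() prints on its last line, and r_squared shows how much of the variance genre accounts for.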

4. Results

From the above visualizations and summary we can say that DS (2,163 titles) and PS2 (2,161 titles) are the platforms with the most games, followed by PS3, Wii and X360. Action is the genre with the most games (3,316), followed by Sports (2,346) and Misc (1,739). Among publishers, Electronic Arts has the most games (1,351), followed by Activision (975) and Namco Bandai Games (932).

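From the above tests we can say that mean sales differ between genres in every region (p-value < 2.2e-16 for each overall F-test), but genre explains only between about 0.8% and 3.4% of the variance in sales. Relative to Action, Platform and Shooter games sell significantly more per title in North America, Europe and globally, while Adventure and Strategy games sell significantly less in every region except Japan. In Japan, Role-Playing games show the largest difference from Action.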
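Counting titles and summing sales are different questions: a genre can have the most games without having the highest sales per game. A minimal sketch of computing both per genre, using a made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical toy data standing in for vgsales.
toy <- data.frame(
  Genre        = c("Action", "Action", "Action", "Platform", "Shooter"),
  Global_Sales = c(0.5, 0.4, 0.6, 1.2, 0.9)
)

# Per-genre number of titles, total sales and mean sales per title.
n_titles    <- tapply(toy$Global_Sales, toy$Genre, length)
total_sales <- tapply(toy$Global_Sales, toy$Genre, sum)
mean_sales  <- tapply(toy$Global_Sales, toy$Genre, mean)

by_genre <- data.frame(
  Genre       = names(total_sales),
  n_titles    = as.vector(n_titles),
  total_sales = as.vector(total_sales),
  mean_sales  = as.vector(mean_sales)
)

# Rank genres by total sales, highest first.
by_genre <- by_genre[order(by_genre$total_sales, decreasing = TRUE), ]
print(by_genre)
```

In this toy example Action leads on total sales because it has the most titles, while Platform has the highest mean per title. The same grouping with Platform or Publisher in place of Genre would give the corresponding totals for those variables.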

5. Conclusion

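By the above data we can say that Action is the genre with the most games and that DS and PS2 are the platforms with the most games. The number of games alone does not show which games sell the most: per title, Platform and Shooter games have higher mean global sales than Action games (about 0.94 and 0.79 versus 0.53). Because Action games are so numerous, a change in the sales of one or two of them has little effect on the genre's mean, although sales are heavily skewed towards a few best-selling titles (the maximum Global_Sales is 82.74 against a median of 0.17).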

6. References