Questions

Dataset Overview

Source - Kaggle

Link - https://www.kaggle.com/gregorut/videogamesales?select=vgsales.csv

The dataset I am working with is a collection of global video game sales from Kaggle (vgsales.csv), with 16,598 rows. Each row records a game’s name, platform, genre, publisher, and release year, along with unit sales in North America, Europe, Japan, and other regions, plus the total global sales. The Kaggle documentation describes the sources and format of the data, including the collection process and the variable descriptions.

data <- read.csv("C:\\Users\\gajaw\\OneDrive\\Desktop\\STATS\\vgsales.csv")
summary(data)
##       Rank           Name             Platform             Year          
##  Min.   :    1   Length:16598       Length:16598       Length:16598      
##  1st Qu.: 4151   Class :character   Class :character   Class :character  
##  Median : 8300   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 8301                                                           
##  3rd Qu.:12450                                                           
##  Max.   :16600                                                           
##     Genre            Publisher            NA_Sales          EU_Sales      
##  Length:16598       Length:16598       Min.   : 0.0000   Min.   : 0.0000  
##  Class :character   Class :character   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Mode  :character   Mode  :character   Median : 0.0800   Median : 0.0200  
##                                        Mean   : 0.2647   Mean   : 0.1467  
##                                        3rd Qu.: 0.2400   3rd Qu.: 0.1100  
##                                        Max.   :41.4900   Max.   :29.0200  
##     JP_Sales         Other_Sales        Global_Sales    
##  Min.   : 0.00000   Min.   : 0.00000   Min.   : 0.0100  
##  1st Qu.: 0.00000   1st Qu.: 0.00000   1st Qu.: 0.0600  
##  Median : 0.00000   Median : 0.01000   Median : 0.1700  
##  Mean   : 0.07778   Mean   : 0.04806   Mean   : 0.5374  
##  3rd Qu.: 0.04000   3rd Qu.: 0.04000   3rd Qu.: 0.4700  
##  Max.   :10.22000   Max.   :10.57000   Max.   :82.7400
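
The summary above reports Year as a character column, which usually means some entries are not numeric. A minimal sketch of one way to handle this at read time follows. It assumes the missing years are stored as the string "N/A" (an assumption, not confirmed by the output above) and uses a small inline CSV with made-up rows in place of the real file.

```r
# Inline CSV standing in for vgsales.csv; the rows are illustrative only.
csv_text <- "Name,Platform,Year,Global_Sales
Game A,Wii,2006,1.50
Game B,PS2,N/A,0.30
Game C,DS,2009,0.75"

# Default read: the "N/A" string forces Year to be read as text.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$Year)

# Assumed fix: declare "N/A" as a missing-value marker when reading.
clean <- read.csv(text = csv_text, na.strings = c("N/A", ""),
                  stringsAsFactors = FALSE)
class(clean$Year)
summary(clean$Year)
```

With Year read as a number, summary() reports its quartiles directly and later steps do not need as.numeric().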

A simple breakdown of the summary for each column:

  1. Rank:

    • The dataset ranks games from 1 to 16,600 based on global sales, with the lowest rank (1) representing the highest-selling game.
  2. Name, Platform, Genre, Publisher:

    • These are text columns that show the name of the game, the platform it was released on (e.g., Wii, PlayStation), the genre (e.g., Action, Sports), and the publisher (e.g., Nintendo, EA). No statistical summaries apply here as they are non-numeric.
  3. Year:

    • Year is read as text in the summary above, so no numeric statistics are shown for it. After conversion to numbers, release years range from 1980 to 2020.

    • The median release year is 2007, with a mean around 2006. Most games were released between 2003 and 2010, indicating that this was a high-activity period in the gaming industry.

  4. NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales:

    • These represent sales in millions of units for different regions (North America, Europe, Japan, other regions) and globally.

    • The average global sales per game is 0.54 million units, while the median is only 0.17 million, so a typical game sells far less than the average.

    • Some games had very high sales, with the maximum global sales reaching 82.74 million units, while many games had minimal or no sales in certain regions: the minimum is 0 in every regional column, and the median for Japan is 0.

  5. deviation_total_sales:

    • This is a derived column, so it is not part of the raw file and does not appear in the summary above. It shows how far each game’s global sales deviate from the average global sales.

    • The maximum deviation is 82.2 million units (82.74 - 0.54), indicating a significant gap between the best-selling game and the average sales figure.

  6. deviation_year:

    • This derived column measures how far each game’s release year deviates from the average release year (around 2006).

    • The largest deviation is around 26 years, which corresponds to the earliest games in the dataset (1980). The latest release year, 2020, lies about 14 years above the average.
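
The two deviation columns are described above but the code that creates them is not shown. A minimal sketch, assuming each deviation is simply the value minus the column mean, is given below. The toy rows and the helper column Year_num are illustrative and not part of the original analysis.

```r
# Toy rows standing in for vgsales.csv; values are illustrative only.
games <- data.frame(
  Name = c("Game A", "Game B", "Game C", "Game D"),
  Year = c("1985", "2006", "N/A", "2010"),
  Global_Sales = c(40.0, 2.0, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# Convert Year to numeric; "N/A" becomes NA (the coercion warning is expected).
games$Year_num <- suppressWarnings(as.numeric(games$Year))

# Assumed definition: signed deviation of each game from the column mean.
games$deviation_total_sales <- games$Global_Sales - mean(games$Global_Sales, na.rm = TRUE)
games$deviation_year <- games$Year_num - mean(games$Year_num, na.rm = TRUE)

# Largest absolute deviations, analogous to the 82.2 and 26 quoted above.
max(abs(games$deviation_total_sales), na.rm = TRUE)
max(abs(games$deviation_year), na.rm = TRUE)
```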

Exploratory Data Analysis (EDA)

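# Average and maximum global sales, plus the range of release years
mean_sales <- mean(data$Global_Sales, na.rm = TRUE)
max_sales <- max(data$Global_Sales, na.rm = TRUE)
# Year is stored as text; non-numeric entries become NA (hence the warning below)
year_range <- range(as.numeric(data$Year), na.rm = TRUE)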
## Warning: NAs introduced by coercion
mean_sales
## [1] 0.5374407
max_sales
## [1] 82.74
year_range
## [1] 1980 2020

The initial exploration of the dataset reveals that the average global sales per game are approximately 0.54 million units, against a median of only 0.17 million. This gap reflects a highly right-skewed distribution in which a few blockbuster titles far exceed the typical release. The highest-selling game reached 82.74 million units, showcasing the vast disparity between top-performing games and the majority of releases. Release activity was concentrated between 2003 and 2010, a period that overlaps with the popularity of consoles such as the PlayStation 2, Xbox 360, and Nintendo Wii. These findings set the stage for a deeper analysis of sales trends and the factors driving game success.
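
The skew described above can be checked by comparing the mean with the median and by measuring how much of the total comes from the top titles. The sketch below uses a made-up vector of sales figures. The 90% cutoff and the name top_share are illustrative choices.

```r
# Illustrative right-skewed sales figures (millions of units), not from the dataset.
sales <- c(0.01, 0.02, 0.05, 0.06, 0.10, 0.17, 0.30, 0.47, 1.20, 30.00)

mean_sales   <- mean(sales)
median_sales <- median(sales)

# Share of total sales contributed by titles above the 90th percentile.
cutoff    <- quantile(sales, 0.90)
top_share <- sum(sales[sales > cutoff]) / sum(sales)

c(mean = mean_sales, median = median_sales, top_share = top_share)
```

A mean well above the median, together with a large top share, is the pattern one would expect from a distribution dominated by a few blockbusters.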

Visualization 1: Average Sales by Platform

library(ggplot2)
ggplot(data, aes(x = Platform, y = Global_Sales)) +
  stat_summary(fun = mean, geom = "bar", fill = "blue") +
  labs(title = "Average Global Sales by Platform", 
       x = "Platform", 
       y = "Average Global Sales (in millions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Visualization 2: Sales by Genre

ggplot(data, aes(x = Genre, y = Global_Sales)) +
  geom_boxplot(fill = "green") +
  labs(title = "Global Sales Distribution by Genre", 
       x = "Genre", 
       y = "Global Sales (in millions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
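
Because sales are heavily skewed, bar charts of means and boxplots can be hard to read on their own, and a platform or genre with very few titles can show a misleading average. A small numeric summary alongside the plots may help. The helper summarise_sales and the toy rows below are hypothetical and only sketch the idea.

```r
# Toy rows standing in for vgsales.csv; values are illustrative only.
games <- data.frame(
  Platform = c("Wii", "Wii", "Wii", "PS2", "PS2", "GG"),
  Genre = c("Sports", "Action", "Action", "Action", "Sports", "Platform"),
  Global_Sales = c(3.0, 0.5, 0.1, 1.0, 0.4, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count, mean, median, and total sales per level of `group`.
summarise_sales <- function(df, group) {
  f <- factor(df[[group]])
  out <- data.frame(
    level  = levels(f),
    n      = as.vector(tapply(df$Global_Sales, f, length)),
    mean   = as.vector(tapply(df$Global_Sales, f, mean)),
    median = as.vector(tapply(df$Global_Sales, f, median)),
    total  = as.vector(tapply(df$Global_Sales, f, sum)),
    stringsAsFactors = FALSE
  )
  # Largest total sales first.
  out[order(-out$total), ]
}

by_platform <- summarise_sales(games, "Platform")
by_genre    <- summarise_sales(games, "Genre")
by_platform
by_genre
```

Reporting n next to each mean makes it easier to spot groups whose average rests on one or two titles, and the total column shows which groups lead in overall sales rather than per-title sales.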

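Hypothesis 1 - Single vs. Multiple Platforms

Null hypothesis: mean global sales are the same in both platform groups. Alternative hypothesis: the two group means differ.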

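# Label rows by console: PS4, X360, and PC are tagged "Multiple", all others "Single".
# Note: this is a console-based proxy; it does not check whether a given title
# was actually released on more than one platform.
data$Platform_Type <- ifelse(data$Platform %in% c("PS4", "X360", "PC"), "Multiple", "Single")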

# Two-sample t-test
t_test_result <- t.test(Global_Sales ~ Platform_Type, data = data)

# View results
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  Global_Sales by Platform_Type
## t = 2.1492, df = 3925.5, p-value = 0.03168
## alternative hypothesis: true difference in means between group Multiple and group Single is not equal to 0
## 95 percent confidence interval:
##  0.005693901 0.124042290
## sample estimates:
## mean in group Multiple   mean in group Single 
##              0.5922999              0.5274318
ggplot(data, aes(x = Platform_Type, y = Global_Sales)) +
  geom_boxplot(fill = "yellow") +
  labs(title = "Global Sales: Single vs. Multiple Platforms", 
       x = "Platform Type", 
       y = "Global Sales (in millions)")
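
The grouping above is based on console, so it does not say whether a given title was released on more than one system. A minimal sketch of a title-level classification follows, assuming a game counts as multi-platform when the same Name appears with more than one Platform. The toy rows and the names n_platforms and per_title are illustrative and not part of the original analysis.

```r
# Toy rows standing in for vgsales.csv; values are illustrative only.
games <- data.frame(
  Name = c("Game A", "Game A", "Game B", "Game C", "Game C", "Game C", "Game D"),
  Platform = c("PS3", "X360", "Wii", "PC", "PS4", "XOne", "DS"),
  Global_Sales = c(1.20, 0.90, 0.40, 0.30, 2.10, 1.50, 0.25),
  stringsAsFactors = FALSE
)

# Count the distinct platforms each title appears on.
n_platforms <- tapply(games$Platform, games$Name, function(p) length(unique(p)))

# Assumption: a title is "Multiple" if it appears on more than one platform.
games$Platform_Type <- ifelse(as.vector(n_platforms[games$Name]) > 1,
                              "Multiple", "Single")

# Collapse to one row per title so each game is counted once.
per_title <- aggregate(Global_Sales ~ Name + Platform_Type, data = games, FUN = sum)
per_title

# Same Welch test as above, now on title-level totals.
t.test(Global_Sales ~ Platform_Type, data = per_title)
```

On the real data, this comparison would test the multi-platform hypothesis directly, whereas the console-based grouping mostly compares three platforms against the rest.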

REGRESSION ANALYSIS - MODEL 1

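# Regression model with platform, genre, and year.
# Year must be numeric to yield the single Year coefficient shown below,
# so it is assumed to have been converted (e.g. with as.numeric) before this step.
final_model <- lm(Global_Sales ~ Platform + Genre + Year, data = data)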

# View model summary
summary(final_model)
## 
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Year, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.921 -0.460 -0.231  0.040 81.874 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       107.626425  10.740555  10.021  < 2e-16 ***
## Platform3DO         0.174396   0.896411   0.195 0.845748    
## Platform3DS         1.470262   0.229774   6.399 1.61e-10 ***
## PlatformDC          0.597409   0.273364   2.185 0.028874 *  
## PlatformDS          1.144613   0.203421   5.627 1.87e-08 ***
## PlatformGB          2.654543   0.224107  11.845  < 2e-16 ***
## PlatformGBA         0.787729   0.190363   4.138 3.52e-05 ***
## PlatformGC          0.772435   0.194888   3.963 7.42e-05 ***
## PlatformGEN         0.872426   0.332332   2.625 0.008669 ** 
## PlatformGG         -0.452941   1.535902  -0.295 0.768072    
## PlatformN64         0.853398   0.189236   4.510 6.54e-06 ***
## PlatformNES         2.046023   0.212122   9.645  < 2e-16 ***
## PlatformNG          0.111288   0.470991   0.236 0.813215    
## PlatformPC          1.039675   0.209595   4.960 7.11e-07 ***
## PlatformPCFX       -0.020418   1.536482  -0.013 0.989397    
## PlatformPS          0.770731   0.172568   4.466 8.01e-06 ***
## PlatformPS2         1.098724   0.190334   5.773 7.95e-09 ***
## PlatformPS3         1.563082   0.214803   7.277 3.57e-13 ***
## PlatformPS4         1.881126   0.242378   7.761 8.92e-15 ***
## PlatformPSP         1.020643   0.207107   4.928 8.38e-07 ***
## PlatformPSV         1.205750   0.235408   5.122 3.06e-07 ***
## PlatformSAT         0.280216   0.199198   1.407 0.159531    
## PlatformSCD         0.235627   0.643184   0.366 0.714113    
## PlatformSNES        0.747928   0.185208   4.038 5.41e-05 ***
## PlatformTG16        0.089196   1.091911   0.082 0.934896    
## PlatformWii         1.479050   0.208021   7.110 1.21e-12 ***
## PlatformWiiU        1.543642   0.255747   6.036 1.62e-09 ***
## PlatformWS          0.516690   0.647972   0.797 0.425233    
## PlatformX360        1.564254   0.211597   7.393 1.51e-13 ***
## PlatformXB          0.743070   0.192103   3.868 0.000110 ***
## PlatformXOne        1.705254   0.249889   6.824 9.16e-12 ***
## GenreAdventure     -0.249809   0.051341  -4.866 1.15e-06 ***
## GenreFighting      -0.022232   0.060417  -0.368 0.712898    
## GenreMisc          -0.077425   0.046379  -1.669 0.095055 .  
## GenrePlatform       0.326986   0.059438   5.501 3.83e-08 ***
## GenrePuzzle        -0.172894   0.071061  -2.433 0.014983 *  
## GenreRacing         0.025259   0.052117   0.485 0.627922    
## GenreRole-Playing   0.100248   0.048617   2.062 0.039224 *  
## GenreShooter        0.223183   0.051189   4.360 1.31e-05 ***
## GenreSimulation    -0.066781   0.060161  -1.110 0.266998    
## GenreSports        -0.023984   0.042386  -0.566 0.571509    
## GenreStrategy      -0.243725   0.066609  -3.659 0.000254 ***
## Year               -0.053946   0.005417  -9.958  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.527 on 16284 degrees of freedom
##   (271 observations deleted due to missingness)
## Multiple R-squared:  0.0507, Adjusted R-squared:  0.04825 
## F-statistic: 20.71 on 42 and 16284 DF,  p-value: < 2.2e-16
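
The residuals above are strongly right-skewed (median -0.231, maximum 81.874), which suggests that a few blockbuster titles dominate the fit. One common response, shown here as an alternative rather than something the original analysis did, is to model log sales. The sketch uses simulated data, so the names toy, raw_model, and log_model and all values are illustrative.

```r
# Simulated stand-in for the real data; values are illustrative only.
set.seed(42)
n <- 300
toy <- data.frame(
  Platform = sample(c("PS2", "Wii", "X360"), n, replace = TRUE),
  Genre    = sample(c("Action", "Shooter", "Sports"), n, replace = TRUE),
  Year     = sample(2000:2015, n, replace = TRUE)
)
# Right-skewed, strictly positive sales so that log() is defined.
toy$Global_Sales <- round(rlnorm(n, meanlog = -1.5, sdlog = 1.2), 2) + 0.01

# Same formula as Model 1, on the raw and on the log scale.
raw_model <- lm(Global_Sales ~ Platform + Genre + Year, data = toy)
log_model <- lm(log(Global_Sales) ~ Platform + Genre + Year, data = toy)

# Compare how symmetric the residuals are in each fit.
summary(residuals(raw_model))
summary(residuals(log_model))
```

On the log scale, coefficients describe approximate proportional changes in sales rather than changes in millions of units, so they are not directly comparable with the estimates in Model 1.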

REGRESSION ANALYSIS - MODEL 2

# Regression model with Year, Platform, and interaction term
interaction_model <- lm(Global_Sales ~ Year * Platform, data = data)

# View model summary
summary(interaction_model)
## 
## Call:
## lm(formula = Global_Sales ~ Year * Platform, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.579 -0.446 -0.236  0.000 81.575 
## 
## Coefficients: (3 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        1.807e+02  1.420e+02   1.272  0.20323   
## Year              -9.077e-02  7.163e-02  -1.267  0.20510   
## Platform3DO       -2.205e+02  3.738e+03  -0.059  0.95296   
## Platform3DS       -6.411e+01  1.658e+02  -0.387  0.69900   
## PlatformDC        -7.343e+01  2.761e+02  -0.266  0.79026   
## PlatformDS        -1.278e+01  1.467e+02  -0.087  0.93058   
## PlatformGB         2.310e+02  1.604e+02   1.440  0.14978   
## PlatformGBA       -7.987e+01  1.596e+02  -0.500  0.61685   
## PlatformGC        -5.793e+01  1.706e+02  -0.340  0.73423   
## PlatformGEN        1.242e+03  5.658e+02   2.194  0.02822 * 
## PlatformGG         1.889e-01  1.690e+00   0.112  0.91105   
## PlatformN64        1.922e+02  2.000e+02   0.961  0.33647   
## PlatformNES        3.763e+02  1.799e+02   2.092  0.03645 * 
## PlatformNG        -1.080e+02  9.305e+02  -0.116  0.90759   
## PlatformPC        -1.231e+02  1.436e+02  -0.857  0.39166   
## PlatformPCFX       5.419e-01  1.829e+00   0.296  0.76699   
## PlatformPS        -1.901e+02  1.499e+02  -1.268  0.20470   
## PlatformPS2       -3.836e+01  1.443e+02  -0.266  0.79039   
## PlatformPS3       -7.254e+01  1.462e+02  -0.496  0.61980   
## PlatformPS4        7.119e+02  2.411e+02   2.952  0.00316 **
## PlatformPSP       -8.992e+01  1.466e+02  -0.613  0.53959   
## PlatformPSV       -4.814e+01  1.785e+02  -0.270  0.78733   
## PlatformSAT       -1.446e+02  2.294e+02  -0.630  0.52846   
## PlatformSCD        2.663e+03  3.343e+03   0.797  0.42572   
## PlatformSNES       2.926e+02  1.844e+02   1.587  0.11260   
## PlatformTG16       5.011e-01  1.428e+00   0.351  0.72557   
## PlatformWii        1.319e+02  1.526e+02   0.865  0.38719   
## PlatformWiiU      -1.673e+02  2.537e+02  -0.660  0.50954   
## PlatformWS        -9.042e+01  1.536e+03  -0.059  0.95305   
## PlatformX360      -1.843e+02  1.464e+02  -1.259  0.20799   
## PlatformXB        -1.019e+02  1.626e+02  -0.627  0.53071   
## PlatformXOne       3.469e+02  2.692e+02   1.289  0.19758   
## Year:Platform3DO   1.108e-01  1.874e+00   0.059  0.95287   
## Year:Platform3DS   3.311e-02  8.330e-02   0.398  0.69099   
## Year:PlatformDC    3.731e-02  1.384e-01   0.270  0.78747   
## Year:PlatformDS    7.361e-03  7.394e-02   0.100  0.92070   
## Year:PlatformGB   -1.142e-01  8.078e-02  -1.413  0.15764   
## Year:PlatformGBA   4.065e-02  8.037e-02   0.506  0.61300   
## Year:PlatformGC    2.969e-02  8.580e-02   0.346  0.72935   
## Year:PlatformGEN  -6.224e-01  2.840e-01  -2.191  0.02844 * 
## Year:PlatformGG           NA         NA      NA       NA   
## Year:PlatformN64  -9.547e-02  1.005e-01  -0.950  0.34209   
## Year:PlatformNES  -1.882e-01  9.067e-02  -2.076  0.03789 * 
## Year:PlatformNG    5.440e-02  4.666e-01   0.117  0.90718   
## Year:PlatformPC    6.223e-02  7.245e-02   0.859  0.39042   
## Year:PlatformPCFX         NA         NA      NA       NA   
## Year:PlatformPS    9.578e-02  7.554e-02   1.268  0.20484   
## Year:PlatformPS2   2.007e-02  7.278e-02   0.276  0.78274   
## Year:PlatformPS3   3.736e-02  7.370e-02   0.507  0.61223   
## Year:PlatformPS4  -3.518e-01  1.204e-01  -2.923  0.00347 **
## Year:PlatformPSP   4.571e-02  7.389e-02   0.619  0.53613   
## Year:PlatformPSV   2.505e-02  8.951e-02   0.280  0.77963   
## Year:PlatformSAT   7.281e-02  1.153e-01   0.632  0.52755   
## Year:PlatformSCD  -1.335e+00  1.677e+00  -0.796  0.42584   
## Year:PlatformSNES -1.462e-01  9.281e-02  -1.575  0.11529   
## Year:PlatformTG16         NA         NA      NA       NA   
## Year:PlatformWii  -6.448e-02  7.683e-02  -0.839  0.40136   
## Year:PlatformWiiU  8.443e-02  1.266e-01   0.667  0.50490   
## Year:PlatformWS    4.577e-02  7.679e-01   0.060  0.95248   
## Year:PlatformX360  9.298e-02  7.379e-02   1.260  0.20768   
## Year:PlatformXB    5.162e-02  8.180e-02   0.631  0.52801   
## Year:PlatformXOne -1.707e-01  1.342e-01  -1.272  0.20340   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.529 on 16268 degrees of freedom
##   (271 observations deleted due to missingness)
## Multiple R-squared:  0.04951,    Adjusted R-squared:  0.04612 
## F-statistic: 14.61 on 58 and 16268 DF,  p-value: < 2.2e-16
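
Three interaction terms above (Year:PlatformGG, Year:PlatformPCFX, Year:PlatformTG16) are NA "because of singularities". A likely explanation, offered here as an assumption, is that all titles on each of those platforms share a single release year, so a platform-specific slope cannot be separated from the platform's intercept. The toy data below reproduces that situation with a hypothetical single-year platform.

```r
# Toy data: TG16 has titles from one year only; values are illustrative.
toy <- data.frame(
  Platform = c("PS2", "PS2", "PS2", "Wii", "Wii", "Wii", "TG16", "TG16"),
  Year = c(2001, 2003, 2005, 2007, 2009, 2010, 1995, 1995),
  Global_Sales = c(0.5, 0.8, 0.3, 1.2, 0.9, 2.0, 0.04, 0.06)
)

# Same formula as Model 2.
fit <- lm(Global_Sales ~ Year * Platform, data = toy)

# The Year:PlatformTG16 slope cannot be estimated and is reported as NA.
coef(fit)

# Release years per platform: a single distinct year signals the problem.
tapply(toy$Year, toy$Platform, function(y) length(unique(y)))
```

Running the same tapply() check on the real data would confirm or rule out this explanation for the three NA rows.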

Conclusion

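The analysis points to several factors associated with video game sales, although their explanatory power is limited: both regression models account for only about 5% of the variance in global sales, and adding Year-by-Platform interactions does not improve the adjusted R-squared (0.046 for Model 2 versus 0.048 for Model 1). Titles in the group labelled "Multiple" (PS4, X360, and PC) sell slightly more on average than the rest (0.59 versus 0.53 million units; Welch t-test, p = 0.032). Because that grouping is defined by console rather than by whether a title shipped on several systems, it is only indirect evidence that multi-platform releases reach a broader audience. Genre effects are modest: relative to Action, the baseline genre, Platform and Shooter games (and, more weakly, Role-Playing games) sell more per title, Adventure, Puzzle, and Strategy games sell less, and Sports games do not differ significantly. Regionally, North America and Europe are the dominant contributors, with mean sales per title of 0.26 and 0.15 million units, compared with 0.08 in Japan and 0.05 in other regions. Over time, per-title sales trend downward rather than upward once platform and genre are held constant (Year coefficient of -0.054 in Model 1), even though the number of releases was highest between 2003 and 2010.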

With those caveats, game developers and publishers may benefit from multi-platform releases to reach more markets, though this should be confirmed with a title-level comparison. Genre choices should reflect consumer preferences: Platform and Shooter titles show the strongest per-title sales in the model, and emerging genres are worth monitoring for future opportunities. Marketing should prioritize the largest markets, North America and Europe, alongside localized opportunities in regions like Japan. Although this dataset cannot test it directly, publishers may also consider aligning game launches with new console releases and peak sales periods, such as holidays. Given the low explanatory power of the models, these strategies are best treated as data-informed starting points to test further rather than as a guaranteed roadmap for success in the competitive gaming industry.