What factors contribute most to the global sales of video games?
How do regional sales, platform, genre, and release timing impact overall sales performance?
Source - Kaggle
Link - https://www.kaggle.com/gregorut/videogamesales?select=vgsales.csv
The dataset I am working with is a collection of global video game sales from Kaggle (linked above). It records each game’s name, platform, genre, publisher, release year, and sales in North America, Europe, Japan, and other regions, along with the total global sales for each game. The Kaggle documentation details the sources and format of the data, providing background on the collection process and variable descriptions.
data <- read.csv("C:\\Users\\gajaw\\OneDrive\\Desktop\\STATS\\vgsales.csv")
summary(data)
## Rank Name Platform Year
## Min. : 1 Length:16598 Length:16598 Length:16598
## 1st Qu.: 4151 Class :character Class :character Class :character
## Median : 8300 Mode :character Mode :character Mode :character
## Mean : 8301
## 3rd Qu.:12450
## Max. :16600
## Genre Publisher NA_Sales EU_Sales
## Length:16598 Length:16598 Min. : 0.0000 Min. : 0.0000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000
## Mode :character Mode :character Median : 0.0800 Median : 0.0200
## Mean : 0.2647 Mean : 0.1467
## 3rd Qu.: 0.2400 3rd Qu.: 0.1100
## Max. :41.4900 Max. :29.0200
## JP_Sales Other_Sales Global_Sales
## Min. : 0.00000 Min. : 0.00000 Min. : 0.0100
## 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0600
## Median : 0.00000 Median : 0.01000 Median : 0.1700
## Mean : 0.07778 Mean : 0.04806 Mean : 0.5374
## 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.4700
## Max. :10.22000 Max. :10.57000 Max. :82.7400
A simple breakdown of the summary for each column:
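Rank:
The sales rank of each game, running from 1 to 16,600 across the 16,598 rows.
Name, Platform, Genre, Publisher:
Character columns with 16,598 entries each, identifying the game and the categories it belongs to.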
Year:
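Year is read in as a character column (see the summary above), so the figures below come from converting it to numeric.
Games in the dataset were released between 1980 and 2020.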
The median release year is 2007, with a mean around 2006. Most games were released between 2003 and 2010, indicating that this was a high-activity period in the gaming industry.
NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales:
These represent sales in millions of units for different regions (North America, Europe, Japan, other regions) and globally.
The average global sales per game is 0.54 million units.
Some games had very high sales, with the maximum global sales reaching 82.74 million units, while many games had minimal or no sales in certain regions, as indicated by the minimum values of 0 in several columns.
deviation_total_sales:
This derived column shows how far each game’s global sales deviate from the mean global sales (0.54 million units).
The maximum deviation is 82.2 million units (82.74 - 0.54), indicating a very large gap between the best-selling game and the average sales figure.
deviation_year:
This derived column measures how far each game’s release year deviates from the mean release year (around 2006).
The largest deviation is around 26 years (1980 against a mean of about 2006), reflecting release years that span 1980 to 2020.
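The code that created deviation_total_sales and deviation_year is not shown in this report. As a minimal sketch, assuming both columns are simple differences from the column mean, they could be built as below. The `toy` data frame and its values are made up for illustration.

```r
# Toy stand-in for the vgsales data (values are invented for illustration)
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Year = c("1985", "2006", "2010", "N/A"),
  Global_Sales = c(40.24, 82.74, 0.50, 0.02),
  stringsAsFactors = FALSE
)

# Convert Year to numeric; "N/A" becomes NA (the coercion warning is suppressed)
toy$Year_num <- suppressWarnings(as.numeric(toy$Year))

# Assumed definition: each game's sales minus the mean of Global_Sales
toy$deviation_total_sales <- toy$Global_Sales - mean(toy$Global_Sales, na.rm = TRUE)

# Assumed definition: each game's release year minus the mean release year
toy$deviation_year <- toy$Year_num - mean(toy$Year_num, na.rm = TRUE)

summary(toy[, c("deviation_total_sales", "deviation_year")])
```

Under this definition the deviations sum to zero, and games with an unknown year get an NA deviation.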
# Average and maximum global sales
mean_sales <- mean(data$Global_Sales, na.rm = TRUE)
max_sales <- max(data$Global_Sales, na.rm = TRUE)
year_range <- range(as.numeric(data$Year), na.rm = TRUE)
## Warning: NAs introduced by coercion
mean_sales
## [1] 0.5374407
max_sales
## [1] 82.74
year_range
## [1] 1980 2020
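The coercion warning above arises because Year is stored as text and some entries are not numbers. One possible way to avoid it, assuming the missing years are written as the literal string "N/A", is to declare that string as missing when reading the file. The inline CSV below is a made-up stand-in for vgsales.csv.

```r
# Small inline CSV standing in for vgsales.csv (rows are illustrative, not real data)
csv_text <- "Name,Platform,Year,Global_Sales
Game A,Wii,2006,1.50
Game B,PS2,N/A,0.20
Game C,DS,2008,0.75"

# Treat the string "N/A" as missing so that Year is parsed as a number
toy <- read.csv(text = csv_text, na.strings = "N/A", stringsAsFactors = FALSE)

str(toy$Year)                   # Year is now numeric, with NA for the unknown year
range(toy$Year, na.rm = TRUE)   # no coercion warning is raised
sum(is.na(toy$Year))            # count of games with an unknown year
```

With the real file, the same `na.strings` argument would be passed to the existing `read.csv()` call.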
The initial exploration of the dataset reveals that the average global sales per game are approximately 0.54 million units, with a highly skewed distribution where a few blockbuster titles significantly exceed this average. The highest-selling game reached 82.74 million units, showcasing the vast disparity between top-performing games and the majority of releases. The dataset indicates that the gaming industry was particularly active between 2003 and 2010, with a large number of games released during this period. This timeframe corresponds to the rise of popular consoles like PlayStation 2, Xbox 360, and Nintendo Wii, marking a peak in innovation and consumer interest in gaming. These findings set the stage for a deeper analysis of sales trends and the factors driving game success.
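The skew described above can be quantified by comparing the mean with the median and upper quantiles (in the summary output the mean is 0.54 against a median of 0.17). The sketch below uses simulated log-normal draws, not the real data, purely to show the calculations.

```r
# Simulated right-skewed sales (log-normal draws), NOT the real dataset
set.seed(1)
sales <- rlnorm(1000, meanlog = -1.5, sdlog = 1.3)

mean_sales <- mean(sales)       # pulled upward by a few very large values
median_sales <- median(sales)   # closer to a typical title
quantile(sales, probs = c(0.5, 0.9, 0.99))

# Share of total sales contributed by the top 1% of titles
top_n <- ceiling(0.01 * length(sales))
top_share <- sum(sort(sales, decreasing = TRUE)[1:top_n]) / sum(sales)
top_share
```

On the real data, `sales` would simply be `data$Global_Sales`.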
library(ggplot2)
ggplot(data, aes(x = Platform, y = Global_Sales)) +
  stat_summary(fun = mean, geom = "bar", fill = "blue") +
  labs(title = "Average Global Sales by Platform",
       x = "Platform",
       y = "Average Global Sales (in millions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Purpose: The visualization identifies high-performing platforms by analyzing their average sales, providing insights into which consoles dominate the market.
Platform Influence: Platforms with strong exclusive titles and a large user base, such as PlayStation and Xbox, are expected to rank high in average sales.
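Key Observations: Nintendo platforms such as the Wii, NES, or Game Boy may show high averages due to their targeted audiences and blockbuster titles. The Nintendo Switch is not among the platform codes in this dataset, so it does not appear in the chart.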
Trend Relevance: Understanding platform success helps developers and publishers decide which consoles to prioritize for game releases.
Analysis Context: Results reflect both platform popularity and the quality of games released on each system, helping identify strategic opportunities.
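A bar chart of means hides how many titles each platform has and how skewed its sales are. As a rough companion to the plot, a per-platform table of mean, median, and title count could be computed as follows. The `toy` data frame is invented for illustration.

```r
# Toy data standing in for the Platform and Global_Sales columns
toy <- data.frame(
  Platform = c("Wii", "Wii", "PS2", "PS2", "GB", "GB"),
  Global_Sales = c(3.0, 0.2, 0.5, 0.3, 2.0, 1.0),
  stringsAsFactors = FALSE
)

# tapply() returns one value per platform, in sorted platform order
means <- tapply(toy$Global_Sales, toy$Platform, mean)
by_platform <- data.frame(
  Platform = names(means),
  mean_sales = as.vector(means),
  median_sales = as.vector(tapply(toy$Global_Sales, toy$Platform, median)),
  n_titles = as.vector(tapply(toy$Global_Sales, toy$Platform, length))
)

# Sort from highest to lowest mean sales
by_platform[order(-by_platform$mean_sales), ]
```

A platform with a high mean but very few titles is driven by one or two games, which the count column makes visible.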
ggplot(data, aes(x = Genre, y = Global_Sales)) +
  geom_boxplot(fill = "green") +
  labs(title = "Global Sales Distribution by Genre",
       x = "Genre",
       y = "Global Sales (in millions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Purpose: The boxplot highlights which genres consistently perform well globally and which have more variable success, helping developers and marketers focus their efforts.
Key Observations: Action and Sports games are expected to have higher medians due to their broad appeal and consistent market demand, while niche genres like Puzzle or Simulation may show greater variability.
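Outliers: Outliers represent exceptionally high-performing games in each genre (e.g., blockbuster franchises like “FIFA” for Sports or “The Legend of Zelda” among action titles; the dataset has separate Action and Adventure genres rather than a combined Action-Adventure category).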
Relevance: Understanding sales trends by genre aids publishers in aligning development with consumer preferences and identifying untapped opportunities.
Comparison: The visualization allows for direct comparisons between genres, revealing which categories contribute most to global sales and why.
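Because a few blockbusters stretch the y-axis, the boxes in a plot like this tend to be squeezed near zero. A numeric summary of median, interquartile range, and maximum per genre can make the comparison easier to read. This is a minimal sketch on invented data; with ggplot2, adding `scale_y_log10()` to the plot is another option.

```r
# Toy data: three genres, each with one extreme outlier
toy <- data.frame(
  Genre = rep(c("Action", "Puzzle", "Sports"), each = 4),
  Global_Sales = c(0.1, 0.3, 0.5, 20.0,
                   0.05, 0.1, 0.2, 30.0,
                   0.2, 0.4, 0.6, 8.0)
)

# Median, spread, and maximum of sales within each genre
genre_summary <- aggregate(Global_Sales ~ Genre, data = toy, FUN = median)
names(genre_summary)[2] <- "median_sales"
genre_summary$iqr_sales <- aggregate(Global_Sales ~ Genre, data = toy, FUN = IQR)$Global_Sales
genre_summary$max_sales <- aggregate(Global_Sales ~ Genre, data = toy, FUN = max)$Global_Sales
genre_summary
```

In the toy data the genre with the largest single hit (Puzzle) also has the lowest median, which is the kind of pattern a mean-based comparison would hide.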
# Proxy for multi-platform releases: rows on PS4, X360, or PC are labelled "Multiple", all other platforms "Single"
data$Platform_Type <- ifelse(data$Platform %in% c("PS4", "X360", "PC"), "Multiple", "Single")
# Two-sample t-test
t_test_result <- t.test(Global_Sales ~ Platform_Type, data = data)
# View results
t_test_result
##
## Welch Two Sample t-test
##
## data: Global_Sales by Platform_Type
## t = 2.1492, df = 3925.5, p-value = 0.03168
## alternative hypothesis: true difference in means between group Multiple and group Single is not equal to 0
## 95 percent confidence interval:
## 0.005693901 0.124042290
## sample estimates:
## mean in group Multiple mean in group Single
## 0.5922999 0.5274318
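Purpose of T-Test: To statistically evaluate whether games on the platforms labelled “Multiple” (PS4, X360, PC) have different mean global sales from games on all other platforms. This label is only a rough proxy for multi-platform releases, because it is based on the platform of each row rather than on how many platforms a title was released on.
Significance of Results: With a p-value of 0.0317 (below 0.05), we reject the null hypothesis of equal means; the “Multiple” group has the higher mean (0.59 versus 0.53 million units).
Practical Implications: The difference in mean sales is modest (about 0.06 million units) and the sales distributions are heavily skewed, so the practical benefit is small.
Confidence Interval: The 95% interval (0.006 to 0.124 million units) does not include zero, consistent with the p-value, although its lower end is close to zero.
Recommendation: The result is compatible with a benefit from multi-platform strategies, but confirming it requires classifying titles by the number of platforms they were actually released on.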
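A more direct way to label multi-platform titles is to count the distinct platforms on which each game name appears. The sketch below shows one way to do this, assuming that the same title carries an identical Name on every platform (an assumption that may not hold for remakes or regional variants). The `toy` rows are invented.

```r
# Toy data: one row per (title, platform) release
toy <- data.frame(
  Name = c("Game A", "Game A", "Game A", "Game B", "Game C", "Game C", "Game D"),
  Platform = c("PS3", "X360", "PC", "Wii", "PS2", "XB", "DS"),
  Global_Sales = c(2.0, 1.8, 0.3, 5.0, 0.6, 0.4, 0.2),
  stringsAsFactors = FALSE
)

# Number of distinct platforms each title appears on
n_platforms <- tapply(toy$Platform, toy$Name, function(p) length(unique(p)))

# One row per title, with sales summed across all of its platform releases
titles <- aggregate(Global_Sales ~ Name, data = toy, FUN = sum)
titles$n_platforms <- as.vector(n_platforms[titles$Name])
titles$Platform_Type <- ifelse(titles$n_platforms > 1, "Multiple", "Single")
titles

# With the full data, the comparison could then be rerun, for example:
# t.test(Global_Sales ~ Platform_Type, data = titles)
```

Summing sales per title changes the unit of analysis from a release to a game, which is a design choice worth stating alongside any result.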
data$Platform_Type <- ifelse(data$Platform %in% c("PS4", "X360", "PC"), "Multiple", "Single")
ggplot(data, aes(x = Platform_Type, y = Global_Sales)) +
  geom_boxplot(fill = "yellow") +
  labs(title = "Global Sales: Single vs. Multiple Platforms",
       x = "Platform Type",
       y = "Global Sales (in millions)")
Purpose: This plot compares the sales distributions of the two Platform_Type groups defined above, as a rough check on whether broader accessibility is associated with higher sales.
Expected Trend: If multi-platform availability helps, the “Multiple” group should show a higher median. Because sales are heavily right-skewed, most of each box will sit close to zero and the outliers will dominate the plot.
Key Observations: Single-platform games may show outliers, especially exclusive titles (e.g., Nintendo exclusives like “The Legend of Zelda”).
Business Implication: Multi-platform releases are advantageous for reaching diverse player bases, whereas exclusivity strategies cater to platform loyalty.
Actionable Insight: This analysis can guide publishers in deciding whether to prioritize exclusivity or multi-platform accessibility for new titles.
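Since the sales distribution is strongly right-skewed, a comparison of means can be dominated by a handful of blockbusters. As a hedged sketch of two alternatives, a rank-based Wilcoxon test and a Welch t-test on log sales are shown below on simulated data (not the real dataset). Taking logs is possible here because the minimum of Global_Sales in the summary is 0.01, so no value is zero.

```r
# Simulated skewed sales for two groups, NOT the real dataset
set.seed(42)
toy <- data.frame(
  Platform_Type = rep(c("Multiple", "Single"), each = 200),
  Global_Sales = c(rlnorm(200, meanlog = -1.2, sdlog = 1.2),
                   rlnorm(200, meanlog = -1.5, sdlog = 1.2))
)

# Rank-based comparison, less sensitive to a few extreme values
rank_test <- wilcox.test(Global_Sales ~ Platform_Type, data = toy)
rank_test

# Welch t-test on log sales, which compares geometric rather than arithmetic means
log_test <- t.test(log(Global_Sales) ~ Platform_Type, data = toy)
log_test
```

If these tests and the original t-test disagree on the real data, that would suggest the difference in means is driven mainly by outliers.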
# Ensure the Year column is numeric
data$Year <- as.numeric(data$Year)
## Warning: NAs introduced by coercion
# Linear regression model with Year as the explanatory variable
sales_trend_model <- lm(Global_Sales ~ Year, data = data)
# View summary of the regression model
summary(sales_trend_model)
##
## Call:
## lm(formula = Global_Sales ~ Year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.950 -0.458 -0.338 -0.058 82.192
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.818104 4.206323 9.704 <2e-16 ***
## Year -0.020075 0.002096 -9.576 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.561 on 16325 degrees of freedom
## (271 observations deleted due to missingness)
## Multiple R-squared: 0.005585, Adjusted R-squared: 0.005524
## F-statistic: 91.69 on 1 and 16325 DF, p-value: < 2.2e-16
Statistical Significance:
The Year coefficient is statistically significant (p-value < 2e-16), meaning there is a significant relationship between Year and Global_Sales.
The t-value of -9.576 reflects how precisely the slope is estimated with a large sample (16,327 games with a known year); it does not by itself indicate a strong relationship.
Fit of the Model:
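Adjusted R^2: A value of 0.0055 indicates that the model explains less than 1% of the variability in global sales, suggesting a weak relationship. While the trend is statistically significant, it does not capture most of the variation in sales.
Residual Standard Error: 1.561 million units represents the typical deviation of observed global sales from the model’s predictions.
Residual Analysis:
Residuals range from -0.950 to 82.192 with a median of -0.338, so they are strongly right-skewed: most games sell slightly less than the fitted value, while a few blockbusters sell far more. A straight line on the raw sales scale therefore fits the bulk of the data poorly.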
Trend Insights:
The negative coefficient for Year implies a decreasing trend in average global sales over time. This may reflect:
Market Saturation: Over time, the industry might have reached a point of saturation.
Shift to Digital Sales: The dataset may not account for digital-only sales that became more prominent in later years.
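To make the size of the fitted trend concrete, the reported coefficients can be plugged into the regression equation. The sketch below copies the intercept and slope from the model output above; the helper name `predict_sales` is introduced here only for illustration.

```r
# Coefficients copied from the summary(sales_trend_model) output above
intercept <- 40.818104
slope_year <- -0.020075

# Fitted mean global sales (millions of units) for a given release year
predict_sales <- function(year) intercept + slope_year * year

predict_sales(c(1985, 2015))   # about 0.97 and 0.37 million units
slope_year * 10                # implied change per decade, about -0.2 million units
```

The fitted mean falls by roughly 0.6 million units over 30 years, which is small next to the residual standard error of 1.561.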
ggplot(data, aes(x = as.numeric(Year), y = Global_Sales)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Global Sales Over the Years",
       x = "Year",
       y = "Global Sales (in millions)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 271 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 271 rows containing missing values or values outside the scale range
## (`geom_point()`).
Purpose: This plot examines whether per-title global sales have risen or fallen over time, for example in response to advances in technology (graphics, multiplayer features) and marketing.
Expected Trend: An upward trend might be anticipated as newer games benefit from larger markets and better technology, but the fitted line slopes downward, matching the regression above (about -0.02 million units per year).
Key Observations: A small number of titles with exceptional sales (e.g., classics like “Super Mario Bros.”) stand far above the bulk of the points, which cluster near zero in every year. The 271 games with a missing year are dropped from the plot.
Industry Implication: A declining per-title average does not by itself mean a shrinking market. More titles sharing the market in later years and digital sales missing from the dataset could both lower the average.
Insight for Developers: Release year alone explains less than 1% of the variation in sales, so it is a weak guide to how well an individual game will perform.
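One way to separate "sales per title" from "sales overall" is to aggregate by year and look at the number of titles, total sales, and mean sales side by side. This is a minimal sketch on invented numbers; the column names in `per_year` are chosen here for illustration.

```r
# Toy data: few titles in an early year, many in a later year
toy <- data.frame(
  Year = c(1985, 1985, 2008, 2008, 2008, 2008),
  Global_Sales = c(40.0, 2.0, 1.0, 0.5, 0.3, 0.2)
)

# Titles released, total sales, and mean sales per title for each year
per_year <- data.frame(
  Year = sort(unique(toy$Year)),
  n_titles = as.vector(table(toy$Year)),
  total_sales = as.vector(tapply(toy$Global_Sales, toy$Year, sum)),
  mean_sales = as.vector(tapply(toy$Global_Sales, toy$Year, mean))
)
per_year
```

On the real data, plotting `n_titles` and `total_sales` against Year would show whether a falling mean coincides with a growing number of releases.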
# Regression model with platform, genre, and year
final_model <- lm(Global_Sales ~ Platform + Genre + Year, data = data)
# View model summary
summary(final_model)
##
## Call:
## lm(formula = Global_Sales ~ Platform + Genre + Year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.921 -0.460 -0.231 0.040 81.874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.626425 10.740555 10.021 < 2e-16 ***
## Platform3DO 0.174396 0.896411 0.195 0.845748
## Platform3DS 1.470262 0.229774 6.399 1.61e-10 ***
## PlatformDC 0.597409 0.273364 2.185 0.028874 *
## PlatformDS 1.144613 0.203421 5.627 1.87e-08 ***
## PlatformGB 2.654543 0.224107 11.845 < 2e-16 ***
## PlatformGBA 0.787729 0.190363 4.138 3.52e-05 ***
## PlatformGC 0.772435 0.194888 3.963 7.42e-05 ***
## PlatformGEN 0.872426 0.332332 2.625 0.008669 **
## PlatformGG -0.452941 1.535902 -0.295 0.768072
## PlatformN64 0.853398 0.189236 4.510 6.54e-06 ***
## PlatformNES 2.046023 0.212122 9.645 < 2e-16 ***
## PlatformNG 0.111288 0.470991 0.236 0.813215
## PlatformPC 1.039675 0.209595 4.960 7.11e-07 ***
## PlatformPCFX -0.020418 1.536482 -0.013 0.989397
## PlatformPS 0.770731 0.172568 4.466 8.01e-06 ***
## PlatformPS2 1.098724 0.190334 5.773 7.95e-09 ***
## PlatformPS3 1.563082 0.214803 7.277 3.57e-13 ***
## PlatformPS4 1.881126 0.242378 7.761 8.92e-15 ***
## PlatformPSP 1.020643 0.207107 4.928 8.38e-07 ***
## PlatformPSV 1.205750 0.235408 5.122 3.06e-07 ***
## PlatformSAT 0.280216 0.199198 1.407 0.159531
## PlatformSCD 0.235627 0.643184 0.366 0.714113
## PlatformSNES 0.747928 0.185208 4.038 5.41e-05 ***
## PlatformTG16 0.089196 1.091911 0.082 0.934896
## PlatformWii 1.479050 0.208021 7.110 1.21e-12 ***
## PlatformWiiU 1.543642 0.255747 6.036 1.62e-09 ***
## PlatformWS 0.516690 0.647972 0.797 0.425233
## PlatformX360 1.564254 0.211597 7.393 1.51e-13 ***
## PlatformXB 0.743070 0.192103 3.868 0.000110 ***
## PlatformXOne 1.705254 0.249889 6.824 9.16e-12 ***
## GenreAdventure -0.249809 0.051341 -4.866 1.15e-06 ***
## GenreFighting -0.022232 0.060417 -0.368 0.712898
## GenreMisc -0.077425 0.046379 -1.669 0.095055 .
## GenrePlatform 0.326986 0.059438 5.501 3.83e-08 ***
## GenrePuzzle -0.172894 0.071061 -2.433 0.014983 *
## GenreRacing 0.025259 0.052117 0.485 0.627922
## GenreRole-Playing 0.100248 0.048617 2.062 0.039224 *
## GenreShooter 0.223183 0.051189 4.360 1.31e-05 ***
## GenreSimulation -0.066781 0.060161 -1.110 0.266998
## GenreSports -0.023984 0.042386 -0.566 0.571509
## GenreStrategy -0.243725 0.066609 -3.659 0.000254 ***
## Year -0.053946 0.005417 -9.958 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.527 on 16284 degrees of freedom
## (271 observations deleted due to missingness)
## Multiple R-squared: 0.0507, Adjusted R-squared: 0.04825
## F-statistic: 20.71 on 42 and 16284 DF, p-value: < 2.2e-16
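Platform Insights: Each platform coefficient is measured relative to the omitted reference platform (the first factor level, which does not appear in the table). The Wii coefficient (1.48) is positive and significant, but it is not the largest: GB (2.65), NES (2.05), and PS4 (1.88) are higher, so the model does not single out the Wii.
Genre Trends: Action is the reference genre, so it has no coefficient of its own. Relative to Action, Platform (+0.33), Shooter (+0.22), and Role-Playing (+0.10) games sell significantly more per title, while Adventure (-0.25), Strategy (-0.24), and Puzzle (-0.17) sell significantly less.
Year Trend: The coefficient for Year is negative (-0.054), indicating that, with platform and genre held fixed, average sales per title fall by about 0.05 million units for each later release year.
Non-Significant Coefficients: The platforms 3DO, GG, NG, PCFX, SAT, SCD, TG16, and WS and the genres Fighting, Misc, Racing, Simulation, and Sports have p-values above 0.05, so their sales cannot be distinguished from those of the reference levels.
Recommendations:
Treat significant coefficients as associations rather than guarantees: the adjusted R^2 of 0.048 means the model explains under 5% of the variation in sales.
Within that limit, Platform and Shooter titles, and platforms with large coefficients such as GB, NES, PS4, and Wii, are associated with higher sales per title.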
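Coefficients for a categorical predictor in `lm()` are differences from a reference level, which by default is the first factor level. The sketch below, on simulated data, shows how `relevel()` changes that reference so that other platforms are compared against the Wii. The level name "2600" in the toy data is an assumption about what the first platform level might look like.

```r
# Simulated data with three platforms, NOT the real dataset
set.seed(7)
toy <- data.frame(
  Platform = rep(c("2600", "PS2", "Wii"), each = 30),
  Global_Sales = c(rlnorm(30, -0.5, 1), rlnorm(30, -0.8, 1), rlnorm(30, -0.3, 1))
)
toy$Platform <- factor(toy$Platform)
levels(toy$Platform)[1]           # default reference level used by lm()

fit_default <- lm(Global_Sales ~ Platform, data = toy)
coef(fit_default)                 # PlatformPS2 and PlatformWii are differences from "2600"

# Make Wii the reference so every other platform is compared against it
toy$Platform <- relevel(toy$Platform, ref = "Wii")
fit_wii <- lm(Global_Sales ~ Platform, data = toy)
coef(fit_wii)                     # Platform2600 and PlatformPS2 are differences from Wii
```

Changing the reference level does not change the model fit, only which comparisons the coefficient table reports.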
# Regression model with Year, Platform, and interaction term
interaction_model <- lm(Global_Sales ~ Year * Platform, data = data)
# View model summary
summary(interaction_model)
##
## Call:
## lm(formula = Global_Sales ~ Year * Platform, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.579 -0.446 -0.236 0.000 81.575
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.807e+02 1.420e+02 1.272 0.20323
## Year -9.077e-02 7.163e-02 -1.267 0.20510
## Platform3DO -2.205e+02 3.738e+03 -0.059 0.95296
## Platform3DS -6.411e+01 1.658e+02 -0.387 0.69900
## PlatformDC -7.343e+01 2.761e+02 -0.266 0.79026
## PlatformDS -1.278e+01 1.467e+02 -0.087 0.93058
## PlatformGB 2.310e+02 1.604e+02 1.440 0.14978
## PlatformGBA -7.987e+01 1.596e+02 -0.500 0.61685
## PlatformGC -5.793e+01 1.706e+02 -0.340 0.73423
## PlatformGEN 1.242e+03 5.658e+02 2.194 0.02822 *
## PlatformGG 1.889e-01 1.690e+00 0.112 0.91105
## PlatformN64 1.922e+02 2.000e+02 0.961 0.33647
## PlatformNES 3.763e+02 1.799e+02 2.092 0.03645 *
## PlatformNG -1.080e+02 9.305e+02 -0.116 0.90759
## PlatformPC -1.231e+02 1.436e+02 -0.857 0.39166
## PlatformPCFX 5.419e-01 1.829e+00 0.296 0.76699
## PlatformPS -1.901e+02 1.499e+02 -1.268 0.20470
## PlatformPS2 -3.836e+01 1.443e+02 -0.266 0.79039
## PlatformPS3 -7.254e+01 1.462e+02 -0.496 0.61980
## PlatformPS4 7.119e+02 2.411e+02 2.952 0.00316 **
## PlatformPSP -8.992e+01 1.466e+02 -0.613 0.53959
## PlatformPSV -4.814e+01 1.785e+02 -0.270 0.78733
## PlatformSAT -1.446e+02 2.294e+02 -0.630 0.52846
## PlatformSCD 2.663e+03 3.343e+03 0.797 0.42572
## PlatformSNES 2.926e+02 1.844e+02 1.587 0.11260
## PlatformTG16 5.011e-01 1.428e+00 0.351 0.72557
## PlatformWii 1.319e+02 1.526e+02 0.865 0.38719
## PlatformWiiU -1.673e+02 2.537e+02 -0.660 0.50954
## PlatformWS -9.042e+01 1.536e+03 -0.059 0.95305
## PlatformX360 -1.843e+02 1.464e+02 -1.259 0.20799
## PlatformXB -1.019e+02 1.626e+02 -0.627 0.53071
## PlatformXOne 3.469e+02 2.692e+02 1.289 0.19758
## Year:Platform3DO 1.108e-01 1.874e+00 0.059 0.95287
## Year:Platform3DS 3.311e-02 8.330e-02 0.398 0.69099
## Year:PlatformDC 3.731e-02 1.384e-01 0.270 0.78747
## Year:PlatformDS 7.361e-03 7.394e-02 0.100 0.92070
## Year:PlatformGB -1.142e-01 8.078e-02 -1.413 0.15764
## Year:PlatformGBA 4.065e-02 8.037e-02 0.506 0.61300
## Year:PlatformGC 2.969e-02 8.580e-02 0.346 0.72935
## Year:PlatformGEN -6.224e-01 2.840e-01 -2.191 0.02844 *
## Year:PlatformGG NA NA NA NA
## Year:PlatformN64 -9.547e-02 1.005e-01 -0.950 0.34209
## Year:PlatformNES -1.882e-01 9.067e-02 -2.076 0.03789 *
## Year:PlatformNG 5.440e-02 4.666e-01 0.117 0.90718
## Year:PlatformPC 6.223e-02 7.245e-02 0.859 0.39042
## Year:PlatformPCFX NA NA NA NA
## Year:PlatformPS 9.578e-02 7.554e-02 1.268 0.20484
## Year:PlatformPS2 2.007e-02 7.278e-02 0.276 0.78274
## Year:PlatformPS3 3.736e-02 7.370e-02 0.507 0.61223
## Year:PlatformPS4 -3.518e-01 1.204e-01 -2.923 0.00347 **
## Year:PlatformPSP 4.571e-02 7.389e-02 0.619 0.53613
## Year:PlatformPSV 2.505e-02 8.951e-02 0.280 0.77963
## Year:PlatformSAT 7.281e-02 1.153e-01 0.632 0.52755
## Year:PlatformSCD -1.335e+00 1.677e+00 -0.796 0.42584
## Year:PlatformSNES -1.462e-01 9.281e-02 -1.575 0.11529
## Year:PlatformTG16 NA NA NA NA
## Year:PlatformWii -6.448e-02 7.683e-02 -0.839 0.40136
## Year:PlatformWiiU 8.443e-02 1.266e-01 0.667 0.50490
## Year:PlatformWS 4.577e-02 7.679e-01 0.060 0.95248
## Year:PlatformX360 9.298e-02 7.379e-02 1.260 0.20768
## Year:PlatformXB 5.162e-02 8.180e-02 0.631 0.52801
## Year:PlatformXOne -1.707e-01 1.342e-01 -1.272 0.20340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.529 on 16268 degrees of freedom
## (271 observations deleted due to missingness)
## Multiple R-squared: 0.04951, Adjusted R-squared: 0.04612
## F-statistic: 14.61 on 58 and 16268 DF, p-value: < 2.2e-16
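The regression model evaluates how Year, Platform, and their interaction (Year:Platform) influence Global_Sales. Including the interaction term allows us to explore whether the effect of Year on sales differs across platforms.
Interaction Term (Year:Platform):
Significant interaction terms (p-value < 0.05) suggest that the effect of Year on Global_Sales varies by platform. In this output only three are significant, all negative: GEN (-0.622, p = 0.028), NES (-0.188, p = 0.038), and PS4 (-0.352, p = 0.003). The Year slope for the reference platform (-0.091) is not significant (p = 0.205). In general:
A positive interaction term indicates that a platform’s sales improved over time more than the baseline platform.
A negative interaction term suggests that a platform’s sales declined more steeply over time compared to the baseline.
Adjusted R^2:
The adjusted R^2 is 0.046, so the model explains under 5% of the variation in global sales, slightly less than the model with Platform, Genre, and Year (0.048).
Residual Analysis:
Residuals range from -3.579 to 81.575 with a median of -0.236, so they remain strongly right-skewed. Three interaction coefficients (GG, PCFX, TG16) are reported as NA because of singularities, meaning they could not be estimated from the data.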
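Two points about the interaction model can be illustrated on simulated data: an interaction slope cannot be estimated for a platform whose games all share one release year (a likely cause of NA coefficients), and `anova()` can test whether the interaction terms improve on an additive model. The platform names and values below are invented.

```r
# Simulated data, NOT the real dataset; all TG16 titles share a single year
set.seed(3)
toy <- data.frame(
  Platform = rep(c("PS2", "Wii", "TG16"), times = c(40, 40, 3)),
  Year = c(sample(2000:2010, 40, replace = TRUE),
           sample(2006:2012, 40, replace = TRUE),
           rep(1995, 3))
)
toy$Global_Sales <- rlnorm(nrow(toy), meanlog = -1, sdlog = 1)

additive <- lm(Global_Sales ~ Year + Platform, data = toy)
interact <- lm(Global_Sales ~ Year * Platform, data = toy)

# With no variation in Year within TG16, its Year:Platform slope is NA
coef(interact)

# F-test of whether the interaction terms improve on the additive model
anova(additive, interact)
```

On the real data, the same comparison would use `lm(Global_Sales ~ Year + Platform, data = data)` as the additive model.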
The analysis points to several factors associated with global video game sales. Games on PS4, X360, and PC, used here as a proxy for multi-platform releases, have slightly higher mean global sales than games on other platforms (0.59 versus 0.53 million units, p = 0.03). In the regression with platform, genre, and year, Platform and Shooter games sell more per title than the Action baseline, while Adventure, Strategy, and Puzzle games sell less; Sports does not differ significantly from Action. North America and Europe are the largest contributors, accounting for roughly 49% and 27% of mean global sales per title, with Japan at about 14%. Per-title sales decline slightly with release year rather than rising, and the fitted models explain only about 5% of the variation in sales, so most of a game’s success depends on factors outside this dataset.
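The regional contributions can be worked out from the column means in the summary output. The sketch below copies those means directly; because every column has the same number of rows, the ratio of means equals the ratio of totals.

```r
# Mean sales per title (millions of units), copied from the summary(data) output
region_means <- c(NA_Sales = 0.2647, EU_Sales = 0.1467,
                  JP_Sales = 0.07778, Other_Sales = 0.04806)
global_mean <- 0.5374

# Percentage of global sales attributable to each region
region_share <- round(100 * region_means / global_mean, 1)
region_share

# With the full data, an equivalent calculation would be:
# colSums(data[, c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")]) / sum(data$Global_Sales)
```

This gives about 49% for North America, 27% for Europe, 14% for Japan, and 9% for other regions.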
To improve their chances of success, game developers and publishers can consider multi-platform releases to reach more players, while keeping in mind that the evidence here rests on a rough platform-based proxy. Investment in genres with higher per-title sales, such as Platform and Shooter, aligns with these results, and emerging genres are worth monitoring for future opportunities. Marketing should reflect the weight of the largest regions, North America and Europe, alongside localized opportunities in regions like Japan. Aligning game launches with new console releases and peak sales periods, such as holidays, may also help, although this dataset records only the release year and cannot test that idea. Because the models explain only a small share of the variation in sales, these strategies are best treated as starting points for further analysis rather than as guarantees of success in the competitive gaming industry.