Dataset Overview and Source

Video Game Sales Dataset (1980–2024)

This analysis explores sales trends and critic scores across thousands of video games released over more than four decades.

Data Source: Kaggle - Video Game Sales 1980 – 2024

Key Variables:

  • title: Name of the game
  • genre: Game genre (Action, Sports, RPG, etc.)
  • console: Platform the game was released on
  • total_sales: Units sold worldwide in millions
  • critic_score: Review score (1–10 scale)
  • release_year: Year the game was released

R Code for Data Preparation

Here’s how the data is loaded and filtered:

library(ggplot2)
library(plotly)
library(dplyr)

df <- read.csv("Video_Games_Sales_Cleaned.csv",
               stringsAsFactors = FALSE)

# Keep top 8 genres for cleaner visuals
top_genres <- df %>%
  count(genre, sort = TRUE) %>%
  slice_head(n = 8) %>%
  pull(genre)

df_genres <- df %>% filter(genre %in% top_genres)

3D Plotly: Release Year, Critic Score & Sales

3D Plot Analysis

Key Observations:

  • Release Year Pattern: The bulk of games cluster between 2005 and 2012, reflecting the peak of the PS3/Xbox 360/Wii generation, the most prolific period in this dataset.

  • Critic Score Concentration: Scores are tightly packed between 7 and 8 across all years and genres, which may reflect score inflation or converging review standards across the industry.

  • Sales Outliers: A small number of titles, mainly Action and Sports games, spike far above 5 million units, highlighting how blockbuster franchises account for a disproportionate share of unit sales.

  • Combined View: The 3D perspective reveals that high sales are not uniformly tied to high scores; release era and genre appear to matter as well.
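
The 3D view described above could be produced along the following lines. This is a minimal sketch, not the original plotting code: the `demo` data frame is randomly generated stand-in data, and the axis titles and marker settings are assumptions. In the real analysis, `df_genres` would take the place of `demo`.

```r
library(plotly)

# Hypothetical stand-in data; replace `demo` with df_genres in the real analysis
set.seed(1)
demo <- data.frame(
  release_year = sample(1980:2024, 200, replace = TRUE),
  critic_score = round(runif(200, 1, 10), 1),
  total_sales  = round(rexp(200, rate = 2), 2),
  genre        = sample(c("Action", "Sports", "Shooter"), 200, replace = TRUE)
)

# One marker per game: year on x, score on y, sales on z, colored by genre
fig <- plot_ly(
  data = demo,
  x = ~release_year, y = ~critic_score, z = ~total_sales,
  color = ~genre,
  type = "scatter3d", mode = "markers",
  marker = list(size = 3, opacity = 0.7)
) %>%
  layout(scene = list(
    xaxis = list(title = "Release year"),
    yaxis = list(title = "Critic score"),
    zaxis = list(title = "Total sales (millions)")
  ))

# Call print(fig) or evaluate `fig` interactively to render the widget
```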

Plotly Scatter: Critic Score vs. Sales by Genre

Scatter Plot Analysis

Key Observations:

  • Weak Positive Trend: Higher critic scores generally correspond to higher sales, but the relationship is loose - many high-scoring games sell modestly while some average-scoring titles sell millions.

  • Action Dominance: Action games appear most frequently across all score ranges and achieve the highest individual sales peaks, consistent with their large share of overall sales.

  • Sports Games: Sports titles cluster at moderate scores (7–8) but still achieve strong sales, likely driven by franchise loyalty rather than critical reception.

  • Score Ceiling Effect: Very few games score above 9, and those that do tend to be iconic titles from major publishers. Even so, a high score alone does not guarantee strong sales.
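
One way to put a number on the "weak positive trend" noted above is a correlation coefficient. The sketch below uses base R's `cor()`; the helper name `score_sales_cor` and the `toy` data frame are hypothetical and only illustrate the call. The Spearman variant is included because rank correlation is less sensitive to a few blockbuster outliers.

```r
# Hypothetical helper: Pearson and Spearman correlation of score vs. sales,
# computed on rows where both values are present
score_sales_cor <- function(data) {
  ok <- complete.cases(data[, c("critic_score", "total_sales")])
  c(
    pearson  = cor(data$critic_score[ok], data$total_sales[ok]),
    spearman = cor(data$critic_score[ok], data$total_sales[ok],
                   method = "spearman")
  )
}

# Toy values for illustration only, not taken from the dataset
toy <- data.frame(
  critic_score = c(5.0, 6.5, 7.0, 7.5, 8.0, 9.0, NA),
  total_sales  = c(0.10, 0.05, 2.00, 0.20, 0.30, 0.40, 1.00)
)

res <- score_sales_cor(toy)
res
```

On the real data this would be called as `score_sales_cor(df)`, assuming the column names listed under Key Variables.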

ggplot Bar: Total Sales by Genre
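
A bar chart of this kind might be built as sketched below. This is an illustrative reconstruction under assumptions, not the original plotting code: the `toy` data frame stands in for `df_genres`, and the fill color, axis labels, and horizontal layout are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for df_genres (illustrative values only)
toy <- data.frame(
  genre       = c("Action", "Action", "Sports", "Sports", "Shooter"),
  total_sales = c(1.2, 0.3, 0.8, 0.4, 2.0)
)

# Sum unit sales within each genre
genre_totals <- toy %>%
  group_by(genre) %>%
  summarise(sales = sum(total_sales), .groups = "drop")

# Horizontal bars ordered by total sales
p_bar <- ggplot(genre_totals, aes(x = reorder(genre, sales), y = sales)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = NULL, y = "Total sales (millions of units)",
       title = "Total Sales by Genre")
```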

ggplot Boxplot: Critic Score Distribution by Genre
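
A boxplot of critic scores by genre could be sketched as follows. The `toy` data frame is a hypothetical stand-in for `df_genres`, and the labels are assumptions rather than the original plot settings.

```r
library(ggplot2)

# Toy stand-in for df_genres (illustrative values only)
toy <- data.frame(
  genre        = rep(c("Action", "Sports", "Shooter"), each = 4),
  critic_score = c(6.5, 7.0, 7.5, 8.5,
                   7.0, 7.2, 7.5, 8.0,
                   6.0, 7.0, 7.8, 9.0)
)

# One box per genre: median line, IQR box, whiskers, and outlier points
p_box <- ggplot(toy, aes(x = genre, y = critic_score, fill = genre)) +
  geom_boxplot(show.legend = FALSE) +
  labs(x = NULL, y = "Critic score",
       title = "Critic Score Distribution by Genre")
```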

Statistical Analysis: Summary Statistics

df_genres %>%
  group_by(genre) %>%
  summarise(
    Count        = n(),
    Mean_Sales   = round(mean(total_sales), 3),
    Median_Sales = round(median(total_sales), 3),
    SD_Sales     = round(sd(total_sales), 3),
    Mean_Score   = round(mean(critic_score), 2)
  ) %>%
  arrange(desc(Mean_Sales))
## # A tibble: 8 × 6
##   genre        Count Mean_Sales Median_Sales SD_Sales Mean_Score
##   <chr>        <int>      <dbl>        <dbl>    <dbl>      <dbl>
## 1 Shooter       1480      0.673         0.18    1.57        7.27
## 2 Sports        2581      0.46          0.2     0.814       7.33
## 3 Action        2825      0.398         0.14    1.03        7.14
## 4 Racing        1422      0.368         0.14    0.634       7.26
## 5 Role-Playing  1483      0.287         0.11    0.624       7.33
## 6 Misc          1988      0.28          0.11    0.546       7.24
## 7 Simulation    1116      0.269         0.08    0.555       7.23
## 8 Adventure     1888      0.172         0.04    0.387       7.26

Summary Statistics: Interpretation

Key Findings:

  • Action vs. Sports: Action games lead in total count (2,825 titles), while Sports games (2,581 titles) have higher mean sales per title (0.460 vs. 0.398 million) and the highest median of any genre (0.20 million), suggesting Sports releases are somewhat fewer but more commercially reliable on average.

  • Shooter Sales Spike: Despite fewer titles (1,480), Shooter games show the highest mean (0.673 million) and standard deviation (1.57) in sales. The gap between the mean and the median (0.18 million) points to a small number of blockbusters, likely franchises such as Call of Duty, pulling the average upward.

  • Score Consistency: Critic score means are nearly identical across all genres (ranging from 7.14 to 7.33), suggesting that average review scores differ little from one genre to another.

  • Racing: Racing shows moderate mean sales (0.368 million) with a standard deviation (0.634) well below that of Shooter or Action, consistent with a stable but smaller audience. Platform games are not among the eight genres in the table, so no comparable figures are available for them here.
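
The figures in the summary table can be combined into a few derived quantities that help check these readings. The sketch below re-enters the reported values by hand and uses base R only; the column names `approx_total`, `mean_to_median`, and `cv` are introduced here for illustration. Because the means are rounded, count × mean only approximates each genre's total sales.

```r
# Values copied from the summary table above
genre_stats <- data.frame(
  genre        = c("Shooter", "Sports", "Action", "Racing",
                   "Role-Playing", "Misc", "Simulation", "Adventure"),
  count        = c(1480, 2581, 2825, 1422, 1483, 1988, 1116, 1888),
  mean_sales   = c(0.673, 0.46, 0.398, 0.368, 0.287, 0.28, 0.269, 0.172),
  median_sales = c(0.18, 0.2, 0.14, 0.14, 0.11, 0.11, 0.08, 0.04),
  sd_sales     = c(1.57, 0.814, 1.03, 0.634, 0.624, 0.546, 0.555, 0.387)
)

# Approximate total sales per genre (millions of units)
genre_stats$approx_total <- genre_stats$count * genre_stats$mean_sales

# A mean well above the median indicates right-skewed sales
genre_stats$mean_to_median <- genre_stats$mean_sales / genre_stats$median_sales

# Coefficient of variation: spread relative to the mean
genre_stats$cv <- genre_stats$sd_sales / genre_stats$mean_sales

genre_stats[order(-genre_stats$approx_total),
            c("genre", "approx_total", "mean_to_median", "cv")]
```

With these inputs, the approximate totals rank Sports, Action, and Shooter first (about 1,187, 1,124, and 996 million units). Every genre has a mean at least twice its median, and Racing has the lowest spread relative to its mean.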

Statistical Analysis: Linear Regression

model <- lm(total_sales ~ critic_score, data = df)
summary(model)
## 
## Call:
## lm(formula = total_sales ~ critic_score, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9140 -0.3306 -0.2306  0.0394 19.4665 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.352876   0.062091  -21.79   <2e-16 ***
## critic_score  0.234725   0.008519   27.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7932 on 18830 degrees of freedom
## Multiple R-squared:  0.03875,    Adjusted R-squared:  0.0387 
## F-statistic: 759.1 on 1 and 18830 DF,  p-value: < 2.2e-16

Linear Regression: Interpretation

Comprehensive Analysis:

  • Statistical Significance: The p-value on critic_score is well below 0.05, indicating that the association between critic score and sales is very unlikely to be due to chance alone.

  • Coefficient: Each 1-point increase in critic score is associated with approximately 0.235 million more units sold on average. This is a modest effect, and it describes a correlation rather than a causal relationship.

  • R² = 0.0388: Critic score explains only 3.9% of the variation in sales. The remaining variation, plausibly driven by factors such as franchise brand, marketing budget, and platform install base, is not captured by review scores alone.

  • Conclusion: Critic scores matter at the margins but are not the primary driver of commercial success. A game can score a 9 and sell modestly, while a 7-rated sports sequel sells millions.
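
The interpretation above can be checked with a little arithmetic on the printed output. The sketch below copies the coefficients, F-statistic, and residual degrees of freedom from the `summary(model)` output; `predict_sales` is a hypothetical helper name. It relies on the standard identity for a single-predictor regression, R² = F / (F + residual df).

```r
# Values copied from the summary(model) output above
intercept <- -1.352876
slope     <-  0.234725
f_stat    <- 759.1
df_resid  <- 18830

# Fitted sales (millions of units) at a given critic score
predict_sales <- function(score) intercept + slope * score

pred_7 <- predict_sales(7)
pred_9 <- predict_sales(9)
pred_9 - pred_7            # a 2-point gap in score: 2 * slope

# R-squared recovered from the F-statistic (single-predictor model)
r_squared <- f_stat / (f_stat + df_resid)

# The correlation carries the sign of the slope
r <- sign(slope) * sqrt(r_squared)

# Score at which the fitted line crosses zero sales
zero_crossing <- -intercept / slope
```

With these numbers the fitted line gives about 0.29 million units at a score of 7 and about 0.76 million at a score of 9, and the recovered R² matches the reported 0.03875. The line crosses zero near a score of 5.76, so fitted values for lower-scoring games are negative, which is one sign that a straight line on raw sales is only a rough description of such skewed data.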

Key Insights and Conclusions

Major Findings:

First: Action and Sports genres dominate total worldwide sales volume, likely driven by large install bases and recurring franchise releases that provide a dependable sales base.

Second: Critic scores are a statistically significant but weak predictor of sales (R² ≈ 0.039). This is consistent with the view that marketing, platform, and brand loyalty matter more than review performance for commercial outcomes, although the regression does not measure those factors directly.

Third: The 2007–2010 window was the peak era of game releases in this dataset, coinciding with the simultaneous maturity of the PS3, Xbox 360, and Wii platforms.

Fourth: Critic scores are tightly concentrated. Genre means fall between 7.14 and 7.33, and most scores sit between 7 and 8 regardless of genre, which may reflect score inflation or industry-wide convergence in review standards. The summary table reports means rather than interquartile ranges, so the spread within each genre would need to be checked separately.
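
To back the point about score concentration with actual quartiles, the interquartile range can be computed per genre. The sketch below is one possible approach in base R: `iqr_by_genre` is a hypothetical helper, and the `toy` data frame is illustrative rather than taken from the dataset.

```r
# Hypothetical helper: quartiles and IQR of critic_score within each genre
iqr_by_genre <- function(data) {
  ok <- !is.na(data$critic_score)
  by_genre <- split(data$critic_score[ok], data$genre[ok])
  stats <- lapply(by_genre, function(s) {
    q <- quantile(s, c(0.25, 0.5, 0.75), names = FALSE)
    c(q1 = q[1], median = q[2], q3 = q[3], iqr = q[3] - q[1])
  })
  do.call(rbind, stats)  # one row per genre
}

# Toy values for illustration only
toy <- data.frame(
  genre        = rep(c("Action", "Sports"), each = 5),
  critic_score = c(6, 7, 7, 8, 9,
                   7, 7, 7.5, 7.5, 8)
)

iqr_table <- iqr_by_genre(toy)
iqr_table
```

On the real data this would be called as `iqr_by_genre(df_genres)`, assuming the column names used earlier in the analysis.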