Introduction

This project analyzes global video game sales to answer key business questions about genre performance, platform trends, regional differences, and publisher impact.

Step 1: Load the Dataset

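library(tidyverse)  # readr, dplyr, tidyr, ggplot2, tibble

# Adjust this path to wherever vgsales.csv is stored locally
vg <- read_csv("C:/Users/rmora/Downloads/vgsales.csv")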

glimpse(vg)
## Rows: 16,598
## Columns: 11
## $ Rank         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ Name         <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "Wii…
## $ Platform     <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii…
## $ Year         <chr> "2006", "1985", "2008", "2009", "1996", "1989", "2006", "…
## $ Genre        <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playing",…
## $ Publisher    <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo…
## $ NA_Sales     <dbl> 41.49, 29.08, 15.85, 15.75, 11.27, 23.20, 11.38, 14.03, 1…
## $ EU_Sales     <dbl> 29.02, 3.58, 12.88, 11.01, 8.89, 2.26, 9.23, 9.20, 7.06, …
## $ JP_Sales     <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70, 0.…
## $ Other_Sales  <dbl> 8.46, 0.77, 3.31, 2.96, 1.00, 0.58, 2.90, 2.85, 2.26, 0.4…
## $ Global_Sales <dbl> 82.74, 40.24, 35.82, 33.00, 31.37, 30.26, 30.01, 29.02, 2…

Step 2: Data Cleaning

vg_clean <- vg %>%
  # Year is read as character; non-numeric entries become NA on conversion
  mutate(Year = suppressWarnings(as.numeric(Year))) %>%
  filter(!is.na(Year), !is.na(Global_Sales)) %>%
  filter(Year >= 1980, Year <= 2020)

glimpse(vg_clean)
## Rows: 16,327
## Columns: 11
## $ Rank         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ Name         <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "Wii…
## $ Platform     <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii…
## $ Year         <dbl> 2006, 1985, 2008, 2009, 1996, 1989, 2006, 2006, 2009, 198…
## $ Genre        <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playing",…
## $ Publisher    <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo…
## $ NA_Sales     <dbl> 41.49, 29.08, 15.85, 15.75, 11.27, 23.20, 11.38, 14.03, 1…
## $ EU_Sales     <dbl> 29.02, 3.58, 12.88, 11.01, 8.89, 2.26, 9.23, 9.20, 7.06, …
## $ JP_Sales     <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70, 0.…
## $ Other_Sales  <dbl> 8.46, 0.77, 3.31, 2.96, 1.00, 0.58, 2.90, 2.85, 2.26, 0.4…
## $ Global_Sales <dbl> 82.74, 40.24, 35.82, 33.00, 31.37, 30.26, 30.01, 29.02, 2…
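
The row count falls from 16,598 to 16,327, so the cleaning step removes 271 rows. Because `Year` loads as a character column, a plausible cause is non-numeric year entries. The sketch below counts how many values fail numeric conversion. It runs on a small stand-in tibble (`toy`) rather than the real file: the first two rows come from the `glimpse()` output, while the third row and its "N/A" placeholder are hypothetical.

```r
library(dplyr)
library(tibble)

# Toy stand-in for vg: rows 1-2 are from the glimpse() output,
# row 3 and its "N/A" year are hypothetical
toy <- tibble(
  Name         = c("Wii Sports", "Super Mario Bros.", "Hypothetical Game"),
  Year         = c("2006", "1985", "N/A"),
  Global_Sales = c(82.74, 40.24, 0.50)
)

year_check <- toy %>%
  # as.numeric() returns NA (with a warning) for strings that are not numbers
  mutate(Year_num = suppressWarnings(as.numeric(Year))) %>%
  summarise(
    n_rows        = n(),
    n_bad_year    = sum(is.na(Year_num)),
    n_usable_rows = sum(!is.na(Year_num) & !is.na(Global_Sales))
  )

year_check
```

Run against `vg` instead of `toy`, `n_bad_year` would show how many of the 271 dropped rows are explained by unparseable years. Any remainder would come from the 1980 to 2020 range filter or from missing `Global_Sales` values.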

Question 1: Which genre generates the highest global sales?

genre_sales <- vg_clean %>%
  group_by(Genre) %>%
  summarize(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales))

genre_sales
## # A tibble: 12 × 2
##    Genre        total_global_sales
##    <chr>                     <dbl>
##  1 Action                    1723.
##  2 Sports                    1309.
##  3 Shooter                   1026.
##  4 Role-Playing               924.
##  5 Platform                   829.
##  6 Misc                       798.
##  7 Racing                     727.
##  8 Fighting                   444.
##  9 Simulation                 390.
## 10 Puzzle                     242.
## 11 Adventure                  235.
## 12 Strategy                   173.

ggplot(genre_sales, aes(x = reorder(Genre, total_global_sales),
                        y = total_global_sales,
                        fill = Genre)) +
  geom_col() +
  coord_flip() +
  labs(title = "Total Global Sales by Genre",
       x = "Genre",
       y = "Global Sales (Millions)") +
  theme_minimal() +
  theme(legend.position = "none")

Interpretation

The bar chart shows the total global sales for each video game genre.
Action, Sports, and Shooter rank highest, with roughly 1,723, 1,309, and 1,026 million in total sales respectively.

This tells stakeholders that investing in these genres may lead to higher financial returns, while niche categories like Puzzle or Strategy sell less overall.

A t-test is not appropriate here because we are comparing multiple categories, not two groups. Instead, this summary table and bar chart give a clear picture of how each genre performs.
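
To put the summary table in proportion, the sketch below converts the printed genre totals into percentage shares. The totals are copied from the rounded `genre_sales` output, so the shares are approximate; `genre_totals` and `genre_share` are names introduced here for illustration.

```r
library(dplyr)
library(tibble)

# Totals copied from the printed genre_sales table (rounded, in millions)
genre_totals <- tibble(
  Genre = c("Action", "Sports", "Shooter", "Role-Playing", "Platform", "Misc",
            "Racing", "Fighting", "Simulation", "Puzzle", "Adventure", "Strategy"),
  total_global_sales = c(1723, 1309, 1026, 924, 829, 798,
                         727, 444, 390, 242, 235, 173)
)

genre_share <- genre_totals %>%
  mutate(
    share_pct      = 100 * total_global_sales / sum(total_global_sales),
    cumulative_pct = cumsum(share_pct)  # rows are already sorted descending
  )

genre_share
```

With these rounded figures, the top three genres together account for a little under half of all recorded sales.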

Question 2: Which gaming platforms generate the highest global sales (Top 10)?

platform_sales <- vg_clean %>%
  group_by(Platform) %>%
  summarise(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales))

platform_sales
## # A tibble: 31 × 2
##    Platform total_global_sales
##    <chr>                 <dbl>
##  1 PS2                   1233.
##  2 X360                   970.
##  3 PS3                    949.
##  4 Wii                    910.
##  5 DS                     819.
##  6 PS                     727.
##  7 GBA                    314.
##  8 PSP                    292.
##  9 PS4                    278.
## 10 PC                     255.
## # ℹ 21 more rows

platform_top10 <- platform_sales %>%
  slice_max(total_global_sales, n = 10)

ggplot(platform_top10, aes(x = reorder(Platform, total_global_sales),
                           y = total_global_sales,
                           fill = Platform)) +
  geom_col(width = 0.7) +
  coord_flip() +
  scale_y_continuous(
    breaks = seq(0, max(platform_top10$total_global_sales), by = 100),
    labels = scales::comma
  ) +
  labs(
    title = "Top 10 Platforms by Global Sales",
    x = "Platform",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )

Question 3: Global Sales Over Time, 1980-2020

yearly_sales <- vg_clean %>%
  group_by(Year) %>%
  summarise(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(Year)

yearly_sales
## # A tibble: 39 × 2
##     Year total_global_sales
##    <dbl>              <dbl>
##  1  1980               11.4
##  2  1981               35.8
##  3  1982               28.9
##  4  1983               16.8
##  5  1984               50.4
##  6  1985               53.9
##  7  1986               37.1
##  8  1987               21.7
##  9  1988               47.2
## 10  1989               73.4
## # ℹ 29 more rows

ggplot(yearly_sales, aes(x = Year, y = total_global_sales)) +
  # linewidth replaces size for line geoms as of ggplot2 3.4.0
  geom_line(color = "#2C7FB8", linewidth = 1) +
  geom_point(color = "#2C7FB8", size = 2) +
  geom_smooth(method = "loess", se = FALSE, color = "darkred") +
  labs(
    title = "Global Video Game Sales Over Time (With Trend Line)",
    x = "Year",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal()

Interpretation

This chart shows how global video game sales changed from 1980 to 2020.
Sales grew slowly through the 1980s and 1990s, then increased sharply in the early 2000s. The industry reached its peak around 2007–2010, which aligns with the success of consoles like the Wii, Xbox 360, DS, and PlayStation 3.

After 2010, total sales began to drop. Part of this decline may reflect real market changes, such as the rise of mobile gaming and digital downloads replacing physical game purchases.

Data Note

The vgsales dataset contains very little data after 2015.
This likely reflects how the data was collected rather than the market itself: physical sales were no longer consistently reported, and many digital sales were not captured.
The sharp drop after 2015 should therefore be read mainly as missing data, not as a collapse of the gaming industry.

The trend line shows the long-term growth of the video game market, followed by a decline that is at least partly an artifact of incomplete reporting.
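
One way to check the coverage concern is to count how many titles are recorded per year and flag years that fall well below the typical level. The sketch below is illustrative only: `toy` is a hypothetical stand-in for `vg_clean`, and the helper `flag_sparse_years()` and its `min_fraction` threshold are assumptions introduced here, not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for vg_clean: only the Year column matters here
toy <- tibble(Year = c(rep(2008, 4), rep(2009, 4), rep(2010, 3), 2016))

# Flag years whose title count is below a chosen fraction of the median year
flag_sparse_years <- function(df, min_fraction = 0.5) {
  df %>%
    count(Year, name = "n_titles") %>%
    mutate(sparse = n_titles < min_fraction * median(n_titles)) %>%
    arrange(Year)
}

coverage <- flag_sparse_years(toy)
coverage
```

Calling `flag_sparse_years(vg_clean)` would show whether the years after 2015 really contain far fewer titles than the peak years, which is what the data note assumes.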

Question 4: Regional Sales Differences (NA vs EU vs JP)

regional_sales <- vg_clean %>%
  summarise(
    total_na = sum(NA_Sales, na.rm = TRUE),
    total_eu = sum(EU_Sales, na.rm = TRUE),
    total_jp = sum(JP_Sales, na.rm = TRUE)
  )

regional_sales
## # A tibble: 1 × 3
##   total_na total_eu total_jp
##      <dbl>    <dbl>    <dbl>
## 1    4333.    2409.    1284.

regional_long <- regional_sales %>%
  pivot_longer(cols = everything(),
               names_to = "Region",
               values_to = "Sales")

regional_long
## # A tibble: 3 × 2
##   Region   Sales
##   <chr>    <dbl>
## 1 total_na 4333.
## 2 total_eu 2409.
## 3 total_jp 1284.

ggplot(regional_long, aes(x = reorder(Region, Sales), y = Sales, fill = Region)) +
  geom_col(width = 0.6) +
  coord_flip() +
  labs(
    title = "Total Sales by Region (NA vs EU vs JP)",
    x = "Region",
    y = "Sales (Millions)"
  ) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(legend.position = "none")

Interpretation

The comparison of regional sales shows clear differences in market size.
North America has the highest total sales (about 4,333 million), followed by Europe (about 2,409 million, roughly 56% of the North American total).
Japan is considerably smaller (about 1,284 million), although it remains an important market for certain genres and publishers.
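
The relative sizes can be worked out directly from the printed totals. The sketch below uses the rounded values from the `regional_long` output; the column names `share_of_three_pct` and `pct_of_na` are introduced here for illustration, and `Other_Sales` is left out because the comparison covers only the three named regions.

```r
library(dplyr)
library(tibble)

# Totals copied from the regional_long output (rounded, in millions)
regional_totals <- tibble(
  Region = c("NA", "EU", "JP"),
  Sales  = c(4333, 2409, 1284)
)

regional_compare <- regional_totals %>%
  mutate(
    share_of_three_pct = 100 * Sales / sum(Sales),           # share of NA + EU + JP
    pct_of_na          = 100 * Sales / Sales[Region == "NA"]  # size relative to NA
  )

regional_compare
```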

These differences suggest that:

  • NA and EU should be the primary focus for global releases
  • JP may require more specialized marketing or region-specific titles
  • Games with broad appeal (action, sports) are expected to perform well in NA/EU
  • JP appears to favor genres like RPGs and some Nintendo-exclusive titles, though a genre-by-region breakdown is needed to confirm this

Understanding regional preferences helps companies decide where to allocate marketing budgets and which platforms or genres to prioritize for each region.
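
The genre preferences listed above are not computed anywhere in this report, since the regional totals do not break sales down by genre. A minimal sketch of how that check could be done follows. It uses only the first five rows visible in the `glimpse()` output as toy data, so the result illustrates the method and says nothing reliable about the full dataset; on the real data `toy` would be replaced by `vg_clean`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# First five rows as printed by glimpse() (genre and regional sales only)
toy <- tibble(
  Genre    = c("Sports", "Platform", "Racing", "Sports", "Role-Playing"),
  NA_Sales = c(41.49, 29.08, 15.85, 15.75, 11.27),
  EU_Sales = c(29.02, 3.58, 12.88, 11.01, 8.89),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22)
)

genre_by_region <- toy %>%
  pivot_longer(cols = c(NA_Sales, EU_Sales, JP_Sales),
               names_to = "Region", values_to = "Sales") %>%
  group_by(Region, Genre) %>%
  summarise(Sales = sum(Sales), .groups = "drop") %>%
  group_by(Region) %>%
  mutate(share = Sales / sum(Sales)) %>%  # genre share within each region
  ungroup() %>%
  arrange(Region, desc(share))

# Best-selling genre in each region
top_genre <- genre_by_region %>%
  group_by(Region) %>%
  slice_max(share, n = 1) %>%
  ungroup()

top_genre
```

Comparing `share` across regions, rather than raw totals, keeps North America's larger market from masking differences in what each region buys.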

# Correlations between regional sales (per-game Pearson correlations)
# Note: these use the raw vg data, not vg_clean, so rows dropped during cleaning are included
cor_na_eu <- cor(vg$NA_Sales, vg$EU_Sales, use = "complete.obs")
cor_na_jp <- cor(vg$NA_Sales, vg$JP_Sales, use = "complete.obs")
cor_eu_jp <- cor(vg$EU_Sales, vg$JP_Sales, use = "complete.obs")

# Build clean correlation table
cor_results <- tibble(
  Comparison = c("North America vs Europe",
                 "North America vs Japan",
                 "Europe vs Japan"),
  Correlation = round(c(cor_na_eu, cor_na_jp, cor_eu_jp), 3)
)

cor_results
## # A tibble: 3 × 2
##   Comparison              Correlation
##   <chr>                         <dbl>
## 1 North America vs Europe       0.768
## 2 North America vs Japan        0.45 
## 3 Europe vs Japan               0.436

Interpretation of Correlation Results

North America vs Europe (0.768): Strong positive correlation. Games that sell well in NA usually sell well in Europe too.

North America vs Japan (0.450): Moderate correlation. Strong NA sales are only a partial predictor of sales in Japan.

Europe vs Japan (0.436): Moderate correlation, nearly identical to NA vs Japan. European and Japanese sales track each other only loosely.

Conclusion: NA and EU behave similarly, while Japan follows a more distinct pattern. Stakeholders should consider separate marketing and publishing strategies for Japan.
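
Pearson correlations can be pulled around by a handful of blockbuster titles, and game sales are heavily skewed toward a few very large sellers. As a hedged robustness check, the sketch below computes rank-based (Spearman) correlations next to the Pearson ones. It runs on the first five rows from the `glimpse()` output purely to show the calls, so the numbers it produces do not describe the full dataset.

```r
library(tibble)

# First five rows as printed by glimpse() (regional sales only)
toy <- tibble(
  NA_Sales = c(41.49, 29.08, 15.85, 15.75, 11.27),
  EU_Sales = c(29.02, 3.58, 12.88, 11.01, 8.89),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22)
)

# Full correlation matrices; complete.obs drops rows with any missing value
pearson  <- cor(toy, method = "pearson",  use = "complete.obs")
spearman <- cor(toy, method = "spearman", use = "complete.obs")

round(pearson, 3)
round(spearman, 3)
```

If the Spearman values on the real data tell the same story as the Pearson values, the conclusion about Japan is less likely to be driven by a few outliers.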

Question 5: Which publishers consistently produce top-selling games?

Top 10 Publishers by Total Global Sales

publisher_sales <- vg_clean %>%
  group_by(Publisher) %>%
  summarize(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales)) %>%
  slice(1:10)

publisher_sales
## # A tibble: 10 × 2
##    Publisher                    total_global_sales
##    <chr>                                     <dbl>
##  1 Nintendo                                  1784.
##  2 Electronic Arts                           1093.
##  3 Activision                                 721.
##  4 Sony Computer Entertainment                607.
##  5 Ubisoft                                    474.
##  6 Take-Two Interactive                       399.
##  7 THQ                                        340.
##  8 Konami Digital Entertainment               279.
##  9 Sega                                       271.
## 10 Namco Bandai Games                         254.

ggplot(publisher_sales,
       aes(x = reorder(Publisher, total_global_sales),
           y = total_global_sales,
           fill = Publisher)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = round(total_global_sales, 0)),
            hjust = -0.1,
            size = 4) +
  coord_flip() +
  labs(
    title = "Top 10 Publishers by Global Sales",
    x = "Publisher",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal(base_size = 14) +
  scale_y_continuous(labels = scales::comma,
                     expand = expansion(mult = c(0, 0.15)))

Interpretation

Nintendo is the clear leader in global sales (about 1,784 million), well ahead of Electronic Arts (about 1,093 million) and Activision (about 721 million). The rest of the top publishers, such as Sony, Ubisoft, and Take-Two, show solid performance but at a smaller scale. Overall, the chart shows that a small group of major publishers dominates the global video game market.
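
The degree of concentration can be estimated from the printed tables. The sketch below divides the top 10 publisher totals by an overall total, taken here as the sum of the rounded genre totals from Question 1. That assumes every row in `vg_clean` has a genre, so treat the result as approximate. All variable names are introduced here for illustration.

```r
# Top 10 publisher totals copied from the printed table (rounded, in millions)
top10_totals <- c(
  "Nintendo" = 1784, "Electronic Arts" = 1093, "Activision" = 721,
  "Sony Computer Entertainment" = 607, "Ubisoft" = 474,
  "Take-Two Interactive" = 399, "THQ" = 340,
  "Konami Digital Entertainment" = 279, "Sega" = 271,
  "Namco Bandai Games" = 254
)

# Overall total approximated by summing the rounded genre totals from Question 1
genre_totals <- c(1723, 1309, 1026, 924, 829, 798, 727, 444, 390, 242, 235, 173)
overall_total <- sum(genre_totals)

top10_share_pct <- 100 * sum(top10_totals) / overall_total
top3_share_pct  <- 100 * sum(top10_totals[1:3]) / overall_total

round(c(top3 = top3_share_pct, top10 = top10_share_pct), 1)
```

With these rounded figures, the ten publishers account for roughly 70% of recorded global sales.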

✅ Final Summary of Findings

This analysis explored global video game sales across genres, platforms, regions, and publishers. The goal was to identify clear trends that could guide marketing, investment, and strategic decisions.

  1. Genre Performance

Action, Sports, and Shooter games generated the highest global sales. These genres appeal to a wide audience and have historically delivered the largest sales volumes.

  2. Top Platforms

The PlayStation 2, Xbox 360, and PlayStation 3 had the highest total global sales. These consoles had long life cycles, strong game libraries, and major market influence.

  3. Sales Over Time

Global sales rose sharply in the early 2000s and peaked around 2007–2010. After 2010, recorded sales declined, likely in part because digital distribution displaced physical game sales, which the dataset does not fully capture. Data after 2015 is especially sparse.

  4. Publisher Performance

Nintendo, Electronic Arts, and Activision are the top-selling publishers. They produce successful franchises across multiple regions and remain key industry leaders.

  5. Regional Differences

North America is the largest market, followed by Europe. Japan is smaller overall but has strong performance in specific genres and platforms.

  6. Statistical Test: Top 3 Publishers vs Others

A two-sample t-test comparing the Top 3 publishers with all other publishers indicated significantly higher average global sales for the Top 3; the test output is not reproduced in this report. This suggests their lead is unlikely to be due to chance alone.
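
The report states the t-test result without showing the code. A minimal sketch of how such a comparison could be run is given below. It uses made-up data: the publisher names, the sales values, and the `group` column are all hypothetical, and this is not the author's original test. Because per-game sales are strongly skewed, a rank-based Wilcoxon test is included as a cross-check.

```r
library(dplyr)
library(tibble)

set.seed(42)  # reproducible made-up data

# Hypothetical per-game sales; names and values are illustrative only
toy <- tibble(
  Publisher    = rep(c("Publisher A", "Publisher B", "Publisher C",
                       "Publisher D", "Publisher E"), each = 20),
  Global_Sales = c(rlnorm(60, meanlog = 0.5), rlnorm(40, meanlog = -0.5))
)

# Identify the three publishers with the highest total sales
top3 <- toy %>%
  group_by(Publisher) %>%
  summarise(total = sum(Global_Sales), .groups = "drop") %>%
  slice_max(total, n = 3) %>%
  pull(Publisher)

# Label each game by whether its publisher is in the top three
toy <- toy %>%
  mutate(group = factor(if_else(Publisher %in% top3, "Top 3", "Others")))

# Welch two-sample t-test (unequal variances is the t.test() default)
welch <- t.test(Global_Sales ~ group, data = toy)

# Rank-based alternative that is less sensitive to skewed sales figures
rank_test <- wilcox.test(Global_Sales ~ group, data = toy)

welch$p.value
rank_test$p.value
```

One caveat on the design: the Top 3 group is chosen because its publishers have the highest total sales, so a test on the same data is partly circular and its p-value should be interpreted cautiously.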

✅ Final Business Recommendations

  1. Prioritize High-Performing Genres

Invest more heavily in Action, Sports, and Shooter games. These categories have shown the strongest global demand in this dataset.

  2. Strengthen Platform Partnerships

Platforms like the PS2, PS3, and Xbox 360 historically drove the highest sales. Future strategies should focus on the platforms where players are active today (e.g., the PlayStation and Nintendo ecosystems).

  3. Build Relationships with Top Publishers

Nintendo, EA, and Activision have the highest total sales. Working with these companies, or modeling strategies after them, may improve the chance of high performance.

  4. Tailor Marketing by Region

North America and Europe are the largest markets. Japan requires more targeted, genre-specific approaches.

  5. Use Data to Identify Rising Trends

Monitoring newer platforms (Switch, PS5, Xbox Series X) could provide insights into the next major sales boom.

✅ Closing Statement

Overall, the data shows clear leaders across genres, platforms, and publishers. The market has predictable patterns, and companies that understand these trends can make smarter investments, target audiences more effectively, and increase the likelihood of releasing top-performing games.

Appendix: Use of AI Tools

For this project, I used AI tools to support my workflow, mainly to improve coding efficiency, formatting, and visualization clarity. All analysis, interpretation, and final decisions were made by me. The following tools were used:

  1. ChatGPT (Version 5.1)

ChatGPT was used to:

  • Help format R Markdown chunks correctly
  • Fix syntax errors and improve ggplot visualizations
  • Suggest clear labeling, titles, and theme adjustments
  • Provide explanations of statistical outputs
  • Assist with writing short interpretations in simpler language
  • Help structure the analysis for the five main business questions

ChatGPT did not generate the dataset, did not run the code, and did not make conclusions for me. I reviewed and approved all code and analysis.

  2. Gamma (Gamma AI)

Gamma AI was used only to:

  • Convert my report summary into presentation slides
  • Improve layout and readability of the visuals
  • Speed up slide design and formatting

Gamma did not perform any data analysis.