This project analyzes global video game sales to answer key business questions about genre performance, platform trends, regional differences, and publisher impact.
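library(tidyverse)  # readr, dplyr, tidyr, tibble, ggplot2
# Machine-specific path; point this at your local copy of vgsales.csv
vg <- read_csv("C:/Users/rmora/Downloads/vgsales.csv")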
glimpse(vg)
## Rows: 16,598
## Columns: 11
## $ Rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ Name <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "Wii…
## $ Platform <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii…
## $ Year <chr> "2006", "1985", "2008", "2009", "1996", "1989", "2006", "…
## $ Genre <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playing",…
## $ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo…
## $ NA_Sales <dbl> 41.49, 29.08, 15.85, 15.75, 11.27, 23.20, 11.38, 14.03, 1…
## $ EU_Sales <dbl> 29.02, 3.58, 12.88, 11.01, 8.89, 2.26, 9.23, 9.20, 7.06, …
## $ JP_Sales <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70, 0.…
## $ Other_Sales <dbl> 8.46, 0.77, 3.31, 2.96, 1.00, 0.58, 2.90, 2.85, 2.26, 0.4…
## $ Global_Sales <dbl> 82.74, 40.24, 35.82, 33.00, 31.37, 30.26, 30.01, 29.02, 2…
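vg_clean <- vg %>%
  # Year is read as character because missing years are stored as "N/A";
  # turn those into NA explicitly so as.numeric() does not emit a coercion warning
  mutate(Year = as.numeric(na_if(Year, "N/A"))) %>%
  filter(!is.na(Year), !is.na(Global_Sales)) %>%
  filter(Year >= 1980, Year <= 2020)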
glimpse(vg_clean)
## Rows: 16,327
## Columns: 11
## $ Rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ Name <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "Wii…
## $ Platform <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii…
## $ Year <dbl> 2006, 1985, 2008, 2009, 1996, 1989, 2006, 2006, 2009, 198…
## $ Genre <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playing",…
## $ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo…
## $ NA_Sales <dbl> 41.49, 29.08, 15.85, 15.75, 11.27, 23.20, 11.38, 14.03, 1…
## $ EU_Sales <dbl> 29.02, 3.58, 12.88, 11.01, 8.89, 2.26, 9.23, 9.20, 7.06, …
## $ JP_Sales <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70, 0.…
## $ Other_Sales <dbl> 8.46, 0.77, 3.31, 2.96, 1.00, 0.58, 2.90, 2.85, 2.26, 0.4…
## $ Global_Sales <dbl> 82.74, 40.24, 35.82, 33.00, 31.37, 30.26, 30.01, 29.02, 2…
genre_sales <- vg_clean %>%
  group_by(Genre) %>%
  summarize(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales))
genre_sales
## # A tibble: 12 × 2
## Genre total_global_sales
## <chr> <dbl>
## 1 Action 1723.
## 2 Sports 1309.
## 3 Shooter 1026.
## 4 Role-Playing 924.
## 5 Platform 829.
## 6 Misc 798.
## 7 Racing 727.
## 8 Fighting 444.
## 9 Simulation 390.
## 10 Puzzle 242.
## 11 Adventure 235.
## 12 Strategy 173.
ggplot(genre_sales, aes(x = reorder(Genre, total_global_sales),
                        y = total_global_sales,
                        fill = Genre)) +
  geom_col() +
  coord_flip() +
  labs(title = "Total Global Sales by Genre",
       x = "Genre",
       y = "Global Sales (Millions)") +
  theme_minimal() +
  theme(legend.position = "none")
### Interpretation
The bar chart shows total global sales for each video game genre.
Action, Sports, and Shooter lead the chart, meaning they have sold the most units worldwide (sales in this dataset are measured in millions of units, not revenue).
This suggests that investing in these genres may lead to higher sales volume, while niche categories like Puzzle and Strategy sell less overall. Because these are totals, they also reflect how many titles each genre has, not only how well each title sells.
A t-test is not appropriate here because we are comparing multiple categories, not two groups. Instead, this summary table and bar chart give a clear picture of how each genre performs.
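Genre totals combine two effects: how many titles a genre has and how well each title sells. The sketch below separates them and swaps in a Kruskal-Wallis rank-sum test, a standard multi-group alternative to the t-test that does not assume normally distributed sales. It assumes a data frame shaped like `vg_clean` (one row per title, with `Genre` and `Global_Sales`); the helper names `genre_profile()` and `genre_test()` are hypothetical, and it uses base R only so it runs without extra packages.

```r
# Hypothetical helpers (not from the original analysis). They assume a data
# frame with one row per title and columns Genre and Global_Sales.

# Per-genre title count, total sales, and mean/median sales per title.
genre_profile <- function(df) {
  groups <- split(df$Global_Sales, df$Genre)
  out <- data.frame(
    Genre            = names(groups),
    n_titles         = unname(vapply(groups, length, integer(1))),
    total            = unname(vapply(groups, sum, numeric(1))),
    mean_per_title   = unname(vapply(groups, mean, numeric(1))),
    median_per_title = unname(vapply(groups, median, numeric(1)))
  )
  # Sort by total sales (largest first) to line up with genre_sales
  out <- out[order(-out$total), ]
  rownames(out) <- NULL
  out
}

# Kruskal-Wallis rank-sum test: do per-title sales distributions differ
# across genres? Working on ranks limits the pull of blockbuster outliers.
genre_test <- function(df) {
  kruskal.test(Global_Sales ~ factor(Genre), data = df)
}

# Usage with the report's data:
# genre_profile(vg_clean)
# genre_test(vg_clean)$p.value
```

A small p-value would only indicate that per-title sales differ somewhere across genres, not which genres differ.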
platform_sales <- vg_clean %>%
  group_by(Platform) %>%
  summarise(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales))
platform_sales
## # A tibble: 31 × 2
## Platform total_global_sales
## <chr> <dbl>
## 1 PS2 1233.
## 2 X360 970.
## 3 PS3 949.
## 4 Wii 910.
## 5 DS 819.
## 6 PS 727.
## 7 GBA 314.
## 8 PSP 292.
## 9 PS4 278.
## 10 PC 255.
## # ℹ 21 more rows
platform_top10 <- platform_sales %>%
  slice_max(total_global_sales, n = 10)
ggplot(platform_top10, aes(x = reorder(Platform, total_global_sales),
                           y = total_global_sales,
                           fill = Platform)) +
  geom_col(width = 0.7) +
  coord_flip() +
  scale_y_continuous(
    breaks = seq(0, max(platform_top10$total_global_sales), by = 100),
    labels = scales::comma
  ) +
  labs(
    title = "Top 10 Platforms by Global Sales",
    x = "Platform",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )
yearly_sales <- vg_clean %>%
  group_by(Year) %>%
  summarise(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(Year)
yearly_sales
## # A tibble: 39 × 2
## Year total_global_sales
## <dbl> <dbl>
## 1 1980 11.4
## 2 1981 35.8
## 3 1982 28.9
## 4 1983 16.8
## 5 1984 50.4
## 6 1985 53.9
## 7 1986 37.1
## 8 1987 21.7
## 9 1988 47.2
## 10 1989 73.4
## # ℹ 29 more rows
ggplot(yearly_sales, aes(x = Year, y = total_global_sales)) +
  # `linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0
  geom_line(color = "#2C7FB8", linewidth = 1) +
  geom_point(color = "#2C7FB8", size = 2) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "darkred") +
  labs(
    title = "Global Video Game Sales Over Time (With Trend Line)",
    x = "Year",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal()
### Interpretation
This chart shows how global video game sales changed from 1980 to 2020.
Sales grew slowly through the 1980s and 1990s, then increased sharply in the early 2000s. The industry peaked around 2007–2010, in line with the success of consoles like the Wii, Xbox 360, DS, and PlayStation 3.
After 2010, total sales begin to drop. Part of this decline may reflect real market changes, such as the rise of mobile gaming and digital downloads replacing physical game purchases.
The vgsales dataset contains very few records after 2015, likely because physical sales were no longer consistently reported and many digital sales were not captured. The sharp drop after 2015 therefore mostly reflects missing data rather than an actual collapse of the gaming industry.
The trend line shows the long-term growth of the video game market, followed by a decline that is partly real and partly an artifact of incomplete reporting.
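The claim that the post-2015 drop mostly reflects missing data can be checked by counting titles per year alongside total sales: if the number of recorded titles collapses at the same time as total sales, coverage rather than per-title performance is driving the drop. A minimal base-R sketch, assuming a data frame like `vg_clean` with numeric `Year` and `Global_Sales`; `year_coverage()` is a hypothetical helper name.

```r
# Hypothetical helper (not from the original analysis): number of titles and
# total global sales recorded for each year. Assumes numeric Year and
# Global_Sales columns, one row per title.
year_coverage <- function(df) {
  n_titles <- tapply(df$Global_Sales, df$Year, length)
  total    <- tapply(df$Global_Sales, df$Year, sum)
  data.frame(
    Year               = as.numeric(names(n_titles)),
    n_titles           = as.integer(n_titles),
    total_global_sales = as.numeric(total)
  )
}

# Usage with the report's data:
# coverage <- year_coverage(vg_clean)
# tail(coverage, 6)                                      # thin final years
# coverage$Year[which.max(coverage$total_global_sales)]  # peak year
```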
regional_sales <- vg_clean %>%
  summarise(
    total_na = sum(NA_Sales, na.rm = TRUE),
    total_eu = sum(EU_Sales, na.rm = TRUE),
    total_jp = sum(JP_Sales, na.rm = TRUE)
  )
regional_sales
## # A tibble: 1 × 3
## total_na total_eu total_jp
## <dbl> <dbl> <dbl>
## 1 4333. 2409. 1284.
regional_long <- regional_sales %>%
  pivot_longer(cols = everything(),
               names_to = "Region",
               values_to = "Sales")
regional_long
## # A tibble: 3 × 2
## Region Sales
## <chr> <dbl>
## 1 total_na 4333.
## 2 total_eu 2409.
## 3 total_jp 1284.
ggplot(regional_long, aes(x = reorder(Region, Sales), y = Sales, fill = Region)) +
  geom_col(width = 0.6) +
  coord_flip() +
  labs(
    title = "Total Sales by Region (NA vs EU vs JP)",
    x = "Region",
    y = "Sales (Millions)"
  ) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(legend.position = "none")
The comparison of regional sales shows clear differences in market size.
North America has the highest total sales (about 4,333 million units), followed by Europe (about 2,409 million, a little over half the North American total).
Japan is much smaller (about 1,284 million, roughly 30% of North America), although it remains an important market for certain genres and publishers.
Understanding these regional differences helps companies decide where to allocate marketing budgets and which platforms or genres to prioritize in each region.
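The size comparison above is easier to state as shares. A minimal sketch, assuming columns `NA_Sales`, `EU_Sales`, and `JP_Sales` as in `vg_clean`; `regional_shares()` is a hypothetical helper that reports each region's share of the three-region total and its size relative to North America. Other_Sales is left out to match the chart.

```r
# Hypothetical helper (not from the original analysis): each region's share of
# combined NA + EU + JP sales and its size relative to North America.
# Other_Sales is excluded to match the regional chart.
regional_shares <- function(df) {
  totals <- c(
    NA_Sales = sum(df$NA_Sales, na.rm = TRUE),
    EU_Sales = sum(df$EU_Sales, na.rm = TRUE),
    JP_Sales = sum(df$JP_Sales, na.rm = TRUE)
  )
  data.frame(
    Region         = names(totals),
    Sales          = unname(totals),
    Share          = unname(totals / sum(totals)),
    Relative_to_NA = unname(totals / totals[["NA_Sales"]])
  )
}

# Usage with the report's data: regional_shares(vg_clean)
```

Using the rounded totals printed above (4,333, 2,409, and 1,284), Europe comes to about 0.56 of North America and Japan to about 0.30; as shares of the three-region total, that is roughly 54%, 30%, and 16%.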
# Correlations between regional sales
# Note: computed on the full vg data (all 16,598 rows), not vg_clean
cor_na_eu <- cor(vg$NA_Sales, vg$EU_Sales, use = "complete.obs")
cor_na_jp <- cor(vg$NA_Sales, vg$JP_Sales, use = "complete.obs")
cor_eu_jp <- cor(vg$EU_Sales, vg$JP_Sales, use = "complete.obs")
# Collect the results in a tidy table
cor_results <- tibble(
  Comparison = c("North America vs Europe",
                 "North America vs Japan",
                 "Europe vs Japan"),
  Correlation = round(c(cor_na_eu, cor_na_jp, cor_eu_jp), 3)
)
cor_results
## # A tibble: 3 × 2
## Comparison Correlation
## <chr> <dbl>
## 1 North America vs Europe 0.768
## 2 North America vs Japan 0.45
## 3 Europe vs Japan 0.436
Interpretation of Correlation Results
North America vs Europe (r = 0.768): strong positive correlation. Games that sell well in North America usually sell well in Europe too.
North America vs Japan (r = 0.450): moderate positive correlation, consistent with Japan having different gaming preferences.
Europe vs Japan (r = 0.436): moderate positive correlation, nearly the same as North America vs Japan. These markets agree far less on which games are popular.
Conclusion: NA and EU behave similarly, while Japan behaves as a more distinct market. Stakeholders should consider different marketing and publishing strategies for Japan.
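Regional sales are heavily right-skewed (a single title, Wii Sports, sold 41.49 million units in North America alone), and Pearson correlation can be pulled up or down by a handful of such blockbusters. A common robustness check is Spearman rank correlation, which `cor()` supports through `method = "spearman"`. The sketch below reports both side by side; `region_correlations()` is a hypothetical helper, and it assumes the same `vg` columns used above.

```r
# Hypothetical helper (not from the original analysis): Pearson and Spearman
# correlations for each pair of regions. Assumes columns NA_Sales, EU_Sales,
# and JP_Sales, as in vg.
region_correlations <- function(df) {
  pairs <- list(
    "North America vs Europe" = c("NA_Sales", "EU_Sales"),
    "North America vs Japan"  = c("NA_Sales", "JP_Sales"),
    "Europe vs Japan"         = c("EU_Sales", "JP_Sales")
  )
  # Correlation for one pair of column names, dropping incomplete rows
  pair_cor <- function(p, method) {
    cor(df[[p[1]]], df[[p[2]]], use = "complete.obs", method = method)
  }
  data.frame(
    Comparison = names(pairs),
    Pearson  = round(unname(vapply(pairs, pair_cor, numeric(1), method = "pearson")), 3),
    Spearman = round(unname(vapply(pairs, pair_cor, numeric(1), method = "spearman")), 3)
  )
}

# Usage with the report's data: region_correlations(vg)
```

If the Spearman values tell the same story (NA and EU closest, Japan more distinct), the conclusion does not hinge on a few outliers.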
publisher_sales <- vg_clean %>%
  group_by(Publisher) %>%
  summarize(total_global_sales = sum(Global_Sales, na.rm = TRUE)) %>%
  arrange(desc(total_global_sales)) %>%
  slice(1:10)
publisher_sales
## # A tibble: 10 × 2
## Publisher total_global_sales
## <chr> <dbl>
## 1 Nintendo 1784.
## 2 Electronic Arts 1093.
## 3 Activision 721.
## 4 Sony Computer Entertainment 607.
## 5 Ubisoft 474.
## 6 Take-Two Interactive 399.
## 7 THQ 340.
## 8 Konami Digital Entertainment 279.
## 9 Sega 271.
## 10 Namco Bandai Games 254.
ggplot(publisher_sales,
       aes(x = reorder(Publisher, total_global_sales),
           y = total_global_sales,
           fill = Publisher)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = round(total_global_sales, 0)),
            hjust = -0.1,
            size = 4) +
  coord_flip() +
  labs(
    title = "Top 10 Publishers by Global Sales",
    x = "Publisher",
    y = "Global Sales (Millions)"
  ) +
  theme_minimal(base_size = 14) +
  scale_y_continuous(labels = scales::comma,
                     expand = expansion(mult = c(0, 0.15)))
Interpretation:
Nintendo is the clear leader in global sales (about 1,784 million units), well ahead of Electronic Arts (about 1,093 million) and Activision (about 721 million), both of which have also produced many high-selling titles. The rest of the top publishers, such as Sony, Ubisoft, and Take-Two, show solid performance at a smaller scale. Overall, a small group of major publishers accounts for most of the market: using the totals above, the top 10 publishers sold about 6,222 million units, roughly 70% of the approximately 8,820 million units summed across all genres.
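The concentration claim can be computed directly as the share of total sales held by the top n publishers. A minimal base-R sketch, assuming a data frame like `vg_clean` with `Publisher` and `Global_Sales`; `top_n_share()` is a hypothetical helper. The denominator is all global sales, including publishers outside the top n.

```r
# Hypothetical helper (not from the original analysis): share of all global
# sales held by the n best-selling publishers. Assumes columns Publisher and
# Global_Sales, one row per title.
top_n_share <- function(df, n = 10) {
  by_publisher <- tapply(df$Global_Sales, df$Publisher, sum, na.rm = TRUE)
  top_total <- sum(head(sort(by_publisher, decreasing = TRUE), n))
  top_total / sum(df$Global_Sales, na.rm = TRUE)
}

# Usage with the report's data:
# top_n_share(vg_clean, n = 3)
# top_n_share(vg_clean, n = 10)
```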
✅ Final Summary of Findings
This analysis explored global video game sales across genres, platforms, regions, and publishers. The goal was to identify clear trends that could guide marketing, investment, and strategic decisions.
Action, Sports, and Shooter games generated the highest total global sales, suggesting broad audience appeal and dependable demand.
The PlayStation 2, Xbox 360, and PlayStation 3 dominated total global sales. These consoles had long life cycles, strong game libraries, and major market influence.
Global sales increased steadily from the mid-1990s to a peak around 2008–2010. After 2010, sales declined, likely due in part to digital distribution replacing physical game sales (the dataset captures only physical sales) and in part to incomplete reporting in the most recent years.
Nintendo, Electronic Arts, and Activision are the top-selling publishers. They produce successful franchises across multiple regions and remain key industry leaders.
North America is the largest market, followed by Europe. Japan is smaller overall but has strong performance in specific genres and platforms.
A two-sample t-test found that the top three publishers have significantly higher average global sales per title than all other publishers. This suggests their dominance is not due to chance alone: they tend to outperform competitors title by title.
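The test itself is not among the code shown above. A minimal sketch of how such a comparison could be run, assuming the top three are Nintendo, Electronic Arts, and Activision (per the publisher table) and that the test compares per-title `Global_Sales` with Welch's two-sample t-test (the default in R's `t.test()`, which does not assume equal variances); `top_vs_rest_test()` is a hypothetical helper.

```r
# Hypothetical helper (not from the original analysis): Welch two-sample
# t-test comparing mean per-title Global_Sales of the listed publishers
# against all other publishers. Assumes columns Publisher and Global_Sales.
top_vs_rest_test <- function(df,
                             top_publishers = c("Nintendo",
                                                "Electronic Arts",
                                                "Activision")) {
  is_top <- df$Publisher %in% top_publishers
  # t.test() defaults to Welch's test (var.equal = FALSE), two-sided
  t.test(df$Global_Sales[is_top], df$Global_Sales[!is_top])
}

# Usage with the report's data:
# res <- top_vs_rest_test(vg_clean)
# res$estimate  # mean sales per title: top group first, then the rest
# res$p.value
```

Because per-title sales are highly skewed, a rank-based `wilcox.test()` on the same two groups is a reasonable companion check.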
✅ Final Business Recommendations
1. Prioritize High-Performing Genres
Invest more heavily in Action, Sports, and Shooter games. These categories provide reliable global demand.
Platforms like the PS2, PS3, and Xbox 360 have historically driven the highest sales. Future strategies should consider where players are active today (e.g., the PlayStation and Nintendo ecosystems).
Nintendo, EA, and Activision have demonstrated sustained success. Partnering with these companies, or modeling strategies after them, may increase the chance of high performance.
North America and Europe are the largest markets. Japan requires more targeted, genre-specific approaches.
Monitoring new platforms (Switch, PS5, Series X) could provide insights into the next major sales boom.
✅ Short Closing Statement
Overall, the data shows clear leaders across genres, platforms, and publishers. The market follows consistent patterns, and companies that understand these trends can make smarter investments, target audiences more effectively, and increase the likelihood of releasing top-performing games.
Appendix: Use of AI Tools
For this project, I used AI tools to support my workflow, mainly to improve coding efficiency, formatting, and visualization clarity. All analysis, interpretation, and final decisions were made by me. The following tools were used:
ChatGPT was used to:
Help format R Markdown chunks correctly
Fix syntax errors and improve ggplot visualizations
Suggest clear labeling, titles, and theme adjustments
Provide explanations of statistical outputs
Assist with writing short interpretations in simpler language
Help structure the analysis for the five main business questions
ChatGPT did not generate the dataset, did not run the code, and did not make conclusions for me. I reviewed and approved all code and analysis.
Gamma AI was used only to:
Convert my report summary into presentation slides
Improve layout and readability of the visuals
Speed up slide design and formatting
Gamma did not perform any data analysis.