“A video game is an interactive electronic game where players use a device (like a controller or keyboard) to manipulate on-screen visuals. Most modern video games are audio-visual, with sound and sometimes other sensory feedback like haptic technology”.
This dataset was created in 2017 and focuses on video games. Today, I’m going to explore whether there’s a correlation between how much people spend on video games and how long they actually play them. I chose this dataset because I personally love video games—especially sports games. Right now, College Football 25 is my favorite. I often find myself hesitating to buy new games when they come out because of the high prices. There’s always that fear: What if I don’t like the game? Will I have just wasted my money? So, this project is a way for me to see if others might feel the same way—are people more likely to invest their time in games they spent more money on?
I focused on three main variables:
Title: the name of the video game
Metrics_Sales: the total sales made on the game, measured in millions of dollars
Length_All_PlayStyles_Average: the average time (in hours) players reported spending to complete the game in any play style
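The loading and cleaning step isn't shown here, so the following is only a minimal sketch of how a three-column table like this could be prepared in base R. The toy rows, the names `videogame_raw` and `videogame_toy`, and the rule of dropping missing values and zero play times are all assumptions for illustration, not the actual steps behind `videogame_clean`.

```r
# Toy stand-in for the raw data; every value here is made up for illustration.
videogame_raw <- data.frame(
  Title = c("Game A", "Game B", "Game C", "Game D"),
  Metrics_Sales = c(0.50, 1.20, NA, 0.05),
  Length_All_PlayStyles_Average = c(12.5, 40.0, 8.0, 0),
  Other_Column = c("x", "y", "z", "w")
)

# Keep only the three variables of interest.
keep <- c("Title", "Metrics_Sales", "Length_All_PlayStyles_Average")
videogame_toy <- videogame_raw[, keep]

# Lowercase the column names so they match the names used in the model code.
names(videogame_toy) <- tolower(names(videogame_toy))

# Assumed cleaning rule: drop rows with missing values or zero play time.
videogame_toy <- videogame_toy[
  complete.cases(videogame_toy) &
    videogame_toy$length_all_playstyles_average > 0,
]

print(videogame_toy)
```

With these toy rows, only the two games with complete data and a positive play time remain.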
library(tidyverse)
library(plotly)
model <- lm(metrics_sales ~ length_all_playstyles_average, data = videogame_clean)
summary(model)
Call:
lm(formula = metrics_sales ~ length_all_playstyles_average, data = videogame_clean)

Residuals:
    Min      1Q  Median      3Q     Max
-2.3836 -0.3686 -0.2705 -0.0216 14.2196

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   0.393840   0.037202  10.586  < 2e-16 ***
length_all_playstyles_average 0.008007   0.001569   5.104 3.86e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.059 on 1210 degrees of freedom
Multiple R-squared: 0.02107, Adjusted R-squared: 0.02027
F-statistic: 26.05 on 1 and 1210 DF, p-value: 3.862e-07
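To make these numbers easier to read, here is a small sketch that plugs the reported coefficients back into the fitted line. The coefficient values are copied from the output above; the 10-hour and 50-hour games are hypothetical examples chosen only for illustration.

```r
# Coefficients copied from the summary(model) output above.
intercept <- 0.393840  # predicted sales (millions of dollars) at 0 hours
slope     <- 0.008007  # change in sales (millions) per extra hour of play time
r_squared <- 0.02107   # share of sales variation explained by play time

# Predicted sales for a hypothetical 10-hour game and 50-hour game.
hours <- c(10, 50)
predicted <- intercept + slope * hours
print(round(predicted, 3))

# The gap between those two predictions, converted to dollars.
gap_dollars <- slope * (50 - 10) * 1e6
print(gap_dollars)

# With one predictor and a positive slope, the correlation r is sqrt(R-squared).
r <- sqrt(r_squared)
print(round(r, 3))
```

The slope is positive and statistically significant, but an R-squared of about 0.02 corresponds to a correlation of only about 0.145, so play time on its own says little about how well a particular game sells.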
📊 Scatter Plot with Regression Line
Interactive tooltips show each game's title, play time, and sales.
# Scatter plot of sales against play time; the text aesthetic feeds the plotly tooltip
plot1 <- ggplot(videogame_clean, aes(x = length_all_playstyles_average, y = metrics_sales)) +
  geom_point(
    aes(
      color = length_all_playstyles_average,
      text = paste0(
        "Title: ", title,
        "<br>Play Time: ", round(length_all_playstyles_average, 1), " hrs",
        "<br>Sales: $", round(metrics_sales, 2), " million"
      )
    ),
    size = 3, alpha = 1
  ) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +  # fitted regression line
  labs(
    title = "Does time played equal dollars paid?",
    x = "Average Play Time (Hours)",
    y = "Total Sales (Millions)"
  ) +
  theme_classic() +
  scale_color_gradient(low = "#56B1F7", high = "#FF69B4")
Warning in geom_point(aes(color = length_all_playstyles_average, text =
paste0("Title: ", : Ignoring unknown aesthetics: text
ggplotly(plot1, tooltip = "text")
`geom_smooth()` using formula = 'y ~ x'
The scatterplot reveals some interesting patterns in video game sales and play time. “Wii Play” tops the sales chart at $14.6 million, while “Monster Hunter Freedom” has the highest average play time despite only about $0.25 million in sales. I was surprised to see that Grand Theft Auto IV had relatively modest sales, considering its popularity in the gaming community.
# A tibble: 6 × 3
  title                          metrics_sales length_all_playstyles_average
  <chr>                                  <dbl>                         <dbl>
1 LifeSigns: Surgical Unit                0.01                          30.5
2 Custom Robo Arena                       0.01                          24.0
3 Gurumin: A Monstrous Adventure          0.01                          11.5
4 Spider-Man 3                            0.01                          10.8
5 Virtua Tennis 3                         0.01                          18.2
6 Front Mission                           0.01                          20.9
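The code that produced this table isn't shown, so here is a minimal sketch, assuming the table lists the lowest-selling games, of how rows like these and highlights such as the top seller or longest play time could be pulled out with base R. The data frame `toy` and all of its values are hypothetical.

```r
# Made-up rows with the same three columns as the cleaned data.
toy <- data.frame(
  title = c("Game A", "Game B", "Game C", "Game D"),
  metrics_sales = c(0.01, 14.60, 0.25, 1.10),
  length_all_playstyles_average = c(30.5, 4.0, 90.0, 12.0)
)

# Game with the highest sales.
top_seller <- toy[which.max(toy$metrics_sales), ]

# Game with the longest average play time.
longest <- toy[which.max(toy$length_all_playstyles_average), ]

# Lowest-selling games first, like taking head() of a table sorted by sales.
lowest <- head(toy[order(toy$metrics_sales), ], 3)

print(top_seller$title)
print(longest$title)
print(lowest$title)
```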
I explored the relationship between how long players typically spend on a game and how well that game performed in sales. My linear regression found a statistically significant positive relationship (slope of about 0.008, p < 0.001): games with longer play times tend to have slightly higher sales. The relationship is weak, though, because play time explains only about 2% of the variation in sales (R-squared = 0.021). This doesn’t prove that longer play time directly causes higher sales, but it does suggest that games with more missions or content may attract somewhat more players and perform a little better in stores.
Initially, I wanted to keep it simple, but after analyzing the data, I wished I had included more variables to gain deeper insights, like the game publisher, genre—whether it’s action, racing, or sports—and the consoles on which the game is available. I think this would have made the findings clearer for those who aren’t familiar with gaming. However, my skills are somewhat limited right now, and I feel hesitant to tackle too many variables at once.
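Adding one more variable is a smaller step than it might seem. Below is a minimal sketch of a regression with both play time and genre as predictors, using simulated data. The column names `hours`, `genre`, and `sales` and every value are made up for illustration; the real dataset's genre column may be named and coded differently.

```r
set.seed(1)  # make the simulated data reproducible
n <- 60

# Simulated games: random play times and three genres in rotation.
toy <- data.frame(
  hours = runif(n, min = 2, max = 60),
  genre = factor(rep(c("Action", "Racing", "Sports"), length.out = n))
)

# Made-up sales: a small play-time effect plus random noise.
toy$sales <- 0.4 + 0.008 * toy$hours + rnorm(n, sd = 0.3)

# Adding genre to the formula estimates a separate baseline for each genre
# while still fitting one shared slope for play time.
fit <- lm(sales ~ hours + genre, data = toy)
print(coef(fit))
```

R treats the first genre alphabetically ("Action") as the baseline, so the `genreRacing` and `genreSports` coefficients are differences from Action at the same play time.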