project

Author

bradley

Does time played equal dollars paid?

“A video game is an interactive electronic game where players use a device (like a controller or keyboard) to manipulate on-screen visuals. Most modern video games are audio-visual, with sound and sometimes other sensory feedback like haptic technology”.

This dataset was created in 2017 and focuses on video games. Today, I’m going to explore whether there’s a correlation between how much people spend on video games and how long they actually play them. I chose this dataset because I personally love video games—especially sports games. Right now, College Football 25 is my favorite. I often find myself hesitating to buy new games when they come out because of the high prices. There’s always that fear: What if I don’t like the game? Will I have just wasted my money? So, this project is a way for me to see if others might feel the same way—are people more likely to invest their time in games they spent more money on?

I focused on three main variables:

Title – the name of the video game

Metrics_Sales – the total sales made on the game, measured in millions of dollars

Length_All_PlayStyles_Average – the average time (in hours) players reported spending to complete the game in any play style

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
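library(readr)  # already attached by library(tidyverse); harmless to repeat
# Note: setwd() assumes this folder exists on the machine running the code.
setwd("~/Desktop/data 110")
videogame <- read_csv("video_games.csv")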
Rows: 1212 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): Title, Metadata.Genres, Metadata.Publishers, Release.Console, Rele...
dbl (25): Features.Max Players, Metrics.Review Score, Metrics.Sales, Metrics...
lgl  (6): Features.Handheld?, Features.Multiplatform?, Features.Online?, Met...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(videogame)
# A tibble: 6 × 36
  Title       `Features.Handheld?` `Features.Max Players` Features.Multiplatfo…¹
  <chr>       <lgl>                                 <dbl> <lgl>                 
1 Super Mari… TRUE                                      1 TRUE                  
2 Lumines: P… TRUE                                      1 TRUE                  
3 WarioWare … TRUE                                      2 TRUE                  
4 Hot Shots … TRUE                                      1 TRUE                  
5 Spider-Man… TRUE                                      1 TRUE                  
6 The Urbz: … TRUE                                      1 TRUE                  
# ℹ abbreviated name: ¹​`Features.Multiplatform?`
# ℹ 32 more variables: `Features.Online?` <lgl>, Metadata.Genres <chr>,
#   `Metadata.Licensed?` <lgl>, Metadata.Publishers <chr>,
#   `Metadata.Sequel?` <lgl>, `Metrics.Review Score` <dbl>,
#   Metrics.Sales <dbl>, `Metrics.Used Price` <dbl>, Release.Console <chr>,
#   Release.Rating <chr>, `Release.Re-release?` <lgl>, Release.Year <dbl>,
#   `Length.All PlayStyles.Average` <dbl>, …
summary(videogame)
    Title           Features.Handheld? Features.Max Players
 Length:1212        Mode:logical       Min.   :1.000       
 Class :character   TRUE:1212          1st Qu.:1.000       
 Mode  :character                      Median :1.000       
                                       Mean   :1.658       
                                       3rd Qu.:2.000       
                                       Max.   :8.000       
 Features.Multiplatform? Features.Online? Metadata.Genres    Metadata.Licensed?
 Mode:logical            Mode:logical     Length:1212        Mode:logical      
 TRUE:1212               TRUE:1212        Class :character   TRUE:1212         
                                          Mode  :character                     
                                                                               
                                                                               
                                                                               
 Metadata.Publishers Metadata.Sequel? Metrics.Review Score Metrics.Sales    
 Length:1212         Mode:logical     Min.   :19.00        Min.   : 0.0100  
 Class :character    TRUE:1212        1st Qu.:60.00        1st Qu.: 0.0900  
 Mode  :character                     Median :70.00        Median : 0.2100  
                                      Mean   :68.83        Mean   : 0.5032  
                                      3rd Qu.:79.00        3rd Qu.: 0.4600  
                                      Max.   :98.00        Max.   :14.6600  
 Metrics.Used Price Release.Console    Release.Rating     Release.Re-release?
 Min.   : 4.95      Length:1212        Length:1212        Mode:logical       
 1st Qu.:14.95      Class :character   Class :character   TRUE:1212          
 Median :17.95      Mode  :character   Mode  :character                      
 Mean   :17.39                                                               
 3rd Qu.:17.95                                                               
 Max.   :49.95                                                               
  Release.Year  Length.All PlayStyles.Average Length.All PlayStyles.Leisure
 Min.   :2004   Min.   :  0.000               Min.   :  0.00               
 1st Qu.:2006   1st Qu.:  3.562               1st Qu.:  4.00               
 Median :2007   Median :  8.858               Median : 12.00               
 Mean   :2007   Mean   : 13.653               Mean   : 26.25               
 3rd Qu.:2008   3rd Qu.: 16.033               3rd Qu.: 27.60               
 Max.   :2008   Max.   :279.733               Max.   :476.27               
 Length.All PlayStyles.Median Length.All PlayStyles.Polled
 Min.   :  0.000              Min.   :   0.00             
 1st Qu.:  3.025              1st Qu.:   1.00             
 Median :  8.000              Median :   6.00             
 Mean   : 11.225              Mean   :  44.42             
 3rd Qu.: 13.783              3rd Qu.:  25.00             
 Max.   :126.000              Max.   :2300.00             
 Length.All PlayStyles.Rushed Length.Completionists.Average
 Min.   :  0.000              Min.   :  0.00               
 1st Qu.:  2.600              1st Qu.:  0.00               
 Median :  6.708              Median :  6.00               
 Mean   :  9.396              Mean   : 19.81               
 3rd Qu.: 11.367              3rd Qu.: 21.55               
 Max.   :120.200              Max.   :683.13               
 Length.Completionists.Leisure Length.Completionists.Median
 Min.   :  0.000               Min.   :  0.00              
 1st Qu.:  0.000               1st Qu.:  0.00              
 Median :  6.167               Median :  6.00              
 Mean   : 25.775               Mean   : 18.80              
 3rd Qu.: 27.117               3rd Qu.: 20.35              
 Max.   :691.567               Max.   :683.13              
 Length.Completionists.Polled Length.Completionists.Rushed
 Min.   :  0.000              Min.   :  0.00              
 1st Qu.:  0.000              1st Qu.:  0.00              
 Median :  1.000              Median :  5.50              
 Mean   :  5.658              Mean   : 16.40              
 3rd Qu.:  3.000              3rd Qu.: 18.38              
 Max.   :379.000              Max.   :674.70              
 Length.Main + Extras.Average Length.Main + Extras.Leisure
 Min.   :  0.000              Min.   :  0.00              
 1st Qu.:  0.000              1st Qu.:  0.00              
 Median :  7.292              Median :  8.00              
 Mean   : 12.731              Mean   : 18.87              
 3rd Qu.: 16.113              3rd Qu.: 21.03              
 Max.   :291.000              Max.   :478.93              
 Length.Main + Extras.Median Length.Main + Extras.Polled
 Min.   :  0.0               Min.   :   0               
 1st Qu.:  0.0               1st Qu.:   0               
 Median :  7.0               Median :   1               
 Mean   : 12.1               Mean   :  14               
 3rd Qu.: 15.0               3rd Qu.:   7               
 Max.   :291.0               Max.   :1100               
 Length.Main + Extras.Rushed Length.Main Story.Average
 Min.   :  0.000             Min.   : 0.000           
 1st Qu.:  0.000             1st Qu.: 0.000           
 Median :  6.283             Median : 6.575           
 Mean   : 10.320             Mean   : 8.466           
 3rd Qu.: 12.942             3rd Qu.:11.033           
 Max.   :291.000             Max.   :72.383           
 Length.Main Story.Leisure Length.Main Story.Median Length.Main Story.Polled
 Min.   :  0.00            Min.   : 0.000           Min.   :   0.00         
 1st Qu.:  0.00            1st Qu.: 0.000           1st Qu.:   0.00         
 Median :  8.00            Median : 6.042           Median :   3.00         
 Mean   : 11.05            Mean   : 8.281           Mean   :  24.88         
 3rd Qu.: 14.51            3rd Qu.:10.533           3rd Qu.:  14.00         
 Max.   :135.58            Max.   :70.000           Max.   :1100.00         
 Length.Main Story.Rushed
 Min.   : 0.000          
 1st Qu.: 0.000          
 Median : 5.342          
 Mean   : 6.975          
 3rd Qu.: 9.312          
 Max.   :70.000          

Data cleaning

Converting the variable names to lowercase and replacing spaces and periods with underscores.

names(videogame) <- tolower(names(videogame))
names(videogame) <- gsub(" ", "_", names(videogame))
names(videogame) <- gsub("\\.", "_", names(videogame))
videogame
# A tibble: 1,212 × 36
   title        `features_handheld?` features_max_players features_multiplatfo…¹
   <chr>        <lgl>                               <dbl> <lgl>                 
 1 Super Mario… TRUE                                    1 TRUE                  
 2 Lumines: Pu… TRUE                                    1 TRUE                  
 3 WarioWare T… TRUE                                    2 TRUE                  
 4 Hot Shots G… TRUE                                    1 TRUE                  
 5 Spider-Man 2 TRUE                                    1 TRUE                  
 6 The Urbz: S… TRUE                                    1 TRUE                  
 7 Ridge Racer  TRUE                                    1 TRUE                  
 8 Metal Gear … TRUE                                    1 TRUE                  
 9 Madden NFL … TRUE                                    1 TRUE                  
10 Pokmon Dash  TRUE                                    1 TRUE                  
# ℹ 1,202 more rows
# ℹ abbreviated name: ¹​`features_multiplatform?`
# ℹ 32 more variables: `features_online?` <lgl>, metadata_genres <chr>,
#   `metadata_licensed?` <lgl>, metadata_publishers <chr>,
#   `metadata_sequel?` <lgl>, metrics_review_score <dbl>, metrics_sales <dbl>,
#   metrics_used_price <dbl>, release_console <chr>, release_rating <chr>,
#   `release_re-release?` <lgl>, release_year <dbl>, …
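
The three renaming steps above leave characters such as `?`, `+` and `-` in some names (for example `features_handheld?`), which is why the tibble prints those names in backticks. Below is a minimal sketch of a single helper that strips those characters as well. The function name `clean_names_sketch` is hypothetical and not part of the original analysis; the two columns used later (`metrics_sales` and `length_all_playstyles_average`) would come out the same either way.

```r
# Hypothetical helper (not in the original analysis): lowercase the names and
# collapse every run of non-alphanumeric characters into a single underscore.
clean_names_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # spaces, periods, "?", "+", "-" become "_"
  gsub("^_+|_+$", "", x)           # drop leading or trailing underscores
}

example_names <- c("Features.Handheld?",
                   "Length.Main + Extras.Average",
                   "Release.Re-release?")
clean_names_sketch(example_names)
# → "features_handheld" "length_main_extras_average" "release_re_release"
```

It could be applied with `names(videogame) <- clean_names_sketch(names(videogame))` in place of the three separate calls.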

Filter

Selecting the variables I want to work with and dropping any rows with missing values.

videogame_clean <- videogame %>%
  select(title, metrics_sales, length_all_playstyles_average) %>%
  filter(!is.na(metrics_sales) & !is.na(length_all_playstyles_average))
summary(videogame_clean)
    title           metrics_sales     length_all_playstyles_average
 Length:1212        Min.   : 0.0100   Min.   :  0.000              
 Class :character   1st Qu.: 0.0900   1st Qu.:  3.562              
 Mode  :character   Median : 0.2100   Median :  8.858              
                    Mean   : 0.5032   Mean   : 13.653              
                    3rd Qu.: 0.4600   3rd Qu.: 16.033              
                    Max.   :14.6600   Max.   :279.733              
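
The summary above shows a minimum average play time of 0 hours. One possible reading is that 0 marks games with no reported completion times. This is an assumption; nothing in this report confirms it. Below is a minimal sketch of excluding such rows, shown on made-up values, not the real data.

```r
# Toy data standing in for videogame_clean (values are made up for illustration).
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  metrics_sales = c(0.10, 0.50, 1.20, 0.05),
  length_all_playstyles_average = c(0, 8.5, 20, 0)
)

# Assumption: a play time of 0 means "not reported", so keep only positive values.
toy_played <- subset(toy, length_all_playstyles_average > 0)
nrow(toy_played)
# → 2
```

Running the correlation and regression on both versions of the data would show how much the zero rows influence the result.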

Statistical analysis

cor(videogame_clean$length_all_playstyles_average, videogame_clean$metrics_sales)
[1] 0.1451712
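
A correlation of about 0.145 is positive but weak. For a regression with a single predictor, squaring the correlation gives the share of variation in sales that play time accounts for. The sketch below does that arithmetic with the value reported above, then checks the same identity on simulated data. The simulated `x` and `y` are made up and only illustrate the relationship between `cor()` and `lm()`.

```r
r <- 0.1451712        # correlation reported above
r_squared <- r^2      # share of variance in sales associated with play time
round(r_squared, 5)
# → 0.02107

# Simulated example (not the real data): squared correlation equals the
# R-squared of a one-predictor linear regression.
set.seed(1)
x <- rnorm(200)
y <- 0.15 * x + rnorm(200)
sim_r <- cor(x, y)
sim_r2 <- summary(lm(y ~ x))$r.squared
all.equal(sim_r^2, sim_r2)
```

So play time accounts for only about 2% of the variation in sales in this dataset.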

Linear Regression: Play Time - Sales

model <- lm(metrics_sales ~ length_all_playstyles_average, data = videogame_clean)
summary(model)

Call:
lm(formula = metrics_sales ~ length_all_playstyles_average, data = videogame_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3836 -0.3686 -0.2705 -0.0216 14.2196 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.393840   0.037202  10.586  < 2e-16 ***
length_all_playstyles_average 0.008007   0.001569   5.104 3.86e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.059 on 1210 degrees of freedom
Multiple R-squared:  0.02107,   Adjusted R-squared:  0.02027 
F-statistic: 26.05 on 1 and 1210 DF,  p-value: 3.862e-07
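
The fitted line is sales ≈ 0.393840 + 0.008007 × hours, with sales in millions of dollars. Each additional hour of average play time therefore goes with roughly $8,000 more in sales on average. Below is a minimal sketch that plugs play times into that line. The function name `predict_sales_sketch` is hypothetical, and the coefficients are copied from the output above.

```r
# Coefficients copied from the regression output above.
intercept <- 0.393840   # predicted sales (millions of dollars) at 0 hours
slope <- 0.008007       # change in sales (millions of dollars) per extra hour

# Hypothetical helper: predicted sales for a given average play time in hours.
predict_sales_sketch <- function(hours) intercept + slope * hours

predict_sales_sketch(c(10, 100))
# → 0.47391 1.19454
```

With the fitted object available, `predict(model, newdata = data.frame(length_all_playstyles_average = c(10, 100)))` should give the same values up to rounding of the coefficients.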

📊 Scatter Plot with Regression Line

Interactive tooltips show each game's title, play time, and sales.

plot1 <- ggplot(videogame_clean, aes(x = length_all_playstyles_average, y = metrics_sales)) +
  # One point per game; the text aesthetic is only used by ggplotly() for the tooltip
  geom_point(aes(color = length_all_playstyles_average,
                 text = paste0("Title: ", title,
                               "<br>Play Time: ", round(length_all_playstyles_average, 1), " hrs",
                               "<br>Sales: $", round(metrics_sales, 2), " million")),
             size = 3, alpha = 1) +
  # Straight line fitted by a linear model, drawn without the confidence band
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Does time played equal dollars paid?",
       x = "Average Play Time (Hours)",
       y = "Total Sales (Millions of Dollars)") +
  theme_classic() +
  scale_color_gradient(low = "#56B1F7", high = "#FF69B4")
Warning in geom_point(aes(color = length_all_playstyles_average, text =
paste0("Title: ", : Ignoring unknown aesthetics: text
ggplotly(plot1, tooltip = "text")
`geom_smooth()` using formula = 'y ~ x'

The scatterplot reveals some interesting insights about video game sales and play time. “Wii Play” tops the sales chart at about $14.7 million (sales are recorded in millions of dollars), while “Monster Hunter Freedom” has the highest average play time despite sales of only about $0.25 million. I was surprised to see that Grand Theft Auto IV had relatively modest sales, considering its popularity in the gaming community.

Filter: the best seller among the top 10 games by sales (play time between 5 and 100 hours)

top10_sales <- videogame_clean |>
  arrange(desc(metrics_sales)) |>
  head(10) |>
  filter(length_all_playstyles_average >= 5 & length_all_playstyles_average <= 100) |>
  filter(metrics_sales==max(metrics_sales))

top10_sales
# A tibble: 1 × 3
  title    metrics_sales length_all_playstyles_average
  <chr>            <dbl>                         <dbl>
1 Wii Play          14.7                          5.82

Filter: games from the bottom 10 by sales (play time between 5 and 100 hours)

bottom10_sales <- videogame_clean |>
  arrange(metrics_sales) |>
  head(10) |>
  filter(length_all_playstyles_average >= 5 & length_all_playstyles_average <= 100) |>
  filter(metrics_sales == max(metrics_sales))
bottom10_sales
# A tibble: 6 × 3
  title                          metrics_sales length_all_playstyles_average
  <chr>                                  <dbl>                         <dbl>
1 LifeSigns: Surgical Unit                0.01                          30.5
2 Custom Robo Arena                       0.01                          24.0
3 Gurumin: A Monstrous Adventure          0.01                          11.5
4 Spider-Man 3                            0.01                          10.8
5 Virtua Tennis 3                         0.01                          18.2
6 Front Mission                           0.01                          20.9
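
Both filters above end with `filter(metrics_sales == max(metrics_sales))`, applied after `head(10)`. That last step keeps only the rows tied for the highest sales value within each subset, which is why the first table has one row and the second has six. If the goal is a full top-10 and bottom-10 table (an assumption about the intent), dplyr's `slice_max()` and `slice_min()` are one way to get them. Here is a minimal sketch on made-up values, using n = 3 to keep the output short.

```r
library(dplyr)

# Toy data standing in for videogame_clean (values are made up for illustration).
toy <- tibble(
  title = c("A", "B", "C", "D", "E"),
  metrics_sales = c(0.01, 2.50, 0.40, 14.66, 0.01)
)

# Three best sellers, highest first; with_ties = FALSE caps the result at n rows.
top3 <- toy |> slice_max(metrics_sales, n = 3, with_ties = FALSE)

# Three weakest sellers, lowest first.
bottom3 <- toy |> slice_min(metrics_sales, n = 3, with_ties = FALSE)

top3$title
bottom3$metrics_sales
```

On the real data, the equivalent call would be `videogame_clean |> slice_max(metrics_sales, n = 10, with_ties = FALSE)`, with any play-time filter applied before slicing so that the result still has ten rows.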

I explored the relationship between how long players typically spend on a game and how well that game performed in terms of sales. My linear regression found a statistically significant but weak positive relationship (r ≈ 0.15, R² ≈ 0.02, p < 0.001): games with longer play times tend to have slightly higher sales, but play time explains only about 2% of the variation in sales. This doesn’t prove that longer play time causes higher sales, but it is consistent with the idea that games with more missions or content tend to attract more players and sell better.

Initially, I wanted to keep it simple, but after analyzing the data, I wished I had included more variables to gain deeper insights, like the game publisher, genre—whether it’s action, racing, or sports—and the consoles on which the game is available. I think this would have made the findings clearer for those who aren’t familiar with gaming. However, my skills are somewhat limited right now, and I feel hesitant to tackle too many variables at once.

Sources

https://en.wikipedia.org/wiki/Video_game#:~:text=A%20video%20game%20or%20computer,technology%20that%20provides%20tactile%20sensations).

https://www.independent.co.uk/games/best-gaming-console-b2027738.html

https://www.youtube.com/

ChatGPT - https://chatgpt.com/ - prompts: “why is this code not running”, “how to make my plot interactive”, “how to add a picture to RStudio”