# Load required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(survival)
library(survminer)
## Loading required package: ggpubr
## 
## Attaching package: 'survminer'
## 
## The following object is masked from 'package:survival':
## 
##     myeloma
library(clarify)
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(modelsummary)

1 Introduction

In the increasingly competitive landscape of online gaming, sustaining player engagement has become a vital driver of success [@yee2006]. As player acquisition costs rise, retaining existing users offers a more efficient path toward profitability and community growth. Understanding the behavioral patterns that influence player retention can provide actionable insights for game developers, marketers, and product teams alike [@hamari2011]. This analysis extends previous work by fitting parametric survival models (Exponential and Weibull) to the “Online Gaming Behavior Dataset” to identify which player attributes are associated with total playtime. Results are reported with the modelsummary package so that estimates and confidence intervals for both models appear side by side.

2 Dataset Overview

The dataset includes the following key behavioral and demographic variables:

- PlayTimeHours: Total hours spent in-game, serving as a proxy for engagement duration
- InGamePurchases: Binary (0/1) indicator of in-game purchases; the code below treats a value of 0 as churned
- Age: Player age
- SessionsPerWeek: Frequency of gameplay
- AvgSessionDurationMinutes: Average duration of a gaming session
- PlayerLevel: In-game progression level

player_data <- read_csv("C:/Users/marc.ventura/OneDrive - OneWorkplace/Data 765 Python Fundementals/Data 712/online_gaming_behavior_dataset.csv")
## Rows: 40034 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Gender, Location, GameGenre, GameDifficulty, EngagementLevel
## dbl (8): PlayerID, Age, PlayTimeHours, InGamePurchases, SessionsPerWeek, Avg...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
player_data <- player_data %>% drop_na()

player_data <- player_data %>%
  mutate(churned = if_else(InGamePurchases == 0, "Churned", "Retained"),
         churned = factor(churned))

2.1 Data Summary

datasummary_skim(player_data)
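| | Unique | Missing Pct. | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| PlayerID | 40034 | 0 | 29016.5 | 11557.0 | 9000.0 | 29016.5 | 49033.0 |
| Age | 35 | 0 | 32.0 | 10.0 | 15.0 | 32.0 | 49.0 |
| PlayTimeHours | 40034 | 0 | 12.0 | 6.9 | 0.0 | 12.0 | 24.0 |
| InGamePurchases | 2 | 0 | 0.2 | 0.4 | 0.0 | 0.0 | 1.0 |
| SessionsPerWeek | 20 | 0 | 9.5 | 5.8 | 0.0 | 9.0 | 19.0 |
| AvgSessionDurationMinutes | 170 | 0 | 94.8 | 49.0 | 10.0 | 95.0 | 179.0 |
| PlayerLevel | 99 | 0 | 49.7 | 28.6 | 1.0 | 49.0 | 99.0 |
| AchievementsUnlocked | 50 | 0 | 24.5 | 14.4 | 0.0 | 25.0 | 49.0 |

| | | N | % |
|---|---|---|---|
| Gender | Female | 16075 | 40.2 |
| | Male | 23959 | 59.8 |
| Location | Asia | 8095 | 20.2 |
| | Europe | 12004 | 30.0 |
| | Other | 3935 | 9.8 |
| | USA | 16000 | 40.0 |
| GameGenre | Action | 8039 | 20.1 |
| | RPG | 7952 | 19.9 |
| | Simulation | 7983 | 19.9 |
| | Sports | 8048 | 20.1 |
| | Strategy | 8012 | 20.0 |
| GameDifficulty | Easy | 20015 | 50.0 |
| | Hard | 8008 | 20.0 |
| | Medium | 12011 | 30.0 |
| EngagementLevel | High | 10336 | 25.8 |
| | Low | 10324 | 25.8 |
| | Medium | 19374 | 48.4 |
| churned | Churned | 31993 | 79.9 |
| | Retained | 8041 | 20.1 |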

3 Behavioral Patterns and Player Distribution

player_data %>%
  ggplot(aes(x = churned)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Player Retention Status", x = "Churn Status", y = "Number of Players")

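The bar plot shows that 31,993 of the 40,034 players (79.9%) are classified as “Churned,” compared with 8,041 (20.1%) classified as “Retained.” Because churn is defined here as having made no in-game purchases, this split reflects purchase behavior rather than observed departure from the game. The imbalance nonetheless reinforces the importance of targeted interventions to improve retention outcomes.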

4 Survival Analysis

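To examine the factors associated with total playtime, we fit parametric survival models with survreg, using both Exponential and Weibull distributions. PlayTimeHours serves as the duration. Because Surv() is called without an event indicator, every player is treated as having an observed event, so no observations are censored. Fitting both distributions with the same covariates enables an assessment of which behavioral metrics most meaningfully predict extended engagement and of whether the hazard is constant over playtime.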
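As a small illustration of how the response is constructed, the sketch below contrasts Surv() called with only a time argument against Surv() called with a status vector. The toy durations and the status vector are hypothetical and are not taken from the dataset.

```r
library(survival)

# Hypothetical toy durations in hours (not from the dataset)
toy_hours <- c(2.5, 7.0, 11.3, 18.9)

# With only a time argument, Surv() marks every observation as an event
s_all_events <- Surv(toy_hours)

# Supplying a status vector (1 = event, 0 = right-censored) records censoring
toy_status <- c(1, 0, 1, 0)
s_censored <- Surv(toy_hours, toy_status)

# Censored observations print with a trailing "+"
print(s_all_events)
print(s_censored)
```

If a genuine end-of-observation indicator were available for each player, passing it as the second argument would let the models distinguish completed from ongoing engagement.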

# Fit Exponential Model
exp_model <- survreg(Surv(PlayTimeHours) ~ Age + SessionsPerWeek + AvgSessionDurationMinutes + PlayerLevel, 
                     dist = "exponential", 
                     data = player_data)

# Fit Weibull Model
weibull_model <- survreg(Surv(PlayTimeHours) ~ Age + SessionsPerWeek + AvgSessionDurationMinutes + PlayerLevel, 
                         dist = "weibull", 
                         data = player_data)

# Display both models side-by-side
modelsummary(list("Exponential" = exp_model, "Weibull" = weibull_model),
             output = "tinytable",
             statistic = "conf.int",
             stars = TRUE)
| | Exponential | Weibull |
|---|---|---|
| (Intercept) | 2.493*** | 2.588*** |
| | [2.449, 2.537] | [2.561, 2.615] |
| Age | 0.000 | 0.000 |
| | [-0.001, 0.001] | [-0.000, 0.001] |
| SessionsPerWeek | -0.000 | -0.000 |
| | [-0.002, 0.001] | [-0.001, 0.001] |
| AvgSessionDurationMinutes | -0.000 | -0.000 |
| | [-0.000, 0.000] | [-0.000, 0.000] |
| PlayerLevel | -0.000 | -0.000 |
| | [-0.000, 0.000] | [-0.000, 0.000] |
| Log(scale) | | -0.486*** |
| | | [-0.495, -0.478] |
| Num.Obs. | 40034 | 40034 |
| AIC | 279201.3 | 268392.6 |
| BIC | 279244.2 | 268444.1 |
| RMSE | 6.91 | 7.03 |

Significance: + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
# Compare AIC and BIC
AIC(exp_model, weibull_model)
##               df      AIC
## exp_model      5 279201.3
## weibull_model  6 268392.6
BIC(exp_model, weibull_model)
##               df      BIC
## exp_model      5 279244.2
## weibull_model  6 268444.1

4.1 Model Interpretation

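Model comparison based on AIC and BIC indicates that the Weibull model fits better than the Exponential model (AIC 268,392.6 vs. 279,201.3; BIC 268,444.1 vs. 279,244.2) [@akaike1974]. The estimated Log(scale) of -0.486 corresponds to a Weibull shape of about 1/exp(-0.486) ≈ 1.63. A shape above 1 implies that the hazard of disengagement rises with accumulated playtime rather than remaining constant.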
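The Exponential model is the Weibull model with its scale fixed at 1, so the two can also be compared with a likelihood-ratio test. The sketch below is a minimal illustration on simulated data, not a rerun of the analysis: the seed, sample size, and simulation parameters are assumptions chosen only to make the example self-contained. It also shows the conversion from survreg's scale to the Weibull shape.

```r
library(survival)

set.seed(712)  # arbitrary seed, for reproducibility only
n <- 2000      # hypothetical sample size

# Simulated durations with a known Weibull shape of 1.6 (assumed values)
sim <- data.frame(hours = rweibull(n, shape = 1.6, scale = 13))

fit_exp <- survreg(Surv(hours) ~ 1, data = sim, dist = "exponential")
fit_wei <- survreg(Surv(hours) ~ 1, data = sim, dist = "weibull")

# survreg parameterizes the Weibull so that scale = 1 / shape
shape_hat <- 1 / fit_wei$scale

# Likelihood-ratio test with 1 degree of freedom (the freed scale parameter)
lr_stat <- as.numeric(2 * (logLik(fit_wei) - logLik(fit_exp)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# The same conversion applied to the Log(scale) estimate in the table above
shape_reported <- 1 / exp(-0.486)

print(c(shape_hat = shape_hat, lr_stat = lr_stat, p_value = p_value))
print(shape_reported)
```

With the fitted objects from the analysis, the same two lines for lr_stat and p_value could be applied to exp_model and weibull_model directly.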

From the regression results, none of the four covariates emerges as a meaningful predictor of playtime in either model. The coefficients for SessionsPerWeek, AvgSessionDurationMinutes, Age, and PlayerLevel all round to 0.000, every confidence interval includes zero, and none reaches even the p < 0.1 threshold. Players who play more frequently or maintain longer sessions therefore do not accrue detectably more play hours in this dataset, and the results give no basis for concluding that engagement is more behaviorally than demographically driven. Only the intercept and Log(scale) terms are statistically distinguishable from zero.
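Because survreg models the logarithm of the duration, exponentiating a coefficient gives a multiplicative time ratio, which is often easier to read than the raw estimate. The sketch below applies that conversion to the rounded Weibull estimates and interval limits copied from the table above. The data frame and its column names are illustrative, and the rounding limits how precise the ratios can be.

```r
# Rounded Weibull estimates and interval limits copied from the table above
weibull_tab <- data.frame(
  term     = c("Age", "SessionsPerWeek",
               "AvgSessionDurationMinutes", "PlayerLevel"),
  estimate = c(0.000, 0.000, 0.000, 0.000),
  lower    = c(0.000, -0.001, 0.000, 0.000),
  upper    = c(0.001, 0.001, 0.000, 0.000)
)

# Time ratio: multiplicative change in duration per one-unit covariate increase
weibull_tab$time_ratio       <- exp(weibull_tab$estimate)
weibull_tab$time_ratio_lower <- exp(weibull_tab$lower)
weibull_tab$time_ratio_upper <- exp(weibull_tab$upper)

# An interval that includes zero corresponds to a time ratio interval including 1
weibull_tab$ci_includes_zero <- weibull_tab$lower <= 0 & weibull_tab$upper >= 0

# exp(intercept) is the Weibull scale (characteristic time, in hours)
# for a player with all covariates equal to zero
baseline_hours <- exp(2.588)

print(weibull_tab)
print(baseline_hours)
```

A time ratio of 1 means the covariate leaves expected duration unchanged, which is what the rounded estimates imply for all four predictors.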

5 Conclusion

The findings do not support the expectation that in-game behavioral metrics predict playtime in this dataset. Neither session frequency nor session duration shows a detectable association with PlayTimeHours, and the same holds for Age and PlayerLevel. For developers and marketers, strategies that encourage habitual play (e.g., daily quests, engagement rewards) and optimize session quality (e.g., meaningful content pacing) remain plausible ways to boost retention, but the models fitted here provide no evidence for or against them.

While the Weibull model provides a statistically better fit for understanding playtime dynamics, future research should continue to refine predictive models by integrating additional behavioral and psychographic data [@fields2014]. Incorporating social interaction variables or sentiment analysis of player feedback may further enhance retention modeling efforts.