# Load required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(survival)
library(survminer)
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
##
## The following object is masked from 'package:survival':
##
## myeloma
library(clarify)
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(modelsummary)
In the increasingly competitive landscape of online gaming,
sustaining player engagement has become a vital driver of success [@yee2006]. As player acquisition costs rise,
retaining existing users offers a more efficient path toward
profitability and community growth. Understanding the behavioral
patterns that influence player retention can provide actionable insights
for game developers, marketers, and product teams alike [@hamari2011]. This analysis extends previous
work by applying advanced modeling techniques to better understand the
drivers of player engagement using the “Online Gaming Behavior Dataset.”
Professional reporting tools such as `modelsummary` are used to improve the clarity and rigor of the findings.
The dataset includes the following behavioral and demographic variables:

- `PlayTimeHours`: Total hours spent in-game, serving as a proxy for engagement duration
- `InGamePurchases`: Indicator (0 or 1) of whether the player made in-game purchases; recoded below as a proxy for churn status
- `Age`: Player age
- `SessionsPerWeek`: Frequency of gameplay
- `AvgSessionDurationMinutes`: Average duration of a gaming session
- `PlayerLevel`: In-game progression level
player_data <- read_csv("C:/Users/marc.ventura/OneDrive - OneWorkplace/Data 765 Python Fundementals/Data 712/online_gaming_behavior_dataset.csv")
## Rows: 40034 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Gender, Location, GameGenre, GameDifficulty, EngagementLevel
## dbl (8): PlayerID, Age, PlayTimeHours, InGamePurchases, SessionsPerWeek, Avg...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
player_data <- player_data %>% drop_na()
# Derive a churn proxy: players with no in-game purchases are labeled "Churned"
player_data <- player_data %>%
  mutate(churned = if_else(InGamePurchases == 0, "Churned", "Retained"),
         churned = factor(churned))
datasummary_skim(player_data)
| | Unique | Missing Pct. | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| PlayerID | 40034 | 0 | 29016.5 | 11557.0 | 9000.0 | 29016.5 | 49033.0 |
| Age | 35 | 0 | 32.0 | 10.0 | 15.0 | 32.0 | 49.0 |
| PlayTimeHours | 40034 | 0 | 12.0 | 6.9 | 0.0 | 12.0 | 24.0 |
| InGamePurchases | 2 | 0 | 0.2 | 0.4 | 0.0 | 0.0 | 1.0 |
| SessionsPerWeek | 20 | 0 | 9.5 | 5.8 | 0.0 | 9.0 | 19.0 |
| AvgSessionDurationMinutes | 170 | 0 | 94.8 | 49.0 | 10.0 | 95.0 | 179.0 |
| PlayerLevel | 99 | 0 | 49.7 | 28.6 | 1.0 | 49.0 | 99.0 |
| AchievementsUnlocked | 50 | 0 | 24.5 | 14.4 | 0.0 | 25.0 | 49.0 |

| | | N | % |
|---|---|---|---|
| Gender | Female | 16075 | 40.2 |
| | Male | 23959 | 59.8 |
| Location | Asia | 8095 | 20.2 |
| | Europe | 12004 | 30.0 |
| | Other | 3935 | 9.8 |
| | USA | 16000 | 40.0 |
| GameGenre | Action | 8039 | 20.1 |
| | RPG | 7952 | 19.9 |
| | Simulation | 7983 | 19.9 |
| | Sports | 8048 | 20.1 |
| | Strategy | 8012 | 20.0 |
| GameDifficulty | Easy | 20015 | 50.0 |
| | Hard | 8008 | 20.0 |
| | Medium | 12011 | 30.0 |
| EngagementLevel | High | 10336 | 25.8 |
| | Low | 10324 | 25.8 |
| | Medium | 19374 | 48.4 |
| churned | Churned | 31993 | 79.9 |
| | Retained | 8041 | 20.1 |
player_data %>%
ggplot(aes(x = churned)) +
geom_bar(fill = "steelblue") +
labs(title = "Player Retention Status", x = "Churn Status", y = "Number of Players")
The bar plot shows that roughly 80% of players (31,993 of 40,034) are classified as “Churned,” reinforcing the importance of targeted interventions to improve retention outcomes.
To examine the factors influencing total playtime, we employ survival analysis using both Exponential and Weibull models. Modeling retention through these lenses enables an assessment of which behavioral metrics most meaningfully predict extended engagement.
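For reference, both models can be written in the accelerated failure time form that `survreg` fits. The notation below ($T_i$, $x_i$, $\beta$, $\sigma$, $W_i$) is introduced here for illustration and is not part of the original analysis.

$$
\log T_i = x_i^\top \beta + \sigma W_i, \qquad W_i \sim \text{standard minimum extreme value distribution}
$$

Here $T_i$ is `PlayTimeHours` for player $i$ and $x_i$ holds the covariates. The Exponential model fixes $\sigma = 1$, while the Weibull model estimates $\sigma$ (reported as `Log(scale)`), which corresponds to a Weibull shape parameter of $1/\sigma$. Because `Surv(PlayTimeHours)` is called without an event indicator, every player is treated as having an observed event, so no observations are censored.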
# Fit Exponential Model
exp_model <- survreg(Surv(PlayTimeHours) ~ Age + SessionsPerWeek + AvgSessionDurationMinutes + PlayerLevel,
dist = "exponential",
data = player_data)
# Fit Weibull Model
weibull_model <- survreg(Surv(PlayTimeHours) ~ Age + SessionsPerWeek + AvgSessionDurationMinutes + PlayerLevel,
dist = "weibull",
data = player_data)
# Display both models side-by-side
modelsummary(list("Exponential" = exp_model, "Weibull" = weibull_model),
output = "tinytable",
statistic = "conf.int",
stars = TRUE)
| | Exponential | Weibull |
|---|---|---|
| (Intercept) | 2.493*** | 2.588*** |
| | [2.449, 2.537] | [2.561, 2.615] |
| Age | 0.000 | 0.000 |
| | [-0.001, 0.001] | [-0.000, 0.001] |
| SessionsPerWeek | -0.000 | -0.000 |
| | [-0.002, 0.001] | [-0.001, 0.001] |
| AvgSessionDurationMinutes | -0.000 | -0.000 |
| | [-0.000, 0.000] | [-0.000, 0.000] |
| PlayerLevel | -0.000 | -0.000 |
| | [-0.000, 0.000] | [-0.000, 0.000] |
| Log(scale) | | -0.486*** |
| | | [-0.495, -0.478] |
| Num.Obs. | 40034 | 40034 |
| AIC | 279201.3 | 268392.6 |
| BIC | 279244.2 | 268444.1 |
| RMSE | 6.91 | 7.03 |

Note: + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
# Compare AIC and BIC
AIC(exp_model, weibull_model)
## df AIC
## exp_model 5 279201.3
## weibull_model 6 268392.6
BIC(exp_model, weibull_model)
## df BIC
## exp_model 5 279244.2
## weibull_model 6 268444.1
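The Weibull model has the lower AIC (268,392.6 vs. 279,201.3) and the lower BIC (268,444.1 vs. 279,244.2), indicating a better fit than the Exponential model [@akaike1974]. Its negative `Log(scale)` estimate (-0.486) implies a Weibull shape parameter above 1, which suggests that the hazard of disengagement rises with accumulated playtime rather than remaining constant.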
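Because the Exponential model is the Weibull model with its scale fixed at 1, the two are nested, and the comparison can be cross-checked with a likelihood ratio test. The sketch below works only from the rounded values printed above, so its results are approximate; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the model output above (rounded as reported)
log_scale <- -0.486
aic   <- c(exponential = 279201.3, weibull = 268392.6)
n_par <- c(exponential = 5, weibull = 6)

# survreg's scale is the reciprocal of the usual Weibull shape parameter
weibull_scale <- exp(log_scale)
weibull_shape <- 1 / weibull_scale

# Recover the log-likelihoods from AIC = 2k - 2 * logLik
log_lik <- (2 * n_par - aic) / 2

# Likelihood ratio test of scale = 1 (Exponential) against the Weibull model
lr_stat <- unname(2 * (log_lik["weibull"] - log_lik["exponential"]))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

round(weibull_shape, 2)
round(lr_stat, 1)
p_value
```

A shape above 1 corresponds to an increasing hazard. With the fitted objects available, the same quantities could be read directly from `weibull_model$scale` and `logLik()` instead of being reconstructed from the printed table.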
The regression results do not support any of the four covariates as a predictor of total playtime. In both models, the coefficients for `SessionsPerWeek`, `AvgSessionDurationMinutes`, `Age`, and `PlayerLevel` round to 0.000, their confidence intervals span zero, and none reaches even the 0.1 significance threshold. Only the intercept and, in the Weibull model, the scale term are statistically distinguishable from zero. In this dataset, total play hours therefore appear essentially unrelated to session frequency, session length, age, or progression level.
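Coefficients from `survreg` are on the log-time scale, so exponentiating them gives time ratios, which are easier to read. The sketch below applies this to the rounded Weibull values from the regression table, assuming the intervals are at `modelsummary`'s default 95% level; the results are approximate and the variable names are illustrative.

```r
# Rounded Weibull estimates copied from the regression table above
intercept   <- 2.588
sessions_ci <- c(lower = -0.001, upper = 0.001)  # interval for SessionsPerWeek

# exp(intercept) is the Weibull scale (characteristic playtime, in hours)
# when all covariates equal zero
baseline_hours <- exp(intercept)

# Time ratio implied by one additional session per week
time_ratio_per_session <- exp(sessions_ci)

# Time ratio across the full observed range (0 to 19 sessions per week)
time_ratio_full_range <- exp(19 * sessions_ci)

round(baseline_hours, 1)
round(time_ratio_per_session, 3)
round(time_ratio_full_range, 3)
```

Even at the ends of the interval, moving from 0 to 19 sessions per week changes expected playtime by only about 2% in either direction, which shows how small the estimated association is in practical terms.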
The findings do not, on their own, confirm the expected role of in-game behavioral metrics: in these models, session frequency and session duration show no measurable association with total playtime. Strategies that encourage habitual play (e.g., daily quests, engagement rewards) and optimize session quality (e.g., meaningful content pacing) remain plausible retention levers for developers and marketers, but they should be tested against a direct measure of retention rather than inferred from these estimates.
While the Weibull model provides a statistically better fit for understanding playtime dynamics, future research should continue to refine predictive models by integrating additional behavioral and psychographic data [@fields2014]. Incorporating social interaction variables or sentiment analysis of player feedback may further enhance retention modeling efforts.