Introduction

This analysis applies a Generalized Linear Model (GLM), specifically logistic regression, to the Social Media and Entertainment Dataset.

Goals:

  • Create a binary outcome variable.
  • Build a logistic regression model with 1–4 predictors.
  • Interpret model coefficients and confidence intervals.
  • Use odds ratios to evaluate effect sizes.


Step 1: Create Binary Response Variable

  • We define a binary variable, HighUsage:
    • Target: whether a user spends 4 or more hours per day on social media (1 = yes, 0 = no).
    • The 4-hour threshold separates high engagement from low or average usage.
    • Gender is also recoded as a 0/1 indicator (1 = Female) so it can be used as a predictor.

library(dplyr)  # provides %>% and mutate()

# Create binary response (1 = high usage, 0 = low/average)
# and encode Gender as binary (1 = Female, 0 = any other value)
data <- data %>%
  mutate(
    HighUsage     = ifelse(`Daily Social Media Time (hrs)` >= 4, 1, 0),
    Gender_Binary = ifelse(Gender == "Female", 1, 0)
  )
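
Before modeling, it can help to check how the two recoded variables are distributed. The sketch below uses a small made-up data frame (`toy`, with invented values) purely for illustration; on the real data the same calls would be applied to `data`.

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  `Daily Social Media Time (hrs)` = c(1.5, 4.0, 6.2, 3.9, 5.0, 2.0),
  Gender = c("Female", "Male", "Female", "Other", "Male", "Female"),
  check.names = FALSE  # keep the original column names with spaces
)

# Same coding rules as in Step 1
toy$HighUsage     <- ifelse(toy$`Daily Social Media Time (hrs)` >= 4, 1, 0)
toy$Gender_Binary <- ifelse(toy$Gender == "Female", 1, 0)

# Counts and share of each class; a very lopsided split would call for caution
table(toy$HighUsage)
prop.table(table(toy$HighUsage))

# Cross-tabulate to confirm that every non-"Female" value maps to 0
table(toy$Gender, toy$Gender_Binary)
```

The cross-tabulation makes explicit that `Gender_Binary` compares "Female" against all other recorded values combined, which matters when reading the gender coefficient later.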

Step 2: Building the Logistic Regression Model

We use three predictors:

  • Age (continuous)
  • Gender_Binary (binary)
  • Average Sleep Time (hrs) (continuous)

# Fit logistic regression model
logit_model <- glm(HighUsage ~ Age + Gender_Binary + `Average Sleep Time (hrs)`,
                   data = data, family = "binomial")

# Model summary
summary(logit_model)
## 
## Call:
## glm(formula = HighUsage ~ Age + Gender_Binary + `Average Sleep Time (hrs)`, 
##     family = "binomial", data = data)
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 1.360e-01  1.950e-02   6.974 3.08e-12 ***
## Age                        -7.896e-05  2.439e-04  -0.324    0.746    
## Gender_Binary               7.015e-03  7.768e-03   0.903    0.367    
## `Average Sleep Time (hrs)`  7.875e-04  2.537e-03   0.310    0.756    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 414414  on 299999  degrees of freedom
## Residual deviance: 414413  on 299996  degrees of freedom
## AIC: 414421
## 
## Number of Fisher Scoring iterations: 3

Interpretation:

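  • None of the three predictors is statistically significant at the 0.05 level (all p-values exceed 0.36); only the intercept is.
  • Gender_Binary has a positive coefficient (about 0.007 on the log-odds scale), but the effect is small and not significant.
  • The Age and Average Sleep Time coefficients are likewise near zero, so neither adds predictive value for high social media use.
  • The model explains almost none of the variation: the residual deviance (414413) is only about 1 lower than the null deviance (414414) despite three added parameters. AIC (414421) is not interpretable in isolation; for comparison, an intercept-only model would have an AIC of about 414416 (null deviance plus 2), which is lower.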
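
The size of the deviance reduction can be quantified with a likelihood-ratio test of the fitted model against the intercept-only model. The sketch below works from the rounded deviances printed in the summary, so the result is approximate; with the fitted object available, `logit_model$null.deviance - logit_model$deviance` would give the unrounded statistic.

```r
# Deviances and degrees of freedom as printed in the model summary (rounded)
null_dev  <- 414414; null_df  <- 299999
resid_dev <- 414413; resid_df <- 299996

# Likelihood-ratio statistic: drop in deviance from adding the predictors
lr_stat <- null_dev - resid_dev   # about 1
lr_df   <- null_df - resid_df     # 3 added parameters

# Upper-tail chi-square probability; a large value means no detectable improvement
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared (for 0/1 outcomes, deviance equals -2 * log-likelihood)
mcfadden_r2 <- 1 - resid_dev / null_dev

c(lr_stat = lr_stat, df = lr_df, p_value = p_value, mcfadden_r2 = mcfadden_r2)
```

With these rounded inputs the p-value comes out near 0.8 and the pseudo R-squared is on the order of 1e-6, which agrees with the reading that the three predictors jointly add essentially nothing.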

Step 3: Confidence Interval for a Coefficient

We compute a 95% confidence interval for the Gender_Binary coefficient.

# 95% Confidence Interval for Gender_Binary
confint(logit_model, parm = "Gender_Binary")
##        2.5 %       97.5 % 
## -0.008209621  0.022241142

Interpretation:

  • The CI (on the log-odds scale) includes 0, so we cannot rule out that gender has no effect.
  • In practical terms: the data provide no evidence that gender meaningfully changes the odds of high social media use.
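
As a rough cross-check, a Wald interval can be rebuilt by hand from the estimate and standard error in the summary table. `confint()` on a `glm` uses profile likelihood rather than this normal approximation, so the two need not match exactly, though with a sample this large they should be very close. The inputs below are the rounded values printed in the Step 2 summary.

```r
# Estimate and standard error for Gender_Binary, copied from the summary output
beta_hat <- 7.015e-03
se_hat   <- 7.768e-03

# Wald 95% CI: estimate +/- z * SE, with z the 97.5th normal percentile (about 1.96)
z <- qnorm(0.975)
wald_ci <- c(lower = beta_hat - z * se_hat,
             upper = beta_hat + z * se_hat)
wald_ci

# TRUE when the interval straddles 0, consistent with the non-significant z-test
straddles_zero <- unname(wald_ci["lower"] < 0 && wald_ci["upper"] > 0)
straddles_zero
```

On the fitted object, `confint.default(logit_model, parm = "Gender_Binary")` returns the Wald interval directly.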

Step 4: Odds Ratios and Their Confidence Intervals

We interpret the coefficients by converting them to odds ratios.

# Odds Ratios
exp(coef(logit_model))
##                (Intercept)                        Age 
##                   1.145645                   0.999921 
##              Gender_Binary `Average Sleep Time (hrs)` 
##                   1.007039                   1.000788

# Confidence Intervals for Odds Ratios
exp(confint(logit_model))
##                                2.5 %   97.5 %
## (Intercept)                1.1026952 1.190272
## Age                        0.9994431 1.000399
## Gender_Binary              0.9918240 1.022490
## `Average Sleep Time (hrs)` 0.9958233 1.005777

Interpretation:

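  • Age (OR ≈ 0.9999): Each additional year of age multiplies the odds of high usage by a factor indistinguishable from 1; the CI (0.9994 to 1.0004) includes 1.
  • Gender_Binary (OR ≈ 1.007): Females have an estimated 0.7% higher odds, but the CI (0.992 to 1.022) includes 1, so the difference is not reliable.
  • Sleep Time (OR ≈ 1.0008): Each additional hour of sleep is associated with about 0.08% higher odds; the CI (0.996 to 1.006) includes 1, so the effect is neither statistically nor practically significant.
  • All odds ratios are very close to 1 and every predictor's CI includes 1 → predictors have minimal influence.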
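
Because an odds ratio is the multiplicative change in odds per one-unit increase, it can be restated as a percent change, and rescaled to a larger increment by multiplying the coefficient before exponentiating. A small sketch using the rounded coefficients from the summary (the 10-year increment and the short name `Sleep` are arbitrary choices made for illustration):

```r
# Coefficients on the log-odds scale, copied from the summary output
beta <- c(Age = -7.896e-05,
          Gender_Binary = 7.015e-03,
          Sleep = 7.875e-04)   # 'Sleep' is shorthand for Average Sleep Time (hrs)

# Odds ratio per one-unit increase, and the same as a percent change in odds
or_1unit   <- exp(beta)
pct_change <- 100 * (or_1unit - 1)

# Odds ratio for a 10-year age difference: exp(10 * beta), not 10 * exp(beta)
or_age_10yr <- exp(10 * beta[["Age"]])

round(cbind(or_1unit, pct_change), 4)
or_age_10yr
```

Even across a 10-year age gap the odds ratio stays within about 0.1% of 1, so rescaling does not change the conclusion.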

Final Insights and Next Steps

Key Findings:

  • The model fails to identify any strong predictor of high social media use.
  • Age, gender, and sleep time all have odds ratios within 1% of 1, so none shows a meaningful effect.
  • The CI for the gender coefficient includes 0 (equivalently, its odds-ratio CI includes 1) → no statistically reliable impact.

What this means:

  • In this dataset, knowing a user’s age, gender, or sleep time does not help predict whether they are a high social media user.
  • We need more relevant predictors (like device use, app engagement, or content preferences).

Next Steps:

  • Add behavioral or contextual variables to future models.
  • Try interaction terms or polynomial terms for nonlinear effects.
  • Evaluate predictive performance on held-out data using accuracy or ROC-AUC.
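
One way the last step could look in base R is sketched below. Everything here is simulated: the data frame `sim`, its columns, the coefficient 0.8, and the 70/30 split are assumptions made for illustration, not properties of the dataset. The AUC is computed with the rank-based (Mann-Whitney) formula so that no extra package is needed.

```r
set.seed(1)

# Simulated stand-in data: one predictor with a real effect on the outcome
n   <- 2000
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.8 * sim$x))

# Hold out 30% of rows for evaluation
test_idx <- sample(n, size = round(0.3 * n))
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]

# Fit on the training rows, predict probabilities on the held-out rows
fit  <- glm(y ~ x, data = train, family = "binomial")
prob <- predict(fit, newdata = test, type = "response")

# Accuracy at a 0.5 cutoff
accuracy <- mean((prob >= 0.5) == (test$y == 1))

# AUC via the Mann-Whitney rank formula:
# probability that a random positive is scored above a random negative
n_pos <- sum(test$y == 1)
n_neg <- sum(test$y == 0)
auc <- (sum(rank(prob)[test$y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

c(accuracy = accuracy, auc = auc)
```

An AUC near 0.5 means predictions are no better than chance, so it gives a scale-free benchmark against which any added behavioral predictors can be judged.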