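This analysis explores Generalized Linear Models (GLMs) using the Social Media and Entertainment Dataset. Key objectives: build a binary high-usage response, fit a logistic regression on it, compute a 95% confidence interval for the gender coefficient, and interpret the coefficients as odds ratios.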
library(dplyr)  # provides %>% and mutate()

data <- data %>%
  mutate(
    # Binary response: 1 = High usage (4 or more hours per day), 0 = Low/Avg
    HighUsage = ifelse(`Daily Social Media Time (hrs)` >= 4, 1, 0),
    # Encode Gender as binary: 1 = Female, 0 = any other value
    Gender_Binary = ifelse(Gender == "Female", 1, 0)
  )
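As a quick sanity check on the encoding step, the sketch below applies the same `ifelse()` rules to a small toy data frame. The toy values and the column name `hours` are made up for illustration and are not taken from the dataset; the point is only to show how the 4-hour threshold and missing values behave.

```r
# Toy data (hypothetical values, not from the Social Media and Entertainment Dataset)
toy <- data.frame(
  hours  = c(1.5, 4.0, 6.2, NA),
  Gender = c("Female", "Male", "Female", "Male")
)

# Same rules as in the analysis: the threshold is inclusive (>= 4)
toy$HighUsage     <- ifelse(toy$hours >= 4, 1, 0)
toy$Gender_Binary <- ifelse(toy$Gender == "Female", 1, 0)

# A missing usage time stays missing rather than being coded as 0
print(toy)
```

Because `glm()` drops rows with missing values by default, it may be worth counting `NA`s in the new columns before fitting.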
We use three predictors: Age, Gender_Binary (1 = Female), and Average Sleep Time (hrs).
# Fit logistic regression model
logit_model <- glm(HighUsage ~ Age + Gender_Binary + `Average Sleep Time (hrs)`,
                   data = data, family = "binomial")
# Model summary
summary(logit_model)
##
## Call:
## glm(formula = HighUsage ~ Age + Gender_Binary + `Average Sleep Time (hrs)`,
## family = "binomial", data = data)
##
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)
## (Intercept)                 1.360e-01  1.950e-02   6.974 3.08e-12 ***
## Age                        -7.896e-05  2.439e-04  -0.324    0.746
## Gender_Binary               7.015e-03  7.768e-03   0.903    0.367
## `Average Sleep Time (hrs)`  7.875e-04  2.537e-03   0.310    0.756
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 414414 on 299999 degrees of freedom
## Residual deviance: 414413 on 299996 degrees of freedom
## AIC: 414421
##
## Number of Fisher Scoring iterations: 3
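The null and residual deviances in the summary differ by only about one unit. A rough way to judge whether the three predictors jointly improve on the intercept-only model is a likelihood-ratio (deviance) test. The sketch below uses the rounded deviances as printed, so the statistic is approximate; the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the printed summary (rounded)
null_dev  <- 414414; null_df  <- 299999
resid_dev <- 414413; resid_df <- 299996

lr_stat <- null_dev - resid_dev   # likelihood-ratio statistic
lr_df   <- null_df - resid_df     # number of predictors being tested

# Upper-tail chi-squared probability: about 0.80 for these rounded inputs
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```

With the fitted object available, `anova(logit_model, test = "Chisq")` reports the same kind of comparison term by term using unrounded deviances.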
We compute a 95% confidence interval for the Gender_Binary coefficient.
# 95% Confidence Interval for Gender_Binary
confint(logit_model, parm = "Gender_Binary")
##        2.5 %       97.5 %
## -0.008209621  0.022241142
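For intuition, the interval can be approximated by hand with the Wald formula, estimate ± z × standard error. This is a sketch built from the rounded values printed in the summary. Note that `confint()` on a `glm` object uses profile likelihood rather than the Wald formula, so agreement is expected only approximately, although with a sample this large the two are nearly identical.

```r
# Estimate and standard error for Gender_Binary, copied from the summary (rounded)
est <- 7.015e-03
se  <- 7.768e-03

z  <- qnorm(0.975)               # about 1.96 for a 95% interval
ci <- est + c(-1, 1) * z * se    # Wald interval on the log-odds scale

print(round(ci, 6))
```

The result agrees with the `confint()` output above to roughly five decimal places.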
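Interpretation: The interval (-0.0082, 0.0222) contains zero, so the data show no statistically significant association between gender and high usage at the 5% level. This agrees with the p-value of 0.367 for Gender_Binary in the model summary.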
We interpret the coefficients by converting them to odds ratios.
# Odds Ratios
exp(coef(logit_model))
##                (Intercept)                        Age
##                   1.145645                   0.999921
##              Gender_Binary `Average Sleep Time (hrs)`
##                   1.007039                   1.000788
# Confidence Intervals for Odds Ratios
exp(confint(logit_model))
##                                2.5 %   97.5 %
## (Intercept)                1.1026952 1.190272
## Age                        0.9994431 1.000399
## Gender_Binary              0.9918240 1.022490
## `Average Sleep Time (hrs)` 0.9958233 1.005777
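To make the odds-ratio scale concrete, the sketch below converts two of the printed coefficients by hand: the Gender_Binary coefficient into a percentage change in odds, and the intercept into a probability. The inputs are the rounded estimates from the summary, and the variable names are illustrative.

```r
# Coefficients copied from the model summary (rounded, log-odds scale)
beta_gender <- 7.015e-03
intercept   <- 1.360e-01

or_gender  <- exp(beta_gender)        # odds ratio, about 1.007
pct_change <- (or_gender - 1) * 100   # about a 0.7% increase in odds

# Probability of high usage when every predictor equals zero, about 0.534
p_zero <- plogis(intercept)

print(c(odds_ratio = or_gender, pct_change = pct_change, p_zero = p_zero))
```

An odds ratio of 1 means no change in odds, so values this close to 1 indicate very small effects per unit of the predictor.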
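Interpretation: The odds ratios for all three predictors are very close to 1, and each 95% interval includes 1. Holding the other predictors fixed, the estimated odds of high usage are about 0.7% higher for females (OR 1.007, CI 0.992 to 1.022) and change by less than 0.1% per additional year of age or hour of sleep. The intercept's odds of 1.146 apply when all predictors equal zero.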
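Key Findings: None of the three predictors is statistically significant (all p-values exceed 0.3), and the residual deviance (414413) is almost identical to the null deviance (414414). In this model, Age, Gender_Binary, and Average Sleep Time (hrs) add essentially no explanatory power for high social media usage.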