This analysis builds on Generalized Linear Models (GLMs) using the Social Media and Entertainment Dataset. This time, we explore how social media use, sleep quality, and platform choice affect the likelihood of being a short sleeper. We treat platform as a categorical variable and examine interaction effects and model accuracy.
We define a binary variable:
# Create binary response: 1 = Short Sleeper (< 6 hrs), 0 = Otherwise
data <- data %>%
mutate(ShortSleeper = ifelse(`Average Sleep Time (hrs)` < 6, 1, 0))
# Convert platform to factor to include all platforms
data <- data %>%
mutate(Primary_Platform_Factor = as.factor(`Primary Platform`))
We use three predictors:
logit_model2 <- glm(
ShortSleeper ~ `Daily Social Media Time (hrs)` +
`Primary Platform` + `Sleep Quality (scale 1-10)`,
data = data, family = "binomial"
)
summary(logit_model2)
##
## Call:
## glm(formula = ShortSleeper ~ `Daily Social Media Time (hrs)` +
## `Primary Platform` + `Sleep Quality (scale 1-10)`, family = "binomial",
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.416281 0.013252 -31.414 <2e-16 ***
## `Daily Social Media Time (hrs)` -0.000665 0.001721 -0.386 0.699
## `Primary Platform`Instagram 0.005336 0.011808 0.452 0.651
## `Primary Platform`TikTok -0.003611 0.011785 -0.306 0.759
## `Primary Platform`Twitter 0.010488 0.011777 0.891 0.373
## `Primary Platform`YouTube 0.015220 0.011800 1.290 0.197
## `Sleep Quality (scale 1-10)` 0.001234 0.001445 0.854 0.393
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 403691 on 299999 degrees of freedom
## Residual deviance: 403687 on 299993 degrees of freedom
## AIC: 403701
##
## Number of Fisher Scoring iterations: 4
We compute a 95% confidence interval for the Daily Social Media Time (hrs) coefficient.
confint(logit_model2, parm = "`Daily Social Media Time (hrs)`")
## 2.5 % 97.5 %
## -0.004038635 0.002708608
exp(coef(logit_model2))
## (Intercept) `Daily Social Media Time (hrs)`
## 0.6594951 0.9993352
## `Primary Platform`Instagram `Primary Platform`TikTok
## 1.0053505 0.9963959
## `Primary Platform`Twitter `Primary Platform`YouTube
## 1.0105427 1.0153361
## `Sleep Quality (scale 1-10)`
## 1.0012346
exp(confint(logit_model2))
## 2.5 % 97.5 %
## (Intercept) 0.6425823 0.6768437
## `Daily Social Media Time (hrs)` 0.9959695 1.0027123
## `Primary Platform`Instagram 0.9823510 1.0288885
## `Primary Platform`TikTok 0.9736454 1.0196781
## `Primary Platform`Twitter 0.9874840 1.0341402
## `Primary Platform`YouTube 0.9921230 1.0390926
## `Sleep Quality (scale 1-10)` 0.9984025 1.0040747
Interpretation:
We interpret the coefficients by converting them to odds ratios.
exp(coef(logit_model2))
## (Intercept) `Daily Social Media Time (hrs)`
## 0.6594951 0.9993352
## `Primary Platform`Instagram `Primary Platform`TikTok
## 1.0053505 0.9963959
## `Primary Platform`Twitter `Primary Platform`YouTube
## 1.0105427 1.0153361
## `Sleep Quality (scale 1-10)`
## 1.0012346
exp(confint(logit_model2))
## 2.5 % 97.5 %
## (Intercept) 0.6425823 0.6768437
## `Daily Social Media Time (hrs)` 0.9959695 1.0027123
## `Primary Platform`Instagram 0.9823510 1.0288885
## `Primary Platform`TikTok 0.9736454 1.0196781
## `Primary Platform`Twitter 0.9874840 1.0341402
## `Primary Platform`YouTube 0.9921230 1.0390926
## `Sleep Quality (scale 1-10)` 0.9984025 1.0040747
Interpretation:
logit_model_interact <- glm(
ShortSleeper ~ `Daily Social Media Time (hrs)` +
`Primary Platform` * `Sleep Quality (scale 1-10)`,
data = data, family = "binomial"
)
summary(logit_model_interact)
##
## Call:
## glm(formula = ShortSleeper ~ `Daily Social Media Time (hrs)` +
## `Primary Platform` * `Sleep Quality (scale 1-10)`, family = "binomial",
## data = data)
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.3802835 0.0196054
## `Daily Social Media Time (hrs)` -0.0006523 0.0017213
## `Primary Platform`Instagram -0.0528729 0.0257994
## `Primary Platform`TikTok -0.0487723 0.0257098
## `Primary Platform`Twitter -0.0242428 0.0257011
## `Primary Platform`YouTube -0.0270200 0.0257595
## `Sleep Quality (scale 1-10)` -0.0059761 0.0032375
## `Primary Platform`Instagram:`Sleep Quality (scale 1-10)` 0.0116256 0.0045802
## `Primary Platform`TikTok:`Sleep Quality (scale 1-10)` 0.0090350 0.0045753
## `Primary Platform`Twitter:`Sleep Quality (scale 1-10)` 0.0069468 0.0045620
## `Primary Platform`YouTube:`Sleep Quality (scale 1-10)` 0.0084449 0.0045764
## z value Pr(>|z|)
## (Intercept) -19.397 <2e-16 ***
## `Daily Social Media Time (hrs)` -0.379 0.7047
## `Primary Platform`Instagram -2.049 0.0404 *
## `Primary Platform`TikTok -1.897 0.0578 .
## `Primary Platform`Twitter -0.943 0.3455
## `Primary Platform`YouTube -1.049 0.2942
## `Sleep Quality (scale 1-10)` -1.846 0.0649 .
## `Primary Platform`Instagram:`Sleep Quality (scale 1-10)` 2.538 0.0111 *
## `Primary Platform`TikTok:`Sleep Quality (scale 1-10)` 1.975 0.0483 *
## `Primary Platform`Twitter:`Sleep Quality (scale 1-10)` 1.523 0.1278
## `Primary Platform`YouTube:`Sleep Quality (scale 1-10)` 1.845 0.0650 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 403691 on 299999 degrees of freedom
## Residual deviance: 403679 on 299989 degrees of freedom
## AIC: 403701
##
## Number of Fisher Scoring iterations: 4
Interpretation:
probs <- predict(logit_model_interact, type = "response")
roc_obj <- roc(data$ShortSleeper, probs)
plot(roc_obj, col = "blue", main = "ROC Curve - Interaction Model")
auc(roc_obj)
## Area under the curve: 0.5039
Interpretation:
predicted <- ifelse(probs > 0.5, 1, 0)
confusionMatrix(factor(predicted), factor(data$ShortSleeper))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 180143 119857
## 1 0 0
##
## Accuracy : 0.6005
## 95% CI : (0.5987, 0.6022)
## No Information Rate : 0.6005
## P-Value [Acc > NIR] : 0.5008
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.6005
## Neg Pred Value : NaN
## Prevalence : 0.6005
## Detection Rate : 0.6005
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 0
##
Interpretation:
Key Findings: