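This analysis explores ANOVA and regression modeling using the Social Media and Entertainment Dataset. Key objectives: test whether mean daily social media time differs across primary platforms (one-way ANOVA), and assess whether age predicts daily social media time (simple linear regression).
We’ll analyze two variables: Daily Social Media Time (hrs), the numeric response, and Primary Platform, the categorical grouping factor (five levels, given the 4 degrees of freedom in the tests that follow).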
Checking Assumptions
# Visualizing distribution
library(ggplot2)  # provides ggplot(), geom_density(), labs(), theme_minimal()
ggplot(data, aes(x = `Daily Social Media Time (hrs)`, fill = `Primary Platform`)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Social Media Time by Platform",
       x = "Daily Social Media Time (hrs)",
       y = "Density") +
  theme_minimal()
# Levene's Test for equal variance
library(car)  # provides leveneTest()
leveneTest(`Daily Social Media Time (hrs)` ~ `Primary Platform`, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value Pr(>F)
## group 4e+00  0.8405 0.4991
##       3e+05
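Insight: Levene's test gives F = 0.8405 with p = 0.4991, well above 0.05, so there is no evidence that the variance of daily social media time differs between platforms. The equal-variance assumption of ANOVA is reasonable here.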
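As a rough illustration of what the median-centred Levene test computes, the sketch below runs a one-way ANOVA on the absolute deviations from each group's median. The toy data frame and its column names (platform, hours, abs_dev) are made up for illustration and are not the dataset analysed here.

```r
# Toy data (hypothetical): three groups drawn from the same distribution
set.seed(42)
toy <- data.frame(
  platform = factor(rep(c("A", "B", "C"), each = 50)),
  hours    = runif(150, min = 0, max = 8)
)

# Absolute deviation of each observation from its own group's median
toy$abs_dev <- abs(toy$hours - ave(toy$hours, toy$platform, FUN = median))

# One-way ANOVA on those deviations; its F test is the median-centred Levene test
levene_by_hand <- anova(lm(abs_dev ~ platform, data = toy))
print(levene_by_hand)
```

If the car package is installed, leveneTest(hours ~ platform, data = toy) should report the same F value and p-value as this hand-rolled version.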
# ANOVA test
anova_result <- aov(`Daily Social Media Time (hrs)` ~ `Primary Platform`, data = data)
summary(anova_result)
##                       Df  Sum Sq Mean Sq F value Pr(>F)
## `Primary Platform` 4e+00      12   3.079   0.657  0.622
## Residuals          3e+05 1406935   4.690
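Conclusion: With F = 0.657 and p = 0.622, we fail to reject the null hypothesis of equal means. There is no statistically significant difference in mean daily social media time across primary platforms.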
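To make the ANOVA table easier to read, the sketch below recomputes the F statistic and its p-value from the printed mean squares, and adds an approximate eta-squared as a measure of effect size. The residual degrees of freedom (299995) are an assumption: the table prints them as 3e+05, and 299995 corresponds to 300,000 rows minus five platform groups. The between-group sum of squares is rebuilt from the rounded mean square, so the results are approximate.

```r
# Values copied from the ANOVA table
ms_between <- 3.079      # Mean Sq for `Primary Platform`
ms_within  <- 4.690      # Mean Sq for Residuals
df_between <- 4
df_within  <- 299995     # assumption: 300,000 rows minus 5 groups (printed as 3e+05)

# F is the ratio of between-group to within-group mean squares
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta-squared: share of total variation attributable to platform (approximate)
ss_between <- ms_between * df_between
ss_within  <- 1406935
eta_sq     <- ss_between / (ss_between + ss_within)

cat(sprintf("F = %.3f, p = %.3f, eta-squared = %.2e\n", f_stat, p_value, eta_sq))
```

The recomputed F and p should agree with the table to within rounding, and the tiny eta-squared suggests platform accounts for a negligible share of the variation in daily social media time.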
ggplot(data, aes(x = `Primary Platform`, y = `Daily Social Media Time (hrs)`, fill = `Primary Platform`)) +
  geom_boxplot() +
  labs(
    title = "Social Media Time by Platform",
    x = "Primary Platform",
    y = "Daily Social Media Time (hrs)"
  ) +
  theme_minimal()
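Insight: The boxplot is the visual counterpart of the ANOVA result, which found no statistically significant difference in mean daily social media time between platforms (p = 0.622).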
We’ll build a linear regression model to predict Daily Social Media Time based on Age.
# Linear regression model
lm_model <- lm(`Daily Social Media Time (hrs)` ~ Age, data = data)
# Model summary
summary(lm_model)
##
## Call:
## lm(formula = `Daily Social Media Time (hrs)` ~ Age, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.7630 -1.8779  0.0044  1.8732  3.7534
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.2671867  0.0108955 391.646   <2e-16 ***
## Age         -0.0003213  0.0002635  -1.219    0.223
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.166 on 299998 degrees of freedom
## Multiple R-squared: 4.955e-06, Adjusted R-squared: 1.622e-06
## F-statistic: 1.487 on 1 and 299998 DF, p-value: 0.2228
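Insight: The Age coefficient is -0.0003213 hours per year of age (p = 0.223), which is not statistically significant, and R-squared is 4.955e-06. Age explains essentially none of the variation in daily social media time.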
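The regression summary can be checked by hand. The sketch below recomputes the t statistic, its two-sided p-value, and the overall F statistic from the printed Age coefficient and standard error, then converts the slope into a difference in minutes per day. The 40-year age gap is a hypothetical value chosen only for illustration.

```r
# Values copied from the coefficient table
slope    <- -0.0003213   # Estimate for Age (hours per year of age)
slope_se <-  0.0002635   # Std. Error for Age
df_resid <- 299998       # residual degrees of freedom

# t statistic and two-sided p-value for the Age slope
t_stat  <- slope / slope_se
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

# With a single predictor, the overall F statistic equals t squared
f_stat <- t_stat^2

# Predicted difference across a hypothetical 40-year age gap, in minutes per day
age_gap  <- 40
diff_min <- slope * age_gap * 60

cat(sprintf("t = %.3f, p = %.3f, F = %.3f, difference = %.2f min/day\n",
            t_stat, p_value, f_stat, diff_min))
```

Even taken at face value, the slope implies a predicted difference of less than one minute per day across a 40-year age gap, so the effect is negligible in practical terms as well as statistically.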
ggplot(data, aes(x = Age, y = `Daily Social Media Time (hrs)`)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Regression Model: Age vs Social Media Time",
    x = "Age",
    y = "Daily Social Media Time (hrs)"
  ) +
  theme_minimal()
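Key Findings: Levene's test (p = 0.4991) supports the equal-variance assumption across platforms. The one-way ANOVA finds no significant difference in mean daily social media time between primary platforms (F = 0.657, p = 0.622). Age is not a significant predictor of daily social media time (slope = -0.0003213, p = 0.223), and the regression explains essentially none of the variation (R-squared = 4.955e-06).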