Introduction

This analysis explores ANOVA and Regression Modeling using the Social Media and Entertainment Dataset. Key objectives:

  • Identify a response variable and an explanatory variable.
  • Conduct an ANOVA test and analyze the results.
  • Build a linear regression model and evaluate its performance.
  • Provide actionable insights based on findings.


Step 1: Variable Selection

  • We select Daily Social Media Time (hrs) as our response variable because it’s a key measure of user engagement.

We’ll analyze it against two explanatory variables:

  • Explanatory Variable (Categorical): Primary Platform
  • Explanatory Variable (Continuous): Age

Step 2: ANOVA Test

Defining the Hypothesis

  • Null Hypothesis (H₀): Mean social media time is the same across platforms.
  • Alternative Hypothesis (H₁): Mean social media time differs for at least one platform.

Checking Assumptions

  1. Normality: Checked using visualizations.
  2. Equal Variances: Checked using Levene’s Test.

# Packages used below: ggplot2 for the plots, car for leveneTest()
library(ggplot2)
library(car)

# Visualizing distribution
ggplot(data, aes(x = `Daily Social Media Time (hrs)`, fill = `Primary Platform`)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Social Media Time by Platform",
       x = "Daily Social Media Time (hrs)",
       y = "Density") +
  theme_minimal()


# Levene's Test for equal variance
leveneTest(`Daily Social Media Time (hrs)` ~ `Primary Platform`, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value Pr(>F)
## group 4e+00  0.8405 0.4991
##       3e+05

Insight:

  • The density plot shows overlapping distributions across platforms. This indicates no obvious platform-based differences.
  • Levene’s Test (p-value = 0.4991) gives no evidence that the variances differ between groups, so the equal-variance assumption for ANOVA is reasonable.
  • The similarity suggests that platform choice may not strongly influence social media time.
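
As a rough illustration of what the median-centered Levene’s Test measures, the sketch below computes it by hand in base R: take each observation's absolute deviation from its group median, then run a one-way ANOVA on those deviations. The data frame `sim` and its columns `platform` and `hours` are hypothetical stand-ins generated at random, not the Social Media and Entertainment Dataset.

```r
set.seed(42)

# Hypothetical stand-in data: three made-up platforms, random usage hours
sim <- data.frame(
  platform = factor(rep(c("A", "B", "C"), each = 200)),
  hours    = runif(600, min = 0.5, max = 8)
)

# Absolute deviation of each observation from its own group's median
sim$abs_dev <- abs(sim$hours - ave(sim$hours, sim$platform, FUN = median))

# One-way ANOVA on the deviations: a large F would point to unequal spread
levene_fit <- anova(lm(abs_dev ~ platform, data = sim))
levene_f   <- levene_fit[["F value"]][1]
levene_p   <- levene_fit[["Pr(>F)"]][1]

print(levene_fit)
```

This is the same procedure that `leveneTest()` follows when `center = median`, so it can serve as a sanity check on the packaged result.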

Performing the ANOVA Test

# ANOVA test
anova_result <- aov(`Daily Social Media Time (hrs)` ~ `Primary Platform`, data = data)
summary(anova_result)
##                       Df  Sum Sq Mean Sq F value Pr(>F)
## `Primary Platform` 4e+00      12   3.079   0.657  0.622
## Residuals          3e+05 1406935   4.690

Conclusion:

  • The p-value (0.622) is greater than 0.05, meaning we fail to reject H₀.
  • There is no statistically significant evidence that mean daily social media time differs by platform.
  • This suggests that platform differences may not be a useful factor in predicting user engagement.
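
To make the size of the effect concrete, the following sketch recomputes the F statistic and p-value from the ANOVA table and adds an eta-squared effect size. The mean squares and the residual sum of squares are copied from the output above. The residual degrees of freedom are an assumption: the table prints them only as 3e+05, so the code assumes 300,000 observations and 5 platforms.

```r
# Values copied from the ANOVA table above
df_between <- 4
ms_between <- 3.079
ms_within  <- 4.690
ss_within  <- 1406935

# Assumption: 300,000 observations and 5 platforms (table shows only 3e+05)
df_within <- 300000 - 5

# F statistic and its upper-tail p-value
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta-squared: share of total variation attributable to platform
ss_between <- ms_between * df_between
eta_sq     <- ss_between / (ss_between + ss_within)

cat("F =", f_stat, " p =", p_value, " eta^2 =", eta_sq, "\n")
```

Under these assumptions the recomputed F and p-value agree with the table up to rounding, and platform accounts for less than 0.001% of the variation in daily social media time.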

Step 3: Visualizing ANOVA Results

ggplot(data, aes(x = `Primary Platform`, y = `Daily Social Media Time (hrs)`, fill = `Primary Platform`)) +
  geom_boxplot() +
  labs(
    title = "Social Media Time by Platform",
    x = "Primary Platform",
    y = "Daily Social Media Time (hrs)"
  ) +
  theme_minimal()

Insight:

  • The boxplot reveals only minor differences across platforms, too small to indicate distinct usage patterns.
  • This is consistent with the ANOVA conclusion that platform choice alone does not significantly affect social media time.

Step 4: Regression Model

Defining the Hypothesis

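We’ll build a linear regression model to predict Daily Social Media Time based on Age.

  • Null Hypothesis (H₀): The slope for Age is zero (age has no linear relationship with social media time).
  • Alternative Hypothesis (H₁): The slope for Age is not zero.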

Building the Model

# Linear regression model
lm_model <- lm(`Daily Social Media Time (hrs)` ~ Age, data = data)

# Model summary
summary(lm_model)
## 
## Call:
## lm(formula = `Daily Social Media Time (hrs)` ~ Age, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7630 -1.8779  0.0044  1.8732  3.7534 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.2671867  0.0108955 391.646   <2e-16 ***
## Age         -0.0003213  0.0002635  -1.219    0.223    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.166 on 299998 degrees of freedom
## Multiple R-squared:  4.955e-06,  Adjusted R-squared:  1.622e-06 
## F-statistic: 1.487 on 1 and 299998 DF,  p-value: 0.2228

Interpreting the Coefficients

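  • The intercept (4.27) is the predicted daily social media time, in hours, at Age = 0. That age lies outside any plausible user range, so the intercept is a baseline for the fitted line rather than a meaningful prediction.
  • The slope (-0.00032) means each additional year of age is associated with about 0.00032 fewer hours per day (roughly one second). With p = 0.223, this estimate is not statistically distinguishable from zero.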

Insight:

  • The slope is not statistically significant (p-value = 0.223), and the R-squared of 4.955e-06 means age explains essentially none of the variation in daily social media time.
  • Age is therefore not a useful predictor on its own, which suggests that other factors play a larger role in driving engagement.
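
The quantities in the model summary can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers copied from the output above (estimate, standard error, residual degrees of freedom, F statistic) and derives the t value, p-value, a 95% confidence interval for the slope, the slope expressed in seconds per year of age, and R-squared. The variable names are illustrative.

```r
# Values copied from the regression summary above
estimate  <- -0.0003213
std_error <- 0.0002635
df_resid  <- 299998
f_stat    <- 1.487

# t statistic and two-sided p-value for the Age slope
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df_resid)

# 95% confidence interval for the slope
ci <- estimate + c(-1, 1) * qt(0.975, df_resid) * std_error

# Slope converted from hours to seconds per additional year of age
seconds_per_year <- estimate * 3600

# With one predictor, R-squared follows from F and the residual df
r_squared <- f_stat / (f_stat + df_resid)

cat("t =", t_value, " p =", p_value, "\n")
cat("95% CI:", ci, "\n")
cat("seconds per year:", seconds_per_year, " R^2 =", r_squared, "\n")
```

The confidence interval spans zero, which is another way of seeing why the null hypothesis of no age effect cannot be rejected.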

Step 5: Visualizing Regression Model

ggplot(data, aes(x = Age, y = `Daily Social Media Time (hrs)`)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Regression Model: Age vs Social Media Time",
    x = "Age",
    y = "Daily Social Media Time (hrs)"
  ) +
  theme_minimal()

Insight:

  • The regression line is almost flat, consistent with the near-zero slope estimate: age has minimal influence on daily social media time.
  • This suggests that factors like content preference, social trends, or platform features may be more important.

Final Insights and Next Steps

Key Findings:

  1. ANOVA Results:
    • Platform choice does not significantly impact social media time.
    • This suggests that engagement strategies should not solely focus on platform differences.
  2. Regression Results:
    • The estimated effect of age on social media usage is negative but negligible and not statistically significant (p = 0.223).
    • This suggests that targeting strategies based on age alone may be ineffective.
  3. Broader Implications:
    • Since neither platform nor age strongly influences social media behavior, user engagement may be driven by other behavioral or contextual factors.

Next Steps:

  1. Explore alternative predictors like content type, device usage, or engagement patterns.
  2. Fit a multiple regression model that combines several predictors to improve prediction.
  3. Investigate user habits over time to identify trends that impact engagement.
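
One possible shape for the multiple regression step is sketched below. It runs on simulated data, not the real dataset: the data frame `sim`, the columns `age`, `platform`, `device_hours`, and `hours`, and the way the response is generated are all hypothetical. The point is only to show how a single-predictor model can be compared against a richer one with a nested F-test.

```r
set.seed(1)
n <- 500

# Hypothetical stand-in data (not the real dataset)
sim <- data.frame(
  age          = sample(13:65, n, replace = TRUE),
  platform     = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  device_hours = runif(n, 0, 6)
)

# Made-up response: driven by device_hours plus noise, not by age or platform
sim$hours <- 2 + 0.5 * sim$device_hours + rnorm(n, sd = 1)

# Single-predictor baseline versus a model with several predictors
base_fit <- lm(hours ~ age, data = sim)
full_fit <- lm(hours ~ age + platform + device_hours, data = sim)

# Nested F-test: do the extra predictors improve the fit?
comparison <- anova(base_fit, full_fit)
print(comparison)

# Adjusted R-squared penalizes predictors that add no explanatory power
cat("Adjusted R^2, base:", summary(base_fit)$adj.r.squared, "\n")
cat("Adjusted R^2, full:", summary(full_fit)$adj.r.squared, "\n")
```

On the real data, the same pattern would apply with the dataset's own column names in place of the hypothetical ones.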