Introduction

This analysis explores confidence intervals using the Social Media and Entertainment Dataset. Key objectives:

  • Select two numeric variables and create two calculated columns.
  • Analyze relationships through visualizations and correlation.
  • Compute confidence intervals to estimate population metrics.


Selecting and Creating Variables

We work with four variables, two taken from the dataset and two calculated from them:

  1. Daily Social Media Time (hrs) (response variable)
  2. Age (explanatory variable)
  3. Engagement Score (calculated as Daily Social Media Time (hrs) / Age)
  4. Adjusted Sleep Quality (calculated as Sleep Quality (scale 1-10) / Age)

# Load dplyr for mutate() and %>% (harmless if the tidyverse is already attached)
library(dplyr)

# Create new calculated columns
data <- data %>%
  mutate(
    Engagement_Score = `Daily Social Media Time (hrs)` / Age,
    Adjusted_Sleep_Quality = `Sleep Quality (scale 1-10)` / Age
  )

# Display first few rows with new columns
head(data)
## # A tibble: 6 × 42
##   `User ID`   Age Gender Country Daily Social Media Tim…¹ Daily Entertainment …²
##       <dbl> <dbl> <chr>  <chr>                      <dbl>                  <dbl>
## 1         1    32 Other  Germany                     4.35                   4.08
## 2         2    62 Other  India                       4.96                   4.21
## 3         3    51 Female USA                         6.78                   1.77
## 4         4    44 Female India                       5.06                   9.21
## 5         5    21 Other  Germany                     2.57                   1.3 
## 6         6    21 Male   Canada                      4.69                   1.7 
## # ℹ abbreviated names: ¹​`Daily Social Media Time (hrs)`,
## #   ²​`Daily Entertainment Time (hrs)`
## # ℹ 36 more variables: `Social Media Platforms Used` <dbl>,
## #   `Primary Platform` <chr>, `Daily Messaging Time (hrs)` <dbl>,
## #   `Daily Video Content Time (hrs)` <dbl>, `Daily Gaming Time (hrs)` <dbl>,
## #   Occupation <chr>, `Marital Status` <chr>, `Monthly Income (USD)` <dbl>,
## #   `Device Type` <chr>, `Internet Speed (Mbps)` <dbl>, …

Visualizing Relationships

1. Social Media Time vs Age

ggplot(data, aes(x = Age, y = `Daily Social Media Time (hrs)`)) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Age vs Daily Social Media Time",
    x = "Age",
    y = "Daily Social Media Time (hrs)"
  ) +
  theme_minimal()

Observations:

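  • The fitted line is essentially flat: the correlation reported in the Correlation Analysis section is about -0.002, so younger users do not spend noticeably more time on social media than older users.
  • Any downward trend with age is too small to be meaningful.
  • Points far from the line (for example, older users with high usage) could indicate unique habits and are worth checking individually.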
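
A visual impression of a trend can be checked numerically by comparing mean usage across age bands and reading the slope of the fitted line. The sketch below does this in base R on simulated data; the data frame `toy`, its column `social_hrs`, and the age breaks are illustrative assumptions, not values from the real dataset.

```r
# Toy data standing in for the real dataset (hypothetical values).
set.seed(42)
toy <- data.frame(
  Age = sample(13:70, 500, replace = TRUE),
  social_hrs = round(runif(500, min = 0.5, max = 8), 2)
)

# Bin ages into four bands, then compare mean usage per band.
toy$age_band <- cut(toy$Age, breaks = c(12, 25, 40, 55, 70))
band_means <- tapply(toy$social_hrs, toy$age_band, mean)
band_means

# Slope of the same linear fit that geom_smooth(method = "lm") draws.
# A slope close to zero means the line is nearly flat.
slope <- coef(lm(social_hrs ~ Age, data = toy))[["Age"]]
slope
```

If the band means are close to each other and the slope is near zero, a claim that one age group uses social media more than another is not supported.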

2. Engagement Score vs Adjusted Sleep Quality

ggplot(data, aes(x = Adjusted_Sleep_Quality, y = Engagement_Score)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  geom_smooth(method = "lm", color = "black") +
  labs(
    title = "Adjusted Sleep Quality vs Engagement Score",
    x = "Adjusted Sleep Quality",
    y = "Engagement Score"
  ) +
  theme_minimal()

Observations:

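  • Users with higher adjusted sleep quality tend to have higher engagement scores; the trend is positive, so it does not indicate that poor sleep goes with high engagement.
  • Both variables are divided by Age, so part of this positive trend may come from the shared denominator rather than from any link between sleep and social media use.
  • Some data points deviate significantly from the fitted line and require further investigation.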

Correlation Analysis

We compute Pearson correlation coefficients (the default for cor()) to measure the strength of each linear relationship.

cor_age_social <- cor(data$Age, data$`Daily Social Media Time (hrs)`, use = "complete.obs")
cor_sleep_engagement <- cor(data$Adjusted_Sleep_Quality, data$Engagement_Score, use = "complete.obs")

# Display results
cor_results <- tibble(
  Relationship = c("Age vs Social Media Time", "Adjusted Sleep Quality vs Engagement Score"),
  Correlation_Coefficient = c(cor_age_social, cor_sleep_engagement)
)

cor_results
## # A tibble: 2 × 2
##   Relationship                               Correlation_Coefficient
##   <chr>                                                        <dbl>
## 1 Age vs Social Media Time                                  -0.00223
## 2 Adjusted Sleep Quality vs Engagement Score                 0.430

Interpretation:

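  • Age vs Social Media Time: The correlation (about -0.002) is effectively zero, so age has essentially no linear relationship with social media use in this dataset.
  • Adjusted Sleep Quality vs Engagement Score: A moderate positive correlation (about 0.43) means that higher adjusted sleep quality goes with higher engagement scores. It does not show that more engagement is associated with worse sleep, and because both columns are divided by Age, the shared denominator can produce a positive correlation on its own.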
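
Because Engagement Score and Adjusted Sleep Quality are both divided by Age, it helps to see how much correlation a shared denominator can create by itself. The following sketch simulates three mutually independent variables; the names `age`, `hours`, `sleep` and their ranges are assumptions chosen for illustration, not the real data.

```r
set.seed(1)
n <- 10000

# Three mutually independent variables (hypothetical ranges).
age   <- runif(n, min = 13, max = 65)
hours <- runif(n, min = 0.5, max = 8)
sleep <- runif(n, min = 1, max = 10)

# Correlation of the raw numerators: close to zero by construction.
r_raw <- cor(hours, sleep)

# Correlation after dividing both by the same variable:
# clearly positive even though hours and sleep are unrelated.
r_ratio <- cor(hours / age, sleep / age)

c(raw = r_raw, ratio = r_ratio)
```

A useful check on the real data would be to compute the correlation between the unadjusted columns as well and compare it with the ratio-based value.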

Confidence Interval Calculation

We construct 95% bootstrap percentile confidence intervals for the mean of:

  1. Daily Social Media Time (hrs)
  2. Engagement Score

1. Confidence Interval for Daily Social Media Time

ci_social_media <- data %>%
  specify(response = `Daily Social Media Time (hrs)`) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_confidence_interval(level = 0.95, type = "percentile")

ci_social_media
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     4.25     4.26

Interpretation:

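  • The 95% confidence interval (4.25 to 4.26 hours) gives a range of plausible values for the true mean daily social media time in the population.
  • The interval is narrow because the mean is estimated precisely, which typically reflects a large sample. It describes uncertainty about the average, not how stable or predictable individual users' daily use is.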
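
The percentile bootstrap behind this interval can be sketched by hand in base R: resample the data with replacement, compute the mean of each resample, and take the 2.5% and 97.5% quantiles of those means. The vector `x` below is simulated and purely illustrative; it is not the real column.

```r
set.seed(123)
x <- runif(2000, min = 0.5, max = 8)  # hypothetical sample of daily hours

reps <- 1000

# Each replicate: resample x with replacement (same size) and take the mean.
boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile interval: middle 95% of the bootstrap means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

Bootstrap results change slightly from run to run, so calling set.seed() before resampling keeps a reported interval reproducible.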

2. Confidence Interval for Engagement Score

ci_engagement_score <- data %>%
  specify(response = Engagement_Score) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_confidence_interval(level = 0.95, type = "percentile")

ci_engagement_score
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.134    0.135

Interpretation:

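  • The confidence interval (0.134 to 0.135) estimates the population mean engagement score; it does not describe how engagement scores vary across users.
  • The narrow range reflects a precise estimate of the mean, not consistency among individual users. Assessing consistency would require a measure of spread such as the standard deviation.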
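
The width of a confidence interval for a mean depends on sample size as well as spread, so a narrow interval and widely varying users can coexist. The sketch below compares a small and a large simulated sample drawn from the same distribution, using the standard t-based interval; the helper `ci_width` and the data are hypothetical.

```r
set.seed(7)

# Width of a t-based 95% confidence interval for the mean.
ci_width <- function(x) {
  n <- length(x)
  2 * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
}

# Same distribution, very different sample sizes (hypothetical scores).
small <- runif(100, min = 0, max = 0.3)
large <- runif(100000, min = 0, max = 0.3)

data.frame(
  n        = c(length(small), length(large)),
  sd       = c(sd(small), sd(large)),          # spread across users: similar
  ci_width = c(ci_width(small), ci_width(large)) # precision of the mean: very different
)
```

The standard deviation stays roughly the same in both samples, while the interval shrinks as n grows.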

Final Insights and Next Steps

Key Findings:

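  1. Social Media Usage Is Nearly Unrelated to Age:
    • The correlation is about -0.002, so usage is roughly the same across ages and there is no meaningful decline as age increases.
  2. Adjusted Sleep Quality and Engagement Score Are Positively Correlated:
    • Higher adjusted sleep quality aligns with higher engagement scores (correlation of about 0.43), although both columns share Age as a denominator, which may account for much of this correlation.
  3. Confidence Intervals Are Narrow:
    • The means of social media time and engagement score are estimated precisely; this does not show that individual users behave consistently.
  4. Potential Data Biases Exist:
    • Some extreme users (both young and old) may skew the dataset.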

Next Steps:

  1. Investigate extreme outliers in engagement scores.
  2. Explore additional factors like device usage or screen time limits.
  3. Examine causal relationships between sleep quality and social media habits.
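
As a starting point for the first step, extreme engagement scores could be flagged with the common 1.5 × IQR rule. This is only one possible convention, and the vector `engagement` below is simulated with two planted extreme values for illustration.

```r
set.seed(99)

# Hypothetical engagement scores plus two planted extreme values.
engagement <- c(runif(500, min = 0.01, max = 0.3), 0.9, 1.2)

# Quartiles and the 1.5 * IQR fences.
q <- quantile(engagement, probs = c(0.25, 0.75))
fence <- 1.5 * (q[[2]] - q[[1]])

# Flag values outside [Q1 - fence, Q3 + fence].
is_outlier <- engagement < q[[1]] - fence | engagement > q[[2]] + fence
engagement[is_outlier]
```

Flagged rows are candidates for inspection, not automatic removal; because Engagement Score divides by Age, very young users with ordinary usage can also produce high scores.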