1. Description of the data

This study investigates whether high levels of social media use are associated with poor academic performance, reduced participation in physical activities, diminished social interaction, and increased levels of stress, anxiety, addiction, and depression.

(a) Age demographics by gender

(b) Distribution of sleep hours data

(c) Distribution of data on time spent on screen

(d) Distribution of daily social media usage data

(e) Distribution of platform usage data

(f) Distribution of academic performance data

(g) Distribution of data on time spent on physical activities

(h) Distribution of data on social interaction levels

(i) Distribution of data on stress levels

(j) Distribution of data on anxiety levels

(k) Distribution of data on addiction levels

(l) Distribution of data on depression status

2. Mental health scores by gender

This section examines whether mental‑health outcomes differ between male and female students by comparing their stress, anxiety, and social‑media addiction levels.

Stress level per gender

Anxiety level by gender

Addiction level by gender

3. Assessment of lifestyle patterns across depression categories

This section examines the associations between lifestyle habits (sleep duration, screen time before sleep, and daily social media usage) and depression status.

Sleep duration by depression status

Screen time before sleep by depression status

Daily social media hours by depression status

4. Analysis of mental-health outcomes across social media platforms

This section compares average stress, anxiety, and addiction levels across social media platform groups.

Average stress level by platform

Average anxiety level by platform usage

Average addiction level by platform usage

5. Lifestyle analysis by social media platform

Lifestyle behaviors, including daily social media use, sleep, and screen time before sleep, are evaluated based on platform use.

Average daily social media hours by platform usage

Average sleep hours by platform usage

Average screen time hours by platform usage

6. Analysis of academic outcomes based on platform usage

This section evaluates whether students’ academic performance differs according to the social media platforms they use. (Academics and platform)

7. Physical activity patterns by platform type

This section investigates whether students’ engagement in physical activity differs across social media platform groups. (Physical and platform)

8. Comparative analysis of mental-health indicators across genders

Mental health scores, including stress, anxiety, and addiction, are compared across genders.

Stress level by gender

Anxiety level by gender

Addiction level by gender

9. Analysis of mental health metrics by Social Interaction Levels

This section presents an analysis of social interaction, focusing on average levels of stress, anxiety, and addiction categorized by frequency of social interaction.

Average stress level by social interaction levels

Average anxiety level by social interaction levels

Average addiction level by social interaction levels

10. Lifestyle patterns by depression status

The analysis compares depressed and non-depressed groups across key lifestyle variables, including daily social media use, sleep duration, and screen time before sleep.

Average daily social media hours by depression label

Average sleep hours by depression label

Average screen time before sleep by depression

11. Mental health metrics by depression status

Average stress levels by depression label

Average anxiety levels by depression label

Average addiction levels by depression label

12. Academic performance by depression label

This analysis compared the average GPA of students grouped by depression label to examine whether academic performance varies across these categories. (Average GPA by depression status)

13. Average physical activity hours by depression label

This analysis examined the relationship between depression and physical activity by assessing how activity levels vary across individuals with different depression labels. (Average physical activity hours)

14. Correlation and heatmap

This analysis explored how key behavioral and mental‑health variables relate to one another by examining correlations among screen time, age, anxiety, academic performance, depression, stress, sleep duration, and addiction levels. (Relationship between variables)

15. Conclusion

This analysis evaluated a range of behavioral, lifestyle, and platform‑related factors to determine which variables meaningfully predict depression by comparing each factor’s relationship to depression status. (prediction)

(a) Age demographics by gender

ggplot(data=social_media_data)+
  geom_bar(mapping=aes(x=age, fill=gender), position="dodge")+
    labs(title = "Participants’ age demographics by gender")

(b) Distribution of sleep hours data

ggplot(social_media_data, aes(x = sleep_hours)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  # linewidth replaces the size aesthetic for lines (deprecated in ggplot2 3.4.0)
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Distribution of Sleep Hours",
    x = "Sleep Hours",
    y = "Count"
  ) +
  theme_minimal()

(c) Distribution of data on time spent on screen

ggplot(social_media_data, aes(x = screen_time_before_sleep)) +
  geom_histogram(aes(y = after_stat(count / sum(count)) * 100),
                 binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(aes(y = after_stat(count / sum(count)) * 100),
                binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Screen Time Before Sleep",
    x = "Screen time in hours",
    y = "Percentage (%)"
  ) +
  theme_minimal()

(d) Distribution of daily social media usage data

ggplot(social_media_data, aes(x = daily_social_media_hours)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Daily social media hours",
    x = "Number of hours",
    y = "Count"
  ) +
  theme_minimal()

(e) Distribution of platform usage data

platform_summary <- social_media_data %>%
  count(platform_usage) %>%
  mutate(percent = n / sum(n) * 100,
         label = paste0(round(percent, 1), "%"))

ggplot(platform_summary, aes(x = "", y = percent, fill = platform_usage)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            color = "white",
            size = 4) +
  labs(
    title = "Platform Usage",
    y = "Percentage",
    x = ""
  ) +
  theme_void()

(f) Distribution of academic performance data

ggplot(social_media_data, aes(x = academic_performance)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "academic performance",
    x = "GPA",
    y = "# of students"
  ) +
  theme_minimal()

(g) Distribution of data on time spent on physical activities

ggplot(social_media_data, aes(x = physical_activity)) +
  geom_histogram(
                 binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Physical activity",
    x = "Time in hours",
    y = "# of students"
  ) +
  theme_minimal()

(h) Distribution of data on social interaction levels

interaction_summary <- social_media_data %>%
  count(social_interaction_level) %>%
  mutate(percent = n / sum(n) * 100,
         label = paste0(round(percent, 1), "%"))

ggplot(interaction_summary, aes(x = "", y = percent, fill = social_interaction_level)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            color = "white",
            size = 4) +
  labs(
    title = "Social interaction level",
    y = "Percentage",
    x = ""
  ) +
  theme_void()

(i) Distribution of data on stress levels

ggplot(social_media_data, aes(x = stress_level)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "stress level",
    x = "stress levels",
    y = "# of students"
  ) +
  theme_minimal()

(j) Distribution of data on anxiety levels

ggplot(social_media_data, aes(x =anxiety_level)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Anxiety level",
    x = "anxiety levels",
    y = "# of students"
  ) +
  theme_minimal()

(k) Distribution of data on addiction levels

ggplot(social_media_data, aes(x = addiction_level)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  geom_freqpoly(binwidth = 1, color = "red", linewidth = 1) +
  labs(
    title = "Addiction level",
    x = "addiction levels",
    y = "# of students"
  ) +
  theme_minimal()

(l) Distribution of data on depression status

depression_summary <- social_media_data %>%
  count(depression_label) %>%
  mutate(percent = n / sum(n) * 100,
         label = paste0(round(percent, 1), "%"))

ggplot(depression_summary, aes(x = "", y = percent, fill = depression_label)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            color = "white",
            size = 5) +
  labs(
    title = "Depression Label",
    x = "",
    y = "Percentage"
  ) +
  theme_void()

2. Mental health scores by gender

Stress level per gender

ggplot(social_media_data, aes(x =gender, y = stress_level, fill = gender)) +
  geom_boxplot() +
  labs(
    x = "gender",
    y = "stress level",
    title = "Stress level by gender"
  ) +
  theme_minimal()

  • Both genders show nearly identical median stress levels, each sitting at about 5 out of 10, indicating that their overall stress experience is essentially the same.

Anxiety level by gender

ggplot(social_media_data, aes(x =gender, y =anxiety_level, fill = gender)) +
  geom_boxplot() +
  labs(
    x = "gender",
    y = "anxiety level",
    title = "Anxiety level by gender"
  ) +
  theme_minimal()

  • Both groups show comparable anxiety levels, each averaging around 6 out of 10, indicating no meaningful difference between them.

Addiction level by gender

ggplot(social_media_data, aes(x =gender, y = addiction_level, fill = gender)) +
  geom_boxplot() +
  labs(
    x = "gender",
    y = "addiction level",
    title = "Addiction level by gender"
  ) +
  theme_minimal()

  • Both groups show nearly the same addiction score, hovering just above 5 out of 10, indicating that males and females exhibit comparable addiction levels in the dataset. In other words, there’s no meaningful difference in how addiction is distributed between the two genders.

3. Assessment of lifestyle patterns across depression categories

Sleep duration by depression status

social_media_data$depression <- factor(social_media_data$depression_label, labels = c("No Depression", "Depression"))
ggplot(social_media_data, aes(x = depression, y = sleep_hours, fill = depression)) +
  geom_boxplot() +
  labs(
    x = "Depression Status",
    y = "Sleep Hours",
    title = "Sleep Duration by Depression Status"
  ) +
  theme_minimal()

  • The median sleep duration is 6.5 hours for the “No depression” group and 4.5 hours for the “depression” group. This suggests a potential association between depression and shorter sleep duration.

  • The middle 50% of the two groups (the boxes) do not overlap, which points to a substantial difference in sleep duration; the t-test below checks this formally.

  • The “No depression” group has a larger interquartile range than the “depression” group, indicating greater variability in sleep duration among those without depression.

  • These findings support the hypothesis that individuals with depression tend to sleep less than those without depression. Welch's two-sample t-test confirms this:

t.test(sleep_hours ~ depression, data = social_media_data)
## 
##  Welch Two Sample t-test
## 
## data:  sleep_hours by depression
## t = 15.745, df = 40.991, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group No Depression and group Depression is not equal to 0
## 95 percent confidence interval:
##  1.510624 1.955162
## sample estimates:
## mean in group No Depression    mean in group Depression 
##                    6.494183                    4.761290

Since the p-value (< 2.2e-16) is below 0.05, the two groups differ significantly: mean sleep duration is about 1.7 hours shorter in the depression group (4.76 vs. 6.49 hours).
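The p-value indicates that the difference is unlikely to be zero, but not how large it is relative to the spread of the data. A minimal sketch of a standardized effect size (Cohen's d with a pooled standard deviation) is shown below. The helper name `cohens_d` is hypothetical and not part of the original analysis; the column names are the ones used above.

```r
# Hypothetical helper (not part of the original analysis): Cohen's d for
# two independent groups, using the pooled standard deviation.
cohens_d <- function(x, g) {
  g <- droplevels(as.factor(g))
  stopifnot(nlevels(g) == 2)
  groups <- split(x, g)
  a <- groups[[1]][!is.na(groups[[1]])]
  b <- groups[[2]][!is.na(groups[[2]])]
  # Pooled SD weights each group's variance by its degrees of freedom
  pooled_sd <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                      (length(a) + length(b) - 2))
  # Sign convention: first factor level minus second factor level
  (mean(a) - mean(b)) / pooled_sd
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(cohens_d(social_media_data$sleep_hours, social_media_data$depression))
}
```

By a common rule of thumb, an absolute value of d around 0.8 or more is read as a large effect, although such cut-offs are conventions rather than fixed rules.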

Screen time before sleep by depression status

ggplot(social_media_data, aes(x = depression, y = screen_time_before_sleep, fill = depression)) +
  geom_boxplot() +
  labs(
    x = "Depression Status",
    y = "screen time before sleep",
    title = "Screen time before sleep by Depression Status"
  ) +
  theme_minimal()

  • The plot indicates that individuals with depression exhibit slightly higher and more variable screen time before sleep compared to those without depression.
  • The differing medians and the broader distribution for the depressed group suggest less consistent pre-sleep screen time habits.
  • These findings support further investigation of sleep behavior as a potential mental health factor.

wilcox.test(screen_time_before_sleep ~ depression, data = social_media_data)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  screen_time_before_sleep by depression
## W = 19199, p-value = 0.5707
## alternative hypothesis: true location shift is not equal to 0
  • Because the p-value (0.5707) is greater than 0.05, the difference between the two groups is not statistically significant. The data provide no evidence that screen time before sleep differs between participants with and without depression.

Daily social media hours by depression status

ggplot(social_media_data, aes(x = depression, y =daily_social_media_hours, fill = depression)) +
  geom_boxplot() +
  labs(
    x = "Depression Status",
    y = "daily social media hours",
    title = "Daily social media hours by Depression Status"
  ) +
  theme_minimal()

  • Individuals with depression show higher median social media use (6 hours) than those without (4.5 hours).
  • The no‑depression group has a wider IQR, indicating more varied habits, while the depression group is more consistent.
  • The higher median suggests a difference between groups, although this has not been tested formally.
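One way to check the difference in daily social media hours formally is the same Wilcoxon rank-sum test used for screen time above. The sketch below is illustrative only: the wrapper `compare_two_groups` is a hypothetical name, and it assumes the `depression` factor created earlier in this section is present in the data.

```r
# Hypothetical wrapper (not in the original report): Wilcoxon rank-sum test
# of a numeric column between the two levels of a grouping column.
compare_two_groups <- function(data, outcome, group) {
  # reformulate() builds the formula outcome ~ group from column names;
  # exact = FALSE requests the normal approximation, which tolerates ties
  wilcox.test(reformulate(group, response = outcome), data = data,
              exact = FALSE)
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(compare_two_groups(social_media_data,
                           "daily_social_media_hours", "depression"))
}
```

A p-value below 0.05 would support the difference suggested by the boxplot; a larger one would mean the apparent gap cannot be distinguished from sampling noise.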

4. Analysis of mental‑health outcomes across social media platforms

Average stress level by platform

Stress levels are analyzed in relation to the use of specific social media platforms.

stress_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_stress = mean(stress_level, na.rm = TRUE))
stress_mean
## # A tibble: 3 × 2
##   platform_usage avg_stress
##   <chr>               <dbl>
## 1 Both                 5.55
## 2 Instagram            5.50
## 3 TikTok               5.29
ggplot(data = stress_mean) +
  geom_bar(
    aes(x = platform_usage, y = avg_stress, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_stress,
      label = round(avg_stress, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social Media Platform",
    y = "Average stress level",
    title = "Average Stress level by Platform Usage"
  )

  • The “Both” group shows the highest average stress levels, followed by the “Instagram” group, while the “TikTok” group has the lowest average stress.

Average anxiety level by platform usage

Anxiety levels are analyzed in relation to the use of specific social media platforms.

anxiety_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_anxiety = mean(anxiety_level, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage avg_anxiety
##   <chr>                <dbl>
## 1 Both                  5.49
## 2 Instagram             5.67
## 3 TikTok                5.75
ggplot(data = anxiety_mean) +
  geom_bar(
    aes(x = platform_usage, y = avg_anxiety, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_anxiety,
      label = round(avg_anxiety, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  labs(
    x = "Social media platform",
    y = "Average anxiety level",
    title = "Average anxiety level by Platform Usage"
  )

  • TikTok users show the highest average anxiety level (5.75).
  • Instagram users (5.67) and those who use both platforms (5.49) score slightly lower, so all three groups fall within about 0.3 points of one another.

Average addiction level by platform usage

Addiction levels are analyzed in relation to the use of specific social media platforms.

addiction_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_addiction = mean(addiction_level, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage avg_addiction
##   <chr>                  <dbl>
## 1 Both                    5.50
## 2 Instagram               5.58
## 3 TikTok                  5.62
ggplot(data = addiction_mean) +
  geom_bar(
    aes(x = platform_usage, y = avg_addiction, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_addiction,
      label = round(avg_addiction, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social media platform",
    y = "Average addiction level",
    title = "Average addiction level by Platform Usage"
  )

  • The average addiction levels are remarkably similar across all three categories (5.50 to 5.62).
  • This suggests that the choice of social media platform is not meaningfully associated with the average reported level of addiction.
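The platform comparisons in this section rest on group means alone. A rank-based test across the three platform groups is one way to check whether the small gaps are distinguishable from noise. The sketch below uses the Kruskal-Wallis test from base R; the helper `platform_tests` is a hypothetical name, and the choice of test is an assumption rather than the method of the original analysis.

```r
# Hypothetical helper: Kruskal-Wallis p-value for each outcome column
# across the levels of a grouping column (platform_usage by default).
platform_tests <- function(data, outcomes, group = "platform_usage") {
  sapply(outcomes, function(col) {
    kruskal.test(reformulate(group, response = col), data = data)$p.value
  })
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(platform_tests(social_media_data,
                       c("stress_level", "anxiety_level", "addiction_level")))
}
```

The result is a named vector with one p-value per outcome, which makes it easy to see at a glance whether any of the three measures varies by platform.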

5. Lifestyle analysis by social media platform

Average daily social media hours by platform usage

daily_usage_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(daily_usage_mean = mean(daily_social_media_hours, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage daily_usage_mean
##   <chr>                     <dbl>
## 1 Both                       4.52
## 2 Instagram                  4.56
## 3 TikTok                     4.53
ggplot(data = daily_usage_mean) +
  geom_bar(
    aes(x = platform_usage, y = daily_usage_mean, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = daily_usage_mean,
      label = round(daily_usage_mean, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social media platform",
    y = "Average daily social media hours",
    title = "Average daily social media hours by Platform Usage"
  )

  • The average usage across all three categories is nearly identical, at approximately 4.5 hours per day.

Average sleep hours by platform usage

sleep_hours_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(sleep_hours_mean = mean(sleep_hours, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage sleep_hours_mean
##   <chr>                     <dbl>
## 1 Both                       6.46
## 2 Instagram                  6.44
## 3 TikTok                     6.45
ggplot(data = sleep_hours_mean) +
  geom_bar(
    aes(x = platform_usage, y = sleep_hours_mean, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = sleep_hours_mean,
      label = round(sleep_hours_mean, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social media platform",
    y = "Average sleep hours",
    title = "Average sleep hours by Platform Usage"
  )

  • All three groups average about 6.4 to 6.5 hours of sleep; users of both Instagram and TikTok average 6.46 hours.
  • Overall, there is no meaningful difference in average sleep duration across the social media platform categories.

Average screen time hours by platform usage

screen_time_hours_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_screen_time = mean(screen_time_before_sleep, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage avg_screen_time
##   <chr>                    <dbl>
## 1 Both                      1.75
## 2 Instagram                 1.71
## 3 TikTok                    1.76
ggplot(data = screen_time_hours_mean) +
  geom_bar(
    aes(x = platform_usage, y =avg_screen_time , fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_screen_time,
      label = round(avg_screen_time, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social media platform",
    y = "Average screen time before sleep (hours)",
    title = "Average screen time before sleep by Platform Usage"
  )

  • All three categories (Both, Instagram, and TikTok) show very similar average screen time before sleep, ranging from approximately 1.71 to 1.76 hours.

6. Analysis of academic outcomes based on platform usage

The relationship between academic performance and platform usage is assessed.

GPA_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_GPA = mean(academic_performance, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage avg_GPA
##   <chr>            <dbl>
## 1 Both              2.98
## 2 Instagram         3.00
## 3 TikTok            3.00
ggplot(data =GPA_mean) +
  geom_bar(
    aes(x = platform_usage, y =avg_GPA, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_GPA,
      label = round(avg_GPA, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "Social media platform",
    y = "Average GPA",
    title = "Average GPA by Platform Usage"
  )
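The three GPA means printed above lie within 0.02 points of one another. To judge whether gaps of this size exceed sampling noise, each group mean can be reported with a confidence interval. The sketch below is a minimal illustration: `mean_ci_by_group` is a hypothetical helper, it uses base R rather than the dplyr pipeline of this report, and it assumes a t-based interval is appropriate.

```r
# Hypothetical helper: group size, mean, and t-based confidence interval
# for a numeric column within each level of a grouping column.
mean_ci_by_group <- function(data, group, value, level = 0.95) {
  parts <- split(data[[value]], data[[group]])
  rows <- lapply(names(parts), function(g) {
    x <- parts[[g]][!is.na(parts[[g]])]
    se <- sd(x) / sqrt(length(x))                     # standard error of the mean
    half <- qt(1 - (1 - level) / 2, df = length(x) - 1) * se
    data.frame(group = g, n = length(x), mean = mean(x),
               lower = mean(x) - half, upper = mean(x) + half)
  })
  do.call(rbind, rows)
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(mean_ci_by_group(social_media_data,
                         "platform_usage", "academic_performance"))
}
```

Intervals that overlap heavily across groups are consistent with no real difference in average GPA between platform groups.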

7. Physical activity patterns by platform type

The relationship between physical activity and platform usage is assessed.

physical_activity_mean<-social_media_data %>%
  group_by(platform_usage) %>%
  summarise(avg_physical = mean(physical_activity, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   platform_usage avg_physical
##   <chr>                 <dbl>
## 1 Both                  1.02 
## 2 Instagram             1.04 
## 3 TikTok                0.982
ggplot(data = physical_activity_mean) +
  geom_bar(
    aes(x = platform_usage, y =avg_physical, fill = platform_usage),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(platform_usage),
      y = avg_physical,
      label = round(avg_physical, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  labs(
    x = "Social media platform",
    y = "Mean physical activity time (in hours)",
    title = "Mean physical activity time by platform usage group"
  )

8. Comparative analysis of mental‑health indicators across genders

Stress level by gender

stress_gender_mean<-social_media_data %>%
  group_by(gender) %>%
  summarise(avg_stress = mean(stress_level, na.rm = TRUE))%>%
  print()
## # A tibble: 2 × 2
##   gender avg_stress
##   <chr>       <dbl>
## 1 female       5.42
## 2 male         5.47
ggplot(data =stress_gender_mean ) +
  geom_bar(
    aes(x = gender, y =avg_stress, fill = gender),
    stat = "identity"
  ) +
  labs(
    x = "gender",
    y = "average stress level",
    title = "Average stress level by gender"
  )

  • There is only a slight difference in average stress level between the two groups.
  • On average, males in the dataset score about 0.05 points higher than females (5.47 vs. 5.42).

Anxiety level by gender

anxiety_gender_mean<-social_media_data %>%
  group_by(gender) %>%
  summarise(avg_anxiety = mean(anxiety_level, na.rm = TRUE))%>%
  print()
## # A tibble: 2 × 2
##   gender avg_anxiety
##   <chr>        <dbl>
## 1 female        5.69
## 2 male          5.59
ggplot(data =anxiety_gender_mean ) +
  geom_bar(
    aes(x = gender, y =avg_anxiety, fill = gender),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(gender),
      y = avg_anxiety,
      label = round(avg_anxiety, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  
  labs(
    x = "gender",
    y = "average anxiety level",
    title = "Average anxiety level by gender"
  )

  • Females show a slightly higher average anxiety level than males.

Addiction level by gender

addiction_gender_mean<-social_media_data %>%
  group_by(gender) %>%
  summarise(avg_addiction = mean(addiction_level, na.rm = TRUE))%>%
  print()
## # A tibble: 2 × 2
##   gender avg_addiction
##   <chr>          <dbl>
## 1 female          5.49
## 2 male            5.64
ggplot(data =addiction_gender_mean ) +
  geom_bar(
    aes(x = gender, y =avg_addiction, fill = gender),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(gender),
      y = avg_addiction,
      label = round(avg_addiction, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  labs(
    x = "gender",
    y = "average addiction level",
    title = "Average addiction level by gender"
  )

  • The average addiction levels for males and females are very similar (5.64 and 5.49, respectively).
  • The difference between the two groups is minimal, which suggests that gender is not strongly associated with addiction levels in this dataset.
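Because this section compares three outcomes between the same two groups, any formal check should account for running several tests at once. The sketch below runs a Welch t-test per outcome and applies a Holm correction with `p.adjust`. The helper `gender_tests` is a hypothetical name, and the choice of test and correction is an assumption, not part of the original analysis.

```r
# Hypothetical helper: Welch t-test per outcome between two groups,
# with Holm-adjusted p-values to account for multiple comparisons.
gender_tests <- function(data, outcomes, group = "gender") {
  res <- do.call(rbind, lapply(outcomes, function(col) {
    tt <- t.test(reformulate(group, response = col), data = data)
    data.frame(outcome = col,
               # first factor level minus second (female minus male here)
               mean_diff = unname(tt$estimate[1] - tt$estimate[2]),
               p_value = tt$p.value)
  }))
  res$p_holm <- p.adjust(res$p_value, method = "holm")
  res
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(gender_tests(social_media_data,
                     c("stress_level", "anxiety_level", "addiction_level")))
}
```

The Holm-adjusted column is the one to compare against 0.05 when all three outcomes are interpreted together.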

9. Analysis of mental health metrics by Social Interaction Levels

Average stress level by social interaction levels

stress_interaction_mean<-social_media_data %>%
  group_by(social_interaction_level) %>%
  summarise(avg_stress = mean(stress_level, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   social_interaction_level avg_stress
##   <chr>                         <dbl>
## 1 high                           5.51
## 2 low                            5.35
## 3 medium                         5.48
ggplot(data = stress_interaction_mean) +
  geom_bar(
    aes(x =social_interaction_level, y =avg_stress, fill =social_interaction_level ),
    stat = "identity"
  ) +
  labs(
    x = "social interaction level",
    y = "average stress level",
    title = "Average stress level by social interaction levels"
  )

  • The average stress level remains relatively consistent, between 5.35 and 5.51, across all three social interaction categories: high, medium, and low.

Average anxiety level by social interaction levels

anxiety_interaction_mean<-social_media_data %>%
  group_by(social_interaction_level) %>%
  summarise(avg_anxiety = mean(anxiety_level, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   social_interaction_level avg_anxiety
##   <chr>                          <dbl>
## 1 high                            5.70
## 2 low                             5.53
## 3 medium                          5.69
ggplot(data = anxiety_interaction_mean) +
  geom_bar(
    aes(x =social_interaction_level, y =avg_anxiety, fill =social_interaction_level ),
    stat = "identity"
  ) +
  labs(
    x = "social interaction level",
    y = "average anxiety level",
    title = "Average anxiety level by social interaction levels"
  )

  • The data show no clear relationship between an individual's level of social interaction and their average anxiety level (5.53 to 5.70 across categories).
  • Social interaction level therefore does not appear to be a meaningful predictor of anxiety in this dataset.

Average addiction level by social interaction levels

addiction_interaction_mean<-social_media_data %>%
  group_by(social_interaction_level) %>%
  summarise(avg_addiction = mean(addiction_level, na.rm = TRUE))%>%
  print()
## # A tibble: 3 × 2
##   social_interaction_level avg_addiction
##   <chr>                            <dbl>
## 1 high                              5.59
## 2 low                               5.47
## 3 medium                            5.64
ggplot(data = addiction_interaction_mean) +
  geom_bar(
    aes(x =social_interaction_level, y =avg_addiction, fill =social_interaction_level ),
    stat = "identity"
  ) +
  labs(
    x = "social interaction level",
    y = "average addiction level",
    title = "Average addiction level by social interaction levels"
  )

  • The plot indicates that the average addiction levels remain almost the same across all three categories of social interaction (5.47 to 5.64).
  • This suggests that social interaction level is not meaningfully associated with average addiction levels within this dataset.
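The tables and plots in this section list the categories alphabetically (high, low, medium), which is how R orders character values by default. A minimal sketch of putting them in their natural order before summarising or plotting follows. The helper `order_interaction` is hypothetical, and it assumes the column contains exactly the three lowercase values shown in the output above.

```r
# Hypothetical helper: convert interaction levels to a factor whose levels
# run from low to high instead of alphabetically.
order_interaction <- function(x) {
  factor(x, levels = c("low", "medium", "high"))
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(table(order_interaction(social_media_data$social_interaction_level)))
}
```

Applying the same conversion to the column before `group_by()` would make the bar charts read from low to high along the x-axis.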

10. Lifestyle patterns by depression status

Average daily social media hours by depression label

daily_depression_mean<-social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_daily_hours= mean(daily_social_media_hours, na.rm = TRUE))
ggplot(data = daily_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_daily_hours,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_daily_hours,
      label = round(avg_daily_hours, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average daily Social media hours",
    title = "Average daily social media hours by depression Label"
  ) +
  theme_minimal()

  • Participants with a depression label spend more time on social media per day, on average, than those without one, which indicates a positive association between social media use and depression status. Because the comparison rests on group means, it does not establish a usage threshold or a causal effect.
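A statement about the likelihood of a depression label rising with usage is better supported by a model of the label itself than by a comparison of means. The sketch below fits a logistic regression and returns the odds ratio per additional daily hour. The helper `odds_ratio_per_hour` is a hypothetical name, and it assumes `depression_label` is coded 0/1, as the fill scale in the plot above suggests.

```r
# Hypothetical helper: odds ratio for a depression label associated with
# one additional daily hour of social media, from a logistic regression.
odds_ratio_per_hour <- function(data, label = "depression_label",
                                hours = "daily_social_media_hours") {
  fit <- glm(reformulate(hours, response = label), data = data,
             family = binomial())
  # exp() turns the log-odds coefficient into an odds ratio
  unname(exp(coef(fit)[hours]))
}

# Applied to the report's data only if it is loaded in the session
if (exists("social_media_data")) {
  print(odds_ratio_per_hour(social_media_data))
}
```

An odds ratio above 1 means the odds of a depression label increase with each extra hour; it describes an association and does not by itself show that usage causes depression.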

Average sleep hours by depression label

sleep_depression_mean<-social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_sleep_hours= mean(sleep_hours, na.rm = TRUE))
ggplot(data = sleep_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_sleep_hours,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_sleep_hours,
      label = round(avg_sleep_hours, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average sleep hours",
    title = "Average sleep hours by depression Label"
  ) +
  theme_minimal()

  • The data indicate a negative association between sleep duration and depression status: participants with a depression label sleep less on average (4.76 hours) than those without one (6.49 hours).

  • Notably, both groups average less than the recommended 7–9 hours of sleep for adults.

Average screen time before sleep by depression

screen_depression_mean<-social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_screen_hours= mean(screen_time_before_sleep, na.rm = TRUE))
ggplot(data = screen_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_screen_hours,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_screen_hours,
      label = round(avg_screen_hours, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average screen time before sleep",
    title = "Average screen time before sleep by depression Label"
  ) +
  theme_minimal()

  • The average amount of screen time before sleep is nearly identical for both the “Depressed” and “No Depression” groups. Based on the data, increased screen use before bedtime does not appear to be positively associated with having a depression label.

11. Mental health metrics by depression status

Average stress levels by depression label

# Mean stress level per depression label (a score, not hours)
stress_depression_mean <- social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_stress_level = mean(stress_level, na.rm = TRUE))
ggplot(data = stress_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_stress_level,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_stress_level,
      label = round(avg_stress_level, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average stress levels",
    title = "Average stress levels by depression label"
  ) +
  theme_minimal()

  • There is a clear positive correlation between depression and stress levels in this dataset.
  • Individuals identified as depressed report stress scores that are, on average, about 58% higher than those without a depression label.
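
A relative difference such as the one quoted above is the gap between the two group means divided by the mean of the non-depressed group. The sketch below shows the arithmetic on made-up numbers. `pct_higher` is a hypothetical helper, not part of the original analysis, and it assumes the labels are coded 0 and 1.

```r
# Hypothetical helper: how much higher (in %) is the mean of group "1"
# than the mean of group "0"?
pct_higher <- function(values, group) {
  means <- tapply(values, group, mean, na.rm = TRUE)
  unname((means["1"] - means["0"]) / means["0"] * 100)
}

# Made-up stress scores: group 0 averages 5, group 1 averages 8
toy_stress <- c(4, 5, 6, 7, 8, 9)
toy_label  <- c(0, 0, 0, 1, 1, 1)

pct_higher(toy_stress, toy_label)  # (8 - 5) / 5 * 100 = 60
```

On the real data the call would be `pct_higher(social_media_data$stress_level, social_media_data$depression_label)`.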

Average anxiety levels by depression label

# Mean anxiety level per depression label (a score, not hours)
anxiety_depression_mean <- social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_anxiety_level = mean(anxiety_level, na.rm = TRUE))
ggplot(data = anxiety_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_anxiety_level,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_anxiety_level,
      label = round(avg_anxiety_level, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average anxiety levels",
    title = "Average anxiety levels by depression label"
  ) +
  theme_minimal()

  • There is a strong positive correlation between depression and anxiety in this dataset.

  • The pattern aligns with established psychological research showing that these conditions frequently co‑occur.

  • Individuals labeled as depressed also report higher average anxiety levels.
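
Because the depression label is binary, a correlation with anxiety can be computed as a point-biserial correlation, which is the ordinary Pearson correlation with a 0/1 variable. A minimal sketch on made-up values, assuming the 0/1 coding used in the plots:

```r
# Made-up data: the group labeled 1 has higher anxiety scores
toy <- data.frame(
  depression_label = c(0, 0, 0, 0, 1, 1, 1, 1),
  anxiety_level    = c(2, 3, 3, 4, 6, 7, 7, 8)
)

# Pearson correlation with a 0/1 variable equals the point-biserial correlation
r_pb <- cor(toy$depression_label, toy$anxiety_level)
r_pb

# cor.test() adds a confidence interval and a p-value
anxiety_test <- cor.test(toy$depression_label, toy$anxiety_level)
anxiety_test$p.value
```

The sign of `r_pb` gives the direction of the association and its magnitude gives the strength, which the bar chart of means alone does not show.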

Average addiction levels by depression label

# Mean addiction level per depression label (a score, not hours)
addiction_depression_mean <- social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_addiction_level = mean(addiction_level, na.rm = TRUE))
ggplot(data = addiction_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_addiction_level,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_addiction_level,
      label = round(avg_addiction_level, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average addiction levels",
    title = "Average addiction levels by depression label"
  ) +
  theme_minimal()

  • The difference in average addiction scores between the two groups is minimal, indicating that, within this dataset, depression status does not meaningfully predict addiction levels.

12. Average GPA by depression status

# Mean GPA per depression label
gpa_depression_mean <- social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_gpa = mean(academic_performance, na.rm = TRUE))
ggplot(data = gpa_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_gpa,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_gpa,
      label = round(avg_gpa, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average GPA",
    title = "Average GPA by depression label"
  ) +
  theme_minimal()

13. Average physical activity hours by depression label

# Mean physical activity hours per depression label
physical_depression_mean <- social_media_data %>%
  group_by(depression_label) %>%
  summarise(avg_physical_hours = mean(physical_activity, na.rm = TRUE))
ggplot(data = physical_depression_mean) +
  geom_bar(
    aes(
      x = factor(depression_label),
      y = avg_physical_hours,
      fill = factor(depression_label)
    ),
    stat = "identity"
  ) +
  geom_text(
    aes(
      x = factor(depression_label),
      y = avg_physical_hours,
      label = round(avg_physical_hours, 2)
    ),
    vjust = -0.5,
    size = 3.3
  ) +
  scale_fill_manual(
    values = c("0" = "steelblue", "1" = "tomato"),
    name = "Depression label"
  ) +
  labs(
    x = "Depression label",
    y = "Average physical activity hours",
    title = "Average physical activity hours by depression label"
  ) +
  theme_minimal()
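
The summarise-and-plot blocks above differ only in the column being averaged and in the axis label, so they could be generated by one function. The sketch below is one possible refactoring, not the code used to produce the figures. `plot_mean_by_depression` is a hypothetical helper, and the `toy` data frame is made up for the usage example.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bar chart of the mean of `var` per depression label.
# `var` is an unquoted column name; `y_label` is used for the axis and title.
plot_mean_by_depression <- function(data, var, y_label) {
  means <- data %>%
    group_by(depression_label) %>%
    summarise(avg_value = mean({{ var }}, na.rm = TRUE))

  ggplot(means, aes(x = factor(depression_label), y = avg_value)) +
    geom_col(aes(fill = factor(depression_label))) +
    geom_text(aes(label = round(avg_value, 2)), vjust = -0.5, size = 3.3) +
    scale_fill_manual(
      values = c("0" = "steelblue", "1" = "tomato"),
      name = "Depression label"
    ) +
    labs(
      x = "Depression label",
      y = y_label,
      title = paste(y_label, "by depression label")
    ) +
    theme_minimal()
}

# Usage on made-up data
toy <- data.frame(
  depression_label = c(0, 0, 1, 1),
  stress_level     = c(3, 5, 7, 9)
)
p <- plot_mean_by_depression(toy, stress_level, "Average stress levels")
p
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`, and a single function keeps colours, label size, and titles consistent across all the charts.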

14. Correlation and heatmap

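# Correlation matrix of the numeric columns, selected by position
corr_matrix <- social_media_data %>%
  select(1, 3, 5, 6, 7, 8, 10, 11, 12, 13) %>%
  cor(use = "pairwise.complete.obs")  # skip missing values, as the group means do
# Heatmap of the correlation matrix
corrplot(corr_matrix,
         method = "color",
         tl.col = "black")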
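
A heatmap gives the overall picture, but the row for the depression label is the part that bears on the conclusions. The sketch below ranks variables by the strength of their correlation with the label. It uses synthetic data with invented effect sizes, so only the technique carries over to `social_media_data`.

```r
# Synthetic data: stress is built to depend on the label, addiction is not
set.seed(42)
n <- 200
toy <- data.frame(depression_label = rep(c(0, 1), each = n / 2))
toy$stress_level    <- 4 + 3 * toy$depression_label + rnorm(n)
toy$addiction_level <- rnorm(n, mean = 5)

corr_matrix <- cor(toy, use = "pairwise.complete.obs")

# Correlations of every other variable with the depression label,
# ordered from strongest to weakest by absolute value
dep_corr <- corr_matrix["depression_label",
                        colnames(corr_matrix) != "depression_label"]
dep_corr <- dep_corr[order(abs(dep_corr), decreasing = TRUE)]
round(dep_corr, 2)
```

Selecting columns by name rather than by position would also make the real correlation step robust to changes in column order.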

15. Conclusion

| Factor | Predictor of depression | Explanation |
|---|---|---|
| Sleep duration | Yes | Depressed students sleep ~2 hours less; no overlap in IQR; highly significant difference. |
| Daily social media usage | Yes | Depressed students use social media more (6 hrs vs. 4.5 hrs). |
| Stress levels | Yes | Depressed students report ~58% higher stress. |
| Anxiety levels | Yes | Strong positive correlation; anxiety and depression co‑occur. |
| Screen time before sleep | No | No significant difference; p = 0.57. |
| Social media addiction | No | Average addiction levels are nearly identical across groups. |
| Academic performance (GPA) | No | GPA is almost the same for depressed and non‑depressed students. |
| Physical activity | No | Slight negative correlation, but not strong enough to predict depression. |
| Platform used (TikTok/Instagram/Both) | No | Stress and anxiety vary slightly by platform, but depression rates do not. |
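
The table judges each factor on its own. One way to assess several factors jointly is a logistic regression of the depression label on the candidate predictors. The following is a sketch on simulated data, not a result from this study: the coefficients used to generate `toy` are arbitrary, and `sleep_hours` is a hypothetical column name, since the name of the sleep column is not shown in this section.

```r
# Simulated data: depression is made to depend on sleep and stress only
set.seed(123)
n <- 300
toy <- data.frame(
  sleep_hours     = rnorm(n, mean = 6.5, sd = 1),  # hypothetical column name
  stress_level    = rnorm(n, mean = 5, sd = 2),
  addiction_level = rnorm(n, mean = 5, sd = 2)
)
log_odds <- -0.8 * (toy$sleep_hours - 6.5) + 0.5 * (toy$stress_level - 5)
toy$depression_label <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: each coefficient is the change in log-odds of the
# depression label per unit of the predictor, holding the others fixed
fit <- glm(
  depression_label ~ sleep_hours + stress_level + addiction_level,
  data = toy,
  family = binomial
)
round(summary(fit)$coefficients, 3)
```

In the fitted table, a predictor with a small p-value contributes to the label after the others are accounted for, which is a stricter test than comparing two group means.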