Mental Health Analysis

Introduction

Mental health is a subject that is extremely important to me, as I have seen it affect many of my friends and family. Furthermore, I have noticed that on a larger scale, depression and anxiety rates have significantly increased for teens over the past 10 years. Many attribute this rise in depressive/anxious symptoms to social media and the use of electronics at an early age. While I cannot confirm this, it is something I want to look into along with other external variables. Given this context, there was little hesitation as to what subject I wanted to dedicate my time to for this data analysis project. Specifically, I wanted to analyze data that took external variables (sleep, diet, screen time, exercise, etc.) into account when diagnosing a patient’s mental health. I was lucky enough to find a dataset on Kaggle that provided such information. Additionally, I was also able to pull data from the website Our World in Data that showed the growth in diagnosed mental illnesses across multiple countries from 2002-2021.

The purpose of this analysis is to show the growth of mental illness on a global scale and to analyze what external variables (within the constraints of the data provided) can contribute to this growth in individual patients.

library(skimr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)

Primary Source

Summary

As stated above, my primary source for this analysis is a dataset that takes numerous variables from a patient’s external environment into account when assessing their mental health. Specifically, the dataset measures daily screen time, sleep duration, sleep quality, physical activity, caffeine intake, diet, daily mindfulness minutes, and whether the patient uses a mental wellness app. Interestingly, daily screen time is broken down in eight ways: by device (phone, laptop, tablet, TV) and by purpose (social media, work, entertainment, gaming). Screen time, sleep duration, and physical activity are measured in hours; caffeine intake is in milligrams, mindfulness is in minutes, sleep quality is a rating, and app use and diet are 1/0 indicators.

In addition to these measurements, the dataset also provides demographic information. This includes the patient’s age, gender, and location (rural, suburban, or urban). Finally, the dataset includes a mood rating, stress rating, weekly depression score, weekly anxiety score, and overall mental health score.

Data Dictionary

User_id = unique identifier for each patient

Age = age of patient

Gender = gender of patient

Daily_screen_time_hours = total number of hours the patient spent on electronic devices

Phone_usage_hours = total number of hours the patient spent on a phone

Laptop_usage_hours = total number of hours the patient spent on a laptop

Tablet_usage_hours = total number of hours the patient spent on a tablet

Tv_usage_hours = total number of hours the patient spent on a TV

Social_media_hours = total number of hours the patient spent on social media

Work_related_hours = total number of hours the patient spent on an electronic device for work

Entertainment_hours = total number of hours the patient spent on an electronic device for entertainment

Gaming_hours = total number of hours the patient spent on an electronic device for gaming

Sleep_duration_hours = total number of hours the patient slept

Sleep_quality = rating of the quality of the patient’s sleep (1-5)

Mood_rating = patient’s self-reported mood rating (1-10)

Stress_level = patient’s self-reported stress level (1-10)

Physical_activity_hours_per_week = number of hours the patient exercised per week

Location_type = location of the patient; limited to urban, suburban, or rural

Mental_health_score = overall mental health score assigned to patient (1-100)

Uses_wellness_apps = indicates whether patient uses a mental wellness app (1/0)

Eats_healthy = indicates whether patient has a healthy diet (1/0)

Caffeine_intake_mg_per_day = milligrams of caffeine the patient consumes each day

Weekly_anxiety_score = weekly anxiety score assigned to patient (scale unknown)

Weekly_depression_score = weekly depression score assigned to patient (scale unknown)

Mindfulness_minutes_per_day = minutes per day the patient spent practicing mindfulness

Uploading the Primary Source

mental_df <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/olsonp2_xavier_edu/IQA62t2Aaiv8Q7VLlxIC1wsjAXONRAiWLhAs1wazIG_jRVk?download=1")
Rows: 5000 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): gender, location_type
dbl (23): user_id, age, daily_screen_time_hours, phone_usage_hours, laptop_u...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Viewing the Data

view(mental_df)

Summary Statistics

Main Takeaways:

The middle 50% of patients are between 30 and 60 years old (the 25th and 75th percentiles); ages range from 15 to 74.

On average, patients spent ~5 hours per day on screens. Social media (~3.3 hours) and work (~3.4 hours) had the highest category averages; because these averages add up to more than the daily total, the categories appear to overlap rather than split the total.

Approximately 41% of patients use a mental wellness app, which I found surprising.

Almost exactly 50% of patients were classified as having a “healthy” diet.

The average mental health score was about 65/100, but there is considerable variation: the standard deviation is 13.

skim(mental_df)
Data summary
Name mental_df
Number of rows 5000
Number of columns 25
_______________________
Column type frequency:
character 2
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 0 1 4 6 0 3 0
location_type 0 1 5 8 0 3 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
user_id 0 1 2500.50 1443.52 1.0 1250.75 2500.50 3750.25 5000.0 ▇▇▇▇▇
age 0 1 44.70 17.27 15.0 30.00 45.00 60.00 74.0 ▇▇▇▇▇
daily_screen_time_hours 0 1 5.04 1.84 1.0 3.70 5.00 6.30 10.0 ▃▇▇▅▁
phone_usage_hours 0 1 1.99 0.89 0.2 1.40 2.00 2.60 5.0 ▃▇▆▂▁
laptop_usage_hours 0 1 1.56 1.03 0.0 0.70 1.50 2.30 5.0 ▇▇▆▂▁
tablet_usage_hours 0 1 0.66 0.53 0.0 0.20 0.60 1.00 2.9 ▇▆▃▁▁
tv_usage_hours 0 1 1.43 0.93 0.0 0.70 1.40 2.10 4.0 ▇▇▇▃▁
social_media_hours 0 1 3.28 1.20 0.3 2.40 3.60 4.50 4.5 ▂▂▃▃▇
work_related_hours 0 1 3.36 0.83 0.7 3.10 3.70 3.90 4.5 ▂▁▂▇▇
entertainment_hours 0 1 1.66 0.66 0.0 1.20 1.60 2.10 3.9 ▂▇▇▃▁
gaming_hours 0 1 1.56 0.69 0.0 1.10 1.40 2.00 3.8 ▂▇▅▃▁
sleep_duration_hours 0 1 7.37 0.54 5.4 7.00 7.40 7.70 9.0 ▁▃▇▆▁
sleep_quality 0 1 4.01 0.66 1.0 4.00 4.00 4.00 5.0 ▁▁▃▇▃
mood_rating 0 1 4.45 2.77 1.0 1.70 4.30 6.70 10.0 ▇▃▅▃▂
stress_level 0 1 5.72 2.92 1.0 3.00 6.00 8.00 10.0 ▆▇▇▆▇
physical_activity_hours_per_week 0 1 2.66 2.29 0.0 0.60 2.30 4.20 11.8 ▇▅▂▁▁
mental_health_score 0 1 64.77 13.10 31.0 54.00 65.00 75.00 100.0 ▁▇▇▆▁
uses_wellness_apps 0 1 0.41 0.49 0.0 0.00 0.00 1.00 1.0 ▇▁▁▁▆
eats_healthy 0 1 0.50 0.50 0.0 0.00 1.00 1.00 1.0 ▇▁▁▁▇
caffeine_intake_mg_per_day 0 1 142.32 50.47 0.0 108.47 141.65 176.40 341.2 ▂▇▇▂▁
weekly_anxiety_score 0 1 8.63 5.09 0.0 5.00 8.00 12.00 21.0 ▇▇▇▅▂
weekly_depression_score 0 1 7.52 4.67 0.0 4.00 7.00 11.00 21.0 ▇▇▆▃▁
mindfulness_minutes_per_day 0 1 18.55 7.99 5.0 12.40 17.20 23.80 42.0 ▆▇▅▂▁
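
The age range in the takeaways is read from the p25 and p75 columns above. As a rough sketch of what those two quantiles bound, the base R snippet below uses a made-up vector of ages (an assumption for illustration, not the real age column) and computes the share of values that fall between them.

```r
# Toy ages standing in for mental_df$age (made-up values, not the real data)
age <- c(15, 22, 30, 38, 45, 52, 60, 67, 74)

# The 25th and 75th percentiles bound the middle half of the observations
iqr_bounds <- quantile(age, probs = c(0.25, 0.75))

# Share of observations that fall inside those bounds (inclusive)
share_inside <- mean(age >= iqr_bounds[1] & age <= iqr_bounds[2])

print(iqr_bounds)
print(share_inside)
```

With the real data, replacing the toy vector with mental_df$age would give the corresponding share for this dataset.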

Analysis

Effect of Screentime and a Healthy Diet on Mental Health

mental_df %>%
  ggplot(aes(x = daily_screen_time_hours, y = mental_health_score)) +
  geom_point() +
  facet_wrap(~ eats_healthy) +
  labs(title = "Daily Screen Time to Mental Health Score", x = "Hours of Screentime", y = "Mental Health Score")

In this visualization, we are analyzing the relationship between daily hours of screentime and the score assigned to reflect the patient’s current mental health. Additionally, we split the data by “healthy diet” to see whether diet makes any noticeable difference to the results. The left panel (0) shows patients who do not eat healthily; the right panel (1) shows those who do.

There is no distinguishable difference between the two graphs. However, this does not automatically mean that a healthy diet does not aid your overall mental health. Both panels show a clear negative relationship between daily screen time and the assigned mental health score. This relationship may be strong enough to overshadow any positive effect that a healthy diet could provide.
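
One way to put a number on "no distinguishable difference" is to compute the screen time correlation separately for each diet group. The sketch below is only an illustration: cor_by_group is a hypothetical helper, and the toy data frame contains made-up values that borrow the dataset's column names.

```r
# Hypothetical helper: Pearson correlation between columns x and y
# within each level of a grouping column
cor_by_group <- function(df, x, y, group) {
  sapply(split(df, df[[group]]), function(g) cor(g[[x]], g[[y]]))
}

# Made-up values for illustration only (not the real dataset)
toy <- data.frame(
  daily_screen_time_hours = c(2, 4, 6, 8, 2, 4, 6, 8),
  mental_health_score     = c(80, 72, 61, 50, 82, 70, 63, 52),
  eats_healthy            = c(0, 0, 0, 0, 1, 1, 1, 1)
)

diet_cor <- cor_by_group(
  toy, "daily_screen_time_hours", "mental_health_score", "eats_healthy"
)
print(diet_cor)
```

Passing mental_df in place of toy would return one correlation per diet group; similar values in both groups would support the visual impression.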

Effect of Social Media Hours on Overall Mental Health

mental_df %>%
  ggplot(aes(x = social_media_hours, y = mental_health_score)) +
  geom_point() +
  labs(title = "Social Media Hours to Mental Health Score", x = "Hours on Social Media", y = "Mental Health Score")

The scatterplot above illustrates the relationship between daily social media use and the mental health score assigned to each patient. A clear negative correlation is shown, indicating that higher levels of social media use are associated with lower mental health scores. The scores remain relatively stable until approximately the two-hour mark, after which they begin to decline more noticeably. This pattern suggests a potential threshold, implying that limiting social media use to around two hours per day may help mitigate negative impacts on mental well-being.
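
A possible threshold is easier to judge from binned averages than from a cloud of points. The following is a minimal sketch under stated assumptions: binned_mean is a hypothetical helper, the one-hour bin width is an arbitrary choice, and the toy vectors are made up rather than taken from the dataset.

```r
# Hypothetical helper: mean of y within fixed-width bins of x.
# Assumes the bin width divides the rounded range of x evenly.
binned_mean <- function(x, y, width = 1) {
  breaks <- seq(floor(min(x)), ceiling(max(x)), by = width)
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  tapply(y, bins, mean)
}

# Made-up values for illustration only (not the real dataset)
social_media_hours  <- c(0.5, 1.5, 1.8, 2.5, 3.5, 4.5)
mental_health_score <- c(75, 74, 76, 68, 60, 52)

score_by_bin <- binned_mean(social_media_hours, mental_health_score)
print(score_by_bin)
```

Applied to mental_df$social_media_hours and mental_df$mental_health_score, a flat run of bin means followed by a steady drop would be consistent with the threshold described above.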

Importance of Physical Activity for Mood, Stress, and Overall Mental Health

mental_df %>%
  ggplot(aes(x = physical_activity_hours_per_week, y = mood_rating)) +
  geom_point() +
  labs(title  = "Hours of Physical Activity to Mood Rating", x = "Hours of Physical Activity", y = "Mood Rating")

mental_df %>%
  ggplot(aes(x = physical_activity_hours_per_week, y = stress_level)) +
  geom_point() +
  labs(title  = "Hours of Physical Activity to Stress Levels", x = "Hours of Physical Activity", y = "Stress Level")

mental_df %>%
  ggplot(aes(x = physical_activity_hours_per_week, y = mental_health_score)) +
  geom_point() +
  labs(title  = "Hours of Physical Activity to Overall Mental Health", x = "Hours of Physical Activity", y = "Mental Health Score")

Physical activity is positively associated with patients’ mood and overall mental health. The graphs are cluttered, but they show a distinguishable positive trend. The results for stress levels are interesting, as there is no distinguishable relationship. Taken together, the mood and mental health score graphs suggest that patients with more physical activity tend to fare better, although scatterplots alone cannot show that exercise causes the improvement.
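
Because the graphs are cluttered, a rank correlation per outcome can back up the visual read. This sketch uses Spearman correlation, which is my own choice here since the ratings are bounded scores, and a toy data frame of made-up values with the dataset's column names.

```r
# Made-up values for illustration only (not the real dataset)
toy <- data.frame(
  physical_activity_hours_per_week = c(0, 1, 2, 4, 6, 8),
  mood_rating                      = c(2, 3, 4, 5, 7, 8),
  stress_level                     = c(5, 7, 4, 6, 5, 6),
  mental_health_score              = c(50, 55, 58, 66, 70, 78)
)

outcomes <- c("mood_rating", "stress_level", "mental_health_score")

# Spearman rank correlation of physical activity with each outcome
activity_cor <- sapply(outcomes, function(outcome) {
  cor(toy$physical_activity_hours_per_week, toy[[outcome]],
      method = "spearman")
})
print(round(activity_cor, 2))
```

Swapping toy for mental_df would give the three correlations for the real data; a value near zero for stress_level would match the flat stress graph.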

Effects of Sleep on Mood, Stress, and Overall Mental Health

mental_df %>%
  ggplot(aes(x = sleep_duration_hours, y = mood_rating)) +
  geom_point() +
  labs(title = "Hours of Sleep to Mood Rating", x = "Hours of Sleep", y = "Mood  Rating")

mental_df %>%
  ggplot(aes(x = sleep_duration_hours, y = stress_level)) +
  geom_point() +
  labs(title = "Hours of Sleep to Stress Levels", x = "Hours of Sleep", y = "Stress Levels")

mental_df %>%
  ggplot(aes(x = sleep_duration_hours, y = mental_health_score)) +
  geom_point() +
  labs(title = "Hours of Sleep to Overall Mental Health", x = "Hours of Sleep", y = "Mental Health Score")

The results from these visualizations were not what I expected. None of the graphs shows much of a distinguishable relationship; they are mostly clutter. One likely reason is that sleep varies little in this dataset: sleep duration has a standard deviation of about half an hour, and at least half of the patients have a sleep quality rating of 4. Given that the results are not as clear as the visualizations above for physical activity and screen time, sleep appears to be a weaker factor than the others, at least in this data.

Are Mental Wellness Apps Worth The Investment?

mental_df %>%
  ggplot(aes(y = mood_rating)) +
  geom_boxplot() +
  facet_wrap(~ uses_wellness_apps) +
  labs(title = "Mood Ratings by Use of Wellness Apps", y = "Mood Rating")

mental_df %>%
  ggplot(aes(y = stress_level)) +
  geom_boxplot() +
  facet_wrap(~ uses_wellness_apps) +
  labs(title = "Stress Levels by Use of Wellness Apps", y = "Stress Levels")

mental_df %>%
  ggplot(aes(y = mental_health_score)) +
  geom_boxplot() +
  facet_wrap(~ uses_wellness_apps) +
  labs(title = "Overall Mental Health by Use of Wellness Apps", y = "Mental Health Score")

Overall, it seems that wellness apps may be worth the investment. Patients who use wellness apps had a higher median mood rating and mental health score than patients who do not. However, the median stress level was also higher among app users.
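
The medians compared above can also be tabulated directly instead of being read off the boxplots. A minimal sketch using base R's aggregate() on a toy data frame (made-up values, with the dataset's column names assumed):

```r
# Made-up values for illustration only (not the real dataset)
toy <- data.frame(
  uses_wellness_apps  = c(0, 0, 0, 1, 1, 1),
  mood_rating         = c(3, 4, 5, 5, 6, 7),
  stress_level        = c(5, 6, 7, 6, 7, 8),
  mental_health_score = c(60, 62, 64, 66, 68, 70)
)

# Median of each outcome for non-users (0) and users (1)
app_medians <- aggregate(
  cbind(mood_rating, stress_level, mental_health_score) ~ uses_wellness_apps,
  data = toy,
  FUN = median
)
print(app_medians)
```

Running the same call with data = mental_df would produce one row of medians per group for the real data.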

Does Caffeine Intake Have an Effect on Mood, Stress, or Overall Mental Health?

mental_df %>%
  ggplot(aes(x = caffeine_intake_mg_per_day, y = mood_rating)) +
  geom_point() +
  labs(title = "Effect of Caffeine Intake on Mood", x = "Caffeine Intake (mg)", y = "Mood Rating")

mental_df %>%
  ggplot(aes(x = caffeine_intake_mg_per_day, y = stress_level)) +
  geom_point() +
  labs(title = "Effect of Caffeine Intake on Stress", x = "Caffeine Intake (mg)", y = "Stress Levels")

mental_df %>%
  ggplot(aes(x = caffeine_intake_mg_per_day, y = mental_health_score)) +
  geom_point() +
  labs(title = "Effect of Caffeine Intake on Overall Mental Health", x = "Caffeine Intake (mg)", y = "Mental Health Score")

Caffeine intake does not seem to have a noticeable relationship with mood, stress, or overall mental health. Caffeine could plausibly act through sleep quality or duration; however, as shown above, sleep itself showed little relationship with these outcomes in this dataset.

Is There a Difference in Overall Mental Health for Screentime Spent on Work vs Entertainment?

mental_df %>%
  ggplot(aes(x = entertainment_hours, y = mental_health_score)) +
  geom_point() +
  labs(title = "Entertainment Screentime Hours to Mental Health Score", x = "Entertainment Screentime", y = "Mental Health Score")

mental_df %>%
  ggplot(aes(x = work_related_hours, y = mental_health_score)) +
  geom_point() +
  labs(title = "Work Screen Hours to Mental Health Score", x = "Hours of Work Screentime", y = "Mental Health Score")

These results are especially interesting, given the negative relationship between total screen time and mental health score seen earlier. For entertainment screentime, more hours are associated with a higher overall mental health score. For work screentime, by contrast, the same negative relationship appears.

What is even more interesting about the work screentime visualization is that it takes a hard dive after the 3-hour mark. Before then, mental health scores are rather high. This could be an indication of when the human brain becomes too strained or overwhelmed by the workload.
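
The opposite directions of the two relationships can be summarized by fitting a simple linear model to each and comparing the slopes. The sketch below is an assumption-laden illustration: slope_of is a hypothetical helper, the toy values are made up, and a straight line would not capture the drop after the 3-hour mark.

```r
# Hypothetical helper: slope from a simple linear regression of y on x
slope_of <- function(df, x, y) {
  fit <- lm(reformulate(x, response = y), data = df)
  coef(fit)[[x]]
}

# Made-up values for illustration only (not the real dataset)
toy <- data.frame(
  entertainment_hours = c(0.5, 1, 1.5, 2, 2.5, 3),
  work_related_hours  = c(4.5, 4, 3.5, 3, 2, 1),
  mental_health_score = c(60, 62, 64, 66, 68, 70)
)

entertainment_slope <- slope_of(toy, "entertainment_hours", "mental_health_score")
work_slope          <- slope_of(toy, "work_related_hours", "mental_health_score")

print(c(entertainment = entertainment_slope, work = work_slope))
```

With mental_df in place of toy, a positive entertainment slope and a negative work slope would agree with the two scatterplots.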

Secondary Source - Our World in Data

As noted above, I incorporated additional data from the Our World in Data platform to further support the analysis of mental health trends from 2002 to 2021. This dataset reinforces the earlier observation that, on average, a growing share of the population is being diagnosed with mental health conditions. A key variable in the dataset, “[mental illness]_abs_change,” captures the difference in the proportion of the population diagnosed with each condition in 2002 compared to 2021. Positive values indicate an increase over this period, while negative values indicate a decline. This metric serves as the central focus of the analysis presented here.

wid_data <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/olsonp2_xavier_edu/IQDd4IRDx4rRSJ9tu22gxYWvAVhvV-goPyPFPmRBpFhdRwc?download=1")
Rows: 216 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Country or region
dbl (20): schiz_2002, schiz_2021, schiz_abs_change, schiz_rel_change, depr_2...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
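
Since the abs_change columns are the focus, one simple summary is to count how many countries or regions show an increase for each condition. The following is a sketch only: the toy rows are made up, the region names are placeholders, and example_abs_change is a hypothetical column standing in for the other conditions, whose real names are truncated in the output above.

```r
# Made-up rows for illustration only (not the Our World in Data values)
toy <- data.frame(
  "Country or region" = c("A", "B", "C"),
  schiz_abs_change    = c(0.01, -0.02, 0.03),
  example_abs_change  = c(0.5, 0.2, -0.1),  # hypothetical column name
  check.names = FALSE
)

# Find every column that ends in "_abs_change" without naming each one
abs_cols <- grep("_abs_change$", names(toy), value = TRUE)

# Number of countries or regions with an increase, per condition
n_increase <- colSums(toy[abs_cols] > 0, na.rm = TRUE)

# Country or region with the largest change for one condition
top_region <- toy[["Country or region"]][which.max(toy$schiz_abs_change)]

print(n_increase)
print(top_region)
```

The same steps should carry over to wid_data, since the pattern match picks up whichever abs_change columns the file actually contains.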

Conclusion

Overall, this analysis emphasizes the complexity of mental health and the wide range of environmental influences. While physical activity, total screen time, and hours on social media showed clearer relationships with mental health outcomes, other variables such as diet, sleep, and caffeine intake did not display strong or consistent patterns. The difference between screen time for work and entertainment further implies that the kind of digital engagement is important, as work-related screen time exhibits a stronger negative correlation.

Interestingly, in both the work-related screen time and social media analyses, mental health scores began to decline after a specific hour mark. For work-related hours, it was three hours, and for social media it was two. These trends could suggest the possibility of a threshold at which screen exposure starts to negatively influence mental well-being. This potential cutoff point warrants further investigation, as it may have meaningful implications for understanding how screen time affects cognitive and emotional functioning.

Global diagnostic patterns from 2002 to 2021 also show a consistent increase in depression and anxiety, highlighting the growing significance of treating mental health issues. This project highlights the need for ongoing awareness and research in this area and offers insight into potential contributing factors, even though it cannot establish causation.