Mental Health vs Social Media Use

Author

Nhi Vu

Mental Health vs Social Media

The data set I selected is “Social Media and Mental Health Analysis,” published by Randhir Kumar. The data were collected through a self-reported survey distributed between April and July 2022. The data set looks broadly at the implications of social media for mental health across a wide pool of people, from 13 years of age to over 60.

These are the main variables that we will be focusing on for this data set:

Some questions that I want to find the answers to are:

  1. What is the relationship between time spent on social media and people’s mental health?

  2. How does social media usage vary across different age groups?

I chose this data set as a personal reminder that technology and social media continue to grow and bring good things into our lives, but that excessive use can lead to adverse consequences. I wanted to know what excessive social media usage could do to our mental health.

Load all the packages that I need

library(tidyverse)
library(ggthemes)
library(plotly)
library(ggfortify)
setwd("/Users/nhi.vu/Desktop/DATA110")
mh <- read_csv("Mental Health Social Media Dataset.csv")

Cleaning data

I started cleaning the data by filtering down to the respondents I wanted to work with (those who use social media). I then created several new columns of recoded values so that later steps would be less work. I also renamed several variables to clear up any confusion. Because the data set was not very organized, most of the cleaning consisted of repeatedly adding new columns to reorganize the data.

clean1 <- mh |>
  filter(Do_you_use_social_media == "Yes") |>
  mutate(age_group = case_when( #this is creating a new column
    What_is_your_age < 20 ~ "Less than 20 years old",
    What_is_your_age >= 20 & What_is_your_age <= 30 ~ "From 20-30 years old",
    What_is_your_age > 30 & What_is_your_age <= 40 ~ "From 30-40 years old", # > 30 so the bins do not overlap at age 30
    What_is_your_age > 40 ~ "Greater than 40 years old")) |>
  rename(avg_time = What_is_the_average_time_you_spend_on_social_media_every_day) |>
  mutate(sm_hour = case_when(avg_time == "Less than an Hour" ~ "0.5", # bucket midpoints stored as text, so sm_hour is categorical; credit to ChatGPT for help setting up case_when()
                             avg_time == "Between 1 and 2 hours" ~ "1.5",
                             avg_time == "Between 2 and 3 hours" ~ "2.5",
                             avg_time == "Between 3 and 4 hours" ~ "3.5",
                             avg_time == "Between 4 and 5 hours" ~ "4.5",
                             avg_time == "More than 5 hours" ~ "6"))
clean2 <- clean1 |>
  rowwise() |> # makes the mutate() below add across each row rather than down each column
  mutate(mh_score = sum(c_across(c(How_often_do_you_feel_depressed_or_down, # credit to ChatGPT for rowwise() and c_across()
                                    How_easily_distracted_are_you,
                                    How_often_do_you_face_issues_regarding_sleep,
                                    How_much_are_you_bothered_by_worries,
                                    How_often_do_you_compare_yourself_to_other_successful_people,
                                    Do_you_look_to_seek_validation_from_features_of_social_media,
                                    Do_you_get_distracted_by_Social_media_when_you_are_busy,
                                    Do_you_feel_restless_if_you_havent_used_Social_media,
                                    Do_you_find_yourself_using_Social_media_without_reason,
                                    Do_you_find_it_difficult_to_concentrate_on_things)))) |>
  rename(sm_platforms = What_social_media_platforms_do_you_commonly_use) |>
  filter(Gender %in% c("Female", "Male"))
clean3 <- clean2 |>
  group_by(Relationship_Status) |> # replaces the rowwise grouping, so the mean below pools all item responses within each relationship status
  mutate(avg_score = mean(c_across(c(How_often_do_you_feel_depressed_or_down, # credit to ChatGPT for c_across()
                                    How_easily_distracted_are_you,
                                    How_often_do_you_face_issues_regarding_sleep,
                                    How_much_are_you_bothered_by_worries,
                                    How_often_do_you_compare_yourself_to_other_successful_people,
                                    Do_you_look_to_seek_validation_from_features_of_social_media,
                                    Do_you_get_distracted_by_Social_media_when_you_are_busy,
                                    Do_you_feel_restless_if_you_havent_used_Social_media,
                                    Do_you_find_yourself_using_Social_media_without_reason,
                                    Do_you_find_it_difficult_to_concentrate_on_things))))
head(clean3)
# A tibble: 6 × 26
# Groups:   Relationship_Status [2]
  Serial_Number Timestamp          What_is_your_age Gender Relationship_Status
          <dbl> <chr>                         <dbl> <chr>  <chr>              
1             1 4/18/2022 19:18:47               21 Male   In a relationship  
2             2 4/18/2022 19:19:28               21 Female Single             
3             3 4/18/2022 19:25:58               21 Female Single             
4             4 4/18/2022 19:29:43               21 Female Single             
5             5 4/18/2022 19:33:31               21 Female Single             
6             6 4/18/2022 19:33:47               22 Female Single             
# ℹ 21 more variables: Occupation_Status <chr>,
#   What_type_of_organizations_are_you_affiliated_with <chr>,
#   Do_you_use_social_media <chr>, sm_platforms <chr>, avg_time <chr>,
#   Do_you_find_yourself_using_Social_media_without_reason <dbl>,
#   Do_you_get_distracted_by_Social_media_when_you_are_busy <dbl>,
#   Do_you_feel_restless_if_you_havent_used_Social_media <dbl>,
#   How_easily_distracted_are_you <dbl>, …
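
To make the two derived columns concrete, here is a minimal base-R sketch on made-up data. The column names `q1` to `q3` and all values are hypothetical stand-ins for the ten survey items; this is not the project's actual pipeline. It shows `mh_score` as the per-respondent sum of item responses, and `avg_score` as the mean item response across everyone who shares a relationship status.

```r
# Toy data: three hypothetical survey items (q1..q3) scored 1-5.
toy <- data.frame(
  status = c("Single", "Single", "Married", "Married"),
  q1 = c(5, 3, 2, 1),
  q2 = c(4, 3, 2, 2),
  q3 = c(3, 3, 1, 1)
)
items <- c("q1", "q2", "q3")

# Per-respondent composite: add the item responses across each row.
toy$mh_score <- rowSums(toy[, items])

# Per-group average item response: mean composite within each status,
# divided by the number of items (valid when there are no missing values).
toy$avg_score <- ave(toy$mh_score, toy$status) / length(items)

print(toy)
```

Under these assumptions, `avg_score` is identical for every respondent with the same status, which is worth keeping in mind when it is later used as a bar height.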

A scatterplot that shows the connection between social media hours and depression score, colored by age group and faceted by gender

p1 <- ggplot(clean3, aes(x = sm_hour, y = mh_score, color = age_group, 
                         text = paste0(  # this is for the tool tips
                           "Age: ", What_is_your_age,
                           "<br>Status: ", Relationship_Status,
                           "<br>Avg Score: ", round(avg_score, 2)))) +
  geom_point(size = 4) + # the size of the dot
  scale_color_brewer(palette = "Set2")+ # choosing color palette for the dots
  facet_wrap(~ Gender) + # one panel per gender
  labs(title = "Social Media Time vs Depression Score by Gender and Age Group in 2022",
       caption = "Source: Randhir Kumar's Self-Reported Surveys",
       x = "Time Spent on Social Media (hours per day)",
       y = "Depression Scores",
       color = "Age Groups") +
  theme_gray() # the theme of the graph

ggplotly(p1, tooltip = "text") # make the plot interactive

This interactive scatter plot shows the relationship between time spent on social media and depression scores, faceted by gender. However, the points overlap so heavily that the correlation is hard to see, so I’ll create another graph.
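
Because many respondents share the same time bucket and score, the points stack on top of one another. One way to see the trend anyway is to average the score within each time bucket. The sketch below uses made-up values and base R's `tapply()`; the data frame `toy` is hypothetical and only illustrates the idea.

```r
# Toy data: two hypothetical respondents per time bucket.
toy <- data.frame(
  sm_hour  = c("0.5", "0.5", "2.5", "2.5", "6", "6"),
  mh_score = c(20, 24, 30, 32, 36, 40)
)

# Mean score within each time bucket; one number per bucket
# is easier to compare than a cloud of overlapping points.
bucket_means <- tapply(toy$mh_score, toy$sm_hour, mean)

print(bucket_means)
```

On the real data the same summary could be computed from `clean3`, with `sm_hour` and `mh_score` as defined above.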

A bar graph that shows average depression score by occupational status.

p2 <- ggplot(clean3, aes(x = Occupation_Status, y = avg_score, fill = Relationship_Status)) +
  geom_col(position = "dodge") + # place the bars side by side instead of stacking them
  scale_fill_brewer(palette = "Set1") + # fill scale, because the bars are mapped with fill rather than color
  labs(title = "Depression Score in Different Occupations by Relationship Status in 2022",
       caption = "Source: Randhir Kumar's Self-Reported Surveys",
       x = "Occupation",
       y = "Depression Scores",
       fill = "Relationship Status") +
  theme_bw()
p2

This bar graph shows the relationship between occupational status and depression scores. You can see that students have the highest depression scores.

Multiple Linear Regression Analysis

r1 <- lm(mh_score ~ sm_hour + age_group + Relationship_Status + Occupation_Status, data = clean3)
summary(r1)

Call:
lm(formula = mh_score ~ sm_hour + age_group + Relationship_Status + 
    Occupation_Status, data = clean3)

Residuals:
     Min       1Q   Median       3Q      Max 
-22.8819  -4.5924   0.5633   4.5633  21.8524 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           26.0935     4.4440   5.872 8.32e-09 ***
sm_hour1.5                             5.4365     1.5497   3.508 0.000496 ***
sm_hour2.5                             8.2891     1.5109   5.486 6.82e-08 ***
sm_hour3.5                             9.7797     1.5545   6.291 7.40e-10 ***
sm_hour4.5                            10.6436     1.6386   6.496 2.17e-10 ***
sm_hour6                              11.7342     1.5402   7.618 1.50e-13 ***
age_groupFrom 30-40 years old         -3.6312     1.7637  -2.059 0.040079 *  
age_groupGreater than 40 years old    -5.0227     1.6289  -3.084 0.002169 ** 
age_groupLess than 20 years old       -1.2833     1.5806  -0.812 0.417267    
Relationship_StatusIn a relationship  -4.3302     3.4860  -1.242 0.214812    
Relationship_StatusMarried            -4.4565     3.3133  -1.345 0.179283    
Relationship_StatusSingle             -4.0035     3.4165  -1.172 0.241880    
Occupation_StatusSalaried Worker       1.4030     2.6309   0.533 0.594099    
Occupation_StatusSchool Student        0.3409     3.1469   0.108 0.913773    
Occupation_StatusUniversity Student    2.0577     2.7486   0.749 0.454472    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.087 on 456 degrees of freedom
Multiple R-squared:  0.2841,    Adjusted R-squared:  0.2621 
F-statistic: 12.92 on 14 and 456 DF,  p-value: < 2.2e-16

Diagnostic plots

autoplot(r1, which = 1:4, nrow = 2, ncol = 2)

This is a multiple linear regression testing the extent to which age group, time spent on social media, relationship status, and occupational status predict the depression score. The model was significant overall (F(14, 456) = 12.92, p < 0.001), with an adjusted R² of 0.26, indicating that about 26% of the variability in mental health scores can be explained by these factors.
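
As a quick check on the summary output, the adjusted R² and the F-statistic can be recomputed from the reported R² and degrees of freedom. The sketch below uses only numbers printed in the summary above; because R² is rounded to four decimals there, the recomputed values can differ slightly in the last digit.

```r
r2       <- 0.2841  # Multiple R-squared from the summary
df_model <- 14      # numerator degrees of freedom (number of coefficients minus intercept)
df_resid <- 456     # residual degrees of freedom
n        <- df_model + df_resid + 1  # observations used in the fit

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic: explained variance per model df over
# unexplained variance per residual df.
f_stat <- (r2 / df_model) / ((1 - r2) / df_resid)

print(round(adj_r2, 4))
print(round(f_stat, 2))
```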

Time on social media was a strong and consistent predictor: individuals who spent over 5 hours on social media per day scored, on average, 11.73 points higher on mental health distress (p < 0.001) than those who spent less than one hour. Age also mattered: participants over 40 scored about 5 points lower on average (p = 0.002) than the reference group of 20-30-year-olds, indicating lower distress. Relationship status and occupational status were not significant predictors.
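
Because `sm_hour` holds text labels, `lm()` treats it as a categorical predictor and reports each level relative to the first level in sorted order ("0.5", meaning less than an hour). The sketch below, on made-up data, shows that with a single categorical predictor each coefficient equals the difference between that group's mean and the reference group's mean. In the full model above the coefficients are adjusted for the other predictors, so they will not match raw differences in means exactly.

```r
# Toy data: hypothetical scores for three time buckets stored as text.
toy <- data.frame(
  sm_hour  = c("0.5", "0.5", "1.5", "1.5", "6", "6"),
  mh_score = c(20, 22, 26, 28, 32, 34)
)

# A character predictor is converted to a factor; "0.5" becomes the reference level.
fit   <- lm(mh_score ~ sm_hour, data = toy)
coefs <- coef(fit)

# Group means, and each group's difference from the reference group.
group_means <- tapply(toy$mh_score, toy$sm_hour, mean)

print(coefs)
print(group_means - group_means["0.5"])
```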

In sum, the results provide evidence that mental health distress increases as social media time increases, and that younger age groups report higher symptoms than older ones.

Conclusion

This project analyzed the relationship between social media consumption and mental health indicators, using the “Social Media and Mental Health Analysis” data by Randhir Kumar. Data cleaning, transformation, and visualization were used to gauge how demographics such as age, gender, relationship status, and occupation, along with time spent on social media, relate to reported distress.

The results highlight a consistent trend: there is a correlation between time spent on social media and the level of distress reported. This was especially evident in younger age groups, and may be tied to their use of social media for validation through likes. The regression model provided evidence that time spent using social media is a significant predictor of mental health scores, controlling for other demographic indicators.

In summary, even though social media may be an essential aspect of modern living, this analysis is a reminder that mental health can be adversely impacted by excessive social media use. As technology continues to evolve, we must remain aware of how much time we’re spending online, especially if it’s adversely affecting our mental health.