The data for this study was sourced from a survey taken by 534 University of Sydney undergraduate students enrolled in DATA1001.
Responses to the following questions were used in the study:
“How many hours do you spend on social media on your phone per day?”
“How many friends do you have at the University of Sydney?”
“Over the last 2 weeks, how would you rate the amount of stress you have experienced in your life?”
One limitation was that participants interpreted the question on number of friends differently. The mean was 4.44 friends, suggesting that most participants counted only close friends, and the analysis assumes this interpretation. However, some responses were as high as 100, which likely included acquaintances.
There is also likely confounding in the number of friends variable: whether students live on campus, and whether they are domestic or international students, could affect how many friends they report.
Data Cleaning
Eight participants were removed from the study because they reported 20 or more hours of daily social media use (SMU), which is unrealistic; all eight were extreme outliers at 6.23 or more standard deviations from the mean. A further 9 participants’ data was excluded because they chose not to participate.
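The outlier check described above can be sketched as follows. This is a minimal illustration on made-up values, not the survey data: the vector `smu` and its contents are hypothetical, and only the 20-hour cutoff is taken from this report.

```r
# Hypothetical daily social media use values in hours (not the survey data).
smu <- c(1, 2, 2.5, 3, 3, 4, 4.5, 5, 6, 24)

# Distance of each value from the mean, measured in standard deviations.
z_scores <- (smu - mean(smu, na.rm = TRUE)) / sd(smu, na.rm = TRUE)

# Keep only responses under 20 hours/day, mirroring the cutoff used in this report.
smu_clean <- smu[!is.na(smu) & smu < 20]
```

Reporting the z-score alongside the fixed cutoff shows how far the removed responses sit from the rest of the sample, which helps justify treating them as implausible rather than merely large.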
Research Question 1
Does the amount of time spent using social media affect participants’ reported level of stress?
Previous studies focused almost exclusively on adolescents, and we hypothesized that studying university students may produce different results. It was expected that a higher daily SMU would not have a statistically significant correlation with reported levels of stress.
Figure 1. Level of stress experienced in the last 2 weeks by daily hours of social media use
Code
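library(dplyr)
library(ggplot2)

# Keep responses under 20 hours/day and only the two variables needed
SMU_stress_filtered_data <- data1001_survey_data[data1001_survey_data$social_media_use < 20, ] %>%
  select(social_media_use, si_1)

# Treat stress level as a categorical variable and drop incomplete rows
SMU_stress_filtered_data$si_1 <- as.factor(SMU_stress_filtered_data$si_1)
SMU_stress_filtered_data <- na.omit(SMU_stress_filtered_data)

# Horizontal boxplots of SMU for each reported stress level
ggplot(SMU_stress_filtered_data, aes(x = social_media_use, y = si_1)) +
  geom_boxplot() +
  labs(x = "Social Media Use (Hours/Day)", y = "Reported Level of Stress") +
  theme_classic() +
  theme(text = element_text(size = 12, family = "serif"))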
Figure 1 shows no clear relationship between SMU and stress. The median SMU for each category of stress is very similar, and the spread around the median is large at all levels of stress.
A one-way ANOVA was consistent with this: the null hypothesis of equal mean SMU across stress levels could not be rejected (p = 0.098 > 0.05).
Code
# One-way ANOVA: does mean SMU differ across reported stress levels?
anova_result <- aov(social_media_use ~ si_1, data = SMU_stress_filtered_data)
anova_summary <- summary(anova_result)
## P = 0.098
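As a hedged illustration of how the p-value can be read out of the ANOVA summary programmatically, the sketch below runs the same kind of test on a small made-up data frame. The data frame `toy` and its values are hypothetical; only the column names mirror those used in this report.

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  social_media_use = c(2, 3, 4, 3, 5, 4, 2, 6, 3, 4, 5, 3),
  si_1 = factor(rep(c("1", "2", "3"), each = 4))
)

# One-way ANOVA with SMU as the response and stress level as the grouping factor.
fit <- aov(social_media_use ~ si_1, data = toy)

# summary() returns a list whose first element is the ANOVA table;
# the first entry of the "Pr(>F)" column is the p-value for the grouping factor.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
```

A p-value above 0.05 means the test fails to reject the null hypothesis of equal group means; it does not show that the means are equal.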
Research Question 2
Does the amount of time spent using social media affect a person’s reported number of friends?
It was hypothesized that 1-2 hours of daily SMU would correspond with an increase in reported number of friends, but higher daily use would correlate with fewer friends.
Figure 2. Number of reported friends by hours of social media use per day.
Code
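library(dplyr)
library(ggplot2)

# Responses under 20 hours/day, with only the two variables needed
SMU_filtered_data <- data1001_survey_data[data1001_survey_data$social_media_use < 20, ] %>%
  select(social_media_use, friends_count)

# Scatter plot with a fitted linear regression line
ggplot(SMU_filtered_data, aes(x = social_media_use, y = friends_count)) +
  geom_point() +
  labs(x = "Social Media Use (Hours/Day)", y = "Number of Friends") +
  theme_classic() +
  theme(text = element_text(size = 12, family = "serif"),
        panel.grid.major = element_line(color = "grey90")) +
  geom_smooth(method = "lm", se = FALSE, color = "red")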
Figure 2 shows that the highest mean number of reported friends is between 1 and 5 hours of daily SMU. The linear model fit the data poorly, with a correlation coefficient of r = -0.075.
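To put the reported coefficient in context: for a simple linear regression, the proportion of variance explained equals r squared, and (-0.075)^2 = 0.005625, so a straight line accounts for under 1% of the variation in number of friends. The sketch below shows one way such a coefficient can be computed while skipping missing values. The data frame `toy` is hypothetical and is not the survey data.

```r
# Hypothetical toy data with missing values; not the survey data.
toy <- data.frame(
  social_media_use = c(1, 2, 3, 4, 5, NA, 7),
  friends_count    = c(3, 6, 5, 4, NA, 2, 3)
)

# Pearson correlation using only rows where both variables are present.
r <- cor(toy$social_media_use, toy$friends_count, use = "complete.obs")

# Share of variance explained by a simple linear fit.
r_squared <- r^2
```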
Figure 3. Residual plot for number of friends and social media use
Code
# Fit the linear model and plot residuals against fitted values
model <- lm(friends_count ~ social_media_use, data = SMU_filtered_data)
ggplot(model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, colour = "red") +
  labs(x = "Fitted Values", y = "Residual Values") +
  theme_classic() +
  theme(text = element_text(size = 12, family = "serif"),
        panel.grid.major = element_line(color = "grey90"))
The residual plot for the linear model shows heteroscedasticity: the spread of the residuals is uneven across the X axis. The large residuals are likely due to different interpretations of “number of friends,” as mentioned in the IDA.
To better visualize the results, the data was binned into categories by SMU, and the mean number of friends for each category is shown as a bar graph in Figure 4:
Figure 4. Mean number of friends by daily social media use. (a) binned in 5 hour intervals. (b) binned in 1 hour intervals from 0 to 7 hours.
Code
library(dplyr)
library(ggplot2)
library(gridExtra)

# (a) Responses under 20 hours/day, binned in 5-hour intervals: [0,5), [5,10), [10,15)
SMU_filtered_data <- data1001_survey_data[data1001_survey_data$social_media_use < 20, ] %>%
  select(social_media_use, friends_count)
SMU_binned_data <- SMU_filtered_data %>%
  mutate(bins = cut(social_media_use, breaks = seq(0, 15, by = 5), right = FALSE))
bin_means <- SMU_binned_data %>%
  group_by(bins) %>%
  summarize(mean_friends = mean(friends_count, na.rm = TRUE)) %>%
  filter(!is.na(bins))

# (b) Responses of 7 hours/day or less, binned in 1-hour intervals: [0,1), ..., [6,7)
SMU_05_filtered_data <- data1001_survey_data[data1001_survey_data$social_media_use <= 7, ] %>%
  select(social_media_use, friends_count)
SMU_05_binned_data <- SMU_05_filtered_data %>%
  mutate(bins = cut(social_media_use, breaks = seq(0, 7, by = 1), right = FALSE))
bin_means_05 <- SMU_05_binned_data %>%
  group_by(bins) %>%
  summarize(mean_friends = mean(friends_count, na.rm = TRUE)) %>%
  filter(!is.na(bins))

# Shared bar-chart styling for both panels
plot_bin_means <- function(means, x_label, panel_title) {
  ggplot(means, aes(x = bins, y = mean_friends)) +
    geom_hline(yintercept = 1:12, color = "#F0F0F0") +
    geom_bar(stat = "identity", color = "black", fill = "lightgrey") +
    theme_classic() +
    theme(text = element_text(size = 12, family = "serif"),
          plot.title = element_text(face = "bold", size = 12, margin = margin(t = 15, b = 15))) +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0)), breaks = seq(0, 12, by = 3)) +
    labs(x = x_label, y = "Mean Number of Friends", title = panel_title)
}

p1 <- plot_bin_means(bin_means, "Social Media Use (Hours/Day)", "(A)")
p2 <- plot_bin_means(bin_means_05, "Hours of Social Media Use (0 to 7 hours)", "(B)")
grid.arrange(p1, p2, ncol = 2)
Figure 4a shows that the mean number of friends decreases from 5 to 15 hours of SMU. Figure 4b shows that the highest mean numbers of friends occur in the 2 to 3 hour and 4 to 5 hour bins. This contradicts the hypothesis that 1-2 hours of daily SMU would correspond with the most friends, instead suggesting that 2 to 3 hours of daily use is associated with the greatest social connection while avoiding possible negative effects of higher daily use.
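The 1-hour binning used for Figure 4b relies on half-open intervals, which affects how boundary values are assigned. The sketch below illustrates this on a few made-up values; the vector `hours` is hypothetical, while the `cut()` arguments are the ones used in this report.

```r
# Hypothetical hours of daily use chosen to sit on and near bin boundaries.
hours <- c(0, 0.5, 1, 2.9, 3, 6.9, 7)

# right = FALSE makes each bin closed on the left and open on the right,
# so a value of exactly 3 falls in [3,4) rather than [2,3).
bins <- cut(hours, breaks = seq(0, 7, by = 1), right = FALSE)

# A value equal to the last break (7) falls outside every bin and becomes NA.
data.frame(hours, bins)
```

Under this scheme, a response of exactly 7 hours is not assigned to any bin and is removed when NA bins are filtered out, so the last bar covers 6 up to, but not including, 7 hours.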
Articles
A cross-sectional study in 2021 showed that the correlation between SMU and anxiety was often insignificant and bidirectional, and most studies had confounding variables (Boer et al., 2021). Our study showed similar results, finding no statistically significant association between SMU and stress.
Another study in 2020 suggested that there was little correlation between SMU and social connectedness (Coyne et al., 2020). However, our current study suggested that 2-3 hours of daily SMU correlated with higher numbers of friends at university. This could be caused by differences in social life in university compared to adolescence, or by the confounding and bidirectionality mentioned in Boer’s study (Boer et al., 2021).
Acknowledgements
Group meetings were held on 28/08, 04/09, and 11/09 from 3 to 4 pm to complete the assignment.
Group member contributions:
Vivian:
Completed assumptions for IDA
Edited IDA to meet word count
Research and writing for articles
Created slides for presentation
Sully:
Executive summary
Coding in R for data cleaning, graphs, and calculations
Research questions 1 and 2
Formatting and publishing Quarto document
Acknowledgements
Susan:
Wrote the structure for IDA section
Rachel:
Source and limitations for IDA section
Helped with presentation slides
Wang:
Helped with coding in R for graphs
Marco:
Wrote the professional standard of report
ChatGPT prompts used to help with coding in Rstudio:
How to split a data frame into bins in Rstudio
How to plot the mean Y value of each bin as a bar graph
How can I remove the “NA” bin from the bar graph
Is there any way to use Rstudio to automatically asses an entire data table to see if any 2 variables have a linear correlation, without using cor() for every variable pair individually
How to exclude NA values using cor_matrix function in Rstudio
How to add gridlines to classic theme ggplot
How can I change the intervals of Y axis labels
How to find the linear correlation of 2 variables without including NA values
References
Maartje Boer, Gonneke W.J.M. Stevens, Catrin Finkenauer, Margaretha E. de Looze, Regina J.J.M. van den Eijnden, Social media use intensity, social media use problems, and mental health among adolescents: Investigating directionality and mediating processes, 2021.
Sarah M. Coyne, Adam A. Rogers, Jessica D. Zurcher, Laura Stockdale, McCall Booth, Does time spent using social media impact mental health?: An eight year longitudinal study, Computers in Human Behavior, Volume 104, 2020.
This study adheres to the ISI professional values by respecting the privacy of participants: data from participants who requested to have their responses excluded was removed from the study. It also adheres to the ethical principles laid out in the ISI declaration of ethics by maintaining an objective stance throughout the study, sourcing data from reliable origins, and referencing all studies and sources mentioned in the report.