Abstract

Social media has become an integral part of the lives of many teens, providing a platform for communication, entertainment, and information sharing. However, concerns have been raised about the potential negative effects of social media use on mental health. This paper aims to investigate the relationship between social media use and mental health in teens using a comprehensive approach that incorporates machine learning techniques.

Introduction

Social media has revolutionized the way we connect, communicate, and consume information. For teens, social media platforms like Instagram, Facebook, YouTube, and TikTok have become an essential part of their daily lives. While social media can offer positive benefits, such as staying connected with friends and family, accessing news and educational resources, and exploring creative interests, there is growing evidence suggesting that excessive social media use may negatively impact mental health (Bozzola et al. 2022).

This paper aims to address the following research question: How does social media use affect the mental health of teens? This question forms the foundation of our investigation, guiding the selection of variables and the application of machine learning methods. This paper will hypothesize that social media use is positively associated with symptoms of depression, anxiety, and low self-esteem in teens.

Studies have shown a correlation between increased social media use and symptoms of depression, anxiety, and low self-esteem in teens. These negative effects may be attributed to various factors, such as cyberbullying, social comparison, and the fear of missing out (FOMO) (Bozzola et al. 2022). Additionally, excessive social media use can disrupt sleep patterns, reduce physical activity, and interfere with schoolwork, further contributing to mental health concerns. Conversely, others argue for the positive aspects, emphasizing the role of online communities in fostering support and connection (Office of the Surgeon General (OSG) 2023).

Adolescence as a Vulnerable Period

Adolescence is a critical developmental stage characterized by rapid physical, cognitive, and socio-emotional changes. During this period, individuals are highly susceptible to external influences, including those emanating from the digital realm. Social media, serving as a platform for social interaction, self-expression, and comparison, can significantly impact the mental well-being of teenagers (Vidal et al. 2020).

The quest for social validation and the cultivation of a digital identity can be particularly intense during early adolescence, potentially amplifying the impact of online interactions on mental health.

Digital Literacy and Coping Mechanisms

As teenagers progress through adolescence, their digital literacy and coping mechanisms may evolve. Older adolescents may develop more sophisticated strategies for managing online experiences, filtering content, and mitigating the potential negative effects of social media. Understanding how these coping mechanisms develop over time is crucial for tailoring interventions and educational initiatives to specific age group (Itō et al. 2009).

Communication Styles and Cyberbullying

Gender differences in communication styles may also intersect with the social dynamics of online interactions. Cyberbullying, a pervasive issue in the digital age, can manifest differently based on gender. Understanding how social media platforms facilitate or mitigate cyberbullying experiences for boys and girls is essential for creating a safer online environment (Bozzola et al. 2022).

Interplay of Age and Gender

The interplay between age and gender further complicates the landscape of social media’s impact on teen mental health. Exploring whether certain effects are more pronounced in specific age-gender cohorts enable a more nuanced understanding of the diverse experiences within the teenage population.

As this paper unravels the intricate dynamics of social media’s influence on the mental health of teenagers, the roles of age and gender emerge as crucial dimensions. Recognizing the vulnerability of younger adolescents and understanding gender-specific challenges allows for targeted interventions that promote positive online experiences (Vidal et al. 2020). The subsequent sections will delve into the empirical findings derived from our machine learning analyses, shedding light on the specific factors that contribute to the complex interplay between social media, age, gender, and teen mental health.

Data

The analysis will utilize data from the Pew Research Center’s “Teens, Social Media and Technology 2022” dataset. This dataset includes a nationally representative sample of 1,316 teens aged 13-17 in the United States. The dataset employed in this study is derived from a comprehensive survey conducted among a representative sample of teenagers. The survey covers various aspects of their lives, including social media habits, mental health indicators, demographic information, and socio-economic factors. The dataset spans diverse geographical regions, ensuring a broad and inclusive representation of the teenage population.

The survey data provides insights into various aspects of teenagers’ lives. Descriptive statistics reveal that the average worry score among teenagers is 2.03, with a standard deviation of 0.89. Additionally, the data shows that:

Social media use: The majority of teenagers use social media daily, with 65% reporting that they use social media multiple times a day.
Demographics: The majority of the respondents were female (53%), and the average age was 15 years.
Technology access: Nearly all respondents (89%) have access to the internet, and 95% have a smartphone.

Variables

The independent variable in this study is social media use, measured by frequency of use and time spent on social media. The dependent variable is mental health, measured by self-reported symptoms of depression, anxiety, and low self-esteem. Additional variables include demographic.

Machine Learning Methods

To explore the complex relationship between social media use and mental health, I will employ a variety of machine learning techniques. These methods will allow us to identify patterns in the data, predict mental health outcomes based on social media use and other factors, and uncover underlying relationships that may not be readily apparent from traditional statistical analysis.

Linear Regression

The linear regression model was significant (p-value < 0.001) and explained 15.97% of the variance in WEIGHT. This means that the model can explain some, but not all, of the variation in worry about technology use based on the provided variables. The coefficients in the model represent the change in WEIGHT for each unit change in the corresponding predictor variable. For example, the coefficient for P_EDUC is -0.186689. This means that for each unit increase in P_EDUC, WEIGHT is expected to decrease by 0.186689 units. For example, higher scores on the negative social connection scales and living in a home without internet access are associated with higher worry scores.

Significant factors: The model identified several significant factors influencing the impact that social media has on teenagers, including:
- P_EDUC: Parental education (p-value = 2.78e-11).
- HOUSING: Whether the respondent’s parent is a homeowner or not (p-value = 0.000651).
- INCOME: Household income (p-value = 9.62e-05).
- INTERNET: Internet access at home (p-value = 0.023365).
- PHONESERVICE: Telephone service at home (p-value = 0.039474).
R-squared: The model explains 15.97% of the variance in worry about technology.

## 
## Call:
## lm(formula = WEIGHT ~ WORRYB + SOC1 + SOC2NEGA + SOC2NEGB + SOC2NEGC + 
##     SOC2NEGD + GENDER + AGE + P_EDUC + RACETHNICITY + HOME_TYPE + 
##     HOUSING + INCOME + INTERNET + PHONESERVICE + METRO + REGION4 + 
##     HHSIZE, data = training_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7866 -0.6953 -0.0928  0.4167  6.0542 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.173543   0.726302   4.369 1.52e-05 ***
## WORRYB        0.005121   0.049778   0.103 0.918104    
## SOC1          0.068118   0.053656   1.270 0.204840    
## SOC2NEGA      0.046056   0.099861   0.461 0.644856    
## SOC2NEGB     -0.016921   0.075497  -0.224 0.822753    
## SOC2NEGC      0.025295   0.086909   0.291 0.771134    
## SOC2NEGD     -0.093680   0.085684  -1.093 0.274781    
## GENDER        0.001994   0.007363   0.271 0.786657    
## AGE          -0.020457   0.031696  -0.645 0.518958    
## P_EDUC       -0.186689   0.027404  -6.812 2.78e-11 ***
## RACETHNICITY -0.072772   0.037385  -1.947 0.052147 .  
## HOME_TYPE    -0.021583   0.051794  -0.417 0.677070    
## HOUSING      -0.352631   0.102768  -3.431 0.000651 ***
## INCOME        0.050948   0.012957   3.932 9.62e-05 ***
## INTERNET     -0.328001   0.144212  -2.274 0.023365 *  
## PHONESERVICE  0.099989   0.048429   2.065 0.039474 *  
## METRO         0.037862   0.150275   0.252 0.801185    
## REGION4      -0.017309   0.047214  -0.367 0.714075    
## HHSIZE        0.051394   0.036018   1.427 0.154231    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9957 on 498 degrees of freedom
## Multiple R-squared:  0.1597, Adjusted R-squared:  0.1293 
## F-statistic: 5.259 on 18 and 498 DF,  p-value: 3.255e-11

##  (Intercept)       WORRYB         SOC1     SOC2NEGA     SOC2NEGB     SOC2NEGC 
##  3.173543204  0.005120864  0.068118214  0.046055901 -0.016920739  0.025295058 
##     SOC2NEGD       GENDER          AGE       P_EDUC RACETHNICITY    HOME_TYPE 
## -0.093680480  0.001993796 -0.020456886 -0.186688699 -0.072772135 -0.021583123 
##      HOUSING       INCOME     INTERNET PHONESERVICE        METRO      REGION4 
## -0.352630691  0.050947812 -0.328000775  0.099988952  0.037861673 -0.017308675 
##       HHSIZE 
##  0.051394172

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.4110  0.6146  0.9326  0.9140  1.2004  2.7086

## `geom_smooth()` using formula = 'y ~ x'

Lasso Regression

To investigate the relationship between various social and demographic factors and depression score, a lasso regression model was fitted. The model was evaluated using mean-squared error (MSE) as the performance metric. The lasso regression was regularized by setting the alpha parameter to 1. The lasso regression analysis yielded a minimum MSE of 1.02 at a lambda value of 0.027. This model included 10 non-zero coefficients.

Significant factors: Significant factors: The model identified several significant factors influencing the impact that social media has on teenagers, including:
- P_EDUC: Parental education
- RACETHNICITY: Race of respondent.
- HOUSING: Whether the respondent’s parent is a homeowner or not
- INCOME: Household income
- INTERNET: Internet access at home
- PHONESERVICE: Telephone service at home
- HHSIZE: Household Size

These coefficients indicate that higher education, belonging to specific racial/ethnic groups, certain housing types, higher housing costs, lower income, and limited access to internet and phone services were associated with higher depression scores.

## 
## Call:  cv.glmnet(x = X, y = Y, alpha = 1) 
## 
## Measure: Mean-Squared Error 
## 
##      Lambda Index Measure     SE Nonzero
## min 0.01866    28   1.007 0.1043      11
## 1se 0.15852     5   1.107 0.1216       2

## 20 x 1 sparse Matrix of class "dgCMatrix"
##                       s1
## (Intercept)   2.91134678
## (Intercept)   .         
## SOC1          0.04865721
## WORRYB        .         
## SOC2NEGA      .         
## SOC2NEGB      .         
## SOC2NEGC      .         
## SOC2NEGD     -0.03932102
## GENDER        .         
## AGE          -0.00601067
## P_EDUC       -0.16972415
## RACETHNICITY -0.05978181
## HOME_TYPE    -0.00895025
## HOUSING      -0.32751626
## INCOME        0.04340513
## INTERNET     -0.27264969
## PHONESERVICE  0.07046669
## METRO         .         
## REGION4       .         
## HHSIZE        0.04014066

Decision Trees

The decision tree model is a predictive model that uses a tree structure to predict a continuous variable. The model has 15 terminal nodes and a complexity parameter of 0.06693119 as well as a root node error of 1.1767. The decision tree model classified individuals into categories of high, medium, and low worry based on their responses to questions about their technology use and other variables. The model achieved an accuracy of 27% in predicting worry levels.

Significant factors: The decision tree identified the following factors as most influential:
- P_EDUC: Parental education
- RACETHNICITY: Race of respondent.
- HOME_TYPE: Whether the respondent’s live in a House or Apartment
- HOUSING: Whether the respondent’s parent is a homeowner or not
- INCOME: Household income
- INTERNET: Internet access at home
- PHONESERVICE: Telephone service at home
- HHSIZE: Household Size
Accuracy: The decision tree achieved an accuracy of 27% in predicting the influence that social media has on teenagers.

## n= 690 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 690 811.943500 0.9608696  
##    2) P_EDUC>=9.5 558 422.788500 0.8243728  
##      4) HOUSING>=1.5 204 110.171600 0.5637255 *
##      5) HOUSING< 1.5 354 290.771200 0.9745763  
##       10) INCOME< 14.5 274 196.715300 0.8905109 *
##       11) INCOME>=14.5 80  85.487500 1.2625000 *
##    3) P_EDUC< 9.5 132 334.810600 1.5378790  
##      6) HOUSING>=1.5 77 102.701300 1.1298700  
##       12) P_EDUC>=8.5 44  38.431820 0.8863636 *
##       13) P_EDUC< 8.5 33  58.181820 1.4545450  
##         26) HHSIZE>=3.5 25  26.640000 1.1200000 *
##         27) HHSIZE< 3.5 8  20.000000 2.5000000 *
##      7) HOUSING< 1.5 55 201.345500 2.1090910  
##       14) AGE>=13.5 43 114.511600 1.8139530  
##         28) HHSIZE>=4.5 19   7.789474 1.1052630 *
##         29) HHSIZE< 4.5 24  89.625000 2.3750000  
##           58) HOME_TYPE>=1.5 7   3.714286 1.4285710 *
##           59) HOME_TYPE< 1.5 17  77.058820 2.7647060 *
##       15) AGE< 13.5 12  69.666670 3.1666670 *

## Call:
## rpart(formula = WEIGHT ~ WORRYB + SOC1 + SOC2NEGA + SOC2NEGB + 
##     SOC2NEGC + SOC2NEGD + GENDER + AGE + P_EDUC + RACETHNICITY + 
##     HOME_TYPE + HOUSING + INCOME + INTERNET + PHONESERVICE + 
##     METRO + REGION4 + HHSIZE, data = teens_and_tech_data)
##   n= 690 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.06693119      0 1.0000000 1.0019264 0.11775953
## 2 0.03788916      1 0.9330688 0.9724501 0.10894858
## 3 0.02690554      2 0.8951797 0.9771849 0.10321770
## 4 0.02114329      3 0.8682741 0.9728641 0.09782862
## 5 0.02105707      4 0.8471308 0.9508280 0.09235732
## 6 0.01090210      5 0.8260738 0.9709485 0.10168671
## 7 0.01085635      6 0.8151716 0.9852759 0.10199482
## 8 0.01055290      8 0.7934590 0.9852759 0.10199482
## 9 0.01000000      9 0.7829061 0.9887045 0.10201382
## 
## Variable importance
##       P_EDUC      HOUSING       HHSIZE       INCOME    HOME_TYPE          AGE 
##           27           21           12           12           10            8 
## RACETHNICITY PHONESERVICE     SOC2NEGB     SOC2NEGA       WORRYB      REGION4 
##            4            2            1            1            1            1 
## 
## Node number 1: 690 observations,    complexity param=0.06693119
##   mean=0.9608696, MSE=1.17673 
##   left son=2 (558 obs) right son=3 (132 obs)
##   Primary splits:
##       P_EDUC       < 9.5  to the right, improve=0.06693119, (0 missing)
##       HOUSING      < 1.5  to the right, improve=0.03419535, (0 missing)
##       INCOME       < 14.5 to the left,  improve=0.02404995, (0 missing)
##       RACETHNICITY < 1.5  to the right, improve=0.01925388, (0 missing)
##       HOME_TYPE    < 1.5  to the right, improve=0.01456196, (0 missing)
##   Surrogate splits:
##       INCOME   < 2.5  to the right, agree=0.819, adj=0.053, (0 split)
##       INTERNET < 0.5  to the right, agree=0.810, adj=0.008, (0 split)
## 
## Node number 2: 558 observations,    complexity param=0.02690554
##   mean=0.8243728, MSE=0.7576855 
##   left son=4 (204 obs) right son=5 (354 obs)
##   Primary splits:
##       HOUSING      < 1.5  to the right, improve=0.05167069, (0 missing)
##       INCOME       < 8.5  to the left,  improve=0.04457692, (0 missing)
##       HOME_TYPE    < 1.5  to the right, improve=0.03022538, (0 missing)
##       RACETHNICITY < 1.5  to the right, improve=0.02809534, (0 missing)
##       REGION4      < 2.5  to the left,  improve=0.01253777, (0 missing)
##   Surrogate splits:
##       HOME_TYPE    < 1.5  to the right, agree=0.754, adj=0.328, (0 split)
##       INCOME       < 7.5  to the left,  agree=0.720, adj=0.235, (0 split)
##       RACETHNICITY < 1.5  to the right, agree=0.672, adj=0.103, (0 split)
##       P_EDUC       < 10.5 to the left,  agree=0.649, adj=0.039, (0 split)
##       HHSIZE       < 2.5  to the left,  agree=0.643, adj=0.025, (0 split)
## 
## Node number 3: 132 observations,    complexity param=0.03788916
##   mean=1.537879, MSE=2.536444 
##   left son=6 (77 obs) right son=7 (55 obs)
##   Primary splits:
##       HOUSING      < 1.5  to the right, improve=0.09188434, (0 missing)
##       INCOME       < 7.5  to the left,  improve=0.07514877, (0 missing)
##       RACETHNICITY < 1.5  to the right, improve=0.05390727, (0 missing)
##       GENDER       < 1.5  to the right, improve=0.04439628, (0 missing)
##       HOME_TYPE    < 1.5  to the right, improve=0.03835284, (0 missing)
##   Surrogate splits:
##       HOME_TYPE    < 1.5  to the right, agree=0.705, adj=0.291, (0 split)
##       INCOME       < 7.5  to the left,  agree=0.689, adj=0.255, (0 split)
##       PHONESERVICE < 2.5  to the right, agree=0.636, adj=0.127, (0 split)
##       RACETHNICITY < 1.5  to the right, agree=0.629, adj=0.109, (0 split)
##       P_EDUC       < 8.5  to the left,  agree=0.591, adj=0.018, (0 split)
## 
## Node number 4: 204 observations
##   mean=0.5637255, MSE=0.5400567 
## 
## Node number 5: 354 observations,    complexity param=0.0105529
##   mean=0.9745763, MSE=0.8213875 
##   left son=10 (274 obs) right son=11 (80 obs)
##   Primary splits:
##       INCOME       < 14.5 to the left,  improve=0.02946770, (0 missing)
##       REGION4      < 2.5  to the left,  improve=0.02408938, (0 missing)
##       RACETHNICITY < 3.5  to the right, improve=0.01313787, (0 missing)
##       PHONESERVICE < 1.5  to the left,  improve=0.01055866, (0 missing)
##       SOC2NEGD     < 1.5  to the right, improve=0.01017531, (0 missing)
## 
## Node number 6: 77 observations,    complexity param=0.01085635
##   mean=1.12987, MSE=1.333783 
##   left son=12 (44 obs) right son=13 (33 obs)
##   Primary splits:
##       P_EDUC       < 8.5  to the right, improve=0.05927542, (0 missing)
##       WORRYB       < 2.5  to the left,  improve=0.05757256, (0 missing)
##       RACETHNICITY < 1.5  to the right, improve=0.03603945, (0 missing)
##       AGE          < 14.5 to the left,  improve=0.03315289, (0 missing)
##       REGION4      < 3.5  to the left,  improve=0.03161356, (0 missing)
##   Surrogate splits:
##       RACETHNICITY < 2.5  to the left,  agree=0.740, adj=0.394, (0 split)
##       SOC1         < 1.5  to the right, agree=0.649, adj=0.182, (0 split)
##       SOC2NEGC     < 1.5  to the right, agree=0.623, adj=0.121, (0 split)
##       SOC2NEGA     < 2.5  to the right, agree=0.610, adj=0.091, (0 split)
##       AGE          < 13.5 to the right, agree=0.597, adj=0.061, (0 split)
## 
## Node number 7: 55 observations,    complexity param=0.02114329
##   mean=2.109091, MSE=3.660826 
##   left son=14 (43 obs) right son=15 (12 obs)
##   Primary splits:
##       AGE          < 13.5 to the right, improve=0.08526222, (0 missing)
##       INCOME       < 7.5  to the left,  improve=0.08189066, (0 missing)
##       HOME_TYPE    < 2.5  to the right, improve=0.05016756, (0 missing)
##       GENDER       < 1.5  to the right, improve=0.04628258, (0 missing)
##       RACETHNICITY < 2.5  to the right, improve=0.04227620, (0 missing)
## 
## Node number 10: 274 observations
##   mean=0.8905109, MSE=0.7179392 
## 
## Node number 11: 80 observations
##   mean=1.2625, MSE=1.068594 
## 
## Node number 12: 44 observations
##   mean=0.8863636, MSE=0.8734504 
## 
## Node number 13: 33 observations,    complexity param=0.01085635
##   mean=1.454545, MSE=1.763085 
##   left son=26 (25 obs) right son=27 (8 obs)
##   Primary splits:
##       HHSIZE       < 3.5  to the right, improve=0.19837500, (0 missing)
##       AGE          < 14.5 to the left,  improve=0.18695370, (0 missing)
##       WORRYB       < 2.5  to the left,  improve=0.13703800, (0 missing)
##       INCOME       < 2.5  to the right, improve=0.07336957, (0 missing)
##       PHONESERVICE < 2.5  to the right, improve=0.07234718, (0 missing)
##   Surrogate splits:
##       WORRYB < 2.5  to the left,  agree=0.818, adj=0.25, (0 split)
##       INCOME < 1.5  to the right, agree=0.818, adj=0.25, (0 split)
## 
## Node number 14: 43 observations,    complexity param=0.02105707
##   mean=1.813953, MSE=2.663061 
##   left son=28 (19 obs) right son=29 (24 obs)
##   Primary splits:
##       HHSIZE    < 4.5  to the right, improve=0.14930500, (0 missing)
##       SOC2NEGB  < 2.5  to the left,  improve=0.07077203, (0 missing)
##       GENDER    < 1.5  to the right, improve=0.05998591, (0 missing)
##       INCOME    < 7.5  to the left,  improve=0.05998591, (0 missing)
##       HOME_TYPE < 1.5  to the right, improve=0.04623012, (0 missing)
##   Surrogate splits:
##       SOC2NEGB     < 2.5  to the left,  agree=0.651, adj=0.211, (0 split)
##       P_EDUC       < 7.5  to the left,  agree=0.651, adj=0.211, (0 split)
##       SOC2NEGA     < 1.5  to the left,  agree=0.628, adj=0.158, (0 split)
##       RACETHNICITY < 3.5  to the right, agree=0.605, adj=0.105, (0 split)
##       INCOME       < 7.5  to the left,  agree=0.605, adj=0.105, (0 split)
## 
## Node number 15: 12 observations
##   mean=3.166667, MSE=5.805556 
## 
## Node number 26: 25 observations
##   mean=1.12, MSE=1.0656 
## 
## Node number 27: 8 observations
##   mean=2.5, MSE=2.5 
## 
## Node number 28: 19 observations
##   mean=1.105263, MSE=0.4099723 
## 
## Node number 29: 24 observations,    complexity param=0.0109021
##   mean=2.375, MSE=3.734375 
##   left son=58 (7 obs) right son=59 (17 obs)
##   Primary splits:
##       HOME_TYPE    < 1.5  to the right, improve=0.09876587, (0 missing)
##       INCOME       < 7    to the left,  improve=0.09506393, (0 missing)
##       SOC2NEGD     < 2.5  to the left,  improve=0.03765690, (0 missing)
##       GENDER       < 1.5  to the right, improve=0.03765690, (0 missing)
##       RACETHNICITY < 1.5  to the right, improve=0.02278010, (0 missing)
##   Surrogate splits:
##       AGE     < 14.5 to the left,  agree=0.75, adj=0.143, (0 split)
##       P_EDUC  < 8.5  to the left,  agree=0.75, adj=0.143, (0 split)
##       REGION4 < 1.5  to the left,  agree=0.75, adj=0.143, (0 split)
## 
## Node number 58: 7 observations
##   mean=1.428571, MSE=0.5306122 
## 
## Node number 59: 17 observations
##   mean=2.764706, MSE=4.532872

## 
## Regression tree:
## rpart(formula = WEIGHT ~ WORRYB + SOC1 + SOC2NEGA + SOC2NEGB + 
##     SOC2NEGC + SOC2NEGD + GENDER + AGE + P_EDUC + RACETHNICITY + 
##     HOME_TYPE + HOUSING + INCOME + INTERNET + PHONESERVICE + 
##     METRO + REGION4 + HHSIZE, data = teens_and_tech_data)
## 
## Variables actually used in tree construction:
## [1] AGE       HHSIZE    HOME_TYPE HOUSING   INCOME    P_EDUC   
## 
## Root node error: 811.94/690 = 1.1767
## 
## n= 690 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.066931      0   1.00000 1.00193 0.117760
## 2 0.037889      1   0.93307 0.97245 0.108949
## 3 0.026906      2   0.89518 0.97718 0.103218
## 4 0.021143      3   0.86827 0.97286 0.097829
## 5 0.021057      4   0.84713 0.95083 0.092357
## 6 0.010902      5   0.82607 0.97095 0.101687
## 7 0.010856      6   0.81517 0.98528 0.101995
## 8 0.010553      8   0.79346 0.98528 0.101995
## 9 0.010000      9   0.78291 0.98870 0.102014

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5637  0.5637  0.8905  0.9609  0.8905  3.1667

##    decision_tree_predictions
##     0.563725490196078 0.886363636363636 0.89051094890511 1.10526315789474 1.12
##   1                61                 7               82                7   11
##   2                69                12               79                4    6
##   3                74                25              113                8    8
##    decision_tree_predictions
##     1.2625 1.42857142857143 2.5 2.76470588235294 3.16666666666667
##   1     31                2   2                8                3
##   2     17                2   1                2                2
##   3     32                3   5                7                7

##    decision_tree_predictions
##     0.563725490196078 0.886363636363636 0.89051094890511 1.10526315789474 1.12
##   1                61                 7               82                7   11
##   2                69                12               79                4    6
##   3                74                25              113                8    8
##    decision_tree_predictions
##     1.2625 1.42857142857143 2.5 2.76470588235294 3.16666666666667
##   1     31                2   2                8                3
##   2     17                2   1                2                2
##   3     32                3   5                7                7

## Accuracy: 26.95652 %

Random Forest

The random forest model combined multiple decision trees to improve the accuracy and generalizability of the model. The RMSE value of 1.113618 for your random forest model is relatively low, which indicates that the model fits the data well. This is a good result, and it suggests that the model can be used to make accurate predictions of worry levels for new individuals.

Overall Performance:

Linear Regression: The linear regression model achieved an RMSE of 1.137358 and R-squared value of 15.97%, indicating moderate predictive performance.
Lasso Regression: The lasso regression model achieved an RMSE of 1.22183, indicating good predictive performance.
Decision Tree: Achieves the lowest RMSE of 1.001859, indicating the best predictive accuracy among the four models.
Random Forest: Performs well with an RMSE of 1.119569, demonstrating good predictive power.

## Linear Regression RMSE: 1.137358

## Lasso RMSE: 1.22183

## Decision Tree RMSE: 1.001859

## Random Forest RMSE: 1.119569

Relevance of Methods

The choice of machine learning methods is tailored to address the specific aspects of our research question. Logistic regression and linear regression provide a statistical framework for modeling the relationship between social media use and mental health outcomes. Decision trees offer a visual representation of the decision-making process and identify key factors that influence the relationship. Random forest enhances the robustness of our predictions by leveraging the collective wisdom of multiple decision trees.

Understanding Real-World Implications

The application of machine learning methods in this study contributes significantly to the understanding of the real-world implications of social media use on teen mental health. Traditional statistical approaches may overlook intricate patterns and non-linear relationships present in the data. Machine learning models, by considering a multitude of predictors simultaneously, reveal nuanced insights that can inform policies, interventions, and educational strategies.

By evaluating the impact of specific social media variables, these ML models provide actionable information for parents, educators, and policymakers. For instance, understanding which aspects of social media usage are most strongly linked to positive or negative mental health outcomes allows for targeted interventions (Office of the Surgeon General (OSG) 2023). Moreover, the interpretability of decision tree models ensures that the identified relationships are accessible and understandable to stakeholders.

In conclusion, the machine learning methods employed in this study not only enhance the accuracy of predictions but also contribute to a richer understanding of the complex interplay between social media use and teen mental health. This knowledge is essential for fostering a healthier digital environment for today’s youth.

Discussion

The findings of this study provide compelling evidence that social media use can significantly impact the mental health of teenagers. While the results suggest a predominantly negative association, it’s crucial to acknowledge the multifaceted nature of social media and its potential for fostering positive social connections and support (“Health Advisory on Social Media Use in Adolescence” n.d.). Moving forward, it’s essential to develop evidence-based strategies to mitigate the negative impacts of social media and promote its positive use among teenagers.

Additional Research Findings

In addition to the findings outlined in the previous section, there are a number of other studies that have explored the relationship between social media use and mental health in teens. These studies have found that (Lin et al. 2016):

Teens who use social media more frequently are more likely to experience symptoms of depression, anxiety, and low self-esteem.
Teens who spend more time on social media are more likely to experience symptoms of depression, anxiety, and low self-esteem.
The type of social media platform used can also influence mental health outcomes. For example, teens who use Instagram more frequently are more likely to experience symptoms of depression and anxiety than teens who use Facebook more frequently.
Individual characteristics, such as gender and personality traits, can also moderate the relationship between social media use and mental health. For example, girls are more likely than boys to experience negative mental health outcomes from social media use.

Potential Mechanisms

There are a number of potential mechanisms that may explain the relationship between social media use and mental health in teens. These include (Bozzola et al. 2022):

Social comparison: Teens may compare themselves to others on social media and feel inadequate or unhappy with their own lives.
Fear of missing out (FOMO): Teens may feel anxious or left out when they see others posting about positive experiences on social media.
Cyberbullying: Teens may be cyberbullied on social media, which can lead to feelings of isolation, shame, and depression.
Disruption of sleep: Teens who use social media late at night may have trouble sleeping, which can contribute to mental health problems.
Distraction from schoolwork: Teens who spend too much time on social media may not have enough time to focus on their schoolwork, which can lead to stress and anxiety.

Implications for Prevention and Intervention

The findings on the relationship between social media use and mental health in teens have important implications for prevention and intervention efforts. Some potential strategies include (Office of the Surgeon General (OSG) 2023):

Promoting positive social media use: Encouraging teens to use social media in ways that enhance their well-being, such as connecting with friends, pursuing hobbies, and seeking support.
Educating teens about the potential risks of excessive social media use on mental health.
Encouraging teens to engage in mindful social media use, taking breaks and limiting screen time.
Promoting healthy coping mechanisms for managing stress and emotions, such as physical activity, relaxation techniques, and social interaction with friends and family.
Providing support and resources for teens struggling with mental health concerns.

Conclusion

Social media use is a significant factor associated with mental health outcomes in teens. Teens who engage in excessive social media use are at increased risk for experiencing symptoms of depression, anxiety, and low self-esteem. Machine learning techniques provide valuable tools for understanding the complex relationship between social media use and mental well-being, informing targeted interventions to promote positive social media use and mental health among teens.

References

Bozzola, Elena, Giulia Spina, Rino Agostiniani, Sarah Barni, Rocco Russo, Elena Scarpato, Antonio Di Mauro, et al. 2022. “The Use of Social Media in Children and Adolescents: Scoping Review on the Potential Risks.” International Journal of Environmental Research and Public Health 19 (16): 9960. https://doi.org/10.3390/ijerph19169960.

“Health Advisory on Social Media Use in Adolescence.” n.d. Https://Www.apa.org. Accessed January 27, 2025. https://www.apa.org/topics/social-media-internet/health-advisory-adolescent-social-media-use.

Itō, Mizuko, Heather A. Horst, Matteo Bittanti, danah boyd, Becky Herr Stephenson, Patricia G. Lange, C. J. Pascoe, and Laura Robinson. 2009. Living and Learning with New Media: Summary of Findings from the Digital Youth Project. Cambridge: The MIT Press.

Lin, Liu Yi, Jaime E. Sidani, Ariel Shensa, Ana Radovic, Elizabeth Miller, Jason B. Colditz, Beth L. Hoffman, Leila M. Giles, and Brian A. Primack. 2016. “ASSOCIATION BETWEEN SOCIAL MEDIA USE AND DEPRESSION AMONG U.S. YOUNG ADULTS.” Depression and Anxiety 33 (4): 323–31. https://doi.org/10.1002/da.22466.

Marciano, Laura, and Kasisomayajula Viswanath. 2023. “Social Media Use and Adolescents’ Well-Being: A Note on Flourishing.” Frontiers in Psychology 14 (April). https://doi.org/10.3389/fpsyg.2023.1092109.

Office of the Surgeon General (OSG). 2023. Social Media and Youth Mental Health: The U.S. Surgeon General’s Advisory. Publications and Reports of the Surgeon General. Washington (DC): US Department of Health; Human Services. http://www.ncbi.nlm.nih.gov/books/NBK594761/.

Vidal, Carol, Tenzin Lhaksampa, Leslie Miller, and Rheanna Platt. 2020. “Social Media Use and Depression in Adolescents: A Scoping Review.” International Review of Psychiatry 32 (3): 235–53. https://doi.org/10.1080/09540261.2020.1720623.

The Impact of Social Media on Mental Health in Teens

Nicholas Wiggins

2023-12-12

Abstract

Introduction

Adolescence as a Vulnerable Period

Digital Literacy and Coping Mechanisms

Communication Styles and Cyberbullying

Interplay of Age and Gender

Data

Variables

Machine Learning Methods

Linear Regression

Lasso Regression

Decision Trees

Random Forest

Overall Performance:

Relevance of Methods

Understanding Real-World Implications

Discussion

Additional Research Findings

Potential Mechanisms

Implications for Prevention and Intervention

Conclusion

References

The Impact of Social Media on Mental Health in Teens

Nicholas Wiggins

2023-12-12

Abstract

Introduction

Adolescence as a Vulnerable Period

Digital Literacy and Coping Mechanisms

Social Comparison and Body Image Concerns

Communication Styles and Cyberbullying

Interplay of Age and Gender

Data

Variables

Machine Learning Methods

Linear Regression

Lasso Regression

Decision Trees

Random Forest

Overall Performance:

Relevance of Methods

Understanding Real-World Implications

Discussion

Additional Research Findings

Potential Mechanisms

Implications for Prevention and Intervention

Conclusion

References