Background

The dataset used for this research is the “Student Social Media and Mental Health Impact Dataset” from Kaggle.com (Singh, 2026). The dataset contains behavioral analytics data for 5,000 student records and includes variables that explore the relationship between social media usage, lifestyle habits, and student mental health. For this research question, the columns used are “daily unlocks” (Daily_Unlocks) and “physical activity hours” (Physical_Activity_Hours). Daily unlocks is the estimated number of times a study participant unlocked their device each day. Physical activity hours is the estimated number of hours a study participant engaged in physical activity each day.

## Data Analysis: Data Set

library(tidyverse)
social_media <- read_csv("Social_Media.csv")
str(social_media)
## spc_tbl_ [5,000 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Age                    : num [1:5000] 21 23 22 18 24 19 22 21 24 19 ...
##  $ Gender                 : chr [1:5000] "Male" "Female" "Male" "Male" ...
##  $ Country                : chr [1:5000] "Other" "Other" "Canada" "Other" ...
##  $ Academic_Level         : chr [1:5000] "Undergraduate" "Graduate" "Undergraduate" "High School" ...
##  $ Most_Used_Platform     : chr [1:5000] "Facebook" "LinkedIn" "Instagram" "Snapchat" ...
##  $ Purpose_Of_Use         : chr [1:5000] "Networking" "Education" "Entertainment" "Entertainment" ...
##  $ Avg_Daily_Usage_Hours  : num [1:5000] 4 1.6 4.6 7 7.5 4.1 8 6 4.7 3.6 ...
##  $ Daily_Unlocks          : num [1:5000] 134 73 166 220 237 138 246 209 154 130 ...
##  $ Study_Hours            : num [1:5000] 4.5 7 4 1 1 4.6 1 1 2.4 3.7 ...
##  $ Physical_Activity_Hours: num [1:5000] 2.2 2.4 1.8 1.7 1.1 2.2 0.9 1.3 1.1 1.1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Age = col_double(),
##   ..   Gender = col_character(),
##   ..   Country = col_character(),
##   ..   Academic_Level = col_character(),
##   ..   Most_Used_Platform = col_character(),
##   ..   Purpose_Of_Use = col_character(),
##   ..   Avg_Daily_Usage_Hours = col_double(),
##   ..   Daily_Unlocks = col_double(),
##   ..   Study_Hours = col_double(),
##   ..   Physical_Activity_Hours = col_double()
##   .. )
##  - attr(*, "problems")=<pointer: 0x7f9ff098f310>
dim(social_media)
## [1] 5000   10
head(social_media)
## # A tibble: 6 × 10
##     Age Gender Country Academic_Level Most_Used_Platform Purpose_Of_Use
##   <dbl> <chr>  <chr>   <chr>          <chr>              <chr>         
## 1    21 Male   Other   Undergraduate  Facebook           Networking    
## 2    23 Female Other   Graduate       LinkedIn           Education     
## 3    22 Male   Canada  Undergraduate  Instagram          Entertainment 
## 4    18 Male   Other   High School    Snapchat           Entertainment 
## 5    24 Female Other   Graduate       Facebook           Networking    
## 6    19 Female Other   Undergraduate  Twitter            News          
## # ℹ 4 more variables: Avg_Daily_Usage_Hours <dbl>, Daily_Unlocks <dbl>,
## #   Study_Hours <dbl>, Physical_Activity_Hours <dbl>
tail(social_media)
## # A tibble: 6 × 10
##     Age Gender Country  Academic_Level Most_Used_Platform Purpose_Of_Use
##   <dbl> <chr>  <chr>    <chr>          <chr>              <chr>         
## 1    20 Female Paraguay Undergraduate  TikTok             Entertainment 
## 2    18 Female Other    High School    YouTube            Education     
## 3    18 Female Other    High School    YouTube            Education     
## 4    19 Female India    Undergraduate  LinkedIn           Education     
## 5    18 Male   Other    High School    Instagram          Entertainment 
## 6    19 Male   China    Undergraduate  WeChat             Entertainment 
## # ℹ 4 more variables: Avg_Daily_Usage_Hours <dbl>, Daily_Unlocks <dbl>,
## #   Study_Hours <dbl>, Physical_Activity_Hours <dbl>
summary(social_media)
##       Age              Gender          Country       Academic_Level
##  Min.   :18.00   Length   :5000   Length   :5000   Length   :5000  
##  1st Qu.:19.00   N.unique :   2   N.unique : 111   N.unique :   3  
##  Median :21.00   N.blank  :   0   N.blank  :   0   N.blank  :   0  
##  Mean   :20.82   Min.nchar:   4   Min.nchar:   2   Min.nchar:   8  
##  3rd Qu.:22.00   Max.nchar:   6   Max.nchar:  15   Max.nchar:  13  
##  Max.   :24.00                                                     
##  Most_Used_Platform   Purpose_Of_Use Avg_Daily_Usage_Hours Daily_Unlocks  
##  Length   :5000     Length   :5000   Min.   :1.000         Min.   : 62.0  
##  N.unique :  12     N.unique :   4   1st Qu.:3.800         1st Qu.:140.0  
##  N.blank  :   0     N.blank  :   0   Median :5.000         Median :171.0  
##  Min.nchar:   4     Min.nchar:   4   Mean   :5.078         Mean   :171.5  
##  Max.nchar:   9     Max.nchar:  13   3rd Qu.:6.300         3rd Qu.:204.0  
##                                      Max.   :8.800         Max.   :273.0  
##   Study_Hours    Physical_Activity_Hours
##  Min.   :0.300   Min.   :-0.400         
##  1st Qu.:1.500   1st Qu.: 1.300         
##  Median :2.800   Median : 1.700         
##  Mean   :3.008   Mean   : 1.751         
##  3rd Qu.:4.200   3rd Qu.: 2.200         
##  Max.   :8.300   Max.   : 4.100
nrow(social_media)
## [1] 5000
ncol(social_media)
## [1] 10

## Examining Daily Unlocks and Physical Activity

Unlocks <- social_media$Daily_Unlocks
mean(Unlocks)
## [1] 171.4526
summary(Unlocks)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    62.0   140.0   171.0   171.5   204.0   273.0
# Histogram of daily device unlocks
hist(Unlocks,
     main = "Distribution of Daily Unlocks",
     xlab = "Daily Unlocks",
     ylab = "Frequency",
     col = "steelblue",
     border = "black",
     breaks = 7)

Active_Hours <- social_media$Physical_Activity_Hours
mean(Active_Hours)
## [1] 1.751
summary(Active_Hours)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.400   1.300   1.700   1.751   2.200   4.100
# Histogram of daily physical activity hours
hist(Active_Hours,
     main = "Distribution of Activity Hours",
     xlab = "Physical Activity Hours",
     ylab = "Frequency",
     col = "steelblue",
     border = "black",
     breaks = 7)
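
The summary of Active_Hours above reports a minimum of -0.4 hours, which is not a possible duration. The following is a minimal sketch of how such values might be flagged before further analysis. The vector `active_hours` holds made-up sample values standing in for `social_media$Physical_Activity_Hours`, and recoding negatives to `NA` is only one possible choice, not a step taken in this analysis.

```r
# Hypothetical sample values standing in for social_media$Physical_Activity_Hours
active_hours <- c(2.2, 2.4, -0.4, 1.7, 1.1, -0.1, 0.9)

# Count impossible (negative) durations
n_negative <- sum(active_hours < 0)

# One option (an assumption, not the author's method): treat them as missing
active_hours_clean <- ifelse(active_hours < 0, NA, active_hours)

# Report the count and the mean of the remaining valid values
print(n_negative)
print(mean(active_hours_clean, na.rm = TRUE))
```

Whether to drop, recode, or keep such records depends on how the dataset was generated, so the choice should be stated alongside any results.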

cor(Unlocks, Active_Hours, method = "pearson")
## [1] -0.6028277
plot(Unlocks, Active_Hours)

ggplot(social_media, aes(x = Daily_Unlocks, y = Physical_Activity_Hours)) +
  geom_point(alpha = 0.4) +
  labs(title = "Daily Unlocks vs. Physical Activity Hours",
       x = "Daily Unlocks", y = "Physical Activity Hours") +
  theme_minimal()
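
To put a number on the downward trend visible in the scatterplot, a simple linear regression could be fitted. The sketch below is illustrative only: `unlocks` and `active_hours` are simulated stand-ins for the real columns (the coefficients used to generate them are assumptions), and no such model was fitted in this analysis.

```r
# Simulated stand-ins for Daily_Unlocks and Physical_Activity_Hours
# (hypothetical values; replace with the real columns from social_media)
set.seed(1)
unlocks <- round(runif(150, min = 60, max = 275))
active_hours <- 3.4 - 0.01 * unlocks + rnorm(150, sd = 0.5)

# Fit activity hours as a linear function of daily unlocks
fit <- lm(active_hours ~ unlocks)

# Slope: estimated change in activity hours per additional unlock
slope <- unname(coef(fit)["unlocks"])

# R-squared: share of variance in activity hours explained by unlocks
r_squared <- summary(fit)$r.squared

# For simple linear regression, R-squared equals the squared Pearson r
print(all.equal(r_squared, cor(unlocks, active_hours)^2))
```

With a single predictor, the slope carries the same sign as the Pearson correlation, so this adds an interpretable unit (hours per unlock) rather than a new finding.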


The descriptive statistics and shape of the full dataset were examined first. A Pearson correlation coefficient was then calculated to measure the strength and direction of the linear relationship between daily unlocks and physical activity hours (r = -0.60). Additionally, a scatterplot was used to further describe the relationship between the two variables.
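
The `cor()` call used in this analysis returns only the coefficient. If a significance test and confidence interval are also wanted, base R's `cor.test()` provides them. The sketch below uses simulated data as a stand-in for the two columns; the variable names and the generating values are assumptions made for illustration.

```r
# Simulated stand-ins for Daily_Unlocks and Physical_Activity_Hours
# (hypothetical values; swap in the real columns from social_media)
set.seed(42)
unlocks <- round(rnorm(200, mean = 170, sd = 45))
active_hours <- 3.5 - 0.01 * unlocks + rnorm(200, sd = 0.6)

# cor.test() reports the same r as cor(), plus a t-test p-value
# and a 95% confidence interval for the correlation
result <- cor.test(unlocks, active_hours, method = "pearson")

r_value <- unname(result$estimate)
ci <- result$conf.int
p_value <- result$p.value

cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f]\n", r_value, ci[1], ci[2]))
```

On the real data this would be called as `cor.test(Unlocks, Active_Hours, method = "pearson")`, using the vectors already defined above.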

Conclusion and Future Direction

The examination of daily unlocks and physical activity hours shows a negative relationship (r = -0.60): students who unlock their devices more often tend to report fewer hours of physical activity. This could be meaningful for educational environments that are considering limits on cellphone access, as the finding is consistent with the reasoning behind many of the laws being passed to ban cellphones in schools. Additionally, this study can inform screening in pediatric medical settings that treat childhood conditions linked to inactivity (e.g., childhood obesity). The data can continue to enhance the knowledge of treatment providers and assist in identifying the most appropriate intervention for clients who identify excessive technology usage and inactivity as symptoms.

Singh, S. (2026). Student Social Media & Mental Health Impact, Version 1. Retrieved on June 16, 2026 from https://www.kaggle.com/datasets/shivasingh4945/student-social-media-and-mental-health-impact