The dataset used for this research is the “Student Social Media and Mental Health Impact Dataset” from Kaggle (Singh, 2026). It contains behavioral analytics data for 5,000 student records and includes variables that explore the relationship between social media usage, lifestyle habits, and student mental health. For this research question, the two columns used are “daily unlocks” (Daily_Unlocks) and “physical activity hours” (Physical_Activity_Hours). Daily unlocks is the estimated number of times the study participant unlocked their device each day. Physical activity hours is the estimated number of hours the study participant spent on physical activity each day.
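Because both columns are estimates, it may be worth confirming that their values fall in a plausible range before any analysis. The summary() output for this dataset reports a minimum of -0.4 for Physical_Activity_Hours, which cannot be a real duration. The sketch below is one possible check and is not part of the original analysis. It uses the first ten rows shown in the str() output plus one hypothetical row (marked in the comments) that mirrors the reported minimum. The helper name `flag_implausible` and the 0 to 24 hour bounds are assumptions made for illustration.

```r
# First ten rows of the two analysis columns, copied from the str() output.
# The eleventh row is hypothetical: it reuses the reported minimum of -0.4
# hours (with a made-up unlock count) so that the check has something to flag.
sample_rows <- data.frame(
  Daily_Unlocks = c(134, 73, 166, 220, 237, 138, 246, 209, 154, 130, 150),
  Physical_Activity_Hours = c(2.2, 2.4, 1.8, 1.7, 1.1, 2.2, 0.9, 1.3, 1.1, 1.1, -0.4)
)

# Hypothetical helper: TRUE for rows whose values are missing or fall outside
# a plausible range (unlock counts below 0, or hours outside 0 to 24).
flag_implausible <- function(df) {
  is.na(df$Daily_Unlocks) | is.na(df$Physical_Activity_Hours) |
    df$Daily_Unlocks < 0 |
    df$Physical_Activity_Hours < 0 |
    df$Physical_Activity_Hours > 24
}

flagged <- flag_implausible(sample_rows)
sum(flagged)                           # number of rows that fail the check
clean_rows <- sample_rows[!flagged, ]  # keep only the plausible rows
nrow(clean_rows)                       # rows kept for analysis
```

Applied to the full data, the same function would take `social_media` as its argument. Whether flagged rows should be dropped, corrected, or kept is a judgment call that this sketch does not settle.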
##Data Analysis-Data Set
library(tidyverse)
social_media <- read_csv("Social_Media.csv")
str(social_media)
## spc_tbl_ [5,000 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Age : num [1:5000] 21 23 22 18 24 19 22 21 24 19 ...
## $ Gender : chr [1:5000] "Male" "Female" "Male" "Male" ...
## $ Country : chr [1:5000] "Other" "Other" "Canada" "Other" ...
## $ Academic_Level : chr [1:5000] "Undergraduate" "Graduate" "Undergraduate" "High School" ...
## $ Most_Used_Platform : chr [1:5000] "Facebook" "LinkedIn" "Instagram" "Snapchat" ...
## $ Purpose_Of_Use : chr [1:5000] "Networking" "Education" "Entertainment" "Entertainment" ...
## $ Avg_Daily_Usage_Hours : num [1:5000] 4 1.6 4.6 7 7.5 4.1 8 6 4.7 3.6 ...
## $ Daily_Unlocks : num [1:5000] 134 73 166 220 237 138 246 209 154 130 ...
## $ Study_Hours : num [1:5000] 4.5 7 4 1 1 4.6 1 1 2.4 3.7 ...
## $ Physical_Activity_Hours: num [1:5000] 2.2 2.4 1.8 1.7 1.1 2.2 0.9 1.3 1.1 1.1 ...
## - attr(*, "spec")=
## .. cols(
## .. Age = col_double(),
## .. Gender = col_character(),
## .. Country = col_character(),
## .. Academic_Level = col_character(),
## .. Most_Used_Platform = col_character(),
## .. Purpose_Of_Use = col_character(),
## .. Avg_Daily_Usage_Hours = col_double(),
## .. Daily_Unlocks = col_double(),
## .. Study_Hours = col_double(),
## .. Physical_Activity_Hours = col_double()
## .. )
## - attr(*, "problems")=<pointer: 0x7f9ff098f310>
dim(social_media)
## [1] 5000 10
head(social_media)
## # A tibble: 6 × 10
## Age Gender Country Academic_Level Most_Used_Platform Purpose_Of_Use
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 21 Male Other Undergraduate Facebook Networking
## 2 23 Female Other Graduate LinkedIn Education
## 3 22 Male Canada Undergraduate Instagram Entertainment
## 4 18 Male Other High School Snapchat Entertainment
## 5 24 Female Other Graduate Facebook Networking
## 6 19 Female Other Undergraduate Twitter News
## # ℹ 4 more variables: Avg_Daily_Usage_Hours <dbl>, Daily_Unlocks <dbl>,
## # Study_Hours <dbl>, Physical_Activity_Hours <dbl>
tail(social_media)
## # A tibble: 6 × 10
## Age Gender Country Academic_Level Most_Used_Platform Purpose_Of_Use
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 20 Female Paraguay Undergraduate TikTok Entertainment
## 2 18 Female Other High School YouTube Education
## 3 18 Female Other High School YouTube Education
## 4 19 Female India Undergraduate LinkedIn Education
## 5 18 Male Other High School Instagram Entertainment
## 6 19 Male China Undergraduate WeChat Entertainment
## # ℹ 4 more variables: Avg_Daily_Usage_Hours <dbl>, Daily_Unlocks <dbl>,
## # Study_Hours <dbl>, Physical_Activity_Hours <dbl>
summary(social_media)
## Age Gender Country Academic_Level
## Min. :18.00 Length :5000 Length :5000 Length :5000
## 1st Qu.:19.00 N.unique : 2 N.unique : 111 N.unique : 3
## Median :21.00 N.blank : 0 N.blank : 0 N.blank : 0
## Mean :20.82 Min.nchar: 4 Min.nchar: 2 Min.nchar: 8
## 3rd Qu.:22.00 Max.nchar: 6 Max.nchar: 15 Max.nchar: 13
## Max. :24.00
## Most_Used_Platform Purpose_Of_Use Avg_Daily_Usage_Hours Daily_Unlocks
## Length :5000 Length :5000 Min. :1.000 Min. : 62.0
## N.unique : 12 N.unique : 4 1st Qu.:3.800 1st Qu.:140.0
## N.blank : 0 N.blank : 0 Median :5.000 Median :171.0
## Min.nchar: 4 Min.nchar: 4 Mean :5.078 Mean :171.5
## Max.nchar: 9 Max.nchar: 13 3rd Qu.:6.300 3rd Qu.:204.0
## Max. :8.800 Max. :273.0
## Study_Hours Physical_Activity_Hours
## Min. :0.300 Min. :-0.400
## 1st Qu.:1.500 1st Qu.: 1.300
## Median :2.800 Median : 1.700
## Mean :3.008 Mean : 1.751
## 3rd Qu.:4.200 3rd Qu.: 2.200
## Max. :8.300 Max. : 4.100
nrow(social_media)
## [1] 5000
ncol(social_media)
## [1] 10
##Examining Daily Unlocks and Physical Activity
Unlocks <- social_media$Daily_Unlocks
mean(Unlocks)
## [1] 171.4526
summary(Unlocks)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 62.0 140.0 171.0 171.5 204.0 273.0
hist(Unlocks,
     main = "Distribution of Daily Unlocks",
     xlab = "Daily Unlocks",
     ylab = "Frequency",
     col = "steelblue",
     border = "black",
     breaks = 7)
Active_Hours <- social_media$Physical_Activity_Hours
mean(Active_Hours)
## [1] 1.751
summary(Active_Hours)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.400 1.300 1.700 1.751 2.200 4.100
hist(Active_Hours,
     main = "Distribution of Physical Activity Hours",
     xlab = "Physical Activity Hours",
     ylab = "Frequency",
     col = "steelblue",
     border = "black",
     breaks = 7)
cor(Unlocks, Active_Hours, method = "pearson")
## [1] -0.6028277
plot(Unlocks, Active_Hours)
ggplot(social_media, aes(x = Daily_Unlocks, y = Physical_Activity_Hours)) +
  geom_point(alpha = 0.4) +
  labs(title = "Daily Unlocks vs. Physical Activity Hours",
       x = "Daily Unlocks", y = "Physical Activity Hours") +
  theme_minimal()
The descriptive statistics and shape of the full dataset were examined first. A Pearson correlation coefficient was then calculated to measure the strength and direction of the linear relationship between daily unlocks and physical activity hours (r = -0.60). Additionally, a scatterplot was used to further describe the relationship between the two variables.
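The analysis computes the correlation with cor(), which returns the coefficient only. If a formal test is wanted, base R's cor.test() also reports a t statistic, a p-value, and a confidence interval. The following is a minimal sketch, assuming a two-sided test at the default 95% level is appropriate. To stay self-contained it runs on the first ten rows shown in the str() output, so its numbers will differ from the full-sample results.

```r
# First ten observations of each variable, copied from the str() output.
unlocks_sample <- c(134, 73, 166, 220, 237, 138, 246, 209, 154, 130)
active_sample  <- c(2.2, 2.4, 1.8, 1.7, 1.1, 2.2, 0.9, 1.3, 1.1, 1.1)

# Pearson correlation test: the null hypothesis is that the true
# correlation between the two variables equals zero.
test_result <- cor.test(unlocks_sample, active_sample,
                        method = "pearson",
                        alternative = "two.sided",
                        conf.level = 0.95)

r_sample <- unname(test_result$estimate)  # sample correlation coefficient
r_sample^2                                # share of variance the two variables have in common
test_result$p.value                       # p-value for the null hypothesis
test_result$conf.int                      # 95% confidence interval for the correlation
```

On the full dataset the equivalent call would be `cor.test(Unlocks, Active_Hours)`. Squaring the reported full-sample coefficient, (-0.6028)^2 ≈ 0.36, suggests the two variables share roughly 36% of their variance.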
Conclusion and Future Direction
The examination of daily unlocks and physical activity hours shows a negative relationship: students who unlock their devices more often tend to report fewer hours of physical activity. This could matter for educational environments that are considering limits on cellphone access, as the finding is consistent with the rationale behind many of the laws being passed to ban cellphones in schools. Additionally, this study can inform screening in pediatric medical settings that treat childhood conditions linked to inactivity (e.g., childhood obesity). The data can continue to build the knowledge of treatment providers and help them identify the most appropriate intervention for clients who report excessive technology usage and inactivity as symptoms.
Singh, S. (2026). Student Social Media & Mental Health Impact (Version 1) [Data set]. Kaggle. Retrieved June 16, 2026, from https://www.kaggle.com/datasets/shivasingh4945/student-social-media-and-mental-health-impact