This project explores a dataset that investigates how social media usage affects the academic performance, mental health, and daily habits of students from around the world. The dataset contains responses from 705 students across various educational levels and countries, capturing a diverse range of experiences with social media. The dataset includes the following key variables:
Age (Quantitative): The age of the student.
Gender (Categorical): The student’s identified gender.
Academic_Level (Categorical): The current academic stage (e.g., High School, Undergraduate, Graduate).
Country (Categorical): The student’s country of origin.
Avg_Daily_Usage_Hours (Quantitative): Average hours spent on social media each day.
Most_Used_Platform (Categorical): The student’s most frequently used social media platform.
Affects_Academic_Performance (Categorical): Whether the student perceives social media to impact their academic performance.
Sleep_Hours_Per_Night (Quantitative): The average number of hours the student sleeps per night.
Mental_Health_Score (Quantitative): A self-reported score reflecting mental health (scale 1–10).
Relationship_Status (Categorical): Whether the student is single, in a relationship, or in a complicated situation.
Conflicts_Over_Social_Media (Quantitative): Number of conflicts arising from social media use.
Addicted_Score (Quantitative): A score estimating the level of social media addiction (scale 1–10).
Data Source
These were the sources used to conduct the survey:
DataONE. (n.d.). Data Observation Network for Earth. https://www.dataone.org/
Data and Trust Alliance. (n.d.). Data and Trust Alliance. https://dataandtrustalliance.org/
Load Library
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Read your dataset
In this section, we outline the process of importing the dataset for the social media addiction study. The provided code reads the CSV file containing student survey responses, creating the primary dataset that will serve as the basis for all subsequent analyses.
setwd("C:/Users/tosin/OneDrive/Documents")
socialmedia <- read_csv("Students Social Media Addiction.csv")
Rows: 705 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Gender, Academic_Level, Country, Most_Used_Platform, Affects_Academ...
dbl (7): Student_ID, Age, Avg_Daily_Usage_Hours, Sleep_Hours_Per_Night, Ment...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
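The message above suggests declaring the column types explicitly. A minimal sketch of that idea follows, assuming readr 2.x; the inline CSV text and the name toy_csv are hypothetical stand-ins so the example runs without the survey file, and only two of the dataset's columns are shown.

```r
library(readr)

# Hypothetical inline CSV standing in for the survey file (values are made up).
toy_csv <- I("Academic_Level,Avg_Daily_Usage_Hours\nUndergraduate,5.2\nGraduate,4.0\n")

# Declaring col_types makes the parsing explicit and quiets the
# column specification message.
toy <- read_csv(
  toy_csv,
  col_types = cols(
    Academic_Level = col_character(),
    Avg_Daily_Usage_Hours = col_double()
  )
)

toy
```

For the real file, the same col_types argument could be passed alongside the file name; alternatively, show_col_types = FALSE silences the message while keeping automatic type inference.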
Drop rows with missing values
Dropping incomplete rows ensures that the plot only includes complete cases, which prevents errors or misleading visuals caused by missing data.
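The cleaning step described here could look roughly like the sketch below. It uses drop_na() from tidyr on the three columns named in the essay; the toy_survey data and its values are hypothetical, chosen only so the example runs on its own.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the survey data (hypothetical values, not real responses).
toy_survey <- tibble(
  Academic_Level        = c("Undergraduate", "Graduate", NA, "High School"),
  Avg_Daily_Usage_Hours = c(5.2, NA, 4.1, 6.0),
  Mental_Health_Score   = c(6, 7, 5, 5)
)

# Keep only rows that are complete in the three columns used for the plot.
toy_clean <- toy_survey %>%
  drop_na(Avg_Daily_Usage_Hours, Mental_Health_Score, Academic_Level)

# Number of rows removed by the cleaning step.
nrow(toy_survey) - nrow(toy_clean)
```

Applied to the real data, the same call would take socialmedia as input. Naming only the columns of interest avoids discarding rows that are missing values in unrelated columns.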
This section generates a boxplot to examine the distribution of daily social media usage across academic levels.
fill = Academic_Level - colors each box by one of the three academic categories, making it easy to distinguish between student groups.
scale_fill_brewer(palette = "Set2") - replaces the default ggplot2 fill colors with the ColorBrewer "Set2" palette.
labs() - adds a descriptive title, clear axis labels, and a caption so that the visualization is self-explanatory.
ggplot(socialmedia, aes(x = Academic_Level, y = Avg_Daily_Usage_Hours, fill = Academic_Level)) +
  geom_boxplot() +
  labs(
    title = "Social Media Usage by Academic Level",
    x = "Academic Level",
    y = "Average Daily Social Media Use (Hours)",
    caption = "Data Source: Student Social Media Addiction Survey (2025)"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()
Essay
Data Cleaning Process
The dataset was imported using read_csv() from the readr package, which parses the CSV file and infers each column's data type automatically. The working directory was explicitly set using setwd() so that the file could be referenced by name.
Before conducting any analysis, I performed data cleaning to ensure the reliability and clarity of the visualization. Specifically, I focused on Avg_Daily_Usage_Hours, Mental_Health_Score, and Academic_Level. Using the drop_na() function from the tidyr package, I filtered out all rows containing missing values in these columns. The cleaning focused on these three variables because they are the core analytical variables needed for my visualization.
What the Visualization Represents
The final visualization is a boxplot that displays average daily social media usage by academic level. Several things in the data interested me, such as how graduate students show the most erratic usage patterns despite being the most academically advanced group. You might expect them to have more disciplined, consistent habits, but instead they display the widest variation (1.5 to 8 hours).
The biggest surprise to me was the complete absence of any “light users” across all academic levels. There really isn’t a group showing healthy, moderate social media use. I would have expected to see at least one level of education with a median of 2-3 hours per day, or even a large majority of students devoting only 1-2 hours to social media. Instead, even the “lowest” group (graduate students) still has a median of roughly 5 hours per day. This chart provides strong visual support for the survey’s theme: students across all academic levels spend substantial amounts of time on social media, and even the lowest median usage (graduate students at ~5 hours) represents a significant daily time investment that could affect academic performance and well-being.
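The medians and ranges read off the boxplot can also be checked numerically. The sketch below shows one way to do that with group_by() and summarise(); toy_usage and its values are hypothetical and are not the survey results.

```r
library(dplyr)

# Hypothetical usage hours, invented only to make the example runnable.
toy_usage <- tibble(
  Academic_Level = c("Graduate", "Graduate", "Graduate",
                     "Undergraduate", "Undergraduate", "Undergraduate"),
  Avg_Daily_Usage_Hours = c(2.0, 5.0, 8.0, 4.0, 5.0, 6.0)
)

# One row per academic level with the statistics a boxplot summarizes.
usage_summary <- toy_usage %>%
  group_by(Academic_Level) %>%
  summarise(
    median_hours = median(Avg_Daily_Usage_Hours),
    min_hours    = min(Avg_Daily_Usage_Hours),
    max_hours    = max(Avg_Daily_Usage_Hours),
    iqr_hours    = IQR(Avg_Daily_Usage_Hours),
    .groups = "drop"
  )

usage_summary
```

Running the same pipeline on socialmedia would give exact figures to quote alongside the chart instead of values estimated by eye.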
Limitations and Challenges
Although this boxplot shows differences in usage among academic levels, I initially struggled to pick which visualization to use. I explored other options, such as a scatterplot, to show potential relationships between academic level and social media usage. However, this proved difficult to interpret, either because of overplotting and unclear trends or because of a mistake on my part during the coding process.
Example of scatterplots used initially
ggplot(socialmedia, aes(x = Academic_Level, y = Avg_Daily_Usage_Hours, color = Academic_Level)) +
  geom_point() +
  labs(
    title = "Social Media Usage by Academic Level",
    x = "Academic Level",
    y = "Average Daily Social Media Use (Hours)",
    caption = "Data Source: Student Social Media Addiction Survey (2025)"
  ) +
  # Points are mapped to color, so the color scale (not the fill scale) applies the palette
  scale_color_brewer(palette = "Set2") +
  theme_minimal()
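One common way to reduce overplotting when the x-axis is categorical is to jitter the points horizontally and make them semi-transparent. The sketch below illustrates that option; it is not the approach used in this project, and toy_usage with its values is hypothetical.

```r
library(ggplot2)

# Hypothetical data with repeated values, which would overlap under geom_point().
toy_usage <- data.frame(
  Academic_Level = rep(c("High School", "Undergraduate", "Graduate"), each = 4),
  Avg_Daily_Usage_Hours = c(5.5, 5.5, 6.0, 6.0,
                            4.5, 5.0, 5.0, 5.5,
                            3.0, 5.0, 5.0, 7.0)
)

# width spreads points sideways within each category; height = 0 leaves the
# usage values unchanged; alpha makes overlapping points visible as darker spots.
p <- ggplot(toy_usage, aes(x = Academic_Level, y = Avg_Daily_Usage_Hours,
                           color = Academic_Level)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6) +
  scale_color_brewer(palette = "Set2") +
  theme_minimal()

p
```

Even with jitter, a boxplot remains easier to read for comparing medians and spread across groups, which is consistent with the choice made for the final visualization.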