library(rvest)
library(tidyverse)
library(plotly)
library(webshot2)
library(ggplot2)
library(ggfortify)
DATA 110
Introduction
The dataset we are using today comes from a study of students' social media addiction. It was collected by dataandtrustalliance.org at universities around the world, targeting students aged 16-25, with the aim of broad geographic coverage and respondent anonymity.
For this project, we will focus our analysis on the following variables:
Age (num): Student's age
Gender (cat): Male or Female
Country (cat): Country of residence
Average daily usage (num): The average number of hours each student spends on social media per day
Sleep hours (num): Number of hours the student sleeps per night
Mental health score (num): A score from 1 to 10
Most used platform (cat): For example Instagram, Snapchat, or TikTok
Addiction score (num): A score from 1 to 10 that indicates how addicted a student is to social media
The questions we would like to explore with this dataset are:
How does the number of sleep hours relate to mental health?
Is there a correlation between hours of sleep and daily social media usage?
Is there a relationship between hours of sleep and the addiction score?
What is the most addictive social media platform?
I chose this topic and this dataset because student social media addiction is a growing problem, with a potential impact on mental health and academic performance.
Let's start by loading our libraries (see the library() calls at the top of this document).
Now let's import our dataset.
setwd("C:/Users/steve/OneDrive/Desktop/DATA 110/WEEK6")
Social_media_addiction <- read_csv("Students Social Media Addiction.csv")
Rows: 705 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Gender, Academic_Level, Country, Most_Used_Platform, Affects_Academ...
dbl (7): Student_ID, Age, Avg_Daily_Usage_Hours, Sleep_Hours_Per_Night, Ment...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
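The message above suggests declaring column types explicitly. A minimal sketch of that idea follows, assuming a small stand-in file so the example runs on its own: the temporary CSV and the name `demo` are hypothetical, and its three rows are copied from the head() output shown later in this document rather than read from the real file.

```r
library(readr)

# Stand-in file (an assumption for this sketch): three rows copied from the
# head() output in this document, written to a temporary CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Student_ID,Age,Gender,Avg_Daily_Usage_Hours",
  "1,19,Female,5.2",
  "2,22,Male,2.1",
  "3,20,Female,6"
), tmp)

# Declaring col_types makes the expected types explicit and
# suppresses the column specification message.
demo <- read_csv(
  tmp,
  col_types = cols(
    Student_ID = col_double(),
    Age = col_double(),
    Gender = col_character(),
    Avg_Daily_Usage_Hours = col_double()
  )
)

demo
```

The same `col_types` argument could be passed when reading "Students Social Media Addiction.csv", listing all 13 columns; passing `show_col_types = FALSE` instead only hides the message.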
Cleaning the dataset
Now we will create a new subset that keeps only the variables we need for our analysis.
social_media_addiction2 <- Social_media_addiction |>
  # drop the columns we will not use
  select(-Country, -Academic_Level, -Relationship_Status, -Conflicts_Over_Social_Media)
head(social_media_addiction2)
# A tibble: 6 × 9
Student_ID Age Gender Avg_Daily_Usage_Hours Most_Used_Platform
<dbl> <dbl> <chr> <dbl> <chr>
1 1 19 Female 5.2 Instagram
2 2 22 Male 2.1 Twitter
3 3 20 Female 6 TikTok
4 4 18 Male 3 YouTube
5 5 21 Male 4.5 Facebook
6 6 19 Female 7.2 Instagram
# ℹ 4 more variables: Affects_Academic_Performance <chr>,
# Sleep_Hours_Per_Night <dbl>, Mental_Health_Score <dbl>,
# Addicted_Score <dbl>
For this study we will focus only on students who reported that their academic results were affected by excessive use of social media. Let's filter our dataset to reflect that (the optional sort by age is left commented out).
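social_media_addiction3 <- social_media_addiction2 |>
  # keep only students whose academic performance is affected
  filter(Affects_Academic_Performance == "Yes")
# arrange(Age)  # optional: sort by age (needs a |> after filter() to take effect)
head(social_media_addiction3)
# A tibble: 6 × 9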
Student_ID Age Gender Avg_Daily_Usage_Hours Most_Used_Platform
<dbl> <dbl> <chr> <dbl> <chr>
1 1 19 Female 5.2 Instagram
2 3 20 Female 6 TikTok
3 5 21 Male 4.5 Facebook
4 6 19 Female 7.2 Instagram
5 8 20 Female 5.8 Snapchat
6 11 19 Male 4.8 Snapchat
# ℹ 4 more variables: Affects_Academic_Performance <chr>,
# Sleep_Hours_Per_Night <dbl>, Mental_Health_Score <dbl>,
# Addicted_Score <dbl>
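The filter step above can be chained with a sort by age. The following is a minimal sketch, assuming a small stand-in table: `demo` and `affected_sorted` are hypothetical names, the rows are the first six students from the head() output earlier in this document, and the "No" values are an assumption (the output only shows that students 2 and 4 did not pass the filter).

```r
library(dplyr)

# First six students from the head() output in this document.
# The "No" values are assumed: the document only shows that students 2 and 4
# are missing from the filtered result.
demo <- tibble(
  Student_ID = 1:6,
  Age = c(19, 22, 20, 18, 21, 19),
  Avg_Daily_Usage_Hours = c(5.2, 2.1, 6, 3, 4.5, 7.2),
  Affects_Academic_Performance = c("Yes", "No", "Yes", "No", "Yes", "Yes")
)

affected_sorted <- demo |>
  filter(Affects_Academic_Performance == "Yes") |>  # keep affected students only
  arrange(Age)                                       # then sort youngest first

affected_sorted
```

Because each step is joined with `|>`, the sort is applied to the filtered rows; a line that starts with `#` or follows a step without a pipe is not part of the chain.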
VISUALIZATION 1 : BAR GRAPH
Here we will create our first graph to analyze the relationship between sleep hours per night and the addiction score, broken down by the platform each student uses most. In other words, we want to see whether students who sleep less at night tend to have a higher addiction score.
ggplot(social_media_addiction3, aes(x = Sleep_Hours_Per_Night, y = Addicted_Score, fill = Most_Used_Platform)) +
  # stat = "identity" stacks each student's score, so bar length is a sum per sleep value
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Sleep hours per night/Addicted score",
    # labs() is case-sensitive: lowercase x and y, matched to aes()
    x = "Sleep hours per night",
    y = "Addicted score",
    fill = "Most used platform",
    caption = "Source: Students Social Media Addiction dataset") +
  theme_light(base_size = 10)
VISUALIZATION 2 : BAR GRAPH
Here we will try to analyze which social media platform is most used by students at each age, and the average daily usage time for each platform.
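As a numeric companion to the chart, the same question can be sketched as a grouped summary. This is a minimal illustration, not the analysis itself: `demo` and `platform_summary` are hypothetical names, and the six rows are the ones visible in the head() output of the filtered data above, so the numbers describe only those six students.

```r
library(dplyr)

# The six rows visible in the filtered head() output of this document.
demo <- tibble(
  Most_Used_Platform = c("Instagram", "TikTok", "Facebook",
                         "Instagram", "Snapchat", "Snapchat"),
  Age = c(19, 20, 21, 19, 20, 19),
  Avg_Daily_Usage_Hours = c(5.2, 6, 4.5, 7.2, 5.8, 4.8)
)

platform_summary <- demo |>
  group_by(Most_Used_Platform) |>
  summarise(
    n_students = n(),                               # how many students name this platform
    mean_age = mean(Age),                           # average age of those students
    mean_usage_hours = mean(Avg_Daily_Usage_Hours), # average daily usage in hours
    .groups = "drop"
  ) |>
  arrange(desc(n_students))                         # most used platform first

platform_summary
```

A table like this keeps counts and averages separate, whereas a stacked bar built with `stat = "identity"` adds the individual values together.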
ggplot(social_media_addiction3, aes(x = Most_Used_Platform, y = Age, fill = Avg_Daily_Usage_Hours)) +
  # stat = "identity" stacks each student's age, so bar length is a sum per platform
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Most used platform by age",
    # labs() is case-sensitive: lowercase x and y, matched to aes()
    x = "Social media platform",
    y = "Age",
    fill = "Average usage per day (hours)",
    caption = "Source: Students Social Media Addiction dataset") +
  scale_fill_gradient(low = "lightblue", high = "navyblue") +
  theme_light(base_size = 10)
VISUALIZATION 3 : SCATTERPLOT
Here we will analyse the relationship between hours of sleep per night and average daily social media usage.
# base plot: axes, labels and theme only
p1 <- ggplot(social_media_addiction3, aes(x = Sleep_Hours_Per_Night, y = Avg_Daily_Usage_Hours)) +
  labs(title = "SLEEP HOURS PER NIGHT VS AVERAGE DAILY USAGE",
       caption = "Source: Student social media addiction database") +
  xlab("Sleep hours per night") +
  ylab("Average daily usage hours") +
  theme_bw()
# add one point per student
p2 <- p1 + geom_point(color = "brown")
p2
STATISTICAL ANALYSIS
Here we will show the correlation between the hours our students sleep at night and the average daily time they spend on social media.
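A fitted line shows the direction of a relationship, while a correlation coefficient puts a number on its strength. The sketch below shows how both could be computed in base R. The values in `demo` are hypothetical and chosen only for illustration; they are not taken from the dataset, so the results say nothing about the real students.

```r
# Hypothetical values for illustration only; NOT taken from the dataset.
demo <- data.frame(
  Sleep_Hours_Per_Night = c(8.0, 7.5, 7.0, 6.5, 6.0, 5.5, 5.0),
  Avg_Daily_Usage_Hours = c(2.5, 3.0, 3.8, 4.5, 5.2, 6.0, 6.8)
)

# Pearson correlation coefficient: always between -1 and 1.
r <- cor(demo$Sleep_Hours_Per_Night, demo$Avg_Daily_Usage_Hours)

# cor.test() adds a p-value and a confidence interval for r.
ct <- cor.test(demo$Sleep_Hours_Per_Night, demo$Avg_Daily_Usage_Hours)

# Simple linear regression: the kind of straight-line fit that
# geom_smooth(method = "lm", formula = y ~ x) draws.
fit <- lm(Avg_Daily_Usage_Hours ~ Sleep_Hours_Per_Night, data = demo)
slope <- coef(fit)[["Sleep_Hours_Per_Night"]]

r
slope
```

With the real data, the two columns of `social_media_addiction3` would replace `demo`. A negative slope would mean that more sleep goes with less daily usage, but correlation alone does not show which one causes the other.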
Linear Regression
p4 <- p2 + geom_smooth(method = "lm", formula = y ~ x)
p4
Finally, let's remove the confidence interval so that only the fitted line is shown.
# linewidth replaces size for lines (size was deprecated in ggplot2 3.4.0)
p5 <- p2 + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dotdash", linewidth = 0.3) +
  ggtitle("SLEEP HOURS PER NIGHT VS AVERAGE DAILY USAGE")
p5
CONCLUSION
For this project, I drew inspiration from Brian Johnston's article “Am I #instaAddicted? How Social Media Affect Students' Lives” to understand the impact of excessive social media use on students' lives. I wish I could have created a meaningful scatter plot to analyze the relationship between our participants' age and their mental health score. Excessive use of platforms like Instagram, TikTok, and Snapchat often leads to distracted studying, decreased attention, and poor time management. Furthermore, social media addiction can disrupt sleep, face-to-face communication, and personal relationships. Spending more time online can also affect students' presence, concentration, and emotional balance. For all these reasons, I believe we should all pay more attention to it.
Citations
- “Am I #instaAddicted? How Social Media Affect Students’ Lives.” CMDI Pathways to Excellence Summer Intensive, www.colorado.edu/program/cmdipathways/2013-pathways/am-i-instaaddicted-how-social-media-affect-students-lives.