DATA 110

FINAL PROJECT : Student Social Media Addiction

AUTHOR : Steve Donfack

Credit : Getty images

Introduction

The dataset used in this project comes from a study of students’ social media addiction. It was collected by dataandtrustalliance.org at universities around the world, targeting students aged 16-25, with the aim of broad geographic coverage and respondent anonymity.

For this project, we will focus our analysis on the following variables:

  • Age (num): Student’s age

  • Gender (cat): Male or Female

  • Country (cat): Country of residence

  • Average daily usage (num): Average number of hours each student spends on social media per day

  • Sleep hours (num): Number of hours of sleep per night

  • Mental health score (num): A score from 1 to 10

  • Most used platform (cat): For example Instagram, Snapchat, or TikTok

  • Addiction score (num): A score from 1 to 10 indicating how addicted a student is to social media

The questions we would like to explore with this dataset are:

  • How does the number of hours of sleep affect mental health?

  • Is there a correlation between hours of sleep and daily social media usage?

  • Is there a relationship between hours of sleep and the addiction score?

  • What is the most addictive social media platform?

I chose this topic and dataset because student social media addiction is a growing problem, with potential impact on mental health and academic performance.

Let’s start by loading the libraries we need.

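library(rvest)      # web scraping
library(tidyverse)  # dplyr, readr, ggplot2 and related packages
library(plotly)     # interactive plots
library(webshot2)   # screenshots of web pages
library(ggplot2)    # already attached by tidyverse
library(ggfortify)  # autoplot() methods for model objects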

Let’s import our dataset

setwd("C:/Users/steve/OneDrive/Desktop/DATA 110/WEEK6")
Social_media_addiction <- read_csv("Students Social Media Addiction.csv")
Rows: 705 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Gender, Academic_Level, Country, Most_Used_Platform, Affects_Academ...
dbl (7): Student_ID, Age, Avg_Daily_Usage_Hours, Sleep_Hours_Per_Night, Ment...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Cleaning the dataset

Now we will create a subset that keeps only the variables we need for our analysis.

social_media_addiction2 <- Social_media_addiction |>
  select(-Country, -Academic_Level, -Relationship_Status, -Conflicts_Over_Social_Media)
head(social_media_addiction2)
# A tibble: 6 × 9
  Student_ID   Age Gender Avg_Daily_Usage_Hours Most_Used_Platform
       <dbl> <dbl> <chr>                  <dbl> <chr>             
1          1    19 Female                   5.2 Instagram         
2          2    22 Male                     2.1 Twitter           
3          3    20 Female                   6   TikTok            
4          4    18 Male                     3   YouTube           
5          5    21 Male                     4.5 Facebook          
6          6    19 Female                   7.2 Instagram         
# ℹ 4 more variables: Affects_Academic_Performance <chr>,
#   Sleep_Hours_Per_Night <dbl>, Mental_Health_Score <dbl>,
#   Addicted_Score <dbl>

For this study we will focus only on students who reported that their academic performance was affected by social media use. Let’s filter the dataset accordingly. Sorting by age is optional, so the arrange() step is left commented out.

social_media_addiction3 <- social_media_addiction2 |>
  filter(Affects_Academic_Performance=="Yes") 
  #arrange(Age)
head(social_media_addiction3)
# A tibble: 6 × 9
  Student_ID   Age Gender Avg_Daily_Usage_Hours Most_Used_Platform
       <dbl> <dbl> <chr>                  <dbl> <chr>             
1          1    19 Female                   5.2 Instagram         
2          3    20 Female                   6   TikTok            
3          5    21 Male                     4.5 Facebook          
4          6    19 Female                   7.2 Instagram         
5          8    20 Female                   5.8 Snapchat          
6         11    19 Male                     4.8 Snapchat          
# ℹ 4 more variables: Affects_Academic_Performance <chr>,
#   Sleep_Hours_Per_Night <dbl>, Mental_Health_Score <dbl>,
#   Addicted_Score <dbl>
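
Before relying on the filtered subset, it can help to see how many students fall into each group, since every later result describes only the "Yes" group. The sketch below uses a small made-up data frame (`toy`, hypothetical values) in place of social_media_addiction2. Only the column name Affects_Academic_Performance comes from the real dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for social_media_addiction2;
# the values are invented for illustration only
toy <- data.frame(
  Student_ID = 1:6,
  Affects_Academic_Performance = c("Yes", "No", "Yes", "No", "Yes", "Yes")
)

# Number of students in each group before filtering
group_counts <- toy |>
  count(Affects_Academic_Performance)

# Keep only the "Yes" group, mirroring the filter used in this report
toy_yes <- toy |>
  filter(Affects_Academic_Performance == "Yes")

group_counts
nrow(toy_yes)
```

Running the same count() on the real data would show what share of the 705 students remains after the filter.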

VISUALIZATION 1 : BAR GRAPH

Here we create our first graph to look at the relationship between sleep hours per night and the addiction score, broken down by the platform each student uses most. In other words, we want to see whether students who sleep less tend to have a higher addiction score.

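ggplot(social_media_addiction3,
       aes(x = Sleep_Hours_Per_Night, y = Addicted_Score, fill = Most_Used_Platform)) +
  # geom_col() is equivalent to geom_bar(stat = "identity"): each student adds one
  # segment, so bar length is the sum of addiction scores at that sleep value
  geom_col() +
  coord_flip() +
  labs(
    title = "Sleep hours per night/Addicted score",
    # labs() arguments are lowercase and refer to the aesthetics, even after coord_flip()
    x = "Sleep hours per night",
    y = "Addicted score",
    fill = "Most used platform",
    caption = "Source: Student social media addiction database") +
  theme_light(base_size = 10)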
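
Because stat = "identity" stacks one segment per student, the bars above show summed addiction scores, which also grow with the number of students at each sleep value. One possible alternative, sketched here on made-up data (`toy`, hypothetical values), is to compute the mean addiction score per sleep value first and plot that instead. The column names mirror the real dataset. The name Mean_Addicted_Score is introduced only for this example.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  Sleep_Hours_Per_Night = c(5, 5, 6, 6, 7, 7),
  Addicted_Score        = c(9, 8, 8, 6, 5, 4)
)

# One row per sleep value, holding the mean addiction score
mean_by_sleep <- toy |>
  group_by(Sleep_Hours_Per_Night) |>
  summarise(Mean_Addicted_Score = mean(Addicted_Score), .groups = "drop")

# Bar length is now a mean, so it no longer depends on group size
p_mean <- ggplot(mean_by_sleep,
                 aes(x = Sleep_Hours_Per_Night, y = Mean_Addicted_Score)) +
  geom_col() +
  labs(x = "Sleep hours per night", y = "Mean addiction score") +
  theme_light(base_size = 10)

mean_by_sleep
```

On the real data, sleep hours take many distinct values, so rounding or binning them before grouping may give a more readable chart.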

VISUALIZATION 2 : BAR GRAPH

Here we look at which social media platform is most used by students at each age, along with the average daily usage time for each platform.

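ggplot(social_media_addiction3,
       aes(x = Most_Used_Platform, y = Age, fill = Avg_Daily_Usage_Hours)) +
  # One stacked segment per student: bar length is the sum of ages for that
  # platform, so longer bars mainly reflect more students
  geom_col() +
  coord_flip() +
  labs(
    title = "Most used platform by age",
    # labs() arguments are lowercase and refer to the aesthetics, even after coord_flip()
    x = "Social media platform",
    y = "Age",
    fill = "Average daily usage (hours)",
    caption = "Source: Student social media addiction database") +
  scale_fill_gradient(low = "lightblue", high = "navyblue") +
  theme_light(base_size = 10)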
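
A summary table can complement this chart by giving, for each platform, the number of students and their mean daily usage. The sketch below runs on made-up data (`toy`, hypothetical values). The column names Most_Used_Platform and Avg_Daily_Usage_Hours come from the dataset, while n_students and mean_usage are names introduced only for this example.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  Most_Used_Platform    = c("Instagram", "Instagram", "TikTok", "TikTok", "Snapchat"),
  Avg_Daily_Usage_Hours = c(5, 7, 6, 8, 4)
)

# Students per platform and their mean daily usage, highest usage first
platform_summary <- toy |>
  group_by(Most_Used_Platform) |>
  summarise(
    n_students = n(),
    mean_usage = mean(Avg_Daily_Usage_Hours),
    .groups = "drop"
  ) |>
  arrange(desc(mean_usage))

platform_summary
```

Replacing `toy` with social_media_addiction3 would give the same table for the filtered students.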

VISUALIZATION 3 : SCATTERPLOT

Here we analyse the relationship between hours of sleep per night and average daily social media usage.

p1 <- ggplot(social_media_addiction3, aes(x=Sleep_Hours_Per_Night, y=Avg_Daily_Usage_Hours))+
  labs(title = "SLEEP HOURS PER NIGHT VS AVERAGE DAILY USAGE",
       caption = "Source: Student social media addiction database") +
  xlab("Sleep hours per night") +
  ylab("Average daily usage hours") +
  theme_bw()
p2 <- p1 + geom_point(color="brown")

p2

STATISTICAL ANALYSIS

Here we show the correlation between the hours our students sleep at night and the average daily time they spend on social media.

Linear Regression

p4 <- p2 + geom_smooth(method = "lm", formula = y~x)
p4

Finally, let’s remove the confidence band so that only the fitted regression line is shown.

# linewidth replaces the size argument for lines, which was deprecated in ggplot2 3.4.0
p5 <- p2 + geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                       linetype = "dotdash", linewidth = 0.3) +
  ggtitle("SLEEP HOURS PER NIGHT VS AVERAGE DAILY USAGE")
p5

Let’s compute the correlation coefficient between sleep hours per night and daily social media usage.

cor(social_media_addiction3$Sleep_Hours_Per_Night, social_media_addiction3$Avg_Daily_Usage_Hours)
[1] -0.717044

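With a correlation coefficient of about -0.72, there is a strong negative linear association between the hours our students sleep and the average time they spend on social media: students who spend more time on social media tend to sleep less, and vice versa. This holds for the filtered group (students who reported an effect on academic performance), and the correlation by itself does not tell us which behaviour drives the other.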
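
To go one step beyond cor(), base R offers cor.test() for a confidence interval and p-value, and lm() for the fitted line that geom_smooth(method = "lm") draws. The sketch below is a minimal illustration on made-up data (`toy`, hypothetical values), not the report's actual results. The column names mirror the real dataset.

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  Sleep_Hours_Per_Night = c(8, 7, 7, 6, 5, 4),
  Avg_Daily_Usage_Hours = c(2, 3, 4, 5, 6, 8)
)

# Pearson correlation with a confidence interval and p-value
ct <- cor.test(toy$Sleep_Hours_Per_Night, toy$Avg_Daily_Usage_Hours)

# Simple linear regression of daily usage on sleep hours
fit <- lm(Avg_Daily_Usage_Hours ~ Sleep_Hours_Per_Night, data = toy)

ct$estimate   # correlation coefficient
ct$conf.int   # 95% confidence interval for the correlation
coef(fit)     # intercept and slope of the fitted line
```

For a simple regression with one predictor, the squared correlation equals the model's R-squared, so both views describe the same linear relationship.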

CONCLUSION

For this project, I drew inspiration from Brian Johnston’s article “Am I #instaAddicted? How Social Media Affect Students’ Lives” to understand the impact of excessive social media use on students’ lives. Excessive use of platforms like Instagram, TikTok, and Snapchat often leads to distracted studying, decreased attention, and poor time management. Furthermore, social media addiction can disrupt sleep, face-to-face communication, and human relationships. Spending more time online can also affect students’ presence, concentration, and emotional balance. For all these reasons, I believe we should all pay more attention to it. One thing I wish I could have added is a meaningful scatter plot of the relationship between our participants’ age and their mental health score.

Citations

  • “Am I #instaAddicted? How Social Media Affect Students’ Lives.” CMDI Pathways to Excellence Summer Intensive, www.colorado.edu/program/cmdipathways/2013-pathways/am-i-instaaddicted-how-social-media-affect-students-lives.