Introduction:
In this project, I explore the relationship between social media usage, specifically TikTok, and academic performance by analyzing a dataset that contains information about Chinese teenagers’ TikTok usage, study habits, and academic performance. I chose this topic because, as a first-year college student, I struggle with using social media when I should be studying.
I examine variables such as tiktok_use_hours_mon, study_hours_mon, and
sleep_quality. I am also interested in how digital habits affect my own
study efficiency. My analysis involves cleaning the data, and I used
dplyr to subset the observations for a focused look at student habits.
The data include students’ grades, TikTok usage, media usage, parental consent, sleep hours, well-being, and more.
This dataset comes from a ScienceDirect article (https://www.sciencedirect.com/science/article/pii/S0001691824004438),
where the authors looked at specific variables in Chinese students to better understand how TikTok usage and self-control failure relate to well-being, academic performance, and sleep. Specifically, I looked at the students’ academic performance, sleep, and study time.
Research question:
To what extent do daily TikTok usage hours, sleep quality, and self-study time affect the academic performance (GPA) of Chinese teenagers?
I came across a couple of issues with my variables. The gender variable (gender_1-4) had too many categories, so I narrowed it down to just “Female” and “Male” so that the extra categories no longer showed up.
I also fixed the hours. I noticed that the TikTok usage variable showed values such as 120 hours for Monday, which is impossible, so I limited the values to the 24 hours in a day and converted them to minutes so that they made more sense.
library(readr)
# 1. Read the file into the variable 'df'
df <- read_csv("/Users/sadiyasow/Downloads/TikTok_Cleaned.csv")
## Rows: 362 Columns: 72
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): UserLanguage, media_use_freq_6_TEXT, phone_4_TEXT, education_part...
## dbl (63): Status, Progress, Duration__in_seconds_, Finished, Q_RecaptchaSco...
## dttm (3): StartDate, EndDate, RecordedDate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
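The message above suggests setting `show_col_types = FALSE` to quiet the column specification. A minimal sketch of that option follows. It assumes a small stand-in CSV written to a temporary folder (the file name `TikTok_demo.csv` and its values are hypothetical, not the project data). If the real CSV sat in the same folder as the Rmd file, a relative path such as `read_csv("TikTok_Cleaned.csv")` would probably also make the report easier to run on another computer.

```r
library(readr)

# Hypothetical stand-in for the project file: a tiny CSV written to a temp folder
demo_path <- file.path(tempdir(), "TikTok_demo.csv")
write_csv(
  data.frame(tiktok_use_hours_mon = c(1, 2.5, 120),
             study_hours_mon = c(3, 1, 2)),
  demo_path
)

# show_col_types = FALSE suppresses the column specification message
demo_df <- read_csv(demo_path, show_col_types = FALSE)

# Rows and columns that were read in
dim(demo_df)
```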
# 2. Use 'df' to inspect the data (ensure this matches what you used above)
head(df)
## # A tibble: 6 × 72
## StartDate EndDate Status Progress Duration__in_seconds_
## <dttm> <dttm> <dbl> <dbl> <dbl>
## 1 2021-01-19 04:36:52 2021-01-19 04:37:16 0 100 24
## 2 2021-01-19 04:36:26 2021-01-19 04:37:20 0 100 54
## 3 2021-01-19 04:31:28 2021-01-19 04:39:45 0 100 496
## 4 2021-01-19 04:31:11 2021-01-19 04:40:25 0 100 553
## 5 2021-01-19 04:33:11 2021-01-19 04:44:57 0 100 705
## 6 2021-01-19 04:38:35 2021-01-19 04:45:00 0 100 385
## # ℹ 67 more variables: Finished <dbl>, RecordedDate <dttm>, UserLanguage <chr>,
## # Q_RecaptchaScore <dbl>, consent_parents <dbl>, consent_children <dbl>,
## # media_use_freq_1 <dbl>, media_use_freq_2 <dbl>, media_use_freq_3 <dbl>,
## # media_use_freq_4 <dbl>, media_use_freq_5 <dbl>, media_use_freq_6 <dbl>,
## # media_use_freq_6_TEXT <chr>, media_use_rank_1 <dbl>,
## # media_use_rank_2 <dbl>, media_use_rank_3 <dbl>, media_use_rank_4 <dbl>,
## # media_use_rank_5 <dbl>, media_use_rank_6 <dbl>, …
# Load the necessary libraries
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# Set the working directory to where your Rmd file is
knitr::opts_knit$set(root.dir = getwd())
file_path <- "/Users/sadiyasow/Downloads/TikTok_Cleaned.csv"
if (file.exists(file_path)) {
message("File loaded successfully!")
} else {
stop(paste("ERROR: R cannot find the file at this location:", file_path,
"\nCheck if the filename in your Downloads folder matches exactly."))
}
## File loaded successfully!
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Data Cleaning:
I used the filter() and mutate() commands from the dplyr library. The
filter() command was essential for removing missing values (NA) in key
columns, ensuring that the visualization only reflects complete
observations. The mutate() command was used to transform the gender
variable into a factor, which allows the plotting engine to treat these
values as distinct groups (colors) rather than a continuous numeric range.
df_summary <- df %>%
filter(!is.na(sleep_quality), !is.na(tiktok_use_hours_mon)) %>%
group_by(sleep_quality) %>%
summarize(mean_usage = mean(tiktok_use_hours_mon, na.rm = TRUE)) %>%
mutate(sleep_quality = factor(sleep_quality))
ggplot(df_summary, aes(x = sleep_quality, y = mean_usage, fill = sleep_quality)) +
geom_col() +
labs(title = "Average TikTok Usage by Sleep Quality",
x = "Sleep Quality Rating (1=Poor, 4=Good)",
y = "Mean TikTok Usage (Hours/Monday)") +
theme_minimal() +
theme(legend.position = "none")
I wanted to see if more TikTok time meant worse sleep, so I made this bar chart to look at the average usage for each sleep quality rating. Surprisingly, the data show that students who reported the best sleep (level 4) also have the highest average TikTok usage, which I wasn’t expecting. This points to a positive association, and it suggests that for these students TikTok isn’t necessarily ruining their sleep quality. Another factor could be at play; maybe they use TikTok differently or manage their time better.
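One way to put a number on the association described above is a rank correlation, which suits an ordinal rating such as sleep quality. The sketch below uses a few made-up toy values (hypothetical, not taken from the survey) only to show the call; running the same `cor()` call on the cleaned columns of `df` would give the actual estimate.

```r
# Toy values only (hypothetical, not taken from the survey)
toy <- data.frame(
  sleep_quality = c(1, 1, 2, 2, 3, 3, 4, 4),
  tiktok_use_hours_mon = c(0.5, 1, 1, 1.5, 2, 1.5, 3, 2.5)
)

# Spearman's rank correlation; use = "complete.obs" drops rows with NA
rho <- cor(toy$sleep_quality, toy$tiktok_use_hours_mon,
           method = "spearman", use = "complete.obs")
rho
```

A value near 0 would mean little association, while a value closer to 1 or -1 would point to a stronger positive or negative one.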
df_plot <- df %>%
# Remove missing data
filter(!is.na(tiktok_use_hours_mon), !is.na(scores), !is.na(gender)) %>%
# Explicitly keep only gender 1 and 2
filter(gender %in% c(1, 2)) %>%
mutate(gender_label = factor(gender, levels = c(1, 2), labels = c("Male", "Female")))
hchart(df_plot, "scatter", hcaes(x = tiktok_use_hours_mon,
y = scores,
group = gender_label)) %>%
# Blue for Male (#377EB8), Pink for Female (#F781BF)
hc_colors(c("#377EB8", "#F781BF")) %>%
# This adds the score numbers on the dots
hc_plotOptions(scatter = list(dataLabels = list(enabled = TRUE, format = "{point.y}"))) %>%
hc_title(text = "TikTok Usage vs. Academic Score by Gender") %>%
hc_xAxis(title = list(text = "TikTok Hours (Monday)")) %>%
hc_yAxis(title = list(text = "Academic Score (GPA Proxy)")) %>%
hc_add_theme(hc_theme_elementary()) %>%
hc_caption(text = "Source: TikTok_Cleaned.csv")
This scatter plot shows how TikTok usage (hours on Monday) relates to academic performance (the score serves as a stand-in for GPA), with the data broken down by gender to see if there are any differences.
To get the data ready, I had to do a bit of cleaning. I removed entries with missing values to keep the results accurate and filtered the dataset to focus on two genders, male and female, since there were more than two gender categories in the data. For the chart that follows, I also limited the time data to 24 hours (1,440 minutes) and converted it into minutes. This was a necessary step because the raw data included some unrealistic values, like 120-hour days, which aren’t possible.
Looking at the plot, I was checking for a trend to see if higher TikTok usage correlates with lower academic scores. While it’s hard to draw a definite conclusion, the scatter plot shows where the individual students fall. It suggests that academic success doesn’t come down to just one factor.
library(highcharter)
library(dplyr)
# 1. Clean the data: Filter for valid 0-24 hour range, then convert to minutes
df_plot <- df %>%
filter(!is.na(tiktok_use_hours_mon), !is.na(study_hours_mon), !is.na(gender)) %>%
# Filter to remove anything greater than 24 hours
filter(tiktok_use_hours_mon <= 24 & study_hours_mon <= 24) %>%
mutate(
gender_factor = as.factor(gender),
# Convert to minutes
tiktok_use_min_mon = tiktok_use_hours_mon * 60,
study_min_mon = study_hours_mon * 60
)
hchart(df_plot, "scatter", hcaes(x = tiktok_use_min_mon,
y = study_min_mon,
size = sleep_quality,
group = gender_factor)) %>%
hc_colors(c("#377EB8", "#F781BF", "#984EA3")) %>%
hc_plotOptions(scatter = list(dataLabels = list(enabled = TRUE, format = "{point.y}"))) %>%
hc_xAxis(title = list(text = "TikTok Usage (Minutes/Monday)"),
min = 0, max = 1440,
labels = list(format = "{value}")) %>%
hc_yAxis(title = list(text = "Study Time (Minutes)"),
min = 0, max = 1440,
labels = list(format = "{value}")) %>%
hc_title(text = "Study Time vs. TikTok Usage (Minutes, Limited to 24hr)") %>%
hc_add_theme(hc_theme_elementary()) %>%
hc_caption(text = "Source: TikTok_Cleaned.csv")
For this final visualization, I wanted to see whether study time and TikTok usage are negatively related, while also factoring in sleep quality. I used a bubble chart to map TikTok usage on the X-axis and study time on the Y-axis. The cool part is that the size of each bubble represents the student’s reported sleep quality, so I factored a third variable into this graph.
I had to do some more data cleaning, which again meant limiting the hours to 24 per day. I then multiplied the hourly data by 60 to convert everything to minutes so that it was easier to read and understand, even though the plot is still very clustered. I used AI to help with the conversion to minutes as well. For the grouping, I used gender_factor so that the colors were grouped, and I used data labels to show the specific study time.
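To check how much data the cleaning steps remove, the same filter and conversion can be tried on a few toy rows first. This is only a sketch: the values below are made up (hypothetical), with the 120 standing in for the impossible value seen in the raw data.

```r
library(dplyr)

# Toy rows (hypothetical); 120 mimics the impossible value in the raw data
toy <- data.frame(
  tiktok_use_hours_mon = c(1, 2.5, 120, NA),
  study_hours_mon = c(3, 1, 2, 4)
)

toy_clean <- toy %>%
  # Drop rows with missing values
  filter(!is.na(tiktok_use_hours_mon), !is.na(study_hours_mon)) %>%
  # Keep only values that fit inside a 24-hour day
  filter(tiktok_use_hours_mon <= 24, study_hours_mon <= 24) %>%
  # Convert hours to minutes
  mutate(tiktok_use_min_mon = tiktok_use_hours_mon * 60,
         study_min_mon = study_hours_mon * 60)

# Number of rows dropped by the cleaning steps
nrow(toy) - nrow(toy_clean)
```

Comparing the row counts before and after on the real data would show how many students the 24-hour limit excludes.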
Model Linear regression:
model <- lm(study_hours_mon ~ tiktok_use_hours_mon + sleep_quality, data = df)
# View the results
summary(model)
##
## Call:
## lm(formula = study_hours_mon ~ tiktok_use_hours_mon + sleep_quality,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108950 -39969 -39425 -39413 9896482
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 167608.4 103745.1 1.616 0.107
## tiktok_use_hours_mon 271.7 4830.5 0.056 0.955
## sleep_quality -64090.4 47651.4 -1.345 0.180
##
## Residual standard error: 605400 on 270 degrees of freedom
## (89 observations deleted due to missingness)
## Multiple R-squared: 0.006713, Adjusted R-squared: -0.000645
## F-statistic: 0.9123 on 2 and 270 DF, p-value: 0.4028
Equation:
lm(study_hours_mon ~ tiktok_use_hours_mon + sleep_quality, data = df)
$$\text{study\_hours\_mon} = \beta_0 + \beta_1 \cdot \text{tiktok\_use\_hours\_mon} + \beta_2 \cdot \text{sleep\_quality} + \varepsilon$$
$\beta_0$ (Intercept): The predicted study hours when TikTok usage and sleep quality are zero.
$\beta_1$: The expected change in study hours for every 1-hour increase in TikTok usage (holding sleep quality constant).
$\beta_2$: The expected change in study hours for every 1-unit increase in sleep quality (holding TikTok usage constant).
$\varepsilon$: The error term (residual).
In the output above, neither coefficient is statistically significant (p = 0.955 for TikTok usage and p = 0.180 for sleep quality), and the model explains less than 1% of the variation in study hours (R-squared = 0.0067).
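The residuals in the summary above reach into the millions, which suggests the model was fit on rows that still contain impossible hour values. One possible follow-up, sketched below under that assumption, is to apply the same 24-hour limit before fitting. The data here are simulated stand-ins (hypothetical, generated with a fixed seed), not the survey responses, so the coefficients mean nothing on their own; the point is only the order of the steps.

```r
set.seed(1)

# Simulated stand-in data (hypothetical, not the survey)
sim <- data.frame(
  tiktok_use_hours_mon = runif(50, 0, 6),
  sleep_quality = sample(1:4, 50, replace = TRUE)
)
sim$study_hours_mon <- 4 - 0.3 * sim$tiktok_use_hours_mon + rnorm(50, sd = 0.5)

# Two rows get impossible study hours, mimicking the raw data problem
sim$study_hours_mon[1:2] <- c(9000000, 500000)

# Keep only days that fit inside 24 hours before fitting
sim_valid <- subset(sim, study_hours_mon <= 24 & tiktok_use_hours_mon <= 24)

model_valid <- lm(study_hours_mon ~ tiktok_use_hours_mon + sleep_quality,
                  data = sim_valid)

# Named coefficients: (Intercept), tiktok_use_hours_mon, sleep_quality
coef(model_valid)
```

On the real data, the equivalent step would be to pass the filtered data frame to `lm()` instead of the full `df`.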
This project has provided a practical look at how student survey data can be turned into useful insights. By cleaning the “TikTok_Cleaned.csv” dataset and applying visualizations, I was able to observe how social media engagement relates to academic and lifestyle variables.
While the data did not reveal a simple, one-size-fits-all rule, they highlight the diversity of student experiences. Some students keep their study hours relatively high regardless of TikTok usage, while others show a more inverse relationship. This project reinforced how crucial data cleaning is: removing the “NA” values and properly classifying variables helps prevent errors and improves insights. To conclude, this project serves as a step toward being more considerate of our time management.
What I would do differently is pick a better dataset, since this one needed a lot of cleaning. As I said in the conclusion, it took a lot of processing and filtering, but overall I liked the project, and it made me think about the decisions I should be making about my study and sleep time.
Citation:
from (https://www.sciencedirect.com/science/article/pii/S0001691824004438)
AI was used to convert my file to CSV, to figure out why the Rmd file wasn’t rendering, and to help me convert my hours into minutes for my visualizations. :)