# Find an image that relates to your topic and place it at the top of your document.
library(png)
# install.packages("png")
img <- readPNG("dataset-cover.png")  # read the PNG file into an array
# View image
plot(as.raster(img))
#Cite the Source for the image: Kumaresan, A. (2025, September 14). Screen Time Vs Mental Wellness Survey - 2025. Kaggle. <https://www.kaggle.com/datasets/adharshinikumar/screentime-vs-mentalwellness-survey-2025>
# A total of 400 individuals aged 16 to 60 completed a survey regarding their daily technology use. The resulting dataset enables analysis of the relationship between screen usage and mental health. The dataset includes 15 variables.
# Seven variables are categorical: gender (Male, Female, Other), occupation (student, working professional, freelancer), work mode (remote, on-site, hybrid), sleep quality (self-reported rating from 1, very poor, to 5, excellent), stress level (self-reported from 0, no stress, to 10, extremely stressed), self-rated productivity score (0–100), and a composite index of overall wellness (0–100).
# Six variables are quantitative: age, total average daily screen usage (hours), daily screen time spent on work or study tasks (hours), average sleep duration per night (hours), total minutes spent exercising per week, and hours spent socializing offline per week.
# The primary research question is whether total average daily screen usage and average sleep duration per night predict the number of hours spent socializing offline per week.
# CSV: Please see attached.
# Hyperlink the source for where you get the dataset: <https://www.kaggle.com/datasets/adharshinikumar/screentime-vs-mentalwellness-survey-2025>
# The number of variables: 15
# How many are categorical: 7 categorical variables
# Identify the categorical variables: gender (Male/Female/Other), occupation (participant's role, such as Student, Working Professional, Freelancer, etc.), work mode (work/study mode, such as remote, on-site, hybrid), sleep quality (self-reported sleep quality rating, 1 = very poor, 5 = excellent), stress level (self-reported stress level, 0 = no stress, 10 = extremely stressed), self-rated productivity score (0-100), composite index reflecting overall wellness (0-100)
# How many are quantitative: 4
# Specify which variables you plan to use: age, total average daily screen usage (hours), daily screen time spent on work/study tasks (hours), and average sleep duration per night (hours)
# Research Question: Are a person's total average daily screen usage (hours) and average sleep duration per night (hours) predictive of his or her social hours per week?
# Load the necessary libraries
library(dplyr)  # provides a grammar of data manipulation
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggfortify)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(GGally)
library(DataExplorer)
# install.packages("DataExplorer")
library(ggrepel)
# Load your dataset using the readr::read_csv() command
screen_time <- readr::read_csv("ScreenTime vs MentalWellness (3).csv")
Rows: 400 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): user_id, gender, occupation, work_mode
dbl (11): age, screen_time_hours, work_screen_hours, leisure_screen_hours, s...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Count missing values
sum(is.na(screen_time))
[1] 0
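The total above shows no missing values overall. If the count were nonzero, a per-column breakdown would show where the gaps are. The small data frame `demo` below is made up for illustration and is not part of the survey data.

```r
# Tiny made-up data frame with two missing entries (illustration only)
demo <- data.frame(
  sleep_hours = c(7.0, NA, 6.5),
  gender      = c("Male", "Female", NA)
)

total_missing  <- sum(is.na(demo))      # overall count of missing cells
missing_by_col <- colSums(is.na(demo))  # count of missing cells per column

print(total_missing)
print(missing_by_col)
```

Applied to the real data, the same call would be `colSums(is.na(screen_time))`.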
# Show code in the output {echo=TRUE}
# Check out the first few lines
head(screen_time)
# Explore both quantitative and categorical variables with simple plots to determine what you want to focus on for your final visualization.
# Order by descending total average daily screen usage
screen_time %>% arrange(desc(screen_time_hours)) %>% View()
# Order by descending hours spent socializing offline per week
screen_time %>% arrange(desc(social_hours_per_week)) %>% View()
# Create a scatterplot with a linear regression.
plot1_z <- ggplot(screen_time, aes(screen_time_hours, social_hours_per_week)) +
  labs(
    title = "Social Hours per Week versus Screen Time in Hours in 400 subjects",  # main title
    subtitle = "What is the relationship between screen time and social activity?",
    caption = "Source: ScreenTime vs MentalWellness",  # add a caption
    x = "screen time in hours",  # give x-axis a name
    y = "Social Hours per week"  # give y-axis a name
  ) +
  # Fix the axes to start at 0.
  coord_cartesian(xlim = c(0, 25), ylim = c(0, 33)) +
  # Add the linear regression line without the confidence interval band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # Add the points, separate points that sit on top of each other, and adjust color and size of points
  geom_jitter(color = "blue", size = 1.5, alpha = 0.4, width = 0.4, height = 0.3, stroke = 0.5) +
  # Change the default theme
  theme_light(base_size = 12)
plot1_z
# Any tips on utilizing geom_point?
?geom_point
names(screen_time)
Call:
lm(formula = social_hours_per_week ~ screen_time_hours, data = screen_time)
Residuals:
Min 1Q Median 3Q Max
-10.0077 -3.3183 -0.0243 3.0159 16.5632
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.42908 0.90652 12.608 < 2e-16 ***
screen_time_hours -0.39048 0.09683 -4.033 6.62e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.818 on 398 degrees of freedom
Multiple R-squared: 0.03925, Adjusted R-squared: 0.03684
F-statistic: 16.26 on 1 and 398 DF, p-value: 6.616e-05
# The model has the equation: social_hours_per_week = -0.39048 (screen_time_hours) + 11.42908
# The slope may be interpreted as follows: for each additional hour of total average daily screen usage, the model predicts a decrease of 0.39048 hours spent socializing offline per week.
# The Pr(>|t|) p-value for screen_time_hours (0.0000662) is marked with three asterisks, which indicates that it is a statistically significant predictor, although the negative correlation between total average daily screen usage (hours) and hours spent socializing offline per week is weak. More asterisks indicate a smaller p-value, that is, stronger evidence that the coefficient differs from zero.
# Based on the adjusted R-squared, 3.684% of the variation in the observations may be explained by the model. In other words, 96.316% of the variance in the data is not explained by this model.
# Name each column
names(screen_time)
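As a quick check on this interpretation, the fitted equation can be evaluated by hand. The sketch below hard-codes the coefficients and multiple R-squared reported in the summary output. The helper name `predict_social` and the example screen-time values of 5 and 10 hours are assumptions chosen for illustration.

```r
# Coefficients copied from the summary() output
intercept <- 11.42908
slope     <- -0.39048

# Hypothetical helper: predicted social hours per week for a given daily screen time
predict_social <- function(screen_hours) {
  intercept + slope * screen_hours
}

# Predictions for two illustrative screen-time values (hours per day)
pred_5  <- predict_social(5)
pred_10 <- predict_social(10)

# The change per extra hour of screen time equals the slope
per_hour_change <- (pred_10 - pred_5) / 5

# Adjusted R-squared recomputed from the multiple R-squared,
# with n = 400 observations and p = 1 predictor
r2 <- 0.03925
n  <- 400
p  <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(pred_5 = pred_5, pred_10 = pred_10))
print(per_hour_change)
print(adj_r2)
```

The recomputed adjusted R-squared agrees with the reported 0.03684 up to rounding of the inputs.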
# Check out the pairwise comparisons with density curves and correlation output
# Is there an easier way to compare multiple variables using a scatterplot matrix?
screen_time_2 <- screen_time %>%
  rename(
    screen = screen_time_hours,
    work = work_screen_hours,
    leisure = leisure_screen_hours,
    exercise = exercise_minutes_per_week,
    social = social_hours_per_week,
    wellness = mental_wellness_index_0_100
  )
ggpairs(screen_time_2, columns = 6:14, upper = list(continuous = wrap("cor", size = 4)))
Warning: `fortify(<lm>)` was deprecated in ggplot2 4.0.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
# If we are trying to predict hours spent socializing offline per week, we can check whether any of the predictor variables contribute to the model. With all of the variables included, the adjusted R-squared is 0.01903 (1.903%). The variables that do not appear to be as significant as the others are productivity_0_100, exercise_minutes_per_week, stress_level_0_10, sleep_quality_1_5, sleep_hours, leisure_screen_hours, work_modeRemote, work_modeIn-person, occupationUnemployed, occupationStudent, occupationSelf-employed, occupationRetired, genderNon-binary/Other, genderMale, and age, since they all have large p-values. So drop those and re-run the model.
access_3 <- lm(social_hours_per_week ~ screen_time_hours + work_screen_hours, data = screen_time)
summary(access_3)
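One way to check whether dropping predictors is reasonable is to compare the reduced model with a fuller one using a partial F-test. The sketch below uses simulated data, not the survey file. The data frame `sim` and the coefficients used to generate it are invented for illustration, and only the column names mirror the dataset.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for the survey (values are invented, not the real data)
n <- 400
sim <- data.frame(
  screen_time_hours = runif(n, 1, 19),
  work_screen_hours = runif(n, 0, 12),
  sleep_hours       = rnorm(n, mean = 7, sd = 1)
)
# In this simulation only screen time truly affects social hours
sim$social_hours_per_week <- 11.4 - 0.4 * sim$screen_time_hours + rnorm(n, sd = 4.8)

# Fuller model and reduced model (the reduced model is nested in the fuller one)
full_fit <- lm(
  social_hours_per_week ~ screen_time_hours + work_screen_hours + sleep_hours,
  data = sim
)
reduced_fit <- lm(social_hours_per_week ~ screen_time_hours, data = sim)

# Partial F-test: a large p-value suggests the dropped terms add little
comparison <- anova(reduced_fit, full_fit)
print(comparison)

# Adjusted R-squared for each model, for a side-by-side look
adj_r2 <- c(
  full    = summary(full_fit)$adj.r.squared,
  reduced = summary(reduced_fit)$adj.r.squared
)
print(round(adj_r2, 4))
```

With the real data, the same comparison would use models fitted on `screen_time` in place of `sim`.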
Call:
lm(formula = social_hours_per_week ~ screen_time_hours + work_screen_hours,
data = screen_time)
Residuals:
Min 1Q Median 3Q Max
-10.0572 -3.3176 -0.0854 3.1150 16.0500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.6883 0.9306 12.560 < 2e-16 ***
screen_time_hours -0.4623 0.1133 -4.081 5.43e-05 ***
work_screen_hours 0.1782 0.1461 1.219 0.223
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.815 on 397 degrees of freedom
Multiple R-squared: 0.04284, Adjusted R-squared: 0.03802
F-statistic: 8.884 on 2 and 397 DF, p-value: 0.0001681
# 1. Look at the p-value for each variable
# 2. Check out the residual plots.
# 3. Look at the output for the Adjusted R-Squared value at the bottom of the output.
# Try the last model, but drop the last two observations
# The residual plots show that observations 58, 111 and 117 have an effect on the residual plots as well as having high scale-location values.
options(scipen = 0)
access_4 <- screen_time[-c(53, 177, 111), ]
summary(access_4)
user_id age gender occupation
Length:397 Min. :16.00 Length:397 Length:397
Class :character 1st Qu.:24.00 Class :character Class :character
Mode :character Median :30.00 Mode :character Mode :character
Mean :29.83
3rd Qu.:35.00
Max. :60.00
work_mode screen_time_hours work_screen_hours leisure_screen_hours
Length:397 Min. : 1.000 Min. : 0.11 Min. : 0.890
Class :character 1st Qu.: 7.380 1st Qu.: 0.70 1st Qu.: 5.460
Mode :character Median : 9.110 Median : 1.45 Median : 6.700
Mean : 9.029 Mean : 2.18 Mean : 6.848
3rd Qu.:10.540 3rd Qu.: 3.01 3rd Qu.: 8.440
Max. :19.170 Max. :12.04 Max. :13.350
sleep_hours sleep_quality_1_5 stress_level_0_10 productivity_0_100
Min. :4.640 Min. :1.000 Min. : 0.000 Min. : 20.60
1st Qu.:6.400 1st Qu.:1.000 1st Qu.: 6.900 1st Qu.: 43.60
Median :7.030 Median :1.000 Median : 8.800 Median : 51.80
Mean :7.014 Mean :1.398 Mean : 8.152 Mean : 54.29
3rd Qu.:7.640 3rd Qu.:2.000 3rd Qu.:10.000 3rd Qu.: 63.00
Max. :9.740 Max. :4.000 Max. :10.000 Max. :100.00
exercise_minutes_per_week social_hours_per_week mental_wellness_index_0_100
Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 58.0 1st Qu.: 4.5 1st Qu.: 3.80
Median :103.0 Median : 7.7 Median :14.80
Mean :110.1 Mean : 7.8 Mean :20.34
3rd Qu.:157.0 3rd Qu.:10.9 3rd Qu.:30.60
Max. :372.0 Max. :19.4 Max. :97.00
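The summary above describes the reduced data set rather than a refitted model. The following is a minimal sketch of how influential rows could be flagged and the model refit without them. It uses simulated data, and the cutoff of 4/n for Cook's distance is a common rule of thumb assumed here, not something taken from the original analysis.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-in for the survey (invented values, not the real data)
n <- 400
sim <- data.frame(screen_time_hours = runif(n, 1, 19))
sim$social_hours_per_week <- 11.4 - 0.4 * sim$screen_time_hours + rnorm(n, sd = 4.8)

fit <- lm(social_hours_per_week ~ screen_time_hours, data = sim)

# Cook's distance measures how much each row influences the fitted coefficients
cooks   <- cooks.distance(fit)
flagged <- which(cooks > 4 / n)  # assumed rule-of-thumb cutoff

# Refit on the data without the flagged rows (skip the drop if nothing is flagged)
sim_trimmed <- if (length(flagged) > 0) sim[-flagged, ] else sim
refit <- lm(social_hours_per_week ~ screen_time_hours, data = sim_trimmed)

# Compare coefficients and adjusted R-squared before and after trimming
print(rbind(original = coef(fit), trimmed = coef(refit)))
print(c(
  original = summary(fit)$adj.r.squared,
  trimmed  = summary(refit)$adj.r.squared
))
```

Calling `summary(refit)` would then give the coefficient table and adjusted R-squared for the trimmed data, which is the comparison the steps above call for.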
# The model accounts for 3.802% of the variation in the observations (adjusted R-squared = 0.03802), indicating that 96.198% of the variance remains unexplained. Further analysis should consider adding another variable with a p-value below the significance level of 0.05. Including such a variable alongside total average daily screen usage (hours) may help explain the relationship between total average daily screen usage (hours) and hours spent socializing offline per week.
# Labeling the data points was difficult because of the large sample size of 400 participants, so I decided not to use geom_text and used geom_jitter to reduce overlap among the points.