Exploring Dataset

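library(readxl)   # read_excel()
library(MASS)     # kde2d() for 2D kernel density estimates
library(ggplot2)  # ggplot() and the geoms used in the plots
student_lifestyle <- read_excel("~/Library/CloudStorage/OneDrive-DrexelUniversity/CJS 310/Final Project/student_lifestyle.xls")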

My research question is: Which lifestyle factor best predicts academic performance: sleep, study time, or socializing? (This needs to be expanded a bit.) I chose this dataset because it records students' daily sleep, study, and socializing hours alongside their GPA, which makes it a good starting point.
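One possible way to make "best predicts" concrete is to fit a single linear model on standardized variables and compare the size of the coefficients. The sketch below is only an illustration under that assumption, not the method this project has settled on; the function name `standardized_betas` is hypothetical, and the column names are the ones used in the plots of this report.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# fits GPA on the three lifestyle variables after z-scoring every column,
# so the coefficients are on a comparable scale.
standardized_betas <- function(data) {
  vars <- c("GPA", "Sleep_Hours_Per_Day",
            "Study_Hours_Per_Day", "Social_Hours_Per_Day")
  # scale() centers each column and divides by its standard deviation
  z <- as.data.frame(scale(as.data.frame(data)[, vars]))
  fit <- lm(GPA ~ Sleep_Hours_Per_Day + Study_Hours_Per_Day +
              Social_Hours_Per_Day, data = z)
  b <- coef(fit)[-1]                    # drop the intercept
  b[order(abs(b), decreasing = TRUE)]   # largest absolute effect first
}
```

Calling `standardized_betas(student_lifestyle)` would list the three predictors from largest to smallest standardized coefficient. If the predictors overlap heavily, the coefficients should be read with caution.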

# Pair each student's sleep hours with their GPA
df <- data.frame(sleep = student_lifestyle$Sleep_Hours_Per_Day, gpa = student_lifestyle$GPA)
# 2D kernel density estimate on a 100 x 100 grid
dens <- MASS::kde2d(df$sleep, df$gpa, n = 100)
# One density value per point: indexing with a two-column matrix picks a single grid cell
# for each student instead of a full points-by-points block of the grid
df$sleep_density <- dens$z[cbind(findInterval(df$sleep, dens$x), findInterval(df$gpa, dens$y))]

ggplot(df, aes(x = sleep, y = gpa, color = sleep_density)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 1.2) +
  labs(title = "Sleep Hours vs GPA (Density Colored)", x = "Hours of Sleep", y = "GPA", color = "Density") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows the relationship between students' hours of sleep and their GPA, with a linear trend line.

# Pair each student's study hours with their GPA
df <- data.frame(study = student_lifestyle$Study_Hours_Per_Day, gpa = student_lifestyle$GPA)
# 2D kernel density estimate on a 100 x 100 grid
dens <- MASS::kde2d(df$study, df$gpa, n = 100)
# One density value per point, selected with a two-column index matrix
df$study_density <- dens$z[cbind(findInterval(df$study, dens$x), findInterval(df$gpa, dens$y))]

ggplot(df, aes(x = study, y = gpa, color = study_density)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 1.2) +
  labs(title = "Study Hours vs GPA (Density Colored)", x = "Hours of Studying", y = "GPA", color = "Density") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows the relationship between how long students spend studying each day and their GPA, with a linear trend line.

# Pair each student's socializing hours with their GPA
df <- data.frame(social = student_lifestyle$Social_Hours_Per_Day, gpa = student_lifestyle$GPA)
# 2D kernel density estimate on a 100 x 100 grid
dens <- MASS::kde2d(df$social, df$gpa, n = 100)
# One density value per point, selected with a two-column index matrix
df$social_density <- dens$z[cbind(findInterval(df$social, dens$x), findInterval(df$gpa, dens$y))]

ggplot(df, aes(x = social, y = gpa, color = social_density)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 1.2) +
  labs(title = "Social Hours vs GPA (Density Colored)", x = "Hours of Socializing", y = "GPA", color = "Density") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows the relationship between students' hours spent socializing and their GPA, with a linear trend line.
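The density lookup is the same for all three plots, so it could be factored into one small helper. The sketch below is a suggestion rather than part of the original analysis; the name `point_density` is hypothetical. It indexes the density grid with a two-column matrix so that each point receives exactly one density value.

```r
# Hypothetical helper: returns one kernel density estimate per (x, y) point.
point_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x, y, n = n)   # density evaluated on an n-by-n grid
  ix <- findInterval(x, dens$x)      # row of dens$z for each point (x axis)
  iy <- findInterval(y, dens$y)      # column of dens$z for each point (y axis)
  # A two-column index matrix selects one cell per point; dens$z[ix, iy]
  # would instead return a length(x)-by-length(y) matrix.
  dens$z[cbind(ix, iy)]
}
```

With this helper, a density column could be created in one line, for example `point_density(student_lifestyle$Sleep_Hours_Per_Day, student_lifestyle$GPA)`.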

From this early analysis, I know I need to expand my question. The sleep plot showed no strong correlation with GPA, the studying plot showed a strong positive linear correlation, and the socializing plot showed a weak negative correlation. This is useful information, but the student_lifestyle dataset also records students' stress levels, and I would like to incorporate that into my analysis.
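The strength of each relationship is read off the plots here; a quick numeric check could back that up. The following is a minimal sketch, assuming the column names used in the plots; the helper name `gpa_correlations` is hypothetical.

```r
# Hypothetical helper: Pearson correlation of each lifestyle variable with GPA.
gpa_correlations <- function(data,
                             predictors = c("Sleep_Hours_Per_Day",
                                            "Study_Hours_Per_Day",
                                            "Social_Hours_Per_Day"),
                             outcome = "GPA") {
  data <- as.data.frame(data)
  r <- sapply(predictors, function(p) {
    # use = "complete.obs" drops rows where either value is missing
    cor(data[[p]], data[[outcome]], use = "complete.obs")
  })
  # Strongest relationship (by absolute value) first
  r[order(abs(r), decreasing = TRUE)]
}
```

Calling `gpa_correlations(student_lifestyle)` would return a named vector of correlation coefficients, which could then be compared against the visual impressions from the three plots.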

Keyra DeSouza

2026-02-19