Exploring How Personal Interest and Stress Level Correlate with Time Spent Studying in University Students
Executive summary
A box plot showed that students with greater interest in Data Science generally reported studying more hours per week, suggesting that intrinsic motivation is linked to stronger study habits. The scatterplot of stress and study hours, however, showed a negligible relationship, suggesting that perceived stress does not meaningfully predict weekly study time.
Exploratory Data Analysis (EDA)
Code
library(tidyverse)  # supplies %>%, filter(), select(), mutate(), drop_na() and ggplot2
library(showtext)   # supplies showtext_auto(), used by the plots

df <- read.csv("data1001_survey_data_2025.csv")

clean_data <- df %>%
  filter(
    consent == "I consent to take part in the study",
    # drop implausible ages such as 4; the cutoff is 17 rather than 18 because some
    # students start university early (note: `>` excludes age 17 itself)
    age > 17,
    # remove hours above 22, based on various universities' recommended unit study hours per week
    hours_studying <= 22,
    # keep goals between 50 and 100, as we are only interested in participants who intend to pass the course
    mark_goal >= 50,
    mark_goal <= 100
  ) %>%
  select(mark_goal, hours_studying, data_interest, stress) %>%
  drop_na()

# coerce stress and study hours to numeric, then drop rows that failed to convert
clean_data_numeric <- clean_data %>%
  mutate(
    stress = as.numeric(stress),
    hours = as.numeric(hours_studying)
  ) %>%
  drop_na()
The data was sourced from a survey completed by 2955 DATA1X01 students during Semester 2 2025, covering 28 variables about their university life. For our analysis, we focused on three variables: students’ interest in Data Science from 0 (No Interest) to 10 (Extremely Interested), hours students hoped to study DATA1X01 per week, and student self-reported stress from 0 (No Stress) to 10 (Worst Stress Imaginable). We classified the variables as quantitative discrete, quantitative continuous, and quantitative discrete, respectively.
Limitations
Potential limitations include the integrity of answers, as some students could have given unrealistic or joke responses. Study hours could also be influenced by course content difficulty, acting as a confounding variable. Finally, some students did not consent to participate, which reduced the sample size and may limit how representative the data is.
Assumptions
We assumed that students provided honest and reasonable responses, and that reported study hours reflect actual studying time (not time spent distracted, multitasking, attending lectures, etc.). We cleaned the data to include only consenting responses and restricted study hours to a plausible range of 0–22 per week, which should make the results more reliable.
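To make the effect of each cleaning rule visible, the filters can be applied one at a time and the number of remaining rows recorded. The sketch below is an illustration only: it uses a small invented data frame (`toy`) in place of the survey file, and the helper name `count_remaining` is hypothetical rather than part of our analysis.

```r
library(dplyr)

# Toy stand-in for the survey data (values are invented for illustration only)
toy <- data.frame(
  consent = c(rep("I consent to take part in the study", 5), "I do not consent"),
  age = c(19, 4, 21, 18, 20, 22),
  hours_studying = c(6, 5, 40, 3, 8, 4),
  mark_goal = c(75, 80, 65, 30, 90, 70)
)

# Hypothetical helper: apply the cleaning rules cumulatively and
# record how many rows survive after each step
count_remaining <- function(data) {
  step1 <- data %>% filter(consent == "I consent to take part in the study")
  step2 <- step1 %>% filter(age > 17)
  step3 <- step2 %>% filter(hours_studying <= 22)
  step4 <- step3 %>% filter(mark_goal >= 50, mark_goal <= 100)
  data.frame(
    step = c("raw", "consented", "age > 17", "hours <= 22", "mark goal 50-100"),
    rows = c(nrow(data), nrow(step1), nrow(step2), nrow(step3), nrow(step4))
  )
}

attrition <- count_remaining(toy)
print(attrition)
```

Running the same helper on `df` instead of `toy` would show how much of the sample each rule removes, which helps judge how far the cleaning might affect representativeness.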
Research Question 1
How does interest in data science correlate with time spent studying DATA1X01?
This boxplot shows the relationship between students’ self-rated interest in Data Science (0–10) and their reported weekly study hours. There is a positive association between them: the higher the interest, the higher the median weekly study time tends to be. Furthermore, students with higher interest levels show greater variation in their study time, with some studying over 10 hours per week while others stay within a range of 4 to 6 hours.
Code
showtext_auto()

ggplot(clean_data, aes(x = factor(data_interest), y = hours_studying)) +
  geom_boxplot(fill = "darkseagreen", size = 0.8, colour = "darkorchid4", alpha = 0.5) +
  labs(
    title = "Reported Hours of Study Per Week by Interest Level",
    x = "Interest in Data science",
    y = "Reported Hours of Study per Week"
  ) +
  theme_bw(base_family = "mono") +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 30, colour = "#1f4e5f"),
    axis.title = element_text(face = "bold", size = 14),
    axis.text = element_text(face = "bold", size = 12)
  )
Students with low interest (0–2) report relatively few study hours, with medians around 2–3 hours per week. From interest levels 3–6, study hours gradually increase, with medians shifting upward to 4–5 hours. At high interest (7–10), both the median and spread of study hours increase noticeably, with medians reaching 5–6 hours and more variability (some students studying 15+ hours). Outliers exist across all levels, showing a few students report unusually high study hours regardless of interest.
To summarize, higher interest in Data Science is generally associated with more study hours, and students with stronger interest display greater variation in study time. This suggests intrinsic motivation is linked not only to higher typical effort but also to a wider range of engagement.
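The medians and spreads read off the boxplot can also be tabulated directly. The sketch below shows one way to do this, assuming a data frame with the same column names as `clean_data`; here it runs on simulated values (`sim`), so the numbers it prints are not our survey results. Because interest is an ordinal 0–10 rating, a Spearman rank correlation is used as one reasonable summary of the trend; this is an illustrative choice, not a statistic reported above.

```r
library(dplyr)

set.seed(1)
n_students <- 300

# Simulated stand-in for clean_data (invented values, same column names)
sim <- data.frame(data_interest = sample(0:10, n_students, replace = TRUE))
sim$hours_studying <- pmin(
  22,
  pmax(0, round(2 + 0.4 * sim$data_interest + rnorm(n_students, sd = 2)))
)

# Median and interquartile range of study hours at each interest level
by_interest <- sim %>%
  group_by(data_interest) %>%
  summarise(
    students = n(),
    median_hours = median(hours_studying),
    iqr_hours = IQR(hours_studying),
    .groups = "drop"
  )
print(by_interest)

# Rank-based correlation between interest and study hours
rho <- cor(sim$data_interest, sim$hours_studying, method = "spearman")
print(rho)
```

Replacing `sim` with `clean_data` would give the group medians and interquartile ranges behind the boxplot.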
Research Question 2
Does perceived stress predict weekly study hours among DATA1X01 students?
The scatterplot of perceived stress against weekly study hours shows points widely dispersed with no clear pattern. The fitted regression line has a slight positive slope, suggesting that higher stress levels are associated with slightly more study time. The model estimated a slope of 0.13 hours per stress unit, with an intercept of 4.55 hours. The correlation was r = 0.035, and the model’s R² = 0.003 indicates that stress explains about 0.3% of the variance in study hours.
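The slope, intercept, r and R² quoted above all come from a single simple linear regression, and can be obtained as in the sketch below. It assumes columns named `stress` and `hours` as in `clean_data_numeric`, but runs on simulated data (`sim`), so its output will not match the reported figures. In a one-predictor least-squares model R² equals the square of Pearson's r, which makes it worthwhile to compute both from the same fit as a consistency check.

```r
set.seed(42)
n_students <- 500

# Simulated stand-in for clean_data_numeric (invented values)
sim <- data.frame(stress = sample(0:10, n_students, replace = TRUE))
sim$hours <- 4.5 + 0.1 * sim$stress + rnorm(n_students, sd = 3)

# Simple linear regression of study hours on stress
fit <- lm(hours ~ stress, data = sim)

r <- cor(sim$stress, sim$hours)          # Pearson correlation
r_squared <- summary(fit)$r.squared      # proportion of variance explained

print(coef(fit))                         # intercept and slope
print(c(r = r, r_squared = r_squared, r_times_r = r^2))
```

The last line prints R² next to r², which agree up to floating-point error for this kind of model.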
Code
# Plot
showtext_auto()

ggplot(clean_data_numeric, aes(stress, hours)) +
  geom_point(alpha = 0.15, size = 3, colour = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, colour = "darkorchid4", fill = "violet") +
  labs(
    x = "Perceived stress",
    y = "Weekly study hours",
    title = "Reported Hours of Study Per Week vs Stress"
  ) +
  theme_bw(base_family = "mono") +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 30, colour = "#1f4e5f"),
    axis.title = element_text(face = "bold", size = 14),
    axis.text = element_text(face = "bold", size = 12)
  )
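Although the slope was statistically significant (p = 0.003), the effect size is negligible; with a sample this large, even a very small slope can reach significance. The residuals versus fitted values plot showed a wide scatter with no structure. The absence of structure suggests a straight-line fit is not obviously inappropriate, while the wide scatter is consistent with the model's very weak explanatory power.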
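The report describes a residuals versus fitted values plot without showing its code. A minimal sketch of how such a plot can be drawn is given below, assuming a model of `hours` on `stress`; it uses simulated data (`sim`), and the object names `diagnostics` and `residual_plot` are illustrative rather than taken from our analysis.

```r
library(ggplot2)

set.seed(7)
n_students <- 500

# Simulated stand-in for clean_data_numeric (invented values)
sim <- data.frame(stress = sample(0:10, n_students, replace = TRUE))
sim$hours <- 4.5 + 0.1 * sim$stress + rnorm(n_students, sd = 3)

fit <- lm(hours ~ stress, data = sim)

# Collect fitted values and residuals from the model
diagnostics <- data.frame(fitted = fitted(fit), residual = resid(fit))

# Residuals vs fitted: a patternless band around zero suggests no obvious
# departure from linearity; the height of the band reflects unexplained variation
residual_plot <- ggplot(diagnostics, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Fitted weekly study hours",
    y = "Residual (hours)",
    title = "Residuals vs fitted values"
  ) +
  theme_bw()

print(residual_plot)
```

Because stress takes only whole-number values from 0 to 10, the fitted values fall on at most eleven vertical strips; curvature or a funnel shape across those strips would be the patterns to look for.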
In practical terms, perceived stress does not meaningfully predict how much students study each week. For the client, this suggests that reducing stress alone is unlikely to change study behaviours; combining wellbeing support with strategies that build motivation and study skills may be more effective.
Articles
Naeem, I., Aparicio-Ting, F. E., & Dyjur, P. (2020). Student Stress and Academic Satisfaction: A Mixed Methods Exploratory Study. International Journal of Innovative Business Strategies, 6(1), 388–395. https://doi.org/10.20533/ijibs.2046.3626.2020.0050
Zubair, T., Qazi, U., Faisal, S. M., & Khalid Khan, A. (2024). The impact of study hours on academic performance: A statistical analysis of students’ grades. International Journal of Multidisciplinary Research and Growth Evaluation, 5(3), 720–728. https://doi.org/10.54660/.ijmrge.2024.5.3.720-728
Acknowledgements
AI Usage Statement
We acknowledge using generative AI tools, specifically ChatGPT 5, to help prepare this report. The tool was used to check grammar, improve the structure, and make our R code clearer. While it supported drafting and expression, all analysis and conclusions are our own work and judgment. ChatGPT also helped explain complex concepts in a straightforward way, making them easier to communicate and understand.
Meeting Schedule
| Date | Time | Attendance | Minutes |
|-------|---------------|-----------------------------------------------|---------|
| 05/09 | 13:00 - 14:00 | Astha, Brent, Kris, Jarif, Josephine | Brainstormed research questions; established a project timeline; allocated roles |
| 09/09 | 16:00 - 17:00 | Astha, Deevesh, Kris, Jarif | Reviewed progress on Research Question 1; changed Research Question 2; decided on the limitations and assumptions; planned tasks for the next week |
| 17/09 | 17:00 - 18:30 | Astha, Brent, Deevesh, Kris, Jarif, Josephine | Reviewed progress on Research Questions 1 and 2; worked on the professional standard of report and acknowledgements; wrote the executive summary; split tasks for the presentation; tabulated group contribution and meeting hours; compiled everything to R and completed final edit |
Group Contribution
| Task | Group Member Allocated |
|---------------------------------|----------------------------|
| Executive Summary | All |
| Exploratory Data Analysis | Astha, Jarif |
| Research Question 1 | Josephine, Kris |
| Research Question 2 | Deevesh, Kris, Josephine |
| Articles 1 and 2 | Astha, Brent, Josephine |
| Acknowledgement | All |
| Professional Standard of Report | Astha, Brent, Jarif |
| Presentation | All |
Professional Standard of Report
We adhered to the shared value of truthfulness and integrity by ensuring our analyses relied solely on the data, avoiding bias or preconceived conclusions. We also followed the ethical principle of transparency, exposing our data cleaning steps, displaying R code used for key findings, and documenting limitations so that our methods and results are clear and reproducible.
Source Code
---
title: "Project 1 DATA1001"
author: "550845168 ; 530511074 ; 550157094 ; 550358901 ; 550391160 ; 550603465"
format:
  html:
    theme: cosmo
    embed-resources: true
    toc: true
    code-fold: true
    code-tools: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(knitr)
library(showtext)
library(tidyr)
library(dplyr)
```