## Introduction

This project uses an open-source dataset from Kaggle, collected by MD Shariful:
https://www.kaggle.com/datasets/shariful07/student-mental-health/discussion/329152

The dataset consists of responses to a survey administered at IIUM (International Islamic University Malaysia). It contains 1001 responses across 16 columns, dated from 2020 to 2023.

In this project, I summarize the mental health characteristics of students across cohorts (Year 1 to Year 4) and explore what could explain the differences observed.

## Demographics

These visualizations describe the demographics of the survey participants: gender, access to mental health support, and academic year of study. Most participants were female, were in Year 1, and reported not having mental health support.
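The "most participants" statements can also be checked numerically. The following is a minimal sketch in base R, assuming the column names used in this report. The `toy` data frame and the `share()` helper are hypothetical, and the toy values are made up for illustration only; they are not survey results.

```r
# Hypothetical toy rows for illustration; substitute the survey data frame.
toy <- data.frame(
  Gender = c("Female", "Female", "Male", "Female"),
  YearOfStudy = c("year 1", "year 1", "year 2", "year 3"),
  stringsAsFactors = FALSE
)

# Share of each category as a percentage, rounded to one decimal place.
share <- function(x) round(100 * prop.table(table(x)), 1)

gender_share <- share(toy$Gender)
year_share <- share(toy$YearOfStudy)
print(gender_share)
print(year_share)
```

Reporting percentages alongside the bar charts makes a claim such as "most participants were female" easy to verify without reading bar heights.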

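# Packages used throughout this report
library(dplyr)
library(ggplot2)
library(plotly)

# `data` is assumed to already hold the survey responses.
# Normalise capitalised year labels so each cohort forms a single group.
data$YearOfStudy=recode(data$YearOfStudy,'Year 1'='year 1', 'Year 2'='year 2', 'Year 3'='year 3')
fig_dat1<-data %>% select(Course,Gender, HasMentalHealthSupport,YearOfStudy)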


ggplot(fig_dat1, aes(x=Gender,fill=Gender))+
  geom_bar()+
  scale_fill_hue(c=40)  +
  labs(x="Gender", y="Number of students", title="Figure 1A. Gender of Participants")+
  theme(legend.position="none")

fig_dat1$HasMentalHealthSupport=recode(fig_dat1$HasMentalHealthSupport, "1"="Yes", "0"="No")

ggplot(fig_dat1, aes(x=HasMentalHealthSupport,fill=HasMentalHealthSupport))+
  geom_bar()+
  scale_fill_hue(c=40)  +
  labs(x="Receives Mental Health Support", y="Number of students", title="Figure 1B. Number of students with Mental Health Support")+
  theme(legend.position="none")

ggplot(fig_dat1, aes(x=YearOfStudy,fill=YearOfStudy))+
  geom_bar()+
  scale_fill_hue(c=40)  +
  labs(x="Academic Year of Study", y="Number of students", title="Figure 1C. Academic Year of Study of Participants")+
  theme(legend.position="none")

## Study Hours and Frequency of Reported Mental Illness Symptoms, by Year of Study

These visualizations show how weekly study hours and the frequency of mental illness symptoms (episodes of depression, anxiety, or panic attacks) differ by academic year of study. Year 1 and Year 3 students appear to log the most study hours. Students in Year 1 and Year 4 are also the most likely to report frequent weekly symptoms.

The high occurrence of symptoms in Year 1 may result from the high number of weekly study hours these students log. This relationship does not hold for Year 4 students, who study less but report a similar frequency of symptoms. Other factors may drive this trend; one possibility is that impending graduation causes stress and leads to more episodes among Year 4 students.

The lower reported frequency of symptoms among Year 2 students (lower median, Q1, and Q3) may be due to more accessible mental health support (see Fig 2) as well as fewer hours spent studying.
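The median, Q1, and Q3 values read off a boxplot can also be tabulated directly. This is a rough sketch in base R, assuming the column names used in this report. The `toy` data frame and the `quartiles()` helper are hypothetical, and the toy values are made up for illustration only.

```r
# Hypothetical toy rows for illustration; substitute the survey data frame.
toy <- data.frame(
  YearOfStudy = rep(c("year 1", "year 2"), each = 5),
  SymptomFrequency_Last7Days = c(1, 2, 3, 4, 5, 0, 1, 1, 2, 3)
)

# Q1, median, and Q3 of a numeric vector, ignoring missing values.
quartiles <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(Q1 = q[1], Median = q[2], Q3 = q[3])
}

# One column per year of study, one row per statistic.
by_year <- sapply(
  split(toy$SymptomFrequency_Last7Days, toy$YearOfStudy),
  quartiles
)
print(by_year)
```

A table like this makes it possible to state by how much one cohort's quartiles differ from another's, rather than relying on visual comparison alone.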

fig_dat3<-data %>% select(YearOfStudy,StudyHoursPerWeek,SymptomFrequency_Last7Days, CGPA)

p1=ggplot(fig_dat3, aes(x=YearOfStudy, y=StudyHoursPerWeek))+
  geom_boxplot(fill="slateblue",alpha=0.2)+
  labs(y="Study Hours Per Week",x="Academic Year of Study", title="Figure 3A. Number of Hours Studied Weekly Over Academic Year of Study")

ggplotly(p1) %>%
  layout(margin = list(t = 80)) %>%
  config(displayModeBar = FALSE)

p2=ggplot(fig_dat3, aes(x=YearOfStudy, y=SymptomFrequency_Last7Days))+
  geom_boxplot(fill="slateblue",alpha=0.2)+
  labs(y="Symptom Frequency (every week)", x="Academic Year of Study", title="Figure 3B. Frequency of Symptoms Over Academic Year of Study")

ggplotly(p2) %>%
  layout(margin = list(t = 80)) %>%
  config(displayModeBar = FALSE)

## Exploratory Analyses of Study Hours, Student Performance, and Mental Health

The scatterplot with a LOESS fit shows a strong relationship between the number of hours studied per week and student performance, measured by GPA.
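One way to quantify the strength of such a relationship is a rank correlation, which does not assume linearity. The sketch below uses base R `cor()` with `method = "spearman"`. Spearman's coefficient is an illustrative choice here, not a statistic computed in this report, and the `toy` values are made up.

```r
# Hypothetical toy rows for illustration; substitute the survey data frame.
toy <- data.frame(
  StudyHoursPerWeek = c(2, 4, 6, 8, 10, 12),
  CGPA = c(2.6, 2.9, 2.8, 3.2, 3.4, 3.7)
)

# Spearman rank correlation; rows with missing values are dropped.
rho <- cor(toy$StudyHoursPerWeek, toy$CGPA,
           method = "spearman", use = "complete.obs")
print(rho)
```

A coefficient near 1 or -1 would support calling the relationship strong, while a value near 0 would suggest the smoothed curve is mostly noise.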

Notably, students who study more than 8 hours per week report a slightly higher mean frequency of weekly symptoms relating to mental illness.
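A threshold split needs an explicit rule for the boundary value. The sketch below uses base R `cut()` and places students at exactly 8 hours in the lower group, assuming that is the intended reading of "more than 8 hours". The `toy` values are made up for illustration only.

```r
# Hypothetical toy rows for illustration; substitute the survey data frame.
toy <- data.frame(
  StudyHoursPerWeek = c(5, 8, 9, 12, NA),
  SymptomFrequency_Last7Days = c(1, 2, 3, 4, 2)
)

# right = TRUE gives intervals (-Inf, 8] and (8, Inf], so exactly 8 is "<=8".
toy$StudyCut <- cut(
  toy$StudyHoursPerWeek,
  breaks = c(-Inf, 8, Inf),
  labels = c("<=8", ">8"),
  right = TRUE
)

# Mean symptom frequency per group; rows with missing hours fall in no group.
group_means <- tapply(
  toy$SymptomFrequency_Last7Days, toy$StudyCut, mean, na.rm = TRUE
)
print(group_means)
```

Using `cut()` with named labels keeps the group names aligned with the actual interval boundaries and leaves missing study hours as `NA` instead of assigning them to a group.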

fig_data4= data%>% select(StudyHoursPerWeek, SymptomFrequency_Last7Days,CGPA)
# Students at exactly 8 hours belong to the "<=8" group
fig_data4$StudyCut=ifelse(fig_data4$StudyHoursPerWeek>8, ">8","<=8")

ggplot(fig_data4, aes(x=StudyHoursPerWeek, y=CGPA))+
  geom_point()+
  geom_smooth(method="loess",se=TRUE)+
  labs(x="Study Hours Per Week", y="GPA", title="Relationship between GPA and Study Hours")
## `geom_smooth()` using formula = 'y ~ x'

summary_data4 <- fig_data4 %>%
  group_by(StudyCut) %>%
  summarise(MeanSymptoms = mean(SymptomFrequency_Last7Days, na.rm = TRUE))

ggplot(summary_data4, aes(x=StudyCut,y=MeanSymptoms))+
  geom_col(width=0.6)+
  labs(
    x="Study Hours Per Week",
    y="Mean Frequency of Symptoms")
