## Introduction
This project uses an open-source dataset from Kaggle, collected by MD Shariful:
https://www.kaggle.com/datasets/shariful07/student-mental-health/discussion/329152
The dataset consists of responses to a survey administered at IIUM (International Islamic University Malaysia). It contains 1001 responses across 16 columns, dated from 2020 to 2023.
In this project, I summarize the mental health characteristics of the students across cohorts (Year 1-4) and explore what might explain the observed differences.
These visualizations describe the demographics of the survey participants, summarizing gender, presence of mental health support, and academic year of study. Most participants were female, in Year 1, and reported not having mental health support.
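library(dplyr)    # select(), recode(), group_by(), summarise(), %>%
library(ggplot2)  # static plots
library(plotly)   # ggplotly(), layout(), config() for the interactive boxplots
# Normalize the capitalization of the year labels so each year forms a single group
data$YearOfStudy <- recode(data$YearOfStudy, 'Year 1'='year 1', 'Year 2'='year 2', 'Year 3'='year 3')
fig_dat1 <- data %>% select(Course, Gender, HasMentalHealthSupport, YearOfStudy)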
ggplot(fig_dat1, aes(x=Gender,fill=Gender))+
geom_bar()+
scale_fill_hue(c=40) +
labs(x="Gender", y="Number of students", title="Figure 1A. Gender of Participants")+
theme(legend.position="none")
fig_dat1$HasMentalHealthSupport=recode(fig_dat1$HasMentalHealthSupport, "1"="Yes", "0"="No")
ggplot(fig_dat1, aes(x=HasMentalHealthSupport,fill=HasMentalHealthSupport))+
geom_bar()+
scale_fill_hue(c=40) +
labs(x="Receives Mental Health Support", y="Number of students", title="Figure 1B. Number of students with Mental Health Support")+
theme(legend.position="none")
ggplot(fig_dat1, aes(x=YearOfStudy,fill=YearOfStudy))+
geom_bar()+
scale_fill_hue(c=40) +
labs(x="Academic Year of Study", y="Number of students", title="Figure 1C. Academic Year of Study of Participants")+
theme(legend.position="none")
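The statement that most participants were female, in Year 1, and without mental health support can also be checked numerically rather than read off the bar charts. The sketch below is illustrative only: the `toy` data frame and the `category_share()` helper are hypothetical stand-ins, not part of the original analysis, and the values are made up.

```r
# Toy data standing in for the survey; the values are invented for illustration.
toy <- data.frame(
  Gender = c("Female", "Female", "Male", "Female", "Male"),
  YearOfStudy = c("year 1", "year 1", "year 2", "year 3", "year 1"),
  HasMentalHealthSupport = c(0, 1, 0, 0, 0)
)

# Hypothetical helper: percentage of rows in each category, one decimal place.
category_share <- function(x) {
  round(100 * prop.table(table(x)), 1)
}

gender_share <- category_share(toy$Gender)
year_share <- category_share(toy$YearOfStudy)
support_share <- category_share(toy$HasMentalHealthSupport)

print(gender_share)
print(year_share)
print(support_share)
```

Applying the same helper to `fig_dat1$Gender`, `fig_dat1$YearOfStudy`, and `fig_dat1$HasMentalHealthSupport` would give the shares behind Figures 1A-1C.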
This visualization shows how the percentage of students with mental health support resources changes over the academic year of study. Students in the earlier academic years (Year 1 and Year 2) appear more likely to have mental health support resources.
fig_dat2<-data %>% select(YearOfStudy, HasMentalHealthSupport)
fig_dat2$YearOfStudy=recode(fig_dat2$YearOfStudy,'year 1'=1, 'year 2'=2, 'year 3'=3, 'year 4'=4)
# Count students with support and all respondents per year, then convert to a percentage.
# Respondents per year in this dataset: 412, 274, 240, 76
summary_data <- fig_dat2 %>%
  group_by(YearOfStudy) %>%
  summarise(SupportCount = sum(HasMentalHealthSupport, na.rm = TRUE), Total = n()) %>%
  mutate(Perc = (SupportCount / Total) * 100)
ggplot(summary_data, aes(x=YearOfStudy,y=Perc))+
geom_line()+
geom_point()+
labs(x="Academic Year of Study", y="Percentage with mental health support",title="Figure 2. Mental Health Support Over Year of Study")+
ylim(0,20)
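The percentage plotted in Figure 2 is the number of students reporting support divided by the number of respondents in that year. A minimal base R sketch of that arithmetic follows, using a made-up `toy` data frame (an assumption for illustration, not the survey data). As in the analysis above, the denominator counts every respondent in the year, including any with a missing support answer.

```r
# Toy data: YearOfStudy already recoded to 1-4; values are invented.
toy <- data.frame(
  YearOfStudy = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  HasMentalHealthSupport = c(1, 0, 0, 0, 1, 0, 0, 0, 0, NA)
)

# Numerator: students reporting support (missing answers ignored).
support_count <- tapply(toy$HasMentalHealthSupport, toy$YearOfStudy, sum, na.rm = TRUE)
# Denominator: all respondents in the year, whether or not they answered.
total <- tapply(toy$HasMentalHealthSupport, toy$YearOfStudy, length)

summary_toy <- data.frame(
  YearOfStudy = as.numeric(names(total)),
  SupportCount = as.vector(support_count),
  Total = as.vector(total)
)
summary_toy$Perc <- 100 * summary_toy$SupportCount / summary_toy$Total

print(summary_toy)
```

Because the year groups differ a lot in size, a percentage for a small group can move noticeably with only a few responses, which is worth keeping in mind when reading the trend line.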
These visualizations show the differences in study hours per week and frequency of mental illness symptoms (episodes of depression, anxiety, or panic attacks) across academic years of study. Year 1 and Year 3 students appear to log the most study hours. Students in Year 1 and Year 4 are also the most likely to report a higher weekly occurrence of symptoms.
The high occurrence of symptoms in Year 1 may result from the high number of weekly study hours for these students. This relationship does not hold for Year 4 students, who study less but report a similar frequency of symptoms. Other factors may be driving this trend; one possibility is that impending graduation leads to stress and more episodes for Year 4 students.
The low reported frequency of symptoms in Year 2 students (lower median, Q1, and Q3) may be due to the accessibility of mental health support (see Figure 2) and to fewer hours spent studying.
fig_dat3<-data %>% select(YearOfStudy,StudyHoursPerWeek,SymptomFrequency_Last7Days, CGPA)
p1=ggplot(fig_dat3, aes(x=YearOfStudy, y=StudyHoursPerWeek))+
geom_boxplot(fill="slateblue",alpha=0.2)+
labs(y="Study Hours Per Week",x="Academic Year of Study", title="Figure 3A. Number of Hours Studied Weekly Over Academic Year of Study")
ggplotly(p1) %>%
layout(margin = list(t = 80)) %>%
config(displayModeBar = FALSE)
p2=ggplot(fig_dat3, aes(x=YearOfStudy, y=SymptomFrequency_Last7Days))+
geom_boxplot(fill="slateblue",alpha=0.2)+
labs(y="Symptom Frequency (every week)", x="Academic Year of Study", title="Figure 3B. Frequency of Symptoms Over Academic Year of Study")
ggplotly(p2) %>%
layout(margin = list(t = 80)) %>%
config(displayModeBar = FALSE)
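The comparison of medians, Q1, and Q3 between years can be made explicit with a small table of quartiles per year. The sketch below uses base R and a made-up `toy` data frame (an assumption for illustration); `quantile()` is called with its default settings, so the numbers should be close to what the boxplots display.

```r
# Toy data: two year groups with invented symptom counts.
toy <- data.frame(
  YearOfStudy = rep(c("year 1", "year 2"), each = 5),
  SymptomFrequency_Last7Days = c(1, 2, 3, 4, 5, 0, 1, 1, 2, 3)
)

# Split the symptom counts by year and compute Q1, median, and Q3 for each group.
quartiles_by_year <- t(sapply(
  split(toy$SymptomFrequency_Last7Days, toy$YearOfStudy),
  quantile,
  probs = c(0.25, 0.5, 0.75),
  na.rm = TRUE
))
colnames(quartiles_by_year) <- c("Q1", "Median", "Q3")

print(quartiles_by_year)
```

Running the same code on `fig_dat3` would give one row per academic year, which makes statements such as "lower median, Q1, and Q3" directly checkable.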
There is a strong relationship between the number of hours studied and student performance as measured by GPA.
Notably, students who study more than 8 hours per week report a slightly higher mean frequency of weekly symptoms relating to mental illness.
fig_data4 <- data %>% select(StudyHoursPerWeek, SymptomFrequency_Last7Days, CGPA)
# Students with exactly 8 hours fall in the lower group, so label it "<=8"
fig_data4$StudyCut <- ifelse(fig_data4$StudyHoursPerWeek > 8, ">8", "<=8")
ggplot(fig_data4, aes(x=StudyHoursPerWeek, y=CGPA))+
geom_point()+
geom_smooth(method="loess",se=TRUE)+
labs(x="Study Hours Per Week", y="GPA", title="Relationship between GPA and Study Hours")
summary_data4 <- fig_data4 %>%
group_by(StudyCut) %>%
summarise(MeanSymptoms = mean(SymptomFrequency_Last7Days, na.rm = TRUE))
ggplot(summary_data4, aes(x=StudyCut,y=MeanSymptoms))+
geom_col(width=0.6)+
labs(
x="Study Hours Per Week",
y="Mean Frequency of Symptoms")
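The strength of the relationship between study hours and GPA, and the gap in mean symptom frequency between the two study-hour groups, can both be expressed as numbers. The sketch below is one possible way to do that and is not the method used in the analysis above: it assumes a Spearman rank correlation is an acceptable summary, and it runs on a made-up `toy` data frame rather than the survey data.

```r
# Toy data with invented values, using the same column names as the survey.
toy <- data.frame(
  StudyHoursPerWeek = c(2, 4, 6, 8, 10, 12, 14, 16),
  CGPA = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8),
  SymptomFrequency_Last7Days = c(1, 2, 1, 2, 3, 2, 4, 3)
)

# Spearman rank correlation: assumes only a monotonic relationship, not a linear one.
rho <- cor(toy$StudyHoursPerWeek, toy$CGPA, method = "spearman")

# Split at 8 hours; students with exactly 8 hours fall in the lower group.
toy$StudyCut <- ifelse(toy$StudyHoursPerWeek > 8, ">8", "<=8")
mean_symptoms <- tapply(toy$SymptomFrequency_Last7Days, toy$StudyCut, mean, na.rm = TRUE)

print(rho)
print(mean_symptoms)
```

A correlation coefficient alongside the scatter plot makes "strong" a checkable claim, and reporting the two group means side by side shows how large "slightly higher" actually is.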