Key Information: The survey covers 100 students, male and female, across all years of study and all grade levels. Most are doing well academically. Most look fine on the outside.
Source: Kaggle
Objective: To understand how mental health affects students and their academic performance.
Which mental health problem is most common among students?
Do female students report more mental health issues than male students?
Do students with mental health issues have lower grades?
How many students with mental health issues actually get help?
Which year of study contributes the most students to the survey?
# Load the tidyverse package - this gives us tools for cleaning and visualizing data
library(tidyverse)
# Load skimr - this gives us nice summaries of our dataset
library(skimr)
# Load ggplot2 - this is our main graphing tool (actually included in tidyverse)
library(ggplot2)
# Load scales - this helps format numbers nicely (like percentages)
library(scales)
# Load RColorBrewer - this gives us nice color palettes for graphs
library(RColorBrewer)
# Load plotly - this makes our graphs interactive (hover to see details)
library(plotly)
# Load knitr - this helps create nice tables
library(knitr)
# Load kableExtra - this makes tables look even better
library(kableExtra)
mental_health <- read_csv("C:/Users/PC/Downloads/Student Mental health (1).csv")
# Display the dataset
mental_health
## # A tibble: 101 x 11
## Timestamp `Choose your gender` Age `What is your course?`
## <chr> <chr> <dbl> <chr>
## 1 8/7/2020 12:02 Female 18 Engineering
## 2 8/7/2020 12:04 Male 21 Islamic education
## 3 8/7/2020 12:05 Male 19 BIT
## 4 8/7/2020 12:06 Female 22 Laws
## 5 8/7/2020 12:13 Male 23 Mathemathics
## 6 8/7/2020 12:31 Male 19 Engineering
## 7 8/7/2020 12:32 Female 23 Pendidikan islam
## 8 8/7/2020 12:33 Female 18 BCS
## 9 8/7/2020 12:35 Female 19 Human Resources
## 10 8/7/2020 12:39 Male 18 Irkhs
## # i 91 more rows
## # i 7 more variables: `Your current year of Study` <chr>,
## # `What is your CGPA?` <chr>, `Marital status` <chr>,
## # `Do you have Depression?` <chr>, `Do you have Anxiety?` <chr>,
## # `Do you have Panic attack?` <chr>,
## # `Did you seek any specialist for a treatment?` <chr>
# A preview of the first few values in each column
glimpse(mental_health)
## Rows: 101
## Columns: 11
## $ Timestamp <chr> "8/7/2020 12:02", "8/7/~
## $ `Choose your gender` <chr> "Female", "Male", "Male~
## $ Age <dbl> 18, 21, 19, 22, 23, 19,~
## $ `What is your course?` <chr> "Engineering", "Islamic~
## $ `Your current year of Study` <chr> "year 1", "year 2", "Ye~
## $ `What is your CGPA?` <chr> "3.00 - 3.49", "3.00 - ~
## $ `Marital status` <chr> "No", "No", "No", "Yes"~
## $ `Do you have Depression?` <chr> "Yes", "No", "Yes", "Ye~
## $ `Do you have Anxiety?` <chr> "No", "Yes", "Yes", "No~
## $ `Do you have Panic attack?` <chr> "Yes", "No", "Yes", "No~
## $ `Did you seek any specialist for a treatment?` <chr> "No", "No", "No", "No",~
# Remove the first column (Timestamp) because we don't need it
mental_health <- mental_health[, -1]
# Check that the timestamp is gone
glimpse(mental_health)
## Rows: 101
## Columns: 10
## $ `Choose your gender` <chr> "Female", "Male", "Male~
## $ Age <dbl> 18, 21, 19, 22, 23, 19,~
## $ `What is your course?` <chr> "Engineering", "Islamic~
## $ `Your current year of Study` <chr> "year 1", "year 2", "Ye~
## $ `What is your CGPA?` <chr> "3.00 - 3.49", "3.00 - ~
## $ `Marital status` <chr> "No", "No", "No", "Yes"~
## $ `Do you have Depression?` <chr> "Yes", "No", "Yes", "Ye~
## $ `Do you have Anxiety?` <chr> "No", "Yes", "Yes", "No~
## $ `Do you have Panic attack?` <chr> "Yes", "No", "Yes", "No~
## $ `Did you seek any specialist for a treatment?` <chr> "No", "No", "No", "No",~
Why we do this: The timestamp column just tells us when the survey was filled out. We don’t need it for our analysis.
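A minimal sketch of the same step done by column name rather than by position, which keeps working if the column order in the CSV ever changes. The data frame `survey_demo` and its values are made up for illustration; only the `Timestamp` column name mirrors the dataset.

```r
library(dplyr)

# Hypothetical two-row stand-in for the survey data
survey_demo <- data.frame(
  Timestamp = c("8/7/2020 12:02", "8/7/2020 12:04"),
  Age       = c(18, 21),
  Gender    = c("Female", "Male")
)

# Drop the column by name instead of by index
survey_no_time <- survey_demo %>% select(-Timestamp)

colnames(survey_no_time)
```

Removing by name also makes the intent readable without checking which column happens to be first.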
# First, check what the current column names are
colnames(mental_health)
## [1] "Choose your gender"
## [2] "Age"
## [3] "What is your course?"
## [4] "Your current year of Study"
## [5] "What is your CGPA?"
## [6] "Marital status"
## [7] "Do you have Depression?"
## [8] "Do you have Anxiety?"
## [9] "Do you have Panic attack?"
## [10] "Did you seek any specialist for a treatment?"
# Create a list of new, cleaner column names
clean_names <- c("Gender",
"Age",
"Course",
"Year_of_Study",
"CGPA",
"Marital_Status",
"Depression",
"Anxiety",
"Panic_Attack",
"Treatment")
# Apply these new names to our dataset
colnames(mental_health) <- clean_names
# Show the first 6 rows to verify the new names worked
head(mental_health)
## # A tibble: 6 x 10
## Gender Age Course Year_of_Study CGPA Marital_Status Depression Anxiety
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Female 18 Engineering year 1 3.00~ No Yes No
## 2 Male 21 Islamic ed~ year 2 3.00~ No No Yes
## 3 Male 19 BIT Year 1 3.00~ No Yes Yes
## 4 Female 22 Laws year 3 3.00~ Yes Yes No
## 5 Male 23 Mathemathi~ year 4 3.00~ No No No
## 6 Male 19 Engineering Year 2 3.50~ No No No
## # i 2 more variables: Panic_Attack <chr>, Treatment <chr>
Why we do this: Clean, short column names are easier to type and read in code.
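As an alternative sketch, `rename()` from dplyr maps each old name to its new name explicitly, so a change in column order cannot silently mislabel a column. The data frame `survey_demo` below is a hypothetical stand-in that reuses two of the original column names.

```r
library(dplyr)

# Hypothetical stand-in with two of the original survey column names
survey_demo <- data.frame(
  `Choose your gender`      = c("Female", "Male"),
  `Do you have Depression?` = c("Yes", "No"),
  check.names = FALSE  # keep the spaces and "?" in the names
)

# rename(new_name = old_name): each mapping is explicit
survey_renamed <- survey_demo %>%
  rename(
    Gender     = `Choose your gender`,
    Depression = `Do you have Depression?`
  )

colnames(survey_renamed)
```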
# Convert every text (character) column to a factor (a categorical variable)
mental_health[sapply(mental_health, is.character)] <-
lapply(mental_health[sapply(mental_health, is.character)], as.factor)
# Check the structure to see if it worked
str(mental_health)
## tibble [101 x 10] (S3: tbl_df/tbl/data.frame)
## $ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 1 1 2 ...
## $ Age : num [1:101] 18 21 19 22 23 19 23 18 19 18 ...
## $ Course : Factor w/ 48 levels "Accounting","ALA",..: 18 25 9 36 39 18 42 4 22 24 ...
## $ Year_of_Study : Factor w/ 7 levels "year 1","Year 1",..: 1 3 2 5 7 4 3 1 4 1 ...
## $ CGPA : Factor w/ 5 levels "0 - 1.99","2.00 - 2.49",..: 4 4 4 4 4 5 5 5 3 5 ...
## $ Marital_Status: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 1 1 ...
## $ Depression : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
## $ Anxiety : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 2 1 2 ...
## $ Panic_Attack : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 2 2 1 1 2 ...
## $ Treatment : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
# skim() gives us a comprehensive summary of our dataset
skim(mental_health)
| Name | mental_health |
| Number of rows | 101 |
| Number of columns | 10 |
| _______________________ | |
| Column type frequency: | |
| factor | 9 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Gender | 0 | 1 | FALSE | 2 | Fem: 75, Mal: 26 |
| Course | 0 | 1 | FALSE | 48 | BCS: 18, Eng: 17, BIT: 10, Bio: 4 |
| Year_of_Study | 0 | 1 | FALSE | 7 | yea: 41, Yea: 19, Yea: 16, yea: 10 |
| CGPA | 0 | 1 | FALSE | 5 | 3.5: 48, 3.0: 43, 0 -: 4, 2.5: 4 |
| Marital_Status | 0 | 1 | FALSE | 2 | No: 85, Yes: 16 |
| Depression | 0 | 1 | FALSE | 2 | No: 66, Yes: 35 |
| Anxiety | 0 | 1 | FALSE | 2 | No: 67, Yes: 34 |
| Panic_Attack | 0 | 1 | FALSE | 2 | No: 68, Yes: 33 |
| Treatment | 0 | 1 | FALSE | 2 | No: 95, Yes: 6 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 1 | 0.99 | 20.53 | 2.5 | 18 | 18 | 19 | 23 | 24 | ▇▁▁▁▆ |
# Drop rows with missing values (one row has no Age)
mental_health <- na.omit(mental_health)
# Count how many missing values remain (should be 0)
sum(is.na(mental_health))
## [1] 0
summary(mental_health)
## Gender Age Course Year_of_Study
## Female:75 Min. :18.00 BCS :18 year 1:40
## Male :25 1st Qu.:18.00 Engineering :17 Year 1: 2
## Median :19.00 BIT : 9 year 2:10
## Mean :20.53 Biomedical science: 4 Year 2:16
## 3rd Qu.:23.00 KOE : 4 year 3: 5
## Max. :24.00 BENL : 2 Year 3:19
## (Other) :46 year 4: 8
## CGPA Marital_Status Depression Anxiety Panic_Attack Treatment
## 0 - 1.99 : 3 No :84 No :65 No :66 No :67 No :94
## 2.00 - 2.49: 2 Yes:16 Yes:35 Yes:34 Yes:33 Yes: 6
## 2.50 - 2.99: 4
## 3.00 - 3.49:43
## 3.50 - 4.00:48
##
##
# na.rm = TRUE means "ignore missing values when calculating the median"
median_age <- median(mental_health$Age, na.rm = TRUE)
mental_health <- mental_health %>%
mutate(
# replace_na() replaces any NA in Age with median_age
# (na.omit() above already dropped the one row with a missing Age,
# so this is a safeguard and changes nothing here)
Age = replace_na(Age, median_age),
# Store Age as a whole number
Age = as.integer(Age)
)
str(mental_health)
## tibble [100 x 10] (S3: tbl_df/tbl/data.frame)
## $ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 1 1 2 ...
## $ Age : int [1:100] 18 21 19 22 23 19 23 18 19 18 ...
## $ Course : Factor w/ 48 levels "Accounting","ALA",..: 18 25 9 36 39 18 42 4 22 24 ...
## $ Year_of_Study : Factor w/ 7 levels "year 1","Year 1",..: 1 3 2 5 7 4 3 1 4 1 ...
## $ CGPA : Factor w/ 5 levels "0 - 1.99","2.00 - 2.49",..: 4 4 4 4 4 5 5 5 3 5 ...
## $ Marital_Status: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 1 1 ...
## $ Depression : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
## $ Anxiety : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 2 1 2 ...
## $ Panic_Attack : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 2 2 1 1 2 ...
## $ Treatment : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "na.action")= 'omit' Named int 44
## ..- attr(*, "names")= chr "44"
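The chunks above use two different strategies for the missing Age: `na.omit()` drops the row, and `replace_na()` would fill it with the median. A small sketch of how the two differ, written in base R on made-up ages (the `ages` data frame is hypothetical, not the survey data):

```r
# Hypothetical ages with one missing value
ages <- data.frame(id = 1:5, Age = c(18, 19, NA, 23, 24))

# Strategy 1: drop incomplete rows (the row count shrinks)
dropped <- na.omit(ages)

# Strategy 2: keep every row and fill the gap with the median
imputed <- ages
imputed$Age[is.na(imputed$Age)] <- median(ages$Age, na.rm = TRUE)

nrow(dropped)   # 4 rows remain
nrow(imputed)   # all 5 rows remain
imputed$Age[3]  # 21, the median of 18, 19, 23, 24
```

Only one strategy is needed in practice. With a single missing value out of 101 rows, either choice should have little effect on the results.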
# The year labels mix "year 1" and "Year 1", so make them all consistent
mental_health <- mental_health %>%
mutate(
Year_of_Study = case_when( # case_when() = "In case of these situations, do this"
# str_detect() looks for patterns in text
# If Year_of_Study contains "1" anywhere, change it to "Year 1"; the same applies to 2, 3 and 4
str_detect(Year_of_Study, "1") ~ "Year 1",
str_detect(Year_of_Study, "2") ~ "Year 2",
str_detect(Year_of_Study, "3") ~ "Year 3",
str_detect(Year_of_Study, "4") ~ "Year 4",
# TRUE means "for everything else, keep it as is"
# as.character() keeps every branch the same type (Year_of_Study is a factor here)
TRUE ~ as.character(Year_of_Study)
),
# factor() tells R to treat this as a category with a specific order
# levels = c(...) defines the order: Year 1 to Year 4
Year_of_Study = factor(Year_of_Study,
levels = c("Year 1", "Year 2", "Year 3", "Year 4"))
)
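If the labels differ only in capitalisation, as they appear to here, a shorter alternative is to normalise the case directly. This is a sketch on a made-up vector (`raw_years` is hypothetical), assuming no other spelling variants exist:

```r
library(stringr)

# Hypothetical raw answers with inconsistent capitalisation
raw_years <- c("year 1", "Year 1", "year 3", "Year 2")

# str_to_title() capitalises the first letter of each word
clean_years <- str_to_title(raw_years)

# Fix the level order explicitly so plots run Year 1 to Year 4
year_factor <- factor(clean_years,
                      levels = c("Year 1", "Year 2", "Year 3", "Year 4"))

table(year_factor)
```

Unlike matching on a digit, this would not relabel an unexpected value such as a hypothetical "Year 12" as "Year 1". It would become `NA` in the factor and be easy to spot.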
mental_health_cleaned <- mental_health %>%
mutate(
Course_Category = case_when(
# STEM fields:
# regex() = pattern matching tool
# ignore_case = TRUE means "Engin" matches "engin", "Engin", "ENGIN"
str_detect(Course, regex("Engin|Engine|BCS|BIT|Math|Bio|Marine|CTS|IT|KOE",
ignore_case = TRUE)) ~ "STEM",
# Social Sciences:
str_detect(Course, regex("Psych|Human|Comm|IRKHS|Ala|BENL|Malcom|Kirk|Dipl|Taasl",
ignore_case = TRUE)) ~ "Social Sciences",
# Law:
str_detect(Course, regex("Law", ignore_case = TRUE)) ~ "Law",
# Religious Studies:
str_detect(Course, regex("Islam|Pendidikan|Usul", ignore_case = TRUE)) ~ "Religious Studies",
# Business/Finance:
str_detect(Course, regex("Acc|Bank|Busin|Econ|Fiqh", ignore_case = TRUE)) ~ "Business/Finance",
# Everything else goes into "Other"
TRUE ~ "Other"
)
)
# Remove any rows that still have missing values
mental_health_final <- mental_health_cleaned %>%
na.omit()
# Count how many students in each category
# mutate() adds a percentage column
mental_health_final %>%
count(Course_Category, sort = TRUE) %>%
mutate(Percentage = (n / sum(n)) * 100) %>% # Calculate percentage
print(n=Inf) # Print all rows
## # A tibble: 6 x 3
## Course_Category n Percentage
## <chr> <int> <dbl>
## 1 STEM 62 62
## 2 Social Sciences 17 17
## 3 Business/Finance 6 6
## 4 Other 6 6
## 5 Religious Studies 6 6
## 6 Law 3 3
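One thing to watch in the patterns above: a short alternative such as `IT` with `ignore_case = TRUE` matches any course name that merely contains the letters "it". The sketch below shows the effect and one possible fix using word boundaries. The course name "Literature" is hypothetical and is not in the survey.

```r
library(stringr)

# Hypothetical course names; "Literature" is not in the survey
courses <- c("IT", "Literature", "BIT")

# Bare "IT" with ignore_case = TRUE matches any text containing "it"
loose <- str_detect(courses, regex("IT", ignore_case = TRUE))

# \\b marks a word boundary, so only the standalone word "IT" matches
strict <- str_detect(courses, regex("\\bIT\\b", ignore_case = TRUE))

data.frame(courses, loose, strict)
```

In the original pattern "BIT" is still caught by its own alternative, so tightening `IT` this way would not drop it from the STEM group.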
# Create a color palette that we'll use consistently across all graphs
mental_health_colors <- c(
"Depression" = "#e74c3c", # Red
"Anxiety" = "#3498db", # Blue
"Panic_Attack" = "#9b59b6", # Purple
"Yes" = "#2ecc71", # Green
"No" = "#95a5a6" # Gray
)
gender_plot <- mental_health %>%
count(Gender) %>%
# ggplot() creates a graph
# aes() = "aesthetics" = what goes where
ggplot(aes(x = Gender, y = n, fill = Gender)) +
# geom_col() creates the bars
geom_col() +
# labs() adds labels to the graph
labs(title = "How Many Students Are Male or Female?",
y = "Number of Students") +
# scale_fill_viridis_d() gives us a nice color scheme
scale_fill_viridis_d()
# ggplotly() converts a static ggplot graph to an interactive one
ggplotly(gender_plot, tooltip = c("x", "y")) %>%
# layout() customizes the appearance
layout(title = list(text = "Meet the Students<br><sup>Click and drag to zoom</sup>"))
Story it tells
First, let me introduce our 100 participants. Keep this ratio in mind, because it matters for how we interpret the rest of the data.
We have far more female respondents (75) than male (25). This is common in mental health research, where women are often more willing to take part in surveys about emotional well-being.
# 1. Create the plot with a single color and no top labels
age_plot <- mental_health %>%
ggplot(aes(x = factor(Age))) +
# geom_bar with factor(Age) keeps them separated
# width = 0.7 adds a nice gap between bars
geom_bar(fill = "skyblue", color = "darkblue", width = 0.7) +
# Clean theme
theme_minimal() +
# Labels (Removed subtitle to keep it clean)
labs(title = "How Old Are the Students?",
x = "Age",
y = "Number of Students")
# 2. Make it interactive
# tooltip = c("x", "y") ensures you still see the data when hovering
ggplotly(age_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "How Old Are the Students?"))
Story it tells
Now, how old are the students who took part in our survey?
Research suggests that about 75% of mental health conditions first emerge by the mid-20s. We’re surveying students aged 18 to 24, a life stage when mental health issues typically begin.
So we’re looking at a population at peak vulnerability for mental health struggles.
mh_counts <- mental_health %>%
# select() picks specific columns
select(Depression, Anxiety, Panic_Attack) %>%
pivot_longer(everything(), names_to = "Condition", values_to = "Reported") %>%
# filter() keeps only rows where condition is true
filter(Reported == "Yes") %>%
# count() how many of each condition
count(Condition) %>%
# arrange() sorts the data
arrange(desc(n))
# Now create the graph
mh_count_plot <- mh_counts %>%
# reorder(Condition, -n) sorts conditions by count
# -n means negative n (sorts from high to low)
ggplot(aes(x = reorder(Condition, -n), y = n, fill = Condition)) +
# Draw the bars
geom_col() +
# Add count labels on top of each bar
geom_text(aes(label = n), vjust = -0.3, size = 5, fontface = "bold") +
# Add titles
labs(title = "Number of Students with Mental Health Issues",
x = "Mental Health Issue",
y = "Number of Students") +
# Use our custom colors
scale_fill_manual(values = mental_health_colors)
# Make Interactive
ggplotly(mh_count_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "Number of Students with Mental Health Issues<br><sup>About 1 in 3 students report each issue</sup>"))
Story it tells
Look at these numbers:
35 students said yes to depression
34 students said yes to anxiety
33 students said yes to panic attacks
Notice something? These numbers are almost identical. That could be a pattern: the same students may be experiencing all three conditions. The totals alone cannot confirm this, though, because three different groups of students could produce the same counts.
Takeaway: About 1 in 3 students reports each condition, often while sitting next to you in class, smiling like everything is fine.
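Whether the same students report all three conditions can be checked by counting conditions per student. The sketch below does this on six made-up students; the `demo` data frame is hypothetical, and only the column names follow the cleaned dataset.

```r
library(dplyr)

# Hypothetical Yes/No answers for six students (not the survey data)
demo <- data.frame(
  Depression   = c("Yes", "Yes", "No",  "No",  "Yes", "No"),
  Anxiety      = c("Yes", "No",  "Yes", "No",  "Yes", "No"),
  Panic_Attack = c("Yes", "No",  "No",  "Yes", "No",  "No")
)

overlap <- demo %>%
  # Count how many of the three conditions each student reported
  mutate(n_conditions = (Depression == "Yes") +
                        (Anxiety == "Yes") +
                        (Panic_Attack == "Yes")) %>%
  # Tally students by number of conditions (0, 1, 2 or 3)
  count(n_conditions, name = "students")

overlap
```

Running the same pipeline on `mental_health` instead of `demo` would show how many respondents report all three conditions and how many report only one.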
CGPA_order <- c('0 - 1.99', # Failing
'2.00 - 2.49', # Struggling
'2.50 - 2.99', # Just passing
'3.00 - 3.49', # Good
'3.50 - 4.00') # Excellent
# Count students in each CGPA range
CGPA_distribution_data <- mental_health %>%
# count(CGPA) counts students in each CGPA range
count(CGPA) %>%
# factor() with levels = ensures correct order
mutate(CGPA = factor(CGPA, levels = CGPA_order))
# Create the graph
CGPA_plot <- CGPA_distribution_data %>%
ggplot(aes(x = CGPA, y = n, fill = CGPA)) +
# Draw bars
geom_col() +
# Add count labels on bars
geom_text(aes(label = n), vjust = -0.5, size = 5, fontface = "bold") +
# Add titles
labs(title = "How Many Students in Each Grade Range?",
x = "CGPA (Grade Point Average)",
y = "Number of Students") +
# Use minimal theme (clean white background)
theme_minimal() +
# Rotate x-axis labels 45 degrees so they don't overlap
# angle = 45 rotates text
# hjust = 1 aligns text properly after rotation
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Make interactive
ggplotly(CGPA_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "How Many Students in Each Grade Range?<br><sup>Most students have good grades (above 3.0)</sup>"))
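# Share of each gender among the students who reported anxiety
mental_health %>%
filter(Anxiety == "Yes") %>%
count(Gender, name = "Total") %>%
# Divide by the total number of anxious students instead of hard-coding 34
mutate(percent = Total / sum(Total) * 100) %>%
select(Gender, percent)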
## # A tibble: 2 x 2
## Gender percent
## <fct> <dbl>
## 1 Female 70.6
## 2 Male 29.4
Story it tells
Only 3 students are in the failing range (0-1.99 GPA)
Only 2 students are barely passing (2.00-2.49)
4 students are at 2.50-2.99
But look at these tall bars:
43 students have GPAs of 3.00-3.49 (solid B+ to A- average)
48 students have GPAs of 3.50-4.00 (straight A’s)
91 out of 100 students have good grades. Almost half have near-perfect GPAs.
Age_boxplot <- mental_health %>%
# aes(y = Age) puts Age on the y-axis (vertical)
#
ggplot(aes(y = Age)) +
# geom_boxplot() creates the box plot
geom_boxplot(fill = "lightblue", color = "darkblue") +
# Add labels
labs(title = "Age Range of Students", y = "Age") +
# Clean theme
theme_minimal()
# Make interactive
ggplotly(Age_boxplot, tooltip = c("y")) %>%
layout(title = list(text = "Age Range of Students<br><sup>Most students are 18-23 years old</sup>"))
What a box plot shows:
The box: the middle half of our students fall between ages 18 and 23.
The dark blue line inside the box: the median, the middle point. Half the students are younger than this age, half are older.
The whiskers: these show the full range of ages, from 18 to 24.
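The numbers behind a box plot can be computed directly with base R. This sketch uses ten made-up ages (the `ages` vector is hypothetical) to show which quantile corresponds to which part of the plot:

```r
# Hypothetical ages, chosen only to illustrate the calculation
ages <- c(18, 18, 18, 19, 19, 20, 23, 23, 24, 24)

# quantile() returns the values a box plot is built from:
# 0% and 100% = minimum and maximum, 25% and 75% = box edges,
# 50% = the median line inside the box
box_stats <- quantile(ages, probs = c(0, 0.25, 0.5, 0.75, 1))

box_stats
IQR(ages)  # height of the box: 75th percentile minus 25th percentile
```

By default `geom_boxplot()` draws the whiskers only as far as the most extreme points within 1.5 times the IQR of the box, so they reach the minimum and maximum only when there are no outliers.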
# Make sure there are no missing values in Marital_Status
mental_health_pie <- mental_health %>%
# So we keep rows where Marital_Status is not missing
filter(!is.na(Marital_Status))
# Calculate counts and percentages for the pie chart
marital_status_counts <- mental_health_pie %>%
# Count how many of each marital status
count(Marital_Status) %>%
# Add two new columns
mutate(
# Calculate percentage: (count ÷ total) × 100
percentage = n / sum(n) * 100,
# Create a label combining status and percentage
# paste0() combines text without spaces
label = paste0(Marital_Status, "\n", round(percentage, 1), "%")
)
pie_chart <- ggplot(marital_status_counts,
# fill=Marital_Status means color by status
aes(x="", y=n, fill = Marital_Status)) +
# stat = "identity" means use the values as-is (don't count)
# width = 1 makes the bar full width
geom_bar(stat = "identity", width = 1) +
# coord_polar() converts bar chart to pie chart!
coord_polar("y", start = 0) +
# theme_void() removes axes, gridlines and background (clean pie chart)
theme_void() +
# Add labels
labs(title = "Are Students Married?",
fill = "Marital Status") +
geom_text(aes(label = label),
position = position_stack(vjust = 0.5)) +
# Use our custom colors
scale_fill_manual(values = c("No" = "#95a5a6", "Yes" = "#2ecc71"))
pie_chart
Story it tells: The vast majority of students are not married. Only 16 out of 100 are married.
Takeaway: 84% of our survey participants are unmarried, so relationship status is unlikely to be the main factor behind the mental health patterns we see, although this chart alone cannot test that.
# Calculate distribution across years
year_distribution <- mental_health %>%
# Count students in each year
count(Year_of_Study) %>%
# Add percentage and label columns
mutate(
# Calculate percentage
Percentage = n / sum(n) * 100,
# Create a nice label showing count and percentage
# paste0() combines text without spaces
Label = paste0(n, " students\n(", round(Percentage, 1), "%)")
)
# Create the graph
year_plot <- ggplot(year_distribution,
aes(x = Year_of_Study, y = n, fill = Year_of_Study)) +
# geom_col() creates bars
# width = 0.7 makes bars slightly narrower (leaving gap between them)
geom_col(width = 0.7) +
# Add labels on top of bars
geom_text(aes(label = Label), vjust = -0.5, size = 4, fontface = "bold") +
# Add titles and labels
labs(
title = "Which Year Groups Took This Survey?",
subtitle = "This shows survey participation, not student dropout",
x = "Year of Study",
y = "Number of Students Who Took Survey"
) +
# palette = "Set2" is a predefined color scheme from RColorBrewer
scale_fill_brewer(palette ="Set2") +
# Expand y-axis to make room for labels on top
# mult = c(0, 0.15) means 0% extra at bottom, 15% extra at top
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
# Clean theme
theme_minimal() +
# Customize specific elements
theme(
# Make title bold and bigger
plot.title = element_text(face = "bold", size = 16),
# Make x-axis text bold, size 12, not rotated
axis.text.x = element_text(size = 12, face = "bold"),
# Hide legend (since x-axis labels are self-explanatory)
legend.position = "none"
)
# Make interactive
ggplotly(year_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "Which Year Groups Took This Survey?<br><sup>More Year 1 students participated in this survey</sup>"))
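Story it tells
Now let’s look at which students took part in our survey.
Year 1 = 42 responses (the most)
Year 2 = 26 (16 fewer than Year 1)
Year 3 = 24 (2 fewer)
Year 4 = 8 (16 fewer)
These are different groups of students answering at one point in time, not the same students followed across years, so the drop is not dropout.
Year 1 students make up almost half of our survey. Why? Two possible reasons:
First: Year 1 students may be the most willing to talk about mental health. They haven’t learned to hide it yet.
Second: they may simply be the most likely to respond to a survey.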
gender_mh_props <- mental_health %>%
# Select only the columns we need
select(Gender, Depression, Anxiety, Panic_Attack) %>%
# Reshape from wide to long format
pivot_longer(cols = c(Depression, Anxiety, Panic_Attack),
names_to = "Condition", # Column names → "Condition" column
values_to = "Reported") %>% # Cell values → "Reported" column
# Keep only "Yes" responses
filter(Reported == "Yes") %>%
# Count by both Gender AND Condition
# This gives us: Female-Depression, Female-Anxiety, Male-Depression, etc.
group_by(Gender, Condition) %>%
summarise(Count = n(), .groups = 'drop') %>% # .groups='drop' removes grouping after
# This adds a "Total" column showing total students in each gender
left_join(mental_health %>% count(Gender, name= "Total"),
by = "Gender") %>%
# Proportion = (Students with condition) ÷ (Total students in that gender)
mutate(proportions = Count / Total)
# graph
gender_mh_plot <- gender_mh_props %>%
# x = Condition (Depression, Anxiety, Panic_Attack)
# y = proportions
ggplot(aes(x = Condition, y = proportions, fill = Gender)) +
# Draw the bars
# position_dodge2() places the bars side-by-side instead of stacking them on top of each other
# width = 0.9 sets the dodging width; preserve = "single" keeps every bar the same width
geom_col(position = position_dodge2(width = 0.9, preserve = "single")) +
# Add descriptions
labs(title = "Do Males and Females Report Mental Health Issues Equally?",
x = "Mental Health Issue",
y = "Percentage Within Gender Group") +
# "Set1" is a predefined color palette from RColorBrewer
scale_fill_brewer(palette = "Set1") +
# Format the Y-axis
# limits = c(0,1) forces the axis to go from 0 to 1 (0% to 100%)
# labels = scales::percent converts decimals to percentages (0.35 becomes 35%)
scale_y_continuous(limits = c(0,1), labels = scales::percent)
# Interactive
ggplotly(gender_mh_plot, tooltip = c("x", "y", "fill")) %>%
layout(title = list(text = "Do Males and Females Report Mental Health Issues Equally?<br><sup>Both genders report similar rates</sup>"))
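What the story tells
Although we had more female participants, the rate of mental health issues is not lower among our male participants. For anxiety it is higher: 10 of 25 male respondents (40%) versus 24 of 75 female respondents (32%). With only 25 male respondents the gap should be read cautiously, but it suggests that although fewer men took the survey, those who did report distress at least as often.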
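To get a rough sense of whether a gap like this could be chance, the anxiety counts can be put through a test of equal proportions. The counts come from the outputs above: 34 students reported anxiety, 70.6% of them female, which gives 24 female and 10 male, out of 75 and 25 respondents. This is an illustrative sketch in base R, not part of the original analysis.

```r
# Anxiety counts derived from the outputs above:
# 24 of 75 female and 10 of 25 male respondents answered "Yes"
anxiety_yes <- c(Female = 24, Male = 10)
group_size  <- c(Female = 75, Male = 25)

# Rate within each gender, in percent
round(anxiety_yes / group_size * 100, 1)

# Two-sample test of equal proportions (base R, stats package)
gender_test <- prop.test(x = anxiety_yes, n = group_size)
gender_test$p.value
```

With these counts the test does not reach the conventional 0.05 threshold, so the difference in anxiety rates is compatible with sampling noise.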
Important Note: Some grade ranges have very few students, so percentages can be misleading.
# By default, R sorts factor levels alphabetically. We want them low-to-high (0 - 1.99 up to 3.50 - 4.00)
CGPA_order <- c('0 - 1.99', '2.00 - 2.49', '2.50 - 2.99','3.00 - 3.49','3.50 - 4.00')
CGPA_depression_data <- mental_health %>%
# Group by each grade range
group_by(CGPA) %>%
# Calculate the summary statistics for each group
summarise(
Total_Students = n(), # Total number of students in this grade range
Students_with_Depression = sum(Depression == "Yes"), # Students in this range who said "Yes" to depression
# Calculate the percentage
# We use this to compare groups of different sizes fairly
Percentage = (Students_with_Depression / Total_Students) * 100,
Label = paste0(Students_with_Depression, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
.groups = 'drop' # Ungroup after summarizing to keep data clean
) %>%
# Apply the custom order we defined
mutate(CGPA = factor(CGPA, levels = CGPA_order))
# Create the plot
CGPA_depression_plot <- CGPA_depression_data %>%
ggplot(aes(x = CGPA, y = Percentage, fill = CGPA)) +
# bars
geom_col() +
# vjust = -0.2 moves the text slightly above the bar
geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
# Add titles and axis labels
labs(title = "Depression Rates by Grade Range",
subtitle = "Numbers show: students with depression out of total students in that grade range",
x = "CGPA (Grade Range)",
y = "Percentage with Depression") +
theme_minimal() +
# Rotate x-axis text by 45 degrees so the labels don't overlap
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_brewer(palette = "Reds") + # Use the "Reds" color palette
# Fix Y-axis to always show 0 to 100
scale_y_continuous(limits = c(0, 100))
# Make it Interactive
ggplotly(CGPA_depression_plot, tooltip = c("x", "y")) %>%
layout(title = list(text= "Depression Rates by Grade Range<br><sup>Note: Small sample sizes in some ranges (e.g., only 4 students in 2.50-2.99)</sup>"))
Story it tells
Does your GPA affect your mental health?
First: the small groups (like 2.50 - 2.99) show high percentages, but they are based on very few students, so be careful about drawing big conclusions.
Second: even in the larger groups with good grades, depression rates are high. Students with a 3.0+ GPA are struggling too.
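The caution about small groups can be made concrete with a confidence interval. The sketch below compares the same 50% rate in a group of 4 and a group of 48. The group sizes match two of the CGPA ranges, but the counts of "Yes" answers (2 and 24) are hypothetical and chosen only for illustration.

```r
# Hypothetical count: suppose 2 of the 4 students in a small range said "Yes"
small_group <- binom.test(x = 2, n = 4)

# Hypothetical count with the same 50% rate but a larger group
large_group <- binom.test(x = 24, n = 48)

# 95% confidence intervals for the true rate
round(small_group$conf.int, 2)
round(large_group$conf.int, 2)
```

The interval for the group of 4 spans most of the range from 0 to 1, which is why a percentage from such a group says very little on its own.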
CGPA_anxiety_data <- mental_health %>%
group_by(CGPA) %>% # Group by each grade range
# summary statistics
summarise(
Total_Students = n(), # Count total students in this bracket
Students_with_Anxiety = sum(Anxiety == "Yes"), # Count how many said "Yes" to anxiety
# Calculate Percentage
# This normalizes the data so we can compare small groups vs large groups
Percentage = (Students_with_Anxiety / Total_Students) * 100,
# Create the text label for the bar: "Count out of Total (percent)"
Label = paste0(Students_with_Anxiety, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
.groups = 'drop'
) %>%
# Apply the logical order (0 - 1.99 -> 3.50 - 4.00) so the graph isn't mixed up
mutate(CGPA = factor(CGPA, levels = CGPA_order))
# Create the plot
anxiety_cgpa_plot <- ggplot(CGPA_anxiety_data, aes(x = CGPA, y = Percentage, fill = CGPA)) +
# Draw the bars
geom_col() +
# Add the labels on top of the bars (vjust = -0.2 moves them up slightly)
geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
labs(title = "Anxiety Rates by Grade Range",
subtitle = "Numbers show: students with anxiety out of total students in that grade range",
x = "CGPA (Grade Range)",
y = "Percentage with Anxiety") +
theme_minimal() +
# Rotate X-axis labels 45 degrees for readability
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
# Use "Blues" palette
scale_fill_brewer(palette = "Blues") +
# Fix y-axis to 0-100%
scale_y_continuous(limits = c(0, 100))
# Make it interactive
ggplotly(anxiety_cgpa_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "Anxiety Rates by Grade Range<br><sup>Anxiety appears slightly higher in students with better grades</sup>"))
Story it tells
Is anxiety highest among our best students?
It appears slightly higher, and that makes sense when you think about it.
High-achieving students experience:
** “I must maintain my GPA” pressure
** Perfectionism (“Anything less than an A is a failure”)
** Fear of slipping from the top
** “My GPA determines my future” stress
CGPA_panic_data <- mental_health %>%
group_by(CGPA) %>% # sort students into groups based on their CGPA
summarise(
Total_Students = n(), # How many students are in each grade range
Students_with_Panic = sum(Panic_Attack == "Yes"), # Count only those who said "Yes" to panic attacks
Percentage = (Students_with_Panic / Total_Students) * 100, # Calculates the rate
# Creates a text label like "5 out of 20" shown on top of the bars
Label = paste0(Students_with_Panic, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
.groups = 'drop' # Cleans up the grouping so it doesn't interfere with later steps
) %>%
# Arranges the CGPA ranges in logical order
mutate(CGPA = factor(CGPA, levels = CGPA_order))
panic_cgpa_plot <- ggplot(CGPA_panic_data, aes(x = CGPA, y = Percentage, fill = CGPA)) +
geom_col() + # Draws the bars (columns)
geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
# Add labels
labs(title = "Panic Attack Rates by Grade Range",
subtitle = "Numbers show: students with panic attacks out of total students in that grade range",
x = "CGPA (Grade Range)",
y = "Percentage with Panic Attacks") +
theme_minimal() + # Clean theme
# Slants the x-axis labels so they don't overlap
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_brewer(palette = "Purples") + # Color the bars in shades of purple
scale_y_continuous(limits = c(0, 100)) # Fixes the y-axis from 0% to 100% (stops bars from looking "exaggerated")
ggplotly(panic_cgpa_plot, tooltip = c("x", "y")) %>%
layout(title = list(text = "Panic Attack Rates by Grade Range<br><sup>Pattern similar to depression rates</sup>"))
Story it tells
Again, small sample sizes in the lower ranges make interpretation tricky. But look at our top performers.
3.50-4.00 CGPA: 40% experience panic attacks (19 out of 48 students)
Let me describe what a panic attack feels like:
** Can’t breathe
** Heart racing
** Feeling like you’re dying
** Can happen during exams, presentations or out of nowhere
Takeaway: These aren’t just ‘stressed students’. These are students experiencing what can feel like an emergency-level mental health crisis.
mental_health_conditions <- mental_health %>%
# Keep only students who said "Yes" to at least one condition
filter(Depression == "Yes" | Anxiety == "Yes" | Panic_Attack == "Yes") %>%
# Keep only the columns we need for the chart
select(Depression, Anxiety, Panic_Attack, Treatment) %>%
# Make the data "long" instead of "wide":
# the three condition columns become two columns (Condition and Status)
pivot_longer(cols = c(Depression, Anxiety, Panic_Attack), names_to = "Condition", values_to = "Status") %>%
# Only keep the "Yes" rows, so each row is one reported condition
filter(Status == "Yes")
treatment_gap <- mental_health_conditions %>%
# Group by the issue and whether they got treatment (Yes/No)
group_by(Condition, Treatment) %>%
summarise(Count = n(), .groups = 'drop') %>%
# Group again by conditions to calculate the percentage within the issue
group_by(Condition) %>%
mutate(
Total = sum(Count), # Total people with that specific issue
Percentage = Count / Total * 100,
# Creates the text label: "15 students (25.5%)"
Label = paste0(Count, " students\n(", round(Percentage, 1), "%)")
)
# position = "stack" puts "Yes" and "No" on top of each other to reach 100%
treatment_plot <- ggplot(treatment_gap, aes(x = Condition, y = Percentage, fill = Treatment)) +
geom_col(position = "stack", width = 0.6, color = "black") +
# Adds the text labels (the numbers/percentages) inside the bars
# vjust = 0.5 centers the text in the middle of each colored segment
geom_text(aes(label = Label), position = position_stack(vjust = 0.5), size = 4, fontface = "bold", color = "white") +
# Add titles
labs(
title = "How Many Students with Mental Health Issues Got Professional Help?",
subtitle = "Out of students who said 'Yes' to having each issue",
x = "Mental Health Issue",
y = "Percentage of Students",
fill = "Got Help?"
) + # manually sets the colors:
scale_fill_manual(values = c("No" = "#95a5a6", "Yes"= "#2ecc71")) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5, margin = margin(b = 20)),
legend.position = "top" # Moves the "Yes/No" legend to the top of the chart
)
ggplotly(treatment_plot, tooltip = c("x","y","fill")) %>%
layout(title = list(text = "How Many Students Got Professional Help?<br><sup>Most students with mental health issues are not getting help</sup>"))
Story it tells
Out of roughly 100 reported cases of depression, anxiety, and panic attacks, only about 13 involved a student who got professional help. Roughly 9 out of 10 are NOT getting any support. Why?
- They think it’s not ‘serious enough’: ‘Other people have it worse’
- They don’t know where to go: ‘Is there even a counseling center?’
- They’re ashamed: ‘What if people think I’m weak?’
- They fear judgment: ‘What if this goes on my record?’
Takeaway
The treatment gap is catastrophic. Most students with mental health issues are NOT getting help.
This is our crisis. Not that students are struggling; humans struggle, and that’s normal. The crisis is that we’re letting them struggle alone.
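The headline ratio can be counted in two ways, and the sketch below contrasts them. It uses a small made-up data frame (the `toy` object and all of its values are hypothetical, not the survey data) with the same column names as the code above. Counting per reported condition counts a student once for every condition they report, while counting per student counts each affected student once.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, NOT the survey: one row per student
toy <- tibble(
  Student      = 1:6,
  Depression   = c("Yes", "Yes", "No",  "Yes", "No", "No"),
  Anxiety      = c("Yes", "No",  "Yes", "Yes", "No", "No"),
  Panic_Attack = c("No",  "No",  "Yes", "Yes", "No", "Yes"),
  Treatment    = c("Yes", "No",  "No",  "No",  "No", "No")
)

# Per reported condition: one row per student-condition pair
by_report <- toy %>%
  pivot_longer(c(Depression, Anxiety, Panic_Attack),
               names_to = "Condition", values_to = "Status") %>%
  filter(Status == "Yes") %>%
  summarise(Reports = n(), Treated = sum(Treatment == "Yes"))

# Per student: each student with at least one condition counts once
by_student <- toy %>%
  filter(Depression == "Yes" | Anxiety == "Yes" | Panic_Attack == "Yes") %>%
  summarise(Students = n(), Treated = sum(Treatment == "Yes"))

by_report
by_student
```

In this toy example the one treated student reports two conditions, so they appear twice in the per-report count. Whether the two approaches give noticeably different figures for the real survey depends on how much the three conditions overlap.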
# Prepare the data
treatment_cgpa_data <- mental_health %>%
# Ensure we only look at the valid grade ranges we defined earlier
filter(CGPA %in% CGPA_order) %>%
# Group by both grade range AND treatment status (Yes/No)
group_by(CGPA, Treatment) %>%
summarise(Count = n(), .groups = 'drop') %>%
# Calculate percentages within each grade range
group_by(CGPA) %>%
mutate(
Total_in_Group = sum(Count),
# Calculate what % of this grade range sought treatment
Percentage = Count / Total_in_Group * 100,
# Create label, e.g. "5 students (12.5%)" split over two lines
Label = paste0(Count, " students\n(", round(Percentage, 1), "%)")
) %>%
# Drop the grouping before changing the grouping column itself
ungroup() %>%
# Apply logical order to grades
mutate(CGPA = factor(CGPA, levels = CGPA_order))
# Create the Stacked Bar Chart
treatment_cgpa_plot <- ggplot(treatment_cgpa_data, aes(x = CGPA, y = Percentage, fill = Treatment)) +
# position = "fill" stacks the bars to reach 100% height
# This makes every bar the same height so we can compare the "split" easily
geom_col(position = "fill", width = 0.7, color = "black") +
geom_text(aes(label = Label), position = position_fill(vjust = 0.5), size = 3.5, fontface = "bold", color = "white") +
# Labels and Titles
labs(
title = "Treatment Rates Across Grade Ranges",
subtitle = "Are students in different grade ranges more likely to get help?",
x = "CGPA (Grade Range)",
y = "Percentage of Students",
fill = "Got Help?"
) +
# Format y-axis as percentage (0% to 100%)
scale_y_continuous(labels = scales::percent) +
# Custom Colors:
scale_fill_manual(values = c("No" = "#34495e", "Yes" = "#2ecc71")) +
# theme adjustments
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
legend.position = "top"
)
# Interactive plot
ggplotly(treatment_cgpa_plot, tooltip = c("x", "y", "fill")) %>%
layout(title = list(text = "Treatment Rates by Grade Range<br><sup>Treatment gap exists across all grade levels</sup>"))
Story it tells
Does seeking help vary by GPA?
Two Concerning Groups:
- Failing Students: Not getting help because they have given up or don’t know where to start
- High Achievers: Not getting help because they think, “My grades are good, so I can’t be that bad”
But we just saw that 38% of top students have anxiety!
Their success hides their pain.
# Prepare the data
course_mh_plot <- mental_health_final %>%
# Specific columns we need for this analysis
select(Course_Category, Depression, Anxiety, Panic_Attack) %>%
# Reshape data: Convert the 3 condition columns into one "Condition" column
# This makes it possible to plot them all on the same graph
pivot_longer(cols = c(Depression, Anxiety, Panic_Attack),
names_to = "Condition", values_to = "Status") %>%
# Keep only the students who said "Yes"
# We are interested in the presence of a mental health issue, not the absence
filter(Status == "Yes") %>%
# group by field of study and the specific mental health condition
group_by(Course_Category, Condition) %>%
# Count how many students fall in each group
summarise(Count = n(), .groups = 'drop') %>%
# Create the graph
# x = Field of Study, y = Count (Number of students), fill = condition(color)
ggplot(aes(x = Course_Category, y = Count, fill = Condition)) +
# Draw bars side-by-side so we can compare conditions
geom_col(position = "dodge") +
# Add labels
labs(title = "Mental Health Issues by Field of Study",
x = "Field of Study",
y = "Number of Students") +
# Clean Theme
theme_minimal() +
# Rotate x-axis labels 45 degrees so they are readable and don't overlap
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Make it interactive
ggplotly(course_mh_plot, tooltip= c("x","y","fill")) %>%
layout(title = list(text = "Mental Health Issues by Field of Study<br><sup>All fields show mental health challenges</sup>"))
Story it tells
Finally, does your major matter?
STEM students show the highest absolute numbers, but that’s because STEM makes up 62% of our sample.
STEM:
- Heavy workload
- Difficult exams
- Competitive culture
Takeaway
But looking across the chart (Business/Finance, Social Science, Law, Religious Studies), every field shows mental health struggles.
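Because raw counts track group size, one way to compare fields fairly is to convert each count into a rate within its field. A minimal sketch, assuming the same column names as the code above; the `toy_fields` data frame and its numbers are made up for illustration and are not the survey data:

```r
library(dplyr)

# Hypothetical toy data, NOT the survey: one row per student
toy_fields <- tibble(
  Course_Category = c(rep("STEM", 6), rep("Law", 2), rep("Business/Finance", 2)),
  Depression      = c("Yes", "Yes", "No", "No", "No", "No",
                      "Yes", "No",
                      "Yes", "No")
)

depression_rates <- toy_fields %>%
  group_by(Course_Category) %>%
  summarise(
    Students = n(),                       # size of the field in the sample
    Cases    = sum(Depression == "Yes"),  # raw count, as in the bar chart
    Rate     = Cases / Students * 100,    # share of the field affected
    .groups  = "drop"
  ) %>%
  # Largest raw count first, to show how it can differ from the rate ranking
  arrange(desc(Cases))

depression_rates
```

In this toy example STEM has the most cases but the lowest rate, which is the pattern to watch for when one group dominates the sample.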
Answer: All three are almost equally common.
The counts (35, 34, and 33) are almost identical, which may mean students often experience multiple issues at once.
Answer: Rates are similar, with small differences. Still, there is some cause for concern: the rate among male students tends to be higher, even though fewer of them took part in the survey.
Answer: The relationship is complex, not simple.
Answer: Very few. This is our biggest concern.
Answer:
- Year 1 had the most participants/cases.
- But rates are consistent across all years.
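On the first answer: near-identical totals do not by themselves show that the same students report several conditions, so the overlap is worth counting directly. A minimal sketch of one way to do that, using a made-up data frame (`toy_overlap` is hypothetical, not the survey data) with the same column names as the analysis:

```r
library(dplyr)

# Hypothetical toy data, NOT the survey: one row per student
toy_overlap <- tibble(
  Depression   = c("Yes", "Yes", "No",  "Yes", "No", "No"),
  Anxiety      = c("Yes", "No",  "Yes", "Yes", "No", "No"),
  Panic_Attack = c("No",  "No",  "Yes", "Yes", "No", "Yes")
)

overlap <- toy_overlap %>%
  # Each comparison is TRUE/FALSE; adding them counts the "Yes" answers per student
  mutate(N_Conditions = (Depression == "Yes") + (Anxiety == "Yes") + (Panic_Attack == "Yes")) %>%
  # Tally how many students report 0, 1, 2, or 3 conditions
  count(N_Conditions, name = "Students")

overlap
```

Here every condition has the same total (three "Yes" answers each), yet the students are spread across one, two, and three conditions. The per-student tally is what shows how common co-occurrence really is.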
Let’s come back to where we started.
Remember that student: heart racing, two exams today, can’t sleep, drowning in silence?
Here’s what we now know:
35 students in our survey have depression. 34 have anxiety. 33 have panic attacks.
And 9 out of 10 of them aren’t getting any professional help.
But here’s the most important thing I want us to take away from this presentation:
This doesn’t have to be a story.
We have the solutions:
- Drop-in counseling hours
- Campus awareness campaigns
- Year 1 wellness check-ins
- Expanded services
These aren’t complicated. They’re not impossibly expensive. They work.
The question isn’t “Can we afford to do this?”
The question is: “Can we afford NOT to?”
To administrators: Money spent on mental health saves money in retention and prevents crises.
To faculty: You can be the person who notices and helps a student find support.
To students: If you’re one of the 35, 34, or 33, you are not alone. Getting help is strength, not weakness.
35 students with depression are counting on us. 34 students with anxiety need us to act. 33 students with panic attacks are waiting. Let’s not make them wait any longer.