Introduction

Key Information: The survey covers 100 students after cleaning (101 responses, one dropped for a missing age), male and female, across all years of study and all grade ranges. Most are doing well academically. Most look fine on the outside.

Source: Kaggle

Aim

To understand how mental health affects students and their academic performance.

Research Questions

Which mental health problem is most common among students?
Do female students report more mental health issues than male students?
Do students with mental health issues have lower grades?
How many students with mental health issues actually get help?
Which year of study is most represented in the survey?

Data Cleaning

Load Packages

# Load the tidyverse package - this gives us tools for cleaning and visualizing data
library(tidyverse)

# Load skimr - this gives us nice summaries of our dataset
library(skimr)

# Load ggplot2 - this is our main graphing tool (actually included in tidyverse)
library(ggplot2)

# Load scales - this helps format numbers nicely (like percentages)
library(scales)

# Load RColorBrewer - this gives us nice color palettes for graphs
library(RColorBrewer) 

# Load plotly - this makes our graphs interactive (hover to see details)
library(plotly)

# Load knitr - this helps create nice tables
library(knitr)

# Load kableExtra - this makes tables look even better
library(kableExtra)

Import Dataset

mental_health <- read_csv("C:/Users/PC/Downloads/Student Mental health (1).csv")

# Display the dataset 
mental_health

## # A tibble: 101 x 11
##    Timestamp      `Choose your gender`   Age `What is your course?`
##    <chr>          <chr>                <dbl> <chr>                 
##  1 8/7/2020 12:02 Female                  18 Engineering           
##  2 8/7/2020 12:04 Male                    21 Islamic education     
##  3 8/7/2020 12:05 Male                    19 BIT                   
##  4 8/7/2020 12:06 Female                  22 Laws                  
##  5 8/7/2020 12:13 Male                    23 Mathemathics          
##  6 8/7/2020 12:31 Male                    19 Engineering           
##  7 8/7/2020 12:32 Female                  23 Pendidikan islam      
##  8 8/7/2020 12:33 Female                  18 BCS                   
##  9 8/7/2020 12:35 Female                  19 Human Resources       
## 10 8/7/2020 12:39 Male                    18 Irkhs                 
## # i 91 more rows
## # i 7 more variables: `Your current year of Study` <chr>,
## #   `What is your CGPA?` <chr>, `Marital status` <chr>,
## #   `Do you have Depression?` <chr>, `Do you have Anxiety?` <chr>,
## #   `Do you have Panic attack?` <chr>,
## #   `Did you seek any specialist for a treatment?` <chr>

Quick Look at the Data

# A preview of the first few values in each column
glimpse(mental_health)

## Rows: 101
## Columns: 11
## $ Timestamp                                      <chr> "8/7/2020 12:02", "8/7/~
## $ `Choose your gender`                           <chr> "Female", "Male", "Male~
## $ Age                                            <dbl> 18, 21, 19, 22, 23, 19,~
## $ `What is your course?`                         <chr> "Engineering", "Islamic~
## $ `Your current year of Study`                   <chr> "year 1", "year 2", "Ye~
## $ `What is your CGPA?`                           <chr> "3.00 - 3.49", "3.00 - ~
## $ `Marital status`                               <chr> "No", "No", "No", "Yes"~
## $ `Do you have Depression?`                      <chr> "Yes", "No", "Yes", "Ye~
## $ `Do you have Anxiety?`                         <chr> "No", "Yes", "Yes", "No~
## $ `Do you have Panic attack?`                    <chr> "Yes", "No", "Yes", "No~
## $ `Did you seek any specialist for a treatment?` <chr> "No", "No", "No", "No",~

Data Cleaning

Remove Timestamp Column

# Remove the first column (Timestamp) because we don't need it
mental_health <- mental_health[, -1]

# Check that the timestamp is gone
glimpse(mental_health)

## Rows: 101
## Columns: 10
## $ `Choose your gender`                           <chr> "Female", "Male", "Male~
## $ Age                                            <dbl> 18, 21, 19, 22, 23, 19,~
## $ `What is your course?`                         <chr> "Engineering", "Islamic~
## $ `Your current year of Study`                   <chr> "year 1", "year 2", "Ye~
## $ `What is your CGPA?`                           <chr> "3.00 - 3.49", "3.00 - ~
## $ `Marital status`                               <chr> "No", "No", "No", "Yes"~
## $ `Do you have Depression?`                      <chr> "Yes", "No", "Yes", "Ye~
## $ `Do you have Anxiety?`                         <chr> "No", "Yes", "Yes", "No~
## $ `Do you have Panic attack?`                    <chr> "Yes", "No", "Yes", "No~
## $ `Did you seek any specialist for a treatment?` <chr> "No", "No", "No", "No",~

Why we do this: The timestamp column just tells us when the survey was filled out. We don’t need it for our analysis.
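
Dropping a column by position (`[, -1]`) removes whatever happens to be first, so it would silently drop the wrong column if the file layout ever changed. A minimal sketch of dropping by name instead, using a made-up data frame (`toy` and its values are hypothetical, not the survey data):

```r
# Hypothetical two-row stand-in for the raw survey data
toy <- data.frame(
  Timestamp = c("8/7/2020 12:02", "8/7/2020 12:04"),
  Age = c(18, 21),
  stringsAsFactors = FALSE
)

# Keep every column except the one named "Timestamp"
toy_no_time <- toy[, setdiff(names(toy), "Timestamp"), drop = FALSE]

names(toy_no_time)
```

With dplyr loaded, `select(-Timestamp)` does the same thing in a pipe.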

Renaming Column Names

# First, check what the current column names are
colnames(mental_health)

##  [1] "Choose your gender"                          
##  [2] "Age"                                         
##  [3] "What is your course?"                        
##  [4] "Your current year of Study"                  
##  [5] "What is your CGPA?"                          
##  [6] "Marital status"                              
##  [7] "Do you have Depression?"                     
##  [8] "Do you have Anxiety?"                        
##  [9] "Do you have Panic attack?"                   
## [10] "Did you seek any specialist for a treatment?"

# Create a list of new, cleaner column names
clean_names <- c("Gender",           
                 "Age",             
                 "Course",            
                 "Year_of_Study",     
                 "CGPA",              
                 "Marital_Status",  
                 "Depression",        
                 "Anxiety",           
                 "Panic_Attack",      
                 "Treatment")         

# Apply these new names to our dataset
colnames(mental_health) <- clean_names

# Show the first 6 rows to verify the new names worked
head(mental_health)

## # A tibble: 6 x 10
##   Gender   Age Course      Year_of_Study CGPA  Marital_Status Depression Anxiety
##   <chr>  <dbl> <chr>       <chr>         <chr> <chr>          <chr>      <chr>  
## 1 Female    18 Engineering year 1        3.00~ No             Yes        No     
## 2 Male      21 Islamic ed~ year 2        3.00~ No             No         Yes    
## 3 Male      19 BIT         Year 1        3.00~ No             Yes        Yes    
## 4 Female    22 Laws        year 3        3.00~ Yes            Yes        No     
## 5 Male      23 Mathemathi~ year 4        3.00~ No             No         No     
## 6 Male      19 Engineering Year 2        3.50~ No             No         No     
## # i 2 more variables: Panic_Attack <chr>, Treatment <chr>

Why we do this: Clean, short column names are easier to type and read in code.
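
Assigning a whole vector of names relies on the columns arriving in exactly the expected order. One possible safeguard is an explicit old-to-new mapping, sketched here in base R on a made-up data frame (`toy` and `name_map` are hypothetical names used only for illustration):

```r
# Hypothetical stand-in carrying two of the original survey column names
toy <- data.frame(a = c("Female", "Male"), b = c(18, 21))
names(toy) <- c("Choose your gender", "Age")

# Map old name -> new name explicitly, so column order does not matter
name_map <- c("Choose your gender" = "Gender", "Age" = "Age")

# Fail loudly if an expected column is missing
stopifnot(all(names(name_map) %in% names(toy)))

# Look up each current name in the mapping
names(toy) <- unname(name_map[names(toy)])
names(toy)
```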

Convert Text to Categories

# Find every text (character) column and convert it to a factor (category)
mental_health[sapply(mental_health, is.character)] <- 
  lapply(mental_health[sapply(mental_health, is.character)], as.factor)

# Check the structure to see if it worked
str(mental_health)

## tibble [101 x 10] (S3: tbl_df/tbl/data.frame)
##  $ Gender        : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 1 1 2 ...
##  $ Age           : num [1:101] 18 21 19 22 23 19 23 18 19 18 ...
##  $ Course        : Factor w/ 48 levels "Accounting","ALA",..: 18 25 9 36 39 18 42 4 22 24 ...
##  $ Year_of_Study : Factor w/ 7 levels "year 1","Year 1",..: 1 3 2 5 7 4 3 1 4 1 ...
##  $ CGPA          : Factor w/ 5 levels "0 - 1.99","2.00 - 2.49",..: 4 4 4 4 4 5 5 5 3 5 ...
##  $ Marital_Status: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 1 1 ...
##  $ Depression    : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
##  $ Anxiety       : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 2 1 2 ...
##  $ Panic_Attack  : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 2 2 1 1 2 ...
##  $ Treatment     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
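
The `str()` output shows 7 levels for Year_of_Study even though there are only four years. A small sketch of why, assuming labels that differ only in capitalisation (the vector `year_raw` is made up for illustration):

```r
# Hypothetical year labels with mixed capitalisation
year_raw <- c("year 1", "Year 1", "year 2", "Year 2", "year 1")

year_factor <- factor(year_raw)

# Each distinct spelling becomes its own level
levels(year_factor)
nlevels(year_factor)

# Counting per level shows how the same year is split in two
table(year_factor)
```

Checking `levels()` right after converting to factors is a quick way to spot labels that need cleaning.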

Dataset Summary

# skim() gives us a comprehensive summary of our dataset
skim(mental_health)

Data summary
Name	mental_health
Number of rows	101
Number of columns	10
_______________________
Column type frequency:
factor	9
numeric	1
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
Gender	1	FALSE	2	Fem: 75, Mal: 26
Course	1	FALSE	48	BCS: 18, Eng: 17, BIT: 10, Bio: 4
Year_of_Study	1	FALSE	7	yea: 41, Yea: 19, Yea: 16, yea: 10
CGPA	1	FALSE	5	3.5: 48, 3.0: 43, 0 -: 4, 2.5: 4
Marital_Status	1	FALSE	2	No: 85, Yes: 16
Depression	1	FALSE	2	No: 66, Yes: 35
Anxiety	1	FALSE	2	No: 67, Yes: 34
Panic_Attack	1	FALSE	2	No: 68, Yes: 33
Treatment	1	FALSE	2	No: 95, Yes: 6

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Age	1	0.99	20.53	2.5	18	18	19	23	24	▇▁▁▁▆

Handle Missing Values

# Drop every row that has a missing value (here, the one row with a missing Age)
mental_health <- na.omit(mental_health)

# Count how many missing values remain (should be 0)
sum(is.na(mental_health))

## [1] 0

summary(mental_health)

##     Gender        Age                       Course   Year_of_Study
##  Female:75   Min.   :18.00   BCS               :18   year 1:40    
##  Male  :25   1st Qu.:18.00   Engineering       :17   Year 1: 2    
##              Median :19.00   BIT               : 9   year 2:10    
##              Mean   :20.53   Biomedical science: 4   Year 2:16    
##              3rd Qu.:23.00   KOE               : 4   year 3: 5    
##              Max.   :24.00   BENL              : 2   Year 3:19    
##                              (Other)           :46   year 4: 8    
##           CGPA    Marital_Status Depression Anxiety  Panic_Attack Treatment
##  0 - 1.99   : 3   No :84         No :65     No :66   No :67       No :94   
##  2.00 - 2.49: 2   Yes:16         Yes:35     Yes:34   Yes:33       Yes: 6   
##  2.50 - 2.99: 4                                                            
##  3.00 - 3.49:43                                                            
##  3.50 - 4.00:48                                                            
##                                                                            
##

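# Note: na.omit() above already dropped the row with the missing Age,
# so replace_na() below has nothing left to fill. Only as.integer() has an effect.

# na.rm = TRUE means "ignore missing values when calculating the median"
median_age <- median(mental_health$Age, na.rm = TRUE)

mental_health <- mental_health %>%  
  mutate(
    # replace_na() finds NA values in Age and replaces them with median_age
    Age = replace_na(Age, median_age),
    
    # Store Age as whole numbers
    Age = as.integer(Age)
  )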

str(mental_health)

## tibble [100 x 10] (S3: tbl_df/tbl/data.frame)
##  $ Gender        : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 1 1 2 ...
##  $ Age           : int [1:100] 18 21 19 22 23 19 23 18 19 18 ...
##  $ Course        : Factor w/ 48 levels "Accounting","ALA",..: 18 25 9 36 39 18 42 4 22 24 ...
##  $ Year_of_Study : Factor w/ 7 levels "year 1","Year 1",..: 1 3 2 5 7 4 3 1 4 1 ...
##  $ CGPA          : Factor w/ 5 levels "0 - 1.99","2.00 - 2.49",..: 4 4 4 4 4 5 5 5 3 5 ...
##  $ Marital_Status: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 1 1 ...
##  $ Depression    : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
##  $ Anxiety       : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 2 1 2 ...
##  $ Panic_Attack  : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 2 2 1 1 2 ...
##  $ Treatment     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "na.action")= 'omit' Named int 44
##   ..- attr(*, "names")= chr "44"
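
In the steps above, `na.omit()` runs before the median imputation, so the imputation has nothing left to fill. The two approaches are alternatives rather than a sequence. A minimal base R sketch of the difference, on made-up ages (`toy`, `dropped` and `imputed` are hypothetical names):

```r
# Hypothetical ages with one missing value
toy <- data.frame(id = 1:5, Age = c(18, 19, NA, 23, 24))

# Option 1: drop incomplete rows (what na.omit() does)
dropped <- na.omit(toy)

# Option 2: keep every row and fill the gap with the median
imputed <- toy
imputed$Age[is.na(imputed$Age)] <- median(toy$Age, na.rm = TRUE)

nrow(dropped)   # one row fewer than toy
nrow(imputed)   # same number of rows as toy

# After dropping, no NA is left for an imputation step to fill
sum(is.na(dropped$Age))
```

Dropping loses a respondent; imputing keeps the respondent but invents an age. With a single missing value out of 101, either choice has little effect on the results.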

Fix Year of Study Labels

# Make them all consistent

mental_health <- mental_health %>%  
  mutate(                            
    Year_of_Study = case_when(       # case_when() = "In case of these situations, do this"
      
      # str_detect() looks for patterns in text
      # If Year_of_Study contains "1" anywhere, change it to "Year 1"; the same applies to 2, 3 and 4
      str_detect(Year_of_Study, "1") ~ "Year 1",
      
      str_detect(Year_of_Study, "2") ~ "Year 2",
      
      str_detect(Year_of_Study, "3") ~ "Year 3",
      
      str_detect(Year_of_Study, "4") ~ "Year 4",
      
      # TRUE means "for everything else, keep it as is"
      # as.character() keeps every branch the same type (Year_of_Study is a factor here)
      TRUE ~ as.character(Year_of_Study)
    ),
    
    # factor() tells R to treat this as a category with a specific order
    # levels = c(...) defines the order: Year 1 to Year 4
    Year_of_Study = factor(Year_of_Study, 
                           levels = c("Year 1", "Year 2", "Year 3", "Year 4"))
  )
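
An alternative to listing one rule per year is to pull the digit out of the label and rebuild it. This is only a sketch of that idea in base R, on made-up labels (`year_raw`, `year_digit` and `year_clean` are hypothetical names), and it assumes every label contains exactly one year digit:

```r
# Hypothetical raw labels with inconsistent capitalisation
year_raw <- c("year 1", "Year 1", "year 2", "Year 3", "year 4")

# Keep only the first digit found in each label
year_digit <- sub("^[^0-9]*([0-9]).*$", "\\1", year_raw)

# Rebuild a uniform label and fix the level order Year 1 to Year 4
year_clean <- factor(paste("Year", year_digit),
                     levels = paste("Year", 1:4))

table(year_clean)
```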

Group Courses into Fields

mental_health_cleaned <- mental_health %>%  
  mutate(                                 
    Course_Category = case_when(            
      
      # STEM fields:
      # regex() = pattern matching tool
      # ignore_case = TRUE means "Engin" matches "engin", "Engin", "ENGIN"
      # Caution: with ignore_case = TRUE, "IT" also matches any course name that contains "it"
      str_detect(Course, regex("Engin|BCS|BIT|Math|Bio|Marine|CTS|IT|KOE", 
                               ignore_case = TRUE)) ~ "STEM",
      
      # Social Sciences:
      str_detect(Course, regex("Psych|Human|Comm|IRKHS|Ala|BENL|Malcom|Kirk|Dipl|Taasl", 
                               ignore_case = TRUE)) ~ "Social Sciences",
      
      # Law: 
      str_detect(Course, regex("Law", ignore_case = TRUE)) ~ "Law",
      
      # Religious Studies: 
      str_detect(Course, regex("Islam|Pendidikan|Usul", ignore_case = TRUE)) ~ "Religious Studies",
      
      # Business/Finance: 
      str_detect(Course, regex("Acc|Bank|Busin|Econ|Fiqh", ignore_case = TRUE)) ~ "Business/Finance",
      
      # Everything else goes into "Other"
      TRUE ~ "Other" 
    )
  )

# Remove any rows that still have missing values
mental_health_final <- mental_health_cleaned %>%
  na.omit()

# Count how many students in each category
# mutate() adds a percentage column
mental_health_final %>% 
  count(Course_Category, sort = TRUE) %>% 
  mutate(Percentage = (n / sum(n)) * 100) %>%  # Calculate percentage
  print(n=Inf)  # Print all rows

## # A tibble: 6 x 3
##   Course_Category       n Percentage
##   <chr>             <int>      <dbl>
## 1 STEM                 62         62
## 2 Social Sciences      17         17
## 3 Business/Finance      6          6
## 4 Other                 6          6
## 5 Religious Studies     6          6
## 6 Law                   3          3
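
One thing to watch in the STEM pattern: a short alternative such as `IT`, matched without regard to case, also matches any course name that merely contains the letters "it". A small base R sketch of the effect and of a word-boundary fix, using illustrative course names that are not necessarily in the survey file:

```r
# Hypothetical course names (for illustration only)
courses <- c("IT", "BIT", "Nutrition", "Law")

# Case-insensitive "IT" matches any name that contains "it"
loose <- grepl("IT", courses, ignore.case = TRUE)

# \\b anchors the match to a whole word, so only the course "IT" matches
strict <- grepl("\\bIT\\b", courses, ignore.case = TRUE, perl = TRUE)

data.frame(courses, loose, strict)
```

"BIT" no longer matches the strict `IT` rule, which is fine here because the pattern lists `BIT` separately. Printing the distinct course names next to their assigned category is a simple way to check for this kind of accidental match.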

Choosing Colors for Visualization

# Create a color palette that we'll use consistently across all graphs
mental_health_colors <- c(
  "Depression" = "#e74c3c",    # Red 
  "Anxiety" = "#3498db",       # Blue 
  "Panic_Attack" = "#9b59b6",  # Purple 
  "Yes" = "#2ecc71",           # Green 
  "No" = "#95a5a6"             # Gray 
)

Univariate Analysis

Gender Distribution

gender_plot <- mental_health %>%  
  count(Gender) %>%               
  
  # ggplot() creates a graph
  # aes() = "aesthetics" = what goes where
  ggplot(aes(x = Gender, y = n, fill = Gender)) +
  
  # geom_col() creates the bars
  geom_col() +
  
  # labs() adds labels to the graph
  labs(title = "How Many Students Are Male or Female?", 
       y = "Number of Students") +          
  
  # scale_fill_viridis_d() gives us a nice color scheme
  scale_fill_viridis_d()

# ggplotly() converts a static ggplot graph to an interactive one
ggplotly(gender_plot, tooltip = c("x", "y")) %>%
  
  # layout() customizes the appearance
  layout(title = list(text = "Meet the Students<br><sup>Click and drag to zoom</sup>"))

Story it tells

First, let me introduce you to our 100 participants: 75 female and 25 male. Keep this 3-to-1 ratio in mind, because it matters for how we interpret the rest of our data.

We have far more female than male respondents. This skew is common in mental health surveys, possibly because women are often more willing to take part in surveys about emotional well-being.

Age Distribution

# 1. Create the plot with a single color and no top labels
age_plot <- mental_health %>%
  ggplot(aes(x = factor(Age))) +
  
  # geom_bar with factor(Age) keeps them separated
  # width = 0.7 adds a nice gap between bars
  geom_bar(fill = "skyblue", color = "darkblue", width = 0.7) +
  
  # Clean theme
  theme_minimal() +
  
  # Labels (no subtitle, to keep the chart clean)
  labs(title = "How Old Are the Students?",
       x = "Age",
       y = "Number of Students")

# 2. Make it interactive
# tooltip = c("x", "y") ensures you still see the data when hovering
ggplotly(age_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "How Old Are the Students?"))

Story it tells

Now, how old are the students who participated in our survey?

Research suggests that about 75% of mental health conditions first emerge by the mid-twenties. We’re surveying students aged 18 to 24, exactly the life stage when mental health issues typically begin.

So we’re looking at a population at a vulnerable age for mental health struggles.

Mental Health Problems Reported

mh_counts <- mental_health %>%
  
  # select() picks specific columns
  select(Depression, Anxiety, Panic_Attack) %>%
  
  
  pivot_longer(everything(), names_to = "Condition", values_to = "Reported") %>%
  
  # filter() keeps only rows where condition is true
  filter(Reported == "Yes") %>%
  
  # count() how many of each condition
  count(Condition) %>%
  
  # arrange() sorts the data
  arrange(desc(n))

# Now create the graph
mh_count_plot <- mh_counts %>%
  
  # reorder(Condition, -n) sorts conditions by count
  # -n means negative n (sorts from high to low)
  ggplot(aes(x = reorder(Condition, -n), y = n, fill = Condition)) +
  
  # Draw the bars
  geom_col() +
  
  # Add count labels on top of each bar
  geom_text(aes(label = n), vjust = -0.3, size = 5, fontface = "bold") +
  
  # Add titles
  labs(title = "Number of Students with Mental Health Issues", 
       x = "Mental Health Issue", 
       y = "Number of Students") +
  
  # Use our custom colors
  scale_fill_manual(values = mental_health_colors)

# Make Interactive
ggplotly(mh_count_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "Number of Students with Mental Health Issues<br><sup>About 1 in 3 students report each issue</sup>"))

Story it tells

Look at these numbers:

- 35 students said yes to depression
- 34 students said yes to anxiety
- 33 students said yes to panic attacks

Notice something? These numbers are almost identical. That could point to a pattern: the same students may be experiencing more than one condition. The totals alone cannot confirm this, though. We would need to count how many conditions each student reported.

Takeaway: About 1 in 3 students reports each condition, with depression narrowly the most common. Many of them may be sitting next to you in class, smiling like everything is fine.
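
Whether the three counts come from the same students can be checked by counting conditions per student. The sketch below uses a made-up six-student table (`toy` is hypothetical, not the survey data) in which every condition has the same total, yet only one student reports all three:

```r
# Hypothetical answers from six students
toy <- data.frame(
  Depression   = c("Yes", "Yes", "No",  "No",  "Yes", "No"),
  Anxiety      = c("Yes", "No",  "Yes", "No",  "Yes", "No"),
  Panic_Attack = c("Yes", "No",  "No",  "Yes", "No",  "Yes")
)

# Total "Yes" answers per condition (identical in this toy example)
colSums(toy == "Yes")

# Number of conditions each student reported (0 to 3)
n_conditions <- rowSums(toy == "Yes")

# How many students report 0, 1, 2 or 3 conditions
table(factor(n_conditions, levels = 0:3))
```

Running the same `rowSums()` step on the Depression, Anxiety and Panic_Attack columns of `mental_health` would show how much the three groups really overlap.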

Student Grades (CGPA) Distribution

CGPA_order <- c('0 - 1.99',      # Failing
                '2.00 - 2.49',   # Struggling
                '2.50 - 2.99',   # Just passing
                '3.00 - 3.49',   # Good
                '3.50 - 4.00')   # Excellent

# Count students in each CGPA range
CGPA_distribution_data <- mental_health %>%
  
  # count(CGPA) counts students in each CGPA range
  count(CGPA) %>%
  
  # factor() with levels = ensures correct order
  mutate(CGPA = factor(CGPA, levels = CGPA_order))

# Create the graph
CGPA_plot <- CGPA_distribution_data %>%
  ggplot(aes(x = CGPA, y = n, fill = CGPA)) +
  
  # Draw bars
  geom_col() +
  
  # Add count labels on bars
  geom_text(aes(label = n), vjust = -0.5, size = 5, fontface = "bold") +
  
  # Add titles
  labs(title = "How Many Students in Each Grade Range?",
       x = "CGPA (Grade Point Average)",
       y = "Number of Students") +
  
  # Use minimal theme (clean white background)
  theme_minimal() +
  
  # Rotate x-axis labels 45 degrees so they don't overlap
  # angle = 45 rotates text
  # hjust = 1 aligns text properly after rotation
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Make interactive
ggplotly(CGPA_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "How Many Students in Each Grade Range?<br><sup>Most students have good grades (above 3.0)</sup>"))

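# Gender split among the students who reported anxiety
mental_health %>%
  filter(Anxiety == "Yes") %>%
  count(Gender, name = "Total") %>%
  # Divide by the number of students with anxiety instead of hard-coding 34
  transmute(Gender, percent = Total / sum(Total) * 100)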

## # A tibble: 2 x 2
##   Gender percent
##   <fct>    <dbl>
## 1 Female    70.6
## 2 Male      29.4

Story it tells

- Only 3 students are in the failing range (0 - 1.99)
- Only 2 students are barely passing (2.00 - 2.49)
- 4 students are at 2.50 - 2.99

But look at these tall bars:

- 43 students have a CGPA of 3.00 - 3.49 (roughly a B to B+ average on a typical 4.0 scale)
- 48 students have a CGPA of 3.50 - 4.00 (roughly an A- to A average)

91 out of 100 students have good grades. Almost half are in the top range.
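
The "91 out of 100" figure can be reproduced from the per-range counts. A short base R sketch, with the counts copied from the `summary()` output earlier in this report (`cgpa_counts`, `share_good` and `share_top` are names chosen here for illustration):

```r
# Students per CGPA range, copied from the summary() output
cgpa_counts <- c("0 - 1.99" = 3, "2.00 - 2.49" = 2, "2.50 - 2.99" = 4,
                 "3.00 - 3.49" = 43, "3.50 - 4.00" = 48)

total <- sum(cgpa_counts)

# Share of students at 3.00 or above
share_good <- sum(cgpa_counts[c("3.00 - 3.49", "3.50 - 4.00")]) / total

# Share in the top range only
share_top <- cgpa_counts[["3.50 - 4.00"]] / total

share_good
share_top
```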

Age Overview (Box Plot)

Age_boxplot <- mental_health %>%
  
  # aes(y = Age) puts Age on the y-axis (vertical)
  #
  ggplot(aes(y = Age)) +
  
  # geom_boxplot() creates the box plot
  geom_boxplot(fill = "lightblue", color = "darkblue") +
  
  # Add labels
  labs(title = "Age Range of Students", y = "Age") +
  
  # Clean theme
  theme_minimal()

# Make interactive
ggplotly(Age_boxplot, tooltip = c("y")) %>%
  layout(title = list(text = "Age Range of Students<br><sup>Most students are 18-23 years old</sup>"))

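What a box plot shows: The box covers the middle half of our students, from age 18 (25th percentile) to age 23 (75th percentile).

The dark blue line inside the box: The median, the middle point (19). Half the students are this age or younger, half are this age or older.

The whiskers: These show the full range of ages, from 18 to 24. Because the youngest age (18) is also the bottom of the box, only the upper whisker is visible.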
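
The pieces of a box plot can be computed by hand, which helps when reading one. This is a sketch on made-up ages chosen to resemble the survey's 18 to 24 range (`ages` is hypothetical), using the common rule that whiskers stop at the most extreme values within 1.5 times the interquartile range of the box:

```r
# Hypothetical ages
ages <- c(18, 18, 18, 19, 19, 20, 23, 23, 24, 24)

# Quartiles: the box runs from the 25th to the 75th percentile
q <- quantile(ages, c(0.25, 0.5, 0.75))

# Whiskers stop at the most extreme ages within 1.5 * IQR of the box
iqr <- q[["75%"]] - q[["25%"]]
lower_whisker <- min(ages[ages >= q[["25%"]] - 1.5 * iqr])
upper_whisker <- max(ages[ages <= q[["75%"]] + 1.5 * iqr])

c(lower = lower_whisker, median = q[["50%"]], upper = upper_whisker)
```

Any age beyond the whiskers would be drawn as a separate outlier point. In this toy data, as in the survey, there are none.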

Bivariate Analysis

Marital Status

# Make sure there are no missing values in Marital_Status
mental_health_pie <- mental_health %>%
  
  # So we keep rows where Marital_Status is not missing
  filter(!is.na(Marital_Status))

# Calculate counts and percentages for the pie chart
marital_status_counts <- mental_health_pie %>%
  
  # Count how many of each marital status
  count(Marital_Status) %>%
  
  # Add two new columns
  mutate(
    # Calculate percentage: (count ÷ total) × 100
    percentage = n / sum(n) * 100,
    
    # Create a label combining status and percentage
    # paste0() combines text without spaces
    label = paste0(Marital_Status, "\n", round(percentage, 1), "%")
  )


pie_chart <- ggplot(marital_status_counts, 
                    
                    # fill=Marital_Status means color by status
                    aes(x="", y=n, fill = Marital_Status)) +
  
  # stat = "identity" means use the values as-is (don't count)
  # width = 1 makes the bar full width
  geom_bar(stat = "identity", width = 1) +
  
  # coord_polar() converts the bar chart to a pie chart
  coord_polar("y", start = 0) +
  
  # theme_void() removes axes, gridlines and background (clean pie chart)
  theme_void() +
  
  # Add labels
  labs(title = "Are Students Married?", 
       fill = "Marital Status") +
  
  
  geom_text(aes(label = label), 
            position = position_stack(vjust = 0.5)) +
  
  # Use our custom colors
  scale_fill_manual(values = c("No" = "#95a5a6", "Yes" = "#2ecc71"))

pie_chart

Story it tells

The vast majority of students are not married: only 16 out of 100 are.

Takeaway: 84% of our survey participants are unmarried. With so few married students, marital status on its own is unlikely to explain the mental health patterns in this survey, and this chart does not test that link.
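
To see whether marital status relates to a condition, the two answers have to be cross-tabulated. A minimal base R sketch on a made-up eight-student table (`toy`, `counts` and `rates` are hypothetical; the numbers are not survey results):

```r
# Hypothetical answers from eight students
toy <- data.frame(
  Marital_Status = c("No", "No", "No", "No", "No", "No", "Yes", "Yes"),
  Depression     = c("Yes", "No", "No", "No", "Yes", "No", "Yes", "No")
)

# Counts for every combination of marital status and depression
counts <- table(toy$Marital_Status, toy$Depression,
                dnn = c("Married", "Depression"))
counts

# Depression rate within each marital group (each row sums to 1)
rates <- prop.table(counts, margin = 1)
round(rates[, "Yes"] * 100, 1)
```

The same two calls on `mental_health$Marital_Status` and `mental_health$Depression` would give the real rates, though with only 16 married students they would be imprecise.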

Survey Participation by Year of Study

# Calculate distribution across years

year_distribution <- mental_health %>%
  
  # Count students in each year
  count(Year_of_Study) %>%
  
  # Add percentage and label columns
  mutate(
    # Calculate percentage
    Percentage = n / sum(n) * 100,
    
    # Create a nice label showing count and percentage
    # paste0() combines text
    Label = paste0(n, " students\n(", round(Percentage, 1), "%)")
  )

# Create the graph
year_plot <- ggplot(year_distribution, 
                    aes(x = Year_of_Study, y = n, fill = Year_of_Study)) +
  
  # geom_col() creates bars
  # width = 0.7 makes bars slightly narrower (leaving gap between them)
  geom_col(width = 0.7) +
  
  # Add labels on top of bars
  geom_text(aes(label = Label), vjust = -0.5, size = 4, fontface = "bold") +
  
  # Add titles and labels
  labs(
    title = "Which Year Groups Took This Survey?",
    subtitle = "This shows survey participation, not student dropout",
    x = "Year of Study",
    y = "Number of Students Who Took Survey"
  ) +
  
  # palette = "Set2" is a predefined color scheme from RColorBrewer
  scale_fill_brewer(palette ="Set2") +
  
  # Expand y-axis to make room for labels on top
  # mult = c(0, 0.15) means 0% extra at bottom, 15% extra at top
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  # Clean theme
  theme_minimal() +
  
  # Customize specific elements
  theme(
    # Make title bold and bigger
    plot.title = element_text(face = "bold", size = 16),
    
    # Make x-axis text bold, size 12, not rotated
    axis.text.x = element_text(size = 12, face = "bold"),
    
    # Hide legend (since x-axis labels are self-explanatory)
    legend.position = "none"
  )

# Make interactive
ggplotly(year_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "Which Year Groups Took This Survey?<br><sup>More Year 1 students participated in this survey</sup>"))

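Story it tells

Now let’s look at which students participated in our survey.

- Year 1: 42 students, the most responses
- Year 2: 26 students (16 fewer)
- Year 3: 24 students (2 fewer)
- Year 4: 8 students (16 fewer)

This is not progressive data: each year is a different group of students surveyed at the same time, so the drop does not mean students left.

Year 1 students make up almost half of our survey. Why? Two possible reasons, neither tested by this data:

First: Year 1 students may be more willing to talk about mental health. They haven’t learned to hide it yet.

Second: They may simply be the most likely to respond to a survey.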
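
The per-year totals can be recovered from the raw label counts in the earlier `summary()` output by merging spellings that differ only in capitalisation. A base R sketch (`raw_counts`, `clean_label` and `year_counts` are names chosen here for illustration):

```r
# Counts per raw label, copied from the summary() output
raw_counts <- c("year 1" = 40, "Year 1" = 2, "year 2" = 10, "Year 2" = 16,
                "year 3" = 5,  "Year 3" = 19, "year 4" = 8)

# Merge spellings that differ only in capitalisation
clean_label <- sub("^year", "Year", names(raw_counts))
year_counts <- sapply(split(raw_counts, clean_label), sum)
year_counts

# Change in the number of respondents from each year to the next
diff(year_counts)
```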

Mental Health by Gender

gender_mh_props <- mental_health %>%

  # Select only the columns we need
  select(Gender, Depression, Anxiety, Panic_Attack) %>%
  
  # Reshape from wide to long format
  pivot_longer(cols = c(Depression, Anxiety, Panic_Attack), 
               names_to = "Condition",      # Column names → "Condition" column
               values_to = "Reported") %>%  # Cell values → "Reported" column
  
  # Keep only "Yes" responses
  filter(Reported == "Yes") %>%
  
  # Count by both Gender AND Condition
  # This gives us: Female-Depression, Female-Anxiety, Male-Depression, etc.
  group_by(Gender, Condition) %>%
  summarise(Count = n(), .groups = 'drop') %>%  # .groups='drop' removes grouping after
  
  # This adds a "Total" column showing total students in each gender
  left_join(mental_health %>% count(Gender, name= "Total"), 
           by = "Gender") %>%
  
  # Proportion = (Students with condition) ÷ (Total students in that gender)
  mutate(proportions = Count / Total)

# graph
gender_mh_plot <- gender_mh_props %>%
  
  # x = Condition (Depression, Anxiety, Panic_Attack)
  # y = proportions
  ggplot(aes(x = Condition, y = proportions, fill = Gender)) +
  
  # Draw the bars
  # position_dodge2() places the bars side by side instead of stacking them
  # width = 0.9 sets the dodge width; preserve = "single" keeps every bar the same width
  geom_col(position = position_dodge2(width = 0.9, preserve = "single")) +
  
  # Add descriptions
  labs(title = "Do Males and Females Report Mental Health Issues Equally?", 
       x = "Mental Health Issue", 
       y = "Percentage Within Gender Group") +
  

  # "Set1" is a predefined color palette from RColorBrewer
  scale_fill_brewer(palette = "Set1") +
  
  # Format the Y-axis
  # limits = c(0, 1) forces the axis to go from 0 to 1 (0% to 100%)
  # labels = scales::percent converts decimals to percentages (0.35 becomes 35%)
  scale_y_continuous(limits = c(0,1), labels = scales::percent)

# Interactive
ggplotly(gender_mh_plot, tooltip = c("x", "y", "fill")) %>%
  
  layout(title = list(text = "Do Males and Females Report Mental Health Issues Equally?<br><sup>Rates are within each gender group (75 female, 25 male)</sup>"))

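Story it tells

We had three times as many female participants, so raw counts are not a fair comparison; the rate within each gender is. For anxiety, the rate is higher among our male participants: 10 of 25 men (40%) versus 24 of 75 women (32%). Fewer men took the survey, but a larger proportion of those who did report anxiety. With only 25 men in the sample, a gap of this size could easily be due to chance, so read the bars for depression and panic attacks separately before generalizing.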
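
The anxiety rates by gender can be worked out from figures reported earlier in this document: 34 students report anxiety, 70.6% of them female, and the sample has 75 women and 25 men. The sketch below rebuilds that table and, as one possible check that is not part of the original analysis, applies Fisher's exact test to see whether the gap is larger than chance would explain:

```r
# Anxiety counts by gender, derived from the reported figures
anxiety <- matrix(c(24, 51,    # women: yes, no
                    10, 15),   # men:   yes, no
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Gender = c("Female", "Male"),
                                  Anxiety = c("Yes", "No")))

# Rate within each gender (row proportions)
rates <- prop.table(anxiety, margin = 1)[, "Yes"]
round(rates * 100, 1)

# Fisher's exact test on the 2 x 2 table
test <- fisher.test(anxiety)
test$p.value
```

A large p-value here would mean the data cannot distinguish the two rates, which is the expected outcome with so few male respondents.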

Mental Health and Grades (CGPA)

Depression by Grade Range

Important Note: Some grade ranges have very few students, so percentages can be misleading.

# By default, R sorts alphabetically. We want them low-to-high
CGPA_order <- c('0 - 1.99', '2.00 - 2.49', '2.50 - 2.99','3.00 - 3.49','3.50 - 4.00')

CGPA_depression_data <- mental_health %>%
  
  # Group by each grade range
  group_by(CGPA) %>%
  
  # Calculate the summary statistics for each group
  summarise(
    Total_Students = n(),  # Total number of students in this grade range
    Students_with_Depression = sum(Depression == "Yes"), # Count students who said "Yes" to depression
    
    # Calculate the percentage
    # We use this to compare groups of different sizes fairly
    Percentage = (Students_with_Depression / Total_Students) * 100,
    Label = paste0(Students_with_Depression, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
    .groups = 'drop'   # Ungroup after summarizing to keep data clean
  ) %>%
  
  # Apply the custom order we defined
  mutate(CGPA = factor(CGPA, levels = CGPA_order))

# Create the plot
CGPA_depression_plot <- CGPA_depression_data %>%
  
  
  ggplot(aes(x = CGPA, y = Percentage, fill = CGPA)) +
  
  # bars
  geom_col() +
  
  
  # vjust = -0.2 moves the text slightly above the bar
  geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
  
  # Add titles and axis labels
  labs(title = "Depression Rates by Grade Range", 
       subtitle = "Numbers show: students with depression out of total students in that grade range",
       x = "CGPA (Grade Range)", 
       y = "Percentage with Depression") +
  
  
  theme_minimal() +
  
  # Rotate x-axis text by 45 degrees so the labels don't overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Reds") + # Use the "Reds" color palette
  
  # Fix Y-axis to always show 0 to 100
  scale_y_continuous(limits = c(0, 100))

# Make it Interactive
ggplotly(CGPA_depression_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text= "Depression Rates by Grade Range<br><sup>Note: Small sample sizes in some ranges (e.g., only 4 students in 2.50-2.99)</sup>"))

Story it tells

Does your GPA affect your mental health?

First: The small groups (like 2.50 - 2.99) show high percentages, but these are based on very few students. Be careful about drawing big conclusions from them.

Second: Even in the larger groups with good grades, depression rates are high. Students with a CGPA of 3.00 or above are struggling too.
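
How little a percentage from a tiny group pins down can be shown with a confidence interval. This is a sketch with made-up counts (2 of 4 and 20 of 40 are hypothetical, chosen so both groups show 50%), using the exact binomial interval from base R's `binom.test()`:

```r
# Hypothetical counts: the same 50% rate in a small and a large group
small <- binom.test(x = 2, n = 4)
large <- binom.test(x = 20, n = 40)

# 95% confidence intervals for the underlying rate, in percent
round(small$conf.int * 100, 1)
round(large$conf.int * 100, 1)
```

The interval for the group of four spans most of the 0% to 100% range, which is why the bars for the lowest grade ranges should be read with caution.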

Anxiety by Grade Range

CGPA_anxiety_data <- mental_health %>%
  group_by(CGPA) %>%  # Group by each grade range
  
  # summary statistics
  summarise(
    Total_Students = n(), # Count total students in this bracket
    Students_with_Anxiety = sum(Anxiety == "Yes"), # Count how many said "Yes" to anxiety
    
    # Calculate Percentage
    # This normalizes the data so we can compare small groups vs large groups
    Percentage = (Students_with_Anxiety / Total_Students) * 100,
    
    # Create the text label for the bar: "Count out of Total (percent)"
    Label = paste0(Students_with_Anxiety, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
    .groups = 'drop'
  ) %>%
  
  # Apply the logical order (0 - 1.99 up to 3.50 - 4.00) so the graph isn't mixed up
  mutate(CGPA = factor(CGPA, levels = CGPA_order))

# Create the plot
anxiety_cgpa_plot <- ggplot(CGPA_anxiety_data, aes(x = CGPA, y = Percentage, fill = CGPA)) +
  
  # Draw the bars
  geom_col() +
  # Add the labels on top of the bars (vjust = -0.2 moves them up slightly)
  geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
  
  
  labs(title = "Anxiety Rates by Grade Range",
       subtitle = "Numbers show: students with anxiety out of total students in that grade range",
       x = "CGPA (Grade Range)",
       y = "Percentage with Anxiety") +
  
  theme_minimal() +
  
  # Rotate X-axis labels 45 degrees for readability
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  
  # Use "Blues" palette
  scale_fill_brewer(palette = "Blues") +
  
  # Fix y-axis to 0-100%
  scale_y_continuous(limits = c(0, 100))

# Make it interactive
ggplotly(anxiety_cgpa_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "Anxiety Rates by Grade Range<br><sup>Anxiety appears slightly higher in students with better grades</sup>"))

Story it tells

Is anxiety highest in our best students?

The chart suggests so, and this makes sense when you think about it, although the small groups at the lower end make the comparison uncertain.

High-achieving students may experience:

- “I must maintain my GPA” pressure
- Perfectionism (“Anything less than an A is a failure”)
- Fear of slipping from the top
- “My GPA determines my future” stress

Panic Attack by Grade Range

CGPA_panic_data <- mental_health %>%
  group_by(CGPA) %>% # Sort students into groups based on their CGPA
  summarise(
    Total_Students = n(),   # How many students are in each grade range
    Students_with_Panic = sum(Panic_Attack == "Yes"), # Count only those who said "Yes" to panic attacks
    Percentage = (Students_with_Panic / Total_Students) * 100, # Calculate the rate
    # Create a text label like "5 out of 20" shown on top of the bars
    Label = paste0(Students_with_Panic, " out of ", Total_Students, "\n(", round(Percentage, 1), "%)"),
    .groups = 'drop' # Clean up the grouping so it doesn't interfere with later steps
  ) %>%
  # Arranges the CGPA in logical order
  mutate(CGPA = factor(CGPA, levels = CGPA_order))

panic_cgpa_plot <- ggplot(CGPA_panic_data, aes(x = CGPA, y = Percentage, fill = CGPA)) +
  geom_col() + # Draws the bars (columns)
  
  geom_text(aes(label = Label), vjust = -0.2, size = 3.5) +
  
  # Add labels
  labs(title = "Panic Attack Rates by Grade Range",
       subtitle = "Numbers show: students with panic attacks out of total students in that grade range",
       x = "CGPA (Grade Range)",
       y = "Percentage with Panic Attacks") +
  theme_minimal() + # Clean theme
  
  # Slant the x-axis labels so they don't overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Purples") + # Color the bars in shades of purple
  scale_y_continuous(limits = c(0, 100)) # Fixes the y-axis from 0% to 100% (stops bars from looking "exaggerated")

ggplotly(panic_cgpa_plot, tooltip = c("x", "y")) %>%
  layout(title = list(text = "Panic Attack Rates by Grade Range<br><sup>Pattern similar to depression rates</sup>"))

Story it tells: Again, small sample sizes in the lower ranges make interpretation tricky. But look at our top performers. In the 3.50-4.00 CGPA range, about 40% experience panic attacks (19 out of 48 students).

Let me describe what a panic attack feels like:

** Can’t breathe

** Heart racing

** Feeling like you’re dying

** Can happen during exams, presentations, or out of nowhere

Takeaway: These aren’t just ‘stressed students’. These are students experiencing emergency-level mental health crises.

Getting Help (Treatment)

Students Who Got Professional Help

mental_health_conditions <- mental_health %>%
  filter(Depression == "Yes" | Anxiety == "Yes" | Panic_Attack == "Yes") %>% # Keep only students who said "Yes" to at least one condition
  
  # Keep only the columns we need for the chart
  select(Depression, Anxiety, Panic_Attack, Treatment) %>%
  # Make the data "long" instead of "wide"
  pivot_longer(cols = c(Depression, Anxiety, Panic_Attack), names_to = "Condition", values_to = "Status") %>% # Turn the three condition columns into two (Condition and Status)
  # Keep only the rows where the condition was reported
  filter(Status == "Yes")

treatment_gap <- mental_health_conditions %>% 
  
  
  # Group by the issue and whether they got treatment (Yes/No)
  group_by(Condition, Treatment) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  
  # Group again by conditions to calculate the percentage within the issue
  group_by(Condition) %>%
  mutate(
    Total = sum(Count), # Total people with that specific issue
    Percentage = Count / Total * 100,
    # Creates the text label: "15 students (25.5%)"
    Label = paste0(Count, " students\n(", round(Percentage, 1), "%)")
  )

# "stack" puts "Yes" and "No" on top of each other to reach 100%
treatment_plot <- ggplot(treatment_gap, aes(x = Condition, y = Percentage, fill = Treatment)) +
  geom_col(position = "stack", width = 0.6, color = "black") +
  
  # Adds the text labels (the numbers/percentages) inside the bars
  # vjust = 0.5 centers the text in the middle of each colored segment
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5), size = 4, fontface = "bold", color = "white") +
  
  # Add titles
  labs(
    title = "How Many Students with Mental Health Issues Got Professional Help?",
    subtitle = "Out of students who said 'Yes' to having each issue",
    x = "Mental Health Issue",
    y = "Percentage of Students",
    fill = "Got Help?"
  ) + # Manually set the colors: grey for "No", green for "Yes"
  scale_fill_manual(values = c("No" = "#95a5a6", "Yes"= "#2ecc71")) + 
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, margin = margin(b = 20)),
    legend.position = "top" # Moves the "Yes/No" legend to the top of the chart
  )

ggplotly(treatment_plot, tooltip = c("x","y","fill")) %>%
  layout(title = list(text = "How Many Students Got Professional Help?<br><sup>Most students with mental health issues are not getting help</sup>"))

Story it tells: Across roughly 100 reported conditions (35 depression, 34 anxiety, 33 panic attacks), only 13 came with professional help. About 9 out of 10 students with mental health issues are NOT getting any support. Why?

They think it’s not ‘serious enough’: ‘Other people have it worse’
They don’t know where to go: ‘Is there even a counseling center?’
They’re ashamed: ‘What if people think I’m weak?’
They fear judgment: ‘What if this goes on my record?’

Takeaway: The treatment gap is catastrophic. Most students with mental health issues are NOT getting help.

This is our crisis. Not that students are struggling; humans struggle, that’s normal. The crisis is that we’re letting them struggle alone.

Getting Help by Grade Range

# Prepare the data
treatment_cgpa_data <- mental_health %>%
  
  # Ensure we only look at the valid grade ranges we defined earlier
  filter(CGPA %in% CGPA_order) %>%
  # Group by both grade range AND treatment status (Yes/No)
  group_by(CGPA, Treatment) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  
  # calculate percentages within each Grade group
  group_by(CGPA) %>%
  mutate(  
    Total_in_Group = sum(Count),
    # Calculate what % of this grade range sought treatment
    Percentage = Count / Total_in_Group * 100,
    # Create label: "5 students\n(12.5%)"
    Label = paste0(Count, " students\n(", round(Percentage, 1), "%)")
  ) %>%
  # Apply logical order to grades
  mutate(CGPA = factor(CGPA, levels = CGPA_order))

# Create the Stacked Bar Chart
treatment_cgpa_plot <- ggplot(treatment_cgpa_data, aes(x = CGPA, y = Percentage, fill = Treatment)) +
  # position = "fill" stacks the bars to reach 100% height
  # This makes every bar the same height so we can compare the split easily
  geom_col(position = "fill", width = 0.7, color = "black") +
  geom_text(aes(label = Label), position = position_fill(vjust = 0.5), size = 3.5, fontface = "bold", color = "white") +
  # Labels and Titles
  labs(
    title = "Treatment Rates Across Grade Ranges",
    subtitle = "Are students in different grade ranges more likely to get help?",
    x = "CGPA (Grade Range)",
    y = "Percentage of Students",
    fill = "Got Help?"
  ) +
  # Format y-axis as percentage (0% to 100%)
  scale_y_continuous(labels = scales::percent) +
  
  # Custom Colors: 
  scale_fill_manual(values = c("No" = "#34495e", "Yes" = "#2ecc71")) +
  # theme adjustments
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
    legend.position = "top"
  )
# Interactive plot
ggplotly(treatment_cgpa_plot, tooltip = c("x", "y", "fill")) %>%
  layout(title = list(text = "Treatment Rates by Grade Range<br><sup>Treatment gap exists across all grade levels</sup>"))

Story it tells: Does seeking help vary by GPA?

Two Concerning Groups:

Failing Students: Not getting help because they have given up or don’t know where to start
High Achievers: Not getting help because they think, “My grades are good, so I can’t be that bad”

But we just saw that 38% of top students have anxiety!

Their success hides their pain.

Mental Health by Field of Study

# Prepare the data
course_mh_plot <- mental_health_final %>%
  # Specific columns we need for this analysis
  select(Course_Category, Depression, Anxiety, Panic_Attack) %>%
  # Reshape data: convert the 3 condition columns into one "Condition" column
  # This makes it possible to plot them all on the same graph
  pivot_longer(cols = c(Depression, Anxiety, Panic_Attack),
               names_to = "Condition", values_to = "Status") %>%
  # Keep only the students who said "Yes"
  # We are interested in the presence of mental health issue, not the absence
  filter(Status == "Yes") %>%
  
  # group by field of study and the specific mental health condition
  group_by(Course_Category, Condition) %>%
  
  # Count how many students fall in each group
  summarise(Count = n(), .groups = 'drop') %>%
  # Create the graph
  # x = Field of Study, y = Count (Number of students), fill = condition(color)
  ggplot(aes(x = Course_Category, y = Count, fill = Condition)) +
  # Draw bars side-by-side so we can compare conditions
  geom_col(position = "dodge") +
  
  # Add labels
  labs(title = "Mental Health Issues by Field of Study",
       x = "Field of Study", 
       y = "Number of Students") +
  # Clean Theme
  theme_minimal() +
  # Rotate x-axis labels 45 degrees so they are readable and don't overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Make it interactive
ggplotly(course_mh_plot, tooltip= c("x","y","fill")) %>%
  layout(title = list(text = "Mental Health Issues by Field of Study<br><sup>All fields show mental health challenges</sup>"))

Story it tells: Finally, does your major matter?

STEM students show the highest absolute numbers, but that’s because STEM makes up 62% of our sample.

STEM:

** Heavy workload

** Difficult exams

** Competitive culture

Takeaway: Looking across the chart (Business/Finance, Social Science, Law, Religious Studies), every field shows mental health struggles.
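
A quick way to separate "biggest group" from "highest risk" is to divide each count by the number of students in that field. The sketch below does this on invented numbers: the field sizes and depression counts in `toy_fields` are hypothetical, and only the `Course_Category` name comes from the dataset.

```r
# HYPOTHETICAL toy data: field sizes and counts are invented for illustration
toy_fields <- data.frame(
  Course_Category = c("STEM", "Business/Finance", "Law"),
  Students        = c(60, 25, 15),
  With_Depression = c(18, 10, 6)
)

# Rate within each field = students with the condition / students in the field
toy_fields$Rate_Pct <- toy_fields$With_Depression / toy_fields$Students * 100

# Sort by rate (highest first) to compare fields on equal footing
toy_fields[order(-toy_fields$Rate_Pct), ]
```

In this toy example STEM has the most cases but the lowest rate, which is the pattern to watch for when one field dominates the sample.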

Key Findings

Answers to Research Questions

1. Which mental health problem is most common?

Answer: All three are almost equally common, with depression slightly ahead.

Depression: 35 students (35%)
Anxiety: 34 students (34%)
Panic Attacks: 33 students (33%)

These numbers are almost identical, which may mean that many students experience more than one issue at once.
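
Similar totals alone cannot show how much the three conditions overlap; that needs a per-student count. A minimal sketch of that check is below. The six-row `toy` data frame is invented for illustration (it only borrows the column names `Depression`, `Anxiety`, and `Panic_Attack` from the cleaned dataset), so the numbers it prints say nothing about the real survey.

```r
# HYPOTHETICAL toy data with the same column names as the cleaned dataset
toy <- data.frame(
  Depression   = c("Yes", "Yes", "No", "No",  "Yes", "No"),
  Anxiety      = c("Yes", "No",  "No", "Yes", "Yes", "No"),
  Panic_Attack = c("No",  "No",  "No", "Yes", "Yes", "No")
)

# Count how many of the three conditions each student reported (0 to 3)
conditions <- c("Depression", "Anxiety", "Panic_Attack")
toy$Conditions_Reported <- rowSums(toy[conditions] == "Yes")

# Distribution of students by number of conditions reported
overlap_table <- table(factor(toy$Conditions_Reported, levels = 0:3))
overlap_table

# Among students with at least one condition, the share with two or more
affected <- toy$Conditions_Reported >= 1
multi_share <- mean(toy$Conditions_Reported[affected] >= 2)
multi_share
```

Running the same steps on the real `mental_health` data would show directly how many students sit in the "2" and "3" columns.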

2. Do female students report more mental health issues than male students?

Answer: Rates are broadly similar, with some differences. There is some cause for concern, though: male rates are relatively high given how few male students took part in the survey.

Anxiety: Nearly the same (Female: 40%, Male: 42%)
Depression: Females higher (Female: 38%, Male: 24%)
Panic Attacks: Nearly equal (Female: 35%, Male: 32%)

3. Do students with mental health issues have lower grades?

Answer: The relationship is complex, not simple.

Students in the middle grade range (2.50-2.99) show high rates of depression and panic attacks
However, this is based on very small numbers (only 4 students in that range)
Anxiety appears slightly higher in students with better grades
Mental health issues and poor grades can each contribute to the other

4. How many students with mental health issues actually get help?

Answer: Very few - this is our biggest concern.

Anxiety: Only 9% got help (31 out of 34 didn’t get help)
Depression: Only 17% got help (29 out of 35 didn’t get help)
Panic Attacks: Only 12% got help (29 out of 33 didn’t get help)
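
The percentages above follow directly from the counts, and the arithmetic can be checked in a few lines of base R. This is only a sketch: the data frame is typed in by hand from the figures listed above, and the column names (`Total`, `Without_Help`, and so on) are my own.

```r
# Counts taken from the figures above: total with each condition,
# and how many of them did NOT get professional help
treatment_summary <- data.frame(
  Condition    = c("Anxiety", "Depression", "Panic Attacks"),
  Total        = c(34, 35, 33),
  Without_Help = c(31, 29, 29)
)

# Students who got help, and the matching percentage per condition
treatment_summary$With_Help <- treatment_summary$Total - treatment_summary$Without_Help
treatment_summary$Pct_With_Help <- round(
  treatment_summary$With_Help / treatment_summary$Total * 100, 1
)
treatment_summary

# Pooled across all condition reports
# (a student with two conditions is counted twice here)
overall_pct <- sum(treatment_summary$With_Help) / sum(treatment_summary$Total) * 100
round(overall_pct, 1)
```

Pooled across all three conditions, 13 of 102 condition reports (about 12.7%) involved professional help, which is consistent with the "9 out of 10" figure used in this report.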

5. Which year of study has the most students with mental health issues?

Answer: Year 1 had the most participants and cases, but rates are consistent across all years.

Recommendations

What Can We Do?

1. Make Help Easier to Get

Reduce wait times for counseling
Create drop-in hours (no appointment needed)
Offer online support options

2. Focus on High-Risk Groups

Provide extra support for Year 1 students (transition help)
Watch for students whose grades start dropping
Reach out to students before they fail

3. Reduce Shame Around Mental Health

Make it normal to ask for help
Share stories of students who got help and improved
Educate all students about mental health

4. Regular Check-ins

Ask students “How are you really doing?”
Train teachers to spot warning signs
Connect struggling students to resources quickly

Conclusion

Let’s come back to where we started.

Remember that student: heart racing, two exams today, can’t sleep, drowning in silence?

Here’s what we now know:

35 students in our survey have depression. 34 have anxiety. 33 have panic attacks.

And 9 out of 10 of them aren’t getting any professional help.

But here’s the most important thing I want us to take away from this presentation:

This doesn’t have to be a story.

We have the solutions:

Drop-in counseling hours
Campus awareness campaigns
Year 1 wellness check-ins
Expanded services

These aren’t complicated. They’re not impossibly expensive. They work.

The question isn’t “Can we afford to do this?”

The question is: “Can we afford NOT to?”

To administrators: Money spent on mental health saves money in retention and prevents crises.

To faculty: You can be the person who notices and helps a student find support.

To students: If you’re one of the 35, 34, or 33—you are not alone. Getting help is strength, not weakness.

35 students with depression are counting on us. 34 students with anxiety need us to act. 33 students with panic attacks are waiting. Let’s not make them wait any longer.

Student Mental Health Crisis

Understanding Mental Health Among Students

Omosola Gbenga

12/12/2025
