Visualization of suicide data in New Zealand (2)-2 - Age Group
Author
Takafumi Kubota
Published
November 4, 2024
Abstract
This report analyzes suicide trends in Aotearoa New Zealand for 2023, focusing on differences across age groups and sexes. Using “Suspected” case data from all ethnic groups, the study employs data cleaning and transformation to ensure accuracy. Utilizing R and ggplot2, the report presents stacked bar charts showing both the number of suicide deaths and rates per 100,000 population across age categories. The findings provide insights for public health officials and policymakers to identify high-risk groups and develop targeted intervention strategies, contributing to efforts to reduce suicide rates and enhance mental health support in New Zealand. (Update for missing data.)
Keywords
R language, Suicide, New Zealand, Bar Chart
Introduction
This page includes information about numbers and rates of suicide deaths in Aotearoa New Zealand. If at any point while viewing this page you feel worried about harming yourself, or you think someone else may be in danger, please stop reading and seek help.
Suicide remains a critical public health concern globally, with profound impacts on individuals, families, and communities. In Aotearoa New Zealand, understanding the underlying patterns and demographic disparities in suicide trends is essential for developing effective prevention strategies. This report examines suicide trends for 2023 using data categorized as “Suspected” cases across all ethnic groups. Its primary objective is to describe how suicide counts and rates vary across age groups and sexes, and thereby to identify vulnerable populations that may benefit from targeted interventions.
The analysis is carried out in R. The data are first cleaned: relevant variables are converted to numeric form, and suppressed or missing values are handled. The average population count (pop_mean) is then calculated for each demographic group as the basis for the rate calculations. Where pop_mean is missing, it is filled with the previous year’s value for the same group or, failing that, with the group’s average across years. Finally, ggplot2 is used to draw stacked bar charts of both the number of suicide deaths and the rate per 100,000 population by age group and sex. These charts make patterns and disparities easier to see and offer actionable insights for stakeholders. By highlighting the interplay between age, sex, and suicide rates, this report contributes to the ongoing discourse on mental health in New Zealand.
The findings aim to support policymakers and healthcare providers in prioritizing resources and designing interventions that address the specific needs of high-risk groups, ultimately striving to reduce the incidence of suicide and promote mental well-being across the nation.
The data on this page is sourced from the Suicide Data Web Tool provided by Health New Zealand, specifically from https://tewhatuora.shinyapps.io/suicide-web-tool/, and is licensed under a Creative Commons Attribution 4.0 International License.
This visualisation shows only calendar years. It also visualises only suspected suicides. The following notes are given on the site of the Suicide Data Web Tool:
Short term year-on-year data are not an accurate indicator by which to measure trends. Trends can only be considered over a five to ten year period, or longer.
Confirmed suicide rates generally follow the same pattern as suspected suicide rates.
The technical information page for the Suicide Data Web Tool gives the following cautionary note on ‘Interpreting Numerical Values and Rates’:
For groups where suicide numbers are very low, small changes in the numbers of suicide deaths across years can result in large changes in the corresponding rates. Rates that are based on such small numbers are not reliable and can show large changes over time that may not accurately represent underlying suicide trends. Because of issues with particularly small counts, rates in this web tool are not calculated for groups with fewer than six suicide deaths in a given year.
For the purpose of visualisation, this page calculates its own rates from the population counts in the data; where a population count is missing, it is estimated from the same sex and age group in other years. Unlike the web tool, this page also shows rates for groups with fewer than six deaths. Interpret the graphs with great care.
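The web tool’s rule of not reporting rates for groups with fewer than six deaths could be applied before plotting. The sketch below is an illustration only, not what the plots on this page do; the data frame and its values are hypothetical, and only the column names number, popcount, and rate follow the script.

```r
# Hypothetical example data (not taken from the Suicide Data Web Tool)
df <- data.frame(
  age_group = c("15-19", "20-24", "80-84"),
  number    = c(30, 41, 4),
  popcount  = c(160000, 170000, 60000)
)

# Minimum count below which no rate is reported, following the web tool's rule
min_deaths <- 6

# Rate per 100,000; set to NA where the death count is below the threshold
df$rate <- ifelse(df$number >= min_deaths,
                  df$number / df$popcount * 100000,
                  NA_real_)
print(df)
```

In the dplyr pipeline the same rule could be written as mutate(rate = if_else(number >= 6, number / popcount * 100000, NA_real_)); ggplot2 then drops the NA bars with a warning.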
In the rate chart, each bar segment shows the rate per 100,000 males or per 100,000 females. Because the segments are stacked, the total height of each bar is the sum of the male and female rates, not the rate per 100,000 people of all sexes. To compare with all-sex values (such as those in the Suicide Data Web Tool), divide the bar height by 2. This gives the unweighted mean of the two rates, which approximates the all-sex rate only when the male and female populations of the age group are of similar size.
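The arithmetic behind this caveat can be checked with made-up numbers for a single age group. In the hypothetical figures below, the female population is larger than the male population, so half the stacked height differs from the population-weighted all-sex rate.

```r
# Hypothetical counts and populations for one age group (illustration only)
male_deaths   <- 30
male_pop      <- 100000
female_deaths <- 10
female_pop    <- 150000

# Sex-specific rates per 100,000, as drawn in each stacked segment
male_rate   <- male_deaths / male_pop * 100000       # 30
female_rate <- female_deaths / female_pop * 100000   # about 6.67

# Total height of the stacked bar, and half of it
stacked_height <- male_rate + female_rate             # about 36.67
half_stack     <- stacked_height / 2                  # about 18.33

# The all-sex rate divides total deaths by total population
all_sex_rate <- (male_deaths + female_deaths) / (male_pop + female_pop) * 100000  # 16

cat(sprintf("half of stack: %.2f, all-sex rate: %.2f\n", half_stack, all_sex_rate))
```

The two values agree when the male and female populations are equal (or when the two rates happen to be equal); otherwise halving the stack is only an approximation.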
# 1. Load necessary libraries
library(ggplot2) # For data visualization
library(dplyr)   # For data manipulation
library(readr)   # For reading CSV files
library(zoo)     # For time-series data and missing values (not called below; step 6 uses dplyr::lag())

# 2. Load the data from a CSV file
# The following line is commented out and can be used to load data from a local directory
# suicide_trends <- read_csv("data/suicide-trends-by-ethnicity-by-calendar-year.csv")
# Load the dataset directly from an external URL
suicide_trends <- read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv")

# 3. Filter and transform the base data
suicide_trends_filtered_age <- suicide_trends %>%
  filter(
    data_status == "Suspected",              # Include only suspected cases
    sex %in% c("Male", "Female", "All sex"), # Include Male, Female, and All sex categories
    ethnicity == "All ethnic groups",        # Include all ethnic groups
    age_group != "All ages"                  # Exclude aggregated age groups
  ) %>%
  mutate(number = as.numeric(number))        # Convert 'number' to numeric for accurate calculations

# 4. Group by data_status, year, sex, and age_group, then calculate the average popcount for each group
pop_means <- suicide_trends_filtered_age %>%
  mutate(
    popcount_num = as.numeric(popcount),                            # Convert 'popcount' to numeric
    popcount_num = if_else(popcount == "S", NA_real_, popcount_num) # Replace 'S' with NA explicitly
  ) %>%
  group_by(data_status, year, sex, age_group) %>%  # Group by status, year, sex, and age group
  summarise(
    pop_mean = mean(popcount_num, na.rm = TRUE),   # Mean population count, ignoring NA (NaN if all NA)
    .groups = "drop"                               # Ungroup after summarising
  )

# 5. Arrange the pop_means data frame by sex, age_group, and year for consistency
pop_means <- pop_means %>%
  arrange(sex, age_group, year)

# 6. Fill missing pop_mean values using the previous year's value or the group average
pop_means <- pop_means %>%
  group_by(data_status, sex, age_group) %>%  # Group by status, sex, and age group
  arrange(year) %>%                          # Arrange chronologically by year
  mutate(
    pop_mean = if_else(is.na(pop_mean), lag(pop_mean), pop_mean),               # Use the previous year's value
    pop_mean = if_else(is.na(pop_mean), mean(pop_mean, na.rm = TRUE), pop_mean) # If still NA, use the group average
  ) %>%
  ungroup()

# 7. Replace 'popcount' with the filled 'pop_mean' values, matching rows on the key columns
#    (a join, rather than assigning pop_means$pop_mean by row position)
suicide_trends_filtered_age <- suicide_trends_filtered_age %>%
  arrange(year, sex, age_group) %>%  # This order also determines the age-group levels in step 9
  left_join(pop_means, by = c("data_status", "year", "sex", "age_group")) %>%
  mutate(popcount = pop_mean) %>%    # Update 'popcount' with the filled population means
  select(-pop_mean)

# 8. Filter and transform data for the bar plot
suicide_trends_age <- suicide_trends_filtered_age %>%
  filter(
    data_status == "Suspected",        # Include only suspected cases
    sex %in% c("Male", "Female"),      # Include only Male and Female categories
    age_group != "All ages",           # Exclude aggregated age groups
    ethnicity == "All ethnic groups",  # Include all ethnic groups
    year == 2023                       # Focus on the year 2023
  ) %>%
  mutate(
    number = as.numeric(number),       # Ensure 'number' is numeric
    rate = number / popcount * 100000  # Suicide rate per 100,000 population
  )

# 9. Get unique age groups and set factor levels in order
age_levels <- unique(suicide_trends_filtered_age$age_group) # Order of first appearance (sorted in step 7)
suicide_trends_age$age_group <- factor(suicide_trends_age$age_group, levels = age_levels)

# 10. Define colors for the bar plot
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255), # Purple tone for Female
  "Male"   = rgb(255/255, 102/255, 102/255)  # Pink tone for Male
)

# 11. Create the stacked bar plot for the number of suicide deaths
ggplot(suicide_trends_age, aes(x = age_group, y = number, fill = sex)) +
  geom_bar(stat = "identity") +              # Bar heights correspond to 'number'
  labs(
    title = "Number of Suicides by Age Group and Sex in Aotearoa New Zealand, 2023", # Plot title
    x = "Age Group",                         # x-axis label
    y = "Number (Suspected)",                # y-axis label
    fill = "Sex"                             # Legend title
  ) +
  scale_fill_manual(values = bar_colors) +   # Apply custom colors by sex
  theme_minimal() +                          # Minimal theme for a clean look
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5) # Horizontal, centred x-axis labels
  )
# 12. Create the stacked bar plot for the suicide rate
ggplot(suicide_trends_age, aes(x = age_group, y = rate, fill = sex)) +
  geom_bar(stat = "identity") +              # Bar heights correspond to 'rate'
  labs(
    title = "Suicide Rate by Age Group and Sex in Aotearoa New Zealand, 2023", # Plot title
    x = "Age Group",                         # x-axis label
    y = "Rate per 100,000 (Suspected)",      # y-axis label
    fill = "Sex"                             # Legend title
  ) +
  scale_fill_manual(values = bar_colors) +   # Apply custom colors by sex
  theme_minimal() +                          # Minimal theme for a clean look
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5) # Horizontal, centred x-axis labels
  )
1. Loading Necessary Libraries
The script begins by loading essential R libraries:
ggplot2: Facilitates data visualization through advanced plotting capabilities.
dplyr: Provides a suite of functions for efficient data manipulation and transformation.
readr: Enables fast and friendly reading of rectangular data, such as CSV files.
zoo: Offers tools for working with ordered observations such as time series, including functions for filling in missing values. The script loads it, but the filling in step 6 uses dplyr’s lag() rather than a zoo function.
2. Loading the Data
The dataset containing suicide trends in Aotearoa New Zealand is loaded with the read_csv() function from the readr package, directly from a copy hosted at an external URL, so no local file is needed. A commented-out alternative line loads the data from a local directory instead.
3. Filtering and Transforming the Base Data
The dataset is filtered to include only rows where:
data_status is "Suspected", focusing on suspected suicide cases.
sex is either "Male", "Female", or "All sex", ensuring the inclusion of all relevant sex categories.
ethnicity is "All ethnic groups", aggregating data across all ethnicities.
age_group is not "All ages", allowing for analysis across specific age brackets.
After filtering, the number column, representing the count of suicide cases, is converted to a numeric type to facilitate accurate calculations and visualizations.
4. Calculating Average Population Count
To prepare for rate calculations, the script:
Converts the popcount column to numeric. Entries that cannot be parsed, such as "S", become NA, and the if_else() call makes the replacement of "S" with NA explicit.
Groups the data by data_status, year, sex, and age_group to calculate the mean population (pop_mean) for each group, ignoring NA values. This aggregation is crucial for subsequent rate calculations.
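Two details of this step can be checked in base R. Parsing a value such as "S" with as.numeric() yields NA (with a coercion warning), and the mean of a group whose values are all NA is NaN when na.rm = TRUE. Because is.na() is also TRUE for NaN, such groups are still picked up by the filling in step 6. The vector below is hypothetical.

```r
# Hypothetical raw population counts as read from CSV (character, with an "S" entry)
popcount <- c("52000", "S", "54000")

# as.numeric() turns unparseable entries into NA; the expected warning is suppressed here
popcount_num <- suppressWarnings(as.numeric(popcount))
print(popcount_num)                # 52000 NA 54000

# Mean over the available values only
avg <- mean(popcount_num, na.rm = TRUE)
print(avg)                         # 53000

# A group where every value is missing gives NaN, not NA
all_missing <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
print(all_missing)                 # NaN
print(is.na(all_missing))          # TRUE, so step 6 still treats it as missing
```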
5. Arranging the Population Means Data Frame
The resulting pop_means data frame is sorted by sex, age_group, and year. This arrangement ensures that the data is orderly and facilitates the filling of missing values in a logical sequence.
6. Filling Missing Population Mean Values
To address any missing pop_mean values:
The data is grouped by data_status, sex, and age_group.
Within each group, the data is ordered by year.
Missing pop_mean values are first filled with the previous year’s value (lag(pop_mean)). If that value is also missing, as in a group’s first year or the second of two consecutive missing years, the average of the group’s available pop_mean values (after the first fill) is used.
This step fills the gaps in population counts that the rate calculation requires; only a group with no population data in any year would remain missing.
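The fill rule can be traced on a single series with a small base-R helper. The function fill_lag_then_mean() is hypothetical and not part of the script; it assumes the values are already sorted by year and mirrors the two if_else() calls in step 6, including the fact that the group mean is taken after the lag fill.

```r
# Hypothetical helper mirroring step 6 for one data_status/sex/age_group series,
# assumed to be sorted by year
fill_lag_then_mean <- function(x) {
  # First pass: replace NA with the previous element (a one-step lag, like dplyr::lag())
  prev <- c(NA, head(x, -1))
  x <- ifelse(is.na(x), prev, x)
  # Second pass: replace any remaining NA with the mean of the values now present
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
}

fill_lag_then_mean(c(100, NA, NA, 130))  # 100 100 110 130
fill_lag_then_mean(c(NA, 120, 140))      # 130 120 140
```

In the first example, the second missing year receives the mean of 100, 100, and 130, so the carried-forward value counts twice. By contrast, zoo::na.locf() would carry the last observed value forward across any number of consecutive gaps; that is a different rule, mentioned here only as an alternative.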
7. Updating the ‘popcount’ Column
The original popcount column in the suicide_trends_filtered_age data frame is replaced with the filled pop_mean values from the pop_means data frame. Every row then has a population count for the rate calculation, although the filled values are estimates rather than reported counts.
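Assigning a column from another data frame by position (as with pop_means$pop_mean) works only if both data frames have the same number of rows in the same order. A join on the key columns avoids that assumption. The sketch below uses base R’s merge() on hypothetical toy data; in the dplyr pipeline the equivalent would be left_join(pop_means, by = c("data_status", "year", "sex", "age_group")).

```r
# Hypothetical toy data: rows are deliberately in a different order in each frame
counts <- data.frame(
  year      = c(2023, 2023, 2022),
  sex       = c("Male", "Female", "Male"),
  age_group = c("15-19", "15-19", "15-19"),
  number    = c(20, 8, 18)
)
pop_means <- data.frame(
  year      = c(2022, 2023, 2023),
  sex       = c("Male", "Female", "Male"),
  age_group = c("15-19", "15-19", "15-19"),
  pop_mean  = c(160000, 150000, 161000)
)

# Positional assignment silently pairs rows that do not belong together here
wrong <- counts
wrong$popcount <- pop_means$pop_mean

# Key-based matching; all.x = TRUE keeps every row of 'counts'
joined <- merge(counts, pop_means,
                by = c("year", "sex", "age_group"), all.x = TRUE)
joined$popcount <- joined$pop_mean
print(joined)
```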
8. Preparing Data for the Bar Plot
For the bar plot visualization:
The data is further filtered to include only suspected cases (data_status == "Suspected"), specific sexes ("Male" and "Female"), non-aggregated age groups, all ethnic groups, and the year 2023.
The number column is ensured to be numeric.
A new rate column is calculated, representing the suicide rate per 100,000 population (number / popcount * 100000). This rate provides a standardized measure for comparing suicide prevalence across different age groups and sexes.
9. Setting Factor Levels for Age Groups
To maintain consistent and meaningful ordering in the plots:
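Unique age groups are extracted, in order of first appearance in suicide_trends_filtered_age, to define the factor levels. Because step 7 sorts that data frame by year, sex, and age_group, the levels follow the sorted order of the age-group labels.
The age_group column in the suicide_trends_age data frame is converted to a factor with these levels, so the x-axis shows the age groups in this order. This matches the natural age order as long as the labels sort correctly as text.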
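Sorting as text can misplace labels whose numbers have different widths: in the hypothetical labels below, "5-9" sorts after "45-49". One option, assuming every label begins with its lower age bound (the real labels in the dataset may differ), is to order the levels by that number.

```r
# Hypothetical age-group labels (not necessarily those in the dataset)
age_groups <- unique(c("85+", "10-14", "5-9", "45-49", "20-24"))

# Lower age bound, assuming each label starts with its lower bound in digits
lower_bound <- as.integer(sub("^([0-9]+).*$", "\\1", age_groups))

# Levels ordered numerically by lower bound rather than as text
age_levels <- age_groups[order(lower_bound)]
print(age_levels)  # "5-9" "10-14" "20-24" "45-49" "85+"

age_factor <- factor(c("20-24", "5-9"), levels = age_levels)
```

Labels without a leading number (for example "Under 15") would yield NA here and would need separate handling.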
10. Defining Colors for the Bar Plot
A custom color palette is defined using the rgb() function to assign specific colors to each sex category:
Female: Assigned a purple tone (rgb(102/255, 102/255, 153/255)).
Male: Assigned a pink tone (rgb(255/255, 102/255, 102/255)).
This color differentiation enhances the visual distinction between the sexes in the bar plots.
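For reference, rgb() with components in [0, 1] returns hexadecimal colour strings, so the same palette could be written directly as hex codes. This short check uses only base R’s grDevices.

```r
# rgb() with components in [0, 1], as in the script, returns hex colour strings
female_col <- rgb(102/255, 102/255, 153/255)
male_col   <- rgb(255/255, 102/255, 102/255)
print(c(Female = female_col, Male = male_col))  # Female: "#666699", Male: "#FF6666"

# The same colours written with 0-255 components via maxColorValue
female_col_255 <- rgb(102, 102, 153, maxColorValue = 255)

# An equivalent palette using hex codes directly
bar_colors_hex <- c(Female = "#666699", Male = "#FF6666")
```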
11. Creating the Stacked Bar Plot for Suicide Deaths
A stacked bar plot is generated to display the number of suicide deaths by age group and sex for the year 2023:
Axes: age_group on the x-axis and number of deaths on the y-axis.
Fill: Bars are filled based on the sex category, using the predefined bar_colors.
Geometries: geom_bar(stat = "identity") creates bars with heights corresponding to the actual number of deaths.
Labels: The plot includes a title, axis labels, and a legend title for clarity.
Theme: theme_minimal() is applied for a clean and uncluttered appearance, and x-axis text is adjusted for readability.
12. Creating the Stacked Bar Plot for Suicide Rate
A similar stacked bar plot is created to visualize the suicide rate per 100,000 population of each sex:
Axes: age_group on the x-axis and rate on the y-axis.
Fill: Bars are filled based on the sex category, maintaining consistency with the previous plot.
Geometries: geom_bar(stat = "identity") ensures that bar heights accurately reflect the suicide rates.
Labels: The plot includes a descriptive title and appropriate axis labels.
Theme: The same minimalistic theme is applied, and x-axis text is formatted for clarity.
Summary
This R script meticulously processes and visualizes suicide trends in Aotearoa New Zealand for the year 2023. By loading and cleaning the data, calculating meaningful statistics such as suicide rates, and employing clear and informative visualizations, the analysis provides valuable insights into the demographic patterns of suicide deaths. The use of custom colors and organized plotting techniques enhances the interpretability of the data, making it a useful tool for public health officials, researchers, and policymakers aiming to understand and address suicide trends effectively.