Visualization of suicide data in New Zealand (2) - Age Group

Author

Takafumi Kubota

Published

November 4, 2024

Abstract

This report analyzes suicide data for Aotearoa New Zealand for 2023, focusing on differences across age groups and sexes. Using “Suspected” case data for all ethnic groups, the study converts counts to numeric form and fills missing population values before computing rates. Using R and ggplot2, the report presents stacked bar charts showing both the number of suicide deaths and rates per 100,000 population across age categories. The findings provide insights for public health officials and policymakers to identify high-risk groups and develop targeted intervention strategies, contributing to efforts to reduce suicide rates and enhance mental health support in New Zealand.

Keywords

R language, Suicide, New Zealand, Bar Chart

Introduction

This page includes information about numbers and rates of suicide deaths in Aotearoa New Zealand. If at any point you feel worried about harming yourself while viewing the information on this page, or if you think someone else may be in danger, please stop reading and seek help.

Suicide remains a critical public health concern globally, with profound impacts on individuals, families, and communities. In Aotearoa New Zealand, understanding the patterns and demographic disparities in suicide trends is essential for developing effective prevention strategies. This report examines suicide data for 2023, using “Suspected” cases across all ethnic groups. Its primary objective is to show how suicide numbers and rates vary among age groups and sexes, and so to identify vulnerable populations that may benefit from targeted interventions.

The study uses R to process and visualize the data. The first steps are data cleaning: converting the relevant variables to numeric formats and handling missing or suppressed values. An average population count (pop_mean) is then calculated for each combination of year, sex, and age group and serves as the denominator for the rate calculations. To address gaps in the data, a custom imputation function fills each missing pop_mean value with the previous year's value for the same sex and age group or, when that is unavailable, with the group's average across all years.

The visualization phase uses ggplot2 to create stacked bar charts showing both the absolute number of suicide deaths and the corresponding rates per 100,000 population across age groups and sexes. These charts make patterns and disparities easier to identify. The findings aim to support policymakers and healthcare providers in prioritizing resources and designing interventions that address the specific needs of high-risk groups, with the goal of reducing the incidence of suicide and promoting mental well-being across the nation.

The data on this page is sourced from the Suicide Data Web Tool provided by Health New Zealand, specifically from https://tewhatuora.shinyapps.io/suicide-web-tool/, and is licensed under a Creative Commons Attribution 4.0 International License.

This visualisation shows only calendar years. It also visualises only suspected suicides. The following notes are given on the site of the Suicide Data Web Tool:

  • Short term year-on-year data are not an accurate indicator by which to measure trends. Trends can only be considered over a five to ten year period, or longer.

  • Confirmed suicide rates generally follow the same pattern as suspected suicide rates.

For the purpose of visualisation, this page uses suicide rates computed from population counts that are taken, or estimated, from rows with similar attributes (the same sex and age group in other years). You should therefore be very careful when interpreting the graphs. The technical information page for the Suicide Data Web Tool gives the following cautionary note under ‘Interpreting Numerical Values and Rates’:

For groups where suicide numbers are very low, small changes in the numbers of suicide deaths across years can result in large changes in the corresponding rates. Rates that are based on such small numbers are not reliable and can show large changes over time that may not accurately represent underlying suicide trends. Because of issues with particularly small counts, rates in this web tool are not calculated for groups with fewer than six suicide deaths in a given year.
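
To make the caution above concrete, here is a minimal sketch of how the same absolute change in deaths moves a rate very differently in a small group and in a large one. The function name rate_per_100k and all counts and populations are hypothetical values chosen for illustration; they are not taken from the web tool.

```r
# Rate per 100,000 population, using the same formula as step ##13.## of this page
rate_per_100k <- function(deaths, population) {
  deaths / population * 100000
}

# Hypothetical small group: 3 deaths one year, 5 the next, population 50,000
small_rates <- rate_per_100k(c(3, 5), 50000)

# Hypothetical large group: 300 deaths one year, 302 the next, population 500,000
large_rates <- rate_per_100k(c(300, 302), 500000)

# Relative change in the rate from the first year to the second
small_change <- (small_rates[2] - small_rates[1]) / small_rates[1]
large_change <- (large_rates[2] - large_rates[1]) / large_rates[1]

print(small_rates)   # 6 and 10 per 100,000
print(large_rates)   # 60 and 60.4 per 100,000
print(round(c(small = small_change, large = large_change), 3))
```

In this sketch, two extra deaths raise the small group's rate by about two thirds, while the large group's rate changes by less than one percent.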

##1.## Load necessary libraries
library(ggplot2)  # For creating visualizations
library(dplyr)    # For data manipulation
library(readr)    # For reading CSV files
library(zoo)      # For handling missing values and time series data

##2.## Load the data from a CSV file
#suicide_trends <- read_csv("data/suicide-trends-by-ethnicity-by-calendar-year.csv")
suicide_trends <- read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv")

##3.## Filter and transform data for the line plot
suicide_trends_filtered_age <- suicide_trends %>%
  filter(
    data_status == "Suspected",                    # Select rows where data_status is "Suspected"
    sex %in% c("Male", "Female", "All sex"),        # Select specific sex categories
    ethnicity == "All ethnic groups",              # Select rows where ethnicity is "All ethnic groups"
    age_group != "All ages"                        # Exclude rows where age_group is "All ages"
  ) %>%
  mutate(number = as.numeric(number))              # Convert the 'number' column to numeric type

##4.## Group by data_status, year, sex, and age_group, then calculate the average popcount for each group
pop_means <- suicide_trends_filtered_age %>%
  mutate(
    popcount_num = as.numeric(popcount),                                  # Convert 'popcount' to numeric
    popcount_num = if_else(popcount == "S", NA_real_, popcount_num)       # Replace "S" with NA in 'popcount_num'
  ) %>%
  group_by(data_status, year, sex, age_group) %>%                        # Group by data_status, year, sex, and age_group
  summarise(
    pop_mean = mean(popcount_num, na.rm = TRUE),                         # Calculate the mean of popcount_num, ignoring NA values
    .groups = 'drop'                                                     # Drop the grouping after summarisation
  )

##5.## Arrange the pop_means data frame by sex and age_group for consistency
pop_means <- pop_means %>% arrange(sex, age_group)

##6.## Extract unique values for year, sex, and age_group to use in the filling function
year.name <- unique(pop_means$year)
sex.name <- unique(pop_means$sex)
age_group.name <- unique(pop_means$age_group)

##7.## Define a function to fill missing pop_mean values
fillpop <- function(k){
  # Retrieve the current year for the k-th row
  year.this <- year.name[which(pop_means$year[k] == year.name)]
  
  # Retrieve the previous year; if the current year is the first year, this is an empty vector,
  # and the lookup below is expected to fail and be handled by tryCatch
  year.last <- year.name[which(pop_means$year[k] == year.name) - 1]
  
  # Retrieve the current sex for the k-th row
  sex.this <- sex.name[which(pop_means$sex[k] == sex.name)]
  
  # Retrieve the current age_group for the k-th row
  age_group.this <- age_group.name[which(pop_means$age_group[k] == age_group.name)]
  
  # Attempt to retrieve the pop_mean from the previous year, same sex, and same age_group
  pop.tmp <- tryCatch({
    pop_means %>%
      filter(year == year.last, sex == sex.this, age_group == age_group.this) %>%  # Filter for previous year, current sex, and current age_group
      pull(pop_mean)                                                                # Extract the pop_mean value
  }, error = function(e) { 
    # If an error occurs (e.g., previous year does not exist), return the mean pop_mean across all years for the same sex and age_group
    # Ideally, you would look back 2 or 3 years, but for simplicity, use the average when previous data is unavailable
    pop_means %>%  
      filter(sex == sex.this, age_group == age_group.this) %>%                   # Filter for current sex and age_group across all years
      summarise(pop.tmp = mean(pop_mean, na.rm=TRUE)) %>%                        # Calculate the average pop_mean
      pull(pop.tmp)                                                                  # Extract the average pop_mean
  })
  
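  # If the previous-year lookup gave no single usable value (for example, that value is itself missing),
  # fall back to the average across all years for the same sex and age_group
  if (length(pop.tmp) != 1 || is.na(pop.tmp)) {
    pop.tmp <- mean(pop_means$pop_mean[pop_means$sex == sex.this & pop_means$age_group == age_group.this], na.rm = TRUE)
  }

  # Return the filled pop_mean value
  return(pop.tmp)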
}

##8.## Create a copy of pop_means to store the filled values
pop_means2 <- pop_means

##9.## Loop through each row of pop_means to fill in missing pop_mean values
for(k in seq_len(nrow(pop_means))){
  # Check if the pop_mean for the k-th row is NaN
  if(is.nan(pop_means[k,]$pop_mean)){
    # If NaN, replace it with the value returned by the fillpop function
    pop_means2[k,]$pop_mean <- fillpop(k)
  }
}

##10.## Sort suicide_trends_filtered_age by year, sex, and age_group to align with pop_means2
suicide_trends_filtered_age <- suicide_trends_filtered_age %>% arrange(year, sex, age_group)

##11.## Sort pop_means2 by year, sex, and age_group to ensure alignment
pop_means2 <- pop_means2 %>% arrange(year, sex, age_group)

##12.## Replace the 'popcount' column in suicide_trends_filtered_age with the filled 'pop_mean' values from pop_means2
suicide_trends_filtered_age$popcount <- pop_means2$pop_mean

##13.## Filter and transform data for the bar plot
suicide_trends_age <- suicide_trends_filtered_age %>%
  filter(
    data_status == "Suspected",                        # Select rows where data_status is "Suspected"
    sex %in% c("Male", "Female"),                      # Select only "Male" and "Female" sexes
    age_group != "All ages",                           # Exclude rows where age_group is "All ages"
    ethnicity == "All ethnic groups",                  # Select rows where ethnicity is "All ethnic groups"
    year == 2023                                       # Select data for the year 2023
  ) %>%
  mutate(number = as.numeric(number)) %>%               # Convert the 'number' column to numeric type
  mutate(rate = as.numeric(number)/as.numeric(popcount)*100000)  # Calculate the suicide rate per 100,000 population

##14.## Get unique age groups and set factor levels in order
age_levels <- unique(suicide_trends_filtered_age$age_group)  # Extract unique age groups from the data
suicide_trends_age$age_group <- factor(suicide_trends_age$age_group, levels = age_levels)  # Set the order of age_group factors

##15.## Define colors for the bar plot
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),  # Define color for "Female"
  "Male" = rgb(255/255, 102/255, 102/255)     # Define color for "Male"
)

##16.## Create the stacked bar plot for the number of suicide deaths
ggplot(suicide_trends_age, aes(x = age_group, y = number, fill = sex)) +
  geom_bar(stat = "identity") +  # Use identity statistic to represent actual values
  labs(
    title = "Number of Suicides by Age Group and Sex in Aotearoa New Zealand, 2023",  # Set plot title
    x = "Age Group",                                                                      # Set x-axis label
    y = "Number (Suspected)",                                                             # Set y-axis label
    fill = "Sex"                                                                           # Set legend title
  ) +
  scale_fill_manual(values = bar_colors) +  # Apply custom colors to the bars
  theme_minimal() +                        # Use a minimal theme for the plot
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)  # Adjust x-axis text for better readability
  )

##17.## Create the stacked bar plot for the suicide rate
ggplot(suicide_trends_age, aes(x = age_group, y = rate, fill = sex)) +
  geom_bar(stat = "identity") +  # Use identity statistic to represent actual values
  labs(
    title = "Suicide Rate by Age Group and Sex in Aotearoa New Zealand, 2023",  # Set plot title
    x = "Age Group",                                                           # Set x-axis label
    y = "Rate per 100,000 (Suspected)",                                        # Set y-axis label
    fill = "Sex"                                                                # Set legend title
  ) +
  scale_fill_manual(values = bar_colors) +  # Apply custom colors to the bars
  theme_minimal() +                        # Use a minimal theme for the plot
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)  # Adjust x-axis text for better readability
  )

##1.## Load necessary libraries

  • ggplot2: A package for creating statistical graphics based on the grammar of graphics.

  • dplyr: Provides a set of functions (verbs) for data manipulation, such as filtering, selecting, and summarizing data.

  • readr: Facilitates reading data into R, particularly CSV files, with functions that are faster and more user-friendly than base R functions.

  • zoo: Offers functions for handling missing values, time series data, and data imputation. It is loaded here, but the code on this page does not call any zoo function directly.

##2.## Load the data from a CSV file

  • read_csv: Reads the specified CSV file into a tibble (a modern version of a data frame).

##3.## Filter and transform data for the line plot

  • filter: Selects rows that meet the specified conditions:

    • data_status == "Suspected": Keeps rows where data_status is “Suspected”.

    • sex %in% c("Male", "Female", "All sex"): Keeps rows where sex is either “Male”, “Female”, or “All sex”.

    • ethnicity == "All ethnic groups": Keeps rows where ethnicity is “All ethnic groups”.

    • age_group != "All ages": Excludes rows where age_group is “All ages”.

  • mutate: Creates or transforms columns:

    • number = as.numeric(number): Converts the number column to numeric data type to ensure proper calculations.

##4.## Group by data_status, year, sex, and age_group, then calculate the average popcount for each group

  • mutate:

    • popcount_num = as.numeric(popcount): Converts the popcount column to numeric.

    • if_else(popcount == "S", NA_real_, popcount_num): Replaces any occurrence of “S” in popcount with NA (missing value).

  • group_by: Groups the data by data_status, year, sex, and age_group to prepare for summarization.

  • summarise:

    • pop_mean = mean(popcount_num, na.rm = TRUE): Calculates the average population count for each group, ignoring NA values.

    • .groups = 'drop': Removes the grouping after summarization to return an ungrouped data frame.
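
A small sketch of why step ##9.## later tests for NaN rather than NA: when every popcount in a group is “S”, mean(..., na.rm = TRUE) is taken over an empty vector and returns NaN. The toy tibble below is hypothetical, and suppressWarnings is added only to silence the coercion warning that as.numeric("S") produces.

```r
library(dplyr)

# Hypothetical toy rows; "S" stands for a suppressed population count
toy <- tibble(
  year      = c(2022, 2022, 2023),
  sex       = c("Female", "Male", "Male"),
  age_group = "15-19",
  popcount  = c("160000", "S", "170000")
)

toy_means <- toy %>%
  mutate(popcount_num = suppressWarnings(as.numeric(popcount))) %>%  # "S" becomes NA
  group_by(year, sex, age_group) %>%
  summarise(pop_mean = mean(popcount_num, na.rm = TRUE), .groups = "drop")

print(toy_means)

# The group whose only value was "S" has pop_mean NaN, not NA_real_
print(is.nan(toy_means$pop_mean))
```

Because is.na() is also TRUE for NaN, either test would find these rows; is.nan() is the narrower check.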

##5.## Arrange the pop_means data frame by sex and age_group for consistency

  • arrange: Sorts the pop_means data frame first by sex and then by age_group to ensure consistent ordering.

##6.## Extract unique values for year, sex, and age_group to use in the filling function

  • unique: Extracts unique values from the specified columns to create vectors (year.name, sex.name, age_group.name) that will be used in the fillpop function.

##7.## Define a function to fill missing pop_mean values

  • fillpop: A custom function designed to handle missing (NaN) values in the pop_mean column.

    • Parameters:

      • k: The row index in pop_means where the pop_mean is missing.
    • Process:

      1. Retrieve Current Values: Extracts the current year, sex, and age_group based on the row index k.

      2. Identify Previous Year: Determines the previous year (year.last) relative to the current year.

      3. Attempt to Retrieve Previous pop_mean:

        • Uses tryCatch to attempt to filter pop_means for the previous year, same sex, and same age_group to get the pop_mean.

        • If successful, pop.tmp will hold the pop_mean from the previous year.

      4. Handle Errors:

        • If an error occurs (e.g., previous year does not exist), the function calculates the average pop_mean across all available years for the same sex and age_group.
      5. Return Value: The function returns the filled pop_mean value (pop.tmp), either from the previous year or the average.
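
The same idea (previous year first, group average otherwise) can also be sketched without a loop, using dplyr's lag() within each sex and age group. This is an alternative under stated assumptions, not the method used on this page: the toy data are hypothetical, and unlike fillpop as written above, this version also falls back to the group average when the previous year's value is itself missing.

```r
library(dplyr)

# Hypothetical toy data with missing (NaN) population means
toy <- tibble(
  year      = c(2021, 2022, 2023, 2021, 2022, 2023),
  sex       = rep(c("Female", "Male"), each = 3),
  age_group = "15-19",
  pop_mean  = c(NaN, 100, NaN, 200, NaN, 220)
)

filled <- toy %>%
  group_by(sex, age_group) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    prev       = lag(pop_mean),                  # previous year's value within the group
    group_avg  = mean(pop_mean, na.rm = TRUE),   # average over the group's available years
    pop_filled = case_when(
      !is.na(pop_mean) ~ pop_mean,               # keep observed values
      !is.na(prev)     ~ prev,                   # otherwise use the previous year
      TRUE             ~ group_avg               # otherwise use the group average
    )
  ) %>%
  ungroup()

print(filled)
```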

##8.## Create a copy of pop_means to store the filled values

  • pop_means2: A duplicate of pop_means where the filled pop_mean values will be stored.

##9.## Loop through each row of pop_means to fill in missing pop_mean values

  • for Loop:

    • Iterates over each row of pop_means.

    • Condition: Checks if the pop_mean for the current row (k) is NaN using is.nan().

    • Action: If pop_mean is NaN, the function fillpop(k) is called to compute a replacement value, which is then assigned to pop_means2[k,]$pop_mean.

##11.## Sort pop_means2 by year, sex, and age_group to ensure alignment

  • arrange: Sorts pop_means2 by year, sex, and age_group to ensure it aligns correctly with suicide_trends_filtered_age.
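
Steps ##10.## to ##12.## rely on both data frames having the same number of rows in the same order. A key-based join is one way to avoid that dependence. The sketch below uses hypothetical toy tables (deaths, pops) and keeps the population in a separate pop_mean column instead of overwriting popcount.

```r
library(dplyr)

# Hypothetical toy tables, deliberately stored in different row orders
deaths <- tibble(
  year      = c(2023, 2023, 2022),
  sex       = c("Male", "Female", "Male"),
  age_group = "15-19",
  number    = c(12, 5, 10)
)

pops <- tibble(
  year      = c(2022, 2023, 2023),
  sex       = c("Male", "Female", "Male"),
  age_group = "15-19",
  pop_mean  = c(165000, 160000, 170000)
)

# Match rows on the keys, so row order no longer matters
joined <- deaths %>%
  left_join(pops, by = c("year", "sex", "age_group"))

print(joined)
```

With a join, a group that is missing from one table shows up as NA in pop_mean rather than silently shifting every later row.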

##13.## Filter and transform data for the bar plot

  • filter: Selects rows that meet specific conditions:

    • data_status == "Suspected": Keeps rows where data_status is “Suspected”.

    • sex %in% c("Male", "Female"): Only includes “Male” and “Female” categories, excluding “All sex”.

    • age_group != "All ages": Excludes rows where age_group is “All ages”.

    • ethnicity == "All ethnic groups": Keeps rows where ethnicity is “All ethnic groups”.

    • year == 2023: Focuses on data from the year 2023.

  • mutate:

    • number = as.numeric(number): Ensures the number column is numeric.

    • rate = as.numeric(number)/as.numeric(popcount)*100000: Calculates the suicide rate per 100,000 population.
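
Written as a formula, the rate computed in this step is the following. The worked numbers (12 deaths in a population of 160,000) are hypothetical and serve only to show the arithmetic.

```latex
\[
\text{rate} = \frac{\text{number}}{\text{popcount}} \times 100{,}000,
\qquad
\text{e.g.}\quad \frac{12}{160{,}000} \times 100{,}000 = 7.5 .
\]
```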

##14.## Get unique age groups and set factor levels in order

  • unique: Retrieves all unique age groups present in the data.

  • factor:

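    • Converts the age_group column to a factor with levels ordered according to age_levels, that is, in the order in which the age groups first appear in the sorted data. This keeps the age groups in the same order in both plots. Because age_group is sorted as text in step ##10.##, it is worth checking that this order matches the numeric order of the ages.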
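
As a hedged illustration of the ordering issue: text sorting places "10-14" before "5-9". The sketch below orders labels by their leading number instead. The labels are hypothetical, since the exact age_group strings in the data set are not shown on this page, and lower_bound is a helper introduced only for this example.

```r
# Hypothetical age-group labels; the real labels in the data may differ
age_labels <- c("5-9", "10-14", "15-19", "85+")

# Plain text sorting puts "10-14" first
print(sort(age_labels))

# Extract the leading number of each label (everything before the first non-digit)
lower_bound <- as.numeric(sub("[^0-9].*$", "", age_labels))

# Order the labels by that number and use the result as factor levels
age_levels_numeric <- age_labels[order(lower_bound)]
age_factor <- factor(c("85+", "5-9", "15-19", "10-14"), levels = age_levels_numeric)

print(levels(age_factor))
```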

##15.## Define colors for the bar plot

  • rgb: Specifies colors using red, green, and blue components, each ranging from 0 to 1.

    • "Female" is assigned a muted purple color.

    • "Male" is assigned a light red color.
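
For reference, rgb() returns hexadecimal colour strings, so the two colours can be inspected or written directly as hex codes. This short sketch only re-evaluates the definition from step ##15.##.

```r
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),
  "Male"   = rgb(255/255, 102/255, 102/255)
)

# Named character vector of hex codes: Female "#666699", Male "#FF6666"
print(bar_colors)

# The same colour specified on a 0-255 scale
female_255 <- rgb(102, 102, 153, maxColorValue = 255)
print(female_255)
```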

##16.## Create the stacked bar plot for the number of suicide deaths

  • ggplot: Initializes the plotting with suicide_trends_age as the data source.

    • aes: Maps aesthetics, setting age_group on the x-axis, number on the y-axis, and filling bars based on sex.
  • geom_bar(stat = "identity"): Creates bar plots where the heights of the bars represent actual data values.

  • labs: Adds labels to the plot, including title, x-axis, y-axis, and legend.

  • scale_fill_manual: Applies the custom colors defined in bar_colors to the bars based on sex.

  • theme_minimal: Applies a clean, minimalistic theme to the plot.

  • theme:

    • axis.text.x = element_text(angle = 0, hjust = 0.5): Keeps the x-axis labels horizontal and centers them.

##17.## Create the stacked bar plot for the suicide rate

  • ggplot: Initializes the plotting with suicide_trends_age as the data source.

    • aes: Maps aesthetics, setting age_group on the x-axis, rate on the y-axis, and filling bars based on sex.
  • geom_bar(stat = "identity"): Creates bar plots where the heights of the bars represent actual data values.

  • labs: Adds labels to the plot, including title, x-axis, y-axis, and legend.

  • scale_fill_manual: Applies the custom colors defined in bar_colors to the bars based on sex.

  • theme_minimal: Applies a clean, minimalistic theme to the plot.

  • theme:

    • axis.text.x = element_text(angle = 0, hjust = 0.5): Keeps the x-axis labels horizontal and centers them.
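
One optional variation, sketched under assumptions: geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and position = "dodge" places the bars for each sex side by side instead of stacking them. Side-by-side bars may be easier to read for rates, because the height of a stacked bar is the sum of two sex-specific rates, which is not the rate for both sexes combined. The toy data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical toy rates per 100,000
toy <- data.frame(
  age_group = factor(rep(c("15-19", "20-24"), each = 2)),
  sex       = rep(c("Female", "Male"), times = 2),
  rate      = c(5, 15, 6, 18)
)

p_dodge <- ggplot(toy, aes(x = age_group, y = rate, fill = sex)) +
  geom_col(position = "dodge") +   # same heights as geom_bar(stat = "identity"), bars side by side
  labs(x = "Age Group", y = "Rate per 100,000 (hypothetical)", fill = "Sex") +
  theme_minimal()

print(p_dodge)
```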

Summary of the Workflow

  1. Loading Libraries (##1.##):

    • Essential libraries for data manipulation, reading, handling missing values, and visualization are loaded.
  2. Data Loading (##2.##):

    • Suicide trend data is imported from a CSV file into R for analysis.
  3. Data Filtering and Transformation (##3.##):

    • The data is filtered to include only suspected cases, the sexes “Male”, “Female”, and “All sex”, all ethnic groups combined, and individual age groups (the “All ages” rows are excluded).

    • The number column is converted to a numeric type to facilitate accurate calculations.

  4. Calculating Average Population Counts (##4.##):

    • The population counts (popcount) are converted to numeric, with any “S” entries replaced by NA.

    • The data is grouped by relevant categories to compute the average population count (pop_mean) for each group.

  5. Data Arrangement (##5.##):

    • The pop_means data frame is sorted by sex and age_group to ensure consistency in subsequent operations.
  6. Extracting Unique Values (##6.##):

    • Unique values for year, sex, and age_group are extracted to facilitate the filling of missing population counts.
  7. Defining the Filling Function (##7.##):

    • A custom function fillpop is defined to handle missing (NaN) population counts by attempting to retrieve the value from the previous year or, if unavailable, using the average across available years for the same sex and age_group.
  8. Creating a Copy for Filled Values (##8.##):

    • A duplicate of pop_means (pop_means2) is created to store the filled pop_mean values.
  9. Filling Missing Values via Loop (##9.##):

    • A loop iterates through each row of pop_means, checking for NaN values in pop_mean and filling them using the fillpop function.
  10. Sorting for Alignment (##10.## & ##11.##):

    • Both suicide_trends_filtered_age and pop_means2 are sorted by year, sex, and age_group to ensure proper alignment for data replacement.
  11. Updating the Original Data Frame (##12.##):

    • The original suicide_trends_filtered_age data frame is updated with the filled pop_mean values from pop_means2.
  12. Preparing Data for Visualization (##13.##):

    • The data is further filtered to the year 2023 and to the “Male” and “Female” sexes, again excluding the “All ages” rows.

    • Suicide rates per 100,000 population are calculated to facilitate meaningful comparisons.

  13. Setting Factor Levels (##14.##):

    • Age groups are converted to a factor with fixed levels so that both visualizations display the age categories in the same order.
  14. Defining Plot Colors (##15.##):

    • Custom colors are defined for the “Male” and “Female” categories to enhance the visual appeal and clarity of the plots.
  15. Creating the Number of Suicide Deaths Bar Plot (##16.##):

    • A stacked bar plot is created to visualize the number of suicide deaths by age group and sex.

    • Custom labels, colors, and themes are applied for better readability and aesthetics.

  16. Creating the Suicide Rate Bar Plot (##17.##):

    • Another stacked bar plot is created to visualize the suicide rate per 100,000 population by age group and sex.

    • Similar customization as the previous plot ensures consistency and clarity.
