Visualization of suicide data in New Zealand (2)-2 - Age Group

Author

Takafumi Kubota

Published

November 4, 2024

Abstract

This report analyzes suicide trends in Aotearoa New Zealand for 2023, focusing on differences across age groups and sexes. Using data on “Suspected” cases for all ethnic groups combined, the study cleans the data and fills in missing population counts before calculating rates. With R and ggplot2, the report presents stacked bar charts showing both the number of suicide deaths and the rate per 100,000 population by age group. The findings can help public health officials and policymakers identify high-risk groups and develop targeted intervention strategies, contributing to efforts to reduce suicide and strengthen mental health support in New Zealand. (Update for missing data.)

Keywords

R language, Suicide, New Zealand, Bar Chart

Introduction

This page includes information about numbers and rates of suicide deaths in Aotearoa New Zealand. If at any point you feel worried about harming yourself while viewing the information on this page, or if you think someone else may be in danger, please stop reading and seek help.

Suicide remains a critical public health concern globally, with profound impacts on individuals, families, and communities. In Aotearoa New Zealand, understanding the patterns and demographic disparities in suicide is essential for developing effective prevention strategies. This report examines suicide in 2023 using data categorized as “Suspected” cases for all ethnic groups combined. Its primary objective is to show how the number and rate of suicides vary by age group and sex, and thereby to identify populations that may benefit from targeted interventions.

The data is processed and visualized in R. The number and population columns are first converted to numeric form, with non-numeric entries treated as missing. The mean population count (pop_mean) is then calculated for each combination of year, sex, and age group as the basis for rate calculations. Missing pop_mean values are filled with the previous year's value or, where that is also missing, with the average for the same sex and age group across all years. Finally, ggplot2 is used to draw stacked bar charts of the number of suicide deaths and the rate per 100,000 population by age group and sex.

By highlighting how suicide numbers and rates differ by age and sex, this report contributes to the ongoing discussion of mental health in New Zealand. The findings aim to support policymakers and healthcare providers in prioritizing resources and designing interventions for high-risk groups, with the ultimate goal of reducing suicide and promoting mental well-being across the nation.

The data on this page is sourced from the Suicide Data Web Tool provided by Health New Zealand, specifically from https://tewhatuora.shinyapps.io/suicide-web-tool/, and is licensed under a Creative Commons Attribution 4.0 International License.

These visualisations use calendar-year data and include suspected suicides only. The Suicide Data Web Tool site gives the following notes:

Short term year-on-year data are not an accurate indicator by which to measure trends. Trends can only be considered over a five to ten year period, or longer.
Confirmed suicide rates generally follow the same pattern as suspected suicide rates.

On the technical information page for the Suicide Data Web Tool, the following cautionary note appears under ‘Interpreting Numerical Values and Rates’:

For groups where suicide numbers are very low, small changes in the numbers of suicide deaths across years can result in large changes in the corresponding rates. Rates that are based on such small numbers are not reliable and can show large changes over time that may not accurately represent underlying suicide trends. Because of issues with particularly small counts, rates in this web tool are not calculated for groups with fewer than six suicide deaths in a given year.

For the purpose of visualisation, this page calculates its own rates. Where population counts are missing, they are estimated from records with similar attributes (the same sex and age group in other years). Please interpret the graphs with great care.

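In the rate chart, each bar stacks the male rate (per 100,000 males) and the female rate (per 100,000 females), so the total height of a bar is the sum of two rates, not the rate for all sexes. Dividing that height by 2 gives the simple average of the male and female rates. This matches the all-sex rate per 100,000 people shown in the Suicide Data Web Tool only when the male and female populations of an age group are equal; otherwise it is an approximation, because the true all-sex rate weights each sex by its population.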
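A short worked example with hypothetical numbers (not taken from the dataset) shows the difference between halving the stacked height and calculating the all-sex rate directly:

```r
# Hypothetical deaths and populations for one age group (not real data)
male_deaths   <- 30
male_pop      <- 100000
female_deaths <- 10
female_pop    <- 150000

# Sex-specific rates per 100,000
male_rate   <- male_deaths / male_pop * 100000        # 30
female_rate <- female_deaths / female_pop * 100000    # about 6.67

# Height of the stacked bar, and half of it (the unweighted mean of the two rates)
stacked_height <- male_rate + female_rate             # about 36.67
half_height    <- stacked_height / 2                  # about 18.33

# All-sex rate: total deaths over total population, i.e. population-weighted
all_sex_rate <- (male_deaths + female_deaths) / (male_pop + female_pop) * 100000  # 16

# With equal male and female populations, the two approaches agree
equal_pop_half <- (30 / 100000 * 100000 + 10 / 100000 * 100000) / 2   # 20
equal_pop_all  <- (30 + 10) / (100000 + 100000) * 100000               # 20
```

Here halving the stacked bar overstates the all-sex rate (about 18.3 versus 16) because the group with the higher rate, males, is also the smaller population.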

# 1. Load necessary libraries
library(ggplot2)  # For data visualization
library(dplyr)    # For data manipulation
library(readr)    # For reading CSV files
library(zoo)      # Time-series tools such as na.locf(); loaded but not called in the steps below

# 2. Load the data from a CSV file
# The following line is commented out and can be used to load data from a local directory
# suicide_trends <- read_csv("data/suicide-trends-by-ethnicity-by-calendar-year.csv")
# Load the dataset directly from an external URL
suicide_trends <- read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv")

# 3. Filter and transform the base data used by the later steps
suicide_trends_filtered_age <- suicide_trends %>%
  filter(
    data_status == "Suspected",                        # Include only suspected cases
    sex %in% c("Male", "Female", "All sex"),           # Include Male, Female, and All sex categories
    ethnicity == "All ethnic groups",                  # Include all ethnic groups
    age_group != "All ages"                             # Exclude aggregated age groups
  ) %>%
  mutate(number = as.numeric(number))                   # Convert the 'number' column to numeric type for accurate calculations

# 4. Group by data_status, year, sex, and age_group, then calculate the average popcount for each group
pop_means <- suicide_trends_filtered_age %>%
  mutate(
    popcount_num = as.numeric(popcount),                # Convert 'popcount' to numeric
    popcount_num = if_else(popcount == "S", NA_real_, popcount_num)  # Replace 'S' with NA for accurate calculations
  ) %>%
  group_by(data_status, year, sex, age_group) %>%       # Group data by status, year, sex, and age group
  summarise(
    pop_mean = mean(popcount_num, na.rm = TRUE),        # Calculate the mean population count, ignoring NA values
    .groups = 'drop'                                     # Ungroup after summarising
  )

# 5. Arrange the pop_means data frame by sex, age_group, and year for consistency
pop_means <- pop_means %>% arrange(sex, age_group, year)  # Sort the data for orderly processing

# 6. Fill missing pop_mean values using the previous year's value or group average
pop_means <- pop_means %>%
  group_by(data_status, sex, age_group) %>%              # Group by status, sex, and age group
  arrange(year) %>%                                      # Arrange data chronologically by year
  mutate(
    pop_mean = if_else(is.na(pop_mean), lag(pop_mean), pop_mean),  # Replace NA with the previous year's population mean
    pop_mean = if_else(is.na(pop_mean), mean(pop_mean, na.rm = TRUE), pop_mean)  # If still NA, replace with the group average
  ) %>%
  ungroup()                                              # Remove grouping

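# 7. Replace the 'popcount' column in suicide_trends_filtered_age with the filled 'pop_mean' values from pop_means
suicide_trends_filtered_age <- suicide_trends_filtered_age %>%
  arrange(year, sex, age_group) %>%                      # Sort rows; this order also sets the age-group order used in step 9
  select(-popcount) %>%                                  # Drop the original (partly non-numeric) population counts
  left_join(pop_means,                                   # Attach each row's filled population mean by matching keys,
            by = c("data_status", "year", "sex", "age_group")) %>%  # rather than relying on row position
  rename(popcount = pop_mean)                            # Use the filled means as the new 'popcount'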

# 8. Filter and transform data for the bar plot
suicide_trends_age <- suicide_trends_filtered_age %>%
  filter(
    data_status == "Suspected",                           # Include only suspected cases
    sex %in% c("Male", "Female"),                        # Include only Male and Female categories
    age_group != "All ages",                              # Exclude aggregated age groups
    ethnicity == "All ethnic groups",                     # Include all ethnic groups
    year == 2023                                          # Focus on the year 2023
  ) %>%
  mutate(
    number = as.numeric(number),                           # Ensure 'number' is numeric
    rate = number / popcount * 100000                      # Calculate the suicide rate per 100,000 population
  )

# 9. Get unique age groups and set factor levels in order
age_levels <- unique(suicide_trends_filtered_age$age_group)  # Extract unique age groups to maintain order in plots
suicide_trends_age$age_group <- factor(suicide_trends_age$age_group, levels = age_levels)  # Set factor levels for consistent plotting

# 10. Define colors for the bar plot
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),  # Purple tone for Female
  "Male" = rgb(255/255, 102/255, 102/255)     # Pink tone for Male
)

# 11. Create the stacked bar plot for the number of suicide deaths
ggplot(suicide_trends_age, aes(x = age_group, y = number, fill = sex)) +
  geom_bar(stat = "identity") +                          # Create bars with heights corresponding to 'number'
  labs(
    title = "Number of Suicide Deaths by Age Group and Sex in Aotearoa New Zealand, 2023",  # Set plot title
    x = "Age Group",                                     # Label for x-axis
    y = "Number (Suspected)",                            # Label for y-axis
    fill = "Sex"                                         # Label for the fill legend
  ) +
  scale_fill_manual(values = bar_colors) +               # Apply custom colors to the bars based on sex
  theme_minimal() +                                      # Use a minimal theme for a clean look
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)   # Adjust x-axis text for readability
  )

# 12. Create the stacked bar plot for the suicide rate
ggplot(suicide_trends_age, aes(x = age_group, y = rate, fill = sex)) +
  geom_bar(stat = "identity") +                          # Create bars with heights corresponding to 'rate'
  labs(
    title = "Suicide Rate by Age Group and Sex in Aotearoa New Zealand, 2023",  # Set plot title
    x = "Age Group",                                     # Label for x-axis
    y = "Rate (Suspected)",                              # Label for y-axis
    fill = "Sex"                                         # Label for the fill legend
  ) +
  scale_fill_manual(values = bar_colors) +               # Apply custom colors to the bars based on sex
  theme_minimal() +                                      # Use a minimal theme for a clean look
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)   # Adjust x-axis text for readability
  )

1. Loading Necessary Libraries

The script begins by loading essential R libraries:

ggplot2: Facilitates data visualization through advanced plotting capabilities.
dplyr: Provides a suite of functions for efficient data manipulation and transformation.
readr: Enables fast and friendly reading of rectangular data, such as CSV files.
zoo: Offers tools for working with ordered observations, such as filling in missing values in time series. It is loaded, but the steps below fill gaps with dplyr functions instead.

2. Loading the Data

The dataset containing suicide trends in Aotearoa New Zealand is loaded with the read_csv() function from the readr package. The data is read directly from an external URL, so no local copy is needed; an alternative, commented-out line loads the file from a local directory instead.
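The numeric conversions in steps 3 and 4 are needed because a non-numeric entry such as "S" makes readr read the whole column as text. A minimal sketch with a made-up two-row CSV (the values are hypothetical; only the column names come from the script) shows this. It assumes readr 2.0 or later, where I() marks a string as literal data:

```r
library(readr)

# A tiny, made-up CSV using column names that appear in the script
csv_text <- "data_status,year,sex,ethnicity,age_group,number,popcount
Suspected,2023,Male,All ethnic groups,15-19,12,170000
Suspected,2023,Female,All ethnic groups,15-19,S,S
"

# I() tells read_csv() to treat the string as the data itself, not a file path
toy <- read_csv(I(csv_text), show_col_types = FALSE)

# Because of the "S" entries, both columns are read as character
class(toy$number)    # "character"
class(toy$popcount)  # "character"

# as.numeric() turns "S" into NA (suppressWarnings() hides the coercion warning)
suppressWarnings(as.numeric(toy$popcount))  # 170000 NA
```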

3. Filtering and Transforming the Base Data

The dataset is filtered to include only rows where:

data_status is "Suspected", focusing on suspected suicide cases.
sex is either "Male", "Female", or "All sex", ensuring the inclusion of all relevant sex categories.
ethnicity is "All ethnic groups", aggregating data across all ethnicities.
age_group is not "All ages", allowing for analysis across specific age brackets.

After filtering, the number column, representing the count of suicide cases, is converted to a numeric type to facilitate accurate calculations and visualizations.
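The filter can be checked on a few hypothetical rows (only the column names and the filter values come from the script; the ethnicity label in the last row is made up). The final line is a quick way to confirm that each data_status/year/sex/age_group combination appears only once, which the per-group population steps rely on:

```r
library(dplyr)

# Hypothetical rows mimicking the columns used in step 3
toy <- tibble(
  data_status = c("Suspected", "Suspected", "Confirmed", "Suspected"),
  sex         = c("Male", "All sex", "Male", "Male"),
  ethnicity   = c("All ethnic groups", "All ethnic groups", "All ethnic groups", "Māori"),
  age_group   = c("15-19", "All ages", "15-19", "15-19"),
  year        = 2023,
  number      = c("12", "40", "11", "5")
)

kept <- toy %>%
  filter(
    data_status == "Suspected",
    sex %in% c("Male", "Female", "All sex"),
    ethnicity == "All ethnic groups",
    age_group != "All ages"
  ) %>%
  mutate(number = as.numeric(number))

# Only the first row survives: the others are aggregated ("All ages"),
# confirmed rather than suspected, or ethnicity-specific
nrow(kept)   # 1

# Combinations that occur more than once (none here)
dupes <- kept %>% count(data_status, year, sex, age_group) %>% filter(n > 1)
```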

4. Calculating Average Population Count

To prepare for rate calculations, the script:

Converts the popcount column to numeric, handling any non-numeric entries by replacing "S" with NA to signify missing values.
Groups the data by data_status, year, sex, and age_group to calculate the mean population (pop_mean) for each group, ignoring NA values. This aggregation is crucial for subsequent rate calculations.
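A small sketch with hypothetical values illustrates two details of this step: as.numeric() already turns "S" into NA, and a group whose values are all missing gets NaN rather than NA from mean(..., na.rm = TRUE). Because is.na() is also TRUE for NaN, the filling step that follows still treats such groups as missing.

```r
library(dplyr)

# Hypothetical population counts, with "S" marking unavailable values
toy <- tibble(
  year     = c(2022, 2022, 2023),
  popcount = c("170000", "S", "S")
)

pop_means <- toy %>%
  mutate(
    # as.numeric("S") is NA; suppressWarnings() silences the coercion warning
    popcount_num = suppressWarnings(as.numeric(popcount))
  ) %>%
  group_by(year) %>%
  summarise(pop_mean = mean(popcount_num, na.rm = TRUE), .groups = "drop")

# 2022 averages the one available value; 2023 has none, so the mean is NaN
pop_means$pop_mean          # 170000 NaN
is.na(pop_means$pop_mean)   # FALSE TRUE
```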

5. Arranging the Population Means Data Frame

The resulting pop_means data frame is sorted by sex, age_group, and year. This arrangement ensures that the data is orderly and facilitates the filling of missing values in a logical sequence.

6. Filling Missing Population Mean Values

To address any missing pop_mean values:

The data is grouped by data_status, sex, and age_group.
Within each group, the data is ordered by year.
Missing pop_mean values are first filled using the previous year’s value (lag(pop_mean)). If a previous value is unavailable, the group’s average population count is used.
This step ensures a complete dataset without gaps in population counts, which is essential for accurate rate calculations.
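Because lag() looks back exactly one year, two consecutive missing years are treated differently: the first receives the previous year's value, while the second falls through to the group average. The hypothetical sketch below compares the script's rule with zoo::na.locf(), which carries the last observed value forward across any number of gaps. na.locf() is an alternative suggested here, not what the script does, although the script does load zoo.

```r
library(dplyr)
library(zoo)

# Hypothetical population means for one sex/age group, with two consecutive gaps
toy <- tibble(
  year     = 2019:2023,
  pop_mean = c(100, 110, NA, NA, 130)
)

filled <- toy %>%
  arrange(year) %>%
  mutate(
    # The script's rule: previous year's value, then the group mean
    by_lag  = if_else(is.na(pop_mean), dplyr::lag(pop_mean), pop_mean),
    by_lag  = if_else(is.na(by_lag), mean(by_lag, na.rm = TRUE), by_lag),
    # Alternative: last observation carried forward across any number of gaps
    by_locf = zoo::na.locf(pop_mean, na.rm = FALSE)
  )

filled
```

With these values, by_lag becomes 100, 110, 110, 112.5, 130 (the 2022 gap takes the mean of the partly filled series), while by_locf becomes 100, 110, 110, 110, 130.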

7. Updating the ‘popcount’ Column

The original popcount column in the suicide_trends_filtered_age data frame is replaced with the filled pop_mean values from the pop_means data frame, so that every row has a population count, either observed or imputed, for the rate calculation.
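Matching rows by position, as in mutate(popcount = pop_means$pop_mean), works only if both data frames contain exactly one row per group and are sorted in the same order. Joining on the grouping keys removes that requirement. The hypothetical sketch below deliberately lists pop_means in a different order to show that dplyr::left_join() still attaches each population to the right group:

```r
library(dplyr)

# Hypothetical filtered data with one unavailable population count
filtered <- tibble(
  data_status = "Suspected",
  year        = c(2023, 2023),
  sex         = c("Male", "Female"),
  age_group   = c("15-19", "15-19"),
  popcount    = c("170000", "S")
)

# Hypothetical filled means, deliberately in a different row order
pop_means <- tibble(
  data_status = "Suspected",
  year        = c(2023, 2023),
  sex         = c("Female", "Male"),
  age_group   = c("15-19", "15-19"),
  pop_mean    = c(160000, 170000)
)

# Match rows on their keys; left_join() keeps the row order of `filtered`
filtered <- filtered %>%
  select(-popcount) %>%
  left_join(pop_means, by = c("data_status", "year", "sex", "age_group")) %>%
  rename(popcount = pop_mean)

filtered$popcount   # 170000 160000
```

Assigning pop_means$pop_mean by position here would have given the male row the female population and vice versa.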

8. Preparing Data for the Bar Plot

For the bar plot visualization:

The data is further filtered to include only suspected cases (data_status == "Suspected"), specific sexes ("Male" and "Female"), non-aggregated age groups, all ethnic groups, and the year 2023.
The number column is ensured to be numeric.
A new rate column is calculated, representing the suicide rate per 100,000 population (number / popcount * 100000). This rate provides a standardized measure for comparing suicide mortality across age groups and sexes.
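The rate formula can be sketched on hypothetical numbers. The sketch also shows one optional extension, which is an assumption rather than part of the script: following the web tool's rule quoted above, rates based on fewer than six deaths are left as NA. The column name rate_reliable is hypothetical.

```r
library(dplyr)

# Hypothetical counts and populations for three groups (not real data)
example <- tibble(
  age_group = c("15-19", "20-24", "85+"),
  number    = c(40, 55, 4),
  popcount  = c(330000, 350000, 95000)
)

# Rate per 100,000, left as NA where fewer than six deaths were recorded
# (an optional rule mirroring the web tool, not applied by the script)
example <- example %>%
  mutate(
    rate_reliable = number >= 6,
    rate = if_else(rate_reliable, number / popcount * 100000, NA_real_)
  )

example
```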

9. Setting Factor Levels for Age Groups

To maintain consistent and meaningful ordering in the plots:

Unique age groups are extracted to define the order of factors.
The age_group column in the suicide_trends_age data frame is converted to a factor whose levels follow that order. Because suicide_trends_filtered_age was sorted by year, sex, and age_group in step 7, the levels generally follow the text (alphabetical) sort order of the labels, so the x-axis shows the intended order only if the labels sort correctly as text.
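Because the levels come from unique(), they follow the order in which labels appear in the data; if that order is alphabetical, a label such as "5-9" lands after "20-24". The sketch below uses hypothetical labels (the real file's labels may differ) and orders them by the leading number in each label, which is one possible way to guarantee a numeric order:

```r
# Hypothetical age-group labels; the real file's labels may differ
age_groups <- c("10-14", "15-19", "20-24", "5-9", "65+", "85+")

# Sorted as text, "5-9" comes after "20-24" because "5" > "2"
sort(age_groups)

# Extract the leading number of each label and order by it
lower_bound <- as.numeric(sub("^([0-9]+).*$", "\\1", age_groups))
age_levels  <- age_groups[order(lower_bound)]
age_levels   # "5-9" "10-14" "15-19" "20-24" "65+" "85+"

# Factor levels now follow numeric order regardless of row order
age_factor <- factor(c("65+", "5-9", "15-19"), levels = age_levels)
levels(age_factor)
```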

10. Defining Colors for the Bar Plot

A custom color palette is defined using the rgb() function to assign specific colors to each sex category:

Female: Assigned a purple tone (rgb(102/255, 102/255, 153/255)).
Male: Assigned a pink tone (rgb(255/255, 102/255, 102/255)).

This color differentiation enhances the visual distinction between the sexes in the bar plots.
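For reference, rgb() with values between 0 and 1 returns hexadecimal color strings, so the same palette can also be written with hex codes directly. A quick check:

```r
# The palette from step 10
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),
  "Male"   = rgb(255/255, 102/255, 102/255)
)

# The same palette as hex codes: 102 = 0x66, 153 = 0x99, 255 = 0xFF
bar_colors_hex <- c("Female" = "#666699", "Male" = "#FF6666")

identical(bar_colors, bar_colors_hex)   # TRUE
```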

11. Creating the Stacked Bar Plot for Suicide Deaths

A stacked bar plot is generated to display the number of suicide deaths by age group and sex for the year 2023:

Axes: age_group on the x-axis and number of deaths on the y-axis.
Fill: Bars are filled based on the sex category, using the predefined bar_colors.
Geometries: geom_bar(stat = "identity") creates bars with heights corresponding to the actual number of deaths.
Labels: The plot includes a title, axis labels, and a legend title for clarity.
Theme: theme_minimal() is applied for a clean and uncluttered appearance, and x-axis text is adjusted for readability.
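geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"). The sketch below, with hypothetical counts, builds both versions and compares the computed layer data from layer_data(); because the bars are stacked, the top of each bar is the age-group total.

```r
library(ggplot2)

# Hypothetical counts for two age groups (not real data)
toy <- data.frame(
  age_group = c("15-19", "15-19", "20-24", "20-24"),
  sex       = c("Female", "Male", "Female", "Male"),
  number    = c(5, 12, 8, 20)
)

p_bar <- ggplot(toy, aes(x = age_group, y = number, fill = sex)) +
  geom_bar(stat = "identity")
p_col <- ggplot(toy, aes(x = age_group, y = number, fill = sex)) +
  geom_col()

# layer_data() returns the computed data for the first layer,
# including the stacked extents of each segment (ymin, ymax)
ld_bar <- layer_data(p_bar)
ld_col <- layer_data(p_col)

# The highest ymax at each x position is the stacked total for that age group
totals <- tapply(ld_bar$ymax, as.numeric(ld_bar$x), max)
totals   # 17 for 15-19, 28 for 20-24
```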

12. Creating the Stacked Bar Plot for Suicide Rate

A similar stacked bar plot is created to visualize the suicide rate per 100,000 population:

Axes: age_group on the x-axis and rate on the y-axis.
Fill: Bars are filled based on the sex category, maintaining consistency with the previous plot.
Geometries: geom_bar(stat = "identity") draws one segment per sex whose height is that sex's rate, so the total height of each bar is the sum of the male and female rates.
Labels: The plot includes a descriptive title and appropriate axis labels.
Theme: The same minimalistic theme is applied, and x-axis text is formatted for clarity.
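Because stacking adds the male and female rates together, a side-by-side (dodged) layout is one way to show each rate at its own height. This is a suggested alternative, not the author's chart, and the rates below are hypothetical:

```r
library(ggplot2)

# Hypothetical rates per 100,000 for two age groups (not real data)
toy_rates <- data.frame(
  age_group = c("15-19", "15-19", "20-24", "20-24"),
  sex       = c("Female", "Male", "Female", "Male"),
  rate      = c(6.5, 18.0, 7.2, 25.4)
)

# position_dodge() places the bars side by side instead of stacking them,
# so each bar's height is a single rate rather than a sum of rates
p_dodge <- ggplot(toy_rates, aes(x = age_group, y = rate, fill = sex)) +
  geom_col(position = position_dodge()) +
  labs(x = "Age Group", y = "Rate per 100,000 (Suspected)", fill = "Sex") +
  theme_minimal()

ld <- layer_data(p_dodge)
max(ld$ymax)   # 25.4, the largest single rate
```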

Summary

This R script meticulously processes and visualizes suicide trends in Aotearoa New Zealand for the year 2023. By loading and cleaning the data, calculating meaningful statistics such as suicide rates, and employing clear and informative visualizations, the analysis provides valuable insights into the demographic patterns of suicide deaths. The use of custom colors and organized plotting techniques enhances the interpretability of the data, making it a useful tool for public health officials, researchers, and policymakers aiming to understand and address suicide trends effectively.

# 1. Load the required libraries
library(ggplot2)  # Package for data visualization
library(dplyr)    # Package for easy data manipulation
library(readr)    # Package for reading CSV files
library(zoo)      # Package for time-series manipulation and missing-value handling (not called below)

# 2. Read the data from a CSV file
# To read the data from a local directory, uncomment the following line
# suicide_trends <- read_csv("data/suicide-trends-by-ethnicity-by-calendar-year.csv")
# Read the data directly from an external URL
suicide_trends <- read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv")

# 3. Filter and transform the base data
suicide_trends_filtered_age <- suicide_trends %>%
  filter(
    data_status == "Suspected",                        # Keep only rows whose data status is "Suspected"
    sex %in% c("Male", "Female", "All sex"),           # Keep only rows whose sex is "Male", "Female", or "All sex"
    ethnicity == "All ethnic groups",                  # Keep only rows whose ethnic group is "All ethnic groups"
    age_group != "All ages"                            # Keep only rows whose age group is not "All ages"
  ) %>%
  mutate(number = as.numeric(number))                  # Convert the "number" column to numeric

# 4. Group by data_status, year, sex, and age_group, and calculate the mean population count of each group
pop_means <- suicide_trends_filtered_age %>%
  mutate(
    popcount_num = as.numeric(popcount),                # Convert the "popcount" column to numeric
    popcount_num = if_else(popcount == "S", NA_real_, popcount_num)  # Replace with NA where "popcount" is "S"
  ) %>%
  group_by(data_status, year, sex, age_group) %>%       # Group by "data_status", "year", "sex", and "age_group"
  summarise(
    pop_mean = mean(popcount_num, na.rm = TRUE),        # Store each group's mean of "popcount_num" as "pop_mean" (NA excluded)
    .groups = 'drop'                                    # Remove the grouping
  )

# 5. Sort the pop_means data frame by sex, age group, and year for consistency
pop_means <- pop_means %>% arrange(sex, age_group, year)  # Sort the data by "sex", "age_group", and "year"

# 6. Fill missing pop_mean values with the previous year's value or the group mean
pop_means <- pop_means %>%
  group_by(data_status, sex, age_group) %>%              # Group by "data_status", "sex", and "age_group"
  arrange(year) %>%                                      # Sort the data chronologically by "year"
  mutate(
    pop_mean = if_else(is.na(pop_mean), lag(pop_mean), pop_mean),  # If pop_mean is NA, replace it with the previous year's value
    pop_mean = if_else(is.na(pop_mean), mean(pop_mean, na.rm = TRUE), pop_mean)  # If still NA, replace it with the group mean
  ) %>%
  ungroup()                                              # Remove the grouping

# 7. Replace the "popcount" column in suicide_trends_filtered_age with the filled "pop_mean" values from pop_means
suicide_trends_filtered_age <- suicide_trends_filtered_age %>%
  arrange(year, sex, age_group) %>%                      # Sort the data by "year", "sex", and "age_group"
  select(-popcount) %>%                                  # Drop the original "popcount" column
  left_join(pop_means,                                   # Attach the filled "pop_mean" by matching keys,
            by = c("data_status", "year", "sex", "age_group")) %>%  # not by row position
  rename(popcount = pop_mean)                            # Use "pop_mean" as the new "popcount" column

# 8. Filter and transform the data for the bar plot
suicide_trends_age <- suicide_trends_filtered_age %>%
  filter(
    data_status == "Suspected",                          # Keep only rows whose data status is "Suspected"
    sex %in% c("Male", "Female"),                        # Keep only rows whose sex is "Male" or "Female"
    age_group != "All ages",                             # Keep only rows whose age group is not "All ages"
    ethnicity == "All ethnic groups",                    # Keep only rows whose ethnic group is "All ethnic groups"
    year == 2023                                         # Keep only rows for the year 2023
  ) %>%
  mutate(
    number = as.numeric(number),                         # Convert the "number" column to numeric
    rate = number / popcount * 100000                    # Calculate the suicide rate per 100,000 and store it in "rate"
  )

# 9. Get the unique age groups and set the factor levels in order
age_levels <- unique(suicide_trends_filtered_age$age_group)  # Get the unique age groups
suicide_trends_age$age_group <- factor(suicide_trends_age$age_group, levels = age_levels)  # Set the order of the age groups

# 10. Define the colors for the bar plot
bar_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),  # Purple tone for females
  "Male" = rgb(255/255, 102/255, 102/255)     # Pink tone for males
)

# 11. Create a stacked bar plot of the number of suicide deaths
ggplot(suicide_trends_age, aes(x = age_group, y = number, fill = sex)) +
  geom_bar(stat = "identity") +                          # Draw bars based on "number"
  labs(
    title = "Number of Suicide Deaths by Age Group and Sex in Aotearoa New Zealand, 2023",  # Plot title
    x = "Age Group",                                     # x-axis label
    y = "Number (Suspected)",                            # y-axis label
    fill = "Sex"                                         # Legend label
  ) +
  scale_fill_manual(values = bar_colors) +               # Set the bar colors manually
  theme_minimal() +                                      # Apply a simple theme
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)   # Keep the x-axis text horizontal
  )

# 12. Create a stacked bar plot of the suicide rate
ggplot(suicide_trends_age, aes(x = age_group, y = rate, fill = sex)) +
  geom_bar(stat = "identity") +                          # Draw bars based on "rate"
  labs(
    title = "Suicide Rate by Age Group and Sex in Aotearoa New Zealand, 2023",  # Plot title
    x = "Age Group",                                     # x-axis label
    y = "Rate (Suspected)",                              # y-axis label
    fill = "Sex"                                         # Legend label
  ) +
  scale_fill_manual(values = bar_colors) +               # Set the bar colors manually
  theme_minimal() +                                      # Apply a simple theme
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5)   # Keep the x-axis text horizontal
  )

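Explanation (Japanese Edition)

1. Load the Required Libraries

The script begins by loading the following R packages, which handle data visualization, manipulation, reading, and time-series processing.

ggplot2: Provides advanced plotting functions that make data visualization easy.
dplyr: Provides a set of functions for manipulating and transforming data efficiently.
readr: Makes it possible to read data files such as CSV files quickly and easily.
zoo: Supports time-series processing and the imputation of missing values. It is loaded, but the steps below do not call it.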

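2. Read the Data from a CSV File

The dataset, which shows suicide trends in New Zealand, is read directly from an external URL with the read_csv() function. If the data is stored locally, it can be read from a local path by uncommenting the commented-out line.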

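3. Filter and Transform the Base Data

The following filters are applied to the loaded data frame suicide_trends:

Select only rows where data_status is "Suspected".
Select only rows where sex is "Male", "Female", or "All sex".
Select only rows where ethnicity is "All ethnic groups".
Select only rows where age_group is not "All ages".

The number column is then converted to numeric so that later analyses and visualizations can perform accurate calculations.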

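4. Group by data_status, year, sex, and age_group, and Calculate the Mean Population of Each Group

The following operations are applied to the filtered data:

Convert the popcount column to numeric, replacing values of "S" with NA.
Group the data by data_status, year, sex, and age_group.
Calculate the mean of popcount_num for each group and store it as pop_mean.
Remove the grouping to simplify subsequent processing.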

5. Sort the pop_means Data Frame by Sex, Age Group, and Year for Consistency

The calculated pop_means data frame is sorted by sex, age_group, and year to keep the data consistent and ordered.

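6. Fill Missing pop_mean Values with the Previous Year's Value or the Group Mean

Missing pop_mean values are imputed as follows:

Within the same data_status, sex, and age_group, fill a missing value with the previous year's pop_mean.
If no previous-year value exists, fill the missing value with the group mean.

This keeps missing values in the dataset to a minimum so that a rate can be calculated for every group.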

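7. Replace the "popcount" Column in suicide_trends_filtered_age with the Filled "pop_mean" Values from pop_means

The completed pop_means data is used to update the popcount column of the original suicide_trends_filtered_age data frame, so that every row has a population count, either observed or imputed.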

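8. Filter and Transform the Data for the Bar Plot

To visualize the number and rate of suicides, the following filters are applied:

Select only rows where data_status is "Suspected".
Select only rows where sex is "Male" or "Female".
Select only rows where age_group is not "All ages".
Select only rows where ethnicity is "All ethnic groups".
Select only rows where year is 2023.

The number column is then converted to numeric, and the suicide rate is calculated and stored in the rate column (per 100,000 population).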

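9. Get the Unique Age Groups and Set the Factor Levels in Order

So that the age groups appear in the intended order in the plots, the unique age groups are extracted and the age_group column is converted to a factor with those levels. The levels follow the order in which the age groups appear in the sorted data, which is generally their text (alphabetical) order.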

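10. Define the Colors for the Bar Plot

To assign a different color to each sex category, custom colors are defined as follows:

Female: a purple tone (rgb(102/255, 102/255, 153/255))
Male: a pink tone (rgb(255/255, 102/255, 102/255))

This color scheme makes the difference between the sexes visually clear in the plots.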

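11. Create a Stacked Bar Plot of the Number of Suicide Deaths

ggplot2 is used to create a stacked bar plot of the number of suicide deaths by age group and sex in 2023.

x-axis: age_group
y-axis: number (number of suicide deaths)
Fill: sex
geom_bar(stat = "identity") draws bars based on the actual values.
The plot title, axis labels, and legend label are kept in English.
The custom colors are applied with scale_fill_manual().
theme_minimal() gives a simple, clean design.
theme() keeps the x-axis text horizontal for readability.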

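12. Create a Stacked Bar Plot of the Suicide Rate

Similarly, ggplot2 is used to create a stacked bar plot of the suicide rate by age group and sex in 2023.

x-axis: age_group
y-axis: rate (suicide rate)
Fill: sex
geom_bar(stat = "identity") draws one segment per sex with that sex's rate as its height, so each full bar is the sum of the male and female rates.
The plot title, axis labels, and legend label are kept in English.
The custom colors are applied with scale_fill_manual().
theme_minimal() gives a simple, clean design.
theme() keeps the x-axis text horizontal for readability.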

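Summary

This R script carries out a series of steps to analyze and visualize data on suicide trends in New Zealand for 2023. Each step is clearly defined, from loading and filtering the data through imputing missing values and converting types to the final visualization. In particular, showing the number and rate of suicides by sex and age group as stacked bars makes it possible to grasp intuitively how patterns differ across demographic segments. Custom colors and a simple theme make the plots easier to read and more informative. This analysis offers useful insights for public health professionals, researchers, and policymakers, supporting effective interventions and further research.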