Visualization of suicide data in New Zealand (1) - Time series

Author

Takafumi Kubota

Published

November 3, 2024

Abstract

This study analyzes suspected suicide trends in New Zealand from 2009 to 2023. Using R for data cleaning and visualization, it highlights demographic patterns and annual shifts, offering insights for policymakers and researchers.

Keywords

R language, Suicide, New Zealand, Line Chart

Introduction

This page includes information about numbers and rates of suicide deaths in Aotearoa New Zealand. If at any point you feel worried about harming yourself while viewing the information on this page, or if you think someone else may be in danger, please stop reading and seek help.

This page analyzes trends in suspected suicide deaths in Aotearoa New Zealand from 2009 to 2023, using publicly available data. The analysis uses R for data manipulation and visualization to show demographic patterns and annual changes. The code provided reads, cleans, and filters a dataset containing suicide statistics by ethnicity, sex, and age group. After preparing the data, it plots time series graphs of the number and rate of suspected suicides by sex. The resulting plots are intended to inform public health officials, researchers, and policymakers about how suspected suicide numbers and rates have changed, and to help identify areas for targeted intervention or further study. The page also emphasizes transparency and reproducibility by showing readers how to access, clean, and visualize the same data.

The data on this page is sourced from the Suicide Data Web Tool provided by Health New Zealand, specifically from https://tewhatuora.shinyapps.io/suicide-web-tool/, and is licensed under a Creative Commons Attribution 4.0 International License.

This visualisation uses calendar-year data only and covers only suspected suicides. The following notes are given on the site of the Suicide Data Web Tool:

  • Short term year-on-year data are not an accurate indicator by which to measure trends. Trends can only be considered over a five to ten year period, or longer.

  • Confirmed suicide rates generally follow the same pattern as suspected suicide rates.

library(ggplot2)
library(dplyr)
library(readr)

# Load the data
#suicide_trends <- read_csv("data/suicide-trends-by-ethnicity-by-calendar-year.csv")
suicide_trends <- read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv")

# Filter and transform data for the line plot
suicide_trends_filtered <- suicide_trends %>%
  filter(
    data_status == "Suspected",
    sex %in% c("Male", "Female", "All sex"),
    ethnicity == "All ethnic groups",
    age_group == "All ages"
  ) %>%
  mutate(number = as.numeric(number), rate = as.numeric(rate))  # Convert 'number' and 'rate' to numeric


# Define colors for the line plot
line_colors <- c(
  "Female" = rgb(102/255, 102/255, 153/255),
  "Male" = rgb(255/255, 102/255, 102/255),
  "All sex" = rgb(144/255, 238/255, 144/255)
)

# Create the line chart for the number of suspected suicides
ggplot(suicide_trends_filtered, aes(x = year, y = number, color = sex, group = sex)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  labs(
    title = "Number of suicide deaths in Aotearoa New Zealand, 2009–2023",
    x = "Year",
    y = "Number (Suspected)",
    color = "Sex"
  ) +
  scale_color_manual(values = line_colors) +
  theme_minimal()

# Create the line chart for the suicide rate
ggplot(suicide_trends_filtered, aes(x = year, y = rate, color = sex, group = sex)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  labs(
    title = "Suicide rate in Aotearoa New Zealand, 2009–2023",
    x = "Year",
    y = "Rate (Suspected)",
    color = "Sex"
  ) +
  scale_color_manual(values = line_colors) +
  theme_minimal()

Loading Required Packages

  • readr: Used for reading CSV files into a data frame.

  • ggplot2: Used for data visualization and creating line plots.

  • dplyr: Used for data manipulation and filtering.
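Before running the code, it can help to confirm that all three packages are installed. The following is a minimal sketch using base R only; the variable names required_pkgs, is_installed, and missing_pkgs are chosen here for illustration and are not part of the original code.

```r
# Packages used on this page
required_pkgs <- c("readr", "ggplot2", "dplyr")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Report what to install rather than installing automatically
  message(
    "Install first: install.packages(c(",
    paste0('"', missing_pkgs, '"', collapse = ", "),
    "))"
  )
} else {
  message("All required packages are installed.")
}
```

The sketch only reports missing packages; whether to install them automatically is left to the reader.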

Detailed Explanation of the Code

Loading the Data:

The read_csv() function reads the CSV file into a data frame named suicide_trends. As written, the code reads a copy of the file hosted on an external site (https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv); the line that reads the local file "data/suicide-trends-by-ethnicity-by-calendar-year.csv" is commented out. You can download the same data yourself by navigating to the Suicide Data Web Tool, selecting “Overview” from “View Data,” choosing “calendar year” under “Select financial or calendar year,” and then downloading the selected dataset. Place the file in a data directory within your current working directory, then uncomment the first read_csv() line to read the local copy instead.
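One way to support both sources without editing comments is to prefer the local copy when it exists and fall back to the hosted copy otherwise. This is a minimal sketch, not part of the original code; the helper names pick_source() and load_suicide_trends() are hypothetical.

```r
# Both locations are taken from the code on this page
local_path <- "data/suicide-trends-by-ethnicity-by-calendar-year.csv"
remote_url <- "https://takafumikubota.jp/suicide/nz/suicide-trends-by-ethnicity-by-calendar-year.csv"

# Hypothetical helper: return the local path if the file exists, else the URL
pick_source <- function(local, remote) {
  if (file.exists(local)) local else remote
}

# Hypothetical wrapper: read whichever source pick_source() selects
load_suicide_trends <- function(local = local_path, remote = remote_url) {
  readr::read_csv(pick_source(local, remote))
}

# Usage (needs readr; reads over the network when no local copy exists):
# suicide_trends <- load_suicide_trends()
```

Defining the wrapper does not read anything; the data are only loaded when load_suicide_trends() is called.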

Filtering and Transforming the Data:

The filter() function is used to filter the suicide_trends data frame based on the following conditions:

  • data_status is "Suspected".

  • sex is "Male", "Female", or "All sex". (This filter might be unnecessary in practice.)

  • ethnicity is "All ethnic groups".

  • age_group is "All ages".

The mutate() function then converts the number and rate columns to numeric with as.numeric() so that both are plotted on a continuous axis. Any entry that cannot be parsed as a number becomes NA, and R issues a warning.
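The filtering and conversion steps can be tried on a small made-up table. In this sketch the rows are invented for illustration and are not real statistics; in particular, the "n/a" entries and the "Other group" label are assumptions about what non-numeric or non-matching values might look like, not values confirmed to occur in the dataset.

```r
library(dplyr)

# Toy rows with the same column names as the dataset (all values invented)
toy <- data.frame(
  year        = c(2020, 2020, 2020, 2021),
  data_status = c("Suspected", "Confirmed", "Suspected", "Suspected"),
  sex         = c("Male", "Male", "Female", "All sex"),
  ethnicity   = c("All ethnic groups", "All ethnic groups", "Other group", "All ethnic groups"),
  age_group   = rep("All ages", 4),
  number      = c("10", "12", "3", "n/a"),
  rate        = c("1.5", "1.8", "0.9", "n/a"),
  stringsAsFactors = FALSE
)

toy_filtered <- toy %>%
  filter(
    data_status == "Suspected",
    ethnicity == "All ethnic groups",
    age_group == "All ages"
  ) %>%
  # as.numeric() turns unparseable text into NA (the warning is silenced here)
  mutate(
    number = suppressWarnings(as.numeric(number)),
    rate   = suppressWarnings(as.numeric(rate))
  )

# Which values of sex remain? This helps judge whether a filter on sex is needed
distinct(toy_filtered, sex)

# Rows where the conversion produced NA
filter(toy_filtered, is.na(number) | is.na(rate))
```

Running distinct() on the real data in the same way shows whether the condition on sex removes any rows at all.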

Defining Colors for the Line Chart:

The rgb() function is used to create a named vector, line_colors, that assigns custom colors to each value of sex:

  • "Female" is assigned a purple tone.

  • "Male" is assigned a light red (pinkish) tone.

  • "All sex" is assigned a light green tone.
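To see what rgb() actually produces, the vector can be printed. The sketch below uses base R only; female_alt is a name introduced here to show an equivalent call with maxColorValue.

```r
# rgb() takes red, green, and blue in [0, 1] by default and returns a hex string
line_colors <- c(
  "Female"  = rgb(102/255, 102/255, 153/255),
  "Male"    = rgb(255/255, 102/255, 102/255),
  "All sex" = rgb(144/255, 238/255, 144/255)
)
print(line_colors)
# Female "#666699", Male "#FF6666", All sex "#90EE90"

# Equivalent spelling that avoids the divisions
female_alt <- rgb(102, 102, 153, maxColorValue = 255)
```

Because the result is an ordinary named character vector, the hex strings could also be written directly in line_colors.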

Creating the Line Chart:

  • ggplot() is used to plot year on the x-axis and number on the y-axis, with sex as the color and grouping variable. The second chart is built the same way, except that rate is mapped to the y-axis.

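  • geom_line(size = 1) draws the lines and sets their thickness. In ggplot2 3.4.0 and later, linewidth is the preferred argument for line thickness; size still works for lines but triggers a deprecation warning.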

  • geom_point(size = 2) adds points at each observation to enhance visual clarity.

  • labs() sets the plot title, x-axis label, y-axis label, and legend title.

  • scale_color_manual(values = line_colors) applies the custom colors to the sex variable.

  • theme_minimal() applies a simple, minimalistic design to the plot.
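Since the two charts differ only in the y variable and the labels, the steps above can be wrapped in one function. This is a sketch under stated assumptions, not the author's code: plot_trend() is a hypothetical helper, the toy data are invented, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical helper: one function for both charts; y_col is "number" or "rate"
plot_trend <- function(data, y_col, title, y_label, colors) {
  ggplot(data, aes(x = year, y = .data[[y_col]], color = sex, group = sex)) +
    geom_line(linewidth = 1) +  # assumes ggplot2 >= 3.4.0; older versions use size
    geom_point(size = 2) +
    labs(title = title, x = "Year", y = y_label, color = "Sex") +
    scale_color_manual(values = colors) +
    theme_minimal()
}

# Toy data (invented values, not real statistics)
toy <- data.frame(
  year   = rep(2020:2022, times = 2),
  sex    = rep(c("Male", "Female"), each = 3),
  number = c(10, 12, 11, 4, 5, 3),
  rate   = c(1.5, 1.8, 1.6, 0.6, 0.7, 0.4)
)
toy_colors <- c("Female" = "#666699", "Male" = "#FF6666")

p_number <- plot_trend(toy, "number", "Toy example: number", "Number", toy_colors)
p_rate   <- plot_trend(toy, "rate", "Toy example: rate", "Rate", toy_colors)
```

With the real data, the same function could be called with suicide_trends_filtered and line_colors. The .data[[y_col]] pronoun lets the column be chosen by name at call time.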
