Visualization of suicide data in New Zealand (3) - Choropleth map
Author
Takafumi Kubota
Published
November 18, 2024
Abstract
This report visualizes suicide data in New Zealand for 2023 using choropleth maps to highlight spatial variations across districts. Leveraging R and the ggplot2 package, the study processes and cleans data from official sources, addressing missing and anomalous values to ensure accuracy. Two maps are generated: one depicting the absolute number of suicides and another illustrating suicide rates per 100,000 population. The visualizations reveal significant disparities between regions, identifying high-risk districts that require targeted intervention. These insights aim to inform public health strategies and policy-making to effectively address and mitigate suicide rates across New Zealand.
Keywords
Suicide Visualization, Choropleth Map, Public Health
Summary
Suicide remains a critical public health issue globally, necessitating effective data analysis and visualization to inform prevention strategies. This report focuses on the visualization of suicide data in New Zealand for the year 2023, employing choropleth maps to elucidate spatial patterns and disparities across various districts. Utilizing the R programming language and the ggplot2 package, the study meticulously processes and cleans the data to ensure reliability and accuracy in the resulting visualizations.
The data sources include district codes and suicide trends segmented by district and calendar year, obtained from official online repositories. Initial data cleaning involves addressing inconsistencies such as non-numeric entries (“S” and “-”) and missing values (NA) in critical columns like rate, number, and popcount. These anomalies are systematically replaced with standardized values or imputed using group means to maintain data integrity. The population counts (popcount) are particularly crucial, as they serve as the denominator in calculating standardized suicide rates per 100,000 population, allowing for meaningful comparisons across districts with varying population sizes.
After cleaning, the dataset is enriched by merging district codes, facilitating the alignment of suicide data with geographical regions. The analysis concentrates on suspected suicide cases in 2023, aggregating the total number of suicides and population counts per district. This aggregation enables the calculation of suicide rates, which are essential for identifying areas with disproportionately high suicide prevalence relative to their population.
The spatial aspect of the analysis leverages the spData package, which provides comprehensive map data for New Zealand. By merging the cleaned suicide data with the spatial polygons representing each district, the study updates a unified dataset (nz) suitable for visualization. Two distinct choropleth maps are generated using ggplot2:
Number of Suicides by District: This map employs a yellow-to-red color gradient to represent the absolute number of suicides in each district. Darker shades indicate higher suicide counts, allowing for the immediate identification of regions with elevated suicide incidents.
Suicide Rate by District: To account for population differences, this map visualizes suicide rates per 100,000 population. The same color gradient is used for consistency, with grey representing districts with missing data. This standardized rate provides a clearer picture of suicide prevalence, independent of district size.
The resulting visualizations reveal significant spatial disparities in suicide occurrences across New Zealand. Certain districts emerge as high-risk areas, exhibiting both high absolute numbers and elevated suicide rates. These insights are invaluable for public health officials and policymakers, as they highlight regions that may benefit from targeted mental health interventions, resource allocation, and preventive measures.
Moreover, the methodological approach demonstrated in this report underscores the importance of rigorous data cleaning and preprocessing in producing reliable visualizations. By addressing data anomalies and ensuring accurate population counts, the study lays a solid foundation for meaningful analysis and interpretation.
In conclusion, the choropleth maps serve as powerful tools for understanding the geographical distribution of suicide in New Zealand. They not only facilitate the identification of high-risk districts but also support the development of informed strategies aimed at reducing suicide rates and enhancing mental health support systems nationwide. Future work may extend this analysis by incorporating temporal trends, demographic factors, and evaluating the effectiveness of implemented interventions over time.
Full code
# Loading required packageslibrary(sf)library(terra)library(dplyr)library(spData)library(ggplot2)library(readr)library(RColorBrewer)# Reading datascode <-read_csv("https://takafumikubota.jp/suicide/nz/scode.csv")suicide_area <-read_csv("https://takafumikubota.jp/suicide/nz/suicide-trends-by-district-by-calendar-year.csv")suicide_area <- suicide_area %>%filter(district !="All New Zealand")# Replace 'S' and NA in 'rate' column with '0' and convert to numericsuicide_area <- suicide_area %>%mutate(rate =ifelse(rate =="S"|is.na(rate), "0", rate),rate =as.numeric(rate) )# Replace '-' and NA in 'number' column with '0' and convert to numericsuicide_area <- suicide_area %>%mutate(number =ifelse(number =="-"|is.na(number), "0", number),number =as.numeric(number) )# Replace 'S' in 'popcount' with NA and impute with mean valuessuicide_area_clean <- suicide_area %>%mutate(popcount_num =as.numeric(popcount),popcount_num =if_else(popcount =="S", NA_real_, popcount_num) )pop_means <- suicide_area_clean %>%group_by(data_status, district) %>%summarise(pop_mean =mean(popcount_num, na.rm =TRUE),.groups ='drop' )suicide_area_updated <- suicide_area_clean %>%left_join(pop_means, by =c("data_status", "district")) %>%mutate(popcount_final =if_else(is.na(popcount_num), pop_mean, popcount_num) ) %>%select(-popcount, -popcount_num, -pop_mean) %>%rename(popcount = popcount_final)suicide_area <- suicide_area_updated# Adding district codessuicide_area <- suicide_area %>%left_join(scode, by =c("district"="district")) %>%rename(area_code = mcode)# Creating data for the mapmapdata <- suicide_area %>%filter(year ==2023, data_status =="Suspected") %>%group_by(area_code) %>%summarise(year =unique(year),number =sum(number),popcount =sum(popcount),rate = (number / popcount) *100000 ) %>%ungroup()# Loading New Zealand map datadata("nz", package ="spData")# Merging map data with spatial datanz <- nz %>%mutate(area_code =c(1:12,12,13,13,13)) %>%left_join(mapdata, by ="area_code")ggplot(nz) +geom_sf(aes(fill = number)) +scale_fill_gradientn(colors =brewer.pal(5, "YlOrRd"),na.value ="grey50" ) +theme_minimal() +labs(title ="The number of suicides by district in 2023",fill ="Number" )
ggplot(nz) +geom_sf(aes(fill = rate)) +scale_fill_gradientn(colors =brewer.pal(5, "YlOrRd"),na.value ="grey50" ) +theme_minimal() +labs(title ="Suicide rate by district in 2023",fill ="Rate" )
Loading required packages: Initializes the necessary libraries for spatial data handling, data manipulation, and visualization.
Reading data: Imports district codes and suicide trend data from specified URLs, filtering out aggregated national data to focus on individual districts.
Replace ‘S’ and NA in ‘rate’ column with ‘0’ and convert to numeric: Cleans the rate column by replacing non-numeric entries (“S”) and missing values (NA) with “0” and converting the column to a numeric type for analysis.
Replace ‘-’ and NA in ‘number’ column with ‘0’ and convert to numeric: Cleans the number column by replacing placeholder (“-”) and missing values (NA) with “0” and converting the column to numeric for accurate calculations.
Replace ‘S’ in ‘popcount’ with NA and impute with mean values: Handles the popcount column by replacing “S” with NA to denote missing values and prepares to impute these missing values with the mean population counts.
Adding district codes: Merges district codes into the main dataset to enable accurate mapping of suicide data to specific geographical areas.
Creating data for the map: Filters the data for the year 2023 and “Suspected” cases, then aggregates the number of suicides and population counts by district to calculate suicide rates per 100,000 population.
Loading New Zealand map data: Imports spatial polygon data for New Zealand regions from the spData package to serve as the base map for visualization.
Merging map data with spatial data: Combines the aggregated suicide data with the spatial map data using area_code to prepare for choropleth mapping.
read_csv関数を用いて、指定されたURLから2つのCSVファイルを読み込む。scodeには地区コードが含まれており、suicide_areaには地区別および年度別の自殺傾向データが含まれている。次に、filter関数を用いて「All New Zealand」という集計データを除外し、個々の地区に焦点を当てる。