About this project

For this project I analysed the data on road deaths in New Zealand provided by the Ministry of Transport.

The article is divided into two parts. The first part gives the background: the reasons for this project and its goals.

The second part deals with regional data about road deaths. First, I present a “reconstruction exercise” in which I challenge myself to recreate the deaths-by-region plot presented on the Ministry of Transport website. Second, I suggest some possible alternatives to make the graphs more informative, and I highlight some figures I believe are interesting. Finally, I analyse the differences between the North and the South Island and provide some possible explanations.

Reasons behind this project

The very first objective of this task is to improve my R skills. I am learning how to do data analysis with this language and I think there is no better way to do so than by challenging myself. The focus is on data cleaning and plotting, hence on the use of dplyr, tidyr and ggplot2.

However, why did I pick this grim topic? Mainly because New Zealand news outlets and police seem to be very focused on it. I have been living here for more than two years now and traffic accidents are on the news almost every day, so I thought it would be interesting to explore some figures by myself. Furthermore, when I navigated the Ministry of Transport statistics webpage, I found numerous graphs that could be improved (crammed with levels, overlapping labels, etc.); trying to provide a better visualisation seemed like a nice experiment.

Last but not least, this is also a good way to boost my English.

Regional Death Rates

The very first question I wanted to answer was: is there any difference between regions with regard to road deaths? If so, which regions have the highest figures and which the lowest? The Ministry of Transport provides a bar plot covering the period from 2000 to 2018. Each bar is filled according to the number of deaths caused by road accidents in each region.

As a first challenge I wanted to recreate the plot as closely as I could.

original

my reconstruction

To get the colours right I simply sampled them with a browser extension and then manually built a palette from them. The main challenge here was getting the order of the regions right. After a few attempts, I discovered that ggplot2 uses the order of the factor levels to define the order of the groups. Therefore, I created a vector containing the levels in the right order and then used it in factor(data, levels = my.ordered.vector).
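The ordering trick can be sketched in base R as follows. This is only a minimal illustration: the data frame, its column names and all the numbers are made up, and `my_ordered_vector` stands in for the hand-written vector mentioned above.

```r
# Hypothetical toy data: the counts are invented for illustration only.
deaths <- data.frame(
  region = c("Waikato", "Auckland", "Northland", "Auckland"),
  year   = c(2017, 2017, 2018, 2018),
  n      = c(10, 12, 5, 9)
)

# The desired order of the groups, written out by hand.
my_ordered_vector <- c("Northland", "Auckland", "Waikato")

# Without `levels`, factor() sorts the distinct values alphabetically;
# with it, the levels follow the order of the supplied vector.
deaths$region <- factor(deaths$region, levels = my_ordered_vector)

levels(deaths$region)
```

With the column stored this way, mapping it to `fill` in a ggplot2 bar plot should list the groups in the order of the levels rather than alphabetically.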

As can be seen from the pictures, this is quite an overcrowded plot. While it is easy to get the overall number of road deaths for each year, it is difficult to retrieve the number for a specific region. Moreover, it shows the raw numbers, without considering the strongly uneven distribution of the population in New Zealand. For instance, more than 1.5 million people live in Auckland, out of an estimated national population of 5,101,400. So it is not very surprising that densely populated areas like Auckland saw more road deaths than sparsely populated areas.

To be fair, the plot on the Ministry of Transport website is interactive, so you can move the mouse over the groups in each bar to get the raw numbers for that region and year. However, I think this can be a long process, it is not mobile-friendly, and it would not work on paper if a report were printed.

It would be more interesting to see the rate of deaths relative to the regional population. With that in mind, these are the main elements I focused on:

  1. Make the plot more readable, so that information is easier to retrieve and compare.
  2. Compute and plot the rate of road deaths over the population of each region.

Improving regional data plots

First improvement

In my first experiment I focused only on point (1), that is, making the plot more readable. To do so I simply split the plot into facets, with a bar plot for each year. Here is the result:

I admit this is probably not the best visualisation ever; on the other hand, it helps to see a few figures that I think were not that clear before. First of all, there are notably no data for Marlborough and Tasman. They are not shown in the Ministry of Transport plot either, but no warning is given there. We can also see the figures for each region better. For example, from 2010 the number of road deaths in the Canterbury region has been pretty similar to the number for the Auckland region. The clear reduction in the figures for Auckland and Waikato, which is not that visible in the original plot, is also easier to appreciate. Obviously we have now lost the information about the overall numbers, but this problem is easily solved by pairing the faceted plot with a time-series graph showing just the sum of all road deaths for each year.
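A faceted plot of this kind, together with the companion series of yearly totals, could be built roughly as below. This is a sketch under assumptions, not the code behind the figure: the data frame `deaths`, its columns and its values are invented, and the layout choices (flipped axes, default facet grid) are only one possibility.

```r
library(ggplot2)

# Hypothetical toy data: three regions, two years, invented counts.
deaths <- data.frame(
  region = rep(c("Auckland", "Canterbury", "Waikato"), times = 2),
  year   = rep(c(2017, 2018), each = 3),
  n      = c(12, 10, 14, 9, 11, 13)
)

# One small bar plot per year; coord_flip() keeps region names readable.
facet_plot <- ggplot(deaths, aes(x = region, y = n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ year) +
  labs(x = NULL, y = "Road deaths")

# Sum over regions to recover the overall number of deaths per year.
yearly_totals <- aggregate(n ~ year, data = deaths, FUN = sum)

# Companion time series with the national total only.
totals_plot <- ggplot(yearly_totals, aes(x = year, y = n)) +
  geom_line() +
  labs(x = "Year", y = "Total road deaths")
```

Printing `facet_plot` and `totals_plot` side by side gives back the overall trend that the faceted view alone hides.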

Second improvement

I think the faceted plot is nice, but still a bit tricky to read. Furthermore, it still shows the raw numbers. Therefore the next step was to obtain regional population data. This task was very time-consuming, because Stats NZ splits the numbers across a separate page for each region. These numbers come from the 2018 census. Note: given the time required by this process, I used the 2018 numbers for every year, de facto treating the population as stable over the two decades. Obviously this is not the case, but for the sake of the exercise we can assume that the population did not change or that the proportions across regions stayed constant. Once I had this information, I added it as a new column of the data frame (a tibble, to be precise) using the merge() function of base R. I then used these numbers to compute the ratio of road deaths to population for each region and year.

The data represent changes through time, so I thought the best graph for them would be a line graph. The problem with this approach is that there are 14 regional trends to plot, so a single graph would be unreadable. The first possibility is to split the regions between North Island and South Island. This would produce quite a nice representation for the South Island, but not for the North Island. The reason: there are 5 regions in the South Island (that we have data for) and 9 in the North Island. I then decided to take into account what I had achieved so far:
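The merge and the rate computation can be illustrated with a small base R sketch. The population and death figures below are placeholders, not the census values, and the column names are assumptions; the rate is expressed per 10,000 people, the unit used in the rest of the article.

```r
# Hypothetical death counts per region and year (invented numbers).
deaths <- data.frame(
  region = c("Auckland", "Auckland", "Nelson", "Nelson"),
  year   = c(2017, 2018, 2017, 2018),
  n      = c(40, 50, 2, 1)
)

# Hypothetical population table: one fixed value per region,
# reused for every year as described in the text.
population <- data.frame(
  region     = c("Auckland", "Nelson"),
  population = c(1500000, 50000)
)

# merge() joins on the shared "region" column, repeating the
# population value on every row of the matching region.
rates <- merge(deaths, population, by = "region")

# Road deaths per 10,000 inhabitants.
rates$rate_per_10k <- rates$n / rates$population * 10000

rates
```

Note that merge() returns a plain data frame even when given a tibble, and that by default it silently drops regions missing from either table, so it is worth checking the row count after the join.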

  1. Faceting helped in highlighting regional trends.
  2. Dividing between islands is helpful at least for the South Island.

Combining these approaches was the next logical step: two plots, one per island, each containing a facet for every region. Here is the result:

We can now clearly see the trends for each region (for which we have data).
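The combination of the two ideas, one plot per island with one facet per region, might look like the sketch below. The helper `plot_island()`, the `island` column and all the rates are hypothetical and serve only to show the structure.

```r
library(ggplot2)

# Hypothetical rates per 10,000 people: invented values, four regions.
rates <- data.frame(
  region = rep(c("Auckland", "Northland", "Nelson", "West Coast"), each = 3),
  island = rep(c("North", "North", "South", "South"), each = 3),
  year   = rep(2016:2018, times = 4),
  rate   = c(0.3, 0.3, 0.4, 1.6, 1.8, 1.5, 3.0, 2.5, 2.0, 4.0, 3.0, 2.2)
)

# Hypothetical helper: keep one island and draw a line per region,
# each region in its own facet.
plot_island <- function(data, island_name) {
  ggplot(data[data$island == island_name, ], aes(x = year, y = rate)) +
    geom_line() +
    facet_wrap(~ region) +
    labs(
      title = paste(island_name, "Island"),
      x = "Year",
      y = "Road deaths per 10,000 people"
    )
}

north_plot <- plot_island(rates, "North")
south_plot <- plot_island(rates, "South")
```

Because each plot gets its own y-axis, the two islands end up on different scales, which is worth keeping in mind when comparing them by eye.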

Looking at the big picture, there is an important difference between the North and the South Island: the range of death rates for the South Island is double the range for the North Island. This is due to the numbers of two specific regions: Nelson and West Coast. Furthermore, we see that the most populated regions, Auckland and Wellington, had the lowest rates among all regions and throughout all years.

If we focus on the North Island, we can see that Auckland, the region that registered the highest number of road deaths in the majority of years, recorded a stable and low rate of deaths per 10,000 people. A similar picture holds for the Wellington region. This is not a complete surprise, since almost a third of New Zealanders live within the Auckland Region (and specifically in Auckland). Conversely, the highest rates were recorded in sparsely populated areas like Northland and Gisborne. The latter had a fluctuating trend, peaking at just over 2 deaths per 10,000 people in 2012 and reaching a low of just over 0 in 2004, 2011 and 2015. The figure for Northland, instead, remained above 1.5 deaths per 10,000 people in most years, except for the period between 2011 and 2015. The Waikato region has also been characterised by high numbers, fluctuating between 1.5 and 2 deaths per 10,000 people in most years. Overall, in most regions, despite the variability over the years, the road death ratio settled between 1 and 2 deaths per 10,000 people in 2018.

Moving on to the South Island, it is clear that two regions, namely Nelson and West Coast, differ significantly from all the rest. These are the only places where the ratio of road deaths to population exceeded 4 deaths per 10,000 people. Nelson recorded almost 5 road deaths per 10,000 people in 2010; the West Coast peaked at over 5 deaths per 10,000 people in 2011. Despite this, their figures decreased over the years, settling at around 2 road deaths per 10,000 people in 2018, a figure that aligns with the ones for the North Island for the same year. In addition, we can see that Canterbury, like Auckland, had a ratio that is pretty low in proportion to its population, even though it recorded the highest number of road deaths in the South Island, sometimes surpassing the number for Auckland.

Overall Numbers and Conclusions

From the data shown above, it seems that the rate of road deaths was higher in sparsely populated regions. Additionally, relevant differences in rates were recorded between the two islands. However, while most North Island regions varied around 2 deaths per 10,000 people, only two of the five South Island regions passed this mark. It is therefore interesting to consider the overall trends for the two islands and compare them against the trend for the whole of New Zealand.

Grouping the data by island, we obtain the following plot:

From here it is clear that there is a significant difference between the two islands. While the New Zealand ratio is mainly driven by the North Island numbers, the South Island diverges completely, with its ratio of road deaths to population being visibly higher than the New Zealand figure. However, we have seen that in the South Island the regions of Nelson and West Coast were somewhat of an outlier compared to the other regions on the same island. Thus, the high numbers of these two regions may pull the ratio for this island away from the country trend. To check whether this is the case, I plotted the same figure again, this time computing the South Island ratio without the numbers for Nelson and West Coast. This is the result:

The South Island road death ratio still does not follow the New Zealand trend, but it now lies within the same range. The period between 2011 and 2018 is interesting. Here all the figures increase, with the South Island recording a higher rate than average even though its two most extreme regions have been excluded. This means that in the past decade the South Island has been characterised by a noticeably higher ratio of road deaths to population. This is probably driven by the numbers for Canterbury.
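The island-level comparison, with and without Nelson and West Coast, could be computed along the following lines. This sketch assumes that the island ratio is total deaths divided by total population (rather than an average of the regional ratios), which the text does not specify; the helper `island_rate()` and every number are hypothetical.

```r
# Hypothetical regional data for a single year (invented numbers).
regional <- data.frame(
  region     = c("Auckland", "Canterbury", "Nelson", "West Coast"),
  island     = c("North", "South", "South", "South"),
  year       = 2018,
  n          = c(45, 48, 10, 12),
  population = c(1500000, 600000, 50000, 30000)
)

# Hypothetical helper: sum deaths and population per island and year,
# then express the ratio per 10,000 people.
island_rate <- function(data) {
  totals <- aggregate(cbind(n, population) ~ island + year,
                      data = data, FUN = sum)
  totals$rate_per_10k <- totals$n / totals$population * 10000
  totals
}

# Ratio using every region.
all_regions <- island_rate(regional)

# Same ratio after dropping the two outlier regions.
outliers <- c("Nelson", "West Coast")
without_outliers <- island_rate(regional[!regional$region %in% outliers, ])
```

Comparing the South Island rows of `all_regions` and `without_outliers` shows how much of the island's ratio comes from the two excluded regions.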

Taking everything into consideration, it has been more likely to be involved in a fatal road accident in the South Island, particularly in Nelson and West Coast. This is the case even though (1) the majority of New Zealanders live in the North Island and (2) the North Island contains the regions with the highest numbers of road fatalities (Auckland, Northland, Waikato). I thought about the possible reasons behind this, and these are my hypotheses:

  1. South Island roads are known to be more dangerous because of their structure and condition.
  2. Tourists: having worked in an industry closely related to travel and holidays, and having lived in this country for quite a long time, it seems to me that the South Island is the destination of the majority of tourists. Because they are not accustomed to the specific features of New Zealand roads, they may be a main cause of fatal road accidents. However, I do not have data to support this.
  3. Speed: because there are fewer cars on the roads, it is possible that people are more inclined to exceed the speed limit and to engage in dangerous manoeuvres, like passing in the proximity of a blind spot (under the assumption that only a few people are on the same road at the same time).