Introduction

Every year road accidents inflict serious injuries and claim the lives of many people. In 2018, 4.6 fatalities have occurred for every 100,000 people in Australia(Bureau of Infrastructure, Transport and Regional Economics 2019). The number of vehicles registered in Australia has increased by 1.7% from 2018 to 2019(Australian Bureau of Statistics 2019). At the same time the population of Australia has increased by 1.6% in 2018(Australian Bureau of Statistics 2019) The combined increase in the vehicles on road and the population of the country simultaneously could lead to more accidents in future.

Towards Zero is a campaign started by the State of Victoria, aimed at reducing the fatalities occurring from road accidents to 0. Many developed countries across the world have also adopted this strategy in the name of Vision Zero. The goal of the campaign is to create a safe transport system which reduces the possibilities of road accidents by ensuring(Towards Zero 2019) :

  1. Safe Roads

  2. Safe Speeds

  3. Safe Vehicles

  4. Safe people

The Towards Zero campaign has an immediate target of reducing the annual fatalities caused by road accidents to below 200 by 2020(Towards Zero 2019).

Background

In AT2 the group used a Binomial Logistic regression model to find the probability of fatalities or serious injuries in a road crash based on different predictor variables. It was found during the EDA that since the implementation of the Towards Zero Campaign in 2016 a gradual progress was observed over the years across all Injury levels(No Injury, Minor In jury , Serious Injury, Fatalities) as the count of persons Injured for each Injury level was gradually decreasing. Thus it is evident that the year in which the accident happened plays an important role in deciding the probability of fatalities or serious Injuries. Also, the proportion of the ratio of each Injury level to the total injuries changes each year. However the predicted probaility was not satisfactory as the chosen explanatory variables for the regression model did not include year as a variable.

This clearly indicates that our data is more suited for a time series analysis where Autoregressive models can be used to predict the future count of fatalities and serious Injuries based on past values which change yearly or monthly. Also forecasting the future values of each Injury level would help the Towards Zero campaign to identify if it would be possible to achieve their goals in future. These predicted values can also help the campaign to be more prepared to take necessary precautionary measures in future to further curtail the number of road accidents.

New Aims and Research Question

In this Individual Data analysis I would like to find a suitable time series model to accurately forecast the number of casualties(Fatalities and serious injuries) in Victoria for the upcoming years.

Research Questions :

  1. Can the Towards Zero Campaign find huge success beyond it’s short term goal by reducing the annual Casualties(including Fatalities and Serious Injuries) to below 200 by 2020? (NOTE : The short term goal of the campaign is to reduce only annual number of fatalities caused by road accidents to below 200 by 2020)

  2. How does the forecast of the monthly count of Casualties(including Fatalities and Serious Injuries) change with respect to the monthly count of Fatalities?

Research Question 1 :

Data Preparation

Time Series Analysis requires the data to be present as a time series object. The Road crash data has been aggregated into a monthly distribution and fitted into a time series object using the ts() function. Since the data has been aggregated into a monthly distribution, frequency is set to 12. The monthly Casualties count is a sum of the monthly fatalities and serious Injuries.

time_ser=ts(mast_spd_2,frequency=12,start=c(2006,1),end=c(2018,12))
## Warning in data.matrix(data): NAs introduced by coercion

(Fig.1) shows a time series plot of the Casualties(Fatalities & Serious Injuries).

autoplot(time_ser[,"Casualties"]) +
    ggtitle("Monthly Road Crash Casualties(Fatalities & Serious Injuries) in Victoria") +
    xlab("Year") +
    ylab("Number of Casualties")
 Figure 1 :Time Series plot of Monthly Road Crash Casualties(Fatalities & Serious Injuries) in Victoria from 2006 to 2018

Figure 1 :Time Series plot of Monthly Road Crash Casualties(Fatalities & Serious Injuries) in Victoria from 2006 to 2018

It is clearly evident from the plot, that the data has both trend and seasonality patterns. The data can be decomposed by SEATS(Seasonal Extraction in ARIMA Time Series) decomposition to get a better observation of the seasonal,trend and residual elements.

time_ser[,"Casualties"] %>% seas() %>%
autoplot() + xlab("Year") +
  ggtitle("SEATS decomposition of Victoria Road Crash Casualties")
Figure 2 : SEATS decomposition of Victoria Road Crash Casualties

Figure 2 : SEATS decomposition of Victoria Road Crash Casualties

The decomposition plot shows a clear seasonality of the data where there is a surge in the casualties at the the start and end of the year, and a drop at the middle of the year. Also an overall decreasing trend can be observed with drastic drops between 2008 to 2010 and 2016 to 2019. The trend is almost flat and constant between 2012 to 2015.

Check for Stationarity

Most Time Series models require the data to be stationary. A time series data is said to be stationary if it’s patterns are independent of the time at which the data is observed. Hence Time series data with trend and seasonality are considered to be non- stationary, as the trend and seasonality patterns would alter the values according to the observed time(Hyndman & Athanasopoulos 2018).

A time series data can be checked for stationarity by the 2 methods mentioned below.

  1. Autocorrelation plot

Autocorrelation is defined as the measurement of linear relationship between lagged values of a time series data(Hyndman & Athanasopoulos 2018). The ggAcf function plots the autocorrelation coefficients of the lagged data.

ggAcf(time_ser[,"Casualties"]) +
    ggtitle("ACF plot of Victoria Road Crash Casualties")
Figure 3: ACF plot of Victoria Road Crash Casualties

Figure 3: ACF plot of Victoria Road Crash Casualties

The autocorrelation coefficient for every lag crossing the threshold limit indicates that the data is not stationary.

  1. Unit Root tests

Unit Root tests are defined as the statistical hypothesis tests of stationarity(Hyndman & Athanasopoulos 2018). Among the various Unit Root tests available, we have used the KPSS test for our analysis which assumes the null hypothesis as the data to be stationary and looks for evidences to disprove it(Hyndman & Athanasopoulos 2018). The test is performed using the ur.kpss() function of the urca package.

time_ser[,"Casualties"] %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 2.6562 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The null hypothesis is rejected as the test statistic is way larger than the 1% Critical value. Thus again proving that the data is not stationary.

Making the data Stationary

The time series data can be made stationary by differencing the data. First and Second order differencing helps to stabilise the mean of the time series hence decreasing or eliminating the trend of the time series. Seasonal differencing helps to remove or minimise the seasonality present in the data. The number of ordinary differencing to be performed on the data to make it stationary can be determined by the ndiffs() function, while the number of seasonal differencing required can be determined using the nsdiffs() function.

ndiffs(time_ser[,"Casualties"])
## [1] 1
nsdiffs(time_ser[,"Casualties"])
## [1] 0

From the functions executed above it can be found that a first differencing would suffice to make the data stationary and no seasonal differencing is required.

Check for White Noise

A time series data is considered to be White Noise if it does not show any autocorrealtions between lagged values(Hyndman & Athanasopoulos 2018). In simple words, White Noise can be referred to as Random points across time peiods. It is important to check thedata for White Noise as White Noise Data cannot be fitted into Time Series models. From the ACF plot above it can be seen that our data is not white Noise as it shows positive autocorrelations for lagged values.

Model Evaluation and Data Modelling

ARIMA Model

ARIMA is also referred to as Autoregressive Integrated Moving Average Models. It is a combination of Autoregressive models, Moving Average models and differenced values which refers to the integrated part. These individual models constitute to form the ARIMA(p,d,q)(P,D,Q)m model which represents(Hyndman & Athanasopoulos 2018) :

p = order of the autoregressive part

d = degree of first differencing involved

q = order of the moving average part

(p,d,q) = Non seasonal part of the model

(P,D,Q) = Seasonal part of the model

m = number of observations per year

The values of P and Q are determined from the ACF and PACF plots of the data, while D is based on the number of differencing(ordinary and seasonal) done on the data to make it stationary. Manually selecting the p,d,q values can be a tedious and time consuming task which can be overcome by using the auto.arima() function which determines the best possible values automatically.

The below figure shows the modelling procedure for an ARIMA model.

Figure 4 : Modelling procedure for an ARIMA model , Source : https://otexts.com/fpp2/arima-r.html

Figure 4 : Modelling procedure for an ARIMA model , Source : https://otexts.com/fpp2/arima-r.html

As mentioned in the above figure the auto.arima() function automatically performs steps 3-5, which includes :

  1. Determining and performing the number of ordinary and seasonal differencing to be performed on the data to make it stationary.

  2. Determining the p,d,q values and appropriate model from the ACF/PACF plots

  3. Comparing the models based on AICc values to choose the best model.

Hence it is necessary to perform the remaining steps by ourselves.

(fit_arima_casual <- auto.arima(time_ser[,"Casualties"],  stepwise=FALSE, approximation=FALSE))
## Series: time_ser[, "Casualties"] 
## ARIMA(0,1,1)(2,0,0)[12] 
## 
## Coefficients:
##           ma1    sar1    sar2
##       -0.6619  0.4722  0.2216
## s.e.   0.0562  0.0817  0.0899
## 
## sigma^2 estimated as 1721:  log likelihood=-799.48
## AIC=1606.96   AICc=1607.22   BIC=1619.13

The auto.arima() function suggests an ARIMA(0,1,1)(2,0,0)[12] as the best fit model for our Time Series Data.

In the default mode auto.arima() function follows a stepwise approach to search a limited number of models for the best one by approximations. However, the stepwise approach and approximations might result in the model with the least AICc not being found. Hence it would be a better option to define approximation=FALSE and stepwise=FALSE while using auto.arima()(Hyndman & Athanasopoulos 2018).

ETS model

ETS model stands for Error,Trend,Seasonal model. The ETS model is a combination of models and methods. Every model contains a measurement equation which describes the observed data and state equations which describes the change of unobserved components(level,trend,seasonal) or states over time. Hence it is also referred to as a State space model. Each method contains two models, one with additive errors and the other with multiplicative errors(Hyndman & Athanasopoulos 2018).

fit_ets_casual <- ets(time_ser[,"Casualties"])
fit_ets_casual
## ETS(M,N,M) 
## 
## Call:
##  ets(y = time_ser[, "Casualties"]) 
## 
##   Smoothing parameters:
##     alpha = 0.3421 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 659.2851 
##     s = 1.0922 1.057 1.1033 0.8863 0.9491 0.9556
##            0.9404 0.9984 1.031 1.0936 0.954 0.9392
## 
##   sigma:  0.0766
## 
##      AIC     AICc      BIC 
## 1926.934 1930.362 1972.682

The ets() function suggests an ETS(M,N,M) as the best fit model for our Time Series Data.

ARIMA vs ETS

Although it is assumed that ARIMA models can fit an extensive range of time series data, they do not have equivalent models for Non-linear exponential smoothing models. Arima models are a combination of stationary and non-stationary moedels while every ETS model is non-stationary(Hyndman & Athanasopoulos 2018).

The below table represents the equivalence relationship between the models(Hyndman & Athanasopoulos 2018).

Figure 5 : Equivalence relationship between ARIMA and ETS models , Source : https://otexts.com/fpp2/arima-ets.html

Figure 5 : Equivalence relationship between ARIMA and ETS models , Source : https://otexts.com/fpp2/arima-ets.html

Checking AICc values of both the models

AICc is helpful in determining the best model only when selecting a model from within the same family of ARIMA or ETS models. Models belonging to different families(ARIMA and ETS) can’t be compared based on AICc as the likelihood is computed differently(Hyndman & Athanasopoulos 2018).

fit_arima_casual
## Series: time_ser[, "Casualties"] 
## ARIMA(0,1,1)(2,0,0)[12] 
## 
## Coefficients:
##           ma1    sar1    sar2
##       -0.6619  0.4722  0.2216
## s.e.   0.0562  0.0817  0.0899
## 
## sigma^2 estimated as 1721:  log likelihood=-799.48
## AIC=1606.96   AICc=1607.22   BIC=1619.13
fit_ets_casual
## ETS(M,N,M) 
## 
## Call:
##  ets(y = time_ser[, "Casualties"]) 
## 
##   Smoothing parameters:
##     alpha = 0.3421 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 659.2851 
##     s = 1.0922 1.057 1.1033 0.8863 0.9491 0.9556
##            0.9404 0.9984 1.031 1.0936 0.954 0.9392
## 
##   sigma:  0.0766
## 
##      AIC     AICc      BIC 
## 1926.934 1930.362 1972.682

Checking Residuals from both the models

In order to select the best model between ARIMA and ETS, it is important to check the residuals from both the models using the checkresiduals() function. The residuals represent the unused time series data by the model. It is important for a good model to make utmost use of the data such that the residuals represent white noise.

checkresiduals(fit_arima_casual)
Figure 6: Residuals from ARIMA(0,1,1)(2,0,0)[12]

Figure 6: Residuals from ARIMA(0,1,1)(2,0,0)[12]

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(2,0,0)[12]
## Q* = 18.461, df = 21, p-value = 0.6197
## 
## Model df: 3.   Total lags used: 24
checkresiduals(fit_ets_casual)
Figure 7 : Residuals from ETS(M,N,M)

Figure 7 : Residuals from ETS(M,N,M)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,M)
## Q* = 23.159, df = 10, p-value = 0.01017
## 
## Model df: 14.   Total lags used: 24

The above plots show that both the models do an equally good job at capturing the details in the data, as the residuals from both the models represent white noise. The residual ACF plot of both the models show that the autocorrelation coefficients of lagged residuals lie within the threshold limit indicated by blue lines. Also both the histograms of the residuals fit perfectly within the Normal distribution curve. These indications from both the plots give us a clear indication that the residuals from both the models represent white Noise. Hence we need to analyse further to identify the better model.

Check Raw Mean Square Error(RMSE)

The models can further be evaluated to find the Raw Mean Squared error(RMSE) by using the accuracy function of the forecast package. When comparing RMSE between models the model with the least RMSE is considered to be the best model for the time seies data as lower RMSE indicates less error. In our case, the RMSE for ETS model is lower than the ARIMA model, hence signifying that it’s a better model for the data when compared to the ARIMA model.

accuracy(fit_arima_casual)
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -2.937484 40.94674 32.61474 -1.211968 7.009833 0.6311681
##                     ACF1
## Training set -0.02636006
accuracy(fit_ets_casual)
##                     ME    RMSE     MAE       MPE     MAPE      MASE
## Training set -6.763576 35.2163 27.7533 -1.992361 6.002766 0.5370884
##                     ACF1
## Training set -0.03872834

Forecasting Data

Now as we have identified the ETS model as the best model for our time series data, it’s time to forecast using the model.

# forecast for ETS model

forecast(fit_ets_casual) %>% autoplot() + xlab("Year") + ylab("Monthly Casualties")
Figure 8 : Forecasts from ETS(M,N,M)

Figure 8 : Forecasts from ETS(M,N,M)

We can also try forecasting the data for the ARIMA model to compare how the forecast varies from the ETS model.

# forecast for ARIMA model

forecast(fit_arima_casual) %>% autoplot() + xlab("Year") + ylab("Monthly Casualties")
Figure 9 : Forecasts from ARIMA(0,1,1)(2,0,0)[12]

Figure 9 : Forecasts from ARIMA(0,1,1)(2,0,0)[12]

The darker blue region represents the 80 percentile interval and the light blue region represents the 95 percentile interval.

We can also compare the forecast numbers between the 2 models.

forecast(fit_ets_casual)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2019       274.0040 247.0961 300.9119 232.8520 315.1561
## Feb 2019       278.3130 249.4184 307.2077 234.1225 322.5036
## Mar 2019       319.0676 284.2403 353.8950 265.8038 372.3315
## Apr 2019       300.7851 266.4230 335.1472 248.2329 353.3374
## May 2019       291.2843 256.5879 325.9807 238.2208 344.3478
## Jun 2019       274.3767 240.4087 308.3446 222.4272 326.3261
## Jul 2019       278.7788 243.0065 314.5510 224.0698 333.4877
## Aug 2019       276.9046 240.1637 313.6454 220.7142 333.0949
## Sep 2019       258.5767 223.1738 293.9796 204.4326 312.7207
## Oct 2019       321.8910 276.4977 367.2843 252.4680 391.3141
## Nov 2019       308.3824 263.6628 353.1019 239.9897 376.7750
## Dec 2019       318.6426 271.1962 366.0891 246.0796 391.2057
## Jan 2020       274.0041 232.1648 315.8433 210.0164 337.9917
## Feb 2020       278.3131 234.7854 321.8407 211.7433 344.8829
## Mar 2020       319.0677 268.0113 370.1241 240.9837 397.1517
## Apr 2020       300.7852 251.5891 349.9813 225.5462 376.0241
## May 2020       291.2844 242.6318 339.9370 216.8766 365.6921
## Jun 2020       274.3767 227.6151 321.1383 202.8610 345.8924
## Jul 2020       278.7788 230.3369 327.2207 204.6933 352.8643
## Aug 2020       276.9046 227.8813 325.9279 201.9300 351.8793
## Sep 2020       258.5767 211.9660 305.1874 187.2918 329.8616
## Oct 2020       321.8911 262.8490 380.9332 231.5939 412.1882
## Nov 2020       308.3824 250.8583 365.9066 220.4068 396.3581
## Dec 2020       318.6427 258.2285 379.0569 226.2471 411.0383
forecast(fit_arima_casual)
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2019       273.4710 220.30962 326.6323 192.16770 354.7742
## Feb 2019       257.2863 201.16940 313.4033 171.46289 343.1098
## Mar 2019       299.0487 240.12423 357.9732 208.93150 389.1659
## Apr 2019       270.1541 208.54987 331.7582 175.93858 364.3695
## May 2019       262.9593 198.78723 327.1314 164.81658 361.1021
## Jun 2019       237.8307 171.18956 304.4718 135.91188 339.7495
## Jul 2019       260.8659 191.84403 329.8878 155.30604 366.4258
## Aug 2019       253.6133 182.29009 324.9366 144.53385 362.6928
## Sep 2019       230.4260 156.87337 303.9786 117.93697 342.9150
## Oct 2019       268.6263 192.90994 344.3427 152.82812 384.4245
## Nov 2019       277.6166 199.79663 355.4366 158.60121 396.6320
## Dec 2019       253.7766 173.90837 333.6448 131.62869 375.9245
## Jan 2020       242.0119 151.26739 332.7565 103.23013 380.7937
## Feb 2020       224.1773 129.65402 318.7005  79.61644 368.7381
## Mar 2020       256.5281 158.37153 354.6847 106.41058 406.6456
## Apr 2020       228.7032 127.04301 330.3633  73.22739 384.1789
## May 2020       226.1917 121.14475 331.2386  65.53627 386.8471
## Jun 2020       208.5643 100.23641 316.8921  42.89111 374.2374
## Jul 2020       231.1850 119.67271 342.6973  60.64166 401.7284
## Aug 2020       219.7840 105.17572 334.3923  44.50575 395.0623
## Sep 2020       207.7260  90.10320 325.3489  27.83742 387.6147
## Oct 2020       231.5266 110.96451 352.0886  47.14282 415.9103
## Nov 2020       238.4309 114.99961 361.8621  49.65905 427.2027
## Dec 2020       219.4180  93.18275 345.6533  26.35783 412.4782

The annual number of casualties obtained by summing the point forecast of the ETS model is approximately 3495 for both 2019 and 2020.

Research Question 2 :

This research question can be answered by using the xreg argument of the ARIMA model. When the xreg argument is introduced, the ARIMA model fits a regression model between the xreg argument and ARIMA errors(Hyndman & Athanasopoulos 2018). The xreg argument can also be used with the auto.arima() model. In our case the xreg argument is the monthly count of Fatalities.

time_ser_xreg = ts(mast_spd_3,frequency=12,start=c(2006,1),end=c(2018,12))
## Warning in data.matrix(data): NAs introduced by coercion

It can be clearly seen that both the time series data are not stationary as they both have trend and seasonality.

autoplot(time_ser_xreg[,3:4], facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("Monthly changes in Road Crash Casualties and Fatalities in Victoria ")
Figure 10 : Monthly changes in Road Crash Casualties and Fatalities in Victoria

Figure 10 : Monthly changes in Road Crash Casualties and Fatalities in Victoria

On further decomposing the Fatalities data by SEATS(Seasonal Extraction in ARIMA Time Series) we can see that the data has a linear decreasing trend. Also the data contains seasonality where there is a huge spike towards the end of each year and constant spikes with decreasing trend from the start to the middle of the year.

time_ser_xreg[,"Fatalities"] %>% seas() %>%
autoplot() +
  ggtitle("SEATS decomposition of Victoria Road Crash Fatalities") + xlab("Year")
Figure 11 : SEATS decomposition of Victoria Road Crash Fatalities

Figure 11 : SEATS decomposition of Victoria Road Crash Fatalities

time_ser_xreg[,"Fatalities"] %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 1.7292 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Upon performing the KPSS Unit Root Test, the data is confirmed to be not stationary as the test statistic value is way larger than the 1% Critical value.

The auto.arima() function is used to model the Casualties data with the Fatalities data as the corresponding xreg argument. The auto.arima() function performs all the necessary steps to make the data stationary and chooses the best ARIMA model automatically.

(fit_xreg <- auto.arima(time_ser_xreg[,"Casualties"],
  xreg=time_ser_xreg[,"Fatalities"]))
## Series: time_ser_xreg[, "Casualties"] 
## Regression with ARIMA(3,1,0)(2,0,0)[12] errors 
## 
## Coefficients:
##           ar1      ar2      ar3    sar1    sar2    xreg
##       -0.6356  -0.3067  -0.1648  0.4734  0.2160  1.7741
## s.e.   0.0810   0.0924   0.0824  0.0881  0.0931  0.4790
## 
## sigma^2 estimated as 1653:  log likelihood=-794.68
## AIC=1603.36   AICc=1604.12   BIC=1624.67

The auto.arima() function suggests the ARIMA(3,1,0)(2,0,0)[12] as the best fit.

The below figure shows the errors from both the Regression model and the selected ARIMA model.

cbind("Regression Errors" = residuals(fit_xreg, type="regression"),
      "ARIMA errors" = residuals(fit_xreg, type="innovation")) %>%
  autoplot(facets=TRUE) + ggtitle("ARIMA and regression errors") +
    xlab("Year")
Figure 12 : ARIMA and regression errors

Figure 12 : ARIMA and regression errors

However in order to verify that the model chosen by auto.arima() is appropriate, it would suffice for the residuals from the ARIMA errors to resemble white Noise series.

checkresiduals(fit_xreg)
Figure 13 : Residuals from Regression with ARIMA(3,1,0)(2,0,0)[12] errors

Figure 13 : Residuals from Regression with ARIMA(3,1,0)(2,0,0)[12] errors

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(3,1,0)(2,0,0)[12] errors
## Q* = 25.921, df = 18, p-value = 0.1016
## 
## Model df: 6.   Total lags used: 24

The residuals from the ARIMA model resemble white Noise as the residual ACF plot shows that all the autocorrelation coefficients of lagged residuals lie within the threshold limit and the residual histogram fits perfectly within the Normal distribution curve.

Using the estimated model we can forecast how the monthly Casualties would change based on the monthly Fatalities. We can forecast the Casualties for the next 6 months starting from January 2019 to June 2019 based on the Fatalities for that same time period. In our case, We can manually define the monthly Fatalities for the period starting from January 2019 to June 2019. This method of forecasting is also known as scenario based forecasting.

Initially we assign increasing values to the to the monthly Fatalities and could see the monthly casualties rise based on the increase in Fatalities.

fcast <- forecast(fit_xreg,
  xreg = c(20,30,40,60,70,100))

autoplot(fcast) + ggtitle("Forecast of change in monthly Casualties based on monthly Fatalities") + xlab("Year") + ylab("Number of Casualties")
Figure 14 : Forecast of change in monthly Casualties based on monthly Fatalities

Figure 14 : Forecast of change in monthly Casualties based on monthly Fatalities

Next we assign decreasing values to the to the monthly Fatalities and could see the monthly casualties drop based on the decrease in Fatalities.

fcast1 <- forecast(fit_xreg,
  xreg = c(40,35,27,20,18,10))

autoplot(fcast1) + ggtitle("Forecast of change in monthly Casualties based on monthly Fatalities") + xlab("Year") + ylab("Number of Casualties")
Figure 15 : Forecast of change in monthly Casualties based on monthly Fatalities

Figure 15 : Forecast of change in monthly Casualties based on monthly Fatalities

Conclusion

A decreasing trend can be observed from the forecast of the Road crash casualties. However upon aggregating the monthly forecasted figures to annual figures for 2019 and 2020, an annual Casualties count of approximately 3495 is obtained for each year. This is a very high figure and nowhere near to the target of the Towards Zero Campaign which is to reduce the number of fatalities caused by road accidents to below 200 by 2020.

The forecasted monthly casualties changes linearly with respect to the monthly Fatalities. In the first scenario of forecasting an ascending set of values were assigned to the xreg argument(Fatalities) and an upward trend was observed in the Casualties. While in the second scenario a decreasing set of values were assigned to the xreg arguments and a downward trend was observed in the casualties. Hence an increase in fatalities causes the Casualties to rise and a decrease in fatalities causes the casualtiies to drop.

References

Hyndman, R.J., & Athanasopoulos, G. (2018), Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia, Viewed 28 October 2019, https://otexts.com/fpp2/

Bureau of Infrastructure, Transport and Regional Economics (BITRE), 2019, Road trauma Australia 2018 statistical summary, BITRE, Canberra ACT, Viewed 28 October 2019

Australian Bureau of Statistics 2019, 9309.0 - Motor vehicle census, Australia, 31 Jan 2019, ABS, Viewed 28 October 2019, https://www.abs.gov.au/ausstats/abs@.nsf/mf/9309.0.

Australian Bureau of Statistics 2019, 3101.0 - Australian Demographic Statistics, Dec 2018, ABS, Viewed 28 October 2019, https://www.abs.gov.au/ausstats/abs@.nsf/lookup/3101.0Media%20Release1Dec%202018.

Towards Zero 2019, What is Towards Zero, Towards Zero, viewed 28 October 2019 https://www.towardszero.vic.gov.au/what-is-towards-zero/road-safety-action-plan.

Towards Zero 2019 , ZERO IS THE ONLY ACCEPTABLE NUMBER ,Towards Zero , Viewed 28 October 2019 , https://www.towardszero.vic.gov.au/