knitr::include_graphics("assets/motor-vehicle-crashes.jpg")

Introduction

Background

Life changed in many ways because of the COVID-19 pandemic in the our world, including in United States. In addition to the COVID-19 deaths, preventable injury related deaths also experienced a dramatic increase in 2020. One component of this increase was motor-vehicle deaths. Roads became less safe in 2020 for a variety of reasons, including an increase in non-restrained occupant deaths, speeding, and alcohol impaired fatal crashes in United States. Data by NSC Injury Facts shows that after three consecutive years of decreases, deaths increased 8.3%; 42,338 people died in motor-vehicle crashes in 2020 compared to 39,107 in 2019.

On other hand, time series is a method of analyzing and processing data which the values are affected by time. The action of predicting future values based on its value in the previous period of time is called forecasting. The data which formatted into a time series (ts) object must have some characteristics: (1) no missing intervals, (2) no missing values, (3) data should be ordered by time.

Because of that, time series forecasting will be used to determine the prediction of the number of accidents in the next few years. After that, an analysis will also be carried out on how the seasonal pattern is owned. That way, the results carried out can be a reference and concern for the wider community, and it is hoped that the number of accidents in the United States can decrease again.

Dataset

The National Safety Council (NSC) and the National Highway Traffic Safety Administration (NHTSA) count motor-vehicle crash deaths monthly in 1992 - 2020 https://injuryfacts.nsc.org/motor-vehicle/overview/crashes-by-month/. There are several attribute that provided, such as the number of deaths, vehicle miles, and death rate per 100 millions vehicle miles. The mileage death rate helps correct for the variation in monthly mileage and is a more accurate measure of risk when comparing trends. Much of the variation in monthly motor-vehicle fatalities is associated with the number of vehicle miles.

Solution

Import Library

The following are the packages used during the time series work:

library(readxl) # read data
## Warning: package 'readxl' was built under R version 4.2.2
library(dplyr) # for data wrangling
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate) # to deal with date
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(xts) # to deal with date
## Warning: package 'xts' was built under R version 4.2.2
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(padr) # for padding
## Warning: package 'padr' was built under R version 4.2.2
library(forecast) # time series library
## Warning: package 'forecast' was built under R version 4.2.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(tseries) # for adf.test
## Warning: package 'tseries' was built under R version 4.2.2
library(MLmetrics) # calculate error
## 
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
## 
##     Recall

Read Data

The data was obtained from https://injuryfacts.nsc.org/motor-vehicle/overview/crashes-by-month/ about the number of deaths, vehicle miles, and death rate per 100 millions vehicle miles of motor-vehicle crash deaths monthly in 1992 - 2020 in United Stater.

knitr::opts_chunk$set(eval = FALSE)

convert the .xlsx data to csv first using this below code!

deaths <- read_excel(“data/deaths-by-month-crashed.xlsx”)

deaths <- deaths[-c(1:3), ] colnames(deaths) <- c(‘year’, ‘month’, ‘number’, ‘deaths_rate’, ‘vehicle_miles’) deaths <- deaths[!grepl(“Total”, deaths$year),] # don’t include the summary of total deaths per year from the original data deaths <- head(deaths, -1)

write.csv(deaths, “data/deaths-crashed.csv”, row.names = FALSE) # save the data that has been cleaned in csv format

After the data is processed from .xlsx format to .csv. Data is read using read_csv function.

df <- read.csv("data/deaths-crashed.csv") 
df
##     year month number deaths_rate vehicle_miles
## 1   2020   Jan   2666        1.06         252.0
## 2   2020   Feb   2674        1.14         234.0
## 3   2020   Mar   2553        1.16         221.0
## 4   2020   Apr   2320        1.40         166.0
## 5   2020   May   3096        1.46         212.0
## 6   2020   Jun   3725        1.51         247.0
## 7   2020   Jul   3789        1.46         260.0
## 8   2020   Aug   3802        1.50         253.0
## 9   2020   Sep   3724        1.51         247.0
## 10  2020   Oct   3793        1.46         259.0
## 11  2020   Nov   3445        1.45         238.0
## 12  2020   Dec   3237        1.34         241.0
## 13  2019   Jan   2670        1.08         248.2
## 14  2019   Feb   2393        1.06         226.7
## 15  2019   Mar   2769        1.02         271.5
## 16  2019   Apr   2820        1.00         281.4
## 17  2019   May   3172        1.11         286.0
## 18  2019   Jun   3201        1.14         280.9
## 19  2019   Jul   3304        1.12         295.6
## 20  2019   Aug   3359        1.17         286.5
## 21  2019   Sep   3331        1.23         271.7
## 22  2019   Oct   3227        1.14         284.0
## 23  2019   Nov   3084        1.18         260.5
## 24  2019   Dec   3025        1.10         274.2
## 25  2018   Jan   3010        1.23         244.7
## 26  2018   Feb   2734        1.20         227.8
## 27  2018   Mar   3015        1.11         270.7
## 28  2018   Apr   2979        1.08         275.1
## 29  2018   May   3443        1.21         283.7
## 30  2018   Jun   3514        1.24         282.6
## 31  2018   Jul   3552        1.22         291.0
## 32  2018   Aug   3490        1.22         285.0
## 33  2018   Sep   3579        1.34         267.4
## 34  2018   Oct   3657        1.30         281.4
## 35  2018   Nov   3250        1.25         260.5
## 36  2018   Dec   3181        1.18         270.4
## 37  2017   Jan   3034        1.24         244.6
## 38  2017   Feb   2748        1.21         226.9
## 39  2017   Mar   3164        1.18         267.4
## 40  2017   Apr   3238        1.19         272.9
## 41  2017   May   3416        1.20         284.0
## 42  2017   Jun   3492        1.24         280.5
## 43  2017   Jul   3730        1.30         287.3
## 44  2017   Aug   3409        1.20         283.2
## 45  2017   Sep   3572        1.36         262.7
## 46  2017   Oct   3629        1.30         278.9
## 47  2017   Nov   3408        1.32         257.7
## 48  2017   Dec   3391        1.27         266.5
## 49  2016   Jan   2734        1.16         236.5
## 50  2016   Feb   2820        1.23         229.0
## 51  2016   Mar   3105        1.15         269.7
## 52  2016   Apr   3152        1.17         268.4
## 53  2016   May   3481        1.26         275.3
## 54  2016   Jun   3542        1.28         277.5
## 55  2016   Jul   3582        1.27         281.4
## 56  2016   Aug   3600        1.29         279.4
## 57  2016   Sep   3612        1.38         261.8
## 58  2016   Oct   3834        1.41         271.7
## 59  2016   Nov   3535        1.37         258.6
## 60  2016   Dec   3330        1.28         260.0
## 61  2015   Jan   2754        1.17         236.0
## 62  2015   Feb   2350        1.07         219.5
## 63  2015   Mar   2764        1.06         260.9
## 64  2015   Apr   2830        1.06         265.9
## 65  2015   May   3339        1.22         273.9
## 66  2015   Jun   3222        1.18         273.7
## 67  2015   Jul   3530        1.25         281.7
## 68  2015   Aug   3642        1.32         275.4
## 69  2015   Sep   3372        1.31         258.0
## 70  2015   Oct   3550        1.31         271.6
## 71  2015   Nov   3159        1.26         251.5
## 72  2015   Dec   3245        1.24         262.3
## 73  2014   Jan   2572        1.14         226.4
## 74  2014   Feb   2248        1.04         215.2
## 75  2014   Mar   2589        1.03         252.1
## 76  2014   Apr   2720        1.05         257.9
## 77  2014   May   3038        1.13         268.1
## 78  2014   Jun   3084        1.16         264.9
## 79  2014   Jul   3227        1.19         272.3
## 80  2014   Aug   3277        1.21         271.0
## 81  2014   Sep   3069        1.23         249.1
## 82  2014   Oct   3304        1.24         267.2
## 83  2014   Nov   3175        1.31         242.8
## 84  2014   Dec   3095        1.22         253.6
## 85  2013   Jan   2642        1.17         226.7
## 86  2013   Feb   2296        1.07         214.5
## 87  2013   Mar   2791        1.12         248.6
## 88  2013   Apr   2719        1.09         250.1
## 89  2013   May   2988        1.14         261.8
## 90  2013   Jun   3181        1.23         258.0
## 91  2013   Jul   3119        1.19         262.8
## 92  2013   Aug   3378        1.27         266.7
## 93  2013   Sep   3184        1.32         241.0
## 94  2013   Oct   3173        1.23         257.6
## 95  2013   Nov   3076        1.29         238.5
## 96  2013   Dec   2822        1.18         239.5
## 97  2012   Jan   2782        1.23         225.7
## 98  2012   Feb   2533        1.16         217.7
## 99  2012   Mar   2954        1.17         252.5
## 100 2012   Apr   2906        1.17         248.3
## 101 2012   May   3145        1.21         259.9
## 102 2012   Jun   3235        1.25         259.0
## 103 2012   Jul   3336        1.29         259.4
## 104 2012   Aug   3267        1.24         263.6
## 105 2012   Sep   3151        1.32         237.9
## 106 2012   Oct   3090        1.22         252.9
## 107 2012   Nov   3022        1.26         239.6
## 108 2012   Dec   2994        1.26         237.6
## 109 2011   Jan   2741        1.24         221.4
## 110 2011   Feb   2236        1.05         212.4
## 111 2011   Mar   2662        1.07         249.1
## 112 2011   Apr   2805        1.13         248.0
## 113 2011   May   3030        1.20         252.8
## 114 2011   Jun   3130        1.22         256.6
## 115 2011   Jul   3417        1.32         258.8
## 116 2011   Aug   3288        1.27         259.2
## 117 2011   Sep   3080        1.28         240.4
## 118 2011   Oct   3336        1.33         250.6
## 119 2011   Nov   2962        1.25         236.9
## 120 2011   Dec   2886        1.19         243.2
## 121 2010   Jan   2515        1.13         223.0
## 122 2010   Feb   2180        1.02         212.8
## 123 2010   Mar   2575        1.01         253.9
## 124 2010   Apr   2996        1.17         256.6
## 125 2010   May   3169        1.22         259.0
## 126 2010   Jun   3035        1.15         262.9
## 127 2010   Jul   3296        1.23         268.5
## 128 2010   Aug   3357        1.25         267.6
## 129 2010   Sep   3155        1.27         247.9
## 130 2010   Oct   3311        1.27         260.0
## 131 2010   Nov   2965        1.22         243.1
## 132 2010   Dec   2778        1.14         243.1
## 133 2009   Jan   2820        1.25         226.4
## 134 2009   Feb   2481        1.13         219.2
## 135 2009   Mar   2833        1.14         249.1
## 136 2009   Apr   3019        1.19         252.9
## 137 2009   May   3272        1.26         260.4
## 138 2009   Jun   3216        1.23         260.5
## 139 2009   Jul   3326        1.24         267.8
## 140 2009   Aug   3353        1.27         263.0
## 141 2009   Sep   3077        1.26         244.1
## 142 2009   Oct   3087        1.21         254.6
## 143 2009   Nov   2984        1.25         239.2
## 144 2009   Dec   2748        1.14         242.0
## 145 2008   Jan   3043        1.33         229.5
## 146 2008   Feb   2988        1.37         217.6
## 147 2008   Mar   2998        1.21         248.2
## 148 2008   Apr   3161        1.27         248.1
## 149 2008   May   3419        1.33         257.1
## 150 2008   Jun   3483        1.38         251.8
## 151 2008   Jul   3513        1.36         257.6
## 152 2008   Aug   3748        1.46         257.1
## 153 2008   Sep   3358        1.43         234.9
## 154 2008   Oct   3562        1.41         252.1
## 155 2008   Nov   3275        1.40         233.2
## 156 2008   Dec   3242        1.36         238.6
## 157 2007   Jan   3227        1.38         233.7
## 158 2007   Feb   3083        1.41         218.7
## 159 2007   Mar   3688        1.42         259.3
## 160 2007   Apr   3566        1.41         252.4
## 161 2007   May   3881        1.45         267.2
## 162 2007   Jun   3860        1.45         265.3
## 163 2007   Jul   4079        1.53         267.0
## 164 2007   Aug   3915        1.44         271.5
## 165 2007   Sep   3790        1.54         246.2
## 166 2007   Oct   3769        1.44         261.6
## 167 2007   Nov   3606        1.47         246.0
## 168 2007   Dec   3481        1.44         240.9
## 169 2006   Jan   3396        1.46         232.9
## 170 2006   Feb   3164        1.43         220.8
## 171 2006   Mar   3567        1.39         256.6
## 172 2006   Apr   3745        1.50         250.4
## 173 2006   May   3952        1.50         264.2
## 174 2006   Jun   3916        1.49         263.2
## 175 2006   Jul   4092        1.56         262.8
## 176 2006   Aug   4088        1.53         266.5
## 177 2006   Sep   3951        1.60         246.4
## 178 2006   Oct   4048        1.57         258.1
## 179 2006   Nov   3704        1.51         245.8
## 180 2006   Dec   3693        1.50         246.3
## 181 2005   Jan   3305        1.47         224.2
## 182 2005   Feb   3042        1.38         220.1
## 183 2005   Mar   3334        1.31         253.6
## 184 2005   Apr   3686        1.47         250.9
## 185 2005   May   3874        1.48         262.1
## 186 2005   Jun   3865        1.46         264.0
## 187 2005   Jul   4391        1.64         267.4
## 188 2005   Aug   4039        1.52         265.3
## 189 2005   Sep   3920        1.62         241.6
## 190 2005   Oct   4209        1.67         252.1
## 191 2005   Nov   3922        1.61         243.5
## 192 2005   Dec   3756        1.53         244.9
## 193 2004   Jan   3410        1.54         222.0
## 194 2004   Feb   3128        1.47         213.5
## 195 2004   Mar   3422        1.36         252.1
## 196 2004   Apr   3567        1.42         251.7
## 197 2004   May   3952        1.54         257.1
## 198 2004   Jun   3960        1.54         257.8
## 199 2004   Jul   4132        1.55         266.2
## 200 2004   Aug   4212        1.60         262.9
## 201 2004   Sep   3733        1.54         242.9
## 202 2004   Oct   3991        1.57         253.7
## 203 2004   Nov   3640        1.53         238.6
## 204 2004   Dec   3786        1.55         243.9
## 205 2003   Jan   3226        1.48         218.3
## 206 2003   Feb   2927        1.44         203.1
## 207 2003   Mar   3307        1.39         237.1
## 208 2003   Apr   3585        1.50         238.9
## 209 2003   May   3758        1.48         253.6
## 210 2003   Jun   3915        1.55         252.1
## 211 2003   Jul   4203        1.61         261.3
## 212 2003   Aug   4201        1.61         260.3
## 213 2003   Sep   3855        1.63         236.4
## 214 2003   Oct   4108        1.62         253.8
## 215 2003   Nov   3919        1.66         235.4
## 216 2003   Dec   3753        1.56         240.6
## 217 2002   Jan   3458        1.60         216.0
## 218 2002   Feb   3063        1.47         208.5
## 219 2002   Mar   3654        1.55         236.3
## 220 2002   Apr   3419        1.44         236.9
## 221 2002   May   3893        1.54         252.0
## 222 2002   Jun   4018        1.62         248.3
## 223 2002   Jul   4251        1.66         256.1
## 224 2002   Aug   4159        1.61         258.3
## 225 2002   Sep   4018        1.72         233.1
## 226 2002   Oct   3972        1.62         245.7
## 227 2002   Nov   3683        1.60         230.8
## 228 2002   Dec   3792        1.62         233.7
## 229 2001   Jan   3083        1.47         209.3
## 230 2001   Feb   2878        1.44         199.9
## 231 2001   Mar   3262        1.41         231.5
## 232 2001   Apr   3428        1.48         231.4
## 233 2001   May   3810        1.56         244.3
## 234 2001   Jun   3728        1.54         242.6
## 235 2001   Jul   4059        1.63         248.8
## 236 2001   Aug   4047        1.61         251.7
## 237 2001   Sep   3752        1.67         224.6
## 238 2001   Oct   4055        1.69         240.0
## 239 2001   Nov   3833        1.67         229.5
## 240 2001   Dec   3853        1.69         228.1
## 241 2000   Jan   3362        1.65         203.6
## 242 2000   Feb   2909        1.46         199.4
## 243 2000   Mar   3301        1.42         232.6
## 244 2000   Apr   3494        1.53         227.8
## 245 2000   May   3697        1.52         242.6
## 246 2000   Jun   3760        1.55         243.0
## 247 2000   Jul   4157        1.70         245.1
## 248 2000   Aug   3984        1.61         247.8
## 249 2000   Sep   3837        1.69         227.3
## 250 2000   Oct   3836        1.62         236.5
## 251 2000   Nov   3462        1.55         222.8
## 252 2000   Dec   3555        1.63         218.4
## 253 1999   Jan   2936        1.51         194.4
## 254 1999   Feb   2750        1.43         192.3
## 255 1999   Mar   3209        1.45         221.7
## 256 1999   Apr   3148        1.42         222.0
## 257 1999   May   3741        1.61         231.8
## 258 1999   Jun   3590        1.51         237.0
## 259 1999   Jul   3910        1.60         244.2
## 260 1999   Aug   4061        1.67         242.6
## 261 1999   Sep   3715        1.65         225.3
## 262 1999   Oct   3945        1.66         237.7
## 263 1999   Nov   3658        1.64         222.8
## 264 1999   Dec   3738        1.68         222.5
## 265 1998   Jan   3265        1.66         196.9
## 266 1998   Feb   2956        1.58         187.2
## 267 1998   Mar   3229        1.51         214.2
## 268 1998   Apr   3216        1.48         217.9
## 269 1998   May   3625        1.59         227.9
## 270 1998   Jun   3733        1.63         228.7
## 271 1998   Jul   4079        1.70         239.9
## 272 1998   Aug   4216        1.78         237.1
## 273 1998   Sep   3741        1.70         219.5
## 274 1998   Oct   3867        1.69         228.5
## 275 1998   Nov   3728        1.77         211.2
## 276 1998   Dec   3846        1.78         216.3
## 277 1997   Jan   3334        1.75         190.1
## 278 1997   Feb   2915        1.59         183.9
## 279 1997   Mar   3218        1.52         211.9
## 280 1997   Apr   3311        1.56         211.9
## 281 1997   May   3906        1.73         226.1
## 282 1997   Jun   3774        1.70         222.2
## 283 1997   Jul   3997        1.69         236.7
## 284 1997   Aug   4108        1.76         233.5
## 285 1997   Sep   3598        1.69         213.5
## 286 1997   Oct   3841        1.74         221.2
## 287 1997   Nov   3693        1.82         202.4
## 288 1997   Dec   3763        1.82         207.3
## 289 1996   Jan   3160        1.72         183.7
## 290 1996   Feb   2951        1.67         176.8
## 291 1996   Mar   3413        1.67         204.5
## 292 1996   Apr   3396        1.65         205.6
## 293 1996   May   3581        1.64         219.0
## 294 1996   Jun   3933        1.82         215.9
## 295 1996   Jul   3852        1.71         225.4
## 296 1996   Aug   3975        1.73         229.4
## 297 1996   Sep   3805        1.83         207.9
## 298 1996   Oct   3941        1.82         216.0
## 299 1996   Nov   3849        1.93         199.9
## 300 1996   Dec   3793        1.88         201.7
## 301 1995   Jan   3136        1.71         183.6
## 302 1995   Feb   2882        1.67         172.5
## 303 1995   Mar   3233        1.60         202.4
## 304 1995   Apr   3298        1.65         199.5
## 305 1995   May   3454        1.62         213.6
## 306 1995   Jun   3762        1.77         212.3
## 307 1995   Jul   3954        1.82         217.7
## 308 1995   Aug   4270        1.94         219.9
## 309 1995   Sep   3943        1.93         204.3
## 310 1995   Oct   4074        1.97         207.0
## 311 1995   Nov   3679        1.88         195.4
## 312 1995   Dec   3678        1.89         194.6
## 313 1994   Jan   2993        1.77         169.3
## 314 1994   Feb   2697        1.62         166.4
## 315 1994   Mar   3224        1.64         196.1
## 316 1994   Apr   3409        1.74         195.4
## 317 1994   May   3657        1.77         206.6
## 318 1994   Jun   3609        1.74         207.3
## 319 1994   Jul   3921        1.83         214.8
## 320 1994   Aug   3958        1.84         215.0
## 321 1994   Sep   3721        1.86         200.5
## 322 1994   Oct   3982        1.96         202.9
## 323 1994   Nov   3614        1.90         190.1
## 324 1994   Dec   3739        1.94         193.1
## 325 1993   Jan   2944        1.71         171.7
## 326 1993   Feb   2691        1.65         162.8
## 327 1993   Mar   3010        1.60         187.8
## 328 1993   Apr   3155        1.67         188.7
## 329 1993   May   3614        1.76         205.9
## 330 1993   Jun   3527        1.77         199.4
## 331 1993   Jul   4050        1.93         209.8
## 332 1993   Aug   4047        1.93         209.6
## 333 1993   Sep   3741        1.93         193.7
## 334 1993   Oct   3948        2.00         197.8
## 335 1993   Nov   3666        2.01         182.3
## 336 1993   Dec   3500        1.87         186.8
## 337 1992   Jan   2999        1.79         167.9
## 338 1992   Feb   2833        1.77         160.4
## 339 1992   Mar   2934        1.59         184.0
## 340 1992   Apr   3069        1.65         186.3
## 341 1992   May   3582        1.82         197.0
## 342 1992   Jun   3659        1.85         197.5
## 343 1992   Jul   3780        1.83         206.9
## 344 1992   Aug   3867        1.89         205.0
## 345 1992   Sep   3499        1.83         191.2
## 346 1992   Oct   3937        2.02         195.1
## 347 1992   Nov   3395        1.91         177.6
## 348 1992   Dec   3428        1.89         181.2

Then, to using this data as time series object we need to combining the month and year attributes so that they can be extracted into yearmon format.

df <- df %>% 
  filter(year > 2015) %>% 
  mutate(
    month=match(month, c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) # convert month name to month number
    ) %>% 
  mutate(year_month=paste0(year,"-", month))

df_ts <- data.frame(dates=as.yearmon(df$year_month), number=df$number, deaths_rate=df$deaths_rate, vehicle_miles=df$vehicle_miles) 
df_ts
##       dates number deaths_rate vehicle_miles
## 1  Jan 2020   2666        1.06         252.0
## 2  Feb 2020   2674        1.14         234.0
## 3  Mar 2020   2553        1.16         221.0
## 4  Apr 2020   2320        1.40         166.0
## 5  May 2020   3096        1.46         212.0
## 6  Jun 2020   3725        1.51         247.0
## 7  Jul 2020   3789        1.46         260.0
## 8  Aug 2020   3802        1.50         253.0
## 9  Sep 2020   3724        1.51         247.0
## 10 Oct 2020   3793        1.46         259.0
## 11 Nov 2020   3445        1.45         238.0
## 12 Dec 2020   3237        1.34         241.0
## 13 Jan 2019   2670        1.08         248.2
## 14 Feb 2019   2393        1.06         226.7
## 15 Mar 2019   2769        1.02         271.5
## 16 Apr 2019   2820        1.00         281.4
## 17 May 2019   3172        1.11         286.0
## 18 Jun 2019   3201        1.14         280.9
## 19 Jul 2019   3304        1.12         295.6
## 20 Aug 2019   3359        1.17         286.5
## 21 Sep 2019   3331        1.23         271.7
## 22 Oct 2019   3227        1.14         284.0
## 23 Nov 2019   3084        1.18         260.5
## 24 Dec 2019   3025        1.10         274.2
## 25 Jan 2018   3010        1.23         244.7
## 26 Feb 2018   2734        1.20         227.8
## 27 Mar 2018   3015        1.11         270.7
## 28 Apr 2018   2979        1.08         275.1
## 29 May 2018   3443        1.21         283.7
## 30 Jun 2018   3514        1.24         282.6
## 31 Jul 2018   3552        1.22         291.0
## 32 Aug 2018   3490        1.22         285.0
## 33 Sep 2018   3579        1.34         267.4
## 34 Oct 2018   3657        1.30         281.4
## 35 Nov 2018   3250        1.25         260.5
## 36 Dec 2018   3181        1.18         270.4
## 37 Jan 2017   3034        1.24         244.6
## 38 Feb 2017   2748        1.21         226.9
## 39 Mar 2017   3164        1.18         267.4
## 40 Apr 2017   3238        1.19         272.9
## 41 May 2017   3416        1.20         284.0
## 42 Jun 2017   3492        1.24         280.5
## 43 Jul 2017   3730        1.30         287.3
## 44 Aug 2017   3409        1.20         283.2
## 45 Sep 2017   3572        1.36         262.7
## 46 Oct 2017   3629        1.30         278.9
## 47 Nov 2017   3408        1.32         257.7
## 48 Dec 2017   3391        1.27         266.5
## 49 Jan 2016   2734        1.16         236.5
## 50 Feb 2016   2820        1.23         229.0
## 51 Mar 2016   3105        1.15         269.7
## 52 Apr 2016   3152        1.17         268.4
## 53 May 2016   3481        1.26         275.3
## 54 Jun 2016   3542        1.28         277.5
## 55 Jul 2016   3582        1.27         281.4
## 56 Aug 2016   3600        1.29         279.4
## 57 Sep 2016   3612        1.38         261.8
## 58 Oct 2016   3834        1.41         271.7
## 59 Nov 2016   3535        1.37         258.6
## 60 Dec 2016   3330        1.28         260.0

Description for each column are: - dates : month-year - number : the number of deaths on that month-year - deaths_rate : death rate per 100 millions vehicle miles - vehicle_miles monthly vehicle miles (billions)

Data Wrangling

Check Missing Values and Order of the Records

Time Series forecasting require ordered and complete interval data. So, we need to check is the data have missing value or not using isNA(). If the data have it, we can perform padding using pad() function from padr package to add missing monthly records and impute them with zero value.

sum(is.na(df_ts))
## [1] 0

The code shows that the data have complete interval data. But, we know that the order of data is still from the newest to the oldest. So, we need to sort by the dates column.

df_ts <- df_ts[order(as.Date(df_ts$dates, format="%m%Y")),]
row.names(df_ts) <- NULL # remove the index
df_ts
##       dates number deaths_rate vehicle_miles
## 1  Jan 2016   2734        1.16         236.5
## 2  Feb 2016   2820        1.23         229.0
## 3  Mar 2016   3105        1.15         269.7
## 4  Apr 2016   3152        1.17         268.4
## 5  May 2016   3481        1.26         275.3
## 6  Jun 2016   3542        1.28         277.5
## 7  Jul 2016   3582        1.27         281.4
## 8  Aug 2016   3600        1.29         279.4
## 9  Sep 2016   3612        1.38         261.8
## 10 Oct 2016   3834        1.41         271.7
## 11 Nov 2016   3535        1.37         258.6
## 12 Dec 2016   3330        1.28         260.0
## 13 Jan 2017   3034        1.24         244.6
## 14 Feb 2017   2748        1.21         226.9
## 15 Mar 2017   3164        1.18         267.4
## 16 Apr 2017   3238        1.19         272.9
## 17 May 2017   3416        1.20         284.0
## 18 Jun 2017   3492        1.24         280.5
## 19 Jul 2017   3730        1.30         287.3
## 20 Aug 2017   3409        1.20         283.2
## 21 Sep 2017   3572        1.36         262.7
## 22 Oct 2017   3629        1.30         278.9
## 23 Nov 2017   3408        1.32         257.7
## 24 Dec 2017   3391        1.27         266.5
## 25 Jan 2018   3010        1.23         244.7
## 26 Feb 2018   2734        1.20         227.8
## 27 Mar 2018   3015        1.11         270.7
## 28 Apr 2018   2979        1.08         275.1
## 29 May 2018   3443        1.21         283.7
## 30 Jun 2018   3514        1.24         282.6
## 31 Jul 2018   3552        1.22         291.0
## 32 Aug 2018   3490        1.22         285.0
## 33 Sep 2018   3579        1.34         267.4
## 34 Oct 2018   3657        1.30         281.4
## 35 Nov 2018   3250        1.25         260.5
## 36 Dec 2018   3181        1.18         270.4
## 37 Jan 2019   2670        1.08         248.2
## 38 Feb 2019   2393        1.06         226.7
## 39 Mar 2019   2769        1.02         271.5
## 40 Apr 2019   2820        1.00         281.4
## 41 May 2019   3172        1.11         286.0
## 42 Jun 2019   3201        1.14         280.9
## 43 Jul 2019   3304        1.12         295.6
## 44 Aug 2019   3359        1.17         286.5
## 45 Sep 2019   3331        1.23         271.7
## 46 Oct 2019   3227        1.14         284.0
## 47 Nov 2019   3084        1.18         260.5
## 48 Dec 2019   3025        1.10         274.2
## 49 Jan 2020   2666        1.06         252.0
## 50 Feb 2020   2674        1.14         234.0
## 51 Mar 2020   2553        1.16         221.0
## 52 Apr 2020   2320        1.40         166.0
## 53 May 2020   3096        1.46         212.0
## 54 Jun 2020   3725        1.51         247.0
## 55 Jul 2020   3789        1.46         260.0
## 56 Aug 2020   3802        1.50         253.0
## 57 Sep 2020   3724        1.51         247.0
## 58 Oct 2020   3793        1.46         259.0
## 59 Nov 2020   3445        1.45         238.0
## 60 Dec 2020   3237        1.34         241.0

Feature Engineering

There are various possibility of seasonality in our data. In this analysis, I will try to provide monthly data of number of deaths and analyze it as a time series object. The data contain records of 19 year and therefore maybe this is available for yearly analysis.

range(df_ts$dates)
## [1] "Jan 2016" "Dec 2020"

Exploratory Data Analysis and Create TS Object

df_ts_clean <- df_ts %>% 
  select(dates, number) 
df_ts_clean
##       dates number
## 1  Jan 2016   2734
## 2  Feb 2016   2820
## 3  Mar 2016   3105
## 4  Apr 2016   3152
## 5  May 2016   3481
## 6  Jun 2016   3542
## 7  Jul 2016   3582
## 8  Aug 2016   3600
## 9  Sep 2016   3612
## 10 Oct 2016   3834
## 11 Nov 2016   3535
## 12 Dec 2016   3330
## 13 Jan 2017   3034
## 14 Feb 2017   2748
## 15 Mar 2017   3164
## 16 Apr 2017   3238
## 17 May 2017   3416
## 18 Jun 2017   3492
## 19 Jul 2017   3730
## 20 Aug 2017   3409
## 21 Sep 2017   3572
## 22 Oct 2017   3629
## 23 Nov 2017   3408
## 24 Dec 2017   3391
## 25 Jan 2018   3010
## 26 Feb 2018   2734
## 27 Mar 2018   3015
## 28 Apr 2018   2979
## 29 May 2018   3443
## 30 Jun 2018   3514
## 31 Jul 2018   3552
## 32 Aug 2018   3490
## 33 Sep 2018   3579
## 34 Oct 2018   3657
## 35 Nov 2018   3250
## 36 Dec 2018   3181
## 37 Jan 2019   2670
## 38 Feb 2019   2393
## 39 Mar 2019   2769
## 40 Apr 2019   2820
## 41 May 2019   3172
## 42 Jun 2019   3201
## 43 Jul 2019   3304
## 44 Aug 2019   3359
## 45 Sep 2019   3331
## 46 Oct 2019   3227
## 47 Nov 2019   3084
## 48 Dec 2019   3025
## 49 Jan 2020   2666
## 50 Feb 2020   2674
## 51 Mar 2020   2553
## 52 Apr 2020   2320
## 53 May 2020   3096
## 54 Jun 2020   3725
## 55 Jul 2020   3789
## 56 Aug 2020   3802
## 57 Sep 2020   3724
## 58 Oct 2020   3793
## 59 Nov 2020   3445
## 60 Dec 2020   3237

The four seasons of the year in the United States - spring, summer, fall, and winter. In this case, the number of deaths may be based on the season. Most of the variation in monthly motor vehicle deaths is associated with the number of vehicles traveling. Such as, during the winter months, vehicle mileage and deaths decrease, while in the summer months, vehicle mileage and deaths increase.

plot(df_ts_clean$dates, df_ts_clean$number, type="l")

# initialize ts object
deaths_ts <- ts(
  data=df_ts_clean,
  frequency=12, # trisemester seasonality (spring, summer, fall, and winter)
)
# decompose ts object
deaths_decompose <- decompose(deaths_ts)
autoplot(deaths_decompose)