knitr::include_graphics("assets/motor-vehicle-crashes.jpg")
Life changed in many ways because of the COVID-19 pandemic in the our world, including in United States. In addition to the COVID-19 deaths, preventable injury related deaths also experienced a dramatic increase in 2020. One component of this increase was motor-vehicle deaths. Roads became less safe in 2020 for a variety of reasons, including an increase in non-restrained occupant deaths, speeding, and alcohol impaired fatal crashes in United States. Data by NSC Injury Facts shows that after three consecutive years of decreases, deaths increased 8.3%; 42,338 people died in motor-vehicle crashes in 2020 compared to 39,107 in 2019.
On other hand, time series is a method of analyzing and processing data which the values are affected by time. The action of predicting future values based on its value in the previous period of time is called forecasting. The data which formatted into a time series (ts) object must have some characteristics: (1) no missing intervals, (2) no missing values, (3) data should be ordered by time.
Because of that, time series forecasting will be used to determine the prediction of the number of accidents in the next few years. After that, an analysis will also be carried out on how the seasonal pattern is owned. That way, the results carried out can be a reference and concern for the wider community, and it is hoped that the number of accidents in the United States can decrease again.
The National Safety Council (NSC) and the National Highway Traffic Safety Administration (NHTSA) count motor-vehicle crash deaths monthly in 1992 - 2020 https://injuryfacts.nsc.org/motor-vehicle/overview/crashes-by-month/. There are several attribute that provided, such as the number of deaths, vehicle miles, and death rate per 100 millions vehicle miles. The mileage death rate helps correct for the variation in monthly mileage and is a more accurate measure of risk when comparing trends. Much of the variation in monthly motor-vehicle fatalities is associated with the number of vehicle miles.
The following are the packages used during the time series work:
library(readxl) # read data
## Warning: package 'readxl' was built under R version 4.2.2
library(dplyr) # for data wrangling
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate) # to deal with date
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(xts) # to deal with date
## Warning: package 'xts' was built under R version 4.2.2
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
library(padr) # for padding
## Warning: package 'padr' was built under R version 4.2.2
library(forecast) # time series library
## Warning: package 'forecast' was built under R version 4.2.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(tseries) # for adf.test
## Warning: package 'tseries' was built under R version 4.2.2
library(MLmetrics) # calculate error
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
## Recall
The data was obtained from https://injuryfacts.nsc.org/motor-vehicle/overview/crashes-by-month/ about the number of deaths, vehicle miles, and death rate per 100 millions vehicle miles of motor-vehicle crash deaths monthly in 1992 - 2020 in United Stater.
knitr::opts_chunk$set(eval = FALSE)
deaths <- read_excel(“data/deaths-by-month-crashed.xlsx”)
deaths <- deaths[-c(1:3), ] colnames(deaths) <- c(‘year’, ‘month’, ‘number’, ‘deaths_rate’, ‘vehicle_miles’) deaths <- deaths[!grepl(“Total”, deaths$year),] # don’t include the summary of total deaths per year from the original data deaths <- head(deaths, -1)
write.csv(deaths, “data/deaths-crashed.csv”, row.names = FALSE) # save the data that has been cleaned in csv format
After the data is processed from .xlsx format to .csv. Data is read
using read_csv function.
df <- read.csv("data/deaths-crashed.csv")
df
## year month number deaths_rate vehicle_miles
## 1 2020 Jan 2666 1.06 252.0
## 2 2020 Feb 2674 1.14 234.0
## 3 2020 Mar 2553 1.16 221.0
## 4 2020 Apr 2320 1.40 166.0
## 5 2020 May 3096 1.46 212.0
## 6 2020 Jun 3725 1.51 247.0
## 7 2020 Jul 3789 1.46 260.0
## 8 2020 Aug 3802 1.50 253.0
## 9 2020 Sep 3724 1.51 247.0
## 10 2020 Oct 3793 1.46 259.0
## 11 2020 Nov 3445 1.45 238.0
## 12 2020 Dec 3237 1.34 241.0
## 13 2019 Jan 2670 1.08 248.2
## 14 2019 Feb 2393 1.06 226.7
## 15 2019 Mar 2769 1.02 271.5
## 16 2019 Apr 2820 1.00 281.4
## 17 2019 May 3172 1.11 286.0
## 18 2019 Jun 3201 1.14 280.9
## 19 2019 Jul 3304 1.12 295.6
## 20 2019 Aug 3359 1.17 286.5
## 21 2019 Sep 3331 1.23 271.7
## 22 2019 Oct 3227 1.14 284.0
## 23 2019 Nov 3084 1.18 260.5
## 24 2019 Dec 3025 1.10 274.2
## 25 2018 Jan 3010 1.23 244.7
## 26 2018 Feb 2734 1.20 227.8
## 27 2018 Mar 3015 1.11 270.7
## 28 2018 Apr 2979 1.08 275.1
## 29 2018 May 3443 1.21 283.7
## 30 2018 Jun 3514 1.24 282.6
## 31 2018 Jul 3552 1.22 291.0
## 32 2018 Aug 3490 1.22 285.0
## 33 2018 Sep 3579 1.34 267.4
## 34 2018 Oct 3657 1.30 281.4
## 35 2018 Nov 3250 1.25 260.5
## 36 2018 Dec 3181 1.18 270.4
## 37 2017 Jan 3034 1.24 244.6
## 38 2017 Feb 2748 1.21 226.9
## 39 2017 Mar 3164 1.18 267.4
## 40 2017 Apr 3238 1.19 272.9
## 41 2017 May 3416 1.20 284.0
## 42 2017 Jun 3492 1.24 280.5
## 43 2017 Jul 3730 1.30 287.3
## 44 2017 Aug 3409 1.20 283.2
## 45 2017 Sep 3572 1.36 262.7
## 46 2017 Oct 3629 1.30 278.9
## 47 2017 Nov 3408 1.32 257.7
## 48 2017 Dec 3391 1.27 266.5
## 49 2016 Jan 2734 1.16 236.5
## 50 2016 Feb 2820 1.23 229.0
## 51 2016 Mar 3105 1.15 269.7
## 52 2016 Apr 3152 1.17 268.4
## 53 2016 May 3481 1.26 275.3
## 54 2016 Jun 3542 1.28 277.5
## 55 2016 Jul 3582 1.27 281.4
## 56 2016 Aug 3600 1.29 279.4
## 57 2016 Sep 3612 1.38 261.8
## 58 2016 Oct 3834 1.41 271.7
## 59 2016 Nov 3535 1.37 258.6
## 60 2016 Dec 3330 1.28 260.0
## 61 2015 Jan 2754 1.17 236.0
## 62 2015 Feb 2350 1.07 219.5
## 63 2015 Mar 2764 1.06 260.9
## 64 2015 Apr 2830 1.06 265.9
## 65 2015 May 3339 1.22 273.9
## 66 2015 Jun 3222 1.18 273.7
## 67 2015 Jul 3530 1.25 281.7
## 68 2015 Aug 3642 1.32 275.4
## 69 2015 Sep 3372 1.31 258.0
## 70 2015 Oct 3550 1.31 271.6
## 71 2015 Nov 3159 1.26 251.5
## 72 2015 Dec 3245 1.24 262.3
## 73 2014 Jan 2572 1.14 226.4
## 74 2014 Feb 2248 1.04 215.2
## 75 2014 Mar 2589 1.03 252.1
## 76 2014 Apr 2720 1.05 257.9
## 77 2014 May 3038 1.13 268.1
## 78 2014 Jun 3084 1.16 264.9
## 79 2014 Jul 3227 1.19 272.3
## 80 2014 Aug 3277 1.21 271.0
## 81 2014 Sep 3069 1.23 249.1
## 82 2014 Oct 3304 1.24 267.2
## 83 2014 Nov 3175 1.31 242.8
## 84 2014 Dec 3095 1.22 253.6
## 85 2013 Jan 2642 1.17 226.7
## 86 2013 Feb 2296 1.07 214.5
## 87 2013 Mar 2791 1.12 248.6
## 88 2013 Apr 2719 1.09 250.1
## 89 2013 May 2988 1.14 261.8
## 90 2013 Jun 3181 1.23 258.0
## 91 2013 Jul 3119 1.19 262.8
## 92 2013 Aug 3378 1.27 266.7
## 93 2013 Sep 3184 1.32 241.0
## 94 2013 Oct 3173 1.23 257.6
## 95 2013 Nov 3076 1.29 238.5
## 96 2013 Dec 2822 1.18 239.5
## 97 2012 Jan 2782 1.23 225.7
## 98 2012 Feb 2533 1.16 217.7
## 99 2012 Mar 2954 1.17 252.5
## 100 2012 Apr 2906 1.17 248.3
## 101 2012 May 3145 1.21 259.9
## 102 2012 Jun 3235 1.25 259.0
## 103 2012 Jul 3336 1.29 259.4
## 104 2012 Aug 3267 1.24 263.6
## 105 2012 Sep 3151 1.32 237.9
## 106 2012 Oct 3090 1.22 252.9
## 107 2012 Nov 3022 1.26 239.6
## 108 2012 Dec 2994 1.26 237.6
## 109 2011 Jan 2741 1.24 221.4
## 110 2011 Feb 2236 1.05 212.4
## 111 2011 Mar 2662 1.07 249.1
## 112 2011 Apr 2805 1.13 248.0
## 113 2011 May 3030 1.20 252.8
## 114 2011 Jun 3130 1.22 256.6
## 115 2011 Jul 3417 1.32 258.8
## 116 2011 Aug 3288 1.27 259.2
## 117 2011 Sep 3080 1.28 240.4
## 118 2011 Oct 3336 1.33 250.6
## 119 2011 Nov 2962 1.25 236.9
## 120 2011 Dec 2886 1.19 243.2
## 121 2010 Jan 2515 1.13 223.0
## 122 2010 Feb 2180 1.02 212.8
## 123 2010 Mar 2575 1.01 253.9
## 124 2010 Apr 2996 1.17 256.6
## 125 2010 May 3169 1.22 259.0
## 126 2010 Jun 3035 1.15 262.9
## 127 2010 Jul 3296 1.23 268.5
## 128 2010 Aug 3357 1.25 267.6
## 129 2010 Sep 3155 1.27 247.9
## 130 2010 Oct 3311 1.27 260.0
## 131 2010 Nov 2965 1.22 243.1
## 132 2010 Dec 2778 1.14 243.1
## 133 2009 Jan 2820 1.25 226.4
## 134 2009 Feb 2481 1.13 219.2
## 135 2009 Mar 2833 1.14 249.1
## 136 2009 Apr 3019 1.19 252.9
## 137 2009 May 3272 1.26 260.4
## 138 2009 Jun 3216 1.23 260.5
## 139 2009 Jul 3326 1.24 267.8
## 140 2009 Aug 3353 1.27 263.0
## 141 2009 Sep 3077 1.26 244.1
## 142 2009 Oct 3087 1.21 254.6
## 143 2009 Nov 2984 1.25 239.2
## 144 2009 Dec 2748 1.14 242.0
## 145 2008 Jan 3043 1.33 229.5
## 146 2008 Feb 2988 1.37 217.6
## 147 2008 Mar 2998 1.21 248.2
## 148 2008 Apr 3161 1.27 248.1
## 149 2008 May 3419 1.33 257.1
## 150 2008 Jun 3483 1.38 251.8
## 151 2008 Jul 3513 1.36 257.6
## 152 2008 Aug 3748 1.46 257.1
## 153 2008 Sep 3358 1.43 234.9
## 154 2008 Oct 3562 1.41 252.1
## 155 2008 Nov 3275 1.40 233.2
## 156 2008 Dec 3242 1.36 238.6
## 157 2007 Jan 3227 1.38 233.7
## 158 2007 Feb 3083 1.41 218.7
## 159 2007 Mar 3688 1.42 259.3
## 160 2007 Apr 3566 1.41 252.4
## 161 2007 May 3881 1.45 267.2
## 162 2007 Jun 3860 1.45 265.3
## 163 2007 Jul 4079 1.53 267.0
## 164 2007 Aug 3915 1.44 271.5
## 165 2007 Sep 3790 1.54 246.2
## 166 2007 Oct 3769 1.44 261.6
## 167 2007 Nov 3606 1.47 246.0
## 168 2007 Dec 3481 1.44 240.9
## 169 2006 Jan 3396 1.46 232.9
## 170 2006 Feb 3164 1.43 220.8
## 171 2006 Mar 3567 1.39 256.6
## 172 2006 Apr 3745 1.50 250.4
## 173 2006 May 3952 1.50 264.2
## 174 2006 Jun 3916 1.49 263.2
## 175 2006 Jul 4092 1.56 262.8
## 176 2006 Aug 4088 1.53 266.5
## 177 2006 Sep 3951 1.60 246.4
## 178 2006 Oct 4048 1.57 258.1
## 179 2006 Nov 3704 1.51 245.8
## 180 2006 Dec 3693 1.50 246.3
## 181 2005 Jan 3305 1.47 224.2
## 182 2005 Feb 3042 1.38 220.1
## 183 2005 Mar 3334 1.31 253.6
## 184 2005 Apr 3686 1.47 250.9
## 185 2005 May 3874 1.48 262.1
## 186 2005 Jun 3865 1.46 264.0
## 187 2005 Jul 4391 1.64 267.4
## 188 2005 Aug 4039 1.52 265.3
## 189 2005 Sep 3920 1.62 241.6
## 190 2005 Oct 4209 1.67 252.1
## 191 2005 Nov 3922 1.61 243.5
## 192 2005 Dec 3756 1.53 244.9
## 193 2004 Jan 3410 1.54 222.0
## 194 2004 Feb 3128 1.47 213.5
## 195 2004 Mar 3422 1.36 252.1
## 196 2004 Apr 3567 1.42 251.7
## 197 2004 May 3952 1.54 257.1
## 198 2004 Jun 3960 1.54 257.8
## 199 2004 Jul 4132 1.55 266.2
## 200 2004 Aug 4212 1.60 262.9
## 201 2004 Sep 3733 1.54 242.9
## 202 2004 Oct 3991 1.57 253.7
## 203 2004 Nov 3640 1.53 238.6
## 204 2004 Dec 3786 1.55 243.9
## 205 2003 Jan 3226 1.48 218.3
## 206 2003 Feb 2927 1.44 203.1
## 207 2003 Mar 3307 1.39 237.1
## 208 2003 Apr 3585 1.50 238.9
## 209 2003 May 3758 1.48 253.6
## 210 2003 Jun 3915 1.55 252.1
## 211 2003 Jul 4203 1.61 261.3
## 212 2003 Aug 4201 1.61 260.3
## 213 2003 Sep 3855 1.63 236.4
## 214 2003 Oct 4108 1.62 253.8
## 215 2003 Nov 3919 1.66 235.4
## 216 2003 Dec 3753 1.56 240.6
## 217 2002 Jan 3458 1.60 216.0
## 218 2002 Feb 3063 1.47 208.5
## 219 2002 Mar 3654 1.55 236.3
## 220 2002 Apr 3419 1.44 236.9
## 221 2002 May 3893 1.54 252.0
## 222 2002 Jun 4018 1.62 248.3
## 223 2002 Jul 4251 1.66 256.1
## 224 2002 Aug 4159 1.61 258.3
## 225 2002 Sep 4018 1.72 233.1
## 226 2002 Oct 3972 1.62 245.7
## 227 2002 Nov 3683 1.60 230.8
## 228 2002 Dec 3792 1.62 233.7
## 229 2001 Jan 3083 1.47 209.3
## 230 2001 Feb 2878 1.44 199.9
## 231 2001 Mar 3262 1.41 231.5
## 232 2001 Apr 3428 1.48 231.4
## 233 2001 May 3810 1.56 244.3
## 234 2001 Jun 3728 1.54 242.6
## 235 2001 Jul 4059 1.63 248.8
## 236 2001 Aug 4047 1.61 251.7
## 237 2001 Sep 3752 1.67 224.6
## 238 2001 Oct 4055 1.69 240.0
## 239 2001 Nov 3833 1.67 229.5
## 240 2001 Dec 3853 1.69 228.1
## 241 2000 Jan 3362 1.65 203.6
## 242 2000 Feb 2909 1.46 199.4
## 243 2000 Mar 3301 1.42 232.6
## 244 2000 Apr 3494 1.53 227.8
## 245 2000 May 3697 1.52 242.6
## 246 2000 Jun 3760 1.55 243.0
## 247 2000 Jul 4157 1.70 245.1
## 248 2000 Aug 3984 1.61 247.8
## 249 2000 Sep 3837 1.69 227.3
## 250 2000 Oct 3836 1.62 236.5
## 251 2000 Nov 3462 1.55 222.8
## 252 2000 Dec 3555 1.63 218.4
## 253 1999 Jan 2936 1.51 194.4
## 254 1999 Feb 2750 1.43 192.3
## 255 1999 Mar 3209 1.45 221.7
## 256 1999 Apr 3148 1.42 222.0
## 257 1999 May 3741 1.61 231.8
## 258 1999 Jun 3590 1.51 237.0
## 259 1999 Jul 3910 1.60 244.2
## 260 1999 Aug 4061 1.67 242.6
## 261 1999 Sep 3715 1.65 225.3
## 262 1999 Oct 3945 1.66 237.7
## 263 1999 Nov 3658 1.64 222.8
## 264 1999 Dec 3738 1.68 222.5
## 265 1998 Jan 3265 1.66 196.9
## 266 1998 Feb 2956 1.58 187.2
## 267 1998 Mar 3229 1.51 214.2
## 268 1998 Apr 3216 1.48 217.9
## 269 1998 May 3625 1.59 227.9
## 270 1998 Jun 3733 1.63 228.7
## 271 1998 Jul 4079 1.70 239.9
## 272 1998 Aug 4216 1.78 237.1
## 273 1998 Sep 3741 1.70 219.5
## 274 1998 Oct 3867 1.69 228.5
## 275 1998 Nov 3728 1.77 211.2
## 276 1998 Dec 3846 1.78 216.3
## 277 1997 Jan 3334 1.75 190.1
## 278 1997 Feb 2915 1.59 183.9
## 279 1997 Mar 3218 1.52 211.9
## 280 1997 Apr 3311 1.56 211.9
## 281 1997 May 3906 1.73 226.1
## 282 1997 Jun 3774 1.70 222.2
## 283 1997 Jul 3997 1.69 236.7
## 284 1997 Aug 4108 1.76 233.5
## 285 1997 Sep 3598 1.69 213.5
## 286 1997 Oct 3841 1.74 221.2
## 287 1997 Nov 3693 1.82 202.4
## 288 1997 Dec 3763 1.82 207.3
## 289 1996 Jan 3160 1.72 183.7
## 290 1996 Feb 2951 1.67 176.8
## 291 1996 Mar 3413 1.67 204.5
## 292 1996 Apr 3396 1.65 205.6
## 293 1996 May 3581 1.64 219.0
## 294 1996 Jun 3933 1.82 215.9
## 295 1996 Jul 3852 1.71 225.4
## 296 1996 Aug 3975 1.73 229.4
## 297 1996 Sep 3805 1.83 207.9
## 298 1996 Oct 3941 1.82 216.0
## 299 1996 Nov 3849 1.93 199.9
## 300 1996 Dec 3793 1.88 201.7
## 301 1995 Jan 3136 1.71 183.6
## 302 1995 Feb 2882 1.67 172.5
## 303 1995 Mar 3233 1.60 202.4
## 304 1995 Apr 3298 1.65 199.5
## 305 1995 May 3454 1.62 213.6
## 306 1995 Jun 3762 1.77 212.3
## 307 1995 Jul 3954 1.82 217.7
## 308 1995 Aug 4270 1.94 219.9
## 309 1995 Sep 3943 1.93 204.3
## 310 1995 Oct 4074 1.97 207.0
## 311 1995 Nov 3679 1.88 195.4
## 312 1995 Dec 3678 1.89 194.6
## 313 1994 Jan 2993 1.77 169.3
## 314 1994 Feb 2697 1.62 166.4
## 315 1994 Mar 3224 1.64 196.1
## 316 1994 Apr 3409 1.74 195.4
## 317 1994 May 3657 1.77 206.6
## 318 1994 Jun 3609 1.74 207.3
## 319 1994 Jul 3921 1.83 214.8
## 320 1994 Aug 3958 1.84 215.0
## 321 1994 Sep 3721 1.86 200.5
## 322 1994 Oct 3982 1.96 202.9
## 323 1994 Nov 3614 1.90 190.1
## 324 1994 Dec 3739 1.94 193.1
## 325 1993 Jan 2944 1.71 171.7
## 326 1993 Feb 2691 1.65 162.8
## 327 1993 Mar 3010 1.60 187.8
## 328 1993 Apr 3155 1.67 188.7
## 329 1993 May 3614 1.76 205.9
## 330 1993 Jun 3527 1.77 199.4
## 331 1993 Jul 4050 1.93 209.8
## 332 1993 Aug 4047 1.93 209.6
## 333 1993 Sep 3741 1.93 193.7
## 334 1993 Oct 3948 2.00 197.8
## 335 1993 Nov 3666 2.01 182.3
## 336 1993 Dec 3500 1.87 186.8
## 337 1992 Jan 2999 1.79 167.9
## 338 1992 Feb 2833 1.77 160.4
## 339 1992 Mar 2934 1.59 184.0
## 340 1992 Apr 3069 1.65 186.3
## 341 1992 May 3582 1.82 197.0
## 342 1992 Jun 3659 1.85 197.5
## 343 1992 Jul 3780 1.83 206.9
## 344 1992 Aug 3867 1.89 205.0
## 345 1992 Sep 3499 1.83 191.2
## 346 1992 Oct 3937 2.02 195.1
## 347 1992 Nov 3395 1.91 177.6
## 348 1992 Dec 3428 1.89 181.2
Then, to using this data as time series object we need to combining
the month and year attributes so that they can be extracted into yearmon
format.
df <- df %>%
filter(year > 2015) %>%
mutate(
month=match(month, c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) # convert month name to month number
) %>%
mutate(year_month=paste0(year,"-", month))
df_ts <- data.frame(dates=as.yearmon(df$year_month), number=df$number, deaths_rate=df$deaths_rate, vehicle_miles=df$vehicle_miles)
df_ts
## dates number deaths_rate vehicle_miles
## 1 Jan 2020 2666 1.06 252.0
## 2 Feb 2020 2674 1.14 234.0
## 3 Mar 2020 2553 1.16 221.0
## 4 Apr 2020 2320 1.40 166.0
## 5 May 2020 3096 1.46 212.0
## 6 Jun 2020 3725 1.51 247.0
## 7 Jul 2020 3789 1.46 260.0
## 8 Aug 2020 3802 1.50 253.0
## 9 Sep 2020 3724 1.51 247.0
## 10 Oct 2020 3793 1.46 259.0
## 11 Nov 2020 3445 1.45 238.0
## 12 Dec 2020 3237 1.34 241.0
## 13 Jan 2019 2670 1.08 248.2
## 14 Feb 2019 2393 1.06 226.7
## 15 Mar 2019 2769 1.02 271.5
## 16 Apr 2019 2820 1.00 281.4
## 17 May 2019 3172 1.11 286.0
## 18 Jun 2019 3201 1.14 280.9
## 19 Jul 2019 3304 1.12 295.6
## 20 Aug 2019 3359 1.17 286.5
## 21 Sep 2019 3331 1.23 271.7
## 22 Oct 2019 3227 1.14 284.0
## 23 Nov 2019 3084 1.18 260.5
## 24 Dec 2019 3025 1.10 274.2
## 25 Jan 2018 3010 1.23 244.7
## 26 Feb 2018 2734 1.20 227.8
## 27 Mar 2018 3015 1.11 270.7
## 28 Apr 2018 2979 1.08 275.1
## 29 May 2018 3443 1.21 283.7
## 30 Jun 2018 3514 1.24 282.6
## 31 Jul 2018 3552 1.22 291.0
## 32 Aug 2018 3490 1.22 285.0
## 33 Sep 2018 3579 1.34 267.4
## 34 Oct 2018 3657 1.30 281.4
## 35 Nov 2018 3250 1.25 260.5
## 36 Dec 2018 3181 1.18 270.4
## 37 Jan 2017 3034 1.24 244.6
## 38 Feb 2017 2748 1.21 226.9
## 39 Mar 2017 3164 1.18 267.4
## 40 Apr 2017 3238 1.19 272.9
## 41 May 2017 3416 1.20 284.0
## 42 Jun 2017 3492 1.24 280.5
## 43 Jul 2017 3730 1.30 287.3
## 44 Aug 2017 3409 1.20 283.2
## 45 Sep 2017 3572 1.36 262.7
## 46 Oct 2017 3629 1.30 278.9
## 47 Nov 2017 3408 1.32 257.7
## 48 Dec 2017 3391 1.27 266.5
## 49 Jan 2016 2734 1.16 236.5
## 50 Feb 2016 2820 1.23 229.0
## 51 Mar 2016 3105 1.15 269.7
## 52 Apr 2016 3152 1.17 268.4
## 53 May 2016 3481 1.26 275.3
## 54 Jun 2016 3542 1.28 277.5
## 55 Jul 2016 3582 1.27 281.4
## 56 Aug 2016 3600 1.29 279.4
## 57 Sep 2016 3612 1.38 261.8
## 58 Oct 2016 3834 1.41 271.7
## 59 Nov 2016 3535 1.37 258.6
## 60 Dec 2016 3330 1.28 260.0
Description for each column are: - dates : month-year -
number : the number of deaths on that month-year -
deaths_rate : death rate per 100 millions vehicle miles -
vehicle_miles monthly vehicle miles (billions)
Time Series forecasting require ordered and complete interval data.
So, we need to check is the data have missing value or not using
isNA(). If the data have it, we can perform padding using
pad() function from padr package to add missing monthly
records and impute them with zero value.
sum(is.na(df_ts))
## [1] 0
The code shows that the data have complete interval data. But, we
know that the order of data is still from the newest to the oldest. So,
we need to sort by the dates column.
df_ts <- df_ts[order(as.Date(df_ts$dates, format="%m%Y")),]
row.names(df_ts) <- NULL # remove the index
df_ts
## dates number deaths_rate vehicle_miles
## 1 Jan 2016 2734 1.16 236.5
## 2 Feb 2016 2820 1.23 229.0
## 3 Mar 2016 3105 1.15 269.7
## 4 Apr 2016 3152 1.17 268.4
## 5 May 2016 3481 1.26 275.3
## 6 Jun 2016 3542 1.28 277.5
## 7 Jul 2016 3582 1.27 281.4
## 8 Aug 2016 3600 1.29 279.4
## 9 Sep 2016 3612 1.38 261.8
## 10 Oct 2016 3834 1.41 271.7
## 11 Nov 2016 3535 1.37 258.6
## 12 Dec 2016 3330 1.28 260.0
## 13 Jan 2017 3034 1.24 244.6
## 14 Feb 2017 2748 1.21 226.9
## 15 Mar 2017 3164 1.18 267.4
## 16 Apr 2017 3238 1.19 272.9
## 17 May 2017 3416 1.20 284.0
## 18 Jun 2017 3492 1.24 280.5
## 19 Jul 2017 3730 1.30 287.3
## 20 Aug 2017 3409 1.20 283.2
## 21 Sep 2017 3572 1.36 262.7
## 22 Oct 2017 3629 1.30 278.9
## 23 Nov 2017 3408 1.32 257.7
## 24 Dec 2017 3391 1.27 266.5
## 25 Jan 2018 3010 1.23 244.7
## 26 Feb 2018 2734 1.20 227.8
## 27 Mar 2018 3015 1.11 270.7
## 28 Apr 2018 2979 1.08 275.1
## 29 May 2018 3443 1.21 283.7
## 30 Jun 2018 3514 1.24 282.6
## 31 Jul 2018 3552 1.22 291.0
## 32 Aug 2018 3490 1.22 285.0
## 33 Sep 2018 3579 1.34 267.4
## 34 Oct 2018 3657 1.30 281.4
## 35 Nov 2018 3250 1.25 260.5
## 36 Dec 2018 3181 1.18 270.4
## 37 Jan 2019 2670 1.08 248.2
## 38 Feb 2019 2393 1.06 226.7
## 39 Mar 2019 2769 1.02 271.5
## 40 Apr 2019 2820 1.00 281.4
## 41 May 2019 3172 1.11 286.0
## 42 Jun 2019 3201 1.14 280.9
## 43 Jul 2019 3304 1.12 295.6
## 44 Aug 2019 3359 1.17 286.5
## 45 Sep 2019 3331 1.23 271.7
## 46 Oct 2019 3227 1.14 284.0
## 47 Nov 2019 3084 1.18 260.5
## 48 Dec 2019 3025 1.10 274.2
## 49 Jan 2020 2666 1.06 252.0
## 50 Feb 2020 2674 1.14 234.0
## 51 Mar 2020 2553 1.16 221.0
## 52 Apr 2020 2320 1.40 166.0
## 53 May 2020 3096 1.46 212.0
## 54 Jun 2020 3725 1.51 247.0
## 55 Jul 2020 3789 1.46 260.0
## 56 Aug 2020 3802 1.50 253.0
## 57 Sep 2020 3724 1.51 247.0
## 58 Oct 2020 3793 1.46 259.0
## 59 Nov 2020 3445 1.45 238.0
## 60 Dec 2020 3237 1.34 241.0
There are various possibility of seasonality in our data. In this analysis, I will try to provide monthly data of number of deaths and analyze it as a time series object. The data contain records of 19 year and therefore maybe this is available for yearly analysis.
range(df_ts$dates)
## [1] "Jan 2016" "Dec 2020"
df_ts_clean <- df_ts %>%
select(dates, number)
df_ts_clean
## dates number
## 1 Jan 2016 2734
## 2 Feb 2016 2820
## 3 Mar 2016 3105
## 4 Apr 2016 3152
## 5 May 2016 3481
## 6 Jun 2016 3542
## 7 Jul 2016 3582
## 8 Aug 2016 3600
## 9 Sep 2016 3612
## 10 Oct 2016 3834
## 11 Nov 2016 3535
## 12 Dec 2016 3330
## 13 Jan 2017 3034
## 14 Feb 2017 2748
## 15 Mar 2017 3164
## 16 Apr 2017 3238
## 17 May 2017 3416
## 18 Jun 2017 3492
## 19 Jul 2017 3730
## 20 Aug 2017 3409
## 21 Sep 2017 3572
## 22 Oct 2017 3629
## 23 Nov 2017 3408
## 24 Dec 2017 3391
## 25 Jan 2018 3010
## 26 Feb 2018 2734
## 27 Mar 2018 3015
## 28 Apr 2018 2979
## 29 May 2018 3443
## 30 Jun 2018 3514
## 31 Jul 2018 3552
## 32 Aug 2018 3490
## 33 Sep 2018 3579
## 34 Oct 2018 3657
## 35 Nov 2018 3250
## 36 Dec 2018 3181
## 37 Jan 2019 2670
## 38 Feb 2019 2393
## 39 Mar 2019 2769
## 40 Apr 2019 2820
## 41 May 2019 3172
## 42 Jun 2019 3201
## 43 Jul 2019 3304
## 44 Aug 2019 3359
## 45 Sep 2019 3331
## 46 Oct 2019 3227
## 47 Nov 2019 3084
## 48 Dec 2019 3025
## 49 Jan 2020 2666
## 50 Feb 2020 2674
## 51 Mar 2020 2553
## 52 Apr 2020 2320
## 53 May 2020 3096
## 54 Jun 2020 3725
## 55 Jul 2020 3789
## 56 Aug 2020 3802
## 57 Sep 2020 3724
## 58 Oct 2020 3793
## 59 Nov 2020 3445
## 60 Dec 2020 3237
The four seasons of the year in the United States - spring, summer, fall, and winter. In this case, the number of deaths may be based on the season. Most of the variation in monthly motor vehicle deaths is associated with the number of vehicles traveling. Such as, during the winter months, vehicle mileage and deaths decrease, while in the summer months, vehicle mileage and deaths increase.
plot(df_ts_clean$dates, df_ts_clean$number, type="l")
# initialize ts object
deaths_ts <- ts(
data=df_ts_clean,
frequency=12, # trisemester seasonality (spring, summer, fall, and winter)
)
# decompose ts object
deaths_decompose <- decompose(deaths_ts)
autoplot(deaths_decompose)