We will be analysing how the trend of monthly interstate traffic volume changes across the years between 2012 and 2018 and then do a forecast for the next 2 years. According to the findings, the monthly traffic volume stays the same even though the data shows slightly upward trend before applying time series methods.
The Metro Interstate Traffic Volume Data Set is taken from the UCI Machine Learning Repository (UCI, 2019). The initial data set contains 48,204 instances of hourly traffic data with a total of 9 attributes such as holiday type, temperature, rain fall, snow, clouds, weather (short description), weather (long description), date time hour of the data collected and traffic volume collected between October 2012 to Janrauary 2018 and the location is roughly between Minneapolis and St Paul, MN. It is a concern for the Minneapolis especially during holiday seasons that the traffic accidents on highways have been increasing drastically. Some argue that this may be a result of increased traffic volume despite the fact that average daily traffic has been decreased by 5% within the city (Lee, 2019). In this report, we will look up whether the traffic volume is actually going up. Even though the data contain detailed conditions on weather and season, they will not be considered in this report as we are only intersted in the volume. The data is also combined to monthly traffic volume instead of hourly traffic volume as we are interested in the overall increase in traffic volume.
The data is count data and the original dataset is a very big dataset with 48,204 instances. However, after combining the traffic volume to montly volume, the dataset is left with only 63 instances. This is a very small dataset to be using the Time Series methods taught in this unit and there are more appropriate methods outside of the scope of this unit. Nonetheless, we will still try to apply thes methods learned in the class. It will be interesting to see the difference on the difference in methods between the count data and continuous data for predictions but this is not the aim of this project.
Our hypothesis is that the monthly traffic volume between the two cities will have an upward increasing trend.
As we are only interested in the monthly traffic volume, the date_time column of the dataset is separated and the monthly volume is calculated. Then the data is processed and fit into different models to see which model will give the optimum prediction for the next two years.
The original dataset looked like the following.
## holiday temp rain_1h snow_1h clouds_all weather_main weather_description
## 1 None 288.28 0 0 40 Clouds scattered clouds
## 2 None 289.36 0 0 75 Clouds broken clouds
## 3 None 289.58 0 0 90 Clouds overcast clouds
## 4 None 290.13 0 0 90 Clouds overcast clouds
## 5 None 291.14 0 0 75 Clouds broken clouds
## date_time traffic_volume
## 1 2012-10-02 09:00:00 5545
## 2 2012-10-02 10:00:00 4516
## 3 2012-10-02 11:00:00 4767
## 4 2012-10-02 12:00:00 5026
## 5 2012-10-02 13:00:00 4918
-splitted date_time column to date and time
-splitted date into year, month and day
-grouped traffic flow by month
In Fig 1., the data shows some seasonality trends and there is a hugh discrease in the number of traffic at the end of 2014 despite the fact that it is holiday season. The general trend does not show clear indication of whether there is an increasing trend in the traffic volume. As next step, we checked whether the previous month’s traffic affects the traffic volume of the next month.
## [1] 0.5413655
The analysis shows that there is a weak positive correlation between the previous month’s traffic volume to the next month as it can be seen in Fig 2. and correlation value = 0.541. Next we will check whether we can draw a line through the model to see if we can predict the traffic volume data.
##
## Call:
## lm(formula = df ~ time(df))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1880049 -254907 93368 321097 969323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -232059487 70435470 -3.295 0.00154 **
## time(df) 116414 34956 3.330 0.00138 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 524400 on 71 degrees of freedom
## Multiple R-squared: 0.1351, Adjusted R-squared: 0.1229
## F-statistic: 11.09 on 1 and 71 DF, p-value: 0.001379
According to the F-statistics, linear model is significant at the p-value < 0.01 with adjusted R-squared value at 0.1229 which means it is not the best fit model as the correlation is very weak. Fig 3 shows the fitted linear model to the traffic volume data and shows that it does not fit well.
##
## Call:
## lm(formula = df ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1836997 -247269 107776 284941 900405
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.013e+11 9.022e+10 1.122 0.266
## t -1.006e+08 8.955e+07 -1.124 0.265
## t2 2.500e+04 2.222e+04 1.125 0.264
##
## Residual standard error: 523500 on 70 degrees of freedom
## Multiple R-squared: 0.1505, Adjusted R-squared: 0.1262
## F-statistic: 6.199 on 2 and 70 DF, p-value: 0.003321
Quadratic model in Fig 4. shows a slightly better fit to the data compared to the linear model with better adjusted R-squared value at 0.1262 and is statistacally significant at p-value < 0.01. However, the model is still not a strong fitting one. Before transforming the data, we check the datashape first so we know what will be the best method to normalise the data.
Box-cox transformation is applied to the series to help make the series stationary.
df.transform$ci
## [1] 1.6 2.0
The 95% confidence interval for lambda contains the values of lambda between 1.6 and 2.0. Here, we take the center of the two numbers 1.8 to apply it to the Box-Cox transformation.
The data looks more normally distributed after the Box-Cox transformation as seen in Fig 8. However, further Shapiro-Wilk normality test is appled to see if it passes the normality test.
##
## Shapiro-Wilk normality test
##
## data: df.box
## W = 0.96789, p-value = 0.05943
The Shapiro-Wilk normality test’s p-value is greater than 0.05 and therefore, we can assume the new transformed data is normally distributed. After normalizing, the data is differenced to make the series stationary.
## Warning in adf.test(df.diff, alternative = "stationary"): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: df.diff
## Dickey-Fuller = -5.2679, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
It is seen from the results that the p-value of the Dickey-Fuller test is < 0.01 and therefore the series is now stationary. Therefore, first order differencing will be used to try to fit in the models. The following shows how the data looks like after applying Box-Cox and first order differencing to the data. It can be seen from Fig 9. that transformation also reduced the varience of the data as well.
As the series is stationary at this stage, EACF method will be tried on to help determine which models are to be tested out. According to the eacf table below, it is not clear which models are the best but we take ARMA(1,1), ARMA(1,2) and ARMA (1,3) as best possible candidates.
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10
## 0 x o o o o x o o o o o
## 1 x o o o o o o o o o o
## 2 x x o o o o o o o o o
## 3 x x x o o o o o o o o
## 4 o o x o o x o o o o o
## 5 o x x o o x o o o o o
## 6 o o o o x x o o o o o
## 7 x o o o o o o o o o o
## 8 x x o o o o o o o o o
## 9 x o o o o o o o o o o
## 10 x x x x o o o o o o o
## Reordering variables and trying again:
According to the BIC table, the smallest BIC are at lag 1,and 6 and at error lag 5,6 and 7. Therefore, we take ARIMA (1,1,1), ARIMA (1,1,2), ARIMA (1,1,3) , ARIMA (1,1,5) , ARIMA (1,1,6), ARIMA (6,1,5), and ARIMA(6,1,6) as possible candidates. We tested out all the possible models to find the best possible model.
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.313467 0.117853 -2.6598 0.007819 **
## ma1 -0.942817 0.032353 -29.1419 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.249428 0.114477 -2.1789 0.02934 *
## ma1 -0.999839 0.037196 -26.8799 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.71406 0.20136 -3.5462 0.0003909 ***
## ma1 -0.39570 0.30284 -1.3066 0.1913345
## ma2 -0.57647 0.31954 -1.8041 0.0712229 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.35584 0.14231 2.5005 0.0124 *
## ma1 -1.84460 0.12242 -15.0680 < 2.2e-16 ***
## ma2 0.84469 0.11231 7.5208 5.446e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.796538 0.095930 -8.3033 < 2.2e-16 ***
## ma1 -0.689169 0.148162 -4.6515 3.296e-06 ***
## ma2 -0.845854 0.094338 -8.9662 < 2.2e-16 ***
## ma3 0.569150 0.149578 3.8050 0.0001418 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.24769 0.29672 0.8348 0.4038
## ma1 -1.72293 0.30430 -5.6620 1.496e-08 ***
## ma2 0.62886 0.50066 1.2561 0.2091
## ma3 0.09415 0.21283 0.4424 0.6582
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.837644 0.082107 -10.2019 < 2.2e-16 ***
## ma1 -0.499557 0.133756 -3.7348 0.0001878 ***
## ma2 -0.942279 0.134278 -7.0174 2.261e-12 ***
## ma3 0.163813 0.219980 0.7447 0.4564705
## ma4 0.165429 0.113036 1.4635 0.1433293
## ma5 0.295749 0.135937 2.1756 0.0295829 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
## Warning in sqrt(diag(se)): NaNs produced
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.502120 NA NA NA
## ma1 -0.972711 NA NA NA
## ma2 -0.402976 NA NA NA
## ma3 0.132890 0.125354 1.0601 0.2891
## ma4 0.248724 0.031197 7.9727 1.552e-15 ***
## ma5 0.127748 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.799643 0.098704 -8.1014 5.433e-16 ***
## ma1 -0.578813 0.132346 -4.3735 1.223e-05 ***
## ma2 -0.919706 0.151417 -6.0740 1.248e-09 ***
## ma3 0.097904 0.162946 0.6008 0.5479503
## ma4 0.351763 0.178907 1.9662 0.0492779 *
## ma5 0.418392 0.123251 3.3946 0.0006872 ***
## ma6 -0.230157 0.145168 -1.5854 0.1128643
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.68829 0.22693 -3.0331 0.0024207 **
## ma1 -0.78029 0.26623 -2.9309 0.0033794 **
## ma2 -0.83579 0.30495 -2.7408 0.0061296 **
## ma3 0.19383 0.16044 1.2081 0.2270269
## ma4 0.43017 0.20088 2.1414 0.0322385 *
## ma5 0.44646 0.12448 3.5867 0.0003349 ***
## ma6 -0.36443 0.16608 -2.1943 0.0282145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.024998 0.148025 0.1689 0.865893
## ar2 -0.419345 0.331060 -1.2667 0.205272
## ar3 -0.161827 0.240376 -0.6732 0.500804
## ar4 -0.322377 0.128570 -2.5074 0.012162 *
## ar5 -0.186190 0.127340 -1.4621 0.143701
## ar6 -0.415038 0.133803 -3.1018 0.001923 **
## ma1 -1.547129 0.198487 -7.7946 6.461e-15 ***
## ma2 0.793810 0.571467 1.3891 0.164811
## ma3 -0.499588 0.810174 -0.6166 0.537470
## ma4 0.489995 0.499252 0.9815 0.326367
## ma5 -0.141443 0.175167 -0.8075 0.419391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.422583 0.225948 -1.8703 0.0614469 .
## ar2 -0.640987 0.152055 -4.2155 2.492e-05 ***
## ar3 -0.614374 0.253368 -2.4248 0.0153156 *
## ar4 -0.074332 0.179832 -0.4133 0.6793593
## ar5 -0.258063 0.121789 -2.1189 0.0340955 *
## ar6 -0.437457 0.113761 -3.8454 0.0001204 ***
## ma1 -1.051212 0.244592 -4.2978 1.725e-05 ***
## ma2 0.443554 0.473514 0.9367 0.3488980
## ma3 -0.372233 0.488397 -0.7622 0.4459689
## ma4 -0.521721 0.461519 -1.1304 0.2582900
## ma5 0.701698 0.266439 2.6336 0.0084482 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CSS
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.040570 0.173650 0.2336 0.815270
## ar2 -0.446794 0.340848 -1.3108 0.189916
## ar3 -0.144301 0.243601 -0.5924 0.553604
## ar4 -0.305432 0.157800 -1.9356 0.052922 .
## ar5 -0.205137 0.160227 -1.2803 0.200442
## ar6 -0.415938 0.135994 -3.0585 0.002225 **
## ma1 -1.561342 0.216059 -7.2265 4.957e-13 ***
## ma2 0.846913 0.627604 1.3494 0.177196
## ma3 -0.563147 0.825828 -0.6819 0.495291
## ma4 0.484256 0.474250 1.0211 0.307208
## ma5 -0.074140 0.412137 -0.1799 0.857237
## ma6 -0.038703 0.212104 -0.1825 0.855211
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ML
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.194416 0.265903 -0.7312 0.4646845
## ar2 -0.218108 0.292306 -0.7462 0.4555683
## ar3 -0.280367 0.298949 -0.9378 0.3483255
## ar4 -0.078762 0.282120 -0.2792 0.7801090
## ar5 -0.275468 0.176899 -1.5572 0.1194206
## ar6 -0.435064 0.131186 -3.3164 0.0009119 ***
## ma1 -1.313799 0.293929 -4.4698 7.83e-06 ***
## ma2 0.226986 0.552780 0.4106 0.6813461
## ma3 0.019344 0.494371 0.0391 0.9687873
## ma4 -0.012141 0.690554 -0.0176 0.9859722
## ma5 0.421534 0.622070 0.6776 0.4980055
## ma6 -0.247485 0.259381 -0.9541 0.3400128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Best possible model is then selected using the sort score function.
## df AIC
## model_615_ml 12 3725.522
## model_116_ml 8 3725.941
## model_112_ml 4 3728.780
## model_616_ml 13 3728.941
## model_115_ml 7 3730.317
## model_113_ml 5 3730.584
## model_111_ml 3 3737.229
## df BIC
## model_112_ml 4 3737.830
## model_113_ml 5 3741.897
## model_111_ml 3 3744.017
## model_116_ml 8 3744.043
## model_115_ml 7 3746.155
## model_615_ml 12 3752.674
## model_616_ml 13 3758.356
The two tests show contradictory results therefore, we will take ARIMA (6,1,5), ARIMA (1,1,2) ARIMA (1,1,6) and ARIMA (1,1,3) to test for residuals to see which model will be the best.
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.97718, p-value = 0.2147
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98043, p-value = 0.3243
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.99186, p-value = 0.9274
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98358, p-value = 0.471
Only at ARIMA (1,1,6), and (6,1,5), the models’ standardized residual plots shows no trend nor changing variance meaning that the both of the models are supported. The histograms are also normally distributed and the QQ plot also shows that the data is normally distributed. Both the PACF and ACF plots do not show significant lags as well. Therefore, we chose the better performing model ARIMA(6,1,5) and this will be used to calculate the forecasting for the next 5 years.
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| Feb 2018 | -15636610212 | -78394251930 | 47121031507 | -111616140673 | 80342920249 |
| Mar 2018 | -6882980831 | -77680194044 | 63914232382 | -115157974856 | 101392013195 |
| Apr 2018 | -25903190992 | -97653366260 | 45846984276 | -135635614712 | 83829232728 |
| May 2018 | 6233118648 | -66611223417 | 79077460713 | -105172688784 | 117638926081 |
| Jun 2018 | 46301748346 | -26682888094 | 119286384786 | -65318620817 | 157922117509 |
| Jul 2018 | 674448524 | -73315402761 | 74664299809 | -112483264079 | 113832161127 |
| Aug 2018 | -14011877255 | -88640514168 | 60616759659 | -128146528172 | 100122773663 |
| Sep 2018 | -12181365929 | -89380304170 | 65017572311 | -130246953506 | 105884221647 |
| Oct 2018 | -9599811879 | -88118607927 | 68918984170 | -129683947782 | 110484324025 |
| Nov 2018 | -14015185106 | -92534062918 | 64503692706 | -134099446057 | 106069075844 |
| Dec 2018 | -18904456128 | -97535729929 | 59726817674 | -139160611910 | 101351699655 |
| Jan 2019 | 3488490507 | -75376664005 | 82353645018 | -117125354955 | 124102335968 |
| Feb 2019 | 11735407035 | -67200393245 | 90671207314 | -108986481806 | 132457295875 |
| Mar 2019 | 3525444037 | -75426441410 | 82477329484 | -117221044945 | 124271933018 |
| Apr 2019 | -2435029340 | -81392371732 | 76522313052 | -123189863998 | 118319805318 |
| May 2019 | -4951900802 | -84403182567 | 74499380962 | -126462150542 | 116558348938 |
| Jun 2019 | -5985457236 | -85827545650 | 73856631179 | -128093394169 | 116122479698 |
| Jul 2019 | -12173990557 | -92157311960 | 67809330846 | -134497924707 | 110149943593 |
| Aug 2019 | -11460624306 | -91443946039 | 68522697427 | -133784558961 | 110863310349 |
| Sep 2019 | -3351800960 | -83335978404 | 76632376483 | -125677044311 | 118973442391 |
| Oct 2019 | 828861237 | -79158122351 | 80815844826 | -121500673743 | 123158396217 |
| Nov 2019 | 649612673 | -79341439747 | 80640665094 | -121686145048 | 122985370395 |
| Dec 2019 | -1069000503 | -81073825349 | 78935824342 | -123425821330 | 121287820324 |
| Jan 2020 | -1891788944 | -82005713153 | 78222135265 | -124415462849 | 120631884961 |
It is found that the traffic volume in the long run is neither decreasing nor decreasing in the long run but somewhat stays the same despite fluctuations in the volume according to the season. There is also a possibility that the prediction did not work well due to the data size being too small to be using with the current methods and therfore, giving a different result than expected. It is hard to translate the prediction of Box-Cox transformed values backed to the original data as well. This could be avoided by using a huge set of data and assume normal distribution. This data has a huge dip in the traffic volume around the end of 2014 and thus affecting the trend as well. If we remove the year 2014, we might be able to get a better fitting model with more precise predictions.
UCI Machine Learning Repository 2019. The Metro Interstate Traffic Volume Data Set Retrieved from https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume on 20th May 2020.
Lee, J., 2019. Five Takeaways from Minneaplis’ New Plan to Boost Street Safety. Retrieved from https://www.minnpost.com/metro/2019/09/five-takeaways-from-minneapolis-new-plan-to-boost-street-safety/ on 29th May 2020.
Schmitt, A., 2019. How Two Cites Reduced Driving. Retrieved from https://usa.streetsblog.org/2019/02/08/minneapolis-and-seattle-have-achieved-the-holy-grail-for-sustainable-transportation/ on 5th June 2020.
library(purrr) #checking dataframe
library(tidyr)
library(dplyr)
library(TSA)
library(lmtest)
library(knitr)
library(FSAdata)
library(tseries)
library(fUnitRoots)
library(bestglm)
library(FitAR)
library(forecast)
library(rmarkdown)
library(readr)
traffic <- read.csv("Metro_Interstate_Traffic_Volume.csv")
head(traffic,5)
monthly_traffic<- tidyr::separate(traffic, date_time, c("date", "time"), sep = " ")
monthly_traffic <- tidyr::separate(monthly_traffic, date, c("year", "month", "day"), sep = "-")
monthly_traffic_data <- monthly_traffic %>% group_by(year,month) %>% summarise(vol_by_month = sum(traffic_volume))
df = ts(as.vector(monthly_traffic_data$vol_by_month), start=2012, end=2018, frequency=12)
plot(df, ylab="Traffic Volume", type="o", main="Time Series Plot for Monthly Interstate Traffic Volume")
title(main="Fig 1.", line=-20.5)
plot(y=df,x=zlag(df), ylab="Traffic Volume", xlab="Previous month traffic volume", main="Scatter Plot of Previous Month Traffic Volume")
title(main="Fig 2.", line=-20.5)
y=df
x=zlag(df)
index=2:length(x)
cor(y[index],x[index])
linear = lm(df~time(df))
summary(linear)
plot(df, ylab="Time", type="o", main=" Fitted Linear Model to Traffic Volume Data")
abline(linear)
title(main="Fig 3", line=-20.5)
t = time(df)
t2 = t^2
quadratic = lm(df~t+t2)
summary(quadratic)
plot(ts(fitted(quadratic)), ylim = c(min(c(fitted(quadratic), as.vector(df))), max(c(fitted(quadratic), as.vector(df)))), ylab="y",
main = "Fitted Quadratic Model to Traffic Volume Data")
lines(as.vector(df), type="o")
title(main="Fig 4.", line=-20.5)
har. = harmonic(df, 1/12)
cosine = lm(df~har.)
summary(cosine)
hist(df, breaks=12, col="red", main ="Histogram of Traffic Volume Data")
title(main="Fig 5", line=-20.5)
df.transform<- BoxCox.ar(df, method="yule-walker")
title(main="Log-likelihood vs values of lambda for traffic volume data", line=1)
title(main="Fig 6.", line=-20.5)
df.transform$ci
par(mfrow=c(1,2))
lambda=1.8
df.box= (df^lambda-1)/lambda
plot(df, type='l', ylab ='count for Original series')
plot(df.box, type='l', ylab ="count for transformed series")
mtext("Before and After applying Box-Cox to the Daily Interstate Traffic Volume Data", font=2, line=1,
at=par("usr")[1]+0.05*diff(par("usr")[1:2]),
cex=1.2)
title(main="Fig 7.", line=-24.5, outer = TRUE)
hist(df.box, breaks=12, col="red", main = "Histogram of Box-Cox Transformed Data")
title(main="Fig 8.", line=-20.5)
shapiro.test(df.box)
df.diff = diff(df.box, differences =1)
testAdf1 = adf.test(df.diff, alternative = "stationary")
testAdf1
par(mfrow=c(1,2))
plot(df, type='l', ylab ='count for Original series')
plot(df.diff, type='l', ylab ="count for transformed series")
mtext("Applying 1st Order Difference to Traffic Volume", font=2, line=1,
at=par("usr")[1]+0.05*diff(par("usr")[1:2]),
cex=1.2)
title(main="Fig 9.", line=-24.5, outer = TRUE)
eacf(df.diff, ar.max=10, ma.max=10)
armaSubset = armasubsets(y=df.diff,
nar=10,
nma=10,
y.name='test',
ar.method='ols')
plot(armaSubset)
title('BIC Table for Model Selection', line = 5)
title('Fig 10.', line = -17.5)
model_111_css = arima(df.diff,order=c(1,1,1),method='CSS')
coeftest(model_111_css)
model_111_ml = arima(df.diff,order=c(1,1,1),method='ML')
coeftest(model_111_ml)
model_112_css = arima(df.diff,order=c(1,1,2),method='CSS')
coeftest(model_112_css)
model_112_ml = arima(df.diff,order=c(1,1,2),method='ML')
coeftest(model_112_ml)
model_113_css = arima(df.diff,order=c(1,1,3),method='CSS')
coeftest(model_113_css)
model_113_ml = arima(df.diff,order=c(1,1,3),method='ML')
coeftest(model_113_ml)
model_115_css = arima(df.diff,order=c(1,1,5),method='CSS')
coeftest(model_115_css)
model_115_ml = arima(df.diff,order=c(1,1,5),method='ML')
coeftest(model_115_ml)
model_116_css = arima(df.diff,order=c(1,1,6),method='CSS')
coeftest(model_116_css)
model_116_ml = arima(df.diff,order=c(1,1,6),method='ML')
coeftest(model_116_ml)
model_615_css = arima(df.diff,order=c(6,1,5),method='CSS')
coeftest(model_615_css)
model_615_ml = arima(df.diff,order=c(6,1,5),method='ML')
coeftest(model_615_ml)
model_616_css = arima(df.diff,order=c(6,1,6),method='CSS')
coeftest(model_616_css)
model_616_ml = arima(df.diff,order=c(6,1,6),method='ML')
coeftest(model_616_ml)
#sort score function
sort.score <- function(x, score = c("bic", "aic")){
if (score == "aic"){
x[with(x, order(AIC)),]
} else if (score == "bic") {
x[with(x, order(BIC)),]
} else {
warning('score = "x" only accepts valid arguments ("aic","bic")')
}
}
sort.score(AIC(model_111_ml,model_112_ml,model_113_ml,model_115_ml,model_116_ml,model_615_ml,model_616_ml), score = "aic")
sort.score(BIC(model_111_ml,model_112_ml,model_113_ml,model_115_ml,model_116_ml,model_615_ml,model_616_ml), score = "bic")
#residual analysis formula
residual.analysis <- function(model, std = TRUE,start = 2, class = c("ARIMA","GARCH","ARMA-GARCH")[1]){
# If you have an output from arima() function use class = "ARIMA"
# If you have an output from garch() function use class = "GARCH".
# Please note that you should use tseries package to be able to run this function for GARCH models.
# If you have an output from ugarchfit() function use class = "ARMA-GARCH"
if (class == "ARIMA"){
if (std == TRUE){
res.model = rstandard(model)
}else{
res.model = residuals(model)
}
}else if (class == "GARCH"){
res.model = model$residuals[start:model$n.used]
}else if (class == "ARMA-GARCH"){
res.model = model@fit$residuals
}else {
stop("The argument 'class' must be either 'ARIMA' or 'GARCH' ")
}
par(mfrow=c(3,2))
plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
abline(h=0)
hist(res.model,main="Histogram of standardised residuals")
acf(res.model,main="ACF of standardised residuals")
pacf(res.model,main="PACF of standardised residuals")
qqnorm(res.model,main="QQ plot of standardised residuals")
qqline(res.model, col = 2)
print(shapiro.test(res.model))
k=0
LBQPlot(res.model, lag.max =length(model$residuals)-1, StartLag = k + 1, k = 0, SquaredQ = FALSE)
}
residual.analysis(model=model_112_css)
residual.analysis(model=model_116_css)
residual.analysis(model=model_113_css)
residual.analysis(model=model_615_css)
fit = Arima(df.diff, model= model_615_css)
df.forecast = plot(forecast(fit,h=24), ylim=c(-250000000000,250000000000), main ='Interstate Traffic Volume Forecasting for the Next 2 Years')
title('Fig.11', line=-20)
fit = Arima(df.diff, model = model_615_css)
df_forecast = forecast(fit,h=24)
knitr::kable(df_forecast,
caption = "Forecast for Interstate Traffic Volume Forecastign for the Next 2 Years")